
Overview

The Granite Embedding model collection consists of embedding models that generate high-quality text embeddings and a reranker model that improves the relevance and quality of search results or recommendations. The embedding models output vector representations (embeddings) of textual inputs such as queries, passages, and documents, capturing the semantic meaning of the input text. The primary use cases for these embeddings are semantic search and retrieval-augmented generation (RAG) applications.

The reranker model is optional but useful for further improving the relevance and quality of search results or recommendations: after the initial retrieval of items based on their embeddings, the reranker refines the ranking by considering additional factors and more complex criteria.

Built on a foundation of carefully curated, permissively licensed public datasets, the Granite Embedding models set a high standard for performance, achieving state-of-the-art results in their respective weight classes. See the MTEB Leaderboard, where Granite Embedding ranks in the top 10 among models of similar size (as of 10/2/2025). Granite Embedding models are released under the Apache 2.0 license, making them freely available for both research and commercial purposes, with full transparency into their training data.
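As a sketch of how the two stages fit together in code, the snippet below retrieves passages by embedding similarity and then rescores the candidates with a reranker. It assumes the sentence_transformers CrossEncoder interface; the reranker model id is a placeholder, so substitute the actual Granite reranker from the model cards.

from sentence_transformers import SentenceTransformer, CrossEncoder, util

# stage 1: embedding-based retrieval
embedder = SentenceTransformer("ibm-granite/granite-embedding-30m-english")
query = "who wrote Achy Breaky Heart?"
passages = [
    "Achy Breaky Heart is a country song written by Don Von Tress.",
    "A summit is the highest point of a mountain.",
]
scores = util.cos_sim(embedder.encode([query]), embedder.encode(passages))[0]
candidates = [passages[int(i)] for i in scores.argsort(descending=True)]

# stage 2: rescore the retrieved candidates with a reranker (placeholder model id)
reranker = CrossEncoder("<granite-reranker-model-id>")
print(reranker.rank(query, candidates))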

Run locally with Ollama

Learn more about Granite Embedding on Ollama.
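For example, after pulling an embedding model (the exact tag comes from the Ollama library; granite-embedding:30m is assumed here), you can request embeddings from the local Ollama REST API. A minimal sketch:

import requests

# assumes a local Ollama server on the default port 11434
# and that `ollama pull granite-embedding:30m` has already been run
response = requests.post(
    "http://localhost:11434/api/embed",
    json={
        "model": "granite-embedding:30m",
        "input": "Who made the song My achy breaky heart?",
    },
)
print(response.json()["embeddings"][0][:8])  # first few dimensions of the embedding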

Examples

Granite Embedding with sentence transformers

This is a simple example of how to use the granite-embedding-30m-english model with the sentence_transformers library. First, install the library:
pip install sentence_transformers
The model can then be used to encode pairs of text and find the similarity between their representations:
from sentence_transformers import SentenceTransformer, util

model_path = "ibm-granite/granite-embedding-30m-english"
# Load the Sentence Transformer model
model = SentenceTransformer(model_path)

input_queries = [
    ' Who made the song My achy breaky heart? ',
    'summit define'
    ]

input_passages = [
    "Achy Breaky Heart is a country song written by Don Von Tress. Originally titled Don't Tell My Heart and performed by The Marcy Brothers in 1991. ",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments."
    ]

# encode queries and passages
query_embeddings = model.encode(input_queries)
passage_embeddings = model.encode(input_passages)

# calculate cosine similarity
print(util.cos_sim(query_embeddings, passage_embeddings))
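The result is a 2×2 similarity matrix: one row per query and one column per passage, so the highest score in each row identifies the passage that best matches that query.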

Granite Embedding with Hugging Face transformers

This is a simple example of how to use the granite-embedding-30m-english model with the Transformers library and PyTorch. First, install the required libraries:
pip install transformers torch
The model can then be used to encode text:
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "ibm-granite/granite-embedding-30m-english"

# Load the model and tokenizer
model = AutoModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()

input_queries = [
    ' Who made the song My achy breaky heart? ',
    'summit define'
    ]

# tokenize inputs
tokenized_queries = tokenizer(input_queries, padding=True, truncation=True, return_tensors='pt')

# encode queries
with torch.no_grad():
    # Queries
    model_output = model(**tokenized_queries)
    # Perform pooling. granite-embedding-30m-english uses CLS Pooling
    query_embeddings = model_output[0][:, 0]

# normalize the embeddings
query_embeddings = torch.nn.functional.normalize(query_embeddings, dim=1)
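For completeness, you could encode passages the same way and score them against the queries; because the embeddings are normalized, cosine similarity reduces to a matrix product. A minimal sketch, reusing the input_passages list from the sentence_transformers example above:

# tokenize and encode the passages with the same CLS pooling
tokenized_passages = tokenizer(input_passages, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    passage_embeddings = model(**tokenized_passages)[0][:, 0]
passage_embeddings = torch.nn.functional.normalize(passage_embeddings, dim=1)

# for unit vectors, the dot product equals cosine similarity
print(query_embeddings @ passage_embeddings.T)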

Granite Embedding with LangChain

This is how you could use our models for retrieval with LangChain. First, install the LangChain dependencies:
pip install git+https://github.com/ibm-granite-community/utils \
    "langchain_community<0.3.0" \
    langchain-huggingface \
    langchain-milvus \
    replicate \
    wget
The recipe below, using the granite-embedding-30m-english model, shows how to:
  • Set up a database: set up a local Milvus VectorDB, process the corpus to produce indexable documents, and ingest those documents using an embedding model.
  • Retrieve relevant passages from the database: use an embedding of the query to retrieve semantically similar passages.
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_milvus import Milvus
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
import uuid
import os, wget

# load the embedding model
embeddings_model = HuggingFaceEmbeddings(model_name="ibm-granite/granite-embedding-30m-english")

# set up the vector database
db_file = f"/tmp/milvus_{str(uuid.uuid4())[:8]}.db"
print(f"The vector database will be saved to {db_file}")
vector_db = Milvus(embedding_function=embeddings_model, connection_args={"uri": db_file}, auto_id=True)

# load the example corpus file
filename = 'state_of_the_union.txt'
url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/foundation_models/state_of_the_union.txt'

if not os.path.isfile(filename):
  wget.download(url, out=filename)

loader = TextLoader(filename)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# add the processed documents to the vector database
vector_db.add_documents(texts)

# search the vector database with the query
query = "What did the president say about Ketanji Brown Jackson"
docs = vector_db.similarity_search(query)
print(docs[0].page_content)
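From here, the vector store can be exposed as a standard LangChain retriever, which is the usual entry point for RAG chains. A minimal sketch using the generic as_retriever interface:

# wrap the vector store as a retriever that returns the top-3 passages
retriever = vector_db.as_retriever(search_kwargs={"k": 3})
for doc in retriever.invoke(query):
    print(doc.page_content[:100])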