
Overview

The Granite Embedding model collection consists of embedding models that generate high-quality text embeddings and a reranker model that improves the relevance and quality of search results or recommendations. The embedding models output vector representations (embeddings) of textual inputs such as queries, passages, and documents, capturing the semantic meaning of the input text. The primary use cases for these embeddings are semantic search and retrieval-augmented generation (RAG) applications.

The reranker model is optional but useful for further improving the relevance and quality of search results or recommendations: after the initial retrieval of items based on their embeddings, the reranker refines the ranking by considering additional factors and more complex criteria.

Built on a foundation of carefully curated, permissively licensed public datasets, the Granite Embedding models set a high standard for performance, achieving state-of-the-art results in their respective weight classes. See the MTEB Leaderboard, where Granite Embedding ranks in the top 10 among models of similar size (as of 10/2/2025). Granite Embedding models are released under the Apache 2.0 license, making them freely available for both research and commercial purposes, with full transparency into their training data.
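As a sketch of how the two stages fit together in code, the snippet below retrieves passages by embedding similarity and then rescores the candidates with a reranker. It assumes the sentence_transformers CrossEncoder interface; the reranker model id is a placeholder, so substitute the actual Granite reranker from the model cards.

from sentence_transformers import SentenceTransformer, CrossEncoder, util

# stage 1: embedding-based retrieval
embedder = SentenceTransformer("ibm-granite/granite-embedding-30m-english")
query = "who wrote Achy Breaky Heart?"
passages = [
    "Achy Breaky Heart is a country song written by Don Von Tress.",
    "A summit is the highest point of a mountain.",
]
scores = util.cos_sim(embedder.encode([query]), embedder.encode(passages))[0]
candidates = [passages[int(i)] for i in scores.argsort(descending=True)]

# stage 2: rescore the retrieved candidates with a reranker (placeholder model id)
reranker = CrossEncoder("<granite-reranker-model-id>")
print(reranker.rank(query, candidates))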

Run locally with Ollama

Learn more about Granite Embedding on Ollama.
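For example, after pulling an embedding model (the exact tag comes from the Ollama library; granite-embedding:30m is assumed here), you can request embeddings from the local Ollama REST API. A minimal sketch:

import requests

# assumes a local Ollama server on the default port 11434
# and that `ollama pull granite-embedding:30m` has already been run
response = requests.post(
    "http://localhost:11434/api/embed",
    json={
        "model": "granite-embedding:30m",
        "input": "Who made the song My achy breaky heart?",
    },
)
print(response.json()["embeddings"][0][:8])  # first few dimensions of the embedding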

Examples

Granite Embedding with sentence transformers

This is a simple example of how to use the granite-embedding-30m-english model with the sentence_transformers library. First, install the library:
pip install sentence_transformers
The model can then be used to encode pairs of text and find the similarity between their representations:
from sentence_transformers import SentenceTransformer, util

model_path = "ibm-granite/granite-embedding-30m-english"
# Load the Sentence Transformer model
model = SentenceTransformer(model_path)

input_queries = [
    ' Who made the song My achy breaky heart? ',
    'summit define'
    ]

input_passages = [
    "Achy Breaky Heart is a country song written by Don Von Tress. Originally titled Don't Tell My Heart and performed by The Marcy Brothers in 1991. ",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments."
    ]

# encode queries and passages
query_embeddings = model.encode(input_queries)
passage_embeddings = model.encode(input_passages)

# calculate cosine similarity
print(util.cos_sim(query_embeddings, passage_embeddings))
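The result is a 2×2 similarity matrix: one row per query and one column per passage, so the highest score in each row identifies the passage that best matches that query.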

Granite Embedding with Hugging Face transformers

This is a simple example of how to use the granite-embedding-30m-english model with the Transformers library and PyTorch. First, install the required libraries:
pip install transformers torch
The model can then be used to encode text:
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "ibm-granite/granite-embedding-30m-english"

# Load the model and tokenizer
model = AutoModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()

input_queries = [
    ' Who made the song My achy breaky heart? ',
    'summit define'
    ]

# tokenize inputs
tokenized_queries = tokenizer(input_queries, padding=True, truncation=True, return_tensors='pt')

# encode queries
with torch.no_grad():
    # Queries
    model_output = model(**tokenized_queries)
    # Perform pooling. granite-embedding-30m-english uses CLS Pooling
    query_embeddings = model_output[0][:, 0]

# normalize the embeddings
query_embeddings = torch.nn.functional.normalize(query_embeddings, dim=1)
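For completeness, you could encode passages the same way and score them against the queries; because the embeddings are normalized, cosine similarity reduces to a matrix product. A minimal sketch, reusing the input_passages list from the sentence_transformers example above:

# tokenize and encode the passages with the same CLS pooling
tokenized_passages = tokenizer(input_passages, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    passage_embeddings = model(**tokenized_passages)[0][:, 0]
passage_embeddings = torch.nn.functional.normalize(passage_embeddings, dim=1)

# for unit vectors, the dot product equals cosine similarity
print(query_embeddings @ passage_embeddings.T)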

Granite Embedding with LangChain

This is how you could use our models for retrieval with LangChain. First, install the LangChain dependencies:
pip install git+https://github.com/ibm-granite-community/utils \
    "langchain_community<0.3.0" \
    langchain-huggingface \
    langchain-milvus \
    replicate \
    wget
The recipe below, using the granite-embedding-30m-english model, shows how to:
  • Set up a database: set up a local Milvus VectorDB, process the corpus to produce indexable documents, and ingest those documents using an embedding model.
  • Retrieve relevant passages from the database: use an embedding of the query to retrieve semantically similar passages.
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_milvus import Milvus
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
import uuid
import os, wget

# load the embedding model
embeddings_model = HuggingFaceEmbeddings(model_name="ibm-granite/granite-embedding-30m-english")

# set up the vector database
db_file = f"/tmp/milvus_{str(uuid.uuid4())[:8]}.db"
print(f"The vector database will be saved to {db_file}")
vector_db = Milvus(embedding_function=embeddings_model, connection_args={"uri": db_file}, auto_id=True)

# load the example corpus file
filename = 'state_of_the_union.txt'
url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/foundation_models/state_of_the_union.txt'

if not os.path.isfile(filename):
  wget.download(url, out=filename)

loader = TextLoader(filename)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# add the processed documents to the vector database
vector_db.add_documents(texts)

# search the vector database with the query
query = "What did the president say about Ketanji Brown Jackson"
docs = vector_db.similarity_search(query)
print(docs[0].page_content)
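From here, the vector store can be exposed as a standard LangChain retriever, which is the usual entry point for RAG chains. A minimal sketch using the generic as_retriever interface:

# wrap the vector store as a retriever that returns the top-3 passages
retriever = vector_db.as_retriever(search_kwargs={"k": 3})
for doc in retriever.invoke(query):
    print(doc.page_content[:100])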