Model Collection

View the full Granite 4.1 collection on Hugging Face

GitHub Repository

Granite 4.1 models and documentation

Overview

Granite 4.1 is a family of dense language models in three sizes: 3B, 8B, and 30B parameters. Each size comes in base and instruction-tuned variants, with optional FP8 quantization for efficient deployment. Compared with Granite 4.0, Granite 4.1 improves tool calling, instruction following, coding, and mathematical reasoning.

Model Variants

  • granite-4.1-3b-base & granite-4.1-3b-instruct: Compact model optimized for edge deployment and resource-constrained environments
  • granite-4.1-8b-base & granite-4.1-8b-instruct: Balanced model for general-purpose enterprise applications
  • granite-4.1-30b-base & granite-4.1-30b-instruct: High-capacity model for complex reasoning and specialized tasks

All models are released under the Apache 2.0 license with cryptographic signatures, ISO certification, and full transparency disclosures.
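
As a rough sizing aid for choosing between the variants, the sketch below estimates the memory needed for the weights alone. It assumes the nominal parameter counts (3B, 8B, 30B), 2 bytes per weight for unquantized checkpoints (bf16 is an assumption, not stated above), and 1 byte per weight for FP8. It ignores the KV cache, activations, and framework overhead, so actual usage will be higher.

```python
# Nominal parameter counts taken from the variant names; real checkpoints may differ slightly.
PARAMS = {"3b": 3e9, "8b": 8e9, "30b": 30e9}

# Bytes per weight. bf16 for unquantized checkpoints is an assumption.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(size, dtype):
    """Approximate memory for the weights only, in gigabytes (10^9 bytes)."""
    return PARAMS[size] * BYTES_PER_PARAM[dtype] / 1e9


for size in PARAMS:
    for dtype in BYTES_PER_PARAM:
        print(f"granite-4.1-{size} ({dtype}): ~{weight_memory_gb(size, dtype):.0f} GB")
```

Under these assumptions, FP8 halves the weight footprint at every size, which is the main reason to consider it on memory-constrained hardware.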

Key Capabilities

  • Tool Calling: Granite 4.1 demonstrates strong ability to understand and execute tool-based instructions, enabling seamless integration with various software tools and APIs. This capability allows enterprises to create powerful AI-driven workflows and automate complex tasks.
  • Instruction Following: Granite 4.1 exhibits improved comprehension and adherence to user instructions, ensuring reliable and accurate task completion for enterprise automation.
  • Code Generation & Explanation: Granite 4.1 generates code snippets and explains complex codebases across multiple programming languages with higher accuracy, accelerating software development workflows.
  • Mathematical Reasoning: Granite 4.1 tackles complex mathematical problems from basic arithmetic to advanced calculus and linear algebra, enabling automated calculation and decision-making.

Getting Started

First, install the required libraries:
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
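
To confirm the installation before loading a model, a small check like the one below can report which of the required packages are present. The helper name `installed_versions` is illustrative and not part of any library; it relies only on the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    required = ["torch", "accelerate", "transformers"]
    for name, ver in installed_versions(required).items():
        print(f"{name}: {ver or 'not installed'}")
```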

Generation

This is a simple example of how to use the Granite-4.1-30B model:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

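# use the GPU when one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"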
model_path = "ibm-granite/granite-4.1-30b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

# change input text as desired
chat = [
    { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)

# generate output tokens
output = model.generate(**input_tokens,
                        max_new_tokens=100)

# decode output tokens into text
output = tokenizer.batch_decode(output)

# print output
print(output[0])

Expected output:
<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>IBM Research - Almaden, San Jose, California<|end_of_text|>
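
The decoded string contains the whole conversation, including the role markers. If only the model's reply is needed, one option is to parse it out of the text. The sketch below assumes the marker layout shown in the expected output (`<|start_of_role|>`, `<|end_of_role|>`, `<|end_of_text|>`); the helper name `last_assistant_reply` is hypothetical.

```python
import re

# Role markers copied from the expected output above; other chat templates may differ.
_TURN = re.compile(
    r"<\|start_of_role\|>(?P<role>\w+)<\|end_of_role\|>"
    r"(?P<content>.*?)(?:<\|end_of_text\|>|\Z)",
    re.DOTALL,
)


def last_assistant_reply(decoded):
    """Return the text of the final assistant turn, or None if there is none."""
    replies = [
        match.group("content").strip()
        for match in _TURN.finditer(decoded)
        if match.group("role") == "assistant"
    ]
    return replies[-1] if replies else None


decoded = (
    "<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory "
    "located in the United States. You should only output its name and location."
    "<|end_of_text|>\n"
    "<|start_of_role|>assistant<|end_of_role|>"
    "IBM Research - Almaden, San Jose, California<|end_of_text|>"
)
print(last_assistant_reply(decoded))  # IBM Research - Almaden, San Jose, California
```

An alternative that avoids string parsing is to slice the generated token ids past the prompt length and decode with `skip_special_tokens=True`.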

Tool Calling

Granite-4.1-30B comes with enhanced tool calling capabilities, enabling integration with external functions and APIs. Define a list of tools using OpenAI’s function definition schema and pass it to the chat template through the tools argument:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

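# use the GPU when one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"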
model_path = "ibm-granite/granite-4.1-30b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "Name of the city"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# change input text as desired
chat = [
    { "role": "user", "content": "What's the weather like in Boston right now?" },
]
chat = tokenizer.apply_chat_template(chat,
                                     tokenize=False,
                                     tools=tools,
                                     add_generation_prompt=True)

# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)

# generate output tokens
output = model.generate(**input_tokens,
                        max_new_tokens=100)

# decode output tokens into text
output = tokenizer.batch_decode(output)

# print output
print(output[0])

Expected output:
<|start_of_role|>system<|end_of_role|>You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather for a specified city.", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "Name of the city"}}, "required": ["city"]}}}
</tools>

For each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>What's the weather like in Boston right now?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|><tool_call>
{"name": "get_current_weather", "arguments": {"city": "Boston"}}
</tool_call><|end_of_text|>
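
The model only emits the tool call; running it is up to the application. A minimal sketch of that step follows, assuming the `<tool_call>` layout shown in the expected output. The `get_current_weather` implementation, the `REGISTRY` mapping, and both helper functions are hypothetical placeholders, not part of Granite or Transformers.

```python
import json
import re

_TOOL_CALL = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return the JSON objects found inside <tool_call>...</tool_call> blocks."""
    calls = []
    for raw in _TOOL_CALL.findall(text):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if isinstance(payload, dict) and "name" in payload:
            calls.append(payload)
    return calls


# Hypothetical stand-in; a real implementation would query a weather service.
def get_current_weather(city):
    return {"city": city, "forecast": "placeholder"}


REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_calls(text, registry=REGISTRY):
    """Execute each parsed call against the registry and collect the results."""
    results = []
    for call in parse_tool_calls(text):
        func = registry.get(call["name"])
        if func is None:
            results.append({"name": call["name"], "error": "unknown tool"})
            continue
        results.append({"name": call["name"], "result": func(**call.get("arguments", {}))})
    return results


assistant_text = (
    '<tool_call>\n'
    '{"name": "get_current_weather", "arguments": {"city": "Boston"}}\n'
    '</tool_call><|end_of_text|>'
)
print(run_tool_calls(assistant_text))
```

The system prompt itself contains `<tool_call>` tags around a non-JSON template, so invalid payloads are skipped rather than raised. It is still safer to pass only the assistant turn to the parser instead of the full decoded string.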