
Overview

In this guide, we’ll use Ollama, an open-source tool that makes it easy to download and run AI models locally.

1. Install Ollama

The easiest way is to install the Ollama desktop app.
You can also install it via Homebrew:
brew install ollama
Or download the latest release directly from the GitHub releases page.
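After installing, you can confirm the CLI is on your PATH by printing its version:
ollama --version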

2. Start Ollama

Once installed, start the Ollama service:
ollama serve
This runs the background service that manages models.
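To verify the service is reachable, send a plain request to its default port (11434); a running instance replies with a short status message:
curl http://localhost:11434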

3. Download Granite Models

Ollama supports a range of IBM Granite models. Larger models give better results but require more resources. To download Granite 4:
ollama pull granite4
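When the download completes, you can confirm the model is available locally:
ollama list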

4. Run Granite

To start chatting with Granite:
ollama run granite4
To use a different variant, replace granite4 with the tag of the model you want to run.
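You can also pass a prompt directly on the command line to get a one-off response instead of an interactive session:
ollama run granite4 "Write a haiku about running models locally."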

5. Notes on Context Length

By default, Ollama runs models with a short context length to save memory.
For longer conversations, you can raise it from inside an interactive session by setting:
/set parameter num_ctx <desired_context_length>
The largest supported context length for Granite 4 models is 131072 tokens (128K).
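For example, to give the model a 32K-token window, start a session and set the parameter at the interactive prompt (larger context windows consume more memory):
ollama run granite4
>>> /set parameter num_ctx 32768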

6. Using the API

You can also interact with Granite programmatically using Ollama’s OpenAI-compatible API:
curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "granite4",
        "messages": [{"role": "user", "content": "How are you today?"}]
      }'
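Ollama also exposes a native REST API at /api/chat, which accepts runtime options such as num_ctx in the request body. A minimal sketch (the option values here are illustrative):
curl http://localhost:11434/api/chat -d '{
      "model": "granite4",
      "messages": [{"role": "user", "content": "How are you today?"}],
      "options": {"num_ctx": 8192},
      "stream": false
    }'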