
Overview

In this guide, we’ll use Ollama, an open-source tool that makes it easy to download and run AI models locally.

1. Install Ollama

Install Ollama for Linux with:
curl -fsSL https://ollama.com/install.sh | sh
This will install Ollama on your system and set up a systemd service named ollama.service to run the server in the background.
The automatic installation script requires root access. For manual installation (without root), see the manual installation instructions; a minimal sketch follows.
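As a rough illustration of the non-root path, you could extract the release tarball into a user-writable prefix and run the server in the foreground. The tarball URL matches Ollama's manual install docs for x86-64 Linux; the ~/.local/ollama prefix is just an example, so adjust paths and architecture to taste:

# Download and unpack the release into a user-writable prefix
mkdir -p ~/.local/ollama
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o /tmp/ollama-linux-amd64.tgz
tar -C ~/.local/ollama -xzf /tmp/ollama-linux-amd64.tgz

# Put the binary on PATH and start the server in the foreground
export PATH="$HOME/.local/ollama/bin:$PATH"
ollama serve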
To manage the service manually (the systemctl --user commands below assume Ollama was set up as a user-level service, as in a non-root install; for the script-installed system service, use sudo systemctl without --user):
# Start the Ollama service
systemctl --user start ollama

# Enable Ollama to start automatically on login
systemctl --user enable ollama

# Check Ollama status
systemctl --user status ollama
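Either way, you can check that the server is up by querying the version route of Ollama's native API:

curl http://localhost:11434/api/version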

2. Download Granite Models

Ollama supports a range of IBM Granite models. Larger variants generally produce better results but require more memory and compute. To download Granite 4:
ollama pull granite4
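Once the pull completes, you can confirm the model is available locally (and see its size and tag):

ollama list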

3. Run Granite

To start chatting with Granite:
ollama run granite4
To use a different variant, replace the model name with the desired tag from the Ollama library.
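For a quick one-off generation instead of an interactive chat, the prompt can also be passed directly on the command line; the prompt text here is just an example:

ollama run granite4 "Write a haiku about running models locally."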

4. Notes on Context Length

By default, Ollama runs models with a short context length to save memory.
For longer conversations, you can raise it from inside an interactive session by setting:
/set parameter num_ctx <desired_context_length>
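When calling the server directly rather than through the interactive session, the same setting can be passed per request via the options field of Ollama's native API; the 8192 below is only an illustrative value:

curl http://localhost:11434/api/generate -d '{
  "model": "granite4",
  "prompt": "Explain what a context window is.",
  "options": { "num_ctx": 8192 },
  "stream": false
}'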

5. Using the API

You can also interact with Granite programmatically using Ollama’s OpenAI-compatible API:
curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite4",
    "messages": [{"role": "user", "content": "How are you today?"}]
  }'
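The same request can also be sent to Ollama's native chat endpoint, which returns Ollama's own response schema instead of the OpenAI-compatible one:

curl http://localhost:11434/api/chat -d '{
  "model": "granite4",
  "messages": [{"role": "user", "content": "How are you today?"}],
  "stream": false
}'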