
Overview

In this guide, we’ll use vLLM running inside a container to serve Granite models.

Prerequisites

Before you begin, make sure you have:
- Docker installed on your host
- An NVIDIA GPU with a recent driver and the NVIDIA Container Toolkit (needed for the --runtime nvidia and --gpus flags used below)
- Optionally, the huggingface-cli tool if you want to pre-download models
1. Pull the vLLM container image

docker pull vllm/vllm-openai:latest
(For stability, you may pin a version tag, e.g., vllm/vllm-openai:v0.10.2. Granite models require vLLM 0.10.2 or later.)
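For example, to pull the pinned tag instead of latest:
docker pull vllm/vllm-openai:v0.10.2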

2. Run Granite in the container

Run the container with your Hugging Face cache mounted:
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model ibm-granite/granite-4.0-h-small
You can pre-download the model into ~/.cache/huggingface using:
huggingface-cli download ibm-granite/granite-4.0-h-small
If not pre-downloaded, the model will be fetched automatically when vLLM starts and cached in the mounted directory.
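Once the server is up and listening on port 8000, a quick sanity check is to list the models it is serving via the OpenAI-compatible models endpoint:
curl http://localhost:8000/v1/models
The response should include ibm-granite/granite-4.0-h-small in its data list.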

3. Run a sample request

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ibm-granite/granite-4.0-h-small",
        "messages": [
          {"role": "user", "content": "How are you today?"}
        ]
      }'
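The endpoint accepts the standard OpenAI chat completion parameters, so the same request can also carry generation settings; the values below are only illustrative:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ibm-granite/granite-4.0-h-small",
        "messages": [
          {"role": "user", "content": "Summarize what vLLM does in one sentence."}
        ],
        "temperature": 0.2,
        "max_tokens": 128
      }'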

4. Enabling tool calling and other extended capabilities

To run vLLM with the Granite 4.0 models and enable capabilities such as tool calling, add the parameters --tool-call-parser hermes and --enable-auto-tool-choice. Refer to the vLLM documentation for more details on these and other parameters. Now run the container with the added parameters:
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model ibm-granite/granite-4.0-h-small \
  --tool-call-parser hermes \
  --enable-auto-tool-choice
Once the container is up, you can start sending requests using the OpenAI API. Refer to the OpenAI API documentation on tool calling for examples. To run vLLM with the Granite 3 models and tool calling, use the additional parameters specified in the vLLM documentation as part of the docker run command shown in Section 2.
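As a sketch of what a tool-calling request looks like against this server, the following uses the standard OpenAI tools and tool_choice fields with a hypothetical get_current_weather function (the function name and schema are illustrative, not part of vLLM or Granite):
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ibm-granite/granite-4.0-h-small",
        "messages": [
          {"role": "user", "content": "What is the weather like in Boston right now?"}
        ],
        "tools": [
          {
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "description": "Get the current weather for a given city",
              "parameters": {
                "type": "object",
                "properties": {
                  "city": {"type": "string", "description": "Name of the city, e.g. Boston"}
                },
                "required": ["city"]
              }
            }
          }
        ],
        "tool_choice": "auto"
      }'
If the model decides to use the tool, the assistant message in the response contains a tool_calls entry rather than plain text.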