This recipe showcases a Granite Guardian model that detects hate, abuse, and profanity (HAP) in either a prompt or an LLM's output. This is an example of a "guard rail" used for safety in generative AI applications. The model used in this recipe has been fine-tuned on several English HAP benchmarks and is built on the slate.38m.english.distilled base model. You will need a Hugging Face token to run this recipe in Colab. Instructions for obtaining this credential can be found here.
Note: HAP detection examples may contain profanities.
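The guard-rail pattern described above can be sketched as a wrapper that screens both the prompt and the model's response before anything is returned. This is a minimal illustration only: `hap_score` is a stand-in for a call to the real Granite Guardian HAP classifier (for example via a Hugging Face text-classification pipeline), and all function names and the 0.5 threshold are assumptions for demonstration, not part of this recipe.

```python
def guarded_generate(prompt, generate, hap_score, threshold=0.5):
    """Run generate(prompt) only when both the prompt and the output
    score below the HAP threshold; otherwise return a refusal message.

    hap_score: callable mapping text -> HAP probability in [0, 1]
               (a stand-in for the Granite Guardian model call).
    generate:  callable mapping prompt -> LLM output text.
    """
    if hap_score(prompt) >= threshold:
        return "[blocked: prompt flagged as HAP]"
    output = generate(prompt)
    if hap_score(output) >= threshold:
        return "[blocked: response flagged as HAP]"
    return output


# Toy stand-ins so the sketch runs without any model download.
def toy_hap_score(text):
    # Pretend "badword" is the only HAP content.
    return 0.9 if "badword" in text else 0.1


def toy_generate(prompt):
    return f"echo: {prompt}"


print(guarded_generate("hello", toy_generate, toy_hap_score))
print(guarded_generate("badword", toy_generate, toy_hap_score))
```

In the actual recipe the toy scorer would be replaced by the fine-tuned Granite Guardian HAP model, and the same screening logic applies on both the input and output sides of the LLM call.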

Get started

Explore sample code in a GitHub repo

Try it out

Execute sample code in Colab