This recipe showcases a Granite Guardian model designed to detect hate, abuse, and profanity (HAP), either in a prompt or in LLM output. This is an example of a “guardrail” used in generative AI applications for safety. The model used in this recipe has been fine-tuned on several English HAP benchmarks and uses the slate.38m.english.distilled base model. You will need a Hugging Face token to run this recipe in Colab. Instructions for obtaining this credential can be found here.
Note that HAP detection examples may contain profanity.
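For orientation, here is a minimal sketch of how a fine-tuned HAP classifier like this one is typically invoked with the Hugging Face transformers library. The model id ibm-granite/granite-guardian-hap-38m and the assumption that label index 1 corresponds to the HAP class are illustrative and may differ from what the recipe actually loads:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed model id for the Granite Guardian HAP detector on Hugging Face.
# A gated model would additionally require your Hugging Face token
# (e.g. via the HF_TOKEN environment variable).
model_id = "ibm-granite/granite-guardian-hap-38m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Texts to screen: these could be user prompts or LLM outputs.
texts = [
    "Thanks for your help, that was really useful.",
    "Some potentially abusive text to be screened.",
]

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Assumption: label index 1 is the HAP ("unsafe") class.
hap_probs = torch.softmax(logits, dim=1)[:, 1]
for text, p in zip(texts, hap_probs):
    print(f"HAP probability {p:.3f} | {text}")
```

In a guardrail setup, the resulting probability would usually be compared against a threshold (say 0.5) to decide whether to block or flag the text before it reaches the model or the user.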