Use Granite Guardian to detect Hate, Abuse, and Profanity (HAP).
This recipe showcases a Granite Guardian model designed to detect hate, abuse, and profanity (HAP) in either a prompt or an LLM's output. This is an example of a "guard rail" used in generative AI applications for safety.

The model used in this recipe has been fine-tuned on several English HAP benchmarks and is built on the slate.38m.english.distilled base model.

You will need a Hugging Face token to run this recipe in Colab. Instructions for obtaining this credential can be found here.
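The guard-rail pattern described above can be sketched as a simple threshold check on a HAP score. The threshold value, the `stub_score` helper, and the model name shown in the comments are illustrative assumptions, not details confirmed by this recipe:

```python
# Sketch of a HAP guard rail: block any text whose HAP score exceeds a
# threshold. The 0.5 threshold is an assumed default, not from the recipe.

def passes_guardrail(text, hap_score, threshold=0.5):
    """Return True if the text's HAP score is below the threshold."""
    return hap_score(text) < threshold

# In the recipe, hap_score would come from the fine-tuned Granite Guardian
# HAP classifier, e.g. (model id is an assumption):
#   from transformers import pipeline
#   clf = pipeline("text-classification",
#                  model="ibm-granite/granite-guardian-hap-38m")
# Here, a trivial keyword stub stands in for the model to show the
# control flow only:
def stub_score(text):
    return 0.9 if "hateful" in text.lower() else 0.05

print(passes_guardrail("Hello, how are you today?", stub_score))  # True
print(passes_guardrail("some hateful remark", stub_score))        # False
```

In a real application, this check would run on the user prompt before it reaches the LLM, and again on the LLM's output before it is shown to the user.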