In this recipe, you’ll learn how to build an AI-powered multimodal retrieval-augmented generation (RAG) pipeline with Docling, Granite and LangChain. This tutorial will guide you through the following processes:
  1. Document preprocessing: Learn how to handle documents from various sources, use Docling to parse and transform them into usable formats, and store them in vector databases. You will use a Granite model to generate text descriptions of the images in the documents.
  2. RAG: Understand how to connect LLMs such as Granite with external knowledge bases to enhance query responses and generate valuable insights.
  3. LangChain for workflow integration: Discover how to use LangChain to streamline and orchestrate document processing and retrieval workflows, enabling seamless interaction between different components of the system.
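The three processes above can be sketched end to end in a few lines. This is a minimal, dependency-free illustration under stated assumptions, not the recipe's implementation: the bag-of-words `embed` function stands in for a real embedding model, the `chunks` list stands in for Docling output stored in a vector database, and `build_prompt` stands in for the LangChain step that hands retrieved context to Granite. All function names and sample texts here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Assemble the retrieved context and the question into one prompt string."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Hypothetical chunks; in the recipe these would come from parsed documents,
# including model-generated image descriptions.
chunks = [
    "Docling parses PDF documents into text, tables and images.",
    "Image description: a bar chart of quarterly revenue by region.",
    "LangChain connects retrievers and language models into one chain.",
]

query = "Which chart shows revenue by region?"
top = retrieve(query, chunks, k=1)   # retrieval step
prompt = build_prompt(query, top)    # augmentation step; an LLM would answer this prompt
print(prompt)
```

Treating image descriptions as ordinary text chunks is what makes the pipeline multimodal: once an image has a description, it can be retrieved by the same similarity search as any other passage.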
This recipe uses three cutting-edge technologies:
  • Docling: An open-source toolkit used to parse and convert documents.
  • Granite: A model family that includes a state-of-the-art MLLM, which provides robust natural language capabilities, and a vision language model, which provides image-to-text generation.
  • LangChain: A powerful framework used to build applications powered by language models, designed to simplify complex workflows and integrate external tools seamlessly.
You will need a Replicate API token to run this recipe in Colab. Instructions for obtaining this credential can be found here.
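One common way to supply the token is through an environment variable rather than hard-coding it in the notebook. The sketch below assumes the token is stored in `REPLICATE_API_TOKEN`, the variable name the Replicate Python client reads; the helper `get_replicate_token` is a hypothetical name introduced for illustration and is not part of the recipe.

```python
import os
from getpass import getpass
from typing import Mapping, Optional


def get_replicate_token(
    env: Optional[Mapping[str, str]] = None, prompt: bool = False
) -> str:
    """Return the Replicate API token from the environment, optionally prompting for it."""
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token and prompt:
        # Interactive fallback, e.g. in a notebook; input is not echoed.
        token = getpass("Replicate API token: ").strip()
    if not token:
        raise RuntimeError("REPLICATE_API_TOKEN is not set")
    return token


# Demonstration with a placeholder mapping instead of the real environment.
demo_token = get_replicate_token({"REPLICATE_API_TOKEN": " placeholder-token "})
```

In a notebook session, calling `get_replicate_token(prompt=True)` would fall back to an interactive prompt when the variable is missing.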
I