In this notebook, we fine-tune the ibm-granite/granite-3.0-2b-instruct model, a small instruction-tuned model, on a custom ‘pirate-talk’ dataset using the qLoRA (Quantized Low-Rank Adaptation) technique. This experiment serves two primary purposes:
- Educational: It showcases the process of adapting a pre-trained model to a new domain.
- Practical: It illustrates how a model’s interpretation of domain-specific terms (like ‘inheritance’) can shift based on the training data.
The notebook walks through the following steps:
- Installing necessary dependencies
- Loading and exploring the dataset
- Setting up the quantized model
- Performing a sanity check
- Configuring and executing the training process
Dataset preparation
We’re using the alespalla/chatbot_instruction_prompts dataset, which contains various chat prompts and responses. From it we create our pirate-talk dataset: the prompts stay the same, but a model rewrites every answer to be spoken like a pirate.
The dataset is split into training and testing subsets, allowing us to both
train the model and evaluate its performance on unseen data.
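To make this concrete, here is a minimal sketch of how loading and exploring the dataset might look with the datasets library; the split names are assumptions based on the description above.

```python
from datasets import load_dataset

# Load the source dataset from the Hugging Face Hub.
dataset = load_dataset("alespalla/chatbot_instruction_prompts")

# Inspect the available splits and a sample prompt/response pair.
print(dataset)
print(dataset["train"][0])
```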
Model loading and quantization
Next, we load the quantized model. Quantization is a technique that reduces model size and speeds up inference by approximating the model’s weights with lower-precision values. We use the BitsAndBytes library, which allows us to load the model in a more memory-efficient format without significantly compromising performance.
This step is crucial as it enables us to work with a large language model within
the memory constraints of our hardware, making the fine-tuning process more
accessible and efficient.
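As an illustration, a 4-bit setup with BitsAndBytes might look like the following; the exact quantization settings (NF4, bfloat16 compute) are assumptions rather than the notebook’s confirmed values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "ibm-granite/granite-3.0-2b-instruct"

# Quantize weights to 4-bit NF4 and compute in bfloat16 to save memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on available GPUs/CPU automatically
)
```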
Model sanity check
Before proceeding with fine-tuning, we perform a sanity check on the loaded model. We feed it an example prompt about ‘inheritance’ to ensure it produces intelligible and contextually appropriate responses. At this stage, the model should interpret ‘inheritance’ in a programming context, explaining how classes inherit properties and methods from one another. This output serves as a baseline, allowing us to compare how the model’s responses change after fine-tuning on the pirate-talk data. Note that the output is truncated because we set max_new_tokens=100.
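A sketch of such a check might look like this; the exact prompt wording is hypothetical.

```python
# Ask the base model about 'inheritance' before any fine-tuning.
messages = [{"role": "user", "content": "Explain the concept of inheritance."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=100)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```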
Sample output
Training setup
In this section, we set up the training environment. Key steps include (a configuration sketch follows this list):
- Defining the format for training prompts to align with the model’s expected inputs.
- Configuring the qLoRA technique, which allows us to fine-tune the model efficiently by only training a small number of additional parameters.
- Setting up the SFTTrainer (Supervised Fine-Tuning Trainer) with appropriate hyperparameters.
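The following sketch shows roughly how these pieces fit together. Every hyperparameter here (rank, learning rate, batch size, output path) is illustrative rather than the notebook’s exact configuration, the dataset column names are assumptions, and the SFTTrainer signature varies somewhat across trl versions.

```python
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

def formatting_func(batch):
    # Map prompt/response pairs to training strings; column names are assumptions.
    return [
        f"### Prompt:\n{p}\n\n### Response:\n{r}"
        for p, r in zip(batch["prompt"], batch["response"])
    ]

# LoRA adapter configuration: train small low-rank matrices instead of full weights.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="granite-pirate-talk",  # hypothetical output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=peft_config,
    formatting_func=formatting_func,
)
```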
Training process
With all the preparations complete, we now start the training process. The model will be exposed to numerous examples from our pirate-talk dataset, gradually adjusting its outputs toward pirate-style speech. We’ll monitor the training loss over time, which should decrease as the model improves its performance on the task. After training, we’ll save the fine-tuned model for future use.
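Under the assumptions above, kicking off training is a single call:

```python
# Run supervised fine-tuning; the reported loss should trend downward.
trainer.train()
```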
Saving the fine-tuned model
After the training process is complete, it’s crucial to save our fine-tuned model. This step ensures that we can reuse the model later without having to retrain it. We’ll save both the model weights and the tokenizer, as they work in tandem to process and generate text. Saving the model allows us to distribute it, use it in different environments, or continue fine-tuning it in the future, preserving the knowledge acquired during training.
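A minimal sketch, assuming a local directory name of our choosing:

```python
# Persist the fine-tuned weights (LoRA adapters) and tokenizer side by side.
save_dir = "granite-pirate-talk-final"  # hypothetical path
trainer.model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```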
Persisting the model to Hugging Face
After fine-tuning and validating our model, an optional step is to make it easily accessible for future use or sharing with the community. The Hugging Face Hub provides an excellent platform for this purpose. Uploading our model to the Hugging Face Hub offers several benefits (an upload sketch follows below):
- Easy sharing and collaboration with other researchers or developers
- Version control for your model iterations
- Integration with various libraries and tools in the Hugging Face ecosystem
- Simplified deployment options
Check with your own legal counsel before pushing models to the Hugging Face Hub.
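Assuming you are logged in (for example via huggingface-cli login), the upload can look like this; the repository name is hypothetical.

```python
# Push the fine-tuned model and tokenizer to a (hypothetical) Hub repository.
repo_id = "your-username/granite-pirate-talk"
trainer.model.push_to_hub(repo_id)
tokenizer.push_to_hub(repo_id)
```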
Loading the fine-tuned model
Once we’ve saved our model, we can demonstrate how to load it back for inference. This step is crucial for real-world applications where you want to use your trained model without going through the training process again. Loading a saved model is typically much faster than training from scratch, making it efficient for deployment scenarios. We’ll show how to load both the model and the tokenizer, ensuring that we have all the components necessary for text generation.
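A sketch of reloading from the local directory used above; assuming only LoRA adapters were saved, peft’s AutoPeftModelForCausalLM handles attaching them to the base model.

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

save_dir = "granite-pirate-talk-final"  # hypothetical path from the saving step

# Loads the base model and applies the saved LoRA adapters on top.
model = AutoPeftModelForCausalLM.from_pretrained(save_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(save_dir)
```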
Loading the model from Hugging Face
Once a model is pushed to the Hugging Face Hub, loading it for inference or further fine-tuning becomes remarkably straightforward. This ease of use is one of the key advantages of the Hugging Face ecosystem. We’ll show how to load our fine-tuned model directly from the Hugging Face Hub using just a few lines of code. This process works not only for our own uploaded models but for any public model on the Hub, demonstrating the power and flexibility of this approach. Loading from the Hub allows you to (see the sketch after this list):
- Quickly experiment with different models
- Easily integrate state-of-the-art models into your projects
- Ensure you’re using the latest version of a model
- Access models from various devices or environments without needing to manually transfer files
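For example, with the hypothetical repository name used earlier:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo_id = "your-username/granite-pirate-talk"  # hypothetical Hub repository

model = AutoPeftModelForCausalLM.from_pretrained(repo_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id)
```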
Evaluation
Finally, we’ll evaluate our fine-tuned model by presenting it with the same ‘inheritance’ prompt we used in the sanity check. This comparison will reveal how the model’s responses have shifted after fine-tuning on the pirate-talk data. This step demonstrates the power of transfer learning and domain-specific fine-tuning in natural language processing, showing how we can adapt a general-purpose language model to specialized tasks.
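Re-running the earlier generation sketch against the fine-tuned model makes the before/after comparison direct (the prompt wording is again hypothetical):

```python
# Same 'inheritance' prompt as the sanity check, now on the fine-tuned model.
messages = [{"role": "user", "content": "Explain the concept of inheritance."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Sample output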
Execution times and performance metrics
Throughout this notebook, we’ve been tracking the time taken for various stages of our process. These execution times provide valuable insights into the computational requirements of fine-tuning a large language model. We’ll summarize the time taken for the following (a simple timing sketch follows the list):
- Loading the initial model
- Performing the sanity check
- Setting up the training environment
- The actual training process
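One simple way to capture such timings, sketched here rather than taken from the notebook:

```python
import time

# Wrap any stage in a timer; shown here for training as an example.
start = time.perf_counter()
trainer.train()
print(f"Training took {time.perf_counter() - start:.1f} s")
```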