Badreddine Chaguer

Senior Data Scientist / Co-founder

Guide to Fine-Tuning the Mistral 7B LLM with Your Own Data

September 24, 2024

Fine-tuning a language model is an exciting journey into customizing a pre-trained model for specific uses. In this comprehensive guide, we’ll explore the process of fine-tuning the Mistral 7B LLM, diving into the theoretical foundations that support this adaptation.

Understanding Mistral 7B LLM

The Mistral 7B LLM is a powerful member of the family of generative pre-trained Transformer (GPT-style) models, known for its strong natural language processing abilities. What makes Mistral 7B stand out is not raw size but efficiency: with roughly 7 billion parameters it is compact by modern LLM standards, yet it delivers performance competitive with considerably larger models. That combination makes it a valuable base model for a wide range of language tasks and a practical target for fine-tuning.

At its core, the Mistral 7B LLM has some key features:

- Pre-Trained Foundation: Before fine-tuning, the model goes through a pre-training phase where it learns from a vast amount of text. This helps it grasp the nuances of language, including grammar and meaning, resulting in a strong and versatile language model.

- Self-Attention Mechanism: As a Transformer, Mistral 7B relies on self-attention to model the relationships between tokens in context. In particular, it uses grouped-query attention and sliding-window attention, which keep inference fast and memory-efficient while still producing coherent, contextually relevant text.

- Transfer Learning Paradigm: Mistral 7B exemplifies transfer learning in deep learning. It uses the knowledge gained during pre-training to excel in various tasks. Fine-tuning connects the model's general understanding of language to specific applications, enhancing its performance.

A Theoretical Exploration of Fine-Tuning the Mistral 7B LLM

Step 1: Set Up Your Environment


Before starting the fine-tuning process, it's important to set up the right environment. Here are the key steps to prepare:

1. Computational Power: Fine-tuning Mistral 7B requires significant computational resources; a modern GPU (or TPU) with ample memory is strongly recommended (a quick environment check is sketched after this list).

2. Deep Learning Frameworks: You'll need a popular deep learning framework like PyTorch or TensorFlow to implement the fine-tuning process.

3. Model Access: Make sure you have access to the Mistral 7B model weights or a pre-trained version to get started.

4. Domain-Specific Data: Having a substantial dataset that is relevant to your specific area is crucial. The quality and quantity of this data will greatly influence the success of your fine-tuning efforts.
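Before moving on, it can help to confirm that a suitable accelerator is actually visible to your framework. The snippet below is a minimal sketch, assuming PyTorch is installed; it simply reports whether a CUDA GPU is available and how much memory it has.

import torch

# Minimal environment check: report whether a CUDA GPU is visible and its memory size.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(torch.cuda.current_device())
    print(f"GPU: {props.name}, memory: {props.total_memory / 1e9:.1f} GB")
else:
    print("No CUDA GPU detected; fine-tuning a 7B model without one is impractical.")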

Step 2: Preparing Data for Fine-Tuning


Careful data preparation is crucial for successful fine-tuning:

1. Data Collection: Gather text data that is relevant to your specific application or domain. This data will be the foundation for fine-tuning the model.

2. Data Cleaning: Pre-process the data by removing any noise, correcting errors, and ensuring a consistent format. Clean data is essential for a successful fine-tuning process.

3. Data Splitting: Divide the dataset into training, validation, and test sets, typically following the standard split of 80% for training, 10% for validation, and 10% for testing (a minimal splitting sketch follows this list). This structure helps evaluate the model's performance effectively.
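To make the split concrete, here is a minimal sketch using the Hugging Face datasets library. The file name data.jsonl is a placeholder for your own cleaned, domain-specific data.

from datasets import load_dataset

# Load your cleaned data (placeholder file name) and create an 80/10/10 split.
dataset = load_dataset("json", data_files="data.jsonl", split="train")
split = dataset.train_test_split(test_size=0.2, seed=42)           # 80% train, 20% held out
holdout = split["test"].train_test_split(test_size=0.5, seed=42)   # 10% validation, 10% test

train_dataset = split["train"]
eval_dataset = holdout["train"]
test_dataset = holdout["test"]
print(len(train_dataset), len(eval_dataset), len(test_dataset))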

Step 3: Fine-Tuning the Model - The Theory


Fine-tuning is a complex process that involves several key theoretical concepts:

1. Loading a Pre-trained Model: The Mistral 7B model is imported into your chosen deep learning framework. This model has a rich understanding of language structures due to its pre-training phase.

2. Tokenization: Tokenization is essential as it transforms the text data into a format compatible with the model. This step ensures your domain-specific data can be smoothly integrated into the pre-trained architecture.

3. Defining the Fine-Tuning Task: This involves clearly specifying the task you want to tackle, whether it's text classification, text generation, or another language-related task. Defining the task helps the model understand its objectives.

4. Data Loaders: Set up data loaders for training, validation, and testing. These loaders enable efficient training by feeding data in batches, allowing the model to learn effectively from the dataset.

5. Fine-Tuning Configuration: This step involves choosing hyperparameters like learning rate, batch size, and the number of training epochs. These settings determine how the model adapts to your specific task and can be fine-tuned for better performance.

6. Fine-Tuning Loop: Central to fine-tuning is minimizing a loss function, which quantifies the difference between the model's predictions and the actual outcomes. By iteratively adjusting the model's parameters, the model gradually aligns with the target task (a bare-bones sketch of one training pass follows this list).
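To make the loop concrete, here is a bare-bones sketch of one training pass in PyTorch. The names model and train_dataloader are placeholders; the hands-on tutorial below uses the Hugging Face Trainer rather than a manual loop.

import torch

# Placeholder optimizer over the (placeholder) model's parameters.
optimizer = torch.optim.AdamW(model.parameters(), lr=2.5e-5)

model.train()
for batch in train_dataloader:  # batches containing input_ids, attention_mask, labels
    batch = {k: v.to(model.device) for k, v in batch.items()}  # move tensors to the model's device
    outputs = model(**batch)    # a causal LM head with labels returns the cross-entropy loss
    loss = outputs.loss         # gap between predictions and target tokens
    loss.backward()             # compute gradients
    optimizer.step()            # nudge the parameters toward the target task
    optimizer.zero_grad()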

Step 4: Evaluation and Validation - Theoretical Insights


After fine-tuning, it's essential to rigorously evaluate the model's performance:

1. Test Set: Use the test set created in Step 2 to measure the model's performance on unseen data. Metrics like accuracy, precision, recall, and F1-score give insight into its effectiveness and ability to generalize (a short metrics sketch follows this list).

2. Iterative Improvement: Based on the evaluation results, you may need to revisit the fine-tuning process. Adjust hyperparameters and data as needed, using the theoretical knowledge gained from assessing the model’s performance to guide your modifications.
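For classification-style tasks, these metrics can be computed directly from the predicted and reference labels. Below is a minimal sketch with scikit-learn; y_true and y_pred are placeholders for labels you have already collected on the test set.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true / y_pred are placeholder lists of reference and predicted labels.
accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")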

Step 5: Deployment - A Theoretical Perspective


Once the fine-tuned model meets your performance criteria, it’s ready for deployment. The infrastructure you choose for serving model predictions should be efficient, scalable, and responsive to effectively support your application or service. This ensures that the model can handle varying loads and deliver results promptly, providing a seamless experience for users.
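As one illustration of what serving can look like, here is a minimal sketch of a prediction endpoint using FastAPI. The framework choice, endpoint name, and generation settings are assumptions for illustration, not part of this guide's method; model and tokenizer are assumed to be loaded once at application startup.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerationRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate(req: GenerationRequest):
    # model and tokenizer are assumed to be loaded at startup (see the tutorial below).
    inputs = tokenizer(req.prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return {"completion": tokenizer.decode(output_ids[0], skip_special_tokens=True)}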

Tutorial: Fine-Tuning Mistral 7B using QLoRA


In this tutorial, we'll guide you through fine-tuning the Mistral 7B model using QLoRA (Quantized Low-Rank Adaptation). This technique loads the base model in 4-bit precision and trains small LoRA adapters on top of the frozen weights, which keeps memory requirements low without sacrificing much quality. We'll also use the PEFT library from Hugging Face to streamline the fine-tuning process.

Note: Before we get started, make sure you have access to a GPU environment with at least 24GB of memory and all necessary dependencies installed.

If you need additional GPU resources for the upcoming tutorials, consider checking out E2E CLOUD. They offer a variety of GPUs that are perfect for more advanced LLM-based applications.

0. Install necessary dependencies

# You only need to run this once per machine
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U datasets scipy ipywidgets


1. Accelerator

First, we set up the accelerator using the FullyShardedDataParallelPlugin and Accelerator. While this step might not be necessary for QLoRA, it's included for your reference. If you prefer to skip this setup, you can simply comment it out and continue without an accelerator.

from accelerate import FullyShardedDataParallelPlugin, Accelerator
from torch.distributed.fsdp.fully_sharded_data_parallel import FullOptimStateDictConfig, FullStateDictConfig
fsdp_plugin = FullyShardedDataParallelPlugin(
    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=False),
    optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=False),
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)


2. Load Dataset

We load the GEM ViGGO meaning-representation dataset for fine-tuning. It pairs video-game-domain sentences with structured meaning representations, which teaches the model to produce output in a very specific form. If you have your own dataset, feel free to substitute it.

from datasets import load_dataset


train_dataset = load_dataset('gem/viggo', split='train')
eval_dataset = load_dataset('gem/viggo', split='validation')
test_dataset = load_dataset('gem/viggo', split='test')

print(train_dataset)
print(eval_dataset)
print(test_dataset)


3. Load Base Model

Now, we load the Mistral 7B base model with 4-bit quantization. This approach helps reduce the model's memory usage while maintaining performance.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig


base_model_id = "mistralai/Mistral-7B-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config)

4. Tokenization


Next, we set up the tokenizer and create the tokenization functions. Because this is self-supervised (causal language modeling) fine-tuning, the labels are simply a copy of the input_ids, so the model learns to predict each next token of the combined prompt and target text.

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    model_max_length=512,
    padding_side="left",
    add_eos_token=True)
tokenizer.pad_token = tokenizer.eos_token

def tokenize(prompt):
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=512,
        padding="max_length",
    )
    result["labels"] = result["input_ids"].copy()
    return result


def generate_and_tokenize_prompt(data_point):
    full_prompt =f"""Given a target sentence construct the underlying meaning representation of the input sentence as a single function with attributes and attribute values.
This function should describe the target string accurately and the function must be one of the following ['inform', 'request', 'give_opinion', 'confirm', 'verify_attribute', 'suggest', 'request_explanation', 'recommend', 'request_attribute'].
The attributes must be one of the following: ['name', 'exp_release_date', 'release_year', 'developer', 'esrb', 'rating', 'genres', 'player_perspective', 'has_multiplayer', 'platforms', 'available_on_steam', 'has_linux_release', 'has_mac_release', 'specifier']


### Target sentence:
{data_point["target"]}


### Meaning representation:
{data_point["meaning_representation"]}
"""
    return tokenize(full_prompt)


tokenized_train_dataset = train_dataset.map(generate_and_tokenize_prompt)
tokenized_val_dataset = eval_dataset.map(generate_and_tokenize_prompt)

print(tokenized_train_dataset[4]['input_ids'])

print(len(tokenized_train_dataset[4]['input_ids']))


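# Before fine-tuning, inspect a test example and see how the base model handles the task: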
print("Target Sentence: " + test_dataset[1]['target'])
print("Meaning Representation: " + test_dataset[1]['meaning_representation'] + "\n")

eval_prompt = """Given a target sentence construct the underlying meaning representation of the input sentence as a single function with attributes and attribute values.
This function should describe the target string accurately and the function must be one of the following ['inform', 'request', 'give_opinion', 'confirm', 'verify_attribute', 'suggest', 'request_explanation', 'recommend', 'request_attribute'].
The attributes must be one of the following: ['name', 'exp_release_date', 'release_year', 'developer', 'esrb', 'rating', 'genres', 'player_perspective', 'has_multiplayer', 'platforms', 'available_on_steam', 'has_linux_release', 'has_mac_release', 'specifier']


### Target sentence:
Earlier, you stated that you didn't have strong feelings about PlayStation's Little Big Adventure. Is your opinion true for all games which don't have multiplayer?


### Meaning representation:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=256, pad_token_id=2)[0], skip_special_tokens=True))

5. Set Up LoRA


Now, we prepare the model for parameter-efficient fine-tuning by attaching LoRA adapters to its linear layers. Only the small adapter matrices are trained while the 4-bit base weights stay frozen, which keeps training memory-efficient.

from peft import prepare_model_for_kbit_training
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

from peft import LoraConfig, get_peft_model
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
print_trainable_parameters(model)
# Apply the accelerator. You can comment this out to remove the accelerator.
model = accelerator.prepare_model(model)


print(model)

6. Run Training


In this step, we launch the fine-tuning run. Feel free to adjust the training parameters to suit your specific needs.

if torch.cuda.device_count() > 1: # If more than 1 GPU
    model.is_parallelizable = True
    model.model_parallel = True


import transformers
from datetime import datetime


project = "viggo-finetune"
base_model_name = "mistral"
run_name = base_model_name + "-" + project
output_dir = "./" + run_name


tokenizer.pad_token = tokenizer.eos_token


trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_val_dataset,
    args=transformers.TrainingArguments(
        output_dir=output_dir,
        warmup_steps=5,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=1000,
        learning_rate=2.5e-5, # Want about 10x smaller than the Mistral learning rate
        logging_steps=50,
        bf16=True,
        optim="paged_adamw_8bit",
        logging_dir="./logs",        # Directory for storing logs
        save_strategy="steps",       # Save the model checkpoint every logging step
        save_steps=50,                # Save checkpoints every 50 steps
        evaluation_strategy="steps", # Evaluate the model every logging step
        eval_steps=50,               # Evaluate and save checkpoints every 50 steps
        do_eval=True,                # Perform evaluation at the end of training
        report_to="wandb",           # Comment this out if you don't want to use weights & baises
        run_name=f"{run_name}-{datetime.now().strftime('%Y-%m-%d-%H-%M')}"          # Name of the W&B run (optional)
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()


7. Try the Trained Model


After training, you can use the fine-tuned model for inference. First, load the base Mistral model from the Hugging Face Hub (with the same quantization config as before), then load the QLoRA adapters from the best-performing checkpoint directory.

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,  # Mistral, same as before
    quantization_config=bnb_config,  # Same quantization config as before
    device_map="auto",
    trust_remote_code=True,
    use_auth_token=True
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

from peft import PeftModel
ft_model = PeftModel.from_pretrained(base_model, "mistral-viggo-finetune/checkpoint-1000")

ft_model.eval()
with torch.no_grad():
    print(tokenizer.decode(ft_model.generate(**model_input, max_new_tokens=100, pad_token_id=2)[0], skip_special_tokens=True))
    


Conclusion

Fine-tuning the Mistral 7B LLM is an engaging blend of theory and practical application. By grasping the theoretical framework behind this process, you can better appreciate the extensive customization options available with such a powerful language model. Keep in mind that achieving optimal performance often requires experimentation and refinement. This guide provides you with the knowledge needed to tailor Mistral 7B to meet your specific linguistic needs.
