
🎛️ Fine-tuning Guide


Large Language Models (LLMs) offer powerful general capabilities, but often require fine-tuning to excel at specific tasks or understand domain-specific language. Fine-tuning adapts a trained model to a smaller, targeted dataset, enhancing its performance for your unique needs.

This notebook provides a step-by-step guide to fine-tuning models using the Together AI platform. We will walk through the entire process, from preparing your data to evaluating your fine-tuned model.

We will cover:

  1. Dataset Preparation: Loading a standard dataset, transforming it into the required format for supervised fine-tuning on Together AI, and uploading your formatted dataset to Together AI Files.
  2. Fine-tuning Job Launch: Configuring and initiating a fine-tuning job using the Together AI API.
  3. Job Monitoring: Checking the status and progress of your fine-tuning job.
  4. Inference: Using your newly finetuned model via the Together AI API for predictions.
  5. Evaluation: Comparing the performance of the finetuned model against the base model on a test set.

By following this guide, you'll gain practical experience in creating specialized LLMs tailored to your specific requirements using Together AI.

Setup and Installation


First, install the necessary Python libraries. We need:

  • together: The official Together AI Python client for interacting with the API (fine-tuning, inference, files, etc.).
  • datasets: A library from Hugging Face for easily downloading and manipulating datasets.
  • transformers: Although we won't be training locally, this can be useful for running evals and other utilities if needed.
  • tqdm: To enable interactive elements like progress bars within the notebook.

1. Dataset Preparation


Fine-tuning requires data formatted in a specific way. We'll use a conversational dataset as an example; here the goal of fine-tuning is to improve the model on multi-turn conversations.

First we need to transform this dataset into the chat format expected by Together AI for supervised fine-tuning.

The required format is a JSON object per line, where each object contains a list of conversation turns under the "messages" key.

Each message must have a "role" (system, user, or assistant) and "content".

Conversation Data Example:

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, 
              {"role": "user", "content": "Hello!"}, 
              {"role": "assistant", "content": "Hi! How can I help you?"}]}
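
As a quick sanity check on this format, here is a minimal sketch of a per-line validator. It only encodes the rules stated above (a "messages" list whose entries have a "role" of system, user, or assistant and a string "content"); the helper name validate_chat_line is hypothetical, and this is not the platform's own checker.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_line(line: str) -> list:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]

    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ['"messages" must be a non-empty list']

    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has an unexpected role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i} is missing string content")
    return problems


example = json.dumps({"messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help you?"},
]})
print(validate_chat_line(example))  # []
```

Running a check like this over every line before uploading catches malformed records early, although the official file check remains the authority on what the platform accepts.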

🔗 Depending on what type of fine-tuning you want to perform you can also pass in instruction data, preference data or even simple text data.

Load Raw Dataset

We use the datasets library to download the CoQA dataset from the Hugging Face Hub.

Let's examine the structure of the raw dataset. CoQA provides a story, a series of questions related to the story, and corresponding answers.


Transform Data to Chat Format

Now, we need to convert each row of the CoQA dataset into the required chat format ([{'role': ..., 'content': ...}, ...]).

We'll create a function map_coqa_to_chat_format that takes a row from the dataset and structures it as a conversation:

  1. A system message containing the story (context).
  2. Alternating user (question) and assistant (answer) messages.
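
The original code cell is not included in this export, so the following is a minimal sketch of what map_coqa_to_chat_format could look like. It assumes each row has a story string, a questions list, and an answers dictionary whose "input_text" list lines up with the questions (this matches the Hugging Face copy of CoQA as far as we know, but treat the field names as an assumption). The toy row is made up for illustration.

```python
def map_coqa_to_chat_format(row: dict) -> dict:
    """Turn one CoQA-style row into a {"messages": [...]} chat record."""
    # 1. The story becomes the system message (the shared context).
    messages = [{"role": "system", "content": row["story"]}]

    # 2. Each question/answer pair becomes a user turn followed by an assistant turn.
    #    Assumption: answers["input_text"] is parallel to the questions list.
    for question, answer in zip(row["questions"], row["answers"]["input_text"]):
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})

    return {"messages": messages}


# A made-up row, used only to show the shape of the output.
toy_row = {
    "source": "demo",
    "story": "Anna had a dog named Rex. Rex liked to chase balls.",
    "questions": ["Who had a dog?", "What was its name?"],
    "answers": {"input_text": ["Anna", "Rex"]},
}

chat = map_coqa_to_chat_format(toy_row)
print([m["role"] for m in chat["messages"]])
# ['system', 'user', 'assistant', 'user', 'assistant']
```

With the datasets library, a function like this is typically applied with dataset.map(map_coqa_to_chat_format, remove_columns=dataset.column_names), which also drops the original columns.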

We apply this transformation function to the entire dataset using the .map() method. We also remove the original columns as they are no longer needed after transformation.


Let's check the structure of our transformed dataset. It should now only contain the messages column.

Here's an example of a single processed data point:

Dataset({
    features: ['messages'],
    num_rows: 7199
})

Write the dataset out to a JSONL file:

23777505
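
The progress message in the original output suggests the file was written with the datasets library's to_json method. For readers who want to see what the file contains, here is a library-free sketch of the same idea using only the standard library. The helper name write_jsonl and the demo records are hypothetical.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line and return the file size in bytes."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return os.path.getsize(path)


# Two made-up chat records, only to exercise the writer.
records = [
    {"messages": [{"role": "user", "content": "Hello!"},
                  {"role": "assistant", "content": "Hi! How can I help you?"}]},
    {"messages": [{"role": "user", "content": "Thanks"},
                  {"role": "assistant", "content": "You're welcome."}]},
]

path = os.path.join(tempfile.mkdtemp(), "demo_train.jsonl")
n_bytes = write_jsonl(records, path)

with open(path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print(len(lines), n_bytes > 0)  # 2 True
```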

Upload Data to Together AI

Now that we have our formatted coqa_prepared_train.jsonl file, we need to check that it meets the format specification and then upload it to Together AI. Fine-tuning jobs read data directly from your uploaded files.

We use the check_file function to validate the file and the files.upload() method to upload it. The upload returns information about the file, including its ID, which we'll need later to start the fine-tuning job.
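
The check-then-upload flow can be sketched as follows. To keep the example runnable without an API key, the checker and uploader are passed in as functions and replaced by offline stand-ins here; the helper name check_and_upload and the stand-ins are hypothetical. The commented lines at the end show how we would expect the real SDK calls to be wired in, which you should confirm against the SDK version you have installed.

```python
from types import SimpleNamespace


def check_and_upload(path, check_fn, upload_fn):
    """Validate a JSONL file, then upload it and return the uploaded file's ID."""
    report = check_fn(path)
    if not report.get("is_check_passed"):
        raise ValueError(f"{path} failed the format check: {report.get('message')}")
    return upload_fn(path).id


# Offline stand-ins so the control flow can be exercised without network access.
def fake_check(path):
    return {"is_check_passed": True, "message": "Checks passed"}


def fake_upload(path):
    return SimpleNamespace(id="file-demo", filename=path)


file_id = check_and_upload("coqa_prepared_train.jsonl", fake_check, fake_upload)
print(file_id)  # file-demo

# With the real SDK, the wiring would look roughly like this:
#   from together import Together
#   from together.utils import check_file
#   client = Together()  # reads TOGETHER_API_KEY from the environment
#   file_id = check_and_upload(
#       "coqa_prepared_train.jsonl",
#       check_file,
#       lambda p: client.files.upload(file=p),
#   )
```

Failing fast on a bad check report avoids uploading a file that the fine-tuning job would later reject.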

{
  "is_check_passed": true,
  "message": "Checks passed",
  "found": true,
  "file_size": 23777505,
  "utf8": true,
  "line_type": true,
  "text_field": true,
  "key_value": true,
  "has_min_samples": true,
  "num_samples": 7199,
  "load_json": true,
  "filetype": "jsonl"
}
Train file response: file-9554964d-5711-419a-bcc2-c4edaaa07ee3

2. Launch Fine-tuning Job


With our data uploaded, we can now launch the fine-tuning job using client.fine_tuning.create().

Key parameters:

  • model: The base model you want to finetune (e.g., 'togethercomputer/llama-2-7b-chat'). Choose from the models available for fine-tuning on Together AI.
  • training_file: The ID of your uploaded training JSONL file.
  • validation_file: The ID of your uploaded validation JSONL file (optional, but highly recommended for monitoring).
  • suffix: A custom string added to the base model name to create your unique finetuned model name (e.g., my-coqa-ft). Keep it short and descriptive.
  • n_epochs: The number of times the model will see the entire training dataset.
  • n_checkpoints: Number of checkpoints to save during training (useful for resuming or selecting the best model). Set to 1 if you only need the final model.
  • learning_rate: Controls how much the model weights are updated during training. Needs tuning.
  • batch_size: Number of training examples processed in one iteration. Depends on model size and available resources.
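
As an illustration of how these parameters fit together, here is a minimal sketch of a launch helper. The original cell is not included in this export, so the default values below (3 epochs, 1 checkpoint, a 1e-5 learning rate), the base model string, and the suffix are assumptions for illustration only. A stand-in client is used so the example runs offline; with the real SDK you would pass the Together client instead.

```python
from types import SimpleNamespace


def launch_finetune(client, training_file, model, suffix, validation_file=None,
                    n_epochs=3, n_checkpoints=1, learning_rate=1e-5, batch_size=None):
    """Start a fine-tuning job and return the API response."""
    params = {
        "model": model,
        "training_file": training_file,
        "validation_file": validation_file,
        "suffix": suffix,
        "n_epochs": n_epochs,
        "n_checkpoints": n_checkpoints,
        "learning_rate": learning_rate,
        "batch_size": batch_size,
    }
    # Leave out optional settings that were not given so service defaults apply.
    params = {key: value for key, value in params.items() if value is not None}
    return client.fine_tuning.create(**params)


class _FakeFineTuning:
    """Offline stand-in that records the request instead of calling the API."""

    def create(self, **kwargs):
        self.last_request = kwargs
        return SimpleNamespace(id="ft-demo", status="pending")


fake_client = SimpleNamespace(fine_tuning=_FakeFineTuning())
ft_resp = launch_finetune(
    fake_client,
    training_file="file-9554964d-5711-419a-bcc2-c4edaaa07ee3",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",  # assumed model ID
    suffix="test1_8b",
)
print(ft_resp.id)  # ft-demo
```

Keep the returned ft_resp around: its id is what the monitoring calls in the next section take as input.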

🔗 For an exhaustive list of all the available fine-tuning parameters refer to the Together AI Fine-tuning API Reference docs.

🔗 For a list of all the models you can fine-tune on the Together AI platform see docs here.


3. Monitor Fine-tuning Job


Fine-tuning can take time depending on the model size, dataset size, and hyperparameters. You can monitor and manage the job using the following methods:

  • List all jobs: client.fine_tuning.list()
  • Status of a job: client.fine_tuning.retrieve(id=ft_resp.id)
  • List all events for a job: client.fine_tuning.list_events(id=ft_resp.id) (retrieves the logs and events generated during the job)
  • Cancel a job: client.fine_tuning.cancel(id=ft_resp.id)
  • Download the model once the job is done: client.fine_tuning.download(id=ft_resp.id)

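Once the job is complete (status == 'completed'), the response from retrieve will contain the name of your newly created fine-tuned model in its output_name field. The name printed later in this notebook has the form <your-account>/<base-model-name>-<suffix>-<short-id>, for example zainhas/Meta-Llama-3.1-8B-Instruct-Reference-test1_8b-e5a0fb5d.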
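
Rather than calling retrieve by hand, you can poll until the job reaches a final state. The sketch below assumes that the status is either a plain string or an enum whose value is a string such as "completed", and that "completed", "error", and "cancelled" are the final states; both points are assumptions to verify against the API reference. A stand-in client replays a fixed sequence of statuses so the loop can be run offline.

```python
import time
from types import SimpleNamespace

# Assumption: these are the states after which the job no longer changes.
TERMINAL_STATUSES = {"completed", "error", "cancelled"}


def status_text(status):
    """Normalize an enum member or a plain string to a lowercase string."""
    return str(getattr(status, "value", status)).lower()


def wait_for_job(client, job_id, poll_seconds=30, sleep=time.sleep):
    """Poll the job until it reaches a terminal status, then return it."""
    while True:
        job = client.fine_tuning.retrieve(id=job_id)
        if status_text(job.status) in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)


class _FakeFineTuning:
    """Offline stand-in that replays a fixed sequence of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def retrieve(self, id):
        return SimpleNamespace(id=id, status=next(self._statuses),
                               output_name="account/base-model-suffix-id")


naps = []  # records each requested sleep instead of actually waiting
fake_client = SimpleNamespace(fine_tuning=_FakeFineTuning(["pending", "running", "completed"]))
job = wait_for_job(fake_client, "ft-demo", poll_seconds=0, sleep=naps.append)
print(status_text(job.status), len(naps))  # completed 2
```

In a real run, check the final status before using job.output_name, since a job that ended in an error state will not have produced a usable model.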

FinetuneJobStatus.STATUS_COMPLETED
Fine tune request created
Job started at Wed Apr  9 19:48:05 UTC 2025
Model data downloaded for togethercomputer/Meta-Llama-3.1-8B-Instruct-Reference__TOG__FT at Wed Apr  9 19:48:07 UTC 2025
Data downloaded for togethercomputer/Meta-Llama-3.1-8B-Instruct-Reference__TOG__FT at $2025-04-09T19:48:14.918488
WandB run initialized.
Training started for model togethercomputer/Meta-Llama-3.1-8B-Instruct-Reference__TOG__FT
Epoch completed, at step 24
Epoch completed, at step 48
Epoch completed, at step 72
Training completed for togethercomputer/Meta-Llama-3.1-8B-Instruct-Reference__TOG__FT at Wed Apr  9 20:02:24 UTC 2025
Uploading adapter model
Compressing output model
Model compression complete
Uploading output model
Model upload complete
Job finished at Wed Apr  9 20:06:33 UTC 2025

🔗 You can also navigate to the WandB page linked in your fine-tuning dashboard to see the fine-tuning related loss curves and more.

4. Inference with Fine-tuned Model


Option 1: Serverless LoRA Inference

Now, let's use our finetuned model! We can call it just like any other model on the Together AI platform, by providing the unique fine-tuned model output_name we retrieved from our fine-tuning job earlier.
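
A minimal sketch of such a call is shown here, assuming the chat completions interface (client.chat.completions.create with a messages list) and a response whose text lives at choices[0].message.content. The helper name ask and the question are illustrative, and a stand-in client is used so the example runs without an API key.

```python
from types import SimpleNamespace


def ask(client, model, question, system_prompt=None, max_tokens=128):
    """Send one user question to a model and return the reply text."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model=model, messages=messages, max_tokens=max_tokens
    )
    return response.choices[0].message.content


class _FakeCompletions:
    """Offline stand-in that mimics the shape of a chat completion response."""

    def create(self, model, messages, max_tokens):
        reply = SimpleNamespace(content=f"(reply from {model})")
        return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=_FakeCompletions()))
answer = ask(
    fake_client,
    "zainhas/Meta-Llama-3.1-8B-Instruct-Reference-test1_8b-e5a0fb5d",
    "What is the capital of France?",
)
print(answer)
```

With the real client, the only thing that changes between the base model and the fine-tuned one is the model string.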

🔗 See the list of all models that support LoRA Inference.

Fine-tuned model output_name: zainhas/Meta-Llama-3.1-8B-Instruct-Reference-test1_8b-e5a0fb5d
The capital of France is Paris.

You can also prompt the model in our playground, if it supports serverless LoRA inference, by going to your models dashboard and clicking "OPEN IN PLAYGROUND".


Option 2: Deploy Dedicated Endpoint

Another way to run your fine-tuned model is to deploy it on a custom dedicated endpoint.

Once your fine-tuning job completes, you should see your new model in your models dashboard. You can click the "+ CREATE DEDICATED ENDPOINT" button to deploy the selected model to a dedicated endpoint (DE).

You can then select the hardware configuration for your dedicated endpoint, including the minimum and maximum number of replicas; adding replicas increases the maximum QPS the deployment can support.

You can also deploy the model to a DE programmatically using the Endpoints API via the SDK:

# Assumes `client` is the Together client used in the earlier steps.
response = client.endpoints.create(
    display_name="Fine-tuned Meta Llama 3.1 8B Instruct 04-09-25",
    model="zainhas/Meta-Llama-3.1-8B-Instruct-Reference-test1_8b-e5a0fb5d",
    hardware="4x_nvidia_h100_80gb_sxm",
    autoscaling={
        "min_replicas": 1,  # dictionary keys must be quoted strings in Python
        "max_replicas": 1,
    },
)

print(response)

⚠️ If you run this code, it will deploy a dedicated endpoint for you. For detailed documentation on how to deploy, delete, and modify endpoints, see the Endpoints API Reference.

Once deployed, you'll be able to see the model details under your Endpoints Dashboard.

The capital of France is Paris.

5. Evaluation


To assess the impact of fine-tuning, we can compare the responses of our fine-tuned model with those of the original base model on the same prompts in our test set.

This provides a way to measure improvements, after fine-tuning, to the model's behavior for our specific task (conversational QA based on a story).

Dataset({
    features: ['source', 'story', 'questions', 'answers'],
    num_rows: 50
})

We'll use the code below to generate answers from the baseline and fine-tuned model.
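
The original generation cell is not included in this export. One way to structure it, sketched under stated assumptions, is to walk through each story's questions in order and ask the model one question at a time. Here the model call is abstracted as a complete callable that takes a messages list and returns text, and the reference answer is fed back into the history after each turn; that choice (rather than feeding back the model's own answers) is an assumption, not necessarily what the original notebook did.

```python
def generate_answers(row, complete):
    """Return one predicted answer per question in a CoQA-style row.

    `complete` is any callable that takes a list of chat messages and
    returns the model's reply as a string.
    """
    history = [{"role": "system", "content": row["story"]}]
    predictions = []
    for question, gold in zip(row["questions"], row["answers"]["input_text"]):
        history.append({"role": "user", "content": question})
        predictions.append(complete(list(history)))
        # Assumption: continue the conversation with the reference answer so
        # that one early mistake does not derail the later turns.
        history.append({"role": "assistant", "content": gold})
    return predictions


# Offline stand-in for a model call; it records how much context it was given.
seen_lengths = []


def fake_complete(messages):
    seen_lengths.append(len(messages))
    return "answer to: " + messages[-1]["content"]


toy_row = {
    "story": "Anna had a dog named Rex.",
    "questions": ["Who had a dog?", "What was its name?"],
    "answers": {"input_text": ["Anna", "Rex"]},
}
predictions = generate_answers(toy_row, fake_complete)
print(predictions)
# ['answer to: Who had a dog?', 'answer to: What was its name?']
```

To compare two models, call generate_answers twice with a complete function bound to the base model name and to the fine-tuned model name respectively.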


We'll use the function below to calculate the exact match (EM) and F1 score metrics.
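
The original metric cell is not included in this export. A common way to compute these two metrics for question answering is the SQuAD-style recipe: normalize both strings (lowercase, strip punctuation and articles), then compare them exactly for EM and by token overlap for F1. The sketch below follows that recipe; whether the original notebook used the same normalization is an assumption.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, references):
    """Average EM and F1 over paired predictions and references."""
    pairs = list(zip(predictions, references))
    em = sum(exact_match(p, r) for p, r in pairs) / len(pairs)
    f1 = sum(f1_score(p, r) for p, r in pairs) / len(pairs)
    return {"EM": em, "F1": f1}


print(exact_match("Paris.", "paris"))  # 1.0
print(round(f1_score("The capital of France is Paris.", "Paris"), 3))  # 0.333
```

The second example shows why a verbose but correct answer scores low: it contains the reference token, so recall is 1.0, but precision is only 1 in 5, which pulls F1 down to one third and EM to zero.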

Model: Baseline, 

EM: 0.0175, F1: 0.18467257739023207
zainhas/Meta-Llama-3.1-8B-Instruct-Reference-test1_8b-e5a0fb5d
Model: zainhas/Meta-Llama-3.1-8B-Instruct-Reference-test1_8b-e5a0fb5d, 

EM: 0.31, F1: 0.41019649357988347
Llama 3.1 8B    EM      F1
Original        0.01    0.18
Fine-tuned      0.31    0.41

We can see that the fine-tuned model more than doubles the F1 score on the test set (0.18 to 0.41), and exact match rises from about 0.01 to 0.31.

For a more detailed guide on Fine-tuning follow our docs here.