This notebook demonstrates end-to-end Supervised Fine-Tuning (SFT) of gpt-4.1-mini on Microsoft Foundry, applied to our trail gear recommendation agent.
Execute each cell in sequence. Use ⚙ Apply in Section 4 to lock in hyperparameters before submitting.
Adventure Works customers use the trail guide agent to get gear recommendations for upcoming hikes. The agent knows its products well — but its response format is wildly inconsistent:
Pack: Osprey Stratos 24 | 24L | 1.25 kg | $160 | Denver (5)

The downstream app parses the pipe-separated format to populate product cards, so format inconsistency directly causes broken UI and failed integrations.
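To make the stakes concrete, here is a minimal sketch of the kind of parser a downstream app might use on these lines. The field names (`volume`, `availability`, etc.) are assumptions inferred from the example above, not the actual app's schema:

```python
def parse_gear_line(line: str) -> dict:
    """Parse one pipe-separated product line, e.g.
    'Pack: Osprey Stratos 24 | 24L | 1.25 kg | $160 | Denver (5)'."""
    category, rest = line.split(":", 1)
    name, volume, weight, price, availability = [f.strip() for f in rest.split("|")]
    return {
        "category": category.strip(),
        "name": name,
        "volume": volume,
        "weight": weight,
        "price": price,
        "availability": availability,
    }

card = parse_gear_line("Pack: Osprey Stratos 24 | 24L | 1.25 kg | $160 | Denver (5)")
print(card["name"])   # Osprey Stratos 24
print(card["price"])  # $160
```

A parser this rigid breaks on any deviation — an extra field, a missing pipe, a different separator — which is exactly why format consistency matters here.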
✅ SFT is the right tool when you know exactly what the output should look like and can write it down.
Each record in training.jsonl is a chat conversation that demonstrates the correct format:
{
  "messages": [
    {"role": "system", "content": "You are the Adventure Works gear advisor..."},
    {"role": "user", "content": "Gear list for a 3-day rocky trail with heavy load?"},
    {"role": "assistant", "content": "Pack: Osprey Atmos AG 65 | 65L | 2.0 kg | $340 | Denver (4)\nShoes: Salomon X Ultra 4 GTX | Wide | $189.95 | Boulder (4)\nPoles: Black Diamond Trail Ergo | Adj | $79.95 | Denver (3)\nComplements: blister kit, trekking gaiters"}
  ]
}
Trail Guide SFT Dataset — 450 examples
------------------------------------------------------------
User      : Gear list for a 3-day rocky trail with heavy load?
Assistant : Pack: Osprey Atmos AG 65 | 65L | 2.0 kg | $340 | Denver (4)…
------------------------------------------------------------
User      : Compare Merrell Moab 3 vs Salomon X Ultra 4 for rocky trails.
Assistant : Pack: N/A | — | — | — | — …
------------------------------------------------------------
User      : What gear do I need for a winter day hike above 3000m?
Assistant : Pack: Osprey Talon 22 | 22L | 0.86 kg | $140 | All (★)…
------------------------------------------------------------
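Before uploading, it is worth sanity-checking that every record follows the shape shown above. This is a minimal validation sketch — the checks are an assumption about what matters for this dataset, not the service's own validator:

```python
import json

def validate_record(line: str) -> bool:
    """Check one JSONL line: a non-empty 'messages' list where every turn
    has a role and content, and the last turn is the assistant's answer."""
    record = json.loads(line)
    messages = record.get("messages", [])
    if not messages or messages[-1].get("role") != "assistant":
        return False
    return all({"role", "content"} <= set(m) for m in messages)

sample = json.dumps({"messages": [
    {"role": "system", "content": "You are the Adventure Works gear advisor..."},
    {"role": "user", "content": "Gear list for a 3-day rocky trail with heavy load?"},
    {"role": "assistant", "content": "Pack: Osprey Atmos AG 65 | 65L | 2.0 kg | $340 | Denver (4)"},
]})
print(validate_record(sample))  # True
```

Running a check like this over all 450 lines of training.jsonl catches malformed records before they fail server-side file validation.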
Install all required packages. We need:

- azure-ai-projects — Microsoft Foundry project client
- azure-ai-evaluation — automated quality metrics
- azure-mgmt-cognitiveservices — deploy the fine-tuned model
- openai — fine-tuning jobs API and inference
- python-dotenv — load environment config from .env

Collecting azure-ai-projects>=2.0.0b1 ... ✅
Collecting azure-ai-evaluation>=1.13.0 ... ✅
Collecting azure-mgmt-cognitiveservices ... ✅
Collecting openai ... ✅
Collecting python-dotenv ... ✅
Successfully installed all packages.
Note: you may need to restart the kernel to use updated packages.
✅ All libraries imported successfully
Endpoint : https://myproject.api.azureml.ms/…
Model    : gpt-4.1-mini
✅ Connected to Microsoft Foundry Project
Upload the training and validation JSONL files to Microsoft Foundry. Each file is assigned a unique ID that will be referenced when creating the fine-tuning job.
Copy .env.template to .env and fill in:

MICROSOFT_FOUNDRY_PROJECT_ENDPOINT, MODEL_NAME, AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, AZURE_AOAI_ACCOUNT

Uploading training file…
Uploading validation file…
Training file ID   : file-a1b2c3d4e5f6478a9b0c1d2e3f4a5b6c
Validation file ID : file-b9c8d7e6f5a4b3c2d1e0f9a8b7c6d5e4
If your training data is already stored in Azure Blob Storage, you can import it directly without re-uploading, using the /openai/files/import preview endpoint.
Prerequisites: Generate SAS tokens with read permissions, then set:
TRAINING_FILE_BLOB_URL=https://<storage>.blob.core.windows.net/<container>/training.jsonl?<sas>
VALIDATION_FILE_BLOB_URL=https://<storage>.blob.core.windows.net/<container>/validation.jsonl?<sas>
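A rough sketch of calling the import endpoint with `requests`. Because this is a preview endpoint, the request body and field names below are assumptions — verify them against the current preview documentation before relying on this:

```python
import os
import requests

def import_from_blob(endpoint: str, api_key: str, blob_url: str, filename: str,
                     api_version: str = "2024-10-21") -> str:
    """Sketch: ask the service to pull a JSONL file straight from Blob Storage.
    The JSON body here is an assumption about the preview API's shape."""
    resp = requests.post(
        f"{endpoint}/openai/files/import",
        params={"api-version": api_version},
        headers={"api-key": api_key},
        json={"purpose": "fine-tune", "filename": filename, "content_url": blob_url},
    )
    resp.raise_for_status()
    return resp.json()["id"]

if os.getenv("TRAINING_FILE_BLOB_URL"):  # only when the blob URLs are configured
    training_file_id = import_from_blob(
        os.environ["AZURE_OPENAI_ENDPOINT"],
        os.environ["AZURE_OPENAI_API_KEY"],
        os.environ["TRAINING_FILE_BLOB_URL"],
        "training.jsonl",
    )
```

The benefit over the previous cell is that large datasets never transit your local machine — the service reads them directly via the SAS URL.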
ℹ️ Blob URLs not set — using files uploaded in the previous cell.
Waiting for files to be processed… ✅ Files ready!
With our training data uploaded, we launch a supervised fine-tuning job. The job uses method.type = "supervised" and a suffix that gets appended to the deployed model name.
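A sketch of the job submission using the `openai` client's fine-tuning API. The file IDs, suffix, and hyperparameter values are placeholders, and the client is assumed to be the one configured in the earlier cells:

```python
def submit_sft_job(client, training_file_id: str, validation_file_id: str,
                   model: str = "gpt-4.1-mini", suffix: str = "trail-guide") -> str:
    """Create a supervised fine-tuning job and return its job ID."""
    job = client.fine_tuning.jobs.create(
        model=model,
        training_file=training_file_id,
        validation_file=validation_file_id,
        suffix=suffix,  # appended to the fine-tuned model's name
        method={
            "type": "supervised",
            "supervised": {
                "hyperparameters": {
                    "n_epochs": 3,
                    "batch_size": 4,
                    "learning_rate_multiplier": 1.0,
                },
            },
        },
    )
    return job.id
```

Everything under `method.supervised.hyperparameters` maps directly to the table below, so this is where your ⚙ Apply choices land.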
| Parameter | What it controls |
|---|---|
| n_epochs | Full passes through the training data. More epochs → better format retention, but risks overfitting to training examples. Typical range: 1–10. |
| batch_size | Examples per gradient step. Smaller = more updates per epoch, often better for format tasks. Typical range: 1–8. |
| learning_rate_multiplier | Scales the default learning rate. Too high risks destroying existing knowledge; too low slows format adoption. Typical range: 0.1–2.0. |
🎛 Try it: Adjust the hyperparameters below, click ⚙ Apply, then re-run the job and monitor cells!
ℹ️ Click ⚙ Apply (above) to lock in your hyperparameters before submitting the job.
⏳ Submitting SFT job to Azure OpenAI…
Training is complete. We retrieve the fine-tuned model ID, deploy it to Azure OpenAI, and run a quick inference test to verify format consistency has improved.
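The polling and verification steps can be sketched as below. This assumes the `openai` client from earlier cells; the deployment itself (via azure-mgmt-cognitiveservices) is not shown, and `deployment_name` refers to whatever name you gave that deployment:

```python
import time

def wait_for_job(client, job_id: str, poll_seconds: int = 60) -> str:
    """Poll until the fine-tuning job finishes; return the fine-tuned model ID."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status == "succeeded":
            return job.fine_tuned_model
        if job.status in ("failed", "cancelled"):
            raise RuntimeError(f"Job ended with status: {job.status}")
        time.sleep(poll_seconds)

def format_check(client, deployment_name: str) -> str:
    """One inference call against the deployed model to eyeball the format."""
    resp = client.chat.completions.create(
        model=deployment_name,  # the deployment created from job.fine_tuned_model
        messages=[{"role": "user",
                   "content": "Gear list for a 3-day rocky trail with heavy load?"}],
    )
    return resp.choices[0].message.content
```

If fine-tuning worked, `format_check` should return pipe-separated product lines like the training examples rather than free-form prose.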
Did your hyperparameter choices pay off? Run the cells below to find out!
⏳ Waiting for training to complete…
⏳ Waiting for training to complete before deploying…
⏳ Waiting for deployment to be ready…
⏳ Run the training and deployment cells first…
Run the training and evaluation cells above to see your results, then this section will summarise what you learned.