Design and optimize prompts
This exercise takes approximately 40 minutes.
Note: This lab assumes a pre-configured lab environment with Visual Studio Code, Azure CLI, and Python already installed.
Introduction
In this exercise, you’ll test prompt optimizations for the Adventure Works Trail Guide Agent using a Git-based experimentation workflow. You’ll first establish a quantified baseline using the current production prompt, then create experiment branches to test optimization variants. You’ll run automated scripts to capture agent responses, manually score quality, and compare results to make evidence-based decisions about which optimization to deploy to production.
Scenario: You’re operating the Adventure Works Trail Guide Agent with a v3 production prompt. Before optimizing, you’ll measure baseline performance. Then you’ll test if a token-optimized prompt (v4) can maintain quality while reducing costs by 40-50%. Finally, you’ll test the same optimized prompt with the GPT-4.1-mini model to see if it can further reduce costs while maintaining acceptable quality.
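To make the cost target concrete, the arithmetic is simple: savings scale linearly with tokens removed. A quick sketch (the token counts below are hypothetical, not measured values from the lab):

```python
def percent_token_reduction(baseline_tokens: int, optimized_tokens: int) -> float:
    """Percentage of tokens saved relative to the baseline run."""
    return 100 * (baseline_tokens - optimized_tokens) / baseline_tokens

# Hypothetical run totals: 2,600 tokens for the baseline prompt,
# 1,507 for a token-optimized variant
reduction = percent_token_reduction(2600, 1507)
print(f"Token reduction: {reduction:.0f}%")  # Token reduction: 42%
```

Because Azure OpenAI bills per token, a 40-50% token reduction at the same quality translates directly into a 40-50% cost reduction for those calls.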
Set up the environment
To complete the tasks in this exercise, you need:
- Visual Studio Code
- Azure subscription with Microsoft Foundry access
- Git and GitHub account
- Python 3.9 or later
- Azure CLI and Azure Developer CLI (azd) installed
Tip: If you haven’t installed these prerequisites yet, see Lab 00: Prerequisites for installation instructions and links.
All steps in this lab will be performed using Visual Studio Code and its integrated terminal.
Create repository from template
You’ll start by creating your own repository from the template to practice realistic workflows.
- In a web browser, navigate to the template repository on GitHub at https://github.com/MicrosoftLearning/mslearn-genaiops.
- Select Use this template > Create a new repository.
- Enter a name for your repository (for example, mslearn-genaiops).
- Set the repository to Public or Private based on your preference.
- Select Create repository.
Clone the repository in Visual Studio Code
After creating your repository, clone it to your local machine.
- In Visual Studio Code, open the Command Palette by pressing Ctrl+Shift+P.
- Type Git: Clone and select it.
- Enter your repository URL: https://github.com/[your-username]/mslearn-genaiops.git
- Select a location on your local machine to clone the repository.
- When prompted, select Open to open the cloned repository in VS Code.
Deploy Microsoft Foundry resources
Now you’ll use the Azure Developer CLI to deploy all required Azure resources.
- In Visual Studio Code, open a terminal by selecting Terminal > New Terminal from the menu.
- Authenticate with Azure Developer CLI:

    ```powershell
    azd auth login
    ```

    This opens a browser window for Azure authentication. Sign in with your Azure credentials.

- Authenticate with Azure CLI:

    ```powershell
    az login
    ```

    Sign in with your Azure credentials when prompted.

    ⚠️ Important: In some environments, the VS Code integrated terminal may crash or close during the interactive login flow. If this happens, authenticate using explicit credentials instead:

    ```powershell
    az login --username <your-username> --password <your-password>
    ```

- Provision resources:

    ```powershell
    azd up
    ```

    When prompted, provide:

    - Environment name (e.g., dev, test) - Used to name all resources
    - Azure subscription - Where resources will be created
    - Location - Azure region (recommended: Sweden Central)

    The command deploys the infrastructure from the infra folder, creating:

    - Resource Group - Container for all resources
    - Foundry (AI Services) - The hub with access to models like GPT-4.1
    - Foundry Project - Your workspace for creating and managing prompts
    - Log Analytics Workspace - Collects logs and telemetry data
    - Application Insights - Monitors performance and usage
- Create a .env file with the environment variables:

    ```powershell
    azd env get-values > .env
    ```

    This creates a .env file in your project root with all the provisioned resource information.

    ⚠️ Important - File encoding: After generating the .env file, make sure it is saved using UTF-8 encoding. In editors like VS Code, check the encoding indicator in the bottom-right corner. If it shows UTF-16 LE (or any encoding other than UTF-8), click it, choose Save with Encoding, and select UTF-8. Using the wrong encoding may cause environment variables to be read incorrectly.
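If you'd rather check the encoding programmatically than through the editor's status bar, the following is a minimal sketch. It assumes the only non-UTF-8 case you'll hit is UTF-16 LE with a byte order mark, which is what PowerShell redirection typically produces:

```python
from pathlib import Path

def ensure_utf8(path: Path) -> None:
    """Re-save a text file as UTF-8 if it was written as UTF-16 LE."""
    raw = path.read_bytes()
    if raw.startswith(b"\xff\xfe"):  # UTF-16 LE byte order mark
        text = raw.decode("utf-16-le").lstrip("\ufeff")
        path.write_text(text, encoding="utf-8")

# Demo on a throwaway file; in the lab you'd call ensure_utf8(Path(".env"))
demo = Path("demo.env")
demo.write_bytes(b"\xff\xfe" + 'AGENT_NAME="trail-guide"\n'.encode("utf-16-le"))
ensure_utf8(demo)
print(demo.read_text(encoding="utf-8"))  # AGENT_NAME="trail-guide"
```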
Install Python dependencies
With your Azure resources deployed, install the required Python packages.
- In the VS Code terminal, create and activate a virtual environment:

    ```powershell
    python -m venv .venv
    .venv\Scripts\Activate.ps1
    ```

- Install the required dependencies:

    ```powershell
    python -m pip install -r requirements.txt
    ```

    This installs the necessary dependencies, including:

    - azure-ai-projects - SDK for working with AI Foundry
    - azure-identity - Azure authentication
    - python-dotenv - Load environment variables

- Add the agent configuration to your .env file. Open the .env file in your repository root and add:

    ```
    AGENT_NAME="trail-guide"
    MODEL_NAME="gpt-4.1"
    ```
Understand the experimental workflow
The repository contains the Trail Guide Agent, along with folders for testing and experiments:
```
mslearn-genaiops/
├── experiments/
│   ├── baseline/
│   │   ├── agent-responses.json        # Raw agent outputs
│   │   └── evaluation.csv              # Manual quality scores
│   ├── optimized-concise/
│   │   ├── agent-responses.json
│   │   └── evaluation.csv
│   └── gpt41mini/
│       ├── agent-responses.json
│       └── evaluation.csv
└── src/
    ├── agents/trail_guide_agent/
    │   ├── trail_guide_agent.py        # Agent creation script (modify per branch)
    │   └── prompts/
    │       ├── v1_instructions.txt     # Basic prompt
    │       ├── v2_instructions.txt     # Enhanced prompt
    │       ├── v3_instructions.txt     # Production prompt (baseline)
    │       └── v4_optimized_concise.txt # This lab: token-optimized
    └── tests/
        ├── test-prompts/               # Test scenarios for evaluation
        │   ├── day-hike-gear.txt       # Test: Essential day hike gear
        │   ├── overnight-camping.txt   # Test: First overnight camping
        │   ├── three-day-backpacking.txt # Test: Extended trip safety
        │   ├── winter-hiking.txt       # Test: Seasonal differences
        │   └── trail-difficulty.txt    # Test: Beginner assessment
        └── run_batch_tests.py          # Batch script to test agent with all prompts
```
Workflow per experiment:

- Create an experiment branch
- Modify trail_guide_agent.py to point to the prompt variant being tested
- Run trail_guide_agent.py to create/update the agent version
- Run src/tests/run_batch_tests.py to test the agent with all prompts
- Manually score the responses in a CSV file
- Compare results and merge the winner
Establish baseline
Before testing optimizations, establish a baseline by deploying and evaluating the current production prompt (v3).
Why baseline first?
The baseline provides:
- Reference point - Quantify actual improvement from optimizations
- Quality floor - Ensure experiments don’t degrade below current performance
- Token benchmark - Measure cost reduction from optimization
Deploy baseline agent
- Verify you're on the main branch:

    ```powershell
    git checkout main
    ```

- Verify that src/agents/trail_guide_agent/trail_guide_agent.py points to v3 (the default on the main branch):

    ```python
    # Line 14 should be:
    prompt_file = Path(__file__).parent / 'prompts' / 'v3_instructions.txt'

    # Line 23 should be:
    model=os.getenv("MODEL_NAME", "gpt-4.1"),
    ```

- Deploy the baseline agent:

    ```powershell
    python src/agents/trail_guide_agent/trail_guide_agent.py
    ```

    This creates the "trail-guide" agent with your v3 prompt.
Run baseline tests
- Run batch tests to capture baseline responses:

    ```powershell
    python src/tests/run_batch_tests.py baseline
    ```

    This tests all 5 prompts and saves the results to experiments/baseline/agent-responses.json.

- Review the captured responses:

    ```powershell
    cat experiments/baseline/agent-responses.json
    ```
Score baseline responses
Create your baseline evaluation scores.
- Create experiments/baseline/evaluation.csv (if the batch script didn't already create it) with your scores:

    ```
    test_prompt,agent_response_excerpt,intent_resolution,relevance,groundedness,comments
    day-hike-gear,"For a summer day hike in moderate terrain, essential gear includes...",5,5,4,Clear and comprehensive gear list
    overnight-camping,"For your first overnight camping trip, start with these essentials...",4,4,5,Good beginner advice with safety focus
    three-day-backpacking,"Critical safety considerations for October mountain backpacking...",5,5,5,Excellent detailed safety guidance
    winter-hiking,"Winter hiking requires additional gear and precautions...",4,4,4,Good comparison of seasonal differences
    trail-difficulty,"To assess trail difficulty for your fitness level as a beginner...",5,5,5,Helpful framework for self-assessment
    ```

    Tip: Use the actual responses from agent-responses.json to score accurately.
You now have a quantified baseline to compare optimizations against.
Run optimization experiment 1: Token reduction
Test the optimized-concise prompt (v4), which reduces token usage while aiming to maintain quality.
Create experiment branch
Each optimization experiment lives in its own branch, keeping experimental changes separate from your production agent.
- In the VS Code terminal, ensure you're in the repository root:

    ```powershell
    cd /path/to/mslearn-genaiops
    ```

- Ensure you're on the main branch with the latest changes:

    ```powershell
    git checkout main
    git pull origin main
    ```

- Create an experiment branch for testing the optimized-concise prompt:

    ```powershell
    git checkout -b experiment/optimized-concise
    ```

    Note: The experiment/ prefix clearly identifies this as an experimental change.
Verify test prompts exist
The test prompts are already created in the src/tests/test-prompts/ folder.
- List the test prompt files:

    ```powershell
    ls src/tests/test-prompts/
    ```

    You should see:

    - day-hike-gear.txt - Essential gear question
    - overnight-camping.txt - First camping trip prep
    - three-day-backpacking.txt - Extended trip safety
    - winter-hiking.txt - Seasonal gear differences
    - trail-difficulty.txt - Beginner trail assessment

- Examine one of the test prompts:

    ```powershell
    cat src/tests/test-prompts/day-hike-gear.txt
    ```

    Output:

    ```
    What essential gear do I need for a summer day hike in moderate terrain?
    ```
These test prompts represent realistic user scenarios you’ll use to evaluate each prompt variant.
Configure agent for the experiment
Modify the agent script to use the optimized-concise prompt (v4).
- Open src/agents/trail_guide_agent/trail_guide_agent.py and update line 14 to point to v4:

    ```python
    # Change from:
    prompt_file = Path(__file__).parent / 'prompts' / 'v3_instructions.txt'

    # To:
    prompt_file = Path(__file__).parent / 'prompts' / 'v4_optimized_concise.txt'
    ```

- Save the file and commit this change to your experiment branch:

    ```powershell
    git add src/agents/trail_guide_agent/trail_guide_agent.py
    git commit -m "Configure agent to use v4 optimized-concise prompt"
    ```
Deploy agent and run batch tests
Deploy the agent with the v4 prompt and capture responses from all test prompts.
- Create/update the agent version from the repository root:

    ```powershell
    python src/agents/trail_guide_agent/trail_guide_agent.py
    ```

    Expected output:

    ```
    Agent created (id: asst_abc123, name: trail-guide, version: 2)
    ```

- Run the batch test script to test with all prompts:

    ```powershell
    python src/tests/run_batch_tests.py optimized-concise
    ```

    The script will:

    - Find the deployed agent by name
    - Run each test prompt through the agent
    - Capture responses with token usage metadata
    - Save results to experiments/optimized-concise/agent-responses.json

    Expected output:

    ```
    Running 5 test prompts for experiment: optimized-concise
    ================================================================================
    Using agent: trail-guide (id: asst_abc123, version: 2)

    Testing: day-hike-gear
      Prompt: What essential gear do I need for a summer day hike...
      Response captured (245 tokens)

    Testing: overnight-camping
      Prompt: How should I prepare for my first overnight camping...
      Response captured (312 tokens)

    Testing: three-day-backpacking
      Prompt: I'm planning a 3-day backpacking trip in the mountains...
      Response captured (385 tokens)

    Testing: winter-hiking
      Prompt: What additional gear and precautions do I need for...
      Response captured (298 tokens)

    Testing: trail-difficulty
      Prompt: How do I know if a trail is appropriate for my...
      Response captured (267 tokens)

    ================================================================================
    Results saved to: experiments/optimized-concise/agent-responses.json
    Total tests: 5
    Total tokens used: 1507
    ```

- Review the captured responses:

    ```powershell
    cat experiments/optimized-concise/agent-responses.json
    ```

    The JSON file contains structured data with test names, prompts, responses, and token usage.
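Once the run finishes, the JSON file can be post-processed with a few lines of Python. The field names below are illustrative (the lab doesn't document the exact schema), but the aggregation pattern is the same for whatever run_batch_tests.py actually writes:

```python
import json

# Illustrative content shaped like what run_batch_tests.py reports; in the
# lab you'd load experiments/optimized-concise/agent-responses.json instead
data = json.loads("""
{
  "experiment": "optimized-concise",
  "results": [
    {"test": "day-hike-gear", "tokens": 245},
    {"test": "overnight-camping", "tokens": 312},
    {"test": "three-day-backpacking", "tokens": 385},
    {"test": "winter-hiking", "tokens": 298},
    {"test": "trail-difficulty", "tokens": 267}
  ]
}
""")

def total_tokens(run: dict) -> int:
    """Sum the token usage recorded for every test in the run."""
    return sum(r["tokens"] for r in run["results"])

print(f"Total tokens used: {total_tokens(data)}")  # Total tokens used: 1507
```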
Score responses manually
Review the agent responses and create an evaluation CSV with quality scores.
- Create a new file experiments/optimized-concise/evaluation.csv:

    ```powershell
    New-Item experiments/optimized-concise/evaluation.csv
    ```

- Open the file in VS Code and add the CSV header and scores:

    ```
    test_prompt,agent_response_excerpt,intent_resolution,relevance,groundedness,comments
    day-hike-gear,"Essential gear for summer moderate terrain day hikes...",5,5,4,Maintains quality with reduced verbosity
    overnight-camping,"First overnight camping essentials...",4,5,5,More direct while preserving safety focus
    three-day-backpacking,"October mountain backpacking safety priorities...",5,5,5,Excellent concise safety guidance
    winter-hiking,"Winter vs summer gear additions...",5,4,4,Clear comparison without excess detail
    trail-difficulty,"Beginner trail difficulty assessment...",5,5,5,Helpful and actionable guidance
    ```

    Scoring criteria (1-5 scale):

    - Intent resolution: Did the response address what the user asked?
    - Relevance: Is the response on-topic and appropriate?
    - Groundedness: Are the claims factually accurate?

    Tip: Align your evaluation format with what the Microsoft Foundry portal exports. Consistent criteria across manual testing, portal evaluations, and automated testing makes it easy to consolidate results.
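Before committing, it's worth sanity-checking the CSV: every score should be an integer from 1 to 5, and per-metric averages make later comparison easier. A small sketch using the standard csv module (the inline sample stands in for your own evaluation.csv):

```python
import csv
import io

# Inline sample with the same metric columns as evaluation.csv; in the lab
# you'd pass open("experiments/optimized-concise/evaluation.csv") instead
sample = io.StringIO(
    "test_prompt,intent_resolution,relevance,groundedness\n"
    "day-hike-gear,5,5,4\n"
    "overnight-camping,4,5,5\n"
    "trail-difficulty,5,5,5\n"
)

METRICS = ("intent_resolution", "relevance", "groundedness")

def mean_scores(csv_file) -> dict:
    """Validate 1-5 scores and average each metric across test prompts."""
    rows = list(csv.DictReader(csv_file))
    means = {}
    for metric in METRICS:
        scores = [int(row[metric]) for row in rows]
        if any(not 1 <= s <= 5 for s in scores):
            raise ValueError(f"{metric}: score outside the 1-5 scale")
        means[metric] = sum(scores) / len(scores)
    return means

print(mean_scores(sample))
```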
- Commit your experiment results:

    ```powershell
    git add experiments/optimized-concise/
    git commit -m "Complete optimized-concise experiment with evaluation"
    git tag experiment-1-optimized-concise
    ```
Run optimization experiment 2: Model comparison
Test the same optimized prompt (v4) with the GPT-4.1-mini model to explore cost vs. quality tradeoffs.
Investigation goal
Determine if GPT-4.1-mini can maintain acceptable quality while providing additional cost savings beyond prompt optimization alone.
Run the experiment
- Create a new experiment branch:

    ```powershell
    git checkout main
    git checkout -b experiment/gpt41mini
    ```

- Modify src/agents/trail_guide_agent/trail_guide_agent.py to use the GPT-4.1-mini model. Update line 23 to change the model:

    ```python
    # Change from:
    model=os.getenv("MODEL_NAME", "gpt-4.1"),

    # To:
    model="gpt-4.1-mini",
    ```

    Because this branch was created from main (which still uses v3), also update line 14 to point to v4_optimized_concise.txt, as you did in the first experiment. This ensures you're testing the optimized prompt with the smaller model.
- Commit this configuration:

    ```powershell
    git add src/agents/trail_guide_agent/trail_guide_agent.py
    git commit -m "Configure agent to use GPT-4.1-mini model with v4 prompt"
    ```

- Deploy the agent with GPT-4.1-mini:

    ```powershell
    python src/agents/trail_guide_agent/trail_guide_agent.py
    ```

- Run batch tests with the same test prompts:

    ```powershell
    python src/tests/run_batch_tests.py gpt41mini
    ```

- Create your evaluation CSV at experiments/gpt41mini/evaluation.csv:

    ```
    test_prompt,agent_response_excerpt,intent_resolution,relevance,groundedness,depth,comments
    day-hike-gear,"For summer day hikes on moderate terrain...",4,4,4,3,Good coverage but less detailed than GPT-4.1
    overnight-camping,"First overnight camping essentials include...",4,4,4,3,"Solid advice, slightly less nuanced"
    three-day-backpacking,"October mountain safety priorities...",4,4,4,4,"Good safety guidance, concise"
    winter-hiking,"Winter hiking requires additional gear...",4,4,4,3,"Clear comparison, adequate detail"
    trail-difficulty,"To assess trail difficulty as a beginner...",4,4,4,4,Practical framework provided
    ```

- Commit the experiment results:

    ```powershell
    git add experiments/gpt41mini/
    git commit -m "Complete GPT-4.1-mini experiment with evaluation"
    git tag experiment-2-gpt41mini
    ```

    Note: Keep the model change in trail_guide_agent.py committed on this branch. When you switch back to main, the script will revert to GPT-4.1.
Compare experiments and decide
After completing baseline and both optimization experiments, use your CSV data to make evidence-based decisions about which variant to promote to production.
Review all experiment branches
- List all experiment branches:

    ```powershell
    git branch | Select-String "experiment/"
    ```

    You should see:

    ```
    experiment/gpt41mini
    experiment/optimized-concise
    ```

    (The baseline was captured directly on main, so it has no experiment branch.)

- For each experiment, open its experiments/[name]/evaluation.csv in VS Code or Excel.
Compare results across experiments
Use the CSV files to compare experiments side-by-side.
- Open all three CSV files for comparison:

    ```powershell
    code experiments/baseline/evaluation.csv
    code experiments/optimized-concise/evaluation.csv
    code experiments/gpt41mini/evaluation.csv
    ```

- Review the scores across all three experiments and look for patterns in the CSV data:

    - Which experiment has consistently higher scores?
    - Are there specific test prompts where quality differs significantly?
    - How do token counts compare across experiments?
- Make your decision based on the data:

    Winner: optimized-concise (the v4 prompt with GPT-4.1)

    Rationale:

    - Maintains or improves quality across test cases
    - Significant token reduction (42% fewer tokens)
    - Better cost-to-quality ratio than GPT-4.1-mini

    Alternative consideration: GPT-4.1-mini could be used for simple, high-volume queries if cost is critical.
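The decision logic above boils down to a simple rule: among experiments that clear a quality floor, pick the cheapest. The summary numbers below are illustrative stand-ins for whatever your own CSVs and token counts show:

```python
# Illustrative per-experiment summaries (mean of the 1-5 quality scores
# and total tokens from agent-responses.json); replace with your own data
experiments = {
    "baseline":          {"mean_score": 4.6, "total_tokens": 2600},
    "optimized-concise": {"mean_score": 4.7, "total_tokens": 1507},
    "gpt41mini":         {"mean_score": 3.9, "total_tokens": 1380},
}

QUALITY_FLOOR = 4.5  # don't promote anything that scores below baseline-level quality

def pick_winner(summaries: dict, floor: float) -> str:
    """Cheapest experiment whose mean quality score clears the floor."""
    eligible = {name: s for name, s in summaries.items() if s["mean_score"] >= floor}
    return min(eligible, key=lambda name: eligible[name]["total_tokens"])

print(pick_winner(experiments, QUALITY_FLOOR))  # optimized-concise
```

Encoding the rule keeps the decision repeatable: if a future experiment changes the numbers, rerunning the rule gives the new winner without re-litigating the criteria.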
Merge winning experiment
Use Git to merge the winning experiment to main.
- Checkout the main branch:

    ```powershell
    git checkout main
    ```

- Merge the winning experiment:

    ```powershell
    git merge experiment/optimized-concise -m "Merge optimized-concise prompt experiment

    Approved optimization reduces prompt tokens by 42% while maintaining quality.

    - Validated with 5 production test cases
    - Quality criteria met across all test prompts
    - Cost savings: 42% reduction in completion tokens
    - Ready for production deployment

    Evaluation results: experiments/optimized-concise/evaluation.csv"
    ```
- Verify that trail_guide_agent.py uses the new prompt:

    The merge brought the experiment branch's change onto main, so open src/agents/trail_guide_agent/trail_guide_agent.py and confirm that line 14 already points to the optimized prompt:

    ```python
    # Line 14 should now be:
    prompt_file = Path(__file__).parent / 'prompts' / 'v4_optimized_concise.txt'
    ```

    If it still points to v3_instructions.txt, update it and commit:

    ```powershell
    git add src/agents/trail_guide_agent/trail_guide_agent.py
    git commit -m "Update agent to use optimized-concise prompt (v4)"
    ```
Create a production release tag:
```powershell git tag -a v4-optimized-prompt -m “Release v4: Token-optimized prompts
Changes:
- Deployed v4_optimized_concise.txt prompt variant
- 42% token reduction validated
- Quality maintained across all test scenarios
Migration: Update trail_guide_agent.py to use v4_optimized_concise.txt” ```
-
Push changes to GitHub:
git push origin main git push origin experiment/baseline git push origin experiment/optimized-concise git push origin experiment/gpt41mini git push --tagsYour experiments are now saved with full history on GitHub, making them easy to review and reference.
Next steps
- Continue to Lab 04: Automated Evaluation to scale your testing with automated evaluators
- Explore Lab 05: Monitoring to track production prompt performance
- Review Lab 06: Tracing to debug and optimize agent behavior