Use Azure Speech in an agent
Azure Speech in Foundry Tools provides an MCP server that you can use to enable an agent to call its speech recognition and synthesis capabilities.
In this exercise, you'll configure the Azure Speech in Foundry Tools MCP server and connect it to an agent.
Tip: The code used in this exercise is based on the Microsoft Foundry SDK for Python. You can develop similar solutions using the SDKs for Microsoft .NET, JavaScript, and Java. Refer to Microsoft Foundry SDK client libraries for details.
This exercise takes approximately 30 minutes.
Prerequisites
Before starting this exercise, ensure you have:
- An active Azure subscription
- Visual Studio Code installed
- Python version 3.13 or higher installed
- Git installed and configured
Create an Azure storage account
The Azure Speech MCP server uses an Azure storage account to save generated audio files.
- Open the Azure portal at https://portal.azure.com, and sign in using your Azure credentials.
- Create a new Azure storage account resource with the following settings:
- Subscription: Your subscription
- Resource group: Create or select a resource group
- Storage account name: A unique name for your storage account
- Region: Any available region
- Preferred storage type: Azure blob storage or Azure Data Lake Storage Gen2
- Primary workload: Cloud native
- Performance: Standard
- Redundancy: Locally-redundant storage (LRS)
- When the Azure storage account resource has been created, go to it in the portal.
- In the left navigation pane for the storage account, expand Data storage, and select Containers.
- Add a new container named files. This is where your agent will save the audio files it generates.
- In the context menu (...) for the files container, select Generate SAS, and create a SAS token with the following details:
- Signing method: Account key
- Signing key: Key1
- Stored access policy: None
- Permissions:
- Read
- Add
- Create
- Write
- List
- Start and expiry date/time:
- Start: The current date and time
- Expiry: 11:59pm tomorrow
- Allowed IP addresses: Leave blank
- Allowed protocols: HTTPS only
IMPORTANT: Copy the generated SAS token and URL, and store them in a text file for now - you'll need them later!
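The SAS URL you copy here is simply the container's URL with the SAS token appended as its query string. The following minimal Python sketch only illustrates that shape - the account name, container name, and token below are placeholders for your own values:

```python
# A minimal sketch of how a container SAS URL is assembled.
# All values below are placeholders - substitute your own storage account name,
# container name, and the SAS token you copied from the portal.
account_name = "mystorageacct"   # placeholder storage account name
container_name = "files"         # the container created in the steps above
sas_token = "sv=2024-11-04&sp=racwl&sig=REDACTED"  # placeholder SAS token

container_url = f"https://{account_name}.blob.core.windows.net/{container_name}"
sas_url = f"{container_url}?{sas_token}"
print(sas_url)
```

Keeping the token separate from the container URL makes it obvious which part authorizes access and which part identifies the container.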
Create a Microsoft Foundry project
Microsoft Foundry uses projects to organize models, resources, data, and other assets used to develop an AI solution.
- In a web browser, open the Microsoft Foundry portal at https://ai.azure.com and sign in using your Azure credentials. Close any tips or quick start panes that are opened the first time you sign in, and if necessary use the Foundry logo at the top left to navigate to the home page.
- If it is not already enabled, in the toolbar at the top of the page, enable the New Foundry option. Then, if prompted, create a new project with a unique name, expanding the Advanced options area to specify the following settings for your project:
- Foundry resource: Use the default name for your resource (usually {project_name}-resource)
- Subscription: Your Azure subscription
- Resource group: Create or select a resource group
- Region: Select any of the AI Foundry recommended regions
TIP: Remember (or make a note of) the Foundry resource name - you're going to need it later!
- Select Create. Wait for your project to be created.
- On the home page for your project, note the project endpoint, key, and OpenAI endpoint.
TIP: Copy the project key to the clipboard - you're going to need it later!
Create an agent
Now that you have a Foundry project, you can create an agent.
- In the Start building menu, select Create agent; and when prompted, name the agent Speech-Agent. When ready, your agent opens in the agent playground.
- In the model drop-down list, ensure that a gpt-4.1 model has been deployed and selected for your agent.
- Assign your agent the following Instructions:

  You are an AI agent that uses the Azure AI Speech tool to transcribe and generate speech.

- Use the Save button to save the changes.
- Test the agent by entering the following prompt in the Chat pane:

  What can you help me with?

  The agent should respond with an appropriate answer based on its instructions.
Create an Azure Speech in Foundry Tools connection
Foundry includes an MCP server for Azure Speech in Foundry Tools, which you can connect to your project and use in your agent.
- In the navigation pane on the left, select the Tools page.
- Connect a tool, selecting Azure Speech in Foundry Tools in the Catalog and specifying the following configuration:
  - Foundry resource name: Enter the name of your Foundry resource (for example, {project_name}-resource)
  - Authentication: Key-based
  - Bearer (Ocp-Apim-Subscription-Key): The key for your Foundry project
  - X-Blob-Container-Url: The SAS URL for your storage container
- Wait for the MCP tool connection to be created, and then view its details page.
- On the details page for the Azure Speech in Foundry Tools connection, select Use in an agent, and then select the Speech-Agent agent you created previously.
The agent should open in the playground, with the Azure Speech in Foundry Tools tool connected.
Test the Azure Speech tool in the playground
Now let's test the agent's ability to use the tool you connected.
- In the agent playground for the Speech-Agent agent, enter the following prompt:

  Generate "To be or not to be, that is the question." as speech

- When prompted, approve use of the Azure Speech tool by selecting Always approve all Azure Speech MCP Server tools.
- Review the response, which should include a link to the generated audio file. Then click the link to hear the synthesized speech.
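If you want to fetch a generated audio file programmatically rather than clicking the link, the SAS token embedded in the link's query string is what authorizes the read, so a plain HTTP download is enough. This is an illustrative sketch, not part of the lab's code; the function name is hypothetical, and the opener parameter exists only to make the helper easy to test:

```python
# Illustrative sketch: download a generated audio file from its SAS link.
# By default this uses the standard library's urllib; the opener parameter
# is injectable purely for testing.
import urllib.request

def download_audio(url, dest, opener=urllib.request.urlopen):
    """Fetch the audio at url and write it to dest; returns the byte count."""
    with opener(url) as response:
        data = response.read()
    with open(dest, "wb") as f:
        f.write(data)
    return len(data)
```

For example, `download_audio(link, "output.wav")`, where link is the URL from the agent's response.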
- Enter the following prompt:

  Transcribe the file at https://microsoftlearning.github.io/mslearn-ai-language/Labfiles/05-speech-tool/speech_1.wav

- If prompted, approve use of the Azure Speech tool by selecting Always approve all Azure Speech MCP Server tools.
- Review the output, which should be a transcription of the audio file.
Create a client application
Now that you have a working agent, you can create a client application that uses it.
Get the application files from GitHub
- Open Visual Studio Code.
- Open the command palette (Ctrl+Shift+P) and use the Git: Clone command to clone the https://github.com/microsoftlearning/mslearn-ai-language repo to a local folder (it doesn't matter which one). Then open it. You may be prompted to confirm that you trust the authors.
- After the repo has been cloned, in the Explorer pane, navigate to the folder containing the application code files at /Labfiles/05-speech-tool/Python/speech-client. The application files include:
- .env (the application configuration file)
- requirements.txt (the Python package dependencies that need to be installed)
- speech-client.py (the code file for the application)
Configure the application
- In Visual Studio Code, view the Extensions pane; and if it is not already installed, install the Python extension.
- In the Command Palette, use the Python: Select Interpreter command. Then select an existing environment if you have one, or create a new Venv environment based on your Python 3.1x installation.

  Tip: If you are prompted to install dependencies, you can install the ones in the requirements.txt file in the /Labfiles/05-speech-tool/Python/speech-client folder; but it's OK if you don't - we'll install them later!
- In the Explorer pane, right-click the speech-client folder containing the application files, and select Open in integrated terminal (or open a terminal from the Terminal menu and navigate to the /Labfiles/05-speech-tool/Python/speech-client folder).
Note: Opening the terminal in Visual Studio Code will automatically activate the Python environment. You may need to enable running scripts on your system.
- Ensure that the terminal is open in the speech-client folder with the prefix (.venv) to indicate that the Python environment you created is active.
- Install the Foundry SDK package, the Azure Identity package, and the other required packages by running the following command:

  pip install -r requirements.txt azure-identity --pre azure-ai-projects==2.0.0b4

- In the Explorer pane, in the speech-client folder, select the .env file to open it. Then update the configuration values to include your project endpoint (from the project home page in the Foundry portal) and the name of your agent (which should be Speech-Agent - note that this name is case-sensitive).
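For reference, a .env file is just KEY=VALUE lines; libraries such as python-dotenv normally load it for you, but the format is simple enough to sketch by hand. The key names in the test values here are assumptions for illustration - check the repo's .env file for the real ones:

```python
# Hedged sketch of loading KEY=VALUE settings from a .env file.
# The key names used by speech-client.py may differ; this only illustrates the format.
from pathlib import Path

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip().strip('"')
    return settings
```
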
- Save the modified configuration file.
Implement application code
- In the Explorer pane, in the speech-client folder, open the speech-client.py file.
- Review the existing code. You will add code to submit prompts to your agent.
Tip: As you add code to the code file, be sure to maintain the correct indentation.
- At the top of the code file, under the existing namespace references, find the comment Import namespaces and add the following code to import the namespaces you will need:

  ```python
  # Import namespaces
  from azure.identity import DefaultAzureCredential
  from azure.ai.projects import AIProjectClient
  ```

- In the main function, note that code to load the endpoint and key from the configuration file has already been provided. Then find the comment Get project client, and add the following code to create a client for your Foundry project:

  ```python
  # Get project client
  project_client = AIProjectClient(
      endpoint=foundry_endpoint,
      credential=DefaultAzureCredential(),
  )
  ```

- Find the comment Get an OpenAI client, and add the following code to get an OpenAI client with which to call your agent:

  ```python
  # Get an OpenAI client
  openai_client = project_client.get_openai_client()
  ```

- Find the comment Use the agent to get a response, and add the following code to submit a user prompt to your agent and display the response:

  ```python
  # Use the agent to get a response
  prompt = input("User prompt (or 'quit'): ")
  response = openai_client.responses.create(
      input=[{"role": "user", "content": prompt}],
      extra_body={"agent_reference": {"name": agent_name, "type": "agent_reference"}},
  )
  print(f"{agent_name}: {response.output_text}")
  ```

- Save the changes you made to the code file.
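The code above submits a single prompt. If you later want the client to keep prompting until the user types quit, the loop can be factored as in this sketch - here the get_response callable stands in for the call to openai_client.responses.create, and the parameter names are illustrative, not taken from the lab code:

```python
# Illustrative sketch of a repeating prompt loop around the agent call.
# get_response stands in for the call that submits the prompt to the agent;
# read_input and write are injectable only to make the loop easy to test.
def chat_loop(get_response, agent_name, read_input=input, write=print):
    """Keep reading prompts and printing responses until the user enters 'quit'."""
    while True:
        prompt = read_input("User prompt (or 'quit'): ").strip()
        if prompt.lower() == "quit":
            break
        if prompt:
            write(f"{agent_name}: {get_response(prompt)}")
```

Factoring the agent call out of the loop keeps the console handling testable without a live connection.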
Test the client application
Now let's test the application by running it in a Python environment and authenticating the connection to your project.
- In the Visual Studio Code terminal, enter the following command to sign into Azure:

  az login

  When prompted, sign into Azure using your credentials.
Note: In most scenarios, just using az login will be sufficient. However, if you have subscriptions in multiple tenants, you may need to specify the tenant by using the --tenant parameter. See Sign into Azure interactively using the Azure CLI for details.
- In the Visual Studio Code terminal, confirm the details of your Azure subscription, and then enter the following command to run the client application:

  python speech-client.py

- When prompted, enter the following prompt:

  Synthesize "Better a witty fool, than a foolish wit!" as speech using the voice "en-GB-SoniaNeural".

- Review the response, which should include a clickable link to a generated audio file.
- After checking out the generated audio file, enter the following prompt:

  Transcribe https://microsoftlearning.github.io/mslearn-ai-language/Labfiles/05-speech-tool/speech_2.wav

- Review the response.
Clean up resources
If you're finished exploring Azure Speech in Foundry Tools, you can delete the resources you created in this exercise. Here's how:
- In the Azure portal, browse to the Foundry resource and the storage account you created in this lab.
- On each resource's page, select Delete and follow the instructions to delete the resource.