Translate Speech
Azure AI Speech includes a speech translation API that you can use to translate spoken language. For example, suppose you want to develop a translator application that people can use when traveling in places where they don’t speak the local language. They would be able to say phrases such as “Where is the station?” or “I need to find a pharmacy” in their own language, and have the application translate them into the local language.
NOTE This exercise requires that you are using a computer with speakers/headphones. For the best experience, a microphone is also required. Some hosted virtual environments may be able to capture audio from your local microphone, but if this doesn’t work (or you don’t have a microphone at all), you can use a provided audio file for speech input. Follow the instructions carefully, as you’ll need to choose different options depending on whether you are using a microphone or the audio file.
Provision an Azure AI Speech resource
If you don’t already have one in your subscription, you’ll need to provision an Azure AI Speech resource.
- Open the Azure portal at https://portal.azure.com, and sign in using the Microsoft account associated with your Azure subscription.
- In the search field at the top, search for Azure AI services and press Enter, then select Create under Speech service in the results.
- Create a resource with the following settings:
  - Subscription: Your Azure subscription
  - Resource group: Choose or create a resource group
  - Region: Choose any available region
  - Name: Enter a unique name
  - Pricing tier: Select F0 (free), or S0 (standard) if F0 is not available.
  - Responsible AI Notice: Agree.
- Select Review + create, then select Create to provision the resource.
- Wait for deployment to complete, and then go to the deployed resource.
- View the Keys and Endpoint page. You will need the information on this page later in the exercise.
Prepare to develop an app in Visual Studio Code
You’ll develop your speech app using Visual Studio Code. The code files for your app have been provided in a GitHub repo.
Tip: If you have already cloned the mslearn-ai-language repo, open it in Visual Studio Code. Otherwise, follow these steps to clone it to your development environment.
- Start Visual Studio Code.
- Open the palette (SHIFT+CTRL+P) and run a Git: Clone command to clone the https://github.com/MicrosoftLearning/mslearn-ai-language repository to a local folder (it doesn’t matter which folder).
- When the repository has been cloned, open the folder in Visual Studio Code.
  Note: If Visual Studio Code shows a pop-up message prompting you to trust the code you are opening, select the Yes, I trust the authors option in the pop-up.
- Wait while additional files are installed to support the C# code projects in the repo.
  Note: If you are prompted to add required assets to build and debug, select Not Now.
Configure your application
Applications for both C# and Python have been provided. Both apps feature the same functionality. First, you’ll complete some key parts of the application to enable it to use your Azure AI Speech resource.
- In Visual Studio Code, in the Explorer pane, browse to the Labfiles/08-speech-translation folder and expand the CSharp or Python folder, depending on your language preference, and the translator folder it contains. Each folder contains the language-specific code files for an app into which you’re going to integrate Azure AI Speech functionality.
- Right-click the translator folder containing your code files and open an integrated terminal. Then install the Azure AI Speech SDK package by running the appropriate command for your language preference:
C#
    dotnet add package Microsoft.CognitiveServices.Speech --version 1.30.0
Python
    pip install azure-cognitiveservices-speech==1.30.0
- In the Explorer pane, in the translator folder, open the configuration file for your preferred language:
  - C#: appsettings.json
  - Python: .env
- Update the configuration values to include the region and a key from the Azure AI Speech resource you created (available on the Keys and Endpoint page for your Azure AI Speech resource in the Azure portal).
  NOTE: Be sure to add the region for your resource, not the endpoint!
- Save the configuration file.
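Because the region and the endpoint appear on the same Keys and Endpoint page, it is easy to paste the wrong one into the configuration file. The following is a minimal sketch of a check that could catch that mistake before the SDK is called. The helper name looks_like_region is hypothetical (it is not part of the lab files or the Speech SDK), and the check assumes that region identifiers are short lowercase alphanumeric names such as eastus or westeurope.

```python
import re

# Hypothetical helper (not part of the lab files or the Speech SDK).
# Assumption: Azure region identifiers are short lowercase alphanumeric
# names such as "eastus" or "westeurope", with no scheme, dots, or slashes.
_REGION_PATTERN = re.compile(r"[a-z0-9]+")


def looks_like_region(value: str) -> bool:
    """Return True if value resembles a region name rather than an endpoint URL."""
    return bool(_REGION_PATTERN.fullmatch(value.strip()))


if __name__ == "__main__":
    for candidate in ("eastus", "https://eastus.api.cognitive.microsoft.com/"):
        verdict = "looks like a region" if looks_like_region(candidate) else "does not look like a region"
        print(f"{candidate!r} {verdict}")
```

A check like this could run immediately after the configuration values are loaded, so that a pasted endpoint URL is reported with a clear message instead of surfacing later as a connection failure.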
Add code to use the Speech SDK
- Note that the translator folder contains a code file for the client application:
  - C#: Program.cs
  - Python: translator.py
  Open the code file and at the top, under the existing namespace references, find the comment Import namespaces. Then, under this comment, add the following language-specific code to import the namespaces you will need to use the Azure AI Speech SDK:
C#: Program.cs
    // Import namespaces
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    using Microsoft.CognitiveServices.Speech.Translation;
Python: translator.py
    # Import namespaces
    import azure.cognitiveservices.speech as speech_sdk
- In the Main function, note that code to load the Azure AI Speech service key and region from the configuration file has already been provided. You must use these variables to create a SpeechTranslationConfig for your Azure AI Speech resource, which you will use to translate spoken input. Add the following code under the comment Configure translation:
C#: Program.cs
    // Configure translation
    translationConfig = SpeechTranslationConfig.FromSubscription(aiSvcKey, aiSvcRegion);
    translationConfig.SpeechRecognitionLanguage = "en-US";
    translationConfig.AddTargetLanguage("fr");
    translationConfig.AddTargetLanguage("es");
    translationConfig.AddTargetLanguage("hi");
    Console.WriteLine("Ready to translate from " + translationConfig.SpeechRecognitionLanguage);
Python: translator.py
    # Configure translation
    translation_config = speech_sdk.translation.SpeechTranslationConfig(ai_key, ai_region)
    translation_config.speech_recognition_language = 'en-US'
    translation_config.add_target_language('fr')
    translation_config.add_target_language('es')
    translation_config.add_target_language('hi')
    print('Ready to translate from', translation_config.speech_recognition_language)
- You will use the SpeechTranslationConfig to translate speech into text, but you will also use a SpeechConfig to synthesize translations into speech. Add the following code under the comment Configure speech:
C#: Program.cs
    // Configure speech
    speechConfig = SpeechConfig.FromSubscription(aiSvcKey, aiSvcRegion);
Python: translator.py
    # Configure speech
    speech_config = speech_sdk.SpeechConfig(ai_key, ai_region)
- Save your changes and return to the integrated terminal for the translator folder, and enter the following command to run the program:
C#
    dotnet run
Python
    python translator.py
- If you are using C#, you can ignore any warnings about using the await operator in asynchronous methods; we’ll fix that later. The code should display a message that it is ready to translate from en-US and prompt you for a target language. Press ENTER to end the program.
Implement speech translation
Now that you have a SpeechTranslationConfig for the Azure AI Speech service, you can use the Azure AI Speech translation API to recognize and translate speech.
IMPORTANT: This section includes instructions for two alternative procedures. Follow the first procedure if you have a working microphone. Follow the second procedure if you want to simulate spoken input by using an audio file.
If you have a working microphone
- In the Main function for your program, note that the code uses the Translate function to translate spoken input.
- In the Translate function, under the comment Translate speech, add the following code to create a TranslationRecognizer client that can be used to recognize and translate speech using the default system microphone for input.
C#: Program.cs
    // Translate speech
    using AudioConfig audioConfig = AudioConfig.FromDefaultMicrophoneInput();
    using TranslationRecognizer translator = new TranslationRecognizer(translationConfig, audioConfig);
    Console.WriteLine("Speak now...");
    TranslationRecognitionResult result = await translator.RecognizeOnceAsync();
    Console.WriteLine($"Translating '{result.Text}'");
    translation = result.Translations[targetLanguage];
    Console.OutputEncoding = Encoding.UTF8;
    Console.WriteLine(translation);
Python: translator.py
    # Translate speech
    audio_config = speech_sdk.AudioConfig(use_default_microphone=True)
    translator = speech_sdk.translation.TranslationRecognizer(translation_config, audio_config=audio_config)
    print("Speak now...")
    result = translator.recognize_once_async().get()
    print('Translating "{}"'.format(result.text))
    translation = result.translations[targetLanguage]
    print(translation)
- Now skip ahead to the Run the program section below.
Alternatively, use audio input from a file
- In the terminal window, enter the following command to install a library that you can use to play the audio file:
C#
    dotnet add package System.Windows.Extensions --version 4.6.0
Python
    pip install playsound==1.3.0
- In the code file for your program, under the existing namespace imports, add the following code to import the library you just installed:
C#: Program.cs
    using System.Media;
Python: translator.py
    from playsound import playsound
- In the Main function for your program, note that the code uses the Translate function to translate spoken input. Then in the Translate function, under the comment Translate speech, add the following code to create a TranslationRecognizer client that can be used to recognize and translate speech from a file.
C#: Program.cs
    // Translate speech
    string audioFile = "station.wav";
    SoundPlayer wavPlayer = new SoundPlayer(audioFile);
    wavPlayer.Play();
    using AudioConfig audioConfig = AudioConfig.FromWavFileInput(audioFile);
    using TranslationRecognizer translator = new TranslationRecognizer(translationConfig, audioConfig);
    Console.WriteLine("Getting speech from file...");
    TranslationRecognitionResult result = await translator.RecognizeOnceAsync();
    Console.WriteLine($"Translating '{result.Text}'");
    translation = result.Translations[targetLanguage];
    Console.OutputEncoding = Encoding.UTF8;
    Console.WriteLine(translation);
Python: translator.py
    # Translate speech
    audioFile = 'station.wav'
    playsound(audioFile)
    audio_config = speech_sdk.AudioConfig(filename=audioFile)
    translator = speech_sdk.translation.TranslationRecognizer(translation_config, audio_config=audio_config)
    print("Getting speech from file...")
    result = translator.recognize_once_async().get()
    print('Translating "{}"'.format(result.text))
    translation = result.translations[targetLanguage]
    print(translation)
Run the program
- Save your changes and return to the integrated terminal for the translator folder, and enter the following command to run the program:
C#
    dotnet run
Python
    python translator.py
- When prompted, enter a valid language code (fr, es, or hi), and then, if using a microphone, speak clearly and say “where is the station?” or some other phrase you might use when traveling abroad. The program should transcribe your spoken input and translate it to the language you specified (French, Spanish, or Hindi). Repeat this process, trying each language supported by the application. When you’re finished, press ENTER to end the program.
The TranslationRecognizer gives you around 5 seconds to speak. If it detects no spoken input, it produces a “No match” result. The translation to Hindi may not always be displayed correctly in the Console window due to character encoding issues.
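The lab code prints the translation without first checking whether recognition succeeded. As a rough sketch of how the “No match” case could be reported more clearly, the function below maps the name of a result reason to a message. The function name describe_outcome and the message wording are illustrative assumptions; the sketch also assumes that the value passed in is result.reason.name, where reason is a member of the SDK’s ResultReason enumeration.

```python
# Sketch only: turn the name of a result reason into a message for the user.
# Assumption: the value passed in is result.reason.name, where result comes
# from recognize_once_async().get() and reason is a ResultReason enum member.
_MESSAGES = {
    "TranslatedSpeech": "Speech was recognized and translated.",
    "RecognizedSpeech": "Speech was recognized, but no translation was produced.",
    "NoMatch": "No speech could be recognized. Try speaking sooner or louder.",
    "Canceled": "Recognition was canceled. Check the key, region, and network.",
}


def describe_outcome(reason_name: str) -> str:
    """Return a human-readable message for a result reason name."""
    return _MESSAGES.get(reason_name, f"Unexpected result reason: {reason_name}")


if __name__ == "__main__":
    # Stand-in value for illustration; a real value would come from the SDK.
    print(describe_outcome("NoMatch"))
```

In the Translate function, a call such as describe_outcome(result.reason.name) could follow recognize_once_async().get(), with result.translations read only when the reason is TranslatedSpeech.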
NOTE: The code in your application translates the input to all three languages in a single call. Only the translation for the specific language is displayed, but you could retrieve any of the translations by specifying the target language code in the translations collection of the result.
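To illustrate the note above, the following sketch lists every translation in the collection rather than only the selected one. It assumes the collection behaves like the dictionary that the Python SDK exposes as result.translations (language code mapped to translated text). The function name format_translations and the sample phrases are illustrative stand-ins, not output from the service.

```python
from typing import List, Mapping


# Sketch only. Assumption: `translations` behaves like the dict exposed as
# result.translations in the Python Speech SDK (language code -> text).
def format_translations(translations: Mapping[str, str]) -> List[str]:
    """Return one 'code: text' line per target language, sorted by code."""
    return [f"{code}: {text}" for code, text in sorted(translations.items())]


if __name__ == "__main__":
    # Stand-in data for illustration; real values come from result.translations.
    sample = {"fr": "Où est la gare ?", "es": "¿Dónde está la estación?"}
    for line in format_translations(sample):
        print(line)
```

Sorting by language code keeps the output order stable between runs, which makes it easier to compare results while testing each language.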
Synthesize the translation to speech
So far, your application translates spoken input to text, which might be sufficient if you need to ask someone for help while traveling. However, it would be better to have the translation spoken aloud in a suitable voice.
- In the Translate function, under the comment Synthesize translation, add the following code to use a SpeechSynthesizer client to synthesize the translation as speech through the default speaker:
C#: Program.cs
    // Synthesize translation
    var voices = new Dictionary<string, string>
    {
        ["fr"] = "fr-FR-HenriNeural",
        ["es"] = "es-ES-ElviraNeural",
        ["hi"] = "hi-IN-MadhurNeural"
    };
    speechConfig.SpeechSynthesisVoiceName = voices[targetLanguage];
    using SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);
    SpeechSynthesisResult speak = await speechSynthesizer.SpeakTextAsync(translation);
    if (speak.Reason != ResultReason.SynthesizingAudioCompleted)
    {
        Console.WriteLine(speak.Reason);
    }
Python: translator.py
    # Synthesize translation
    voices = {
        "fr": "fr-FR-HenriNeural",
        "es": "es-ES-ElviraNeural",
        "hi": "hi-IN-MadhurNeural"
    }
    speech_config.speech_synthesis_voice_name = voices.get(targetLanguage)
    speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config)
    speak = speech_synthesizer.speak_text_async(translation).get()
    if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted:
        print(speak.reason)
- Save your changes and return to the integrated terminal for the translator folder, and enter the following command to run the program:
C#
    dotnet run
Python
    python translator.py
- When prompted, enter a valid language code (fr, es, or hi), and then speak clearly into the microphone and say a phrase you might use when traveling abroad. The program should transcribe your spoken input and respond with a spoken translation. Repeat this process, trying each language supported by the application. When you’re finished, press ENTER to end the program.
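In the Python version of the synthesis code, voices.get(targetLanguage) returns None when the language code is not in the dictionary, so an unsupported code is passed along silently. One possible alternative, shown here as a sketch, is a lookup that fails with an explicit message. The voice names are copied from the lab code; the function name voice_for and the error wording are hypothetical.

```python
# Voice names copied from the lab code; the helper itself is hypothetical.
VOICES = {
    "fr": "fr-FR-HenriNeural",
    "es": "es-ES-ElviraNeural",
    "hi": "hi-IN-MadhurNeural",
}


def voice_for(target_language: str) -> str:
    """Return the voice for a language code, or raise ValueError if unsupported."""
    code = target_language.strip().lower()
    if code not in VOICES:
        supported = ", ".join(sorted(VOICES))
        raise ValueError(
            f"Unsupported language code {target_language!r}; expected one of: {supported}"
        )
    return VOICES[code]


if __name__ == "__main__":
    print(voice_for("fr"))
```

With a helper like this, the assignment in the lab code could become speech_config.speech_synthesis_voice_name = voice_for(targetLanguage), and the caller could catch ValueError to prompt for another code.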
NOTE In this example, you’ve used a SpeechTranslationConfig to translate speech to text, and then used a SpeechConfig to synthesize the translation as speech. You can in fact use the SpeechTranslationConfig to synthesize the translation directly, but this only works when translating to a single language, and results in an audio stream that is typically saved as a file rather than sent directly to a speaker.
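The direct approach mentioned in the note can be sketched as an event handler that saves the synthesized audio to a file. This is an illustration under stated assumptions rather than part of the lab: it assumes that the recognizer raises a synthesizing event whose argument carries the audio as bytes in evt.result.audio, and that an empty payload marks the end of synthesis. The function name make_audio_saver and the file name are hypothetical, and the demonstration uses stand-in event objects instead of the SDK.

```python
import os
import tempfile
from types import SimpleNamespace


def make_audio_saver(path: str):
    """Build a handler that writes synthesized translation audio to path.

    Assumption: each event carries the audio as bytes in evt.result.audio,
    and an empty payload signals that synthesis has finished.
    """
    def on_synthesizing(evt) -> None:
        audio = evt.result.audio
        if len(audio) > 0:
            # Overwrite the file with the latest non-empty audio payload.
            with open(path, "wb") as wav_file:
                wav_file.write(audio)

    return on_synthesizing


if __name__ == "__main__":
    # Stand-in events for illustration; real events come from the recognizer.
    target = os.path.join(tempfile.mkdtemp(), "translation.wav")
    handler = make_audio_saver(target)
    handler(SimpleNamespace(result=SimpleNamespace(audio=b"placeholder-bytes")))
    handler(SimpleNamespace(result=SimpleNamespace(audio=b"")))
    print(os.path.getsize(target), "bytes written")
```

To try this with the SDK, one would set translation_config.voice_name to a voice for the single target language before creating the TranslationRecognizer, and then register the handler with translator.synthesizing.connect(make_audio_saver("translation.wav")) before calling recognize_once_async().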
More information
For more information about using the Azure AI Speech translation API, see the Speech translation documentation.