Develop a multimodal generative AI app

In this exercise, you use the Phi-4-multimodal-instruct generative AI model to generate responses to prompts that include text, images, and audio. You’ll develop an app that provides AI assistance with fresh produce in a grocery store by using Azure AI Foundry and the Azure AI Model Inference service.

This exercise takes approximately 30 minutes.

Note: This exercise is based on pre-release SDKs, which may be subject to change. Where necessary, we’ve used specific versions of packages; which may not reflect the latest available versions.

Create an Azure AI Foundry project

Let’s start by creating an Azure AI Foundry project.

  1. In a web browser, open the Azure AI Foundry portal at https://ai.azure.com and sign in using your Azure credentials. Close any tips or quick start panes that are opened the first time you sign in, and if necessary use the Azure AI Foundry logo at the top left to navigate to the home page, which looks similar to the following image:

    Screenshot of Azure AI Foundry portal.

  2. In the home page, select + Create project.
  3. In the Create a project wizard, enter a suitable project name for (for example, my-ai-project) then review the Azure resources that will be automatically created to support your project.
  4. Select Customize and specify the following settings for your hub:
    • Hub name: A unique name - for example my-ai-hub
    • Subscription: Your Azure subscription
    • Resource group: Create a new resource group with a unique name (for example, my-ai-resources), or select an existing one
    • Location: Select any of the following regions*:
      • East US
      • East US 2
      • North Central US
      • South Central US
      • Sweden Central
      • West US
      • West US 3
    • Connect Azure AI Services or Azure OpenAI: Create a new AI Services resource with an appropriate name (for example, my-ai-services) or use an existing one
    • Connect Azure AI Search: Skip connecting

    * At the time of writing, the Microsoft Phi-4-multimodal-instruct model we’re going to use in this exercise is available in these regions. You can check the latest regional availability for specific models in the Azure AI Foundry documentation. In the event of a regional quota limit being reached later in the exercise, there’s a possibility you may need to create another resource in a different region.

  5. Select Next and review your configuration. Then select Create and wait for the process to complete.
  6. When your project is created, close any tips that are displayed and review the project page in Azure AI Foundry portal, which should look similar to the following image:

    Screenshot of a Azure AI project details in Azure AI Foundry portal.

Deploy a model

Now you’re ready to deploy a Phi-4-multimodal-instruct model to support multimodal prompts.

  1. In the toolbar at the top right of your Azure AI Foundry project page, use the Preview features icon to enable the Deploy models to Azure AI model inference service feature. This feature ensures your model deployment is available to the Azure AI Inference service, which you’ll use in your application code.
  2. In the pane on the left for your project, in the My assets section, select the Models + endpoints page.
  3. In the Models + endpoints page, in the Model deployments tab, in the + Deploy model menu, select Deploy base model.
  4. Search for the Phi-4-multimodal-instruct model in the list, and then select and confirm it.
  5. Agree to the license agreement if prompted, and then deploy the model with the following settings by selecting Customize in the deployment details:
    • Deployment name: A unique name for your model deployment - for example Phi-4-multimodal (remember the name you assign, you’ll need it later)
    • Deployment type: Global Standard
    • Deployment details: Use the default settings
  6. Wait for the deployment provisioning state to be Completed.

Create a client application

Now that you’ve deployed the model, you can use the deployment in a client application.

Tip: You can choose to develop your solution using Python or Microsoft C# (coming soon). Follow the instructions in the appropriate section for your chosen language.

Prepare the application configuration

  1. In the Azure AI Foundry portal, view the Overview page for your project.
  2. In the Project details area, note the Project connection string. You’ll use this connection string to connect to your project in a client application.
  3. Open a new browser tab (keeping the Azure AI Foundry portal open in the existing tab). Then in the new tab, browse to the Azure portal at https://portal.azure.com; signing in with your Azure credentials if prompted.
  4. Use the [>_] button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a PowerShell environment. The cloud shell provides a command line interface in a pane at the bottom of the Azure portal.

    Note: If you have previously created a cloud shell that uses a Bash environment, switch it to PowerShell.

  5. In the cloud shell toolbar, in the Settings menu, select Go to Classic version (this is required to use the code editor).

  6. In the PowerShell pane, enter the following commands to clone the GitHub repo for this exercise:

     rm -r mslearn-ai-foundry -f
     git clone https://github.com/microsoftlearning/mslearn-ai-studio mslearn-ai-foundry
    

    Tip: As you paste commands into the cloudshell, the ouput may take up a large amount of the screen buffer. You can clear the screen by entering the cls command to make it easier to focus on each task.

  7. After the repo has been cloned, navigate to the folder containing the application code files:

    Python

    cd mslearn-ai-foundry/labfiles/multimodal/python
    

    C#

    cd mslearn-ai-foundry/labfiles/multimodal/c-sharp
    
  8. In the cloud shell command line pane, enter the following command to install the libraries you’ll use:

    Python

    pip install python-dotenv azure-identity azure-ai-projects azure-ai-inference
    

    C#

    dotnet add package Azure.Identity
    dotnet add package Azure.AI.Projects --version 1.0.0-beta.3
    dotnet add package Azure.AI.Inference --version 1.0.0-beta.3
    
  9. Enter the following command to edit the configuration file that has been provided:

    Python

    code .env
    

    C#

    code appsettings.json
    

    The file is opened in a code editor.

  10. In the code file, replace the your_project_endpoint placeholder with the connection string for your project (copied from the project Overview page in the Azure AI Foundry portal), and the your_model_deployment placeholder with the name you assigned to your Phi-4-multimodal-instruct model deployment.
  11. After you’ve replaced the placeholders, within the code editor, use the CTRL+S command or Right-click > Save to save your changes and then use the CTRL+Q command or Right-click > Quit to close the code editor while keeping the cloud shell command line open.

Write code to connect to your project and get a chat client for your model

Tip: As you add code, be sure to maintain the correct indentation.

  1. Enter the following command to edit the code file that has been provided:

    Python

    code chat-app.py
    

    C#

    code Program.cs
    
  2. In the code file, note the existing statements that have been added at the top of the file to import the necessary SDK namespaces. Then, under the comment Add references, add the following code to reference the namespaces in the libraries you installed previously:

    Python

    from dotenv import load_dotenv
    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient
    from azure.ai.inference.models import (
        SystemMessage,
        UserMessage,
        TextContentItem,
        ImageContentItem,
        ImageUrl,
        AudioContentItem,
        InputAudio,
        AudioContentFormat,
    )
    

    C#

    using Azure.Identity;
    using Azure.AI.Projects;
    using Azure.AI.Inference;
    
  3. In the main function, under the comment Get configuration settings, note that the code loads the project connection string and model deployment name values you defined in the configuration file.
  4. Under the comment Initialize the project client, add the following code to connect to your Azure AI Foundry project using the Azure credentials you are currently signed in with:

    Python

    project_client = AIProjectClient.from_connection_string(
         conn_str=project_connection,
         credential=DefaultAzureCredential())
    

    C#

    var projectClient = new AIProjectClient(project_connection,
                         new DefaultAzureCredential());
    
  5. Under the comment Get a chat client, add the following code to create a client object for chatting with your model:

    Python

    chat_client = project_client.inference.get_chat_completions_client(model=model_deployment)
    

    C#

    ChatCompletionsClient chat = projectClient.GetChatCompletionsClient();
    

Write code to use a text-based prompt

  1. Note that the code includes a loop to allow a user to input a prompt until they enter “quit”. Then in the loop section, under the comment Get a response to text input, add the following code to submit a text-based prompt and retrieve the response from your model:

    Python

    response = chat_client.complete(
        messages=[
            SystemMessage(system_message),
            UserMessage(content=[TextContentItem(text= prompt)])
        ])
    print(response.choices[0].message.content)
    

    C#

    var requestOptions = new ChatCompletionsOptions()
    {
    Model = model_deployment,
    Messages =
        {
            new ChatRequestSystemMessage(system_message),
            new ChatRequestUserMessage(prompt),
        }
    };
    
    Response<ChatCompletions> response = chat.Complete(requestOptions);
    Console.WriteLine(response.Value.Content);
    
  2. Use the CTRL+S command to save your changes to the code file - don’t close it yet though.

  3. In the cloud shell command line pane beneath the code editor, enter the following command to run the app:

    Python

    python chat-app.py
    

    C#

    dotnet run
    
  4. When prompted, enter 1 to use a text-based prompt and then enter the prompt I want to make an apple pie. What kind of apple should I use?
  5. Review the response. Then enter quit to exit the program.

Write code to use an image-based prompt

  1. In the code editor for the chat-app.py file, in the loop section, under the comment Get a response to image input, add the following code to submit a prompt that includes the following image:

    A photo of an orange.

    Python

    image_url = "https://github.com/microsoftlearning/mslearn-ai-studio/raw/refs/heads/main/labfiles/multimodal/orange.jpg"
    image_format = "jpeg"
    request = Request(image_url, headers={"User-Agent": "Mozilla/5.0"})
    image_data = base64.b64encode(urlopen(request).read()).decode("utf-8")
    data_url = f"data:image/{image_format};base64,{image_data}"
    
    response = chat_client.complete(
        messages=[
            SystemMessage(system_message),
            UserMessage(content=[
                TextContentItem(text=prompt),
                ImageContentItem(image_url=ImageUrl(url=data_url))
            ]),
        ]
    )
    print(response.choices[0].message.content)
    

    C#

      string imageUrl = "https://github.com/microsoftlearning/mslearn-ai-studio/raw/refs/heads/main/labfiles/multimodal/orange.jpg";
    ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
    {
        Messages = {
            new ChatRequestSystemMessage(system_message),
            new ChatRequestUserMessage([
                new ChatMessageTextContentItem(prompt),
                new ChatMessageImageContentItem(new Uri(imageUrl))
            ]),
        },
        Model = model_deployment
    };
    var response = chat.Complete(requestOptions);
    Console.WriteLine(response.Value.Content);
    
  2. Use the CTRL+S command to save your changes to the code file - don’t close it yet though.

  3. In the cloud shell command line pane beneath the code editor, enter the following command to run the app:

    Python

    python chat-app.py
    

    C#

    dotnet run
    
  4. When prompted, enter 2 to use an image-based prompt and then enter the prompt I don't know what kind of fruit this is. Can you identify it, and tell me what kinds of food I could make with it?
  5. Review the response. Then enter quit to exit the program.

Write code to use an audio-based prompt

  1. In the code editor for the chat-app.py file, in the loop section, under the comment Get a response to audio input, add the following code to submit a prompt that includes the following audio:

    Python

    file_path="https://github.com/microsoftlearning/mslearn-ai-studio/raw/refs/heads/main/labfiles/multimodal/manzanas.mp3"
    response = chat_client.complete(
            messages=[
                SystemMessage(system_message),
                UserMessage(
                    [
                        TextContentItem(text=prompt),
                        {
                            "type": "audio_url",
                            "audio_url": {"url": file_path}
                        }
                    ]
                )
            ]
        )
    print(response.choices[0].message.content)
    

    C#

    string audioUrl="https://github.com/microsoftlearning/mslearn-ai-studio/raw/refs/heads/main/labfiles/multimodal/manzanas.mp3";
    var requestOptions = new ChatCompletionsOptions()
    {
        Messages =
        {
            new ChatRequestSystemMessage(system_message),
            new ChatRequestUserMessage(
                new ChatMessageTextContentItem(prompt),
                new ChatMessageAudioContentItem(new Uri(audioUrl))),
        },
        Model = model_deployment
    };
    var response = chat.Complete(requestOptions);
    Console.WriteLine(response.Value.Content);
    
  2. Use the CTRL+S command to save your changes to the code file. You can also close the code editor (CTRL+Q) if you like.

  3. In the cloud shell command line pane beneath the code editor, enter the following command to run the app:

    Python

    python chat-app.py
    

    C#

    dotnet run
    
  4. When prompted, enter 3 to use an audio-based prompt and then enter the prompt What is this customer saying in English?
  5. Review the response.
  6. You can continue to run the app, choosing different prompt types and trying different prompts. When you’re finished, enter quit to exit the program.

    If you have time, you can modify the code to use a different system prompt and your own internet-accessible image and audio files.

    Note: In this simple app, we haven’t implemented logic to retain conversation history; so the model will treat each prompt as a new request with no context of the previous prompt.

Summary

In this exercise, you used Azure AI Foundry and the Azure AI Inference SDK to create a client application uses a multimodal model to generate responses to text, images, and audio.

Clean up

If you’ve finished exploring Azure AI Foundry, you should delete the resources you have created in this exercise to avoid incurring unnecessary Azure costs.

  1. Return to the browser tab containing the Azure portal (or re-open the Azure portal at https://portal.azure.com in a new browser tab) and view the contents of the resource group where you deployed the resources used in this exercise.
  2. On the toolbar, select Delete resource group.
  3. Enter the resource group name and confirm that you want to delete it.