Develop a vision-enabled chat app

In this exercise, you use a generative AI model to generate responses to prompts that include images. You’ll develop an app that provides AI assistance with fresh produce in a grocery store by using Microsoft Foundry and the OpenAI SDK.

Note: This exercise is based on pre-release SDK software, which may be subject to change. Where necessary, we’ve used specific versions of packages; which may not reflect the latest available versions. You may experience some unexpected behavior, warnings, or errors.

While this exercise is based on the OpenAI Python SDK, you can develop AI chat applications using multiple language-specific SDKs; including:

This exercise takes approximately 30 minutes.

Prerequisites

Before starting this exercise, ensure you have:

Create a Microsoft Foundry project

Microsoft Foundry uses projects to organize models, resources, data, and other assets used to develop an AI solution.

  1. In a web browser, open the Microsoft Foundry portal at https://ai.azure.com and sign in using your Azure credentials. Close any tips or quick start panes that are opened the first time you sign in, and if necessary use the Foundry logo at the top left to navigate to the home page.

  2. If it is not already enabled, in the tool bar the top of the page, enable the New Foundry option. Then, if prompted, create a new project with a unique name; expanding the Advanced options area to specify the following settings for your project:
    • Foundry resource: Use the default name for your resource (usually {project_name}-resource)
    • Subscription: Your Azure subscription
    • Resource group: Create or select a resource group
    • Region: Select any available region
  3. Select Create. Wait for your project to be created.
  4. On the home page for your project, note the project endpoint, key, and OpenAI endpoint.

    TIP: You’re going to need the Azure OpenAI endpoint later!

Deploy a model

You’ll need a model that can process image-based input.

  1. On the project home page, in the Start building menu, select Browse models to view the Microsoft Foundry model catalog.

  2. Search for and deploy the gpt-4.1-mini model using the default settings. Deployment may take a minute or so.

    Tip: Model deployments are subject to regional quotas. If you don’t have enough quota to deploy the model in your project’s region, you can use a different model - such as gpt-4.1, or gpt-4o. Alternatively, you can create a new project in a different region.

  3. When the model has been deployed, view the model playground page that is opened, in which you can chat with the model.

    TIP: Note the model deployment name (which by default should be gpt-4.1-mini) - you’ll need this later!

Test the model in the playground

Now you can test your model deployment with an image-based prompt in the chat playground.

  1. In a new browser tab, download mango.jpeg from https://microsoftlearning.github.io/mslearn-ai-vision/Labfiles/gen-ai-vision/mango.jpeg and save it to a folder on your local file system.

  2. Navigate back to the chat playground page for your model deployment in the Foundry portal.

  3. In the main chat session panel, under the chat input box, use the attach button (📎) to upload the mango.jpeg image file, and then add the text What desserts could I make with this fruit? and submit the prompt.

    Screenshot of the chat playground page.

  4. Review the response, which should hopefully provide relevant guidance for desserts you can make using a mango.

Create a client application

Now that you’ve deployed the model, you can use the deployment in a client application.

Get application files from GitHub

The initial application files you’ll need to develop the translation application are provided in a GitHub repo.

  1. Open Visual Studio Code.
  2. Open the command palette (Ctrl+Shift+P) and use the Git:clone command to clone the https://github.com/microsoftlearning/mslearn-ai-vision repo to a local folder (it doesn’t matter which one). Then open it.

    You may be prompted to confirm you trust the authors.

  3. In Visual Studio Code, view the Extensions pane; and if it is not already installed, install the Python extension.
  4. In the Command Palette, use the command python:select interpreter. Then select an existing environment if you have one, or create a new Venv environment based on your Python 3.1x installation.

    Tip: If you are prompted to install dependencies, you can install the ones in the requirements.txt file in the /labfiles/gen-ai-vision/python folder; but it’s OK if you don’t - we’ll install them later!

    Tip: If you prefer to use the terminal, you can create your Venv environment with python -m venv labenv, then activate it with \labenv\Scripts\activate.

Prepare the application configuration

  1. After the repo has been cloned, open the folder in VS Code (File > Open Folder), and navigate to the /labfiles/gen-ai-vision/python folder.

  2. In the VS Code Explorer pane, review the files in the folder:

    • .env - A configuration file for application settings.
    • image-chat-app.py - The Python code file for the image application.
    • requirements.txt - A file listing the package dependencies.
    • mystery-fruit.jpeg - An image of a fruit.
  3. In the Explorer pane, in the python folder, select the .env file to open it. Then update the configuration values to include the Azure OpenAI endpoint for your Foundry resource, and the model deployment name for the generative AI model you deployed.

    Important:Be sure to add the https://{foundry-resource-name}.openai.azure.com/openai/v1/ Azure openAI endpoint, not the project endpoint!

    Save the modified configuration file.

  4. In the Explorer pane, right-click the python folder containing the application files, and select Open in integrated terminal (or open a terminal in the Terminal menu and navigate to the /labfiles/gen-ai-vision/python folder.)

    Note: Opening the terminal in Visual Studio Code will automatically activate the Python environment. You may need to enable running scripts on your system.

  5. Ensure that the terminal is open in the /labfiles/gen-ai-vision/python folder with the prefix (.venv) to indicate that the Python environment you created is active.
  6. Install the required Python packages by running the following command:

     pip install -r requirements.txt
    

Write code to get an OpenAI chat client for your model

Tip: As you add code, be sure to maintain the correct indentation.

  1. In VS Code, open the image-chat-app.py file.

  2. In the code file, note the existing statements that have been added at the top of the file to import the necessary SDK namespaces. Then, Find the comment Add references, add the following code to reference the namespaces in the libraries you installed previously:

    # Add references
    from openai import OpenAI
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider
    
  3. In the main function, under the comment Get configuration settings, note that the code loads the project connection string and model deployment name values you defined in the configuration file.

  4. Find the comment ICreate an OpenAI client, and add the following code to connect to your Azure AI Foundry project:

    Tip: Be careful to maintain the correct indentation level for your code.

    # Create an OpenAI client
    credential = DefaultAzureCredential()
    token_provider = get_bearer_token_provider(credential, "https://ai.azure.com/.default")
    client = OpenAI(
        base_url=endpoint,
        api_key=token_provider()
    )
    

Write code to submit a URL-based image prompt

  1. Note that the code includes a loop to allow a user to input a prompt until they enter “quit”. Then in the loop section, find the comment Get a response to image input, add the following code to submit a prompt that includes the following image:

    A photo of an orange.

    # Get a response to image input
    image_url = "https://microsoftlearning.github.io/mslearn-ai-vision/Labfiles/gen-ai-vision/orange.jpeg"
    image_format = "jpeg"
    request = Request(image_url, headers={"User-Agent": "Mozilla/5.0"})
    image_data = base64.b64encode(urlopen(request).read()).decode("utf-8")
    data_url = f"data:image/{image_format};base64,{image_data}"
    
    response = client.chat.completions.create(
         model=model_deployment,
         messages=[
             {"role": "system", "content": system_message},
             { "role": "user", "content": [  
                 { "type": "text", "text": prompt},
                 { "type": "image_url", "image_url": {"url": data_url}}
             ] } 
         ]
    )
    print(response.choices[0].message.content)
    
  2. Save your changes to the code file.

Sign into Azure and run the app

  1. In the VS Code terminal, sign into Azure:

     az login
    

    You must sign into Azure to authenticate with your Azure OpenAI resource.

    Note: In most scenarios, just using az login will be sufficient. However, if you have subscriptions in multiple tenants, you may need to specify the tenant by using the –tenant parameter.

  2. When prompted, follow the instructions to open the sign-in page in a new tab and enter the authentication code provided and your Azure credentials.

  3. After you have signed in, run the application:

    python image-chat-app.py
    
  4. When prompted, enter the following prompt:

    Suggest some recipes that include this fruit
    
  5. Review the response. Then enter quit to exit the program.

Modify the code to upload a local image file

  1. In the code editor for your app code, in the loop section, find the code you added previously under the comment Get a response to image input. Then modify the code as follows, to upload this local image file:

    A photo of a dragon fruit.

    # Get a response to image input
    script_dir = Path(__file__).parent  # Get the directory of the script
    image_path = script_dir / 'mystery-fruit.jpeg'
    mime_type = "image/jpeg"
    
    # Read and encode the image file
    with open(image_path, "rb") as image_file:
         base64_encoded_data = base64.b64encode(image_file.read()).decode('utf-8')
    
    # Include the image file data in the prompt
    data_url = f"data:{mime_type};base64,{base64_encoded_data}"
    response = client.chat.completions.create(
             model=model_deployment,
             messages=[
                 {"role": "system", "content": system_message},
                 { "role": "user", "content": [  
                     { "type": "text", "text": prompt},
                     { "type": "image_url", "image_url": {"url": data_url}}
                 ] } 
             ]
    )
    print(response.choices[0].message.content)
    
  2. Use the CTRL+S command to save your changes to the code file.

  3. In the terminal, enter the following command to run the app:

    python image-chat-app.py
    
  4. When prompted, enter the following prompt:

    What is this fruit? What recipes could I use it in?
    
  5. Review the response. Then enter quit to exit the program.

    Note: In this simple app, we haven’t implemented logic to retain conversation history; so the model will treat each prompt as a new request with no context of the previous prompt.

Clean up

If you’ve finished exploring Azure AI Foundry portal, you should delete the resources you have created in this exercise to avoid incurring unnecessary Azure costs.

  1. Open the Azure portal and view the contents of the resource group where you deployed the resources used in this exercise.
  2. On the toolbar, select Delete resource group.
  3. Enter the resource group name and confirm that you want to delete it.