Get started with information extraction in Microsoft Foundry
In this exercise, you’ll use Azure Content Understanding in Microsoft Foundry, Microsoft’s platform for building intelligent applications.
Azure Content Understanding is a Foundry service that uses AI models to turn unstructured, multimodal content (documents, images, video, audio) into structured, usable outputs like JSON. It processes content by extracting, classifying, and generating fields with confidence scores and source grounding.
This exercise takes approximately 25 minutes.
Note: This exercise utilizes the new Foundry portal experience.
Create a Microsoft Foundry project
-
In a web browser, open Microsoft Foundry at https://ai.azure.com to start building, and sign in using your Azure credentials.
-
If it is not already enabled, in the toolbar at the top of the page, enable the New Foundry option. Then, if prompted, create a new project with a unique name, expanding the Advanced options area to specify the following settings for your project:
- Foundry resource: Enter a valid name for your AI Foundry resource.
- Subscription: Your Azure subscription
- Resource group: Create or select a resource group
- Region: Select West US, Sweden Central, Australia East, or any of the regions in this list
Note: Depending on your permissions in the Azure subscription, you may need to clear the option to set up recommended resources.
-
Deselect the option to Set up recommended resources…, and then select Create.

-
Wait for your project to be created. It may take a few minutes. After the project is created, the new Foundry portal should take you to a list of your projects (you may need to refresh the page to see your new project). Select the project you just created to open a page similar to the following image:

Tip: Close any suggestions or ‘get-started’ tutorials that may appear on the home page.
Extract information from documents in the new Foundry portal
- In the Foundry portal, navigate to the tool bar at the top of the screen and select Build.
- On the Build page, in the menu on the left side of the screen (which you may need to expand), select Deployments. Then, at the top of the Deployments page, select AI Services.
- Identify the Content Understanding capabilities you can try out in a Foundry playground setting:
- Content Understanding - Read: Raw text extraction only. Answers the question, “What text is here?”
- Content Understanding - Layout: Adds structure, hierarchy, and positioning. Answers the question, “How is this content organized?”
- Content Understanding: Offers the full analyzer capability, extracting fields and structure and generating insights. Answers the question, “What does this content mean and what should I do with it?”
Try out Content Understanding’s Read capabilities
-
Select Content Understanding - Read. The Read capability is the first step in content understanding—it reads and extracts text, but doesn’t try to understand structure or meaning yet.
-
Select the sample read_barcode.pdf and use the Run analysis button to extract information from the document. When analysis is complete, view the results.

-
Select the back button to return to the previous page to test out other capabilities.
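You can also process the Read output in code. The following is a minimal sketch and is not part of the Content Understanding SDK. It makes three assumptions:

- You have the raw analysis JSON as a Python dictionary.
- The dictionary is shaped like the sample output shown later in this exercise (`contents` → `pages` → `words`). Each word has `content`, `confidence`, and a `source` string such as `D(1,213,217,768,201,768,296,214,310)`.
- `source` is a page number followed by four x,y corner points. This reading is inferred from that sample.

The helper names `parse_source` and `extract_words` are hypothetical.

```python
import re
from typing import Any

# Assumed format, inferred from the sample output: D(page,x1,y1,x2,y2,x3,y3,x4,y4)
_SOURCE_PATTERN = re.compile(r"^D\((\d+),([\d.,\s]+)\)$")


def parse_source(source: str) -> tuple[int, list[tuple[float, float]]]:
    """Split a D(...) source string into a page number and a list of (x, y) points."""
    match = _SOURCE_PATTERN.match(source.strip())
    if match is None:
        raise ValueError(f"Unrecognized source format: {source!r}")
    numbers = [float(n) for n in match.group(2).split(",")]
    # Pair consecutive numbers into (x, y) coordinates.
    points = list(zip(numbers[0::2], numbers[1::2]))
    return int(match.group(1)), points


def extract_words(response: dict[str, Any], min_confidence: float = 0.0) -> list[dict[str, Any]]:
    """Collect words (text, confidence, page, polygon) from every page of every content item."""
    # Accept either the full response (with a top-level "result" key) or just the result.
    result = response.get("result", response)
    words = []
    for content in result.get("contents", []):
        for page in content.get("pages", []):
            for word in page.get("words", []):
                confidence = word.get("confidence", 0.0)
                if confidence < min_confidence:
                    continue  # Skip words below the requested confidence.
                page_number, polygon = parse_source(word["source"])
                words.append({
                    "text": word["content"],
                    "confidence": confidence,
                    "page": page_number,
                    "polygon": polygon,
                })
    return words
```

For example, `extract_words(response, min_confidence=0.99)` would keep only the words that the service reported with at least 99% confidence, each with its page number and bounding polygon.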
Try out Content Understanding’s Layout capabilities
-
On the AI Services tab, select Content Understanding - Layout.
-
Select the sample layout_checklist.jpg and use the Run analysis button to extract information from it. When analysis is complete, view the results.

-
In the content output, select the Tables tab. Review how the Layout analyzer is able to capture both the text and structure of the content.

-
Select the back button to return to the previous page to test out other capabilities.
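You can also rebuild the table structure shown on the Tables tab from the raw JSON. This is a minimal sketch under an assumption about the JSON shape, so check it against your own Result JSON:

- Each item in `contents` carries a `tables` list.
- Each table has `rowCount`, `columnCount`, and `cells`.
- Each cell has `rowIndex`, `columnIndex`, and `content`.

The functions `table_to_grid`, `extract_tables`, and `print_grid` are hypothetical helpers.

```python
from typing import Any


def table_to_grid(table: dict[str, Any]) -> list[list[str]]:
    """Rebuild a row-by-column grid of cell text from one (assumed) table object."""
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table.get("cells", []):
        # A cell spanning several rows or columns is written only at its top-left position.
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell.get("content", "")
    return grid


def extract_tables(response: dict[str, Any]) -> list[list[list[str]]]:
    """Return every table as a grid; accepts the full response or just its result."""
    result = response.get("result", response)
    return [
        table_to_grid(table)
        for content in result.get("contents", [])
        for table in content.get("tables", [])
    ]


def print_grid(grid: list[list[str]]) -> None:
    """Print a grid as tab-separated rows."""
    for row in grid:
        print("\t".join(row))
```

Writing spanned cells only at their top-left position keeps the sketch simple. If your tables use merged cells heavily, you might instead copy the text into every position the cell covers.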
Try out Content Understanding’s other analyzer capabilities
-
On the AI Services tab, select Content Understanding to test another of Azure Content Understanding’s analyzers.
-
On the Content Understanding page, select the Document modality.

-
Next to the Document modality, select Document fields from the dropdown menu. If asked to deploy models that aren’t configured yet, select Deploy models.
Tip: Document fields and other complex extraction tasks require deploying multiple AI models, because each deployment is tied to a specific model version or capability. Using multiple models in Microsoft Foundry lets you handle different types of processing tasks more effectively, with the flexibility to choose the right model for each need.
-
Select a recommended Chat completion model and Embedding model from the drop-down menus. Then select Apply changes. Once the changes are applied, you can close the Configure panel.
-
Let’s try to use the full analyzer with our own invoice. Open a new browser window and enter the following URL to download contoso-invoice-1.pdf:
https://raw.githubusercontent.com/MicrosoftLearning/mslearn-ai-fundamentals/refs/heads/main/data/content-understanding/contoso-invoice-1.pdf
-
Back in the Content Understanding playground in the Foundry portal, use the Browse for files link to upload the contoso-invoice-1.pdf document you just downloaded. Select Run analysis and review the results. Notice that not only is the text rendered, but its layout is captured, and the fields are organized into cohesive categories.

-
In the pane on the right where the extracted fields are displayed, view the Result tab to see the raw results in JSON. Identify the analyzerId field, which identifies the analyzer that was used. You can find a list of prebuilt Content Understanding analyzers in the Content Understanding documentation.

Tip: The Fields tab displays the same information as the raw JSON on the Result tab, in a more user-friendly way.
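Every extracted field comes with a confidence score, so a common next step is to flag low-confidence values for human review. This minimal sketch assumes the following about the Result JSON:

- Fields are stored under `contents[i]["fields"]` as a dictionary keyed by field name.
- Each field has a `confidence` and a typed value key that begins with `value` (for example `valueString` or `valueNumber`).

That naming is an assumption to verify against your own output. The 0.8 threshold and the helper names are hypothetical.

```python
from typing import Any, Optional

# (field name, value, confidence, needs_review)
FieldRow = tuple[str, Any, Optional[float], bool]


def field_value(field: dict[str, Any]) -> Any:
    """Return the first entry whose key starts with "value" (assumed naming, e.g. valueString)."""
    for key, value in field.items():
        if key.startswith("value"):
            return value
    return None


def summarize_fields(
    response: dict[str, Any], review_threshold: float = 0.8
) -> tuple[Optional[str], list[FieldRow]]:
    """Return the analyzer ID and one row per extracted field."""
    # Accept either the full response (with a top-level "result" key) or just the result.
    result = response.get("result", response)
    rows: list[FieldRow] = []
    for content in result.get("contents", []):
        for name, field in content.get("fields", {}).items():
            confidence = field.get("confidence")
            # A missing confidence is treated as needing review, too.
            needs_review = confidence is None or confidence < review_threshold
            rows.append((name, field_value(field), confidence, needs_review))
    return result.get("analyzerId"), rows
```

Under the same assumption, nested fields (for example, invoice line items stored as an array of objects) come back as their raw nested structure. You would recurse into them if you need per-item confidence.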
Understand how to extract content with the Python SDK
As a developer, you can also use code to extract meaning from content. The Foundry playground provides various code samples to get you started with information extraction with Azure Content Understanding.

-
Let’s take a closer look at the Python code for document layout analysis. In the Content Understanding playground, select the Code tab, then select Modality: Document and the Layout analyzer. The following code is provided:
```python
import sys
import json

from azure.ai.contentunderstanding import ContentUnderstandingClient
from azure.ai.contentunderstanding.models import AnalysisInput, AnalysisResult
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import AzureError
from azure.identity import DefaultAzureCredential


def main() -> None:
    # Insert the following configurations.
    # 1) AZURE_CONTENT_UNDERSTANDING_ENDPOINT - the endpoint to your Content Understanding resource.
    endpoint = "https://<your-resource>.services.ai.azure.com/"
    # 2) CONTENT_UNDERSTANDING_KEY - your Content Understanding API key (optional if using DefaultAzureCredential).
    key = ""
    # 3) FILE_URL - you can replace this with your own URL.
    file_url = "https://contentunderstanding.ai.azure.com/assets/prebuilt/layout_checklist.jpg"
    # ANALYZER_ID - the ID of the analyzer to use.
    analyzer_id = "prebuilt-layout"
    # API_VERSION - the API version to use.
    api_version = "2025-11-01"

    # Set up Content Understanding client.
    # Use the API key if one was provided; otherwise fall back to DefaultAzureCredential.
    credential = AzureKeyCredential(key) if key else DefaultAzureCredential()
    client = ContentUnderstandingClient(endpoint=endpoint, credential=credential, api_version=api_version)

    # [START analyze]
    print(f"Analyzing with {analyzer_id} analyzer...")
    print(f" File URL: {file_url}\n")
    try:
        poller = client.begin_analyze(
            analyzer_id=analyzer_id,
            inputs=[AnalysisInput(url=file_url)],
        )
        result: AnalysisResult = poller.result()
    except AzureError as err:
        print(f"[Azure Error]: {err.message}")
        sys.exit(1)
    except Exception as ex:
        print(f"[Unexpected Error]: {ex}")
        sys.exit(1)
    # [END analyze]

    # [START output_result]
    print("=" * 50)
    print("Analysis result:")
    print("=" * 50 + "\n")
    max_display_lines = 50
    result_str = json.dumps(result.as_dict(), indent=2)
    ret_lines = result_str.splitlines()
    if len(ret_lines) > max_display_lines:
        print("\n".join(ret_lines[:max_display_lines]))
        print(f"\n {len(ret_lines) - max_display_lines} more lines to be displayed...\n")
    else:
        print(result_str)
    # [END output_result]


if __name__ == "__main__":
    main()
```
- Consider what you might need to configure in the code:
- The endpoint to your Content Understanding resource
- Your resource key
- A URL to the file you’d like analyzed
- Consider what’s provided in the code sample that you might alter:
- Analyzer ID (which you can change to use different prebuilt models)
- API Version
-
After setting configurations, the code creates a client to talk to Azure Content Understanding. The code decides how to authenticate: if you provided an API key, it uses that key directly. Otherwise, it falls back to DefaultAzureCredential(), which automatically finds credentials in your environment (such as your Azure CLI login). Then it creates the client using your endpoint, the chosen credential, and an API version.
```python
# Set up Content Understanding client.
# Use the API key if one was provided; otherwise fall back to DefaultAzureCredential.
credential = AzureKeyCredential(key) if key else DefaultAzureCredential()
client = ContentUnderstandingClient(endpoint=endpoint, credential=credential, api_version=api_version)
```
-
Next, the code analyzes the content. The SDK runs the analysis as a long-running operation. The function begin_analyze() returns a poller that tracks the status of the operation, and calling poller.result() waits for the analysis to finish and then returns the AnalysisResult. If an error occurs, the code prints it and exits.
```python
# [START analyze]
print(f"Analyzing with {analyzer_id} analyzer...")
print(f" File URL: {file_url}\n")
try:
    poller = client.begin_analyze(
        analyzer_id=analyzer_id,
        inputs=[AnalysisInput(url=file_url)],
    )
    result: AnalysisResult = poller.result()
except AzureError as err:
    print(f"[Azure Error]: {err.message}")
    sys.exit(1)
except Exception as ex:
    print(f"[Unexpected Error]: {ex}")
    sys.exit(1)
# [END analyze]
```
-
The output of the analysis is formatted and displayed as JSON using the following code:
```python
# [START output_result]
print("=" * 50)
print("Analysis result:")
print("=" * 50 + "\n")
max_display_lines = 50
result_str = json.dumps(result.as_dict(), indent=2)
ret_lines = result_str.splitlines()
if len(ret_lines) > max_display_lines:
    print("\n".join(ret_lines[:max_display_lines]))
    print(f"\n {len(ret_lines) - max_display_lines} more lines to be displayed...\n")
else:
    print(result_str)
# [END output_result]
```
Note: Most of the code above only makes the output more readable. Its purpose is simple: to print the result of the analysis as indented JSON, limited to the first 50 lines.
-
Running the entire code from step 1 returns JSON like you observed earlier in the lab. For example:
```json
{
  "id": "",
  "status": "Succeeded",
  "result": {
    "analyzerId": "prebuilt-layout",
    "apiVersion": "2025-11-01",
    "createdAt": "",
    "warnings": [],
    "contents": [
      {
        "path": "input1",
        "markdown": "",
        "kind": "document",
        "startPageNumber": 1,
        "endPageNumber": 1,
        "unit": "pixel",
        "pages": [
          {
            "pageNumber": 1,
            "angle": 0,
            "width": 2580,
            "height": 3433,
            "spans": [ { "offset": 0, "length": 2269 } ],
            "words": [
              {
                "content": "Documents",
                "span": { "offset": 2, "length": 9 },
                "confidence": 0.996,
                "source": "D(1,213,217,768,201,768,296,214,310)"
              },
              {
                "content": "to",
                "span": { "offset": 12, "length": 2 },
                "confidence": 0.999,
                "source": "D(1,802,200,906,197,906,293,803,295)"
              },
              {
                "content": "Store",
                "span": { "offset": 15, "length": 5 },
                "confidence": 0.998,
                "source": "D(1,947,196,1218,189,1219,285,947,292)"
              }
              ...
```
Tip: To actually run the code in your own environment, you will need to follow the setup and configuration instructions shared at the start of the code sample.
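To set up and run the code in your own environment:
- Make sure you have Python 3.9 or later installed.
- Create a Python file named sample.py in a code editor such as Visual Studio Code. Paste the code sample into it, and set the endpoint (and, optionally, the key) for your Content Understanding resource.
- If you don’t set a key, sign in with the Azure CLI (`az login`) so that DefaultAzureCredential can find your credentials.
- In a terminal, navigate to the directory containing sample.py.
- Install the dependencies with the command: `python -m pip install azure-ai-contentunderstanding azure-identity`
- Run the script with the command: `python sample.py`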
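Rather than typing the endpoint and key into sample.py, you might prefer to read them from environment variables named after the configuration comments in the code sample. The following is a minimal sketch of that approach. The sample itself hardcodes these values, so reading them from the environment is an assumption of this sketch.

The sketch also includes a placeholder-aware key check. A test such as `"" not in key` is never true in Python, because the empty string is a substring of every string, so a placeholder check needs a non-empty marker. Treating any value that contains `<` or `>` as an unfilled placeholder is a hypothetical convention, and the helper names are hypothetical too.

```python
import os
from typing import Mapping, Optional

# Hypothetical convention: values still containing angle brackets are unfilled placeholders.
PLACEHOLDER_CHARS = ("<", ">")


def is_real_value(value: Optional[str]) -> bool:
    """True when value is non-blank and does not look like a <placeholder>."""
    if not value or not value.strip():
        return False
    return not any(char in value for char in PLACEHOLDER_CHARS)


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the endpoint and optional key, and decide which authentication to use."""
    env = os.environ if env is None else env
    endpoint = env.get("AZURE_CONTENT_UNDERSTANDING_ENDPOINT", "")
    key = env.get("CONTENT_UNDERSTANDING_KEY", "")
    if not is_real_value(endpoint):
        raise ValueError("Set AZURE_CONTENT_UNDERSTANDING_ENDPOINT to your resource endpoint.")
    if is_real_value(key):
        return {"endpoint": endpoint, "key": key, "auth": "api_key"}
    # No usable key: the client would fall back to DefaultAzureCredential (for example, after az login).
    return {"endpoint": endpoint, "key": None, "auth": "default_azure_credential"}


# In sample.py, the credential could then be chosen like this:
#   settings = load_settings()
#   credential = (AzureKeyCredential(settings["key"]) if settings["auth"] == "api_key"
#                 else DefaultAzureCredential())
```

Keeping the key out of the source file also means you can share or commit sample.py without exposing your credentials.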
Summary
In this exercise, you explored Azure Content Understanding in Foundry and learned how it transforms unstructured content into structured, usable data. You tried out three analyzers, each building on the previous one in capability:
- Read: Extracts raw text from documents without interpreting structure or meaning—answering, “What text is here?”
- Layout: Goes a step further by capturing structure, hierarchy, and positioning—including tables—answering, “How is this content organized?”
- Document fields: an analyzer that uses a combination of capabilities to extract fields, organize them into cohesive categories, and generate insights—answering, “What does this content mean and what should I do with it?” Content Understanding analyzers like this one sometimes require deploying additional AI models (such as chat completion and embedding models) to handle complex extraction needs.
You also learned how developers can integrate Content Understanding into applications using the Python SDK, which enables programmatic analysis of documents outside the Foundry playground.
Ask Anton
If you have questions about some of the topics covered in this exercise, Ask Anton is a generative AI-based agent that you can ask about AI concepts and Microsoft Foundry. Open the app at https://aka.ms/azk-anton and use the Configure button to enter your Foundry project and model details.
Ask Anton is not a supported Microsoft product or a component of Microsoft Learn or AI Skills Navigator. It’s just an example of an AI agent for you to explore as you learn about what’s possible with AI.
If you do check out Ask Anton, we’d love you to tell us about your experience!
Clean up
If you’ve finished working with the Content Understanding service, you should delete the resources you have created in this exercise to avoid incurring unnecessary Azure costs.
- In the Azure portal, delete the resource group you created in this exercise.