Read Text in Images
Optical character recognition (OCR) is a subset of computer vision that deals with reading text in images and documents. The Azure AI Vision service provides an API for reading text, which you’ll explore in this exercise.
Clone the repository for this course
If you haven’t already done so, you must clone the code repository for this course:
- Start Visual Studio Code.
- Open the palette (SHIFT+CTRL+P) and run a Git: Clone command to clone the https://github.com/MicrosoftLearning/mslearn-ai-vision repository to a local folder (it doesn’t matter which folder).
- When the repository has been cloned, open the folder in Visual Studio Code.
- Wait while additional files are installed to support the C# code projects in the repo.
Note: If you are prompted to add required assets to build and debug, select Not Now. If you see the message Detected an Azure Function Project in folder, you can safely close it.
Provision an Azure AI Services resource
If you don’t already have one in your subscription, you’ll need to provision an Azure AI Services resource.
- Open the Azure portal at https://portal.azure.com, and sign in using the Microsoft account associated with your Azure subscription.
- In the top search bar, search for Azure AI services, select Azure AI Services, and create an Azure AI services multi-service account resource with the following settings:
- Subscription: Your Azure subscription
- Resource group: Choose or create a resource group (if you are using a restricted subscription, you may not have permission to create a new resource group - use the one provided)
- Region: Choose from East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US, or East Asia*
- Name: Enter a unique name
- Pricing tier: Standard S0
*Azure AI Vision 4.0 features are currently only available in these regions.
- Select the required checkboxes and create the resource.
- Wait for deployment to complete, and then view the deployment details.
- When the resource has been deployed, go to it and view its Keys and Endpoint page. You’ll need the endpoint and one of the keys from this page in the next procedure.
Prepare to use the Azure AI Vision SDK
In this exercise, you’ll complete a partially implemented client application that uses the Azure AI Vision SDK to read text.
Note: You can choose to use the SDK for either C# or Python. In the following steps, perform the actions appropriate for your preferred language.
- In Visual Studio Code, in the Explorer pane, browse to the Labfiles\05-ocr folder and expand the C-Sharp or Python folder depending on your language preference.
- Right-click the read-text folder and open an integrated terminal. Then install the Azure AI Vision SDK package by running the appropriate command for your language preference:
C#
dotnet add package Azure.AI.Vision.ImageAnalysis -v 1.0.0-beta.3
Note: If you are prompted to install dev kit extensions, you can safely close the message.
Python
pip install azure-ai-vision-imageanalysis==1.0.0b3
- View the contents of the read-text folder, and note that it contains a file for configuration settings:
- C#: appsettings.json
- Python: .env
Open the configuration file and update the configuration values it contains to reflect the endpoint and an authentication key for your Azure AI services resource. Save your changes.
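As a rough illustration of what the Python configuration step amounts to, the sketch below parses KEY=VALUE lines like those found in a .env file, using only the standard library. The setting names AI_SERVICE_ENDPOINT and AI_SERVICE_KEY, the sample values, and the helper parse_env_text are assumptions made for this example; check the names actually present in your configuration file, and note that the lab code may load the file differently (for example, with a dotenv package).

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical file contents; use the values from your Keys and Endpoint page.
sample = """
# Azure AI services settings (setting names are assumptions)
AI_SERVICE_ENDPOINT=https://example.cognitiveservices.azure.com/
AI_SERVICE_KEY="0123456789abcdef"
"""

settings = parse_env_text(sample)
ai_endpoint = settings["AI_SERVICE_ENDPOINT"]
ai_key = settings["AI_SERVICE_KEY"]
print(ai_endpoint)
```

The two resulting values play the role of the ai_endpoint and ai_key variables that the client authentication code uses later in this exercise.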
Use the Azure AI Vision SDK to read text from an image
One of the features of the Azure AI Vision SDK is to read text from an image. In this exercise, you’ll complete a partially implemented client application that uses the Azure AI Vision SDK to read text from an image.
- The read-text folder contains a code file for the client application:
- C#: Program.cs
- Python: read-text.py
Open the code file and at the top, under the existing namespace references, find the comment Import namespaces. Then, under this comment, add the following language-specific code to import the namespaces you’ll need to use the Azure AI Vision SDK:
C#
// Import namespaces
using Azure.AI.Vision.ImageAnalysis;
Python
# import namespaces
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential
- In the code file for your client application, in the Main function, note that the code to load the configuration settings has been provided. Find the comment Authenticate Azure AI Vision client and, under this comment, add the following language-specific code to create and authenticate an Azure AI Vision client object:
C#
// Authenticate Azure AI Vision client
ImageAnalysisClient client = new ImageAnalysisClient(
    new Uri(aiSvcEndpoint),
    new AzureKeyCredential(aiSvcKey));
Python
# Authenticate Azure AI Vision client
cv_client = ImageAnalysisClient(
    endpoint=ai_endpoint,
    credential=AzureKeyCredential(ai_key)
)
- In the Main function, under the code you just added, note that the code specifies the path to an image file and then passes the image path to the GetTextRead function. This function isn’t yet fully implemented.
- Let’s add some code to the body of the GetTextRead function. Find the comment Use Analyze image function to read text in image. Then, under this comment, add the following language-specific code, noting that the visual features are specified when calling the Analyze function:
C#
// Use Analyze image function to read text in image
ImageAnalysisResult result = client.Analyze(
    BinaryData.FromStream(stream),
    // Specify the features to be retrieved
    VisualFeatures.Read);

stream.Close();

// Display analysis results
if (result.Read != null)
{
    Console.WriteLine($"Text:");

    // Prepare image for drawing
    System.Drawing.Image image = System.Drawing.Image.FromFile(imageFile);
    Graphics graphics = Graphics.FromImage(image);
    Pen pen = new Pen(Color.Cyan, 3);

    foreach (var line in result.Read.Blocks.SelectMany(block => block.Lines))
    {
        // Return the text detected in the image

    }

    // Save image
    String output_file = "text.jpg";
    image.Save(output_file);
    Console.WriteLine("\nResults saved in " + output_file + "\n");
}
Python
# Use Analyze image function to read text in image
result = cv_client.analyze(
    image_data=image_data,
    visual_features=[VisualFeatures.READ]
)

# Display the image and overlay it with the extracted text
if result.read is not None:
    print("\nText:")

    # Prepare image for drawing
    image = Image.open(image_file)
    fig = plt.figure(figsize=(image.width/100, image.height/100))
    plt.axis('off')
    draw = ImageDraw.Draw(image)
    color = 'cyan'

    for line in result.read.blocks[0].lines:
        # Return the text detected in the image

    # Save image
    plt.imshow(image)
    plt.tight_layout(pad=0)
    outputfile = 'text.jpg'
    fig.savefig(outputfile)
    print('\n Results saved in', outputfile)
- In the code you just added to the GetTextRead function, under the Return the text detected in the image comment, add the following code (it prints the image text to the console and generates the image text.jpg, which highlights the text in the image):
C#
// Return the text detected in the image
Console.WriteLine($" '{line.Text}'");

// Draw bounding box around line
var drawLinePolygon = true;

// Return the position bounding box around each line

// Return each word detected in the image and the position bounding box around each word with the confidence level of each word

// Draw line bounding polygon
if (drawLinePolygon)
{
    var r = line.BoundingPolygon;
    Point[] polygonPoints = {
        new Point(r[0].X, r[0].Y),
        new Point(r[1].X, r[1].Y),
        new Point(r[2].X, r[2].Y),
        new Point(r[3].X, r[3].Y)
    };
    graphics.DrawPolygon(pen, polygonPoints);
}
Python
# Return the text detected in the image
print(f" {line.text}")

drawLinePolygon = True

r = line.bounding_polygon
bounding_polygon = ((r[0].x, r[0].y),(r[1].x, r[1].y),(r[2].x, r[2].y),(r[3].x, r[3].y))

# Return the position bounding box around each line

# Return each word detected in the image and the position bounding box around each word with the confidence level of each word

# Draw line bounding polygon
if drawLinePolygon:
    draw.polygon(bounding_polygon, outline=color, width=3)
- In the read-text/images folder, select Lincoln.jpg to view the file that your code processes.
- In the code file for your application, in the Main function, examine the code that runs if the user selects menu option 1. This code calls the GetTextRead function, passing the path to the Lincoln.jpg image file.
- Save your changes and return to the integrated terminal for the read-text folder, and enter the following command to run the program:
C#
dotnet run
Python
python read-text.py
- When prompted, enter 1 and observe the output, which is the text extracted from the image.
- In the read-text folder, select the text.jpg image and notice how there’s a polygon around each line of text.
- Return to the code file in Visual Studio Code, and find the comment Return the position bounding box around each line. Then, under this comment, add the following code:
C#
// Return the position bounding box around each line
Console.WriteLine($" Bounding Polygon: [{string.Join(" ", line.BoundingPolygon)}]");
Python
# Return the position bounding box around each line
print(" Bounding Polygon: {}".format(bounding_polygon))
- Save your changes and return to the integrated terminal for the read-text folder, and enter the following command to run the program:
C#
dotnet run
Python
python read-text.py
- When prompted, enter 1 and observe the output, which should be each line of text in the image along with its position in the image.
- Return to the code file in Visual Studio Code, and find the comment Return each word detected in the image and the position bounding box around each word with the confidence level of each word. Then, under this comment, add the following code:
C#
// Return each word detected in the image and the position bounding box around each word with the confidence level of each word
foreach (DetectedTextWord word in line.Words)
{
    Console.WriteLine($" Word: '{word.Text}', Confidence {word.Confidence:F4}, Bounding Polygon: [{string.Join(" ", word.BoundingPolygon)}]");

    // Draw word bounding polygon
    drawLinePolygon = false;
    var r = word.BoundingPolygon;
    Point[] polygonPoints = {
        new Point(r[0].X, r[0].Y),
        new Point(r[1].X, r[1].Y),
        new Point(r[2].X, r[2].Y),
        new Point(r[3].X, r[3].Y)
    };
    graphics.DrawPolygon(pen, polygonPoints);
}
Python
# Return each word detected in the image and the position bounding box around each word with the confidence level of each word
for word in line.words:
    r = word.bounding_polygon
    bounding_polygon = ((r[0].x, r[0].y),(r[1].x, r[1].y),(r[2].x, r[2].y),(r[3].x, r[3].y))
    print(f" Word: '{word.text}', Bounding Polygon: {bounding_polygon}, Confidence: {word.confidence:.4f}")

    # Draw word bounding polygon
    drawLinePolygon = False
    draw.polygon(bounding_polygon, outline=color, width=3)
- Save your changes and return to the integrated terminal for the read-text folder, and enter the following command to run the program:
C#
dotnet run
Python
python read-text.py
- When prompted, enter 1 and observe the output, which should be each word of text in the image along with its position in the image. Notice how the confidence level of each word is also returned.
- In the read-text folder, select the text.jpg image and notice how there’s a polygon around each word.
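One way you might put the word-level confidence values to use is to flag words that the service was less sure about, so they can be checked by hand. The following is a minimal sketch under stated assumptions: the Word and Line classes are stand-ins that mirror only the text, words, and confidence attributes used in the code above, the helper low_confidence_words is hypothetical, the 0.8 threshold is arbitrary, and the sample values are made up rather than real service output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    """Stand-in for a detected word (only the attributes used here)."""
    text: str
    confidence: float


@dataclass
class Line:
    """Stand-in for a detected line of text."""
    text: str
    words: List[Word] = field(default_factory=list)


def low_confidence_words(lines, threshold=0.8):
    """Return (word text, confidence) pairs scoring below the threshold."""
    return [
        (word.text, word.confidence)
        for line in lines
        for word in line.words
        if word.confidence < threshold
    ]


# Made-up sample values, not real service output.
lines = [
    Line("Shopping list", [Word("Shopping", 0.99), Word("list", 0.95)]),
    Line("non-fat milk", [Word("non-fat", 0.62), Word("milk", 0.91)]),
]

for text, confidence in low_confidence_words(lines):
    print(f"Check '{text}' (confidence {confidence:.4f})")
```

In the lab code, the same loop could run over the lines returned in result.read instead of the stand-in objects; a suitable threshold depends on your images and how costly a misread word is.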
Use the Azure AI Vision SDK to read handwritten text from an image
In the previous exercise, you read well-defined text from an image, but sometimes you might also want to read text from handwritten notes or papers. The Azure AI Vision SDK can read handwritten text with exactly the same code you used to read well-defined text. You’ll reuse the code from the previous exercise, this time with a different image.
- In the read-text/images folder, select Note.jpg to view the file that your code processes.
- In the code file for your application, in the Main function, examine the code that runs if the user selects menu option 2. This code calls the GetTextRead function, passing the path to the Note.jpg image file.
- From the integrated terminal for the read-text folder, enter the following command to run the program:
C#
dotnet run
Python
python read-text.py
- When prompted, enter 2 and observe the output, which is the text extracted from the note image.
- In the read-text folder, select the text.jpg image and notice how there’s a polygon around each word of the note.
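The Python snippet in this exercise reads lines from result.read.blocks[0] only, while the C# version iterates over every block. If you want the extracted note as a single piece of text, a sketch along the following lines collects the lines from all blocks. The extract_text helper is hypothetical, and the stand-in object below only mimics the blocks, lines, and text attribute names used above; it is not real service output.

```python
from types import SimpleNamespace


def extract_text(read_result):
    """Join the text of every line in every block, one detected line per row."""
    if read_result is None:
        return ""
    return "\n".join(
        line.text
        for block in read_result.blocks
        for line in block.lines
    )


# Hand-built stand-in for result.read; values are made up for illustration.
fake_read = SimpleNamespace(
    blocks=[
        SimpleNamespace(lines=[
            SimpleNamespace(text="Shopping list"),
            SimpleNamespace(text="non-fat milk"),
        ]),
        SimpleNamespace(lines=[SimpleNamespace(text="bread")]),
    ]
)

print(extract_text(fake_read))
```

Because the helper only relies on attribute names, you could pass it result.read from either the Lincoln.jpg or the Note.jpg run.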
Clean up resources
If you’re not using the Azure resources created in this lab for other training modules, you can delete them to avoid incurring further charges. Here’s how:
- Open the Azure portal at https://portal.azure.com, and sign in using the Microsoft account associated with your Azure subscription.
- In the top search bar, search for Azure AI services multi-service account, and select the Azure AI services multi-service account resource you created in this lab.
- On the resource page, select Delete and follow the instructions to delete the resource.
More information
For more information about using the Azure AI Vision service to read text, see the Azure AI Vision documentation.