Generate and update embeddings in Azure SQL Database
Estimated Time: 30 minutes
In this exercise, you create an external model reference, generate embeddings from text stored in Azure SQL Database using the AI_GENERATE_EMBEDDINGS function, and verify the results. You also explore how embeddings need to be maintained when source data changes. Finally, you perform a basic vector search to confirm that the embeddings capture semantic meaning.
You’re a database developer for Adventure Works. Your team is adding AI-powered search capabilities to the product catalog. The first step is to generate and store vector embeddings for customer reviews so they can be compared by semantic similarity later. You also need to understand how embeddings stay in sync when review data changes.
📝 These exercises ask you to copy and paste T-SQL code. Please verify that the code has been copied correctly before executing it.
Prerequisites
- An Azure subscription with approval for Azure OpenAI access
- Visual Studio Code with the SQL Server (mssql) extension, or SQL Server Management Studio
- Basic familiarity with Azure SQL Database and T-SQL
Provision an Azure SQL Database
First, create an Azure SQL Database with sample data.
📝 Skip this section if you already have an AdventureWorksLT Azure SQL Database provisioned.
-
Go to the Azure SQL hub and sign in with your Azure account if prompted. In the Azure SQL Database pane, select Show options, and then select Create SQL Database.
💡 If you see a Free offer banner on this page, you can apply it to use Azure SQL Database at no cost. The free offer provides 100,000 vCore seconds of serverless compute and 32 GB of storage per month. If you apply the free offer, skip steps 3 to 6.
-
On the Basics tab of the Create SQL Database page, fill in the required information:
| Setting | Value |
| --- | --- |
| Subscription | Select your Azure subscription. |
| Resource group | Select or create a resource group. |
| Database name | AdventureWorksLT |
| Server | Select Create new and create a new server with a unique name. Select your Location. |

For authentication, select one of the following options and then select OK:

📝 Authentication is not optional. You must choose the method that matches your organization’s security policies. Each option affects how you connect to the database later:
- Use Microsoft Entra-only authentication (recommended): Select this if your organization requires Entra-based access. Set your Azure account as the Microsoft Entra admin. You connect to the database using your Microsoft Entra account (for example, in SSMS select Authentication = Microsoft Entra MFA).
- Use both SQL and Microsoft Entra authentication/SQL authentication: Select this if you prefer a SQL admin login or your organization allows both methods. Provide a Server admin login and Password. You need these credentials to connect. You can also set a Microsoft Entra admin to enable Entra logins alongside SQL auth.
The authentication method you choose here controls how you connect to the database as a developer. It is separate from how Azure SQL connects to Azure OpenAI, which is configured later in the lab.
📝 If you already have a test server you can use, select it instead of creating a new one.
- Leave Want to use SQL elastic pool set to No.
- For Workload environment, select Development. This presets the compute to General Purpose serverless with auto-pause enabled, which is the most cost-friendly paid option.
- Under Compute + storage, select Configure database. Change the service tier to Hyperscale and the compute tier to Serverless. Select Apply to confirm.
- Under Backup storage redundancy, select Locally-redundant backup storage.
- Select Next: Networking.
-
On the Networking tab, for Connectivity method, select Public endpoint.
📝 If you selected an existing server instead of creating a new one, the Connectivity method option may not appear because it is already configured on the server.
- Under Firewall rules, set Allow Azure services and resources to access this server to Yes and Add current client IP address to Yes.
- Select Next: Security, then select Next: Additional settings.
- On the Additional settings tab, under Data source, set Use existing data to Sample to create the AdventureWorksLT sample database. Select OK when prompted to confirm.
- Select Review + create, review the settings, and then select Create.
- Wait for the deployment to complete, then navigate to the new Azure SQL Database resource.
Create a Foundry project and deploy Azure OpenAI models
Next, create a project in Microsoft Foundry and deploy a chat model and an embedding model.
📝 Skip this section if you already have an Azure OpenAI resource with gpt-5.4-mini and text-embedding-3-small models deployed.
Create a Foundry project
The first step is to create a new project in Microsoft Foundry, which will serve as the workspace for deploying and managing your Azure OpenAI models.
-
Go to Microsoft Foundry and sign in with your Azure account.
📝 If you see a New Foundry toggle in the upper-right corner of the portal, make sure it is turned on to use the latest version of the Foundry portal. The steps below assume the new experience.
-
Create a new project:
- If a project name is shown in the upper-left corner, select it and then select Create new project. If no project exists, select + Create project from the home page.
- Enter a project name (for example, proj-sqlailab).
- Expand Advanced options. Select the same Subscription and Resource group you used for your Azure SQL Database, and choose a Region where Azure OpenAI models are available.
- Select Create. If a Let’s go prompt appears, select Let’s go.
Deploy Azure OpenAI models
You will deploy two models: gpt-5.4-mini (for chat completions) and text-embedding-3-small (for embeddings). The following steps guide you through deploying both models.
Let’s deploy the gpt-5.4-mini model first.
- Select Discover in the top navigation bar. In the search bar at the top of the page, search for gpt-5.4-mini. Alternatively, select View all models in the Featured models section to browse the full catalog. Select gpt-5.4-mini from the search results.
- On the model details page, select the Deploy dropdown and choose Default settings.
- If a Select a project to deploy dialog appears, select the same Region you chose when creating the project, then select your project from the Project dropdown. Select Continue.
- Set the Deployment name to gpt-5.4-mini and select Deploy.
- Wait for the deployment to complete.
Let’s deploy the text-embedding-3-small model next.
- Select Discover in the top navigation and search for text-embedding-3-small. Select it from the search results.
- On the model details page, select the Deploy dropdown and choose Default settings.
- If a Select a project to deploy dialog appears, select the same Region and Project as before and select Continue.
-
Wait for the deployment to complete.
- Select Build > Models in the top navigation to verify both deployments appear.
Retrieve the Azure OpenAI endpoint
Now retrieve the Azure OpenAI endpoint you need for the T-SQL steps later.
- In the Azure portal, navigate to your resource group and select the Foundry resource that was created with your project (for example, proj-sqlailab-resource).
-
In the left menu, select Resource Management > Keys and Endpoint. Select the OpenAI tab and note the endpoint URL (for example, `https://proj-sqlailab-resource.openai.azure.com/`). You need the endpoint name (the part before `.openai.azure.com`) later.

📝 The Keys and Endpoint page has three tabs: Foundry, OpenAI, and AI Services. Make sure you select the OpenAI tab. This tab shows the endpoint in the format required by Azure SQL Database’s `CREATE EXTERNAL MODEL` and `sp_invoke_external_rest_endpoint`. The Foundry tab shows a different `.services.ai.azure.com` URL that does not work with these T-SQL features.
Configure managed identity access
Since Azure SQL Database uses a system-assigned managed identity to authenticate with Azure OpenAI, you need to enable the identity on your SQL Server and grant it access to the Azure OpenAI resource. This approach is more secure than API keys and doesn’t require storing secrets.
📝 Skip this section if your SQL Server already has a system-assigned managed identity enabled and it has been granted the Cognitive Services OpenAI User role on your Azure OpenAI resource.
- In the Azure portal, navigate to the SQL server you created earlier (the logical server, not the database).
- In the left menu, select Security > Identity.
- Under System assigned managed identity, set Status to On and select Save.
- Once the managed identity is enabled on your SQL Server, navigate to your Azure OpenAI resource (for example, adventureworks-openai).
- In the left menu, select Access control (IAM).
- Select + Add and then select Add role assignment.
- On the Role tab, search for and select Cognitive Services OpenAI User, then select Next.
- On the Members tab, select Managed identity, then select + Select members.
- In the Select managed identities pane, set Managed identity to SQL server, select your SQL server from the list, and then select Select.
-
Select Review + assign twice to complete the role assignment.
📝 The role assignment may take up to 5 minutes to take effect. You can proceed with the next steps while waiting.
Create the ProductReview table
The AdventureWorksLT sample database contains product information but no customer reviews. In this step, you download and run a script that creates a ProductReview table with 140 realistic reviews across many product categories. These reviews provide the text data that you generate embeddings for in this exercise.
-
Connect to your Azure SQL Database using Visual Studio Code (with the SQL Server extension) or SQL Server Management Studio.
💡 How you connect depends on which authentication method was configured during server creation:
- Microsoft Entra authentication: In SSMS, set Authentication to Microsoft Entra MFA and sign in with your Azure account. In VS Code, select the Microsoft Entra ID authentication type when creating a connection profile.
- SQL authentication: In SSMS or VS Code, enter the Server admin login and Password you specified during server creation, with Authentication set to SQL Login.
In both cases, set the Server name to `<your-server-name>.database.windows.net` and the Database to AdventureWorksLT.
- Download the review script from product-reviews-insert.sql and save it locally.
- Open the downloaded file and run the entire script against your AdventureWorksLT database.
-
Verify the table was created and populated by running:
```sql
SELECT COUNT(*) AS TotalReviews
FROM dbo.ProductReview;
GO
```

📝 You should see 140 rows. The reviews cover bikes, tires, lights, helmets, gloves, maintenance tools, and more, with ratings ranging from 1 to 5 stars.
Create a database scoped credential and an external model for embeddings
Set up the credential and model reference needed to call Azure OpenAI from T-SQL. You create a single credential using the system-assigned managed identity of your Azure SQL Server. This credential is used by both CREATE EXTERNAL MODEL (for embeddings) and sp_invoke_external_rest_endpoint (for chat completions), and eliminates the need for API keys.
📝 Skip this section if you already have a database scoped credential and an external embedding model configured for your Azure OpenAI endpoint.
-
To create a database scoped credential using managed identity, open a new query window and run the following script. Replace `<your-openai-endpoint>` with the endpoint name you noted from the OpenAI tab (for example, if your endpoint is `https://proj-sqlailab-resource.openai.azure.com/`, use proj-sqlailab-resource).

```sql
-- Create a database master key if one doesn't exist
IF NOT EXISTS (SELECT * FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##')
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<your strong password here>';
GO

-- Create a credential using Managed Identity
CREATE DATABASE SCOPED CREDENTIAL [https://<your-openai-endpoint>.openai.azure.com]
WITH IDENTITY = 'Managed Identity',
     SECRET = '{"resourceid":"https://cognitiveservices.azure.com"}';
GO
```

📝 Replace `<your strong password here>` with a strong password. Replace `<your-openai-endpoint>` with the endpoint name from the OpenAI tab (for example, proj-sqlailab-resource).

⚠ Important: Do not change the `resourceid` value in the SECRET. It must remain exactly `https://cognitiveservices.azure.com`. This is the fixed OAuth audience for Azure Cognitive Services, not your specific endpoint URL. Changing it will cause authentication errors.

-
Now create an external model reference for the embedding model. This reference allows you to use `AI_GENERATE_EMBEDDINGS` directly in T-SQL. Replace `<your-openai-endpoint>` with your endpoint name.

```sql
-- Create an external model reference for embeddings
CREATE EXTERNAL MODEL my_embedding_model
WITH (
    LOCATION = 'https://<your-openai-endpoint>.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-10-21',
    API_FORMAT = 'Azure OpenAI',
    MODEL_TYPE = EMBEDDINGS,
    MODEL = 'text-embedding-3-small',
    CREDENTIAL = [https://<your-openai-endpoint>.openai.azure.com]
);
GO
```

📝 Replace `<your-openai-endpoint>` with the same endpoint name you used in the credential. The `api-version` value 2024-10-21 is the current GA version of the Azure OpenAI REST API. The `MODEL` option is required and must match the model name.
Add a vector column and generate embeddings
In this section, you add a vector column to the ProductReview table and generate embeddings for the review text. The embedding model converts each review’s text into a 1536-dimensional vector that captures its semantic meaning.
-
To store review embeddings, add a vector column to the `dbo.ProductReview` table:

```sql
-- Add a vector column to store review text embeddings
ALTER TABLE dbo.ProductReview
ADD ReviewVector VECTOR(1536);
GO
```

📝 The `VECTOR(1536)` data type stores a 1536-dimensional vector, which matches the output of the text-embedding-3-small model. Each element is stored as a single-precision (4-byte) float, so a 1536-dimension vector uses about 6 KB per row.

-
Test embedding generation on a single review first. This test confirms the external model and credential are working:
```sql
-- Test embedding generation on a single review
SELECT TOP 1
    r.ReviewID,
    r.ReviewTitle,
    AI_GENERATE_EMBEDDINGS(r.ReviewTitle + ': ' + r.ReviewText USE MODEL my_embedding_model) AS TestEmbedding
FROM dbo.ProductReview r;
GO
```

📝 You should see a long array of floating-point numbers. This confirms that your credential, external model, and Azure OpenAI deployment are all configured correctly. If you get an error, check that the managed identity role assignment has taken effect (it can take up to 5 minutes).
-
Now generate embeddings for all reviews in batches. The script combines the product name, review title, and review text to create richer embeddings, and processes 30 reviews per batch with retry logic for rate limits:
```sql
-- Generate embeddings in batches to avoid API rate limits
DECLARE @batchSize INT = 30;
DECLARE @rowsUpdated INT = 1;
DECLARE @retryCount INT;
DECLARE @maxRetries INT = 3;

WHILE @rowsUpdated > 0
BEGIN
    SET @retryCount = 0;

    RETRY:
    BEGIN TRY
        UPDATE TOP (@batchSize) r
        SET r.ReviewVector = AI_GENERATE_EMBEDDINGS(
                p.Name + ' - ' + r.ReviewTitle + ': ' + r.ReviewText
                USE MODEL my_embedding_model)
        FROM dbo.ProductReview r
        INNER JOIN SalesLT.Product p ON r.ProductID = p.ProductID
        WHERE r.ReviewVector IS NULL;

        SET @rowsUpdated = @@ROWCOUNT;

        -- Brief pause between batches to respect API rate limits
        IF @rowsUpdated > 0
            WAITFOR DELAY '00:00:02';
    END TRY
    BEGIN CATCH
        SET @retryCount += 1;
        IF @retryCount <= @maxRetries
        BEGIN
            PRINT 'Rate limited. Retrying in 5 seconds... (Attempt '
                + CAST(@retryCount AS NVARCHAR(10)) + ' of '
                + CAST(@maxRetries AS NVARCHAR(10)) + ')';
            WAITFOR DELAY '00:00:05';
            GOTO RETRY;
        END
        ELSE
            THROW;
    END CATCH
END
GO
```

📝 This step may take a couple of minutes. The script processes 30 reviews per batch with a 2-second pause between batches. If the API returns a rate-limit error, it retries up to three times with a 5-second wait. The product name, review title, and review text are embedded together so that vector search can match on both the product being reviewed and the customer’s experience.
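The batch-and-retry pattern in the script above is general: repeatedly take a fixed-size batch of rows that still lack a vector, embed them, pause, and back off when the API rejects a call. A minimal Python sketch of the same control flow, using a hypothetical `embed_batch` callable (not part of this lab) in place of `AI_GENERATE_EMBEDDINGS`:

```python
import time

def embed_pending(rows, embed_batch, batch_size=30, max_retries=3, pause=0.0):
    """Embed rows whose 'vector' is None, batch by batch, retrying on errors.

    rows: list of dicts with 'text' and 'vector' keys.
    embed_batch: hypothetical callable returning one vector per text;
                 may raise RuntimeError when rate limited.
    """
    while True:
        pending = [r for r in rows if r["vector"] is None][:batch_size]
        if not pending:
            break  # nothing left to embed (like @rowsUpdated reaching 0)
        for attempt in range(1, max_retries + 1):
            try:
                vectors = embed_batch([r["text"] for r in pending])
                for row, vec in zip(pending, vectors):
                    row["vector"] = vec
                break  # batch succeeded
            except RuntimeError:
                if attempt == max_retries:
                    raise  # give up, like THROW in the T-SQL version
                time.sleep(pause)  # back off before retrying (WAITFOR DELAY)
        time.sleep(pause)  # brief pause between batches

# Example: a fake embedder that is "rate limited" once, then succeeds
calls = {"n": 0}
def fake_embed(texts):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("429: rate limited")
    return [[0.1] * 3 for _ in texts]

reviews = [{"text": f"review {i}", "vector": None} for i in range(5)]
embed_pending(reviews, fake_embed, batch_size=2)
print(all(r["vector"] is not None for r in reviews))  # True
```

Because the loop only ever selects rows whose vector is still empty, rerunning it after a partial failure picks up exactly where it left off, which is the same property that lets you safely rerun the T-SQL batch script.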
Verify and inspect the generated embeddings
After generating embeddings, verify that all reviews have vectors and inspect the vector properties.
-
Check how many reviews have embeddings:
```sql
-- Verify embedding counts
SELECT
    COUNT(*) AS TotalReviews,
    COUNT(ReviewVector) AS ReviewsWithEmbeddings,
    COUNT(*) - COUNT(ReviewVector) AS ReviewsMissingEmbeddings
FROM dbo.ProductReview;
GO
```

📝 All 140 reviews should have embeddings. If any are missing, run the batch embedding script again, as it only processes rows where `ReviewVector IS NULL`.

-
Inspect the vector dimensions and data type using `VECTORPROPERTY`:

```sql
-- Check vector metadata
SELECT TOP 1
    r.ReviewID,
    r.ReviewTitle,
    VECTORPROPERTY(r.ReviewVector, 'Dimensions') AS VectorDimensions,
    VECTORPROPERTY(r.ReviewVector, 'BaseType') AS VectorBaseType,
    DATALENGTH(r.ReviewVector) AS VectorSizeBytes
FROM dbo.ProductReview r
WHERE r.ReviewVector IS NOT NULL;
GO
```

📝 `VECTORPROPERTY` returns metadata about a vector. You should see 1536 dimensions, a `float` base type, and approximately 6,148 bytes per vector. This function is useful for validating that vectors have the expected structure, especially when troubleshooting dimension mismatch errors.

-
View a sample of actual vector values for a review:
```sql
-- View a sample embedding
SELECT TOP 1
    r.ReviewID,
    r.ReviewTitle,
    LEFT(CAST(r.ReviewVector AS NVARCHAR(MAX)), 200) AS EmbeddingPreview
FROM dbo.ProductReview r
WHERE r.ReviewVector IS NOT NULL;
GO
```

📝 The embedding is a JSON array of floating-point numbers. The full vector has 1536 elements, so this query shows only the first 200 characters. Each number represents a dimension of meaning that the embedding model learned during training.
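The byte count reported by `DATALENGTH` follows directly from the storage format described earlier: 1536 single-precision floats at 4 bytes each. The quick arithmetic below (plain Python, purely for illustration) shows where the roughly 6 KB figure comes from; attributing the few extra bytes in the reported value to per-value metadata overhead is an assumption, not something the lab states:

```python
dimensions = 1536          # output size of text-embedding-3-small
bytes_per_element = 4      # single-precision (4-byte) float

payload = dimensions * bytes_per_element
print(payload)             # 6144 bytes of raw float data, about 6 KB

# The VECTORPROPERTY query reported approximately 6,148 bytes, so only a
# handful of bytes sit on top of the raw floats.
print(6148 - payload)      # 4
```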
Validate embeddings with a basic vector search
To confirm that the embeddings capture semantic meaning, run a basic vector similarity search using VECTOR_DISTANCE. This function calculates the cosine distance between two vectors, where smaller values indicate greater similarity.
-
Search for reviews similar to a natural language question:
```sql
-- Find reviews semantically similar to a question
DECLARE @searchText NVARCHAR(1000) = 'Which tires last the longest and resist punctures?';
DECLARE @searchVector VECTOR(1536);

-- Generate an embedding for the search text
SELECT @searchVector = AI_GENERATE_EMBEDDINGS(@searchText USE MODEL my_embedding_model);

-- Find the 5 closest reviews by cosine distance
SELECT TOP 5
    p.Name AS ProductName,
    r.ReviewTitle,
    r.ReviewText,
    r.Rating,
    VECTOR_DISTANCE('cosine', @searchVector, r.ReviewVector) AS Distance
FROM dbo.ProductReview r
INNER JOIN SalesLT.Product p ON r.ProductID = p.ProductID
ORDER BY Distance;
GO
```

📝 The results should show tire-related reviews discussing puncture resistance and durability, even though the search text and review text use different wording. This confirms that the embeddings capture semantic meaning. `VECTOR_DISTANCE` with the cosine metric returns values between 0 (identical) and 2 (opposite). Lower values mean the review is more relevant to the question.

-
Compare the semantic search to a simple keyword search:
```sql
-- Keyword search for comparison
SELECT TOP 5
    p.Name AS ProductName,
    r.ReviewTitle,
    r.ReviewText,
    r.Rating
FROM dbo.ProductReview r
INNER JOIN SalesLT.Product p ON r.ProductID = p.ProductID
WHERE r.ReviewText LIKE '%puncture%';
GO
```

📝 The `LIKE` search only finds reviews containing the exact word “puncture.” The vector search found reviews about tire durability and longevity that describe the same concept using different words. This difference illustrates why embeddings are valuable for search.
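The cosine metric behind these results is easy to compute by hand: cosine distance is 1 minus the cosine of the angle between two vectors, so vectors pointing the same way score near 0 and opposite vectors score near 2. The pure-Python sketch below illustrates this with toy 3-dimensional vectors (real embeddings have 1536 dimensions); it is an illustration of the metric, not the engine's implementation:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, matching the 'cosine' metric of VECTOR_DISTANCE."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0, 0.0]
similar = [0.9, 0.1, 0.0]    # points almost the same way as the query
unrelated = [0.0, 1.0, 0.0]  # orthogonal to the query
opposite = [-1.0, 0.0, 0.0]  # points the other way

print(round(cosine_distance(query, query), 4))      # 0.0 (identical)
print(round(cosine_distance(query, similar), 4))    # small: very similar
print(round(cosine_distance(query, unrelated), 4))  # 1.0 (orthogonal)
print(round(cosine_distance(query, opposite), 4))   # 2.0 (opposite)
```

Ordering results by this distance, as the `ORDER BY Distance` clause does, therefore ranks the most semantically similar reviews first.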
Explore embedding maintenance
Embeddings are a snapshot of the source text at the time they were generated. When the source text changes, the embedding becomes stale and no longer reflects the current content. In this section, you observe this problem and practice regenerating embeddings.
-
Pick a review and note its current embedding distance from a known query:
```sql
-- Check distance of a specific review before update
DECLARE @searchVector VECTOR(1536);
SELECT @searchVector = AI_GENERATE_EMBEDDINGS('warm winter cycling gloves' USE MODEL my_embedding_model);

SELECT
    r.ReviewID,
    r.ReviewTitle,
    r.ReviewText,
    VECTOR_DISTANCE('cosine', @searchVector, r.ReviewVector) AS DistanceBefore
FROM dbo.ProductReview r
WHERE r.ReviewID = 124;
GO
```

📝 Review 124 is about warm winter gloves. Note the distance value. The embedding currently reflects this content, so the distance should be small.
-
Now update the review text to something completely different. Once the text changes, the stored embedding becomes stale because it still reflects the original content:
```sql
-- Change the review text without updating the embedding
UPDATE dbo.ProductReview
SET ReviewText = 'These tires are the most puncture-resistant tires I have ever used. Over 3000 miles with zero flats on rough gravel roads.',
    ReviewTitle = N'Indestructible tires'
WHERE ReviewID = 124;
GO
```

-
Check the distance again. The embedding still reflects the old glove review, but the text is now about tires:
```sql
-- Check distance after text change (embedding is stale)
DECLARE @searchVector VECTOR(1536);
SELECT @searchVector = AI_GENERATE_EMBEDDINGS('warm winter cycling gloves' USE MODEL my_embedding_model);

SELECT
    r.ReviewID,
    r.ReviewTitle,
    r.ReviewText,
    VECTOR_DISTANCE('cosine', @searchVector, r.ReviewVector) AS DistanceAfterTextChange
FROM dbo.ProductReview r
WHERE r.ReviewID = 124;
GO
```

📝 The distance is still small because the embedding has not been updated. The vector still represents “warm winter gloves” even though the text now describes puncture-resistant tires. A vector search for gloves would incorrectly return this tire review. This is why embedding maintenance is important.
-
Regenerate the embedding for the updated review:
```sql
-- Regenerate the embedding to match the updated text
UPDATE r
SET r.ReviewVector = AI_GENERATE_EMBEDDINGS(
        p.Name + ' - ' + r.ReviewTitle + ': ' + r.ReviewText
        USE MODEL my_embedding_model)
FROM dbo.ProductReview r
INNER JOIN SalesLT.Product p ON r.ProductID = p.ProductID
WHERE r.ReviewID = 124;
GO
```

-
Verify the distance now reflects the updated content:
```sql
-- Check distance after embedding regeneration
DECLARE @searchVector VECTOR(1536);
SELECT @searchVector = AI_GENERATE_EMBEDDINGS('warm winter cycling gloves' USE MODEL my_embedding_model);

SELECT
    r.ReviewID,
    r.ReviewTitle,
    r.ReviewText,
    VECTOR_DISTANCE('cosine', @searchVector, r.ReviewVector) AS DistanceAfterRegeneration
FROM dbo.ProductReview r
WHERE r.ReviewID = 124;
GO
```

📝 The distance should now be much larger because the embedding reflects the tire content, not the glove content. The vector and the text are back in sync. In production, you would automate this regeneration using triggers, Change Tracking, Change Data Capture, or an external process like Azure Functions. The right approach depends on how often your data changes and how quickly embeddings need to reflect those changes.
-
Restore the original review so the data is consistent for later labs:
```sql
-- Restore the original review text and regenerate the embedding
UPDATE dbo.ProductReview
SET ReviewTitle = N'Warm hands well below freezing',
    ReviewText = N'Rode through an entire winter with the Full-Finger Gloves and never had cold fingers even at minus 5 degrees with wind chill. The fleece lining is cozy without being bulky and I can still operate my brake levers precisely. Essential cold weather gear.'
WHERE ReviewID = 124;
GO

-- Regenerate the embedding for the restored text
UPDATE r
SET r.ReviewVector = AI_GENERATE_EMBEDDINGS(
        p.Name + ' - ' + r.ReviewTitle + ': ' + r.ReviewText
        USE MODEL my_embedding_model)
FROM dbo.ProductReview r
INNER JOIN SalesLT.Product p ON r.ProductID = p.ProductID
WHERE r.ReviewID = 124;
GO
```
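One common way to automate the regeneration practiced above is to store a hash of the exact text that was embedded alongside the vector; whenever the current text hashes differently, the embedding is known to be stale. The Python sketch below illustrates the idea with an in-memory row and a hypothetical `embed` stand-in for `AI_GENERATE_EMBEDDINGS` (the field names and helper are illustrative, not part of the lab schema):

```python
import hashlib

def content_hash(text):
    """Hash of the exact text that was embedded, stored next to the vector."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def embed(text):
    # Hypothetical stand-in for AI_GENERATE_EMBEDDINGS; returns a toy vector.
    return [float(len(text)), 0.0, 0.0]

def refresh_if_stale(row):
    """Regenerate the embedding only when the source text has changed."""
    current = content_hash(row["text"])
    if row.get("embedded_hash") != current:
        row["vector"] = embed(row["text"])
        row["embedded_hash"] = current
        return True   # embedding was regenerated
    return False      # embedding was already in sync

row = {"text": "Warm winter gloves", "vector": None, "embedded_hash": None}
print(refresh_if_stale(row))  # True: first embedding generated
print(refresh_if_stale(row))  # False: text unchanged, vector still valid

row["text"] = "Indestructible tires"   # simulate the UPDATE in this exercise
print(refresh_if_stale(row))  # True: text changed, embedding regenerated
```

In a database, the same check could be driven by a persisted hash column compared in a scheduled job, or by the Change Tracking and trigger approaches the exercise mentions; the hash simply makes staleness detectable without re-reading old text.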
Cleanup
If you aren’t using the Azure SQL Database or the Azure OpenAI resources for any other purpose, you can clean up the resources you created in this exercise.
📝 These resources are used in labs 9, 10, and 11. Keep them if you plan to complete those labs.
If you provisioned a new resource group for this lab, you can simply delete the entire resource group to remove all resources at once. If you used an existing resource group, delete the Azure SQL Database and Azure OpenAI resource individually.
- In the Azure portal, navigate to your resource group.
- Select Delete resource group and confirm deletion by typing the resource group name.
- Select Delete to remove all resources created in this lab.
You successfully completed this exercise.
In this exercise, you learned how to store and generate vector embeddings in Azure SQL Database. You added a vector column and generated embeddings individually and in batches with retry logic. You inspected vector metadata such as dimensions and storage size. You validated embeddings with a similarity search and compared the results to a traditional keyword search. Finally, you observed how embeddings become stale when source text changes and how to regenerate them to restore consistency.