Move multiple documents in bulk with the Azure Cosmos DB for NoSQL SDK

The easiest way to learn how to perform a bulk operation is to attempt to push many documents to an Azure Cosmos DB for NoSQL account in the cloud. Using the bulk features of the SDK, this can be done with some minor help from the System.Threading.Tasks namespace.

In this lab, you’ll use the Bogus library from NuGet to generate fictional data and place that into an Azure Cosmos DB account.
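Before touching Azure, it may help to see the task pattern this lab relies on in isolation. The sketch below does not use the Cosmos DB SDK at all: SimulatedInsertAsync is a hypothetical stand-in for a single write, and the count of 100 is arbitrary. It only illustrates starting many operations without awaiting each one, then awaiting the whole set once.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for one SDK write; it only simulates a little latency.
static async Task<int> SimulatedInsertAsync(int id)
{
    await Task.Delay(10);
    return id;
}

// Start every operation without awaiting it, keeping the tasks in a list.
List<Task<int>> pending = new();
for (int i = 0; i < 100; i++)
{
    pending.Add(SimulatedInsertAsync(i));
}

// Await the whole set once; results are returned in the order the tasks were added.
int[] results = await Task.WhenAll(pending);

Console.WriteLine($"Completed {results.Length} simulated inserts");
// prints "Completed 100 simulated inserts"
```

Because no task is awaited inside the loop, all 100 simulated operations are in flight at the same time, which is the behavior the SDK's bulk mode is designed to take advantage of.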

Prepare your development environment

If you have not already cloned the lab code repository for DP-420 to the environment where you’re working on this lab, follow these steps to do so. Otherwise, open the previously cloned folder in Visual Studio Code.

  1. Start Visual Studio Code.

    📝 If you are not already familiar with the Visual Studio Code interface, review the Getting Started documentation.

  2. Open the command palette and run Git: Clone to clone the https://github.com/microsoftlearning/dp-420-cosmos-db-dev GitHub repository in a local folder of your choice.

    💡 You can use the CTRL+SHIFT+P keyboard shortcut to open the command palette.

  3. Once the repository has been cloned, open the local folder you selected in Visual Studio Code.

Create an Azure Cosmos DB for NoSQL account and configure the SDK project

  1. In a new web browser window or tab, navigate to the Azure portal (portal.azure.com).

  2. Sign into the portal using the Microsoft credentials associated with your subscription.

  3. Select + Create a resource, search for Cosmos DB, and then create a new Azure Cosmos DB for NoSQL account resource with the following settings, leaving all remaining settings to their default values:

    | Setting | Value |
    | --- | --- |
    | Subscription | Your existing Azure subscription |
    | Resource group | Select an existing or create a new resource group |
    | Account Name | Enter a globally unique name |
    | Location | Choose any available region |
    | Capacity mode | Provisioned throughput |
    | Apply Free Tier Discount | Do Not Apply |

    📝 Your lab environments may have restrictions preventing you from creating a new resource group. If that is the case, use the existing pre-created resource group.

  4. Wait for the deployment task to complete before continuing with this task.

  5. Go to the newly created Azure Cosmos DB account resource and navigate to the Keys pane.

  6. This pane contains the connection details and credentials necessary to connect to the account from the SDK. Specifically:

    1. Notice the URI field. You will use this endpoint value later in this exercise.

    2. Notice the PRIMARY KEY field. You will use this key value later in this exercise.

  7. Still within the newly created Azure Cosmos DB account resource, navigate to the Data Explorer pane.

  8. In the Data Explorer, select New Container, and then create a new container with the following settings, leaving all remaining settings to their default values:

    | Setting | Value |
    | --- | --- |
    | Database id | Create new: cosmicworks |
    | Share throughput across containers | Do not select |
    | Container id | products |
    | Partition key | /categoryId |
    | Container throughput | Autoscale: 4000 |

  9. Return to Visual Studio Code.

  10. In the Explorer pane, browse to the 08-sdk-bulk folder.

  11. Open the script.cs code file within the 08-sdk-bulk folder.

    📝 The Microsoft.Azure.Cosmos library (nuget.org/packages/microsoft.azure.cosmos/3.22.1) has already been imported from NuGet.

  12. Locate the string variable named endpoint. Set its value to the endpoint of the Azure Cosmos DB account you created earlier.

     string endpoint = "<cosmos-endpoint>";
    

    📝 For example, if your endpoint is: https://dp420.documents.azure.com:443/, then the C# statement would be: string endpoint = "https://dp420.documents.azure.com:443/";.

  13. Locate the string variable named key. Set its value to the key of the Azure Cosmos DB account you created earlier.

     string key = "<cosmos-key>";
    

    📝 For example, if your key is: fDR2ci9QgkdkvERTQ==, then the C# statement would be: string key = "fDR2ci9QgkdkvERTQ==";.

  14. Save the script.cs code file.

  15. Open the context menu for the 08-sdk-bulk folder and then select Open in Integrated Terminal to open a new terminal instance.

    📝 This command will open the terminal with the starting directory already set to the 08-sdk-bulk folder.

  16. Add the Microsoft.Azure.Cosmos package (nuget.org/packages/microsoft.azure.cosmos/3.22.1) from NuGet using the following command:

     dotnet add package Microsoft.Azure.Cosmos --version 3.22.1
    
  17. Build the project using the dotnet build command:

     dotnet build
    
  18. Close the integrated terminal.

Bulk insert twenty-five thousand documents

Let’s “go for the gusto” and try to insert a lot of documents to see how this works. In our internal testing, this can take approximately 1-2 minutes if the lab virtual machine and Azure Cosmos DB for NoSQL account are relatively close to each other geographically speaking.

  1. Return to the editor tab for the script.cs code file.

  2. Create a new instance of the CosmosClientOptions class named options with the AllowBulkExecution property set to a value of true:

     CosmosClientOptions options = new () 
     { 
         AllowBulkExecution = true 
     };
    
  3. Create a new instance of the CosmosClient class named client passing in the endpoint, key, and options variables as constructor parameters:

     CosmosClient client = new (endpoint, key, options); 
    
  4. Use the GetContainer method of the client variable to retrieve the existing container using the name of the database (cosmicworks) and the name of the container (products):

     Container container = client.GetContainer("cosmicworks", "products");
    
  5. Use this special sample code to generate 25,000 fictitious products using the Faker class from the Bogus library imported from NuGet.

     List<Product> productsToInsert = new Faker<Product>()
         .StrictMode(true)
         .RuleFor(o => o.id, f => Guid.NewGuid().ToString())
         .RuleFor(o => o.name, f => f.Commerce.ProductName())
         .RuleFor(o => o.price, f => Convert.ToDouble(f.Commerce.Price(max: 1000, min: 10, decimals: 2)))
         .RuleFor(o => o.categoryId, f => f.Commerce.Department(1))
         .Generate(25000);
    

    💡 The Bogus library is an open-source library used to design fictitious data to test user interface applications and is great for learning how to develop bulk import/export applications.

  6. Create a new generic List<> of type Task named concurrentTasks:

     List<Task> concurrentTasks = new List<Task>();
    
  7. Create a foreach loop that will iterate over the list of products that was generated earlier in this application:

     foreach(Product product in productsToInsert)
     {
     }
    
  8. Within the foreach loop, create a Task to asynchronously insert a product into Azure Cosmos DB for NoSQL, being sure to explicitly specify the partition key and to add the task to the list of tasks named concurrentTasks:

     concurrentTasks.Add(
         container.CreateItemAsync(product, new PartitionKey(product.categoryId))
     );   
    
  9. After the foreach loop, asynchronously await the result of Task.WhenAll on the concurrentTasks variable:

     await Task.WhenAll(concurrentTasks);
    
  10. Use the built-in Console.WriteLine static method to print a static message of Bulk tasks complete to the console:

     Console.WriteLine("Bulk tasks complete");
    
  11. Once you are done, your code file should now include:

     using System;
     using System.Collections.Generic;
     using System.Threading.Tasks;
     using Bogus;
     using Microsoft.Azure.Cosmos;
        
     string endpoint = "<cosmos-endpoint>";
     string key = "<cosmos-key>";
        
     CosmosClientOptions options = new () 
     { 
         AllowBulkExecution = true 
     };
        
     CosmosClient client = new (endpoint, key, options);  
        
     Container container = client.GetContainer("cosmicworks", "products");
        
     List<Product> productsToInsert = new Faker<Product>()
         .StrictMode(true)
         .RuleFor(o => o.id, f => Guid.NewGuid().ToString())
         .RuleFor(o => o.name, f => f.Commerce.ProductName())
         .RuleFor(o => o.price, f => Convert.ToDouble(f.Commerce.Price(max: 1000, min: 10, decimals: 2)))
         .RuleFor(o => o.categoryId, f => f.Commerce.Department(1))
         .Generate(25000);
            
     List<Task> concurrentTasks = new List<Task>();
        
     foreach(Product product in productsToInsert)
     {    
         concurrentTasks.Add(
             container.CreateItemAsync(product, new PartitionKey(product.categoryId))
         );
     }
        
     await Task.WhenAll(concurrentTasks);   
    
     Console.WriteLine("Bulk tasks complete");
    
  12. Save the script.cs code file.

  13. In Visual Studio Code, open the context menu for the 08-sdk-bulk folder and then select Open in Integrated Terminal to open a new terminal instance.

  14. Build and run the project using the dotnet run command:

     dotnet run
    
  15. The application should take approximately one to two minutes to run. It produces no output while the inserts are in progress, then prints Bulk tasks complete when it finishes.

  16. Close the integrated terminal.
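The program above awaits Task.WhenAll directly, so if any insert fails (for example, because of throttling), the await rethrows only the first exception and the remaining failures go unreported. One way to see how many operations succeeded is sketched below. RunAllAsync is a hypothetical helper, not part of the lab code or the SDK, and the demo uses a fake operation that rejects every tenth item so the block runs without a Cosmos DB account.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper (an assumption, not lab or SDK code): starts one operation
// per item, waits for all of them, and counts the outcomes.
static async Task<(int Succeeded, int Failed)> RunAllAsync<T>(
    IEnumerable<T> items, Func<T, Task> operation)
{
    // Start every operation up front so they can run concurrently.
    List<Task> tasks = items.Select(operation).ToList();

    try
    {
        await Task.WhenAll(tasks);
    }
    catch (Exception)
    {
        // Awaiting WhenAll surfaces only the first failure; each task is inspected below.
    }

    int succeeded = tasks.Count(t => t.IsCompletedSuccessfully);
    int failed = tasks.Count(t => t.IsFaulted || t.IsCanceled);
    return (succeeded, failed);
}

// Demo with a fake operation that fails for every tenth item.
var (ok, bad) = await RunAllAsync(Enumerable.Range(1, 50), async n =>
{
    await Task.Delay(1);
    if (n % 10 == 0)
    {
        throw new InvalidOperationException($"Item {n} rejected");
    }
});

Console.WriteLine($"{ok} succeeded, {bad} failed");
// prints "45 succeeded, 5 failed"
```

In the lab's script, the same helper could presumably be called with productsToInsert as the items and p => container.CreateItemAsync(p, new PartitionKey(p.categoryId)) as the operation, replacing the foreach loop and the direct await on Task.WhenAll.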

Observe the results

Now that you have sent 25,000 items to Azure Cosmos DB, let’s look at the Data Explorer.

  1. Return to the web browser and navigate to the Data Explorer pane.

  2. In the Data Explorer, expand the cosmicworks database node, then observe the products container node within the NoSQL API navigation tree.

  3. Expand the products node, and then select the Items node. Observe the list of items within your container.

  4. Select the products container node within the NoSQL API navigation tree, and then select New SQL Query.

  5. Delete the contents of the editor area.

  6. Create a new SQL query that will return a count of all documents created using the bulk operation:

     SELECT COUNT(1) FROM items
    
  7. Select Execute Query.

  8. Observe the count of the items in your container.

  9. Close your web browser window or tab.