Migrate existing data using Azure Data Factory

In Azure Data Factory, Azure Cosmos DB is supported both as a source of data ingestion and as a target (sink) for data output.

In this lab, we will populate Azure Cosmos DB using a helpful command-line utility and then use Azure Data Factory to move a subset of data from one container to another.

Create and seed your Azure Cosmos DB for NoSQL account

You will use a command-line utility that creates a cosmicworks database and a products container at 4,000 request units per second (RU/s). Once created, you will adjust the throughput down to 400 RU/s.
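Request units are a rate budget: each operation consumes some number of RUs, and the container can sustain whatever mix of operations fits under the provisioned RU/s. A back-of-envelope sketch of the two throughput levels used in this lab (the 1 RU cost of a 1 KB point read is the documented baseline; treat the arithmetic as approximate):

```python
# Back-of-envelope RU math. The ~1 RU cost for a 1 KB point read is the
# documented baseline; real workloads mix dearer operations (queries, writes).
seeded_rus = 4000        # throughput the cosmicworks tool provisions
scaled_down_rus = 400    # throughput after you reduce it later in this lab
point_read_cost_ru = 1   # approximate cost of a 1 KB point read

# Maximum sustained point reads per second at each throughput level.
print(seeded_rus // point_read_cost_ru)
print(scaled_down_rus // point_read_cost_ru)
```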

To accompany the products container, you will create a flatproducts container manually that will be the target of the ETL transformation and load operation at the end of this lab.

  1. In a new web browser window or tab, navigate to the Azure portal (portal.azure.com).

  2. Sign into the portal using the Microsoft credentials associated with your subscription.

  3. Select + Create a resource, search for Cosmos DB, and then create a new Azure Cosmos DB for NoSQL account resource with the following settings, leaving all remaining settings to their default values:

    | Setting | Value |
    | ------- | ----- |
    | Subscription | Your existing Azure subscription |
    | Resource group | Select an existing or create a new resource group |
    | Account Name | Enter a globally unique name |
    | Location | Choose any available region |
    | Capacity mode | Provisioned throughput |
    | Apply Free Tier Discount | Do Not Apply |
    | Limit the total amount of throughput that can be provisioned on this account | Unchecked |

    📝 Your lab environments may have restrictions preventing you from creating a new resource group. If that is the case, use the existing pre-created resource group.

  4. Wait for the deployment task to complete before continuing with this task.

  5. Go to the newly created Azure Cosmos DB account resource and navigate to the Keys pane.

  6. This pane contains the connection details and credentials necessary to connect to the account from the SDK. Specifically:

    1. Notice the URI field. You will use this endpoint value later in this exercise.

    2. Notice the PRIMARY KEY field. You will use this key value later in this exercise.

  7. Keep the browser tab open, as we will return to it later.

  8. Start Visual Studio Code.

    📝 If you are not already familiar with the Visual Studio Code interface, review the Get Started guide for Visual Studio Code.

  9. In Visual Studio Code, open the Terminal menu and then select New Terminal to open a new terminal instance.

  10. Install the cosmicworks command-line tool for global use on your machine.

     dotnet tool install cosmicworks --global --version 1.*
    

    💡 This command may take a couple of minutes to complete. It will output the warning message *Tool 'cosmicworks' is already installed* if you have already installed the latest version of this tool in the past.

  11. Run cosmicworks to seed your Azure Cosmos DB account with the following command-line options:

    | Option | Value |
    | ------ | ----- |
    | --endpoint | The endpoint value you checked earlier in this lab |
    | --key | The key value you checked earlier in this lab |
    | --datasets | product |

     cosmicworks --endpoint <cosmos-endpoint> --key <cosmos-key> --datasets product
    

    📝 For example, if your endpoint is: https://dp420.documents.azure.com:443/ and your key is: fDR2ci9QgkdkvERTQ==, then the command would be: cosmicworks --endpoint https://dp420.documents.azure.com:443/ --key fDR2ci9QgkdkvERTQ== --datasets product

  12. Wait for the cosmicworks command to finish populating the account with a database, container, and items.

  13. Close the integrated terminal.

  14. Switch back to the web browser, open a new tab and navigate to the Azure portal (portal.azure.com).

  15. Select Resource groups, then select the resource group you created or viewed earlier in this lab, and then select the Azure Cosmos DB account resource you created in this lab.

  16. Within the Azure Cosmos DB account resource, navigate to the Data Explorer pane.

  17. In the Data Explorer, expand the cosmicworks database node, expand the products container node, and then select Items.

  18. Observe and select the various JSON items in the products container. These are the items created by the command-line tool used in previous steps.

  19. Select the Scale & Settings node. In the Scale & Settings tab, select Manual, update the required throughput setting from 4000 RU/s to 400 RU/s, and then Save your changes.

  20. In the Data Explorer pane, select New Container.

  21. In the New Container popup, enter the following values for each setting, and then select OK:

    | Setting | Value |
    | ------- | ----- |
    | Database id | Use existing \| cosmicworks |
    | Container id | flatproducts |
    | Partition key | /category |
    | Container throughput (autoscale) | Manual |
    | RU/s | 400 |

  22. Back in the Data Explorer pane, expand the cosmicworks database node and then observe the flatproducts container node within the hierarchy.

  23. Return to the Home page of the Azure portal.

Create an Azure Data Factory resource

Now that the Azure Cosmos DB for NoSQL resources are in place, you will create an Azure Data Factory resource and configure all of the necessary components and connections to perform a one-time data movement that extracts data from one API for NoSQL container, transforms it, and loads it into another.
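Conceptually, the copy activity you are about to configure is a small extract-transform-load pass. It can be sketched in Python with in-memory lists standing in for the two containers (the item values and the helper name are illustrative, not from the lab):

```python
# Hypothetical in-memory stand-ins for the two containers; the real copy
# activity reads from and writes to Azure Cosmos DB for NoSQL containers.
products = [
    {"id": "1", "name": "HL Headset", "categoryName": "Accessories, Headsets", "price": 29.99},
    {"id": "2", "name": "LL Road Frame", "categoryName": "Components, Road Frames", "price": 333.42},
]

def transform(item: dict) -> dict:
    # Keep name and price, and rename categoryName to category.
    return {"name": item["name"], "category": item["categoryName"], "price": item["price"]}

flatproducts = []                             # stand-in for the flatproducts container
for item in products:                         # extract each source item
    flatproducts.append(transform(item))      # transform, then load into the sink

print(flatproducts[0])
```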

  1. Select + Create a resource, search for Data Factory, and then create a new Data Factory resource with the following settings, leaving all remaining settings to their default values:

    | Setting | Value |
    | ------- | ----- |
    | Subscription | Your existing Azure subscription |
    | Resource group | Select an existing or create a new resource group |
    | Name | Enter a globally unique name |
    | Region | Choose any available region |
    | Version | V2 |
    | Git configuration | Configure Git later |

    📝 Your lab environments may have restrictions preventing you from creating a new resource group. If that is the case, use the existing pre-created resource group.

  2. Wait for the deployment task to complete before continuing with this task.

  3. Go to the newly created Data Factory resource and select Launch studio.

    💡 Alternatively, you can navigate to the Azure Data Factory studio (adf.azure.com/home), select your newly created Data Factory resource, and then select the home icon.

  4. From the home screen, select the Ingest option to begin the quick wizard to perform a one-time copy data at scale operation, and move to the Properties step of the wizard.

  5. Starting with the Properties step of the wizard, in the Task type section, select Built-in copy task.

  6. In the Task cadence or task schedule section, select Run once now and then select Next to move to the Source step of the wizard.

  7. In the Source step of the wizard, in the Source type list, select Azure Cosmos DB for NoSQL.

  8. In the Connection section, select + New connection.

  9. In the New connection (Azure Cosmos DB for NoSQL) popup, configure the new connection with the following values, and then select Create:

    | Setting | Value |
    | ------- | ----- |
    | Name | CosmosSqlConn |
    | Connect via integration runtime | AutoResolveIntegrationRuntime |
    | Authentication method | Account key \| Connection string |
    | Account selection method | From Azure subscription |
    | Azure subscription | Your existing Azure subscription |
    | Azure Cosmos DB account name | The Azure Cosmos DB account name you chose earlier in this lab |
    | Database name | cosmicworks |

  10. Back in the Source data store section, within the Source tables section, select Use query.

  11. In the Table name list, select products.

  12. In the Query editor, delete the existing content and enter the following query:

     SELECT 
         p.name, 
         p.categoryName as category, 
         p.price 
     FROM 
         products p
    
  13. Select Preview data to test the query’s validity. Select Next to move to the Destination step of the wizard.
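The query keeps only three fields and uses the AS alias to rename categoryName to category, so each copied document carries the category property that the flatproducts container uses as its /category partition key. A sketch of the reshaping for one item (the item values are illustrative):

```python
# A sample item shaped like the cosmicworks products (values illustrative).
source_item = {
    "id": "a1",
    "name": "HL Headset",
    "categoryName": "Accessories, Headsets",
    "price": 29.99,
}

# The projection the source query applies to every item: three fields kept,
# categoryName renamed to category via the AS alias.
flat_item = {
    "name": source_item["name"],
    "category": source_item["categoryName"],
    "price": source_item["price"],
}

print(flat_item)
```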

  14. In the Destination step of the wizard, in the Destination type list, select Azure Cosmos DB for NoSQL.

  15. In the Connection list, select CosmosSqlConn.

  16. In the Target list, select flatproducts and then select Next to move to the Settings step of the wizard.

  17. In the Settings step of the wizard, in the Task name field, enter FlattenAndMoveData.

  18. Leave all remaining fields to their default blank values and then select Next to move to the final step of the wizard.

  19. Review the Summary of the steps you have selected in the wizard and then select Next.

  20. Observe the various steps in the deployment. When the deployment has finished, select Finish.

  21. Return to the browser tab that has your Azure Cosmos DB account and navigate to the Data Explorer pane.

  22. In the Data Explorer, expand the cosmicworks database node, select the flatproducts container node, and then select New SQL Query.

  23. Delete the contents of the editor area.

  24. Create a new SQL query that will return all documents where the name is equivalent to HL Headset:

     SELECT 
         p.name, 
         p.category, 
         p.price 
     FROM
         flatproducts p
     WHERE
         p.name = 'HL Headset'
    
  25. Select Execute Query.

  26. Observe the results of the query.
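The WHERE clause performs a simple equality filter on name, so the results should contain only the flattened HL Headset document. The same filter expressed over in-memory items (sample values assumed; only the document shape matters):

```python
# Items shaped like the documents the copy task wrote to flatproducts
# (values illustrative).
flatproducts = [
    {"name": "HL Headset", "category": "Accessories, Headsets", "price": 29.99},
    {"name": "LL Road Frame", "category": "Components, Road Frames", "price": 333.42},
]

# Equivalent of: SELECT p.name, p.category, p.price FROM flatproducts p
#                WHERE p.name = 'HL Headset'
results = [p for p in flatproducts if p["name"] == "HL Headset"]
print(results)
```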

  27. Close your web browser window or tab.