Lab 04 - Enhance workload resiliency by using Traffic Manager and test resiliency by using Azure Chaos Studio

Estimated timing: 35 minutes

Scenario

Remember this module’s scenario in which you work for a software development company in the retail industry that is experiencing frequent downtime and performance issues with their new online store application. Since you have decided to enhance workload and test resiliency using Traffic Manager and Azure Chaos Studio respectively, this lab gives you the opportunity to implement a Traffic Manager profile and validate Traffic Manager functionality, configure Azure Chaos Studio environment, and implement and validate an experiment.

Objectives

In this lab, you will:

  • Prepare the Azure subscription for the lab
  • Enhance workload resiliency by using Traffic Manager
  • Test workload resiliency by using Azure Chaos Studio
  • Remove the resources used in the labs

Note: In the previous lab, you deployed two instances of a .NET web app into two Azure region. In this lab, you will first create a resilient configuration that implements the load balancing functionality of Azure Traffic Manager between the two web app instances. Next, you will use Azure Chaos Studio to trigger a failure of one of the apps to test the resiliency of the load balancing implementation.

Note: For this lab, use the same GitHub account you have been using in all previous labs.

Prerequisites

Exercise 0: Prepare the Azure subscription for the lab

Note: If you are using a new Azure subscription, some of the Azure resource providers might not be registered. Since the registration process might take a few minutes, to minimize the wait time, you will register them at the beginning of this lab.

Note: In this lab, you will be using Azure Chaos Studio. If this is the first time you implement Azure Chaos Studio in your subscription, you must first register the corresponding resource provider.

  1. Start a web browser and navigate to the Azure portal at https://portal.azure.com.
  2. If prompted, sign in by using your Microsoft Entra ID account with the Owner access to the Azure subscription you used in the previous lab.
  3. In the web browser tab displaying the Azure portal, in the search text box at the top of the page, enter Subscriptions and, in the list of results, select Subscriptions.
  4. On the subscriptions page, in the vertical menu on the left side, select Resource providers.
  5. In the list of resource providers, search for and select Microsoft.Chaos.
  6. With the Microsoft.Chaos resource provider selected, in the toolbar, select Register.

    Note: Don’t wait until the registration complete but proceed directly to the next exercise. The registration might take a couple of minutes.

Exercise 1: Enhance workload resiliency by using Traffic Manager

In this exercise, you will implement a resilient configuration that distributes requests between the two .NET web app instances in two different Azure regions by using Azure Traffic Manager.

Note: For the purpose of our lab, we will consider the deployment of .NET-based web app eShopOnWeb in the East US region as the primary instance. While in this case such consideration is purely arbitrary (and used for demonstration purposes only), there might be scenarios where it might be beneficial to prioritize one of the endpoints.

The exercise consists of the following tasks:

  • Task 1: Implement a Traffic Manager profile
  • Task 2: Validate Traffic Manager functionality

Task 1: Implement a Traffic Manager profile

  1. In the web browser tab displaying the Azure portal, in the search text box at the top of the page, enter Traffic Manager profiles and, in the list of results, select Traffic Manager profiles.
  2. On the Load balancing | Traffic Manager page, select + Create.
  3. On the Create Traffic Manager profile page, perform the following actions:

    • In the Name text box, enter devopsfoundationstmprofile.

      Note: The name of the Traffic Manager profile must be globally unique. If you receive an error message indicating that the name is already in use, try a different name and make sure to record it. You will need it throughout this lab.

    • In the Routing method drop-down list, select Priority.

    Note: We choose the priority routing method to reflect the somewhat arbitrary assumption that all of requests should be handled by the Azure App Service web app in the East US.

    • Verify that your Azure subscription appears in the Subscription drop-down list
    • Select the Create new link below the Resource Group drop-down list, in the Name text box, enter devops-foundations-rg, and then select OK.
    • In the Resource group location drop-down list, select the same Azure region you chose in the previous labs of this course.
  4. Select Create to start the provisioning process.

    Note: Wait for the deployment to complete. This should complete within a minute.

  5. On the Load balancing | Traffic Manager page, if necessary, select Refresh and then select devopsfoundationstmprofile.
  6. On the devopsfoundationstmprofile page, in the Essentials section, copy the value of the DNS name setting, and record it. You will need it throughout this lab.
  7. On the devopsfoundationstmprofile page, in the left navigation menu, in the Settings section, select Configuration.
  8. Review the content of the devopsfoundationstmprofile | Configuration page. Note that, by default, DNS time to live (TTL) is set to 60 seconds. Change the value to 5 seconds.

    Note: Lowering the value of TTL will accelerate the failover time.

    Note: In addition to the value of TTL, probing interval, probe timeout, and tolerated number of failures affect the amount of time that an endpoint failure is detected.

  9. On the devopsfoundationstmprofile | Configuration page, in the Protocol drop-down list, select HTTPS and, in the Port text box, enter 443.

    Note: Both of the Azure App Service web apps that will be configured as endpoints are accessible via HTTPS on the default TCP port 443.

  10. On the devopsfoundationstmprofile | Configuration page, select Save.
  11. On the devopsfoundationstmprofile page, in the left navigation menu, in the Settings section, select Endpoints.
  12. On the devopsfoundationstmprofile | Endpoints** page, select + Add.

    Note: You will first add an endpoint representing the Azure App Service web app.

  13. In the Add endpoint pane, perform the following actions:

    • Ensure that Azure endpoint appears in the Type drop-down list.
    • In the Name text box, enter DevOps Foundations web app - East US.
    • Ensure that the Enable Endpoint checkbox is selected.
    • In the Target resource type drop-down list, select App Service.
    • In the Target resource drop-down list, in the rg-eshoponweb-eastus section, select the name of the App Service web app in the East US Azure region.
    • In the Priority text box, enter 1.

    Note: A lower priority value designates a higher precedence. In this case, all requests will be sent to the web app instance in the East US region, as long as that instance remains available.

    • Ensure that the Health Checks are enabled.
  14. Select Add.

  15. Back on the devopsfoundationstmprofile | Endpoints** page, select + Add.

    Note: Next, you will add an endpoint representing the other Azure App Service web app deployed to the West Europe Azure region.

  16. In the Add endpoint pane, perform the following actions:

    • Ensure that Azure endpoint appears in the Type drop-down list.
    • In the Name text box, enter DevOps Foundations web app - West Europe.
    • Ensure that the Enable Endpoint checkbox is selected.
    • In the Target resource type drop-down list, select App Service.
    • In the Target resource drop-down list, in the rg-eshoponweb-westeurope section, select the name of the App Service web app in the West Europe Azure region.
    • In the Priority text box, enter 2.
    • Ensure that the Health Checks are enabled.
  17. Select Add.
  18. Back on the devopsfoundationstmprofile | Endpoints** page, verify that both endpoints are listed with the Online entry in the Monitor status column.

    Note: You might need to wait a couple of minutes and select Refresh before the status entry is up-to-date.

Task 2: Validate Traffic Manager functionality

  1. Start a web browser window and navigate to the DNS name associated with the Traffic Manager profile.
  2. Verify that the web browser displays the familiar page of the web site hosted by the Azure App Service web app you deployed in the previous lab.
  3. Close the newly opened web browser window.
  4. Switch over to the web browser window displaying the Azure portal.
  5. In the Azure portal, select the Azure Cloud Shell icon to the right of the search text box.
  6. If necessary, select Bash in the drop-down menu in the upper-left corner of the Cloud Shell pane.
  7. In the Bash session within the Cloud Shell pane, run the following command to resolve the DNS name of the Traffic Manager profile you created in the previous task to one of two corresponding endpoints (replace the placeholder <tm_profile> with the DNS name of the Traffic Manager profile, for example, devopsfoundationstmprofile.trafficmanager.net), but make sure to remove the leading https:\\:

    nslookup <tm_profile>
    
  8. Rerun the same command a few times, waiting for a bit more than 5 seconds between each invocation (to eliminate the possibility of DNS caching) and note that the output in each case references the DNS name of the web app in the East US Azure region.

Exercise 2: Test workload resiliency by using Azure Chaos Studio

In this exercise, you will test workload resiliency by using Azure Chaos Studio.

Note: This exercise illustrates the use of Azure Chaos Studio. The purpose of Azure Chaos Studio is to assist with measuring, understanding, and building application and service resilience. This is accomplished by intentionally disrupting workloads in order to identify resiliency gaps and implement corresponding mitigation proactively, rather than reactively.

The exercise consists of the following tasks:

  • Task 1: Configure Azure Chaos Studio environment
  • Task 2: Implement an experiment
  • Task 3: Validate the experiment

Task 1: Configure Azure Chaos Studio environment

  1. In the web browser tab displaying the Azure portal, in the search text box at the top of the page, enter Chaos Studio and, in the list of results, select Chaos Studio.
  2. On the Chaos Studio page, select Targets.
  3. On the Chaos Studio | Targets page, select the Azure App Service web app instance in the rg-eshoponweb-eastus resource group in the East US region you deployed in the previous lab.
  4. In the toolbar, select the Enable targets drop-down list header and, in the drop-down list select Enable service-direct targets (All resources).
  5. On the Enable service direct targets page, ensure that the correct App Service web app instance is listed, select Review + Enable, and then select Enable.

    Note: Wait for the target to be enabled. This should take less than a minute.

Task 2: Implement an experiment

  1. In the Azure portal, in the search text box at the top of the page, enter Chaos Studio and, in the list of results, select Chaos Studio.
  2. On the Chaos Studio page, select Experiments.
  3. On the Experiments page, select + Create and then, in the drop-down list, select New experiment.
  4. On the Basics tab of the Create an experiment page, perform the following actions:

    • Verify that your Azure subscription appears in the Subscription drop-down list.
    • Select the Create new link below the Resource Group drop-down list, in the Name text box, enter devops-foundations-rg, and then select OK.
    • In the Experiment details section, in the Name text box, enter DevOps_Foundations_Labs_Experiment_01.
    • In the Region drop-down list, select the West Europe Azure region.

    Note: You could potentially choose any Azure region, but considering that you are testing failures to a resource in the West Europe region, any region other than East US seems more appropriate.

  5. On the Basics tab of the Create an experiment page, select Next: Permissions >.
  6. On the Permissions tab, accept the default option System assigned identity and then select Next: Experiment designer >.
  7. On the Experiment designer tab, perform the following actions:

    • In the Step text box, enter Step 1: Failover an App Service web app.
    • In the Branch text box, enter Branch 1: Emulate an App Service failure.
    • Select + Add action and, in the drop-down list, select Add fault.
  8. On the Fault details tab of the Add fault pane, in the Faults drop-down list, select Stop App Service and set the value of Duration (minutes) to 10.
  9. On the Fault details tab of the Add fault pane, select Next: Target resources >.
  10. On the Target resources tab of the Add fault pane, ensure that the option Manually select from a list is selected, select the App Service web app you added as a target to the Chaos Studio environment, and then select Add.
  11. Back on the Experiment designer tab of the Create an experiment page, select Review + create.
  12. On the Review + create tab of the Create an experiment page, select Create.

    Note: Wait for the experiment to be created. This should take less than a minute.

    Note: For the experiment to succeed, you need to also grant the newly created managed service account permissions sufficient to stop the Azure App Service web app. We will leverage for this purpose the built-in Azure Contributor role, but you could create a custom role if you want to follow the principle of least privilege.

  13. In the Azure portal, in the search text box at the top of the page, enter App Services and select App Services in the list of results.
  14. On the App Services page, select the Azure App Service web app in the East US region you deployed in the previous lab.
  15. On the web app page, in the vertical menu on the left side, select Access control (IAM).
  16. On the web app’s Access control (IAM) page, select + Add and, in the drop-down list, select Add role assignment.
  17. On the Role tab of the Add role assignment page, select Privileged administrator roles, in the list of roles, select Contributor, and then select Next.
  18. On the Members tab of the Add role assignment page, set the Assigned access to option to Managed identity and then select + Select Members
  19. In the Select managed identities pane, in the Managed identities drop-down list, select Chaos Experiment, in the list of managed identities, select the one you created earlier in this task, and then select Select.
  20. Back on the Members tab of the Add role assignment page, select Review + assign and then again select Review + assign.

Task 3: Validate the experiment

  1. In the Azure portal, navigate back to the Chaos Studio page and, in the vertical menu on the left side, select Experiments.
  2. In the list of experiments, select DevOps_Foundations_Labs_Experiment_01.
  3. On the DevOps_Foundations_Labs_Experiment_01 page, select Start and, when prompted for confirmation, select OK.
  4. Wait until the status of the experiment changes to Running.
  5. On the DevOps_Foundations_Labs_Experiment_01 page, in the History section, select the Details link in the entry representing the current experiment execution.
  6. On the DevOps_Foundations_Labs_Experiment_01 experiment details page, review the listing containing the step, branch, and action associated with the experiment.

    Note: You might need to select Refresh toolbar entry to update the status of the listing.

  7. Select the Action entry to display the Fault details pane.
  8. In the Running targets section, select the entry representing the Azure App Service web app.
  9. On the web app page, in the Essential section, note that the status of the web app is listed as Stopped.

    Note: You might need to select Refresh toolbar entry to update the status of the Web App.

    Note: Now let’s verify whether Traffic Manager is successfully redirecting requests targeting its profile to the endpoint representing the App Service web app in the West Europe region.

  10. Start a web browser and navigate to the DNS name associated with the Traffic Manager profile.
  11. Verify that the web browser displays the familiar page of the web site hosted by the Azure App Service web app you deployed in the previous lab.
  12. Switch over to the web browser window displaying the Azure portal.
  13. In the Azure portal, select the Azure Cloud Shell icon to the right of the search text box.
  14. In the Bash session within the Cloud Shell pane, run the following command to resolve the DNS name of the Traffic Manager profile you created in the previous exercise to one of two corresponding endpoints (replace the placeholder <tm_profile> with the DNS name of the Traffic Manager profile, but make sure to remove the leading https:\\):

    nslookup <tm_profile>
    
  15. Rerun the same command a few times, waiting for a bit more than 5 seconds between each invocation (to eliminate the possibility of DNS caching) and verify that the output in each case references the DNS name of the web app in the West Europe Azure region.

Note: While this example might seem a bit simplistic, the primary objective of the exercise was to illustrate the process of implementing Azure Chaos Studio-based testing and its primary components. You would use the equivalent approach if an Azure App Service web app was a part of more complex environment, where the impact of its failure might be much more challenging to predict.

Exercise 3: Remove the resources used in the labs

In this exercise, you will remove the resources used in the labs.

  1. In the web browser tab displaying the Azure portal, in the search text box at the top of the page, enter Resource groups and, in the list of results, select Resource groups.
  2. On the Resource groups page, in the list of resource groups, select the resource group you created in the current lab.
  3. On the resource group page, select Delete resource group.
  4. In the Enter resource group name to confirm deletion text box, enter the name of the resource group you are about to delete, and then select Delete.

    Note: Wait for the resource group to be deleted. This should take less than a minute.

  5. Repeat the previous step for the resource group you created in the previous labs of this course.

    Note: Deleting the resource group will remove all resources associated with it, including the Traffic Manager profile and the Azure Chaos Studio environment.

    Note: There is no need to delete the GitHub account and the repositories you created in the previous labs of this course. You can continue using them as a portfolio of your work.