Find the best classification model with Automated Machine Learning

Determining the right algorithm and preprocessing transformations for model training can involve a lot of guesswork and experimentation.

In this exercise, you’ll use automated machine learning to determine the optimal algorithm and preprocessing steps for a model by performing multiple training runs in parallel.
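Conceptually, the search that automated machine learning performs looks like the sketch below. Every candidate pairing of a preprocessing step and an algorithm is trained as a separate trial, the trials run in parallel, and the pairing with the best validation score wins. This is a minimal, standard-library-only illustration under stated assumptions, not how Azure Machine Learning implements it. The tiny dataset, the two preprocessing functions, the two toy algorithms, and the use of accuracy as the primary metric are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical toy data: one numeric feature and a binary label.
TRAIN = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
VALID = [(1.5, 0), (2.5, 0), (6.5, 1), (7.5, 1)]


# Preprocessing steps: each is fit on training features and returns a transform.
def fit_identity(xs):
    return lambda x: x


def fit_min_max(xs):
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0
    return lambda x: (x - lo) / span


# Algorithms: each is fit on transformed training data and returns a predictor.
def fit_majority(xs, ys):
    label = max(set(ys), key=ys.count)
    return lambda x: label


def fit_nearest_centroid(xs, ys):
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(members) / len(members)
    # Predict the label whose centroid is closest to the input.
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))


PREPROCESSORS = {"none": fit_identity, "min_max": fit_min_max}
ALGORITHMS = {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid}


def run_trial(combo):
    """Train one preprocessing + algorithm pair; return its validation accuracy."""
    prep_name, algo_name = combo
    train_x = [x for x, _ in TRAIN]
    train_y = [y for _, y in TRAIN]
    transform = PREPROCESSORS[prep_name](train_x)  # fit on training data only
    model = ALGORITHMS[algo_name]([transform(x) for x in train_x], train_y)
    correct = sum(model(transform(x)) == y for x, y in VALID)
    return prep_name, algo_name, correct / len(VALID)


def search():
    """Run every combination in parallel and return (best, all_results)."""
    combos = list(product(PREPROCESSORS, ALGORITHMS))
    with ThreadPoolExecutor(max_workers=len(combos)) as pool:
        results = list(pool.map(run_trial, combos))
    return max(results, key=lambda r: r[2]), results
```

Calling `search()` returns the best `(preprocessing, algorithm, accuracy)` tuple together with every trial's result; on this toy data, nearest centroid beats the majority-class baseline with either preprocessing choice. The real service explores far more algorithms and featurization options and validates them more rigorously.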

Before you start

You’ll need an Azure subscription in which you have administrative-level access.

Provision an Azure Machine Learning workspace

An Azure Machine Learning workspace provides a central place for managing all resources and assets you need to train and manage your models. You can interact with the Azure Machine Learning workspace through the studio, Python SDK, and Azure CLI.

You’ll use the Azure CLI to provision the workspace and necessary compute, and you’ll use the Python SDK to train a classification model with Automated Machine Learning.
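From Python, code typically reaches the workspace through an `MLClient` from the `azure-ai-ml` package; the lab's notebook may connect differently. A minimal sketch, assuming `azure-ai-ml` and `azure-identity` are installed and that you substitute your own subscription ID, resource group, and workspace name (the lab's names are only partly shown, so none are filled in here). The helper names are hypothetical; `MLClient.from_config()` and `DefaultAzureCredential` belong to the SDK, and the three keys are the ones `from_config()` reads from `config.json`.

```python
import json
from pathlib import Path


def workspace_config(subscription_id, resource_group, workspace_name):
    """Build the settings that MLClient.from_config() reads from config.json."""
    return {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write config.json into directory and return its path."""
    path = Path(directory) / "config.json"
    config = workspace_config(subscription_id, resource_group, workspace_name)
    path.write_text(json.dumps(config, indent=2))
    return path


def connect_to_workspace(config_path="config.json"):
    """Return an MLClient for the workspace described by config_path.

    The SDK imports live inside the function so the helpers above work
    even where azure-ai-ml and azure-identity are not installed.
    """
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    return MLClient.from_config(credential=DefaultAzureCredential(), path=config_path)
```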

Create the workspace and compute resources

To create the Azure Machine Learning workspace, a compute instance, and a compute cluster, you’ll use the Azure CLI. All the necessary commands are grouped in a shell script for you to run.

  1. In a browser, open the Azure portal at https://portal.azure.com/, signing in with your Microsoft account.
  2. Select the [>_] (Cloud Shell) button at the top of the page to the right of the search box. This opens a Cloud Shell pane at the bottom of the portal.
  3. The first time you open Cloud Shell, you're asked to choose the type of shell to use (Bash or PowerShell). If prompted, select Bash.
  4. Check that the correct subscription is specified and that No storage account required is selected. Select Apply.
  5. In the terminal, enter the following commands to clone the lab repository:

     rm -rf azure-ml-labs
     git clone https://github.com/MicrosoftLearning/mslearn-azure-ml.git azure-ml-labs
    

    Use SHIFT + INSERT to paste your copied code into the Cloud Shell.

  6. After the repo has been cloned, enter the following commands to change to the folder for this lab and run the setup.sh script it contains:

     cd azure-ml-labs/Labs/06
     ./setup.sh
    

    Ignore any error messages stating that the extensions weren't installed.

  7. Wait for the script to complete; this typically takes 5 to 10 minutes.

Clone the lab materials

When you’ve created the workspace and necessary compute resources, you can open the Azure Machine Learning studio and clone the lab materials into the workspace.

  1. In the Azure portal, navigate to the Azure Machine Learning workspace named mlw-dp100-….
  2. Select the Azure Machine Learning workspace, and on its Overview page, select Launch studio. The Azure Machine Learning studio opens in a new browser tab.
  3. Close any pop-ups that appear in the studio.
  4. Within the Azure Machine Learning studio, navigate to the Compute page and verify that the compute instance and cluster you created in the previous section exist. The compute instance should be running; the cluster should be idle, with 0 nodes running.
  5. In the Compute instances tab, find your compute instance, and select the Terminal application.
  6. In the terminal, install the Python SDK on the compute instance by running the following commands:

     pip uninstall -y azure-ai-ml
     pip install azure-ai-ml
    

    Ignore any error messages stating that the package couldn’t be found and uninstalled.

  7. Run the following command to clone a Git repository containing notebooks, data, and other files to your workspace:

     git clone https://github.com/MicrosoftLearning/mslearn-azure-ml.git azure-ml-labs
    
  8. When the command has completed, refresh the view in the Files pane and verify that a new Users/your-user-name/azure-ml-labs folder has been created.
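Steps 4 and 6 can also be checked from Python, for example in a notebook cell on the compute instance. A minimal sketch, assuming `ml_client` is an `azure.ai.ml.MLClient` already connected to the lab workspace: `installed_version()` reports which `azure-ai-ml` version is installed in the current environment, and `missing_compute()` lists any expected compute names that `ml_client.compute.list()` does not return. The helper names are hypothetical; `aml-cluster` is the cluster name used later in this lab, and the compute instance name isn't given here, so pass your own.

```python
from importlib import metadata


def installed_version(package="azure-ai-ml"):
    """Return the installed version of package, or None if it isn't installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def summarize_compute(ml_client):
    """Map each compute target name in the workspace to its type string."""
    return {compute.name: compute.type for compute in ml_client.compute.list()}


def missing_compute(ml_client, expected_names):
    """Return the expected compute names that are not in the workspace."""
    existing = summarize_compute(ml_client)
    return sorted(name for name in expected_names if name not in existing)


if __name__ == "__main__":
    print("azure-ai-ml version:", installed_version() or "not installed")
```

For example, `missing_compute(ml_client, ["aml-cluster", "<your-compute-instance>"])` should return an empty list when both resources exist.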

Train a classification model with automated machine learning

Now that you have all the necessary resources, you can run the notebook to configure and submit the Automated Machine Learning job.
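The notebook's exact code isn't reproduced here, but configuring and submitting an automated ML classification job with the Python SDK (v2) generally follows the shape below. A sketch under stated assumptions: the training data is an MLTable data asset whose path you supply, the target column name is yours to fill in, and the metric, cross-validation, and limit values are illustrative rather than the lab's settings. `auto-ml-class-dev` and `aml-cluster` are the experiment and cluster names this lab uses; the function names are hypothetical.

```python
def build_classification_job(training_data_path, target_column_name,
                             compute="aml-cluster",
                             experiment_name="auto-ml-class-dev"):
    """Configure (but do not submit) an automated ML classification job."""
    # Imported lazily so this module loads even without azure-ai-ml installed.
    from azure.ai.ml import Input, automl
    from azure.ai.ml.constants import AssetTypes

    job = automl.classification(
        compute=compute,
        experiment_name=experiment_name,
        training_data=Input(type=AssetTypes.MLTABLE, path=training_data_path),
        target_column_name=target_column_name,
        primary_metric="accuracy",         # metric used to rank trials (illustrative)
        n_cross_validations=5,
        enable_model_explainability=True,  # explain the best model
    )
    # Bound the search so it finishes in a predictable time (illustrative values).
    job.set_limits(
        timeout_minutes=60,
        trials_timeout_minutes=20,
        max_trials=5,
        enable_early_termination=True,
    )
    # Let automated ML choose featurization (for example imputation and encoding).
    job.set_featurization(mode="auto")
    return job


def submit_job(ml_client, job):
    """Submit the job and return its studio URL for monitoring."""
    returned_job = ml_client.jobs.create_or_update(job)
    return returned_job.studio_url
```

Each trial that automated ML launches under this job appears as a child job, which is why `max_trials` directly bounds how many models you'll see in the studio.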

  1. Open the Labs/06/Classification with Automated Machine Learning.ipynb notebook.

    Select Authenticate and follow the necessary steps if a notification appears asking you to authenticate.

  2. Verify that the notebook uses the Python 3.8 - AzureML kernel.
  3. Run all cells in the notebook.

    A new job will be created in the Azure Machine Learning workspace. The job tracks the inputs defined in the job configuration, the data asset used, and outputs such as the metrics used to evaluate the models.

    Note that the Automated Machine Learning job contains child jobs, which represent the individual models that have been trained and other tasks needed to run the job.

  4. Go to Jobs and select the auto-ml-class-dev experiment.
  5. Select the job under the Display name column.
  6. Wait for its status to change to Completed.
  7. When the Automated Machine Learning job status has changed to Completed, explore the job details in the studio:
    • The Data guardrails tab shows whether your training data had any issues.
    • The Models + child jobs tab shows all models that have been trained. Select Explain model for the best model and run the explanation job on the aml-cluster compute cluster.
    • Wait until a new Explained column appears next to the Algorithm name column, then select View explanation. You may need to refresh the algorithm list for this option to appear.
    • Review the generated dashboard to understand which features most influenced the target value.
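The same monitoring can be scripted. A minimal sketch, assuming `ml_client` is an `azure.ai.ml.MLClient` and `job_name` is the name (not the display name) of the parent automated ML job: `wait_for_job()` polls `ml_client.jobs.get()` until the status is terminal, and `list_child_jobs()` uses `ml_client.jobs.list(parent_job_name=...)` to enumerate the child jobs. The set of terminal statuses is an assumption, and the helper names are hypothetical; `ml_client.jobs.stream()` is a built-in alternative that blocks while streaming the job's logs.

```python
import time

# Assumed terminal statuses; other values (such as "Running") mean keep waiting.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(ml_client, job_name, poll_seconds=30, max_polls=None, sleep=time.sleep):
    """Poll the job until it reaches a terminal status, then return that status."""
    polls = 0
    while True:
        status = ml_client.jobs.get(job_name).status
        if status in TERMINAL_STATUSES:
            return status
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"{job_name} still {status} after {polls} polls")
        sleep(poll_seconds)


def list_child_jobs(ml_client, parent_job_name):
    """Return (name, status) for each child job of the parent job."""
    return [
        (child.name, child.status)
        for child in ml_client.jobs.list(parent_job_name=parent_job_name)
    ]
```

Passing `max_polls` bounds the wait; the `sleep` argument is injectable only so the loop can be exercised without real delays.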

Delete Azure resources

When you finish exploring Azure Machine Learning, you should delete the resources you’ve created to avoid unnecessary Azure costs.

  1. Close the Azure Machine Learning studio tab and return to the Azure portal.
  2. In the Azure portal, on the Home page, select Resource groups.
  3. Select the rg-dp100-… resource group.
  4. At the top of the Overview page for your resource group, select Delete resource group.
  5. Enter the resource group name to confirm you want to delete it, and select Delete.
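If you prefer to clean up from code, the Azure management SDK can do the same thing. A minimal sketch, assuming `azure-mgmt-resource` and `azure-identity` are installed and that you supply your own subscription ID: it finds resource groups whose names start with `rg-dp100-` and deletes them only when `dry_run=False`. Because other groups in your subscription might share that prefix, review the dry-run list first. The helper names are hypothetical; `ResourceManagementClient`, `resource_groups.list()`, and `resource_groups.begin_delete()` belong to the SDK.

```python
def make_resource_client(subscription_id):
    """Return a ResourceManagementClient for the given subscription."""
    # Imported lazily so the filtering helper below works without the SDK.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resource import ResourceManagementClient

    return ResourceManagementClient(DefaultAzureCredential(), subscription_id)


def delete_lab_resource_groups(resource_client, prefix="rg-dp100-", dry_run=True):
    """Return the resource groups whose names start with prefix.

    With dry_run=True (the default) nothing is deleted. With dry_run=False,
    every matching group is deleted and the call blocks until Azure reports
    that each deletion has finished.
    """
    names = sorted(
        group.name
        for group in resource_client.resource_groups.list()
        if group.name.startswith(prefix)
    )
    if not dry_run:
        # Start all deletions first so they proceed concurrently, then wait for each.
        pollers = [resource_client.resource_groups.begin_delete(name) for name in names]
        for poller in pollers:
            poller.result()
    return names
```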