Run pipelines in Azure Machine Learning
You can use the Python SDK to perform all of the tasks required to create and operate a machine learning solution in Azure. Rather than perform these tasks individually, you can use pipelines to orchestrate the steps required to prepare data, run training scripts, and perform other tasks.
In this exercise, you’ll run multiple scripts as a pipeline job.
Before you start
You’ll need an Azure subscription in which you have administrative-level access.
Provision an Azure Machine Learning workspace
An Azure Machine Learning workspace provides a central place for managing all resources and assets you need to train and manage your models. You can interact with the Azure Machine Learning workspace through the studio, Python SDK, and Azure CLI.
You’ll use the Azure CLI to provision the workspace and necessary compute, and you’ll use the Python SDK to run a pipeline job.
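As a rough illustration of the Python SDK route, the sketch below shows one way to connect to a workspace once it exists, using `MLClient` from the `azure-ai-ml` package. It assumes `azure-ai-ml` and `azure-identity` are installed and that you are already signed in to Azure. The `WorkspaceRef` helper and the environment variable names are hypothetical, introduced only for this example; the default names are the ones used in the manual setup steps of this lab, and your workspace may be named differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceRef:
    """Identifies a workspace (hypothetical helper for this sketch)."""

    subscription_id: str
    resource_group: str
    workspace_name: str


def workspace_from_env(env=os.environ) -> WorkspaceRef:
    """Read workspace details from a mapping, falling back to lab defaults."""
    # The variable names below are assumptions; the lab does not define them.
    return WorkspaceRef(
        subscription_id=env.get("AZURE_SUBSCRIPTION_ID", ""),
        resource_group=env.get("AML_RESOURCE_GROUP", "rg-dp100-labs"),
        workspace_name=env.get("AML_WORKSPACE_NAME", "mlw-dp100-labs"),
    )


def connect(ref: WorkspaceRef):
    """Return an MLClient bound to the given workspace."""
    # Imported here so the helpers above also work without the SDK installed.
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    return MLClient(
        credential=DefaultAzureCredential(),
        subscription_id=ref.subscription_id,
        resource_group_name=ref.resource_group,
        workspace_name=ref.workspace_name,
    )
```

Calling `connect(workspace_from_env())` returns a client that can then be used to submit jobs. Keeping the workspace details in a small value object makes it easy to point the same code at a differently named workspace.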
Create the workspace and compute resources
To create the Azure Machine Learning workspace, a compute instance, and a compute cluster, you’ll use the Azure CLI. All necessary commands are grouped in a shell script for you to execute.
- In a browser, open the Azure portal at https://portal.azure.com/, signing in with your Microsoft account.
- Select the [>_] (Cloud Shell) button at the top of the page to the right of the search box. This opens a Cloud Shell pane at the bottom of the portal.
- Select Bash if asked. The first time you open the Cloud Shell, you will be asked to choose the type of shell you want to use (Bash or PowerShell).
- Check that the correct subscription is specified and that No storage account required is selected. Select Apply.
- In the terminal, enter the following commands to clone this repo:
    rm -r azure-ml-labs -f
    git clone https://github.com/MicrosoftLearning/mslearn-azure-ml.git azure-ml-labs
  Use SHIFT + INSERT to paste your copied code into the Cloud Shell.
- After the repo has been cloned, enter the following commands to change to the folder for this lab and run the setup.sh script it contains:
    cd azure-ml-labs/Labs/09
    ./setup.sh
  Ignore any (error) messages that say that the extensions were not installed.
- Wait for the script to complete - this typically takes around 5-10 minutes.
Troubleshooting tip: Workspace creation error
If you receive an error when running the setup script through the CLI, you need to provision the resources manually:
- In the Azure portal home page, select + Create a resource.
- Search for machine learning and then select Azure Machine Learning. Select Create.
- Create a new Azure Machine Learning resource with the following settings:
    - Subscription: Your Azure subscription
    - Resource group: rg-dp100-labs
    - Workspace name: mlw-dp100-labs
    - Region: Select the geographical region closest to you
    - Storage account: Note the default new storage account that will be created for your workspace
    - Key vault: Note the default new key vault that will be created for your workspace
    - Application insights: Note the default new application insights resource that will be created for your workspace
    - Container registry: None (one will be created automatically the first time you deploy a model to a container)
- Select Review + create and wait for the workspace and its associated resources to be created - this typically takes around 5 minutes.
- Select Go to resource and in its Overview page, select Launch studio. Another tab will open in your browser to open the Azure Machine Learning studio.
- Close any pop-ups that appear in the studio.
- Within the Azure Machine Learning studio, navigate to the Compute page and select + New under the Compute instances tab.
- Give the compute instance a unique name and then select Standard_DS11_v2 as the virtual machine size.
- Select Review + create and then select Create.
- Next, select the Compute clusters tab and select + New.
- Choose the same region as the one where you created your workspace and then select Standard_DS11_v2 as the virtual machine size. Select Next.
- Give the cluster a unique name and then select Create.
- Download the training data from https://github.com/MicrosoftLearning/mslearn-azure-ml/raw/refs/heads/main/Labs/09/data/diabetes.csv
- In the Azure Machine Learning studio, navigate to the Data page and select + Create.
- Name the data asset diabetes-data and verify that the type File (uri_file) is selected. Select Next.
- Select From local files as your data source and then select Next.
- Verify that Azure Blob Storage and workspaceblobstore are selected as your destination storage type and datastore respectively. Select Next.
- Upload the .csv file you downloaded previously and then select Next.
- Review the settings for your data asset and then select Create.
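If you prefer to fetch the training data from a script rather than the browser, the download step above could be sketched as follows. This is a minimal example using only the Python standard library; the `download` and `csv_summary` helpers are hypothetical names, and the summary is just a quick sanity check before you upload the file to the studio.

```python
import csv
import shutil
import urllib.request
from pathlib import Path
from typing import List, Tuple

# URL taken from the download step in this lab.
DATA_URL = (
    "https://github.com/MicrosoftLearning/mslearn-azure-ml/"
    "raw/refs/heads/main/Labs/09/data/diabetes.csv"
)


def download(url: str, dest: Path) -> Path:
    """Stream the file at `url` to `dest` and return the destination path."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def csv_summary(path: Path) -> Tuple[List[str], int]:
    """Return the header row and the number of data rows in a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count
```

For example, `csv_summary(download(DATA_URL, Path("diabetes.csv")))` returns the column names and the row count, which helps confirm that the file downloaded completely.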
Clone the lab materials
When you’ve created the workspace and necessary compute resources, you can open the Azure Machine Learning studio and clone the lab materials into the workspace.
- In the Azure portal, navigate to the Azure Machine Learning workspace named mlw-dp100-….
- Select the Azure Machine Learning workspace, and in its Overview page, select Launch studio. Another tab will open in your browser to open the Azure Machine Learning studio.
- Close any pop-ups that appear in the studio.
- Within the Azure Machine Learning studio, navigate to the Compute page and verify that the compute instance and cluster you created in the previous section exist. The compute instance should be running; the cluster should be idle with 0 nodes running.
- In the Compute instances tab, find your compute instance, and select the Terminal application.
- Install the Python SDK on the compute instance by running the following commands in the terminal:
    pip uninstall azure-ai-ml
    pip install azure-ai-ml
  Ignore any (error) messages that say that the packages couldn’t be found and uninstalled.
- Run the following command to clone a Git repository containing notebooks, data, and other files to your workspace:
    git clone https://github.com/MicrosoftLearning/mslearn-azure-ml.git azure-ml-labs
- When the command has completed, in the Files pane, click ↻ to refresh the view and verify that a new Users/your-user-name/azure-ml-labs folder has been created.
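To confirm that the SDK installation from the steps above succeeded, you could check the installed package version from Python. The sketch below uses the standard library's `importlib.metadata`; the `installed_version` helper is a hypothetical name used only for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report whether the Azure Machine Learning SDK is available in this environment.
version = installed_version("azure-ai-ml")
print(f"azure-ai-ml: {version or 'not installed'}")
```

If the output says the package is not installed, rerun the `pip install` command in the compute instance terminal.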
Run scripts as a pipeline job
The code to build and submit a pipeline with the Python SDK is provided in a notebook.
- Open the Labs/09/Run a pipeline job.ipynb notebook.
  Select Authenticate and follow the necessary steps if a notification appears asking you to authenticate.
- Verify that the notebook uses the Python 3.8 - AzureML kernel.
- Run all cells in the notebook.
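To give a sense of what building and submitting a pipeline with the Python SDK can look like, here is a minimal sketch. It is not the notebook's code. It assumes `azure-ai-ml` and `azure-identity` are installed and that a workspace `config.json` is available, as it typically is on a compute instance. The component file names, the component input and output names, the compute cluster name, the experiment name, and the data asset version are all hypothetical placeholders that you would replace with your own.

```python
from pathlib import Path

# Hypothetical names for this sketch; the notebook defines its own.
COMPONENT_FILES = ("prep-data.yml", "train-model.yml")
COMPUTE_CLUSTER = "aml-cluster"
EXPERIMENT_NAME = "pipeline_diabetes"


def data_asset_uri(name: str, version: str = "1") -> str:
    """Build the short `azureml:<name>:<version>` reference to a data asset."""
    return f"azureml:{name}:{version}"


def build_and_submit(component_dir: Path):
    """Load two components, chain them in a pipeline, and submit the job."""
    # Imported here so the helper above also works without the SDK installed.
    from azure.ai.ml import Input, MLClient, load_component
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml.dsl import pipeline
    from azure.identity import DefaultAzureCredential

    # Reads the workspace details from a config.json file.
    ml_client = MLClient.from_config(credential=DefaultAzureCredential())

    prep_data = load_component(source=str(component_dir / COMPONENT_FILES[0]))
    train_model = load_component(source=str(component_dir / COMPONENT_FILES[1]))

    @pipeline()
    def diabetes_pipeline(pipeline_job_input):
        # Input and output names are assumptions; they must match the
        # names declared in the component YAML files.
        cleaned = prep_data(input_data=pipeline_job_input)
        trained = train_model(training_data=cleaned.outputs.output_data)
        return {"model": trained.outputs.model_output}

    job = diabetes_pipeline(
        Input(type=AssetTypes.URI_FILE, path=data_asset_uri("diabetes-data"))
    )
    # Run every step on the same cluster unless a step overrides it.
    job.settings.default_compute = COMPUTE_CLUSTER
    return ml_client.jobs.create_or_update(job, experiment_name=EXPERIMENT_NAME)
```

The second step consumes the output of the first, which is what lets the pipeline determine the order in which the steps run.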
Delete Azure resources
When you finish exploring Azure Machine Learning, you should delete the resources you’ve created to avoid unnecessary Azure costs.
- Close the Azure Machine Learning studio tab and return to the Azure portal.
- In the Azure portal, on the Home page, select Resource groups.
- Select the rg-dp100-… resource group.
- At the top of the Overview page for your resource group, select Delete resource group.
- Enter the resource group name to confirm you want to delete it, and select Delete.