Deploy and monitor a model in Azure Machine Learning
Training a model is only the beginning of the machine learning lifecycle. To generate value, you need a repeatable way to move models from development into production, monitor how they behave, and respond quickly when data or performance changes.
In this exercise, you:
- Use GitHub Actions environments to run training in a development context.
- Register models in a shared Azure Machine Learning registry.
- Deploy the registry model to a production online endpoint.
- Monitor drift, retrain in dev, and promote a new model version to production.
- Optionally roll back to a previous deployment and archive a bad model version.
Before you start
You need:
- An Azure subscription in which you have administrative-level access.
- A GitHub account with permission to create repositories and configure GitHub Actions.
Provision the workspace and data assets
First, you create or reuse an Azure Machine Learning workspace and the data assets you need for development and production. You build on the same pattern that you used in the previous lab: provision the workspace and data from the original lab repo, and then connect a separate GitHub repository for automation.
- In a browser, open the Azure portal at `https://portal.azure.com/` and sign in with your Microsoft account.
- Select the [>_] (Cloud Shell) button at the top of the page to open Cloud Shell, and choose Bash if prompted.
- Make sure the correct subscription is selected and that No storage account required is selected. Then select Apply.
- In the terminal, clone this repo and navigate to the infrastructure folder:

      rm -r mslearn-mlops -f
      git clone https://github.com/MicrosoftLearning/mslearn-mlops.git mslearn-mlops
      cd mslearn-mlops/infra

- Before you run the setup script, update it so that it also creates separate dev and prod folder data assets:
  - In Cloud Shell, open the `setup.sh` file in the editor (for example, run `code setup.sh`).
  - In the file, locate the Create data assets section:

        # Create data assets
        echo "Create training data asset:"
        az ml data create --type mltable --name "diabetes-training" --path ../data/diabetes-data
        az ml data create --type uri_file --name "diabetes-data" --path ../data/diabetes-data/diabetes.csv

  - Directly after these commands, add the following lines to create folder data assets that point to the dev and prod data folders in this repo:

        az ml data create --type uri_folder --name "diabetes-dev-folder" --path ../experimentation/data
        az ml data create --type uri_folder --name "diabetes-prod-folder" --path ../production/data

  - Save the file and return to the terminal pane.
- Run the setup script:

      ./setup.sh

  Ignore any messages that say that extensions couldn’t be installed.
- Wait for the script to finish. It creates a resource group, an Azure Machine Learning workspace, compute resources, and the data assets you need for this lab.
- In the Azure portal, go to Resource groups and open the `rg-ai300-...` resource group that was created.
- Select the Azure Machine Learning workspace (for example, `mlw-ai300-...`) and then select Launch studio to open Azure Machine Learning studio.
- In the studio, select Data and verify that you have the following data assets:
  - An MLTable or file-based asset named `diabetes-training` for the core training data.
  - A Folder (uri_folder) data asset named `diabetes-dev-folder` that points to the `experimentation/data` folder in your workspace files.
  - A Folder (uri_folder) data asset named `diabetes-prod-folder` that points to the `production/data` folder.

  If any of these assets are missing, create them manually with the types and paths shown.
For this lab, you can use a single workspace and separate data assets to represent development and production data.
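If you would rather check the assets from a script than in the studio, the following is a minimal sketch. It assumes you have captured the JSON printed by `az ml data list` as a string and that each entry carries a `name` field; the helper `missing_assets` is hypothetical and not part of the lab repo.

```python
import json

# Asset names the setup script is expected to create (taken from the steps above).
EXPECTED_ASSETS = {"diabetes-training", "diabetes-dev-folder", "diabetes-prod-folder"}


def missing_assets(cli_json: str, expected=EXPECTED_ASSETS) -> list[str]:
    """Return the expected asset names that do not appear in the CLI output.

    Assumption: cli_json is a JSON array of objects that each have a "name" key.
    """
    found = {entry.get("name") for entry in json.loads(cli_json)}
    return sorted(set(expected) - found)


if __name__ == "__main__":
    sample = '[{"name": "diabetes-training"}, {"name": "diabetes-dev-folder"}]'
    print(missing_assets(sample))  # lists the names you still need to create
```

An empty list means all three assets exist; any name that is returned still needs to be created manually.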
Create your GitHub repository from the template
Next, you create your own GitHub repository from the original lab repo so you can use GitHub Actions. This follows the same template-based approach you used in the previous lab.
- In a browser, go to `https://github.com/MicrosoftLearning/mslearn-mlops`.
- In the upper-right corner, select Use this template and then choose Create a new repository.
- In the Owner field, select your GitHub account. In the Repository name field, enter a name such as `mslearn-mlops`.
- Select Create repository from template.
- In your new repository that was created from the template, go to the Actions tab and enable GitHub Actions if prompted.
- Note the clone URL for your new repository (for example, `https://github.com/<your-alias>/mslearn-mlops.git`). You use this URL when you work with the repository locally or from a development environment.
With your template-based repository in place and your workspace already provisioned, you can now connect GitHub securely to Azure.
Prepare your GitHub repo, environments, and secrets
Next, you set up GitHub so that you can run training in a dev environment and later deploy from a prod environment.
- In a browser, go to the repository you created from the `MicrosoftLearning/mslearn-mlops` template.
- In your repo, go to the Actions tab and enable GitHub Actions if prompted.
- In the repo, go to Settings > Environments.
- Create two environments:
  - `dev`
  - `prod`
- In the Azure portal Cloud Shell, create a service principal that has Contributor access to your resource group. Replace `<service-principal-name>`, `<subscription-id>`, and `<your-resource-group-name>` before running the command:

      az ad sp create-for-rbac --name "<service-principal-name>" --role contributor \
          --scopes /subscriptions/<subscription-id>/resourceGroups/<your-resource-group-name> \
          --sdk-auth

- Copy the full JSON output and save it temporarily. You’ll add it as a secret in both environments.
- In your GitHub repository, go to Settings > Environments and select the `dev` environment.
- Add a new environment secret named `AZURE_CREDENTIALS` and paste the JSON output from the service principal as the value. Save the secret.
- Repeat the previous two steps for the `prod` environment so that both environments have an `AZURE_CREDENTIALS` secret.
- (Optional) In the `prod` environment, add an environment protection rule that requires manual approval before a job can run in this environment.
GitHub can now authenticate to Azure for both dev and prod via environment-specific secrets and protection rules.
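A truncated paste is a common reason for sign-in failures, so it can help to sanity-check the JSON before you store it as a secret. The sketch below assumes the service principal output contains the keys `clientId`, `clientSecret`, `subscriptionId`, and `tenantId`; the function name `missing_credential_keys` is hypothetical.

```python
import json

# Keys the sign-in step is assumed to need from the service principal JSON.
REQUIRED_KEYS = ("clientId", "clientSecret", "subscriptionId", "tenantId")


def missing_credential_keys(secret_json: str) -> list[str]:
    """Return required keys that are absent or empty in the pasted JSON."""
    try:
        data = json.loads(secret_json)
    except json.JSONDecodeError:
        # Unparseable text means none of the keys can be read.
        return list(REQUIRED_KEYS)
    if not isinstance(data, dict):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not data.get(key)]
```

Run a check like this only on your own machine, and avoid printing the secret values themselves.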
Train and validate a model in dev from a pull request
Now you use a GitHub Actions workflow in your template-based repository that trains your model against dev data whenever someone proposes a change to the training configuration.
- Clone your `mslearn-mlops` repository that you created from the template to a development machine where you can edit files and push changes.
- In your local clone, open `src/train-model-parameters.py` and review how it:
  - Reads training data from a file or folder path.
  - Trains a logistic regression model and logs metrics such as Accuracy and AUC.
- Open `src/job.yml` and review how the Azure Machine Learning command job:
  - Runs `train-model-parameters.py` on the `aml-cluster` compute.
  - Uses a `training_data` input that points to the `diabetes-dev-folder` data asset by default.
  - Accepts a configurable `reg_rate` hyperparameter.
- Review the workflow at `.github/workflows/train-dev.yml` to see how it:
  - Signs in to Azure by using the `AZURE_CREDENTIALS` secret in the `dev` environment.
  - Detects the resource group and workspace that the `infra/setup.sh` script created.
  - Submits the Azure Machine Learning job defined in `src/job.yml`, overriding the `training_data` input to use the `diabetes-dev-folder` data asset.
  - Streams the job logs, parses the Accuracy and AUC values from the output, and posts them as a comment on the pull request.
- In your local clone, create a new feature branch and make a small, safe hyperparameter change. For example, adjust the default value of `--reg_rate` in `src/train-model-parameters.py`.
- Commit your change and push the new branch to GitHub.
- In GitHub, create a pull request from your feature branch into `main`.
- On the pull request page, observe that the Train model in dev (PR) workflow runs automatically because of the `pull_request` trigger, and wait for it to complete.
- When the workflow run has finished, review the comments on the pull request. You should see a comment from the workflow that includes the dev Accuracy and AUC values from the training job.
The dev workflow now validates training changes against the dev data asset and surfaces key evaluation metrics directly in the pull request so that reviewers can make an informed decision.
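To make the log-parsing step concrete, here is a minimal sketch of how metric values could be pulled out of streamed job output and turned into a comment body. It assumes the training script prints lines such as `Accuracy: 0.774` and `AUC: 0.848`; that format, and the helper names `parse_metrics` and `format_comment`, are assumptions rather than the workflow's actual implementation.

```python
import re

# Assumed log format: a metric name, a colon or equals sign, then a number.
METRIC_PATTERN = re.compile(
    r"^\s*(Accuracy|AUC)\s*[:=]\s*([0-9]*\.?[0-9]+)\s*$", re.MULTILINE
)


def parse_metrics(log_text: str) -> dict[str, float]:
    """Extract the last reported value for each metric from the job logs."""
    metrics = {}
    for name, value in METRIC_PATTERN.findall(log_text):
        metrics[name] = float(value)  # later lines overwrite earlier ones
    return metrics


def format_comment(metrics: dict[str, float], label: str = "dev") -> str:
    """Build a short pull request comment body from the parsed metrics."""
    lines = [f"Training results ({label}):"]
    for name in ("Accuracy", "AUC"):
        value = metrics.get(name)
        lines.append(
            f"- {name}: {value:.3f}" if value is not None else f"- {name}: not found"
        )
    return "\n".join(lines)
```

Reporting a missing metric as "not found" rather than failing keeps the comment useful when the job output changes shape.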
Retrain the model on prod data from a pull request comment
After you’re satisfied that the dev training results look reasonable, you can request a prod training run that uses the prod data asset and the prod environment in GitHub Actions.
- In your GitHub repository, open Settings > Environments and verify that you have a `prod` environment with an `AZURE_CREDENTIALS` secret configured (just as you did earlier for `dev`).
- On the same pull request where you validated the dev run, add a new comment that contains the command `/train-prod` on its own line.
- In the Actions tab, observe that the Train model in prod (PR comment) workflow (defined in `.github/workflows/train-prod.yml`) starts in response to your comment.
- Wait for the workflow to complete. The workflow:
  - Signs in to Azure by using the `AZURE_CREDENTIALS` secret in the `prod` environment.
  - Detects the same Azure Machine Learning workspace you used for dev.
  - Submits the `src/job.yml` command job again, this time overriding the `training_data` input to use the `diabetes-prod-folder` data asset.
  - Streams the logs, parses Accuracy and AUC from the output, and posts them back to the pull request as a separate comment.
- Review the new comment on the pull request that includes the prod evaluation metrics. Compare these values to the dev metrics to understand how your model behaves on production-like data.
By using a comment command to trigger prod training, you keep control over when prod workloads run while still capturing the results as part of the pull request discussion.
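The "on its own line" requirement can be illustrated with a small check. This is a sketch of one possible matching rule, not the logic in the repo's workflow, which may match comments differently; `has_command` is a hypothetical helper.

```python
def has_command(comment_body: str, command: str = "/train-prod") -> bool:
    """Return True when the command appears alone on any line of the comment."""
    return any(line.strip() == command for line in comment_body.splitlines())
```

Matching whole lines means a sentence that merely mentions the command, such as "do not /train-prod yet", would not start a prod run under this rule.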
Deploy the model to a real-time endpoint from a pull request comment
With dev and prod training complete and reviewed, you’re ready to deploy the model to a managed online endpoint by using a Python script and another comment-triggered workflow.
- In your local clone, open `src/deploy_to_online_endpoint.py` and review how it:
  - Connects to your Azure Machine Learning workspace by using `DefaultAzureCredential` and `MLClient`.
  - Ensures that an online endpoint (for example, `diabetes-endpoint`) exists or creates it if needed.
  - Creates or updates a deployment (for example, `blue`) that uses the MLflow model in the local `model` folder.
  - Directs 100% of endpoint traffic to the specified deployment and prints the scoring URI.
- Review the workflow at `.github/workflows/deploy-prod.yml` to see how it:
  - Listens for a `/deploy-prod` comment on a pull request.
  - Uses the `prod` environment and `AZURE_CREDENTIALS` secret to authenticate to Azure.
  - Detects the subscription, resource group, and workspace created by `infra/setup.sh`.
  - Runs `deploy_to_online_endpoint.py` with the detected values to deploy the MLflow model to a managed online endpoint.
- On your pull request, add a comment that contains `/deploy-prod` on its own line.
- In the Actions tab, watch the Deploy model to online endpoint (PR comment) workflow run and wait for it to complete successfully.
- When the workflow has finished, review the new comment on the pull request that confirms the deployment. Then, in Azure Machine Learning studio, go to Endpoints > Real-time endpoints, select the `diabetes-endpoint`, and use the Test tab to send a sample request.
Your production endpoint now serves a model that was trained and reviewed through a PR-based workflow, with dev and prod metrics visible in the pull request and a scripted deployment you can repeat and extend.
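Besides the Test tab, you can call the scoring URI from code. The sketch below uses only the Python standard library and assumes key-based authentication and an `input_data` request envelope, which is commonly used for MLflow deployments; the exact payload shape depends on your model signature. The helper names are hypothetical, and you supply your own scoring URI, key, and feature rows.

```python
import json
import urllib.request


def build_payload(rows) -> bytes:
    """Wrap feature rows in the "input_data" envelope (assumed request format)."""
    return json.dumps({"input_data": [list(row) for row in rows]}).encode("utf-8")


def build_request(scoring_uri: str, api_key: str, rows) -> urllib.request.Request:
    """Create, but do not send, an authenticated POST request for the endpoint."""
    return urllib.request.Request(
        scoring_uri,
        data=build_payload(rows),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def score(scoring_uri: str, api_key: str, rows):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(scoring_uri, api_key, rows)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending lets you inspect the payload and headers before any call reaches the endpoint.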
Enable data collection and configure model monitoring
To monitor for drift and quality issues, Azure Machine Learning needs access to production inference data from your endpoint.
- In Azure Machine Learning studio, on your real-time endpoint, open the Settings or Data collection section.
- Enable Model data collection for the endpoint so that inputs and outputs are stored in a workspace datastore.
- Save the changes.
- In the left navigation, go to Monitoring (or Model monitoring, depending on your workspace view).
- Create a new monitor and associate it with your online endpoint.
- Configure the monitor with settings such as:
  - Monitoring signals: enable Data drift.
  - Reference data: use the training data asset (for example, `diabetes-dev-folder`).
  - Production data: use the data collected from the online endpoint.
  - Frequency: set a reasonable schedule (for example, daily).
- Save and enable the monitor.
Once the first monitoring run completes, you can review metrics like drift scores and see whether the production data distribution is diverging from the training data.
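To build intuition for what a drift score measures, the following sketch computes a population stability index (PSI) for one numeric feature. PSI is used here purely as an illustration of comparing a reference distribution with a production one; the metric and thresholds that your monitor actually applies depend on how it is configured. The function name and the binning choices are assumptions.

```python
import math


def psi(reference, production, bins=10, eps=1e-6):
    """Population stability index between two numeric samples.

    Bin edges come from the reference sample's min and max; eps avoids
    division by zero or log(0) when a bin is empty.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_p, prod_p = proportions(reference), proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref_p, prod_p))
```

Identical samples give a score of zero, and the score grows as the production values move away from the reference distribution.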
Simulate drift and retrain through the PR workflow
In a real system, drift or performance degradation would trigger retraining. In this lab, you simulate this by changing a training parameter and then repeating the same PR-based dev → prod → deploy flow you used earlier.
- Wait until your monitor has run at least once and review the drift metrics in Azure Machine Learning studio.
- In your local clone of the repo, create a new feature branch to represent your retraining work. For example:

  ```bash
  git checkout -b feature/drift-retrain
  ```

- Open `src/train-model-parameters.py` and change the default regularization rate or another safe parameter (for example, adjust the default value of `--reg_rate`) to represent how you want to respond to drift.
- Commit the change and push the branch to GitHub:

  ```bash
  git add src/train-model-parameters.py
  git commit -m "Retrain in response to drift"
  git push --set-upstream origin feature/drift-retrain
  ```

- In GitHub, create a new pull request from your `feature/drift-retrain` branch into `main`.
- Observe that the Train model in dev (PR) workflow runs automatically for the new pull request. When it completes, review the comment that shows the updated dev Accuracy and AUC.
- If the dev metrics look acceptable, add a comment `/train-prod` on the pull request to trigger the Train model in prod (PR comment) workflow. When it completes, review the comment that shows the updated prod Accuracy and AUC.
- If the prod metrics also meet your expectations, add a comment `/deploy-prod` on the pull request to trigger the Deploy model to online endpoint (PR comment) workflow. Wait for it to complete.
- Finally, in Azure Machine Learning studio, go to Endpoints > Real-time endpoints, select the `diabetes-endpoint`, and use the Test tab to confirm that the endpoint still returns predictions after your retraining and deployment.
By repeating the same PR-based dev → prod → deploy workflow in response to simulated drift, you see how monitoring, retraining, and controlled promotion can work together in an end-to-end MLOps process.
(Optional) Roll back to a previous model version
If the newly deployed model introduces a regression or causes unexpected behavior in production, you can roll back to a previous version directly in Azure Machine Learning studio without rerunning any workflows.
- In Azure Machine Learning studio, go to Endpoints > Real-time endpoints and select `diabetes-endpoint`.
- On the Deployments tab, review the list of deployments. You should see the current `blue` deployment associated with the model version you most recently deployed.
- To create a rollback deployment from a previous model version:
  - In the left navigation, go to Models and open the model (for example, `train` or the name used in your `model/MLmodel` file).
  - In the model’s Versions list, identify the version you want to roll back to (for example, the version before the most recent deployment).
  - Select that version, then select Deploy > Real-time endpoint.
  - In the deployment wizard, select Existing endpoint and choose `diabetes-endpoint`.
  - Give the new deployment a distinct name, such as `rollback`.
  - Accept the remaining defaults and select Deploy.
  - Wait for the deployment to reach a Succeeded state.
- Once the `rollback` deployment is ready, return to Endpoints > Real-time endpoints > `diabetes-endpoint`.
- On the Deployments tab, select Update traffic.
- Set the traffic allocation so that `rollback` receives 100% of traffic and `blue` receives 0%. Select Update to apply the change.
- On the Test tab, send a sample request to confirm that the endpoint now returns predictions from the rollback model version.
- (Optional) To flag the problematic model version so it is excluded from future deployments, go back to Models, open the affected version, and set its Stage to Archived.
Tip: Because traffic routing is updated immediately, rolling back in the UI takes effect without any downtime for the endpoint. When you’re confident the rollback is stable, you can delete the original `blue` deployment from the Deployments tab to reduce hosting costs.
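The traffic change in the rollback steps amounts to rewriting a mapping of deployment names to percentages. The sketch below models that mapping as a plain dictionary; it is an illustration of the allocation logic only and does not call Azure. The helper `route_all_traffic` is hypothetical.

```python
def route_all_traffic(traffic: dict[str, int], target: str) -> dict[str, int]:
    """Return a new allocation with 100% on target and 0% on every other deployment."""
    if target not in traffic:
        raise KeyError(f"Unknown deployment: {target}")
    return {name: (100 if name == target else 0) for name in traffic}


if __name__ == "__main__":
    current = {"blue": 100, "rollback": 0}
    print(route_all_traffic(current, "rollback"))
```

Because the function returns a new dictionary instead of modifying the input, the previous allocation stays available if you need to switch back.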
Clean up Azure resources
When you finish exploring Azure Machine Learning, you should delete the resources you’ve created to avoid unnecessary Azure costs.
- Close the Azure Machine Learning studio tab and return to the Azure portal.
- In the Azure portal, on the Home page, select Resource groups.
- Select the rg-ai300-… resource group that contains your Azure Machine Learning workspace and any associated resources.
- At the top of the Overview page for your resource group, select Delete resource group.
- Enter the resource group name to confirm you want to delete it, and select Delete.