Automate model training with GitHub Actions
As your machine learning solution matures, you move from running ad-hoc experiments in Azure Machine Learning to automating repeatable training workflows. GitHub Actions lets you run Azure Machine Learning jobs whenever you need them, using secure, traceable workflows that fit into your existing source control practices.
In this exercise, you automate model training with GitHub Actions in three phases:
- Configure secure access from GitHub to your Azure Machine Learning workspace by using a service principal and GitHub secrets.
- Run an Azure Machine Learning command job from a manually triggered GitHub Actions workflow.
- Use feature-based development and branch protection so that model training runs as part of a pull request workflow.
Along the way, you review how workspace networking and source control settings influence how you design secure automation for model training.
Before you start
You need:
- An Azure subscription in which you have administrative-level access.
- A GitHub account with permission to create repositories and configure GitHub Actions.
Provision an Azure Machine Learning workspace
First, you create the Azure Machine Learning workspace and compute resources you’ll use from your GitHub workflows.
- In a browser, open the Azure portal at https://portal.azure.com/ and sign in with your Microsoft account.
- Select the [>_] (Cloud Shell) button at the top of the page to open Cloud Shell, and choose Bash if you’re prompted.
- Make sure the correct subscription is selected and that No storage account required is selected. Then select Apply.
- In the Cloud Shell terminal, clone the original lab repo and run the setup script:

```azurecli
rm -r mslearn-mlops -f
git clone https://github.com/MicrosoftLearning/mslearn-mlops.git mslearn-mlops
cd mslearn-mlops/infra
./setup.sh
```

> Ignore any messages that say that extensions couldn't be installed.

- Wait for the script to finish. It creates a resource group, an Azure Machine Learning workspace, and compute resources.
- In the Azure portal, go to Resource groups and open the rg-ai300-... resource group that was created.
- Select the Azure Machine Learning workspace (for example, mlw-ai300-...) and then select Launch studio to open Azure Machine Learning studio.
With a workspace in place, you can now create your own GitHub repository and configure secure access.
Create your GitHub repository from the template
Next, you create your own GitHub repository from the original lab repo so you can use GitHub Actions.
- In a browser, go to https://github.com/MicrosoftLearning/mslearn-mlops.
- In the upper-right corner, select Use this template and then choose Create a new repository.
- In the Owner field, select your GitHub account. In the Repository name field, enter a name such as mslearn-mlops.
- Select Create repository from template.
- In your new repository that was created from the template, go to the Actions tab and enable GitHub Actions if prompted.
- Note the clone URL for your new repository (for example, https://github.com/<your-alias>/mslearn-mlops.git). You use this URL when you work with the repository locally or from a development environment.
With your template-based repository in place, you can now connect GitHub securely to Azure.
Configure GitHub integration with Azure Machine Learning
To let GitHub Actions authenticate to Azure Machine Learning, you use a service principal. The credentials for this service principal are stored as an encrypted secret in your GitHub repository.
- In the Azure portal, select the [>_] (Cloud Shell) button at the top of the page to open Cloud Shell.
- Select Bash if you are prompted to choose a shell type.
- Make sure the correct subscription is selected for your Azure Machine Learning workspace.
- In Cloud Shell, create a service principal that has Contributor access to the resource group that contains your Azure Machine Learning workspace. Replace <service-principal-name>, <subscription-id>, and <your-resource-group-name> with your own values before you run the command:

```azurecli
az ad sp create-for-rbac --name "<service-principal-name>" --role contributor \
    --scopes /subscriptions/<subscription-id>/resourceGroups/<your-resource-group-name> \
    --sdk-auth
```

- Copy the full JSON output of the command to a safe location. You paste this output into a GitHub secret in the next steps.
- In the GitHub repository you created from the template, navigate to Settings > Secrets and variables > Actions.
- Select New repository secret.
- Enter AZURE_CREDENTIALS as the Name of the secret.
- Paste the JSON output from the az ad sp create-for-rbac command into the Value field and select Add secret.
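Before you paste the JSON into the secret, it can help to confirm that the copy you saved is complete. The following is a minimal sketch, assuming you saved the output to a local file; the function name `check_sp_json` and the file name `sp.json` are hypothetical, and the check only looks for the four fields a sign-in step typically relies on.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a saved copy of the service principal JSON contains
# the fields a sign-in step typically needs. The function name and the file
# you pass in are hypothetical.
check_sp_json() {
  local file="$1" key missing=0
  for key in clientId clientSecret subscriptionId tenantId; do
    # Look for a quoted key followed by a colon, for example "clientId": "..."
    if ! grep -q "\"${key}\"[[:space:]]*:" "$file"; then
      echo "missing field: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_sp_json sp.json` returns a non-zero status and names any missing field. If you have the GitHub CLI installed and authenticated, running `gh secret set AZURE_CREDENTIALS < sp.json` from inside your clone is one possible alternative to pasting the value in the browser. Delete the local file when you're done, because it contains the client secret.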
Your GitHub repository now has an encrypted secret that GitHub-hosted runners can use to sign in to Azure and submit jobs to your Azure Machine Learning workspace.
> [!NOTE]
> The service principal is scoped to your Azure Machine Learning resource group. In a production scenario, you would further restrict its permissions and combine it with network controls, such as private endpoints and self-hosted runners, to limit how and where jobs can be submitted.
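To see what the service principal can reach, you can list its role assignments at the resource group scope. This is a sketch under stated assumptions: the helper names `rg_scope` and `list_sp_roles` are hypothetical, the Azure CLI session is already signed in, and the first argument is the clientId value from the JSON output.

```shell
#!/usr/bin/env bash
# Build the resource group scope string in the same form used with --scopes.
rg_scope() {
  local subscription_id="$1" resource_group="$2"
  printf '/subscriptions/%s/resourceGroups/%s\n' "$subscription_id" "$resource_group"
}

# List role assignments for the service principal at that scope.
# Requires a signed-in Azure CLI session; client_id is the clientId value
# from the service principal JSON.
list_sp_roles() {
  local client_id="$1" subscription_id="$2" resource_group="$3"
  az role assignment list \
    --assignee "$client_id" \
    --scope "$(rg_scope "$subscription_id" "$resource_group")" \
    --output table
}
```

Calling `list_sp_roles <client-id> <subscription-id> <your-resource-group-name>` should show the Contributor assignment created earlier. If it shows broader assignments than you expect, that is a signal to tighten the scope.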
Review workspace network access options
Before you automate training from GitHub, review how your Azure Machine Learning workspace controls network access.
- In Azure Machine Learning studio, select Manage in the left navigation and then select Networking under your workspace.
- Review the Public access and Private endpoints settings. Note how you can:
  - Allow public network access from all networks.
  - Restrict access to specific IP ranges.
  - Disable public network access and rely on private endpoints.
- For this lab, keep the default public access settings so that GitHub-hosted runners can reach your workspace. In a production environment, you would typically:
  - Use private endpoints and virtual networks.
  - Run GitHub Actions on self-hosted runners that can reach those private networks.
  - Combine role-based access control (RBAC) with network rules to tightly control who can submit jobs.
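The same setting can also be read from the command line, which is handy when you want a workflow or script to confirm which kind of runner it needs. The sketch below assumes the Azure CLI ml extension is installed and signed in. The helper names are hypothetical, and the `public_network_access` field name in the query is an assumption that you should confirm against the full output of `az ml workspace show`.

```shell
#!/usr/bin/env bash
# Turn the workspace's public network access value into a short reminder
# about which kind of runner fits.
describe_access() {
  local value
  # Lowercase the value so "Enabled" and "enabled" are treated the same.
  value="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$value" in
    enabled)  echo "public access enabled: GitHub-hosted runners are an option" ;;
    disabled) echo "public access disabled: plan for self-hosted runners" ;;
    *)        echo "unrecognized value: $1" ;;
  esac
}

# Query the setting with the Azure CLI ml extension (requires az login).
# The public_network_access field name is an assumption; inspect the full
# output of `az ml workspace show` if the query returns nothing.
show_workspace_access() {
  local workspace="$1" resource_group="$2" value
  value="$(az ml workspace show --name "$workspace" \
    --resource-group "$resource_group" \
    --query public_network_access --output tsv)"
  describe_access "$value"
}
```

Even when public access is enabled, IP restrictions can still block GitHub-hosted runners, so treat the message as a starting point rather than a guarantee.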
Now that you understand the network options, you are ready to automate a training job from GitHub.
Automate model training with a manually triggered workflow
In this section, you connect your GitHub workflow to Azure Machine Learning and run a command job to train a model. The workflow uses the AZURE_CREDENTIALS secret you created earlier.
- Clone the mslearn-mlops repository that you created from the template to a development environment where you can edit files and push changes back to GitHub.
- In the cloned repository, locate the .github/workflows/manual-trigger.yml workflow file.
- Open manual-trigger.yml and review the existing steps. The workflow should:
  - Check out the repository code.
  - Use the AZURE_CREDENTIALS secret to sign in to Azure.
  - Install the Azure Machine Learning CLI extension.
- At the end of the workflow, add a new step that submits the Azure Machine Learning job defined in src/job.yml. If the workflow doesn't already configure a default resource group and workspace for the Azure CLI, also pass --resource-group and --workspace-name to the command. For example:

```yml
- name: Run Azure Machine Learning training job
  run: |
    az ml job create -f src/job.yml --stream
```

- Save your changes, commit them to your local repository, and push the changes to the main branch of your repository.
- In GitHub, go to the Actions tab for your repository.
- Select the workflow defined in manual-trigger.yml and use Run workflow to start it manually.
- Wait for the workflow run to complete. Verify that the Run Azure Machine Learning training job step completes successfully.
- In Azure Machine Learning studio, select Jobs and confirm that a new job based on src/job.yml has run successfully. Review the job inputs, metrics, and logs.
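The steps above use the GitHub web interface. If you prefer the terminal, the GitHub CLI can start and inspect the same workflow. This is a sketch that assumes `gh` is installed, authenticated, and run from inside your clone; the helper names `trigger_training` and `latest_training_run` are hypothetical.

```shell
#!/usr/bin/env bash
# Start the manually triggered workflow on a given branch (default: main).
trigger_training() {
  local ref="${1:-main}"
  gh workflow run manual-trigger.yml --ref "$ref"
}

# Show the most recent run of that workflow.
latest_training_run() {
  gh run list --workflow manual-trigger.yml --limit 1
}
```

Running `trigger_training` followed by `latest_training_run` gives you the run's status without leaving the shell. The workflow file must already exist on the branch you pass as the ref.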
You have now automated the training job by using a GitHub Actions workflow that you can run on demand.
Use feature-based development to trigger workflows
Running workflows manually is useful for initial testing, but in a team environment you usually want training workflows to run automatically when someone proposes a change. Next, you update the existing workflow so it runs for pull requests, and then you use feature branches and branch protection rules to control when the workflow runs.
- In your GitHub repository, open the .github/workflows/manual-trigger.yml workflow file.
- Update the on section so that the workflow can run both manually and when a pull request targets the main branch. For example:

```yml
on:
  workflow_dispatch:
  pull_request:
    branches:
      - main
```

- Commit the updated workflow file and push it to the main branch of your repository.
- In GitHub, go to Settings > Branches and select Add branch protection rule.
- Configure a rule for the main branch that prevents direct pushes. At a minimum:
  - Set Branch name pattern to main.
  - Under Protect matching branches, select Require a pull request before merging.
- Save the branch protection rule.
- In your local clone of the repository, create a new branch for a feature change. For example:

```bash
git checkout -b feature/update-parameters
```

- Make a small, safe change to the training configuration. For example, adjust a hyperparameter value in src/train-model-parameters.py or in src/job.yml.
Commit your change to the feature branch and push the branch to GitHub:
```bash git add . git commit -m "Adjust training parameters" git push --set-upstream origin feature/update-parameters ``` - In GitHub, create a pull request from your feature branch into main.
- On the pull request page, observe that the workflow defined in
manual-trigger.ymlruns automatically because of thepull_requesttrigger you added. - After the workflow completes successfully, review the results and then complete the pull request to merge your changes into main.
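Because main is now protected, it is easy to lose time by committing on the wrong branch. The sketch below guards against that and opens the pull request from the terminal. It assumes `git` and an authenticated GitHub CLI (`gh`) are available in your clone; the helper names `ensure_feature_branch` and `open_training_pr` are hypothetical.

```shell
#!/usr/bin/env bash
# Print the current branch name, or fail if the clone is still on main.
ensure_feature_branch() {
  local branch
  branch="$(git symbolic-ref --short HEAD)" || return 2
  if [ "$branch" = "main" ]; then
    echo "on main; create a feature branch first" >&2
    return 1
  fi
  echo "$branch"
}

# Open a pull request from the current feature branch into main.
# Arguments: pull request title, pull request body.
open_training_pr() {
  local branch
  branch="$(ensure_feature_branch)" || return 1
  gh pr create --base main --head "$branch" --title "$1" --body "$2"
}
```

After you push the branch, `open_training_pr "Adjust training parameters" "Tune a hyperparameter"` creates the pull request, and `gh pr checks --watch` is one way to follow the triggered workflow until it finishes.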
By using feature branches, branch protection rules, and pull request–triggered workflows with the same training workflow definition, you ensure that model training automation is tied to controlled changes in source control.
Clean up Azure resources
When you finish exploring Azure Machine Learning and GitHub Actions, you should delete the resources you created to avoid unnecessary Azure costs.
- Close the Azure Machine Learning studio tab and return to the Azure portal.
- In the Azure portal, on the Home page, select Resource groups.
- Select the rg-ai300-… resource group that contains your Azure Machine Learning workspace.
- At the top of the Overview page for your resource group, select Delete resource group.
- Enter the resource group name to confirm you want to delete it, and select Delete.
- In GitHub, you can also delete the repository you created from the mslearn-mlops template if you no longer need the workflows or sample code.