Generate batch predictions using a deployed model in Microsoft Fabric
In this lab, you’ll use a machine learning model to predict a quantitative measure of diabetes.
By completing this lab, you’ll gain hands-on experience in generating predictions and visualizing the results.
This lab will take approximately 20 minutes to complete.
Note: You need a Microsoft Fabric trial to complete this exercise.
Create a workspace
Before working with data in Fabric, create a workspace with the Fabric trial enabled.
- Navigate to the Microsoft Fabric home page at https://app.fabric.microsoft.com/home?experience=fabric in a browser.
- In the Microsoft Fabric home page, select Synapse Data Science.
- In the menu bar on the left, select Workspaces (the icon looks similar to 🗇).
- Create a new workspace with a name of your choice, selecting a licensing mode that includes Fabric capacity (Trial, Premium, or Fabric).
- When your new workspace opens, it should be empty.
Create a notebook
You’ll use a notebook to train and use a model in this exercise.
- In the Synapse Data Science home page, create a new Notebook.
After a few seconds, a new notebook containing a single cell will open. Notebooks are made up of one or more cells that can contain code or markdown (formatted text).
- Select the first cell (which is currently a code cell), and then in the dynamic tool bar at its top-right, use the M↓ button to convert the cell to a markdown cell.
When the cell changes to a markdown cell, the text it contains is rendered.
- If necessary, use the 🖉 (Edit) button to switch the cell to editing mode, then delete the content and enter the following text:
# Train and use a machine learning model
Train a machine learning model
First, let’s train a machine learning model that uses a regression algorithm to predict the response of interest for diabetes patients (a quantitative measure of disease progression one year after baseline).
- In your notebook, use the + Code icon below the latest cell to add a new code cell to the notebook.
Tip: To see the + Code icon, move the mouse to just below and to the left of the output from the current cell. Alternatively, in the menu bar, on the Edit tab, select + Add code cell.
- Enter the following code to load and prepare data and use it to train a model:
import pandas as pd
import mlflow
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, ColSpec

# Get the data
blob_account_name = "azureopendatastorage"
blob_container_name = "mlsamples"
blob_relative_path = "diabetes"
blob_sas_token = r""
wasbs_path = "wasbs://%s@%s.blob.core.windows.net/%s" % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set("fs.azure.sas.%s.%s.blob.core.windows.net" % (blob_container_name, blob_account_name), blob_sas_token)
df = spark.read.parquet(wasbs_path).toPandas()

# Split the features and label for training
X, y = df[['AGE','SEX','BMI','BP','S1','S2','S3','S4','S5','S6']].values, df['Y'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train the model in an MLflow experiment
experiment_name = "experiment-diabetes"
mlflow.set_experiment(experiment_name)
with mlflow.start_run():
    mlflow.autolog(log_models=False)
    model = DecisionTreeRegressor(max_depth=5)
    model.fit(X_train, y_train)

    # Define the model signature
    input_schema = Schema([
        ColSpec("integer", "AGE"),
        ColSpec("integer", "SEX"),
        ColSpec("double", "BMI"),
        ColSpec("double", "BP"),
        ColSpec("integer", "S1"),
        ColSpec("double", "S2"),
        ColSpec("double", "S3"),
        ColSpec("double", "S4"),
        ColSpec("double", "S5"),
        ColSpec("integer", "S6"),
    ])
    output_schema = Schema([ColSpec("integer")])
    signature = ModelSignature(inputs=input_schema, outputs=output_schema)

    # Log the model
    mlflow.sklearn.log_model(model, "model", signature=signature)
- Use the ▷ Run cell button on the left of the cell to run it. Alternatively, you can press SHIFT + ENTER on your keyboard to run a cell.
Note: Since this is the first time you’ve run any Spark code in this session, the Spark pool must be started. This means that the first run in the session can take a minute or so to complete. Subsequent runs will be quicker.
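The training cell logs the model but doesn’t evaluate it. If you’re curious how well the model fits the held-out data, you could add an optional cell like the sketch below. It isn’t part of the lab’s required steps, and it assumes the model, X_test, and y_test variables from the training cell are still available in the session.

# Optional: evaluate the trained model on the held-out test data
# (assumes model, X_test, and y_test exist from the training cell)
from sklearn.metrics import mean_squared_error, r2_score

y_pred = model.predict(X_test)
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)
print("R2:", r2_score(y_test, y_pred))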
- Use the + Code icon below the cell output to add a new code cell to the notebook, and enter the following code to register the model that was trained by the experiment in the previous cell:
# Get the most recent experiment run
exp = mlflow.get_experiment_by_name(experiment_name)
last_run = mlflow.search_runs(exp.experiment_id, order_by=["start_time DESC"], max_results=1)
last_run_id = last_run.iloc[0]["run_id"]

# Register the model that was trained in that run
print("Registering the model from run:", last_run_id)
model_uri = "runs:/{}/model".format(last_run_id)
mv = mlflow.register_model(model_uri, "diabetes-model")
print("Name: {}".format(mv.name))
print("Version: {}".format(mv.version))
Your model is now saved in your workspace as diabetes-model. Optionally, you can use the browse feature in your workspace to find the model in the workspace and explore it using the UI.
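If you prefer code to the UI for this check, a minimal sketch using the standard MLflow client (assuming the registration cell above has run) lists the registered versions of diabetes-model:

from mlflow.tracking import MlflowClient

# List the registered versions of diabetes-model
client = MlflowClient()
for version in client.search_model_versions("name='diabetes-model'"):
    print("Version {} registered from run {}".format(version.version, version.run_id))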
Create a test dataset in a lakehouse
To use the model, you’ll need a dataset of patient details for whom diabetes progression is to be predicted. You’ll create this dataset as a table in a Microsoft Fabric lakehouse.
- In the Notebook editor, in the Explorer pane on the left, select + Data sources to add a lakehouse.
- Select New lakehouse, select Add, and create a new lakehouse with a valid name of your choice.
- When asked to stop the current session, select Stop now to restart the notebook.
- When the lakehouse is created and attached to your notebook, add a new code cell and run the following code to create a dataset and save it in a table in the lakehouse:
from pyspark.sql.types import IntegerType, DoubleType

# Create a new dataframe with patient data
data = [
    (62, 2, 33.7, 101.0, 157, 93.2, 38.0, 4.0, 4.8598, 87),
    (50, 1, 22.7, 87.0, 183, 103.2, 70.0, 3.0, 3.8918, 69),
    (76, 2, 32.0, 93.0, 156, 93.6, 41.0, 4.0, 4.6728, 85),
    (25, 1, 26.6, 84.0, 198, 131.4, 40.0, 5.0, 4.8903, 89),
    (53, 1, 23.0, 101.0, 192, 125.4, 52.0, 4.0, 4.2905, 80),
    (24, 1, 23.7, 89.0, 139, 64.8, 61.0, 2.0, 4.1897, 68),
    (38, 2, 22.0, 90.0, 160, 99.6, 50.0, 3.0, 3.9512, 82),
    (69, 2, 27.5, 114.0, 255, 185.0, 56.0, 5.0, 4.2485, 92),
    (63, 2, 33.7, 83.0, 179, 119.4, 42.0, 4.0, 4.4773, 94),
    (30, 1, 30.0, 85.0, 180, 93.4, 43.0, 4.0, 5.3845, 88)
]
columns = ['AGE','SEX','BMI','BP','S1','S2','S3','S4','S5','S6']
df = spark.createDataFrame(data, schema=columns)

# Convert data types to match the model input schema
df = df.withColumn("AGE", df["AGE"].cast(IntegerType()))
df = df.withColumn("SEX", df["SEX"].cast(IntegerType()))
df = df.withColumn("BMI", df["BMI"].cast(DoubleType()))
df = df.withColumn("BP", df["BP"].cast(DoubleType()))
df = df.withColumn("S1", df["S1"].cast(IntegerType()))
df = df.withColumn("S2", df["S2"].cast(DoubleType()))
df = df.withColumn("S3", df["S3"].cast(DoubleType()))
df = df.withColumn("S4", df["S4"].cast(DoubleType()))
df = df.withColumn("S5", df["S5"].cast(DoubleType()))
df = df.withColumn("S6", df["S6"].cast(IntegerType()))

# Save the data in a delta table
table_name = "diabetes_test"
df.write.format("delta").mode("overwrite").saveAsTable(table_name)
print(f"Spark dataframe saved to delta table: {table_name}")
- When the code has completed, select the … menu next to Tables in the Lakehouse explorer pane and select Refresh. The diabetes_test table should appear.
- Expand the diabetes_test table in the left pane to view all fields it includes.
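The column types matter here: the model was logged with an explicit input signature (integers for AGE, SEX, S1, and S6; doubles for the rest), so the table’s schema needs to line up with it. If you’d like to verify this in code rather than in the explorer pane, a minimal sketch (assuming the previous cell has run) is:

# Optional: confirm the table schema matches the model's input signature
spark.read.format("delta").load(f"Tables/{table_name}").printSchema()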
Apply the model to generate predictions
Now you can use the model you trained previously to generate diabetes progression predictions for the rows of patient data in your table.
- Add a new code cell and run the following code:
import mlflow
from synapse.ml.predict import MLFlowTransformer

# Read the patient features data
df_test = spark.read.format("delta").load(f"Tables/{table_name}")

# Use the model to generate diabetes predictions for each row
model = MLFlowTransformer(
    inputCols=["AGE","SEX","BMI","BP","S1","S2","S3","S4","S5","S6"],
    outputCol="predictions",
    modelName="diabetes-model",
    modelVersion=1)
df_test = model.transform(df_test)

# Save the results (the original features PLUS the prediction)
df_test.write.format("delta").mode("overwrite").option("mergeSchema", "true").saveAsTable(table_name)
- After the code has finished, select the … menu next to the diabetes_test table in the Lakehouse explorer pane, and select Refresh. A new predictions field has been added.
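You can also preview the scored rows in code. A minimal sketch, assuming the scoring cell has run in the same session:

# Optional: read the scored table back and preview the new predictions column
scored = spark.read.format("delta").load(f"Tables/{table_name}")
scored.select("AGE", "SEX", "BMI", "predictions").show()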
- Add a new code cell to the notebook and drag the diabetes_test table to it. The necessary code to view the table’s contents will appear. Run the cell to display the data.
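The generated cell is typically similar to the following sketch; the lakehouse name used here (my_lakehouse) is a hypothetical placeholder for whatever you named yours.

# Sketch of the kind of cell Fabric generates when you drag a table in;
# "my_lakehouse" is a hypothetical placeholder for your lakehouse name
df = spark.sql("SELECT * FROM my_lakehouse.diabetes_test LIMIT 1000")
display(df)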
Clean up resources
In this exercise, you have used a model to generate batch predictions.
If you’ve finished exploring the notebook, you can delete the workspace that you created for this exercise.
- In the bar on the left, select the icon for your workspace to view all of the items it contains.
- In the … menu on the toolbar, select Workspace settings.
- In the General section, select Remove this workspace.