Lab 11: Implement Lakeflow Jobs with Azure Databricks

Introduction

You are a data engineer at TelConnect, a regional telecommunications provider that manages millions of calls, SMS messages, and data sessions every day. Each event produces a Call Detail Record (CDR) — the raw signal of network activity. This data must flow reliably through a multi-layer pipeline before operations teams can use it for network performance reporting and capacity planning.

Today’s task is to automate that pipeline using Lakeflow Jobs. You will:

Run a pre-built parameterized medallion architecture notebook (Bronze → Silver → Gold) that processes CDR data
Create a Lakeflow Job in the Azure Databricks UI that runs the notebook as an orchestrated task
Pass a processing_date job parameter to the notebook using a dynamic trigger value
Add a second downstream task with a task dependency
Configure an event-based file arrival trigger that fires when new CDR files land in the volume
Add a nightly cron schedule so the job also runs automatically each night
Set up failure notifications so the network operations team is alerted immediately
Configure automatic retry policies to handle transient cloud-storage errors

🤖 Use Genie Code throughout this lab

You are expected and encouraged to use the Genie Code for every coding exercise. Use it for suggestions, error explanations, boilerplate generation, and API exploration.

To open Genie Code, select the assistant-icon on the right side of any notebook cell, or use the keyboard shortcut.

Prerequisites

An Azure Databricks Premium workspace provisioned using Lab 00: Set up your Azure Databricks environment.
You have permission to create catalogs and schemas in Unity Catalog.
Basic familiarity with PySpark and Spark SQL.

Part 1: Run the pipeline notebook

The notebook contains five exercises that build the Bronze → Silver → Gold pipeline. All cells are provided — run them in order to set up the data environment you will orchestrate in Part 2.

Import the notebook

In your Databricks workspace, click Workspace in the left sidebar.
Navigate to or create a folder where you want to store the lab.
Click the ⋮ (kebab menu) or right-click the folder, then select Import.
Choose URL, enter the following URL, and click Import: https://raw.githubusercontent.com/MicrosoftLearning/DP-750T00-Implement-Data-Engineering-Solutions-using-Azure-Databricks/refs/heads/main/Allfiles/11-implement-lakeflow-jobs.ipynb
Open the imported notebook and, in the compute selector at the top, choose Serverless compute.

Run the notebook

Run Exercise 1 cells — this creates the telconnect_lab Unity Catalog with bronze, silver, and gold schemas.
In Exercise 2, run the first cell to create the raw_uploads volume and automatically download telecom_cdrs.csv from GitHub into the volume.
Continue running all remaining cells through Exercise 5.

Tip: Each cell is marked ✅. Read the markdown above each code cell to understand what the step does before running it — this context will help you when configuring the Lakeflow Job in Part 2.

Part 2: Orchestrate with Lakeflow Jobs (UI exercises)

These exercises are the core of this lab. Using the Azure Databricks Jobs & Pipelines UI, you wire the notebook you just ran into a production-grade automated job with triggers, schedules, notifications, and retry policies.

🤖 Genie Code reminder: Open Genie Code in the notebook or workspace sidebar. For UI-only steps, ask it questions like “How do I add a file arrival trigger to a Lakeflow Job?” to get contextual guidance fast.

Exercise 1: Create a Lakeflow Job with the CDR pipeline notebook

Create the job

In the left sidebar, click Jobs & Pipelines.
Click Create job.
Replace the default job name with TelConnect CDR Pipeline.

Configure the first task

In the task editor, enter the task name process_cdrs.
In the Type dropdown, select Notebook.
In the Source dropdown, select Workspace, then browse to the path of the notebook you imported (11-implement-lakeflow-jobs).
In the Compute selector, choose Serverless.
Click Save task.

Add a job parameter

In the Job details panel on the right, scroll to the Parameters section and click Add.
Set the key to processing_date and the value to ``.

This dynamic value inserts the date on which the job was triggered, matching the widget default in the notebook.

Click Save on the parameter.

Verify

In the task graph, confirm that process_cdrs appears as the single node.
Click Run now to test the job manually.
In the Runs tab, wait for the run to complete and confirm all notebook cells executed successfully.

Exercise 2: Add a downstream notification task

In a real pipeline, you might want a second task to send a summary report after the CDR processing task completes. Here you simulate that with a short SQL task that reads from the Gold table.

The SQL query task type doesn’t accept inline SQL — it can only reference a query that has already been saved in the SQL editor. So you first create and save the query, then add it to the job.

Create and save the SQL query

In the left sidebar, click SQL Editor.
Create a new SQL Query and choose an available Serverless SQL warehouse.

Paste the following query into the editor:

SELECT region, network_type, total_calls, drop_rate_pct
FROM telconnect_lab.gold.network_summary
ORDER BY drop_rate_pct DESC;

Click Run to confirm the query works.
Click the query name at the top (for example, New Query 2026-…) and rename it to summarize_gold_query.
Click Save.

Add the SQL query task to the job

Return to the TelConnect CDR Pipeline job editor and click + Add task below process_cdrs.
Name the task summarize_gold.
In the Type dropdown, select SQL query.
In the SQL query dropdown, search for and select summarize_gold_query.
In the SQL warehouse selector, choose an available Serverless SQL warehouse.
Confirm that Depends on is already set to process_cdrs and the Run if dependencies condition is All succeeded.
Click Save task.

Verify the DAG

Review the DAG view. You should see:
```
process_cdrs → summarize_gold
```

Exercise 3: Configure a file arrival trigger

TelConnect drops fresh CDR files into the volume at irregular intervals throughout the day. Instead of polling on a fixed schedule, you configure the job to fire automatically each time a new file arrives.

In the Job details panel, under Schedules & Triggers, click Add trigger.
In the Trigger type dropdown, select File arrival.

In the Storage location field, enter:

/Volumes/telconnect_lab/bronze/raw_uploads/

Under Advanced options, set Wait after last change to 60 seconds.

This batches rapid file arrivals into a single job run — useful when multiple CDR batches arrive in quick succession.
Click Save.

Pause the trigger

Since you will also add a schedule in the next exercise, pause the file arrival trigger to avoid duplicate runs during the lab:

Under Schedules & Triggers, locate the file arrival trigger and click Pause.

Note: In production, you would keep both active — the file arrival trigger handles intraday updates while the schedule provides a reliable nightly catch-all run.

Exercise 4: Add a nightly schedule

TelConnect’s operations team expects the Gold summary table to be refreshed every night at 02:00 CET so dashboards are current before the morning shift begins.

In the Job details panel, click Add trigger again.
Set Trigger type to Scheduled.
Set Schedule type to Advanced.
In the Cron expression field, enter:
```
0 0 2 * * ?
```
This expression means: at second 0, minute 0, hour 2, every day of the month, every month, any day of the week.
Set the Time zone to Europe/Amsterdam.
Click Save.

Verify the cron expression

After saving, review the human-readable schedule preview. It should read: “Every day at 2:00 AM Amsterdam time.”

Exercise 5: Configure failure notifications

The network operations team must be alerted immediately if the CDR pipeline fails — a missed run could delay fault detection and network SLA reporting.

Add job-level failure notification

In the Job details panel, scroll to Job notifications and click Edit notifications.
Click Add notification.
Choose Email address and enter your email.
Select the Failure event type.
Optionally also select Duration warning and set a threshold of 30 minutes.
Click Save.

Add task-level retry notification

Select the process_cdrs task in the task graph to open its configuration.
In the task panel, scroll to Notifications and click Add.
Choose Email address (same address), select Failure, and check Mute notifications until the last retry.

This prevents notification spam when the task retries. You only receive an alert if all retry attempts fail.
Click Save task.

Exercise 6: Configure automatic retries

CDR files are read from cloud storage, and transient network errors or storage throttling are common causes of short-lived task failures. Configuring retries means the pipeline recovers without manual intervention.

Configure task-level retries on process_cdrs

Select the process_cdrs task in the task graph.
In the task configuration panel, click + Add next to Retries.
Set Retry count to 2.
Set Retry interval to 60 seconds.
Click Save task.

Configure a timeout threshold

In the same task configuration, click Metric thresholds.
Set the Metric to Run duration.
Set Warning to 20 minutes.
Set Timeout to 30 minutes.
Click Save task.

If the notebook hangs — for example, a query stalls on a large CDR backlog — the timeout ensures the task fails cleanly so the retry mechanism can kick in rather than the task running indefinitely.

Exercise 7: Run and monitor the complete job

In the TelConnect CDR Pipeline job editor, click Run now.
Navigate to the Runs tab and click the active run to open the run details.
Observe both tasks (process_cdrs and summarize_gold) executing in sequence.
Click process_cdrs in the run timeline to view:
- The notebook output (each cell result)
- The job parameter value passed in (processing_date)
- The task values set by dbutils.jobs.taskValues.set()
After both tasks complete, verify that the run status is Succeeded.

Summary

In this lab you:

Ran a medallion architecture pipeline (Bronze → Silver → Gold) for TelConnect CDR data in Unity Catalog
Created a Lakeflow Job with two tasks and a sequential task dependency
Passed processing_date as a dynamic job parameter tied to the trigger time
Configured a file arrival trigger that fires when new CDR files land in the volume
Added a nightly cron schedule at 02:00 CET to ensure daily refresh
Set up failure notifications with muted retry alerts to reduce alert noise
Configured task-level retries (2 attempts, 60-second interval) and a 30-minute timeout threshold