An introduction to analyzing data using Spark in Azure Databricks


Introduction to Databricks

Use the labs in this repo to get started with Spark in Azure Databricks.

Start by following the Setup Guide to prepare your Azure environment and download the lab files used in the exercises. Then complete the labs in the following order:

  1. Lab 1 - Getting Started with Spark. In this lab, you’ll learn how to provision a Spark cluster in an Azure Databricks workspace and use it to analyze data interactively using Python or Scala.
  2. Lab 2 - Running a Spark Job. In this lab, you’ll learn how to configure a Spark job for unattended execution so that you can schedule batch processing workloads.
  3. Lab 3 - Using Structured Streaming. In this lab, you’ll learn how to use Spark to process an unbounded stream of real-time data, a common requirement in Internet-of-Things (IoT) scenarios.
  4. Lab 4 - Introduction to Machine Learning. In this lab, you’ll get started with machine learning by using Spark to train and evaluate a classification model.

If you want to learn more about machine learning in Databricks, take a look at Machine Learning with Databricks.