Query a data warehouse in Microsoft Fabric
In Microsoft Fabric, a data warehouse provides a relational database for large-scale analytics. The rich set of experiences built into Microsoft Fabric workspace enables customers to reduce their time to insights by having an easily consumable, always connected semantic model that is integrated with Power BI in DirectLake mode.
This lab will take approximately 30 minutes to complete.
Note: You need a Microsoft Fabric trial to complete this exercise.
Create a workspace
Before working with data in Fabric, create a workspace with the Fabric trial enabled.
- On the Microsoft Fabric home page at
https://app.fabric.microsoft.com/home?experience=fabric
, select Data Warehouse. - In the menu bar on the left, select Workspaces (the icon looks similar to đź—‡).
- Create a new workspace with a name of your choice, selecting a licensing mode that includes Fabric capacity (Trial, Premium, or Fabric).
-
When your new workspace opens, it should be empty.
Create a sample data warehouse
Now that you have a workspace, it’s time to create a data warehouse.
- At the bottom left, ensure that the Data Warehouse experience is selected.
-
On the Home page, select Sample warehouse and create a new data warehouse named sample-dw.
After a minute or so, a new warehouse will be created and populated with sample data for a taxi ride analysis scenario.
Query the data warehouse
The SQL query editor provides support for IntelliSense, code completion, syntax highlighting, client-side parsing, and validation. You can run Data Definition Language (DDL), Data Manipulation Language (DML) and Data Control Language (DCL) statements.
-
In the sample-dw data warehouse page, in the New SQL query drop-down list, select New SQL query.
-
In the new blank query pane, enter the following Transact-SQL code:
SELECT D.MonthName, COUNT(*) AS TotalTrips, SUM(T.TotalAmount) AS TotalRevenue FROM dbo.Trip AS T JOIN dbo.[Date] AS D ON T.[DateID]=D.[DateID] GROUP BY D.MonthName;
-
Use the â–· Run button to run the SQL script and view the results, which show the total number of trips and total revenue by month.
-
Enter the following Transact-SQL code:
SELECT D.DayName, AVG(T.TripDurationSeconds) AS AvgDuration, AVG(T.TripDistanceMiles) AS AvgDistance FROM dbo.Trip AS T JOIN dbo.[Date] AS D ON T.[DateID]=D.[DateID] GROUP BY D.DayName;
-
Run the modified query and view the results, which show the average trip duration and distance by day of the week.
-
Enter the following Transact-SQL code:
SELECT TOP 10 G.City, COUNT(*) AS TotalTrips FROM dbo.Trip AS T JOIN dbo.Geography AS G ON T.PickupGeographyID=G.GeographyID GROUP BY G.City ORDER BY TotalTrips DESC; SELECT TOP 10 G.City, COUNT(*) AS TotalTrips FROM dbo.Trip AS T JOIN dbo.Geography AS G ON T.DropoffGeographyID=G.GeographyID GROUP BY G.City ORDER BY TotalTrips DESC;
-
Run the modified query and view the results, which show the top 10 most popular pickup and dropoff locations.
-
Close all query tabs.
Verify data consistency
Verifying data consistency is important to ensure that the data is accurate and reliable for analysis and decision-making. Inconsistent data can lead to incorrect analysis and misleading results.
Let’s query your data warehouse to check for consistency.
-
In the New SQL query drop-down list, select New SQL query.
-
In the new blank query pane, enter the following Transact-SQL code:
-- Check for trips with unusually long duration SELECT COUNT(*) FROM dbo.Trip WHERE TripDurationSeconds > 86400; -- 24 hours
-
Run the modified query and view the results, which show details of all trips with unusually long duration.
-
In the New SQL query drop-down list, select New SQL query to add a second query tab. Then in the new empty query tab, run the following code:
-- Check for trips with negative trip duration SELECT COUNT(*) FROM dbo.Trip WHERE TripDurationSeconds < 0;
-
In the new blank query pane, enter and run the following Transact-SQL code:
-- Remove trips with negative trip duration DELETE FROM dbo.Trip WHERE TripDurationSeconds < 0;
Note: There are several ways to handle inconsistent data. Rather than removing it, one alternative is to replace it with a different value such as the mean or median.
-
Close all query tabs.
Save as view
Suppose that you need to filter certain trips for a group of users who will use the data to generate reports.
Let’s create a view based on the query we used earlier, and add a filter to it.
-
In the New SQL query drop-down list, select New SQL query.
-
In the new blank query pane, re-enter and run the following Transact-SQL code:
SELECT D.DayName, AVG(T.TripDurationSeconds) AS AvgDuration, AVG(T.TripDistanceMiles) AS AvgDistance FROM dbo.Trip AS T JOIN dbo.[Date] AS D ON T.[DateID]=D.[DateID] GROUP BY D.DayName;
-
Modify the query to add
WHERE D.Month = 1
. This will filter the data to include only records from the month of January. The final query should look like this:SELECT D.DayName, AVG(T.TripDurationSeconds) AS AvgDuration, AVG(T.TripDistanceMiles) AS AvgDistance FROM dbo.Trip AS T JOIN dbo.[Date] AS D ON T.[DateID]=D.[DateID] WHERE D.Month = 1 GROUP BY D.DayName
-
Select the text of the SELECT statement in your query. Then next to the â–· Run button, select Save as view.
-
Create a new view named vw_JanTrip.
-
In the Explorer, navigate to Schemas » dbo » Views. Note the vw_JanTrip view you just created.
-
Close all query tabs.
Further Information: See Query using the SQL query editor in the Microsoft Fabric documentation for more information about querying a data warehouse.
Clean up resources
In this exercise, you have used queries to get insights of the data in a Microsoft Fabric data warehouse.
If you’ve finished exploring your data warehouse, you can delete the workspace you created for this exercise.
- In the bar on the left, select the icon for your workspace to view all of the items it contains.
- Select Workspace settings and in the General section, scroll down and select Remove this workspace.
- Select Delete to delete the workspace.