Work with data in a Microsoft Fabric eventhouse
In Microsoft Fabric, an eventhouse is used to store real-time data related to events; often captured from a streaming data source by an eventstream.
Within an eventhouse, the data is stored in one or more KQL databases, each of which contains tables and other objects that you can query by using Kusto Query Language (KQL) or a subset of Structured Query Language (SQL).
In this exercise, you’ll create and populate an eventhouse with some sample data related to taxi rides, and then query the data using KQL and SQL.
This exercise takes approximately 25 minutes to complete.
Create a workspace
Before working with data in Fabric, create a workspace with the Fabric capacity enabled.
- Navigate to the Microsoft Fabric home page at
https://app.fabric.microsoft.com/home?experience=fabric
in a browser and sign in with your Fabric credentials. - In the menu bar on the left, select Workspaces (the icon looks similar to đź—‡).
- Create a new workspace with a name of your choice, selecting a licensing mode that includes Fabric capacity (Trial, Premium, or Fabric).
-
When your new workspace opens, it should be empty.
Create an Eventhouse
Now that you have a workspace with support for a Fabric capacity, you can create an eventhouse in it.
- In the menu bar on the left, select Workloads. Then, select the Real-Time Intelligence tile.
-
On the Real-Time Intelligence home page, select the Explore Real-Time Intelligence Sample tile. It will automatically create an eventhouse called RTISample:
- In the pane on the left, note that your eventhouse contains a KQL database with the same name as the eventhouse.
- Verify that a Bikestream table has also been created.
Query data by using KQL
Kusto Query Language (KQL) is an intuitive, comprehensive language that you can use to query a KQL database.
Retrieve data from a table with KQL
- In the left pane of the eventhouse window, under your KQL database, select the default queryset file. This file contains some sample KQL queries to get you started.
-
Modify the first example query as follows.
Bikestream | take 100
NOTE: The Pipe ( | ) character is used for two purposes in KQL including to separate query operators in a tabular expression statement. It is also used as a logical OR operator within square or round brackets to denote that you may specify one of the items separated by the pipe character.
-
Select the query code and run it to return 100 rows from the table.
You can be more precise by adding specific attributes you want to query using the
project
keyword and then using thetake
keyword to tell the engine how many records to return. -
Type, select, and run the following query:
// Use 'project' and 'take' to view a sample number of records in the table and check the data. Bikestream | project Street, No_Bikes | take 10
NOTE: The use of // denotes a comment.
Another common practice in the analysis is renaming columns in our queryset to make them more user-friendly.
-
Try the following query:
Bikestream | project Street, ["Number of Empty Docks"] = No_Empty_Docks | take 10
Summarize data by using KQL
You can use the summarize keyword with a function to aggregate and otherwise manipulate data.
-
Try the following query, which uses the sum function to summarize the rental data to see how many bikes are available in total:
Bikestream | summarize ["Total Number of Bikes"] = sum(No_Bikes)
You can group the summarized data by a specified column or expression.
-
Run the following query to group the number of bikes by neighbourhood to determine the amount of available bikes in each neighbourhood:
Bikestream | summarize ["Total Number of Bikes"] = sum(No_Bikes) by Neighbourhood | project Neighbourhood, ["Total Number of Bikes"]
If any of the bike points has a null or empty entry for neighbourhood, the results of summarization will include a blank value, which is never good for analysis.
-
Modify the query as shown here to use the case function along with the isempty and isnull functions to group all trips for which the neighbourhood is unknown into a Unidentified category for follow-up.
Bikestream | summarize ["Total Number of Bikes"] = sum(No_Bikes) by Neighbourhood | project Neighbourhood = case(isempty(Neighbourhood) or isnull(Neighbourhood), "Unidentified", Neighbourhood), ["Total Number of Bikes"]
Note: As this sample dataset is well-maintained, you might not have an Unidentified field in the query result.
Sort data by using KQL
To make more sense of our data, we typically order it by a column, and this process is done in KQL with either a sort by or order by operator (they act the same way).
-
Try the following query:
Bikestream | summarize ["Total Number of Bikes"] = sum(No_Bikes) by Neighbourhood | project Neighbourhood = case(isempty(Neighbourhood) or isnull(Neighbourhood), "Unidentified", Neighbourhood), ["Total Number of Bikes"] | sort by Neighbourhood asc
-
Modify the query as follows and run it again, and note that the order by operator works the same way as sort by:
Bikestream | summarize ["Total Number of Bikes"] = sum(No_Bikes) by Neighbourhood | project Neighbourhood = case(isempty(Neighbourhood) or isnull(Neighbourhood), "Unidentified", Neighbourhood), ["Total Number of Bikes"] | order by Neighbourhood asc
Filter data by using KQL
In KQL, the where clause is used to filter data. You can combine conditions in a where clause by using and and or logical operators.
-
Run the following query to filter the bike data to include only bike points in the Chelsea neighbourhood:
Bikestream | where Neighbourhood == "Chelsea" | summarize ["Total Number of Bikes"] = sum(No_Bikes) by Neighbourhood | project Neighbourhood = case(isempty(Neighbourhood) or isnull(Neighbourhood), "Unidentified", Neighbourhood), ["Total Number of Bikes"] | sort by Neighbourhood asc
Query data by using Transact-SQL
KQL Database doesn’t support Transact-SQL natively, but it provides a T-SQL endpoint that emulates Microsoft SQL Server and allows you to run T-SQL queries on your data. The T-SQL endpoint has some limitations and differences from the native SQL Server. For example, it doesn’t support creating, altering, or dropping tables, or inserting, updating, or deleting data. It also doesn’t support some T-SQL functions and syntax that aren’t compatible with KQL. It was created to allow systems that didn’t support KQL to use T-SQL to query the data within a KQL Database. So, it’s recommended to use KQL as the primary query language for KQL Database, as it offers more capabilities and performance than T-SQL. You can also use some SQL functions that are supported by KQL, such as count, sum, avg, min, max, and so on.
Retrieve data from a table by using Transact-SQL
-
In your queryset, add and run the following Transact-SQL query:
SELECT TOP 100 * from Bikestream
-
Modify the query as follows to retrieve specific columns
SELECT TOP 10 Street, No_Bikes FROM Bikestream
-
Modify the query to assign an alias that renames No_Empty_Docks to a more user-friendly name.
SELECT TOP 10 Street, No_Empty_Docks as [Number of Empty Docks] from Bikestream
Summarize data by using Transact-SQL
-
Run the following query to find the total number of bikes available:
SELECT sum(No_Bikes) AS [Total Number of Bikes] FROM Bikestream
-
Modify the query to group the total number of bikes by neighbourhood:
SELECT Neighbourhood, Sum(No_Bikes) AS [Total Number of Bikes] FROM Bikestream GROUP BY Neighbourhood
-
Modify the query further to use a CASE statement to group bike points with an unknown origin into a Unidentified category for follow-up.
SELECT CASE WHEN Neighbourhood IS NULL OR Neighbourhood = '' THEN 'Unidentified' ELSE Neighbourhood END AS Neighbourhood, SUM(No_Bikes) AS [Total Number of Bikes] FROM Bikestream GROUP BY CASE WHEN Neighbourhood IS NULL OR Neighbourhood = '' THEN 'Unidentified' ELSE Neighbourhood END;
Sort data by using Transact-SQL
-
Run the following query to order the grouped results by neighbourhood:
SELECT CASE WHEN Neighbourhood IS NULL OR Neighbourhood = '' THEN 'Unidentified' ELSE Neighbourhood END AS Neighbourhood, SUM(No_Bikes) AS [Total Number of Bikes] FROM Bikestream GROUP BY CASE WHEN Neighbourhood IS NULL OR Neighbourhood = '' THEN 'Unidentified' ELSE Neighbourhood END; ORDER BY Neighbourhood ASC;
Filter data by using Transact-SQL
-
Run the following query to filter the grouped data so that only rows having a neighbourhood of “Chelsea” are included in the results
SELECT CASE WHEN Neighbourhood IS NULL OR Neighbourhood = '' THEN 'Unidentified' ELSE Neighbourhood END AS Neighbourhood, SUM(No_Bikes) AS [Total Number of Bikes] FROM Bikestream GROUP BY CASE WHEN Neighbourhood IS NULL OR Neighbourhood = '' THEN 'Unidentified' ELSE Neighbourhood END; HAVING Neighbourhood = 'Chelsea' ORDER BY Neibourhood ASC;
Clean up resources
In this exercise, you have created an eventhouse and queried data using KQL and SQL.
When you’ve finished exploring your KQL database, you can delete the workspace you created for this exercise.
- In the bar on the left, select the icon for your workspace.
- In the toolbar, select Workspace settings.
- In the General section, select Remove this workspace.