The Difference Between Power BI Dataflows and Datasets
If you’re into Microsoft, data, and visualizations, the Microsoft Power Platform World Tour is the awesome sauce event of the year. Fueled by Power Platform User Groups, the conference brings together Microsoft and industry experts to showcase and dig into the latest developments with data visualizations, app customization and innovation.
Conference leadership tapped our own BI Solutions Architect Andrew Kinnier to drop in on the Tour when it hit Anaheim, California, and talk about Power BI dataflows and datasets.
Recognized for his expertise in analytics and Power BI, Andrew has made regular appearances at World Tour events over the years. He is also Assistant Organizer of the NJ/NY branch of the Power BI User Groups.
You can read Andrew’s full slide presentation here. In this blog, we give you a summary of the difference between Power BI dataflows and datasets.
WHAT’S THE DIFFERENCE BETWEEN A DATAFLOW AND A DATASET?
Power BI offers a number of different but complementary ways to organize and model self-service data. Two of these mechanisms are dataflows and datasets. Although their names are similar, they fill different gaps in the Power BI system.
A dataflow is used to organize and persist self-service data. Behind the scenes, Power BI uses an Azure Data Lake to store the source data and its metadata. Data can be cleaned and transformed as part of the dataflow.
Data is then mapped to a standard, extensible schema called the Common Data Model for clearer presentation to end users. Data from multiple sources can be combined in the Data Lake to present a unified data structure to the report developer.
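To make the idea of mapping several sources onto one shared schema more concrete, here is a minimal Python sketch. It is not Power BI or Common Data Model code: the two source systems, their column names, and the Customer shape are hypothetical, chosen only to show how differently named fields can be normalized into the single structure a report developer sees.

```python
from dataclasses import dataclass


# Hypothetical target shape standing in for a standardized entity.
# This is NOT an actual Common Data Model entity definition.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    country: str


# Hypothetical rows from two source systems that name the same fields differently.
crm_rows = [{"CustID": "C1", "FullName": "Ada Lopez", "Country": "US"}]
erp_rows = [{"account_no": "C2", "account_name": "Ben Ito", "ctry": "JP"}]


def from_crm(row: dict) -> Customer:
    # Translate the CRM column names onto the shared shape.
    return Customer(row["CustID"], row["FullName"], row["Country"])


def from_erp(row: dict) -> Customer:
    # Translate the ERP column names onto the same shared shape.
    return Customer(row["account_no"], row["account_name"], row["ctry"])


def unify(crm: list, erp: list) -> list:
    """Map both sources onto one structure so downstream reports see a single table."""
    return [from_crm(r) for r in crm] + [from_erp(r) for r in erp]


customers = unify(crm_rows, erp_rows)
for c in customers:
    print(c)
```

The design choice being illustrated is that each source gets its own small mapping step, while everything downstream depends only on the shared shape.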
A dataset is a pointer to your data source, typically including a subset of the data in that source. When used with dataflows, the dataset points at the managed Azure Data Lake and includes some or all of the data in it. The dimensions and measures needed for the current report can be pulled into a specific dataset at the proper grain for speed and efficiency. For example, a retailer may want one dataset containing transaction-level product mix (which would be large) and another containing summarized daily sales and discounts (which would be small).
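The retailer example comes down to storing the same facts at two grains. The following minimal Python sketch (not Power BI code; the Transaction fields and sample rows are hypothetical) collapses transaction-level rows into one row per day. It shows why the daily summary stays small while the transaction-level view grows with every sale.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    day: date
    product: str
    quantity: int
    sales: float
    discount: float


# Hypothetical transaction-level rows: the large, fine-grained view.
transactions = [
    Transaction(date(2020, 1, 6), "Shirt", 2, 40.0, 4.0),
    Transaction(date(2020, 1, 6), "Hat", 1, 15.0, 0.0),
    Transaction(date(2020, 1, 7), "Shirt", 1, 20.0, 2.0),
]


def daily_summary(rows):
    """Collapse transaction rows to one row per day: the small, coarse view."""
    totals = defaultdict(lambda: {"sales": 0.0, "discount": 0.0})
    for t in rows:
        totals[t.day]["sales"] += t.sales
        totals[t.day]["discount"] += t.discount
    # Return a plain dict ordered by day for predictable iteration.
    return dict(sorted(totals.items()))


summary = daily_summary(transactions)
for day, vals in summary.items():
    print(day, vals)
```

A report that only needs daily totals can use the summary and skip the per-product detail. A product-mix report still needs the transaction-level rows.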
Report developers can then select the proper, minimal dataset when building a report. Multiple datasets can leverage the same underlying data in the managed Azure Data Lake at similar or different grains.
Once published, datasets can be shared across workspaces and with other users and groups. An organization can also certify datasets so analysts know which data has met the organization's quality bar.
WANT MORE INFORMATION ON DATASETS AND DATAFLOWS?
See Andrew’s webinar recording How to Share Power BI Datasets: Dataflows and Certified Datasets.