To data warehouse or not to data warehouse? The age-old BI debate rages on.
Is it better to connect to data sources directly and build Power BI reports quickly, as they are needed? Or, is it better to build those reports off a standardized and integrated data warehouse?
The arguments for both cases are compelling. Time, cost, reporting speed and accuracy considerations come into play. Ultimately, the answer depends on your company’s business objective(s) and what risks it is willing to accept.
The tortoise and the hare conundrum of BI
There is still a large contingent of BI experts who see the data warehouse (DW) as the engine behind BI. As the sanctioned clearing house for all business data, it can be relied on to serve compiled, standardized and integrated data for reporting and analytics. Slow but reliable, the DW does not satisfy our need for immediacy. It is not the solution when business needs to quickly integrate data coming in from new data sources and wants insights delivered yesterday. The DW does not satisfy a need for immediacy.
Building and sustaining a data warehouse is slow and costly. It requires a skilled resource to maintain. It can slow reporting time. However, that time-to-market is greatly reduced once the DW exists. New datapoints can be easily added to the data warehouse to accommodate businesses changes. Report authors can leverage a rich set of dimensionality.
On the other hand, the first version of a Power BI dataset used without a data warehouse will get reports out the door faster. But as the adage goes, there’s no such thing as a free lunch. The initial time saved gets spent in duplicative efforts. More user acceptance testing is required. Over time, more refactoring must likely be performed since most everything must be revalidated in a new dataset (versus using data from a DW that has been mostly vetted).
We have worked with Power BI both ways, depending on what approach was best suited to the client’s needs. In this blog, we have assembled a SWOT analysis (strengths, weaknesses, opportunities, threats) that outlines the pros and cons of connecting Power BI directly to a data source versus using a data warehouse.
Power BI without a data warehouse
- Can stand up a first version of a simple report relatively quickly. Because no intermediate data store is involved, the data simply flows directly from the system of record.
- Reporting requirements don’t overlap with each other, simplifying parallel paths in dataset and report development.
- Can foster trust and partnership with business units through quick development of prototypes and first versions of datasets.
- Can respond quickly to shifting reporting needs to support a dynamic business.
- Can operate with a smaller and less senior technical team.
- Multiple versions of the truth across multiple datasets cause confusion and frustration in the userbase.
- Little to no data governance means potentially introducing multiple definitions for the same measure.
- Complex reports have a much higher cost to build and maintain.
- Example: A report with a table that merges data from different sources and that logic cannot be easily leveraged by another dataset.
- Increased complexity of measures considerably slows down dataset refresh and report responsiveness.
- Consequently, the specs on the Power BI service need to be increased, which carries significant cost.
- Lack of agreement across business units on how measures are defined leads to cross-functional conflict.
- Telling the wrong data story can lead to bad business decisions with potentially large monetary consequences.
Power BI with a data warehouse
- Pushes business transformation logic upstream from Power BI into a database, which can then be used for other purposes beyond Microsoft BI/Power BI.
- Greatly simplifies the complexity and number of Power BI datasets.
- Makes maintenance and extensibility much easier, thereby reducing cost.
- Makes reports much faster: measures are pre-baked rather than calculated at run time.
- Greatly decreases the load on Power BI service. Because it is crunching fewer numbers, refresh failures and resource conflicts are reduced.
- Unlocks the possibility of time-based analysis such as snapshots and slowly changing dimensions.
- Example: As of the end of any given month, what was the on-hand inventory for Part X?
- Example: What was the name of Part X last year, and did its Item Type change over time?
- Creates a single version of the truth, eliminating the possibility of discrepancies across the reporting ecosystem.
- Allows reports to be created much faster since data structures can be reused across datasets.
- Allows new data points and data sources to be added to the DW’s foundation easily after the initial version of the DW is completed.
- Can accommodate Big Data analysis at any scale, even billions of rows.
- Business definitions are different across functions. The creation of a DW is an opportunity to align those definitions, greatly reducing confusion and inaccuracies in cross-functional analysis.
- Governance of data increases data quality, increasing user trust and adoption of the system, which increases user efficiency.
- Exposes more junior development resources to BI best practices.
- Aligns with typical BI best practices.
- Higher upfront cost of time and money than operating dataset-by-dataset. A data model presents its highest value when requirements for multiple data sources are considered concurrently.
- Higher cost associated with the data warehouse infrastructure.
- Requires someone with a strong understanding of data modeling and BI architecture to maintain and augment.
- Adds a component in between the systems of record and Power BI.
- Increases complexity of overall architecture.
- Increases refresh time.
- Near-real-time operational reports are not necessarily well suited for use with a data warehouse.
- Example: A report that refreshes every 30 minutes may not be a good candidate; a report that refreshes every two hours may be.
- A data warehouse typically runs behind the needs of the business. To facilitate these newly uncovered business requirements, DW and non-DW data will need to be merged until the DW can be augmented.
- A data warehouse requires support from a knowledgeable technical resource. Without it, the DW can grow cumbersome, stale and/or inaccurate, creating a worse reporting environment than having no DW at all.
In your organization’s race to competitive BI, what will be the better solution, the tortoise or the hare? An August 2018 Duke University article, In the Race of Life, the Tortoise Beats the Hare Every Time, seems to support taking the slow but steady approach of Aesop’s famed fable. Research showed that, when speed is averaged throughout a lifetime, the fastest animals and machines are actually the slowest.
But Aesop didn’t have to contend with a data-driven culture that demands immediacy and blazing fast technology to support those demands. So does this research hold true for the data warehouse debate and your organization? Only you can decide which side of the argument your organization needs to land on. And so, the debate goes on.
Need help making the right decision for your organization? Get in touch with us. We will work with you to determine the roadmap for achieving your vision of a responsive, accurate and modernized analytics.