OLAP: Past, Present…and Future?
Next generation visualization tools and fast technologies like columnar and in-memory databases are causing people to question the relevancy of the OLAP cube. But should we be writing off OLAP? Before we can answer this question, it’s helpful to do a quick retrospective of where OLAP started and what it brought…and brings…to the table.
The Birth of a Beautiful Thing: OLAP
The term OLAP, or on-line analytical processing, was coined by Edgar F. Codd, the father of the relational database, best known for his 12 rules. The term appeared in a 1993 whitepaper published by Arbor Software. The first OLAP product on the market was Arbor Essbase, a name meaning "extended spreadsheet database."
OLAP was a solution to some of the shortcomings of the data warehouse, particularly around query performance. Business people and analysts were never interested in looking at individual transactions but rather aggregate calculations, usually over various time periods – e.g. sales by product line, year to date and compared to the same period last year.
While a data warehouse could provide this information, the queries could take hours, if not days, to run. This was at a time when computing power was significantly more expensive than now: in 1995, 1 GB of RAM would set you back $32,300.00!
OLAP solved this problem by pulling data out of the data warehouse, pre-calculating all the aggregate measures at every level for every dimension and then building an indexed data structure optimized for retrieval. The result is analogous to a three dimensional spreadsheet that can be spun around and looked at from any angle.
Thus the term "cube" is synonymous with OLAP. The cubes build overnight, after the ETL (Extract, Transform and Load) populates the dimension and fact tables in the data warehouse. Everyone is happy in the morning! Not only do the reports run quickly, but it is possible to run ad-hoc queries, slicing and dicing the data or drilling down to additional levels of detail. Cubes are also good at relative time calculations: year over year, month over month, year to date, month to date, etc., which are expensive queries in a relational database.
From a business user perspective, the drag and drop interfaces of OLAP cubes shorten the learning curve and encourage enterprise-wide adoption. Because business logic and calculations are embedded in the cube, users have the flexibility of ad-hoc queries with the assurance of vetted and cleansed data.
But all is not perfect in the world of OLAP. It was never possible to have real time numbers and even overnight numbers are a scheduling challenge. The nightly ETL has to run first, then be followed by a long cube compilation process. If the ETL fails, cube builds are compromised. In global organizations, time zones cut the cube build window even shorter. To meet service level agreements (SLAs) and reporting deadlines, the amount of data in the cube is often throttled, trading off usefulness for timeliness.
There are other things that business users find frustrating with OLAP. Cube development and changes are not quick. Then once the data model is done, testing the business logic and summary calculations is an onerous job. In a cube there are hundreds of thousands, if not millions, of dimension and fact intersections. With some exceptions, cubes are in a proprietary format, which limits how the cubes can be used as a data source. Many companies are not comfortable will this level of vendor lock-in.
The Current State of OLAP
RAM and CPU prices decrease each year and servers become more powerful for the same price. With this increasing processing power, it is possible to address OLAP limitations with a slightly different approach. The dimensional model remains, but new technologies are changing the implementation layer. We now have columnar databases, in-memory storage options and simply more raw CPU horsepower to chew through nasty SQL queries. How much does this resemble, or is capable of replacing, traditional OLAP technology? As usual, the answer is a resounding "it depends." Microsoft, IBM and Tableau have several options available.
Microsoft first got into the cube game with the acquisition of OLAP technology from Panorama software in 1996. At the time, Panorama decided to focus on their NovaView front end client product. Panorama had a ROLAP solution, meaning that the OLAP model was layered on top of the relational data warehouse tables. Introduced in 1998 as OLAP Services, then rebranded in 2000 as Analysis Services, it is now a mature product commonly known as MSAS. It is popular with organizations that have bought into the Microsoft ecosystem, with Windows Servers and SQL Server databases. Its strength is support for the multidimensional query language (MDX) which is like SQL for OLAP. Although development is complex, MDX permits BI developers to embed business logic into the cube.
With SQL Server 2012, Microsoft introduced the Tabular Model, which uses a combination of in-memory cube and columnar database technology which they branded as the xVelocity engine. It comes with a simplified MDX language called Data Analysis Expression, or DAX. The columnar structure improves the performance of queries like select distinct and aggregate calculations that bog down legacy OLAP structures. There are a few trade-offs, such as significant memory requirements (because everything is in-memory, duh) and no many-to-many relationships or custom roll-ups.
The Cognos flagship product in the late 90s was PowerPlay. Hockey fans might remember the clever Cognos advertising that filled the arena when the Ottawa Senators were on a powerplay during a home game. PowerPlay is a MOLAP solution which compiles data into an indexed data file in a proprietary format. Despite the cube size limit and other restrictions, it remains popular as an all around OLAP solution. There will be no new features but IBM did release a version 11 and they are committed to supporting version 7.5 until at least 2021.
A newer IBM Cognos OLAP option is Dynamic Cubes, which evolved from DB2 cube view technology. It can handle high volumes of data and it supports MDX scripting, making it a strong alternative to MSAS. For optimal performance, it requires a star schema data warehouse, ideally with deep hierarchies, and a server with tons of RAM. The cube structure only works natively with Cognos client tools, although the Senturus Analytics Connector for Tableau lets you use Tableau with a Dynamic Cube data source.
IBM also has another OLAP product, TM1, which was Applix TM1 before being acquired by Cognos in 2007. TM1 is a planning, forecasting and budgeting tool with an OLAP backend. The TM1 cube is in-memory, so it is very fast, and it calculates summary measures in real-time as financial numbers are updated by users. The TM1 data can be leveraged as a dimensional datasource by the other Cognos reporting tools. (Refer to the Senturus webinar recording Using TM1 Cubes with Cognos BI.)
Cognos Analytics, aka Cognos 11, offers analytics with minimal involvement from IT. Like its predecessors, Cognos Analytics works best with relational databases – data that has been staged into a Kimball-style data warehouse. What is relevant to this discussion is the Data Module feature that allows business users to augment the data warehouse with their own data. The user loaded data is stored in Apache Parquet, a columnar, in-memory database. This provides high performance for aggregates and OLAP style queries. The drawback is that there is no structure or data governance, which is the opposite of the PowerPlay approach. However, the blending of data warehouse and user uploaded data is ideal for fast moving organizations who find that data warehouse and OLAP development is too slow for the business pace of change. Once a prototype concept is proven it can be industrialized with staged data, managed models and data governance.
Tableau Data Extracts, or TDE, are similar to Cognos Data Modules. TDE is a compressed data snapshot which is loaded into memory. The columnar store structure makes quick work of aggregates and time period comparisons. Like PowerPlay, it delivers performance and portability, but at the cost of scalability. The next version promises the use of larger data sets with the 10.5 hyper data engine.
A handful of startups have emerged to take on the same challenge as the early OLAP vendors did 30 years ago. Running queries against billions of records in Hadoop is long, analogous to running summary queries against millions of records against a relational database in 1990. The big data hype is subsiding, but companies are collecting more data and analyzing this data is integral to staying competitive. Ad-hoc queries that return results in minutes, rather than hours, are a real business advantage. Emerging vendors Kyvo Insight, AtScale and Druid have attracted big dollars in venture capital and an open source project, Apache Kylin is gaining momentum.
Why OLAP Persists
OLAP remains a compelling solution because it allows non-experts to ask questions of business data. A report or dashboard provides a limited set of answers but OLAP solutions are interactive and let users ask more questions and explore the data. Because the summary calculations and business logic are built into the model, only a minimum of technical skill is required.
For example, the tools are easy enough to use that a sales team can sit on the same side of the table with their customer and browse the history of what they purchased. What adds punch is the ability to interactively query the data and look at different time periods, averages and trends, depending on where the conversation is going. A supplier can help their customer plan future purchases, work through different scenarios, and optimize how they are spending money. This is adding value to the business transaction without adding much cost.
OLAP is not dead. But it has morphed into only one interesting option among many analytics tools. The concept of structuring data with dimensions and hierarchies persists because it provides guardrails. Business users can query the data in business terms without knowing the technical details of the implementation.
For a more on OLAP’s role in organizations today, don’t miss our recorded webinar Is OLAP Dead? Here, we explore various facets of OLAP, pros and cons of the big three current OLAP technologies and discuss real life client scenarios for a pragmatic perspective.
Thanks to David Currie for being inspired by our OLAP webinar and providing this blog post. David is a long-time Cognos consultant and former Cognos employee. He also blogs about business intelligence and big data at davidpcurrie.com.