January 21, 2016

Data Architecture

Cost Effective, Highly Scalable, High Speed

You can do data discovery or rapid BI prototyping on large data sets without becoming a Hadoop expert. Our webinar shows how you can enjoy the benefits of big data and analyze it with standard BI tools, including Cognos.

IBM BigInsights Product Manager Paul Yip demonstrates a cost-effective, scalable solution without the barriers to entry common to big data applications. He reviews:

  • Use cases for Hadoop
  • Pros and cons of different visualization tools and their integration with Hadoop
  • Demonstration of BigInsights, IBM’s solution

You may also be interested in the following YouTube videos authored by Paul Yip:


Big SQL Technology Sandbox is a large, shared environment for data science. You can use it to run R, SQL, Spark, and Hadoop jobs. It is a high-performance cluster that demonstrates the advantages of parallelized processing of big data sets.


IBM COGNOS BI, Hadoop / Big Data, Tableau


BI Report Authors, BI Power Users (Developers, Support Staff), BI Managers, IT Managers, Marketing Analysts, Marketing Managers / Directors, Program or Project Managers, Predictive: Quality Assurance, Risk & Fraud


Paul Yip, IBM Analytics
BigInsights Product Manager

Paul has 15 years of hands-on technical experience with the design, implementation, and performance tuning of information management systems. His career has involved many aspects of information management including distributed transactional systems, data marts and warehousing, information security and governance, Hadoop, and Big Data. Paul is the author of three books and has published dozens of technical articles.


  • What is Hadoop?
  • Hadoop’s Cost Advantage
  • Distributed Analytics Example: MapReduce
  • Hive Provides a SQL Interface to MapReduce
  • SQL on Hadoop Matters for Big Data Analytics for BI Tools like Cognos
  • Hive – Joins in MapReduce
  • N-way Joins in MapReduce
  • IBM BigInsights
  • Hive is Really 3 Things…Storage Format, Metastore, and Execution Engine
  • Big SQL Preserves Open Source Foundation
  • IBM First/Only to Produce Audited Benchmark Hadoop-DS (based on TPC-DS) / Oct 2014
  • Performance Test – Hadoop-DS (based on TPC-DS) 20 (Physical Node) Cluster
  • Big SQL Runs More SQL Out-of-Box
  • Cognos & Hadoop Lessons Learned
  • Big SQL Security – Best in Class
  • Announced at Strata + Hadoop World Sept 2015
  • Performance Test Summary
  • BigSheets: Browser based analytics tool for BigData
  • BigSheets Demo
  • Big Data Technology Patterns
    • Traditional Enterprise Analytic Environment
    • Traditional Approach to Improve Analytic Architectures
    • Faster, Deeper Insights while Reducing Costs
    • Current State: Analytics Development Cycle
    • Target State: Rapid Prototyping
    • Right Tool, Right Job
  • Backups
    • Big SQL: Query-able (Rapid) Archive
    • Two Models for High Performance Analytics
    • IBM Open Platform (alone) is similar to Hortonworks
    • This is Hortonworks…Oh wait…no, it’s BigInsights!
    • IBM Open Data Platform as of 1
    • IBM Open Platform vNext (1Q 2016)
    • IBM’s Big R vs Native Open Source R
    • Sequential vs Task Parallel Execution
    • Automatically Distributes Workload Across Cluster
  • Spark SQL vs Big SQL
    • How did Big SQL Scale?
    • Client Stories – Leveraging IBM Value Adds
    • Public Customer References
    • Big R Machine Learning – Scalability and Performance