
Snowflake’s Cloud Data Platform and Modern Analytics

February 27, 2020

Data Architecture

A demo of a performant, secure, scalable, near-zero-maintenance platform

In this webinar recording, learn about performant, modern business intelligence using Snowflake, the cloud-based data warehouse platform. If you're looking to improve the performance of an aging legacy database, or want to get out of the business of performing maintenance and upgrades, this demo and discussion with Snowflake is for you.

What we demo in this webinar

  • Snowflake’s intuitive user interface
  • Creating databases and virtual warehouses (compute); see the SQL sketch after this list
  • Loading data via various methods
  • Running queries in Snowflake
  • Natively storing and querying semi-structured data
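
For a flavor of what these demo steps look like in practice, here is a minimal SQL sketch of creating a database and a virtual warehouse, staging and loading a CSV file, and running a query. The object names (demo_db, demo_wh, trips, trips_stage) and the simplified column list are illustrative assumptions, not the exact objects used in the recording.

    -- Create a database and a virtual warehouse (the compute engine); the two scale independently.
    CREATE DATABASE IF NOT EXISTS demo_db;
    CREATE WAREHOUSE IF NOT EXISTS demo_wh
      WAREHOUSE_SIZE = 'XSMALL'
      AUTO_SUSPEND   = 60        -- suspend after 60 seconds idle
      AUTO_RESUME    = TRUE;

    USE DATABASE demo_db;
    USE WAREHOUSE demo_wh;

    -- One of several loading methods: stage files, then bulk-load them with COPY INTO.
    CREATE OR REPLACE TABLE trips (
      trip_id       NUMBER,
      start_time    TIMESTAMP_NTZ,
      end_time      TIMESTAMP_NTZ,
      start_station NUMBER,
      end_station   NUMBER
    );
    CREATE OR REPLACE STAGE trips_stage FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
    -- PUT is issued from SnowSQL or another client rather than the web UI worksheet:
    -- PUT file:///tmp/trips*.csv @trips_stage;
    COPY INTO trips FROM @trips_stage;

    -- Queries are ordinary ANSI SQL against the loaded table.
    SELECT start_station, COUNT(*) AS trip_count
    FROM trips
    GROUP BY start_station
    ORDER BY trip_count DESC
    LIMIT 10;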

What you will learn about

  • Zero management: built-in performance and security
  • Instant elasticity: easily scale compute power up or down (sketched in SQL after this list)
  • A single repository: structured and semi-structured data
  • ANSI-standard SQL: keep your existing tools
  • Pay-by-the-second usage: scale storage separately from compute
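
As a rough sketch of the elasticity and pay-by-the-second points above, resizing or suspending compute is a single statement. The warehouse name demo_wh is an assumption, and exact billing minimums depend on your Snowflake agreement.

    -- Resize in place: queries already running finish on the old size,
    -- new queries use the new size immediately.
    ALTER WAREHOUSE demo_wh SET WAREHOUSE_SIZE = 'LARGE';

    -- Scale back down, or stop paying for compute entirely while idle.
    ALTER WAREHOUSE demo_wh SET WAREHOUSE_SIZE = 'XSMALL';
    ALTER WAREHOUSE demo_wh SUSPEND;

    -- Auto-suspend/resume keeps the warehouse asleep between workloads
    -- without manual steps; storage is billed separately from compute.
    ALTER WAREHOUSE demo_wh SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;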

Designed specifically for the cloud, Snowflake can serve as a single data repository for structured and semi-structured data, providing elastic performance and scalability while reducing overhead and maintenance. While the face of analytics has changed in this hyper data-driven age, the need for a central repository remains. Serving as a single source of truth, it provides efficient and secure storage of often sensitive and highly varied data, enables automated data updates, and addresses the need for information sharing.

This webinar on Snowflake delivers a flurry of great information. If you're moving off a legacy database to a more modern, performant data platform and considering a move to the cloud, watch this recording.

TECHNOLOGIES COVERED

Snowflake

PRESENTER

Chris Richardson
Sales Engineer
Snowflake

Chris Richardson joined Snowflake in June 2018. Chris has nearly two decades of experience in data and analytics, with roles spanning business intelligence, marketing, and strategy.

PRESENTATION OUTLINE 1

  • Introduction to Snowflake
    • Our story
      • Founded in 2012 by industry veterans
      • Over $950M in venture funding from leading investors
      • Over 3000 active customers
    • Common challenges
      • Siloed, diverse data increases security and governance exposure
      • Scale and speed issues limit timely and accurate business decisions
      • Traditional data architecture: Complex, costly infrastructure slows innovation
  • Value of a modern data architecture with Snowflake cloud data platform
    • One platform, one copy of data, many workloads
    • Secure and governed access to all data
    • Near-zero maintenance, as a service
  • Fully managed data platform as a service
    • Dynamic three-layer service-oriented architecture fully managed by Snowflake
      • Cloud services are a collection of independent, scalable, fault-tolerant stateless services
      • Virtual warehouses are elastic compute engines that handle the execution of customer queries
      • Storage layer is a highly optimized hybrid columnar format
  • Snowflake demo
    • Citibike schema
      • Trips: 76M records, each record represents a single rider trip on the New York City Citibike bike share program
      • Weather: 82M weather observations records in JSON format in a variant column
      • Stations: 980 records, contains data for the bike stations where trips begin and end
      • Programs: 61 records with data about the membership programs that rides are taken under
    • Load + query structured data
    • Load + query semi-structured data (JSON); see the SQL sketch after this outline
    • Data sharing
    • Cloning and time travel
  • A deep dive into Snowflake architecture
    • Traditional architecture
      • Shared disk
        • Additional capacity requires forklift upgrade
        • Reads/writes at the same time cripple the system
        • Replication requires additional hardware
      • Shared nothing
        • Resizing cluster requires redistributing data
        • Shut down requires unloading
        • Each cluster requires its own copy of data (ex: test/dev, HA)
      • Vacuuming processes needed to maintain sort and distribution
    • A new architecture for data warehousing
      • Centralized, scale-out storage that expands and contracts automatically
      • Independent compute clusters can read/write at the same time and resize instantly
      • Backed by eleven 9’s of durability SLA by underlying cloud providers
      • Storage and compute
        • Storage separated from compute: automatically grows without adding nodes, never run out of space
        • Resize compute instantly: scale up/down depending on the business needs right now or turn off when not in use
        • Multiple clusters access data without contention: ETL, reporting, data science and applications all running at the same time without performance impact
      • Global services
        • Centralized management
        • Separate metadata from storage and compute
        • Full transactional consistency across entire system (ACID)
      • A deeper look
        • Storage decoupled from compute
        • All data in one place
        • Dynamically combine storage and compute
      • Separate compute, same data
        • Elastic scaling for storage: low-cost cloud storage, fully replicated and resilient
        • Elastic scaling for compute: virtual warehouses scale up and down instantly without downtime to support workload needs
        • Dedicated performance SLAs: each warehouse can access the same tables at the same time without performance penalty (including ETL)
        • Test/dev/staging/QA: reference objects in multiple databases with one SQL statement
        • Elastic scaling for concurrency: auto-scaling maintains constant query performance
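
To make the semi-structured and "separate compute, same data" points above concrete, here is a sketch modeled on the Citibike weather example. The JSON field paths (v:city.name, v:main.temp, v:weather) and the database names are assumptions for illustration, not the exact demo script.

    -- JSON lands in a single VARIANT column; no upfront schema is required.
    CREATE OR REPLACE TABLE weather (v VARIANT, observed_at TIMESTAMP_NTZ);

    -- Fields are addressed with path notation and cast with :: at query time.
    SELECT v:city.name::string AS city,
           v:main.temp::float  AS temperature,
           observed_at
    FROM weather
    WHERE v:city.name::string = 'New York'
    ORDER BY observed_at
    LIMIT 20;

    -- Nested arrays can be exploded into rows with LATERAL FLATTEN.
    SELECT w.observed_at, f.value:description::string AS conditions
    FROM weather w,
         LATERAL FLATTEN(input => w.v:weather) f;

    -- "Separate compute, same data" for test/dev: a zero-copy clone creates new
    -- database objects that share the underlying storage until data is modified.
    CREATE DATABASE citibike_dev CLONE citibike;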

PRESENTATION OUTLINE 2

  • Can you handle the 9 am rush hour?
    • Provides consistent SLAs and performance: no matter how many users/applications are accessing the system
    • Single virtual warehouse: composed of multiple compute clusters
    • Automatically scales up and down: transparently depending on changing concurrency
    • Clusters automatically paused and resumed: to maximize concurrency while minimizing cost
  • Adaptive caching
    • Metadata: cached for fast access during query planning
    • Data: active working set transparently cached on virtual warehouse SSD
    • Query results: result sets cached for reuse without requiring compute (e.g., static dashboard queries)
  • Relational database extended to semi-structured data
  • A better way to share data
    • No data movement: share with an unlimited number of consumers
    • Live access: data consumers immediately see all updates
    • Ready to use: consumers can immediately start querying
  • Snowflake secure data sharing
    • Share data without moving or copying
    • Without complex reconstruction
    • In a secure, governed, resilient environment
    • With full database capabilities
  • Comprehensive data protection
    • Protection against infrastructure failures: all data transparently and synchronously replicated more than three ways across independent infrastructure
    • Protection against corruption and user errors: time travel feature enables instant rollback to any point during the chosen retention window
    • Long-term data protection: zero-copy clones plus optional export to cloud object storage enable user-managed data copies
  • Built-in availability
    • Scale-out of all tiers: metadata, compute, storage
    • Resiliency across redundant, independent infrastructure: backed by cloud provider SLAs, separate power supplies, built for synchronous replication
    • Fully online updates and patches: zero downtime
    • Fully managed by Snowflake
  • Time travel for data
    • Previous versions of data automatically retained: retention period selected by customer
    • Accessed via SQL extensions (sketched after this outline)
      • AT / BEFORE for selecting data as of a point in time
      • CLONE to recreate
      • UNDROP recovers from accidental deletion
  • Zero-copy data cloning
    • Instant data cloning operations: databases, schemas, tables, etc.; a metadata-only operation
    • Modified data stored as new blocks: unmodified data stored only once; no data copying required, no cost
    • Instant test/dev environments: test code on your entire production dataset; swap tables into production when ready
  • Value proposition: pay for only what you use
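
The outline above closes on time travel, cloning, sharing, and concurrency scaling; the statements below sketch each one. Object names (trips, demo_db, trips_share, bi_wh) and the consumer account identifier are placeholders, and multi-cluster warehouses require the appropriate Snowflake edition.

    -- Time travel: query a table as it looked one hour ago.
    SELECT COUNT(*) FROM trips AT(OFFSET => -60*60);

    -- Recreate a point-in-time copy, or recover from an accidental drop.
    CREATE TABLE trips_restored CLONE trips AT(OFFSET => -60*60);
    UNDROP TABLE trips;

    -- Secure data sharing: grant live, read-only access to a consumer account
    -- without moving or copying any data.
    CREATE SHARE trips_share;
    GRANT USAGE ON DATABASE demo_db TO SHARE trips_share;
    GRANT USAGE ON SCHEMA demo_db.public TO SHARE trips_share;
    GRANT SELECT ON TABLE demo_db.public.trips TO SHARE trips_share;
    ALTER SHARE trips_share ADD ACCOUNTS = xy12345;   -- placeholder account locator

    -- The "9 am rush hour": a multi-cluster warehouse adds and removes clusters
    -- automatically as concurrency rises and falls.
    CREATE WAREHOUSE IF NOT EXISTS bi_wh
      WAREHOUSE_SIZE    = 'MEDIUM'
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 4
      SCALING_POLICY    = 'STANDARD'
      AUTO_SUSPEND      = 60
      AUTO_RESUME       = TRUE;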