What makes the Snowflake analytic data warehouse so special? Find out during our on-demand webinar where you’ll get to know Snowflake and learn its unique advantages for Power BI, Tableau and Cognos.
We review and demo why Snowflake makes great sense for BI, including:
- Data loading flexibility and scalability
- Consumption-based storage and compute costs allow fine tuning of BI investment
- Support across a range of BI tools like Power BI, Tableau and Cognos
- How to allocate separate compute costs to specific departments and use cases
- How unique Snowflake features like Time Travel and data sharing can benefit your analytics team
VP Software & Architecture
Bob Looney leads software development and BI architecture efforts at Senturus, focusing on software and cloud architecture with Snowflake, Power BI and Tableau practices. Before coming to Senturus, Bob was designing, building and scaling business intelligence software, reports and dashboards for use by thousands of restaurants.
Principal Snowflake Architect
Reeves is a Microsoft Certified Master and a SnowPro Advanced: Architect. He has more than 20 years of experience working on development and data warehouse projects across a wide variety of industries.
Q: Can you go over a possible Snowflake use case for a small/medium company (<200 GB) that has several OLTP systems with SQL Server backends and a SQL Server data warehouse that is used to model data into fact tables and dimensions for use in Cognos? What would be the benefits of moving to Snowflake over accessing it from the SQL Server warehouse?
A: One benefit of moving to Snowflake is the lack of maintenance. Another benefit is the ability to scale past what SQL Server is able to do. If those two items are not an issue and performance is acceptable at present, we don’t see a reason to move.
Q: When data is huge, do we still need to cluster (instead of index) when using Snowflake?
A: We have not seen the need to cluster even with large tables (22+ billion rows per day), but we would imagine there could be a benefit at some level. It is something to consider when performance is not acceptable. In general, tables in the multi-terabyte (TB) range will experience the most benefit from clustering, particularly if DML is performed regularly/continually on these tables. Read more about clustering on the Snowflake support page.
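If you do reach that point, defining a clustering key is a single statement; the table and column names below are hypothetical:

```sql
-- Hypothetical multi-terabyte fact table; cluster on the columns that
-- most often appear in filter predicates.
ALTER TABLE sales_fact CLUSTER BY (sale_date, region_id);

-- Check how well the table is clustered on those columns.
SELECT SYSTEM$CLUSTERING_INFORMATION('sales_fact', '(sale_date, region_id)');
```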
Q: What happens to existing data connections during a clone operation in Snowflake? Are the connections terminated, the cloning suspended or are the users none the wiser?
A: Connections are still referencing the original object, and nothing would happen to that connection. Clones are new metadata objects that point to the original object.
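For illustration, a zero-copy clone is a one-line statement (object names here are hypothetical):

```sql
-- Creates a metadata-only copy; existing connections to PROD_DB are untouched.
CREATE DATABASE dev_db CLONE prod_db;

-- Works at table granularity too, optionally at a point in time via Time Travel.
CREATE TABLE orders_backup CLONE orders AT (OFFSET => -3600);  -- one hour ago
```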
Q: Is Snowflake good for streaming data?
A: Yes, Snowflake can handle streaming loads, but it uses the term micro-batches. Read more about Snowpipe data loading on the Snowflake support page.
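A minimal Snowpipe sketch, with hypothetical pipe, table and stage names:

```sql
-- Pipe that micro-batch loads files as they land in an external stage.
CREATE PIPE taxi_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO trips
  FROM @taxi_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```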
Q: Is there support for geospatial data types in Snowflake?
A: Currently Snowflake supports the geography data type. Read more on the Snowflake support page.
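For example, a quick sanity check of the GEOGRAPHY type (the coordinates are arbitrary):

```sql
-- GEOGRAPHY accepts WKT input; ST_DISTANCE returns meters.
SELECT ST_DISTANCE(
         TO_GEOGRAPHY('POINT(-122.35 37.55)'),
         TO_GEOGRAPHY('POINT(-122.40 37.60)')
       ) AS distance_in_meters;
```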
Q: Is data retrieval through JDBC slow compared to native connectors?
A: We have not seen issues with JDBC connectors vs. other connectors.
Q: What are the top reasons to pick Snowflake over Azure Synapse Analytics?
A: Here are reasons to pick Snowflake over Azure Synapse Analytics:
- It’s easier to manage with no indexes and excellent performance. Other solutions need to distribute data among nodes and add indexes to achieve performance.
- True separation of storage and compute. Snowflake starts and stops depending on demand; it does not need to be running all of the time like other cloud offerings.
- Low cost to start a project because you pay for what you use. Even an extra-small warehouse can load 35+ million rows in under 30 seconds, and an extra-small warehouse costs between $2.00 and $4.00 per hour.
- Incredible scale.
Q: Do I need to create a presentation layer for the report’s consumption in Snowflake? Or can I query directly?
A: You can query tables or views directly, just like other databases.
Q: How does a reader account work in Snowflake?
A: You can share data with customers that do not have a Snowflake account, and they will be able to see the shared data on your account. This is a detailed subject that would require a lengthy response; read the Snowflake support documentation for more information.
Q: Where would you recommend the transformation of data into a star schema, on Snowflake or on-prem?
A: It depends; it could easily happen in either place. If scale is not an issue and the transformation is already done on-prem, it might be hard to justify moving it unless there are other reasons. If you want to discuss this, please contact us for a free consultation at 888-601-6010 ext. 1 or [email protected].
Q: Does Snowflake use any ColumnStore Indexes? What makes it so responsive?
A: Micro partitions have columnar storage/compression and metadata that helps with partition elimination. Read more about micro partitions on the Snowflake support page.
Q: What security considerations do we need to consider if we put financial data into Snowflake?
A: Snowflake is extremely secure and has additional security options like:
- Customer-managed encryption keys through Tri-Secret Secure
- Support for secure, direct proxy to your other virtual networks or on-premises data centers using AWS PrivateLink or Azure Private Link
- Support for PHI data (in accordance with HIPAA and HITRUST CSF regulations)
- Support for PCI DSS
Q: If Snowflake is so fast, do we need to worry about our data scientists’ terrible SQL? Is there still a way to analyze a query for efficiency?
A: Yes, like other tools, Snowflake has a query profile that displays execution details for a query. Read more about the query profile on the Snowflake support page.
Q: Can we secure data in Snowflake so certain users see only certain subsets of data from the same warehouse?
A: Yes, you can use secure views. According to Snowflake, when deciding whether to use a secure view, you should consider the purpose of the view and weigh the trade-off between data privacy/security and query performance. Read more about secure views on the Snowflake support page.
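As a rough sketch of how that can look (table, view, and role names are hypothetical), a secure view can filter rows per user:

```sql
-- Row-level filter: each user sees only the regions mapped to them.
CREATE SECURE VIEW sales_by_region AS
SELECT s.*
FROM sales s
JOIN region_access r
  ON s.region_id = r.region_id
 AND r.user_name = CURRENT_USER();

GRANT SELECT ON VIEW sales_by_region TO ROLE analyst;
```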
Q: What is the licensing/cost structure for adding accounts that need to access data in Snowflake?
A: There is no per-user licensing cost for access. Costs are based on storage, compute and cloud services usage, such as Snowpipe. Read more about usage costs on the Snowflake support page.
Q: Currently we have Snowflake, and we are configuring Cognos to access it. Using Cognos, there is a performance degradation. How do we improve performance?
A: The first thing to try is increasing the warehouse size, or, if there are a lot of concurrent requests, scaling out the warehouse. If you’d like to discuss your issue in more detail, contact us for a free consultation at 888-601-6010 ext. 1 or [email protected]. Read more about warehouse performance on the Snowflake support page.
Q: If I need a data lake and will build APIs for data retrieval (large or small data sets), is Snowflake a good candidate?
A: Yes, Snowflake sounds like it would be a good solution, but to say for sure, we’d need more information. Contact us to discuss your situation at 888-601-6010 ext. 1 or [email protected].
Greetings and welcome to the latest installment of the Senturus Knowledge Series. Today we’re excited to be presenting to you on the topic of 10 reasons why Snowflake is great for analytics.
Before we get into the heart of the presentation, we invite you to submit questions via the control panel that you’ll see on your screen here.
Feel free to use that to make this interactive; you can expand or collapse it using the orange arrow. We’re usually able to respond to questions while the webinar is in progress.
If for some reason we’re not able to do that, we will provide a written response document that we’ll post on Senturus.com. And we always get the question, can I get today’s presentation? The answer is an unqualified absolutely:
at Senturus.com, either by selecting the Resources tab, and then drilling into the Resources Library, or you can go to the link that’s on your screen, or you can click the link that was just posted in the chat.
While you’re there, be sure to bookmark it, as it contains tons of free, valuable content addressing a wide array of business analytics topics.
Our agenda today, we will do some quick introductions. Then we’ll give an overview of Snowflake, and cover 10 reasons why Snowflake is purpose built for analytics.
Then, we’ll do a brief overview of Senturus, for those of you who may not be familiar with our organization and what we do, and then some free additional resources.
And then we’ll get into the Q&A, so please make sure you stick around at the end for that.
By way of introductions today, I’m pleased to be joined by a couple of my esteemed colleagues, Bob Looney and Reeves Smith.
Bob leads software development and BI architecture efforts here at Senturus, focusing on software and cloud architecture with Snowflake, Power BI and Tableau. Before coming to Senturus, he was designing, building and scaling BI software, reports and dashboards for use by thousands of different restaurants.
Reeves is a Microsoft Certified Master and a SnowPro Advanced: Architect.
He has more than 20 years of experience working on various development and data warehousing projects across a wide range of industries, including banking, DoD, energy, entertainment, insurance, manufacturing, and oil and gas. Pretty much every vertical out there.
So we’re pleased to be joined by those two today. My name is Mike Weinhauer.
I am a Director here at Senturus and I host many of our webinar events.
So, before we get into the content, we always like to get a sense of our audience, and we have a poll for you today, as usual. We’re just curious what brought you here today, why this webinar piqued your interest. Just select one of those if you don’t mind: I want to keep up to date on new technologies, my organization is evaluating data platforms, my organization is evaluating Snowflake specifically, or my organization has recently purchased Snowflake. If it’s not on the list there, then select other.
We’ll give everybody a few seconds here to get their answers in, and then we’ll share them back to you as we usually do.
All right, it looks like we got just about everybody there. It looks like most of you are here to keep up to date on new technologies, but almost 20% of you actually own Snowflake. So that’s interesting. Great.
Thank you for sharing your insights with us, and with that, I’m going to hand the floor over to Bob and Reeves. The floor is yours.
Thanks, Mike. Appreciate the intro and, and welcome to the presentation, everyone.
We’re going to kick off with a quick Snowflake overview.
So just talk a little bit about what Snowflake is.
Snowflake is software as a service, so don’t go into Snowflake thinking it’s a platform where you’re building VMs and things like that. It is a software-as-a-service type product. The benefit of that is low administration; a lot of the heavy lifting behind the scenes gets done for you.
And you can quickly start trials. Like, you can go to Snowflake’s website today, start a free 30-day trial, start banging away and see how Snowflake works for your use cases.
Snowflake also goes with consumption-based pricing, so it’s not a straight monthly fee; it’s all about how much data you end up storing on the platform and what type of compute resources you use. We’re going to dive pretty deep into that as we go along.
Snowflake, the service, sits on top of one of the three major cloud providers: AWS, Azure or Google. The Snowflake software platform looks the same regardless of which service it sits on top of, but there are some reasons you might want to be on a specific platform. Reeves, do you want to touch on that for a second? Sure. The nice part about Snowflake, because it runs on all three major cloud vendors, is that if your organization has already made the choice to go with one of those cloud vendors, you do not need to move to a specific one.
You can use Snowflake from within your cloud vendor. You do have to check, though: there are a number of regions available, but Snowflake is not available in every region.
But you are able to use Snowflake within the cloud platform that you’re already accustomed to. So if you’re saving all your data off onto AWS or into Azure cloud, you don’t have to move that data across.
You can use Snowflake within that environment and read from the data that you have already curated and have available to you.
So it’s nice that they’re not forcing you onto a platform. And, probably not today’s use case, but down the road, if your company decided to move to a different cloud vendor, you could do an easy migration by starting up a Snowflake instance in the new cloud provider and replicating all the data across.
It keeps you from having to pick a specific one; you can use the cloud vendor that you like.
It really can save on some of those data transfer fees, too. If you were running in AWS and your data was over on Azure, you might eat up a bunch of money. So, something to keep in mind: deploy Snowflake where your storage is, if possible.
Lastly, and this is what the bulk of today’s webinar is going to cover: it’s an analytics-focused platform. It’s really not a transactional database at all.
Think of it in those terms, think of it more from a data warehousing type standpoint, and we’ll talk about why here shortly.
So, at its core, Snowflake looks like a database. A lot of the database terminology is very consistent: you have databases (and you can make multiple databases), databases have schemas, tables have columns, and you can create views, users, roles, things like that. And it is SQL compliant.
Snowflake started with some folks out of Oracle who decided they wanted to do a platform a little bit differently. So it is a SQL database; like the larger SQL databases, it is SQL compliant and adheres to the same type of SQL functionality that you’re used to. And the platform just scales to keep up with very, very large data; it can scale to an extremely large amount of data.
So let’s touch on a couple of unique concepts, and we’re just going to hit a couple highlights here, because there’s a lot, but a couple of big things.
The one thing that really stood out to me, and that I’ve heard a lot, is this idea of Snowflake, the company or product name, and how closely that aligns with the snowflake data model. The explanation from Snowflake’s marketing materials is that its founders just really love skiing, and snowflakes are made in the clouds. So don’t take Snowflake, the product, to mean that you can only make snowflake data models in Snowflake databases. That’s the key takeaway there.
It’s a cloud data platform, so calling it just a cloud database is too simplistic.
It does have this idea of compute and storage and things like that that we’ll continue to see in the webinar.
Reeves, do you want to talk to stored procedures, and the unique approach Snowflake has taken here?
Sure. I really like Snowflake, but there are some things that are just things to think about, and that’s why we outline them here as unique concepts. As you can see, the stored procedure on the left is one that you would run in SQL Server, and it uses regular SQL; Snowflake stored procedures, on the other hand, are written in JavaScript. Those are the two little items where, if you are a SQL person, you might go, oh. I just want to call that out early and make you aware of it, so that you’re not trying to go down that path without that understanding. It is slated on the roadmap to be able to write SQL stored procedures, and I would imagine at some point they will get to that, but that’s just something to call out. That’s a little different from your other SQL systems, definitely unique.
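What that looks like in practice can be sketched roughly as follows; the procedure name and logic are hypothetical, and the JavaScript body reflects how Snowflake stored procedures were written at the time of this webinar (SQL stored procedures were still on the roadmap):

```sql
CREATE OR REPLACE PROCEDURE row_count(table_name STRING)
  RETURNS FLOAT
  LANGUAGE JAVASCRIPT
AS
$$
  // The body is JavaScript; SQL runs through snowflake.execute().
  // Arguments are exposed to JavaScript as uppercase variables.
  var rs = snowflake.execute({sqlText: "SELECT COUNT(*) FROM " + TABLE_NAME});
  rs.next();
  return rs.getColumnValue(1);
$$;

CALL row_count('TRIPS');
```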
The last set of unique concepts we’re going to touch on here is this idea of truly decoupled compute and storage. The idea is that you have these warehouses, which are your compute resources that act on storage, and they can act on it concurrently. And that marketing is backed up by the technology; the marketing phrase is a virtually unlimited number of concurrent workloads against the same single copy of your data.
Do you want to touch, without getting too deep into the weeds, on how they’re doing that, Reeves? Because that’s a fascinating concept.
Sure. Their concept of compute is what they call a warehouse, an overloaded term, OK, but when they say warehouse, they basically mean compute. The nice part about the way Snowflake has been set up is that the storage layer uses cloud storage, so they’re able to have warehouses interact with that cloud storage and not interact with each other. You can truly separate storage and compute, where you can have different loads, like in this diagram: the marketing people can run with a different size of compute compared to, maybe, the finance app, and the data transformation app is able to run at a lower level of compute. That’s what that means.
They use t-shirt sizing: large, medium, small. It’s the number of nodes in that cluster of compute, and each size up doubles, so an extra-small is one node, a small is two, and so on. You’re able to run separate workloads against that same data at the same time and not affect anyone else.
Which is really nice from a separation-of-storage-and-compute standpoint. And the last piece on that: the nice part about warehouses is that they’re the thing you get charged for. You get charged for compute, and you get charged for storage. If anyone has ever tried to read a cloud bill, they’re very difficult to decipher; with Snowflake it’s pretty easy, because there’s storage and then there’s compute, so you can easily figure out how much the finance group used. If you’re billing back to groups, it’s very easy to follow through with this concept of warehouses. It’s a nice little feature of how the service is built.
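That chargeback idea can be sketched with a query against Snowflake’s account usage views (assuming, as in the diagram, one warehouse per department):

```sql
-- Credits consumed per warehouse over the last month; multiply by your
-- contracted credit price to get dollars per department.
SELECT warehouse_name,
       SUM(credits_used) AS credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(month, -1, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits DESC;
```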
Well, we hope that quick overview gives you some context as we dive into these 10 reasons that Snowflake is really built for analytics and analytics use cases. So let’s get going on that list of 10 reasons. We’ll preview them here, so you can read through those real quick; we’re going to hit them one by one and recap at the end.
So let’s start talking about large data volumes. Again, you’re backed by the cloud, so you can put just about as much data up there as you want, because cloud providers aren’t going to run out of storage space anytime soon; they’ll just go buy pallets of more drives if they need them.
They have this idea of flexible staging, where you can use stages from Amazon S3 directly, or Azure Blob, or Google Cloud Storage, any of those three, and hook those into your Snowflake account. You can also use what’s called an internal stage, which is part of the Snowflake service itself.
So if you don’t have those things set up; Reeves, you mentioned maybe demo use cases? Sure. Another use case would be if I’m transitioning to the cloud but haven’t started that journey yet. If you haven’t started that journey, maybe you don’t have things like blob storage set up, but you’d still like to try Snowflake out. If I use internal stages, what you’re basically saying is: Snowflake, you manage my blob storage for me. It allows you to try Snowflake. If you were in an organization that didn’t have any cloud presence, you could try Snowflake and still use storage, but now you’re letting Snowflake manage that storage and all of that additional overhead. Absolutely.
That last point, about fast data loading of large datasets, we’re actually going to do a little example here. So Reeves, let me swap this over to you. OK, will do. Sorry, can everybody see my screen? Oh, I have to share, I’m sorry. Let me know that you can see my screen.
So this is the Snowflake UI; this is what you can use to interact with Snowflake, and it means you don’t have to have tools if you don’t want them. That’s not to say there aren’t plenty of tools that will let you interact with Snowflake; you can use tools like DataGrip, or other tools that you’re used to using. But Snowflake gives you this UI, and in it I can look at the databases that are out here, and I can look at the shares.
Shares are a concept where I’m able to share different datasets. It’s a very unique feature, and we’ll talk later in the meeting about how you can share data and not have to export stuff through Excel. You’re allowed to share it and let people pay for the compute, or you can pay for the compute yourself; it goes both ways. And the data marketplace is a place where you can pull that stuff in.
Again, warehouses are the compute concept, and I can make as many warehouses as I’d like and set them up at whatever size I want. For this demo, we’re just going to use a demo warehouse, which is an extra-small, in this instance of Snowflake, which is Standard Edition, if I’m not mistaken. I think that’s a $2-an-hour spend. So if I were to run this warehouse for an hour, it would cost $2.
You do see that, unlike some of the other databases out there, the Redshifts and the Synapse Analytics, it has an auto-suspend feature that those do not have. The nice part about that, as you’ll see when I run the script, is that I don’t currently have a warehouse running. When I start doing stuff that needs a warehouse, it will kick the warehouse up for me and start it. Then it auto-suspends, and you can set the auto-suspend time to whatever you want.
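A sketch of creating a warehouse with that behavior (the name and timings are arbitrary):

```sql
-- Extra-small warehouse that pauses itself after 60 idle seconds and
-- wakes automatically when a query needs it.
CREATE WAREHOUSE demo_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60        -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;
```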
You also have a Worksheets tab, and this is where we will be doing most of the demo. And then there are cool little features: you can see when things are run and what kind of functionality was used. If you’re used to SQL Server, this is kind of the same idea as Profiler. This is restricted by account role; as you can see, right now I’m an account admin, so I can see everything here. You can see that a warehouse had to kick up for this command and this command, but there is no warehouse associated with these others. Then there’s the Account tab, if you’re an account admin, which is for roles and stuff like that. So that’s a quick view of the Snowflake UI. But in this demo, we really wanted to show, because everyone talks about big data, big data, big data, right?
So I want to create a table, and again, it’s standard SQL. I create the table, and as you can see, the demo warehouse is not on; these DDL-type actions don’t require a warehouse. So I can create tables and do stuff like that without having a warehouse, but the second I need to do something that requires a warehouse, it will start one up. You’ll see, you should see that come up. And maybe that’s a UI thing, or, because I did this just a second ago, it had the result already cached and probably used that from the cache; that’s why the demo warehouse didn’t start. And that’s another topic: Snowflake caches as much as it can, so that you don’t have to rerun queries and pay the compute cost; you can just get results from the cache.
So, in this example, I’m going to pull in some taxi data, right? We have it out in a stage, an Azure Blob, and we’ve created this concept of what we call a stage that points to this location. What I’m doing here is saying I want everything from this folder, using this pattern: give me all the data. Right now I will be using all the files in that location, and as you can see, there are 15 files there. They all fit this trips data 2019 pattern, numbered for the different days. Or I think that’s actually months, excuse me.
So if I wanted to see only the first data cycle, I could limit it. I’m using this LIST command just to show you, because I’m going to do the exact same thing in a COPY command. And so I can come through and do that exact same thing and copy all this data into the table, because you saw from this query that there’s nothing in there yet, right? So if I run this command, it starts loading all the data from that location. But, ah, what didn’t I do? I didn’t limit it by a month, right? So I’m actually pulling all 15 of these files, and they’re all compressed CSVs of about 60 megs each, so there’s actually a ton of data. In normal demos you wouldn’t try this kind of thing, but I didn’t just want to prove that this works; I wanted to prove that this is a big data solution. And as you can see, I just loaded all 15 of those files.
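The load the demo walks through looks roughly like this; the stage, pattern and table names are hypothetical stand-ins for the ones on screen:

```sql
-- Inspect the files in the external stage before loading.
LIST @taxi_stage PATTERN = '.*trips_data_2019.*[.]csv[.]gz';

-- Bulk-load every matching compressed CSV into the target table.
COPY INTO trips
FROM @taxi_stage
PATTERN = '.*trips_data_2019.*[.]csv[.]gz'
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP' SKIP_HEADER = 1);
```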
Then the next question you’d ask is, how many rows is that? Because it might not be that impressive. Without commas it’s hard to see, but if I put the commas in, you can see that’s 37 million rows. Now you’d say, OK, I can load 37 million rows, but that doesn’t solve any problems if I can’t query it quickly. I just loaded that data, I now see 37 million rows, and I just ran a query over those 37 million rows in less than a second, aggregating across all the locations. And you’re using the extra-small warehouse, right? Wow. Exactly.
And if I were to actually do the calculation for this load query, I can come back and look at the history and see how long it actually ran. So if I wanted to do the math, because cloud’s expensive, right, that load would have cost about 1.5 cents. And if you do it for real: Snowflake bills you by the second after the first minute. So if I were to start it and stop it under a minute, I still get billed for the full minute, but after a minute I’m billed by the second. So the true cost was 3.3 cents to run that, if I turned around and suspended my warehouse. So, the database has impressive performance. If that’s not something that makes you go, wow, that’s pretty amazing, then I don’t know what databases you’ve been using.
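The billing math alluded to here can be written out as a quick calculation; the 27-second run time is a hypothetical stand-in, and the $2.00/hour extra-small rate is the one quoted in the demo:

```sql
-- Per-second billing with a 60-second minimum: a 27-second load on a
-- $2.00/hour warehouse is billed as 60 seconds, about 3.3 cents.
SELECT GREATEST(27, 60) * (2.00 / 3600) AS billed_cost_usd,
       27 * (2.00 / 3600)               AS raw_cost_usd;
```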
All right, I’m going to take back over here, and we’ll keep going. So that was point number one, large data volumes, and I think we proved the fast data loading, for sure.
Reason number two: data loading flexibility. You know, we just saw some file-based examples of how pulling data into Snowflake can happen. There are a few other options.
There’s something called Snowpipe, which is this API, event-driven loading, where you can trigger the load off of events. It’s still somewhat file based, but the idea is that you’re not setting up a “hey, every day at 10:00 AM run this,” which can be a common ETL pattern. It’s more: as data gets generated and written to S3, for example, events get triggered, the data gets pulled right into Snowflake, and you’re up to the minute, as they say. Alternatively, they have wide tool support; they have a lot of data integration partners.
If you’re using, say, Oracle as your data warehouse currently, and you’re looking to go to Snowflake, and maybe you’re using Alteryx as your approach to loading data into Oracle, well, you could just re-point that (it’s ODBC/JDBC-type drivers), start loading Snowflake, and off you go.
And so this slide tries to summarize a few of the different ways that you can pull data into Snowflake: from web applications through APIs, from external stages like we just saw, by using a command-line tool to copy data to Snowflake’s own internal stage and then pulling it into the database, or through third-party tools. Lots of different ways to get data into the system, which is always beneficial.
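The command-line path runs through the SnowSQL client; a rough sketch with hypothetical file, stage and table names:

```sql
-- From the SnowSQL CLI: upload a local file to the table's internal stage,
-- then load it. PUT is a client-side command, not standard SQL.
PUT file:///tmp/trips_2019_01.csv.gz @%trips;

COPY INTO trips
FROM @%trips
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP' SKIP_HEADER = 1);
```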
Next, we’re going to talk about broad BI tool support. You’ll see the three tools there with the Senturus logo; we’re partners with those tools, and we’re going to do a quick demo of how the connection works. On the right side, you can see that from within Snowflake you can download the ODBC drivers those tools use, or JDBC drivers in the case of Cognos. There are other types of drivers, connectors and command-line tools. Lots of flexibility here in how you talk to and connect to Snowflake.
So let’s real quick run through that connection process; Tableau is where we’re going to start. You’re going to connect to a server in Tableau; Snowflake is in the connector list. You pick it, and you’re going to give it the server URL. This comes from your Snowflake URL: if we go back to our Snowflake environment, you’ll see this root URL, and that’s what we’re going to paste in as the server. You tell it which role you want to connect as, give it your username and password, and log in. And since we need compute resources, we have to select a warehouse to use; this compute warehouse was already running, so select that guy. Then we’re going to select whichever database; this test database is extremely simple, one table, one schema. You pick your schema and table and you’re connected. You can pull data and do your normal Tableau data visualizations, just like you would.
Let’s look at that same process real quick in Power BI. In Power BI, it’s Get Data, Connect, and use the Snowflake connector. We again need that server name, so let’s grab that again. This is where Power BI isn’t quite as nice; we don’t get the nice drop-down like we got in Tableau, so we actually have to know that our warehouse is called compute warehouse. There are a couple of other optional fields. We don’t really want to import all the data; in this example it wouldn’t matter, but typically for data sources like this, you’d want to use DirectQuery.
It’s going to prompt us for our Snowflake credentials; if you’ve done that before, they get cached locally on your machine. At that point, you’re doing the typical navigator experience of choosing which database, schema and tables you want to use, and from there you would do your typical Power BI visualizations, which we’re not going to get into today.
We’re not actually going to walk through the Cognos example, but just to show: in Cognos, you would configure it just like any other data source, again leveraging that JDBC driver that you install and referencing a warehouse. Same concepts, just a slightly different approach from the tool’s standpoint.
All right. Let’s go into the next reason: support for analysis-ready data models. Modern BI tools like the Power BI, Tableau and Cognos we just saw still work really well with star and snowflake schema data models, right? You can use the other data models they support, but your business analysts really get, and use, models most effectively when you can present them an analysis-ready view of the data. So Reeves, do you want to talk to what you typically see folks do here once they’ve pulled in that data, and how this plays out?
Yeah.
I think the big thing here is that Snowflake, unlike some of the big data tools, doesn’t force you away from any of the modeling you can already do
within a normal database —
within your traditional databases.
So you can model it however you want. If you wanted to do a data vault, you can; you can use any kind of normal modeling that you’ve done for data warehouse loads.
Now, you do not want to use Snowflake for transactional loads. That’s not what the system was designed for.
But I could still build the model that way,
as long as it’s not loaded on a row-by-row basis. As you do more with Snowflake, you’ll understand that’s the one use case it’s not ideal for.
It’s ideal for large amounts of data.
And one thing I didn’t note in the demo: did you see me create an index?
I didn’t have to do any of that, so you’re able to model nicely, like what we’re talking about, and not have to do a lot of the work we have to do in other systems.
And so, we sort of talked about the idea that sometimes you can actually make another database and use it as your data mart. There are different ways to create these analysis-ready views of your data.
And then how you want to implement security becomes a big consideration.
So here’s a quick mocked-up example we put together: maybe your ingested tables don’t come in as star schemas, so you build some BI views on top of them.
Then you have a user that’s restricted to that schema, so that regardless of who’s using it and which warehouse they’re going through, that user only sees the view level instead of the underlying tables.
But your data science team might come back and say, hey, we don’t want the analysis-ready views, right? Just let us have the raw data.
So you can give them a different user with different permissions, or even a different database, if that’s how you want to play it out.
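To make that concrete, here is a minimal sketch in Snowflake SQL — all object, schema and role names are hypothetical, invented for illustration:

```sql
-- Analysis-ready view in a dedicated BI schema, built on raw ingested tables
CREATE VIEW prod.bi.sales_star AS
  SELECT f.order_id, f.amount, d.customer_name, d.region
  FROM prod.raw.orders f
  JOIN prod.raw.customers d ON d.customer_id = f.customer_id;

-- A reporting role that can see only the BI views, not the raw tables
CREATE ROLE reporting_role;
GRANT USAGE ON DATABASE prod TO ROLE reporting_role;
GRANT USAGE ON SCHEMA prod.bi TO ROLE reporting_role;
GRANT SELECT ON ALL VIEWS IN SCHEMA prod.bi TO ROLE reporting_role;
```

A data science role would instead get grants on the raw schema (or its own database), which is the split described above.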
Anything to add there, Reeves?
Sorry, I was on mute. No, nothing to add. All right, we’ll keep rolling here.
OK, so you touched on this a second ago: this idea of minimized administration. This is from some of Snowflake’s partner marketing materials.
The idea is that in a traditional database you’re doing all these admin tasks, on top of potentially hardware maintenance and things like that.
Backups, query planning hints, partitioning — all this stuff you’ve got to worry about. Their approach is really just: forget about all that, we’re handling it under the covers.
They will dive into some of how they’re handling it, but at the end of the day, the impression I get, at least, is: don’t even worry about what we’re doing under the covers.
Is that fair to say?
Oh, for sure.
Yeah, it’s nice because they’ve taken that piece away, so you don’t have to do the indexes. In other large systems like Synapse and Redshift, you have to distribute the data because it’s an MPP system that doesn’t share storage.
A lot of that complexity is taken away.
Now, is that always to your benefit, though? Because we’ve talked about query hints — say I’m writing a query, I want to add an index, and I want to use it.
The only thing they’ll suggest, and the only thing you can actually do, is sort the data into partitions.
By sorting the data into partitions, you can get partition elimination, which is common in other databases.
That’s the best thing you can do.
And it’s something we didn’t see much of a benefit from.
We were working with tables over 22 billion rows that we were loading every day.
We didn’t see much of a benefit there, and even when we were joining those 22-billion-row tables together, we didn’t see that much of a benefit.
I’ve run things where you’re almost into trillions of rows, and I didn’t see that much of a benefit in specific queries.
There might be edge cases where it helps, but what I’m getting at is that it’s for your very large datasets.
And when I say large — I doubt that many people have trillions of rows. If you’re talking IoT, yes; but for normal workloads, you’re probably never going to see that.
You don’t really have to get to that level.
So I would say you don’t really have to do that.
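If performance on a very large table ever does warrant it, defining a clustering key is a one-line operation. A sketch — the table and column names below are hypothetical:

```sql
-- Define a clustering key on a very large table (multi-TB range)
ALTER TABLE events CLUSTER BY (event_date, customer_id);

-- Check how well the table is clustered on those columns
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date, customer_id)');
```

Snowflake then reclusters the table in the background; there is no index to create or maintain.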
Let’s talk about performance and scalability. We started to see glimpses of this as we looked at the Snowflake UI: this idea that each of these different warehouses can have different sizes, and they can start and stop when they need to.
There are even scale-up, scale-out and scale-back-down type features.
So let’s swap back to the Snowflake UI real quick.
Just highlight how simple a lot of this is.
So whenever you have a suspended warehouse, you can manually restart it — though the nice thing is warehouses will automatically restart if some query comes in and tries to use them.
But if I want to resume manually, I actually get to pick what size I’m going to resume this already-created warehouse at.
And then they also tell you the credits, which again comes back to the cost side of things: how many credits,
how much you’re spending per hour,
when you use it.
So it’s really easy to scale up or scale down an existing warehouse.
And then if you have some of the more advanced editions, you get this idea of clusters.
So think about your typical analytics use case: everyone comes in on a Monday morning and runs all the reports, right? You don’t want to be running an extra-large warehouse all week long.
For that use case, you might run a small warehouse all week and then let it scale out as it needs to. You configure how many clusters you want to go up to, how quickly it comes back down, whether it suspends or not — all those settings.
So very flexible.
And from a UI standpoint, it’s very streamlined and easy to understand and implement for the administrator.
Easy to manage.
A lot of self-management, which is pretty nice.
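The UI actions described above map to simple SQL as well. A sketch, using the demo’s compute warehouse name (the multi-cluster settings require the more advanced editions mentioned):

```sql
-- Resume a suspended warehouse, and change its size at any time
ALTER WAREHOUSE compute_wh RESUME;
ALTER WAREHOUSE compute_wh SET WAREHOUSE_SIZE = 'SMALL';

-- Multi-cluster scale-out for the Monday-morning rush:
-- stay at one cluster normally, grow to four under load,
-- and suspend after 5 idle minutes
ALTER WAREHOUSE compute_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;
```

Compute is billed only while the warehouse is running, which is why the auto-suspend and auto-resume settings matter for cost.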
So real quick, we’re going to talk about semi-structured data.
This is an approach you’d see if you’re doing a lot of big data manipulation, or — I think you mentioned the other day — if you have data coming in with schemas that change a lot or aren’t always consistent.
There is wide support in Snowflake for XML, JSON and other semi-structured data.
There’s this VARIANT column concept, where you can pull an entire block of XML or JSON into a single column.
And then there’s a file format concept: when you’re pulling data in from a file, you have to tell Snowflake a little bit about the structure of that file. So you give it a type — CSV, JSON, XML, something else.
What do you want to do with null values? Is there an outer array — meaning sometimes, if you were hitting an API, you’ll get back a top-level results array wrapped around the JSON data — and maybe you don’t want to pull that in because it’s repetitive and not useful. Things like that.
Then, when you actually load the data — through the UI, or through procedures or scripts —
you use that file format and reference the file to pull it in.
At that point, you can build views or other queries to parse apart that XML or JSON semi-structured data.
You can even type it:
you can see here it’s being cast to VARCHAR.
You can alias the fields — anything else you’d want to do with that type of data.
Very flexible for these data scenarios.
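A hedged sketch of that flow in SQL — the stage, format, table and field names are all made up for illustration:

```sql
-- File format describing the incoming JSON
CREATE FILE FORMAT my_json_format
  TYPE = 'JSON'
  STRIP_NULL_VALUES = TRUE
  STRIP_OUTER_ARRAY = TRUE;   -- drop the top-level results array, as discussed

-- Landing table: the whole document goes into one VARIANT column,
-- plus a timestamp so you can see when rows arrived
CREATE TABLE raw_events (
  payload    VARIANT,
  loaded_at  TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

-- Load from a stage using that file format
COPY INTO raw_events (payload)
  FROM (SELECT $1 FROM @my_stage/events.json)
  FILE_FORMAT = (FORMAT_NAME = 'my_json_format');

-- Parse, cast and alias the fields in a view
CREATE VIEW events AS
SELECT payload:id::NUMBER        AS event_id,
       payload:type::VARCHAR     AS event_type,
       payload:ts::TIMESTAMP_NTZ AS event_ts
FROM raw_events;
```

This is the staging pattern discussed below: land everything as VARIANT first, then query the pieces out as needed.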
The nice thing to add there is that you could use Snowflake as — as you called it — a data lake, for data that you possibly want to use down the road but aren’t 100% sure about.
You’d like it there so you can query it without having to worry about using other services like Athena to query S3, and stuff like that.
So it enables you to pull that data in and then query those pieces off those tables.
You can think of it as staging.
You can stage all of those, yeah. Often the only table in Snowflake would just be a table with a single VARIANT column, correct?
Yeah, usually a VARIANT and a timestamp of some sort, so you can see when stuff comes in.
Great support for semi-structured data, in a variety of formats — that’s the summary bullet point. These last few points are where we get into some of the fun stuff you can do with Snowflake, in my opinion.
This cloning feature is really, really nifty.
So on the right side, you see this little diagram of what they call zero-copy technology.
The idea is that if you copy a database, you’re not actually copying any data. It’s just kind of a metadata layer that sits on top of it.
And then once you start modifying the data in that new database, you’re just storing off that slice of modified data.
So really powerful — the DevOps implications
here are very powerful.
You could copy prod back to dev, no matter how big your prod is.
And since you haven’t actually replicated all that data, you didn’t cost yourself any more money.
A very small slice of metadata gets generated and copied at that point in time, and the same thing goes for promoting tables. Sorry, go ahead, Reeves.
I was just going to say the nice idea is this:
most of the time we have problems because,
as developers, we test against a small subset of data.
The nice part with Snowflake is you no longer need to do that. I can clone the database.
It’s a metadata operation: basically, I now have two pointers to the same data. And because it’s production, everyone says, oh, you can’t modify production.
But if I go to modify the clone, the modified data lands in different partitions, separate from the ones the production data is pointing to, so they don’t interfere — and you’re only adding additional storage as you start to modify things. It takes a deep dive to understand what’s going on underneath.
But the simplified version is: when I do a clone, there’s no storage cost. If I modify that clone, I modify it separately from what I cloned,
and the original is left alone. The nice part is I can now run my test loads against production-scale data, which is incredible.
And I can then delete that clone, and nothing has affected what was in production.
Yeah, we’re going to run through a really quick demo of what that looks like. So we have this prod database over here.
It doesn’t have a ton of data, just a few tables, but we’re going to create a new database called dev by cloning prod. And you’ll see that this executes against my small warehouse very quickly.
If I refresh over here, you’ll see that I now have a new dev database.
I’m going to swap the context of this worksheet to dev, which you can do using this drop-down over here.
Now that I’m in dev, let’s go ahead and create this new table — my new table — that I’m adding to the database to do some development, and put some test data in it.
I go off and do whatever development I’m going to do. Let’s make sure we have our test data in there — good, one row of data.
I do everything, everyone signs off, and I’m ready to promote to prod, so let’s switch back to prod.
So we’re going to use prod, and then use that same CLONE keyword to create my new table in prod from the dev table.
It’s now pulled that table into prod. In prod, let’s go ahead and select star from that new table.
There’s my test data. I actually didn’t mean to bring that into prod, right?
The interesting thing here is that the CLONE command brought both the schema and the test data all the way into prod.
So to fix that, let’s drop that table, and then recreate it in prod using the LIKE keyword. Unlike a clone, LIKE only pulls over the schema.
And lastly, we’ll select star from that table in prod and see that we’re back to not having any test data in prod.
So it’s a really powerful ability: you can clone your production back into a dev database to do testing, or maybe investigate an error, and if you’re doing new development or fixing things, you can promote tables into prod a couple of different ways.
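The demo steps above roughly correspond to SQL like this (the database and table names follow the demo’s narration; exact names in the recording may differ):

```sql
-- Zero-copy clone of prod into a new dev database (metadata only)
CREATE DATABASE dev CLONE prod;

-- Develop in dev: new table plus a row of test data
USE DATABASE dev;
CREATE TABLE public.my_new_table (id NUMBER, note VARCHAR);
INSERT INTO public.my_new_table VALUES (1, 'test data');

-- Promote to prod with CLONE: copies schema AND data...
USE DATABASE prod;
CREATE TABLE public.my_new_table CLONE dev.public.my_new_table;

-- ...whereas LIKE copies only the schema, leaving the table empty
DROP TABLE public.my_new_table;
CREATE TABLE public.my_new_table LIKE dev.public.my_new_table;
```

The CLONE-versus-LIKE distinction is exactly the gotcha hit in the demo: CLONE brings the test data along, LIKE does not.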
All right, time travel. I’ll let you speak to this one, Reeves. — Sure.
So, with the way that Snowflake stores data — one thing to understand is that all of Snowflake’s storage is non-updatable. If I need to update something, Snowflake creates a different partition with that data in its updated form.
Because of that, I can go back to what I previously had, prior to that update or change.
It has storage implications if you do that a lot.
And hence why a transactional system wouldn’t be great — if you’re updating row by row, it’s not a good idea — but in a data warehouse fashion, you’re able to look at things as they previously were.
The great example is: I drop a table, and I can undrop that
table in Snowflake — and who wouldn’t love a database that can do that? We’ve never made that mistake, but I know a friend who might have. So you have things like that, and that’s what time travel is: the ability to go back and look at things at specific points in time. To keep this easy — it’s just an introduction, and we’re not trying to get too deep into the weeds — the concept is, I can make tables that keep history for a specific period of time, from, let’s say, one day to 90 days, depending on how I set up the table. So if updates happen every day, I can go back and look at what the table looked like 10 days ago, or five days ago. It makes it very nice from that standpoint.
Now, there are storage costs, because you’re paying for all the history you keep, but the benefits are nice. That’s why, for some tables, you can create transient tables that don’t keep much history,
and for others you create regular tables. It allows you to not have to worry about doing backups
and stuff like that.
Sorry — somebody asked a question about cloning, so just to go back to that: when I clone an object, I create a new object. Any connections to the existing object remain connected to that original object, so creating the new object has no effect on them.
So let’s run through a really short, brief time travel demo, just to show what it does.
We have a simple table with three rows of data sitting in a database, and we’re going to insert a fourth row,
just like you normally would. Select star again and you’ll see we have four rows of data. Fantastic.
But we can now ask the question: what was in this table two minutes ago?
So let’s select star using this AT with an offset. You’ll see we’re back to only having our three rows of data.
So there’s this OFFSET keyword. What are the other options we can use there, Reeves?
You can use a query ID, and you can use a specific time.
So I can say what happened prior to this query ID, and I can also say what the table looked like at, say, midnight.
Gotcha. So those are the ways you can go back.
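The time travel variants just mentioned look roughly like this in SQL — my_table and the query ID are placeholders:

```sql
-- What did the table look like two minutes ago?
SELECT * FROM my_table AT (OFFSET => -120);

-- ...at a specific point in time?
SELECT * FROM my_table AT (TIMESTAMP => '2020-11-12 00:00:00'::TIMESTAMP_NTZ);

-- ...just before a particular statement ran?
SELECT * FROM my_table BEFORE (STATEMENT => '<query-id>');

-- Recover a dropped table, and set per-table retention
-- (1 day by default; up to 90 days on the higher editions)
UNDROP TABLE my_table;
ALTER TABLE my_table SET DATA_RETENTION_TIME_IN_DAYS = 30;
```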
So, great, nifty, very helpful. And I think that last point — you don’t have to worry as much — is very powerful for all of us. A little less stress is good.
Then, lastly, point number 10: data sharing.
There are two kinds of use cases here. One is this idea that you can pull curated datasets into your environment and then maybe add them to your enterprise data warehouse, or just do some ad hoc analysis — whatever you want to do with them.
The second is that you can share your own data with partners, or have partners share their data with you, all over a safe, secure, streamlined connection.
So let’s take a look at what that looks like.
If I click on this Data Marketplace tab here, it takes me to the data exchange hub in Snowflake.
We’re going to do a quick search on COVID, since that’s popular these days, and you’ll see lots of folks have published lots of different COVID datasets out here. The State of California’s is interesting.
So we’re going to use that for this example — it tells you a little bit about the dataset.
We click Get Data.
We get to pick the database name, and this is what the share will show up as.
The ACCOUNTADMIN role is going to get access by default; let’s just make it public in our environment.
That should be sufficient.
We’re going to accept the terms. This is really key here — do you want to talk to this real quick, Reeves? Once we finish,
this data is not going to consume any storage space, right?
Right. If I’m using a shared database, I’m basically getting pointers to their database, and that’s why I’m not incurring the storage costs.
And in sharing, I can set it up in one of two ways. If my consumers aren’t Snowflake customers, they can use my compute and I pay for that compute.
In this instance, these folks are sharing the data, so they’re paying for the storage, and we’re using that data and would pay for any compute on it.
Yeah, makes sense.
We’re just going to do a quick
query against that database, which is now visible in our environment. So you can see how you can pull that in, and maybe populate another database with it. There are all sorts of use cases for how you might want to use this — either to expose it to your business analysts and other business users, or just to do ad hoc comparisons right within Snowflake itself.
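Under the hood, sharing is set up with SQL along these lines — a sketch with hypothetical account, share and table names on both sides:

```sql
-- Provider side: publish a secure share of selected objects
CREATE SHARE covid_share;
GRANT USAGE ON DATABASE prod TO SHARE covid_share;
GRANT USAGE ON SCHEMA prod.public TO SHARE covid_share;
GRANT SELECT ON TABLE prod.public.case_counts TO SHARE covid_share;
ALTER SHARE covid_share ADD ACCOUNTS = partner_account;  -- consumer's account

-- Consumer side: mount the share as a read-only database (no storage cost)
CREATE DATABASE covid_data FROM SHARE provider_account.covid_share;
GRANT IMPORTED PRIVILEGES ON DATABASE covid_data TO ROLE public;
```

The Get Data button in the Marketplace UI is essentially performing the consumer-side steps for you.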
And that concludes the 10 reasons that Snowflake is built for analytics.
Here’s the recap of those, and again, you can definitely download the slide deck. We’re going to do a Q&A here in a second.
Once we get through a couple of quick slides — the first being that if your organization is using Snowflake, or you’re thinking about using Snowflake, Senturus is here to help you.
We can help with data warehouse migration projects, enhancing performance, and overall BI tool and data architecture — things like that.
We’re offering a free hour consultation on your Snowflake environment. Just reach out at [email protected]
or 888 601 6010. Feel free even to put a question in
the question bar, and we’ll make sure to get back to you soon there.
Mike, I’m going to hand it back to you to run through these next couple slides. Great. Thanks, Bob. Stick around, everybody, get your get your questions in the questions pane. We do have some great questions in there. I know you’ve answered some of those in the question pane, but there, if we have a few minutes would be great to maybe go over a couple of those.
But, for those of you who may not be familiar with Senturus, we are the authority in Business Intelligence. We concentrate and focus solely on business intelligence with the depth of knowledge across the entire BI stack.
Our clients know us for providing clarity from the chaos of complex business requirements; myriad disparate, ever-changing data sources; and constantly moving and changing regulatory environments and targets.
We’ve made a name for ourselves because of our strength at Bridging the Gap between IT and business users.
We deliver solutions that give you access to reliable analysis, ready data across your organizations, enabling you to quickly and easily get answers at the point of impact, in the form of the decisions made, and the actions taken.
Our consultants are leading experts in the field of analytics with years of pragmatic, real-world expertise, and experience advancing the state of the art.
We’re so confident in our team, and our methodology that we back our projects with a 100% money back guarantee.
That is unique in the industry, and we’ve been doing this for a long time — coming up on two decades. We’ve been focused exclusively on BI for nearly 20 years, working across the spectrum from the Fortune 500 down to the mid-market. You probably recognize
many of the logos on the screen in front of you. We’ve solved business problems across virtually every industry and many functional areas, including the office of finance, sales and marketing, manufacturing, operations, HR and IT.
We pride ourselves in the fact that our team is large enough to meet all of your business analytics and project needs, yet small enough to provide personalized attention.
As mentioned earlier in the webinar, we encourage you to visit Senturus.com and the Senturus resources tab there, where there are hundreds of free resources on our website from webinars to all things BI including our fabulous up to the minute, easily consumable blogs.
Thursdays are a typical time for these events.
And we have another event coming up on November 19th. So next week, data integration options for Microsoft Power BI.
That would be one of our right tool for the job webinars. So, head on over to Senturus.com and look at the events page where you can register for that.
And, of course, we’d be remiss if we didn’t bring up our comprehensive BI training offerings.
We offer training in the top three major BI platforms — Cognos Analytics, Power BI and Tableau — which is ideal for organizations that are running multiple platforms, or those moving from one to another, as our trainers are
generally conversant in all three. We can provide training in many different modes, from tailored group sessions to mentoring, instructor-led online courses and e-learning, and we can mix and match those to suit your user community.
Senturus provides hundreds of free resources on our website, and we’ve been committed to sharing our BI expertise for over a decade.
And with that, that brings us to the Q&A. Bob, I don’t know if you had a couple picked out there — there were a couple of good ones —
or do you want me to pick one for you?
One question: what are the top three to five reasons to pick Snowflake over other database platforms? The first is that Snowflake is easier to manage than almost any other platform — there are no indexes, but performance is still really good.
I also like the separation of storage and compute compared to the other platforms, because I don’t have to keep things running to answer queries.
The other piece I forgot to add — and somebody else asked a question about this too — is: can I do this at a small scale? The answer is yes. That’s the nice part about Snowflake, because it’s consumption-based.
I can start off on a very small project and scale up as the project expands, or as the data expands, or as the need expands.
And I only incur those costs as I go, whereas with some of those other MPP systems, when I start a project, I’ve got to stand up the entire MPP system and load it with maybe small amounts of data as I start off with a POC. That is not the case with Snowflake. So there are some benefits there.
There’s a couple other questions that people are asking, let me see if I can find.
Here’s one: a possible Snowflake use case for a small/medium company that has several OLTP systems.
I thought that was kind of an interesting question. You can start off small — it doesn’t have to be large amounts of data — and you just have a small Snowflake bill.
So it doesn’t have to be large sets of data, but the nice part is that if it ever did get large, you don’t have to change anything.
You can scale up to different compute warehouses.
And you don’t have to change the way you’re doing things. On those other platforms, you have to spread the data across nodes, create indexes and stuff like that — it becomes a bit more management.
From that standpoint, I saw one user comment asking whether we still need to use clustering instead of indexes when the data is huge. And that’s one of those things that may be hard for DBAs, but you have to get used to
letting go of the reins a little bit and trusting that Snowflake’s going to handle that in the background, or use the auto-scaling features and things like that.
Right. And I’ve seen it —
there’s one case where it didn’t, and I don’t know why; I didn’t look back to figure out why — but almost every time I scaled up, it cut the runtime in half. So theoretically it’s the same cost, if not cheaper, to scale up. Let’s use an example:
I had a query that ran for an hour on an extra-small. If I changed it to a small — so I scaled up —
I’m now being charged, say, $4 an hour instead of $2 an hour.
But usually the runtime cuts in half, so now it’s 30 minutes.
So even though it’s $4 — more per hour — I only used it for 30 minutes, so it’s actually the same cost. And usually it scales down at least 50%.
And I’ve seen it hold.
It depends on the type of queries and things you’re doing, where you can scale.
We were doing joins on almost trillions of rows — joining a 1,000-billion-row table with another table.
It was running on a 4XL, which is huge, but it ran in, I think, 13 minutes.
I can’t imagine putting that on another system where it would even run.
And if you say, well, I don’t even have close to that data scale — that’s fine.
You can still get the same type of performance running on an extra-small, and you just have extra-small warehouses that fit your need for your entire enterprise. So it’s cost effective.
I saw one other, about data scientists writing terrible SQL — no offense to the data scientists out there.
On top of Snowflake being fast, the idea is that you’ve isolated your data scientists into their own compute space, so they can max their compute out completely.
If they do, it doesn’t matter, because all your reporting — assuming you’ve set up a reporting-specific warehouse — is not impacted at all by what the data scientists are doing.
That’s a major benefit of this idea that compute and storage are separate, and the compute warehouses are fully separate from each other.
All right, yeah, that’s a great point. There’s always the risk that when you get performance, you can enable bad habits — we’ve seen that with things like columnar databases, across the spectrum. So you still want to design things well.
There was another question — I want to be mindful of the time here, as we’re at the top of the hour — about needing to create a presentation layer on top of the data for reporting.
Do you have anything to say about that?
You don’t have to. Snowflake is like any other database: I could create views on top of it from a security standpoint, if I wanted to provide different security
through those views. But no,
you don’t necessarily need to put a presentation layer on top of it.
That’s not to say you couldn’t, with some kind of tool in between.
But you don’t have to do that, because you can connect directly.
As you saw with the tools, you can connect Tableau and Power BI directly to Snowflake.
And given you’re hopefully designing up front with analytics use cases in mind, it’ll have some of that dimensionality built in, so you won’t have to do as much modeling on top of it as you would with, say, a transactional third-normal-form database.
Let’s wedge one more in here before we wrap up.
On this idea of security and users and accounts: can you just touch a little bit on per-user licensing costs, Reeves?
Well, there’s no per-user licensing cost.
You’re being charged on compute and storage — it’s actually very easy. There are actually three costs:
there’s moving data outside the environment, which you pay for but is usually minimal, plus the cost for storage and the cost for compute.
And it doesn’t matter how many users you have.
If they all use the same warehouse, you’re fine.
But if you need to create different warehouses for different groups of users, you can do that too.
But there’s no, there’s no cost on a user basis.
Security is such a deep topic. You could have individual users,
or users at the tool or purpose level — there are different ways to approach that, and we’d be happy to talk it through in depth on a call for your particular needs. You can also do federated access: you can set up AD, Okta and those kinds of providers so your users can log in to Snowflake without having a separate login.
They can just authenticate to it from that standpoint.
So yeah. Excellent.
Great. Well, that was fabulous. For any questions that were submitted, we will respond in a written response document that we’ll post to
Senturus.com, along with the recording and today’s deck. And again, as you can see, this is fascinating technology.
So any of you out there who are looking to retire that old database sitting on an AS/400 in the corner of your basement somewhere, or looking to modernize your database strategy, get better performance, enable self-service, or move from on-prem to cloud — give us a shout. If you want to move to the last slide, Bob: first of all, I want to thank Bob and Reeves for great presentations and some great content today.
And, of course, thank all of you for joining us and taking an hour out of your day for this edition of the Senturus Knowledge Series. So, again, if we can help you with that, or any other analytics types of challenges, or projects you might have, or training, please feel free to check us out at the website. If you’re still use a phone, call us at 888 601 6010 or email at [email protected].