Snowflake’s Cloud Data Platform and Modern Analytics
Demo of Performant, Secure, Scalable, Near-Zero Maintenance Platform
In this webinar recording learn about performant modern business intelligence using Snowflake, the cloud-based data warehouse platform. If you’re looking to improve the performance of your old legacy database or want to get out of the business of performing maintenance and upgrades, this demo and discussion with Snowflake is for you.
What we demo in this on-demand webinar
- Snowflake’s intuitive user interface
- Creating databases and compute nodes
- Loading data via various methods
- Running queries in Snowflake
- Natively storing and querying semi-structured data
What you will learn about
- Zero management: built-in performance and security
- Instant elasticity: easily scale computer power up or down
- A single repository: structured and semi-structured data
- ANSI-standard SQL: keep your existing tools
- Pay by the second usage: scale storage separate from compute
Designed specifically for the cloud, Snowflake can serve as a single data repository for structured and unstructured data, providing elastic performance and scalability while reducing overhead and maintenance. While the face of analytics has morphed in this hyper data-driven age, the need for a central repository remains. Serving as a single source of the truth, it provides efficient and secure storage of often sensitive and highly varied data, enables automated data updates and addresses the needs for information sharing.
This on-demand webinar on Snowflake delivers a flurry of great information. If you’re moving off a legacy database to a more modern performant data platform and considering moving to the cloud, watch this recording.
Chris Richardson joined Snowflake in June 2018. Chris has nearly two decades of experience in data and analytics with roles spanning from business intelligence to marketing to strategy.
Welcome to the Senturus Knowledge Series today, we’re proud to present the topic of Snowflake, cloud data platform and modern analytics. Before we get into the heart of the presentation, we have a few housekeeping items. First of all the GoToWebinar control panel that you are seeing on your computer can be minimized or restored using the orange arrow that you see and while we have everyone on mute.
We do encourage everyone to submit questions via the pane and we will do our very best to answer those live during the presentation. If we are not able to answer them live. We will answer them via a published questions log that we will put up on Senturus.com in our resources area along with a copy of the presentation and the recording of the webinar. Which brings us to the next question, can I get a copy of this presentation? The answer is an unqualified yes. We will put it up on Senturus.com as mentioned before with the questions log.
And a recording lags the presentation and question log a little bit. But I promise you we’ll get it up there. And while you’re there be sure to bookmark our resource library as it has tons of valuable content addressing a wide variety of business analytics topics our agenda today after we do a few brief introductions. We will discuss Snowflake, joined by our partner Snowflake. He is going to cover the cloud platform and all of its various benefits
and uses. After that we’ll do a quick Senturus overview, discuss some additional mostly if not entirely free resources and then we’ll get to our question and answer. So make sure you stick around after the presentation. So introductions, I’m pretty pleased to be joined today by Chris Richardson who is a sales engineer at Snowflake has been there since mid-2018 as a senior sales engineer.
He’s a seasoned business leader with almost two decades of experience in data analytics with roles spanning from business intelligence to marketing and strategy. He’s skilled at developing deep insights Innovative approaches and strategic solutions across many industries and technologies. Prior to Snowflake. Chris was a solution architect and prior to that
he held roles in consulting business, ops and marketing where he learned his hands-on skills and BI and data warehousing to create a culture of data-driven decision making. My name is Mike Weinhaur, I’m the director of training content and here training here at senturus and have been delivering and selling business analytics solutions for a long time over 20 years at this point. So with that I’m going to hand the floor and the microphone over to Mr. Richardson and the floor is yours take it away. Thanks Mike. And thanks everyone for coming to the webinar today as Mike said I work at Snowflake. I’ve been there for a couple years and we’re going to talk today a little bit about Snowflake who we are some of the main problems.
As we solve and then get into a demo and I really am not intending this to be a sales pitch. You’re going to have the slides you can you can read those slides later. I’m not going to read them to you, but I will run through them quickly here before the demo and just to level set and everyone’s on the same page before we jump into a hands-on demo.
So as Mike said I have been in in the data space really for 20 plus years and it’s interesting over the last five years to see five or so years to see all of the media attention around data and data being like oil or your most valuable resource and you know, that that’s interesting and I think a lot of us are data folks, but we realized that it’s not just the data.
It’s how you use the data and it’s something that a lot of people have struggled with overtime as I how do I you leverage this data that I have to turn it into an asset that can be a competitive advantage and I really think that snowflake has taken some strides in this area and I’m happy to kind of share this with you today and kind of explain our journey and then show you the product. So sort of first of all why you know what’s changed, right we’ve had data for a long time. We’ve had it sent.
Three things that have happened in the industry one is this maturity and growth of cloud computing. So the idea that you can have an on-demand resource that virtually has unlimited storage and unlimited compute is different right and if you could have a database on top of that that would actually be something that could allow you to leverage data in different ways.
That’s really what Snowflake’s done. The second thing is to make this data actionable you there’s a lot more data and that data has insights that the data that we had before didn’t provide. So there’s a just a ton of data in such an exciting space, especially just even the last couple years of how much that’s gone mainstream tons of data coming off of websites and social media that can be combined with the data that you have already to get new insights.
And then lastly there’s a lot of really new patterns around analytics libraries tons of new tools and really the desire the kind of thirst for analytics is really taken off really in the last few years and part of it is driven by the media part of it is driven by these things more data and new technologies that make it happen.
Snowflake has only been around for five years. The owners were trying to tackle some of these big problems and turn that data into assets and they looked at the market and you know, the first one was they wanted to help people make better and quicker business decisions and what this meant was all of your data needs to be in one place.
You can’t be hopping around to different databases to run your analytics to sort of have a centralized storage layer for all of your data. The second one was it can’t be difficult for you to scale this infrastructure if people want to use it. You should have the resources available to use it. You shouldn’t have to do an upgrade remove hardware from your data center replace it with new hardware install patches things like that.
So they wanted to make this cost effective and they wanted to make this easy for people to do and then the last piece is they wanted it to be easy for people to consume. I worked on projects and there’s some people who have to have access to the data. They can run the queries they can get whatever data they want loaded in a timely manner and then there’s some people who have to wait they get throttled. They can’t run their jobs. They’re not trusted to create certain objects in the database.
And so there’s the haves and the have nots and what we’ve done really with Snowflake is try to eliminate those barriers and now everyone can really get access to the data and start doing the analytics that they want to do.
So, you know, we’ve created what we think is the next generation of a data platform. We call it a cloud data platform. But this idea it’s been around for a long time. And you need even in the late 90s the first data warehouse I worked on it was a Teradata system on-perm and it was in EDW. It was all of the entities in the business all of the things about manufacturing all of the things about testing the devices. It was all in one database.
The size of that was 800 gigabytes, and I really wanted it to be a terabyte. That was the big thing back then but it was 800 gigabytes and that had all the data and it works. It works fine. We could model all the data but where it started to break down was with users. So when we started to pile on more users on to that system it became slow the amount of resources were fixed. We had to start to throttle people. They couldn’t get the access to the data that they needed.
So along came kind of like around the same time. The cloud was born people started to build services in that and one of those was databases so products like Redshift which is which is a great product. It was the very first product really in the cloud that had an MPP type architecture to do data warehousing and it gave you it gave you what you wanted.
It could write SQL which people are used to you it ran fast queries, but just like other the on-perm. The stems it started to break down when a lot of users use the system and then along the same time these new data store sources coming in different kinds of data, whether it may be structured data or semi-structured data a lot more data. So the volumes were starting to get kind of push the limits of these systems that there was a new generation of that had a lot of promise the Hadoop kind of environment where people are creating data lakes.
These data lakes are great because they could store all kinds of data. They had separated sort of this idea of storage from the processing of that data. So you could have more users different clusters running at different kind of on that same workload of data the problem with this was twofold one was it wasn’t a SQL database right people were used to sequel. They wanted to use transactions.
They wanted to do updates and inserts and things that they used to do in their data warehouse, but they couldn’t do that in their Hadoop environment and some there were some strides made to do SQL on top of Hadoop, but in the end and you know, we’re still seeing this. It’s really hard to get consistent results the bi kind of queries the interactive workloads that you need for a business. It’s hard to get that out of Hadoop without a lot of special tuning and a lot of resources really specialized resources to make that happen.
People have been trying to centralize data and kind of democratize it for a long time. But the cloud has the ability to handle different kinds of data. We’ve got that in Snowflake and I’m going to talk to you why we think about that in just a second.
So this isn’t really intended to be an architecture that you might have but it’s sort of meant to represent the landscape that we see today in across different customers. It’s basically its very complex today. There’s a lot of different data is landing in different places a lot of different tools. It’s coming from a lot of sources. So it might you know, it might come from Kafka and streaming. It might be a change data capture off of your ERP system.
You may be pulling it from other systems like Salesforce.com. And then usually there’s some sort of transformation that happens on that data could be done through SQL and Hadoop could be done through spark, but then it finds its way into a data warehouse. Sometimes there’s data marts that are peeled off of that data warehouse because they have capacity limits on the data warehouse got to back it up and other systems.
There’s dev systems, prod system, and then usually it has to end up going out into other structure it could be you’re sending files out. It could be people are sucking it into a notebook. It could be Tableau data extract.
There’s a lot of ways that people are pulling data out and kind of the net here is it’s complicated all of these systems cost money to maintain each of these connection points is a potential point of failure and throughput could be an issue and lastly is that you’re really losing governance across this whole It’s hard to keep track of where the data is who has it. Is it secure so that’s you know, kind of the big problem that that snowflake solves is that it allows you to eliminate a lot of this complexity. You can put your data in one place and you can do these different workloads on top of it and the platform itself can run in the cloud.
So it runs on each of these different cloud vendors. I you know, I believe you Chris you seem like a nice guy what makes it different and the biggest thing that I want to share with you is it’s this architecture. There’s a new architecture we call shared data multi-cluster architecture and that is what Snowflake built off of and that’s what makes it different to talk about. The architecture here now is that there’s really three parts to the architecture.
In the first one is this idea that at the core there’s a storage layer and this is a single place to centralize as much or as little data as you want. So you get to pick what goes there. You can ingest structure data, you can ingest semi-structured data and it leverages the elasticity and scale and cost Effectiveness and durability really of blob storage.
So an Amazon, this would be an S3 as an example. The second piece of architecture is this multi-cluster compute these blue gears that are on the outside and the middle. They’re compute resources there clusters of computers that are separated from the storage and they operate independently, so they eliminate resource contention between different workloads. And those clusters themselves more importantly can scale up and scale out on demand.
So if you need more resources to do a certain job, you can choose through policy based settings to make those scale-out scale up and you can get your job done quicker and you pay for kind of those resources by the second and then the last piece of it is really the brains of the operation. So snowflake itself runs as a managed service, maybe a little bit like Salesforce is a managed service. We take care of provisioning all of the equipment. We keep the service running.
Cross the real availability zones, we handle the transactional consistency. We ensure that performance is optimized query plans that kind of thing. We handle authentication role-based Access Control, we capture metadata and everything that’s going on in the system including the data and queries and everything.
That’s fun and we allow you to share this data securely out kind of without moving it and I’m going to go through that a little bit in the demo and as I And they’re all of this runs across three cloud platforms. So the Amazon is where we started about five years ago. Asher came online a couple years ago, and then Google just came online a couple of weeks ago, but the benefit of running on all these platforms is that we allow you to backup data across the different cloud platforms. You can replicate so you can try an environment if you wanted to run.
And Azure and then back up to AWS or failover to AWS of Azure with some reason down and or if you just wanted to move from AWS to Google because you’re getting better cloud pricing from them. You can do that.
So our competitors, you know, it’s an interesting dynamic because our biggest competitors right now our Google AWS and Azure, they all have data warehouses that run in the cloud, but we also are their biggest moors, we spend a lot of money with these folks and we run on their platform. So it’s an interesting dynamic. Okay, and then one kind of another way to sort of look at this before I jump into the demo, which is going to be the next slide is we’ve got this cloud services part there. It’s the brains of the operation. It’s a managed service. We’ve got a centralized data store that’s in storage. And then we have these independent compute resources that use these t-shirt sizes that represent the size.
These compute resources can be used for different workloads. It can be used for ELT. So these folks up here are just a partial list of the partners. They’ve written connectors to snowflake using our odbc or jdbc drivers. They orchestrate ETL or ELT into Snowflake and lay leverage these compute resources, which we call virtual warehouses.
We also have some services for data engineering where we can ingest data without using an ETL tool. So if it’s streaming data as an example, we have a tool called snow pipe land the data in Blob storage. We automatically pick it up and load it load that raw data into Snowflake. Similarly the same the same compute resources can be used for bi so if the sales team has BI dashboards, they can have their own computer resource if data is being loaded the same time the same.
This clearing it they’re not impacted. It’s consistent because its read committed. So data has to be committed before you can read it. So their sales is reading the data running their queries. They’re going to get consistent performance, even if the load fails and it needs to get reloaded that’s not going to impact them because they’re completely isolated compute resources.
Same thing if there’s a data science team there they have a Jupyter notebook. They’re running python. They can run independently spin up these resources as their own if they need more compute resources. They can scale them up automatically scale them down when they’re not needed the other dimension of elasticity that we have is scaling out. So imagine the sales team at the end of the quarter.
They have a lot of queries that can run automatically, so if there are queries that are running and a new query comes in and it realizes that it doesn’t have enough resources to execute that so it can either queue the job or wait for resources to be available or what it will do if you decide through configuration and I’ll show you this in the UI spin out horizontally and then there’s a couple other use cases which I’m going to go through you can share data securely with external or internal.
Folks and you can we have a capability in our product called time travel time travel allows you to query data as it was in the past so it could be 60 seconds ago. It could be 30 days ago. It’s up to you. You can configure like how you how much time travel you want to have you can have zero if you want to and then you can then through some SQL extensions. You can query data at different points in time. And there’s nothing you need to do on that.
You don’t need to do any special data modeling or anything comes with the product and its really nice and then lastly because of the way the architecture works. There’s one copy of the data are databases themselves are just logical objects. So if you need to create a Dev environment, that’s a copy of your Prada environment. You don’t need to buy additional Hardware. You don’t need to you know, buy more capacity or expensive tools to move that data. You just need to do a zero copy clone and I’ll go through this.
The demo as well. Okay, so jumping in here.
Jump into a demo here. So snowflake is a web service. This is an account. I have it’s a demo account. You can see here. It’s called demo 49. It’s running in AWS east region. So I’m going to log in as a user called John.
John is the DBA and Jane is the analyst who’s asked John to load some data for her and this data is for some trips data. I realized that I wanted to go back to this one slide here. So let me just kind of set up the demo here quick. I forgot to do this is that we’re going to load two kinds of data. We’re going to load some structure data which comes from an open, source City.
It’s a bike sharing system in New York City. Basically you pick up a bike in one location drop it off in another location that’s considered a trip and then there’s some data captured about that trip. And here’s the data model that we’re going to go through is that this trip stable is sort of the fact table and there’s a couple Dimension tables stations and programs and then we have a another data source around weather and weather is a observations of weather and that’s semi-structured data. It’s in jail.
So on format and it’s related to the trips data based off of the location York City and the time so we’re going to use that data source to compare how weather impacts trips and that’s really what Jane’s interested in is that that correlation between those two things so jumping back here to the demo. So this is this is what’s the snowflake UI looks like I’m logged in as John.
I have a role of DBA For I jump into running queries. Let me go through some of these icons on the top. First one is databases. One thing about snowflake is that if you’re used to working in a relational database, you will find it very easy to work in Snowflake. It’s an ANSI SQL database. We sequel we write the way you interact with. It is very familiar to someone who came from a traditional database background.
So there’s databases has you can have as many databases as you want and snowflake, there isn’t any sort of capacity limits and snowflake. We are bounded by the amount of data that you can put into S3 or a region in an Amazon, which is pretty much unlimited. So and a database itself. It has schemas those schemas have tables in those tables have rows and columns so you can have use and other database objects. It’s very familiar.
The second one which I’ve talked a little bit about is the compute resources which we call virtual warehouses. These are clusters of computes. You can just if you have the permission the privileges you can just create your own. Let’s just say I wanted to create a warehouse called test. You can pick the size of this warehouse so Snowflake you pay for Snowflake by the second.
So every second that you’re using one of these warehouses you’re paying for Snowflake. The product itself is priced using a system called credits you buy credits, and then you consume those credits, basically one node per hour. So, this is a large warehouse it’s using eight nodes. It’ll cost you a credit per hour. The smallest one is one credit per hour, and that’s small so this just has one node in the cluster. So you pick when you’re defining a warehouse.
The size of the cluster you choose how you want the cluster to scale horizontally. So again, this was that case of the sales team scaling out horizontally when the when they need it you but you can start with just one and then it will go all the way to 3.
If you need it, you have a different policies that you can decide on how you scale how aggressively you scale and then if someone isn’t using the warehouse it can be shut down. We also shut it down for you. Automatically. This can be set down as low as one minute and at the same time if the warehouse is suspended you can have it. So when someone comes in on Monday morning the warehouse will start up so they don’t have to do anything with you don’t have to worry about shutting this down. You don’t have to worry about starting it up and you only pay for this when you’re using it so it’s not on all the time and that’s a really great cost-saving. So I’m just going to hit finish.
What happens here, of course, I’m doing a demo now that took a little longer than it usually does it. It generally starts up right away. And the reason it starts up right away is because we’ve provisioned these resources for you on the Snowflake side. When you request 8 nodes in a cluster we give you those eighth notes and we now you now have a warehouse. It’s running you have this resource.
Source these eight nodes available to run queries and I’m just going to shut this off because I’m going to I’m going to show you in a second like how you know different ways that you can sort of provision these resources and use these resources. Okay. So moving on here to the worksheets tab when I start out by loading the structure data, which is this trips data. You can see here. I have a database called Citi Bike if you remember I had a couple tables in there.
One of the things that maybe takes a little getting used to and Snowflake is this idea of setting the context so because you have multiple databases and those databases can have multiple schemas and you also have the ability to have multiple roles. So your roles are how you get access to certain objects role-based access control and there’s also multiple warehouses you need to tell us which ones you want and you can do that either through SQL.
Well, you can do it through the UI here or you can have default ones. But for this example, I’m just going to show you kind of right here. I’m going to set some context and you can see here. I’m actually going to create a warehouse that isn’t this isn’t running right now, but I’m going to create a warehouse little suspend and five minutes. It’s a small size.
And so now I’ve got this context that I’m using I’m going to use this role that is going to drive what objects I see this database the schema this warehouse, so now I can actually run some queries and so you’ll see here now this this Warehouse is green is started up when I ran that first query it was configured to Auto resume and it started up so I can run query. So I’m just running some queries on these little dimension tables now.
I’m interested in in loading some data and Snowflake. All cloud data warehouses for that matter load data from the data has to get to the cloud so it needs to get to some sort of object storage in the cloud and then we load from there. So in this example, I have a bucket in Amazon. It has some folders inside of those folders. It has some data this data CSV. It zips CSV files and I want to load that data in and how we do that in Snowflake is pretty simple is that we have a pointer.
That object we call the stage at the sign here. You can kind of indicates that this is a staged object and I can just do a list command and I can say show me all the files that are in that folder so there 725 CSV zip files.
And because this is structured data, right? It’s CSV data. It has a structure there’s columns of data in there. I’m going to create a target for this called strips. And next when I load the weather data. I’ll show you how you can load data semi-structured data in you don’t need a schema on when you’re loading the data. So created this table called trips going to actually the command we have and snowflake to load data is just called copy. You can actually copy data.
Into a table, you can copy data out of four table. So a lot of times people ask well, how do I get my data out of snowflake? It’s really simple. You just flip these things around in you could copy into a stage which would be a bucket that you want to dump that data to and you know, you can configure it. So, you know people have certain Privileges and can’t do that all the time and they can just screw it. So, okay, so I’m going to run a query now, so I loaded that data 18 million records.
I didn’t see how long that took though. It took me about eight seconds. So I loaded 18 million records in about eight seconds. That was just one year of data. So a lot of times what people do is that because you’re paying for these resources by the second and they scale linearly. So a small as two nodes that large is eight nodes. What people will do is they’ll just once they get used to snowflake though. They’ll start to use the appropriate size to make sure it stays busy.
And so what I’m going to do here is I’m going to just that warehouse so you can see now it’s a large Warehouse. So I added essentially six nodes to that cluster because I want this copy command which is going to do the rest of this bucket. It’s going to load the rest of that day bow and it won’t load the 2019 data because it keeps track of which files it loads and it won’t it won’t reload data that it’s already loaded. So I’m using a large warehouse now again, I ran that copy command. I’m loading into that
table that I created up here and I’m loading it from the stage table. So that’s done. You can kind of see here how long it took about 20 seconds. Yes some details on everything that’s happened can now if I want to I can resize my warehouse so I don’t have to pay for it running at a large because I’m not that clearing it anymore and I can take a look at that trip stable. So now I’ve got about 89 million records.
So looks like it loaded about another 71 million records in 20 seconds. I loaded what about 4 x 4 x the amount of records that I had before and it took about twice as long but so you can see it’s you get this linear scalability and let’s take a look at it there. There it is. I’ve got this trips data now, you know, you can’t just sort of like there’s a lot of situations.
Options when you’re using BI that you need to create this semantic layer. So people can easily query from tools like Tableau. So John has been tasked with creating the semantic layer given the requirements from Jane so he can calculate some metrics here. He can use some functions. So you wants to know how far people traveled between a latitude and longitude. We can use a have her sign function for that.
Okay that looks good. And now I’m going to do what I’m going to do is I’m going to create a Target or a kind of a Target table that people can query it’s going to be called trips view and this table is you know, what Jane’s going to analyze so, okay. So it looks good when it just kind of pop over here. This sometimes takes a little while to Tableau. So I’m connected to that account and I when I just do a quick update.
I hear sometimes that Tableau takes just a few seconds to fire up good. It’s running.
So I’m going to use this trips view as my source member. I just created that table. It’s in the city bike database the demo schema. I can take a look at some dashboards now.
So with Tableau you can still use extracts and everything, but you can also have these dedicated compute resources. So Jane is actually using her own compute resources. She can run her own queries so she can use this dashboard and you know, there may be situations if the data isn’t changing where you do want to do an extract, but there could be situations where you want to query that.
Data live and with Snowflake because the compute resources isolated. It makes it really kind of Handy to do this. So again, these are all live queries kind of running on that trip stable. So, alright.
Okay, so I’m going to jump back in here kind of get into one of the more exciting things about our product which is using semi structured data, and I’m going to show a couple other features as well here, so, Now I’m going to load weather data. So again, I’m going to set a little context here. I’m going to use this role Citi Bike and I have a production database now right there using it. It’s this demo schema their querying this trip stable or in particular this trip is view, but I don’t want to impact production. I want to actually create a dev environment where I can test this data out make sure it works before I put it into production.
So what I’m going to do is I’m going to create a database I’m going to call it Citi bike Dev. And I’m simply going to clone the city bike database. So I just run this command and again, I mentioned this at the beginning but this is a metadata operation. So we’re not actually copying data physical data. We’re actually just copying the metadata. So if I run this statement now, I have this new database here called Citi bike Dev.
You can see it’s just like City bike it has those tables in it that 90 million row trips table. It has my view. And again, this is a logical copy. It’s not we didn’t physically copy the data, but we now logically have these separated and we can start to work within this environment.
I’m also going to create a warehouse called Dev Warehouse because I don’t want to impact may be other people who are using that load warehouse, so, I’ve got my dev warehouse over here, it automatically set the context and now I want to load weather data. So again, this weather data is sitting in a bucket in S3 cold weather just kind of show you here. There’s weather data by year and this is Json data that’s zipped. And if I did a list here, I can see that there’s about five hundred and ninety files in that bucket.
We can actually query the data so in Snowflake, you don’t have to load data into copy data into Snowflake to see it in the blob storage. You can actually query it and we actually something I’m not going to show today but we have the ability to have external tables as well. So you could never copy your data into Snowflake. But if you do copy it in you get a lot of benefits around performance and cashing but again, I’m not going to go into a ton of detail on that. But so let’s just take a look at this data.
It’s Json data. It’s just a structure that sitting in a Json file and the way we select that structure is just this notation dollar sign one. If I take a look at it, you can see here. It’s you know, there’s a number of Json objects. Some of them are nested towards the end here. There’s a there’s actually a weather object that has an array in it these square brackets sort of show that so it’s a fairly simple.
Structure but there’s some things going on here that are kind of interesting and I’m going to I want to load the data into my dev environment. So I’m going to click over here use my dev environment. And then the first thing I’m going to do is so I’m in dev.
I want to create a table called weather, this is unique to snowflake in that we have a data type called variant and a variant data type is a data type which will load that Json structure natively right in and so you don’t need to put schema around it. You can load it directly in, so if your scheme has evolving or new objects are being added or things are being removed that variant column you don’t need to do anything special with just handle it as it comes in. So I’m going to just create this table and most going to have a time stamp on their data.
It’s quite a bit of data. So I’m just going to bump up the size of my warehouse again. It happens automatically and I’m going to kind of kick off this copy, which I think takes around a minute.
So what it’s doing right now is it’s um copying the data into that weather table which again has this variant column and a timestamp column and you can see it’s doing that dollar one. So it’s taking that whole object sticking it into the variant column. And then the other thing its doing is some light transformation as it’s coming in. So there’s we have extensions on our sequel to really walk the objects in the Json structure.
And so the notation here you can see it’s dollar sign one and time so there was an object called time and what I’m doing here is I’m saying I’m going to use a function convert time zone. It’s in UTC right now. I want to put it in New York time Eastern Time and then I’m going to cast that as a timestamp. So essentially what I’m doing here is I’m providing the schema as I’m reading as a copying the data in and now that copied in so you can see its cop took about a minute. I’m going to bump down my warehouse.
And we can see how many records we loaded. So it’s about a hundred million records that we loaded.
And we can take a look at it.
And you can see here again. It’s like so that structure now is copied into this variant column. It’s the structure and then I also did that light transformation and created this time stamp on here. So now this may be helpful just to query on because it was just in, you know, it was an epic time before and now I can query it with a normal time stamp.
And then now we can just query this data so we can select city as a nested sort of object. So it’s City.name and then I’m going to cast it as a string and call it City similar looks whether it’s an array. So with this query right here. It’s putting schema around it as we’re queering it. So now I have City and conditions even though the data itself is still Json is still in that semi structured format, but I can in
real-time. I can query it and provide the structure that I need to so there’s a lot of a lot of flexibility that this offers folks because they can get the raw data into snowflake query it right away if they need to materialize it they can in a view a table on the fly using SQL just one other thing is that we have other more sophisticated sort of ways we can walk the Json structure, but we can also flatten it. So if it’s nested if there’s a nested array inside of it if that’s the value of your Json object, you can flatten it. So this is just counting the different conditions across different cities. So it’s great that looks like this is working now. I’m ready to promote this this back to production, right? So I’m working in dev right now. I have a table and
dev. It’s called weather but I don’t have that in Citi Bike, but that’s actually where we want it. We want it up in this city bike database. So what I do is I can create a table in prod Citybikedemo.weather and I can clone this one that I’ve been working on in Dev. So I run this command again. This is a metadata operation.
So I now have weather up here that hundred million Row Record table people and prod if they have access to the demo Seems they See it this demo environment. I don’t need it anymore. So I’m just going to blow it away.
So blow it away. You can see that that’s gone now, so I didn’t need to provision anything or anything else and let’s just say I wanted it back and I’m going to kind of highlight this in a little bit. Although I know we’re running a little short on time is that if I didn’t want to blow that away if I want it to go back in time and say I didn’t want to blow that away. I have I have this ability to under op objects and Snowflake and we call this time travel or and so now Dave is here.
It’s back. I can query it 80 million records and you can decide how long those objects stick around in this ability to time travel. It’s up to 90 days. So I’m just going to drop it here and we’re good. So now we’re operating in production Jane wanted to know how weather impacted the city bike thing. So what I can do is I can create a table of you.
That called strips weather and it will combine the data from the initial when I had this trips View and we’ll combine it with the weather data doing some Transformations here and we can run it.
So it created that view just test it out.
And so, yep, it’s there. We’ve got 89 million records. And now if we go back to Tableau.
Can see here.
So actually I let’s go back to our data source.
So what we’re going to do is to let this guy go so I’m going to go back to my data source here, and I’m going to say I want to use that new view that I created that trips that that view that has the weather data and the trips data.
So you can see now I have that that other view here. I’m going to just going to replace this guy.
So now I have trips and whether interview but it has all the objects that I had before. So I’m going to you know, I’m I can I know that my initial dashboards are going to work. So it’s kind of running the query it’s kind of loading it here. And now if I go back she was interested to say hey, how are how is whether impacting trips?
So this is over time how my trips are trending, but if I want to take some of this weather data, so if I want to take the temperature and I put it on this graph and say how is that correlated to the number of trips and let me just take the average temperature.
So what I can see here is that you know in September, you know the weather which is this orange piece. It’s nice. There’s more people taking trips. I guess it gets cold in New York in the winter. So people stop riding their bikes so down here the average temperature is about 20. So again, it goes up the next year goes down. So what we can see is sand of a simple correlation here between our trips data are structured data and then a second data source weather data that was coming in.
Json format that we combine together in a view so that you know, that’s sort of the basics of kind of loading two different types of data.
So we yeah, so I’m going to show kind of one other part here. I’m going to introduce this concept of data sharing and what data sharing means in Snowflake is that you can create a dataset and share it with someone else and they could be a Snowflake user.
Or you can create a Snowflake account for them or reader account, which is a read-only account so they can log in over the web and their own account and see data that you’re sharing with them and what this does is it eliminates the need for FTP and files or doing different things. So jumping sort of right in here is that we I’m going to sort of set some context here and it’s really simple as how we create a share is that I want to share this city bike data and you can see up here, right?
I’m not I’m not sharing anything. There’s nothing outbound that’s being shared but we can do it through the UI or we can also do it through a GUI here is that I’m going to share the database Citi Bike and then I’m going to add some objects into that share. So I’m going to add the database the schema and that view that trips whether view that we that we had.
So if we look up here right now in the shares, there’s a Citi Bike, but I need some people to actually consume that share and these people are other Snowflake accounts. So if we go up here I’m going to I have a table here that has some mappings and it basically maps a group of people to an account and I’m just going to set some variables so these variables and then I’m going to take those variables and assign them to that share.
So you can see up here in the share now that there’s some people who have cd bike and they can see it and I’m going to kind of jump right over kind of too.
So you can see right now. I’m in this account. So there’s this account HP, which is my New York City Housing Authority account. I’m going to kind of jump back here and sort of show you guys is that they’re in that mapping table this HP 0 0 3 6 2 is this account this NYC HSA account? And what I can do is I’m on this account. Now, this is data from Citi Bike that they want to share with me. I can go to my shares and I can see oh, I have an inbound share.
I’m going to create a database from that share. I’m just going to call it City bike to keep it simple and then you can decide who sees this share the created database. Okay, and now if I just go to a worksheet, I now have data this trips weather data, and I can actually query that data. So this data is in a completely new accounts, but I can get access to it. And if I were a snowflake customer I can combine it with other data that I have.
but the data itself did not move out of the City Bike account. It’s just a logical share. It just happens to be going to another account and this eliminates a lot of the need for sending files and losing custody of data and trying to set up FTP or EDI kind of transactions.
It’s really kind of something that we think is going to be an emerging way for people to not only eliminate friction ensuring data, but perhaps they have and sharing it with other people. So with that Mike, I’m going to hand it over back to you.
So everybody stick around because we do have your questions in the question log and will get to those in like two minutes here so you can there you go. So just a quick couple things. I mean, first of all, I just love how these are blindly move around and pipe in a hundred million records. That’s something that used to walk away from the machine for a couple of days with and it’s just a really seems like a very powerful and a great platform here. So we’re excited.
I did want to offer to our folks watching the webinar here that we do offer a data and analytics roadmap where we can help design and sure to accommodate changes in your technology and your data. Obviously, the BI applications are predicated upon good performant data so we can help you review your existing system and outline a plan for the future and offer a holistic tech agnostic and highly pragmatic guidance, whether that’s migrating from for example, maybe you’re looking at that old on-premise database that’s on an unsupported operating system. Now that’s sort of sitting there chewing up power and space.
And doesn’t perform for your users. Well, we can help you create a plan to move that whether it’s an old Cube or a data martyr your enterprise data warehouse or data lake help you with the ETL and ELT or the BI tools sitting on top of that with our world-class experts could just a quick couple of slides on Senturus before we get into the queue a going to move to the next slide, please Chris.
We are the authority and business intelligence all we do all day every day when we get out of bed is help people in make their BI better or unrivaled in our expertise across that stack and we have a depth of knowledge across that entire stack.
Our clients know us for providing clarity from the chaos of complex business requirements myriad growing disparate data sources and constantly moving targets and we’ve made a name for ourselves based on our strengths and Bridging the Gap between 90 and business users.
We do that by delivering solutions that give organizations access to reliable analysis starting data across the organization enabling them to quickly and easily get answers at the point of impact in the form of the decisions that are made and the actions that are taken On the next slide, our Consultants are leading experts in the field of analytics with years of pragmatic real-world expertise and experience advancing the state of the art. In fact, we’re so confident in our team and methodology that we back our projects with a unique 100% money back guarantee. We’ve been at this for a while. We have a long strong history of success.
We’ve been doing this for nearly two decades across 13 hundred plus clients with over 2,500 successful projects ranging from The Fortune 500 to the mid-market and across channel areas, including the office of finance, sales and marketing manufacturing operations. HR, IT. Our team is large enough to meet all of your business analytics needs yet small enough to provide personalized attention as mentioned before and when we were talking about the deck and the presentation we encourage you to visit the senturus.com/resources to expand your knowledge where you can find everything from webinars to one to our fabulous up-to-the-minute easily digestible blogs.
I’d be remiss if I left out our complete array of BI training across our core platforms of Cognos, Power BI and Tableau that includes everything from tailored group sessions to mentoring to instructor-led online courses and self-paced e-Learning and then finally some additional resources here. If you head over to Senturus.com, you can find unbiased product reviews.
Tech tips, insider viewpoints again on our blog more on this subject interactive product demos of things like our Senturus Analytics Connector, which allows you to connect Cognos metadata to Tableau and or Power BI and then additional upcoming events.
Of which there are many. With that we will jump over to the QA and Chris. I don’t know if you had a chance to sort of take a look at that.
So do you know if the use cases for an operational database? It’s an analytical database column ER analytical database. That’s what gives us the performance that we need for those interactive queries. So, you know, it’s not it’s not designed for OLAP type workloads at this time. I mean we can handle high volumes
It also reads dashboard type things, but we are a tactical query. We do not handle MDX queries. It’s a SQL database, but you can use odbc through Excel to connect to Snowflake data model. So I was just going to write this.
Can we create a data model view on top of SAP Hana? We have people who create replicate sort of that data model in Snowflake using views. It’s that’s how people do it. They know Hana has a GUI to do that and we just create views. We have a number of SAP folks set Snowflake and they’ve shown how we do that at a few customers come up wondering how those commands like cloning.
Copying settings have drag-and-drop alternates. It’s all done through SQL, but there are some parts of the GUI where you can actually clone and drop it like the browser that I showed at the beginning the databases tab allows you to do cloning and dropping and everything through a GUI assigning privileges things like that.
So we do have that it although it’s it fully functional through SQL and that’s generally how most people use it in production. Although the GUI does provide you know, once you’re getting used to it what how to use it? Is it compatible with Cognos? Yes. We have a number of folks using Cognos and Snowflake there. I didn’t have that in the picture but we do partner with Cognos and have folks using Cognos.
Cognos can use it if you go to our website you’ll see all of the folks that have written connectors to Snowflake and that we’re supported with Irwin. You connect to Snowflake through a number of different ways. I didn’t really get into this but you can use odbc or jdbc the sort of generic ways to do that.
For example Tableau connects through odbc if you wanted to use a tool to write your queries. That’s jdbc. We also have a python connector spark connector go connector node.js connector. So if you’re using those languages you can do it, but I think Irwin probably uses odbc. I’m not sure maybe jdbc so I believe there’s a blog post on it. Give an example of pricing for queries.
Say I had million rows and gets query 10 times a day trying to get a quick understanding of cost. So yeah, as you know, we work wearying tables that were a hundred million rows there and we were using small warehouses. So Snowflake’s designed for large-scale queries.
It’s the behind-the-scenes the data’s column arise which gives us a lot of advantages as far as reducing I/O and improving Put also the way your cables structured we eliminate partition. So we read less data when you run a query so this would be very inexpensive. I guess. I don’t know what the price would be but it would be dollars is what might my estimate would because you’d only need an extra small warehouse and it should run in just a few seconds. So that’s what I would I would say. If you want to try this out yourself, you can get a free trial
at Snowflake.com. There’s some very large sample datasets multi billions of records. You can run some of the sample queries and or selects from there and you can get a good idea without having to spend any money SQL Server to Snowflake.
Snowflake is a ANSI SQL database and so SQL Server, so there are maybe a few functions in SQL Server that are a little bit different but it’s relatively straightforward to migrate from any SQL database to Snowflake use the SQL queries will pretty much translate 95% of the time and the 5% that don’t are usually find and replace kind of things.
About half of our customers did previously come from another database like Redshift though. It is the conversion. There’s some things out there.
How long does it take someone to learn it?implementing Snowflake doesn’t take long if you know SQL if you know databases, it’s really the learning curve is quite short because it’s the same patterns. There are some more interesting things when we talk about external tables and automatically ingesting data capture streams and test things like that. But just the core database is really easy to learn in my opinion. So, okay.
I think it’s 12 o’clock and I think we got through all the questions. One here that someone had asked about commands like cloning and copying if there’s any kind of drag and drop Alternatives or if they have to be done with as a variance equal snow playing variance equal I guess is what he’s saying.
So in the in the UI, if you wanted to clone the weather table you can clone it here, so you’ve got some things that are handled will be the via the GUI. It just issues some SQL statements behind the scenes. So there are you know, some ways to do that in the GUI. Obviously, I think there’s room for improvement there, but it’s essentially this is all SQL behind the scenes. But yeah, there are a few ways to load data clone drop objects assign privileges to them right here from the GUI.
It’s exactly like you were showing they’ve got scripts laying around all over the place that do all these things. They do a lot of copy paste. You can get a free trial at Snowflake.com and you don’t need a credit card. It takes only maybe 10 minutes to get an email back from Snowflake and you can start going playing around with it. Sounds good. I love you.
Could toggle back to the slide real quick? Just that last slide there. Thank you everyone for joining us today, and thank you, especially to our presenter Chris Richardson from Snowflake. Please reach out to us if you have any questions about this, or if you’d like to discuss doing a migration or exploring Snowflake or any of your other analytics needs and we look forward to seeing you on one of our future Senturus Knowledge Series webinars. So thanks everyone for your time and have a great rest of your day.