If your organization is considering moving to a cloud-based data warehouse, chances are two leading contenders are tugging at your heart strings: Snowflake Data Cloud and Azure Synapse Analytics. Each of these handsome hosted platforms is appealing because they are both quick to implement, separate storage and compute costs, and make it easy to scale storage and processing capacity.
Like marriage, deciding on one is a long-term commitment. Before you put on ring on it, you’ll want to really understand the complexities and unique aspects of each to help you decide which is appropriate for your organization.
Watch our on-demand webinar for a discussion comparing Snowflake and Azure Synapse. We dive into product features and technical approaches taken by these two market leaders.
What you get
- Demos of each data cloud
- Overview of similarities, strengths and weaknesses of each
- Criteria to help you determine which is the right choice for your organization
Sr. Director of Software Engineering
Bob leads software development and BI architecture efforts at Senturus, focusing on software and cloud architecture in the Power BI and Tableau practices. Before coming to Senturus, Bob was designing, building and scaling business intelligence software, reports and dashboards for use by thousands of restaurants.
Sr. Microsoft Architect
Gary is a versatile solution architect at Senturus with over 20 years of experience ranging multiple industries and disciplines. An MBA with end-to-end expertise in business intelligence, he enjoys combining his background in business with the latest technologies in data and analytics.
Principal Snowflake Architect
Reeves is a Microsoft Certified Master and Snowflake – SnowPro Advanced: Architect. He has more than 20 years of experience working on development and data warehouse projects within a large variety of industries.Read more
Greetings everyone, and welcome to the latest installment of the Senturus Knowledge Series. Today, we are pleased to be presenting to you on the topic of Azure Synapse versus Snowflake, comparing analytic data clouds.
Before we get into the heart of the presentation, we encourage you to submit questions via the GoToWebinar control panel to help make the session interactive.
We do try to answer all the questions at the end.
But during the webinar, in the event that we are unable to, for whatever reason, we will provide a written response document that we’ll post on Senturus.com, Along with the next slide here, where everyone asks, Can I get the presentation?
We put it up on our resource library. You can see the link there,
www.senturus.com/resources, and there’s a link in the chat, if you prefer to go there and go get it right away, we’ll put it there, and while you’re there, make sure you bookmark it, it’s got tons of great
analytics, business intelligence, content, that’s worth visiting frequently.
We post the recordings there, as well as the deck, and any other relevant content, including response, document, if we can’t answer all the questions.
So, our agenda today, after some quick introductions, we’ll get into the heart of the presentation or Azure Synapse versus snowflake comparisons and then stick around for a very brief Senturus overview.
For those of you who may not be familiar with what we offer and additional free resources, and then we’ll get to the Q and A.
I’m pleased to be joined today by several of my esteemed colleagues, Bob Looney Reeves Smith, and Gary Locke.
Mr. Looney leads our software development and BI architecture efforts here at Senturus, focusing on software and cloud architecture with Snowflake, Power BI, and Tableau.
Prior to Senturus, Bob was designing, building, and scaling BI software reports and dashboards for use by thousands of restaurants.
Reeves is a certified Microsoft Certified Master and Snowflake Snow Pro Advanced architect, with more than 20 years of experience, working on various development and data warehouse projects, within a large variety of industries.
And last but not least, Gary Locke is a versatile solution architect here at Senturus, with over 20 years of experience ranging across multiple industries and disciplines.
He’s an MBA with end-to-end expertise and BI who enjoys combining his background in business with the latest technologies in data and analytics.
My name is Mike Weinhauer, I’m a director here at Senturus.
Among my myriad responsibilities, I have the pleasure of hosting our knowledge series webinars, and before we get last thing here, before we get into the main presentation, we like to always ask our audience a question.
Get a sense for what they’re up to, relative to this presentation. And our question today is, where is your data stored now?
Pretty easy 3 choices, either on prem, VM based into the cloud, IE, sort of lift and shift?
Or are you using a cloud native app, such as Snowflake or Azure Synapse?
So go ahead and get those votes in here.
And we’ll give you a few more seconds to do that, and then we’ll share the results back with you.
We’re taking their time today.
Got about two thirds of you with your votes in here. Go ahead and get those last votes in, and we’ll close this out.
It looks good, and I’ll share this back.
So, most of you are still on prem, about two thirds there, quarter of you all using cloud native apps, though, and then the rest, doing, lift, and shift, So, always interesting to see what are clients and the folks joining our webinars are doing in their own organizations.
So, thank you again for sharing those insights with us.
And then, without further ado, I’m going to hand the microphone on the floor over to Bob. Reeves and Gary, gentlemen, take it away.
Thanks, Mike. Thanks for the introduction there.
Looking forward to sharing some great content about Azure Synapse and Snowflake here, some discussions with Gary and Reeves as we go through these various comparison points.
Both are great tools; we’ve started working with them with clients. And as we’ve done Sir, started discovering some of these interesting, oh, this works like this.
This other one works like, in a different way, type comparisons.
And so we’ve tried to pull those together into this webinar to, to share that information, maybe help you along with your decision making process, if that’s where you’re at.
So, we’re going to run through this, these points and talk through and do a couple of demos as we go.
So, we’re going to start with just kind of general overview of the platforms. There are both cloud based, kind of data, data cloud, data analytic cloud platforms. That means they’re highly scalable, and you get these demand-based computing resources, just your classic cloud advantages to being on platforms like this.
They both support the idea of creating SQL databases, so that as you can curate and structure your, your known data from raw data, you can get that into a SQL database. And it’s optimized, and all the technology under the covers is optimized for analytical purposes.
And, these architectures cover your typical use cases that you’re, you’re going to see in your environments these days, which is covering things like classic BI, you know, first of all, reporting, some of those type things.
It covers self-service use cases, as well as your data science and, you know, AI ML type use cases as well, so, we’ll touch on each and some of those points as we go through the webinar.
So, Gary, I’m going to hit you up to, kind of give us a little bit deeper dive into Azure Synapse is architecture here.
OK, great, thanks, Bob. Sure. So, with Azure Synapse, really, it brings together in one platform features and components that are available elsewhere individually in Azure and the Microsoft platform. But within Azure Synapse, you access it off of one workspace in the Azure portal.
So, within the workspace, there’s different components for ETL and storing and accessing data, compute resources, and database technologies.
So, for ingesting data into Azure Synapse, a flavor of Azure Data Factory is included, which is Microsoft’s modern ETL tool, and really a great replacement for SSIS if you use that.
For storing data and files, there’s Azure Data Lake Storage, which is integrated is Microsoft’s highly scalable Cloud Storage Platform.
For compute, Azure Synapse actually includes multiple different analytics pools.
First being this serverless SQL pool, which allows you to use SQL to directly access data in your lake.
And that can be like CSV files or parquet files where you came and hit multiple files that are stored within folders.
Besides that, there’s a dedicated SQL pool, which is Microsoft’s massively parallel processing engine, previously known as SQL Data Warehouse.
And that’s really designed for handling the largest of data warehouses, it’s good for when you have more predictable usage patterns. And you have, like, higher compute utilization.
And with a dedicated SQL pool, your data is actually stored in a database, and it can be accessed using T SQL, stored procedures, things like that.
And also, there’s Apache Spark. So, Microsoft has a serverless version of Apache Spark and Cloud, which allows you parallel processing framework that supports you in memory processing of your data.
And then, finally, within Azure Synapse is Analytics as well. So, of course, Azure Synapse integrates with Power BI and there’s also some Azure Machine Learning capabilities in there as well.
Great. That’s a, that’s a great overview.
I’m going to slide in a logged in Synapse, Analytics Workspace and lets you kind of walk us through a couple of pieces and parts and are in a real environment here.
So, accessing the workspace from the Azure Portal, this is basically your homepage, so, you know, there’s always things changing here, down below, but, over in the left menu, you can see it’s really easy, kind of broken into components. So, let’s start actually in the bottom.
So, if you click Manage, here, as you can see up top, and we’ll get into a lot of this more later, but you can see it’s all within one workspace. You can get to your different analytics pools. You can set up link services with all types of external connections.
Um, and then once you started getting into integrations, you can set up different integration runtimes, and trigger them via schedule.
You have integrated security within, you know, Azure Active Directory, and things like that.
You can pull in code libraries for working with Spark, and then, we’ll get into this a bit more later, but source control, as well with Git, then, you know, once you do get things fired up, there’s the ability to monitor, and see how things are running, stuff like that, But, say, yeah, we’ll get into more of that later. But this is definitely, you know, the important thing here, is that it’s, it’s a single workspace. It really gives you access to everything in one place.
Great. Thanks for the overview.
Alright raised, so we’re going to toss it over to you to talk, to talk to the snowflake architecture a little bit, and maybe do a little bit of comparing and contrasting there, OK, Thanks, Bob.
So, snowflake, it’s a little bit different, and some of that difference is that snowflake concentrate more on the database side and leaves the tooling and that kind of stuff for other partners in the space.
And part of that reason is, as you can see in the centre, there, or we talk about data lake, it’s from each of the three cloud providers.
So yeah, excuse me, Snowflake runs on all three of the cloud providers.
And because of that, they don’t have the ability to create, are they?
They do, but they could, but they, they choose to concentrate on the database engine, and allow the providers that are cross-platform to deal with the different, the different cloud providers. So, it’s kind of a little bit of a different concept from that standpoint. It is a database and a compute, and they call compute and snowflake, a warehouse.
So, it’s kind of a misused or multi use term that can have some confusion. So, when we say, we spin up a warehouse is what we’re saying is we’re spinning up compute.
So that’s the snowflake side.
They are able to integrate with any of the data lake, like an S three and Azure storage, or Google Cloud Storage here able to ingest data through the data lake, and you’re able to use any of those ETL tools that that or clot cross-platform.
So, great for an offeror, talking to the UI yet, no, that’s fine, and we’re talking to the UI.
Um, as you can see, it’s very similar, but it’s just from a SQL development sniped standpoint. If you look, we have current database left, you can write your basic SQL statements.
Um, certain databases have, you know, it’s the same concept as it, it’s the same concept that it’s database schema.
And tables just like every other database, it’s not, it’s it doesn’t use different. It doesn’t use different language while you’re used to on your on-premises databases. But, with that, you’re able to change compute. And if you click over to the, if you click over to the Warehouses, we’ll talk about that for a second okra Warehouses to Yeah. Right there. As you can see, we’ve created two warehouses here. And you can create. You can create as many as you need.
And here’s a really cool feature of snowflake. I can create a warehouse for any department, so I can track departmental costs very easily. And there’s no cost. We’re spinning up.
Or there’s no cost with creating warehouses. They have to spin up time or that they have the ability to spin up on demand and then auto suspend based on some settings that you can set. Their already cannot suspend it yourself.
But it allows you to scale up compute very easily. If you have something running, you could If you have something running and it’s not working on an extra small. You could you could spend that compute up and right now you can spin it up to one.
For Excel, which is has quite a bit of processing power or so, but the nice part is that that auto suspend, so they turn off when they’re not in use, and you can set that are spent from wherever, whenever you’d like.
Absolutely. one of the best features. Yes, so, and then that one piece to finish it off.
If you click on, if you click on partner connect, show it to them over to the right. Here you go.
And here’s where those, here’s where you can integrate those partners in to two snowflakes. So, if you wanted to use a five trend, you can come here, and it just has some steps that will walk you through it a little easier. Because it already is somewhat in your, if you’re kicking it off, it’s from your account. So, there’s a semi-integration there with some of these tools. It seems like, yeah, OK.
Thanks, we’re going to dive deeper a little bit later, but that’s a great intro. Thank you.
mm. So, the next thing we’re going to talk to is kind of wide robust tooling.
These are both, you know, leaders in this space.
So, they, they both have no solid web interfaces for management support, major third-party tools for data analytics and ETL.
They have already BC drivers and command line tools as well for automation.
Things like that see, the three visualization tools, that Senturus as partners with there at the bottom, all heavily support all three of these platforms very well. So, if you’re having issues connecting Snowflake with Cognos or Tableau or something like that, we’re more than happy to help with those types of issues.
Um, so, Gary, let’s dive a little bit deeper into the synapse kind of workspace overview and show us how you maybe use this in a development environment.
So, we briefly touched on this already, but so, the Synapse workspace, the web development interface has seen a lot of recent investment and updates within the workspace. You really have access to end-to-end BI capabilities. And there’s features for like data engineering, data science, data visualization.
As I mentioned, for data engineering, you have Azure data factory baked in into the workspace. It’s not 100% full Azure Data Factory as if you’ve ever worked with it.
You’ll notice there are some slight differences, but it’s pretty full featured. So, with Azure Data Factory, do you have a ton of functionality for transforming and moving data?
Uh, for example. Yes. So, here’s, you know, pipeline allows you to really orchestrate some of the transformations that you’ve made.
Actually, if you don’t mind, Bob, can you go over to Develop and absolutely show Do Flow, to start. All right. Yeah. Sure. So, Dataflow features a nice replacement for SSIS and has actually very easy drag and drop interface.
So, yeah, So, once you connect to a source So, of course, within Synapse, your source can be like, I think there’s 90 different types of sources. But, once you have a connection to your source, to show that, if you click that plus sign there.
And, as you can see here, you know, once you have a connection to your source, you can see all the different transformation activities that are available. So, this is similar to like an SSIS and where you can start getting into things like joining and doing lookups and aggregations and pivots and then pivots.
And you know, basically you can string activities together within this drag and drop workspace to make it very user friendly.
And really, most of the transformation activities in here are available without you having to do any real coding.
Occasionally, you might want to pass parameters back and forth, great variables, things like that, but for the most part, it’s, like, drag and drop. Otherwise, on this develop side, you can write SQL scripts directly against your SQL pool.
Or you can actually work with your data directly in an Apache Spark pool via Notebook.
The, yeah, I mean, in the web browser, kind of, interact directly with your data, which is great.
Kind of gives you a little bit of the kid, pre written as you get ready to go here.
Yep. And then, yes, you had showed below that with the integrations when you start getting into pipelines is where you can really orchestrate all kinds of activities, including data flows.
Synapse Notebooks, you can actually integrate with Azure functions, data bricks, Notebooks, you SQL scripts, web APIs, stored procedures, and even there’s some machine learning down at the bottom there.
Yeah, so it’s, you can see, you can get to a ton of different things depending on what your expertise is and what you’re already using within your environment.
Then also, you know, as you’re working in here, you can integrate directly into get, so up in the top there. So, we’re working in the dev branch, but you can actually save and commit your changes directly into Git, which is very nice. There was a quick, quick setup.
I guess it’s the last note, you know, with the SQL pool’s, whether dedicated or serverless. You can, you can also work in SQL management studio if that’s where you’re comfortable. Or Azure Data Studio. So, you can connect, just like you would connect to any other database.
So that’s a nice feature. Although, when you do start working in there, you will realize that it’s not 100% like, working with, like, an Azure SQL database or an on prem SQL database. Some of the features are different, or just aren’t there.
Are Reeves’, let’s talk a little bit about snowflake, and you’re going to develop, again, Snowflake.
Already, Like I showed.
with the web UI, you can do a lot of your development within the web UI.
But there’s not really any integrations.
So, what I mean by that is code source integration, so there’s a tool that I use.
You were seeing it? Perfect, OK, OK, thank you.
So, there’s a tool that I like to use and its data grip and it’s a really, full, featured UI for Working with Databases, and as you can see, I can do all the same things that I was doing that I was doing from within the within the Snowflake UI.
But now I’m able to, now I’m able to change or track those changes very easily, because this tool integrates very well with Git. And I’m able to see and commit changes as I’m going along. And I, and another nice feature that it gives you is, is code completion.
I can see, I must have typed something incorrect here, and I’m like, oh, it’s demo DB.
I can come in here, update that, and this UI understands that, OK, those fields are, those fields are, are now valid, and it’s not, it’s, it kind of helps you out. So, I think, that’s kind of a nice piece to this.
And then, I can make that, I can make that commit, right, I can come in here, and I can.
Oops, that’s the Maybe I didn’t know that. But I can make a commit here on that code change, and then say that up to my source code repository. That is the one thing that the UI doesn’t give us, right? In.
the UI, from here, I can’t keep this code, are put this code into a repository without cutting and pasting it out. So right now, that’s a small limitation with the way that their UIS kind of setup. So that’s where I like. Tools, like, you know, a date of grip that allows me to do that kind of stuff.
And I can keep my projects and I can, you know, the authors deal with any type of database, right. I don’t have to be dealing with a snowflake database. I can, I can deal with any of the other providers.
So, I can set up a connection here. So, this, this environment, I think, is really, really nice for working with Snowflake.
And, yeah, I would assume with both of those, you could then just layer on your kind of CI CD processes, as far as promoting changes to staging and right things like that that you’d have in a typical enterprise environment, right.
And then, with the other tools, like what I showed on that, it’s in the marketplace, a data marketplace, now, the other tap, sorry, the partner Connect.
You’ll be using, you’ll be using the interface that they app with those.
All right, Well, let’s move along, and let’s, let’s talk a little bit how patent these two tools are passed versus the platform as a service versus software as a service.
If you’re unfamiliar with those terms, like this, Microsoft provided graphic here, that kind of talks about where, each of the, there’s three commonly used terms there, and, but, it’s, it, it is a key differentiator here for reasons we’re going to talk about. So, Gary, I’m going to let you, go ahead and talk about the Azure Synapse side of being a platform as a service.
Sure. So, you know, being a platform, it is kind of a collection of Azure resources, which I mentioned earlier. They are available individually, but this brings them together, really, as a platform.
So, you know, as you go into the Azure portal and you want to fire up Azure Synapse, you’re basically, you know, creating some different resources within Azure. So, you’ll, you’ll want to either choose an existing or create, like, a resource group.
And, at the same time, you’ll spin up edge data, like storage, as well as your, your SQL pool.
Yep, so, here’s the, here’s the setup page that you would walk through, and you would choose your subscription.
And, like I said, either you would choose an existing, or create a new resource group, then, set up a workspace. Choose the region, and one thing we did find is, you as we’re kind of talking about this and working on the webinar, it’s interesting to see that the pricing for some of, these features are actually different in different regions. So, like, a west US, two versus a West US. The pricing can be different.
So, depending on where, where are you doing your work and where are some of the other services are held, you may want to consider that.
But, really, at its core, once you fire up Azure Synapse, at its core, you’re starting off with a serverless SQL pool, Azure Data Lake Storage and an Azure integration runtime.
I don’t know if you want to bounce over to the actual environment that you had spun up. All, right, perfect. And go to manage.
And up top, under Analytics Poolesville, SQL pool. Yes, when you start off, you’re going to start off with a Serverless SQL school. It’s called built-in.
This one cannot be deleted.
So that’s kind of just, it’s there for you when you need it. And with the serverless side, it’s similar to snowflake in some ways. In that it’s, it’s great for ad hoc querying of data within your lake.
So, you can use T SQL without having to, or any reserve any capacity in a dedicated SQL pool.
But also, here within SQL Pools, you can fire up, you know, a dedicated SQL pool or multiple of each.
Then, under added Analytics, pools is where you could also fire up an Apache Spark pool, depending on what your needs, what your needs are.
Great, and when thing we’ve found is what these dedicated SQL pools, they kind of tie 1 to 1 with the database. Is that set the right way to think of that?
Yes, Yep, and I’ll get into that little more later, but that’s definitely the case. And then we get into pricing a little bit more later, just to give a little more background on that.
Perfect. But, so, you see, you have the SQL pools and then down below, you see external connections. So, that’s just a great feature within here with linked services. So, when you initially start off, you’ll have to link services.
one is connecting to Azure Synapse analytics are kind of your SQL pool, and also to your Azure data Lake store Bridge. So, that allows you to, you know, start doing integrations between those two environments. But you can also add linked services to, like, 90 other things.
And those. Yeah, that’s great. So, you know, within Azure, you can connect to, like, blob Storage, or Cosmos DB, or if you just want to connect to an Azure SQL database, maybe some transactional data out there, you can do that. If you want to use additional compute, you can connect to Azure Databricks, or an Azure function, Azure Machine learning.
On the database side, you have, you can actually connect to like Amazon, Redshift, DB two, Google big query or Oracle, and then with files, you know, if you don’t, if you have some data that’s outside of Azure, you can actually connect to S three.
You can connect to Google Cloud Storage, Perfect.
And then also, when you when you start off, you’ll start off with one integration runtime, which is basically in Azure, you know, a place where compute happens.
But you can also spin up additional integration runtimes here. Whether they be in Azure or self-hosted.
Which you can have your on prem server that allows you to do some types of activities there as well.
And these integration runtimes, these are for pulling in on prem data in this case.
You can, Yeah, so if you did have almost like a gateway. It, like, it’s, yeah, it’s similar to a gateway. Like in Power BI, where you set it up to zero point or work on an on prem server.
and then yeah, so, like recently, I worked with a client that had data on prem in an Oracle database, and you basically set up that runtime and then pretty quickly, you’re working with on prem data, and pulling it in, and then, you know, transforming it and working it in with other types of data as well.
Great, thank you.
Alright, Reve, same thing, but let’s talk about Software as a service now.
So, let me have, I’m going to get a screen, let me share the screen. So, it’s not like it takes a little bit different approach, right? It is abstracting as much of that as it can to just make configuration a little bit easier.
And, as you can see, similar to the pricing that we’re looking at from a snap standpoint, yeah, the same kind of idea here, except for a big, is, you can pick any of the cloud providers right into AWS, Azure, or Google Platform.
And why that becomes important, is, as you’re searching for a cloud, if you’re searching for a cloud database, you don’t have to go with the provider based on the cloud. You can pick what you want to do, and then you can pick the cloud.
That is, is most relevant to you, and where I see that is, if you have, if you have a lot of data lake Asset’s, already, out there, and a cloud, you don’t have to move those data assets out of that cloud, and into a new cloud.
You can you can just go. You can set up snowflake within that region, and you can actually set up snowflake within any region, right? And you can set up multiple.
You can have snowflake, and AWS, you know, Snowflake in Azure, and they can actually replicate between each other.
With snowflake is what you’re doing, is you pick a region, and you pick, or you pick a provider, you pick a region, and then you basically decide, you know, what level of features you’d need.
From a standpoint of do I just need standard enterprise business critical, or BPS?
And those features just come on a per cost, or cost per credit basis.
So, your just your credit just costs a little bit more as you need more features.
So, these, these three are pretty standard. This one is if you decide I want to be in my own VPC. I do not want to share the VPC with anybody.
And that comes with, you know, this is, this is a little bit different, and it creates and make some changes on availability of resources as well.
But these three are the pretty standard ones, and then you just pay per monthly charge, on per monthly, charge, on, on storage.
So, there is a no there.
You could say buttons and nobs to turn are very limited from a standpoint of you just pay per credit.
Well, resources are running and he’s playing on years pay a storage fee per month, so it’s kind of a little bit different from that standpoint. But that allows you to, you’re not managing as much, right? And what I mean by that, and the big feature that we’ll talk about is snowflake doesn’t have indexes.
And anyone that’s worked on database, they might be going, oh, my word, but I can attest to the fact that I’ve worked on a petabyte scale database, and we, and they’ve performed very well.
I’ve got no queries where we’re running, you know, multibillion row records, and, and they were coming back, you know, and, and, very quickly.
So that’s not necessarily an issue, which allows you to concentrate on, just getting, you know, just getting requirements done, and not having to worry as much about tuning.
I found that use case interesting, where you could replicate your databases across clouds, but still be in this, you know, in a snowflake, very familiar sniffs like environment.
Right? And it’s, it can be an easy migration path, right? So, I could, I could migrate my data over your application and then turn off the previous turn off the previous plot, if I wanted to.
So, it lends some unique, unique ideas where you’re not necessarily tied to a cloud provider.
You can move, know, you can move within three cloud providers. Fairly easy. Yeah.
No, that’s a, really, that’s a really cool come use case.
Alright, so we’re going to keep talking a little bit more about architecture, at least as it pertains to compute. And this is little downhill, weeds a little technical.
But, um, it’s also very important to understand why, why these two systems are different.
So, no, you need compute capacity, anytime you’re performing the actions on your data.
And then, in both systems, and we’ll talk a little bit deeper on pricing, bed capacity is always priced by how much you use, how long you use it, and then what performance level you’re looking for the system.
Said, Gary, we dive deep with us.
A little Nana, kind of how Azure is approaching compute capacity with Synapse.
Sure. So, of course, within Azure Synapse, there’s multiple options when it comes to compute.
But focusing specifically on dedicated SQL pool, which, as I mentioned, it’s Microsoft’s parallel processing engine within SQL. So, there’s some interesting things to note.
First one, as you mentioned, Bob, a dedicated SQL pull aligns with one SQL database, and then also, that database can only be accessed from the SQL pools running, which is different than the Serverless SQL pool, which is always available.
But the dedicated SQL pool can be scaled, stopped and resumed either manually or via the portal, sorry, manually via the portal, or programmatically via an API.
Then, also, you can isolate and prioritize your actions using workload groups within the pool so that, you know, loads don’t take up too much too many resources, or, or maybe one individual doesn’t take a lot of resources either. But really, you know, thinking about architecture and under the covers, as you can see in this, this screenshot here.
The way it works is that there’s a control node that handles orchestrating your queries and passing operations to compute nodes.
And then, really, those compute nodes perform the work in parallel, using between 1 and 60 different nodes, depending on the service level you choose, so if you have the lowest level of service, you’ll have one node, and if you pay for the highest-level service, you’ll have 60.
Um, so it basically, it splits your queries into 60 smaller queries.
So, running the queries in parallel, of course, is where you get the performance boost.
But then also, you know, with a dedicated SQL pool, your data is actually stored and managed in Azure storage.
So, when you go into that database and you’re creating tables for your data warehouse, you actually had the ability to choose a distribution method for each table within the database. And I won’t get into the details here, but you have three options. There’s hash, there’s round robin and replicate, and basically the different distribution methods available.
They each have pros and cons and based on the size and usage of the data within the table. And this kind of gets into what Reeves was mentioning about.
Know, do you want fewer settings view out more settings? So, this gives you the ability to make those fine tunes that you might want to make and, you know, I mean, I, I’m not trying to knock Snowflake at all here, but, you know, the black box approach, a lot of times, is great until it’s not great. And then you’re like, well, what’s it actually doing? And then you don’t have that ability, So, just, definitely something to consider there.
Thanks, and so, so, just to clarify, you are saying the performance setting, you’re going to get more nodes and more kind of resources as you as you scale up. Exactly, your overall dedicated SQL.
And so, re-used, snowflakes, that’s all, that’s a little different.
Yeah. You have, you have the concept of compute and it is completely independent.
You can have as many as you want running. I would imagine. There’s some actual limit, but you can have quite a few.
Quite a few warehouses is running and they can access the same data and you don’t have to worry about doing data. You know, doing data federation or moving data out into different.
In the different nodes, it handles all that because it’s using, it’s using basically S three or Azure storage, that’s letting Azure storage and, and those services that are able to provide multiple connections to.
The same to the same data. And it doesn’t have to worry about that. So, he can set up these multiple Reese. These multiple computes like warehouses so that you can access that data.
So, you can run your data transformation during the day without any concern. And it, it will not interfere with, it will not interfere with the other warehouses.
So, it’s a little bit different from that standpoint, and it has the ability to scale, you know, to scale out.
I can, I can create a warehouse that has multiple nodes associated to it.
And as demand increases, those nodes can be added in.
Or, we could say, more warehouses are added in, handle that load, and then they will spend down as they’re not needed. So, it’s pretty nice from that standpoint.
I can also scale up and scale down, two, to bring more compute into, into data transformation or data, no query that I might need.
So, it’s different from that standpoint and You know, it’s like we’re saying is less nob and that can be kind of scary, but I’ve been in enough situations where I’ve seen that.
I haven’t had to I mean, they have somewhat of an option, which is called clustering, which enables, you know, better pruning of the data.
But we haven’t seen the need to do that in huge datasets.
So, I mean you can have some benefit in doing it for sure, but it hasn’t been what I would term as needed because the performance wasn’t there. So just a little bit different from that stamp.
As it’s kind of interesting to see how they took two different approaches to assault, to those solving this idea of contention and, hey, I want to load data while other users are querying data.
Um, and, they kind of, both got there, slightly different ways, but both systems work well as well.
Going to touch on a couple of system specific things now. So, we’ll start with Azure Synapse and then, Gary, I think we kind of already did a short demo here, this one, but you want to keep talking to this.
Sure. Yeah, this will be brief. I mean, this is just, one thing that’s nice in Azure Synapse is the Apache Spark pool. So, you know, if your team’s expertise or preferences work, working directly in a Spark notebook.
And, you know, say you want to quickly load SQL tables into data frames.
You can spin up an Apache Spark pool and develop in Python or Scala or c-sharp or Spark SQL.
And its successful right in the synapse workspace and then of course the data involves here. And the notebooks can be integrated with data flows and other activities within a pipeline. So, when you start scheduling your jobs to run, you have multiple different computes. So, you can use with Apache Spark being one of them, and actually in a similar fashion, you can set up a linked service to Azure Databricks as well, and develop in a Databricks notebook or a Python file there.
Yeah. It really just kind of depends on your team’s capabilities and what technologies you prefer.
But it seems like you get to just about anything you want to use all in that same environment. Very, very slick.
And review and attack and a couple Smith snowflake specifics Here. Pass me back the screen. kind of walk through a demo.
And what we’re going to show here is, I’m going to show, kind of, two things really quick is we’re going to talk about Cloning, which is a unique feature, which is unique feature to snowflake.
But to kind of set that up, I’m going to, I’m going to start back a little, I’m going to back up a little bit, and kind of just show kind of a performance thing. So, kind of just do that. So, I’m going to create or replace this table, which means I just created this table and there’s no data in it, right?
So, Rohn, you can see there’s no, there’s no data in this table.
So, I’m going to load this NBA copy command which there are files that we have out on Azure storage that we’re going to grab and load into this table.
So, I’m going to kick that off, and that’s going to take a second.
As you can see, it’s running.
This is loading a little bit ETS or doing a little bit ETL into that table.
And it’s going to take seconds somewhere around there.
I’m going to show you then how many rows run that.
And as we saw when we first, when we first ran this, there were no rows, right? So, we now have all the we had these files that loaded.
And to kind of show you from a performance standpoint, we just loaded 37 million records in what was the amount of time I go back 23, 24 seconds. And that is on.
As you can see, we’re on an extra small warehouse, so that’s a $2 an hour chart.
So, I think if I were to do the math on that one, it was pennies to just load that, right?
And nothing was there’s no sleight of hand here. I haven’t sorted anything. I haven’t done anything.
And if I were to just look at aggregations among that 37 million rows, you can see that comes back in less than a second. And I just aggregated all that data in less than a second.
So, from a performance standpoint, I want to say snowflakes off the charts.
but let’s just pretend that this data is our production data, and we want to run some tests.
We want to run some, you know, set some of our, some tasks are some scripts against that test data, but it’s like, how do we always do it? Because in dev, we never had the, we never have the, you know, the data, exactly, the same. Can’t replicate it, the bug and dev, because my three-row dataset just works every time.
So, what’s a nice part that you can do? And its snowflake, because I can create a database that has a clone of that database. And it’s a metadata operation.
So, it happens very quickly.
And as you can see, the current demo is not over here. And if I refresh, I now have a current demo.
And I have a table in here, and if I were to preview that table, you can see that I have that 37 million rows.
So, all of a sudden, I’ve just created that exact same, that exact same thing.
So, if I were to now go in and update that query, which will take a second or two, because it is a update over that 37 million rows, I think it’s 500,000 rows that I need to update.
So, it takes it Yeah, half a million rows.
I’m going to steal that ID, I’m going to steal that ID, then go back. I didn’t mean to do that.
I want to put it there, just Sleight of hand, OK. So, I’ve got a cell, if I want to look at what my current, what the current demo shows, from a price standpoint.
I can see that I’m at about $23, right, because my process that I was doing here was my eye.
ETL is short, my machine learning, or my, whatever, you know, increase.
The price for just, this customer, was, my, was my scenario.
So, I can see that that price has been affected, and then I can go back to the demo database, and I can see that nothing’s been affected.
So, I’m able to run these transformations against that production dataset.
And is what ends up happening is I only create micro partitions if the ones that I changed.
So, you’re still using that same, you’re still having access to that same data that says that, that’s in this database, along with all the records that you changed here. So, it’s keeping that differential, and understanding how to piece those things back together.
So, you haven’t heard production at all. I haven’t touched this at all, so nothing has changed there.
And, you know, that’s, that’s really cool. From a testing standpoint. I can test against the same, exact same thing that I’m seeing in production.
And if I want to see both of the results together, right, I can query both of them. At the same time, I can see the difference between the two.
And then, just as a, you know, just as an example of time travel, I still have access to go back.
If you remembered, I grabbed that query ID, I can say, what did this query look like prior to this query ID.
So, if I do that, I’ll see the exact same data that’s in exact same data that’s in my production database, because I’m saying, what did that look like prior to that, with that execution rant.
So that’s, it.
Opens up a whole new way of being able to test because you’re able to test against production data very easily.
And then I can drop that database and be done with it.
And it’s, and now it’s gone.
And I have an affected production at all.
Let me just prove that by nothing in production has changed.
So that’s kind of a neat feature.
That’s kind of a neat feature for, for snowflake, is the data or the cloning and the time travel.
It’s pretty slick.
I think we’ve already touched on this last point here, the less administration versus a synapse having kind of more settings and more configuration ability.
Um, so, let’s talk pricing Just a little bit.
We tried to put together something high level to, you know, pricing is so, customer specific, but we did want to touch on how it works. We’ve already talked a little bit on some of this.
But compute and storage are billed separately in both products and its consumption-based compute, as we’ve talked about, all, about how much, how long you use it and how powerful it is when you’re using it.
And then that last point, there are options to pre-purchase or reserve some of this, some of these resources to save you money.
So, if you’re, are using them, and you’re running them all the time, and think about that.
So, Gary, you know, kind of walk us through here, this and apps, approach to pricing.
Sure, yeah, and this, this pricing is focused specifically on pricing related to the Azure Synapse SQL pools and not anything else. That’s available like transformations or other integrated services. But, for the most part, pricing is broken down into three different components, as you mentioned. You know, first being compute.
So, for the dedicated SQL pool, I mentioned earlier, you choose a service level based on your needs.
So, within those service levels, they have what they call Data Warehousing units. You can see in the screenshot there, the Smalls Being 100, and you, can see there, it starts at a dollar, 20 an hour.
You can scale all the way up to, I think, it costs like $360 an hour, and that’s where you get the full, you know, 60, 60 nodes and things like that.
But the one thing with dedicated SQL pool, if you know you’re going that route and you want to reserve space, you know, Microsoft does offer up to like, I think a 65% discount with three-year reserved pricing.
But, you know, depending on your needs, with Land Anywhere. Kind of between the lowest and highest amount. With the serverless SQL pool, it’s pretty simple.
It’s $5 per terabyte of data processed. Since if you’re not really storing data in a database, you’re really just consuming data directly against your data lake.
So, it makes it a bit easier.
But, you know, there’s different pros and cons to that, of course.
Within the dedicated SQL pool, you are storing data in your database. So, with that, it’s like $23 per terabyte of data process. You can see on a note there and kind of what’s included in the data storage.
And then with both varieties, you’re, you’re consuming raw file storage that’s in your data lake. And you can see the screenshot there, yes? like, I think, 15% per gigabyte in their premium tier.
So there’s different components and different pieces to the pricing.
Yeah. Lots of nuances for each organization.
And Reeves you kind of touched on some of the additions and credit pricing here earlier, anything to add, here.
Now, just, yeah, you’re paying, basically.
You’re paying basically compute costs and storage costs that are eager as costs that everybody has to pay for, if you’re addressing data in and out.
But if you’re all within the same provider, that’s, that’s not going to be, that’s not going to be a cost You pay for, but it is a cost that could happen, but it’s pretty straightforward. And they have, I think, as, as I was shown on the screen, A little bit discounted price if you buy price, if you buy storage.
If you pre buy storage, sir.
You want, one thing, if you that I was thinking about is, you can, you can also query within smelt like I didn’t mention it.
You can query within snowflake data that is in your, your data lake, without having to bring it into without having to bring it into snowflake.
Once it lands in snowflake, the causes, it’s fairly minimal. So that if there’s, the reason I would see is that as if you wanted to look at files. But don’t look at them very often, and just don’t want to bring them into your system. I can set up what they call external tables. And those external tables. I can query the data and the data lake without bringing it in.
So, we’ve tried to mock up a couple of just, use cases.
And, again, your mileage may vary based on your organization, but in this first example, we just said, OK, well, let’s pick the smallest compute that you can spin up on each platform.
And let’s assume a one terabyte SQL database and, you know, kind of ignore all other cost and just that you run that compute all the time.
And so that’s kind of what your, your monthly cost breakdown looks like.
In that case, you know, you can argue fair, not fair, based off of so many factors here, but we tried to, kind of flip that on its head with the second example, where we showed more of it.
Variable demand pricing, and using that auto start auto pause feature in snowflake can really save you a ton of hours.
And then, over on the Azure Synapse side, maybe you’re, you’re only running it during your business hours or something like that.
Know, there’s so many different ways to look at pricing and so many different organizational specific use cases. We would be happy to dive deep on that with anyone, but it’s hard to wrap up at a high level to say, oh, this is the one to pick based off of a price alone, if that’s your, your driving decision point.
So just a couple of things to think about there.
Then a little bit of a recap table to kind of show maybe the strengths and weaknesses of each platform.
I guess I knew Gary and Reeves could probably argue certain points on every single line here.
Um, you know, I revisit one of the last things we talked about. It, was that SQL Stored procedure? you want to touch on that? Because we haven’t, we didn’t, that’s one, yeah. That’s the one feature that I gave you, guys.
And one of those, one of those differences, the fact that a stored procedure cannot return result sets.
But it’s actually not that difficult, but there’s also the fact that you’re not going to be pulling back results from stored procedures.
So, there’s just a little bit different, and that’s the thing, and I want to call it out, and that’s why it’s Asterix, because it is on the roadmap to add SQL stored procedures.
So, that is something they’re doing and it’s just something that, they don’t, they don’t currently, got you Gary.
here, anything you want to point out at the high level here on the Azure Synapse column, maybe you don’t agree with or you think should be a heavy consideration.
Know, I mean, I think I already mentioned, you know, kind of the ease-of-use thing. I mean, I guess that’s a preference but, I mean, they’re both great platforms if you’re a Microsoft shop already. I mean, I think Synapse makes sense. But if you’re not and maybe you’re already heavily invested, like ETL tool against radicchio or, you know, data stage, I mean, Snowflake is definitely a great option as well.
That sounds great.
Well, Mike, I think we’re going to hand it back over to you at this point, that the kind of finish rates are comparison of the two products.
That was pretty seamless between the three of you. Everybody sticks around, we got some questions here in the question panel. I’m going to run through a couple of things here. As you can tell, there’s a lot to consider. When trying to decide which one is the best data platform for you. So, we have’s interests, we do this all day, every day. We got out of bed in the morning for our pants, eats and breakfast.
And we can help you do this to evaluate your unique environments and requirements and provide unbiased recommendations, whether that’s cloud hybrid or on prem, give you best practices to optimize your spend and performance, as well as guiding you, and, or even executing the implementation. So, we have that experience across these different vendors and can offer this expertise and a truly unbiased.
Quickly about Senturus, we concentrate our expertise solely on business intelligence with the depth of knowledge across the entire BI stack.
Clients know us for providing clarity from the chaos of complex business requirements, disparate data sources, and constantly moving targets.
We’ve made a name for ourselves based on our strength at Bridging the Gap between IT and business users. We deliver solutions that give you access to reliable analysis, ready data across your organization, so you can quickly and easily get answers at the point of impact, in the form of the decisions you make and the actions you take.
Consultants are leading experts in the field of analytics with years of pragmatic, real-world expertise, and experience advancing the state-of-the-art.
We’re so confident in our team, and our methodology that we back our projects with a 100% money back guarantee, that is unique in the industry.
We’ve been doing this for a long time, about two decades, at this point. Over 1350 clients, and 2500 successful projects.
Working across the spectrum, from the Fortune 500 down to the mid-market, solving business problems across virtually every different industry, and functional areas, including the office of finance, sales and marketing, manufacturing, operations, HR, and IT.
Our team here at Senturus has built large enough to meet all your business analytics needs at small enough to provide personalized attention.
We do, again, invite you to visit Senturus.com/resources
and bookmark that you’ll find today’s webinar, other webinars, and a bunch of other tips and tricks and great things BI, including our fabulous up to the minute easily consumable blog.
In terms of upcoming events, similar to this one, we have an anomaly. We have our Power Query Editor Jumpstart that we’re going to be doing actually on a Wednesday.
And then we have which is next Wednesday, same time.
And then we’ll be doing six ways to publish and share Tableau dashboards a couple Thursdays from then, so head on over to our resource page and register for that and keep an eye on it.
We’d be remiss if we didn’t mention our comprehensive BI training offerings. Bob mentioned the three platforms that we focus on here.
Analytics platforms that we focus on here at Senturus is Cognos, Power BI and Tableau, and we offer training in all of those and are ideal for organizations running multiple platforms, or, those, perhaps, moving from one to the other.
We offer all the different modes, tailored group sessions, mentoring, instructor led online courses, and self-paced e-learning, and can customize that to best meet the needs of your user community.
And then, finally, we provide hundreds of free resources on our website. Some of those I’ve already alluded to, and I’ve been committed to sharing our BI expertise for over a decade.
I went through that very, very rapidly.
We do have some questions that, hopefully, gentlemen. I’ve had a chance to take a look at the panel here.
And one of the questions was, can the version of Azure Data Factory and Synapse do that SSIS cannot and how about the converse of that?
Yes. I mean, you know, on the spot here, I don’t know that I can pinpoint something that you can’t do in one or the other.
I think if, depending on how good you are, you could probably accomplish the same thing in both, but the idea is with, With Synapse, you know, it’s designed for the cloud. So, it just makes it so easy to connect to all different types of sources, you know, whether it be something that’s already in Azure, or if you want to connect to something in AWS or Google Cloud. It’s just the idea of that, you know, with self-service and, you know, cloud in mind, this tool is designed for making it very easy to consume data from multiple sources.
Hopefully, that answered that question.
Is the data pipeline in Azure Synapse the same as Azure Data Factory?
So, it is mostly the same. So, Azure Data Factory would be the full featured data set or it has all the capabilities. in Synapse there are a few key things that are missing. But of course, and, you know, as is usual with Microsoft, they have a roadmap and, you know, things are on the schedule and just not there yet, but they are really the same. When you start working with them, they’ll look and feel exactly the same, but there are a few settings that you might find aren’t there and synapse?
Got it. On the snowflake side when compute is turned off.
What kind of performance can be expected from the first query issued?
Compute is stopped. It will start back up.
Most of the time, it starts up in a very quick manner, I would say, a couple of seconds, there are times I’ve seen where you’re looking for resources. It takes a little bit of time to spin up, but it’s nothing like spinning up your own compute. It’s, it’s almost instantaneous.
And the only place that that can become a bigger issue is with your own VPC, and that addition, you don’t have this pool. You don’t have as much of a pool spinning in waiting. So, there’s, there’s a little bit of a difference here, but within all the rest, it’s pretty quick.
I’ve seen, I think once or twice while I’m trying to do work where it took maybe 30 seconds or more, but not very often. So, it spins up very quickly And, again, it can spend down so you’re not paying for that cost.
Great, good to know.
So, we actually have a great webinar up on the Senturus’s resources website where we discuss five different ways of doing ETL and we do have several that talk about various different types of data preparation. Some are platform specific, some not. So, definitely head on over to that Resources page, and you can find that one.
The one that we’re referencing earlier is, was presented by Andy Kinnear.
So, you can find that up there on the Senturus’s website.
There’s one last question about what are the key criteria that would make you choose one over the other?
Gosh, that’s the million-dollar question, right? And I think I would argue that, I hate to say it, but it depends. And the key criteria are going to be the ones that are most important to your organization.
Is it scale, is it availability, is it costs?
And what are the relative importance of those specific features? So again, those are the sorts of things that we help people do all the time, so if you want to drill into that, we’d be happy to chat with you about that.
If any of you have anything to add to that, that I might have overlooked but that seems like.
So, with that, I’d like to thank our presenters, Bob and Reeves and Gary. Thanks for taking time out of your day today to join us for the Knowledge Series presentation. If you have any business analytics needs, we do encourage you to reach out to us, you can e-mail us at [email protected]. Still use the phone, we’ve got a 888 number down here for you, and we’d love to hear from you. So, hopefully, we’ll see you all very soon on the next Knowledge Series presentation, and have a great rest of your day. Thanks a lot.