Despite significant advances in data architecture and data technologies over the last decades, data modeling continues to play a crucial role in attaining timely, meaningful analytics. If, how, and where data modeling is performed have significant implications for performance, data governance, cost and user productivity.
In the Azure data architecture, data modeling can take place at different points and across various components. It’s important to understand these areas and how to effectively apply modeling best practices to them.
In this on demand webinar, we explore and connect the different elements within Microsoft’s modern data architecture. We discuss where/if data modeling fits, the tradeoffs, and use cases within each stage of the Azure architecture.
You will learn about
- Different data modeling approaches
- Azure architecture
- Best practices for modeling by stage
- Data ingestion/data lakes
- Transformation layer
- Data warehouses
- Power BI visualizations
Principal Business Intelligence Architect
Pedro is an industry veteran with over 20 years of analytics experience. He has been instrumental in architecting and implementing large-scale analytics systems, data models and data warehousing systems for clients in a variety of industries. Keeping pace with the constantly evolving BI industry, Pedro’s rich skillset includes Cognos, MicroStrategy, Informatica, Tableau and Microsoft Azure.
Senior Microsoft Solutions Architect
Fluent in the Microsoft stack, Steve has more than a decade of experience overseeing the architecture and design of data warehouse, BI, reporting and dashboard solutions. Curious about new technologies, he is constantly downloading free trials of new platforms and arranging meetings with their product teams to discuss an ongoing relationship where he is granted a license free-of-charge in return for ongoing feedback and thoughts on the current state and future releases.
Hello and welcome everyone to today’s Senturus webinar.
The topic for today is Demystifying Data Modeling across the Azure Architecture.
We’ve got a full agenda today.
We'll start off with some introductions, discuss what data modeling is and some data modeling myths, and then where to model: OLTP sources, the warehouse, data lakes.
We’ll touch on a little bit of additional resources and about Senturus and then we’ll get to your Q&A at the end to answer any questions that haven’t been answered in the chat.
Our presenters today are Pedro Ining.
He is a Principal Business Intelligence Architect at Senturus.
Pedro is a veteran data architect and his skills have evolved with the industry through multiple iterations of BI platforms including Cognos, MicroStrategy and Tableau.
We also have Steve Nahrup, our Practice Lead for Microsoft Fabric.
Steve is fluent in the Microsoft stack and has more than a decade of experience overseeing the architecture and design of data warehouses, BI reporting and dashboard solutions.
And I’m your host, Todd Schuman.
I have over 21 years of experience with analytics across multiple industries, and I'm also a Microsoft certified Power BI Data Analyst.
Before we jump to the content, we want to do a quick poll just to get a read on today's audience.
If you could take a second to answer the following question, we want to know: where do you typically do your data modeling?
Do you do it in the data lake, the data warehouse, or in the BI tool itself?
Go ahead and spend a couple seconds here just to click and vote on what you typically do.
This is a multiple choice, so if it’s more than one, feel free to click multiple choices here.
And I’m going to leave it open just for a couple seconds to get a full audience participation.
So we're about two-thirds there. All right.
That’s pretty consistent.
We have a kind of a tie between data warehousing and BI tools.
So let’s go ahead and share the results.
So you can see about 66% of you are doing it in a data warehouse.
Another 66% of you are doing it in the BI tool.
The data Lake is very small, about 4% there.
And we’ll kind of touch on these three different places in today’s webinar.
So thank you so much for your input.
That said, I’m going to turn it over to Pedro now to go ahead and present the content.
Thank you, Todd.
Make sure you guys can hear me. Assuming that's an OK... that's a yes.
So data modeling is such a wide topic, right?
So let’s start the webinar a little bit with some definitions.
And as we get into this whole topic of data modeling for analytics, there are a lot of different goals organizations are trying to achieve, right.
Ultimately, our analytical goal is to do analysis of data from disparate business systems, right.
So a business is producing a bunch of data, and it could be maybe just simple sales, the business producing data for sales and orders.
But increasingly, a lot of our businesses are getting Internet of Things type data.
We’re getting data from Web APIs.
We get data, of course from our traditional ERP systems like Salesforce and Oracle.
And we have to make sense of everything, right?
Our goal is to extract some or all of that data, and we want to integrate that data so that ultimately we can get clear, concise and accurate delivery of reports and visualizations, so we can analyze the performance of our business, etc., and also show it maybe in a more visual format.
That’s really kind of our goal from analytics.
And one side benefit too is that we want to try to work towards a self-service environment so that we don't really become a report factory, right.
We want to be able to empower our end users to be able to do analytics without a lot of roadblocks and things like that.
So those are some key goals.
There’s a lot more out there, but those are some of the key things that we want to do.
When we talk about data modeling for analytics, you can come up with a lot of definitions.
Here’s one I think is pretty concise.
It’s a process, right?
It's a process used to visualize and understand the structure of information and support those analytic goals, right.
And the data architecture for analytics is all the things, from a design and structure perspective, of the data environment.
So that means the source data environment that your organization has, all the different systems, all the business processes that are producing that data, understanding all that, then eventually the technologies that support the data analysis and reporting.
That means all your databases, your data warehouses, how you’re ingesting it.
Of course, now we're going to be talking a little bit more about the data architecture as regards Azure, the modern data architecture if you will, to support that and where data modeling comes into play.
So what is data modeling ultimately?
Yeah, we have a lot of source data, a lot of textual data coming in.
We have semi-structured data, a lot of JSON files, maybe because our businesses now are actually subscribing to web-based services, right.
And a lot of them are just API calls.
It's no longer just simply relational structured data where we have the full ERP installed on our on-premise systems and we can get to the relational databases.
We still have that, but a lot of those systems are in the cloud too as well, but they still have relational models, right.
So we have all this data and what are we doing?
We need to again extract, parse and select the data from all those different sources, semi-structured and unstructured.
And ultimately what we're trying to do, really simplistically if you think about it, is come up with a joinable structure.
We're ultimately trying to get those measures from the source systems in a structured manner.
We want to come back and get dimensional elements.
And if you think of the old, well, not old, I think it's still very pertinent, star schema, a dimensional model.
And you could actually make this more of a logical thought process in a way, because there are a lot of places where you can do the data modeling.
And maybe we're not implementing a physical star schema in a sense.
But in actuality, whether you do it in Power BI or a data warehouse, you're trying to get the facts and dimensions; you're trying to get to the things you want to analyze and measure, and you want to get to that "by" word, right.
You want to get back to: we want to analyze sales by customer by a certain time frame.
And even if you did what's new out there, this one big table approach, OBT, put everything in one big table, ultimately those dimensions and facts are still kind of embedded in there.
So in a sense, we are trying to get to this point from a data modeling perspective.
This isn't a webinar on what a star schema is, but ultimately a business is measuring things: we're getting facts, we're getting measurements, and we have things we need to group by and analyze.
So this is one way to kind of look at it.
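That fact-and-dimension idea, with the "by" word, can be sketched in a few lines of plain Python. The table names and data here are invented for illustration; a real model would live in the warehouse or the Power BI semantic layer, but the shape is the same: a fact table of measures with foreign keys, and dimension lookups you group by.

```python
# A miniature star schema: one fact table of sales measures plus two
# dimension lookups, then "sales by customer" and "sales by date".
# All names and values are made up for illustration.
from collections import defaultdict

# Dimension tables: surrogate key -> descriptive attribute
dim_customer = {1: "Acme Corp", 2: "Globex"}
dim_date = {20240101: "2024-Q1", 20240401: "2024-Q2"}

# Fact table: foreign keys to the dimensions plus a measure
fact_sales = [
    {"customer_key": 1, "date_key": 20240101, "amount": 100.0},
    {"customer_key": 1, "date_key": 20240401, "amount": 250.0},
    {"customer_key": 2, "date_key": 20240101, "amount": 75.0},
]

def sales_by(dimension, key_field):
    """Group the fact measures by one dimension's attribute."""
    totals = defaultdict(float)
    for row in fact_sales:
        totals[dimension[row[key_field]]] += row["amount"]
    return dict(totals)

print(sales_by(dim_customer, "customer_key"))  # {'Acme Corp': 350.0, 'Globex': 75.0}
print(sales_by(dim_date, "date_key"))          # {'2024-Q1': 175.0, '2024-Q2': 250.0}
```

Whether the physical layout is a star schema or one big table, the analysis always reduces to this: measures aggregated by dimensional attributes.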
Now think of the data modeling touch points in the reference architecture we have nowadays; it's not just simply sources, data warehouse, BI layer, right. We have all these sources, which have expanded on the left.
We've got log data and media files, of course your traditional business applications, and we've got sensor data, IoT; these things could be streaming in every second, right.
We've got an ingestion layer now, which we are doing in the cloud with potentially Azure Data Factory and Databricks for more advanced processing.
We have a data lake now there, and we're probably going to be producing other webinars specifically on the different components of the data lake, right.
We have Azure Data Lake, we have Azure Databricks, Delta Lake, where we could have some modeling touch points as well.
The transform layer, you wouldn't think there's a modeling touch point in, but in a sense we are taking that data, we're getting it ready, we're staging it, we're transforming it.
We’re modeling that data in a sense to get it ready for maybe consumption in the data warehouse.
Your traditional data warehouse now has a lot of different options, because we have the traditional Azure SQL database, and now we have the Synapse Analytics dedicated pool data warehouse, which used to be called SQL Data Warehouse, for bigger data sets, more compute power, a massively parallel type of data warehouse.
And we have different ways of modeling in there too, in terms of a star schema or one big table, but you're still kind of getting back to that dimensional framework.
And then of course you have the Power BI side of it too, where we can do modeling; we can model back to the source, and we'll talk about that.
But this is just a high-level reference architecture.
Now that we have this data architecture in the cloud, with all these different places where we can go, OK, let's go through some data modeling myths.
So, the first myth: where you model does not impact report performance.
And the reality is you've got to get the location right and you've got to set your expectations right about where you model within that data pipeline, because you can do it in several different places now, for example the data lake.
So the data lake could be your first point to where you’re getting your data in and maybe you haven’t had the time to pump it through the entire data warehouse architecture.
Data lakes are really good because you can put all kinds of storage in there: high data volumes, a mixture of unstructured and structured data.
You know, one use case there could be you just merged with a brand new company.
And I need to analyze their customer list.
I need to analyze their sales.
They've given me these things in files and file structures, right.
And I need to get to it, OK.
I could put it in my data lake, I could model it a little bit there, maybe potentially organize it better, cleanse it, maybe slightly, maybe not depending on your use case.
If I got it today, I need to do it now.
I can do the modeling there, but the report performance might be impacted because the data is not cleansed, it's not highly performing; but I can do it.
So that’s one thing you have to think about.
Of course, the data warehouse, you know, is best for relational data; that's typically your governed source of data.
And then your BI system. All these things you have to think about, because the myth is controversial in the sense that performance will be impacted depending on where you model.
And sometimes, from a resource perspective, the people doing the analysis might not know a lot about modeling efficiency, but they're going to point right back to a lakehouse, a data lake, or files in the BI system.
So expectations of report performance need to be managed as you pick the area where you want to model. The next myth: data modeling is solely a technical task for IT.
So what this is trying to say is, you know, before, a lot of organizations did have a dedicated group, typically within IT, that did the data modeling.
I remember the days where we had a data warehouse and before we could add to the data warehouse, we had to consult with the modeling group.
The modeling group had to sign off on certain things.
They did the modeling, they had the big ER diagram, they did the DDL, all this kind of stuff.
But in reality now, in the new paradigm, it's basically a collaboration across IT, business analysts and end users to produce a functional data model.
It’s more collaborative in nature, right.
You still need IT for the infrastructure, but a lot of the cross functional benefits will be obtained by getting that synergy between technical feasibility and real world applicability, right.
So the business folks need to get to their data.
IT can help you understand where that data can be modeled, right?
And this collaborative effort is going to lead to more accurate, user-centric benefits, right.
So those are some of the reasons why it’s controversial.
You know, if we restrict modeling to IT, it's going to overlook those crucial business insights, and from that perspective we get these strategic misalignments if it's left entirely to IT.
So that's something we always think about in terms of some of these myths. Another myth: you can do all the modeling in Power BI.
Well, the answer is, you know, you can, yeah, but do you really want to, right?
Power BI is very adept at doing modelling tasks.
You have Power Query in there.
But do you really want to try and build that star schema, or maybe not necessarily use the words star schema, but your dimensions and facts, in Power BI, right?
There could be a lot of complex data transformations needed, because ultimately the source systems are very complicated, and what you're doing is probably building a lot of that in the Power BI model from a downstream perspective, right.
But a lot of that can happen upstream, right.
So the point of this slide is that, yeah, you can have a really good time to build in terms of doing everything in Power BI, but a lot of the performance maximization can be had if you do a lot of the transformations and ETL upstream and leverage some of the resources up there.
So let’s talk about that.
We’re going to go through some use cases here.
We're going to do a little demo, and one area we just talked about is whether to do this directly at the Power BI layer, the Power BI modeling, like the half of you that said "I do all the modeling in the BI tool," right.
So what does that imply?
That kind of implies that you are going from your source systems, doing that ETL or ELT process, directly into Power BI, right.
So some of the pros for that approach are things like, OK, yeah, I have speed to report.
I can actually bring in all my data and not worry about the stuff in the middle.
OK, I don’t have to worry about the humans, I don’t have to worry about the services.
I have full control.
I don't have that intermediate step where I have to wait for a table to be created in the data warehouse or something to land in a lake.
I can go right to maybe to the files or the business applications that are out there.
It's flexible, it's user friendly, and it's good in the sense that it can be an initial prototyping step for provisional upstream changes, right?
So if you think about it, I have to add a column, but it’s not in the data warehouse.
But I need to do this calculation.
I need to get it over to the Power BI, I need to serve it up in a report.
I've done all the transformations in Power Query, etc.
I’ve got the steps I need to create that calculation.
Well, that could be a basis for eventually putting it upstream and having it part of your data lake or your data warehouse, etc, right?
But the con is, again, your source systems are very complex.
You might be spending more time trying to build dimensions and facts from your source systems as you bring it into Power BI.
OK, you’re going to be doing a lot of DAX.
You're going to be doing a lot of complicated DAX and Power Query.
Model duplication is another con.
So I’ve created my great Power BI model.
I had everything in there.
The guy in the other department is doing the same thing against the same source system.
He’s redoing the work.
Maybe he’s doing it slightly different.
Maybe he actually did it wrong, and you guys go to the same meeting, you're showing a report, and there are slightly different answers, right?
So that single source of truth is not there when you have multiple Power BI models, maybe multiple Power BI datasets, out in the organization. And of course maintainability, right?
So if you're creating a very complex Power BI model based on a lot of complicated DAX and Power Query, are you going to remember six months from now what you did, right?
Or if somebody leaves your organization and somebody else has to maintain it and they open up that Power BI model and sees all these complex transformations.
So those are some of the things that you have to think about when you do model directly in the Power BI layer.
But now I’m going to turn it over to Steve Nahrup.
He's going to show a little demo.
It will show what it kind of looks like if you have to do the modeling directly in the BI layer.
So, Steve, over to you.
OK, here we go.
So one of the most common of these really complex sources that Pedro did a wonderful job walking through is a transactional type database, such as an OLTP ERP system.
And you can see we've somewhat simplified it based on color coding, and the objective would be to create dimension and measure tables based on, you know, the color.
So we're just going to walk through quickly how you do this; like Pedro was saying, you really do not want to redo all of these complex transformation steps directly within Power BI.
It becomes very convoluted, complicated and it’s very hard to replicate across departments.
So instead of doing what you see here, highlighting all of these sales tables and then basically grouping them and creating one large table for the dimension you would like, you would just create a single fact table, and we recommend where you do it.
You can do it in Power BI, like Pedro was saying; yes, it gives the best business-user flexibility, but it doesn't then allow you to maintain the ability to scale across department, division, etc.
So instead of doing that, we recommend, if you're a business user, getting in touch with an admin or, you know, the data steward of a specific data source who's familiar with SQL, and asking them to create all of the necessary joins in a convoluted, complex data source like this.
And they can provide you with a single query.
Sometimes they’re really long and very complicated and they can actually end up looking like this.
So each one of these lines, when you see a join, would be the equivalent of adding two or three steps just to merge tables, just to pull in the dimension for a specific column.
Very convoluted, very complex, and way too much, so we recommend pushing it further upstream.
Either, you know, in a dataflow in the service, so that other people can leverage the work that you did, or pushing it even further upstream and creating a view in the database, or a materialized table using a stored procedure or something along those lines.
Thank you, Steve.
So just to add on to that.
So what he's trying to show here is that this is typically something like an Oracle ERP application, and they're really just trying to come up with a fact table for PO facts, right.
And as you can see, you've got about 20 joins in there, inner, left, right, outer joins, just to meet the business requirements here, and if you had to do that in Power BI, that's a lot of work, right?
I mean, from a Power Query perspective, even creating this particular view up on the source system and then having Power BI hit it embeds all this logic in one particular view; and imagine changing this over time.
So this is really the point of Power BI going against the source system, and maybe even leveraging something else, because you need Power BI to be more optimized.
The other area we want to talk about now is modeling Power BI against the data warehouse layer, right.
So in this particular example, we did build a warehouse; it could be a Synapse data warehouse because we have very large volumes, or it could be, like we said, an Azure SQL database, right.
They're doing the ETL there, and then you're importing this data into Power BI.
So what are the benefits here?
Obviously this is the classic approach.
We all know about it, but it's easy to use and understand, and the table count is much smaller.
So if you had about 50 tables from an Oracle ERP system, like the one we just showed you how to join, after you've done all the ETL work into the warehouse, etc., you're down to maybe a five-table count, right, and faster queries.
And the key thing of it is, it’s governed data, right?
This has been done, it's very structured, it's been QA'd, it's been signed off, right.
Obviously the con is that it has longer development time, right?
It takes a while to build this.
If you had to add one column, the changes also require more time.
And of course there's infrastructure expense.
From a cloud perspective, there’s compute you have to work with to do this.
OK, so now Steve's going to actually talk through this and do the same thing with Power BI.
So, Steve, go ahead.
So, yep, this is your standard star schema pulled from a SQL database.
This end result, what you see here is very similar, if not exactly what you want the model within Power BI to look like.
And that’s what Pedro was talking about earlier.
So the dimensional tables, or filter tables, are on the outside at the points of the star, and you have your fact table directly in the center.
So here we're just going to do what we did in the last one, except we're going to connect to it and pull in all of the tables, the dimensions and the fact, and luckily, you know, in this database you have pre-existing primary keys and foreign keys within the database itself.
So we’ll see in a second why that’s beneficial.
But yeah, so normally you would have to bring in these tables that wouldn’t be connected or have any relationships whatsoever.
But you have to remember, even when tables do have existing relationships, more times than not they're only somewhat accurate.
You want to double check them: a date field, you know, needs to connect to all the dates in the date table, and just make sure that it's one-to-many, not many-to-many.
You want to avoid that.
And so Yep, you can go ahead and OK, great.
So as you can see, right, the table count went down.
It’s a lot easier to model.
It's easier for end users to use this particular approach, and it puts a lot of the business logic back up in the data warehouse.
OK, we’re pretty familiar with that.
So the other one I'm going to talk about now, which is kind of interesting to me, is, it's not new, right, it's been out there for a while, but maybe it is for a lot of organizations, and we saw from the poll that only about 4% of you are actually, in a sense, modeling on top of the data lake layer, right.
So with Azure and other cloud vendors, right, we can store data in the data lake.
I've just got Data Lake Gen 2 here, ADLS if you want to call it, from Azure.
We can point Power BI now to the Data Lake.
And for those of you who have been kind of kicking the tires on Fabric, that is their whole foundation piece, which now they've renamed OneLake, and they've added things on top of that.
Parquet files really pretty much make this possible, right?
So here again the pros right?
You're bypassing the ETL of a data warehouse schema, right?
Since I'm bypassing star schemas and the data warehouse, I don't have to wait for the data engineers to get the data out of the data lake, put it in the data warehouse, and then have Power BI go against it.
Fast development, and in a sense no compute resources, or they're very minimal, right?
Because what you’re doing, you’re paying for storage on a lake, right?
You're having Power BI go against that; the compute resources are nowhere near something like an Azure Synapse dedicated SQL pool, right.
And like I said, this whole concept is even more prevalent with Fabric, on which we've got a bunch of webinars.
We've got a webinar on Fabric coming out.
And this is what they're kind of pushing in terms of a unified approach.
Some of the cons: well, the data, yeah, it may not be cleansed, right.
And depending on how you organize that data lake, it becomes a data swamp.
But maybe you have a very organized data lake.
Maybe they've just done enough cleansing.
It hasn't made the data warehouse yet.
Maybe I haven't created the files for facts yet, and I've got my dimensions, etc., but you have the risk of maybe going against uncleansed data; then again, maybe you have that usage scenario.
I need to look at it again.
Back to my scenario where you’ve just merged with a brand new company.
You need to look at that data.
I need to look at it now.
Depending on the file structure, you still could have potentially heavy modeling in Power BI.
Yeah, maybe the files are not just simply a customer dimension and a fact file.
Maybe everything’s all put together.
Maybe there’s things you have to cleanse and you have to transform.
You have to take it and create complicated DAX calculations.
I have to do a lot of stuff in Power Query to get it to look right.
And so those are some of the things.
But I think the key thing here is really: I have data in the data lake and I need to analyze it.
I’ve got some sensor IoT data.
I can do it now, and I can do it pretty efficiently, with the new Parquet formats that are available in the data lake.
So what I wanted to show here is just a demo and this is not a fabric demo, but basically it’s trying to show you how that kind of works now through the pieces of the architecture.
This is Azure Data Lake Data Explorer, and what I've got here are storage accounts, and I do have an ADLS Gen 2 account.
And what I've done here is a scenario or use case where I'm going to a folder, or a blob container, called star schemas, and somebody out there actually put some dimensions and facts in there for you.
I just want to show you the mechanics of how Power BI, it kind of does this, right?
So I’m going to push the little play button here.
And yeah, we’re talking about Explorer.
We've got a container called star schema.
We've got folders for dimensions and facts, and maybe they are kind of cleansed, right?
I want to bring in my fact sales.
I’m going to go to here and I’m going to copy this endpoint.
I’m going to go back to my Power BI application.
I’m going to press get data.
I'm going to go ahead and look for the Azure Data Lake Storage Gen 2 connector.
I’m going to connect to it.
I’m going to go ahead and paste that URL in right.
So, as we saw, only a small percentage of you have done this.
This is one way of doing it.
I’m going to.
I can see that fact sales CSV here now, and it comes up with this query.
One additional step you have to do is instantiate that table.
It shows you that it’s pointing to the Data Lake storage account with a URL.
If I click on that little binary hyperlink, it actually loads the table, and I'm just going to rename this guy Fact Sales, OK. And I'm going to go ahead and close and apply that Power Query transformation.
It loads it, and now I have a new table here called Fact Sales with data inside it, with all the different fields that were in that CSV.
OK, now I’m going to go back to my data explorer and I’m going to go in the Dimensions folder.
I see my files there, and this time I'm going to go ahead and copy the DFS endpoint URL for the dimensions folder directly.
And the difference is it’s going to bring all those files in.
I’m going to show you how that works.
I'm going to connect, and I'm going to put that URL in just like before, and I'm going to go in and I can see those files there.
Customer, date, product, I see those files there.
I’m going to say load and again I’m presented with the query.
The one little difference here is that I'm going to actually instantiate these as separate queries, or separate tables.
The way I do that is I right-click on it and say Add as New Query.
I'm going to rename that and load it.
I'm going to rename it Dim Customer.
I’m going to go back to that reference query one and then do the same thing for each one of the other dimensional tables, right.
I’m going to add new query that’s my date dimension.
Say for example, load that, there’s all my dates, OK.
And that’s one of the common things that probably is going to be out there is a date dimension.
How much more common across an organization can you get other than a date dimension, the product dimension?
OK, Add as New Query, load that, and rename it.
Call that Dim Product, and I'm going to go ahead and close and apply the Power Query editor.
I’m getting rid of that query one because I don’t need that anymore.
That was more of a way to get these guys loaded; close and apply, OK, loading the data into my Power BI model.
And I’m just going to organize these things.
What do we come back with now?
We're pretty much looking like a relational data warehouse schema with customer, product, date and fact sales.
Power BI has recognized that the keys are matching on both the fact table and dimensions, so it kind of infers a join there.
The date dimension, I'm going to have to join manually.
I'm going to do additional cleanup on the Power BI side for renaming, metadata, semantic layers, maybe add some calculations.
But the point of this demonstration is to show that I bypassed the data warehouse.
I went to the data lake.
In reality, I’m pointing to CSV files.
In practice, I would probably point to parquet files, because the benefit of parquet files is that they're very highly compressed, high performance, and they carry metadata and data typing, unlike CSV files.
If you load CSV files, you can get a lot of misconstrued data or errors.
But this was just from the point of an example.
If you point to parquet files, you’d have the same thing.
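The typing point is easy to demonstrate with Python's standard csv module: like any CSV reader without a schema, it hands every field back as a string, so each consumer has to re-infer or cast types by hand, which is exactly the step parquet's embedded schema avoids. The file content below is invented for illustration.

```python
import csv
import io

# A tiny in-memory "CSV file" with an integer, a decimal, and a date column
raw = io.StringIO(
    "order_id,amount,order_date\n"
    "1,99.50,2024-01-15\n"
    "2,12.00,2024-02-03\n"
)
rows = list(csv.DictReader(raw))

# CSV carries no schema, so everything comes back as text
print(type(rows[0]["amount"]))  # <class 'str'> -- not a float

# Manual casting that every consumer must repeat (and can get wrong);
# a parquet reader would return typed values directly from the file's schema
typed = [
    {**r, "order_id": int(r["order_id"]), "amount": float(r["amount"])}
    for r in rows
]
print(typed[0]["amount"] + typed[1]["amount"])  # 111.5
```

With parquet, the types and column metadata travel with the file, so this casting step, and the class of errors it invites, disappears.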
What you've got there, basically, is again dimensions and facts from a data lake.
And so a little bit of Power BI modeling inside here.
But there was some modeling, in a sense, happening at the data lake layer, where somebody actually organized it and created the facts, created the dimensions.
And this could be that use case where these are some files that I got from my new merged company, right?
And I need to analyze it.
So somebody went there and gave me the new customer list, gave me the new product list, gave me the sales from that company, put it together and I could do that.
OK, so let’s go ahead and kind of wrap this up a little bit here.
I’m bringing this back again, the reference architecture, because what we tried to show you here were all the data modeling touch points, right?
The data lake, which is a complete webinar on its own, gives us a lot of options to make it a really nice data modeling touch point, depending on how you organize it, and there are a lot of ways out there in the industry.
Referencing the Databricks Delta Lake architecture, the idea is that if you curate it enough, you create all these different zones in your lakehouse or Delta Lake.
They’ve used concepts like bronze, silver and gold, where bronze is raw, silver is cleansed and gold is curated.
If you’re hitting the gold structure, you can use it as an initial point for data modeling to bring the data into Power BI, like the bottom arrow showing Power BI going against that gold structure.
Then you have your other traditional places, like your Synapse Analytics, your SQL database.
This is just bringing home the different points.
So really what we’re trying to get to here is Roche’s Maxim, which you’ve probably seen a lot in social media. Matthew Roche is a Microsoft Program Manager with the Power BI Advisory Board, and he came up with this very nice maxim.
You know, it’s basically saying data should be transformed as far upstream as possible and as far downstream as necessary.
Upstream meaning it’s closer to the source of the data, downstream meaning it’s closer to the consumption of the data, right.
So although there are all these modeling points in a modern data architecture, it’s always best practice to push those calculations and transformations as far upstream as you can.
Even if you’re hitting the data lake, if you’ve actually done some transformation within the lake, you create a gold area where that stuff is ready for consumption.
It hasn’t gone into a data warehouse, but it’s ready for consumption.
If you do that there, the heavy work on the Power BI side is less and the possibility of errors with calculations is less.
The maintainability is better, right?
And you’re leveraging the data architecture as much as possible, even though Power BI itself can do all those things, OK.
So I just wanted to wrap everything up with that.
And with that, next slide here, I’m going to turn it over to Todd and see where we are.
All right, thanks, Pedro.
Lots of great information there for everyone still out there.
If you have any questions or need guidance on how to data model in your environment, please reach out to us at [email protected].
I’m going to touch on a couple additional resources.
Senturus provides hundreds of free resources on our website in our Knowledge Center, including comparisons.
We’ve been committed to sharing our BI expertise for over a decade.
So just go to Senturus.com/resources and take advantage of all this free information.
We’ve got some upcoming events that you might be interested in.
We’ve got a webinar on Accelerators for Cognos to Power BI Migration.
That’s on Thursday, November 30th, 11:00 AM Pacific, 2:00 PM Eastern.
We also have a new exciting webinar on December 7th, Microsoft Fabric: Architecture or Marketecture.
We’ll talk about whether the hype is real.
We’ll touch on things like Data Factory, data warehousing, data science, OneLake and Power BI; a lot of good information in that webinar.
We also have an in-person workshop on Microsoft Fabric; the date and location are TBD.
So again, make sure you’re signed up for our emails and check out senturus.com/events.
A little bit more about Senturus.
We provide a full spectrum of analytic services and enablement and proprietary software to accelerate bimodal BI and migrations.
We particularly shine in hybrid BI environments.
So no matter how big or small your project, Senturus provides flexibility and knowledge to get the job done right.
We also have a long history of success.
We’ve been focused exclusively on business analytics for over 20 years now and our team is large enough to meet all of your business analytic needs, yet small enough to provide personal attention.
I am going to open it up now to some Q&A.
We didn’t get too many questions during the webinar, so if you do have questions at this point, please put them in.
We have a couple I can start off with here.
I don’t know if Pedro or Steve, you want to field these.
Maybe this first one is good for you, Pedro, since you’re also a Cognos guy.
The question is: how does Senturus recommend we build layered models, thinking of a Cognos Framework Manager model, from a Power BI perspective?
I’m actually on a project right now where they are definitely using Cognos, but they’re leveraging the Azure architecture, right?
They actually created a Synapse data warehouse; I don’t know why, because it’s a pretty big footprint.
But they’re using that for when they might need to bring a large volume of data in.
But they’re using Cognos Analytics, and what I’ve seen done now is actually using more of the data module concepts within Cognos, right?
Framework Manager is great and still being used, but for newer implementations, especially against an Azure framework, using data modules I think gives you the best flexibility. I’ve actually just quickly spun it up against that dedicated Synapse Analytics pool and created dashboards with Cognos Analytics against it in real time, without actually creating data sets.
If you’re familiar with Cognos data sets, similar to Power BI data sets, we can actually take the data out and create parquet files within Cognos.
But the performance I thought was really nice.
We’re using a data module, so I think that works well within the Cognos framework, especially for new implementations like that.
Cognos has mimicked a lot of the feature set of Power BI; for a while now they’ve had data sets and data modules, though they don’t have the transformation capabilities like Power Query that Power BI does, right?
But I would start with using data modules against an Azure environment, whether it’s actually SQL Data Warehouse or cloud based databases, and I think you’ll find a pretty good performance there.
If you don’t, then using a data module as a source, you can create Cognos data sets and instantiate those as parquet files within the Cognos system itself.
So it’s kind of funny because in a way you’re kind of creating a Cognos lake, right?
Those are parquet files in Cognos, but you’re using the SQL Data Warehouse or dedicated Synapse SQL pool up there on Azure, because they’ve created that architecture, and you’re basically extracting parquet files and putting them in Cognos.
So it’s a long-winded answer, but there are a lot of ways to do that, a lot of interesting ways, and new techniques really make Cognos shine even on top of a Microsoft Azure architecture.
So hope that answers your question.
And along those same lines: if you’re familiar with Framework Manager’s different layers, like a database layer, transformation layer, business layer, is there any equivalent that you recommend in Power BI, similar to that?
In Power BI? Yeah, that’s a totally different paradigm.
And what we’re referring to is within a Framework Manager, you typically set up a database layer and then a presentation layer.
I’ll let Steve answer a little bit of that.
But Steve, you’re talking about how you basically bring your tables in and then create a semantic layer on top of that.
What’s the equivalent of that?
Yeah, it’s just a star schema approach where you pretty much bring the tables in and create your relationships directly within the model view.
So there’s not a ton of transformation in each table or in Power Query; it’s more that it’s already in that end-state star schema that you want to get to eventually.
So basically you have to bring a lot of raw tables in.
But then maybe using Power Query you’re creating merged query subjects, to relate it back to Cognos Framework Manager, right?
You’re creating other views on top of what’s in there and renaming the raw table columns somewhere in the Power BI framework, right.
We don’t have any other open questions right now.
We do have a couple in the chat, Todd.
Yeah, there’s one in the chat, I think.
Let me just double check.
It was from Duraid, and the question was: what’s the difference between getting dims and facts from a container versus getting them directly from the Azure Data Lake?
He makes the point that eventually you might be doing the modeling part at the Power BI level anyway.
Anyway, that was really just to illustrate.
Maybe we didn’t talk about this, but we’ll probably have a webinar on how you structure a data lake.
And that was kind of an example of a very curated gold section where somebody has actually done that for you.
But maybe a lot of times you’re going after a raw file, one big file which has all the measures, your customer information, your product information, all there.
So you won’t have that luxury, right?
So if you’re doing that, you will have to do more modeling within Power BI, because basically, what is a customer dimension off one huge file? You’re doing a select distinct of the customer attributes from that file and coming up with a customer dimension.
How are you going to get the products out of one huge file?
Well, you have to do a select distinct and model it that way, right?
So you’re going to have to do a lot more work.
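As a rough sketch of that select distinct approach, here is what carving dimensions out of one wide raw file might look like in plain Python (the column names and rows are made up for illustration):

```python
# One wide, denormalized extract: every row repeats customer and product attributes.
sales = [
    {"customer_id": 1, "customer_name": "Acme",   "product_id": 10, "product_name": "Widget", "amount": 100.0},
    {"customer_id": 1, "customer_name": "Acme",   "product_id": 11, "product_name": "Gadget", "amount": 250.0},
    {"customer_id": 2, "customer_name": "Globex", "product_id": 10, "product_name": "Widget", "amount": 75.0},
]

def distinct(rows, keys):
    """SELECT DISTINCT <keys> FROM rows: keep one row per unique key combination."""
    seen = {}
    for r in rows:
        seen[tuple(r[k] for k in keys)] = {k: r[k] for k in keys}
    return list(seen.values())

# Dimensions fall out of the wide file via select distinct...
dim_customer = distinct(sales, ["customer_id", "customer_name"])
dim_product = distinct(sales, ["product_id", "product_name"])

# ...and the fact table keeps only keys and measures.
fact_sales = [{"customer_id": r["customer_id"],
               "product_id": r["product_id"],
               "amount": r["amount"]} for r in sales]

print(len(dim_customer), len(dim_product), len(fact_sales))  # 2 2 3
```

In Power BI you would express the same split with referenced queries and Remove Duplicates in Power Query, which is exactly the extra modeling work being described.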
So I think we’re going to have a more detailed discussion on properly setting up a data lake structure for consumption.
It’s usually divided into basically a raw area, maybe a silver area where you have done some cleansing, and a gold area where maybe it looks like what I showed:
I have a folder where I have set up all the facts and dimensions and customers.
Because even for something like a customer list, you might have in the raw area customer extractions from five different source systems, all with different ways of representing a customer.
Then you as a Power BI developer would have to bring all those files in and do some Power Query transformations just to cleanse them, because you’re going against the raw area.
But if you go against the gold area, maybe somebody has done an integration for you, gotten rid of duplicated customers.
Simple things like taking lowercase to uppercase, fixing the zip codes, all that stuff you’d have to figure out and do on a customer yourself.
But the customer dimension itself might already be in the gold area.
So that was basically the whole point of that.
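The kind of cleansing being described for the gold area, normalizing case, fixing zip codes, de-duplicating customers, might look like this minimal Python sketch; the rules and field names are hypothetical:

```python
def clean_customer(rec):
    """Apply simple standardization rules to one raw customer record."""
    return {
        "name": rec["name"].strip().upper(),  # lowercase to uppercase
        "zip": rec["zip"].strip().zfill(5),   # restore leading zeros lost by CSV tools
    }

# Raw extracts from different source systems, with inconsistent formatting.
raw_customers = [
    {"name": " acme corp ", "zip": "2139"},
    {"name": "ACME CORP",   "zip": "02139"},  # duplicate once cleansed
    {"name": "Globex",      "zip": "90210"},
]

# Cleanse first, then de-duplicate on the standardized fields.
seen, gold = set(), []
for rec in map(clean_customer, raw_customers):
    key = (rec["name"], rec["zip"])
    if key not in seen:
        seen.add(key)
        gold.append(rec)

print(gold)  # two unique customers remain after cleansing
```

If this work lands in the lake’s gold area, every Power BI model downstream inherits the clean customer list instead of re-implementing these rules.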
Here’s another question.
So the point of using containers is to clean and simplify the views to be used at the Power BI level, correct?
Yeah, going back to my last answer, yes, in a sense it’s organization, right?
If you think of a data lake as really just a bunch of files; Microsoft even says OneLake is like OneDrive for data, right?
It’s really a way to organize your files; think of containers as simply directory structures or folders, right?
Within the containers you have other folders.
So if you’ve got a container called raw, that’s really just your dumping zone; every day you’re going out to a Web API call and dumping all this data into the raw folder, and it hasn’t even been touched yet, hasn’t even been looked at, right?
Another container would be called structured, because you’ve taken that raw data and structured it for consumption.
That’s where you have to leverage getting the best structure and format in your lake for consumption by Power BI, because if you don’t, then you have more work at the Power BI level, because maybe you’re going against the raw area.
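Treating containers as folders, the raw/structured layout just described could be sketched locally with pathlib; the zone and dataset names are only examples:

```python
from pathlib import Path
import tempfile

# Stand-in for a storage account: each top-level folder plays the role of a container.
lake = Path(tempfile.mkdtemp())

# raw is the dumping zone for untouched extracts; structured holds the curated,
# consumption-ready facts and dimensions that Power BI points at.
for zone in ["raw/web_api/2023-11-01",
             "structured/dim_customer",
             "structured/dim_product",
             "structured/fact_sales"]:
    (lake / zone).mkdir(parents=True)

print(sorted(p.name for p in lake.iterdir()))  # ['raw', 'structured']
```

In Azure Data Lake Storage the same shape is expressed with containers and folder paths rather than local directories, but the organizing idea is identical.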
There’s a quick question on here about when we will be discussing Microsoft Fabric.
There is that upcoming webinar on Thursday, December 7th.
Again, just go to our senturus.com/events and you can register for that.
Are you wrapping it up, Todd?
Yeah, I think that’s it.
So again, thanks everyone for joining.
We will post this webinar, along with answers to the questions, on the senturus.com website.
So feel free to check back.
And thanks everyone for attending.