Python is now the top programming language for data analytics, outpacing R and SAS. Why this popularity? Python is open-source, easy to learn, scalable and flexible. It can handle large volumes of data and has extraordinary libraries for data science. Having Python skills is now pretty much a standard requirement for any data science role.
View this mini-class and get started on harnessing Python’s power. Long-time Python developer and instructor Mark Copley shows you how to use Python for ingesting, cleaning and visualizing large data sets. He uses popular tools from the Python environment, including NumPy, matplotlib, pandas and Jupyter notebooks, and applies them to data sets from real-world NOAA temperature and wind information.
If you’ve been wanting to learn Python, get started now.
Python Developer and Instructor
For more than 20 years, Mark has been an instructor, developer and consultant specializing in Python, Perl and Ruby. He holds an MS in Computer Science from Brown University and an MS in Electrical Engineering from the University of Colorado-Colorado Springs.
Welcome to today’s Senturus webinar on using Python to create visualizations.
Happy to have everyone here today. Just want to do a little bit of housekeeping before we get started. On your GoToWebinar control panel, you’ll see a questions panel. If you have any questions, as we’re going through the presentation today, you can type them in there. We may try to answer some of those in real time, but if not, we’ll be answering questions at the end of this session.
If you’d like to get a copy of today’s presentation, you can find that on our website: senturus.com/resources. There will be both the PowerPoint deck and Jupyter Notebooks that you can download, so take a look at our website to pick up those items.
In terms of today’s agenda, we’ll do some quick introductions and a review of popular tools related to Python and data visualizations. Then we’ll go through a few demos: an intro to plotting with Jupyter Notebooks, an overview of the popular plots that are available, and a particular use case around visualizing wind information, which is a pretty interesting demo. Then we’ll do a quick overview of some additional Senturus resources that are available, and as I said, we’ll do Q&A at the end to answer any questions you have.
Couple of quick introductions here. Our presenter today is Mark Copley. For the last 20-plus years he has been an instructor, developer and consultant. He specializes in Python, Perl and Ruby. Mark holds an MS in computer science from Brown University and an MS in electrical engineering from the University of Colorado. So we’re happy to have Mark here today to take us through the ins and outs of Python visualizations. I’m Steve Reed-Pitman, your host for today. I’m director of Enterprise Architecture and Engineering here at Senturus. If you’ve been on many of our webinars, you’ve probably heard me here before; I’m the bookends for the beginning and end. And I’m happy to be here and happy to have Mark leading us through today’s presentation. So with that, I’m going to turn it over to you for an overview of some popular tools.
All right, thanks a lot, Steve. I just want to pop back real quick, in case anyone was wondering: I’m the one on the left with the sunglasses. On the right, that’s Buddy the dog, who’s actually here with us as well, but right now he’s sleeping. And that’s on top of Pikes Peak. So, as he said, the first thing I wanted to do real quick is just give you a heads up on what we’re going to be touching on, just at a kind of introductory level on some of these topics. So the first thing, obviously, we’re going to be talking about is Python.
One of the great things about Python is that it’s not limited to visualizations; of course, it’s a general-purpose programming language. And it has a very large, very collaborative ecosystem of people out there, constantly creating stuff to meet their own needs and then sharing it with the community.
So pretty much anything you’d want to do in the computing space, there are people doing it in Python. Some of the hot areas are data science and machine learning, both of which involve a lot of visualizations, for sure. So today, we’re going to be using Python behind the scenes to prepare some of the data before we visualize it, maybe transform it and do some modeling of it.
The module that underlies almost all scientific computing or data science computing in Python is called NumPy.
And what NumPy provides you is very efficient, very fast, highly optimized code to do any kind of computational stuff. Most of the code in NumPy is actually written in C or Fortran. Some of this numerical computation code that’s been around for 50 years, written in Fortran, which works perfectly. But now you have the beauty of this nice high level Python language wrapper around it.
Another tool I’m going to be using a little bit, which is widely used in any kind of large data environment, is called pandas. Pandas is named after something called panel data, which was first used in another system for doing visualization and statistics called R. And as we’ll see in the example, this provides us with a high-level way of ingesting large datasets into something that’s very familiar to most of us, something that looks kind of like a spreadsheet, honestly. Pandas calls it a data frame, but you can think of it as having columns of data, each of which pandas calls a Series object, and rows of data, which would be our observations. And we’re going to use maybe 5% of pandas’ capability here. We’re going to use it just to pull in large datasets, but it’s a whole ecosystem unto itself of data visualization and selection and filtering and interpolation.
The library that we’re going to be using, which is the go-to library for doing visualization in Python, is called pyplot. It actually lives in a package called matplotlib, so often people will just call it matplotlib. And what this provides us is all the basic sorts of visualizations you want to do: line graphs, bar charts, pie charts, vector diagrams, that sort of thing. And as we’ll see a taste of, it’s a highly configurable system. So if you want your plots to be very sparse and austere, maybe for a paper, you can do that. Or if you want them to be more visually compelling, like for a presentation, you can do that as well.
We’ll see that you can use the functionality in pyplot both from low-level NumPy data structures, the ndarrays, or using those data frames that we talked about in pandas.
And just to show you a taste of what’s out there in third party libraries.
So, as I said, Python itself has kind of the standard libraries, but there’s just every day there’s people out there doing amazing things with Python to meet their own needs and then sharing it back to the community.
So, one of the modules I will be playing with a little bit is called cartopy, which is a play on words of cartography, or mapmaking, and this is a module that allows us to do visualizations in geometric space that’s more than just rectangular.
You know, if we plot something, XY, we usually think of as just a nice rectangular plot, but here we can plot something on a map projection and, like I said, most of this code is open source. It’s freely available. It’s put back into the community.
And you can maintain it sort of just like a laboratory notebook, if you’re an engineer or a scientist, where you’re just adding to the stuff you’re working on. But when it’s all said and done, you can use this to save it off as running Python code, or you could save it off as a PDF file, or you could just share the whole notebook with somebody.
So, like I said, we’re not limited to using the stuff we’re going to do today in the Jupyter Notebook environment. But it’s a really great environment for doing interactive demonstrations. So that’s why we’re using it here.
So, on to the demos. I’m going to shut down the PowerPoint deck for now and switch over to Jupyter Notebook.
So, let me actually bring up my browser. Sorry about that, I accidentally closed it during our setup here and need to re-open it.
All right, so, Jupyter Notebook essentially starts up on your machine a web server.
Be able to include more than just code and what you’re doing. So, this type of notebook, or the cell that makes up this type of notebook is what we call a Markdown cell. If I double click on this real quick, you can see the actual content of this is a special language called Markdown, which is a play on words of markup. You know, we talked about markup language. So this is one aspect of a Jupyter Notebook.
The other aspect of it, and we’ll look down at the bottom of this, is the other type of cell or content you can put in a notebook, which is what we call a code cell. And you can tell that something’s a code cell because there’s the word In over here, which is an indication we’re talking to Python. Python actually keeps track of what you type in a history mechanism. So when I actually put some stuff in here, we’ll see that it assigns a number to that cell. So let’s do a little quick Python programming in here.
Let’s set X to 3, and Y to 10, and print X plus Y.
So right now, this hasn’t done anything. The way you tell a cell that it needs to be executed is you tell it to run the cell. One way to do that, which gets a little old, is to just go up and click on this thing that says Run up here. That executes the code and runs it. Another way, and I’ll be doing this a lot in the notebooks to come, is that when you’re sitting on a cell, you can use keyboard shortcuts. The keyboard shortcut I’m using here is I hold down the Shift key and I hit Enter.
So let me change this so we can see some change in our output.
So if I hit Shift-Enter on a cell, it runs the cell, I get a new result here, and then it moves me down to the next cell. So you can think of Shift-Enter as kind of like reading through a paper or through a story. We start at the top of the notebook, and we just walk our way through it. So that’s what I’m going to be doing for the notebooks I’m going to be presenting to you now. And if you get a chance to try this, there’s another way to use it, if you just want to see the overall result and kind of, you know, hop ahead to the end of the movie, right.
You can also just go up here and say Cell, Run All, and that’ll turn the notebook into a finished version where all the code has been executed and you kind of see all the results at once.
But we’re going to walk you through in a more conversational style here. So let’s jump in with that matplotlib library, or pyplot. So I’m going to bring up the second notebook here. And the first thing we’ll see is that whenever you’re doing any kind of visualization, you’re almost always going to need that NumPy module that we talked about, our low-level computational number-crunching module. And we’re going to bring in matplotlib.pyplot. Now, that’s a lot to type every time we want to use it. And so as a convention, almost everybody who uses these will give them aliases. They’ll call NumPy np.
And they’ll call matplotlib.pyplot, which my mouth just fills with saliva saying too many times, plt. I’m also setting a few other parameters here to make my plots a little bit bigger for presentation purposes. So I need to execute that cell. You can tell that I’m on this cell because we’ve got this green border over here. So I’m just going to hit Shift-Enter.
And you’ll notice a number pops up here. So that shows me that cell has been executed, and now we’re ready for our first example.
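The setup cell described here would look roughly like the following. This is a minimal sketch: the `np` and `plt` aliases are the convention mentioned in the talk, while the specific figure size and font size are assumptions, since the transcript does not say which parameters were set.

```python
import numpy as np
import matplotlib.pyplot as plt

# Make every figure larger than the default for presentation purposes.
# The exact values here are assumed, not taken from the webinar notebook.
plt.rcParams["figure.figsize"] = (10, 6)
plt.rcParams["font.size"] = 14
```

Settings made through `plt.rcParams` stay in effect for every later plot in the same session.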
So let’s say we wanted to visualize a piece of equipment, electronic equipment.
And the piece of equipment is going to have ideally the behaviour that, if we put a signal in, we get the same result out.
So, as he mentioned, one of my degrees is in Electrical Engineering. So I think about boxes where you put things in one end and something comes out the other. So in a perfect box, if you put a one in, you’ll get a one out. But maybe there’s a little noise in there. You know, if you’ve ever been on a long-distance connection or a cell phone, you may put a one in and get a 1.05 out, or you put a two in and you get a 1.93 out. So what we want to do is put some signals in with some noise and see what that looks like.
So here’s where the power of NumPy comes in.
I’m going to create a list of numbers, essentially, from 10 to 80.
Those are going to be my input signals, let’s say. But I’m also going to create some random noise of the same size.
So, my random noise is going to be what we call normally distributed, a bell curve, and it’s going to center around zero, plus or minus. So, essentially, what I’m saying is what our little picture was there.
Ideally, my input-output relationship would look like a perfect line, but what it’s actually going to look like is a little bit of noise, above and below.
And so, let’s see how those two compare.
So this is my input signal. I add to it my noise. And you notice that I’m treating these as single variables. But they actually represent an array of points. And that’s what NumPy does for us. It allows us to treat multiple values as just one entity. And the computations are all done in parallel behind us. So my first plot is just going to be the input versus itself. That would be a perfect system.
And then I’m going to plot my input, plus my noise, the output.
Then I’m going to put some nice labels on it, because I’m sure you’ve heard, at some point in your science class in high school or whatever, that if you don’t put labels on your axes, your plot is meaningless, right?
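The cell being described could be sketched as follows. The range 10 to 80 comes from the talk; the noise spread (`scale=2.0`), the random seed, and the variable names are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)      # seeded so the figure is repeatable

signal_in = np.arange(10, 81)            # input values 10, 11, ..., 80
# Normally distributed noise centered on zero; the spread of 2.0 is assumed.
noise = rng.normal(loc=0.0, scale=2.0, size=signal_in.shape)
signal_out = signal_in + noise           # one expression adds every element

fig, ax = plt.subplots()
ax.plot(signal_in, signal_in, "r")       # perfect system: output equals input
ax.plot(signal_in, signal_out, "b+")     # observed output, blue plus markers
ax.set_xlabel("input signal")
ax.set_ylabel("output signal")
ax.set_title("Input vs. output with noise")
```

In a Jupyter notebook the figure renders when the cell runs; in a plain script you would add `plt.show()` at the end.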
So let’s say you’re an engineer, and you’re sitting here, and the red line is what a perfect system would look like. But what you actually got are these blue crosses here. So you want to try to work backwards and do the thing we do all the time in statistics: we want to do a linear regression. I’ve got these blue crosses here.
I want to try to fit a line to them. Ideally, it would be exactly this red line, but because of the noise there, we might not be able to get exactly that red line back. So let’s take a look.
So what I’m going to do now is I’m going to take my input signal and my output signal.
And I’m going to do my best fit against that using a function, again from NumPy, called polyfit.
The degree for polyfit in this case is one, which just means I’m going to fit a straight line. If I had degree two, it would try to fit it with a parabola, etcetera.
And what that tells us is, Hey, from the numbers you gave me, the best estimate I have of a line that fits those numbers has this slope and this Y intercept.
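A minimal sketch of that fitting step, using `np.polyfit` with degree 1. The data is regenerated here so the block runs on its own; the noise spread and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
signal_in = np.arange(10, 81)
signal_out = signal_in + rng.normal(0.0, 2.0, size=signal_in.shape)

# Degree 1 asks for a straight line. polyfit returns the coefficients
# with the highest power first, so for a line that is (slope, intercept).
slope, intercept = np.polyfit(signal_in, signal_out, 1)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")

# Evaluate the fitted line at the original inputs.
fitted = slope * signal_in + intercept
```

For a perfect system the slope would be 1 and the intercept 0; with noise, the estimates land close to those values but rarely exactly on them.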
So, let’s plot that out.
So here now we have three plots. We have the original two that we had before, just to compare it with. So the red would be a perfect system. The blue is what we actually observed, and now we’ve got this green line, which you can see is pretty darn close. But that’s based on what we told it. With the blue observed data points, that’s its idea of a best fit. Now, we don’t care so much about this physical system modeling; we’re more interested in the visualization. So I wanted to point out a few things here.
We have three plots that we made on top of each other.
Basic plots will pick colors for you, but we can have control over that. So for example, I said g for green, r for red, b for blue.
That’s one way to refer to colors. For the blue, I also said, instead of drawing a line, which is the normal behavior, I want you to draw a little cross, or a little plus sign. And then as always, I want to label my graph and my axes, and then since I had multiple plots here, I also provide a legend.
Which says, hey, I have three signals I’m plotting in order to tell them apart. I’m going to put a little legend up here that will describe them and what they look like.
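Putting the three plots together with single-letter colors, a plus marker, and a legend might look like this. It is a sketch under the same assumptions as before (noise spread, seed, and label text are invented).

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
x = np.arange(10, 81)
y = x + rng.normal(0.0, 2.0, size=x.shape)
slope, intercept = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.plot(x, x, "r", label="perfect system")                # red line
ax.plot(x, y, "b+", label="observed")                     # blue plus markers, no line
ax.plot(x, slope * x + intercept, "g", label="best fit")  # green line
ax.set_xlabel("input")
ax.set_ylabel("output")
ax.legend()                                               # built from the label= values
```

The format string combines a color letter with an optional marker, so `"b+"` means blue plus signs with no connecting line.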
So that’s kind of a real basic plotting capability. In fact, line plots, which is what we call this, are the default plot. So if you just say plot, you get a line plot, but you see that if we change the marker, we can actually get what we would call a scatterplot.
Now, if you look at this, the plot is a little bit boring and a little bit austere, and maybe that’s perfect if you just want it printed in a paper or put up on a website. But maybe you wanted it to have a little bit more style, or different colors. Maybe your audience potentially includes people who have color blindness. So, there are a lot of things we can tweak about these plots. Now, there’s a high-level way of doing that: matplotlib provides a series of predefined plot styles.
And they all have names. The ones that start with seaborn are very popular, so let me show you a quick example of that. So I’m going to set up a context where this one plot will be plotted in the seaborn style.
So again, we’re less interested in the actual visualization in this case, but we’re plotting an X value that is just going up linearly, and the Y value is a sinusoid. But here now you can see there’s a nice gray background, and where the tick marks are, there are white lines. So it’s a lot easier for us to interpret this plot and work our way back to the axis. Another way to change it is that all the different plot functions can take different parameters.
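A sketch of plotting a sinusoid inside a style context. The exact seaborn style name differs between matplotlib versions, so this example looks one up from `plt.style.available` and falls back to `ggplot`; that lookup is an addition for robustness, not something shown in the talk.

```python
import numpy as np
import matplotlib.pyplot as plt

# Style names vary by matplotlib version, so find one rather than hard-code it.
seaborn_styles = [name for name in plt.style.available if "seaborn" in name]
style = seaborn_styles[0] if seaborn_styles else "ggplot"

x = np.linspace(0, 10, 200)
with plt.style.context(style):     # the style applies only inside this block
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))
    ax.set_xlabel("x")
    ax.set_ylabel("sin(x)")
```

Using the context manager keeps the style local to one plot; `plt.style.use(style)` would change it for the rest of the session instead.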
So we’ll see in the next workbook here, or notebook.
Some of the types of plots that are available, and then functions for annotating them. We can set the title and label the X and Y axes. We can control how many tick marks show up and how regularly they’re spaced. We saw a little bit about colors already. We can specify colors by the single letters r, g, b. We can also give names of colors for a little bit more subtlety. And if you want ultimate control, you can even specify colors by hexadecimal red, green, blue values.
We saw that we can make markers: square boxes, checks, crosses. And we can even control whether our lines are solid, which is the default, or dashed or dot-dash.
Then for some of the plots we’ll be looking at, it’s nice to be able to control what’s called the color map, for plots that show gradation, like a contour plot.
It’s nice if you have a color map that is designed with low values to be, say, really dark and high values, to be really bright, and we’ll show you an example of that.
There’s a lot of flexibility built into the system.
So, as kind of a fun example, I wanted to talk real briefly about a comic on the web.
It’s a web comic called xkcd, which is created by a guy who formerly worked at JPL, and he’s got a lot of things about math and science and philosophy. But a lot of his cartoon strips will have plots in them, and they have a real hand-drawn kind of look to them. The lines are a little bit wavy, the bars aren’t perfect rectangles, and somebody was so impressed by this, they said, well, I’d like to be able to make this sort of plot with this very rigorous matplotlib. So they created a style called xkcd. And here we’re going to plot a histogram of IQ. Now, normally IQs average out at 100.
That’s the whole definition of IQ, but if you’re an xkcd reader, we’re going to have the average IQ be 110.
And when we plot this out, so this is still coming from that same program that draws all these nice solid straight lines. Now we’ve got variation in our text up here, in our labels, the bars and everything.
So this is not particularly relevant to any one particular thing, but it does give you an idea of the configurability of this system.
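The hand-drawn histogram can be approximated with matplotlib’s built-in `plt.xkcd()` context. The mean of 110 comes from the talk; the spread of 15, the sample size, and the bin count are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
iq = rng.normal(loc=110, scale=15, size=1000)   # spread of 15 is an assumption

with plt.xkcd():                 # hand-drawn look inside this block only
    fig, ax = plt.subplots()
    ax.hist(iq, bins=30)
    ax.set_xlabel("IQ")
    ax.set_ylabel("count")
    ax.set_title("IQ of xkcd readers")
```

If the comic-style font is not installed, matplotlib falls back to a default font and emits a warning, but the wavy lines still appear.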
So let’s look at some of the other things that we have available in matplotlib. So this is our second notebook we’re going to look at.
And the first thing we always have to do, like I said, is bring in our capabilities, our matplotlib and our NumPy.
So I’ll execute that cell. And let’s say we have somebody who’s trying to track their results on a diet. So this is just a Python data structure called a list. Maybe they write their weight down every week, or maybe every month; that’s a lot of weight loss if this is every week. So we’re going to plot it with the basic plot, which makes a line.
We’re going to use a red line, and we’re going to put squares after each data point, and the line style is going to be dotted.
So normally, it would just draw a straight line and a color of its choosing, but here we’ve customized it in three different ways. We’ve customized the color.
We’ve customized the Marker, and we’ve customized the line style.
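The three customizations map onto three keyword arguments of `plot`. A small sketch, with made-up weights standing in for the list in the notebook:

```python
import matplotlib.pyplot as plt

weights = [200, 196, 193, 189, 186, 184]   # made-up weigh-ins

fig, ax = plt.subplots()
# color, marker and linestyle are set independently of one another.
ax.plot(weights, color="red", marker="s", linestyle=":")
ax.set_xlabel("weigh-in")
ax.set_ylabel("weight (lb)")
```

When only one sequence is passed, `plot` uses the positions 0, 1, 2, ... as the X values.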
For some types of presentation, a scatterplot is more appropriate than a line plot. We don’t necessarily want to connect the dots with a line.
We just want to show the actual data points separately.
So here I’ve plotted out some mythical data of ages versus yearly working income. So, when you’re five, you typically don’t have a job.
As you get older, you start making more money.
So what we want to do is just start off and just plot the age versus the income, and see what that looks like.
So pretty typical scatterplot, as we go along. You know, you hit your stride in your twenties, or thirties, you get really good at your career. Maybe you change careers and you have to take a pay cut, and you work your way up and then you get into retirement, you’re working less, and you don’t have as much income from a regular job. Hopefully you’ve got some good stock investments and stuff going on, but we can actually add a little bit more to this.
So one of the things I want to show you is, I can actually do what we call parameterized markers.
Instead of just putting a marker at every point, I’m actually going to say that the width of the marker is going to be proportional to the number.
So here as the marker gets bigger and bigger, the value gets larger and larger. So even something very simple like this, there’s some tweaks we can make to it. You can also set the color based on something that’s going on.
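A sketch of parameterized markers using `scatter`. The ages and incomes are invented, and the scaling factor for the marker size is an arbitrary choice for readability.

```python
import numpy as np
import matplotlib.pyplot as plt

ages = np.array([5, 15, 25, 35, 45, 55, 65, 75])
income = np.array([0, 2, 35, 60, 50, 80, 70, 20])   # thousands, made up

fig, ax = plt.subplots()
# s is the marker area in points squared, one value per point, so larger
# incomes get larger markers. c colors each point by the same quantity.
ax.scatter(ages, income, s=income * 5 + 10, c=income, cmap="viridis")
ax.set_xlabel("age")
ax.set_ylabel("income (thousands)")
```

Note that `s` controls area rather than width, so doubling `s` does not double the marker’s diameter.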
Another standard kind of visualization is histograms, which take values and put them into bins. For any kind of natural phenomenon, there’s a good chance it’s going to be what we call normally distributed, or have that bell curve kind of shape to it. So I actually looked it up for adults in the US, which they said were over 20. The average US female is 63.5, with a standard deviation of 2.2. And the average US male above 20 is 69.1, and these are in inches, with a standard deviation of 2.5. So again, here we’re using another module from NumPy to produce these normal distributions, the bell curves.
So like I said, NumPy and matplotlib often go hand in hand when we’re doing visualizations.
So I’m going to plot two histograms, one on top of the other.
And label them.
You can see here, just for variety, I set the colors, instead of r, g, b, which are pretty garish colors, to something a little bit more subtle: orchid and steel blue. There’s a bunch of colors set up with names.
But another thing: I passed each of these functions an alpha, because there’s a significant overlap here, and if I didn’t make these somewhat transparent, I would just lose information visually. So I basically made them transparent.
So I can see through the men’s distribution, whose lower values would otherwise cover up some of the higher values in the women’s distribution.
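The overlapping histograms could be sketched like this. The means and standard deviations are the ones quoted in the talk; the sample size, bin count, and alpha value are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=2)
women = rng.normal(loc=63.5, scale=2.2, size=5000)   # heights in inches
men = rng.normal(loc=69.1, scale=2.5, size=5000)

fig, ax = plt.subplots()
# alpha below 1 makes the bars partly transparent so the overlap stays visible.
ax.hist(women, bins=40, color="orchid", alpha=0.6, label="women")
ax.hist(men, bins=40, color="steelblue", alpha=0.6, label="men")
ax.set_xlabel("height (inches)")
ax.legend()
```

Named colors such as `"orchid"` and `"steelblue"` come from the CSS color list that matplotlib recognizes.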
Another favorite is the pie chart.
Not necessarily the most useful visualization. You know, people aren’t really good at visualizing angles, but they’re very popular among managers. In fact, historically, the first pie chart that got wide publication was actually by Florence Nightingale, the nurse, and it was information about cholera in London. And she actually travelled around London and gathered up data about the number of cases and sickness, and discovered that the source of cholera was this one contaminated water place. So the pie chart has a long history to it. In this case, we’re going to say you’re in charge of monitoring a process that has a log file, and it produces information messages.
So normally, they’re like status and info and status, and occasionally there’s something more concerning: error, warning. We might get one that’s critical.
So we’re going to do a little bit of Python work behind the scenes here to take this, break this into a list of messages and then count them.
And then once we count them, we say, hey, pie chart, here’s the number of times we saw each message. We’re going to put some labels on our slices. So the names of the message, and we can go really crazy here. So let me visualize this and show us what else we got here.
So we said shadow equals true. That gives us a nice little grey shadow here. We can even make one slice stand out by saying exploded.
So if the message is critical, we’re going to make it come 20% out from the centre to make it stand out.
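A sketch of the counting and the pie chart. The log text is invented, and using `collections.Counter` for the counting step is an assumption about how the notebook did it.

```python
from collections import Counter
import matplotlib.pyplot as plt

# Made-up stand-in for the log file contents.
log = "status info status info status warning info error status critical"
counts = Counter(log.split())            # message name -> number of occurrences

labels = list(counts)
sizes = [counts[name] for name in labels]
# Push the "critical" slice 20% of the radius away from the centre.
explode = [0.2 if name == "critical" else 0.0 for name in labels]

fig, ax = plt.subplots()
ax.pie(sizes, labels=labels, explode=explode, shadow=True)
```

The `explode` list needs one entry per slice, in the same order as `sizes` and `labels`.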
So, I get a good, quick visualization with a long history. Put it in your paper and impress your friends, right?
Now, up until now, we’ve been visualizing things in one dimension: I put in one number, I get out one number. But there are some phenomena we want to visualize that we would think of as what we call fields, where we put in, say, two numbers, and that produces a result.
So, an example of that might be your standard kind of weather map, you know.
You have weather stations around different locations.
So these are where you have weather offices, but you want to get an idea of what the temperature would look like in other places. So a contour plot would do something where it would try to go through
and, sorry, draw areas where roughly this is 51. The temperature is higher here.
And then it drops off significantly down here to 47, and then raises up a little bit to 51 over here. So we want to take our two-dimensional input, our longitude and latitude, for example, and visualize that as a contour graph.
Lost my mouse cursor, there we go. So the phenomenon I’m going to visualize here as a stand-in is an obscure function called the Himmelblau function. You don’t need to worry about it. But what’s nice is that it gives you a nice variation of a variable over an area.
So I’m going to set up, using, again, Python, some special variables, X and Y.
And in X, each row of this, you could think of as having the corresponding X value.
And then in Y, each row would have the corresponding Y value.
And this is again where we can think about NumPy as treating these as this whole collection as one piece of information.
So, what I’m going to do is, I’m going to run a function on X and Y. It’s basically saying, all right, at position 2, 2, what temperature is it? Let’s say it’s 47.
At Position 4, 3, what temperature is it?
53, and I’m going to compute the temperature at all those locations, and make a nice, what we call a contour plot.
So, let’s look at that one.
Oops. So, let’s define that function, and then scroll down.
So, you see this all the time in, like, temperature maps or altitude distributions. And this is what I was mentioning before about color maps. The color map that I chose here is called hot, and it’s designed specifically for this sort of application, where low values are darker and high values are lighter.
And so we can see this is kind of like a valley. This is a peak, these are little saddle points, and then things kind of slope up here. Now for some of these things it’s actually even more compelling if we use another function called contourf, which fills in the colors.
Now, that might be a little too much, but you can see very clearly, there’s some very low values here, and some high values here. And then they get really high up here towards the yellows.
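A sketch of both contour styles on Himmelblau’s function, f(x, y) = (x² + y − 11)² + (x + y² − 7)². The grid range, resolution, and number of levels are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def himmelblau(x, y):
    """Himmelblau's function, a common two-variable test surface."""
    return (x**2 + y - 11) ** 2 + (x + y**2 - 7) ** 2

xs = np.linspace(-5, 5, 200)
ys = np.linspace(-5, 5, 200)
X, Y = np.meshgrid(xs, ys)     # 2-D grids holding the x and y coordinates
Z = himmelblau(X, Y)           # evaluated at every grid point in one call

fig, (ax_lines, ax_filled) = plt.subplots(1, 2, figsize=(12, 5))
ax_lines.contour(X, Y, Z, levels=20, cmap="hot")             # contour lines only
filled = ax_filled.contourf(X, Y, Z, levels=20, cmap="hot")  # filled bands
fig.colorbar(filled, ax=ax_filled)
```

With the `hot` color map, the dark regions are the valleys of the function and the bright yellow regions are its highest values.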
Now, sometimes, the thing you’re visualizing in two dimensions also produces two values, and that’s going to be an example we’ll use on our final slide here in a minute.
So, let’s think about those same weather stations, but at each location, they tell you what the wind is doing.
So, at each location, instead of having a single value like the temperature.
We have two values, or what we call a vector field, if you want to think about it as a bunch of arrows. So we have the component of the wind in the east-west direction, and the component of the wind in the north-south direction.
So there are actually two values that we want to display in our output, and in that case, since we’re drawing arrows, they call it quiver.
So here’s our longitude and latitude values.
I’m not using real longitudes and latitudes; I’m just picking a number between 0 and 1. And then I have some wind observations at each point. So it’s 15 degrees to the East, 7, or 17 knots to the north. So at each of these locations, there’s a vector that says what the wind is doing at that particular location.
Now, here, we can see a little bit of a trend. The winds are kind of turning towards the south here and then maybe turning back towards the north over here. There’s a lot of things to be filled in here, but we’ll see that in our final example.
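A small sketch of a quiver plot. All positions and wind components below are invented; only the idea of positions between 0 and 1 with an east-west and a north-south component per point comes from the talk.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up station positions, scaled to the range 0..1.
lon = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
lat = np.array([0.2, 0.6, 0.4, 0.8, 0.3])
u = np.array([15, 12, 10, 12, 15])    # east-west wind component
v = np.array([17, 5, -8, -3, 10])     # north-south wind component

fig, ax = plt.subplots()
ax.quiver(lon, lat, u, v)             # one arrow per (lon, lat) position
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_xlabel("longitude (scaled)")
ax.set_ylabel("latitude (scaled)")
```

Each arrow points in the direction of (u, v) and its length reflects the wind speed at that point.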
Now, I have one last one I’m going to go through kind of quickly. Your plots don’t always have to be rectangular. So this last one I want to show you is the idea of plotting on what we call polar coordinates.
So, the two values I put in are not X and Y; instead they’re an angle and the distance from the origin. So the angle and the radius.
So if you have a function where the distance from the origin goes up logarithmically as the angle increases, we get what we call an Archimedes’ logarithmic spiral. So this is the sort of thing you’d find on a sunflower or in a seashell.
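A sketch of a spiral on polar axes. The transcript does not give the formula used in the notebook, so this assumes the common logarithmic-spiral form r = e^(kθ), with a hypothetical growth constant `k`.

```python
import numpy as np
import matplotlib.pyplot as plt

k = 0.15                                   # hypothetical growth constant
theta = np.linspace(0, 6 * np.pi, 1000)    # three full turns
r = np.exp(k * theta)                      # radius grows with the angle

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(theta, r)                          # first argument is the angle, second the radius
ax.set_title("Spiral in polar coordinates")
```

Changing `k` and rerunning the cell changes how tightly the spiral winds.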
The final one I should show you real quick is I’m going to take that same function and make it interactive.
So it turns out, if you plot this function and you change this number here, each time you run it, you can see it.
This number seems to correspond to what the final view looks like. And we can get a little interactivity just by changing this number and rerunning the cell. But let’s say you wanted to explore this a little bit more conveniently, or you wanted to make a notebook, so somebody else could explore it.
The final thing in this example, and this only works in Jupyter Notebook, by the way, so this is one thing that won’t work in a regular program, is that we create what’s called a widget.
The widget is just a slider that we can move around, and as we move it, it updates that function. So it keeps redrawing the graph each time I change this input value.
So there’s a ton of interactivity, just by writing code, but then you can also turn control over to the Notebook itself, and allow people to explore the data even further.
Alright, the final use case I wanted to show you is visualizing some real data from a satellite from NOAA, the weather folks.
And this data comes from a satellite. This is a little bit redundant here; I was talking about pandas.
So the main thing to think about with pandas is that it’s like a spreadsheet. We’ve got rows and columns; our columns are called Series, and our rows are just observations.
So on the website, originally from NOAA, there was a web link you could go to for every day you wanted to get information. You could go to this URL and pull down the satellite data.
Now, sadly, the satellite’s no longer in orbit, and they took the website down, but I saved a few copies of it as a file.
And so let me show you the file real quick that we’re going to be working with.
So, another thing you can do in Jupyter Notebooks is just look at regular files. So this is a very typical example of a large dataset.
It’s in CSV.
It’s got column headers at the beginning. Sometimes you do, sometimes you don’t. And we want to ingest this and turn it into Python data structures we can work with. So let’s see how we can take this text file that’s in comma-separated values and turn it into something we can work with.
So the function we’re going to be using from the pandas module, which is always abbreviated as pd, that’s what everybody calls it, is called read_csv.
And we’re using the most simple version of read_csv. We’re basically saying, hey, look at the file, see what you can figure out. But if you know a lot about your data already, you can tell it, oh, yeah, instead of using commas, they’re actually using colons, and they’re using extra quotes, so filter those out. So we’re seeing just the most minimal capabilities of read_csv by far here. So this brings in that thing that I said we call a data frame.
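As a rough sketch of that step, assuming a small made-up sample in place of the saved satellite file (the column names and values here are placeholders, not the real NOAA data), the default call lets pandas infer the header and column types, while the sep and quotechar arguments handle a file that uses colons and extra quotes instead:

```python
import io

import pandas as pd

# Made-up stand-in for the saved satellite file; names and values are hypothetical.
csv_text = """lat,lon,spd[kt],dir[deg]
-25.0,133.5,12.0,90.0
-26.5,134.0,8.5,180.0
-28.0,134.5,15.0,270.0
-29.5,135.0,6.0,45.0
"""

# Simplest form: let pandas work out the header row and the column types.
df = pd.read_csv(io.StringIO(csv_text))

# The same data delimited by colons, with a single-quoted header field.
alt_text = csv_text.replace(",", ":").replace("lat", "'lat'")
df_alt = pd.read_csv(io.StringIO(alt_text), sep=":", quotechar="'")

print(df.shape)
print(list(df_alt.columns))
```

Both calls should produce the same four columns; with a real file you would pass its path instead of the io.StringIO wrapper.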
And one of the things we can do is select the first three rows of that data frame. This is what we call slicing in Python, and then display just displays them nicely for our visualization purposes.
And this is pretty typical anytime you’re doing exploratory data science. I’ve got this big dataset, I’m just going to bring it in and start looking at it, kind of get a feel for it, see if it needs to be cleaned up. Maybe there are some missing values, maybe I need to relabel things or drop some of the columns.
Another kind of quick hit you can get on your data is to ask a data frame to describe itself. What it will do, for any column that has numeric values in it, is give you some basic statistics. How many observations are there?
What’s the average value?
What are the standard deviation, maximum and minimum values?
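A minimal sketch of those two exploratory steps, assuming a tiny hypothetical data frame rather than the real satellite observations:

```python
import pandas as pd

# Hypothetical values standing in for the satellite observations.
df = pd.DataFrame({
    "lat": [-25.0, -26.5, -28.0, -29.5],
    "lon": [133.0, 133.5, 133.5, 134.0],
    "speed": [12.0, 8.5, 15.0, 6.0],
})

first_rows = df[:3]      # slicing: the first three rows
summary = df.describe()  # count, mean, std, min, quartiles, max per numeric column

print(first_rows)
print(summary.loc[["count", "mean", "std", "min", "max"]])
```

In a notebook, display(first_rows) would render the same slice as a formatted table; print is used here so the sketch also runs as a plain script.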
So for example, this is longitude and latitude.
If I look at the average longitude here, it’s about 133.5 degrees.
So roughly around Australia, it turns out, and we’ll actually end up using this number a little bit later; that’s why I wanted to point it out. Now, one thing I don’t like about this particular data frame, the way it brought it in, is the names of these columns.
The names of these columns have square brackets in them, and those are kind of special characters, and they’re not particularly meaningful to me. So, not a big deal, but what I’m going to do is tidy up my data a little bit.
So I’m going to go to the data frame and say, the columns that you’re using, I don’t like. I want you to change them to these names that I do like, and then I’ll print those out.
So now I can refer to my columns by those names.
Instead of the kind of clunky SPD[KT], I can just call it speed.
KT, by the way, is knots.
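One way to do that renaming, sketched with hypothetical bracketed headers (the exact names in the original file are an assumption here), is to assign a new list to the data frame’s columns attribute:

```python
import pandas as pd

# Hypothetical original headers with square brackets, as a CSV export might have them.
df = pd.DataFrame(
    [[-25.0, 133.5, 12.0, 90.0], [-26.5, 134.0, 8.5, 180.0]],
    columns=["lat[deg]", "lon[deg]", "spd[kt]", "dir[deg]"],
)

# Assigning a list to .columns renames every column at once; order matters.
df.columns = ["lat", "lon", "speed", "angle"]

print(list(df.columns))
print(df["speed"].max())
```

If only a few columns need new names, df.rename(columns={...}) is an alternative that leaves the rest untouched.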
All right, the other thing: I need to do a little bit of massaging of this data before I can plot it. Remember, our quiver plot wanted us to say, for this location, what is the X component and the Y component of the arrow that you want me to draw.
But what I’ve got in my data is this.
I’ve got this angle.
And I’ve got the speed, the length of the arrow.
So this is a classic example. The data is there, but we just need to do a little bit of Python to massage it into the form we need.
So, if I give you this information and I want this information, this is what I need to do. And by the way, traditionally, X and Y are the location, and U and V are the components in the east-west direction and the north-south direction.
So, basically, the computation is this: U is the speed times the cosine of the angle.
And we have to take the angle in degrees and turn it into radians, because all the trig functions work on radians, and then you put in the angle.
So, if you give me these two values, I’ll give you back this value.
Same thing for V, except we do the sine here.
And that’s exactly what we do in our code.
So I treat this collection of thousands of observations as one unit. In one little bit of code, it computes the cosine of all of those values and it multiplies by the speed of all of those values.
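Here is a small sketch of that computation, assuming hypothetical column names speed and angle and made-up values. It follows the formula as described (cosine for U, sine for V); real wind data may use a different direction convention, in which case the formula would need adjusting.

```python
import numpy as np
import pandas as pd

# Hypothetical observations: speed in knots, angle in degrees.
df = pd.DataFrame({
    "speed": [10.0, 10.0, 5.0],
    "angle": [0.0, 90.0, 180.0],
})

# Trig functions expect radians, so convert the whole column first.
theta = np.radians(df["angle"])

# Vectorized: each line operates on every row at once, with no explicit loop.
df["u"] = df["speed"] * np.cos(theta)  # east-west component
df["v"] = df["speed"] * np.sin(theta)  # north-south component

print(df.round(3))
```

The same two lines work unchanged whether the data frame holds three rows or thousands.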
So now we have the data in a form where our scatter plot from before should give us a way of seeing what we’ve got going on here.
So I’m going to take my X and Y values, and I’m going to scatter plot them. And actually, the first plot I’m going to do is not going to use the vectors.
But instead, I’m just going to say, for each X and Y location, I’m going to set the color of a marker there, proportional to the speed. Remember, the speed is the absolute value of the wind. It doesn’t care what direction it’s blowing, just how hard it’s blowing.
And I’m actually calling this from the data frame.
So we’ve been using NumPy data structures to plot before. But you can also use data frame objects to plot.
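A sketch of plotting straight from a data frame, assuming hypothetical lon, lat and speed columns and made-up values. The "Blues" colormap is an arbitrary choice here; it makes faster winds darker.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import pandas as pd

# Hypothetical observations.
df = pd.DataFrame({
    "lon": [133.0, 133.5, 134.0, 134.5],
    "lat": [-25.0, -26.5, -28.0, -29.5],
    "speed": [12.0, 8.5, 15.0, 6.0],
})

# One marker per row, colored by the speed column.
ax = df.plot.scatter(x="lon", y="lat", c="speed", colormap="Blues")
ax.set_title("Wind speed by location")
```

In a Jupyter notebook the matplotlib.use line can be dropped and the figure appears under the cell.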
So this is starting to make sense now that we look at the assembled plot. First of all, we can see really obviously that the satellite is in what we call a polar orbit. So it’s orbiting around the Earth from pole to pole. That’s why we get a slice from north to south. And during the time it observed, it was only able to observe this swath.
The darker the spot is, the faster the wind is blowing in that location. And you’ll also notice that there are some missing observations. That’s because this particular satellite actually works by looking at clouds in the upper atmosphere.
It’ll take a picture, and then take another picture slightly later to see how far the cloud moved.
All right. So we said we’re interested in looking at winds, so let’s do the same thing with a quiver plot.
So at each X and Y location, each longitude and latitude, I want to plot the east-west component of the wind and the north-south component of the wind.
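As a minimal sketch with made-up values, a quiver plot takes the locations followed by the two vector components:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical locations and wind components.
lon = np.array([133.0, 133.5, 134.0, 134.5])
lat = np.array([-25.0, -26.5, -28.0, -29.5])
u = np.array([10.0, 0.0, -5.0, 3.0])   # east-west component
v = np.array([0.0, 10.0, 0.0, -4.0])   # north-south component

fig, ax = plt.subplots()
q = ax.quiver(lon, lat, u, v)  # one arrow per (lon, lat) point
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
```

With thousands of closely spaced points the arrows shrink and overlap, which is the clutter described next.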
All right, well, that’s kind of a mess. If you look closely, you can kind of see there’s a little bit of arrows there, but the arrows are so small that really it’s just basically showing us where it has data. The darker it is, the more arrows there are, but we’re not really getting a whole lot of meaning out of this other than what we saw before, that this is a polar orbit. So, let’s add some context to this.
So, this is where I’m going to bring in another module called Cartopy.
This is a third-party module that’s available for anybody to use. You don’t have to pay for anything. And what it provides us with is the ability to define what we call coordinate systems. So, normally, when we plot stuff, we saw most things in X and Y, but we did see one example of a coordinate system instead of X and Y.
We talked about angle and radius.
So, in Cartopy, we get coordinate systems that give us projections on maps, like you’ve all seen the Mercator projection, where the earth is
sort of peeled off like an orange and the North Pole is kind of spread across the top.
So, let’s start off with the simplest kind of projection, which we call a flat map.
So, I’m going to create a coordinate system called PlateCarree, which is French for “flat square,” basically a flat map, and the map that I’m going to draw is rectangular and the data that I provide is rectangular.
The final thing I’m going to do is focus in on a particular area. You’ll notice there’s a little bit of wraparound over here.
This data over here is actually because longitude and latitude wrap around. So I’m just going to filter this data out.
I’m going to say, only show me values where the longitude, the X value, is greater than zero.
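That filter can be sketched as a boolean mask on a hypothetical lon column (the values below are made up to include a couple of wrapped-around points):

```python
import pandas as pd

# Hypothetical observations, including two wrapped-around points with negative longitude.
df = pd.DataFrame({
    "lon": [133.0, 134.0, -179.5, -178.0, 135.0],
    "lat": [-25.0, -26.5, -40.0, -41.0, -28.0],
    "speed": [12.0, 8.5, 20.0, 18.0, 15.0],
})

# Keep only the rows where the longitude is greater than zero.
east = df[df["lon"] > 0]

print(len(df), "->", len(east))
```

The comparison produces one True or False per row, and indexing with it keeps only the True rows.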
And once I pick those out, and then plot them on a map, now we start really seeing what’s going on with our data, OK?
So, Cartopy is providing all the context. It put the oceans in and the colors, it put in rivers and political boundaries. All those things are being provided by Cartopy, and then we just drop our data on top of it.
And you’ll notice, like I said, there aren’t observations everywhere. There are just observations where the satellite provided them. But we can already see there’s a little bit of something going on here.
There’s a sub-ocean ridge here where the wind’s going north or southeast here, and it’s going southwest here. So that sub-ocean ridge seems to be doing something. There’s a lot of activity over Australia, where the winds are diverging from the center of the continent. There seems to be a trend over here, and there are a lot of data points over here that are a lot smaller.
So let’s try this again, but now we’ll use some of the capabilities of Python to do what we call interpolation.
So I’m going to do a quiver plot again on my axes. But I’m going to tell it to do interpolation.
So basically, interpolation is just saying, I only have so many weather stations, right? I know the wind here.
And I know the wind here, let’s say it’s a lot smaller, but what do I think it is in between?
Well, if I just do linear interpolation, the angle seems to be turning this way, right? So let’s pick about halfway. And this one was, let’s say, a two, and this one is a one, so we’ll call this 1.5. So that’s all we’re going to do here. We’re going to take the data that we have available, and instead of eyeballing it like I did, we’re going to do it computationally.
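The eyeballed example can be written out with NumPy’s interp function. This is only a one-dimensional sketch with two hypothetical stations and made-up angles; it illustrates linear interpolation itself, not whatever gridding the plotting library does internally.

```python
import numpy as np

# Two hypothetical stations along a line, at positions 0 and 1.
positions = np.array([0.0, 1.0])
speeds = np.array([2.0, 1.0])    # the "two" and the "one" from the example
angles = np.array([40.0, 80.0])  # made-up directions in degrees

# Estimate the wind halfway between the two stations.
midpoint = 0.5
speed_mid = np.interp(midpoint, positions, speeds)
angle_mid = np.interp(midpoint, positions, angles)

print(speed_mid, angle_mid)  # 1.5 60.0
```

Interpolating angles this way is fine for a small turn like this one, but it would need more care where directions wrap around from 360 back to 0 degrees.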
All right, so now it’s really clear what’s going on with our data. We’ve got a trend here where the winds are turning to the south and kind of being funneled in along here.
Again, with that sub-ocean ridge, you can see there’s really an influence on the winds, maybe because the shallower water is generating heat.
We see across Australia, the winds not only are kind of funneling out towards the center, call it, but they’re really picking up speed the further south you go.
So one final thing, just kind of a gee-whiz.
We can pick out coordinate systems other than flat maps. And so this last one is what we call a geosynchronous map.
We basically said, assume that we’re in a satellite orbiting the Earth so far out that it sits exactly at the same point over the Earth. That’s where all our communications satellites sit, so that they’re always at the same point over the Earth. And as I said earlier, well, if we want to get the best coverage, we need to tell it what longitude we want that satellite to be looking at.
So we’re going to pick up all the data we had. We saw we had data from north to south.
We’re going to pick the average longitude right in the middle, so that that’s right in the center of our graph here.
So now we can even see, with the curve of the Earth, how that may be affecting our data as well.
So this is just a taste of some of the visualizations you can do in Python. Obviously, we just scratched the surface. But the last thing I did want to say is that there are tons of other modules out there that other people have created for their own purposes, of similar quality and similar capabilities to this Cartopy module.
All free, all open source, all available for you to use. Alright, I’m going to turn it back to Steve, and let him do his thing.
All right. Thank you, Mark, for those demos. That was wonderful. Just a couple of quick things. Before we jump into Q&A, if you’re interested in Python training, we have a number of Python training courses available, and many of these are taught by Mark himself.
So if you are interested in getting deeper into Python and Python for visualizations, check out the training section of our website, and you can find more info about those courses there.
You can also find a recording on our website of a webinar that we did. Move this to the next slide.
So we did a webinar earlier this year on using Python with Power BI. So if that is of interest to you, check out the resource section of the Senturus website, and you can find the details of that webinar from earlier this year.
We do have a whole array of additional resources on our site. We’ve been committed to sharing our expertise for over two decades now. If you visit our resource site, you’ll find a wealth of information in our Knowledge Center, everything from past webinars, tips, and tricks to product comparisons. So, check that out. We also have a few more upcoming webinars. Later this month we have What’s New in Cognos 11.2.3.
So, if you happen to be a Cognos shop, join us for that event to see what’s new in the latest release of IBM Cognos Analytics. We also have a webinar that didn’t make it to this slide, because it was just posted today. On September 29th, we’ll be doing a webinar on Agile Analytics for Cloud Cost Management. Check out the Events section of our website and you’ll see the information for that webinar. And, last but not least, coming up in October, we have Automating Cognos Migrations & Cleanups, if you’re moving away from Cognos or just trying to clean up your Cognos environment, which is pretty common if you’ve been using Cognos for many years.
Couple of quick notes about Senturus and then we’re going to jump to Q and A.
We concentrate on BI modernizations and migrations across the entire BI stack. If you can get to the next slide there for me, Mark, we do a full spectrum of services. We have expertise in Cognos, Tableau, Power BI, Python and Azure, and we are happy to help you out. We particularly shine in hybrid environments. If you need some help with your BI environment, please reach out to us. We’ve been in business over 20 years now.
If you can jump to the next slide, Mark: 1,350-plus clients, over 3,000 projects. We have a long, strong history of success, we have a great team here, and we’re big enough to help you with all your business analytics needs, but are also small enough to pay attention to you.
We’re also hiring right now, we’re looking for a senior Microsoft BI consultant and a management consultant. So, if either of those roles interests you, reach out to us on our website or you can send your resume to us at [email protected]
So with that, we’re going to jump into Q&A. Just have a couple of questions in the panel here.
Does the documentation in the libraries describe how to use the functions?
Yeah, absolutely. Now, one consequence of these being open source is that the documentation may not always be in one central location, but one resource I wanted to point you to is in my second notebook, on matplotlib. I do have some links there for the documentation for matplotlib down towards the bottom.
So just before that fun example about xkcd, I have a couple of links to a bunch of really great tutorials on matplotlib: basic usage, the lifecycle of a plot, colors. Any of the kinds of things I talked about are covered in much more detail here: changing the styles, changing the colors, changing
the markers, having multiple figures on a set of axes. Then, for example, modules like Cartopy, which are modules developed by third-party people, often
will have their own website.
And often typing in Cartopy, or the module you’re interested in, is sufficient unto itself.
And they have really good coverage here of Cartopy. They go, first of all, through that idea of transformations. I have my data in a rectangular format, but I want to plot it on a spherical globe. How do I transform those coordinate systems and work with that? And then things like pandas as well.
Typically, if you type in something like pandas or NumPy, the first link you get is going to take you to the documentation for it. And most of them will have their own website or website system.
And the user guides for these are very, very high quality, particularly pandas and NumPy, which have been around for, you know, going on about 15 years for some of these.
It’s just amazing how generous the people who produce this software are, that they don’t just stop with the software. They also have people who write very thoughtful, very well produced documentation for them.
Next question: Is there a recording of this? And there is. Anyone who has registered for today’s event will receive a link to the recording in the next few days. It is being recorded right now, and then we’ll run through and clean up any noise on the recording. So, you’ll get a link to the recording from today’s presentation.
And I wanted to point out one other resource
I provided here. Even if you don’t have access to Jupyter Notebooks, I’ve provided a PDF of each of the notebooks after they’ve been rendered, and I just turned this into a single document.
And I tried to break it up nicely on pages. So if you want to have something that you can just sit down with at your computer and not necessarily have to run NumPy and matplotlib and something in Jupyter, that’s another resource that’s available, and the notebooks are zipped up together on that resource website.
If anybody else is not able to download from the site, you can use that link. The one question I had, Mark: do you have an introductory resource on visualizations that you would encourage students to take a look at? The book is Scientific programming with Python.
The book covers the idea of NumPy and matplotlib and a little bit of pandas.
There’s another question: any notable differences between using Python and Power BI? And I’ll actually let you field that one, Steve; you’re definitely the Power BI expert on the panel here. What would you say are the differences compared to some of this stuff we saw here, if you’re doing them in Power BI?
I would encourage you to go check out the recording of our previous webinar that I mentioned there, the one on Python with Power BI. One of our talented instructors walked through how you use Python with Power BI.
Yeah, and actually, I did remember one other book that I like: Data Visualization with Python.
This has nice coverage. Unfortunately, it’s pretty expensive, but you can find it as a PDF, because it’s a little bit out of print. So maybe not the most up-to-date one. But that’s what I like.
But it does do a good job of covering some of the stuff I’ve covered here, in more detail.
Thank you everybody for attending our webinar today. If you do have questions or comments, or need more info, contact us at [email protected]