Plotting
Visualization in Python can be performed with a popular library named matplotlib, in particular its sub-package matplotlib.pyplot. Documentation can be found at matplotlib.org.
Plotting in matplotlib is straightforward. Given two N-dimensional vectors x and y, containing the N datapoints we want to represent, the function plot(x, y) will draw on the plane each pair (x_i, y_i) for i = 1, ..., N, and will connect them (by default) with a line. Such a plot can be visualized by calling the function show().
For instance:
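A minimal sketch of such a plot, assuming NumPy is used to build the two vectors (the interval [0, 2π] and N = 100 are illustrative choices, not taken from the original listing):

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample N points on [0, 2*pi]; both N and the interval are assumptions
N = 100
x = np.linspace(0, 2 * np.pi, N)
y = np.sin(x)

# plot() draws each pair (x_i, y_i) and connects consecutive points with a line;
# it returns a list with one Line2D object per plotted line
lines = plt.plot(x, y)

# show() displays the figure
plt.show()
```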
Output:
As you can see, the code above will plot the sine function. We now want to see how we can change the aesthetics of this plot, by adding a title, an axis grid, axis labels, ...
Customizing the plot
In matplotlib, most of the customizations we want to add to the plot must be inserted between the line plt.plot(x, y) and the line plt.show(). The most common customization functions are:
- plt.title(str): Add a title to the plot;
- plt.xlabel(str): Add a label to the x-axis;
- plt.ylabel(str): Add a label to the y-axis;
- plt.grid(): Add an axis grid on the background of the plot;
- plt.xlim([a, b]): Force the horizontal limits of the axis to be a and b;
- plt.ylim([a, b]): Force the vertical limits of the axis to be a and b.
For example, we can customize the plot above to obtain something like this:
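One possible way to combine the functions listed above is sketched here; the title, labels and axis limits are illustrative values chosen for the sine example, not the original ones:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

plt.plot(x, y)

# All customizations go between plt.plot(...) and plt.show()
plt.title("Sine function")      # title of the plot
plt.xlabel("x")                 # label of the x-axis
plt.ylabel("sin(x)")            # label of the y-axis
plt.grid()                      # background grid
plt.xlim([0, 2 * np.pi])        # horizontal limits
plt.ylim([-1.5, 1.5])           # vertical limits

# Keep a handle on the current axes so its settings can be inspected later
ax = plt.gca()

plt.show()
```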
Output:
Multiplot and Line customization
It is also possible to plot more than one line at the same time. Simply define other vectors x', y' containing the new data we want to plot and add another plt.plot(x', y') between plt.plot(x, y) and plt.show().
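A short sketch of a two-line plot, assuming the sine and cosine functions as the two datasets (an illustrative choice). The plt.legend() call receives one name per plt.plot() call, in the same order:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
y1 = np.sin(x)   # first dataset
y2 = np.cos(x)   # second dataset

# Each plt.plot() call adds one line to the same axes
plt.plot(x, y1)
plt.plot(x, y2)

# Names are matched to the lines in the order they were plotted
legend = plt.legend(["sin(x)", "cos(x)"])

plt.show()
```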
Output:
As you can see, in the bottom-left of the plot, we also printed out a legend. Following the code above, it is easy to understand that a legend can be introduced simply by listing the names of the lines, in the same order as the corresponding plt.plot() calls. Matplotlib will match each name to the correct line color accordingly.
We can also modify the line specifications such as the color, the thickness and the style. To do that, we have to insert the following specifications inside the corresponding plt.plot() call:
- color='str': Change the color of the line. A list of all the available colors can be found here;
- linewidth=float: Change the thickness of the line (in points).
Moreover, the style of the line can be modified by adding a format string just after the y input. For example:
- "o": Draws round markers at the points, with no connecting line;
- "--": Changes the linestyle to a dashed line;
- "o-": Changes the linestyle to a continuous line with markers on the points defined by (x, y).
A complete list of all the possible linestyles can be found here.
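The options above can be sketched together as follows; the three curves, the colors and the widths are illustrative assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

# Few points, so that the individual markers remain visible
x = np.linspace(0, 2 * np.pi, 20)

# The format string comes right after y; color and linewidth are keyword arguments
(line1,) = plt.plot(x, np.sin(x), "o-", color="red", linewidth=2)   # line with markers
(line2,) = plt.plot(x, np.cos(x), "--", color="blue", linewidth=1)  # dashed line
(line3,) = plt.plot(x, np.sin(2 * x), "o", color="green")           # markers only

plt.legend(["o-", "--", "o"])
plt.show()
```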
Output:
Subplots
Subplots are used to create a matrix of plots inside the same figure, which can be very useful for comparing several visualizations.
A subplot is created by first defining a figure. This can be done with the line plt.figure(figsize=(w, h)), where the optional figsize argument sets the width and height of the resulting figure (in inches). After that, it is possible to open a subplot with the command plt.subplot(nrow, ncol, idx), where nrow and ncol are the number of rows and the number of columns of our matrix of plots, while idx is an incremental value, starting from 1, that indicates where the next plot should be placed inside the matrix. idx=1 represents the upper-left corner and, as it increases, the position moves from left to right and from top to bottom through the matrix.
Each time we want to open a different plot in our subplot, we have to call plt.subplot(nrow, ncol, idx) again, with the same nrow and ncol arguments, but a different idx.
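As a sketch, a figure with one row and two columns of plots could look like this (the figure size and the plotted functions are illustrative choices):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# Define the figure first; figsize is (width, height) in inches
fig = plt.figure(figsize=(10, 4))

# 1 row, 2 columns, first position (left)
plt.subplot(1, 2, 1)
plt.plot(x, np.sin(x))
plt.title("sin(x)")

# Same nrow and ncol, second position (right)
plt.subplot(1, 2, 2)
plt.plot(x, np.cos(x))
plt.title("cos(x)")

plt.show()
```

Customization functions such as plt.title() apply to the most recently opened subplot, which is why each title is set right after the corresponding plt.subplot() call.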
Output:
Plotting exercise
Exercise: Plotting data
Going back to the example in the introductory post on Numpy where we introduced the library pandas, useful to read data into Python, we can now use matplotlib to visualize it. If you haven't already, download the dataset here and place the resulting .csv file into the same folder as your .py file. Then:
- Import the data into Python;
- Explore the data by visualizing its first rows and its columns (the function data.head() from pandas can be useful), or alternatively, use the data documentation on Kaggle;
- Create a new column, total_date, representing each date as an increasing number, the days since the beginning of the data collection;
- Plot the number of births with respect to total_date to visualize how the number of births evolves over the years;
- Optional: Plot a barplot of the number of births with respect to the day of the week and investigate if there are asymmetries in the number of births on some days of the week.
Example solution
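The sketch below outlines one possible approach to the steps above. It does not read the real dataset: it builds a small synthetic table instead, and the column names "date" and "births" are hypothetical, so they will likely need to be adapted to the actual columns of the .csv file.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the real data (hypothetical columns and random values).
# With the downloaded file, this step would be a pd.read_csv(...) call instead.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "date": pd.date_range("2000-01-01", periods=28, freq="D"),
    "births": rng.integers(9000, 12000, size=28),
})

# Explore the first rows and the columns
print(data.head())

# Days elapsed since the first date in the data
data["total_date"] = (data["date"] - data["date"].min()).dt.days

# Average number of births per day of the week (0 = Monday, ..., 6 = Sunday)
by_weekday = data.groupby(data["date"].dt.dayofweek)["births"].mean()

plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(data["total_date"], data["births"])
plt.title("Births over time")
plt.xlabel("total_date")
plt.ylabel("births")
plt.grid()

plt.subplot(1, 2, 2)
plt.bar(by_weekday.index, by_weekday.values)
plt.title("Average births per day of the week")
plt.xlabel("day of the week (0 = Monday)")
plt.ylabel("births")

plt.show()
```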