Plotting Two Data Sets


Let's prepare some more advanced plots. Here, we'll look at adding two columns of data to a single plot, each sharing the same horizontal axis. We will use two external libraries: NumPy and Matplotlib.

import numpy as np
import matplotlib.pyplot as plt

We can grab a small CSV file from the City of New York's open data portal. It contains yearly water consumption values for the last 40 years, along with the city's population.

path = "https://data.cityofnewyork.us/api/views/ia2d-e54m/rows.csv?accessType=DOWNLOAD"
data = np.genfromtxt(path, delimiter=',', names=True)  # the header row becomes the field names
print(data.dtype.names)  # show the (sanitized) column names
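With `names=True`, `np.genfromtxt` reads the header row and turns each column title into a field name, replacing spaces with underscores and dropping characters such as parentheses. That is why the columns are accessed with names like `NYC_ConsumptionMillion_gallons_per_day`. The sketch below reproduces this offline with a small in-memory CSV; the header is assumed to mirror the style of the NYC file, and the numbers are placeholders, not real data.

```python
import io

import numpy as np

# Hypothetical stand-in for the NYC file: same style of header, placeholder values.
csv_text = """Year,New York City Population,NYC Consumption(Million gallons per day),Per Capita(Gallons per person per day)
2000,8000000,1000,125
2001,8200000,1025,125
"""

# names=True takes the field names from the first row; the default dtype is float.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# Spaces become underscores and parentheses are removed from the names.
print(data.dtype.names)

# Each field is a 1-D float array with one entry per data row.
print(data["Year"])
```

If a sanitized name is hard to guess, printing `data.dtype.names` (as above) is the quickest way to find the exact key to use.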

This isn't strictly necessary, but pulling each column into its own array makes it easy to do math on them. For example, we rescale the population to millions of people.

year = data['Year']
NYCPopulation = data['New_York_City_Population']*1e-6
NYCConsumption = data['NYC_ConsumptionMillion_gallons_per_day']
NYCConsumptionPerCapita = data['Per_CapitaGallons_per_person_per_day']
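Because the population is now in millions, dividing consumption (millions of gallons per day) by it gives gallons per person per day directly, since the millions cancel. This is one example of the math these arrays allow, and it can double as a consistency check against the per-capita column. The arrays below are hypothetical placeholders; with the real data you would use `NYCConsumption`, `NYCPopulation`, and `NYCConsumptionPerCapita` instead, and the agreement may only be approximate if the published figures are rounded.

```python
import numpy as np

# Hypothetical placeholder values, not real NYC data.
population = np.array([8_000_000.0, 8_200_000.0])  # people
consumption = np.array([1000.0, 1025.0])           # million gallons per day
reported_per_capita = np.array([125.0, 125.0])     # gallons per person per day

# Rescale population to millions, as in the tutorial.
population_millions = population * 1e-6

# (million gallons/day) / (million people) = gallons per person per day
per_capita = consumption / population_millions

# Element-wise comparison with a small tolerance for floating-point error.
print(per_capita)
print(np.allclose(per_capita, reported_per_capita))
```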

If we just wanted a simple plot of one of the columns over time, we could do this.

fig, ax = plt.subplots()
ax.scatter(year, NYCConsumption, s=8, color='steelblue')
ax.set_xlabel('Year')
ax.set_ylabel('Million gallons per day')
ax.set_title('NYC Water Consumption')
ax.grid()
plt.show()
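If you want the same kind of scatter plot for several columns, one option is to wrap it in a small function. `plot_column` below is a hypothetical helper, not part of Matplotlib, and the data are placeholders. The sketch selects the non-interactive Agg backend only so it runs without a display; leave that line out in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_column(x, y, ylabel, title, color="steelblue"):
    """Hypothetical helper: scatter one column against the year."""
    fig, ax = plt.subplots()
    ax.scatter(x, y, s=8, color=color)
    ax.set_xlabel("Year")
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    ax.grid()
    return fig, ax


# Placeholder data for illustration only.
year = np.arange(2000, 2005)
per_capita = np.array([125.0, 124.0, 122.0, 121.0, 119.0])

fig, ax = plot_column(year, per_capita, "Gallons per person per day",
                      "NYC Per-Capita Water Consumption")
# In a notebook or interactive session, call plt.show() here.
```

Returning `fig` and `ax` lets you keep customizing the plot after the helper has drawn the basics.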

It might be more interesting, though, to see how water consumption compares with population. Because the two quantities have different units and very different magnitudes, we need two Axes objects in the same figure, each with its own vertical axis but sharing the horizontal (year) axis:

fig, ax1 = plt.subplots(figsize=(8, 5))

# First axes: consumption, drawn as a bar chart (one bar per year)

ax1.set_xlabel('Year')
ax1.set_ylabel('Gallons per day [Millions]', color="steelblue")
ax1.bar(year, NYCConsumption, width=1, edgecolor="white", linewidth=0.7, color="steelblue",label='Gallons per Day')

ax1.tick_params(axis='y', labelcolor="steelblue")
ax1.set_title('NYC Water Consumption and Population')

# instantiate a second axes that shares the same x-axis
ax2 = ax1.twinx()  

# The second axes can be a regular line plot with markers

ax2.set_ylabel('Population [Millions of People]', color='darkred')  # we already handled the x-label with ax1
ax2.plot(year, NYCPopulation, color='darkred', marker='o', markersize=4, label='Population')
ax2.tick_params(axis='y', labelcolor='darkred')

# otherwise the right y-label is slightly clipped
fig.tight_layout()  

# Add a figure-level legend built from the label arguments of the bar and line plots,
# with its lower-right corner anchored at (0.8, 0.2) in figure coordinates

fig.legend(loc='lower right', bbox_to_anchor=(0.8, 0.2))

plt.show()
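`ax1.twinx()` returns a new Axes that shares its x-axis with `ax1` but has an independent y-axis on the right, so changing the x-limits on one changes both. The sketch below checks this with placeholder data and also shows a common alternative to `fig.legend`: collecting the handles and labels from both Axes with `get_legend_handles_labels()` and drawing a single legend on `ax1`. The values are hypothetical, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data for illustration only.
year = np.arange(2000, 2005)
consumption = np.array([1000.0, 1010.0, 990.0, 980.0, 970.0])  # million gal/day
population = np.array([8.0, 8.05, 8.1, 8.15, 8.2])             # millions of people

fig, ax1 = plt.subplots(figsize=(8, 5))
ax1.bar(year, consumption, width=1, color="steelblue", label="Gallons per Day")

# twinx(): new Axes sharing ax1's x-axis, with its own y-axis on the right.
ax2 = ax1.twinx()
ax2.plot(year, population, color="darkred", marker="o", label="Population")

# Because the x-axis is shared, setting limits on ax1 also updates ax2.
ax1.set_xlim(1999, 2005)

# Alternative to fig.legend(): merge handles from both Axes into one legend.
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
legend = ax1.legend(h1 + h2, l1 + l2, loc="upper left")

print(ax2.get_xlim())
print([t.get_text() for t in legend.get_texts()])
```

The `fig.legend` call in the tutorial positions the legend in figure coordinates; attaching it to `ax1` instead keeps it inside the plotting area, which can be easier to place when the figure size changes.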


This tutorial also exists as a Colab Notebook. You can find it here: Plotting Two Data Sets with Python (Colab)