Introduction to Data Importing


import numpy as np
import matplotlib.pyplot as plt

We can import a comma-separated values (CSV) file and store its data points in an array called data. I like to put data in public repositories on GitHub, since I can then access them from anywhere with no fuss. You can also store files in your Google Drive and use the file system within Colab to load them. Here is the raw file we will start with: Example data csv file
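np.genfromtxt reads a local file path the same way it reads a URL, which is what you would use once a file sits on the Colab file system or in a mounted Google Drive. Below is a minimal sketch, assuming a small stand-in file written to a temporary directory. The file name and its two rows are illustrative, copied from the first rows shown in the comments further down, not the full example file.

```python
import os
import tempfile

import numpy as np

# Write a small stand-in CSV (hypothetical contents) to a temporary local file
tmpdir = tempfile.mkdtemp()
local_path = os.path.join(tmpdir, "free-fall-headers.csv")
with open(local_path, "w") as f:
    f.write("Time,Distance_meters\n1.0,4.9\n2.0,19.6\n")

# The same call works for a local path as for a URL
data = np.genfromtxt(local_path, delimiter=',', names=True)
print(data.shape)  # (2,): one record per data row
```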

# This csv file has headers for the names of the columns, i.e.
# Time | Distance_meters
# 1.0  | 4.9
# 2.0  | 19.6
# etc
# so we set names=True when importing to name the columns automatically

path = "https://raw.githubusercontent.com/hedbergj/CCNY-PHYS37100-F2022/main/example-data/free-fall-headers.csv"
data = np.genfromtxt(path, delimiter=',', names=True)
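With names=True, the header row becomes the field names of a structured array, so it can help to check which names genfromtxt actually picked up before indexing by them. A short sketch, assuming an in-memory stand-in for the file (io.StringIO works anywhere genfromtxt expects a file):

```python
import io

import numpy as np

# Hypothetical stand-in for the example file: header row plus two data rows
csv_text = "Time,Distance_meters\n1.0,4.9\n2.0,19.6\n"
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True)

# The column names read from the header row
print(data.dtype.names)  # ('Time', 'Distance_meters')
```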

We can pull each column out of the imported data into its own array.

time = data['Time']
position = data['Distance_meters']
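Each column pulled out this way is a NumPy array rather than a Python list, so arithmetic applies element by element without a loop. A sketch using the same hypothetical two-row stand-in data:

```python
import io

import numpy as np

csv_text = "Time,Distance_meters\n1.0,4.9\n2.0,19.6\n"  # stand-in data
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True)

time = data['Time']
position = data['Distance_meters']

print(type(time))     # <class 'numpy.ndarray'>
print(time * 2)       # [2. 4.]  (element-wise, no loop needed)
print(time.tolist())  # [1.0, 2.0]  (a plain list, only if other code requires one)
```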

Now make a quick scatter plot of the data in the csv file.

fig, ax = plt.subplots()

ax.scatter(time, position, linewidth=2.0)
ax.set_xlabel('Time [s]')
ax.set_ylabel('Position [m]')
ax.set_title('Position as a function of time')
ax.grid()
plt.show()
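plt.show() displays the figure inline in a notebook. If you run the same code somewhere without a display, or want an image file for a report, one option is to save the figure instead. A sketch, assuming the non-interactive Agg backend, stand-in data values, and a hypothetical output filename:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, opens no window
import matplotlib.pyplot as plt
import numpy as np

# Stand-in values copied from the first rows of the example file
time = np.array([1.0, 2.0])
position = np.array([4.9, 19.6])

fig, ax = plt.subplots()
ax.scatter(time, position, linewidth=2.0)
ax.set_xlabel('Time [s]')
ax.set_ylabel('Position [m]')
ax.set_title('Position as a function of time')
ax.grid()
fig.savefig("free_fall_scatter.png", dpi=150)  # hypothetical output filename
plt.close(fig)  # release the figure once it is saved
```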

# This csv file does not have headers for the names of the columns, i.e.
#
# 1.0  | 4.9
# 2.0  | 19.6
# etc
# so we supply them through the names argument of the import function

path = "https://raw.githubusercontent.com/hedbergj/CCNY-PHYS37100-F2022/main/example-data/free-fall-noheaders.csv"
data = np.genfromtxt(path, delimiter=',', names=['Time','Distance_meters']) 
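Another way to handle a headerless file is to skip names entirely and let genfromtxt hand back the columns as separate arrays with unpack=True. A sketch comparing both approaches on a hypothetical in-memory stand-in for the headerless file:

```python
import io

import numpy as np

# Stand-in for the headerless file: data rows only
csv_text = "1.0,4.9\n2.0,19.6\n"

# Option 1: supply the column names yourself, as above
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     names=['Time', 'Distance_meters'])

# Option 2: no names; unpack=True transposes the result so each column is its own array
time, position = np.genfromtxt(io.StringIO(csv_text), delimiter=',', unpack=True)

print(data['Time'])  # [1. 2.]
print(time)          # [1. 2.]
```

Named fields keep the column labels attached to the data, while unpack=True is shorter when you only need a couple of plain arrays.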

Similarly, we'll make some arrays to store the data.

time = data['Time']
position = data['Distance_meters']
fig, ax = plt.subplots()

ax.scatter(time, position, marker='x')
ax.set_xlabel('Time [s]')
ax.set_ylabel('Position [m]')
ax.set_title('Position as a function of time')
ax.grid()
plt.show()

To convert the position data from meters into feet, one way is to multiply the whole column by the conversion factor.

Let's also make another array of data based on the theoretical prediction that, for an object dropped from rest, position is given by: $$ y = \frac{1}{2}g t^2$$
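As a quick check on the formula, take $g = 9.8\ \text{m/s}^2$ (the value used in the code below) and plug in the first two times listed in the example file:

$$ y(1\ \text{s}) = \tfrac{1}{2}(9.8)(1)^2 = 4.9\ \text{m}, \qquad y(2\ \text{s}) = \tfrac{1}{2}(9.8)(2)^2 = 19.6\ \text{m} $$

These match the first two rows shown in the file comments. Multiplying by the conversion factor of 3.281 ft per meter gives about 16.08 ft and 64.31 ft.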

# convert the measured positions into units of feet
feetpermeter = 3.281
position_ft = data['Distance_meters']*feetpermeter
# analytical prediction y = (1/2) g t^2 with g = 9.8 m/s^2, converted to feet
predicted_ft = 0.5*9.8*feetpermeter*np.square(time)
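The conversion and prediction can be sanity-checked on a couple of known values before plotting. A sketch, assuming stand-in arrays copied from the first two rows of the example file; those two rows happen to sit exactly on the curve, which real measurements generally will not:

```python
import numpy as np

g = 9.8               # m/s^2, the same value used in the prediction above
feetpermeter = 3.281  # ft per m

# Stand-in values copied from the first rows of the example file
time = np.array([1.0, 2.0])
position_m = np.array([4.9, 19.6])

position_ft = position_m * feetpermeter
predicted_ft = 0.5 * g * feetpermeter * np.square(time)

print(np.round(position_ft, 2))                # [16.08 64.31]
print(np.allclose(position_ft, predicted_ft))  # True for these two rows
```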

We can plot the experimental data points as markers, and then use the regular line plot command to plot the theoretical prediction.

fig, ax = plt.subplots()

ax.scatter(time, position_ft, marker='x', color='darkred', label="Exp. Data")
ax.plot(time, predicted_ft, color="darkblue", label="Theory Prediction")
ax.set_xlabel('Time [s]')
ax.set_ylabel('Position [ft]')
ax.set_title('Position as a function of time: Theory vs. Experiment')
ax.grid()
ax.legend()
plt.show()
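ax.plot connects the points it is given with straight segments, so when the measurement times are widely spaced the parabola can look kinked. One option is to evaluate the theory on a fine grid of times spanning the data instead. A sketch, assuming hypothetical measurement times, 200 grid points, and the Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

g = 9.8               # m/s^2
feetpermeter = 3.281  # ft per m

# Hypothetical measurement times, widely spaced on purpose
time = np.array([0.0, 1.0, 2.0, 3.0])

# Evaluate the theory on 200 evenly spaced times over the same range as the data
t_fine = np.linspace(time.min(), time.max(), 200)
predicted_fine_ft = 0.5 * g * feetpermeter * np.square(t_fine)

fig, ax = plt.subplots()
(line,) = ax.plot(t_fine, predicted_fine_ft, color="darkblue", label="Theory Prediction")
ax.set_xlabel('Time [s]')
ax.set_ylabel('Position [ft]')
ax.legend()
plt.close(fig)
```

The measured points would still be drawn with ax.scatter at their own times; only the theory curve uses the finer grid.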


This tutorial also exists as a Colab/Jupyter Notebook. You can find it here: Colab Import Data