GalacticBox Anything unexpected

Getting Started with CSV Data Analysis using Python

Resource for this tutorial is available here.


In this tutorial, we're going to learn the basics of data analysis using Python. We'll start by importing a simple CSV file and will then plot the raw data.

While reading this tutorial, you may wonder: what's the point of doing this with Python when I can do it with Excel in just a few seconds? Well, the data we'll practice on is deliberately simple, but imagine working on several hundred megabytes of data in Excel. That would be (trust me) a true nightmare. That's where Python comes in handy: it handles files of that size without trouble, and the scripts are easy to update.

To achieve our goal, we'll use several libraries dedicated to data analysis and presentation.

Data Import

The first step is to import the CSV file. A CSV file is a text file containing data structured in a very simple way. If you open a CSV file, you'll see that values are separated by commas, and right above the first values you'll find the header: this line simply names the columns.

We should make sure our script understands where the header is. As you may imagine, the header will often be the first line, but some CSV files contain other information above it. If the data you're analyzing was generated, say, by an oscilloscope, you can expect this additional information to be present: the first lines will contain the device name and maybe additional acquisition parameters. One approach is simply to remove those lines by hand, and we'll stick with that for now, but at the end of this article, I'll present a straightforward way to do this easily and safely.
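One possible way to skip such extra lines without editing the file by hand is to search for the header row by its column names. The following is only a minimal sketch under stated assumptions: the function name read_csv_after_header, the sample text, and the column names Time and Voltage are all invented for the example.

```python
import csv
import io


def read_csv_after_header(lines, expected_header):
    """Return (header, rows), ignoring every line before the header row.

    `expected_header` is the list of column names we expect to find;
    it is an assumption of this sketch, not something read from the file.
    """
    reader = csv.reader(lines)
    for row in reader:
        if [cell.strip() for cell in row] == expected_header:
            # The reader is now positioned just after the header,
            # so the remaining non-empty rows are the data.
            return expected_header, [r for r in reader if r]
    raise ValueError("header row not found")


# Hypothetical file content: two lines of device information, then the data.
sample = io.StringIO(
    "Device: ExampleScope\n"
    "Sample rate: 1000\n"
    "Time,Voltage\n"
    "0.0,1.5\n"
    "0.1,1.7\n"
)

header, data = read_csv_after_header(sample, ["Time", "Voltage"])
print(header)
print(data)
```

With a real file, the same function could be given the file object returned by open('data1.csv', newline='') instead of the in-memory sample.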

Now, this is what data1.csv looks like:


And this is what data2.csv looks like:


Quite straightforward if you ask me. Let's import data1.csv:

import csv

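data = []

# Open the file and collect every row (each row is a list of strings)
with open('data1.csv', 'r', newline='') as csv_file:
    read_data = csv.reader(csv_file)
    for row in read_data:
        data.append(row)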

At this point, you have created an empty list and used the csv module to read your file, populating the list with every line of data1.csv. If you try data[0], you'll notice it returns a list containing the two column names: the header. Let's keep track of it for later use with: header = data[0]. We can now safely remove the header from the data list and keep only pure data in it: data.pop(0)

If we print data[0] one more time, we get the value on the first line: the header is gone, as expected.
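To make the header handling concrete, here is a small self-contained sketch of the same steps. It reads from an in-memory string instead of data1.csv, and the column names and values are invented for illustration.

```python
import csv
import io

# Invented stand-in for data1.csv (the real file's contents are not shown here).
sample = io.StringIO("Time,Voltage\n0.0,1.5\n0.1,1.7\n")

data = [row for row in csv.reader(sample)]

header = data[0]   # keep the column names for later use
data.pop(0)        # remove the header so only data rows remain

print(header)    # ['Time', 'Voltage']
print(data[0])   # ['0.0', '1.5']
```

Since pop returns the element it removes, header = data.pop(0) would do both steps at once. Note that every value is still a string at this stage.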

You might agree it's not very practical to have the data stored row by row. If we want to plot it, we need it organized by columns so we can plot y against x. There is a nice module known as pandas that will allow us to do this conversion and much more!

Let's fire this:

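import pandas as pd

# csv.reader yields strings, so convert the columns to floats for plotting
# (this assumes every column holds numeric values).
dataframe = pd.DataFrame(data, columns=header).astype(float)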

That's as simple as it gets. Make sure to specify the column names. Here, we simply used our previously created header list. At this point you can go ahead and try: print(dataframe). This will return a summary of your data in the form you'd expect. Notice that the column names are rightly placed, on top.
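Because csv.reader yields every value as a string, the values need to be converted to numbers at some point before plotting. The sketch below, using invented sample rows and column names, shows one way to check the column types and convert them with pandas' to_numeric.

```python
import pandas as pd

# Invented rows, shaped like the output of csv.reader (all strings).
header = ["Time", "Voltage"]
data = [["0.0", "1.5"], ["0.1", "1.7"], ["0.2", "1.6"]]

raw = pd.DataFrame(data, columns=header)
print(raw.dtypes)   # the columns hold text, not numbers

# Convert every column to a numeric type; this raises an error
# if a value cannot be parsed as a number.
numeric = raw.apply(pd.to_numeric)
print(numeric.dtypes)
print(numeric["Voltage"].tolist())
```

If some cells might not be numeric, to_numeric also accepts errors='coerce', which replaces unparseable values with NaN instead of raising.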

The next step is to plot the data using Matplotlib.

Data Plot

import matplotlib.pyplot as plt

figure = plt.figure()
subplot = figure.add_subplot(111)
subplot.plot(dataframe[header[0]].tolist(), dataframe[header[1]].tolist(), color = 'r', label = 'Output voltage')

At this point, a figure containing a single subplot has been created. On this subplot, a red curve labeled 'Output voltage' has been drawn. Let's adjust the presentation a little more:

subplot.legend(loc = 'upper left')
subplot.set_ylabel('Amplitude (V)', fontsize = 15)
subplot.set_xlabel('Time (s)', fontsize = 15)

We should be good to go. To see your masterpiece, simply add plt.show() to show all figures. In our case, there's only one.
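Putting the plotting steps together, a minimal end-to-end sketch might look like the following. The data values and the output file name plot.png are invented for the example, and the Agg backend is selected only so the script also runs on machines without a display; with an interactive backend you would call plt.show() instead of saving to a file.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption of this sketch
import matplotlib.pyplot as plt

# Invented sample values standing in for the two CSV columns.
time_s = [0.0, 0.1, 0.2, 0.3]
voltage_v = [1.5, 1.7, 1.6, 1.8]

figure = plt.figure()
subplot = figure.add_subplot(111)
subplot.plot(time_s, voltage_v, color="r", label="Output voltage")

subplot.legend(loc="upper left")
subplot.set_xlabel("Time (s)", fontsize=15)
subplot.set_ylabel("Amplitude (V)", fontsize=15)

# Write the figure to disk instead of opening a window.
figure.savefig("plot.png")
```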

CSV Data Plot

Want to see something else added? Feel free to contact GalacticBox or leave a comment.