Warmup hints: Python using Jupyter Notebook

By Michael Brydon, 25 February, 2024

The purpose of this document is to give you some hints to get started with the warmup assignment. It assumes you have decided to use Jupyter Notebook (either in the Virtual Lab or on your own machine) as your development environment. If not, select the appropriate development environment below:

Overview

Using Python in Jupyter Notebook is described in the learning materials assigned for the module (see this video). What follows is a quick summary.

Steps

Ensure you know where your data file is located

You can download the data file (Excel) from the Canvas assignment page if you do not have it. Remember where you save it (e.g., the Downloads folder). If you are working in the Virtual Lab, see the last section below, "Using your Documents folder in the Virtual Lab".
Start Jupyter Notebook and navigate to your preferred working directory (the folder containing your data file and *.ipynb files).

Working with the data in Python

The Python tutorial provides more details on using the Pandas library to read and manipulate data in an Excel (or other) file.

Import the Pandas library
import pandas as pd
In Jupyter Notebook, press Shift-Enter to execute the current cell and move to the next one; if you are in the last cell, a new empty cell is created underneath.
Files in the current working directory can be opened by file name alone, without typing the full pathname. You can confirm the working directory with pwd ("print working directory"):
pwd
Use the Pandas library to read the Excel file, assign it to a new dataframe object called (for example) vg, and print the head (the first few rows) of the dataframe:

vg = pd.read_excel("VideoGameSalesData.xlsx")
vg.head()

I picked the name vg for my dataframe because it is short and easy to remember (vg = video game). You can choose any name you want for your Python variables, as long as it follows Python's naming rules: letters, digits, and underscores only, with no spaces and no leading digit.
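If read_excel fails with a FileNotFoundError, the usual cause is that the notebook's working directory is not the folder that holds the data file. The following is a minimal sketch of a more talkative loader, not something the assignment requires: the helper name load_excel is hypothetical, and the file name is the one used above.

```python
from pathlib import Path

import pandas as pd


def load_excel(filename):
    """Read an Excel file, with a clearer error if it is not where we expect.

    load_excel is a hypothetical helper for illustration, not a Pandas function.
    """
    path = Path(filename)
    if not path.exists():
        # Report both the resolved path and the working directory,
        # so it is obvious where Python actually looked.
        raise FileNotFoundError(
            f"Could not find {path.resolve()} "
            f"(working directory: {Path.cwd()})"
        )
    return pd.read_excel(path)


# In the notebook you would then write:
# vg = load_excel("VideoGameSalesData.xlsx")
# vg.head()
```

If the file lives somewhere else, you can pass its full path to the same function instead of moving the file.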

Descriptive statistics

Call the Pandas describe() method to summarize the data. Since you do not want to summarize all the columns, first select a single column (a Series) by putting the column name in square brackets:

vg["FirstYearSales (M)"].describe()

You can round the results by chaining the round(n) method onto the output of describe(), where n is the desired number of decimal places:

vg["FirstYearSales (M)"].describe().round(2)
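The statistics that describe() bundles together can also be requested one at a time. Here is a small sketch using made-up numbers in place of the real column (the values are placeholders for illustration only and are not taken from VideoGameSalesData.xlsx):

```python
import pandas as pd

# Placeholder values for illustration only; the real column comes from the Excel file.
demo = pd.DataFrame({"FirstYearSales (M)": [1.0, 2.0, 3.0, 4.0]})
sales = demo["FirstYearSales (M)"]

# count, mean, std, min, quartiles, and max in one call
summary = sales.describe().round(2)
print(summary)

# The same statistics are available as separate Series methods.
mean_sales = sales.mean()
median_sales = sales.median()
std_sales = round(sales.std(), 2)  # sample standard deviation (n - 1 denominator)
```

In your own notebook, replace demo with your dataframe (vg in the examples above).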

Visualization

Once you know the name of your dataframe (in my case vg) and the name of the target column (e.g., FirstYearSales (M)), it is a simple matter to use the functionality provided by various graphics libraries to plot the data.

One such library is Seaborn, which can be imported into your notebook with:

import seaborn as sns

Typically these imports are added to the first cell in your notebook; you then need to re-run the cell (Shift-Enter) to load the library.

Histogram: sns.histplot(x=vg["FirstYearSales (M)"])
Boxplot: sns.boxplot(x=vg["FirstYearSales (M)"])

Using your Documents folder in the Virtual Lab

Recall that the /Downloads folder on the Virtual Lab machine might be wiped by IT staff from time to time (due to upgrades, etc.). You may therefore want to store all your files, including the data files, in the /Documents folder, which is mapped to your own network drive provided by SFU IT Services (never wiped, private). Although this is an elegant solution, it adds a bit of complexity because Jupyter does not recognize the network drive: if you start Jupyter from the default Windows link on the Virtual Lab, you will not see /Documents anywhere. As noted in the video tutorial, an easy way around this is to start Jupyter from /Documents, so that it reads and writes to your networked drive by default.

Start the "Anaconda Powershell Prompt". This gives you a terminal window.
Change the drive letter from the default (C:) to your networked drive:
U:
You can check the contents of your networked drive by typing dir (directory).
Change directory (cd) to /Documents:
cd Documents
Fire up Jupyter Notebook from that location:
jupyter notebook
Jupyter will open in the normal way, but it will be rooted in your /Documents folder.
You can shut down Jupyter at any time by pressing Control-C in the terminal window. Make sure your Python notebooks (*.ipynb) are saved first, though.
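Once a notebook is open, you can confirm from a cell where it is running and which files it can see. This is a minimal sketch using only the Python standard library. On the Virtual Lab the printed folder should be inside your Documents folder on the networked drive, but the exact path depends on your setup.

```python
from pathlib import Path

cwd = Path.cwd()  # folder the notebook's Python session is running in
print("Working directory:", cwd)

# List the notebooks and Excel files that can be opened by name alone.
notebooks = sorted(p.name for p in cwd.glob("*.ipynb"))
workbooks = sorted(p.name for p in cwd.glob("*.xlsx"))
print("Notebooks:", notebooks)
print("Excel files:", workbooks)
```

If the data file is missing from the second list, copy it into the printed folder before calling read_excel.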