The purpose of this document is to give you some hints for getting started with the warmup assignment. It assumes you have decided to use Google Colab as your development environment. If not, select the appropriate environment below:
- Python using Jupyter Notebooks (Virtual Lab or on your own machine)
- R using RStudio (Virtual Lab or on your own machine)
- (you can also run R in Google Colab)
Overview
Google Colab provides a cloud-based version of interactive Python that is virtually identical to Jupyter Notebook. The advantage of Colab, like all cloud-based applications, is that you can access it with only a browser; you do not need to install any other software. The disadvantage (perhaps) is that you need a Google account (e.g., a Gmail account) to get started.
Steps
Ensure you have a Google account
If you don't have a Google account and want one: simply search "create a google account" and follow the instructions. As always, make sure to pick a strong password and remember it.
Log into Colab
- Open https://colab.research.google.com in a browser
Log in using your Google account. You will get a blank Jupyter-like notebook:
Upload your data to the Google server
All the Python code you write in Colab executes on the remote server (i.e., Google's computer, not the computer you are working on). As such, a copy of your data has to be stored on the Google server. Note: Google deletes your data from its server at the end of a session. So you have to repeat the upload step every time you use Colab (unless you use Google Drive: see below). Your code, however, is not deleted (which is the important thing).
- Ensure you have a copy of the video game data file (Excel) on your local machine. You can download it from the Canvas assignment page if you do not have it. Remember where you save it (e.g., the Downloads folder)
Click the files icon on the far left. It is the little folder at the bottom. Then click the upload icon.
- Colab will give you a file navigation window on your local machine. Navigate to where you saved the data file (e.g., the Downloads folder). Select the file you want to upload to Google.
You should now see the data file on the remote machine. Your code running on Colab can now access your data.
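If you prefer code to clicking, Colab also includes a small helper module, google.colab.files, whose upload() function opens a file picker from a cell. The sketch below is only an illustration and is not needed for the assignment: the function name upload_from_local is made up here, and the try/except is an assumption added so the cell does nothing when run outside Colab.

```python
def upload_from_local():
    """Open Colab's file picker and return the names of the uploaded files.

    Returns an empty list when the code is not running inside Colab.
    """
    try:
        # The google.colab package is only available on Colab's servers.
        from google.colab import files
    except ImportError:
        print("Not running in Colab; nothing was uploaded.")
        return []

    # files.upload() returns a dict mapping each file name to its contents.
    uploaded = files.upload()
    return list(uploaded.keys())
```

Calling upload_from_local() in a Colab cell should save the selected files in the current working directory, which is the same place the upload icon puts them by default.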
Losing your files
It is relatively easy to get lost when using the Colab file system. All you need to know is this: the default location for uploaded data files is /content.
- If you lose your data file, navigate to /content from the root directory.
- If you do not find your data in /content, it may have been deleted after your last session. You can either re-upload it or upload it to your Google Drive account (where it is not deleted). Using Google Drive with Colab is a bit more complex and is likely unnecessary for the warmup assignment. See the instructions below if you are really interested in storing data long term in your Google Drive account.
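If clicking through folders gets tedious, you can also search for the file from a code cell. The following is a minimal sketch using only the Python standard library; the function name find_data_file is hypothetical, and the file name is the one used later in this document.

```python
from pathlib import Path


def find_data_file(filename, root="/content"):
    """Return every path under root whose file name matches filename."""
    root_path = Path(root)
    if not root_path.is_dir():
        # Outside Colab, /content usually does not exist.
        return []
    # rglob searches root and all of its subfolders.
    return sorted(root_path.rglob(filename))


matches = find_data_file("VideoGameSalesData.xlsx")
if matches:
    for path in matches:
        print("Found:", path)
else:
    print("Not found; re-upload the file or check Google Drive.")
```

Note that if your Google Drive is mounted under /content, the search also walks through your Drive folders, which can be slow.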
Working with the data in Python
The Python tutorial provides more details on using the Pandas library to read and manipulate data in an Excel (or other) file.
- Import the Pandas library:
import pandas as pd
- As in Jupyter Notebook, press Shift-Enter to execute the cell and create a new empty cell underneath.
- We already know that the default file upload location is /content. Fortunately, this is also the default working directory for Colab. All files within the default working directory are available without typing in the full pathname. You can confirm the working directory with the "print working directory" command:
pwd
- Use the Pandas library to read the Excel file, assign it to a new dataframe object called (for example) vg, and print the head (the first few rows) of the dataframe:
vg = pd.read_excel("VideoGameSalesData.xlsx")
vg.head()
I picked the name vg for my dataframe because it is short and easy to remember (vg = video game). You can choose any name you want for your Python variables (within certain standard naming conventions, like no spaces or weird characters).
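If you are unsure whether a name is allowed, Python can tell you. The sketch below uses the built-in str.isidentifier() method and the standard keyword module; the helper name is_valid_variable_name and the candidate names are made up for illustration.

```python
import keyword


def is_valid_variable_name(name):
    """Return True if name can be used as a Python variable name."""
    # isidentifier() rejects spaces, punctuation, and leading digits;
    # iskeyword() rejects reserved words such as "class" or "for".
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["vg", "video_games", "video games", "2nd_try", "class"]:
    print(candidate, "->", is_valid_variable_name(candidate))
```

Only vg and video_games pass. The other three fail because of a space, a leading digit, and a reserved word, respectively.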
Descriptive statistics
- Call the Pandas function that summarizes data. Since you do not want to summarize all the columns, first select a series by putting the column name in square brackets:
vg["FirstYearSales (M)"].describe()
- You can round the results by chaining round(n) onto the output of describe(), where n is the desired number of decimal places:
vg["FirstYearSales (M)"].describe().round(2)
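To experiment with these calls without the spreadsheet, you can build a tiny dataframe by hand. The sketch below is only an illustration: the titles and sales figures are invented, and only the column name matches the real data file.

```python
import pandas as pd

# Hypothetical stand-in for the real spreadsheet; these numbers are made up.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "FirstYearSales (M)": [1.234, 2.345, 3.456, 4.567],
})

# Selecting one column gives a Series; describe() summarizes it
# and round(2) keeps two decimal places.
summary = toy["FirstYearSales (M)"].describe().round(2)
print(summary)

# Individual statistics can be read by their label.
print("Mean:", summary["mean"])
```

The result of describe() is itself a Pandas Series, so you can pull out a single statistic (such as the mean or the max) by name instead of reading it off the printed table.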
That is pretty much all you need to do in Python to complete the warmup assignment.
Using your Google Drive
If you have a Google account, you also have a Google Drive account. The free version of Google Drive comes with about 15 GB of storage to store your Gmail messages, photos, and whatever (should you so choose). The advantage of using Google Drive with Colab is that your data files persist between Colab sessions, meaning you do not have to keep uploading your data from your local machine.
Click the "Mount Drive" button:
Since you are logged into Colab with your Google account, Google knows about your Drive. It mounts it as:
/content/drive/MyDrive
I have created subfolders on my Google Drive so yours will look different:
- If you type pwd into a Colab Python cell and hit Shift-Enter, you will see that you are still in the /content folder. Thus, to access your data file on Google Drive, you have to use the path of the mounted drive:
vg = pd.read_excel("drive/MyDrive/Data/VideoGameSalesData.xlsx")
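Drive can also be mounted from a code cell using Colab's google.colab.drive module instead of the button. The sketch below is a hedged illustration: the helper drive_file is hypothetical, the Data subfolder is simply the one from my example above (yours may differ), and the try/except is an assumption added so the cell does nothing when run outside Colab.

```python
from pathlib import PurePosixPath

MOUNT_POINT = "/content/drive"


def drive_file(filename, subfolder="Data"):
    """Build the full path to a file stored in a subfolder of My Drive."""
    return str(PurePosixPath(MOUNT_POINT) / "MyDrive" / subfolder / filename)


try:
    # The google.colab package is only available on Colab's servers.
    from google.colab import drive
    drive.mount(MOUNT_POINT)  # Colab asks for permission the first time.
except ImportError:
    print("Not running in Colab; skipping the Drive mount.")

print(drive_file("VideoGameSalesData.xlsx"))
```

Because drive_file returns a full path starting at the root, passing its result to pd.read_excel works no matter what the current working directory is.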