Lesson 1: Read And Analyse OCI Dataset Using Pandas


Welcome

Welcome to Lesson 1 of the Data Exploration with Python series! In this tutorial, we will explore a public dataset from the Indian Open Government Data Platform that contains Country/Mission-wise OCI details. I am using some of these less-explored datasets to mimic a typical work situation, where the data you are given comes from a new domain or contains information you may not have seen before.

We will cover the lesson through the steps below, introducing you to some basic commands and functions for analyzing the dataset:

  1. Import the required Python libraries
  2. Read the dataset into a dataframe. Since it is a small file, the dataset is already saved to my GitHub repository.
  3. Analyze the dataset through various commands/functions
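The three steps above can be sketched roughly as follows. This is only an illustration: the file name or URL of the actual dataset is not given here, so the sketch reads a tiny made-up CSV from an in-memory string, and the column names (`Country`, `Mission`, `As on Date`, `OCI Issued`) are hypothetical stand-ins that will differ from the real file.

```python
# Step 1: import the required libraries.
import io

import pandas as pd

# Hypothetical stand-in for the OCI CSV file.
# The real column names and values will differ.
SAMPLE_CSV = """Country,Mission,As on Date,OCI Issued
USA,New York,2019-01-31,120
USA,Houston,2019-01-31,80
UK,London,2019-01-31,150
Canada,Toronto,2019-01-31,60
"""

# Step 2: read the CSV into a dataframe.
# For a real file, pass its path or URL instead of the StringIO object.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Step 3: take a first look at the data.
print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column names as a plain list
print(df.head())            # first 5 rows by default
```

To try this on the real dataset, replace the `io.StringIO(...)` argument with the path to the CSV file in your clone of the repository.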

You may use either of these options to run through the lesson:

  1. Click on the launch-binder button below. You will get an interactive live JupyterLab notebook that’s created directly from my GitHub repo. You will then be able to start playing around with the data right away!

Binder

Note: It takes about 1-2 minutes for the notebook to be ready, since it prepares the entire environment on a server. So, hang in there!

  2. Set up the whole environment on your laptop. You may do so by creating a virtual environment, installing the Python libraries needed, and cloning my GitHub repo from https://github.com/hgante/telestreak.git. My notes in "How I setup JupyterLab for Data Exploration in Python" should come in handy!


I have also embedded the entire Jupyter notebook in a GitHub gist below, for your reference.

Summary

In conclusion, we used some basic commands and functions from the pandas library to read a public dataset containing OCI details. We also performed some basic data transformations and analysis to build familiarity with this dataset.

Data transformations

In addition to the basic commands, I have demonstrated two basic data transformations:

  • renaming a dataframe column
  • changing the datatype of a column from object to date
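As a minimal sketch of these two transformations, the snippet below uses a small made-up dataframe. The column names (`As on Date`, `OCI Issued`) and the new name `date` are hypothetical, chosen only for illustration, and the notebook may do this differently.

```python
import pandas as pd

# Made-up rows; the column names are hypothetical stand-ins.
df = pd.DataFrame({
    "Mission": ["London", "Toronto"],
    "As on Date": ["2019-01-31", "2019-02-28"],
    "OCI Issued": [150, 60],
})

# Transformation 1: rename a column.
# rename() returns a new dataframe, so assign the result back.
df = df.rename(columns={"As on Date": "date"})

# Transformation 2: convert the text column to a datetime column.
# An explicit format avoids ambiguity between day-first and month-first dates.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

print(df.dtypes)
```

Once the column is a datetime, you can compare it against dates, take its `min` and `max`, or extract parts such as `df["date"].dt.year`.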

Functions for data exploration

Here’s a list of the functions I have used in the tutorial. I highly encourage you to practice reading the documentation for any function or command to learn about its functionality and parameters. In your Jupyter notebook, you can do this by placing your cursor on the function name and hitting Shift + Tab.

Function      Description
read_csv      reads the .csv file that is saved in this repository into a pandas dataframe
shape         returns the number of rows and columns in the dataset
columns       lists the column names
head          displays the first 5 rows of the dataset (by default)
min           returns the minimum value along the requested axis
max           returns the maximum value along the requested axis
sum           returns the sum of values along the requested axis
mean          returns the mean of values along the requested axis
describe      summarizes the dataset with descriptive statistics (count, mean, min, max, and so on)
date_range    generates a sequence of dates at a fixed frequency between a start and an end date
nunique       counts the unique entries along a particular axis
groupby       groups rows that share the same values in one or more columns
sort_values   sorts the values in ascending or descending order
Table: Pandas functions and descriptions
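To see a few of the functions listed above working together, here is a small sketch on made-up data. The column names and numbers are hypothetical and are not taken from the OCI dataset.

```python
import pandas as pd

# Made-up rows; column names and values are hypothetical.
df = pd.DataFrame({
    "Country": ["USA", "USA", "UK", "Canada"],
    "Mission": ["New York", "Houston", "London", "Toronto"],
    "OCI Issued": [120, 80, 150, 60],
})

issued = df["OCI Issued"]

# Basic aggregates on a single column.
print(issued.min(), issued.max(), issued.sum(), issued.mean())

# Descriptive statistics for the numeric columns.
print(df.describe())

# Number of distinct countries.
print(df["Country"].nunique())

# Total per country, largest first.
totals = df.groupby("Country")["OCI Issued"].sum().sort_values(ascending=False)
print(totals)

# date_range builds a sequence of dates; it does not filter existing rows.
days = pd.date_range(start="2019-01-01", periods=3, freq="D")
print(days)
```

Chaining `groupby`, an aggregate such as `sum`, and `sort_values` is a common way to rank categories, for example to find which country accounts for the largest total.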

To sum up, with these basic commands you should be able to read most CSV files into a dataframe. After that, you can dig deeper to gain insights into the data.

Most importantly, I hope that you enjoyed this tutorial. We welcome comments and examples on the dataset you’ve just explored!
