Lesson 1: Read And Analyse OCI Dataset Using Pandas

Welcome

Welcome to Lesson 1 of the Data Exploration with Python series! In this tutorial, we will explore a public dataset from the Indian Open Government Data Platform that contains Country/Mission-wise OCI details. I am using some of these less explored datasets to mimic a typical work situation, where the data you are given comes from a new domain or contains information you may not have seen before.

We will cover the lesson in the steps below, which introduce some basic commands and functions for analyzing the dataset:

Import the required Python libraries
Read the dataset into a dataframe. Since it is a small file, the dataset is already saved to my GitHub repository.
Analyze the dataset through various commands/functions
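
The three steps above can be sketched roughly as follows. This is a minimal sketch and not the notebook's actual code: the inline sample rows and the column names (Country, Mission, OCI_Issued) are hypothetical stand-ins for the real file, which the lesson reads from the GitHub repository.

```python
import io

# Step 1: import the required libraries.
import pandas as pd

# Hypothetical sample standing in for the real OCI .csv file;
# the column names and values are invented for illustration only.
SAMPLE_CSV = """Country,Mission,OCI_Issued
Australia,Canberra,120
Australia,Sydney,340
Canada,Ottawa,95
"""

# Step 2: read the data into a dataframe. With the real file you would
# pass its path or URL to read_csv instead of a StringIO buffer.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Step 3: take a first look at the data.
print(df.shape)    # (number of rows, number of columns)
print(df.columns)  # column labels
print(df.head())   # first 5 rows by default
```

Reading from an in-memory buffer keeps the sketch self-contained; read_csv accepts a file path or URL in the same position.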

You may use either of these options to run through the lesson:

1. Click on the launch-binder button below. You will get an interactive live JupyterLab notebook that’s created directly from my GitHub repo. You will then be able to start playing around with the data right away!

Note: It takes about 1-2 minutes for the notebook to be ready, since the entire environment is prepared on a server. So, hang in there!

2. Set up the whole environment on your laptop. You may do so by creating a virtual environment, installing the Python libraries needed, and cloning my GitHub repo from https://github.com/hgante/telestreak.git. My notes from How I setup JupyterLab for Data Exploration in Python should come in handy!

I have also embedded the entire Jupyter notebook in a GitHub gist below, for your reference.

Summary

In conclusion, we used some basic commands and functions from the pandas library to read a public dataset containing OCI details. We also performed some basic data transformations and analysis to build familiarity with this dataset.

Data transformations

In addition to the basic commands, I have demonstrated two basic data transformations:

renaming a dataframe column
changing the datatype of a column from object to date
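
As a rough illustration of these two transformations, here is a minimal sketch. The dataframe, its column names, and the date format are hypothetical and are not taken from the OCI dataset; pd.to_datetime is one common way to do the conversion and may differ from what the notebook uses.

```python
import pandas as pd

# Hypothetical data: column names and values are invented for illustration.
df = pd.DataFrame({
    "Mission Name": ["Canberra", "Ottawa"],
    "Date": ["2019-01-31", "2019-02-28"],  # dates stored as plain strings
})

# Rename a column. rename returns a new dataframe by default,
# so the result is assigned back to df.
df = df.rename(columns={"Mission Name": "Mission"})

# Convert the string column to a datetime column.
# The format string is an assumption about how the dates are written.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

print(df.dtypes)
```

Once the column holds real dates, comparisons and date-based filtering work as expected instead of comparing strings.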

Functions for data exploration

Here’s a list of the functions I have used in the tutorial. I highly encourage you to practice reading the documentation for any function or command to learn about its functionality and parameters. In a Jupyter notebook, you can do this by placing your cursor on the function name and hitting Shift + Tab.

read_csv	Reads the .csv file that is saved in this repository into a pandas dataframe
shape	Returns the number of rows and columns in the dataset
columns	Lists the column names
head	Displays the first 5 rows of the dataset (by default)
min	Returns the minimum value along the requested axis
max	Returns the maximum value along the requested axis
sum	Returns the sum of values along the requested axis
mean	Returns the mean of values along the requested axis
describe	Generates summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the dataset
date_range	Generates a fixed-frequency range of dates, which can be used to select data within a particular date range
nunique	Counts the unique entries in the dataset along a particular axis
groupby	Groups rows that share the same values in one or more columns
sort_values	Sorts the values in ascending or descending order

Pandas functions and their descriptions
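
To show how several of the functions listed above fit together, here is a minimal sketch on a small, hypothetical dataframe. The column names and values are invented for illustration and do not come from the OCI dataset.

```python
import pandas as pd

# Hypothetical data; column names and values are invented for illustration.
df = pd.DataFrame({
    "Country": ["Australia", "Australia", "Canada"],
    "Mission": ["Canberra", "Sydney", "Ottawa"],
    "OCI_Issued": [120, 340, 95],
})

# Basic aggregates on a single numeric column.
print(df["OCI_Issued"].min(), df["OCI_Issued"].max())
print(df["OCI_Issued"].sum(), df["OCI_Issued"].mean())

print(df.describe())            # summary statistics for numeric columns
print(df["Country"].nunique())  # number of distinct countries

# Total per country, largest first.
totals = (
    df.groupby("Country")["OCI_Issued"]
    .sum()
    .sort_values(ascending=False)
)
print(totals)

# date_range builds a fixed-frequency DatetimeIndex (daily by default).
days = pd.date_range(start="2019-01-01", end="2019-01-05")
print(days)
```

Chaining groupby, sum, and sort_values is a common pattern for ranking groups by a total.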

To sum up, with these basic commands you should be able to read most csv files into a dataframe. After that, you can start to gain deeper insights into the data.

Most importantly, I hope you enjoyed this tutorial. We welcome comments and examples on the dataset you’ve just explored!
