Automating Exploratory Data Analysis Using QuickDA

Using QuickDA for Preprocessing and Manipulating Data


Himanshu Sharma

3 years ago | 2 min read

Exploratory data analysis consists of several parts, such as visualizing patterns in the data, analyzing its statistical properties, and preprocessing it. This process takes around 30% of the total project time, but much of that time can be saved by automating exploratory data analysis.

Automating exploratory data analysis can save a lot of time and effort because we no longer have to write code for every visualization or statistical analysis. An automated tool can also generate a report containing all the visualizations and analyses. Similarly, each preprocessing step can be done in a single line of code. But how do we automate the process?

QuickDA is the answer to this. It is an open-source Python library for exploratory data analysis and data manipulation. It is easy to use and performs each operation in a single line of code.

In this article, we will explore some of the functionalities that QuickDA provides.

Let’s get started…

Installing required libraries

We will start by installing QuickDA using pip. The command given below will do that.

!pip install quickda

Importing required libraries

In this step, we will import the required libraries for performing exploratory data analysis.

from quickda.explore_data import *
from quickda.clean_data import *
from quickda.explore_numeric import *
from quickda.explore_categoric import *
from quickda.explore_numeric_categoric import *
from quickda.explore_time_series import *
import pandas as pd

Loading the dataset

The dataset that I am using here is the famous diabetes dataset, which can be downloaded from online sources. You can also use any other dataset.

df = pd.read_csv("Diabetes.csv")
df
Source: By Author
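Before exploring, it can help to confirm that the file loaded as expected. The sketch below uses plain pandas on a tiny made-up sample that stands in for Diabetes.csv; the column names and values are assumptions for illustration, not the contents of the actual file.

```python
import io

import pandas as pd

# Made-up three-row sample standing in for Diabetes.csv.
# Column names and values are assumptions, not the article's data.
sample_csv = io.StringIO(
    "Glucose,SkinThickness,BMI,Outcome\n"
    "148,35,33.6,1\n"
    "85,29,26.6,0\n"
    "183,0,23.3,1\n"
)
sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)   # (number of rows, number of columns)
print(sample_df.dtypes)  # type pandas inferred for each column
print(sample_df.head())  # first few rows
```

The same three calls work on the real df once it has been read from disk.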

Exploring Statistical Properties

In this step, we will use the explore function of QuickDA, which generates a visualization for analyzing the statistical properties of the data.

explore(df)
Statistical Properties (Source: By Author)
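To get a feel for what such a summary contains, here is a rough plain-pandas sketch of a per-column summary table. The helper name summarize and the choice of statistics are assumptions made for illustration; this is not QuickDA's implementation, and its output may be organized differently.

```python
import pandas as pd


def summarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Build a per-column summary table (hypothetical helper, not QuickDA code)."""
    summary = pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "count": frame.count(),         # non-missing values
            "missing": frame.isna().sum(),  # missing values
            "unique": frame.nunique(),      # distinct non-missing values
        }
    )
    # Basic statistics exist only for numeric columns; other rows get NaN.
    numeric = frame.select_dtypes(include="number")
    summary["mean"] = numeric.mean()
    summary["std"] = numeric.std()
    summary["min"] = numeric.min()
    summary["max"] = numeric.max()
    return summary


# Made-up data for illustration only.
demo = pd.DataFrame(
    {"glucose": [148, 85, None, 89], "label": ["a", "b", "b", "a"]}
)
summary = summarize(demo)
print(summary)
```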

EDA Report

Here we will generate an EDA report, which uses Pandas Profiling in the backend.

explore(df, method='profile', report_name='Report')
EDA Report (Source: By Author)

Data Preprocessing

In this step, we will apply some preprocessing to the data. Each of these functions needs only a single line of code.

1. Column name standardization

df = clean(df, method='standardize')
df
Source: By Author
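For comparison, column name standardization can be sketched in plain pandas as follows. The rules used here (trim, lowercase, whitespace to underscore) are an assumption about what standardizing usually means; QuickDA's exact rules may differ, and standardize_columns is a hypothetical helper name.

```python
import pandas as pd


def standardize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with trimmed, lowercase, underscore-separated column names.

    Hypothetical helper; the rules are assumed, not taken from QuickDA.
    """
    out = frame.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"\s+", "_", regex=True)
    )
    return out


# Made-up column names for illustration only.
demo = pd.DataFrame(columns=["SkinThickness", " Blood Pressure ", "BMI"])
standardized = standardize_columns(demo)
print(list(standardized.columns))
```

Consistent names matter for the later steps, because columns are referred to by their standardized names from here on.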

2. Drop columns

df = clean(df, method='dropcols', columns="skinthickness")
df
Source: By Author
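Note that the column is addressed by its standardized, lowercase name. A plain-pandas sketch of the same idea is shown below on made-up data; filtering the requested names against the columns that actually exist is an extra safety step assumed here, not something the article attributes to QuickDA.

```python
import pandas as pd

# Made-up data for illustration only.
demo = pd.DataFrame(
    {"glucose": [148, 85], "skinthickness": [35, 29], "bmi": [33.6, 26.6]}
)

# "SkinThickness" no longer exists after standardization, so keep only
# the requested names that are really present to avoid a KeyError.
to_drop = ["skinthickness", "SkinThickness"]
present = [name for name in to_drop if name in demo.columns]

trimmed = demo.drop(columns=present)
print(list(trimmed.columns))
```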

3. Remove duplicate rows

df = clean(df, method='duplicates')
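It is often worth checking how many rows are duplicated before removing them. The following is a minimal plain-pandas sketch on made-up data, shown only to illustrate the operation; it is not QuickDA's code.

```python
import pandas as pd

# Made-up data: the third row repeats the first.
demo = pd.DataFrame({"glucose": [148, 85, 148], "bmi": [33.6, 26.6, 33.6]})

# duplicated() marks every repeat of an earlier row as True.
n_dupes = int(demo.duplicated().sum())
print(f"duplicate rows: {n_dupes}")

# Keep the first occurrence of each row and renumber the index.
deduped = demo.drop_duplicates().reset_index(drop=True)
print(deduped)
```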

4. Fill in missing data

df = clean(df, method='fillmissing')
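The article does not say which strategy is used to fill the gaps, so it is a good idea to inspect the result. As one possible approach, and purely as an assumption, the plain-pandas sketch below interpolates numeric columns linearly and fills the remaining columns with their most frequent value. The data is made up.

```python
import pandas as pd

# Made-up data with one gap in each column.
demo = pd.DataFrame(
    {"glucose": [100.0, None, 120.0], "group": ["a", None, "a"]}
)
print(demo.isna().sum())  # missing values per column before filling

filled = demo.copy()

# Numeric columns: linear interpolation between neighbouring rows.
numeric_cols = filled.select_dtypes(include="number").columns
filled[numeric_cols] = filled[numeric_cols].interpolate(
    method="linear", limit_direction="both"
)

# Other columns: fill with the most frequent value, if there is one.
for col in filled.columns.difference(numeric_cols):
    mode = filled[col].mode()
    if not mode.empty:
        filled[col] = filled[col].fillna(mode.iloc[0])

print(filled)
```

Linear interpolation assumes that neighbouring rows are related, which suits ordered data better than shuffled records, so a column mean or median may be a safer choice in other cases.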

Go ahead and try this with different datasets, performing all the operations mentioned above using QuickDA. If you run into any difficulty, please let me know in the response section.

This article is in collaboration with Piyush Ingale.
