Automating Exploratory Data Analysis Using QuickDA
Using QuickDA for Preprocessing and Manipulating Data
Exploratory data analysis (EDA) covers several tasks: visualizing patterns in the data, analyzing its statistical properties, and preprocessing it. This work can take around 30% of the total project time, and much of that time can be saved by automating it.
Automating exploratory data analysis saves time and effort because we no longer have to write code for every visualization or statistical analysis. An automated tool can also generate a report containing all the visualizations and analyses, and common preprocessing steps can each be done in a single line of code. But how do we automate the process?
QuickDA is the answer. It is an open-source Python library for exploratory data analysis and data manipulation. It is easy to use and performs each operation in a single line of code.
In this article, we will explore some of the functionalities that QuickDA provides.
Let’s get started…
Installing required libraries
We will start by installing QuickDA using pip. The command given below will do that.
!pip install quickda
Importing required libraries
In this step, we will import the required libraries for performing exploratory data analysis.
from quickda.explore_data import *
from quickda.clean_data import *
from quickda.explore_numeric import *
from quickda.explore_categoric import *
from quickda.explore_numeric_categoric import *
from quickda.explore_time_series import *
import pandas as pd
Loading the dataset
The dataset I am using here is the well-known diabetes dataset, which can be downloaded from online sources. You can also use any other dataset.
df = pd.read_csv("Diabetes.csv")
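Before running any automated report, it can help to confirm that the file loaded as expected. The sketch below uses plain pandas on a tiny inline sample. The column names and values are made up for illustration and are not taken from the actual Diabetes.csv.

```python
import io
import pandas as pd

# Hypothetical sample standing in for Diabetes.csv (values are illustrative only)
sample_csv = io.StringIO(
    "Pregnancies,Glucose,SkinThickness,Outcome\n"
    "6,148,35,1\n"
    "1,85,29,0\n"
    "8,183,,1\n"
)

df_sample = pd.read_csv(sample_csv)

# Basic sanity checks: size, column types, and missing values per column
n_rows, n_cols = df_sample.shape
missing_per_column = df_sample.isna().sum()

print(f"{n_rows} rows x {n_cols} columns")
print(df_sample.dtypes)
print(missing_per_column)
```

The same three checks can be run on the real df right after loading it.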
Exploring Statistical Properties
In this step, we will use the explore function of QuickDA, which generates visualizations for analyzing the statistical properties of the data.
Here we will generate an EDA report using Pandas Profiling in the backend.
explore(df, method='profile', report_name='Report')
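For a quick look without generating a full report, similar statistics can be inspected directly with pandas. This is only a rough manual counterpart, not what QuickDA or Pandas Profiling runs internally, and the small DataFrame below is hypothetical.

```python
import pandas as pd

# Hypothetical data standing in for the diabetes dataset
df_demo = pd.DataFrame({
    "glucose": [148, 85, 183, 89],
    "bmi": [33.6, 26.6, 23.3, 28.1],
    "outcome": [1, 0, 1, 0],
})

# Summary statistics (count, mean, std, min, quartiles, max) per numeric column
summary = df_demo.describe()

# Pairwise Pearson correlations between the numeric columns
correlations = df_demo.corr()

print(summary)
print(correlations)
```

A profiling report typically includes tables like these along with plots, so this is a handy way to cross-check a few numbers by hand.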
Preprocessing the data
In this step, we will apply some preprocessing to the data. Each of these functions takes a single line of code.
1. Column name standardization
df = clean(df, method='standardize')
2. Drop columns
df = clean(df, method='dropcols', columns="skinthickness")
3. Remove duplicate rows
df = clean(df, method='duplicates')
4. Fill in missing data
df = clean(df, method='fillmissing')
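To make the four steps above more concrete, here is a minimal pandas sketch of roughly what each one does. It is an illustration under assumptions, not QuickDA's implementation: the exact naming rules and the fill strategy QuickDA applies may differ, and the data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with messy column names, a duplicate row, and a missing value
raw = pd.DataFrame({
    "Glucose": [148.0, 85.0, 85.0, np.nan, 89.0],
    "Skin Thickness": [35, 29, 29, 23, 66],
    "Outcome": [1, 0, 0, 0, 0],
})

# 1. Standardize column names: trim, lowercase, spaces replaced by underscores
cleaned = raw.copy()
cleaned.columns = (
    cleaned.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
)

# 2. Drop a column by name
cleaned = cleaned.drop(columns=["skin_thickness"])

# 3. Remove duplicate rows (keeps the first occurrence)
cleaned = cleaned.drop_duplicates()

# 4. Fill missing numeric values; linear interpolation is an assumption here
cleaned = cleaned.interpolate(method="linear")

print(cleaned)
```

Writing the steps out this way makes it easier to check what a one-line call changed, for example by comparing df.shape and df.columns before and after each clean call.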
Go ahead and try this with different datasets, performing all the operations mentioned above using QuickDA. If you run into any difficulty, please let me know in the response section.
This article is in collaboration with Piyush Ingale.