
Python Package for Evaluating Regression Models With a Single Line of Code

Utility package for practitioners to evaluate different machine learning algorithms



Ajay Arunachalam


I have always believed in democratizing AI and machine learning, and in spreading knowledge in a way that caters to a broad audience, so that more people can harness the power of AI.

In line with this, I developed the Python package “regressormetricgraphplot”, which aims to help users plot evaluation-metric graphs for widely used regression models with a single line of code, so that the models can be compared at a glance.

The utility also significantly lowers the barrier for practitioners, even beginners, to evaluate different machine learning algorithms on their everyday predictive regression problems.

Before we delve into the package details, let’s understand a few basic concepts in simple, layman’s terms.

In general, a modeling pipeline involves a pre-processing stage, fitting the machine learning algorithms, and then evaluating them. The figure below depicts the modeling steps for ensemble learning as an example. Block A covers data processing: cleaning, wrangling, aggregation, deriving new features, feature selection, and so on.

Blocks B and C depict the ensemble learning itself: the pre-processed data is fed to the individual models in Layer-1, which are evaluated and tuned. Layer-2 takes the predictions from Layer-1 as input, and a voting ensemble scheme is then used to derive the final predictions.

The results are combined by averaging. Finally, Block D shows model evaluation and result interpretation. The data is split into training and testing sets (70:30 ratio).

Three standalone ML algorithms, namely Linear Regression, Random Forest, and XGBoost, were used. All models were built with tuned parameters, and finally a Voting Regressor model was fitted.
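The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the article's actual code: the data is synthetic, the hyperparameters are untuned placeholders, and scikit-learn's GradientBoostingRegressor stands in for XGBoost to avoid an extra dependency.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import (RandomForestRegressor, GradientBoostingRegressor,
                              VotingRegressor)

# Synthetic data in place of a real dataset
X, y = make_regression(n_samples=500, n_features=8, noise=10, random_state=42)

# 70:30 train/test split, as in the pipeline above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Layer-1: standalone models (hyperparameters would normally be tuned)
lr = LinearRegression()
rf = RandomForestRegressor(n_estimators=100, random_state=42)
gb = GradientBoostingRegressor(random_state=42)  # stand-in for XGBoost

# Layer-2: voting ensemble averaging the Layer-1 predictions
ensemble = VotingRegressor([('lr', lr), ('rf', rf), ('gb', gb)])
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)
```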

Fig: Modeling Pipeline Ensemble Learning Example

Different regression metrics were used for evaluation. Let’s discuss each of them, with their formulae and a corresponding simple explanation.
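The five metrics the package reports (r, R², RMSE, RMSRE, and MAPE) can be computed directly with NumPy and scikit-learn. The snippet below uses small made-up arrays purely for illustration:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

y_test = np.array([100.0, 150.0, 200.0, 250.0])  # illustrative true values
y_pred = np.array([110.0, 140.0, 195.0, 260.0])  # illustrative predictions

# Pearson correlation coefficient between true and predicted values
r = np.corrcoef(y_test, y_pred)[0, 1]
# Coefficient of determination (proportion of variance explained)
r2 = r2_score(y_test, y_pred)
# Root mean squared error, in the units of the target
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
# Root mean squared relative error (errors scaled by the true values)
rmsre = np.sqrt(np.mean(((y_test - y_pred) / y_test) ** 2))
# Mean absolute percentage error
mape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100
```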

A voting regressor is an ensemble meta-estimator that fits several base regressors, each on the whole dataset, and then averages their individual predictions to form the final prediction.
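This averaging behavior is easy to verify: with deterministic base models and default (equal) weights, a scikit-learn VotingRegressor’s output matches the plain mean of the base regressors’ predictions. A small sketch, using LinearRegression and Ridge as arbitrary example bases:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import VotingRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=5, random_state=0)

base = [('lr', LinearRegression()), ('ridge', Ridge(alpha=1.0))]
voter = VotingRegressor(base).fit(X, y)

# Average the base models' predictions manually and compare
manual = np.mean([est.fit(X, y).predict(X) for _, est in base], axis=0)
assert np.allclose(voter.predict(X), manual)
```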

Getting Started

Terminal Installation

Shell

$ pip install regressormetricgraphplot

OR

$ git clone https://github.com/ajayarunachalam/RegressorMetricGraphPlot
$ cd RegressorMetricGraphPlot
$ python setup.py install

Notebook

Shell

git clone https://github.com/ajayarunachalam/RegressorMetricGraphPlot.git
cd RegressorMetricGraphPlot/

Then, in your notebook, just replace the line ‘from CompareModels import *’ with ‘from regressioncomparemetricplot import CompareModels’, and follow the rest as demonstrated in the demo example here: https://github.com/ajayarunachalam/RegressorMetricGraphPlot/blob/main/regressormetricgraphplot/demo.ipynb

Installation With Anaconda

If you installed your Python with Anaconda you can run the following commands to get started:

Shell

# Clone the repository
git clone https://github.com/ajayarunachalam/RegressorMetricGraphPlot.git
cd RegressorMetricGraphPlot

# Create a new conda environment with Python 3.6
conda create --name your-env-name python=3.6

# Activate the environment
conda activate your-env-name

# Install conda dependencies
conda install --yes --file conda_requirements.txt

# Install pip dependencies
pip install -r requirements.txt

Code Walkthrough

Python

class CompareModels:
    def __init__(self):
        import pandas as pd
        # One row per metric; each added model becomes a column.
        self._models = pd.DataFrame(
            data=['r', 'R^2', 'RMSE', 'RMSRE', 'MAPE'],
            columns=['Model']
        ).set_index(keys='Model')

    def add(self, model_name, y_test, y_pred):
        import numpy as np
        from sklearn.metrics import r2_score, mean_squared_error
        self._models[model_name] = np.array(
            object=[
                np.corrcoef(y_test, y_pred)[0, 1],  # r
                r2_score(y_true=y_test, y_pred=y_pred),  # R^2
                np.sqrt(mean_squared_error(y_true=y_test, y_pred=y_pred)),  # RMSE
                np.sqrt(np.mean(((y_test - y_pred) / y_test) ** 2)),  # RMSRE
                np.mean(np.abs((y_test - y_pred) / y_test)) * 100  # MAPE
            ]
        )

    @staticmethod
    def R2AndRMSE(y_test, y_pred):
        import numpy as np
        from sklearn.metrics import r2_score, mean_squared_error
        return (r2_score(y_true=y_test, y_pred=y_pred),
                np.sqrt(mean_squared_error(y_true=y_test, y_pred=y_pred)))

    @property
    def models(self):
        return self._models

    @models.setter
    def models(self, _):
        print('Cannot perform such task.')

    def show(self, **kwargs):
        import matplotlib.pyplot as plt
        kwargs['marker'] = kwargs.get('marker', 'X')
        self._models.plot(**kwargs)
        plt.xticks(range(len(self._models)), self._models.index)
        plt.xlabel('')
        plt.axis('auto')
        plt.show()

43

Usage

Python

plot = CompareModels()
plot.add(model_name='Linear Regression', y_test=y_test, y_pred=y_pred)
plot.show(figsize=(10, 5))

Python

# Metrics
CompareModels.R2AndRMSE(y_test=y_test, y_pred=y_pred)

Complete Demo

Comprehensive demonstrations can be found in the Demo.ipynb file. 
