Assumptions of Linear Regression

Linear regression is a parametric method, which makes it restrictive: it delivers poor results on data sets that do not fulfill its assumptions. These are the assumptions a linear model needs.


Rutik Bhoyar

3 years ago | 1 min read

Because regression is parametric, it is restrictive: it fails to deliver good results on data sets that don't fulfill its assumptions.
The assumptions are:

  • Linear Relationship
  • No correlation of error terms
  • Constant Variance of error terms
  • No correlation among independent variables
  • Normally distributed error terms

Now let's understand what these terms actually mean...

  1. Linear or Additive Model: If you fit a linear model to a non-linear, non-additive dataset, the regression algorithm fails to capture the trend mathematically, which results in an inefficient model. It also leads to erroneous predictions on unseen data.
  2. No correlation of error terms: The presence of correlation in error terms can drastically reduce the model's accuracy. This usually occurs in time series models, where the next instant depends on the previous instant. If the error terms are correlated, the estimated standard errors tend to underestimate the true standard errors, so confidence intervals come out too narrow.
    • Correlation among the error terms is also known as autocorrelation.
  3. Constant Variance of error terms: The presence of non-constant variance in the error terms is known as heteroskedasticity.
    • Generally, non-constant variance arises in the presence of outliers or extreme leverage values. These values get too much weight and thereby disproportionately influence the model's performance.
    • When this phenomenon occurs, the confidence interval for out-of-sample prediction tends to be unrealistically wide or narrow.
  4. No Correlation among independent variables: This assumption is violated when the independent variables are moderately or highly correlated. In a model with correlated variables, it becomes difficult to find out which variable is actually contributing to predicting the response variable.
    • It also leads to an increase in the standard errors of the coefficients.
    • This is also called multicollinearity.
  5. Normal Distribution: If error terms are non-normally distributed, confidence intervals may become too wide or too narrow.
    • The least-squares coefficients can still be estimated, but once the confidence intervals become unstable, it is difficult to judge how reliable those estimates are, particularly in small samples.
    • A non-normal distribution suggests that there are a few unusual data points that must be studied closely to make a better model.
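
Two of the checks above can be sketched in a few lines of NumPy. This is only an illustration on simulated data, not a method from the article: the helper names (`fit_ols`, `durbin_watson`, `vif`), the AR(1) coefficient of 0.8, and the simulated variables are all assumptions made for the example. The Durbin-Watson statistic is one common way to look for autocorrelated errors, and the variance inflation factor (VIF) is one common way to look for multicollinearity.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and residuals."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta, y - X1 @ beta

def durbin_watson(resid):
    """Near 2: little first-order autocorrelation. Toward 0: positive. Toward 4: negative."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def vif(X):
    """VIF of each column: 1 / (1 - R^2) from regressing that column on the others."""
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, resid = fit_ols(others, X[:, j])
        centered = X[:, j] - X[:, j].mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.1 * rng.normal(size=n)  # almost a copy of x1, so strongly collinear

# Independent errors versus AR(1) errors, where each error depends on the previous one.
e_iid = rng.normal(size=n)
shocks = rng.normal(size=n)
e_ar = np.zeros(n)
for t in range(1, n):
    e_ar[t] = 0.8 * e_ar[t - 1] + shocks[t]

X = np.column_stack([x1, x2])
signal = 1.0 + 2.0 * x1 - 1.0 * x2
_, resid_iid = fit_ols(X, signal + e_iid)
_, resid_ar = fit_ols(X, signal + e_ar)

dw_iid = durbin_watson(resid_iid)
dw_ar = durbin_watson(resid_ar)
vifs = vif(np.column_stack([x1, x2, x3]))

print("Durbin-Watson, independent errors:", round(dw_iid, 2))
print("Durbin-Watson, AR(1) errors:      ", round(dw_ar, 2))
print("VIF for x1, x2, x3:", np.round(vifs, 1))
```

In this setup the autocorrelated errors should push the Durbin-Watson statistic well below 2, and the VIFs for `x1` and `x3` should be far larger than the VIF for `x2`. A VIF above roughly 5 to 10 is a commonly used rule of thumb for problematic collinearity, not a hard threshold.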

These are the assumptions your data should satisfy to get a better linear model.
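
The constant-variance and normality assumptions can also be checked from the residuals. The following is a minimal sketch on simulated data, not the article's own procedure: the helper names (`ols_residuals`, `breusch_pagan_lm`, `jarque_bera`) and the data-generating choices are assumptions made for the example. It uses the studentized Breusch-Pagan statistic (n times the R² from regressing squared residuals on the predictors) for heteroskedasticity and the Jarque-Bera statistic (based on skewness and kurtosis) for non-normality.

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals from a least-squares fit of y on X with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ beta

def breusch_pagan_lm(X, resid):
    """n * R^2 from regressing squared residuals on X.
    Under constant variance it is roughly chi-square with one degree of
    freedom per predictor (5% cutoff for one predictor: 3.84)."""
    u2 = resid ** 2
    aux_resid = ols_residuals(X, u2)
    centered = u2 - u2.mean()
    r2 = 1.0 - (aux_resid @ aux_resid) / (centered @ centered)
    return len(u2) * r2

def jarque_bera(resid):
    """Jarque-Bera statistic; roughly chi-square with 2 degrees of freedom
    under normality (5% cutoff: 5.99)."""
    n = len(resid)
    z = resid - resid.mean()
    s2 = np.mean(z ** 2)
    skew = np.mean(z ** 3) / s2 ** 1.5
    kurt = np.mean(z ** 4) / s2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
signal = 1.0 + 2.0 * x

y_homo = signal + rng.normal(size=n)               # constant error variance
y_hetero = signal + x * rng.normal(size=n)         # error spread grows with x
y_skewed = signal + (rng.exponential(size=n) - 1)  # skewed, non-normal errors

bp_homo = breusch_pagan_lm(x, ols_residuals(x, y_homo))
bp_hetero = breusch_pagan_lm(x, ols_residuals(x, y_hetero))
jb_normal = jarque_bera(ols_residuals(x, y_homo))
jb_skewed = jarque_bera(ols_residuals(x, y_skewed))

print("Breusch-Pagan LM, constant variance:", round(bp_homo, 2))
print("Breusch-Pagan LM, growing variance: ", round(bp_hetero, 2))
print("Jarque-Bera, normal errors:", round(jb_normal, 2))
print("Jarque-Bera, skewed errors:", round(jb_skewed, 2))
```

Large values of either statistic relative to the chi-square cutoff suggest the corresponding assumption is violated. A plot of residuals against fitted values, or a Q-Q plot of the residuals, is a useful visual companion to these numbers.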

Created by

Rutik Bhoyar

Engineer at _VOIS | Talks about Artificial Intelligence, Open Source, Startups