Predictors need not be normally distributed but the errors do.
A place where discrimination is legal.
It’s nice that a lot of people know what the assumptions of a linear regression are, but not many dig deep into why they have to be taken into account. And yes, they are there for mathematical convenience, but I like reasoning more than math. It’s funny: as a data scientist, I’ve never really been ‘good’ at math or even liked it for that matter. So this is an article where I choose to skip all the ugly formulas.
What is Linear Regression?
Linear regression finds the best-fitting straight line relating a predictor (X) to an outcome (Y). It predicts Y from X by choosing the line that minimizes the sum of squared differences between the actual and predicted values.
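To make "minimizing the difference between actual and predicted values" concrete, here is a minimal sketch of ordinary least squares for one predictor, in plain Python. The helper name `fit_line` and the data are made up for illustration; in practice you would probably reach for a library instead.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of X and Y, and variation of X, around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, invented for this example
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
# Residual = actual value minus predicted value
residuals = [y - p for y, p in zip(ys, predicted)]

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
```

The `residuals` list is the thing every assumption further down talks about: it is what is left over once the line has explained as much of Y as it can.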
Linear Regression Assumptions
- Linear Relationship: The relationship between the predictor and outcome is linear.
- Homoscedasticity: The variability of residuals is consistent across all levels of the predictor variable.
- Normality: The residuals are normally distributed.
- Independence of Errors: Residuals are not correlated with one another and are independent of predictor variables.
- Autocorrelation: Residuals should be…
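Several of these assumptions can be eyeballed with a few numbers computed from the residuals. The sketch below is a rough illustration, not a formal test suite: the helper names `variance_ratio` and `skewness` and the residual values are my own assumptions, and the Durbin-Watson statistic is just one common way to look at first-order autocorrelation.

```python
import statistics


def variance_ratio(xs, residuals):
    """Homoscedasticity check: residual variance for high X divided by that for low X."""
    pairs = sorted(zip(xs, residuals))
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[half:]]
    return statistics.pvariance(high) / statistics.pvariance(low)


def skewness(residuals):
    """Crude normality check: a symmetric distribution has skewness near 0."""
    m = statistics.fmean(residuals)
    s = statistics.pstdev(residuals)
    return sum(((e - m) / s) ** 3 for e in residuals) / len(residuals)


def durbin_watson(residuals):
    """Autocorrelation check: values near 2 suggest no first-order autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals, listed in the order the observations were collected
xs = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.3, -0.2, 0.1, -0.4, 0.2, 0.4, -0.3, -0.1]

print("variance ratio (near 1 is good):", round(variance_ratio(xs, residuals), 2))
print("skewness (near 0 is good):", round(skewness(residuals), 2))
print("Durbin-Watson (near 2 is good):", round(durbin_watson(residuals), 2))
```

None of these numbers replaces a residual plot, and the "near 1", "near 0", and "near 2" targets are rules of thumb rather than hard cutoffs.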