Three Simple Things About Regression That Every Data Scientist Should Know

Understanding these three things will improve how you go about linear and generalized linear modeling

6 min read · Aug 6, 2023

I consider myself more of a mathematician than a data scientist. I can’t bring myself to execute methods blindly, with no understanding of what’s going on under the hood; I have to get deep into the math before I trust the results. That’s a good habit to have, because nowadays it’s very easy to just run models and go home.

A model is only as good as your understanding of it, and I worry that a lot of people run models and simply accept the first output they get. When it comes to regression modeling, one of the most common forms of modeling out there, you’ll be a better data scientist if you understand a few simple things about how these models work and why they are set up the way they are.

1. You are predicting an average — not an actual value

When you run a regression model, you are usually estimating a relationship between the input variables and the mean of the outcome, or some function of that mean. Let’s look at linear regression. When we run a linear regression, we make two very important assumptions about our outcome variable y:

That the possible values of y for any given input variables are distributed around a mean.
That the mean of y has an…
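To see what “predicting an average” means in practice, here is a minimal Python sketch using NumPy on simulated data. The data-generating process (intercept 2, slope 3, normal noise with standard deviation 1.5) and all variable names are illustrative assumptions, not taken from the article. It fits an ordinary least squares line, then compares the fitted value at x = 5 with many fresh outcomes drawn at x = 5.

```python
import numpy as np

# Illustrative assumption: y = 2 + 3x + normal noise with standard deviation 1.5.
rng = np.random.default_rng(seed=42)
n = 5_000
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.5, size=n)

# Ordinary least squares: a design matrix with an intercept column plus x.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

# The fitted line evaluated at x = 5 estimates the MEAN of y when x = 5.
x_new = 5.0
prediction = intercept + slope * x_new

# Draw many fresh outcomes at x = 5 from the same (assumed) process.
y_at_5 = 2.0 + 3.0 * x_new + rng.normal(0.0, 1.5, size=10_000)

# Share of individual outcomes that land within 0.5 of the prediction.
close = np.mean(np.abs(y_at_5 - prediction) < 0.5)

print(f"prediction at x=5:        {prediction:.2f}")
print(f"mean of new y at x=5:     {y_at_5.mean():.2f}")
print(f"range of new y at x=5:    {y_at_5.min():.2f} to {y_at_5.max():.2f}")
print(f"share within 0.5 of fit:  {close:.1%}")
```

The prediction and the average of the new outcomes should both land close to 17, the true mean at x = 5 under this setup, yet only a minority of the individual outcomes fall within 0.5 of the prediction. That gap is the difference between estimating a mean and predicting an actual value.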

Written by Keith McNulty