Five Massively Misused Words in Data Science

Watch your language if you want to have impact

Keith McNulty
5 min read · Jan 23, 2024


Photo by Julien L on Unsplash

As we all know, data science as a discipline is very new to our world. This makes it a very exciting field in which to work. But it also creates problems. Today I want to talk about one of those problems which I deal with all the time: using the wrong language to describe data science results or concepts.

Here are five words that I commonly see misused, as well as an explanation of the typical misuses. Hopefully, this will help you become more aware of booby traps in the communication and implementation of data science results.

1. Predictive

OMG, people LOVE the word predictive, don’t they? Since around 2010, when it started to come into fashion, I don’t think I have heard a word get bandied about like the p-word. The biggest misuse I have seen is when it is used to describe any positive result for any variable in any model. Variable x is significant in a linear model, therefore variable x is predictive. That’s quite a jump to make.

Variables that have a significant effect in trained statistical models are only predictive on the training sample, and even then their effect might be so minuscule as to be practically irrelevant, and so it might be a misrepresentation of reality to…
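To make the gap between "significant" and "predictive" concrete, here is a minimal sketch in Python on simulated data. Everything in it is an illustrative assumption rather than a result from a real study: the true slope of 0.05, the sample sizes, the seed, and the helper names `simulate`, `fit_simple_ols` and `r_squared` are all hypothetical. It fits a one-variable linear model on a training sample, computes the t-statistic for the slope, and then checks how much of the variance the fitted model explains on a fresh sample.

```python
import math
import random


def simulate(n, slope, rng):
    """Draw n (x, y) pairs where y = slope * x + standard normal noise."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_simple_ols(xs, ys):
    """Closed-form simple linear regression.

    Returns the intercept, the slope, and the t-statistic of the slope.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance on the training data, then the slope's standard error.
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return intercept, slope, slope / se_slope


def r_squared(xs, ys, intercept, slope):
    """Share of variance in ys explained by an already-fitted line."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst


rng = random.Random(42)   # fixed seed so the sketch is reproducible
TRUE_SLOPE = 0.05         # assumed tiny effect, chosen for illustration

x_train, y_train = simulate(50_000, TRUE_SLOPE, rng)
x_test, y_test = simulate(50_000, TRUE_SLOPE, rng)

intercept, slope, t_stat = fit_simple_ols(x_train, y_train)
test_r2 = r_squared(x_test, y_test, intercept, slope)

print(f"slope t-statistic on training data: {t_stat:.2f}")
print(f"R-squared on unseen data:           {test_r2:.4f}")
```

With a sample this large, the slope's t-statistic sits far beyond the usual significance thresholds, yet the model explains well under 1% of the variance in data it has not seen. Calling x "predictive" on the strength of the significance test alone would overstate what the model can actually do.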


Keith McNulty

Pure and Applied Mathematician. LinkedIn Top Voice in Tech. Expert and Author in Data Science and Statistics. Find me on LinkedIn, Twitter or keithmcnulty.org