Five Massively Misused Words in Data Science
Watch your language if you want to have impact
As we all know, data science as a discipline is very new to our world. This makes it a very exciting field in which to work. But it also creates problems. Today I want to talk about one of those problems that I deal with all the time: using the wrong language to describe data science results or concepts.
Here are five words that I commonly see misused, as well as an explanation of the typical misuses. Hopefully, this will help you become more aware of booby traps in the communication and implementation of data science results.
1. Predictive
OMG, people LOVE the word predictive, don’t they? Since around 2010 when it started to come into fashion, I don’t think I have heard a word get bandied about like the p-word. The biggest misuse I have seen is when it is used to describe any positive result for any variable in any model. Variable x is significant in a linear model, therefore variable x is predictive. That’s quite a jump to make.
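To get a feel for how big that jump can be, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the true slope of 0.05, the sample size of 50,000, the unit-variance noise and the helper names (`fit_simple_ols`, `r_squared`, `simulate`) are all hypothetical, and it uses only the Python standard library rather than any particular modeling package. It fits a one-variable linear model, reports the t-statistic of the slope, then checks how much of the variation in a fresh sample the fitted model actually explains.

```python
import math
import random


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, t_statistic_of_slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares; the residual variance uses n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by an already-fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def simulate(n, true_slope, rng):
    """Draw x ~ N(0, 1) and y = true_slope * x + N(0, 1) noise."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0, 1) for xi in x]
    return x, y


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed setup: a tiny true effect and a large sample.
x_train, y_train = simulate(50_000, 0.05, rng)
x_test, y_test = simulate(50_000, 0.05, rng)

intercept, slope, t_stat = fit_simple_ols(x_train, y_train)
holdout_r2 = r_squared(x_test, y_test, intercept, slope)

print(f"slope = {slope:.3f}, t-statistic = {t_stat:.1f}")
print(f"hold-out R^2 = {holdout_r2:.4f}")
```

With a setup like this, the t-statistic should come out far above the usual cutoff of roughly 2, so variable x is comfortably "significant". The hold-out R² should still be a small fraction of one percent, because the noise variance is about 400 times the variance that x contributes. Whether that counts as "predictive" is exactly the question the p-word tends to skip.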
Variables that have a significant effect in a trained statistical model have only been shown to be predictive on the training sample, and even then their effect might be so minuscule as to be practically irrelevant, so it might be a misrepresentation of reality to…