What Smoking did for Data Science

How survival analysis was born out of necessity

Keith McNulty
6 min read · May 12, 2023

Anyone who, like me, is a fan of the US TV show Mad Men will recall that a substantial thread of the plot and historical context of that outstanding period drama revolves around smoking. In the 1960s, the US medical establishment was pivoting towards the view that smoking was a cancer-causing habit, a hard thing for its members to face up to, since pretty much all of them smoked themselves.

Today, far fewer people smoke habitually, or at all. We are certainly not out of the woods, but 50 years has made a substantial difference. Still, 50 years is a long time, and it’s natural to ask why change takes so long.

One answer is, of course, that smoking is a terribly addictive behavior, and addictive behaviors are the hardest things to change. Another reason, though, is that it takes a long time to build up the evidence that smoking correlates with negative survival outcomes, and even longer to show that smoking causes cancer.

Those of us who work in data science, however, had something to gain from the research around smoking and its health outcomes. It was this research that brought to the fore methods of epidemiological analytics that offer incredible value to us today. It was during the 1960s to 1980s, when the medical establishment took on the giants of tobacco in a long, drawn-out slugfest, that survival analysis stepped up to land a knockout punch.

Keith McNulty

Pure and Applied Mathematician. LinkedIn Top Voice in Tech. Expert and Author in Data Science and Statistics. Find me on LinkedIn, Twitter or keithmcnulty.org