Three Common Hypothesis Tests All Data Scientists Should Know

With code examples in R and Python

Keith McNulty
9 min readJan 10, 2024

--

Hypothesis testing is one of the most fundamental elements of inferential statistics. In modern languages like Python and R, these tests are easy to conduct — often with a single line of code. But it never fails to puzzle me how few people use them or understand how they work. In this article I want to use an example to show three common hypothesis tests and how they work under the hood, as well as showing how to run them in R and Python and to understand the results.

The general principles and process of hypothesis testing

Hypothesis testing exists because it is almost never the case that we can observe an entire population when trying to make a conclusion or inference about it. Almost always, we are trying to make that inference on the basis of a sample of data from that population.

Given that we only ever have a sample, we can never be 100% certain about the inference we want to make. We can be 90%, 95%, 99%, 99.999% certain, but never 100%.

Hypothesis testing is essentially about calculating how certain we can be about an inference based on our sample. The most common process for calculating this has several steps:

  1. Assume the inference is not true on the population — this is called the null hypothesis
  2. Calculate the statistic of the inference on the sample

--

--

Keith McNulty

Pure and Applied Mathematician. LinkedIn Top Voice in Tech. Expert and Author in Data Science and Statistics. Find me on LinkedIn, Twitter or keithmcnulty.org