# Three Common Hypothesis Tests All Data Scientists Should Know

## With code examples in R and Python

--

Hypothesis testing is one of the most fundamental elements of inferential statistics. In modern languages like Python and R, these tests are easy to conduct — often with a single line of code. But it never fails to puzzle me how few people use them or understand how they work. In this article I want to use an example to show three common hypothesis tests and how they work under the hood, as well as showing how to run them in R and Python and to understand the results.

## The general principles and process of hypothesis testing

Hypothesis testing exists because it is almost never the case that we can observe an entire population when trying to make a conclusion or inference about it. Almost always, we are trying to make that inference on the basis of a sample of data from that population.

Given that we only ever have a sample, we can never be 100% certain about the inference we want to make. We can be 90%, 95%, 99%, 99.999% certain, but never 100%.

Hypothesis testing is essentially about calculating how certain we can be about an inference based on our sample. The most common process for calculating this has several steps:

1. Assume the inference is not true on the population — this is called the null hypothesis
2. Calculate the statistic of the inference on the sample

--

--

Pure and Applied Mathematician. LinkedIn Top Voice in Tech. Expert and Author in Data Science and Statistics. Find me on LinkedIn, Twitter or keithmcnulty.org