# Three Common Hypothesis Tests All Data Scientists Should Know

## With code examples in R and Python

Hypothesis testing is one of the most fundamental elements of inferential statistics. In modern languages like Python and R, these tests are easy to conduct, often with a single line of code. But it never fails to puzzle me how few people use them or understand how they work. In this article I will use an example to show three common hypothesis tests and how they work under the hood, as well as how to run them in R and Python and how to interpret the results.

## The general principles and process of hypothesis testing

Hypothesis testing exists because it is almost never the case that we can observe an entire population when trying to make a conclusion or inference about it. Almost always, we are trying to make that inference on the basis of a sample of data from that population.

Given that we only ever have a sample, we can never be 100% certain about the inference we want to make. We can be 90%, 95%, 99%, 99.999% certain, but never 100%.

Hypothesis testing is essentially about calculating how certain we can be about an inference based on our sample. The most common process for calculating this has several steps:

- Assume the inference is *not true* on the population. This is called the *null hypothesis*.
- Calculate the statistic of the inference on the sample
- Understand the expected distribution of the sampling error around that statistic
- Use that distribution to understand the maximum likelihood of your sample statistic being consistent with the null hypothesis
- Use a chosen ‘likelihood cutoff’, known as *alpha*, to make a binary decision on whether to reject the null hypothesis or fail to reject it. The most commonly used value of alpha is 0.05. That is, we usually reject a null hypothesis if it renders the maximum likelihood of our sample statistic to be less than 1 in 20.
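To make these steps concrete, here is a minimal Python sketch that walks through them for a difference in means between two groups. It is an illustration under stated assumptions rather than one of the three tests covered in this article: the data is simulated (it is not the `salespeople` dataset), the function name `two_sample_z_test` is hypothetical, and the sampling error is approximated with a normal distribution, which is only reasonable for fairly large samples.

```python
import math
import random
from statistics import NormalDist, mean, variance

# Hypothetical, simulated data for two groups (NOT the salespeople dataset)
rng = random.Random(42)
group_a = [rng.gauss(100, 15) for _ in range(200)]
group_b = [rng.gauss(105, 15) for _ in range(200)]


def two_sample_z_test(a, b, alpha=0.05):
    """Large-sample test of the null hypothesis that two population means are equal."""
    # Step 1 (implicit): assume the null hypothesis, i.e. the true difference is zero.

    # Step 2: calculate the statistic on the sample (difference in sample means).
    diff = mean(b) - mean(a)

    # Step 3: estimate the sampling error around that statistic (its standard error).
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))

    # Step 4: under the null, diff / se is approximately standard normal (assumption).
    # The two-sided p-value is the chance of a difference at least this extreme.
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Step 5: compare against alpha to make the binary decision.
    return {"diff": diff, "z": z, "p_value": p_value, "reject_null": p_value < alpha}


result = two_sample_z_test(group_a, group_b)
print(result)
```

The returned `reject_null` flag is the binary decision from the final step: it is `True` only when the p-value falls below the chosen alpha.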

## The salespeople data set

To illustrate some common hypothesis tests in this article I will use the `salespeople` dataset, which can be obtained here. Let’s download it in R and take a quick look at the first few rows.

`url <- "http://peopleanalytics-regression-book.org/data/salespeople.csv"`

salespeople <…