# Odds != Probability

Many people use the words ‘odds’ and ‘probability’ interchangeably. They are both terms that imply an estimate of likelihood or chance. That is understandable among laypeople, but I often see data scientists and statisticians mix up these concepts too, which is a shame, because mathematically they mean different things.

Although they are related, odds and probability are very different in scale and meaning. Mixing them up in the wrong context can lead to mistaken estimates of chance, and in turn to erroneous decision making.
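To make the difference in scale concrete, here is a minimal sketch using the standard definition of odds in favor of an event, odds = p / (1 - p). The function names are illustrative and are not taken from the article.

```python
def probability_to_odds(p: float) -> float:
    """Convert a probability p (0 <= p < 1) to odds in favor: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Convert non-negative odds in favor back to a probability: odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Probability is bounded by 1, while odds grow without bound as p approaches 1.
    for p in (0.1, 0.5, 0.8, 0.99):
        print(f"probability {p:.2f} -> odds {probability_to_odds(p):.3f}")
```

A probability of 0.5 corresponds to odds of 1 (often written 1:1), and a probability of 0.8 corresponds to odds of 4, so the two scales agree only loosely, and mostly for rare events.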

In this article, I want to illustrate what those differences are and…

# Your Data Science Work Has No Impact? Here’s Why…

There are a lot of frustrated data scientists out there right now. A number of recent surveys indicate that this field is among the most likely to have discontented employees who feel their work is not valued. In fact, it’s quite likely that you – the reader of this article – are currently unhappy or frustrated in your job or have been at some time in the past.

One thing to consider is whether there is something you can do about it. Sometimes data scientists find themselves systemically disenfranchised, and that is very difficult to change unless…

# Create organization diagrams in a few lines of code (The 5-minute learn)

Organization diagrams are very popular but can be a real headache to create. Manual software for creating them is annoying and time consuming. Wouldn’t it be great if we could just spin them up with a few lines of code?

Well, the good news is that you can, given the right data, because an org diagram is a special type of graph called a tree or dendrogram. A tree is a graph where there is exactly one path between any two vertices. Because we have the tools to work with and visualize graphs in data science languages, we can use these…
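As a rough illustration of that definition, the sketch below checks whether a set of reporting lines forms a tree. It relies on the standard fact that a graph with n vertices is a tree exactly when it is connected and has n - 1 edges. The function name and the example org data are hypothetical, and this is not the plotting approach the article itself goes on to use.

```python
from collections import defaultdict


def is_tree(edges: list[tuple[str, str]]) -> bool:
    """Return True if the undirected graph given by (manager, report) pairs is a tree."""
    vertices = {v for edge in edges for v in edge}
    # A tree on n vertices has exactly n - 1 edges.
    if not vertices or len(edges) != len(vertices) - 1:
        return False

    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Depth-first search from an arbitrary vertex to confirm connectivity.
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(vertices)


# Hypothetical reporting lines, purely for illustration.
org = [("CEO", "CTO"), ("CEO", "CFO"), ("CTO", "Engineer")]
print(is_tree(org))
```

Adding a second reporting line for the same person, such as `("CFO", "Engineer")`, creates two paths between some pairs of vertices, so the structure is no longer a tree.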

# The 5-minute learn: Create pretty and geographically accurate transport maps in R

I’m experimenting with an occasional article called ‘the 5-minute learn’ where I try to teach a useful technique in 5 minutes reading time or less. In this first attempt, we are going to look at how to create a graph of the London Tube network which is geographically accurate (unlike the one we often see). If you have the right data, you should be able to easily apply this technique to other transport networks.

To do this, we will need a data set that we can use to build a graph with the stations as vertices. I found a JSON…

# How I am reversing my diabetes thanks to my data science skills

This is an unusually personal post, but one I feel I have to write to encourage others. Early in 2020, I suspected that I might have issues with my blood glucose levels. I bought one of those inexpensive prick test devices and indeed my readings were higher than they should be. I contacted my doctor to see about a more formal test, but because of the COVID pandemic I was put off until later in the year.

Nevertheless I embarked on a process to reduce my weight. I was told that weight loss is the first step in managing…

# How to explain Machine Learning to a lay person

Advancements in computer technology over the past decades have meant that the collection of electronic data has become more commonplace in most fields of human endeavor. Many organizations now find themselves holding large amounts of data spanning many prior years. This data can relate to people, financial transactions, biological information, and much, much more.

Simultaneously, data scientists have been developing iterative computer programs called algorithms that can look at this large amount of data, analyze it and identify patterns and relationships that cannot be identified by humans. Analyzing past phenomena can provide extremely valuable information about what to expect in…

# Twenty questions to test your R knowledge

While not as popular as Python, R has a strong and growing user base as a programming language, and as an applied statistician it will always be my language of choice. There are many types of R users in my experience. There are those who just scrape by on enough knowledge to finish their stats homework assignments, those who use it on a regular basis but mostly work around convenient data wrangling packages like `dplyr`, and those with a deep knowledge of the language and its underlying structures.

Where do you sit? Here are twenty questions…

# Why choose between R and Python?

R and Python have many similarities and many differences. Most of the underlying concepts of data structures are very similar between the two languages, and there are many data science packages that now exist in both languages. But R is set up in a way that I would describe as ‘data first, application second’, whereas Python feels more application development driven from the outset. JavaScript programmers, for example, would slot into Python a little more quickly than they would slot into R, purely from a syntax and environment management point of view.

More and more I have been working in R…

# The New Native Pipe Operator in R

Version 4.1.0 of R was released on 18th May 2021 and with it comes the new native pipe operator `|>`. The new operator is intended over time to replace the common pipe operator `%>%` that is found in a number of R packages (most notably `magrittr` and `dplyr`) and which is well-known and loved by many.

There has been rising demand for a native pipe among R users, so its advent is not all that surprising. Among the reasons why native pipe is considered an improvement are:

1. Because it is native, it does not depend on any package loading. Many R users…

# Three simple things about regression that every data scientist should know

I’m more of a mathematician than a data scientist. I can’t bring myself to execute methods blindly, with no understanding of what’s going on under the hood. I have to get deep into the math to trust the results. That’s a good thing because it’s very easy nowadays to just run models and go home.

A model is only as good as your understanding of it, and I worry that a lot of people are running models and just accepting the first thing that comes out of them. When it comes to regression modeling — one of the most common…

## Keith McNulty

Expert and Author in Applied Mathematics, Data Science, Statistics and Psychometrics. Find me on LinkedIn or Twitter or drkeithmcnulty.com
