Probability theory by example (Part 1)

Glenn Henshaw
4 min read · May 18, 2019


A few examples to start with

Source: Unsplash. Artist: Aldric RIVAT

Probability theory is among the most fascinating subjects you can study. It comes with an enormous variety of concrete examples and challenging problems. But if you crack open any graduate-level textbook on probability, you will notice that mathematicians and statisticians rely on an extremely abstract set of constructions as a foundation for probability theory. These rigorous constructions are quite necessary and have a beauty all their own. But I believe abstractions should be introduced only after they have been sufficiently motivated by examples. Otherwise they can seem like “abstract nonsense” to the learner.

Here is my plan for this series of posts:

  • Present the most interesting, concrete examples that fit the topic
  • Introduce new definitions and levels of abstraction only when they are required and motivate them with examples
  • Keep the posts short and example-centered

So let’s begin.

What is probability?

Colloquially, people think of probability as a number assigned to an event that somehow quantifies the likelihood of that event.

There is a 30% chance of rain today.

But what does such a statement really mean? Does it refer to the level of certainty in the mind of the newscaster? Is it saying that if the day were repeated 100 times (like the movie Groundhog Day) then it would rain on 30 of those days? As we dive into more difficult examples, a more rigorous definition of probability will be necessary. For now, let’s just try our best to choose probabilities that reflect the likelihood that an event will occur.

Let’s agree that an event that is impossible should have probability 0, that an event that is certain to occur deserves a probability of 1, and that all other events fall somewhere in between. That is, if E is an event, 0 ≤ P(E) ≤ 1.

Example 1.1 What is the probability of getting heads on a coin flip?

Of course we know P(H) = 1/2, but why? Well, when we flip a coin there is one event that is certain to occur — we will get heads or tails. So P(H or T) = 1. Assuming we are tossing a fair coin, we ought to have P(H) = P(T). But why should their probabilities add to 1? For now we will just argue that H and T are the two equally likely outcomes that make up the certain event H or T, so each should have probability 1/2.
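If you prefer evidence to argument, you can estimate P(H) empirically by simulating many coin flips and counting how often heads comes up. This is a minimal sketch in Python; the fixed random seed is just an assumption to make the run reproducible.

```python
import random

random.seed(0)  # fixed seed so the experiment is reproducible

n = 100_000
# Model each flip as a uniform draw: values below 0.5 count as heads.
heads = sum(1 for _ in range(n) if random.random() < 0.5)

print(f"Estimated P(H): {heads / n:.3f}")
```

With 100,000 flips, the estimate lands very close to 0.5, which is reassuring — though of course a simulation can only suggest the answer, not prove it.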

Example 1.2 Now suppose we are rolling a six-sided die. Consider the following two events. Let’s say the first event, call it L, consists of rolling 4 or less on the die. That is, L = {1, 2, 3, 4}. The second event is H = {4, 5, 6}. Like in the previous example, we are certain that the event L or H will occur. So P(L or H) = 1. Can we use the same argument to show that P(L) = P(H) = 1/2?

Example 1.3 Instead of one coin flip, let’s toss a coin repeatedly and only stop when we get a total of two tails. Let’s look at some possible outcomes. We could get lucky and get two tails right off the bat, TT (tails then tails). Or perhaps we will get a long sequence of heads before we get our two tails, HHHHHHHHHTT. Perhaps we will get a tail near the beginning and have to wait a while to get the next one HTHHHHHHHHHHT.

Even though it is unlikely for us to get say, 500 heads before getting our first tail, it is still a possible outcome. Theoretically there is no limit to the number of heads before getting our first tail. There are an infinite number of possible outcomes to this game. How can we assign probabilities for an infinite number of outcomes?
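We can’t enumerate infinitely many outcomes, but we can still play the game many times and look at how long it typically lasts. This sketch simulates the stop-at-two-tails game; the function name and seed are my own choices for illustration.

```python
import random

random.seed(2)

def flips_until_two_tails():
    """Toss a fair coin until the second tail appears; return the toss count."""
    tails = flips = 0
    while tails < 2:
        flips += 1
        if random.random() < 0.5:  # treat this draw as a tail
            tails += 1
    return flips

lengths = [flips_until_two_tails() for _ in range(100_000)]
print(f"average game length: {sum(lengths) / len(lengths):.2f}")
print(f"longest game observed: {max(lengths)}")
```

Every game lasts at least 2 tosses, the average length hovers around 4, and the longest game in a run of 100,000 is typically well over 20 tosses — long outcomes really do occur, just rarely.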

Example 1.4 Similar to the last example, suppose we are repeatedly tossing a coin. Clearly the probability of getting all heads in your first 10 tosses is very low. But there’s nothing special about getting 10 heads in a row. Every sequence of 10 tosses composed of heads and tails is equally likely. After all, the coin has no memory of past tosses. Read about the gambler’s fallacy for more information on this concept.
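You can check this empirically by generating many 10-toss sequences and comparing the frequency of all-heads against some arbitrary “random-looking” sequence. A sketch; the particular mixed sequence below is an arbitrary choice of mine, not special in any way.

```python
import random

random.seed(3)

def ten_tosses():
    """Return one sequence of 10 fair coin tosses as a string like 'HTHH...'."""
    return "".join(random.choice("HT") for _ in range(10))

n = 200_000
trials = [ten_tosses() for _ in range(n)]

freq_all_heads = trials.count("HHHHHHHHHH") / n
freq_mixed     = trials.count("HTHHTTHTHH") / n  # an arbitrary fixed sequence

print(freq_all_heads, freq_mixed)
```

Both frequencies come out close to 1/2¹⁰ ≈ 0.000977. The all-heads sequence feels special to us, but to the coin it is just one of 1,024 equally likely strings.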

Example 1.5 This example gets a bit philosophical. Suppose your best friend is part of Earth’s first mission to Mars. You are watching the news coverage of the landing. The problem is that there is a 14-minute communication delay for transmissions to and from Mars. You’re watching coverage of the last few minutes of her descent. At that point, is it reasonable to assign a probability to the prospect that she landed safely? How can we? The event already happened. The results are already determined; we are just not in a position to know them.


I described my plan for an example-driven series of posts on probability theory and presented some initial examples. In part 2 we will talk more about the meaning of probability and its various interpretations, and present some more examples. If you read this far, please send me comments, questions, and advice.