Probability theory by example (Part 2)
Outcomes, sample spaces, and events
In part 1 we considered some examples of random experiments and sketched some probability concepts that we will cover more thoroughly later. In particular we decided that the probability of tossing heads in one flip of a coin should be 1/2. This was easy because coin tosses tend to have exactly two equally likely outcomes. Consider the following, less trivial, experiment.
- Randomly select a word from a particular book. What is the probability that the word only appears once in the book?
This example is more difficult than a coin flip. To answer this question it seems we would have to count the number of times each word in the book appears. Generally this approach of counting outcomes is our first step. But there is something very special about this experiment.
In most books, in most languages, roughly half of the distinct words appear only once.
The distribution of word frequencies in books tends to follow a type of probability distribution called Zipf’s Law.
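To make the counting step concrete, here is a minimal Python sketch. The sample sentence and the function name `once_only_stats` are made up for illustration, and the regular expression is only a crude stand-in for real word splitting. It reports two different quantities: the fraction of distinct words that appear once, and the probability that a word picked at random from the running text is one of those words.

```python
from collections import Counter
import re

def once_only_stats(text):
    """Return (fraction of distinct words seen once,
    probability a randomly picked word of the text is seen once)."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    singles = [word for word, c in counts.items() if c == 1]
    frac_distinct = len(singles) / len(counts)
    prob_random_word = len(singles) / len(words)
    return frac_distinct, prob_random_word

# Made-up sample text, standing in for a whole book.
sample = "the cat sat on the mat and the dog sat on the rug"
frac_distinct, prob_random_word = once_only_stats(sample)
print(frac_distinct, prob_random_word)
```

On a real book you would pass in the full text. The two numbers generally differ, because frequent words take up most of the running text.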
Before getting into exotic probability distributions like Zipf’s Law we should work on trying to understand the set of possible outcomes of random processes. Let’s settle on some vocabulary.
- Experiment: any random process we are observing, e.g. rolling a die, randomly selecting a person from a group, and so on
- Outcome: a single possible result of an experiment
- Sample space (Ω): the set of all outcomes of an experiment, denoted by the Greek letter omega
- Event: any collection, or set, of outcomes
Consider the experiment where we roll a 6-sided die and flip a coin. The sample space Ω is the set of all possible outcomes. In this case there are 12 possible outcomes. Each outcome is written as a number followed by a letter: the number is the die result and the letter is the coin result (H for heads, T for tails).
Ω = {1H,2H,3H,4H,5H,6H,1T,2T,3T,4T,5T,6T}.
We use curly braces around the elements of the sample space to indicate that they are members of a set. We will discuss sets later. From the sample space we can define all kinds of events.
A = {2H,3H}, B = {6H,6T}, C = {1H}
D = {all outcomes with a die roll greater than 2}
An event can contain any elements of the sample space, or it can be empty. The sample space Ω itself is an event. Each individual outcome can be thought of as an event. We can define an event without having to list its individual outcomes, like we did for D. It’s a good thing we can do this because sample spaces can get very large, infinite in fact.
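The die-and-coin sample space and its events map naturally onto Python sets. The following is a small sketch; the variable names (`omega`, `empty_event`, and so on) are chosen here for illustration only.

```python
from itertools import product

# Sample space for one die roll plus one coin flip, written like "3H".
omega = {f"{die}{coin}" for die, coin in product(range(1, 7), "HT")}

# Events listed outcome by outcome.
A = {"2H", "3H"}
B = {"6H", "6T"}
C = {"1H"}

# D is defined by a rule instead of a list: die roll greater than 2.
D = {outcome for outcome in omega if int(outcome[0]) > 2}

# The empty set and the whole sample space are events too.
empty_event = set()

# Every event is a subset of the sample space.
events = [A, B, C, D, empty_event, omega]
all_subsets = all(event <= omega for event in events)
print(len(omega), len(D), all_subsets)  # prints: 12 8 True
```

Defining D with a rule is what keeps this approach workable when the sample space is too large to list by hand.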
Events can be infinite as well. Let’s say our experiment is to randomly select any positive integer. The sample space is Ω = {1,2,3,4,…}. We can define an event containing all even integers in the sample space or all prime numbers. In this example we didn’t have to think too hard to decide how to represent possible outcomes. They were just integers. But as the following example shows, sometimes it’s not easy to write down outcomes.
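An infinite event cannot be listed in full, but it can be described by a rule and explored lazily. The sketch below assumes hypothetical helper names (`positive_integers`, `event_members`, `is_prime`) and only looks at the first few members of each event.

```python
from itertools import count, islice

def is_prime(n):
    """Trial-division primality check, fine for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def positive_integers():
    """The sample space {1, 2, 3, ...} as an endless iterator."""
    return count(1)

def event_members(predicate):
    """Lazily yield the outcomes that belong to the event given by predicate."""
    return (n for n in positive_integers() if predicate(n))

# Peek at the first five members of two infinite events.
first_evens = list(islice(event_members(lambda n: n % 2 == 0), 5))
first_primes = list(islice(event_members(is_prime), 5))
print(first_evens)   # [2, 4, 6, 8, 10]
print(first_primes)  # [2, 3, 5, 7, 11]
```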
Example 2.1 Two friends are having a basketball free-throw competition. They will take turns shooting free-throw shots. They agree to keep playing until one of them scores two shots in a row or one of them misses two shots in a row. Describe the sample space.
An outcome for this experiment is a sequence of shot results for the players. Let’s say 1 indicates a successful shot and 0 indicates a miss. The sequence is composed of pairs of results, one pair per round, with the first entry for the first player and the second entry for the second player. Here is an example outcome:
(1,0)(0,1)(1,1).
The game ended after three rounds because the second player scored two shots in a row. A game can be arbitrarily long, so the sample space is infinite. There are 4 outcomes where the game lasts forever. Here is one of them:
(1,0)(0,1)(1,0)(0,1)…
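One way to explore the finite part of this sample space is to enumerate every outcome that ends after exactly n rounds. The sketch below rests on an assumption not spelled out in the example: the stopping rule is checked after each complete round, and "two in a row" means the same player got the same result in two consecutive rounds. The function names are hypothetical.

```python
from itertools import product

def ends_game(rounds):
    """True if, in the last two rounds, some player repeated a result
    (two makes in a row or two misses in a row)."""
    if len(rounds) < 2:
        return False
    prev, last = rounds[-2], rounds[-1]
    return prev[0] == last[0] or prev[1] == last[1]

def outcomes_of_length(n):
    """All outcomes in which the game stops after exactly n rounds."""
    round_results = list(product((0, 1), repeat=2))  # (player 1, player 2)
    found = []
    for seq in product(round_results, repeat=n):
        # Skip sequences where the game would already have stopped earlier.
        if any(ends_game(seq[:k]) for k in range(2, n)):
            continue
        if ends_game(seq):
            found.append(seq)
    return found

three_round_games = outcomes_of_length(3)
# The example outcome from the text is one of them.
print(((1, 0), (0, 1), (1, 1)) in three_round_games)
print(len(outcomes_of_length(2)), len(three_round_games))
```

The brute-force search grows quickly with n, so this is only meant for small values.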
Our ultimate goal is to learn to calculate probabilities of events from a sample space. So our choice of sample space should depend on the events we are interested in.
Example 2.2 Let’s say we have a medical test for colon cancer. Your goal is to find the probability that the test will come back as a false positive for a randomly selected patient. (A false positive occurs when a positive test result is given to someone without the illness.) What sample space should you consider?
The test can come back with two possible results, positive and negative. And the patient may have the illness or not. So we should consider all four combinations in our sample space.
Ω = {(positive, no disease), (positive, has disease), (negative, no disease), (negative, has disease)}
The first outcome is a false positive and the last outcome is a false negative. We will return to this example later.
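The four-outcome sample space can be built as a product of the two things we observe. In this sketch the labels are taken from the text and the variable names are illustrative; no probabilities are assigned yet.

```python
from itertools import product

test_results = ("positive", "negative")
disease_states = ("no disease", "has disease")

# Each outcome pairs a test result with the patient's true state.
omega = set(product(test_results, disease_states))

# Events of interest, each a subset of omega.
false_positive = {("positive", "no disease")}
false_negative = {("negative", "has disease")}
test_is_wrong = false_positive | false_negative

print(len(omega))              # 4
print(test_is_wrong <= omega)  # True
```

A sample space that only recorded the test result would have two outcomes and could not express the false positive event at all.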
We’ve seen finite sample spaces and infinite sample spaces, but so far all our sample spaces have been countable. A set is countable when you can label each of its elements with a distinct positive integer. Some infinite sets are countable, but others are of a higher order of infinity. The set of all real numbers is uncountable, and so is any interval of real numbers of positive length. Experiments with countable sample spaces must be treated differently from those with uncountable sample spaces. There are several reasons for this and we’ll discuss them later.
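As an illustration of labeling, the set of all integers (positive, negative, and zero) is countable even though it looks "twice as big" as the positive integers. The sketch below shows one possible labeling; the function name `nth_integer` is made up here.

```python
def nth_integer(label):
    """Map the labels 1, 2, 3, 4, 5, ... to 0, 1, -1, 2, -2, ..."""
    if label % 2 == 0:
        return label // 2
    return -(label // 2)

labeled = [nth_integer(k) for k in range(1, 8)]
print(labeled)  # [0, 1, -1, 2, -2, 3, -3]
```

Every integer eventually receives exactly one label, which is what countability requires. No such labeling exists for an interval of real numbers.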
Example 2.3 Randomly choose a set of 10 days in 2018 and find the average daily temperature for each day. Describe the sample space.
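An outcome can be written as a list of 10 temperatures, one for each selected day, so the sample space is a set of 10-tuples of real numbers. Temperatures occupy some uncountable interval of real numbers, so this sample space is uncountable. Of course thermometers have limited precision so they can only output a finite number of values. But for now let’s ignore measurement precision.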
Conclusion
We learned how to identify the sample space of an experiment. The sample space we choose should match the events we are interested in. We discussed finite, countably infinite, and uncountably infinite sample spaces. Next we will learn some useful counting techniques. Below are some exercises. The solutions will be in part 3.
Exercises
- Roll two dice. Let E be the event that the first die shows 5 or 6. Let F be the event that the sum of the rolls is greater than 7. How many outcomes are in the sample space, in E, and in F?
- Define the following experiment. (1) Randomly select a letter from the alphabet that hasn’t been selected yet. (2) If the letter is a vowel (A, E, I, O, or U), go back to step 1. Otherwise, stop. Describe the sample space of this experiment.
- Find all four infinitely long outcomes from Example 2.1.
- Consider the task of pairing a randomly selected student with a grad school. Your goal is to measure the probability that the pairing is a good fit for both parties. Describe some useful sample spaces for this experiment.
- Randomly choose a person from some population and find their height. Our sample space will consist of some interval of real numbers. What might you say about the probability of any single height in our sample space?