Lesson 7: Discrete Random Variables

Introduction

In this lesson, we'll learn about general discrete random variables and general discrete probability distributions. Then, we'll investigate one particular probability distribution called the hypergeometric distribution.

Objectives

Discrete Random Variables

Example

Select three fans randomly at a football game in which Penn State is playing Notre Dame. Identify whether each fan is a Penn State fan (P) or a Notre Dame fan (N). This experiment yields the following sample space:

S = {PPP, PPN, PNP, NPP, NNP, NPN, PNN, NNN}

Let X = the number of Penn State fans selected. The possible values of X are, therefore, either 0, 1, 2, or 3. Now, we could find probabilities of individual events, P(PPP) or P(PPN), for example. Alternatively, we could find P(X = x), the probability that X takes on a particular value x. Let's do that!

Since the game is a home game, let's suppose that 80% of the fans attending the game are Penn State fans, while 20% are Notre Dame fans. That is, P(P) = 0.8 and P(N) = 0.2. Then, by independence:

P(X = 0) = P(NNN) = 0.2 × 0.2 × 0.2 = 0.008

And, by independence and mutual exclusivity of NNP, NPN, and PNN:

P(X = 1) = P(NNP) + P(NPN) + P(PNN) = 3 × 0.2 × 0.2 × 0.8 = 0.096

Likewise, by independence and mutual exclusivity of PPN, PNP, and NPP:

P(X = 2) = P(PPN) + P(PNP) + P(NPP) = 3 × 0.8 × 0.8 × 0.2 = 0.384

Finally, by independence:

P(X = 3) = P(PPP) = 0.8 × 0.8 × 0.8 = 0.512

There are a few things to note here:

  • The results make sense! Given that 80% of the fans in the stands are Penn State fans, it shouldn't seem surprising that we would be most likely to select 2 or 3 Penn State fans.
  • The probabilities behave well in that (1) the probabilities are all greater than 0, that is, P(X = x) > 0 and (2) the probability of the sample space is 1, that is, P(S) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 1.
  • Because the values that it takes on are random, the variable X has a special name. It is called a random variable!  Ta-daaaa!
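
The four probabilities above can also be reproduced by brute force. The following is a minimal sketch, assuming (as the example does) that the three fans are selected independently with P(P) = 0.8 and P(N) = 0.2; the names `prob` and `pmf` are illustrative only.

```python
from itertools import product

# Assumed from the example: each selected fan is independently a
# Penn State fan (P) with probability 0.8 or a Notre Dame fan (N) with 0.2.
prob = {"P": 0.8, "N": 0.2}

pmf = {x: 0.0 for x in range(4)}          # X can be 0, 1, 2, or 3
for outcome in product("PN", repeat=3):   # the 8 outcomes PPP, PPN, ..., NNN
    p = 1.0
    for fan in outcome:
        p *= prob[fan]                    # independence: multiply
    pmf[outcome.count("P")] += p          # mutually exclusive outcomes: add

for x, p in pmf.items():
    print(f"P(X = {x}) = {p:.3f}")
```

Each outcome contributes its probability to exactly one value of X, which is why the four probabilities add up to 1.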

Let's give a formal definition of a random variable.

Definition. Given a random experiment with sample space S, a random variable X is a set function that assigns one and only one real number to each element s that belongs in the sample space S.

The set of all possible values x of the random variable X is called the support, or space, of X.

Note that the capital letters at the end of the alphabet, such as W, X, Y, and Z, typically represent the definition of the random variable. The corresponding lowercase letters, such as w, x, y, and z, represent the random variable's possible values.

Example

A rat is selected at random from a cage of male (M) and female (F) rats. Once selected, the gender of the selected rat is noted. The sample space is thus:

S = {M, F}

Define the random variable X as follows: X = 0 if the rat is male, and X = 1 if the rat is female.

Note that the random variable X assigns one and only one real number (0 and 1) to each element of the sample space (M and F). The support, or space, of X is {0, 1}.

Note that we don't necessarily need to use the numbers 0 and 1 as the support. For example, we could have alternatively (and perhaps arbitrarily?!) used the numbers 5 and 15, respectively. In that case, our random variable would be defined as X = 5 if the rat is male, and X = 15 if the rat is female.

Example

A roulette wheel has 38 numbers on it: a zero (0), a double zero (00), and the numbers 1, 2, 3, ..., 36. Spin the wheel until the pointer lands on number 36. One possibility is that the wheel lands on 36 on the first spin. Another possibility is that the wheel lands on 0 on the first spin, and 36 on the second spin. Yet another possibility is that the wheel lands on 0 on the first spin, 7 on the second spin, and 36 on the third spin. The sample space must list all of the countably infinite (!) number of possible sequences. That is, the sample space looks like this:

S = {36, 0-36, 00-36, 1-36, ... 35-36, 0-0-36, 0-1-36, ...}

If we define the random variable X to equal the number of spins until the wheel lands on 36, then the support of X is {1, 2, 3, ...}, since at least one spin is always needed.

Note that in the rat example, there were a finite (two, to be exact) number of possible outcomes, while in the roulette example, there were a countably infinite number of possible outcomes. This leads us to the following formal definition.

Definition. A random variable X is a discrete random variable if:

  • there are a finite number of possible outcomes of X, or
  • there are a countably infinite number of possible outcomes of X.

Recall that a countably infinite number of possible outcomes means that there is a one-to-one correspondence between the outcomes and the set of integers. No such one-to-one correspondence exists for an uncountably infinite number of possible outcomes.

As you might have guessed by its name, we will be studying discrete random variables and their probability distributions throughout Section 2.

Probability Mass Functions

The probability that a discrete random variable X takes on a particular value x, that is, P(X = x), is frequently denoted f(x). The function f(x) is typically called the probability mass function, although some authors also refer to it as the probability function, the frequency function, or the probability density function. We will use the common terminology, the probability mass function, and its common abbreviation, the p.m.f.

Definition. The probability mass function, P(X = x) = f(x), of a discrete random variable X is a function that satisfies the following properties:

(1) P(X = x) = f(x) > 0  if x ∈ the support S

(2)  \(\sum\limits_{x\in S} f(x)=1\)

(3) \(P(X\in A)=\sum\limits_{x\in A} f(x)\)

Item #1 basically says that, for every element x in the support S, all of the probabilities must be positive. Note that if x does not belong in the support S, then f(x) = 0. Item #2 basically says that if you add up the probabilities for all of the possible x values in the support S, then the sum must equal 1. And, item #3 says to determine the probability associated with the event A, you just sum up the probabilities of the x values in A.
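
The three properties can be checked mechanically. Here is a minimal sketch, assuming the p.m.f. is stored as a dictionary mapping each x in the support to f(x); the helper names `is_valid_pmf` and `prob_of_event` are hypothetical, and the values come from the football fan example earlier in the lesson.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Check items (1) and (2) for a p.m.f. given as {x: f(x)} over its support."""
    all_positive = all(p > 0 for p in pmf.values())
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return all_positive and sums_to_one

def prob_of_event(pmf, event):
    """Item (3): P(X in A) is the sum of f(x) over the x values in A."""
    return sum(p for x, p in pmf.items() if x in event)

# p.m.f. of X = number of Penn State fans among three selected fans
fans = {0: 0.008, 1: 0.096, 2: 0.384, 3: 0.512}

print(is_valid_pmf(fans))
print(prob_of_event(fans, {2, 3}))   # P(X >= 2) = f(2) + f(3)
```

A tolerance is used for the sum because floating-point addition is not exact.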

Since f(x) is a function, it can be presented in tabular form, in graphical form, or as a formula.

Let's take a look at a few examples.

Example

Let X equal the number of siblings of Penn State students. The support of X is, of course, 0, 1, 2, 3, ... Because the support contains a countably infinite number of possible values, X is a discrete random variable with a probability mass function. Find f(x) = P(X = x), the probability mass function of X, for all x in the support.

This example illustrated the tabular and graphical forms of a p.m.f. Now let's take a look at an example of a p.m.f. in functional form.

Example

Let \(f(x)=cx^2\) for x = 1, 2, 3. Determine the constant c so that the function f(x) satisfies the conditions of being a probability mass function.

Solution. The key to finding c is to use item #2 in the definition of a p.m.f.
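
One way to carry out that step, worked directly from the definition (a sketch rather than the original worked solution): the probabilities must sum to 1 over the support x = 1, 2, 3, so

\(\sum\limits_{x=1}^{3} cx^2 = c\left(1^2+2^2+3^2\right) = 14c = 1\)

which gives \(c=\dfrac{1}{14}\). With this choice, f(1) = 1/14, f(2) = 4/14, and f(3) = 9/14 are all positive, so item #1 holds as well.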

The support in this example is finite. Let's take a look at an example in which the support is countably infinite.

Example

Determine the constant c so that the following p.m.f. of the random variable Y is a valid probability mass function: 

\(f(y)=c\left(\dfrac{1}{4}\right)^y\) for y = 1, 2, 3, ...

Solution. Again, the key to finding c is to use item #2 in the definition of a p.m.f.
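
A sketch of that calculation, assuming the standard geometric series result \(\sum\limits_{y=1}^{\infty} r^y=\dfrac{r}{1-r}\) for |r| < 1, applied with r = 1/4:

\(\sum\limits_{y=1}^{\infty} c\left(\dfrac{1}{4}\right)^y = c\cdot\dfrac{1/4}{1-1/4} = \dfrac{c}{3} = 1\)

which gives c = 3. Every term \(3\left(\dfrac{1}{4}\right)^y\) is positive, so item #1 holds as well.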

Hypergeometric Distribution

Example

A crate contains 50 light bulbs, of which 5 are defective and 45 are not. A quality control inspector randomly samples 4 bulbs without replacement. Let X = the number of defective bulbs selected. Find the probability mass function, f(x), of the discrete random variable X.

This is an example of a random variable X following what is called the hypergeometric distribution. Let's generalize our findings.

Definition. If we randomly select n items without replacement from a set of N items of which:

  • m of the items are of one type
  • and N − m of the items are of a second type

then the probability mass function of the discrete random variable X, the number of selected items of the first type, is called the hypergeometric distribution and is of the form:

\(P(X=x)=f(x)=\dfrac{\dbinom{m}{x} \dbinom{N-m}{n-x}}{\dbinom{N}{n}}\)

where the support S is the collection of nonnegative integers x that satisfies the inequalities:

  • x ≤ n
  • x ≤ m
  • n − x ≤ N − m

Note that one of the key features of the hypergeometric distribution is that it is associated with sampling without replacement. We will see later, in Lesson 9, that when the samples are drawn with replacement, the discrete random variable X follows what is called the binomial distribution.  
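
The formula is straightforward to evaluate numerically. Below is a minimal sketch using Python's standard `math.comb` and exact fractions; the function name `hypergeom_pmf` and its argument order are illustrative assumptions, not part of the lesson. It is applied to the light bulb example, with N = 50, m = 5, and n = 4.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, m, n):
    """P(X = x) when n items are drawn without replacement from N items,
    m of which are of the first type."""
    # Outside the support (x <= n, x <= m, n - x <= N - m) the probability is 0.
    if x < 0 or x > n or x > m or n - x > N - m:
        return Fraction(0)
    return Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))

# Light bulb example: 50 bulbs, 5 defective, 4 sampled without replacement.
bulbs = {x: hypergeom_pmf(x, N=50, m=5, n=4) for x in range(5)}
for x, p in bulbs.items():
    print(f"f({x}) = {float(p):.4f}")

print(sum(bulbs.values()) == 1)   # item #2 of the p.m.f. definition
```

Exact fractions are used so that the check that the probabilities sum to 1 is not affected by rounding.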

More Examples

Example

A lake contains 600 fish, eighty (80) of which have been tagged by scientists. A researcher randomly catches 15 fish from the lake. Find a formula for the probability mass function of X, the number of fish in the researcher's sample which are tagged.

Solution. This problem is very similar to the earlier light bulb example, in which we were interested in finding the p.m.f. of X, the number of defective bulbs selected in a sample of 4 bulbs. Here, we are interested in finding the p.m.f. of X, the number of tagged fish selected in a sample of 15 fish. That is, X is a hypergeometric random variable with m = 80, N = 600, and n = 15. Therefore, the p.m.f. of X is:

\(f(x)=\dfrac{\dbinom{80}{x} \dbinom{520}{15-x}}{\dbinom{600}{15}}\)

for the support x = 0, 1, 2, ..., 15.

Example

Let the random variable X denote the number of aces in a five-card hand dealt from a standard 52-card deck. Find a formula for the probability mass function of X.

Solution. The random variable X here also follows the hypergeometric distribution. Here, there are N = 52 total cards, n = 5 cards sampled, and m = 4 aces. Therefore, the p.m.f. of X is:

\(f(x)=\dfrac{\dbinom{4}{x} \dbinom{48}{5-x}}{\dbinom{52}{5}}\)

for the support x = 0, 1, 2, 3, and 4.

Example

Suppose that 5 people, including you and a friend, line up at random. Let the random variable X denote the number of people standing between you and your friend. Determine the probability mass function of X in tabular form. Also, verify that the p.m.f. is a valid p.m.f.
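
One way to check an answer to this problem is brute-force enumeration. The sketch below assumes that "line up at random" means all 5! = 120 orderings are equally likely; the labels "a", "b", and "c" for the other three people are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# "you" and "friend" are the two people of interest;
# "a", "b", "c" are placeholder labels for the other three.
people = ["you", "friend", "a", "b", "c"]

counts = {}
total = 0
for lineup in permutations(people):   # all 120 equally likely orderings
    # People strictly between the two positions
    between = abs(lineup.index("you") - lineup.index("friend")) - 1
    counts[between] = counts.get(between, 0) + 1
    total += 1

pmf = {x: Fraction(k, total) for x, k in sorted(counts.items())}
for x, p in pmf.items():
    print(f"f({x}) = {p}")

print(sum(pmf.values()) == 1)         # validity check: probabilities sum to 1
```

The printed rows form the p.m.f. in tabular form, and the final line confirms that the probabilities sum to 1.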