Lesson 21: Bivariate Normal Distributions

Introduction

Let the random variable Y denote the weight of a randomly selected individual, in pounds. Then, suppose we are interested in determining the probability that a randomly selected individual weighs between 140 and 160 pounds. That is, what is P(140 < Y < 160)?

But, if we think about it, we could imagine that the weight of an individual increases (linearly?) as height increases. If that's the case, in calculating the probability that a randomly selected individual weighs between 140 and 160 pounds, we might find it more informative to first take into account a person's height, say X. That is, we might want to find instead P(140 < Y < 160| X = x). To calculate such a conditional probability, we clearly first need to find the conditional distribution of Y given X = x. That's what we'll do in this lesson, that is, after first making a few assumptions.

First, we'll assume that (1) Y follows a normal distribution for each x, (2) E(Y|x), the conditional mean of Y given x, is linear in x, and (3) Var(Y|x), the conditional variance of Y given x, is constant. Based on these three stated assumptions, we'll find the conditional distribution of Y given X = x.

Then, to the three assumptions we've already made, we'll add the assumption that the random variable X follows a normal distribution, too. Based on the now four stated assumptions, we'll find the joint probability density function of X and Y.

Conditional Distribution of Y Given X

Let's start with the assumptions that we stated previously in the introduction to this lesson. That is, let's assume that:

(1) The continuous random variable Y follows a normal distribution for each x.

(2) The conditional mean of Y given x, that is, E(Y|x), is linear in x. Recall from our work in the previous lesson that this means:

\(E(Y|x)=\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\)

(3) The conditional variance of Y given x, that is, \(Var(Y|x)=\sigma^2_{Y|X}\) is constant, that is, the same for each x. 

There's a pretty good three-dimensional graph in our textbook depicting these assumptions. A two-dimensional graph with our height and weight example might look something like this:

[Figure: graph of assumptions]

The blue line represents the linear relationship between x and the conditional mean of Y given x. For a given height x, say \(x_1\), the red dots are meant to represent possible weights y for that x value. Note that the range of red dots is intentionally the same for each x value. That's because we are assuming that the conditional variance \(\sigma^2_{Y|X}\) is the same for each x. If we were to turn this two-dimensional drawing into a three-dimensional drawing, we'd want to draw identical looking normal curves over the top of each set of red dots.

So, in summary, our assumptions tell us so far that the conditional distribution of Y given X = x is:

\(Y|x \sim N \left(\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X),\qquad ??\right)\)

If we could just fill in those question marks, that is, find  \(\sigma^2_{Y|X}\), the conditional variance of Y given x, then we could use what we already know about the normal distribution to find conditional probabilities, such as P(140 < Y < 160|X = x). The following theorem does the trick for us.

Theorem. If the conditional distribution of Y given X = x follows a normal distribution with mean \(\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\) and constant variance \(\sigma^2_{Y|X}\), then the conditional variance is:

\(\sigma^2_{Y|X}=\sigma^2_Y(1-\rho^2)\)

Proof. Because Y is a continuous random variable, we need to use the definition of the conditional variance of Y given X = x for continuous random variables. That is:

\(\sigma^2_{Y|X}=Var(Y|x)=\int_{-\infty}^\infty (y-\mu_{Y|X})^2 h(y|x) dy\)

Now, if we replace the \(\mu_{Y|X}\) in the integrand with what we know it to be, that is, \(E(Y|x)=\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\), we get:

\(\sigma^2_{Y|X}=\int_{-\infty}^\infty \left[y-\mu_Y-\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\right]^2 h(y|x) dy\)

Then, multiplying both sides of the equation by \(f_X(x)\) and integrating over the range of x, we get:

\(\int_{-\infty}^\infty \sigma^2_{Y|X} f_X(x)dx=\int_{-\infty}^\infty \int_{-\infty}^\infty \left[y-\mu_Y-\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\right]^2 h(y|x) f_X(x)dydx\)

Now, on the left side of the equation, since \(\sigma^2_{Y|X}\) is a constant that doesn't depend on x, we can pull it through the integral. And, you might recognize that the right side of the equation is an (unconditional) expectation, because:

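\(h(y|x)\cdot f_X(x)=f(x,y)\)

That is, the product of the conditional p.d.f. of Y given x and the marginal p.d.f. of X is the joint p.d.f. of X and Y, so the double integral is the expected value of the squared term with respect to f(x,y).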

After pulling the conditional variance through the integral on the left side of the equation, and rewriting the right side of the equation as an expectation, we have:

\(\sigma^2_{Y|X}\int_{-\infty}^\infty f_X(x)dx=E\left\{\left[(Y-\mu_Y)-\left(\rho \dfrac{\sigma_Y}{\sigma_X}(X-\mu_X)\right)\right]^2\right\}\)

Now, by the definition of a valid p.d.f., the integral on the left side of the equation equals 1:

\(\int_{-\infty}^\infty f_X(x)dx=1\)

And, dealing with the expectation on the right hand side, that is, squaring the term and distributing the expectation, we get:

\(\sigma^2_{Y|X}=E[(Y-\mu_Y)^2]-2\rho \dfrac{\sigma_Y}{\sigma_X}E[(X-\mu_X)(Y-\mu_Y)]+\rho^2\dfrac{\sigma^2_Y}{\sigma^2_X}E[(X-\mu_X)^2]\)

Now, it's just a matter of recognizing various terms on the right-hand side of the equation:

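\begin{align}
E[(Y-\mu_Y)^2] &= \sigma^2_Y\\
E[(X-\mu_X)(Y-\mu_Y)] &= Cov(X,Y)=\rho \sigma_X \sigma_Y\\
E[(X-\mu_X)^2] &= \sigma^2_X
\end{align}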

That is:

\(\sigma^2_{Y|X}= \sigma^2_Y-2\rho \dfrac{\sigma_Y}{\sigma_X} \rho \sigma_X \sigma_Y +\rho^2\dfrac{\sigma^2_Y}{\sigma^2_X}\sigma^2_X\)

Simplifying yet more, we get:

\(\sigma^2_{Y|X}= \sigma^2_Y-2\rho^2\sigma^2_Y+\rho^2\sigma^2_Y=\sigma^2_Y-\rho^2\sigma^2_Y\)

And, finally, we get:

\(\sigma^2_{Y|X}= \sigma^2_Y(1-\rho^2)\)

as was to be proved!
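
If you'd like to convince yourself of the theorem numerically, here is a minimal simulation sketch in Python. It is not part of the proof. The parameter values, the sample size, the seed, and the function names are all assumptions chosen for illustration, and the sketch additionally assumes that X itself is normal. It draws X, then draws Y from a normal distribution with the linear conditional mean and with conditional variance \(\sigma^2_Y(1-\rho^2)\). If the theorem holds, the simulated Y values should have variance close to \(\sigma^2_Y\) and correlation with X close to \(\rho\).

```python
import random
import statistics


def simulate_pairs(mu_x, sigma_x, mu_y, sigma_y, rho, n=100_000, seed=1):
    """Draw n (x, y) pairs: X is normal, and Y given x is normal with the
    linear conditional mean and the conditional variance from the theorem."""
    rng = random.Random(seed)
    slope = rho * sigma_y / sigma_x
    cond_sd = sigma_y * (1 - rho ** 2) ** 0.5
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)
        y = rng.gauss(mu_y + slope * (x - mu_x), cond_sd)
        xs.append(x)
        ys.append(y)
    return xs, ys


def sample_correlation(xs, ys):
    """Sample correlation coefficient, computed from its definition."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))


# Hypothetical parameters, chosen only for illustration.
xs, ys = simulate_pairs(mu_x=0.0, sigma_x=2.0, mu_y=1.0, sigma_y=3.0, rho=0.6)
var_y = statistics.pvariance(ys)   # should be near sigma_y**2 = 9
corr = sample_correlation(xs, ys)  # should be near rho = 0.6
print(var_y, corr)
```

Because the values are simulated, they will only be close to 9 and 0.6, not exactly equal to them.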


So, in summary, our assumptions tell us that the conditional distribution of Y given X = x is:

\(Y|X=x\sim N\left(\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X),\quad \sigma^2_Y(1-\rho^2)\right)\)

Now that we have completely defined the conditional distribution of Y given X = x, we can use what we already know about the normal distribution to find conditional probabilities, such as P(140 < Y < 160|x). Let's take a look at an example.

Example

Let X denote the math score on the ACT college entrance exam of a randomly selected student. Let Y denote the verbal score on the ACT college entrance exam of a randomly selected student. Previous history suggests that:

(1) X is normally distributed with a mean of 22.7 and a variance of 17.64.

(2) Y is normally distributed with a mean of 22.7 and a variance of 12.25.

(3) The correlation between X and Y is 0.78.

What is the probability that a randomly selected student's verbal ACT score is between 18.5 and 25.5 points?

Solution. Because Y, the verbal ACT score, is assumed to be normally distributed with a mean of 22.7 and a variance of 12.25, calculating the requested probability involves just making a simple normal probability calculation:

[Figure: ACT unconditional normal distribution]

Now converting the Y scores to standard normal Z scores, we get:

\(P(18.5<Y<25.5)=P\left(\dfrac{18.5-22.7}{\sqrt{12.25}} <Z<\dfrac{25.5-22.7}{\sqrt{12.25}}\right)\)

And, simplifying and looking up the probabilities in the standard normal table in the back of your textbook, we get:

\begin{align}
P(18.5<Y<25.5) &= P(-1.20<Z<0.80)\\
&= 0.7881-0.1151=0.6730
\end{align}

That is, the probability that a randomly selected student's verbal ACT score is between 18.5 and 25.5 points is 0.673.
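
If you don't have a standard normal table handy, the same calculation can be sketched in Python using the standard library's `statistics.NormalDist`. The variable names here are our own, chosen for illustration. Because software does not round the Z scores to two decimal places, the result may differ from the table-based answer in the fourth decimal place.

```python
from statistics import NormalDist

# Y ~ N(22.7, 12.25); NormalDist takes the standard deviation, not the variance.
verbal = NormalDist(mu=22.7, sigma=12.25 ** 0.5)

# P(18.5 < Y < 25.5) as a difference of cumulative probabilities.
p_unconditional = verbal.cdf(25.5) - verbal.cdf(18.5)
print(round(p_unconditional, 3))
```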

Now, what happens to our probability calculation if we take into account the student's ACT math score? That is, what is the probability that a randomly selected student's verbal ACT score is between 18.5 and 25.5 given that his or her ACT math score was 23? That is, what is P(18.5 < Y < 25.5| X = 23)?

Solution. Before we can do the probability calculation, we first need to fully define the conditional distribution of Y given X = x:

\(Y|x \sim N \left(\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X),\quad \sigma^2_Y(1-\rho^2)\right)\)

Now, if we just plug in the values that we know, we can calculate the conditional mean of Y given X = 23:

\(\mu_{Y|23}=22.7+0.78\left(\dfrac{\sqrt{12.25}}{\sqrt{17.64}}\right)(23-22.7)=22.895\)

and the conditional variance of Y given X = x:

\(\sigma^2_{Y|X}= \sigma^2_Y(1-\rho^2)=12.25(1-0.78^2)=4.7971\)

It is worth noting that \(\sigma^2_{Y|X}\), the conditional variance of Y given X = x, is much smaller than \(\sigma^2_Y\), the unconditional variance of Y (12.25). This should make sense, as we have more information about the student. That is, we should expect the verbal ACT scores of all students to span a greater range than the verbal ACT scores of just those students whose math ACT score was 23.

Now, given that a student's math ACT score is 23, we now know that the student's verbal ACT score, Y, is normally distributed with a mean of 22.895 and a variance of 4.7971. Now, calculating the requested probability again involves just making a simple normal probability calculation:

[Figure: normal distribution]

Converting the Y scores to standard normal Z scores, we get:

\(P(18.5<Y<25.5|X=23)=P\left(\dfrac{18.5-22.895}{\sqrt{4.7971}} <Z<\dfrac{25.5-22.895}{\sqrt{4.7971}}\right)\)

And, simplifying and looking up the probabilities in the standard normal table in the back of your textbook, we get:

\(P(18.5<Y<25.5|X=23)=P(-2.01<Z<1.19)=0.8830-0.0222=0.8608\)

That is, given that a randomly selected student's math ACT score is 23, the probability that the student's verbal ACT score is between 18.5 and 25.5 points is 0.8608.
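
The conditional calculation can be sketched in Python as well. The helper `conditional_normal` below is a hypothetical function of our own, not something from the textbook or a library. It simply plugs the given values into the conditional mean and conditional variance formulas. Since the Z scores are not rounded to two decimal places here, expect a value close to, but not exactly equal to, the table-based 0.8608.

```python
from statistics import NormalDist


def conditional_normal(x, mu_x, var_x, mu_y, var_y, rho):
    """Return the distribution of Y given X = x under the lesson's assumptions."""
    cond_mean = mu_y + rho * (var_y / var_x) ** 0.5 * (x - mu_x)
    cond_var = var_y * (1 - rho ** 2)
    return NormalDist(mu=cond_mean, sigma=cond_var ** 0.5)


# ACT example: X = math score, Y = verbal score, given X = 23.
verbal_given_math = conditional_normal(
    x=23, mu_x=22.7, var_x=17.64, mu_y=22.7, var_y=12.25, rho=0.78
)
p_conditional = verbal_given_math.cdf(25.5) - verbal_given_math.cdf(18.5)

print(verbal_given_math.mean, verbal_given_math.variance)
print(round(p_conditional, 3))
```

Comparing the printed probability with the unconditional 0.673 shows how much knowing the math score tightens the range of plausible verbal scores.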

Joint P.D.F. of X and Y

We previously assumed that:

(1) Y follows a normal distribution for each x,

(2) E(Y|x), the conditional mean of Y given x is linear in x, and

(3) Var(Y|x), the conditional variance of Y given x is constant.

Based on these three stated assumptions, we found the conditional distribution of Y given X = x. Now, we'll add a fourth assumption, namely that:

(4) X follows a normal distribution.

Based on the four stated assumptions, we will now define the joint probability density function of X and Y. 

Definition. Assume X is normal, so that the p.d.f. of X is:

\(f_X(x)=\dfrac{1}{\sigma_X \sqrt{2\pi}} \text{exp}\left[-\dfrac{(x-\mu_X)^2}{2\sigma^2_X}\right]\)

for −∞ < x < ∞. And, assume that the conditional distribution of Y given X = x is normal with conditional mean:

\(E(Y|x)=\mu_Y+\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\)

and conditional variance:

\(\sigma^2_{Y|X}= \sigma^2_Y(1-\rho^2)\)

That is, the conditional distribution of Y given X = x is:

\begin{align}
h(y|x)  &= \dfrac{1}{\sigma_{Y|X} \sqrt{2\pi}} \text{exp}\left[-\dfrac{(y-\mu_{Y|X})^2}{2\sigma^2_{Y|X}}\right]\\
&= \dfrac{1}{\sigma_Y \sqrt{1-\rho^2} \sqrt{2\pi}}\text{exp}\left[-\dfrac{\left[y-\mu_Y-\rho \dfrac{\sigma_Y}{\sigma_X}(x-\mu_X)\right]^2}{2\sigma^2_Y(1-\rho^2)}\right],\quad -\infty<y<\infty
\end{align}

Therefore, the joint probability density function of X and Y is:

\(f(x,y)=f_X(x) \cdot h(y|x)=\dfrac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \text{exp}\left[-\dfrac{q(x,y)}{2}\right]\)

where:

\(q(x,y)=\left(\dfrac{1}{1-\rho^2}\right) \left[\left(\dfrac{x-\mu_X}{\sigma_X}\right)^2-2\rho \left(\dfrac{x-\mu_X}{\sigma_X}\right) \left(\dfrac{y-\mu_Y}{\sigma_Y}\right)+\left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2\right]\)

This joint p.d.f. is called the bivariate normal distribution.
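
To see that the closed form with q(x,y) really is the product \(f_X(x) \cdot h(y|x)\), here is a small Python sketch that evaluates both at a single point. The function names and the evaluation point (x = 23, y = 21) are assumptions of ours, chosen only for illustration, and the parameter values are borrowed from the ACT example.

```python
import math
from statistics import NormalDist


def bivariate_normal_pdf(x, y, mu_x, sigma_x, mu_y, sigma_y, rho):
    """Joint p.d.f. written in the closed form that uses q(x, y)."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx ** 2 - 2 * rho * zx * zy + zy ** 2) / (1 - rho ** 2)
    norm_const = 2 * math.pi * sigma_x * sigma_y * math.sqrt(1 - rho ** 2)
    return math.exp(-q / 2) / norm_const


def marginal_times_conditional(x, y, mu_x, sigma_x, mu_y, sigma_y, rho):
    """The same density, built as f_X(x) * h(y|x)."""
    f_x = NormalDist(mu_x, sigma_x).pdf(x)
    cond_mean = mu_y + rho * sigma_y / sigma_x * (x - mu_x)
    cond_sd = sigma_y * math.sqrt(1 - rho ** 2)
    return f_x * NormalDist(cond_mean, cond_sd).pdf(y)


# ACT example parameters: means, standard deviations, and correlation.
params = dict(mu_x=22.7, sigma_x=4.2, mu_y=22.7, sigma_y=3.5, rho=0.78)
joint = bivariate_normal_pdf(23, 21, **params)
product = marginal_times_conditional(23, 21, **params)
print(joint, product)  # the two values should agree up to rounding error
```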

Our textbook has a nice three-dimensional graph of a bivariate normal distribution. You might want to take a look at it to get a feel for the shape of the distribution. Now, let's turn our attention to an important property of the correlation coefficient if X and Y have a bivariate normal distribution.

Theorem. If X and Y have a bivariate normal distribution with correlation coefficient \(\rho_{XY}\), then X and Y are independent if and only if  \(\rho_{XY}=0\). That "if and only if" means:

(1) If X and Y are independent, then \(\rho_{XY}=0\). 

(2) If \(\rho_{XY}=0\), then X and Y are independent. 

Recall that item (1) is always true. We proved it back in the lesson that addresses the correlation coefficient. We also looked at a counterexample in that lesson that illustrated that item (2) was not necessarily true! Well, now we've just learned a situation in which it is true, that is, when X and Y have a bivariate normal distribution. Let's see why item (2) must be true in that case.

Proof. Since we previously proved item (1), our focus here will be on proving item (2). In order to prove that X and Y are independent when they have a bivariate normal distribution with zero correlation, we need to show that the bivariate normal density function:

\(f(x,y)=f_X(x)\cdot h(y|x)=\dfrac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \text{exp}\left[-\dfrac{q(x,y)}{2}\right]\)

factors into the normal p.d.f. of X and the normal p.d.f. of Y. Well, when \(\rho_{XY}=0\):

\(q(x,y)=\left(\dfrac{1}{1-0^2}\right) \left[\left(\dfrac{x-\mu_X}{\sigma_X}\right)^2+0+\left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2 \right]\)

which simplifies to:

\(q(x,y)=\left(\dfrac{x-\mu_X}{\sigma_X}\right)^2+\left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2 \)

Substituting this simplified q(x,y) into the joint p.d.f. of X and Y, and simplifying, we see that f(x,y) does indeed factor into the product of \(f_X(x)\) and \(f_Y(y)\):

\begin{align}
f(x,y) &= \dfrac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-0^2}} \text{exp}\left[-\dfrac{1}{2}\left(\dfrac{x-\mu_X}{\sigma_X}\right)^2-\dfrac{1}{2}\left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2\right]\\
&= \dfrac{1}{\sigma_X \sqrt{2\pi} \sigma_Y \sqrt{2\pi}}\text{exp}\left[-\dfrac{(x-\mu_X)^2}{2\sigma_X^2}\right] \text{exp}\left[-\dfrac{(y-\mu_Y)^2}{2\sigma_Y^2}\right]\\
&= \dfrac{1}{\sigma_X \sqrt{2\pi}}\text{exp}\left[-\dfrac{(x-\mu_X)^2}{2\sigma_X^2}\right]\cdot \dfrac{1}{\sigma_Y \sqrt{2\pi}}\text{exp}\left[-\dfrac{(y-\mu_Y)^2}{2\sigma_Y^2}\right]\\
&=f_X(x)\cdot f_Y(y)
\end{align}

Because we have shown that:

\(f(x,y)=f_X(x)\cdot f_Y(y)\)

we can conclude, by the definition of independence, that X and Y are independent. Our proof is complete.
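
The factorization can also be checked numerically. The Python sketch below is only an illustration, not a substitute for the proof. The function name, the grid of test points, and the reuse of the ACT example's means and standard deviations are our own assumptions. It compares f(x,y) with \(f_X(x)\cdot f_Y(y)\) when the correlation is 0, and again when it is 0.78, in which case the two should no longer agree.

```python
import math
from statistics import NormalDist


def bivariate_normal_pdf(x, y, mu_x, sigma_x, mu_y, sigma_y, rho):
    """Joint p.d.f. of the bivariate normal distribution."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx ** 2 - 2 * rho * zx * zy + zy ** 2) / (1 - rho ** 2)
    norm_const = 2 * math.pi * sigma_x * sigma_y * math.sqrt(1 - rho ** 2)
    return math.exp(-q / 2) / norm_const


mu_x, sigma_x, mu_y, sigma_y = 22.7, 4.2, 22.7, 3.5
f_x = NormalDist(mu_x, sigma_x)
f_y = NormalDist(mu_y, sigma_y)

# A small, arbitrary grid of (x, y) points at which to compare the densities.
points = [(x, y) for x in (18.0, 22.7, 26.9) for y in (19.0, 22.7, 26.2)]


def largest_gap(rho):
    """Largest absolute difference between f(x, y) and f_X(x) * f_Y(y)."""
    return max(
        abs(bivariate_normal_pdf(x, y, mu_x, sigma_x, mu_y, sigma_y, rho)
            - f_x.pdf(x) * f_y.pdf(y))
        for x, y in points
    )


gap_when_uncorrelated = largest_gap(0.0)   # essentially zero
gap_when_correlated = largest_gap(0.78)    # clearly nonzero
print(gap_when_uncorrelated, gap_when_correlated)
```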