In this chapter, we investigate the probability distributions of continuous random variables that are so important to the field of statistics that they are given special names. Those covered here are the exponential, gamma, and chi-square distributions.
In the previous lesson, we investigated the probability distribution of the waiting time, X, until the first event of an approximate Poisson process occurs. We learned that the probability distribution of X is the exponential distribution with mean θ = 1/λ. In this lesson, we investigate the waiting time, W, until the αth (that is, "alpha-th") event occurs. As we'll soon learn, that distribution is known as the gamma distribution. After investigating the gamma distribution, we'll take a look at a special case of the gamma distribution, a distribution known as the chi-square distribution.
Suppose X, following an (approximate) Poisson process, equals the number of customers arriving at a bank in an interval of length 1. If λ, the mean number of customers arriving in an interval of length 1, is 6, say, then in one particular realization we might observe seven (7) customers arriving at scattered times in the unit interval.
Previously, our focus would have been on the discrete random variable X, the number of customers arriving. We could alternatively be interested in the continuous random variable W, the waiting time until the first customer arrives. Let's push this a bit further to see if we can find F(w), the cumulative distribution function of W. For w > 0, the waiting time W exceeds w if and only if no customers arrive in the interval [0, w], and the number of arrivals in [0, w] is a Poisson random variable with mean λw. Therefore:
\(F(w)=P(W\le w)=1-P(W>w)=1-P(\text{no events in } [0,w])=1-\dfrac{(\lambda w)^0 e^{-\lambda w}}{0!}=1-e^{-\lambda w}\)
Now, to find the probability density function f(w), all we need to do is differentiate F(w). Doing so, we get:
\(f(w)=F'(w)=-e^{-\lambda w}(-\lambda)=\lambda e^{-\lambda w}\)
for 0 < w < ∞. Typically, though, we "reparameterize" before defining the "official" probability density function. If λ (the Greek letter "lambda") equals the mean number of events in an interval, and θ (the Greek letter "theta") equals the mean waiting time until the first event occurs, then:
\(\theta=\dfrac{1}{\lambda}\) and \(\lambda=\dfrac{1}{\theta}\)
For example, suppose the mean number of customers to arrive at a bank in a 1-hour interval is 10. Then, the average (waiting) time until the first customer is 1/10 of an hour, or 6 minutes.
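One way to watch F(w) = 1 − e^{−λw} (equivalently 1 − e^{−w/θ} with θ = 1/λ) emerge from the approximate Poisson process itself is a small simulation. The sketch below is an illustration under stated assumptions, not part of the original lesson: it chops time into subintervals of a hypothetical length 1/n, lets each subinterval independently contain an event with probability λ/n, and estimates P(W ≤ w) for the bank example with λ = 6. The function names are hypothetical.

```python
import math
import random


def first_arrival_by(w, lam, rng, n=1000):
    """True if the first event occurs by time w in a discretized (approximate)
    Poisson process: each subinterval of length 1/n holds an event with
    probability lam/n, independently of the others (an assumption of this sketch)."""
    p = lam / n
    for _ in range(int(round(w * n))):
        if rng.random() < p:
            return True
    return False


def empirical_cdf(w, lam, reps=10_000, seed=414):
    """Monte Carlo estimate of F(w) = P(W <= w)."""
    rng = random.Random(seed)
    hits = sum(first_arrival_by(w, lam, rng) for _ in range(reps))
    return hits / reps


lam, w = 6.0, 0.2
print("simulated F(w):   ", empirical_cdf(w, lam))
print("1 - exp(-lam * w):", 1 - math.exp(-lam * w))
```

The two printed values should agree up to sampling error and a small discretization error that shrinks as n grows.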
Let's now formally define the probability density function we have just derived.
Definition. The continuous random variable X follows an exponential distribution if its probability density function is: \(f(x)=\dfrac{1}{\theta} e^{-x/\theta}\) for θ > 0 and x ≥ 0.
Because there are an infinite number of possible constants θ, there are an infinite number of possible exponential distributions. That's why this page is called Exponential Distributions (with an s!) and not Exponential Distribution (with no s!).
Here, we present and prove four key properties of an exponential random variable.
Theorem. The exponential probability density function: \(f(x)=\dfrac{1}{\theta} e^{-x/\theta}\) for x ≥ 0 and θ > 0 is a valid probability density function.
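Proof. Because θ > 0, f(x) > 0 for all x ≥ 0. It remains to show that f(x) integrates to 1 over its support:
\(\int_0^\infty \dfrac{1}{\theta} e^{-x/\theta} dx=\lim\limits_{b \to \infty} \left[-e^{-x/\theta}\right]^{x=b}_{x=0}=\lim\limits_{b \to \infty}\left(1-e^{-b/\theta}\right)=1\)
as was to be proved.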
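The claim that the exponential p.d.f. integrates to 1 can also be checked numerically. This is a minimal sketch, assuming a plain midpoint-rule integrator (a swapped-in numerical technique, not something from the text) and truncating the infinite range at 50θ, beyond which the remaining area is only e^{−50}. The helper names `exp_pdf` and `midpoint` are hypothetical.

```python
import math


def exp_pdf(x, theta):
    """Exponential p.d.f. f(x) = (1/theta) e^{-x/theta} for x >= 0 (0 otherwise)."""
    return math.exp(-x / theta) / theta if x >= 0 else 0.0


def midpoint(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


for theta in (0.5, 2.0, 10.0):
    # Truncate at 50*theta: the neglected tail area is e^{-50}, which is negligible.
    area = midpoint(lambda x: exp_pdf(x, theta), 0.0, 50 * theta)
    print(f"theta = {theta}: area = {area:.6f}")
```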
Theorem. The moment generating function of an exponential random variable X with parameter θ is: \(M(t)=\dfrac{1}{1-\theta t}\) for t < 1/θ.
Proof. The moment generating function is by definition:
\(M(t)=E(e^{tX})=\int_0^\infty e^{tx} \left(\dfrac{1}{\theta}\right) e^{-x/\theta} dx\)
Simplifying and rewriting the integral as a limit, we have:
\(M(t)=\dfrac{1}{\theta}\lim\limits_{b \to \infty} \int_0^b e^{x(t-1/\theta)} dx\)
Integrating, we have:
\(M(t)=\dfrac{1}{\theta}\lim\limits_{b \to \infty} \left[ \dfrac{1}{t-1/\theta} e^{x(t-1/\theta)} \right]^{x=b}_{x=0}\)
Evaluating at x = 0 and x = b, we have:
\(M(t)=\dfrac{1}{\theta}\lim\limits_{b \to \infty} \left[ \dfrac{1}{t-1/\theta} e^{b(t-1/\theta)} - \dfrac{1}{t-1/\theta} \right]=\dfrac{1}{\theta}\left[\lim\limits_{b \to \infty} \left\{ \left(\dfrac{1}{t-1/\theta}\right) e^{b(t-1/\theta)} \right\}-\dfrac{1}{t-1/\theta}\right]\)
Now, the limit approaches 0 provided t − 1/θ < 0, that is, provided t < 1/θ, and so we have:
\(M(t)=\dfrac{1}{\theta} \left(0-\dfrac{1}{t-1/\theta}\right)\)
Simplifying more:
\(M(t)=\dfrac{1}{\theta} \left(-\dfrac{1}{\dfrac{\theta t-1}{\theta}}\right)=\dfrac{1}{\theta}\left(-\dfrac{\theta}{\theta t-1}\right)=-\dfrac{1}{\theta t-1}\)
and finally:
\(M(t)=\dfrac{1}{1-\theta t}\)
provided t < 1/θ, as was to be proved.
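As a numerical sanity check of M(t) = 1/(1 − θt), the sketch below approximates E(e^{tX}) directly with a midpoint rule (an assumed numerical technique, not from the text). Just as in the proof, it combines e^{tx} and e^{−x/θ} into the single factor e^{x(t−1/θ)}, and it refuses values t ≥ 1/θ, where the integral diverges. The function names are hypothetical.

```python
import math


def midpoint(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def mgf_numeric(t, theta):
    """Approximate E(e^{tX}) for X exponential with mean theta, assuming t < 1/theta.

    The integrand is (1/theta) e^{-x(1/theta - t)}; the integral is truncated
    where that exponential factor has dropped to e^{-60}.
    """
    rate = 1 / theta - t
    if rate <= 0:
        raise ValueError("the m.g.f. exists only for t < 1/theta")
    return midpoint(lambda x: math.exp(-rate * x) / theta, 0.0, 60 / rate)


theta = 2.0
for t in (-1.0, 0.1, 0.3):   # each satisfies t < 1/theta = 0.5
    print(t, mgf_numeric(t, theta), 1 / (1 - theta * t))
```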
Theorem. The mean of an exponential random variable X with parameter θ is: \(\mu=E(X)=\theta\)
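Proof. The first derivative of the moment generating function is \(M'(t)=\dfrac{\theta}{(1-\theta t)^2}\). Evaluating at t = 0, we get \(\mu=E(X)=M'(0)=\theta\), as was to be proved.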
Theorem. The variance of an exponential random variable X with parameter θ is: \(\sigma^2=Var(X)=\theta^2\)
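Proof. The second derivative of the moment generating function is \(M''(t)=\dfrac{2\theta^2}{(1-\theta t)^3}\), so that \(E(X^2)=M''(0)=2\theta^2\). Therefore, \(\sigma^2=E(X^2)-[E(X)]^2=2\theta^2-\theta^2=\theta^2\), as was to be proved.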
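Because M'(0) = E(X) and M''(0) = E(X²), the mean and variance can also be recovered numerically from M(t) = 1/(1 − θt) alone. This sketch assumes central finite differences with a hypothetical step size h = 10⁻⁴ in place of exact differentiation; it should reproduce E(X) = θ and Var(X) = θ² to several digits.

```python
def mgf(t, theta):
    """Exponential m.g.f. M(t) = 1/(1 - theta*t), valid for t < 1/theta."""
    return 1.0 / (1.0 - theta * t)


def moments_from_mgf(theta, h=1e-4):
    """Approximate (mean, variance) from central differences of M at t = 0."""
    m1 = (mgf(h, theta) - mgf(-h, theta)) / (2 * h)                        # ~ M'(0) = E(X)
    m2 = (mgf(h, theta) - 2 * mgf(0.0, theta) + mgf(-h, theta)) / h**2     # ~ M''(0) = E(X^2)
    return m1, m2 - m1**2


for theta in (0.5, 2.0, 10.0):
    mean, var = moments_from_mgf(theta)
    print(f"theta = {theta}: mean {mean:.4f}, variance {var:.4f}")
```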
Students arrive at a local bar and restaurant according to an approximate Poisson process at a mean rate of 30 students per hour. What is the probability that the bouncer has to wait more than 3 minutes to card the next student?
Solution. If we let X equal the number of students, then the Poisson mean λ is 30 students per 60 minutes, or 1/2 student per minute! Now, if we let W denote the (waiting) time between students, we can expect that there would be, on average, θ = 1/λ = 2 minutes between arriving students. Because W is (assumed to be) exponentially distributed with mean θ = 2, its probability density function is:
\(f(w)=\dfrac{1}{2} e^{-w/2}\)
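for w ≥ 0. Now, we just need to find the area under the curve and to the right of 3 to find the desired probability:
\(P(W>3)=\int_3^\infty \dfrac{1}{2}e^{-w/2}dw=\lim\limits_{b \to \infty}\left[-e^{-w/2}\right]^{w=b}_{w=3}=e^{-3/2}\approx 0.2231\)
That is, there is about a 22% chance that the bouncer has to wait more than 3 minutes to card the next student.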
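The bouncer's tail probability can be cross-checked two ways: from the closed form P(W > w) = e^{−w/θ}, and by numerically integrating f(w) = (1/2)e^{−w/2} to the right of 3. The midpoint integrator and the truncation point 3 + 60θ are assumptions of this sketch.

```python
import math

theta, w0 = 2.0, 3.0   # mean gap between students (minutes); waiting-time threshold (minutes)

# Closed form: P(W > w0) = e^{-w0/theta}.
closed_form = math.exp(-w0 / theta)


def midpoint(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Numerical area under f(w) = (1/2) e^{-w/2} to the right of 3, truncated at 3 + 60*theta.
numeric = midpoint(lambda w: math.exp(-w / theta) / theta, w0, w0 + 60 * theta)

print(f"closed form: {closed_form:.4f}")   # 0.2231
print(f"numerical:   {numeric:.4f}")
```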
The number of miles that a particular car can run before its battery wears out is exponentially distributed with an average of 10,000 miles. The owner of the car needs to take a 5000-mile trip. What is the probability that he will be able to complete the trip without having to replace the car battery?
Solution. At first glance, it might seem that a vital piece of information is missing. It seems that we should need to know how many miles the battery in question already has on it before we can answer the question! Hmmm.... or do we? Well, let's let X denote the number of miles that the car can run before its battery wears out. Now, suppose the following is true:
\(P(X>x+y|X>x)=P(X>y)\)
If it is true, it would tell us that the probability that the car battery lasts more than another y = 5000 miles does not depend on whether the battery has already run x = 0 miles, x = 1000 miles, or x = 15000 miles. Now, we are given that X is exponentially distributed. It turns out that the above statement is true for the exponential distribution (you will be asked to prove it for homework)! It is for this reason that we say that the exponential distribution is "memoryless."
It can also be shown (do you want to show that one too?) that if X is exponentially distributed with mean θ, then:
\(P(X>k)=e^{-k/\theta}\)
Therefore, the probability in question is simply:
\(P(X>5000)=e^{-5000/10000}=e^{-1/2}\approx 0.607\)
We'll leave it to the gentleman in question to decide whether that probability is large enough to give him comfort that he won't be stranded somewhere along a remote desert highway!
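The memoryless property (whose proof is left for homework) can at least be illustrated numerically with the survival function P(X > x) = e^{−x/θ}. The sketch below computes the conditional probability P(X > x + y | X > x) = P(X > x + y)/P(X > x) for the three mileages mentioned above; the helper name `survival` is hypothetical.

```python
import math


def survival(x, theta):
    """P(X > x) = e^{-x/theta} for an exponential random variable with mean theta."""
    return math.exp(-x / theta)


theta, y = 10_000.0, 5_000.0   # mean battery life; length of the trip (miles)

for x in (0.0, 1_000.0, 15_000.0):   # miles already on the battery
    # {X > x + y} is contained in {X > x}, so P(X > x + y | X > x) = P(X > x + y) / P(X > x).
    conditional = survival(x + y, theta) / survival(x, theta)
    print(f"already run {x:>7.0f} miles: P(lasts 5000 more) = {conditional:.4f}")

print(f"P(X > 5000) = {survival(y, theta):.4f}")   # 0.6065
```

Every line prints the same probability, which is exactly what "memoryless" means.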
In the previous lesson, we learned that in an approximate Poisson process with mean λ, the waiting time X until the first event occurs follows an exponential distribution with mean θ = 1/λ. We now let W denote the waiting time until the αth event occurs and find the distribution of W.
Just as we did in our work with deriving the exponential distribution, our strategy here is going to be to first find the cumulative distribution function F(w) and then differentiate it to get the probability density function f(w). Now, for w > 0 and λ > 0, the definition of the cumulative distribution function gives us:
F(w) = P(W ≤ w)
The rule of complementary events tells us then that:
F(w) = 1 − P(W > w)
Now, the waiting time W is greater than some value w if and only if there are fewer than α events in the interval [0,w]. That is:
F(w) = 1 − P(fewer than α events in [0,w])
A more specific way of writing that is:
F(w) = 1 − P(0 events or 1 event or … or (α−1) events in [0,w])
Those mutually exclusive "ors" mean that we need to add up the probabilities of having 0 events occurring in the interval [0,w], 1 event occurring in the interval [0,w], ..., up to (α−1) events in [0,w]. Well, that just involves using the probability mass function of a Poisson random variable with mean λw. That is:
\(F(w)=1-\sum\limits_{k=0}^{\alpha-1} \dfrac{(\lambda w)^k e^{-\lambda w}}{k!}\)
Now, we could leave F(w) well enough alone and begin the process of differentiating it, but it turns out that the differentiation goes much more smoothly if we rewrite F(w) as follows:
\(F(w)=1-e^{-\lambda w}-\sum\limits_{k=1}^{\alpha-1} \dfrac{1}{k!} \left[(\lambda w)^k e^{-\lambda w}\right]\)
As you can see, we merely pulled the k = 0 term out of the summation and rewrote the probability mass function so that it would be easier to apply the product rule for differentiation.
Now, let's do that differentiation! We need to differentiate F(w) with respect to w to get the probability density function f(w). Using the product rule, and what we know about the derivatives of \(e^{-\lambda w}\) and \((\lambda w)^k\), we get:
\(f(w)=F'(w)=\lambda e^{-\lambda w} -\sum\limits_{k=1}^{\alpha-1} \dfrac{1}{k!} \left[(\lambda w)^k \cdot (-\lambda e^{-\lambda w})+ e^{-\lambda w} \cdot k(\lambda w)^{k-1} \cdot \lambda \right]\)
Pulling \(\lambda e^{-\lambda w}\) out of the summation, and dividing k by k! (to get 1/(k−1)!) in the second term in the summation, we get that f(w) equals:
\(=\lambda e^{-\lambda w}+\lambda e^{-\lambda w}\left[\sum\limits_{k=1}^{\alpha-1} \left\{ \dfrac{(\lambda w)^k}{k!}-\dfrac{(\lambda w)^{k-1}}{(k-1)!} \right\}\right]\)
Evaluating the terms in the summation at k =1, k = 2, up to k = α−1, we get that f(w) equals:
\(=\lambda e^{-\lambda w}+\lambda e^{-\lambda w}\left[(\lambda w-1)+\left(\dfrac{(\lambda w)^2}{2!}-\lambda w\right)+\cdots+\left(\dfrac{(\lambda w)^{\alpha-1}}{(\alpha-1)!}-\dfrac{(\lambda w)^{\alpha-2}}{(\alpha-2)!}\right)\right]\)
Do some (lots of!) crossing out (λw − λw = 0, for example), and a bit more simplifying to get that f(w) equals:
\(=\lambda e^{-\lambda w}+\lambda e^{-\lambda w}\left[-1+\dfrac{(\lambda w)^{\alpha-1}}{(\alpha-1)!}\right]=\lambda e^{-\lambda w}-\lambda e^{-\lambda w}+\dfrac{\lambda e^{-\lambda w} (\lambda w)^{\alpha-1}}{(\alpha-1)!}\)
And since \(\lambda e^{-\lambda w}-\lambda e^{-\lambda w}=0\), we get that f(w) equals:
\(=\dfrac{\lambda e^{-\lambda w} (\lambda w)^{\alpha-1}}{(\alpha-1)!}\)
Are we there yet? Almost! We just need to reparameterize (if θ = 1/λ, then λ = 1/θ). Doing so, we get that the probability density function of W, the waiting time until the αth event occurs, is:
\(f(w)=\dfrac{1}{(\alpha-1)! \theta^\alpha} e^{-w/\theta} w^{\alpha-1}\)
for w > 0, θ > 0, and α a positive integer.
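The derivation can be double-checked numerically: integrating the p.d.f. just obtained from 0 to w should reproduce the Poisson-sum form of F(w) we started from. This sketch assumes an integer α, a midpoint-rule integrator (a swapped-in numerical technique), and hypothetical function names.

```python
import math


def gamma_cdf_poisson(w, alpha, theta):
    """F(w) = 1 - sum_{k=0}^{alpha-1} (lam w)^k e^{-lam w} / k!, with lam = 1/theta."""
    m = w / theta   # Poisson mean lam * w
    return 1.0 - sum(m**k * math.exp(-m) / math.factorial(k) for k in range(alpha))


def gamma_pdf_integer(w, alpha, theta):
    """f(w) = w^{alpha-1} e^{-w/theta} / ((alpha-1)! theta^alpha), w > 0, alpha a positive integer."""
    return w ** (alpha - 1) * math.exp(-w / theta) / (math.factorial(alpha - 1) * theta**alpha)


def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


alpha, theta = 3, 2.0
for w in (1.0, 5.0, 12.0):
    via_sum = gamma_cdf_poisson(w, alpha, theta)
    via_integral = midpoint(lambda u: gamma_pdf_integer(u, alpha, theta), 0.0, w)
    print(f"w = {w:>4}: Poisson sum {via_sum:.6f}, integral of f {via_integral:.6f}")
```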
Note that, as usual, there are an infinite number of possible gamma distributions because there are an infinite number of possible θ and α values. That's, again, why this page is called Gamma Distributions (with an s) and not Gamma Distribution (with no s). Because each gamma distribution depends on the value of θ and α, it shouldn't be surprising that the shape of the probability distribution changes as θ and α change.
Recall that θ is the mean waiting time until the first event, and α is the number of events for which you are waiting to occur. It makes sense then that for fixed α, as θ increases, the probability "moves to the right," as illustrated by plots of the p.d.f. with α fixed at 3 and θ increasing from 1 to 2 to 3.
The plots illustrate, for example, that if we are waiting for α = 3 events to occur, we have a greater probability of our waiting time W being large if our mean waiting time until the first event is large (θ = 3, say) than if it is small (θ = 1, say).
It also makes sense that for fixed θ, as α increases, the probability "moves to the right," as illustrated by plots of the p.d.f. with θ fixed at 3 and α increasing from 1 to 2 to 3.
The plots illustrate, for example, that if the mean waiting time until the first event is θ = 3, then we have a greater probability of our waiting time W being large if we are waiting for more events to occur (α = 3, say) than fewer (α = 1, say).
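To put numbers on both statements, the sketch below computes P(W > 5) for the same two families, using the fact that W > w exactly when fewer than α events occur in [0, w]. The threshold w = 5 and the function name `gamma_survival` are arbitrary choices for illustration; α is assumed to be a positive integer.

```python
import math


def gamma_survival(w, alpha, theta):
    """P(W > w) = P(fewer than alpha events in [0, w]), for integer alpha (Poisson sum)."""
    m = w / theta   # Poisson mean lam * w, with lam = 1/theta
    return sum(m**k * math.exp(-m) / math.factorial(k) for k in range(alpha))


w = 5.0
print("alpha = 3, theta = 1, 2, 3:", [round(gamma_survival(w, 3, th), 4) for th in (1, 2, 3)])
print("theta = 3, alpha = 1, 2, 3:", [round(gamma_survival(w, a, 3), 4) for a in (1, 2, 3)])
```

Both lists increase from left to right: a larger θ, or a larger α, makes a long wait more likely.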
Definition. The gamma function, denoted Γ(t), is defined, for t > 0, by: \(\Gamma(t)=\int_0^\infty y^{t-1} e^{-y} dy\)
We'll primarily use the definition in order to help us prove the two theorems that follow.
Theorem. Provided t > 1: \(\Gamma(t)=(t-1) \times \Gamma(t-1) \)
Proof. We'll use integration by parts with:
\(u=y^{t-1}\) and \(dv=e^{-y}dy\)
to get:
\(du=(t-1)y^{t-2}dy\) and \(v=-e^{-y}\)
Then, the integration by parts gives us:
\(\Gamma(t)=\lim\limits_{b \to \infty} \left[-y^{t-1}e^{-y}\right]^{y=b}_{y=0} + (t-1)\int_0^\infty y^{t-2}e^{-y}dy\)
Evaluating at y = b and y = 0 for the first term, and using the definition of the gamma function (provided t − 1 > 0) for the second term, we have:
\(\Gamma(t)=-\lim\limits_{b \to \infty} \left[\dfrac{b^{t-1}}{e^b}\right]+(t-1)\Gamma(t-1)\)
Now, if we were to be lazy, we would just wave our hands, and say that the first term goes to 0, and therefore:
\(\Gamma(t)=(t-1) \times \Gamma(t-1)\)
provided t > 1, as was to be proved.
Let's not be too lazy though! For t > 1, taking the limit as b goes to infinity in that first term gives infinity over infinity. Ugh! Maybe we should have left well enough alone! We can rewrite the numerator as the exponential of its natural logarithm without changing the limit. Doing so, we get:
\(-\lim\limits_{b \to \infty} \left[\dfrac{b^{t-1}}{e^b}\right] =-\lim\limits_{b \to \infty} \left\{\dfrac{\text{exp}[(t-1) \ln b]}{\text{exp}(b)}\right\}\)
Then, because both the numerator and denominator are exponents, we can write the limit as:
\(-\lim\limits_{b \to \infty} \left[\dfrac{b^{t-1}}{e^b}\right] =-\lim\limits_{b \to \infty}\{\text{exp}[(t-1) \ln b-b]\}\)
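Manipulating the limit a bit more, by factoring b out of the exponent so that we can easily apply L'Hôpital's Rule, we get:
\(-\lim\limits_{b \to \infty} \left[\dfrac{b^{t-1}}{e^b}\right] =-\lim\limits_{b \to \infty} \left\{\text{exp}\left[b\left(\dfrac{(t-1)\ln b}{b}-1\right)\right]\right\}\)
Now, let's take the limit as b goes to infinity. By L'Hôpital's Rule:
\(\lim\limits_{b \to \infty} \dfrac{\ln b}{b}=\lim\limits_{b \to \infty} \dfrac{1/b}{1}=0\)
so the quantity in parentheses approaches −1, the exponent approaches −∞, and therefore:
\(-\lim\limits_{b \to \infty} \left[\dfrac{b^{t-1}}{e^b}\right]=0\)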
Okay, our proof is now officially complete! We have shown what we set out to show. Maybe next time, I'll just wave my hands when I need a limit to go to 0.
Theorem. If t = n, a positive integer, then: \(\Gamma(n)=(n-1)!\)
Proof. Using the previous theorem:
\begin{align}
\Gamma(n) &= (n-1)\Gamma(n-1)\\
&= (n-1)(n-2)\Gamma(n-2)\\
&= (n-1)(n-2)(n-3)\cdots (2)(1)\Gamma(1)
\end{align}
And, since by the definition of the gamma function:
\(\Gamma(1)=\int_0^\infty y^{1-1}e^{-y} dy=\int_0^\infty e^{-y} dy=1\)
we have:
\(\Gamma(n)=(n-1)!\)
as was to be proved.
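Python's standard library happens to ship the gamma function as `math.gamma`, which makes both theorems easy to spot-check, and a midpoint-rule approximation of the defining integral (an assumed numerical technique, truncated at y = 60) should agree with it. The sketch restricts the numerical check to t ≥ 1, where the integrand stays bounded near 0; `gamma_numeric` is a hypothetical name.

```python
import math


def midpoint(f, a, b, n=400_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def gamma_numeric(t, upper=60.0):
    """Approximate Gamma(t) from its definition, truncating the integral at `upper`.

    Intended for t >= 1, where the integrand y^{t-1} e^{-y} stays bounded near 0.
    """
    return midpoint(lambda y: y ** (t - 1) * math.exp(-y), 0.0, upper)


for t in (1.5, 3.0, 4.7):
    print(f"t = {t}: numeric {gamma_numeric(t):.6f}, math.gamma {math.gamma(t):.6f}")

# Gamma(t) = (t - 1) Gamma(t - 1) for t > 1 ...
for t in (1.5, 3.0, 4.7, 10.25):
    assert math.isclose(math.gamma(t), (t - 1) * math.gamma(t - 1))
# ... and Gamma(n) = (n - 1)! for positive integers n.
for n in range(1, 11):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))
```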
Here, after formally defining the gamma distribution (we haven't done that yet?!), we present and prove (well, sort of!) three key properties of the gamma distribution.
Definition. A continuous random variable X follows a gamma distribution with parameters θ > 0 and α > 0 if its probability density function is: \(f(x)=\dfrac{1}{\Gamma(\alpha)\theta^\alpha} x^{\alpha-1} e^{-x/\theta}\) for x > 0.
Before we get to the three theorems and proofs, two notes:
1) When the p.d.f. is motivated by waiting times until α events, α must be a positive integer, and then Γ(α) = (α−1)!, so this definition agrees with the p.d.f. we derived earlier. But the p.d.f. is actually a valid p.d.f. for any α > 0 (since Γ(α) is defined for all positive α).
2) The gamma p.d.f. reaffirms that the exponential distribution is just a special case of the gamma distribution. That is, when you put α =1 into the gamma p.d.f., you get the exponential p.d.f.
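Both notes are easy to see in code. The sketch below writes the gamma p.d.f. with `math.gamma` (so any α > 0 is accepted) and compares it against the exponential p.d.f. when α = 1. Function names are hypothetical.

```python
import math


def gamma_pdf(x, alpha, theta):
    """Gamma p.d.f. x^{alpha-1} e^{-x/theta} / (Gamma(alpha) theta^alpha), for x > 0 and any alpha > 0."""
    return x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta**alpha)


def exp_pdf(x, theta):
    """Exponential p.d.f. (1/theta) e^{-x/theta}, for x > 0."""
    return math.exp(-x / theta) / theta


theta = 2.0
for x in (0.1, 1.0, 7.5):
    # With alpha = 1 the two columns coincide.
    print(x, gamma_pdf(x, 1.0, theta), exp_pdf(x, theta))

# A non-integer shape is fine too, because math.gamma accepts any positive alpha.
print(gamma_pdf(1.0, 2.5, theta))
```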
Theorem. The moment generating function of a gamma random variable is: \(M(t)=\dfrac{1}{(1-\theta t)^\alpha}\) for t < 1/θ.
Proof. By definition, the moment generating function M(t) of a gamma random variable is:
\(M(t)=E(e^{tX})=\int_0^\infty \dfrac{1}{\Gamma(\alpha)\theta^\alpha}e^{-x/\theta} x^{\alpha-1} e^{tx}dx\)
Collecting like terms, we get:
\(M(t)=E(e^{tX})=\int_0^\infty \dfrac{1}{\Gamma(\alpha)\theta^\alpha}e^{-x\left(\frac{1}{\theta}-t\right)} x^{\alpha-1} dx\)
Now, let's use the change of variable technique with:
\(y=x\left(\dfrac{1}{\theta}-t\right)\)
Rearranging, we get:
\(x=\dfrac{\theta}{1-\theta t}y\) and therefore \(dx=\dfrac{\theta}{1-\theta t}dy\)
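Now, making the substitutions for x and dx into our integral, we get:
\(M(t)=\int_0^\infty \dfrac{1}{\Gamma(\alpha)\theta^\alpha} e^{-y} \left(\dfrac{\theta}{1-\theta t}\right)^{\alpha-1} y^{\alpha-1}\left(\dfrac{\theta}{1-\theta t}\right) dy=\dfrac{1}{\Gamma(\alpha)\theta^\alpha}\left(\dfrac{\theta}{1-\theta t}\right)^{\alpha}\int_0^\infty y^{\alpha-1}e^{-y}dy\)
The substitution keeps the limits of integration at 0 and ∞ only if 1/θ − t > 0, that is, provided t < 1/θ. The remaining integral is Γ(α), by the definition of the gamma function. Therefore:
\(M(t)=\dfrac{1}{\Gamma(\alpha)\theta^\alpha}\cdot\dfrac{\theta^\alpha}{(1-\theta t)^\alpha}\cdot\Gamma(\alpha)=\dfrac{1}{(1-\theta t)^\alpha}\)
provided t < 1/θ, as was to be proved.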
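As a numerical check that also exercises a non-integer shape, the sketch below approximates E(e^{tX}) for α = 2.5 and θ = 2 with a midpoint rule (an assumed numerical technique, truncated where e^{−x(1/θ−t)} is about e^{−80}) and compares it with (1 − θt)^{−α}. The function name is hypothetical.

```python
import math


def midpoint(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def gamma_mgf_numeric(t, alpha, theta):
    """Approximate E(e^{tX}) for X ~ gamma(alpha, theta), assuming t < 1/theta."""
    rate = 1 / theta - t                       # the (1/theta - t) from the proof
    if rate <= 0:
        raise ValueError("the m.g.f. exists only for t < 1/theta")
    c = 1 / (math.gamma(alpha) * theta**alpha)
    # Integrand after collecting like terms: c x^{alpha-1} e^{-x(1/theta - t)}.
    return midpoint(lambda x: c * x ** (alpha - 1) * math.exp(-rate * x), 0.0, 80 / rate)


alpha, theta = 2.5, 2.0                        # a non-integer shape on purpose
for t in (-0.5, 0.0, 0.2):
    print(t, gamma_mgf_numeric(t, alpha, theta), (1 - theta * t) ** (-alpha))
```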
Theorem. The mean of a gamma random variable is: \(\mu=E(X)=\alpha \theta\)
Proof. The proof is left for you as an exercise.
Theorem. The variance of a gamma random variable is: \(\sigma^2=Var(X)=\alpha \theta^2\)
Proof. This proof is also left for you as an exercise.
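The proofs remain yours to write, but the formulas themselves can be spot-checked. This sketch numerically integrates x f(x) and x² f(x) with a midpoint rule (an assumed technique, truncated at θ(α + 60)) and compares the results with αθ and αθ². Function names are hypothetical.

```python
import math


def midpoint(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def gamma_pdf(x, alpha, theta):
    """Gamma p.d.f. x^{alpha-1} e^{-x/theta} / (Gamma(alpha) theta^alpha), for x > 0."""
    return x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta**alpha)


def gamma_moments_numeric(alpha, theta):
    """Numerical E(X) and Var(X); the integrals are truncated at theta*(alpha + 60)."""
    upper = theta * (alpha + 60)
    mean = midpoint(lambda x: x * gamma_pdf(x, alpha, theta), 0.0, upper)
    second = midpoint(lambda x: x * x * gamma_pdf(x, alpha, theta), 0.0, upper)
    return mean, second - mean**2


for alpha, theta in ((1.0, 2.0), (2.5, 2.0), (3.0, 5.0)):
    mean, var = gamma_moments_numeric(alpha, theta)
    print(f"alpha={alpha}, theta={theta}: mean {mean:.4f} vs {alpha * theta}, "
          f"variance {var:.4f} vs {alpha * theta**2}")
```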
Engineers designing the next generation of space shuttles plan to include two fuel pumps, one active and the other in reserve. If the primary pump malfunctions, the second is automatically brought on line. Suppose a typical mission is expected to require that fuel be pumped for at most 50 hours. According to the manufacturer's specifications, pumps are expected to fail once every 100 hours. What are the chances that such a fuel pump system would not remain functioning for the full 50 hours?
Solution. We are given that λ, the average number of failures in a 100-hour interval, is 1. Therefore, θ, the mean waiting time until the first failure, is 1/λ, or 100 hours. Knowing that, let's now let Y denote the time elapsed until the second (α = 2) pump breaks down. Assuming the failures follow a Poisson process, the probability density function of Y is:
\(f_Y(y)=\dfrac{1}{100^2 \Gamma(2)}e^{-y/100} y^{2-1}=\dfrac{1}{10000}ye^{-y/100} \)
for y > 0. Therefore, the probability that the system fails to last for 50 hours is:
\(P(Y<50)=\int^{50}_0 \dfrac{1}{10000}ye^{-y/100} dy\)
Integrating that baby is going to require integration by parts. Let's let:
\(u=y\) and \(dv=e^{-y/100}dy\)
So that:
\(du=dy\) and \(v=-100e^{-y/100} \)
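Now, for the integration:
\(P(Y<50)=\dfrac{1}{10000}\left\{\left[-100ye^{-y/100}\right]^{y=50}_{y=0}+100\int_0^{50} e^{-y/100}dy\right\}=\dfrac{1}{10000}\left\{-5000e^{-1/2}+100\left[-100e^{-y/100}\right]^{y=50}_{y=0}\right\}\)
\(=\dfrac{1}{10000}\left\{-5000e^{-1/2}+10000\left(1-e^{-1/2}\right)\right\}=1-1.5e^{-1/2}\approx 0.0902\)
That is, there is about a 9% chance that the two-pump system would not remain functioning for the full 50 hours.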
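The answer 1 − 1.5e^{−1/2} can be confirmed three ways: the closed form from integration by parts, the Poisson-sum form of the gamma c.d.f. (the system fails within 50 hours exactly when at least 2 failures occur), and a Monte Carlo simulation. The simulation assumes, as the Poisson-process model implies, that Y is the sum of two independent exponential lifetimes with mean 100 hours; the seed and number of repetitions are arbitrary.

```python
import math
import random

theta, alpha, hours = 100.0, 2, 50.0

# Closed form from the integration by parts: P(Y < 50) = 1 - 1.5 e^{-1/2}.
closed_form = 1 - 1.5 * math.exp(-0.5)

# Poisson-sum form: 1 - P(0 or 1 failure in 50 hours), with Poisson mean 50/100.
m = hours / theta
poisson_form = 1 - sum(m**k * math.exp(-m) / math.factorial(k) for k in range(alpha))

# Monte Carlo: Y = sum of two independent exponential lifetimes with mean 100 hours.
rng = random.Random(2024)
reps = 200_000
fails = sum(rng.expovariate(1 / theta) + rng.expovariate(1 / theta) < hours for _ in range(reps))

print(f"{closed_form:.4f} {poisson_form:.4f} {fails / reps:.4f}")
```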
Chi-square distributions are very important distributions in the field of statistics. As such, if you go on to take the sequel course, Stat 415, you will encounter the chi-square distributions quite regularly. In this course, we'll focus just on introducing the basics of the distributions to you. In Stat 415, you'll see their many applications.
As it turns out, the chi-square distribution is just a special case of the gamma distribution! Let's take a look.
Definition. Let X follow a gamma distribution with θ = 2 and α = r/2, where r is a positive integer. Then the probability density function of X is: \(f(x)=\dfrac{1}{\Gamma (r/2) 2^{r/2}}x^{r/2-1}e^{-x/2}\) for x > 0. We say that X follows a chi-square distribution with r degrees of freedom, denoted \(\chi^2(r)\) and read "chi-square-r."
There are, of course, an infinite number of possible values for r, the degrees of freedom. Therefore, there are an infinite number of possible chi-square distributions. That is why (again!) this page is called Chi-Square Distributions (with an s!), rather than Chi-Square Distribution (with no s)!
As the following theorems illustrate, the moment generating function, mean and variance of the chi-square distributions are just straightforward extensions of those for the gamma distributions.
Theorem. Let X be a chi-square random variable with r degrees of freedom. Then, the moment generating function of X is: \(M(t)=\dfrac{1}{(1-2t)^{r/2}}\) for t < ½.
Proof. The moment generating function of a gamma random variable is:
\(M(t)=\dfrac{1}{(1-\theta t)^\alpha}\)
The proof is therefore straightforward by substituting 2 in for θ and r/2 in for α.
Theorem. Let X be a chi-square random variable with r degrees of freedom. Then, the mean of X is: \(\mu=E(X)=r\). That is, the mean of X is the number of degrees of freedom.
Proof. The mean of a gamma random variable is:
\(\mu=E(X)=\alpha \theta\)
The proof is again straightforward by substituting 2 in for θ and r/2 in for α.
Theorem. Let X be a chi-square random variable with r degrees of freedom. Then, the variance of X is: \(\sigma^2=Var(X)=2r\). That is, the variance of X is twice the number of degrees of freedom.
Proof. The variance of a gamma random variable is:
\(\sigma^2=Var(X)=\alpha \theta^2\)
The proof is again straightforward by substituting 2 in for θ and r/2 in for α.
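Since the chi-square p.d.f. is just the gamma p.d.f. with θ = 2 and α = r/2, the results E(X) = r and Var(X) = 2r can be spot-checked numerically. This sketch uses a midpoint rule (an assumed technique) truncated at 2(r/2 + 60), and restricts itself to r ≥ 2 so the integrand stays bounded at 0; `chi2_pdf` is a hypothetical name.

```python
import math


def chi2_pdf(x, r):
    """Chi-square(r) p.d.f.: the gamma p.d.f. with theta = 2 and alpha = r/2, for x > 0."""
    return x ** (r / 2 - 1) * math.exp(-x / 2) / (math.gamma(r / 2) * 2 ** (r / 2))


def midpoint(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


for r in (2, 4, 10):
    upper = 2 * (r / 2 + 60)
    mean = midpoint(lambda x: x * chi2_pdf(x, r), 0.0, upper)
    second = midpoint(lambda x: x * x * chi2_pdf(x, r), 0.0, upper)
    print(f"r = {r}: mean {mean:.4f}, variance {second - mean**2:.4f}")   # should match r and 2r
```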
One of the primary ways that you will find yourself interacting with the chi-square distribution, primarily later in Stat 415, is by needing to know either a chi-square value or a chi-square probability in order to complete a statistical analysis. For that reason, we'll now explore how to use a typical chi-square table to look up chi-square values and/or chi-square probabilities. Let's start with two definitions.
Definition. Let α be some probability between 0 and 1 (most often, a small probability less than 0.10). The upper 100αth percentile of a chi-square distribution with r degrees of freedom is the value \(\chi^2_\alpha (r)\) such that the area under the curve and to the right of \(\chi^2_\alpha (r)\) is α.
The above definition is used, as is the one that follows, in Table IV, the chi-square distribution table in the back of your textbook.
Definition. Let α be some probability between 0 and 1 (most often, a small probability less than 0.10). The 100αth percentile of a chi-square distribution with r degrees of freedom is the value \(\chi^2_{1-\alpha} (r)\) such that the area under the curve and to the right of \(\chi^2_{1-\alpha} (r)\) is 1−α, or, equivalently, the area to its left is α.
With these definitions behind us, let's now take a look at the chi-square table in the back of your textbook.
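In summary, here are the steps you should use to find a chi-square value using the chi-square table:
1) Find the row that corresponds to the degrees of freedom, r.
2) Find the column headed by the cumulative probability P(X ≤ x) that you want. For an upper 100αth percentile, that is the column headed 1 − α; for a 100αth percentile, it is the column headed α.
3) Read the chi-square value where that row and column intersect.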
Now, at least theoretically, you could also use the chi-square table to find the probability associated with a particular chi-square value. But, as you can see, the table is pretty limited in that direction. For example, if you have a chi-square random variable with 5 degrees of freedom, you could only find the probabilities associated with the chi-square values of 0.554, 0.831, 1.145, 1.610, 9.236, 11.07, 12.83, and 15.09.
What would you do if you wanted to find the probability that a chi-square random variable with 5 degrees of freedom was less than 6.2, say? Well, the answer is, of course... statistical software, such as SAS or Minitab! For what we'll be doing in Stat 414 and 415, the chi-square table will (mostly) serve our purpose. Let's get a bit more practice now using the chi-square table.
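Statistical software is the practical answer. As a minimal, standard-library-only stand-in (an assumption of this sketch, not necessarily what SAS or Minitab do internally), the chi-square c.d.f. can be computed as the regularized lower incomplete gamma function P(r/2, x/2), summed from its standard power series. The name `chi2_cdf` is hypothetical, and the sketch assumes moderate x (say x/2 below a few hundred) so the first series term does not underflow.

```python
import math


def chi2_cdf(x, r, tol=1e-15, max_terms=10_000):
    """P(X <= x) for X ~ chi-square(r), via the series for the regularized lower
    incomplete gamma function with a = r/2 and z = x/2:
        P(a, z) = z^a e^{-z} * sum_{n>=0} z^n / Gamma(a + n + 1).
    """
    if x <= 0:
        return 0.0
    a, z = r / 2, x / 2
    # n = 0 term, including the z^a e^{-z} prefactor (computed on the log scale).
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1))
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)      # ratio of consecutive terms, since Gamma(a+n+1) = (a+n) Gamma(a+n)
        total += term
        if term < tol * total:
            break
    return min(total, 1.0)


print(round(chi2_cdf(6.2, 5), 4))   # P(X < 6.2) with 5 degrees of freedom
```

For the table entries listed above, this function should return the corresponding column headings (0.01 at 0.554, 0.95 at 11.07, and so on) to about three decimal places.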
Let X be a chi-square random variable with 10 degrees of freedom. What is the upper fifth percentile?
Solution. The upper fifth percentile is the chi-square value x such that the probability to the right of x is 0.05, and therefore the probability to the left of x is 0.95. To find x using the chi-square table, we find the r = 10 row and the P(X ≤ x) = 0.95 column, and read the chi-square value where they intersect. What do you get?
The table tells us that the upper fifth percentile of a chi-square random variable with 10 degrees of freedom is 18.31.
What is the tenth percentile?
Solution. The tenth percentile is the chi-square value x such that the probability to the left of x is 0.10. To find x using the chi-square table, we find the r = 10 row and the P(X ≤ x) = 0.10 column, and read the chi-square value where they intersect. What do you get?
The table tells us that the tenth percentile of a chi-square random variable with 10 degrees of freedom is 4.865.
What is the probability that a chi-square random variable with 10 degrees of freedom is greater than 15.99?
Solution. There I go... just a minute ago, I said that the chi-square table isn't very helpful in finding probabilities, then I turn around and ask you to use the table to find a probability! Doing it at least once helps us make sure that we fully understand the table. In this case, we are going to need to read the table "backwards." To find the probability, we find the r = 10 row, locate the chi-square value 15.99 in that row, and read the cumulative probability P(X ≤ x) at the head of its column. What do you get?
The table tells us that the probability that a chi-square random variable with 10 degrees of freedom is less than 15.99 is 0.90. Therefore, the probability that a chi-square random variable with 10 degrees of freedom is greater than 15.99 is 1−0.90, or 0.10.
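The three lookups in these examples can be reproduced without the table by inverting the chi-square c.d.f. numerically. The sketch below repeats the hypothetical `chi2_cdf` series helper so that it runs on its own, and finds percentiles by bisection (a swapped-in root-finding technique). It assumes the percentile lies between 0 and 200, which holds comfortably for 10 degrees of freedom.

```python
import math


def chi2_cdf(x, r, tol=1e-15, max_terms=10_000):
    """P(X <= x) for X ~ chi-square(r), via the incomplete-gamma power series."""
    if x <= 0:
        return 0.0
    a, z = r / 2, x / 2
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1))
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < tol * total:
            break
    return min(total, 1.0)


def chi2_percentile(p, r, lo=0.0, hi=200.0, iters=100):
    """Value x with P(X <= x) = p, found by bisection on the increasing c.d.f."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, r) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


r = 10
print("upper 5th percentile:", round(chi2_percentile(0.95, r), 2))   # table: 18.31
print("10th percentile:", round(chi2_percentile(0.10, r), 3))         # table: 4.865
print("P(X > 15.99):", round(1 - chi2_cdf(15.99, r), 3))              # table: 0.10
```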