Lesson 10: Discriminant Analysis

Introduction

Discriminant analysis is a classification problem, where two or more groups or clusters or populations are known a priori and one or more new observations are classified into one of the known populations based on the measured characteristics. Let us look at three different examples.

Example 1 - Swiss Bank Notes:

We have two populations of bank notes, genuine and counterfeit. Six measurements are taken on each note: length, left width, right width, bottom margin, top margin, and diagonal.

Take a bank note of unknown origin and determine, just from these six measurements, whether it is genuine or counterfeit. This is perhaps not as impractical as it might sound: a more modern equivalent is a scanner that measures the notes automatically and makes a decision.

Example 2 - Pottery Data:

Pottery shards were sampled from four sites, L) Llanedyrn, C) Caldicot, I) Ilse Thornes, and A) Ashley Rails, and the concentrations of the following chemical constituents were measured at a laboratory:

An archaeologist encounters a pottery specimen of unknown origin. To determine possible trade routes, the archaeologist may wish to classify its site of origin.

Example 3 - Insect Data:

Data were collected on two species of insects in the genus Chaetocnema, (a) Ch. concinna and (b) Ch. heikertlingeri. Three variables were measured on each insect, recorded as Joint 1, Joint 2, and Aedeagus.

Our objective is to obtain a classification rule for identifying the insect species based on these three variables. An entomologist can identify these two closely related species, but the differences are so subtle that considerable experience is needed to tell them apart. If a classification rule can be developed, it might offer a more accurate way to differentiate between the two species.

Learning objectives & outcomes

Upon completion of this lesson, you should be able to do the following:

10.1 - Bayes Rule and Classification Problem

Bayes’ Rule

Consider any two events A and B. To find P(B|A), the probability that B occurs given that A has occurred, Bayes’ Rule states the following:

\[P(B|A) = \frac{P(A \text{ and } B)}{P(A)}\]

This says that the conditional probability is the probability that both A and B occur divided by the unconditional probability that A occurs. This is a simple algebraic restatement of a rule for finding the probability that two events occur together, which is P(A and B) = P(A)P(B|A).

Bayes’ Rule Applied to the Classification Problem

We are interested in \(P(\pi_i|\mathbf{x})\), the conditional probability that an observation came from population πi given the observed values of the multivariate vector of variables x. We will classify an observation to the population for which \(P(\pi_i|\mathbf{x})\) is greatest. This is the most probable group given the observed values of x.

Technical Note: We have to be careful about the word probability in conjunction with our observed vector x. A probability density function for continuous variables does not give a probability, but instead gives a measure of “likelihood.”

Using the notation of Bayes’ Rule above, let event A = observing the vector x and event B = the observation came from population πi. Thus our probability of interest can be found as

\[P(\text{ member of } \pi_i | \text{ we observed } \mathbf{x}) = \frac{P(\text{ member of } \pi_i \text{ and we observe } \mathbf{x})}{P(\text{ we observe } \mathbf{x})}\]

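The numerator is \(p_i f(\mathbf{x}|\pi_i)\), where \(p_i\) is the prior probability that an observation belongs to population πi and \(f(\mathbf{x}|\pi_i)\) is the probability density function of x in that population; the denominator is the sum of these terms over all g populations. Thus the posterior probability that an observation is a member of population πi is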

\[p(\pi_i|\mathbf{x}) = \frac{p_i f(\mathbf{x}|\pi_i)}{\sum_{j=1}^{g}p_j f(\mathbf{x}|\pi_j)}\]

The classification rule is to assign observation x to the population for which the posterior probability is the greatest.

The denominator is the same for all posterior probabilities (for the various populations), so it is equivalent to classify an observation to the population for which \(p_i f(\mathbf{x}|\pi_i)\) is greatest.
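As a quick illustration of the rule, the sketch below computes posterior probabilities from priors and density values and picks the population with the largest \(p_i f(\mathbf{x}|\pi_i)\). The priors and density heights are hypothetical numbers, not taken from any data set in this lesson.

```python
def posterior_probabilities(priors, densities):
    """Return p(pi_i | x) = p_i f(x|pi_i) / sum_j p_j f(x|pi_j)."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("prior probabilities must sum to 1")
    numerators = [p * f for p, f in zip(priors, densities)]
    total = sum(numerators)  # the common denominator
    return [num / total for num in numerators]


def classify(priors, densities):
    """Index of the population with the largest p_i f(x|pi_i)."""
    products = [p * f for p, f in zip(priors, densities)]
    return max(range(len(products)), key=products.__getitem__)


# Hypothetical values: three populations, their priors, and f(x|pi_i) at some x
priors = [0.5, 0.3, 0.2]
densities = [0.02, 0.05, 0.01]
post = posterior_probabilities(priors, densities)
print([round(v, 3) for v in post])
print(classify(priors, densities))
```

Because the denominator is shared, the population chosen by classify is always the one with the largest posterior probability.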

Two Populations

With only two populations we can express a classification rule in terms of the ratio of the two posterior probabilities. Specifically we would classify to population 1 when

\[\frac{p_1 f(\mathbf{x}|\pi_1)}{p_2 f(\mathbf{x}|\pi_2)} > 1\]

This can be rewritten to say that we classify to population 1 when

\[\frac{ f(\mathbf{x}|\pi_1)}{ f(\mathbf{x}|\pi_2)} > \frac{p_2}{p_1}\]

Decision Rule

We are going to classify the sample unit or subject into the population πi that maximizes the posterior probability \(p(\pi_i|\mathbf{x})\), that is, the population that maximizes

\(f(\mathbf{x}|\pi_i)p_i\)

We calculate the posterior probability for each of the populations and then assign the subject or sample unit to the population with the highest posterior probability. Ideally, that posterior probability will be greater than one half; the closer to 100%, the better!

Equivalently, we can assign it to the population that maximizes the log of this product:

\(\log[f(\mathbf{x}|\pi_i)p_i]\)

The denominator that appears above does not depend on the population, since it involves summing over all the populations. So all we really need to do is assign the unit to the population with the largest value of this product or, equivalently, of its log, because the log is a monotonic function. It is often easier to work with the log.
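The practical advantage of the log scale shows up in a small sketch. It assumes, purely for illustration, that each population has independent normal coordinates, so that \(f(\mathbf{x}|\pi_i)\) is a product of p univariate densities. With many variables that product underflows to zero in floating point, while the sum of logs still ranks the populations correctly. All numbers are hypothetical.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Log of the univariate normal N(mu, sigma^2) density at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_scores(x_vec, populations, priors):
    """log[f(x|pi_i) p_i] for each population.

    Illustrative assumption: coordinates are independent normals within each
    population, so log f(x|pi_i) is a sum of univariate log densities.
    """
    scores = []
    for (means, sds), p in zip(populations, priors):
        log_f = sum(log_normal_pdf(x, m, s) for x, m, s in zip(x_vec, means, sds))
        scores.append(log_f + math.log(p))
    return scores


def raw_products(x_vec, populations, priors):
    """f(x|pi_i) p_i multiplied out directly; this can underflow to 0.0."""
    out = []
    for (means, sds), p in zip(populations, priors):
        prod = p
        for x, m, s in zip(x_vec, means, sds):
            prod *= math.exp(log_normal_pdf(x, m, s))
        out.append(prod)
    return out


n_vars = 400  # many variables
populations = [([0.0] * n_vars, [1.0] * n_vars), ([1.0] * n_vars, [1.0] * n_vars)]
priors = [0.5, 0.5]
x_vec = [3.0] * n_vars

print(raw_products(x_vec, populations, priors))  # [0.0, 0.0]: both underflow
scores = log_scores(x_vec, populations, priors)
print(max(range(2), key=scores.__getitem__))     # 1: the log rule still decides
```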

10.2 - Discriminant Analysis Procedure

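Discriminant analysis is usually carried out as a procedure of six or seven steps (see the worked example in 10.4). Two choices that arise along the way, the prior probabilities and the form of the variance-covariance matrices, are described here.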

The prior probability pi represents the expected proportion of the community that belongs to population πi. There are three common choices:

1) Equal priors: \[\hat{p}_i = \frac{1}{g}\] This would be used if we believe that all of the population sizes are equal.

2) Arbitrary priors selected according to the investigator's beliefs regarding the relative population sizes. Note that we require:

\(\hat{p}_1 + \hat{p}_2 + \dots + \hat{p}_g = 1\)

3) Estimated priors:

\[\hat{p}_i = \frac{n_i}{N}\]

where ni is the number of observations from population πi in the training data, and N = n1 + n2 + ... + ng.
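The three choices of prior can be written out directly. This is a minimal sketch with hypothetical training-set sizes; the helper names are illustrative only, and the positivity check reflects the fact that log p_i is used later.

```python
def equal_priors(g):
    """Choice 1: p_i = 1/g for each of g populations."""
    return [1.0 / g] * g


def estimated_priors(counts):
    """Choice 3: p_i = n_i / N from the training-sample sizes n_1, ..., n_g."""
    total = sum(counts)
    return [n / total for n in counts]


def check_priors(priors, tol=1e-9):
    """Any choice, including arbitrary priors (choice 2), must be positive
    (their logs are used in the score functions) and must sum to 1."""
    return all(p > 0 for p in priors) and abs(sum(priors) - 1.0) < tol


# Hypothetical training-set sizes for g = 3 populations
counts = [30, 50, 20]
print(equal_priors(3))
print(estimated_priors(counts))   # [0.3, 0.5, 0.2]
print(check_priors([0.9, 0.1]))   # True
```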

Case 1: Linear discriminant analysis is for homogeneous variance-covariance matrices:

\(\Sigma_1 = \Sigma_2 = \dots = \Sigma_g = \Sigma\)

In this case the variance-covariance matrix does not depend on the population from which the data are obtained.

Case 2: Quadratic discriminant analysis is used for heterogeneous variance-covariance matrices:

\(\Sigma_i \ne \Sigma_j\) for some \(i \ne j\)

This allows the variance-covariance matrices to depend on which population we are looking at.

 (We do not discuss testing whether the population means differ. If they do not, there is no basis for discriminant analysis.) The following assumptions are made:

  1. The data from group i have common mean vector μi.
  2. The data from group i have common variance-covariance matrix Σ.
  3. Independence: The subjects are independently sampled.
  4. Normality: The data are multivariate normally distributed.

 The procedure described above assumes that the unit or subject being classified actually belongs to one of the populations under consideration. If you have a study of two species of insects, A and B, and the insect being classified actually belongs to species C, then it will necessarily be misclassified as belonging to either A or B.

10.3 - Linear Discriminant Analysis

We assume that in population πi the probability density function of x is multivariate normal with mean vector μi and variance-covariance matrix Σ (same for all populations). As a formula, this is

\[f(\mathbf{x}|\pi_i) = \frac{1}{(2\pi)^{p/2}|\mathbf{\Sigma}|^{1/2}}\exp\left[-\frac{1}{2}\mathbf{(x-\mu_i)'\Sigma^{-1}(x-\mu_i)}\right]\]

We classify to the population for which \(p_i f(\mathbf{x}|\pi_i)\) is largest.

Because the log transform is monotonic, this is equivalent to classifying an observation to the population for which \(\log[p_i f(\mathbf{x}|\pi_i)]\) is largest.

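Linear discriminant analysis is used when the variance-covariance matrix does not depend on the population from which the data are obtained. Expanding \(\log[p_i f(\mathbf{x}|\pi_i)]\) and dropping the terms that are the same for every population (the normalizing constant and \(-\frac{1}{2}\mathbf{x'\Sigma^{-1}x}\)), our decision rule is based on the so-called Linear Score Function, which is a function of the population mean vectors μi for each of our g populations, the common variance-covariance matrix Σ, and the prior probabilities.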

The Linear Score Function is:       

\[s^L_i(\mathbf{x}) = -\frac{1}{2}\mathbf{\mu'_i \Sigma^{-1}\mu_i} + \mathbf{\mu'_i \Sigma^{-1}x} + \log p_i = d_{i0}+\sum_{j=1}^{p}d_{ij}x_j + \log p_i\]

where

\[d_{i0} = -\frac{1}{2}\mathbf{\mu'_i\Sigma^{-1}\mu_i}\]

\(d_{ij} = j\text{th element of } \mu'_i\Sigma^{-1}\)

The far right-hand expression resembles a linear regression with intercept term di0 and regression coefficients dij.

Linear Discriminant Function: dropping the \(\log p_i\) term gives

\[d^L_i(\mathbf{x}) = -\frac{1}{2}\mathbf{\mu'_i\Sigma^{-1}\mu_i} + \mathbf{\mu'_i\Sigma^{-1}x} = d_{i0} + \sum_{j=1}^{p}d_{ij}x_j\]

with \(d_{i0}\) and \(d_{ij}\) as defined above, so that \(s^L_i(\mathbf{x}) = d^L_i(\mathbf{x}) + \log p_i\).

Given a sample unit with measurements x1, x2, ... , xp, we compute the linear score function for each population and classify the unit into the population with the largest score. This is equivalent to classifying to the population for which the posterior probability of membership is largest.
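To see why dropping terms is harmless, the sketch below (assuming NumPy, with hypothetical parameters for three populations sharing one Σ) computes the linear score function alongside the full \(\log[p_i f(\mathbf{x}|\pi_i)]\). The difference between them is the same for every population, so both give the same classification.

```python
import numpy as np


def log_mvn_density(x, mu, sigma):
    """log f(x | pi_i) for a multivariate normal N(mu, sigma)."""
    k = len(x)
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    quad = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)


def linear_scores(x, means, sigma, priors):
    """s^L_i(x) = d_i0 + sum_j d_ij x_j + log p_i for each population."""
    sigma_inv = np.linalg.inv(sigma)
    scores = []
    for mu, p in zip(means, priors):
        d = mu @ sigma_inv          # coefficients d_i1, ..., d_ip (row of mu' Sigma^-1)
        d0 = -0.5 * d @ mu          # intercept d_i0 = -1/2 mu' Sigma^-1 mu
        scores.append(d0 + d @ x + np.log(p))
    return np.array(scores)


# Hypothetical parameters: g = 3 populations, 2 variables, common Sigma
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0]), np.array([-1.0, 3.0])]
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
priors = [0.5, 0.3, 0.2]
x = np.array([1.2, 0.4])

s = linear_scores(x, means, sigma, priors)
full = np.array([log_mvn_density(x, mu, sigma) + np.log(p) for mu, p in zip(means, priors)])
print(s - full)               # the same constant for every population
print(int(np.argmax(s)))      # same choice as np.argmax(full)
```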

However, this is a function of the unknown parameters μi and Σ, so these must be estimated from the data.

Discriminant analysis requires estimates of:

Prior probabilities:

\(p_i = \text{Pr}(\pi_i);\) \(i = 1, 2, \dots, g\)

The Population Means: these can be estimated by the sample mean vectors:

\(\mathbf{\mu_i} = E(\mathbf{X}|\pi_i)\); \(i = 1, 2, \dots, g\)

The Variance-covariance matrix: this is going to be estimated by using the pooled variance-covariance matrix

\(\Sigma = \text{var}(\mathbf{X}| \pi_i)\); \(i = 1, 2, \dots, g\)

Typically, these parameters are estimated from training data, in which the population membership is known.

Conditional Density Function Parameters:

Population Means: μi can be estimated by substituting in the sample means \(\bar{\mathbf{x}}_i\).

Variance-Covariance matrix: Let Si denote the sample variance-covariance matrix for population i. Then the common variance-covariance matrix Σ can be estimated by the pooled variance-covariance matrix:

\[\mathbf{S}_p = \frac{\sum_{i=1}^{g}(n_i-1)\mathbf{S}_i}{\sum_{i=1}^{g}(n_i-1)}\]

Substituting these estimates (and the estimated priors) into the Linear Score Function gives the estimated linear score function:

\[\hat{s}^L_i(\mathbf{x}) = -\frac{1}{2}\mathbf{\bar{x}'_i S^{-1}_p \bar{x}_i} + \mathbf{\bar{x}'_i S^{-1}_p x} + \log{\hat{p}_i} = \hat{d}_{i0} + \sum_{j=1}^{p}\hat{d}_{ij}x_j + \log{\hat{p}_i}\]

where

\[\hat{d}_{i0} = -\frac{1}{2}\mathbf{\bar{x}'_i S^{-1}_p \bar{x}_i} \]

and

\(\hat{d}_{ij} = j\)th element of \(\mathbf{\bar{x}'_iS^{-1}_p}\)

This is a function of the sample mean vectors, the pooled variance-covariance matrix, and the prior probabilities for the g populations. It is written in a form that looks like a linear regression formula: an intercept term plus a linear combination of the response variables, plus the natural log of the prior probability.

Decision Rule: Classify the sample unit into the population that has the largest estimated linear score function.
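A minimal sketch of the estimation step and decision rule, assuming NumPy and simulated training data (not the insect data): it computes the sample means, the pooled matrix \(\mathbf{S}_p\), the coefficients \(\hat{d}_{i0}\) and \(\hat{d}_{ij}\), and classifies a new observation by the largest estimated linear score. The function names fit_lda and predict_lda are hypothetical.

```python
import numpy as np


def fit_lda(groups, priors):
    """Estimate the linear score function from training data.

    groups: list of (n_i x p) arrays, one per population.
    Returns (intercepts d_hat_i0 + log p_i, coefficient rows d_hat_i, S_p).
    """
    means = [x.mean(axis=0) for x in groups]
    ns = [len(x) for x in groups]
    # Pooled S_p = sum (n_i - 1) S_i / sum (n_i - 1); np.cov uses divisor n_i - 1
    sp = sum((n - 1) * np.cov(x, rowvar=False) for n, x in zip(ns, groups))
    sp /= sum(n - 1 for n in ns)
    sp_inv = np.linalg.inv(sp)
    coefs = np.array([m @ sp_inv for m in means])                         # d_hat_ij
    intercepts = np.array([-0.5 * c @ m for c, m in zip(coefs, means)])   # d_hat_i0
    return intercepts + np.log(priors), coefs, sp


def predict_lda(x, intercepts, coefs):
    """Index of the population with the largest estimated linear score."""
    scores = intercepts + coefs @ x
    return int(np.argmax(scores))


# Hypothetical training data: two groups of 10 observations on 3 variables
rng = np.random.default_rng(0)
a = rng.normal([0.0, 0.0, 0.0], 1.0, size=(10, 3))
b = rng.normal([3.0, 3.0, 3.0], 1.0, size=(10, 3))
intercepts, coefs, sp = fit_lda([a, b], priors=[0.5, 0.5])
print(predict_lda(np.array([3.0, 3.0, 3.0]), intercepts, coefs))
```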

10.4 - Example: Insect Data

Data were collected on two species of insects in the genus Chaetocnema, (species a) Ch. concinna and (species b) Ch. heikertlingeri. Three variables were measured on each insect, recorded as Joint 1, Joint 2, and Aedeagus.

We have ten individuals of each species to make up the training data. The data on these twenty insects are used to estimate the model parameters for the linear score function.

Our objective is to obtain a classification rule for identifying the insect species from these three variables.

Let's begin...

Step 1: Collect the ground truth data or training data. (described above)

Step 2: Specify the prior probabilities. In this case we have no information regarding the relative abundances of the two species, so by default equal priors are selected:

\[\hat{p}_1 = \hat{p}_2 = \frac{1}{2}\]

Step 3: Test for homogeneity of the variance-covariance matrices using Bartlett's test.

Here we use the SAS program insect.sas.


No significant difference between the variance-covariance matrices for the two species (L' = 9.83; d.f. = 6; p = 0.132) is found. Thus linear discriminant analysis is appropriate for the data.
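The homogeneity test can be sketched in Python. This assumes that the statistic reported as L' is the usual chi-square approximation to Box's M statistic (Bartlett's modification of the likelihood-ratio test) with \(p(p+1)(g-1)/2\) degrees of freedom, which gives the 6 d.f. reported here for p = 3 and g = 2; it has not been checked against the SAS or Minitab output digit for digit. The chi2_sf helper is a hand-written series for the chi-square upper tail so that only NumPy is needed; for L' = 9.83 on 6 d.f. it gives p ≈ 0.132, matching the value above.

```python
import math
import numpy as np


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a) + math.log(total))
    return max(0.0, 1.0 - lower)


def box_m_test(groups):
    """Chi-square approximation to Box's M test of H0: Sigma_1 = ... = Sigma_g.

    groups: list of (n_i x p) arrays. Returns (statistic, df, p_value).
    """
    g, p = len(groups), groups[0].shape[1]
    ns = np.array([len(x) for x in groups])
    N = ns.sum()
    covs = [np.cov(x, rowvar=False) for x in groups]             # S_i
    sp = sum((n - 1) * s for n, s in zip(ns, covs)) / (N - g)    # pooled S_p

    def logdet(m):
        return np.linalg.slogdet(m)[1]

    m_stat = (N - g) * logdet(sp) - sum((n - 1) * logdet(s) for n, s in zip(ns, covs))
    u = (np.sum(1.0 / (ns - 1)) - 1.0 / (N - g)) * (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))
    stat = float((1 - u) * m_stat)
    df = p * (p + 1) * (g - 1) // 2
    return stat, df, chi2_sf(stat, df)


print(round(chi2_sf(9.83, 6), 3))   # 0.132, as reported for the insect data

rng = np.random.default_rng(1)
groups = [rng.normal(size=(10, 3)), rng.normal(size=(10, 3))]  # simulated, equal covariances
stat, df, p_value = box_m_test(groups)
print(df)   # 6 degrees of freedom for p = 3 variables and g = 2 groups
```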

Step 4: Estimate the parameters of the conditional probability density functions, i.e., the population mean vectors and the population variance-covariance matrices involved. It turns out that all of this is done automatically in the discriminant analysis procedure.

Step 5: The linear discriminant functions for the two species can be obtained directly from the SAS or Minitab output.

Now, consider an insect with the following measurements. Which species does this belong to?

\[\begin{array}{lc} \text{Variable} & \text{Measurement} \\ \hline \text{Joint 1} & 194 \\ \text{Joint 2} & 124 \\ \text{Aedeagus} & 49 \end{array}\]

These are the measurements on the three variables. The linear discriminant function for species (a) is obtained by plugging these three measurements into the equation for species (a):

\(\hat{d}^{L}_a(\mathbf{x}) = -247.276 - 1.417 \times 194 + 1.520 \times 124 + 10.954 \times 49 = 203.052\)

and then for species (b):

\(\hat{d}^{L}_b(\mathbf{x}) = -193.178 - 0.738 \times 194 + 1.113 \times 124 + 8.250 \times 49 = 205.912\)

Then the linear score function is obtained by adding in a log of one half, here for species (a):

\(\hat{s}^L_a(\mathbf{x}) = \hat{d}^L_a(\mathbf{x}) + \log{\hat{p}_a} = 203.052 + \log{0.5} = 202.359\)

and then for species (b):

\(\hat{s}^L_b(\mathbf{x}) = \hat{d}^L_b(\mathbf{x}) + \log{\hat{p}_b} = 205.912 + \log{0.5} = 205.219\)
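The arithmetic above can be reproduced in a few lines. The coefficients are copied from the two discriminant-function equations above (rounded to three decimals, as printed), and log is the natural log.

```python
import math

# Coefficients from the worked example: (intercept, Joint 1, Joint 2, Aedeagus)
coef = {
    "a": (-247.276, -1.417, 1.520, 10.954),   # Ch. concinna
    "b": (-193.178, -0.738, 1.113, 8.250),    # Ch. heikertlingeri
}
priors = {"a": 0.5, "b": 0.5}
x = (194, 124, 49)  # Joint 1, Joint 2, Aedeagus


def discriminant(c, x):
    """d_hat^L(x) = d_0 + sum_j d_j x_j."""
    return c[0] + sum(cj * xj for cj, xj in zip(c[1:], x))


d = {k: discriminant(c, x) for k, c in coef.items()}
s = {k: d[k] + math.log(priors[k]) for k in coef}   # add the natural log of the prior
print({k: round(v, 3) for k, v in d.items()})
print({k: round(v, 3) for k, v in s.items()})
print(max(s, key=s.get))                            # 'b'
```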

Conclusion

According to the classification rule, the insect is classified into the species that has the highest linear score function. Since \(\hat{s}^L_b(\mathbf{x}) > \hat{s}^L_a(\mathbf{x})\), we conclude that the insect belongs to species (b) Ch. heikertlingeri.

Of course, here the addition of the log of one half makes no difference: whether we classify on the basis of the linear discriminant function or the linear score function, the decision is the same. If the priors are not equal, this need not hold.

You can think of the log prior as a 'penalty' of sorts. If a species has a high prior probability, it receives very little penalty, because the log of a number close to one is close to zero and subtracts little. If a species has a low prior probability, we take the log of a small number, which is a large negative value and results in a large reduction in the score.

Note: SAS assumes equal priors by default. Later on we will look at an example where we do not assume equal priors: the Swiss Bank Notes example.

Posterior Probabilities

You can also calculate the posterior probabilities. These are used to measure uncertainty regarding the classification of a unit from an unknown group. They will give us some indication of our confidence in our classification of individual subjects.

In this case, the estimated posterior probability that the insect belongs to species (a) Ch. concinna given the observed measurements can be obtained by using this formula:

\[\begin{array}{ccl} p(\pi_a|\mathbf{x}) & = & \frac{\exp\{\hat{s}^L_a(\mathbf{x})\}}{\exp\{\hat{s}^L_a(\mathbf{x})\}+\exp\{\hat{s}^L_b(\mathbf{x})\}} \\ & = & \frac{\exp\{202.359\}}{\exp\{202.359\}+\exp\{205.219\}} \\ & = & 0.05\end{array}\]

This is a function of our linear score functions for our two species. Here we are looking at the exponential function of the linear score function for species (a) divided by the sum of the exponential functions of the score functions for species (a) and species (b). Using the numbers that we obtained earlier we can carry out the math and get 0.05.

Similarly for species (b), the estimated posterior probability that the insect belongs to Ch. heikertlingeri is:

\[\begin{array}{ccl} p(\pi_b|\mathbf{x}) & = & \frac{\exp\{\hat{s}^L_b(\mathbf{x})\}}{\exp\{\hat{s}^L_a(\mathbf{x})\}+\exp\{\hat{s}^L_b(\mathbf{x})\}} \\ & = & \frac{\exp\{205.219\}}{\exp\{202.359\}+\exp\{205.219\}} \\ & = & 0.95\end{array}\]

In this case we are 95% confident that the insect belongs to species (b). This is a fairly high level of confidence, but there is a 5% chance that this classification is in error. One thing you have to decide is what error rate is acceptable. For classifying insects this might be perfectly acceptable; in other situations it might not be. For example, in the cancer case that we talked about earlier, where we were trying to classify someone as having cancer or not having cancer, a 5% error rate may not be acceptable. This is an ethical decision that has nothing to do with statistics and must be tailored to the situation at hand.
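In code, the posterior probabilities are a "softmax" of the scores. Exponentiating scores near 200 directly happens to be safe in double precision (\(e^{205}\) is about \(10^{89}\)), but scores in the thousands would overflow (for example, math.exp(1000) raises OverflowError). This sketch therefore subtracts the largest score first, which leaves the ratios unchanged.

```python
import math


def posteriors_from_scores(scores):
    """p(pi_i | x) = exp(s_i) / sum_j exp(s_j), computed stably.

    Subtracting the largest score before exponentiating leaves every ratio
    unchanged but keeps exp() within floating-point range.
    """
    m = max(scores.values())
    weights = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


scores = {"a": 202.359, "b": 205.219}   # estimated linear score functions
post = posteriors_from_scores(scores)
print({k: round(v, 3) for k, v in post.items()})
```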

10.5 - Estimating Misclassification Probabilities

When an unknown specimen is classified according to any decision rule, there is always a possibility that it is wrongly classified. This does not mean the procedure is at fault; it reflects the inherent uncertainty in any statistical procedure. To measure how good the discriminant rule is, one approach is to classify the training data using the rule. Since we know which population each training unit comes from, this gives us some idea of the validity of the discrimination procedure.


Method 1: The confusion table describes how the discriminant function classifies each observation in the training data set. In general, the confusion table takes the form:

Rows 1 through g are the g populations to which the items truly belong, and the columns show how they are classified. For example, n11 is the number of insects from species (1) correctly classified into species (1), while n12 is the number from species (1) incorrectly classified into species (2). In general, nij = the number belonging to population i classified into population j. Ideally this matrix will be diagonal; in practice we hope the off-diagonal elements are small.

The row totals \(n_{i.}\) give the number of individuals belonging to each population or species in the training dataset, and the column totals \(n_{.j}\) give the number classified into each. The total number of observations in the dataset is \(n_{..}\). The dot in the row totals indicates summing over the second subscript, whereas in the column totals we sum over the first subscript.

We will let:

\(p(i|j)\)

denote the probability that a unit from population πj is classified into population πi. These misclassification probabilities can be estimated by taking the number of insects from population j that are misclassified into population i divided by the total number of insects in the sample from population j as shown here:

\[\hat{p}(i|j) = \frac{n_{ji}}{n_{j.}}\]

For i ≠ j these are the estimated misclassification probabilities; for i = j they are the estimated probabilities of correct classification.
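Estimating \(\hat{p}(i|j)\) is a matter of dividing each cell by its row total. The sketch below uses the counts from the insect cross-validation example later in this lesson (2 of the 10 species (b) insects classified as species (a)); the function name is hypothetical.

```python
def misclassification_probabilities(table):
    """Estimate p(i|j) = n_ji / n_j. from a confusion table.

    table[j][i] = number of training units truly from population j that were
    classified into population i (rows = truth, columns = classified as).
    Returns a dict keyed by (i, j).
    """
    probs = {}
    for j, row in enumerate(table):
        n_j = sum(row)                       # row total n_j.
        for i, n_ji in enumerate(row):
            probs[(i, j)] = n_ji / n_j
    return probs


# Insect cross-validation counts (index 0 = species a, 1 = species b)
cv_table = [[10, 0],
            [2, 8]]
p_hat = misclassification_probabilities(cv_table)
print(p_hat[(1, 0)], p_hat[(0, 1)])   # p(b|a), p(a|b)
```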

Example - Insect Data:

From the SAS output, we obtain the following confusion table.

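\[\begin{array}{l|cc|c} \text{Truth} & \text{Classified as } a & \text{Classified as } b & \text{Total} \\ \hline a & 10 & 0 & 10 \\ b & 0 & 10 & 10 \\ \hline \text{Total} & 10 & 10 & 20 \end{array}\]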

Here, no insect was misclassified. So, the misclassification probabilities are all estimated to be equal to zero.


Method 2: Set Aside Method

Step 1: Randomly partition the observations into two “halves”.

Step 2: Use one “half” to obtain the discriminant function.

Step 3: Use the discriminant function from Step 2 to classify all members of the second “half” of the data, from which the proportion of misclassified observations can be computed.

Advantage: This method yields unbiased estimates of the misclassification probabilities.

Disadvantage: It does not make optimal use of the data, so the estimated misclassification probabilities are not as precise as they could be.
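Here is a minimal sketch of the set-aside method, assuming NumPy and simulated data for two populations (not the insect data). The small fit_lda and predict helpers are hypothetical names implementing the estimated linear score function with a pooled covariance matrix; labels are assumed to be coded 0 and 1, and the split is a simple random partition, as in Step 1.

```python
import numpy as np


def fit_lda(x, y, priors):
    """Estimate the linear score function from labelled training data.

    Labels are assumed to be 0, 1, ..., g-1; priors[k] is the prior for label k.
    """
    classes = np.unique(y)
    means = np.array([x[y == k].mean(axis=0) for k in classes])
    # Pooled S_p: within-group cross-products divided by N - g
    resid = np.vstack([x[y == k] - means[i] for i, k in enumerate(classes)])
    sp = resid.T @ resid / (len(x) - len(classes))
    coefs = means @ np.linalg.inv(sp)                      # rows are xbar_i' S_p^-1
    intercepts = -0.5 * np.sum(coefs * means, axis=1) + np.log(priors)
    return classes, intercepts, coefs


def predict(model, x):
    """Classify each row of x to the population with the largest score."""
    classes, intercepts, coefs = model
    return classes[np.argmax(x @ coefs.T + intercepts, axis=1)]


def set_aside_error(x, y, priors, rng):
    """Fit on a random half of the data; return the error rate on the other half."""
    idx = rng.permutation(len(x))
    half = len(x) // 2
    train, test = idx[:half], idx[half:]
    model = fit_lda(x[train], y[train], priors)
    return float(np.mean(predict(model, x[test]) != y[test]))


# Hypothetical data: two populations, 20 units each, 3 variables
rng = np.random.default_rng(42)
x = np.vstack([rng.normal(0.0, 1.0, size=(20, 3)), rng.normal(2.5, 1.0, size=(20, 3))])
y = np.repeat([0, 1], 20)
print(set_aside_error(x, y, priors=np.array([0.5, 0.5]), rng=rng))
```

Running it with different seeds shows how much the estimate moves from one random split to another, which is the loss of precision noted above.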


Method 3: Cross validation

Step 1: Delete one observation from the data.

Step 2: Use the remaining observations to compute a discriminant function.

Step 3: Use the discriminant function from Step 2 to classify the observation removed in Step 1. Repeat Steps 1-3 for every observation, then compute the proportion of observations that are misclassified.
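Leave-one-out cross-validation can be sketched the same way, again assuming NumPy, simulated data, and the same hypothetical linear-score helpers, with labels coded 0, 1, ..., g-1. Each observation is deleted, the rule is refitted on the rest, and the held-out observation is classified, filling in a confusion table.

```python
import numpy as np


def fit_lda(x, y, priors):
    """Estimated linear score coefficients; labels assumed to be 0..g-1."""
    classes = np.unique(y)
    means = np.array([x[y == k].mean(axis=0) for k in classes])
    resid = np.vstack([x[y == k] - means[i] for i, k in enumerate(classes)])
    sp = resid.T @ resid / (len(x) - len(classes))        # pooled S_p
    coefs = means @ np.linalg.inv(sp)
    intercepts = -0.5 * np.sum(coefs * means, axis=1) + np.log(priors)
    return classes, intercepts, coefs


def predict(model, x):
    classes, intercepts, coefs = model
    return classes[np.argmax(x @ coefs.T + intercepts, axis=1)]


def loo_confusion(x, y, priors):
    """Leave-one-out confusion table: rows = truth, columns = classified as."""
    g = len(np.unique(y))
    table = np.zeros((g, g), dtype=int)
    for k in range(len(x)):
        keep = np.arange(len(x)) != k                      # Step 1: delete unit k
        model = fit_lda(x[keep], y[keep], priors)          # Step 2: refit on the rest
        pred = predict(model, x[k:k + 1])[0]               # Step 3: classify unit k
        table[y[k], pred] += 1
    return table


# Hypothetical data: 10 units per population, 3 variables
rng = np.random.default_rng(7)
x = np.vstack([rng.normal(0.0, 1.0, size=(10, 3)), rng.normal(1.5, 1.0, size=(10, 3))])
y = np.repeat([0, 1], 10)
table = loo_confusion(x, y, priors=np.array([0.5, 0.5]))
error_rates = 1 - np.diag(table) / table.sum(axis=1)   # estimated misclassification rate per true population
print(table)
print(error_rates)
```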

Example: Insect Data

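The confusion table for the cross validation is:

\[\begin{array}{l|cc|c} \text{Truth} & \text{Classified as } a & \text{Classified as } b & \text{Total} \\ \hline a & 10 & 0 & 10 \\ b & 2 & 8 & 10 \\ \hline \text{Total} & 12 & 8 & 20 \end{array}\]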

Here, the estimated misclassification probabilities are:

\[\hat{p}(b|a) = \frac{0}{10} = 0.0\]

for insects belonging to species (a), and

\[\hat{p}(a|b) = \frac{2}{10} = 0.2\]

for insects belonging to species (b).

Specifying Unequal Priors

Suppose that we have information (from prior experience or from another study) that suggests that 90% of the insects belong to Ch. concinna. Then the score functions for the unidentified specimen are

\(\hat{s}^L_a(\mathbf{x}) = \hat{d}^L_a(\mathbf{x}) + \log{\hat{p}_a} = 203.052 + \log{0.9} = 202.947\)

and

\(\hat{s}^L_b(\mathbf{x}) = \hat{d}^L_b(\mathbf{x}) + \log{\hat{p}_b} = 205.912 + \log{0.1} = 203.609\)

In this case, we would still classify this specimen into Ch. heikertlingeri with posterior probabilities

\(p(\pi_a|\mathbf{x}) = 0.34\) and \(p(\pi_b|\mathbf{x}) = 0.66\)

These priors can be specified in SAS by adding the priors statement priors "a" = 0.9 "b" = 0.1; after the var statement. Note, however, that when the priors statement is added, SAS includes \(\log \hat{p}_i\) in the constant term. In other words, in this case SAS outputs the estimated linear score function, not the estimated linear discriminant function.
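Because only \(\log \hat{p}_a - \log \hat{p}_b\) enters the comparison, we can also ask how strong the prior for Ch. concinna would have to be before this specimen is assigned to it. The sketch uses the discriminant function values 203.052 and 205.912 from the worked example; the specimen switches to species (a) only when \(\hat{p}_a/\hat{p}_b > e^{205.912 - 203.052} \approx 17.5\), that is, when \(\hat{p}_a\) exceeds about 0.946.

```python
import math

d_a, d_b = 203.052, 205.912   # estimated linear discriminant functions


def posterior_b(prior_a):
    """Posterior probability of species (b) when p_a = prior_a, p_b = 1 - prior_a."""
    s_a = d_a + math.log(prior_a)
    s_b = d_b + math.log(1.0 - prior_a)
    return 1.0 / (1.0 + math.exp(s_a - s_b))


# Classification switches to species (a) once log(p_a / p_b) exceeds d_b - d_a.
ratio = math.exp(d_b - d_a)
threshold = ratio / (1.0 + ratio)
for prior_a in (0.5, 0.9, 0.95):
    print(prior_a, round(posterior_b(prior_a), 3))
print(round(threshold, 3))
```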

10.6 - Quadratic Discriminant Analysis

Linear Discriminant Analysis is for homogeneous variance-covariance matrices. However, data do not always come from such simplified situations. Quadratic Discriminant Analysis is used for heterogeneous variance-covariance matrices:

\(\Sigma_i \ne \Sigma_j\) for some \(i \ne j\)

Again, this allows the variance-covariance matrices to depend on which population we are looking at.

Quadratic discriminant analysis calculates a Quadratic Score Function which looks like this:

\[s^Q_i (\mathbf{x}) = -\frac{1}{2}\log{|\mathbf{\Sigma_i}|}-\frac{1}{2}{\mathbf{(x-\mu_i)'\Sigma^{-1}_i(x - \mu_i)}}+\log{p_i}\]

This is a function of the population mean vector and the variance-covariance matrix for the ith group. As before, we determine a separate quadratic score function for each of the groups.

This is, of course, a function of the unknown population mean vector and variance-covariance matrix for group i, which have to be estimated from ground truth data. As before, we replace the unknown values of μi, Σi, and pi by their estimates to obtain the estimated quadratic score function:

\[\hat{s}^Q_i (\mathbf{x}) = -\frac{1}{2}\log{|\mathbf{S}_i|}-\frac{1}{2}\mathbf{(x-\bar{x}_i)'S^{-1}_i(x-\bar{x}_i)}+\log{\hat{p}_i}\]

All logs in this function are natural logs.

Decision Rule: Our decision rule remains the same as well. We classify the sample unit or subject into the population that has the largest estimated quadratic score function.
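A sketch of the estimated quadratic score function, assuming NumPy. The example populations are hypothetical: they share the same mean and differ only in spread, a situation where the linear rule cannot separate them (their linear score functions differ only by \(\log p_i\)) but the quadratic rule can. fit_qda is an illustrative helper name.

```python
import numpy as np


def quadratic_scores(x, means, covs, priors):
    """Estimated quadratic score for each population:

    s_i(x) = -1/2 log|S_i| - 1/2 (x - xbar_i)' S_i^{-1} (x - xbar_i) + log p_i
    (natural logs throughout).
    """
    scores = []
    for m, s, p in zip(means, covs, priors):
        diff = x - m
        _, logdet = np.linalg.slogdet(s)
        quad = diff @ np.linalg.solve(s, diff)
        scores.append(-0.5 * logdet - 0.5 * quad + np.log(p))
    return np.array(scores)


def fit_qda(groups):
    """Sample mean vector and sample variance-covariance matrix S_i per group."""
    return [grp.mean(axis=0) for grp in groups], [np.cov(grp, rowvar=False) for grp in groups]


# Hypothetical populations: same mean, very different spreads
means = [np.zeros(2), np.zeros(2)]
covs = [np.eye(2), 9.0 * np.eye(2)]
priors = [0.5, 0.5]
print(np.argmax(quadratic_scores(np.array([0.5, 0.5]), means, covs, priors)))  # 0: near the centre
print(np.argmax(quadratic_scores(np.array([4.0, 4.0]), means, covs, priors)))  # 1: far out, wide population
```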

Let's illustrate this using the Swiss Bank Notes example...

10.7 - Example: Swiss Bank Notes

Recall that we have two populations of notes, genuine and counterfeit, and that six measurements were taken on each note: length, left width, right width, bottom margin, top margin, and diagonal.

Priors

In this case it would not be reasonable to consider equal priors for the two types of bank notes. Equal priors would assume that half the bank notes in circulation are counterfeit and half are genuine. This is a very high counterfeit rate; if it were that bad, the Swiss government would probably be bankrupt! So we need to consider unequal priors in which the vast majority of bank notes are thought to be genuine. For this example, let us assume that no more than 1% of bank notes in circulation are counterfeit and 99% are genuine. The prior probabilities can then be expressed as:

\(\hat{p}_1 = 0.99\) and \(\hat{p}_2 = 0.01\)

The first step in the analysis is to carry out Bartlett's test to check for homogeneity of the variance-covariance matrices.

To do this we use the SAS program swiss9.sas.


SAS Notes

SAS can make the choice between linear and quadratic discriminant analysis for you. Let's look at the proc discrim statement in the SAS program swiss9.sas that we just used.


By including the option pool=test in the proc discrim statement, we let SAS decide which kind of discriminant analysis to carry out based on the results of this test.

If you fail to reject, SAS will automatically do a linear discriminant analysis. If you reject, then SAS will do a quadratic discriminant analysis.

There are two other options. With pool=yes, SAS does not carry out Bartlett's test; it pools the variance-covariance matrices and performs a linear discriminant analysis whether or not this is warranted.

If pool=no then SAS will not pool the variance-covariance matrices and SAS will then perform the quadratic discriminant analysis.

SAS does not actually print out the quadratic discriminant function, but it will use quadratic discriminant analysis to classify sample units into populations.


Bartlett's test finds a significant difference between the variance-covariance matrices of the genuine and counterfeit bank notes (L' = 121.90; d.f. = 21; p < 0.0001). Since we reject the null hypothesis of equal variance-covariance matrices, linear discriminant analysis is not appropriate for these data, and a quadratic discriminant analysis is needed.


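Let us consider a bank note with the following measurements, which were entered into the program:

\[\begin{array}{lc} \text{Variable} & \text{Measurement} \\ \hline \text{Length} & 214.9 \\ \text{Left Width} & 130.1 \\ \text{Right Width} & 129.9 \\ \text{Bottom Margin} & 9.0 \\ \text{Top Margin} & 10.6 \\ \text{Diagonal} & 140.5 \end{array}\]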

Any number of sets of measurements can be classified in one run; here we are interested in just one. The output reports that this bank note should be classified as genuine. The posterior probability that it is counterfeit is only 0.000002526, so the posterior probability that it is genuine is very close to one (1 - 0.000002526 = 0.999997474). We are nearly 100% confident that this is a genuine note and not a counterfeit.

Next, consider the results of cross-validation. Note that cross-validation yields estimates of the probability that a randomly selected note will be correctly classified. The resulting confusion table is as follows:

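\[\begin{array}{l|cc|c} \text{Truth} & \text{Classified as Counterfeit} & \text{Classified as Genuine} & \text{Total} \\ \hline \text{Counterfeit} & 98 & 2 & 100 \\ \text{Genuine} & 1 & 99 & 100 \\ \hline \text{Total} & 99 & 101 & 200 \end{array}\]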

Here we see that 98 out of 100 counterfeit notes are expected to be correctly classified, while 99 out of 100 genuine notes are expected to be correctly classified. Thus, the estimated misclassification probabilities are:

\(\hat{p}(\text{real | fake}) = 0.02 \) and \(\hat{p}(\text{fake | real}) = 0.01 \)

The question remains: Are these acceptable misclassification rates?

A decision should be made in advance about what levels of error are acceptable. Here again, you need to think about the consequences of making a mistake. Classifying a genuine note as counterfeit might put an innocent person in jail; making the opposite error might let a criminal get away. What are the costs of these types of errors, and are the above error rates acceptable? You should have some prior notion of what you would consider reasonable before looking at the results.

10.8 - Summary

In this lesson we learned about:

Complete the homework problems that will give you a chance to put what you have learned to use.