Discriminant analysis is a classification problem, where two or more groups or clusters or populations are known a priori and one or more new observations are classified into one of the known populations based on the measured characteristics. Let us look at three different examples.
We have two populations of bank notes, genuine and counterfeit. Six measures are taken on each note:
Take a bank note of unknown origin and determine, just from these six measurements, whether it is genuine or counterfeit. Perhaps this is not as impractical as it might sound. A more modern equivalent is a scanner that would measure the notes automatically and make a decision.
Pottery shards are sampled from four sites: L) Llanedyrn, C) Caldicot, I) Isle Thorns, and A) Ashley Rails, and the concentrations of the following chemical constituents were measured at a laboratory:
An archaeologist encounters a pottery specimen of unknown origin. To determine possible trade routes, the archaeologist may wish to classify its site of origin.
Data were collected on two species of insects in the genus Chaetocnema, (a) Ch. concinna and (b) Ch. heikertlingeri. Three variables were measured on each insect:
Our objective is to obtain a classification rule for identifying the insect species based on these three variables. An entomologist can identify these two closely related species, but the differences are so subtle that one has to have considerable experience to be able to tell the difference. If a classification rule can be developed, then this might be a more accurate way to help differentiate between these two species.
Upon completion of this lesson, you should be able to do the following:
Consider any two events A and B. To find P(B|A), the probability that B occurs given that A has occurred, Bayes' Rule states the following:
\[P(B|A) = \frac{P(A \text{ and } B)}{P(A)}\]
This says that the conditional probability is the probability that both A and B occur divided by the unconditional probability that A occurs. This is a simple algebraic restatement of the rule for finding the probability that two events occur together, which is P(A and B) = P(A)P(B|A).
We are interested in P(π_{i} | x), the conditional probability that an observation came from population π_{i} given the observed values of the multivariate vector of variables x. We will classify an observation to the population for which the value of P(π_{i} | x) is greatest. This is the most probable group given the observed values of x.
Technical Note: We have to be careful about the word probability in conjunction with our observed vector x. A probability density function for continuous variables does not give a probability, but instead gives a measure of “likelihood.”
Using the notation of Bayes' Rule above, event A = observing the vector x and event B = the observation came from population π_{i}. Thus our probability of interest can be found as
\[P(\text{member of } \pi_i \mid \text{we observed } \mathbf{x}) = \frac{P(\text{member of } \pi_i \text{ and we observe } \mathbf{x})}{P(\text{we observe } \mathbf{x})}\]
Thus the posterior probability that an observation is a member of population π_{i} is
\[p(\pi_i|\mathbf{x}) = \frac{p_i f(\mathbf{x}|\pi_i)}{\sum_{j=1}^{g}p_j f(\mathbf{x}|\pi_j)}\]
The classification rule is to assign observation x to the population for which the posterior probability is the greatest.
The denominator is the same for all posterior probabilities (for the various populations), so it is equivalent to say that we will classify an observation to the population for which \(p_i f(\mathbf{x}|\pi_i)\) is greatest.
With only two populations we can express a classification rule in terms of the ratio of the two posterior probabilities. Specifically we would classify to population 1 when
\[\frac{p_1 f(\mathbf{x}|\pi_1)}{p_2 f(\mathbf{x}|\pi_2)} > 1\]
This can be rewritten to say that we classify to population 1 when
\[\frac{f(\mathbf{x}|\pi_1)}{f(\mathbf{x}|\pi_2)} > \frac{p_2}{p_1}\]
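To make the two-population rule concrete, here is a minimal Python sketch. The densities used are made-up univariate normals, purely illustrative stand-ins for \(f(\mathbf{x}|\pi_i)\): we classify to population 1 exactly when the density ratio exceeds p2/p1.

```python
from math import exp, pi, sqrt

def normal_pdf(mu, sigma):
    """Return a univariate normal density (illustrative stand-in for f(x | pi_i))."""
    return lambda x: exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def classify_two_populations(f1, f2, p1, p2, x):
    """Classify x into population 1 when f1(x)/f2(x) > p2/p1, else population 2."""
    return 1 if f1(x) / f2(x) > p2 / p1 else 2

f1 = normal_pdf(0.0, 1.0)  # hypothetical density for population 1
f2 = normal_pdf(3.0, 1.0)  # hypothetical density for population 2
print(classify_two_populations(f1, f2, 0.5, 0.5, 0.8))  # -> 1 (x is nearer 0)
```

With unequal priors, the threshold p2/p1 shifts: a larger prior for population 2 makes it harder to classify into population 1.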
We are going to classify the sample unit or subject into the population π_{i} that maximizes the posterior probability p(π_{i} | x), that is, the population that maximizes
\(f(\mathbf{x}|\pi_i)p_i\)
We are going to calculate the posterior probabilities for each of the populations. Then we are going to assign the subject or sample unit to that population that has the highest posterior probability. Ideally that posterior probability is going to be greater than a half, the closer to 100% the better!
Equivalently, we are going to assign it to the population that maximizes the log of this product:
\(\log[f(\mathbf{x}|\pi_i)p_i]\)
The denominator that appears above does not depend on the population, since it involves summing over all the populations. So all we really need to do is assign the unit to the population with the largest value of the product \(f(\mathbf{x}|\pi_i)p_i\), or equivalently, the largest value of its log. Often it is easier to work with the log.
The following multi-step procedure is usually carried out in discriminant analysis:
Ground truth or training data are data with known group memberships. Here, we actually know to which population each subject belongs. For example, in the Swiss Bank Notes, we actually know which of these are genuine notes and which others are counterfeit examples.
The prior probability p_{i} represents the expected proportion of the community that belongs to population π_{i}. There are three common choices:
1) Equal priors: \[\hat{p}_i = \frac{1}{g}\] This would be used if we believe that all of the population sizes are equal.
2) Arbitrary priors selected according to the investigator's beliefs regarding the relative population sizes. Note that we require:
\(\hat{p}_1 + \hat{p}_2 + \dots + \hat{p}_g = 1\)
3) Estimated priors:
\[\hat{p}_i = \frac{n_i}{N}\]
where n_{i} is the number of observations from population π_{i} in the training data, and N = n_{1} + n_{2} + ... + n_{g}.
Case 1: Linear discriminant analysis is for homogeneous variance-covariance matrices:
\(\Sigma_1 = \Sigma_2 = \dots = \Sigma_g = \Sigma\)
In this case, the variance-covariance matrix does not depend on the population from which the data are obtained.
Case 2: Quadratic discriminant analysis is used for heterogeneous variance-covariance matrices:
\(\Sigma_i \ne \Sigma_j\) for some \(i \ne j\)
This allows the variance-covariance matrices to depend on the population we are looking at.
(We do not discuss testing whether the means of the populations are different. If they are not, there is no case for discriminant analysis.)
As in all statistical procedures, it is helpful to use diagnostic procedures to assess the efficacy of the discriminant analysis. We use cross-validation to assess the classification probability. Typically you are going to have some prior rule as to what is an acceptable misclassification rate. Those rules might involve things like, "what is the cost of misclassification?" This could come up in a medical study where you might be able to diagnose cancer. There are really two alternative costs: the cost of misclassifying someone as having cancer when they don't, which could cause a certain amount of emotional grief, and the cost of misclassifying someone as not having cancer when in fact they do have it. The latter cost is obviously greater if early diagnosis improves cure rates.
The procedure described above assumes that the unit or subject which is being classified actually belongs to one of the populations which has been considered. If you have a study where you are looking at two species of insects, A and B, and the insect being classified actually belongs to species C, then it will obviously be misclassified as to belonging to either A or B.
We assume that in population π_{i} the probability density function of x is multivariate normal with mean vector μ_{i} and variance-covariance matrix Σ (the same for all populations). As a formula, this is
\[f(\mathbf{x}|\pi_i) = \frac{1}{(2\pi)^{p/2}|\mathbf{\Sigma}|^{1/2}}\exp\left[-\frac{1}{2}\mathbf{(x-\mu_i)'\Sigma^{-1}(x-\mu_i)}\right]\]
We classify to the population for which \(p_i f(\mathbf{x}|\pi_i)\) is largest.
Because the log transform is monotonic, this is equivalent to classifying an observation to the population for which \(\log[p_i f(\mathbf{x}|\pi_i)]\) is largest.
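As a sketch of this rule in Python (using NumPy, with made-up parameters for two hypothetical populations, not any data from this lesson), we can evaluate the multivariate normal density above and classify by the largest \(\log[p_i f(\mathbf{x}|\pi_i)]\):

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """Multivariate normal density f(x | pi_i) from the formula above."""
    p = len(mu)
    diff = x - mu
    const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / const)

def log_posterior_score(x, mu, Sigma, prior):
    """log[p_i f(x | pi_i)] -- the quantity we maximize over populations."""
    return np.log(prior) + np.log(mvn_density(x, mu, Sigma))

# Two hypothetical populations sharing a common covariance matrix.
mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 3.0])
Sigma = np.eye(2)
x = np.array([0.5, 0.2])
scores = [log_posterior_score(x, m, Sigma, 0.5) for m in (mu1, mu2)]
print(1 + int(np.argmax(scores)))  # -> 1 (x is close to mu1)
```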
Linear discriminant analysis is used when the variance-covariance matrix does not depend on the population from which the data are obtained. In this case, our decision rule is based on the so-called Linear Score Function, which is a function of the population means μ_{i} for each of our g populations, as well as the pooled variance-covariance matrix.
The Linear Score Function is:
\[s^L_i(\mathbf{x}) = -\frac{1}{2}\mathbf{\mu'_i \Sigma^{-1}\mu_i} + \mathbf{\mu'_i \Sigma^{-1}x} + \log p_i = d_{i0}+\sum_{j=1}^{p}d_{ij}x_j + \log p_i\]
where
\[d_{i0} = -\frac{1}{2}\mathbf{\mu'_i\Sigma^{-1}\mu_i}\]
\(d_{ij} = j\text{th element of } \mathbf{\mu'_i\Sigma^{-1}}\)
The expression \(d_{i0}+\sum_{j=1}^{p}d_{ij}x_j\) resembles a linear regression with intercept term d_{i0} and regression coefficients d_{ij}.
Linear Discriminant Function:
\[d^L_i(\mathbf{x}) = -\frac{1}{2}\mathbf{\mu'_i\Sigma^{-1}\mu_i} + \mathbf{\mu'_i\Sigma^{-1}x} = d_{i0} + \sum_{j=1}^{p}d_{ij}x_j\]
\[d_{i0} = -\frac{1}{2}\mathbf{\mu'_i\Sigma^{-1}\mu_i}\]
Given a sample unit with measurements x_{1}, x_{2}, ... , x_{p}, we classify the sample unit into the population that has the largest Linear Score Function. This is equivalent to classifying to the population for which the posterior probability of membership is largest. The linear score function is computed for each population, then we assign the unit to the population with the largest score.
However, this is a function of unknown parameters, μ_{i} and Σ. So, these must be estimated from the data.
Discriminant analysis requires estimates of:
Prior probabilities:
\(p_i = \text{Pr}(\pi_i);\) \(i = 1, 2, \dots, g\)
The Population Means: these can be estimated by the sample mean vectors:
\(\mathbf{\mu_i} = E(\mathbf{X}|\pi_i)\); \(i = 1, 2, \dots, g\)
The Variance-covariance matrix: this is going to be estimated using the pooled variance-covariance matrix
\(\Sigma = \text{var}(\mathbf{X}|\pi_i)\); \(i = 1, 2, \dots, g\)
Typically, these parameters are estimated from training data, in which the population membership is known.
Conditional Density Function Parameters:
Population Means: μ_{i} can be estimated by substituting in the sample means \(\bar{\mathbf{x}}_i\).
Variance-Covariance matrix: Let S_{i} denote the sample variance-covariance matrix for population i. Then the variance-covariance matrix Σ can be estimated by substituting the pooled variance-covariance matrix
\[\mathbf{S}_p = \frac{\sum_{i=1}^{g}(n_i-1)\mathbf{S}_i}{\sum_{i=1}^{g}(n_i-1)}\]
into the Linear Score Function
to obtain the estimated linear score function:
\[\hat{s}^L_i(\mathbf{x}) = -\frac{1}{2}\mathbf{\bar{x}'_i S^{-1}_p \bar{x}_i} + \mathbf{\bar{x}'_i S^{-1}_p x} + \log{\hat{p}_i} = \hat{d}_{i0} + \sum_{j=1}^{p}\hat{d}_{ij}x_j + \log{\hat{p}_i}\]
where
\[\hat{d}_{i0} = -\frac{1}{2}\mathbf{\bar{x}'_i S^{-1}_p \bar{x}_i}\]
and
\(\hat{d}_{ij} = j\)th element of \(\mathbf{\bar{x}'_i S^{-1}_p}\)
This is a function of the sample mean vectors, the pooled variance-covariance matrix, and the prior probabilities for the g different populations. It is written in a form that looks like a linear regression formula: an intercept term plus a linear combination of the response variables, plus the natural log of the prior probability.
Decision Rule: Classify the sample unit into the population that has the largest estimated linear score function.
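The estimation steps above can be sketched in Python with NumPy. The training data here are simulated and purely illustrative (they are not the insect or bank note data); the functions implement the pooled covariance matrix and the estimated linear score function exactly as defined above.

```python
import numpy as np

def pooled_cov(samples):
    """Pooled variance-covariance matrix: sum_i (n_i - 1) S_i / sum_i (n_i - 1)."""
    num = sum((len(X) - 1) * np.cov(X, rowvar=False) for X in samples)
    den = sum(len(X) - 1 for X in samples)
    return num / den

def linear_scores(x, means, S_p, priors):
    """Estimated linear score for each population:
    -1/2 xbar_i' S_p^{-1} xbar_i + xbar_i' S_p^{-1} x + log p_i."""
    Sp_inv = np.linalg.inv(S_p)
    return [float(-0.5 * m @ Sp_inv @ m + m @ Sp_inv @ x + np.log(p))
            for m, p in zip(means, priors)]

# Simulated training data: two groups of 10 observations on 2 variables.
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(10, 2))
X2 = rng.normal([4.0, 4.0], 1.0, size=(10, 2))
means = [X1.mean(axis=0), X2.mean(axis=0)]
S_p = pooled_cov([X1, X2])
scores = linear_scores(np.array([3.8, 4.1]), means, S_p, [0.5, 0.5])
print(int(np.argmax(scores)))  # -> 1, i.e. classify into the second group
```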
Data were collected on two species of insects in the genus Chaetocnema, (species a) Ch. concinna and (species b) Ch. heikertlingeri. Three variables were measured on each insect:
We have ten individuals of each species to make up the training data. Data on these ten individuals of each species are used to estimate the model parameters which we will use in the linear score function.
Our objective is to obtain a classification rule for identifying the insect species from these three variables.
Let's begin...
Step 1: Collect the ground truth data or training data. (described above)
Step 2: Specify the prior probabilities. In this case we do not have any information regarding the relative abundances of the two species. With no information to help specify the prior probabilities, equal priors are selected by default:
\[\hat{p}_1 = \hat{p}_2 = \frac{1}{2}\]
Step 3: Test for homogeneity of the variance-covariance matrices using Bartlett's test.
Here we will use the SAS program insect.sas as shown below:
No significant difference between the variance-covariance matrices for the two species is found (L' = 9.83; d.f. = 6; p = 0.132). Thus linear discriminant analysis is appropriate for these data.
Step 4: Estimate the parameters of the conditional probability density functions, i.e., the population mean vectors and the population variance-covariance matrices involved. All of this is done automatically by the discriminant analysis procedure.
Step 5: The linear discriminant functions for the two species can be obtained directly from the SAS or Minitab output.
Now, consider an insect with the following measurements. Which species does this belong to?
Variable      Measurement
Joint 1       194
Joint 2       124
Aedeagus      49
These are the measurements of the three variables for this insect. The linear discriminant function for species (a) is obtained by plugging these three values into the estimated equation for species (a):
\(\hat{d}^{L}_a(\textbf{x}) = -247.276 - 1.417 \times 194 + 1.520 \times 124 + 10.954 \times 49 = 203.052\)
and then for species (b):
\(\hat{d}^{L}_b(\textbf{x}) = -193.178 - 0.738 \times 194 + 1.113 \times 124 + 8.250 \times 49 = 205.912\)
Then the linear score function is obtained by adding the log of one-half, here for species (a):
\(\hat{s}^L_a(\mathbf{x}) = \hat{d}^L_a(\mathbf{x}) + \log{\hat{p}_a} = 203.052 + \log{0.5} = 202.359\)
and then for species (b):
\(\hat{s}^L_b(\mathbf{x}) = \hat{d}^L_b(\mathbf{x}) + \log{\hat{p}_b} = 205.912 + \log{0.5} = 205.219\)
According to the classification rule, the insect is classified into the species that has the highest linear score function. Since \(\hat{s}^L_b(\mathbf{x}) > \hat{s}^L_a(\mathbf{x})\), we conclude that the insect belongs to species (b) Ch. heikertlingeri.
Of course, here the addition of the log of one-half makes no difference: whether we classify on the basis of \(\hat{d}^L(\mathbf{x})\) or on the basis of the score function, the decision remains the same. If the priors were not equal, this would not hold.
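The arithmetic above is easy to check with a few lines of Python, using the fitted coefficients quoted in the worked example:

```python
from math import log

# Estimated linear discriminant functions, evaluated at (194, 124, 49).
d_a = -247.276 - 1.417 * 194 + 1.520 * 124 + 10.954 * 49
d_b = -193.178 - 0.738 * 194 + 1.113 * 124 + 8.250 * 49
print(round(d_a, 3), round(d_b, 3))  # 203.052 205.912

# Linear score functions: add the log prior (equal priors of 1/2).
s_a = d_a + log(0.5)
s_b = d_b + log(0.5)
print(round(s_a, 3), round(s_b, 3))  # 202.359 205.219 -> species (b) wins
```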
You can think of these priors as a 'penalty' in some sense. If a species has a high prior probability, it receives very little penalty, because you are taking the log of a number close to one, which does not subtract much. But if a species has a low prior probability, you are taking the log of a very small number, which results in a large reduction.
Note: SAS by default will assume equal priors. Later on we will look at an example where we will not assume equal priors: the Swiss Bank Notes example.
You can also calculate the posterior probabilities. These are used to measure uncertainty regarding the classification of a unit from an unknown group. They will give us some indication of our confidence in our classification of individual subjects.
In this case, the estimated posterior probability that the insect belongs to species (a) Ch. concinna given the observed measurements can be obtained by using this formula:
\[\begin{array}{ccl} p(\pi_a|\mathbf{x}) & = & \frac{\exp\{\hat{s}^L_a(\mathbf{x})\}}{\exp\{\hat{s}^L_a(\mathbf{x})\}+\exp\{\hat{s}^L_b(\mathbf{x})\}} \\ & = & \frac{\exp\{202.359\}}{\exp\{202.359\}+\exp\{205.219\}} \\ & = & 0.05\end{array}\]
This is a function of our linear score functions for our two species. Here we are looking at the exponential function of the linear score function for species (a) divided by the sum of the exponential functions of the score functions for species (a) and species (b). Using the numbers that we obtained earlier we can carry out the math and get 0.05.
Similarly for species (b), the estimated posterior probability that the insect belongs to Ch. heikertlingeri is:
\[\begin{array}{ccl} p(\pi_b|\mathbf{x}) & = & \frac{\exp\{\hat{s}^L_b(\mathbf{x})\}}{\exp\{\hat{s}^L_a(\mathbf{x})\}+\exp\{\hat{s}^L_b(\mathbf{x})\}} \\ & = & \frac{\exp\{205.219\}}{\exp\{202.359\}+\exp\{205.219\}} \\ & = & 0.95\end{array}\]
In this case we are 95% confident that the insect belongs to species (b). This is a pretty high level of confidence but there is a 5% chance that we might be in error in this classification. One of the things that you would have to decide is what is an acceptable error rate here. For classification of insects this might be perfectly acceptable, however, in some situations it might not be. For example, looking at the cancer case that we talked about earlier where we were trying to classify someone as having cancer or not having cancer, it may not be acceptable to have 5% error rate. This is an ethical decision that has to be made. It is a decision that has nothing to do with statistics but must be tailored to the situation at hand.
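The two posterior probabilities can be computed from the scores in a numerically safe way by subtracting the largest score before exponentiating (the standard log-sum-exp trick; the scores are the ones from the worked example):

```python
from math import exp

def posteriors(scores):
    """Posterior probabilities from score functions; subtracting the max
    score first keeps exp() from overflowing for large scores."""
    m = max(scores)
    weights = [exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

post = posteriors([202.359, 205.219])  # scores for species (a) and (b)
print([round(p, 2) for p in post])  # [0.05, 0.95]
```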
When an unknown specimen is classified according to any decision rule, there is always a possibility that it is wrongly classified. This should not be taken as an error in the procedure; it is part of the inherent uncertainty in any statistical method. One way to measure how good the discriminant rule is, is to classify the training data according to the developed rule. Since we know which unit comes from which population in the training data, this gives us some idea of the validity of the discrimination procedure.
Method 1. The confusion table describes how the discriminant function will classify each observation in the data set. In general, the confusion table takes the form:
Rows 1 through g are the g populations to which the items truly belong. Across the columns we see how they are classified. In general, n_{ij} = the number of observations belonging to population i that are classified into population j, so n_{11} is the number correctly classified into population 1, while n_{12} is the number from population 1 incorrectly classified into population 2. Ideally this matrix would be diagonal; in practice we hope the off-diagonal elements are very small numbers.
The row totals give the number of individuals belonging to each of our populations or species in the training dataset. The column totals give the number classified into each of the species. The total number of observations in the dataset is \(n_{..}\). The dot notation indicates summation: in the row totals we sum over the second subscript, whereas in the column totals we sum over the first subscript.
We will let:
\(p(i|j)\)
denote the probability that a unit from population π_{j} is classified into population π_{i}. These misclassification probabilities can be estimated by taking the number of insects from population j that are misclassified into population i divided by the total number of insects in the sample from population j as shown here:
\[\hat{p}(i|j) = \frac{n_{ji}}{n_{j.}}\]
This will give the misclassification probabilities.
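A short Python helper makes the definition concrete; the confusion counts here are hypothetical, not the insect data:

```python
def misclassification_probs(confusion):
    """confusion[j][i] = number of units from population j classified into i.
    Returns estimates p_hat(i | j) = n_ji / n_j. as a dict keyed by (i, j)."""
    probs = {}
    for j, row in enumerate(confusion):
        n_j = sum(row)  # row total n_j.
        for i, count in enumerate(row):
            probs[(i, j)] = count / n_j
    return probs

# Hypothetical 2x2 confusion table: rows = truth, columns = classified as.
confusion = [[9, 1],
             [2, 8]]
probs = misclassification_probs(confusion)
print(probs[(1, 0)], probs[(0, 1)])  # 0.1 0.2
```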
Example  Insect Data:
From the SAS output, we obtain the following confusion table.
              Classified As
Truth        a      b    Total
a           10      0      10
b            0     10      10
Total       10     10      20
Here, no insect was misclassified. So, the misclassification probabilities are all estimated to be equal to zero.
Method 2: Set Aside Method
Step 1: Randomly partition the observations into two "halves".
Step 2: Use one "half" to obtain the discriminant function.
Step 3: Use the discriminant function from Step 2 to classify all members of the second "half" of the data, from which the proportion of misclassified observations can be computed.
Advantage: This method yields unbiased estimates of the misclassification probabilities.
Problem: It does not make optimum use of the data, so the estimated misclassification probabilities are not as precise as possible.
Method 3: Cross validation
Step 1: Delete one observation from the data.
Step 2: Use the remaining observations to compute a discriminant function.
Step 3: Use the discriminant function from Step 2 to classify the observation removed in Step 1. Steps 1-3 are repeated for all observations; compute the proportion of observations that are misclassified.
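The three steps can be sketched as a leave-one-out loop in Python. The classifier plugged in here is a simple nearest-group-mean rule on simulated data, a stand-in for a discriminant function; any rule that can be refit on the fly works:

```python
import numpy as np

def loo_misclassification(X, y, classify):
    """Leave-one-out cross-validation: refit without observation k, then
    classify observation k; returns the estimated misclassification rate.
    `classify(X_train, y_train, x_new)` is any classification rule."""
    errors = 0
    for k in range(len(X)):
        mask = np.arange(len(X)) != k
        if classify(X[mask], y[mask], X[k]) != y[k]:
            errors += 1
    return errors / len(X)

def nearest_mean(X_train, y_train, x_new):
    """Illustrative rule: assign to the group with the closest mean."""
    labels = np.unique(y_train)
    dists = [np.linalg.norm(x_new - X_train[y_train == g].mean(axis=0))
             for g in labels]
    return labels[int(np.argmin(dists))]

# Simulated training data: two well-separated groups of 10.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (10, 2)), rng.normal(5.0, 1.0, (10, 2))])
y = np.array([0] * 10 + [1] * 10)
print(loo_misclassification(X, y, nearest_mean))  # expect a low error rate
```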
Example: Insect Data
The confusion table for the cross validation is
              Classified As
Truth        a      b    Total
a           10      0      10
b            2      8      10
Total       12      8      20
Here, the estimated misclassification probabilities are:
\[\hat{p}(b|a) = \frac{0}{10} = 0.0\]
for insects belonging to species (a), and
\[\hat{p}(a|b) = \frac{2}{10} = 0.2\]
for insects belonging to species (b).
Specifying Unequal Priors
Suppose that we have information (from prior experience or from another study) that suggests that 90% of the insects belong to Ch. concinna. Then the score functions for the unidentified specimen are
\(\hat{s}^L_a(\mathbf{x}) = \hat{d}^L_a(\mathbf{x}) + \log{\hat{p}_a} = 203.052 + \log{0.9} = 202.946\)
and
\(\hat{s}^L_b(\mathbf{x}) = \hat{d}^L_b(\mathbf{x}) + \log{\hat{p}_b} = 205.912 + \log{0.1} = 203.609\)
In this case, we would still classify this specimen into Ch. heikertlingeri with posterior probabilities
\(p(\pi_a|\mathbf{x}) = 0.36\) and \(p(\pi_b|\mathbf{x}) = 0.64\)
These priors can be specified in SAS by adding the "priors" statement: priors "a" = 0.9 "b" = 0.1; following the var statement. However, it should be noted that when the "priors" statement is added, SAS will include log p_{i} as part of the constant term. In other words, in this case, SAS outputs the estimated linear score function, not the estimated linear discriminant function.
Linear discriminant analysis is for homogeneous variance-covariance matrices. However, data do not always come from such a simplified situation. Quadratic discriminant analysis is used for heterogeneous variance-covariance matrices:
\(\Sigma_i \ne \Sigma_j\) for some \(i \ne j\)
Again, this allows the variance-covariance matrices to depend on the population we are looking at.
Quadratic discriminant analysis calculates a Quadratic Score Function which looks like this:
\[s^Q_i (\mathbf{x}) = -\frac{1}{2}\log{|\mathbf{\Sigma_i}|}-\frac{1}{2}\mathbf{(x-\mu_i)'\Sigma^{-1}_i(x - \mu_i)}+\log{p_i}\]
This is a function of the population mean vector and the variance-covariance matrix for the ith group. A separate quadratic score function is determined for each of the groups.
This is of course a function of the unknown population mean vector for group i and the variance-covariance matrix for group i, which will have to be estimated from the ground truth data. As before, we replace the unknown values of μ_{i}, Σ_{i}, and p_{i} by their estimates to obtain the estimated quadratic score function shown below. All logarithms in this function are natural logs.
Decision Rule: Our decision rule remains the same as well. We will classify the sample unit or subject into the population that has the largest quadratic score function.
\[\hat{s}^Q_i (\mathbf{x}) = -\frac{1}{2}\log{|\mathbf{S_i}|}-\frac{1}{2}\mathbf{(x-\bar{x}_i)'S^{-1}_i(x -\bar{x}_i)}+\log{\hat{p}_i}\]
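A sketch of the estimated quadratic score in Python follows (NumPy; the group means and covariance matrices are made up for illustration — note the two groups have different covariances, which is exactly the situation QDA handles):

```python
import numpy as np

def quadratic_score(x, xbar, S, prior):
    """Estimated quadratic score:
    -1/2 log|S_i| - 1/2 (x - xbar_i)' S_i^{-1} (x - xbar_i) + log p_i."""
    diff = x - xbar
    sign, logdet = np.linalg.slogdet(S)  # stable log-determinant
    return float(-0.5 * logdet - 0.5 * diff @ np.linalg.solve(S, diff)
                 + np.log(prior))

# Illustrative: two groups with different covariance matrices.
xbar1, S1 = np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]])
xbar2, S2 = np.array([2.0, 2.0]), np.array([[4.0, 0.0], [0.0, 4.0]])
x = np.array([1.9, 2.2])
s1 = quadratic_score(x, xbar1, S1, 0.5)
s2 = quadratic_score(x, xbar2, S2, 0.5)
print(1 if s1 > s2 else 2)  # -> 2 (classify to the larger score)
```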
Let's illustrate this using the Swiss Bank Notes example...
Recall that we have two populations of notes, genuine and counterfeit, and that six measurements were taken on each note:
In this case it would not be reasonable to consider equal priors for the two types of bank notes. Equal priors would assume that half the bank notes in circulation are counterfeit and half are genuine. This is a very high counterfeit rate, and if things were that bad the Swiss government would probably be bankrupt! So we need to consider unequal priors, in which the vast majority of bank notes are thought to be genuine. For this example, let us assume that no more than 1% of the bank notes in circulation are counterfeit and 99% are genuine. The prior probabilities can then be expressed as:
\(\hat{p}_1 = 0.99\) and \(\hat{p}_2 = 0.01\)
The first step in the analysis is to carry out Bartlett's test to check for homogeneity of the variance-covariance matrices.
To do this we will use the SAS program swiss9.sas, shown below:
By default, SAS will make this decision for you. Let's look at the proc discrim procedure in the SAS program swiss9.sas that we just used.
By including pool=test, as above, SAS will decide what kind of discriminant analysis to carry out based on the results of this test.
If you fail to reject, SAS will automatically do a linear discriminant analysis. If you reject, then SAS will do a quadratic discriminant analysis.
There are two other options here. If we specify pool=yes, SAS will not carry out Bartlett's test but will go ahead and do a linear discriminant analysis, whether it is warranted or not: it will pool the variance-covariance matrices and perform a linear discriminant analysis.
If pool=no, SAS will not pool the variance-covariance matrices and will perform a quadratic discriminant analysis.
SAS does not actually print out the quadratic discriminant function, but it will use quadratic discriminant analysis to classify sample units into populations.
Bartlett's test finds a significant difference between the variance-covariance matrices of the genuine and counterfeit bank notes (L' = 121.90; d.f. = 21; p < 0.0001): the variance-covariance matrix for the genuine notes is not equal to that for the counterfeit notes. Since we reject the null hypothesis of equal variance-covariance matrices, a linear discriminant analysis is not appropriate for these data, and a quadratic discriminant analysis is necessary.
Let us consider a bank note with the following measurements that were entered into program:
Variable         Measurement
Length           214.9
Left Width       130.1
Right Width      129.9
Bottom Margin    9.0
Top Margin       10.6
Diagonal         140.5
Any number of lines of measurements may be considered; here we are just interested in this one set of measurements. The output reports that this bank note should be classified as genuine. The posterior probability that it is counterfeit is only 0.000002526, so the posterior probability that it is genuine is very close to one (1 - 0.000002526 = 0.999997474). We are nearly 100% confident that this is a real note and not a counterfeit.
Next consider the results of crossvalidation. Note that crossvalidation yields estimates of the probability that a randomly selected note will be correctly classified. The resulting confusion table is as follows:
              Classified As
Truth          Counterfeit   Genuine   Total
Counterfeit         98           2      100
Genuine              1          99      100
Total               99         101      200
Here, we can see that 98 out of 100 counterfeit notes are expected to be correctly classified, while 99 out of 100 genuine notes are expected to be correctly classified. Thus, the misclassification probabilities are estimated to be:
\(\hat{p}(\text{real} \mid \text{fake}) = 0.02 \) and \(\hat{p}(\text{fake} \mid \text{real}) = 0.01 \)
The question remains: Are these acceptable misclassification rates?
A decision should be made in advance as to what would be the acceptable levels of error. Here again, you need to think about the consequences of making a mistake. In terms of classifying a genuine note as a counterfeit, one might put somebody in jail who is innocent. If you make the opposite error you might let a criminal get away. What are the costs of these types of errors? And, are the above error rates acceptable? This decision should be made in advance. You should have some prior notion of what you would consider reasonable.
In this lesson we learned about:
Complete the homework problems that will give you a chance to put what you have learned to use.