During a lengthy clinical trial, it can be desirable to monitor treatment effects as well as to track safety issues. "Interim analysis" or "early stopping" procedures are used to interpret the accumulating information during a clinical trial. There may be a variety of practical reasons for terminating a clinical trial at an early stage, some of them overlapping (Piantadosi, 2005).
This lesson will examine different methods and guidelines that can be used to help decide whether or not to terminate a clinical trial in progress.
Upon completion of this lesson, you should be able to do the following:
DeMets DL, Lan KK. 1994. Interim analysis: the alpha spending function approach. Statistics in Medicine 13: 1341-1352.
Ellenberg SS, Fleming TR, DeMets DL. 2002. Data Monitoring Committees in Clinical Trials. New York, NY: Wiley.
Piantadosi S. 2005. Treatment effects monitoring. In: Piantadosi S. Clinical Trials: A Methodologic Perspective. 2nd ed. Hoboken, NJ: John Wiley and Sons, Inc.
Pocock SJ. 1983. Clinical Trials: A Practical Approach. Chichester: John Wiley and Sons.
Data-dependent stopping is a general term describing any statistical or administrative reason for stopping a trial. Consideration of the reasons given earlier may lead you to stop the trial at an early stage, or at least to change the protocol.
The review, interpretation, and decision-making aspects of clinical trials based on interim data are necessary but prone to error. If the investigators learn of interim results, their objectivity during the remainder of the trial could be affected; and if statistical tests are performed repeatedly on the accumulating data, the Type I error rate is inflated.
There is a natural conflict. On one hand, terminating the trial as early as possible will save costs and labor, expose as few patients as possible to inferior treatments, and allow disseminating information about the treatments quickly. On the other hand, there are pressures to continue the trial for as long as possible in order to increase precision, reduce errors of inference, obtain sufficient statistical power to account for prognostic factors and examine subgroups of interest, and gather information on secondary endpoints.
All of the available statistical methods for interim analyses have some similar characteristics. They
No statistical method is a substitute for judgment. The statistical criteria provide guidelines for terminating the trial; the decision to stop a trial is not based solely on statistical information collected on one endpoint.
It may be possible to assess treatment effects after each patient is accrued, treated, and evaluated. Such an approach is impractical in most circumstances, however, especially for trials that require lengthy follow-up to determine outcomes.
The first classical likelihood method proposed for this situation is the sequential probability ratio test (SPRT), which is based on the likelihood function. (This method is rarely implemented because it is impractical in the clinical setting, but it is important for historical reasons.) Let's review this method in general terms here.
A likelihood function is constructed from a probability model for a sequence of random variables which correspond to the outcome measurements on the experimental units. In the likelihood function, however, the observed data points replace the random variables. Suppose we have a binary response (success/failure) from each patient which is determined immediately after a treatment is administered. (Again, not very practical.) However, for the situation discussed, we are examining one treatment which is administered to every patient. If there are N patients with K successes, and p represents the probability of success within each patient, then the likelihood function is based on the binomial probability function:
\[L(p, K)=p^K(1-p)^{N-K}\]
This is a very simple likelihood function for a very simple example.
If the investigator is trying to decide whether p_{0} or p_{1} is the more appropriate value of p, then the likelihood ratio can be constructed to assess the evidence:
\[R=\frac{L(p_0, K)}{L(p_1, K)}=\left(\frac{p_0}{p_1} \right)^K \left(\frac{1-p_0}{1-p_1} \right)^{N-K} \]
This is a ratio of two different likelihood functions. If R is large, then the evidence is going to favor p_{0}. If R is small, then the evidence is going to favor p_{1}. Therefore, when analyzing interim data, we can calculate the likelihood ratio and stop the trial only if we have the amount of evidence that is expected for the target sample size.
Suppose that N is the target sample size and that after n patients there are k successes. After each patient is treated, we analyze the accumulated data to determine whether or not to continue the trial. Under this scenario, we stop the trial if:
\[R=\frac{L(p_0, k)}{L(p_1, k)}=\left(\frac{p_0}{p_1} \right)^k \left(\frac{1-p_0}{1-p_1} \right)^{n-k} \le R_L \text{ or }\ge R_U \]
where R_{L} and R_{U} are prespecified constants. Let's not worry about the details of the statistical calculation here. The values of R_{L} and R_{U} that correspond to testing H_{0}: p = p_{0} versus H_{1}: p = p_{1} are R_{L} = α/(1 − β) and R_{U} = (1 − β)/α.
A sample schematic of the SPRT in practice is shown below. Here you would calculate R after the treatment of each patient. As patients accumulate, you can see that R moves around as the trial proceeds. Before all of the planned patients had been accrued, R hit the upper boundary, so the remaining patients would not be recruited.
Here is another example...
The SPRT might be useful in a phase II SE trial in which a treatment is monitored closely to determine if it reaches a certain level of success or failure. For example, suppose the investigator considers the treatment successful if p = 0.4 (40% or greater), but considers it a failure if p = 0.2 (20% or less). Thus, the hypothesis testing problem is H_{0}: p = 0.2 vs. H_{1}: p = 0.4. Suppose we take α = 0.05 and β = 0.05. Then the bounds are calculated as R_{L} = 1/19 and R_{U} = 19. We would reject H_{0} in favor of H_{1}, and claim success, as soon as R gets small enough: R = (0.5)^{k}(1.33)^{n−k} ≤ 1/19. On the other hand, we would stop the trial, accept H_{0}, and claim failure as soon as R ≥ 19.
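As a rough sketch of how this monitoring could be coded, the following Python computes the likelihood ratio after each patient for the example above. The outcome sequence is hypothetical, invented purely for illustration:

```python
# SPRT monitoring sketch for H0: p = 0.2 vs H1: p = 0.4, alpha = beta = 0.05.
# The boundary formulas follow the text; the outcome data are hypothetical.
p0, p1 = 0.2, 0.4
alpha = beta = 0.05

R_L = alpha / (1 - beta)   # 1/19: stop and claim success (evidence favors H1)
R_U = (1 - beta) / alpha   # 19:   stop and claim failure (evidence favors H0)

def likelihood_ratio(k, n):
    """R = L(p0, k) / L(p1, k) after n patients with k successes."""
    return (p0 / p1) ** k * ((1 - p0) / (1 - p1)) ** (n - k)

outcomes = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]  # 1 = success (hypothetical)
k = 0
for n, y in enumerate(outcomes, start=1):
    k += y
    R = likelihood_ratio(k, n)
    if R <= R_L:
        print(f"stop after {n} patients: claim success (R = {R:.4f})")
        break
    if R >= R_U:
        print(f"stop after {n} patients: claim failure (R = {R:.4f})")
        break
else:
    print("no boundary crossed; continue accrual")
```

With this particular sequence, the lower boundary is crossed after the eighth patient (6 successes in 8), so accrual would stop early with a claim of success.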
The statistical formulation for the SPRT is relatively straightforward, but it is more commonly used in a quality control setting than in clinical trials. The obvious criticism is that each patient’s outcome must be observed quickly before you recruit the next patient. The SPRT also has the statistical property that it has a positive probability of never reaching the boundaries R_{L} and R_{U}. If this is the case after the target sample size, N, is reached, then the trial is inconclusive.
First, let's review the Bayesian approach in general and then apply it to our current topic of likelihood methods.
The Bayesian approach to statistical design and inference is very different from the classical approach (the frequentist approach).
Before a trial begins, a Bayesian statistician summarizes the current knowledge or belief about the treatment effect, say we call it θ, in the form of a probability distribution. This is known as the prior distribution for θ. These assumptions are made prior to conducting the study and collecting any data.
Next, the data from the trial are observed, say we call it X, and the likelihood function of X given θ is constructed. Finally, the posterior distribution for θ given X is constructed. In essence, the prior distribution for θ is revised into the posterior distribution based on the data X. The data collection in the study informs or revises the earlier assumptions.
The following schematic describes this Bayesian approach:
The development of the posterior distribution may be very difficult mathematically and it may be necessary to approximate it through computer algorithms.
The Bayesian statistician performs all inference for the treatment effect by formulating probability statements based on the posterior distribution. This is a very different approach and is not always accepted by the more traditional frequentist oriented statisticians.
In the Bayesian approach, θ is regarded as a random variable, about which probability statements can be made. This is the appealing aspect of the Bayesian approach. In contrast, the frequentist approach regards θ as a fixed but unknown quantity (called a parameter) that can be estimated from the data.
As an example of the contrasting philosophies, consider the frequentist description and the Bayesian description of a 95% confidence interval for θ.
Frequentist: "If a very large number of samples, each with the same sample size as the original sample, were taken from the same population as the original sample, and a 95% confidence interval constructed for each sample, then 95% of those confidence intervals would contain the true value of θ." This is an extremely awkward and dissatisfying definition but technically represents the frequentist's approach.
Bayesian: "The 95% confidence interval defines a region that covers 95% of the possible values of θ." This is much more simple and straightforward. (As a matter of fact, most people when they first take a statistics course believe that this is the definition of a confidence interval.)
In a Bayesian analysis, if θ is a parameter of interest, the analysis results in a probability distribution for θ. Using the probability distribution, many statements can be made. For example, if θ represents a probability of success for a treatment, a statement can be made about the probability that θ > 0.90 (or any other value).
With respect to clinical trials, a Bayesian approach can cause some difficulties for investigators because they are not accustomed to representing their prior beliefs about a treatment effect in the form of a probability distribution. In addition, there may be very little prior knowledge about a new experimental therapy, so investigators may be reluctant to or not be able to quantify their prior beliefs. In the business world, the Bayesian approach is used quite often because of the availability of prior information. In the medical field, more often than not, this is not the case.
The choice of a prior distribution can be very controversial. Different investigators may select different priors for the same situation, which could lead to different conclusions about the trial. This is especially true when the data, X, are based on a small sample size, because in such situations the prior distribution is modified only slightly to form the posterior distribution. The posterior then lies very close to the prior, so the results are based almost entirely on the prior assumptions.
When there is little prior information on which to base assumptions, Bayesians employ a reference (or vague, or non-informative) prior. These are intended to represent a minimal amount of prior information. Although vague priors may yield results similar to those of a frequentist approach, they may be unrealistic because they attempt to assign equal weight to all values of θ. Below you can see a very flat distribution, spread out over a wide range of values.
Similarly, skeptical prior distributions quantify the belief that large treatment effects are unlikely, whereas enthusiastic prior distributions quantify the belief that large treatment effects are likely. Let's not worry about the calculations, but focus instead on the concepts here.
An example of a Bayesian approach to interim monitoring is as follows. Suppose an investigator plans a trial to detect a hazard ratio of 2 (Λ = 2) with 90% statistical power (β = 0.10), using a sample size of at least 90 events. The investigator plans one interim analysis, approximately halfway through the trial, and a final analysis. (This is the more standard schedule, as opposed to the SPRT, where R was calculated after each patient.)
The estimated logarithm of the hazard ratio is approximately normally distributed with variance (1/d_{1}) + (1/d_{2}), where d_{1} and d_{2} are the numbers of events in the two treatment groups. The null hypothesis is that the treatment groups are the same, i.e., H_{0}: Λ = 1. Note that the log_{e} hazard ratio is 0 under the null hypothesis and the log_{e} hazard ratio is 0.693 when Λ = 2, the proposed effect size.
Suppose the investigator has access to some pilot data or the published report of another investigator, in which there appeared to be a very small treatment effect with 16 events occurring within each of the two treatment groups. The investigator decides that this preliminary study will form the basis of a skeptical prior distribution for the log_{e} hazard ratio with a mean of 0 and a standard deviation of 0.35 = {(1/16) + (1/16)}^{1/2}. This is called a skeptical prior because it expresses skepticism that the treatment is beneficial.
Next, suppose that at the time of the interim analysis (45 events have occurred), there are 31 events in one group and 14 events in the other, such that the estimated hazard ratio is 2.25 (calculations not shown). These values are incorporated into the likelihood function, which modifies the prior distribution to yield a posterior distribution for the log_{e} hazard ratio with mean 0.474 and standard deviation 0.228 (calculations not shown). Therefore, we can calculate the probability that Λ > 2. From the posterior distribution we construct the following probability statement:
\[Pr[\Lambda \ge 2]=1-\Phi \left(\frac{\log_e(2)-0.474}{0.228} \right)=1-\Phi(0.961)=0.168\]
where Φ represents the cumulative distribution function for the standard normal and Λ is the true hazard ratio.
Conclusion: Based on the results from the interim analysis with a skeptical prior, there is not strong evidence that the treatment is effective, because the posterior probability of the hazard ratio exceeding 2 is relatively small. Therefore, there is not enough evidence here to suggest that the study be stopped. How large a posterior probability would be large enough? A reasonable value should be specified in your protocol before these values are determined.
In contrast, suppose that before the onset of the trial the investigator is very excited about the potential benefit of the treatment. Therefore, the investigator wants to use an enthusiastic prior for the log_{e} hazard ratio, i.e., a normal distribution with mean = log_{e}(2) = 0.693 and standard deviation = 0.35 (same as the skeptical prior).
Suppose the interim data are the same as those described above. This time, the posterior distribution for the log_{e} hazard ratio is normal with mean 0.762 and standard deviation 0.228. Then the posterior probability is:
\[Pr[\Lambda \ge 2]=1-\Phi \left(\frac{\log_e(2)-0.762}{0.228} \right)=1-\Phi(-0.302)=0.618\]
This is a drastic change in the probability based on the assumptions that were made ahead of time. In this case, the investigator still may not consider this to be strong evidence that the trial should terminate because the posterior probability of the hazard ratio exceeding 2 does not exceed 0.90.
Nevertheless, the example demonstrates the controversy that can arise with a Bayesian analysis when the amount of experimental data is small, i.e., the selection of the prior distribution drives the decision-making process. For this reason, many investigators prefer to use non-informative priors. Using Bayesian methods, you can make direct probability statements about the quantities of interest.
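To make the prior-sensitivity point concrete, here is a minimal Python sketch of the normal-normal conjugate update used in the example above. The prior and interim-data values come from the text; because the text's exact intermediate calculations are not shown, the posterior quantities computed here reproduce the reported figures only approximately:

```python
# Normal-normal conjugate update sketch for the interim Bayesian example.
# Values (events, hazard ratio, prior sd) are from the text; the results
# approximate, but do not exactly match, the figures reported there.
from math import log, sqrt
from statistics import NormalDist

def posterior(prior_mean, prior_sd, data_mean, data_sd):
    """Conjugate update for a normal mean with known variances."""
    w0, w1 = 1 / prior_sd**2, 1 / data_sd**2
    post_var = 1 / (w0 + w1)
    post_mean = post_var * (w0 * prior_mean + w1 * data_mean)
    return post_mean, sqrt(post_var)

# interim data: 31 vs 14 events, estimated hazard ratio 2.25
data_mean = log(2.25)
data_sd = sqrt(1 / 31 + 1 / 14)

# skeptical prior: mean 0, sd 0.35
m_s, s_s = posterior(0.0, 0.35, data_mean, data_sd)
pr_s = 1 - NormalDist(m_s, s_s).cdf(log(2))   # Pr[hazard ratio >= 2]

# enthusiastic prior: mean log(2), same sd
m_e, s_e = posterior(log(2), 0.35, data_mean, data_sd)
pr_e = 1 - NormalDist(m_e, s_e).cdf(log(2))

print(round(pr_s, 3), round(pr_e, 3))
```

The same interim data yield a much larger posterior probability under the enthusiastic prior than under the skeptical one, which is exactly the sensitivity the text warns about.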
From a frequentist point of view, repeated hypothesis testing of accumulating data increases the type I error rate of a clinical trial. Therefore, the frequentist approach to interim monitoring of clinical trials focuses on controlling the type I error rate.
In most clinical trials, it is not necessary to perform a statistical analysis after each patient is accrued. In fact, for most multicenter clinical trials, interim statistical analyses are conducted only once or twice per year. Usually this frequency of interim analyses detects treatment effects nearly as early as continuous monitoring. Group sequential analysis refers to this situation, in which only a few scheduled analyses are conducted. Again, let's focus more on the concepts than on the statistical details.
Suppose that the group sequential approach consists of R analyses, and we let Z_{1}, ... , Z_{R} denote the test statistic at the R times of hypothesis testing. So, we are accumulating data over time. We are adding to the dataset and analyzing the current set that has been collected. Also, we let B_{1}, ... , B_{R} denote the corresponding boundary points (critical values). At the rth interim analysis, the clinical trial is terminated with rejection of the null hypothesis if:
\[ Z_r \ge B_r, r = 1, 2, ... , R\]
The boundary points are chosen such that the overall significance level does not exceed the desired α. There are primarily three schemes for selecting the boundary points which have been proposed. These are illustrated in the following table for an overall significance level of α = 0.05 and for R = 2,3,4,5. The table is constructed under the assumption that n patients are accrued at each of the R statistical analyses so that the total sample size is N = nR.
| R | Interim Analysis Number | O'Brien-Fleming (B, α) | Haybittle-Peto* (B, α) | Pocock (B, α) |
|---|---|---|---|---|
| 2 | 1 | 2.782, 0.0054 | 3.0, 0.002 | 2.178, 0.0294 |
| 2 | 2 | 1.967, 0.0492 | 1.960, 0.0500 | 2.178, 0.0294 |
| 3 | 1 | 3.438, 0.0006 | 3.291, 0.0010 | 2.289, 0.0221 |
| 3 | 2 | 2.431, 0.0151 | 3.291, 0.0010 | 2.289, 0.0221 |
| 3 | 3 | 1.985, 0.0471 | 1.960, 0.0500 | 2.289, 0.0221 |
| 4 | 1 | 4.084, 0.00005 | 3.291, 0.00100 | 2.361, 0.0182 |
| 4 | 2 | 2.888, 0.0039 | 3.291, 0.00100 | 2.361, 0.0182 |
| 4 | 3 | 2.358, 0.0184 | 3.291, 0.00100 | 2.361, 0.0182 |
| 4 | 4 | 2.042, 0.0412 | 1.960, 0.0500 | 2.361, 0.0182 |
| 5 | 1 | 4.555, 0.000005 | 3.291, 0.00100 | 2.413, 0.0158 |
| 5 | 2 | 3.221, 0.0013 | 3.291, 0.00100 | 2.413, 0.0158 |
| 5 | 3 | 2.630, 0.0085 | 3.291, 0.00100 | 2.413, 0.0158 |
| 5 | 4 | 2.277, 0.0228 | 3.291, 0.00100 | 2.413, 0.0158 |
| 5 | 5 | 2.037, 0.0417 | 1.960, 0.0500 | 2.413, 0.0158 |
For example, if we plan one interim analysis and a final analysis, we select the rows in this table with R = 2. Using these first two rows, we find the critical values for the interim analysis and for the final analysis. With the O'Brien-Fleming approach, the interim analysis is conducted with bound 2.782 and the final analysis with bound 1.967. Had the choice been the Haybittle-Peto approach, the first test would be conducted with bound 3.0 and the final analysis with bound 1.96.
In another situation, with three interim analyses and a final analysis, R = 4. View the corresponding four rows in the middle of the table to determine the critical values for each interim analysis and the final analysis. Notice that the different approaches 'spend', or distribute, the overall significance level differently across the interim and final analyses.
The Pocock approach uses the same significance level at each of the R interim analyses. Of the three procedures described in the table, it provides the best chance of early trial termination. Many investigators dislike the Pocock approach, however, because of its properties at the final stage of analysis. For example, suppose R = 3 analyses are planned and that statistical significance is not attained at any of them. Suppose that the p-value at the final analysis is 0.0350 (this is > 0.0221, the value found in the table for the Pocock approach). If interim analyses had not been scheduled, however, this p-value would have been considered statistically significant (p = 0.0350 < 0.0500).
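A quick Monte Carlo check (a sketch, not part of the original lesson) illustrates why the per-look Pocock boundary for R = 2 is 2.178 rather than 1.96: because the second analysis reuses the first half of the data, the two Z statistics are correlated, and applying 2.178 at both looks keeps the overall two-sided Type I error near 0.05:

```python
# Monte Carlo sketch: R = 2 looks with the Pocock bound 2.178 at each analysis.
# Under H0, Z2 = (Z1 + Z1') / sqrt(2), where Z1' is the independent statistic
# from the second batch of n patients.
import random

random.seed(7)
B = 2.178          # Pocock bound for R = 2 (from the table above)
M = 100_000        # number of simulated trials
rejections = 0
for _ in range(M):
    z_first_half = random.gauss(0, 1)    # statistic from the first n patients
    z_increment = random.gauss(0, 1)     # independent statistic from the next n
    z1 = z_first_half
    z2 = (z_first_half + z_increment) / 2 ** 0.5
    if abs(z1) >= B or abs(z2) >= B:     # reject at either analysis
        rejections += 1

print(rejections / M)   # overall type I error, close to the nominal 0.05
```

Using 1.96 at both looks instead would inflate the overall error well above 0.05, which is the repeated-testing problem the boundaries are designed to fix.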
The Haybittle-Peto approach (based on intuitive reasoning) and the O'Brien-Fleming approach (based on statistical reasoning) were designed to avoid this problem. On the other hand, these two approaches make it very difficult to attain statistical significance at an early stage.
An example of the Pocock approach is provided in Pocock's book (Pocock, 1983). A trial was conducted in patients with non-Hodgkin's lymphoma, in which two drug combinations were compared, namely cytoxan-prednisone (CP) and cytoxan-vincristine-prednisone (CVP). The primary endpoint was presence/absence of tumor shrinkage, a surrogate variable.
Patient accrual lasted over two years and 126 patients participated. Statistical analyses were scheduled after approximately every 25 patients. Chi-square tests (without the continuity correction) were performed at each of the five scheduled analyses. The Pocock approach to group sequential testing requires a significance level of 0.0158 at each analysis. Here is a table with the results of these analyses.
| Analysis | Tumor shrinkage, CP | Tumor shrinkage, CVP | p-value |
|---|---|---|---|
| #1 | 3/14 | 5/11 | p > 0.10 |
| #2 | 11/27 | 13/24 | p > 0.10 |
| #3 | 18/40 | 17/36 | p > 0.10 |
| #4 | 18/54 | 24/48 | 0.05 < p < 0.10 |
| #5 | 23/67 | 31/59 | 0.0158 < p < 0.10 |
Thus, the researchers were concerned that the CVP combination appeared to be clinically better than the CP combination (53% success versus 34% success), yet it did not lead to a statistically significant result with Pocock’s approach. Further analyses with secondary endpoints convinced the researchers that the CVP combination is superior to the CP combination.
How would you decide which of these group sequential methods to use? Since a major concern is the significance level at the final analysis, and the O'Brien-Fleming approach preserves close to the desired alpha for the final analysis while still allowing a strongly positive early result to terminate the trial, it has been a popular choice. The REMATCH clinical trial is a good example. Regardless of your choice, it is important to make the operating characteristics of the selected interim analysis approach clear to the study investigators.
A few drawbacks of the group sequential approach to interim statistical testing are its strict requirements that (1) the number of scheduled analyses, R, must be determined prior to the onset of the trial, and (2) the scheduled analyses must be equally spaced with respect to patient accrual. The alpha spending function approach was developed to overcome these drawbacks (DeMets and Lan, 1994).
Let τ denote the information fraction available during the course of a clinical trial. For example, in a clinical trial with a target sample size, N, in which treatment group means will be compared, the information fraction at an interim analysis is τ = n/N, where n is the sample size at the time of the interim analysis. If your target sample size is 500 and you have taken measurements on 400 patients, then τ = 0.8.
If the clinical trial involves a time-to-event endpoint, then the information fraction is τ = d/D, where D is the target number of events for the entire trial and d is the number of events that have occurred by the time of the interim analysis.
The alpha spending function, α(τ), is an increasing function of the information fraction. At the beginning of the trial, τ = 0 and α(0) = 0; at the end of the trial, τ = 1 and α(1) = α, the desired overall significance level. In other words, every time an analysis is performed, part of the overall alpha is "spent". For the r^{th} interim analysis, where the information fraction is τ_{r}, 0 ≤ τ_{r} ≤ 1, α(τ_{r}) is the probability that any of the first r analyses leads to rejection of the null hypothesis when the null hypothesis is true. Obtaining the critical values consecutively requires numerically integrating the distribution function. A program is available in this module, along with the DeMets-Lan paper.
As a simple example, suppose investigators are planning a trial in which patients are examined every two weeks over a 12-week period. The investigators would like to incorporate an interim analysis when one-half of the subjects have completed at least one-half of the trial. This corresponds to τ = 0.25.
A simple spending function that is a compromise between the Pocock and O'Brien-Fleming functions is α(τ) = τα, 0 ≤ τ ≤ 1. This leads to a significance level of 0.012 at the interim analysis and a significance level of 0.04 at the final analysis (calculations not shown). Many variations of spending functions have been devised.
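The following Python sketch illustrates, by Monte Carlo, how the linear spending function α(τ) = τα with one interim look at τ = 0.25 leads to significance levels of roughly this size. (The exact Lan-DeMets computation uses numerical integration of the joint distribution; the simulation-plus-bisection approach here is an assumption made purely for illustration.)

```python
# Monte Carlo sketch of the linear spending function alpha(tau) = tau * alpha
# with one interim look at tau = 0.25 and overall two-sided alpha = 0.05.
import random
from statistics import NormalDist

random.seed(1)
alpha, tau = 0.05, 0.25
nd = NormalDist()

# alpha spent at the interim look is tau * alpha = 0.0125 (two-sided)
spent_interim = tau * alpha
b1 = nd.inv_cdf(1 - spent_interim / 2)   # interim boundary, about 2.50

# simulate the joint null distribution of the two test statistics:
# Z1 = B(tau)/sqrt(tau), Z2 = B(1), with B standard Brownian motion
M = 100_000
pairs = []
for _ in range(M):
    b_tau = random.gauss(0, tau ** 0.5)
    z1 = b_tau / tau ** 0.5
    z2 = b_tau + random.gauss(0, (1 - tau) ** 0.5)
    pairs.append((z1, z2))

# bisection for the final boundary b2 so that the total type I error is alpha
lo, hi = 1.5, 3.0
for _ in range(30):
    b2 = (lo + hi) / 2
    rej = sum(abs(z1) >= b1 or abs(z2) >= b2 for z1, z2 in pairs) / M
    if rej > alpha:
        lo = b2
    else:
        hi = b2

print(round(b1, 3), round(b2, 3))       # interim and final boundaries
print(round(2 * (1 - nd.cdf(b2)), 3))   # nominal final level, close to 0.04
```

The nominal interim level comes out near 0.012 and the final level near 0.04, matching the values quoted in the text.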
Regardless of whether a sequential, group sequential, or alpha spending function approach is used, the estimate of the treatment effect will be biased when a trial is terminated at an early stage: the earlier the decision, the larger the bias. Intuitively, if the target sample size is 200 and the trial terminates after 25 patients because of a significant difference between treatment groups, you can recognize the potential for substantial bias. Are 25 patients a representative sample from the population?
As an alternative to the above methods, we might want to terminate a trial when the results of the interim analysis are unlikely to change after accruing more patients (futility assessment/curtailed sampling). It just doesn't look like there could ever be a significant difference!
Unconditional power, as used in the earlier sample size calculations, is the probability of achieving a significant result at a prespecified alpha under a prespecified alternative treatment effect, as calculated at the beginning of a trial. Conditional power quantifies the probability of rejecting the null hypothesis of no effect once some data are available. If this quantity is very small, we can conclude that it would be futile to continue the investigation.
As a simple example, consider the situation in which we want to determine if a coin is fair, so the hypothesis testing problem is:
\[H_0: p = Pr[\text{Heads}] = 0.5 \text{ versus } H_1: p = Pr[\text{Heads}] > 0.5\].
The fixed sample size plan is to toss the coin 500 times and count the number of heads, X. But do we actually need to flip the coin 500 times? We would reject H_{0} at the 0.025 significance level if:
\[Z=\frac{X-250}{\sqrt{(500)(0.5)(0.5)}} \ge 1.96 \]
This is equivalent to rejecting H_{0} if X ≥ 272. Suppose that after 400 tosses of the coin there are 272 heads. It is futile to proceed further because even if the remaining 100 tosses yielded tails, the null hypothesis still would be rejected at the 0.025 significance level. The calculation of the conditional power in this example is trivial (it equals 1) because no matter what is assumed about the true value of p, the null hypothesis would be rejected if the trial were taken to completion.
You can also look at this in the other direction. Suppose that after 400 tosses of the coin there are 200 heads. The null hypothesis will be rejected if there are at least 72 heads during the remaining 100 tosses.
Even if p = 0.6 (arbitrary assignment), the conditional power is:
\(Pr[X \ge 72 \mid n=100, p=0.6]\)
\(= Pr\left[\frac{X-60}{\sqrt{(100)(0.6)(0.4)}} \ge \frac{72-60}{\sqrt{(100)(0.6)(0.4)}} \right]\)
\(= Pr[Z \ge 2.45] = 0.007\)
The probability, based on a standard normal table, is 0.007, a very small value. Thus, it is futile to continue because there is such a small chance of rejecting H_{0}.
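The coin-toss conditional power calculation can be checked with a few lines of Python, using the normal approximation described above:

```python
# Conditional power for the coin-toss example: 200 heads after 400 tosses,
# so at least 72 heads are needed in the remaining 100 tosses, assuming p = 0.6.
from statistics import NormalDist

def conditional_power(needed, n_remaining, p):
    """P(at least `needed` successes in `n_remaining` trials), normal approx."""
    mean = n_remaining * p
    sd = (n_remaining * p * (1 - p)) ** 0.5
    return 1 - NormalDist().cdf((needed - mean) / sd)

cp = conditional_power(72, 100, 0.6)
print(round(cp, 3))   # matches the 0.007 in the text
```

Since even this optimistic assumption (p = 0.6) gives only about a 0.7% chance of rejection, continuing the remaining 100 tosses is futile.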
Similarly, two clinical trial scenarios can be envisioned:
(1) A positive trend consistent with H_{1} at time t. Compute the conditional probability of rejecting H_{0} at the end of the trial at T, given the current trend. If rejection is nearly certain regardless of the remaining data, early termination might be considered.
(2) A negative trend consistent with H_{0} at time t. Compute the conditional probability of rejecting H_{0} at the end of the trial at T, given that some alternative H_{1} is true. How large does the true effect need to be before the negative trend is reversed? If a trend reversal is highly unlikely, termination might be considered.
9.7.5 Adaptive Designs
As we have seen, emerging trends may cause investigators to consider making changes in a study, such as increasing the sample size or terminating the study. An adaptive design, which prespecifies how the study design may change based on observed results, can be useful. The group sequential strategies that we have already discussed are examples of a classical approach to one adaptation, namely early termination. In confirmatory trials, any adaptive design must maintain the statistical validity of the conclusions; control of the Type I error rate is critical. On the other hand, adaptive designs for studies aimed at finding safe and effective doses emphasize strategies for assigning more participants to treatments with favorable responses, and do not consider control of the Type I error rate as important as identifying the most effective doses to enter confirmatory trials.
Table 1 below (from Bhatt DL, Mehta C. 2016. Adaptive designs for clinical trials. New England Journal of Medicine 375(1): 65-74. doi: 10.1056/NEJMra1510061) summarizes the strengths and weaknesses of different adaptive designs. In the paper, the authors examine four case studies of different adaptive designs used in confirmatory trials.
Here are some practical issues as they relate to single-center trials. Typically, an investigator for a single-center trial needs to submit an annual report to his/her IRB. The report should address whether the study is safe and whether it is appropriate to continue.
The report should include the following topics:
A multicenter trial is one in which there are one or more clinical investigators at each of a number of locations (centers). Obviously, multicenter trials are of great importance when the disease is not common and a single investigator is capable of recruiting only a handful of patients.
Advantages of a multicenter trial (Pocock, 1983) include the following:
Of course, there is a downside. Disadvantages of a multicenter trial include the following:
The NIH requires a Data and Safety Monitoring Board (DSMB) to monitor the progress of a multicenter clinical trial that it sponsors. Although the FDA does not require a pharmaceutical/biotech company to construct a DSMB for its multicenter clinical trials, many companies are starting to use DSMBs on a regular basis.
There are several advantages that a DSMB provides, such as yielding a mechanism for protecting the interests and safety of the trial participants, while maintaining scientific integrity. The manner in which it is constructed should ensure that the DSMB is financially and scientifically independent of the study investigators, so that decisions about early stopping or study continuation are made objectively. Depending on the circumstances, a DSMB may be composed of anywhere from three to ten experts in medicine, statistics, epidemiology, data management, clinical chemistry, and ethics. None of the study investigators should be a part of the DSMB. In addition, the DSMB should not be masked to treatment assignment when it is evaluating a clinical trial. Although investigators and statisticians may submit information and materials to the DSMB for their study, most of the deliberations made by the DSMB are kept confidential. The DSMB reports directly to the sponsor of the multicenter trial (the NIH or the company) and does not report to the investigators.
A DSMB typically examines the following issues when assessing the worth of a multicenter clinical trial:
The major disadvantage of a DSMB holding the decision-making authority in a multicenter clinical trial, instead of the investigators, is that expertise may be sacrificed in order to maintain impartiality. Investigators gain valuable knowledge during the course of the trial, and it is not possible to provide the DSMB with the totality of this knowledge. Nevertheless, the advantages of a DSMB seem to outweigh this disadvantage during the conduct of a multicenter trial.
A comprehensive book on the aspects of DSMBs is available: Ellenberg, SS. Fleming, TR. DeMets, DL. 2002, Data Monitoring Committees in Clinical Trials, New York, NY: Wiley.
In this lesson, among other things, we learned:
Let's explore Bayesian methods further in this week's discussion and apply what we have learned to the assessment questions.