7.1.4 - Developing and Evaluating Hypotheses

Developing Hypotheses

After interviewing affected individuals, gathering data to characterize the outbreak by time, place, and person, and consulting with other health officials, a disease detective will have more focused hypotheses about the source of the disease, its mode of transmission, and the exposures that cause the disease. Hypotheses should be stated in a manner that can be tested.

Hypotheses are developed in a variety of ways. First, consider what is known about the epidemiology of the disease: What is the agent's usual reservoir? How is it usually transmitted? What are the known risk factors? Consider all the 'usual suspects'.

Holding open-ended conversations with those who fell ill, or even visiting their homes to look for clues in refrigerators and on shelves, can be helpful. If the epidemic curve points to a short period of exposure, ask what events occurred around that time. If people living in a particular area have the highest attack rates, or if groups defined by age, sex, or other personal characteristics are at greatest risk, ask why. Such questions about the data should lead to hypotheses that can be tested.

Evaluating Hypotheses

There are two approaches to evaluating hypotheses: comparing the hypotheses with the established facts, and using analytic epidemiology, which allows hypotheses to be tested.

A comparison with established facts is useful when the evidence is so strong that the hypothesis does not need to be tested. A 1991 investigation of an outbreak of vitamin D intoxication in Massachusetts is a good example. All of the people affected drank milk delivered to their homes by a local dairy. Investigators hypothesized that the dairy was the source and that the milk was the vehicle of excess vitamin D. When they visited the dairy, they quickly recognized that far more than the recommended dose of vitamin D was inadvertently being added to the milk. No further analysis was necessary.

Analytic epidemiology is used when the cause is less clear. Hypotheses are tested using a comparison group to quantify relationships between various exposures and disease. Case-control studies, and occasionally cohort studies, are useful for this purpose.

Case-control studies

As you recall from last week's lesson, in a case-control study, case-patients and controls are asked about their exposures. An odds ratio is calculated to quantify the relationship between exposure and disease.
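As a reminder of the arithmetic, the odds ratio from a 2x2 table can be sketched as follows. This is a minimal illustration, not part of the lesson materials: the function name and the counts are hypothetical, chosen only to show the cross-product calculation.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio for a 2x2 case-control table.

    Odds of exposure among cases divided by odds of exposure among controls:
    (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)
    """
    if unexposed_cases == 0 or exposed_controls == 0:
        # The ratio is undefined when either denominator cell is zero.
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts: 30 of 40 case-patients and 15 of 40 controls were exposed.
example_or = odds_ratio(exposed_cases=30, unexposed_cases=10,
                        exposed_controls=15, unexposed_controls=25)
print(f"Odds ratio: {example_or:.1f}")
```

With these made-up counts, the odds of exposure among case-patients are five times the odds among controls.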

In general, the more case-patients (and controls) you have, the easier it is to find an association. Often, however, an outbreak is small; as few as 4 or 5 cases may constitute an outbreak. The number of case-patients is therefore fixed, but an adequate number of potential controls is usually easier to find. In an outbreak of 50 or more cases, 1 control per case-patient will usually suffice. In smaller outbreaks, you might use 2, 3, or 4 controls per case-patient. More than 4 controls per case-patient is rarely worth the effort because the power of the study does not increase much beyond that ratio (we will talk more about power and sample size in epidemiologic studies later in this course!).
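The diminishing return from extra controls can be illustrated with a commonly quoted rule of thumb that is not stated in this lesson: when the true odds ratio is close to 1, the statistical efficiency of using r controls per case-patient, relative to an unlimited number of controls, is roughly r / (r + 1). The sketch below assumes that approximation; it is not a substitute for a formal power calculation.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of r controls per case relative to unlimited controls.

    Assumes the rule of thumb r / (r + 1), which holds roughly when the
    odds ratio is near 1. This is an illustrative approximation only.
    """
    if controls_per_case <= 0:
        raise ValueError("controls_per_case must be positive")
    return controls_per_case / (controls_per_case + 1)


# Show how little is gained by each additional control per case-patient.
for r in range(1, 7):
    print(f"{r} control(s) per case: about {relative_efficiency(r):.0%} efficiency")
```

Under this approximation, going from 1 to 2 controls per case-patient adds far more than going from 4 to 5, which is consistent with the advice to stop at about 4.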

Testing statistical significance
The final step in testing a hypothesis is to determine how likely it is that the study results could have occurred by chance alone. In other words, is the exposure that the study results point to as the source of the outbreak truly related to the disease? The significance of the odds ratio can be assessed with a chi-square test. We will also discuss statistical tests that control for many possible factors later in the course.
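A chi-square test on a 2x2 table can be sketched with the standard library alone. The example below uses the uncorrected Pearson chi-square statistic with 1 degree of freedom; the function name and counts are hypothetical. When cell counts are small, as they often are in outbreaks, an exact test such as Fisher's is generally preferred, so treat this as an illustration of the idea rather than a recommended analysis.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square test for a 2x2 table.

    Table layout (rows = exposure, columns = disease status):
        a = exposed cases      b = exposed controls
        c = unexposed cases    d = unexposed controls
    Returns (chi-square statistic, two-sided p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    row_exposed, row_unexposed = a + b, c + d
    col_cases, col_controls = a + c, b + d
    if 0 in (row_exposed, row_unexposed, col_cases, col_controls):
        raise ValueError("every row and column total must be greater than zero")
    chi2 = n * (a * d - b * c) ** 2 / (
        row_exposed * row_unexposed * col_cases * col_controls
    )
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 of 40 case-patients and 15 of 40 controls were exposed.
chi2, p_value = chi_square_2x2(a=30, b=15, c=10, d=25)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that an association this strong would be unlikely if exposure and disease were unrelated; it does not by itself show that the exposure caused the outbreak.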

Cohort studies
If the outbreak occurs in a small, well-defined population, a cohort study may be possible. For example, if an outbreak of gastroenteritis occurs among people who attended a particular social function, such as a banquet, and a complete list of guests is available, it is possible to ask each attendee the same set of questions about potential exposures and whether he or she became ill with gastroenteritis.

After collecting this information from each guest, an attack rate can be calculated for people who ate a particular item (were exposed) and for those who did not eat that item (were not exposed). For the exposed group, the attack rate is found by dividing the number of people who ate the item and became ill by the total number of people who ate that item. For those who were not exposed, the attack rate is found by dividing the number of people who did not eat the item but still became ill by the total number of people who did not eat that item.

To identify the source of the outbreak from this information, you would look for an item with:

  • a high attack rate among those exposed and
  • a low attack rate among those not exposed (so the difference or ratio between attack rates for the two exposure groups is high); in addition
  • most of the people who became ill should have consumed the item, so that the exposure could explain most, if not all, of the cases.
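The attack-rate comparison and the three criteria above can be sketched in a few lines. The banquet counts and item names below are invented for illustration (100 guests, 45 of whom became ill), and the function name is hypothetical.

```python
def summarize_item(ill_exposed, total_exposed, ill_unexposed, total_unexposed):
    """Attack rates and comparison measures for one food item in a cohort."""
    if total_exposed == 0 or total_unexposed == 0:
        raise ValueError("both exposure groups must contain at least one person")
    ar_exposed = ill_exposed / total_exposed
    ar_unexposed = ill_unexposed / total_unexposed
    total_ill = ill_exposed + ill_unexposed
    return {
        "attack_rate_exposed": ar_exposed,
        "attack_rate_unexposed": ar_unexposed,
        "difference": ar_exposed - ar_unexposed,
        # The ratio is undefined (shown as infinity) if nobody unexposed fell ill.
        "ratio": ar_exposed / ar_unexposed if ar_unexposed > 0 else float("inf"),
        # Share of all ill guests who ate the item (third criterion).
        "share_of_cases_exposed": ill_exposed / total_ill if total_ill else 0.0,
    }


# Hypothetical data: (ill who ate, total who ate, ill who did not eat, total who did not eat)
banquet_items = {
    "potato salad": (40, 50, 5, 50),
    "bread rolls": (23, 50, 22, 50),
}

summaries = {name: summarize_item(*counts) for name, counts in banquet_items.items()}
for name, s in summaries.items():
    print(
        f"{name}: exposed {s['attack_rate_exposed']:.0%}, "
        f"unexposed {s['attack_rate_unexposed']:.0%}, "
        f"ratio {s['ratio']:.1f}, "
        f"{s['share_of_cases_exposed']:.0%} of cases ate it"
    )
```

In this made-up example, the potato salad meets all three criteria (high attack rate among those who ate it, low attack rate among those who did not, and most cases exposed), while the bread rolls show nearly identical attack rates in both groups.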

We will learn more about cohort studies in Week 9 of this course.