Lesson 13: Proportional Hazards Regression

After reading this lesson material, you will be able to:

Suppose you wish to consider the impact of a risk factor on the time to the occurrence of an event. For example, Arnlov et al. (2010) consider the impact of body mass index (BMI) and metabolic syndrome (MetS) on the development of cardiovascular disease and death in middle-aged men. The associations were investigated using data from a cohort study of 1758 middle-aged Swedish men residing in one county, with over 30 years of follow-up. The figure below depicts the time to a major cardiovascular event by BMI category in men without (A) and with (B) metabolic syndrome. Is there a difference in these survival curves?

Figure 2: Kaplan-Meier curves for major cardiovascular events in different BMI categories in individuals without MetS (A) and with MetS (B).   


Figures reproduced from Arnlov J, et al. Impact of Body Mass Index and the Metabolic Syndrome on the Risk of Cardiovascular Disease and Death in Middle-Aged Men. Circulation 2010;121:230-236; originally published online 12/28/2009; DOI 10.1161/CIRCULATIONAHA.109.887521

Q:  The occurrence of a major cardiovascular event is a binary response. Would logistic regression, with BMI as a predictor variable, be appropriate to analyze these data?

A: The relationship between the presence or absence of a major cardiovascular event and the predictor variable could be assessed with logistic regression at a particular time, but this would not directly compare the survival curves.  A survival analysis would compare the curves on the basis of time to the event.
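
For readers new to curves like those in Figure 2, here is a minimal Kaplan-Meier sketch in Python. The follow-up times and event indicators are made up for illustration, and `kaplan_meier` is a hypothetical helper, not code from Arnlov et al. At each distinct event time the running survival probability is multiplied by (1 - events / number still at risk); a subject censored at a given time stays in the risk set for events at that time and then leaves it. This is how a survival analysis uses both the time to the event and partial follow-up, which a single logistic regression at one fixed time does not.

```python
from itertools import groupby

def kaplan_meier(times, events):
    """Return a list of (t, S(t)) pairs at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    for t, group in groupby(data, key=lambda pair: pair[0]):
        group = list(group)
        n_events = sum(e for _, e in group)
        if n_events > 0:
            # Conditional probability of surviving past t, given survival to t
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        # Everyone observed at time t (event or censored) now leaves the risk set
        n_at_risk -= len(group)
    return curve

# Hypothetical data: 8 subjects, 0 marks a censored follow-up time
curve = kaplan_meier([2, 3, 3, 5, 7, 8, 10, 12], [1, 1, 0, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 3))
```

With these data the estimated curve steps down to 0.875, 0.75, 0.6, 0.4 and 0.0 at times 2, 3, 5, 8 and 12.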

Survival analysis methods, such as proportional hazards regression, differ from logistic regression by assessing a rate instead of a proportion. Proportional hazards regression, also called Cox regression, models the incidence or hazard rate: the number of new cases of disease per person at risk per unit time. If the outcome is death, this is the mortality rate. Formally, the hazard function λ(t) is the instantaneous rate of the event at time t among those who have survived to t: for a small interval of length Δt, λ(t)Δt is approximately the probability that a person who has survived to t experiences the event before t + Δt.
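
To see why the hazard is a rate rather than a probability, the sketch below assumes, purely for illustration, a Weibull event-time distribution with shape 1.5 and scale 10, so that S(t) = exp(-(t/10)^1.5). It computes the conditional probability of the event in [t, t + Δt) given survival to t, divides by Δt, and compares the result with the closed-form Weibull hazard as Δt shrinks. The distribution and the function names are assumptions, not part of the lesson's data.

```python
import math

# Hypothetical Weibull event-time distribution, chosen only for illustration
SHAPE, SCALE = 1.5, 10.0

def survival(t):
    """S(t) = P(T > t) for the Weibull distribution."""
    return math.exp(-((t / SCALE) ** SHAPE))

def hazard_exact(t):
    """Closed-form Weibull hazard: (shape/scale) * (t/scale)**(shape - 1)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)

def hazard_approx(t, dt):
    """P(t <= T < t + dt | T >= t) divided by dt: event probability per unit time."""
    return (survival(t) - survival(t + dt)) / (survival(t) * dt)

for dt in (1.0, 0.1, 0.001):
    print(f"dt = {dt}: approx {hazard_approx(5.0, dt):.5f}, exact {hazard_exact(5.0):.5f}")
```

Because it is a per-unit-time quantity, the hazard changes with the time unit (per day versus per year) and is not bounded by 1.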

Logistic regression, in contrast, considers the proportion of new cases that develop in a given time period, i.e., the cumulative incidence. Logistic regression estimates the odds ratio; proportional hazards regression estimates the hazard ratio.
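
The numbers below are hypothetical: two groups of 100 people followed for up to 10 years with no loss to follow-up, so the cumulative incidence is simply events divided by people at the start. The sketch computes the cumulative incidence and its odds (the quantities logistic regression works with) and the incidence rate in events per person-year (the quantity a hazard describes). Treating the person-time rate ratio as a hazard ratio additionally assumes a constant hazard within each group; proportional hazards regression does not require that.

```python
def summarize(n_at_start, events, person_years):
    """Return cumulative incidence, its odds, and the incidence rate for one group."""
    cumulative_incidence = events / n_at_start   # proportion with the event over the period
    odds = cumulative_incidence / (1 - cumulative_incidence)
    incidence_rate = events / person_years       # events per person-year at risk
    return cumulative_incidence, odds, incidence_rate

# Hypothetical cohort: (people at start, events, person-years of follow-up)
exposed = summarize(100, 30, 800)
unexposed = summarize(100, 15, 900)

risk_ratio = exposed[0] / unexposed[0]
odds_ratio = exposed[1] / unexposed[1]
rate_ratio = exposed[2] / unexposed[2]
print(f"risk ratio {risk_ratio:.2f}, odds ratio {odds_ratio:.2f}, rate ratio {rate_ratio:.2f}")
```

Here the risk ratio is 2.00, the odds ratio is about 2.43 and the rate ratio is 2.25, so the three measures are not interchangeable.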

The proportional hazards model is as follows:

Let λ(t|X1i, X2i, ..., XKi) denote the hazard function for the ith person at time t, i = 1, 2, ..., n, where the K regressors are denoted as X1i, X2i, ..., XKi. The baseline hazard function at time t, i.e., when X1i = 0, X2i = 0, ..., XKi = 0, is denoted as λ0(t). The baseline hazard function is analogous to the intercept term in a multiple regression or logistic regression model. Notice that the form of the baseline hazard function is not specified, although it must be positive.

The hazard ratio λ1(t)/λ0(t), where λ1(t) is the hazard function for a subject with a given set of predictor values, can be regarded as the relative risk of the event occurring at time t.

The log of the hazard ratio, i.e., the log of the hazard function divided by the baseline hazard function at time t, is a linear combination of parameters and regressors:

\[\log\left ( \frac{\lambda \left ( t|X_{1i},X_{2i},...,X_{Ki} \right )}{\lambda_{0}(t) } \right )= \beta_{1}X_{1i}+\beta_{2}X_{2i}+...+\beta_{K}X_{Ki}\]

The ratio of hazard functions can be considered a ratio of risk functions, so the proportional hazards regression model can be considered a function of relative risk (while logistic regression models are a function of an odds ratio). Changes in a covariate have a multiplicative effect on the baseline hazard. The model in terms of the hazard function at time t is:

\[\lambda \left ( t|X_{1i},X_{2i},...,X_{Ki} \right )=\lambda_{0} (t)\exp\left ( \beta_{1}X_{1i}+\beta_{2}X_{2i}+...+\beta_{K}X_{Ki}  \right )\]
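
The hazard form of the model can be evaluated directly. In this sketch the baseline hazard λ0(t) = 0.01·√t and the coefficients for a metabolic syndrome indicator (0/1) and BMI (kg/m²) are invented for illustration; they are not estimates from Arnlov et al. Because λ0(t) multiplies every subject's hazard, changing a covariate rescales the whole hazard function by the same factor.

```python
import math

def baseline_hazard(t):
    """Hypothetical baseline hazard lambda_0(t); Cox regression never has to specify this."""
    return 0.01 * math.sqrt(t)

def hazard(t, x, beta):
    """lambda(t | x) = lambda_0(t) * exp(beta_1*x_1 + ... + beta_K*x_K)."""
    linear_predictor = sum(b * xk for b, xk in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear_predictor)

beta = [0.5, 0.05]        # hypothetical coefficients for [MetS indicator, BMI]
with_mets = [1, 27.0]     # covariate vectors for two hypothetical subjects
without_mets = [0, 27.0]

for t in (1.0, 10.0, 30.0):
    ratio = hazard(t, with_mets, beta) / hazard(t, without_mets, beta)
    print(f"t = {t:4.0f}: hazard ratio = {ratio:.4f}")  # exp(0.5), about 1.6487, at every t
```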

Although no particular probability model is selected to represent the survival times, proportional hazards regression does have an important assumption: the hazard for any individual is a fixed proportion of the hazard for any other individual (i.e., proportional hazards). Notice that if λ0(t) is the hazard function for a subject with all predictor values equal to zero and λ1(t) is the hazard function for a subject with other values of the predictor variables, then the hazard ratio λ1(t)/λ0(t) depends only on the predictor variables and not on time t. This assumption means that if a covariate doubles the risk of the event on day one, it also doubles the risk of the event on any other day.
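
Writing the model for two individuals i and j and dividing shows why the baseline hazard drops out and the ratio does not involve t:

\[\frac{\lambda \left ( t|X_{1i},...,X_{Ki} \right )}{\lambda \left ( t|X_{1j},...,X_{Kj} \right )}=\frac{\lambda_{0}(t)\exp\left ( \beta_{1}X_{1i}+...+\beta_{K}X_{Ki} \right )}{\lambda_{0}(t)\exp\left ( \beta_{1}X_{1j}+...+\beta_{K}X_{Kj} \right )}=\exp\left ( \beta_{1}(X_{1i}-X_{1j})+...+\beta_{K}(X_{Ki}-X_{Kj}) \right )\]

For example, if the two individuals differ only in X1, by one unit, the hazard ratio is exp(β1) at every time t, which is why exp(β1) is reported as the hazard ratio for X1.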

Proportional hazards models can be used for discrete or continuous measures of event time and can incorporate time-dependent covariates (covariates whose values may change during the observation period). Using proportional hazards regression, covariate-adjusted hazard (or risk) ratios can be produced.
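
The reason λ0(t) never has to be specified is Cox's partial likelihood: at each event time, the subject who had the event is compared only with the subjects still at risk, and λ0(t) cancels from that comparison. The sketch below is a bare-bones version for a single fixed (not time-dependent) covariate, assuming Breslow's handling of tied event times and Newton-Raphson maximization; the data and function names are hypothetical. With several covariates the same likelihood is maximized jointly over all the β's, and that is what yields covariate-adjusted hazard ratios. For real analyses a tested implementation, such as `coxph` in R's survival package, is the better choice.

```python
import math

def score_and_information(beta, times, events, x):
    """Score U(beta) and observed information I(beta) of the Cox log partial
    likelihood for one covariate, with tied event times handled by Breslow's method."""
    score, info = 0.0, 0.0
    for tj in sorted({t for t, d in zip(times, events) if d == 1}):
        # Risk set: subjects still under observation just before time tj
        risk_x = [xi for ti, xi in zip(times, x) if ti >= tj]
        weights = [math.exp(beta * xi) for xi in risk_x]
        s0 = sum(weights)
        s1 = sum(w * xi for w, xi in zip(weights, risk_x))
        s2 = sum(w * xi * xi for w, xi in zip(weights, risk_x))
        # Covariate values of the subjects who have the event at tj
        failed_x = [xi for ti, di, xi in zip(times, events, x) if ti == tj and di == 1]
        score += sum(failed_x) - len(failed_x) * s1 / s0
        info += len(failed_x) * (s2 / s0 - (s1 / s0) ** 2)
    return score, info

def log_partial_likelihood(beta, times, events, x):
    """Sum over events of beta*x_i - log(sum of exp(beta*x_j) over the risk set)."""
    total = 0.0
    for ti, di, xi in zip(times, events, x):
        if di == 1:
            s0 = sum(math.exp(beta * xj) for tj, xj in zip(times, x) if tj >= ti)
            total += beta * xi - math.log(s0)
    return total

def fit_cox_one_covariate(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson maximization of the partial likelihood, starting at beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical data: follow-up time, event indicator (1 = event, 0 = censored), binary exposure
times  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0]
x      = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

beta_hat = fit_cox_one_covariate(times, events, x)
print(f"beta_hat = {beta_hat:.3f}, hazard ratio = {math.exp(beta_hat):.3f}")
```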

Let’s return to the original question posed by Arnlov and colleagues: do BMI and metabolic syndrome affect the development of cardiovascular disease? Read Arnlov et al. Impact of Body Mass Index and the Metabolic Syndrome on the Risk of Cardiovascular Disease and Death in Middle-Aged Men. Circulation 2010;121:230-236, giving particular attention to the statistical methods, results, and conclusions.

Compare and contrast the proportional hazards regression approach with a logistic regression approach by reading Franco et al. Trajectories of Entering the Metabolic Syndrome: The Framingham Heart Study. Circulation 2009;120:1943-1950; originally published online Nov 2, 2009; American Heart Association; DOI: 10.1161/CIRCULATIONAHA.109.855817

You may also compare the results of the two studies. Both papers are in the Readings folder for Week 14.

Note: If you are not familiar with Kaplan-Meier curves, read the Survival Analysis chapter in the Research Methods II monographs from the Journal of Tropical Pediatrics (accessed 4/16/2010 at http://www.oxfordjournals.org/our_journals/tropej/online/ma_chap12.pdf).