Printer-friendly versionPrinter-friendly version

Threshold models are used in several different areas of statistics, not just time series.  The general idea is that a process may behave differently when the values of a variable exceed a threshold value.  That is, a different model may apply when values are greater than a threshold than when they are below the threshold.  In drug toxicology applications, for example, it may be that all doses below a threshold amount are safe whereas there is increasing toxicity as the dose is increased above the threshold amount.  Or, in an animal population abundance study, the population may slowly increase up to a threshold size but then may rapidly decrease (due to limited food) once the population exceeds a certain size.

Threshold models are a special case of regime switching models (RSM).  In RSM modeling, different models apply to different intervals of values of some key variable(s).

Section 5.5 of our text discusses threshold autoregressive models (TAR) for univariate time series.  In a TAR model, AR models are estimated separately in two or more intervals of values as defined by the dependent variable.  These AR models may or may not be of the same order.  For convenience, it’s often assumed that they are of the same order.

The text considers only a single threshold value so that there will be two separate AR models – one for values that exceed the threshold and one for values that do not.  The difficulties are determining the need for a TAR model, the threshold value to use, and the order(s) of the AR models.  One feature for data that a TAR model may work is that the rates of increase and/or decrease may differ when the values are above some level than when the values are below that level.

The estimation of the threshold level is more or less subjective.  Many analysts explore several different threshold levels in an attempt to provide a good fit to the data (as measured by MSE values and general features of the residuals).  The order(s) of the AR model(s) can also be a trial and error expedition, especially since the inherent model for the data may not be an AR.  Generally analysts start with what they think may be a higher order than necessary and then reduce the order as necessary.

Section 5.5 of the text gives a nice explanation of threshold models and includes a good example.  In this lesson, we’ll discuss the example and give the R code.  The series for the example is monthly rates of deaths due to flu in the United States for 11 years (so n = 132).  Due to the epidemic nature of the flu, the behavior of the series is quite different when the rates go above some threshold value than when it’s below the value.

The first step (always) is to plot the data.  Following is the time series plot of the data.


Note the period of steep increase (and decrease).  The authors also note a slight downward trend so examined first differences.  Following is the time series plot of first differences.


Consistent with the original data, we see sharp increases and decreases in certain periods.  After some experimentation the authors decided to use separate AR(4) models for two regions: data following a first difference greater than or equal to .05 and data following a first difference less than .05.  The model fit well, as evidence by the following plots – the ACF and PACF of the residuals and a plot comparing actual first difference to predicted values of first differences.  In the plot comparing actual and predicted values, the predicted values are along the red dashed line.



R Code for the Example

R code for the example follows.  This code is adapted from code written by Douglas Wiens of the University of Alberta.  

Within the ts.intersect command the lag(,) commands create lags and the matrix that is output will not contain rows with missing values.  In the code, we do a regression fit of an AR(4) model for all of the data in order to set up variables that will be used in the separate regime regressions.  Note also that the threshold value is defined in the command c = .05.  The code will do both regressions, determine the residuals along with their acf/pacf, and create a plot of actual and predicted.

flu = scan("flu.dat")
flu = ts(flu)
y = diff(flu,1)
model = ts.intersect(y, lag1y=lag(y,-1), lag2y=lag(y, -2), lag3y=lag(y,-3), lag4y=lag(y, -4))
x = model[,1]
P = model[,2:5]
c = .05 ## Threshold value
##Regression for values below the threshold
less = (P[,1]<c)
x1 = x[less]
P1 = P[less,]
out1 = lm(x1~P1[,1]+P1[,2]+P1[,3]+P1[,4])
##Regression for values above the threshold
greater = (P[,1]>=c)
x2 = x[greater]
P2 = P[greater,]
out2 = lm(x2~P2[,1]+P2[,2]+P2[,3]+P2[,4])
res1 = residuals(out1)
res2 = residuals(out2)
less[less==1]= res1
greater[greater==1] = res2
resid = less + greater
##Predicted values
less = (P[,1]<c)
greater = (P[,1]>=c)
fit1 = predict(out1)
fit2 = predict(out2)
less[less==1]= fit1
greater[greater==1] = fit2
fit = less + greater
plot(y, type="o")
lines(fit, col = "red", lty="dashed")