10.1 Double Sampling for Ratio Estimation

Unit Summary

• Double sampling
• Ratio estimation with double sampling
• Allocation in double sampling for ratio estimation

What is double sampling?

Double sampling refers to designs in which a first sample of units is selected to obtain auxiliary information only, and then a second sample is selected in which the variable of interest is observed in addition to the auxiliary information.

Double sampling is also called two-phase sampling. It is useful in obtaining auxiliary variables for ratio and regression estimation. Double sampling is also useful for finding information for stratified sampling.

Ratio estimation with double sampling

• $$y_i$$ - variable of interest
• $$x_i$$ - auxiliary variable
• n' - number of units in the first sample (which includes the second sample)
• n - number of units in the second sample

Only in the second sample are both $$x_i$$ and $$y_i$$ observed. For the remaining units (those in the first sample but not the second), $$x_i$$ is observed but $$y_i$$ is not. Note that observing the $$y_i$$ is expensive, whereas observing the $$x_i$$ is not.

If $$x_i$$ and $$y_i$$ are highly linearly correlated and the line relating them passes approximately through the origin, then ratio estimation with double sampling may lead to improved estimates. In the ratio estimate for double sampling, the ratio is estimated from the units where both x and y are observed, i.e., the second sample, whereas $$\tau_x$$ is estimated from the larger first sample.

The ratio estimator is:

$$\hat{\tau}_r=r\hat{\tau}_x$$

where    $$r=\dfrac{\sum\limits_{i=1}^n y_i}{\sum\limits_{i=1}^n x_i}$$,

and    $$\hat{\tau}_x=\dfrac{N}{n'}\sum\limits_{i=1}^{n'} x_i$$
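The estimator above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name and argument names are hypothetical and chosen only to mirror the notation (N, n', n, r, and the estimated total of x).

```python
def ratio_total_double_sampling(N, x_first, x_second, y_second):
    """Ratio estimate of the population total of y under double sampling.

    N        -- number of units in the population
    x_first  -- x-values for all n' units of the first sample
    x_second -- x-values for the n units of the second sample
    y_second -- y-values for the same n units
    (All names are illustrative assumptions, not from the lesson.)
    """
    if len(x_second) != len(y_second):
        raise ValueError("x_second and y_second must have the same length")
    n_prime = len(x_first)
    # Ratio r uses only the second sample, where both x and y are observed.
    r = sum(y_second) / sum(x_second)
    # The total of x is estimated from the larger first sample.
    tau_x_hat = N / n_prime * sum(x_first)
    return r * tau_x_hat
```

The function takes the first-sample x-values as a full list that already includes the second-sample units, matching the definition of n' given earlier.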

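Let $$s^2$$ be the sample variance of the y-values in the second sample and $$s^2_r$$ the sample variance of the y-values about the ratio line. The estimated variance of the ratio estimator is:

$$\widehat{\mathrm{Var}}(\hat{\tau}_r)=N(N-n')\dfrac{s^2}{n'}+\dfrac{N^2(n'-n)}{n'n}s^2_r$$

where    $$s^2_r=\dfrac{\sum\limits_{i=1}^n (y_i-rx_i)^2}{n-1}$$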

Note that s1 stands for the first sample.

Example for Double Sampling

A forest resource manager is interested in estimating the total number of dead trees in a 400-acre area of heavy infestation. She subdivides the area into 200 plots of equal size and uses photo counts to find the number of dead trees in 18 randomly sampled plots. She then randomly samples 8 of these 18 plots and conducts a ground count on them.

Estimate the total number of dead trees in the 400-acre area.

Let x denote the number of dead trees in the plot by photo count and y the number of dead trees by ground count. The data are given as:

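| Plot | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x' | 5 | 7 | 10 | 6 | 7 | 9 | 3 | 6 | 8 | 11 | 5 | 9 | 12 | 13 | 3 | 20 | 15 | 4 |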

Out of these 18 plots, 8 are randomly selected and a ground count is conducted.

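| Plot | 2 | 3 | 5 | 6 | 12 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|
| x | 7 | 10 | 7 | 9 | 9 | 3 | 20 | 15 |
| y | 9 | 13 | 10 | 11 | 10 | 4 | 25 | 17 |
| y - rx | 0.3375 | 0.625 | 1.3375 | -0.1375 | -1.1375 | 0.2875 | 0.25 | -1.5625 |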


For this example:

• N = 200
• n' = 18
• n = 8

Application Exercise

Compute the ratio estimate for the population total.
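One way to check an answer is to carry out the arithmetic directly from the two data tables. The sketch below uses plain Python; the variable names are illustrative assumptions that follow the notation of this section.

```python
N = 200  # number of plots in the area

# Photo counts on the 18 plots of the first sample
x_first = [5, 7, 10, 6, 7, 9, 3, 6, 8, 11, 5, 9, 12, 13, 3, 20, 15, 4]

# Photo counts (x) and ground counts (y) on the 8 plots of the second sample
x = [7, 10, 7, 9, 9, 3, 20, 15]
y = [9, 13, 10, 11, 10, 4, 25, 17]

r = sum(y) / sum(x)                          # 99 / 80
tau_x_hat = N / len(x_first) * sum(x_first)  # (200 / 18) * 153
tau_r_hat = r * tau_x_hat

print(f"r = {r:.4f}, tau_x_hat = {tau_x_hat:.1f}, tau_r_hat = {tau_r_hat:.2f}")
# r = 1.2375, tau_x_hat = 1700.0, tau_r_hat = 2103.75
```

Under these data the ratio estimate of the total is about 2104 dead trees.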


Application Exercise

Now, compute the estimated variance of the ratio estimator.
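The following sketch computes the estimated variance for the example. It assumes the usual double-sampling form of the variance estimate, N(N - n')s²/n' + N²(n' - n)s_r²/(n'n), with s² the sample variance of y and s_r² the sample variance of y about the ratio line; the variable names are illustrative.

```python
from statistics import variance

N, n_prime = 200, 18

# Second sample: photo counts (x) and ground counts (y)
x = [7, 10, 7, 9, 9, 3, 20, 15]
y = [9, 13, 10, 11, 10, 4, 25, 17]
n = len(y)

r = sum(y) / sum(x)
s2 = variance(y)  # sample variance of y, divisor n - 1
s2_r = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)

# Assumed form of the estimated variance (see the lead-in above)
var_hat = (N * (N - n_prime) * s2 / n_prime
           + N ** 2 * (n_prime - n) / (n_prime * n) * s2_r)
se = var_hat ** 0.5

print(f"s2 = {s2:.2f}, s2_r = {s2_r:.3f}")
print(f"var_hat = {var_hat:.1f}, standard error = {se:.1f}")
```

With these data the estimated variance comes out near 82,155, for a standard error of roughly 287 trees.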


Allocation in double sampling for ratio estimation

• c' - the cost of observing an x-variable on one unit
• c - the cost of observing a y-variable on one unit

The total cost = c'n' + cn

For a fixed total cost, the lowest variance of $$\hat{\tau}_r$$ is obtained by:

$$\dfrac{n}{n'}=\sqrt{\dfrac{c'}{c}\times \dfrac{\sigma^2_r}{\sigma^2-\sigma^2_r}}$$

where $$\sigma^2_r$$ is the variance of Y about the ratio line and $$\sigma^2$$ is the variance of Y.

$$\sigma^2_r$$ can be estimated by $$s^2_r=\dfrac{\sum\limits_{i=1}^n (y_i-rx_i)^2}{n-1}$$ and $$\sigma^2$$ can be estimated by $$s^2=\dfrac{\sum\limits_{i=1}^n(y_i-\bar{y})^2}{n-1}$$.
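The allocation rule and the cost constraint can be combined to turn a budget into sample sizes. The sketch below is one way to do this, under the assumption that the whole budget is spent and that sample sizes are rounded afterwards; the function names are hypothetical.

```python
import math


def optimal_fraction(c_prime, c, sigma2_r, sigma2):
    """Optimal n / n' for a fixed total cost (allocation formula above)."""
    if not 0 < sigma2_r < sigma2:
        # The formula needs sigma2 - sigma2_r > 0.
        raise ValueError("need 0 < sigma2_r < sigma2")
    return math.sqrt((c_prime / c) * sigma2_r / (sigma2 - sigma2_r))


def split_budget(total_cost, c_prime, c, fraction):
    """Real-valued (n', n) with c'n' + cn = total_cost and n = fraction * n'."""
    n_prime = total_cost / (c_prime + c * fraction)
    return n_prime, fraction * n_prime
```

For instance, with c' = 1, c = 4, a variance about the ratio line of 1 and a variance of Y of 2 (made-up values for illustration), the fraction is 0.5, and a budget of 300 gives n' = 100 and n = 50.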

Application Exercise

If counting dead trees on a plot by photo costs 1/4 as much as a ground count, how would you decide on the optimal subsampling fraction n/n'?
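A possible approach is to plug the sample estimates from the 8 ground-counted plots into the allocation formula. The sketch below does that with c'/c = 1/4; the variable names are illustrative, and using the sample variances in place of the population variances is an approximation.

```python
import math
from statistics import variance

# Second sample: photo counts (x) and ground counts (y)
x = [7, 10, 7, 9, 9, 3, 20, 15]
y = [9, 13, 10, 11, 10, 4, 25, 17]
n = len(y)

r = sum(y) / sum(x)
s2 = variance(y)  # estimates the variance of Y
s2_r = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)  # about the ratio line

cost_ratio = 1 / 4  # c' / c: a photo count costs a quarter of a ground count
fraction = math.sqrt(cost_ratio * s2_r / (s2 - s2_r))

print(f"s2 = {s2:.2f}, s2_r = {s2_r:.3f}, n/n' = {fraction:.4f}")
# s2 = 39.41, s2_r = 0.885, n/n' = 0.0758
```

So roughly 7 to 8 ground counts per 100 photo-counted plots would be suggested by these estimates.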


Note that here we use $$s^2_r$$ to estimate $$\sigma^2_r$$ and $$s^2$$ to estimate $$\sigma^2$$. For these to be reasonably good estimates, the sample size should not be too small in practice.

To understand the above result, suppose the study is very large scale, for example n' = 1000. With $$c'/c = 1/4$$, the formula gives $$n/n' \approx 0.076$$, so we would select n of about 76. The proportion is small because $$s^2_r = 0.885$$ is small compared to $$s^2 = (6.28)^2$$.