Unit Summary 

Double sampling refers to designs in which a first sample of units is selected to obtain auxiliary information only, and a second sample is then selected in which the variable of interest is observed in addition to the auxiliary information.
Double sampling is also called two-phase sampling. It is useful for obtaining auxiliary variables for ratio and regression estimation, and for obtaining the information needed for stratified sampling.
Ratio estimation with double sampling
Both x_{i} and y_{i} are observed only for units in the second sample. For the remaining units (those in the first sample but not the second), x_{i} is observed but y_{i} is not. Note that observing the y_{i}'s is expensive, whereas observing the x_{i}'s is not.
If x_{i} and y_{i} are highly linearly correlated and the relationship passes approximately through the origin, then the ratio estimate with double sampling may lead to improved estimates. With double sampling, the ratio is estimated from the units where both (x, y) are observed, i.e., the second sample, whereas \(\tau_x\) is estimated from the larger first sample.
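The ratio estimator is:
\(\hat{\tau}_r=r\hat{\tau}_x\)
where \(r=\dfrac{\sum\limits_{i=1}^n y_i}{\sum\limits_{i=1}^n x_i}\) is computed from the n units of the second sample,
and \(\hat{\tau}_x=\dfrac{N}{n'}\sum\limits_{i=1}^{n'} x_i\) is computed from the n' units of the first sample.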
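Let \(s^2\) be the sample variance of the y-values in the second sample and let \(s^2_r=\dfrac{\sum\limits_{i=1}^n (y_i-rx_i)^2}{n-1}\). The estimated variance of the ratio estimator is then:
\(\widehat{Var}(\hat{\tau}_r)=N(N-n')\dfrac{s^2}{n'}+N^2\dfrac{(n'-n)}{n'n}s^2_r\)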
Note that s_{1} stands for the first sample.
A forest resource manager is interested in estimating the total number of dead trees in a 400-acre area of heavy infestation. She subdivides the area into 200 plots of equal size and uses photo counts to find the number of dead trees in 18 randomly sampled plots. She then randomly samples 8 plots out of these 18 plots and conducts a ground count on these 8 plots.
Estimate the total number of dead trees in the 400-acre area.
Let x denote the number of dead trees in the plot by photo count and y the number of dead trees by ground count. The data are given as:
Plot  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18
x'  5  7  10  6  7  9  3  6  8  11  5  9  12  13  3  20  15  4
Out of these 18 plots, 8 are randomly selected and a ground count is conducted.
Plot  2  3  5  6  12  15  16  17
x  7  10  7  9  9  3  20  15
y  9  13  10  11  10  4  25  17
y - rx  0.3375  0.6250  1.3375  -0.1375  -1.1375  0.2875  0.2500  -1.5625
Minitab output:
For this example:
Compute the ratio estimate for the population total.
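One way to check the answer is to carry out the arithmetic directly. The following Python sketch is not part of the original Minitab workflow; the function name `ratio_total` and the variable names are illustrative assumptions. It computes r from the 8 ground-counted plots, \(\hat{\tau}_x\) from the 18 photo-counted plots, and their product.

```python
# Photo counts x' for the n' = 18 first-sample plots (from the table above).
x_first = [5, 7, 10, 6, 7, 9, 3, 6, 8, 11, 5, 9, 12, 13, 3, 20, 15, 4]
# Photo counts x and ground counts y for the n = 8 second-sample plots.
x_second = [7, 10, 7, 9, 9, 3, 20, 15]
y_second = [9, 13, 10, 11, 10, 4, 25, 17]
N = 200  # total number of plots in the area


def ratio_total(N, x_first, x_second, y_second):
    """Return (r, tau_x_hat, tau_r_hat) for double sampling (hypothetical helper)."""
    r = sum(y_second) / sum(x_second)            # ratio from the second sample
    tau_x_hat = N / len(x_first) * sum(x_first)  # total of x from the first sample
    return r, tau_x_hat, r * tau_x_hat


r, tau_x_hat, tau_r_hat = ratio_total(N, x_first, x_second, y_second)
print(f"r = {r:.4f}, tau_x_hat = {tau_x_hat:.2f}, tau_r_hat = {tau_r_hat:.2f}")
```

With these data, r = 99/80 = 1.2375 and \(\hat{\tau}_x\) = (200/18)(153) = 1700, giving an estimated total of about 2104 dead trees.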
Now, compute the estimated variance of the ratio estimator.
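The total cost = c'n' + cn, where c' is the cost of observing x on one unit of the first sample and c is the cost of observing y on one unit of the second sample.
For a fixed total cost, the variance of \(\hat{\tau}_r\) is minimized when: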
\(\dfrac{n}{n'}=\sqrt{\dfrac{c'}{c}\times \dfrac{\sigma^2_r}{\sigma^2-\sigma^2_r}}\)
where σ_{r}^{2} is the variance of Y about the ratio line and σ^{2} is the variance of Y.
σ_{r}^{2} can be estimated by \(s^2_r=\dfrac{\sum\limits_{i=1}^n (y_i-rx_i)^2}{n-1}\) and σ^{2} can be estimated by \(s^2=\dfrac{\sum\limits_{i=1}^n(y_i-\bar{y})^2}{n-1}\).
If the cost of counting dead trees on a plot by photo count is 1/4 of the cost of a ground count, how would you decide upon the optimal subsampling fraction n/n'?
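A minimal sketch of this calculation, assuming the cost ratio c'/c = 1/4 stated in the question and plugging the sample estimates into the optimal-fraction formula. The helper name `optimal_fraction` is hypothetical.

```python
from math import sqrt

x_second = [7, 10, 7, 9, 9, 3, 20, 15]     # photo counts, second sample
y_second = [9, 13, 10, 11, 10, 4, 25, 17]  # ground counts, second sample
n = len(y_second)

r = sum(y_second) / sum(x_second)
y_bar = sum(y_second) / n
s2 = sum((y - y_bar) ** 2 for y in y_second) / (n - 1)                      # estimates sigma^2
s2_r = sum((y - r * x) ** 2 for x, y in zip(x_second, y_second)) / (n - 1)  # estimates sigma_r^2


def optimal_fraction(cost_ratio, s2_r, s2):
    """n/n' = sqrt((c'/c) * s_r^2 / (s^2 - s_r^2)); requires s2 > s2_r."""
    return sqrt(cost_ratio * s2_r / (s2 - s2_r))


fraction = optimal_fraction(0.25, s2_r, s2)  # c'/c = 1/4
print(f"optimal n/n' = {fraction:.4f}")
print(f"for n' = 1000, n = {1000 * fraction:.1f}")
```

The fraction comes out to roughly 0.076, so only a small share of the photo-counted plots would need a ground count.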
Note that here we use s_{r}^{2} to estimate σ_{r}^{2} and s^{2} to estimate σ^{2}. For these to be reasonably good estimates, the size n of the second sample should not be too small in practice.
To interpret this result, consider a very large-scale study: if n' = 1000, then we would select n of about 75. The proportion n/n' is small because s_{r}^{2} = 0.885 is small compared to s^{2} = (6.28)^{2}; that is, y varies little about the ratio line relative to its overall variation.