Lesson 18: Generating Data With Do Loops

Introduction

When programming, you can find yourself needing to tell SAS to execute the same statements over and over again. That's when a DO loop can come in and save your day. The actions of some DO loops are unconditional in that if you tell SAS to do something 20 times, SAS will do it 20 times regardless. We call those kinds of loops iterative DO loops. On the other hand, actions of some DO loops are conditional in that you tell SAS to do something until a particular condition is met or to do something while a particular condition is met. We call the former a DO UNTIL loop and the latter a DO WHILE loop. In this lesson, we'll explore the ins and outs of these three different kinds of loops, as well as take a look at lots of examples in which they are used. Then, in the next lesson, we'll use DO loops to help us process arrays.

Learning objectives & outcomes

Upon completing this lesson, you should be able to do the following:

Our "to do" list for this lesson

In order to complete the lesson you should:

  1. Read the lesson pages that follow.
  2. Type up your answers to the homework problems in a Word file named homework18_yourPSUloginid. By now you should be used to the format. If your PSU user id is xyz123, then name your file homework18_xyz123. Upload the file to the Lesson #18 Homework Dropbox.
  3. Post any questions or comments you have concerning the lesson's material to the Lesson #18 General Discussion Board.
  4. Take the Lesson #18 Mastery Quiz. Remember two things: i) You have 20 minutes to complete the quiz, and ii) as soon as you hit the "submit" button, your answers are submitted and graded, and the quiz becomes closed to you.

18.1 - Constructing Do Loops

In this section, we'll explore the use of iterative DO loops, in which you tell SAS to execute a statement or a group of statements a certain number of times. Let's take a look at some examples!

Example 18.1. The following program uses a DO loop to tell SAS to determine what four times three (4 × 3) equals:

Okay... admittedly, we could accomplish our goal of determining four times three in a much simpler way, but then we wouldn't have the pleasure of seeing how we can accomplish it using an iterative DO loop! The key to understanding the DATA step here is to recall that multiplication is just repeated addition. That is, four times three (4 × 3) is the same as adding three together four times: 3 + 3 + 3 + 3. That's all that the iterative DO loop in the DATA step is telling SAS to do. After having initialized answer to 0, add 3 to answer, then add 3 to answer again, and add 3 to answer again, and add 3 to answer again. After SAS has added 3 to the answer variable four times, SAS exits the DO loop, and since that's the end of the DATA step, SAS moves on to the next procedure and prints the result.

The other thing you might want to notice about the DATA step is that there is no input data set or input data file. We are generating data from scratch here, rather than from some input source. Now, launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that our code properly calculates four times three.
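The program listing for this example does not appear above. A minimal sketch that is consistent with the description might look like the following. The names multiply, answer, and i come from the text, while the layout and the PRINT options are assumptions.

```sas
/* Sketch only: reconstructs the steps described in the text. */
DATA multiply;
    answer = 0;            /* initialize answer to 0            */
    DO i = 1 TO 4;         /* repeat the addition four times    */
        answer + 3;        /* sum statement: add 3 to answer    */
    END;
RUN;                       /* automatic output writes one row   */

PROC PRINT DATA = multiply NOOBS;
RUN;
```

If the sketch matches the original program, the printed data set should contain a single observation with answer = 12 and i = 5.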

Ahhh, what about that i variable that shows up in our multiply data set? If you look at our DATA step again, you can see that it comes from the DO loop. It is what is called the index variable (or counter variable). Most often, you'll want to drop it from your output data set, but its presence here is educational. As you can see, its current value is 5. That's what allows SAS to exit the DO loop... we tell SAS only to take the actions inside the loop until i equals 4. Once i becomes greater than 4, SAS jumps out of the loop, and moves on to the next statements in the DATA step. Let's take a look at the general form of iterative DO loops.

General Form of Iterative Do Loops

To construct an iterative DO loop, you need to start with a DO statement, then include some action statements, and then end with an END statement. Here's what a simple iterative DO loop should look like:

   DO index-variable = start TO stop BY increment;
        action statements;
   END;

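where:

  • index-variable is the variable that stores the value of the current iteration of the DO loop. It can be any valid SAS variable name.
  • start is the value of the index variable at which SAS starts the loop.
  • stop is the value of the index variable at which SAS stops the loop.
  • increment is the amount by which SAS changes the index variable after each iteration. If you leave out the BY clause, SAS uses an increment of 1.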

For example, this DO statement:

      do jack = 1 to 5;

tells SAS to create an index variable called jack, start at 1, increment by 1, and end at 5, so that the values of jack from iteration to iteration are 1, 2, 3, 4, and 5. And, this DO statement:

      do jill = 2 to 12 by 2;

tells SAS to create an index variable called jill, start at 2, increment by 2, and end at 12, so that the values of jill from iteration to iteration are 2, 4, 6, 8, 10, and 12.
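One way to check how an index variable moves is to write it to the log on each pass. The following is a small sketch, not part of the original lesson, that uses the jill loop from above with a DATA _NULL_ step so that no data set is created.

```sas
/* Sketch only: writes the index variable to the log. */
DATA _NULL_;
    DO jill = 2 TO 12 BY 2;
        PUT jill=;                      /* jill=2, jill=4, ..., jill=12 */
    END;
    PUT 'After the loop: ' jill=;       /* index has moved past stop    */
RUN;
```

The last line written to the log should show jill=14, since SAS exits the loop only once the index variable has been incremented beyond the stop value.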

Explicit OUTPUT Statements

Example 18.2. The following program uses an iterative DO loop to tell SAS to determine the multiples of 5 up to 100:

In this case, we are not interested in one particular multiplication, but rather in a series of multiplications, 1 × 5, 2 × 5, 3 × 5, ... That's where the OUTPUT statement comes into play. The previous example created just one observation, because it relied on the automatic output at the end of the DATA step. Here, we override the automatic output by explicitly telling SAS to output the value of the multiple variable every time that SAS adds 5 to it. The DATA statement's DROP= option tells SAS not to bother to output the index variable i. Now, launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that our code properly generates the multiples of 5.
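The listing for this example is not shown above either. A sketch along the lines the paragraph describes could look like this. The variable names multiple and i come from the text. The data set name multiples and the stop value of 20 (so that the last multiple is 100) are assumptions.

```sas
/* Sketch only: one observation per multiple of 5. */
DATA multiples (DROP = i);     /* hypothetical data set name       */
    multiple = 0;
    DO i = 1 TO 20;
        multiple + 5;          /* 5, 10, 15, ..., 100              */
        OUTPUT;                /* write a row on every iteration   */
    END;
RUN;

PROC PRINT DATA = multiples NOOBS;
RUN;
```

Because the step contains an explicit OUTPUT statement, SAS does not also write an observation automatically at the end of the DATA step.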

Example 18.3. The following SAS program uses an iterative DO loop to count backwards by 1:

As you can see in this DO statement, you can decrement a DO loop's index variable by specifying a negative value for the BY clause. Here, we tell SAS to start at 20, and decrease the index variable by 1, until it reaches 1. The OUTPUT statement tells SAS to output the value of the index variable i for each iteration of the DO loop. Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that our code properly counts backwards from 20 to 1.
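A sketch of such a countdown, assuming a data set named backwards (the name is hypothetical), might be:

```sas
/* Sketch only: counts down from 20 to 1. */
DATA backwards;                /* hypothetical data set name */
    DO i = 20 TO 1 BY -1;      /* negative increment         */
        OUTPUT;                /* writes i = 20, 19, ..., 1  */
    END;
RUN;

PROC PRINT DATA = backwards NOOBS;
RUN;
```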

Specifying a Series of Items

Rather than specifying start, stop and increment values in a DO statement, you can tell SAS how many times to execute a DO loop by listing items in a series. In this case, the general form of the iterative DO loop looks like this:

   DO index-variable = value1, value2, value3, ...;
        action statements;
   END;

where the values can be character or numeric. When the DO loop executes, it executes once for each item in the series. The index variable equals the value of the current item. You must use commas to separate items in a series. To list items in a series, you must specify

1) either all numeric values:

   DO i = 1, 2, 3, 4, 5;

2) all character values, with each value enclosed in quotation marks:

   DO j = 'winter', 'spring', 'summer', 'fall';

3) or all variable names:

   DO k = first, second, third;

In this case, the index variable takes on the values of the specified variables. Note that the variable names are not enclosed in quotation marks, while quotation marks are required for character values.
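As a small illustration of a series of items, the following sketch loops over the four season names used above and writes one observation per item. The data set name seasons and the LENGTH statement are assumptions added for the illustration.

```sas
/* Sketch only: the loop runs once for each listed item. */
DATA seasons;                  /* hypothetical data set name         */
    LENGTH j $ 6;              /* long enough for the longest value  */
    DO j = 'winter', 'spring', 'summer', 'fall';
        OUTPUT;                /* four observations in total         */
    END;
RUN;

PROC PRINT DATA = seasons NOOBS;
RUN;
```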

18.2 - Nesting Do Loops

One way to make iterative DO loops even more powerful is to place one DO loop inside of another. Putting a DO loop within another DO loop is called nesting. We'll take a look at a few examples here.

Example 18.4. Suppose you are interested in conducting an experiment with two factors A and B. Suppose factor A is, say, the amount of water with levels 1, 2, 3, and 4; and factor B is, say, the amount of sunlight, say with levels 1, 2, 3, 4, and 5. Then, the following SAS code uses nested iterative DO loops to generate the 4 by 5 factorial design:

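First, launch and run the SAS program. Then, review the output from the PRINT procedure to see the contents of the design data set. By doing so, you can get a good feel for how the nested DO loops work. First, SAS sets the value of the index variable i to 1, then proceeds to the next step, which happens to be another iterative DO loop. While i is 1, SAS steps the index variable of the inside DO loop through the values 1, 2, 3, 4, and 5, writing one observation for each value. Once that inside index variable reaches 6, SAS jumps out of the inside DO loop and arrives at the end of the outside DO loop.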

SAS then sets the value of the index variable i to 2, then proceeds through the inside DO loop again just as described above. This process continues until SAS sets the value of index variable i to 5, jumps out of the outside DO loop, and ends the DATA step.
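The nested loops described above can be sketched as follows. The data set name design and the outside index variable i come from the text. The inside index variable name j is an assumption.

```sas
/* Sketch only: 4 levels of A crossed with 5 levels of B. */
DATA design;
    DO i = 1 TO 4;             /* factor A: amount of water     */
        DO j = 1 TO 5;         /* factor B: amount of sunlight  */
            OUTPUT;            /* one row per (i, j) pair       */
        END;
    END;
RUN;

PROC PRINT DATA = design;
RUN;
```

A design built this way should contain 4 × 5 = 20 observations, with j cycling through 1 to 5 within each value of i.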

Example 18.5. Back to our experiment with two factors A and B. Suppose this time that factor A is, say, the amount of water with levels 10, 20, 30, and 40 liters; and factor B is, say, the amount of sunlight, say with levels 3, 6, 9, 12, and 15 hours. The following SAS code uses two DO loops with BY options to generate a more meaningful 4 by 5 factorial design that corresponds to the exact levels of the factors:

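First, launch and run the SAS program. Then, review the output from the PRINT procedure to see the contents of the design data set. By doing so, you can get a good feel for how the nested DO loops with BY options work. First, SAS sets the value of the index variable i to 10, then proceeds to the next step, which happens to be another iterative DO loop. While i is 10, SAS steps the index variable of the inside DO loop through the values 3, 6, 9, 12, and 15, writing one observation for each value. Once that inside index variable reaches 18, SAS jumps out of the inside DO loop and arrives at the end of the outside DO loop.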

SAS then sets the value of the index variable i to 20, then proceeds through the inside DO loop again just as described above. This process continues until SAS sets the value of index variable i to 50, jumps out of the outside DO loop, and ends the DATA step.
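A sketch of the same design with BY options, again assuming j as the name of the inside index variable, might be:

```sas
/* Sketch only: index values equal the actual factor levels. */
DATA design;
    DO i = 10 TO 40 BY 10;     /* water: 10, 20, 30, 40 liters      */
        DO j = 3 TO 15 BY 3;   /* sunlight: 3, 6, 9, 12, 15 hours   */
            OUTPUT;
        END;
    END;
RUN;

PROC PRINT DATA = design;
RUN;
```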

18.3 - Iteratively Processing Data

So far, all of the examples that we've looked at have involved using DO loops to generate one or more observations from one iteration of the DATA step. Now, let's look at an example that involves reading a data set, and then using a DO loop to compute the value of a new variable.

Example 18.6. Every Monday morning, a credit union in Pennsylvania announces the interest rates for certificates of deposit (CDs) that it will honor for CDs opened during the business week. Suppose you want to determine how much each CD will earn at maturity with an initial investment of $5,000. The following program reads in the interest rates advertised one week in early 2009, and then uses a DO loop to calculate the value of each CD when it matures:

Let's work our way through the code to see how SAS processes the first observation, say. As the INPUT statement suggests, each record in the instream data contains three pieces of information: the type of CD (Type), the annual interest rate (AnnualRate), and the time to maturity in months (Months). A new variable called Investment and the index variable i are created within the DATA step. Therefore, at the end of the compile phase, the program data vector contains five variables, namely Type, AnnualRate, Months, Investment, and i, none of which has been given a value yet.

In the first iteration of the DATA step, the first observation is read from the instream data, the Investment variable is initialized to 5000, and the index variable i is set to 1. At the start of the DO loop, therefore, the program data vector holds the Type value from the first record along with AnnualRate = 0.0198, Months = 3, Investment = 5000, and i = 1.

The assignment statement tells SAS to take the current value of Investment, 5000, and add to it the amount of interest earned in one month. Because our input data set contains annual rates, we need to divide the annual rates by 12 to get monthly interest rates. The annual rate for the 3 Month certificate is 0.0198, so that makes the monthly rate 0.0198 divided by 12, or 0.00165. Multiply that monthly rate, 0.00165, by the current value of Investment, 5000, and you get 8.25. So, after one month in a 3 Month certificate, your 5000 dollars will have turned into 5008.25. With that update, the program data vector holds Investment = 5008.25, and i is still 1.

Having reached the end of the DO loop, SAS returns to the top of the DO loop to determine if it needs to be executed again. Notice that the Months variable is used as the stop value in the DO loop. As a result, the DO loop executes the number of times specified by the current value of Months, which is 3. The index variable is increased to 2. Because it is not greater than 3, SAS processes the DO loop again. SAS multiplies the current value of Investment, 5008.25, by the monthly rate, 0.00165, to determine that the interest earned in the second month is 8.2636125. Therefore, after two months in a 3 Month certificate, your 5000 dollars will have turned into 5008.25 + 8.2636125, or 5016.5136 dollars. At the end of the second iteration of the DO loop, the program data vector holds Investment = 5016.5136 and i = 2.

Having reached the end of the DO loop, SAS returns to the top of the DO loop to determine if it needs to be executed again. The index variable is increased to 3. Because it is not greater than 3, SAS processes the DO loop again. SAS multiplies the current value of Investment, 5016.5136, by the monthly rate, 0.00165, to determine that the interest earned in the third month is 8.2772474. Therefore, after three months in a 3 Month certificate, your 5000 dollars will have turned into 5016.5136 + 8.2772474, or 5024.7908 dollars. At the end of the third iteration of the DO loop, the program data vector holds Investment = 5024.7908 and i = 3.

Having reached the end of the DO loop, SAS returns to the top of the DO loop to determine if it needs to be executed again. The index variable is increased to 4. Because it is greater than 3, SAS steps out of the DO loop and moves on to the next statement. At this point, the program data vector holds Investment = 5024.7908 and i = 4.

The FORMAT statement is not an executable statement; SAS uses it in the compile phase when it creates the program data vector. SAS has therefore reached the end of the DATA step, and writes the contents of the program data vector as the first observation in the cdinvest data set, with AnnualRate = 0.0198, Months = 3, and Investment = 5024.7908 (displayed according to the FORMAT statement).

Because of the DROP= data set option, SAS does not write the value of the index variable i to the output data set. Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that SAS created the first observation as claimed. Of course, the other observations are created just as the first one was created as described above.
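The walkthrough above can be tried out with a sketch like the following. The variable names, the data set name cdinvest, the 5000 starting value, and the first record's rate (0.0198) and term (3 months) come from the text. The Type label 3-Month, the DOLLAR10.2 format, and the use of a single data line are assumptions, since the lesson's full list of rates is not reproduced here.

```sas
/* Sketch only: one CD record, compounded monthly. */
DATA cdinvest (DROP = i);
    INPUT Type $ AnnualRate Months;
    Investment = 5000;                         /* initial deposit        */
    DO i = 1 TO Months;                        /* one pass per month     */
        Investment + (AnnualRate / 12) * Investment;
    END;
    FORMAT Investment DOLLAR10.2;              /* assumed display format */
    DATALINES;
3-Month 0.0198 3
;
RUN;

PROC PRINT DATA = cdinvest NOOBS;
RUN;
```

With this single record, Investment should be displayed as $5,024.79, in line with the hand calculation of 5024.7908 above. Additional CDs can be added as further data lines.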

18.4 - Conditionally Executing Do Loops

As you now know, the iterative DO loop requires that you specify the number of iterations for the DO loop. However, there are times when you want to execute a DO loop until a condition is reached or while a condition exists, but you don't know how many iterations are needed. That's when the DO UNTIL loop and the DO WHILE loop can help save the day!

In this section, we'll first learn about the DO UNTIL and DO WHILE loops. Then, we'll look at another form of the iterative DO loop that combines features of both conditional and unconditional DO loops.

The DO UNTIL Loop

When you use a DO UNTIL loop, SAS executes the DO loop until the expression you've specified is true. Here's the general form of a DO UNTIL loop:

   DO UNTIL (expression);
      action statements;
   END;

where expression is any valid SAS expression enclosed in parentheses. The key thing to remember is that the expression is not evaluated until the bottom of the loop. Therefore, a DO UNTIL loop always executes at least once. As soon as the expression is determined to be true, the DO loop does not execute again.

Example 18.7. Suppose you want to know how many years it would take to accumulate $50,000 if you deposit $1200 each year into an account that earns 5% interest. The following program uses a DO UNTIL loop to perform the calculation for us:

Recall that the expression in the DO UNTIL statement is not evaluated until the bottom of the loop. Therefore, the DO UNTIL loop executes at least once. On the first iteration, the value variable is increased by 1200, or in this case, set to 1200. Then, the value variable is updated by calculating 1200 + 1200*0.05 to get 1260. Then, the year variable is increased by 1, or in this case, set to 1. The first observation, for which year = 1 and value = 1260, is then written to the output data set called investment. Having reached the bottom of the DO UNTIL loop, the expression (value >= 50000) is evaluated to determine if it is true. Since value is just 1260, the expression is not true, and so the DO UNTIL loop is executed once again. The process continues as described until SAS determines that value is at least 50000 and therefore stops executing the DO UNTIL loop.

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that it would take 23 years to accumulate at least $50,000.
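The DO UNTIL program described above is not shown in this section. A sketch that follows the description, using the names investment, value, and year from the text and assuming sum statements for the three updates, might be:

```sas
/* Sketch only: condition is checked at the bottom of the loop. */
DATA investment;
    DO UNTIL (value >= 50000);
        value + 1200;              /* yearly deposit              */
        value + value * 0.05;      /* add 5% interest             */
        year + 1;                  /* count the year              */
        OUTPUT;                    /* one observation per year    */
    END;
RUN;

PROC PRINT DATA = investment NOOBS;
RUN;
```

The last observation written should be the first one whose value is at least 50000.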

The DO WHILE Loop

When you use a DO WHILE loop, SAS executes the DO loop while the expression you've specified is true. Here's the general form of a DO WHILE loop:

   DO WHILE (expression);
      action statements;
   END;

where expression is any valid SAS expression enclosed in parentheses. An important difference between the DO UNTIL and DO WHILE statements is that the DO WHILE expression is evaluated at the top of the DO loop. If the expression is false the first time it is evaluated, then the DO loop doesn't even execute once.

Example 18.8. The following program attempts to use a DO WHILE loop to accomplish the same goal as the program above, namely to determine how many years it would take to accumulate $50,000 if you deposit $1200 each year into an account that earns 5% interest:

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that ... OOPS! There is no output! The program fails, because in a DO WHILE loop, the expression, in this case (value >= 50000), is evaluated at the top of the loop. Since value is set to missing before the first iteration of the DATA step, SAS can never enter the DO WHILE loop. Therefore, the code proves to be ineffective. Review the log to convince yourself that the investtwo data set contains no observations, because the DO WHILE loop was unable to execute.
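A sketch of the failing version, assuming the same statements as in the DO UNTIL sketch with only the keyword changed, shows why nothing is written:

```sas
/* Sketch only: the condition is false the first time it is checked. */
DATA investtwo;
    DO WHILE (value >= 50000);     /* evaluated at the top of the loop */
        value + 1200;              /* never reached                    */
        value + value * 0.05;
        year + 1;
        OUTPUT;                    /* never reached                    */
    END;
RUN;
```

Because the step contains an OUTPUT statement that never executes, SAS writes no observations at all, and the log should report that investtwo has 0 observations.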

Example 18.9. Now, the following program correctly uses a DO WHILE loop to determine how many years it would take to accumulate $50,000 if you deposit $1200 each year into an account that earns 5% interest:

Note that there are just three differences between this program and that of the successful program in Example 18.7 that uses the DO UNTIL loop: i) The value variable is initialized to 0; ii) UNTIL has been changed to WHILE; and iii) the expression to be checked is now (value < 50000). Because value is set to 0 and is therefore less than 50000 at the outset, SAS can now enter the DO WHILE loop to perform our desired calculations.

The calculations proceed as before. First, the value variable is updated by calculating 0 + 1200, to get 1200. Then, the value variable is updated by calculating 1200 + 1200*0.05 to get 1260. Then, the year variable is increased by 1, or in this case, set to 1. The first observation, for which year = 1 and value = 1260, is then written to the output data set called investthree. SAS then returns to the top of the DO WHILE loop, to determine if the expression (value < 50000) is true. Since value is just 1260, the expression is true, and so the DO WHILE loop executes once again. The process continues as described until SAS determines that value is at least 50000 and therefore stops executing the DO WHILE loop.

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that this program also determines that it would take 23 years to accumulate at least $50,000.
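The three differences listed above can be seen in a sketch like this one, which again assumes sum statements for the updates:

```sas
/* Sketch only: the corrected DO WHILE version. */
DATA investthree;
    value = 0;                     /* i) initialize value           */
    DO WHILE (value < 50000);      /* ii) WHILE, iii) reversed test */
        value + 1200;
        value + value * 0.05;
        year + 1;
        OUTPUT;
    END;
RUN;

PROC PRINT DATA = investthree NOOBS;
RUN;
```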

Using Conditional Clauses in an Iterative DO Loop

You have now seen how the DO WHILE and DO UNTIL loops enable you to execute statements repeatedly, but conditionally so. You have also seen how the iterative DO loop enables you to execute statements a set number of times unconditionally. Now, we'll put the two together to create a form of the iterative DO loop that executes DO loops conditionally as well as unconditionally.

Example 18.10. Suppose again that you want to know how many years it would take to accumulate $50,000 if you deposit $1200 each year into an account that earns 5% interest. But this time, suppose you also want to limit the number of years that you invest to 15 years. The following program uses a conditional iterative DO loop to accumulate our investment until we reach 15 years or until the value of our investment reaches at least 50000, whichever comes first:

Note that there are just two differences between this program and that of the program in Example 18.7 that uses the DO UNTIL loop: i) The iteration i = 1 to 15 has been inserted into the DO UNTIL statement; and ii) because the index variable i is created for the DO loop, it is dropped before writing the contents from the program data vector to the output data set investfour.

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that, in this case, the 15 years comes first. That is, the portion of the DO statement that tells SAS to stop executing the DO loop is the iterative i = 1 to 15 part.
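A sketch of the combined form, using the name investfour from the text and otherwise the same assumed statements as before, might be:

```sas
/* Sketch only: stops at 15 iterations or at value >= 50000. */
DATA investfour (DROP = i);
    DO i = 1 TO 15 UNTIL (value >= 50000);
        value + 1200;
        value + value * 0.05;
        year + 1;
        OUTPUT;
    END;
RUN;

PROC PRINT DATA = investfour NOOBS;
RUN;
```

With deposits of 1200, value stays well below 50000 through year 15, so the loop should end because i runs out, leaving 15 observations.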

Example 18.11. Suppose this time that you want to know how many years it would take to accumulate $50,000 if you deposit $3600 each year into an account that earns 5% interest. Suppose again that you want to limit the number of years that you invest to 15 years. The following program uses a conditional iterative DO loop to accumulate our investment until we reach 15 years or until the value of our investment reaches at least 50000, whichever comes first:

There is just one difference between this program and the previous program: the amount by which value is increased each year has been changed from 1200 to 3600. Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that, this time, the $50,000 comes first. That is, the portion of the DO statement that tells SAS to stop executing the DO loop is the conditional (value >= 50000) part.
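For this example, a sketch only needs the deposit changed. The data set name investfive is an assumption, since the text does not name it.

```sas
/* Sketch only: same loop with a yearly deposit of 3600. */
DATA investfive (DROP = i);        /* hypothetical data set name */
    DO i = 1 TO 15 UNTIL (value >= 50000);
        value + 3600;
        value + value * 0.05;
        year + 1;
        OUTPUT;
    END;
RUN;

PROC PRINT DATA = investfive NOOBS;
RUN;
```

Working the arithmetic by hand suggests that value first reaches 50000 in year 11, so the output should stop there rather than at year 15.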

The two examples of using conditional clauses with an iterative DO loop that we looked at involved using a DO UNTIL loop. We alternatively could have used a DO WHILE loop. The main thing to keep in mind is that, as before, the UNTIL expression is evaluated at the bottom of the DO loop, so the DO loop always executes at least once. The WHILE expression is evaluated before the execution of the DO loop. So, if the condition is not true, the DO loop never executes.
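As an illustration of the WHILE alternative, the following sketch rewrites the 3600 deposit case with a WHILE clause. The data set name investsix is hypothetical, and the explicit initialization of value mirrors the DO WHILE example earlier in this section.

```sas
/* Sketch only: WHILE clause is checked before each iteration. */
DATA investsix (DROP = i);         /* hypothetical data set name */
    value = 0;
    DO i = 1 TO 15 WHILE (value < 50000);
        value + 3600;
        value + value * 0.05;
        year + 1;
        OUTPUT;
    END;
RUN;
```

Here the two forms should produce the same observations, because value is below 50000 before the loop starts. They would differ only if the condition were already met on entry, in which case the WHILE form would write nothing and the UNTIL form would still execute once.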

18.5 - Creating Samples

Because a DO loop executes statements iteratively, it provides an easy way to select a sample of observations from a large data set. Let's take a look at an example!

Example 18.12. The following program uses an iterative DO loop and the SET statement's POINT= option to select every 100th observation from the permanent data set called stat481.log11 that contains 8,624 observations:

Let's work our way through the code. The DO statement tells SAS to start at 100, increase i by 100 each time, and to end at 8600. That is, SAS will execute the DO loop when the index variable i equals 100, 200, 300, ..., 8600.

Now the SET statement contains an option that we've not seen before, namely the POINT= option. The POINT= option tells SAS not to read the stat481.log11 data set sequentially as is done by default, but rather to read the observation number specified by the POINT= option directly from the data set. For example, when i = 100, and therefore POINT = 100, SAS reads the 100th observation in the stat481.log11 data set. And when i = 3200, and therefore POINT = 3200, SAS reads the 3200th observation in the stat481.log11 data set.

The OUTPUT statement, of course, tells SAS to write to the output data set the observation that has been selected. If we did not place the OUTPUT statement within the DO loop, the resulting data set would contain only one observation, that is, the last observation read into the program data vector.

The STOP statement, which is new to us, is necessary because we are using the POINT= option. As you know, the DATA step by default continues to read observations until it reaches the end-of-file marker in the input data. Because the POINT= option reads only specified observations, SAS cannot read an end-of-file marker as it would if the file were being read sequentially. The STOP statement tells SAS to stop processing the current DATA step immediately and to resume processing statements after the end of the current DATA step. It is the use of the STOP statement, therefore, that keeps us from sending SAS into the no-man's land of continuous looping.
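Putting the DO, SET, OUTPUT, and STOP statements together, a sketch of the sampling step might look like this. The library and data set stat481.log11 come from the text. The LIBNAME path and the output data set name sample are assumptions.

```sas
/* Sketch only: direct access to every 100th observation. */
LIBNAME stat481 'C:\your\folder';      /* hypothetical location     */

DATA sample;                           /* assumed data set name     */
    DO i = 100 TO 8600 BY 100;
        SET stat481.log11 POINT = i;   /* read observation number i */
        OUTPUT;                        /* keep the selected row     */
    END;
    STOP;                              /* prevent endless looping   */
RUN;

PROC PRINT DATA = sample;
RUN;
```

Note that the variable named in the POINT= option is temporary, so i should not appear in the output data set even without a DROP= option.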

Now, right-click to download and save the stat481.log11 data set in a convenient location on your computer. Launch the SAS program, and edit the LIBNAME statement so that it reflects the location in which you saved the data set. Then, run the program and review the output from the PRINT procedure to see the selected observations. You shouldn't be surprised to see that the sample data set contains 86 observations, as the iterative DO loop executes 8600 divided by 100, or 86 times.

Note! It is important to emphasize that the method we illustrated here for selecting a sample from a large data set has nothing random about it. That is, we selected a patterned sample, not a random sample, from a large data set. That's why this section is called Creating Samples, not Creating Random Samples. We'll learn how to select a random sample from a large data set in Stat 482.

18.6 - Summary

In this lesson, we explored four different kinds of loops — the iterative DO loop, the DO UNTIL loop, the DO WHILE loop, as well as an iterative DO loop with a conditional clause. We looked at many different applications of DO loops as well. The following homework will give you more practice with DO loops.

Homework

Directions. Type up your answers to the following problems in a Word file named homework18_yourPSUloginid. (So for example, I would submit mine as homework18_bel.) Copy, paste and label your SAS program code, your SAS log window, and the resulting output into your Word document. Once you have completed the homework problems in this lesson, upload the file to the Lesson #18 Homework Dropbox.

1. Write a program to create a data set that when printed looks like the following multiplication table:

Some hopefully helpful hints:

  • Your program should need just one DATA step containing one iterative DO loop and then, of course, one PRINT procedure.
  • If you think about it, each row of the multiplication table is just the set of multiples for the digit that heads that row. For example, for the row that is headed by i = 2, the row contains the multiples of two, that is, 2, 4, 6, ..., 18. So, you should be able to extend the code we used for creating multiples of five, so that it creates multiples of one, multiples of two, ..., up to multiples of nine.
  • As you know, SAS doesn't care for variable names that begin with numbers, such as 1, 2, and so on. Therefore, you can use valid variable names, like v1, v2, ... in conjunction with labels 1, 2, ... to create the appropriate column headings for the multiplication table.

2. Write a program to determine the number of months it would take to earn $10,000 if you saved $300 each month in an account earning 6 percent annually. When you print the data set that you create, it should look like this: