Again, once you've read your data into a SAS data set, you probably want to do something with it. A common thing to do is to change the original data in some way in an attempt to answer a research question of interest to you. In the last lesson, we learned how to use assignment statements (and functions) to add some information to all of the observations in the data set. In this lesson, we will learn how to use if-then-else statements to add some information to some but not all of the observations in your data set.
Upon completing this lesson, you should be able to do the following:
In order to complete the lesson you should:
In this lesson, we investigate a number of examples that illustrate how to change a subset of the observations in our data set. In SAS, the most common way to select observations that meet a certain condition is to utilize an if-then statement. The basic form of the statement is:
IF (condition is true) THEN (take this action);
In the previous lesson, we looked at an example in which the condition was:
avg < 65
and the action was:
status = 'Failed'
For each observation, SAS evaluates the condition that follows the keyword IF — in this case, is the student's average less than 65? — to determine if it is true or false. If the condition is true, SAS takes the action that follows the keyword THEN — in this case, change the student's status to 'Failed.' If the conditon is false, SAS ignores the THEN clause and proceeds to the next statement in the DATA step. The condition always involves a comparison of some sort, and the action taken is typically some sort of assignment statement.
Example 5.1. There is nothing really new here. You've already seen an if-then(-else) statement in the previous lesson. Our focus there was primarily on the assignment statement. Here, we'll focus on the entire if-then statement, including the condition. The following SAS program creates a character variable status, whose value depends on whether or not the student's first exam grade is less than 65:
First, note that we continue to work with the grades data set from the last lesson. Again, the data set contains student names (name), each of their four exam grades (e1, e2, e3, e4), their project grade (p1), and their final exam grade (f1). Then, launch and run the SAS program. Review the output from the print procedure to convince yourself that the values of the character variable status have been assigned correctly.
In the previous example, we used the less-than sign to make the comparison. We can use any of the standard comparison operators to make our comparisons as long as we follow the syntax that SAS expects, which is:
|Comparison||SAS syntax||Alternative SAS syntax|
|less than or equal to||<=||LE|
|greater than or equal to||>=||GE|
|not equal to||^=||NE|
|equal to one of a list||in||IN|
It doesn't really matter which of the two syntax choices you use. It's just a matter of preference. To convince yourself that you understand how to use the alternative SAS syntax though, replace the less-than sign (<) in the Example 5.1 program with the letters "LT" (or "lt"). Then, re-run the SAS program and review the output from the PRINT procedure to see that the program indeed performs as expected.
Example 5.2. The following SAS program uses the IN operator to identify those students who scored a 98, 99, or 100 on their project score. That is, students whose p1 value equals either 98, 99, or 100 are assigned the value 'Excellent' for the project variable:
Launch and run the SAS program and review the output from the PRINT procedure to convince yourself that the program performs as described.
A side comment. After being introduced to the comparison operators, students are often tempted to use the syntax EQ in an assignment statement. If you try it, you'll soon learn that SAS will hiccup at you. Assignment statements must always use the equal sign (=).
As the output from Example 5.1 illustrates, there may be occasions when you want to use an if-then-else statement instead of just an if-then statement. In that example, we told SAS only what to do if the condition following the IF keyword was true. By including an else statement, we can tell SAS what to do if the condition following the IF keyword is false.
Example 5.3. The following SAS program creates a character variable status, whose value is "Failed" IF the student's first exam grade is less than 65, otherwise (i.e., ELSE) the value is "Passed":
Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the values of the character variable status have been assigned correctly.
Note that, in general, using ELSE statements with IF-THEN statements can save resources:
For greater efficiency, you should construct your IF-THEN-ELSE statements with conditions of decreasing probabilities.
Example 5.4. This if-then-else stuff seems easy enough! Let's try creating another status variable for our grades data set, but this time let's allow its value to depend on the value of the student's fourth exam (e4) rather than the value of the student's first exam (e1):
Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the values of the character variable status have been assigned correctly. Oooops! What happened?! SAS assigned a "Failed" status to John Simon, seemingly because his exam score is missing. That's certainly one way of assigning grades, but it's probably not going to make John very happy.
The important point to remember is that SAS considers a missing value to be smaller than any other numerical value. That is, a missing value (.) is considered smaller than 12, smaller than 183, and even smaller than 0. Thus, we should stick to another good programming habit: always program for missing values. Say it to yourself over and over and over again ... always program for missing values ... until you remember it. It may save you alot of trouble down the road.
Example 5.5. Now, let's look at our SAS program again, but this time having written the program so that SAS is told to assign status a missing value (a blank space ' ' since it is a character variable) if e4 is missing (a period . since it is a numeric variable):
Launch and run the SAS program. Review the output from the PRINTprocedure to convince yourself that this time the values of the character variable status really have been assigned correctly. Note that this program also illustrates the use of more than one ELSE statement. You can use as many ELSE statements as necessary, as long as they are attached to a preceding IF-THEN statement.
In addition to the comparison operators that we learned previously, we can also use the following logical operators:
|Operation||SAS syntax||Alternative SAS syntax|
|are both conditions true?||&||AND|
|is either condition true?|||||OR|
|reverse the logic of a comparison||^ or ~||NOT|
You will want to use the AND operator to execute the THEN statement if both expressions that are linked by AND are true, such as here:
IF (p1 GT 90) AND (f1 GT 90) THEN performance = 'excellent';
You will want to use the OR operator to execute the THEN statement if either expression that is linked by the OR is true, such as here:
IF (p1 GT 90) OR (f1 GT 90) THEN performance = 'very good';
And, you will want to use the NOT operator in conjunction with other operators to reverse the logic of a comparison:
IF p1 NOT IN (98, 99, 100) THEN performance = 'not excellent';
Now when we look at examples using these logical operators, why stop at just two ELSE statements? Let's go crazy and program a bunch of them! One thing though — when we do, we have to be extra careful to make sure that our conditions are mutually exclusive. That is, we have to make sure that, for each observation in the data set, one and only one of the conditions holds. This most often means that we have to make sure that the endpoints of our intervals don't overlap in some way.
Example 5.6. The following SAS program illustrates the use of several mutually exclusive conditions within an if-then-else statement. The program uses the AND operator to define the conditions. Again, when comparisons are connected by AND, all of the comparisons must be true in order for the condition to be true.
First, inspect the program to make sure you understand the code. Then, launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the letter grades have been assigned correctly. Also note how the program in general, and the if-then-else statement in particular, is formatted in order to make the program easy to read. The conditions and assignment statements are aligned nicely in columns and parentheses are used to help offset the conditions. Whenever possible ... okay, make that always ... format (and comment) your programs. After all, you may actually need to use them again in a few years. Trust me ... you'll appreciate it then!
Oh, one more point. You may have noticed, after the condition that takes care of missing values, that the conditions appear in order from A, B, ... down to F. Is the instructor treating the glass as being half-full as opposed to half-empty? Hmmm ... actually, the order has to do with the efficiency of the statements. When SAS encounters the condition that is true for a particular observation, it jumps out of the if-then-else statement to the next statement in the DATA step. SAS thereby avoids having to needlessly evaluate all of the remaining conditions. Hence, we have ourselves another good programming habit ... arrange the order of your conditions (roughly speaking, of course!) in an if-then-else statement so that the most common one appears first, the next most common one appears second, and so on. You'll also need to make sure that your condition concerning missing values appears first in the IF statement, otherwise SAS may bypass it.
Example 5.7. In the previous program, the conditions were written using the AND operator. Alternatively, we could have just used straightforward numerical intervals. The following SAS program illustrates the use of alternative intervals as well as the alternative syntax for the comparison operators:
Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the letter grades have again been assigned correctly.
Example 5.8. Now, suppose an instructor wants to give bonus points to students who show some sign of improvement from the beginning of the course to the end of the course. Suppose she wants to add two points to a student's overall average if either her first exam grade is less than her third and fourth exam grade or her second exam grade is less than her third and fourth exam grade. (Don't ask why! I'm just trying to motivate something here.) The operative words here are "either" and "or". In order to accommodate the instructor's wishes, we need to take advantage of the OR comparison operator. When comparisons are connected by OR, only one of the comparisons needs to be true in order for the condition to be true. The following SAS program illustrates the use of the OR operator, the AND operator, and the use of the OR and AND operators together:
First, inspect the program to make sure you understand the code. In particular, note that logical comparisons that are enclosed in parentheses are evaluated as true or false before they are compared to other expressions. In this example:
Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that, where appropriate, two points were added to the student's average (avg) to get an adjusted average (adjavg). Also, note that we didn't have to worry about programming for missing values here, because the student's adjusted average (adjavg) would automatically be assigned missing if his or her average (avg) was missing. SAS calls this "propagation of missing values."
All of the if-then-else statement examples we've encountered so far involved only numeric variables. Our comparisons could just as easily involve character variables. The key point to remember when comparing character values is that SAS distinguishes between uppercase and lowercase letters. That is, character values must be specified in the same case in which they appear in the data set. We say that SAS is "case-sensitive." Character values must also be enclosed in quotation marks.
Example 5.9. Suppose our now infamous instructor wants to identify those students who either did not complete the course or failed. Because SAS is case-sensitive, any if-then-else statements written to identify the students have to check for those students whose status is 'failed' or 'Failed' or 'FAILED' or ... you get the idea. One rather tedious solution would be to check for all possible "typings" of the word "failed" and "incomp" (for incomplete). Alternatively, we could use the UPCASE function to first produce an uppercase value, and then make our comparisons only between uppercase values. The following SAS program takes such an approach:
Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the if-then-else statement that involves the creation of the variable action is inadequate while the one that uses the UPCASE function to create the variable action2 works like a charm.
By the way, when making comparisons that involve character values, you should know that SAS considers a missing character value (a blank space ' ') to be smaller than any letter, and so the good habit of programming for missing values holds when dealing with character variables as well.
All of the examples we've looked at so far have involved performing only one action for a given condition. There may be situations in which you want to perform more than one action.
Example 5.10. Suppose our instructor wants to assign a grade of zero to any student who missed the fourth exam, as well as notify the student that she has done so. The following SAS program illustrates the use of the DO-END clause to accommodate the instructors wishes:
The DO statement tells SAS to treat all of the statements it encounters as one all-inclusive action until a matching END appears. If no matching END appears, SAS will hiccup. Launch and run the SAS program, and review the output of the PRINT procedure to convince yourself that the program accomplishes what we claim.
In this lesson, we learned how to write if-then-else statements in order to change the contents of our SAS data set. The homework for this lesson will give you more practice with this technique so that you become even more familiar with how it works and can use them in your own SAS programming.
Directions. Type up your answers to the following question in a Word file named homework05_yourPSUid. Copy and paste and then label both your SAS program coding, the corresponding run in your log window, and the resulting output from SAS into your Word document. Once you have completed the homework problem, upload the file to the "Lesson #5 Homework Dropbox."
1. Dual-photon absorptiometry (DPA) is a method to measure bone mineral density and evaluate the quality of bone being considered for transplant. The DPA technique is expensive and often unavailable. Therefore, an alternative method is to use a bone score based on four less expensive bone quality measures: i) Singh index (singh), ii) CC ratio (ccratio), iii) calcar width (calcar), and iv) cortical shaft index (csi). The data stored in Bonescor2.dat were collected by Renee Smith of the Bone and Joint Research Lab at the VA Medical Center in Salt Lake City, UT. She used the data to compare the two methods of quantifying bone mineral density. We will use the data here to reinforce the concept of programming for missing values. Oh ... please bear with the details of this homework problem ... it seems longer than it really is.
a. First, read the bone data arising from 30 subjects that is stored in Bonescor2.dat into a temporary data set called bonescore1. The first column contains the variable singh, the second column contains the variable ccratio, the third column contains the variable csi, and the fourth column contains the variable calcar. The fifth column contains the bone score (bone) that Renee Smith formulated from the first four measures, and the sixth column contains the gold standard dual-photon absorptiometry (dpa) measures. When reading in the data, take note that there are a few missing values in the data set.
After, you've successfully read in the data, create four new variables in your bonescore1 data set: flag1, flag2, flag3, and flag4 according to the following specifications:
So, for example, flag1 should be assigned the value 1 if singh is less than or equal to 4, flag1 should be assigned the value 2 if singh is greater than 4, but less than or equal to 5, and flag1 should be assigned the value 3 if singh is greater than 5.
In writing the necessary if-then-else statements to create the four new variables intentionally fail to program for missing values. That is, for example, your if-then-else statement for the creation of flag1 should contain just three lines — one if condition, and two else-if conditions. Similarly, the if-then-else statements for the creation of flag2, flag3, and flag4 should each contain three lines. After you've created the four flag variables, create a fifth new variable called ourscore, which is merely the sum of the four flag variables. Print the bonescore1 data set, and observe that one of the values for flag1 and one of the values for flag3 have been improperly assigned. Why? Also note that the errors in flag1 and flag3 also cause errors in the ourscore variable. Why?
b. Now, let's try to fix the errors in our last if-then-else statements. Create a new data set called bonescore2 by re-reading the data contained in Bonescor2.dat. Copy your if-then-else statements from your bonescore1 data step into your bonescore2 data step. Edit the if-then-else statements by adding an else-if condition to the end of each one that supposedly programs for missing values. For example, add:
else if singh = . then flag1 = .;
to the end of your if-then-else statements that create the flag1 variable. Make a similar mistake while attempting to program for missing values when creating each new flag variable. Print the bonescore2 data set, and observe that one of the values for flag1 and one of the values for flag3 are still improperly assigned. Why is this the case? Again note that the errors in flag1 and flag3 have been propagated to the ourscore variable.
c. Okay, let's finally get it right! Create a new data set called bonescore3 by re-reading the data contained in Bonescor2.dat. Copy your if-then-else statements from your bonescore1 data step into your bonescore3 data step. Now, edit the if-then-else statements by placing the missing value conditions in the correct locations.
After completing this assignment, you should be able to appreciate not only the critical importance of programming for missing values, but doing it correctly!