
UNC Carolina Population Center

 

Retained variables


What is a retained variable?

A retained variable

  • is not automatically set to missing before the next iteration of the data step
  • keeps its value from the end of the current iteration to the beginning of the next iteration of the data step
  • changes value only when a statement written by the programmer changes it

The RETAIN statement

  • names the variables that should not be set to missing before the next iteration of the data step (the "retained variables")
  • can assign initial values (used in the first iteration of the data step)
  • is non-executable, so it can be placed anywhere in the data step
  • examples:

        retain x y i;                  /* no initial values: x, y, i start as missing */

        retain x1-x5 0 name 'alice';   /* x1-x5 all start at 0; name starts as 'alice' */

        retain x1-x5 (0 1 2 3 4);      /* x1=0, x2=1, x3=2, x4=3, x5=4 */
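A small pair of data steps can confirm how the RETAIN examples above assign initial values, and that RETAIN takes effect even when it follows the statement that updates the variable. This is a minimal sketch; the data set names DEMO and PLACED and the variable TOTAL are hypothetical, chosen only for illustration.

```sas
/* Initial values: with no INPUT or SET statement, this step runs once */
data demo;
  retain x1-x5 (0 1 2 3 4) name 'alice';
  put x1= x2= x3= x4= x5= name=;   /* write the initial values to the log */
run;

/* Placement: RETAIN is non-executable, so it applies even after the assignment */
data placed;
  input a;
  total = total + a;   /* running total of a */
  retain total 0;      /* still makes TOTAL start at 0 and carry over */
  cards;
1
2
3
;
run;

proc print data=placed;
run;
```

Because RETAIN is processed when the step is compiled, its position relative to the assignment does not matter; placing it near the top is simply easier to read.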


Retained variables are especially important when working with grouped observations. First, we'll examine the concept with a simple example.
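As an illustration of why grouped observations need a retained variable, here is a minimal sketch of a running total that restarts for each group. It assumes the data are already sorted by the grouping variable and uses the FIRST.variable indicator that a BY statement creates; the data sets VISITS and IDTOTALS and the variables ID, AMOUNT, and IDTOTAL are hypothetical.

```sas
/* Hypothetical grouped data, already sorted by id */
data visits;
  input id amount;
  cards;
1 10
1 20
2 5
2 15
2 30
;
run;

data idtotals;
  set visits;
  by id;                          /* creates FIRST.id and LAST.id */
  retain idtotal;                 /* keep idtotal from one observation to the next */
  if first.id then idtotal = 0;   /* restart the total at the start of each id */
  idtotal = idtotal + amount;
run;

proc print data=idtotals;
run;
```

Without the RETAIN statement, IDTOTAL would be reset to missing on every observation and the total could not accumulate within a group.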


Example of a data step with a retained variable

     data alpha;
       input a b c;
       retain runtot 0;   /* runtot keeps a running total of a+b+c across observations */
       runtot = runtot + (a+b+c);
       cards;
     2 4 6
     3 1 5
     0 7 9
     8 5 4
     ;
     run;

     proc print;
     run;

The output from PROC PRINT above looks like this:

      OBS    A    B    C    RUNTOT

        1    2    4    6      12
        2    3    1    5      21
        3    0    7    9      37
        4    8    5    4      54


How a retained variable behaves during data step execution

To understand the values of the retained variable RUNTOT in the output observations, picture the PDV (program data vector), the area of memory where SAS builds each observation, during data step execution as follows. A missing numeric value is shown as a period (.).

Before input a b c; is executed the first time:

         A     B     C    RUNTOT
      ----------------------------
PDV   |  .  |  .  |  .  |    0   |
      ----------------------------

After input a b c; is executed the first time:

         A     B     C    RUNTOT
      ----------------------------
PDV   |  2  |  4  |  6  |    0   |
      ----------------------------

After runtot=runtot + (a+b+c); is executed the first time:

         A     B     C    RUNTOT
      ----------------------------
PDV   |  2  |  4  |  6  |   12   |
      ----------------------------

At the end of the first iteration, the 1st observation is written to ALPHA:  A=2  B=4  C=6  RUNTOT=12
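Before input a b c; is executed the second time, A, B, and C have been reset to missing (.), but RUNTOT has kept its value of 12:

         A     B     C    RUNTOT
      ----------------------------
PDV   |  .  |  .  |  .  |   12   |
      ----------------------------

After input a b c; is executed the second time:

         A     B     C    RUNTOT
      ----------------------------
PDV   |  3  |  1  |  5  |   12   |
      ----------------------------

After runtot=runtot + (a+b+c); is executed the second time:

         A     B     C    RUNTOT
      ----------------------------
PDV   |  3  |  1  |  5  |   21   |
      ----------------------------

At the end of the second iteration, the 2nd observation is written to ALPHA:  A=3  B=1  C=5  RUNTOT=21

The third and fourth iterations proceed the same way, giving RUNTOT values of 37 and 54.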
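You can watch the PDV change during a real run by adding PUT statements to the example data step. The sketch below is one way to do this; the label text in the PUT statements is arbitrary. Each PUT writes the current PDV values of A, B, C, and RUNTOT to the SAS log, so the log lines for each iteration can be compared with the PDV pictures above.

```sas
data alpha;
  input a b c;
  retain runtot 0;
  /* _n_ is the automatic counter of data step iterations */
  put 'Iteration ' _n_ 'after INPUT:      ' a= b= c= runtot=;
  runtot = runtot + (a+b+c);
  put 'Iteration ' _n_ 'after assignment: ' a= b= c= runtot=;
  cards;
2 4 6
3 1 5
0 7 9
8 5 4
;
run;
```

The step stops when INPUT finds no more data lines, so the log shows two lines for each of the four iterations.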


Question: What would be the result if you omitted the RETAIN statement in the data step above?
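To check your answer, you can rerun the example with the RETAIN statement removed and compare the PROC PRINT output with the output shown earlier. Look at the notes in the SAS log as well. The data set name BETA is arbitrary.

```sas
/* Same data step as the example, but without the RETAIN statement */
data beta;
  input a b c;
  runtot = runtot + (a+b+c);
  cards;
2 4 6
3 1 5
0 7 9
8 5 4
;
run;

proc print data=beta;
run;
```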


Questions or comments?  If you are affiliated with the Carolina Population Center, send them to Phil Bardsley.