Retained variables

 

What is a retained variable?

A retained variable

  • is not automatically reset to missing at the start of each iteration of the data step
  • keeps its value from the end of one iteration to the beginning of the next iteration of the data step
  • changes value only when the program assigns it a new one

The RETAIN statement

  • names the variables that should not be reset to missing before the next iteration of the data step (the "retained variables")
  • may give initial values, which apply to the first iteration of the data step
  • is non-executable (it can be placed anywhere in the data step, though its position can affect the order of variables in the output data set)
  • examples:
          retain x y i . ; * initialize x, y and i to missing *;
    
          retain x1-x5 0 name 'alice'; * initialize x1-x5 to zero *;
                             * and initialize name to 'alice' *;
    
          retain x1-x5 (0 1 2 3 4); * initialize x1 to zero, x2 to one, *;
                            * x3 to two, x4 to three, and x5 to four *;
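
As a hedged illustration of the points above, the sketch below carries the last non-missing value forward from one observation to the next. The data set name filled and the variables id, score and last_score are made up for this example; they are not part of the original notes.

     data filled;
       input id score;
       retain last_score;   /* hypothetical variable; no initial value, so it starts as missing */
       if score ne . then last_score = score;   /* changes only when the program assigns it */
     datalines;
     1 10
     2 .
     3 .
     4 25
     ;
     run;

Because last_score is retained, observations 2 and 3 should keep the value 10 that was assigned while reading observation 1, and observation 4 should replace it with 25.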

 

Retained variables are especially important when working with grouped observations. First we'll examine the concept with a simple example.

 

Example of a data step with a retained variable

     data alpha;
       input a b c;
       retain runtot 0; /* runtot will keep a running total of a, b, c */
       runtot= runtot + (a + b + c);  
     datalines;
     2 4 6
     3 1 5
     0 7 9
     8 5 4
     ;
     run;

     proc print data=alpha noobs;
     run;

 

The output from proc print above would look like this:

      A    B    C    RUNTOT

      2    4    6      12
      3    1    5      21
      0    7    9      37
      8    5    4      54
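
The same running total could also be written with a sum statement. This is an alternative sketch, not the method used in the example above: a variable on the left of a sum statement is retained automatically and starts at 0, so no RETAIN statement is needed. The data set name alpha2 is hypothetical.

     data alpha2;
       input a b c;
       runtot + (a + b + c);   /* sum statement: runtot is retained implicitly and starts at 0 */
     datalines;
     2 4 6
     3 1 5
     0 7 9
     8 5 4
     ;
     run;

One difference worth knowing: if the expression is missing, the sum statement leaves runtot unchanged, whereas the assignment runtot = runtot + (a + b + c) would set runtot to missing for that observation and every one after it.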

 

How a retained variable behaves during data step execution

To understand the values of the retained variable RUNTOT on the output observations, picture the Program Data Vector (PDV) during data step execution as follows.

Before input a b c; is executed the first time:

         A     B     C   RUNTOT
      -------------------------
      |     |     |     |     |
PDV   |     |     |     |  0  |
      |     |     |     |     |
      -------------------------

After input a b c; is executed the first time:

         A     B     C   RUNTOT
      -------------------------
      |     |     |     |     |
PDV   |  2  |  4  |  6  |  0  |
      |     |     |     |     |
      -------------------------

After runtot= runtot + (a + b + c); is executed the first time:

         A     B     C   RUNTOT
      -------------------------
      |     |     |     |     |
PDV   |  2  |  4  |  6  | 12  |              1st obs output:  A  B  C  RUNTOT
      |     |     |     |     |                               2  4  6   12
      -------------------------

After input a b c; is executed the second time:

         A     B     C   RUNTOT
      -------------------------
      |     |     |     |     |
PDV   |  3  |  1  |  5  | 12  |
      |     |     |     |     |
      -------------------------

After runtot= runtot + (a + b + c); is executed the second time:

         A     B     C   RUNTOT
      -------------------------
      |     |     |     |     |
PDV   |  3  |  1  |  5  | 21  |              2nd obs output:  A  B  C  RUNTOT
      |     |     |     |     |                               3  1  5   21
      -------------------------
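
One way to check this picture, offered as a sketch rather than part of the original example, is to write the PDV values to the SAS log with PUT statements. The data set name alpha_trace and the label text are made up for illustration.

     data alpha_trace;
       input a b c;
       retain runtot 0;
       put 'After input:  ' a= b= c= runtot=;   /* runtot still holds the retained value */
       runtot = runtot + (a + b + c);
       put 'After update: ' a= b= c= runtot=;   /* runtot now includes this observation */
     datalines;
     2 4 6
     3 1 5
     0 7 9
     8 5 4
     ;
     run;

For the first observation the log should show runtot=0 and then runtot=12; for the second, runtot=12 and then runtot=21, matching the diagrams above.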


Question: What would be the result if you omitted the RETAIN statement in the data step above?

Answer: Without the RETAIN statement, RUNTOT would be reset to missing at the start of every iteration, so the assignment would compute . + (a + b + c), which is missing. RUNTOT would therefore be missing on every observation.
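
To see this for yourself, the sketch below repeats the earlier data step with the RETAIN statement removed. The data set name alpha_noretain is hypothetical.

     data alpha_noretain;
       input a b c;
       runtot = runtot + (a + b + c);   /* runtot is missing here on every iteration */
     datalines;
     2 4 6
     3 1 5
     0 7 9
     8 5 4
     ;
     run;

     proc print data=alpha_noretain noobs;
     run;

RUNTOT should print as . on every observation, and the log should include a note that missing values were generated as a result of performing an operation on missing values.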

