Keeping only the variables you want

 

By default, SAS writes every variable from the input data set(s), plus every new variable created in the data step, to each data set the data step creates. If you don't want all of these variables, you can tell SAS which ones to keep or drop in one of several ways.

Using the KEEP or DROP statement

  • The KEEP statement
    • names variables you want to keep in the output data set(s)
              keep id test1-test10 av;
    • applies to all data sets being created in the data step
    • non-executable (can be placed anywhere in the data step)
              data males females;  /* creating 2 temporary SAS data sets */
                                   /* from one input SAS data set */
                set persons;
                keep id age mstat;
                if sex = 1 then output males;
                else if sex = 2 then output females;
              run;
  • The DROP statement
    • names variables you want to omit from the data set(s) being created
              drop test9 test10 i ;
    • applies to all data sets being created in the data step
    • non-executable (can be placed anywhere in the data step)
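
As an illustration of the DROP statement in a complete data step, here is a minimal sketch. It assumes a hypothetical input data set named scores that contains id and test1-test10; the output name scores_av and the variable n_miss are also made up for this example. The loop index i is a typical variable to drop, since it is only needed while the step runs.

              data scores_av;
                set scores;                  /* hypothetical input: id, test1-test10 */
                array t{10} test1-test10;
                n_miss = 0;                  /* hypothetical: count of missing test scores */
                do i = 1 to 10;
                  if t{i} = . then n_miss = n_miss + 1;
                end;
                av = mean(of test1-test10);  /* average of the non-missing scores */
                drop test9 test10 i;         /* these are used above but not written out */
              run;

Because DROP is non-executable, test9, test10 and i remain available to every statement in the step; they are only left out when the observation is written to scores_av.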

 

Note that the KEEP and DROP statements simply give you two ways to accomplish the same thing. The one you choose will usually be the one that requires the least typing.
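
To make the equivalence concrete, here is a small sketch. It assumes, purely for illustration, that persons contains exactly the four variables id, age, mstat and sex; the output names subset_keep and subset_drop are hypothetical.

              data subset_keep;
                set persons;
                keep id age mstat;   /* list what you want */
              run;

              data subset_drop;
                set persons;
                drop sex;            /* list what you don't want */
              run;

Under that assumption both data sets end up with the same three variables, and DROP is the shorter one to type. If persons had many more variables, KEEP would likely be the shorter choice.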

 

Also note that the OUTPUT statement as used above writes the current observation to either the data set males or the data set females. Using an OUTPUT statement anywhere in a data step cancels the automatic output of each observation at the end of the step, which is otherwise the default.
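
One consequence worth spelling out: once a data step contains an OUTPUT statement, an observation is written only when an OUTPUT statement actually executes for it. In the males/females example, an observation whose sex is neither 1 nor 2 (a missing value, say) is written to neither data set. One possible way to catch such observations, using a third data set with the hypothetical name unknown, is sketched here:

              data males females unknown;   /* unknown is a hypothetical third data set */
                set persons;
                keep id age mstat;
                if sex = 1 then output males;
                else if sex = 2 then output females;
                else output unknown;         /* any other value of sex, including missing */
              run;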

 

Using the KEEP= or DROP= data set options

  • The KEEP= data set option
    • placed in parentheses following data set name and applies to that data set only
    • if following output data set name, KEEP= names variables to be kept
    • if following input data set name, KEEP= names variables available for processing in the data step
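                data males females;
                  /* sex must be read in because the IF statements use it */
                  set persons(keep= id age mstat sex);
                  /* the KEEP statement then leaves sex out of both output data sets */
                  keep id age mstat;
                  if sex = 1 then output males;
                  else if sex = 2 then output females;
                run;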
    • can be used in proc steps
               proc print data= alpha(keep= a b d); 
               run;
      
               proc freq data= temp(drop= id i j k); 
               run;
      
               proc sort data= test(keep= id a b c) 
                          out= s_test; 
                 by id; 
               run;

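  • The DROP= data set option works like KEEP= except that the named variables are excluded: following an output data set name, they are not written to that data set; following an input data set name, they are not available for processing in the data step.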
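
The examples above put KEEP= only on input data sets, so here is a hedged sketch of the remaining cases: KEEP= on output data sets, and DROP= on both sides. It reuses the data set and variable names from the examples above; the output name temp2 is hypothetical.

              /* KEEP= on output data sets: each data set gets its own list */
              data males(keep= id age) females(keep= id mstat);
                set persons;
                if sex = 1 then output males;
                else if sex = 2 then output females;
              run;

              /* DROP= on input versus output */
              data temp2(drop= id);       /* id is read and usable in the step, but not written */
                set temp(drop= i j k);    /* i, j and k are never read into the step */
              run;

This is the main thing the data set options can do that the KEEP and DROP statements cannot: the statements apply to every data set being created, while the options let each data set have a different variable list.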

 

 


Questions or comments? If you are affiliated with the Carolina Population Center, send them to Phil Bardsley.