A simple example

Review of the data step and how it works

 

The most important things about the data step you can learn from this example:

               data alpha;
                infile datalines;
                input a b c;
                d = c - b;
                datalines;
               1 5 10
               0 8 7
               1 4 6
               ;
               run;

               proc print data= alpha;
               run;
The lines from data alpha; through the first run; form a simple SAS data step. This data step creates a temporary SAS data set named "alpha" in the default WORK library.

 

Questions:

1. What is meant by a "temporary SAS data set"? How many observations and how many variables are in the data set alpha?


2. What does the output from the proc print procedure look like?


3. What is meant by the Program Data Vector (PDV)?


4. When (after the execution of which statement in the data step) is an observation output?


5. What happens after an observation is output?


6. What are the three most important default actions of a data step like the one above?


7. What would you change in the program to make it read one thousand records placed after the DATALINES statement instead of three?


8. What would you change in the program if your raw data records were in a file separate from the program?  Assume your data file has the following path:

"C:\smith\project\data\part1.dat"


9. Describe the result if you switched the positions of the statements:

       input a b c;
       d= c - b;
to read:
       d= c - b;
       input a b c;


10. What would you add to the program if you wanted it to create a permanent data set alpha instead of a temporary data set?


Answers:

1. A temporary SAS data set exists only for the duration of your SAS session (or of your SAS job, if you are running SAS in batch). It is stored in the WORK library, a temporary directory on the hard drive of the machine that is processing the SAS program, and SAS deletes this directory at the end of your SAS session.  The SAS data set "alpha" has 3 observations and 4 variables.
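
One way to check these counts yourself is to ask SAS to describe the data set. The following is a minimal sketch, assuming it is run in the same SAS session as the data step above and that the one-part name alpha refers to the default WORK library (that is, no USER library has been defined):

```sas
/* Describe the temporary data set created by the data step above.   */
/* work.alpha and alpha name the same data set under the default     */
/* settings assumed here.                                            */
proc contents data=work.alpha;
run;
```

The PROC CONTENTS output reports, among other things, the number of observations and the number of variables in the data set.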


 

2. Three rows (corresponding to the three observations) and four columns (corresponding to the four variables) showing the values of the variables on each observation. The OBS column shows the observation number.
     OBS    A    B    C    D
      1     1    5   10    5
      2     0    8    7   -1
      3     1    4    6    2

 


 

3. The PDV is an area in memory where SAS builds each observation, one at a time, before the observation is output to the data set.

 

You can picture the PDV on the first iteration of the data step (the first time through) like this. A period (.) denotes a SAS missing value.

Before input a b c; is executed:

         A     B     C     D
      -------------------------
      |     |     |     |     |
      |  .  |  .  |  .  |  .  |
      |     |     |     |     |
      -------------------------

After input a b c; is executed:

         A     B     C     D
      -------------------------
      |     |     |     |     |
      |  1  |  5  | 10  |  .  |
      |     |     |     |     |
      -------------------------

After d = c - b; is executed:

         A     B     C     D
      -------------------------
      |     |     |     |     |
      |  1  |  5  | 10  |  5  |
      |     |     |     |     |
      -------------------------
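
If you would like to watch the PDV change rather than picture it, one option is to write its contents to the SAS log with PUT _ALL_ statements. This is only a sketch: the three PUT statements and their label strings are additions for illustration and are not part of the original program.

```sas
data alpha;
  infile datalines;
  put 'Before INPUT: ' _all_;   /* PDV at the top of each iteration  */
  input a b c;
  put 'After INPUT:  ' _all_;   /* a, b, c filled in; d still missing */
  d = c - b;
  put 'After assign: ' _all_;   /* d now computed                     */
  datalines;
1 5 10
0 8 7
1 4 6
;
run;
```

PUT _ALL_ also lists the automatic variables _N_ and _ERROR_. Note that the first PUT statement runs once more than the others: on the fourth pass the INPUT statement finds no more data and the data step stops at that point.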

 


 

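4. After d = c - b; is executed. It is the last executable statement in the data step, and SAS outputs the observation automatically at the bottom of the data step.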
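
To make the timing visible, the automatic output can be written out explicitly. In this sketch the OUTPUT statement is an addition for illustration; when a data step contains an OUTPUT statement, SAS no longer performs the automatic output at the bottom of the step, so the resulting data set is the same as in the original program.

```sas
data alpha;
  infile datalines;
  input a b c;
  d = c - b;
  output;   /* explicit OUTPUT, at the point where the automatic one would occur */
  datalines;
1 5 10
0 8 7
1 4 6
;
run;
```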

 


 

5. The variables a, b, c and d in the PDV are reset to SAS missing, control returns to the top of the data step, and the statement input a b c; is executed again. When there are no more data to read in, the data step ends.

 


 

6. Default actions:
  • automatic output of an observation
  • automatic reset of variables in the PDV to missing
  • automatic return to the top of the data step to read the next data record

 


 

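7. Nothing. The data step automatically repeats once for each data record until there are no more data to read, so the same statements read three records or one thousand.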

 


 

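8. You would remove the DATALINES statement and the data records, add a filename statement that associates a name of your choice (here "in") with the data file, and use that name in the infile statement. This program would do it: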
filename in "C:\smith\project\data\part1.dat";

data alpha;
 infile in;
 input a b c;
 d= c - b;
run;
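
As a variation on the program above, the INFILE statement can also name the data file directly as a quoted path, with no FILENAME statement. A minimal sketch, assuming the same file path as in the question:

```sas
data alpha;
  /* The quoted path takes the place of the file reference "in". */
  infile "C:\smith\project\data\part1.dat";
  input a b c;
  d = c - b;
run;
```

A separate FILENAME statement is convenient when the same file is referred to in several steps, because the path then appears in only one place.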

 


 

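9. The value of D would be missing on all observations. On every iteration, b and c are still missing when d = c - b; is executed, because they are reset to missing at the top of the data step and the input statement has not yet read the data record.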
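
You can confirm this by running the program with the two statements switched. The sketch below is the original example with only that change, followed by the same PROC PRINT step:

```sas
data alpha;
  infile datalines;
  d = c - b;      /* b and c are still missing at this point */
  input a b c;    /* the record is read only after d is computed */
  datalines;
1 5 10
0 8 7
1 4 6
;
run;

proc print data=alpha;
run;
```

The SAS log for the data step should also contain a note saying that missing values were generated as a result of performing an operation on missing values.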

 


 

10. You would add a libname statement and use a two-part name for the data set to be written.  The libname statement associates a name of your choice (a "library reference," or "libref" for short) with a location that the operating system recognizes as a place to write the permanent SAS data set.  On a directory-based system, that location is simply a directory, and all the SAS files stored there constitute a "SAS data library."  Here is how a libname statement could be used to create a permanent SAS data set named alpha in the directory "C:\smith\project\data_sas\":
libname out 'C:\smith\project\data_sas\';
filename in 'C:\smith\project\data\part1.dat';

data out.alpha;
 infile in;
 input a b c;
 d= c - b;
run;
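
Because the data set is permanent, it can be used again in a later SAS session without rereading the raw data. A minimal sketch, assuming the data step above has already been run and the directory still exists; the libref "out" must be assigned again in each new session, although any valid name could be used:

```sas
/* Re-associate a libref with the directory that holds alpha. */
libname out 'C:\smith\project\data_sas\';

/* The two-part name out.alpha refers to the permanent data set. */
proc print data=out.alpha;
run;
```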

 

