
UNC Carolina Population Center

 

Creating a household-level variable from person-level data


A person-level data set derived from U.S. Census data

  • H_SEQ is the household sequence number
  • PERSON is the person number within the household
  • each observation is a person
  • the original data set consists of 132,324 persons from the 50,785 households interviewed in the Current Population Survey of March 1999.

H_SEQ  PERSON  AGE  MARITL  SEX  HGA  RACE  PERRP

    1       1   31       1    1   39     1      1
    1       2   36       1    2   44     1      3
    1       3    7       7    1    0     1      4
    1       4    2       7    2    0     1      4

    2       1   31       7    2   39     1      1
    2       2   49       5    2   39     1      6
    2       3   10       7    1    0     1      4

    3       1   50       1    1   43     1      1
    3       2   52       1    2   44     1      3
    3       3    8       7    2    0     1      4
    3       4   10       7    1    0     1      4
    3       5   46       6    2   35     1     12

    4       1   71       1    1   45     1      1
    4       2   68       1    2   43     1      3

    5       1   67       4    2   31     1      2

      .
      .
      .
      .


The DATA step below creates a household-level variable, MEMLT18, from the person-level data shown above. MEMLT18 is the number of household members under 18 years of age. The step reads the permanent SAS data set percps99, which has one observation per person, and writes a temporary SAS data set named hh with one observation per household and two variables: H_SEQ and MEMLT18. Because the step uses a BY statement, percps99 must already be sorted (or indexed) by H_SEQ.

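libname in '/afs/isis/depts/cpc/computer/stone/data/class01/';

data hh(keep=h_seq memlt18);
  set in.percps99(keep=h_seq age);
  by h_seq;                            /* creates first.h_seq and last.h_seq */

  retain memlt18 0;                    /* keep the running count across observations */
  if first.h_seq then memlt18=0;       /* first person in household: reset the count */
  if age<18 then memlt18=memlt18+1;    /* count this person if under 18 */
  if last.h_seq then output;           /* last person in household: write one record */

  label memlt18='hh members < 18 yrs';
run;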
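
To try the technique without access to percps99, the following is a minimal, self-contained sketch that types in the first five households from the listing above and applies the same BY-group logic. The data set names persons and hh_toy are made up for this illustration. Two details differ from the step above and are assumptions rather than part of the original program: a sum statement (memlt18 + 1), which retains the variable implicitly, replaces the explicit RETAIN, and a missing-value guard is added because SAS treats a missing AGE as smaller than any number, so age<18 alone would count it.

```sas
/* Toy person-level data: the first five households from the listing above. */
/* The data set names persons and hh_toy are hypothetical.                  */
data persons;
  input h_seq person age;
  datalines;
1 1 31
1 2 36
1 3 7
1 4 2
2 1 31
2 2 49
2 3 10
3 1 50
3 2 52
3 3 8
3 4 10
3 5 46
4 1 71
4 2 68
5 1 67
;
run;

/* BY-group processing needs the input sorted by the BY variable. */
proc sort data=persons;
  by h_seq;
run;

data hh_toy(keep=h_seq memlt18);
  set persons(keep=h_seq age);
  by h_seq;

  /* Reset the counter at the first person of each household. */
  if first.h_seq then memlt18 = 0;

  /* Sum statement: memlt18 is retained automatically.              */
  /* Assumption: a missing age should not be counted as under 18.   */
  if not missing(age) and age < 18 then memlt18 + 1;

  /* Write one observation per household, at its last person. */
  if last.h_seq then output;

  label memlt18 = 'hh members < 18 yrs';
run;

proc print data=hh_toy noobs label;
run;
```

Run against these 15 rows, hh_toy should contain five observations, with MEMLT18 equal to 2, 1, 2, 0 and 0 for households 1 through 5.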




Questions or comments?  If you are affiliated with the Carolina Population Center, send them to Phil Bardsley.