Testing for uniqueness of an identifier variable

 

A household-level data set derived from U.S. Census data

 

  • H_SEQ is the household sequence number, which is supposed to be unique.
  • Each observation is a household.
  • The original data set consists of 50,785 households interviewed in the March 1999 Current Population Survey.
H_SEQ   H_FAMINC   H_NUMPER   HG_REG   HRHTYPE   HSUP_WGT

    1          6          4        3         1     140484
    2         10          3        3         4     179294
    3         13          5        3         1     193890
    4          0          2        1         1      33756
    5          1          1        1         7     124633
    6          8          2        1         1     100164
    7          6          5        3         2     133469
      .
      .
      .


The program below uses the automatic variable FIRST.H_SEQ to determine whether H_SEQ is unique. SAS creates FIRST.H_SEQ, along with LAST.H_SEQ, whenever a BY statement names H_SEQ. The first observation of each H_SEQ BY group is written to the temporary data set unique; any other observations in the BY group (the duplicates) are written to the temporary data set dups. If H_SEQ is unique, dups will contain no observations. The household-level data set being examined is a permanent SAS data set named hhcps99 that has already been sorted by H_SEQ.

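   /* Point the libref IN at the folder holding the permanent data set */
   libname in "C:\data\class01\";

   data unique dups;
    set in.hhcps99;
    by h_seq;   /* creates FIRST.H_SEQ and LAST.H_SEQ */

    /* First observation of each BY group goes to unique, the rest to dups */
    if first.h_seq = 1 then output unique;
                        else output dups;
   run;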
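
To see the technique at work without the CPS file, here is a minimal sketch that runs the same logic on a small made-up data set. The data set name work.toy and its values are hypothetical and are not part of the original course data; H_SEQ 3 is repeated on purpose so that one observation lands in the duplicates data set.

```sas
/* Hypothetical toy data: H_SEQ 3 appears twice on purpose. */
data work.toy;
  input h_seq h_numper;
  datalines;
1 4
2 3
3 5
3 5
4 2
;
run;

/* BY-group processing requires the data to be sorted by the BY variable. */
proc sort data=work.toy;
  by h_seq;
run;

data work.unique work.dups;
  set work.toy;
  by h_seq;
  /* FIRST.H_SEQ is 1 on the first observation of each BY group, 0 otherwise. */
  if first.h_seq then output work.unique;
  else output work.dups;
run;

/* List any duplicates; an empty WORK.DUPS means H_SEQ is unique. */
proc print data=work.dups;
  title "Observations with a repeated H_SEQ";
run;
title;
```

With this toy input, work.unique receives four observations and work.dups receives one (the second row with H_SEQ = 3). Applied to hhcps99, the same check amounts to confirming in the SAS log that dups has 0 observations.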

 

 


