
Add the Person Identifier (NRPID) to the 1994 data

SAS Program
******************************************************************************
  Attach NRPID to the 1994 Individual-Level Data File
    1. Select ONLY 1994 Individuals in the PERSONID Data File
    2. Restructure PERSON94 into "Child" File,
       as 2516 Code 2 Individuals are in the data TWICE
    3. Match SPERSON94B to the INDIV94 Data File.

  Input data: /nangrong/personid.X01
              /nangrong/1994/indiv94.03
*****************************************************************************;

libname in1 xport '/nangrong/personid.X01';
libname in3 xport '/nangrong/1994/indiv94.03';

*********************************************************
* Attach NRPID to the 1994 Individual-Level Data File *
*********************************************************;

* 1. Select ONLY 1994 Individuals in the PERSONID Data File *
--------------------------------------------------------------;
data person94;
  set in1.personid(keep=HHID94 CEP94 OHHID94 OCEP94 NRPID);

  * Keep only persons with a 1994 household ID;
  if (HHID94 ne .);

  *** Rename Identifiers ***;

  rename
    HHID94=DHHID94
    CEP94=DCEP94
  ;

run;

* 2. Restructure PERSON94 into "Child" File,
as 2516 Code 2 Individuals are in the data TWICE *
--------------------------------------------------------;
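data person94b;
  set person94;
  length CEP94 $ 3;

  keep HHID94 CEP94 NRPID;

  * Element 1 holds the renamed 1994 IDs, element 2 the OHHID94 and OCEP94 pair;
  array i1 {2} DHHID94 OHHID94;
  array i2 {2} DCEP94 OCEP94;

  * Write one record per non-missing household ID;
  do i=1 to 2;
    HHID94=i1{i};
    CEP94=i2{i};
    if HHID94 ne . then output;
  end;

run;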

*** Sort PERSON94B by HHID94 CEP94 ***;

proc sort data=person94b out=sperson94b nodupkey;
  by HHID94 CEP94;
run;

* 3. Match SPERSON94B to the INDIV94 Data File *
-------------------------------------------------;
data indiv94_nrpid notin_indiv94 notin_person94a;
  merge sperson94b(in=a)
        in3.indiv94(in=b);
  by HHID94 CEP94;

  * Route matched and unmatched records to separate data sets;
  if a=1 and b=1 then output indiv94_nrpid;
  if a=1 and b=0 then output notin_indiv94;
  if a=0 and b=1 then output notin_person94a;

run;

*** Check for Duplicates on NRPID in INDIV94_NRPID (SHOULD HAVE 2516!) ***;

proc sort data=indiv94_nrpid out=sindiv94_nrpid nodupkey;
  by NRPID;
run;
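
The NODUPKEY sort above only reports how many duplicate NRPIDs were deleted. To look at the records behind that count, one option is a BY-group DATA step, sketched below. The data set names BYNRPID and DUP_NRPID are hypothetical and are not part of the original program; the sketch assumes INDIV94_NRPID still exists in the WORK library.

```sas
/* Illustrative follow-up, not part of the original program.        */
/* BYNRPID and DUP_NRPID are hypothetical data set names.           */

proc sort data=indiv94_nrpid out=bynrpid;
  by NRPID;
run;

data dup_nrpid;
  set bynrpid;
  by NRPID;
  /* Keep a record only when its NRPID occurs more than once */
  if not (first.NRPID and last.NRPID);
run;

/* Print the first few duplicated persons with their household IDs */
proc print data=dup_nrpid(obs=20);
  var NRPID HHID94 CEP94;
run;
```

Unlike NODUPKEY, this keeps every record of a repeated NRPID, so both household entries for a Code 2 individual can be compared side by side.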



SAS Log
102 *********************************************************
103 * Attach NRPID to the 1994 Individual-Level Data File *
104 *********************************************************;
105
106 * 1. Select ONLY 1994 Individuals in the PERSONID Data File *
107 --------------------------------------------------------------;
108 data person94;
109 set in1.personid(keep=HHID94 CEP94 OHHID94 OCEP94 NRPID);
110
111 if (HHID94 ne .);
112
113 *** Rename Identifiers ***;
114
115 rename
116 HHID94=DHHID94
117 CEP94=DCEP94
118 ;
119
120 run;

NOTE: There were 57416 observations read from the data set IN1.PERSONID.
NOTE: The data set WORK.PERSON94 has 42249 observations and 5 variables.
NOTE: DATA statement used:
real time 0.95 seconds
cpu time 0.92 seconds


121
122 * 2. Restructure PERSON94 into "Child" File,
 as 2516 Code 2 Individuals are in data TWICE *
123 ----------------------------------------------------;
124 data person94b;
125 set person94;
126 length CEP94 $ 3;
127
128 keep HHID94 CEP94 NRPID;
129
130 array i1 {2} DHHID94 OHHID94;
131 array i2 {2} DCEP94 OCEP94;
132
133 do i=1 to 2;
134 HHID94=i1{i};
135 CEP94=i2{i};
136 if HHID94 ne . then output;
137 end;
138
139 run;

NOTE: There were 42249 observations read from the data set WORK.PERSON94.
NOTE: The data set WORK.PERSON94B has 44765 observations and 3 variables.
NOTE: DATA statement used:
real time 0.45 seconds
cpu time 0.43 seconds

140
141 *** Sort PERSON94B by HHID94 CEP94 ***;
142
143 proc sort data=person94b out=sperson94b nodupkey;
144 by HHID94 CEP94;
145 run;

NOTE: 0 observations with duplicate key values were deleted.
NOTE: There were 44765 observations read from the data set WORK.PERSON94B.
NOTE: The data set WORK.SPERSON94B has 44765 observations and 3 variables.
NOTE: PROCEDURE SORT used:
real time 0.60 seconds
cpu time 0.56 seconds


146
147 * 3. Match SPERSON94B to the INDIV94 Data File *
148 -------------------------------------------------;
149 data indiv94_nrpid notin_indiv94 notin_person94a;
150 merge sperson94b(in=a)
151 in3.indiv94(in=b);
152 by HHID94 CEP94;
153
154 if a=1 and b=1 then output indiv94_nrpid;
155 if a=1 and b=0 then output notin_indiv94;
156 if a=0 and b=1 then output notin_person94a;
157
158 run;

NOTE: There were 44765 observations read from the data set WORK.SPERSON94B.
NOTE: There were 44765 observations read from the data set IN3.INDIV94.
NOTE: The data set WORK.INDIV94_NRPID has 44765 observations and 70 variables.
NOTE: The data set WORK.NOTIN_INDIV94 has 0 observations and 70 variables.
NOTE: The data set WORK.NOTIN_PERSON94A has 0 observations and 70 variables.
NOTE: DATA statement used:
real time 10.19 seconds
cpu time 9.81 seconds


159
160 *** Check for Duplicates on NRPID in INDIV94_NRPID (SHOULD HAVE 2516!) ***;
161
162 proc sort data=indiv94_nrpid out=sindiv94_nrpid nodupkey;
163 by NRPID;
164 run;

NOTE: 2516 observations with duplicate key values were deleted.
NOTE: There were 44765 observations read from the data set WORK.INDIV94_NRPID.
NOTE: The data set WORK.SINDIV94_NRPID has 42249 observations and 70 variables.
NOTE: PROCEDURE SORT used:
real time 3.21 seconds
cpu time 2.22 seconds
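
The observation counts in the log can be reconciled with a small check, sketched below. The counts are copied from the NOTEs above; the variable names are illustrative only.

```sas
/* Counts copied from the NOTEs in the log; variable names are illustrative */
data _null_;
  n_person94  = 42249;  /* WORK.PERSON94: persons with a 1994 household ID    */
  n_person94b = 44765;  /* WORK.PERSON94B: one record per person-household ID */
  n_dups      = 2516;   /* duplicate NRPIDs deleted by the final PROC SORT    */

  /* Extra records written for persons with a non-missing OHHID94 */
  n_second = n_person94b - n_person94;

  if n_second = n_dups then put 'Counts reconcile: ' n_second= n_dups=;
  else put 'Counts do NOT reconcile: ' n_second= n_dups=;
run;
```

The difference 44765 - 42249 is 2516, which matches both the number of Code 2 individuals named in the program comments and the number of duplicates removed in the final sort.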


  Last Modified: 02/16/2005 UNC Carolina Population Center