Skip Navigation

UNC Carolina Population Center

 

Restructuring a SAS data set: from a child file to a mother file


Data set CHILD (1 obs=child):

A child is identified by the pair of variables MID CID, where MID identifies the mother.


             MID      CID    AGE    SEX

              1        1      2      f 
              1        2      5      m 
              2        1     10      m 
              2        2     12      m 
              2        3     13      f 
              3        1      7      f 
                  .
                  .
                  .



Data step to create data set MOTHER (1 obs=mother):

First solution

data mother(keep=mid cid1-cid3 age1-age3 sex1-sex3);
set child;
by mid;

length sex1-sex3 $ 1;

retain cid1-cid3 age1-age3 sex1-sex3 i; /*retain variables to be output */
/*and retain pointer variable i */

array c(1:3) cid1-cid3;
array a(1:3) age1-age3;
array s(1:3) sex1-sex3;

if first.mid then do; /*initialize retained variables at start of MID BY group*/
do j=1 to 3;
c(j)=.;
a(j)=.;
s(j)=' ';
end;
i=1;
end;

/*Assign values to output variables from variables in current input observation:*/

c(i)=cid;
a(i)=age;
s(i)=sex;

i=i+1; /*increment i to point to next set of child variables*/

if last.mid then output; /*output observation if this is last child of mother*/
run;


The output data set MOTHER (1 obs=mother) looks like this:

            MID    CID1    AGE1    SEX1    CID2    AGE2    SEX2    CID3    AGE3    SEX3

             1      1       2        f       2       5       m       .       .         
             2      1      10        m       2      12       m       3      13       f  
             3      1       7        f       .       .               .       .         
                .
                . 
                .


Second solution

Here is a second solution to the above restructuring problem. To produce the mother file, the child file is merged with three copies of itself and with appropriate use of the RENAME= and WHERE= data set options.

       

data mother(keep=mid cid1-cid3 age1-age3 sex1-sex3);
merge
child(rename=(cid=cid1 age=age1 sex=sex1) where=(cid1=1))
child(rename=(cid=cid2 age=age2 sex=sex2) where=(cid2=2))
child(rename=(cid=cid3 age=age3 sex=sex3) where=(cid3=3))
;
by mid;
run;



Another topic?
Questions or comments?  If you are affiliated with the Carolina Population Center, send them to Phil Bardsley.