UNC Carolina Population Center

 

Examples from a current project

Original N's:

  • records in the original raw data file: 4,398,590 persons
  • id values occurring two or more times: 81,532
  • total records affected by the duplicate-id problem: 170,029
  • observations in the SAS data set of persons after removing all records with a duplicated id: 4,228,561
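
The counts above are consistent with dropping every record whose id occurs more than once, since 4,398,590 - 170,029 = 4,228,561. The project itself used SAS; the following is only a minimal Python sketch of that rule, run on made-up toy records. The function name `drop_duplicated_ids` and the field name `id` are hypothetical and do not come from the project.

```python
from collections import Counter


def drop_duplicated_ids(records, key="id"):
    """Drop every record whose id occurs 2+ times.

    Returns (kept_records, n_duplicated_ids, n_records_with_duplicated_id).
    Hypothetical helper for illustration; not the project's SAS code.
    """
    counts = Counter(rec[key] for rec in records)
    dup_ids = {value for value, n in counts.items() if n > 1}
    kept = [rec for rec in records if rec[key] not in dup_ids]
    n_dup_records = sum(counts[value] for value in dup_ids)
    return kept, len(dup_ids), n_dup_records


# Toy data (invented): ids 2 and 4 are duplicated.
people = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 2, "name": "c"},
    {"id": 3, "name": "d"},
    {"id": 4, "name": "e"},
    {"id": 4, "name": "f"},
    {"id": 4, "name": "g"},
]

kept, n_dup_ids, n_dup_records = drop_duplicated_ids(people)

# The same bookkeeping identity holds for the counts reported above.
assert len(people) - n_dup_records == len(kept)
assert 4_398_590 - 170_029 == 4_228_561
```

Note that this removes all copies of a duplicated id rather than keeping one of them, which matches the reported totals: 170,029 records were removed, not 170,029 minus 81,532.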

Files built for analysis, including intermediate files:

  • N's: from 1,408,233 to 252,172,992 observations
  • File sizes: from 75 MB to 6.8 GB
  • Job times: real time from 26 1/2 minutes to 3 1/2 hours; CPU time from 21 1/2 minutes to 2 3/4 hours