Questions about Data

 

GENERAL

How does the Public-Use dataset compare to the Restricted-Use dataset?

The Add Health Public-Use data can be downloaded from the Inter-University Consortium for Political and Social Research (ICPSR).  The Public-Use dataset contains all the data from the In-home Interview, but for a smaller sample of respondents.  Public-Use data do not contain ID numbers of friends or siblings, so respondents' records cannot be linked to those of their friends or siblings.

 

How much memory do I need to accommodate all Add Health datasets that are available?

Although the Add Health data require less than 2.5 GB of disk space, you may also need space for your software and the temporary files it creates, depending on your computing configuration.  Most users purchase a device with 60 GB of storage.

 

Are geocodes included in the restricted-use data?

Due to deductive disclosure concerns, geocodes are not available with the restricted-use data.  However, Add Health has established a set of requirements for investigators seeking to add supplemental contextual data to Add Health. A brief introduction to the Ancillary Study proposal process and costs is available here.  Please email Add Health at addhealth@unc.edu with questions and to request the requisite proposal application forms and policies.

 

Can the Add Health data be linked to Census information (neighborhood)?

Add Health does not provide the geocodes that would allow you to add your own Census data to the restricted-use data.  However, many Census variables have already been linked to the Add Health data.  A description of Add Health contextual data and the codebooks are available here.  Add Health has established a set of requirements for investigators seeking to add supplemental contextual data to Add Health. A brief introduction to the Ancillary Study proposal process and costs is available here.  Please email Add Health at addhealth@unc.edu with questions and to request the requisite proposal application forms and policies.

 

Can supplemental contextual or biological data be added to Add Health? 

Add Health has established a set of requirements for investigators seeking to add supplemental contextual or biological data to Add Health, under the auspices of an Add Health ancillary study. An ancillary study is any study that derives support from independent funds outside the Add Health Program Project, and does one or more of the following:

  1. Collects new, original questionnaire data on Add Health respondents
  2. Merges secondary data sources onto Add Health respondent or school records and requires personal identifiers (e.g., geocodes) to perform these linkages
  3. Collects new biospecimens from Add Health respondents
  4. Uses archived biospecimens collected by the Add Health study

 A brief introduction to the Ancillary Study proposal process and costs is available here.  Please email Add Health at addhealth@unc.edu  with questions and to request the requisite proposal application forms and policies.

 

How do I read a SAS export file?

The following SAS commands will allow you to read a SAS export file:

   libname in xport '/directory path where file is located/SAS export file name';
   data wave1;
       set in.SAS dataset name;
   run;

For example, if the Add Health dataset name on the CD is ALLWAVE1.EXP, the internal SAS dataset name is ALLWAVE1, and your CD drive is D:, the commands are:

libname in xport 'd:\allwave1.exp';
data wave1;
   set in.allwave1;
run;

 

How do I read a SAS export file with Stata?

Using Stata 8.1, you can read a SAS export file with the following command:

fdause datasetname.xpt

If the SAS file is named datasetname.exp, rename the file to datasetname.xpt before running the Stata command.

 

How do I read a SAS export file with SPSS?

The following SPSS command allows you to read a SAS export file.

GET SAS DATA="\folder\datasetname.xpt".

If the SAS file is named datasetname.exp, rename the file to datasetname.xpt before running the SPSS command.
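Users working outside SAS, Stata, and SPSS can take a similar approach. The sketch below is a minimal Python example, not an Add Health-supplied method: it assumes the pandas library is installed and that the file is a version 5 SAS transport file. The function names looks_like_xport and read_export and the file name in the usage comment are hypothetical. Because the format is passed explicitly, the file does not need to be renamed from .exp to .xpt.

```python
# First bytes of a SAS transport (XPORT version 5) file.
XPORT_MAGIC = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!"


def looks_like_xport(path):
    """Return True if the file starts with the SAS transport header."""
    with open(path, "rb") as f:
        return f.read(len(XPORT_MAGIC)) == XPORT_MAGIC


def read_export(path):
    """Load a SAS export file into a pandas DataFrame (assumes pandas)."""
    import pandas as pd  # third-party; imported lazily so the check above works without it

    if not looks_like_xport(path):
        raise ValueError(f"{path} does not look like a SAS transport file")
    # format="xport" tells pandas the file type, so the extension is irrelevant.
    return pd.read_sas(path, format="xport", encoding="latin-1")


# Hypothetical usage:
# wave1 = read_export("allwave1.exp")
```

The header check is only a quick sanity test; it catches a mislabeled or truncated download before pandas attempts to parse the file.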

 

What numbers should be used for the NIH Inclusion Enrollment Report?

The Wave I inclusion enrollment numbers are available in this downloadable file.  In addition, the cumulative inclusion enrollment report can be downloaded here.

What are the codes for anti-hypertensive medications?

The codes for the anti-hypertensive medications are:

'040-047-xxx' 'BETA-BLOCKERS'

'040-049-156' 'THIAZIDE DIURETICS'

'040-042-xxx' 'ACE-INHIBITORS'

'040-043-xxx' 'ANTI-ADRENERGICS (peripherally acting)'

'040-044-xxx' 'ANTI-ADRENERGICS (centrally acting)'

'040-048-xxx' 'CALCIUM CHANNEL BLOCKERS'

'040-053-xxx' 'VASODILATORS'

'040-056-xxx' 'AT2 RECEPTOR BLOCKERS'

'040-055-xxx' 'COMBO ANTI-HYPERTENSIVES'

Anti-hypertensive medication codes used in the following article: Nguyen QC, Tabor JW, Entzel PP, Lau Y, Suchindran C, Hussey JM, Halpern CT, Harris KM, Whitsel EA. Discordance in national estimates of hypertension among young adults. Epidemiology 2011;22(4):532-541.
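The code list above can be turned into a simple lookup. The Python sketch below is illustrative only: it assumes that medication codes are stored as strings in the form 'ddd-ddd-ddd' and that 'xxx' stands for any final segment. The function name antihypertensive_class is hypothetical.

```python
# Code patterns and labels copied from the list above.
ANTIHYPERTENSIVE_CLASSES = {
    "040-047-xxx": "BETA-BLOCKERS",
    "040-049-156": "THIAZIDE DIURETICS",
    "040-042-xxx": "ACE-INHIBITORS",
    "040-043-xxx": "ANTI-ADRENERGICS (peripherally acting)",
    "040-044-xxx": "ANTI-ADRENERGICS (centrally acting)",
    "040-048-xxx": "CALCIUM CHANNEL BLOCKERS",
    "040-053-xxx": "VASODILATORS",
    "040-056-xxx": "AT2 RECEPTOR BLOCKERS",
    "040-055-xxx": "COMBO ANTI-HYPERTENSIVES",
}


def antihypertensive_class(code):
    """Return the class label for a medication code, or None if not listed."""
    for pattern, label in ANTIHYPERTENSIVE_CLASSES.items():
        if pattern.endswith("-xxx"):
            # Assumption: 'xxx' matches any final segment.
            if code.startswith(pattern[:-3]):
                return label
        elif code == pattern:
            # Fully specified code: require an exact match.
            return label
    return None
```

Note that thiazide diuretics are identified by one fully specified code, so other codes beginning with 040-049 are not classified as anti-hypertensive by this lookup.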

SAMPLING AND DESIGN EFFECTS

 

How do I correct for the design effects of the Add Health sampling process?

The paper "Strategies to Perform a Design-Based Analysis Using the Add Health Data" discusses how to correct for design effects and the unequal probability of selection to ensure that your analysis results are nationally representative with unbiased estimates.

 

What variables from the public-use data should be used to correct for design effects?

The paper "Strategies to Perform a Design-Based Analysis Using the Add Health Data" refers to variables from the Add Health restricted-use data.  The table below lists the corresponding variables in the public-use data.

 

Variables for Correcting for Design Effects in the Public-Use Dataset

Design Type = With Replacement
Unit = Adolescent

                        Wave I          Wave II         Wave III        Wave IV
                        N = 6504        N = 4834*       N = 4882*       N = 5114*
Strata Variable         --------- #     --------- #     --------- #     --------- #
Cluster Variable        CLUSTER2+       CLUSTER2+       CLUSTER2+       CLUSTER2+
Weight Variable         GSWGT1          GSWGT2          GSWGT3_2**      GSWGT4_2**
# With Weights          6504            4834            4882            5114
# Missing Weights       0               0               0               0
Mean of Weights         3422.6630       3892.7001       4535.91         4304.66
Sum of Weights          22261000.000    18817312.465    22144327.000    22014038.00
Minimum Weight Value    256.0588        282.4469        295.5669        265.3710
Maximum Weight Value    1835.4864       21107.1003      27327.081       2309.52

 

* These numbers are based on individual datasets, not combined datasets.
# A strata variable is not available; not using a strata variable only minimally affects the standard errors.
+ The Sociometrics variable name is MEX50197.
** The Wave III and IV files have several weight variables.  See chart in codebook for which weight to use.
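To illustrate how the weight and cluster variables enter an estimate, the Python sketch below computes a weighted mean and a with-replacement, cluster-based standard error by Taylor linearization, treating the sample as a single stratum because no strata variable is available. This is one common approach chosen for illustration, not necessarily the procedure described in the paper cited above. The function name weighted_mean_se is hypothetical, and the inputs are assumed to be equal-length lists with no missing values (for example, an outcome, GSWGT1, and CLUSTER2). For real analyses, dedicated survey software is the safer choice.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_se(values, weights, clusters):
    """Weighted mean and linearized standard error with clusters as sampling units."""
    total_w = sum(weights)
    mean = sum(w * y for y, w in zip(values, weights)) / total_w

    # Linearized score for each observation, summed within its cluster.
    cluster_totals = defaultdict(float)
    for y, w, c in zip(values, weights, clusters):
        cluster_totals[c] += w * (y - mean) / total_w

    n = len(cluster_totals)
    if n < 2:
        raise ValueError("at least two clusters are needed to estimate variance")

    # With-replacement variance estimator; the cluster scores sum to zero.
    variance = n / (n - 1) * sum(t * t for t in cluster_totals.values())
    return mean, sqrt(variance)


# Toy data: two clusters of two respondents each, equal weights.
mean, se = weighted_mean_se([1, 2, 3, 4], [1, 1, 1, 1], ["a", "a", "b", "b"])
```

Ignoring the cluster variable would leave the point estimate unchanged but would generally produce standard errors that are too small.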

 

How were adolescents identified as eligible for special oversamples for the in-home interview?

An adolescent's answer to a specific question or questions on the In-School Questionnaire determined his or her eligibility for inclusion in an oversample. For example, an adolescent who marked "Chinese" as his or her Asian or Pacific Islander background was eligible for the Chinese oversample. The genetic oversamples were identified in two ways. All adolescents who indicated they were twins were sampled with certainty. An adolescent who indicated at least one other household member in grades 7 through 12 with whom he or she did not share a biological mother and/or biological father was added to the pool of potential half-siblings and other, non-related adolescents. Full siblings were not oversampled.

 

WAVE I

How many Wave I in-home respondents have in-school questionnaire data?

15,356 of the Wave I in-home respondents also have in-school data.

 

What was the response rate for the Wave I school administrator questionnaire?

A total of 132 schools were included in the Add Health Wave I sample.  An administrator from each school was asked to complete a questionnaire.  The response rate among administrators was 98.5%.

 

What was the response rate for the Wave I parent questionnaire?

The parent questionnaire response rate was 85.4% for the child-specific data.

 

What is the best way to compute age in the Add Health Wave I in-home data?

To compute a Wave I age variable with the Wave I data, use the following variables and formula:

IMONTH - Month interview completed
IDAY - Day interview completed
IYEAR - Year interview completed
H1GI1M - What is your birth date? month [and year]
H1GI1Y - What is your birth date? [month and] year

The respondent's age is constructed using the interview completion date and date of birth variables. Because only the month and year of birth are available, 15 is used as the day of birth when calculating age. Consult the Introduction to the Adolescent In-Home Codebook to be sure to take into account the respondents whose birth date and/or interview date is incorrect.  Additionally, a few birth dates were corrected during the four waves of data collection so the Wave I date of birth should be compared to the last wave of data for the respondent.  The last wave of participation is considered the most correct.

SAS programming code that can be used to construct a Wave I AGE variable using Wave I variables is provided below.

idate=mdy(imonth,iday,iyear);
bdate=mdy(h1gi1m,15,h1gi1y); 
age=int((idate-bdate) / 365.25);

The code to construct Wave I age in Stata is below.

recode h1gi1m (96=.), gen(w1bmonth)
recode h1gi1y (96=.), gen(w1byear)
gen w1bdate = mdy(w1bmonth, 15,1900+w1byear)
format w1bdate %d
gen w1idate=mdy(imonth, iday,1900+iyear)
format w1idate %d
gen w1age=int((w1idate-w1bdate)/365.25)

This information is provided by the Add Health team as a service to the Add Health research community. It is provided "as is" with no guarantees as to suitability for a particular purpose.
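For readers working in Python, the same calculation can be sketched as follows. This is an illustrative translation, not code supplied by Add Health. It assumes, as the Stata code does, that years are stored as two-digit values and that 96 marks a missing birth month or year. The function name wave1_age is hypothetical.

```python
from datetime import date

REFUSED = 96  # code recoded to missing in the Stata example


def wave1_age(imonth, iday, iyear, h1gi1m, h1gi1y):
    """Return age in completed years at the Wave I interview, or None if unknown."""
    if h1gi1m == REFUSED or h1gi1y == REFUSED:
        return None
    try:
        idate = date(1900 + iyear, imonth, iday)
        # Only the month and year of birth are released, so day 15 is used.
        bdate = date(1900 + h1gi1y, h1gi1m, 15)
    except ValueError:
        # Impossible calendar dates are treated as missing.
        return None
    return int((idate - bdate).days / 365.25)


# A respondent born in June 1979 and interviewed on June 20, 1995.
age = wave1_age(6, 20, 95, 6, 79)
```

Because the day of birth is fixed at 15, a respondent interviewed during his or her birth month may be assigned an age that is off by one year.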

 

What does the Wave I variable COMMID represent?

The COMMID variable groups together the respondents who attend the high school and feeder school that make up the 7 - 12 grade span for the strata.

 

Why are there 1,821 respondents without a Grand Sample Weight at Wave I?

The following Wave I cases could not be weighted:
1. cases added in the field
2. cases selected as a pair (twins, half-sibs) where both were not interviewed
3. cases without a sample flag
4. respondents from schools outside of the 80 strata

 

WAVE II

What was the response rate for the Wave II school administrator questionnaire?

A total of 132 schools were included in the Add Health Wave II sample.  An administrator from each school was asked to complete a questionnaire.  The response rate among administrators was 87.0%.

 

How do I code gender changes between Wave I and Wave II?

When there is a discrepancy between the Wave I and Wave II gender of a respondent, use the Wave II gender. The restricted-use data include 23 cases in which the Wave I variable BIO_SEX and the Wave II variable BIO_SEX2 do not match. The Wave II data have been confirmed as correct. Wave II includes 7 cases in which the variable SEXFLG2 equals 1. This indicates that the incorrect gender was used to control the questionnaire skips during the interview. The variable BIO_SEX2 was corrected, but answers to questions based on gender will be incorrect.

 

WAVE III

Where can I find the monograph about biomarkers collected in Wave III?

"Biomarkers in Wave III of the Add Health Study" is available in pdf format. The monograph outlines relevant procedures, design, and sampling schemes used in the collection of biomarker data, and serves as a user guide for analysis and interpretation.

 

How can I obtain a copy of the first release of the Education Data?

The restricted-use Education Data, collected by the Adolescent Health and Academic Achievement Study, are available through the Add Health contract. Users who already have a contract can contact Add Health to request an order form for the Education Data. A copy of the public-use version of the file can be downloaded from the ICPSR website.

When there are gender discrepancies between Wave I and Wave III, how do I know which one is correct?

There are 20 cases in which Wave III gender (BIO_SEX3) does not match the Wave I gender (BIO_SEX). At Wave III, the preloaded gender variable came from the last wave of available data. Eighteen of these inconsistent cases match the Wave II gender (BIO_SEX2) and were confirmed at Wave III as being correct. Of the remaining two inconsistent cases:

In one case the Wave III gender, female, was confirmed by the Add Health security manager as being correct.

In the last case, both the Wave I and Wave II gender are listed as male, which is correct. For this case only, the Wave III gender is incorrect.

 

When I calculate a Wave III respondent age using the birth date (month, 15, year) and date of interview, I do not get the same age for some respondents as the one found in variable CALCAGE3. Why does this happen?

The age calculated during the interview uses the actual day of birth, which is not released with the Add Health data. During the Wave III interview, the age of the respondent was calculated by the computer interviewing program and then verified by the respondent. The discrepancy in the ages occurs when a respondent is interviewed during his or her birth month.

 

How many of the interviewed Wave III public-use sample were originally selected for the core and high education black samples?

Wave III public-use sample = 4,882

Core sample only = 4,490

High education black sample only = 325

Both samples = 67

 

WAVE IV

How many of the interviewed Wave IV public-use sample were originally selected for the core and high education black samples?

Wave IV public-use sample = 5,114

Core sample only = 4,699

High education black sample only = 345

Both samples = 70