Questions about Data
- How does the Public-Use dataset compare to the Restricted-Use dataset?
- How much memory do I need to accommodate all Add Health datasets that are available?
- Are geocodes included in the restricted-use data?
- Can the Add Health data be linked to census information (neighborhood)?
- How do I read a SAS export file?
- How do I read a SAS export file with Stata?
- How do I read a SAS export file with SPSS?
- What numbers should be used for the NIH Inclusion Enrollment Report?
- SAMPLING AND DESIGN EFFECTS
- How do I correct for the design effects of the Add Health sampling process?
- What variables from the public-use data should be used to correct for design effects?
- How were adolescents identified as eligible for special oversamples for the in-home interview?
- WAVE I
- How many Wave I in-home respondents have in-school questionnaire data?
- What was the response rate for the Wave I school administrator questionnaire?
- What was the response rate for the Wave I parent questionnaire?
- What is the best way to compute age in the Add Health Wave I in-home data?
- What does the Wave I variable COMMID represent?
- Why are there 1,821 respondents without a Grand Sample Weight at Wave I?
- WAVE II
- What was the response rate for the Wave II school administrator questionnaire?
- How do I code gender changes between Wave I and Wave II?
- WAVE III
- Where can I find the monograph about biomarkers collected in Wave III?
- How can I obtain a copy of the first release of the Education Data?
- When there are gender discrepancies between Wave I and Wave III, how do I know which one is correct?
- When I calculate a Wave III respondent age using the birth date (month, 15, year) and date of interview, I do not get the same age for some...
- How many of the interviewed Wave III public-use sample were originally selected for the core and high education black samples?
- WAVE IV
GENERAL
How does the Public-Use dataset compare to the Restricted-Use dataset?
The Add Health Public-Use data can be downloaded from the Inter-University Consortium for Political and Social Research (ICPSR). The Public-Use dataset contains all the data from the In-Home Interview, but for a smaller sample of respondents. The Public-Use data do not contain the ID numbers of friends or siblings, so respondents cannot be linked to their friends or siblings.
How much memory do I need to accommodate all Add Health datasets that are available?
Although the Add Health data require less than 2.5 GB of space, you may also need space for software and for the temporary files the software creates, depending on your computing configuration. Most users purchase a device with 60 GB of storage.
Are geocodes included in the restricted-use data?
Due to deductive disclosure concerns, geocodes are not available with the restricted-use data. However, Add Health has procedures in place to allow researchers to add contextual data to our files by using the geocodes in our secure data facility or by contracting with us to do it. Please email Add Health for an Ancillary Study application and additional information.
Can the Add Health data be linked to census information (neighborhood)?
Add Health does not provide geocodes with the restricted-use data, so you cannot add your own Census data directly. The following Add Health data files describe the communities where respondents reside:
Waves I, II and III Contextual—Community contextual variables based on state, county, tract, and block group levels derived from the Waves I, II and III addresses.
Waves I, II and III Grouping—Pseudo state, county, tract, and block group variables that allow respondents to be aggregated geographically based on Waves I, II and III addresses.
Spatial—X, Y coordinates that can be used to calculate distances between friends in a school community.
Climate—This file contains climate data for each Wave I respondent based on the nearest climate station. Information is available on precipitation, total snowfall, sky cover, temperature, and total hours of sunshine.
Population Density—The Wave I population density file contains the proportion of 1990 U.S. Census block group population and area (in square meters) within 1, 3, 5, and 8.04672 km (5 mi) of each Wave I respondent.
Weather—This file contains weather data for each Add Health Wave I respondent based on the nearest weather station reporting data for the corresponding survey month and year.
How do I read a SAS export file?
The following SAS commands will allow you to read a SAS export file:
libname in xport '/directory path where file is located/SAS export file name';
data wave1;
set in.SAS dataset name;
run;
For example, if the Add Health dataset name on the CD is ALLWAVE1.EXP, the internal SAS dataset name is ALLWAVE1, and your CD drive is D:
libname in xport 'd:\allwave1.exp';
data wave1;
set in.allwave1;
run;
How do I read a SAS export file with Stata?
Using Stata 8.1, the following command reads a SAS export file:
fdause datasetname.xpt
If the SAS file is named datasetname.exp, rename the file to datasetname.xpt before running the Stata command.
How do I read a SAS export file with SPSS?
The following SPSS command allows you to read a SAS export file.
GET SAS DATA="\folder\datasetname.xpt".
If the SAS file is named datasetname.exp, rename the file to datasetname.xpt before running the SPSS command.
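The renaming step described for Stata and SPSS can also be scripted. The Python sketch below copies a .exp file to the same name with a .xpt extension and leaves the original in place. The function name copy_exp_to_xpt is a hypothetical helper chosen for illustration, not part of any Add Health distribution.

```python
import shutil
from pathlib import Path


def copy_exp_to_xpt(exp_path):
    """Copy a SAS export file named *.exp to the same name with .xpt.

    Hypothetical helper: the original file is left untouched and the
    path of the new .xpt copy is returned.
    """
    src = Path(exp_path)
    if src.suffix.lower() != ".exp":
        raise ValueError(f"expected a .exp file, got {src.name}")
    dst = src.with_suffix(".xpt")
    shutil.copyfile(src, dst)
    return dst
```

Copying rather than renaming keeps the file as distributed, so SAS code that points at the .exp name continues to work.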
What numbers should be used for the NIH Inclusion Enrollment Report?
The Wave I enrollment numbers are available in this downloadable file.
SAMPLING AND DESIGN EFFECTS
How do I correct for the design effects of the Add Health sampling process?
The paper "Strategies to Perform a Design-Based Analysis Using the Add Health Data" discusses how to correct for design effects and the unequal probability of selection to ensure that your analysis results are nationally representative with unbiased estimates.
What variables from the public-use data should be used to correct for design effects?
The paper "Strategies to Perform a Design-Based Analysis Using the Add Health Data" refers to variables from the Add Health restricted-use data. The corresponding variables in the public-use data are listed below.
Variables for Correcting for Design Effects in the Public-Use Dataset
Design Type = With Replacement
Unit = Adolescent
| | Wave I N = 6504 | Wave II N = 4834* | Wave III N = 4882* | Wave IV N = 5114* |
|---|---|---|---|---|
| Strata Variable | --------- # | --------- # | --------- # | --------- # |
| Cluster Variable | CLUSTER2+ | CLUSTER2+ | CLUSTER2+ | CLUSTER2+ |
| Weight Variable | GSWGT1 | GSWGT2 | GSWGT3_2** | GSWGT4_2** |
| Number With Weights | 6504 | 4834 | 4882 | 5114 |
| Number Missing Weights | 0 | 0 | 0 | 0 |
| Mean of Weights | 3422.6630 | 3892.7001 | 4535.91 | 4304.66 |
| Sum of Weights | 22261000.000 | 18817312.465 | 22144327.000 | 22014038.00 |
| Minimum Weight Value | 256.0588 | 282.4469 | 295.5669 | 265.3710 |
| Maximum Weight Value | 1835.4864 | 21107.1003 | 27327.081 | 2309.52 |
* These numbers are based on individual datasets, not combined datasets.
# A strata variable is not available; not using a strata variable only minimally affects the standard errors.
+ The Sociometrics variable name is MEX50197.
** The Wave III and IV files have several weight variables. See chart in codebook for which weight to use.
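To illustrate how a weight variable and a cluster variable enter a design-based estimate, here is a minimal Python sketch of a weighted mean with a with-replacement, cluster-based standard error (Taylor linearization, treating the sample as a single stratum because no strata variable is available). It is a sketch under those assumptions, not the procedure from the paper. The function name and the toy data are hypothetical. In a real analysis the weights would come from a variable such as GSWGT1 and the clusters from CLUSTER2.

```python
import math
from collections import defaultdict


def weighted_mean_with_cluster_se(values, weights, clusters):
    """Weighted mean and a with-replacement, cluster-based standard error.

    Sketch only: treats the sample as one stratum (no strata variable),
    with `clusters` playing the role of CLUSTER2 and `weights` the role
    of a grand sample weight such as GSWGT1.
    """
    total_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_w

    # Linearized contribution of each respondent, summed within cluster.
    cluster_totals = defaultdict(float)
    for y, w, c in zip(values, weights, clusters):
        cluster_totals[c] += w * (y - mean) / total_w

    n = len(cluster_totals)
    if n < 2:
        raise ValueError("need at least two clusters")
    # The cluster totals sum to zero, so their mean is zero.
    variance = n / (n - 1) * sum(z * z for z in cluster_totals.values())
    return mean, math.sqrt(variance)


# Made-up toy data: four respondents in three clusters.
y = [1.0, 3.0, 2.0, 5.0]
w = [1.0, 1.0, 2.0, 1.0]
c = ["A", "A", "B", "C"]
mean, se = weighted_mean_with_cluster_se(y, w, c)
print(round(mean, 4), round(se, 4))
```

Dedicated survey procedures in SAS, Stata, or other packages handle subpopulations and other details that this sketch ignores, so it is best read as an explanation of what those procedures compute.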
How were adolescents identified as eligible for special oversamples for the in-home interview?
An adolescent's answer to a specific question or questions on the In-School Questionnaire determined his or her eligibility for inclusion in an oversample. For example, an adolescent who marked "Chinese" as his or her Asian or Pacific Islander background was eligible for the Chinese oversample. The genetic oversamples were identified in two ways. All adolescents who indicated they were twins were sampled with certainty. An adolescent who indicated at least one other household member in grades 7 through 12 with whom he or she did not share a biological mother and/or biological father was added to the pool of potential half-siblings and other, non-related adolescents. Full siblings were not oversampled.
WAVE I
How many Wave I in-home respondents have in-school questionnaire data?
15,356 of the Wave I in-home respondents also have in-school data.
What was the response rate for the Wave I school administrator questionnaire?
A total of 132 schools were included in the Add Health Wave I sample. An administrator from each school was asked to complete a questionnaire. The response rate among administrators was 98.5%.
What was the response rate for the Wave I parent questionnaire?
The parent questionnaire response rate was 85.4% for the child-specific data.
What is the best way to compute age in the Add Health Wave I in-home data?
To compute a Wave I age variable with the Wave I data, use the following variables and formula:
IMONTH - Month interview completed
IDAY - Day interview completed
IYEAR - Year interview completed
H1GI1M - What is your birth date? month [and year]
H1GI1Y - What is your birth date? [month and] year
The respondent's age is constructed using the interview completion date and date of birth variables. Because only the month and year of birth are available, 15 is used as the day of birth when calculating age. Consult the Introduction to the Adolescent In-Home Codebook to be sure to take into account the respondents whose birth date and/or interview date is incorrect. Additionally, a few birth dates were corrected during the four waves of data collection, so the Wave I date of birth should be compared with the date of birth in the respondent's last wave of data. The value from the last wave of participation is considered the most accurate.
SAS programming code that can be used to construct a Wave I AGE variable using Wave I variables is provided below.
idate = mdy(imonth, iday, iyear);
bdate = mdy(h1gi1m, 15, h1gi1y);
age = int((idate - bdate) / 365.25);
The code to construct Wave I age in Stata is below.
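* Recode 96 to missing for birth month and birth year
recode h1gi1m (96=.), gen(w1bmonth)
recode h1gi1y (96=.), gen(w1byear)
* Birth date, using 15 as the day of birth
gen w1bdate = mdy(w1bmonth, 15, 1900+w1byear)
format w1bdate %d
* Interview date
gen w1idate = mdy(imonth, iday, 1900+iyear)
format w1idate %d
* Age in whole years
gen w1age = int((w1idate-w1bdate)/365.25)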
This information is provided by the Add Health team as a service to the Add Health research community. It is provided "as is" with no guarantees as to suitability for a particular purpose.
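For readers working outside SAS and Stata, the same calculation can be sketched in Python. This is an illustrative translation, not code supplied by Add Health. The function name wave1_age is hypothetical, two-digit years are assumed to be 19xx as in the Stata code above, and 96 is treated as a missing code for birth month and year. It does not handle the incorrect dates described in the codebook.

```python
from datetime import date


def wave1_age(imonth, iday, iyear, bmonth, byear, missing_code=96):
    """Age in whole years at the Wave I interview.

    Returns None when the birth month or birth year is the missing code.
    Two-digit years are assumed to be 19xx (an assumption carried over
    from the Stata code).
    """
    if bmonth == missing_code or byear == missing_code:
        return None
    idate = date(1900 + iyear, imonth, iday)
    bdate = date(1900 + byear, bmonth, 15)  # day of birth fixed at 15
    return int((idate - bdate).days / 365.25)
```

For example, wave1_age(6, 20, 95, 6, 79) uses an interview date of June 20, 1995 and an assumed birth date of June 15, 1979.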
What does the Wave I variable COMMID represent?
The COMMID variable groups together the respondents who attend the high school and feeder school that make up the 7 - 12 grade span for the strata.
Why are there 1,821 respondents without a Grand Sample Weight at Wave I?
The following Wave I cases could not be weighted:
1. cases added in the field
2. cases selected as a pair (twins, half-sibs) where both were not interviewed
3. cases without a sample flag
4. respondents from schools outside of the 80 strata
WAVE II
What was the response rate for the Wave II school administrator questionnaire?
A total of 132 schools were included in the Add Health Wave II sample. An administrator from each school was asked to complete a questionnaire. The response rate among administrators was 87.0%.
How do I code gender changes between Wave I and Wave II?
When there is a discrepancy between the Wave I and Wave II gender of a respondent, use the Wave II gender. The restricted-use data include 23 cases in which the Wave I variable BIO_SEX and the Wave II variable BIO_SEX2 do not match. The Wave II data have been confirmed as correct. Wave II includes 7 cases in which the variable SEXFLG2 equals 1. This indicates that the incorrect gender was used to control the questionnaire skips during the interview. The variable BIO_SEX2 was corrected, but answers to questions based on gender will be incorrect.
WAVE III
Where can I find the monograph about biomarkers collected in Wave III?
"Biomarkers in Wave III of the Add Health Study" is available in pdf format. The monograph outlines relevant procedures, design, and sampling schemes used in the collection of biomarker data, and serves as a user guide for analysis and interpretation.
How can I obtain a copy of the first release of the Education Data?
The restricted-use Education Data, collected by the Adolescent Health and Academic Achievement Study, are available through the Add Health contract. For users who already have a contract, email Tannaz Sabet at ICPSR to request an order form for the Education Data. A copy of the public-use version of the file can be downloaded from the ICPSR website.
When there are gender discrepancies between Wave I and Wave III, how do I know which one is correct?
There are 20 cases in which Wave III gender (BIO_SEX3) does not match the Wave I gender (BIO_SEX). At Wave III, the preloaded gender variable came from the last wave of available data. Eighteen of these inconsistent cases match the Wave II gender (BIO_SEX2) and were confirmed at Wave III as being correct. Of the remaining two inconsistent cases:
In one case the Wave III gender, female, was confirmed by the Add Health security manager as being correct.
In the last case, both the Wave I and Wave II gender are listed as male, which is correct. For this case only, the Wave III gender is incorrect.
When I calculate a Wave III respondent age using the birth date (month, 15, year) and date of interview, I do not get the same age for some respondents as the one found in variable CALCAGE3. Why does this happen?
The age calculated during the interview uses the actual day of birth, which is not released with the Add Health data. During the Wave III interview, the age of the respondent was calculated by the computer interviewing program and then verified by the respondent. The discrepancy in the ages occurs when a respondent is interviewed during his or her birth month.
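A small worked example may make the birth-month discrepancy concrete. The Python sketch below compares the age computed from an actual day of birth with the age computed when the day is set to 15. The respondent, the dates, and the function name age_in_years are all made up for illustration.

```python
from datetime import date


def age_in_years(birth, interview):
    """Completed years of age on the interview date."""
    had_birthday = (interview.month, interview.day) >= (birth.month, birth.day)
    return interview.year - birth.year - (0 if had_birthday else 1)


# Hypothetical respondent: born on the 3rd, interviewed on the 10th
# of the same month.
interview = date(2001, 6, 10)
actual_birth = date(1979, 6, 3)     # actual day of birth (not released)
released_birth = date(1979, 6, 15)  # day set to 15, as in the released data

print(age_in_years(actual_birth, interview))    # 22
print(age_in_years(released_birth, interview))  # 21
```

The two results differ only when the interview falls between the actual day of birth and the 15th of the birth month, in either order. Outside the birth month the two calculations agree.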
How many of the interviewed Wave III public-use sample were originally selected for the core and high education black samples?
Wave III public-use sample = 4,882
Core sample only = 4,490
High education black sample only = 325
Both samples = 67
WAVE IV
How many of the interviewed Wave IV public-use sample were originally selected for the core and high education black samples?
Wave IV public-use sample = 5,114
Core sample only = 4,699
High education black sample only = 345
Both samples = 70
