Common Errors and How to Avoid Them
The following errors are common when analyzing complex sample survey data in Stata, and each is easy to avoid.
Ignoring the Sampling Design
If your analysis ignores clustering and the unequal probabilities with which participants were selected, the point estimates can be biased and the standard errors are usually too small, which produces false-positive hypothesis test results. Avoid this error by using the svy commands for your analysis. If your analysis technique is not available with the svy commands, use a command that accepts pweight together with the robust and cluster() options (written vce(cluster varname) in current Stata).
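A minimal sketch of both approaches in current Stata syntax (the svy: prefix, which replaced older commands such as svymean), run on simulated toy data. The design (4 strata, 3 PSUs per stratum) and the variable names strata, psu, wt, x, and y are hypothetical. svy: regress and regress with pweight and vce(cluster) should give the same coefficient; their standard errors can differ, mainly because vce(cluster) ignores the strata.

```stata
* Hypothetical toy data: 4 strata, 3 PSUs per stratum, 10 obs per PSU,
* with a shared PSU effect u that induces clustering
clear
set obs 120
set seed 101
generate strata = ceil(_n/30)
generate psu    = ceil(_n/10)              // PSU ids 1-12, unique across strata
bysort psu: generate u = rnormal() if _n == 1
by psu: replace u = u[1]                   // copy the PSU effect to every member
generate x  = rnormal()
generate y  = 2 + 0.5*x + u + rnormal()
generate wt = 10 + 40*runiform()           // unequal sampling weights

* Preferred: declare the design once, then use the svy prefix
svyset psu [pweight=wt], strata(strata)
svy: regress y x
scalar b_svy  = _b[x]
scalar se_svy = _se[x]

* Fallback for a command without svy support: pweights plus
* cluster-robust standard errors (stratification is ignored)
regress y x [pweight=wt], vce(cluster psu)
scalar b_clu = _b[x]

* The error to avoid: no weights and no clustering
regress y x
scalar se_naive = _se[x]

display "SE of x: svy = " se_svy "   unweighted, unclustered = " se_naive
```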
Using the Wrong Weight Type
For data from a sample survey, use pweight to specify the sampling weight. Any of the other weight types (aweight, fweight, or iweight) can produce incorrect variances, standard errors, confidence intervals, and p-values.
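The sketch below, on simulated toy data with a hypothetical weight variable wt, contrasts aweight and pweight in regress. The coefficients agree and only the standard errors change, which is why this error is easy to miss. Note that pweight alone still ignores clustering and stratification, so for a complex design declare it with svyset and use the svy commands.

```stata
* Hypothetical toy data; wt plays the role of a sampling weight
clear
set obs 200
set seed 202
generate x  = rnormal()
generate y  = 1 + 0.3*x + rnormal()
generate wt = 10 + 40*runiform()

* Wrong for survey data: analytic weights give model-based standard errors
regress y x [aweight=wt]
scalar b_aw  = _b[x]
scalar se_aw = _se[x]

* Right: sampling weights give robust (linearized) standard errors
regress y x [pweight=wt]
scalar b_pw  = _b[x]
scalar se_pw = _se[x]

* Same coefficient, different standard error
display "b:  aweight = " b_aw  "   pweight = " b_pw
display "SE: aweight = " se_aw "   pweight = " se_pw
```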
Subsetting the Sample
Do not subset the sample when using the svy commands in Stata. By default these commands estimate variances with Taylor series linearization, which requires a correct count of the primary sampling units (PSUs) that were originally sampled. Subsetting the data, for example with drop, keep, or an -if- qualifier, may change the number of PSUs used in the variance formula. To analyze a sub-population, keep the full sample and always use the subpop option.
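A minimal sketch on simulated toy data, with hypothetical variables strata, psu, wt, y, and a 0/1 subpopulation indicator dom, in which one PSU contains no members of the subpopulation. Both commands give the same point estimate, but e(N_psu) shows that the -if- version computes the variance from one fewer PSU.

```stata
* Hypothetical toy data: 2 strata, 3 PSUs per stratum, 10 obs per PSU;
* PSU 3 contains nobody from the subpopulation
clear
set obs 60
set seed 303
generate strata = ceil(_n/30)
generate psu    = ceil(_n/10)
generate wt     = 10 + 40*runiform()
generate y      = rnormal(50, 10)
generate dom    = (psu != 3)               // 0/1 subpopulation indicator
svyset psu [pweight=wt], strata(strata)

* Right: subpop() keeps all 6 PSUs in the variance calculation
svy, subpop(dom): mean y
scalar b_sub    = _b[y]
scalar npsu_sub = e(N_psu)

* Wrong: an if qualifier (like keep or drop) removes PSU 3 entirely
svy: mean y if dom
scalar b_if    = _b[y]
scalar npsu_if = e(N_psu)

display "PSUs used: subpop = " npsu_sub "   if = " npsu_if
```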
Stratum with only one PSU detected
You may get the error message "stratum with only one PSU detected" when you run an svy command. This usually happens when observations with missing values on variables in your model are dropped: if every observation in a PSU is dropped, that PSU disappears, and its stratum can be left with a single PSU. Use the svydes command to identify the problem strata. A common fix is to combine the affected stratum with an adjoining stratum. For details, see the manual entry on svydes, type findit svydes in Stata, or see http://www.stata.com/support/faqs/stat/stratum.html.
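A sketch on simulated toy data (variable names and stratum numbers are hypothetical) that reproduces the problem: every observation in PSU 6 is missing y, which leaves stratum 3 with one PSU. Depending on the Stata version, svy then stops with the error above or reports a missing standard error, so that call is wrapped in capture noisily. Merging stratum 3 into stratum 2 restores a variance estimate.

```stata
* Hypothetical toy data: 3 strata, 2 PSUs per stratum, 10 obs per PSU
clear
set obs 60
set seed 404
generate strata = ceil(_n/20)
generate psu    = ceil(_n/10)
generate wt     = 10 + 40*runiform()
generate y      = rnormal(50, 10)
replace y = . if psu == 6                  // all of PSU 6 is missing y
svyset psu [pweight=wt], strata(strata)

* Per-stratum PSU counts for obs with nonmissing y: stratum 3 keeps only 1
svydescribe y

* Errors out or reports a missing standard error, depending on version
capture noisily svy: mean y

* Fix: fold stratum 3 into the adjoining stratum 2 (these stratum
* numbers belong to this toy example) and re-declare the design
replace strata = 2 if strata == 3
svyset psu [pweight=wt], strata(strata)
svy: mean y
```

Current versions of svyset also provide a singleunit() option for handling single-PSU strata, which may be worth checking in your version's manual as an alternative to merging strata.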
What set of observations is Stata analyzing?
Using a subpop variable does not do the same thing as an -if-; that difference is the reason the subpop option exists. The -svy- commands use the whole dataset to compute standard errors even when you analyze only a subset of it with a subpop variable. When it runs an -svy- command, Stata restricts the analysis to observations that are non-missing on ALL of the following variables:
- strata (if using one)
- psu (if using one)
- sample weight (if using one)
- subpop (if using one)
- analysis variable(s)**
An observation that is missing any one of them is dropped.
** With more than one analysis variable, svymean does not restrict the analysis to observations that are non-missing on all analysis variables unless the complete option is specified.
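A sketch on simulated toy data (all variable names hypothetical) with one missing value placed in each of wt, psu, strata, and y, in different observations. Counting complete cases with egen's rowmiss() should match the e(N) that svy reports; e(N) covers the whole estimation sample, not just the subpopulation.

```stata
* Hypothetical toy data: 2 strata, 3 PSUs per stratum, 10 obs per PSU
clear
set obs 60
set seed 505
generate strata = ceil(_n/30)
generate psu    = ceil(_n/10)
generate wt     = 10 + 40*runiform()
generate y      = rnormal(50, 10)
generate dom    = (runiform() < 0.5)       // subpop indicator, no missings

* Knock out one value in each design and analysis variable
replace wt     = . in 1
replace psu    = . in 12
replace strata = . in 23
replace y      = . in 34
svyset psu [pweight=wt], strata(strata)

* Count observations complete on every variable svy uses here
egen nmiss = rowmiss(strata psu wt dom y)
count if nmiss == 0
local n_complete = r(N)

svy, subpop(dom): mean y
display "obs used by svy = " e(N) "   complete cases = " `n_complete'
```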