Creating Dummy (Indicator) Variables

 

Three shortcuts to save you time.

Sometimes we need to create a number of dummy, or indicator, variables from a single categorical variable. These dummies usually take the value 1 if the observation has the attribute and 0 if it does not. Suppose each observation records the respondent's current age in years, ranging from 15 to 49. You want to create 35 indicator variables with values 0/1 that describe whether the respondent is 15 years old or not, 16 or not, 17 or not, etc.

The most obvious way to do this is with the generate and replace commands. This example uses the familiar Tanzania DHS women's example data file, where v012 is the variable containing the woman's age:

clear
use "t:\statatut\exampw1.dta"
gen age15=0
replace age15= 1 if v012 == 15
gen age16=0
replace age16= 1 if v012 == 16
* ...and so on, through age49

Repeating these two commands 35 times is tedious and leaves ample room for error. How can we do this more easily?


The first shortcut uses the foreach and forvalues commands. See Repeating commands: shortcuts in this tutorial to learn the basics of these commands.

 

clear
use "t:\statatut\exampw1.dta"
foreach v of newlist age15-age49 {
   gen byte `v' = 0
}
forvalues n=15/49 {
   replace age`n' = 1 if v012==`n'
}

This use of the foreach and forvalues commands simply generates the two commands in the first example 35 times for us, eliminating all that typing and opportunity for error. Note that we've added the storage type byte to the generate command. Since the indicators contain only the values 0 and 1, each fits in a single byte rather than the 4 bytes of Stata's default float type, which can save a substantial amount of memory in a large data file. See Describing the data in this tutorial for an explanation of Stata's storage types.
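The two loops can also be collapsed into one, because a logical expression in Stata evaluates to 1 when true and 0 when false. The following is a minimal sketch of that idea, not part of the original example: it builds a small hypothetical data set (350 observations with ages cycling through 15 to 49) in place of exampw1.dta so that it runs on its own.

```stata
* Hypothetical stand-in data: ages cycle through 15-49
clear
set obs 350
gen v012 = 15 + mod(_n - 1, 35)

* One loop: (v012 == `n') is 1 where the age matches and 0 elsewhere
forvalues n = 15/49 {
    gen byte age`n' = (v012 == `n')
}
```

As in the two-step version, an observation with a missing age gets 0 on every dummy, since a missing value never equals any of the ages tested.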


The second shortcut is the tabulate command.

clear
use "t:\statatut\exampw1.dta"
tab v012, gen(age)
de

The gen option on tabulate creates a new dummy variable for each value of v012. It names each dummy using the prefix you assign in parentheses, in this case "age", followed by a sequence number. Note that the dummies are therefore named age1 through age35, so the names do not correspond to the ages they represent (age1 flags age 15, age2 flags age 16, and so on). Instead, the values are recorded in the variable labels.
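If you would rather have the dummy names match the ages, one possible approach is to generate the dummies under a temporary prefix and then rename them in a loop. The sketch below is an illustration only: the prefix agecat and the stand-in data are assumptions made for this example, and it relies on levelsof returning the ages in the same sorted order that tabulate uses.

```stata
* Hypothetical stand-in data: ages cycle through 15-49
clear
set obs 350
gen v012 = 15 + mod(_n - 1, 35)

* Create agecat1, agecat2, ... (one dummy per distinct age)
tabulate v012, generate(agecat)

* Collect the distinct ages in sorted order, then rename each dummy
levelsof v012, local(ages)
local i = 1
foreach a of local ages {
    rename agecat`i' age`a'
    local ++i
}
```

Using a temporary prefix avoids name collisions: renaming age1 straight to age15 would fail, because tabulate has already created a variable called age15 for the fifteenth category.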


The third shortcut is the xi command. This command is really intended to feed indicator variables into another Stata command, such as a regression, but it can also be used on its own to create dummy variables.

  
clear
use "t:\statatut\exampw1.dta"
rename v012 age
xi, prefix() i.age
gen age_15=0
replace age_15=1 if age==15

First we rename v012 to age so that the dummy variables start with the letters "age" instead of "v012". In the xi command, the prefix() option gets rid of Stata's default prefix, "_I", which it adds to each dummy variable name. Finally, we create the dummy for age 15 using generate and replace. This is necessary because, by default, xi does not create a dummy for the lowest value (remember, it's designed to feed these dummies into a multivariate procedure, where one category must be dropped to serve as the reference group).
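To see the default behavior described here, it can help to run xi without the prefix() option and inspect what it creates. This is a small sketch on hypothetical stand-in data (ages cycling through 15 to 49), not on the tutorial's data file.

```stata
* Hypothetical stand-in data: ages cycle through 15-49
clear
set obs 350
gen age = 15 + mod(_n - 1, 35)

* Default prefix: dummies are named _Iage_16 through _Iage_49
xi i.age
describe _Iage_*

* The lowest category is omitted, so this check returns a nonzero code
capture confirm variable _Iage_15
display _rc
```

The nonzero return code shows that no dummy exists for age 15, which is why the example above creates age_15 by hand.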


Questions or comments? If you are affiliated with the Carolina Population Center, send them to Phil Bardsley.