Describing the data
Describing the variables: means, univariates, frequencies, and data types.
Now we'll use some real data: a health facility survey conducted in Tanzania in 1999. Type each of these commands and observe the results. Many commands can be abbreviated. For example, su is short for summarize and tab is short for tabulate.
clear
use "t:\statatut\exampfac.dta"
describe
summarize
su pill-natural
su fphour*
codebook urbrur facname
tabulate urbrur
tab urbrur, nolabel missing plot
tab factype urbrur
tab1 factype urbrur
tab2 factype urbrur
tab2 factype urbrur, row col cell
Questions:
1. The describe command lists each variable in Stata's memory. What do the terms "double," "str42," "byte," etc. in the second column refer to?
2. How do I specify which data type I want to use?
3. The summarize command lists the number of observations, mean, standard deviation, min, and max for a variable. Why is the number of observations different for some variables, and even 0 for facname?
4. How can I summarize a specific set of variables?
5. When is the codebook command useful?
6. The tabulate command gives frequencies (counts) and is most useful with categorical variables. What are the two ways to specify a one-way frequency?
7. What do the nolabel, missing, and plot options do on the tabulate command?
8. How can I get two-way frequencies (cross-tabulations)?
Answers:
1. That is the data type (describe calls it the "storage type") of each variable. Each data type handles a different kind of data. The following table describes the data types used by Stata:

    Type     Min        Max        Precision  Bytes  Kind
    ------------------------------------------------------------
    byte     -2 digits  2 digits   2 digits   1      integer
    int      -4 digits  4 digits   4 digits   2      integer
    long     -9 digits  9 digits   9 digits   4      integer
    float    -10**38    10**38     10**-8     4      real
    double   -10**307   10**307    10**-16    8      real
    str1     1          1          -          1      character
    str80    1          80         -          80     character
    str244   1          244        -          244    character
    ------------------------------------------------------------

Strings are limited to 244 characters. You can see this limit, and the limits on almost everything else in Stata, by typing the command

help limits

At CPC, we're using Stata/SE, so the right-hand (Stata/SE) column of that output applies.
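To check storage types yourself, here is a minimal sketch. It assumes Stata's bundled example dataset auto (loaded with sysuse) as a stand-in, since exampfac.dta lives on a CPC drive; the variable names make and price come from auto, not from the tutorial data.

```stata
* Stand-in data: Stata's bundled 1978 automobile dataset (not exampfac.dta)
sysuse auto, clear

* The "storage type" column shows str18 for make and int for price
describe make price

* The extended macro function ": type" returns a variable's storage type
local t : type make
display "make is stored as `t'"
```

The ": type" form is handy inside do-files, where you want the type in a macro rather than on the screen.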
2. You can specify a data type on the generate command:

gen byte a=0

If you don't specify a data type, Stata uses type float (4 bytes) by default. Using an efficient data type reduces the file size, which matters for very large files or for computers with little memory (RAM). After variables have been created, the compress command converts each one to the smallest data type that can hold its values without losing information. See compress in the Miscellaneous Tips and Tricks section for details.
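The effect of choosing a type, and of compress, can be seen in a short sketch. It assumes Stata's bundled auto dataset as stand-in data; the variable names one_float and one_byte are hypothetical, made up for this illustration.

```stata
* Stand-in data: Stata's bundled automobile dataset
sysuse auto, clear

* No type given: generate uses the default type
* (float, unless set type has been changed), 4 bytes per value
generate one_float = 1

* Type given: the same value stored as a byte, 1 byte per value
generate byte one_byte = 1
describe one_float one_byte

* compress demotes variables whose values fit in a smaller type;
* one_float holds only the integer 1, so it becomes a byte
compress
describe one_float
```

Running compress before saving a large file is a cheap way to get most of the size savings without choosing every type by hand.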
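3. The "Obs" column displays the number of non-missing observations, so a numeric variable with missing values shows fewer observations than there are records in the file. summarize does not compute statistics for string variables such as facname, so for them it always reports 0.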
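A minimal sketch of both effects, assuming Stata's bundled auto dataset as stand-in data: in auto, rep78 is missing for a few cars and make is a string variable, playing the roles of an incomplete variable and of facname.

```stata
sysuse auto, clear

* rep78 is missing for some cars, so its Obs is lower than price's
summarize price rep78

* Count the missing values directly; missing() works for numbers and strings
count if missing(rep78)

* make is a string variable, so summarize reports 0 observations for it
summarize make
```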
4. There are two ways to specify a variable list, both shown in the examples:
- pill-natural (first variable - last variable): every variable from pill through natural, in the order they are stored
- fphour* (root variable name plus *): every variable whose name begins with fphour
Both methods work with any Stata command that accepts a variable list. To use the first method, you need to know the order of the variables in the data file. Use the describe command to see that order, or look in the Variables window.
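Here is a sketch of both forms, assuming Stata's bundled auto dataset as stand-in data (its variables are stored in the order make, price, mpg, rep78, headroom, trunk, ...). The macro names vars and tvars are hypothetical.

```stata
sysuse auto, clear

* Range: price-rep78 means every variable from price through rep78,
* in dataset order (price, mpg, rep78)
describe price-rep78

* Wildcard: t* matches every name starting with "t" (trunk and turn)
summarize t*

* unab expands a variable list so you can see exactly what it covers
unab vars : price-rep78
display "`vars'"
```

unab is a quick check before running a command on a range, because a range silently picks up any variable that happens to sit between the two endpoints.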
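5. The codebook command is useful when you want a detailed profile of a variable: its storage type, value label, number of unique values, and number of missing values, plus summary statistics for numeric variables. It is especially handy for string variables, which summarize cannot describe.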
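A short sketch, assuming Stata's bundled auto dataset as stand-in data: make is a string (like facname) and foreign is a numeric variable with value labels (like urbrur).

```stata
sysuse auto, clear

* Full report: for the string make, codebook shows unique values and
* examples; for the labeled numeric foreign, it lists codes and labels
codebook make foreign

* compact: one summary line per variable
codebook make foreign, compact
```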
6. The two ways to get one-way frequencies are:
- tab factype (for a single variable)
- tab1 pill-natural (for a list of variables; produces a separate table for each one)
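A sketch of the difference, assuming Stata's bundled auto dataset as stand-in data. The loop at the end is only an illustration of what tab1 amounts to, not how tab1 is implemented.

```stata
sysuse auto, clear

* tabulate: one variable, one one-way table
tabulate foreign

* tab1: a list of variables, one one-way table per variable
tab1 foreign rep78

* tab1 behaves much like this loop over the same list
foreach v of varlist foreign rep78 {
    tabulate `v'
}
```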
7. These three options give extra information about the variable urbrur:
- nolabel displays the numeric codes instead of the value labels
- missing adds missing values as a separate row, so you can see how many observations are missing
- plot adds a simple text bar chart comparing the frequencies
See the Miscellaneous Tips and Tricks section of this tutorial for more information.
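To see each option on its own, here is a sketch assuming Stata's bundled auto dataset as stand-in data (rep78 has missing values; foreign is labeled Domestic/Foreign).

```stata
sysuse auto, clear

* Default: missing values are left out (69 of the 74 cars have rep78)
tabulate rep78

* missing: adds a row for "." so the 5 missing cars are counted
tabulate rep78, missing

* nolabel: shows the codes 0/1 instead of Domestic/Foreign
* plot: adds a text bar chart of the frequencies
tabulate foreign, nolabel plot
```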
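8. The two ways to get two-way frequencies are:
- tab factype urbrur
- tab2 factype urbrur
For two variables, these commands produce the same table. tab2 also accepts more than two variables, and then produces a table for every pair. The row, col, and cell options add row, column, and cell percentages to the table.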
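A sketch of both commands, assuming Stata's bundled auto dataset as stand-in data, with rep78, foreign, and headroom standing in for categorical survey variables.

```stata
sysuse auto, clear

* Two-way table with row, column, and cell percentages
tabulate rep78 foreign, row col cell

* The same table via tab2
tab2 rep78 foreign

* With three variables, tab2 makes every pair: rep78 x foreign,
* rep78 x headroom, and foreign x headroom
tab2 rep78 foreign headroom

* firstonly keeps only the pairs that include the first variable
tab2 rep78 foreign headroom, firstonly
```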


