Describing the data
Describing the variables: means, univariates, frequencies, and data types.
Now we'll use some real data. The data are from a health facility survey conducted in Tanzania in 1999. Copy each of these commands into the Command window, press Enter, and observe the results. Note that the letters in bold on each command are acceptable abbreviations. If you see "-more-" at the bottom of your Results screen, press the Space Bar to see a new page of data, or press "q" to quit the command.
clear
use "q:\utilities\statatut\exampfac.dta"
describe
summarize
su pill-natural
su fphour*
codebook urbrur facname
tabulate urbrur
tab urbrur, nolabel missing plot
tab factype urbrur
tab1 factype urbrur
tab2 factype urbrur
tab2 factype urbrur, row col cell
1. The describe command lists each variable in Stata's memory. What do the terms "double," "str42," "byte," etc. in the second column refer to?
2. How do I specify which data type I want to use?
3. The summarize command lists the number of observations, mean, standard deviation, min, and max for a variable. Why is the number of observations different for some variables, and even 0 for facname?
4. How can I summarize a specific set of variables?
5. When is the codebook command useful?
6. The tabulate command gives frequencies (counts), and is most useful with categorical variables. What are the two ways to specify a one-way frequency?
7. What do the nolabel, missing, and plot options do on the tabulate command?
8. How can I get two-way frequencies (cross-tabulations)?
1. These terms are data types, which determine how each variable is stored:

Storage type   Min           Max           Precision   Bytes        Kind
---------------------------------------------------------------------------------
byte           -2 digits     2 digits      2 digits    1            integer
int            -4 digits     4 digits      4 digits    2            integer
long           -9 digits     9 digits      9 digits    4            integer
float          -10**38       10**38        10**-8      4            real
double         -10**307      10**307       10**-16     8            real
str1           1             1                         1            string
str2           2             2                         2            string
...            .             .                         .            ...
str2045        1             2045                      2045         string
strL           2000000000    2000000000                2000000000   long string
Prior to Stata version 13, strings were limited to 2045 characters. Starting with Stata 13, the strL data type can hold strings of up to 2 billion characters. You can see this limit, and the limits on just about everything else in Stata, by typing the command help limits. There is a detailed explanation of data types in the PDF documentation and under help data types.
2. Name the data type right after the generate command:
gen byte a=0
If you don't specify a data type, Stata uses type float (4 bytes) by default. Using an efficient data type reduces the file size. This is important for very large files or for computers with little memory (RAM). The compress command selects the most efficient data type after variables have been generated. See compress in Miscellaneous Tips and Tricks for details.
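The effect of choosing a data type, and of compress, can be seen on a small throwaway dataset. This is a minimal sketch: the variables a and b and the five-observation dataset are made up for illustration and are not part of the survey file.

```stata
* Sketch only: a throwaway dataset, not the tutorial's survey file
clear
set obs 5

* a is stored in 1 byte per observation
generate byte a = 0

* no type is given for b, so Stata uses its default (float unless changed)
generate b = 0

* the storage type column shows byte for a and float for b
describe a b

* compress demotes each variable to the smallest type that holds its values
compress
describe a b
```

After compress, describe should report a smaller storage type for b, because all of its values fit in one byte.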
3. The "Obs" column displays the number of non-missing observations for numeric variables. For string variables, like facname, it is always 0.
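To see why the "Obs" count varies, it can help to count missing values directly. The sketch below assumes the auto example dataset that ships with Stata, since the tutorial's survey file may not be available on your machine; rep78 (numeric, with some missing values) and make (string) stand in for the survey variables.

```stata
* Sketch only: uses Stata's shipped example data, not exampfac.dta
sysuse auto, clear

* Obs for rep78 is lower than the dataset size; Obs for the string make is 0
summarize rep78 make

* count the observations where each variable is missing
count if missing(rep78)
count if missing(make)

* one table of missing-value counts for the numeric variables that have any
misstable summarize
```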
4. List the variables after the command. There are two shortcuts for specifying several variables at once:
- pill-natural (first variable - last variable)
- fphour* (root variable name plus *)
These two methods work with any Stata command that accepts a list of variables. To use the first method, you need to know the order of the variables in the Stata data file. Use the describe command to see that order, or look for it in the Variables window.
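The two shortcuts can be tried on any dataset. The sketch below assumes the auto example dataset that ships with Stata; the variable names price, weight, trunk, and turn belong to that dataset, not to the survey file.

```stata
* Sketch only: uses Stata's shipped example data, not exampfac.dta
sysuse auto, clear

* list the variable names in dataset order
describe, simple

* range: every variable from price through weight, in dataset order
summarize price-weight

* wildcard: every variable whose name starts with t (trunk and turn)
summarize t*
```

Because a range depends on variable order, reordering the dataset (for example with the order command) changes which variables a range such as price-weight covers.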
5. The codebook command gives univariate statistics about numeric variables, and it is a handy way to get information about string variables.
6. Use tab (tabulate) for one variable, or tab1 for a list of variables:
- tab factype (for a single variable)
- tab1 pill-natural (necessary for lists of variables)
Another handy command is fre, written by Ben Jann at the University of Bern. It is not built into Stata, but we have installed it on all terminal servers at CPC. Like many user-contributed Stata commands, it is available for free from the SSC archive at Boston College. You can install it on your desktop or laptop by typing:
ssc install fre
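The difference between tab and tab1 shows up as soon as two variables are listed. The sketch below assumes the auto example dataset that ships with Stata, with foreign and rep78 standing in for the survey variables; the last line runs only if fre has been installed as described above.

```stata
* Sketch only: uses Stata's shipped example data, not exampfac.dta
sysuse auto, clear

* one variable: a one-way frequency table
tab foreign

* tab1 with a list: a separate one-way table for each variable
tab1 foreign rep78

* tab with two variables is NOT two one-way tables; it is a cross-tabulation
tab foreign rep78

* user-written alternative; requires ssc install fre first
fre foreign
```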
7. Each option changes what tabulate displays:
- nolabel displays the numeric values instead of the value labels
- missing shows how many observations have missing values
- plot gives a graphical comparison of the frequencies
For more information on how Stata handles missing values, see missing values in the Miscellaneous Tips and Tricks section of this tutorial.
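Running the options one at a time makes it easier to see what each contributes. The sketch below assumes the auto example dataset that ships with Stata; foreign carries value labels and rep78 has some missing values, so they stand in for the survey variables.

```stata
* Sketch only: uses Stata's shipped example data, not exampfac.dta
sysuse auto, clear

* default: rows are shown with their value labels
tabulate foreign

* nolabel: rows are shown with the underlying numeric codes
tabulate foreign, nolabel

* default: observations with missing rep78 are left out of the table
tabulate rep78

* missing: missing values get their own row
tabulate rep78, missing

* plot: a simple text bar chart of the frequencies replaces the percentages
tabulate rep78, plot
```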
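8. List two variables on tabulate, or use tab2:
- tab factype urbrur
- tab2 factype urbrur
These two commands are equivalent when exactly two variables are listed. With a longer list, tab2 produces a two-way table for every pair of variables.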
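Two-way tables and their percentage options can be explored as follows. The sketch assumes the auto example dataset that ships with Stata; foreign, rep78, and headroom are variables of that dataset and stand in for the survey variables.

```stata
* Sketch only: uses Stata's shipped example data, not exampfac.dta
sysuse auto, clear

* cross-tabulation with counts only
tabulate foreign rep78

* add row, column, and cell percentages under each count
tabulate foreign rep78, row col cell

* tab2 with three variables: one two-way table for each of the three pairs
tab2 foreign rep78 headroom
```

Requesting row, col, and cell together produces a dense table; asking for only the percentage you need is usually easier to read.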