Describing the data

Describing the variables: means, univariate statistics, frequencies, and data types.

Now we'll use some real data from a health facility survey conducted in Tanzania in 1999. Type each of the following commands and observe the results. Many commands accept abbreviations. For example, su is short for summarize, and ta or tab is short for tabulate; several of the commands below use these short forms.
clear
use "t:\statatut\exampfac.dta"
describe
summarize
su pill-natural
su fphour*
codebook urbrur facname
tabulate urbrur
ta urbrur,nolabel missing plot
ta factype urbrur
tab1 factype urbrur
tab2 factype urbrur
tab2 factype urbrur, row col cell
Questions:

1. The describe command lists each variable in Stata's memory. What do the terms "double," "str42," "byte," etc. in the second column refer to?
Answers:

1. That is the storage (data) type of each variable. Each data type holds a different kind of data. The following table describes the data types used by Stata:

Type     Min          Max          Precision   Bytes   Kind
----------------------------------------------------------------
byte     -2 digits    2 digits     2 digits    1       integer
int      -4 digits    4 digits     4 digits    2       integer
long     -9 digits    9 digits     9 digits    4       integer
float    -10**38      10**38       10**-8      4       real
double   -10**307     10**307      10**-16     8       real
str1     1            1                        1       character
str80    1            80                       80      character
str244   1            244                      244     character
----------------------------------------------------------------

For the integer types, "2 digits" means any integer with up to two digits, stored exactly. The exact byte range is -127 to 100. For the string types, Min and Max are lengths in characters, and the number in the type name is the maximum length (str80 holds up to 80 characters).

In this version of Stata, strings are limited to 244 characters (Stata 13 and later allow much longer strings). To see this limit and the limits on nearly everything else in Stata, type the command help limits. At CPC we use Stata/SE, so the right-hand column of that listing applies.

2. You can specify a data type on the generate command:
gen byte a=0
If you don't specify a data type, by default Stata uses type float (4 bytes).
Choosing the smallest data type that can hold a variable's values reduces the size of the dataset, both in memory and on disk. This matters for very large files and for computers with little memory (RAM).
After variables have been generated, the compress command changes each variable to the smallest data type that can hold its values without any loss of information.
See compress in Miscellaneous Tips and Tricks for details on compress.
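The following minimal sketch, which is not part of the original exercise, shows default and explicit storage types and what compress does to them. It builds a tiny made-up dataset, so the variables a, b, and c are hypothetical. It assumes the default setting of set type (float). Note that clear discards whatever is in memory, so reload exampfac.dta with use afterwards.

```stata
* Sketch: storage types with generate, then compress.
* Variables a, b, c are hypothetical; assumes set type is the default (float).
clear
set obs 100
generate a = 0          // no type given: stored as float (4 bytes)
generate byte b = 1     // type given explicitly: byte (1 byte)
generate c = _n         // values 1 to 100, also stored as float
describe a b c          // storage type column: float, byte, float
compress                // a and c hold only integers that fit in a byte
describe a b c          // all three are now byte
```

The values of c stop at 100 because that is the largest value a byte can hold. With 101 observations, compress would choose int for c instead.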
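3. The "Obs" column of the summarize output displays the number of non-missing observations for numeric variables. For string variables, like facname, it is always 0.

4. There are two ways to specify a variable list, both shown in the example. One is a hyphenated range, as in su pill-natural, which selects every variable from pill through natural in the order they are stored in the dataset. The other is a wildcard, as in su fphour*, which selects every variable whose name begins with fphour.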
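Here is a small sketch of the two variable-list shorthands on a made-up dataset. The variables x1, x2, y, and x3 are hypothetical, not taken from exampfac.dta, and clear discards the data in memory.

```stata
* Sketch: range and wildcard varlists on a made-up dataset.
* Variables x1, x2, y, x3 are hypothetical.
clear
set obs 5
generate x1 = _n
generate x2 = 2*_n
generate y  = 3*_n
generate x3 = 4*_n
summarize x1-y    // range in dataset order: x1, x2, y
summarize x*      // wildcard: x1, x2, x3
```

Because x3 was created after y, the range x1-y leaves it out even though its name looks like it belongs with the others. A range follows the variable order shown by describe, not alphabetical order.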
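5. The codebook command gives univariate statistics about numeric variables. It is also a handy way to get information about string variables.

6. The two ways to get one-way frequencies are:
- tabulate with a single variable, as in tabulate urbrur.
- tab1, which produces a separate one-way table for each variable listed, as in tab1 factype urbrur.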
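This sketch compares the two one-way methods on a small made-up dataset. The variables region and kind are hypothetical, and clear discards the data in memory.

```stata
* Sketch: one-way frequency tables on a made-up dataset.
* Variables region and kind are hypothetical.
clear
input region kind
1 1
1 2
2 1
2 1
. 2
end
tabulate region       // one variable: one one-way table (missing value left out)
tab1 region kind      // a separate one-way table for each variable listed
```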
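7. These three options give extra information about the variable urbrur:
- nolabel displays the numeric codes instead of the value labels.
- missing includes missing values as a category in the table.
- plot adds a simple bar chart of the frequencies.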
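The sketch below shows the effect of nolabel, missing, and plot on a labeled variable. The variable area, the label name arealbl, and the 1 = Urban, 2 = Rural coding are all hypothetical. The actual coding of urbrur in exampfac.dta may differ.

```stata
* Sketch: the nolabel, missing, and plot options of tabulate.
* Variable area and label arealbl are hypothetical; urbrur's real coding may differ.
clear
input area
1
1
2
.
end
label define arealbl 1 "Urban" 2 "Rural"
label values area arealbl
tabulate area                          // shows labels; the missing value is left out
tabulate area, nolabel missing plot    // shows codes 1 and 2, a row for ., and a bar chart
```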
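8. The two ways to get two-way frequencies are:
- tabulate with two variables, as in ta factype urbrur.
- tab2, which produces a two-way table for every pair of variables listed, as in tab2 factype urbrur.

The row, col, and cell options add row, column, and cell percentages to the table.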
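This sketch shows two-way tables on a made-up dataset. The variables ftype, area, and owner are hypothetical, and clear discards the data in memory. With only two variables, tab2 gives the same table as tabulate, so a third variable is included to show that tab2 crosses every pair.

```stata
* Sketch: two-way tables with tabulate and tab2.
* Variables ftype, area, owner are hypothetical.
clear
input ftype area owner
1 1 1
1 2 2
2 1 1
2 2 2
2 2 1
end
tabulate ftype area                   // one two-way table of counts
tabulate ftype area, row col cell     // same table plus row, column, and cell percentages
tab2 ftype area owner                 // ftype x area, ftype x owner, area x owner
```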
Questions or comments? If you are affiliated with the Carolina Population Center, send them to Phil Bardsley.

