# Describing the data

### Describing the variables: means, univariates, frequencies, and data types.

Now we'll use some real data. The data are from a health facility survey conducted in Tanzania in 1999. Copy each of these commands into the Command window, press Enter, and observe the results. Several commands can be abbreviated: for example, **su** is short for **summarize** and **tab** is short for **tabulate**. If you see "--more--" at the bottom of the Results window, press the Space Bar to see the next page of output, or press "q" to stop the command.

```stata
clear
use "q:\utilities\statatut\exampfac.dta"
describe
summarize
su pill-natural
su fphour*
codebook urbrur facname
tabulate urbrur
tab urbrur, nolabel missing plot
tab factype urbrur
tab1 factype urbrur
tab2 factype urbrur
tab2 factype urbrur, row col cell
```

### Questions:

1. The **describe** command lists each variable in Stata's memory. What do the terms "double," "str42," "byte," etc. in the second column refer to?

2. How do I specify which data type I want to use?

3. The **summarize** command lists the number of observations, mean, standard deviation, min, and max for a variable. Why is the number of observations different for some variables, and is even 0 for facname?

4. How can I summarize a specific set of variables?

5. When is the **codebook** command useful?

6. The **tabulate** command gives frequencies (counts), and is most useful with categorical variables. What are the two ways to specify a one-way frequency?

7. What do the **nolabel**, **missing**, and **plot** options do on the **tabulate** command?

8. How can I get two-way frequencies (cross-tabulations)?

### Answers:

1. These are the storage (data) types of the variables. Each data type handles a different kind of data. The following table describes the data types used by Stata:

| Storage type | Min | Max | Precision | Bytes | Kind |
|---|---|---|---|---|---|
| byte | -2 digits | 2 digits | 2 digits | 1 | integer |
| int | -4 digits | 4 digits | 4 digits | 2 | integer |
| long | -9 digits | 9 digits | 9 digits | 4 | integer |
| float | -10^38 | 10^38 | 10^-8 | 4 | real |
| double | -10^307 | 10^307 | 10^-16 | 8 | real |
| str1 | | 1 | 1 | 1 | string |
| str2 | | 2 | 2 | 2 | string |
| ... | | ... | ... | ... | ... |
| str2045 | | 2045 | 2045 | 2045 | string |
| strL | | 2000000000 | 2000000000 | 2000000000 | long string |

Prior to Stata version 13, string variables were limited to 244 characters. Stata 13 raised that limit to 2045 characters (str2045) and added a new data type, **strL**, that can hold strings of up to *2 billion characters.* You can see this, and the limits on just about everything else in Stata, by typing the command **help limits**. There's a detailed explanation of data types in the PDF documentation and under **help data types**.
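To see what the Precision column means in practice, here is a minimal sketch that stores the same integer, 16,777,217 (2^24 + 1), once as a float and once as a double. A float keeps only about 7 significant digits, so it cannot hold this value exactly. The variable names f, d, and x are arbitrary choices for illustration.

```stata
* Sketch: float vs. double precision (variable names are hypothetical)
clear
set obs 1

* 16777217 = 2^24 + 1 needs 25 bits of precision; a float has only 24
generate float  f = 16777217
generate double d = 16777217

* Show every digit: the float has been rounded to 16777216, the double is exact
format f d %12.0f
list f d, noobs

* A common consequence: 0.1 is stored inexactly as a float, so compare
* a float variable with float(0.1) rather than with the literal 0.1
generate x = 0.1              // no type given, so x is a float
count if x == 0.1             // 0 observations match
count if x == float(0.1)      // 1 observation matches
```

When exact values matter (IDs with many digits, for example), use long or double instead of the default float.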

2. You can specify a data type on the **generate** command:

```stata
gen byte a=0
```

If you don't specify a data type, Stata uses type float (4 bytes) by default. Using an efficient data type reduces the file size, which matters for very large files or for computers with little memory (RAM). The **compress** command changes variables to the most efficient data type after they have been generated. See compress in Miscellaneous Tips and Tricks for details.
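The sketch below shows the default type, an explicit type, and **compress** side by side. It builds a throwaway dataset, so the variable names a, b, and c are hypothetical, and it assumes the default type has not been changed with **set type**.

```stata
* Sketch: default storage type, explicit type, and compress
clear
set obs 100

generate a = 0            // no type given, so Stata stores a as float
generate byte b = 0       // type requested explicitly: byte (1 byte)
generate c = _n           // values 1 to 100, also float by default

describe a b c            // a and c are float (4 bytes), b is byte

compress                  // demote each variable to the smallest type
                          // that holds all of its values without loss
describe a b c            // a, b, and c are now all byte
```

Note that c fits in a byte only because its largest value is 100, the upper limit for byte; with 101 observations **compress** would choose int instead.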

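3. The "Obs" column counts only non-missing observations, so variables with missing values show fewer observations than others. **summarize** does not compute statistics for string variables such as facname, so it always reports 0 observations for them.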

4. There are two ways to specify a variable list, both shown in the example:

- pill-natural (first variable - last variable)
- fphour* (root variable name plus *)

These two methods work with any Stata command that accepts a list of variables. To use the first method, you need to know the order of the variables in the dataset. Use the **describe** command to see that order, or look in the Variables window.
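If you don't have access to the tutorial file, you can practice both methods on the auto dataset that ships with Stata. This sketch uses auto only as a stand-in for exampfac.dta; in auto the variables are stored in the order make, price, mpg, rep78, headroom, trunk, weight, length, turn, and so on.

```stata
* Sketch: variable lists, using Stata's built-in auto dataset as a stand-in
sysuse auto, clear

describe, numbers            // show each variable's position in the dataset

summarize price-rep78        // range: price, mpg, and rep78
summarize t*                 // wildcard: trunk and turn
summarize price mpg weight   // or simply name the variables you want
```

Explicitly naming variables, as in the last line, is the simplest choice when the variables you want are neither adjacent nor share a common prefix.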

5. The codebook command gives univariate statistics about numeric variables, and it is a handy way to get information about string variables.
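Here is a short sketch of **codebook** on one string and one numeric variable, again using Stata's built-in auto dataset as a stand-in for the tutorial file. In auto, make is a string variable and rep78 is numeric with some missing values.

```stata
* Sketch: codebook, using Stata's built-in auto dataset as a stand-in
sysuse auto, clear

codebook make rep78     // detailed report for each variable, including
                        // storage type, unique values, and missing values
codebook, compact       // one summary line per variable for a quick overview
```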

6. The two ways to get one-way frequencies are:

- tab factype (for a single variable)
- tab1 pill-natural (needed for a list of variables; produces a separate one-way table for each)
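**tab1** is needed for lists because **tabulate** treats two variable names as a request for a cross-tabulation. This sketch shows the difference on Stata's built-in auto dataset, used here as a stand-in for the survey data.

```stata
* Sketch: one-way tables, using Stata's built-in auto dataset as a stand-in
sysuse auto, clear

tabulate rep78           // one-way table for a single variable
tab1 rep78 foreign       // two separate one-way tables, one per variable
tabulate rep78 foreign   // NOT two one-way tables: a single cross-tabulation
```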

Another handy command is **fre**, written by Ben Jann at the University of Bern. It is not built into Stata, but we have installed it on all terminal servers at CPC. Like many user-contributed Stata commands, it is available for free from the SSC archive at Boston College. You can install it on your desktop or laptop by typing:

```stata
ssc install fre
```

7. These three options give extra information about the variable urbrur:

- **nolabel** displays the numeric values instead of the value labels.
- **missing** treats missing values as a category, so the table shows how many observations are missing.
- **plot** adds a simple text bar chart comparing the frequencies.

For more information on how Stata handles missing values, see missing values in the Miscellaneous Tips and Tricks section of this tutorial.
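The effect of these options is easy to see on a variable with missing values. This sketch uses Stata's built-in auto dataset as a stand-in for urbrur: foreign is a labeled 0/1 variable, and rep78 has 5 missing values among 74 cars.

```stata
* Sketch: tabulate options, using Stata's built-in auto dataset as a stand-in
sysuse auto, clear

tabulate foreign             // shows the labels Domestic and Foreign
tabulate foreign, nolabel    // shows the underlying codes 0 and 1
tabulate rep78               // 69 observations: missing values are left out
tabulate rep78, missing      // 74 observations: missing (.) gets its own row
tabulate rep78, plot         // text bar chart of the frequencies
```

Note that without **missing**, the percentages are computed over non-missing observations only.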

8. The two ways to get two-way frequencies are:

- tab factype urbrur
- tab2 factype urbrur

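For two variables, these two commands produce the same table. **tab2** also accepts a longer list of variables and produces a two-way table for every pair.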
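The example commands at the top of this page also use the **row**, **col**, and **cell** options, which add percentages to a two-way table. This sketch shows each option separately on Stata's built-in auto dataset, used as a stand-in for factype and urbrur.

```stata
* Sketch: two-way tables, using Stata's built-in auto dataset as a stand-in
sysuse auto, clear

tabulate foreign rep78            // counts only
tabulate foreign rep78, row       // row percentages: each row sums to 100%
tabulate foreign rep78, col       // column percentages: each column sums to 100%
tabulate foreign rep78, cell      // cell percentages: all cells sum to 100%
tab2 foreign rep78 headroom       // three tables, one for every pair of variables
```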