# Groups and subsets of data

### Groups and subsets of data, comparison, missing values.

We'll continue using the 1999 Tanzania Facility Survey data. Copy and paste the following commands into the Command window, press Enter, and see what happens. There is no need to copy the comments (surrounded by /* */). Note the double equal signs (==) in the tab commands.

clear
use "q:\utilities\statatut\exampfac.dta"

/* types of government facilities */

tab factype if authorit==1

/* mean age of hospitals */

su age if factype<=4

/* availability of condoms at religious facilities */

tab malecond if authorit==4 | (authorit>=7 & authorit<=9), missing

/* mean age of facilities by urban-rural location */

sort urbrur
by urbrur: su age

/* mean age of facilities that offer family planning by urban-rural location */

by urbrur: su age if pill<.

/* display the first 10 observations in memory */

list factype authorit urbrur in 1/10

For details on subgroup processing, see by command in the Miscellaneous Tips and Tricks section of this tutorial.

### Questions:

1. The if qualifier on the tab command restricts the command to observations that meet the condition that follows it. In this case, the condition is that authorit is 1 (government). What is the difference between the following two expressions?

authorit=1

and

authorit==1

2. The == sign is called a relational operator. What relational, logical, and arithmetic operators are available in Stata?

3. The sort command puts the observations in memory in ascending order of the values of one or more variables. In this case, the variable is urbrur. What does the command sort factype urbrur do?

4. The by prefix executes the command that follows it once for each value of the by variable. When would you use by instead of tab2 to see data separately by groups?

5. What does the phrase pill<. mean?

6. How are missing values stored in Stata data?

7. The in qualifier lets you restrict a command to a range of observations. What do the numbers "1" and "10" refer to in this example?

8. What is another way, using _n, to write "in 1/10"?

### Answers:

1. A single equal sign means give the value on the right to the variable on the left; in other words, it means "assignment." A double equal sign means check whether the variable on the left has the value on the right; in other words, it means "comparison."
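
The distinction can be tried on a small made-up dataset. This is a minimal sketch, not part of the facility data: the variable x is hypothetical and is created here only for illustration.

```stata
clear
set obs 5
* Single = assigns: create x holding the observation numbers 1 to 5
generate x = _n
* Double == compares: count the observations where x equals 3
count if x == 3
* Both in one command: assign 0 (=) only where the comparison (==) is true
replace x = 0 if x == 3
list x
```

In the replace command, the single = sets the new value and the double == after if picks the observations to change.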

2. Here is the full list of relational, logical, and arithmetic operators:

==   equal to
!=   not equal to
~=   not equal to
>    greater than
>=   greater than or equal to
<    less than
<=   less than or equal to
~    not
!    not
&    and
|    or
+    addition
-    subtraction
*    multiplication
/    division
^    power
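
A few of these operators can be combined as in the sketch below. It assumes the auto example dataset that ships with Stata (loaded with sysuse auto); the variables foreign and mpg belong to that dataset, not to the facility survey.

```stata
sysuse auto, clear
* ! and ~ both mean "not", so these two counts must match
count if !(foreign == 1)
local n_bang = r(N)
count if ~(foreign == 1)
assert r(N) == `n_bang'
* & (and) combined with | (or); parentheses make the grouping explicit
count if foreign == 1 | (mpg >= 20 & mpg <= 30)
```

The parentheses play the same role as in the malecond example at the top of this page: they state explicitly which conditions belong together.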

3. It puts the data in ascending order of factype, and within each value of factype it orders the data in ascending order of urbrur. The gsort command allows you to sort in descending as well as ascending order.
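
Nested sorting and gsort can be compared side by side. This is a minimal sketch assuming the auto example dataset that ships with Stata; foreign, rep78, and price are variables in that dataset, used here only as stand-ins.

```stata
sysuse auto, clear
* Ascending by foreign, then ascending by rep78 within each value of foreign
sort foreign rep78
list foreign rep78 in 1/5
* gsort: a minus sign in front of a variable requests descending order
gsort foreign -price
list foreign price in 1/5
```

After the gsort command, the first rows show the most expensive cars within the first value of foreign.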

4. The tab2 command works best with categorical data, while the by prefix works best with continuous variables. For example, if a and b are categorical variables and the data are sorted by a:

by a: tab b

shows the same counts as:

tab2 a b

The tab2 command gives more concise output and offers chi-square and other statistics.

For n-way analyses of continuous data, consider the tabsum, tabstat, or table commands instead, such as:

tabulate urbrur, summarize(age)
tabstat age, by(urbrur) stats(mean n)
table urbrur, contents(mean age n age)

The table command is particularly powerful and handles multiple levels of conditioning variables. See help for details.
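
The difference in output style is easiest to see by running the alternatives one after another. The sketch below assumes the auto example dataset that ships with Stata, with foreign standing in for a grouping variable and mpg for a continuous one.

```stata
sysuse auto, clear
* by needs sorted data; bysort sorts and applies by in one step
bysort foreign: summarize mpg
* The same group means and counts in one compact table
tabstat mpg, by(foreign) statistics(mean n)
tabulate foreign, summarize(mpg)
```

The bysort version prints a full summarize block for each group, while the other two commands collect the group statistics in a single table.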

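5. It means "pill is less than missing." Because missing values are stored as larger than any valid number, the condition is true only for observations where pill is not missing.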

6. Missing values are stored as a number larger than the largest allowable value for the data type. So you need to be careful when using the "greater than" operator. This command includes observations with missing values of age:

tab factype if age>=25

If you don't want the missing values, you need to exclude them explicitly:

tab factype if age>=25 & age<.

You can specify up to 27 different types of missing values. They are: ".", ".a", ".b", ... ,".z". (During data entry, you can use these to differentiate among Refused, Not Applicable, Don't Know, and other possible reasons for missing values.) These are the largest values allowed by the data type, so you can use "<." to exclude all 27 missing values for a variable.
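
The effect can be checked on a tiny made-up dataset. In this sketch the four values of age are invented for illustration: two valid ages, one ordinary missing value (.), and one extended missing value (.a).

```stata
clear
input age
10
30
.
.a
end
* Missing values count as larger than any number: 30, . and .a all qualify
count if age >= 25
* Adding age < . drops both kinds of missing value and leaves only 30
count if age >= 25 & age < .
* missing() is true for . and for .a through .z
count if missing(age)
```

The first count reports 3 observations, the second reports 1, and the third reports 2.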

7. The numbers are observation numbers: "in 1/10" selects the first through the tenth observation in the current sort order. Stata makes the observation number available as the system variable "_n". This number is not saved if you save a permanent data file. Furthermore, this number changes if you change the sort order of the data.

8. You can write:

list factype authorit urbrur if _n <= 10
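
The two forms can be compared directly. This is a minimal sketch assuming the auto example dataset that ships with Stata; make and price are variables in that dataset, used here in place of the facility variables.

```stata
sysuse auto, clear
* Both commands list the first ten observations in the current sort order
list make price in 1/10
list make price if _n <= 10
* Re-sorting changes which observations carry the numbers 1 to 10
sort price
list make price in 1/10
```

The last list shows different cars from the first two, because _n follows the current sort order rather than staying attached to a particular observation.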
