Groups and subsets of data, comparison, missing values
We'll continue using the 1999 Tanzania Facility Survey data. Type the following commands and see what happens. No need to type in the comments (surrounded by /* */). Note the double equal signs (==) in the ta commands.
clear
use "t:\statatut\exampfac.dta"
/* types of government facilities */
ta factype if authorit==1
/* mean age of hospitals */
su age if factype<=4
/* availability of condoms at religious facilities */
ta malecond if authorit==4 | (authorit>=7 & authorit<=9),missing
/* mean age of facilities by urban-rural location */
sort urbrur
by urbrur: su age
/* mean age of facilities that offer family planning by urban-rural location */
by urbrur: su age if pill<.
/* display the first 10 observations in memory */
list factype authorit urbrur in 1/10
For details on subgroup processing, see by command in the Miscellaneous Tips and Tricks section of this tutorial.
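As a shortcut, the separate sort and by steps used above can be combined into one line with bysort. The sketch below uses a tiny made-up dataset (the four observations are invented for illustration and do not come from the Tanzania survey), so it runs without the tutorial data file.

```stata
clear
/* hypothetical data: 2 locations, 2 facilities each */
input urbrur age
1 10
2 30
1 20
2 50
end
/* bysort sorts by urbrur, then runs su once per group */
bysort urbrur: su age
```

With these invented values, the mean age should be 15 in group 1 and 40 in group 2.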
Questions:
1. The if qualifier on the ta command restricts the command to observations that meet the condition that follows it. In this case, the condition is that authorit is 1 (government). What is the difference between authorit=1 and authorit==1 in the ta command?
Answers:
1. A single equal sign means give the value on the right to the variable on the left; in other words, it means "assignment." A double equal sign means check whether the variable on the left has the value on the right; in other words, it means "comparison."
2. Here is the full list of relational, logical, and arithmetic operators:
== equal to
> greater than
>= greater than or equal to
< less than
<= less than or equal to
!= not equal to (also written ~=)
~ not
! not
& and
| or
+ addition
- subtraction
* multiplication
/ division
^ power
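To see assignment and comparison side by side, here is a minimal sketch on a made-up dataset. The three observations and the variable name gov are invented for illustration; only authorit and factype are names from the tutorial data.

```stata
clear
/* hypothetical data: two government facilities and one other */
input authorit factype
1 1
1 5
4 2
end
/* the single = assigns; the == inside the parentheses compares */
/* gov is 1 where authorit equals 1 and 0 otherwise */
generate gov = (authorit == 1)
/* comparison in an if qualifier: counts 2 observations */
count if authorit == 1
/* & (and) combines two comparisons: counts 1 observation */
count if authorit == 1 & factype <= 4
```

The same pattern works with any of the relational operators: put the comparison after if to restrict a command, or inside generate to store the result as a 0/1 variable.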
3. It puts the data in ascending order of factype, and within each value of factype it orders the data in ascending order of urbrur. The gsort command allows you to sort in descending as well as ascending order.
4. The tab2 command works best with categorical data, while the by prefix works best with continuous variables. For example, if a and b are categorical variables:
by a: ta b
is the same as:
tab2 a b
The tab2 command gives more concise output and offers chi-square and other statistics. For n-way analyses of continuous data, consider the tabsum, tabstat, or table commands instead, such as:
tabulate urbrur, summarize(age)
tabstat age, by(urbrur) stats(mean n)
table urbrur, contents(mean age n age)
The table command is particularly powerful and handles multiple levels of conditioning variables. See help for details.
5. It means "pill is less than missing," that is, pill has a nonmissing value.
6. Missing values are stored as numbers larger than the largest allowable value for the data type. So you need to be careful when using the "greater than" operator. This command includes missing values of age:
ta factype if age>=25
If you don't want missing, you need to specifically exclude it:
ta factype if age>=25 & age<.
Starting with Version 8 of Stata, you can specify up to 27 different types of missing values: ".", ".a", ".b", ... , ".z". (During data entry, you can use these to differentiate among Refused, Not Applicable, Don't Know, and other possible reasons for missing values.) All 27 are larger than any nonmissing value, and "." is the smallest of them, so you can use "<." to exclude all 27 missing values for a variable.
7. The numbers refer to the temporary variable "_n" that Stata creates for each observation in memory. This number is not saved if you save a permanent data file. Furthermore, this number changes if you change the sort order of the data.
8. You can write:
list factype authorit urbrur if _n<=10
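The point that _n changes with the sort order can be checked directly. This is a small sketch on made-up data; the three observations and the variable name origorder are invented for illustration.

```stata
clear
/* hypothetical data, entered in no particular order */
input factype age
3 40
1 10
2 25
end
/* store each observation's current position before sorting */
generate origorder = _n
sort age
/* _n now runs 1, 2, 3 in the new order; origorder keeps the old one */
list factype age origorder if _n <= 2
/* in 1/2 selects the same two observations */
list factype age origorder in 1/2
```

If you need a permanent observation number, save a copy of _n in a regular variable as done here, since _n itself is recomputed whenever the order changes.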
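Returning to the missing-value trap in answer 6, the following sketch shows the effect on a made-up dataset. The four values of age are invented for illustration, and .a stands in for any of the extended missing codes.

```stata
clear
/* hypothetical data: two real ages and two kinds of missing */
input age
10
30
.
.a
end
/* missing is larger than any number, so this counts 30, . and .a */
count if age >= 25
/* adding age < . leaves only the real value 30 */
count if age >= 25 & age < .
```

The first count reports 3 and the second reports 1, which is why the extra condition age<. matters whenever you use > or >=.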
Questions or comments? If you are affiliated with the Carolina Population Center, send them to Phil Bardsley.

