# The by command in detail

### More examples of the by command.

This page is intended to show how to use the **by** command to create variables and access specific observations in sub-groups. Sub-group processing breaks down the dataset into multiple mini-datasets. The way to think about it is as if each sub-group is the entire dataset.

The way to access data from different observations is with internal Stata variables **_n** and **_N**. The variable **_n** is equal to the numeric position of an observation and **_N** is equal to the total number of observations. For example, in a dataset of 10 observations, in the first observations **_n** is equal to 1 and **_N** is equal to 10. In the second observation **_n** is equal to 2 and **_N** is equal to 10, etc.

v001 _n _N ---- -- -- 1002 1 10 1002 2 10 1002 3 10 1002 4 10 1002 5 10 1003 6 10 1003 7 10 1003 8 10 1004 9 10 1004 10 10

When using the **by** command, **_n** is equal to 1 for the first obs of the by-group and **_N** is equal to the total number of observations for the by-group. So if a person has 5 observations in the data set and **v001** is the id variable, "**by v001:** " creates by-groups where the first observation in the by-group **_n=1** and **_N=5**. The second observation **_n=2** and **_N=5** , etc. The last observation for that person **_n** will equal **_N**.

**Note**: **if** conditions in the command being run by the **by** command do not subset the by group so **_n** and **_N** are not affected by the **if**condition.

by v001:

makes the data look like this:

v001 _n _N ---- -- -- 1002 1 5 1002 2 5 1002 3 5 1002 4 5 1002 5 5 1003 1 3 1003 2 3 1003 3 3 1004 1 2 1004 2 2

If you want to keep the first obervation per person do:

by v001: keep if _n == 1

second observation is:

by v001: keep if _n == 2

second to the last observation is:

by v001: keep if _n == _N - 1

last observation is:

by v001: keep if _n == _N

The following is Stata code that plays with this concept.

NOTE: the following is a **do-file**. Highlight and copy all text between the horizontal bars and paste into the Stata interactive do-file editor window so that you can submit this do-file.

use "q:\utilities\statatut\examfac2.dta",clear keep facid factype q102_1-q103_20 ** q103_1-q103_20 are # hours/week usually worked ** ** facid is the id var ** ** q102_1-q102_20 are titles of facility workers ** label list title type /** The following code figures out how many total hours per week * different types of workers at a health facility work. **/ sort facid /* Get rid of duplicate ids. */ drop if facid == facid[_n-1] /* This would also keep the first obs in a duplicate id situation: * by facid: keep if facid[_n] == 1 */ /* Make the data set multiple obs per facility. */ reshape long q102_ q103_, i(facid) j(position) /** Save the data set as a temporary data set in case you want to try another idea. **/ preserve gen ttl_h1= 0 replace ttl_h1=q103_ if q103_ != 99 & q102_ == 1 by facid: gen ttl_th1=ttl_h1[1]+ttl_h1[2]+ttl_h1[3]+ttl_h1[4]+ttl_h1[5]+ /// ttl_h1[6]+ttl_h1[7]+ttl_h1[8]+ttl_h1[9]+ttl_h1[10]+ /// ttl_h1[11]+ttl_h1[12]+ttl_h1[13]+ttl_h1[14]+ttl_h1[15]+ /// ttl_h1[16]+ttl_h1[17]+ttl_h1[18]+ttl_h1[19]+ttl_h1[20] list facid factype q102_ q103_ ttl_h1 ttl_th1 in 1/500 if factype==3 /* or you could use the -forvalues- command */ replace ttl_th1= 0 forvalues num= 1/20 { by facid: replace ttl_th1= ttl_h1[`num'] + ttl_th1 } /* If you wanted totals for all the other titles you'd have * to repeat the above 19 more times. */ /* You could use the -egen- command to sum up work hours per title for each facilty * and for commands to reduce the amount of typing. */ clear restore preserve foreach var of newlist ttl_h1-ttl_h10 { gen `var'= . } forvalues num = 1/10 { replace ttl_h`num'= q103_ if q103_ != 99 & q102_ == `num' egen ttl_th`num'= sum(ttl_h`num'), by(facid) } // examples of when factype == 3 in the first 500 obs list facid factype q102_ q103_ ttl_h1 ttl_th1 in 1/500 if factype == 3 /** Keep only the variables that represent all obs of the facility. **/ keep facid factype ttl_th1-ttl_th10 /** Keep only the last obs per facid, though any of it's obs would be just as good. **/ by facid: keep if _n == _N list in 1/500 if factype == 3