You are here: Home / The by command in detail

# The by command in detail

### More examples of the by command.

This page is intended to show how to use the by command to create variables and access specific observations in sub-groups.  Sub-group processing breaks down the dataset into multiple mini-datasets.  The way to think about it is as if each sub-group is the entire dataset.

The way to access data from different observations is with internal Stata variables _n and _N.  The variable _n is equal to the numeric position of an observation and _N is equal to the total number of observations.  For example, in a dataset of 10 observations, in the first observations _n is equal to 1 and _N is equal to 10.  In the second observation _n is equal to 2 and _N is equal to 10, etc.

```
v001 _n _N
---- -- --
1002  1 10
1002  2 10
1002  3 10
1002  4 10
1002  5 10
1003  6 10
1003  7 10
1003  8 10
1004  9 10
1004 10 10```

When using the by command, _n is equal to 1 for the first obs of the by-group and _N is equal to the total number of observations for the by-group.  So if a person has 5 observations in the data set and v001 is the id variable, "by v001: " creates by-groups where the first observation in the by-group _n=1 and _N=5.  The second observation _n=2 and _N=5 , etc.  The last observation for that person _n will equal _N.

Note: if conditions in the command being run by the by command do not subset the by group so _n and _N are not affected by the ifcondition.

`by v001:`

makes the data look like this:

```
v001 _n _N
---- -- --
1002  1  5
1002  2  5
1002  3  5
1002  4  5
1002  5  5
1003  1  3
1003  2  3
1003  3  3
1004  1  2
1004  2  2```

If you want to keep the first obervation per person do:

` by v001: keep  if _n == 1`

second observation is:

` by v001: keep  if _n == 2`

second to the last observation is:

` by v001: keep  if _n == _N - 1`

last observation is:

` by v001: keep  if _n == _N`

The following is Stata code that plays with this concept.

NOTE:  the following is a do-file.  Highlight and copy all text between the horizontal bars and paste into the Stata interactive do-file editor window so that you can submit this do-file.

```use "q:\utilities\statatut\examfac2.dta",clear

keep facid factype q102_1-q103_20
** q103_1-q103_20 are # hours/week usually worked **

** facid is the id var **

** q102_1-q102_20 are titles of facility workers **
label list title type

/** The following code figures out how many total hours per week
* different types of workers at a health facility work. **/
sort facid

/* Get rid of duplicate ids. */
drop  if facid == facid[_n-1]

/* This would also keep the first obs in a duplicate id situation:
* by facid: keep  if facid[_n] == 1 */

/* Make the data set multiple obs per facility. */
reshape long q102_ q103_, i(facid) j(position)

/** Save the data set as a temporary data set in case you want to try another idea. **/
preserve
gen ttl_h1= 0
replace ttl_h1=q103_  if q103_ != 99 & q102_ == 1

by facid: gen ttl_th1=ttl_h1[1]+ttl_h1[2]+ttl_h1[3]+ttl_h1[4]+ttl_h1[5]+ ///
ttl_h1[6]+ttl_h1[7]+ttl_h1[8]+ttl_h1[9]+ttl_h1[10]+                   ///
ttl_h1[11]+ttl_h1[12]+ttl_h1[13]+ttl_h1[14]+ttl_h1[15]+               ///
ttl_h1[16]+ttl_h1[17]+ttl_h1[18]+ttl_h1[19]+ttl_h1[20]

list facid factype q102_ q103_ ttl_h1 ttl_th1 in 1/500 if factype==3

/* or you could use the -forvalues- command */
replace ttl_th1= 0
forvalues num= 1/20 {
by facid: replace ttl_th1= ttl_h1[`num'] + ttl_th1
}

/* If you wanted totals for all the other titles you'd have
* to repeat the above 19 more times. */

/* You could use the -egen- command to sum up work hours per title for each facilty
* and for commands to reduce the amount of typing. */

clear
restore
preserve

foreach var of newlist ttl_h1-ttl_h10 {
gen `var'= .
}

forvalues num = 1/10 {
replace ttl_h`num'= q103_  if q103_ != 99 & q102_ == `num'
egen ttl_th`num'= sum(ttl_h`num'), by(facid)
}

// examples of when factype == 3 in the first 500 obs
list facid factype q102_ q103_ ttl_h1 ttl_th1  in 1/500  if factype == 3

/** Keep only the variables that represent all obs of the facility. **/
keep facid factype ttl_th1-ttl_th10

/** Keep only the last obs per facid, though any of it's obs would be just as good. **/
by facid: keep  if _n == _N

list  in 1/500  if factype == 3```

This page was contributed by Dan Blanchette, however please direct questions to Phil Bardsley as noted below.

Review again?

Another topic?

Wink Plone Theme by Quintagroup © 2013.

##### Personal tools
This is themeComment for Wink theme