Match merging

 

Many observations match, but others don't

In this example, we have data from the 1996 Tanzania Facility Survey that we want to merge with data from the 1999 survey. This will allow us to examine trends. The 1999 survey was not designed as a followup of the 1996 survey, so not all the same facilities were visited in the two surveys. To simplify the programming, we've only included facilities in the 1999 file that were also surveyed in 1996.

For this example, we'll look at trends in the availability of the 4 major contraceptive methods, so the two files contain only those 4 variables in addition to the single identifier: facid.

Prior to Stata 11, the data had to be sorted on the identifying variables before merging. Now you have a choice of syntax for merging. We've shown you both methods below, the old which requires sorting, and the new which does not (Stata will sort the data for you when you use the new syntax). Both the old and new syntax work in Stata 11 and after, but the new syntax is better.

The syntax is the same as the "One-to-one merging" example, and in fact this is one-to-one merging. The difference is that in the earlier example we expected a perfect match, but here we don't.

You can type the following commands into Stata, or copy them into a do-file and run them.


 

Merge using New Syntax (Stata 11 and later)

clear
use "q:\utilities\statatut\merge96.dta"
merge 1:1 facid using "q:\temp\statatut\merge99.dta"
drop _merge
/* Look at the Results Window - the merge is a mess! */

 

Merge using Old Syntax (Stata 10 and earlier)

/* Sort the 1999 file and save it temporarily */

clear
use "q:\utilities\statatut\merge99.dta"
sort facid
save "d:\statatemp\temp99.dta"

/* Sort the 1996 file */

clear
use "q:\utilities\statatut\merge96.dta"
sort facid

/* Merge the 1999 data onto the 1996 data */

merge facid using "d:\statatemp\temp99.dta"

/* Check how well the merge went - we don't expect all "3's" */

tab _merge
drop _merge
su

/* Clean up */

erase "d:\statatemp\temp99.dta"

 

Questions:

1. This merge is a mess! Less than half of the facilities in 1996 have a match in 1999. How can we get rid of the non-matching observations? Answer.


2. The 1996 and 1999 files each have 5 variables. How many variables do we expect the merged file to have? Answer.


3. This is a simple example with only 5 variables in each input file. So, it's easy to check whether I've used different variable names in each file. What would happen if I accidentally had a variable in each file with the same name? Answer.


4. How would I know if I had a variable with the same name on each file? Answer.


5. How many data files can I merge with one command? Answer.

 


Answers:

 

1. Only 207 of the facilities matched (_merge==3). The remainder were only in the 1996 "master" file (_merge == 1) except for one facility in the 1999 "using" file (_merge == 2). To get rid of the non-matches, we need to use _merge: keep if _merge == 3

You can type this command now and see the result. Stata drops 274 observations from the 1996 data in memory and one observation from the 1999 data. The remaining 207 observations are the matched 1996-1999 facilities. Now we can drop _merge, since we no longer need it.

Back to question 

 


 

2. The resulting file should have facid (common to both files), 4 more variables from each input file, and _merge, for a total of 10 variables

.

Back to question 

 


 

3. Stata by default keeps the values of the "master" file when a merge involves variables with the same name in both files. We planned for this default by renaming the 4 method variables to include the survey year in both files. If we had not done that, the values in the 1999 "using" file would have been lost for matching facilities.

 The merge command includes an update option to override this default. However, you probably won't need that. It's best to always rename your variables so that the master and using files have unique variable names.

Back to question 

 


 

4. The merged file would have one fewer variables than you expected.  That is the only way you would know; Stata does not issue a warning.

Back to question 

 


 

5. In prior versions (8-10) of Stata, you could merge 3 or more files in a single merge command. Frankly, we thought that merging more than 2 files at a time was error prone. There are many ways to miss problems in a merge, and these are compounded when more files are merged at the same time.

So, we're happy to see that in Stata 11 and after you can only merge 2 files together at a time (using the new syntax).

Back to question


 

Review again?

 

Another topic?


Wink Plone Theme by Quintagroup © 2013.

Personal tools
This is themeComment for Wink theme