Introduction to merging in SAS
What is a match-merge?A match-merge
Prerequisites for a match-merge
From now on when we use the term "merge" it will mean "match-merge."
Examples of mergingWhen we want to combine two data sets by merging and we know beforehand that each observation in each data set has a match in the other data set we can do a very straight-forward merge. Here are a couple of examples. Example 1. Often different data on the same cases are stored in two or more different data sets. For example, you may have two person level data sets on exactly the same individuals but containing different information on those individuals. To combine the data on those individuals into one data set requires a merge. Here is the way the merge would work using the simplest example:
Data set one Data set two This data step does a merge of data set one and two by ID: data three; The output data set three looks like this: ID A B C The data sets being merged in the example above contain different data on the same cases and the variable ID is a unique identifier (no duplicates).
Often we want to combine data from two data sets where each observation has a match on the BY-variable(s) but
there are duplicate values of the BY-variable(s) in one of the data sets. For example, we may want to add
household level variables (in one data set) to persons from those same households (in a second data set).
Or, we may want to add community level variables (in one data set) to the households of those same communities (in a second data set).
We want to add the household level variables (in hhcps99) to the persons (in percps99). The two input data sets to the merge are permanent SAS data sets in the same directory. The merged data set will be a temporary SAS data set named indiv. libname in '/afs/isis/depts/cpc/computer/stone/data/class01/';
More on MERGE: when matching is not perfectWhat happens when there is no match on the BY variable(s) in a match-merge? Here is a simple example: Data set one Data set twoThe second observation (ID=20) in data set one does not have a match on ID in data set two. A match-merge of one and two by ID like this, data three;results in an output data set three which looks like this: ID A B C
The IN=variablesWhat if you want to keep in the output data set of a merge only the matches (only those observations to which both input data sets contribute)? SAS will set up for you special temporary variables, called the "IN=" variables, so that you can do this and more.Here's what you have to do:
So to keep only the matches in the match-merge above, ask for the IN= variables and use them: data three; Data set three will now consist of only the matches on ID: ID A B COnly the matches are kept in the output data set above because of the way the IN= variables X and Y take on values in the PDV:
data x1y1 /* x1y1, x1y0, x0y1 are your choices of data set names */
How do you know when you merged correctly?Before doing a merge, ask yourself these questions:
Test your program: print and examine observations BEFORE and AFTER the merge. Finally, take time to examine the log carefully.
More on merging in the Tips and Tricks Data Management page:
Another topic? Questions or comments? If you are affiliated with the Carolina Population Center, send them to Phil Bardsley.
|

