Combining data files


Introduction to merging

Merging is the process of adding variables from a permanent file on disk to the data in memory. The observations in the two files may be at the same level; for example, both files may come from surveys of the same people interviewed at different times. Alternatively, the observations may be at different levels, such as mothers and their children. In either case, the files have one or more identifying variables in common.

Many people confuse merge with append. Append combines two files with completely different observations but the same variables, while merge combines files with the same or related observations but different variables. The append command is explained fully in a later example.
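The difference between the two commands can be seen in a small sketch. The file names (wave2, newpeople) and variables (id, income1, income2) below are made up for illustration, and the merge 1:1 syntax assumes Stata 11 or later; earlier versions use a different form of the merge command.

```stata
* Hypothetical data sets; all file and variable names are made up.

* Second interview of the same three people
clear
input id income2
1 31000
2 45000
3 28000
end
save wave2, replace

* Two people who are not in the first file
clear
input id income1
4 39000
5 51000
end
save newpeople, replace

* Data in memory: first interview of persons 1 to 3
clear
input id income1
1 30000
2 42000
3 27000
end

* merge: same observations, one new variable (income2)
merge 1:1 id using wave2
drop _merge

* append: same variables, two new observations (ids 4 and 5)
append using newpeople
list
```

After the merge, the data in memory have gained a variable; after the append, they have gained observations. The appended observations have missing values for income2 because newpeople does not contain that variable.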

The merge command is simple to use, but merging covers a wide variety of situations, each with its own pitfalls. The following three examples cover the more common situations.

One-to-one merging

Match merging

One-to-many merging

Caution (if using a version of Stata prior to 12)! Merging and appending both add data to the data already in Stata's memory. It is easy to ask Stata to put more data in memory than you have allowed room for. Before you combine files, add together the sizes of all the files you want to merge or append, then clear and set memory if necessary. Otherwise, you may get the message "No room to add more variables/observations."
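As a rough sketch of that check for Stata 11 or earlier: the file names (firstfile, newpeople) and the 50m figure are hypothetical, and the right amount of memory depends on the combined size of your own files.

```stata
* Stata 11 or earlier only. File names and the 50m figure are hypothetical.

* Create two small files so the sketch can be run as is
clear
input id income1
1 30000
2 42000
3 27000
end
save firstfile, replace

clear
input id income1
4 39000
5 51000
end
save newpeople, replace

* Report the number of observations, variables, and size of each file
* without loading it into memory
describe using firstfile
describe using newpeople

* Memory can only be reset when no data are in memory
clear
set memory 50m

* Now combine the files
use firstfile
append using newpeople
```

The describe using command reports each file's size without disturbing the data in memory, so the sizes can be added together before deciding how much memory to set.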

