Appending data files


Adding observations with the same variables

The need to append files doesn't arise often in survey work. It usually comes up during data entry, when questionnaires arrive at the survey office in batches: each batch is entered separately, and the resulting file must then be combined with the files from earlier batches.

For this example we'll simulate the situation in which a new batch of questionnaires has been entered. Facility data from 19 of the 20 regions have already been entered and combined in one file, named fac19.dta. We've now received the data for the facilities in the remaining region, in newfac.dta, and need to append them to the original 19 regions.

Copy these commands into a do-file editor and run them.

/* Use the original facility file with 19 regions */

use "q:\utilities\statatut\fac19.dta", clear
tabulate region

/* Append the file with one remaining region (number 7) */

append using "q:\utilities\statatut\newfac.dta"
tabulate region
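
If the tutorial files are not at hand, the same steps can be tried on made-up data. The following is only a sketch and is not part of the original exercise: the variable names facid, region and source, the label name reglbl, and the numbers of facilities are all invented for illustration. It also uses the generate() option of append (available in Stata 11 and later) to mark which file each observation came from. Run it as a whole from the do-file editor, because the tempfile name is a local macro.

```stata
* Hypothetical stand-in for newfac.dta: 5 facilities in region 7
clear
set obs 5
generate facid = 100 + _n
generate region = 7
label define reglbl 1 "Region 1" 7 "Region 7"
label values region reglbl
tempfile newfac
save `newfac'

* Hypothetical stand-in for fac19.dta: 10 facilities in region 1
clear
set obs 10
generate facid = _n
generate region = 1
label define reglbl 1 "Region 1" 7 "Region 7"
label values region reglbl

* Append the file on disk to the file in memory.
* source is 0 for observations already in memory, 1 for appended ones.
append using `newfac', generate(source)
tabulate region source
count
```

Because both made-up files define the label reglbl, the append should also produce the kind of "already defined" note that question 2 below asks about.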



1. Why don't we need to sort before appending?

2. Several notes appear in the Stata log about labels already being defined. Is this a problem?

3. When using append, the two files should have identical variable names. How would you know if a variable had mistakenly been given a different name in one of the two files being appended?




Answers

1. We don't need to sort because append does not match observations on any identifier. It simply adds the observations from the file on disk to the end of the file in memory, so the order of the observations in either file does not matter.

2. No, these notes do not indicate a problem with the append. They mean that the file in memory already has label definitions with the same names as those in the "using" file on disk. That is expected here, since both files contain the same variables and, therefore, the same labels for their values. When the names coincide, Stata keeps the definitions already in memory.

3. One way to catch this situation is to describe both files before the append. They should have the same number of variables, and the file that results from the append should also have that number. If the two files have the same number of variables before the append but the result has one more, you know that a variable in one of the input files had a different name. Stata keeps both names, and each of the two variables is missing for the observations that came from the other file.
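
The check described in answer 3 can be sketched on made-up data. In this example the variable names beds and bedz are hypothetical, with bedz standing in for a mistyped name, and the file contents are invented. c(k) is Stata's built-in count of the variables in memory. As with any code that uses tempfile, run it as a whole from the do-file editor.

```stata
* Hypothetical "using" file in which beds was mistyped as bedz
clear
set obs 3
generate facid = 100 + _n
generate bedz = 20
tempfile newbatch
save `newbatch'

* Hypothetical file in memory with the intended name
clear
set obs 4
generate facid = _n
generate beds = 10

* Compare the number of variables before and after the append
display "Variables before append: " c(k)
append using `newbatch'
display "Variables after append: " c(k)

* Each of the two variables is missing for the other file's observations
count if missing(beds)
count if missing(bedz)
```

With real files, describe using followed by the file name lists the variables of a file on disk without loading it, which makes the "before" comparison easy to do.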


