
UNC Carolina Population Center

 

Appending data files



Adding observations with the same variables

The need to append doesn't arise often in survey work. It usually comes up during data entry, when questionnaires arrive at the survey office in batches and each newly entered batch must be combined with the files from previous batches.

For this example we'll simulate the situation in which a new batch of questionnaires has been entered. Facility data from 19 of 20 regions have been entered and combined in one file, named fac19.dta. We've received the data for the remaining region's facilities in a second file, newfac.dta, and need to append them to the original 19.

You can type the following commands into Stata, or copy them into a do-file and run them.


/* Use the original facility file with 19 regions */

clear
use "t:\statatut\fac19.dta"
ta region

/* Append the file with one remaining region (number 7) */

append using "t:\statatut\newfac"
ta region
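
To experiment with append without access to the tutorial files, here is a minimal sketch that uses the auto dataset shipped with Stata. The temporary file names batch1 and batch2 and the marker variable source are made up for illustration and are not part of the tutorial data. It is meant to be run from a do-file.

```stata
* Sketch only: split the shipped auto data into two "batches", then append them
clear
sysuse auto

* batch1 and batch2 are hypothetical temporary files
tempfile batch1 batch2

* First batch: domestic cars
preserve
keep if foreign == 0
save `batch1'
restore

* Second batch: foreign cars
keep if foreign == 1
save `batch2'

* Load the first batch and add the observations from the second
use `batch1', clear
append using `batch2', generate(source)

* source is 0 for observations from the file in memory, 1 for the appended file
tabulate foreign source
```

The generate() option is not required. It is used here only so that the tabulation shows which file each observation came from.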
 

Questions:

1. Why don't we need to sort before appending?


2. Several notes appear in the Stata log about labels already being defined. Is this a problem?


3. When using append, the two files should have identical variable names. How would I know if, by mistake, a variable in one file had a different name from a variable in the other file being appended?






Answers:


1. We don't sort before appending because we don't need to match on any identifiers. We use append when we want to add new observations, so no matching is involved.



2. No, these notes don't indicate a problem with the append. They mean that the file in memory has label definitions with the same names as those in the "using" file on disk. We expect that, since both files have identical variables and, therefore, identical labels defined for their values.



3. A simple way to catch this situation is to describe both files before the append. They should have the same number of variables, and after the append command the resulting file should also have that number. If the files have the same number before but the combined file has one more variable, you know that a variable in one of the input files had a different name. The two mismatched variables will also be missing for every observation that came from the other file.
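
The check described in answer 3 can be tried out with a sketch like the following, which uses the auto dataset shipped with Stata rather than the tutorial files. The temporary file names first and second and the renamed variable milespergal are hypothetical, chosen only to simulate a naming mistake. It is meant to be run from a do-file.

```stata
* Sketch only: simulate a variable that was misnamed in one of two files
clear
sysuse auto

* first and second are hypothetical temporary files
tempfile first second

* First file: observations 1 to 40, variable names unchanged
preserve
keep in 1/40
save `first'
restore

* Second file: the remaining observations, with mpg renamed by "mistake"
keep in 41/l
rename mpg milespergal
save `second'

* Before the append: both files report the same number of variables
describe using `first', short
describe using `second', short

* After the append: the combined file has one more variable
use `first', clear
append using `second'
describe, short

* Each mismatched variable is missing for the observations from the other file
count if missing(mpg)
count if missing(milespergal)
```

Comparing the variable counts reported by the three describe commands shows the extra variable, and the two count commands show which observations it affects.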



Questions or comments? If you are affiliated with the Carolina Population Center, send them to Phil Bardsley.