A simple program

Writing a simple program

The term "program" has a special meaning in Stata. It is a set of commands that starts with program define and ends with end. In between you can put any of the commands you're used to using in Stata, and you can use many other commands that are specific to programs. These are discussed in detail in the PDF document "Stata Programming Reference Manual".

The purpose of this page is to give you a simple example of a program and to show you the power of a program to save you time and effort.

Suppose that you want to add household income and assets, variables in the household-level data, to each individual in the household. Using the methods discussed in "One-to-many merging", you would merge the household data onto the individual data using houseid as the merge key. Now suppose you want to do this for 50 countries to look at time and regional differences. It would be cumbersome to copy and paste the code 50 times. More importantly, if you decided to add some code, you'd have to add it 50 times.

Instead, you can write a program and run it 50 times. Each time you run it you only need to change the name of the country and the year of the survey. If you want to add a command, you only add it once, and it is automatically run on all 50 countries when you run the program. Here's an example:

capture program drop mergedata   // drop any earlier definition; ignore the error if none exists
program define mergedata
   // argument 1 = country folder, argument 2 = survey year
   use "c:/data/`1'/`2'/individual.dta", clear
   merge m:1 houseid using "c:/data/`1'/`2'/household.dta", ///
      keepusing(houseid hhincome assets)
   drop if _merge == 2 // households without individuals in sample
   drop _merge
   save "c:/data/`1'/`2'/merged.dta", replace
end

mergedata Peru 2000
mergedata Peru 2005
mergedata "Costa Rica" 2001
mergedata "Costa Rica" 2004
mergedata Brazil 1998
mergedata Brazil 2003 
mergedata Brazil 2008
 (etc.)

Questions:

1. Which part is the "program"?

2. What does "capture program drop mergedata" do?

3. There are numbers "1" and "2" scattered around the program. What do they do?

4. So how do I send values to these local macros `1' and `2'?

5. Why does "Costa Rica" have quotes around it?

6. The slashes ("/") in the paths are backward from the way I usually type them in Windows. Is that necessary?

7. How do I run a program?

8. What if I have an error in my program? How do I see what values went into `1' and `2'?

Answers:

1. Each command between and including program define mergedata and end is part of the program. Here we've chosen to name the program "mergedata", but you can name it almost anything. It's best to avoid names that are already taken, like "merge". Before settling on a name, type help followed by the name; if Stata finds an existing command, choose a different name.

2. The capture command lets you run a command that might fail and continue anyway. In this case, the command is program drop mergedata. If there is no program called "mergedata" already in memory, the command to drop it will fail. capture suppresses that error, so execution continues with the next command.
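
To see what capture is doing, here is a minimal sketch, meant to be run from a do-file. It relies on _rc, where capture stores the return code of the command it wrapped:

```stata
capture program drop mergedata
// _rc is 0 if a program was dropped, nonzero if there was nothing to drop
display _rc
```

Running the two lines a second time, right after the first, should display a nonzero code, because by then the program is no longer in memory.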

3. `1' and `2' are "local macros": temporary named containers that hold the values you send to the program. They are not variables in your dataset. The first value you send goes into `1' and the second value goes into `2'. You can use more local macros, numbered `3', `4', and so on, if you need them.

Notice that each number has a backtick (also called a grave accent) on its left side. On a US English keyboard this character is on the key in the upper left corner, along with the tilde (~). The character to the right of the number is an apostrophe (also called a "single quote"), which shares a key with the double quote (").

This is one way to send values to a program, but not the only way. Others are discussed in the programming reference.
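
As one illustration of an alternative, the args command copies the values into named local macros, which can make a longer program easier to read. This is only a sketch: the macro names country and year are chosen here for illustration and do not appear in the original program.

```stata
capture program drop mergedata
program define mergedata
   // args assigns the first value to country and the second to year
   args country year
   use "c:/data/`country'/`year'/individual.dta", clear
   merge m:1 houseid using "c:/data/`country'/`year'/household.dta", ///
      keepusing(houseid hhincome assets)
   drop if _merge == 2
   drop _merge
   save "c:/data/`country'/`year'/merged.dta", replace
end
```

The calls stay the same, for example mergedata Peru 2000.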

4. The lines starting with mergedata Peru 2000 that come after end are the commands to run the program. The first word is the new command mergedata which we defined above it. Following the command are the two values we want to send to the local macros `1' and `2'.
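
A small throwaway program can make this visible. The name showargs is hypothetical and used only for this sketch. It prints whatever landed in the first three numbered macros:

```stata
capture program drop showargs
program define showargs
   // print the values received in the first three positions
   display "first:  `1'"
   display "second: `2'"
   display "third:  `3'"
end

showargs Peru 2000
showargs "Costa Rica" 2001
showargs Costa Rica 2001
```

Comparing the last two calls shows how quotes change the way the values are split between `1', `2' and `3'.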

5. "Costa Rica" has quotes because it has a space in the middle. We want mergedata to understand that the value going into local macro `1' is two words. Without the quotes, "Costa" will go into `1', "Rica" will go into `2', and 2001 will go into `3', which our program never uses. The program would then look for files under c:/data/Costa/Rica/ and fail.

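6. Stata doesn't normally care which way you type the slashes. In this case, though, a local macro comes right after a slash in the file path, and a backslash placed directly before the backtick tells Stata not to substitute the macro. So here we must type the slashes as shown in the example. Forward slashes work on Windows too.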
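
The difference is easy to check with display. In this sketch the local macro country is defined only for the demonstration, and the three lines need to be run together, since a local macro disappears once the lines that define it finish running:

```stata
local country Peru
// forward slashes: the macro is substituted
display "c:/data/`country'/2000/individual.dta"
// backslash directly before the backtick: the macro is not substituted
display "c:\data\`country'\2000\individual.dta"
```

The first display line should show the path with Peru filled in. The second should not, which is why a use or save command written that way cannot find the file.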

7. To run your program, you first need to define it to Stata: highlight and execute the commands from capture program drop through end. You will see the commands echoed in the Results Window, but Stata gives no other indication that it has stored your program in memory. Next, you can highlight and execute the lines that call the program (in this case the commands like mergedata Peru 2000) one at a time, so you can check the results one merge at a time. Or, if you're feeling very self-confident, you can execute all 50 of them at once!
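
If one country has several survey years, a loop is one way to shorten the list of calls. The sketch below assumes the Brazil folders from the example exist and that mergedata has already been defined:

```stata
// call mergedata once for each Brazilian survey year
foreach year in 1998 2003 2008 {
   mergedata Brazil `year'
}
```

This is equivalent to typing the three mergedata Brazil lines one after another.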

8. Debugging a program can be tricky. The best tool available is set trace on in combination with set tracedepth:

     set tracedepth 1
     set trace on

set trace on shows each of your commands followed by the values that were substituted in your local macros. It takes a minute to figure out how to read this, but it's your best tool and well worth that minute of staring at it.

set tracedepth 1 (its smallest value) allows you to see only the substitutions of values in your program. If you forget to set tracedepth, you'll see deep down into all the commands you're calling as the substitutions scroll by in the Results Window. It's slow and not particularly useful.

Put the set tracedepth command anywhere before the set trace command. You can put the set trace command in front of the program, or in front of your first line that calls the program, to have it apply to all lines of the program. If you have a long program and know approximately where your problem is, you can turn trace on for just the one or two lines of code that you think have an error, then turn it off again like this:

set trace on
   merge m:1 houseid using ...
set trace off

That way you have fewer lines of code to sift through in your Results Window.

Once you figure out how to fix your program, remember to set trace off so you don't have to look at the extra stuff scrolling by in the Results Window.
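
Put together, a debugging pass over a single call might look like the sketch below. It assumes mergedata is already defined and that the Peru 2000 files from the example are in place:

```stata
set tracedepth 1
set trace on
mergedata Peru 2000
set trace off
```

If the call stops with an error, set trace off is never reached, so you will need to run it yourself afterwards.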
