SAS
|
Stata
|
In SAS, operators can be symbols or their mnemonic equivalents, such as:
& or and
In many situations in SAS the order of the symbols doesn't matter:
<= can also be written =<
and
>= can also be written =>
|
Most operators are the same in Stata as in SAS, but in Stata operators do not have mnemonic equivalents. For example, you have to use the ampersand (&) and not the word "and".
This works:
var_a >= 1 & var_b <= 10
whereas this does not:
var_a >= 1 and var_b <= 10
These are the operators that are different in Stata:
  Symbol   Definition
  &        and
  |        or
  >=       greater than or equal to
  <=       less than or equal to
  ==       equality (for equality testing)
  !=       does not equal
  ~        not
  ^        power
NOTE: Symbols have to be in the order shown: ">=" not "=>".
|
Range of values:
if 1 <= var_a <= 10
or:
if var_a in(1,2,3,4,5,6,7,8,9,10)
|
if var_a >= 1 & var_a <= 10
or:
if inrange(var_a,1,10)
or:
if inlist(var_a,1,2,3,4,5,6,7,8,9,10)
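One difference from SAS to keep in mind with range tests: Stata treats a numeric missing value ( . ) as larger than any number, so a one-sided test such as var_a >= 1 is true for missing values. A minimal sketch for a do-file, using four made-up observations (the data are hypothetical):

```stata
clear
input var_a
0
5
10
.
end

* One-sided test: the missing value counts as "greater than or equal to 1"
count if var_a >= 1

* Excluding missing values explicitly
count if var_a >= 1 & var_a < .

* inrange() is false for missing values, so only 5 and 10 qualify
count if inrange(var_a, 1, 10)
```

The two-sided forms shown above (with <= 10, inrange, or inlist) already exclude missing values; the caution applies when only a lower bound is tested.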
|
Referencing multiple variables at a time:
Say the following variables are in a
data file in the order shown:
var1 var2 var3 age var4 var5
Then you could code them as:
var1--var5
To SAS, this means "all variables that are positionally between
var1 and var5," which would include the variable age. |
Referencing multiple variables at a time:
var1-var5
To Stata, this means "all variables that are positionally between
var1 and var5." Notice that there is only one hyphen ( - ).
|
Referencing multiple variables at a time:
var1-var5
is the same as:
var1 var2 var3 var4 var5
no matter what positions the variables occupy in the observation.
Using a colon selects variables containing the same prefix:
var:
could represent:
var1 var2 var10 variable varying var_1
|
Referencing multiple variables at a time:
var?
The question mark ( ? ) is a wild card that represents exactly one character in the variable name. It could be a number, a letter, or an underscore ( _ ).
var*
The asterisk ( * ) is a wild card that represents zero or more characters in the variable name. They could be numbers, letters, or underscores. Thus
var*
could represent:
var1 var2 var10 variable varying var_1
|
| To save the contents of the Log
window and/or Output window, go
to that window and click on the menu bar's "File", "Save". In SAS batch mode these files are automatically generated for
you. |
To save the contents of the Results window, start logging to a log file BEFORE you submit the commands that you want logged. Open a log file by clicking on the icon in the tool bar that looks like a scroll and a traffic light. A "*.log" file is a simple ASCII text file; a "*.smcl" file is formatted with html-like tags.
You can also use the log command:
log using "d:\mydata\mydofile.log", replace
NOTE: The "replace" option simply tells Stata to overwrite the log file if it already exists. This is helpful when you have to run a do-file over and over again.
More on this in the Stata tutorial.
|
libname in8 v8 "d:\mydata\";
data new; set in8.mySASfile; run;
or, starting in SAS 8:
data new; set "d:\mydata\mysasfile.sas7bdat"; run;
|
use "d:\mydata\myStataFile.dta"
You can also click on the "open file" icon and select your dataset.
More on this in the Stata tutorial.
|
Save the dataset newer to d:\mydata\ :
data in8.newer; set new; run;
|
save "d:\mydata\newer.dta"
To overwrite the dataset newer if it already exists:
save "d:\mydata\newer.dta" , replace
You can also click on the "save" icon to save your dataset.
More on this in the Stata tutorial.
|
proc contents;
On selected variables:
proc contents data = in8.newer (keep = id age height); run;
|
describe
On selected variables:
describe id age height
More on this in the Stata tutorial.
|
proc means;
On selected variables:
proc means; var age height; run;
or
proc univariate; var age height; run;
|
summarize
On selected variables:
summarize age height
If you want variable labels and a proc univariate style output try:
summarize age height, detail
or:
codebook age height
More on this in the Stata tutorial.
|
proc surveymeans; cluster sampunit; strata stratum; var age height; weight sampwt; run;
|
Stata version 8:
svyset sampunit [pweight = sampwt], strata(stratum)
svymean age height
More on this in the Stata tutorial.
|
Analyze a subpopulation by implementing the domain option:
proc surveymeans; cluster sampunit; strata stratum; domain female; var age height; weight sampwt; run;
|
Stata version 8:
Analyze a subpopulation by implementing the subpop option:
svymean age height, subpop(female)
More on
this in the Stata tutorial.
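The svymean command shown in these two rows belongs to Stata version 8. In Stata 9 and later, the same estimates are requested through the svy: prefix. A minimal sketch, assuming the same variable names (sampunit, stratum, sampwt, age, height, female) as the examples above:

```stata
* Declare the survey design once; later svy: commands reuse it
svyset sampunit [pweight = sampwt], strata(stratum)

* Means for the full sample
svy: mean age height

* Means for the subpopulation where female is nonzero
svy, subpop(female): mean age height
```

The same prefix works with other estimation commands, for example svy: tabulate, svy: regress, and svy: logit.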
|
proc freq;
|
tabulate
or, for just checking out your dataset, try:
codebook
More on this in the Stata tutorial.
|
A series of 1-way tables:
proc freq; tables var1 var2; run;
|
A series of 1-way tables:
tab1 var1 var2
More on this in the Stata tutorial.
|
A 2-way table:
proc freq; tables var1*var2; run;
|
A 2-way table:
tab2 var1 var2
More on this in the Stata tutorial.
|
Starting in SAS 9:
proc surveyfreq; cluster sampunit; strata stratum; tables females*var1*var2; weight sampwt; run;
When using proc surveyfreq the domain/subpop variable needs to be included in the tables statement.
|
Stata version 8:
svyset sampunit [pweight = sampwt], strata(stratum)
svytab var1 var2, subpop(females)
More on this in the Stata tutorial.
|
proc surveyreg; cluster sampunit; strata stratum; model depvar = indvar1 indvar2 indvar3; weight sampwt; run;
Proc surveyreg does not have a way of dealing with subpopulations.
Using "by" or "where"
will not suffice as they will compute incorrect standard errors.
|
Stata version 8:
svyset sampunit [pweight = sampwt], strata(stratum)
svyregress depvar indvar1 indvar2 indvar3, subpop(females)
|
Starting in SAS 9:
proc surveylogistic; cluster sampunit; strata stratum; model depvar = indvar1 indvar2 indvar3; weight sampwt; run;
Proc surveylogistic does not have a way of dealing with subpopulations.
Using "by" or "where"
will not suffice as they will compute incorrect standard
errors.
|
Stata version 8:
svyset sampunit [pweight = sampwt], strata(stratum)
svylogit depvar indvar1 indvar2 indvar3, subpop(females)
|
proc print;
On selected variables:
proc print; var id age height; run;
On selected variables and a limited range of observations:
proc print data = new (firstobs = 1 obs = 20); var id age height; run;
|
list
On selected variables:
list id age height
On selected variables and a limited range of observations:
list id age height in 1/20
More on this in the Stata tutorial.
|
/* comment */
* comment ;
|
Stata version 8:
/* comment */
* comment
// comment
To continue a line:
///
For example:
list hhid personid gender age weight height ///
     race income date
More on this in the Stata tutorial.
|
Create a numeric variable with a default length of 8 bytes:
var1 = 1234;
Create a numeric variable with the minimum allowable length (3 bytes):
length var1 3; var1 = 1234;
|
generate var1 = 1234
NOTE: the default numeric type is "float." The statement above
is relying on that default.
It could have been written explicitly as:
generate float var1 = 1234
"float" stands for "floating point decimal."
You could more wisely save storage space by specifying:
gen int var1 = 1234
"int" stands for "integer."
More on this in the Stata tutorial.
|
Create a character variable with a length of 3 bytes:
name = "Bob";
|
Generate a string variable with a length of 3 bytes:
gen str3 name = "Bob"
|
Increase the variable length to allow for 5 characters:
data new; length name $5; set new;
Change the values of numeric and character variables.
var1 = 123456; name = "Bobby"; run;
|
replace var1 = 123456
Stata automatically increases the storage type if necessary. To change the storage type of a variable manually, use the recast command.
replace name = "Bobby"
Stata automatically increases the string length to 5.
More on this in the Stata tutorial.
|
Example of an if-then statement:
if var1=123456 then var2=1;
|
The condition follows the
command:
replace var2 = 1 if var1 == 123456
Notice that Stata requires two equals signs when testing equality.
|
Example of an if-then do loop:
if age <= 10 then do; child = 1; parent = 0; end;
|
replace child = 1 if age <= 10
replace parent = 0 if age <= 10
Since each command is executed on all observations before the next
command is executed, the "if-then do loop" is not an option.
Stata does have excellent looping tools: foreach, forvalues, and while.
More on this in the Stata tutorial
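As a rough illustration of the three looping commands named above, here is a minimal do-file sketch. The three observations and the variable names child and parent follow the example in this row; the data values are made up.

```stata
clear
input age
4
9
15
end

* foreach: run the same command for each name in a list
foreach v in child parent {
    gen byte `v' = .
}
replace child  = 1 if age <= 10
replace parent = 0 if age <= 10

* forvalues: loop over a range of numbers
forvalues k = 1/3 {
    display "observation `k' has age " age[`k']
}

* while: loop until a condition fails
local k = 1
while `k' <= 3 {
    display "pass `k'"
    local k = `k' + 1
}
```

Note that these loops repeat whole commands; they do not step through observations one at a time the way a SAS data step does.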
|
Example of an if-then-else:
if 0 <= age <= 2 then agegp = 1;
else if 2 < age <= 10 then agegp = 2;
else if 10 < age <= 20 then agegp = 3;
else if 20 < age <= 40 then agegp = 4;
else agegp = . ;
|
For the same reason that "if-then-do loops" (above) are not possible in Stata, "if-then-else" is not possible either. But here is a way of doing the same thing. In this example, " agegp == . " is used simply to highlight the fact that agegp has not yet been assigned a value, just as the "else" does in "if-then-else":
gen agegp = .
replace agegp = 1 if agegp == . & age >= 0 & age <= 2
replace agegp = 2 if agegp == . & age > 2 & age <= 10
replace agegp = 3 if agegp == . & age > 10 & age <= 20
replace agegp = 4 if agegp == . & age > 20 & age <= 40
Better done with the recode command which can also create value labels:
recode age ( 0/2.9999  = 1 "0 to 2 year olds")   ///
           ( 3/10.9999 = 2 "3 to 10 year olds")  ///
           (11/20.9999 = 3 "11 to 20 year olds") ///
           (21/40.9999 = 4 "21 to 40 year olds") ///
           ( else      = . ) , gen(agegp) test
The test option checks to see if the ranges overlap.
Since recode's ranges are inclusive at both ends ( >= and <= ), adding .9999 to the upper end of each range ensures that fractional values are handled correctly.
|
Drop variables var1, var2, and var3:
data new(drop = var1 var2 var3); set new; run;
|
Drop variables var1, var2, and var3:
drop var1 var2 var3
More on this in the Stata tutorial.
|
Keep variables var1, var2, and var3:
data new(keep = var1 var2 var3); set new; run;
|
Keep variables var1, var2, and var3:
keep var1 var2 var3
|
Keep observations / subsetting if statement:
data new; set new; if var1 = 1 then output; run;
|
Keep observations
keep if var1 == 1
|
Delete observations:
data new; set new; if var1 = 1 then delete ; run;
|
Drop observations:
drop if var1 == 1
More on this in the Stata tutorial.
|
Loop over a variable list (varlist):
data new(drop = i); set new;
  array raymond {4} var1 var2 var3 var4;
  do i = 1 to 4;
    if raymond{i} = 99 then raymond{i} = .;
  end;
run;
|
foreach i in var1 var2 var3 var4 {
    replace `i' = . if `i' == 99
}
NOTE: Notice that the quote to the left of the letter " i " is a left quote ( ` ). The left quote is located at the top of your keyboard next to the "! 1" key. In this example i is a local macro variable that exists only for the duration of the foreach command, so it does not need to be dropped like the variable i in the SAS code.
More on this in the Stata tutorial.
|
Create variable labels:
label age = "age in years" height = "height in inches";
|
label var age "age in years"
label var height "height in inches"
More on this in the Stata tutorial.
|
Define a format:
proc format; value yesno 1 = "yes" 2 = "no"; run;
Assign the format to a variable:
data newer; set newer; format smokes yesno.; run;
|
Define a format. These are called "value labels":
label define yesno 1 "yes" 2 "no"
Assign the value label to a variable:
label value smokes yesno
More on this in the Stata tutorial.
|
Assign formats defined by SAS to a variable:
format interview_date mmddyy8.;
|
Assign formats defined by Stata to a variable:
format interview_date %dn/d/y
NOTE: The "d" after the percent sign marks a date format. The letter "n" in "%dn/d/y" stands for "number of the month". "%dm/d/y" would use the name of the month.
|
title "Nutritional Intakes for 12-18 year olds";
|
Since Stata's Results window/log file combines what SAS splits between the Log window and the Output window, Stata doesn't need a title statement. Titling can be accomplished with a comment:
/* Nutritional Intakes for 12-18 year olds */
|
proc sort data = new out = newer; by id; run;
|
sort id
More on this in the Stata tutorial.
|
proc transpose data = new (keep = age edu rel sex id lineno) out = tr_new; by id; run;
|
reshape long age edu rel sex, i(id) j(lineno)
More on this in the Stata tutorial.
|
data newer; set newer; by id;
  if first.id = 1 then f_num = 1;
  if first.id = 1 and last.id = 1 then s_num = 1;
  if last.id = 1 then l_num = 1;
run;
|
by id: gen f_num = 1 if _n == 1
by id: gen s_num = 1 if _n == 1 & _N == 1
by id: gen l_num = 1 if _n == _N
Stata's "_n" is equivalent to SAS's "_n_" in that it is equal
to the observation number; but when inside a by command "_n"
is equal to 1 for the first observation of the by-group,
2 for the second observation of the by-group, etc.
Stata's "_N" is equal to the number of observations
in the dataset except in a by-command when it is
equal to the total number of observations in the by-group.
More on this in the Stata tutorial.
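To see how "_n" and "_N" behave inside a by-group, the following do-file sketch may help. The six observations are made up for illustration, and the variable names seq, size, and only are hypothetical.

```stata
clear
input id age
1 34
1 8
2 51
3 40
3 12
3 9
end

* stable keeps the original order of observations within each id
sort id, stable

by id: gen seq  = _n          // 1, 2, ... within each id
by id: gen size = _N          // number of observations in the id group
by id: gen only = (_N == 1)   // 1 if the id has a single observation

list, sepby(id)
```

In this data, id 2 is the only group with a single observation, so it is the only one where seq, size, and only all equal 1.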
|
Count the number of boys within an id by-group:
data new; set newer; by id; retain count 0; if first.id then count = 0; if gender = 1 and age<= 18 then count = count+1; run;
|
Count the number of boys by id:
by id: gen count = sum(gender == 1 & age<= 18)
The sum function creates a running sum of the expression inside it.
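Because sum() is a running sum, only the last observation of each by-group holds the group's full count. A minimal do-file sketch of two ways to put the total on every observation, using made-up data (the variable names nboys and nboys2 are hypothetical):

```stata
clear
input id gender age
1 1 10
1 2 12
1 1 40
2 1 5
2 1 7
end

sort id, stable

* Running count of boys (gender == 1 and age <= 18) within each id
by id: gen count = sum(gender == 1 & age <= 18)

* Group total: copy the last running value to every observation of the group
by id: gen nboys = count[_N]

* The same total computed directly with egen
egen nboys2 = total(gender == 1 & age <= 18), by(id)

list, sepby(id)
```

Here id 1 contains one boy and id 2 contains two, so nboys and nboys2 are 1 for every observation of id 1 and 2 for every observation of id 2. The egen function is named total() in Stata 9 and later.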
|
data both; merge new(in = a) in8.newer(in = b); by id; if a = 1 and b = 1; run;
|
merge id using "d:\mydata\newer.dta"
keep if _merge == 3
Stata automatically creates the variable "_merge" after a merge. Stata will not merge on another dataset if _merge already exists in one of the datasets.
The dataset in memory is the "master" dataset. The dataset that is being merged on is the "using" dataset.
Unlike SAS, values of variables shared by the master dataset and the using dataset will not be updated (overwritten) by the using dataset. Like SAS, the formats, labels, and informats of variables shared by the master dataset and the using dataset will be defined by the master dataset. Remember that the master always wins. Use the update option to fill in missing values in the master dataset from the using dataset, and add the replace option to overwrite nonmissing values as well.
More on this in the Stata tutorial.
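The merge syntax shown in this row is the older form. Stata 11 and later expect the type of match to be stated explicitly. A minimal sketch, assuming the master dataset is already in memory and that id identifies exactly one observation in each file (otherwise m:1 or 1:m would be the appropriate match type):

```stata
* One-to-one merge on id; the datasets do not need to be sorted first
merge 1:1 id using "d:\mydata\newer.dta"

* Keep only observations found in both datasets, then drop the marker
keep if _merge == 3
drop _merge
```

The last two steps can also be folded into the merge itself with the keep(match) and nogenerate options.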
|
Concatenate two datasets:
data both; set new in8.newer; run;
|
append using "d:\mydata\newer.dta"
More on this in the Stata tutorial.
|
Sort datasets in order to prepare them for a merge:
Sort permanently stored datasets and create new, sorted copies in the work library:
proc sort data = in8.individual out = indiv; by id; run;
proc sort data = in8.household out = house; by id; run;
data temp2; merge house(in = a) indiv(in = b); by id; run;
|
Sort datasets in order to prepare them for a merge:
Create a local macro variable to represent a filename for Stata to use in temporarily storing a data file on the computer's hard drive if requested to do so later:
tempfile indiv
use "d:\mydata\individual.dta"
sort id
Save the dataset that's currently in memory to a temporary filename in Stata's temp directory. This file is deleted automatically when the do-file or program that created it finishes, much like a dataset in SAS's work library is deleted at the end of the session:
save "`indiv'"
use "d:\mydata\household.dta"
sort id
merge id using "`indiv'"
More on this in the Stata tutorial.
|
Create a local macro variable "ver":
%let ver = 7; version = &ver.;
|
local ver = 7
gen version = `ver'
Notice that to evaluate the local macro variable "ver", a left quote ( ` ) is used before the name and a right quote ( ' ) after it. The left quote is located on your keyboard next to the "! 1" key. |