T-test and difference in proportions

T-test program in SAS

The following example program, written in SAS, tests the difference between two means that come from two independent samples. In this example the means come from two waves of USDA data collection. The results were generated in one of two ways:

1) Using SAS with SUDAAN also installed: from a SUDAAN descript procedure

2) Using Stata: from saving the results after svy: mean or svy: tab using the parmest command

(These are results data sets in which each observation holds the results for a different variable.) NOTE: Do not use this formula on results obtained from a single wave of data, that is, from the same sample of people. It is valid only for between-wave comparisons.
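In symbols, the calculation carried out by the programs below can be written as follows. The notation is introduced here for illustration and is not part of the programs: \bar{x}_i is the mean from wave i, SE_i its standard error, n_i the number of non-missing observations, s_i the standard deviation rebuilt from the standard error, and \Phi the standard normal cumulative distribution function.

```latex
\begin{align*}
s_i &= SE_i \sqrt{n_i}, \qquad i = 1, 2 \\
t &= \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}
   = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{SE_1^2 + SE_2^2}} \\
p &= 2\,\bigl(1 - \Phi(|t|)\bigr)
\end{align*}
```

Because s_i^2 / n_i = SE_i^2, rebuilding the standard deviation does not change the result. The p-value is taken from the standard normal distribution, which is a large-sample approximation to the t distribution.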

SAS code

** SAS code **;
** Test the difference between 1977-78 data and 1994-96 data. **;
** NOTE: means77 and means96 must both be sorted by variable before this merge. **;
data temp;
 merge means77(rename= (nsum=nsum77 mean=mean77 semean=semean77))
       means96(rename= (nsum=nsum96 mean=mean96 semean=semean96));
  by variable;

/** "nsum" is the number of observations with non-missing values that were used to create the mean.
    "semean" is the standard error of the mean.  ***/

*** create standard deviation from standard error ***;
  std77= semean77*sqrt(nsum77);
  std96= semean96*sqrt(nsum96);

*** create t-statistic for difference between 2 means ***;
  t7796= (mean77-mean96)/sqrt((std77**2/nsum77)+(std96**2/nsum96));

*** create 2-tailed probability for t-statistic (standard normal approximation) ***;
  p7796= 2*(1-probnorm(abs(t7796)));

/** You may want to round the t-statistic and the p-value to the 3rd decimal place **/
  t7796= round(t7796, 0.001);
  p7796= round(p7796, 0.001);
run;

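** functions used in this program:
sqrt(argument): returns the square root of the argument
abs(argument): returns the absolute value of the argument
probnorm(argument): returns the standard normal cumulative probability of the argument
round(argument, unit): rounds the argument to the nearest multiple of unit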
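As a quick check of the arithmetic, the steps in the SAS program above can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original program: the function name between_wave_test and the input values are made up for illustration. It follows the same steps (rebuild the standard deviations, form the statistic, take a two-tailed normal probability) and compares the result with the shortcut that divides by the square root of the summed squared standard errors.

```python
import math

def between_wave_test(mean1, se1, n1, mean2, se2, n2):
    """Return (statistic, two-tailed p) for two independent estimates."""
    # Same steps as the SAS program: rebuild the SDs, then the statistic.
    sd1 = se1 * math.sqrt(n1)
    sd2 = se2 * math.sqrt(n2)
    stat = (mean1 - mean2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Two-tailed p-value from the standard normal: 2*(1 - Phi(|stat|)).
    # erfc(x / sqrt(2)) equals 2*(1 - Phi(x)) for x >= 0.
    p = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p

# Made-up inputs, for illustration only.
stat, p = between_wave_test(10.0, 0.3, 400, 11.0, 0.4, 900)

# Shortcut: the sample sizes cancel, leaving only the standard errors.
shortcut = (10.0 - 11.0) / math.sqrt(0.3**2 + 0.4**2)

print(round(stat, 3), round(p, 3), round(shortcut, 3))  # prints: -2.0 0.046 -2.0
```

The two versions of the statistic agree, which shows that the sample sizes matter here only through the standard errors.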

For an example of a t-test on results from a single wave of data collection (a within-sample t-test) using SUDAAN, see: SUDAAN T-test

 

T-test Stata code

** T-test Stata code **;
** Test the difference between 1977-78 data and 1994-96 data. **;
use means77
rename nsum nsum77 
rename mean mean77 
rename semean semean77
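/* Stata 11 and later use the 1:1 syntax. In older versions use:
   merge variable using means96   (both files sorted by variable) */
merge 1:1 variable using means96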
rename nsum nsum96
rename mean mean96 
rename semean semean96

/** "nsum" is the number of observations with non-missing values that were used to create the mean.
    "semean" is the standard error of the mean.  ***/

/*** create standard deviation from standard error ***/
gen std77= semean77*sqrt(nsum77)
gen std96= semean96*sqrt(nsum96)

/*** create t-statistic for difference between 2 means ***/
gen t7796= (mean77-mean96)/sqrt((std77^2/nsum77)+(std96^2/nsum96))

/*** create 2-tailed probability for t-statistic ***/
/* normal() is the standard normal CDF; older Stata releases call it norm() */
gen p7796= 2*(1-normal(abs(t7796)))

/** You may want to round the t-statistic and the p-value to the 3rd decimal place **/
replace t7796= round(t7796, 0.001)
replace p7796= round(p7796, 0.001)

Difference in proportion test program

The following example program, written first in SAS and then in Stata, tests the difference between two proportions that come from two independent samples. In this example the proportions come from two data sets from two waves of USDA data collection. The results were generated by a SUDAAN descript procedure or the Stata svy: mean command. (These are results data sets in which each observation contains the results for a different variable.) NOTE: Do not use this formula on results obtained from a single wave of data, that is, from the same sample of people. It is valid only for between-wave comparisons.

SAS code

** SAS code **;
** Test the difference between 1977-78 data and 1989-91 data. **;

/************************************************************************
 "p77" is the proportion of whites (coded 1; non-whites coded 0) in the 1977 data
 "p89" is the proportion of whites (coded 1; non-whites coded 0) in the 1989 data

 NOTE: a proportion of a 0/1 categorical variable can be obtained by taking
       the mean of the variable.  This gives you the proportion of 1's.

 "semean77" is the standard error of the proportion of whites in the 1977 data
 "semean89" is the standard error of the proportion of whites in the 1989 data

 "var7789" is the variance of the difference between the two proportions
 "z7789" is the z-score
 "p7789" is the two-tailed p-value
************************************************************************/

data temp;
 merge means77(rename= (nsum=nsum77 mean=p77 semean=semean77))
       means89(rename= (nsum=nsum89 mean=p89 semean=semean89));
  by variable;

/** create the variance by squaring the standard errors and adding them together **/
  var7789= (semean77**2 + semean89**2);

/** create the z score by taking the difference of the 2 proportions and dividing
      by the square root of the variance **/
  z7789= (p77-p89) / sqrt(var7789);

** create the p-value **;
  p7789= 2 * probnorm(-abs(z7789));

/** You may want to round the z-score and the p-value to the 3rd decimal place **/
  z7789= round(z7789, 0.001);
  p7789= round(p7789, 0.001);
run;

Stata code

** Difference in proportion Stata code **;
** Test the difference between 1977-78 data and 1989-91 data. **;

use means77
rename nsum nsum77 
rename mean p77 
rename semean semean77
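/* Stata 11 and later use the 1:1 syntax. In older versions use:
   merge variable using means89   (both files sorted by variable) */
merge 1:1 variable using means89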
rename nsum nsum89
rename mean p89 
rename semean semean89

/** create the variance by squaring the standard errors and adding them together **/
gen var7789= (semean77^2 + semean89^2)

/** create the z score by taking the difference of the 2 proportions and dividing
      by the square root of the variance **/
gen z7789= (p77-p89) / sqrt(var7789)

/** create the p-value **/
/* normal() is the standard normal CDF; older Stata releases call it norm() */
gen p7789= 2 * normal(-abs(z7789))

/** You may want to round the z-score and the p-value to the 3rd decimal place **/
replace z7789= round(z7789, 0.001)
replace p7789= round(p7789, 0.001)

/* You may want to automatically tag significant results */
gen str1 abc7789= "A"  if p7789 <= 0.01  
gen str12 cst77= string(est77, "%10.0g") +"%" + " " + abc7789
/* est77 is not created above; it is assumed here to hold the 1977
   estimate as a percentage (for example, 100*p77).
   cst77 is a character variable. string(est77,"%10.0g") converts the
   numeric values of est77 to string (character) values.
   The plus (+) symbols concatenate the strings together into one
   long variable. e.g. est77= 12.03, abc7789= "A" then cst77= "12.03% A" */
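The difference-in-proportions test and the tagging step can also be checked outside SAS or Stata. The following Python sketch is an illustration only: the function names proportion_test and tagged_label, the parameters est_pct, cutoff and tag, and the input values are all made up here and do not come from the programs above. One deliberate difference: an untagged label has its trailing space removed, whereas the Stata expression leaves it in place.

```python
import math

def proportion_test(p1, se1, p2, se2):
    """z-score and two-tailed p-value for two independent proportions."""
    var = se1**2 + se2**2                  # variance of the difference
    z = (p1 - p2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * Phi(-|z|)
    return z, p

def tagged_label(est_pct, p_value, cutoff=0.01, tag="A"):
    """Build a label such as '12.03% A'; tag only when p_value <= cutoff."""
    flag = tag if p_value <= cutoff else ""
    # rstrip() drops the trailing space when there is no tag.
    return f"{est_pct:.10g}% {flag}".rstrip()

# Made-up proportions and standard errors, for illustration only.
z, p = proportion_test(0.1203, 0.010, 0.1803, 0.012)
label = tagged_label(100 * 0.1203, p)

print(round(z, 3), label)
```

With these inputs the difference is significant at the 0.01 level, so the label carries the tag; a p-value above the cutoff yields the percentage alone.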

Questions or comments? If you are affiliated with the Carolina Population Center, send them to Phil Bardsley.