SAS Programming Tips for Working With Very Large Data Sets
What is a very large data set?
Some symptoms:
Examples from a current project
How can you save time and space?
Some rules:
length personid mothrid fathrid spid74-spid98 6;
length emigdth chnum migrat97 3;
length child1-child17 $ 4;
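Shortening a numeric variable is safe only when it holds integers small enough to be stored exactly in the reduced number of bytes; fractional values lose precision. The following is a minimal sketch, not part of the original page, for checking the observed range before a short length is assigned. It assumes a UNIX or Windows host (IEEE floating point), and `chk` is a hypothetical name standing in for whichever data set holds the variables.

```sas
/* Largest integer stored exactly, by length in bytes,  */
/* on IEEE hosts such as UNIX and Windows:              */
/*   3 -> 8,192          4 -> 2,097,152                 */
/*   5 -> 536,870,912    6 -> 137,438,953,472           */

/* chk is a hypothetical data set name (assumption).    */
/* Compare the reported min and max with the limit for  */
/* the length you plan to assign.                       */
proc means data=chk n nmiss min max;
var emigdth chnum migrat97;
run;
```

If any value falls outside the limit for the intended length, a longer length is the safer choice for that variable.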
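/* WHERE= data set option: only matching observations are read in */
data tmp;
set parntliv(where=(mother='1' & malive=' '));
. .

/* WHERE statement: same effect as the WHERE= data set option */
data tmp;
set parntliv;
where mother='1' & malive=' ';
. .

/* subsetting IF: every observation is read, then tested */
data tmp;
set parntliv;
if mother='1' & malive=' ';
. .

proc datasets library=work;
delete tmp tmp2;
run;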
proc datasets library=work;
-----Directory-----
Libref: WORK
Engine: V8
Physical Name: /tmpsas2/SAS_work247300006C54_gromit
File Name: /tmpsas2/SAS_work247300006C54_gromit
Inode Number: 16384
Access Permission: rwxr-xr-x
Owner Name: cpcstone
File Size (bytes): 512
#  Name  Memtype  File Size   Last Modified
--------------------------------------------------
1  TMP   DATA     4118159360  30JUL2001:18:01:09
2  TMP2  DATA       39460864  30JUL2001:18:06:46
3  TMP3  DATA     4118159360  30JUL2001:18:25:55
delete tmp tmp2; run;
NOTE: Deleting WORK.TMP (memtype=DATA).
NOTE: Deleting WORK.TMP2 (memtype=DATA).
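When every data set in WORK is scratch data, the explicit DELETE list can be replaced by the KILL option of PROC DATASETS. This is a minimal sketch rather than something taken from the original page. KILL removes the members as soon as the statement is submitted, so it is only appropriate for a library, such as WORK, that holds nothing you need to keep.

```sas
/* Delete all data sets in WORK in one step.             */
/* memtype=data limits the deletion to data sets, so     */
/* other member types such as catalogs are not touched.  */
/* nolist suppresses the directory listing in the log.   */
proc datasets library=work kill nolist memtype=data;
quit;
```

The DELETE statement remains the better choice when some temporary data sets are still needed later in the program.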
. . .
data tmp2;
set tmp;
. .
run;
proc .... data=tmp2;
run;
data tmp; /* <---- WORK.TMP is reused to save space in WORK library */
set tmp2;
. . .
/*first make a copy of in.indiv and then merge with inw.womfixed*/
data tmp;
set in.indiv(keep=personid dob muborn wherborn county);
run;
data women;
merge inw.womfixed(in=a)
tmp;
by personid;
if a=1;
run;
/*merge the permanent data sets directly*/
data women;
merge inw.womfixed(in=a)
in.indiv(keep=personid dob muborn wherborn county);
by personid;
if a=1;
run;
data out.smpl5pct;
set in.indiv;
if ranuni(2106)<.05 then output; /* roughly 5% random sample of in.indiv */
run;

Erika Stone
last modified: Feb. 21, 2002
Questions or comments? If you are affiliated with the Carolina Population Center, send them to Phil Bardsley.

