# Introduction to Stata

This tutorial is function-oriented, focusing on the data-management tasks most needed by data analysts working with sample survey data. It works up from basic tasks, such as how to drop variables, to the tasks needed for complex file organization, such as how to reshape and merge data files.

There is also a section on Analyzing Data from Sample Surveys. It explains which sampling weight command to use and whether to use svy or robust cluster to adjust for survey design effects.

These web pages assume that you are using Stata Version 13 or above for Windows.

To run the example commands, first copy the example Stata data files to your local PC. See "download sample data" for instructions. If you're using a computer at the Carolina Population Center, the data are available to you on Q:\temp\statatut\.

See Stata Windows environment below for an orientation to the Windows interface. It also gives you sources of help beyond this tutorial.

Other resources for learning Stata include UCLA's IDRE Stat website, several introductory guides in the CPC library, further guides available from Stata Press, and Stata Corporation's Resources for learning Stata.

**SAS Users:** the SAS User's Guide to Stata may help you make the transition from SAS to Stata.

### A simple example

- **input:** putting data into Stata
- **generate:** creating a new variable
- **list (or browse):** viewing the contents of memory
- **save:** saving memory in a permanent Stata-format file
- **log:** capturing the results of Stata commands for printing
- Stata's default actions
- how data are stored in RAM
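
As a rough illustration of how the commands listed above fit together, here is a minimal sketch. The variable names, data values, and file names are hypothetical and are not taken from the tutorial's sample data.

```stata
* Start with empty memory
clear

* log: capture results in a plain-text log file (file name is illustrative)
log using simple_example.log, replace text

* input: type a few observations directly into memory (made-up values)
input id age weight_kg
1 25 60.5
2 31 72.0
3 47 80.3
end

* generate: create a new variable from an existing one
generate weight_lb = weight_kg * 2.20462

* list: view the contents of memory
list

* save: store memory in a permanent Stata-format file (file name is illustrative)
save simple_example, replace

log close
```

The `replace` option on `log using` and `save` overwrites any existing file of the same name, so leave it off if you want Stata to stop rather than overwrite.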

### Using permanent Stata data files

- **clear:** clearing Stata's memory
- **set memory:** allowing enough space for the data
- **use:** copying the file into memory
- **save,replace:** saving changes

### Describing the data

- **describe:** names of variables
- **summarize:** the mean, min, and max of variables
- **codebook:** more univariate statistics
- **tabulate:** frequencies and cross-tabulations
- data types and data storage

### Groups and subsets of data

- **if:** do command for a subset of observations
- **sort:** order observations by the values of a variable
- **by:** do command for groups of observations (requires sort)
- **in:** do command for a range of observations
- relational, logical, and arithmetic operators
- missing values

### Changing the data

- **replace:** change the values of a variable
- **recode:** change the values of a variable
- **rename:** change a variable name
- **label:** labeling variables, values, and data files
- **drop:** drop one or more variables
- **drop if:** drop observations conditional on one or more variables
- **edit:** editing the data file directly

### Data cleaning

- **do:** storing and executing commands in do-files
- **#delimit:** writing long commands in do-files
- **/* */:** documenting your do-files
- finding and fixing outliers
- **duplicates:** finding duplicate ids

### Adding summary statistics to a data file

- **egen:** add summary statistics to each observation
- **collapse:** create file of summary statistics by groups

### Combining data files

- **one-to-one:** same observations in each file
- **match merge:** many observations in each file match, but some don't
- **one-to-many:** hierarchical data, analysis at the **lower** level
- **merging summary statistics:** hierarchical data, analysis at the **higher** level
- **appending:** adding observations with the same variables

### Reshaping a data file

- **reshape long:** change variables to observations
- **reshape wide:** change observations to variables

### Documenting Your Work

### Graphics

- **histogram** with normal curve fitted to it
- **graph box** plot displayed for two groups
- **scatter** plot
- **twoway** scatter plot with regression line
- other resources for learning graphics in Stata

### Analyzing Data from Sample Surveys

- **Data characteristics:** stratification, clustering, sampling weights
- **Choosing the correct weight syntax:** pweight, aweight, fweight, or iweight?
- **Commands to analyze survey data:** svy, robust cluster, subpop
- **Logistic Regression Example:** adjust, svylc, svytest
- **Common errors** and how to avoid them

### Labor-Saving Techniques

### Miscellaneous Tips and Tricks

- **getting help for Stata**
- **updating Stata**
- **importing and exporting data files**
- **working with large files**
- **shrinking large data files**
- **error messages** and what they mean
- **missing values** and how to work with them
- **the by command** in detail
- **exporting results** to MS Office
- **the parmest command:** saving Stata results
- **temporary files**
- **looping:** foreach in detail
- **looping with while**
- **precision** and data storage

Authors: Phil Bardsley, Kim Chantala, and Dan Blanchette