# ECONOMETRICS ECON 4570

These 36 pages of class notes were uploaded by Ashlynn Little I on Monday, October 19, 2015. The class notes belong to ECON 4570 at Rensselaer Polytechnic Institute, taught by Staff in Fall.

Date Created: 10/19/15

## Useful Stata Commands (for Stata version 10)

Kenneth L. Simons, 25-Apr-09

This document is updated continually. For the latest version, open it from the course disk space.

This document briefly summarizes Stata commands useful in ECON 4570 Econometrics and ECON 6570 Advanced Econometrics.

This presumes a basic working knowledge of how to open Stata, use the menus, use the data editor, and use the do-file editor. We will cover these topics in early Stata sessions in class. If you miss the sessions, I suggest you ask fellow students to show you through basic usage of Stata, and get the recommended text about Stata for the course and use it to practice with Stata.

More replete information is available in three very useful books: Lawrence C. Hamilton's Statistics with Stata (updated for version 9; recommended in the past for ECON 4570 Econometrics), Christopher F. Baum's An Introduction to Modern Econometrics Using Stata (recommended for ECON 6570 Advanced Econometrics), and A. Colin Cameron and Pravin K. Trivedi's Microeconometrics Using Stata (just released, great, and the most thorough of the three).

Throughout, estimation commands specify robust standard errors (Eicker-Huber-White heteroskedastic-consistent standard errors). This does not imply that robust rather than conventional estimates of the variance of the coefficient estimates should always be used, nor that they are sufficient. Other estimators shown here include Davidson and MacKinnon's improved small-sample robust estimators for OLS, cluster-robust estimators useful when errors may be arbitrarily correlated within groups (one application is across time for an individual), and the Newey-West estimator to allow for time series correlation of errors. Selected GLS estimators are listed as well. Hopefully the constant presence of `vce(robust)` in estimation commands will make readers sensitive to the need to account for heteroskedasticity and other properties of errors typical in real data and models.

### Contents

- Preliminaries for RPI DotCIO Labs
- A. Loading Data
- B. Variable Lists, If-Statements, and Options
- C. Lowercase and Uppercase Letters
- D. Review Window, and Abbreviating Command Names
- E. Viewing and Summarizing Data
  - E1. Just Looking
  - E2. Mean, Variance, Number of Non-missing Observations, Minimum, Maximum, Etc.
  - E3. Tabulations, Histograms, Density Function Estimates
  - E4. Scatter Plots and Other Plots
  - E5. Correlations and Covariances
- F. Generating and Changing Variables
  - F1. Generating Variables
  - F2. True-False Variables
  - F3. Random Numbers
  - F4. Replacing Values of Variables
  - F5. Getting Rid of Variables
  - F6. If-then-else Formulas
  - F7. Quick Calculations
  - F8. More
- G. Means: Hypothesis Tests and Confidence Intervals
  - G1. Confidence Intervals
  - G2. Hypothesis Tests
- H. OLS Regression (and WLS and GLS)
  - H1. Improved Robust Standard Errors in Finite Samples
  - H2. Weighted Least Squares
  - H3. Feasible Generalized Least Squares
- I. Post-Estimation Commands
  - I1. Fitted Values, Residuals, and Related Plots
  - I2. Confidence Intervals and Hypothesis Tests
  - I3. Nonlinear Hypothesis Tests
  - I4. Computing Estimated Expected Values for the Dependent Variable
  - I5. Displaying Adjusted R2 and Other Estimation Results
  - I6. Plotting Any Mathematical Function
  - I7. Influence Statistics
  - I8. Functional Form Test
  - I9. Heteroskedasticity Tests
  - I10. Serial Correlation Tests
  - I11. Variance Inflation Factors
- J. Tables of Regression Results
  - J1. Tables of Regression Results Using Stata's Built-In Commands
  - J2. Tables of Regression Results Using Add-On Commands
    - J2a. Installing or Accessing the Add-On Commands
    - J2b. Storing Results and Making Tables
    - J2c. Understanding the Table Command's Options
    - J2d. Wide Tables
    - J2e. Storing Additional Results
    - J2f. Saving Tables as Files
    - J2g. Clearing Stored Results
    - J2h. More Options and Related Commands
- K. Data Types, When 3.3 ≠ 3.3, and Missing Values
- L. Results Returned after Commands
- M. Do-Files and Programs
- N. Monte-Carlo Simulations
- O. Doing Things Once for Each Group
- P. Generating Variables for Time-Series and Panel Data
  - P1. Creating Time Variables that Start from a First Time and Increase by 1 at Each Observation
  - P2. Creating Time Variables Using a String Date
  - P3. Telling Stata You Have Time Series or Panel Data
  - P4. Lags, Forward Leads, and Differences
  - P5. Generating Means and Other Statistics by Individual, Year, or Group
- Q. Panel Data Statistical Methods
  - Q1. Fixed Effects: Simplest Method
  - Q2. Other Panel Data Estimators
  - Q3. Time-Series Plots for Multiple Individuals
- R. Probit and Logit Models
- S. Other Models for Limited Dependent Variables
  - S1. Censored and Truncated Regressions with Normally Distributed Errors
  - S2. Count Data Models
  - S3. Survival Models (a.k.a. Hazard Models, Duration Models, Failure Time Models)
- T. Instrumental Variables Regression
  - T1. GMM Instrumental Variables Regression
- U. Time Series Models
  - U1. Autocorrelations
  - U2. Autoregressions (AR) and Autoregressive Distributed Lag (ADL) Models
  - U3. Information Criteria for Lag Length Selection
  - U4. Augmented Dickey Fuller Tests for Unit Roots
  - U5. Forecasting
  - U6. Newey-West Heteroskedastic-and-Autocorrelation-Consistent Standard Errors
  - U7. Dynamic Multipliers and Cumulative Dynamic Multipliers
- V. System Estimation Commands
  - V1. Three-Stage Least Squares
  - V2. Seemingly Unrelated Regression
  - V3. Multivariate Regression
- W. Other Estimation Methods
  - W1. Nonlinear Least Squares
- X. Data Manipulation Tricks
  - X1. Combining Datasets: Adding Rows
  - X2. Combining Datasets: Adding Columns
    - X2a. Unmatched Merge
    - X2b. Matched One-to-One Merge
    - X2c. Matched Many-to-One Merge
    - X2d. Matched One-to-Many Merge
    - X2e. Matched Many-to-Many Merge
  - X3. Reshaping Data
  - X4. Converting Between Strings and Numbers
  - X5. Labels
  - X6. Notes
  - X7. More Useful Commands

### Preliminaries for RPI DotCIO Labs

RPI computer labs with Stata include (as of Spring 2008) Sage 4510 and the VCC Lobby, all Windows PCs. Stata might also be available in Sage 3101 and Troy 2015; try in the rooms
to find out.

To access the Stata program, look under My Computer and open the disk drive X: (which in our classroom is named "Sage4510"), then double-click on the program icon that you see. You must start Stata this way; it does not work to double-click on a saved Stata file, because Windows in the labs has not been set up to know where to find Stata or even which saved files are Stata files.

To access the course disk space, go to: hassl lwinrpieduclassesecon 6570. If you are logged into the WIN domain, you will go right to it. If you are logged in locally on your machine or into another domain, you will be prompted for credentials. Use username: win\"rcsid", password: "rcspassword", substituting your RCS username for "rcsid" and your RCS password for "rcspassword". Once entered correctly, the folder should open up.

To access your personal RCS disk space from DotCIO computers, find the icon on the desktop labeled "Connect to RCS", double-click on it, and enter your username and password. Your personal disk space will be attached as drive H. Public RCS materials will be attached as drive P.

You will want to save Stata do-files to drive H. For handy use when logging in, you may also want to put the web address to attach the course disk space in a file on drive H; that way, at the start of a session you can attach the RCS disk space and then open the file with your saved command and run it.

### A. Loading Data

- `set memory 100m`: Sets memory available for data to 100 megabytes. Clear before using.
- `edit`: Opens the data editor, to type in or paste data. You must close the data editor before you can run any further commands.
- `use "filename.dta"`: Reads in a Stata-format data file.
- `insheet using "filename.txt"`: Reads in text data.
- `save "filename.dta"`: Saves the data.

### B. Variable Lists, If-Statements, and Options

Most commands in Stata allow (1) a list of variables, (2) an if-statement, and (3) options.

1. A list of variables consists of the names of the variables, separated with spaces. It goes immediately after the command. If you leave
the list blank, Stata assumes where possible that you mean all variables. You can use an asterisk as a wildcard (see Stata's help for varlist). Examples:
   - `edit var1 var2 var3`: Opens the data editor, just with variables var1, var2, and var3.
   - `edit`: Opens the data editor, with all variables.

   In later examples, varlist means a list of variables, and varname (or yvar etc.) means one variable.

2. An if-statement restricts the command to certain observations. You can also use an in-statement. If- and in-statements come after the list of variables. Examples:
   - `edit var1 if var2 > 3`: Opens the data editor, just with variable var1, only for observations in which var2 is greater than 3.
   - `edit if var2 == var3`: Opens the data editor, with all variables, only for observations in which var2 equals var3.
   - `edit var1 in 10`: Opens the data editor, just with var1, just in the 10th observation.
   - `edit var1 in 101/200`: Opens the data editor, just with var1, in observations 101-200.
   - `edit var1 if var2 > 3 in 101/200`: Opens the data editor, just with var1, in the subset of observations 101-200 that meet the requirement var2 > 3.

3. Options alter what the command does. There are many options, depending on the command; get help on the command to see a list of options. Options go after any variable list and if-statements, and must be preceded by a comma. Do not use an additional comma for additional options: the comma works like a toggle switch, so a second comma turns off the use of options. Examples:
   - `use "filename.dta", clear`: Reads in a Stata-format data file, clearing all data previously in memory. Without the clear option, Stata refuses to let you load new data if you haven't saved the old data. Here the old data are forgotten and will be gone forever unless you saved some version of them.
   - `save "filename.dta", replace`: Saves the data, replacing a previously existing file if any.

   You will see more examples of options below.

### C. Lowercase and Uppercase Letters

Case matters: if you use an uppercase letter where a lowercase letter belongs, or
vice versa, an error message will display.

### D. Review Window, and Abbreviating Command Names

The Review window lists commands you typed previously. Click in the Review window to put a previous command in the Command window; then you can edit it as desired. Double-click to run a command.

Also, many of the commands below can have their names abbreviated. For example, instead of typing `summarize`, `su` will do, and instead of `regress`, `reg` will do.

### E. Viewing and Summarizing Data

Here, remember two points from above: (1) leave a varlist blank to mean all variables, and (2) you can use if-statements to restrict the observations used by each command.

#### E1. Just Looking

If you want to look at the data but not change them, it is bad practice to use Stata's data editor, as you could accidentally change the data. Instead, use the browser via the button at the top, or by using the following command. Or list the data in the main window, as noted below.

- `browse varlist`: Opens the data viewer, to look at data without changing them. Close the viewer before using other commands.
- `list varlist`: Lists data. If there is more than one screenful, press space for the next screen, or q to quit listing.

#### E2. Mean, Variance, Number of Non-missing Observations, Minimum, Maximum, Etc.

- `summarize varlist`: Give summary information for the variables listed.
- `summarize varlist, detail`: Give detailed summary information for the variables listed.

#### E3. Tabulations, Histograms, Density Function Estimates

- `tabulate varname`: Creates a table listing the number of observations having each different value of the variable varname.
- `tabulate var1 var2`: Creates a two-way table listing the number of observations in each row and column.
- `tabulate var1 var2, exact`: Creates the same two-way table, and carries out a statistical test of the null hypothesis that var1 and var2 are independent. The test is exact, in that it does not rely on convergence to a distribution.
- `tabulate var1 var2, chi2`: Same as above, except the statistical test relies on asymptotic convergence to a
chi-squared distribution. If you have lots of observations, exact tests can take a long time and can run out of available computer memory; if so, use this test instead.
- `histogram varname`: Plots a histogram of the specified variable.
- `histogram varname, bin(#) normal`: The bin(#) option specifies the number of bars. The normal option overlays a normal probability distribution with the same mean and variance.
- `kdensity varname, normal`: Creates a kernel density plot, which is an estimate of the pdf that generated the data. The normal option lets you overlay a normal probability distribution with the same mean and variance.

#### E4. Scatter Plots and Other Plots

- `scatter yvar xvar`: Plots data, with yvar on the vertical axis and xvar on the horizontal axis.
- `scatter yvar1 yvar2 xvar`: Plots multiple variables on the vertical axis and xvar on the horizontal axis.

Stata has lots of other possibilities for graphs, with an inch-and-a-half-thick manual. For a quick web-based introduction to some of Stata's graphics commands, try the "Graphics" section of this web page: httpwww ats 11012 J J. To tour Stata's graphics capabilities from inside Stata, go to Stata's Help menu, choose Stata Command, type `graph_intro`, and press return. Scroll down past the table of contents and read the section labeled "A quick tour".

#### E5. Correlations and Covariances

The following commands compute the correlations and covariances between any list of variables. Note that if any of the variables listed have missing values in some rows, those rows are ignored in all the calculations.

- `correlate var1 var2`: Computes the sample correlations between variables.
- `correlate var1 var2, covariance`: Computes the sample covariances between variables.

Sometimes you have missing values in some rows, but want to use all available data wherever possible, i.e., for some correlations but not others. For example, if you have data on health, nutrition, and income, and income data are missing for 90% of your observations, then you could compute the correlation of health with nutrition using all of the
observations, while computing the correlations of health with income and of nutrition with income for just the 10% of observations that have income data. These are called "pairwise" correlations and can be obtained as follows:

- `pwcorr var1 var2`: Computes pairwise sample correlations between variables.

### F. Generating and Changing Variables

A variable in Stata is a whole column of data. You can generate a new column of data using a formula, and you can replace existing values with new ones. Each time you do this, the calculation is done separately for every observation in the sample, using the same formula each time.

#### F1. Generating Variables

- `generate newvar = ...`: Generate a new variable using the formula you enter in place of "...". Examples follow.
- `gen f = m * a`: Remember, Stata allows abbreviations: `gen` means `generate`.
- `gen xsquared = x^2`
- `gen logincome = log(income)`: Use `log()` or `ln()` for a base-e logarithm, or `log10()` for a base-10 logarithm.
- `gen q = exp(z) / (1 - exp(z))`
- `gen a = abs(cos(x))`: This uses functions for absolute value, `abs()`, and cosine, `cos()`. Many more functions are available; get help for functions for a list.

#### F2. True-False Variables

- `gen young = age < 18`: If age is less than 18, then young is "true", represented in Stata as 1. If age is 18 or over, then young is "false", represented in Stata as 0.
- `gen old = age >= 18`: If age is 18 or higher, this yields 1; otherwise this yields 0.
- `gen age18 = age == 18`: Use a single equal sign to set a variable equal to something. Use a double equal sign to check whether the left hand side equals the right hand side. In this case, age18 is created and equals 1 if the observation has age 18 and 0 if it does not.
- `gen youngWoman = age < 18 & female`: Here the ampersand, `&`, means a logical and. The variable youngWoman is created and equals 1 if and only if age is less than 18 and also female equals one; otherwise it equals 0.
- `gen youngOrWoman = age < 18 | female`: Here the vertical bar, `|`, means a logical or. The variable youngOrWoman is created and equals 1 if age is less than 18 or if female equals one; otherwise it equals 0.
- `gen ageNot18 = age != 18`: The symbol `!=` means not
equal to.
- `gen notOld = !old`: The symbol `!` is pronounced "not" and switches true to false or false to true. The result is the same as the variable young above.

When creating true-false values, note that missing values in Stata work like infinity. So if age is missing and you `gen old = age >= 18`, then old gets set to 1 when really you don't know whether or not someone is old. Instead you should `gen old = age >= 18 if age < .` This is discussed more in section K below.

When using true-false values, note that 0 is false and anything else, including missing values, counts as true. So `!0` = 1, `!1` = 0, `!3` = 0, and `!.` = 0. Again, use an if-statement to ensure you generate non-missing values only where appropriate.

#### F3. Random Numbers

- `gen r = uniform()`: Random numbers, uniformly distributed between 0 and 1.
- `gen n = 5 + 2*invnorm(uniform())`: Normally distributed random numbers with mean 5 and standard deviation 2.

For other distributions, use Stata's menu to get help for functions.

#### F4. Replacing Values of Variables

- `replace young = age < 16`: Changes the value of the variable young, to equal 1 if and only if age is less than 16.
- `replace young = 0 if age >= 16 & age < 18`: Changes the value of the variable young to 0, but only if age is at least 16 and less than 18.

#### F5. Getting Rid of Variables

- `drop varlist`: Gets rid of all variables in the list.
- `clear`: Gets rid of all variables, as well as other things not discussed yet, like global variables, programs, etc.

#### F6. If-then-else Formulas

- `gen realwage = cond(year==1992, wage*(188.9/140.3), wage)`: Creates a variable that uses one formula for observations in which the year is 1992, or a different formula if the year is not 1992. Stata's `cond(if, then, else)` works much like Excel's `IF(if, then, else)`. In this case, suppose you have data from 1992 and 2004 only, and that the consumer price index was 140.3 in 1992 and 188.9 in 2004; then the example given here would compute the real wage by rescaling 1992 wages while leaving 2004 wages the same.

#### F7. Quick Calculations

- `display ...`: Calculate the formula you type in, and display the result.
Examples follow.
- `display 523100127`
- `display normal(1.96)`: Compute the probability to the left of 1.96 in the cumulative normal distribution.
- `display F(10, 9000, 2.32)`: Compute the probability that an F-distributed number, with 10 and 9000 degrees of freedom, is less than or equal to 2.32. Also, there is a function `Ftail(n1,n2,f)` = 1 - `F(n1,n2,f)`. Similarly, you can use `ttail(n,t)` for the probability that T > t, for a t-distributed random variable T with n degrees of freedom.

#### F8. More

For functions available in equations in Stata, use Stata's Help menu, choose Stata Command, and enter functions. To generate variables separately for different groups of observations, see the commands in sections O and P5. For time-series and panel data, see section P, especially the notations for lags, leads, and differences in section P4.

If you need to refer to a specific observation number, use a reference like `x[3]`, meaning the value of the variable x in the 3rd observation. In Stata, `_n` means the current observation (when using generate or replace), so that for example `x[_n-1]` means the value of x in the preceding observation, and `_N` means the number of observations, so that `x[_N]` means the value of x in the last observation.

### G. Means: Hypothesis Tests and Confidence Intervals

#### G1. Confidence Intervals

- `ci varname`: Confidence interval for the mean of varname (using asymptotic normal distribution).
- `ci varname, level(#)`: Confidence interval at #%. For example, use 99 for a 99% confidence interval.

#### G2. Hypothesis Tests

- `ttest varname == #`: Test the hypothesis that the mean of a variable is equal to some number, which you type instead of the number sign #.
- `ttest varname1 == varname2`: Test the hypothesis that the mean of one variable equals the mean of another variable.
- `ttest varname, by(groupvar)`: Test the hypothesis that the mean of a single variable is the same for all groups. The groupvar must be a variable with a distinct value for each group. For example, groupvar might be year, to see if the mean of a variable is the same in every year of data.

### H. OLS Regression (and WLS and GLS)

- `regress yvar xvarlist`: Regress the dependent variable yvar on the independent variables xvarlist. For example: `regress y x`, or `regress y x1 x2 x3`.
- `regress yvar xvarlist, vce(robust)`: Regress, but this time compute robust (Eicker-Huber-White) standard errors. We always use the vce(robust) option in ECON 4570 Econometrics, because we want consistent (i.e., asymptotically unbiased) results, but we do not want to have to assume homoskedasticity and normality of the random error terms. So if you are in ECON 4570 Econometrics, remember always to specify the vce(robust) option after estimation commands. The "vce" stands for variance-covariance estimates (of the estimated model parameters).
- `regress yvar xvarlist, vce(robust) level(#)`: Regress with robust standard errors, and this time change the confidence interval to #% (e.g., use 99 for a 99% confidence interval).

#### H1. Improved Robust Standard Errors in Finite Samples

For robust standard errors, an apparent improvement is possible. Davidson and MacKinnon report two variance-covariance estimation methods that seem, at least in their Monte Carlo simulations, to converge more quickly, as sample size n increases, to the correct variance-covariance estimates. Thus their methods seem to be better, although they require more computational time. Stata by default makes Davidson and MacKinnon's recommended simple degrees of freedom correction by multiplying the estimated variance matrix by n/(n-K). However, students in ECON 6570 Advanced Econometrics learn about an alternative in which the squared residuals are rescaled. To use this formula, specify `vce(hc2)` instead of `vce(robust)`, to use the approach discussed in Greene's text on p. 164, or in Hayashi p. 125 formula 2.5.5 using d=1. An alternative is `vce(hc3)` instead of `vce(robust)` (Greene p. 164 footnote 15, or Hayashi p. 125 formula 2.5.5 using d=2).

#### H2. Weighted Least Squares

Students in ECON 6570 Advanced Econometrics learn about (variance-)weighted least squares (Greene pp. 167-169). If you know, to within a constant multiple, the variances of the error terms for all observations, this yields more efficient
estimates. OLS with robust standard errors works properly using asymptotic methods but is not the most efficient estimator. Suppose you have, stored in a variable sdvar, a reasonable estimate of the standard deviation of the error term for each observation. Then weighted least squares can be performed as follows:

- `vwls yvar xvarlist, sd(sdvar)`

Reference for section H1: R. Davidson and J. MacKinnon, Estimation and Inference in Econometrics, Oxford: Oxford University Press, 1993, section 16.3.

#### H3. Feasible Generalized Least Squares

Students in ECON 6570 Advanced Econometrics learn about feasible generalized least squares (Greene pp. 156-158 and 169-175). The groupwise heteroscedasticity model can be estimated by computing the estimated standard deviation for each group using Greene's equation 8-36 (p. 173): do the OLS regression, get the residuals, and use `by groupvars: egen estvar = mean(residual^2)`, with appropriate variable names in place of groupvars, estvar, and residual, then `gen estsd = sqrt(estvar)`, then use this estimated standard deviation to carry out weighted least squares as shown above. Or, if your independent variables are just the group variables (categorical variables that indicate which observation is in each group), you can use the command `vwls yvar xvarlist`.

The multiplicative heteroscedasticity model is available via a free third-party add-on command for Stata. See section J2a of this document for how to use add-on commands. If you have your own copy of Stata, just use the help menu to search for "sg77" and click the appropriate link to install. A discussion of these commands was published in the Stata Technical Bulletin, volume 42, available online at: httpwww stata J quot iuul uals stb42pdf. The model then can be estimated like this (see the help file and Stata Technical Bulletin for more information):

- `reghv yvar xvarlist, var(zvarlist) robust twostage`

### I. Post-Estimation Commands

Commands described here work after OLS regression. They sometimes work after other estimation commands, depending on the command.

#### I1. Fitted Values, Residuals, and Related Plots

- `predict yhatvar`: After a regression, create a new variable, having the name you enter here, that contains for each observation its fitted value.
- `predict rvar, residuals`: After a regression, create a new variable, having the name you enter here, that contains for each observation its residual.
- `scatter y yhat x`: Plot variables named y and yhat versus x.
- `scatter resids x`: It is wise to plot your residuals versus each of your x-variables. Such "residual plots" may reveal a systematic relationship that your analysis has ignored. It is also wise to plot your residuals versus the fitted values of y, again to check for a possible nonlinearity that your analysis has ignored.
- `rvfplot`: Plot the residuals versus the fitted values of y.
- `rvpplot xvar`: Plot the residuals versus a "predictor" (the x-variable named xvar).

For more such commands, see the nice "regress postestimation" section of the Stata manuals.

#### I2. Confidence Intervals and Hypothesis Tests

For a single coefficient in your statistical model, the confidence interval is already reported in the table of regression results, along with a 2-sided t-test for whether the true coefficient is zero. However, you may need to carry out F-tests, as well as compute confidence intervals and t-tests for "linear combinations" of coefficients in the model. Here are example commands. Note that when a variable name is used in this subsection, it really refers to the coefficient (the β_k) in front of that variable in the model equation.

- `lincom logpl+logpk+logpf`: Compute the estimated sum of three model coefficients, which are the coefficients in front of the variables named logpl, logpk, and logpf. Along with this estimated sum, carry out a t-test with the null hypothesis being that the linear combination equals zero, and compute a confidence interval.
- `lincom 2*logpl+1*logpk-1*logpf`: Like the above, but now the formula is a different linear combination of regression coefficients.
- `lincom 2*logpl+1*logpk-1*logpf, level(#)`: As above, but this time change the confidence interval to the specified level, e.g.
use 99 for a 99% confidence interval.
- `test logpl+logpk+logpf==1`: Test the null hypothesis that the sum of the coefficients of variables logpl, logpk, and logpf totals to 1. This only makes sense after a regression involving variables with these names. This is an F-test.
- `test (logq2==logq1) (logq3==logq1) (logq4==logq1) (logq5==logq1)`: Test the null hypothesis that four equations are all true simultaneously: the coefficient of logq2 equals the coefficient of logq1, the coefficient of logq3 equals the coefficient of logq1, the coefficient of logq4 equals the coefficient of logq1, and the coefficient of logq5 equals the coefficient of logq1; i.e., they are all equal to each other. This is an F-test.
- `test x3 x4 x5`: Test the null hypothesis that the coefficient of x3 equals 0 and the coefficient of x4 equals 0 and the coefficient of x5 equals 0. This is an F-test.

#### I3. Nonlinear Hypothesis Tests

Students in ECON 6570 Advanced Econometrics learn about nonlinear hypothesis tests. After estimating a model, you could do something like the following:

- `testnl _b[popdensity]*_b[landarea] = 3000`: Test a nonlinear hypothesis. Note that coefficients must be specified using `_b[]`, whereas the linear test command lets you omit the `_b[]`.
- `testnl (_b[mpg] = 1/_b[weight]) (_b[trunk] = 1/_b[length])`: For multi-equation tests you can put parentheses around each equation, or use multiple equality signs in the same equation; see the Stata 10 manual, Reference Q-Z, p. 469, for examples.

#### I4. Computing Estimated Expected Values for the Dependent Variable

- `di _b[xvarname]`: Display the value of an estimated coefficient after a regression. Use the variable name `_cons` for the estimated constant term. Of course there's no need just to display these numbers, but the good thing is that you can use them in formulae. See the next example.
- `di _b[_cons] + _b[age]*25 + _b[female]*1`: After a regression of y on age and female (but no other independent variables), compute the estimated value of y for a 25-year-old female. See also the predict command mentioned above. Also, Stata's `adjust` command provides a powerful tool to display
predicted values when the x-variables take on various values (but for your own understanding, do the calculation by hand a few times before you try using adjust).

#### I5. Displaying Adjusted R2 and Other Estimation Results

- `display e(r2_a)`: After a regression, the adjusted R-squared can be looked up as `e(r2_a)`. (Or get it as in section J below.) Stata does not report the adjusted R2 when you do regression with robust standard errors, because robust standard errors are used when the variance (conditional on your right-hand-side variables) is thought to differ between observations, and this would alter the standard interpretation of the adjusted R2 statistic. Nonetheless, people often report the adjusted R2 in this situation anyway. It may still be a useful indicator, and often the conditional variance is still reasonably close to constant across observations, so that it can be thought of as an approximation to the adjusted R2 statistic that would occur if the conditional variance were constant.
- `ereturn list`: Display all results saved from the most recent model you estimated, including the adjusted R2 and other items. Items that are matrices are not displayed; you can see them with the command `matrix list e(matrixname)`.

#### I6. Plotting Any Mathematical Function

- `twoway function y=exp(-x/6)*sin(x), range(0 12.57)`: Plot a function graphically, for any function of a single variable x that you specify. A command like this may be useful when you want to examine how a polynomial in one regressor (which here must be called x) affects the dependent variable in a regression, without specifying values for other variables.

#### I7. Influence Statistics

Influence statistics give you a sense of how much your estimates are sensitive to particular observations in the data. This may be particularly important if there might be errors in the data. After running a regression, you can compute how much different the estimated coefficient of any given variable would be if any particular observation were dropped from the data. To do so for
one variable for all observations, use this command:

    predict newvarname, dfbeta(varname)
        Computes the influence statistic ("DFBETA") for varname: how much the estimated coefficient of varname would change if each observation were excluded from the data. The change divided by the standard error of varname, for each observation i, is stored in the ith observation of the newly created variable newvarname. Then you might use "summarize newvarname, detail" to find out the largest values by which the estimates would change (relative to the standard error of the estimate). If these are large (say, close to 1 or more), then you might be alarmed that one or more observations may completely change your results, so you had better make sure those results are valid or else use a more robust estimation technique (such as robust regression, which is not related to robust standard errors, or quantile regression, both available in Stata).

If you want to compute influence statistics for many or all regressors, Stata's "dfbeta" command lets you do so in one step.

I8. Functional Form Test

It is sometimes important to ensure that you have the right functional form for variables in your regression equation. Sometimes you don't want to be perfect, you just want to summarize roughly how some independent variables affect the dependent variable. But sometimes, e.g., if you want to control fully for the effects of an independent variable, it can be important to get the functional form right (e.g., by adding polynomials and interactions to the model). To check whether the functional form is reasonable and consider alternative forms, it helps to plot the residuals versus the fitted values and versus the predictors, as shown in section I1 above. Another approach is to formally test the null hypothesis that the patterns in the residuals cannot be explained by powers of the fitted values. One such formal test is the Ramsey RESET test:

    estat ovtest
        Ramsey's (1969) regression equation specification error test.

I9. Heteroskedasticity Tests

Students in ECON 6570 Advanced Econometrics learn about heteroskedasticity tests. After running a regression, you can carry out White's test for heteroskedasticity using the command:

    estat imtest, white
        Heteroskedasticity tests including White test.

You can also carry out the test by doing the auxiliary regression described in the textbook; indeed, this is a better way to understand how the test works. Note, however, that there are many other heteroskedasticity tests that may be more appropriate. Stata's imtest command also carries out other tests, and the commands hettest and szroeter carry out different tests for heteroskedasticity.

The Breusch-Pagan Lagrange multiplier test, which assumes normally distributed errors, can be carried out after running a regression by using the command:

    estat hettest, normal
        Heteroskedasticity test: Breusch-Pagan Lagrange multiplier.

Other tests that do not require normally distributed errors include:

    estat hettest, iid
        Heteroskedasticity test: Koenker's (1981) score test, assumes iid errors.
    estat hettest, fstat
        Heteroskedasticity test: Wooldridge's (2006) F-test, assumes iid errors.
    estat szroeter, rhs mtest(bonf)
        Heteroskedasticity test: Szroeter (1978) rank test for the null hypothesis that the variance of the error term is unrelated to each variable.
    estat imtest
        Heteroskedasticity test: Cameron and Trivedi (1990); also includes tests for higher-order moments of residuals (skewness and kurtosis).

For further information see the Stata manuals. See also the ivhettest command described in section T1 of this document. This makes available the Pagan-Hall test, which has advantages over the results from "estat imtest".

I10. Serial Correlation Tests

Students in ECON 6570 Advanced Econometrics learn about tests for serial correlation. To carry out these tests in Stata, you must first "tsset" your data as described in section P of this document (see also section U). For a Breusch-Godfrey test where, say, p = 3, do your regression and then use Stata's "estat bgodfrey" command:

    estat bgodfrey, lags(1 2 3)
        Breusch-Godfrey test for serial correlation, here using lags 1, 2, and 3.

Other tests for serial correlation are available. For example, the Durbin-Watson d-statistic is available using Stata's "estat dwatson" command. However, as Hayashi (p. 45) points out, the Durbin-Watson statistic assumes there is no endogeneity even under the alternative hypothesis, an assumption which is typically violated if there is serial correlation, so you really should use the Breusch-Godfrey test instead (or use Durbin's alternative test, "estat durbinalt"). For the Box-Pierce Q in Hayashi's 2.10.4 or the modified Box-Pierce Q in Hayashi's 2.10.20, you would need to compute them using matrices. The Ljung-Box test is available in Stata by using the command:

    wntestq varname, lags(#)
        Ljung-Box portmanteau (Q) test for white noise.

I11. Variance Inflation Factors

Students in ECON 6570 Advanced Econometrics learn about variance inflation factors (VIFs), which show the multiple by which the estimated variance of each coefficient estimate is larger because of non-orthogonality with other variables in the model. To compute the VIFs, use:

    estat vif
        After a regression, display variance inflation factors.

J. Tables of Regression Results

This section will make your work much easier! You can store results of regressions, and use previously stored results to display a table. This makes it much easier to create tables of regression results in Word. By copying and pasting, most of the work of creating the table is done trivially, without the chance of typing wrong numbers. Stata has built-in commands for making tables, and you should try them first to see how they work, as described in section J1. In practice it will be easiest to use additional commands, which have to be installed, discussed in section J2.

To put results into Excel or Word, select the table you want to copy, or part of it, but do not select anything additional. Then choose Copy Table from the Edit menu. Stata will copy information with tabs in the right places, to paste easily into a spreadsheet or word
processing program. For this to work, the part of the table you select must be in a consistent format, i.e., it must have the same columns everywhere, and you must not select any extra blank lines. After pasting such tab-delimited text into Word, use Word's "Convert Text to Table" command to turn it into a table. In Word 2007, from the Insert tab in the Tables group, click Table and select Convert Text to Table (see httpwwwuweceduhelpWord07tbtxttotablehtm). You can then adjust the font, borderlines, etc. appropriately.

J1. Tables of Regression Results Using Stata's Built-In Commands

Here is an example of how to store results of regressions, and then use previously stored results to display a table:

    regress y x1, robust
    estimates store model1
    regress y x1 x2 x3 x4 x5 x6 x7, robust
    estimates store model2
    regress y x1 x2 x3 x4 x6 x8 x9, robust
    estimates store model3
    estimates table model1 model2 model3

The last line above creates a table of the coefficient estimates from three regressions. You can improve on the table in various ways. Here are some suggestions:

    estimates table model1 model2 model3, se
        Includes standard errors.
    estimates table model1 model2 model3, star
        Adds asterisks for significance levels. Unfortunately, estimates table does not allow the star and se options to be combined (see section J2 for an alternative that lets you combine the two).
    estimates table model1 model2 model3, star stats(N r2 r2_a rmse)
        Also adds information on the number of observations used, R-squared, adjusted R-squared, and root mean squared error. (The latter is the estimated standard deviation of the error term.)
    estimates table model1 model2 model3, b(%7.2f) se(%7.2f) stfmt(%7.4g) stats(N r2 r2_a rmse)
        Similar to the above examples, but formats numbers to be closer to the appropriate format for papers or publications. The coefficients and standard errors in this case are displayed using the "%7.2f" format, and the statistics below the table are displayed using the "%7.4g" format. The "%7.2f" tells Stata to use a fixed width of (at least) 7 characters to
        display the number, with 2 digits after the decimal point. The "%7.4g" tells Stata to use a general format where it tries to choose the best way to display a number, trying to fit everything within at most 7 characters, with at most 4 characters after the decimal point. Stata has many options for how to specify number formats; for more information get help on the Stata command "format".

You can store estimates after any statistical command, not just regress. The estimates commands have lots more options; get help on "estimates table" or "estimates" for information. Also, for items you can include in the stats(...) option, type "ereturn list" after running a statistical command: you can use any of the scalar results (but not macros, matrices, or functions).

J2. Tables of Regression Results Using Add-On Commands

In practice you will find it much easier to go a step further. A free set of third-party add-on commands gives much needed flexibility and convenience when storing results and creating tables. What is an add-on command? Stata allows people to write commands (called "ado" files) which can easily be distributed to other users. If you ever need to find available add-on commands, use Stata's help menu and Search, choosing to search resources on the internet, and also try using Stata's "ssc" command.

J2a. Installing or Accessing the Add-On Commands

On your own computer, the add-on commands used here can be permanently installed as follows:

    ssc install estout, replace
        Installs the estout suite of commands.

In RPI's Dot CIO labs, use a different method, because in the installation folder for add-on files you don't have file write permission. I have put the add-on commands in the course disk space in a folder named "stata extensions". You merely need to tell Stata where to look (you could copy the relevant files anywhere, and just tell Stata where). Type the command listed below in Stata. You only need to run this command once after you start or restart Stata.

    adopath + folderToLookIn
        Here, replace folderToLookIn with the name of the folder. It might be something like
        "H:\stata extensions". If in doubt, under Stata's File menu you can choose "File Name" to look up the relevant folder name.

J2b. Storing Results and Making Tables

Once this is done, you can store results more simply, store additional results not saved by Stata's built-in commands, and create tables that report information not allowed using Stata's built-in commands. Below are some examples.

    eststo: reg y x1 x2, robust
        Regress y on x1 and x2 (with robust standard errors) and store the results. Estimation results will be stored with names like "est1", "est2", etc.
    eststo: quietly reg y x1 x2 x3, robust
        Similar to above, but "quietly" tells Stata not to display any output.
    esttab est1 est2, se star(+ 0.10 * 0.05 ** 0.01 *** 0.001) r2 ar2 scalars(F) nogaps
        Make a near-publication-quality table. You will still want to make the variable names more meaningful, change the column headings, and set up the borders appropriately.

J2c. Understanding the Table Command's Options

The esttab command above had a lot in it, so it may help to look at simpler versions of the command to understand how it works:

    esttab
        Displays a table with all stored estimation results, with t-statistics (not standard errors). Numbers of observations used in estimation are at the bottom of each column.
    esttab, se
        Displays a table with standard errors instead of t-statistics.
    esttab, se ar2
        Displays a table with standard errors and adjusted R-squared values.
    esttab, se ar2 scalars(F)
        Like the previous table, but also displays the F-statistic of each model (versus the null hypothesis that all coefficients except the constant term are zero).
    esttab, b(a3) se(a3) ar2(2)
        Like "esttab, se ar2", but this controls the display format for numbers. The "a3" ensures at least 3 significant digits for each estimated regression coefficient and for each standard error. The "2" gives 2 decimal places for the adjusted R-squared values. You can also specify standard Stata number formats in the parentheses, e.g., %9.0g or %8.2f could go in the parentheses (use Stata's Help menu, choose Command, and get help on
        "format").
    esttab, star(+ 0.10 * 0.05 ** 0.01 *** 0.001)
        Sets the p-values at which different asterisks are used.
    esttab, nogaps
        Gets rid of blank spaces between rows. This aids copying of tables to paste into, e.g., Word.

J2d. Wide Tables

If you try to display estimates from many models at once, they may not all fit on the screen. The solution is to drag the Results window to the right to allow longer lines, then use the "set linesize" command as in the example below to actually use longer lines:

    set linesize 140
        Tell Stata to allow 140 characters in each line of Results window output.

Then you can make very wide tables with lots of columns. In Microsoft Word, wide tables may best fit on landscape pages: create a Section Break beginning on a new page, then format the new section of the document to turn the page sideways in landscape mode. You can create a new section break beginning on a new page to go back to vertical pages on later pages.

J2e. Storing Additional Results

After estimating a statistical model, you can add additional results to the stored information. For example, you might want to do an F-test on a group of variables, or analyze a linear combination of coefficient estimates. Here is an example of how to compute a linear combination and add information from it to the stored results. You can display the added information at the bottom of tables of results by using the "scalars(...)" option.

    eststo: reg y x1 x2, robust
        Regress.
    lincom x1 - x2
        Get the estimated difference between the coefficients of x1 and x2.
    estadd scalar xdiff = r(estimate)
        Store the estimated difference along with the regression result. Here it is stored as a scalar named xdiff.
    estadd scalar xdiffSE = r(se)
        Store the standard error for the estimated difference too. Here it is stored as a scalar named xdiffSE.
    esttab, scalars(xdiff xdiffSE)
        Include xdiff and xdiffSE in a table of regression results.

J2f. Saving Tables as Files

It can be helpful to save tables in files, which you can open later in Word, Excel, and other programs. Although they are
not used here, you can use all the options discussed above.

    esttab est1 est2 using results.txt, tab
        Save the table, with columns for the stored estimates named est1 and est2, into a tab-delimited text file named results.txt.
    esttab est1 est2 using results.rtf
        Saves a rich-text format file, good for opening in Word.
    esttab est1 est2 using results.csv
        Save a comma-separated values text file, named results.csv, with the table. This is good for opening in Excel. However, numbers will appear in Excel as text. If you want to be able to use the numbers in calculations, use the next command.
    esttab est1 est2 using results.csv, plain
        Saves a file good for use in Excel. The "plain" option lets you use the numbers in calculations.

J2g. Clearing Stored Results

Results stored using eststo stay around until you quit Stata. To remove previously stored results, do the following:

    eststo clear
        Clear out all previously stored results, to avoid confusion (or to free some RAM memory).

J2h. More Options and Related Commands

For more examples of how to use this suite of commands, use Stata's on-line help after installing the commands, or better yet use this website: http://fmwww.bc.edu/… On the website, look under "Examples" at the left.

K. Data Types, When 3.3 ≠ 3.3, and Missing Values

This section is somewhat technical and may be skipped on a first reading.

Computers can store numbers in more or less compact form, with more or fewer digits. If you need extra precision, you can use "double" precision variables instead of the default "float" variables (which are single-precision floating point numbers). If you need compact storage of integers, to save memory (or to store precise values of big integers), Stata provides other data types, called "byte", "int", and "long". Also, a string data type, "str", is available.

    gen type varname = ...
        Generate a variable of the specified data type, using the specified formula. Examples follow.
    gen double bankHoldings = 123456789
        Double-precision numbers have 16 digits of accuracy, instead of about 7 digits for regular
        float numbers.
    gen byte young = age<16
        Here, since the result is a 0 or 1, using the "byte" number format accurately records the number in a small amount of memory.
    gen str name = firstname + " " + lastname
        Generates a variable involving strings.

The following commands help deal with data types:

    describe varlist
        Lists technical information about variables, including data types.
    compress varlist
        Changes data to the most compact form possible without losing information.

If you compare a floating-point number (accurate to about 7 digits) to a double-precision number (accurate to 16 digits), don't expect them to be equal. The actual calculations Stata carries out are in double precision, even though variables are ordinarily float (single-precision) to save space. Suppose you generate a float-type variable named rating, equal to 3.3 in the first observation. Stata stores the number as 3.3 accurate to about 7 digits. Then typing "list if rating==3.3" will fail to list the first observation. Why? Stata looks up the value of rating, which in the first observation is 3.3 accurate to about 7 digits, and compares it to the number 3.3, which is immediately put into double precision for the calculation and hence is accurate to 16 digits, and hence is different from the rating. Hence the first observation will not be listed. Instead you could do this:

    list if rating==float(3.3)
        The float(...) function converts to a number accurate to only about 7 digits, the same as the rating variable.

Missing values in Stata are written as a period. They occur if you enter missing values to begin with, or if they arise in a calculation that has, for example, 0/0 or a missing number plus another number. For comparison purposes, missing values are treated like infinity, and when you're not used to this you can get some weird results. For example, "replace z = 0 if y>3" causes z to be replaced with 0 not only if y has a known value greater than 3 but also if the value of y is missing. Instead use something like this: "replace z = 0 if y>3 & y<.". The same caution applies when generating
variables, anytime you use an if-statement, etc.

L. Results Returned after Commands

Commands often return results that can be used by programs you might write. To see a list of the results from the most recent command that returned results, type:

    return list
        Shows returned results from a general command, like summarize.
    ereturn list
        Shows returned results from an estimation command, like regress.

M. Do-Files and Programs

You should become well used to the do-file editor, which is the sensible way to keep track of your commands. Using the do-file editor, you can save previously used lists of commands and reopen them whenever needed. If you are analyzing data (for class work, for a thesis, or for other reasons), keeping your work in do-files both provides a record of what you did and lets you make corrections easily. This document mainly assumes you are used to the do-file editor, but below are two notes on using and writing do-files, plus an example of how to write a program.

At the top of the do-file editor are icons for various purposes. From the left, they are: new do-file, open do-file, save, print, find in this do-file, cut, copy, paste, undo, redo, preview in viewer, run, and do. The "preview in viewer" icon you won't need (it's useful when writing documents such as help files for Stata's viewer). The run and do icons, though, are really important.

The "do" icon, the last icon, is the one you will usually use. Click on it to "do" all of the commands in the do-file editor: the commands will be sent to Stata in the order listed. However, if you have selected some text in the do-file editor, then only the lines of text you selected will be done, instead of all of the text. (If you select part of a line, the whole line will still be done.) The "run" icon has the same effect, except that no output is printed in Stata's results window. Since you will want to see what is happening, you should use the "do" icon, not the "run" icon.

You will want to include comments in the do-file editor, so you remember what your do-files were for.
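As a hedged illustration, here is a minimal sketch of a short commented do-file that also uses the returned results described in section L. It assumes Stata's bundled example dataset auto (loaded with sysuse), which is not part of the original notes, and the variable name priceDeviation is likewise hypothetical.

```stata
* Sketch only: uses the example dataset shipped with Stata (an assumption),
* not the course data.
sysuse auto, clear

summarize price
return list                      // show the r() results saved by summarize
display "Mean price: " r(mean)
generate priceDeviation = price - r(mean)   // hypothetical variable name

regress price mpg weight, robust
ereturn list                     // show the e() results saved by regress
display "R-squared: " e(r2)
```

Selecting these lines in the do-file editor and clicking the do icon should run them in order. Note that r() results are replaced by the next command that returns r() results, so it is safest to use them right away.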
There are three ways to include comments: (1) put an asterisk at the beginning of a line (it is okay to have white space, i.e., spaces and tabs, before the asterisk) to make the line a comment; (2) put a double slash "//" anywhere in a line to make the rest of the line a comment; (3) put a "/*" at the beginning of a comment and end it with "*/" to make anything in between a comment, even if it spans multiple lines. For example, your do-file might look like this:

    * My analysis of employee earnings data.
    * Since the data are used in several weeks of the course,
    * the do-file saves work for later use.
    clear // This gets rid of any pre-existing data!
    set memory 100m // Allocate 100 mb for data.
    use "L:\myfolder\myfile.dta"
    * I commented out the following three lines since I'm not using them now:
    /* regress income age, robust
    predict incomeHat
    scatter incomeHat income age */
    * Now do my polynomial age analyses:
    gen age2 = age^2
    gen age3 = age^3
    regress income age age2 age3 bachelor, robust

You can write programs in the do-file editor, and sometimes these are useful for repetitive tasks. Here is a program to create some random data and compute the mean:

    capture program drop randomMean
        Drops the program if it exists already.
    program define randomMean, rclass
        Begins the program, which is "rclass".
    drop _all
        Drops all variables.
    quietly set obs 30
        Use 30 observations, and don't say so.
    gen r = uniform()
        Generate random numbers.
    summarize r
        Compute mean.
    return scalar average = r(mean)
        Return it in r(average).
    end

Note above that "rclass" means the program can return a result. After doing this code in the do-file, you can use the program in Stata. Be careful, as it will drop all of your data! It will then generate 30 uniformly-distributed random numbers, summarize them, and return the average. (By the way, you can make the program work faster by using the "meanonly" option after the summarize command above, although then the program will not display any output.)

N. Monte-Carlo Simulations

It would be nice to know how well our statistical methods work in practice. Often the only way to know
is to simulate what happens when we get some random data and apply our statistical methods. We do this many times and see how close our estimator is to being unbiased, normally distributed, etc. (Our OLS estimators will do better with larger sample sizes, when the x-variables are independent and have large variance, and when the random error terms are closer to normally distributed.) Here is a Stata command to call the above program (at the end of section M) 100,000 times and record the result from each time:

    simulate "randomMean" avg=r(average), reps(100000)

The result will be a dataset containing one variable, named avg, with 100,000 observations. Then you can check the mean and distribution of the randomly generated sample averages, to see whether they seem to be nearly unbiased and nearly normally distributed:

    summarize avg
    kdensity avg, normal

"Unbiased" means right on average. Since the sample mean, of say 30 independent draws of a random variable, has been proven to give an unbiased estimate of the variable's true population mean, you had better find that the average result (across all 100,000 experiments) computed here is very close to the true population mean. And the central limit theorem tells you that as a sample size gets larger, in this case reaching the not-so-enormous size of 30 observations, the means you compute should have a probability distribution that is getting close to normally distributed. By plotting the results from the 100,000 experiments, you can see how close to normally distributed the sample mean is. Of course, we would get slightly different results if we did another set of 100,000 random trials, and it is best to use as many trials as possible: to get exactly the right answer we would need to do an infinite number of such experiments.

Try similar simulations to check results of OLS regressions. You will need to change the program in section M and alter the simulate command above to use the regression coefficient estimates instead of the mean (you might
say "b0=r(b0) b1=r(b1) b2=r(b2)" in place of "avg=r(average)", if your program returns results named b0, b1, and b2).

O. Doing Things Once for Each Group

Stata's "by" command lets you do something once for each of a number of groups. Data must be sorted first by the groups. For example:

    sort year
        Sort the data by year.
    by year: regress income age, robust
        Regress separately for each year of data.
    sort year state
        Sort the data by year, and within that by state.
    by year state: regress income age, robust
        Regress separately for each state and year combination.

Sometimes, when there are a lot of groups, you don't want Stata to display the output. The "quietly" command has Stata take action without showing the output:

    quietly by year: generate xInFirstObservationOfYear = x[1]
        The x[1] means look at the first observation of x within each particular by-group.
    qby year: generate xInFirstObservationOfYear = x[1]
        "qby" is shorthand for "quietly by".
    qbys year: generate xInFirstObservationOfYear = x[1]
        "qbys" sorts and then does "quietly by".

See also section P5 for more ways to generate results, e.g., means or standard deviations, separately for each by-group.

P. Generating Variables for Time-Series and Panel Data

With panel and time series data, you may need to (a) generate values with reference to past or future times, and (b) generate values separately for each individual in the sample. Here are some commands to help you.

Your time variable should be an integer, and should not always have gaps between numbers (e.g., years might be 1970, 1971, ..., 2006; if they are every other year, 1970, 1972, 1974, ..., then you should create a new variable like time = (year-1970)/2). Stata has lots of options and commands to help with setting up quarterly data, etc.

P1. Creating Time Variables that Start from a First Time and Increase by 1 at Each Observation

If you have not yet created a time variable, and your data are in order and do not have gaps, you might create a year, quarter, or day variable as follows:

    generate year = 1900 + _n - 1
        Create a new variable that specifies the year,
        beginning with 1900 in the first observation and increasing by 1 thereafter. Be sure your data are sorted in the right order first.
    generate quarter = q(1970q1) + _n - 1
        Create a new variable that specifies the time, beginning with 1970 quarter 1 in the first observation, and increasing by 1 quarter in each observation. Be sure your data are sorted in the right order first. The result is an integer number increasing by 1 for each quarter (1960 quarter 2 is specified as 1, 1960 quarter 3 is specified as 2, etc.).
    format quarter %tq
        Tell Stata to display values of quarter as quarters.
    generate day = d(01jan1960) + _n - 1
        Create a new variable that specifies the time, beginning with 1 Jan. 1960 in the first observation, and increasing by 1 day in each observation. Be sure your data are sorted in the right order first. The result is an integer number increasing by 1 for each day (01jan1960 is specified as 0, 02jan1960 is specified as 1, etc.).
    format day %td
        Tell Stata to display values of day as dates.

Like the d(...) and q(...) functions used above, you may also use w(...) for week, m(...) for month, h(...) for half-year, or y(...) for year. Inside the parentheses, you type a year followed (except for y(...)) by a separator (a comma, colon, dash, or word) followed by a second number. The second number specifies the day, week, month, quarter, or half-year (get help on "tfcn" for more information).

P2. Creating Time Variables Using a String Date

If you have a string variable that describes the date for each observation, and you want to convert it to a numeric date, you can probably use Stata's very flexible date conversion functions. You will also want to format the new variable appropriately. Here are some examples:

    gen t = daily(dstr, "mdy")
        Generate a variable t, starting from a variable dstr that contains dates like "Dec-1-2003", "12-1-2003", "12/1/2003", "January 1, 2003", "jan1-2003", etc. Note the "mdy", which tells Stata the ordering of the month, day, and year in the variable. If the order were year, month, day, you would use "ymd".
    format t %td
        This tells Stata the variable is a date number that specifies a
        day.

Like the daily(...) function used above, the similar functions monthly(strvar, "ym") or monthly(strvar, "my"), and quarterly(strvar, "yq") or quarterly(strvar, "qy"), allow monthly or quarterly date formats. Use %tm or %tq, respectively, with the format command.

These date functions require a way to separate the parts. Dates like "20050421" are not allowed. If d1 is a string variable with such dates, you could create dates with separators in a new variable d2 suitable for daily(...), like this:

    gen str10 d2 = substr(d1, 1, 4) + "-" + substr(d1, 5, 2) + "-" + substr(d1, 7, 2)
        This uses the substr(...) function, which returns a substring: the part of a string beginning at the first number's character, for a length given by the second number.

P3. Telling Stata You Have Time Series or Panel Data

You must declare your data as time series or panel data in order to use time-related commands:

    tsset timevar
        Tell Stata you have time series data, with the time listed in variable timevar.
    tsset idvar timevar
        Tell Stata you have panel data, with the idvar being a unique ID for each individual in the sample, and timevar being the measure of time.

P4. Lags, Forward Leads, and Differences

After using the tsset command (see above), it is easy to refer to past and future data. The value of var one unit of time ago is L.var, the value two units of time ago is L2.var, etc. (the Ls stand for "lag"). Future values, although you are unlikely to need them, are F.var, F2.var, etc. Below are some examples using them. Data must be sorted first, in order by time (for time-series data) or in order by individual and within that by time (for panel data).

    sort timevar
        Sort time-series data.
    sort idvar timevar
        Sort panel data.
    gen changeInX = x - L.x
        The variable changeInX created here equals x minus its value one year ago.
    gen changeInX = D.x
        The same changeInX can be created via Stata's difference operator D.var.
    gen income2YearsAgo = L2.income

You can use these L. and F. notations in the list of variables for regression too:

    regress gdp L.gdp L2.gdp L.unemployment L2.unemployment, robust

P5. Generating Means and Other Statistics by Individual, Year, or Group

The "egen" (extensions to generate) command can generate means, sums, counts, standard deviations, medians, and much more for each individual, year, or group:

    qbys state year: egen meanIncome = mean(income)
        Mean of income in each state and year.
    qbys state year: egen totalChildren = sum(children)
        Total number of children of people in the sample, separately in each state and year.
    qbys state year: egen nPeople = count(personID)
        Number of non-missing values of personID, separately in each state and year.
    qbys state year: egen sdIncome = sd(income)
        Standard deviation of income in each state and year.
    qbys year: egen medianIncomeByYear = median(income)
        Median of income in each year.

For many more uses of Stata's egen command, get help on egen.

Q. Panel Data Statistical Methods

Q1. Fixed Effects: Simplest Method

Stata's "areg" command provides a simple way to include fixed effects in OLS regressions. More extensive commands are mentioned below, but the following will do for student coursework in ECON 4570 Econometrics.

    areg yvar xvarlist, absorb(byvar) vce(robust)
        Regress the dependent variable yvar on the independent variables xvarlist and on the dummy variables needed to distinguish each separate by-group indicated by the byvar variable (absorb option). For example, the byvar might be the state, to include fixed effects for states. Coefficient estimates will not be reported for these fixed effect dummy variables.

See also the newey command in section U6, to account for serial correlation in error terms.

Q2. Other Panel Data Estimators

Students in ECON 6570 Advanced Econometrics will need to use other panel data estimators. You will need to have declared your panel data first, as in section P3. Then:

    xtreg yvar xvarlist, fe
        Fixed effects regression. The "fe" requests fixed effects estimates. This uses conventional (non-robust) standard errors.
    xtreg yvar xvarlist, fe vce(cluster clustervar)
        Fixed effects regression again, but now
        with cluster-robust standard errors, clustered by the specified variable. Typically the clustervar is the same as the panelvar used when tsset-ing your data (see section P3), in order to allow for arbitrary serial correlation of the error terms within each individual.
    estimates store fixed
        Store estimates after running the fixed effects model.
    xtreg yvar xvarlist, re vce(robust)
        Random effects regression. The "re" requests random effects estimates. Here the robust option for variance-covariance estimation requests Eicker-Huber-White robust standard errors, but again you could (and likely should) request cluster-robust standard errors instead.
    estimates store random
        Store estimates after running the random effects model.
    hausman fixed random
        Hausman test for whether the random effects model is appropriate instead of the fixed effects model. If the null hypothesis is rejected, this suggests that the coefficient estimates are inconsistent when fixed effects are not used.
    xtreg yvar xvarlist, mle vce(robust)
        Random effects again, but now using the maximum likelihood random-effects model.

A between-effects model ("be") is also available to estimate differences between the averages over time for each individual, and a population-averaged ("pa") model is also available.

Stata has many other estimation commands for panel data, including dynamic panel data models such as Arellano-Bond estimation. Christopher Baum's book An Introduction to Modern Econometrics Using Stata shows some of these commands. There are also panel equivalents of many other models, for example, fixed and random effects versions of the logit model.

Q3. Time-Series Plots for Multiple Individuals

When making plots in Stata, the by(varlist) option lets you make a separate plot for each individual in the sample. For example, you could do:

    sort companyid year
    scatter employment year, by(companyid) connect(l)

This would make plots of each company's employment in each year, with a separate plot for each company, arranged in a grid. However, you might prefer to overlay these plots in a single graph.
You could do this as follows:

    tsset
    xtline employment, overlay

The xtline command with the overlay option puts all companies' plots in a single graph, instead of having a separate plot for each company. See also section E4, which talks briefly about graphing.

R. Probit and Logit Models

    probit yvar xvarlist, robust
        Probit regression.
    logit yvar xvarlist, robust
        Logit regression.
    predict probOfOutcome, pr
        Compute the predicted probability that the dependent variable is 1, separately for each observation.
    mfx
        Display marginal effects, i.e., dy/dx_j for each independent variable j, when the variables are at their means. If you have any dummy variables as independent variables, mfx instead computes, for each dummy variable, Δy/Δx_j for the change from x_j = 0 to x_j = 1, again while other variables are at their means.

S. Other Models for Limited Dependent Variables

In Stata's help, you can easily find commands for models such as: Tobit and other censored regression models, truncated regression models, count data models (such as Poisson and negative binomial), ordered response models (such as ordered probit and ordered logit), multinomial response models (such as multinomial probit and multinomial logit), survival analysis models, and many other statistical models. Listed below are commands for a few of the most commonly used models.

S1. Censored and Truncated Regressions with Normally Distributed Errors

If the error terms are normally distributed, then the censored regression model (Tobit model) and truncated regression model can be estimated as follows.

    tobit yvar xvarlist, vce(robust) ll(#)
        Estimate a censored regression (Tobit) model in which there is a lower limit to the values of the dependent variable and it is specified by #. You can instead, or in addition, specify an upper limit using ul(#). If the censoring limits are different for different observations then use the "cnreg" command instead, or more generally if you also have data that are known only to fall in certain ranges then use the
intreg command instead.

    truncreg yvar xvarlist, vce(robust) ll(#)
Estimate a truncated regression model in which there is a lower limit to the values of the dependent variable, specified by #. You can instead, or in addition, specify an upper limit using ul(#).

Be careful that you really do think the error terms are close to normally distributed, as the results can be sensitive to the assumed distribution of the errors. There are also common models for truncated or censored data fitting particular distributions, such as zero-truncated count data (for which no data are observed when the count is zero) or right-censored survival times; you can find many such models in Stata.

S2. Count Data Models

The Poisson and negative binomial models are two of the most common count data models.

    poisson yvar xvarlist, vce(robust)
Estimate a model in which a count dependent variable yvar results from a Poisson arrival process, in which during a period of time the Poisson rate of "arrivals" (that each add 1 to the count in the y-variable) is proportional to $\exp(x_i'\beta)$, where $x_i$ includes the independent variables in xvarlist.

    nbreg yvar xvarlist, vce(robust)
Estimate a negative binomial count data model. This allows the variance of y to exceed the mean, whereas the Poisson model assumes the two are equal.

As always, see the Stata documentation and online help for lots more count data models and options to commands, and look for a book on the subject if you need to work with count data seriously.

S3. Survival Models (a.k.a. Hazard Models, Duration Models, Failure Time Models)

To fit survival models, or make plots or tables of survival or of the hazard of failure, you must first tell Stata about your data. There are a lot of options and variants to this, so look for a book on the subject if you really need to do this. A simple case is:

    stset survivalTime, failure(dummyEqualToOneIfFailedElseZero)
Tell Stata that you have survival data, with each individual having one observation. The variable survivalTime tells the elapsed
time at which each individual either failed or ceased to be studied. It is the norm in survival data that some individuals are still surviving at the end of the study, and hence that the survival times are censored from above, i.e., right-censored. The variable dummyEqualToOneIfFailedElseZero provides the relevant information on whether each individual failed during the study (1) or was right-censored (0).

    sts graph, survival yscale(log)
Plot a graph showing the fraction of individuals surviving as a function of elapsed time. The optional use of yscale(log) causes the vertical axis to be logarithmic, in which case a line of constant (negative) slope on the graph corresponds to a hazard rate that remains constant over time. Another option is by(groupvar), in which case separate survival curves are drawn for the different groups, each of which has a different value of groupvar. A hazard curve can be fitted by specifying hazard instead of survival.

    streg xvarlist, distribution(exponential) nohr vce(robust)
After using stset, estimate an exponential hazard model in which the hazard (Poisson arrival rate of the first failure) is proportional to $\exp(x_i'\beta)$, where $x_i$ includes the independent variables in xvarlist. Other common models make the hazard dependent on the elapsed time; such models can be specified instead by setting the distribution() option to weibull, gamma, gompertz, lognormal, loglogistic, or one of several other choices, and a strata(groupvar) option can be used to assume that the function of elapsed time differs between different groups.

    stcox xvarlist, nohr vce(robust)
After using stset, estimate a Cox hazard model in which the hazard (Poisson arrival rate of the first failure) is proportional to $f(\text{elapsed time}) \times \exp(x_i'\beta)$, where $x_i$ includes the independent variables in xvarlist. The function of elapsed time is implicitly estimated in a way that best fits the data, and a strata(groupvar) option can be used to assume that the function of elapsed time differs between different groups.

As always, see the Stata documentation and online
help for lots more about survival analysis.

T. Instrumental Variables Regression

Note for Econometrics students using Stock and Watson's textbook: the term "instruments" in Stata output, and in the econometrics profession generally, means both excluded instruments and exogenous regressors. Thus, when Stata lists the instruments in 2SLS regression output, it will include both the Z's and the W's as listed in Stock and Watson's textbook.

Here is how to estimate two-stage least squares (2SLS) regression models. Read the notes carefully for the first command below.

    ivregress 2sls yvar exogXVarlist (endogXVarlist = otherInstruments), vce(robust)
Two-stage least squares regression of the dependent variable yvar on the independent variables exogXVarlist and endogXVarlist. The variables in endogXVarlist are assumed to be endogenous. The exogenous right-hand-side variables are exogXVarlist, and the other exogenous instruments (not included in the right-hand side of the regression equation) are the variables listed in otherInstruments. For Econometrics students using Stock and Watson's textbook, exogXVarlist consists of the W's in the regression equation, endogXVarlist consists of the X's in the regression equation, and otherInstruments consists of the Z's. For Advanced Econometrics students using Hayashi's textbook, exogXVarlist consists of the exogenous variables in $z_i$ (i.e., variables in $z_i$ that are also in $x_i$), endogXVarlist consists of the endogenous variables in $z_i$ (i.e., variables in $z_i$ that are not in $x_i$), and otherInstruments consists of the excluded instruments (i.e., variables in $x_i$ but not in $z_i$).

    ivregress 2sls yvar exogXVarlist (endogXVarlist = otherInstruments), vce(robust) first
Same, but also report the first-stage regression results.

    ivregress 2sls yvar exogXVarlist (endogXVarlist = otherInstruments), vce(robust) first level(99)
Same, but use 99% confidence intervals.

    predict yhatvar
After an ivreg, create a new variable, having the name you enter here, that contains for each observation its predicted value of y.

    predict rvar, residuals
After an
ivreg, create a new variable, having the name you enter here, that contains for each observation its residual. You can use this for residual plots as in OLS regression.

T1. GMM Instrumental Variables Regression

Students in ECON 6570 Advanced Econometrics learn about GMM instrumental variables regression. For single-equation GMM instrumental variables regression, specify gmm in place of 2sls in the above regression commands:

    ivregress gmm yvar exogXVarlist (endogXVarlist = otherInstruments), vce(robust) first
GMM instrumental variables regression, showing first-stage results.

For single-equation LIML instrumental variables regression (Hayashi's section 8.6), specify liml in place of 2sls in the above regression commands:

    ivregress liml yvar exogXVarlist (endogXVarlist = otherInstruments), vce(robust) first
LIML instrumental variables regression, showing first-stage results.

For more options to these commands, use the third-party ivreg2 command described in section 8.7 of the recommended supplementary text, Baum's An Introduction to Modern Econometrics Using Stata (use "ssc install ivreg2, replace" or adopath as in section J2a of this document). Multi-equation GMM instrumental variables regression is not supported in Stata.

After estimating a regression with instrumental variables, a J-test of overidentifying restrictions can be carried out as follows (for an example see section 8.6 of Baum's text). This requires installing the third-party overid command (use "ssc install ivreg2, replace" or adopath as in section J2a of this document).

    overid
Carry out an overidentifying restrictions test after ivregress or ivreg2. Also, a J-test is automatically carried out when using ivreg2.

To test a subset of the overidentifying restrictions via a C-test (Hayashi p. 220), use the ivreg2 command with the list of variables to be tested in the orthog() option.

    ivreg2 yvar exogXVarlist (endogXVarlist = otherInstruments), vce(robust) gmm orthog(vars)
After this GMM instrumental variables regression, an orthogonality C-test is
carried out (only) for the variables vars; if vars involves multiple variables, then separate their names with spaces.

For a heteroskedasticity test after ivregress or ivreg2 (or also after regress), use the third-party ivhettest command (use "ssc install ivreg2, replace" or adopath as in section J2a of this document). The Pagan-Hall statistic reported is most robust to assumptions (see section 8.9 of Baum's text).

    ivhettest
Carry out a heteroskedasticity test. Get help on ivhettest for options if you want to restrict the set of variables used in the auxiliary regression ($\psi_i$ in Hayashi's section 2.7).

U. Time Series Models

First tsset your data as in section P above, and note how to use the lag and lead operators as described in section P.

U1. Autocorrelations

    corrgram varname
Create a table showing autocorrelations (among other statistics) for lagged values of the variable varname.

    corrgram varname, lags(#) noplot
You can specify the number of lags, and suppress the plot.

    correlate x L.x L2.x L3.x L4.x L5.x L6.x L7.x L8.x
Another way to compute autocorrelations, for x with its first eight lags.

    correlate L(0/8).x
This more compact notation also uses the 0th through 8th lags of x and computes the correlation.

    correlate L(0/8).x, covariance
This gives autocovariances instead of autocorrelations.

U2. Autoregressive (AR) and Autoregressive Distributed Lag (ADL) Models

    regress y L.y, robust
Regress y on its 1-period lag, with robust standard errors.

    regress y L(1/4).y, robust
Regress y on its first 4 lags, with robust standard errors.

    regress y L(1/4).y L(1/3).x L.w, robust
Regress y on its first 4 lags, plus the first 3 lags of x and the first lag of w, with robust standard errors.

    regress y L.y L.x1 L.x2, robust
Regress y on the 1-period lags of y, x1, and x2, with robust standard errors.

    test L2.x L3.x L4.x
Hypothesis tests work as usual.

    regress y L.y if tin(1962q1,1999q4), robust
The "if tin(...)" used here restricts the sample to times in the specified range of dates, in this case from 1962 first quarter through 1999 fourth quarter.

U3. Information Criteria for Lag Length Selection

To get BIC and AIC
values after doing a regression, use the estat ic command.

    estat ic
Display the information criteria AIC and BIC after a regression.

To include BIC and AIC values in tables of regression results, you could use the estimates table command:

    regress y L.y, robust
    estimates store y1
    regress y L(1/2).y, robust
    estimates store y2
    estimates table y1 y2, stats(bic aic)
After storing regression results, you can make a table of regression results reporting the BIC and AIC.

To speed up the process of comparing alternative numbers of lags, you could use a forvalues loop in your do-file editor. For example:

    forvalues lags = 1/6 {
        regress y L(1/`lags').y, robust
        estimates store y`lags'
    }
    estimates table y1 y2 y3 y4 y5 y6, stats(bic aic)

U4. Augmented Dickey-Fuller Tests for Unit Roots

    dfuller y
Carry out a Dickey-Fuller test for nonstationarity, checking the null hypothesis (in a one-sided test) that y has a unit root.

    dfuller y, regress
Show the associated regression when doing the Dickey-Fuller test.

    dfuller y, lag(2) regress
Carry out an augmented Dickey-Fuller test for nonstationarity using two lags of y, checking the null hypothesis that y has a unit root, and show the associated regression.

    dfuller y, lag(2) trend regress
As above, but now include a time trend term in the associated regression.

U5. Forecasting

    regress y L.y L.x
After a regression...

    tsappend, add(1)
Add an observation for one more time after the end of the sample. (Use add(#) to add # observations.) Use browse after this to check what happened.

    predict yhat, xb
Then compute the predicted or forecasted value for each observation.

    predict rmsfe, stdp
And compute the standard error of the out-of-sample prediction or forecast.

If you want to compute multiple pseudo-out-of-sample forecasts, you could do something like this:

    gen yhat = .
    gen rmsfe = .
    forvalues p = 30/50 {
        regress y L.y if t<`p', robust
        predict yhatTemp, xb
        predict rmsfeTemp, stdp
        replace yhat = yhatTemp if t==`p'+1
        replace rmsfe = rmsfeTemp if t==`p'+1
        drop yhatTemp rmsfeTemp
    }
    scatter yhat t

U6. Newey-West Heteroskedastic-and-Autocorrelation-Consistent Standard Errors

    newey y x1 x2, lag(#)
Regress y on x1 and x2, using heteroskedastic-and-autocorrelation-consistent (Newey-West) standard errors, assuming that the error term times each right-hand-side variable is autocorrelated for up to # periods of time. (If # is 0, this is the same as regression with robust standard errors.) A rule of thumb is to choose # = $0.75\,T^{1/3}$, rounded to an integer, where T is the number of observations used in the regression (see the text by Stock and Watson, page 607). If there is strong serial correlation, # might be made more than this rule of thumb suggests, while if there is little serial correlation, # might be made less than this rule of thumb suggests.

U7. Dynamic Multipliers and Cumulative Dynamic Multipliers

If you estimate the effect of multiple lags of X on Y, then the estimated effects on Y are effects that occur after different amounts of time. For example:

    newey ipGrowth L(1/18).oilshock, lag(7)
Here the growth rate of industrial production (ipGrowth) is related to the percentage oil price increase, or 0 if there was no oil price increase (oilshock), in each of the 18 previous months. This provides estimates of the effects of oil price shocks after 1 month, after 2 months, etc. The cumulative effect after 6 months then could be found by:

    lincom L1.oilshock + L2.oilshock + L3.oilshock + L4.oilshock + L5.oilshock + L6.oilshock
Confidence intervals and p-values are reported along with these results.

You could draw by hand a graph of the estimated effects versus the time lag, along with 95% confidence intervals. You could also draw by hand a graph of the estimated cumulative effects versus the time lag, along with 95% confidence intervals. Making the same graphs in an automated fashion in Stata is a little more painstaking, but see my Stata do-file for Stock and Watson's exercise E15.1 for an example.

V. System Estimation Commands

Advanced Econometrics students work with estimators for systems of equations. Here is a brief introduction to some pertinent system estimation
commands. Note that these commands all assume conditionally homoskedastic errors. To refer to a coefficient in an equation after estimation, for the lincom, test, and testnl commands, see the example test command in section V1 below.

V1. Three-Stage Least Squares

Read the Stata manual's entry for the reg3 command to get a good sense of how it works. Here are some examples drawn from the Stata manual.

    reg3 (consump wagepriv wagegovt) (wagepriv consump govt capital1)
Estimate a two-equation 3SLS model in which the two dependent variables, consump and wagepriv, are assumed to be endogenous. (Dependent variables are assumed to be endogenous unless you list them in the exog() option.) The instruments consist of all other variables (wagegovt, govt, capital1) and the constant term. Note that the consumption equation estimates will be the same as in 2SLS, since that equation is just identified.

    test [consump]wagegovt [wagepriv]capital1
The test, lincom, and testnl commands work fine after multi-equation estimations, but you have to specify each coefficient you are talking about by naming an equation as well as a variable in an equation. Thus, [consump]wagegovt refers to the coefficient of the variable wagegovt in the equation named consump. Stata by default names equations after their dependent variables. For nonlinear hypothesis tests, refer to coefficients for example using _b[consump:wagegovt].

    reg3 (qDemand: quantity price pcompete income) (qSupply: quantity price praw), endog(price)
Estimate a two-equation model, naming the equations qDemand and qSupply since they have the same dependent variable, and treat price as endogenous. Treat the other three regressors and the constant as exogenous.

V2. Seemingly Unrelated Regression

Read the Stata manual's entry for the sureg command to get a good sense of how it works. Here is an example drawn from the Stata manual.

    sureg (price foreign mpg displ) (weight foreign length), corr
Estimate a two-equation SUR model. The corr option causes the cross-equation correlation
matrix of the residuals to be displayed, along with a test of the null hypothesis that the error terms have zero covariance between equations.

V3. Multivariate Regression

    mvreg headroom trunk turn = price mpg displ gear_ratio length weight, corr
Estimate three regression equations, the first with headroom as the dependent variable, the second with trunk (space) as the dependent variable, and the third with turn (turning circle) as the dependent variable. In each case, the six variables listed on the right-hand side of the equals sign are used as regressors. The corr option causes the cross-equation correlation matrix of the residuals to be displayed, along with a test of the null hypothesis that the error terms have zero covariance between equations. The same estimates could be obtained by running three separate regressions, but this also analyzes correlations of the error terms and makes it possible to carry out cross-equation tests afterward.

W. Other Estimation Methods

Advanced Econometrics students work with various other estimation methods discussed here.

W1. Nonlinear Least Squares

Read the Stata manual's entry for the nl command to get a good sense of how it works. There are several ways in which to use nonlinear regression commands. Here is an example showing how to estimate a nonlinear least squares model for the equation $y_i = \beta_1 + \beta_2 e^{\beta_3 x_i} + \varepsilon_i$:

    nl (y = {b1} + {b2=1}*exp({b3}*x))
Estimate this simple nonlinear regression.

Look at the above line to understand its parts. The "nl" is the name of the nonlinear regression command. After that is an equation in parentheses. The left side of the equation is the dependent variable. The right side is the conditional expectation function, in this case $\beta_1 + \beta_2 e^{\beta_3 x_i}$. The terms in curly brackets are the parameters to be estimated, which we have called b1, b2, and b3. Stata will try to minimize the sum of squared errors by searching through the space of all possible values for the parameters. However, if we started by estimating $\beta_2$ as zero, we might not be able to search well; at
that point, the estimate of $\beta_3$ would have no effect on the sum of squared errors. Instead, we start by estimating $\beta_2$ as one, using the {b2=1}. The "=1" part tells Stata to start at 1 for this parameter.

Often you may have a linear combination of variables, as in the formula $y_i = \alpha_1 + \alpha_2 e^{\beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i}} + \varepsilon_i$. Stata has a shorthand notation, using {xb: varlist}, to enter the linear combination:

    nl (y = {a1} + {a2=1}*exp({xb: x1 x2 x3 x4}))
Estimate this nonlinear regression.

After a nonlinear regression, you might want to use the nlcom command to estimate a nonlinear combination of the parameters.

X. Data Manipulation Tricks

In real statistical work, often the vast majority of time is spent preparing the data for analysis. Many of the commands given above are very useful for data preparation; see particularly sections F, M, O, and P above. This section describes several more Stata commands that are extremely useful for getting your data ready to use.

Make sure you organize all your work in a do-file (or multiple do-files), starting with clear and set memory if needed, then reading in the data, then doing anything else like generating variables, merging datasets, reshaping the data, using tsset, et cetera, and then possibly saving the prepared data in a separate file. If running this do-file does not take too long, you can just run it each time you want to do statistical analyses. If it takes a long time, save a prepared data file at the end so that you can just read in the data file when needed.

X1. Combining Datasets: Adding Rows

Suppose you have two datasets, typically with (at least some of) the same variables, and you want to combine them into a single dataset. To do so, use the append command.

    append using filename
Appends another dataset to the end of the data now in memory. You must have the other dataset saved as a Stata file. Variables with the same name will be placed in the same column; for example, if you have variables named cusip and year, and the other dataset has variables with the same names, then all the cusip values in the appended
data will be in the new rows of the cusip variable, while all of the year values in the appended data will be in the new rows of the year variable.

X2. Combining Datasets: Adding Columns

Suppose you have two datasets. The "master" dataset is the one now in use in memory, and you want to add variables from a "using" dataset in another file. You might carry out:

(i) a one-to-one merge, e.g., with two cross-sectional or panel datasets on the same 1000 people, one with variables on age and education, the other with variables on employment and earnings;
(ii) a many-to-one merge, e.g., with a master cross-sectional or panel dataset on 1000 people along with a year or grocery-store-shopped-in variable for each observation, plus a using dataset with the US GDP in each year or with information about the location and size of each grocery store; or
(iii) a one-to-many merge, e.g., with a dataset listing 50 US states along with characteristics of each state, plus a dataset listing many government agencies in each state.

To add the columns in the latter dataset to the former dataset, use Stata's merge command, as in the examples below.

When you merge two datasets, you will want to ensure that the rows of data match up properly. You should do so using a so-called matched merge. In (i), you would need a person id variable that is a unique value for each person in the dataset, with the same values used in the two different datasets. If the datasets involve panel data, you would also need a time variable, such as the year, that also can be matched when comparing the two datasets. In (ii), you would need a year or company identifier that can be matched when comparing the two datasets. In (iii), you would need the state identifier variable in each dataset; the state identifier might be a two-letter state abbreviation or an appropriately defined number for each state.

If you have two datasets for which the ith row in one corresponds to the ith row in the other, you can do an unmatched merge in which corresponding rows are
assumed to match up. Unmatched merges are very dangerous, because too often you accidentally end up with your rows in a different order from what was intended in one or both of the files, and hence end up matching information into the incorrect rows.

Here is an example unmatched merge, followed by the three example matched merges listed above.

X2a. Unmatched Merge

As mentioned above, unmatched merges are very dangerous and should be avoided unless absolutely necessary. Nonetheless, this is a simple starting point to understand the merge command.

    merge using filename
This merges a saved Stata dataset with the master dataset by adding extra variables at the right of the dataset. If the Stata file named filename has any variables with the same names as variables in the master dataset, these variables get ignored (although there are options to get information from these variables instead of ignoring them). A variable named _merge gets created that equals 1 if the observation was in the original dataset only, 2 if it was in the using dataset only, and 3 if it was in both datasets. If the same number of observations existed in both datasets before the unmatched merge, then _merge will always equal 3, but otherwise some rows will either be from only the original dataset (_merge==1) or only the using dataset (_merge==2). You could avoid extra observations being added from the using dataset (i.e., observations that would have _merge==2) by using the nokeep option.

    tabulate _merge
Always check the values of _merge after merging two datasets, to avoid errors.

X2b. Matched One-to-One Merge

In a matched one-to-one merge, such as example (i) above, each one line in the master dataset corresponds to one line in the using dataset. The only exceptions are that some lines in either dataset may have zero matching lines in the other dataset. As emphasized above, you need to have the same identifier variables in each dataset in order to know which observation in the master dataset should be matched with which observation in the using
dataset. Consider a match using data in which a variable named personid is a different value for each person, with one observation per person.

    sort personid
Matched merges require the data to be sorted first, in order by the matching variable. Even if the data are already sorted, Stata has to know that they are sorted, so you still have to use the sort command.

    merge personid using filename, unique
This merges a saved Stata dataset with the master dataset by adding extra variables at the right of the master dataset. The variables listed immediately after the merge command, in this case personid, must match in order for an observation in one dataset to be declared the same as an observation in the other dataset. The using dataset must have been sorted by personid before saving it, or if it were not, then you could add the sort option to this command (this would sort both the master dataset and the using dataset). The unique option tells Stata that in each dataset you have only one observation for each unique value of personid. If non-unique values are found, i.e., if two or more of the same person appear in either of the datasets, Stata will stop with an error message so that you can fix your apparent mistake (either something is wrong with your data or you do not really want a one-to-one merge). As with an unmatched merge, if the Stata file named filename has any variables with the same names as variables in the master dataset, these variables get ignored (although there are options to get information from these variables instead of ignoring them). A variable named _merge gets created that equals 1 if the observation was in the original dataset only, 2 if it was in the using dataset only, and 3 if it was in both datasets. You could avoid extra observations being added from the using dataset (i.e., observations that would have _merge==2) by using the nokeep option.

    tabulate _merge
Always check the values of _merge after merging two
datasets, to avoid errors.

If you are merging two panel datasets, each unique observation in the dataset will be identified by unique combinations of two or more variables, such as a personid-year pair. This works the same as the above merge, except that "personid year" is used instead of just "personid":

    sort personid year
    merge personid year using filename, unique
    tabulate _merge

X2c. Matched Many-to-One Merge

In a matched many-to-one merge, such as example (ii) above, multiple lines in the master dataset can correspond to one line in the using dataset. As emphasized above, you need to have the same identifier variables in each dataset in order to match observations in the two datasets. Consider a match using cross-sectional data in which a variable named personid is a unique value for each person, and each person can be observed multiple times visiting grocery stores, with a variable groceryid used to identify the groceries (a person could have many observations in the same grocery store). Alternatively, you might have panel data with people observed in different years, in which case year would take the place of groceryid.

Before reading this, you should understand the commands in section X2b. Here there are two differences. First, the match variable is just groceryid. Second, because this is not a one-to-one merge, the unique option to the merge command is not appropriate; in the master data, each unique value of groceryid may occur many times. Instead, the uniqusing option tells Stata that each different groceryid appears only once (at most) in the using dataset.

    sort groceryid
    merge groceryid using filename, uniqusing
    tabulate _merge

X2d. Matched One-to-Many Merge

In a matched one-to-many merge, such as example (iii) above, one line in the master dataset can correspond to multiple lines in the using dataset. The master dataset's match variable values are unique, in the sense that at most one observation has any given value (or combination of values) of the match variables. Consider a match starting
with master data having 50 observations corresponding to the 50 US states, with a variable named state being a unique value for each state. In the using data, each state is listed many times, with each observation describing a government agency in one of the states.

Before reading this, you should understand the commands in section X2b. Here there are two differences. First, the match variable is just state. Second, because this is not a one-to-one merge, the unique option to the merge command is not appropriate; in the using data, each unique value of state may occur many times. Instead, the uniqmaster option tells Stata that each different state appears only once (at most) in the master dataset.

    sort state
    merge state using filename, uniqmaster
    tabulate _merge

X2e. Matched Many-to-Many Merge

Many-to-many merges are also possible. For more information, get help on Stata's joinby command.

X3. Reshaping Data

Often, particularly with panel data, it is necessary to convert between "wide" and "long" forms of a dataset. Here is a trivially simple example.

Wide Form:

    personid | income2005 | income2006 | income2007 | birthyear
    1        | 32437      | 33822      | 41079      | 1967
    2        | 50061      | 23974      | 28553      | 1952

Long Form:

    personid | year | income | birthyear
    1        | 2005 | 32437  | 1967
    1        | 2006 | 33822  | 1967
    1        | 2007 | 41079  | 1967
    2        | 2005 | 50061  | 1952
    2        | 2006 | 23974  | 1952
    2        | 2007 | 28553  | 1952

This is a trivially simple example because usually you would have many variables, not just income, that transpose between wide and long form, plus you would have many variables, not just birthyear, that are specific to the personid and don't vary with the year.

Trivial or complex, all such cases can be converted from wide to long form or vice versa using Stata's reshape command.

    reshape long income, i(personid) j(year)
Starting from wide form, convert to long form.

    reshape wide income, i(personid) j(year)
Starting from long form, convert to wide form.

If you have more variables that, like income, need to transpose between wide and long form, and regardless of how many variables there are that don't vary with the year, just name the relevant variables after reshape long or reshape wide, e.g.:

    reshape long income married yrseduc, i(personid) j(year)
Starting from wide form, convert to long form.

    reshape wide income married yrseduc, i(personid) j(year)
Starting from long form, convert to wide form.

X4. Converting Between Strings and Numbers

Use the describe command to see which variables are strings versus numbers:

    describe

If you have string variables that contain numbers, an easy way to convert them to numbers is to use the destring command. The tostring command works in the reverse direction. For example, if you have string variables named year, month, and day, and the strings really contain numbers, you could convert them to numbers as follows:

    destring year month day, replace
Convert string variables named year, month, and day to numeric variables, assuming the strings really do contain numbers.

You could convert back again using tostring:

    tostring year month day, replace
Convert numeric variables named year, month, and day to string variables.

When you convert from a string variable to a numeric variable, you are likely to get an error message because not all of the strings are numbers. For example, if a string is "2,345,678" then Stata will not recognize it to be a number because of the commas. Similarly, values like "see note" or ">1000" cannot be converted to numbers. If this occurs, Stata will by default refuse to convert a string value into a number. This is good, because it points out that you need to look more closely to decide how to treat the data. If you want such non-numeric strings to be converted to missing values, instead of Stata stopping with an error message, then use the force option to the destring command:

    destring year month day, replace force
Convert string variables named year, month, and day to numeric variables. If any string values do not seem to be numbers, convert them to missing values.

Like most Stata commands, these commands have a lot of options. Get help on the Stata command destring, or consult the Stata manuals, for more information.

X5. Labels

What if you have string variables that contain something
other than numbers, like "male" versus "female", or people's names? It is sometimes useful to convert these values to categorical variables, with values 1, 2, 3, ..., instead of strings. At the same time, you would like to record which numbers correspond to which strings. The association between numbers and strings is achieved using what are called "value labels". Stata's encode command creates a labeled numeric variable from a string variable. Stata's decode command does the reverse. For example:

    encode personName, generate(personNameN)
    decode personNameN, generate(personNameS)

This example started with a string variable named personName, generated a new numeric variable named personNameN with corresponding labels, and then generated a new string variable personNameS that was once again a string variable just like the original. If you browse the data, personNameN will seem to be just like the string variable personName, because Stata will automatically show the labels that correspond to each name. However, the numeric version may take up a lot less memory.

If you want to create your own value labels for a variable, that's easy to do. For example, suppose a variable named female equals 1 for females or 0 for males. Then you might label it as follows:

    label define femaleLab 0 "male" 1 "female"
This defines a label named femaleLab.

    label values female femaleLab
This tells Stata that the values of the variable named female should be labeled using the label named femaleLab.

Once you have created a labeled numeric variable, it would be incorrect to compare the contents of the variable to a string:

    summarize if country=="Canada"
This causes an error if country is numeric.

However, Stata lets you look up the value corresponding to the label:

    summarize if country=="Canada":countryLabel
You can look up the values from a label this way. In this case, countryLabel is the name of a label, and "Canada":countryLabel is the number for which the label is "Canada" according to the label
definition named countryLabel. If you do not know the name of the label for a variable, use the describe command, and it will tell you the name of each variable's label (if it has a label). You can list all the values of a label with the command:

    label list labelname

This lists all values and their labels for the label named labelname.

Stata also lets you label a whole dataset, so that when you get information about the data, the label appears. It also lets you label a variable, so that when you would display the name of the variable, the label appears instead. For example:

    label data "physical characteristics of butterfly species"

This labels the data.

    label variable income "real income in 1996 Australian dollars"

This labels a variable.

X6. Notes

You may find it useful to add notes to your data. You record a note like this:

    note: This dataset is proprietary; theft will be prosecuted to the full extent of the law.

However, notes are not seen by users of the data unless the users make a point to read them. To see what notes there are, type:

    notes

Notes are a way to keep track of information about the dataset or work you still need to do. You can also add notes about specific variables:

    note income: Inflation-adjusted using Australian census data.

X7. More Useful Commands

For more useful commands, go to Stata's Help menu, choose "Contents", and click on "Data management".
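
The commands from sections X4 through X6 can be tried together on a tiny made-up dataset. The following is only a sketch for practice: the variable names yearS, country, and female, the generated variables, and all of the values are hypothetical and are not taken from any course data. It uses the generate() option of destring rather than replace, so the original string variable stays available for comparison.

```stata
* Toy data for illustration; every name and value here is hypothetical.
clear
input str10 yearS str10 country byte female
"2001"     "Canada" 1
"2,345"    "France" 0
"see note" "Canada" 0
end

* X4: without force, destring declines to convert yearS, because some
* values are not numbers. It reports this and creates no new variable.
destring yearS, generate(year1)

* With force, the non-numeric strings become missing values in year2.
destring yearS, generate(year2) force

* X5: encode builds a labeled numeric variable from a string variable.
* By default the value label takes the name of the new variable (countryN).
encode country, generate(countryN)

* decode goes back from the labeled numeric variable to a string variable.
decode countryN, generate(countryS)

* Define a value label by hand and attach it to the 0/1 variable female.
label define femaleLab 0 "male" 1 "female"
label values female femaleLab

* Look up the number behind a label instead of comparing to a string.
summarize female if countryN == "Canada":countryN

* List the contents of both value labels.
label list countryN femaleLab

* X6: attach one note to the dataset and one to a variable, then read them.
note: Toy dataset for illustration only.
note female: Equals 1 for females and 0 for males.
notes
```

After running this, browse the data or type describe to compare country, countryN, and countryS, and to see which value label is attached to each variable.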
