COMPUTING IN STATISTICS
COMPUTING IN STATISTICS STAT 517
These 21 pages of class notes were uploaded by Shane Marks on Monday, October 26, 2015. The notes belong to STAT 517 at the University of South Carolina - Columbia, taught by D. Hitchcock in the fall.
Date Created: 10/26/15
STAT 517 | Delwiche/Slaughter Chapter 6 | Hitchcock
University of South Carolina


Chapter 5: Enhancing Your Output with ODS

o ODS (Output Delivery System) can simply be a way to save high-quality output to different file types:
    - RTF files
    - PS (PostScript) files
    - PDF files
    - HTML files (be sure to look at the text's example)
o ODS uses style templates to enhance the default formats (for those who want to do EVERYTHING in SAS).
o It is also an effective means to control output printed to the usual OUTPUT window (LISTING files) or written to SAS data sets.


Why ODS?

o We have already learned a couple of menu-driven applications of ODS:
    - Selecting "Create HTML" to save output in HTML format in Exercise 6
    - Right-clicking on a graphics window and saving it as a PDF file
o The second example is analogous to saving graphics in R.
o While menu-driven options are convenient, they have shortcomings when:
    - Generating many graphics in a large program
    - Generating graphics in a DO loop (remember the Monitoring Well data in R)
    - Generating graphics in a MACRO
o We need in-line commands to save output files as well.


Creating an output file

o The basic structure of an ODS command can be quite simple:

    ODS PDF FILE = 'z:\stat 517\filename.pdf';
    SAS graphics commands
    ODS PDF CLOSE;

o This basic syntax is commonly used for PostScript, HTML, and RTF files as well.
o Note the similarity to the use of the pdf() and postscript() functions in R.
o In addition to saving filename.pdf, an attractive PDF window will open in SAS.
o We can include multiple PROCs inside the ODS commands and generate multiple graphs.
o An additional precaution: turn off the default output device and resume it when finished.

    ODS LISTING CLOSE;
    ODS PDF FILE = 'z:\stat 517\filename.pdf';
    SAS graphics commands
    ODS PDF CLOSE;
    ODS LISTING;

Example: Broad River stage data


Selecting Output

o SAS PROCs can produce a lot of separate pieces of output, some more useful than others.
o We can use ODS SELECT to choose which pieces to output.
o ODS TRACE identifies the output names for ODS SELECT.
o The trace is printed in the LOG window, and it is quite cryptic.
o Once you get used to the naming conventions, though, it isn't so bad.

    ODS TRACE ON;
    SAS PROC statements
    RUN;
    ODS TRACE OFF;


Saving Output

o ODS OUTPUT can save multiple output files.
o Remember that these output data sets can look odd ...
o ... but judicious use of IF can take care of that.
o ODS OUTPUT can save output pieces identified by name, label, or path.

    SAS PROC statements
    ODS OUTPUT name=youroutname path=youroutpath;
    RUN;


Chapter 6: Modifying and Combining Data Sets

o The SET statement is a powerful statement in the DATA step.
o Its main use is to read in a previously created SAS data set (either in WORK or another library), which can then be modified and saved as a new data set.

    DATA newdatasetname;
    SET olddatasetname;
    RUN;

o We could stack (in SAS terminology, concatenate) multiple data sets by listing several data sets in the SET statement.
o If one data set contains variables not included in the other data sets, the observations from the other sets will have missing values for those variables in the combined data set.
o If the input data sets are sorted by a specific variable, stacking them may not preserve the sorting.
o To preserve the sorting, we can interleave the data sets with a BY statement, but we must sort all data sets first.


Merging Data Sets

o When observations in two or more data sets are connected by having at least one common variable, it is possible to merge the data sets together.

Example:

    DATA combineddataname;
    MERGE dataset1 ... datasetk;
    BY commonvariable;

o Note: If the data sets have an identically named variable (other than the BY variable), then the merged data will contain only the values from the last data set.
o All data sets need to be sorted by the BY variable before they can be merged.
o We can also merge each observation in a smaller data set with several observations from a larger data set (one-to-many match-merge).
o MERGE without a BY statement simply places the input data sets side by side (compare to the SET command).


Merging Summary Statistics and Data

o Often we want to merge summary statistics (either statistics for the entire data set or, often, for groups within the data set) with the observations themselves.
o First, calculate summary statistics using PROC MEANS (after sorting, if necessary).
o Output the summary statistics to another data set with an OUTPUT statement.
o Give the statistics meaningful names in this output data set.
o Use a MERGE statement to combine the original data with the OUTPUT data from PROC MEANS.
o Once the summary statistics are merged with the original data, we can calculate:
    1. centered data observations
    2. standardized data observations
    3. data expressed as a percentage of group sums
o This is done by transforming the data through functions involving the summary statistics.
o This can be tricky in R too: I have to use match() twice to make this work.


Merging the Grand Total with the Original Data

o When PROC MEANS is used without a BY statement, you get the grand total, the grand mean, etc., rather than groupwise statistics.
o Merging is more difficult because the original data and the summary data do not have a common variable. We need to trick SAS with the SET statement:

    DATA newdataset;
    IF _N_ = 1 THEN SET summarydataset;
    SET olddataset;

o Variables read from the summary data set with the first SET statement are retained with all observations.
o This is a general trick for merging one or a few observations with many where no common variable exists.
o The UPDATE statement is similar to MERGE, but it is typically used when a data set changes over time (new variables are added, values of variables are changed for old observations, etc.). See pp. 184-185.


Data Set Options

We have already seen many of these options incorporated into our earlier examples.

o System options (specified in an OPTIONS statement) affect SAS operation, often formatting.
o Statement options affect the running of a step.
o NOPRINT is often used when you use a PROC to create an output data set.

Examples:
    - NOPRINT option in PROC MEANS
    - NOWINDOWS option in PROC REPORT
    - DATA= option in any procedure

o Data set options affect the reading and writing of a data set.
o We can use data set options in DATA steps (with statements like DATA, SET, MERGE, UPDATE) or in PROC steps (with the DATA= option):
    - KEEP= specifies variables to keep in the data set
    - DROP= specifies variables to drop from the data set
    - RENAME=(oldname=newname) renames certain variables
    - FIRSTOBS= tells SAS where to start reading data
    - OBS= tells SAS where to stop reading data
o The IN= option is typically used to track which data set an observation in a combined data set came from.
o Variables created by the IN= option exist only during that DATA step, but they can be used to create other variables.


Creating several data sets with the OUTPUT statement

o A single DATA step can create several SAS data sets (this is a trick I don't use nearly enough).
o The DATA line must give multiple data set names:

    DATA set1 set2 set3;

o The OUTPUT statement is often used with IF-THEN statements or within a DO loop.

Example:

    IF condition THEN OUTPUT set1;
    ELSE OUTPUT set2;

o The OUTPUT statement can also be used to create several observations from one.
o It transforms wide data sets into long data sets.
o It is often used with repeated-measures data (several values observed for each individual).
o OUTPUT is also useful for generating function values.
o Used in a DO loop, OUTPUT tells SAS to create an observation at each iteration of the loop.


Using PROC TRANSPOSE to Flip Observations and Variables

o PROC TRANSPOSE converts variables (data set columns) into observations (data set rows), or observations into variables.
o It takes a little work to get everything relabelled correctly after transposing.

    PROC TRANSPOSE DATA= OUT= ;   <- OUT= names the new transposed data set
    BY ;    <- identifies variables you don't want transposed
    ID ;    <- values of this variable will become variable names
    VAR ;   <- the values of these variables will be transposed
               (placed as rows for each level of the BY variable)

o We must first sort by the BY variable.
o Note: If the ID statement is missing, the newly created variables will have default names (COL1, COL2, etc.).
o PROC TRANSPOSE is handy for converting wide data files into long data files, or vice versa, especially with longitudinal data.


Automatic Variables in SAS

o During the DATA step, SAS creates temporary automatic variables. These are not typically saved as part of the data set, but they can be used in the DATA step.
o _N_ keeps track of the number of times SAS has looped through the DATA step (i.e., the number of observations that have been read). It may differ from the observation number if the data have been subsetted.
o The automatic variable _ERROR_ is binary: 1 if the observation has an error, 0 if no error.
o FIRST.groupvariable -> 1 for the first observation with a new value of groupvariable, 0 otherwise.
o LAST.groupvariable -> 1 for the last observation with a given value of groupvariable, 0 otherwise.
o They can be useful for picking out the highest or lowest values for each level of groupvariable: sort by groupvariable (and, within each group, by the variable of interest), then use a subsetting IF along with a BY statement to save only the first or last occurrence for each level of groupvariable.
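The FIRST./LAST. approach described above can be sketched as follows. This is a minimal illustration, not an example from the course: the data set names (scores, highest), the variable names (group, score), and the data values are all hypothetical.

```sas
/* Hypothetical data: one score per row, two groups. */
DATA scores;
   INPUT group $ score;
   DATALINES;
A 71
A 88
B 65
B 93
B 79
;
RUN;

/* Sort by group, and by score within each group,   */
/* so the highest score is the last row of a group. */
PROC SORT DATA=scores;
   BY group score;
RUN;

/* The BY statement makes FIRST.group and LAST.group available. */
/* The subsetting IF keeps only the last row of each group.     */
DATA highest;
   SET scores;
   BY group;
   IF LAST.group;
RUN;

PROC PRINT DATA=highest;
RUN;
```

With these made-up values, the resulting data set should hold one observation per group: score 88 for group A and 93 for group B. Replacing IF LAST.group with IF FIRST.group would keep the lowest score in each group instead.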