COMPUTING IN STATISTICS STAT 517
These 10-page class notes were uploaded by Shane Marks on Monday, October 26, 2015. They belong to STAT 517 at the University of South Carolina - Columbia, taught by D. Hitchcock in the fall.
Date Created: 10/26/15
STAT 517, Delwiche/Slaughter Chapter 2, Hitchcock/Grego
University of South Carolina

Chapter 2: Getting Data Into SAS

- Data are stored in many different forms and formats.
- There are four categories of methods to read in data:
  1. Entering data directly through the keyboard (small data sets)
  2. Creating SAS data sets from raw data files
  3. Converting other software's data files (e.g., Excel) into SAS data sets (my favorite)
  4. Reading other software's data files directly (often needs additional SAS/ACCESS products)

Import Window

- Allows you to import various types of data files (e.g., Microsoft Excel formats).
- By default the first row is read as variable names. Change this using the Options button.
- The Options button also selects the worksheet from a workbook.
- Work library: the data set is deleted after exiting SAS.
- Other libraries: the data set is saved after exiting SAS (but not necessarily the library location).
- You can save the PROC IMPORT statements used to import the data.

Reading in Raw Data

- If you type data directly into a SAS program, this is indicated with a statement like CARDS;, DATALINES;, or LINES;.
- If your raw data are in an external file, use an INFILE statement to tell SAS where it is.
- Specify the full path name.
- If your lines are longer than 256 characters, use the LRECL= option.

Data Separated by Spaces

- This style is called free format, since the number of spaces between variables is flexible.
- Use the INPUT statement to name variables.
- Include a $ after the names of character variables.

Data Arranged in Columns

- Knowledge of this approach is less important nowadays.
- Important applications still exist.
- Each value of a variable is found at the same spot on the data line.
- Advantages:
  1. You don't need a space between values.
  2. Missing values don't need a special symbol (they can be blank).
  3. Character data can have blanks.
  4. You can skip variables you don't need to read into SAS.

Example:

    INPUT var1 1-10 var2 11-15 var3 16-30;

Data Not in Standard Format

- Types of nonstandard data:
  1. Numbers with commas or dollar signs
  2. Dates and times of day
- We can read nonstandard data using codes known as informats.
- Most informats end in a period (.), so SAS won't confuse them with a variable name.
- Import from Excel often assigns informats automatically.
- pp. 44-45 list many SAS informats. Note that values read with date informats are converted to a numerical value: a SAS date value, the number of days since January 1, 1960 (loosely called a Julian date).

Other Inputting Issues

- You can mix input styles: read in some variables list-style, others column-style, others using informats. Even the order can be shuffled.
- E.g., you can explicitly move SAS to a specific column number. Example: @50 moves SAS to the 50th column.

Messy Data

- Colon modifier (:): tells SAS how many columns long a variable's field can be, but reading stops when it reaches a space.
- Example: Deptname : $15. tells SAS to read Deptname for 15 characters or until it reaches a space.
- This method is not appropriate for character data with embedded spaces.

Multiple Lines of Data per Observation

- Sometimes each observation will be on several lines in the raw data file (census data, standardized test scores, etc.).
- Use / to tell SAS when to go to the next line.
- Or use #2, for example, to tell SAS to go to the 2nd line of the observation.

Multiple Observations per Line of Raw Data

- Sometimes several observations will be on one line of data.
- This is common for textbook exercises.
- Use @@ to tell SAS to stay on the raw data line and wait for the next observation.

Reading Part of a Data File

- Sometimes we want to modify data input based on the values of one variable.
- We can read just the first variables using the @ sign (a trailing @ holds the line for a later INPUT statement).

Reading Delimited Files

- These instructions have been completely subsumed by Excel imports.
- DLM= allows you to have something other than spaces separating data values.
- Comma delimiters: DLM=','
- Tab delimiters: DLM='09'X
- Other delimiters: put the delimiter character in quotes after DLM=.
- This assumes two delimiters in a row are the same as a single delimiter.
- What if two commas in a row indicate a missing value?
- What if some data values contain commas?
- You can use the DSD option.
- Note: data values with commas in them must be in quotes.
- The default with DSD is comma delimiters, but you can specify other delimiters with the DLM= option.

SAS Data Sets: Temporary and Permanent

- Data sets stored in the Work library are temporary (removed upon exiting SAS).
- Data sets stored in other libraries are permanent (saved upon exiting SAS).
- You can specify the library when creating a data set in the DATA step.

Example: Suppose you have a library called sportlib (this is a libref).

    DATA sportlib.baseball;   creates a data set baseball stored in the sportlib library (permanent)
    DATA work.baseball;       would store baseball in the Work library (temporary)
    DATA baseball;            by default stores baseball in the Work library (temporary)
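As a recap, the input styles covered in these notes can be sketched in one short SAS program. This is only an illustrative sketch, not the course's own code: every data set name, variable name, and data value below is made up, and the LIBNAME path is a hypothetical folder that must be changed to one that exists on your machine.

```sas
/* List (free-format) input: values separated by one or more spaces. */
/* The $ marks Name as a character variable.                         */
DATA players;
   INPUT Name $ Age Height;
   DATALINES;
Ann 21 64.5
Bob 19 70
;
RUN;

/* Column input: each variable sits in fixed columns, so character   */
/* values may contain blanks (made-up layout: 1-10, 11-15, 16-20).   */
DATA roster;
   INPUT Name $ 1-10 Team $ 11-15 Score 16-20;
   DATALINES;
Ann Smith Red     12
Bob Jones Blue     7
;
RUN;

/* Informats with the colon modifier: read up to the stated width,   */
/* but stop at the next space. MMDDYY10. stores a SAS date value.    */
DATA sales;
   INPUT Dept : $15. SaleDate : MMDDYY10. Amount : COMMA9.;
   FORMAT SaleDate DATE9.;
   DATALINES;
Toys 10/26/2015 1,250.50
Garden 11/02/2015 980
;
RUN;

/* Double trailing @@: several observations on one raw data line.    */
DATA scores;
   INPUT Id Score @@;
   DATALINES;
1 88 2 92 3 75
4 60 5 81
;
RUN;

/* Single trailing @: read the first variable, then decide whether   */
/* to read the rest of the line.                                     */
DATA redteam;
   INPUT Team $ @;
   IF Team NE 'Red' THEN DELETE;
   INPUT Score;
   DATALINES;
Red 12
Blue 7
Red 9
;
RUN;

/* DSD: comma-delimited values, quoted values may contain commas,    */
/* and two commas in a row mean a missing value.                     */
DATA people;
   INFILE DATALINES DSD;
   INPUT Name : $20. Age Height;
   DATALINES;
"Smith, Ann",21,64.5
Bob,,70
;
RUN;

/* Permanent vs. temporary: sportlib is a libref pointing at a       */
/* hypothetical folder; Work data sets disappear when SAS exits.     */
LIBNAME sportlib 'C:\MySASLib';

DATA sportlib.baseball;   /* permanent copy */
   SET players;
RUN;

DATA baseball;            /* same as work.baseball, temporary */
   SET players;
RUN;
```

Each step can be inspected with PROC PRINT, for example `PROC PRINT DATA=people; RUN;`. For an external raw data file, the in-stream DATALINES would be replaced by an INFILE statement naming the file's full path.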