STATISTICAL METHODS IN PSYCHOLOGY I
This 11-page set of class notes was uploaded by Cecelia Erdman IV on Sunday, October 25, 2015. The notes belong to PSYC 830 at the University of North Carolina at Chapel Hill, taught by Staff in Fall.
Date Created: 10/25/15
An Introduction to R
Notes on R: A Programming Environment for Data Analysis and Graphics
Version 2.7.1 (2008-06-23)
W. N. Venables, D. M. Smith and the R Development Core Team

Copyright 1990 W. N. Venables
Copyright 1992 W. N. Venables & D. M. Smith
Copyright 1997 R. Gentleman & R. Ihaka
Copyright 1997, 1998 M. Maechler
Copyright 1999-2006 R Development Core Team

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Development Core Team.

ISBN 3-900051-12-7

Chapter 1 Introduction and preliminaries

There is an important difference in philosophy between S (and hence R) and the other main statistical systems. In S a statistical analysis is normally done as a series of steps, with intermediate results being stored in objects. Thus whereas SAS and SPSS will give copious output from a regression or discriminant analysis, R will give minimal output and store the results in a fit object for subsequent interrogation by further R functions.

1.4 R and the window system

The most convenient way to use R is at a graphics workstation running a windowing system. This guide is aimed at users who have this facility. In particular we will occasionally refer to the use of R on an X window system, although the vast bulk of what is said applies generally to any implementation of the R environment.

Most users will find it necessary to interact directly with the operating system on their computer from time to time. In this guide, we mainly discuss
interaction with the operating system on UNIX machines. If you are running R under Windows or MacOS you will need to make some small adjustments.

Setting up a workstation to take full advantage of the customizable features of R is a straightforward if somewhat tedious procedure, and will not be considered further here. Users in difficulty should seek local expert help.

1.5 Using R interactively

When you use the R program it issues a prompt when it expects input commands. The default prompt is '>', which on UNIX might be the same as the shell prompt, and so it may appear that nothing is happening. However, as we shall see, it is easy to change to a different R prompt if you wish. We will assume that the UNIX shell prompt is '$'.

In using R under UNIX the suggested procedure for the first occasion is as follows:

1. Create a separate sub-directory, say work, to hold data files on which you will use R for this problem. This will be the working directory whenever you use R for this particular problem.
    $ mkdir work
    $ cd work
2. Start the R program with the command
    $ R
3. At this point R commands may be issued (see later).
4. To quit the R program the command is
    > q()
   At this point you will be asked whether you want to save the data from your R session. On some systems this will bring up a dialog box, and on others you will receive a text prompt to which you can respond yes, no or cancel (a single letter abbreviation will do) to save the data before quitting, quit without saving, or return to the R session. Data which is saved will be available in future R sessions.

Further R sessions are simple.

1. Make work the working directory and start the program as before:
    $ cd work
    $ R
2. Use the R program, terminating with the q() command at the end of the session.

To use R under Windows the procedure to follow is basically the same. Create a folder as the working directory, and set that in the 'Start In' field in your R shortcut. Then launch R by double clicking on the icon.

    > rm(x, y, z, ink, junk, temp, foo, bar)

All objects created during an R session can be stored permanently in a file for use in future R sessions. At the end of each R session you are given the opportunity to save all the currently available objects. If you indicate that you want to do this, the objects are written to a file called .RData[5] in the current directory, and the command lines used in the session are saved to a file called .Rhistory.

When R is started at a later time from the same directory it reloads the workspace from this file. At the same time the associated command history is reloaded.

It is recommended that you should use separate working directories for analyses conducted with R. It is quite common for objects with names x and y to be created during an analysis. Names like this are often meaningful in the context of a single analysis, but it can be quite hard to decide what they might be when several analyses have been conducted in the same directory.

[5] The leading "dot" in this file name makes it invisible in normal file listings in UNIX.

Chapter 2 Simple manipulations; numbers and vectors

and so on, all have their usual meaning. max and min select the largest and smallest elements of a vector respectively. range is a function whose value is a vector of length two, namely c(min(x), max(x)). length(x) is the number of elements in x, sum(x) gives the total of the elements in x, and prod(x) their product.

Two statistical functions are mean(x), which calculates the sample mean, which is the same as sum(x)/length(x), and var(x), which gives

    sum((x-mean(x))^2)/(length(x)-1)

or sample variance. If the argument to var() is an n-by-p matrix the value is a p-by-p sample covariance matrix got by regarding the rows as independent p-variate sample vectors.

sort(x) returns a vector of the same size as x with the elements arranged in increasing order; however there are other more flexible sorting facilities available (see order() or sort.list(), which produce a permutation to do the sorting).

Note that max and min select the largest and smallest values in their arguments,
even if they are given several vectors. The parallel maximum and minimum functions pmax and pmin return a vector (of length equal to their longest argument) that contains in each element the largest (smallest) element in that position in any of the input vectors.

For most purposes the user will not be concerned if the "numbers" in a numeric vector are integers, reals or even complex. Internally calculations are done as double precision real numbers, or double precision complex numbers if the input data are complex.

To work with complex numbers, supply an explicit complex part. Thus

    sqrt(-17)

will give NaN and a warning, but

    sqrt(-17+0i)

will do the computations as complex numbers.

2.3 Generating regular sequences

R has a number of facilities for generating commonly used sequences of numbers. For example 1:30 is the vector c(1, 2, ..., 29, 30). The colon operator has high priority within an expression, so, for example, 2*1:15 is the vector c(2, 4, ..., 28, 30). Put n <- 10 and compare the sequences 1:n-1 and 1:(n-1).

The construction 30:1 may be used to generate a sequence backwards.

The function seq() is a more general facility for generating sequences. It has five arguments, only some of which may be specified in any one call. The first two arguments, if given, specify the beginning and end of the sequence, and if these are the only two arguments given the result is the same as the colon operator. That is, seq(2,10) is the same vector as 2:10.

Parameters to seq(), and to many other R functions, can also be given in named form, in which case the order in which they appear is irrelevant. The first two parameters may be named from=value and to=value; thus seq(1,30), seq(from=1, to=30) and seq(to=30, from=1) are all the same as 1:30. The next two parameters to seq() may be named by=value and length=value, which specify a step size and a length for the sequence respectively. If neither of these is given, the default by=1 is assumed.

For example

    > seq(-5, 5, by=.2) -> s3

generates in s3 the vector c(-5.0, -4.8, -4.6, ..., 4.6, 4.8, 5.0). Similarly

    > s4 <- seq(length=51, from=-5, by=.2)

generates the same
vector in s4.

The fifth parameter may be named along=vector, which if used must be the only parameter, and creates a sequence 1, 2, ..., length(vector), or the empty sequence if the vector is empty (as it can be).

A related function is rep(), which can be used for replicating an object in various complicated ways. The simplest form is

    > s5 <- rep(x, times=5)

which will put five copies of x end-to-end in s5. Another useful version is

    > s6 <- rep(x, each=5)

which repeats each element of x five times before moving on to the next.

2.4 Logical vectors

As well as numerical vectors, R allows manipulation of logical quantities. The elements of a logical vector can have the values TRUE, FALSE, and NA (for "not available", see below). The first two are often abbreviated as T and F, respectively. Note however that T and F are just variables which are set to TRUE and FALSE by default, but are not reserved words and hence can be overwritten by the user. Hence, you should always use TRUE and FALSE.

Logical vectors are generated by conditions. For example

    > temp <- x > 13

sets temp as a vector of the same length as x with values FALSE corresponding to elements of x where the condition is not met and TRUE where it is.

The logical operators are <, <=, >, >=, == for exact equality and != for inequality. In addition, if c1 and c2 are logical expressions, then c1 & c2 is their intersection ("and"), c1 | c2 is their union ("or"), and !c1 is the negation of c1.

Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric vectors, FALSE becoming 0 and TRUE becoming 1. However there are situations where logical vectors and their coerced numeric counterparts are not equivalent; for an example, see the next subsection.

2.5 Missing values

In some cases the components of a vector may not be completely known. When an element or value is "not available" or a "missing value" in the statistical sense, a place within a vector may be reserved for it by assigning it the special value NA. In general, any
operation on an NA becomes an NA. The motivation for this rule is simply that if the specification of an operation is incomplete, the result cannot be known and hence is not available.

The function is.na(x) gives a logical vector of the same size as x with value TRUE if and only if the corresponding element in x is NA.

    > z <- c(1:3,NA);  ind <- is.na(z)

Notice that the logical expression x == NA is quite different from is.na(x), since NA is not really a value but a marker for a quantity that is not available. Thus x == NA is a vector of the same length as x all of whose values are NA, as the logical expression itself is incomplete and hence undecidable.

Note that there is a second kind of "missing" values which are produced by numerical computation, the so-called Not a Number, NaN, values. Examples are

    > 0/0
    > Inf - Inf

which both give NaN since the result cannot be defined sensibly.

In summary, is.na(xx) is TRUE both for NA and NaN values. To differentiate these, is.nan(xx) is only TRUE for NaNs.

Missing values are sometimes printed as <NA> when character vectors are printed without quotes.

2.6 Character vectors

Character quantities and character vectors are used frequently in R, for example as plot labels. Where needed they are denoted by a sequence of characters delimited by the double quote character, e.g., "x-values", "New iteration results".

Character strings are entered using either matching double (") or single (') quotes, but are printed using double quotes (or sometimes without quotes). They use C-style escape sequences, using \ as the escape character, so \\ is entered and printed as \\, and inside double quotes " is entered as \". Other useful escape sequences are \n, newline, \t, tab and \b, backspace; see ?Quotes for a full list.

Character vectors may be concatenated into a vector by the c() function; examples of their use will emerge frequently.

The paste() function takes an arbitrary number of arguments and concatenates them one by one into character strings. Any
numbers given among the arguments are coerced into character strings in the evident way, that is, in the same way they would be if they were printed. The arguments are by default separated in the result by a single blank character, but this can be changed by the named parameter sep=string, which changes it to string, possibly empty.

For example

    > labs <- paste(c("X","Y"), 1:10, sep="")

makes labs into the character vector

    c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")

Note particularly that recycling of short lists takes place here too; thus c("X", "Y") is repeated 5 times to match the sequence 1:10.[3]

[3] paste(..., collapse=ss) joins the arguments into a single character string, putting ss in between. There are more tools for character manipulation; see the help for sub and substring.

2.7 Index vectors; selecting and modifying subsets of a data set

Subsets of the elements of a vector may be selected by appending to the name of the vector an index vector in square brackets. More generally any expression that evaluates to a vector may have subsets of its elements similarly selected by appending an index vector in square brackets immediately after the expression.

Such index vectors can be any of four distinct types.

1. A logical vector. In this case the index vector must be of the same length as the vector from which elements are to be selected. Values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted. For example

    > y <- x[!is.na(x)]

   creates (or re-creates) an object y which will contain the non-missing values of x, in the same order. Note that if x has missing values, y will be shorter than x. Also

    > (x+1)[(!is.na(x)) & x>0] -> z

   creates an object z and places in it the values of the vector x+1 for which the corresponding value in x was both non-missing and positive.

Chapter 5 Arrays and matrices

5.3 Index matrices

As well as an index vector in any subscript position, a matrix may be
used with a single index matrix in order either to assign a vector of quantities to an irregular collection of elements in the array, or to extract an irregular collection as a vector.

A matrix example makes the process clear. In the case of a doubly indexed array, an index matrix may be given consisting of two columns and as many rows as desired. The entries in the index matrix are the row and column indices for the doubly indexed array. Suppose for example we have a 4 by 5 array X and we wish to do the following:

- Extract elements X[1,3], X[2,2] and X[3,1] as a vector structure, and
- Replace these entries in the array X by zeroes.

In this case we need a 3 by 2 subscript array, as in the following example.

    > x <- array(1:20, dim=c(4,5))   # Generate a 4 by 5 array.
    > x
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    5    9   13   17
    [2,]    2    6   10   14   18
    [3,]    3    7   11   15   19
    [4,]    4    8   12   16   20
    > i <- array(c(1:3,3:1), dim=c(3,2))
    > i                              # i is a 3 by 2 index array.
         [,1] [,2]
    [1,]    1    3
    [2,]    2    2
    [3,]    3    1
    > x[i]                           # Extract those elements
    [1] 9 6 3
    > x[i] <- 0                      # Replace those elements by zeros.
    > x
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    5    0   13   17
    [2,]    2    0   10   14   18
    [3,]    0    7   11   15   19
    [4,]    4    8   12   16   20

Negative indices are not allowed in index matrices. NA and zero values are allowed: rows in the index matrix containing a zero are ignored, and rows containing an NA produce an NA in the result.

As a less trivial example, suppose we wish to generate an (unreduced) design matrix for a block design defined by factors blocks (b levels) and varieties (v levels). Further suppose there are n plots in the experiment. We could proceed as follows:

    > Xb <- matrix(0, n, b)
    > Xv <- matrix(0, n, v)
    > ib <- cbind(1:n, blocks)
    > iv <- cbind(1:n, varieties)
    > Xb[ib] <- 1
    > Xv[iv] <- 1
    > X <- cbind(Xb, Xv)

To construct the incidence matrix, N say, we could use

    > N <- crossprod(Xb, Xv)

5.7 Matrix facilities

As noted above, a matrix is just an array with two subscripts. However it is such an important special case it needs a separate discussion. R contains many operators and functions that are available only for matrices. For example t(X) is the matrix
transpose function, as noted above. The functions nrow(A) and ncol(A) give the number of rows and columns in the matrix A respectively.

5.7.1 Matrix multiplication

The operator %*% is used for matrix multiplication. An n by 1 or 1 by n matrix may of course be used as an n-vector if in the context such is appropriate. Conversely, vectors which occur in matrix multiplication expressions are automatically promoted either to row or column vectors, whichever is multiplicatively coherent, if possible (although this is not always unambiguously possible, as we see later).

If, for example, A and B are square matrices of the same size, then

    > A * B

is the matrix of element by element products and

    > A %*% B

is the matrix product. If x is a vector, then

    > x %*% A %*% x

is a quadratic form.[1]

The function crossprod() forms "crossproducts", meaning that crossprod(X, y) is the same as t(X) %*% y but the operation is more efficient. If the second argument to crossprod() is omitted it is taken to be the same as the first.

The meaning of diag() depends on its argument. diag(v), where v is a vector, gives a diagonal matrix with elements of the vector as the diagonal entries. On the other hand diag(M), where M is a matrix, gives the vector of main diagonal entries of M. This is the same convention as that used for diag() in MATLAB. Also, somewhat confusingly, if k is a single numeric value then diag(k) is the k by k identity matrix.

5.7.2 Linear equations and inversion

Solving linear equations is the inverse of matrix multiplication. When after

    > b <- A %*% x

only A and b are given, the vector x is the solution of that linear equation system. In R,

    > solve(A,b)

solves the system, returning x (up to some accuracy loss). Note that in linear algebra, formally x = A^{-1} b, where A^{-1} denotes the inverse of A, which can be computed by solve(A) but rarely is needed. Numerically, it is both inefficient and potentially unstable to compute x <- solve(A) %*% b instead of solve(A,b).

The quadratic form x' A^{-1} x, which is used in multivariate computations, should be computed by something like[2]
    x %*% solve(A,x)

rather than computing the inverse of A.

[1] Note that x %*% x is ambiguous, as it could mean either x'x or xx', where x is the column form. In such cases the smaller matrix seems implicitly to be the interpretation adopted, so the scalar x'x is in this case the result. The matrix xx' may be calculated either by cbind(x) %*% x or x %*% rbind(x), since the result of rbind() or cbind() is always a matrix. However, the best way to compute x'x or xx' is crossprod(x) or x %o% x respectively.

[2] Even better would be to form a matrix square root B with A = BB' and find the squared length of the solution of By = x, perhaps using the Cholesky or eigendecomposition of A.

5.10 Frequency tables from factors

Recall that a factor defines a partition into groups. Similarly a pair of factors defines a two-way cross classification, and so on. The function table() allows frequency tables to be calculated from equal length factors. If there are k factor arguments, the result is a k-way array of frequencies.

Suppose, for example, that statef is a factor giving the state code for each entry in a data vector. The assignment

    > statefr <- table(statef)

gives in statefr a table of frequencies of each state in the sample. The frequencies are ordered and labelled by the levels attribute of the factor. This simple case is equivalent to, but more convenient than,

    > statefr <- tapply(statef, statef, length)

Further suppose that incomef is a factor giving a suitably defined "income class" for each entry in the data vector, for example with the cut() function:

    > factor(cut(incomes, breaks = 35+10*(0:7))) -> incomef

Then to calculate a two-way table of frequencies:

    > table(incomef,statef)
             statef
    incomef   act nsw nt qld sa tas vic wa
      (35,45]   1   1  0   1  0   0   1  0
      (45,55]   1   1  1   1  2   0   1  3
      (55,65]   0   3  1   3  2   2   2  1
      (65,75]   0   1  0   0  0   0   1  0

Extension to higher-way frequency tables is immediate.
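The advice above on quadratic forms (use solve(A, x) rather than an explicit inverse, or better still a matrix square root of A) can be checked with a small sketch. The matrix A and vector x below are made-up illustration values, chosen so that A is symmetric positive definite, which chol() requires. chol() returns an upper-triangular R with A = R'R, so t(R) serves as the square root B with A = BB'; forwardsolve() then solves By = x by forward substitution, and the quadratic form is the squared length of y.

```r
# Illustration data (made up): a symmetric positive-definite 3 by 3 matrix.
A <- matrix(c(4, 2, 0,
              2, 3, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)
x <- c(1, 2, 3)

# 1. Explicit inverse: correct, but the text advises against forming solve(A).
q_inverse <- drop(t(x) %*% solve(A) %*% x)

# 2. Solve the system A y = x directly, then take the inner product with x.
q_solve <- drop(x %*% solve(A, x))

# 3. Matrix square root via Cholesky: A = t(R) %*% R with R upper triangular,
#    so B = t(R) is lower triangular and A = B %*% t(B).
R <- chol(A)
y <- forwardsolve(t(R), x)   # solves B y = x
q_chol <- sum(y^2)           # squared length of y equals x' A^{-1} x

# All three agree up to rounding (each is 4.75 for this A and x).
print(c(inverse = q_inverse, solve = q_solve, cholesky = q_chol))
```

On a 3 by 3 matrix the three routes are indistinguishable; the recommendation in the text concerns larger or ill-conditioned matrices, where forming the inverse costs more work and loses more accuracy.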
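The frequency-table recipe above can be tried end to end with a small sketch. The statef and incomes values are hypothetical (ten made-up records, not the data behind the printed table), so the counts differ from those shown; table(), tapply() and cut() are called as in the text.

```r
# Hypothetical records: a state code and an income for each of ten people.
statef  <- factor(c("nsw", "vic", "nsw", "qld", "vic",
                    "nsw", "wa",  "qld", "nsw", "vic"))
incomes <- c(42, 51, 60, 38, 47, 66, 55, 49, 71, 58)

# One-way table of counts, ordered and labelled by levels(statef).
statefr <- table(statef)

# The equivalent but less convenient tapply() form from the text.
statefr2 <- tapply(statef, statef, length)

# Income classes (35,45], (45,55], ..., (95,105]; cut() returns a factor
# with all seven classes, and factor() drops the classes nobody falls into.
incomef <- factor(cut(incomes, breaks = 35 + 10 * (0:7)))

# Two-way table: rows are income classes, columns are states.
tab2 <- table(incomef, statef)
print(statefr)
print(tab2)
```

Passing a third equal-length factor to table() would give a three-way array of counts in the same way.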