# Applied Multivariate Methods STAT 579

This 15-page set of class notes was uploaded by Kamren McLaughlin on Monday, October 26, 2015. The notes belong to STAT 579 at the University of Tennessee, Knoxville, taught by Staff in Fall.

Date Created: 10/26/15

## Univariate and Multivariate Summary Statistics

### A Review of Univariate Summary Statistics

**Random sample.** Let $X_1, X_2, \ldots, X_N$ be a random sample from some distribution.

**Random sample expressed as a vector.**

$$\mathbf{x} = (X_1, X_2, \ldots, X_N)'$$

**Sample mean.** With $\mathbf{1}_N$ the $N \times 1$ vector of ones,

$$\bar{x} = \frac{1}{N}\sum_{r=1}^{N} X_r = \frac{1}{N}\mathbf{1}_N'\mathbf{x}$$

**Sample variance.**

$$s^2 = \frac{1}{N-1}\sum_{r=1}^{N} (X_r - \bar{x})^2$$

**Sample variance in matrix notation.** The deviation of each observation from the sample mean is

$$\mathbf{x} - \bar{x}\mathbf{1}_N = \mathbf{x} - \frac{1}{N}\mathbf{1}_N\mathbf{1}_N'\mathbf{x}.$$

Alternatively, writing $\mathbf{J}_N = \mathbf{1}_N\mathbf{1}_N'$ for the $N \times N$ matrix of ones, the deviations can be expressed as

$$\mathbf{x} - \bar{x}\mathbf{1}_N = \left(\mathbf{I}_N - \frac{1}{N}\mathbf{J}_N\right)\mathbf{x}.$$

The sum of the squared deviations of each observation from the sample mean is

$$\sum_{r=1}^{N}(X_r - \bar{x})^2 = \mathbf{x}'\left(\mathbf{I}_N - \frac{1}{N}\mathbf{J}_N\right)'\left(\mathbf{I}_N - \frac{1}{N}\mathbf{J}_N\right)\mathbf{x} = \mathbf{x}'\left(\mathbf{I}_N - \frac{1}{N}\mathbf{J}_N\right)\mathbf{x},$$

where the last step uses the fact that $\mathbf{I}_N - \frac{1}{N}\mathbf{J}_N$ is symmetric and idempotent (because $\mathbf{J}_N\mathbf{J}_N = N\mathbf{J}_N$). Hence

$$s^2 = \frac{1}{N-1}\,\mathbf{x}'\left(\mathbf{I}_N - \frac{1}{N}\mathbf{J}_N\right)\mathbf{x}.$$

### Multivariate Observations

- $p$ = number of response variables
- $N$ = number of experimental units
- $x_{rj}$ = value of the $j$th response variable on the $r$th experimental unit

**Data matrix.**

$$\mathbf{X} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Np} \end{pmatrix} = \begin{pmatrix} \mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_N' \end{pmatrix}$$

**A multivariate observation.**

$$\mathbf{x}_r = (x_{r1}, x_{r2}, \ldots, x_{rp})'$$

### Multivariate Summary Statistics

**Random sample.** Let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$ be a random sample from a multivariate distribution.

**Sample mean vector.**

$$\bar{\mathbf{x}} = \frac{1}{N}\sum_{r=1}^{N}\mathbf{x}_r = \frac{1}{N}\mathbf{X}'\mathbf{1}_N$$

**Sample variance-covariance matrix.**

$$\mathbf{S} = \frac{1}{N-1}\sum_{r=1}^{N}(\mathbf{x}_r - \bar{\mathbf{x}})(\mathbf{x}_r - \bar{\mathbf{x}})' = \frac{1}{N-1}\,\mathbf{X}'\left(\mathbf{I}_N - \frac{1}{N}\mathbf{J}_N\right)\mathbf{X}$$

**Sample correlation matrix.** With $\mathbf{D}_s = \operatorname{diag}(s_{11}, \ldots, s_{pp})$ holding the sample variances,

$$\mathbf{R} = \mathbf{D}_s^{-1/2}\,\mathbf{S}\,\mathbf{D}_s^{-1/2}, \qquad r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\,s_{kk}}}$$

## Clustering Variables: The VARCLUS Procedure

### Variable Reduction

The VARCLUS procedure attempts to divide a set of variables into non-overlapping clusters such that each cluster can be interpreted as essentially one-dimensional. This means that a large set of variables can be replaced by a single member of each cluster, acting as a representative, often with very little loss of information.

### Variable Selection

You can use outside knowledge to guide the selection, or you can use the $1 - R^2$ ratio to determine which variables are the best candidates:

$$1 - R^2 \text{ ratio} = \frac{1 - R^2_{\text{own cluster}}}{1 - R^2_{\text{next closest cluster}}}$$

Small values of this ratio indicate that the variable has a strong correlation with its own cluster and a weak correlation with the other clusters.

### Variable Reduction: Example Output

Variable clustering (PROC VARCLUS) reduces the number of variables, not just the number of dimensions. The example output (columns: Cluster, Variable, R-squared with Own Cluster, R-squared with Next Closest, 1-R**2 Ratio; most numeric values were lost in extraction) groups the variables as follows, and one variable is kept from each cluster:

- Cluster 1: RedMeat, WhiteMeat, Eggs (choose one)
- Cluster 2: Cereal, Nuts (choose one)
- Cluster 3: Fish, Starch (choose one)
- Cluster 4: FruitVeg (the only member; R-squared with own cluster = 1.0000)

### The Algorithm Used in VARCLUS

1. A cluster is chosen for splitting. The selected cluster has either the smallest percentage of variation explained by its cluster component (using the `PERCENT=` option) or the largest eigenvalue associated with the second principal component (using the `MAXEIGEN=` option).
2. The chosen cluster is split into two clusters by finding the first two principal components, rotating the components, and assigning each variable to the rotated component with which it has the higher squared correlation. In PROC VARCLUS, the principal components are rotated using an orthoblique rotation, that is, a raw quartimax rotation on the eigenvectors.

## Using SAS Macros

### What Is a SAS Macro?

A SAS macro is a program written in the special SAS macro programming language. It is written by a SAS user and is not supplied by SAS. It can produce results that are not available directly in SAS procedures. It can also select results produced by SAS procedures so that the user does not have to run the SAS procedure. You should NEVER modify the actual macro programming yourself.

### How to Use SAS Macros

In order to use a SAS macro:

1. You have to tell SAS where it is located.
2. You have to call it.

### Telling SAS Where Macros Are Located

A SAS macro has a particular name and is stored in a file of the same name. For example, the SAS macro CIforr is stored in the file `CIforr.sas`. This macro produces various confidence intervals for the population correlation coefficient.

If you place all the files containing macros in this location:

`C:\Documents and Settings\XYZ\My Documents\SAS Macros`

and place the following statements in the `autoexec.sas` file:

```sas
filename statmacs 'C:\Documents and Settings\XYZ\My Documents\SAS Macros';
options mautosource sasautos=(statmacs sasautos);
```

then you can call the macros whenever you need them.

### Calling SAS Macros

As an example, the SAS macro CIforr produces various confidence intervals for the population correlation coefficient. To use (call) this macro, simply run this single line of code:

```sas
%CIforr(Data=A, Var=X Y, Alpha=.05);
```

Here `X` and `Y` are the names of variables stored in the SAS data set named `A`. In practice, you would specify the name of the data set containing your data along with the correct variable names.

## SAS Autocall Macro: PLOTIT

### The PLOTIT Macro

General form of the PLOTIT macro:

```sas
%plotit(data=data-set, plotvars=var1 var2, labelvar=varname,
        symvar=groupvar, typevar=groupvar, symsize=option, symlen=option)
```
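To check the matrix form of the univariate sample variance numerically, here is a minimal pure-Python sketch. Python is an assumption (the notes themselves use SAS), and the helper names `centering_matrix`, `sample_variance_matrix` and `sample_variance_direct` are hypothetical. It builds $\mathbf{I}_N - \frac{1}{N}\mathbf{J}_N$ explicitly and compares $\mathbf{x}'(\mathbf{I}_N - \frac{1}{N}\mathbf{J}_N)\mathbf{x}/(N-1)$ with the deviation formula on a made-up sample.

```python
# Hypothetical helper names; pure Python so no third-party packages are needed.

def centering_matrix(n):
    """Return the N x N matrix I_N - (1/N) J_N as a list of rows."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]


def mat_vec(a, x):
    """Matrix-vector product a x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def sample_mean(x):
    """x-bar = (1/N) 1_N' x."""
    return sum(x) / len(x)


def sample_variance_matrix(x):
    """s^2 = x' (I_N - J_N / N) x / (N - 1)."""
    n = len(x)
    centered = mat_vec(centering_matrix(n), x)  # deviations x - x-bar * 1_N
    return sum(x_i * c_i for x_i, c_i in zip(x, centered)) / (n - 1)


def sample_variance_direct(x):
    """s^2 = sum of (X_r - x-bar)^2 divided by (N - 1)."""
    m = sample_mean(x)
    return sum((x_i - m) ** 2 for x_i in x) / (len(x) - 1)


x = [2.0, 4.0, 4.0, 5.0, 7.0]  # made-up sample for illustration
print(sample_mean(x))
print(sample_variance_matrix(x), sample_variance_direct(x))
```

For this made-up sample both routines give 3.3 up to floating-point rounding, since $\mathbf{x}'\mathbf{x} - (\mathbf{1}_N'\mathbf{x})^2/N = 110 - 96.8 = 13.2$ and $13.2/4 = 3.3$. Forming the full $N \times N$ matrix is only for illustration; in practice one simply subtracts the mean.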
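The multivariate summary statistics from the first part of these notes can be sketched the same way. This is a hedged, pure-Python illustration (hypothetical helper names, made-up data) that computes the mean vector as $\frac{1}{N}\mathbf{X}'\mathbf{1}_N$, the covariance matrix as $\mathbf{X}'(\mathbf{I}_N - \frac{1}{N}\mathbf{J}_N)\mathbf{X}/(N-1)$, and the correlation matrix by dividing each $s_{jk}$ by $\sqrt{s_{jj}s_{kk}}$.

```python
import math


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def centering_matrix(n):
    """I_N - (1/N) J_N."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]


def mean_vector(X):
    """x-bar = (1/N) X' 1_N, i.e. the column means of the N x p data matrix."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def covariance_matrix(X):
    """S = X' (I_N - J_N / N) X / (N - 1)."""
    n = len(X)
    s = matmul(matmul(transpose(X), centering_matrix(n)), X)
    return [[s_jk / (n - 1) for s_jk in row] for row in s]


def correlation_matrix(S):
    """R = D^{-1/2} S D^{-1/2}, where D = diag(s_11, ..., s_pp)."""
    p = len(S)
    sd = [math.sqrt(S[j][j]) for j in range(p)]
    return [[S[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]


# Made-up data matrix: N = 4 experimental units (rows), p = 2 response variables.
X = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 5.0],
    [4.0, 9.0],
]
S = covariance_matrix(X)
print(mean_vector(X))
print(S)
print(correlation_matrix(S))
```

For this made-up matrix the mean vector is $(2.5, 5.0)'$, $s_{11} = 5/3$, $s_{22} = 26/3$, $s_{12} = 11/3$, and $r_{12} = 11/\sqrt{130} \approx 0.965$.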
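The $1 - R^2$ ratio used for variable selection in PROC VARCLUS can be illustrated with a small sketch that, for each cluster, keeps the variable with the smallest ratio. The R-squared values and variable names `v1` to `v5` below are hypothetical, invented for illustration, and are not output from PROC VARCLUS or from these notes; the function names are likewise assumptions. Picking the minimum is just one way to apply the rule, and outside knowledge may override it.

```python
def one_minus_r2_ratio(r2_own, r2_next):
    """(1 - R^2 with own cluster) / (1 - R^2 with next closest cluster)."""
    if r2_next >= 1.0:
        raise ValueError("R^2 with the next closest cluster must be < 1")
    return (1.0 - r2_own) / (1.0 - r2_next)


def pick_representatives(rows):
    """rows: iterable of (cluster, variable, r2_own, r2_next).

    Returns {cluster: variable with the smallest 1 - R^2 ratio}.
    """
    best = {}
    for cluster, var, r2_own, r2_next in rows:
        ratio = one_minus_r2_ratio(r2_own, r2_next)
        if cluster not in best or ratio < best[cluster][1]:
            best[cluster] = (var, ratio)
    return {cluster: var for cluster, (var, _) in best.items()}


# Hypothetical R^2 values for illustration only (not from the notes).
rows = [
    (1, "v1", 0.80, 0.30),
    (1, "v2", 0.65, 0.10),
    (1, "v3", 0.90, 0.50),
    (2, "v4", 0.75, 0.20),
    (2, "v5", 0.75, 0.40),
]
print(pick_representatives(rows))
```

Here `v3` (ratio $0.1/0.5 = 0.2$) represents cluster 1 and `v4` (ratio $0.25/0.8 = 0.3125$) represents cluster 2. A variable that forms a cluster on its own has $R^2_{\text{own cluster}} = 1$ and therefore a ratio of 0.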

