# Applied Multivariate Statistical Methods STAT 5610


These 11 pages of class notes were uploaded by Alia Gerhold on Wednesday, September 30, 2015. The notes belong to STAT 5610 at Western Michigan University, taught by Jung Wang in the fall. Since upload, they have received 28 views. For similar materials, see /class/216953/stat-5610-western-michigan-university in Statistics at Western Michigan University.


## B. Factor Analysis

**Contents:** 1. Motivation · 2. The Orthogonal Factor Model · 3. Methods of Estimation · 4. Factor Rotation · 5. Comparison Between Factor Analysis and Principal Components Analysis · 6. Factor Scores

### 1. Motivation

(1) Idea: explain the covariance structure of many variables in terms of a few underlying, but unobservable, random factors. That is, reduce the number of variables needed to describe the covariance.

(2) Example (Spearman, intelligence testing): $X_1$ = classic test score, $X_2$ = math, $\ldots$, French, $\ldots$, $X_p$ = music. The model is

$$X_1 - \mu_1 = \ell_{11}F_1 + \ell_{12}F_2 + \cdots + \ell_{1m}F_m + \varepsilon_1$$
$$X_2 - \mu_2 = \ell_{21}F_1 + \ell_{22}F_2 + \cdots + \ell_{2m}F_m + \varepsilon_2$$
$$\vdots$$
$$X_p - \mu_p = \ell_{p1}F_1 + \ell_{p2}F_2 + \cdots + \ell_{pm}F_m + \varepsilon_p$$

where $F_1, \ldots, F_m$ are common factors (unobservable and random), $\varepsilon_1, \ldots, \varepsilon_p$ are specific factors, $\mu_i$ is the mean of variable $i$, and $\ell_{ij}$ is the factor loading of the $i$th variable on the $j$th factor. The equations above can be expressed in matrix form as

$$X - \mu = LF + \varepsilon. \tag{1}$$

Equation (1) looks like a regression model, but the $F_j$ are unobservable random factors.

### 2. The Orthogonal Factor Model

(1) Definitions and assumptions:

(a) $X - \mu = LF + \varepsilon$ as above.

(b) $E(F) = 0$, $\operatorname{cov}(F) = I_m$. That is, $\operatorname{Var}(F_i) = 1$ and $\operatorname{cov}(F_i, F_j) = 0$ for all $i \ne j$.

(c) $E(\varepsilon) = 0$, $\operatorname{cov}(\varepsilon) = \Psi = \operatorname{diag}(\psi_1, \ldots, \psi_p)$. That is, $\operatorname{Var}(\varepsilon_i) = \psi_i$ and $\operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \ne j$.

(d) Common and specific factors are uncorrelated: $\operatorname{cov}(F, \varepsilon) = 0$.

(2) Covariance structure:

(a) $\Sigma = LL' + \Psi$. In particular, $\sigma_{ii} = \sum_{j=1}^{m} \ell_{ij}^2 + \psi_i = h_i^2 + \psi_i$, where $h_i^2 = \sum_{j=1}^{m} \ell_{ij}^2$ is the communality and $\psi_i$ is the specific variance.

(b) $\operatorname{cov}(X, F) = L$; that is, $\operatorname{cov}(X_i, F_j) = \ell_{ij}$.

(3) Example:

$$\begin{pmatrix} 19 & 30 & 2 & 12 \\ 30 & 57 & 5 & 23 \\ 2 & 5 & 38 & 47 \\ 12 & 23 & 47 & 68 \end{pmatrix}
= \underbrace{\begin{pmatrix} 4 & 1 \\ 7 & 2 \\ -1 & 6 \\ 1 & 8 \end{pmatrix}}_{L}
\begin{pmatrix} 4 & 7 & -1 & 1 \\ 1 & 2 & 6 & 8 \end{pmatrix}
+ \underbrace{\begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}}_{\Psi} = LL' + \Psi.$$

So $\sigma_{11} = \operatorname{Var}(X_1) = 4^2 + 1^2 + 2 = 19$ and $\sigma_{22} = \operatorname{Var}(X_2) = 7^2 + 2^2 + 4 = 57$.

(4) Comments:

(a) The factor model reduces the number of unknown parameters in $\Sigma$ from $p(p+1)/2$ to $p(m+1)$. E.g.:
  i. $p = 6$, $m = 2$: $\Sigma$ has 21 parameters, while the factor model has 18.
  ii. $p = 12$, $m = 2$: $\Sigma$ has 78 parameters, while the factor model has 36.

(b) Most covariance matrices cannot be factored if $m < p$ (it can always be done if $m = p$; see Example 9.2).

(c) If a factorization exists, it is not unique for $m > 1$, since an orthogonal transformation of the factors gives a new and acceptable factor model. That is, suppose $X - \mu = LF + \varepsilon$ is an orthogonal factor model and $T$ is orthogonal. Set $L^* = LT$ (new loadings) and $F^* = T'F$ (new factors). Then $X - \mu = (LT)(T'F) + \varepsilon = L^*F^* + \varepsilon$ is another orthogonal factor model. That is, rotating the factors preserves the orthogonal factor model.

### 3. Methods of Estimation

(1) Principal components method:

(a) Idea: the spectral decomposition gives

$$\Sigma = \underbrace{\lambda_1 e_1 e_1' + \cdots + \lambda_m e_m e_m'}_{\text{variability due to common factors (large part)}} + \underbrace{\lambda_{m+1} e_{m+1} e_{m+1}' + \cdots + \lambda_p e_p e_p'}_{\text{variability due to specific factors (small part)}}$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. Then take

$$\tilde{L} = \left( \sqrt{\lambda_1}\, e_1 \;\; \cdots \;\; \sqrt{\lambda_m}\, e_m \right), \qquad \tilde{\psi}_i = \sigma_{ii} - \sum_{j=1}^{m} \tilde{\ell}_{ij}^2,$$

so that the factorization above is exact for the variances. Note: with data, the quantities above are estimated by the corresponding sample quantities.

(b) Example 9.3 (page 491), Market Data. Consumers rated a new product on a seven-point scale; the sample correlation matrix is

| | taste | good buy | flavor | suitable for snack | lots of energy |
|---|---|---|---|---|---|
| taste ($X_1$) | 1.00 | .02 | .96 | .42 | .01 |
| good buy ($X_2$) | .02 | 1.00 | .13 | .71 | .85 |
| flavor ($X_3$) | .96 | .13 | 1.00 | .50 | .11 |
| suitable for snack ($X_4$) | .42 | .71 | .50 | 1.00 | .79 |
| lots of energy ($X_5$) | .01 | .85 | .11 | .79 | 1.00 |

Table 1: Correlation matrix, Market Data.

There appear to be two groups, $(X_1, X_3)$ and $(X_2, X_5)$. Also, $X_4$ is closer to the $(X_2, X_5)$ group (correlations .71 and .79) than to the $(X_1, X_3)$ group (correlations .42 and .50). So the apparent linear relationships might be explained by 2 or 3 factors. A two-factor orthogonal factor model fit by the principal components method yielded the following:

| variable | $F_1$ | $F_2$ | communality $h_i^2$ | specific variance $\psi_i$ |
|---|---|---|---|---|
| taste | .56 | .82 | .98 | .02 |
| good buy | .78 | −.52 | .88 | .12 |
| flavor | .65 | .75 | .98 | .02 |
| suitable for snack | .94 | −.10 | .89 | .11 |
| lots of energy | .80 | −.54 | .93 | .07 |
| eigenvalues | 2.8531 | 1.8063 | | |

(c) Determining the number of factors:

  i. Proportion of sample variance due to the $j$th factor $= \hat{\lambda}_j / \sum_i \hat{\sigma}_{ii}$, which equals $\hat{\lambda}_j / p$ for a correlation matrix. (Note that $\sum_i \hat{\lambda}_i = \sum_i \hat{\sigma}_{ii}$.) E.g., for the Market Data (Table 1), the cumulative proportions of total variance explained by the two factors are $2.8531/5 = 0.571$ and $(2.8531 + 1.8063)/5 = 0.932$, respectively. Increase the number of common factors until a suitable proportion of the total variance has been explained. Note: with the principal components method, as new factors are added, the loadings on the factors already extracted remain fixed.
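The principal components extraction above is a direct eigendecomposition. A minimal numpy sketch, applied to the market-data correlation matrix of Table 1 (values transcribed from the notes); the variable names and tolerances are my own:

```python
import numpy as np

# Sample correlation matrix for the market data (Table 1):
# taste, good buy, flavor, suitable for snack, lots of energy.
R = np.array([
    [1.00, 0.02, 0.96, 0.42, 0.01],
    [0.02, 1.00, 0.13, 0.71, 0.85],
    [0.96, 0.13, 1.00, 0.50, 0.11],
    [0.42, 0.71, 0.50, 1.00, 0.79],
    [0.01, 0.85, 0.11, 0.79, 1.00],
])

# Spectral decomposition; eigh returns eigenvalues in ascending order,
# so flip to get lambda_1 >= ... >= lambda_p.
lam, vecs = np.linalg.eigh(R)
lam, vecs = lam[::-1], vecs[:, ::-1]

m = 2
L = vecs[:, :m] * np.sqrt(lam[:m])          # loadings: sqrt(lambda_j) * e_j
h2 = (L ** 2).sum(axis=1)                   # communalities
psi = np.diag(R) - h2                       # specific variances
cum_prop = np.cumsum(lam[:m]) / R.shape[0]  # cumulative proportions (p = 5)
```

The columns of `L` are determined only up to sign (eigenvectors are), so individual loadings may come out with flipped signs relative to the table; the communalities, specific variances, and explained proportions are unaffected.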
  ii. Look at the residual matrix $S - (\tilde{L}\tilde{L}' + \tilde{\Psi})$ or $R - (\tilde{L}\tilde{L}' + \tilde{\Psi})$. If all the entries of the residual matrix are small, we have a good factorization. Fact: the sum of squared entries of the residual matrix is $\le \hat{\lambda}_{m+1}^2 + \cdots + \hat{\lambda}_p^2$, so if the deleted eigenvalues are small, the residual matrix is small. E.g., for the Market Data (Table 1), the residual matrix is

| | taste | good buy | flavor | suitable for snack | lots of energy |
|---|---|---|---|---|---|
| taste | .00 | .01 | −.01 | −.02 | .01 |
| good buy | .01 | .00 | .02 | −.07 | −.06 |
| flavor | −.01 | .02 | .00 | −.03 | .00 |
| suitable for snack | −.02 | −.07 | −.03 | .00 | −.02 |
| lots of energy | .01 | −.06 | .00 | −.02 | .00 |

Two factors seem adequate.

(2) Maximum likelihood method:

(a) Additional assumptions:
  i. $F$ and $\varepsilon$ are normally distributed. This forces $X$, as seen in Equation (1), to be normal.
  ii. Since $L$ is unique only up to multiplication by an orthogonal matrix, we require $L'\Psi^{-1}L = \Delta$ to be a diagonal matrix. This makes $L$ unique.

(b) The MLEs are determined by software and are different from the PC solution.

(c) Stock Data (cont'd):

| company | ML $F_1$ | ML $F_2$ | ML $\hat{\psi}_i$ | PC $F_1$ | PC $F_2$ | PC $\hat{\psi}_i$ |
|---|---|---|---|---|---|---|
| Allied Chemical | .68 | .19 | .50 | .78 | −.22 | .34 |
| du Pont | .69 | .52 | .25 | .77 | −.42 | .17 |
| Union Carbide | .68 | .25 | .47 | .79 | −.23 | .31 |
| Exxon | .62 | −.07 | .61 | .71 | .41 | .27 |
| Texaco | .79 | −.44 | .18 | .71 | .52 | .22 |
| cumulative proportion | .48 | .60 | | .57 | .73 | |

For the ML solution, factor 1 may be called "general market" and factor 2 "Texaco contrasted with du Pont." For the PC solution, factor 1 may also be called "general market," and factor 2 "oil contrasted with chemical." Which is preferred, ML or PC? Answer: the one with the smaller residual matrix.

(d) Large-sample test to determine $m$, the number of common factors: test $H_0: \Sigma = LL' + \Psi$ with $m$ common factors. The likelihood ratio test statistic is

$$B = c \ln \frac{|\hat{L}\hat{L}' + \hat{\Psi}|}{|S_n|}, \qquad c = n - 1 - \frac{2p + 4m + 5}{6},$$

and we reject $H_0$ if $B > \chi^2_{df}(\alpha)$, where $df = \frac{(p-m)^2 - p - m}{2}$. E.g., for the Stock Data above with $m = 2$: $df = 1$, $\chi^2_1(.05) = 3.84$, and $B = .62 < 3.84$, so do not reject $H_0$.

### 4. Factor Rotation

(1) Definition: if $T$ is an orthogonal matrix (i.e., a change of basis) and $\Sigma = LL' + \Psi$, then $L^* = LT$ is the factor loading matrix corresponding to the rotated factors $F^* = T'F$, and $\Sigma = L^*L^{*\prime} + \Psi$.
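The rotation $L^* = LT$ and its invariance properties are easy to check numerically. A minimal sketch using the two-factor market-data loadings and the rotation matrix the notes report for that example (all values transcribed from the notes):

```python
import numpy as np

# Two-factor PC loadings for the market data (from the notes).
L = np.array([
    [0.56,  0.82],   # taste
    [0.78, -0.52],   # good buy
    [0.65,  0.75],   # flavor
    [0.94, -0.10],   # suitable for snack
    [0.80, -0.54],   # lots of energy
])

# Orthogonal rotation matrix reported for this example (theta ~ 33 degrees).
T = np.array([[ 0.8366, 0.5479],
              [-0.5479, 0.8366]])

L_star = L @ T  # rotated loadings L* = L T

# Communalities are invariant under orthogonal rotation.
h2_before = (L ** 2).sum(axis=1)
h2_after = (L_star ** 2).sum(axis=1)
```

The first row of `L_star` comes out near (.02, .99): after rotation, "taste" loads almost entirely on the second factor, which is what makes the rotated solution easier to interpret.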
(2) Properties:

(a) The residual matrix, communalities, and specific variances are all unchanged under orthogonal factor rotations. The total variance explained by the $m$ factors also remains unchanged after rotation, although the variance of each individual factor changes.

(b) The factors are rotated to make the factor loadings more interpretable. Ideally, we would like each variable to load highly on a single factor but have small loadings on the other factors.

(c) For $m = 2$, the orthogonal rotation matrix has the form

$$T = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix},$$

where $\theta$ is the rotation angle.

Which rotation is preferred? Answer: the varimax rotation, which seeks a rotation that makes the entries in each column of the factor loading matrix as close to either 0 or ±1 as possible.

(3) Example: Market Data (Table 1, cont'd):

| variable | original $F_1$ | original $F_2$ | rotated $F_1^*$ | rotated $F_2^*$ |
|---|---|---|---|---|
| taste | .56 | .82 | .02 | .99 |
| good buy | .78 | −.52 | .94 | −.01 |
| flavor | .65 | .75 | .13 | .98 |
| suitable for snack | .94 | −.10 | .84 | .43 |
| lots of energy | .80 | −.54 | .97 | −.02 |
| cumulative proportion | .571 | .934 | .507 | .934 |

Note that the rotation matrix is

$$T = \begin{pmatrix} .8366 & .5479 \\ -.5479 & .8366 \end{pmatrix}.$$

### 5. Comparison Between Factor Analysis and Principal Components Analysis

1. Both methods aim at dimension reduction. Factor analysis attempts a reduction from $p$ to $m$ dimensions by postulating a model relating $X_1, \ldots, X_p$ to $m$ hypothetical variables, whereas principal components analysis has no such explicit model.

2. Both methods can be thought of as trying to represent some aspect of the covariance matrix $\Sigma$ (or correlation matrix $\rho$) as well as possible, but PCA concentrates on the diagonal elements, whereas in FA the interest is in the off-diagonal elements. PCA maximizes $\sum_{k=1}^{m} \operatorname{Var}(Z_k)$ within $\sum_{k=1}^{p} \operatorname{Var}(Z_k) = \operatorname{tr}(\Sigma)$; in FA, $\Psi$ is diagonal, so $LF$ accounts completely for the off-diagonal elements.

3. For a given data set, the number of factors required for an adequate factor model will be no larger than, and may be strictly smaller than, the number of PCs required to account for most of the variation in the data. If PCs are used as initial factors, the ideal choice of $m$ will often be less than
that determined by most of the rules.

4. Changing $m$, the dimensionality of the model, can have much more drastic effects on FA than it does on PCA.

5. PCs can be calculated exactly from $X$, whereas the common factors typically cannot: each $X_i$ is a linear function of $F$, but not conversely.

### 6. Factor Scores

Factor scores are the observations expressed in the factor space.

1. Usefulness: factor scores are often used for diagnostic purposes and are frequently used as inputs to a subsequent analysis.

2. Methods of estimation:

(a) Bartlett's method (weighted least squares). For factor loadings estimated by the maximum likelihood method, the factor scores are obtained by

$$\hat{f}_j = (\hat{L}'\hat{\Psi}^{-1}\hat{L})^{-1}\hat{L}'\hat{\Psi}^{-1}(x_j - \bar{x})$$

if the covariance matrix is factored, and

$$\hat{f}_j = (\hat{L}_z'\hat{\Psi}_z^{-1}\hat{L}_z)^{-1}\hat{L}_z'\hat{\Psi}_z^{-1} z_j, \qquad z_j = D^{-1/2}(x_j - \bar{x}),$$

if the correlation matrix is factored. For factor loadings estimated by the principal components method, it is customary to obtain factor scores by the unweighted (ordinary) least squares method, which essentially gives the principal component scores.

(b) Thompson's method (regression method):

$$\hat{f}_j = \hat{L}' S^{-1} (x_j - \bar{x})$$

if the covariance matrix is factored, and $\hat{f}_j = \hat{L}_z' R^{-1} z_j$, with $z_j = D^{-1/2}(x_j - \bar{x})$, if the correlation matrix is factored.

3. Example: Chicken-Bone Data. (Note: a Google search with the keywords "fowl fem sas" will turn up a SAS file with the data.) The data were corrected by Les Marcus. The correlation matrix is

| | Skull Length $X_1$ | Skull Breadth $X_2$ | Femur Length $X_3$ | Tibia Length $X_4$ | Humerus Length $X_5$ | Ulna Length $X_6$ |
|---|---|---|---|---|---|---|
| $X_1$ | 1.000 | .583 | .569 | .602 | .621 | .603 |
| $X_2$ | .583 | 1.000 | .515 | .548 | .584 | .526 |
| $X_3$ | .569 | .515 | 1.000 | .926 | .877 | .878 |
| $X_4$ | .602 | .548 | .926 | 1.000 | .874 | .894 |
| $X_5$ | .621 | .584 | .877 | .874 | 1.000 | .937 |
| $X_6$ | .603 | .526 | .878 | .894 | .937 | 1.000 |

Note that it differs from the correlation matrix in the textbook in the variable Skull Breadth.

(a) Three-factor principal components solution:

| variable | unrotated $F_1$ | $F_2$ | $F_3$ | rotated $F_1^*$ | $F_2^*$ | $F_3^*$ | $\hat{\psi}_i$ |
|---|---|---|---|---|---|---|---|
| Skull Length | .744 | −.446 | .497 | .339 | −.276 | .899 | .000 |
| Skull Breadth | .694 | −.597 | −.402 | .289 | −.922 | .256 | .000 |
| Femur Length | .928 | .245 | −.037 | .912 | −.209 | .218 | .077 |
| Tibia Length | .941 | .196 | −.026 | .895 | −.241 | .256 | .075 |
| Humerus Length | .949 | .141 | −.037 | .873 | −.288 | .277 | .079 |
| Ulna Length | .941 | .213 | .003 | .902 | −.212 | .270 | .069 |
| cumulative proportion of total standardized sample variance explained | .761 | .881 | .950 | .568 | .760 | .950 | |

Three factors seem to be appropriate.

[Figure: scree plot of the principal component variances, Chicken-Bone Data.]

Factor 1 ($F_1^*$) appears to be a body-size factor, while factors 2 and 3 ($F_2^*$, $F_3^*$) collectively appear to be a head-size factor.

(b) Maximum likelihood solution: there is clear disagreement between the ML solution and the PC solution.

[Figure: factor score plot, three-factor model, Chicken-Bone Data.]

The disagreement is also apparent when $m = 2$.

[Figure: factor score plot, two-factor model, Chicken-Bone Data (ML1 vs. ML2).]

The data obviously deviate from multivariate normality, as evidenced by the chi-square plot.

[Figure: chi-square plot of the standardized Chicken-Bone Data, sample quantiles vs. theoretical quantiles.]

Since the ML solution requires the normality assumption, we use only the PC solution.
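Bartlett's weighted-least-squares formula above is a short linear algebra computation. A minimal sketch using the loadings $L$ and specific variances $\Psi$ from the four-variable example of Section 2(3); the observation $x$ is a hypothetical one constructed from the model with zero specific error, so the estimator recovers the factor vector exactly:

```python
import numpy as np

def bartlett_scores(L, Psi, x, xbar):
    """Bartlett (WLS) factor scores: (L' Psi^-1 L)^-1 L' Psi^-1 (x - xbar)."""
    Pinv = np.linalg.inv(Psi)
    return np.linalg.solve(L.T @ Pinv @ L, L.T @ Pinv @ (x - xbar))

# Loadings and specific variances from the Section 2(3) example.
L = np.array([[ 4.0, 1.0],
              [ 7.0, 2.0],
              [-1.0, 6.0],
              [ 1.0, 8.0]])
Psi = np.diag([2.0, 4.0, 1.0, 3.0])

# Hypothetical observation generated from the model with epsilon = 0.
f_true = np.array([1.0, -2.0])
xbar = np.zeros(4)
x = xbar + L @ f_true

f_hat = bartlett_scores(L, Psi, x, xbar)
```

When $x - \bar{x}$ lies exactly in the column space of $L$ (zero specific error, as here), the WLS estimate equals the true factor vector; with real data the specific errors perturb the scores.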
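The Bartlett correction factor $c$ and degrees of freedom for the large-sample test of $H_0: \Sigma = LL' + \Psi$ in Section 3 can be packaged as a small helper; the sample size `n = 100` in the usage check is a hypothetical value, not from the notes:

```python
def lrt_constants(n, p, m):
    """Constants for the likelihood ratio test of m common factors.

    Returns (c, df) where the test statistic is
    B = c * ln(|L L' + Psi| / |S_n|), rejected when B > chi2_df(alpha).
    """
    c = n - 1 - (2 * p + 4 * m + 5) / 6
    df = ((p - m) ** 2 - p - m) / 2
    return c, df

# Stock data setting from Section 3: p = 5 variables, m = 2 factors.
c, df = lrt_constants(n=100, p=5, m=2)
```

Note that the test only applies when $df > 0$, which bounds how large $m$ can be for a given $p$.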
