APPLIED MULTIVARIATE STATISTICS (STAT 530)
This 6-page set of class notes was uploaded by Shane Marks on Monday, October 26, 2015. The notes belong to STAT 530 at the University of South Carolina - Columbia, taught by B. Habing in Fall.
Exploratory Factor Analysis
Brian Habing - University of South Carolina - Last Updated: October 4, 2005

"FA is not worth the time necessary to understand it and carry it out." - Hills (1977)

"Factor analysis should not be used in most practical situations." - Chatfield and Collins (1980, pg. 89)

"At the present time factor analysis still maintains the flavor of an art, and no single strategy should yet be 'chiseled into stone.'" - Johnson and Wichern (2002, pg. 517)

38 - The number of articles ISI Web of Science shows with the topic "Factor Analysis" for the week ending October 1, 2005.
1,213 - The number of FA articles in 2005 through October 1st.

References:

Chatfield, C., & Collins, A. J. (1980). Introduction to Multivariate Analysis. London: Chapman and Hall.
Hair, J. F. Jr., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate Data Analysis, 5th Edition. Upper Saddle River, NJ: Prentice Hall.
Hatcher, L. (1994). A Step-by-Step Approach to Using the SAS System for Factor Analysis and Structural Equation Modeling. Cary, NC: SAS Institute Inc.
Hills, M. (1977). Book Review. Applied Statistics, 26, 339-340.
Johnson, D. E. (1998). Applied Multivariate Methods for Data Analysis. Pacific Grove, CA: Brooks/Cole Publishing.
Johnson, R. A., & Wichern, D. W. (2002). Applied Multivariate Statistical Analysis, 5th Edition. Upper Saddle River, NJ: Prentice Hall.
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate Analysis. New York: Academic Press.
Sharma, S. (1996). Applied Multivariate Techniques. United States: John Wiley & Sons.
Stevens, J. (2002). Applied Multivariate Statistics for the Social Sciences, 4th Edition. Mahwah, NJ: Lawrence Erlbaum Associates.
Thompson, B. (2000). Q-Technique Factor Analysis: One Variation on the Two-Mode Factor Analysis of Variables. In Grimm, L. G., & Yarnold, P., Reading and Understanding More Multivariate Statistics. Washington, DC: American Psychological Association.

The Model

Assume the q variables have been standardized. The k-factor model is

X_1 = λ_11 F_1 + ... + λ_1k F_k + η_1
  ...
X_q = λ_q1 F_1 + ... + λ_qk F_k + η_q

with the λ_ij the factor loadings. The common factors F_j have mean 0 and variance 1 and are uncorrelated with each other (we will assume this orthogonality below, but it is not true for oblique rotations). The η_i are also
independent of each other, and the F and η are mutually independent of each other. In matrix form this can be written as X = ΛF + η, which is equivalent to Cov(X) = ΛΛ' + Ψ, where Ψ = Cov(η) is a q×q diagonal matrix. This implies that

Var(X_i) = Σ_j λ_ij² + ψ_i

The first term is the communality of X_i, the variance it shares with the other variables through the common factors; ψ_i is the specificity, the variance that is specific to variable X_i.

Path Diagram

Each latent factor is represented by a circle and each manifest variable is represented by a square. An arrow indicates causality (which can get to be a pretty complicated subject in some models). The diagram (omitted here) is for a 2-factor model of q = 3 variables, where the first variable loads only on factor 1 (λ_12 = 0), the second variable loads on both factors, and the third variable loads only on factor 2 (λ_31 = 0).

Types of Factor Analysis

Some authors refer to several different types of factor analysis, such as R factor analysis or Q factor analysis. These simply refer to what is serving as the variables (the columns of the data set) and what is serving as the observations (the rows). What we have been doing so far has been R factor analysis: we have been looking for latent factors that lie behind the variables. This would, for example, let us group different test questions that seem to be measuring the same underlying construct. In Q factor analysis we would be looking for factors that seem to be underlying the examinees, and be asking "how many types of people are there?" The math is the same, but the terminology and goals can be different. Everything we have below refers to R factor analysis.

Is the Data Appropriate?

Correlation - It doesn't make sense to use factor analysis if the different variables are unrelated (why model common factors if they have nothing in common?). Rule of thumb: a substantial number of correlations are > 0.3. Some other methods also include Bartlett's test of sphericity, the measure of sampling adequacy (MSA), the anti-image correlation matrix, and the Kaiser-Meyer-Olkin measure (KMO). I haven't seen that much is gained by using these others.

Multivariate Normality - In order to use maximum likelihood
estimation, or to perform any of the tests of hypotheses, it is necessary for the data to be multivariate normal. This is not required for fitting the model using the principal components or principal factor methods.

Sample Size - A general rule of thumb is that you should have at least 50 observations and at least 5 times as many observations as variables. Stevens (2002, pg. 395) summarizes some results that are a bit more specific and backed by simulations. The number of observations required for factors to be reliable depends on the data, in particular on how well the variables load on the different factors. A factor is reliable if it has:

- 3 or more variables with loadings of 0.8, for any n
- 4 or more variables with loadings of 0.6, for any n
- 10 or more variables with loadings of 0.4, for n ≥ 150

Factors with only a few loadings require n ≥ 300. Obviously this doesn't cover every case, but it does give some guidance.

How Many Factors?

There are several rules for determining how many factors are appropriate for your data. Mardia, Kent, and Bibby (1979, pg. 258) point out that there is a limit to how many factors you can have and actually end up with a model that is simpler than your raw data. The quantity

s = (1/2)[(q - k)² - (q + k)]

is the difference between the number of unique values in your data's q×q correlation matrix, q(q+1)/2, and the number of parameters in the k-factor model. It only makes sense to perform a factor analysis if s > 0, and some programs will not let you estimate the factor analytic model if that is not true. Even though you could always exactly fit a 5-factor model to 5 variables using principal components and no specificity, s > 0 only for one or two factors. The minimum number of variables required for different numbers of factors is:

Factors:             2  3  4  5  6
Variables required:  5  7  8  9  11

In general this is not something we worry about too much, since we usually want to have a much smaller number of factors than variables. Also recall that we want several variables loading on each factor before we can
actually trust that factor to be meaningful anyway.

Kaiser's Criterion (Eigenvalue > 1) - Take as many factors as there are eigenvalues > 1 for the correlation matrix. Hair et al. (1998, pg. 103) report that this rule is good if there are 20 to 50 variables, but it tends to take too few if there are < 20 variables and too many if there are > 50. Stevens (2002, pg. 389) reports that it tends to take too many if there are > 40 variables and their communalities are around 0.4; it tends to be accurate with 10-30 variables whose communalities are around 0.7.

Scree Plot - Take the number of factors corresponding to the last eigenvalue before they start to level off. Hair et al. (1998, pg. 104) report that it tends to keep one or more factors more than Kaiser's criterion. Stevens (2002, pg. 390) reports that both Kaiser and scree are accurate if n > 250 and the communalities are ≥ 0.6.

Fixed % of Variance Explained - Keep as many factors as are required to explain 60%, 70%, 80-85%, or 95% of the variance. There is no general consensus, and one should check what is common in your field. It seems reasonable that any decent model should have at least 50% of the variance in the variables explained by the common factors.

A Priori - If you have a hypothesis about the number of factors that should underlie the data, then that is probably a good (at least minimum) number to use.

In practice there is no single best rule to use, and a combination of them is often used; so if you have no a priori hypothesis, check all three and use the closest thing to a majority decision. There are also a variety of other methods out there that are very popular with some authors. Minimum average partial correlation, parallel analysis, and modified parallel analysis are three of them. Three that require multivariate normality are the likelihood ratio test, AIC (Akaike's information criterion), and SBC (Schwarz's Bayesian criterion).

Methods for Fitting the Factor Analysis Model

Principal Components Factor Analysis - Just take the first m loadings from the principal components solution
and multiply by the square root of the corresponding eigenvalue. This is not really appropriate, since it attempts to explain all of the variance in the variables and not just the common variance. It therefore will often have highly correlated errors. However, Hair et al. (1998, pg. 103) report that it often gives similar results to the other methods if there are ≥ 30 variables or if most variables have large communalities.

Principal Factor Factor Analysis - a.k.a. Principal Axis Factoring, and sometimes even Principal Components Factoring. Come up with initial estimates of the communality for each variable and replace the diagonals in the correlation matrix with those. Then do principal components and take the first m loadings. Because you have taken out the specificity, the error matrix should be much closer to a diagonal matrix. There are various estimates used for the initial communalities: the absolute value of the maximum correlation of that variable with any of the others, the squared multiple correlation coefficient for predicting that variable from the others in multiple regression, and the corresponding diagonal element from the inverse of the correlation matrix. There seems to be no agreement on which is best, but the first is a slight bit easier to program.

Iterated Principal Factor Factor Analysis - This begins the same way as the principal factor method, but you use the fitted model to get better estimates of the communalities and then keep repeating the process. This method will often fail to fit, though, because of the Heywood case (estimated communalities > 1, equivalently negative error variances).

Maximum Likelihood Factor Analysis - Requires the assumption of multivariate normality and is difficult to program. It does allow for various tests of hypotheses, however.

Other Methods - Alpha Factoring, Image Factoring, Harris Factoring, Rao's Canonical Factoring, Unweighted Least Squares.

I would recommend using the iterated principal factor solution if it does not have impossible communalities or
error variances. If the iterated method fails, then I would use the non-iterated principal factor method, provided the error covariances do not seem unreasonably large. If that too fails, you either need more factors, or the factor analytic model is possibly inappropriate for your data set.

Rotation

It is often considered best to use an orthogonal rotation, because then all of the mathematical representations we've used so far will work. Two of the most popular are:

Varimax - Maximizes the sum of the variances of the squared factor loadings within the columns. This tends to force each variable to load highly on as few factors as possible. Ideally it will cause each variable to load on only one factor, and thus point out good summed scores that could be made. If there is an overall underlying general factor that is working on all of the variables, this rotation will not find it. (e.g., The math test has geometry, trigonometry, and algebra, but does it also have an overall general math ability too?)

Quartimax - Whereas varimax focuses on the columns, quartimax focuses on the rows. Sharma (1996, pg. 120) reports that it tends to have all of the variables load highly on one factor, and then each variable will tend to load highly on one other factor and near zero on the rest. Unlike varimax, this method will tend to keep the general factor. It sometimes doesn't seem to differ much from the initial solution.

There are also oblique rotations that don't keep the independence between the factors. Popular ones include Promax, Oblimin, and Orthoblique. Interpret these with caution.

Interpretation

Once you have your factor loadings matrix, it is necessary to try and interpret the factors. It is common to indicate which of the loadings are actually significant by underlining or circling them, and possibly erasing the non-significant ones. Significance is measured in two ways:

Practical Significance - Are the factor loadings large enough so that the factors actually have a meaningful effect on the variables? (e.g., Roughly speaking, if two
things only have a correlation of 0.2, it means that the one only explains 4% of the variation in the other.) Hair et al. (1998, pg. 111) recommend the following guidelines for practical significance:

±0.3 - Minimal
±0.4 - More Important
±0.5 - Practically Significant

Statistical Significance - We also want the loading to be statistically significantly different from zero. (e.g., If the loading is 0.3 but the confidence interval for it is -0.2 to 0.8, do we care?) Stevens (2002, pg. 294) reports the following rules of thumb based on sample size:

n:          50     100    200    300    600    1000
|loading|:  0.722  0.512  0.384  0.298  0.210  0.162

Further Guidance - Johnson (1998, pg. 156) recommends not including factors with only one significant loading. Hatcher (1994, pg. 73) reports that we want at least 3 variables loading on each factor, and preferably more. This is also related to a result discussed earlier in the section on sample size. Finally, it seems reasonable, if you are using varimax, to remove variables which load heavily on more than one factor.

Factor Scores

Exploratory factor analysis helps us to gain an understanding of the processes underlying our variables. We might also want to come up with estimates for how each of the observations rates on these unobservable latent factors.

Surrogate Variable - Choose a single variable that loads highly on that factor and use its value. It is very simple to calculate and simple to explain... too simple.

Estimated Factor Scores - Use a statistical method to actually estimate the values of the factors for each observation. Mardia et al. (1979, pg. 274) report that if multivariate normality holds, then Bartlett's weighted least squares method is unbiased, while Thompson's regression method will be biased but have a smaller overall error. Factor scores, however, are difficult to interpret (what does it mean to take a complicated weighted average of the observed values?), and the exact values estimated depend on the sample.

Summed Scores - If each variable loads on only a single factor, then make subscores for each factor by summing up
all of the variables that load on that factor. This is the compromise solution, and is often the one used in practice when designing and analyzing questionnaires.

Some Final Steps

Johnson and Wichern (2002, pg. 517) suggest the following to see if your solution seems reasonable: plot the factor scores against each other to look for suspicious observations, and for large data sets, split them in half and perform factor analysis on each half to see if the solution is stable.