Statistics and Research Design (PSYC 6430)
These 13 pages of class notes were uploaded by Lane Schuster on Sunday, October 11, 2015. The notes belong to PSYC 6430 (Psychology) at East Carolina University, taught by Karl Wuensch in Fall.
Bivariate Linear Correlation

One way to describe the association between two variables is to assume that the value of the one variable is a linear function of the value of the other variable. If this relationship is perfect, it can be described by the slope-intercept equation for a straight line, Y = a + bX. Even if the relationship is not perfect, one may be able to describe it as nonperfect linear.

Distinction Between Correlation and Regression

Correlation and regression are very closely related topics. Technically, if the X variable (often called the independent variable, even in nonexperimental research) is fixed (that is, if it includes all of the values of X to which the researcher wants to generalize the results, and the probability distribution of the values of X matches that in the population of interest), then the analysis is a regression analysis. If both the X and the Y variable (often called the dependent variable, even in nonexperimental research) are random (free to vary: were the research repeated, different values and sample probability distributions of X and Y would be obtained), then the analysis is a correlation analysis.

For example, suppose I decide to study the correlation between dose of alcohol (X) and reaction time. If I arbitrarily decide to use as values of X doses of 0, 1, 2, and 3 ounces of 190 proof grain alcohol, restrict X to those values, and have equal numbers of subjects at each level of X, then I have fixed X, and I do a regression analysis. If I allow X to vary randomly (for example, I recruit subjects from a local bar, measure their blood alcohol, X, and then test their reaction time), then a correlation analysis is appropriate. In actual practice, when one is using linear models to develop a way to predict Y given X, the typical behavioral researcher is likely to say she is doing regression analysis; if she is using linear models to measure the degree of association between X and Y, she says she is doing correlation analysis.

Scatter Plots

One way to describe a bivariate association
is to prepare a scatter plot: a plot of all the known paired (X, Y) values as dots in Cartesian space. X is traditionally plotted on the horizontal dimension (the abscissa) and Y on the vertical (the ordinate). If all the dots fall on a straight line with a positive slope, the relationship is perfect positive linear: every time X goes up one unit, Y goes up b units. If all dots fall on a negatively sloped line, the relationship is perfect negative linear.

Copyright 2009, Karl L. Wuensch. All rights reserved. (Corr6430.doc)

[Figure: scatter plots of a perfect positive linear and a perfect negative linear relationship.]

A linear relationship is monotonic (of one direction); that is, the slope of the line relating Y to X is either always positive or always negative. A monotonic relationship can, however, be nonlinear, if the slope of the line changes magnitude but not direction, as in the plots below.

[Figure: scatter plots of a perfect positive monotonic and a perfect negative monotonic relationship.]

A nonlinear relationship may, however, not be monotonic, as shown to the right, where we have a quadratic relationship between level of test anxiety and performance on a complex cognitive task.

[Figure: nonmonotonic (quadratic) relationship between test anxiety and performance.]

Of course, with real data the dots are not likely all to fall on any one simple line, but they may be approximately described by a simple line. We shall learn how to compute correlation coefficients that describe how well a straight line fits the data. If your plot shows that the line that relates X and Y is linear, you should use the Pearson correlation coefficient, discussed below. If the plot shows that the relationship is monotonic (not a straight line, but a line whose slope is always positive or always negative), you can use the Spearman correlation coefficient, discussed below. If your plot shows that the relationship is curvilinear but not monotonic, you need advanced techniques, such as polynomial regression, not covered in this
class.

Let us imagine that variable X is the number of hamburgers consumed at a cookout and variable Y is the number of beers consumed. We wish to measure the relationship between these two variables and develop a regression equation that will enable us to predict how many beers a person will consume, given that we know how many burgers that person will consume.

Subject   Burgers (X)   Beers (Y)    XY
   1           5            8        40
   2           4           10        40
   3           3            4        12
   4           2            6        12
   5           1            2         2
Sum           15           30       106
Mean           3            6
St. Dev.   1.581        3.162

[Figure: A Scatter Plot of Our Data, beers (Y) against burgers (X).]

Covariance

One way to measure the linear association between two variables is covariance, an extension of the unidimensional concept of variance into two dimensions. The sum of squares of cross products is

SSCP = ΣXY − (ΣX)(ΣY)/N = 106 − (15)(30)/5 = 16.

If most of the dots in the scatter plot are in the lower left and upper right quadrants, most of the cross products will be positive, so SSCP will be positive: as X goes up, so does Y. If most are in the upper left and lower right, SSCP will be negative: as X goes up, Y goes down. Just as variance is an average sum of squares, SS/N (or, to estimate population variance from sample data, SS/(N − 1)), covariance is an average SSCP, SSCP/N. We shall compute covariance as an estimate of that in the population from which our data were randomly sampled, that is, COV = SSCP/(N − 1) = 16/4 = 4.

A major problem with COV is that it is affected not only by the degree of linear relationship between X and Y but also by the standard deviations in X and in Y. In fact, the maximum absolute value of COV(X, Y) is the product σX σY. Imagine that you and I each measured the height and weight of individuals in our class and then computed the covariance between height and weight. You use inches and pounds, but I use miles and tons. Your numbers would be much larger than mine, so your covariance would be larger than mine, but the strength of the relationship between height and weight should be the same for both of our data sets.
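These computations are easy to verify. Here is a minimal Python sketch (not part of the original notes; the variable names are mine) of the SSCP and covariance for the burger and beer data:

```python
# Burger (X) and beer (Y) counts for the five subjects in the notes.
X = [5, 4, 3, 2, 1]
Y = [8, 10, 4, 6, 2]
N = len(X)

# Sum of squares of cross products: SSCP = sum(XY) - (sum X)(sum Y)/N
sscp = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / N

# Covariance, estimating the population value: COV = SSCP / (N - 1)
cov = sscp / (N - 1)

print(sscp, cov)  # 16.0 4.0
```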
We need to standardize the unit of measure of our variables.

Pearson r

We can get a standardized index of the degree of linear association by dividing COV by the two standard deviations, removing the effect of the two univariate standard deviations. This index is called the Pearson product-moment correlation coefficient (r, for short), and it is defined as

r = COV / (sX sY) = 4 / ((1.581)(3.162)) = .80.

Pearson r may also be defined as a mean, r = Σ ZX ZY / N, where the Z scores are computed using population standard deviations. Pearson r may also be computed as

r = SSCP / √(SSX · SSY) = 16 / √((4)(1.581²)(4)(3.162²)) = 16/20 = .80.

Pearson r will vary from −1 through 0 to +1. If r = 1, the relationship is perfect positive, and every pair of (X, Y) scores has ZX = ZY. If r = 0, there is no linear relationship. If r = −1, the relationship is perfect negative, and every pair of (X, Y) scores has ZX = −ZY.

Factors Which Can Affect the Size of r

Range restrictions. If the range of X is restricted, r will usually fall (it can rise if X and Y are related in a curvilinear fashion and a linear correlation coefficient has inappropriately been used). This is very important when interpreting criterion-related validity studies, such as one correlating entrance exam scores with grades after entrance.

Extraneous variance. Anything causing variance in Y but not in X will tend to reduce the correlation between X and Y. For example, with a homogeneous set of subjects all run under highly controlled conditions, the r between alcohol intake and reaction time might be .95, but if subjects were very heterogeneous and testing conditions variable, r might be only .50. Alcohol might still have just as strong an effect on reaction time, but the effects of many other extraneous variables (such as sex, age, health, time of day, day of week, etc.) upon reaction time would dilute the apparent effect of alcohol as measured by r.

Interactions. It is also possible that the extraneous variables might interact with X in determining Y. That is, X might have one effect on Y if Z = 1 and a different effect if Z = 2. For example, among experienced drinkers (Z = 1), alcohol might affect reaction time less than among novice drinkers (Z = 2). If such an interaction is not taken into account by the statistical analysis (a topic beyond the scope of this course), the r will likely be smaller than it otherwise would be.
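The three equivalent ways of computing Pearson r described above can be checked numerically. A minimal Python sketch (not part of the original notes; variable names are mine):

```python
import math

X = [5, 4, 3, 2, 1]
Y = [8, 10, 4, 6, 2]
N = len(X)
mx, my = sum(X) / N, sum(Y) / N

sscp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ssx = sum((x - mx) ** 2 for x in X)
ssy = sum((y - my) ** 2 for y in Y)

# r as covariance over the product of the sample standard deviations
sx, sy = math.sqrt(ssx / (N - 1)), math.sqrt(ssy / (N - 1))
r1 = (sscp / (N - 1)) / (sx * sy)

# r as the mean product of Z scores (population standard deviations)
zx = [(x - mx) / math.sqrt(ssx / N) for x in X]
zy = [(y - my) / math.sqrt(ssy / N) for y in Y]
r2 = sum(a * b for a, b in zip(zx, zy)) / N

# r as SSCP / sqrt(SSx * SSy)
r3 = sscp / math.sqrt(ssx * ssy)

print(round(r1, 2), round(r2, 2), round(r3, 2))  # 0.8 0.8 0.8
```

All three formulas are algebraic rearrangements of one another, so they agree to within floating-point error.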
Assumptions of Correlation Analysis

There are no assumptions if you are simply using the correlation coefficient to describe the strength of linear association between X and Y in your sample. If, however, you wish to use t or F to test hypotheses about ρ, or to place a confidence interval about your estimate of ρ, there are assumptions.

Bivariate normality. It is assumed that the joint distribution of X, Y is bivariate normal. To see what such a distribution looks like, try the Java applet at http://ucs.kuleuven.be/java/version2.0/Applet030.html. Use the controls to change various parameters and rotate the plot in three-dimensional space. In a bivariate normal distribution the following will be true:
1. The marginal distribution of Y, ignoring X, will be normal.
2. The marginal distribution of X, ignoring Y, will be normal.
3. Every conditional distribution of Y|X will be normal.
4. Every conditional distribution of X|Y will be normal.

Homoscedasticity.
1. The variance in the conditional distributions of Y|X is constant across values of X.
2. The variance in the conditional distributions of X|Y is constant across values of Y.

Testing H0: ρ = 0

If we have X, Y data sampled randomly from some bivariate population of interest, we may wish to test H0: ρ = 0, the null hypothesis that the population correlation coefficient (rho) is zero: X and Y are independent of one another; there is no linear association between X and Y. This is quite simply done with Student's t:

t = r√(n − 2) / √(1 − r²) = .8√3 / √(1 − .64).

You should remember that we used this formula earlier to demonstrate that the independent samples t test is just a special case of a correlation analysis: if one of the variables is dichotomous and the other continuous, computing the point-biserial r and testing its significance is absolutely equivalent to conducting an independent samples t test.
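The t test of H0: ρ = 0 takes only a few lines of Python (not part of the original notes; names mine). The sketch also constructs an approximate 95% confidence interval for ρ via the Fisher z' transformation; that method is my assumption, since the notes delegate the interval to a separate program:

```python
import math

r, n = 0.8, 5

# Student's t for H0: rho = 0, with df = n - 2
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t, 3))  # 2.309

# Approximate 95% CI for rho via Fisher's z' transformation:
# z' = artanh(r), standard error 1/sqrt(n - 3), 1.96 = normal critical value
z = math.atanh(r)
se = 1 / math.sqrt(n - 3)
lo, hi = math.tanh(z - 1.96 * se), math.tanh(z + 1.96 * se)
print(round(lo, 2), round(hi, 2))  # -0.28 0.99
```

With n = 5 the normal approximation behind the Fisher interval is rough, which is part of why the interval is so wide.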
Keep this in mind when someone tells you that you can make causal inferences from the results of a t test but not from the results of a correlation analysis: the two are mathematically identical, so it does not matter which analysis you did. What does matter is how the data were collected. If they were collected in an experimental manner (manipulating the independent variable with adequate control of extraneous variables), you can make a causal inference. If they were gathered in a nonexperimental manner, you cannot. For our data, t = 2.309, with df = N − 2 = 3.

Putting a Confidence Interval on R or R2

It is a good idea to place a confidence interval around the sample value of r or r², but it is tedious to compute by hand. Fortunately, there is now available a free program for constructing such confidence intervals. Please read my document Putting Confidence Intervals on R2 or R. For our beer and burger data, a 95% confidence interval for r extends from −.28 to .99.

APA-Style Summary Statement

For our beer and burger data, our APA summary statement could read like this: "The correlation between my friends' burger consumption and their beer consumption fell short of statistical significance, r(n = 5) = .8, p = .10. A 95% confidence interval for r extends from −.28 to .99." For some strange reason, the value of the computed t is not generally given when reporting a test of the significance of a correlation coefficient. You might want to warn your readers that a Type II error is quite likely here, given the small sample size. Were the result significant, your summary statement might read something like this: "Among my friends, burger consumption was significantly related to beer consumption."

Power Analysis

Power analysis for r is exceptionally simple: δ = ρ√(n − 1), assuming that df are large enough for t to be approximately normal. Cohen's benchmarks for effect sizes for r are: .10 is small (but not necessarily trivial), .30 is medium, and .50 is large (Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159). For our burger-beer data, how much power would we have if the effect size was large in the population, that is, ρ =
.50? δ = ρ√(n − 1) = .5√4 = 1.00. From our power table, using the traditional .05 criterion of significance, we then see that power is only .17. As stated earlier, a Type II error is quite likely here. How many subjects would we need to have 95% power to detect even a small effect? Lots: n = (δ/ρ)² + 1 = (3.60/.10)² + 1 = 1297. That is a lot of burgers and beer. See the document R2 Power Analysis.

Correcting for Measurement Error in Bivariate Linear Correlations

The following draws upon the material presented in the article: Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1, 199-223. When one is using observed variables to estimate the correlation between the underlying constructs which those observed variables measure, one should correct the correlation between the observed variables for attenuation due to measurement error. Such a correction will give you an estimate of what the correlation is between the two constructs (underlying variables), that is, what the correlation would be if we were able to measure the two constructs without measurement error. Measurement error results in less than perfect values for the reliability of an instrument. To correct for the attenuation resulting from such lack of perfect reliability, one can apply the following correction:

rX′Y′ = rXY / √(rXX · rYY),

where rX′Y′ is our estimate of the correlation between the constructs, corrected for attenuation; rXY is the observed correlation between X and Y in our sample; rXX is the reliability of variable X; and rYY is the reliability of variable Y.

Here is an example from my own research. I obtained the correlation between misanthropy and attitude towards animals for two groups: idealists, for whom I predicted there would be only a weak correlation, and nonidealists, for whom I predicted a stronger correlation. The observed correlation was .02 for the idealists, .36 for the nonidealists. The reliability (Cronbach alpha) was .91 for the attitude towards animals instrument (which had 28 items), but only .66 for the misanthropy instrument (not surprising, given that it had only 5 items).
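The correction formula above is easily wrapped in a small function. This is my own sketch (the function name is mine), applied to the observed correlations and reliabilities just given:

```python
import math

def disattenuate(r_xy, r_xx, r_yy):
    """Correct an observed correlation for attenuation due to
    measurement error, given the reliabilities of the two variables."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Misanthropy / animal-attitude example: alpha reliabilities .66 and .91
print(round(disattenuate(0.36, 0.66, 0.91), 2))  # nonidealists: 0.46
print(round(disattenuate(0.02, 0.66, 0.91), 2))  # idealists: 0.03
```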
When we correct the observed correlation for the nonidealists, we obtain rX′Y′ = .36 / √((.66)(.91)) = .46, a much more impressive correlation. When we correct the correlation for the idealists, the corrected r is only .03. I should add that Cronbach's alpha underestimates a test's reliability, so this correction is an overcorrection. It is preferable to use maximized lambda4 as the estimate of reliability. Using lambda4 estimates of reliability (.78 and .93), the corrected r is rX′Y′ = .36 / √((.78)(.93)) = .42.

Testing Other Hypotheses

H0: ρ1 = ρ2. One may also test the null hypothesis that the correlation between X and Y in one population is the same as the correlation between X and Y in another population. See our textbook for the statistical procedures. One interesting and controversial application of this test is testing the null hypothesis that the correlation between IQ and grades in school is the same for Blacks as it is for Whites. Poteat, Wuensch, and Gregg (1988, Journal of School Psychology, 26, 59-68) were not able to reject that null hypothesis.

H0: ρWX = ρWY. If you wish to compare the correlation between one pair of variables with that between a second, overlapping, pair of variables (for example, when comparing the correlation between one IQ test and grades with the correlation between a second IQ test and grades), use Williams' procedure, explained in our textbook, or use Hotelling's more traditional solution, available from Wuensch and elsewhere. It is assumed that the correlations for both pairs of variables have been computed on the same set of subjects. Should you get seriously interested in this sort of analysis, consult this reference: Meng, Rosenthal, & Rubin (1992). Comparing correlated correlation coefficients. Psychological Bulletin, 111, 172-175.

H0: ρWX = ρYZ. If you wish to compare the correlation between one pair of variables with that between a second, nonoverlapping, pair of variables, read the article: Raghunathan, T. E., Rosenthal, R., & Rubin, D. B. Comparing correlated but nonoverlapping correlations. Psychological Methods, 1996, 1, 178-183.
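For H0: ρ1 = ρ2 with two independent samples, the usual textbook procedure transforms each r with Fisher's z' and refers the difference to a normal distribution. A sketch of the mechanics, using hypothetical sample sizes that are not from any of the studies cited here:

```python
import math

def compare_independent_rs(r1, n1, r2, n2):
    """Fisher z test of H0: rho1 = rho2 for correlations computed on
    two independent samples; returns the z statistic."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

# Hypothetical group sizes, chosen only to illustrate the computation.
z = compare_independent_rs(0.36, 154, 0.02, 91)
print(round(z, 2))  # 2.66
```

Note that this procedure applies only to independent samples; the overlapping and nonoverlapping correlated cases discussed above require the Williams/Hotelling and Raghunathan-Rosenthal-Rubin procedures, respectively.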
H0: ρ = some nonzero value. Our textbook also shows how to test the null hypothesis that a correlation has a particular value, not necessarily zero, and how to place confidence limits on our estimation of a correlation coefficient. For example, we might wish to test the null hypothesis that in grad school the r between IQ and grades is .5 (the value most often reported for this correlation in primary and secondary schools), and then put 95% confidence limits on our estimation of the population ρ.

Shrunken r2

Please note that these procedures require the same assumptions made for testing the null hypothesis that ρ is zero. There are, however, no assumptions necessary to use r as a descriptive statistic, to describe the strength of linear association between X and Y in the data you have. For a relatively unbiased estimate of the population r² (requiring no assumptions), compute the shrunken r²:

1 − (1 − r²)(n − 1)/(n − 2) = 1 − (1 − .64)(4)/(3) = .52.

This corrects for the tendency to get overestimates of ρ² from small samples. What is the value of r if n = 2? How well can you fit any two points in Cartesian space with a straight line? See my document What is R2 When N = p + 1 and df = 0 for the answer to this question.

Spearman rho

When one's data are ranks, one may compute the Spearman correlation for ranked data (also called the Spearman ρ), which is computed, and significance-tested, exactly as is Pearson r (if n < 10, find a special table for testing the significance of the Spearman ρ). The Spearman ρ measures the linear association between pairs of ranks. If one's data are not ranks, but one converts the raw data into ranks prior to computing the correlation coefficient, the Spearman ρ measures the degree of monotonicity between the original variables. If every time X goes up, Y goes up (the slope of the line relating X to Y is always positive), there is a perfect positive monotonic relationship, but not necessarily a perfect linear relationship (for which the slope would have to be constant).
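The difference between what Pearson r and Spearman ρ measure can be demonstrated by correlating ranks. A Python sketch (not part of the notes; the helper names are mine) using the data considered next, where the relationship is perfectly monotonic but far from linear:

```python
import math

def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sscp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sscp / math.sqrt(ssx * ssy)

def ranks(xs):
    """Ranks 1..n (no ties in this example, so no midranks are needed)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

X = [1.0, 1.9, 2.0, 2.9, 3.0, 3.1, 4.0, 4.1, 5.0]
Y = [10, 99, 100, 999, 1000, 1001, 10000, 10001, 100000]

r_raw = pearson(X, Y)                      # < 1: the relationship is nonlinear
r_spearman = pearson(ranks(X), ranks(Y))   # 1.0: perfectly monotonic
print(round(r_raw, 2), r_spearman)
```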
Consider the following data:

X:  1.0   1.9   2.0   2.9    3.0    3.1     4.0     4.1      5.0
Y:   10    99   100   999   1000   1001   10000   10001   100000

You should run the program Spearman.sas on my SAS Programs web page. It takes these data, transforms them into ranks, and then prints out the new data. The first page of output shows the original data, the ranked data, and also the Y variable after a base 10 log transformation. A plot of the raw data shows a monotonic but distinctly nonlinear relationship. A plot of X by the log of Y shows a nearly perfect linear relationship. A plot of the ranks shows a perfect relationship. PROC CORR is then used to compute Pearson, Spearman, and Kendall tau correlation coefficients.

How Do Behavioral Scientists Use Correlation Analyses?

1. To measure the linear association between two variables, without establishing any cause-effect relationship.

2. As a necessary (and suggestive), but not sufficient, condition to establish causality. If changing X causes Y to change, then X and Y must be correlated (but the correlation is not necessarily linear). X and Y may, however, be correlated without X causing Y. It may be that Y causes X. Maybe increasing Z causes increases in both X and Y, producing a correlation between X and Y with no cause-effect relationship between X and Y. For example, smoking cigarettes is well known to be correlated with health problems in humans, but we cannot do experimental research on the effect of smoking upon humans' health. Experimental research with rats has shown a causal relationship, but we are not rats. One alternative explanation of the correlation between smoking and health problems in humans is that there is a third variable (or constellation of variables), a genetic disposition or personality, that is causally related to both smoking and the development of health problems. That is, if you have this disposition, it causes you to smoke and it causes you to have health problems, creating a spurious correlation between smoking and health problems; but the disposition that caused the smoking would have caused the health problems whether or
not you smoked. No, I do not believe this model, but the data on humans cannot rule it out.

As another example of a third-variable problem, consider the strike by PATCO, the union of air traffic controllers, back during the Reagan years. The union cited statistics showing that air traffic controllers had a much higher than normal incidence of stress-related illnesses (hypertension, heart attacks, drug abuse, suicide, divorce, etc.). They said that this was caused by the stress of the job, and they demanded better benefits to deal with the stress (no mandatory overtime, rotation between high-stress and low-stress job positions, etc.). The government crushed the strike (fired all the controllers), invoking a third-variable explanation of the observed correlation between working in air traffic control and these illnesses. They said that the air traffic controller profession attracted persons of a certain disposition, Type A individuals, perfectionists who seem always to be under time pressure, and that these individuals would get those illnesses whether they worked in air traffic control or not. Accordingly, the government said the problem was the fault of the individuals, not the job. Maybe the government would prefer that we hire only Type B controllers, folks who take it easy and don't get so upset when they see two blips converging on the radar screen.

3. To establish an instrument's reliability. A reliable instrument is one which will produce about the same measurements when the same objects are measured repeatedly, in which case the scores at one time should be well correlated with the scores at another time (and have equivalent means and variances as well).

4. To establish an instrument's criterion-related validity. A valid instrument is one which measures what it says it measures. One way to establish such validity is to show that there is a strong positive correlation between scores on the instrument and an independent measure of the attribute being measured. For example, the Scholastic Aptitude Test was designed to measure individuals'
ability to do well in college. Showing that scores on this test are well correlated with grades in college establishes the test's validity.

5. To do independent groups t tests. If the X variable (groups) is coded 0, 1 (or any other two numbers), and we obtain the r between X and Y, a significance test of the hypothesis that ρ = 0 will yield exactly the same t and p as the traditional pooled-variances independent groups t test. In other words, the independent groups t test is just a special case of correlation analysis, where the X variable is dichotomous and the Y variable is normally distributed. This r is called a point-biserial r. It can also be shown that the 2 × 2 Pearson chi-square test is a special case of r: when both X and Y are dichotomous, the r is called phi (φ).

6. One can measure the correlation between Y and an optimally weighted set of two or more X's. Such a correlation is called a multiple correlation. A model with multiple predictors might well predict a criterion variable better than would a model with just a single predictor variable. Consider the research reported by McCammon, Golden, and Wuensch in the Journal of Research in Science Education, 1988, 25, 501-510. Subjects were students in freshman- and sophomore-level physics courses (only those courses that were designed for science majors, no general education "football physics" courses). The mission was to develop a model to predict performance in the course. The predictor variables were CT (the Watson-Glaser Critical Thinking Appraisal), PMA (Thurstone's Primary Mental Abilities Test), ARI (the College Entrance Exam Board's Arithmetic Skills Test), ALG (the College Entrance Exam Board's Elementary Algebra Skills Test), and ANX (the Mathematics Anxiety Rating Scale). The criterion variable was subjects' scores on course examinations. Our results indicated that we could predict performance in the physics classes much better with a combination of these predictors than with just any one of them. At Susan McCammon's insistence, I also separately analyzed the
data from female and male students. Much to my surprise, I found a remarkable sex difference: among female students every one of the predictors was significantly related to the criterion; among male students none of the predictors was. A posteriori searching of the literature revealed that Anastasi (Psychological Testing, 1982) had noted a relatively consistent finding of sex differences in the predictability of academic grades, possibly due to women being more conforming and more accepting of academic standards (better students), so that women put maximal effort into their studies, whether or not they like the course, and accordingly they work up to their potential. Men, on the other hand, may be more fickle, putting forth maximum effort only if they like the course, thus making it difficult to predict their performance solely from measures of ability. ANOVA, which we shall cover later, can be shown to be a special case of multiple correlation/regression analysis.

7. One can measure the correlation between an optimally weighted set of Y's and an optimally weighted set of X's. Such an analysis is called canonical correlation, and almost all inferential statistics in common use can be shown to be special cases of canonical correlation analysis. As an example of a canonical correlation, consider the research reported by Patel, Long, McCammon, & Wuensch (Journal of Interpersonal Violence, 1995, 10, 354-366). We had two sets of data on a group of male college students. The one set was personality variables from the MMPI. One of these was the Pd (psychopathically deviant) scale, Scale 4, on which high scores are associated with general social maladjustment and hostility. The second was the Mf (masculinity/femininity) scale, Scale 5, on which low scores are associated with stereotypical masculinity. The third was the Ma (hypomania) scale, Scale 9, on which high scores are associated with overactivity, flight of ideas, low frustration tolerance, narcissism, irritability, restlessness, hostility, and difficulty with controlling impulses.
The fourth MMPI variable was Scale K, which is a validity scale on which high scores indicate that the subject is clinically defensive, attempting to present himself in a favorable light, and low scores indicate that the subject is unusually frank. The second set of variables was a pair of homonegativity variables. One was the IAH (Index of Attitudes Towards Homosexuals), designed to measure affective components of homophobia. The second was the SBS (Self-Report of Behavior Scale), designed to measure past aggressive behavior towards homosexuals, an instrument specifically developed for this study. Our results indicated that high scores on the SBS and the IAH were associated with stereotypical masculinity (low Scale 5), frankness (low Scale K), impulsivity (high Scale 9), and general social maladjustment and hostility (high Scale 4). A second relationship found showed that having a low IAH but a high SBS (not being homophobic but nevertheless aggressing against gays) was associated with being high on Scales 5 (not being stereotypically masculine) and 9 (impulsivity). This relationship seems to reflect a general (not directed towards homosexuals) aggressiveness; in the words of one of my graduate students, being "an equal opportunity bully."

Links (all recommended reading; in other words, know it for the test):
Biserial and Polychoric Correlation Coefficients
Comparing Correlation Coefficients, Slopes, and Intercepts
Confidence Intervals on R2 or R
Contingency Tables with Ordinal Variables
Correlation and Causation
Cronbach's Alpha and Maximized Lambda4
Inter-Rater Agreement
Residuals Plots: how to make them and interpret them
Tetrachoric Correlation: what it is and how to compute it