# Educational Statistics EDUR 8131

These 29 pages of class notes were uploaded by Carli Abbott on Monday, October 12, 2015. The notes belong to EDUR 8131 at Georgia Southern University, taught by Bryan Griffin in Fall.

Date Created: 10/12/15

## Notes 7: Chi-Square Tests ($\chi^2$ tests)

Version 2/14/2005

The Z test, t-test, and Pearson's r all assume that at least one of the variables, usually the dependent variable, is measured on the interval scale. When the variables of interest are nominal or categorical, these statistical tests are often inappropriate, but the chi-square may prove useful. Two types of chi-square tests exist: goodness-of-fit, and the contingency table chi-square (or test of association).

### 1. Goodness-of-Fit (one nominal or categorical variable present)

With the goodness-of-fit test, one is usually interested in determining whether a given distribution of data follows an expected pattern. For example, suppose one wishes to know whether the distribution of births throughout the year is random, with equal probabilities or frequencies during the year.

#### 1a. Hypotheses

The null hypothesis states that the distribution of births throughout the year is random and has equal frequencies, i.e.,

- $H_0$: frequency of births equally distributed throughout the year, or
- $H_0$: $f_1 = f_2 = f_3 = \dots = f_6$, or
- $H_0$: $\text{distribution}_{\text{pop}} = \text{distribution}_{\text{theory}}$

and the alternative hypothesis is

- $H_1$: not $H_0$ (births not equally distributed throughout the year), or
- $H_1$: the frequencies $f_1, f_2, \dots, f_6$ are not all equal, or
- $H_1$: $\text{distribution}_{\text{pop}} \ne \text{distribution}_{\text{theory}}$

#### 1b. Example data

Assume the researcher obtained information about frequencies of birth from the local hospital. The following frequencies were observed:

| | Jan–Feb | March–April | May–June | July–August | Sept–October | Nov–Dec |
|---|---|---|---|---|---|---|
| observed | 71 | 78 | 83 | 94 | 112 | 112 |

Total births = 552

Does this distribution seem likely if births are indeed equally likely throughout the year? That is, does the null hypothesis of equality of births throughout the year seem tenable? The chi-square goodness-of-fit test can be used to assess this possibility.

| | Jan–Feb | March–April | May–June | July–August | Sept–October | Nov–Dec |
|---|---|---|---|---|---|---|
| observed | 71 | 78 | 83 | 94 | 112 | 112 |
| expected | 92 | 92 | 92 | 92 | 92 | 92 |

Total births = 552; expected = 552/6 = 92

#### 1c. Calculating $\chi^2$ (chi-square)

The chi-square goodness-of-fit statistic to test $H_0$ can be calculated using the following formula:

$$\chi^2 = \sum_{j} \frac{(O_j - E_j)^2}{E_j}$$

Some authors present an alternative formula based upon proportions. The formula above, based upon frequencies, is presented in most other textbooks and is the one used for the remainder of this section.

The chi-square goodness-of-fit formula can be explained as follows:

1. $j$ = the unique cells or categories in the table of frequencies;
2. $O$ = the observed frequency in cell $j$;
3. $E$ = the expected frequency in cell $j$;
4. $\sum$ = a summation sign: add up all squared terms once division has occurred.

The expected frequencies ($E$) are determined by theory. In the example above, it was expected that the frequencies of births would be equally distributed across the year. Since 552 births were observed during the year, 552/6 = 92 births are expected every two months. If, for example, one wanted the expected frequency for every month, it would be 552/12 = 46.

The value of $\chi^2$ is obtained as follows:

$$\chi^2 = \frac{(71-92)^2}{92} + \frac{(78-92)^2}{92} + \frac{(83-92)^2}{92} + \frac{(94-92)^2}{92} + \frac{(112-92)^2}{92} + \frac{(112-92)^2}{92}$$

$$= \frac{441}{92} + \frac{196}{92} + \frac{81}{92} + \frac{4}{92} + \frac{400}{92} + \frac{400}{92}$$

$$= 4.793 + 2.130 + 0.880 + 0.043 + 4.348 + 4.348 = 16.542$$

The $\chi^2$ distributions (a) are positively skewed, (b) have a minimum of zero, and (c) have just one parameter, which is their degrees of freedom (df).

#### 1d. Degrees of freedom

The df for goodness-of-fit chi-squares is defined as

$$df \text{ (or } \nu) = J - 1$$

where $J$ is the number of categories present. Since there were six categories in the example data, there are df = 6 − 1 = 5 (five df).

#### 1e. Testing $H_0$

To statistically test the tenability of the null hypothesis, one must determine whether the calculated value of $\chi^2$ exceeds what would be expected by chance given that $H_0$ is true, i.e., does the calculated $\chi^2$ exceed the critical value of $\chi^2$? The critical $\chi^2$, or $\chi^2_{crit}$, can be found in Table 1 below.

Table 1. Upper Critical Values for Chi-square (columns give $\alpha$)

| df | .10 | .05 | .01 | df | .10 | .05 | .01 |
|---|---|---|---|---|---|---|---|
| 1 | 2.706 | 3.841 | 6.635 | 51 | 64.295 | 68.669 | 77.386 |
| 2 | 4.605 | 5.991 | 9.210 | 52 | 65.422 | 69.832 | 78.616 |
| 3 | 6.251 | 7.815 | 11.345 | 53 | 66.548 | 70.993 | 79.843 |
| 4 | 7.779 | 9.488 | 13.277 | 54 | 67.673 | 72.153 | 81.069 |
| 5 | 9.236 | 11.070 | 15.086 | 55 | 68.796 | 73.311 | 82.292 |
| 6 | 10.645 | 12.592 | 16.812 | 56 | 69.919 | 74.468 | 83.513 |
| 7 | 12.017 | 14.067 | 18.475 | 57 | 71.040 | 75.624 | 84.733 |
| 8 | 13.362 | 15.507 | 20.090 | 58 | 72.160 | 76.778 | 85.950 |
| 9 | 14.684 | 16.919 | 21.666 | 59 | 73.279 | 77.931 | 87.166 |
| 10 | 15.987 | 18.307 | 23.209 | 60 | 74.397 | 79.082 | 88.379 |
| 11 | 17.275 | 19.675 | 24.725 | 61 | 75.514 | 80.232 | 89.591 |
| 12 | 18.549 | 21.026 | 26.217 | 62 | 76.630 | 81.381 | 90.802 |
| 13 | 19.812 | 22.362 | 27.688 | 63 | 77.745 | 82.529 | 92.010 |
| 14 | 21.064 | 23.685 | 29.141 | 64 | 78.860 | 83.675 | 93.217 |
| 15 | 22.307 | 24.996 | 30.578 | 65 | 79.973 | 84.821 | 94.422 |
| 16 | 23.542 | 26.296 | 32.000 | 66 | 81.085 | 85.965 | 95.626 |
| 17 | 24.769 | 27.587 | 33.409 | 67 | 82.197 | 87.108 | 96.828 |
| 18 | 25.989 | 28.869 | 34.805 | 68 | 83.308 | 88.250 | 98.028 |
| 19 | 27.204 | 30.144 | 36.191 | 69 | 84.418 | 89.391 | 99.228 |
| 20 | 28.412 | 31.410 | 37.566 | 70 | 85.527 | 90.531 | 100.425 |
| 21 | 29.615 | 32.671 | 38.932 | 71 | 86.635 | 91.670 | 101.621 |
| 22 | 30.813 | 33.924 | 40.289 | 72 | 87.743 | 92.808 | 102.816 |
| 23 | 32.007 | 35.172 | 41.638 | 73 | 88.850 | 93.945 | 104.010 |
| 24 | 33.196 | 36.415 | 42.980 | 74 | 89.956 | 95.081 | 105.202 |
| 25 | 34.382 | 37.652 | 44.314 | 75 | 91.061 | 96.217 | 106.393 |
| 26 | 35.563 | 38.885 | 45.642 | 76 | 92.166 | 97.351 | 107.583 |
| 27 | 36.741 | 40.113 | 46.963 | 77 | 93.270 | 98.484 | 108.771 |
| 28 | 37.916 | 41.337 | 48.278 | 78 | 94.374 | 99.617 | 109.958 |
| 29 | 39.087 | 42.557 | 49.588 | 79 | 95.476 | 100.749 | 111.144 |
| 30 | 40.256 | 43.773 | 50.892 | 80 | 96.578 | 101.879 | 112.329 |
| 31 | 41.422 | 44.985 | 52.191 | 81 | 97.680 | 103.010 | 113.512 |
| 32 | 42.585 | 46.194 | 53.486 | 82 | 98.780 | 104.139 | 114.695 |
| 33 | 43.745 | 47.400 | 54.776 | 83 | 99.880 | 105.267 | 115.876 |
| 34 | 44.903 | 48.602 | 56.061 | 84 | 100.980 | 106.395 | 117.057 |
| 35 | 46.059 | 49.802 | 57.342 | 85 | 102.079 | 107.522 | 118.236 |
| 36 | 47.212 | 50.998 | 58.619 | 86 | 103.177 | 108.648 | 119.414 |
| 37 | 48.363 | 52.192 | 59.893 | 87 | 104.275 | 109.773 | 120.591 |
| 38 | 49.513 | 53.384 | 61.162 | 88 | 105.372 | 110.898 | 121.767 |
| 39 | 50.660 | 54.572 | 62.428 | 89 | 106.469 | 112.022 | 122.942 |
| 40 | 51.805 | 55.758 | 63.691 | 90 | 107.565 | 113.145 | 124.116 |
| 41 | 52.949 | 56.942 | 64.950 | 91 | 108.661 | 114.268 | 125.289 |
| 42 | 54.090 | 58.124 | 66.206 | 92 | 109.756 | 115.390 | 126.462 |
| 43 | 55.230 | 59.304 | 67.459 | 93 | 110.850 | 116.511 | 127.633 |
| 44 | 56.369 | 60.481 | 68.710 | 94 | 111.944 | 117.632 | 128.803 |
| 45 | 57.505 | 61.656 | 69.957 | 95 | 113.038 | 118.752 | 129.973 |
| 46 | 58.641 | 62.830 | 71.201 | 96 | 114.131 | 119.871 | 131.141 |
| 47 | 59.774 | 64.001 | 72.443 | 97 | 115.223 | 120.990 | 132.309 |
| 48 | 60.907 | 65.171 | 73.683 | 98 | 116.315 | 122.108 | 133.476 |
| 49 | 62.038 | 66.339 | 74.919 | 99 | 117.407 | 123.225 | 134.642 |
| 50 | 63.167 | 67.505 | 76.154 | 100 | 118.498 | 124.342 | 135.807 |

If $\alpha = .05$, the critical value for the example data is $\chi^2_{crit} = 11.07$. To test $H_0$, simply compare the obtained $\chi^2$ against the critical value; if the obtained value is larger, reject $H_0$.

Decision rule: If $\chi^2 \ge \chi^2_{crit}$, then reject $H_0$; otherwise fail to reject (FTR) $H_0$.

With the current example, the decision rule is: If $16.542 \ge 11.07$, then reject $H_0$; otherwise FTR $H_0$.

So reject the null at alpha equal to .05 and conclude that births do not appear to be randomly distributed, i.e., births seem to be more frequent during the fall and early winter months than during other times of the year.

If one were writing this result for a paper, it would be written as follows:

The statistical results ($\chi^2$ [df = 5] = 16.542, p < .05) indicate that births are not equally distributed throughout the year. Based upon the observed frequencies, it appears that the birth rate is highest for the months of September to December and lowest for the spring and summer months.

#### Exercises

1. Horseracing fans often maintain that in a race around a circular track, significant advantages accrue to the horses in certain post positions. Any horse's post position is his assigned post in the starting lineup. Position 1 is closest to the rail on the inside of the track, and position 8 is on the outside, farthest from the rail, in an 8-horse race. Test whether post position is related to race results. (This example is taken from S. Siegel, 1956, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill.) Listed below are the observed frequencies of 1st place finishers for each post position during a regular month at the tracks.

| Post Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| No. of wins | 29 | 19 | 18 | 25 | 17 | 10 | 15 | 11 |

Total wins = 144

- (a) What are the null and alternative hypotheses?
- (b) What are the expected frequencies if post position is unrelated to winning?
- (c) What are the obtained and critical chi-square statistics and df if alpha is set at the .05 level?
- (d) What is the decision rule?
- (e) What are the results of
this test?

2. The director of athletics at the local high school wonders if the sports program is getting a proportional amount of support from each of the four classes represented in the high school. If there are roughly equal numbers of students in each of the classes, what does the following breakdown of attendance figures from a random sample of students in attendance at a recent basketball game suggest? (This example is taken from J. F. Healey, 1993, Statistics: A Tool for Social Research, 3rd ed., Wadsworth.)

| Class | Frequency |
|---|---|
| Freshmen | 200 |
| Sophomores | 150 |
| Juniors | 120 |
| Seniors | 110 |

Total attendance = 580

- (a) What are the null and alternative hypotheses?
- (b) What are the expected frequencies for attendance by class?
- (c) What are the obtained and critical chi-square statistics and df if alpha is set at the .01 level?
- (d) What is the decision rule?
- (e) What are the results of this test?

3. Suppose someone has a hypothesis that the "Transylvania effect" of the full moon is related to the incidence of drug overdose. A search of medical files at a hospital yielded 1182 drug overdose cases which included the date. The full-moon phase was based on the actual dates of the full moon plus or minus 2 days, yielding 75 full-moon days and 381 non-full-moon days (the total number of days was 456 for the period investigated). The observed frequency of drug overdose was 196 during the full-moon days and 986 during the non-full-moon days.

- (a) What are the null and alternative hypotheses?
- (b) What are the expected frequencies (how does one calculate their values for this example)?
- (c) What are the obtained and critical chi-square statistics and df if alpha is set at the .05 level?
- (d) What is the decision rule?
- (e) What are the results of this test?

### 2. Chi-Square Test of Association, or Chi-Square Test of Contingency Tables (two nominal or categorical variables present)

With the goodness-of-fit test, one is interested in determining whether a given distribution of data follows an expected pattern. For the test of association, one is interested in learning
whether two or more categorical variables are related. It is most typical to find two categorical variables depicted in a contingency table (a cross-tabulation of the frequencies for various combinations of the variables). Note that contingency tables are referred to as 2-by-3, 3-by-3, etc., where the numerals are determined by the number of rows (R) and columns (C) in the table. If, for example, there is a table with two rows and two columns, the table is an R × C or 2 × 2 (2-by-2) table.

At issue in the following research question is whether the policy of allowing college faculty to take on outside consultation for a fee is supported uniformly by tenured and untenured faculty. The data are as follows (example taken from D. E. Hinkle et al., 1979, Applied Statistics for the Behavioral Sciences, Rand McNally):

| | Support Policy | Do not Support Policy |
|---|---|---|
| Tenured | 88 | 17 |
| Non-tenured | 84 | 11 |

Total = 200

#### 1a. Hypotheses

The null hypothesis states that there is no relationship between the two variables, i.e., that support for the consulting policy is independent of the tenure status of the faculty, or that there is no difference between tenured and non-tenured faculty regarding their support of the consulting policy:

- $H_0$: $\text{distribution}_{\text{tenured}} = \text{distribution}_{\text{non-tenured}}$ (the distributions are equal), or
- $H_0$: variable A (tenure status) is independent of variable B (policy support)

and the alternative hypothesis is

- $H_1$: some difference in the distributions, or
- $H_1$: variables A and B are associated (not independent)

#### 1b. Determining Expected Values

Expected values are determined by the column and row marginal frequencies. Marginal frequencies are pointed out below.

| | Support Policy | Do not Support Policy | Marginal Row |
|---|---|---|---|
| Tenured | 88 | 17 | 88 + 17 = 105 |
| Non-tenured | 84 | 11 | 84 + 11 = 95 |
| Marginal Column | 88 + 84 = 172 | 17 + 11 = 28 | Grand Total: 172 + 28 = 200 |

The following formula can be used to calculate expected frequencies for a given row $r$ and column $c$ (e.g., $r = 1$ and $c = 1$, which corresponds to the cell "Tenured" and "Support Policy"):

$$E_{rc} = \frac{(\text{row}_r \text{ total})(\text{column}_c \text{ total})}{N}$$

where $E_{rc}$ is the expected
value for row $r$ and column $c$, row$_r$ total is the marginal frequency for row $r$, column$_c$ total is the marginal frequency for column $c$, and $N$ is the total sample size.

For the current example, the expected values are:

1. $r = 1$, $c = 1$ (tenured and support policy): $E_{11} = \dfrac{(105)(172)}{200} = \dfrac{18060}{200} = 90.3$
2. $r = 1$, $c = 2$ (tenured and do not support policy): $E_{12} = \dfrac{(105)(28)}{200} = \dfrac{2940}{200} = 14.7$
3. $r = 2$, $c = 1$ (non-tenured and support policy): $E_{21} = \dfrac{(95)(172)}{200} = \dfrac{16340}{200} = 81.7$
4. $r = 2$, $c = 2$ (non-tenured and do not support policy): $E_{22} = \dfrac{(95)(28)}{200} = \dfrac{2660}{200} = 13.3$

| | Support Policy | Do not Support Policy | Marginal Row |
|---|---|---|---|
| Tenured | 88 (90.3) | 17 (14.7) | 105 |
| Non-tenured | 84 (81.7) | 11 (13.3) | 95 |
| Marginal Column | 172 | 28 | 200 |

Note: Expected values are in parentheses.

#### 1c. Calculating $\chi^2$ (chi-square)

The chi-square test of association statistic used to test $H_0$ can be calculated using the following formula:

$$\chi^2 = \sum \frac{(O_{rc} - E_{rc})^2}{E_{rc}}$$

The chi-square test of association formula can be explained as follows:

1. $rc$ = the unique cells or categories in the table of frequencies;
2. $O$ = the observed frequency in cell $rc$;
3. $E$ = the expected frequency in cell $rc$;
4. $\sum$ = a summation sign: add up all squared terms once division has occurred.

The expected frequencies ($E$) are determined in the manner demonstrated above in part b.

The value of $\chi^2$ is obtained as follows:

$$\chi^2 = \frac{(88-90.3)^2}{90.3} + \frac{(84-81.7)^2}{81.7} + \frac{(17-14.7)^2}{14.7} + \frac{(11-13.3)^2}{13.3} = 0.06 + 0.06 + 0.36 + 0.40 = 0.88$$

The $\chi^2$ distributions (a) are positively skewed, (b) have a minimum of zero, and (c) have just one parameter, which is their degrees of freedom (df).

#### 1d. Degrees of freedom

The df for association chi-squares is defined as

$$df \text{ (or } \nu) = (R - 1)(C - 1)$$

where $R$ is the number of rows present and $C$ is the number of columns present. Since there were two rows and two columns in the example data, there is df = (2 − 1)(2 − 1) = 1.

#### e. Testing $H_0$

To statistically test the tenability of the null hypothesis, one must determine whether the calculated value of $\chi^2$ exceeds what would be expected by chance given that $H_0$ is true, i.e., does the calculated $\chi^2$ exceed the critical value of $\chi^2$? The critical $\chi^2$, or $\chi^2_{crit}$, can be found in Table 1 above.

If $\alpha = .05$, the critical value
for the example data is $\chi^2_{crit} = 3.84$. To test $H_0$, simply compare the obtained $\chi^2$ against the critical value; if the obtained value is larger, reject $H_0$.

Decision rule: If $\chi^2 \ge \chi^2_{crit}$, then reject $H_0$; otherwise fail to reject (FTR) $H_0$.

With the current example, the decision rule is: If $0.88 \ge 3.84$, then reject $H_0$; otherwise FTR $H_0$.

So fail to reject the null at alpha equal to .05 and conclude that policy support does not depend upon tenure status.

If one were writing this result for a paper, it would be written as follows:

The statistical results ($\chi^2$ [df = 1] = 0.88, p > .05) indicate that one's decision to support the policy of consultations does not appear to be associated with one's tenure status. In short, support of the consulting policy is independent of the tenure status of the faculty.

#### f. Computer output for the previous example

| row \ col | 1 | 2 | Total |
|---|---|---|---|
| 1 | 88 | 17 | 105 |
| 2 | 84 | 11 | 95 |
| Total | 172 | 28 | 200 |

Pearson chi2(1) = 0.8809, Pr = 0.348

#### Exercises

1. A researcher wishes to determine whether an experimental treatment, RPT, enhances achievement and academic self-efficacy. The researcher must use two intact classes for the experiment since random assignment is not possible. Good research requires that the experimental and control groups be as equivalent as possible at the start of the experiment to ensure adequate internal validity. To help establish that the two classes are equivalent, the researcher plans to collect IQ and ITBS scores to determine whether a statistical difference exists between the two groups on these measures. In addition, the researcher will try to show that the two groups also have similar racial distributions. The following data are collected for the two classes:

| | Black | Hispanic | White |
|---|---|---|---|
| Class 1 | 15 | 7 | 10 |
| Class 2 | 12 | 6 | 17 |

- (a) What are the null and alternative hypotheses?
- (b) What are the expected frequencies?
- (c) What are the obtained and critical chi-square statistics and df if alpha is set at the .05 level?
- (d) What is the decision rule?
- (e) What are the results of this test?

2. Is there a relationship between high school
program of study and whether the student eventually dropped out of college? Some educators argue that students who study under college preparatory programs are much better prepared for college than are students who studied under general education programs or vocational education programs. Listed below are dropout figures for students enrolled in a medium-sized midwestern university. Determine whether dropping out is related to program of study in high school.

| High School Program of Study | Dropped Out of College | Graduated from College |
|---|---|---|
| Vocational | 289 | 323 |
| General | 334 | 456 |
| College Prep | 230 | 698 |

- (a) What are the null and alternative hypotheses?
- (b) What are the expected frequencies?
- (c) What are the obtained and critical chi-square statistics and df if alpha is set at the .01 level?
- (d) What is the decision rule?
- (e) What are the results of this test?

## Notes 6: Correlation

### 1. Correlation

- correlation: this term usually refers to the degree of relationship or association between two quantitative variables, such as IQ and GPA, or GPA and SAT, or HEIGHT and WEIGHT, etc.
- positive relationship (↑↑): two variables vary in the same direction, i.e., they covary together; as one variable increases, the other variable also increases; e.g., higher GPAs correspond to higher SATs and lower GPAs correspond to lower SATs
- negative (inverse) relationship (↑↓): as one variable increases, the other decreases; e.g., higher GPAs correspond with lower SATs. What type of relationship is it if both variables covary like ↓↓?
- scatterplots (scattergrams): graphs that illustrate the relationship between two variables; each point of the scatter represents scores on two variables for one case or individual

Figure 1 below shows a positive and relatively strong correlation between SAT and IQ. Three points are identified in Figure 1 with arrows. One individual scored 674 on SAT and 84 on IQ, one scored 183 on SAT and 58 on IQ, and another scored 342 on SAT and 113 on IQ. As these three points illustrate, each dot represents the combination of two
variables for one individual or case.

Figure 1. Correlation between SAT and IQ. [Scatterplot: Mathematics SAT Scores (M = 500, SD = 100) on the vertical axis against IQ Scores (M = 100, SD = 15) on the horizontal axis; labeled points at SAT = 674, IQ = 84; SAT = 183, IQ = 58; and SAT = 342, IQ = 113.]

Figure 2. Three Scatterplots. [Panels (a), (b), and (c).]

In Figure 2 are three scatterplots: (a) shows a positive relationship, (b) shows a negative relationship, and (c) depicts a curvilinear or nonlinear relationship.

- linear representation: can a single straight line be drawn for figure (a) that best represents the relationship between the two variables? What about (b)? (c)? (Draw the lines on the scatterplots.)

In many cases, the nature of the relationship between two variables can be represented by a line that fits among the scatter. Examples are presented in Figure 3.

Figure 3. Scatterplots with lines representing the general trend of the relationship.

### 2. Pearson's r

- Pearson's r, or the Product Moment Coefficient of Correlation: $r$ is a measure of the degree of linear relationship or association between two (usually quantitative) variables; the population correlation is denoted as $\rho$ (Greek rho), and $r$ refers to a correlation obtained from a sample
- calculating $r$: three formulas for $r$ are provided

Formula A:

$$r = \frac{\sum z_X z_Y}{n - 1}$$

where $n - 1$ is the sample size minus 1, $z_X$ are the z scores on variable X, $z_Y$ are the z scores on variable Y, and $r$ is Pearson's correlation coefficient.

Formula B:

$$r = \frac{s_{XY}}{s_X s_Y}$$

where $s_X$ is the standard deviation of variable X, $s_Y$ is the standard deviation of variable
Y, and $s_{XY}$ is the covariance of variables X and Y. $s_{XY}$ is computed as

$$s_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}$$

Formula C:

$$r = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

Each of these calculations is illustrated later.

### 3. Properties of Pearson's r

- linearity: $r$ can only measure linear relationships; note that it is possible for curvilinear relationships to exist (e.g., anxiety and performance)
- range of values: $r$ is bounded by −1.00 and +1.00; a perfect positive relationship is represented by +1.00; a perfect negative relationship is represented by −1.00; the closer $r$ is to 0.00, the weaker the linear relationship between the two variables
- $r = 0.00$: zero correlation indicates the weakest possible relationship, that is, no linear relationship between two variables; an $r$ of 0.00 does not rule out all possible relationships, since there is the possibility of a nonlinear or curvilinear relationship
- no variance: if either variable has zero variance ($s^2 = 0.00$), then there is no relationship between the two variables and Pearson's $r$ is undefined (but $r = 0.00$); if a variable has a variance of 0.00, then it does not vary and is therefore not a variable; it is a constant
- change of scale: $r$ will remain the same between two variables even if the scale of one or both of the variables is changed; e.g., converting X to z or T does not affect the value of $r$
- factors that may alter $r$: these factors may inflate $r$, deflate $r$, change the sign of $r$, or have no effect:
  - variability or restriction of range (SAT → GPA, SAT restricted)
  - extreme scores
  - combined data (boys and girls and math scores)

Examples of each of these are provided below.

#### Example of Range Restriction

Universities frequently make use of standardized tests in an effort to screen applicants. Below are descriptive statistics for GRE mathematics scores and first-year graduate GPA from 500 students. Note from the statistical output that the correlation is .59.

Output of `correlate GRE GPA, means` (obs = 500):

| Variable | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|
| GRE | 500 | 100 | 183.0000 | 800.0000 |
| GPA | 3.000795 | .8455727 | .2300000 | 4 |

| | GRE | GPA |
|---|---|---|
| GRE | 1.0000 | |
| GPA | 0.5932 | 1.0000 |

As Figure 4 shows, there is a positive relation between GRE and GPA. Note the ceiling effect of GPA, with a number of students earning 4.00 during their first year.

Figure 4. Scatterplot for GRE and GPA. [Scatterplot: GPA on the vertical axis against GRE (200 to 800) on the horizontal axis.]

What would happen if only students with GRE mathematics scores of 450 or better were admitted? How might that change the scatterplot and correlation? The data used to generate the descriptive statistics and scatterplot above are used again below, but this time restricting observations to students with a score of 450 or better on the GRE mathematics subsection. As the descriptive statistics and scatterplot show, the correlation between GRE and GPA is reduced from .59 to .41 as a result of the range restriction placed on GRE. Range restriction in a situation like this falsely implies that GRE scores provide less predictive power than is actually the case.

Output of `correlate GRE GPA if GRE>=450, means`:

| Variable | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|
| GRE | 549.8239 | 72.44123 | 450.0053 | 818.9229 |
| GPA | 3.277684 | .6831325 | .564881 | 4 |

| | GRE | GPA |
|---|---|---|
| GRE | 1.0000 | |
| GPA | 0.4122 | 1.0000 |

Figure 5. Scatterplot for GRE and GPA with only GRE scores 450 or better. [Scatterplot: GPA on the vertical axis against GRE (400 to 800) on the horizontal axis.]

#### Example of Extreme Score

Normally one would think that there should be no relationship between a male's height and his SAT verbal score. However, the data below from 25 men show a moderate correlation of r = .31 between height (measured in feet) and SAT verbal scores. How can this be? The scatterplot (Figure 6) provides the answer. Note in Figure 6 the extreme score (the outlier) showing one individual with a height of over 9 feet. This is a data entry error.

Output of `correlate SAT Height, means`:

| Variable | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|
| SAT | 500 | 100 | 311.5515 | 670.9754 |
| Height | 5.988138 | .8350335 | 5.026478 | 9.2 |

| | SAT | Height |
|---|---|---|
| SAT | 1.0000 | |
| Height | 0.3138 | 1.0000 |

Figure 6.
Correlation between Height and SAT verbal scores (note the extreme score, an outlier, to the right of the scatterplot). [Scatterplot: SAT on the vertical axis against Height (5 to 7 feet and beyond) on the horizontal axis.]

Removing the data entry error (the 9-foot-tall man) corrects the correlation estimate, as the correlation statistic below reveals. The correlation drops from an unlikely .31 to .05, much closer to .00.

Output of `correlate SAT Height if Height<8, means`:

| Variable | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|
| SAT | 492.876 | 95.45074 | 311.5515 | 629.8821 |
| Height | 5.854311 | .5102792 | 5.026478 | 6.934643 |

| | SAT | Height |
|---|---|---|
| SAT | 1.0000 | |
| Height | 0.0508 | 1.0000 |

#### Example of Combined Data

To be added.

### 4. Alternative Correlations

Pearson's r assumes both variables are continuous variables with an interval or ratio scale of measurement. Alternative measures of association exist which do not make this assumption.

- Spearman Rank Correlation, $r_{ranks}$: appropriate for two ordinal variables which are converted to ranks; once the variables are converted to ranks, simply apply the Pearson's r formula to the ranks to obtain $r_{ranks}$; if untied ranks exist (i.e., no ties exist), then the following formula will simplify calculation of $r_{ranks}$:

$$r_{ranks} = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$

where $D$ refers to the difference between the ranks on the two variables.

- Phi Coefficient, $\phi$ or $r_\phi$: appropriate for two dichotomous variables (a nominal variable with only two categories is referred to as dichotomous); one may apply Pearson's r to the two dichotomous variables to obtain $r_\phi$, although simplified formulas exist
- Point-Biserial Coefficient, $r_{pb}$: this correlation is appropriate when one variable is dichotomous and the other is continuous with either an interval or ratio scale of measurement; as with the other two, $r_{pb}$ is simply Pearson's r, although a simplified formula exists

Figure 7. Scatterplots with Correlations.

Summary of Correlation Coefficients

| Correlation Coefficient | Variable X | Variable Y |
|---|---|---|
| Pearson's r ($r$) | interval, ratio | interval, ratio |
| Spearman's Rank ($r_{ranks}$) | ordinal (ranked) | ordinal (ranked) |
| Phi ($r_\phi$) | dichotomous | dichotomous |
| Point-Biserial ($r_{pb}$) | dichotomous | interval, ratio |

Note: "ranked" refers to ordering the original data from highest to lowest and then assigning ordinal ranks, e.g., 1, 2, 3.

### 5. Proportional Reduction in Error (PRE), Predictable Variance, Variance Explained

- interpretation of $r$: the coefficient $r$ indicates the direction of linear association and, to a lesser extent, the strength of association; the closer the value of $r$ to 1.00 or −1.00, the stronger the relationship
- squaring $r$ ($r^2$): this value provides a more interpretable quantity: the proportional reduction in error, or the proportion of predictable variance accounted for, or the variance overlap between two variables (extent of covariation in percentage terms); simply put, $r^2$ is a measure of the amount of variability, in proportions, that overlaps between two variables (for a pictorial description, see Venn diagrams)
- $r^2$ is a measure of the strength of the relationship between two variables: the larger $r^2$, the stronger the relationship; $r^2$ is sometimes called the strength of association or the coefficient of determination
- $r^2$ ranges from 0.00 to 1.00; the closer to 1.00, the stronger the relationship

Viewing $r^2$ as the Proportional Reduction in Error (PRE) is perhaps easiest to understand. Assume that one is trying to predict freshmen college GPA. Suppose one knows from previous years that freshmen GPA ranges from 2.00 to 4.00 for students in a given university, with a mean of 3.00. Without knowing anything else about the students, one's best prediction for the likely values of GPA for a group of freshmen is the mean of 3.00, with a range of 2.00 to 4.00. If the mean for a given group of students is 2.50, then our predicted mean of 3.00 is in error. If the mean for another group of students is 3.1, then our predicted mean of 3.00 is again in error. Is there any way to reduce the amount of error one has in making predictions for GPA? The answer is yes, if one has access to additional information about each student.

Now suppose additional information about each student is available
such as their high school class rank and their rank based upon SAT scores. By using this information, it is possible to reduce errors in prediction. While correlation coefficients are not designed to provide prediction equations (regression is used for that purpose), squared correlation coefficients can provide information about the extent to which prediction error will be minimized. The squared correlation coefficient, $r^2$, indicates the proportional reduction in error that will result from knowing additional information such as SAT rank.

#### $r^2$: Proportional Reduction in Error Illustrated

Below is a correlation matrix showing that college GPA correlates .116 with High School Class Rank (HS_rank) and .697 with a student's SAT score rank (SAT_rank). Since both correlations are positive, they indicate that as rank (either HS or SAT) increases, GPA also increases.

Output of `correlate GPA HS_rank SAT_rank, means`:

| Variable | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|
| GPA | 3.1 | .3 | 2.162229 | 4.016737 |
| HS_rank | 50 | 28.93181 | 0 | 100 |
| SAT_rank | 50 | 28.93181 | 0 | 100 |

| | GPA | HS_rank | SAT_rank |
|---|---|---|---|
| GPA | 1.0000 | | |
| HS_rank | 0.1166 | 1.0000 | |
| SAT_rank | 0.6970 | 0.2829 | 1.0000 |

If one created a regression equation using high school class rank, the error in predicting college GPA would be reduced by $r^2 = .116^2 = .013$, or 1.3%. If, however, one were to use SAT score rank, the PRE (proportional reduction in error) would be $r^2 = .697^2 = .486$, or 48.6%, a big improvement over using just HS rank.

This reduction in error is loosely illustrated in Figures 8 and 9. Figure 8 shows the relationship between SAT rank and college GPA, while Figure 9 shows the scatterplot for HS class rank and college GPA. Both scatterplots have a gray band behind the scatter. This gray band represents an interval of predicted values. Note that for students with an SAT Rank of 0.00 the gray band ranges from a GPA of 2.20 to about 3.30, while for students with a HS Class Rank equal to 0.00 the range of predicted GPA falls between about 2.25 and 3.80. Note that the band of predicted values is tighter (more narrow) for SAT rank than for HS class
rank thus re ecting the better prediction capabilities for SAT ra Figure 9 Scatterplot with College GPA with Figure 8 Scatterplot with College GPA with HS Class Rank SAT Rank m II mquot m 3 1 Il 1quotth dIIIII H n V quotInquot m Mil um lam m quot gulp quotBl up quotIn Him quotHa Ca11ege GF A 3 Omega GF A a 25 n 2n 4 u an mu m n 6 SAT Rank Ignore marks below use quotCDocument5GSUCOURSESEDUR 8131Data and GraphsGPA Hsirank SATirankJitaquot clear twoway lfitci GPA SATirank stdf level99 scatter GPA SATirank lfit GPA SATirank clpatternsolld Clwldthth1ck ytltlecollege GPA xtltleSAT Rank legendoff ytick214 schemelts2mono xticku1u1uu twoway lfitci GPA Hsirank stdf level99 scatter GPA Hsirank ifit GPA Hsirank clpatternsolld Clwldthth1ck legend off ytick2 1 4 scheme sZmonO xtlckE1E1ElE ytltle College GPA xtltleHS class Rank Ignore marks above Version 2142005 6 Correlation and Causation A correlation between variables does not imply the existence of causation ie X a Y or Y a X A strong correlation does not imply causation e g r 98 re trucks and damage in urban areas neither does a weak correlation imply the lack of causation eg r 04 Causation can only be established Via experimental research and replications Correlational research ie nonexperimental research cannot be used to establish the existence of causal relationships 7 Testing a Pearson39s 39 quot coefficient r a Calculating a tratio for r When testing a correlation coefficient one is typically interested in whether the correlation is statistically different from a value of 000 that is one is interested in learning whether the calculated correlation is statistically different from no linear relationship r 000 The formula for obtaining a calculated tValue for the correlation is Nn72 xlir2 and the df or 1 are t df n 2 where n is the number of pairs of scores or the number of subjects the sample size Hypotheses for r Hypotheses for r include Nondirectional H1 p i 000 HO p 000 Directional onetail tests 1 Lowertail 1r is 
negative: $H_1: \rho < 0.00$; $H_0: \rho = 0.00$

Upper-tailed, when r is positive: $H_1: \rho > 0.00$; $H_0: \rho = 0.00$

Statistical significance level

The statistical significance level, alpha ($\alpha$), is usually set at the conventional .10, .05, or .01 level. Critical values one uses for testing the correlation are the same t-values used above for the one-sample t-test.

Decision rules

The decision rules for the test of the correlation follow.

Two-tailed tests: If $t \le -t_{crit}$ or $t \ge t_{crit}$, then reject $H_0$; otherwise, fail to reject $H_0$.

One-tailed test (upper-tailed, i.e., positive r): If $t \ge t_{crit}$, then reject $H_0$; otherwise, fail to reject $H_0$.

One-tailed test (lower-tailed, i.e., negative r): If $t \le -t_{crit}$, then reject $H_0$; otherwise, fail to reject $H_0$.

Note that $t_{crit}$ symbolizes the critical t-value.

An example

A researcher wishes to determine if a relationship exists between the number of hours spent studying and performance on a statistics exam. The researcher, not being very bright, was unsure whether the relationship would be negative or positive, so he specified a non-directional hypothesis (e.g., there will be a relationship between hours spent studying and performance on the statistics exam). Using the 17 students in his class, the researcher found the correlation between hours studied and exam grade to be r = .42. To determine if it was likely that this sample of students comes from a population with $\rho = 0$ (i.e., no linear relationship between the variables), the correlation was tested for statistical significance at the $\alpha = .01$ level:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{.42\sqrt{17-2}}{\sqrt{1-.42^2}} = \frac{.42\sqrt{15}}{\sqrt{1-.1764}} = \frac{1.627}{.908} = 1.792$$

A two-tailed test with $\alpha = .01$ and df = 17 − 2 = 15 has a critical value of ±2.947, so the null hypothesis is not rejected at the .01 level of significance.

Why no rejection? What is the decision rule? How does one interpret this result? If, however, the statistical significance level was set at .05, would a different conclusion result? What about $\alpha = .05$ with an upper-tailed test to test for a positive correlation? What conclusion is drawn, and if the result is different, why is it different?
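The t-ratio in the example can be checked numerically. The following is a minimal sketch in Python, not part of the original notes: the function names `t_ratio_for_r` and `reject_two_tailed` are hypothetical, and the critical value 2.947 is simply the tabled value quoted in the example (two-tailed, α = .01, df = 15).

```python
import math


def t_ratio_for_r(r: float, n: int) -> float:
    """Return t = r * sqrt(n - 2) / sqrt(1 - r^2) for a Pearson r based on n pairs."""
    if n <= 2:
        raise ValueError("n must exceed 2 so that df = n - 2 is positive")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def reject_two_tailed(t: float, t_crit: float) -> bool:
    """Two-tailed decision rule: reject H0 if t <= -t_crit or t >= t_crit."""
    return t <= -t_crit or t >= t_crit


# Values from the worked example: r = .42, n = 17, df = 15.
t = t_ratio_for_r(0.42, 17)
df = 17 - 2

print(f"t = {t:.3f}, df = {df}")
# Critical value taken from the notes (two-tailed, alpha = .01, df = 15).
print("reject H0 at alpha = .01 (two-tailed):", reject_two_tailed(t, 2.947))
```

The critical value is passed in as an argument rather than computed, so the same functions can be reused with whatever tabled value matches a different α or df.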
Note: In actual research one is not to alter either the statistical significance level or the hypothesis once the data are collected. That is, if your results are not statistically significant the first time, then stop there. The alterations above were done for illustrative purposes only.

(b) Critical r values ($r_{crit}$)

Finally, note that for statistical tests of correlation coefficients there is a much easier procedure than the use of t-ratios. Once you have

1. identified the correct $H_0$ and $H_1$,
2. set the significance level ($\alpha$),
3. calculated the correlation,
4. and calculated the df,
5. use the table below to determine critical values ($r_{crit}$) for r, and
6. apply the appropriate decision rule.

Decision rules using $r_{crit}$

Two-tailed tests: If $r \le -r_{crit}$ or $r \ge r_{crit}$, then reject $H_0$; otherwise, fail to reject $H_0$.

One-tailed test (upper-tailed, i.e., positive r): If $r \ge r_{crit}$, then reject $H_0$; otherwise, fail to reject $H_0$.

One-tailed test (lower-tailed, i.e., negative r): If $r \le -r_{crit}$, then reject $H_0$; otherwise, fail to reject $H_0$.

Table 1. Critical Values for Pearson's r. Each column heading gives the level of significance ($\alpha$) for a one-tailed test, followed by the corresponding level for a two-tailed test.

| df = n − 2 | 1-tail .050 / 2-tail .100 | 1-tail .025 / 2-tail .050 | 1-tail .010 / 2-tail .020 | 1-tail .005 / 2-tail .010 | df = n − 2 | 1-tail .050 / 2-tail .100 | 1-tail .025 / 2-tail .050 | 1-tail .010 / 2-tail .020 | 1-tail .005 / 2-tail .010 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.988 | 0.997 | 0.9995 | 0.9999 | 23 | 0.337 | 0.396 | 0.462 | 0.505 |
| 2 | 0.900 | 0.950 | 0.980 | 0.990 | 24 | 0.330 | 0.388 | 0.453 | 0.496 |
| 3 | 0.805 | 0.878 | 0.934 | 0.959 | 25 | 0.323 | 0.381 | 0.445 | 0.487 |
| 4 | 0.729 | 0.811 | 0.882 | 0.917 | 26 | 0.317 | 0.374 | 0.437 | 0.479 |
| 5 | 0.669 | 0.754 | 0.833 | 0.874 | 27 | 0.311 | 0.367 | 0.430 | 0.471 |
| 6 | 0.622 | 0.707 | 0.789 | 0.834 | 28 | 0.306 | 0.361 | 0.423 | 0.463 |
| 7 | 0.582 | 0.666 | 0.750 | 0.798 | 29 | 0.301 | 0.355 | 0.416 | 0.456 |
| 8 | 0.549 | 0.632 | 0.716 | 0.765 | 30 | 0.296 | 0.349 | 0.409 | 0.449 |
| 9 | 0.521 | 0.602 | 0.685 | 0.735 | 35 | 0.275 | 0.325 | 0.381 | 0.418 |
| 10 | 0.497 | 0.576 | 0.658 | 0.708 | 40 | 0.257 | 0.304 | 0.358 | 0.393 |
| 11 | 0.476 | 0.553 | 0.634 | 0.684 | 45 | 0.243 | 0.288 | 0.338 | 0.372 |
| 12 | 0.458 | 0.532 | 0.612 | 0.661 | 50 | 0.231 | 0.273 | 0.322 | 0.354 |
| 13 | 0.441 | 0.514 | 0.592 | 0.641 | 60 | 0.211 | 0.250 | 0.295 | 0.325 |
| 14 | 0.426 | 0.497 | 0.574 | 0.623 | 70 | 0.195 | 0.232 | 0.274 | 0.303 |
| 15 | 0.412 | 0.482 | 0.558 | 0.606 | 80 | 0.183 | 0.217 | 0.256 | 0.283 |
| 16 | 0.400 | 0.468 | 0.542 | 0.590 | 90 | 0.173 | 0.205 | 0.242 | 0.267 |
| 17 | 0.389 | 0.456 | 0.528 | 0.575 | 100 | 0.164 | 0.195 | 0.230 | 0.254 |
| 18 | 0.378 | 0.444 | 0.516 | 0.561 | 120 | 0.150 | 0.178 | 0.210 | 0.232 |
| 19 | 0.369 | 0.433 | 0.503 | 0.549 | 150 | 0.134 | 0.159 | 0.189 | 0.208 |
| 20 | 0.360 | 0.423 | 0.492 | 0.537 | 200 | 0.116 | 0.138 | 0.164 | 0.181 |
| 21 | 0.352 | 0.413 | 0.482 | 0.526 | 300 | 0.095 | 0.113 | 0.134 | 0.148 |
| 22 | 0.344 | 0.404 | 0.472 | 0.515 | 400 | 0.082 | 0.098 | 0.116 | 0.128 |

Recall the example presented above. Try testing the correlation between time spent studying and performance on a statistics examination (r = .42) using the same levels of significance presented above: $\alpha = .01$, $\alpha = .05$, and $\alpha = .05$ with an upper-tailed test.

(c) Hypothesis testing and p-values for r

If using statistical software to perform hypothesis testing, one simply compares the obtained p-value for the correlation r to $\alpha$ to determine statistical significance. Note that most software reports, by default, p-values for two-tailed tests. The decision rule is:

If $p \le \alpha$, then reject $H_0$; otherwise, fail to reject $H_0$.

Results from SPSS are reported below. The row labeled "Sig. (2-tailed)" provides the p-value for a two-tailed test; these rows are in bold.

Descriptive Statistics

| | Mean | Std. Deviation | N |
|---|---|---|---|
| GPA | 3.1000 | .3000 | 750 |
| HS_RANK | 50.0000 | 28.9318 | 750 |
| SAT_RANK | 50.0000 | 28.9318 | 750 |

Correlations

| | | GPA | HS_RANK | SAT_RANK |
|---|---|---|---|---|
| GPA | Pearson Correlation | 1.000 | .117** | .697** |
| | **Sig. (2-tailed)** | | **.001** | **.000** |
| | N | 750 | 750 | 750 |
| HS_RANK | Pearson Correlation | .117** | 1.000 | .283** |
| | **Sig. (2-tailed)** | **.001** | | **.000** |
| | N | 750 | 750 | 750 |
| SAT_RANK | Pearson Correlation | .697** | .283** | 1.000 |
| | **Sig. (2-tailed)** | **.000** | **.000** | |
| | N | 750 | 750 | 750 |

** Correlation is significant at the 0.01 level (2-tailed).

Focusing just on the correlation between GPA and HS_RANK (r = .117), the decision rule in this case, using $\alpha = .05$, results in:

If $.001 \le .05$, then reject $H_0$; otherwise, fail to reject $H_0$.

Since .001 is less than .05, $H_0$ is rejected and one concludes that there is a statistically significant positive correlation between college GPA and High School Class Rank. The higher one's class rank, the higher one's college GPA.

(d) Exercises

1. A researcher finds the following correlation between GPA and SAT scores: r = .56, n = 19. Using a lower-tailed test, test for the statistical significance of this
correlation with $\alpha = .01$. Use both the critical t and critical r methods for testing the correlation.

2. A researcher finds a correlation of r = .139 between academic self-efficacy and academic performance. Is this correlation statistically different from zero? Note: n = 30, $\alpha = .05$, and use a two-tailed test.

3. What is the smallest sample size needed in (2) to reject $H_0$?

4. A researcher finds a negative correlation (i.e., r = −.65, n = 100) between academic self-efficacy and test anxiety. Is the population correlation between these two variables likely to be different from zero? Set $\alpha = .01$ and base the statistical test upon the following information: $H_1: \rho > 0.00$.

5. The more one studies, the more likely one will perform well in school. Test this supposition using the following information: r = .33, n = 44, $\alpha = .05$, $H_1: \rho \neq 0.00$.

6. The following information was obtained for the two variables X and Y: $r^2 = .25$, n = 33. Test for statistical significance using the following: $\alpha = .01$, $H_1: \rho < 0.00$. Also test for the following: $H_1: \rho \neq 0.00$.

8. Correlation Matrices

Often researchers calculate correlation coefficients among several variables. A convenient method for displaying these correlations is via a correlation matrix. Below are the correlations among IQ, SAT, GRE, and GPA. Usually such tables include a footnote such as this: * p < .05. The asterisk denotes that a particular correlation is statistically different from 0.00 at the .05 level. If the asterisk is not next to a particular correlation, that means the null hypothesis was not rejected for that correlation. The dashed lines denote perfect correlations, r = 1.00. The correlation between a variable and itself is always equal to 1.00.

| | IQ | SAT | GRE | GPA |
|---|---|---|---|---|
| IQ | -- | | | |
| SAT | .75 | -- | | |
| GRE | .81 | .82 | -- | |
| GPA | .45 | .36 | .42 | -- |

* p < .05

Which correlations are statistically significant?

9. Example Calculation of Pearson's r for IQ and SAT

Recall the three formulas:

Formula A:

$$r = \frac{\sum z_X z_Y}{n-1}$$

Formula B:

$$r = \frac{s_{XY}}{s_X s_Y}, \quad \text{where } s_{XY} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{n-1} = \frac{\sum XY - n\bar{X}\bar{Y}}{n-1}$$

Formula C:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

For Formula A, first calculate z scores for both
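variables.

| Student | IQ | $IQ - \overline{IQ}$ | $(IQ - \overline{IQ})^2$ | $z_{IQ}$ |
|---|---|---|---|---|
| Bill | 100 | −0.571 | 0.326 | −0.057 |
| Beth | 101 | 0.429 | 0.184 | 0.043 |
| Bryan | 102 | 1.429 | 2.042 | 0.143 |
| Bertha | 95 | −5.571 | 31.036 | −0.558 |
| Barry | 87 | −13.571 | 184.172 | −1.360 |
| Betty | 120 | 19.429 | 377.486 | 1.947 |
| Bret | 99 | −1.571 | 2.468 | −0.157 |

$\overline{IQ} = 100.571$, $SS_{IQ} = 597.714$, $s^2_{IQ} = 99.619$, $s_{IQ} = 9.981$

| Student | SAT | $SAT - \overline{SAT}$ | $(SAT - \overline{SAT})^2$ | $z_{SAT}$ |
|---|---|---|---|---|
| Bill | 1010 | −22.143 | 490.312 | −0.409 |
| Beth | 1085 | 52.857 | 2793.862 | 0.976 |
| Bryan | 1080 | 47.857 | 2290.292 | 0.884 |
| Bertha | 990 | −42.143 | 1776.032 | −0.778 |
| Barry | 970 | −62.143 | 3861.752 | −1.148 |
| Betty | 1100 | 67.857 | 4604.572 | 1.253 |
| Bret | 990 | −42.143 | 1776.032 | −0.778 |

$\overline{SAT} = 1032.143$, $SS_{SAT} = 17592.854$, $s^2_{SAT} = 2932.142$, $s_{SAT} = 54.149$

Next, find the sum of the products of the z scores.

| Student | SAT | IQ | $z_{SAT}$ | $z_{IQ}$ | $z_{IQ} \times z_{SAT}$ |
|---|---|---|---|---|---|
| Bill | 1010 | 100 | −0.409 | −0.057 | 0.023 |
| Beth | 1085 | 101 | 0.976 | 0.043 | 0.042 |
| Bryan | 1080 | 102 | 0.884 | 0.143 | 0.126 |
| Bertha | 990 | 95 | −0.778 | −0.558 | 0.434 |
| Barry | 970 | 87 | −1.148 | −1.360 | 1.561 |
| Betty | 1100 | 120 | 1.253 | 1.947 | 2.440 |
| Bret | 990 | 99 | −0.778 | −0.157 | 0.122 |

$\sum z_{IQ} z_{SAT} = 4.748$

Formula A:

$$r = \frac{\sum z_{IQ} z_{SAT}}{n-1} = \frac{4.748}{7-1} = .791$$

For Formula B, one must first calculate the covariance, $s_{XY}$, between both variables.

Formula B:

$$r = \frac{s_{XY}}{s_X s_Y}, \quad \text{where } s_{XY} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{n-1} = \frac{\sum XY - n\bar{X}\bar{Y}}{n-1}$$

| Student | SAT | IQ | SAT × IQ |
|---|---|---|---|
| Bill | 1010 | 100 | 101000 |
| Beth | 1085 | 101 | 109585 |
| Bryan | 1080 | 102 | 110160 |
| Bertha | 990 | 95 | 94050 |
| Barry | 970 | 87 | 84390 |
| Betty | 1100 | 120 | 132000 |
| Bret | 990 | 99 | 98010 |

$\overline{IQ} = 100.571$, $SS_{IQ} = 597.714$, $s^2_{IQ} = 99.619$, $s_{IQ} = 9.981$; $\overline{SAT} = 1032.143$, $SS_{SAT} = 17592.854$, $s^2_{SAT} = 2932.142$, $s_{SAT} = 54.149$; $\sum (SAT \times IQ) = 729195$

$$s_{XY} = s_{IQ,SAT} = \frac{\sum XY - n\bar{X}\bar{Y}}{n-1} = \frac{729195 - 7(100.571)(1032.143)}{7-1} = \frac{729195 - 726625.576}{6} = \frac{2569.424}{6} = 428.237$$

$$r = \frac{s_{XY}}{s_X s_Y} = \frac{s_{IQ,SAT}}{s_{IQ}\, s_{SAT}} = \frac{428.237}{9.981 \times 54.149} = \frac{428.237}{540.461} = 0.792$$

The small difference from the .791 obtained with Formula A reflects rounding of the means in the intermediate steps.

Formula C is the formula used before the widespread use of computers due to its ease of calculation, even though it looks more complex than the other two.

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

| Student | SAT | IQ | IQ × SAT | SAT² | IQ² |
|---|---|---|---|---|---|
| Bill | 1010 | 100 | 101000 | 1020100 | 10000 |
| Beth | 1085 | 101 | 109585 | 1177225 | 10201 |
| Bryan | 1080 | 102 | 110160 | 1166400 | 10404 |
| Bertha | 990 | 95 | 94050 | 980100 | 9025 |
| Barry | 970 | 87 | 84390 | 940900 | 7569 |
| Betty | 1100 | 120 | 132000 | 1210000 | 14400 |
| Bret | 990 | 99 | 98010 | 980100 | 9801 |

$\sum SAT = 7225$, $\sum SAT^2 = 7474825$, $\sum IQ = 704$, $\sum IQ^2 = 71400$,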
and $\sum (SAT \times IQ) = 729195$.

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{n\sum (IQ \times SAT) - \sum IQ \sum SAT}{\sqrt{\left[n\sum SAT^2 - \left(\sum SAT\right)^2\right]\left[n\sum IQ^2 - \left(\sum IQ\right)^2\right]}}$$

$$r = \frac{7 \times 729195 - 7225 \times 704}{\sqrt{\left[7 \times 7474825 - 52200625\right]\left[7 \times 71400 - 495616\right]}} = \frac{5104365 - 5086400}{\sqrt{123150 \times 4184}} = \frac{17965}{\sqrt{515259600}} = \frac{17965}{22699.330} = 0.791$$

Partitioned Variance for SAT

$r^2 = .791^2 = .626$

total variance in Y = variance predicted + variance not predicted

$$s_Y^2 = r^2 s_Y^2 + (1 - r^2)\, s_Y^2$$

$$2932.142 = .626(2932.142) + (1 - .626)(2932.142)$$

$$2932.142 = 1835.521 + 1096.621$$

Proportion explained = 0.626; in percent = 62.6%

Proportion not explained = 0.374; in percent = 37.4%

Variance explained = 1835.521

Variance not explained = 1096.621

Why was SAT picked as the dependent variable in this example?

Exercises

Address the following for each problem:

(a) What are the correct $H_0$ and $H_1$ in both written and symbolic form?
(b) What is (are) the critical value(s)?
(c) What is the obtained (calculated) correlation?
(d) Did you reject or fail to reject $H_0$?
(e) Write your conclusion as if explaining the results to non-statisticians.

1. What is the relationship between academic self-efficacy and test anxiety? (answer: r = .773)

Student (Academic Self-efficacy, Test Anxiety): Bob; Linda 5; Marlynn 9, 7; Bryan 9, 5; Eric 7, 8

2. What is the correlation between grade point average (GPA) and intelligence (IQ)? (answer: r = .844)

| Student | GPA | IQ |
|---|---|---|
| Bob | 3.3 | 139 |
| Linda | 3.4 | 140 |
| Marlynn | 3.9 | 145 |
| Bryan | 3.7 | 135 |
| Eric | 2.7 | 125 |
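The three formulas for r can be compared side by side in code. The sketch below is an illustration rather than part of the original notes: it uses the seven IQ and SAT scores from the example, the helper names (`mean`, `sample_sd`, `r_formula_a`, `r_formula_b`, `r_formula_c`) are hypothetical, and it carries full precision throughout. Under that assumption all three formulas agree at about .791, which suggests that the .792 in the hand calculation for Formula B comes from rounding intermediate values.

```python
import math

# Scores from the worked example (Bill, Beth, Bryan, Bertha, Barry, Betty, Bret).
iq = [100, 101, 102, 95, 87, 120, 99]
sat = [1010, 1085, 1080, 990, 970, 1100, 990]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides the sum of squares by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def r_formula_a(x, y):
    """Formula A: sum of z-score products divided by n - 1."""
    mx, my, sx, sy = mean(x), mean(y), sample_sd(x), sample_sd(y)
    z_products = [((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)]
    return sum(z_products) / (len(x) - 1)


def r_formula_b(x, y):
    """Formula B: covariance divided by the product of the standard deviations."""
    n = len(x)
    s_xy = (sum(a * b for a, b in zip(x, y)) - n * mean(x) * mean(y)) / (n - 1)
    return s_xy / (sample_sd(x) * sample_sd(y))


def r_formula_c(x, y):
    """Formula C: raw-score (computational) formula."""
    n = len(x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum(x) * sum(y)
    denom_x = n * sum(a * a for a in x) - sum(x) ** 2
    denom_y = n * sum(b * b for b in y) - sum(y) ** 2
    return numerator / math.sqrt(denom_x * denom_y)


r_a = r_formula_a(iq, sat)
r_b = r_formula_b(iq, sat)
r_c = r_formula_c(iq, sat)
print(f"A: {r_a:.3f}  B: {r_b:.3f}  C: {r_c:.3f}")

# Partitioned variance for SAT: predicted and not-predicted parts sum to s^2.
var_sat = sample_sd(sat) ** 2
explained = r_c ** 2 * var_sat
unexplained = (1 - r_c ** 2) * var_sat
print(f"r^2 = {r_c ** 2:.3f}, explained = {explained:.1f}, unexplained = {unexplained:.1f}")
```

Because the partition uses the unrounded r, the explained and unexplained variances printed here may differ slightly in the last digits from the hand-calculated 1835.521 and 1096.621.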
