Statistical Principles of Psychological Research (B.S. Majors)
These 30 pages of class notes were uploaded by Cecelia Erdman IV on Sunday, October 25, 2015. The notes belong to PSYC 215 at University of North Carolina - Chapel Hill, taught by Viji Sathy in Fall. Since upload, they have received 22 views. For similar materials see /class/228715/psyc-215-university-of-north-carolina-chapel-hill in Psychology at University of North Carolina - Chapel Hill.
Date Created: 10/25/15
Regression

Regression: Statistical technique for determining the best-fitting straight line for a set of data. The best-fitting straight line is called the regression line. The regression line is a mathematical model of the relationship between the variables, and it can be used to predict the value of Y (DV) from a known value of X (IV).

1. Equation of the regression line: $\hat{Y} = bX + a$
Notation note: $\hat{Y}$ is called "Y hat".
$\hat{Y}$ = predicted value of Y (predicted score on DV)
b = slope of the regression line; the change in $\hat{Y}$ for each unit of increase in X
X = known value of X (known score on IV)
a = y-intercept; where the regression line crosses the Y axis; the value of $\hat{Y}$ when X = 0

2. How is the best-fitting line determined?
The equations for slope and y-intercept are derived from a calculus-based procedure called the least squares method. In this method the goal is to minimize the distance between the line and the actual data points. The "squares" part of this term comes from squaring the distance.
$Y - \hat{Y}$ = error or residual; the difference between the actual and predicted value of Y
$\sum (Y - \hat{Y})^2 = SS_{error}$ = sum of squared error
The best-fitting regression line is defined as the line that has the smallest possible $SS_{error}$.

3. Equations for slope and y-intercept
You are NOT required to memorize the following equations. I'm providing these equations here so you know that the values for b and a don't just come out of thin air.
$b = \frac{\sum (X - M_X)(Y - M_Y)}{\sum (X - M_X)^2}$
$a = M_Y - bM_X$
Notation note: $M_X$ is the mean of the X values and $M_Y$ is the mean of the Y values.

4. Using the Regression Line to Make Predictions
Example 1: We can use the data set I provided at the beginning of the unit on correlation to predict depression score from optimism score. I calculated the slope and y-intercept, and they are b = -.44 and a = 16.79. We can plug those values into the regression equation to predict depression:
$\hat{Y} = -.44X + 16.79$
$\hat{Y}$ = predicted depression score; X = known optimism score
a. What if someone has an optimism score of 14? What do you predict for depression?
$\hat{Y} = -.44X + 16.79 = -.44(14) + 16.79 = 10.63$
b. Predict the depression score
for someone with an optimism score of 18:
$\hat{Y} = -.44X + 16.79 = -.44(18) + 16.79 = 8.87$

Practice Problem: Suppose that we used a regression model to predict final exam score (on a scale from 0 to 50) from average quiz score, and my output showed that b = 1976 and a = 1541 [decimal points missing in the source]. Write the regression equation.

5. Accuracy of Predictions
Standard error of the estimate: Average amount of error in the predictions; it represents the mean distance of the actual values of Y from the best-fitting line. Therefore, it is a measure of how accurately the regression line predicts the Y values.
$s_e = \sqrt{\frac{SS_{error}}{df}}$, where $s_e$ = standard error of the estimate and df = n - 2

Examples: You won't need to calculate $s_e$ by hand, but you need to understand what $s_e$ is and what a reported $s_e$ value means.
- For the optimism and depression example, $s_e$ = 3.52. The average error in our predictions of depression will be 3.52. Given the range of depression scores, this seems like a moderate amount of error.
- For the final exam score and average quiz score problem above, $s_e$ = 9.84. The average error in our predictions of final exam scores is about 10 points. Given that final exam scores ranged from 9 to 49, this seems like a large amount of error.

a. Standard error of the estimate is a measure of average error. It does not tell you exactly how much error there will be in any single prediction. Some errors may be small (predicted Y and actual Y are very close) and some errors may be large (predicted Y is not close to actual Y). Standard error of the estimate is the mean of all the errors.
b. Standard error of the estimate can be very useful in comparing different independent variables that you might use to predict one dependent variable. For example, let's say you have a choice of using average quiz score or an attendance rating to predict final exam score.
For predicting final exam score from average quiz score, $s_e$ = 9.84
For predicting final exam score from attendance rating, $s_e$ = 7.70

Sample SPSS Output (mark the terms that we review in class here)
[SPSS table: Variables Entered/Removed]
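The slope, intercept, and standard error of the estimate from sections 3 to 5 can be reproduced with a short Python sketch. It uses the optimism and depression scores from the correlation unit and applies the least squares equations directly. One assumption is labeled in the code: subject 1's optimism score is taken to be 2, the value implied by the reported optimism mean of 12.36. The function names `fit_line` and `standard_error_of_estimate` are illustrative and do not come from the notes.

```python
import math

# Optimism (X) and depression (Y) scores for the 11 subjects in the
# correlation unit. Assumption: subject 1's optimism score is 2, the
# value implied by the reported optimism mean of 12.36.
optimism = [2, 6, 3, 7, 10, 12, 14, 17, 20, 24, 21]
depression = [13, 14, 17, 14, 18, 12, 5, 12, 3, 8, 9]


def fit_line(x, y):
    """Return (b, a): the least squares slope and y-intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Numerator: sum of products of deviations from the two means.
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Denominator: sum of squared deviations of X from its mean.
    ss_x = sum((xi - mx) ** 2 for xi in x)
    b = sp / ss_x
    a = my - b * mx
    return b, a


def standard_error_of_estimate(x, y, b, a):
    """Return sqrt(SS_error / df), with df = n - 2."""
    ss_error = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_error / (len(x) - 2))


b, a = fit_line(optimism, depression)
se = standard_error_of_estimate(optimism, depression, b, a)

print(f"b = {b:.2f}, a = {a:.2f}, se = {se:.2f}")
for score in (14, 18):
    print(f"optimism {score}: predicted depression {b * score + a:.2f}")
```

The notes round b and a before making predictions, so the unrounded values computed here can differ from the handout in the second decimal place.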
[SPSS output tables: Model Summary (Std. Error of the Estimate = 3.52285), ANOVA (F = 8.771, df = 1 and 9), and Coefficients. Other values are not legible.]

6. Multiple Regression
Formula: $\hat{Y} = b_1X_1 + b_2X_2 + b_3X_3 + a$
Notation note: The subscripts identify different independent variables. $X_1$ is the first independent variable, $X_2$ is the second independent variable, and so on.

Usually all raw scores are standardized by transforming them to z scores. Then b becomes β (Greek letter beta). The equation looks like this after standardizing:
$\hat{Y} = \beta_1X_1 + \beta_2X_2 + \beta_3X_3$
Each β reflects the relationship of its X variable to the Y variable, excluding any overlap with other X variables.

Understanding β
Remember when we covered z scores that standardizing scores allows you to compare scores from different distributions. The same concept applies here. Standardizing the slope to make standardized regression coefficients makes it possible to compare different independent variables. Independent variables that have larger β values are
- more strongly related to the dependent variable
- more important predictors of that dependent variable

Consider the following equation showing standardized regression coefficients:
$\hat{Y} = .03X_1 - .11X_2 + .55X_3$
This equation shows that there are three independent variables. $X_1$ and $X_3$ are positively related to Y, and $X_2$ is negatively related to Y. The numerical part of β shows you that $X_3$ is the most strongly related to Y and contributes the most to the prediction.

Understanding R and R² for Multiple Regression
1. R = multiple correlation coefficient; the correlation between the DV and all the IVs taken together. Because of overlap among predictor variables, R is less than the sum of the individual r of each predictor with the dependent
variable.
2. R² = proportion of variance in the DV that is explained by the set of independent variables included in the multiple regression. R² is also referred to as the proportionate reduction in error because it indicates how much error in predicted scores will decrease when using the regression model compared to simply using the overall mean to predict individual scores.

Recognizing and Interpreting Multiple Regression Statistics in Journal Articles

Table: Predictions of Child's Receptive Cooperation with Mother at 15 months
Predictor                    β      SE
Child's gender               .41    .14
Child's anger proneness      .05    .17
Mother's responsiveness      .24    .07
Attachment security          .60    .14
R² = .31
Source: Kochanska, G., Aksan, N., & Carlson, J.J. (2005). Temperament, relationships, and young children's receptive cooperation with parents. Developmental Psychology, 41, 648-660.

Interpretation: Attachment security is the most important predictor of child's receptive cooperation, followed by gender and then mother's responsiveness. Given its small value, child's anger proneness contributes little, if anything, to the prediction. These four variables together account for 31% of the variance in child's receptive cooperation at 15 months.

Practice Problems: Interpret the multiple regressions reported in the tables below.

1. Table: Multiple Regression Analyses Predicting Effectiveness of Treatment of Agoraphobia
Variable                     r      β
Depression                   .30    .30
Age                          .21    .20
Nbr. of treatment sessions   .12    .08
Duration of disorder         .13    .02
R² = .13
Source: Hahlweg et al. (2001). Short- and long-term effectiveness of an empirically supported treatment of agoraphobia. Journal of Consulting and Clinical Psychology, 69, 375-382.

2. Table: Political Orientation (PO), Mortality Salience (MS), and Relationship Prime as Predictors of Support for Extreme Force
Predictor            b       SE      β
PO                   0.87    0.24    .53
MS                   0.33    0.37    .08
Relationship prime   0.45    0.37    .11
Notes: PO is measured on a scale from 1 (extremely conservative) to 7 (extremely liberal). Extreme force refers to an aggressive military approach to international terrorism.
Source:
Weise et al. (2008). Interpersonal politics: The role of terror management and attachment processes in shaping political preferences. Psychological Science, 19, 448-455.

Probability

Probability of A: $p(A) = \frac{\text{number of outcomes classified as A}}{\text{total number of possible outcomes}}$

Probabilities in research articles are reported as proportions. It is common to see a probability written as being less than some number, such as p < .05 or p < .001.

Two important assumptions:
1. Random sampling: Each individual in the population has an equal chance of being selected.
2. Sampling with replacement: Each individual in the sample is returned to the population before another sample is drawn. This keeps probabilities constant across sampling.

Probability and Distributions
1. Frequency distribution of scores
X   f   [table values not legible]
a. What proportion of the sample has a score of 4 or higher?
b. What is the probability of selecting a 4 or higher by chance alone?
c. What percentage of the sample has a score of 3 or 4?
d. If we randomly select one score from this distribution, what is the probability that it will be a 3 or 4?

2. Distribution of a statistic
Imagine that you take 100 samples from a population. For each sample, you measure two variables and compute r. Therefore, you have 100 values of r. Here's the frequency distribution for those r values:
r                f
.4 or higher     2
.3               4
.2               6
.1               18
0                40
-.1              18
-.2              6
-.3              4
-.4 or lower     2
a. If you select one sample at random, what is the probability that it will have an r of -.1 to .1?
b. What is the probability of selecting a sample with r of .4 or -.4 or a stronger correlation?

3. The Normal Distribution
a. Scores are normally distributed with M = 40 and s = 3. What is the probability of randomly selecting a score above 43? Below 34?
b. Body Mass Index: $BMI = \frac{\text{weight (pounds)} \times 703}{\text{height (inches)}^2}$
CDC BMI Guidelines:
Underweight: below 18.5
Normal: 18.5 - 24.9
Overweight: 25 - 29.9
Obese: 30 and up
i. BMI for US women age 20 - 39: μ = 26.7 and σ = 4. If one woman is selected randomly, what is the probability that she will be underweight?
ii. How likely are you to
select a woman who is overweight or obese?

Using Distributions to Think about High Probability and Low Probability Events
- Low probability corresponds to a relatively small proportion of the distribution, usually .05 or less.
- High probability corresponds to a relatively large proportion of the distribution, usually .95 or more.

Example: GRE Verbal Scores, mean = 470, standard deviation = 121
1. Make a sketch of the distribution of GRE Verbal scores. Indicate the high and low probability areas.
2. A group of students take the GRE, and the mean Verbal score for this group is 720. Is this a high probability event? In other words, does this look like a chance event?

Central Tendency

A. Measures of Central Tendency
A measure of central tendency is a statistic that indicates the middle, center, or average of a distribution; it provides a single score to represent the entire group of scores.

1. Mean: Arithmetic average; takes into account the value of each and every score in the distribution; considered the balance point or fulcrum of the distribution.
$M = \frac{\sum X}{N}$
Notation note: Sometimes $\bar{X}$ is used as notation for the sample mean. In this class, we will use M.
Examples:
1. Data set: 2 3 4 5 5 5 6 6 6 7 8 9
2. Data set: 2 3 4 5 5 5 6 6 6 7 20 21

2. Median (Mdn): Score that divides the distribution in half, such that half of the scores are greater and half of the scores are less than the median.
Median location = $\frac{N + 1}{2}$th score
Put scores in order and count from one end to the median location. The score at that location is the median.
Examples:
1. Data set: 2 3 4 5 5 5 6 6 6 7 8 9
2. Data set: 2 3 4 5 5 5 6 6 6 7 20 21
3. Frequency distribution:
X   f
7   4
6   9
5   24
4   19
3   10
2   7
1   6

3. Mode: The most frequently occurring score in the distribution.

B. Comparing Mean, Median, and Mode
1. Properties of the Mean
a. It is sensitive to outliers (extreme scores). Therefore, it can be a misleading measure of central tendency if the distribution is strongly skewed. It is the preferred measure of central tendency when the distribution is unimodal and fairly symmetrical.
b.
Mean requires division and often results in a fraction. Therefore, the mean should not be used as a measure of central tendency for categorical data.
c. It is the most reliable measure of central tendency. The term "reliable" has a very specific meaning here. If we took repeated random samples from the target population, the sample means would show less fluctuation than either the sample medians or modes. The value of the mean is not exactly the same for each sample, but it does not vary a lot. Because of this, the sample mean is considered the best estimate of the population central tendency.
d. Because the mean is computed using the value of every score, it is mathematically related to other descriptive statistics that use every score value, such as the most popular measure of variability. The mean is also important in many inferential statistics.

2. Properties of the Median
a. Compared to the mean, it is less sensitive to outliers. Because of this, it is the preferred measure of central tendency for strongly skewed distributions. Examples: income, housing prices.
b. Less reliable than the mean. Its value fluctuates more across random samples from the population.

3. Properties of the Mode
a. It is the only measure of central tendency that can be used with categorical data.
b. It is not sensitive to outliers.
c. It is the least reliable measure of central tendency. Its value fluctuates a lot across random samples from the population, and this makes the sample mode a poor estimate of population central tendency.

C. Central Tendency and Skewed Distributions
1. If the distribution is unimodal and perfectly symmetrical, then mean = median = mode. If the distribution is unimodal and approximately symmetrical, then mean ≈ median ≈ mode.
2. If the distribution is skewed, then mean, median, and mode will be different.
a. Positively skewed distribution: Mode < median < mean
b. Negatively skewed distribution: Mean < median < mode

Practice Problems: Describe the shape of each distribution. Then determine the mean, median, and mode of each distribution.
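To see in numbers how an outlier pulls the mean but not the median, here is a minimal Python sketch using the two example data sets from the sections on the mean and median. It relies on the standard library `statistics` module (`multimode` assumes Python 3.8 or later); the helper name `describe` is illustrative and not part of the notes.

```python
from statistics import mean, median, multimode

# Example data sets from the notes. Data set 2 replaces the two highest
# scores (8 and 9) with extreme scores (20 and 21).
data_1 = [2, 3, 4, 5, 5, 5, 6, 6, 6, 7, 8, 9]
data_2 = [2, 3, 4, 5, 5, 5, 6, 6, 6, 7, 20, 21]


def describe(scores):
    """Return the mean, median, and list of modes for a set of scores."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "mode": multimode(scores),  # a list, because there can be ties
    }


for name, scores in (("Data set 1", data_1), ("Data set 2", data_2)):
    print(name, describe(scores))
```

Only the mean changes between the two data sets, which is the sense in which the mean is sensitive to outliers while the median and mode are not.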
Distributions for the practice problems:

X    f        X    f        X        f        X
9    4        4    6        90-99    52       27-29
8    7        3    21       80-89    145      24-26
7    6        2    11       70-79    48       21-23
6    4        1    3        60-69    38       18-20
5    3        0    2        50-59    26       15-17
4    18       -1   3        40-49    18       12-14
3    22       -2   10       30-39    10       9-11
2    29       -3   20       20-29    8        6-8
1    10       -4   8        10-19    4        3-5
ΣX = 379      ΣX = -3       ΣX = 26,187       ΣX = 7440

Variability

Definition: Spread or dispersion of scores; differences among scores.
Central tendency alone does not provide a complete description of a data set. Central tendency glosses over differences (it uses one score to represent all). Variability points out those differences and describes how different scores are from one another. Measures of central tendency and variability go hand in hand; they complement one another to provide a more complete description of the group of scores.

Measures of Variability
1. Range: Distance between the highest and lowest scores (highest score - lowest score).
Example: 58 73 74 76 78 81 90 96
Range = 96 - 58 = 38

2. Interquartile range (IQR): the range of the middlemost 50% of scores; the range after eliminating the highest 25% of scores and the lowest 25% of scores; the distance between the 1st quartile and the 3rd quartile.
58 73 74 76 78 81 90 96
Interquartile range = 85.5 - 73.5 = 12

3. Standard deviation: average distance of the scores from the mean.
The range gives an answer to a relatively simple question about spread: How far is it from the highest to the lowest score? We could ask about spread in a different way: On average, how far are scores from the mean? We use a different measure of variability, standard deviation, to answer this.
a. Conceptual example for standard deviation
Data: 2 3 6 10 13 16 20
Mean = 10
To determine the average distance of scores from the mean, we could calculate how far each score is from the mean and then average those distances.
Distances from the mean: 8 7 4 0 3 6 10
Average distance from the mean = 5.43
Interpretation: On average, scores are 5.43 points from the mean.
b. Interpreting the value of standard deviation
- Minimum value of s = 0. There is no spread. All scores are the same.
- The higher the value of the standard deviation, the greater the spread
of scores.

Comparing Range and Standard Deviation
Properties of range
1. Easy to compute and understand. In practice, it is not common to compute range. Most researchers report only the highest and lowest scores and don't do the subtraction.
2. Range uses only two scores, and those are the two most extreme scores.
- Interquartile range is an attempt to correct for this. However, it is not used very often.
3. Range is affected by sample size; it is very likely to increase as sample size increases.
4. Range is very unreliable (unstable); it fluctuates across multiple random samples from the population. IQR is more stable.

Properties of standard deviation
1. Very reliable
2. Relatively unaffected by sample size
3. Can be affected by outliers (extreme scores in the distribution)
4. Mathematically related to the mean and other statistics because it uses every individual score
Variance = standard deviation squared. Because it is in squared units, it is not useful as a descriptive statistic. However, it is very important in inferential statistics.

Given these properties, the standard deviation is the preferred measure of variability. You will often see mean and standard deviation reported together. Unless a distribution is strongly skewed, you should use mean and standard deviation to describe the center and spread, respectively, of a distribution of scores.

Standard Deviation Equation
What math procedures could we use to calculate standard deviation? One possibility:
X     X - M      deviation
20    20 - 10    10
16    16 - 10    6
10    10 - 10    0
13    13 - 10    3
6     6 - 10     -4
3     3 - 10     -7
2     2 - 10     -8
Σ deviations =
Average deviation from mean = Sum of deviations / Number of deviations

Problem with this method: The negative deviations of scores below the mean balance the positive deviations of those above, and the sum of the deviations will always be zero.
To solve this problem, we'll square the deviations. Then, when we get done, we will take the square root to put it back in the original units.

Sample standard deviation: $s = \sqrt{\frac{\sum (X - M)^2}{n - 1}} = \sqrt{\frac{SS}{df}}$

Notice that the denominator is n - 1, not
simply n. This small adjustment allows us to calculate sample standard deviation as an unbiased estimate of the population standard deviation.
Unbiased: the expected value of the sample statistic is equal to the population parameter that it estimates.
Expected value: average of the sample statistic from multiple random samples from the population.

Sometimes the equation is abbreviated to $s = \sqrt{\frac{SS}{df}}$
SS = sum of squares = sum of squared deviations from the mean
df = degrees of freedom

Degrees of freedom: The number of independent (unrestricted) scores; the number of scores that are unrestricted (can take on any value) in calculating the statistic.
Degrees of freedom will be different for different statistics. For standard deviation, not all scores are free to vary because we must know the mean in order to determine standard deviation. When a mean is known, all but one score can take on any value. One score is restricted, not free to vary.
Any time degrees of freedom are used in a statistic, you can know two things: 1) the statistic is unbiased; 2) some other statistic had to be calculated first and is included in the equation.

Standard deviation is an extremely important statistic. You should be able to define standard deviation, interpret a value of s, write the equation for s, and explain the different parts of that equation.

Practice Problems
1. Determine range and IQR.
X    f        X    f
9    4        4    6
8    7        3    21
7    6        2    11
6    4        1    3
5    3        0    2
4    18       -1   3
3    22       -2   10
2    29       -3   20
1    10       -4   8

2. A set of scores has standard deviation = 10. What does this mean?

3. For the following frequency distribution, M = 16. Which of the following is the best estimate of standard deviation: s = 1, s = 3, or s = 7? Explain your answer in a way that shows that you understand the concept of standard deviation.
X        f
27-29    43
24-26    47
21-23    47
18-20    59
15-17    71
12-14    60
9-11     48
6-8      46
3-5      43

Correlation: Describing the Relationship between Two Variables

The descriptive statistics we have studied so far have been univariate. That is, we use them to describe the shape,
center, and spread of scores we have for a single variable. It is rare to study just one variable. Often we study numerous variables, and we want to understand how a variable is related to another variable.

Example: What is the relationship between optimism and depression? Perhaps we propose that those who are more optimistic tend to be less depressed. We would need to do some research to see if this is true. Let's say we have a sample of 11 people, and each one completes a questionnaire measuring depression and a separate questionnaire measuring optimistic thinking.

Subject    Optimism Score    Depression Score
1          2                 13
2          6                 14
3          3                 17
4          7                 14
5          10                18
6          12                12
7          14                5
8          17                12
9          20                3
10         24                8
11         21                9
           M = 12.36         M = 11.36
           s = 7.50          s = 4.70

Techniques for Describing a Bivariate Relationship: Graphing and Correlation Coefficient

Graphing
Scatterplot: Graph used to show the relationship between 2 variables.
[Scatterplot for optimism and depression: Depression on the vertical axis, Optimism on the horizontal axis]

A. Constructing a scatterplot
1. Horizontal axis is used for the independent variable (X).
2. Vertical axis is used for the dependent variable (Y).
3. Draw axes approximately the same length.
4. Plot a point for each pair of scores.

B. What to look for in a scatterplot
1. Form of relationship: Linear or curvilinear
[Figure panels: Curvilinear, Linear; DV plotted against IV]
2. Strength or degree of relationship: The closer the points are to forming a single line, the stronger the relationship.
[Figure panels: Perfect, Strong, Moderate, Weak; DV plotted against IV]
3. Direction of relationship: Positive or negative
a. Positive: scores on one variable are associated with scores at a similar level on the other variable.
- High scores on X are associated with high scores on Y.
- Low scores on X are associated with low scores on Y.
b. Negative: scores on
one variable are associated with scores at the opposite level on the other variable.
- High scores on X are associated with low scores on Y.
- Low scores on X are associated with high scores on Y.
[Figure panels: Positive Relationship, Negative Relationship; DV plotted against IV]
Note: With weak relationships, it may be hard to discern the direction of the relationship from the scatterplot.

Correlation Coefficient
Statistical measure of the relationship between two variables.
1. The most common correlation coefficient is the Pearson correlation coefficient, r. It is used when both variables are continuous (either interval or ratio); it measures the degree and direction of linear relationship.
2. Other correlation coefficients
- Spearman correlation: ordinal data and/or nonlinear relationship
- Point biserial correlation: one variable is continuous, the other is dichotomous
- Phi coefficient: two dichotomous variables
3. Equation for r
Note: You are not required to memorize the computational equation, but you are expected to know the formula expressed in z scores.
Conceptually: $r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}}$
KNOW THIS. Formula expressed in terms of z scores: $r = \frac{\sum z_X z_Y}{n - 1}$
FYI. Computational equation: $r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$
4. Getting r
$r = \frac{-7.02}{n - 1} = \frac{-7.02}{10} = -.70$
Interpretation:
5. Understanding the value of r
Correlation coefficient values range from -1.00 to +1.00.
a. Numerical portion: Regardless of sign, the value of r indicates the strength or degree of linear relationship between the two variables. The following number line indicates the labels typically assigned to various values of r within psychology and other social science research.
r:       -1.00     -.5      -.3        -.1    0      .1     .3         .5       1.00
label:   perfect   strong   moderate   weak   none   weak   moderate   strong   perfect
b. Sign: Direction of relationship
6. Describing correlations: Putting the value of r into words
a. First, state the direction and
strength of the relationship in a sentence like the following:
There is a (strength) (direction) relationship between (name of IV) and (name of DV).
b. Then elaborate to give more detail about how the variables covary.
i. Suggestions for negative relationships:
- As (name of IV) increases, (name of DV) decreases.
- People who have more (name of IV) tend to have less (name of DV).
- People who are more (adjective form of IV) tend to be less (adjective form of DV).
ii. Suggestions for positive relationships:
- As (name of IV) increases, (name of DV) also increases.
- People who have more (name of IV) tend to have more (name of DV).
- People who are more (adjective form of IV) tend to be more (adjective form of DV).

Examples:
1. Optimism and depression, r = -.70. There is a strong negative relationship between optimism and depression. People who are more optimistic tend to be less depressed.
2. Parental involvement and children's school achievement, r = .23. There is a weak positive relationship between parental involvement and children's school achievement. Children with more involved parents tend to have higher school achievement.

Practice Problems
Practice estimating correlations: http://istics.net/stat/Correlations
Practice: Correlation Practice 1 and 2 on Blackboard

Table 1: Correlations of procrastination with measures of stress, health, and academic performance
                                              Grade on      Exam     Early semester           Late semester
                                              term paper    grade    Symptoms     Stress      Symptoms     Stress
                                                                     of illness               of illness
Score on Lay's General Procrastination Scale  .29           .64      .45          .31         .65          .68
Source: Tice, D.M., & Baumeister, R.F. (1997). Longitudinal study of procrastination, performance, stress, and health: The costs and benefits of dawdling. Psychological Science, 8, 454-458.

Table 2: Correlations Among Peer Relationship Variables, Victimization, and Behavioral Vulnerability
Measures                       1      2      3      4      5      6      7      8
Peer measures
1. Total acceptance
2. Reciprocated friends        0.40
3. Friendship quality          0.40   0.49
Victimization measures
4. Overt victimization         0.30   0.38   0.53
5. Relational victimization    0.51   0.43   0.44   0.73
Behavioral
vulnerability measures
6. Externalizing problems      0.33   0.32   0.42   0.63   0.59
7. Internalizing problems      0.11   0.19   0.36   0.35   0.39   0.07
8. Attentional problems        0.26   0.16   0.24   0.25   0.19   0.30   0.12
Source: Jensen-Campbell, L.A., & Malcolm, K.T. (2007). The importance of conscientiousness in adolescent interpersonal relationships. Personality and Social Psychology Bulletin, 33, 368-383.

Figure 1: The mediating influence of attentional and externalizing problems on the conscientiousness-peer relationship link. (Same source as Table 2.)

6. Extraneous factors that can affect r
a. Range of values on one or both variables: A restricted range can distort r.
b. Outliers: Extreme scores on one or both variables. Outliers are especially influential if sample size is small.

7. Explaining r: Why are two variables correlated?
a. Correlation is not causation. If two variables are correlated, then there are three possible reasons for that relationship:
1) X causes Y
2) Y causes X
3) A third variable may be involved as a mediating variable.
Mediating variable: A variable that accounts for the relationship between X and Y; it is an essential link in the causal pathway connecting X and Y.
b. Correlation may be squared to measure the proportion of variance explained by the relationship.
r² (or R²) = coefficient of determination; proportion of variance in one variable that is accounted for by the relationship with the other variable; also known as proportion of explained variance.
1 - r² = coefficient of nondetermination; proportion of variance not accounted for.
Example: Parental involvement and children's school achievement, r = .23, so r² = .05. This would mean that 5% of the variance in children's school achievement is explained by the relationship with parental involvement; 95% of the variance is unexplained.
Additional examples:
i. Procrastination score and late semester stress, r = .68
ii.
Attentional problems and friendship quality, r = .12

8. Uses of correlation: In addition to measuring the strength and direction of the relationship between two variables, correlation is important for other aspects of research.
a. Validity of a test or measurement procedure: Does it measure what it is supposed to measure? Validity can be assessed with a correlation coefficient measuring the relationship between scores on a new measure/test and scores on an established measure/test of the same construct.
b. Reliability of a test or measurement procedure: consistency of measurement. Reliability is assessed by a correlation coefficient. Types of reliability include:
- Interrater: Correlation of two sets of ratings made by independent observers
- Test-retest: Correlation of the test given at one time with the same test given again
- Internal consistency: Correlation of one part of a test with another part
c. Prediction: If we find that two variables are related, then we can use that information to create a model to predict the value of one variable given a known value of another variable. Because our predictive model is based on correlation, the accuracy of predictions made using that model will be directly related to the strength of the correlation.

z Scores

Imagine a study in which data have been collected for several different variables. One individual in the sample has the following scores:
Depression score = 11
Analytical reasoning score = 275
Adaptive coping score = 36
What do these scores mean? What could we use to make sense of these scores?

Standard score: Score that indicates the location of a raw score in a distribution, expressed as the number of standard deviations from the mean.
For depression scores, s = 4 and M = 15:
$z = \frac{X - M}{s} = \frac{11 - 15}{4} = \frac{-4}{4} = -1$
Notation note: Remember, X is the score, M is the mean, and s is the standard deviation.
z = the deviation of a score from the mean, measured in standard deviation units
- The sign indicates if the score is below or above the mean.
- Negative z score indicates that the raw score is below
(lower than) the mean.
- Positive z score indicates that the raw score is above (larger than) the mean.
- Regardless of sign, the value of z tells you how many standard deviations the score is from the mean.

Examples: z = -1 means the depression score is one standard deviation below the mean.
Depression: M = 15, s = 4
Raw scores for depression:   3    7    11   15   19   23   27
z scores for depression:     -3   -2   -1   0    1    2    3

Computing z scores
1. Analytical reasoning: Mean = 250, s = 50, X = 275
$z = \frac{275 - 250}{50} = .5$
2. Adaptive coping score: M = 25.44, s = 5.28, X = 36
$z = \frac{36 - 25.44}{5.28} = 2$

Computing raw scores from z scores: $X = M + zs$
Examples:
1. Depression, z = -1.25: X = 15 + (-1.25)(4) = 10
2. Analytical reasoning, z = 1.3: X = 250 + 1.3(50) = 315

Why express distance from the mean in standard deviation units? Because it allows us to compare scores from different distributions. We compare them based on their relative locations rather than their raw values.
Example: Depression score = 11 and Adaptive coping score = 36
Depression z = -1; adaptive coping z = 2
Below average depression and above average coping.
Example: A college admissions counselor will use standardized test scores to choose between two applicants.
Applicant 1: SAT combined score = 1600. From College Board stats for 2006 college-bound high school seniors: mean = 1518, standard deviation = 112
Applicant 2: ACT composite score = 26. From ACT stats for 2006 high school graduates: mean = 21.1, standard deviation = 5.3

Standardized distribution
- Definition: Distribution of scores that has been transformed to create predetermined values for mean and standard deviation.
For any distribution (any shape, any mean, any standard deviation), we can convert the raw scores to z scores and relabel the X axis. This does not change the shape of the distribution in any way. When any distribution is changed to z scores, M = 0 and s = 1.
Standardized tests commonly transform raw scores twice: first transform scores to z scores, and then to other predetermined values for mean and standard deviation.
Example: Scores on an IQ test are transformed to z scores
and then transformed again after setting the mean = 100 and standard deviation = 15. So a raw score that is one standard deviation above the mean becomes 115, and a raw score that is two standard deviations below the mean becomes 70.

Normal Distribution
Definition: A family of distributions which are symmetrical and unimodal, with asymptotic tails.

[Figure: examples of the normal distribution with different values of mean $\mu$ and variance $\sigma^2$.]

Formula for the standard normal distribution (you won't be tested on this formula): $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$

The 68-95-99.7 rule in the Standard Normal Distribution
a. About 68% of scores fall within 1 standard deviation of the mean (between $\mu - \sigma$ and $\mu + \sigma$).
b. About 95% of scores fall within 2 standard deviations of the mean (between $\mu - 2\sigma$ and $\mu + 2\sigma$).
c. About 99.7% of scores fall within 3 standard deviations of the mean (between $\mu - 3\sigma$ and $\mu + 3\sigma$).
d. Example: Depression $z = -1$; adaptive coping $z = 2$.
   - If depression scores are normally distributed, how many people have lower depression scores?
   - If adaptive coping scores are normally distributed, how many people have lower adaptive coping scores? How many people have higher scores?

[Figure: normal curve divided at each standard deviation from the mean; 34.1% of scores fall between the mean and 1 SD on each side, 13.6% between 1 and 2 SD, and 2.1% between 2 and 3 SD.]

Introduction to Statistics
1. Overview of Course and Requirements
2. Research Methods Overview
   a. Correlational/Observational method: Make observations of two variables as they exist naturally to observe if there is a relationship. Cannot establish cause-and-effect in the relationship.
   b. Experimental method: One variable is manipulated while changes are observed in another variable.
3. Terminology
   a. Variable: A characteristic or condition that changes or has different values for different individuals.
      Independent variable: The variable that is manipulated by the researcher. The "cause."
      Dependent variable: The variable that is observed for changes in order to assess the effect of the treatment. The "effect."
   b. Data: Measurements collected in research; often they are scores, but they may be ranks, categories, types, etc.
   c. Statistics: A set of methods and rules for organizing, summarizing, and
      interpreting information.
      i. Descriptive: Statistical procedures used to summarize, organize, and simplify data (e.g., averages, tables, graphs). Examples include:
         - Scores from first exam for 88 students
         - Public attitudes towards death penalty
         - Rates of adolescent drug and alcohol use
      ii. Inferential: Consists of techniques that allow us to study samples and then make generalizations about the populations from which they were selected. Examples include:
         - Scores from first exam for 88 students to infer mean of all students in all classes
         - A sample of public attitudes towards death penalty to infer the attitude of the population
         - Rates of adolescent drug and alcohol use to infer substance use for the population of adolescents in the US
   d. Population: Set of all individuals of interest in a particular study.
   e. Sample: A set of individuals selected from a population, usually intended to represent the population in a study.
4. Different ways of describing variables
   a. Discrete or Continuous
      i. Discrete: Possible values are limited (e.g., number of children in a household).
      ii. Continuous: Possible values are theoretically infinite given sufficiently precise measurement (e.g., height, temperature, IQ).
   b. Qualitative vs. Quantitative
      i. Qualitative: Values designate membership in distinct groups (e.g., religions, male/female, married/never married/divorced).
      ii. Quantitative: Numerical values indicate the amount of something, and arithmetic operations are sensible.
   c. Scales of Measurement
      i. Nominal: A categorical variable where the categories have no ordered values. Value denotes group membership but gives no information about quantity (e.g., gender (1 = male, 2 = female), race, religion, experimental condition).
      ii. Ordinal: A categorical variable where the categories are ordered. Value denotes rank ordering, but the distance between values is not necessarily equal (e.g., order of finish in a race; 1 = very happy, 2 = satisfied, 3 = very unhappy).
      iii. Interval: A set of categories that are in an organized sequence with equal intervals between the values on the measurement scale; no absolute zero
      point (e.g., temperature in Fahrenheit, SAT scores).
      iv. Ratio: An interval scale with an absolute zero point (e.g., most physical measurements such as weight, height, and speed; number of correct answers on a test).
5. Summation
   a. Use the Greek letter sigma ($\Sigma$).
   b. Order of operations:
      1. Brackets/parentheses first
      2. Squaring/exponents ($X^2$)
      3. Multiplication/division ($\times$)
      4. Addition/summation/subtraction ($\Sigma$)
6. Today's Sticking Points
   a. There is not a clear consensus on how to define the independent variable.
      i. Some people believe that it requires a manipulation in order to technically be called an IV.
      ii. Some people just like to think of it as the "cause" and thus refer to correlational studies as having an IV.
   b. It is not always clear which scale of measurement a variable falls in.
      i. Height: measured in inches, or as short/medium/tall.
   c. There is no clear consensus on what Likert-type questions or scales (e.g., on a scale of 1-7) are considered.
      i. Ordinal or interval? The scores may be collected as whole values, but they are generally treated as continuous.
      ii. For our purposes, we will treat them as they are conventionally treated: as interval.

Introduction Practice Problems
1. For the following variables, indicate what scale of measurement they are and if they are measured as discrete or continuous variables.
   - Psychology course
   - The number of people who are registered for a course
   - A person's place on the waitlist (1st, 2nd, etc.)
   - Class time
   - Satisfaction rating with the first class (1 = extremely satisfied, can't wait for the next one; 7 = extremely unsatisfied, I'm dropping the class)
2. Identify the IV (cause) and DV (effect) for the following studies.
   - Students do better in statistics if they study with other people than if they study alone.
   - Students who use laptops in the classroom have lower grades than those who do not.
   - Academic success depends partly on effort.
3. An instructor wants to know if class attendance impacts college students' performance in a math class. She teaches two sections of the same class, so in one class attendance is mandatory. In the other
   section, students are told that attendance is optional. At the end of the semester, she compares the grades (i.e., 1-100) of the two sections (attendance required/attendance optional).
      1. What is the independent variable?
      2. Is the independent variable continuous or discrete? HINT: be sure to think about how it is measured in this study.
      3. What is the dependent variable?
      4. Is the dependent variable continuous or discrete?
      5. What is the population?
4. Summation exercises: For the following data, X = 4, 2, 1. Compute: $\sum X$, $\sum 2X$, $\sum X + 1$, $\sum (X + 1)$, $\sum X^2$, $(\sum X)^2$, $\sum (X + 1)^2$

2/8/2011
Regression Part II: Odds and Ends

What is Multiple Regression?
- Linear Regression: predict the value of one variable from another
- Multiple Regression: predict values of an outcome from several predictors

Regression: An Example
- Predict record sales from advertising
- Data: 200 different album releases
- Outcome variable: Sales (CDs and downloads) in the week after release
- Predictor variables: amount spent promoting the record before release; number of plays on radio

Model with One Predictor
[Figure: scatterplot of Record Sales (thousands) against Advertising Budget (thousands of pounds).]

Multiple Regression as an Equation
With multiple regression, the relationship is described using a variation of the equation of a straight line: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$
- $b_0$: intercept; the value of the Y variable when all Xs = 0; the point at which the regression plane crosses the Y-axis (vertical)

Beta Values
- $b_1$: regression coefficient for $X_1$
- $b_2$: regression coefficient for $X_2$
- $b_n$: regression coefficient for the nth variable
Interpretation:
- $b_1$ is the change in Y expected per unit increase in $X_1$, holding $X_2$ constant.
- $b_n$ is the change in Y expected per unit increase in $X_n$, holding all other variables constant.
- "Partial regression coefficients"

The Model with Two Predictors
[Figure: regression plane for Record Sales (thousands) predicted from advertising budget and number of plays on radio, with the slope for advertising labeled $b_{Adverts}$.]

Some Methods of Regression
- Forced Entry/Simultaneous
- Hierarchical
- Stepwise

Forced Entry/Simultaneous Regression
- Most typical
- Enter all variables in the model simultaneously
- Results depend on the variables entered in the model
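The forced entry approach, in which every predictor enters the model at the same time, can be sketched for the two-predictor case in Python. This is a minimal illustration rather than the course's own procedure: the function name `fit_two_predictors` and the small data set are invented for the example (they are not the album-sales data), and the coefficients come from solving the least squares normal equations directly.

```python
from statistics import mean


def fit_two_predictors(x1, x2, y):
    """Least squares fit of Y-hat = b0 + b1*X1 + b2*X2.

    Hypothetical helper for illustration; both predictors are entered
    simultaneously (forced entry).
    """
    m1, m2, my = mean(x1), mean(x2), mean(y)
    # Deviations of each score from its own mean.
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Sums of squares and cross-products.
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if abs(det) < 1e-12:
        # No unique solution: perfect multicollinearity or zero variance.
        raise ValueError("predictors are perfectly collinear or have zero variance")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Invented data generated from Y = 2 + 3*X1 + 0.5*X2 with no error term.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [6.0, 8.5, 13.0, 15.5, 20.0]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Because the invented data contain no error, the fit should recover coefficients very close to 2, 3, and 0.5. Each slope is a partial regression coefficient: the expected change in Y per unit increase in one predictor with the other held constant.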
Hierarchical Regression
- Known predictors (based on past research) entered in the regression model first
- New predictors entered in a separate step/block

Stepwise Regression
- Variables entered in the model based on mathematical criteria
- Computer selects variables in steps

Straightforward Assumptions
- Variable Type: Outcome must be continuous. Predictors can be continuous or dichotomous.
- Non-Zero Variance: Predictors must not have zero variance.
- Linearity: The relationship we model is, in reality, linear.
- Independence: Each value of the outcome should come from a different person.

The More Tricky Assumptions
- No Multicollinearity
- Homoscedasticity
- Independent Errors
- Normally-distributed Errors

Multicollinearity
- Multicollinearity exists if predictors are highly correlated.
- This assumption can be checked with collinearity diagnostics and/or correlations.
- Tolerance should be more than 0.2 (Menard, 1995).
- VIF should be less than 10 (Myers, 1990).
[Figure: Coefficients table with collinearity statistics for the predictors, including No. of Plays on Radio 1 per Week.]

Checking Assumptions about Errors
- Homoscedasticity/Independence of Errors: Plot ZRESID against ZPRED.
- Normality of Errors: Normal probability plot.
[Figure: Regression Plots dialog showing the Standardized Residual Plots options (Histogram, Normal probability plot) and the Produce all partial plots option.]

Homoscedasticity: ZRESID vs. ZPRED
[Figure: two scatterplots of standardized residuals against standardized predicted values; Dependent Variable: Record Sales and Dependent Variable: OUTCOME.]

Normality of Errors: Histograms
[Figure: two histograms of standardized residuals; Dependent Variable: Record Sales and Dependent Variable: OUTCOME.]

Normality of Errors: Normal Probability Plot
[Figure: Normal P-P Plot of Regression Standardized Residual; x-axis: Observed Cum Prob.]

Types of Predictors
Important to consider variable types:
- Dichotomous/binary
- Polytomous/discrete (multiple categories)
- Ordinal

Dichotomous Predictor
- Dummy code variable: 0 for one value, 1 for the other

Polytomous Predictor
- Dummy code each category, BUT OMIT one category (multicollinearity)
- Caucasian or not (0/1), African-American or not
  (0/1), etc. Omit one category.

Ordinal Predictor
- Treat as polytomous

Interaction Terms as Predictors
- Interactions can be obtained in regression
- More on this later in the ANOVA section

One last thing
- Should not predict outside the range of the data
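Dummy coding a polytomous predictor can be sketched in Python as follows. This is an illustrative helper under stated assumptions, not part of the lecture: the function name `dummy_code`, the category labels, and the choice of reference category are all hypothetical. Each category except the omitted reference gets its own 0/1 column, and a case in the reference category is coded 0 on every column.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category, omitting `reference`.

    Hypothetical helper: `values` is a list of category labels and
    `reference` is the category left out to avoid multicollinearity.
    """
    if reference not in values:
        raise ValueError(f"reference category {reference!r} not found in data")
    # Keep categories in order of first appearance, skipping the reference.
    categories = [c for c in dict.fromkeys(values) if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Invented example: a three-category predictor with "A" as the omitted category.
group = ["A", "B", "C", "B", "A"]
columns = dummy_code(group, reference="A")
for name, column in columns.items():
    print(name, column)
```

With three categories the helper returns two columns, one for "B" and one for "C". A dichotomous predictor is the special case with two categories, which yields a single 0/1 column.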