MadsSwart

Reliability and Validity; Identifying Good Measurement
Research Methods in Psychology
Emanuele Rizzi
Class Notes
These 7 pages of class notes were uploaded by MadsSwart on Sunday, June 26, 2016. The notes belong to Psych 2300 at Ohio State University, taught by Emanuele Rizzi in Summer 2016.


Date Created: 06/26/16
Chapter 5: Identifying Good Measurement

Operationalization [part 2]

Variables When Operationalized [3 forms]
- Self-report
  o Recording your own, or someone else's, answers
- Observational
  o Any kind of behavioral measurement
  o Trace behaviors
     ▪ Something left behind afterward that indicates the behavior you are interested in, when the behavior itself could not be measured directly
     ▪ i.e. want to know if someone was writing, so you check whether the pencil is duller than it was earlier
     ▪ Think detective work, anthropology, etc.
- Physiological
  o Any biological component of what you're interested in
     ▪ EEG, MRI, heart rate, sweat, etc.

EX: If fear is your construct, you could operationalize how scary your new ride is 3 ways
  o Asking people who just rode the ride, or asking before and after :: self-report
  o Measuring how long they were screaming, taking a picture or video during the ride, noting whether their eyes were shut, whether they were holding on vs. letting go :: observational
  o Peeing their pants, white knuckles :: trace behavior
  o Sweat, heart rate, adrenaline :: physiological

Texting and Driving
- Self-report questionnaire
- Direct observation or friends' observation
- Physiological

Ideally, run lots of different studies that measure the same construct with the different kinds of measures; ideally they will all produce the same results.

Measurement: scales of measurement
Categorical – nominal variables
  o Numbers, if attached to categories, are only labels
  o Names, labels, etc.
Quantitative – MEANINGFUL numbers
  o Ordinal – ranking
     ▪ 1st, 2nd, 3rd
     ▪ Good, better, best
     ▪ There is an order among the categories, such that one category is above another
  o Interval – equal spacing between values, but no true 0
     ▪ Temperature (°F or °C): saying today is 0 degrees does not mean there is no temperature; 0 is only a point relative to something else
  o Ratio – 0 means the absence of the variable
     ▪ 0 Kelvin: absolute zero, no thermal energy
- In psychology, constructs that cannot be measured directly (e.g., happiness) tend to be treated as interval
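To make the interval/ratio distinction concrete, here is a small sketch using the temperature example from the notes: ratios ("twice as hot") only make sense on a ratio scale with a true zero, such as Kelvin, while differences make sense on both. The function name `celsius_to_kelvin` and the two temperatures are illustrative assumptions, not from the lecture.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return c + 273.15


# Hypothetical temperatures, chosen only for illustration.
today_c, yesterday_c = 20.0, 10.0

# On the interval (Celsius) scale, the ratio looks like "twice as hot"...
naive_ratio = today_c / yesterday_c

# ...but on the ratio (Kelvin) scale, where 0 means no thermal energy,
# today is only slightly warmer in proportional terms.
true_ratio = celsius_to_kelvin(today_c) / celsius_to_kelvin(yesterday_c)

# Differences are meaningful on both scales: the gap is 10 degrees either way.
diff_c = today_c - yesterday_c
diff_k = celsius_to_kelvin(today_c) - celsius_to_kelvin(yesterday_c)

print(f"Celsius ratio: {naive_ratio:.3f}")
print(f"Kelvin ratio: {true_ratio:.3f}")
print(f"Difference (C): {diff_c:.1f}, difference (K): {diff_k:.1f}")
```

Running it prints a Celsius ratio of 2.000 but a Kelvin ratio of 1.035, while both differences print as 10.0. This is why "twice as much" statements need a ratio scale.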
- Did you text while driving? yes or no
  o Self-report
  o Categorical; two levels
- How much did you text on your last drive? none, a little, moderately, a lot
  o Self-report
  o Ordinal, because the answers are rankings; 4 levels
     ▪ Can put them on a scale
- Texts per hour vs. subject number
  o Observational
  o Ratio; infinite levels
     ▪ Theoretically, you could have someone who sends a million texts per hour
     ▪ Zero is meaningful because you can send zero texts per hour

What kind of measurement is this?
- Number of seconds before pressing a key after hearing a tone
  o Observational
  o Ratio
- Number of classes you are taking this term
  o Self-report or observational
  o Ratio
- Asking how much you like college on a scale of 0-5
  o Self-report
  o Interval
     ▪ Your zero might not be everyone's zero
     ▪ Does not apply evenly to everyone
- Questionnaire asking for the location of your hometown (urban, suburban, rural)
  o Self-report
  o Nominal; 3 levels

Table 5.2 in book: measurement scales for operational variables

Reliability
- Consistency of measurement
- Can we get the same measurement from the same person, with the same device, over and over?
- Trying to get the same, consistent results

3 types
Test-retest
- Test someone with the same measurement tool at two points in time
- Some constructs fluctuate over time, like happiness, which is not a consistent thing
Interrater
- Measurement by one observer is the same as another observer's
- Both are measuring the same thing and looking for the same thing
- When measuring the same target, they give similar scores
- Scores do not have to be exact, but they must be similar
Internal
- Similar items should give similar measurements
- Important in questionnaires
- If my first question rates you on anxiety and I get a 20, and my second question also rates you on anxiety, I should get something similar to a 20
- Can have internal reliability with questions measuring opposite things, if they give opposite results

Measuring Reliability
Slope: direction & strength
- Correlation
  o Measured using Pearson's r
  o Ranges from -1 to 1
     ▪ +1: points perfectly aligned, ascending from left to right
     ▪ -1: points perfectly aligned, descending from left to right
     ▪ 0: no linear relationship; the best-fit line is flat
Spread
- The larger the spread of points around the line, the weaker the correlation

Molly found a strong correlation between the time people spent studying and the grades they received on a test. People who studied less performed worse. The correlation in the study was most likely:
a. +1.5
b. -.25
c. -.8
d. +.75
- Answer: d. +1.5 is impossible (r only ranges from -1 to 1), and the relationship described is strong and positive
- If high is associated with high and low is associated with low, you have a positive correlation
  o People who study less do worse = people who study more do better
- A negative correlation would be: people who study more do worse

Statistic used for each type of reliability
- Test-retest: correlation coefficient, r
- Interrater: correlation coefficient, r
  o If categorical: KAPPA; basically the same idea [how often the two observers put the same person in the same category, corrected for the agreement you would expect by chance]
- Internal: Cronbach's alpha
  o Based on the average of all inter-item correlations [1 w/ 2; 1 w/ 3; 1 w/ 4 … 2 w/ 3; 2 w/ 4 …], with every item correlated with every other item

Reliability is necessary but not sufficient to establish validity

Which reliability is most important?
- Measuring bullying on a playground
  o Interrater
     ▪ Want multiple people to measure what they think bullying is and whether they think someone is bullying, across the different kids measured, etc.
- Job satisfaction survey
  o Internal
     ▪ Test them once; make sure all parts of the survey measure the same thing
- Measuring weight on a scale
  o Test-retest
     ▪ Don't want it to tell you at 3:00 pm that you weigh 2 pounds and then at 3:01 that you weigh 700 lbs
- Clinical evaluation of a person with schizophrenia: all three are important
  o Test-retest: can take it again and will get the same answer
  o Interrater: anyone can use it to measure the same person and will get the same answer
  o Internal: all the questions measure schizophrenia

More on interrater reliability
- [How well do your measurements come back the same? Reliability]
  o Assessing different observers to see which is better
  o Group 1: r = .99 [group 1 has 3 observations]
  o Group 2: r = .78 [group 2 has 9 observations]
- Just 2 points will always give a perfect correlation
  o You cannot assess reliability with just a few values; you want as many as possible, which gives you a better idea of how the two things you are measuring compare to each other
  o Much more likely to be wrong about the correlation when you have few points
- Observers' ratings are strongly related, but the correlation is negative (-.81); is this a good measurement?
  o This means the observers are effectively measuring opposite constructs
  o Make sure you defined things well
  o Coding system; instructions clear
  o For interrater reliability, you want the observers' ratings to relate positively
  o Even though the ratings are consistently opposite, you don't know what the right answer is, because you don't know which observer is doing the correct measuring
- You may want opposite results, however, in a situation of internal reliability with items that are opposite:
  o How satisfied are you?
  o How dissatisfied are you?
     ▪ You want opposite answers for opposite questions
     ▪ Positive and positive items have a positive correlation
     ▪ Negative and negative items have a positive correlation
     ▪ Positive and negative items have a NEGATIVE correlation
- Interpretation differences
  o Coding schemes
  o Instructions
  o Between team members
- Validity and reliability ca
- Face and content: subjective
- Criterion
  o Correlate with expected outcome
  o Use known-groups paradigm
- Convergent and discriminant
  o Compare to other measures to see if same or different
Figure 5.5; table 2 in book.
- Dramatically different after 4 years because the variables are changing

Validity
- How well does your operationalization capture the construct you are measuring?
- Accuracy of measure
Statistical validity
External validity
Internal validity
Construct validity
- Two subjective ways to assess validity
  o Face validity: it looks like what you want to measure
  o Content validity: the measure contains all the parts that your theory says it should contain
- Three empirical ways to assess validity
  o Discriminant validity: your measure is less strongly associated with measures of dissimilar constructs
  o Convergent validity: your measure is more strongly associated with measures of similar constructs
  o Criterion validity: your measure is correlated with a relevant outcome

Two subjective forms [cannot guarantee they are linked; based on what you think should be linked, and you can re-work them; NOT A GOOD WAY]
- Face validity
  o Plausibility that the measure captures the desired construct
  o Is your measurement reasonably related to the construct you are measuring?
  o Hat size – head size
  o Food consumption – obesity risk [still has face validity, but not as good]
- Content validity
  o Measure addresses all relevant parts of the construct
  o Related to a bunch of different characteristics and outcomes
  o Should be theory-guided
  o What components should my operationalization also capture?
  o Obesity risk = food consumption + family history + nutrition + thyroid + activity level

EMPIRICAL: observed, seen, measurable
- Criterion validity
  o Association between the measure and expected outcomes
  o Not because you think they are associated, but because you can see/measure it
  o Observable results/behaviors
     ▪ Trace behaviors
     ▪ Future behaviors
     ▪ People you expect to do something will do it, and vice versa
     ▪ Future/predicted behaviors: e.g., measure height now, then measure number of free throws later, or how many go to the NBA
  o Good criterion validity = a strong correlation with the outcome in the expected direction (usually positive)

Assessing criterion validity: known groups
- Known-groups paradigm
  o Take individuals that you know are high/low on some scale and see if your measurement tool can discriminate between them
  o Groups are established/known to be different in the construct of interest
  o Not depressed/depressed
  o None/mild/moderate/severe
- Polygraph tests
  o Physiological operationalization of honesty
  o To know if the polygraph works
     ▪ Known-groups paradigm
     ▪ Measure a group told to lie and a group told to be honest
     ▪ If they lie, regardless of how or why they are lying, my machine can capture that; telling them to lie won't change anything
- Assessing lying merely from language
  o Linguistic Inquiry and Word Count [LIWC]
  o Takes sentences and characterizes different things about those sentences
  o Known groups
     ▪ Write a lie and write a truth
- Secret to finding out when people were lying
  o Differences in sentence structure
  o True statements have more detail and more words
  o Fewer verbs; less emotional context
  o More first-person pronouns
  o Actually talking about yourself
- People who were found innocent after being charged as guilty
  o Talked about themselves
  o Tons of detail
- Guilty of perjury
  o He/she: shift blame away from themselves
  o Not as much detail

Assessing validity: comparing measurements
- Your measurement against another measurement
  o Preferably an established one
- Types of validity
  o Convergent
     ▪ Should correlate with other measures of the same construct
     ▪ High scorers on the other tool are high scorers on my tool, and vice versa
  o Discriminant/divergent
     ▪ Should not correlate with measures of a different construct
     ▪ Correlation near 0
     ▪ A depression survey does not return the same answers as an addiction survey
  o Criterion validity
     ▪ Should correlate highly with expected outcomes


