Week 4 and 5 In-Class Notes
Chapter 5: Measurement Reliability:
• Today’s Learning Goals:
o Recognize the difference between a conceptual variable and its operationalization
o Classify variables as categorical or quantitative, and classify quantitative variables as ordinal, interval, or ratio
o Identify three types of reliability (test-retest, inter-rater, and internal) and explain when each type is relevant.
• This Is Your Brain on Politics
o Iacoboni et al. Op-Ed, NYT, Nov. 11, 2007.
o fMRI on swing voters to see response on presidential candidates
o Voters who rated Hillary Clinton negatively showed high anterior cingulate cortex activation when viewing pictures of her
o “They were battling unacknowledged impulses to like Mrs. Clinton.”
o “This phenomenon, not found for any other candidate, suggests that Mrs. Clinton may be able to gather support from some swing voters who oppose her if she manages to soften their negative responses to her.”
• I-Clicker: In response to Iacoboni et al.’s Op-Ed, researchers argued that “it is not possible to definitively determine whether a person is anxious or feeling connected simply by looking at activity in a particular brain region.” Which type of validity are the researchers questioning? Construct Validity
o Other types of validity?
▪ Statistical Validity: Iacoboni and colleagues do not provide any statistical evidence to support their claims.
• No peer review, flawed reasoning, unfounded conclusions!
Three Claims, Four Validities:
Three Types of Claims:
Four Types of Validities:
Constructs and Operationalizations:
• There are many ways to operationalize a conceptual variable
o Ex: Want to measure “well-being”
▪ Use 5-item scale
▪ Count the number of smiles
▪ Take one’s blood pressure
o “Well-being” is a conceptual variable
• Three common types of measures:
▪ Verbal questions in questionnaire or interview
• “Stress”: How stressed are you on a scale from 1-10?
▪ Observable behaviors
• “Stress”: Observe people’s behavior and reactions after being given epinephrine and/or B12 and then asked annoying questions
▪ Biological Data
• Brain activity, hormone levels, etc.
Categorical vs. Quantitative Variables
• Categorical Variables:
o Levels are categories
o Ex: majors – psychology, biology, business, English, …etc.
o Numbers don’t mean anything (e.g. psych =1, bio = 2, …etc.)
• Quantitative Variables:
o Values can be recorded as meaningful numbers
o Ex: weight (140 lbs. > 115 lbs.)
• Practice: Categorical or Quantitative?
o % correct on a memory test
o Number of beeps detected
o Hair color
o Income (e.g. monthly salary)
o Blood type
o Political Party
o Educational Experience (e.g. years)
Scales of Measurement: NOIR
• Nominal: Categories with different names
o Ex: Pinot Noir, Pinot Gris, Merlot
o Only grouping available
• Ordinal: Levels can be put in a rank order sequence
o Ex: small, medium, or large glass
o In addition to grouping, ranking or ordering is possible
• Interval: Equal intervals between levels, but no true zero point
o Ex: temperature (0°F is not the absence of temp.)
o In addition to grouping and ordering, intervals are standard
• Ratio: Equal intervals and a “true” zero
o Ex: money
o In addition to grouping, ordering and equal intervals, ratios of numbers reflect ratios of magnitude.
o Eye color: Nominal
o Rating of well-being on 5-point scale: Interval
o Reaction time on computer task: Ratio
o Order of finishers in a 5K race: Ordinal
o Parents’ marital status: Nominal
o Blood alcohol content: Ratio
o Degree of pain felt, as rated on a 10-point scale: Interval
o Seating order in an auditorium: Ordinal
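The interval-vs.-ratio distinction above can be checked with a quick calculation: "twice as much" statements are only meaningful on a scale with a true zero. A minimal sketch using temperature (the conversion is standard; everything else is just illustration):

```python
# Why "twice as hot" is meaningless on an interval scale: Fahrenheit has
# no true zero, but Kelvin (a ratio scale) does.
def f_to_kelvin(f):
    """Convert degrees Fahrenheit (interval) to Kelvin (ratio)."""
    return (f - 32) * 5 / 9 + 273.15

naive_ratio = 80 / 40                               # looks like "twice as hot"
true_ratio = f_to_kelvin(80) / f_to_kelvin(40)      # actual ratio of magnitudes
print(round(naive_ratio, 2), round(true_ratio, 2))  # 2.0 vs ~1.08
```

On the ratio-scale version of the same measurement, 80°F is only about 8% "hotter" than 40°F, which is why ratio claims require a true zero point.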
Measurement Decisions:
• If you have an interesting research question and hypothesis, how do you decide what type of data to collect?
• You need to make some decisions about what you will measure:
o Self-report, observational or physiological measure?
o Categorical or quantitative data?
o If quantitative – ordinal, interval, or ratio scale?
• Think about what you are trying to predict (i.e. the outcome) and sketch your prediction
o This will help you figure out what you need to measure and why
• What makes a good measure?
o Appropriate for the research question and hypothesis
• A reliable measure is consistent
o Test-retest reliability is consistent across different times
▪ Use correlation coefficient r to evaluate reliability
o Interrater reliability is consistent across different observers
▪ Use scatterplot to evaluate reliability
o Internal reliability is consistent across different items within one measure
▪ Don’t confuse internal reliability with internal validity!
▪ Internal reliability refers to the extent to which multiple measures, or items, are all answered the same by the same set of people
• Measured with Cronbach’s alpha (coefficient alpha): average of
all of the possible inter-item correlations
▪ Use coefficient alpha to evaluate reliability
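The "average of all possible inter-item correlations" idea can be sketched in code. The standardized form of Cronbach's alpha builds directly on that average; the item data below are made up for illustration:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def standardized_alpha(items):
    """Standardized Cronbach's alpha from per-item score lists.

    items: list of k lists, one per questionnaire item, all the same length.
    Built from the average inter-item correlation r_bar via
    alpha = k * r_bar / (1 + (k - 1) * r_bar).
    """
    k = len(items)
    rs = [pearson_r(items[i], items[j])
          for i in range(k) for j in range(i + 1, k)]
    r_bar = statistics.mean(rs)
    return k * r_bar / (1 + (k - 1) * r_bar)

# Three hypothetical 5-point items answered by five people; items that
# "hang together" (high inter-item correlations) yield a high alpha.
items = [[5, 4, 2, 1, 3],
         [4, 5, 1, 2, 3],
         [5, 5, 2, 1, 2]]
print(round(standardized_alpha(items), 2))  # 0.95 -> good internal reliability
```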
o Which kind of reliability is most important? It depends on the situation.
▪ Test-Retest Reliability:
• Very important for claims about stability over time
• Example: individual differences
▪ Inter-rater Reliability:
• Very important for claims based on a measure that requires a
human judgement for observation
• Example: presence/absence of an action in a video
▪ Internal Reliability:
• Very important for claims based on self-report measures
• Example: survey that asks about the same construct in different ways
o I-clicker: If a scale or measure has good inter-rater reliability, it will have a positive slope on an inter-rater scatterplot.
• Today’s Learning Goals:
o Identify types of measurement validity: face, content, criterion, convergent and discriminant validity
o Describe the difference between the validity and reliability of a measure.
• Narcissism Example:
o The claim: “college students are getting more narcissistic”
▪ Frequency claim
▪ Narcissism might be measured through self-reports (not ideal),
observation (in multiple ways)
▪ Narcissism needs to be well measured
o Narcissistic Personality Inventory: NPI-16
▪ Internal reliability and test-retest reliability
• What makes a good measure?
o Appropriate for your research questions and hypothesis
o Reliable – consistent
o Valid – measures what it’s supposed to measure
• Validity of Measurement
o Does your dependent measure actually measure what you want it to measure?
o Construct Validity:
▪ Face validity
• Does it look like a good measure of the variable in question?
o Often assessed by experts
o head circumference as a measure of hat size
o Number of problems solved as a measure of intelligence
▪ Content Validity
• Does it include all the important components of the construct?
o Ability to reason, plan, think abstractly, comprehend complex ideas, learn quickly as a measure of general intelligence
o Empirical Assessments of Validity:
▪ Criterion Validity
• Does your measure predict actual behavior or outcomes that it
should theoretically predict?
• Two ways to assess criterion validity
o Correlation Method: does your measure correlate with
the behavior or outcome of interest?
o Known-groups Method: Do groups who are known to differ on the variable of interest score differently on your measure?
▪ Convergent Validity
• To what extent is your measure associated with other measures
of the same construct?
▪ Discriminant Validity
• Aka Divergent Validity
• To what extent is your measure not associated with measures of other constructs?
• Measurement Reliability vs. Validity
o Reliability ≠ validity
o Reliability refers to the consistency of a measure
o Validity refers to whether a measure is measuring what it’s supposed to be measuring.
• Relationship between Reliability and Validity
o Can a measure be reliable but not valid?
▪ Head Circumference as an intelligence test
▪ Number of children you have as a measure of interest in children
o Can a measure be valid but not reliable?
▪ If a measure does not correlate with itself, how could it be strongly associated with another measure or variable?
Chapter 6: Surveys and Observations
• Today’s Learning Goals
o Describe ways to improve the construct validity of survey questions.
o Explain how to improve the construct validity of behavioral observations.
• Surveys as Self-Report Measures
o Self-report measures can be an important route to discovery
▪ Useful when claim is about the nature of people’s beliefs and opinions
o Surveys are a common self-report measure
▪ Often used to make frequency claims
o How do we evaluate the construct validity of surveys and survey questions?
• Construct Validity of Surveys
o Choosing question formats
▪ Open-ended Questions:
• Ex: What are your comments about this professor’s teaching?
• Answers are subjective and reflect the individual’s own beliefs
▪ Forced-choice Questions:
• Ex: “I really like to be the center of attention” vs. “It makes me uncomfortable to be the center of attention” (rated using a Likert-type scale)
• Forces people to make a choice using ratings, etc.
▪ Likert Scale
• Ex: I am able to do things as well as most other people (1. Strongly
Disagree -> 5. Strongly Agree)
▪ Semantic Differential
• Two different poles of what is trying to be measured
• Ex: Fats are: 1. Unhealthy -> 5. Healthy
o Writing well-worded questions
▪ “Do you agree that it is a terrible idea to legalize marijuana?” Or “Do you agree that it is a terrific idea to legalize marijuana?”
▪ “How fast do you think the car was going when it hit the other car? Or How fast do you think the car was going when it smashed into the other car?”
▪ Survey questions should be written as neutrally as possible and avoid leading questions
▪ “I look for main ideas as I read the textbook (1. Disagree -> 5. Agree)”
▪ “I look for main ideas as I read the …Canvas
▪ “Should the instructor not schedule an exam the same week a paper is due?”
▪ “People who do not drive with a suspended license should not be punished. Yes or no?”
▪ Negatively worded questions can confuse people and should be avoided.
▪ Wording of Ballot Measures
• Oregon Measure 105: Repeals law limiting use of state/local law enforcement resources to enforce federal immigration laws
• Problems with the wording?
o Negative wording
▪ Order of Survey Questions
• “Do you generally oppose affirmative action programs for women and/or racial minorities?”
o Being asked about women first increased reported
affirmative action support for racial minorities.
o Being asked about racial minorities first decreased
reported affirmative action support for women.
• “Do you generally oppose affirmative action programs for racial minorities and/or women?”
• The order of questions can matter!
• Good survey includes creating multiple versions of the survey with different orders of the questions.
o There are many ways to write a survey question
▪ Format can be open-ended, forced-choice, Likert scale, or semantic differential
o For good construct validity, the questions should:
▪ Be clear
▪ Be neutral (i.e. no leading questions)
▪ Ask a single question (i.e. no double-barreled questions)
▪ Avoid negative wording
▪ Be presented in different orders to different participants
o I-clicker: In developing a measure of narcissism, a researcher asks participants to rate their agreement with the following statement: “I like to be the center of attention and I think I am a special person.” What is the problem with this question? It is a double-barreled question.
o Encouraging accurate responses
▪ You may write the clearest, most beautifully worded survey, but what if people don’t respond accurately?
▪ Reasons why responses may be inaccurate include:
o Response sets can be a problem on long surveys with lots
of related questions
▪ Acquiescence (yea-saying): say “yes” or “strongly
agree” to every question without thinking carefully
• Solution: reverse-word some items
▪ Fence-sitting: People often play it safe by choosing
the midpoint of the scale.
• Solution: Take away the midpoint (i.e. the neutral option)
o Solution: …Canvas
• Trying to look good
o “Do you generally favor affirmative action programs for women and/or racial minorities?”
▪ Many will say yes because it is socially desirable
▪ Solution: include specific survey items that check
for socially desirable responding
o People who agree to these kinds of statements usually
care about social desirability and might be “faking good”
on your survey.
o People sometimes give the socially desirable response
even if this is not what they really think
• Inability to accurately report reasons for behavior and memories
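The reverse-wording fix for acquiescence mentioned above can be made concrete. A minimal sketch, assuming a 5-point Likert scale; the item names and responses are hypothetical:

```python
# Reverse-scoring: flip reverse-keyed Likert items so that a higher score
# always means "more of the construct."
SCALE_MAX = 5  # assumed 5-point scale

def reverse_score(score, scale_max=SCALE_MAX):
    """Flip a rating on a 1..scale_max Likert item."""
    return scale_max + 1 - score

# A yea-sayer who answers 5 to everything gets a middling total once the
# reverse-keyed items ('_rev') are flipped, not a spuriously high one.
responses = {'item1': 5, 'item2_rev': 5, 'item3': 5, 'item4_rev': 5}
total = sum(reverse_score(v) if k.endswith('_rev') else v
            for k, v in responses.items())
print(total)  # 5 + 1 + 5 + 1 = 12, the scale midpoint region
```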
• Behavioral Observations:
o By observing people, you avoid the challenges to construct validity that self-report measures face.
o Observations can be better than self-reports, BUT trying to accurately observe people presents its own set of challenges.
o Inter-Rater Reliability
▪ Observational measures should include multiple observers who are well trained with a very clear codebook
▪ Researchers should test inter-rater reliability
• If inter-rater reliability is low, you may need to develop a clearer
codebook or provide better training for the observers.
▪ Example Research Question: What proportion of scenes that children see have a hand in them?
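The notes suggest a scatterplot and correlation for evaluating inter-rater reliability; for categorical codes like the presence/absence example, a common chance-corrected agreement statistic (not named in the notes) is Cohen's kappa. A sketch with made-up codes from two hypothetical observers:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: inter-rater agreement corrected for chance.

    rater1, rater2: equal-length lists of categorical codes
    (e.g. 'hand' / 'none') from two independent observers.
    """
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement: probability both raters give the same code at random.
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes for eight video scenes from two trained observers.
r1 = ['hand', 'hand', 'none', 'hand', 'none', 'none', 'hand', 'none']
r2 = ['hand', 'hand', 'none', 'none', 'none', 'none', 'hand', 'none']
print(round(cohens_kappa(r1, r2), 2))  # 0.75 -> substantial agreement
```

If kappa comes out low, the remedy is the one in the notes: a clearer codebook or better observer training.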
o Observer Bias: when observers see what they expect to see
▪ Solution: blind studies – observers don’t know the hypotheses of the study
▪ Therapists watched a video of a young man talking to a professor about feelings and experiences
• Some were told that the young man is a patient
• Others were told that the young man is a job applicant
▪ Therapists asked for their observations
• “Patient” therapists described the young man as a “light,
defensive person,” “frightened of his own aggressive impulses”
• “Job applicant” therapists described the man as “attractive,”
o Observer Effects: when the participants confirm observer expectations
▪ Ex: Clever Hans
o Solution: use a blind design (or masked design)
o Ideally, the observers should be blind to the study’s hypothesis and blind to the experimental conditions to which participants have been assigned
o Reactivity: when participants react to being watched
▪ Ex: an observer walks into a preschool classroom and the children stop playing and look at the observer
▪ A research assistant stands at an intersection with a clipboard and all drivers stop at the stop sign.
• Wait it out
• Unobtrusive observations – blend in
• Unobtrusive data – measure the behavior’s results
o Observing People Ethically
▪ Is it ethical for researchers to observe the behaviors of others?
• It depends.
Chapter 7: Sampling
• Explain why a random sample is more likely to be a representative sample and why representative samples have external validity to a particular population
• Identify when having a representative sample is especially important and when it is not.
• Differentiate between:
o Random sampling and random assignment
o How a sample was collected and sample size.
• Prof. X has a frowny face on ratemyprofessors.com
• Five-star boot ratings on Zappos.com
• Less than half of Americans are happy with US education
• Who is contributing the data?
• To assess the external validity of a claim we should ask:
o What is the population that is relevant for the claim?
o Is my sample representative of the relevant population?
• Note: Generalizability can refer to both the sample and the setting.
o Focusing on the generalizability of the sample.
• Biased Samples:
o Convenience Sampling: sample chosen based on who is easy to contact and readily available
• Exit poll workers approach friendly looking people
• Memory researchers collect data from psychology students
• Personality researchers use Amazon’s Mechanical Turk (MTurk)
o Self-selection: sample consists of people who volunteer to participate
▪ Examples:
• Product ratings completed by people who felt strongly about the product
• Internet polls completed by people who go online and choose to respond
o Potential Solutions:
▪ Random Sampling (or Probability Sampling)
• Simple Random Sampling
• Cluster and Multistage Sampling
• Stratified Random Sampling
• Systematic Sampling
• Random Assignment vs. Random Sampling
o Random Assignment: used only in experimental designs to assign participants to groups at random
▪ Each participant has equal chance of being assigned to each group
▪ Increases internal validity
o Random Sampling: creating a sample from a population of interest via random selection
▪ Each member of the population has an equal chance of being included in the sample
▪ Increases external validity
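The random sampling vs. random assignment distinction can be sketched in a few lines of standard-library Python; the population of 1000 people and the sample/group sizes are arbitrary illustration values:

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

population = list(range(1, 1001))  # a hypothetical population of 1000 people

# Random SAMPLING: every member of the population has an equal chance of
# being selected into the sample (supports external validity).
sample = random.sample(population, 50)

# Random ASSIGNMENT: each sampled participant has an equal chance of ending
# up in either condition of an experiment (supports internal validity).
shuffled = sample[:]
random.shuffle(shuffled)
treatment, control = shuffled[:25], shuffled[25:]
print(len(sample), len(treatment), len(control))  # 50 25 25
```

Note that the two operations are independent: a study can randomly assign a convenience sample (good internal, weak external validity), or survey a random sample with no assignment at all.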
o Nonrandom Sampling:
▪ If external validity is not a top priority, sometimes we settle for
nonrandom, or biased, sampling:
• Convenience sampling
• Purposive sampling
• Snowball sampling
• Quota sampling
• Interrogating External Validity
o Random sampling produces representative samples and helps us achieve external validity
▪ Non-random sampling produces biased samples
o When is a representative sample important?
▪ The importance of a representative sample is going to depend on the type of claim
▪ External validity is especially important for frequency claims.
• Can a non-random sample ever be an adequate source of information for a frequency claim?
▪ External validity is often not the top priority for association or causal claims.
• Larger samples are not necessarily more representative
o For external validity, sampling technique is more important than sample size!
Chapter 8 Notes: Bivariate Correlations
• Learning Goals
o Explain what makes a study correlational.
o Estimate results from a correlational study by looking at a scatterplot and a bar graph.
o Interrogate …Canvas
• What Makes a Study Correlational?
o Measured variables!
▪ It’s the design, not any particular statistic, that makes a study correlational
o Correlational studies can include both continuous and categorical variables (but they are graphed differently).
• Bivariate Correlations
o Bivariate correlations are associations between two measured variables.
▪ Measured, not manipulated
▪ Variables can be continuous or categorical
o Plotting and Analyzing Bivariate Correlations
▪ When both variables are continuous:
• Plot the data using a scatter plot
• Compute the correlation coefficient
▪ When one variable is categorical:
• Plot the data with a bar graph
• Compute a t-test (or an ANOVA, if >2 groups) to compare the group means
▪ Again, the type of graph or statistic doesn’t matter – what makes a study correlational is that the variables are measured.
• Interrogating Association Claims: Four Big Validities:
o Construct Validity: How well was each variable measured?
▪ You need to assess the construct validity for each variable
o Statistical Validity: How well do the data support the conclusion?
▪ What is the effect size?
• Effect size describes the strength of an association
• Large effect sizes
o Mean strong correlations
o Permit more accurate predictions
o Are typically more important
o Are more likely to be statistically significant
▪ Is the correlation statistically significant?
• What does a statistically significant result mean in the context of an association claim?
o It means that the probability that the association in the
sample came from a population with an association of zero
is very small (usually less than 5%; p<.05)
▪ If the relationship is actually zero, we want to avoid
mistakenly concluding otherwise.
▪ P < .05: an acceptable risk of this mistake
▪ Could outliers be affecting the association?
▪ Is there restriction of range?
• Restriction of range makes a sample’s correlation appear smaller
than it really is in the population
▪ Is there a curvilinear association?
• You need to make sure that zero associations aren’t actually curvilinear
o Internal Validity: Can we make a causal inference from an association?
o External Validity: To whom can the association be generalized?
▪ How important is external validity for an association claim?
• If a correlational study doesn’t use a random sample, we
shouldn’t reject the association.
• Save the question of generalizability for another study that tests the same variables in different populations.
• Three Causal Criteria
o Covariance
▪ Bivariate correlations show covariance.
o Temporal Precedence
▪ Bivariate correlations do not usually show temporal precedence (not sure which variable comes first).
o Internal Validity
▪ No control for third variables with bivariate correlations.
o Can we make a causal inference from an association?
o Be aware that it is often tempting to assume causality.
PSY 301 Exam 2 Study Guide
Chapter 5: Measurement Reliability and Validity
• The validities at question in the Iacoboni et al. Op-ed article are Construct Validity and Statistical Validity.
• Types of Measures:
o Self-report measure:
▪ Verbal questions for participants to answer, either by interview or questionnaire
o Observational measure:
▪ Observable behaviors
o Physiological measure:
▪ Biological data
• Scales of Measurements:
o Categorical Variables: levels are categories; numbers are not meaningful
▪ Nominal Scale: categories with different names; only grouping is possible
o Quantitative Variables: Values recorded as meaningful numbers
▪ Ordinal Scale: Levels can be put into rank order sequence; grouping,
ranking and ordering is possible.
▪ Interval Scale: Equal intervals between levels, but no true zero; grouping and ordering; intervals are standard
▪ Ratio Scale: Equal intervals and a true zero; grouping, ordering, equal intervals, and ratios of numbers reflect ratios of magnitude.
• Reliability: establishes construct validity
o Correlation coefficient r: evaluates reliability by indicating the direction and strength of the relationship.
▪ Positive: sloping up
▪ Negative: sloping down
▪ Zero: no slope
▪ The strength of the relationship is determined by how close together the dots are on a scatterplot
o Test-retest Reliability:
▪ Research results are consistent across different times
o Interrater Reliability:
▪ Research results are consistent across different observers
o Internal Reliability:
▪ Consistent across items within one measure.
▪ Cronbach’s alpha (coefficient alpha): the average of all possible inter-item correlations
• Measurement Validity: establishes construct validity
o Face Validity: Does it look like a good measure of the variable in question?
o Content Validity: Does it include all important components of the construct?
o Criterion Validity: Does the measure predict actual behavior or outcomes it should theoretically predict?
▪ Known-groups Method: Do the groups who are known to differ on the variable of interest score differently on your measure?
▪ Correlation Method: Does your measure correlate with the behavior or outcome of interest?
o Convergent Validity: To what extent is your measure associated with other measures of the same construct?
o Discriminant Validity: To what extent is your measure not associated with measures of other constructs?
Chapter 6: Surveys and Observations
• Surveys as self-report measures: useful when claim is about nature of people’s beliefs and opinions
o Survey Question Format:
▪ Open-ended Questions: Answers are subjective and reflect the individual’s own beliefs
▪ Forced-choice Questions: Forces people to make a choice using ratings, etc.
▪ Likert Scale: 1. Strongly Disagree -> 5. Strongly Agree
▪ Semantic Differential: Two different poles of what is trying to be measured
o Survey Question Wording:
▪ Leading Questions: questions that lead to a particular response
▪ Double-barreled Questions: two questions in one.
▪ Negatively worded Questions: negatively phrased questions that can confuse respondents and should be avoided.
▪ Order of Survey Questions: Very important!
▪ Response sets: answering with a consistent pattern
• Acquiescence: saying “yes” or “strongly agree” to every question
without thinking carefully
o Solution: reverse-word some items
• Fence-sitting: people often play it safe by choosing the midpoint
of a scale
o Solution: take away the midpoint
▪ Faking-good: socially desirable responses
▪ Self-reporting more than you can remember: people’s reports are not as accurate as they believe they are
o Observational Data:
▪ Inter-rater Reliability:
• Observational measures should include multiple observers who are well-trained with a codebook
▪ Observer Bias:
• When observers see what they expect to see
▪ Observer effects:
• When participants confirm observer’s expectations
• Solution: Blind (Masked) Design
▪ Reactivity: When participants react to being watched
• Solutions: Wait it out, unobtrusive observations, and unobtrusive data
• Population: entire set of people/products you are interested in
• Sample: smaller set, taken from the population
• Census: set of observations that contains all members of the population
• Representative Samples: all members of the population have an equal chance of being included in the sample
• Biased Samples: some members of the population have a much higher probability of being included in the sample compared to others.
• Random Sampling: creating a sample of the population via random selection
o Simple Random Sampling
o Cluster Sampling
o Multistage Sampling
o Stratified Random Sampling
o Systematic Sampling
• Random Assignment: in experimental designs, assignment of participants to groups at random
• Non-Random Sampling: creating a sample of the population via nonrandom selection
o Convenience Sampling
o Purposive Sampling
o Snowball Sampling
o Quota Sampling
• External Validity for Frequency Claims
o Random Sampling produces the best representative samples
o It is especially important to consider whether or not the sample is an adequate source of information for the claim.
• Sample size: larger samples are not always more representative
• Sampling technique is more important than sample size
Chapter 8: Bivariate Correlations
• Bivariate Correlations are associations between two measured variables.
o When both variables are continuous:
▪ Plot using a scatter plot
▪ Compute r (correlation coefficient)
o When one variable is continuous and one is categorical:
▪ Plot data with a bar graph
▪ Compute t-test to compare the group means
• Construct Validity of Association Claims:
o How well was each variable measured?
• Statistical Validity:
o How well do the data support the conclusion?
o Effect size: describes the strength of the association
▪ Large: strong correlations and more accurate predictions
o Statistical Significance:
▪ Small probability that association came from population with an
association of zero.
o Outliers: an extreme score that stands out from the pack.
o Restriction of Range: makes sample’s correlation appear smaller than it really is in the population
o Curvilinear Associations: the relationship between two variables is not a straight line; it might change from positive to negative, or vice versa.
• Three Causal Claim Criteria
o Covariance
▪ Bivariate correlations show covariance
o Temporal Precedence
▪ Bivariate correlations do not usually show temporal precedence
o Internal Validity
▪ No control for third variables with bivariate correlations
▪ Spurious Association: bivariate correlation is there only because of a third variable.
• External Validity of Association Claims:
o Ask who were the participants and how were they selected
o Moderator: a variable that, depending on its level, changes the relationship between two other variables.
Chapter 9: Multivariate Correlations
• Multivariate Design: A study designed to test an association involving more than two measured variables.
• Longitudinal Design: provides evidence for temporal precedence by measuring the same variables in the same people at several points in time.
o Cross-sectional Correlation: test to see whether two variables, measured at the same point in time, are correlated.
o Autocorrelation: the correlation of one variable with itself, measured at two different points in time.
o Cross-lag correlation: shows whether earlier measure of one variable is associated with the later measure of the other variable.
• Multiple Regression: can help rule out some third variables, addressing internal validity concerns.
o Controlling for another variable: statistically holding a potential third variable constant to see whether the association remains
o Criterion Variable: variable a researcher is most interested in understanding or predicting.
o Predictor Variable: the rest of the variables measured in a regression analysis.
o Beta: one for each predictor variable; indicates the association between that predictor and the criterion variable when the other predictors are statistically controlled for.
• Pattern and Parsimony:
o Parsimony: degree to which a scientific theory provides the simplest explanation for a phenomenon.
o We have to specify a mechanism for a causal path to control for alternative explanations; exemplifies the theory-data cycle
▪ If all the diverse predictions are tied to one central principle, the explanation is parsimonious
• Mediator: a variable that helps explain the relationship between two other variables