Chapters 1 and 2: The Scientific Study of Politics and The Art of Theory Building
24 August 2015

Models of Politics
Key to building scientific knowledge is developing models of politics.
• Model: concepts and variables linked together by theory.
• Paradigm: fundamental models; a set of shared assumptions and commonly accepted theories.
All models are simplifications; what matters is whether the model is useful.

Variables
Definition: “A variable is a characteristic that can vary in value among subjects in a sample population.”
• Two main types of variables:
1. Discrete: “A variable is discrete if its possible values form a set of separate numbers, such as 0, 1, 2, 3.”
  • Most of the discrete variables we use in political science are categorical (gender, religion), for which we assign numbers to categories.
2. Continuous: “A variable is continuous if it can take an infinite continuum of possible real number values.”
  • Economic indicators and percent turnout are continuous variables.

The Scientific Study of Political Phenomena
To answer our research questions, we follow a common set of steps:
• Develop a theory
  • Tentative conjecture about what causes the political phenomenon of interest.
• Derive testable hypotheses
  • Null hypothesis: what we expect if the theory is wrong.
  • Alternative hypothesis: what we expect if the theory is correct.
• Empirical tests: evaluate hypotheses and theory.
  • Collect evidence to make a judgement about the “truth” of hypotheses and theory.
⇒ Scientific Knowledge

Building Theories, Part I
First step in theory building: express theory and concepts in terms of variables.
Some definitions:
• Variable label: description of what the variable is (e.g., age).
• Variable values: denomination in which the variable occurs (e.g., years).
• Dependent variable: variable to be explained.
• Independent variable: variable explaining the variation in the dependent variable.
• The value of the dependent variable “depends” on the value of the independent variable.

Categorizing Variables
1. Categorical
  • Ordinal: ordered categories.
  • Nominal: no ordering.
2. Quantitative
  • Interval (continuous)
  • Binary: special discrete case

Summary: Variable Classifications
• Quantitative vs. categorical
• Continuous vs. discrete
• Nominal vs. ordinal vs. interval

An Example: The Economic Theory of Voting
Theory: The incumbent president will fare better when the economy is relatively healthy.
• Restated: The state of the economy affects the outcome of the presidential election.
• What are the independent and dependent variables?
• Do we expect a positive or negative relationship?
  • When the economy is good, reelection chances are good (positive relationship).
  • When the economy is good, reelection chances are bad (negative relationship).
• Causal explanation: Voters hold the president responsible for the state of the economy. Therefore, when the economy is strong, more voters will vote for the incumbent.

The Economic Theory of Voting Continued
Our theory: Stronger economic performance causes the incumbent party vote share to be higher.
How can we operationalize economic performance?
• Inflation rate
• GDP per capita growth rate
• Unemployment rate

Building Theories, Part II
The second step in theory building: develop a causal explanation.
Why do you think that this independent variable is causally related to this dependent variable?
What is the underlying process that links these two variables together?

Evaluating a Theory
Operationalization: how we translate our concepts and variables so they are measurable and observable.
• There are often many ways to operationalize a variable or concept.
• How you operationalize will affect your testable hypotheses: whether the expected relationship is positive or negative.

Data
Once we have our hypotheses, we can collect data to test our hypothesized relationships.
Definition: “The observations gathered on the characteristics of interest are collectively called data.”
• Different types of data used in political science:
  • Experimental (lab or field)
  • Observational
  • Surveys
  • Economic (at individual and macro levels)
  • Political characteristics of states
• The type of data used is determined by the research question(s).

Basics of Theory Building
1. Variation
  • And type of variation.
2. Generalizability
  • Can your theory explain similar events/behavior in other contexts/countries?
3. Build on previous research/advance knowledge
  • How does your theory compare/contrast with previous research?

Time-Series vs. Cross-Sectional
Types of variation in the dependent variable:
1. Time dimension: the dependent variable can vary across time (monthly, quarterly, yearly).
2. Spatial dimension: the dependent variable can vary across units: individuals, US states, countries.
We can thus measure our dependent variable in three ways:
1. Time-series: measure the dependent variable at different points in time.
2. Cross-sectional: measure different units at the same point in time.
3. Time-series cross-sectional.

Military Spending
Figure 2.2. Military spending as percentage of GDP in 2005 for 24 countries.
Cross-sectional or time-series?
Cross-sectional: varies across spatial units (nations) but not time.
What could be a research question related to this data?

Variation
• Think back to some of the questions political scientists are interested in:
  • Who votes?
  • Why do some countries have multiparty systems; others, two-party?
  • When and why do protests erupt and spread?
• The goal of much of social research is to explain variation in a dependent variable.
• This variation can be across time (months, quarters, years) or across space (spatial units, such as countries, cities, people).

Presidential Approval
Figure 2.1. Average monthly level of U.S. presidential approval from 1995 to 2005.
Why is this time-series? Because the same spatial unit is measured at different points in time.
What is the variation we want to explain?
• Presidential approval (what causes it to go up and down?)

Generalizing
We want to make general statements about political behavior and phenomena.
We can use specific/local knowledge to generalize.
Example
• Spike in approval after the September 11, 2001 attacks.
• Many (causally) linked terrorist attacks and presidential approval.
• But what can we say more generally about the causes of spikes in presidential approval?
⇒ John Mueller: International conflicts increase presidential popularity in the short term (rally-around-the-flag effect).

Previous Research
• What did previous research miss?
  • Missing important variables?
  • Does the causal explanation seem reasonable?
• Can the theory be applied elsewhere?
  • Does Mueller’s theory apply beyond the US?
  • Systematic patterns.
• Further implications?
  • Consequences of Mueller’s theory.
• Does the theory apply to different levels of aggregation?
  • What individual-level behavior can result in the aggregate trend?

Exercise 3, Chapter 2
Look at Figure 2.4, pg. 49.
1. What is the variation we want to explain?
  • What is a potential research question we could ask with this variable?
2. Is the variable a time-series measure or a cross-sectional measure?
  • What is the spatial unit?
3. Can you think of a theory that causes this variable to be higher or lower?
4. Is this percentage of women MPs the dependent or independent variable?

Chapter 3: Evaluating Causal Relationships
29 August 2016

What makes a good theory?
• Is your theory causal?
  • Our ultimate goal is to understand causality.
• Can you test your theory on data not yet observed?
  • Do not “data mine”.
• How general is your theory?
• How parsimonious is your theory?
  • Avoid “garbage can” models.
• How new/non-obvious is your theory?

Rules of the Road
1. Make your theories causal: correlation (covariation) does not equal causation!
2. Don’t let data alone drive your theories.
  • It is easy to use raw numbers and statistics to lie.
3. Consider only empirical evidence: observe the real world/test your theory.
4. Avoid normative statements: we are not trying to explain how the world should be.
5. Pursue generality and parsimony (simplicity).

Recap
• Purpose of social research: exploratory, descriptive, explanatory, forecasting.
• Steps to building scientific knowledge about political phenomena/behavior:
  • Develop theory and causal explanation; derive testable hypotheses; evaluate.
• Types of variables: categorical (nominal and ordinal) and continuous.
• Two ways of analyzing data: descriptive and inferential.
• Important goals: make theories causal, empirical, nonnormative, general, and parsimonious.

Outline of Lecture
1. Types of causal theories: bivariate, multivariate, probabilistic, deterministic.
2. How to evaluate causal theories: the four causal hurdles.
3. What it means to “control” for other relevant causes (variables).
4. Spurious relationships and other types of causal relationships.

Deterministic vs. Probabilistic Relationships
1. Deterministic: X causes Y with certainty.
  • Associated with the physical sciences.
  • Relationship between Fahrenheit (Y) and Celsius (X): Y = 32 + (9/5)X.
  • Equation for line: Y = a + bX
2. Probabilistic: increases in X are associated with increases (or decreases) in Y.
  • Relationships between variables in the social sciences are almost never deterministic.
  • Relationships will be probabilistic.
  • Equation for line: Y = a + bX + e

Causal Relationships
• Usually our theories have one causal direction: X causes Y but Y does not cause X.
• Causality is assessed through four criteria:
1. Credible causal mechanism
2. Association between variables
3. An appropriate time order: can Y cause X?
4. Elimination of alternative explanations: is there another variable, Z, that is related to X and/or Y?

Causal Theories
Theories typically specify a causal relationship: X → Y
Most theories concern the relationship between a single cause and a single effect.
• In other words, theories are bivariate: they involve 2 variables (X and Y).
But in the social sciences, relationships are neither deterministic nor bivariate.
• Variation in the dependent variable is caused by more than one factor (multivariate).
• As a result, we have to control/account for these other factors.
• Not controlling for other variables can result in incorrect causal inferences about the relationship between X and Y.

Example of Deterministic vs. Probabilistic Causal Relationship
Theory: An individual’s wealth affects his/her opinions on tax policy.
What is the implication if deterministic?
• Every wealthy person prefers lower taxes.
What is the implication if probabilistic?
• Wealthier people are more likely to favor lower taxes.
Warren Buffett, one of the wealthiest persons in the world, favors raising taxes on wealthy individuals.
• This doesn’t mean the theory is incorrect: the theory is probabilistic about the variables associated with favoring lower taxes.

Association and Time Order
Association
• How do we examine association?
  • Pearson’s r; t-test for b; scatterplot; contingency tables
• Association does not imply causation.
Time Order
• Cause precedes effect.
• In some cases, it is easy to establish time order, but not in other cases:
1. Race, gender, age precede attitudes or achievements.
2. Economic development (X) and democracy (Y): X → Y, but it is also possible that Y → X.

Elimination of Alternative Explanation
• Again, association (even with appropriate time order) does not imply causality.
• The relationship may be spurious.
  • There may be another variable (explanation) that explains the observed association.
• Example of ice cream sales and drownings.
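
The two line equations from the Deterministic vs. Probabilistic Relationships slide can be sketched in a few lines of Python. This is only an illustration: the intercept, slope, noise level, and sample size in the probabilistic part are made-up values, not estimates from any data.

```python
import random

def fahrenheit(celsius):
    """Deterministic relationship: Y = 32 + (9/5) X, with no error term."""
    return 32 + 9 / 5 * celsius

def probabilistic_y(x, a, b, noise_sd, rng):
    """Probabilistic relationship: Y = a + bX + e, with e drawn at random."""
    return a + b * x + rng.gauss(0, noise_sd)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Deterministic: the same X always produces exactly the same Y.
same_each_time = {fahrenheit(100) for _ in range(5)}

# Probabilistic: hypothetical values a=0, b=2, noise_sd=3 (assumptions).
low_x = [probabilistic_y(1, 0.0, 2.0, 3.0, rng) for _ in range(1000)]
high_x = [probabilistic_y(5, 0.0, 2.0, 3.0, rng) for _ in range(1000)]
mean_low = sum(low_x) / len(low_x)
mean_high = sum(high_x) / len(high_x)

# Individual exceptions: pairs where the low-X case has the higher Y.
exceptions = sum(1 for lo, hi in zip(low_x, high_x) if lo > hi)

print("distinct Fahrenheit values for 100 C:", same_each_time)
print("mean Y at X=1:", round(mean_low, 2), "mean Y at X=5:", round(mean_high, 2))
print("pairs where the low-X case has higher Y:", exceptions)
```

On average Y is higher when X is higher, yet some individual cases run the other way. That is the sense in which a single counterexample such as Warren Buffett does not refute a probabilistic theory.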

Statistical Control
When we evaluate causality between X and Y, we include other variables that affect Y to see if the relationship between X and Y remains.
Causal effect of height on math achievement: Do tall students tend to be better at math than short students?
• What are other factors that can account for this association?
  • Age, nutrition, . . .
• How do we control for the effect of age?
  • Look at math scores and height for students who are the same age or in the same grade.
We control for a variable by holding its value constant; such a variable is called a control variable.

Selection Effects
Also known as selection bias: some underlying variable or characteristic that affects the sample.
From the chapter example on school vouchers: parents with high involvement are more likely to be aware of voucher programs and also more likely to be involved in their child’s education (related to both X and Y).
The British during WW II were losing a lot of planes. Statistician Abraham Wald was asked to decide where to add more armor.

Association, Causality and Anecdotal Evidence
• Exceptional cases/anecdotal evidence do not necessarily disprove causality.
  • They are useful: they can, for example, help identify additional causal variables.
• We are dealing with probabilistic relationships:
  • Not all people who smoke two packs a day will get cancer.
  • If you smoke two packs of cigarettes a day you have, all else equal, a higher probability of getting cancer than people who do not smoke (i.e., p_smokers > p_non).
• Anecdotal evidence can disprove causality if it contradicts one of the three other criteria.

Control Variables and Theory
Knowing which variables to control for requires:
• Theory
• Knowledge of previous research
Failing to include relevant variables can bias your results and lead to incorrect inferences about the relationship between X and Y.
• Remember that the main goal of social science is to establish causal connections between variables.
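
The height and math achievement example can be illustrated with a small simulation. The sketch below assumes a made-up world in which age drives both height and math score and height has no effect of its own. All numbers (ages, slopes, noise levels) are hypothetical. Controlling for age is done the way the slide describes: by comparing only students of the same age.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # fixed seed so the sketch is repeatable
ages = (6, 8, 10, 12)

# Hypothetical data-generating process: age (Z) affects both variables,
# and height (X) has no direct effect on math score (Y).
students = []
for _ in range(5000):
    age = rng.choice(ages)
    height = 80 + 6 * age + rng.gauss(0, 5)
    math_score = 10 * age + rng.gauss(0, 10)
    students.append((age, height, math_score))

# Association ignoring age.
overall_r = pearson_r([s[1] for s in students], [s[2] for s in students])

# Association holding age constant (one correlation per age group).
within_age_r = {}
for age in ages:
    group = [s for s in students if s[0] == age]
    within_age_r[age] = pearson_r([s[1] for s in group], [s[2] for s in group])

print("overall r:", round(overall_r, 2))
for age in ages:
    print("age", age, "r:", round(within_age_r[age], 2))
```

In this simulated setup the overall correlation is strong, while the correlation within each age group is close to zero. This is what we would expect when the bivariate association is produced entirely by the control variable.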

So where should the British put the armor?
• The obvious answer: where we see all the bullet holes.
• Unexpectedly, however, Wald said to add armor where there was no damage.
• Why?
  • Only the bombers that returned were examined, so Wald concluded that bombers were able to withstand damage in those areas.

Multivariate Relationships: Spurious Relationships
Definition: A relationship between X and Y is said to be spurious if both variables are dependent on a third variable Z, and the observed association disappears when Z is controlled.
Some examples:
1. Ice cream and drownings. Third variable (Z): season/temperature.
2. Height and math scores. Third variable (Z): age.

Race and Political Participation in the US
Causal theory: Race affects political participation rates in the US.
Hypothesized causal relationship: Latinos and African Americans are less likely to participate.
Can we accept this as a causal relationship? Does it meet our four criteria?

Why did the relationship between race and political participation disappear once the researchers controlled for SES?

Multivariate Relationships: Multiple Causes
• Dependent (response) variables in the social sciences often have multiple causes.
  • Example: income is affected by education but also by location, job type, gender, etc.
• This is why it is so important to think about causal theory and possible control variables.
• Y can have independent and associated causes.
  • Independent causes of life expectancy: gender (women live longer than men).
  • Associated causes of life expectancy: education and income.

Race and Political Participation: The Four Criteria
1. Is there a reasonable causal mechanism that answers “how” and “why”?
  • Yes, reasonable: many informal and formal barriers to participation, which can have lasting effects on participation rates even after they are dismantled.
2. Can Y cause X?
3. Is there a correlation between race and participation in the US?
  • Yes, we see from the Verba et al. (1993) data that there is a relationship.
4. Are there other variables that, once controlled for, make the relationship between X and Y spurious?
  • Yes.
  • One important one is SES: SES is related to both participation and race.
  • Once the researchers controlled for SES, the relationship between race and participation disappeared.

Example: Treatment Choice and Breast Cancer Survival
Research question: Does treatment choice affect post-treatment longevity?
• What is the dependent variable? What is the independent variable?
• Two treatment choices: radical mastectomy and lumpectomy.
• The expectation is that radical mastectomy might on average increase longevity, given lumpectomy’s risk of leaving behind some cancer cells.
• But research finds no association between breast cancer treatment choice (X) and post-treatment longevity (Y).
• Third causal hurdle has not been crossed.

Example: Treatment Choice and Breast Cancer Survival
The causal hurdles:
1. First causal hurdle: Credible causal mechanism?
2. Second: Is it possible that Y can cause X?
3. Third: Can we imagine other factor(s) (Z) that might affect both treatment choice (X) and longevity (Y)?
  • Severity of cancer at detection acts as a selection mechanism (selection effect).
  • Patients who get radical treatment are systematically different from those who do not.

What research strategies do political scientists use to investigate causal relationships?
1. Experiments
  • Lab, survey, and field experiments.
2. Observational studies
  • Cross-sectional and time-series.

Importance of Random Assignment
• The dependent variable Y is caused by many factors.
• Randomization ensures that some other cause Z will not affect inference (the conclusions about the effect of X on Y).
  • This is because it ensures that participants across groups will not be systematically different.
  • It ensures that treatment and control groups are identical (or close to it).
• Any differences observed can be attributed to X.
It is important to distinguish between random assignment (experiments: how values of the independent variable are distributed) and random sampling (a method for choosing subjects/cases).

Chapter 4: Research Design
31 August 2016

Experimental Research Designs
Two key components of experiments:
1. Control over values of the independent variable (X).
  • Values of the independent variable are not determined by participants or by nature, but by the researcher.
  • Helps control for selection effects/bias and the influence of other variables (Z).
2. Values of the independent variable are randomly assigned to participants.
  • We not only want to control the values but also want to make sure they are randomly assigned.
  • Individuals are randomly assigned to the treatment or control group.

Aspirin and Blood Pressure
Research question: Do people who take aspirin have lower blood pressure than people who do not?
• Why can we NOT take a random sample of adults and compare the blood pressure of those who take aspirin and those who do not?
• There is a potential other variable, Z = health consciousness, that makes people more likely to take aspirin.
• This confounding variable affects both aspirin regimen and blood pressure.
• So is it health consciousness or aspirin that causes lower blood pressure?

Aspirin and Blood Pressure
How does an experimental research design help us determine the relationship between aspirin (X) and blood pressure (Y)?
1. All participants are in the same pool (health conscious and not).
2. Then randomly assign people to the treatment group (aspirin) and the control group (placebo).
3. Since X is determined by random assignment, it is unlikely to be correlated with anything.
4. And thus the comparison between X and Y is not affected by other variables.
Experiments do not rule out other factors; they just control for them to isolate the effect of one specific variable X.
• Multiple factors affect blood pressure: exercise, stress, etc.
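
The aspirin example can be simulated to show what random assignment buys us. The sketch below is built on loudly stated assumptions: in this made-up world aspirin has no effect at all on blood pressure, health-conscious people have blood pressure 10 points lower, and health-conscious people are more likely to choose aspirin on their own. None of these numbers come from real data.

```python
import random

def mean(values):
    return sum(values) / len(values)

def difference_in_means(assign_randomly, n, rng):
    """Mean blood pressure of aspirin takers minus that of non-takers."""
    treated, control = [], []
    for _ in range(n):
        health_conscious = rng.random() < 0.5  # confounder Z
        if assign_randomly:
            # Experiment: the researcher's coin flip decides X.
            takes_aspirin = rng.random() < 0.5
        else:
            # Observational: people self-select into aspirin use.
            takes_aspirin = rng.random() < (0.8 if health_conscious else 0.2)
        # Hypothetical outcome: Z lowers blood pressure, aspirin does nothing.
        blood_pressure = 130 - (10 if health_conscious else 0) + rng.gauss(0, 10)
        if takes_aspirin:
            treated.append(blood_pressure)
        else:
            control.append(blood_pressure)
    return mean(treated) - mean(control)

rng = random.Random(2)  # fixed seed so the sketch is repeatable
observational_diff = difference_in_means(False, 20000, rng)
experimental_diff = difference_in_means(True, 20000, rng)

print("observational difference:", round(observational_diff, 2))
print("experimental difference:", round(experimental_diff, 2))
```

Under these assumptions the observational comparison shows aspirin takers with clearly lower blood pressure even though aspirin does nothing, because health consciousness differs across the groups. With random assignment the groups are balanced on Z and the difference is close to zero.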

Experiments and Social Sciences
• Many of the relationships we study do not easily lend themselves to experimental research designs.
  • Example: effect of gender on political participation.
• We try to replicate the experimental research design as closely as possible.
  • Use of randomization and control variables to isolate the effect of X on Y.
There are also other types of non-laboratory experiments that are better suited to social science research questions:
• Survey experiments and field experiments.

Survey Experiment Example: List Experiment
One type of survey experiment is the list experiment:
• Have two lists with the names of groups and individuals.
• One list will have the “sensitive item” (treatment).
  • Taliban, Al-Qaeda, ISAF, etc.
• The other list will not (control).
• Respondents are randomly assigned to treatment or control.
• Because respondents are only asked how many of the groups or individuals on their list they support, this design offers a level of privacy (respondents are not telling you directly that they support the Taliban).
• Then you compare the average number of groups supported by respondents in the control group with the average in the treatment group.

External vs. Internal Validity
Internal validity: level of confidence in conclusions about causality from the study.
• Why do experiments have high internal validity?
External validity: level of confidence in the generalizability of the conclusions: can the results be generalized to a broader population?
• Why do experiments have low degrees of external validity?

Survey Experiment Example
Survey experiments can be helpful in eliciting truthful responses to sensitive questions.
For example, suppose we wanted to know civilian support for combatants during wartime.
• This is important for regimes and militaries to assess the effectiveness of strategies.
If we just asked civilians which group(s) they supported, the answers may not reflect true attitudes.
• Will civilians respond truthfully? Do they feel comfortable doing so?
• One way to deal with the issue of sensitive questions is to use a survey experiment.

List Experiment Example: Support for ISAF
Experiment from Blair et al. (2014):
“I’m going to read you a list with the names of different groups and individuals on it. After I read the entire list, I’d like you to tell me how many of these groups and individuals you broadly support, meaning that you generally agree with the goals and policies of the group or individual. Please don’t tell me which ones you generally agree with; only tell me how many groups or individuals you broadly support.”
Control: Karzai government, National Solidarity Program, local farmers.
Treatment: Karzai government, National Solidarity Program, local farmers, foreign forces.

Experimental research designs recap:
Two key characteristics: control over values of the independent variable and random assignment.
Experiments have high internal validity, low external validity.
Many research questions are not easily studied using an experimental research design.
• We cannot manipulate (or randomly assign) a lot of variables of interest: gender, military spending of countries, etc.

Assessing Causality with Observational Studies
1. Credible causal mechanism: connecting X and Y is the same for both types of research design.
2. Appropriate time order: more difficult in observational studies than in experimental research designs.
3. Association: even if there does not appear to be a strong bivariate relationship, we can still find a relationship after controlling for other factors (unlike in experiments).
4. Other factors: a comparison between different groups with different values of the independent variable is not sufficient to draw conclusions about the causal relationship between X and Y.
  • Later on in the course, we will talk about ways to control for other factors in observational studies.

Population vs. Sample
Definition: “The population is the total set of subjects of interest in a study.”
Definition: “A sample is the subset of the population on which the study collects data.”

Observational Studies
Definition: An observational study is a research design in which the researcher does not have control over values of the independent variable, which occur naturally.
• Most common type of research design in political science.
• Two main types: cross-sectional and time-series (and time-series cross-sectional).
• With observational studies, controlling for other factors increases our confidence in the causal relationship.
  • We cannot use random assignment to control for potential confounding factors.
• We follow the same four hurdles outlined in Chapter 3.

Sampling
Samples, and how we select samples, are really important in observational studies.
Why do we sample?
• To save time and money.
• Populations of units are large: it is almost always impossible to collect information on the whole population.
What is a sample?
• A subset of the population.
• Respondents in public opinion surveys are a subset of the population.

Sampling Designs
When we select samples, what matters is quality, not quantity.
• We want samples that are representative.
In order to ensure quality, we focus on the selection method.
• The best way to get quality (“statistically dependable results”) is to choose our sample randomly.

Example: Quantity vs. Quality
• In 1936, the Literary Digest mailed 10+ million surveys to predict the winner of the presidential election.
• Names were drawn from telephone and automobile registration lists.
• 25% of readers returned the survey ⇒ some 2.5 million readers responded.
• The survey predicted that Alfred Landon would win over Franklin Roosevelt.
• Previous polls had correctly predicted the winner.
• What are possible reasons for the incorrect prediction?

Sample Selection: Simple Random Sample
• Selects n subjects from a population N, where N can be individuals, families, cities, hospitals, etc.
Definition: “A simple random sample of n subjects from a population is one in which each possible sample of that size has the same probability (chance) of being selected.”
• A simple random sample has two properties:
1. Every subject or unit (individual, city, hospital) has the same probability of being chosen.
2. Independence: selection of one unit has no influence on the selection of other units.
• Drawbacks:
  • Need to know the population and have a sampling frame.
  • Completely unbiased independent samples are hard to find.
  • Why will randomly dialing phone numbers produce a biased sample?

Words of Warning
1. Our statistical methods depend on independence and lack of bias in our samples.
2. Without a randomized design, we cannot make dependable conclusions or inferences.
  • Randomization guarantees a level of accuracy/level of confidence.

Representative Sample
What is a representative sample?
• A microcosm of the population. Attributes of the population are reflected in approximately similar proportion.
• The 1936 Literary Digest poll was not representative.
Why is it important?
• Generalizability → affects our ability to make conclusions about the population from the sample.
• Random samples are important for representative samples and generalizability.

Other Sampling Methods
Because of our time and resource constraints, other sampling methods may be more cost-effective, efficient, and feasible:
1. Stratified
2. Cluster
  • Used often in surveys
3. Systematic
Although not simple random samples, these methods still retain elements of the simple random sample; most notably, they are probabilistic samples.

Example: The importance of sampling
It is important to keep in mind how sample data were drawn.
A content analysis of a daily newspaper examined the percentage of newspaper space devoted to news about entertainment.
The sampling frame consists of the daily editions of the newspaper for the previous year.
What potential problem might there be for your conclusions if you decided to examine papers sampled every 7 days?
Could we easily do a simple random sample?

Nonprobability Sampling
Commonly known as samples of convenience or opportunity.
How might the following samples lead to biased conclusions about the research interest?
• We want to study the decision-making behavior of legislators.
  • We select the legislators in Idaho, Montana, and Wyoming to study.
• We want to know the issue preferences of Pennsylvania voters.
  • We sample 50 millworkers in Pittsburgh.

Systematic Random Sampling
• Selecting one from every k. The first unit is drawn randomly.
• All cases (individuals, cities, etc.) have the same probability of being chosen, BUT not all possible combinations have the same probability.
• Good representation of the population.
• Problems arise when there is systematic bias in the list from which we draw the sample.

Cluster Sampling
Definition: “Divide the population into a large number of clusters, such as city blocks. Select a simple random sample of the clusters.”
• Multistage/multilevel sampling.
• It is easier to sample housing units (known and fixed) than individuals.
• All housing units are part of a geographically distinct area.

High Dose Birth Control and Breast Cancer
Scientists at Fred Hutch in Seattle found an increased risk of breast cancer for those taking high dose birth control. At first this might be alarming, but . . .
http://www.kplu.org/post/certain-birth-control-pills-linked-breast-cancer-risk-say-seattle-scientists
What is a fundamental question (or questions) we need answered before we can assess causality?
https://www.fredhutch.org/en/news/center-news/2014/08/Some-new-birth-control-raise-breast-cancer-risk.html

Systematic Random Sample in Practice
http://conflict.lshtm.ac.uk/page 36.htm

Chapter 5: Measurement and Operationalization
7 September 2016

Recap of Research Design
• Experimental studies
• Observational studies: cross-sectional and time-series.
What is the main difference between the two?
We also looked at sample selection and the importance of randomization.
• Quality over quantity.
• Probability vs. nonprobability: we want probabilistic samples: simple random, stratified, cluster.
We want to be aware of possible sources of bias in a sample.
• To evaluate causal claims, we need to know the sampling method.

Operationalization
Our theories are about relationships between concepts.
• The relationship between democracy and economic development.
The problem is that we don’t observe these concepts.
Evaluating our theories requires operationalization.
Operationalization: “The movement of variables from the rather abstract conceptual level to the very real measurable level.”
• Concept (economic development) → measurable and observable variable (GDP per capita).

Measurement
• Some concepts are easy to operationalize: economic output or the unemployment rate.
• Others are not: poverty.
  • We could measure it with some income cutoff (e.g., $3 a day).
  • The problem is that many Western countries provide transfer payments.
  • Should we use a pretransfer or a posttransfer definition?
  • The pretransfer definition gives a sense of the failings of the private sector.
  • The posttransfer definition gives a sense of how much welfare programs are falling short and of how people are actually living.
⇒ The appropriate measure will depend on your research interests.

Recap: Average Family Size
How many children are in your family?
Average family size was approximately 2 (1.87) children in 2015.
• Why is the number for this class higher (lower)?
1. If a family had zero children, it did not send any to college.
2. Sampling by child rather than by family:
  • A family with n children is n times more likely to be sampled.
We need to know whether the research sampled at random and what the sampling method (unit) was.
• Selection bias is one type of bias, but not the only potential source of bias.
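
The family-size puzzle can be worked through with a small calculation. The distribution of family sizes below is invented purely for illustration (it is not the 2015 data behind the 1.87 figure). The point is to compare the average when we sample families with the average when we sample children and ask each child how many children are in their family.

```python
# Hypothetical population: number of children -> number of families.
families = {0: 30, 1: 25, 2: 25, 3: 15, 4: 5}

def mean_per_family(counts):
    """Average number of children when the sampling unit is the family."""
    total_families = sum(counts.values())
    total_children = sum(n * k for n, k in counts.items())
    return total_children / total_families

def mean_per_child(counts):
    """Average reported family size when the sampling unit is the child.

    A family with n children contributes n respondents, each reporting n,
    and families with zero children contribute no respondents at all.
    """
    total_children = sum(n * k for n, k in counts.items())
    total_reported = sum(n * n * k for n, k in counts.items())
    return total_reported / total_children

per_family = mean_per_family(families)
per_child = mean_per_child(families)

print("average children per family:", per_family)
print("average family size reported by children:", round(per_child, 2))
```

With these made-up counts, the per-family average is 140/100 = 1.4, while the per-child average is 340/140, roughly 2.43. Asking students in a classroom corresponds to the second calculation, which is why the class average tends to be higher than the population figure.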

Operationalization: Theory of Economic Voting
Theory: Economic performance is related to incumbent vote share.
Causal statement: Strong economic performance causes higher incumbent vote share.
How we operationalize the concept of economic performance will affect our testable hypotheses.
With unemployment: a higher unemployment rate will lead to lower incumbent party vote share (negative).
With GDP growth rate: a higher GDP growth rate will lead to higher incumbent party vote share (positive).

Example: Measuring Depression
Depression is a real thing, but how should we measure it, and why does it matter if this concept is measured accurately?
• First, assume we want to study cures for depression.
• If we don’t measure depression correctly or accurately, how can we have confidence in the effectiveness of purported remedies?
Political science has a number of important concepts that, like depression, are difficult to measure:
1. Judicial activism: role of the judiciary in the policy making process.
2. Political legitimacy
3. Social capital: level of interconnectedness (cohesiveness of the population).
4. Democracy

Problems in Measuring Concepts of Interest
There are four criteria you want to ensure in measuring your concept:
1. Conceptual clarity: a clear sense of what concept we are trying to measure.
  • Do we want to measure income or standard of living?
2. Appropriate level of measurement: individual, state, country, etc.
3. Reliability
4. Validity

Measurement Bias and Reliability
Measurement bias: the systematic over(under)-reporting of values of a variable.
• Less of a problem than unreliability for theory testing.
• We are looking for general patterns: higher values of X associated with higher values of Y.
• If the measurement of X is systematically biased upward, the same general pattern of association between X and Y will be visible.
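
The claim that systematic measurement bias is less damaging for theory testing than unreliability can be checked with a simulation. The sketch below uses invented data: a true X, an outcome Y that depends on it, a biased measure that adds the same constant to every case, and an unreliable measure that adds random noise. The constant and the noise level are arbitrary assumptions.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)  # fixed seed so the sketch is repeatable

true_x = [rng.gauss(0, 1) for _ in range(5000)]
y = [x + rng.gauss(0, 0.5) for x in true_x]

# Biased measure: every value is over-reported by the same amount.
biased_x = [x + 5 for x in true_x]
# Unreliable measure: each value is off by a different random amount.
noisy_x = [x + rng.gauss(0, 2) for x in true_x]

r_true = pearson_r(true_x, y)
r_biased = pearson_r(biased_x, y)
r_noisy = pearson_r(noisy_x, y)

print("r with true X:      ", round(r_true, 3))
print("r with biased X:    ", round(r_biased, 3))
print("r with unreliable X:", round(r_noisy, 3))
```

A constant upward shift leaves the correlation unchanged, because the pattern of higher X going with higher Y is preserved. Random measurement error weakens the observed association and can hide a relationship that really exists. Note that this only covers bias that is the same for every case; bias that varies with X or Y is a different problem.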
Validity I
“A valid measure accurately represents the concept that it is supposed to measure.”
We want to account for potential response bias in thinking about measurement. For example, suppose we want to measure the level of prejudice and ask:
• Do you harbor large amounts of, some, or no prejudice when it comes to people of different races, religions, or ethnicities?
What is the problem?

Reliability
• Key characteristic: repeatable or consistent.
• Example of an unreliable bathroom scale:
  • Step on the scale once and it reads 150 pounds.
  • Step on the scale a second time (immediately) and it reads 146 pounds.
  • Different results ⇒ unreliability.
• Do not confuse variability with unreliability.
  • If I step on the scale five months later and it reads 155 pounds, that does not mean the scale is an unreliable measure of weight.

Common Types of Bias
1. Sampling bias: results from nonprobability samples or under- or over-coverage.
  • Samples of opportunity: for example, if Fox News or MSNBC asked viewers about the upcoming presidential election.
2. Response bias: results from poor question wording, ordering of questions, interviewer effects, and/or a subject giving an incorrect response.
  • Example: interest group contributions. (1) Should laws be passed to eliminate all possibilities of special interests giving huge sums of money to candidates? (2) Should laws be passed to prohibit interest groups from contributing to campaigns, or do groups have a right to contribute to the candidate they support?
  • To the first question, 80% said yes; to the second, 40%.
3. Nonresponse bias: sample subjects refuse to answer or participate.

Religiosity
We are interested in questions like: How does religiosity affect attitudes towards democracy?
One common way to measure it: How often do you attend religious services? (Question on the World Values Survey.)
What might be the problem with this measure if we are doing a cross-sectional study?
Attending religious services may not be a good indicator (or even an appropriate one) for all religions.
For example, in a majority-Buddhist country, a better measure might ask how often you light incense or pick auspicious dates for important events.

Validity II
So how do we determine a measure’s validity?
1. Face validity: superficial assessment of the measure’s validity.
  • Does it appear to be measuring what we want it to measure?
2. Content validity: identification of the important or essential elements of the concept.
  • Elections are one important element of democracy, but not the only one.
  • Forces the researcher to define the concept clearly (to have conceptual clarity).
3. Construct validity: degree to which the measure is related to other measures with which it should be theoretically associated.
  • The problem is if there is no association.

Measuring Democracy II
Validity
• There is consensus that democracy is a continuum: governments can have both authoritarian and democratic elements.
• Researchers have to clearly identify the core elements of democracy.
• Polity IV measure:
  1. Regulation of executive recruitment.
  2. Competitiveness of executive recruitment.
  3. Openness of executive recruitment.
  4. Constraints on the chief executive.
What is one criticism of the Polity IV measure?

Measurement: Main Takeaways
• The problem of measurement exists in all social sciences.
• We must have conceptual clarity and an understanding of the key elements of the concept.
• Think carefully about the data and the measurements of concepts.
  • What are the pros and cons of different operationalizations or measurements of our theoretical concepts?
• Sources of bias: measurement, sampling, response, nonresponse.
• Consequences of poor measurement:
  • The relationship between X and Y is obscured.
  • Low confidence in our findings.

Measuring Democracy I
• Conceptual clarity
  • There are different conceptualizations of democracy.
  • Robert Dahl, for example, argued that there are two critical elements: contestation and participation.
• Level of measurement: country.
• Reliability: a clear and transparent set of coding rules so that the measure can be replicated.
  • Freedom House (FH) and Polity are the two most common measures.

Consequences of Poor Measurement
The most important consequence is for our ability to draw confident conclusions about the relationship between X and Y.
The consequences of poor measurement depend on whether the measurement error is systematic or random.

Descriptive Statistics for Continuous and Categorical Variables
14 September 2016

Types of Variables
Remember our two main types of variables:
1. Categorical
  • Nominal: religion.
  • Ordinal: current family financial situation relative to one year ago: (1) Much better, (2) Somewhat better, (3) Same, (4) Somewhat worse, (5) Much worse.
2. Continuous: percentages, raw income data, economic indicators.
Variable type will affect the statistics and analyses we use.

Parameters vs. Statistics
Remember that we distinguish between population and sample:
• The population is the total set of subjects (units) we want to study.
  • Parameter: numerical summary of the population (usually represented by a Greek letter).
  • Example: $\mu$ (mu) for the mean.
• A sample is a subset of the population.
  • Statistic: numerical summary of the sample data (usually represented by a Roman letter).
  • We usually use a letter (either Roman or Greek) with a bar or hat: $\bar{Y}$ (Y-bar), $\hat{\beta}$ (beta-hat).

Moments and Rank Statistics
We have two main types of descriptive statistics:
1. Moments: “Set of statistics that describe the central tendency for a single variable and the distribution of values around it.”
  • The mean (1st moment); the variance (2nd moment).
2. Rank statistics: identify “crucial junctures” of a continuous variable when it is ordered from smallest to largest.
  • The median.

Equal Unit Differences
Another way to differentiate between categorical and continuous variables is to think about equal unit differences.
• “A variable has equal unit differences if a one-unit increase in the value of that variable always means the same thing.”
• Family financial situation: is the difference between “somewhat worse” and “same” (4 − 3) the same as the difference between “much worse” and “somewhat worse” (5 − 4)?
  • This difference is neither meaningful nor necessarily the same.
• Continuous variables have equal unit differences.
  • Age measured in years: a one-unit increase in age always has the same meaning (1 year older).

Descriptive Statistics
• We use descriptive statistics to summarize our data and describe the spread.
• Descriptive statistics summarize the center and the variability.
  • Measures of central tendency (e.g., the mean) and spread (e.g., the variance).
• We also use descriptive statistics to identify outliers and missing observations, and to examine skew.
• We can describe data with statistics or with graphs and tables.
• Which statistics and graphs you use will depend on which type of variable(s) you are using.

The Mean: The First Moment
Definition: “The mean is the sum of the observations divided by the number of observations.”
• Measure of central tendency.
• Let $Y_1, Y_2, \ldots, Y_n$ be the observations of a sample of size $n$ (i.e., there are $n$ observations in the sample).
• Denote the sample mean by $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$.
• Also known as the “average” value of a variable.

Properties of the Mean, I
• Only for quantitative variables (continuous and binary/dichotomous).
  • Why can we use the mean for binary variables?
• Influenced by outliers.
  • Using GNP per capita as a measure of average well-being (standard of living) is problematic in countries with high income inequality.
  • Two countries can have comparable GNP per capita but quite different poverty levels.
• We can use a weighted average to help reduce the effect of influential values on the mean.
  • A common weighted average is your grade in a course.
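The properties of the mean listed above can be sketched in a few lines. All data here are hypothetical, including the incomes, the binary responses, and the course weights. The sketch shows how one outlier pulls the mean, that the mean of a 0/1 variable is the proportion of ones, and how a weighted average such as a course grade is computed.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted average: each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical incomes (in thousands); the last value is an outlier.
incomes = [20, 22, 25, 27, 30, 400]
mean_all = mean(incomes)
mean_without_outlier = mean(incomes[:-1])

# Hypothetical binary variable (1 = voted, 0 = did not vote).
voted = [1, 0, 1, 1, 0]
share_voted = mean(voted)   # the mean of a 0/1 variable is the proportion of ones

# Hypothetical course: homework 20%, midterm 30%, final 50%.
scores = [90, 80, 70]
weights = [0.2, 0.3, 0.5]
course_grade = weighted_mean(scores, weights)

print(round(mean_all, 2), round(mean_without_outlier, 2))
print(share_voted, round(course_grade, 2))
```

With these made-up incomes, the single large value moves the mean far above every other observation, which is the concern raised about GNP per capita in unequal countries.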
The Median
Definition: “The median is the observation that falls in the middle of the ordered sample.”
• In other words, the median is the value at which half of the observations fall above and half fall below.
  • It is the $\frac{n+1}{2}$-th observation of the ordered sample.
• The median is a rank statistic of central tendency.
  • It is the value of the variable at the 50% rank: 50% of the observations fall below the median value and 50% fall above.
• If n is even, take the average of the two center observations.

Properties of the Mean, II
Zero-sum property: $\sum_{i=1}^{n}(Y_i - \bar{Y}) = 0$
The sum of the differences between each observation ($Y_i$) and the mean of Y ($\bar{Y}$) is equal to zero.
Least-squares property: $\sum_{i=1}^{n}(Y_i - \bar{Y})^2 < \sum_{i=1}^{n}(Y_i - c)^2 \quad \forall\, c \neq \bar{Y}$
$\bar{Y}$ gives a smaller sum of squared deviations than any other potential value, c.
We refer to the mean as the expected value ($E(Y)$) of a variable.
• With these properties, the mean is our best guess if we have no other information.

Properties of the Median
• Most appropriately used with quantitative variables.
• Mean = Median for symmetric distributions.
• Mean < Median for left-skewed distributions.
• Mean > Median for right-skewed distributions.
• The median is not influenced by outliers or by the spread of the distribution.
  • This makes it a better indicator of average wealth when we examine income data.

Kernel Density Plots
• Show the smoothed, calculated density of observations.

Skew
• Skewness is a measure of a distribution’s symmetry around the mean.
• There is a skewness statistic you can calculate, but don’t worry, you don’t have to!
• If skewness = 0, the distribution is symmetric around the mean.
• If skewness > 0, there are typically more observations below the mean than above (right skew).
• We assess skewness graphically or by comparing the mean to the median.
  • Kernel density plots (k-density).
  • If mean > median ⇒ right skewed.
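A short numerical check can make the median, the two properties of the mean, and the mean-versus-median comparison concrete. The seven observations below are hypothetical and chosen to be right-skewed, and `sum_of_squares` is an illustrative helper rather than anything from the slides.

```python
from statistics import mean, median

# Hypothetical right-skewed sample (one large value in the right tail).
y = [1, 2, 2, 3, 4, 5, 18]

y_bar = mean(y)
y_med = median(y)                 # middle value of the ordered sample
even_med = median([1, 2, 3, 4])   # even n: average of the two center values

# Zero-sum property: deviations from the mean add up to zero.
deviation_sum = sum(v - y_bar for v in y)


def sum_of_squares(c):
    """Sum of squared deviations of the sample around a candidate value c."""
    return sum((v - c) ** 2 for v in y)


# Least-squares property: the mean beats any other candidate value.
ss_mean = sum_of_squares(y_bar)
ss_median = sum_of_squares(y_med)

print("mean:", y_bar, "median:", y_med, "even-n median:", even_med)
print("sum of deviations:", deviation_sum)
print("sum of squares around mean:", ss_mean, "around median:", ss_median)
print("right-skewed by the mean > median check:", y_bar > y_med)
```

Here the mean exceeds the median, which is the informal signal of right skew described above.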
[Figure: kernel density plot of Incumbent Vote Percentage (y-axis: Density).]

[Figure: Kernel Density Plot of GDP per Capita.]
[Figure: Kernel Density Plot of Logged GDP per Capita.]
Mean > Median for right-skewed distributions.

The Variance and Standard Deviation
• The most important measures of dispersion (or spread) for describing the distribution of values.
• The population parameter is represented by $\sigma^2$ (variance) or $\sigma$ (standard deviation).
• Use $s$ for the statistic, the sample standard deviation (or sd), and $s^2$ for the sample variance.
  • Sometimes you will also see $\hat{\sigma}$ and $\hat{\sigma}^2$.
• The variance is the second moment (the mean is the first moment).

The Variance and Standard Deviation
• The variance measures deviation from a measure of central tendency (e.g., the mean):
  $Var(Y) = s_Y^2 = \frac{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}{n-1}$
• Why is it incorrect to add the deviations and then square the sum of deviations?
• The standard deviation is simply the square root of the variance:
  $SD(Y) = s_Y = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}{n-1}}$

Properties of the Standard Deviation
• $s \geq 0$
• $s = 0 \Rightarrow$ ?
• Influenced by outliers.
• If the distribution of observations is bell shaped (normally distributed), we can use the Empirical Rule:
  1. About 68% of observations fall within 1 standard deviation of the mean: $(\bar{Y} - s, \bar{Y} + s)$.
  2. About 95% fall within 2 standard deviations (more precisely, 1.96 standard deviations): $(\bar{Y} - 2s, \bar{Y} + 2s)$.
  3. Almost all remaining observations fall within 3 standard deviations.

The Empirical Rule
• Only for distributions of the population variable assumed to be normal (bell shaped).
• If the distribution is normal, the Empirical Rule follows mathematically.
• Many variables do in fact (empirically) approximately follow this distribution: height, weight, grades, job satisfaction, etc.
• We can assess the normality of a variable in two ways:
  1. Visually or graphically, using histograms or kernel density plots.
  2. Comparing central tendency measures: how different are the mean, median, and mode?
Commit this rule to memory (and know why it’s important).
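The variance, the standard deviation, and the Empirical Rule can be checked on simulated data. The sketch below draws 10,000 values from a normal distribution with an arbitrary mean of 50 and standard deviation of 10 (both invented), applies the $n - 1$ formulas above, and counts the share of observations within 1, 2, and 3 standard deviations of the mean.

```python
import random
from math import sqrt


def sample_variance(ys):
    """Sum of squared deviations from the mean, divided by n - 1."""
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** 2 for y in ys) / (len(ys) - 1)


def share_within(ys, k):
    """Share of observations within k sample standard deviations of the mean."""
    y_bar = sum(ys) / len(ys)
    s = sqrt(sample_variance(ys))
    return sum(1 for y in ys if abs(y - y_bar) <= k * s) / len(ys)


rng = random.Random(42)
ys = [rng.gauss(50, 10) for _ in range(10_000)]   # simulated bell-shaped variable

s = sqrt(sample_variance(ys))
# The raw deviations cancel out, which is why each one is squared before summing.
deviation_sum = sum(y - sum(ys) / len(ys) for y in ys)

share_1, share_2, share_3 = (share_within(ys, k) for k in (1, 2, 3))
print(f"s = {s:.2f}, sum of deviations = {deviation_sum:.6f}")
print(f"within 1 sd: {share_1:.3f}, 2 sd: {share_2:.3f}, 3 sd: {share_3:.3f}")
```

The sum of raw deviations is zero up to rounding, so squaring that sum would say nothing about spread.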
Other Ranks
In addition to the 50% rank (the median), we also use the 25% and 75% ranks.
• The 25% rank is known as the first quartile; the 75% rank is known as the third quartile.
• The difference between the 75% and 25% ranks is known as the interquartile range, or IQR.
• The 50% rank (the median) is a measure of central tendency; the IQR is a measure of spread (or dispersion).
• The IQR describes the spread of the middle half of the observations.
• The IQR is good for comparing variability across groups and across time.

Identifying Outliers
• We can use the IQR to identify outliers.
  • If an observation falls above 75% rank + 1.5(IQR), it is an outlier.
  • If an observation falls below 25% rank − 1.5(IQR), it is an outlier.
• We can also use a z-score.
  • This measures how many standard deviations away from the mean an observation falls.
  • $z = \frac{Y_i - \bar{Y}}{s_Y}$
  • Since it is unusual for an observation to fall more than 3 standard deviations from the mean (by the Empirical Rule), we consider observations with a z-score larger than 3 in absolute value to be outliers: if $|z| > 3$ ⇒ outlier.

Statewide Murder Rates
• We found that Louisiana (MR = 20.3) was an outlier if we just look at the 50 US states.
• Using the IQR method, is Louisiana still an outlier if we include DC?
  • The murder rate for DC is 78.5.
  • Including the new value, we have to recalculate the statistics (mean, quartiles, sd).
  • The 25% rank is 3.90 and the 75% rank is 10.40.
• First step, calculate the IQR: 10.40 − 3.90 = 6.5.
• Second step, multiply the IQR by 1.5: 1.5 × 6.5 = 9.75.
• Third step, ask: does Louisiana fall more than 9.75 above the upper (3rd) quartile or more than 9.75 below the lower (1st) quartile?
• Fourth step, calculate the upper and lower bounds.
  • Lower bound: 3.90 − 9.75 = −5.85; upper bound: 10.40 + 9.75 = 20.15.
• Conclusion? Louisiana is an outlier (just barely).

Outliers
Outliers: cases for which values are extremely high or extremely low relative to the rest of the values of the variable.
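The IQR steps from the murder-rate example can be written as a small helper. This is a sketch: the function names are hypothetical, the quartiles (3.90 and 10.40) and murder rates (20.3 and 78.5) are the ones given above, and the mean of 7.3 and standard deviation of 4.0 used for the z-score are the 50-state figures from the US murder rate example in these notes.

```python
def iqr_fences(q1, q3, k=1.5):
    """Lower and upper outlier bounds: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_iqr_outlier(value, q1, q3, k=1.5):
    """True if the value falls outside the IQR-based bounds."""
    lower, upper = iqr_fences(q1, q3, k)
    return value < lower or value > upper


def z_score(value, mean, sd):
    """Number of standard deviations the value lies from the mean."""
    return (value - mean) / sd


# Quartiles with DC included, as given in the example.
lower, upper = iqr_fences(3.90, 10.40)
louisiana_flag = is_iqr_outlier(20.3, 3.90, 10.40)
dc_flag = is_iqr_outlier(78.5, 3.90, 10.40)

# z-score for Louisiana using the 50-state mean and standard deviation.
z_louisiana = z_score(20.3, 7.3, 4.0)

print(f"bounds: ({lower:.2f}, {upper:.2f})")
print("Louisiana outlier:", louisiana_flag, "DC outlier:", dc_flag)
print(f"Louisiana z-score: {z_louisiana:.2f}")
```

Both rules flag Louisiana here, the IQR rule only narrowly, which matches the "just barely" conclusion in the example.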
• Outliers are important to identify (and account for) because they can seriously influence our measures.
• Outliers can also indicate a coding error (the data were entered incorrectly).
• We need to consider whether the value is consistent or inconsistent with the others.
• What do you do with outliers?
  1. Remove (never do).
  2. Transform.
  3. Leave alone.
  4. Report results with and without outliers.

Example: US Murder Rate and Violent Crime Rates
Data on the murder rate per 100,000 across US states.
1. Murder rates across the 50 US states: $\bar{Y} = 7.3$; $s = 4.0$.
  • Louisiana has a murder rate of 20.3 per 100,000. Is it an outlier?
  • $z = \frac{20.3 - 7.3}{4.0} = 3.25$.
  • If we knew the quartile ranks, we could also draw a conclusion based on them.

Statewide Murder Rates
1. Is DC (MR = 78.5) an outlier?
  • With DC, the new mean is $\bar{Y} = 8.73$ and $s = 10.63$.
2. We know from our previous exercise that 78.5 > 20.15.
3. But what about calculating the z-score?
  • $z = \frac{78.5 - 8.73}{10.63} = 6.56$
  • Since $|z_{DC}| > 3$, DC is an outlier.
4. Would we expect our rank statistics to be different excluding DC? Why or why not?
  • With DC: 3.90, 6.80, 10.40.
  • Without DC: 3.90, 6.70, 10.30.

The Mode and Its Properties
Definition: “The mode is the value that occurs most frequently.”
• Can be used for all data types (most commonly used for categorical variables).
• Mean = Median = Mode for unimodal, symmetric distributions.
• Why is the mode not very useful for continuous variables?

“The average income in America is not equal to the income of the average American”
• Average income: the mean, as measured by income per capita.
  • Average household income in 2012: $72,641.
• The issue: we do not know who is earning how much (i.e., income inequality).
  • Median household income: $51,939 in 2013.
• Average income can increase while the average American is no better off.
• The mean can be “statistically correct but grossly misleading.”

Describing and Measuring Association: The 3rd Causal Hurdle
• So far, we have studied ways to describe a single variable.
• However, we are interested in the relationship between two (or more) variables.
• Put another way: we’re interested in the association between two variables.
  • When X increases, does Y also increase (or decrease)?
• We call the analysis of association between two variables bivariate analysis.
• The dependent (or response) variable depends on, or is explained by, the independent (or explanatory) variable.

Relative Frequency
• Relative frequency: the proportion (or percentage) of observations that fall within a category.
  • Different from absolute frequency.
• For comparison, relative frequency is more useful.
  • We need some context; otherwise it is difficult to make sense of raw numbers.
  • Percentage of voters in the sample who prefer candidate A, as opposed to the raw number.
  • Oil rents per capita or oil rents as a percentage of GDP, as opposed to raw rents.
• Relative frequencies for continuous variables:
  • Break the variable down into intervals: age groups, income groups, and so on (create a categorical variable from a continuous one).

Bivariate Descriptive Statistics
Describing the relationship between two variables
19/21 September 2016

Analyzing Association: Categorical Variables
The most common way to examine the bivariate relationship between two categorical (ordinal or nominal) variables is to use a contingency table.
• A cross-tabulation depicts the relationship between two variables.
  • The column variable is the independent variable.
  • The row variable is the dependent variable.
• We are looking for percentage differences between categories.
  • There are statistics (summary measures of association) for this, which we will learn later on ($\chi^2$).
  • For now, we are just going to focus on percentage differences within each row across categories.

From the contingency table below, do we observe an association between gender and vote choice in the 2012 presidential election? Why or why not?
Gender and Vote Choice
          Romney    Obama
Male      43.52%    36.54%
Female    56.48%    63.46%
Total     100%      100%

Analyzing Association between Continuous Variables
Covariance: a way of summarizing the pattern of association between two variables.
The sample covariance:
  $cov_{XY} = s_{XY} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$
• Positive numerator when:
  1. $X_i - \bar{X} > 0$ and $Y_i - \bar{Y} > 0$
  2. $X_i - \bar{X} < 0$ and $Y_i - \bar{Y} < 0$
• Negative numerator when:
  1. $X_i - \bar{X} < 0$ and $Y_i - \bar{Y} > 0$
  2. $X_i - \bar{X} > 0$ and $Y_i - \bar{Y} < 0$

Covariance and Scatter Plots
[Figure 7.4. Scatter plot of change in GDP and incumbent-party vote share with mean-delimited quadrants. x-axis: Percentage Change in Real GDP Per Capita; y-axis: Incumbent-Party Vote Percentage. Quadrant signs: upper left (−)(+) = −; upper right (+)(+) = +; lower left (−)(−) = +; lower right (+)(−) = −.]

Correlation Coefficient
• The covariance tells us the direction (+, −, 0) of the association but not its strength.
• The covariance also has no theoretical upper bound and is affected by measurement units:
  • GDP per capita vs. GDP per capita in 1,000 US$.
  • If you use the first measure (GDP per capita), the covariance will be much larger than if you use the second.
• This makes our lives difficult, since we cannot compare different covariances to each other.
• Instead, we use the correlation coefficient (Pearson’s r), which accounts for measurement units by standardizing the scores.

Pearson’s r
• Also called the sample correlation coefficient.
• Pearson’s r is our estimate of the unknown population association $\rho$ (rho, pronounced “row”).
• We calculate the sample correlation coefficient like this:
  $r_{xy} = \frac{\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n-1}$

Properties of Pearson’s r
• Bounded between −1 and 1: $r \in [-1, 1]$.
• r > 0 ⇒ positive relationship: as X increases, Y increases.
• r < 0 ⇒ negative relationship: as X increases, Y decreases.
• r = 0 ⇒ no (linear) relationship.
• |r| = 1 ⇒ perfect (linear) relationship.
• Measures the degree of linearity.
• NOTE: If the relationship between x and y is not linear, r will estimate the association poorly and should not be used.

Scatterplots
Scatterplots are useful for examining the relationship between two variables.
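The covariance and Pearson's r formulas can be sketched as follows. The data are hypothetical: five made-up countries with GDP per capita in dollars and a second variable, mobile subscriptions per 100 people. The sketch rescales GDP to thousands of dollars to show that the covariance changes with the units while r does not.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation with n - 1 in the denominator."""
    v_bar = mean(values)
    return sqrt(sum((v - v_bar) ** 2 for v in values) / (len(values) - 1))


def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations over n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Average product of standardized scores, using n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


# Hypothetical data for five countries.
gdp_dollars = [1200.0, 4500.0, 9800.0, 22000.0, 41000.0]
gdp_thousands = [g / 1000 for g in gdp_dollars]
mobile_per_100 = [35.0, 60.0, 85.0, 110.0, 125.0]

cov_dollars = covariance(gdp_dollars, mobile_per_100)
cov_thousands = covariance(gdp_thousands, mobile_per_100)
r_dollars = pearson_r(gdp_dollars, mobile_per_100)
r_thousands = pearson_r(gdp_thousands, mobile_per_100)

print(f"covariance: {cov_dollars:.1f} (dollars) vs {cov_thousands:.4f} (thousands)")
print(f"Pearson's r: {r_dollars:.4f} (dollars) vs {r_thousands:.4f} (thousands)")
```

The covariance shrinks by a factor of 1,000 when the units change, while r stays the same, which is why r rather than the covariance is used to compare associations.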
We use scatterplots for two main reasons:
1. To see what the relationship between X and Y looks like.
  • We’re looking for a linear relationship.
  • When the relationship is not linear, we have to transform one of our variables or use another, much more complicated method of estimation, which unfortunately we will not have time to learn.
2. To assess the strength and direction of the relationship.
  • We can tell from a scatterplot whether the association is positive, negative, or close to zero.
http://www.statcan.gc.ca/edu/power-pouvoir/ch9/scatter-nuages/5214827-eng.htm

• You will not have to calculate the sample correlation coefficient, but you will need to know how to interpret it.
• This requires understanding what the equation is capturing (see Figure 7.4).

Scatterplots
• A graphical way to summarize the relationship between two quantitative, continuous variables.
• The independent variable is plotted on the x-axis.
• The dependent variable is plotted on the y-axis.
• We are looking for whether the two variables tend to move together.
• From the scatterplot, we get an idea of the direction of the relationship, its strength, and its functional form.
• For the purposes of this class, we are looking for a positive or negative linear relationship.

[Figure: Scatter Plot for Cell Phone Use and GDP per Capita.]
[Figure: Scatter Plot for Cell Phone Use and Logged GDP per Capita.]

Final Note on the Correlation Coefficient
Correlation, not causation:
• Pearson’s r only measures the degree of relationship, not whether there is a causal linkage between two variables.
• Think of the spurious relationships we’ve discussed.

Descriptive Statistics: Plots and Graphs
26 September 2016

[Figure: Bar Graph of Religious Identification.]

Bar Graphs and Histograms
Bar Graphs
• Frequency or relative frequency distribution for categorical variables.
• The height of each bar represents the (relative) frequency of observations within that category.
• Good for categorical variables (a pie chart can also be used).
Histograms
• (Relative) frequency distribution for quantitative variables.
• Each interval has a bar over it, with height determined by the number of observations falling within the interval.

[Figure: Bar Graph of Religious Identification. y-axis: Number of Cases; categories: Protestant, Catholic, Jewish, Other, None.]
[Figure: Pie Graph of Religious Identification. Categories: Protestant, Catholic, Jewish, Other, None.]
[Figure: Histogram of Mobile Phone Subscriptions (per 100 pop). y-axis: Frequency; x-axis: dat2$wef_mobile.]
[Figure: Histogram of Mobile Phone Subscriptions (per 100 pop). y-axis: Percent of Total.]

Box-Whisker Plots
• Summarize both variability and center.
• The box contains the middle 50% of the observations (all observations that fall between the 25% and 75% ranks).
• The median is marked with a line in the box.
• The whiskers mark the non-outlier minimum and maximum values.
• Outliers are marked with an asterisk.
[Figure 6.4. Box-whisker plot of incumbent-party presidential vote percentage, 1880-2004. Labels: highest non-outlier, 75%, 50% / median, 25%, lowest non-outlier, low outlier. y-axis: Incumbent Vote Percentage.]
[Figure: box-whisker plots of Mobile Phones per 100 by Region.]

Kernel Density Plots
• Show the smoothed, calculated density of observations.
[Figure: kernel density plot of Incumbent Vote Percentage (y-axis: Density).]

Bivariate Graphs
• We can use graphs and plots to look at the relationship between two (and three) variables.
• Common graphs:
  1. Scatterplots (2 continuous variables).
  2. Bar graphs by groups:
    • Can be used for two categorical variables, or for one categorical and one continuous variable.
[Figure: Scatterplot of Battle Deaths and Infant Mortality Rate. y-axis: Infant Mortality Rate (per 1,000 live births).]

Battle deaths looks like it is probably skewed (a lot of 0’s):
[Figure: Kernel Density of Battle Deaths. x-axis: Battle Deaths; y-axis: Density; N = 44, bandwidth = 124.1.]
What should we do? Transform the variable (log).
[Figure: Kernel Density of Logged Battle Deaths. y-axis: Density; N = 44, bandwidth = 0.8602.]

Infant mortality looks pretty good (in terms of approximating normality):
[Figure: Kernel Density of Infant Mortality Rate. y-axis: Density; N = 190, bandwidth = 8.828.]

Now let’s redo the scatterplot:
[Figure: Scatterplot of Battle Deaths and Infant Mortality Rate. x-axis: Battle Deaths (logged); y-axis: Infant Mortality Rate (per 1,000 live births); Pearson’s r = 0.0036.]

Bar Graph with 2 Categorical Variables
[Figure: Bar Graph of Gender and Vote Choice, 2004 Election. Groups: Male, Female; categories: Voted for Kerry, Voted for Bush.]

Describing One Variable
We can describe a variable with both statistics and graphs.
• Continuous variables: histograms, box plots, k-density plots.
• Categorical variables: pie chart, bar graph.
Remember: the type of statistics or graphs you use depends on the type of variable(s) you have.

Describing One Variable
We’re looking at four primary characteristics:
1. Location: measures of central tendency (typical values).
  • Mean, median.
  • The mean has nice properties (the zero-sum and least-squares properties) that allow us to make a best guess without any additional information.
2. Spread: measures of dispersion.
3. Shape of the distribution:
  • Does the variable’s distribution approximate normality?
4. Sample size (n).
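The four characteristics can be collected in one summary, and the same summary can show what a log transformation does to a skewed variable. This sketch uses simulated, strictly positive, right-skewed data rather than the battle-deaths data, and `describe` is a hypothetical helper. A variable with many zeros would need different handling, since log(0) is undefined; adding 1 before taking the log is one common choice.

```python
import math
import random
from statistics import mean, median, stdev


def describe(values):
    """Location, spread, a rough shape check, and the sample size."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        # Mean minus median: well above zero suggests right skew.
        "mean_minus_median": mean(values) - median(values),
    }


rng = random.Random(7)
# Simulated right-skewed variable (log-normal draws are always positive).
raw = [rng.lognormvariate(4, 1.5) for _ in range(200)]
logged = [math.log(v) for v in raw]

raw_summary = describe(raw)
logged_summary = describe(logged)

for name, summary in (("raw", raw_summary), ("logged", logged_summary)):
    print(name, {k: round(v, 2) for k, v in summary.items()})
```

In the simulated data the raw mean sits well above the raw median, while the logged mean and median are close together, which is the pattern the log transformation is meant to produce.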
Limitations of Descriptive Statistics and Graphs
• Always look at basic descriptive statistics and graphs.
  • It is very easy to miss outliers, coding errors, or skewed distributions if you do not.
• However, we cannot test theories with just one variable.
  • Theories are about causal relationships (between two or more variables).
• As such, we want to look at bivariate and multivariate relationships.
  • 4th Causal Hurdle: we need to control for other possible variables.