These 36 pages of class notes were uploaded by Michal Antonov on Sunday, July 24, 2016. The notes belong to GVPT100, Scope and Methods for Political Science Research, at University of Maryland - College Park, taught by Johanna Birnir in Spring 2016.
Date Created: 07/24/16
Variables

Measuring Concepts - Step 3: create variables
• Variables are "empirical measurements of characteristics"
• Variables can take on different values - they vary
• There are different types of variables

Levels of measurement
• There are three main levels of measurement for variables
  o Nominal - categories
  o Ordinal - the categories are ordered
  o Interval - the values of the variable communicate actual differences
• Each of these types of variables can be measured with numbers, but the numbers have different meanings
• How we measure variables can also have an effect on our findings
  o Example - effect of type of government on civil war
  o General finding: democracy doesn't have a big effect on civil war
  o But if we divide governments into three types
    • Strong democracies: civil war less likely
    • Strong autocracies: civil war less likely
    • Semi-democracies: civil war more likely

Nominal variables
• Also called "categorical variables"; if they have numbers, the numbers just indicate categories
• Categories with no rank - no category is "better" or "more of" or "greater" than another; the number assigned is arbitrary
• Examples
  o Region names
  o State names
  o Party labels

Ordinal variables
• Can be ranked categories
  o Low, medium, high
• Can assign numbers
  o Example - countries are poor (1), middle income (2), rich (3)
• Numbers show order, but we cannot say that 2 is twice 1, etc.
• Make up many of the questions in survey research
• Gauge approval or disapproval of government policies
  o Immigration reform
  o Same-sex marriage

Interval variables
• Can use math on numbers
• Distance between numbers is meaningful
• Example
  o Average household income
  o A household making $100,000 has twice the income of a household making $50,000

Dichotomous variables
• An additional kind of nominal variable (dummy variables)
• Can only take on two values
• Examples
  o Dead/alive
  o In war/not in war
  o Republican/non-Republican

Likert scale
• 5-7 value ordinal variables that capture strength and direction of agreement
  o Agree strongly
  o Agree somewhat
  o Neither agree nor disagree
  o Disagree somewhat
  o Strongly disagree

Index
• Additive combination of ordinal variables, each of which is coded identically
• Also called summative scale or ordinal scale
• Allows for a more reliable measure of a characteristic

Measures of Central Tendency

Descriptive statistics
• Once we have measured concepts and created variables, we want to look at those variables
• What does the sample look like?
  o Where do most cases fall?
  o What's the average value?
  o Is there a lot of variation?
• The appropriate way to describe variables depends on the level of measurement
  o Nominal
  o Ordinal
  o Interval
• For nominal and ordinal variables, we can use frequency distributions
  o Raw frequency
  o Percentage
  o Cumulative percentage

Frequency Distribution: Nominal Variable

Region          Frequency   Percentage   Cumulative
Northeast       203         16.99        16.99
Mid Atlantic    179         14.98        31.97
South           210         17.57        49.54
Midwest         118         9.87         59.41
Southwest       138         11.55        70.96
Mountain West   154         12.89        83.85
West            193         16.15        100
Total           1195        100

Frequency Measures
• Frequency
  o Raw frequency = number of cases with the same value
  o Total frequency = total number of cases
• Percentage = number of cases with the same value divided by the total number of cases, times 100
• Cumulative percentage = percentage of cases at or below a specific value
  o Measure for ordinal-level variables - relative amount of the characteristic being measured

Skew
• Negative skew = distributions with a skinnier left-hand tail
• Positive skew = distributions with a skinnier right-hand tail
• Mean is sensitive to skew but the median is not
  o Median = "resistant measure of central tendency"

Measures of central tendency
• Mean - average of all values
• Median - line all values up, middle one
• Mode - most frequent value

What to use when?

Type       Mode   Median   Mean
Nominal    Yep
Ordinal    Yep    Yep
Interval   Yep    Yep      Yep

Frequency Distribution: Ordinal Variable

Party Identification     Frequency   Percentage   Cumulative
Strong Democrat          203         16.99        16.99
Weak Democrat            179         14.98        31.97
Independent Democrat     210         17.57        49.54
Independent              118         9.87         59.41
Independent Republican   138         11.55        70.96
Weak Republican          154         12.89        83.85
Strong Republican        193         16.15        100
Total                    1195        100

Graphics
• Another way to describe variables is graphically
• Bar chart/pie graph for categorical variables
• Histogram for interval variables

Dispersion
• There is more to describing a variable than reporting its measure of central tendency. Variables (esp. political variables) are also described by dispersion
• How are the cases spread across the possible values?
• Standard deviation
  o Average distance from the mean
• Standard deviation tells you how close to the mean the observations typically are
• Example:
  o Country A: mean income
  o Country B:

Dispersion: normal distribution and variance

Dispersion of Nominal Variables
• Low
  o One mode prominent
  o Bar chart has one peak
  o Fewer cases in nonmodal categories
• High
  o Bimodal/multiple modes
  o Bar chart does not have one peak
  o Cases widely spread

Dispersion of Ordinal Variables
• Low
  o Mode and median same or similar
  o Single-peak bar chart
  o Cases cluster around the median
• High
  o Mode and median separated by at least one value
  o Not a single-peak bar chart
  o Cases spread out

Dispersion of Interval Variables
• Low
  o Median and mean similar
  o Bar chart has a single peak
  o Most cases cluster around the mean
• High
  o Median and mean different
  o Bar chart does not have a single peak
  o Cases spread out

Theory and Hypothesis

Theories
• Generally, in empirical political research, we seek to develop and test theoretical arguments
  o Ex.: what percentage of likely voters will vote for President Obama?
  o Ex.: Are people with more education more likely to vote?
• Theories are explanations which generally make causal statements
  o If X happens, then Y will follow as a result
• Generally probabilistic, not deterministic
• Identifies a relationship between an independent and dependent variable

Steps to Theory Building
• Identify actors that matter for your theory
  o Example: why do some countries have more open trade than others?
• Identify assumptions behind your argument
  o Ex.: what do voters care about? What do politicians care about?
• Identify premises flowing from these assumptions and show how they relate to each other

Croco (2011) Example: Actors
• Assumptions
  o Leaders
    • Want to stay in power
    • Have access to more information than citizens
  o Citizens
    • Want competent leaders
    • Will be more risk averse to costs of continued war
    • Will punish leaders who are responsible for bad outcomes: Culpable Leaders
• Premises that flow from the assumptions
  o Leaders
    • Culpable leaders will win more than non-culpable leaders
    • Culpable leaders will preside over more extreme outcomes, both good and bad, than non-culpable leaders
  o Citizens
    • Culpable leaders who lose are more likely to be punished by citizens than non-culpable leaders who lose

IV and DV
• Dependent variable: variable that represents the effect in a causal explanation
• Independent variable: variable that represents a causal factor in an explanation
• DV is expressed as Y
• IV is expressed as X
• Examples
  o Culpable leaders more likely to win
    • IV: culpable leader
    • DV: war outcome

Theory vs. Hypothesis
• They differ
• Theory - argument explaining why X has some effect on Y
• Hypothesis - empirical prediction from that argument
• Example
  o Theory - the costs of losing a war are higher for leaders of democracies because they will be removed from office. Therefore, leaders of democracies are very selective about which wars they will fight
  o Hypothesis - democracies are more likely to win wars than non-democracies

Hypothesis
• A hypothesis tells you several things
  o Unit of analysis
  o Dependent variable
  o Independent variable
  o Direction of the predicted relationship
• Specific theories can often lead to predictions at different units of analysis
• Ex: theory that economic performance affects chances of re-election
  o Potential units of analysis
    • Individual
    • Congressional districts
    • Election cycle

Example 1
• Hypothesis: In comparing countries, those that are democracies will have higher rates of economic growth than those that are non-democracies
• Unit of analysis: country
• Independent variable: type of government (democracy/non-democracy)
• Dependent variable: rate of economic growth
• Predicted relationship: democracy is positively associated with rates of economic growth

What not to do
• The main determinant of war is the distribution of power in the international system
• In comparing individuals, annual income and the level of education are related
• Democracies are peaceful
• In comparing individuals, some people are more likely to favor the death penalty than others

Making comparisons
• Cross-tabs
  o DV and IV are nominal or ordinal
  o Frequency distributions
• Mean comparisons
  o DV is interval; IV is nominal or ordinal
  o Distribution of means
• Cross-tabs
  o Rules
    • IV values are the columns, DV values are the rows
    • Calculate percentages within categories of the IV
      § Columns always sum to 100, rows do not
    • Compare across columns for the same value of the DV
• Relationships between variables
  o Direct: IV goes up, DV goes up
  o Inverse: IV goes up, DV goes down
  o Linear: an increase in the IV leads to a consistent increase OR decrease in the DV
  o Curvilinear: relationship between DV and IV depends on the value of the IV being examined

Experiments and Controlled Comparisons

Experiments
• An experimental design controls for the impact of all other factors and examines directly the effect of the independent variable on the dependent variable
• Steps
  o Assign two groups randomly; each individual has an equal chance of being in each group
    • Treatment/test group
    • Control group
  o Pre-measure each group on the dependent variable
  o Apply the treatment to the test group
  o Measure each group on the dependent variable, compare changes
• Kinds of experiments
  o Lab
  o Field
  o Survey
• Have a high degree of internal validity
  o The researcher can isolate the impact of the independent variable on the dependent variable
  o Randomization and pre-measurement assure that the impact is from the independent variable
• Experiments may not have external validity, so it can be hard to generalize to the real world
  o Setting can be artificial
  o Individuals may act in ways they think the researcher wants
• More terms to know
  o Research design: overall setup for looking at the relationship between X and Y
  o Experimental design: ensure test and control group are the same in EVERY WAY except one: treatment
  o Selection bias: non-random assignment to treatment and control groups
  o Controlled comparison design: look at the effect of X on Y while controlling for other factors

IMPORTANT
• Control variable is not the same thing as control group
  o Control group: gets the placebo treatment, or is not altered
  o Control variable: included in models to account for rival explanations.
    You add control variables when you
    • Want to rule out the influence of other factors
    • Know something else matters, but want to see how your IV of interest compares in terms of effect

Examples
• Medical
  o Hypothesis: In comparing individuals with colon cancer, those who receive Drug A will have less severe levels of cancer than those who do not
  o Steps:
    • Set of individuals with colon cancer
    • Randomly assign to two groups; one will receive Drug A, one will receive a placebo
    • Pre-measure levels of cancer severity in the two groups
    • Give treatment
    • Measure levels of severity in the two groups

Implementation problems
• Many questions we are interested in cannot be studied through an experimental design
  o Cannot manipulate everything
  o Ethical concerns
• Examples
  o Effect of democracy on economic development
  o Effect of war on trade

That's why there's observational research!
• In an experiment, the researcher assigns individuals to a control and treatment group
• However, in social science, we often do not have this power
• So, in observational research, we attempt to observe the effect of a treatment
• In true experiments, the researcher decides who receives control and who receives treatment
• In observational research, we observe this; we don't decide it.
• Many ways to collect data for observational research:
  o Surveys
  o Archival research
  o Existing datasets
  o Interviews
• When testing hypotheses, need to worry about omitted variables. This is why specifying control variables is important

Compositional differences
• Any characteristic that varies across categories of an IV
• If the variation is systematic, we have a problem
• Example: women are more likely to be Democrats

Rival explanations
• X: IV
• Y: DV
• Z: Rival explanation (soon to be control variable)
• Z may or may not be a big deal. It is a big deal if it defines a large compositional difference across values of X.
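
A compositional difference can be checked with a few lines of Python. This is only a minimal sketch: the records below are made up for illustration (they are not course data), and the names `respondents` and `share_by_group` are hypothetical. It computes the percentage of women (Z) within each party (X); a large gap between the two percentages is the kind of compositional difference that makes Z a serious rival explanation.

```python
from collections import defaultdict

# Hypothetical, made-up respondents as (party, gender) pairs. Not real survey data.
respondents = [
    ("Democrat", "woman"), ("Democrat", "woman"), ("Democrat", "woman"),
    ("Democrat", "man"),
    ("Republican", "woman"),
    ("Republican", "man"), ("Republican", "man"), ("Republican", "man"),
]


def share_by_group(records, z_value):
    """For each value of X, return the percentage of cases whose Z equals z_value."""
    totals = defaultdict(int)   # number of cases in each category of X
    matches = defaultdict(int)  # number of those cases with Z == z_value
    for x, z in records:
        totals[x] += 1
        if z == z_value:
            matches[x] += 1
    return {x: 100.0 * matches[x] / totals[x] for x in totals}


shares = share_by_group(respondents, "woman")
# Compositional difference in Z across the two values of X (percentage points).
gap = shares["Democrat"] - shares["Republican"]
print(shares)
print(gap)
```

If the gap were close to zero, Z would be spread evenly across the categories of X and would be less of a threat to the X-Y relationship.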
Spurious relationship
• Z defines a large compositional difference across X

Example

          Democrats    Republican
Favor     7            5
Oppose    5            7
          7/12 favor   5/12 favor

          Democrats    Republican
Favor     X X X X X X X    X X X X X
Oppose    X X X X X        X X X X X X X
          7/12 favor   5/12 favor

Lots of women in the Democratic column; more likely to favor gun control

Stork example
• Places with high birth rates also have a lot of storks...
• OMG! STORKS BRING BABIES!
• Or... storks just roost in places with lots of people...

Additive relationship
• X and Z both have relationships with Y that complement one another (i.e., knowing about both helps you know more about Y)
• The control variable (Z) is a cause of the DV (Y) but defines a small compositional difference across values of X.
  o Because the relationship between X and Z is weak, X retains a causal relationship with Y after controlling for Z

Interactive relationship
• The effect of X on Y DEPENDS ON the value of Z
• Example: effect of X on Y is strong for low values of Z but weak for high values

Zero order relationship
• Overall association between two variables that does not take into account other possible differences

Controlled comparison table
• Allows us to determine the controlled effect
  o The relationship between an IV and a DV within one value of a CV
  o CV = Z

Partial effect
• Effect of party among women: 16.6
• Effect of party among men: 19.7
• Q: what is the partial effect of party on gun control opinions controlling for gender?
• A: about 17 (the rough weighted average of the two individual effects)

Rule of direction

Identifying a pattern
• Does a relationship exist between the IV and DV in at least one value of the CV?
  o If no: spurious
  o If yes: go to question 2
• Is the tendency of the relationship between the IV and the DV the same for all values of the CV?
  o If no, then an INTERACTION is taking place
  o If yes, go to question 3
• Is the strength of the relationship between the IV and the DV the same for all values of the CV?
  o If yes, ADDITIVE
  o If no, INTERACTION

Foundations of Statistical Inference

Sampling
• Our hypotheses are predictions about the effect of an IV on a DV in a population of cases
  o Examples: country years, individuals, registered voters
• But we can generally only examine the relationship in some sample of these cases
  o Examples: all countries from 1970-1999, a group of registered voters surveyed

Sample and population
• Two key concepts
  o Population - the universe of subjects the researcher wants to describe
  o Sample - a number of cases or observations drawn from a population
• Leads to two different values
  o Population parameter - actual value
  o Sample statistic - estimate of the population parameter

Inferential statistics = set of procedures for deciding how closely a relationship we observe in a sample corresponds to the unobserved relationship in the population from which the sample was drawn

Population
• Universe of cases the researcher targets
• Statistics generated = population parameters (a characteristic of the population)

Sample
• Cases and observations drawn from the population
• Statistics generated = sample statistics (estimate of population parameter based on sample drawn from the population)

Why do we sample?
• It may not be possible to identify all population members
• If possible, collecting information on each population member is:
  o Time consuming
  o Expensive
  o Logistically difficult
• Sample data can be better
  o Quick to capture mobile populations
  o Collect without too much awareness

Examples
• Moneyball
  o Don't look at idiosyncratic features of players. Look at their overall performance data
  o Maximize the number of runs, not whether someone looks like a "complete package"
  o Result: a better team for less money
• Criminal justice
  o Take a massive amount of data.
    Figure out what predicts violent crimes and repeat offenders
  o Use those predictors when sentencing
  o Result: limiting incarceration

Key terms
• Random sample
  o Every member of the population has an equal chance of being chosen for the sample
• Sampling frame
  o Method for defining the population the researcher wants to study
• Selection bias
  o Some members of the population are more likely to be included in the sample than others
• Response bias
  o Some members of the population are more likely to respond than others

Random sampling
• In order to obtain the best estimate, it is crucial to have a random sample
  o Each individual in the population has an equal chance of being in the sample
• Not using a random sample introduces the possibility of additional error in the sample
• Do not want additional error, because all samples already contain "error" in that the sample statistic is different from the population parameter
  o Random sampling error
• The ideal situation
• Hard to do in practice - how to ensure that everyone has an equal chance of ending up in the sample and of participating
• Examples of potential problems
  o Internet surveys?
  o Mail surveys?
  o Phone calls with random digit dialing?
  o Knocking on doors of houses?
• Still, important to try to get as close as possible
• Always random sampling error, want to minimize other error
• Random sampling error decreases as the size of the sample increases
  o Margin of error is larger in a smaller sample

Error and samples
• Three possible ways to get an error
  o Selection/sampling bias: some individuals are more likely to be included in the sample than others.
    Wrong sampling frame
  o Response bias - some individuals are more likely to be measured than others
  o Random sampling error - the extent to which a sample statistic differs by chance from the population parameter

Other approaches
• Quasi-random sampling - or "cluster sampling"
  o Example - picking specific localities, and then randomly selecting individuals within them
  o Can minimize costs of conducting face-to-face interviews
• Purposive sampling
  o Over-representing some groups to make comparisons
    • Example - comparing college students with adults generally
    • Important - only allows for comparing groups, not for generalizing to the overall population

Sampling error
• Size of the sample
  o The bigger the sample, the smaller the error
  o The smaller the sample, the bigger the error
• Variation in the population characteristic being measured
  o The bigger the variation, the bigger the error
  o The smaller the variation, the smaller the error

Variance = dispersion of cases across the values of the variable

Calculate standard deviation
1. Calculate each value's deviation from the mean
2. Square each deviation
3. Sum the squared deviations
4. Calculate the average of the squared deviations (the sum divided by the number of cases) = variance
5. Take the square root of the variance = standard deviation

N = population size
n = sample size
µ = true mean of the population
σ = population standard deviation

THE BOTTOM LINE
• Central limit theorem
• If we were to take an infinite number of samples of size n from a population of N members, the means of these samples would be normally distributed
• If we take enough samples of a population, the mean values for a given variable in each sample will have an approximately normal distribution
• Allows us to calculate the likelihood that a given sample deviates from the true population mean

Calculus: integrals; area under the graph

How to convert to Z score
• Find the value's "deviation from the mean"
• Divide it by the standard deviation

Z = (deviation from the mean) / (standard deviation)

Confidence intervals
• 95% confidence interval = the interval within which 95% of all possible sample estimates will fall by chance
  o Range within which we are 95% confident that the population parameter falls

Calculating confidence intervals
• Lower bound = sample mean - 1.96 × (standard error)
• Upper bound = sample mean + 1.96 × (standard error)
• The CLT allows us to assume that the boundaries of the confidence interval are defined by the sample mean minus and plus 1.96 standard errors.
• Conclude: 95% of all possible random samples yield sample means within these bounds.

Example: 95% confidence interval short-cut
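
The five-step standard deviation recipe, the Z score conversion, and the 95% confidence bounds described in these notes can be sketched in Python. This is an illustrative sketch, not the course's own worked example: the `sample` values are made up, the function names are hypothetical, and the standard error is assumed to be the standard deviation divided by the square root of the sample size, which is the usual definition but is not spelled out in the notes. The variance divides by the number of cases, as in step 4 of the notes; many textbooks divide by n - 1 when working with a sample.

```python
import math


def standard_deviation(values):
    """Follow the five steps from the notes (dividing by the number of cases)."""
    mean = sum(values) / len(values)
    deviations = [v - mean for v in values]   # 1. deviation from the mean
    squared = [d ** 2 for d in deviations]    # 2. square each deviation
    total = sum(squared)                      # 3. sum the squared deviations
    variance = total / len(values)            # 4. average = variance
    return math.sqrt(variance)                # 5. square root = standard deviation


def z_score(value, mean, sd):
    """Z = (deviation from the mean) / (standard deviation)."""
    return (value - mean) / sd


def confidence_interval_95(values):
    """Sample mean minus and plus 1.96 standard errors."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: standard error = standard deviation / sqrt(n).
    standard_error = standard_deviation(values) / math.sqrt(n)
    return mean - 1.96 * standard_error, mean + 1.96 * standard_error


# Made-up sample values, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
sd = standard_deviation(sample)
z = z_score(9, sum(sample) / len(sample), sd)
lower, upper = confidence_interval_95(sample)
print(sd, z, lower, upper)
```

With a larger sample the standard error shrinks, so the interval narrows, which matches the point above that random sampling error decreases as sample size increases.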