Introduction to Social Statistics
Introduction to Social Statistics SOCY 2061
This 72 page Class Notes was uploaded by Heloise Glover on Thursday October 29, 2015. The Class Notes belongs to SOCY 2061 at University of Colorado at Boulder taught by Randall Kuhn in Fall. Since its upload, it has received 30 views. For similar materials see /class/231815/socy-2061-university-of-colorado-at-boulder in Sociology at University of Colorado at Boulder.
Date Created: 10/29/15
CHAPTER 6 PROBABILITY

Probability
- Statistics is about making ESTIMATIONS about a POPULATION from a SAMPLE.
- In general, there will be a difference between a population PARAMETER and a sample STATISTIC.
- The rules of probability help us to make sense of this error. E.g., what is the probability that my SAMPLE actually looks like the POPULATION?

What is Probability?
The probability of an event A occurring depends on two things:
- The number of outcomes indicated by A
- The total number of possible outcomes
$p(A) = \frac{\text{number of outcomes indicated by } A}{\text{total number of possible outcomes}}$
Probabilities range from 0 to 1.0.

Probability Examples
- What is the probability of drawing the king of hearts? One king of hearts divided by 52 cards: 1/52 = .019
- What is the probability of drawing a king? Four kings divided by 52 cards: 4/52 = .077
- What is the probability of drawing a heart? 13 hearts divided by 52 cards: 13/52 = .250
- If hearts are the only cards dealt, what is the probability of drawing a face card (king, queen, jack)? 3 face cards divided by 13 cards: 3/13 = .231

Simple Random Samples
- The use of probability in statistical inference is based on the idea that a sample is simply a random selection of cases from a population.
- Just as we can know what to expect from a poker hand or a random card drawn from a deck, we should expect certain things from a random sample from any population.
- Our use of probability in assessing statistical samples is based on how well the data approximate a simple random sample.

Simple random samples have uniform probability of selection
- Each individual in the population must have an equal chance of being selected.
- Convenience samples simplify things by sampling only from certain strata or clusters.
- Ex.: A stratified sample picks communities at random, then chooses individuals at random in each stratum.
- An individual's weight in the sample can be adjusted up if he was less likely than others to be selected, or down if he was more likely.
- Some groups are artificially removed from the population, like the institutionalized population.
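The card-drawing probabilities from the Probability Examples in this chapter can be checked with a few lines of Python. This is only a minimal sketch: the helper name `probability` is an assumption made for illustration and is not part of the course material.

```python
from fractions import Fraction


def probability(favorable: int, total: int) -> Fraction:
    """Outcomes indicated by the event divided by all possible outcomes."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need total > 0 and 0 <= favorable <= total")
    return Fraction(favorable, total)


# The four card examples from the notes (standard 52-card deck).
king_of_hearts = probability(1, 52)
any_king = probability(4, 52)
any_heart = probability(13, 52)
face_card_among_hearts = probability(3, 13)

for label, p in [
    ("king of hearts", king_of_hearts),
    ("any king", any_king),
    ("any heart", any_heart),
    ("face card, hearts only", face_card_among_hearts),
]:
    # Show both the exact fraction and the rounded decimal used on the slides.
    print(f"{label}: {p} = {float(p):.3f}")
```

Using `Fraction` keeps the exact ratio (for example 4/52 reduces to 1/13) and converts to a decimal only for display.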
Simple random samples have constant probability of selection
- There must be a constant probability of selection for each and every case.
- Sampling must be done WITH REPLACEMENT of selected cases back into the population.
- Example: What is the probability of drawing the 9 of clubs? 1/52. Draw again: now what is the probability of getting the 9 of clubs?
  - With replacement: 1/52
  - Without replacement: 1/51

The exact shape of the normal distribution is specified by an equation relating each X value (score) to each Y value (frequency). The equation is
$Y = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(X-\mu)^2 / 2\sigma^2}$
where $\pi$ and $e$ are mathematical constants. In simpler terms, the normal distribution is symmetrical, with a single mode in the middle. The frequency tapers off as you move farther from the middle in either direction.

The normal distribution following a z-score transformation

z       Proportion in Body   Proportion in Tail
0.00    .5000                .5000
0.50    .6915                .3085
1.00    .8413                .1587
1.23    .8907                .1093
2.00    .9772                .0228
2.41    .9920                .0080

Learning to use the Z-Table at the back of the Book
The Unit Normal Table gives the proportion of values in a distribution above or below a score if:
- You know the mean and sd of your sample
- You assume that this sample is normally distributed
Transform your x-values into z-scores. Then look up the z-score in the z-table to get the associated probability. Keep in mind that the probability can be expressed either as the proportion in the tail or the proportion in the body.

The Unit Normal Table
(A) List of z-scores
(B) Proportion in body of normal distribution, up to z-score
(C) Proportion of normal distribution located in tail, beyond z-score
(D) Proportion between the mean and the z-score value

(A) z   (B) Body   (C) Tail   (D) Mean to z
0.00    .5000      .5000      .0000
0.01    .5040      .4960      .0040
0.02    .5080      .4920      .0080
0.03    .5120      .4880      .0120
...
0.21    .5832      .4168      .0832
0.22    .5871      .4129      .0871
0.23    .5910      .4090      .0910
0.24    .5948      .4052      .0948
0.25    .5987      .4013      .0987
0.26    .6026      .3974      .1026
0.27    .6064      .3936      .1064
0.28    .6103      .3897      .1103
0.29    .6141      .3859      .1141
0.30    .6179      .3821      .1179
0.31    .6217      .3783      .1217
0.32    .6255      .3745      .1255
0.33    .6293      .3707      .1293
0.34    .6331      .3669      .1331

Example: United States education distribution
Mean is 14, Standard Deviation is 2. What is the probability of randomly drawing a person with greater than 16 years of education?
- Convert the X to a z-score: $z = (16 - 14)/2 = 1$
- Find the probability associated with that z-score: z = 1, p(z > 1) = .1587
[Figure: normal curve with X = 14 at z = 0 and X = 16 at z = 1; the tail beyond z = 1 holds .1587 and the area between the mean and z = 1 holds .3413.]
1. Probability of more than 16 years: p(X > 16), or p(z > 1), = .1587
2. Probability of having more than 14 years: p(X > 14) = .50
3. Probability of having more than 14 but less than 16 years of education: .50 - .1587 = .3413

Another Example
Normally distributed population of test scores with a mean of 80 and a standard deviation of 10.
Find p(X > 85):
- Tail, body, or between? Tail
- Find the z-score: (85 - 80)/10 = 0.5
- Convert z-score to a probability: p = 0.3085
Find p(X < 95):
- Tail, body, or between? Body
- Find the z-score: (95 - 80)/10 = 1.5
- Convert z-score to a probability: p = .9332

A Few More
Find p(80 < X < 95):
- Tail, body, or between? Between
- Find the z-score: (95 - 80)/10 = 1.5
- Convert z-score to a probability: p = .4332
Find p(85 < X < 95):
- Tail, body, or between? Must compare two values.
- We know that p(X > 85) = 0.3085. Since p(X < 95) = 0.9332, then p(X > 95) = 0.0668. So p(85 < X < 95) = 0.3085 - 0.0668 = 0.2417.
- Or: we know that p(X < 95) = 0.9332. Since p(X > 85) = 0.3085, then p(X < 85) = 0.6915. So p(85 < X < 95) = 0.9332 - 0.6915 = 0.2417.

CHAPTER 7 REVIEW OF STANDARD ERROR

What is Standard Error?
Standard Deviation, $\sigma_X = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
- On average, how much do the scores in a distribution deviate from the mean of the distribution?
- Describes the distribution of X
Standard Error, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
- On average, how much do the sample means, given a particular sample size, deviate from the population mean?
- Describes the distribution of $\bar{X}$

Student   Sample 1   Sample 2   Sample 3   Sample 4   Sample 5
1         30         40         40         30         25
2         39         32         39         34         26
3         36         13         27         32         35
4         29         20         38         30         38
5         32         37         27         37         40
6         37         33         29         32         27
7         28         34         40         37         25
8         29         35         26         40         36
9         36         27         37         25         38

Sample Means
Sample   Mean
1        33
2        30
3        34
4        33
5        32
Mean of the sample means = 32; SD of the sample means = 1.4

We estimate the standard deviation of the distribution of sample means by using the standard deviation from one sample and dividing it by the square root of the sample size.
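The idea of estimating the spread of sample means from a single sample can be illustrated with the five samples of nine students from the table in these notes. This is a rough sketch, not the course's own procedure: the function name `estimated_standard_error` is made up for the example, and the sample (n - 1) standard deviation is assumed throughout.

```python
import math
import statistics

# Scores for the five samples of nine students, copied from the table in the notes.
samples = {
    1: [30, 39, 36, 29, 32, 37, 28, 29, 36],
    2: [40, 32, 13, 20, 37, 33, 34, 35, 27],
    3: [40, 39, 27, 38, 27, 29, 40, 26, 37],
    4: [30, 34, 32, 30, 37, 32, 37, 40, 25],
    5: [25, 26, 35, 38, 40, 27, 25, 36, 38],
}

# Mean of each sample; these round to the "Sample Means" column (33, 30, 34, 33, 32).
sample_means = {k: statistics.mean(v) for k, v in samples.items()}

# Observed spread of the five sample means (sample standard deviation).
sd_of_means = statistics.stdev(list(sample_means.values()))


def estimated_standard_error(scores):
    """Estimate the standard error from ONE sample: s / sqrt(n)."""
    return statistics.stdev(scores) / math.sqrt(len(scores))


for k, scores in samples.items():
    print(f"sample {k}: mean = {sample_means[k]:.1f}, "
          f"estimated SE = {estimated_standard_error(scores):.2f}")
print(f"SD of the five sample means = {sd_of_means:.2f}")
```

Each sample gives a somewhat different estimate, because each has its own standard deviation. With only five samples, the observed spread of the means is itself only a rough guide to the true standard error.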
Standard Deviation and Standard Error
- z-scores for X use the Standard Deviation: $z = \frac{X - \mu}{\sigma}$
- z-scores for $\bar{X}$ use the Standard Error: $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

Example
If scores on the second exam are known to be distributed with a mean of 83 and a standard deviation of 9, what is the probability of obtaining a sample of 9 students whose mean test score is less than 80?
$z = \frac{80 - 83}{9 / \sqrt{9}} = \frac{-3}{3} = -1.00$

Z-scores and Probability

z      Proportion in Tail
0.0    .5000
0.1    .4602
0.2    .4207
0.3    .3821
0.4    .3446
0.5    .3085
0.6    .2743
0.7    .2420
0.8    .2119
0.9    .1841
1.0    .1587
1.1    .1357

The tail beyond z = 1.00 holds .1587, so p($\bar{X}$ < 80) = .1587.

The Important Stuff
- Standard error is the standard deviation of sample means.
- We don't really observe the distribution of sample means. Therefore, we don't actually calculate standard error like we do standard deviation.
- Instead, we estimate SE by dividing the SD of a sample by the square root of the sample size.
- As we increase our sample size, we naturally decrease the error associated with our estimate.
- SE tells us how closely a sample statistic can be expected to resemble the population parameter.

CHAPTER 8 HYPOTHESIS TESTING

Hypothesis Testing: The Logic
Scientific method:
- Control Group: no treatment administered
- Experimental Group: gets treatment
In experimental design, the only thing different between these two groups is the treatment. Therefore, if there is a change in the experimental group, you know that it was caused by the treatment.

The Scientific Method
- A researcher predicts what happens before the treatment is administered. This prediction is called a hypothesis.
- The hypothesis is an a priori statement about the expected outcome of an experiment, based on previous research.

Step 1: State the Hypotheses
Null Hypothesis (H0): States that the treatment will NOT have an effect. In general, the null hypothesis states that there is no change, no effect, no difference, etc. Your sample mean is not different from the population mean.
Alternative Hypothesis (H1): States that the treatment WILL have an effect. In
the language of experimental design, the alternative hypothesis states that the independent variable will affect the dependent variable. Your sample mean is different from the population mean.

Errors in Hypothesis Testing

                    Actual Situation
                    No Effect (H0 true)       Effect (H0 false)
Reject H0           Type I Error (alpha)      Correct Decision
Retain H0           Correct Decision          Type II Error

Errors, continued
- Type I Error: when you reject a true null hypothesis. Also called a false positive. Ex.: concluding that the treatment had an effect when in fact it did not. The probability of committing a Type I error is given by alpha.
- Type II Error: when you fail to reject a false null hypothesis. Ex.: concluding that there is no effect when in fact there is an effect.

Step 2: State Criteria for Your Decision
- What level of uncertainty are you comfortable with? This comfort level is described by alpha, or $\alpha$.
- Alpha is the probability of rejecting the null hypothesis when it is actually true (a Type I error).
- If you are willing to accept a 5% chance of a false positive (loosely, to be "95% certain" that the treatment has an effect, e.g., support for the alternative hypothesis), the alpha level is .05.
- When we say that a result is statistically significant, we mean significant given a certain alpha level: "I am 90% / 95% / 99% sure that the null hypothesis is rejected."
- The confidence interval or margin of error in statistical results is based on the same concept. Ex.: If a poll said Bush was 4 points ahead of Kerry with a 3-point margin of error, the 3 reflected a chosen confidence level (probably 95%).

[Figure: normal curve for a two-tailed test with alpha = .05. The middle 95% of the distribution is where sample means are expected if H0 is true (e.g., the treatment had no effect). The critical region is split between the two tails, beyond z = -1.96 and z = +1.96, with p = .025 in each; H0 is rejected for sample means that fall there.]

Step 3: Collect the Data and Compute the Sample Statistics
You need a mean ($\mu$) and a standard deviation ($\sigma$) from the population, and a mean from your sample ($\bar{X}$), to be able to make a decision:
$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

Step 4: Make a Decision
Two possible decisions:
1. Reject the Null Hypothesis: find evidence that there is a difference between the treatment and control
group.
2. Fail to Reject the Null Hypothesis: not able to distinguish between the control and experimental groups.

Example: Rats and Cheese
Suppose you know that it takes the average rat 10 minutes to find a lump of cheese in a maze, with a standard deviation of 2. You develop a training regimen for rats, and you want to know if it has any effect on them. You train 25 rats and then time them through the same maze, and they complete the course in 9 minutes on average. Did the regimen actually impact the rats?

State the Hypotheses and set the Criteria for the Decision
- Null Hypothesis: the training system will not impact the rats' time through the maze.
- Alternative Hypothesis: the training system will significantly impact the rats' time through the maze.
- What is your criterion? Alpha = .05

Collect the Data and Compute the Sample Statistics
- Population Mean = 10
- Population Standard Deviation = 2
- Sample Mean = 9
- Sample Size = 25
$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{9 - 10}{2 / \sqrt{25}} = \frac{-1}{0.4} = -2.50$

Make a Decision
- z = -2.50. The proportion in the tail beyond z = 2.50 is .0062, so the two-tailed p = .0124.
- Decision: reject the null. Conclude there is a difference in the rats' time.
- You can conclude that there is a difference in the times because the p-value is less than .05 (alpha).
- Similarly, you can make this conclusion because the absolute value of the z-score is greater than 1.96.

One or Two Tails?
- That last example tested the hypothesis that there was a difference between the two groups of rats. It did not state anything about the direction of the expected effect.
- If the hypothesis was changed so that rats with training were predicted to have LOWER times than those without, alpha stays at .05 but the whole critical region moves into one tail, which changes the critical value.

[Figure: two-tailed test, alpha = .05. Critical regions lie beyond z = -1.96 and z = +1.96, with p = .025 in each tail; the middle 95% is where sample means are expected if H0 is true.]
[Figure: one-tailed test, alpha = .05. The single critical region lies beyond z = 1.65 in the predicted direction, with p = .05 in that tail.]
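The rats-and-cheese test can be reproduced without a printed z-table. The sketch below assumes a normal sampling distribution and uses the standard identity $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$ for the normal curve. The function names are illustrative choices, not part of the course.

```python
import math


def normal_cdf(z: float) -> float:
    """Proportion of the standard normal distribution at or below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, lower-tail p, two-tailed p) for a sample mean."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    lower_tail = normal_cdf(z)
    two_tailed = 2.0 * min(lower_tail, 1.0 - lower_tail)
    return z, lower_tail, two_tailed


# Rats example from the notes: mu = 10, sigma = 2, n = 25, sample mean = 9.
z, p_lower, p_two = one_sample_z_test(9, 10, 2, 25)
print(f"z = {z:.2f}")
print(f"one-tailed p (training lowers times) = {p_lower:.4f}")
print(f"two-tailed p (any difference) = {p_two:.4f}")

# Decisions at alpha = .05 using the critical values from the notes.
reject_two_tailed = abs(z) > 1.96
reject_one_tailed = z < -1.65
print("reject H0, two-tailed:", reject_two_tailed)
print("reject H0, one-tailed:", reject_one_tailed)
```

The one-tailed p-value is half the two-tailed value here because the observed difference is in the predicted direction. If the rats had been slower after training, the one-tailed test for LOWER times could not reject the null.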
CORRELATION AND REGRESSION

Examples of different values for correlations
[Figure: four scatterplots of Y against X, showing a weak relationship (magnitude of r = .40), a strong relationship (magnitude of r = .90), a perfect relationship (magnitude of r = 1.0), and no relationship (r = 0).]

Hypothesis Test for Pearson Correlation
- Does a nonzero sample correlation differ significantly from zero, given its value and the sample size?
- Testing a null and an alternative hypothesis:
  - H0: $\rho = 0$ (there is no correlation)
  - H1: $\rho \neq 0$ (there is a real correlation)
- Degrees of freedom are n - 2. Given a sample of size n, only n - 2 items can vary.
- Choose an alpha level.
- Use Appendix Table B6 in the back of the textbook (part of it is on the next slide).

Section of table of critical values for Pearson correlation

        One-Tailed Probabilities
        .05      .025     .005     .0005
        Two-Tailed Probabilities
DF      .10      .05      .01      .001
2       .900     .950     .990     .999
3       .805     .878     .959     .991
4       .729     .811     .917     .974
5       .669     .754     .875     .951
6       .621     .707     .834     .925
7       .582     .666     .798     .898
8       .549     .632     .765     .872
9       .521     .602     .735     .847
10      .497     .576     .708     .823
11      .476     .553     .684     .801
12      .458     .532     .661     .780
13      .441     .514     .641     .760

Taking the example from the previous class of hours studied and exam score:
- Correlation was r = .68, sample size n = 12.
- Did this correlation differ significantly from 0?
- Degrees of freedom = 10
- Alpha level of .05, two-tailed test
- So the critical correlation would be .576.
- So a correlation of r = .68 with sample size n = 12 would significantly differ from 0.

What does Correlation Not Do?
- Correlation does not allow you to predict values of y given a known value of x.
- For example, if GPA and SES are positively associated with an r of .56, you can't estimate a student's GPA if you know their SES.
- You can only say that students with higher levels of socioeconomic resources have, on average, higher GPAs.

Regression
- Regression addresses this shortcoming by estimating the relationship between two variables as a linear function.
- In other words, it plots the points on a scatter plot and DRAWS the BEST-FITTING LINE through these data.
- In contrast, correlation only measures how strongly the two variables are related.
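The critical-value lookup used in the hours-studied example (r = .68, n = 12) can be written as a small function. This sketch covers only the two-tailed alpha = .05 column for df 2 through 13, copied from the table excerpt in these notes. The dictionary and function names are assumptions made for the example.

```python
# Two-tailed critical values of r at alpha = .05, keyed by df = n - 2.
CRITICAL_R_TWO_TAILED_05 = {
    2: 0.950, 3: 0.878, 4: 0.811, 5: 0.754, 6: 0.707, 7: 0.666,
    8: 0.632, 9: 0.602, 10: 0.576, 11: 0.553, 12: 0.532, 13: 0.514,
}


def correlation_is_significant(r: float, n: int) -> bool:
    """Compare |r| with the tabled critical value for df = n - 2."""
    df = n - 2
    if df not in CRITICAL_R_TWO_TAILED_05:
        raise KeyError(f"df = {df} is outside the table excerpt")
    return abs(r) > CRITICAL_R_TWO_TAILED_05[df]


# Example from the notes: r = .68 with n = 12, so df = 10 and the critical r is .576.
print(correlation_is_significant(0.68, 12))
```

The absolute value is used because a two-tailed test treats strong negative and strong positive correlations alike.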
What Does Linear Regression Look Like?
Trying to find the line that fits the data the best (e.g., has the least error).
[Figure: scatterplot of y against x with the best-fitting line drawn through the points.]

The Best-Fitting Line Minimizes the Difference Between the Observed and the Expected Value
[Figure: Exam Score (50 to 100) plotted against Hours Studied (0 to 25), with the gap between an observed score and the expected score on the line marked.]

What do you need to Describe a Line?
- Intercept: where the line crosses the Y-axis (vertical axis)
- Slope:
  - How steep the line is (magnitude)
  - The direction of the line (sign): positive or negative

Slope-Intercept Form
Suppose you have two variables, X and Y. If X and Y are related to one another in a linear fashion, then
$Y = a + bX$
where a = intercept and b = slope.

Example: Slope-Intercept Form
[Figure: a line drawn on X-Y axes with the intercept marked and the slope shown as rise over run: Rise = 1, Run = 2, Slope = 1/2.]
[Figure: a series of example lines with intercept 10 and different slopes.]

The Least Squared Error Solution: Ordinary Least Squares (OLS)
Y = INTERCEPT + SLOPE × X
$b = \frac{SP}{SS_X} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
$a = \bar{Y} - b\bar{X}$
[Worked example: computing b and a by hand, followed by a plot of the fitted line.]

Standard Error for Regression Estimates
- Like other statistics, sample regression coefficients have a standard error (se).
- Pretty ugly, and you almost never calculate it by hand, but: $se_b = \sqrt{\frac{SS_{residual} / (n - 2)}{SS_X}}$
- The only part that you haven't already calculated by now is this: $SS_{residual} = \sum (Y - \hat{Y})^2 = (1 - r^2)\,SS_Y$
- Also, it should look a bit familiar: it is similar to pooled variance.
- Calculates the difference between the SS for Y and the correlation between X and Y.
- The DF correction is n - 2 because you have two variables.

Y = 9.76 + .29X

Parameter Estimates
Variable    Label                        DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept                    1    9.76579              0.21231          46.00     <.0001
FAEDUC      FATHER'S EDUCATION (YEARS)   1    0.29375              0.01700          17.28     <.0001

Years of Schooling = 9.76 + .29 × Father's Years of Schooling
- Is this effect significant? Yes, p < .001.
- Is .29 significantly different from 0?
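Both the least-squares formulas and the t values in the SAS output can be sketched in plain Python. Two caveats apply. The hours and scores below are invented for illustration and are not course data. The estimates and standard errors passed to `t_statistic` are the ones printed in the Parameter Estimates output. The function names are assumptions.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) using b = SP / SS_X and a = mean(Y) - b * mean(X)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def t_statistic(estimate, standard_error):
    """t for testing whether a coefficient differs from 0."""
    return estimate / standard_error


# Hypothetical (hours studied, exam score) pairs, made up for this sketch.
hours = [2, 4, 6, 8, 10]
scores = [60, 66, 71, 78, 85]
intercept, slope = ols_fit(hours, scores)
print(f"fitted line: Y = {intercept:.1f} + {slope:.1f}X")

# Estimates and standard errors copied from the SAS output in the notes.
t_slope = t_statistic(0.29375, 0.01700)
t_intercept = t_statistic(9.76579, 0.21231)
print(f"t for FAEDUC slope = {t_slope:.2f}")
print(f"t for intercept = {t_intercept:.2f}")
print("slope differs from 0 at alpha = .05:", abs(t_slope) > 1.96)
```

The 1.96 cutoff assumes a large sample, where the t-distribution is close to the z-distribution.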
How is the t-statistic calculated?
- Simply divide the value of the slope (.29375) by the standard error (.01700): .29375 / .01700 = 17.28
- Because the n-size is greater than 120, the t-distribution is similar to a z-distribution.
- If the slope coefficient was not significantly different from 0, what would the t-value be? |t| < 1.96 is what we expect if the null hypothesis is true (i.e., if the slope is really 0).

Hypothesis Testing With Regression
Step 1: State the Hypotheses
- Null Hypothesis: There is no relationship between x and y (e.g., slope = 0).
- Alternative Hypothesis: There is a relationship between x and y (e.g., slope ≠ 0).
Step 2: Locate the Critical Value
- Most regression analyses are done with samples larger than 120.
- Therefore, the critical value for a two-tailed test at alpha = .05 will almost always be 1.96.
Step 3: Calculate the t-statistic
- Simply divide the slope by the standard error.
Step 4: Make a decision
- If |t| > 1.96, REJECT THE NULL and conclude that the slope is significantly different from 0.
- If |t| < 1.96, FAIL TO REJECT THE NULL and conclude that the slope is not significantly different from 0.

CHAPTER 11 HYPOTHESIS TESTS FOR RELATED MEASURES DESIGN

Related Measures Designs
Repeated measures or within-subjects design
- You take the same particular people and measure the same variable twice, before and after a treatment.
- Also referred to as a "panel survey design".
Matched-subjects study
- Two different samples of people, as in last class.
- But you match each person in Group 1 to a specific person in Group 2 and treat them as the same person.
- You can match people according to similar characteristics (same age, same sex, etc.).
- Also referred to as a "matched case-control design".

Difference between conducting Independent
and Related Measures Tests
Independent measures
- Calculate the mean for each group.
- Calculate the difference between the means ($M_1 - M_2$).
- Test a null hypothesis that the difference between the means equals 0, or that $M_1 = M_2$.
Related measures
- Calculate the difference for each matched case.
- Calculate the mean difference ($M_D$, the mean of $X_1 - X_2$).
- Test a null hypothesis that the mean of the differences equals 0.
- You act as if you have just one sample and conduct a one-sample t-test for the sample of differences.

Hypothesis Testing with Means from Related Samples
Step 1: State Hypotheses
- Null: Mean difference equals 0 (H0: $\mu_D = 0$)
- Alternative: Mean difference does not equal 0 (H1: $\mu_D \neq 0$)
Step 2: Identify the Critical Value
- Alpha level
- Degrees of freedom: n - 1
Step 3: Compute the Test Statistic
- Find the t-statistic for a single sample.
Step 4: Make a decision
- If the absolute value of your test statistic (t-value) is greater than your critical value, then you can reject the null hypothesis and conclude that there is a difference in the means.

Example: Score on Exam 1 and Exam 2
$D = X_2 - X_1$

Testing the Hypothesis (2-tailed)
Step 1: State Hypotheses
- Null: The mean difference between Exam 1 and Exam 2 is 0 (H0: $\mu_D = 0$)
- Alternative: The mean difference does not equal 0 (H1: $\mu_D \neq 0$)
Step 2: Identify the Critical Value
- Alpha level: $\alpha = .05$
- One or Two Tails: Two
- Degrees of Freedom: 3
- So the critical t value is 3.18

Computing the t statistic
- Calculate the standard error of the difference: $se_D = \frac{s_D}{\sqrt{n}} = \frac{1.8}{2} = 0.9$
- Calculate the t-test score for whether the estimated mean difference differs from the hypothesized mean difference (0): $t = \frac{M_D - \mu_D}{se_D} = \frac{5 - 0}{0.9} = 5.56$
- t = 5.56 is very big and well above the critical score of 3.18.
- So we reject the null hypothesis and accept that the exam scores improved.
- A one-tailed test would look even better.

Now let's jump back and use an independent measures test on these data
- Again: Exam 1 had a mean of 43 and sd of 25.6; Exam 2 had a mean of 48 and sd of 26.1.
- Null hypothesis: Exam 1 and Exam 2 have the same mean (H0: $\mu_1 = \mu_2$)
- Alternative: Exam 1 and Exam 2 have different means (H1: $\mu_1 \neq \mu_2$)
Step 2: Identify the Critical Value
- Alpha level: $\alpha = .05$
- One or Two Tails: Two
- Degrees of Freedom: 6
- So the critical t value is 2.45

Computing the t statistic
- Square the sd to get the variance ($s^2$).
- Calculate the se for unpooled variances: $se = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 18.28$
- Calculate the t-statistic: $t = \frac{\bar{X}_2 - \bar{X}_1}{se} = \frac{48 - 43}{18.28} = \frac{5}{18.28} = 0.27$
- So we fail to reject the null hypothesis: this test finds no evidence of any difference between Time 1 and Time 2.

Magnitude of the Observed Difference
For the related measures design, just divide the mean difference by the standard deviation for the difference:
$d = \frac{M_D}{s_D} = \frac{5}{1.8} = 2.78$
It's a huge effect.

Magnitude of the Observed Difference using the Independent Measures Approach
1. You can assume the same sd, since they are so close: $d = \frac{48 - 43}{26.1} = \frac{5}{26.1} = 0.19$ (very small)
2. You can calculate the pooled sd: $s_p = \sqrt{\frac{SS_1 + SS_2}{df_1 + df_2}} = \sqrt{\frac{1967.58 + 2043.63}{6}} = \sqrt{668.54} = 25.86$ (very similar). So $d = \frac{5}{25.86} = 0.19$ (roughly the same).

So what happened here?
- This is a key issue in research design.
- By treating the two exams as independent in the second example, we vastly overstated our error.
- If we know these are the same people taking two separate exams, we see that everyone improved.
- But there was a lot of variance between people:
  - The person who started with a 14 moved up to 18.
  - The person who started with a 72 moved up to 78.
- The second test treated them as entirely independent, so the error was dominated by the difference between the people.

Variability and Consistency
- The variance in a related measures test indicates the consistency of an effect.
- A treatment that consistently adds 5 points at any level of preexisting skill will have low variability.
- A treatment that adds 10 points for some and 0 points for others will have high error.
- The independent measures test cannot isolate the treatment effect in the same way, because we cannot separate the error in the effect from the error in the samples.
- On the other hand, not that many social science research questions fit the related measures setup.
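The contrast between the two analyses of the exam data can be reproduced from the summary statistics alone. This is a sketch under stated assumptions. The standard deviation of the differences, 1.8, is the value implied by the notes' standard error of 0.9 with n = 4. The function names are invented for the example, and the individual exam scores are not used.

```python
import math


def related_t(mean_diff, sd_diff, n):
    """One-sample t on the difference scores: M_D / (s_D / sqrt(n))."""
    return mean_diff / (sd_diff / math.sqrt(n))


def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    """t for two independent samples using unpooled variances."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean2 - mean1) / standard_error


def effect_size_d(mean_diff, sd):
    """Mean difference expressed in standard deviation units."""
    return mean_diff / sd


# Summary statistics from the notes (4 students, each taking Exam 1 and Exam 2).
t_related = related_t(mean_diff=5, sd_diff=1.8, n=4)
t_independent = independent_t(43, 25.6, 4, 48, 26.1, 4)

print(f"related measures:     t = {t_related:.2f} (critical t, df = 3: 3.18)")
print(f"independent measures: t = {t_independent:.2f} (critical t, df = 6: 2.45)")
print(f"d, related measures:     {effect_size_d(5, 1.8):.2f}")
print(f"d, independent measures: {effect_size_d(5, 26.1):.2f}")
```

The mean difference is 5 points in both analyses. Only the denominator changes: the related measures test divides by the small spread of the differences, while the independent test divides by the large spread between people.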
Review of the Key Concepts
1. $z = \frac{X - \mu}{\sigma}$: This z-score describes the relative location of a particular score when the mean ($\mu$) and standard deviation ($\sigma$) are known.
2. $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$: This z-score describes the relative location of a sample mean when the population mean and s.d. ($\sigma$) are known. The s.d. of the distribution of means is called the standard error and is the s.d. divided by the square root of n.
3. $t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$: Same as above, except the population s.d. ($\sigma$) is estimated with the sample s.d. (s), and the test statistic incorporates sample size (degrees of freedom).
4. $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$: This tests the difference in two means from independent samples. Similar to the t-statistic above, but the se estimate is more complex.
5. $t = \frac{M_D - \mu_D}{s_D / \sqrt{n}}$: This tests the relative location of the mean of the difference between scores from related samples. Similar to 3, but uses the difference between two scores instead of one set of scores.

LINEAR AND MULTIVARIATE REGRESSION

Final Problem Set: Veronika
- Veronika is a web-based statistics engine at the University of Texas at Austin.
- You can analyze a single dataset, the 1993 National Longitudinal Survey of Youth, without learning to operate complex statistical packages.
- The website uses Statistical Analysis System (SAS) in the background to conduct the analyses you request.

Regression Techniques
- Quantify the relationship between an independent variable (x) and the dependent variable (y).
- Y can be:
  - Categorical (e.g., Nominal or Ordinal)
  - Continuous (e.g., Interval or Ratio)
- In this class we will only deal with continuous Y.
- These models can incorporate any type of independent variable; however, in this class we are only concerned with two types:
  - Continuous Variables
  - Binary Variables ("dummy variables")

Dummy Variables
- A type of independent variable
- NOIR: Nominal, Ordinal, Interval, Ratio
- A nominal variable with two categories is often called a dummy variable.
- Examples:
  - Male = 1, Female = 0
  - Black = 1, "Non-Black" = 0

Dummy Variables in Regression
Regression slopes (b) indicate how much a one-unit increase in the independent variable increases or decreases the dependent variable.
Y = A + BX + e
Dependent variable = intercept + SLOPE × independent variable
- E.g., Yearly Income = Intercept + b × Education
- Y = 20000 + 1000x + e
- A one-year increase in education is associated with a 1000 increase in yearly income.
But with a dummy variable, there are only two categories, so:
- What does a one-unit increase mean for a variable that is coded either 1 or 0?

Linear Regression and Dummy Variables
- Linear regression models treat these variables as if they are continuous.
- Therefore, a one-unit increase in the independent variable can only mean an increase from the value 0 to the value 1.
- Since these represent different groups (e.g., women and men), the coefficient on the dummy variable is simply the difference in means for the two groups.

[SAS output: t-test scores for the difference in means (The TTEST Procedure). Mean yearly income is 25302 for non-urban residents (urban = 0; confidence limits 23956 to 26648) and 31197 for urban residents (urban = 1).]
[SAS output from the regression analysis: R-Square = 0.0222, Adj R-Sq = 0.0215. Parameter estimates: Intercept = 25302; urban residence (1993) = 5895.]

Yearly Income = Intercept + Slope × Urban
Y = 25302 + 5895x
- Urban residence (x = 1): 25302 + 5895 = 31197
- Non-urban residence (x = 0): 25302

Multivariate Analysis
"Multivariate analysis often constitutes the final stage of data analysis. It is best performed after researchers understand the characteristics of individual variables (univariate analysis) and the relationships between any two variables (bivariate analysis). Why do we need multivariate approaches to studying data? There are two general reasons: documenting collective effects and accounting for potentially spurious factors." (From Sweet, p. 131)

Simple Vs. Multivariate Regression
Simple Regression: $Y = a + bx$
- Y is the dependent variable and x is the independent variable.
- This only allows ONE independent variable.
Multivariate Regression: $Y = a + b_1 x_1 + b_2 x_2$
- Y is still the dependent variable, but now there are two independent variables, x1
and x2.
- Allows researchers to CONTROL for characteristics and identify both DIRECT and INDIRECT EFFECTS.
- This can be extended to include a number of independent variables: $Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4$

Going back to the Scientific Method
- The goal is to determine causal relationships.
- The independent variable (x) is hypothesized to CAUSE some change in the dependent variable (y).
- In a laboratory, participants are randomly assigned to two groups:
  - Control Group: no experimentation is done
  - Experimental Group: experiment is performed
- Because these groups are believed to be identical with the exception of receiving the treatment, any difference between the control group and the experimental group is believed to be due to the experiment.

Multivariate Analysis and the Quasi-Experimental Method
- Rarely do we have experimental data in the social sciences.
- Instead, we collect surveys from one point in time, and it is not possible to experiment with these data in the classic sense.
- Whereas the scientific method can determine causality by controlling for certain characteristics, the quasi-experimental method can only use pseudo-controls in order to identify independent effects.

Reasons for Multivariate Analysis
Capture collective effects
- Create sophisticated models that incorporate many factors (X's) affecting Y.
Address spurious effects
- Our effects may be explained by a third factor.
- If we do not control for that third factor, our conclusions may be misleading.
- We talked about this in the last class.

The Elaboration Model
- Understand the relationship between two variables through the controlled introduction of other variables.
- We focus on the relationship between one dependent and one independent variable.
- But we try to control for the other intervening factors that might really explain the X-Y relationship.
- This is because we can't achieve experimental control.
- Two factors can explain away your effect:
  - Variable X2 causes both X and Y (confounding effect).
  - Variable X causes X2, which causes Y (intervening effect).

More on
Elaboration Models
- You have an effect of X on Y.
- When you add in Z's, the X -> Y effect can either be replicated (it still exists) or explained.
- Replication may occur in some cases but not others:
  - It may work for some specifications of the dependent variable but not others.
  - It may work in interaction with some other variables, thus it works for ...

Ex. 1: What does it mean to control for something?
- So perhaps professional status determines health.
- So perhaps professional status determines income, which really determines health.
Ex. 3: Confounding Effect
- Then education determines occupation, income and health.
- Maybe education is the real factor of interest.
Ex. 4: Independent Effects on Health
[Diagram: each factor has its own effect on health, plus an error term.]

Predicted Values
What change in health status would you predict for:
- A 10-year increase in age?
  - Controlling for other factors (younger people are better schooled but earn less, etc.), what is the effect of 10 years of age on health?
  - 10 × -.027 = -.27: health status declines by .27 with 10 years of age.
  - The effect is only slightly significant.
- A one-thousand dollar increase in yearly income?
- A blue-collar versus a white-collar worker?
- A 5-year increase in years of education?
- The difference between a man and a woman?
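Each predicted-change question above follows the same pattern: multiply the regression coefficient by the size of the change, holding the other variables constant. In the sketch below, only the age coefficient (-.027) comes from the notes' worked answer. The function name is an assumption, and the other coefficients are left out because they are not recoverable here.

```python
def predicted_change(coefficient: float, change_in_x: float) -> float:
    """Predicted change in Y for a given change in one X, other X's held constant."""
    return coefficient * change_in_x


# Age coefficient taken from the notes' worked answer (10 x -.027 = -.27).
AGE_COEFFICIENT = -0.027

change_for_ten_years = predicted_change(AGE_COEFFICIENT, 10)
print(f"10-year increase in age: {change_for_ten_years:+.2f} in health status")

# For a dummy variable (for example, man versus woman) the change in X is 1,
# so the predicted difference between the groups is the coefficient itself.
```

The same function would answer the income and education questions once their coefficients are read off the regression output, using a change of 1000 dollars or 5 years, in whatever units the variable was measured in.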