Statistics for Sociologists III
Statistics for Sociologists III SOC 362
These 11 pages of class notes were uploaded by Deron Effertz on Thursday, September 17, 2015. The notes belong to SOC 362 at the University of Wisconsin - Madison, taught by Charles Halaby in Fall.
Date Created: 09/17/15
sociology 362
data exercise 3: dummy variables

The following problems are to be done using dmycps1985.dta, a file that I will distribute via email attachment. Once you open this file using Stata, type the command notes to get the identity and coding of all the variables.

1. Fit the models $hrwage_i = \alpha + \beta\, fem_i + \varepsilon_i$ and $hrwage_i = \alpha + \beta\, south_i + \varepsilon_i$.
   a. Interpret $\alpha$ and $\beta$ in each fitted model.
   b. Satisfy yourself that the regression results do indeed correspond to what you would get with a t-test of the difference between two population means. For example, try:
          ttest hrwage, by(fem)

2. Fit the multiple regression model $hrwage_i = \alpha + \beta_1 fem_i + \beta_2 south_i + \varepsilon_i$.
   a. Interpret the estimated coefficients of fem and south.
   b. Generate the observed table of hrwage means for the four combinations of south and fem. Try the Stata command:
          tab2 south fem, summ(hrwage) nost
   c. Generate the table of fitted means for the four combinations of south by fem. You can just use the estimated model for this. What is the estimated mean hourly wage for a female working in the south? For a male working in the non-south?
   d. How does the fitted table compare to the observed? For example, is the sex difference in the observed table the same for both regions, as it is in the fitted table?

3. Suppose we are interested in the relationship of hourly wage (hrwage) to schooling (edyrs), but we have reason to believe that sex (fem) may be implicated in the relationship. For each of the following, estimate the null and alternative models, and use the F-test to arrive at your conclusions.
   a. Assuming the schooling slope is the same for males and females, test the hypothesis of no difference in intercepts, i.e., no sex difference in mean Y when schooling is controlled.
   b. Assuming the male and female intercepts are the same, test the hypothesis that schooling has no effect on earnings.
   c. Assuming the male and female intercepts are different, test the hypothesis that schooling has no effect on earnings.
   d. Assuming that male and female intercepts are different, test the hypothesis that male and female slope coefficients of schooling are equal.
   e. Test the joint null hypothesis of no sex difference in intercepts or in schooling slopes. This is the so-called test of overall homogeneity, or equality of regressions. Use the sex dummy fem and an interaction term that you construct to test this hypothesis.
   f. Again test the hypothesis of overall homogeneity of the regression of hrwage on edyrs for males and females, but now represent the alternative model by running separate regressions of hrwage on edyrs for men and women.

5. The alternative model of problem 4e can be written as

       $hrwage_i = \alpha + \beta_1 fem_i + \beta_2 edyrs_i + \beta_3 x_i + \varepsilon_i$    (1)

   where $x_i = fem_i \times edyrs_i$. The alternative model of 4f can be written as

       $hrwage_m = \alpha_m + \beta_m edyrs_m + \varepsilon_m$
       $hrwage_f = \alpha_f + \beta_f edyrs_f + \varepsilon_f$

   where the subscripts stand for male and female and the models are fit separately to male and female observations. Show how the intercept and coefficients of the first model are related to the intercepts and coefficients of the second model. Use the estimates from the models fitted above.

6. Suppose we are interested in the relationship of hourly wage (hrwage) to schooling (edyrs), but we have reason to believe that race (race) may be implicated in the relationship. For each of the following, use the F-test to arrive at your conclusions.
   a. Construct in Stata the appropriate dummy variables and then do the appropriate regression for testing the hypothesis of no overall differences in the mean hourly wages of workers of different races (ignore schooling).
   b. Assuming the schooling slope for the regression of wage on schooling is the same for workers of different races, test the hypothesis of no racial differences in intercepts of the wage regression, i.e., no race differences in mean hourly wage when schooling is controlled.
   c. Assuming race differences in intercepts, test the hypothesis that schooling has no effect on earnings.
   d. Assuming racial differences in intercepts, test the hypothesis that there are no racial differences in slope coefficients of schooling, i.e., no racial differences in the wage returns to schooling.
   e. Test the joint null hypothesis of no racial differences in intercepts or in schooling slopes. This is the so-called test of overall homogeneity, or equality of regressions. Use the race dummies and interaction terms that you construct to test this hypothesis.
   f. Again test the hypothesis of overall racial homogeneity of the regression of hrwage on edyrs, but now represent the alternative model by running separate regressions of hrwage on edyrs for the three races. Make sure you understand how the fitted models doing the test this way correspond to the fitted model of problem e.

sociology 362
t-test for coefficient and F-test for model in a simple regression

null model

a. regress hrwage

      Source |        SS     df          MS         Number of obs =     515
    ---------+-------------------------------       F(0, 514)     =    0.00
       Model |       0.00      0           .        Prob > F      =       .
    Residual |  12374.963    514  24.0758035        R-squared     =  0.0000
    ---------+-------------------------------       Adj R-squared =  0.0000
       Total |  12374.963    514  24.0758035        Root MSE      =  4.9067

      hrwage |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
    ---------+---------------------------------------------------------------
       _cons |  9.088874   .2162155    42.04    0.000    8.664099    9.513649

alternative model

b. regress hrwage edyrs

      Source |        SS     df          MS         Number of obs =     515
    ---------+-------------------------------       F(1, 513)     =   97.72
       Model | 1980.06338      1  1980.06338        Prob > F      =  0.0000
    Residual | 10394.8996    513  20.2629622        R-squared     =  0.1600
    ---------+-------------------------------       Adj R-squared =  0.1584
       Total |  12374.963    514  24.0758035        Root MSE      =  4.5014

      hrwage |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
    ---------+---------------------------------------------------------------
       edyrs |   .823475   .0833033    9.885    0.000    .6598174    .9871327
       _cons | -1.774601   1.116715   -1.589    0.113   -3.968497    .4192961

Key points:

1. There's a theorem that says that if you square a t ratio that has n - k - 1 degrees of freedom, you get an F ratio with 1 df in the numerator and n - k - 1 df in the denominator.

2. Since the dependent variable is the same in the null and alternative models, so too is the SStotal. That is, SStotal(null) = SStotal(alt). Since the regression identity states that SStotal = SSreg + SSresidual, it follows that SSreg(null) + SSresidual(null) = SSreg(alt) + SSresidual(alt). This means that as we move from the null to the alternative model, any decrease in the SSresidual will be accompanied by an increase of equal
size in the SSreg. Hence, for the models above, the decrease in the SSresidual as we move from null to alternative is 1980.06338, which is precisely the increase we witness in the SSregression as we go from null to alternative.

[The next passage is illegible in the source. The legible fragments refer to the alternative model, denoted by A, and end with the check from the output above: 9.885^2 = 97.72.]

sociology 362
logistic regression example

The data for this example come from the 1985 CPS. The y variable, called rich, is a dummy coded 1 if hourly wage is in the 90th percentile or above, 0 otherwise. Let's begin by fitting the linear probability model. This is just a simple regression.

1. regress rich edyrs

      Source |        SS     df          MS         Number of obs =     515
    ---------+-------------------------------       F(1, 513)     =   45.01
       Model | 3.31257874      1  3.31257874        Prob > F      =  0.0000
    Residual | 37.7553824    513  .073597237        R-squared     =  0.0807
    ---------+-------------------------------       Adj R-squared =  0.0789
       Total | 41.0679612    514  .079898757        Root MSE      =  .27129

        rich |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
    ---------+---------------------------------------------------------------
       edyrs |  .0336817   .0050204    6.709    0.000    .0238186    .0435448
       _cons | -.3569582    .067301   -5.304    0.000   -.4891777   -.2247387

Now let's get the fitted probabilities of being rich for each year of schooling.

2. predict lpm_prob
   (option xb assumed; fitted values)

Note that lpm_prob is the name I chose for the variable that holds the fitted probabilities.

3. tab edyrs, summ(lpm_prob)

    years of  |   Summary of Fitted values
    schooling |        Mean   Std. Dev.   Freq.
    ----------+--------------------------------
            8 |  -.08750459          0      15
            9 |  -.05382289          0      12
           10 |  -.02014119          0      17
           11 |   .01354051          0      27
           12 |   .04722221          0     215
           13 |   .08090391          0      36
           14 |   .11458561          0      55
           15 |    .1482673          0      13
           16 |     .181949          0      70
           17 |    .2156307          0      24
           18 |    .2493124          0      31
    ----------+--------------------------------
        Total |   .08737864  .08027892     515

Notice that three of the fitted probabilities are negative. This won't happen when we fit the logistic.

working on the logit scale

Here's the logistic model:

4. logit rich edyrs

    Logit estimates                          Number of obs =     515
                                             LR chi2(1)    =   40.78
                                             Prob > chi2   =  0.0000
    Log likelihood = -132.26951              Pseudo R2     =  0.1336

        rich |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
    ---------+---------------------------------------------------------------
       edyrs |  .4272412    .071766    5.953    0.000    .2865824    .5678999
       _cons | -8.405214   1.095768   -7.671    0.000   -10.55288   -6.257549

Stata gives the coefficient .427 on the log-odds scale. We can ask Stata to produce a fitted equation for this model, in which case it will give a fitted log odds of being rich for each year of schooling. Here it is:

   predict lg_odds, xb
   tab edyrs, summ(lg_odds)

    years of  |  Summary of Linear prediction
    schooling |        Mean   Std. Dev.   Freq.
    ----------+--------------------------------
            8 |  -4.9872842          0      15
            9 |  -4.5600429          0      12
           10 |   -4.132802          0      17
           11 |  -3.7055607          0      27
           12 |  -3.2783196          0     215
           13 |  -2.8510783          0      36
           14 |  -2.4238372          0      55
           15 |  -1.9965959          0      13
           16 |  -1.5693547          0      70
           17 |  -1.1421134          0      24
           18 |   -.7148723          0      31
    ----------+--------------------------------
        Total |  -2.7689485  1.0183116     515

Well, there's not much to be said about this, except that the log odds of being rich increase by .427 for each year of schooling. Now let's look at the graph of the fitted log odds against schooling: it should be a straight line with slope .427 and intercept -8.40.

5. graph lg_odds edyrs

[Figure: fitted log odds of being rich against years of schooling. The graph options are illegible in the source.]

Well, we can't see the intercept because it occurs at edyrs = 0, which is not on the plot. Let's move to the odds scale.

working on the odds scale

Let's begin by converting the log-odds coefficient .427 to an odds ratio. I just do the following:

    odds ratio = exp(.4272412) = 1.533022

Or I can just ask Stata to give me the odds ratio directly with the following command:

6. logit, or

    Logit estimates                          Number of obs =     515
                                             LR chi2(1)    =   40.78
                                             Prob > chi2   =  0.0000
    Log likelihood = -132.2695               Pseudo R2     =  0.1336

        rich | Odds Ratio  Std. Err.       z    P>|z|    [95% Conf. Interval]
    ---------+---------------------------------------------------------------
       edyrs |   1.533022  .1100189    5.953    0.000    1.331868    1.764557

The next thing I need to do is construct the fitted odds of being rich and then display them for each year of schooling.

7. gen odds = exp(lg_odds)
   tab edyrs, summ(odds)

    years of  |       Summary of odds
    schooling |        Mean   Std. Dev.   Freq.
    ----------+--------------------------------
      [rows for individual years of schooling are illegible in the source]
    ----------+--------------------------------
        Total |    .1075929  .12529991     515

This would be a good time to make sure you know how to use the exponential function on your calculator. For example, you should find that if you take the log odds of being rich for 15 years of schooling, -1.9965959, you can convert it to the fitted plain odds, .13579677. Hence exp(-1.9966) = .135797.

This would also be a good time to get an intuitive feel for the formula:

    odds ratio = exp(b) = exp(.4272412) = 1.53

Notice that for any two values of schooling that are one year apart, the ratio of the odds of being rich at the higher schooling value to the odds at the lower value equals 1.53.

Now let's graph the fitted plain odds. They will not graph as a straight line against schooling; they will be an exponential function.

8. graph odds edyrs

[Figure: fitted odds of being rich against years of schooling.]

working on the probability scale

Remember that earlier I ran the model logit rich edyrs. Since I haven't fit any other models since then, Stata knows that when I issue the following command I am referring to that earlier model and asking for the fitted probabilities.

9. predict prob, p

prob is the name that I chose for the variable that holds the fitted probabilities.

10. tab edyrs, summ(prob)

    years of  |     Summary of Pr(rich)
    schooling |        Mean   Std. Dev.   Freq.
    ----------+--------------------------------
            8 |   .00677792          0      15
            9 |    .0103533          0      12
           10 |   .01578473          0      17
           11 |   .02399644          0      27
           12 |   .03632249          0     215
           13 |    .0546256          0      36
           14 |   .08137297          0      55
           15 |   .11956079          0      13
           16 |    .1723084          0      70
           17 |   .24193253          0      24
           18 |   .32852313          0      31
    ----------+--------------------------------
        Total |   .08737864  .08665454     515

You should convince yourself that you could compute these fitted probabilities by hand from the original logit output. For example, suppose I want the estimated probability that someone with 15 years of schooling is rich. The first thing I do is compute the fitted log odds of being rich at 15 years of schooling. From what we did above, that figure is -1.9965959. Then we do the following:

    Prob(rich | 15 years) = exp(-1.9965959) / (1 + exp(-1.9965959))
                          = .13579677 / 1.13579677
                          = .11956079

Note that the numerator of the probability is just the plain odds of being rich that was computed above.

Now let's graph the fitted probabilities against schooling. The curve will be a logistic, although it looks a lot like an exponential because the probability never gets high enough to begin to show the characteristic logistic shape.

[Figure: fitted Pr(rich) against years of schooling.]

How well do the fitted probabilities match up? One question we might ask is how they compare with the actually observed proportions of people who are rich at each year of schooling. These observed proportions are also estimates of the true probability. Here's the answer (rich is the observed proportion, prob is the fitted probability):

    edyrs |        rich     prob
    ------+---------------------
        8 |       .0000    .0068
        9 | [illegible]    .0104
       10 |       .0000    .0158
       11 |       .0000    .0240
       12 |       .0279    .0363
       13 |       .0556    .0546
       14 | [illegible]    .0814
       15 |       .1538    .1196
       16 | [illegible]    .1723
       17 |       .2917    .2419
       18 |       .3226    .3285

Now let's compute the change in the probability of being rich per year of schooling, evaluated at the overall mean probability:

    .0873786 x (1 - .0873786) x .4272 = .0873786 x .9126214 x .4272 = .0341

Here .4272 is the logistic coefficient. The figure we get is very close to the estimate yielded by the linear probability model on the first page.
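The hand calculations in the logistic example (log odds to odds, odds to probability, the odds ratio, and the per-year change in probability at the mean) can be checked with a short script. What follows is only a minimal sketch in Python, not part of the original Stata session. It assumes the two coefficients printed in the `logit rich edyrs` output, and the function names are illustrative choices made here, not Stata commands.

```python
import math

# Coefficients copied from the `logit rich edyrs` output in these notes.
B_CONS = -8.405214    # intercept on the log-odds scale
B_EDYRS = 0.4272412   # change in log odds per year of schooling


def fitted_log_odds(edyrs):
    """Fitted log odds of being rich at a given number of years of schooling."""
    return B_CONS + B_EDYRS * edyrs


def fitted_odds(edyrs):
    """Fitted plain odds: exponentiate the fitted log odds."""
    return math.exp(fitted_log_odds(edyrs))


def fitted_prob(edyrs):
    """Fitted probability: odds / (1 + odds)."""
    odds = fitted_odds(edyrs)
    return odds / (1.0 + odds)


def prob_change_per_year(p):
    """Approximate change in probability per year of schooling at probability p,
    using p * (1 - p) * b with b the logit coefficient of edyrs."""
    return p * (1.0 - p) * B_EDYRS


if __name__ == "__main__":
    for years in range(8, 19):
        print(years,
              round(fitted_log_odds(years), 4),
              round(fitted_odds(years), 4),
              round(fitted_prob(years), 4))
    # Ratio of odds one year apart; this is the same for any pair of adjacent years.
    print("odds ratio:", round(fitted_odds(16) / fitted_odds(15), 4))
    # .08737864 is the overall mean of rich reported in the tables above.
    print("change at mean:", round(prob_change_per_year(0.08737864), 4))
```

Running it for 15 years of schooling should reproduce the figures worked by hand in the notes, up to rounding of the printed coefficients.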