Class Note for PUBHLTH 640 at UMass(4)
This 21 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Massachusetts taught by a professor in Fall. Since its upload, it has received 22 views.
Date Created: 02/06/15
PubHlth 640 Intermediate Biostatistics
Unit 5. Logistic Regression
Practice Problems, Week 8: SOLUTIONS

Problem 1

Source: Kleinbaum, Kupper, Muller and Nizam. Applied Regression Analysis and Other Multivariable Methods, Third Edition. Pacific Grove: Duxbury Press, 1998, p. 683, problem 2.

A five-year follow-up study on 600 disease-free subjects was carried out to assess the effect of a 0/1 exposure E on the development (or not) of a certain disease. The variables AGE (continuous) and obesity status OBS (a 0/1 variable) were determined at the start of the follow-up and were to be considered as control variables in analyzing the data.

A. State the logit form of a logistic regression model that assesses the effect of the 0/1 exposure variable E, controlling for the confounding effects of AGE and OBS and the interaction effects of AGE with E and OBS with E.

Solution:

$$\operatorname{logit}[\pi] = \beta_0 + \beta_1 E + \beta_2\,\text{AGE} + \beta_3\,\text{OBS} + \beta_4\,\text{AGEE} + \beta_5\,\text{OBSE}$$

I used the following notation:
  π    = Probability[disease]
  AGEE = AGE*E. This is a created variable that is the interaction of AGE with E.
  OBSE = OBS*E. Similarly, this is the interaction of OBS with E.

B. Given the model you have for part A, give a formula for the odds ratio for the exposure-disease relationship that controls for the confounding and interactive effects of AGE and OBS.

Solution: The solution here follows the ideas on pp. 9-11 in Lecture Notes 5 (Logistic Regression).

                 Value of predictor for person who is
  Predictor      Exposed        Not exposed
  E              1              0
  AGE            AGE_1          AGE_0
  OBS            OBS_1          OBS_0
  AGEE           AGE_1          0
  OBSE           OBS_1          0

Then

$$OR = \exp\{\operatorname{logit}[\pi \text{ for exposed person}] - \operatorname{logit}[\pi \text{ for non-exposed person}]\}$$
$$= \exp\{[\beta_0 + \beta_1 + \beta_2\,\text{AGE}_1 + \beta_3\,\text{OBS}_1 + \beta_4\,\text{AGE}_1 + \beta_5\,\text{OBS}_1] - [\beta_0 + \beta_2\,\text{AGE}_0 + \beta_3\,\text{OBS}_0]\}$$
$$= \exp\{\beta_1 + \beta_2(\text{AGE}_1 - \text{AGE}_0) + \beta_3(\text{OBS}_1 - \text{OBS}_0) + \beta_4\,\text{AGE}_1 + \beta_5\,\text{OBS}_1\}$$

C. Now use the formula that you have for part B to write an expression for the estimated odds ratio for the exposure-disease relationship that considers both confounding and interaction when AGE=40 and OBS=1.

Solution:

$$\widehat{OR} = \exp\{\hat\beta_1 + 40\hat\beta_4 + \hat\beta_5\}$$

                 Value of predictor for person who is
  Predictor      Exposed        Not exposed
  E              1              0
  AGE            40             40
  OBS            1              1
  AGEE           40             0
  OBSE           1              0

$$OR = \exp\{\beta_1 + \beta_2(40 - 40) + \beta_3(1 - 1) + \beta_4(40) + \beta_5(1)\} = \exp\{\beta_1 + \beta_4(40) + \beta_5(1)\}$$

Both subjects share the same AGE and OBS, so the main-effect terms for AGE and OBS cancel and only the exposure term and the two interaction terms remain.

Problem 2 (OPTIONAL)

This problem asks you to try your hand at one regression diagnostic: checking linearity on the logit scale. The data source for problem 2 is the depression data set that you worked with during week 7.

Source: Afifi A, Clark VA and May S. Computer-Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004.

Recall that these data come from a longitudinal study of depression. The purpose of the study was to explore the correlates of occurrence of depression with respect to several types of variables: demographics, life events, stressors, physical health, health services utilization, medication use, lifestyle, and social support. Again, the data are available in SAS and STATA formats here and from the course website, http://www-unix.oit.umass.edu/~biep640w

Consider the following two variables:

  Variable   Codings                                Format in SAS   Label in STATA
  CASES      0 = Normal, 1 = Case of Depression     CASES           CASES
  AGE        Continuous, years

A. Using the software of your choice, obtain the values of the quartiles of age.

Solution: P25 = 28.0, P50 = 42.5, P75 = 59.0

B. Using these quartile values, create a set of three design variables that are indicators of the 2nd 25%, 3rd 25% and upper 25% of the values of age. Produce a frequency table distribution of each of these three indicator variables.
Hint: Design variables (also called indicator or dummy variables) were introduced in Topic 2 (Regression and Correlation). See pp. 51-53.

      i_age2 |      Freq.     Percent        Cum.
  -----------+-----------------------------------
           0 |        222       75.51       75.51
           1 |         72       24.49      100.00
  -----------+-----------------------------------
       Total |        294      100.00

      i_age3 |      Freq.     Percent        Cum.
  -----------+-----------------------------------
           0 |        215       73.13       73.13
           1 |         79       26.87      100.00
  -----------+-----------------------------------
       Total |        294      100.00

      i_age4 |      Freq.     Percent        Cum.
  -----------+-----------------------------------
           0 |        226       76.87       76.87
           1 |         68       23.13      100.00
  -----------+-----------------------------------
       Total |        294      100.00

C. Fit a logistic regression model for the outcome CASES that includes as predictors the design variables you produced in part B.

  Logistic regression                       Number of obs   =        294
                                            LR chi2(3)      =       3.66
                                            Prob > chi2     =     0.3010
  Log likelihood = -132.23409               Pseudo R2       =     0.0136

  -------------------------------------------------------------------------
      cases |      Coef.   Std. Err.       z    P>|z|   [95% Conf. Interval]
  ----------+--------------------------------------------------------------
     i_age2 |  -.0300523    .4045487   -0.07   0.941   -.8229533   .7628486
     i_age3 |  -.5166637    .4301908   -1.20   0.230   -1.359822   .3264948
     i_age4 |  -.7099543    .4702295   -1.51   0.131   -1.631587   .2116785
      _cons |  -1.304949    .2818673   -4.63   0.000   -1.857398   -.752499
  -------------------------------------------------------------------------

D. Fill in the following table. Notice that the estimated regression coefficient for the referent group (the lowest quartile of AGE) is 0.

  Quartile   Midpoint   beta-hat   se(beta-hat)
  1          23          0          0
  2          34         -0.0301     0.4045
  3          51         -0.5167     0.4302
  4          68         -0.7100     0.4702

E. Construct a plot of X = Midpoint versus Y = beta-hat. If you're feeling ambitious, construct a similar plot with accompanying 95% confidence limits.

[Figure: "PubHlth 640 2009, Week 8 Solutions: Assessment of Log Linearity". x-axis: midpoint. Source: week8_640.png]

F. From the plot you constructed in part E, what do you conclude about the linearity of logit[CASES] in AGE?

Solution: Logit[CASES] is marginally linear in AGE.

For Stata v 10 users

Note: comments begin with an asterisk and are in blue. Commands to Stata are in bold black.

  * FILE > LOG > BEGIN will start recording a log of your session.
  * Be sure to choose the extension .log so that you can cut and paste into a Word document later.

  * Command SET MORE OFF prevents screen-by-screen pausing of results
  . set more off

  * Use the drop-down menu FILE > OPEN to read in the data set depress.dta
  . use "/Users/carolbigelow/Desktop/1. Teaching/web540/data sets/depress.dta"

SOLUTION for 2A: Obtain the quartiles of AGE

  * Use command CENTILE with option c(25 50 75) to obtain the 1st, 2nd, 3rd quartiles
  . centile age, c(25 50 75)

                                                    -- Binom. Interp. --
      Variable |    Obs  Percentile     Centile     [95% Conf. Interval]
  -------------+--------------------------------------------------------
           age |    294         25           28           26          31
               |                50         42.5           38          47
               |                75           59           57          61

  * Create a new variable AGEQUART that has value equal to quartile
  . generate agequart = 1*(age<=28) + 2*(age>28 & age<=42.5) + 3*(age>42.5 & age<=59) + 4*(age>59) if age!=.
  . label variable agequart "AGEQUART Quartile of Age"
  . tabulate agequart

     AGEQUART |
     Quartile |
       of Age |      Freq.     Percent        Cum.
  ------------+-----------------------------------
            1 |         75       25.51       25.51
            2 |         72       24.49       50.00
            3 |         79       26.87       76.87
            4 |         68       23.13      100.00
  ------------+-----------------------------------
        Total |        294      100.00

SOLUTION for 2B: Create three design variables. Produce a frequency table of each.

  . generate i_age2 = (agequart==2)
  . generate i_age3 = (agequart==3)
  . generate i_age4 = (agequart==4)
  . tabulate i_age2
  . tabulate i_age3
  . tabulate i_age4

The three frequency tables are the ones shown in the solution to part B above.

SOLUTION for 2C: Fit a logistic regression model using the command LOGIT and the 3 design variables just created

  . logit cases i_age2 i_age3 i_age4

The output is the table shown in the solution to part C above.

SOLUTION for 2D: Fill in the table with midpoint of age and betas and se's

  * Use command SORT to sort data by quartile of age
  . sort agequart
  * Use command CENTILE with option c(50), preceded by the BY command, to obtain midpoints
  . by agequart: centile age, c(50)

  -> agequart = 1
                                                    -- Binom. Interp. --
      Variable |    Obs  Percentile     Centile     [95% Conf. Interval]
  -------------+--------------------------------------------------------
           age |     75         50           23           22          24

  -> agequart = 4
                                                    -- Binom. Interp. --
      Variable |    Obs  Percentile     Centile     [95% Conf. Interval]
  -------------+--------------------------------------------------------
           age |     68         50           68     65.38909          71

The output for agequart = 2 and agequart = 3 has the same layout and gives the midpoints 34 and 51 used in the table for part D.

SOLUTION for 2E

NOTE: The solution to 2E requires saving depress.dta, clearing the work space, creating a new little data set with the points that you want to plot, saving it, and using it.

  * Use the drop-down menu FILE > SAVE AS to save your enhancements to your data
  * Before creating a new data set, use the command CLEAR
  . clear
  * Use the drop-down menu DATA > DATA EDITOR to create a new data set.
  * When you're done entering data, exit the data editor.
  * Use the drop-down menu FILE > SAVE AS to save your new data as WEEK8PLOT.dta
  . save "/Users/carolbigelow/Desktop/week8plot.dta"
  file /Users/carolbigelow/Desktop/week8plot.dta saved
  * Use the drop-down menu FILE > OPEN to actually use your new data set
  . use "/Users/carolbigelow/Desktop/week8plot.dta"
  * Use command LIST to check creation of data set
  . list

       +-----------------------------+
       | midpoint   betahat   sebeta |
       |-----------------------------|
    1. |       23         0        0 |
    2. |       34    -.0301    .4045 |
    3. |       51    -.5167    .4302 |
    4. |       68      -.71    .4702 |
       +-----------------------------+

  * Use command GRAPH TWOWAY CONNECTED followed by Y variable first, then X variable
  . graph twoway connected betahat midpoint

NOTE: Once the basic graph has been created for you, you can click on the graph editor (it's an icon that looks like a histogram) and try your hand at creating titles, etc. When you are done, be sure to save your graph by clicking on the icon that looks like a disk. A good choice for the extension is .png.
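The quartile coding from parts 2A and 2B can be cross-checked outside any of the statistics packages. The following is a minimal Python sketch, not part of the original solutions. It assumes the cutpoints 28, 42.5 and 59 from part 2A and the convention used in the SAS and Minitab solutions that an age equal to a cutpoint falls in the lower group. The names `age_quartile` and `design_variables` are hypothetical helpers introduced only for illustration.

```python
from bisect import bisect_left

# P25, P50, P75 of age, taken from the solution to part 2A
CUTPOINTS = [28.0, 42.5, 59.0]


def age_quartile(age):
    """Return the quartile group (1 to 4) for a given age.

    bisect_left counts the cutpoints strictly below `age`, so an age
    equal to a cutpoint is placed in the lower group (e.g. 28 -> 1).
    """
    return bisect_left(CUTPOINTS, age) + 1


def design_variables(age):
    """Return the 0/1 indicators for quartiles 2, 3 and 4.

    The lowest quartile is the referent group, so it is coded (0, 0, 0).
    """
    q = age_quartile(age)
    return (int(q == 2), int(q == 3), int(q == 4))


if __name__ == "__main__":
    for age in (18, 28, 29, 42.5, 43, 59, 60, 89):
        print(age, age_quartile(age), design_variables(age))
```

Applying `design_variables` to every age in the data set and counting the 1s in each position should reproduce the frequencies 72, 79 and 68 reported in part B, provided the same boundary convention is used.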
For SAS users

  * BE640 Intermediate Biostatistics 2009, Week 8: Logistic Regression;
  options nocenter nodate;
  libname in 'z:\bigelow\teaching\web640\data sets';

  * Use Depression Dataset from Afifi, Clark and May;
  data temp(keep=age cases);
    set in.depress;
  run;

  proc freq data=temp;
    tables cases;
  run; quit;

  * Homework Question 2A: Obtain the quartiles of age;
  proc univariate data=temp;
    var age;
  run;

  * Homework Question 2B: Create
    (1) a set of three indicator variables that are indicators of the
        2nd, 3rd and 4th 25% of values of age, and
    (2) an ordinal variable with values 1, 2, 3, 4 for quartile of age;
  data temp;
    set temp;
    * initialize indicators to missing;
    Iage2=.; Iage3=.; Iage4=.;
    * Use logical operators to define 0/1 indicators;
    if age gt . then do;
      Iage2 = (28.0 lt age le 42.5);
      Iage3 = (42.5 lt age le 59.0);
      Iage4 = (59.0 lt age);
      * Create ordinal variable with values 1 to 4;
      agegrp = 1 + 1*Iage2 + 2*Iage3 + 3*Iage4;
    end;
  run;

  * CHECK;
  proc sort data=temp;
    by agegrp;
  run;
  proc univariate data=temp;
    by agegrp;
    var age;
  run;
  proc freq data=temp;
    by agegrp;
    tables Iage2 Iage3 Iage4;
  run;

  * Homework Question 2B: Produce a frequency distribution of the three indicator variables;
  proc freq data=temp;
    tables Iage2 Iage3 Iage4;
  run;

  * Homework Question 2C: Fit a logistic regression model of outcome CASES
    with predictors the three design variables.
    Use option DESCENDING so that the event modeled is CASES=1;
  proc logistic data=temp descending;
    model cases = Iage2 Iage3 Iage4;
  run;

  * Homework Question 2D: Obtain midpoint of age in each quartile for plotting later
    (note: data is already sorted by agegrp);
  proc means data=temp MEDIAN;
    by agegrp;
    var age;
  run;

  * Data set for plotting;
  data temp2;
    input midage beta sebeta;
    upp = beta + 1.96*sebeta;
    low = beta - 1.96*sebeta;
    cards;
  23  0       0
  34 -0.0301  0.4045
  51 -0.5167  0.4302
  68 -0.7100  0.4702
  ;
  run;

  * Homework Question 2E: Plot of X = midpoint of age versus
    Y = estimated coefficient from logistic model;
  goptions reset=symbol;
  symbol1 v=square color=blue i=j;
  proc gplot data=temp2;
    plot beta*midage;
    title 'Estimated beta by Midpoint of Age';
  run;

  * Homework Question 2E: Similar plot with 95% confidence limits;
  goptions reset=symbol;
  symbol1 v=square color=red i=j;
  symbol2 v=square color=blue i=j;
  symbol3 v=square color=red i=j;
  proc gplot data=temp2;
    plot upp*midage beta*midage low*midage / overlay;
    title 'Estimated beta by Midpoint of Age';
  run; quit;

[Figures: two plots titled "Estimated beta by Midpoint of Age", without and with 95% confidence limits. Sources: week8_01.jpg, week8_02.jpg]

For SPSS users

2A. Analyze > Descriptive Statistics > Frequencies

  Statistics: age in years
  N             Valid         294
                Missing         0
  Percentiles   25        28.0000
                50        42.5000
                75        59.0000

2B. 1. Transform > Recode > Into Different Variables

In the "Recode into Different Variables" dialog, define three output variables:

  Q2: Age 29 to 42 [ageQ2]
  Q3: Age 43 to 59 [ageQ3]
  Q4: Age 60 to 89 [ageQ4]

Click "Old and New Values". For each output variable, recode the corresponding range to 1 and everything else to 0. For example, for ageQ2: Range 29 thru 42 --> 1; ELSE --> 0.

2. Analyze > Descriptive Statistics > Frequencies

  Q2: Age 29 to 42
                  Frequency   Percent   Valid Percent   Cumulative Percent
  Valid   .00        222        75.5        75.5              75.5
          1.00        72        24.5        24.5             100.0
          Total      294       100.0       100.0

  Q3: Age 43 to 59
                  Frequency   Percent   Valid Percent   Cumulative Percent
  Valid   .00        215        73.1        73.1              73.1
          1.00        79        26.9        26.9             100.0
          Total      294       100.0       100.0

  Q4: Age 60 to 89
                  Frequency   Percent   Valid Percent   Cumulative Percent
  Valid   .00        226        76.9        76.9              76.9
          1.00        68        23.1        23.1             100.0
          Total      294       100.0       100.0

2C. Analyze > Regression > Binary Logistic

In the "Logistic Regression" dialog, set Dependent to the CASES variable and Covariates to ageQ2, ageQ3, ageQ4.

  Variables in the Equation
                         B      S.E.     Wald    df    Sig.   Exp(B)
  Step 1(a)  ageQ2     -.030    .405     .006     1    .941    .970
             ageQ3     -.517    .430    1.442     1    .230    .597
             ageQ4     -.710    .470    2.280     1    .131    .492
             Constant -1.305    .282   21.434     1    .000    .271
  a. Variable(s) entered on step 1: ageQ2, ageQ3, ageQ4.

2D. From estimates in 2B and 2C:

  Quartile   Midpoint   beta-hat   se(beta-hat)
  1          23          0          0
  2          34         -0.0301     0.4045
  3          51         -0.5167     0.4302
  4          68         -0.7100     0.4702

2E. Create a separate dataset to hold the data from 2D: File > New > Data.
Add your named variables by overwriting the column heading that currently says "var". Now add the data from the table in 2D, and save.
To plot the data: Graph > Scatter/Dot > Simple Scatter > Define. In the "Simple Scatterplot" dialog, set X Axis to midpoint and Y Axis to betahat. Click OK.

[Figure: scatterplot of betahat (y-axis) versus midpoint (x-axis)]

2F. logit(cases) appears to be linear in age with a negative slope. As age increases (as evidenced by midpoint), betahat decreases.

For Minitab users

2A. Stat > Basic Statistics > Display Descriptive Statistics
In the "Display Descriptive Statistics" dialog, set Variables to AGE. Click the Statistics button and select Minimum, First quartile, Median, Third quartile and Maximum. Click OK, then click OK again.

  Results for: sgsdepress.mtw
  Descriptive Statistics: AGE

  Variable   Minimum      Q1   Median      Q3   Maximum
  AGE          18.00   28.00    42.50   59.00     89.00

2B. Create 3 new variables to represent the 2nd, 3rd and 4th (upper) quartiles, for example AgeQ2, AgeQ3, AgeQ4. This can be accomplished by placing the names at the top of columns C3, C4 and C5. Now sort the data in ascending order by age, following the navigation below.

Data > Sort
In the "Sort" dialog, set Sort column(s) to AGE, CASES, AgeQ2, AgeQ3, AgeQ4; set By column to AGE (leave Descending unchecked); and store the sorted data in Original column(s). Click OK.

The new indicator variables should be set up as shown in the following table.

  Quartile   AgeQ2   AgeQ3   AgeQ4   Age Range
  1st        0       0       0       18-28
  2nd        1       0       0       29-42
  3rd        0       1       0       43-59
  4th        0       0       1       60-89

Unfortunately, Minitab does not have a menu option to populate the data as required, so you will need to enter these by hand. However, the dragging and copy-and-paste techniques that work in Excel also work in Minitab.

  AGE   AgeQ2   AgeQ3   AgeQ4
  18    0       0       0
  29    1       0       0
  43    0       1       0
  60    0       0       1

2C. $\operatorname{logit}[\text{CASES}=1] = \beta_0 + \beta_1\,\text{AgeQ2} + \beta_2\,\text{AgeQ3} + \beta_3\,\text{AgeQ4}$

Stat > Regression > Binary Logistic Regression
In the "Binary Logistic Regression" dialog, set Response to CASES and Model to AgeQ2 AgeQ3 AgeQ4. Click OK.

  Binary Logistic Regression: CASES versus AgeQ2, AgeQ3, AgeQ4

  Link Function: Logit

  Response Information
  Variable   Value   Count
  CASES      1          50   (Event)
             0         244
             Total     294

  Logistic Regression Table
                                                        Odds      95% CI
  Predictor         Coef    SE Coef       Z       P    Ratio   Lower   Upper
  Constant      -1.30495   0.281867   -4.63   0.000
  AgeQ2       -0.0300523   0.404549   -0.07   0.941     0.97    0.44    2.14
  AgeQ3        -0.516664   0.430191   -1.20   0.230     0.60    0.26    1.39
  AgeQ4        -0.709954   0.470229   -1.51   0.131     0.49    0.20    1.24

  Log-Likelihood = -132.234
  Test that all slopes are zero: G = 3.656, DF = 3, P-Value = 0.301

>> Remarks
  - CASES=1 is modeled as the event by default (a good thing for us).
  - None of the covariates are significant at the alpha=0.05 level.
  - We failed to reject the global hypothesis that all slopes = 0. Thus the data do not show a significant difference among the quartiles of age with respect to their association with the occurrence of depression.

2D. From estimates in 2B and 2C:

  Quartile   Midpoint   beta-hat   se(beta-hat)
  1          23          0          0
  2          34         -0.0301     0.4045
  3          51         -0.5167     0.4302
  4          68         -0.7100     0.4702

2E. Create a separate dataset to hold the data from 2D: File > New > Worksheet.
Place the names of the new variables at the top of the columns C1, C2, C3. Then enter the data from 2D by hand.
Graph > Scatterplot > With Regression. Click OK.

[Figure: "Scatterplot of betahat vs midpoint", with fitted regression line]

2F. logit(cases) appears to be linear in age with a negative slope. On average, as age (as evidenced by midpoint) increases, betahat decreases.
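The odds ratios and 95% confidence limits in the Minitab logistic regression table follow directly from the coefficients and standard errors: OR = exp(beta) and the limits are exp(beta ± 1.96 se). The following is a minimal Python sketch of that arithmetic, not part of the original solutions. The coefficient values are copied from the fitted model above; the function name `odds_ratio_ci` and the dictionary `estimates` are hypothetical names introduced for illustration, and 1.96 is the usual normal-approximation multiplier.

```python
import math

# (coefficient, standard error) for each design variable,
# copied from the logistic regression output in part 2C
estimates = {
    "AgeQ2": (-0.0300523, 0.404549),
    "AgeQ3": (-0.516664, 0.430191),
    "AgeQ4": (-0.709954, 0.470229),
}


def odds_ratio_ci(beta, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit).

    The interval is built on the logit scale as beta +/- z*se
    and then exponentiated.
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


if __name__ == "__main__":
    for name, (beta, se) in estimates.items():
        odds_ratio, lower, upper = odds_ratio_ci(beta, se)
        print(f"{name}: OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Rounded to two decimals, the results agree with the Odds Ratio, Lower and Upper columns of the Minitab table. All three intervals contain 1, which is consistent with the non-significant p-values.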