
# Intrmd Biostatistics PUBHLTH 640

UMass


These 38 pages of class notes were uploaded by Agustin Bechtelar on Friday, October 30, 2015. The notes belong to PUBHLTH 640 at the University of Massachusetts, taught by Staff in Fall. Since their upload, they have received 68 views.


Date Created: 10/30/15

## PubHlth 640 Intermediate Biostatistics, Unit 6: Introduction to Survival Analysis, Practice Problems SOLUTIONS (Wk97solutions.doc)

The following hypothetical data describe two groups, smokers and nonsmokers, in a study of survival (days) following a root canal.

| Group | Days (X) | Status at Last Follow-up (C) |
|---|---|---|
| smokers | 4 | alive |
| smokers | 7 | dead |
| smokers | 8 | alive |
| nonsmoker | 29 | alive |
| smokers | 29 | dead |
| smokers | 31 | alive |
| nonsmoker | 40 | dead |
| smokers | 65 | dead |
| nonsmoker | 69 | dead |
| nonsmoker | 78 | alive |
| nonsmoker | 79 | alive |
| nonsmoker | 106 | dead |
| smokers | 107 | alive |
| nonsmoker | 129 | dead |
| smokers | 130 | alive |
| smokers | 140 | alive |
| smokers | 142 | alive |
| smokers | 149 | dead |
| smokers | 158 | alive |
| smokers | 160 | dead |
| nonsmoker | 161 | dead |
| smokers | 162 | alive |
| smokers | 187 | dead |
| smokers | 188 | alive |
| nonsmoker | 197 | dead |
| nonsmoker | 204 | alive |
| nonsmoker | 208 | alive |
| smokers | 221 | dead |
| nonsmoker | 228 | dead |
| nonsmoker | 231 | alive |

### 1. Kaplan-Meier estimates

Use the Kaplan-Meier method to estimate separate survival functions for smokers and nonsmokers. You may use any software you like, but your answer must include a HAND calculation of the Kaplan-Meier estimates. This could look like the table on page 23 of your lecture notes for Unit 6, Introduction to Survival Analysis. Start the calculation at event time t = 0.

#### SAS Solution

```sas
data temp;
  /* the trailing @@ lets each data line hold several records */
  input smoking $ days status $ @@;
  cards;
smokers 4 alive  smokers 7 dead  smokers 8 alive
nonsmoker 29 alive  smokers 29 dead  smokers 31 alive
nonsmoker 40 dead  smokers 65 dead  nonsmoker 69 dead
nonsmoker 78 alive  nonsmoker 79 alive  nonsmoker 106 dead
smokers 107 alive  nonsmoker 129 dead  smokers 130 alive
smokers 140 alive  smokers 142 alive  smokers 149 dead
smokers 158 alive  smokers 160 dead  nonsmoker 161 dead
smokers 162 alive  smokers 187 dead  smokers 188 alive
nonsmoker 197 dead  nonsmoker 204 alive  nonsmoker 208 alive
smokers 221 dead  nonsmoker 228 dead  nonsmoker 231 alive
;
run;

/* Create indicators as needed, and formats for readability */
proc format;
  value groupf  0 = "0=nonsmoker" 1 = "1=smoker";
  value censorf 0 = "0=alive"     1 = "1=dead";
run;

data temp;
  set temp;
  /* list input keeps at most 8 characters, so "nonsmoker" is stored as "nonsmoke" */
  if smoking = "smokers" then group = 1;
  else if smoking = "nonsmoke" then group = 0;
  format group groupf.;
  if status = "alive" then censor = 0;
  if status = "dead" then censor = 1;
  format censor censorf.;
run;

/* Obtain tabulation of Kaplan-Meier estimates of survival, by group */
proc lifetest data=temp method=km plots=(s);
  time days*censor(0);
  strata group;
run;
quit;
```

Partial listing of output:

```
The LIFETEST Procedure

Stratum 1: group = 0=nonsmoker

                 Product-Limit Survival Estimates
                                  Survival
                                  Standard    Number    Number
    days    Survival   Failure     Error      Failed     Left
   0.000     1.0000     0          0            0         13
  29.000*                                       0         12
  40.000     0.9167     0.0833     0.0798       1         11
  69.000     0.8333     0.1667     0.1076       2         10
  78.000*                                       2          9
  79.000*                                       2          8
 106.000     0.7292     0.2708     0.1355       3          7
 129.000     0.6250     0.3750     0.1510       4          6
 161.000     0.5208     0.4792     0.1577       5          5
 197.000     0.4167     0.5833     0.1568       6          4
 204.000*                                       6          3
 208.000*                                       6          2
 228.000     0.2083     0.7917     0.1669       7          1
 231.000*                                       7          0

Stratum 2: group = 1=smoker

                 Product-Limit Survival Estimates
                                  Survival
                                  Standard    Number    Number
    days    Survival   Failure     Error      Failed     Left
   0.000     1.0000     0          0            0         17
   4.000*                                       0         16
   7.000     0.9375     0.0625     0.0605       1         15
   8.000*                                       1         14
  29.000     0.8705     0.1295     0.0856       2         13
  31.000*                                       2         12
  65.000     0.7980     0.2020     0.1048       3         11
 107.000*                                       3         10
 130.000*                                       3          9
 140.000*                                       3          8
 142.000*                                       3          7
 149.000     0.6840     0.3160     0.1386       4          6
 158.000*                                       4          5
 160.000     0.5472     0.4528     0.1651       5          4
 162.000*                                       5          3
 187.000     0.3648     0.6352     0.1852       6          2
 188.000*                                       6          1
 221.000     0          1.0000     0            7          0

NOTE: The marked survival times are censored observations.
```

#### Minitab Solution

Stat > Reliability/Survival: enter days in the Variables box and Group as the By variable; click the Censor dialog; click Estimate and check Kaplan-Meier.

```
Results for: smoke.MTW

Distribution Analysis: Days by Group (nonsmokers)

Kaplan-Meier Estimates
        Number   Number   Survival      Standard    95.0% Normal CI
Time   at Risk   Failed   Probability   Error       Lower      Upper
  40      12       1      0.916667      0.079786    0.760290   1.00000
  69      11       1      0.833333      0.107583    0.622475   1.00000
 106       8       1      0.729167      0.135483    0.463624   0.99471
 129       7       1      0.625000      0.150952    0.329140   0.92086
 161       6       1      0.520833      0.157690    0.211766   0.82990
 197       5       1      0.416667      0.156828    0.109290   0.72404
 228       2       1      0.208333      0.166884    0.000000   0.53542

Distribution Analysis: Days by Group (smokers)

Kaplan-Meier Estimates
        Number   Number   Survival      Standard    95.0% Normal CI
Time   at Risk   Failed   Probability   Error       Lower      Upper
   7      16       1      0.937500      0.060515    0.818892   1.00000
  29      14       1      0.870536      0.085566    0.702829   1.00000
  65      12       1      0.797991      0.104768    0.592650   1.00000
 149       7       1      0.683992      0.138576    0.412388   0.95560
 160       5       1      0.547194      0.165110    0.223585   0.87080
 187       3       1      0.364796      0.185190    0.001830   0.72776
 221       1       1      0.000000      0.000000    0.000000   0.00000
```

### 2. Log-rank test by hand

BY HAND, perform (and show your work for) a log-rank test comparing the estimated survival curves for smokers and nonsmokers. One way to organize your work is the table on page 30 of your lecture notes for Unit 6, Introduction to Survival Analysis.

The worksheet has one row per distinct death time t. Its columns are: n1(t), the number at risk among smokers; n0(t), the number at risk among nonsmokers; N(t) = n1(t) + n0(t), the total at risk; d(t), the total deaths; and O1(t), the deaths among smokers. The expected smoker deaths and variance at each death time are

$$E_1(t) = \frac{n_1(t)\,d(t)}{N(t)}, \qquad V_1(t) = \frac{n_1(t)\,n_0(t)\,d(t)\,[N(t)-d(t)]}{N(t)^2\,[N(t)-1]}.$$

Summing over the 14 death times gives 7 observed smoker deaths, 6.02717 expected smoker deaths, and a variance of 3.0644, so

$$\chi^2_{\text{log-rank}}\ (1\ \text{df}) = \frac{\left[\sum_t O_1(t) - \sum_t E_1(t)\right]^2}{\sum_t V_1(t)} = \frac{(7 - 6.02717)^2}{3.0644} = 0.3088.$$

### 3. Log-rank test in software

USING WHATEVER SOFTWARE you like, perform (and show your work for) the log-rank test that you did by hand in problem 2.

#### SAS Solution

PROC LIFETEST produces the log-rank test. Recall the code used:

```sas
proc lifetest data=temp method=km plots=(s);
  time days*censor(0);
  strata group;
run;
quit;
```

```
Test of Equality over Strata
                                     Pr >
Test        Chi-Square    DF    Chi-Square
Log-Rank        0.3088     1        0.5784
Wilcoxon        0.0461     1        0.8300
-2Log(LR)       0.0175     1        0.8947
```

#### Minitab Solution

The same steps that produce the Kaplan-Meier estimates by group also produce the log-rank test: Stat > Reliability/Survival: enter days in the Variables box and Group as the By variable; click the Censor dialog; click Estimate and check Kaplan-Meier.

```
Distribution Analysis: Days by Group

Comparison of Survival Curves
Test Statistics
Method      Chi-Square   DF   P-Value
Wilcoxon      0.046096    1     0.830
```

### 4. Cox proportional hazards model

Write an expression for a Cox Proportional Hazards Model that could be explored to investigate the association of survival time following root canal with smoking status. Define all terms.

A Cox PH model for the hazard of death following root canal and its association with smoking status is

$$h(t \mid Z) = h_0(t)\exp(\beta Z)$$

where

- $h(t \mid Z)$ = instantaneous hazard of death at time $t$, given survival to $t$, for a person with covariate $Z$
- $h_0(t)$ = baseline hazard of death at time $t$, given survival to $t$ (the hazard when $Z = 0$)
- $Z$ = indicator of smoking status, with $Z = 1$ for smokers and $Z = 0$ for nonsmokers
- $\beta$ = log hazard ratio of death for smokers relative to nonsmokers, so $e^{\beta}$ is the hazard ratio

### 5. Assumptions

What assumptions must hold in order for this model to be valid?

### 6. Fitting the Cox model

Using whatever software you like, fit the model you stated in problem 4. Report your output and provide annotations that explain the output.

#### SAS Solution

```sas
/* Cox PH model of the 0/1 indicator of smoking status, to get the estimated
   hazard ratio and associated 95% CI limits. Use option RISKLIMITS */
proc phreg data=temp;
  model days*censor(0) = group / risklimits;
run;
```

Partial listing of output (annotations in brackets):

```
The PHREG Procedure

Model Information
Data Set               WORK.TEMP
Dependent Variable     days
Censoring Variable     censor
Censoring Value(s)     0
Ties Handling          BRESLOW

Number of Observations Read    30
Number of Observations Used    30

Summary of the Number of Event and Censored Values
                               Percent
 Total    Event    Censored    Censored
    30       14          16       53.33
[check that SAS is identifying events correctly]

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        0.3084     1        0.5787
Score                   0.3088     1        0.5784   [= log-rank test when the model has only a 0/1 predictor]
Wald                    0.3069     1        0.5796

The SAS System                       18:14 Monday, April 16, 2007

Analysis of Maximum Likelihood Estimates
             Parameter   Standard                             Hazard     95% Hazard Ratio
Variable DF   Estimate      Error   Chi-Square   Pr > ChiSq    Ratio     Confidence Limits
group     1    0.31746    0.57308       0.3069       0.5796    1.374      0.447     4.224

[0.31746 = beta for group; 0.3069 is a Wald chi-square for the significance of GROUP;
 estimated HR = exp(0.31746) = 1.374]
```

In this sample, the hazard of death following root canal is 37% greater for smokers than for nonsmokers (HR = 1.37, 95% CI 0.45 to 4.22), but this difference is not statistically significant (p = .58).

### 7. Comparison with the log-rank test

Compare the fit of the model you obtained for problem 6 to the results of the log-rank test that you got for problems 2 and 3.

For comparing two groups, a Cox PH model with one 0/1 predictor is equivalent to the log-rank test: the model's score test gives the log-rank statistic.

- Log-rank test: chi-square = 0.3088 on df = 1, p-value = .5784
- Cox PH model score test for the significance of 0/1 GROUP: chi-square = 0.3088 on df = 1, p-value = .5784

#### For Stata v10 users

```stata
* Create a Stata data set
set more off
* Use the data editor to create a data set
label define groupf 0 "0=nonsmoker" 1 "1=smoker"
label define censorf 0 "0=alive" 1 "1=dead"
label values smoking groupf
label values status censorf
* Check that data creation is correct using command LIST
list
* save data as 6407week972009.dta
```

```
     +---------------------------------+
     |     smoking   days    status |
     |---------------------------------|
 28. |    1=smoker    221    1=dead |
 29. | 0=nonsmoker    228    1=dead |
 30. | 0=nonsmoker    231   0=alive |
     +---------------------------------+
```

```stata
* Stata requires that you declare survival data as such using the command STSET
*   stset TIMEVARIABLE, failure(EVENTVARIABLE==FAILUREVALUE)
stset days, failure(status)
```

```
     failure event:  status != 0 & status < .
obs. time interval:  (0, days]
 exit on or before:  failure

       30  total obs.
        0  exclusions
       30  obs. remaining, representing
       14  failures in single record/single failure data
     3647  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =       231
```

```stata
* Solution to 1: Kaplan-Meier curves separately for smokers and non-smokers
*   sts list, by(GROUPVARIABLE)
sort smoking
sts list, by(smoking)
```

```
         failure _d:  status
   analysis time _t:  days

           Beg.          Net     Survivor      Std.
  Time    Total   Fail   Lost    Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------
0=nonsmoker
    29       13      0      1      1.0000         .          .         .
    40       12      1      0      0.9167    0.0798     0.5390    0.9878
    69       11      1      0      0.8333    0.1076     0.4817    0.9555
    78       10      0      1      0.8333    0.1076     0.4817    0.9555
    79        9      0      1      0.8333    0.1076     0.4817    0.9555
   106        8      1      0      0.7292    0.1355     0.3677    0.9051
   129        7      1      0      0.6250    0.1510     0.2762    0.8423
   161        6      1      0      0.5208    0.1577     0.1979    0.7690
   197        5      1      0      0.4167    0.1568     0.1309    0.6859
   204        4      0      1      0.4167    0.1568     0.1309    0.6859
   208        3      0      1      0.4167    0.1568     0.1309    0.6859
   228        2      1      0      0.2083    0.1669     0.0140    0.5618
   231        1      0      1      0.2083    0.1669     0.0140    0.5618
1=smoker
     4       17      0      1      1.0000         .          .         .
     7       16      1      0      0.9375    0.0605     0.6323    0.9910
     8       15      0      1      0.9375    0.0605     0.6323    0.9910
    29       14      1      0      0.8705    0.0856     0.5733    0.9660
    31       13      0      1      0.8705    0.0856     0.5733    0.9660
    65       12      1      0      0.7980    0.1048     0.4937    0.9304
   107       11      0      1      0.7980    0.1048     0.4937    0.9304
   130       10      0      1      0.7980    0.1048     0.4937    0.9304
   140        9      0      1      0.7980    0.1048     0.4937    0.9304
   142        8      0      1      0.7980    0.1048     0.4937    0.9304
   149        7      1      0      0.6840    0.1386     0.3394    0.8750
   158        6      0      1      0.6840    0.1386     0.3394    0.8750
   160        5      1      0      0.5472    0.1651     0.2003    0.7976
   162        4      0      1      0.5472    0.1651     0.2003    0.7976
   187        3      1      0      0.3648    0.1852     0.0669    0.6866
   188        2      0      1      0.3648    0.1852     0.0669    0.6866
   221        1      1      0      0.0000         .          .         .
```

```stata
* Solution to 2: Log-rank test comparing survival across groups
*   sts test GROUPVARIABLE, logrank
sts test smoking, logrank
```

```
Log-rank test for equality of survivor functions

            |   Events         Events
smoking     |  observed       expected
------------+-------------------------
0=nonsmoker |         7           7.97
1=smoker    |         7           6.03
------------+-------------------------
Total       |        14          14.00

                  chi2(1) =       0.31
                  Pr>chi2 =     0.5784
```

```stata
* Plot of Kaplan-Meier curves
*   sts graph, by(GROUPVARIABLE)
sts graph, by(smoking)
```

[Figure: "Estimated Survival After Root Canal, Kaplan-Meier, Smokers versus Non-Smokers"; y-axis: proportion alive (0.00 to 1.00); x-axis: days (0 to 230); one curve each for smoking = 0=nonsmoker and smoking = 1=smoker.]

Note: I made some edits to the graph using the graph editor to obtain the titles and labels.

```stata
* Solution to 6: Fit of Cox PH model
* Note: separate commands are required to obtain betas vs. hazard ratios
*   1. To obtain hazard ratios: stcox GROUPVARIABLE
*   2. To obtain betas, specify no hazard ratios: stcox GROUPVARIABLE, nohr
stcox smoking
```

```
         failure _d:  status
   analysis time _t:  days

Iteration 0:   log likelihood = -34.773766
Iteration 1:   log likelihood = -34.619583
Iteration 2:   log likelihood = -34.619583
Refining estimates:
Iteration 0:   log likelihood = -34.619583

Cox regression -- no ties

No. of subjects =           30                     Number of obs   =        30
No. of failures =           14
Time at risk    =         3647
                                                   LR chi2(1)      =      0.31
Log likelihood  =   -34.619583                     Prob > chi2     =    0.5787

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     smoking |   1.373438   .7870813     0.55   0.580     .4466919    4.222895
------------------------------------------------------------------------------
```

```stata
stcox smoking, nohr
```

```
(iteration log and header identical to the run above)
------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     smoking |   .3173174   .5730736     0.55   0.580    -.8058862    1.440521
------------------------------------------------------------------------------
```

## PubHlth 640 Intermediate Biostatistics, Unit 5: Logistic Regression, Practice Problems Week 8 SOLUTIONS (Wk87solutions.doc)

### 1.

Source: Kleinbaum DG, Kupper LL, Muller KE, and Nizam A. Applied Regression Analysis and Other Multivariable Methods, Third Edition. Pacific Grove: Duxbury Press, 1998, p. 683, problem 2.

A five-year follow-up study of 600 disease-free subjects was carried out to assess the effect of a 0/1 exposure E on the development (or not) of a certain disease. The variables AGE (continuous) and obesity status OBS (a 0/1 variable) were measured at the start of follow-up and were to be treated as control variables in the analysis.

(A) State the logit form of a logistic regression model that assesses the effect of the 0/1 exposure variable E, controlling for the confounding effects of AGE and OBS and for the interaction effects of AGE with E and OBS with E.

Solution:

$$\text{logit}(\pi) = \beta_0 + \beta_1 E + \beta_2\,\text{AGE} + \beta_3\,\text{OBS} + \beta_4\,\text{AGEE} + \beta_5\,\text{OBSE}$$

I used the following notation:

- $\pi$ = Probability(disease)
- AGEE = AGE × E. This is a created variable: the interaction of AGE with E.
- OBSE = OBS × E. Similarly, this is the interaction of OBS with E.

(B) Given the model in part A, give a formula for the odds ratio for the exposure-disease relationship that controls for the confounding and interactive effects of AGE and OBS.

Solution: The solution follows the ideas on pp. 9-11 of Lecture Notes 5, Logistic Regression.

| Predictor | Value for a person who is exposed | Value for a person who is not exposed |
|---|---|---|
| E | 1 | 0 |
| AGE | $\text{AGE}_1$ | $\text{AGE}_0$ |
| OBS | $\text{OBS}_1$ | $\text{OBS}_0$ |
| AGEE | $\text{AGE}_1$ | 0 |
| OBSE | $\text{OBS}_1$ | 0 |

Then

$$\begin{aligned}
\text{OR} &= \exp\left[\text{logit}(\pi)\text{ for exposed person} - \text{logit}(\pi)\text{ for non-exposed person}\right]\\
&= \exp\left\{\left[\beta_0 + \beta_1 + \beta_2\text{AGE}_1 + \beta_3\text{OBS}_1 + \beta_4\text{AGE}_1 + \beta_5\text{OBS}_1\right] - \left[\beta_0 + \beta_2\text{AGE}_0 + \beta_3\text{OBS}_0\right]\right\}\\
&= \exp\left[\beta_1 + \beta_2(\text{AGE}_1 - \text{AGE}_0) + \beta_3(\text{OBS}_1 - \text{OBS}_0) + \beta_4\text{AGE}_1 + \beta_5\text{OBS}_1\right]
\end{aligned}$$

(C) Now use the formula from part B to write an expression for the estimated odds ratio for the exposure-disease relationship that accounts for both confounding and interaction when AGE = 40 and OBS = 1.

Solution: $\widehat{\text{OR}} = \exp(\hat\beta_1 + 40\hat\beta_4 + \hat\beta_5)$

| Predictor | Value for a person who is exposed | Value for a person who is not exposed |
|---|---|---|
| E | 1 | 0 |
| AGE | 40 | 40 |
| OBS | 1 | 1 |
| AGEE | 40 | 0 |
| OBSE | 1 | 0 |

$$\widehat{\text{OR}} = \exp\left[\hat\beta_1 + \hat\beta_2(40-40) + \hat\beta_3(1-1) + \hat\beta_4(40) + \hat\beta_5(1)\right] = \exp(\hat\beta_1 + 40\hat\beta_4 + \hat\beta_5)$$

### 2. (OPTIONAL)

This problem asks you to try one regression diagnostic: checking linearity on the logit scale. The data for problem 2 are the depression data set that you worked with during week 7.

Source: Afifi A, Clark VA, and May S. Computer-Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004.

Recall that these data come from a longitudinal study of depression. The study explored correlates of the occurrence of depression across several types of variables: demographics, life events, stressors, physical health, health services utilization, medication use, lifestyle, and social support. Again, the data are available in SAS and Stata formats here and from the course website: http://www-unix.oit.umass.edu/~biep640w

Consider the following two variables:

| Variable | Codings | Format in SAS | Label in Stata |
|---|---|---|---|
| CASES | 0 = Normal, 1 = Case | CASES. | CASES |
| AGE | Continuous, years | | |

(A) Using the software of your choice, obtain the values of the quartiles of age.

Solution: P25 = 28.0, P50 = 42.5, P75 = 59.0

(B) Using these quartile values, create a set of three design variables that indicate the 2nd 25%, the 3rd 25%, and the upper 25% of the values of age. Produce a frequency distribution for each of these three indicator variables.

Hint: Design variables (also called indicator or dummy variables) were introduced in Topic 2, Regression and Correlation. See pp. 51-53.

```
     i_age2 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        222       75.51       75.51
          1 |         72       24.49      100.00
------------+-----------------------------------
      Total |        294      100.00

     i_age3 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        215       73.13       73.13
          1 |         79       26.87      100.00
------------+-----------------------------------
      Total |        294      100.00

     i_age4 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        226       76.87       76.87
          1 |         68       23.13      100.00
------------+-----------------------------------
      Total |        294      100.00
```

(C) Fit a logistic regression model for the outcome CASES that includes as predictors the design variables you produced in part B.

```
Logistic regression                               Number of obs   =        294
                                                  LR chi2(3)      =       3.66
                                                  Prob > chi2     =     0.3010
Log likelihood = -132.23409                       Pseudo R2       =     0.0136

------------------------------------------------------------------------------
       cases |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      i_age2 |  -.0300523   .4045487    -0.07   0.941    -.8229533    .7628486
      i_age3 |  -.5166637   .4301908    -1.20   0.230    -1.359822    .3264948
      i_age4 |  -.7099543   .4702295    -1.51   0.131    -1.631587    .2116785
       _cons |  -1.304949   .2818673    -4.63   0.000    -1.857398    -.752499
------------------------------------------------------------------------------
```

(D) Fill in the following table. Notice that the estimated regression coefficient for the referent group (the lowest quartile of AGE) is 0.

| Quartile of AGE | Midpoint | Estimated coefficient | Standard error |
|---|---|---|---|
| 1 (referent) | 23 | 0 | 0 |
| 2 | 34 | -0.0301 | 0.4045 |
| 3 | 51 | -0.5167 | 0.4302 |
| 4 | 68 | -0.7100 | 0.4702 |

(E) Construct a plot of X = midpoint versus Y = estimated coefficient. If you're feeling ambitious, construct a similar plot with accompanying 95% confidence limits.

[Figure: "PubHlth 640 2009 Week 8 Solutions: Assessment of Log Linearity"; estimated coefficient plotted against age midpoint (source: week87640.png).]

(F) From the plot you constructed in part E, what do you conclude about the linearity of logit(CASES) in AGE?

Solution: Logit(CASES) is marginally linear in AGE.
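The odds-ratio algebra in problem 1 and the quartile check in problem 2 can be made concrete with a few lines of code. Below is a minimal sketch in Python, which is not one of the course packages; it uses only the standard library. The coefficients passed to the part 1C check are hypothetical and serve only to show that exponentiating the logit difference gives exp(β1 + 40β4 + β5). The quartile coefficients and standard errors are copied from the Stata output in part 2C. The helper names are illustrative, and `segment_slopes` is a crude numeric stand-in for the plot in 2E, not a formal test of linearity.

```python
from math import exp

# Part 2D table: quartile midpoints of AGE and fitted logit coefficients
# (the 1st quartile is the referent, so its coefficient and SE are 0).
MIDPOINT = [23, 34, 51, 68]
BETA = [0.0, -0.0300523, -0.5166637, -0.7099543]
SE = [0.0, 0.4045487, 0.4301908, 0.4702295]


def logit(b, e, age, obs):
    """Part 1A model: b0 + b1*E + b2*AGE + b3*OBS + b4*(AGE*E) + b5*(OBS*E)."""
    b0, b1, b2, b3, b4, b5 = b
    return b0 + b1 * e + b2 * age + b3 * obs + b4 * age * e + b5 * obs * e


def exposure_odds_ratio(b, age, obs):
    """OR for E=1 vs E=0 at the same AGE and OBS: exp of the logit difference."""
    return exp(logit(b, 1, age, obs) - logit(b, 0, age, obs))


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio exp(beta) with Wald limits exp(beta -/+ z*se)."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


def segment_slopes(x, y):
    """Change in the logit per year of age between consecutive midpoints."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]


if __name__ == "__main__":
    # Hypothetical coefficients (b0..b5), used only to illustrate part 1C.
    b_demo = (-2.0, 0.5, 0.03, 0.4, -0.01, 0.2)
    print(exposure_odds_ratio(b_demo, age=40, obs=1), exp(0.5 + 40 * -0.01 + 0.2))
    for q in (1, 2, 3):
        print("quartile", q + 1, [round(v, 2) for v in odds_ratio_ci(BETA[q], SE[q])])
    print([round(s, 4) for s in segment_slopes(MIDPOINT, BETA)])
```

The two numbers on the first printed line agree, as the part 1C formula says they should. The odds ratios for quartiles 2 to 4 (0.97, 0.60, 0.49) match the Minitab logistic regression table. If the logit were exactly linear in age, the three segment slopes would be roughly equal. Here they are all negative (about -0.003, -0.029, and -0.011 per year), which is consistent with the downward trend noted for 2F.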
#### For Stata v10 users

Notes:

- Comments begin with an asterisk.
- FILE > LOG > BEGIN starts recording a log of your session. Be sure to choose the extension .log so that you can later cut and paste into a Word document.
- The command SET MORE OFF prevents screen-by-screen pausing of results.

```stata
set more off
* Use the drop-down menu FILE > OPEN to read in the data set depress.dta

* SOLUTION for 2A: Obtain the quartiles of AGE
* Use command CENTILE to obtain the 1st, 2nd, and 3rd quartiles
centile age, centile(25 50 75)
```

```
                                                       -- Binom. Interp. --
    Variable |     Obs  Percentile      Centile        [95% Conf. Interval]
-------------+-------------------------------------------------------------
         age |     294        25             28              26         31
             |                50           42.5              38         47
             |                75             59              57         61
```

```stata
* Create a new variable AGEQUART with value equal to quartile:
*   1: age<=28   2: age>28 & age<=42.5   3: age>42.5 & age<=59   4: age>59
generate agequart = 1 if age <= 28
replace agequart = 2 if age > 28 & age <= 42.5
replace agequart = 3 if age > 42.5 & age <= 59
replace agequart = 4 if age > 59 & age < .
label variable agequart "AGEQUART Quartile of Age"
tabulate agequart
```

```
   AGEQUART |
Quartile of |
        Age |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         75       25.51       25.51
          2 |         72       24.49       50.00
          3 |         79       26.87       76.87
          4 |         68       23.13      100.00
------------+-----------------------------------
      Total |        294      100.00
```

```stata
* SOLUTION for 2B: Create three design variables. Produce a frequency table of each.
generate i_age2 = (agequart == 2)
generate i_age3 = (agequart == 3)
generate i_age4 = (agequart == 4)
tabulate i_age2
tabulate i_age3
tabulate i_age4

* SOLUTION for 2C: Fit a logistic regression model using the command LOGIT
* and the 3 design variables just created
logit cases i_age2 i_age3 i_age4

* SOLUTION for 2D: Fill in the table with the midpoint of age and the betas and SEs
* Use command SORT to sort the data by quartile of age
sort agequart
* Use command CENTILE with option centile(50), preceded by BY, to obtain midpoints by agequart
by agequart: centile age, centile(50)
```

```
-> agequart = 1
    Variable |     Obs  Percentile      Centile        [95% Conf. Interval]
         age |      75        50             23              22         24
-> agequart = 2
         age |      72        50             34              33         36
-> agequart = 3
         age |      79        50             51              50         54
-> agequart = 4
         age |      68        50             68        65.38909         71
```

SOLUTION for 2E. The solution to 2E requires saving depress.dta, clearing the workspace, creating a small new data set containing the points that you want to plot, saving it, and then using it.

```stata
* Use the drop-down menu FILE > SAVE AS to save your enhancements to your data
* Before creating a new data set, use the command CLEAR
clear
* Use the drop-down menu DATA > DATA EDITOR to create a new data set;
* when you are done entering data, exit the data editor
* Use the drop-down menu FILE > SAVE AS to save your new data as WEEK8PLOT.dta
save "/Users/carolbigelow/Desktop/week8plot.dta"
* Use the drop-down menu FILE > OPEN to actually use your new data set
use "/Users/carolbigelow/Desktop/week8plot.dta"
* Use command LIST to check creation of the data set
list
```

```
file /Users/carolbigelow/Desktop/week8plot.dta saved

     +-----------------------------+
     | midpoint   betahat   sebeta |
     |-----------------------------|
  1. |       23         0        0 |
  2. |       34    -.0301    .4045 |
  3. |       51    -.5167    .4302 |
  4. |       68      -.71    .4702 |
     +-----------------------------+
```

```stata
* Use command GRAPH TWOWAY CONNECTED, followed by the Y variable first, then the X variable
graph twoway connected betahat midpoint
```

NOTE: Once the basic graph has been created, you can click on the graph editor (the icon that looks like a histogram) and try your hand at adding titles, etc. When you are done, be sure to save your graph by clicking on the icon that looks like a disk. A good choice for the extension is .png.

#### For SAS users

```sas
/* BE640 Intermediate Biostatistics 2009, Week 8: Logistic Regression */
options nocenter nodate;
libname in 'z:\bigelow\teaching\web640\data sets';

/* Use Depression Dataset from Afifi, Clark and May */
data temp(keep=age cases);
  set in.depress;
run;
proc freq data=temp;
  tables cases;
run;

/* Homework Question 2A: Obtain the quartiles of age */
proc univariate data=temp;
  var age;
run;

/* Homework Question 2B: Create (1) a set of three indicator variables for the
   2nd, 3rd, and 4th 25% of values of age and (2) an ordinal variable with
   values 1, 2, 3, 4 for quartile of age */
data temp;
  set temp;
  /* initialize indicators to missing */
  Iage2 = .; Iage3 = .; Iage4 = .;
  /* use logical operators to define 0/1 indicators (age > .Z means age is not missing) */
  if age > .Z then do;
    Iage2 = (28.0 < age le 42.5);
    Iage3 = (42.5 < age le 59.0);
    Iage4 = (59.0 < age);
    /* create ordinal variable with values 1 to 4 */
    agegrp = 1 + 1*Iage2 + 2*Iage3 + 3*Iage4;
  end;
run;

/* CHECK */
proc sort data=temp; by agegrp; run;
proc univariate data=temp; by agegrp; var age; run;
proc freq data=temp; by agegrp; tables Iage2 Iage3 Iage4; run;

/* Homework Question 2B: Frequency distribution of the three indicator variables */
proc freq data=temp;
  tables Iage2 Iage3 Iage4;
run;

/* Homework Question 2C: Logistic regression of outcome CASES on the three design
   variables. Use option DESCENDING so that the event modeled is CASES=1 */
proc logistic data=temp descending;
  model cases = Iage2 Iage3 Iage4;
run;

/* Homework Question 2D: Midpoint of age in each quartile for plotting later
   (note: data are already sorted by agegrp) */
proc means data=temp median;
  by agegrp;
  var age;
run;

/* Data set for plotting */
data temp2;
  input midage beta sebeta;
  upp = beta + 1.96*sebeta;
  low = beta - 1.96*sebeta;
  cards;
23 0 0
34 -0.0301 0.4045
51 -0.5167 0.4302
68 -0.7100 0.4702
;
run;

/* Homework Question 2E: Plot of X = midpoint of age versus
   Y = estimated coefficient from logistic model */
goptions reset=symbol;
symbol1 v=square color=blue i=j;
proc gplot data=temp2;
  plot beta*midage;
  title 'Estimated beta by Midpoint of Age';
run;

/* Homework Question 2E: Similar plot with 95% confidence limits */
goptions reset=symbol;
symbol1 v=square color=red i=j;
symbol2 v=square color=blue i=j;
symbol3 v=square color=red i=j;
proc gplot data=temp2;
  plot upp*midage beta*midage low*midage / overlay;
  title 'Estimated beta by Midpoint of Age';
run;
quit;
```

[Figures: "Estimated beta by Midpoint of Age", without and with 95% confidence limits (week8_01.jpg, week8_02.jpg).]

#### For SPSS users

2A/2B. Transform > Recode into Different Variables, then click "Old and New Values". In the Old and New Values dialog, use the Range options (Range; LOWEST through value; value through HIGHEST) to map the ages in the target quartile to 1, and map All other values to 0. [Dialog screenshots not reproduced.]

Analyze > Descriptive Statistics > Frequencies:

```
Indicator of 3rd quartile of age
               Frequency   Percent   Valid Percent   Cumulative Percent
Valid   .00          215      73.1            73.1                 73.1
        1.00          79      26.9            26.9                100.0
        Total        294     100.0           100.0

Indicator of 4th quartile of age
               Frequency   Percent   Valid Percent   Cumulative Percent
Valid   .00          226      76.9            76.9                 76.9
        1.00          68      23.1            23.1                100.0
        Total        294     100.0           100.0
```

2C. Analyze > Regression > Binary Logistic: Dependent = cases; Covariates = the three indicator variables. [Logistic Regression dialog and "Variables in the Equation" output not reproduced.]

2D/2E. Create a separate data set to hold the data from 2D: File > New > Data, enter the values, and save the file. Then Graph > Scatter/Dot > Simple Scatter > Define, with betahat on the Y axis and midpoint on the X axis. Click OK.

[Figure: scatterplot of betahat (0.000 to -0.800) versus midpoint (20 to 70).]

2F. logit(cases) appears to be linear in age with a negative slope: as age increases (as evidenced by midpoint), betahat decreases.

#### For Minitab users

Worksheet columns: C1 AGE, C2 CASES.

2A. Stat > Basic Statistics > Display Descriptive Statistics. Enter AGE in the Variables box. Click the Statistics button and check First quartile, Median, Third quartile, Minimum, and Maximum. Click OK, then click OK again.

```
Results for: sgsdepress.mtw

Descriptive Statistics: AGE

Variable   Minimum      Q1   Median      Q3   Maximum
AGE          18.00   28.00    42.50   59.00     89.00
```

2B. Create 3 new variables to represent the 2nd, 3rd, and 4th (upper) quartiles, e.g., AgeQ2, AgeQ3, AgeQ4, by placing the names at the top of columns C3, C4, and C5. Then sort the data in ascending order by age: Data > Sort, with Sort column(s) = C1 AGE, C2 CASES, C3 AgeQ2, C4 AgeQ3, C5 AgeQ4; By column = AGE (Descending unchecked); Store sorted data in = Original column(s). Click OK.

The new indicator variables should be set up as follows:

| Quartile | AgeQ2 | AgeQ3 | AgeQ4 | Age range |
|---|---|---|---|---|
| 1st | 0 | 0 | 0 | 18-28 |
| 2nd | 1 | 0 | 0 | 29-42 |
| 3rd | 0 | 1 | 0 | 43-59 |
| 4th | 0 | 0 | 1 | 60-89 |

2C. Model: logit[P(CASES = 1)] = β0 + β1 AgeQ2 + β2 AgeQ3 + β3 AgeQ4. Stat > Regression > Binary Logistic Regression: Response = CASES; Model = AgeQ2 AgeQ3 AgeQ4. Click OK.

```
Binary Logistic Regression: CASES versus AgeQ2, AgeQ3, AgeQ4

Link Function: Logit

Response Information
Variable   Value   Count
CASES      1          50   (Event)
           0         244
           Total     294

Logistic Regression Table
                                                   Odds      95% CI
Predictor        Coef    SE Coef      Z      P    Ratio   Lower   Upper
Constant     -1.30495   0.281867  -4.63  0.000
AgeQ2      -0.0300523   0.404549  -0.07  0.941     0.97    0.44    2.14
AgeQ3       -0.516664   0.430191  -1.20  0.230     0.60    0.26    1.39
AgeQ4       -0.709954   0.470229  -1.51  0.131     0.49    0.20    1.24

Log-Likelihood = -132.234
Test that all slopes are zero: G = 3.656, DF = 3, P-Value = 0.301
```

Remarks:

- CASES = 1 is modeled as the event by default (a good thing for us).
- None of the covariates is significant at the alpha = 0.05 level.
- We fail to reject the global null hypothesis that all slopes = 0, so there is no statistically significant evidence that the quartiles of age differ in their association with the occurrence of depression.

2D. Use the estimates from 2B and 2C.

2E. Create a separate worksheet to hold the data from 2D: File > New > Minitab Worksheet. Place the names of the new variables at the top of columns C1, C2, and C3, then enter the data from 2D by hand. Graph > Scatterplot > With Regression: Y variable = betahat, X variable = midpoint. Click OK.

[Figure: Scatterplot of betahat vs midpoint, with fitted regression line.]

2F. logit(CASES) appears to be linear in age with a negative slope: on average, as age (as evidenced by midpoint) increases, betahat decreases.

## PubHlth 640 Intermediate Biostatistics, Unit 6: Introduction to Survival Analysis, Practice Problems (Optional) (Wk9_practice.doc)

The following hypothetical data describe two groups, smokers and nonsmokers, in a study of survival (days) following a root canal.

| Group | Days (X) | Status at Last Follow-up (C) |
|---|---|---|
| smokers | 4 | alive |
| smokers | 7 | dead |
| smokers | 8 | alive |
| nonsmoker | 29 | alive |
| smokers | 29 | dead |
| smokers | 31 | alive |
| nonsmoker | 40 | dead |
| smokers | 65 | dead |
| nonsmoker | 69 | dead |
| nonsmoker | 78 | alive |
| nonsmoker | 79 | alive |
| nonsmoker | 106 | dead |
| smokers | 107 | alive |
| nonsmoker | 129 | dead |
| smokers | 130 | alive |
| smokers | 140 | alive |
| smokers | 142 | alive |
| smokers | 149 | dead |
| smokers | 158 | alive |
| smokers | 160 | dead |
| nonsmoker | 161 | dead |
| smokers | 162 | alive |
| smokers | 187 | dead |
| smokers | 188 | alive |
| nonsmoker | 197 | dead |
| nonsmoker | 204 | alive |
| nonsmoker | 208 | alive |
| smokers | 221 | dead |
| nonsmoker | 228 | dead |
| nonsmoker | 231 | alive |

1. Use the Kaplan-Meier method to estimate separate survival functions for smokers and nonsmokers. You may use any software you like. However, it may help your understanding to do a HAND calculation of the Kaplan-Meier estimates. This could look like the table on page 23 of your lecture notes for Unit 6, Introduction to Survival Analysis.
2. Try doing BY HAND a log-rank test comparison of the estimated survival curves for smokers and nonsmokers. One way to organize your work is the table on page 30 of your lecture notes for Unit 6, Introduction to Survival Analysis.
3. USING WHATEVER SOFTWARE you like, try performing the log-rank test that you did by hand in problem 2, and show your work.
4. Write an expression for a Cox Proportional Hazards Model that could be explored to investigate the association of survival time following root canal with smoking status. Define all terms.
5. What assumptions must hold in order for this model to be valid?
6. Using whatever software you like, fit the model you stated in problem 4. Report your output and provide annotations that explain the output.
7. Compare the fit of the model you obtained for problem 6 to the results of the log-rank test that you got for problems 2 and 3.
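Problems 1 and 2 ask for hand calculations, and problem 6 asks for a hazard ratio with confidence limits. The following minimal sketch in Python (standard library only; not one of the packages used in the course) reproduces those numbers, so a hand calculation can be checked against it. It assumes, as the PROC LIFETEST listing does, that a subject censored at a death time is still counted as at risk at that time. The helper names are illustrative. The p-value uses the identity that the upper tail of a 1-df chi-square at x equals erfc(sqrt(x/2)).

```python
from math import erfc, exp, sqrt

# (days, died) for each subject; died = 1 for "dead", 0 for "alive" (censored).
SMOKERS = [(4, 0), (7, 1), (8, 0), (29, 1), (31, 0), (65, 1), (107, 0), (130, 0),
           (140, 0), (142, 0), (149, 1), (158, 0), (160, 1), (162, 0), (187, 1),
           (188, 0), (221, 1)]
NONSMOKERS = [(29, 0), (40, 1), (69, 1), (78, 0), (79, 0), (106, 1), (129, 1),
              (161, 1), (197, 1), (204, 0), (208, 0), (228, 1), (231, 0)]


def kaplan_meier(data):
    """Product-limit estimate: [(t, n at risk, deaths, S(t))] at each death time."""
    surv, rows = 1.0, []
    for t in sorted({d for d, died in data if died}):
        n = sum(1 for d, _ in data if d >= t)  # time >= t: still at risk at t
        deaths = sum(1 for d, died in data if d == t and died)
        surv *= (n - deaths) / n
        rows.append((t, n, deaths, surv))
    return rows


def log_rank(group1, group0):
    """Observed and expected deaths in group1, variance, chi-square (1 df), p-value."""
    both = group1 + group0
    observed = expected = variance = 0.0
    for t in sorted({d for d, died in both if died}):
        n1 = sum(1 for d, _ in group1 if d >= t)
        n = sum(1 for d, _ in both if d >= t)
        d1 = sum(1 for d, died in group1 if d == t and died)
        dt = sum(1 for d, died in both if d == t and died)
        observed += d1
        expected += dt * n1 / n
        if n > 1:  # hypergeometric variance n1*n0*d*(N-d) / (N^2 (N-1))
            variance += dt * n1 * (n - n1) * (n - dt) / (n ** 2 * (n - 1))
    chi2 = (observed - expected) ** 2 / variance
    return observed, expected, variance, chi2, erfc(sqrt(chi2 / 2))


def hazard_ratio_ci(beta, se, z=1.96):
    """Cox model hazard ratio exp(beta) with Wald limits exp(beta -/+ z*se)."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


if __name__ == "__main__":
    for label, data in (("nonsmokers", NONSMOKERS), ("smokers", SMOKERS)):
        print(label, [(t, n, d, round(s, 4)) for t, n, d, s in kaplan_meier(data)])
    o, e, v, chi2, p = log_rank(SMOKERS, NONSMOKERS)
    print(f"O={o:.0f} E={e:.5f} V={v:.4f} chi2={chi2:.4f} p={p:.4f}")
    print([round(x, 3) for x in hazard_ratio_ci(0.31746, 0.57308)])
```

Run as a script, it should reproduce the following results from the solutions:

- the survival columns of the PROC LIFETEST listing;
- the hand-calculation totals E = 6.02717 and V = 3.0644, with chi-square = 0.3088 (p = 0.5784);
- the PROC PHREG hazard ratio of 1.374, with limits 0.447 to 4.224.

Counting every subject with time >= t as at risk is what makes the nonsmoker censored at day 29 contribute to the risk set for the smoker death at day 29.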
