# 273 Class Note for STAT 51200 with Professor Jennings at Purdue

These 9-page class notes were uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at Purdue University taught in the fall. Since their upload, they have received 54 views.

Date Created: 02/06/15

## Statistics 512: Marfan Syndrome Case Study

### Background

According to the Medline Plus Medical Encyclopedia, "Marfan syndrome is an inheritable disorder of connective tissue (which adds strength to the body's structures) that affects the skeletal system, cardiovascular system, eyes, and skin. The most obvious effects of Marfan syndrome are skeletal defects typically recognized in a tall, lanky person with long limbs and spider-like fingers, chest abnormalities, curvature of the spine, and a particular set of facial features, including a highly arched palate and crowded teeth." Abraham Lincoln was thought to have Marfan syndrome.

In 1996, the prevalence of Marfan syndrome was estimated to be 1 in 3000 to 5000 people. A common complication of Marfan syndrome is aortic rupture, a result of the patient's aorta growing too large with respect to the rest of the body. Ultimately, the life expectancy of an untreated Marfan patient is around 40 years. With treatment, the life expectancy approaches population norms.

An important tool for diagnosing Marfan syndrome, considered in several studies in the late 1990's, uses the relationship between aortic root diameter (AoD), as measured by an echocardiograph, and approximate body surface area (BSA). The following analysis is based on data compiled by Dr. Paul Pitlick at the Stanford Department of Pediatrics.

### Choosing the Model

An initial plot of the data with a smoother (Figure 1) seems to indicate that while there is some curvature in the data, a linear fit might not be inappropriate:

$$\widehat{\text{AoD}} = 14.259 + 12.629\,\text{BSA}, \qquad \hat{\sigma}^2 = 3.98.$$

The fit is significant ($F = 131.51$, $p < 0.0001$), and both coefficient estimates are significantly different from zero ($p < 0.0001$ in both cases). The predictor explains about 56% ($R^2 = 0.5584$) of the variability in the response. However, the residual plot is perhaps not satisfactorily scattered and centered around the zero line (Figure 2), and the residuals were a bit too right-skewed to be from a normal distribution. (See the histogram and qqplot in Figure 3.) The tests for normality as applied to the residuals all show a highly significant deviation from a normal distribution.

Previous experience had shown that a log-log transformation (taking the log of both the predictor and response) is necessary to produce the best fit. Indeed, the log transformation on the AoD values is suggested by the Box-Cox procedure in SAS, both when the predictor is the original BSA measurement and when the predictor has been log transformed. The fact that the $R^2$ jumps to 0.61 with the log-log transformation also makes this suggestion more attractive.

[Figure 1: Scatterplot of aortic root diameter (AoD) against body surface area (BSA), with smoothing curve.]

The plot of the transformed variables (Figure 4) makes the data look closer to linear. The best fitting line is

$$\log(\widehat{\text{AoD}}) = 3.30 + 0.463\,\log(\text{BSA});$$

the estimated error standard deviation is 0.139, and the $R^2$ value is 0.6101. Both parameter estimates are still significantly different from zero. (Note that a regression through the origin, with this and the untransformed data, would probably not lead to better predictions or inference. Is this intuitively satisfactory?)

At the same time, there is some suggestion that several of the smallest log(BSA) observations may be influencing the goodness of the curve. Indeed, after taking out the three smallest observations, the $R^2$ falls to 0.5133. However, the resulting predictor, $\log(\widehat{\text{AoD}}) = 3.30 + 0.441\,\log(\text{BSA})$ with $\hat{\sigma} = 0.140$, is barely different from the original transformed estimator. So while these points in the lower range of log(BSA) may have caused an overly optimistic assessment of the fit, the model was not directly "influenced" by their presence. The fact that the results of the inference on the parameters do not change ($p < 0.0001$ in every case) further confirms that this model would probably give us the most accurate predictions and most valid inferential results.

An analysis of the residuals (Figure 5) shows that there are no obvious patterns which would indicate deviations from linearity or from the assumption of constant variance. While there is some deviation from normality (see Figure 6), the large sample size ($n = 106$) ensures reasonably valid inference.

[Figure 2: Scatterplot of original model residuals against body surface area.]

### Inference

Inference, based on the model and parameter estimates, should help us find succinct answers to practically important questions. Unfortunately, practical importance can be difficult to divine without further knowledge of the context of the research. For this section, we will try to suggest interesting questions and see how our inferential techniques can be used to provide answers.

#### Prediction

Figure 7 shows 95% confidence intervals and prediction intervals for the linear model based on the transformed data. The practical significance of these bands is not so important, since most clinical studies are interested in the relationship between body surface area and root aortic diameter for diagnosing Marfan syndrome. There is not so much interest in predicting aortic diameter (which is hard to measure) from the more easily calculated approximation of body surface area. There is, however, a substantial interest in making predictions of aortic diameter based on body surface area and previous aortic diameter measurements. The methods for these models are beyond the scope of this class.

Still, several aspects of the confidence interval and prediction interval regions are worth noting. First off, notice that three data points fall outside the prediction intervals. This should not be too surprising, since we would expect $0.05 \times 106 = 5.3$ points to fall outside the prediction intervals. The probability that three or fewer points would fall outside the prediction region (assuming the coverage is exactly 95%) is 0.218. This probability suggests that the coverage of the prediction intervals is close to valid.
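The coverage check in the prediction discussion can be reproduced with a short calculation. The following is a minimal sketch, assuming each of the 106 observations independently falls outside its 95% prediction interval with probability 0.05; the function and variable names are illustrative and not part of the original analysis.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_obs = 106        # sample size reported in the notes
miss_rate = 0.05   # assumed exact miss rate of a 95% prediction interval

expected_misses = n_obs * miss_rate
p_three_or_fewer = prob_at_most(3, n_obs, miss_rate)

print(f"expected misses: {expected_misses:.1f}")          # 5.3
print(f"P(at most 3 misses): {p_three_or_fewer:.3f}")     # 0.218
```

The independence assumption is only approximate here, since all of the intervals are built from the same fitted line, but it is the assumption implied by the binomial calculation in the text.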
[Figure 3: Histogram and qqplot of residuals (original model).]

In addition, we might be interested in how much better our predictions are when we use BSA as a predictor. The $R^2$ of 0.6101 suggests that the error variance is $1 - 0.61 = 0.39$ times the variance of the $Y$'s. Note that this means the ratio of the standard errors is approximately $\sqrt{0.39}$, reported here as 0.628. However, suppose we wish to compare the length of the confidence interval for the simple linear model at some value $X$ to the confidence interval length for the model with no predictor. The null model confidence interval (3.219, 3.304) covers the linear regression confidence interval at $X = \bar{X}$, which is (3.235, 3.289). The ratio of the lengths of these intervals is the 0.628 we would expect. However, at the extreme high end of the predictor distribution, where $\log(\text{BSA}) = 0.5247$, the linear model confidence interval is 2.321 times as large as the null confidence interval. The difference is that the confidence interval in the linear regression case moves with different $X$ values. The price we pay for using a predictor is that the interval widens as we get away from the central $X$ value.

Ultimately, the prediction intervals give us a sensible notion of where the root aortic diameter values should lie for persons with a given body surface area. For instance, a Marfan patient with a BSA of 1 m² (hence a log(BSA) value of 0) would rarely have an aortic root diameter of 15 mm.¹

¹ The prediction of log(AoD) would be 3.2976, with a standard error of $\sqrt{s^2 + (\text{Std Error Mean Predict})^2} = \sqrt{0.0192 + 0.0137^2} = 0.1392$. The hypothesized value of $\log(15)$ is thus 4.234 standard units away from the predicted value. The probability that a value so far from the prediction would appear in these patients is approximately $5 \times 10^{-5}$.

#### Calibration

Of course, we might instead be interested in diagnosing the "sensibility" of an aortic root diameter reading. This would require inverse prediction, or calculating which BSA values would likely correspond with a given AoD measurement. The book suggests an inverse prediction interval for a response observation $Y_h$ of the form

$$\frac{Y_h - b_0}{b_1} \pm t_{n-2,\,1-\alpha/2}\,\frac{s_{\text{pred}}}{b_1},$$

where $s_{\text{pred}}$ is the prediction standard error with $X_h = (Y_h - b_0)/b_1$. For instance, an aortic diameter observation of 30 mm will, with probability 0.95, come from a patient with log(BSA) between $-0.3748$ and $0.8226$, or BSA between 0.6875 and 2.276. Note how this inverse prediction interval covers a large portion of the range of the predictor distribution.

A geometric approximation to this interval can be seen by finding the intersection of the prediction boundaries and the horizontal line at $\log(\text{AoD}) = \log(30)$. The x coordinates of these intersections are approximately where you can find the boundaries of the inverse prediction interval. (Try this with the right-hand graph in Figure 7.)

[Figure 4: Scatterplot of transformed aortic root diameter against transformed body surface area, with smoothing curve.]

#### Comparison with Normal Patients

The primary clinical use for this data is to compare the prediction model for Marfan patients with the same predictions for normal patients. Figure 8 shows the prediction line for Marfan patients with a 95% Working-Hotelling confidence region, along with the expectation line for normal patients:

$$E[\log(\text{AoD})] = 3.0189 + 0.4629\,\log(\text{BSA}).$$

There is an obvious difference between the lines; nearly any null hypothesis test comparing the two lines would be confidently rejected.

#### On Joint Confidence Regions

Of course, the Working-Hotelling band is just one example of a confidence region for the entire line. Using the Bonferroni correction to form it, a jointly 95% confidence region for the Marfan parameters (a rectangle with sides (3.266, 3.329) for $\beta_0$ and (0.3801, 0.5450) for $\beta_1$) does not cover the normal patient parameter values.

[Figure 5: Scatterplot of residuals from final model against log-transformed body surface area.]

If our interest were only in describing patients with a few distinct BSA values, we might get even better results using the Bonferroni-corrected or Scheffé-corrected mean prediction intervals. The Working-Hotelling cutoff value $W$ for this data is 2.483. This cutoff value is achieved (in other words, $B = t_{n-2,\,1-\alpha/(2g)} \ge 2.483$) when $g > 3$ for the Bonferroni correction, or when $g \ge 2$ for the Scheffé correction. This means that when we are forming a confidence region with more than 3 different Bonferroni-corrected confidence intervals or more than one Scheffé-corrected confidence interval, the actual joint coverage level is much greater than 95%. In other words, we would do better (i.e., get more power) using the Working-Hotelling bands, which, although they are designed to provide correct coverage for a joint $100(1-\alpha)\%$ confidence region for any number of different confidence intervals, give smaller (i.e., more powerful) confidence regions than the Bonferroni or Scheffé regions. On the other hand, if for the sake of argument we only ever wished to get 95% coverage jointly on 2 different mean values, we would get a smaller confidence region using the Bonferroni correction on each component interval.

#### Power

Notice that the value of $\beta_1$ in the normal data (0.4629) is well within the 95% confidence interval (0.39064, 0.53446) for $\beta_1$ in the Marfan data. Before concluding that the difference between the slopes is not significant, we should consider the power of a test whose null hypothesis is that $\beta_1 = 0.4629$. Figure 9 shows the value of the power for different alternative values of $\beta_1$. Only a clinical expert could decide what values would constitute practically significant differences, but it seems that many small differences would be detected with high probability in this study. It should not be much of a surprise that the power for detecting the difference between the normal and Marfan slope (a test with very large p-value) is close to 0.05.

[Figure 6: Histogram and qqplot of residuals from the final model.]

[Figure 7: Confidence intervals and prediction intervals for transformed data.]

The upshot of the indistinguishability of the normal and Marfan population slope parameters depends on the ultimate goal of the researcher. If the slope value from the normal population is measured exactly, then one might be inclined to believe that that number should be used for the slope in the Marfan population. However, it's not clear which set of parameter values will give the best prediction in the extreme tails of the data distribution. At the same time, predictions based on the two slope values that are within the range of the data would be very close.

Of course, if the slope number from the normal population is based on data of a similar sample size with the same error variance, then a larger, multivariate model would be needed to address questions of differing slope parameter values.

#### Overlap and Classification

An important aspect of the clinical usefulness of this data relies on its discriminatory power: the practitioner wants to know if these two measurements (BSA and AoD) can be used to discriminate between healthy and Marfan patients. While we don't cover problems of classification in this class (remember, the response follows a binomial (0/1) distribution in that case), we can arrive at some preliminary insights about the best classification rule for these data.

[Figure 8: Working-Hotelling confidence region and the line for normal patients. The dotted expectation line for normal patients does not fall within the 95% confidence region for Marfan patients.]

Given that the relationship between BSA and AoD in both populations, as well as the boundaries of the prediction intervals, are all lines with nearly the same slope, we would expect that the classification rule will also be a line. In other words, there is probably some slope/intercept combination $(\beta_0, \beta_1)$ such that for a patient with a given BSA/AoD reading $(X_h, Y_h)$, if $\log(Y_h) - \beta_0 - \beta_1 \log(X_h) > 0$, we can be reasonably certain that the patient has Marfan syndrome.

As it is, the intercepts between the two models differ by 0.2788 log-mm units. The standard error of prediction is approximately half this value over the entire range of the log(BSA) values. If we assume the slopes are equal, putting a line midway between the lines for the two populations would mean that approximately 16% of patients (the probability of going past one standard deviation in the t distribution) in each population would be misclassified. Is this a practically useful misclassification rate?

[Figure 9: Power curve for detecting deviations of the slope for the Marfan population from the slope for the normal population. Panel title: Power Calculation for Null Hypothesis of No Slope Difference.]

### Conclusion

Think about this example carefully. Are there additional questions that our inferential methods could answer? Are there practically important questions that cannot be answered with our methods so far? Are there further practically important conclusions that we could arrive at using the results of this analysis? Finally, think about how changing different values (e.g., the design, the $\alpha$ level, the sample size) would affect the results of this analysis. What would you postulate would be the effects on the classification rule?
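As a starting point for the last question, the midway classification rule from the overlap discussion can be sketched numerically. This is a rough illustration only: it assumes equal slopes, uses a standard normal tail in place of the t distribution with 104 degrees of freedom (the two are very close at this sample size), and takes the prediction standard error to be exactly half the intercept gap, as the notes approximate. The function names are hypothetical.

```python
from math import erf, sqrt


def normal_upper_tail(z: float) -> float:
    """Return P(Z > z) for a standard normal Z."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


def midline_misclassification(intercept_gap: float, s_pred: float) -> float:
    """Share of one population falling on the wrong side of a line
    placed midway between two parallel regression lines."""
    z = (intercept_gap / 2.0) / s_pred  # distance to the midline in SE units
    return normal_upper_tail(z)


gap = 0.2788          # intercept difference in log-mm units, from the notes
s_pred = gap / 2.0    # assumption: prediction SE is exactly half the gap

rate = midline_misclassification(gap, s_pred)
print(f"misclassification rate per population: {rate:.3f}")  # 0.159

# A wider gap (or a smaller error variance) lowers the rate.
print(f"rate if the gap doubled: {midline_misclassification(2 * gap, s_pred):.3f}")
```

Varying `gap` and `s_pred` gives a quick feel for how the error variance and the separation between the two populations drive the misclassification rate.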
