This 114 page Class Notes was uploaded by Orval Funk on Monday September 28, 2015. The Class Notes belongs to STAT102 at University of Pennsylvania taught by Staff in Fall. Since its upload, it has received 53 views. For similar materials see /class/215434/stat102-university-of-pennsylvania in Statistics at University of Pennsylvania.
Date Created: 09/28/15
Statistics 102, Categorical Predictors, Spring 2000

Categorical Variables: Part 1

Project Analysis for Today

First multiple regression
- Add predictors to the initial model (with outliers held out) and interpret the coefficients in the multiple regression.
- Some of these new predictors (e.g., location) are categorical and require the methods of today's class.

Review: Collinearity in Multiple Regression

What is collinearity? (Also known as multicollinearity.)
- Collinearity is correlation among the predictors in a regression.
- As such, collinearity does not violate an assumption in regression and is in fact a typical feature of most regression models.

What does collinearity do in regression? (Consequences)
- Complicates interpretation, making it hard to separate the predictors.
- Inflates the SEs of the estimated coefficients.

How can I tell if collinearity is present?
- Graphically: scatterplots help, but leverage plots are better. Leverage plots are multiple "simple regression" views of one multiple regression and are essential for identifying leverage points in multiple regression. Ask: "Do I like the shown simple regression model?"
- Tests: big F-ratio, small t-ratios.
- Diagnostic: variance inflation factors (VIF).

$$\mathrm{SE}(\text{slope estimate for } X_j) \approx \frac{\sigma}{\sqrt{n}}\,\frac{1}{\mathrm{SD}(\text{adjusted } X_j)} = \frac{\sigma}{\sqrt{n}}\,\frac{\sqrt{\mathrm{VIF}_j}}{\mathrm{SD}(X_j)} = \sqrt{\mathrm{VIF}_j}\times(\text{SE if no collinearity})$$

What do I do about collinearity?
- Nothing. Collinearity complicates our ability to interpret, but in-sample prediction remains OK in the presence of collinearity.
- Reformulate predictors. Identify distinct concepts.
- Get rid of one of the offenders. Diagnostics (VIF) help you decide which.
- Summary discussion on page 147 of the casebook.

Example of Multiple Regression

Automobile design (Car89.jmp, page 109)
- What is the predicted mileage for a 4000 lb design, and what characteristics of the design are crucial?
- How much does my 200-pound brother owe me for gas for carrying him 3000 miles to California? (Oops, it's urban mileage in the example.)

Initial one-predictor model
- Transform response to the gallons per 1000 miles scale.
- 200 lbs for 3000 miles ≈ 8.2 gal.
- RMSE = 4.23 (p. 111).
- Skewness in residuals from regression with Weight (p. 112).
- Predicted consumption at 4000 lbs: 63.9 (using JMP).

Add variable for Horsepower (p. 117)
- R² increases from 77% to 84%; added variable is significant (t = 7.21).
- Predictors are correlated: higher SE for Weight (plot on p. 120).
- 200 lbs for 3000 miles ≈ 5.3 gal.
- RMSE drops to 3.50.
- Residuals evidently more normally distributed.
- Predicted consumption at 4000 lbs and 200 HP: 65.0, with interval [57.9, 72.1].

Next steps for this model
- What other factors are important for the design?
- How small can we make the RMSE?

Example with Extreme Collinearity in Multiple Regression

Stock prices and market indices (Stocks.jmp, page 138)
- What's beta for Walmart when regressed on returns of two indices?
- Initial correlations and scatterplot matrix show outliers and high collinearity between the two market indices.
- Addition of sequence number to the plots also shows time series patterns.
- The fitted slope of stock returns on the market estimates the beta for the stock.
- Initial beta estimate in the regression of Walmart on the S&P alone is 1.24 with SE 0.12. Is it significantly larger than one?
- Add VW market returns to the regression.
- Huge collinearity: the correlation between VW and S&P is 0.993, so there is almost no unique variation in either one given that the other is in the model.
- Either taken separately is a good predictor, but they show weak effects (i.e., not significant) when used together.
- Squished leverage plots: little unique variation in either predictor is available to explain the variation in the response (p. 144).
- The more complete VW index is the better predictor, as financial theory suggests, and we should use it alone to estimate the beta for Walmart.

Term        Estimate   Std Error   t Ratio   Prob>|t|
Intercept   0.020      0.006       3.22      0.0016
VW          1.239      0.118       10.49     <.0001

New Ideas and Terminology for Today

Categorical variable
- Represents group membership, e.g., type of car, race, sex, religion.
- JMP denotes these as "nominal" or "ordinal" in the column header.
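Before a categorical column can enter a regression, its group labels have to become numbers. The sketch below is a minimal illustration under two loudly stated assumptions: the five ratings are hypothetical (they are not the Manager.jmp data), and the coding is +1 for one group and -1 for the other, which is the coding that would explain the factor of 2 these notes use when turning a coefficient into a group difference. With such a column as the only predictor, the intercept is the average of the two group means and the coefficient is half the difference between them.

```python
import numpy as np

# Hypothetical ratings for two groups (NOT the Manager.jmp data).
origin = np.array(["External", "External", "External", "Internal", "Internal"])
rating = np.array([6.0, 7.0, 6.5, 5.0, 5.5])

# Assumed coding: +1 for External, -1 for Internal.
code = np.where(origin == "External", 1.0, -1.0)

# Least squares fit of rating on an intercept and the coded column.
X = np.column_stack([np.ones_like(code), code])
(b0, b1), *_ = np.linalg.lstsq(X, rating, rcond=None)

mean_ext = rating[origin == "External"].mean()
mean_int = rating[origin == "Internal"].mean()

print("intercept:", b0)
print("average of the two group means:", (mean_ext + mean_int) / 2)
print("coefficient on coded column:", b1)
print("difference in group means (External - Internal):", mean_ext - mean_int)
```

Under this coding the difference between the groups is twice the reported coefficient, so a coefficient has to be doubled before it is read as "External minus Internal".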
- JMP does a lot of work in the background when these terms are added to a regression, building special variables to represent the groups, adding these variables in the background, and showing you the resulting regression coefficients.
- Same concept as seen in anova.

Interaction
- Implies that the effect of one predictor on the response depends on the value of other predictors.
- Measures how the slope of one predictor depends upon the levels of others.
- Important in many models, crucial in models with categorical predictors.
- Interaction with a categorical predictor means the slope depends upon the group.

Important questions to answer when using categorical variables
- Are the fits in the different models parallel, i.e., is interaction present?
- Are the error variances comparable? Heteroscedasticity can be a problem.

Messy part: interpreting the output
- Take your time. Write down the fit for each group, one at a time, until the output is familiar.
- Be careful to read JMP output correctly. The term for one group will not be explicitly shown.

Analysis of covariance
- A regression model that contains both categorical and continuous predictors, usually with a focus on the difference among the groups.

Categorical Predictors in Multiple Regression: Two Groups

Employee performance study (Manager.jmp, page 161)

Questions
- Do the data support the claim that externally recruited managers do better?
- Which of two prospective job candidates should we hire, the internal or the externally recruited manager?

Data
- 150 managers, 88 of whom are internal and 62 external.

Analysis
- Initial comparison of the two groups: average performance rating for internal managers is significantly lower than that for external managers (difference 0.72 with t = 2.98, page 161).
- An aside: regression with a two-group categorical variable alone is the same as a two-sample t-test between the two groups.
- Confounding issue: Salary is much higher for externally recruited managers. They occupy higher level positions within the company, and Salary is related to rating (pp. 164-165).
- Separate regressions of Rating on Salary for the in-house and external groups suggest a reversed difference in means: at fixed salary, internal managers are more highly rated (pp. 166-67).
- Combined as one multiple regression: slopes are parallel (i.e., no significant interaction), and the model (page 168) implies that internal managers actually rate significantly higher (difference 0.514 = 2 × 0.257, with t = 2.46).

Conclude
- After checking assumptions, conclude that we ought to hire the internal candidate, since at a given salary we expect the internal manager to fare better.

JMP tricks
- Point codes/labels using the values of a categorical variable.
- Fitting several models in one Fit Y by X view.

Next Time

Review session
- This Friday, reviewing material using multiple regression, with emphasis on how these ideas are relevant to the project.

Categorical predictors, continued
- Categorical predictors interact with the other predictors in the model. We'll do another example with these to help with the JMP notation.

Correlations

Variable          WALMART   SP500   VW      Sequence Number
WALMART           1.000     0.682   0.696   0.055
SP500             0.682     1.000   0.993   0.002
VW                0.696     0.993   1.000   0.036
Sequence Number   0.055     0.002   0.036   1.000

[Figure: scatterplot matrix of WALMART, SP500, VW and Sequence Number]

Initial regression fit: Parameter Estimates

Term        Estimate   Std Error   t Ratio   Prob>|t|   VIF
Intercept   0.024      0.006       4.02      0.0001     0
SP500       1.244      0.123       10.10     <.0001     1

Fit using both market indices: Parameter Estimates

Term        Estimate   Std Error   t Ratio   Prob>|t|   VIF
Intercept   0.015      0.007       2.13      0.0356     0.000
SP500       -1.258     1.041       -1.21     0.2294     74.297
VW          2.458      1.016       2.42      0.0171     74.297

[Figures: leverage plots of WALMART on SP500 and of WALMART on VW]

[Figure: comparison plot of Mngr Rating by Origin (External, Internal)]

Means and Std Deviations

Level      Number   Mean    Std Dev   Std Err Mean
External   62       6.321   1.342     0.17044
Internal   88       5.605   1.518     0.16181

t-Test (assuming equal variances)

Difference   Estimate   Std Error   Lower 95%   Upper 95%   t-Test   DF    Prob>|t|
             0.716      0.240       0.242       1.191       2.984    148   0.0033

[Figure: scatterplot of Mngr Rating on Salary, with separate fits for each group]

Internal

Term        Estimate     Std Error   t Ratio   Prob>|t|
Intercept   -1.693524    0.949254    -1.78     0.0779
Salary      0.1090929    0.014066    7.76      <.0001

External

Term        Estimate     Std Error   t Ratio   Prob>|t|
Intercept   -1.936941    0.986231    -1.96     0.0542
Salary      0.1053912    0.012499    8.43      <.0001

Fit as one multiple regression that combines these two: Parameter Estimates

Term                              Estimate   Std Error   t Ratio   Prob>|t|
Intercept                         -1.815     0.724       -2.51     0.0132
Salary                            0.107      0.010       10.99     <.0001
Origin[External-Internal]         -0.122     0.724       -0.17     0.8667
Salary*Origin[External-Internal]  -0.002     0.010       -0.19     0.8499

Stat 102, Spring 2007

Chapter 3.1, 3.2
- Introduction to regression analysis
- Linear regression as a descriptive technique
- The least-squares equations

Chapter 3.3
- Sampling distribution of b0, b1 (continued in next lecture)

Regression Analysis
- Galton's classic data on heights of parents and their child (952 pairs).
- Describes the relationship between the child's height (y) and the parents' height (x).
- Predict the child's height given the parents' height.

[Figure: data listing and scatterplot of child ht on parent ht]

Uses of Regression Analysis
- Description: describe the relationship between a dependent variable y (child's height) and explanatory variables x (parents' height).
- Prediction: predict the dependent variable y based on the explanatory variables x.

Model for Simple Regression
- Consider a population of units on which the variables (y, x) are recorded.
- Let $\mu_{y|x}$ denote the conditional mean of y given x.
- The goal of regression analysis is to estimate $\mu_{y|x}$.

Simple linear regression model:
$$\mu_{y|x} = \beta_0 + \beta_1 x$$

Simple Linear Regression Model

Model (more details later):

$$y = \beta_0 + \beta_1 x + e$$

- y = dependent variable
- x = independent variable
- $\beta_0$ = y-intercept
- $\beta_1$ = slope of the line (rise/run)
- e = error, normally distributed
- $\beta_0$ and $\beta_1$ are unknown population parameters and therefore are estimated from the data.

Interpreting the Coefficients
- The slope $\beta_1$ is the change in the mean of y that is associated with a one-unit change in x. For example, for each extra inch for the parents, the average height of the child increases by 0.6 inch.

[Figure: scatterplot with fitted line, child ht = 26.46 + 0.6 parent ht]

- The intercept is the estimated mean of y for x = 0. However, this interpretation should only be applied when the data contain observations with x near 0. Otherwise it is an extrapolation of the model, which can be unreliable (Section 3.7.2).

Estimating the Coefficients
- The estimates are determined from observations $(x_1, y_1), \ldots, (x_n, y_n)$ by calculating sample statistics.
- They correspond to a straight line that cuts into the data.
- Question: what should be considered a good line?

Least Squares Regression Line
- What is a good estimate of the line? A good estimated line should predict y well based on x.
- Least absolute value regression line: the line that minimizes the absolute values of the prediction errors in the sample. Good criterion, but hard to compute.
- Least squares regression line: the line that minimizes the squared prediction errors in the sample. Good criterion and easy to compute.

The Least Squares Regression Line

Let us compare two lines for four data points. The second line is horizontal.

First line: sum of squared differences = $(2-1)^2 + (4-2)^2 + (1.5-3)^2 + (3.2-4)^2 = 7.89$

Second line: sum of squared differences = $(2-2.5)^2 + (4-2.5)^2 + (1.5-2.5)^2 + (3.2-2.5)^2 = 3.99$

[Figure: the four points and the two candidate lines]

The smaller the sum of squared differences, the better the fit of the line to the data.

The Estimated Coefficients

Example: Heights (cont.)
- For simple linear regression analysis in JMP: click Analyze, Fit Y by X, then put child ht in Y and parent ht in X and click OK. Then click the red triangle next to Bivariate Fit and click Fit Line. Some commands we will use later can now be found in the red triangle next to Linear Fit.

Example: Heights (cont.)
- Based on our observations, use the summary statistics for parent hts and child hts to find b1 and b0.

                 Child hts   Parent hts
Mean             68.20       68.27
Std Dev          2.60        1.79
Std Err Mean     0.084       0.0580
upper 95% Mean   68.37       68.38
lower 95% Mean   68.04       68.15
N                952         952

- For the regression line, from JMP, $b_1 = 0.61$, and $b_0 = \bar{y} - b_1\bar{x} = 68.20 - 0.61 \times 68.27 = 26.55$.
- The LS equation is $\hat{y} = 26.55 + 0.61x$.

JMP Output

[Figure: Bivariate Fit of child ht By parent ht]

Linear Fit: child ht = 26.456 + 0.612 parent ht

JMP Output (cont.)

Note the values of b0, b1 in the parameter estimates table. The other output entries will be explained later.

Summary of Fit
RSquare                  0.177
RSquare Adj              0.176
Root Mean Square Error   2.357
Mean of Response         68.202
Observations             952

Analysis of Variance
Source    DF    Sum of Squares   Mean Square   F Ratio
Model     1     1136.50          1136.50       204.59
Error     950   5277.28          5.56          Prob > F
C Total   951   6413.78                        <.0001

Parameter Estimates
Term        Estimate   Std Error   t Ratio   Prob>|t|
Intercept   26.456     2.920       9.06      <.0001
parent ht   0.612      0.043       14.30     <.0001

Ordinary Linear Model Assumptions

Properties of errors under the ideal model:
- $\mu_{y|x} = \beta_0 + \beta_1 x$ for all x.
- $y_i = \beta_0 + \beta_1 x_i + e_i$ for all $x_i$.
- The distribution of $e_i \mid x_i$ is normal.
- $e_1, \ldots, e_n$ are independent.
- $E(e_i \mid x_i) = 0$ and $\mathrm{Var}(e_i \mid x_i) = \sigma^2$.

Equivalent definition: for each $x_i$, $y_i$ has a normal distribution with mean $\beta_0 + \beta_1 x_i$ and variance $\sigma^2$. Also, $y_1, \ldots, y_n$ are independent.

Sampling Distribution of b0, b1
- The sampling distribution of $b_0, b_1$ is the probability distribution of the estimates over repeated samples $y_1, \ldots, y_n$ from the ideal linear regression model with fixed values of $\beta_0$, $\beta_1$ and $\sigma^2$ and of $x_1, \ldots, x_n$.
- "Standardregression.jmp" contains a simulation of pairs $(x_1, y_1), \ldots, (x_n, y_n)$ from a simple linear regression model with fixed $\beta_0$, $\beta_1$ and $\sigma$. It contains another simulation, labeled $(x_1, y^*_1), \ldots, (x_n, y^*_n)$, from the same model.
- Notice the difference in the estimated coefficients calculated from the y's and from the y*'s.

[Figures: Bivariate Fit of y By x and Bivariate Fit of y* By x]

Linear Fit: y = 1.200 + 1.521x        Linear Fit: y* = 1.851 + 0.512x

Two outcomes from standardregression.jmp. The values of $x_1, \ldots, x_{20}$ are the same in both data sets. Each data set comes from the model with $\beta_0 = 1$, $\beta_1 = 2$, $\sigma = 1$.

Sampling Distribution Details
- $b_0$ and $b_1$ have easily described normal distributions.
- The sampling distribution of $b_0$ is normal with $E(b_0) = \beta_0$ (hence the estimate is unbiased) and

$$\mathrm{Var}(b_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{(n-1)s_x^2}\right), \qquad s_x^2 = \frac{1}{n-1}\sum_i (x_i - \bar{x})^2 .$$

- The sampling distribution of $b_1$ is normal with $E(b_1) = \beta_1$ (hence the estimate is unbiased) and

$$\mathrm{Var}(b_1) = \frac{\sigma^2}{(n-1)s_x^2} .$$

Typical Regression Analysis
1. Observe pairs of data $(x_1, y_1), \ldots, (x_n, y_n)$ that are a sample from the population of interest.
2. Plot the data.
3. Assume the simple linear regression model assumptions hold.
4. Estimate the true regression line $\mu_{y|x} = \beta_0 + \beta_1 x$ by the least squares line $\hat{y} = b_0 + b_1 x$.
5. Check whether the assumptions of the ideal model are reasonable (Chapter 6 and next lecture).
6. Make inferences concerning the coefficients $\beta_0, \beta_1$ and make predictions $\hat{y} = b_0 + b_1 x$.

Notes

Formulas for the least squares equations

1. The equations for $b_0$ and $b_1$ are easy to derive. Here is a derivation that involves a little bit of calculus. It is desired to minimize the sum of squared errors. Symbolically, this is

$$\mathrm{SSE}(b_0, b_1) = \sum_i (y_i - b_0 - b_1 x_i)^2 .$$

The minimum occurs when $0 = \partial\,\mathrm{SSE}/\partial b_0$ and $0 = \partial\,\mathrm{SSE}/\partial b_1$. Hence we need

$$0 = \frac{\partial}{\partial b_0}\mathrm{SSE}(b_0, b_1) = -2\sum_i (y_i - b_0 - b_1 x_i)$$

and

$$0 = \frac{\partial}{\partial b_1}\mathrm{SSE}(b_0, b_1) = -2\sum_i x_i\,(y_i - b_0 - b_1 x_i).$$

These are two linear equations in the two unknowns $b_0$ and $b_1$. Some algebraic manipulation shows that the solution can be written in the desired form

$$b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \quad\text{and}\quad b_0 = \bar{y} - b_1\bar{x}.$$

2. A NICE FACT that's sometimes useful:
(a) The least squares line passes through the point $(\bar{x}, \bar{y})$. To see this, note that if $x = \bar{x}$ then the corresponding point on the least squares line is $\hat{y} = b_0 + b_1\bar{x}$. Substituting the definition of $b_0$ yields $\hat{y} = \bar{y} - b_1\bar{x} + b_1\bar{x} = \bar{y}$, as claimed.
(b) The equation for the least squares line can be rewritten in the form $\hat{y} - \bar{y} = b_1(x - \bar{x})$.

3. There are other useful ways to write the equations for $b_0$ and $b_1$. Recall that the sample covariance is defined as

$$\mathrm{Cov}(x, y) = \frac{1}{n-1}\sum_i (x_i - \bar{x})(y_i - \bar{y}) = s_{xy}.$$
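The least squares formulas can be checked numerically. The sketch below is only an illustration: the four points are assumed (x-values 1 to 4 with y-values 2, 4, 1.5, 3.2, chosen to resemble the small two-line comparison earlier in the lecture), and the helper name `sse` is hypothetical. It computes the slope as the sample covariance divided by the sample variance of x, then confirms that the fitted line passes through the point of means and has a smaller sum of squared errors than either candidate line.

```python
# Four illustrative points; the x-values 1..4 are an assumption.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 1.5, 3.2]
n = len(xs)


def sse(b0, b1):
    """Sum of squared errors of the line y = b0 + b1 * x on the data."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sample covariance and sample variance of x (both with divisor n - 1).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)

# Least squares estimates.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print("b0, b1:", b0, b1)
print("SSE of least squares line:", sse(b0, b1))
print("SSE of the line y = x:", sse(0.0, 1.0))
print("SSE of the horizontal line y = 2.5:", sse(2.5, 0.0))
print("fitted value at x_bar:", b0 + b1 * x_bar, "y_bar:", y_bar)
```

Because the divisor n - 1 appears in both the covariance and the variance, it cancels in the ratio, so the slope agrees with the sum-of-products form of the formula.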
Similarly, the sample correlation coefficient is $r = s_{xy}/(s_x s_y)$, where $s_x^2$ is defined on overhead 18 and $s_y^2$ is defined similarly. Thus

$$b_1 = \frac{s_{xy}}{s_x^2} = r\,\frac{s_y}{s_x}.$$

History of Galton's Data

4. Francis Galton gathered data about heights of parents and their children, and published the analysis in 1886 in a paper entitled "Regression towards mediocrity [sic] in hereditary stature". In the process he coined the term "Regression" to describe the straight line that summarizes the type of relational data that may appear in a scatterplot. He did not use our current least-squares technique for finding this line; instead he used a clever analysis whose final step is to fit the line by eye. He estimated the slope of the regression line as 2/3. Further work in the next decades by Galton and by K. Pearson, Gosset (writing as "A Student") and others connected Galton's analysis to the least squares technique earlier invented by Gauss (1809), and also derived the relevant sampling distributions needed to create a statistical regression analysis.

5. The data we use for our analysis is packaged with the JMP program disk. It is not exactly Galton's original data. We believe it is a version of the data set prepared by S. Stigler (1986) as a minor modification of Galton's data. In order for the data to plot nicely, Stigler "jittered" the data. He also included some data that Galton did not. The data listed as "Parent height" in this data set is actually the average of both parents' heights, after adjusting the mothers' heights as discussed in the next note.

6. Galton did not know how to separately treat men's and women's heights in order to produce the kind of results he wanted to look at. So, after looking at the structure of the data, he multiplied all female heights by 1.08. This puts all the heights on very nearly the same scale and allowed him to treat men's and women's heights together without regard to sex. Instead of doing this, Galton could have divided the men's heights by 1.08, or he could have achieved a similar effect by dividing the male heights by 1.04 and multiplying the female ones by 1.04. Why didn't he use one of these other schemes?

7. Galton did not use modern random-sampling methods to obtain his data. Instead, he obtained his data through the offer of prizes for the "best extracts from their own family records" obtained from individual family correspondents. He summarized the data in a journal that is now in the Library of the University College of London. Here is what the first half of p. 4 looks like. According to Galton's notations, one should add 60 inches to every entry in the table.

[Figure: half of p. 4 of Galton's journal; note the approximate heights for some records and the entries "tall" and "deformed"]

Stigler, S. (1986). "The English breakthrough: Galton," in The History of Statistics: The Measurement of Uncertainty before 1900. Harvard Univ. Press.

Statistics 102, Multiple Regression, Spring 2000

Multiple Regression

Project Analysis for Today

First steps
- Transforming the data into a form that lets you estimate the fixed and variable costs of a lease using a regression model that meets the three key assumptions.

Review of Multiple Regression from Last Week

Objective
- Isolate the key factors that influence the response and separate their effects.

Model

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \text{Error}$$

$$\text{Sales} = \beta_0 + \beta_1\,\text{Adv} + \beta_2\,\text{Price} + \text{Error}$$

with
- Independence
- Constant variance $\sigma^2$ about the regression line
- Normally distributed errors about the regression line

Discussion
- Model is additive.
- Geometry of multiple regression.
- Slopes measure the effect of each predictor, holding the others fixed. Simple regression slope vs. multiple regression slope.
- Relationship between R² and RMSE: both describe goodness of fit. R² is relative, whereas RMSE is absolute. They are related as follows:

$$\mathrm{RMSE}^2 = \mathrm{Var}(\text{residuals}) \approx (1 - R^2)\,\mathrm{Var}(\text{response})$$

- Same interpretation in simple (one predictor) and multiple regression.

Inference in Multiple Regression

Inference in multiple regression
- One coefficient: t-ratio = estimate / SE. Is this slope different from zero? Does this variable significantly improve a model containing the rest?
- All coefficients: overall F-ratio (anova table). Does this entire model explain significant amounts of variation?

Analysis of variance (ANOVA) summary (page 141)
- Summary of how much variation is being explained per predictor.
- Example for the car data with weight and horsepower as predictors:

Source    DF    Sum of Squares   Mean Square   F Ratio
Model     2     7062.5945        3531.30       288.3143
Error     109   1335.0408        12.25         Prob>F
C Total   111   8397.6353                      <.0001

Why do we need different tests?
- Each addresses a specific aspect of the fitted model: the t-ratio considers one coefficient (intercept or slope); the F-ratio considers all slopes simultaneously.
- Why not just do a bunch of t-tests, one for each slope? With 20 predictors and 95% CIs, you can expect one to be significant (not zero) by chance alone. Too many things will appear significant that really are not meaningful.
- Recall the use of multiple comparisons in anova.

Collinearity in Multiple Regression

What is collinearity? (Also known as multicollinearity.)
- Collinearity is correlation among the predictors in a regression.
- As such, collinearity does not violate an assumption in regression.

What does collinearity do in regression? (Consequences)
- Complicates interpretation, making it hard to separate the predictors.
- Inflates the SEs of the estimated coefficients:

$$\mathrm{SE}(\text{slope estimate for } X_j) \approx \frac{\sigma}{\sqrt{n}}\,\frac{1}{\mathrm{SD}(\text{adjusted } X_j)} = \frac{\sigma}{\sqrt{n}}\,\frac{\sqrt{\mathrm{VIF}_j}}{\mathrm{SD}(X_j)} = \sqrt{\mathrm{VIF}_j}\times(\text{SE if no collinearity})$$

How can I tell if collinearity is present?
- Graphically: scatterplots help, but leverage plots are better. Leverage plots are multiple "simple regression" views of one multiple regression and are essential for identifying leverage points in multiple regression. Ask: "Do I like the shown simple regression model?"
- Tests: big F-ratio, small t-ratios.
- Diagnostic: variance inflation factors (VIF).

What do I do about collinearity?
- Nothing. Collinearity weakens the ability to interpret, but in-sample prediction works well (or at least is not injured).
- Reformulate predictors. Identify distinct concepts.
- Get rid of one of the offenders. Stats help you decide which one.
- Summary discussion on page 147 of the casebook.

Example of Multiple Regression

Automobile design (Car89.jmp, page 109)
- What is the predicted mileage for a 4000 lb design, and what characteristics of the design are crucial?
- How much does my 200-pound brother owe me for gas for carrying him 3000 miles to California? (Oops, it's urban mileage in the example.)

Initial one-predictor model
- Transform response to the gallons per 1000 miles scale. Cannot compare the R²'s, since the two models use different dependent variables (MPG and GPM). Effect of scaling from GPM to GP1000M.
- RMSE = 4.23 (p. 111).
- Skewness in residuals from regression with Weight (p. 112).
- Prediction at 4000 lbs: 63.9.
- 200 lbs for 3000 miles ≈ 8.2 gal.

Add variable for Horsepower (p. 117)
- R² increases from 77% to 84%; added variable is significant (t = 7.21); RMSE drops to 3.50.
- Predictors are related (both increase together): higher SE for Weight. Picture explains the increase in SE due to restricted range (p. 120).
- 200 lbs for 3000 miles ≈ 5.3 gal.
- Prediction from multiple regression.

Add a predictor less correlated with Weight: use HP/Pound (p. 123)
- Weight and HP/Pound are less related, more distinct properties of these cars.
- An engineer can manipulate these separately, unlike HP and weight.

Residual plots
- Show residuals plotted on fitted values.
- Inspect for deviations from assumptions, such as lack of constant variance.

Leverage plots (p. 125)
- Diagnostic plot designed especially for multiple regression.
- Reveals leveraged observations in multiple regression.

Next steps for this model
- What other factors are important for the design?
- How small can we make the RMSE?

Example with Extreme Collinearity in Multiple Regression

Stock prices and market indices (Stocks.jmp, page 138)
- What's the beta for Walmart when regressed on two indices?
- The fitted slope of stock returns on the market estimates the beta for the stock.
- Huge collinearity: the correlation between VW and S&P is 0.993, so there is almost no unique variation in either one given that the other is in the model.
- Either taken separately is a good predictor, but they show weak effects when used together.
- Squished leverage plots: little unique variation in either predictor is available to explain the variation in the response (p. 144).
- The more complete VW index is the better predictor, as financial theory suggests.

Next Time

Categorical predictors
- Categorical predictors allow us to compare regression models for different groups, judging if the models for the different groups are comparable.

Response: GP1000M City

Summary of Fit
RSquare                      0.765
RSquare Adj                  0.763
Root Mean Square Error       4.233
Mean of Response             47.595
Observations (or Sum Wgts)   112

Parameter Estimates
Term         Estimate   Std Error   t Ratio   Prob>|t|
Intercept    9.4323     2.0545      4.59      <.0001
Weight(lb)   0.0136     0.0007      18.94     <.0001

Analysis of Variance
Source    DF    Sum of Squares   Mean Square   F Ratio
Model     1     6426.44          6426.44       358.6195
Error     110   1971.19          17.92         Prob>F
C Total   111   8397.64                        <.0001

Response: GP1000M City

Summary of Fit
RSquare                      0.841
RSquare Adj                  0.838
Root Mean Square Error       3.500
Mean of Response             47.595
Observations (or Sum Wgts)   112

Parameter Estimates
Term         Estimate   Std Error   t Ratio   Prob>|t|
Intercept    11.6843    1.7270      6.77      <.0001
Weight(lb)   0.0089     0.0009      10.11     <.0001
Horsepower   0.0884     0.0123      7.21      <.0001

Analysis of Variance
Source    DF    Sum of Squares   Mean Square   F Ratio
Model     2     7062.59          3531.30       288.3143
Error     109   1335.04          12.25         Prob>F
C Total   111   8397.64                        <.0001

[Figure: scatterplot with Weight(lb) on the horizontal axis, 1500 to 4000]

$$\mathrm{SE}(\text{slope estimate for } X_j) \approx \frac{\sigma}{\sqrt{n}}\,\frac{1}{\mathrm{SD}(\text{adjusted } X_j)} = \frac{\sigma}{\sqrt{n}}\,\frac{\sqrt{\mathrm{VIF}_j}}{\mathrm{SD}(X_j)} = \sqrt{\mathrm{VIF}_j}\times(\text{SE if no collinearity})$$

Term         Estimate   Std Error   t Ratio   Prob>|t|   VIF
Intercept    11.6843    1.72704     6.77      <.0001     0.000
Weight(lb)   0.0089     0.00088     10.11     <.0001     2.202
Horsepower   0.0884     0.01226     7.21      <.0001     2.202
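The role of the VIF in the SE formula can be made concrete with a few lines of arithmetic. This is a hedged sketch, not JMP's computation: it assumes the standard definition VIF = 1/(1 - R²), which for a model with exactly two predictors reduces to 1/(1 - r²), where r is their correlation; the function names are hypothetical. It uses two numbers that appear in these notes, the car-data VIF of 2.202 and the rounded correlation 0.993 between the two market indices.

```python
import math


def vif_two_predictors(r):
    """VIF for a model with exactly two predictors whose correlation is r.

    Assumes the standard definition VIF = 1 / (1 - R^2); with two predictors
    R^2 is just the squared correlation between them.
    """
    return 1.0 / (1.0 - r * r)


def se_inflation(vif):
    """Factor by which collinearity multiplies the SE of a slope."""
    return math.sqrt(vif)


def implied_correlation(vif):
    """Size of the correlation between two predictors implied by their VIF."""
    return math.sqrt(1.0 - 1.0 / vif)


# Car data: Weight and Horsepower both show VIF = 2.202.
print("car SE inflation:", se_inflation(2.202))
print("implied |corr(Weight, Horsepower)|:", implied_correlation(2.202))

# Stock data: the two market indices have correlation 0.993 (rounded).
vif_stocks = vif_two_predictors(0.993)
print("stock VIF from rounded correlation:", vif_stocks)
print("stock SE inflation:", se_inflation(vif_stocks))
```

The stock VIF computed this way comes out a little below the value JMP reports (about 74) because the correlation is rounded to three decimals; near r = 1 the VIF is very sensitive to the last digit.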
[Figure: residuals plotted on GP1000M City Predicted]

Correlations

Variable          VW      SP500   WALMART   Sequence Number
VW                1.000   0.993   0.696     0.036
SP500             0.993   1.000   0.682     0.002
WALMART           0.696   0.682   1.000     0.055
Sequence Number   0.036   0.002   0.055     1.000

[Figure: scatterplot matrix of VW, SP500, WALMART and Sequence Number]

Parameter Estimates
Term        Estimate   Std Error   t Ratio   Prob>|t|   VIF
Intercept   0.024      0.006       4.02      0.0001     0
SP500       1.244      0.123       10.10     <.0001     1

Parameter Estimates
Term        Estimate   Std Error   t Ratio   Prob>|t|   VIF
Intercept   0.015      0.007       2.13      0.0356     0.000
SP500       -1.258     1.041       -1.21     0.2294     74.297
VW          2.458      1.016       2.42      0.0171     74.297

[Figures: leverage plots of WALMART on SP500 and of WALMART on VW]

Stat 102, Spring 2000

Review of One and Two Sample Tests

One Sample Tests

Normality. Assume that the sample of n observations is from a normal population with mean $\mu$ and variance $\sigma^2$, abbreviated $N(\mu, \sigma^2)$. Tests of one- or two-sided hypotheses count the number of standard errors that separate the sample mean $\bar{Y}$ from the null hypothesis. If $\sigma^2$ is known, then the standard error of $\bar{Y}$ is $\sigma/\sqrt{n}$, and you use $z_\alpha$ values from the normal table. Otherwise, estimate the standard error using the sample variance as $s/\sqrt{n}$ and use values $t_{\alpha, n-1}$ from the t-table with n - 1 df. The case of known variance is mostly of conceptual interest, though it does make it possible to answer questions of power and sample size, which could not be easily done from the t-table.

For testing the hypotheses $H_0: \mu \le \mu_0$, $H_a: \mu > \mu_0$, use the test statistic (z on p. 314, t on p. 325)

$$z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}, \qquad t = \frac{\bar{Y} - \mu_0}{s/\sqrt{n}}.$$

If the value of z or t is negative, $\bar{Y}$ lies in the region specified by $H_0$, so there's no reason to reject. If the value of z or t is positive, then reject $H_0$ if $z > z_\alpha$ or $t > t_{\alpha, n-1}$. Equivalently, reject $H_0$ if the p-value is less than $\alpha$ (see 8.3, p. 321) or, for two-sided tests, if the $100(1-\alpha)\%$ confidence interval does not include $\mu_0$ (p. 335).

Power and Sample Size. To find the power of a test (or $\beta$, which is 1 minus the power), you have to figure out the probability of rejecting $H_0$ when it is false. You can do this only when $\sigma$ is known. For the one-sided hypotheses given above, you can get the power at some $\mu_a > \mu_0$ as (p. 319)

$$1 - \beta = P(\text{reject } H_0 \mid \mu = \mu_a) = P\!\left(\bar{Y} > \mu_0 + z_\alpha\frac{\sigma}{\sqrt{n}}\right) = P\!\left(\frac{\bar{Y} - \mu_a}{\sigma/\sqrt{n}} > \frac{\mu_0 - \mu_a}{\sigma/\sqrt{n}} + z_\alpha\right) = P\!\left(N(0,1) > z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}}\right).$$

Plug in values for the terms on the right and look up the value in the normal table to find $1 - \beta$.

To find the sample size n required to obtain chosen values for $\alpha$ and $\beta$ at some $\mu_a$, there's a similar formula. The last equation given above implies that

$$\beta = P\!\left(N(0,1) < z_\alpha - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}}\right), \quad\text{so}\quad -z_\beta = z_\alpha - \frac{(\mu_a - \mu_0)\sqrt{n}}{\sigma}.$$

Solving for the required sample size n gives the formula seen on old exams:

$$n = \frac{\sigma^2 (z_\alpha + z_\beta)^2}{(\mu_a - \mu_0)^2}.$$

Two-sample Tests

The big changes from one-sample tests to two-sample tests are:
- $H_0$ hypothesizes values for $\mu_1 - \mu_2$, often $H_0: \mu_1 - \mu_2 = 0$.
- The needed standard error is $\mathrm{SE}(\bar{Y}_1 - \bar{Y}_2)$.
- You have to decide whether the samples are dependent or independent.

When the samples are dependent (in particular, have noticeable correlation), you should use the paired t test, which allows for the correlation (9.3, p. 370). The paired t test is just a one-sample test based on the differences. If the two samples are independent (no noticeable correlation), then use the two-sample t test (9.1, p. 351). Both tests resemble one-sample tests in that they count the number of standard errors separating $\bar{Y}_1 - \bar{Y}_2$ from the value specified by $H_0$ (usually 0). An important variant of the two-sample test (p. 357) allows for the two samples to have different variances. Use this one if the two sample standard deviations differ by much, e.g., one is twice the size of the other.

Tests for Proportions

For a test of $H_0: \pi = \pi_0$, use the normal z test formed as (p. 332)

$$z = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$$

as long as $n\pi_0$ and $n(1 - \pi_0)$ are about 5 or larger. For a two-sample test of $H_0: \pi_1 - \pi_2 = D_0$ (usually $D_0 = 0$), use (p. 395)

$$z = \frac{\hat{\pi}_1 - \hat{\pi}_2 - D_0}{\sqrt{\dfrac{\hat{\pi}_1(1 - \hat{\pi}_1)}{n_1} + \dfrac{\hat{\pi}_2(1 - \hat{\pi}_2)}{n_2}}}.$$

Nonparametric Tests

These tests avoid the assumption of normality and accommodate outliers, but require the other assumptions, such as independent observations. The simplest is the sign test discussed in class (p. 374). Used in the context of a paired t-test, it works like a binomial test for the number of heads in n tosses of a fair coin.

Better nonparametric methods (i.e., ones that work almost as well as the t tests when the data is normal) use ranks. In one-sample problems, the signed-rank test for $H_0: \mu = 0$ uses a z-score from (p. 375)

$$T = \sum \text{ranks of positive values}, \qquad z = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$

to see if the ranks of the positive values differ from what is expected under $H_0$. Since paired comparisons reduce to one-sample tests, the signed-rank test is an alternative to the paired t test.

For two-sample problems you can use the rank-sum test. Its test statistic is found by merging the data from both samples, ordering them, and then summing the ranks of one of the two samples. The test again works by forming a z score that compares the observed sum T to what is expected under $H_0$ (p. 365):

$$T = \sum \text{ranks of one group}, \qquad z = \frac{T - n_1(n_1 + n_2 + 1)/2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}.$$

Lecture 23: Contingency Tables (STAT 102)

Stat 102 has concentrated on analysis of explanatory and predictive models with one dependent variable, Y, and one or more predictive variables or factors, often denoted by X.

Data structures examined include situations described as:
- Ordinary linear regression: Y continuous; one continuous x-variable.
- Multiple linear regression: Y continuous; several continuous x-variables.
- Polynomial regression: like multiple regression, but with some polynomial terms.
- ANOVA: Y continuous; one or more qualitative X-factors.
- Regression and ANOVA combined.
- Logistic regression: Y qualitative; one or more continuous X-factors. (We also included a qualitative X-factor, sex, in one example.)

We haven't yet talked about Contingency Tables.
- In contingency tables, both Y and X are qualitative.
- The focus is on understanding the relation between Y and X. We don't necessarily think of one as causing the other, or as being useful to predict the other.
- Such situations are often analyzed in Contingency Tables. In these, the null hypothesis $H_0$ is usually independence of the various factors.
- We'll give two examples.
- We'll describe how to compute the test statistic, with complete formulas.
- We'll also show where to get the tests in JMP.

Example: Wine and Music

Do shoppers buy more wine when appropriate music is played?

Data: A supermarket in N. Ireland played no background music, French music, or Italian music at random times of the day. Here is the number of purchases for each combination of type of music and type of wine.

2-Way table of results: Type of Wine By Type of Music

                        Music
Wine (Count)   French   Italian   None   Total
French             39        30     30      99
Italian             1        19     11      31
Other              35        35     43     113
Total              75        84     84     243

Taken from Moore and McCabe, Introduction to the Practice of Statistics, 5th edition. Original source: Ryan, Northrup-Clewes, Knox, and Thurnham (1998), "The effect of in-store music on consumer choice of wine," Proc. Nutrition Soc. 57, p. 1069.

Test of Independence: Null Hypothesis

In words: $H_0$: Row effects are independent of Column effects.

Symbolically: The population probabilities for the row effects (Wine) are denoted by $\theta_1 = P(\text{Buying French Wine})$, with similar definitions for $\theta_2, \theta_3$. The probabilities for the column effects (Music), $\kappa_1 = P(\text{French Music})$, $\kappa_2$, $\kappa_3$, are similarly defined.

$H_0$ is that the probability of a count in the $(i, j)$ cell is the product $\theta_i \kappa_j$; for example,

$$P(\text{French wine \& No music}) = P(\text{count in cell } (1,3)) = \theta_1 \kappa_3 .$$

$H_a$ is that $H_0$ is not true.

Let $N_{ij}$ denote the number of observations in the $(i, j)$ cell. Let $N_{i\cdot}$ denote the sum of observations in the ith row, $N_{\cdot j}$ the sum in the jth column, and $N$ the grand total. The table now looks like:

Computations for Test

Count      French     Italian    None       Total
French     N11 = 39   N12 = 30   N13 = 30   N1. = 99
Italian    N21 = 1    N22 = 19   N23 = 11   N2. = 31
Other      N31 = 35   N32 = 35   N33 = 43   N3. = 113
Total      N.1 = 75   N.2 = 84   N.3 = 84   N = 243

We can estimate the $\theta$ and $\kappa$ terms in the obvious way:

$$\hat\theta_i = N_{i\cdot} / N, \qquad \hat\kappa_j = N_{\cdot j} / N .$$

Under $H_0$, the cell probabilities then have estimates $\hat P_{ij} = \hat\theta_i \hat\kappa_j$.

Computations (cont.)

We can put this together to get estimates for what the cell counts should be if $H_0$ is true. The expected (under $H_0$) cell counts would be

$$\text{Expected}_{ij} = E_{ij} = N \times \hat P_{ij} .$$

Pearson's Chi-Square statistic for testing $H_0$ is

$$\chi^2 = \sum_i \sum_j \frac{(N_{ij} - E_{ij})^2}{E_{ij}} .$$

Under $H_0$ it has a Chi-Square distribution with $df = (I - 1)(J - 1)$. In our example, $df = 2 \times 2 = 4$. Your book has tables of the Chi-Square distribution, or use JMP. We REJECT when $\chi^2 > C$, with C as found in the table.

Computations: summary

- Get the table. It has I rows and J columns, and entries $N_{ij}$.
- Calculate $E_{ij}$, the expected (under $H_0$) cell counts. Looking back, we see they have the formula
  $$E_{ij} = N \times \frac{N_{i\cdot}}{N} \times \frac{N_{\cdot j}}{N} = \frac{N_{i\cdot} N_{\cdot j}}{N} .$$
- Calculate
  $$\chi^2 = \sum_i \sum_j \frac{(N_{ij} - E_{ij})^2}{E_{ij}} = \sum_i \sum_j \frac{(\text{Observed}_{ij} - \text{Expected}_{ij})^2}{\text{Expected}_{ij}} .$$
- Use $df = (I - 1)(J - 1)$. Get the critical value C from the Chi-Square table with this df. Reject whenever $\chi^2 > C$.

Analysis in JMP

JMP performs the necessary calculations via Fit Y by X (Music as Y, Wine as X, number as Frequency).

Wine By Music

Count
Expected
Cell Chi^2    French     Italian    None       Total
French        39         30         30         99
              30.5556    34.2222    34.2222
              2.3337     0.5209     0.5209
Italian       1          19         11         31
              9.5679     10.716     10.716
              7.6724     6.4038     0.0075
Other         35         35         43         113
              34.8765    39.0617    39.0617
              0.0004     0.4223     0.3971
Total         75         84         84         243

For example, $34.2222 = \dfrac{99 \times 84}{243}$ and $0.5209 = \dfrac{(30 - 34.2222)^2}{34.2222}$.

Analysis in JMP (cont.)

JMP also gives the value of $\chi^2$ and the resulting P-value:

Test               ChiSquare   Prob>ChiSq
Likelihood Ratio   21.875      0.0002
Pearson            18.279      0.0011

If you calculated by hand, you would get this from the table on p. 9 as

18.279 = 2.3337 + 7.6724 + 0.0004 + ... + 0.5209,

the sum of all nine cell Chi-Square values.

You can ignore the Likelihood Ratio statistic. It belongs to a different statistical method and comes from a different set of formulas than Pearson's Chi-Square. The P-values from the two tests are nearly always very similar.

The result of this calculation is that we Reject $H_0$. We do so at any reasonable level of significance, since P-value = 0.0011. This should be followed by a summary of the nature of the effects we have detected.

Nature of the Effects

We have rejected
$H_0$. Therefore we believe choice of wine and nature of music are related to each other. What is the relation?

There are formal ways to investigate this (see the Appendix to this lecture), but for Stat 102 we recommend looking at the numbers and using common sense. You can look at the cell counts (as on p. 6) and/or at the cell Chi-Square values (as on p. 9) and/or at row or column % (as below). Used carefully, any of these looks should yield similar conclusions. Here is the table of counts and column %.

Nature of Effects (cont.)

Count
Col %          French music   Italian music   No music
French wine    39             30              30
               52.00          35.71           35.71
Italian wine   1              19              11
               1.33           22.62           13.10
Other wine     35             35              43
               46.67          41.67           51.19
Total          75             84              84

Interesting conclusions are visible in the colored 2 x 2 section (French and Italian wine under French and Italian music):

- When French music plays, customers seem much more likely to purchase French wine than they are to purchase French wine when Italian music is playing.
- Also, when French music is playing, customers seem very much less likely to purchase Italian wine than they are to purchase Italian wine when Italian music is playing.
- Also, from the last two columns: there doesn't seem to be much difference in the distribution of wine choice between times when Italian music is playing and when no music is playing. There's a slightly greater % of people choosing Italian wine when Italian music is playing, but it's not a large difference.

Second Example: Income and Job Satisfaction

About 50 years ago, a classic sociological study looked at the relationship between a worker's salary and their job satisfaction. One two-way table from this study looked at the results from a survey of workers' income and a questionnaire judging their job satisfaction.

Annual job INCOME was divided into 4 categories: <6000, 6000-15000, 15000-25000, >25000.

Their SATISFACTION was graded into four categories: 1 = very dissatisfied, 2 = somewhat dissatisfied, 3 = somewhat satisfied, 4 = very satisfied.

By accident, both examples in this lecture have square contingency tables. But that's not necessary; rectangular tables are also common.

Data table and results of the analysis follow.

Income and Job Satisfaction: Data Table

Income By Satisfaction
(Income in thousands; Satisfaction graded 1 to 4.)

Count
Row %        1        2        3        4        Total
1 (0-6)      20       24       80       82       206
             9.71     11.65    38.83    39.81
2 (6-15)     22       38       104      125      289
             7.61     13.15    35.99    43.25
3 (15-25)    13       28       81       113      235
             5.53     11.91    34.47    48.09
4 (>25)      7        18       54       92       171
             4.09     10.53    31.58    53.80
Total        62       108      319      412      901

The level of satisfaction does seem to increase somewhat with income. For example, 9.7% of the lowest income people are very dissatisfied, whereas only 4.1% of the highest income people are very dissatisfied. BUT: are the differences we see in the row % statistically significant enough to show that income and job satisfaction are not independent?

Data taken from Agresti, Categorical Data Analysis, which contains the original citation.

An Aside

The Mosaic Plot in JMP provides a very nice visualization for this data.

[Figure: Mosaic plot of Satisfaction by Income]

You can clearly see how the % of very dissatisfied workers goes down as earnings go up, and the % of very satisfied workers increases with income.

Pearson Chi-Square Test of Independence

Test of $H_0$: Satisfaction is independent of Income. $df = 3 \times 3 = 9$.

Results from JMP:

Test               ChiSquare   Prob>ChiSq
Likelihood Ratio   12.037      0.2112
Pearson            11.989      0.2140

The P-value is 0.2140. We cannot reject the null hypothesis. We have to regretfully conclude that numbers like those in the table might well have occurred by chance.

BUT note that not only do we have numbers like these, but they occur in a very obvious pattern. Our test isn't looking for any special pattern. Might there be a better test to use on this data? See the Appendix.

Appendix: Going Beyond the Test of Independence

This material is OPTIONAL. It will not be on our exam.

Our other methods have provided regression coefficients or estimates of factor effects that can be used to understand the nature of factor effects. These are also very useful to predict out-of-sample observations, especially when the overall null hypothesis is false.

One method, called a Generalized Linear Model with a log link, corresponds to a model of the form

$$\log_e E(N_{ij}) = \mu + \alpha_i + \beta_j + \gamma_{ij} .$$

This has the same character as our previous two-way ANOVA, except that it models $\log_e E(N_{ij})$
rather than $E(N_{ij})$.

GLM (log-link): Wine and Music Data

Whole Model Test

Model        -LogLikelihood   L-R ChiSquare   DF   Prob>ChiSq
Difference   38.99            77.98           8    <.0001
Full         21.72
Reduced      60.71

Effect Tests

Source       DF   L-R ChiSquare   Prob>ChiSq
Wine         2    68.33           <.0001
Music        2    11.31           0.0035
Music*Wine   4    21.88           0.0002

These tables show that various things we can see in the table are statistically significant. In particular, the interaction is significant. It is this interaction term that describes the dependence of Wine and Music. The significance of this term is another confirmation of the result of our earlier Pearson Chi-Square test.

Parameter Estimates

This analysis yields a parameter estimates table that one can use to estimate the values of relevant quantities. For example, one can estimate a quantity like $E(\log(N_{12} / N_{\cdot 2}))$, and hence also $E(N_{12} / N_{\cdot 2})$. This corresponds to the proportion of Italian Music listeners who purchase French Wine.

Parameter Estimates

Term                            Estimate   Std Error   L-R ChiSquare   Prob>ChiSq
Intercept                        2.96      0.13        51.04           <.0001
Wine[French]                     0.52      0.14        20.25           <.0001
Wine[Italian]                   -1.18      0.24        67.05           <.0001
Music[French]                   -0.56      0.24        10.64           0.0011
Music[Italian]                   0.34      0.14         7.05           0.0079
Music[French]*Wine[French]       0.73      0.25        15.21           <.0001
Music[French]*Wine[Italian]     -1.22      0.46        16.53           <.0001
Music[Italian]*Wine[French]     -0.42      0.17         7.82           0.0052
Music[Italian]*Wine[Italian]     0.83      0.26        15.21           <.0001

A Peculiar Feature

The preceding tables do have one peculiar feature. We can rationalize this as understandable and not important, but we don't fully understand it.

Note that on p. 19 the Effect Test for Music is significant. This suggests that the three different types of music were played to very different total numbers of wine buyers. More precisely, one would think this is a test of $H_0: E(N_{\cdot 1}) = E(N_{\cdot 2}) = E(N_{\cdot 3})$. However, $N_{\cdot 1} = 75$, $N_{\cdot 2} = 84$, $N_{\cdot 3} = 84$. These numbers aren't very different, so it's very unclear why a test of this null hypothesis should reject.

The explanation must lie in the fact that this test is after controlling for the effect of wine and also the effect of the interaction terms. There must be something about the presence of the
interaction terms that's driving this peculiarity. As confirmation of that conjecture, we note that the test of a model without interactions gives NO SIGNIFICANCE to the Music factor:

Effect Tests

Source   DF   L-R ChiSquare   Prob>ChiSq
Wine     2    55.43           <.0001
Music    2    0.68            0.7134

GLM (log-link): Income and Job Satisfaction Data

For this data we could use a model in which Income levels are still treated as categories, BUT in which the satisfaction numbers (1, 2, 3, 4) are treated numerically. This allows for a model that has a monotone trend; more precisely, a linear trend for $\log E(N_{ij})$. In the model without interactions we have:

Effect Tests

Source         DF   L-R ChiSquare   Prob>ChiSq
Income         3    32.92           <.0001
Satisfaction   1    370.40          <.0001

Note: There is one DF for Satisfaction because there is exactly one regression coefficient for this factor.

Both income and satisfaction are significant in this model, as one might expect from looking at the tables. The cell numbers change as income level changes, and they increase as satisfaction increases.

The Interaction Term: A Measure of Dependence

Here is the table for the model with an interaction term:

Effect Tests

Source                DF   L-R ChiSquare   Prob>ChiSq
Income                3    32.91           <.0001
Satisfaction          1    369.64          <.0001
Satisfaction*Income   3    8.65            0.0343

Our earlier test of independence yielded P-value = 0.21, not significant. But now note that the p-value = 0.0343 < 0.05. So at the conventional level this interaction is significant, although not dramatically so. Looking at the data this way enables us to be weakly convinced that there is some sort of interaction (dependence) between income and satisfaction in the worker population.

DRAFT FOR DISCUSSION ONLY

UNIFORM EMERGENCY VOLUNTEER HEALTH PRACTITIONERS ACT

NATIONAL CONFERENCE OF COMMISSIONERS ON UNIFORM STATE LAWS

RESERVED SECTIONS 11 AND 12

Interim Draft, January 8, 2007

Without Prefatory Note and With Comments

Copyright 2006 By NATIONAL CONFERENCE OF COMMISSIONERS ON UNIFORM STATE LAWS

The ideas and conclusions set forth in this draft, including the proposed
statutory language and any comments or reporter's notes, have not been passed upon by the National Conference of Commissioners on Uniform State Laws or the Drafting Committee. They do not necessarily reflect the views of the Conference and its Commissioners and the Drafting Committee and its Members and Reporter. Proposed statutory language may not be used to ascertain the intent or meaning of any promulgated final statutory proposal.

January 8, 2007

DRAFTING COMMITTEE ON UNIFORM EMERGENCY VOLUNTEER HEALTH PRACTITIONERS ACT

The Committee appointed by and representing the National Conference of Commissioners on Uniform State Laws in drafting this Act consists of the following individuals:

RAYMOND P. PEPE, 17 N. Second St., 18th Floor, Harrisburg, PA 17101-1507, Chair
ROBERT G. BAILEY, University of Missouri-Columbia, 217 Hulston Hall, Columbia, MO 65211
STEPHEN C. CAWOOD, 108 1/2 Kentucky Ave., P.O. Drawer 128, Pineville, KY 40977-0128
THOMAS T. GRIMSHAW, 1700 Lincoln St., Suite 3800, Denver, CO 80203
THEODORE C. KRAMER, 45 Walnut St., Brattleboro, VT 05301
AMY L. LONGO, 8805 Indian Hills Dr., Suite 280, Omaha, NE 68114-4070
JOHN J. MCAVOY, 3110 Brandywine St. NW, Washington, DC 20008
DONALD E. MIELKE, 7472 S. Shaffer Ln., Suite 100, Littleton, CO 80127
NICHOLAS W. ROMANELLO, 11033 Mill Creek Way, #206, Ft. Myers, FL 33916
JAMES G. HODGE, JR., Johns Hopkins Bloomberg School of Public Health, 624 N. Broadway, Baltimore, MD 21205-1996, Reporter

EX OFFICIO

HOWARD J. SWIBEL, 120 S. Riverside Plaza, Suite 1200, Chicago, IL 60606, President
LEVI J. BENTON, State of Texas, 201 Caroline, 13th Floor, Houston, TX 77002, Division Chair

AMERICAN BAR ASSOCIATION ADVISOR

BRYAN ALBERT LIANG, California Western School of Law, 350 Cedar St., San Diego, CA 92101, ABA Advisor
BARBARA J. GISLASON, 219 Main St. SE, Suite 560, Minneapolis, MN 55414-2152, ABA Section Advisor
PRISCILLA D. KEITH, 3838 N. Rural St., Indianapolis, IN 46205-2930, ABA Section Advisor

EXECUTIVE DIRECTOR

WILLIAM H. HENNING, University of Alabama School of Law, Box 870382, Tuscaloosa, AL 35487-0382, Executive Director

Copies of this
Act may be obtained from:

NATIONAL CONFERENCE OF COMMISSIONERS ON UNIFORM STATE LAWS
211 E. Ontario Street, Suite 1300
Chicago, Illinois 60611
www.nccusl.org

UNIFORM EMERGENCY VOLUNTEER HEALTH PRACTITIONERS ACT

TABLE OF CONTENTS

Prefatory Note
SECTION 11. CIVIL LIABILITY FOR VOLUNTEER HEALTH PRACTITIONERS; VICARIOUS LIABILITY
SECTION 12. WORKERS' COMPENSATION COVERAGE

UNIFORM EMERGENCY VOLUNTEER HEALTH SERVICES ACT

Prefatory Note

On July 13, 2006, the Uniform Law Commission gave final approval to a version of the Uniform Emergency Volunteer Health Practitioners Act (UEVHPA) intended to promote the establishment of a robust and redundant system to quickly and efficiently facilitate the deployment and use of licensed practitioners to provide health and veterinary services in response to declared disasters and emergencies. The 2006 version of the UEVHPA contains provisions that (1) establish a system for the use of volunteer health practitioners capable of functioning autonomously even when routine methods of communication are disrupted; (2) provide reasonable safeguards to assure that health practitioners are appropriately licensed and regulated to protect the public's health; and (3) allow states to regulate, direct, and restrict the scope and extent of services provided by volunteer health practitioners to promote emergency operations.

While immediate adoption of the 2006 version of the UEVHPA will assist states in more effectively responding to future emergencies and help alleviate significant deficiencies in this nation's current disaster response legal infrastructure, this version of the Act does not address two important topics that most groups and organizations engaged in the development of the UEVHPA indicated were critically important to the effective deployment and utilization of volunteer health practitioners. As currently drafted, the UEVHPA does not include provisions concerning (1) whether and to what extent volunteer health practitioners and organizations deploying and using
these individuals are responsible for claims based on the volunteer's acts or omissions in providing health or veterinary services during emergencies; and (2) whether and how the volunteers may be protected, through workers' compensation benefits, in the event of their own injuries or deaths in responding to declared emergencies.

While the risk of exposure to liability for malpractice claims and the availability of workers' compensation benefits are matters of significant concern to all healthcare practitioners, these issues are of particular importance and relevance to volunteer health practitioners, who may be needed to provide emergency health services to patients and the public in the midst of the challenging circumstances and suboptimal conditions that arise during emergencies. The potential for health-related liability claims of patients to arise, or for volunteer health practitioners to be injured or killed in service, are obvious factors that may inhibit licensed practitioners from fully participating in emergency responses. Even if the volunteers are ready and willing to serve, the entities that host them or send them may have their own liability concerns, which may stifle volunteer participation.

Many existing laws at the federal and state levels recognize the need to provide some protections from liability, or workers' compensation benefits, for volunteers. (Health Resources and Services Administration, Emergency System for Advance Registration of Volunteer Health Professionals (ESAR-VHP): Legal and Regulatory Issues and Solutions. Health Resources and Services Administration (HRSA), Washington, DC, May 2006, 1-180.) However, the applicability of these laws to volunteer health practitioners is sketchy. Existing laws create a patchwork of protections that may apply to specific volunteers in limited settings. During emergencies, volunteer health practitioners, or entities that host or send them, may not know where their protections lie or if they are protected at all. The net result is that some
well-trained, motivated, and valued volunteer health practitioners may not be able to provide essential health services at a time when affected populations need them most.

Numerous anecdotal accounts of how liability or workers' compensation issues limited volunteer participation arose, for example, during national and state responses to Hurricane Katrina in 2005. There is, however, a lack of empirical evidence noting the significance of liability and workers' compensation protections to prospective and actual volunteers. To help address this gap, the Community Health Planning and Policy Development Section of the American Public Health Association (APHA) developed an electronic survey on these key issues in the Fall of 2006. APHA requested that over 10,000 of its members complete the online confidential survey, including hundreds of licensed health practitioners. Though subject to additional verification, the initial survey results provide real data on volunteer attitudes on some key issues.

There were 1,077 total respondents (773 female, 304 male). Direct health providers or clinicians accounted for 27.3% of the survey respondents (294 respondents), the majority of which included doctors (26.1%) and nurses (13.3%). Seventy percent of these respondents reported having six or more years' experience in their field of employment. Approximately 12% of respondents indicated they were currently enrolled in an ESAR-VHP or other volunteer registry system.

In response to the following question: "As a clinician, to what degree does knowing that you have medical malpractice insurance coverage influence your decision to travel out of state to volunteer in a clinical capacity during an emergency?", nearly 60% of respondents indicated it was "important" (24.3%) or "essential" (35.4%). In response to the question, "As a clinician, how important is knowing one's scope of practice in a state other than one's home state in determining whether to travel out of state to volunteer in an emergency?", just under 63% of respondents indicated it was "important" (29.5%) or
"essential" (33.4%).

These questions were designed to assess how much importance a clinician assigns to medical malpractice coverage and scope of practice requirements in deciding whether to volunteer out-of-state. The implications for one's potential liability are obvious: (1) practitioners covered by medical malpractice insurance enjoy some protection from plaintiffs with successful claims in negligence seeking the practitioner's personal assets; and (2) liability claims may arise from practitioners who act outside their scope of practice. If practitioners cannot determine the applicable scope of practice for their profession in another state, they may be opening themselves to liability even for unknowing acts that exceed their scope.

Two additional questions, answered by all respondents including clinicians, provide a precise assessment of their concerns over liability and workers' compensation protections. When asked, "As a potential volunteer, how important is your immunity from civil lawsuits in deciding whether to volunteer during emergencies?", almost 70% of respondents indicated it was "important" (35.6%) or "essential" (33.8%). Only 5.5% of respondents indicated that civil immunity was "not important," with the remainder (25%) saying it was "somewhat important."

Responding to the question, "As a potential volunteer, how important to you is your protection from harms (e.g., physical or mental injuries) through benefits akin to worker's compensation?", 74.1% of respondents indicated it was "important" (44.7%) or "essential" (29.4%). Only 4.8% of respondents indicated that workers' compensation benefits were "not important," with the remainder (21%) saying it was "somewhat important."

Thus, based on these current survey results, nearly 70% of respondents, many of whom are prospective or actual volunteer health practitioners, clarified that civil immunity and workers' compensation protections are important or essential facets of their decision whether to volunteer during an emergency.

In developing the version of the UEVHPA presented to the 2006 Annual
Meeting of the Uniform Law Commission, the Drafting Committee presented proposals to the Commission that would have granted volunteer health practitioners immunity from tort claims similar to that enjoyed by state employees deployed to emergency scenes through their jurisdictions pursuant to the Emergency Management Assistance Compact (EMAC). Furthermore, the draft version of the Act presented at the 2006 Annual Meeting proposed treating volunteer health practitioners as employees of their home state for workers' compensation purposes to the extent they did not have access to alternative sources of workers' compensation coverage. Facing concerns that these proposals required more careful review by the states and members of the National Conference, however, it was decided to defer final action on these important topics until the next Annual Meeting of the National Conference in July 2007. The National Conference directed the Drafting Committee to further review, analyze, and gather comments and recommendations regarding how to most effectively address these topics.

In response, the Drafting Committee circulated a Discussion Draft of amendments to the UEVHPA in September 2006 that provided two alternatives for addressing the topic of volunteer liability. Option A in the September 2006 amendments provided that volunteer health practitioners are not liable for acts of ordinary negligence, but would be subject to claims based on willful, wanton, grossly negligent, reckless, criminal, or intentional misconduct, and that host states would be subject to claims based on ordinary negligence to the same extent as provided by state tort claims acts. Option B in the September 2006 draft applied by reference, but without further explication or elaboration, the protections provided by EMAC, the Federal Volunteer Protection Act, and other pertinent state laws to volunteer health professionals and to groups and organizations that deployed or used volunteer health practitioners to respond to declared emergencies. Option B was
intended to provide similar liability protections to volunteers as Option A, but without creating a new body of law to articulate these principles.

After extensive discussion of these alternatives at an October 2006 meeting of the Drafting Committee hosted by the American Red Cross in Washington, DC, a decision was made to utilize an approach that expressly codified and defined the extent to which host states and volunteer health practitioners may be held liable. The Drafting Committee concluded that clear and explicit rules were preferable to the incorporation by reference of another body of law that might not be clearly understood or uniformly applied in the absence of its more careful explication. This draft presents a modified and somewhat improved version of Option A as presented in the September 2006 Discussion Draft.

The September 2006 Discussion Draft also addressed the issue of workers' compensation coverage for volunteer health practitioners. It provided that the host state must afford workers' compensation coverage to volunteer health practitioners that are not covered by workers' compensation insurance or other comparable coverage during their deployment and service as volunteers. Lacking detailed input from state emergency management and budgetary officials, the Drafting Committee decided at its October 2006 meeting to prepare a revised set of amendments that presents three options for providing workers' compensation coverage to volunteer health practitioners. These options include (1) treating volunteer health practitioners in all circumstances as employees of host states for workers' compensation coverage; (2) treating volunteer health practitioners as employees of host states for workers' compensation coverage only if the volunteers do not have access to alternative sources of coverage; or (3) providing volunteer health practitioners who do have access to alternative sources of coverage as state employees but limiting workers' compensation coverage to the costs of health care
services, thus excluding, for example, indemnification for lost earning capacity as typically provided via workers' compensation. The goal in providing these alternatives is to solicit comments and recommendations from state officials, disaster relief organizations, volunteers, and others affected by the UEVHPA on the most effective and practical approach to providing workers' compensation coverage.

UNIFORM EMERGENCY VOLUNTEER HEALTH SERVICES ACT

SECTION 11. CIVIL LIABILITY FOR VOLUNTEER HEALTH PRACTITIONERS; VICARIOUS LIABILITY.

(a) In this section:

    (1) "Coordinating entity" means an entity that acts as a liaison to facilitate communication and cooperation between source and host entities but does not provide health or veterinary services in the ordinary course of its activities as liaison.

    (2) "Source entity" means a person located in this or another state that employs or uses the services of volunteer health practitioners authorized to provide health or veterinary services pursuant to this [act].

(b) Subject to subsection (c), volunteer health practitioners authorized to provide health or veterinary services pursuant to this [act] are not responsible for the payment of a judgment based on their acts or omissions in providing the services, nor shall they be named as defendants in an action based on such acts or omissions.

(c) Notwithstanding subsection (b), this section does not apply to:

    (1) willful, wanton, grossly negligent, reckless, or criminal conduct of, or an intentional tort committed by, a volunteer health practitioner;

    (2) an action brought against a volunteer health practitioner:

        (A) for damages for breach of contract, other than for contracts related to the provision of health or veterinary services;

        (B) by a source or host entity; or

        (C) relating to the operation of a motor vehicle, vessel, aircraft, or other vehicle by a volunteer health practitioner for which this state requires the operator to have a valid operator's license or to maintain liability insurance, other
than an ambulance or other emergency response vehicle, vessel, or aircraft operated by a volunteer health practitioner responding to a request for health or veterinary services or transporting a patient.

(d) Source, coordinating, and host entities are not vicariously liable for the acts or omissions of volunteer health practitioners in providing health or veterinary services authorized pursuant to this [act].

(e) Source, coordinating, and host entities are not liable for civil damages for the operation of, or reliance upon information provided by, a registration system unless the acts or omissions constitute an intentional tort or are willful, wanton, grossly negligent, reckless, or criminal in nature.

(f) Notwithstanding subsection (b), volunteer health practitioners shall be considered agents or employees of this state under the [cite to state tort claims act] for purposes of recovering damages from the state based on their acts or omissions in providing health or veterinary services pursuant to this [act].

Legislative Note: Subsection (f) should be revised as necessary, based upon the provisions of the state's tort claims act, to provide for the award of damages by the state to individuals injured as a result of the negligent actions of volunteer health practitioners and to ensure that volunteer health practitioners will not be personally responsible for the payment of civil damages as provided by subsection (b).

Comment

All states, through the adoption of EMAC, have accepted the dual propositions that (1) governmental health practitioners providing interstate assistance in responding to declared emergencies should enjoy limited protections from tort liability; and (2) persons injured by governmental health practitioners should have some reasonable ability to pursue tort claims to redress their injuries suffered as a result of acts of professional malpractice. Article VI of EMAC provides that officers or employees of a party state rendering aid in another state pursuant to the compact are considered agents
of the requesting state for tort liability and immunity purposes, and provides that no party state or its officers or employees rendering aid in another state pursuant to the compact shall be liable on account of any act or omission in good faith on the part of such forces while so engaged, or on account of the maintenance or use of any equipment or supplies in connection therewith. The compact defines "good faith" to not include willful misconduct, gross negligence, or recklessness.

These provisions of EMAC generally apply, however, only to state employees deployed on an interstate basis in response to declared emergencies. While some states have expanded these protections to local government employees incorporated into state forces pursuant to mutual aid agreements, with very limited exceptions private sector volunteers and disaster relief organizations do not enjoy the same protections and privileges provided by EMAC.

The proposed amendments adding Section 11 to the UEVHPA apply policies similar to those established by Article VI of EMAC to volunteer health practitioners and organizations engaged in the deployment and use of these volunteers. The rationale is that private sector volunteers and organizations providing vital health services during emergencies deserve the same protections and privileges as states and public employees whose resources and efforts they supplement and complement. While historically many private sector volunteer health practitioners have responded to emergencies regardless of their potential exposure to civil liability, volunteers and disaster relief organizations have consistently identified fears regarding potential exposure to liability claims as a major source of concern and anxiety when engaged in disaster relief activities (see discussion above in the Prefatory Note). Many trained volunteers may not serve at all if liability protections do not exist. In addition, fears of exposure to tort claims have often limited the extent of health
services provided.

Section 11 was drafted on the fundamental premise that all volunteer health practitioners responding to declared emergencies should be treated similarly, regardless of whether they are compensated state and local employees or private volunteers providing their services without charge to the host state. Coextensively, Section 11 seeks to ensure that all persons injured as a result of the negligent or criminal acts or omissions of volunteer health practitioners in providing health services have some access to tort claims, regardless of whether they were treated by state or local employees or private sector volunteers.

Subsection (a) of Section 11 provides two critical definitions of terms used only in the section and not in other provisions of the UEVHPA, namely "coordinating entity" and "source entity."

A coordinating entity facilitates the deployment of volunteer health practitioners during an emergency. Its functions may entail coordination, referral, or transportation of volunteer health practitioners between the source and host entities, or it may simply deal with host entities. For example, a state ESAR-VHP program may serve as a coordinating entity during an emergency by helping to deploy volunteer health practitioners to a host entity. As well, non-governmental entities (e.g., hospitals, charities, churches) may help facilitate the use of volunteer health practitioners without actually hosting them to provide health or veterinary services. The purpose for defining this term is to recognize the important role of coordinating entities in helping to provide registered volunteers during emergencies (thus limiting the potential for spontaneous voluntarism) and extend to these entities liability protections pursuant to subsection (e).

A source entity is an entity that employs or uses the services of volunteer health practitioners during non-emergencies authorized to provide health or veterinary services pursuant to this act. In other words, source entities are the existing employers of
volunteer health practitioners, or the entity in which the practitioner typically provides health services in nonemergencies. Source entities may deploy volunteer health practitioners directly, or via a coordinating entity, to a host entity during an emergency. Source entities are not typically engaged in the oversight or management of volunteer health practitioners during a declared emergency and do not retain responsibility to verify the licensure status and good standing of the volunteers who provide health or veterinary services.
Subsection (b) provides that volunteer health practitioners that are authorized to provide health or veterinary services pursuant to the UEVHPA are not responsible for the payment of a judgment based on their acts or omissions in providing the services and may not be named as defendants in an action based on such acts or omissions. As used in this section, health or veterinary services encompass the provision of services that provide a direct health benefit to individuals or human populations, or to animals or animal populations. These services may also include health-related activities that allow for the efficient provision of health or veterinary services. Examples include assistance in patient care where support staff are unavailable (e.g., transporting a patient in the immediate vicinity where health services are being provided) and other activities that may be outside the typical scope of health or veterinary services but are still conducive to the provision of patient care. Health-related services are distinguishable from services that are of a non-health-related nature and afford no direct health benefit to individuals or populations (e.g., the operation of a nonemergency motor vehicle, administrative services). Whether a service is health-related or non-health-related will depend largely on the circumstances and consideration for whether the acts or omissions are integral to the provision of direct health benefits.
Subsection (c) provides exceptions to the
protections from liability provided to volunteer health practitioners under subsection (b). A volunteer health practitioner may be liable (1) for engaging in willful, wanton, grossly negligent, reckless, or criminal conduct, or for committing an intentional tort; (2) in an action for damages for breach of contract or an action brought by a source or host entity, other than for contracts related to the provision of health or veterinary services; and (3) for the operation of a motor vehicle or other craft for which the state requires the volunteer to hold a valid license or maintain liability insurance, other than an ambulance or other emergency response vehicle, vessel, or aircraft operated by a volunteer health practitioner responding to a request for health or veterinary services or transporting a patient. These exceptions may include situations in which a volunteer health practitioner exceeds the scope of practice requirements in the course of providing health or veterinary services. For example, a lab technician will be deemed to have exceeded the scope of practice of a similarly situated practitioner by performing surgery on an individual. A lack of education, training, and licensure will often be sufficient to constitute, at the very least, grossly negligent conduct pursuant to Subsection (c)(1). The fact that a volunteer practitioner exceeds the scope of practice, however, does not of itself constitute conduct for which liability protection is unavailable.
Subsection (c)(1) restates the common exceptions to liability protections found in many volunteer protection acts, and other acts for that matter. Thus, if a volunteer health practitioner acts in a willful, wanton, grossly negligent, or reckless way, engages in criminal conduct, or commits an intentional tort, the practitioner does not enjoy any protection from relevant liability claims brought against the practitioner stemming from this conduct.
Subsection (c)(2)(A) exempts breaches of contract from the protection provided by subsection (b), other
than for contracts related to the provision of health or veterinary services. At its core, subsection (b) provides protection for personal liability arising from the provision of health or veterinary services. It does not protect a volunteer health practitioner from liability for actions based in contract, except for contracts related to the provision of health or veterinary services. Thus, if a volunteer health practitioner executes a valid contract to provide health services, the obligations imposed by that contract during nonemergencies may only be avoided if there is a valid excuse under the law governing the contract. For example, in Sullivan v. O'Connor, 363 Mass. 579, 296 N.E.2d 183 (Mass. 1973), a doctor was found by a jury to have promised a particular result and was held liable for breach of contract even though the jury determined that he had not committed malpractice. As constructed, Subsection (c)(2)(A) provides protection to the doctor for the contract claim, but not for contractual obligations unrelated to the provision of health or veterinary services.
Subsection (c)(2)(B) provides that a volunteer health practitioner is not afforded civil liability protection for an action brought by a source or host entity. This section is meant to ensure that direct claims against a volunteer health practitioner by a source or host entity are not foreclosed simply because the person is acting as a volunteer. It provides an avenue for source and host entities to seek redress against a volunteer health practitioner for misconduct that may not necessarily have a direct health effect on individuals or populations. Examples may include mismanagement of materials during a response effort, or conversion of property or goods provided for the sole purpose of distribution to affected individuals or populations of an emergency. Such claims by the source or host entity against the volunteer health practitioner are allowed pursuant to Subsection (c)(2)(B) and Subsection (c)(1) if the volunteer's actions constitute a crime or
other willful misconduct. Subsection (c)(2)(B) is not intended, however, to be an avenue for third-party claims that might indirectly expose the practitioner to the type of liability for which subsection (b) is intended to provide protection. For example, a plaintiff might file a claim against a hospital, as a host entity, for negligent supervision of a volunteer health practitioner. In response, the hospital might file a third-party claim against the practitioner. So long as the practitioner's conduct was not within Subsection (c), the practitioner would not be liable to the hospital.
Section (c)(2)(C) exempts civil liability protections for injuries resulting from the operation of a nonemergency vehicle for which the host state requires the operator to hold a valid operator's license or maintain liability insurance, other than an ambulance or other emergency response vehicle, vessel, or aircraft operated by a volunteer health practitioner responding to a request for health or veterinary services or transporting a patient. This provision is consistent with federal statutes that provide certain exceptions to civil liability protections afforded to volunteers (e.g., the federal Volunteer Protection Act, 42 USCS § 14503(a)(4)). The intent is to preclude liability protections for actions of volunteer health practitioners that are outside their scope of responsibilities as volunteers. Thus, a volunteer health practitioner driving an ambulance or other emergency vehicle transporting patients to a triage site is acting within the scope of his responsibilities and may not be found liable for injuries resulting from a vehicular accident, provided he did not act willfully or engage in other misconduct. The same practitioner who finishes a shift as a volunteer at a host entity and has a vehicular accident driving across town later that evening to eat out at a restaurant is liable for damages caused by the negligent operation of the vehicle.
Subsection (d) provides vicarious liability protection for source,
coordinating, and host entities for acts or omissions of their volunteer health practitioners. These entities are often concerned about their potential liability in the deployment or use of volunteer health practitioners during emergencies. To alleviate these concerns, and thereby facilitate the full use of volunteer health practitioners, Subsection (d) provides comprehensive protection from vicarious liability. As discussed below, such protections are consistent with the legal nature of vicarious liability.
Vicarious civil liability applies when an employer is responsible for the torts of its employees or agents, despite the fact that the employer itself may not have engaged in any negligent activities. Liability under this doctrine can attach pursuant to the theories of respondeat superior and ostensible agency. Respondeat superior provides for vicarious liability when a negligent health provider is an employee or an agent of an entity and has acted in the course of the employment. The theory presumes that the employer has control over, and is therefore responsible for, the acts of its employees. The extent of civil liability in such circumstances depends on the level of control exerted by the employer over the actions of the employee. In most jurisdictions, the employer will only be liable for acts of the employee undertaken within the scope of employment. Hospitals, for example, may be held liable for the acts of nurses, residents, interns, and certain behavioral health professionals, since these health practitioners are often considered employees. Similarly, a physician who exercises control and authority over other health practitioners (e.g., nurses, supporting staff, etc.) can be held liable for their negligence. In one case, a surgeon was vicariously liable for an error in a sponge count performed by the nursing staff after surgery, although the surgeon did not participate in the count. Johnson v. Southwest Louisiana Ass'n, 693 So. 2d 1195 (La. App. 1997), holding that the surgeon had a nondelegable
duty to remove sponges from the patient's body. The primary issue in applying respondeat superior is whether an individual is a servant (e.g., employee) subject to the control of the master (e.g., employer) or an independent contractor. The employer's right to control is what distinguishes an employee from an independent contractor. Typically, entities are not held liable for the negligent actions of independent contractors. Therefore, during an emergency, a hospital would not be vicariously liable for the acts or omissions of a volunteer health practitioner that provides health services to individuals or populations within the hospital, provided that the volunteers were looked upon as independent contractors and not as agents of the hospital.
The theory of ostensible or apparent agency imputes liability to entities where (1) the patient looks to the entity, rather than the individual health practitioner, to provide care and (2) the entity holds the health practitioner out as its employee. Civil liability under the theory of ostensible agency is particularly relevant in emergency situations. When a patient enters the emergency room, he generally looks to the institution to provide him with care and has no knowledge of the nature of the employment relationship between the physician and the hospital. Moreover, by permitting the physician to practice in the emergency room, the hospital is holding out that individual as its employee. This scenario may not be applicable during an emergency for a number of reasons. First, the host entity is not expected to exert the same degree of control over the health practitioner tantamount to the normal operations of an emergency room. Also, volunteer health practitioners are not agents of an entity where no employment relationship exists between the entity and the practitioners and where they are not presented as providing health services pursuant to a legal obligation (e.g., a duty to perform under a contract).
Subsection (e) clarifies that source,
coordinating, and host entities are not liable for civil damages for acts or omissions relating to the operation or use of, or reliance upon information provided by, a registration system. This provision supports the essential roles of these entities in the operation and use of registration systems and the critical need for these systems to effectively respond to emergencies. Provided that the acts or omissions that may lead to liability do not constitute an intentional tort or are not willful, wanton, grossly negligent, reckless, or criminal in nature, entities shall not be civilly liable.
Subsection (f) provides protections for individuals injured by acts or omissions of volunteer health practitioners by providing that the state will compensate the individuals for injuries suffered to the same extent that the state would be liable for the actions of state employees under state tort claims acts. This subsection is intended to authorize an action to recover damages solely against the state. It does not authorize a right of indemnification by the state against the volunteer health practitioner or the filing of any action directly against the volunteer practitioner to the extent otherwise prohibited or restricted by this section.
SECTION 12. WORKERS' COMPENSATION COVERAGE
Option A: A volunteer health practitioner who is providing health or veterinary services in this state pursuant to this act, or who is traveling to or from this state to provide such services, shall be considered an employee of this state for purposes of workers' compensation coverage concerning any injury or death incurred in traveling or providing the services. Workers' compensation benefits for volunteer health practitioners are limited to those benefits provided to state employees under the laws of this state.
Option B: A volunteer health practitioner who is providing health or veterinary services in this state pursuant to this act, or who is traveling to or from this state to provide such services,
and who is not otherwise covered by workers' compensation insurance, shall be considered an employee of this state for purposes of workers' compensation coverage concerning any injury or death incurred in traveling or providing services. Workers' compensation benefits for volunteer health practitioners are limited to those benefits provided to state employees under the laws of this state.
Option C: A volunteer health practitioner who is providing health or veterinary services in this state pursuant to this act, or who is traveling to or from this state to provide such services, and who is not otherwise covered by workers' compensation insurance, shall be considered an employee of this state for purposes of any medical workers' compensation benefits concerning any injury incurred in traveling or providing the services. Benefits for volunteer health practitioners are limited to those medical benefits provided to state employees as part of their workers' compensation benefits under the laws of this state.
Comment
Section 12 sets forth three options for providing workers' compensation coverage to volunteer health practitioners. Workers' compensation is a no-fault system that provides an expeditious resolution of work-related claims. Injured workers relinquish their right to bring an action against employers in exchange for fixed benefits. This social welfare system is convenient to the employer by allowing for a predictable and estimable award. It is also in the interests of the workers, since they are not required to demonstrate who is at fault; rather, a worker must only demonstrate that the injury suffered arose out of or in the course of employment. Workers' compensation programs thus protect employees from the harms or deaths they incur in the scope of their services. However, most workers' compensation systems have a major limitation: they do not typically cover the activities of volunteers, namely because they are not defined as employees or are acting outside the scope of their employment
when volunteering. Over 40 states have statutorily extended workers' compensation coverage to emergency volunteers, principally through emergency or public health emergency laws. Emergency System for Advance Registration of Volunteer Health Professionals (ESAR-VHP): Legal and Regulatory Issues, presentation prepared by the Center for Law and the Public's Health at Georgetown and Johns Hopkins Universities for the Department of Health and Human Services, Health Resources and Services Administration. This coverage, however, may be limited to, for example, public sector volunteers, volunteers who are responding solely at the behest of a state or local government, or volunteers working under the close direction of state or local governments in other jurisdictions. Alaska, for example, provides that any resident engaged as a civilian volunteer in an emergency or disaster relief function in another state or country who suffers injury or death while providing emergency or disaster relief services is considered an employee of the state. AS 23.30.244(a). Consistent with Options B and C noted above, coverage does not extend to volunteers who are otherwise covered by an employer's workers' compensation insurance policy or self-insurance certificate. AS 23.30.244(a)(3).
Who may constitute a volunteer varies from state to state and may not include private sector volunteer health practitioners. For example, workers' compensation coverage is provided in Kentucky pursuant to its mutual aid agreements with other states. Such protections extend to emergency management personnel, paid or volunteer, working for the state or local government. KRS 39A.260(3), (4). Similarly, in Utah, volunteer health practitioners deemed government (i.e., public sector) employees would receive workers' compensation medical benefits as the exclusive remedy for all injuries suffered. UCA 1953 § 67-20-3(1)(a). In these states, coverage is thus limited to public sector employees working for the state or local governments. There is no indication that these protections would
be afforded private sector volunteers. In sum, whether workers' compensation coverage for emergency volunteers under state emergency or public health emergency law extends to volunteer health practitioners as defined in the UEVHPA varies across jurisdictions.
Section 12 provides clearer avenues of redress for injuries incurred by volunteer health practitioners providing health or veterinary services during an emergency. Although volunteer health practitioners are not employees in the traditional sense, they may be exposed to many of the same risks of harm that are faced by employees of the host entity, state or local governments, or other employers in the course of providing health or veterinary services during an emergency.
Each of Options A, B, and C treats volunteer health practitioners as employees of the host state for purposes of workers' compensation claims. This approach has the advantage of treating all volunteers equally and avoiding difficult issues associated with determining whether, and to what extent, the workers' compensation systems of source states provide coverage for volunteers. While superficially this approach may appear to expose host states to greater costs, expenses associated with paying workers' compensation claims of this type during declared emergencies may potentially be submitted for federal reimbursement under the federal Robert T. Stafford Disaster Relief and Emergency Assistance Act, 42 USC §§ 5121-5206 (2002).
In addition, by treating all volunteers as employees of the host state, Options A-C avoid potential tort claims being asserted against the state, such as those currently being litigated in the consolidated World Trade Center Disaster Site Litigation. In this case, more than 3000 recovery workers have sought to recover damages from the City of New York and the Port Authority on the grounds that the defendants failed to properly enforce site safety standards relating to the use of respirators. On October 17, 2006, Judge Alvin Hellerstein of the
Southern District of New York denied preliminary objections seeking to dismiss these claims on the grounds that the defendants were immune from the claims under various disaster management laws and as agents of governmental authorities entitled to assert governmental immunity. By expressly treating volunteer health practitioners as state employees and applying workers' compensation laws to such employees, Options A-C may preclude the future assertion of such claims if brought by volunteer health practitioners, while guaranteeing injured volunteers access to health care and compensation for lost wages and earning capacity.
Option A is based upon the laws of several states that require the state government to provide some coverage for the actions of volunteer health practitioners. For example, Wisconsin extends the definition of employee for workers' compensation purposes to include all emergency management workers, even if they are volunteers, provided they have registered with the state's emergency management program. Wis. Stat. §§ 102.07, 166.03 & 166.215. Connecticut, Illinois, and Ohio provide similar protections to volunteers responding to emergencies. Conn. Gen. Stat. §§ 28-1, 28-14; 20 Ill. Comp. Stat. 3305/10; Ohio Rev. Code Ann. 412301 & 4122033. Similarly, Washington State provides workers' compensation coverage to volunteer emergency workers while registered with an approved emergency management organization if injured in the course of performing volunteer duties. Wash. Admin. Code 118-04-080. Maryland provides similar protections to civil defense workers. Md. Labor and Employment § 9-232. Minnesota provides workers' compensation coverage to any volunteer registered with state or local government agencies. Minn. Stat. § 12.22, subd. 2a.
Option B provides that the host state must afford workers' compensation coverage to volunteer health practitioners that are not covered by workers' compensation insurance or other comparable coverage during their deployment. Under this option, a volunteer health practitioner that has no
other source of insurance for work-related injuries or death is entitled to the same workers' compensation benefits as employees of the state. Accordingly, the host state's law governs the grant of any workers' compensation award to a volunteer and determines whether an employer, rather than the state, is mandated to provide workers' compensation coverage.
This section is not intended to supplant the workers' compensation benefits that would otherwise be available to volunteer health practitioners, provided by an entity or other person in the host state or the state from where they were deployed. Some employers, for example, may extend their workers' compensation benefits to their employees who choose to volunteer outside the employer workplace during an emergency. In addition, some state laws may mandate workers' compensation coverage for individuals even when providing voluntary service away from their regular place of employment. Option B is only meant to afford workers' compensation coverage when no other coverage applies and is not intended to allow redress for volunteer health practitioners who may attempt to circumvent the exclusive remedy provisions of workers' compensation to pursue tort suits against a host entity.
Option C is similar to Option B but provides coverage only for the costs of medical care. Option C is based upon provisions of Texas statutory law that provide medical benefits, but not lost wages or disability benefits, for injuries sustained by volunteers not otherwise provided with workers' compensation benefits. Tex. Lab. Code § 501.026. The purpose of Option C is to provide states with an option of providing some limited workers' compensation benefits for volunteer health practitioners so as to better control for potential costs. In short, to the extent that Option C provides limited workers' compensation benefits to volunteers, it may be more economical for states considering their potential actuarial risks. It may also, however, present a less favorable legal
environment to attract skilled volunteer health practitioners to assist during emergencies, as contrasted with Options A or B.
Statistics 102 Beta and Regression Spring 2000
Beta and Regression
Administrative Items
Getting help: See me Monday 3-5:30 or tomorrow after 2:30. Send me an email with your question (stinewharton). Visit the StatLab TAs, particularly for help using the computer.
Problem set 5: Last problem set. Preparations for the project.
Makeup Exam: Next week. Be there or keep the zero you now have.
Adjusting for Risk in Investments
What's up with the stock market?
Dice assignment in Stat 101: Returns on several simulated investments
  Investment          Avg Annual Return   SD of Return
  White (cash)              0%                 5%
  Green (market)            7.5%              20%
  Red (internet)           71%               130%
Which of these investments worked out for your group? What about the mixed investment, called Pink, which is half in Red and half in White?
  Investment          Avg Annual Return   SD of Return
  Pink (portfolio)         35.5%              65%
Effects of variation: Variability in returns is expensive. Start with an initial wealth W0 of 100. Assume the return is up 10% on one day and down by 10% the next. Where do you end up after a few days? 100 × 1.10 = 110, then 110 × 0.90 = 99, so you are losing 1% every two days, or about 1/2% every day.
Risk-adjusted return: The risk-adjusted value of an investment with stochastic returns R (i.e., the returns change from day to day) is often calculated as (some will do this differently)
  Value = $E(R) - \mathrm{Var}(R)/2$
The second term is often called the volatility drag on the returns.
Background (optional): For the interested ones, here's an outline justifying the previous expression. Again, start with wealth W0. If we denote the return on your investment on the i-th day as $R_i$, then your wealth after n days is
  $W_n = W_0 (1+R_1)(1+R_2) \cdots (1+R_n)$
Rather than work with products, convert this to sums by taking logs,
  $\log W_n = \log W_0 + \sum_i \log(1+R_i)$
and use the Taylor approximation $\log(1+x) \approx x - x^2/2$ to get
  $\log W_n \approx \log W_0 + \sum_i (R_i - R_i^2/2)$
which has long-run average
  $E \log W_n \approx \log W_0 + n\,(E(R) - \mathrm{Var}(R)/2)$
So if you want to maximize your expected wealth, you want to
maximize $E(R_i) - \mathrm{Var}(R_i)/2$.
Analysis of the dice assignment: Returns on several simulated investments
  Investment   Avg Annual Return   SD of Return   Adjusted
  White              0%                 5%           0%
  Green              7.5%              20%           5.5%
  Red               71%               130%         -13.5%
What about the mixed investment, which is half in Red and half in White?
  Investment   Avg Annual Return   SD of Return   Adjusted
  Pink              35.5%              65%          14.4%
Portfolios
As in the dice example, one often combines individually unappealing investments (here, white and red) to form an investment that is attractive. The trick is to decide how much of your money to invest. Regression offers the key to figuring this out, because there is something very fishy about the previous experiment with dice that does not hold up in financial markets. What's really artificial?
Looking at Some Real Returns
Correlations over time for several stocks: Note that all of the stock returns, even those from companies in very different industries, are positively correlated (StockRet.jmp data file).
Correlations
  Variable     Sears   KMart   Penney  Mcdonalds  Citicorp
  Sears        1.000   0.603   0.631   0.584      0.525
  KMart        0.603   1.000   0.682   0.429      0.357
  Penney       0.631   0.682   1.000   0.533      0.419
  Mcdonalds    0.584   0.429   0.533   1.000      0.491
  Citicorp     0.525   0.357   0.419   0.491      1.000
Artificial in the dice example: Returns are close to independent over time, but there is something else artificial about the dice example.
Investing in a Market of One
Problem: How much pink should you buy?
Investing in one security: What share $\gamma$ of your wealth should you invest in a risky asset, holding the rest in cash?
Goal with one asset: One objective is to maximize your long-term wealth. The average gain with share $\gamma$ is
  $E(\gamma R) - \mathrm{Var}(\gamma R)/2 = \gamma E(R) - \gamma^2 \mathrm{Var}(R)/2$
which we can maximize as a function of the share, finding that the maximum long-term value is obtained with
  $\gamma = E(R)/\mathrm{Var}(R)$
Example for investing in the S&P 500: Using the monthly value-weighted returns over the 19-year period 1976-1994, we get this summary for the S&P 500:
[Figure: histogram and normal quantile plot of monthly S&P 500 returns]
with mean 0.0116 and variance 0.0019.
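As a rough numerical check of the risk-adjusted values in the dice table and of the share formula $\gamma = E(R)/\mathrm{Var}(R)$, here is a minimal Python sketch. It assumes the table entries are percentages (7.5, 35.5, and so on), that Red and White are independent, and that the S&P 500 summary is mean 0.0116 and variance 0.0019; the function names are made up for illustration.

```python
import math

def risk_adjusted(mean, sd):
    """Risk-adjusted value E(R) - Var(R)/2, with mean and sd given as fractions."""
    return mean - sd ** 2 / 2

def optimal_share(mean, var):
    """Share gamma that maximizes gamma*E(R) - gamma^2*Var(R)/2."""
    return mean / var

def long_run_value(share, mean, var):
    """Long-run value of holding `share` of wealth in the risky asset."""
    return share * mean - share ** 2 * var / 2

# Assumed dice investments: (average return, SD of return) as fractions.
dice = {"White": (0.00, 0.05), "Green": (0.075, 0.20), "Red": (0.71, 1.30)}

# Pink is half Red and half White; independence is assumed here, so the
# variances add after scaling each by (1/2)^2.
pink_mean = 0.5 * dice["Red"][0] + 0.5 * dice["White"][0]
pink_sd = math.sqrt(0.25 * dice["Red"][1] ** 2 + 0.25 * dice["White"][1] ** 2)
dice["Pink"] = (pink_mean, pink_sd)

for name, (mean, sd) in dice.items():
    print(f"{name:6s} adjusted return = {100 * risk_adjusted(mean, sd):6.1f}%")

# Share implied by the S&P 500 summary, before any adjustment to the mean.
sp_mean, sp_var = 0.0116, 0.0019
sp_share = optimal_share(sp_mean, sp_var)
print(f"S&P 500 share from the raw mean: {sp_share:.2f}")
```

With these inputs the adjusted values come out close to the table (about 5.5% for Green, -13.5% for Red, 14.4% for Pink), and the raw S&P 500 mean of 0.0116 implies a share of roughly 6.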
This return is not adjusted for inflation, however, and removing the risk-free return (0.0059) gives an inflation-adjusted return of 0.0057. Divided by the variance, you get a net share of
  $\gamma = E(R)/\mathrm{Var}(R) = 0.0057/0.0019 = 3$
Yup, it would have been nice to have been leveraged in the market. Things are not so impressive, though, if you adjust for the cost of having to borrow that money that you just used to invest further in the market.
Making a Small Portfolio
Problem: We start with initial wealth, say 1, at time 0. We can invest in two securities with returns $R_1$ and $R_2$. Portfolios with the dice looked pretty good, but those were independent. How much of my current wealth should I put into each investment? In other words, for the dice, why put half in red and half in white? And how do you decide this when red and white are not independent?
Special trick: The investment shares $\gamma_1$ and $\gamma_2$ are simply
  $\gamma_1 = E(R_1)/\mathrm{Var}(R_1)$ and $\gamma_2 = E(R_2)/\mathrm{Var}(R_2)$
when the investments are uncorrelated. So how can we make investments uncorrelated? Use regression.
Two investments: For an example, consider investing either in Amazon or the S&P 500. Here's a plot of their value (thanks to Yahoo) for the last couple of years on a log scale,
[Figure: value of Amazon and the S&P 500 by day (days 0 to 700), log scale]
with Amazon in red and the S&P in green (at the top, very flat).
Returns: The next plot shows the daily returns for the same two. Now you can start to see some of the volatility in the returns for Amazon.
[Figure: daily returns for Amazon and the S&P 500 by day (days 0 to 700)]
Some days Amazon was up 20%, some days down 20%. Here are the associated summary statistics (neither adjusted for inflation):
               Amazon     S&P 500
  Mean         0.0068     0.00090
  Variance     0.0038     0.00015
However, these two are correlated, as seen in the next plot, so we cannot set our investment shares based on these summaries alone.
Regression
Regression allows us to easily construct two investments. Here's a plot of Amazon on the S&P
[Figure: scatterplot of R Amzn on R SP500]
and the associated bivariate (i.e., single predictor) regression results:
  R Amzn = 0.00518 + 1.8332 R SP500
  RSquare                      0.138
  Root Mean Square Error       0.057
  Mean of Response             0.007
  Observations (or Sum Wgts)   725
Parameter Estimates
  Term        Estimate   Std Error   t Ratio   Prob>|t|
  Intercept   0.0052     0.0021       2.44     0.0151
  R SP500     1.8332     0.1707      10.74     <.0001
Residuals are not correlated with the predictor.
Synthetic investments
Next Class on Wednesday
Simple regression in finance: The term beta in finance refers to a regression coefficient. On Weds, we'll take a look at what makes this coefficient so interesting.
Statistics 102 Introduction to Regression Spring 2000
Introduction to Regression
Administrative Items
Getting help: See me Monday 3-5:30, Wednesday from 4-5:30, or make an appointment. Send an email to stinewharton. Visit the StatLab TAs, particularly for help using the computer.
Midterm Exam 2: Tuesday, April 4, from 6-8 pm in Annenberg 110.
Assignment 4: Due Wednesday, March 29.
Questions about Anova
How do you find Tukey intervals in a two-way anova? It depends upon which means you are making a comparison between. For example, in the gummy bear experiment, are you comparing (a) the six means based on the 2 positions and 3 elevations, (b) the three means for elevation overall, or (c) the 2 means for positions? Here's the approach for each. Always keep the conceptual version of the formula for the intervals in mind:
  $(\text{difference in means}) \pm q_{\alpha,\ \text{number of means compared},\ \text{error df}} \sqrt{\hat{\sigma}^2 / (\text{observations per mean})}$
First off, no matter which situation, the estimate $\hat{\sigma}^2$ is always the mean squared error (MSE) term from the anova table.
(a) Use q for 6 means and the MSE, with 4 observations per mean.
(b) Use q for 3 means and the MSE, with 8 observations per mean.
(c) You don't need multiple comparisons in this case since you only are
comparing two means; this is the usual t-test.
The example on the next page illustrates the calculations with output from an experiment done in class.
Challenge Question: Why do you think that the value labeled q in JMP's Tukey-Kramer output differs by the square root of 2 from that found in the table in the textbook?
Example: Gummy Bear Anova
Purpose of the experiment: What's the right combination of elevation and position for launching gummy bears the farthest? Learn that the error terms in an anova measure the effect of other factors, often weird things that happen when you try to repeat the experiment under the same conditions.
Initial thoughts on data quality: Was the experiment done in a way that makes the assumptions of anova reasonable?
Initial data analysis
[Figure: histogram of distance; plots of distance by elevation (1 Book, 5 Books, 9 Books) and by position (Back, Front)]
Should the marginal (overall) distribution of distance be normal? Does the plot of distance by front/back indicate a lack of constant variance?
Anova output
Effect Test
  Source               Nparm   DF   Sum of Squares   F Ratio   Prob>F
  Position               1      1        590.0          6.9    0.0169
  Elevation              2      2       6727.1         39.5    <.0001
  Position*Elevation     2      2        860.1          5.1    0.0181
Analysis of Variance
  Source     DF   Sum of Squares   Mean Square   F Ratio   Prob>F
  Model       5       8177.2         1635.44     19.2185   <.0001
  Error      18       1531.8           85.10
  C Total    23       9709.0
[Figure: profile plot of mean distance by elevation (1 Book, 5 Books, 9 Books) for Back and Front]
Is interaction present? What would it mean if it is/is not? The F test indicates significant interaction, implying that the difference between front and back, for example, changes as the elevation changes. You can see this effect most clearly in the profile plot.
Is the combination that went farthest significantly better than the others? This requires Hsu's comparison from JMP. Front with 5 books beats the three lowest
combinations, but not those at 9 books or back with 5 books.

Mean[i] - Mean[j] - LSD
           Front5    Back9   Front9    Back5   Front1    Back1
Front5      -15.7    -14.7     -7.7      2.0     16.8     35.8
Back9       -16.7    -15.7     -8.7      1.0     15.8     34.8
Front9      -23.7    -22.7    -15.7     -6.0      8.8     27.8
Back5       -33.5    -32.5    -25.5    -15.7     -1.0     18.0
Front1      -48.2    -47.2    -40.2    -30.5    -15.7      3.3
Back1       -67.2    -66.2    -59.2    -49.5    -34.7    -15.7

If a column has any positive values, the mean is significantly less than the max.

Are the means for front/back different for each elevation?

Means and Std Deviations
Level             Number    Mean   Std Dev   Std Err Mean
Back, 1 Book           4   48.00     11.66           5.83
Back, 5 Books          4   81.75      2.36           1.18
Back, 9 Books          4   98.50     12.71           6.36
Front, 1 Book          4   67.00      6.68           3.34
Front, 5 Books         4   99.50      8.10           4.05
Front, 9 Books         4   91.50      9.85           4.92

For comparing back to front at 1 book, the Tukey interval is
$(67 - 48) \pm q_{.05,\,6,\,24-6}\,\sqrt{85.1/4} = 19 \pm 4.49 \times 4.61 = 19 \pm 20.7$
Notice that the width factor in this interval (20.7) is bigger than the width factor used in Hsu's comparisons (15.7, found on the diagonal of Hsu's output).

How would you compare elevation, ignoring interaction?

Level      Number    Mean   Std Dev   Std Err Mean
1 Book          8   57.50     13.44           4.75
5 Books         8   90.62     10.98           3.88
9 Books         8   95.00     11.17           3.95

In this case, the Tukey interval for comparing, e.g., 1 to 5 books is
$(90.6 - 57.5) \pm q_{.05,\,3,\,18}\,\sqrt{85.1/8} = 33.1 \pm 3.61 \times 3.26 = 33.1 \pm 11.8$
and the difference is significant. Notice that the MSE from the overall anova is used, but that the constant q and the divisor under the square root change.

Overall conclusion

Modeling Relationships

Some interesting questions
1. How much should you expect to pay for a diamond?
2. What level of promotion maximizes profit?
3. Where are levels of cellular telephone use headed?

Pricing diamonds
What factors are important in determining the price of a diamond?
How are these factors related to the price of the diamond? (Graphically)
Can you use this relationship to price diamonds?

Diamond data (from web page)
Prices (in Singapore $) for diamonds sold retail in 1990.
How would you describe the relationship between diamond
size (measured in carats) and price?
Would you feel comfortable extrapolating this relationship to very small (e.g., industrial) diamonds or exceptionally large stones (e.g., the Hope diamond)?

Linear equations
The diamond relationship is well characterized by a linear relationship. We will write these in slope-intercept form as
$Y = \beta_0 + \beta_1 X$
where the constant $\beta_0$ is the intercept (the point where the line intersects the Y axis) and $\beta_1$ is the slope (change in Y / change in X).

JMP analysis
With two continuous variables, JMP's Fit Y by X shows a scatterplot, and we can add a summary line.

Modeling Nonlinear Relationships

Curvature
Linear is the most convenient relationship, but it is not guaranteed to hold in all problems. Other types of relations are generically termed nonlinear or curved.
Can you think of some that are nonlinear (e.g., in economics or finance)?

Examples of nonlinear relationships
a. Promotion effects
b. Cellular telephone subscription rates

Effect of display space on liquor sales (page 12 of casebook)
How much shelf space is needed for a new product?
Plot of sales on number of shelf-feet used in display.
Nonlinear: decreasing returns to scale.
Smoothing splines help you visually detect the presence of curvature.
Tukey's bulging rule is a graphical device for recognizing which transformations of X and Y help (p. 15 of regression casebook).
Calculus reveals optimal shelf space (baseline of 50 per foot): the optimal amount of display space is not to use the most display space, because of substitution effects.

Predicting cellular phone use (page 29 of casebook)
How many subscribers are expected by the end of this year?
Time series. Remarkable pattern, misses a lot.
Lesson to learn: the plot should be linear on the transformed scale.

[Figure: scatterplot of Price (Singapore dollars, about 300 to 1100) against Weight (carats), with fitted line]
Price (Singapore dollars) = -259.63 + 3721.02 Weight (carats)

[Figure: plot of Sales (vertical axis, 0 to 400)]
[Figure, continued: the horizontal axis is Display Feet (0 to 8); the curves shown are a Linear Fit and a Transformed Fit to Log]
Sales = 83.5603 + 138.621 Log(Display Feet)

[Figure: two panels plotted against Period (0 to 25); the first has a vertical axis running from 0 to 30,000,000, the second a vertical axis from 10 to 80]

Lecture 22: Logistic Multiple Regression
STAT 102

o Logistic Regression is suitable for situations in which the Y variable is categorical rather than continuous.
o We will study only the most common setting, in which the Y variable is binary, i.e., has only two possible values.
o Lecture 21 discussed the situation having one continuous X variable as a possible predictor.
o We now turn to some examples involving more than one X variable.
o The X-variables may be either continuous or categorical, BUT
o Settings in which all X-variables are categorical have some special features and will be discussed in Lecture 23.

Two Examples
o This Lecture analyzes two separate examples.
o If we do not have time in lecture to completely cover both, you should still read and understand both.
Example 1: Prediction of the Risk of Coronary Heart Disease
Example 2: March Madness. How well do the seedings predict who will win? Is there a difference in this regard between the Men's Tournament and the Women's?

Example 1: Prediction of the Risk of Coronary Heart Disease

DATA
The data for this example is taken from the Framingham Heart Study.
o This study was an early, large public health study designed to investigate risk factors for Coronary Heart Disease (CHD) and to study the course of that disease when it occurs.
o Our data consists of health variables measured for a population of 1408 adult health professionals, ages 45-62.
o These variables were measured at the start of the study, in the '50s.
o At that time these subjects were also given an intensive (and expensive) series of examinations of coronary health status.
o Other tests were given and the patients were followed up at regular
intervals over their lifetime, but we will look only at information from the initial intake exam.

Variables in our Data
All variables measured at initial exam.
o Heart Disease: This is our Y-variable. A 0/1 indicator as to whether the initial intensive exam found evidence of CHD.
o Age
o SBP: Systolic Blood Pressure
o DBP: Diastolic Blood Pressure
o CHOL: Cholesterol level
o FRW: A measure of weight, adjusted for patient's height and age
o CIG: Number of cigarettes smoked/day (Self Report)
o Sex

Logistic Regression Model
o Goal is to estimate $p(x) = \Pr(Y = 0 \mid x)$.
o Here x represents the independent variables in the model.
o The assumed form of p(x) is as follows:
  $p(x) = \dfrac{e^{\theta}}{1 + e^{\theta}}$
o Here $\theta$ is a general symbol used to denote the value of a linear function of the independent variables.
o Thus, if the model has m quantitative factors and one qualitative factor (Sex), the model for $\theta$ is
  $\theta = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m + \beta_{\text{Sex}}$
o Be careful: JMP defines p to correspond to the value Y = 0. Most other software defines it to correspond to Y = 1.

Analysis
o JMP performs the needed analyses.
o It provides estimates $b_0, \ldots, b_m$ of the logistic regression intercept and slope coefficients $\beta_0, \ldots, \beta_m$, and estimates $b_{\text{Sex}}$ of the logistic ANOVA coefficients $\beta_{\text{Sex}}$. JMP also produces tables needed for hypothesis tests.
o The estimates produced by JMP are called Maximum Likelihood Estimates.
o Maximum Likelihood is a general method for producing desirable estimates in many statistical settings.

Basic Output from a model with all available factors

Whole Model Test
Model         -LogLikelihood   DF   ChiSquare   Prob>ChiSq
Difference            63.094    7    126.1882       <.0001
Full                 671.572
Reduced              734.666

RSquare (U)     0.0859
Observations    1393

o In the whole model test table, ChiSquare = 2 × (-LogLikelihood) for the Difference row. For this, use the Chi-squared table with 7 df.
o This tests H0: All regression slopes & factor effects = 0.
o n = 1406, but Observations = 1393 because 13 patients have some missing data and are automatically excluded from the JMP analysis.
o We will exclude all these patients from future analyses.
o Like in usual multiple
regression, RSquare (U) = 63.094 / 734.666.

Effect Likelihood Ratio Tests
Source   DF   LR ChiSquare   Prob>ChiSq
AGE       1         17.613       <.0001
SBP       1         14.684       0.0001
DBP       1          0.142       0.7060
CHOL      1          8.733       0.0031
FRW       1          2.023       0.1549
CIG       1          4.042       0.0444
SEX       1         34.048       <.0001

o DF here is like in usual tables: 1 DF for each regression coefficient, and 1 DF for Sex because there are two sex categories.
o The LR ChiSquare values here come from a Full-Reduced analysis using the Whole Model Test table, like in usual tables, BUT there's a secret. We'll give an example later.
o The Prob>ChiSq entries come from these statistics and the Chi-squared table with DF Degrees of Freedom.

Parameter Estimates
Term         Estimate   Std Error   ChiSq   Pr>ChiSq   Estimate at
Intercept      8.8817      1.0237   75.28     <.0001
AGE           -0.0625       0.015   17.37     <.0001   45
SBP           -0.0148     0.00389   14.58     0.0001   100
DBP          -0.00288     0.00762    0.14     0.7059   80
CHOL         -0.00446     0.00151    8.78     0.0031   180
FRW           -0.0058     0.00406    2.04     0.1530   110
CIG          -0.01231     0.00609    4.09     0.0432   5
SEX[FEM]      0.45305     0.07882   33.04     <.0001   Fem
For log odds of 0/1

o The Estimates are the important part of this table.
o The ChiSquare values and Std Errors here are from a different formula than that used in the Effect LRT table.
o Consequently, the P-values here may not exactly equal those in the preceding table. But they should be close.
We've added a column of variables describing a person. The estimate for this person is
$\theta = 8.8817 - .0625 \times 45 - .0148 \times 100 - \cdots + .45305 = 3.307$ and $p = \dfrac{e^{\theta}}{1 + e^{\theta}} = .965$

Re: The LR Chi Sq entries in the Effect Tests table
o LR Chi Sq values are 2 × (Full - Reduced).
o Here's the Whole Model table for an analysis without DBP:

Whole Model Test
Model         -LogLikelihood   DF   ChiSquare   Prob>ChiSq
Difference            63.023    6    126.0459       <.0001
Full                 671.643
Reduced              734.666

o Hence the LR ChiSq for testing DBP after controlling for all other variables is
  $2(671.643 - 671.572) = 2 \times .072 = .144$
  This is the value in the earlier Effect Tests table, except for roundoff.

Estimates in JMP
o If you go to Save Probability Formulas under the red arrow, you get columns headed Lin[0], Prob[1], Prob[0] and Most Likely Heart Disease, with entries 3.3073, 0.03532, 0.9647 and
0.
o The entries in green are those in the row for our special Person.
o The column Lin[0] gives the values of $\theta$.
o Most Likely just tells whether Prob[1] > 1/2.

You can get the ROC Curve (Receiver Operating Characteristic)
[Figure: ROC curve, Sensitivity (True Positive) against 1-Specificity (False Positive)]
Using Heart Disease='0' to be the positive level. Area Under Curve = 0.7036

ROC Table
Here are a few lines of the Table:

Prob     1-Specificity   Sensitivity   Sens-(1-Spec)   True Pos   True Neg   False Pos   False Neg
0.2028          0.4098        0.7166          0.3069        220        641         445          87
0.2026          0.4107        0.7166          0.3060        220        640         446          87
0.2023          0.4116        0.7231          0.3116        222        639         447          85
0.2018          0.4134        0.7231          0.3097        222        637         449          85
0.2016          0.4153        0.7231          0.3079        222        635         451          85

o The choice p0 = .2023 gives the rule that maximizes Sens-(1-Spec).
o This rule is: Declare 0 (NoCHD) if $\hat{p} \ge p_0 = .2023$. Otherwise, declare 1 (CHD).
o IF half the population at large has no CHD and half does, then we can estimate that in future this declaration will be correct with
  $P(\text{Corr}) = \tfrac{1}{2} P(\text{Corr} \mid \text{noCHD}) + \tfrac{1}{2} P(\text{Corr} \mid \text{CHD}) \approx \tfrac{1}{2}(.5884) + \tfrac{1}{2}(.7231) = .6558$

However
o The data is a random sample of health care workers. Hence
o The best estimate for the proportion of the entire population of health care workers (ages 45-62) with no CHD is
  $P(\text{noCHD}) \approx \dfrac{\#\text{ noCHD in sample}}{\text{sample size}} = \dfrac{1086}{1393} = .7796$
o We can now estimate, for the preceding classification rule,
  $P(\text{Corr}) \approx .7796 \times .5884 + .2204 \times .7231 = .6181$
o Working out the arithmetic a little more carefully yields
  $P(\text{Corr}) \approx \dfrac{1086}{1393} \times \dfrac{639}{639 + 447} + \dfrac{307}{1393} \times \dfrac{222}{222 + 85} = \dfrac{639 + 222}{1393} = \dfrac{\text{Total ``True''}}{1393}$

We See That
o The best estimate for the Prob of a correct classification in a population like that at hand is Total "True" in the ROC Table, divided by the sample size of 1393.

The best estimate for the Prob of a correct classification
Here are the relevant rows of our ROC table:

Prob     1-Spec     Sens   Sens-(1-Spec)   True Pos   True Neg   False Pos   False Neg   Total True
0.5122   0.0147   0.0782          0.0634         24       1070          16         283         1094
0.5120   0.0147   0.0814          0.0667         25       1070          16         282         1095
0.5052   0.0147   0.0847          0.0700         26       1070          16         281         1096
0.5045   0.0157   0.0847          0.0690         26       1069          17         281         1095
0.5043   0.0166   0.0847          0.0681         26       1068          18         281         1094
0.5040   0.0166   0.0879          0.0714         27       1068          18         280         1095

Hence the above rule with p0 = .5052 yields an estimated probability of true classification of
$P(\text{Corr}) \approx 1096/1393 = .7868$, versus $P(\text{Corr}) \approx 861/1393 = .6181$ for the earlier rule in the Table with p0 = .2023.

Important Note
o The preceding choice of rule with p0 = .5052 has assumed that False Positives and False Negatives are equally serious.
o That's not true here.
o As an example, let's assume
o Diagnosing a non-CHD person as having CHD (a False Pos) causes one unit of pain and discomfort, and
o Diagnosing a CHD person as being non-CHD (a False Neg) causes 10 units of pain and discomfort.
o Then we want to minimize 1 × #FalsePos + 10 × #FalseNeg.
o This occurs at p0 = .1033, with Sens = .9642 & Spec = .2072.
o Then we have many FalsePos (= 861) but very few FalseNeg (= 11).

Example 2: March Madness
Every March the NCAA holds its annual Div. 1 College Basketball Tournaments for Women and for Men. Teams are seeded into four brackets of 16 in each tournament. In each bracket the teams are seeded from 1 (best) to 16 (worst). They play a single-elimination tournament. At the end, the winners of the four brackets hold a four-team single-elimination mini-tourney.
HOW good are the seedings? How often does the better seeded team win?
For the past few years it has been claimed that the women are more accurately seeded than the men. (If so, this may be because the ability differences among the women's teams are wider.) Is this the case?

Data
The tournament results for both men and women for 2003-2006 have been coded. (We haven't gotten around to entering 2007 yet.) Each row of data corresponds to one game. The columns are:
DIFF: The difference in seeding between the higher and lower seeded teams. These values are ≤ 0.
WIN: 1 if the Favorite team (higher seeded) wins and 0 if that team loses. (In the final two rounds it is possible to have DIFF = 0. If so, one team has arbitrarily been listed first and WIN = 1 if that team wins.)
ROUND: round of the tournament.
Year and M or W are self-explanatory.
PCTHIGH and PCTLOW are the
regular season winning proportions for the Higher and Lower seeded teams, respectively.

o We ran a logistic multiple regression with Y = WIN and
o Model Effects = DIFF, M or W, Year, ROUND.
o Year and ROUND are included primarily as control variables, but strong significance of either one could be of interpretive interest.
o DIFF is expected to be the dominant effect. Because we have coded DIFF ≤ 0, and because JMP estimates Prob(Y = 0), i.e., the probability of an upset, we expect the DIFF coefficient to be > 0.
o M or W is the effect of primary interest. If the coefficient for Men is significant & > 0, that will show that underdogs do better in the Men's tourney, even after taking account of the seeding DIFF and the Year and Round.

Analysis

Whole Model Test
Model         -LogLikelihood   ChiSquare   Prob>ChiSq
Difference            54.849      109.70       <.0001
Full                 238.033
Reduced              292.882

RSquare (U)     0.1873
Observations    504
For log odds of 0/1

Effect Likelihood Ratio Tests
Source    DF   LR ChiSquare   Prob>ChiSq
DIFF       1         71.839       <.0001
Year       4          3.398       0.4936
M or W     1          3.009       0.0828
ROUND      5         10.499       0.0623

These tables look much as expected, AND M or W is NOT STATISTICALLY SIGNIFICANT at α = 0.05.

Parameter Estimates
(Signs of the estimates other than DIFF and ROUND[3] could not be recovered from this copy; those two follow the text below.)
Term         Estimate   Std Error   ChiSquare   Prob>ChiSq
Intercept       0.709       0.255       7.756       0.0054
DIFF            0.295       0.040      53.777       <.0001
Year[03]        0.071       0.214       0.109       0.7410
Year[04]        0.058       0.212       0.076       0.7831
Year[05]        0.137       0.210       0.426       0.5139
Year[06]        0.517       0.334       2.403       0.1211
M or W[M]       0.225       0.130       2.980       0.0843
ROUND[1]        0.343       0.275       1.554       0.2125
ROUND[2]        0.052       0.286       0.033       0.8550
ROUND[3]       -0.840       0.329       6.506       0.0108
ROUND[4]        0.255       0.376       0.461       0.4973
ROUND[5]        0.125       0.478       0.069       0.7934

These also look much as expected. Especially, DIFF > 0. Overall, ROUND is not quite significant at 0.05, but the prominent feature is that ROUND[3] < 0. This suggests that perhaps favorites do their best in round 3.

Is the Seeding Committee Doing the Best it Can?
An interesting feature appears when Diff in PCT is included in the model. Here is the output:

Whole Model Test
Model         -LogLikelihood   DF   ChiSquare   Prob>ChiSq
Difference            58.658   12      117.32       <.0001
Full                 234.224
Reduced              292.882

RSquare (U)     0.2003
Observations    504

Effect Likelihood Ratio Tests
Source         Nparm   DF   LR ChiSquare   Prob>ChiSq
DIFF               1    1         52.872       <.0001
Year               4    4          3.073       0.5457
M or W             1    1          3.196       0.0738
Diff in PCT        1    1          7.618       0.0058
ROUND              5    5         11.013       0.0511

o Here Diff in PCT is also statistically significant, in addition to DIFF. It has a noticeable effect on RSquare (U), so it is also numerically relevant.
o The coefficient of Diff in PCT is < 0, as one would expect. (You can check this in the Parameter Estimates table.) This indicates that, even after controlling for seeding, teams with higher regular season winning percentages are less likely to be upset by lower seeded teams, and vice versa.
o This suggests that, if it wanted to do so, the seeding committee could produce more accurate seedings by paying more attention to the winning percentage of the teams.
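
The likelihood ratio arithmetic behind the two March Madness Whole Model Test tables can be checked by hand. The following is a minimal sketch in Python, not JMP's own procedure: the function and variable names are illustrative assumptions, the -LogLikelihood values are copied from the tables above, and the 1-df p-value uses the standard identity P(chi-square with 1 df > x) = erfc(sqrt(x/2)) rather than a chi-squared table.

```python
import math


def lr_chi_square(neg_loglik_reduced, neg_loglik_full):
    """Likelihood ratio chi-square: twice the drop in -LogLikelihood."""
    return 2.0 * (neg_loglik_reduced - neg_loglik_full)


def chi_square_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# -LogLikelihood values copied from the Whole Model Test tables in the notes.
NULL_MODEL = 292.882           # "Reduced" row (no effects in the model)
WITHOUT_DIFF_IN_PCT = 238.033  # "Full" row, model without Diff in PCT
WITH_DIFF_IN_PCT = 234.224     # "Full" row, model with Diff in PCT

# Whole-model chi-square and RSquare (U) for the model with Diff in PCT.
whole_model_chi2 = lr_chi_square(NULL_MODEL, WITH_DIFF_IN_PCT)
r_square_u = (NULL_MODEL - WITH_DIFF_IN_PCT) / NULL_MODEL

# Effect test for Diff in PCT: compare the two fitted models directly.
diff_in_pct_chi2 = lr_chi_square(WITHOUT_DIFF_IN_PCT, WITH_DIFF_IN_PCT)
diff_in_pct_p = chi_square_1df_pvalue(diff_in_pct_chi2)

print(f"Whole model ChiSquare: {whole_model_chi2:.2f}")
print(f"RSquare (U):           {r_square_u:.4f}")
print(f"Diff in PCT LR ChiSq:  {diff_in_pct_chi2:.3f}")
print(f"Diff in PCT p-value:   {diff_in_pct_p:.4f}")
```

The effect test for Diff in PCT is obtained by treating the model without Diff in PCT as the reduced model and the model with it as the full model, which is the same Full-Reduced comparison used for DBP in Example 1. The erfc shortcut only applies to effects with 1 DF; Year and ROUND would need a chi-squared table or a statistics library.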