# Regression Analysis and Modeling STAT 511

Penn State

This Study Guide belongs to STAT 511 at Pennsylvania State University taught by Staff in Fall.

Date Created: 11/01/15

Statistics 511 Study Guide 1, Fall 2001

This study guide is to help you determine if you need to review prerequisite material.

## Questions

Questions 1-11 refer to the data below.

The effect of apple mosaic on growth was measured on 2-year-old seedling unpruned trees propagated from 10 infected and 6 uninfected buds from the same mother tree. The data are the stem volumes in cubic centimeters. We will use only the data from the 6 healthy (uninfected) buds. It is already known that uninfected buds from an infected mother tree may differ significantly from buds from a disease-free mother tree.

Stem volume (cm³): 1384, 1324, 1325, 1065, 1870, 1652

1. What is the population of interest in this study? That is, what is the population about which the investigator wants to make inferences?
2. What is the sampling population? That is, what is the population from which the sample was taken?
3. What is the sample size?
4. Compute a 95% confidence interval for the population mean.
5. What assumptions have you made about the data in forming the confidence interval?
6. How could you check these assumptions?
7. Is the interval in 4 an interval for the mean of the population of interest or of the sampling population?
8. Compute an estimate of the population variance.
9. Compute an estimate of the variance of the sample mean.
10. From many previous studies it is known that the mean stem volume for buds on healthy trees (no infected buds) is 1442. Test whether the mean stem volume for healthy buds on this mother tree is 1442.
11. What assumptions about the data have you made in performing this test?
12. Draw a normal probability plot to match each of the histograms below.
13. Which of the histograms show skewness? Which of them show heavy tails?
14. Suppose you had a sample of size 25 and the histogram of the sample looked like one of the plots below. You hoped to use a t-test of the mean. For which of the plots below would the test be valid?

[Figure: six histograms, panels (a) through (f).]
## Solutions

1. The population of interest is uninfected buds from infected mother trees.

2. The sampling population is uninfected buds from this one infected mother tree.

3. The sample size is 6.

4. Inference about the mean is based on assumptions of normality and independence (see 5). Under these assumptions, $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ is distributed $N(\mu, \sigma^2/n)$ and $(\bar{Y} - \mu)/S_{\bar{Y}}$ is distributed $t_{n-1}$. Then confidence limits for $\mu$, the population mean, are given by

   $$\bar{Y} \pm t_{\alpha,\,n-1}\, S_{\bar{Y}}, \qquad \text{where } S_{\bar{Y}}^2 = \frac{s^2}{n} \text{ and } s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}.$$

   Here $\bar{Y} = 1436.6667$, $t_{\alpha,\,n-1} = t_{.05,\,5} = 2.571$ (the two-sided 5% critical value), $s = 282.920248$, and $S_{\bar{Y}} = s/\sqrt{n} = 282.920248/\sqrt{6} = 115.5017$. The confidence interval is

   $$1436.6667 \pm 115.501708 \times 2.571 = (1139.7,\ 1733.6).$$

5. Inference concerning the population mean $\mu$ is based on the assumptions:
   1. The sample is a random sample (independent and identically distributed).
   2. The observations have a normal distribution with unknown mean $\mu$ and variance $\sigma^2$.

6. The normality assumption can be checked with a histogram of the data. An NSCORE plot (normal probability plot) of the data or of the deviations from the mean could also be obtained, and is more accurate for small sample sizes. A "bell-shaped" histogram and a straight-line NSCORE plot suggest normality. Independence may be difficult to check.

7. The interval in 4 is for the sampling population mean.

8. The population variance $\sigma^2$ is estimated by $s^2$, the sample variance:

   $$s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1} = \frac{400219.3333}{5} = 80043.8667 \approx 80043.9.$$

9. The sample mean $\bar{Y}$ is distributed as a normal with mean $\mu$ and variance $\sigma^2/n$, i.e. $\bar{Y} \sim N(\mu, \sigma^2/n)$. An estimate of its variance is $s^2/n = 80043.9/6 \approx 13340.64$.

10. Test $H_0\colon \mu = \mu_0 = 1442$ vs. $H_A\colon \mu \neq \mu_0$. The test statistic is

    $$t = \frac{\bar{Y} - \mu_0}{S_{\bar{Y}}} \sim t_{n-1}.$$

    The $\alpha = .05$ decision rule is: reject $H_0$ if $|t| > t_{\alpha,\,n-1} = t_{.05,\,5} = 2.571$. Here

    $$|t| = \left|\frac{\bar{Y} - \mu_0}{S_{\bar{Y}}}\right| = \left|\frac{1436.6667 - 1442}{115.501708}\right| \approx 0.0462.$$

    Hence do not reject $H_0$. There is not enough evidence to conclude that the mean stem volume for healthy buds on this tree is not 1442. Alternatively, notice that 1442 is within the 95% confidence interval for $\mu$. Since any value in the $100(1-\alpha)\%$ confidence interval will not be rejected in a two-tailed test of size $\alpha$, it is clear that $H_0$ is not rejected.

11. The assumptions are exactly the same as for the confidence interval.

12. Normal probability plots check for normality by plotting the ordered observations on the Y axis against their expected values, if they were drawn from a normal population, on the X axis. Points drawn from a normal population will tend to fall on a straight line. Points drawn from non-normal distributions may vary systematically from a straight line.

    [Figure: normal score plots for panels (a) through (f), each plotting the observations (Obs) against Normal Scores.]

    - (a) Normal distribution: points tend to fall on a straight line.
    - (b) Heavy-tailed distribution: a surplus of extreme values causes curvature at either end of the normal score plot.
    - (c) Skewed-right distribution: a surplus of low values and extreme values in the upper tail combine to give curvature to the normal score plot. The extreme positive values give an upward curve, and the surplus of low values creates a bulge.
    - (d) Light-tailed distribution: a dearth (relative lack) of extreme values causes curvature at both ends of the normal score plot.
    - (e) Bimodal, skew left: think of a skew-left distribution, then add a bump caused by the second hump in the histogram. The plot is overall similar to that of a skew-left distribution.
    - (f) Slightly skew distribution with light tails: similar to the light-tailed distribution, but asymmetric. Slightly more points in the lower end produce the asymmetry.

13. Histogram (c) is skew. Histogram (e), which is bimodal, may also be considered to be skew. Histogram (b) has heavy tails.

14. t-tests are valid for normal data, hence the t-test is valid for histogram (a). t-tests are also valid for near-normal data with light tails. This is true because the relative lack of extreme values will tend to make $\bar{Y}$ closer to $\mu$, and thus the t-statistic would be smaller than if the data were normal. Hence the test is less likely to reject $H_0$ when it is true. Such a test is said to be conservative. Histograms (d) and (f) will produce conservative t-tests.
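
The arithmetic in Solutions 4 and 8 through 10 can be checked with a short script. The following is a minimal Python sketch, not part of the original guide. The helper name `summarize` and its dictionary keys are illustrative choices, and the critical value 2.571 is copied from the solutions rather than computed.

```python
import math

# Stem volumes (cm^3) of the 6 uninfected buds, copied from the study guide.
volumes = [1384, 1324, 1325, 1065, 1870, 1652]

# Two-sided 5% critical value of t on 5 degrees of freedom, copied from the
# solutions (hard-coded here, not computed).
T_CRIT = 2.571

# Hypothesized mean stem volume from Question 10.
MU_0 = 1442


def summarize(data, t_crit, mu_0):
    """Return the quantities used in Solutions 4 and 8-10 (illustrative helper)."""
    n = len(data)
    mean = sum(data) / n
    # Sample variance with divisor n - 1 (Solution 8).
    variance = sum((y - mean) ** 2 for y in data) / (n - 1)
    # Estimated variance and standard error of the sample mean (Solution 9).
    var_of_mean = variance / n
    std_err = math.sqrt(var_of_mean)
    # Confidence limits: mean +/- t_crit * std_err (Solution 4).
    half_width = t_crit * std_err
    # One-sample t statistic for H0: mu = mu_0 (Solution 10).
    t_stat = (mean - mu_0) / std_err
    return {
        "n": n,
        "mean": mean,
        "variance": variance,
        "var_of_mean": var_of_mean,
        "std_err": std_err,
        "ci": (mean - half_width, mean + half_width),
        "t_stat": t_stat,
        "reject_h0": abs(t_stat) > t_crit,
    }


result = summarize(volumes, T_CRIT, MU_0)
for name, value in result.items():
    print(f"{name}: {value}")
```

Because the critical value is passed in as an argument, the same helper can be reused for another sample size or confidence level by supplying the appropriate table value.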


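
Solutions 6 and 12 describe plotting the ordered observations against normal scores. The guide does not say how the scores are computed, so the sketch below assumes Blom's approximation, $\Phi^{-1}\big((i - 0.375)/(n + 0.25)\big)$, for the expected value of the $i$-th smallest of $n$ standard normal draws. The function name `normal_scores` is hypothetical.

```python
from statistics import NormalDist

# Stem volumes (cm^3) of the 6 uninfected buds, copied from the study guide.
volumes = [1384, 1324, 1325, 1065, 1870, 1652]


def normal_scores(n):
    """Approximate expected standard normal order statistics.

    Uses Blom's formula, which is an assumption of this sketch; the
    study guide does not specify a formula.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


ordered = sorted(volumes)
scores = normal_scores(len(ordered))

# Each row is one point of the plot: normal score (X axis), observation (Y axis).
for score, obs in zip(scores, ordered):
    print(f"{score:8.4f}  {obs}")
```

Plotting the second column against the first gives the normal score plot. Points lying close to a straight line are consistent with normality, though with only 6 observations the plot is a rough check at best.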