# Class Note for EMSE 171 with Professor Dorp at GW (3)

These 27 pages of class notes were uploaded on Saturday, February 7, 2015. They belong to a course at George Washington University taught in the fall.

Date Created: 02/07/15

EMSE 171/271 DATA ANALYSIS For Engineers and Scientists

Session 4: Goodness of Fit, Credibility Intervals, Two Sample Hypothesis Testing, Joint Normal Distribution

THE GEORGE WASHINGTON UNIVERSITY, WASHINGTON DC

Lecture Notes by J. René van Dorp (www.seas.gwu.edu/~dorpjr), Department of Engineering Management and Systems Engineering, School of Engineering and Applied Science, The George Washington University, 1776 G Street NW, Suite 110, Washington DC 20052. Email: dorpjr@gwu.edu

EMSE 171/271, Fall 2005, J.R. van Dorp, 9/27/06

## Statistical Review: Chi-Square Goodness of Fit

The chi-square test compares the empirical histogram density constructed from sample data to a candidate theoretical density. Assume that the empirical sample $(x_1, \ldots, x_n)$ is a set of $n$ realizations from an underlying unknown random variable $X$. This sample is then used to construct an empirical histogram with $m$ bins, where Bin $j$ corresponds to the interval $(LB_j, UB_j]$. The chi-square test allows some flexibility in the choice of bin boundaries. The estimator of the probability $p_j = \Pr(X \in (LB_j, UB_j])$ of cell $j$ is

$$\hat{p}_j = \frac{O_j}{n}, \quad j = 1, \ldots, m,$$

where $O_j$ is the number of observations in Bin $j$. These counts can be determined using the FREQUENCY array function in Microsoft Excel.

Let $F$ be some theoretical candidate distribution, with parameter vector $\Theta$, of the random variable $X$ whose goodness of fit is to be assessed. Then

$$p_j = \Pr(X \in (LB_j, UB_j]) = F_X(UB_j \mid \Theta) - F_X(LB_j \mid \Theta), \quad j = 1, \ldots, m.$$

Define next, with $N = n$ the sample size,

- $O_j$ = number of observations in Bin $j$ = $\hat{p}_j \times N$,
- $E_j$ = expected number of observations in Bin $j$ = $p_j \times N$,

and

$$S^2 = \sum_{j=1}^{m} \frac{(O_j - E_j)^2}{E_j}.$$

Intuition: if $F_X$ is a good fit, then the theoretical value of $p_j$ should be close to the estimated value $\hat{p}_j$, and thus $O_j$ should be close to $E_j$. Hence a good fit would have a small $S^2$ value.

It can be shown that $S^2$ is (approximately, for large samples) a realization of a $\chi^2_k$ variable, i.e. a chi-squared random variable with $k$ degrees of freedom, where

$$k = m - |\Theta| - 1.$$

Here $|\Theta|$ is the number of parameters in the vector $\Theta$. Note that $\chi^2_k$ is a random variable with support $[0, \infty)$, i.e. it only takes on non-negative values. Using the CHIDIST function in Microsoft Excel we can calculate the probability that $\chi^2_k$ is greater than the observed value $S^2$. If this probability is small (large), then the observed value $S^2$ may be considered "big" ("small").

Define the p-value of the chi-squared goodness of fit test as

$$\Pr(\chi^2_k > S^2) \equiv p\text{-value}.$$

It is common to reject the candidate theoretical distribution when the p-value is smaller than 0.01, 0.05 or even 0.10.

Rule of thumb for the number of bins:

| Sample size $N$ | Number of bins |
| --- | --- |
| < 20 | Do not use the $\chi^2$ test |
| 50 | 5 to 10 |
| 100 | 10 to 20 |
| > 100 | $\sqrt{N}$ to $N/5$ |

Rule of thumb for the size of $E_j$ that allows the chi-squared distribution assumption: there is no real agreement on this issue. It has been suggested that $E_j > 3$, 4 or 5.

Because the test allows some flexibility in the choice of bin boundaries, some have suggested that boundaries should be selected such that the expected number of observations is the same in each bin. This would weigh each part of the theoretical distribution equally in the chi-squared fit.

### Example 15 (continued)

Dielectric breakdown voltage data:

24.46 25.61 26.25 26.42 26.66 27.15 27.31 27.54 27.74 27.94 27.98 28.04 28.28 28.49 28.50 28.87 29.11 29.13 29.50 30.88

Equal bin width method, parameters fitted by the method of moments (MOM):

| Bin | LB | UB | $O_i$ | $p_i$ | $E_i$ | $(O_i - E_i)^2 / E_i$ |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | < 24.46 | 26.07 | 2 | 11.86% | 2.37 | 0.06 |
| 2 | 26.07 | 27.67 | 6 | 34.79% | 6.96 | 0.13 |
| 3 | 27.67 | 29.28 | 10 | 37.82% | 7.56 | 0.78 |
| 4 | 29.28 | > 30.88 | 2 | 15.53% | 3.11 | 0.39 |
| Total | | | 20 | 100.00% | | 1.37 |

Bins: 4. Parameters: 2. Degrees of freedom: 1. p-value: 24.19%.

Equal bin width method, parameters fitted by maximum likelihood (MLE):

| Bin | LB | UB | $O_i$ | $p_i$ | $E_i$ | $(O_i - E_i)^2 / E_i$ |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | < 24.46 | 26.07 | 2 | 11.26% | 2.25 | 0.03 |
| 2 | 26.07 | 27.67 | 6 | 35.30% | 7.06 | 0.16 |
| 3 | 27.67 | 29.28 | 10 | 38.53% | 7.71 | 0.68 |
| 4 | 29.28 | > 30.88 | 2 | 14.91% | 2.98 | 0.32 |
| Total | | | 20 | 100.00% | | 1.19 |

Bins: 4. Parameters: 2. Degrees of freedom: 1. p-value: 27.44%.

Equal average number of observations per bin method (MLE and MOM):

| Bin | LB | UB | $O_i$ | $\sum p_i$ | $p_i$ | $E_i$ | $(O_i - E_i)^2 / E_i$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | < 24.46 | 26.81 | 5 | 25.00% | 25.00% | 5.00 | 0.00 |
| 2 | 26.81 | 27.79 | 4 | 50.00% | 25.00% | 5.00 | 0.20 |
| 3 | 27.79 | 28.78 | 6 | 75.00% | 25.00% | 5.00 | 0.20 |
| 4 | 28.78 | > 28.78 | 5 | 100.00% | 25.00% | 5.00 | 0.00 |
| Total | | | 20 | | | | 0.40 |

Bins: 4. Parameters: 2. Degrees of freedom: 1. p-value: 52.71%.

## Statistical Review: Credibility Intervals

Given an "adequately" fitted theoretical distribution $F_X(\cdot \mid \hat{\Theta})$ to a sample $(x_1, \ldots, x_n)$, we can establish a $p \times 100\%$ credibility interval $[L, U]$ such that $\Pr(X \in [L, U]) \approx p$ by setting

$$L = x_{(1-p)/2} = F_X^{-1}\!\left(\tfrac{1-p}{2} \,\Big|\, \hat{\Theta}\right), \qquad U = x_{(1+p)/2} = F_X^{-1}\!\left(\tfrac{1+p}{2} \,\Big|\, \hat{\Theta}\right),$$

where $x_{(1-p)/2}$ and $x_{(1+p)/2}$ are quantiles of the distribution function $F_X(\cdot \mid \hat{\Theta})$. For example, if we set $p = 0.90$ we have

- $L = x_{0.05} = F^{-1}(0.05)$, the 5th percentile,
- $U = x_{0.95} = F^{-1}(0.95)$, the 95th percentile.

### Example 15 (continued)

For the same dielectric breakdown voltage data, $\bar{x} \approx 27.793$ and $s^2 \approx 2.137$, or $s \approx 1.462$. For a 90% credibility interval under the fitted normal distribution we have

$$L = x_{0.05} \approx 25.39, \qquad U = x_{0.95} \approx 30.20.$$

Compare this with the earlier established 90% confidence interval:

$$\left(27.793 - 1.73 \times \sqrt{2.14/20},\; 27.793 + 1.73 \times \sqrt{2.14/20}\right) \approx (27.23,\, 28.36).$$

What is the difference?

The 90% confidence interval above was calculated for $E[X]$, not for $X$. This is true in general: a confidence interval is calculated for a characteristic of $X$, such as $E[X]$, $Var(X)$, etc.

- For a random sample $(X_1, \ldots, X_n)$, the probability that a $(1-\alpha) \cdot 100\%$ confidence interval for $E[X]$, which is a random interval, captures $E[X]$ equals $1 - \alpha$.
- For a fixed sample $(x_1, \ldots, x_n)$, no probability interpretation can be assigned to a realization of a $(1-\alpha) \cdot 100\%$ confidence interval.
- When the sample size $n$ increases, the width of confidence intervals decreases. They converge to the true value, a single point.
- For a random sample $(X_1, \ldots, X_n)$, the probability that $X$ falls within a $(1-\alpha) \cdot 100\%$ credibility interval for $X$, which is also a random interval, equals $1 - \alpha$.
- When the sample size $n$ increases, the width of credibility intervals does not decrease. Credibility intervals converge to the true probability interval.
- For a fixed sample $(x_1, \ldots, x_n)$, the probability that $X$ falls within the realized $(1-\alpha) \cdot 100\%$ credibility interval for $X$, which is then a fixed interval, equals approximately $1 - \alpha$.

[Figure: PDF with the true average, the 2.5% and 97.5% quantiles, the confidence interval for the mean of X and the credibility interval for X. Caption: Confidence interval and credibility interval using 100 samples.]

[Figure: the same plot. Caption: Confidence interval and credibility interval using 500 samples.]

- Note that the width of the confidence interval has decreased, whereas the width of the credibility interval has not.
- The accompanying Excel spreadsheet shows the random behavior of both.

Probability plots in MINITAB can be used to calculate credibility intervals.

[Figure: MINITAB probability plot of Voltage, Normal, 95% CI. Legend: Mean 27.79, StDev 1.462, N 20, AD 0.189. Horizontal axis: Voltage. Vertical axis: Percent.]

## Statistical Review: Probability Plots

Probability plots in MINITAB are a powerful visual tool for testing goodness of fit.

- The AD value is the value of the Anderson-Darling goodness of fit test statistic, similar in spirit to the $\chi^2$ test. Large values of the AD statistic indicate a larger deviation from the fitted theoretical distribution.
- The larger the p-value, the larger the support for the theoretical distribution.
- If the theoretical distribution is a perfect fit of the data, all data points should form a straight line. Deviations from the straight line show deviations from the fitted theoretical distribution.
- When can a data point be considered an outlier? Answer: when a data point is outside the boundaries that are drawn. The boundaries in the above figure are 95% confidence intervals for the cumulative distribution function $F(\cdot)$.

## Statistical Review: Two Sample Mean Confidence Interval

Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ be random samples from normal distributions with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, with the $Y_j$'s independent of the $X_i$'s. Then we can construct the following $T$ estimator:

$$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n} + \dfrac{S_2^2}{m}}} \sim t_\nu,$$

which has approximately a $t$ distribution with $\nu$ degrees of freedom,

$$\nu = \frac{\left(\dfrac{s_1^2}{n} + \dfrac{s_2^2}{m}\right)^2}{\dfrac{(s_1^2/n)^2}{n-1} + \dfrac{(s_2^2/m)^2}{m-1}}$$

(round $\nu$ down to the nearest integer).

## Statistical Review: Two Sample Mean Hypothesis Testing

A $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is

$$\bar{x} - \bar{y} \pm t_{\nu,\,1-\alpha/2} \sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}},$$

where $t_{\nu,\,q}$ denotes the $q$-quantile of the $t$ distribution with $\nu$ degrees of freedom.

The two-sample $t$ test for testing $H_0: \mu_1 - \mu_2 = \Delta_0$ is as follows. Test statistic value:

$$t_0 = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{n} + \dfrac{s_2^2}{m}}}.$$

| Alternative hypothesis | Rejection region for significance level $\alpha$ |
| --- | --- |
| $H_1: \mu_1 - \mu_2 > \Delta_0$ | $t_0 > t_{\nu,\,1-\alpha}$ (upper tailed) |
| $H_1: \mu_1 - \mu_2 < \Delta_0$ | $t_0 < t_{\nu,\,\alpha}$ (lower tailed) |
| $H_1: \mu_1 - \mu_2 \neq \Delta_0$ | $t_0 > t_{\nu,\,1-\alpha/2}$ or $t_0 < t_{\nu,\,\alpha/2}$ (two tailed) |

p-values can be constructed in a similar fashion as before.

### Example 16

As the population ages, there is increasing concern about accident-related injuries to the elderly. The article "Age and Gender Differences in Single-Step Recovery from a Forward Fall" (J. of Gerontology, 1999: M44-M50) reported on an experiment in which the maximum lean angle, the furthest a person is able to lean and recover in one step, was determined for both a sample of younger females (21-29 years) and a sample of older females (67-81 years). The following observations are consistent with summary data given in the article:

- YF: 29 34 33 27 28 32 31 34 32 27 (sample size $n = 10$)
- OF: 18 15 23 13 12 (sample size $m = 5$)

Does the data suggest that the true average maximum lean angle for older females is more than 10 degrees smaller than it is for younger females? State and test the relevant hypothesis at significance level 10% by obtaining a p-value.

Assumption: let $X_1, \ldots, X_n$ be the YF and $Y_1, \ldots, Y_m$ be the OF random sample, from normal distributions with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively.

- Null hypothesis: $H_0: \mu_1 - \mu_2 = 10$
- Alternative hypothesis: $H_1: \mu_1 - \mu_2 > 10$
- Sample data: $\bar{x} = 30.7$, $n = 10$, $\bar{y} = 16.2$, $m = 5$, $s_1^2 \approx 7.6$, $s_2^2 \approx 19.7$
- Statistic: $t_0 = \dfrac{30.7 - 16.2 - 10}{\sqrt{s_1^2/10 + s_2^2/5}} \approx 2.08$
- Degrees of freedom: $\nu = \dfrac{(s_1^2/10 + s_2^2/5)^2}{(s_1^2/10)^2/9 + (s_2^2/5)^2/4} \approx 5.59$, so use $\nu = 5$
- p-value: $\Pr(T_5 > t_0 \mid H_0) \approx 4.62\% < \alpha = 10\%$
- Conclusion: reject $H_0$. You would accept $H_0$ only at significance levels below 4.62%.

## Statistical Review: Two Sample Variance Hypothesis Testing

Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ be random samples from normal distributions with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, with the $Y_j$'s independent of the $X_i$'s. Recall that the following $F$ estimator has an $F$ distribution with $n-1$ and $m-1$ degrees of freedom:

$$F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} \sim F_{n-1,\,m-1}.$$

[Figure: PDF of the F distribution with $n - 1 = 9$ and $m - 1 = 4$ degrees of freedom.]

The two-sample $F$ test for testing $H_0: \sigma_1^2 = \sigma_2^2$ is as follows. Test statistic value:

$$f_0 = \frac{s_1^2}{s_2^2} \sim F_{n-1,\,m-1} \quad \text{assuming } H_0.$$

[Figure: approximate F PDF with the 0.05 left threshold and the 0.05 right threshold marked.]

| Alternative hypothesis | Rejection region for significance level $\alpha$ |
| --- | --- |
| $H_1: \sigma_1^2 > \sigma_2^2$ | $f_0 > F_{n-1,\,m-1,\,1-\alpha}$ (upper tailed) |
| $H_1: \sigma_1^2 < \sigma_2^2$ | $f_0 < F_{n-1,\,m-1,\,\alpha}$ (lower tailed) |
| $H_1: \sigma_1^2 \neq \sigma_2^2$ | $f_0 > F_{n-1,\,m-1,\,1-\alpha/2}$ or $f_0 < F_{n-1,\,m-1,\,\alpha/2}$ (two tailed) |

The upper tailed test is frequently used in Analysis of Variance (ANOVA). The p-value of this test equals $\Pr(F > f_0 \mid H_0)$.

### Example 16 (continued)

Using the same YF ($n = 10$) and OF ($m = 5$) maximum lean angle observations, carry out a test at a significance level of $\alpha = 0.10$ of whether the standard deviations for the two age groups are different.

- Null hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$
- Alternative hypothesis: $H_1: \sigma_1^2 \neq \sigma_2^2$
- Sample data: $\bar{x} = 30.7$, $n = 10$, $\bar{y} = 16.2$, $m = 5$, $s_1^2 \approx 7.6$, $s_2^2 \approx 19.7$
- Statistic: $f_0 = s_1^2 / s_2^2 \approx 0.384$
- Acceptance region (complement of the rejection region): $(F_{9,4,0.05},\, F_{9,4,0.95}) = (0.27,\, 6.00)$
- p-value: cannot be calculated in this case
- Conclusion: fail to reject $H_0$, since $0.38 \in (0.27,\, 6.00)$

Carry out a test at a significance level of $\alpha = 0.10$ of whether the standard deviation for the YF age group is less than that of the OF age group.

- Null hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$
- Alternative hypothesis: $H_1: \sigma_1^2 < \sigma_2^2$
- Sample data: $\bar{x} = 30.7$, $n = 10$, $\bar{y} = 16.2$, $m = 5$, $s_1^2 \approx 7.6$, $s_2^2 \approx 19.7$
- Statistic: $f_0 = s_1^2 / s_2^2 \approx 0.384$
- Acceptance region: $(F_{9,4,0.10},\, \infty) \approx (0.371,\, \infty)$
- p-value: $\Pr(F_{n-1,\,m-1} < f_0 \mid H_0) \approx 10.74\%$
- Conclusion: fail to reject $H_0$, since $0.384 \in (0.371,\, \infty)$ and $10.74\% > 10\%$

## Statistical Review: Joint Normal Distribution

The assumption of the two-sample hypothesis tests above is that $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ are random samples from normal distributions with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, with the $Y_j$'s independent of the $X_i$'s. The distribution of $(X, Y)$ is then a joint normal distribution.

Probability density function of a bivariate normal distribution:

$$\underline{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim MVN(\underline{\mu}, \Sigma), \qquad \text{mean vector } \underline{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},$$

$$\text{covariance matrix } \Sigma = \begin{pmatrix} \sigma_1^2 & Cov(X_1, X_2) \\ Cov(X_1, X_2) & \sigma_2^2 \end{pmatrix},$$

$$f(\underline{x}) = \frac{1}{2\pi\,|\Sigma|^{1/2}} \exp\!\left\{ -\tfrac{1}{2} (\underline{x} - \underline{\mu})^T \Sigma^{-1} (\underline{x} - \underline{\mu}) \right\}.$$

Independence in the case of the bivariate normal distribution implies

$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}, \qquad \Sigma^{-1} = \begin{pmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{pmatrix}.$$

What is the shape of the pdf in the case of dependence between the two normal marginals?

[Figure: surface plot of a bivariate normal pdf.]

Performing statistical inference in the case that $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ are samples from a multivariate normal distribution with $\Sigma$ non-diagonal requires knowledge of multivariate analysis techniques.

## Statistical Review: What's Next

- A review of matrix algebra.
- Hotelling's $T^2$ test: a multivariate hypothesis test for the mean values of a vector where the variance-covariance matrix is not unit diagonal.
- Regression analysis: a methodology to investigate the relationship between a single variable and a number of explanatory variables. The explanatory variables are not necessarily independent.
- Principal component analysis: a technique to reduce the dimensionality of a multivariate data set and identify the main "components". These main components are orthogonal (independent) from one another.
- Analysis of variance (ANOVA): an analysis that seeks to explain the variance in a single variable using a number of explanatory variables.
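
The goodness-of-fit and interval calculations for Example 15 can be checked with a few lines of code. The following is a minimal Python sketch, not the Excel or MINITAB workflow the notes use. It assumes a normal distribution fitted by the method of moments and four equal-probability bins. The helper names (`fit_normal`, `chi_square_statistic`, `chi2_upper_tail_1df`, `credibility_interval`) are made up for illustration, and the chi-square tail formula used here is only valid for one degree of freedom.

```python
import math
from statistics import NormalDist, mean, stdev

# Dielectric breakdown voltage data from Example 15.
DATA = [24.46, 25.61, 26.25, 26.42, 26.66, 27.15, 27.31, 27.54, 27.74, 27.94,
        27.98, 28.04, 28.28, 28.49, 28.50, 28.87, 29.11, 29.13, 29.50, 30.88]


def fit_normal(sample):
    """Method-of-moments normal fit: sample mean and sample standard deviation."""
    return NormalDist(mean(sample), stdev(sample))


def chi_square_statistic(sample, fitted, m):
    """Observed counts and S^2 for m equal-probability bins (LB_j, UB_j]."""
    n = len(sample)
    cuts = [fitted.inv_cdf(j / m) for j in range(1, m)]  # interior bin boundaries
    observed = [0] * m
    for x in sample:
        observed[sum(x > c for c in cuts)] += 1  # index = number of cuts below x
    expected = n / m  # E_j = p_j * N with p_j = 1/m for every bin
    s2 = sum((o - expected) ** 2 / expected for o in observed)
    return observed, s2


def chi2_upper_tail_1df(s2):
    """Pr(chi-square with 1 degree of freedom > s2); valid for k = 1 only."""
    return math.erfc(math.sqrt(s2 / 2.0))


def credibility_interval(fitted, p):
    """Quantiles (1-p)/2 and (1+p)/2 of the fitted distribution."""
    return fitted.inv_cdf((1 - p) / 2), fitted.inv_cdf((1 + p) / 2)


fitted = fit_normal(DATA)
observed, s2 = chi_square_statistic(DATA, fitted, m=4)
k = 4 - 2 - 1  # bins - fitted parameters - 1
p_value = chi2_upper_tail_1df(s2)
lower, upper = credibility_interval(fitted, 0.90)

# 90% confidence interval for E[X], using the t quantile 1.73 quoted in the notes.
half_width = 1.73 * fitted.stdev / math.sqrt(len(DATA))
ci = (fitted.mean - half_width, fitted.mean + half_width)

print("observed counts:", observed)
print("S^2 = %.2f, k = %d, p-value = %.4f" % (s2, k, p_value))
print("90%% credibility interval for X: (%.2f, %.2f)" % (lower, upper))
print("90%% confidence interval for E[X]: (%.2f, %.2f)" % ci)
```

The credibility interval comes from quantiles of the fitted distribution of X, while the confidence interval shrinks with the square root of the sample size because it targets E[X]. For a different number of bins or fitted parameters the degrees of freedom change, and a general chi-square tail function would be needed in place of the one-degree-of-freedom shortcut.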

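The two-sample tests of Example 16 can be reproduced in the same spirit. The sketch below is an illustration under stated assumptions rather than the notes' own method: it uses only the Python standard library, and since that library has no t or F distribution functions, the tail probabilities are obtained by integrating the textbook density formulas with Simpson's rule. The helpers `simpson`, `t_pdf` and `f_pdf` are hypothetical names introduced here.

```python
import math
from statistics import mean, variance

# Maximum lean angle observations from Example 16.
YF = [29, 34, 33, 27, 28, 32, 31, 34, 32, 27]  # younger females, n = 10
OF = [18, 15, 23, 13, 12]                      # older females, m = 5


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def t_pdf(t, nu):
    """Density of the t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log(1 + t * t / nu))


def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0  # fine for d1 > 2, where the density vanishes at 0
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
             + d1 / 2 * math.log(d1 / d2))
    return math.exp(log_c + (d1 / 2 - 1) * math.log(x)
                    - (d1 + d2) / 2 * math.log(1 + d1 * x / d2))


n, m = len(YF), len(OF)
s1_sq, s2_sq = variance(YF), variance(OF)

# Two-sample t test of H0: mu1 - mu2 = 10 against H1: mu1 - mu2 > 10.
se_sq = s1_sq / n + s2_sq / m
t0 = (mean(YF) - mean(OF) - 10) / math.sqrt(se_sq)
nu = math.floor(se_sq ** 2 / ((s1_sq / n) ** 2 / (n - 1) + (s2_sq / m) ** 2 / (m - 1)))
p_t = 0.5 - simpson(lambda t: t_pdf(t, nu), 0.0, t0)  # Pr(T_nu > t0) for t0 >= 0

# Lower-tailed F test of H0: sigma1^2 = sigma2^2 against H1: sigma1^2 < sigma2^2.
f0 = s1_sq / s2_sq
p_f = simpson(lambda x: f_pdf(x, n - 1, m - 1), 0.0, f0)  # Pr(F_{n-1,m-1} < f0)

print("t0 = %.2f, nu = %d, upper-tail p-value = %.4f" % (t0, nu, p_t))
print("f0 = %.3f, lower-tail p-value = %.4f" % (f0, p_f))
```

Numerical integration keeps the example dependency-free. With a statistics package available, its t and F distribution functions would be the more natural choice.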