This 9 page Class Notes was uploaded by Orval Funk on Monday September 28, 2015. The Class Notes belongs to STAT111 at University of Pennsylvania taught by Staff in Fall. Since its upload, it has received 26 views. For similar materials see /class/215436/stat111-university-of-pennsylvania in Statistics at University of Pennsylvania.
Date Created: 09/28/15
Statistics 111 - Lecture 11
Introduction to Inference: Sampling Distributions
Oct 15, 2009 - Stat 111 - Lecture 11 - Sampling Distributions

Administrative Notes
- Recitation is cancelled this Friday, Oct 16th
- Homework 3 must be submitted to your TA's mailbox (Huntsman Hall, 4th floor) by noon on Friday, Oct 16th

Inference with a Single Observation
[Diagram: Population (parameter $\mu$) -> sampling -> observation $X_i$ -> inference back to the population]
- Each observation $X_i$ in a random sample is a representative of unobserved variables in the population
- How different would this observation be if we took a different random sample?

Normal Distribution
- Model for our overall population
- Can calculate the probability of getting an observation greater than or less than any value
[Figure: Normal curve over values of X]
- Usually we don't have a single observation, but instead the mean of a set of observations

Inference with Sample Mean
[Diagram: Population (parameter $\mu$) -> sampling -> sample (statistic $\bar{X}$) -> estimation and inference back to the population]
- Sample mean is our estimate of the population mean
- How much would the sample mean change if we took a different sample?
- Key to this question: the sampling distribution of $\bar{X}$

Sampling Distribution
- Focused on models for continuous data: using the sample mean as our estimate of the population mean
- Sampling distribution of the sample mean: how does the sample mean change over different samples?
[Diagram: Population (parameter $\mu$) -> Samples 1 through 6, each of size n -> one sample mean $\bar{x}$ per sample]

Mean of Sample Mean
- First, we examine the center of the sampling distribution of the sample mean
- The center of the sampling distribution of the sample mean is the unknown population mean: $\mathrm{mean}(\bar{X}) = \mu$
- Over repeated samples, the sample mean will, on average, be equal to the population mean
  - No guarantees for any one sample!

Variance of Sample Mean
- Next, we examine the spread of the sampling distribution of the sample mean
- The spread of the sampling distribution of the sample mean is: $\mathrm{VAR}(\bar{X}) = \sigma^2 / n$ and $\mathrm{SD}(\bar{X}) = \sigma / \sqrt{n}$
- As sample size increases, the spread of the sample mean decreases
  - Averaging over many observations is more accurate than just looking at one or two observations

Comparing the sampling distribution of the sample mean for different n
[Figure: two sampling distributions, one labeled "10 observations" and one labeled "1 observation"]

Law of Large Numbers
- If one draws independent samples from a population with mean $\mu$, then as the number of observations increases, the sample mean $\bar{X}$ gets closer to the population mean $\mu$
- This is easy to see since we know that $\mathrm{mean}(\bar{X}) = \mu$ and $\mathrm{variance}(\bar{X}) = \sigma^2 / n \to 0$ as n gets large

Example
- Population: seasonal home-run totals for 7032 baseball players from 1901 to 1996
- Take different samples from this population and compare the sample mean we get each time
- In real life, we can't do this because we don't usually have the entire population!

  Sample Size                     mean(X-bar)   SD(X-bar)
  100 samples of size n = 1       3.69          6.84
  100 samples of size n = 10      4.43          2.40
  100 samples of size n = 100     4.42          0.66
  100 samples of size n = 1000    4.42          0.24
  Population parameter: $\mu = 4.42$

Distribution of Sample Mean
- We now know the center and spread of the sampling distribution for the sample mean
- What about the shape of the distribution?
- If our data $X_1, X_2, \ldots, X_n$ follow a Normal distribution, then the sample mean $\bar{X}$ will also follow a Normal distribution

Example
- Mortality in US cities (deaths per 100,000 people)
[Figure: histogram of city mortality, horizontal axis from 750 to 1150]
- This variable seems to approximately follow a Normal distribution, so the sample mean will also approximately follow a Normal distribution

Central Limit Theorem
- What if the original data doesn't follow a Normal distribution?
- HR/Season for a sample of 100 baseball players
[Figure: histogram of home runs per season for the sample]
- If the sample is large enough, it doesn't matter!

Central Limit Theorem
- If the sample size is large enough, then the sample mean $\bar{X}$ has an approximately Normal distribution
[Figure: distribution of $\bar{X}$, centered at mean $\mu$]
- This is true no matter what the shape of the distribution of the original data!

Example: Home Runs per Season
- Take many different samples from the seasonal HR totals for a population of 7032 players
- Calculate the sample mean for each sample
- Distribution of sample means from different samples
[Figure: histograms of sample means, one panel per sample size, starting with n = 1]

Example: SAT test scores
- From 2007 data for the entire population, SAT scores have a population mean of $\mu = 1150$ and a population SD of $\sigma = 200$
- What is the probability of a single student scoring over 1250 on the SAT?
- Simple standardization: $Z = \frac{X - \mu}{\sigma} = \frac{1250 - 1150}{200} = 0.5$
- From the Normal table, $P(Z > 0.50) = 0.3085$
- So a single student has a 31% chance of scoring over 1250

Example: SAT test scores
- What if, instead of a single student, we have a random sample of 25 students?
- What is the probability that the sample mean $\bar{x}$ of our 25 students is over 1250 on the SAT?
- Earlier this class: the sample mean $\bar{X}$ follows a Normal distribution centered at $\mu$ with standard deviation $\sigma / \sqrt{n}$

Example: SAT test scores
- So when we do standardization for the sample mean, we have to pay attention to the standard deviation: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{1250 - 1150}{200 / \sqrt{25}} = 2.5$
- From the Normal table, $P(Z > 2.5) = 0.0062$
- So there is only a 0.62% chance of the sample mean of 25 students being higher than 1250
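The two SAT standardizations can be checked numerically. The following is a minimal Python sketch, assuming the Normal model from the slides ($\mu = 1150$, $\sigma = 200$); the helper names `normal_upper_tail` and `prob_mean_exceeds` are illustrative and are not part of the course materials. It replaces the Normal table lookup with the complementary error function from the standard library.

```python
import math


def normal_upper_tail(z):
    """Return P(Z > z) for a standard Normal Z, using the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def prob_mean_exceeds(threshold, mu, sigma, n=1):
    """Return (z, P(sample mean of n observations > threshold)).

    Assumes the sample mean is Normal with mean mu and
    standard deviation sigma / sqrt(n), as stated in the slides.
    """
    standard_error = sigma / math.sqrt(n)
    z = (threshold - mu) / standard_error
    return z, normal_upper_tail(z)


# Single student versus the mean of 25 students (values from the SAT example).
z_single, p_single = prob_mean_exceeds(1250, mu=1150, sigma=200, n=1)
z_mean25, p_mean25 = prob_mean_exceeds(1250, mu=1150, sigma=200, n=25)

print(f"n = 1:  z = {z_single:.2f}, P = {p_single:.4f}")
print(f"n = 25: z = {z_mean25:.2f}, P = {p_mean25:.4f}")
```

The only difference between the two calls is `n`: dividing $\sigma$ by $\sqrt{n}$ shrinks the standard deviation from 200 to 40, which is what moves the z-score from 0.5 to 2.5.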
Example: SAT test scores
- Probability of a single student having an SAT score greater than 1250 was 31%
- Probability of the sample mean of 25 SAT scores being greater than 1250 was 0.62%
- Why is the probability for the sample mean so much lower?
- Well, remember that 1250 is substantially higher than the population mean of 1150
- Think about the Law of Large Numbers: as the sample size grows, the sample mean becomes closer to the population mean
  - Less chance of getting a substantially higher SAT score of 1250 with a larger sample!

Next Class - Lecture 12
- Discrete data: sampling distribution for sample proportions
- Moore and McCabe: Section 5.1

Statistics 111 - Lecture 24
Inference for relationships between variables
Dec 3, 2009 - Stat 111 - Lecture 24 - Regression

Administrative Notes
- Homework 6 due in recitation this Friday, Dec 4th
- Homework 7 due in TA's mailbox by 5pm on Thursday, Dec 10th
- TAs will hold regular office hours up until the exam
- My office hours are cancelled for Dec 8th, but I will be holding office hours on Dec 15th

Final Exam
- Final Exam is Mon, Dec 21st, 3-5pm
- Covers Chapters 1-8, 10 in textbook
- Bring ID cards to final
- Calculators and an 8.5 x 11 note sheet allowed
- Will post additional suggested textbook questions

  Stat 111 Lecture    Final Exam Room
  1:30-2:30pm         JMHH G06
  2:30-3:30pm         JMHH F95

Inference Thus Far
- Tests and intervals for a single variable
- Tests and intervals to compare a single variable between two samples
- For the last couple of classes, we have looked at count data and inference for population proportions
- Before that, we looked at continuous data and inference for population means
- Next couple of classes: inference for a relationship between two continuous variables

Two Continuous Variables
- Remember linear relationships between two continuous variables?
  - Scatterplots
  - Correlation
  - Best Fit Lines

Scatterplots and Correlation
- Visually summarize the relationship between two continuous variables with a scatterplot
[Figure: two scatterplots. Education and Mortality, r = -0.51; Draft Order and Birthday, r = -0.22]
- If our X and Y variables show a linear relationship, we can calculate a best fit line between Y and X

Linear Regression
- Best fit line is called the Simple Linear Regression Model: $Y_i = \alpha + \beta \cdot X_i + e_i$
- Coefficients: $\alpha$ is the intercept and $\beta$ is the slope
- Other common notation: $\beta_0$ for intercept, $\beta_1$ for slope
- Our Y variable is a linear function of the X variable, but we allow for error $e_i$ in each prediction
- Error is also called the residual for that observation: $e_i = Y_i - \hat{Y}_i$, the observed $Y_i$ minus the predicted $\hat{Y}_i$

Residuals and Best Fit Line
- The $\beta_0$ and $\beta_1$ that give the best fit line are the values that give the smallest sum of squared residuals: $SSR = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - (\beta_0 + \beta_1 \cdot X_i)\right)^2$
[Figure: scatterplot with fitted line, each residual $e_i$ drawn as the vertical distance from a point to the line]
- Best fit line is also called the least-squares line

Best values for Regression Parameters
- The best fit line has these values for the regression coefficients:
- Best estimate of slope $\beta$: $b = r \cdot s_y / s_x$, where $s_x$ and $s_y$ are the sample standard deviations of X and Y
- Best estimate of intercept $\alpha$: $a = \bar{Y} - b \cdot \bar{X}$

Example: Education and Mortality
[Figure: scatterplot of Mortality against Education with fitted line, Education axis from 9.0 to 12.0]
- Mortality = 1353.16 - 37.62 · Education
- Negative association means negative slope b

Example: Vietnam Draft Order
[Figure: scatterplot of Draft Order against Birthday with fitted line]
- Draft Order = 224.9 - 0.226 · Birthday
- Slightly negative slope means later birthdays have a lower draft order

Significance of Regression Line
- Does the regression line show a significant linear relationship between the two variables?
- If there is not a linear relationship, then we would expect zero correlation (r = 0)
- So the estimated slope b should also be close to zero
- Therefore, our test for a significant relationship will focus on testing whether our true slope $\beta$ is significantly different from zero: $H_0: \beta = 0$ versus $H_a: \beta \neq 0$
- Our test statistic is based on the estimated slope b

Test Statistic for Slope
- Our test statistic for the slope is similar in form to all the test statistics we have seen so far: $T = \frac{b - 0}{SE(b)} = \frac{b}{SE(b)}$
- The standard error of the slope SE(b) has a complicated formula that requires some matrix algebra to calculate
- We will not be doing this calculation manually because the JMP software does this calculation for us!

Example: Education and Mortality
[JMP output: Bivariate Fit of Mortality By Education, with Linear Fit (Mortality = 1353.1577 - 37.61929 Education), Summary of Fit, Analysis of Variance, and Parameter Estimates panels]
- $T = \frac{b}{SE(b)} = \frac{-37.6}{8.307} = -4.53$

p-value for Slope Test
- Is T = -4.53 significantly different from zero?
- To calculate a p-value for our test statistic T, we use the t distribution with n-2 degrees of freedom
- For testing means, we used a t distribution as well, but we had n-1 degrees of freedom before
- For testing slopes, we use n-2 degrees of freedom because we are estimating two parameters (intercept and slope) instead of one (a mean)
- For the cities dataset, n = 60, so we have d.f. = 58
- Looking at a t-table with 58 d.f., we discover that P(T < -4.53) < 0.0005

Conclusion for Cities Example
- Two-sided alternative: p-value < 2 × 0.0005 = 0.001
- We could get the p-value directly from the JMP output, which is actually more accurate than the t-table
- Since our p-value is far less than the usual $\alpha$-level of 0.05, we reject our null hypothesis
- We conclude that there is a statistically significant linear relationship between education and mortality

Another Example: Draft Lottery
- Is the negative linear association we see between birthday and draft order statistically significant?
[JMP output: Bivariate Fit of draftorder By birthday, with Linear Fit, Summary of Fit, Analysis of Variance, and Parameter Estimates panels. Annotated values: b = -0.226, SE(b) = 0.051, p-value < 0.0001]

Another Example: Draft Lottery
- p-value < 0.0001, so we reject the null hypothesis
- Conclude that there is a statistically significant linear relationship between birthday and draft order
- Statistical evidence that the randomization was not done properly!

Confidence Intervals for Coefficients
- JMP output also gives the information needed to make confidence intervals for slope and intercept
- 100·C% confidence interval for slope $\beta$: $b \pm t^* \cdot SE(b)$
- The multiple $t^*$ comes from a t distribution with n-2 degrees of freedom
- 100·C% confidence interval for intercept $\alpha$: $a \pm t^* \cdot SE(a)$
- Usually, we are less interested in the intercept $\alpha$, but it might be needed in some situations

CIs for Mortality vs. Education

  Parameter Estimates
  Term         Estimate     Std Error   t Ratio   Prob>|t|
  Intercept    1353.1577    91.42318    14.80     <.0001
  Education    -37.61929    8.307194    -4.53     <.0001

- We have n = 60, so our multiple $t^*$ comes from a t distribution with d.f. = 58. For a 95% C.I., $t^* = 2.00$
- 95% confidence interval for slope $\beta$: $b \pm t^* \cdot SE(b) = -37.6 \pm 2.0 \times 8.31 = (-54.2, -21.0)$
- Note that this interval does not contain zero!
- 95% confidence interval for intercept $\alpha$: $a \pm t^* \cdot SE(a) = 1353.2 \pm 2.0 \times 91.4 = (1170, 1536)$

Confidence Intervals: Draft Lottery
- p-value < 0.0001, so we reject the null hypothesis and conclude that there is a statistically significant linear relationship between birthday and draft order
- Statistical evidence that the randomization was not done properly!
- 95% confidence interval for slope $\beta$: $b \pm t^* \cdot SE(b) = -0.23 \pm 1.98 \times 0.05 = (-0.33, -0.13)$
- Multiple $t^* = 1.98$ from t distribution with n-2 = 363 d.f.
- Confidence interval does not contain zero, which we expected from our hypothesis test

Education Example
- Dataset of 78 seventh-graders: relationship between IQ and GPA
[Figure: scatterplot of GPA against IQ, IQ axis from 80 to 140]
- Clear positive association between IQ and grade point average

Education Example
- Is the positive linear association we see between GPA and IQ statistically significant?
[JMP output: Bivariate Fit of GPA By IQ, with Linear Fit, Analysis of Variance, and Parameter Estimates panels. Annotated values: b = 0.101, SE(b) = 0.014, t ratio = 7.14, p-value < 0.0001]

Education Example
- p-value < 0.0001, so we reject the null hypothesis and conclude that there is a statistically significant positive relationship between IQ and GPA
- 95% confidence interval for slope $\beta$: $b \pm t^* \cdot SE(b) = 0.101 \pm 1.99 \times 0.014 = (0.073, 0.129)$
- Multiple $t^* = 1.99$ from t distribution with n-2 = 76 d.f.
- Confidence interval does not contain zero, which we expected from our hypothesis test
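The slope test statistic and confidence intervals from the three regression examples can be reproduced with a few lines of arithmetic. This is a minimal Python sketch, not the JMP calculation: the estimates b and SE(b) and the multiples $t^*$ are copied from the slides rather than computed from data, and the function name `slope_inference` is an illustrative assumption.

```python
def slope_inference(b, se_b, t_star):
    """Return the test statistic T = (b - 0) / SE(b) and the interval b +/- t* x SE(b).

    b and se_b are the estimated slope and its standard error (read off software output);
    t_star is the t multiple for n - 2 degrees of freedom (read off a t-table).
    """
    t_stat = (b - 0.0) / se_b
    margin = t_star * se_b
    return t_stat, (b - margin, b + margin)


# (b, SE(b), t*) as reported in the slides for each example.
examples = {
    "Mortality vs. Education": (-37.61929, 8.307194, 2.00),
    "Draft Order vs. Birthday": (-0.226, 0.051, 1.98),
    "GPA vs. IQ": (0.101, 0.014, 1.99),
}

results = {name: slope_inference(*values) for name, values in examples.items()}

for name, (t_stat, (low, high)) in results.items():
    # An interval that excludes zero agrees with rejecting H0: beta = 0.
    excludes_zero = low > 0 or high < 0
    print(f"{name}: T = {t_stat:.2f}, 95% CI = ({low:.3f}, {high:.3f}), excludes zero: {excludes_zero}")
```

Because the inputs here are unrounded where the slides provide them, the printed values can differ in the last digit from the hand calculations, which round b and SE(b) first.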