Class Note for STAT 528 at OSU 63
This 16 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at Ohio State University taught by a professor in Fall. Since its upload, it has received 15 views.
Date Created: 02/06/15
Stat 528 (Autumn 2008)
Towards Statistical Inference

Reading: Sections 3.3, 4.4

• Performing statistical inference
• Population distributions
• Sampling distributions
• Visualizing sampling distributions
• Bias, variance and mean square error
• The law of large numbers
• A first look at the central limit theorem

An example

• A question: What proportion of researchers at OSU use statistics in their research?
  - This proportion is a parameter, $p$, of our population.
• We cannot interview all researchers at OSU.
  - We collect a random sample of researchers at OSU.
  - We ask them, "Do you use statistics in your research?"
  - We calculate the proportion of people in the sample who use statistics. This proportion is a statistic.
• A parameter is a number used to describe a characteristic of the population, e.g., $\mu$, $\sigma$, $p$.
• A statistic is a function of the sample of data, e.g., $\bar{x}$, $s$, $\hat{p}$.
• We often use a statistic to estimate a parameter. In this case, the statistic is known as an estimator.

Performing statistical inference

• Statistical inference is the art of using information in the sample to infer something about the population from which the sample was drawn.
• Statistical inference is based upon a simple idea. We ask the question, "Are the data (as summarized by the statistic or estimator) consistent with the model of a population (as summarized by a particular value of the parameter) and chance variation?" This type of reasoning leads to the hypothesis test.

Suppose that we obtain a sample of 100 researchers, which yields 42 who use statistics in their research. The sample proportion is $\hat{p} = 0.42$.
  - Are these data consistent with the belief that $p = 0.25$?
  - Are these data consistent with the belief that $p = 0.45$?
  - What values of $p$ are these data consistent with?

Tools for statistical inference

• Random sample: A random sample consists of $n$ independent draws from some population, or $n$ independent values produced by a chance experiment.
• Summary statistic: We choose a summary statistic, or a small collection of summary statistics, to represent the
data obtained in our experiment. The summary statistic is a random variable.
• Sampling distribution: The sampling distribution of a statistic is its probability distribution. The distribution depends on features of the population. Probability calculations are used to derive the sampling distribution.
• Comparison: We compare the observed statistic to its sampling distribution. If there is a clash between the observed statistic and the sampling distribution, we discard the assumptions used to derive the sampling distribution; if not, we retain the assumptions.
• Hypotheses, hypothesis tests, p-values, Type I and Type II error rates, power, confidence intervals, etc. Much more terminology and formalization of the problem is yet to come.

Distributions

• Population distributions: The population distribution is the distribution of a variable in the population. It is also the distribution of the variable when we choose one individual at random from the population.
• Sampling distributions: Suppose we take a random sample from the population or perform a randomized experiment. The statistic we calculate is a random variable. The probability distribution of this random variable is called its sampling distribution.

Visualizing sampling distributions

• We want to know how a statistic behaves for different samples from the population.
• Repeat a large number of times:
  1. Draw a sample of size $n$ from the population.
  2. Calculate the statistic based on that sample.
• Summarize the observed values of the statistic in a histogram.
• This gives an approximate view of the sampling distribution.

Toy example: a normal population

1. Suppose our population of values for $X$ is described by a $N(10, 2^2)$ distribution.

[Figure: $N(10, 2^2)$ population density of $X$; horizontal axis: X values, from 4 to 16]

2. Draw an SRS of size $n = 2$ from this population: $x_1 = 12.62151$, $x_2 = 12.77690$.
   Calculate the sample mean $\bar{x}$ for this sample: $\bar{x} = 12.69920$.
   Record this value of $\bar{x}$.

Toy example (cont.)

3. Draw another SRS of size $n = 2$ from this population: $x_1 = 8.885102$, $x_2 = 10.007891$.
   Calculate the sample mean for this sample and record it: $\bar{x} = 9.446497$.
4. Draw another SRS of size $n = 2$ from this population: $x_1 = 9.925948$, $x_2 = 8.929831$.
   Calculate the sample mean for this sample and record it: $\bar{x} = 9.42789$.
5. Continue taking SRSs of size $n = 2$, calculating the sample mean, and recording the result.
6. Draw a histogram of the values of the sample mean.

Example 1: The mean of samples from a $N(10, 2^2)$ population

• Draw 1000 random samples of size $n$ from a $N(10, 2^2)$ population. For each sample, calculate the sample mean.

[Figure: histograms of the 1000 sample means, panels for n = 2, 5, 20, and 50; horizontal axis: mean(sample)]

Example 2: $U(0, 1)$ population

• Draw 1000 random samples of size $n$ from a $U(0, 1)$ population. For each sample, calculate the sample mean.

[Figure: histograms of the 1000 sample means, panels for n = 2, 5, 20, and 50; horizontal axis: mean(sample), from 0.0 to 1.0]

Example 3: a right-skewed population

• Draw 1000 random samples of size $n$ from a right-skewed population. For each sample, calculate the sample mean.

[Figure: histograms of the 1000 sample means, panels for n = 2, 5, 20, and 50; horizontal axis: mean(sample)]

Example 4: coin flips

• Flip $n$ biased coins and record the proportion of heads. Repeat this procedure 1000 times.

[Figure: histograms of the 1000 sample proportions, one panel per sample size, including n = 20 and n = 50; horizontal axis: mean(sample)]

Features of the sampling distribution

• The sampling distribution of a statistic is often centered about the value of the population parameter estimated by the statistic.
• The bias of an estimator is the mean of its sampling distribution
minus the estimand:
  $\mathrm{bias}(\hat{\theta}) = \mu_{\hat{\theta}} - \theta$.
  An estimator with zero bias is called unbiased; in other cases the estimator is called biased.
• The variance of an estimator is the variance of the sampling distribution, $\mathrm{var}(\hat{\theta})$.
• The mean squared error of an estimator is
  $\mathrm{MSE}(\hat{\theta}) = \mathrm{bias}^2(\hat{\theta}) + \mathrm{var}(\hat{\theta})$.

Sample size and the sampling distribution

• The simplest estimator is $\bar{x}$, the sample mean. It is the most common estimator of the population mean. Studies of this estimator have led to the law of large numbers: As the sample size of a random sample increases, the mean of the observations in the sample gets closer and closer (in probability) to the population mean.
• Two key features of sampling distributions:
  - The spread of most sampling distributions decreases as the sample size increases: more data gives us more information about the statistic. The variance is roughly $1/n$ times a constant. The standard deviation is roughly $1/\sqrt{n}$ times a constant.
  - As the sample size increases, the sampling distribution tends to look more bell-shaped, like a normal distribution.

The Central Limit Theorem

• The central limit theorem describes this change in shape and spread of the sampling distribution as $n$ changes.
• Reconsider the earlier examples of sampling distributions for $\bar{x}$:
  - Normal population: Retains normal shape, compression of spread.
  - Uniform population: Moves toward normal shape, compression of spread.
  - Skewed population: Moves toward normal shape, compression of spread.
  - Biased coin: Moves toward normal shape, compression of spread.
• These changes hold for most sampling distributions of interest, although there are a few exceptions. Later we'll see where the square root of $n$ behavior comes from.
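
The repeated-sampling recipe and the $1/\sqrt{n}$ behavior of the spread can be checked with a short simulation. What follows is a minimal sketch in Python using only the standard library; the notes themselves do not name any software. The function names (`simulate_sample_means`, `normal_draw`, `skewed_draw`, `summarize`), the seed, and the use of an exponential distribution as a stand-in for the right-skewed population are all illustrative assumptions, not part of the original notes.

```python
import math
import random
import statistics


def simulate_sample_means(draw, n, reps=1000, seed=0):
    """Draw `reps` samples of size n and return the list of sample means.

    `draw(rng)` is a hypothetical callback that returns one value
    drawn at random from the population.
    """
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


def normal_draw(rng):
    # One draw from the N(10, 2^2) population of the toy example.
    return rng.gauss(10.0, 2.0)


def skewed_draw(rng):
    # Assumed stand-in for a right-skewed population: exponential, mean 1.
    return rng.expovariate(1.0)


def summarize(means, theta):
    """Empirical bias, variance and MSE of the simulated estimates."""
    center = statistics.fmean(means)
    bias = center - theta
    var = statistics.pvariance(means, mu=center)
    mse = statistics.fmean((m - theta) ** 2 for m in means)
    return bias, var, mse


if __name__ == "__main__":
    for n in (2, 5, 20, 50):
        means = simulate_sample_means(normal_draw, n)
        bias, var, mse = summarize(means, theta=10.0)
        # Compare the simulated spread with sigma / sqrt(n) = 2 / sqrt(n).
        print(f"n={n:2d}  bias={bias:+.4f}  sd={math.sqrt(var):.4f}  "
              f"2/sqrt(n)={2 / math.sqrt(n):.4f}  mse={mse:.4f}")
```

In each row the simulated standard deviation should sit close to $2/\sqrt{n}$, and the empirical MSE should equal the squared bias plus the variance up to rounding. Passing `skewed_draw` with `theta=1.0` instead gives a rough look at the same shrinking spread for a skewed population.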