Introduction to Statistical Methods
Introduction to Statistical Methods STAT 301
This 21 page Class Notes was uploaded by Mrs. Triston Collier on Thursday September 17, 2015. The Class Notes belongs to STAT 301 at University of Wisconsin - Madison taught by Staff in Fall. Since its upload, it has received 28 views. For similar materials see /class/205083/stat-301-university-of-wisconsin-madison in Statistics at University of Wisconsin - Madison.
Date Created: 09/17/15
STAT 301, TA: Lisa Chung (lchung@stat.wisc.edu)
DISCUSSION 11, Nov 22, 2004

Central Limit Theorem
• If the random sampling is from a normal population with mean $\mu$ and standard deviation $\sigma$, then for any $n$ the distribution of $\bar{X}$ is exactly normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, and $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
• If the random sampling is from an arbitrary population with mean $\mu$ and standard deviation $\sigma$, then when $n$ is large ($n \ge 30$) the distribution of $\bar{X}$ is close to normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, and $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately $N(0, 1)$.

Statistical Inference
Statistical inference deals with drawing conclusions about population parameters from an analysis of the sample data.

Point Estimator and Standard Error (SE)
A statistic intended for estimating a parameter is called a point estimator, or simply an estimator. The standard deviation of an estimator is called its standard error.

Point Estimation of the Mean
Estimator: $\bar{X}$
$\mathrm{SE}(\bar{X}) = \sigma/\sqrt{n}$
Estimated $\mathrm{SE}(\bar{X}) = S/\sqrt{n}$

Example 1. The mean and standard deviation of the strength of a packaging material are 55 and 7 pounds, respectively. If 40 specimens of this material are tested:
(a) What is the probability that the sample mean strength $\bar{X}$ will be between 54 and 56?
(b) Find the interval centered at 55 where $\bar{X}$ will lie with probability .95.

Example 2. Consider a random sample of size $n = 100$ from a population that has a standard deviation of $\sigma = 20$.
(a) Find the probability that the sample mean $\bar{X}$ will lie within 2 units of the population mean, that is, $P(-2 \le \bar{X} - \mu \le 2)$.
(b) What is the probability that $\bar{X}$ will differ from $\mu$ by more than 4 units?

Example 3. Which of the following are point estimates?
(a) You are standing in line at a theme park and the sign says it takes 15 minutes to reach the front of the line from here.
(b) When calling an 800 number for information, you are placed on hold and are told by an electronic voice that the wait should be about 3 minutes.
(c) The package of a frozen pizza reads "cook for 10-12 minutes."

Example 4. You have just purchased your favorite chocolate bar and the package reads "6 oz."
(a) Interpret this as a point estimate. What is the population?
(b) What is the quantity or
feature of bars that is being estimated?

Office Hour: W 1:00-3:00 pm, 1275A MSC, 262-1577

STAT 301, TA: Lane Burgette (burgette@stat.wisc.edu)
DISCUSSION 10, April 19-20

Statistical Inference
Statistical inference deals with drawing conclusions about population parameters from an analysis of the sample data.

Point Estimator and Standard Error (SE)
A statistic intended for estimating a parameter is called a point estimator, or simply an estimator. The standard deviation of an estimator is called its standard error.

Point Estimation of the Mean
Estimator: $\bar{X}$
$\mathrm{SE}(\bar{X}) = \sigma/\sqrt{n}$
Estimated $\mathrm{SE}(\bar{X}) = S/\sqrt{n}$
For large $n$, the $100(1-\alpha)\%$ error margin is $z_{\alpha/2}\,\sigma/\sqrt{n} = z_{\alpha/2}\,\mathrm{SE}(\bar{X})$. If $\sigma$ is unknown, use $S$ in place of $\sigma$.

Determining the Sample Size
To be $100(1-\alpha)\%$ sure that the error of estimation $|\bar{X} - \mu|$ does not exceed $d$, the required sample size is $n = \left(\frac{z_{\alpha/2}\,\sigma}{d}\right)^2$.

Example 1. Determine the point estimate of the population mean $\mu$ and its $100(1-\alpha)\%$ margin of error in each case.
(a) $n = 150$, $\bar{x} = 862$, $s = 956$, $1 - \alpha = .975$
(b) $n = 70$, $\sum x_i = 852$, $\sum (x_i - \bar{x})^2 = 2151$, $1 - \alpha = .95$

Example 2. Fifty-eight trout caught in a lake had average weight 4.37 pounds and standard deviation 1.61 pounds. From these data, estimate the mean weight of catchable trout in this lake and give a 90% error margin.

Example 3. For each case, determine the sample size $n$ that is required for estimating the population mean. The population standard deviation $\sigma$ and the desired error margin are specified.
(a) $\sigma = 38$, 95% error margin = 75
(b) $\sigma = 125$, 80% error margin = 45

Example 4. Assume that the standard deviation of the heights of five-year-old boys is 3.5 inches. How many five-year-old boys need to be sampled if we want to be 90% sure that the population mean height is estimated within .5 inches?

Office Hour: R 2:30-4:30, 1245F MSC

STAT 301, TA: Lisa Chung
DISCUSSION 9, Mar 28, 2004

• Test for mean when $\sigma$ is known (or large sample)
Under $H_0$, $\bar{X}$ is (approximately) normal with mean $\mu_0$ and standard deviation $\sigma/\sqrt{n}$.
$(1-\alpha)100\%$ Acceptance Region:
1. $H_0: \mu = \mu_0$ vs $H_A: \mu \ne \mu_0$: $\left(\mu_0 - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \mu_0 + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
2. $H_0: \mu = \mu_0$ vs $H_A: \mu > \mu_0$: $\left(-\infty,\ \mu_0 + z_{\alpha}\frac{\sigma}{\sqrt{n}}\right)$
3. $H_0: \mu = \mu_0$ vs $H_A: \mu < \mu_0$: $\left(\mu_0 - z_{\alpha}\frac{\sigma}{\sqrt{n}},\ \infty\right)$

• Test for mean with unknown $\sigma$ and small sample
$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{n-1}$
If it's
reasonable to assume that the population is normal, then for small $n$, a $100(1-\alpha)\%$ confidence interval for $\mu$ is $\bar{X} \pm t_{\alpha/2}\,\frac{S}{\sqrt{n}}$, with degrees of freedom $n - 1$.

• Proportion
$X$ has a binomial distribution, $X \sim \mathrm{Bin}(n, \pi)$, where $\pi$ is unknown. Let $x$ be the observed value of $X$, and use the number $x$ to make an inference about the unknown value of $\pi$.
Point estimate is $\hat{\pi} = x/n$.
The confidence interval for $\pi$ is
$\hat{\pi} - z_{\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} \le \pi \le \hat{\pi} + z_{\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$
$(1-\alpha)100\%$ acceptance region for $H_0: \pi = \pi_0$:
$\left(\pi_0 - z_{\alpha/2}\sqrt{\frac{\pi_0(1-\pi_0)}{n}},\ \pi_0 + z_{\alpha/2}\sqrt{\frac{\pi_0(1-\pi_0)}{n}}\right)$

Office: 1335 MSC, 263-5948
Office Hour: Wed 1:00-2:00 and Thurs 11:00-12:00

Example 1. Consider the distribution of serum cholesterol levels for all 20- to 74-year-old males living in the United States. The mean of this population is 211 mg/dL, and the standard deviation is 46.0 mg/dL. In a study of a subpopulation of such males who smoke and are hypertensive, it is assumed that the distribution of serum cholesterol levels is normal with unknown mean $\mu$, but with the same standard deviation $\sigma$ as the original population.
(a) Construct the hypothesis for testing whether the serum cholesterol level of smokers is equal to the known population mean.
(b) A sample mean of $\bar{x} = 217$ mg/dL is observed from a sample of $n = 12$ hypertensive smokers. Construct a 95% CI for the true mean of this subpopulation.
(c) Calculate the p-value of this sample.
(d) Check whether the null hypothesis is rejected at $\alpha = 0.05$.
(e) Determine the 95% acceptance region and the complementary rejection region for the null hypothesis.

Example 2. Two physicians are having a disagreement about the effectiveness of chicken soup in relieving common cold symptoms. While both agree that the number of symptomatic days generally follows a normal distribution, one claims that most colds last about a week and that soup makes no difference, whereas the other argues that it does.
(a) Construct the hypothesis for testing.
(b) After treating a random sample of 16 patients with chicken soup, they get a mean number of symptomatic days $\bar{x} = 5.4$ and standard deviation $s = 3.6$ days. Test the hypothesis.
(c)
One claims: "The sample size was too small. There was not enough power to detect a statistically significant difference between $\mu = 7$ days and, say, $\mu = 5.5$ days, even if there was one present." Calculate the minimum sample size required in order to achieve about 80% power of detecting such a genuine difference, if indeed one actually exists.

Example 3. Proportion (posted exam)

STAT 301, TA: Lisa Chung
DISCUSSION 13, April 25, 2004

Correlation
• Population: $X$, $Y$ are numerical random variables.
$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$, $-1 \le \rho \le 1$, where $\sigma_{XY} = \mathrm{Cov}(X, Y)$, $\sigma_X = \sqrt{\mathrm{Var}(X)}$, $\sigma_Y = \sqrt{\mathrm{Var}(Y)}$
• Sample of size $n$:
$r = \frac{s_{xy}}{s_x s_y}$, $-1 \le r \le 1$, where $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
• Inference for $\rho$: $H_0: \rho = 0$ (i.e., no linear correlation between $X$ and $Y$) vs $H_A: \rho \ne 0$
Test statistic: $T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$

Simple Linear Regression
$Y = \beta_0 + \beta_1 X + \epsilon$
• $k = 2$ parameters (regression coefficients)
• Estimate $\beta_0, \beta_1$ by the values that minimize the Error (or Residual) Sum of Squares:
  - Slope: $b_1 = \frac{s_{xy}}{s_x^2}$
  - Intercept: $b_0 = \bar{y} - b_1 \bar{x}$
  - Then $\hat{y} = b_0 + b_1 x$
• Inference for regression coefficients
  - $(1-\alpha)100\%$ confidence limits:
    For $\beta_0$: $b_0 \pm t_{n-2,\,\alpha/2}\,\mathrm{s.e.}(b_0)$
    For $\beta_1$: $b_1 \pm t_{n-2,\,\alpha/2}\,\mathrm{s.e.}(b_1)$
  - Test statistics:
    For $\beta_0$: $T = \frac{b_0 - \beta_0}{\mathrm{s.e.}(b_0)} \sim t_{n-2}$
    For $\beta_1$: $T = \frac{b_1 - \beta_1}{\mathrm{s.e.}(b_1)} \sim t_{n-2}$
    where $\mathrm{s.e.}(b_1) = \frac{s_e}{s_x\sqrt{n-1}}$, $\mathrm{s.e.}(b_0) = s_e\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{(n-1)s_x^2}}$, and $s_e^2 = \frac{SS_{Error}}{n-2}$
• ANOVA Formulation
  - $SS_{Total} = \sum_{i=1}^{n}(y_i - \bar{y})^2$, $df_{Total} = n - 1$
  - $SS_{Reg} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$, with $df_{Reg} = k - 1$
  - $SS_{Error} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, $df_{Error} = n - k$

Examples
1. A survey was conducted by sampling 400 persons who were questioned regarding union membership and attitude toward decreased national spending on social welfare programs.
2. p. 516, 32

STAT 301, TA: Lisa Chung
DISCUSSION 14, Dec 13, 2004

Examples
1. An investigator interested in estimating a population mean wants to be 95% certain that the error of estimation does not exceed 2.5. What sample size should she use if $\sigma = 18$?
2. A sample of 64 measurements provides the sample mean $\bar{x} = 8.76$ and the sample standard deviation $s = 1.84$. For the
population mean, construct a
(a) 90% confidence interval
(b) 99% confidence interval
3. In each case, identify the null hypothesis $H_0$ and the alternative hypothesis $H_1$ using the appropriate symbol for the parameter of interest.
(a) Subsoil water specimens will be analyzed to determine whether there is convincing evidence that the mean concentration of a chemical agent has exceeded .008.
(b) The setting of an automatic dispenser needs adjustment when the mean fill differs from the intended amount of 16 ounces. Several fills will be accurately measured to decide whether there is a need for resetting.
4. In a problem of testing $H_0: \mu = 75$ versus $H_1: \mu > 75$, the following sample quantities are recorded: $n = 56$, $\bar{x} = 77.04$, $s = 6.80$.
(a) State the test statistic and find the rejection region with $\alpha = .05$.
(b) Calculate the test statistic and draw a conclusion with $\alpha = .05$.
(c) Find the P-value and interpret the result.

STAT 301, TA: Lisa Chung
DISCUSSION 11, Mar 11, 2004

Two Samples

Means
1. Large samples
When independent:
• $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$:
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
• Test statistic for $H_0: \mu_1 - \mu_2 = \mu_0$:
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim N(0, 1)$
When two samples are dependent, calculate the difference of each matched pair of observations, thereby forming a single collapsed sample, then apply the appropriate one-sample test.

2. Small samples
When independent:
• If $n_1 < 30$ and/or $n_2 < 30$, then use the t distribution, provided $H_0: \sigma_1^2 = \sigma_2^2$ holds (informally, $\frac{1}{4} < \frac{s_1^2}{s_2^2} < 4$).
• Then the common value of $\sigma_1^2$ and $\sigma_2^2$ can be estimated by the weighted mean of $s_1^2$ and $s_2^2$:
$s_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, $\mathrm{s.e.} = \sqrt{s_{pooled}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
• $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$:
$(\bar{x}_1 - \bar{x}_2) \pm t_{df,\,\alpha/2}\sqrt{s_{pooled}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, $df = n_1 + n_2 - 2$
• Test statistic for $H_0: \mu_1 - \mu_2 = \mu_0$:
$T = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{s_{pooled}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t_{df}$

Variance
If $X_1 \sim N(\mu_1, \sigma_1)$, $X_2 \sim N(\mu_2, \sigma_2)$ and $H_0: \sigma_1^2 = \sigma_2^2$, then $F = \frac{s_1^2}{s_2^2} \sim F_{n_1 - 1,\, n_2 - 1}$.

Proportion
• The confidence interval for $\pi_1 - \pi_2$ is
$(\hat{\pi}_1 - \hat{\pi}_2) - z_{\alpha/2}\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}} \le \pi_1 - \pi_2 \le (\hat{\pi}_1 - \hat{\pi}_2) + z_{\alpha/2}\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}$
• Test for $H_0: \pi_1 - \pi_2 = 0$:
$\hat{\pi}_{pooled} = \frac{X_1 + X_2}{n_1 + n_2}$, $\mathrm{s.e.}_0 = \sqrt{\hat{\pi}_{pooled}(1 - \hat{\pi}_{pooled})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Test statistic: $Z = \frac{\hat{\pi}_1 - \hat{\pi}_2}{\mathrm{s.e.}_0} \sim N(0, 1)$
• Alternative method: $\chi^2$ test for $H_0: \pi_1 - \pi_2 = 0$:
$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_1$

Example 1. The arrival time of my usual morning bus (B) is normally distributed with a mean ETA of 8:00 am and a standard deviation of 4 minutes. My arrival time (A) at the bus stop is also normally distributed, with a mean ETA of 7:50 am and a standard deviation of 3 minutes. With what probability can I expect to catch the bus?

Example 2. Assume that population cholesterol level is normally distributed.
(a) Consider a small clinical trial designed to measure the efficacy of a new cholesterol-lowering drug against placebo. A group of six high-cholesterol patients is randomized to either a treatment arm or a control arm.
(b) Now imagine that the same drug is tested using another pilot study with a different design. Serum cholesterol levels of 3 patients are measured at the beginning of the study, then remeasured after a six-month treatment period on the drug.

Example 3 (Test of independence). Imagine that a marketing research study surveys a random sample of $n = 2000$ consumers about their responses regarding two brands of a certain product. Consider the null hypothesis $H_0: \pi_{A|B} = \pi_{A|B^c}$, i.e., the probability of liking A given that B is liked is equal to the probability of liking A given that B is not liked.

STAT 301, TA: Lisa Chung
DISCUSSION 4, Feb 14, 2004

Properties of Mathematical Expectation
• For any constant $c$, $E(cX) = cE(X)$.
• For any two random variables $X$ and $Y$, $E(X + Y) = E(X) + E(Y)$ and $E(X - Y) = E(X) - E(Y)$.

Conditional Probability
• The conditional probability of A given B is denoted by $P(A|B)$ and defined by the formula $P(A|B) = \frac{P(A \cap B)}{P(B)}$.
• Multiplication Law of Probability: $P(A \cap B) = P(A)P(B|A) = P(B)P(A|B)$
• $P(B^c|A) = 1 - P(B|A)$

Statistical Independence
Two events A and B are independent if $P(A \cap B) = P(A)P(B)$, or $P(A|B) = P(A)$, or $P(B|A) = P(B)$.

General Facts
• Law of Additivity: $P(A \cup B) = P(A) + P(B) - P(A$
$\cap\, B)$.
If A and B are mutually exclusive, $P(A \cup B) = P(A) + P(B)$.
If A and B are independent, $P(A \cup B) = P(A) + P(B) - P(A)P(B)$.
• If $X$ and $Y$ are independent random variables, $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ and $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.

Example 1. The medical records of the male diabetic patients reporting to a clinic during one year provide the following percentages. Suppose a patient is chosen at random from this group, and the events A, B and C are defined as follows:
A: He has a serious case.
B: He is below 40.
C: His parents are diabetic.
(a) Find the probabilities $P(A)$, $P(B)$, $P(A^c \cap B)$, and $P(A \cup C)$.
(b) Suppose that a patient will be chosen at random from the group of patients who are below 40. What is the probability that this patient will have a serious case of the disease? Explain how this can be interpreted as a conditional probability.
(c) Calculate the following conditional probabilities and interpret them: $P(A^c|B)$, $P(C|A)$.

Example 2. Construct the probability table and probability histogram for both independent random variables $X$, $Y$ below, and for their difference $D = X - Y$. Calculate the mean and the variance of $X$, $Y$, and $D$.

STAT 301, TA: Lisa Chung
DISCUSSION 2, Sep 20, 2004

Conditional Probability
The conditional probability of A given B is denoted by $P(A|B)$ and defined by the formula $P(A|B) = \frac{P(AB)}{P(B)}$.

Multiplication Law of Probability
$P(AB) = P(A)P(B|A) = P(B)P(A|B)$
$P(B^c|A) = 1 - P(B|A)$

Independence of two events
Two events A and B are independent if $P(AB) = P(A)P(B)$, or $P(A|B) = P(A)$, or $P(B|A) = P(B)$.

Law of Additivity
$P(A \cup B) = P(A) + P(B) - P(AB)$
If A and B are mutually exclusive, $P(A \cup B) = P(A) + P(B)$.
If A and B are independent, $P(A \cup B) = P(A) + P(B) - P(A)P(B)$.

Example 1. An urn contains two green balls and three red balls. Suppose two balls will be drawn at random, one after another and without replacement.
(a) Find the probabilities of the events
A = [A green ball appears in the first draw]
B = [A green ball appears in the second draw]
(b) Are the
two events independent? Why or why not?

Example 2. If $P(A) = .5$, $P(B) = .5$ and $P(A \text{ or } B) = .8$:
(a) Calculate (i) $P(A|B)$, (ii) $P(AB)$.
(b) Are A and B independent? Why or why not?
(c) Can A and B be mutually exclusive? Why or why not?

Example 3. In a region, 15% of the adult population are smokers, .86% are smokers with emphysema, and .24% are nonsmokers with emphysema.
(a) What is the probability that a person selected at random has emphysema?
(b) Given that the selected person is a smoker, what is the probability that this person has emphysema?
(c) Given that the selected person is not a smoker, what is the probability that this person has emphysema?

STAT 301, TA: Lisa Chung
DISCUSSION 9, Nov 8, 2004

1. Correlation Coefficient
The correlation coefficient, denoted by $r$, is a measure of strength of the linear relation between the $x$ and $y$ variables:
$r = \frac{S_{xy}}{\sqrt{S_{xx}}\sqrt{S_{yy}}}$
where $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$, $S_{xx} = \sum (x - \bar{x})^2$, $S_{yy} = \sum (y - \bar{y})^2$.

Correlation Coefficient (alternative way)
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$, $S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$, $S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$

Example 1. For a data set of $(x, y)$ pairs, one finds that $n = 26$, $\sum x_i = 1287$, $\sum y_i = 1207$, $\sum x_i^2 = 66831$, $\sum y_i^2 = 59059$, $\sum x_i y_i = 62262$. Calculate the correlation coefficient.

Example 2. The following table gives the federal deficit and the number of golfers in the United States. Determine the correlation between number of golfers and the federal deficit.

2. Linear Regression
Example 3. Given five pairs of values,
(a) Calculate $\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{xy}$ and $S_{yy}$.
(b) Calculate the least squares estimates $\hat{\beta}_0$ and $\hat{\beta}_1$.
(c) Determine the fitted line.

STAT 301, TA: Lane Burgette
DISCUSSION 4, Feb 15-16, 2005

Note: There was some confusion last week regarding when order matters and when it does not. This probably won't help much, but here are some loose guidelines. Unless you deal with each of your draws differently, it probably doesn't matter. Note that imposing one
particular order is the same as not considering order. We do consider order when the first draw takes on one role, the second takes on another, and so on.

Random Variables
A random variable $X$ is a rule or function that assigns one and only one numerical value to each simple event of an experiment.
1. Discrete Random Variable: it has finitely many or infinitely many values, which can be arranged in a list.
2. Continuous Random Variable: it has all possible values in an interval. We can measure to any desired accuracy.

Example. Identify the variable as a discrete or continuous random variable.
(a) The loss of weight following a diet program
(b) The seating capacity of an airplane
(c) The number of cars sold at a dealership on one day
(d) The percentage of fruit juice in a drink mix
(e) What is an example of a discrete r.v. that has infinitely many values?

Probability Distribution
The probability distribution of a discrete random variable $X$ is a list of the distinct numerical values of $X$ along with their associated probabilities.

Example. Let the random variable $X$ represent the maximum of two tosses of a die.
(a) List the possible values of $X$.
(b) Obtain the probability distribution of $X$.

Expected Value and Variance
• The Mean or the Expected Value of $X$: $\mu = E(X) = \sum x_i f(x_i)$
• Variance of $X$: $\sigma^2 = \mathrm{Var}(X) = \sum (x_i - \mu)^2 f(x_i)$
• Standard Deviation: $\sigma = \mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)}$

Example. Given the following distribution, find $E(X)$, $\sigma^2$ and $\sigma$.
x     f(x)
2     .5
3     .3
4     .1
5     .05
6     .05

STAT 301, TA: Lisa Chung
DISCUSSION 5, Oct 11, 2004

Binomial Table
Example 1. Using the Binomial Table, find the probability of
(a) Five or fewer successes in 9 trials when $p = .7$
(b) No more than 11 and no less than 6 successes in 16 trials when $p = .6$

Example 2. Consider four Bernoulli trials with success probability $p = .7$ in each trial. Find the probability that
(a) All four trials result in successes.
(b) There is at least one success.

Example 3. For the binomial distribution with $n = 4$ and $p = .25$, find the probability of
(a)
Three or more successes
(b) Two or more failures

Example 4. If in three Bernoulli trials $P(\text{all three are successes}) = .027$, what is the probability that all three are failures?

Example 5. Let $Y$ be a binomial random variable with $n = 10$ and $\mu = 1.0$. Then $P(Y > 2) = $ ?

STAT 301, TA: Lane Burgette
DISCUSSION 10, April 12-13

Sampling Distribution of a Statistic and the Central Limit Theorem
• Parameter: A parameter is a numerical descriptive measure of the population. It is calculated/estimated from observations in the population. E.g., $\mu$ or $\sigma^2$.
• Statistic: A statistic is a numerical descriptive measure of a sample. It is calculated from observations in the sample. It does not include any parameters.
• Sampling Distribution: The probability distribution of a statistic is called its sampling distribution.
• Mean and Standard Deviation of $\bar{X}$: The distribution of the sample mean, based on a random sample of size $n$, has mean $E(\bar{X}) = \mu$ and $\mathrm{sd}(\bar{X}) = \sigma/\sqrt{n}$.
• Central Limit Theorem: If $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma)$, then for any $n$ the distribution of $\bar{X}$ is exactly normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, and $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim Z$. If the random sampling is from an arbitrary population with mean $\mu$ and standard deviation $\sigma$, then when $n$ is large ($n \ge 30$), the distribution of $\bar{X}$ is close to normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, and $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx Z$.

Example 1. A random sample of size 2 will be selected, with replacement, from the set of numbers {2, 4, 6}.
(a) List all possible samples and evaluate $\bar{x}$ and $s^2$ for each.
(b) Determine the sampling distribution of $\bar{X}$.
(c) Determine the sampling distribution of $S^2$.

Example 2. A population has mean 99 and standard deviation 7. Calculate $E(\bar{X})$ and $\mathrm{sd}(\bar{X})$ for a random sample of size (a) 4 and (b) 25.

Example 3. The heights of male students at a university have a nearly normal distribution with mean 70 inches and standard deviation 2.8 inches. If 5 male students are randomly selected to make up an intramural basketball team, what is the probability that the heights of the team will average over 72.0 inches?

Example
4. The weight of an almond is normally distributed with mean .05 ounces and standard deviation .015 ounces. Find the probability that a package of 100 almonds will weigh between 4.8 and 5.3 ounces. That is, find the probability that $\bar{X}$ will be between .048 and .053 ounces.

STAT 301, TA: Lisa Chung
DISCUSSION 4, Oct 04, 2004

Expected Value and Variance
• The Mean or the Expected Value of $X$: $\mu = E(X) = \sum x_i f(x_i)$
• Variance of $X$: $\sigma^2 = \mathrm{Var}(X) = \sum (x_i - \mu)^2 f(x_i)$
• Standard Deviation: $\sigma = \mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)}$

Example 1. Given the following distribution, find $E(X)$, $\sigma^2$ and $\sigma$.
x     f(x)
(table values not recoverable)

Example 2. The probability distribution of a random variable $X$ is given by the function
$f(x) = \frac{1}{84}\binom{5}{x}\binom{4}{3-x}$ for $x = 0, 1, 2, 3$
(a) Calculate the numerical probabilities and list the distribution.
(b) Calculate the mean and standard deviation of $X$.

Bernoulli Trials and Binomial Distribution
• Bernoulli Trials
1. Each trial has two possible outcomes: success and failure.
2. For each trial, the probability of success $P(S)$ is the same.
3. Trials are independent.
• Binomial Random Variable: $X$ = the number of successes in $n$ Bernoulli trials with $P(S) = p$.
• Binomial Distribution
1. $X \sim \mathrm{Bin}(n, p)$
2. Distribution function: $f(x) = P(X = x) = \binom{n}{x} p^x q^{n-x}$ for $x = 0, 1, \ldots, n$, where $q = 1 - p$
3. $E(X) = np$, $\mathrm{Var}(X) = npq$, $\mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{npq}$

Example 3. Consider Bernoulli trials with success probability $p = 1/4$.
(a) Find the probability that four trials result in all failures.
(b) Given that the first four trials result in all failures, what is the conditional probability that the next four trials are all successes?

Example 4. According to the Mendelian theory of inherited characteristics, a cross-fertilization of related species of red- and white-flowered plants produces a generation whose offspring contain 25% red-flowered plants. Suppose that a horticulturist wishes to cross 5 pairs of the cross-fertilized species. Of the 5 offspring, what is the probability that
(a) There will be one red-flowered plant?
(b) There will be 4 or more red-flowered plants?
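
The binomial quantities in this last handout can be checked numerically. What follows is a minimal sketch, not part of the original handouts, assuming only Python's standard library. The helper names `binom_pmf` and `binom_summary` are hypothetical and chosen for illustration. It evaluates the standard binomial probability $\binom{n}{x}p^x(1-p)^{n-x}$ for the setting of Example 4 ($n = 5$, $p = .25$) and compares the mean and variance obtained from the definitions $\sum x f(x)$ and $\sum (x-\mu)^2 f(x)$ with the shortcuts $np$ and $npq$.

```python
from math import comb, sqrt


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_summary(n, p):
    """Mean, variance and sd computed from the definitions, not from np and npq."""
    pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(x * f for x, f in enumerate(pmf))
    var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
    return mean, var, sqrt(var)


# Setting assumed from Example 4: 5 offspring, each red-flowered with probability .25
n, p = 5, 0.25
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
mean, var, sd = binom_summary(n, p)

p_exactly_one = pmf[1]          # P(X = 1)
p_at_least_4 = pmf[4] + pmf[5]  # P(X >= 4)

for x, f in enumerate(pmf):
    print(f"P(X = {x}) = {f:.4f}")
print(f"mean = {mean:.4f} (np = {n * p:.4f})")
print(f"variance = {var:.4f} (npq = {n * p * (1 - p):.4f})")
print(f"P(X = 1) = {p_exactly_one:.4f}, P(X >= 4) = {p_at_least_4:.4f}")
```

The same two helpers can be reused for the other binomial exercises by changing `n` and `p`. Computing the mean and variance from the probability list, rather than from $np$ and $npq$, is a deliberate choice so that the shortcut formulas are verified instead of assumed.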