# Introduction to Statistical Inference and Regression ST 372

These 10 pages of class notes were uploaded by Jordane Kemmer on Thursday, October 15, 2015. The notes belong to ST 372 at North Carolina State University, taught by Staff in Fall. For similar materials see /class/223971/st-372-north-carolina-state-university in Statistics at North Carolina State University.

Date Created: 10/15/15

Notes ST372 Engineering Statistics Handout 1

### Point Estimation

One of the most important things we want to be able to do is estimate quantities. In order to do this we perform experiments and collect data. Let's consider the case where we want to estimate the gravitational acceleration on some unknown planet where we have just arrived. We know from the basic laws of motion that

$$y(t) = y_0 + \tfrac{1}{2} a t^2 .$$

So we can conceive of an experiment where we drop a mass from a known height $y_0$ and measure the time it takes to fall to the ground. We assume a known height of one meter and observe the following times $t$ (in seconds):

0.472, 0.462, 0.444, 0.450, 0.460, 0.478, 0.470, 0.439, 0.451, 0.459

We can take the sample average as an estimate of $t$: $\bar t = 0.4585$. Taking $a = -g$ and $y = 0$ at the ground, we can then use some simple algebra to show that

$$t = \sqrt{\frac{2 y_0}{g}}, \qquad g = \frac{2 y_0}{t^2} .$$

We could then transform each observation and calculate the sample mean, or we could substitute $\bar t$. In either case we would get an estimate of the gravitational acceleration.

### Method of Moments

When we make estimates of parameters like this we have two approaches. The first is the method of moments. This is based on the definition of

$$m_n = \int x^n f(x)\,dx$$

as the $n$th moment of a density $f$. We know that the distribution mean is the first moment and the variance is the second moment minus the square of the first moment. The sample moments are defined as

$$\hat m_n = \frac{1}{N}\sum_{i=1}^{N} x_i^n .$$

Knowing how the mean and variance depend on the parameters, we can use the sample moments to solve for the parameters. Consider the gamma distribution, where

$$E[x] = \frac{\alpha}{\beta}, \qquad \mathrm{Var}[x] = \frac{\alpha}{\beta^2} .$$

After some algebra we find the method of moments estimators

$$\hat\alpha = \frac{\hat\mu^2}{\hat\sigma^2}, \qquad \hat\beta = \frac{\hat\mu}{\hat\sigma^2},$$

where $\hat\mu = \hat m_1$ and $\hat\sigma^2 = \hat m_2 - \hat m_1^2$.

### Maximum Likelihood Estimators

The second method of finding estimators uses the maximum likelihood principle. The likelihood for a given set of data from the distribution $f$ is

$$L(\theta) = \prod_{i=1}^{N} f(x_i \mid \theta) .$$

The values of $\theta$ that maximize the likelihood are the maximum likelihood estimators (MLEs). To find them we take the derivative of the log likelihood with respect to $\theta$, set it equal to 0, and solve for $\theta$. As an example, consider observations from a normal distribution. The log likelihood is

$$\ell(\mu, \sigma^2) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i - \mu)^2 .$$

Setting the derivative with respect to $\mu$ equal to zero gives

$$\frac{\sum_{i=1}^{N} x_i - N\mu}{\sigma^2} = 0 ,$$

so the MLE for $\mu$ is $\hat\mu = \sum_{i} x_i / N$, the same as the moment estimator. This procedure can be difficult, as in the case of the gamma distribution; I encourage you to find out for yourselves.

### UMVUE

We want our estimators to be good estimators, so naturally we want them to have two properties: we want them to be unbiased and to have the minimum variance among all unbiased estimators. An unbiased estimator $\hat\theta$ has the property that $E[\hat\theta] = \theta$. A minimum variance estimator has a smaller variance than any other unbiased estimator. We won't necessarily go into how to prove minimum variance, nor will we address proving that an estimator is unbiased in much detail. It is important to note, however, that the MLE has several desirable properties. In many common cases, such as the mean of a normal distribution, the MLE is the minimum variance unbiased estimator, and more generally it is asymptotically efficient. The MLE also has the very nice property of invariance, which simply means that the MLE of a function of a parameter is the function of the MLE.

Let us return to our example of collecting information to estimate the gravitational acceleration. We are going to assume that our observed times have a normal distribution with some mean $\mu$ equal to the true time it would take an object to fall one meter. In order to estimate $\mu$ we can use

$$\bar t = \frac{1}{N}\sum_{i=1}^{N} t_i ,$$

which is both the MLE and the method of moments estimator. The real quantity of interest is the gravitational acceleration $g$. Recall that we suggested we could simply transform the data and then calculate the sample mean. The problem is that this would be a moment estimator, which may or may not be an MVUE. Instead we can use the invariance principle and calculate the gravitational acceleration from $\bar t$ as $\hat g = 2 y_0 / \bar t^{\,2}$.

### Confidence Intervals

In the previous section we discussed how to find an estimator. In this section we discuss how to quantify how good we think our estimation process was. Note that I didn't say how good our estimate was. If the true value of the parameter is unknown but fixed, it makes no sense to talk about the probability that our estimator is the true value, nor does it make sense to estimate an interval and state a probability that the true value is in that interval.

Recall the story of the invisible man walking his dog. There is an invisible man in a town, and every day he takes his non-invisible dog for a walk. The townspeople want to capture the invisible man, but all they can see is the dog, which may be in front of, beside, or behind the man. In order to capture the man they decide to throw a net over the dog, hoping to catch the man. The question they now face is how big a net to use. They decide to construct a net large enough that they can be 95% sure of capturing the man. The townspeople's question is: how do we have to design our net in order to have 95% confidence that it will catch the man? The question is one of confidence in their procedure, not in the net.

Much like the townspeople, we want to estimate an interval around a parameter estimate to express our uncertainty in the estimate, but we want it to be constructed in such a way that it reflects our confidence in our estimation procedure. The things that can affect our estimation and our confidence are typically how we find our estimate and how many samples we take to calculate it.

Let us begin with a mean parameter $\mu$. How we estimate this parameter is easy: we will use $\bar x$, the MLE. We have also previously learned that, regardless of the underlying distribution of $x$, for large $N$ approximately $\bar x \sim N(\mu, \sigma^2/N)$. We use this fact to construct our confidence interval as

$$\bar x \pm z_{\alpha/2}\,\sigma_{\bar x}, \qquad \sigma_{\bar x} = \frac{\sigma}{\sqrt{N}},$$

which is the $100(1-\alpha)\%$ confidence interval. Note that this procedure only applies when we know $\sigma^2$, or when $N > 30$ and we can assume $s^2 \approx \sigma^2$.

In the case where $N < 30$ we have to account for the additional uncertainty involved in estimating $\sigma^2$ by $s^2$. In order to do this we use what is called the $t$ or Student distribution. This is a symmetric distribution with a mean of 0. It is based on the normal density but has heavier tails, as it accounts for the uncertainty in using the estimate $s^2$. The resulting $100(1-\alpha)\%$ confidence interval is

$$\bar x \pm t_{\nu,\alpha/2}\,\frac{s}{\sqrt{N}}, \qquad \nu = N - 1 .$$

The values for $t_{\nu,\alpha/2}$ are available in tables.

The final case we consider is finding a $100(1-\alpha)\%$ confidence interval for a population proportion. In order to do this, our experimental data must satisfy the constraints $x > 10$ and $N - x > 10$. Under these conditions we can use the formula

$$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{N}}, \qquad \hat p = \frac{x}{N} .$$

It is important to remember that confidence intervals only give us a measure of uncertainty about the estimation procedure, not about the parameter value. This is an important distinction and one we should keep in mind when using confidence intervals to communicate experimental results.

What is the confidence interval for the true time for an object to fall 1 m based on the sample data? What is the confidence interval for $g$?

### Hypothesis Tests

In science we often have ideas that we want to verify through experimentation. Return to our gravity data from above, and let's assume that we think the gravitational acceleration on our new planet is less than Earth's gravitational acceleration, 9.81 m/s². We can calculate that the true time for an object to fall 1 m on Earth is 0.452 seconds. Our data has values greater than that, but also some less than that. So what can we conclude? Our test procedure has to account for the uncertainty of our observations.

To begin this procedure we have to clearly state what we want to test. We need a null hypothesis and an alternative hypothesis. The null hypothesis typically represents the status quo, or the opposite of what we believe. In our case the null hypothesis is that $\mu \le 0.452$. The opposite of this, which is what we believe, is the alternative hypothesis $\mu > 0.452$. We state these as

$$H_0: \mu \le 0.452 \qquad H_A: \mu > 0.452 .$$

In order to perform our test accounting for uncertainty we ask the question: how likely or unlikely are the data we observe, given that our null hypothesis is true? We can answer this question if we know the underlying distribution, or we can use what we know about the distribution of $\bar x$ to calculate this probability as $P(\bar X > \bar x \mid \mu = \mu_0)$, noting that $\bar x \sim N(\mu, \sigma^2/N)$. Here $\mu_0$ is the mean from $H_0$. If we know the variance $\sigma^2$ or $N > 30$, we can use the normal tables to find

$$P\left(Z > \frac{\bar x - \mu_0}{\sigma/\sqrt{N}}\right) .$$

The quantity $(\bar x - \mu_0)/(\sigma/\sqrt{N})$ is the test statistic $z$. In the case of our data this becomes $P(Z > 1.77) = 0.038$. We are going to assume that this works even though we don't know $\sigma$ and $N < 30$. In this case we see that there is a pretty small chance that we would have observed this data if the gravitational acceleration on this planet were the same as or greater than Earth's. As a result we can reject $H_0$ and accept $H_A$. This probability is called the p-value. Can we use the same procedure to test $g$ directly? In cases where $N < 30$ we have to use the $t$ distribution, much like we did for confidence intervals.

Hypothesis tests follow one of three patterns:

- Two-tailed test: $H_0: \mu = \mu_0$, $H_A: \mu \ne \mu_0$
- Upper-tailed test: $H_0: \mu \le \mu_0$, $H_A: \mu > \mu_0$
- Lower-tailed test: $H_0: \mu \ge \mu_0$, $H_A: \mu < \mu_0$

These are called tailed tests because the names refer to how the p-values are calculated. For a lower-tailed test the p-value is the area under the standard normal curve to the left of the test statistic; for an upper-tailed test it is the area to the right of the test statistic. In the two-tailed case it is the area in the tails defined by the test statistic, beyond $z$ and $-z$.

There are two types of errors that can be made in this test procedure:

- $\alpha$ = Type I error = $P(\text{Reject } H_0 \mid H_0 \text{ is true})$
- $\beta$ = Type II error = $P(\text{Fail to reject } H_0 \mid H_0 \text{ is false})$

The quantity $1 - \beta$ is $P(\text{Reject } H_0 \mid H_0 \text{ is false})$. This quantity is called the power of a test and depends on the sample size, the variance, and the true value of $\mu$. Typically we reject $H_0$ when the p-value is less than 0.05.

- For hypothesis tests about the mean $\mu$ when $N > 30$ or the variance is known, we use the test statistic $z = \dfrac{\bar x - \mu_0}{\sigma/\sqrt{N}}$, where $z \sim N(0, 1)$.
- When $N < 30$ and the variance is unknown, we use the test statistic $t = \dfrac{\bar x - \mu_0}{s/\sqrt{N}}$, where $t \sim t_\nu$ and $\nu = N - 1$.
- For hypothesis tests concerning the population proportion, we use the test statistic $z = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/N}}$, where $z \sim N(0, 1)$.

### Power

Power can be calculated by noting that for a one-tailed test it is simply the p-value for our test statistic calculated using the true mean $\mu$ instead of $\mu_0$. For two-tailed tests the formula is a little more complicated. In general we can write the formulae for the Type II error $\beta$ as follows.

Upper-tailed tests:

$$\beta = P\left(Z < z_\alpha + \frac{\mu_0 - \mu}{\sigma/\sqrt{N}}\right)$$

Lower-tailed tests:

$$\beta = 1 - P\left(Z < -z_\alpha + \frac{\mu_0 - \mu}{\sigma/\sqrt{N}}\right)$$

Two-tailed tests:

$$\beta = P\left(Z < z_{\alpha/2} + \frac{\mu_0 - \mu}{\sigma/\sqrt{N}}\right) - P\left(Z < -z_{\alpha/2} + \frac{\mu_0 - \mu}{\sigma/\sqrt{N}}\right)$$

where $\alpha$ is a preset Type I error rate, usually 0.05.

For population proportions the Type II error rate is as follows.

Upper-tailed tests:

$$\beta = P\left(Z < \frac{p_0 - p + z_\alpha\sqrt{p_0(1-p_0)/N}}{\sqrt{p(1-p)/N}}\right)$$

Lower-tailed tests:

$$\beta = 1 - P\left(Z < \frac{p_0 - p - z_\alpha\sqrt{p_0(1-p_0)/N}}{\sqrt{p(1-p)/N}}\right)$$

Two-tailed tests:

$$\beta = P\left(Z < \frac{p_0 - p + z_{\alpha/2}\sqrt{p_0(1-p_0)/N}}{\sqrt{p(1-p)/N}}\right) - P\left(Z < \frac{p_0 - p - z_{\alpha/2}\sqrt{p_0(1-p_0)/N}}{\sqrt{p(1-p)/N}}\right)$$

### Two sample cases

Often we want to compare two different parameters, such as two population means or proportions. In these cases we are typically interested in the difference between the parameters, $\mu_1 - \mu_2$ or $p_1 - p_2$. We can calculate confidence intervals or test statistics for these differences using $\bar x_1 - \bar x_2$ or $\hat p_1 - \hat p_2$. The problem arises when we need to find the variance and the underlying distribution required for the confidence interval or test statistic.

For large samples, $N > 30$, the variance we use is

$$\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2} .$$

For large samples we can use $s_1^2$ and $s_2^2$ in place of $\sigma_1^2$ and $\sigma_2^2$. This variance is substituted for $\sigma^2/N$ in the equations for confidence intervals and test statistics, just as $\bar x_1 - \bar x_2$ is substituted for $\bar x$.

For small samples we will assume that the variances are reasonably close and let the variance estimate be

$$\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2} .$$

The resulting underlying distribution will be a $t$ distribution with $\nu = N_1 + N_2 - 2$.

For proportions, in hypothesis testing the variance we use is

$$\hat p(1-\hat p)\left(\frac{1}{N_1} + \frac{1}{N_2}\right), \qquad \hat p = \frac{x_1 + x_2}{N_1 + N_2} .$$

For confidence intervals we use the variance

$$\frac{\hat p_1(1-\hat p_1)}{N_1} + \frac{\hat p_2(1-\hat p_2)}{N_2} .$$

### Paired Data

A special case arises when the comparison is being made on paired data. Paired data is data where the two observations are made on one subject; measurements before and after a treatment are an example. In this case our data becomes the difference of the two observations, and the differences are treated as a single sample.

### Some Comments on Testing

It is important to consider the difference between statistical and practical significance when reporting the results of an experiment. There can be a statistically significant difference that is not practically significant. In any test procedure we naturally want the most powerful test we can have. This is accomplished by minimizing the variance, which is done by increasing the sample size, improving the accuracy of our measurements, or a combination of the two. Often an increase in sample size is economically intractable, which underscores the need for care in measuring and collecting data.
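The method of moments estimators for the gamma distribution can be checked numerically. The sketch below is only an illustration: it assumes the rate parameterization used for the moments in these notes (mean $\alpha/\beta$, variance $\alpha/\beta^2$), and the data are simulated, not measured. The function name, the true parameter values, and the sample size are all hypothetical choices.

```python
import random
import statistics


def gamma_method_of_moments(sample):
    """Return (alpha_hat, beta_hat) for a gamma with mean a/b and variance a/b^2."""
    m1 = statistics.mean(sample)        # first sample moment
    var = statistics.pvariance(sample)  # second sample moment minus m1 squared
    alpha_hat = m1 ** 2 / var
    beta_hat = m1 / var
    return alpha_hat, beta_hat


# Simulated data with hypothetical true parameters (shape 3, rate 2).
random.seed(0)
TRUE_ALPHA, TRUE_BETA = 3.0, 2.0
# random.gammavariate takes shape and *scale*, so pass 1 / rate.
sample = [random.gammavariate(TRUE_ALPHA, 1.0 / TRUE_BETA) for _ in range(20000)]

alpha_hat, beta_hat = gamma_method_of_moments(sample)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

With a large simulated sample the two estimates should land close to the values used to generate the data, which is a quick sanity check on the algebra.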
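One way to approach the two confidence interval questions posed in the notes is sketched below, using the ten fall times from the Point Estimation section. It assumes the small-sample $t$ interval with $\nu = 9$ and takes the critical value 2.262 from a standard $t$ table. Mapping the interval endpoints through $g = 2y_0/t^2$ to get an interval for $g$ is an assumption of this sketch, not a method stated in the notes; it relies on that function being monotone in $t$.

```python
import math
import statistics

# Fall times in seconds from the handout; drop height y0 = 1 m.
times = [0.472, 0.462, 0.444, 0.450, 0.460, 0.478, 0.470, 0.439, 0.451, 0.459]
Y0 = 1.0


def g_from_time(t, y0=Y0):
    """Solve y0 = (1/2) g t^2 for g."""
    return 2.0 * y0 / t ** 2


n = len(times)
t_bar = statistics.mean(times)  # MLE and method of moments estimate of mu
s = statistics.stdev(times)     # sample standard deviation (N - 1 denominator)

# Invariance: the MLE of g is g(.) evaluated at the MLE of mu.
g_hat = g_from_time(t_bar)

# N < 30 and sigma unknown, so use the t interval with nu = N - 1 = 9.
T_CRIT = 2.262  # t_{9, 0.025}, read from a standard t table
half_width = T_CRIT * s / math.sqrt(n)
ci_time = (t_bar - half_width, t_bar + half_width)

# g decreases as t increases, so the endpoints swap when mapped to g.
ci_g = (g_from_time(ci_time[1]), g_from_time(ci_time[0]))

print(f"t_bar = {t_bar:.4f} s, s = {s:.4f} s, g_hat = {g_hat:.3f} m/s^2")
print(f"95% CI for mu: ({ci_time[0]:.4f}, {ci_time[1]:.4f}) s")
print(f"95% CI for g:  ({ci_g[0]:.3f}, {ci_g[1]:.3f}) m/s^2")
```

The interval describes the reliability of the procedure, as the invisible man story stresses, not a probability statement about $\mu$ or $g$ themselves.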
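The p-value and Type II error calculations can also be sketched with Python's standard library. The first part takes the test statistic $z = 1.77$ reported in the notes as given and recovers the upper-tail p-value. The second part implements the upper-tailed $\beta$ formula from the Power section; the true mean, the value of $\sigma$, and the sample sizes passed to it are hypothetical values chosen only to show how power changes with $N$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

# Upper-tailed p-value for the test statistic reported in the notes.
Z_OBSERVED = 1.77
p_value = 1.0 - STD_NORMAL.cdf(Z_OBSERVED)


def type2_error_upper(mu_true, mu_0, sigma, n, alpha=0.05):
    """Type II error of the upper-tailed z test when the true mean is mu_true."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    shift = (mu_0 - mu_true) / (sigma / math.sqrt(n))
    return STD_NORMAL.cdf(z_alpha + shift)


MU_0 = 0.452     # fall time for 1 m under Earth gravity, from the notes
MU_TRUE = 0.460  # hypothetical true mean fall time
SIGMA = 0.012    # hypothetical known standard deviation

beta_10 = type2_error_upper(MU_TRUE, MU_0, SIGMA, n=10)
beta_40 = type2_error_upper(MU_TRUE, MU_0, SIGMA, n=40)

print(f"p-value for z = {Z_OBSERVED}: {p_value:.3f}")
print(f"N = 10: beta = {beta_10:.3f}, power = {1 - beta_10:.3f}")
print(f"N = 40: beta = {beta_40:.3f}, power = {1 - beta_40:.3f}")
```

Comparing the two sample sizes illustrates the closing comment of the notes: holding $\sigma$ fixed, a larger $N$ shrinks $\sigma/\sqrt{N}$ and raises the power.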
