# STATISTICAL ANALYSIS EXST 2201

LSU

This 69 page Class Notes was uploaded by Mrs. Hailee Fadel on Tuesday October 13, 2015. The Class Notes belongs to EXST 2201 at Louisiana State University taught by M. McKenna in Fall. Since its upload, it has received 91 views. For similar materials see /class/223044/exst-2201-louisiana-state-university in Statistics and Probability at Louisiana State University.

Date Created: 10/13/15

are normally distributed with $\sigma = 1.2$ minutes. In a study, the chain reconfigured five restaurants to have a single line and measured the wait times for 50 randomly selected customers. The sample standard deviation was determined to be $s = 0.84$ minute. Does the evidence indicate that the variability in wait time is less for a single line than for multiple lines at the $\alpha = 0.05$ level of significance? (Answer: Reject $H_0$.)

**Heights of Baseball Players.** Data obtained from the National Center for Health Statistics show that men between the ages of 20 and 29 have a mean height of 69.3 inches, with a standard deviation of 2.9 inches. A baseball analyst believes that the standard deviation of heights of major-league baseball players is less than 2.9 inches. The heights (in inches) of 20 randomly selected players are shown in the table.

72 74 71 70 77 75 77 72 75 73 75 73 (Source: espn.com)

(a) Verify that the data are normally distributed by drawing a normal probability plot.
(b) Compute the sample standard deviation. (Answer: $s = 2.059$ inches.)
(c) Test the belief at the $\alpha = 0.01$ level of significance.

**NCAA Softball.** NCAA rules require the circumference of a softball to be $12 \pm 0.125$ inches. A softball manufacturer bidding on an NCAA contract is shown to meet the requirements for mean circumference. Suppose the NCAA also requires that the standard deviation of the softball circumferences not exceed 0.05 inch. A representative from the NCAA believes the manufacturer does not meet this requirement. She collects a random sample of 20 softballs from the production line and finds that $s = 0.09$ inch. Do you believe the balls conform? Use the $\alpha = 0.05$ level of significance. (Answer: Reject $H_0$.)

**P-Values.** Determine the exact P-value of the hypothesis test in Problem 9.

**P-Values.** Determine the exact P-value of the hypothesis test in Problem 10. (Answer: P-value = 0.5199.)

## Comparing Three or More Means (One-Way Analysis of Variance)

**Preparing for This Section.** Before getting started, review the
following:

- Completely randomized design (Section 1.5, pp. 42-43)
- Nature of hypothesis testing (Section 10.1, pp. 454-460)
- Comparing two population means (Section 11.2, pp. 521-528)
- Normal probability plots (Section 7.4, pp. 354-358)
- Boxplots (Section 3.5, pp. 161-164)

**Objectives**

1. Verify the requirements to perform a one-way ANOVA.
2. Test hypotheses regarding three or more means using one-way ANOVA.

**In Other Words.** In ANOVA, the null hypothesis is always that the means of the different populations are equal. The alternative hypothesis is always that the mean of at least one population is different from the others.

In Section 11.2, we compared two population means. Just as we extended the concept of comparing two population proportions (Section 11.3) to comparing three or more population proportions (Tests for Homogeneity of Proportions, Section 12.2), we now extend the concept of comparing two population means to comparing three or more population means. The procedure for doing this is called Analysis of Variance, or ANOVA for short.

**Definition.** Analysis of Variance (ANOVA) is an inferential method that is used to test the equality of three or more population means.

**CAUTION.** Do not test $H_0: \mu_1 = \mu_2 = \mu_3$ by conducting three separate hypothesis tests, because the probability of making a Type I error will be much higher than $\alpha$.

**CAUTION.** It is vital that individuals be randomly assigned to treatments.

For example, a family doctor might want to show that the mean HDL (so-called good) cholesterol levels of males in the age groups 20 to 29 years old, 40 to 49 years old, and 60 to 69 years old are different. To conduct a hypothesis test, we assume the mean HDL cholesterol of each age group is the same. If we call the 20- to 29-year-olds population 1, the 40- to 49-year-olds population 2, and the 60- to 69-year-olds population 3, our null hypothesis would be

$$H_0: \mu_1 = \mu_2 = \mu_3$$

versus the alternative hypothesis $H_1$:
At least one of the population means is different from the others.

As another example, a medical researcher might want to compare the effect different levels of an experimental drug have on hair growth. The researcher might randomly divide a group of subjects into three different treatment groups. Group 1 might receive a placebo once a day, group 2 might receive 50 mg of the experimental drug once a day, and group 3 might receive 100 mg of the experimental drug once a day. The researcher then compares the mean numbers of new hair follicles for each of the three treatment groups. The three different treatment groups correspond to three different populations.

It is tempting to test the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$ by comparing the population means two at a time using the techniques introduced in Section 11.2. If we proceeded this way, we would need to test three different hypotheses:

$$H_0: \mu_1 = \mu_2 \qquad H_0: \mu_1 = \mu_3 \qquad H_0: \mu_2 = \mu_3$$

Each test would have a probability of Type I error (rejecting the null hypothesis when it is true) of $\alpha$. If we used an $\alpha = 0.05$ level of significance, each test would have a 95% probability of not rejecting the null hypothesis when the null hypothesis is true (that is, a 95% probability of making a correct decision). The probability that all three tests correctly fail to reject the null hypothesis is $(0.95)^3 \approx 0.86$, assuming the tests are independent. There is a $1 - (0.95)^3 = 1 - 0.86 = 0.14$, or 14%, probability that at least one test will lead to an incorrect rejection of $H_0$. A 14% probability of a Type I error is much higher than the desired 5% probability. As the number of populations to be compared increases, the probability of making a Type I error using multiple t-tests for a given value of $\alpha$ also increases.

To address this problem, Sir Ronald A. Fisher (1890-1962) introduced the method of analysis of variance. Although it seems strange to name a procedure that is used to compare means analysis of variance, the justification for the name will become clear as we see the procedure in action.

The procedure used in this section is called one-way analysis of variance because there is only one factor that distinguishes the various populations in the study. For example, in comparing the mean HDL cholesterol levels of males, the only factor that distinguishes the three groups is age. In comparing the effectiveness of the hair growth drug, the only factor that distinguishes the three groups is the amount of the experimental drug received (0 mg, 50 mg, or 100 mg).

In performing one-way analysis of variance, care must be taken so that the subjects are similar in all characteristics except for the level of the treatment. Fisher stated that this is easiest to accomplish through randomization, which neutralizes the effect of the uncontrolled variables. In the hair growth example, the subjects should be made similar in terms of eating habits, age, and so on, by randomly assigning the subjects to the three groups.

### Verify the Requirements to Perform a One-Way ANOVA

To perform a one-way ANOVA test, certain requirements must be satisfied.

**Requirements of a One-Way ANOVA Test**

1. There are $k$ simple random samples from $k$ populations.
2. The $k$ samples are independent of each other; that is, the subjects in one group cannot be related in any way to subjects in a second group.
3. The populations are normally distributed.
4. The populations have the same variance; that is, each treatment group has population variance $\sigma^2$.

Suppose we are testing a hypothesis regarding $k = 3$ population means, so that the null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3$ and the alternative hypothesis is $H_1$: at least one of the population means is different from the others. Figure 1(a) shows the distribution of each population if the null hypothesis is true, and Figure 1(b) shows what the distribution of each population might look like if the alternative hypothesis is true.

The methods of one-way ANOVA are robust, so small departures from the requirement of normality will not significantly
affect the results of the procedure. In addition, the requirement of equal population variances does not need to be strictly adhered to, especially if the sample size for each treatment group is the same. Therefore, it is worthwhile to design an experiment in which the samples from the populations are roughly equal in size.

**In Other Words.** Try to design experiments that use ANOVA so that each treatment group has the same sample size.

We can verify the requirement of normality by constructing normal probability plots. The requirement of equal population variances is more difficult to verify. However, a general rule of thumb is as follows.

**Verifying the Requirement of Equal Population Variances.** The one-way ANOVA procedures may be used provided that the largest sample standard deviation is no more than two times larger than the smallest sample standard deviation.

#### EXAMPLE 1: Testing the Requirements of One-Way ANOVA

**Problem:** Researcher Jelodar Gholamali wanted to determine the effectiveness of fenugreek, garlic, and onion as treatments for diabetes in rats. Group 1 rats (the control group) were served a regular diet. Group 2 rats were served a regular diet supplemented with a herb, fenugreek. Group 3 rats were served a regular diet supplemented with garlic. Group 4 rats were served a regular diet supplemented with onion. The basis for the study is that Persian folklore holds that diets supplemented with fenugreek, garlic, or onion help to treat diabetes. After 15 days of treatment, the blood glucose was measured in milligrams per deciliter (mg/dL). The results presented in Table 1 are based on the results published in the article. Verify that the requirements to perform a one-way ANOVA are satisfied.

Table 1: Blood glucose (mg/dL)

| Control | Fenugreek | Garlic | Onion |
|---|---|---|---|
| 288.1 | 229.1 | 177.4 | 299.7 |
| 296.8 | 240.7 | 202.2 | 258.3 |
| 267.3 | 239.4 | 163.1 | 236.3 |
| 256.7 | 207.7 | 184.7 | 244.0 |
| 292.1 | 225.7 | 197.9 | 267.1 |
| 232.9 | 230.8 | 164.6 | 297.1 |
| 260.3 | 206.6 | 193.9 | 249.9 |
| 283.8 | 213.3 | 158.1 | 265.1 |

Source: Jelodar Gholamali A. et al., "Effect of Fenugreek, Onion and Garlic on Blood Glucose and Histopathology of Pancreas of Alloxan-Induced Diabetic Rats," Indian Journal of Medical Sciences, 2005; 59: 64-69.

**Approach:** We must verify the
previously listed requirements.

**Solution:**

1. The rats were randomly assigned to each treatment group.
2. None of the subjects selected is related in any way, so the samples are independent.
3. Figure 2 shows the normal probability plots for all four treatment groups. All the normal probability plots are roughly linear, so we conclude that the sample data come from populations that are approximately normally distributed.
4. The sample standard deviations for each sample are computed using MINITAB and presented in Figure 3. The largest standard deviation is 21.18 mg/dL, and the smallest standard deviation is 13.49 mg/dL. Because the largest standard deviation is not more than two times larger than the smallest standard deviation ($2 \times 13.49 = 26.98 > 21.18$), the requirement of equal population variances is satisfied.

[Figure 2: Normal probability plots for the control, fenugreek, garlic, and onion groups]

Figure 3: Descriptive Statistics: Control, Fenugreek, Garlic, Onion (MINITAB output; the minimum, quartile, and maximum columns are not legible in the source)

| Variable | N | Mean | SE Mean | StDev |
|---|---|---|---|---|
| Control | 8 | 278.56 | 5.31 | 15.03 |
| Fenugreek | 8 | 224.16 | 4.77 | 13.49 |
| Garlic | 8 | 180.24 | 6.03 | 17.06 |
| Onion | 8 | 271.00 | 7.49 | 21.18 |

Because all four requirements are satisfied, we can perform a one-way ANOVA.

### Test Hypotheses Regarding Three or More Means Using One-Way ANOVA

The computations in analysis of variance are virtually always done with statistical software or a graphing calculator. When using software, it is easiest to use the P-value approach, because the decision rule is always the same regardless of the type of hypothesis being tested.

**Decision Rule in the One-Way ANOVA Test.** If the P-value is less than the level of significance, $\alpha$, reject the null hypothesis.

#### EXAMPLE 2: Performing One-Way ANOVA Using Technology

**Problem:** The researcher in Example 1 wishes to determine if there is a difference in the mean glucose among the four treatment groups at the $\alpha = 0.05$ level of significance.

**Approach:** We will use MINITAB, Excel, and a TI-84 Plus graphing calculator to test the hypothesis. If the P-value is less than the level of
significance, we reject the null hypothesis.

**Result:** The researcher wants to show there is a difference in the mean glucose among the four treatment groups. The null hypothesis is that there is no difference; that is, the mean glucose among the four treatment groups is the same. So the null hypothesis is

$$H_0: \mu_{\text{control}} = \mu_{\text{fenugreek}} = \mu_{\text{garlic}} = \mu_{\text{onion}}$$

versus the alternative hypothesis $H_1$: at least one of the population means is different from the others.

Figure 4(a) shows the output from MINITAB, Figure 4(b) shows the output from Excel, and Figure 4(c) shows the output from a TI-84 Plus graphing calculator.

Figure 4(a): MINITAB output, One-Way ANOVA: Control, Fenugreek, Garlic, Onion

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Factor | 3 | 50091 | 16697 | 58.21 | 0.000 |
| Error | 28 | 8032 | 287 | | |
| Total | 31 | 58122 | | | |

S = 16.94, R-Sq = 86.18%, R-Sq(adj) = 84.70%, Pooled StDev = 16.94. The output also lists N = 8 with the sample mean and standard deviation for each level, along with individual 95% confidence intervals for each mean based on the pooled standard deviation.

[Figure 4(b): Excel output (Anova: Single Factor)]

[Figure 4(c): TI-84 Plus output (One-way ANOVA)]

Notice that MINITAB indicates that the P-value is 0.000. This does not mean that the P-value is 0, but instead means that the P-value is less than 0.0001. The output of Excel and the TI-84 Plus confirm this by indicating the P-value is $3.7 \times 10^{-12}$. Because the P-value is less than the level of significance, we reject the null hypothesis. There is sufficient evidence to conclude that at least one of the population means for glucose levels is different from the others.

Whenever you perform analysis of variance, it is always a good idea to present visual evidence that supports the conclusions of the test. Side-by-side boxplots are a great way to help see the results of the ANOVA procedure. Figure 5 shows the side-by-side boxplots of the data presented in Table 1. The boxplots support the
ANOVA results from Example 2.

[Figure 5: Side-by-side boxplots of blood glucose levels (mg/dL) for the control, fenugreek, garlic, and onion groups]

Now Work Problems 11(a)-(d).

### A Conceptual Understanding of One-Way ANOVA

Look again at Figure 4. You may have noticed that each of the three outputs included an F-value, $F = 58.21$. We now illustrate the idea behind the F-test statistic.

Remember, in testing any hypothesis, the null hypothesis is assumed to be true until the evidence indicates otherwise. In testing the hypothesis regarding $k$ population means, we assume that $\mu_1 = \mu_2 = \cdots = \mu_k = \mu$. That is, we assume that all $k$ samples come from the same normal population whose mean is $\mu$ and variance is $\sigma^2$. Table 2 shows the statistics that result by sampling from each of the $k$ populations.

Table 2

| Population | Sample Size | Sample Mean | Sample Standard Deviation |
|---|---|---|---|
| 1 | $n_1$ | $\bar{x}_1$ | $s_1$ |
| 2 | $n_2$ | $\bar{x}_2$ | $s_2$ |
| 3 | $n_3$ | $\bar{x}_3$ | $s_3$ |
| ... | ... | ... | ... |
| $k$ | $n_k$ | $\bar{x}_k$ | $s_k$ |

The computation of the F-test statistic requires that we understand mean squares. A mean square is an average (mean) of squared values. For example, any variance is a mean square. The F-test statistic is the ratio of two mean squares.

If the null hypothesis is true, then each treatment group comes from the same population whose mean is $\mu$ and whose variance is $\sigma^2$. The sample mean of the entire set of data (all treatment groups combined) is a good estimate of $\mu$. We will call this sample mean $\bar{x}$. Similarly, the sample mean of Sample 1 (or treatment 1) will be $\bar{x}_1$, the sample mean of Sample 2 (or treatment 2) will be $\bar{x}_2$, and so on.

Finding an estimate of $\sigma^2$ is somewhat more complicated. One approach is to estimate $\sigma^2$ by computing a measure of variation in sample means from one treatment group to the next, weighted by the sample size of the corresponding treatment group. We call this value the mean square due to treatment, denoted MST. It is computed as follows:

$$\text{MST} = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \cdots + n_k(\bar{x}_k - \bar{x})^2}{k - 1} \tag{1}$$

If the null hypothesis is true, then MST is an unbiased estimator of $\sigma^2$, the variance of the
population.

A second approach to estimating $\sigma^2$ is to compute the sample variance for each sample (or treatment) and then to find a weighted average of the sample variances. We call this the mean square due to error, denoted MSE. The mean square due to error is an unbiased estimator of $\sigma^2$ whether or not the null hypothesis is true. It is computed as follows:

$$\text{MSE} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{n - k} \tag{2}$$

where $n$ is the total size of the sample. In other words, $n = n_1 + n_2 + \cdots + n_k$.

The F-test statistic is the ratio of the two estimates of $\sigma^2$:

$$F = \frac{\text{mean square due to treatment}}{\text{mean square due to error}} = \frac{\text{MST}}{\text{MSE}}$$

If the null hypothesis is true, both MST and MSE provide unbiased estimates for $\sigma^2$, the population variance. So, if the null hypothesis is true, we would expect the F-test statistic to be close to 1. However, if the null hypothesis is not true, at least one of the sample means from a treatment group will be far away from $\bar{x}$, the sample mean of the entire data set. This will cause MST to be large relative to MSE, which ultimately leads to an F-test statistic substantially larger than 1.

We now present the steps to be used in the computation of the F-test statistic.

**Computing the F-Test Statistic**

- Step 1: Compute the sample mean of the combined data set by adding up all the observations and dividing by the number of observations. Call this value $\bar{x}$.
- Step 2: Find the sample mean for each sample (or treatment). Let $\bar{x}_1$ represent the sample mean of sample 1, $\bar{x}_2$ represent the sample mean of sample 2, and so on.
- Step 3: Find the sample variance for each sample (or treatment). Let $s_1^2$ represent the sample variance for sample 1, $s_2^2$ represent the sample variance for sample 2, and so on.
- Step 4: Compute the mean square due to treatment, MST.
- Step 5: Compute the mean square due to error, MSE.
- Step 6: Compute the F-test statistic: $F = \dfrac{\text{mean square due to treatment}}{\text{mean square due to error}} = \dfrac{\text{MST}}{\text{MSE}}$.

#### EXAMPLE 3: Computing the F-Test Statistic

**Problem:** Compute the F-test statistic for the data presented in Example 1.
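
The six steps above can also be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the textbook's own procedure: the function name `f_test_statistic` is hypothetical, and the inputs are the summary statistics that the worked solution of this example reports (eight rats per group, the four group means, the four group variances, and the overall mean 238.54 from Step 1). The onion-group variance is taken as 448.48, the value used in Step 5.

```python
def f_test_statistic(sizes, means, variances, grand_mean=None):
    """Return (MST, MSE, F) for a one-way ANOVA from summary statistics.

    Hypothetical helper for illustration; not part of any library.
    """
    n = sum(sizes)   # total number of observations
    k = len(sizes)   # number of treatment groups
    if grand_mean is None:
        # Step 1: overall mean as the size-weighted average of group means
        grand_mean = sum(ni * xi for ni, xi in zip(sizes, means)) / n
    # Step 4: mean square due to treatment (between-group variation)
    ss_treatment = sum(ni * (xi - grand_mean) ** 2
                       for ni, xi in zip(sizes, means))
    mst = ss_treatment / (k - 1)
    # Step 5: mean square due to error (pooled within-group variation)
    ss_error = sum((ni - 1) * vi for ni, vi in zip(sizes, variances))
    mse = ss_error / (n - k)
    # Step 6: F-test statistic
    return mst, mse, mst / mse


# Summary statistics as reported in the worked solution (assumed inputs)
sizes = [8, 8, 8, 8]                           # control, fenugreek, garlic, onion
means = [278.56, 224.16, 180.24, 271.00]       # Step 2
variances = [225.77, 181.99, 291.03, 448.48]   # Step 3 (448.48 as used in Step 5)

mst, mse, f = f_test_statistic(sizes, means, variances, grand_mean=238.54)
print(f"MST = {mst:.2f}, MSE = {mse:.2f}, F = {f:.2f}")
```

Passing the overall mean explicitly keeps the sketch aligned with the hand computation, which uses the rounded value 238.54; if `grand_mean` is omitted, the function derives it from the group means instead, and the F value changes only in later decimal places.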
**Approach:** We follow Steps 1-6 just presented.

**Solution:**

Step 1: Compute the mean of the entire data set.

$$\bar{x} = \frac{288.1 + 296.8 + \cdots + 249.9 + 265.1}{32} = \frac{7633.30}{32} = 238.54$$

Step 2: Find the sample mean of each treatment. Call the control group population 1, the fenugreek group population 2, the garlic group population 3, and the onion group population 4. Then

$$\bar{x}_1 = \frac{288.1 + 296.8 + \cdots + 283.8}{8} = 278.56 \qquad \bar{x}_2 = \frac{229.1 + 240.7 + \cdots + 213.3}{8} = 224.16$$

$$\bar{x}_3 = \frac{177.4 + 202.2 + \cdots + 158.1}{8} = 180.24 \qquad \bar{x}_4 = \frac{299.7 + 258.3 + \cdots + 265.1}{8} = 271.00$$

Step 3: Find the sample variance for each treatment group.

$$s_1^2 = \frac{(288.1 - 278.56)^2 + (296.8 - 278.56)^2 + \cdots + (283.8 - 278.56)^2}{8 - 1} = 225.77$$

$$s_2^2 = \frac{(229.1 - 224.16)^2 + (240.7 - 224.16)^2 + \cdots + (213.3 - 224.16)^2}{8 - 1} = 181.99$$

$$s_3^2 = \frac{(177.4 - 180.24)^2 + (202.2 - 180.24)^2 + \cdots + (158.1 - 180.24)^2}{8 - 1} = 291.03$$

$$s_4^2 = \frac{(299.7 - 271)^2 + (258.3 - 271)^2 + \cdots + (265.1 - 271)^2}{8 - 1} = 448.48$$

Step 4: Compute MST.

$$\text{MST} = \frac{8(278.56 - 238.54)^2 + 8(224.16 - 238.54)^2 + 8(180.24 - 238.54)^2 + 8(271 - 238.54)^2}{4 - 1} = \frac{50{,}087.4112}{3} = 16{,}695.80$$

Step 5: Compute MSE.

$$\text{MSE} = \frac{(8 - 1)(225.77) + (8 - 1)(181.99) + (8 - 1)(291.03) + (8 - 1)(448.48)}{32 - 4} = \frac{8030.89}{28} = 286.82$$

Step 6: Compute the F-test statistic.

$$F = \frac{\text{mean square due to treatment}}{\text{mean square due to error}} = \frac{\text{MST}}{\text{MSE}} = \frac{16{,}695.80}{286.82} = 58.21$$

This is the same result provided by the statistical software and graphing calculator in Example 2.

**In Other Words.** If we reject the null hypothesis when using ANOVA, we are rejecting the assumption that the population means are all equal. However, the test doesn't tell us which means differ.

Looking at Formula (1), we see that if the null hypothesis is true, the overall mean $\bar{x}$ should be close to the mean computed from each sample (treatment), $\bar{x}_1$, $\bar{x}_2$, and so on. If one or more of the means computed from each sample is substantially different from the overall mean, MST will be large, which in turn makes the F-statistic large. In Example 3, we notice that the control group has a sample mean much larger than the overall mean ($\bar{x}_1 = 278.56$ versus $\bar{x} = 238.54$), and the garlic group has a sample mean much smaller than the overall mean ($\bar{x}_3 = 180.24$ versus $\bar{x} = 238.54$).

The results of the computations that lead to the F-test statistic are presented in an ANOVA table, the form of which is shown in Table 3.

Table 3

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F-Test Statistic |
|---|---|---|---|---|
| Treatment | 50,087.41 | $k - 1 = 4 - 1 = 3$ | 16,695.80 | 58.21 |
| Error | 8030.89 | $n - k = 32 - 4 = 28$ | 286.82 | |
| Total | 58,118.3 | $n - 1 = 32 - 1 = 31$ | | |

Notice that the sum of squares treatment is the numerator of the computation for the mean square due to treatment. The sum of squares error is the numerator of the computation for the mean square due to error. Each entry in the mean square column is the sum of squares divided by the corresponding degrees of freedom. In addition,

Sum of squares total = sum of squares treatment + sum of squares error

This result is a consequence of the requirement of independence of the observations within the groups.

If we were conducting this ANOVA test by hand, we would compare the F-test statistic with a critical F-value. The critical F-value is the F-value whose area in the right tail is $\alpha$, with $k - 1$ degrees of freedom in the numerator and $n - k$ degrees of freedom in the denominator. The critical F-value for the claim made in Example 1 at the $\alpha = 0.05$ level of significance is $F_{0.05,3,28} \approx 2.99$. Because the F-test statistic, 58.21, is greater than the critical F, we reject the null hypothesis. See Figure 6.

[Figure 6: F-distribution with right-tail area 0.05 beyond the critical value $F_{0.05,3,28} \approx 2.99$]

Suppose the null hypothesis of equal population means is rejected. This conclusion indicates that at least one population mean is different from the others, but we don't know which one. We can determine which population means differ using Tukey's test, which is not discussed in this text.

### Concepts and Vocabulary

1. The acronym ANOVA stands for ______.
2. What are the requirements to perform a one-way ANOVA? Is the test robust?
3. What is the mean square due to treatment estimate of $\sigma^2$? What is the mean square due to error estimate of $\sigma^2$?
4. Why does a large value of the F-statistic provide evidence against the null hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$?

### Skill Building

In Problems 5 and 6, fill in the ANOVA table.

Problem 5

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F-Test Statistic |
|---|---|---|---|---|
| Treatment | 387 | 2 | | |
| Error | 8042 | 27 | | |
| Total | | | | |

Problem 6

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F-Test Statistic |
|---|---|---|---|---|
| Treatment | 2814 | 3 | | |
| Error | 4915 | 36 | | |
| Total | | | | |

In Problems 7 and 8, determine the F
test statistic based on the given summary statistics.

Problem 7

| Population | Sample Size | Sample Mean | Sample Variance |
|---|---|---|---|
| 1 | 10 | 40 | 48 |
| 2 | 10 | 42 | 31 |
| 3 | 10 | 44 | 25 |

Problem 8

| Population | Sample Size | Sample Mean | Sample Variance |
|---|---|---|---|
| 1 | 15 | 105 | 34 |
| 2 | 15 | 110 | 40 |
| 3 | 15 | 108 | 30 |
| 4 | 15 | 90 | 38 |

9. The following data represent a simple random sample of $n = 4$ from three populations that are known to be normally distributed. Verify that the F-test statistic is 2.04.

| Sample 1 | Sample 2 | Sample 3 |
|---|---|---|
| 28 | 22 | 25 |
| 23 | 25 | 24 |
| 30 | 17 | 19 |
| 27 | 23 | 30 |

10. The following data represent a simple random sample of $n = 5$ from three populations that are known to be normally distributed. Verify that the F-test statistic is 2.599.

Sample 1, Sample 2, Sample 3 (values as legible in the source): 73 67 72 82 77 80 ... 81 67 77 97 83 96

### Applying the Concepts

11. **Corn Production.** The data in the table represent the number of corn plants in randomly sampled rows (a 17-foot by 5-inch strip) for various types of plot. An agricultural researcher wants to know whether the mean numbers of plants for each plot type are different.

| Plot Type | Number of Plants |
|---|---|
| Sludge plot | 25 27 33 30 28 27 |
| Spring disk | 32 30 33 35 34 34 |
| No till | 30 26 29 32 25 29 |

Source: Andrew Dieter and Brad Schmidgall, Joliet Junior College

(a) Write the null and alternative hypotheses.
(b) State the requirements that must be satisfied to use the one-way ANOVA procedure.
(c) Use the following partial MINITAB output to test the hypothesis at the $\alpha = 0.05$ level of significance. (Answer: Reject $H_0$.)

One-way ANOVA: Sludge Plot, Spring Disk, No Till

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Factor | 2 | 84.11 | 42.06 | 7.10 | 0.007 |
| Error | 15 | 88.83 | 5.92 | | |
| Total | 17 | 172.94 | | | |

(d) Shown are side-by-side boxplots of each type of plot. Do these boxplots support the results obtained in part (c)?

[Figure: Side-by-side boxplots of number of plants by plot type (No Till, Spring Disk, Sludge Plot)]

(e) Verify that the F-test statistic is 7.10.

12. **Soybean Yield.** The data in the table represent the number of pods on a random sample of soybean plants for various plot types. An agricultural researcher wants to know whether the mean numbers of pods for each plot type are
different.

| Plot Type | Pods |
|---|---|
| Liberty | 32 31 36 35 41 34 39 37 38 |
| No till | 34 30 31 27 40 33 37 42 39 |
| Chisel plowed | 34 37 24 23 32 33 27 34 30 |

Source: Andrew Dieter and Brad Schmidgall, Joliet Junior College

(a) Write the null and alternative hypotheses.
(b) State the requirements that must be satisfied to use the one-way ANOVA procedure.
(c) Use the following MINITAB output to test the hypothesis at the $\alpha = 0.05$ level of significance. (Answer: Reject $H_0$.)

One-way ANOVA: Liberty, No Till, Chisel Plowed

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Factor | 2 | 149.0 | 74.5 | 3.77 | 0.038 |
| Error | 24 | 474.7 | 19.8 | | |
| Total | 26 | 623.6 | | | |

(d) Shown are side-by-side boxplots of each type of plot. Do these boxplots support the results obtained in part (c)?

[Figure: Side-by-side boxplots by plot type (Chisel Plowed, No Till, Liberty); horizontal axis labeled Number of Plants]

(e) Verify that the F-test statistic is 3.77.
(f) Based on the boxplots, which type of plot appears to have a significantly different mean number of plants? (Answer: Liberty or No Till.)

13. **Births by Day of Week.** An obstetrician knew that there were more live births during the week than on weekends. She wanted to determine whether the mean number of births was different for each of the five days of the week. She randomly selected eight dates for each of the five days of the week and obtained the following data.

| Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|
| 10,456 | 11,621 | 11,084 | 11,171 | 11,545 |
| 10,023 | 11,944 | 11,570 | 11,745 | 12,321 |
| 10,691 | 11,045 | 11,346 | 12,023 | 11,749 |
| 10,283 | 12,927 | 11,875 | 12,433 | 12,192 |
| 10,265 | 12,577 | 12,193 | 12,132 | 12,422 |
| 11,189 | 11,753 | 11,593 | 11,903 | 11,627 |
| 11,198 | 12,509 | 11,216 | 11,233 | 11,624 |
| 11,465 | 13,521 | 11,818 | 12,543 | 12,543 |

Source: National Center for Health Statistics

(a) Write the null and alternative hypotheses.
(b) State the requirements that must be satisfied to use the one-way ANOVA procedure.
(c) Use the following MINITAB output to determine if the number of births differs by day of the week, using the $\alpha = 0.01$ level of significance.

One-way ANOVA: Mon, Tues, Wed, Thurs, Fri

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Factor | 4 | 11,507,633 | 2,876,908 | 9.80 | 0.000 |
| Error | 35 | 10,270,781 | 293,451 | | |
| Total | 39 | 21,778,414 | | | |

(d) Shown are
side-by-side boxplots for each day of the week. Do these boxplots support the results obtained in part (c)?

[Figure: Side-by-side boxplots of number of births (10,000 to 13,000) for Monday through Friday]

(e) Verify that the F-test statistic is 9.80.
(f) Based on the boxplots, which day of the week appears to have a significantly different number of births? (Answer: Monday.)

14. **Punkin Chunkin.** The World Championship Punkin Chunkin contest is held every fall in Millsboro, Delaware. Contestants build devices meant to hurl 8- to 10-pound pumpkins across a field. One class of entry is the air cannon, which must use compressed air to fire a pumpkin. The following data represent a simple random sample of distances that pumpkins have traveled (in feet) for the years 2001 to 2004. Is there evidence to conclude that the mean distance that a pumpkin is fired is different for the various years?

[Table: distances by year, 2001 to 2004; not legible in the source]

Source: www.punkinchunkin.com/main.htm

(a) Write the null and alternative hypotheses.
(b) State the requirements that must be satisfied to use the one-way ANOVA procedure.
(c) Use the following MINITAB output to test the claim at the $\alpha = 0.05$ level of significance.

One-way ANOVA: 2001, 2002, 2003, 2004

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Factor | 3 | 659,242 | 219,747 | 2.82 | 0.065 |
| Error | 20 | 1,559,456 | | | |
| Total | 23 | 2,218,698 | | | |

(d) Shown are side-by-side boxplots for each year. Do these boxplots support the results obtained in part (c)?

[Figure: Boxplot of 2001, 2002, 2003, 2004; horizontal axis is distance (feet), 3000 to 4600]

(e) Verify that the F-test statistic is 2.82.

15. **Rates of Return.** A stock analyst wondered whether the mean rate of return of financial, energy, and utility stocks differed over the past 5 years. He obtained a simple random sample of eight companies from each of the three sectors and obtained the 5-year rates of return shown in the following table (in percent).

[Table: 5-year rates of return by sector; digits as legible in the source: 1701 643 1346 507 1119 990 1950 12979 395 816 2073 344 1038 960 711 6 75 740 1570]

Source: Morningstar.com

(Answer to 15(a): $H_0$: the three sector means are equal versus $H_1$: at least one mean is not equal.)

(a) State the null and alternative hypotheses.
(b) Verify that the requirements to use the one-way ANOVA procedure are satisfied. Normal probability plots indicate that the sample data come from normal populations.
(c) Test whether the mean rates of return are different at the $\alpha = 0.05$ level of significance. (Answer: Do not reject $H_0$.)
(d) Draw boxplots of the three sectors to support the results obtained in part (c).

16. **Reaction Time.** In an online psychology experiment sponsored by the University of Mississippi, researchers asked study participants to respond to various stimuli. Participants were randomly assigned to one of three groups. Subjects in the first two groups responded either to any stimulus presented or to a particular stimulus while disregarding other stimuli. Subjects in group 3 were in the choice group. They needed to respond differently depending on the stimuli presented. Depending on the type of whistle sound, the subject must press a certain button. The reaction time (in seconds) for each stimulus is presented in the table.

[Table: reaction times by group; values legible in the source: 0.588 0.375 0.409 0.613 0.401 0.355 0.725]

Source: PsychExperiments, The University of Mississippi

The researcher wants to determine if the mean reaction times differ across the three stimuli.

(a) State the null and alternative hypotheses.
(b) Verify that the requirements to use the one-way ANOVA procedure are satisfied. Normal probability plots indicate that the sample data come from a normal population.
(c) Test whether the mean reaction times to the three stimuli differ at the $\alpha = 0.05$ level of significance. (Answer: Do not reject $H_0$.)
(d) Draw boxplots of the three stimuli to support the analytic results obtained in part (c).

17. **Crash Data.** The Insurance Institute for Highway Safety conducts experiments in which cars are crashed into a fixed barrier at 40 mph. In the Institute's 40-mph offset test, 40% of the total width of each vehicle strikes a barrier on the driver's side. The barrier's deformable face is made of aluminum honeycomb, which makes the forces in the test similar to those involved in a frontal offset crash between two vehicles of the same weight, each going just less than 40 mph. Suppose you are in the market to
buy a new family car. You want to know whether the mean chest compression resulting from this offset crash is the same for large family cars, passenger vans, and midsize utility vehicles. The following data were collected from the Institute's study.

[Table: chest compression by vehicle class; only the entry "Honda Pilot" is legible in the source]

(Answer to 17(a): $H_0$: the three class means are equal versus $H_1$: at least one mean is not equal.)

(a) The researcher wants to know if the means for chest compression for each class of vehicle differ. State the null and alternative hypotheses.
(b) Verify that the requirements to use the one-way ANOVA procedure are satisfied. Normal probability plots indicate that the sample data come from normal populations.
(c) Test whether the mean chest compression for each vehicle type is different at the $\alpha = 0.01$ level of significance. (Answer: Do not reject $H_0$.)
(d) Draw boxplots of the three types of vehicle to support the analytic results obtained in part (c).

18. **Crash Data.** The Insurance Institute for Highway Safety conducts experiments in which cars are crashed into a fixed barrier at 40 mph. In the Institute's 40-mph offset test, 40% of the total width of each vehicle strikes a barrier on the driver's side. The barrier's deformable face is made of aluminum honeycomb, which makes the forces in the test similar to those involved in a frontal offset crash between two vehicles of the same weight, each going just less than 40 mph. Suppose you are in the market to buy a new family car. You want to know if the mean head injury resulting from this offset crash is the same for large family cars, passenger vans, and midsize utility vehicles. The following data were collected from the Institute's study.

[Table: head injury by vehicle class; entries legible in the source: Mazda MPV 693, Nissan Murano, Pontiac Grand Prix, Nissan 470, Kia Sorento]

(Answer to 18(a): $H_0$: the three class means are equal versus $H_1$: at least one of the means is not equal.)

(a) The researcher wants to know whether the means for head injury for each class of vehicle differ. State the null and alternative hypotheses.
(b) Verify that the requirements to use the one-way ANOVA procedure are satisfied. Normal probability plots indicate that the
sample data come from normal populations.
(c) Test whether the mean head injury for each vehicle type differs at the α = 0.01 level of significance. [Answer: Do not reject H0]
(d) Draw boxplots of the three vehicle types to support the analytic results obtained in part (c).

19. pH in Rain An environmentalist wanted to determine if the mean acidity of rain differed among Alaska, Florida, and Texas. He randomly selected six rain dates at each of the three locations and obtained the data in the following table.

Alaska   Florida   Texas
5.41     4.87      5.46
5.39     5.18      6.29
4.90     4.40      5.57
5.14     5.12      5.15
4.80     4.89      5.45
5.24     5.06      5.30
Source: National Atmospheric Deposition Program

[Answer 19a: H0: μ_A = μ_F = μ_T vs. H1: at least one of the means is not equal]
(a) State the null and alternative hypotheses.
(b) Verify that the requirements to use the one-way ANOVA procedure are satisfied. Normal probability plots indicate that the sample data come from a normal population.
(c) Test if the mean pHs in the rainwater are different at the α = 0.05 level of significance. [Answer: Reject H0]
(d) Draw boxplots of the pH in rain for the three states to support the results obtained in part (c).

20. Lower Your Cholesterol Researchers Francisco Fuentes and his colleagues wanted to determine the most effective diet for reducing LDL cholesterol, the so-called "bad" cholesterol, among three diets: (1) a saturated-fat diet, (2) the Mediterranean diet, and (3) the U.S. National Cholesterol Education Program, or NCEP-1, Diet. The participants in the study were shown to have the same levels of LDL cholesterol before the study. Participants were randomly assigned to one of the three treatment groups. Individuals in group 1 received the saturated-fat diet, which is 15% protein, 47% carbohydrates, and 38% fat (20% saturated fat, 12% monounsaturated fat, and 6% polyunsaturated fat). Individuals in group 2 received the Mediterranean diet, which is 47% carbohydrates and 38% fat (<10% saturated fat, 22% monounsaturated fat, and 6% polyunsaturated fat). Individuals in group 3 received the NCEP-1 Diet: <10% saturated fat, 12% monounsaturated fat, and 6%
polyunsaturated fat. After 28 days, their LDL cholesterol levels were recorded. The data in the following table are based on this study.

Saturated Fat   Mediterranean   NCEP-1
245             56              125
123             78              100
166             101             140
104             158             151
196             145             138
300             118             268
140             145             75
240             211             71
218             131             184
173             125             116
223             160             144
177             130             101
193             83              135
224             263             144
149             150             130

[Answer 20a: H0: μ_S = μ_M = μ_N vs. H1: at least one of the means is not equal]
(a) Does the evidence suggest the cholesterol levels differ? State the null and alternative hypotheses.
(b) Verify that the requirements to use the one-way ANOVA procedure are satisfied. Normal probability plots indicate that the sample data come from normal populations.
(c) Test if the mean LDL cholesterol levels are different at the α = 0.05 level of significance. [Answer: Reject H0]
(d) Draw boxplots of the LDL cholesterol levels for the three groups to support the analytic results obtained in part (c).

21. Concrete Strength An engineer wants to know if the mean strengths of three different concrete mix designs differ significantly. He randomly selects 9 cylinders that measure 6 inches in diameter and 12 inches in height in which mixture 67-0-301 is poured, 9 cylinders of mixture 67-0-400, and 9 cylinders of mixture 67-0-353. After 28 days, he measures the strength (in pounds per square inch) of the cylinders. The results are presented in the following table.
[Data table (columns Mixture 67-0-400, Mixture 67-0-353, Mixture 67-0-301) not recoverable from the scan.]
(a) State the null and alternative hypotheses.
(b) Explain why we cannot use one-way ANOVA to test these hypotheses.

22. Analyzing Journal Article Results Researchers (Brian G. Feagan et al., "Erythropoietin with Iron Supplementation to Prevent Allogeneic Blood Transfusion in Total Hip Joint Arthroplasty," Annals of Internal Medicine, Vol. 133, No. 11) wanted to determine whether epoetin alfa was effective in increasing the hemoglobin concentration in patients undergoing hip arthroplasty. The researchers screened patients for eligibility by performing a complete
medical history and physical of the patients. Once eligible patients were identified, the researchers used a computer-generated schedule to assign the patients to the high-dose epoetin group, low-dose epoetin group, or placebo group. The study was double-blind. Based on an analysis of variance, it was determined that there were significant differences in the increase in hemoglobin concentration in the three groups, with a P-value less than 0.001. The mean increase in hemoglobin in the high-dose epoetin group was 19.5 g/L, the mean increase in hemoglobin in the low-dose epoetin group was 17.2 g/L, and the mean increase in hemoglobin in the placebo group was 1.2 g/L.
[Answer 21a (margin note): H0: the mean strengths of mixtures 67-0-301, 67-0-400, and 67-0-353 are equal vs. H1: at least one of the means is not equal]
(a) Why do you think it was necessary to screen patients for eligibility?
(b) Why was a computer-generated schedule used to assign patients to the various treatment groups?
(c) What does it mean for a study to be double-blind? Why do you think the researchers desired a double-blind study?
(d) Interpret the reported P-value.

Technology Step by Step: ANOVA

TI-83/84 Plus
Step 1: Enter the raw data into L1, L2, L3, and so on, for each population or treatment.
Step 2: Press STAT, highlight TESTS, and select F:ANOVA(.
Step 3: Enter the list names for each sample or treatment after ANOVA(. For example, if there are three treatments in L1, L2, and L3, enter ANOVA(L1,L2,L3). Press ENTER.

MINITAB
Step 1: Enter the raw data into C1, C2, C3, and so on, for each sample or treatment.
Step 2: Select Stat, then highlight ANOVA and select One-way (Unstacked).
Step 3: Enter the column names in the cell marked Responses. Click OK.

Excel
Step 1: Enter the raw data in columns A, B, C, and so on, for each sample or treatment.
Step 2: Be sure the Data Analysis ToolPak is activated. This is done by selecting the Tools menu and highlighting Add-Ins. Check the box for the Analysis ToolPak and select OK. Select Tools, then highlight Data Analysis. Select ANOVA: Single Factor and click OK.
Step 3: With the cursor in the Input Range cell, highlight the data.
Click OK.

Table VIII  F-Distribution Critical Values
Each row gives the degrees of freedom in the denominator (Denom. df) and the area in the right tail; the remaining columns are indexed by the degrees of freedom in the numerator.

Denom. df  Area  1  2  3  4  5  6  7  8
1  0.100  39.86  49.50  53.59  55.83  57.24  58.20  58.91  59.44
1  0.050  161.45  199.50  215.71  224.58  230.16  233.99  236.77  238.88
1  0.025  647.79  799.50  864.16  899.58  921.85  937.11  948.22  956.66
1  0.010  4052.20  4999.50  5403.40  5624.60  5763.60  5859.00  5928.40  5981.10
1  0.001  405284.00  500000.00  540379.00  562500.00  576405.00  585937.00  592873.00  598144.00
2  0.100  8.53  9.00  9.16  9.24  9.29  9.33  9.35  9.37
2  0.050  18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37
2  0.025  38.51  39.00  39.17  39.25  39.30  39.33  39.36  39.37
2  0.010  98.50  99.00  99.17  99.25  99.30  99.33  99.36  99.37
2  0.001  998.50  999.00  999.17  999.25  999.30  999.33  999.36  999.37
3  0.100  5.54  5.46  5.39  5.34  5.31  5.28  5.27  5.25
3  0.050  10.13  9.55  9.28  9.12  9.01  8.94  8.89  8.85
3  0.025  17.44  16.04  15.44  15.10  14.88  14.73  14.62  14.54
3  0.010  34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49
3  0.001  167.03  148.50  141.11  137.10  134.58  132.85  131.58  130.62
4  0.100  4.54  4.32  4.19  4.11  4.05  4.01  3.98  3.95
4  0.050  7.71  6.94  6.59  6.39  6.26  6.16  6.09  6.04
4  0.025  12.22  10.65  9.98  9.60  9.36  9.20  9.07  8.98
4  0.010  21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80
4  0.001  74.14  61.25  56.18  53.44  51.71  50.53  49.66  49.00
5  0.100  4.06  3.78  3.62  3.52  3.45  3.40  3.37  3.34
5  0.050  6.61  5.79  5.41  5.19  5.05  4.95  4.88  4.82
5  0.025  10.01  8.43  7.76  7.39  7.15  6.98  6.85  6.76
5  0.010  16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29
5  0.001  47.18  37.12  33.20  31.09  29.75  28.83  28.16  27.65
6  0.100  3.78  3.46  3.29  3.18  3.11  3.05  3.01  2.98
6  0.050  5.99  5.14  4.76  4.53  4.39  4.28  4.21  4.15
6  0.025  8.81  7.26  6.60  6.23  5.99  5.82  5.70  5.60
6  0.010  13.75  10.92  9.78  9.15  8.75  8.47  8.26  8.10
6  0.001  35.51  27.00  23.70  21.92  20.80  20.03  19.46  19.03
7  0.100  3.59  3.26  3.07  2.96  2.88  2.83  2.78  2.75
7  0.050  5.59  4.74  4.35  4.12  3.97  3.87  3.79  3.73
7  0.025  8.07  6.54  5.89  5.52  5.29  5.12  4.99  4.90
7  0.010  12.25  9.55  8.45  7.85  7.46  7.19  6.99  6.84
7  0.001  29.25  21.69  18.77  17.20  16.21  15.52  15.02  14.63
8  0.100  3.46  3.11  2.92  2.81  2.73  2.67  2.62  2.59
8  0.050  5.32  4.46  4.07  3.84  3.69  3.58  3.50  3.44
8  0.025  7.57  6.06  5.42  5.05  4.82  4.65  4.53  4.43
8  0.010  11.26  8.65  7.59  7.01  6.63  6.37  6.18  6.03
Table VIII (continued)  F-Distribution Critical Values

Denom. df  Area  9  10  15  20  30  60  120  1000
1  0.100  59.86  60.19  61.22  61.74  62.26  62.79  63.06  63.30
1  0.050  240.54  241.88  245.95  248.01  250.10  252.20  253.25  254.19
1  0.025  963.28  968.63  984.87  993.10  1001.4  1009.8  1014.0  1017.7
1  0.010  6022.5  6055.8  6157.3  6208.7  6260.6  6313.0  6339.4  6362.7
1  0.001  602284.0  605621.0  615764.0  620908.0  626099.0  631337.0  633972.0  636301.0
2  0.100  9.38  9.39  9.42  9.44  9.46  9.47  9.48  9.49
2  0.050  19.38  19.40  19.43  19.45  19.46  19.48  19.49  19.49
2  0.025  39.39  39.40  39.43  39.45  39.46  39.48  39.49  39.50
2  0.010  99.39  99.40  99.43  99.45  99.47  99.48  99.49  99.50
2  0.001  999.39  999.40  999.43  999.45  999.47  999.48  999.49  999.50
3  0.100  5.24  5.23  5.20  5.18  5.17  5.15  5.14  5.13
3  0.050  8.81  8.79  8.70  8.66  8.62  8.57  8.55  8.53
3  0.025  14.47  14.42  14.25  14.17  14.08  13.99  13.95  13.91
3  0.010  27.35  27.23  26.87  26.69  26.50  26.32  26.22  26.14
3  0.001  129.86  129.25  127.37  126.42  125.45  124.47  123.97  123.53
4  0.100  3.94  3.92  3.87  3.84  3.82  3.79  3.78  3.76
4  0.050  6.00  5.96  5.86  5.80  5.75  5.69  5.66  5.63
4  0.025  8.90  8.84  8.66  8.56  8.46  8.36  8.31  8.26
4  0.010  14.66  14.55  14.20  14.02  13.84  13.65  13.56  13.47
4  0.001  48.47  48.05  46.76  46.10  45.43  44.75  44.40  44.09
5  0.100  3.32  3.30  3.24  3.21  3.17  3.14  3.12  3.11
5  0.050  4.77  4.74  4.62  4.56  4.50  4.43  4.40  4.37
5  0.025  6.68  6.62  6.43  6.33  6.23  6.12  6.07  6.02
5  0.010  10.16  10.05  9.72  9.55  9.38  9.20  9.11  9.03
5  0.001  27.24  26.92  25.91  25.39  24.87  24.33  24.06  23.82
6  0.100  2.96  2.94  2.87  2.84  2.80  2.76  2.74  2.72
6  0.050  4.10  4.06  3.94  3.87  3.81  3.74  3.70  3.67
6  0.025  5.52  5.46  5.27  5.17  5.07  4.96  4.90  4.86
6  0.010  7.98  7.87  7.56  7.40  7.23  7.06  6.97  6.89
6  0.001  18.69  18.41  17.56  17.12  16.67  16.21  15.98  15.77
7  0.100  2.72  2.70  2.63  2.59  2.56  2.51  2.49  2.47
7  0.050  3.68  3.64  3.51  3.44  3.38  3.30  3.27  3.23
7  0.025  4.82  4.76  4.57  4.47  4.36  4.25  4.20  4.15
7  0.010  6.72  6.62  6.31  6.16  5.99  5.82  5.74  5.66
7  0.001  14.33  14.08  13.32  12.93  12.53  12.12  11.91  11.72
8  0.100  2.56  2.54  2.46  2.42  2.38  2.34  2.32  2.30
8  0.050  3.39  3.35  3.22  3.15  3.08  3.01  2.97  2.93
8  0.025  4.36  4.30  4.10  4.00  3.89  3.78  3.73  3.68
8  0.010  5.91  5.81  5.52  5.36  5.20  5.03  4.95  4.87
Table VIII (continued)  F-Distribution Critical Values

Denom. df  Area  1  2  3  4  5  6  7  8  9  10
9  0.100  3.36  3.01  2.81  2.69  2.61  2.55  2.51  2.47  2.44  2.42
9  0.050  5.12  4.26  3.86  3.63  3.48  3.37  3.29  3.23  3.18  3.14
9  0.025  7.21  5.71  5.08  4.72  4.48  4.32  4.20  4.10  4.03  3.96
9  0.010  10.56  8.02  6.99  6.42  6.06  5.80  5.61  5.47  5.35  5.26
9  0.001  22.86  16.39  13.90  12.56  11.71  11.13  10.70  10.37  10.11  9.89
10  0.100  3.29  2.92  2.73  2.61  2.52  2.46  2.41  2.38  2.35  2.32
10  0.050  4.96  4.10  3.71  3.48  3.33  3.22  3.14  3.07  3.02  2.98
10  0.025  6.94  5.46  4.83  4.47  4.24  4.07  3.95  3.85  3.78  3.72
10  0.010  10.04  7.56  6.55  5.99  5.64  5.39  5.20  5.06  4.94  4.85
10  0.001  21.04  14.91  12.55  11.28  10.48  9.93  9.52  9.20  8.96  8.75
12  0.100  3.18  2.81  2.61  2.48  2.39  2.33  2.28  2.24  2.21  2.19
12  0.050  4.75  3.89  3.49  3.26  3.11  3.00  2.91  2.85  2.80  2.75
12  0.025  6.55  5.10  4.47  4.12  3.89  3.73  3.61  3.51  3.44  3.37
12  0.010  9.33  6.93  5.95  5.41  5.06  4.82  4.64  4.50  4.39  4.30
12  0.001  18.64  12.97  10.80  9.63  8.89  8.38  8.00  7.71  7.48  7.29
15  0.100  3.07  2.70  2.49  2.36  2.27  2.21  2.16  2.12  2.09  2.06
15  0.050  4.54  3.68  3.29  3.06  2.90  2.79  2.71  2.64  2.59  2.54
15  0.025  6.20  4.77  4.15  3.80  3.58  3.41  3.29  3.20  3.12  3.06
15  0.010  8.68  6.36  5.42  4.89  4.56  4.32  4.14  4.00  3.89  3.80
15  0.001  16.59  11.34  9.34  8.25  7.57  7.09  6.74  6.47  6.26  6.08
20  0.100  2.97  2.59  2.38  2.25  2.16  2.09  2.04  2.00  1.96  1.94
20  0.050  4.35  3.49  3.10  2.87  2.71  2.60  2.51  2.45  2.39  2.35
20  0.025  5.87  4.46  3.86  3.51  3.29  3.13  3.01  2.91  2.84  2.77
20  0.010  8.10  5.85  4.94  4.43  4.10  3.87  3.70  3.56  3.46  3.37
20  0.001  14.82  9.95  8.10  7.10  6.46  6.02  5.69  5.44  5.24  5.08
25  0.100  2.92  2.53  2.32  2.18  2.09  2.02  1.97  1.93  1.89  1.87
25  0.050  4.24  3.39  2.99  2.76  2.60  2.49  2.40  2.34  2.28  2.24
25  0.025  5.69  4.29  3.69  3.35  3.13  2.97  2.85  2.75  2.68  2.61
25  0.010  7.77  5.57  4.68  4.18  3.85  3.63  3.46  3.32  3.22  3.13
25  0.001  13.88  9.22  7.45  6.49  5.89  5.46  5.15  4.91  4.71  4.56
50  0.100  2.81  2.41  2.20  2.06  1.97  1.90  1.84  1.80  1.76  1.73
50  0.050  4.03  3.18  2.79  2.56  2.40  2.29  2.20  2.13  2.07  2.03
50  0.025  5.34  3.97  3.39  3.05  2.83  2.67  2.55  2.46  2.38  2.32
50  0.010  7.17  5.06  4.20  3.72  3.41  3.19  3.02  2.89  2.78  2.70
50  0.001  12.22  7.96  6.34  5.46  4.90  4.51  4.22  4.00  3.82  3.67
100  0.100  2.76  2.36  2.14  2.00  1.91  1.83  1.78  1.73  1.69  1.66
100  0.050  3.94  3.09  2.70  2.46  2.31  2.19  2.10  2.03  1.97  1.93
100  0.025  5.18  3.83  3.25  2.92  2.70  2.54  2.42  2.32  2.24  2.18
100  0.010  6.90  4.82  3.98  3.51  3.21  2.99  2.82  2.69  2.59  2.50
100  0.001  11.50  7.41  5.86  5.02  4.48  4.11  3.83  3.61  3.44  3.30
200  0.100  2.73  2.33  2.11  1.97  1.88  1.80  1.75  1.70  1.66  1.63
200  0.050  3.89  3.04  2.65  2.42  2.26  2.14  2.06  1.98  1.93  1.88
200  0.025  5.10  3.76  3.18  2.85  2.63  2.47  2.35  2.26  2.18  2.11
200  0.010  6.76  4.71  3.88  3.41  3.11  2.89  2.73  2.60  2.50  2.41
200  0.001  11.15  7.15  5.63  4.81  4.29  3.92  3.65  3.43  3.26  3.12
1000  0.100  2.71  2.31  2.09  1.95  1.85  1.78  1.72  1.68  1.64  1.61
1000  0.050  3.85  3.00  2.61  2.38  2.22  2.11  2.02  1.95  1.89  1.84
1000  0.025  5.04  3.70  3.13  2.80  2.58  2.42  2.30  2.20  2.13  2.06
1000  0.010  6.66  4.63  3.80  3.34  3.04  2.82  2.66  2.53  2.43  2.34

Table VIII (continued)  F-Distribution Critical Values

Denom. df  Area  12  15  20  25  30  40  50  60  120  1000
9  0.100  2.38  2.34  2.30  2.27  2.25  2.23  2.22  2.21  2.18  2.16
9  0.050  3.07  3.01  2.94  2.89  2.86  2.83  2.80  2.79  2.75  2.71
9  0.025  3.87  3.77  3.67  3.60  3.56  3.51  3.47  3.45  3.39  3.34
9  0.010  5.11  4.96  4.81  4.71  4.65  4.57  4.52  4.48  4.40  4.32
9  0.001  9.57  9.24  8.90  8.69  8.55  8.37  8.26  8.19  8.00  7.84
10  0.100  2.28  2.24  2.20  2.17  2.16  2.13  2.12  2.11  2.08  2.06
10  0.050  2.91  2.85  2.77  2.73  2.70  2.66  2.64  2.62  2.58  2.54
10  0.025  3.62  3.52  3.42  3.35  3.31  3.26  3.22  3.20  3.14  3.09
10  0.010  4.71  4.56  4.41  4.31  4.25  4.17  4.12  4.08  4.00  3.92
10  0.001  8.45  8.13  7.80  7.60  7.47  7.30  7.19  7.12  6.94  6.78
12  0.100  2.15  2.10  2.06  2.03  2.01  1.99  1.97  1.96  1.93  1.91
12  0.050  2.69  2.62  2.54  2.50  2.47  2.43  2.40  2.38  2.34  2.30
12  0.025  3.28  3.18  3.07  3.01  2.96  2.91  2.87  2.85  2.79  2.73
12  0.010  4.16  4.01  3.86  3.76  3.70  3.62  3.57  3.54  3.45  3.37
12  0.001  7.00  6.71  6.40  6.22  6.09  5.93  5.83  5.76  5.59  5.44
15  0.100  2.02  1.97  1.92  1.89  1.87  1.85  1.83  1.82  1.79  1.76
15  0.050  2.48  2.40  2.33  2.28  2.25  2.20  2.18  2.16  2.11  2.07
15  0.025  2.96  2.86  2.76  2.69  2.64  2.59  2.55  2.52  2.46  2.40
15  0.010  3.67  3.52  3.37  3.28  3.21  3.13  3.08  3.05  2.96  2.88
15  0.001  5.81  5.54  5.25  5.07  4.95  4.80  4.70  4.64  4.47  4.33
20  0.100  1.89  1.84  1.79  1.76  1.74  1.71  1.69  1.68  1.64  1.61
20  0.050  2.28  2.20  2.12  2.07  2.04  1.99  1.97  1.95  1.90  1.85
20  0.025  2.68  2.57  2.46  2.40  2.35  2.29  2.25  2.22  2.16  2.09
20  0.010  3.23  3.09  2.94  2.84  2.78  2.69  2.64  2.61  2.52  2.43
20  0.001  4.82  4.56  4.29  4.12  4.00  3.86  3.77  3.70  3.54  3.40
25  0.100  1.82  1.77  1.72  1.68  1.66  1.63  1.61  1.59  1.56  1.52
25  0.050  2.16  2.09  2.01  1.96  1.92  1.87  1.84  1.82  1.77  1.72
25  0.025  2.51  2.41  2.30  2.23  2.18  2.12  2.08  2.05  1.98  1.91
25  0.010  2.99  2.85  2.70  2.60  2.54  2.45  2.40  2.36  2.27  2.18
25  0.001  4.31  4.06  3.79  3.63  3.52  3.37  3.28  3.22  3.06  2.91
50  0.100  1.68  1.63  1.57  1.53  1.50  1.46  1.44  1.42  1.38  1.33
50  0.050  1.95  1.87  1.78  1.73  1.69  1.63  1.60  1.58  1.51  1.45
50  0.025  2.22  2.11  1.99  1.92  1.87  1.80  1.75  1.72  1.64  1.56
50  0.010  2.56  2.42  2.27  2.17  2.10  2.01  1.95  1.91  1.80  1.70
50  0.001  3.44  3.20  2.95  2.79  2.68  2.53  2.44  2.38  2.21  2.05
100  0.100  1.61  1.56  1.49  1.45  1.42  1.38  1.35  1.34  1.28  1.22
100  0.050  1.85  1.77  1.68  1.62  1.57  1.52  1.48  1.45  1.38  1.30
100  0.025  2.08  1.97  1.85  1.77  1.71  1.64  1.59  1.56  1.46  1.36
100  0.010  2.37  2.22  2.07  1.97  1.89  1.80  1.74  1.69  1.57  1.45
100  0.001  3.07  2.84  2.59  2.43  2.32  2.17  2.08  2.01  1.83  1.64
200  0.100  1.58  1.52  1.46  1.41  1.38  1.34  1.31  1.29  1.23  1.16
200  0.050  1.80  1.72  1.62  1.56  1.52  1.46  1.41  1.39  1.30  1.21
200  0.025  2.01  1.90  1.78  1.70  1.64  1.56  1.51  1.47  1.37  1.25
200  0.010  2.27  2.13  1.97  1.87  1.79  1.69  1.63  1.58  1.45  1.30
200  0.001  2.90  2.67  2.42  2.26  2.15  2.00  1.90  1.83  1.64  1.43
1000  0.100  1.55  1.49  1.43  1.38  1.35  1.30  1.27  1.25  1.18  1.08
1000  0.050  1.76  1.68  1.58  1.52  1.47  1.41  1.36  1.33  1.24  1.11
1000  0.025  1.96  1.85  1.72  1.64  1.58  1.50  1.45  1.41  1.29  1.13
1000  0.010  2.20  2.06  1.90  1.79  1.72  1.61  1.54  1.50  1.35  1.16
1000  0.001  2.77  2.54  2.30  2.14  2.02  1.87  1.77  1.69  1.49  1.22

Data
[Only the row labels of the data table survive in the scan: North, Yorkshire, Northeast, East Midland, Midland, East, Southeast, Southwest, Wales, Scotland, Northern Ireland.]

SAS Output (Page 2 of 2)
Correlations Plots
[Figure: Scatter plot of TOBACCO by ALCOHOL; vertical axis TOBACCO, horizontal axis ALCOHOL]
Generated by the SAS System (Local, XP_PRO) on 08NOV2007 at 4:57 PM

SAS Output (Page 1 of 2)
Correlation Analysis
The CORR Procedure
2 Variables: ALCOHOL TOBACCO

Simple Statistics
Variable   N    Mean      Std Dev   Sum        Minimum   Maximum
ALCOHOL    11   5.44364   0.79776   59.88000   4.02000   6.47000
TOBACCO    11   3.61818   0.59071   39.80000   2.71000   4.56000

Pearson Correlation Coefficients, N = 11
Prob > |r| under H0: Rho=0
TOBACCO with ALCOHOL:  r = 0.22357,  p = 0.5087
SAS Output (Page 3 of 4)
Regression Analysis Plots
[Figure: Observed TOBACCO by ALCOHOL; vertical axis TOBACCO, horizontal axis ALCOHOL]

SAS Output (Page 1 of 4)
Linear Regression Results
The REG Procedure
Model: Linear_Regression_Model
Dependent Variable: TOBACCO

Number of Observations Read                  12
Number of Observations Used                  11
Number of Observations with Missing Values    1

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1   0.17441          0.17441       0.47      0.5087
Error              9   3.31495          0.36833
Corrected Total   10   3.48936

Root MSE         0.60690    R-Square   0.0500
Dependent Mean   3.61818    Adj R-Sq   -0.0556
Coeff Var        16.77362

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    2.71701              1.32230          2.05      0.0701
ALCOHOL     1    0.16555              0.24057          0.69      0.5087

[Handwritten calculations on this page are not legible in the scan.]

SAS Output (Page 2 of 2)
Correlations Plots
[Figure: Scatter plot of TOBACCO by ALCOHOL; vertical axis TOBACCO, horizontal axis ALCOHOL]

SAS Output (Page 1 of 2)
Correlation Analysis
The CORR Procedure
2 Variables: ALCOHOL TOBACCO

Simple Statistics
Variable   N    Mean      Std Dev   Sum        Minimum   Maximum
ALCOHOL    10   5.58600   0.67781   55.86000   4.52000   6.47000
TOBACCO    10   3.52400   0.52848   35.24000   2.71000   4.51000

Pearson Correlation Coefficients, N = 10
Prob > |r| under H0: Rho=0
TOBACCO with ALCOHOL:  r = 0.78429,  p = 0.0072
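The CORR and REG printouts for the same data are linked by two identities of simple linear regression: R-Square equals r squared, and the t Value for the slope equals r·sqrt(n − 2)/sqrt(1 − r²). The following is a minimal Python sketch of that check, using only the r and N values reported by PROC CORR above. The function name `slope_t_from_r` is an illustrative choice, not something SAS provides.

```python
import math


def slope_t_from_r(r, n):
    """t statistic for H0: Rho = 0 (equivalently, slope = 0) from Pearson r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# (r, N) pairs as reported by PROC CORR in the two runs above
runs = {"11 regions": (0.22357, 11), "10 regions": (0.78429, 10)}

for label, (r, n) in runs.items():
    t = slope_t_from_r(r, n)
    # R-Square of the simple regression is the square of the correlation
    print(f"{label}: R-Square = {r * r:.4f}, t = {t:.2f}, error df = {n - 2}")
```

For the 11-region run the result can be compared with the R-Square (0.0500) and the ALCOHOL t Value (0.69) in the REG printout above; small differences in the last digit are possible because r is itself rounded.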
SAS Output (Page 3 of 4)
Regression Analysis Plots
[Figure: Observed TOBACCO by ALCOHOL; vertical axis TOBACCO, horizontal axis ALCOHOL]

SAS Output (Page 1 of 4)
Linear Regression Results
The REG Procedure
Model: Linear_Regression_Model
Dependent Variable: TOBACCO

Number of Observations Read                  12
Number of Observations Used                  10
Number of Observations with Missing Values    2

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1   1.54616          1.54616       12.78     0.0072
Error              8   0.96748          0.12094
Corrected Total    9   2.51364

Root MSE         0.34776    R-Square   0.6151
Dependent Mean   3.52400    Adj R-Sq   0.5670
Coeff Var        9.86827

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    0.10815              0.96163          0.11      0.9132
ALCOHOL     1    0.61150              0.17102          3.58      0.0072

[Handwritten calculations on this page are not legible in the scan.]

One-Way Analysis of Variance Results
The ANOVA Procedure

Class Level Information
Class   Levels   Values
Type    3        Beef Meat Poultry
Number of Observations Read   30
Number of Observations Used   30

SAS Output (Page 2 of 6)
Dependent Variable: Calories
Source            DF   Sum of Squares   Mean Square    F Value   Pr > F
Model              2   23861.66667      11930.83333    36.28     <.0001
Error             27   8878.20000       328.82222
Corrected Total   29   32739.86667

R-Square   Coeff Var   Root MSE   Calories Mean
0.728826   12.34128    18.13346   146.9333

Source   DF   Anova SS      Mean Square   F Value   Pr > F
Type     2    23861.66667   11930.83333   36.28     <.0001

SAS Output (Page 3 of 6)
Levene's Test for Homogeneity of Calories Variance
ANOVA of Squared Deviations from Group Means
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Type      2   237833           118916        1.79      0.1866
Error    27   1795920          66515.6

SAS Output (Page 4 of 6)
Level of Type   N    Calories Mean   Std Dev
Beef            10   168.600000      18.3920273
Meat            10   165.100000      21.0947597
Poultry         10   107.100000      14.2552135

SAS Output (Page 5 of 6)
Tukey's Studentized Range (HSD) Test for Calories
Note: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha                                  0.05
Error Degrees of Freedom               27
Error Mean Square                      328.8222
Critical Value of Studentized Range    3.50643
Minimum Significant Difference         20.107
Means with the same letter are not significantly different.
Tukey Grouping   Mean      N    Type
A                168.600   10   Beef
A                165.100   10   Meat
B                107.100   10   Poultry

[Figure: Box Plot of Calories by Type; vertical axis Calories, horizontal axis Type (Beef, Meat, Poultry)]

One-Way Analysis of Variance Results
The ANOVA Procedure

Class Level Information
Class   Levels   Values
Type    3        Beef Meat Poultry
Number of Observations Read   30
Number of Observations Used   30

SAS Output (Page 2 of 6)
Dependent Variable: Sodium
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2   3116.8667        1558.4333     0.28      0.7591
Error             27   151127.3000      5597.3074
Corrected Total   29   154244.1667

R-Square   Coeff Var   Root MSE   Sodium Mean
0.020207   17.11365    74.81515   437.1667

Source   DF   Anova SS      Mean Square   F Value   Pr > F
Type     2    3116.866667   1558.433333   0.28      0.7591

SAS Output (Page 3 of 6)
Levene's Test for Homogeneity of Sodium Variance
ANOVA of Squared Deviations from Group Means
Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
Type      2   41889281         20944641      0.78      0.4693
Error    27   7.2675E8         26916508

SAS Output (Page 4 of 6)
Level of Type   N    Sodium Mean   Std Dev
Beef            10   433.400000    85.6675746
Meat            10   451.100000    64.5677078
Poultry         10   427.000000    72.6911274

SAS Output (Page 5 of 6)
Tukey's Studentized Range (HSD) Test for Sodium
Note: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.
Alpha                                  0.05
Error Degrees of Freedom               27
Error Mean Square                      5597.307
Critical Value of Studentized Range    3.50643
Minimum Significant Difference         82.957
Means with the same letter are not significantly different.
Tukey Grouping   Mean     N    Type
A                451.10   10   Meat
A                433.40   10   Beef
A                427.00   10   Poultry
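The ANOVA tables above can be rebuilt from the group summaries alone. The model sum of squares comes from the group means, the error sum of squares from the group standard deviations, and Tukey's minimum significant difference is q·sqrt(MSE/n) when the groups are the same size. The sketch below is a hedged illustration in Python, not the procedure SAS runs internally. The helper name `anova_from_summary` is hypothetical, and the critical value q = 3.50643 is copied from the printout rather than computed.

```python
import math


def anova_from_summary(groups):
    """One-way ANOVA from per-group (n, mean, std dev) summaries.

    Returns (F, MSE, df_model, df_error).
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-group (model) and within-group (error) sums of squares
    ss_model = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_error = sum((n - 1) * s ** 2 for n, _, s in groups)
    df_model = len(groups) - 1
    df_error = n_total - len(groups)
    mse = ss_error / df_error
    f_value = (ss_model / df_model) / mse
    return f_value, mse, df_model, df_error


# (N, Mean, Std Dev) for Beef, Meat, Poultry, copied from the printouts above
calories = [(10, 168.6, 18.3920273), (10, 165.1, 21.0947597), (10, 107.1, 14.2552135)]
sodium = [(10, 433.4, 85.6675746), (10, 451.1, 64.5677078), (10, 427.0, 72.6911274)]

Q_CRIT = 3.50643  # studentized-range critical value taken from the Tukey printout

for name, groups in (("Calories", calories), ("Sodium", sodium)):
    f_value, mse, df1, df2 = anova_from_summary(groups)
    n_per_group = groups[0][0]  # the Tukey formula below assumes equal group sizes
    msd = Q_CRIT * math.sqrt(mse / n_per_group)
    print(f"{name}: F({df1}, {df2}) = {f_value:.2f}, MSE = {mse:.4f}, Tukey MSD = {msd:.3f}")
```

Read against the printouts, the Calories run has a large F and Beef and Meat differ from Poultry by more than the minimum significant difference, while the Sodium run has a small F and no pair of means differs by that much.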
[Figure: Box Plot of Sodium by Type; vertical axis Sodium, horizontal axis Type (Beef, Meat, Poultry)]

EXST 2201 Introduction to Statistical Analysis
Lab Project Data for Spring 2012 (1 page, revised 03/23/12)

Directions
Please ask one question appropriate for this set of data, do one inferential statistical analysis (two-sample t test, ANOVA, or simple regression/correlation), and write a formal report to present the results. Instructions for the Lab Project report are available near the end of the Lab Notes. You must use all of the Visits data in your analysis (regression will also use the Patrons data).

Description
A question arose concerning the use of public libraries in various regions of the United States by the people living in the library's service area. To get information on this question, data was obtained from the US Census Bureau [1] on patrons' library use for forty-five states. The United States was first divided into three regions (Region: Northeast, Southeast, and West), and fifteen states from each region were selected. The states were further divided by size (Size: Large or Small) based on whether their population was over five million people or not. The total number of libraries for each state was computed as the sum of the number of central libraries and the number of branch libraries. The number of visits per patron (Visits) in each state was computed by the total number of library visits divided by the total number of libraries. The number of patrons per library (Patrons) in each state was computed by the population divided by the total number of libraries.

Data
The table has six Visits/Patrons column pairs, a Large and a Small pair for each region. Only the "West" region heading is legible in the scan, and decimal points, if any, were lost; values are shown as printed.

Large           Small           Large           Small           Large           Small
Visits Patrons  Visits Patrons  Visits Patrons  Visits Patrons  Visits Patrons  Visits Patrons
1033   1774     704    1447     325    2812     366    1635     476    3413     651    1797
639    1627     528    495      477    3585     347    1349     443    3108     559    741
401    1944     584    560      433    2428     419    2237     662    1991     696    2262
801    1594     644    1469     481    2372     451    2142     655    1989     427    3152
594    1520     499    2728     543    2240     574    1790     393    1682     606    1866

The remaining rows contain blank cells, so their column positions cannot be recovered. The remaining Visits/Patrons pairs, in reading order, are: 633 336; 348 2142; 322 1240; 515 621; 650 1397; 507 1663; 384 1279; 592 1096; 694 1408; 336 1045; 487 818; 614 3052; 489 549; 392 690; 705 701.

[1] US Census Bureau, DataFerrett, Public Library Survey 2009.

Louisiana State University
EXST 2201 Introduction to Statistical Analysis
Example Questions, Exam 01 (2201 Example 01; 8 pages, revised 01/06/09)

1. Classify a random variable for the colors of automobiles on a used car lot.
a. Continuous
b. Discrete
c. Parsimonious
d. Qualitative

2. A study was conducted where a political pollster wishes to determine if his candidate is leading in the polls. What type of study is this?
a. Stratified study
b. Designed experiment
c. News study
d. Observational study

3. A pharmaceutical company has developed an experimental drug meant to relieve symptoms associated with the common cold. The company found 150 males and counted the number of cold symptoms exhibited by each male. The males were then given the experimental drug for one week, after which they again counted the number of cold symptoms exhibited by each male. The difference in the number of cold symptoms by each male was studied. What type of experimental design is this?
a. Blind doubles
b. Group samples
c. Matched-pairs
d. Completely randomized

4. A small town needs to find out if the town's residents will support the building of a new library and decides to conduct a survey of a sample of the town's residents. Which sampling method is best to obtain an appropriate sample of the town's residents?
a. Survey 300 individuals who are randomly selected from a list of all people living in the state in which the town is located
b. Survey a random sample of librarians who live in the town
c. Survey every 14th person who enters the old library on a given day
d. Survey a random sample of persons within
each neighborhood of the town

5. The following frequency table shows the results from a study of college students who were asked "How often do you wear a seatbelt when driving a car?"

Response           Frequency
Do not drive       249
Never              118
Rarely             249
Sometimes          345
Most of the time   716
Always             3093
Total              4770

What is the relative frequency of the category 'Never'?
a. 0.025
b. 2490
c. 1180
d. 0.077

6. [Chart not reproduced in the scan.] What type of chart is this?
a. Bar chart
b. Block chart
c. Pie chart
d. Pareto chart

7. The following histogram gives the IQ scores of a random sample of seventh grade students. The frequency of each class is labeled above each rectangle. What class had the highest frequency?
[Histogram not reproduced in the scan.]
a. 60
b. 100-110
c. 10
d. 60-160

8. What is the shape of the following histogram?
[Histogram not reproduced in the scan.]
a. Skewed middle
b. Skewed left
c. Symmetrical
d. Skewed right

9. The following data give the flight time in minutes of a random sample of 7 flights between the same two cities:
282 270 260 266 257 260 267
What is the mean flight time?
a. 266.5
b. 266
c. 268.5
d. 260

10. The following set of data gives the flight time in minutes of a random sample of seven flights between the same two cities:
282 270 260 266 257 260 267
What is the median flight time?
a. 268.5
b. 266.5
c. 260.0
d. 266.0

11. The following set of data gives the flight time in minutes of a random sample of seven flights between the same two cities:
282 270 260 266 257 260 267
What is the range of this set of data?
a. 20
b. 6
c. 25
d. 12

12. For the following set of four data points:
22 23 19 28
What is the sum of squares of this set of data?
a. 150
b. 10
c. 42
d. 25

13. For the following set of four data points:
22 23 19 28
What is the standard deviation of this set of data?
a. 18.00
b. 3.74
c. 9.48
d. 5.58

14. The following ranked set of data give the weights of 30
preschool children:
25 25 26 26 27 27 27 28 28 28 29 29 30 30 30 31 31 32 32 32 33 33 34 34 35 35 37 37 38 38
What are the first and third quartiles?
a. 28, 34
b. 25, 33
c. 25, 38
d. 27, 38

15. The following ranked set of data give the bushels of hay from a farmer's last 10 years:
147 150 180 189 210 320 375 407 429 580
What is the interquartile range?
a. 279
b. 265
c. 433
d. 227

16. The following set of ranked data give the red blood cell count for 20 randomly selected dogs:
0.2 5.2 5.3 5.6 6.0 6.1 6.1 6.1 6.2 6.4 6.6 6.8 6.8 6.8 6.9 7.0 7.1 7.5 8.1 8.2
What is the five-number summary for this set of data?
a. 0.2, 6.0, 6.2, 7.05, 8.2
b. 0.2, 6.05, 6.5, 6.95, 8.2
c. 0.2, 6.1, 6.2, 7.0, 8.2
d. 8.2, 6.45, 0.2, 6.95, 6.05

17. For the following boxplot, what is the most appropriate shape of the distribution of the data?
[Boxplot not reproduced in the scan.]
a. Symmetric
b. Skewed left
c. Uniform
d. Skewed right

18. For a set of data that is distributed normally with the following information:
Mean = 60
Population standard deviation = 11
What is the z-score for a data value of 71?
a. 1.0
b. 6.5
c. 5.5
d. 11.0

19. For a standard normal curve, what is the area to the right of z = 1.00?
a. 0.8643
b. 0.1587
c. 0.1357
d. 0.8413

20. For a standard normal curve, what is the area to the right of z = -1.25?
a. 0.8944
b. 0.1271
c. 0.7193
d. 0.1056

21. For a standard normal curve, what is the area between z = 0 and z = 3?
a. 0.9987
b. 0.0013
c. 0.4641
d. 0.4987

22. For area under a standard normal curve, what z-score separates the bottom 86% from the top 14%?
a. 1.47
b. 1.08
c. 0.19
d. 2.98

23. For area under a standard normal curve, if 90.1% lies between -z and z, what is the magnitude of z?
a. 0.18
b. 1.96
c. 1.29
d. 1.65

24. The prices of new homes are distributed normally with mean 150,000 and standard deviation 1,700. What percentage of buyers paid more than 152,074?
a. 47.53
b. 11.12
c. 38.88
d. 88.88

25. The prices of new homes are
distributed normally with mean 150 and standard deviation 11 (units are thousands of dollars). What percentage of new homes are between 160 and 175 thousands of dollars?
a. 17%
b. 33%
c. 14%
d. 83%

26. The race times for a mile-long race of boys in secondary school is known to be distributed normally with mean 460 seconds and standard deviation 60 seconds. What time separates the shortest 90% of race times from the longest 10%?
a. 361.3 seconds
b. 558.7 seconds
c. 383.2 seconds
d. 536.8 seconds

27. The time for a race of school children is known to be distributed normally with mean 450 seconds and standard deviation 50 seconds. Between what two times would the middle 95% of the children run?
a. 352-548
b. 0-545
c. 368-532
d. 355-545

Louisiana State University
EXST 2201 Introduction to Statistical Analysis
Example Key, Exam 01 (2201 ExamKey 01; 2 pages, revised 01/06/09)

No.  Answer  Tips
1    d   Can't rank
2    d   Experimenter does not apply treatment
3    c   The males in Group 1 determined the males in Group 2
4    d
5    a   118/4770 = 0.0247
6    d   Bar chart ordered from highest to lowest
7    b   Highest bar; frequency is 60
8    c
9    b   mean = sum / n
10   d   1. Rank  2. n = 7, i = 3.5  3. Position 3.5 → 4
11   c   1. Rank  2. 282 - 257 = 25
12   c   Deviations from the mean (x̄ = 92/4 = 23) and their squares: 22: -1, 1; 23: 0, 0; 19: -4, 16; 28: 5, 25. SOS = 42
13   b   s = sqrt(SOS/(n - 1)) = sqrt(42/3) = 3.741, with SOS = 42 from above
14   a   Q1: i = (25/100)(30) = 7.5 → 8; Q3: i = (75/100)(30) = 22.5 → 23
15   d   407 - 180 = 227
16   b   Q1: i = 5, position 5.5; M: i = 10, position 10.5; Q3: i = 15, position 15.5
17   a
18   a   z = (71 - 61)/11 = 0.9091
19   b   Right-tail area 0.1587
20   a   1 - 0.1056 = 0.8944
21   d   0.5 - 0.0013 = 0.4987
22   b   Right-tail area 0.14
23   d   Area 0.901 between -z and z
24   b   z = (152074 - 150000)/1700; P(Z > 1.22) = 0.1112
25   a   z = (X - 150)/11; P(0.909 ≤ z ≤ 2.27) = 0.1725
26   d   z(0.10) = 1.28; x = 460 + 1.28(60)
27   a   z(0.025) = 1.96; x = 450 ± 1.96(50)

Louisiana State University
EXST 2201 Introduction to Statistical Analysis (2201 Ch01; 10 pages, revised 07/31/09)

CHAPTER 01 General Concepts
Lesson 01.0 Overview
Lesson 01.1 Definitions
Lesson 01.2 Sampling
Lesson 01.3 Types of Data
Lesson 01.4 Studies

Lesson 01.0 Overview
END
01.0 Overview

### Lesson 01.1: Definitions

Suggested Concepts and Vocabulary: all, 1–5. Suggested Exercises: odd, 39–50.

Topics: Statistics; Population and Sample.

**Statistics**

Defn. Statistics: The science of collecting, organizing, summarizing, and analyzing information in order to draw conclusions. Also known as the science of data.

Example: Statistics, Example 01a.

Probabilistic Data. Statistics is not the science of chaotic data, that is, data that has absolutely no pattern. If there is absolutely no pattern, then it is impossible to relate one set of data to another, which is the important function of inferential statistics. Statistics is the science of probabilistic data. Probabilistic data has the characteristic that its value is unknown for one observation, but its value for many observations is known. For example, I do not know if the next coin flip will be heads or tails, but I do know that in 10,000 coin flips there will be about 5,000 heads and 5,000 tails. Probabilistic data is very well characterized in the long run but is unknown in the short run.

Areas. Statistics can be broken down into techniques from three areas: Sampling, Descriptive Statistics, and Inferential Statistics. Techniques from all three areas can be combined into different statistical Methods.

Defn. Sampling: Techniques used to collect information. The major technique is simple random sampling. These are Collecting techniques.

Defn. Descriptive Statistics: Techniques used to condense and describe sets of data. Major techniques are the frequency table, the histogram, and summary numbers. These are Organizing and Summarizing techniques.

Defn. Inferential Statistics: Techniques used to systematically draw conclusions about a population from a set of sample data. Refers to methods used to interpret data. The goal is generalization of information about a sample to information about a population. This generalization may lead to incorrect conclusions, so inferential statistics
uses the mathematics of probability to quantify the lack of certainty by stating a level of confidence in a conclusion. Major tools are hypothesis testing and confidence intervals. These are Analyzing techniques.

Defn. Statistical Methods: Combinations of the descriptive and inferential techniques discussed above.

**Population and Sample**

Defn. Population: The totality of elements in a well-defined group to be studied. A population must be well defined, i.e., you must clearly state what exact and specific elements (people, animals, etc.) do and do not belong to the population.

Defn. Sample: A subset of a population. The larger the sample size the better, but the method of sampling (i.e., random) is more important than the sample size.

Defn. Individual: One object from the population.

END 01.1 Definitions

### Lesson 01.2: Sampling

Suggested Concepts and Vocabulary: all, 1, 3–4, 6–7. Suggested Exercises: odd, 9–18, 21a, 23a, 25a.

Topics: Sampling; Simple random sample.

**Sampling**

Goal. The goal of sampling is to collect

1. a measurable number of individuals that are
2. representative of the population.

Then measuring the sample gives us information about the population.

Basic Techniques. There are four basic sampling techniques: simple random sampling, stratified sampling, systematic sampling, and cluster sampling. In this class our interest will be in the simplest way to get a sample representative of the population: simple random sampling. The other three methods are basically modifications of simple random sampling to reduce costs by increasing the collection of desired information at the cost of undesired information.

Types. Sampling can be done

1. with replacement, or
2. without replacement.

In practice we usually sample without replacement. However, statistical techniques assume sampling with replacement. This is usually not an issue in most of the situations we encounter, so in this course we will only discuss sampling with replacement.

Errors. Nonsampling errors are nonstatistical errors that
occur during sampling. Most of the nonsampling errors are systematic errors in the design of the sampling method. These errors cannot be determined or corrected using statistical techniques. Some examples of nonsampling errors are:

- Coverage errors: incomplete list of the population.
- Nonresponse errors: cannot measure the selected element.
- Inaccurate response errors: poor record keeping, lying.
- Measurement errors: ambiguous questions, crude tools.

**Simple Random Sampling**

Defn. Simple Random Sampling: A method of choosing a sample such that each sample of the same size has the same chance of being chosen, and each individual in the population has an equal chance of being chosen to be in a sample.

Random sampling:

1. Does remove selection bias from the sample.
2. Does not affect the natural variability of the data.
3. Does not guarantee a representative sample.

Thus a sample chosen randomly allows us to determine the natural variability of the data and also to use the mathematics of probability to make inferences about the population. Random sampling does not guarantee a representative sample, but it is the best technique to use, and it does allow us to make inferences (informed guesses) about the population.

Inference. Valid statistical inferences can only be made from a random sample to the population from which the sample was drawn.

Method: Simple Random Sampling

1. Assign every individual in the population a number.
2. Select individuals to be in the sample by using (a) a Random Number Table or (b) a Random Number Generator (Data > Random Var's > Distribution).

Example: Simple Random Sample, Question 21, page 21.

END 01.2 Sampling

### Lesson 01.3: Types of Data

Suggested Concepts and Vocabulary: all, 10–12, 14. Suggested Exercises: odd, 15–38, 51–53.

Topics: Classes of Data; Types of Data.

**Classes of Data**

Statistics is concerned with three classes of data.

Defn. Constant: A characteristic of the individuals in the population whose repeated
measurement gives only one possible value.

Defn. Variable: A characteristic of the individuals in the population whose repeated measurement gives many possible values.

Defn. Random Variable: A characteristic of the individuals in the population whose repeated measurement gives randomly varying values. Its value is not known in advance but is determined by chance. Statistics is concerned with data from random variables.

**Types of Data**

Statistics is concerned with three types of data.

Defn. Qualitative Data: Data that can be classified by some mutually exclusive and exhaustive quality of individuals of the population. Example: colors, religion, gender.

Defn. Quantitative Data: Data that is numerical and allows the use of arithmetic operations.

- Discrete Data: Quantitative data that has an easily countable number of possible values. Example: integers from 1 to 10, number of items.
- Continuous Data: Quantitative data that has an infinite number of possible values. Example: time, weight, length.

Example: Types of Data, below.

Method: Identify Type of Data

1. Pick any two data points.
2. Can they be ordered? NO → Qualitative Data. YES → go to Step 3.
3. Is there a countable number of values between them? NO → Continuous Data. YES → Discrete Data.

Example: Types of Data, Example 01b.

END 01.3 Types of Data

### Lesson 01.4: Studies

Suggested Concepts and Vocabulary: all, 6–7, 1–3. Suggested Exercises: odd, 9–18, 5–13.

Topics: Types of Study; Terminology of Experiments.

**Types of Study**

Census. A study that does not collect a sample but instead measures every individual in the entire population. A census is not of much interest in the statistics of this class.

Defn. Census: A study that measures a characteristic of the individuals in a population. It is an observational study that does not involve a sample. A census measures a population. Example: the US Census.

Observational study. A study that does collect a sample.

Defn. Observational Study: A study that
measures a characteristic in a sample without controlling the experimental units or the treatment. Called an ex post study (after the fact) because the value of the characteristic had already been established. Can determine association but cannot determine causation. Example: the heights of students in this class.

Experimental design. A study that does collect a sample.

Defn. Experimental Design: A study that measures a characteristic in a sample while controlling the experimental units or the treatment. Example: measure the weight gain of three females versus three males on a week-long high-protein diet.

Experimental designs can be of two types: Independent or Dependent design.

Defn. Independent Design: Where all experimental units are chosen randomly and assigned to treatments randomly. The book calls this a randomized design.

Defn. Dependent Design: Where one-half of the experimental units are chosen randomly and the second half are chosen by matching some characteristic. The book calls this a matched-pairs design.

Example: Type of Study, Question 47, page 10.

Choose type. The way to choose which type of study to do is to consider the amount of control you have over the characteristics of the individuals in your sample. If you can

1. control the characteristics of the individuals in your sample, and
2. control the treatment,

then an experimental study is the most appropriate study. If you cannot do these two things, then an observational study is the most appropriate study.

Causation. Observational studies cannot be used to determine causation because of the possible presence of lurking variables.

Defn. Lurking Variables: Characteristics of the individuals in a sample that are not measured during the study but do affect the result. An example is snoring and the rate of heart attacks.

The word experiment implies a laboratory with a carefully controlled environment. In statistics, experiment implies a high level of control as well. Only a properly designed experimental study
can be used to determine cause (and often it takes more than one study), because it can control or eliminate most lurking variables.

**Terminology of Experiments**

Defn. Experimental Unit: An individual in the sample.

Defn. Treatment: A condition of interest that is applied to the experimental unit.

Defn. Response Variable: A quantitative or qualitative variable that reflects the characteristic of interest.

Defn. Double Blind: Neither the researcher nor the experimental unit knows if or what treatment is being applied.

Defn. Placebo: A false treatment that has no effect. Used to prevent experimental units from knowing if they receive the treatment.

Example: Experimental Study, Question 33, page 51.

END 01.4 Studies

## Anova Project

Ashley Nini (Section 16), Monique Harvey (Section 16), Erin Wrona. April 20, 2009.

A local real estate agent is often asked what the usual electric bill is for the area in any given household. As statisticians, we asked ourselves what we could do to help her come up with an accurate response. Using the one-way ANOVA, we began our research by asking the question: Is the location of Year 1, Year 2, and Year 3 the same or different for the electricity bill for an individual household in Terre Haute, IN from January 1995 to December 2000? Information on the amount of the bills was collected from one household, and its amount was compared throughout a six-year span.

In our graphical analysis we discovered that Year 1 and Year 2 are skewed to the right. Year 3 is more normally distributed. There are no unusual features found in our graphs. Year 1 has no outliers, Year 2 has two, and Year 3 has one outlier. This information was drawn from our boxplots.

Numerical Analysis

| | N | Mean | Median | STD | Min | Max |
|---|---|---|---|---|---|---|
| Year 1 | 24 | 117.542 | 101.5 | 57.719 | 50 | 256 |
| Year 2 | 24 | 70.875 | 59.5 | 39.486 | 34 | 199 |
| Year 3 | 24 | 84.542 | 85 | 44.945 | 0 | 191 |

One-way ANOVA was the major statistical method used. Included in our method are the F statistic and Tukey's Studentized Range (Honestly Significant Difference, HSD). While using SAS
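statistical software, we first described our data by distribution analysis. Then we performed a one-way analysis of variance using Analyze > ANOVA > One-Way ANOVA. The dependent variable was the Bill and the independent variable was the Year.

Class Level Information

| Class | Levels | Values |
|---|---|---|
| Years | 3 | 1 = 1995–1996, 2 = 1997–1998, 3 = 1999–2000 |

Number of Observations Read: 72. Number of Observations Used: 72.

Dependent Variable: Bill

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 | 27628.44 | 13814.2 | 6.00 | 0.004 |
| Error | 69 | 158946.54 | 2303.57 | | |
| Corrected Total | 71 | 186574.99 | | | |

Means with the same letter are not significantly different.

| Tukey Grouping | Mean | N | Years | Std Deviation |
|---|---|---|---|---|
| A | 117.54 | 24 | 1 | 57.7189436 |
| B A | 84.54 | 24 | 3 | 44.9453573 |
| B | 70.88 | 24 | 2 | 39.4861698 |

Our ANOVA P-value indicates a significant result: the means are not all the same. Year 1 and Year 3 share the letter A in the Tukey grouping, so their means are not significantly different, and Year 3 and Year 2 share the letter B. Only Year 1 and Year 2 differ significantly. In terms of our research question, the locations are not all the same, which agrees with the box plots. In conclusion, we reject the null hypothesis because at least one mean is not equal to the others. Therefore the electric bill for an individual household does not have the same location for Year 1, Year 2, and Year 3. From this it would not be possible to predict the electric bill in this area unless we had more years to gather collective data. One way to improve our study would be to only test for a certain season. Our study had bills from summer and winter combined, which made it harder to get an average measurement.

## Equation Sheet

Louisiana State University, EXST 2201 Introduction to Statistical Analysis (1 page, revised 07/31/07)

Descriptive statistics:

- Mean: $\bar{x} = \dfrac{\sum x}{n}$
- Percentile: $i = \dfrac{k}{100}\,n$
- Median: $M$ (the 50th percentile)
- Interquartile Range: $\mathrm{IQR} = Q_3 - Q_1$
- Fences: $Q_1 - 1.5\,\mathrm{IQR}$ and $Q_3 + 1.5\,\mathrm{IQR}$
- Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
- Std. Dev.: $s = \sqrt{s^2}$
- Five-Number Summary: min, $Q_1$, $M$, $Q_3$, max
- z transformation: $z = \dfrac{x - \mu}{\sigma}$

One mean:

- $z_0 = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, CI: $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
- $t_0 = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, CI: $\bar{x} \pm t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$

Two means:

- $t_0 = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

Correlation and regression:

- $r = \dfrac{\sum z_x z_y}{n - 1}$
- $\hat{y} = b_0 + b_1 x$, with $b_1 = r\,\dfrac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1 \bar{x}$
- $s_{b_1} = \dfrac{s_e}{s_x \sqrt{n - 1}}$, CI: $b_1 \pm t_{\alpha/2,\,n-2}\, s_{b_1}$, $t_0 = \dfrac{b_1 - 0}{s_{b_1}}$

Proportions:

- $\hat{p} = \dfrac{x}{n}$, $z_0 = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$, CI: $\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
- $z_0 = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ with $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
- CI: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$

Standard Normal Distribution (area in right tail)

| z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3.4 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0002 |
| 3.3 | 0.0005 | 0.0005 | 0.0005 | 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0003 |
| 3.2 | 0.0007 | 0.0007 | 0.0006 | 0.0006 | 0.0006 | 0.0006 | 0.0006 | 0.0005 | 0.0005 | 0.0005 |
| 3.1 | 0.0010 | 0.0009 | 0.0009 | 0.0009 | 0.0008 | 0.0008 | 0.0008 | 0.0008 | 0.0007 | 0.0007 |
| 3.0 | 0.0013 | 0.0013 | 0.0013 | 0.0012 | 0.0012 | 0.0011 | 0.0011 | 0.0011 | 0.0010 | 0.0010 |
| 2.9 | 0.0019 | 0.0018 | 0.0018 | 0.0017 | 0.0016 | 0.0016 | 0.0015 | 0.0015 | 0.0014 | 0.0014 |
| 2.8 | 0.0026 | 0.0025 | 0.0024 | 0.0023 | 0.0023 | 0.0022 | 0.0021 | 0.0021 | 0.0020 | 0.0019 |
| 2.7 | 0.0035 | 0.0034 | 0.0033 | 0.0032 | 0.0031 | 0.0030 | 0.0029 | 0.0028 | 0.0027 | 0.0026 |
| 2.6 | 0.0047 | 0.0045 | 0.0044 | 0.0043 | 0.0041 | 0.0040 | 0.0039 | 0.0038 | 0.0037 | 0.0036 |
| 2.5 | 0.0062 | 0.0060 | 0.0059 | 0.0057 | 0.0055 | 0.0054 | 0.0052 | 0.0051 | 0.0049 | 0.0048 |
| 2.4 | 0.0082 | 0.0080 | 0.0078 | 0.0075 | 0.0073 | 0.0071 | 0.0069 | 0.0068 | 0.0066 | 0.0064 |
| 2.3 | 0.0107 | 0.0104 | 0.0102 | 0.0099 | 0.0096 | 0.0094 | 0.0091 | 0.0089 | 0.0087 | 0.0084 |
| 2.2 | 0.0139 | 0.0136 | 0.0132 | 0.0129 | 0.0125 | 0.0122 | 0.0119 | 0.0116 | 0.0113 | 0.0110 |
| 2.1 | 0.0179 | 0.0174 | 0.0170 | 0.0166 | 0.0162 | 0.0158 | 0.0154 | 0.0150 | 0.0146 | 0.0143 |
| 2.0 | 0.0228 | 0.0222 | 0.0217 | 0.0212 | 0.0207 | 0.0202 | 0.0197 | 0.0192 | 0.0188 | 0.0183 |
| 1.9 | 0.0287 | 0.0281 | 0.0274 | 0.0268 | 0.0262 | 0.0256 | 0.0250 | 0.0244 | 0.0239 | 0.0233 |
| 1.8 | 0.0359 | 0.0351 | 0.0344 | 0.0336 | 0.0329 | 0.0322 | 0.0314 | 0.0307 | 0.0301 | 0.0294 |
| 1.7 | 0.0446 | 0.0436 | 0.0427 | 0.0418 | 0.0409 | 0.0401 | 0.0392 | 0.0384 | 0.0375 | 0.0367 |
| 1.6 | 0.0548 | 0.0537 | 0.0526 | 0.0516 | 0.0505 | 0.0495 | 0.0485 | 0.0475 | 0.0465 | 0.0455 |
| 1.5 | 0.0668 | 0.0655 | 0.0643 | 0.0630 | 0.0618 | 0.0606 | 0.0594 | 0.0582 | 0.0571 | 0.0559 |
| 1.4 | 0.0808 | 0.0793 | 0.0778 | 0.0764 | 0.0749 | 0.0735 | 0.0721 | 0.0708 | 0.0694 | 0.0681 |
| 1.3 | 0.0968 | 0.0951 | 0.0934 | 0.0918 | 0.0901 | 0.0885 | 0.0869 | 0.0853 | 0.0838 | 0.0823 |
| 1.2 | 0.1151 | 0.1131 | 0.1112 | 0.1093 | 0.1075 | 0.1056 | 0.1038 | 0.1020 | 0.1003 | 0.0985 |
| 1.1 | 0.1357 | 0.1335 | 0.1314 | 0.1292 | 0.1271 | 0.1251 | 0.1230 | 0.1210 | 0.1190 | 0.1170 |
| 1.0 | 0.1587 | 0.1562 | 0.1539 | 0.1515 | 0.1492 | 0.1469 | 0.1446 | 0.1423 | 0.1401 | 0.1379 |
| 0.9 | 0.1841 | 0.1814 | 0.1788 | 0.1762 | 0.1736 | 0.1711 | 0.1685 | 0.1660 | 0.1635 | 0.1611 |
| 0.8 | 0.2119 | 0.2090 | 0.2061 | 0.2033 | 0.2005 | 0.1977 | 0.1949 | 0.1922 | 0.1894 | 0.1867 |
| 0.7 | 0.2420 | 0.2389 | 0.2358 | 0.2327 | 0.2296 | 0.2266 | 0.2236 | 0.2206 | 0.2177 | 0.2148 |
| 0.6 | 0.2743 | 0.2709 | 0.2676 | 0.2643 | 0.2611 | 0.2578 | 0.2546 | 0.2514 | 0.2483 | 0.2451 |
| 0.5 | 0.3085 | 0.3050 | 0.3015 | 0.2981 | 0.2946 | 0.2912 | 0.2877 | 0.2843 | 0.2810 | 0.2776 |
| 0.4 | 0.3446 | 0.3409 | 0.3372 | 0.3336 | 0.3300 | 0.3264 | 0.3228 | 0.3192 | 0.3156 | 0.3121 |
| 0.3 | 0.3821 | 0.3783 | 0.3745 | 0.3707 | 0.3669 | 0.3632 | 0.3594 | 0.3557 | 0.3520 | 0.3483 |
| 0.2 | 0.4207 | 0.4168 | 0.4129 | 0.4090 | 0.4052 | 0.4013 | 0.3974 | 0.3936 | 0.3897 | 0.3859 |
| 0.1 | 0.4602 | 0.4562 | 0.4522 | 0.4483 | 0.4443 | 0.4404 | 0.4364 | 0.4325 | 0.4286 | 0.4247 |
| 0.0 | 0.5000 | 0.4960 | 0.4920 | 0.4880 | 0.4840 | 0.4801 | 0.4761 | 0.4721 | 0.4681 | 0.4641 |

t-Distribution (area in right tail)

| df | 0.25 | 0.20 | 0.15 | 0.10 | 0.05 | 0.025 | 0.02 | 0.01 | 0.005 | 0.0025 | 0.001 | 0.0005 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.000 | 1.376 | 1.963 | 3.078 | 6.314 | 12.706 | 15.894 | 31.821 | 63.657 | 127.321 | 318.289 | 636.558 |
| 2 | 0.816 | 1.061 | 1.386 | 1.886 | 2.920 | 4.303 | 4.849 | 6.965 | 9.925 | 14.089 | 22.328 | 31.600 |
| 3 | 0.765 | 0.978 | 1.250 | 1.638 | 2.353 | 3.182 | 3.482 | 4.541 | 5.841 | 7.453 | 10.214 | 12.924 |
| 4 | 0.741 | 0.941 | 1.190 | 1.533 | 2.132 | 2.776 | 2.999 | 3.747 | 4.604 | 5.598 | 7.173 | 8.610 |
| 5 | 0.727 | 0.920 | 1.156 | 1.476 | 2.015 | 2.571 | 2.757 | 3.365 | 4.032 | 4.773 | 5.893 | 6.869 |
| 6 | 0.718 | 0.906 | 1.134 | 1.440 | 1.943 | 2.447 | 2.612 | 3.143 | 3.707 | 4.317 | 5.208 | 5.959 |
| 7 | 0.711 | 0.896 | 1.119 | 1.415 | 1.895 | 2.365 | 2.517 | 2.998 | 3.499 | 4.029 | 4.785 | 5.408 |
| 8 | 0.706 | 0.889 | 1.108 | 1.397 | 1.860 | 2.306 | 2.449 | 2.896 | 3.355 | 3.833 | 4.501 | 5.041 |
| 9 | 0.703 | 0.883 | 1.100 | 1.383 | 1.833 | 2.262 | 2.398 | 2.821 | 3.250 | 3.690 | 4.297 | 4.781 |
| 10 | 0.700 | 0.879 | 1.093 | 1.372 | 1.812 | 2.228 | 2.359 | 2.764 | 3.169 | 3.581 | 4.144 | 4.587 |
| 11 | 0.697 | 0.876 | 1.088 | 1.363 | 1.796 | 2.201 | 2.328 | 2.718 | 3.106 | 3.497 | 4.025 | 4.437 |
| 12 | 0.695 | 0.873 | 1.083 | 1.356 | 1.782 | 2.179 | 2.303 | 2.681 | 3.055 | 3.428 | 3.930 | 4.318 |
| 13 | 0.694 | 0.870 | 1.079 | 1.350 | 1.771 | 2.160 | 2.282 | 2.650 | 3.012 | 3.372 | 3.852 | 4.221 |
| 14 | 0.692 | 0.868 | 1.076 | 1.345 | 1.761 | 2.145 | 2.264 | 2.624 | 2.977 | 3.326 | 3.787 | 4.140 |
| 15 | 0.691 | 0.866 | 1.074 | 1.341 | 1.753 | 2.131 | 2.249 | 2.602 | 2.947 | 3.286 | 3.733 | 4.073 |
| 16 | 0.690 | 0.865 | 1.071 | 1.337 | 1.746 | 2.120 | 2.235 | 2.583 | 2.921 | 3.252 | 3.686 | 4.015 |
| 17 | 0.689 | 0.863 | 1.069 | 1.333 | 1.740 | 2.110 | 2.224 | 2.567 | 2.898 | 3.222 | 3.646 | 3.965 |
| 18 | 0.688 | 0.862 | 1.067 | 1.330 | 1.734 | 2.101 | 2.214 | 2.552 | 2.878 | 3.197 | 3.611 | 3.922 |
| 19 | 0.688 | 0.861 | 1.066 | 1.328 | 1.729 | 2.093 | 2.205 | 2.539 | 2.861 | 3.174 | 3.579 | 3.883 |
| 20 | 0.687 | 0.860 | 1.064 | 1.325 | 1.725 | 2.086 | 2.197 | 2.528 | 2.845 | 3.153 | 3.552 | 3.850 |
| 21 | 0.686 | 0.859 | 1.063 | 1.323 | 1.721 | 2.080 | 2.189 | 2.518 | 2.831 | 3.135 | 3.527 | 3.819 |
| 22 | 0.686 | 0.858 | 1.061 | 1.321 | 1.717 | 2.074 | 2.183 | 2.508 | 2.819 | 3.119 | 3.505 | 3.792 |
| 23 | 0.685 | 0.858 | 1.060 | 1.319 | 1.714 | 2.069 | 2.177 | 2.500 | 2.807 | 3.104 | 3.485 | 3.768 |
| 24 | 0.685 | 0.857 | 1.059 | 1.318 | 1.711 | 2.064 | 2.172 | 2.492 | 2.797 | 3.091 | 3.467 | 3.745 |
| 25 | 0.684 | 0.856 | 1.058 | 1.316 | 1.708 | 2.060 | 2.167 | 2.485 | 2.787 | 3.078 | 3.450 | 3.725 |
| 26 | 0.684 | 0.856 | 1.058 | 1.315 | 1.706 | 2.056 | 2.162 | 2.479 | 2.779 | 3.067 | 3.435 | 3.707 |
| 27 | 0.684 | 0.855 | 1.057 | 1.314 | 1.703 | 2.052 | 2.158 | 2.473 | 2.771 | 3.057 | 3.421 | 3.690 |
| 28 | 0.683 | 0.855 | 1.056 | 1.313 | 1.701 | 2.048 | 2.154 | 2.467 | 2.763 | 3.047 | 3.408 | 3.674 |
| 29 | 0.683 | 0.854 | 1.055 | 1.311 | 1.699 | 2.045 | 2.150 | 2.462 | 2.756 | 3.038 | 3.396 | 3.659 |
| 30 | 0.683 | 0.854 | 1.055 | 1.310 | 1.697 | 2.042 | 2.147 | 2.457 | 2.750 | 3.030 | 3.385 | 3.646 |
| 31 | 0.682 | 0.853 | 1.054 | 1.309 | 1.696 | 2.040 | 2.144 | 2.453 | 2.744 | 3.022 | 3.375 | 3.633 |
| 32 | 0.682 | 0.853 | 1.054 | 1.309 | 1.694 | 2.037 | 2.141 | 2.449 | 2.738 | 3.015 | 3.365 | 3.622 |
| 33 | 0.682 | 0.853 | 1.053 | 1.308 | 1.692 | 2.035 | 2.138 | 2.445 | 2.733 | 3.008 | 3.356 | 3.611 |
| 34 | 0.682 | 0.852 | 1.052 | 1.307 | 1.691 | 2.032 | 2.136 | 2.441 | 2.728 | 3.002 | 3.348 | 3.601 |
| 35 | 0.682 | 0.852 | 1.052 | 1.306 | 1.690 | 2.030 | 2.133 | 2.438 | 2.724 | 2.996 | 3.340 | 3.591 |
| 36 | 0.681 | 0.852 | 1.052 | 1.306 | 1.688 | 2.028 | 2.131 | 2.435 | 2.719 | 2.990 | 3.333 | 3.582 |
| 37 | 0.681 | 0.851 | 1.051 | 1.305 | 1.687 | 2.026 | 2.129 | 2.431 | 2.715 | 2.985 | 3.326 | 3.574 |
| 38 | 0.681 | 0.851 | 1.051 | 1.304 | 1.686 | 2.024 | 2.127 | 2.429 | 2.712 | 2.980 | 3.319 | 3.566 |
| 39 | 0.681 | 0.851 | 1.050 | 1.304 | 1.685 | 2.023 | 2.125 | 2.426 | 2.708 | 2.976 | 3.313 | 3.558 |
| 40 | 0.681 | 0.851 | 1.050 | 1.303 | 1.684 | 2.021 | 2.123 | 2.423 | 2.704 | 2.971 | 3.307 | 3.551 |
| 50 | 0.679 | 0.849 | 1.047 | 1.299 | 1.676 | 2.009 | 2.109 | 2.403 | 2.678 | 2.937 | 3.261 | 3.496 |
| 60 | 0.679 | 0.848 | 1.045 | 1.296 | 1.671 | 2.000 | 2.099 | 2.390 | 2.660 | 2.915 | 3.232 | 3.460 |
| 70 | 0.678 | 0.847 | 1.044 | 1.294 | 1.667 | 1.994 | 2.093 | 2.381 | 2.648 | 2.899 | 3.211 | 3.435 |
| 80 | 0.678 | 0.846 | 1.043 | 1.292 | 1.664 | 1.990 | 2.088 | 2.374 | 2.639 | 2.887 | 3.195 | 3.416 |
| 90 | 0.677 | 0.846 | 1.042 | 1.291 | 1.662 | 1.987 | 2.084 | 2.368 | 2.632 | 2.878 | 3.183 | 3.402 |
| 100 | 0.677 | 0.845 | 1.042 | 1.290 | 1.660 | 1.984 | 2.081 | 2.364 | 2.626 | 2.871 | 3.174 | 3.390 |
| 1000 | 0.675 | 0.842 | 1.037 | 1.282 | 1.646 | 1.962 | 2.056 | 2.330 | 2.581 | 2.813 | 3.098 | 3.300 |
| z | 0.674 | 0.841 | 1.036 | 1.282 | 1.645 | 1.960 | 2.054 | 2.326 | 2.576 | 2.807 | 3.091 | 3.291 |

## Lab Project Data for Fall 2012

EXST 2201 Introduction to Statistical Analysis (2 pages, revised 11/09/12)

Directions. Please ask one question appropriate for this set of data and do one inferential statistical analysis: two-sample t test,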
ANOVA, or simple regression/correlation; and write a formal report to present the results. Instructions for the Lab Project report are available at the end of the Lab Notes. You must use all of the Price data in your analysis (regression will also use the Salary data).

Description. This information was taken from the American Housing Survey for 2003 and represents national information. It was collected by a survey of 21,977 households in the United States and relates education level and marriage status with house price and salary. The population is all households in the United States, and this data is a random sample of the 21,977 observations. The variables recorded were:

- Education: Educational Level, measured as the highest certification received: High School (diploma), College (four-year degree), or Advanced (graduate degree).
- Married: Marriage Status, measured as Married (currently married, with spouse present or not present) or NotMarried (single, widowed, divorced).
- Price: House Price, measured as the cost of the building and land in thousands of dollars.
- Salary: House Holder Salary, as annual income in dollars.

| High School, Married: Price | Salary | High School, NotMarried: Price | Salary | College, Married: Price | Salary | College, NotMarried: Price | Salary | Advanced, Married: Price | Salary | Advanced, NotMarried: Price | Salary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | 30 | 22 | 25 | 18 | 22 | 62 | 64 | 286 | 38 | 315 | 38 |
| 40 | 45 | 50 | 47 | 31 | 25 | 120 | 90 | 52 | 55 | 300 | 37 |
| 45 | 40 | 55 | 28 | 84 | 36 | 135 | 35 | 68 | 40 | 55 | 12 |
| 92 | 15 | 150 | 60 | 68 | 95 | 80 | 22 | 129 | 80 | 230 | 60 |
| 120 | 17 | 84 | 28 | 155 | 78 | 235 | 15 | 125 | 49 | 120 | 15 |
| 157 | 50 | 125 | 72 | 125 | 36 | 180 | 39 | 140 | 20 | 153 | 56 |
| 217 | 35 | 163 | 41 | 208 | 75 | 170 | 17 | 235 | 54 | 217 | 93 |
| 239 | 100 | 249 | 80 | 255 | 52 |

When I analyzed the full data set, I found the results for House Price shown below.

Means

| Marriage Status | Average | Standard Deviation |
|---|---|---|
| Married | 149.727 | 136.067 |
| Not Married | 113.538 | 110.001 |

Means significantly different.

Linear regression

| Parameter | Estimate | P-Value |
|---|---|---|
| Intercept | | <0.0001 |
| Salary | 0.872 | <0.0001 |

I am curious to see how closely your results from a smaller random sample will match the results of this larger random sample.
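
The Married versus NotMarried price comparison described above can be sketched in Python. This is a minimal sketch, not the course's required procedure: it assumes the unpooled two-sample t statistic and a conservative degrees-of-freedom rule, df = min(n1, n2) − 1, for looking up a critical value in a t table. The function name `welch_t` and the two price lists are hypothetical placeholders, not the lab data.

```python
import math
import statistics


def welch_t(sample_1, sample_2):
    """Return (t0, df) for an unpooled two-sample t test.

    t0 = (mean_1 - mean_2) / sqrt(s1^2/n1 + s2^2/n2)
    df = min(n1, n2) - 1 is a conservative choice (an assumption here).
    """
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    # statistics.variance uses the n - 1 denominator (sample variance).
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)
    standard_error = math.sqrt(var_1 / n1 + var_2 / n2)
    t0 = (mean_1 - mean_2) / standard_error
    df = min(n1, n2) - 1
    return t0, df


if __name__ == "__main__":
    # Hypothetical placeholder prices in thousands of dollars, NOT the lab data.
    married_prices = [120.0, 150.0, 95.0, 210.0, 180.0]
    not_married_prices = [80.0, 130.0, 60.0, 145.0, 110.0]
    t0, df = welch_t(married_prices, not_married_prices)
    print(f"t0 = {t0:.3f}, df = {df}")
```

To use it on the project, replace the placeholder lists with the Price values for each marriage status and compare |t0| with the t-table critical value for the chosen significance level and df.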
