Elementary Statistical Methods
Elementary Statistical Methods STAT 30100
This 24 page Class Notes was uploaded by Bailey Macejkovic on Saturday September 19, 2015. The Class Notes belongs to STAT 30100 at Purdue University taught by Tao Wang in Fall. Since its upload, it has received 242 views. For similar materials see /class/207939/stat-30100-purdue-university in Statistics at Purdue University.
Date Created: 09/19/15
Stat 301 Review: Final

The final will be broken down as follows:
  o Approximately 50% new material
  o Approximately 50% old material from chapters 1, 2, 3, 7, 8, and 9

Here is a checklist broken down by section.

Check   Section   Concept List

Graphs
  o Know which graph to use given a word problem.
  o Know how to describe your data based on a given graph: Are there any outliers or gaps? Is it symmetric, skewed left, or skewed right? Is it unimodal or bimodal? Where is the center of the distribution?

Numerical Summaries
  o Know which numerical summaries are most useful based on the shape of the distribution of your data.
  o Know which numerical summaries work best together.
  o Understand the concept of a resistant measure: know the definition as well as the measures which are resistant.

Data collection (vocabulary/concepts)
  o Anecdotal evidence
  o Available data
  o Unit
  o Population
  o Sample
  o Census
  o Observational study versus experiment
  o Experimental unit
  o Subjects
  o Treatments
  o Factors
  o Factor levels
  o Placebo
  o Control group
  o Statistical significance
  o Three principles of experimental design
  o Know how to randomize
  o Problems versus advantages of experiments
  o Non-random sampling
  o Random sampling
  o Sampling bias
  o Undercoverage
  o Nonresponse
  o Response bias
  o Parameter
  o Statistic
  o Sampling variability
  o Sampling distribution of a statistic
  o Unbiased estimator
  o How population size affects the sampling variability of a statistic

Experimental Designs
Do not just study the definitions of these three designs. You will need to be able to read a problem and determine which type of design was used. You will also need to know how to diagram the design.
  o Completely randomized design
  o Randomized block design
  o Matched pairs

Sampling Designs
Do not just study the definitions of these designs. You will need to be able to read a problem and determine which type of sampling was used.
  o Voluntary response sample
  o Simple random sample
  o Stratified random sample
  o [...] sample

  o What kind of stories and graphs go with a t-test/confidence interval for the one-sample mean,
matched pairs, and 2-sample comparison of means?
  o When it is better to calculate a confidence interval versus conduct a hypothesis test.

Ch 12
  o What kind of stories and graphs go with a one-way ANOVA problem?

Ch 13
  o What kind of stories and graphs go with a two-way ANOVA problem?

Ch 8
  o Know how to do confidence intervals for both one- and two-sample proportion problems.
  o Know how to do hypothesis tests for both one- and two-sample proportion problems.
  o Know when it is appropriate to use the formulas in these chapters.

Ch 9 and Section 2.5
  o Given a two-way table, find the joint distribution of categorical variables.
  o Given a two-way table, find the marginal distribution of categorical variables.
  o Given a two-way table, find the conditional distribution of categorical variables.
  o Given a two-way table, find the joint, marginal, and conditional probabilities.
  o Relationship between a χ² test and a two-sample proportion test.
  o Do a hypothesis test for a χ² test.
  o Know when it is appropriate to use a χ² test.

Ch 2 and 10
  o Know how to interpret a normal probability plot, scatterplot, and residual plot.
  o Use SPSS output to find the following: least-squares regression line, correlation, r², and estimate for σ.
  o Find the residual for one of the sets of data.
  o Use SPSS to find the confidence interval for the regression slope and intercept.
  o Hypothesis test for the regression slope: state the null and alternative hypotheses, obtain the test statistic and P-value from SPSS output, and state your conclusions in terms of the problem.
  o Test for zero population correlation: state the null and alternative hypotheses, calculate the test statistic and find the P-value, and state your conclusions in terms of the problem.
  o Outlier versus influential variables
  o Common response versus confounding
  o Causation

Ch 11
  o Use SPSS output to find the following: least-squares regression line, correlation r, and estimate for σ.
  o Use the least-squares regression line for prediction.
  o The F test: state the null and alternative hypotheses, calculate the test statistic and find the P-value from the SPSS
output, and state your conclusions in terms of the problem.
  o Know how to determine which explanatory variables should be included in a model (significance tests for βj).
  o Know how to find the confidence interval for βi.

1-sample proportion
  When to use: One percent or proportion. Categorical data.
  Confidence interval: p̂ ± z* sqrt(p̂(1 - p̂)/n), where p̂ = X/n. To find the z* value, look at the last row of the t table.
  Hypotheses: H0: p = p0 versus Ha: p > p0, Ha: p < p0, or Ha: p ≠ p0
  Test statistic: z = (p̂ - p0) / sqrt(p0(1 - p0)/n)
  P-value: for Ha: p > p0 use P(Z > z); for Ha: p < p0 use P(Z < z); for Ha: p ≠ p0 use 2P(Z > |z|). Look up P-values on the Normal table.

2-sample proportion
  When to use: Two percents or proportions are compared. Categorical data.
  Confidence interval: (p̂1 - p̂2) ± z* sqrt(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2), where p̂1 = X1/n1 and p̂2 = X2/n2. To find the z* value, look at the last row of the t table.
  Hypotheses: H0: p1 = p2 versus Ha: p1 > p2, Ha: p1 < p2, or Ha: p1 ≠ p2
  Test statistic: z = (p̂1 - p̂2) / sqrt(p̂(1 - p̂)(1/n1 + 1/n2)). Note: p̂ = (X1 + X2)/(n1 + n2).
  P-value: for Ha: p1 > p2 use P(Z > z); for Ha: p1 < p2 use P(Z < z); for Ha: p1 ≠ p2 use 2P(Z > |z|). Look up P-values on Table A.

χ² test
  When to use: Two categorical variables are compared. Categorical data.
  Confidence interval: None
  Hypotheses: H0: There is no relationship between A and B. Ha: There is a relationship between A and B.
  Test statistic: Read the χ² value from the printout.
  P-value: Read the P-value from the printout.

The problems below have been taken from old finals.

MATCHING: For problems 1-10, write the letter of the most appropriate statistical analysis technique next to the story. Note: each answer choice may be used once, more than once, or not at all.

  A. Mean and/or standard deviation
  B. Five-number summary
  C. Simple linear regression
  D. Multiple linear regression
  E. 1-sample mean t-test
  F. Matched pairs t-test
  G. 2-sample comparison of means t-test
  H. 1-sample proportion z-test
  I. 2-sample proportion z-test
  J. Chi-squared test
  K. One-way ANOVA
  L. Two-way ANOVA

1. Is there a significant average difference between Wednesday and Saturday gas prices if we check these 20 stations on both days?
2. What is the median gas price for Lafayette gas stations?
3. Does the number of insurgent attacks in the war in Iraq affect gas prices on a weekly basis?
4. Will the percentage of people traveling by plane be higher on Memorial Day weekend or Labor Day weekend?
5. Do region of the country and size of vehicle (small car, large car, truck, SUV) have an effect on the number of people traveling over Memorial Day weekend?
6. Are region of the country and size of vehicle (small car, large car, truck, SUV) associated?
7. Is there a significant difference between the average Indiana gas price and the average California gas price today if 20 stations in each state are sampled?
8. Is there a difference in the average number of times a month a driver fills up his tank for drivers of small cars, large cars, trucks, and SUVs?
9. I want to predict the number of people who will travel on Memorial Day this year by looking at gas prices, temperatures, unemployment rates, consumer price indices, and presidential approval percentages over the past 30 years.
10. Is the average gas price for Indiana stations last Wednesday less than $2.15?

For questions 11-15, choose the letter for the graph listed below which would be appropriate for answering the questions. Each letter may be used once, more than once, or not at all.

  A. Scatterplot
  B. Side-by-side boxplots
  C. Histogram
  D. Pie chart

11. What is the percentage of Indiana vehicles which are small passenger cars, large passenger cars, trucks, SUVs, and other?
12. Is there much difference between the gas mileage of small passenger cars, large passenger cars, trucks, and SUVs?
13. Are gas prices and daily high temperature independent?
14. Is there a negative association between the number of hybrid cars registered to a state and the number of people who voted for George W. Bush in the election?
15. Is the distribution of people per state who own hybrid cars symmetric or skewed?

16. Alex is a homeowner and is concerned about heating costs. He feels the outside temperature has an impact on the amount of gas used to heat his house. So he looks on the website www.weather.com, finds the temperatures for each day, and determines the average degree-days per month. He finds his heating bill and records the gas consumption for each month. Below is a record of the results and the output after he entered the data
into SPSS.

Month             Oct    Nov    Dec    Jan    Feb    Mar    Apr
Degree-days       16.1   26.2   37.0   40.9   30.6   15.5   10.8
Gas consumption   5.0    6.1    8.4    10.1   8.0    4.3    3.5

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .991(a)   .983       .980                .4162

ANOVA(b)
Model            Sum of Squares   df   Mean Square   F         Sig.
1   Regression   68.990           1    68.990        398.345   .000(a)
    Residual     1.212            7    .173
    Total        70.202           8
a. Predictors: (Constant), Degree-days
b. Dependent Variable: Gas consumption

Coefficients(a)
                  Unstandardized         Standardized                     95% Confidence
                  Coefficients           Coefficients                     Interval for B
Model             B       Std. Error     Beta           t        Sig.     Lower Bound   Upper Bound
1   (Constant)    1.094   .258                          4.235    .004     .483          1.705
    Degree-days   .212    .011           .991           19.959   .000     .187          .237
a. Dependent Variable: Gas consumption

  a. What is the explanatory variable?
  b. What is the response variable?
  c. Describe the form, strength, and direction of the relationship.
  d. What is the equation of the least-squares regression line for the heating season?
  e. What is the predicted gas consumption when degree-days is 30.6?
  f. Find the residual value when degree-days is 30.6.
  g. How much of the variation in gas consumption is explained by the least-squares regression?
  h. What is the 95% confidence interval for the regression coefficient of degree-days?
  i. What is the 99% confidence interval for the regression coefficient of degree-days?
  j. Do a test to determine if there is a linear relationship between degree-days and gas consumption. State your hypotheses, test statistic, P-value, and your conclusion in terms of the story.

17. As a supporter of Purdue's football team, Pete wants to do a little analysis.

[SPSS output for using ATTENDANCE AT GAME to predict POINTS PURDUE SCORED: Model Summary, ANOVA, and Coefficients tables. The values are not legible in the source.]

  a. What is the explanatory variable?
  b. What is the response variable?
  c. Describe the form, strength, and direction of the relationship.
  d. What is the equation of the least-squares regression line for the number of points scored?
  e. What is the predicted number of points scored when the attendance is 56400?
  f. When the attendance was 56400, Purdue scored 31 points. What is its residual?
  g. How much of the variation in number of points scored by Purdue is explained by the least-squares regression?
  h. What is the 95% confidence interval for the regression coefficient of attendance at games?
  i. Do a test to determine if there is a negative linear relationship between attendance at games and number of points scored by Purdue. State your hypotheses, test statistic, P-value, and your conclusion in terms of the story.

18. After thinking some more, Pete thought there could be other variables that might affect the number of points Purdue scored. One variable of interest is the number of points the opponent scores. He added this variable to his analysis and did a multiple regression.
  a. Using the output on the next four pages, what is the best equation of a line for predicting the number of points Purdue scored in a game? (use α = 01)
  b. Give 4 reasons for why you made that choice.

Correlations
                                                  Points Purdue   Attendance   Points Opponents
                                                  Scored          at Game      Scored
Points Purdue Scored      Pearson Correlation     1               -.611*       .075
                          Sig. (2-tailed)                         .016         .790
                          N                       15              15           15
Attendance at Game        Pearson Correlation     -.611*          1            -.157
                          Sig. (2-tailed)         .016                         .576
                          N                       15              15           15
Points Opponents Scored   Pearson Correlation     .075            -.157        1
                          Sig. (2-tailed)         .790            .576
                          N                       15              15           15
* Correlation is significant at the 0.05 level (2-tailed).

[Scatterplots of each pair of variables: Points Purdue Scored, Attendance at Game, Points Opponents Scored.]

SPSS output for using POINTS OPPONENTS SCORED and ATTENDANCE AT GAME to predict POINTS PURDUE SCORED:

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .611(a)   .374       .269                11.474
a. Predictors: (Constant), Points Opponents Scored, Attendance at Game

ANOVA(b)
Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   943.174          2    471.587       3.582   .060(a)
    Residual     1579.759         12   131.647
    Total        2522.933         14
a. Predictors: (Constant), Points Opponents Scored, Attendance at Game
b. Dependent Variable: Points Purdue Scored

Coefficients(a)
                              Unstandardized           Standardized
                              Coefficients             Coefficients
Model                         B           Std. Error   Beta           t        Sig.
1   (Constant)                55.997      13.704                      4.086    .002
    Attendance at Game        -3.99E-04   .000         -.614          -2.656   .021
    Points Opponents Scored   -2.65E-02   .286         -.021          -.093    .928
a. Dependent Variable: Points Purdue Scored

SPSS output for using just ATTENDANCE AT GAME to predict POINTS PURDUE SCORED:

[Model Summary, ANOVA, and Coefficients tables. Only R = .611 is legible in the source.]

SPSS output for using just POINTS OPPONENTS SCORED to predict POINTS PURDUE SCORED:

Model Summary
[Values for R, R Square, Adjusted R Square, and Std. Error of the Estimate are missing in the source.]
a. Predictors: (Constant), Points Opponents Scored

ANOVA(b)
Model            Sum of Squares   df   Mean Square   F      Sig.
1   Regression   14.204           1    14.204        .074   .790(a)
    Residual     2508.729         13   192.979
    Total        2522.933         14
a. Predictors: (Constant), Points Opponents Scored
b. Dependent Variable: Points Purdue Scored

Coefficients(a)
                              Unstandardized           Standardized
                              Coefficients             Coefficients
Model                         B           Std. Error   Beta           t       Sig.
1   (Constant)                24.931      8.649                       2.882   .013
    Points Opponents Scored   9.284E-02   .342         .075           .271    .790
a. Dependent Variable: Points Purdue Scored

19. An environmental health professor conducted a study to see whether fast-food workers wearing gloves actually lowers the chance that customers will come down with food poisoning. The scientists purchased 371 tortillas from several local fast-food restaurants, noting whether the workers were wearing gloves or not. 190 of the tortillas came from bare-hands restaurants; 181 of the tortillas came from glove-wearing restaurants. The scientists then tested the tortillas purchased for microbe growth. They found that the bare-hands restaurants' tortillas gave rise to microbe growth on 18 tortillas and the glove-wearing
restaurants' tortillas gave rise to microbe growth on only 8 tortillas. Is the glove-wearing restaurants' tortilla microbe growth significantly lower than the bare-hands restaurants' microbe growth at the 5% significance level?
  1. State your hypotheses for this test.
  2. Calculate your test statistic.
  3. Find your P-value.
  4. State your conclusion in terms of the story.

20. In a 1984 survey of licensed drivers in Wisconsin, 214 of 1200 men said that they did not drink alcohol. Construct a 95% confidence interval for the proportion of men who said that they did not drink alcohol. Is your confidence interval calculation reasonable? Why?

21. On the next page is the SPSS output for a study of alcohol and nicotine consumption among 452 pregnant women. Nicotine consumption is divided into 3 categories and alcohol consumption is divided into 4 categories. Answer the questions below based on the output that follows.
  a. What proportion of the non-alcohol-consuming women do not smoke during pregnancy? Is this a joint, marginal, or conditional probability?
  b. What proportion of women do not smoke and do not consume alcohol during pregnancy? Is this a joint, marginal, or conditional probability?
  c. Find the marginal distribution for alcohol consumption during pregnancy.
  d. State the null and alternative hypotheses to test whether there is a relationship between alcohol consumption and smoking during pregnancy.
  e. What are the test statistic and P-value used to test the hypotheses in part d?
  f. State your conclusions in terms of the original problem.
  g. Are your results for the above test valid? Explain your answer.

Alcohol * Nicotine Crosstabulation (Count)
                         Nicotine
Alcohol         1-15   16 or more   None   Total
0.01-0.10       5      13           58     76
0.11-0.99       37     42           84     163
1.0 or more     16     17           57     90
None            7      11           105    123
Total           65     83           304    452
Note: Nicotine is measured in milligrams/day and alcohol in ounces per day.

Alcohol * Nicotine Crosstabulation
                                  Nicotine
Alcohol                          1-15    16 or more   None    Total
0.01-0.10     Count              5       13           58      76
              Expected Count     10.9    14.0         51.1    76.0
              % of Total         1.1%    2.9%         12.8%   16.8%
0.11-0.99     Count              37      42           84      163
              Expected Count     23.4    29.9         109.6   163.0
              % of Total         8.2%    9.3%         18.6%   36.1%
1.0 or more   Count              16      17           57      90
              Expected Count     12.9    16.5         60.5    90.0
              % of Total         3.5%    3.8%         12.6%   19.9%
None          Count              7       11           105     123
              Expected Count     17.7    22.6         82.7    123.0
              % of Total         1.5%    2.4%         23.2%   27.2%
Total         Count              65      83           304     452
              Expected Count     65.0    83.0         304.0   452.0
              % of Total         14.4%   18.4%        67.3%   100.0%

Chi-Square Tests
                       Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square     42.252(a)   6    .000
Likelihood Ratio       44.653      6    .000
N of Valid Cases       452
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.9.

Multiple Choice: Circle the letter of the correct answer and write its letter in the blank next to each story.

22. Does bread lose its vitamins when stored? Twenty small loaves of bread were randomly assigned to one of four storage times: one, two, three, or four days. After the bread had been stored for its respective amount of days, its vitamin C content was measured. This is an example of a
  A. simple random sample
  B. completely randomized design
  C. randomized block design
  D. matched pairs design
  E. stratified random sample

23. The department of health wanted to know how many people received flu shots this year. They thought that females were more likely to get a shot, so they randomly selected 500 males and 500 females in Lafayette and West Lafayette to survey. This is an example of a
  A. simple random sample
  B. completely randomized design
  C. randomized block design
  D. matched pairs design
  E. stratified random sample

24. Which of the following is a potential way to reduce sampling variability?
  A. Increase your sample size
  B. Decrease your sample size
  C. Increase your population size
  D. Decrease your population size

For questions 25-27, choose the letter for the type of bias listed below which is a problem in the story.

  A. Undercoverage
  B. Nonresponse
  C. Response bias

25. John wanted to find out people's opinions regarding Greater Lafayette Health Services' desire to build a new hospital. Consequently, he took a simple random sample of 500 Lafayette and West Lafayette residents listed in the phone book.
He is concerned, however, that those not listed in the phone book may have different views. What type of bias is he concerned about?
26. When John attempted to collect data from those who made it into his sample, he was unable to contact some of them, and others refused to answer his survey questions. What type of bias could this produce?
27. John was pleased with the unanimous response to his survey question, which read: "Do you believe that building a new hospital is a waste of resources and will leave two perfectly good buildings vacant?" What type of bias could his survey question be producing?

For questions 28-31, choose the letter for the graph listed below which would be appropriate for answering the questions. Each letter may be used once, more than once, or not at all.

  A. Scatterplot
  B. Side-by-side boxplot
  C. Histogram
  D. Bar graph

28. Compare the percentage of Lafayette residents who feel that a new hospital should be built with the percentage that don't feel that a new hospital should be built and the percentage who don't care.
29. Is the distribution of people's ages who feel a new hospital should be built in Lafayette symmetric or skewed?
30. Is there a positive association between the age and number of times a Lafayette resident visits one of the hospitals in a year?
31. Is there a difference in the average number of hospital visits per year between Lafayette residents that would like to see a new hospital built and those who would not or don't care?

MATCHING: For problems 32-41, write the letter of the most appropriate statistical analysis technique next to the story. Note: each answer choice may be used once, more than once, or not at all.

32. As the outdoor temperature (in degrees) increases, do ice cream sales (in dollars) increase at the Silver Dipper?
33. Is there a significant average difference between soft-serve and hard-packed ice cream if we check the prices of both at 20 different ice cream parlors?
34. Do high school students spend more money on ice cream on average than
college students?
35. Is the average number of scoops of ice cream a person eats in a summer week less than 5?
36. Does a person's favorite flavor (triple chocolate, chunky monkey, or vanilla) or residential proximity to an ice cream parlor (reported only as less than 1 mile, between 1 and 5 miles, or more than 5 miles) or their interaction have an effect on the amount of money a person spends on ice cream in a summer?
37. Can a person's age, residential proximity to an ice cream parlor (reported in miles), and IQ do a good job of predicting how many ice cream cones that person will eat in a summer?
38. Is there a significant difference between how many ice cream cones a year, on average, freshmen, sophomores, juniors, and seniors eat?
39. What is the maximum price for ice cream cones if I look at prices of single-scoop cones from 25 different stores?
40. Is there a relationship between a person's favorite flavor of ice cream (triple chocolate, chunky monkey, or vanilla) and their gender?
41. Is the percentage of men who like triple chocolate ice cream the best higher than the percentage of women who like triple chocolate ice cream the best?

  A. Mean and/or standard deviation
  B. Five-number summary
  C. Simple linear regression
  D. Multiple linear regression
  E. 1-sample mean t-test
  F. Matched pairs t-test
  G. 2-sample comparison of means t-test
  H. 1-sample proportion z-test
  I. 2-sample proportion z-test
  J. Chi-squared test
  K. One-way ANOVA
  L. Two-way ANOVA

42. [Problem statement not legible in the source.]
  a. State the hypotheses for this test.
  b. Calculate the test statistic.
  c. Find the P-value.
  d. State your conclusion in terms of the story.
  e. [Partly legible: on the curve below, clearly label and shade the P-value for the hypothesis test in parts a through d.]

43. Is there a relationship between cigarette smoking and drinking? With the recent proposed ordinance for West Lafayette piquing his interest, an interested citizen selected an SRS of Purdue students, surveyed those students, and got the results summarized below.

Alcoholic drinks per week * Cigarettes smoked per day Crosstabulation (Count)
                            Cigarettes smoked per day
Alcoholic drinks per week   None   1-10   11-20   21+    Total
None                        185    90     98      43     416
1-2                         64     50     45      19     178
3-5                         57     37     40      25     159
6+                          89     57     61      40     247
Total                       395    234    244     127    1000

Chi-Square Tests
                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             12.845(a)   9    .170
Likelihood Ratio               12.598      9    .182
Linear-by-Linear Association   7.527       1    .006
N of Valid Cases               1000
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 20.19.

  a. If a student has no drinks per week, what is the probability he/she smokes no cigarettes? Is this a joint, marginal, or conditional probability?
  b. Determine if the amount of drinking and cigarette smoking are related. State your hypotheses, your test statistic, your P-value, and your conclusion in terms of the story.
  c. Was it appropriate to do the test in part b? Justify your answer.
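
The arithmetic behind the chi-square printout in problem 43 can be checked by hand or with a short script. The sketch below is only an illustration, not part of the original review sheet: it assumes the sixteen cell counts shown in the crosstabulation (185, 90, 98, 43 / 64, 50, 45, 19 / 57, 37, 40, 25 / 89, 57, 61, 40), and the function name `chi_square_independence` is hypothetical. It computes expected counts as (row total × column total) / n, sums (observed - expected)² / expected, and also reports the conditional proportion asked for in part a. It does not compute the P-value, which the review sheet says to read from the printout.

```python
# Observed counts assumed from the problem 43 crosstabulation.
# Rows: alcoholic drinks per week; columns: cigarettes smoked per day.
observed = [
    [185, 90, 98, 43],
    [64, 50, 45, 19],
    [57, 37, 40, 25],
    [89, 57, 61, 40],
]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under "no relationship": row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


statistic, df, expected = chi_square_independence(observed)

# Smallest expected count, used to judge whether the test is appropriate.
min_expected = min(min(row) for row in expected)

# Part a: proportion with no cigarettes among students with no drinks (conditional).
p_no_cig_given_no_drinks = observed[0][0] / sum(observed[0])

print(f"chi-square = {statistic:.3f}, df = {df}")
print(f"minimum expected count = {min_expected:.2f}")
print(f"P(no cigarettes | no drinks) = {p_no_cig_given_no_drinks:.3f}")
```

If the counts above are the ones behind the SPSS output, the statistic, degrees of freedom, and minimum expected count should agree with the printed values (12.845, 9, and 20.19) up to rounding.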