Statistical Methods I

by: Coby Grant

Statistical Methods I STAT 251


These 32 pages of class notes were uploaded by Coby Grant on Monday, October 12, 2015. The notes belong to STAT 251 at Hollins University, taught by Julie Clark in Fall. Since upload, they have received 46 views. For similar materials, see /class/222137/stat-251-hollins-university in Statistics at Hollins University.



Date Created: 10/12/15
Stat 251 Midterm Solutions
October 8, 2008

Please write clearly on this paper, using the back of a page if you need additional space. You may use your text, handouts, notes, calculator, Minitab, Excel, and Java applets. For questions that call for calculations, present your method of solution in a clear, well-labeled manner and show the details of your calculations. For questions that ask for interpretations and explanations, explain your answers fully unless instructed otherwise. For problems with multiple parts, be aware that you can usually complete later parts successfully whether or not you do so on earlier parts. If one part of a question does use the answer to an earlier part that you have not been able to answer, you may use a suitable symbol in place of the answer in working the later part of the question. If you made an approximation or an assumption, be sure to note when you have done so and why.

Don't spend too long on one question. Tentative point values have been given to each problem to help you manage your time. If any questions arise during the exam, or you need any terminology clarified, please do not hesitate to ask. You have 60 minutes to complete the first six problems. The last four problems are to be finished on a take-home basis and handed in by 10:20 am on Monday, October 13, 2008. You are to work completely independently on all parts of the exam, and you may not use any aids other than those mentioned above.

Name:

In-Class Questions

1. Researchers in New Zealand investigated whether sleepiness is related to car crashes (Connor et al., 2002). They took a sample of 529 drivers who had been in a car crash and compared it to a control sample of 584 drivers who had not been in a car crash. They interviewed all of these drivers about how much sleep they got in the day and week preceding the crash (or non-crash, for the control group). Researchers found that 65 in the crash group and 30 in the control group had five or fewer hours of sleep in the previous day.

a. (3 pts) Is this an observational
study or an experiment? No explanation is necessary.

Solution: This is an observational study. No one was assigned to get a certain amount of sleep.

b. (5 pts) Is it appropriate to calculate the relative risk of being in a crash, comparing those with five or fewer hours of sleep to those with more than five hours of sleep? Explain briefly. Do not bother to calculate the relative risk, regardless of your answer.

Solution: No. This is a case-control study, because the researchers chose a number of drivers who had been in a crash and a number who had not. Therefore it's not appropriate to calculate the risk of being in a crash based on these data.

c. (8 pts) Calculate the odds ratio of being in a crash, comparing those with five or fewer hours of sleep to those with more than five hours of sleep. Also write a sentence interpreting this value.

Solution:

                     | 5 or fewer hours of sleep | More than 5 hours of sleep
Crash                | 65                        | 464
Control (no crash)   | 30                        | 554

The odds ratio is (65 × 554)/(30 × 464) ≈ 2.59. The odds of being in a crash are about 2.59 times as high among people with five or fewer hours of sleep as among those with more than five hours.

d. (5 pts) The p-value for comparing these two groups is 0.000144. Write a sentence or two interpreting what this says (i.e., this is the probability of what?).

Solution: The p-value is the probability of obtaining a difference in group proportions as large as that observed in this study by the random assignment process alone, assuming that there's really no difference between the groups (i.e., that the crash proportions are not related at all to the amount of sleep).

e. (8 pts) Summarize the conclusion to be drawn from this p-value, being sure to address the issue of cause and effect as well as the issue of statistical significance.

Solution: The extremely small p-value provides overwhelming evidence that people with 5 or fewer hours of sleep are more likely to be in a car crash than those with more than 5 hours of sleep. But we cannot conclude that sleep deprivation is the cause of the increased likelihood of crashing, because this is an observational study.

2. (11 pts) Biologists studied the
relative brain sizes (measured as brain weight divided by body weight, times 1000) for 96 species of mammals. The species were also classified by whether their average litter size is less than 2 or not. Summary statistics are below.

Variable: relative brain size
Average litter size | N  | Mean  | StDev | Q1    | Median | Q3
2 or more           | 45 | 10.97 | 9.04  | 3.32  | 7.97   | 18.68
under 2             | 51 | 6.886 | 5.460 | 2.400 | 5.000  | 10.400

A simulation was used to produce 1000 repetitions where the 96 brain sizes were randomly assigned to the 2 litter size groups.

[Histogram of the 1000 simulated differences in average brain size; vertical axis: number of repetitions.]

a. (5 pts) Based on these simulation results, would you consider the increase in average brain sizes for the larger litters to be statistically significant? Explain by estimating and interpreting the p-value.

Solution: The observed difference in group means is 6.89 - 10.97 = -4.08. So the p-value ≈ (1 + 6)/1000 = .007. With such a small p-value, we conclude that the difference is statistically significant. It would be very surprising to find a difference of -4.08 or less (more extreme) if there was no real difference in average brain size between the two groups.

b. (3 pts) Suppose the previous study was operationally identical to another study, and the results of the two studies were combined so that the sample sizes were roughly twice as large in each group, while the other summary statistics remained similar to the values listed above. Without calculating, would the p-value for this combined study be larger, smaller, or approximately the same as that in a? Explain your reasoning.

Solution: Smaller. With the larger sample sizes and all else being the same, the p-value will be smaller.

c. (3 pts) Suppose a separate analysis was carried out to analyze the following slightly different results.

Variable: relative brain size
Average litter size | N  | Mean  | StDev  | Q1    | Median | Q3
2 or more           | 45 | 10.97 | 19.84  | 3.32  | 7.97   | 28.68
under 2             | 51 | 6.886 | 15.460 | 2.480 | 5.000  | 20.480

We cannot calculate or estimate the p-value until we run another simulation based on these new data, but you can predict without calculating whether the p-value for comparing the
average brain sizes between the two groups would be larger, smaller, or essentially unchanged from the p-value you found in part a. Make that prediction and explain your reasoning.

Solution: The p-value would be larger, because the spreads are larger but all else stayed the same.

3. (6 pts) The five-number summary of the distances (in miles) of the 72 hikes described in the book Day Hikes in San Luis Obispo County is:

Minimum | Lower quartile | Median | Upper quartile | Maximum
0.6     | 2.0            | 2.9    | 4.575          | 9.5

a. (3 pts) What, if anything, can you say about the percentage of hikes that are longer than four miles?

Solution: Since this is not one of the quartiles, the most that can be said is that this percentage is less than 50%. Four is somewhere between the median (50th percentile) and the upper quartile (75th percentile).

b. (3 pts) What, if anything, can you say about whether there are any outliers (high or low) among these hike distances, based on the 1.5 × IQR rule? Explain; be as specific as you can.

Solution: IQR = 2.575, so 1.5 × IQR = 3.8625. The fences are 2 - 3.8625 = -1.86 and 4.575 + 3.8625 = 8.4375. So there is at least one high outlier (9.5), and there are no low outliers.

4. (14 pts) Consider two basketball players, Alice and Bree. Suppose that for games played at Hollins, Alice successfully makes 40 of her 50 free throws while Bree successfully makes 9 of her 10 free throws. Suppose that for games played away from Hollins, Alice successfully makes 2 of her 10 free throws while Bree successfully makes 25 of her 50 free throws.

a. (3 pts) For games played at Hollins, which player successfully makes a higher proportion of free throws? Justify your answer with appropriate calculations.

Solution: Alice makes 40/50 = .800. Bree makes 9/10 = .900. Bree makes a higher proportion.

b. (3 pts) For games played away from Hollins, which player successfully makes a higher proportion of free throws? Justify your answer with appropriate calculations.

Solution: Alice makes 2/10 = .200. Bree makes 25/50 = .500. Bree makes a higher proportion.

c. (4 pts) Now combine games played at Hollins and away from Hollins. When these games are combined, which player successfully makes a higher
proportion of her free throws? Justify your answer with appropriate calculations.

Solution: Alice makes 42/60 = .700. Bree makes 34/60 ≈ .567. Alice makes a higher proportion.

d. (4 pts) Explain why Simpson's paradox occurs here. Be sure that you do more than describe the paradox; be sure to explain why it happens in this case. Base your explanation on the data provided.

Solution: Both players have a higher success proportion at home than away. Alice gets most of her opportunities at home (50 of 60), and Bree gets most of her opportunities away (also 50 of 60). So even though Bree does better than Alice at both places, Alice does better overall because she gets most of her opportunities where the success proportions are higher.

5. (13 pts) A recent study (Irwin, Olmstead, and Oxman, Journal of the American Geriatrics Society, 2007) involved 112 healthy adults, ages 59 to 86, from two urban communities, who have had previous cases of chickenpox. Roughly half of them were randomly assigned to take tai chi classes three times a week for three months, and the rest to take health education classes where they were taught good diet habits and stress management. Then both groups were vaccinated with a chickenpox vaccine. Researchers took periodic blood tests before and after vaccination to determine their level of immunity against shingles (the rate of responding cells was determined for each person).

a. (2 pts) Identify the observational units in this study.

Solution: 112 healthy adults from two urban communities who have had chickenpox.

b. (4 pts) Identify the explanatory and the response variables in this study.

Solution: Explanatory variable: tai chi classes or health education classes (randomly assigned). Response variable: immunity level against shingles (as determined by blood tests).

c. (3 pts) Was this an observational study or an experiment? Justify your choice.

Solution: This is an experiment, since the researchers imposed the tai chi vs. health education classes.

d. (4 pts) If the researchers find that those who took the tai chi classes had a higher level of shingles immunity than
those who took the health education classes, could they reasonably attribute the immunity to the tai chi classes? Explain briefly.

Solution: Yes, they can conclude cause and effect, because this is a well-designed experiment.

6. (3 pts) In 1997 there were 11,012 licensed drivers over age 55 involved in fatal crashes, whereas there were only 7,670 licensed drivers between the ages of 16 and 20 involved in fatal crashes. Your friend concludes that younger licensed drivers are safer (in terms of fatal crashes) than older licensed drivers. How should you respond?

Solution: This is not a reasonable comparison, since the sizes of the two groups are much different. You need to examine the proportions of fatalities in each group instead of the counts.

Stat 251 Midterm Take-Home Questions

This portion of the midterm is due at 10:20 am Monday, October 13th, no exceptions. You are to work completely independently on this portion of the exam; that means you cannot get help from any other person, or from another person's notes, or from the web, etc. You may use your text, handouts, notes, calculator, Minitab, Excel, and Java applets. You may consult me if you have questions or if you need terminology clarified, but I am unlikely to be available Sunday evening, so start the exam TODAY.

For questions 8-10, present your method of solution in a clear, well-labeled manner and show ALL the details of your calculations. For questions that ask for interpretations and explanations, explain your answers in excessive detail. If you made an approximation or an assumption, be sure to note when and how you have done so, and why. DO NOT expect me to make any assumptions or figure out anything that you have done. If you use Minitab, Excel, or an applet, explain each and every step, including every command and macro you used. If you decide to explain by submitting a copy of a Minitab or Excel file or a screen shot, please make sure that the file or screen shot shows everything I could possibly need to see. For example, the Minitab session window does not display macros. Bald
answers (numbers with no supporting work) will receive NO CREDIT on this portion of the exam, even if they are correct to 10 decimal places. Enjoy!

7. (6 pts) For each of the following properties, construct a data set of 10 hypothetical exam scores that satisfies the properties listed. Assume that exam scores are integers between 0 and 100 inclusive, and that you may repeat values.

a. (3 pts) The standard deviation is positive but the interquartile range is zero.

Solution: Example: 0 0 50 50 50 50 50 50 100 100. IQR = 0, s = 33.3 ≠ 0.

b. (3 pts) The interquartile range is as large as possible, but the standard deviation is not as large as possible.

Solution: Example: 0 0 0 50 50 50 50 100 100 100. IQR = 100, s = 40.8 < largest possible standard deviation, which is 52.7.

8. (20 pts) Can telling a joke increase the likelihood that a customer in a coffee bar leaves a tip for the waiter? A recent study investigated this question at a coffee bar at a famous resort on the west coast of France (Gueguen, 2002). The waiter randomly assigned coffee-ordering customers into one of two groups: one group received a card telling a joke with the bill, and the control group received no card with the bill. Results are summarized in the following 2x2 table.

                     | Joke card | No card | Total
Left a tip           | 30        | 16      | 46
Did not leave a tip  | 42        | 49      | 91
Total                | 72        | 65      | 137

Perform Fisher's exact test on these data. Include the following:

a. (8 pts) an expression for how to calculate the p-value, such as Pr(X ≥ k) or Pr(X ≤ c), where you indicate what probability distribution X has, what its parameter values are, and what value k or c is.

Solution: X has a hypergeometric distribution with
N = 137, M = 46, n = 65, and we want P(X ≤ 16); or
N = 137, M = 46, n = 72, and we want P(X ≥ 30); or
N = 137, M = 91, n = 72, and we want P(X ≤ 42); or
N = 137, M = 91, n = 65, and we want P(X ≥ 49).

b. (4 pts) the numerical value of the p-value obtained from Minitab or Excel.

Solution: P(X ≤ 16) = 0.0263861

c. (8 pts) a summary of the conclusion that you draw from the test, being sure to address the issue of cause and effect as well as the issue
of statistical significance.

Solution: With this small p-value, and because this was an experiment, we can conclude that leaving a joke card causes more patrons at this coffee bar to leave a tip. The results are statistically significant, meaning there is little possibility that this difference was due to random chance.

9. Suppose that the subjects in an experiment are to be randomly divided into two groups.

a. (3 pts) Suppose that there are 8 subjects, of whom 4 are men and 4 are women. Determine the probability that 2 subjects of each gender are assigned to each group.

Solution: This is hypergeometric with N = 8, M = 4, n = 4, and we want P(X = 2). P(X = 2) = .514286

b. (3 pts) Now consider the general case that there are 4N subjects, of whom 2N are men and 2N are women. Derive an expression for the probability that N subjects of each gender are assigned to each group, as a function of N.

Solution: This is hypergeometric with population size 4N, M = 2N, n = 2N, and we want P(X = N).
P(X = N) = C(2N, N) × C(2N, N) / C(4N, 2N)

c. (3 pts) Produce and submit a graph of your function from b for values of N ranging from 1 to 10. Does the function appear to be increasing or decreasing? Explain why this makes sense.

Solution: This function is decreasing, which makes sense because as N gets larger, it gets more and more difficult for exactly half of the women and half of the men to end up in each group. C(4N, 2N) increases much more rapidly than does C(2N, N)^2.

[Graph of P(X = N) versus N]

10. (33 pts) A psychology experiment investigated whether people display more creativity when they are thinking about intrinsic or extrinsic motivations. The subjects were 47 people with extensive experience with creative writing. They were randomly assigned to one of two groups: one group answered a survey about intrinsic motivations for writing (such as the pleasure of self-expression), and the other group answered a survey about extrinsic motivations (such as public recognition). Then all subjects were instructed to write a Haiku poem, and these poems were evaluated for creativity by a panel of judges. The researchers conjectured that subjects who were
thinking about intrinsic motivations would display more creativity than subjects who were thinking about extrinsic motivations. The creativity scores are in the Minitab worksheet creativity.mtw. Note that the data appear in both stacked format (c1 and c2) and unstacked format (c4 and c5).

a. (4 pts) Which group (intrinsic or extrinsic) tended to achieve higher creativity scores? Report the values of appropriate summary statistics (i.e., measures of center) to support your answer. Do not bother to write a paragraph or even a sentence.

More Friendly Observers

Goal: determine the exact p-value.

Define a random variable X by X = number of successes in group A. We want to find

P(X = 3) = (number of ways to assign 3 successes and 9 failures to group A) / (number of ways to assign 12 subjects to group A)

a. How many ways can we assign 12 subjects to group A?
C(24, 12) = 2704156

b. How many ways are there to assign 3 successes to group A?
C(11, 3) = 165

c. Explain why your answer in b is not the entire numerator of P(X = 3).
Because we also have to assign the 9 failures to group A.

d. Use a combination to determine the number of ways to select 9 failures from the 13 failures in the study.
C(13, 9) = 715

e. To determine the total number of ways to select the 3 successes and 9 failures, multiply your answers from b and d.
165 × 715 = 117975

f. Divide your previous answer by your answer in a to determine P(X = 3).
117975 / 2704156 = .0436

g. Explain why the probability determined in f is not quite the p-value of interest.
p-value = probability of getting 3 or fewer successes in group A = P(X ≤ 3)

h. Give an expression for P(X = x successes in group A).
P(X = x) = C(11, x) × C(13, 12 - x) / C(24, 12)

i. How does this expression change if there are a total of M successes in the study?
P(X = x) = C(M, x) × C(24 - M, 12 - x) / C(24, 12)

j. How does this expression change if there are a total of N experimental units in the study?
P(X = x) = C(M, x) × C(N - M, 12 - x) / C(N, 12)

Hypergeometric Probabilities

P(X = x) = (number of ways to select x successes and n - x failures) / (number of ways to select n objects from N)
         = C(M, x) × C(N - M, n - x) / C(N, n)

Notes: N = population size; n = sample size; all probabilities are between 0 and 1; the sum of all probabilities = 1.

k. Use the hypergeometric formula to calculate:
P(X = 2) = C(11, 2) × C(13, 10) / C(24, 12) = 55 × 286 / 2704156 = .00581
P(X = 1) = C(11, 1) × C(13, 11) / C(24, 12) = 11 × 78 / 2704156 = .00031
P(X = 0) = C(11, 0) × C(13, 12) / C(24, 12) = 1 × 13 / 2704156 = .00000481

l. To determine the probability of 3 or fewer successes in group A, add these four probabilities together. This is the p-value. How does this p-value compare to the approximate p-value from your simulation?
p-value = .049766

Fisher's Exact Test

Let N = number of items in the population
    M = number of successes in the population
    n = number of items assigned to group A
    x = number of successes in group A

P(X = x) = C(M, x) × C(N - M, n - x) / C(N, n)

Note: the p-value for this study can be set up in any of these equivalent ways:
p-value = P(X ≤ 3) with N = 24, M = 11, n = 12
p-value = P(X ≥ 9) with N = 24, M = 13, n = 12
p-value = P(X ≥ 8) with N = 24, M = 11, n = 12
p-value = P(X ≤ 4) with N = 24, M = 13, n = 12

Using Minitab to Calculate Hypergeometric Probabilities

[Screenshots: Minitab Calc > Probability Distributions > Hypergeometric menu and dialog, with fields for input column, input constant, and optional storage.]

x    Prob
0    0.0000048
1    0.0003173
2    0.0058170
3    0.0436273

MTB > cdf 3;
SUBC> Hypergeometric 24 11 12.

x    P(X <= x)
3    0.0497664

To graph the distribution, put the x values in c1 and the probabilities in c2, then plot:

MTB > set c1
DATA> 0:11
DATA> end
MTB > plot c2*c1;
SUBC> project.

[Scatterplot of Prob vs x]

INVESTIGATION 1.4.2: Have a Nice Trip

Discussion from your text:

The primary goal of randomization is to create groups that have similar characteristics (distributions) prior to imposing the explanatory variable. Thus randomization is meant to equalize the effects of any potential confounding variables between the groups, creating groups that overall differ only on the explanatory variable imposed. Note that the "balancing out" applies well to variables that can be observed (such as gender and height), to variables that may not have been recorded (such as age or walking speed), and to variables that cannot be observed (such as a hidden gene). Although you can force variables like gender or height to be equally distributed between the control and treatment groups, the virtue of randomization is that it balances out variables that we might not have thought of ahead of time, or variables that we might not be able to see or control.

Thus, if we observe a significant difference in the response variable between two groups at the end of a study, we feel comfortable attributing this difference to the explanatory variable, as that should have been the only difference between the two groups. This allows us to draw a CAUSE-AND-EFFECT conclusion between the explanatory and response variables. We are not able to draw such conclusions from observational studies, because they always have the potential for confounding variables that we are not able to control.

Note: Randomization doesn't always mean that we create perfectly identical groups. Recall our class and applet simulations on Friday: there were times when the difference in proportions of males was so large that all the males ended up in one group or the other.
c. Proportion of successes in each group: A: 3/12 = .25; B: 8/12 = .667. Odds ratio (B relative to A): (8/4)/(3/9) = 6.

[Dotplot of class results: number of successes (red = failure, black = success).]
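The hypergeometric calculations used above (the class-notes example with N = 24, M = 11, n = 12, the joke-card table in question 8, and question 9a) can be checked outside Minitab. The following is a minimal sketch in Python, assuming only the standard library; the function names hypergeom_pmf, hypergeom_cdf, and hypergeom_sf are illustrative choices, not anything from the course materials. Exact fractions are used so that the four equivalent set-ups of Fisher's exact test can be compared without rounding error.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x: int, N: int, M: int, n: int) -> Fraction:
    """P(X = x): x successes among n units drawn from N units, M of which are successes."""
    # Values outside the possible range have probability zero.
    if x < max(0, n - (N - M)) or x > min(n, M):
        return Fraction(0)
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))


def hypergeom_cdf(x: int, N: int, M: int, n: int) -> Fraction:
    """P(X <= x), obtained by summing the point probabilities from 0 up to x."""
    return sum((hypergeom_pmf(k, N, M, n) for k in range(0, x + 1)), Fraction(0))


def hypergeom_sf(x: int, N: int, M: int, n: int) -> Fraction:
    """P(X >= x), the upper-tail probability."""
    return 1 - hypergeom_cdf(x - 1, N, M, n)


# Class-notes example: 24 subjects, 11 successes, 12 assigned to group A,
# and 3 successes observed in group A.
p_notes = hypergeom_cdf(3, 24, 11, 12)
print("Notes example, P(X <= 3):", float(p_notes))

# Question 8 (joke card): four equivalent ways to set up the same p-value.
p_joke = hypergeom_cdf(16, 137, 46, 65)
equivalent = [
    hypergeom_sf(30, 137, 46, 72),
    hypergeom_cdf(42, 137, 91, 72),
    hypergeom_sf(49, 137, 91, 65),
]
print("Joke card, P(X <= 16):", float(p_joke))
print("All four set-ups agree:", all(p == p_joke for p in equivalent))

# Question 9a: 8 subjects (4 men, 4 women), 2 of each gender in each group.
p_balanced = hypergeom_pmf(2, 8, 4, 4)
print("Question 9a, P(X = 2):", float(p_balanced))

# Question 9b/9c: P(X = N) for 4N subjects, N = 1..10.
for size in range(1, 11):
    print(size, float(hypergeom_pmf(size, 4 * size, 2 * size, 2 * size)))
```

The last loop prints the values that question 9c asks to be graphed, so the decreasing pattern can be seen directly in the output.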

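The odds ratio in question 1c and the Simpson's paradox comparison in question 4 are both short arithmetic checks. The sketch below, again assuming only the Python standard library, reproduces them from the counts given in the exam; the helper names odds_ratio, rate, and overall are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def odds_ratio(a: int, b: int, c: int, d: int) -> Fraction:
    """Odds ratio for the 2x2 table [[a, b], [c, d]]: (a/c) / (b/d) = (a*d) / (b*c)."""
    return Fraction(a * d, b * c)


def rate(made: int, attempts: int) -> Fraction:
    """Success proportion for one venue."""
    return Fraction(made, attempts)


def overall(player: dict) -> Fraction:
    """Success proportion with all venues pooled together."""
    made = sum(m for m, _ in player.values())
    attempts = sum(t for _, t in player.values())
    return Fraction(made, attempts)


# Question 1c: rows are crash / control, columns are <= 5 hours / > 5 hours of sleep.
sleep_or = odds_ratio(65, 464, 30, 554)
print("Odds ratio:", round(float(sleep_or), 2))

# Question 4: (free throws made, free throws attempted) at each venue.
alice = {"home": (40, 50), "away": (2, 10)}
bree = {"home": (9, 10), "away": (25, 50)}

for venue in ("home", "away"):
    print(venue, "Alice:", float(rate(*alice[venue])), "Bree:", float(rate(*bree[venue])))
print("overall", "Alice:", float(overall(alice)), "Bree:", float(overall(bree)))
```

Pooling reverses the comparison because each player's overall proportion is a weighted average of her two venue proportions, and the weights (50 of 60 attempts at home for Alice, 50 of 60 away for Bree) differ sharply between the players.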
