ProbStats MATH 1530
This 37 page Class Notes was uploaded by Ms. Ismael Spinka on Sunday October 11, 2015. The Class Notes belongs to MATH 1530 at East Tennessee State University taught by Staff in Fall. Since its upload, it has received 53 views. For similar materials see /class/221404/math-1530-east-tennessee-state-university in Mathematics (M) at East Tennessee State University.
Date Created: 10/11/15
Introducing probability
BPS chapter 10
2006 W. H. Freeman and Company

Objectives (BPS Chapter 10)
Introducing probability
- The idea of probability
- Probability models
- Probability rules
- Discrete sample space
- Continuous sample space
- Random variables
- Personal probability

Randomness and probability
A phenomenon is random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions.
The probability of any outcome of a random phenomenon can be defined as the proportion of times the outcome would occur in a very long series of repetitions.

Coin toss
The result of any single coin toss is random. But the result over many tosses is predictable, as long as the trials are independent (i.e., the outcome of a new coin toss is not influenced by the result of the previous toss).
The probability of heads is 0.5 = the proportion of times you get heads in many repeated trials.
[Figure: proportion of heads (0.0 to 1.0) against number of tosses (5 to 5000), for a first series and a second series of tosses]

Two events are independent if the probability that one event occurs on any given trial of an experiment is not affected or changed by the occurrence of the other event.

When are trials not independent?
Imagine that these coins were spread out so that half were heads up and half were tails up. Close your eyes and pick one. The probability of it being heads is 0.5. However, if you don't put it back in the pile, the probability of picking up another coin and having it be heads is now less than 0.5.
The trials are independent only when you put the coin back each time. It is called sampling with replacement.

Probability models
Probability models mathematically describe the outcome of random processes. They consist of two parts:
1. S = Sample Space: This is a set, or list, of all possible outcomes of a random process. An event is a subset of the sample space.
2. A probability for each possible event in the
sample space S.

Example: Probability model for a coin toss
$S = \{\text{Head}, \text{Tail}\}$
Probability of heads = 0.5
Probability of tails = 0.5

A. A basketball player shoots three free throws. What are the possible sequences of hits (H) and misses (M)?
$S = \{\text{HHH}, \text{HHM}, \text{HMH}, \text{HMM}, \text{MHH}, \text{MHM}, \text{MMH}, \text{MMM}\}$
Note: 8 elements, $2^3$.
B. A basketball player shoots three free throws. What is the number of baskets made?
$S = \{0, 1, 2, 3\}$
C. A nutrition researcher feeds a new diet to a young male white rat. What are the possible outcomes of weight gain (in grams)?
$S = [0, \infty)$ = (all numbers $\geq 0$)

Probability rules
Coin toss example: $S = \{\text{Head}, \text{Tail}\}$, probability of heads = 0.5, probability of tails = 0.5.
1. Probabilities range from 0 (no chance of the event) to 1 (the event has to happen). For any event A, $0 \leq P(A) \leq 1$.
2. The probability of the complete sample space must equal 1: $P(\text{sample space}) = 1$.
3. The probability of an event not occurring is 1 minus the probability that it does occur: $P(A) = 1 - P(\text{not } A)$.

Probability rules (cont'd)
4. Two events A and B are disjoint if they have no outcomes in common and can never happen together. The probability that A or B occurs is the sum of their individual probabilities:
$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$
This is the addition rule for disjoint events.
[Figure: Venn diagrams labeled "A and B disjoint" and "A and B not disjoint"]

Example: If you flip two coins and the first flip does not affect the second flip, $S = \{\text{HH}, \text{HT}, \text{TH}, \text{TT}\}$. The probability of each of these events is 1/4, or 0.25.
The probability that you obtain "only heads or only tails" is:
$P(\text{HH or TT}) = P(\text{HH}) + P(\text{TT}) = 0.25 + 0.25 = 0.50$

Discrete sample space
Discrete sample spaces deal with data that can take on only certain values. These values are often integers or whole numbers. Dice are good examples of finite sample spaces. Finite means that there is a limited number of outcomes.
Note: Discrete data contrast with continuous data, which can take on any one of an infinite number of possible values over an interval.
In some situations, we define an event as a combination of outcomes. In that case, the probabilities need to be calculated from our knowledge of the
probabilities of the simpler events.

Example: You toss two dice. What is the probability of the outcomes summing to five?
[Figure: the 36 possible outcomes of tossing two dice]
There are 36 possible outcomes in S, all equally likely (given fair dice). Thus, the probability of any one of them is 1/36.
$P(\text{the roll of two dice sums to 5}) = P(1,4) + P(2,3) + P(3,2) + P(4,1) = 4 \times 1/36 = 1/9 \approx 0.111$

The gambling industry relies on probability distributions to calculate the odds of winning. The rewards are then fixed precisely so that, on average, players lose and the house wins.
The industry is very tough on so-called "cheaters" because their probability to win exceeds that of the house. Remember that it is a business, and therefore it has to be profitable.

Give the sample space and probabilities of each event in the following cases:

A couple wants three children. What are the arrangements of boys (B) and girls (G)? Genetics tells us that the probability that a baby is a boy or a girl is the same, 0.5.
- Sample space: $\{\text{BBB}, \text{BBG}, \text{BGB}, \text{GBB}, \text{GGB}, \text{GBG}, \text{BGG}, \text{GGG}\}$
- All eight outcomes in the sample space are equally likely.
- The probability of each is thus 1/8.

A couple wants three children. What are the numbers of girls (X) they could have? The same genetic laws apply. We can use the probabilities above to calculate the probability for each possible number of girls.
- Sample space: $\{0, 1, 2, 3\}$
- $P(X = 0) = P(\text{BBB}) = 1/8$
- $P(X = 1) = P(\text{BBG or BGB or GBB}) = P(\text{BBG}) + P(\text{BGB}) + P(\text{GBB}) = 3/8$

Value of X     0     1     2     3
Probability    1/8   3/8   3/8   1/8

Continuous sample space
Continuous sample spaces contain an infinite number of events. They typically are intervals of possible, continuously distributed outcomes.
Example: There is an infinity of numbers between 0 and 1 (e.g., 0.001, 0.4, 0.0063876).
S = {interval containing all numbers between 0 and 1}

How do we assign probabilities to events in an infinite sample space?
We use density curves and compute probabilities for intervals.
[Figure: uniform density curve on the interval 0 to 1]
This is a uniform density curve. There are a lot of other types of density curves.
The probability of the uniformly distributed variable
Y to be within 0.3 and 0.7 is the area under the density curve corresponding to that interval. Thus:
$P(0.3 \leq y \leq 0.7) = (0.7 - 0.3) \times 1 = 0.4$
[Figure: uniform density curve with height = 1; the area between 0.3 and 0.7 is shaded]

Probability distribution for a continuous random variable
The shaded area under the density curve shows the proportion, or percent, of individuals in the population with values of X between $x_1$ and $x_2$.
Because the probability of drawing one individual at random depends on the frequency of this type of individual in the population, the probability is also the shaded area under the curve.
[Figure: density curve over values of X; the area for individuals with $x_1 < X < x_2$ is shaded]

Intervals
The probability of a single event is meaningless for a continuous sample space. Only intervals can have a non-zero probability, represented by the area under the density curve for that interval.
The probability of a single event is zero:
$P(y = 1) = (1 - 1) \times 1 = 0$
The probability of an interval is the same whether boundary values are included or excluded:
$P(0 \leq y \leq 0.5) = (0.5 - 0) \times 1 = 0.5$
$P(0 < y < 0.5) = (0.5 - 0) \times 1 = 0.5$
$P(0 \leq y < 0.5) = (0.5 - 0) \times 1 = 0.5$
$P(y < 0.5 \text{ or } y > 0.8) = P(y < 0.5) + P(y > 0.8) = 1 - P(0.5 < y < 0.8) = 0.7$

We generate two random numbers between 0 and 1 and take Y to be their sum. Y can take any value between 0 and 2. The density curve for Y is a triangle:
[Figure: triangular density curve for Y on 0 to 2]
Height = 1. We know this because the base = 2 and the area under the curve has to equal 1 by definition. The area of a triangle is 1/2 (base × height).
What is the probability that Y is < 1? What is the probability that Y < 0.5?
By symmetry, $P(Y < 1) = 0.5$. The region left of 0.5 is a triangle with base 0.5 and height 0.5, so $P(Y < 0.5) = 1/2 \times 0.5 \times 0.5 = 0.125$.

Normal probability distribution
A variable whose value is a number resulting from a random process is a random variable.
The probability distribution of many random variables is the normal distribution. It shows what values the random variable can take and is used to assign probabilities to those values.
[Figure: normal density curve for height in inches, with axis marks at 57, 59.5, 62, 64.5, 67, 69.5, 72]
To calculate probabilities with the normal distribution, we will standardize the random variable (z-score) and use Table A.

Reminder:
standardizing $N(\mu, \sigma)$
We standardize normal data by calculating z-scores so that any Normal curve $N(\mu, \sigma)$ can be transformed into the standard Normal curve $N(0, 1)$.
$z = (x - \mu) / \sigma$
[Figure: N(64.5, 2.5) curve for height in inches (57 to 72) beside the standard Normal curve N(0, 1) for standardized height (z from -3 to 3, no units), marking 68%, 95%, and 99.7% of data]

Previously, we wanted to calculate the proportion of individuals in the population with a given characteristic.
Distribution of women's heights: $N(\mu, \sigma) = N(64.5, 2.5)$
Example: What's the proportion of women with a height between 57" and 72"?
That's within ± 3 standard deviations $\sigma$ of the mean $\mu$, thus that proportion is roughly 99.7%.
Since about 99.7% of all women have heights between 57" and 72", the chance of picking one woman at random with a height in that range is also about 99.7%.

What is the probability, if we pick one woman at random, that her height will be some value X? For instance, between 68" and 70": $P(68 < X < 70)$?
Because the woman is selected at random, X is a random variable.
As before, we calculate the z-scores for 68 and 70 using $z = (x - \mu) / \sigma$:
For x = 68", $z = (68 - 64.5) / 2.5 = 1.4$
For x = 70", $z = (70 - 64.5) / 2.5 = 2.2$
[Figure: N(64.5, 2.5) and standard normal curves with the interval shaded]
The area under the curve for the interval 68" to 70" is 0.9861 - 0.9192 = 0.0669.
Thus, the probability that a randomly chosen woman falls into this range is 6.69%:
$P(68 < X < 70) = 6.69\%$

Inverse problem
Your favorite chocolate bar is dark chocolate with whole hazelnuts. The weight on the wrapping indicates 8 oz. Whole hazelnuts vary in weight, so how can they guarantee you 8 oz of your favorite treat? You are a bit skeptical.
To avoid customer complaints and lawsuits, the manufacturer makes sure that 98% of all chocolate bars weigh 8 oz or more.
The manufacturing process is roughly normal and has a known variability $\sigma = 0.2$ oz.
How should they calibrate the machines to produce bars with a mean $\mu$ such that
$P(x < 8 \text{ oz}) = 2\%$?
Here we know the area under the density curve (2% = 0.02) and we know x (8 oz). We want $\mu$.
In Table A we find that the z for a left area of 0.02 is roughly $z = -2.05$.
$z = (x - \mu) / \sigma \iff \mu = x - z\sigma$
$\mu = 8 - (-2.05 \times 0.2) = 8.41$ oz
Thus, your favorite chocolate bar weighs, on average, 8.41 oz. Excellent!

Meaning of a probability
We have several ways of defining a probability, and this has consequences on its intuitive meaning.

Theoretical probability: from understanding the phenomenon and symmetries in the problem.
- Example: Six-sided fair die. Each side has the same chance of turning up; therefore, each has a probability 1/6.
- Example: Genetic laws of inheritance based on the meiosis process.

Empirical probability: from our knowledge of numerous similar past events.
- Mendel discovered the probabilities of inheritance of a given trait from experiments on peas, without knowing about genes or DNA.
- Example: Predicting the weather. A 30% chance of rain today means that it rained on 30% of all days with similar atmospheric conditions.

Personal probability: from subjective considerations, typically about unique events.
- Example: Probability of a large meteorite hitting the Earth. Probability of life on Mars. These do not make sense in terms of frequency.

A personal probability represents an individual's personal degree of belief based on prior knowledge. It is also called Bayesian probability, for the mathematician who developed the concept.
We may say "there is a 40% chance of life on Mars." In fact, either there is or there isn't life on Mars. The 40% probability is our degree of belief: how confident we are about the presence of life on Mars, based on what we know about life requirements, pictures of Mars, and probes we sent.
Our brains effortlessly calculate risks (probabilities) of all sorts, and businesses try to formalize this process for decision-making.

Math 1530 Capstone Project (100 points)
Print Name:
1. The project is due by 11:15 am on Thursday, December 9, 2010. No late projects will be accepted.
2. You should use a
word processor to write up your results.
3. Start each problem on a new page.
4. I would also ask that you insert any graphs in the appropriate places, not attached as an addendum at the back or even at the end of the problem.
5. Only insert the relevant portions of a Minitab display used to answer a question, not everything Minitab gives you in the hope that the right information is somewhere in what you copied into the document.
6. DO YOUR OWN WORK.

Use the following data file httpZZmathetsuedu1530surveyFaIIOZg2193th to answer the first four questions. The Minitab worksheet is set up as follows:
C1: Gender
C2: Tattoo
C3: Height
C4: Dvds
C5: Drink (What do you typically drink with dinner? Coded as 1 = Water, 2 = Soda, 3 = Milk, 4 = Alcohol, 5 = Juice, 6 = Other)

Variable type. Which of these questions from the class survey produced variables that are categorical and which are quantitative? Circle your answer.
a. Gender: Categorical / Quantitative / Neither (1)
b. Tattoo: Categorical / Quantitative / Neither (1)
c. Height: Categorical / Quantitative / Neither (1)
d. What do you typically drink with dinner?: Categorical / Quantitative / Neither (1)

Note: A categorical variable places an individual into one of several groups or categories. A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense. The values for tobacco use are 0 or 1, but these numbers are just labels for the categories and have no units of measurement. If the variable is categorical, we are usually interested in the proportion or percent of individuals that fall into the subcategories. In other words, we may be interested in the question "What percent of the students have tattoos?"

Shapes & Statistics. For each variable stated below, use appropriate graphs and summary statistics to describe the distributions, including any outliers. Show the Minitab output that justifies your descriptions.

a. Height. The distribution of the height of the students shows possibly two peaks, which seems reasonable since both male and female heights are
combined. The distribution is somewhat symmetric with mean 67.672 inches and standard deviation 4.480 inches. (1) There are some outliers, which can be seen on the boxplot. (1)
Note: I would not consider the outliers to be "strong". I found little change in the mean and standard deviation after removing the five outliers, i.e., mean = 67.708, standard deviation = 4.379. (1)

Descriptive Statistics: Height
Variable  N     N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3      Maximum
Height    1306  0   67.672  0.124    4.480  50.000   64.000  67.000  71.000  84.000

[Figure: Histogram of Height (frequency against height, 50 to 80)]

Note: Minitab displays an outlier on a boxplot with the symbol *. A boxplot is a graph of the five-number summary: Minimum, Q1 (first quartile), M (median), Q3 (third quartile), Maximum. The graph below is called a modified boxplot. When there are outliers, Minitab will extend the whisker (the vertical line drawn from the box) out to the upper limit Q3 + 1.5(Q3 - Q1) or to the lower limit Q1 - 1.5(Q3 - Q1).

[Figure: Boxplot of Height]

Note: Probably a better plan for describing this distribution is to separate heights by gender, since we have this information. We see that both distributions are symmetric with some possible outliers. In Chapter 3 of Essential Statistics, there are examples and exercises about men's and women's heights. We are told that heights of women aged 20 to 29 are approximately Normal with mean 64 inches and standard deviation 2.7 inches. For men aged 20 to 29, heights are approximately Normal with mean 69.3 inches and standard deviation 2.8 inches. We see that our findings differ slightly from the information given in our textbook. The outliers in both of these distributions will have some effect on the means and standard deviations. The outliers have less influence on the median and quartiles.

Descriptive Statistics: Height
Variable  Gender  Mean    SE Mean  StDev  Minimum  Q1      Median  Maximum
Height    FEMALE  64.915  0.112    3.084  50.000   63.000  65.000  79.000
          MALE    71.559  0.130    3.021  59.000   70.000  72.000  84.000

Histogram of
Height by Gender [figure: frequency against height, 50 to 80; panel variable: Gender]

b. Dvds. The distribution has a single peak (unimodal) at the left. The distribution is strongly right-skewed with many outliers. (1) The five-number summary is probably the best way to describe this distribution numerically. The five-number summary for the number of Dvds owned is: Minimum = 0, Q1 = 10, Median = 26.5, Q3 = 54, Maximum = 2000. (1) (1)

Descriptive Statistics: Dvds
Variable  N     N*  Mean   SE Mean  StDev   Minimum  Q1     Median  Q3     Maximum
Dvds      1306  0   59.93  3.41     123.40  0.00     10.00  26.50   54.00  2000.00

Note: Interpretation of Q1: 25% of the students have 10 Dvds or fewer. Interpretation of Q3: 75% of the students have 54 Dvds or fewer. The interquartile range (IQR) is the distance between the first and third quartiles: IQR = Q3 - Q1 = 54 - 10 = 44. The interquartile range is mainly used as a basis for a rule of thumb for identifying suspected outliers. We call an observation a suspected outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.

[Figure: Histogram of Dvds (frequency against Dvds, 0 to 1800)]
[Figure: Boxplot of Dvds (0 to 2000)]

Tattoos. Question 2 on the class survey asked "Do you have any tattoos?" Examine the relationship between gender and tattoos. Does there appear to be any difference (small, large) between the sexes as far as having a tattoo or not? Include in your analysis tables and graphs to support your conclusions.
There appears to be a very small difference between the sexes as far as having a tattoo. The percent of females that have a tattoo is 25.52%. The percent of males that have a tattoo is 24.72%. (1) (1)

[Figure: Chart of Gender, Tattoo (percent within levels of Gender; Tattoo NO/YES for FEMALE and MALE)]

Tabulated statistics: Gender, Tattoo
Rows: Gender   Columns: Tattoo
          NO      YES     All
FEMALE    569     195     764
          74.48   25.52   100.00
MALE      408     134     542
          75.28   24.72   100.00
All       977     329     1306
          74.81   25.19   100.00
Cell Contents: Count, %
of Row

Note: Is there a relationship between gender and having a tattoo? To answer this question, we would have to conduct the chi-square test from Chapter 21. The chi-square test produced the following results: Pearson Chi-Square = 0.108, DF = 1, P-Value = 0.743. Hence, there appears to be no significant relationship between gender and having a tattoo.

Use the data file httQZmathetsuedu15305urveyFalllOZsleethw to answer the next question. The worksheet is set up as follows:
C1: ToSleep (What time did you go to bed last night? Coded as -2 for 10 pm, 0 for midnight, etc.)
C2: WakeUp (What time did you get up this morning?)

[...] sleep? Analyze.
Hint: Consider starting with a scatterplot. [...] Calc > Calculator and fill in the dialog boxes as shown below.
[Screenshot: Minitab Calculator dialog storing the result in a new variable, Number of hours slept, with an expression involving WakeUp]
[...] > Regression > [...] and the explanatory variable be ToSleep.

Fitted Line Plot
Number of hours slept = 7.657 - 0.5732 ToSleep
S = 1.50424   R-Sq = 27.3%   R-Sq(adj) = 27.3%
[Figure: fitted line plot of Number of hours slept against ToSleep]

The least-squares regression equation is Number of hours slept = 7.657 - 0.5732 ToSleep. (1)
There appears to be a couple of outliers, but overall the fit looks okay. We see that 27.3% (R-Sq) of the variation in the number of hours slept can be explained by the least-squares equation. So about 72.7% of the variation in the number of hours slept is not explained by the model. (1) The correlation coefficient r = -0.522 indicates a moderate negative linear relationship. (1) For every hour later a student went to bed, he/she had about 34 minutes (0.5732 hours) less sleep on average.

Some observations: If a student went to bed at midnight (ToSleep = 0), then he/she got nearly 7 hours and 40 minutes (7.657 hours) of sleep on average. If a student went to bed at 11 pm (ToSleep = -1), then he/she got about 8 hours and 14 minutes of sleep: 7.657 - 0.5732(-1) = 8.2302.

Use the data file httpgmathetsuedu1530surveyFa10TextDrive Tobaccomtw to answer the next two questions.

5. Text while driving
and tobacco use. Choose a student at random from this group. Insert the two-way table here to answer the following questions. From the menu, select STAT [...] (1)

Rows: TextDrive   Columns: Tobacco
        NO       YES      All
NO      362      122      484
        74.79    25.21    100.00
        43.30    30.65    39.22
        29.34    9.89     39.22
YES     474      276      750
        63.20    36.80    100.00
        56.70    69.35    60.78
        38.41    22.37    60.78
All     836      398      1234
        67.75    32.25    100.00
        100.00   100.00   100.00
        67.75    32.25    100.00
Cell Contents: Count, % of Row, % of Column, % of Total

a. The probability that the student has used tobacco products in the past 30 days is 398/1234 = 0.3225. (1)
b. The conditional probability that the student has texted while driving, given that the student has used tobacco products in the past 30 days, is about 276/398 = 0.6935. (1)
c. The conditional probability that the student has used tobacco products in the past 30 days, given that the student has texted while driving, is about 276/750 = 0.3680. (1)
d. Based on our class survey, is it possible to find the percentage of students that text and use tobacco products simultaneously while driving? NO. If so, what is it? This question was not directly asked on our survey. (1)

6. Text while driving. A survey in 2008 found that nearly half of all drivers aged 18-24 send text messages while they drive. Question 12 of our class survey asked "Do you text while driving?" Suppose we are interested in the proportion of all college students that text while driving a car.
a. Does our class survey give evidence to conclude that more than half of all college students text while driving? The hypotheses for a test to answer this question are H0: p = 0.5 vs. Ha: p > 0.5. (1) Here is the relevant output. (1)

Test and CI for One Proportion: TextDrive
Test of p = 0.5 vs p > 0.5
Event = YES
Variable   X    N     Sample p  95% Lower Bound  Z-Value  P-Value
TextDrive  750  1234  0.607780  0.584918         7.57     0.000

b. What proportion of the sample survey responded "Yes" to the question? 0.60778 (1)
c. Find the P-value for the test in part a. P-value = 0.000 (1) (1)

Test and CI for One Proportion: TextDrive
Event = YES
Variable   X    N     Sample p  95% CI
TextDrive  750  1234  0.607780  (0.580538,
0.635021)
Using the normal approximation.

d. Find a 95% confidence interval for the proportion of students who have texted while driving. (0.580538, 0.635021). Anywhere from about 58% to 64% of all students have texted while driving.
e. Interpret your results relative to the survey conducted in 2008 and comment on any assumptions that are needed for your conclusions to be accurate. There is evidence to suggest that more than half of all college students text while driving. (1) The main concern here is the lack of a random sample. Does this sample truly represent all college students? If we can't argue that this sample represents all college students, then the above conclusions are invalid. (1)

Use the data file httpmathetsuedu1530surveyFalllOZtext messagesmtw to answer the following question.

7. Number of text messages. Is there good evidence that all female college students and all male college students differ in their mean number of text messages sent per month? Give a 95% confidence interval and interpret your results. What assumptions are we making about the samples for our interpretation to be valid?
This is a 2-sample t problem. A 95% confidence interval for mu (Female) - mu (Male), the difference in the true mean number of text messages sent per month, is anywhere from -20.7 to 330.1. (1) Since 0 is between these two numbers, there does not appear to be good evidence to suggest that all female college students and all male college students differ in their mean number of text messages sent per month. (1) The students that responded to this question are not a random sample of all college students. Hence, the t interval may be meaningless. We may be able to argue that these samples represent a random sample from ETSU, since nearly all students need to take Math 1530. (1) (1)

Two-Sample T-Test and CI: TextMessages, Gender
Two-sample T for TextMessages
Gender  N    Mean  StDev  SE Mean
Female  629  1669  1437   57
Male    443  1514  1444   69
Difference = mu (Female) - mu (Male)
Estimate for difference: 154.7
95% CI for difference: (-20.7, 330.1)
T-Test of difference = 0 (vs
not =): T-Value = 1.73, P-Value = 0.084, DF = 948

The histograms below are very skewed, and the distribution of the number of text messages sent by the males has numerous outliers. The t procedure works fairly well for skewed distributions when the sample sizes are large. We see that both sample sizes are large here, so the condition of Normality is not a major concern. (1)

[Figure: Histogram of TextMessages (frequency against TextMessages, 0 to 4800; panel variable: Gender, Female and Male)]

Use the data file httpzmathetsuedu1530surveyFaII10dinner drinkmtw to answer the following question. (1)

8. Drink please. Question 11 from the class survey asked "What do you typically drink with dinner?" The data has been coded as follows: 1 = Water, 2 = Soda, 3 = Milk, 4 = Alcohol, 5 = Juice, 6 = Other. The null hypothesis is that all drinks are equally probable. Do the data give significant evidence that dinner drinks are not equally probable? Analyze. (Minitab 16 must be used to solve this problem.)
Yes. The null hypothesis H0: p1 = p2 = p3 = p4 = p5 = p6 (all drinks are equally probable during dinner) can be rejected since the P-value = 0.000. (1) It appears that water is the favorite (618/1372) drink during dinner. (1)

Chi-Square Goodness-of-Fit Test for Categorical Variable: Drink
Category  Observed  Test Proportion  Expected  Contribution to Chi-Sq
1         618       0.166667         228.667   662.888
2         372       0.166667         228.667   89.845
3         93        0.166667         228.667   80.490
4         20        0.166667         228.667   190.416
5         124       0.166667         228.667   47.909
6         145       0.166667         228.667   30.613

N     N*  DF  Chi-Sq   P-Value
1372  0   5   1102.16  0.000
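
The goodness-of-fit statistic in the Minitab output above can be checked by hand. The following is a minimal Python sketch, not part of the original project (which requires Minitab 16). It assumes the observed counts from the output and the coding 1 = Water through 6 = Other; the names `observed` and `chi_square_gof` are made up for this illustration. It computes only the statistic and degrees of freedom. The P-value would still come from a chi-square table or software.

```python
# Observed counts for drink codes 1..6, copied from the Minitab output above.
# The labels follow the coding stated in question 8.
observed = {
    "Water": 618,
    "Soda": 372,
    "Milk": 93,
    "Alcohol": 20,
    "Juice": 124,
    "Other": 145,
}


def chi_square_gof(counts):
    """Chi-square goodness-of-fit against equal proportions.

    Returns (statistic, degrees of freedom, per-category contributions).
    """
    n = sum(counts.values())
    expected = n / len(counts)  # same expected count for every category under H0
    contributions = {
        name: (obs - expected) ** 2 / expected for name, obs in counts.items()
    }
    return sum(contributions.values()), len(counts) - 1, contributions


chi_sq, df, contributions = chi_square_gof(observed)
print(f"Chi-Sq = {chi_sq:.2f}, DF = {df}")
for name, value in contributions.items():
    print(f"{name:8s} {value:9.3f}")
```

If the counts were transcribed correctly, the printed statistic should agree with Minitab's Chi-Sq of 1102.16 on 5 degrees of freedom, with Water contributing by far the largest share.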