# INTR THY PROBAB&S I STAT 341

This 37-page set of class notes was uploaded by Giovani Ullrich PhD on Saturday, September 26, 2015. The notes belong to STAT 341 at Iowa State University, taught by Staff in Fall. Since upload, they have received 13 views. For similar materials, see /class/214406/stat-341-iowa-state-university in Statistics at Iowa State University.

Date Created: 09/26/15

## Geometric Distribution

A discrete random variable is said to have a geometric distribution if:

- There are independent and identical trials. Each trial can be thought of as a draw from a population where the drawing is done with replacement.
- Each trial has two possible outcomes, success and failure. The probability of success on each trial is the same, $p$. Therefore, the probability of failure on each trial is $1 - p = q$.
- The experiment is repeated until the first success occurs.
- The random variable $Y$ is defined as the number of the trial on which the first success occurs.
- The parameter for the geometric random variable $Y$ is the probability of success on each trial, $p$.
- The probability distribution function of the geometric random variable $Y$ is

  $$p(y) = (1 - p)^{y-1}\,p \quad \text{for } y = 1, 2, \ldots$$

- The theoretical mean of the geometric random variable $Y$ is $\mu = E(Y) = 1/p$.

### Working with geometric random variables in R

R's built-in functions for the geometric distribution define the random variable differently than your textbook: they count the number of failures before the first success rather than the trial on which the first success occurs. To avoid any confusion, I have written functions in R, `dgeo` and `rgeo`, which use the built-in functions to calculate probabilities and generate random values for the geometric random variable $Y$ as defined in your textbook. Before you use R for the geometric distribution, copy and paste the following functions into the command window:

    dgeo <- function(y, p) dgeom(y - 1, p)
    rgeo <- function(n, p) rgeom(n, p) + 1

To find a probability $P(Y = y) = p(y)$ for a single value $y$, the command in R is

    dgeo(y, p)

To find the probability $P(Y \le y)$, use the `sum` command to add up all $p(y)$ values for $y$ between and including 1 and $y$:

    sum(dgeo(1:y, p))

To find the probability $P(y_1 \le Y \le y_2)$, use the `sum` command to add up all $p(y)$ values for $y$ between and including $y_1$ and $y_2$:

    sum(dgeo(y1:y2, p))

To find the probability $P(Y \ge y) = 1 - P(Y < y) = 1 - P(Y \le y - 1)$, use the `sum` command to find $P(Y \le y - 1)$ and subtract this value from 1:

    1 - sum(dgeo(1:(y - 1), p))

### Problems

1. An oil prospector will drill a succession of holes in a given area to find a productive well. The probability that he is successful
on a given trial is 0.2.
   - (a) What is the probability that the third hole drilled is the first that yields a productive well?
   - (b) If the prospector can only afford to drill at most ten wells, what is the probability that he fails to find a productive well?
   - (c) How many wells would the prospector expect to drill before he finds a productive well?
2. In the game of craps, two different dice are rolled and the sum of the 2 dice is determined.
   - (a) What is the probability that the first seven will occur on the 5th roll?
   - (b) What is the probability that the first seven will occur somewhere in the first 5 rolls?
   - (c) How many rolls would you expect to make to obtain the first seven?

## Negative Binomial Distribution

A discrete random variable $Y$ is said to have a negative binomial distribution if:

- There are independent and identical trials. Each trial can be thought of as a draw from a population where the draw is done with replacement.
- Each trial has two possible outcomes, success and failure. The probability of success on each trial is the same, $p$. Therefore, the probability of failure on each trial is $1 - p = q$.
- The experiment is repeated until the $r$th success occurs.
- The random variable $Y$ is defined as the number of the trial on which the $r$th success occurs.
- The parameters for the negative binomial random variable $Y$ are the probability of success on each trial, $p$, and the number of successes, $r$.
- The probability distribution function of the negative binomial random variable $Y$ is

  $$p(y) = \binom{y-1}{r-1} p^{r} (1 - p)^{y-r} \quad \text{for } y = r, r+1, r+2, \ldots$$

- The theoretical mean of the negative binomial random variable $Y$ is $\mu = E(Y) = r/p$.
- The theoretical variance of the negative binomial random variable $Y$ is $\sigma^2 = V(Y) = r(1 - p)/p^2$.

### Working with negative binomial random variables in R

The built-in function in R for the negative binomial distribution is different from the distribution described in your textbook. To eliminate any confusion, I have written a function in R called `dnbin` that matches the distribution in your textbook. Before you calculate any probabilities for the negative binomial
distribution, you will need to type in this function in R:

    dnbin <- function(y, r, p) choose(y - 1, r - 1) * p^r * (1 - p)^(y - r)

To find a probability $P(Y = y) = p(y)$ for a single value $y$, the command in R is

    dnbin(y, r, p)

To find the probability $P(Y \le y)$, use the `sum` command to add up all $p(y)$ values for $y$ between and including $r$ and $y$:

    sum(dnbin(r:y, r, p))

To find the probability $P(y_1 \le Y \le y_2)$, use the `sum` command to add up all $p(y)$ values for $y$ between and including $y_1$ and $y_2$:

    sum(dnbin(y1:y2, r, p))

To find the probability $P(Y \ge y) = 1 - P(Y < y) = 1 - P(Y \le y - 1)$, use the `sum` command to find $P(Y \le y - 1)$ and subtract this value from 1:

    1 - sum(dnbin(r:(y - 1), r, p))

### Problems

1. How is the probability distribution function $p(y)$ derived?
2. An oil prospector will drill a succession of holes in a given area to find a productive well. The probability that he is successful on a given trial is 0.2.
   - (a) What is the probability that the fifth hole drilled yields the second productive well?
   - (b) If the prospector can only afford to drill at most ten wells, what is the probability that the prospector will find two productive wells?
   - (c) Find the mean and variance of the number of wells that must be drilled if the prospector wants to establish three wells.
3. In the game of craps, two different dice are rolled and the sum of the 2 dice is determined.
   - (a) What is the probability that the second seven will occur on the 6th roll?
   - (b) What is the probability that the third seven will not occur in the first ten rolls?
   - (c) How many rolls would you expect to make to obtain the second seven?

## Gamma Distribution (Section 4.6)

The gamma distribution is used to model continuous data whose values are positive and whose probability histogram is skewed to the right. The properties of the gamma distribution are:

- The parameters for the gamma distribution are $\alpha > 0$ and $\beta > 0$. The $\alpha$ parameter is referred to as the shape parameter and the $\beta$ parameter is referred to as the scale parameter, for reasons we will be investigating.
- The probability density function for the gamma distribution is

  $$f(y) = \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \quad 0 \le y < \infty, \qquad \text{where } \Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1} e^{-y}\,dy.$$

- In most circumstances, the distribution function does not have a closed form solution. Therefore, the probabilities for the gamma distribution must be obtained from tables or from a statistical software package.
- The theoretical mean of the gamma distribution is $\mu = E(Y) = \alpha\beta$.
- The variance of the gamma distribution is $\sigma^2 = V(Y) = \alpha\beta^2$.

[Figure: probability density function and distribution function of a gamma distribution with $\alpha = 2$ and $\beta = 2$.]

### Working with the gamma distribution in R

To find the probability $P(Y \le y)$, the command in R is

    pgamma(y, shape = alpha, scale = beta)

To find the value of $y$ such that $P(Y \le y) = p$, the command in R is

    qgamma(p, shape = alpha, scale = beta)

To generate observations from a gamma distribution, the command in R is

    rgamma(numobs, shape = alpha, scale = beta)

where `numobs` is the number of observed values you would like to generate.

### Problems

1. Four-week summer rainfall totals in a section of the Midwest have approximately a gamma distribution with $\alpha = 1.6$ and $\beta = 2.0$. The unit of measurement is inches.
   - (a) Find the probability that the rainfall total for this particular four weeks in the summer will be greater than 5 inches.
   - (b) Find the mean amount of rainfall for this particular four weeks in the summer.
   - (c) Find the variance of the rainfall amounts for this particular four weeks in the summer.
2. Annual incomes for heads of household in a section of a city have approximately a gamma distribution with $\alpha = 1000$ and $\beta = 20$. The unit of measurement is dollars.
   - (a) Find the mean and variance of the incomes for heads of households in this section of the city.
   - (b) The median income is defined as the income where 50% make more and 50% make less. Find the median income.
   - (c) The first quartile income is defined as the income where 25% make less and 75% make more. Find the first quartile income.
   - (d) The third quartile income is defined as the income where 75% make less and 25% make more. Find the third
quartile income.
   - (e) Between what two incomes do the middle 50% of heads of households in this section of the city fall?

## Binomial Distributions

A discrete random variable is said to have a binomial distribution if:

- There are a fixed number, $n$, of independent and identical trials. This means that each trial can be thought of as a draw from a population where the drawing is done with replacement.
- Each of the $n$ trials has two possible outcomes, success and failure. The probability of success on each of the $n$ trials is the same, $p$. Therefore, the probability of failure on each of the $n$ trials is $1 - p = q$.
- The random variable $Y$ is defined as the total number of successes in the $n$ trials.
- The parameters for a binomial random variable $Y$ are the number of trials, $n$, and the probability of success on each trial, $p$.
- The probability distribution function of the binomial random variable $Y$ is

  $$p(y) = \binom{n}{y} p^{y} (1 - p)^{n-y} \quad \text{for } y = 0, 1, \ldots, n$$

- The theoretical mean of the binomial random variable $Y$ is $\mu = E(Y) = np$.
- The theoretical variance of the binomial random variable $Y$ is $\sigma^2 = V(Y) = np(1 - p) = npq$.

### Working with binomial random variables in R

#### Finding probabilities

To find a probability $P(Y = y) = p(y)$ for a single value $y$, the command in R is

    dbinom(y, n, p)

To find the probability $P(Y \le y)$, use the `sum` command to add up all $p(y)$ values for $y$ between and including 0 and $y$:

    sum(dbinom(0:y, n, p))

To find the probability $P(y_1 \le Y \le y_2)$, use the `sum` command to add up all $p(y)$ values for $y$ between and including $y_1$ and $y_2$:

    sum(dbinom(y1:y2, n, p))

To find the probability $P(Y \ge y)$, use the `sum` command to add up all $p(y)$ values for $y$ between and including $y$ and $n$:

    sum(dbinom(y:n, n, p))

Finally, to get a list of all $p(y)$ values for a particular binomial random variable $Y$, use the `dbinom` command to list all $p(y)$ for values of $y$ between and including 0 and $n$:

    dbinom(0:n, n, p)

#### Generating observed values

You can also use R to generate observed values from a binomial distribution with $n$ and $p$. Generating a large number of observed values is helpful in studying the characteristics of the distribution of the
random variable $Y$. The command is

    rbinom(numobs, n, p)

where `numobs` is the number of observed values you would like to generate.

### Problems

1. How is the probability distribution function $p(y)$ derived?
2. Show that $p(y)$ has the two properties of a probability distribution function.
3. Derive the formula for the expected number of successes in $n$ trials.
4. Use R to generate 10000 observations of a binomial random variable with $n = 20$ and $p = 0.25$. Make a histogram of your observations and determine the mean, variance, and five-number summary of your data. Use this information to describe your distribution.
5. Now use R to generate 10000 observations of a binomial random variable with $n = 20$ and $p = 0.75$. Make a histogram of your observations and determine the mean, variance, and five-number summary of your data. How does this distribution compare to the distribution from problem 4?
6. From previous experience, it is known that the number of free throws a basketball player makes in a game behaves like a binomial random variable. For a particular basketball player, his probability of making a single free throw is 0.85. In a particular game, this basketball player attempts 17 free throws.
   - (a) Find the probability this player will make all 17 free throws attempted.
   - (b) Find the probability this player will make 10 or fewer of the 17 free throws attempted.
   - (c) Find the mean number of free throws made in this game.
   - (d) Find the standard deviation of the number of free throws made.
   - (e) At what point would you conclude that this player is having an "off" night?
7. A collection of 5 radar sets, functioning independently, is used to detect the presence of aircraft in a given area. The probability that a radar set will detect a given aircraft is 0.95.
   - (a) Find the probability all 5 radar sets will detect an aircraft.
   - (b) Find the average number of radar sets that will detect an aircraft.
   - (c) Suppose instead of 5 radar sets we have 1 radar set. What does the probability that the radar set will detect a given aircraft have to be for the two systems to be
equivalent?
   - (d) Find the probability that two or fewer radar sets will detect the aircraft.

## Uniform Distribution (Section 4.4)

The uniform distribution is used to model continuous data when the probability histogram is approximately a horizontal line throughout the range of the data. The uniform distribution has the following properties:

- The parameters of the uniform distribution are the minimum value, $\theta_1$, and the maximum value, $\theta_2$. The standard uniform distribution has minimum $\theta_1 = 0$ and maximum $\theta_2 = 1$.
- The probability density function for the uniform distribution is

  $$f(y) = \frac{1}{\theta_2 - \theta_1}, \quad \theta_1 \le y \le \theta_2$$

- The distribution function for the uniform distribution is

  $$F(y) = P(Y \le y) = \begin{cases} 0, & y < \theta_1 \\ \dfrac{y - \theta_1}{\theta_2 - \theta_1}, & \theta_1 \le y \le \theta_2 \\ 1, & y > \theta_2 \end{cases}$$

- The theoretical mean of the uniform distribution is $\mu = E(Y) = \dfrac{\theta_1 + \theta_2}{2}$.
- The variance of the uniform distribution is $\sigma^2 = V(Y) = \dfrac{(\theta_2 - \theta_1)^2}{12}$.

[Figure: probability density function and distribution function of a uniform distribution with min = 0 and max = 1.]

### Working with uniform random variables in R

In the commands below, `a` and `b` are the minimum and maximum of the distribution. To find the probability $P(Y \le y)$, the command in R is

    punif(y, a, b)

To find the value of $y$ so that $P(Y \le y) = p$, the command in R is

    qunif(p, a, b)

To generate observed values from a uniform distribution, the command in R is

    runif(numobs, a, b)

where `numobs` is the number of observed values you would like to generate.

### Problems

1. Customers arrive randomly at a bank teller's window. Given that one customer arrived during a particular 10-minute period, let $Y$ equal the time within the 10 minutes that the customer arrived. If $Y$ follows a uniform distribution, find:
   - (a) The minimum and maximum values for $Y$.
   - (b) The probability density function for $Y$.
   - (c) $P(2 \le Y \le 8)$
   - (d) $P(Y \ge 8)$
   - (e) The mean and variance of $Y$.
2. Let $Y$ be a uniform random variable with minimum $\theta_1 = 0$ and maximum $\theta_2 = 1$. Define $W = a + (b - a)Y$, where $a < b$.
   - (a) Find the distribution function of $W$.
   - (b) What distribution does the random variable $W$ have?
3. Let $Y$ be a continuous random variable with the following probability
density function: $f(y) = 2y$, $0 \le y \le 1$. Use R to generate 10000 observations of the random variable $Y$.
4. Let $Y$ be a continuous random variable with the following probability density function: [density function missing in the source]. Use R to generate 10000 observations of the random variable $Y$.

## Estimating the Probability of the Birthday Problem Using R

Stat 341, Fall 2008

This is the R code that looks at the probability of the birthday problem: the probability of at least 2 people in a room of $n$ people sharing a birthday. In this handout we will look at the problem using $n = 26$, since this was the number of people in Stat 341 on the first day we began discussion of this problem.

To begin calculating an empirical, or observed, probability, we first need to set up the problem. There are 365 different possible birthdays (ignoring Feb 29th). In R, we will use the variable `days` to indicate the possible birthdays for each person:

    days <- c(1:365)

In a class of $n = 26$ people, getting a random birthday for each person is like sampling from the `days` variable with replacement. In R, this command is

    bdays26 <- sample(days, 26, replace = TRUE)

Here is an example of the variable `bdays26`:

    bdays26
    214 183  52 284  77 111 116  72  73 104 342 125 184
    133 240 160 321  31 263 135 149 144 284 259  66 143

Are any of these birthdays the same? Scanning through the list of 26 birthdays to pick out matching ones is a little difficult. Since there are only 365 possible values that can be in `bdays26`, if we use a histogram with breaks on these 365 values, we can look for birthday matches by looking at the counts for the histogram. To set this up in R, the commands are

    bdbreaks <- c(0:365) + 0.5
    hist(bdays26, breaks = bdbreaks)$counts

The counts for the histogram of the 365 days of the year for `bdays26` form a vector of 365 values, one per day: 340 zeros, a 1 for each of the 24 days that is a birthday for exactly one person, and a single 2 for day 284.

Scanning through this list, it is easy to see that each day of the year is a birthday for either 0, 1, or 2 people in the sample of 26 people. So in this sample, we have two people who share a birthday. What if no one shared a birthday? Then the largest count for the histogram of the 365 days of the year for `bdays26` would be 1. We can now use R to estimate the probability that the maximum count of this histogram will be two or more.

To estimate the probability of the birthday problem, we will need to use R to repeatedly select birthdays randomly for groups, or samples, of 26 people each. For each sample, we will look at whether or not two or more people in the sample share a birthday. The commands in R to do this loop 10000 times are

    maxcounts <- rep(0, 10000)
    for (i in 1:10000) {
      bdays26 <- sample(days, 26, replace = TRUE)
      bdcounts <- hist(bdays26, breaks = bdbreaks, plot = FALSE)$counts
      maxcounts[i] <- max(bdcounts)
    }

The first line initializes the variable `maxcounts` for use in the loop. The `for` loop then repeats the pattern discussed above and saves the maximum count of the histogram of the days of the year for the `bdays26` variable.

We are interested in any value of `maxcounts` that is 2 or larger. This indicates that at least 2 people share a birthday in the sample of 26 people in the room. One way to get this information from the variable in R is to add up all the times that `maxcounts` is greater than or equal to 2. The command is

    sum(ifelse(maxcounts >= 2, 1, 0))

You can also have R make a table of the `maxcounts` variable. This will give you the observed values of `maxcounts` and the number of times these values occurred in the 10000 samples. The command and its output are below.

    table(maxcounts)
    maxcounts
       1    2    3    4
    4040 5784  175    1

From the output of the `table` command, you can see that in 5960 out of the 10000 samples (5784 + 175 + 1), at least two people shared a birthday, for an empirical probability of 0.5960. This empirical probability is not exact, but thanks to the relative frequency idea of probability,
this empirical probability of 0.5960, based on 10000 samples, will be close to the theoretical probability value.

## Beta Distribution (Section 4.7)

The beta distribution is used to model continuous data with values between 0 and 1. Typically, beta distributions are used to model proportions. The characteristics of the beta distribution are:

- The parameters of the beta distribution are $\alpha > 0$ and $\beta > 0$. Unlike the gamma distribution, these parameters do not have special names. With different values of these parameters, the shape of the probability density curve can change significantly.
- The probability density curve for a beta distribution is

  $$f(y) = \frac{y^{\alpha-1} (1 - y)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 \le y \le 1, \qquad \text{where } B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}.$$

- Under most circumstances, the distribution function for the beta distribution has no closed form solution. Therefore, in order to find probabilities associated with the beta distribution, we must use tables or a statistical computer package.
- The theoretical mean of the beta distribution is $\mu = E(Y) = \dfrac{\alpha}{\alpha + \beta}$.
- The variance of the beta distribution is $\sigma^2 = V(Y) = \dfrac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$.

[Figure: probability density function and distribution function of a beta distribution with $\alpha = 2$ and $\beta = 3$.]

### Working with the beta distribution in R

To find the probability $P(Y \le y)$, the command in R is

    pbeta(y, alpha, beta)

To find the value of $y$ so that $P(Y \le y) = p$, the command in R is

    qbeta(p, alpha, beta)

To generate observations from a beta distribution, the command in R is

    rbeta(numobs, alpha, beta)

where `numobs` is the number of observations you would like to generate.

### Problems

1. The relative humidity $Y$, when measured at a location, has a beta distribution with $\alpha = 4$ and $\beta = 3$. Find the probability that the relative humidity at this location will be greater than 60%.
2. The percentage of impurities per batch in a chemical product is a random variable $Y$ with a beta distribution with $\alpha = 3$ and $\beta = 2$. A batch with more than 40% impurities cannot be sold.
   - (a) What is the
probability that a randomly selected batch cannot be sold due to excess impurities?
   - (b) Find the mean and variance of the percentage of impurities in a randomly selected batch of chemicals.
3. What is another name for the beta distribution when $\alpha = 1$ and $\beta = 1$?
4. The weekly repair cost $Y$ for a machine has a beta distribution with $\alpha = 1$ and $\beta = 3$. Measurements are in hundreds of dollars. How much money should be budgeted each week for repair costs so that the actual cost $Y$ will exceed the budgeted amount only 10% of the time?

## Calculating the Theoretical Probability of the Birthday Problem Using R

Stat 341, Fall 2008

This is the R code that looks at the probability of the birthday problem: the probability of at least 2 people in a room of $n$ people sharing a birthday. In this handout we will first look at the problem using $n = 26$, since this was the number of people in Stat 341 on the first day we began discussion of this problem. Then we will look at calculating the probability for a general value of $n$.

To begin calculating the theoretical probability, we first need to set up the problem. There are 365 different possible birthdays (ignoring Feb 29th). We need to calculate all the possible ways a group of 26 people can have birthdays. In R, this is

    allpossbd26 <- 365^26

To calculate the probability of the birthday problem, we are going to look at the opposite situation (called the complement event), where no one in the group of 26 people shares a birthday. How many ways can this happen? In R, this is

    noshare26 <- prod(c(340:365))

We can also write this as

    noshare26 <- choose(365, 26) * factorial(26)

The theoretical probability of the opposite situation is

    noshare26/allpossbd26

This value is 0.4017592. The probability that we want is

    1 - noshare26/allpossbd26

This value is 0.5982408. Notice how close the empirical probability of 0.5960 is to the theoretical probability of 0.5982408.

How could we calculate this theoretical probability with a general value of $n$? Let's assign `n` the values from 1 to 100. In R, the command is

    n <- c(1:100)
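As an aside on the general-$n$ calculation: the counts $365^n$ and $\binom{365}{n}\,n!$ become very large numbers as $n$ grows. The following is a minimal sketch, not part of the original handout, that computes the same probability on the log scale so that no huge intermediate values are formed. The helper name `birthday_prob` and its `days` argument are hypothetical, and the sketch keeps the handout's assumption of 365 equally likely birthdays.

```r
# Hypothetical helper (not from the handout): probability that at least two
# of n people share a birthday, assuming `days` equally likely birthdays.
birthday_prob <- function(n, days = 365) {
  sapply(n, function(k) {
    # With more people than days, a shared birthday is certain.
    if (k > days) return(1)
    # log of [days * (days - 1) * ... * (days - k + 1)] / days^k,
    # i.e. the log-probability that no one shares a birthday.
    log_noshare <- sum(log(days - seq_len(k) + 1)) - k * log(days)
    1 - exp(log_noshare)
  })
}

# Usage: probabilities for a few room sizes, rounded for display.
round(birthday_prob(c(23, 26, 32)), 4)
```

The results can be compared with the values the handout reports for $n = 26$ and with the table of theoretical probabilities for $n$ from 1 to 100. R's `stats` package also provides a `pbirthday` function for this kind of calculation.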
To get the number of all possible ways to assign birthdays to $n$ people, we need to calculate

    allpossbd <- 365^n

To get the number of ways that no one in the group of $n$ people will share a birthday, we need to calculate

    noshare <- choose(365, n) * factorial(n)

Then the theoretical probabilities for the birthday problem with $n$ people are

    bdprobs <- 1 - noshare/allpossbd

For $n$ from 1 to 100, the theoretical probabilities are:

      [1] 0.000000000 0.002739726 0.008204166 0.016355912 0.027135574 0.040462484
      [7] 0.056235703 0.074335292 0.094623834 0.116948178 0.141141378 0.167024789
     [13] 0.194410275 0.223102512 0.252901320 0.283604005 0.315007665 0.346911418
     [19] 0.379118526 0.411438384 0.443688335 0.475695308 0.507297234 0.538344258
     [25] 0.568699704 0.598240820 0.626859282 0.654461472 0.680968537 0.706316243
     [31] 0.730454634 0.753347528 0.774971854 0.795316865 0.814383239 0.832182106
     [37] 0.848734008 0.864067821 0.878219664 0.891231810 0.903151611 0.914030472
     [43] 0.923922856 0.932885369 0.940975899 0.948252843 0.954774403 0.960597973
     [49] 0.965779609 0.970373580 0.974431993 0.978004509 0.981138113 0.983876963
     [55] 0.986262289 0.988332355 0.990122459 0.991664979 0.992989448 0.994122661
     [61] 0.995088799 0.995909575 0.996604387 0.997190479 0.997683107 0.998095705
     [67] 0.998440043 0.998726391 0.998963666 0.999159576 0.999320753 0.999452881
     [73] 0.999560806 0.999648644 0.999719878 0.999777437 0.999823779 0.999860955
     [79] 0.999890668 0.999914332 0.999933109 0.999947953 0.999959646 0.999968822
     [85] 0.999975997 0.999981587 0.999985925 0.999989280 0.999991865 0.999993848
     [91] 0.999995365 0.999996521 0.999997398 0.999998061 0.999998560 0.999998935
     [97] 0.999999215 0.999999424 0.999999578 0.999999693

We can plot the number of people in the room versus the probability that at least two people in the room will share a birthday. The R command for this plot is below, and the plot can be found on the next page.

    plot(n, bdprobs, type = "l")

People are generally surprised that the probability of the birthday problem is just over 0.5 for only 23 people in the room, and that the probability reaches 0.75 with only 32 people in the room. This
probability will continue to grow closer to 1 until $n = 366$. When there are 366 people in the room, there must be at least 2 people who share a birthday, making the probability exactly 1.

In simulating and in calculating the theoretical probability of the birthday problem, we have made several assumptions. These assumptions are necessary to simplify the problem in order to be able to either simulate or calculate the probability. Whether these assumptions are reasonable is open for debate.

### Assumptions in Calculating or Simulating the Birthday Problem

- No one has a birthday on February 29th. While this assumption is clearly not true, violating the assumption won't change the probabilities very much.
- Birthdays are evenly distributed throughout the 365 days of the year.
- No twins, triplets, etc. are in the room.

[Figure: Theoretical Probability Plot of Birthday Problem. Horizontal axis: Number of People in Room (0 to 100). Vertical axis: Probability of at Least Two People Sharing a Birthday (0.0 to 1.0).]

## Introduction to Probability

Stat 341, Fall 2008

What is the probability of obtaining heads when flipping a coin? Almost everyone would answer 0.5. Why? What does the value of 0.5 really mean? When you flip a coin once, you get either heads or tails. So when you flip a coin once, either the event occurs (heads) or doesn't occur (tails). Then why do we assign a probability of 0.5 to the event of obtaining heads when flipping a coin?

The concept of probability is based on what occurs in the long run, in a large number of trials. So if we flip a coin 100 times, we expect to obtain around 50 heads ($100 \times 0.5$). The probability of an event is always defined in terms of this long-run relative frequency. So if we keep flipping the coin forever (an infinite number of trials), heads will occur in exactly one-half of the flips. This is the theoretical probability of the event, 0.5.

Clearly, we cannot flip a coin, or perform any other event, an infinite number of times. So if we were repeating an event, or simulating an event on the computer, how many simulations would we need until the
observed relative frequency would be close to this theoretical probability?

When you flip a coin 100 times, you expect the relative frequency of heads to be 0.5 (50 flips out of 100). However, for every flip you are away from 50, the relative frequency of heads changes by 0.01. Get 45 or 55 heads out of 100 flips and your relative frequency is a full 0.05 away from the theoretical probability of 0.5.

Contrast this with flipping a coin 10000 times. In this situation, you still expect the relative frequency of heads to be 0.5 (5000 flips out of 10000). This time, however, for every flip you are away from 5000, the relative frequency of heads only changes by 0.0001. Get 4995 or 5005 heads out of 10000 flips and your relative frequency is only 0.0005 away from the theoretical probability of 0.5.

To see how this works, we can simulate flipping a coin in R and look at the relative frequency of obtaining heads. To set up the coin, we will define a new variable `coin` to have the values 0 (Tails) or 1 (Heads). Here is the command in R:

    coin <- c(0, 1)

Flipping a coin is like sampling from the `coin` variable with replacement. To flip the coin 100 times, we will use the `sample` command in R and save the result to the variable `flips100`:

    flips100 <- sample(coin, 100, replace = TRUE)

In my 100 flips, I obtained 51 heads out of 100, for an empirical probability of 0.51. You can get the number of heads and the proportion of heads out of your 100 flips using the R commands

    sum(flips100)
    sum(flips100)/100

Flipping the coin in R 100 more times will give you a different set of outcomes, and most likely a different number and proportion of heads out of your 100 flips. If we repeat this process many times, we can see what would commonly happen for the number of heads and the proportion of heads out of 100 flips of the coin. Here is the code in R that will conduct 10000 trials, with each trial consisting of flipping the coin 100 times. From this simulation, we will save the number and proportion of heads for each trial.

    numheads100 <- rep(0, 10000)
    propheads100 <- rep(0, 10000)
    for (i in 1:10000) {
      flips100 <- sample(coin, 100, replace = TRUE)
      numheads100[i] <- sum(flips100)
      propheads100[i] <- sum(flips100)/100
    }

The histograms of both the number and proportion of heads occurring in 100 flips of the coin for these 10000 trials can be found at the end of the document. For the number of heads occurring in 100 flips, the histogram shows that most trials contained between 40 and 60 heads, for a difference of ±10. However, in a few of the trials there were as few as 30 or as many as 70 heads out of the 100 flips, for a difference of ±20. For the proportion of heads occurring in 100 flips, the histogram shows that most trials contained a proportion of heads between 0.4 and 0.6, for a difference of ±0.1. However, in a few of the trials the proportion was as low as 0.3 or as high as 0.7, for a difference of ±0.2.

From these results, we can see that 100 flips would not be enough to get a relative frequency of the event close to the theoretical probability. We need more flips. Let's try 10000 flips. The code to flip the coin 10000 times is given below.

    flips10000 <- sample(coin, 10000, replace = TRUE)

In my 10000 flips, I obtained 4993 heads, for an empirical probability of 0.4993. Just like before, flipping the coin another 10000 times will give you a different set of outcomes, and most likely a different number and proportion of heads out of your 10000 flips. And just like before, if we repeat this process many times, we can see what would commonly happen for the number of heads and the proportion of heads out of 10000 flips of the coin. Here is the code in R that will conduct 10000 trials, with each trial consisting of flipping the coin 10000 times. From this simulation, we will save the number and proportion of heads for each trial.

    numheads10000 <- rep(0, 10000)
    propheads10000 <- rep(0, 10000)
    for (i in 1:10000) {
      flips10000 <- sample(coin, 10000, replace = TRUE)
      numheads10000[i] <- sum(flips10000)
      propheads10000[i] <- sum(flips10000)/10000
    }

The histograms of both the number and proportion
of heads occurring in 10,000 flips of the coin for these 10,000 trials can be found at the end of the document. For the number of heads occurring in 10,000 flips, the histogram shows that most trials contained between 4900 and 5100 heads, for a difference of ±100. However, in a few of the trials there were as few as 4800 or as many as 5200 heads out of the 10,000 flips, for a difference of ±200. For the proportion of heads occurring in 10,000 flips, the histogram shows that most trials contained a proportion of heads between 0.49 and 0.51, for a difference of ±0.01. However, in a few of the trials the proportion was as low as 0.48 or as high as 0.52, for a difference of ±0.02.

For 100 flips, the difference between the expected number of heads (50) and the observed number of heads in the 10,000 trials was fairly small, no more than ±20. However, since there were relatively few flips to begin with, the relative frequency of heads varied greatly, from 0.3 to 0.7, around the expected value of 0.5.

For 10,000 flips, the difference between the expected number of heads (5000) and the observed number of heads in the 10,000 trials was fairly large, with the largest being ±200. However, since there were many flips, each trial produced an empirical probability of heads that was within 0.02 of the theoretical probability of 0.5.

The more flips of the coin, the closer we expect the relative frequency (the empirical probability) to be to the theoretical probability. For 100 flips, the empirical probabilities are not always very close to the theoretical probability of 0.5. However, with 10,000 flips, the empirical probabilities are typically within 1 to 2 percentage points of the theoretical probability.

[Figure: Results of 10000 Trials, Number of Heads out of 100 Flips. x-axis: Number of Heads out of 100 Flips; y-axis: Number of Trials]

[Figure: Results of 10000 Trials, Proportion of Heads out of 100 Flips. x-axis: Proportion of Heads out of 100 Flips; y-axis: Number of Trials]

[Figure: Results of 10000 Trials, Number of Heads out of 10000 Flips. x-axis: Number of Heads out of 10000 Flips; y-axis: Number of Trials]

[Figure: Results of 10000 Trials, Proportion of Heads out of 10000 Flips. x-axis: Proportion of Heads out of 10000 Flips; y-axis: Number of Trials]

Rolling Two Dice Examples Using R

Stat 341, Fall 2008

This is the R code that looks at two different random variables obtained from the sample space of the experiment of rolling two 6-sided dice: the sum of the values on the two dice, and the largest (maximum) value of the two dice. Using R, we will study the observed probability distributions of these random variables.

To begin, we need to create a virtual die in R. The code is

    dice <- c(1:6)

We would then like to roll the die twice and observe the outcome, and then repeat the process 10,000 times. The R code to do this is

    dice1 <- sample(dice, 10000, replace = T)
    dice2 <- sample(dice, 10000, replace = T)

Now we have 10,000 rolls of our two dice in the variables dice1 and dice2. But they are in separate variables, and we need to join them together into a matrix. Each die will become a column in the matrix, and each roll of the 2 dice will become a row in the matrix. Here is the R code to do this:

    dicematrix <- cbind(dice1, dice2)

At this point it may be helpful to look at the matrix dicematrix. To look at the first 20 rows, type in the command

    dicematrix[1:20, ]

Now we would like to look at the values in each row (the outcomes of the rolls of our 2 dice) and calculate the sum of the two dice and the largest (maximum) of the two dice. In R, whenever you have a matrix, you can apply a mathematical function like sum to each row or column of the matrix using the apply command. In this case we would like to apply two different functions, sum and max, to each of the 10,000 rows of the matrix dicematrix. The commands are

    sumtwodicesim <- apply(dicematrix, 1, sum)
    maxtwodicesim <- apply(dicematrix, 1, max)

In the commands above, the first value is the name of the matrix, the second value of 1 specifies that we want to apply the function to the rows, and the third value is the name
of the function we want to apply.

We can then study the 10,000 observed values of these two random variables using histograms and summary statistics. For example, to find the smallest and largest values of the observed sums, you can type

    min(sumtwodicesim)
    max(sumtwodicesim)

To find other summary statistics of the observed sums, like the mean, median, five number summary, and standard deviation, you can type

    mean(sumtwodicesim)        # mean
    sqrt(var(sumtwodicesim))   # std dev
    fivenum(sumtwodicesim)     # five number summary

To get a picture of the observed values, you can make a histogram. For these values, you should set up the histogram so that the observed values are centered in the bars of the histogram. For example, for the observed sums you should set up your histogram as

    sumdicebreaks <- c(1:12) + 0.5
    hist(sumtwodicesim, breaks = sumdicebreaks)

Here is a picture of the observed distribution of the sums.

[Figure: Observed Distribution of Sum of Two Dice. x-axis: Sum of Two Dice; y-axis: Number of Rolls of Two Dice]

Similar code in R will give you the histogram and summary statistics for the largest (maximum) values of the two dice.

    min(maxtwodicesim)         # minimum observed value
    max(maxtwodicesim)         # maximum observed value
    mean(maxtwodicesim)        # mean
    sqrt(var(maxtwodicesim))   # std dev
    fivenum(maxtwodicesim)     # five number summary
    maxdicebreaks <- c(0:6) + 0.5   # set the breaks for the largest value
    hist(maxtwodicesim, breaks = maxdicebreaks)

Here is a picture of the observed distribution of the maximum values.

[Figure: titled "Observed Distribution of Sum of Two Dice" in the source. x-axis: Sum of Two Dice; y-axis: Number of Rolls of Two Dice]

Hypergeometric Distribution

A random variable Y has a hypergeometric distribution if

- A sample of size n is selected without replacement from a population of size N.
- Each member of the population belongs to one of two groups: success or failure. The number of successes in the population is denoted as r, and therefore the number of failures in the population is N − r.
- The random variable Y is the total number of successes in
the sample of size n.

The parameters for the hypergeometric random variable Y are the sample size n, the population size N, and the number of successes in the population r.

- The probability distribution function of Y is

$$p(y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}}, \qquad y = 0, 1, \ldots, n, \text{ where } y \le r \text{ and } n - y \le N - r$$

- The theoretical mean of the hypergeometric random variable Y is

$$\mu = E(Y) = \frac{nr}{N}$$

- The theoretical variance of the hypergeometric random variable Y is

$$\sigma^2 = V(Y) = n\left(\frac{r}{N}\right)\left(\frac{N-r}{N}\right)\left(\frac{N-n}{N-1}\right)$$

Working with hypergeometric random variables in R

To find a probability $P(Y = y) = p(y)$ for a single value y, the command in R is

    dhyper(y, r, N - r, n)

To find the probability $P(Y \le y)$, use the sum command to add up all p(y) values for y between and including 0 and y:

    sum(dhyper(0:y, r, N - r, n))

To find the probability $P(y_1 \le Y \le y_2)$, use the sum command to add up all p(y) values for y between and including y1 and y2:

    sum(dhyper(y1:y2, r, N - r, n))

To find the probability $P(Y \ge y)$, use the sum command to add up all p(y) values between and including y and n:

    sum(dhyper(y:n, r, N - r, n))

Problems

1. How is the probability distribution function p(y) derived?

2. Let the quantity r/N = p. Derive the expected value and variance of a hypergeometric distribution in terms of p, n, and N.

3. A box contains 40 balls: 10 red, 15 yellow, and 15 green. A sample of 3 balls is taken from this box without replacement.
   (a) What is the probability that all three balls will be yellow?
   (b) What is the probability that exactly two out of three balls will be yellow?
   (c) What is the expected number of yellow balls in the sample?
   (d) What is the variance of the number of yellow balls in the sample?

4. Crates of eggs are inspected for blood clots. A sample of three eggs is selected without replacement from a crate of 120 eggs.
   (a) What is the probability that exactly one out of the three eggs will have a blood clot if the crate contains a total of 10 eggs with blood clots?
   (b) Use R to calculate the probability that exactly one of the three eggs will have a blood clot if the crate contains a total of r eggs with blood clots, for all possible values of r.
   (c) For what value of r is the probability in part (b)
maximized?

5. On any MP3 player, the number of songs from any particular artist that appear in the first n songs of an N-song playlist has a hypergeometric distribution, where r is the total number of songs from that artist in the playlist. In my favorites playlist of 240 songs, I have 14 songs by the artist Queen.
   (a) If I listen to the first n = 60 songs on my favorites playlist, find the probability that I will hear six songs by Queen.
   (b) Use R to determine the probability distribution function for the number of songs from Queen that will appear in the first n = 60 songs on my favorites playlist.
   (c) In one particular shuffle, I heard all 14 Queen songs in the first 60 songs of the shuffle. Should I question the randomness of the shuffle feature?
   (d) If each shuffle is independent, in how many shuffles out of 100 total (listening to the first n = 60 songs) should I expect to hear six songs by Queen?
   (e) If each shuffle is independent, how many shuffles would I need to perform in order to expect to hear all 14 songs by Queen (listening to the first n = 60 songs) just once?

Negative Hypergeometric Distribution

A random variable Y has a negative hypergeometric distribution if

- Objects are selected without replacement from a population of size N.
- Each member of the population belongs to one of two groups: success or failure. The number of successes in the population is denoted as r, and therefore the number of failures in the population is N − r.
- Objects are selected from the population until the kth success occurs.
- The random variable Y is defined as the number of the trial on which the kth success occurs.

The parameters for the negative hypergeometric random variable Y are the value of k, the population size N, and the number of successes in the population r.

- The probability distribution function of Y is

$$p(y) = \frac{\binom{y-1}{k-1}\binom{N-y}{r-k}}{\binom{N}{r}}, \qquad y = k, k+1, \ldots, N - r + k$$

- The theoretical mean of the negative hypergeometric random variable Y is

$$\mu = E(Y) = k\left(\frac{N+1}{r+1}\right)$$

- The theoretical variance of the negative hypergeometric random variable Y is

$$\sigma^2 = V(Y) = \frac{k(N+1)(N-r)(r+1-k)}{(r+1)^2(r+2)}$$

Working with negative hypergeometric random variables in R

R does not include the negative hypergeometric random variable as a built-in function. In order to work with this random variable, you will need to copy and paste the following function into R:

    dneghyper <- function(y, r, N, k) choose(y - 1, k - 1) * choose(N - y, r - k) / choose(N, r)

To find a probability $P(Y = y) = p(y)$ for a single value y, the command in R is

    dneghyper(y, r, N, k)

To find the probability $P(Y \le y)$, use the sum command to add up all p(y) values for y between and including k and y:

    sum(dneghyper(k:y, r, N, k))

To find the probability $P(y_1 \le Y \le y_2)$, use the sum command to add up all p(y) values for y between and including y1 and y2:

    sum(dneghyper(y1:y2, r, N, k))

To find the probability $P(Y \ge y) = 1 - P(Y < y) = 1 - P(Y \le y - 1)$, use the sum command to find $P(Y \le y - 1)$ and subtract this value from 1:

    1 - sum(dneghyper(k:(y - 1), r, N, k))

Problems

1. How is the probability distribution function p(y) derived?

2. A box contains 40 balls: 10 red, 15 yellow, and 15 green. Balls are selected from this box without replacement.
   (a) What is the probability that the third yellow ball will be the seventh ball chosen?
   (b) What is the probability that the third yellow ball will be the 10th ball chosen?
   (c) Find the expected number of draws needed to obtain the third yellow ball.
   (d) Find the variance of the number of draws needed to obtain the third yellow ball.

3. A company receives a shipment of 50 condensers, of which 5 are defective.
   (a) When sampling without replacement, what is the probability the second defective condenser will be the 10th condenser chosen?
   (b) When sampling without replacement, what is the probability the third defective condenser will be chosen within the first 25 draws?
   (c) Find the expected number of draws needed to obtain the second defective condenser.
   (d) Find the variance of the number of draws needed to obtain the second defective condenser.

Rolling Two Dice Random Variables Using R

Stat 341, Fall 2008

In some examples, the sample space of the experiment is small enough to easily use R to study the distribution of the random variables arising from the particular
sample space. In this help file we will look at R code for the sample space of rolling two dice. The sample space consists of 36 simple events, each having the same probability, 1/36. The following R code will set up the sample space S in a matrix with two columns (the outcomes on the two dice) and 36 rows (the 36 possible outcomes).

    Stwodice <- scan()
    1 1  2 1  3 1  4 1  5 1  6 1
    1 2  2 2  3 2  4 2  5 2  6 2
    1 3  2 3  3 3  4 3  5 3  6 3
    1 4  2 4  3 4  4 4  5 4  6 4
    1 5  2 5  3 5  4 5  5 5  6 5
    1 6  2 6  3 6  4 6  5 6  6 6

    Stwodice <- matrix(Stwodice, byrow = T, ncol = 2)

Applying the sum function or the max function to the rows of this matrix will produce the 36 sums and 36 maximum values corresponding to the 36 simple events in S.

    sumtwodice <- apply(Stwodice, 1, sum)
    maxtwodice <- apply(Stwodice, 1, max)

We can then study the distribution of the two random variables using histograms and summary statistics. For example, to find the smallest and largest possible values of the sums, you can type

    min(sumtwodice)
    max(sumtwodice)

To find other summary statistics of the possible sums, like the mean, median, five number summary, and standard deviation, you can type

    mean(sumtwodice)        # mean
    sqrt(var(sumtwodice))   # std dev
    fivenum(sumtwodice)     # five number summary

To get a picture of the distribution of the possible sums, you can make a probability histogram. For these values, you should set up the histogram so that the possible values are centered in the bars of the histogram. For example, for the sums you should set up your histogram as

    sumdicebreaks <- c(1:12) + 0.5
    hist(sumtwodice, breaks = sumdicebreaks, prob = T)

Here is a picture of the probability distribution of the sums.

[Figure: Probability Distribution of the Sum of Two Dice. x-axis: Sum of Two Dice; y-axis: Probability]

Similar code in R will give you the probability histogram and summary statistics for the distribution of the largest (maximum) values of the two dice.

    min(maxtwodice)         # minimum value
    max(maxtwodice)         # maximum value
    mean(maxtwodice)        # mean
    sqrt(var(maxtwodice))   # std dev
    fivenum(maxtwodice)     # five number summary
    maxdicebreaks <- c(0:6) + 0.5   # set the breaks for the largest value
    hist(maxtwodice, breaks = maxdicebreaks, prob = T)

Here is a picture of the probability distribution of the maximum values.

[Figure: Probability Distribution of Maximum Value of Two Dice. x-axis: Maximum Value of Two Dice; y-axis: Probability]

Normal Distribution (Section 4.5)

A normal distribution is used to model continuous data when the probability histogram has an approximate bell shape. The normal distribution has the following properties.

- The parameters for the normal distribution are the mean μ and the variance σ². The standard normal distribution has mean μ = 0 and variance σ² = 1.
- The probability density function for the normal distribution is

$$f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y-\mu)^2 / (2\sigma^2)}, \qquad -\infty < y < \infty$$

- The distribution function for the normal distribution does not have a closed form solution. You must use tables or a computer package to find probabilities associated with the normal distribution. Here are graphs of the probability density function and the distribution function of a normal distribution with μ = 0 and σ² = 1.

[Figure: p.d.f. of Normal distribution with mean 0 and variance 1]

[Figure: Distribution function of Normal distribution with mean 0 and variance 1]

- The theoretical mean of the normal distribution is $\mu = E(Y) = \mu$.
- The variance of the normal distribution is $\sigma^2 = V(Y) = \sigma^2$.
- The normal distribution is very important in statistical theory, and we will be learning much more about this distribution in Statistics 342.

Working with normal random variables in R

In the commands below, sigma is the standard deviation σ, not the variance σ².

To find the probability $P(Y \le y)$, the command in R is

    pnorm(y, mu, sigma)

To find the value of y so that $P(Y \le y) = p$, the command in R is

    qnorm(p, mu, sigma)

To generate observed values from a normal distribution, the command in R is

    rnorm(numobs, mu, sigma)

where numobs is the number of observed values you would like to generate.

Problems

1. Scores on a particular achievement test are known to have a normal distribution with mean μ = 75 and variance σ² = 100.
   (a) Find the probability a randomly selected student will score between 80 and 90 on this achievement test.
   (b) 30% of all students taking this achievement test
will score better than what value?
   (c) Out of 5 randomly selected students taking this achievement test, find the probability that all 5 students will score between 80 and 90.
   (d) Out of 5 randomly selected students taking this achievement test, find the probability that 4 out of the 5 students will score between 80 and 90.

2. The "fill" problem is important in many industries, like those making cereal, toothpaste, beer, and so on. If such an industry claims it is selling 12 ounces of its product in a container, it must have a mean greater than 12 ounces or else the FDA will crack down on the industry. However, the industry is allowed to have a very small percentage of the containers less than 12 ounces.
   (a) If the contents Y of a container have a normal distribution with mean μ = 12.1 ounces and variance σ², find σ² so that P(Y < 12) = 0.01.
   (b) If σ = 0.05, find μ so that P(Y < 12) = 0.01.

3. Assume that the fill Y of a filling machine for a beverage has a normal distribution with μ = 12.2 and σ = 0.1, measured in fluid ounces.
   (a) Find P(Y < 12).
   (b) 50 bottles of this beverage are selected independently. What is the probability that at least one is under 12 ounces?

Special Gamma Distributions (Section 4.6)

There are two special distributions in the family of gamma distributions. They are the exponential distribution and the chi-square distribution.

1. Exponential Distribution

An exponential distribution is a gamma distribution with α = 1. The only parameter for an exponential distribution is the scale parameter β. Exponential distributions are often used to model the length of life of electronic components.

The probability density function of an exponential distribution is

$$f(y) = \frac{1}{\beta}\, e^{-y/\beta}, \qquad 0 \le y < \infty$$

The distribution function for the exponential distribution has a closed form solution. The distribution function is

$$F(y) = P(Y \le y) = \begin{cases} 0, & y < 0 \\ 1 - e^{-y/\beta}, & 0 \le y < \infty \end{cases}$$

Here are graphs of the probability density function and the distribution function of an exponential distribution with β = 1.

[Figure: p.d.f. of Exponential Distribution with mean 1]

[Figure: Distribution function of Exponential Distribution with mean 1]

The theoretical mean of an exponential distribution is $\mu = E(Y) = \beta$ and the variance is $\sigma^2 = V(Y) = \beta^2$.

Working with the exponential distribution in R

To find the probability $P(Y \le y)$, the command in R is

    pexp(y, 1/beta)

To find the value of y so that $P(Y \le y) = p$, the command in R is

    qexp(p, 1/beta)

To generate observations from the exponential distribution, the command in R is

    rexp(numobs, 1/beta)

where numobs is the number of observations you wish to generate.

2. Chi-Square Distribution

A chi-square distribution is a gamma distribution with α = ν/2 and β = 2. The only parameter of the chi-square distribution is ν. This parameter is referred to as the degrees of freedom of the chi-square distribution. The chi-square distribution occurs frequently in statistical theory, so we will be discussing this distribution much more in Statistics 342 next semester.

Working with the chi-square distribution in R

To find the probability $P(Y \le y)$, the command in R is

    pchisq(y, nu)

To find the value of y so that $P(Y \le y) = p$, the command in R is

    qchisq(p, nu)

To generate observations from a chi-square distribution, the command in R is

    rchisq(numobs, nu)

where numobs is the number of observations you wish to generate.

Problems

1. A manufacturing plant uses a specific bulk product. The amount of product used in one day can be modeled by an exponential distribution with β = 4 (measurements in tons).
   (a) Find the probability that the plant will use more than 4 tons on a given day.
   (b) How much of this bulk product should be stocked so that the plant's chance of running out of the product is only 0.05?

2. One-hour carbon monoxide concentrations in air samples from a large city have an approximately exponential distribution with mean 3.6 parts per million.
   (a) Find the probability that the carbon monoxide concentration exceeds 9 parts per million during a randomly selected 1-hour period.
   (b) A traffic control strategy reduced the mean to 2.5 parts per million. Now find the probability that the concentration exceeds 9 parts per million.

3. Explosive devices used in mining operations produce nearly circular craters when detonated. The radii of these craters are exponentially distributed with mean 10 feet. Find the mean and variance of the areas produced by these explosive devices.
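
The exponential commands in this section take the rate 1/β as their second argument rather than the scale β, which is easy to get backwards. The following is a minimal sketch of that convention, not part of the original handout. The value β = 2 and the variable names (`p_builtin`, `p_closed`, `y95`, `p_check`, `obs`) are illustrative assumptions and are not taken from any of the problems above.

```r
# Illustrative scale parameter (an assumption, not a value from the problems)
beta <- 2

# P(Y <= 3) using the built-in function, which expects the rate 1/beta
p_builtin <- pexp(3, 1 / beta)

# The same probability from the closed-form distribution function 1 - e^(-y/beta)
p_closed <- 1 - exp(-3 / beta)

# The value y with P(Y <= y) = 0.95, then a check that pexp inverts qexp
y95 <- qexp(0.95, 1 / beta)
p_check <- pexp(y95, 1 / beta)

# Simulated observations; their sample mean should be close to beta
set.seed(341)
obs <- rexp(10000, 1 / beta)
mean(obs)
```

If `p_builtin` and `p_closed` agree, the rate has been entered correctly; passing `beta` itself instead of `1 / beta` would give a different probability.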
