# Statistics for Scientists STAT 3000

Utah State University

This 100 page Class Notes was uploaded by Geovanny Lakin on Wednesday October 28, 2015. The Class Notes belongs to STAT 3000 at Utah State University taught by Christopher Corcoran in Fall. Since its upload, it has received 10 views. For similar materials see /class/230488/stat-3000-utah-state-university in Statistics at Utah State University.

Date Created: 10/28/15

Stat 3000: Statistics for Scientists and Engineers. Dr. Corcoran, Summer 2009.

## III. Famous Discrete Distributions: The Binomial and Poisson Distributions

Up to this point we have concerned ourselves with the general properties of categorical and continuous distributions, illustrated with somewhat arbitrary examples. However, there are particular distributions that are well understood and have wide applicability. We will first cover two specific types of categorical variables: those that follow the binomial and Poisson distributions. Both of these distributions model the probability of observing a certain number of events over a period of time or within a physical space. They are used widely in all branches of science and engineering. Later we will spend some time discussing the most famous continuous distribution: the normal distribution, or "bell curve."

### The Bernoulli Distribution

As a way of understanding the binomial distribution, it helps to consider the simplest categorical random variable: a binary variable, which can take only one of two values, such as heads or tails, yes or no, male or female, dead or alive, etc. We typically code a binary random variable $X$ as 1 or 0. We refer (arbitrarily) to $X = 1$ as a "success" and $X = 0$ as a "failure." How you choose to define success and failure is entirely up to you.

If $X$ is binary with probability of success $p$, then we say $X \sim \text{Bernoulli}(p)$. That is, the pmf for $X$ is $P(X = 1) = p$ and $P(X = 0) = 1 - p$. Note also that

$$E(X) = 1 \cdot p + 0 \cdot (1 - p) = p$$

and

$$\text{Var}(X) = 1^2 \cdot p + 0^2 \cdot (1 - p) - p^2 = p - p^2 = p(1 - p).$$

### The Binomial Distribution

Suppose that we have $n$ independent Bernoulli trials, each with probability of success $p$. The binomial distribution determines the probability of observing a given number of successes out of the $n$ trials.

Example IIIA. We flip a fair coin 10 times and observe the number of tosses that result in heads. The
number of heads out of 10 flips follows the binomial distribution.

Example IIIB. We survey 500 randomly selected students at USU and ask them whether they think that President Obama is doing a good job (yes or no). The number who respond "yes" is a binomially distributed random variable.

### The Binomial PMF

If $X$ follows the binomial distribution with number of independent trials given by $n$ and probability of success given by $p$, we say that $X \sim \text{Binomial}(n, p)$. The pmf of $X$ is given by

$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n.$$

Note also that $E(X) = np$ and $\text{Var}(X) = np(1 - p)$.

Example IIIC. Sixteen percent of senior citizens (men and women over the age of 65) in the US suffer from diabetes. Assume that the elderly in Cache County are representative of the US population, and suppose that we randomly sample 25 Cache seniors. What is the probability that exactly two of them are diabetic? What is the probability that no more than two are diabetic? What is the average number of diabetic subjects you'd expect to see in a sample of this size? What is the variance of the number of diabetics for this sample? Suppose that 8 of the 25 are diabetic. Is this evidence that the diabetes rate in Cache is higher than the national average?

### The Poisson Distribution

Like the binomial distribution, the Poisson distribution is used to model count data. The distinction between the two distributions, however, is that we use the binomial distribution to model the probability of some count out of a fixed, finite number of trials. The Poisson distribution does not depend on a fixed number of trials; the range of a Poisson random variable is $0, 1, 2, \ldots$ The Poisson distribution is especially useful for modeling the occurrence of relatively rare events.

Example IIID. An engineer observes traffic flow through an
intersection during the period of an hour. The number of vehicles that will pass through is a Poisson random variable.

Example IIIE. A physicist is interested in the per-minute intensity of particle emissions from a radioactive substance. The number of emitted particles can be considered a Poisson random variable.

### The Poisson PMF

There is a single parameter that determines the distribution of a Poisson random variable $X$: the rate parameter $\mu$. We often refer to $\mu$ as the rate because it turns out that $E(X) = \mu$. In other words, $\mu$ represents the average count per unit of time or space. Given $\mu$, the Poisson pmf is

$$P(X = x) = \frac{e^{-\mu} \mu^x}{x!}, \quad x = 0, 1, 2, \ldots$$

Note also that $\text{Var}(X) = \mu$. The mean and variance of a Poisson random variable are the same.

Example IIIF. According to data from the Logan Police Department, an average of 5.75 traffic accidents occur daily in Logan City. Given that the daily number of traffic accidents follows a Poisson distribution, what is the probability of 3 reported accidents on any given day? What's the probability that there are more than 3 reported accidents on a given day? What are the expected value and variance for the number of traffic accidents on a given day? Suppose that over the course of a work week (Monday through Friday) the LPD handles 20 accident reports. Is this an "unusual" week?

### The Poisson Approximation to the Binomial Distribution

It turns out the Poisson distribution can provide a good approximation of binomial probabilities. This is especially true for a binomial random variable with relatively large $n$ and small $p$. Can you think of a heuristic explanation for this? Suppose that you have a random variable $X \sim \text{Binomial}(n, p)$ and you wish to use a Poisson approximation for the pmf of $X$. What mean and variance would you use for
this Poisson distribution? Use the Poisson approximation to compute the probabilities in Example IIIC. Does the approximation work well? Why or why not?

## II. Random Variables

Random variables operate in much the same way as the outcomes or events in some arbitrary sample space; the distinction is that random variables are simply outcomes that are represented numerically. As with events, we denote a random variable as a capital letter, such as $X$, $Y$, or $Z$. Realizations of a random variable are denoted using lower case. For instance, $x$ may be used to represent a possible value of the variable $X$.

### Variable Types

Random variables are either discrete/categorical (e.g., ethnic group, gender) or continuous (e.g., age, height, weight).

### Variables Map Outcomes to Numbers

[Diagram: experimental outcomes are mapped to the possible numeric values of a random variable.]

Example IIA. Consider an experiment where we flip a coin; the sample space is $S = \{H, T\}$. We can define a random variable $X$ such that $X = 1$ if the flip turns up heads and $X = 0$ if the flip is tails.

Example IIB. Consider an experiment where we sample an adult and ask for his/her age in years; we might view the sample space as something like $S = \{18, 19, 20, \ldots, 120\}$. Suppose we let $Y$ = the person's age in years. Then in this case the mapping of outcomes in $S$ to numeric values is straightforward.

### Categorical Random Variables

Possible values of a categorical random variable have probabilities determined by a probability mass function, or pmf. Suppose that $x_1, x_2, \ldots, x_N$ are the $N$ possible outcomes of a random variable $X$. The pmf for $X$, denoted by $f(x_i)$ or $P(X = x_i)$, must have the following properties:

(i) $0 \le P(X = x_i) \le 1$ for $i = 1, \ldots, N$, and

(ii) $\sum_{i=1}^{N} P(X = x_i) = 1$.

Example IIC. Let $X$ be the sum of two fair six-sided dice. Then the pmf for $X$ is given by

| $x_i$ | $P(X = x_i)$ | $P(X \le x_i)$ |
|---|---|---|
| 2 | 1/36 | 1/36 |
| 3 | 2/36 | 3/36 |
| 4 | 3/36 | 6/36 |
| 5 | 4/36 | 10/36 |
| 6 | 5/36 | 15/36 |
| 7 | 6/36 | 21/36 |
| 8 | 5/36 | 26/36 |
| 9 | 4/36 | 30/36 |
| 10 | 3/36 | 33/36 |
| 11 | 2/36 | 35/36 |
| 12 | 1/36 | 1 |

Example IIC (cont'd). The third column in the previous table contains the cumulative distribution function (cdf), or $P(X \le x_i)$. Note that the pmf determines the cdf, and vice versa. The pmf and cdf are plotted below.

[Figure: plots of the pmf and cdf of $X$.]

### Continuous Random Variables

As opposed to a probability mass function, a continuous random variable has a probability density function, or pdf. Suppose a continuous random variable $X$ can take any value between $l$ and $u$. We denote the pdf of $X$ by $f(x)$ for $l < x < u$. However, note that $f(x) \ne P(X = x)$. Why must this be so? How then do we interpret the pdf for a continuous random variable?

### Interpreting the Probability Density Function

It doesn't make sense with a continuous variable $X$ to talk about $P(X = x)$. However, we can use the pdf $f(x)$, for $l < x < u$, to compute the probability that we observe a value for $X$ that is within an interval, for example $P(a \le X \le b)$. Note: the probability $P(a \le X \le b)$ is given by the area under the curve $f(x)$ between the points $a$ and $b$, where $a \le b$ and $a$ and $b$ are possible values of $X$. Question: how does one find the area under a continuous curve between two points?

### Properties of the Probability Density Function

Given a pdf $f(x)$, we can compute the probability over an interval by

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

This probability has two interpretations:

1. It represents the probability that a randomly selected individual
from the underlying population will have a value of $X$ that falls within the interval $[a, b]$.
2. It represents the proportion of individuals in the underlying population who have values for $X$ that fall within the interval $[a, b]$.

Hence a pdf $f(x)$ must have the following properties:

(i) $f(x) \ge 0$ for $l \le x \le u$, and

(ii) $\int_l^u f(x)\,dx = 1$.

In other words, a pdf $f(x)$ must be nonnegative over its domain $(l, u)$, and the area under the curve over that domain must be equal to 1.

Example IID. Victims of a particular type of cancer have post-diagnosis survival times that follow the pdf plotted on the following slide. If $X$ represents the survival time (in months) of a given cancer patient, then the pdf for $X$ is

$$f(x) = \frac{1}{15} e^{-x/15}, \quad x \ge 0.$$

Verify that this is a legitimate pdf. On the plot, diagram the area that represents the probability that a given cancer patient survives no longer than 10 months. Diagram the area that represents the proportion who survive between 12 and 18 months. Diagram the area representing the proportion who survive longer than 18 months.

[Figure: probability density function for survival time of cancer patients. Horizontal axis: survival time $X$ after diagnosis (months). Vertical axis: pdf $f(x)$ of survival time $X$.]

Example IID (cont'd). Compute the probabilities that you diagrammed on the plot of the pdf. Find and plot the cumulative distribution function. Use the cdf to compute the probabilities that you diagrammed on the plot of the pdf. What is the median survival time? What is the 75th percentile of survival?

### The Mean or Expectation of a Random Variable

Based on Example IIC, suppose that I propose the following game: you pay me \$7.50, then roll the dice. I pay you the sum of the dice in dollars. From a purely economic standpoint, should you play? In order to answer that question, you need to know something about the mean of $X$, or the average sum of the dice.

### Definition of Expectation

By definition, the average or expected value of a categorical random variable $X$, with pmf at a value $x_i$ given by $P(X = x_i)$ for $i = 1, \ldots, N$, is computed as

$$E(X) = \sum_{i=1}^{N} x_i P(X = x_i).$$

The expected value of a continuous random variable $X$, with pdf given by $f(x)$ for $l \le x \le u$, is computed as

$$E(X) = \int_l^u x f(x)\,dx.$$

In either case, we can view the expectation as a weighted average.

Example IIE. The expected value of the distribution in Example IIC is given by

$$E(X) = 2(1/36) + 3(2/36) + \cdots + 12(1/36) = 7.$$

Playing the game described on the previous slide probably wouldn't be your smartest move; that is, what would be your average net gain or loss? In general, how do we interpret expectation? How would you describe expectation to the average person? (Hint: recall our discussion of long-term relative frequency.)

Example IIF. What is the average survival time for the cancer patients in Example IID? The expectation in this case is computed by

$$E(X) = \int_0^\infty x f(x)\,dx = \int_0^\infty \frac{x}{15} e^{-x/15}\,dx = \left[-x e^{-x/15} - 15 e^{-x/15}\right]_0^\infty = \lim_{x \to \infty}\left(-x e^{-x/15} - 15 e^{-x/15}\right) - (0 - 15) = 0 - 0 + 15 = 15.$$

Hence the average life expectancy after diagnosis is 15 months. How do the mean and median compare? Why are they different?

### The Variance of a Random Variable

The mean is used to describe the center of a distribution. The variance is used to describe the spread of a distribution about its mean. As shown in the plot below, two distributions with the same mean can have
very different degrees of variability about the mean.

[Figure: two distributions with the same mean $E(X)$ but different variability.]

### Interpreting the Variance

The variance is actually the average squared deviation of the possible values of a random variable about its mean. In some sense, you can think of variance probabilistically:

(i) If you sample one observation at random from a distribution with high variance, then there is a relatively high probability that the observation will lie far away from the mean.

(ii) If you sample an observation from a distribution with low variance, then there is a relatively small probability that the observation will lie far from the mean.

### Definition of the Variance

By definition, the variance of a categorical random variable $X$, with pmf at a value $x_i$ given by $P(X = x_i)$ for $i = 1, \ldots, N$, is computed as

$$\text{Var}(X) = \sum_{i=1}^{N} [x_i - E(X)]^2 P(X = x_i) = \sum_{i=1}^{N} x_i^2 P(X = x_i) - [E(X)]^2.$$

The variance of a continuous random variable $X$, with pdf given by $f(x)$ for $l \le x \le u$, is computed as

$$\text{Var}(X) = \int_l^u [x - E(X)]^2 f(x)\,dx = \int_l^u x^2 f(x)\,dx - [E(X)]^2.$$

Note that the last part of each equation above results from the fact that $\text{Var}(X) = E(X^2) - [E(X)]^2$. This is typically much easier to compute by hand.

Example IIG. For the distribution in Example IIC,

$$\text{Var}(X) = 2^2(1/36) + 3^2(2/36) + \cdots + 12^2(1/36) - 7^2 = 54.83 - 49 = 5.83.$$

Hence the standard deviation is 2.42.

Example IIH. For the distribution of survival times in Example IID,

$$E(X^2) = \int_0^\infty x^2 f(x)\,dx = \int_0^\infty \frac{x^2}{15} e^{-x/15}\,dx = \left[-x^2 e^{-x/15} - 2(15) x e^{-x/15} - 2(15^2) e^{-x/15}\right]_0^\infty = 0 - \left(0 - 0 - 2(15^2)\right) = 2(15^2).$$

The variance, then, is

$$\text{Var}(X) = E(X^2) - [E(X)]^2 = 2(15^2) - 15^2 = 15^2 \text{ months}^2.$$

Hence the standard deviation is 15 months.

### Some Additional Properties of the Mean and Variance

In
practical research settings we are almost always interested in the distribution of sums of random variables. In particular, as we shall see within a few weeks, we need to know something about the mean and variance of the sample mean

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i,$$

where the values $X_1, \ldots, X_n$ represent a sample of observations from some population (e.g., we sample 20 USU students and measure their individual heights). Question: if each subject in the sample has a mean $E(X_i) = \mu$ and a variance $\text{Var}(X_i) = \sigma^2$, then what are the mean and variance of $\bar{X}$?

### Mean and Variance of a Linear Function

Let $X$ be a random variable such that $E(X) = \mu$ and $\text{Var}(X) = \sigma^2$. Suppose we're interested in a random variable $Y$ defined as $Y = a + bX$, where $a$ and $b$ are constants. In other words, $Y$ is a linear function of $X$. Then

$$E(Y) = E(a + bX) = E(a) + bE(X) = a + b\mu$$

and

$$\text{Var}(Y) = \text{Var}(a + bX) = \text{Var}(a) + b^2 \text{Var}(X) = 0 + b^2\sigma^2 = b^2\sigma^2.$$

Example III. The average high temperature during the month of September in Logan, UT is 67.6°F, with a standard deviation of 4.2°F. That is, if $X$ is a variable that represents the high temperature in Logan on a randomly selected September day, then $E(X) = 67.6$ and $\text{Var}(X) = 4.2^2$. What is the average daily temperature in degrees Celsius? What is the standard deviation of daily temperature in degrees Celsius?

### Mean and Variance of a Linear Combination of Independent Random Variables

Let $X_1$ be a random variable such that $E(X_1) = \mu_1$ and $\text{Var}(X_1) = \sigma_1^2$, and $X_2$ be a random variable such that $E(X_2) = \mu_2$ and $\text{Var}(X_2) = \sigma_2^2$, where $X_1$ and $X_2$ are independent. Suppose we're interested in a random variable $Y$ defined as $Y = c_1 X_1 + c_2 X_2$, where $c_1$ and $c_2$ are constants. In other words, $Y$ is a linear combination of $X_1$ and $X_2$. Then

$$E(Y) = E(c_1 X_1 + c_2 X_2) = c_1 E(X_1) + c_2 E(X_2) = c_1\mu_1 + c_2\mu_2$$

and

$$\text{Var}(Y) = \text{Var}(c_1 X_1 + c_2 X_2) = c_1^2 \text{Var}(X_1) + c_2^2 \text{Var}(X_2) = c_1^2\sigma_1^2 + c_2^2\sigma_2^2.$$

Note that the expectation of $Y$ given above holds regardless of the
dependence of $X_1$ and $X_2$. However, $X_1$ and $X_2$ must be independent for the variance formula to hold.

Example IIJ. Consider the cancer patients whose survival follows the distribution in Example IID. Suppose that we randomly sample 2 individuals with this cancer and measure their survival after diagnosis. What is the expected combined survival of these 2 individuals in months? In years? What is the standard deviation of their combined survival in months? In years?

### The Mean and Variance of a Sample Mean

It might sound odd to talk about the mean and variance of a mean, but recall how a sample mean (which is just a simple average) is computed: we randomly sample $n$ subjects $X_1, \ldots, X_n$ from a population with an underlying expected value of $\mu$ and variance of $\sigma^2$. That is, $E(X_i) = \mu$ and $\text{Var}(X_i) = \sigma^2$ for $i = 1, \ldots, n$. The sample mean is defined as

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

In other words, the sample mean is just a linear combination of the independent random variables $X_1, \ldots, X_n$. Hence the sample mean is also a random variable, and so it has a distribution of its own, with a mean and a variance.

Note the distinction between (i) a sample mean and (ii) the expectation and variance of the underlying distribution (we can't emphasize this enough):

| Quantity | Meaning | Type |
|---|---|---|
| $\bar{X}$ | Sample mean | Random variable |
| $\mu$ and $\sigma^2$ | Expectation and variance | Fixed parameters (constants) |

Having sampled the $n$ subjects $X_1, \ldots, X_n$ at random from a population with an underlying expected value of $\mu$ and variance of $\sigma^2$, we can assume that these subjects are independent. Then, using the rules given four slides previous, the expectation of the distribution of the sample mean $\bar{X}$ is $\mu$ and the variance of
$\bar{X}$ is $\sigma^2/n$. Can you demonstrate why?

Example IIK. Consider again the cancer patients whose survival follows the distribution in Example IID. Suppose that we randomly select 20 individuals with this cancer and compute the average survival time $\bar{X}$ for the sample. What is the expected value of $\bar{X}$? What is the variance of $\bar{X}$? By what factor do we need to increase our sample size to divide the standard deviation of $\bar{X}$ in half?

## IV. The Normal Distribution

The normal distribution (a.k.a. the Gaussian distribution or bell curve) is by far the best-known probability distribution. Its discovery has had such a far-reaching impact in modeling quantitative phenomena across the physical, social, and biological sciences that its founder has even found his way on to a major currency (before the Euro).

### Utility of the Normal Distribution

The normal distribution has such broad applicability in part because:

(i) phenomena in the natural world that result from the interaction of many environmental and genetic factors tend to follow the normal distribution (e.g., height, weight, measurable intelligence);

(ii) the sums and averages of random samples have distributions that look roughly normal, and as the sample size gets larger the normal approximation gets better. This result is known as the Central Limit Theorem. As we will discuss later, this applies even to samples of categorical variables.

### Characteristics of the Normal Distribution

[Figure: the normal density curve.]

The normal distribution is symmetric about the mean, unimodal, and bell-shaped. If $X \sim N(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance, then the density function of $X$ is given by

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}, \quad -\infty < x < \infty.$$

Of the subjects in a normally distributed population, 68.3% lie within one
standard deviation of the mean, 95.4% lie within 2 s.d.'s, and 99.7% lie within 3 s.d.'s.

### Computing Probabilities Using the Normal Distribution

Recall that for a continuous random variable $X$ with probability density function (pdf) $f(x)$, one cannot read $P(X = x)$ off the pdf. That is, the pdf does not yield probabilities as does a discrete probability mass function. Technically, for a continuous random variable $X$, $P(X = x) = 0$. However, we can compute probabilities over intervals of $X$; that is, the probability that $X$ lies between two numbers $a$ and $b$ is equal to the area under the density curve between $a$ and $b$.

To this point we have computed areas under a density curve by using integration. However, since the normal density (i) cannot be integrated in closed form and (ii) is used by researchers with access to modern computing tools, probabilities based on the normal distribution can be obtained using tables or computer software. A normal probability table looks something like what is shown on the last pages of this handout (reproduced from a previous edition of the text). Such a table is based on the standard normal distribution, or the normal distribution with zero mean and variance of 1. Using this table, what is the probability that a randomly sampled $N(0, 1)$ variable is less than 1.34? Less than 0.28? Between $-2.54$ and 1.68? For what $x$ does $P(Z < x) = 0.975$?

### Standardizing a $N(\mu, \sigma^2)$ Random Variable

How do we compute probabilities for a normal distribution with arbitrary mean and variance using a standard normal table? Another special aspect of the normal distribution is that if we have $X \sim N(\mu, \sigma^2)$, then any linear function of $X$ is also normally distributed. That is, if we have $Y = a + bX$ for arbitrary constants $a$ and $b$, then $Y \sim N(a + b\mu, b^2\sigma^2)$.
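
The table lookups and the linear-function property above can also be checked in software. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` in place of the printed table; the values `mu = 10`, `sigma = 2`, `a = 3`, `b = 0.5`, and `y = 9` are hypothetical illustration values that do not come from the notes.

```python
from statistics import NormalDist

# Standard normal distribution Z ~ N(0, 1), standing in for the printed table.
Z = NormalDist(mu=0.0, sigma=1.0)

p_below = Z.cdf(1.34)       # P(Z < 1.34)
x_975 = Z.inv_cdf(0.975)    # the x for which P(Z < x) = 0.975

# Hypothetical illustration values (not from the notes).
mu, sigma = 10.0, 2.0       # X ~ N(mu, sigma^2)
a, b = 3.0, 0.5             # Y = a + bX, with b > 0

X = NormalDist(mu, sigma)
Y = X * b + a               # NormalDist supports scaling and shifting by constants

y = 9.0
p_direct = Y.cdf(y)             # P(Y <= y) using Y ~ N(a + b*mu, b^2 * sigma^2)
p_via_x = X.cdf((y - a) / b)    # the same event rewritten in terms of X

print(round(p_below, 4), round(x_975, 2))
print(Y.mean, Y.stdev, round(p_direct, 4), round(p_via_x, 4))
```

The two probabilities on the last line agree because $Y \le y$ is the same event as $X \le (y - a)/b$ when $b > 0$, which is the idea behind standardizing.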

If we define $Z = (X - \mu)/\sigma$, then, using the notation above, $a = -\mu/\sigma$ and $b = 1/\sigma$, so that $Z \sim N(0, 1)$. Computing $Z$ is called standardizing $X$. Once we've converted $X$ into standard units, we can compute probabilities over intervals of $X$ by using the standard normal, or $Z$, distribution.

Example IVA. Data from a study of king crabs on Kodiak Island, AK, carried out by the Alaska Department of Fish and Game, show that male crab length is normally distributed with a mean of 134.7 mm and a standard deviation of 25.5 mm. What proportion of the male crab population on Kodiak Island is less than 140 mm? What proportion is between 100 and 140 mm? What is the probability that a randomly selected male crab will measure at least 170 mm? What is the 75th percentile of this population? The 99th percentile?

### Sums of Normally Distributed Random Variables

Yet another interesting feature of the normal distribution is that sums of normally distributed independent variables are also normally distributed. Suppose we have two independent random variables $X_1$ and $X_2$ such that $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$, and we define $Y$ such that $Y = c_1 X_1 + c_2 X_2$, where $c_1$ and $c_2$ are constants. Then

$$Y \sim N(c_1\mu_1 + c_2\mu_2,\; c_1^2\sigma_1^2 + c_2^2\sigma_2^2).$$

### Distribution of the Sample Mean

Recall from our discussion of random variables that if we sample $n$ subjects $X_1, \ldots, X_n$ at random from a population with an underlying expected value of $\mu$ and variance of $\sigma^2$, then the expectation of the distribution of the sample mean $\bar{X}$ is $\mu$ and the variance of $\bar{X}$ is $\sigma^2/n$. From the previous slide we can see further that if the sample comes from a normally distributed population, then $\bar{X} \sim N(\mu, \sigma^2/n)$.

Example IVB. Consider again the population of Kodiak crabs discussed
in Example IVA. Suppose that we randomly sample 20 specimens from this population. What is the probability that the sample mean will lie between 124.7 and 144.7 mm? Compute an interval centered at the mean $\mu$ such that a sample average of 20 male crabs will lie within that interval with 95% probability. What sample size is required to reduce the total width of this interval to 20 mm?

### The Central Limit Theorem

Suppose that we have a sample $X_1, X_2, \ldots, X_n$ from some distribution with mean $\mu$ and variance $\sigma^2$. If $n$ is sufficiently large, then the sample mean $\bar{X}$ is approximately distributed as $N(\mu, \sigma^2/n)$. This is true even if the underlying population is not normal; the approximation improves for relatively larger $n$. We refer to this result as the Central Limit Theorem, or CLT. It represents one of the most remarkable results in mathematical statistics. The CLT applies even to samples from some categorical distributions, including the binomial and Poisson distributions.

Example IVC. Let's carry out an experiment in order to demonstrate the power of the Central Limit Theorem. Suppose we have a random variable $X_i$ that represents the $i$th flip of a coin, for $i = 1, \ldots, n$. We will assume $X_i = 1$ for heads and $X_i = 0$ for tails. The mass function for $X_i$ is given by

| $x$ | $P(X_i = x)$ |
|---|---|
| 0 | 1/2 |
| 1 | 1/2 |

Hence in this case $\mu = 1/2$ and $\sigma^2 = (1/2)(1 - 1/2) = 1/4$. Note that the sample mean $\bar{X}$ is just the proportion of flips, out of $n$ tries, that turn up heads.

Example IVC (cont'd). Note further that the underlying distribution is not at all normal: it's a binary distribution, with just two points of probability mass at zero and one. The density of the normal distribution is continuous, unimodal, bell-shaped, symmetric, and has a domain over the entire real line. However, the CLT claims that if $n$ is sufficiently large, the distribution of $\bar{X}$ will be approximately
normal. What will the mean and variance of this distribution be?

Example IVC (cont'd). Take a coin and flip it 30 times. Record the results of the flips in order. What is the proportion of your first 10 flips that were heads? What is the proportion of the first 20 flips (the first 10 combined with the second 10) that were heads? What is the proportion of all 30 that were heads?

Example IVC (cont'd). Plot the distribution of sample proportions for the whole class with $n = 10$ (horizontal axis from 0.0 to 1.0 in steps of 0.1), with $n = 20$, and with $n = 30$ (horizontal axes from 0.05 to 0.95 in steps of 0.1).

Example IVC (cont'd). Answer the following questions. How does the shape of the distribution of the sample proportion change as $n$ gets larger? What is the estimated mean (for the whole class) of the distribution of the sample proportion $\bar{X}$ when $n = 10$? When $n = 20$? When $n = 30$? What is the estimated variance (for the whole class) of the distribution of the sample proportion $\bar{X}$ when $n = 10$? When $n = 20$? When $n = 30$? Compare the estimated means and variances to our predictions.

### Approximating the Binomial and Poisson Distributions

In light of the CLT, it's not surprising that the normal distribution can provide a fairly accurate approximation (under certain, not necessarily uncommon, circumstances) of binomial and Poisson probabilities. Can you explain why? For example, the plots below superimpose normal curves on binomial distributions with different values of $n$ and $p$. For what sorts of binomial distributions will the normal distribution prove more accurate?

[Figure: normal curves superimposed on binomial distributions. Legible panel titles include $n = 10$, $p = 0.50$; $n = 10$, $p = 0.10$; and $n = 100$, $p = 0.10$.]
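
One way to explore the question above without the plots is to compare the binomial pmf with the matching normal density numerically. The following is a rough sketch, assuming Python's standard library; the helper names `binom_pmf` and `max_pmf_gap` are hypothetical, and the three $(n, p)$ pairs are the legible panel titles from the figure. The largest pointwise gap is only one crude measure of accuracy, not the notes' own criterion.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(x, n, p):
    """Binomial pmf: P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def max_pmf_gap(n, p):
    """Largest |binomial pmf - normal density| over x = 0, ..., n.

    The normal curve uses the binomial mean np and variance np(1 - p).
    """
    approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return max(abs(binom_pmf(x, n, p) - approx.pdf(x)) for x in range(n + 1))

# (n, p) pairs taken from the legible panel titles of the figure.
gaps = {(n, p): max_pmf_gap(n, p) for n, p in [(10, 0.50), (10, 0.10), (100, 0.10)]}

for (n, p), gap in gaps.items():
    print(f"n={n:3d}, p={p:.2f}: largest gap = {gap:.4f}")
```

Comparing the three printed gaps suggests how the quality of the approximation depends on both $n$ and how far $p$ is from 1/2.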

Example IVD. What are the mean and variance of the number of diabetic seniors in Example IIIC? Use a normal approximation to compute $P(X \ge 8)$. When using a normal approximation to compute binomial probabilities, we can improve our accuracy with a continuity correction. That is, if $X \sim \text{Bin}(n, p)$ and we wish to compute $P(a \le X \le b)$, then using the normal distribution we would approximate this with $P(a - 1/2 < X < b + 1/2)$. For the diabetes example, use the continuity correction to compute $P(X \ge 8)$ as well as $P(X = 2)$. How accurate are these approximations?

Example IVE. What are the mean and variance of the number of Logan traffic accidents in Example IIIF? Use a normal approximation to compute the same probabilities computed in that example.

### Resources on the Web

There are many applets available via the internet that demonstrate the Central Limit Theorem. A simple example using dice is found at http://www.amstat.org/publications/jse/v6n3/applets/CLT.html. You can find a more interesting example at http://www.ruf.rice.edu/~lane/stat_sim/index.html.

You can also easily access web-based CDF calculators for the normal distribution, as well as for other distributions related to the normal (more on those later). For example, this website computes probabilities for a variety of common distributions: http://www.stat.berkeley.edu/users/stark/Java/Html/ProbCalc.htm

### The Chi-Square Distribution

A related distribution that we will use later on during this semester is the chi-square distribution. If $Z$ is a standard normal random variable, then $Z^2$ is a chi-square random variable with 1 degree of freedom, or $\chi^2_1$. The sum of $n$ independent $\chi^2_1$ random variables follows a chi-square distribution with $n$ degrees of freedom, denoted by $\chi^2_n$. A chi-square random variable has a range that is nonnegative, and its
distribution is positively skewed. For example, the pdf for the chi-square distribution with five degrees of freedom looks something like this:

[Figure: pdf of the $\chi^2_5$ distribution]

The t Distribution

Another distribution related to the normal, and one upon which we will heavily rely, is the t distribution. If Z is a standard normal random variable and $X^2$ is an independent $\chi^2_n$ random variable, then the random variable

$$\frac{Z}{\sqrt{X^2 / n}}$$

follows a t distribution with n degrees of freedom.

A t distribution actually looks quite similar to the standard normal distribution: its mean is zero, and it is unimodal, bell-shaped, and symmetric. One distinction is that the variability of the t distribution is slightly greater than that of the Z distribution. As n gets very large, however, the t distribution converges to (i.e., is nearly indistinguishable from) a Z distribution.

The F Distribution

The third related distribution that we will use is the F distribution. If U and V are independent $\chi^2_n$ and $\chi^2_m$ random variables, respectively, then the variable

$$\frac{U / n}{V / m}$$

follows an F distribution with n and m degrees of freedom. We denote this distribution by $F_{n,m}$. The F distribution has a range that is nonnegative, and its distribution is positively skewed. For example, the pdf for the $F_{5,10}$ distribution looks something like this:

[Figure: pdf of the $F_{5,10}$ distribution]

Example IVF

Note that we cannot tabulate the $\chi^2$, t, and F distributions in the same way that we do for the Z distribution: there are an infinite number of distributions in each of these families, as many as there are values for the degrees of freedom. Instead of areas under the curve, then, you are given tables in your textbook that contain quantiles from a given $\chi^2$, t, or F distribution. For example, the $\chi^2$ table in the back of your book looks something like what you see on the following slide. Each row
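corresponds to a value for the degrees of freedom, and each column corresponds to a right-tail area. Hence, the upper 95% quantile from the $\chi^2_6$ distribution is 1.635. We denote this by $\chi^2_{0.95,6}$.

TABLE 11 Critical Points of the Chi-Square Distribution (rows: degrees of freedom; columns: right-tail area)

| df | 0.995 | 0.99 | 0.975 | 0.95 | 0.90 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
| 2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
| 3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
| 4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860 |
| 5 | 0.412 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.071 | 12.833 | 15.086 | 16.750 |
| 6 | 0.676 | 0.872 | 1.237 | 1.635 | 2.204 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
| 7 | 0.989 | 1.239 | 1.690 | 2.167 | 2.833 | 12.017 | 14.067 | 16.013 | 18.475 | 20.278 |
| 8 | 1.344 | 1.646 | 2.180 | 2.733 | 3.490 | 13.362 | 15.507 | 17.535 | 20.090 | 21.955 |
| 9 | 1.735 | 2.088 | 2.700 | 3.325 | 4.168 | 14.684 | 16.919 | 19.023 | 21.666 | 23.589 |
| 10 | 2.156 | 2.558 | 3.247 | 3.940 | 4.865 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188 |
| 11 | 2.603 | 3.053 | 3.816 | 4.575 | 5.578 | 17.275 | 19.675 | 21.920 | 24.725 | 26.757 |
| 12 | 3.074 | 3.571 | 4.404 | 5.226 | 6.304 | 18.549 | 21.026 | 23.337 | 26.217 | 28.300 |
| 13 | 3.565 | 4.107 | 5.009 | 5.892 | 7.042 | 19.812 | 22.362 | 24.736 | 27.688 | 29.819 |
| 14 | 4.075 | 4.660 | 5.629 | 6.571 | 7.790 | 21.064 | 23.685 | 26.119 | 29.141 | 31.319 |
| 15 | 4.601 | 5.229 | 6.262 | 7.261 | 8.547 | 22.307 | 24.996 | 27.488 | 30.578 | 32.801 |
| 16 | 5.142 | 5.812 | 6.908 | 7.962 | 9.312 | 23.542 | 26.296 | 28.845 | 32.000 | 34.267 |
| 17 | 5.697 | 6.408 | 7.564 | 8.672 | 10.085 | 24.769 | 27.587 | 30.191 | 33.409 | 35.718 |
| 18 | 6.265 | 7.015 | 8.231 | 9.390 | 10.865 | 25.989 | 28.869 | 31.526 | 34.805 | 37.156 |
| 19 | 6.844 | 7.633 | 8.907 | 10.117 | 11.651 | 27.204 | 30.144 | 32.852 | 36.191 | 38.582 |
| 20 | 7.434 | 8.260 | 9.591 | 10.851 | 12.443 | 28.412 | 31.410 | 34.170 | 37.566 | 39.997 |

Example IVF2 (cont'd)

Find the values of $\chi^2_{0.025,20}$ and $\chi^2_{0.10,11}$.

Find the values of $F_{0.05,5,10}$ and $F_{0.10,3,9}$.

In Review

Let's take a breath and summarize some very important points. We now have laid a foundation that allows us to describe and analyze data. From this point forward we will focus on sampling data and making inferences about the underlying population based on that sample.

We denote our sample by $X_1, X_2, \ldots, X_n$. The underlying mean for this population is $\mu$ and the variance is $\sigma^2$. In practice we are often interested in $\mu$ and $\sigma^2$, although we don't know what they are. That's why we're gathering the data. We will therefore focus much attention on inferring something about population quantities, such as $\mu$ for example, based on the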
sampled data.

TAKE THESE FACTS WITH YOU

1. $\bar{X}$ is a random variable; its distribution has a mean of $\mu$ and a variance of $\sigma^2 / n$.
2. If the underlying population is normally distributed, then $\bar{X}$ is normally distributed.
3. Even if the underlying population is not normally distributed, the Central Limit Theorem tells us that for sufficiently large sample size n, $\bar{X}$ will be approximately normally distributed.

I. Basic Probability

An experiment is any exercise whose outcome is unknown in advance. A sample space S represents all possible outcomes for the experiment.

Example IA

Experiment: Flip a fair coin. $S = \{H, T\}$.

Example IB

Experiment: Select an individual from a population and observe gender and smoking status. S = {female smoker, female nonsmoker, male smoker, male nonsmoker}.

Example IC

We select an individual from this class and inquire about their birthday. What is the sample space for this experiment?

Probability is a measure of the likelihood that a particular outcome occurs. Probabilities for the outcomes of any experiment have to follow certain rules; that is, if an experiment has the sample space $S = \{o_1, o_2, \ldots, o_N\}$, where the $o_i$ represent the N individual possible outcomes and the probability of $o_i$ is given by $P(o_i) = p_i$, then

(i) $0 \le p_i \le 1$, and

(ii) $\sum_{i=1}^{N} p_i = P(S) = 1$.

Example ID

Experiment: flip a fair coin, with $S = \{H, T\}$. Note that $P(H) = P(T) = 1/2$. Hence, each probability lies between 0 and 1 inclusive, and $P(S) = 1$.

Example IE

Suppose we sample an individual from the US population and categorize them according to race, so that S = {White, Black, Hispanic, Asian, Native American, Other}. If $P(\text{White}) = 0.75$, $P(\text{Black}) = 0.12$, $P(\text{Hispanic}) = 0.06$, and $P(\text{Asian}) =$
0.04, what are the possible values of $P(\text{Native American})$? If $P(\text{Other}) = 0.02$, then what is $P(\text{Native American})$?

Regarding Our Choice of Examples

It is sometimes easiest to illustrate concepts of basic probability by using games of chance such as flipping coins, rolling dice, or drawing face cards. These experiments are relatively well understood, with probabilities that are easy to verify. In addition, the study of modern probability began as people became interested in better understanding the randomness in games of chance.

How do we interpret probability?

For example, suppose we assume that there are roughly an equal number of men and women studying at USU, and that we conduct the following experiment: go down to the Quad and observe the gender of the next individual that passes by. What is the probability that person is a woman? What does such a probability mean?

HINT: the probability of an outcome is a reflection of the long-term relative frequency of that outcome.

Events

An event is a subset of outcomes from a given experiment. We usually denote an event by a capital letter, such as A, B, or C, for example. An event A consists of all outcomes in an experiment's sample space S such that A is observed to occur if any one of those outcomes is observed to occur. The probability of an event A, denoted by $P(A)$, is the sum of the probabilities of all of the outcomes belonging to A.

Example IF

We roll two fair six-sided dice. The sample space is given by

$$S = \left\{ \begin{array}{ccccc} (1,1) & (1,2) & (1,3) & \cdots & (1,6) \\ (2,1) & (2,2) & (2,3) & \cdots & (2,6) \\ (3,1) & (3,2) & (3,3) & \cdots & (3,6) \\ \vdots & \vdots & \vdots & & \vdots \\ (6,1) & (6,2) & (6,3) & \cdots & (6,6) \end{array} \right\}$$

Let A = {The sum of the two dice is prime}. What outcomes belong to A? What is $P(A)$?

Event
Complements

The complement of an event A is denoted by $A^c$ or $A'$. It consists of all outcomes in S that do not belong to A. Note: $P(A) + P(A') = 1$.

Example IG

Based on Example IF, what outcomes belong to $A'$? What is $P(A')$?

Combinations of Events

In research we are often interested in how two or more events relate to one another. For example, suppose that a naturalist wants to assess how a particular microbe is associated with something like chronic wasting disease in elk. She might pose questions such as: What is the chance that a sampled elk will have both the microbe and the disease? What proportion of elk have the disease? The microbe? Either one or the other? Given that an elk has the microbe, what's the probability that it has the disease? Do these proportions depend further on other factors, such as the size or habitat of the elk? All of these questions involve probabilities of combinations of two or more events.

Definitions

Given two events A and B in a sample space S, we are interested in the following combinations:

1. Event intersection: A and B, denoted by $A \cap B$, with probability denoted by $P(A \cap B)$.
2. Event union: A and/or B, denoted by $A \cup B$, with probability denoted by $P(A \cup B)$.

Probabilities of Event Combinations

The probability $P(A \cap B)$ is given by the sum of the probabilities of all outcomes belonging to both A and B. Likewise, the probability $P(A \cup B)$ is given by the sum of the probabilities of all outcomes belonging to A and/or B. These combinations are sometimes best illustrated using a Venn diagram, as shown below; for example, in this diagram we have shaded the subset of outcomes belonging to $A \cap B$.

[Venn diagram: events A and B within S, with $A \cap B$ shaded]

The Venn diagram below makes it easy to
see that $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

[Venn diagram: events A and B within S]

Use a Venn diagram to show, for example, that $(A \cap B)' = A' \cup B'$.

Example IH

Continuing with Example IF, let A = {The sum of the two dice is prime} and B = {The sum of the two dice is odd}. What is $A \cap B$? What is $P(A \cap B)$? What is $A \cup B$? What is $P(A \cup B)$?

Three or More Events

All of these concepts extend naturally to combinations of three or more events. For example, based on the Venn diagram below, can you give a formula for $P(A \cup B \cup C)$ that uses only probabilities of intersections?

[Venn diagram: events A, B, and C]

Conditional Probability

Some of the most compelling questions in research follow this pattern: "If X, then what of Y?" For example: "If a patient with cancer is given this drug, what is his probability of recovery?" "Given that a survey participant is female, how will she likely respond?" Such questions can be assessed using conditional probability. Given two events A and B, the conditional probability of A given B is defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Note that $A \mid B$ is not a set of outcomes; in other words, $A \mid B$ is not a combination of events like an intersection or union. Hence, it only makes sense to discuss conditional probabilities, not conditional events. What we are asking is: "Does knowledge that B occurred give us additional information about the likelihood of A?"

Example II

Consider again Example IH. Without doing any computation, does knowing that the sum of two dice is odd change the likelihood that the sum is also prime?

Example II (cont'd)

The somewhat obvious answer to the last question on the previous slide is yes; that is, all but one of the prime
numbers between 2 and 12 inclusive are odd. Knowing that the sum is odd means that there's a greater likelihood (compared to having no such information) that the sum is also prime. This is verified by using the definition of conditional probability already given:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{14/36}{18/36} = \frac{7}{9}.$$

Note for this example that $P(A \mid B) > P(A)$, which supports our intuition.

Conditional Probability and Event Intersections

Using simple algebra, you can see that $P(A \cap B) = P(A) P(B \mid A)$. This can be extended ad infinitum: $P(A \cap B \cap C) = P(A) P(B \mid A) P(C \mid A \cap B)$, etc.

Example IJ

A deck of face cards has 4 suits (hearts, diamonds, clubs, and spades) and 13 denominations per suit (2, 3, ..., 10, J, Q, K, A). Hearts and diamonds are red suits, and clubs and spades are black suits. Hence, there are 52 total cards, 26 red and 26 black. What is the probability that I draw three consecutive red cards, without replacement, from a shuffled deck?

Note that there are two ways of approaching this solution: we can either enumerate the entire sample space and count up the number of ways of drawing three reds, or we can apply the relationship shown above.

Independence

Two events A and B are said to be independent if $P(A \mid B) = P(A)$. In other words, knowledge about B does not change the likelihood of A. In Example II, A and B are therefore said to be dependent. By virtue of the formula shown above, if A and B are independent, then $P(A \cap B) = P(A) P(B)$.

Important: mutually exclusive and independent are not synonymous. On the contrary, they are nearly antithetic. Why?

In research settings, independence is not something we can verify a priori using conditional probabilities, although we can assess dependence within the gathered sample. Rather, independence is often conceptual. For example, do you believe that the following variables are
independent?

1. Political party affiliation of a husband and wife.
2. Political party affiliation of next-door neighbors.
3. Political party affiliation of randomly selected individuals from the state of Utah.
4. Political party affiliation of someone living in Logan, UT and another living in Denver, CO.

Example IK

A wildlife researcher captures, tags, and releases 10 of 56 members of one species in region 1, and then captures, tags, and releases 28 of 75 members of the same species in region 2. When the scientist returns to sample from the two populations, there is a 2/3 probability that she samples from region 1 and a 1/3 probability that she samples from region 2. What is the probability that she samples a tagged animal from region 1? What is the probability that she samples a tagged animal? Given that she sampled an untagged animal, what is the probability that it came from region 2? Are an animal's tag status and region independent?

Bayes' Rule

The solution to the third question in Example IK is an example of Bayes' Rule:

$$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid A') P(A')}.$$

This rule is useful in situations where information about $P(B \mid A)$, $P(B \mid A')$, and $P(A)$ is known, but $P(A \mid B)$ is desired. This is especially true in screening applications.

Example IL

An AIDS screen marketed for home use is advertised to be 99.7% accurate. What does that mean? It actually means the following: if you have AIDS, this test will yield a positive result 99.7% of the time. This is called the sensitivity of the screen. The specificity gives you the proportion of true negatives, or the probability that the screen is negative if you don't have AIDS. Why is sensitivity in and of itself not all that helpful to the average consumer? What is the probability of interest for someone who orders this screen?
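Returning to Example IK and Bayes' Rule above, the first three questions can be worked through with exact fractions. This is a minimal sketch, not the instructor's solution. It assumes that, once a region is chosen, every animal in that region is equally likely to be sampled, so that P(tagged | region) is 10/56 in region 1 and 28/75 in region 2. All variable names are hypothetical.

```python
from fractions import Fraction

# P(region), as stated in Example IK
p_region = {1: Fraction(2, 3), 2: Fraction(1, 3)}

# Assumption: within a region each animal is equally likely to be sampled,
# so P(tagged | region) is the tagged share of that region's population.
p_tagged_given = {1: Fraction(10, 56), 2: Fraction(28, 75)}

# Intersection: P(tagged and region 1) = P(region 1) * P(tagged | region 1)
p_tagged_and_r1 = p_region[1] * p_tagged_given[1]

# Total probability: P(tagged), summed over both regions
p_tagged = sum(p_region[r] * p_tagged_given[r] for r in p_region)

# Bayes' Rule: P(region 2 | untagged)
p_untagged = 1 - p_tagged
p_r2_given_untagged = p_region[2] * (1 - p_tagged_given[2]) / p_untagged

print("P(tagged and region 1) =", p_tagged_and_r1)
print("P(tagged)              =", p_tagged)
print("P(region 2 | untagged) =", p_r2_given_untagged)
```

Comparing `p_tagged_given[1]` with `p_tagged` is one way to approach the independence question, since independence would require the two to be equal.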
Example IL (cont'd)

AIDS prevalence in Utah is about 9 cases per 10,000 residents. In the US the prevalence is about 25 per 10,000. Given an AIDS screen that advertises 99.7% sensitivity and 99.1% specificity, what is the probability that a randomly selected Utahn for whom this screen is positive actually has AIDS? What is this probability for a randomly selected individual from the US population? This quantity is called the positive predictive value of a screen.

Counting

As we have seen in some of our examples, computing a probability is often a matter of counting the number of ways an event can occur and then dividing by the total elements in the sample space. For many applications, counting is a matter of understanding the structure of the problem and then applying a few basic counting rules.

Multiplication Rule

Suppose that we have an experiment that takes place in K stages, where at the ith stage there are $n_i$ possible outcomes, for $i = 1, \ldots, K$. Then there are $n_1 \times n_2 \times \cdots \times n_K$ possible experimental outcomes.

Example IM

For lunch you have 3 choices of entree, 5 sides to choose from, and 4 desserts. How many possible meals lie before you?

Permutations

Suppose that we want to start a fan club in this class for David Archuleta. We need a president, a vice president, and a treasurer. How many different ways can we select people from this class to fill these positions? The answer is an application of the multiplication rule. When we select k distinguishable (or ordered) objects from n total objects, this is referred to as a permutation. The number of such permutations is given by

$${}_nP_k = n(n-1)(n-2)\cdots(n-k+1) = \frac{n!}{(n-k)!},$$

where $n!$ is read as "n factorial," with $n! = n(n-1)(n-2)\cdots 2 \cdot 1$. By definition, $0! = 1$.
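The multiplication rule and the permutation formula above can be checked with a few lines of Python. This is only a sketch: the helper name `permutations` is hypothetical, and `class_size = 30` is an assumed value, since the notes do not state how many students are in the class.

```python
import math

# Multiplication rule (Example IM): 3 entrees x 5 sides x 4 desserts
meals = math.prod([3, 5, 4])

def permutations(n, k):
    """Number of ordered selections of k objects from n: n! / (n - k)!"""
    return math.factorial(n) // math.factorial(n - k)

class_size = 30  # hypothetical; the class size is not given in the notes

# President, vice president, and treasurer are distinct roles, so order matters
officer_slates = permutations(class_size, 3)

# Cross-check against the standard library's built-in permutation count
assert officer_slates == math.perm(class_size, 3)

print("Possible meals:", meals)
print("Possible officer slates:", officer_slates)
```

With 30 students the count is the product 30 x 29 x 28, which is the multiplication rule applied to three stages in which the pool shrinks by one person at each stage.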
Combinations

Suppose now that our David Archuleta fan club will be led by a committee of three coequal members. How many such committees can be formed from this class? One possible permutation is Bryce, Shayla, and Anissa. However, note that since the positions are coequal, this is essentially the same committee as that yielded by the permutation Shayla, Anissa, and Bryce (or any other ordering of those three). In other words, to count the number of ways of selecting k indistinguishable (or unordered) objects from n total objects, referred to as combinations, we need to divide out from the total number of permutations the number of ways that we can permute a single combination. For example, there are $3 \cdot 2 \cdot 1 = 3!$ ways of ordering the trio of Bryce, Shayla, and Anissa (or any other trio), so the number of ways we can choose k unordered objects from n is given by

$$\binom{n}{k} = \frac{{}_nP_k}{k!} = \frac{n!}{k!\,(n-k)!}.$$

This is alternatively denoted by ${}_nC_k$.

Example IN

Here's an added bonus to help you with your next trip to Vegas. Five-card draw is a game in which each player is dealt 5 face cards from a shuffled deck. Hands with combinations of cards that are relatively more rare are relatively more valuable. For example, a "pair" refers to a hand that has two cards with the same denomination and three remaining cards that are random (i.e., have no other interesting combinations). A "three-of-a-kind" has three cards of the same denomination. A "full house" is a hand containing a pair and a three-of-a-kind. What is the probability of drawing a pair? Three-of-a-kind? Full house?
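The three poker probabilities in Example IN can be counted with the combinations formula. The sketch below is one possible approach rather than the instructor's solution. It assumes a "pair" means exactly one pair with three other cards of distinct denominations, and a "three-of-a-kind" means exactly three matching cards with two other cards of distinct denominations. The variable names are hypothetical.

```python
from math import comb

total_hands = comb(52, 5)  # all unordered 5-card hands from a 52-card deck

# Exactly one pair: pick the paired denomination and 2 of its 4 suits,
# then 3 other distinct denominations with any of 4 suits each.
pair = 13 * comb(4, 2) * comb(12, 3) * 4**3

# Three-of-a-kind: pick the denomination and 3 of its 4 suits,
# then 2 other distinct denominations with any of 4 suits each.
three_of_a_kind = 13 * comb(4, 3) * comb(12, 2) * 4**2

# Full house: a three-of-a-kind in one denomination plus a pair in another.
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)

for name, count in [("pair", pair),
                    ("three-of-a-kind", three_of_a_kind),
                    ("full house", full_house)]:
    print(f"P({name}) = {count}/{total_hands} = {count / total_hands:.5f}")
```

Each count is built with the multiplication rule: choose the denominations first, then choose suits for each chosen denomination.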
