UTD - STAT 2332 - Study Guide

Topics covered in Exam 3:
CH 15: Binomial, Geometric, Poisson, and Exponential Distributions
CH 16: Law of Averages
CH 17: Expected Value and Standard Error
CH 18: Central Limit Theorem
CH 19: Sample Surveys
CH 20: Chance Errors
CH 21, 23: Confidence Interval for Population Percentage, Population Mean

Binomial Distribution: finding the probability of "success" (the outcome we're looking for).
Binomial coefficient: n! / (k! (n − k)!), which is the same as the combination formula.
Note: (n − k)! ≠ n! − k!
Binomial Assumptions/Rules:
1. Each observation falls into 1 of 2 categories: "success" → p, or "failure" → 1 − p
2. Fixed n
3. Observations are independent
4. The probability of "success" p stays constant for each trial
Specifically,
k = # of successes
n = # of observations, AKA total # of trials
p = probability of success for each trial
1 − p = probability of failure for each trial
The probability of k successes out of n trials is
P(X = k) = [n! / (k! (n − k)!)] p^k (1 − p)^(n−k)

Example of Binomial Probability: Hospital records show that of patients suffering from a certain disease, 75% die of it. What is the probability that of 6 randomly selected patients, 4 will recover?
Solution: Success → the patient recovers; Failure → the patient dies
k = number who recover = 4, n = 6, p = 0.25, 1 − p = 0.75
(a) Probability that 4 will recover: P(X = 4) = [6! / (4! 2!)] (0.25)^4 (0.75)^2 ≈ 0.0330
(b) Probability that less than 4 will recover: P(X < 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
(c) Probability that at most 4 will recover: P(X ≤ 4) = P(X = 0) + P(X = 1) + ⋯ + P(X = 4)
(d) Probability that more than 4 will recover: P(X > 4) = P(X = 5) + P(X = 6)
(e) Probability that at least 4 will recover: P(X ≥ 4) = P(X = 4) + P(X = 5) + P(X = 6)
(f) Probability that at least 1 will recover: P(X ≥ 1) = P(X = 1) + P(X = 2) + ⋯ + P(X = 6), or use the "at least one" rule:
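The hospital example can be checked numerically. A minimal Python sketch of the binomial formula above (the function name `binom_pmf` is ours, not from the notes):

```python
import math

def binom_pmf(k, n, p):
    # P(X = k) = [n! / (k! (n-k)!)] * p^k * (1-p)^(n-k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Hospital example: n = 6 patients, p = 0.25 chance a patient recovers.
n, p = 6, 0.25
part_a = binom_pmf(4, n, p)                          # (a) exactly 4 recover
part_b = sum(binom_pmf(k, n, p) for k in range(4))   # (b) fewer than 4 recover
part_f = 1 - binom_pmf(0, n, p)                      # (f) "at least one" rule
print(round(part_a, 4))  # -> 0.033
```

Summing the pmf over every k from 0 to n should give 1, which is a quick sanity check on any binomial calculation.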
1 − P(none) = 1 − P(X = 0)

binomial variable → X = k = # of successes
geometric variable → X = m = # of trials until the 1st success (the success happens on trial m)
Ex: Flip a coin until you get Tails; keep rolling a die until you get a 3
Comparing the Binomial vs. Geometric Distributions (the overlap of the Venn diagram is what they share):
Binomial Distribution:
∙ Fixed n (total # of trials)
∙ k = # of successes (k starts at 0: k = 0, 1, 2, …)
Geometric Distribution:
∙ No limit to the # of trials
∙ m = # of trials (m starts at 1: m = 1, 2, 3, …)
Both:
∙ Each observation is either a success or a failure
∙ Observations are independent
∙ The probability of success p stays constant for each trial
∙ Discrete
For the Geometric Distribution: the probability that the first success occurs on the mth trial is
P(X = m) = (1 − p)^(m−1) p
(Sometimes written P(m), but P(X = m) is more common.)
∙ The probability that at least m trials are needed to get the first success is P(X ≥ m) = (1 − p)^(m−1)
∙ The probability that more than m trials are needed to get the first success is P(X > m) = (1 − p)^m
Lack of Memory Property: information from the past (what happened before) doesn't affect the probability; the process resets itself even after consecutive failures.
Example of Geometric Probability: A cereal manufacturer puts a special prize in 1/20 of the boxes, so p = 0.05.
(a) What is the probability that you have to purchase 3 boxes to get a prize? P(X = 3) = (1 − 0.05)^(3−1) (0.05)
(b) What is the probability that you have to purchase at least 3 boxes to get a prize? P(X ≥ 3) = (1 − 0.05)^(3−1)
(c) What is the probability that you have to purchase more than 3 boxes to get a prize? P(X > 3) = (1 − 0.05)^3
(d) What is the probability of getting a prize before purchasing 3 boxes? P(X < 3) = P(X ≤ 2) = P(X = 1) + P(X = 2)
(e) Suppose you have already purchased 5 boxes and didn't get a prize. What is the probability that you have to purchase at least 3 more boxes before getting a prize?
This is a lack-of-memory problem (the previous purchases don't matter):
P(X ≥ 3) = (1 − 0.05)^(3−1)

binomial → # of successes
geometric → # of trials
Now we have Poisson distributions.
Poisson → # of rare events (represented by X) in an interval of time or space
Ex: number of typing errors per page made by a typist, number of phones exploding in a first-world country
∙ A random variable X has a Poisson distribution if the probability that k events will occur is given by
P(X = k) = (λ^k e^(−λ)) / k!,  k = 0, 1, 2, etc.
Where P(X = k) = probability that the event occurs exactly k times
λ = the average # of events per unit of time or area; each different value of λ gives a different Poisson model
Ex: The average number of lions seen on a 1-day safari is 5.
(a) What is the probability that tourists will see four lions on the next 1-day safari?
X = # of lions on the next 1-day safari, k = 4, λ = 5
P(X = 4) = (5^4 e^(−5)) / 4! ≈ 0.1755
(b) What is the probability that tourists will see less than three lions on the next 1-day safari? P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)
(c) Find the probability that tourists will see at most three lions on the next 1-day safari. P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
(d) Find the probability that tourists will see at least three lions on the next 1-day safari. P(X ≥ 3) = 1 − P(X ≤ 2), i.e., 1 − [P(X = 0) + P(X = 1) + P(X = 2)]
(e) Find the probability that tourists will see more than three lions. P(X > 3) = 1 − P(X ≤ 3)
Relation between Poisson and Binomial:
∙ When p (success) is small (≤ 0.05) and n is large enough (≥ 20), successes are rare, and Poisson(λ = np) ≈ Binomial(n, p). The Poisson distribution ends up as a close approximation of the binomial distribution.
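Both worked examples above (the cereal boxes and the lions) can be verified with a short Python sketch; the helper name `poisson_pmf` is ours:

```python
import math

p = 0.05  # cereal example: prize in 1/20 of the boxes

geo_eq3 = (1 - p)**2 * p   # (a) P(X = 3): first prize on the 3rd box
geo_ge3 = (1 - p)**2       # (b) P(X >= 3): at least 3 boxes needed
geo_gt3 = (1 - p)**3       # (c) P(X > 3): more than 3 boxes needed
geo_lt3 = p + (1 - p) * p  # (d) P(X < 3) = P(X = 1) + P(X = 2)

lam = 5  # lion example: average of 5 lions per 1-day safari
def poisson_pmf(k, lam):
    # P(X = k) = lam^k * e^(-lam) / k!
    return lam**k * math.exp(-lam) / math.factorial(k)

print(round(poisson_pmf(4, lam), 4))  # -> 0.1755
```

A useful identity to notice: P(X ≥ 3) splits exactly into P(X = 3) + P(X > 3), which the geometric formulas above satisfy.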
Poisson equation (using λ = np for the average number) ≈ Binomial equation
Average of the Binomial Dist: np
Average of the Geometric Dist: 1/p
Average of the Poisson Dist: λ

Exponential Distribution:
- Unlike discrete distributions like the binomial, geometric, and Poisson, exponential distributions are continuous.
- Usually used when talking about time and lifetimes. Ex: time between marathon runners, lifetime of a car
The probability that an exponential variable X will exceed a given value x₀ is
P(X > x₀) = e^(−x₀/µ)  or  P(X < x₀) = 1 − e^(−x₀/µ)
Where µ = the average (expected) value (µ > 0)
o Each choice of µ gives a different exponential model.
*Note that no "or equal" sign is needed because P(X = x₀) = 0 for a continuous variable.
Ex: Suppose that X = the time it takes to buy tickets at a cinema. On average, it takes a 5-minute wait until you get your tickets (µ = 5). Assume that X follows an exponential distribution.
- Find the probability that you have to wait for more than 6 minutes. P(X > 6) = e^(−6/5) ≈ 0.301
- Find the probability that you have to wait less than 7 minutes. P(X < 7) = 1 − e^(−7/5) ≈ 0.753
∙ Exponential distributions also have a lack-of-memory (memory-free) property, just like geometric distributions.
Ex: The lifetime of a functioning car follows an exponential model. Given that the car has lasted 10 years, the probability that its remaining life is at least 4 years is P(X > 14 | X > 10) = P(X > 4).

Sample size: the number of observations in a sample. We take a sample from the population.
Law of Averages: averages and proportions vary less from the "expected" value as the sample size increases; it is the statistical tendency toward a fixed proportion in the results when an experiment is repeated a large number of times.
Ex: Toss a coin 100 times → the percentage of heads (not the # of heads) gets closer to 50%.
# of heads = half the # of tosses + chance error
∙ chance error: likely to become larger as the # of tosses increases, but likely to be small when compared to the total number of tosses.
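The cinema wait-time numbers and the memory-free property from the exponential section above can be checked with a small Python sketch (µ = 5 minutes is the example's assumed average):

```python
import math

mu = 5.0  # average wait in minutes (exponential model)

p_more_6 = math.exp(-6 / mu)       # P(X > 6)
p_less_7 = 1 - math.exp(-7 / mu)   # P(X < 7)
print(round(p_more_6, 3), round(p_less_7, 3))  # -> 0.301 0.753

# Lack of memory: P(X > 14 | X > 10) = P(X > 14) / P(X > 10) = P(X > 4)
cond = math.exp(-14 / mu) / math.exp(-10 / mu)
assert abs(cond - math.exp(-4 / mu)) < 1e-12
```

The conditional-probability line is the whole memory-free idea in one step: dividing the two survival probabilities makes the first 10 years cancel out.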
∙ Increasing the sample size reduces variability = smaller margin of error.
[Histogram: as the sample size increases, the proportion of each die face (1, 2, 3, 4, 5, 6) becomes approximately equal (less varied), illustrating the Law of Averages.]

We use sample statistics to estimate the population parameters.
statistics: describe the sample VS. parameters: describe the population
x̄ = mean of the sample (sample average), called "x-bar"
µ = mean of the population (population average); in practice we work with x̄, since µ is usually unknown
Population Average: the EXPECTED VALUE of the Sample Average.
In relating the sample and the population (theoretically),
▪ Pop. Avg ≈ Sample Avg
▪ Pop. SD ≈ Sample SD
▪ The Expected Value of the Sample Average is written EV(x̄)
▪ With chance error (because x̄ will not be exactly equal to µ):
x̄ = EV(x̄) + chance error
The chance error is measured by the Standard Error of the Sample Average, SE(x̄):
SE(x̄) = (SD of population) / √(sample size n)
Also, as the sample size n increases, SE(x̄) decreases, because larger samples → more precise estimates of the true pop. avg.

The Central Limit Theorem describes the sampling distribution.
Central Limit Theorem: when the sample size n is large (≥ 25), the sampling distribution will be approximately normal no matter what the original population distribution looks like.
∙ The sampling distribution is approx. normally distributed if n is large (≥ 25)
∙ avg of the sampling distribution = avg of the population
∙ its SD (Standard Deviation) is given by SE(x̄)
∙ Approximately (a normal bell curve):
68% of sample averages will be within 1 SE(x̄) of the pop. avg, just like 1 SD
95% within 2 SE(x̄), just like 2 SD
99.7% (nearly all sample averages) within 3 SE(x̄), just like 3 SD
Quick Review:
Population → the entire group
Sample → a subset of the population, chosen by randomly selecting from the population
Parameter → describes the population;
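The 68% claim above can be checked by simulation. A sketch, assuming a deliberately skewed (exponential) population so the Central Limit Theorem has to do real work:

```python
import random
import statistics

random.seed(1)
# Skewed population: the CLT says sample averages are still ~normal.
population = [random.expovariate(1.0) for _ in range(50_000)]
pop_mean = statistics.mean(population)
se = statistics.pstdev(population) / 25**0.5  # SE(x-bar) = pop SD / sqrt(n), n = 25

# Draw many samples of size 25 and record each sample average.
sample_means = [statistics.mean(random.sample(population, 25)) for _ in range(2_000)]
share = sum(abs(m - pop_mean) <= se for m in sample_means) / 2_000
# share should come out close to 0.68
```

Even though every individual value comes from a lopsided distribution, roughly 68% of the 2,000 sample averages land within one SE of the population average, as the bell-curve rule predicts.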
the population parameter is an unknown value
Statistic → describes the sample; we can calculate it from the sample
Use the statistic to estimate the parameter, as long as the sample represents the population.
∙ Planned introduction of chance: the best method of choosing a sample
Sources of bias and error:
1. Selection bias: a poor sampling plan
2. Nonresponse bias: people can't or won't respond; low response rates. Ex: people at work might miss a survey, people don't answer the phone, etc.
3. Interviewer/Questionnaire bias: leading questions and surveys commissioned by special-interest groups
4. Response errors: people lie or give different answers to different interviewers
5. Chance (sampling errors): errors caused by the fact that we are taking a sample rather than looking at the entire population (e.g., undercoverage). Control chance error by controlling the sample size.
Non-probability samples:
1. Convenience sample: made up of people who are easy to reach. Ex: Facebook polls
2. Quota sample: a sample "hand-picked" to resemble the population
Sampling Techniques for Probability Samples:
1. Simple Random Sample (SRS): a subset of individuals (a sample) chosen at random from a larger set (a population). Ex: choose 25 names of employees out of a hat of 100 names.
2. Stratified Random Sample: the researcher divides the entire population into different subgroups or strata (e.g., males vs. females) and samples randomly within each stratum.
Steps to calculate chances for sample averages:
1. Convert the data value to standard units (z-score): z = (value − EV(x̄)) / SE(x̄), where EV(x̄) = the population avg
2. Draw a picture of the desired area under the normal curve.
3. Look up the z-score in the Normal Probability Table and find the area (%).
SE(x̄) measures how close the sample avg is likely to be to the true population avg.
● What's the problem here? SE(x̄) depends on the population SD, and we don't know the population SD.
● How could you approximate the population SD?
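Step 3's table lookup can also be done with the error function. A minimal sketch of the three steps (the EV and SE numbers here are made up for illustration):

```python
import math

def normal_area_below(z):
    # Area under the standard normal curve to the left of z
    # (what the Normal Probability Table gives you).
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical example: EV(x-bar) = 100, SE(x-bar) = 2.
# Chance that the sample average is below 103:
z = (103 - 100) / 2          # step 1: convert to standard units -> z = 1.5
area = normal_area_below(z)  # step 3: area below z, about 0.9332
```

Step 2 (drawing the picture) is still worth doing by hand: it tells you whether you want the area below z, above z, or between two z-values before you compute anything.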
̂ (called a "hat") means an estimate or predicted value:
ŜE(x̄) = (SD of sample) / √n
Confidence Interval (CI): a range of values that catches the average/mean of the population:
CI = sample average ± margin of error
∙ Interpretation of a __% Confidence Interval: __% of all samples will give an interval that captures the true mean of the population; the true population avg. is within the Confidence Interval.
∙ The higher the confidence %, the wider the CI (a 99% CI has a wider range than a 90% CI).
Percents/proportions are a special case of averages where all numbers in the original population are either 0 or 1.
∙ For percents, when we use a box model to simulate drawing from a population, the box must be a 0-1 box.
o pop. avg (the avg of the box) = pop. proportion p
o sample avg (the avg of the draws) = sample proportion p̂
o pop. SD is the SD of the box, which is given by SD of population = √(p × (1 − p))
∙ The sample % is approx. normal if n is large enough (Central Limit Theorem).
Finding chances about the sample %:
z = (value of sample % − EV(sample %)) / SE(sample %)
Where EV(sample %) = p × 100%
SE(sample %) = √(p × (1 − p) / n) × 100%
For confidence intervals, use p̂ instead of p in the SE formula.
-end-
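Putting the last two formulas together, a 95% confidence interval for a population percentage can be sketched as follows (the survey numbers are hypothetical):

```python
import math

n = 1000         # hypothetical sample size
p_hat = 620 / n  # hypothetical sample proportion answering "yes"

# SE(sample %) with p-hat in place of the unknown p, in percentage points
se_pct = math.sqrt(p_hat * (1 - p_hat) / n) * 100

# 95% CI: sample % +/- 2 SE (the 95% rule from the normal curve)
lower = p_hat * 100 - 2 * se_pct
upper = p_hat * 100 + 2 * se_pct
print(round(lower, 1), round(upper, 1))  # -> 58.9 65.1
```

Note the design choice: p̂ stands in for p inside the SE formula because the true population proportion is exactly what we don't know, which is why the notes end with that substitution rule.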
