ELEM STATISTICS [C3T1G1]
This 38 page Class Notes was uploaded by Eunice Schoen on Saturday September 26, 2015. The Class Notes belongs to MATH 220 at James Madison University taught by Staff in Fall. Since its upload, it has received 14 views. For similar materials see /class/214033/math-220-james-madison-university in Mathematics (M) at James Madison University.
Date Created: 09/26/15
Chapter 6: Probability Distributions

6.0 Probability Basics
A. Random Phenomenon/Activity: An activity with several possible outcomes, where we cannot predict with certainty which outcome will result.
B. Sample Space: The set of possible outcomes of a random activity.
C. Probability of an Outcome in a Sample Space: A measure (between 0 and 1, or between 0% and 100%) of the likelihood that the outcome will result when the activity is performed. Larger values mean greater likelihood; smaller values mean smaller likelihood.
D. Examples
• Random activity: Toss a fair coin and observe the upface.
  - Sample space: {H, T}.
  - Assuming a fair coin, P(H) = 0.5 and P(T) = 0.5.
  - If the outcomes in the sample space are equally likely, their probabilities are identical: each equals 1 divided by the number of outcomes.
• Random activity: Toss a fair die and observe the number of dots on the upface.
  - Sample space: {1, 2, 3, 4, 5, 6}.
  - Assuming a fair die, P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.
• Random activity: Drop a thumbtack and observe whether the point lands up or down.
  - Sample space: {up, down}.
  - The probabilities cannot be obtained by an equally-likely argument; we must drop the thumbtack a large number of times. Then P(up) = fraction of times the point lands up, and P(down) = fraction of times it lands down.
  - Here the probabilities are obtained by observation: repeating the random activity a large number of times.
• Random activity: Select a JMU student at random and determine whether the student is a vegetarian (V) or not (NV).
  - Sample space: {V, NV}.
  - The outcomes V and NV are not equally likely. P(V) = fraction of vegetarians in the population (unknown); P(NV) = fraction of non-vegetarians in the population.
  - When selecting a simple random sample of size n = 1 from a population, the probabilities of the possible outcomes are their fractions in the population.

6.1 How Can We Summarize Possible Outcomes and Their Probabilities?
A. Random Variables
• (page 246) A random variable is a numerical measurement of the outcome of a random phenomenon. Often the randomness results from the use of random sampling or a randomized experiment to gather data.
• Examples:
  - Number of heads in two tosses of a coin
  - Number of vegetarians in a random sample of 10 JMU students
  - Height of a
    randomly selected JMU student
• Notation: Capital letters such as $X$ or $Y$ are used to denote random variables. Thus $X$ = number of heads in two tosses of a coin. Lowercase letters are used to denote particular values of $X$; for example, $x = 2$ heads is one particular value of $X$.
• Discrete versus Continuous Random Variables: A discrete random variable has possible values that are separated, such as 0, 1, 2, .... A continuous random variable has possible values that form an interval rather than a set of separate values.
B. Probability Distribution of a Discrete Random Variable
• A probability distribution for a discrete random variable is a table, graph, or formula that assigns a probability $P(x)$ to each possible value $x$.
  - Each $P(x)$ is between 0 and 1.
  - The sum of $P(x)$ over all possible values $x$ equals 1.
• Example
C. The Mean and Standard Deviation of a Discrete Probability Distribution
• (page 250) The mean of a probability distribution for a discrete random variable is denoted by $\mu$ and is
    $\mu = \sum x\,P(x)$    (6.1)
  where the sum is over all possible values of $x$.
• Examples
• The mean of the probability distribution is also called the expected value of $X$. The mean $\mu$ is the mean of a large number of observations of the random variable $X$; thus it is referred to as a population mean. The mean $\bar{x}$ would be the mean of a small number (a sample) of observations of $X$; thus $\bar{x}$ is called a sample mean.
• The standard deviation of a discrete random variable, denoted by $\sigma$, is
    $\sigma = \sqrt{\sum (x - \mu)^2 P(x)}$    (6.2)
• Example 5
• The standard deviation $\sigma$ is the standard deviation of a large number of observations of $X$; thus it is referred to as a population standard deviation. The standard deviation $s = \sqrt{\sum (x - \bar{x})^2 / (n - 1)}$ from Chapter 2 would be the standard deviation of a small number (a sample) of observations of $X$; thus it is called a sample standard deviation.
D. Probability Distributions for Continuous Random Variables
• (page 253) A continuous random variable has possible values that form an interval. Its probability distribution is
  specified by a curve that determines the probability that the random variable falls in any particular interval of values.
  - Each interval has probability between 0 and 1: the area under the curve above that interval.
  - The interval containing all possible values has probability 1, so the total area under the curve equals 1.
• Example 6

6.2 How Can We Find Probabilities for Bell-Shaped Distributions?
A. The Normal Probability Distribution
• Symmetric and bell-shaped, characterized by its mean $\mu$ and standard deviation $\sigma$.
• The probability within any particular number of standard deviations of $\mu$ is the same for all normal distributions: 0.68 within 1 standard deviation, 0.95 within 2 standard deviations, and 0.997 within 3 standard deviations.
• Standard Normal Probability Distribution: the normal probability distribution of z-scores, where $z = (x - \mu)/\sigma$.
B. Finding probabilities about z-scores: Table A, pages A1-A2 in the appendix.
C. Finding probabilities about more general normal random variables $X$
• Examples

6.3 How Can We Find Probabilities When Each Observation Has Two Possible Outcomes?
A. The Binomial Distribution: Probabilities for Counts with Binary Data
• Binary observation: an observation that takes on one of two possible outcomes. Random phenomenon: observing $n$ cases, or trials, of binary observations. Summary: the number or proportion of the $n$ binary cases or trials with the outcome of interest. Under certain CONDITIONS, the number with the outcome of interest has the binomial distribution.
• Binomial Conditions:
  - The random phenomenon has $n$ trials, each with two possible outcomes. The outcome of interest is called a success, and the other outcome is called a failure.
  - Each trial has the same probability of success, denoted by $p$. The probability of failure is denoted by $1 - p$.
  - The $n$ trials are independent; that is, the result for one trial does not depend on the results of other trials.
• Binomial Random Variable: $X$ = number of successes in the $n$ trials when the binomial conditions hold.
• Example: Binomial conditions, Binomial Random Variable
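The definition of the binomial random variable can be made concrete with a minimal Python sketch, assuming the three binomial conditions above hold. It builds the probability distribution of $X$ from the basics: list every success/failure sequence of the $n$ trials, multiply the per-trial probabilities (justified by independence), and add up the sequences that share the same number of successes. The function name `binomial_distribution_by_enumeration` and the coin-toss example are illustrative choices, not part of the course notes.

```python
from collections import defaultdict
from itertools import product


def binomial_distribution_by_enumeration(n, p):
    """Return {x: P(x)} for X = number of successes in n trials.

    Works "from the basics": list every success/failure sequence,
    multiply the per-trial probabilities (allowed because the trials
    are independent), then add up the sequences with the same count.
    """
    dist = defaultdict(float)
    for outcome in product("SF", repeat=n):
        prob = 1.0
        for trial in outcome:
            prob *= p if trial == "S" else 1 - p
        dist[outcome.count("S")] += prob
    return dict(sorted(dist.items()))


# Hypothetical example: three tosses of a fair coin, success = heads.
dist = binomial_distribution_by_enumeration(3, 0.5)
for x, px in dist.items():
    print(f"P(X = {x}) = {px}")
print("sum:", sum(dist.values()))
```

For three fair-coin tosses this prints P(X = 0) = 0.125, P(X = 1) = 0.375, P(X = 2) = 0.375, P(X = 3) = 0.125, and a sum of 1.0, as required of any probability distribution. Because there are $2^n$ sequences, enumeration is only practical for small $n$; the binomial formula in part C gives the same probabilities without listing them.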
B. Finding Probabilities for a Binomial Random Variable Using the Basics
C. Probabilities for a Binomial Distribution: Formula
• Review: $n! = 1 \times 2 \times 3 \times \cdots \times n$
• $P(x) = \dfrac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n$
• Example
D. Do the Binomial Conditions Apply?
• Scenario: a random sample of $n$ binary trials from a population of size $N$ with proportion $p$ of successes in the population.
• The trials are not exactly independent.
• If $n < 0.10N$, the trials are approximately independent, and the binomial distribution provides a good approximation to the correct distribution of $X$ = number of successes in the $n$ trials.
E. Mean and Standard Deviation of the Binomial Distribution
• $\mu = \sum x\,P(x) = np$
• $\sigma = \sqrt{\sum (x - \mu)^2 P(x)} = \sqrt{np(1 - p)}$
• Example
F. Approximation with Normal Distributions
• When $n$ is large enough that $np \ge 15$ and $n(1 - p) \ge 15$, the binomial distribution has a bell shape, and binomial probabilities can be approximated by areas under a normal curve with $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$.

6.4 How Likely are the Possible Values of a Statistic? The Sampling Distribution
A. RECALL
• Statistic: a numerical summary of a sample.
• Parameter: a numerical summary of a population.
B. Sampling Distribution
• The sampling distribution of a statistic is the probability distribution that specifies probabilities for the possible values the statistic can take on.
• Example
C. Sampling Distribution of a Sample Proportion of Successes: $\hat{p} = X/n$, where $X$ is a binomial random variable
• Example
• Mean and standard deviation of the sample proportion of successes: mean $= p$; standard deviation $= \sqrt{p(1 - p)/n}$.
• The sampling distribution of the sample proportion is approximately normal if $np \ge 15$ and $n(1 - p) \ge 15$.
• Example
D. Standard Error
• The standard deviation of a sampling distribution is called a standard error.

6.5 How Close are Sample Means to Population Means?
A. The Sampling Distribution of the Sample Mean
• The sample mean $\bar{x}$ as a random variable
• Sampling distribution of the sample mean $\bar{x}$
  - Example when the population sampled is small
B. Mean and Standard Deviation of the Sampling Distribution of $\bar{x}$
• Suppose random sampling from a population with mean $\mu$ and standard deviation $\sigma$.
• Mean of the
  sampling distribution of $\bar{x}$ is equal to the population mean $\mu$.
• Standard deviation of the sampling distribution of $\bar{x}$ is $\sigma/\sqrt{n}$ (assumes the sample size is small relative to the population size).
C. Getting Probabilities about $\bar{x}$
• Normal populations: If the population is normally distributed, then the sampling distribution of $\bar{x}$ is normal regardless of the size of $n$.
• Non-normal populations (Central Limit Theorem): Even if the population is not normal, the sampling distribution of $\bar{x}$ is approximately normal for large $n$. Rule of thumb: $n \ge 30$.
• Example 6.58, page 297

6.6 How Can We Make Inferences about a Population?
A. Three Types of Distributions
• Population Distribution (see page 298)
• Data Distribution (see page 298)
• Sampling Distribution (see page 298)

Chapter 1: The Art and Science of Learning from Data

1.1 How Can You Investigate Using Data?
A. What is Statistics?
• Statistics (plural): numbers, graphs, etc. that summarize some phenomenon. Examples: the batting average of a baseball player, the crime rate in a city, the average GPA of JMU first-year students.
• Statistics (singular) (page 5 in textbook): The art and science of designing studies and analyzing the data that those studies produce. Its ultimate goal is translating data into knowledge and understanding of the world around us. In short, statistics is the art and science of learning from data.
• Studies: Surveys, Experiments
  - Example of a survey: a survey of 500 JMU undergraduates to obtain information on the amount of money spent on textbooks this semester. Goal: estimate the average amount of money spent by all JMU undergrads in the fall '07 semester.
  - Example of an experiment:
• Data: information gathered from studies. The textbook spending survey data contain the amounts of money spent by all 500 students.
B. Three Main Aspects of Statistics (areas of study in this course)
• Design refers to planning how to obtain data (information), such as from an experiment or a survey. Planning issues include how many people/objects to survey or experiment on, the type of random sampling or randomization, etc.
• Description: exploring and summarizing patterns in the data through numerical and graphical summaries.
• Inference: making decisions or predictions based on the data; usually these decisions or predictions refer to a larger group, not just those in the study.
• Probability: another area of study in this course, used to measure the reliability of inferences.

1.2 We Learn about Populations and Samples
A. Terminology
• Subjects: the entities we measure in a study (they don't have to be people).
• Usually we are interested in measurements for some larger group of subjects, called a population. The subjects actually measured are then called a sample of the population.
  - Example (JMU textbook study): The subjects are the 500 students from whom we obtain the amounts. The population is the group of all JMU undergraduates.
• Descriptive Statistics refers to methods for summarizing the data. The summaries usually consist of graphs and numbers such as averages and percentages.
  - Example (JMU textbook survey): the average amount obtained from the 500 students.
• Inferential Statistics refers to methods of making decisions or predictions about a population based on data obtained from a sample of that population.
  - Example: margin of error, confidence interval (Chapter 7).
• Parameter: a numerical summary of a population.
• Statistic: a numerical summary of a sample taken from the population.
• Random Sampling: sampling from a population in a random fashion; crucial for valid statistical inferences about populations.
Note: In experiments, subjects are usually not se...

Chapter 2: Exploring Data with Graphs and Numerical Summaries

2.1 What are the Types of Data?
A. Variable: any characteristic that is recorded for subjects in a study.
• Examples: hair color of a person, height of a person, crime rate of a city.
B. Categorical versus Quantitative Variables
• A variable is categorical if each observation belongs to one of a set of categories.
• A variable is quantitative if observations take on numerical values that represent different magnitudes of the variable.
C. Discrete versus Continuous
   Quantitative Variables
• A discrete quantitative variable is one whose possible values form a set of separate numbers, such as 0, 1, 2, 3, ....
• A continuous quantitative variable is one whose possible values form an infinite continuum in some interval.
D. Frequency Table: a listing of the possible values for a variable together with the number of observations of each value.

  siblings
                 Frequency  Percent  Valid Percent  Cumulative Percent
  Valid    0             2      6.3            6.7                 6.7
           1            10     31.3           33.3                40.0
           2             8     25.0           26.7                66.7
           3             9     28.1           30.0                96.7
           5             1      3.1            3.3               100.0
           Total        30     93.8          100.0
  Missing  System        2      6.3
  Total                 32    100.0

2.2 How Can We Describe Data Using Graphical Summaries?
A. Graphs for Categorical Variables
• Pie Chart
• Bar Graph
  [Figure: bar graph (vertical axis: Count)]
B. Graphs for Quantitative Variables
• Dot Plot
• Stem-and-Leaf Plot
  [Figure: stem-and-leaf plot of fastSped (columns: Frequency, Stem & Leaf; stem width 10, each leaf = 1 case)]
• Histogram
  [Figure: histogram of numcals (vertical axis: Count)]
C. Which graph for a quantitative variable should be used?
D. Describing the Distribution of a Quantitative Variable
• A graph of a quantitative variable shows the distribution of the variable.
• Overall pattern of the distribution: Do the values cluster together, or are there outliers?
  [Figure: histogram of homemiles (vertical axis: Count)]
• Does the distribution have a single mound (unimodal) or two mounds (bimodal)?
• Shape of the distribution: symmetric, skewed to the left, or skewed to the right.
  [Figure: histogram (vertical axis: Frequency), N = 81]

2.3 How Can We Describe the Center of Quantitative Data?
A. Two Kinds of Quantitative Summaries
• Center refers to a representative value from the data set.
• Spread refers to the degree of variation in the data set.
B. Measures of Center
• Mean: the sum of the values in the data set divided by the total number of values.
  Formula: If we let $x$ refer to any arbitrary value in the data set
  and $n$ represents the number of values in the data set, then the mean, denoted by $\bar{x}$, is
    $\bar{x} = \dfrac{\sum x}{n}$
  - Example (Number of Calories Consumed): $\bar{x} = \dfrac{\sum x}{n} = \dfrac{54700}{29} \approx 1886.2$
  - Example (Miles Hometown from Harrisonburg): $\bar{x} = 344.4$
  Basic Properties of the Mean
  - The mean is the balance point of the data.
  - The mean is pulled in the direction of the skew for skewed distributions.
  - The mean is highly influenced by outliers. Outlier (page 50): a value that falls well above or below the bulk of the data.
• Median: Let $n$ be the number of values in the data set. Put the data set in increasing order. The median is the middle value if $n$ is odd, and the mean of the two middle values if $n$ is even.
  - Example (Number of Calories Consumed): $n = 29$, median = 1800.0
  - Example (Miles Hometown from Harrisonburg): $n = 28$, median = 153.5
  - The median is resistant to outliers.
• Mode: the value that occurs most often. Most often used with categorical data. For quantitative variables, it is most useful with discrete variables taking only a few values.
  - Example (Number of Siblings): mode = 1 sibling
C. Comparing the Mean and Median
• The mean and median are about the same for roughly symmetric distributions.
• The mean is larger than the median for right-skewed distributions.
• The mean is smaller than the median for left-skewed distributions.

2.4 How Can We Describe the Spread of Quantitative Data?
A. Measuring the Spread: Range of the Data
B. Measuring the Spread: Standard Deviation of the Data
• The standard deviation gives the typical distance of an observation from the mean.
• It is based on the deviations $x - \bar{x}$ from the mean.
• Example: Suppose that the total numbers of runs scored by both baseball teams in $n = 5$ games are 1, 8, 6, 3, 2. The mean is $\bar{x} = 20/5 = 4$.

        Value    Deviation    Squared Deviation
          1      1 - 4 = -3                   9
          8      8 - 4 =  4                  16
          6      6 - 4 =  2                   4
          3      3 - 4 = -1                   1
          2      2 - 4 = -2                   4

  The sum of squared deviations is $\sum (x - \bar{x})^2 = 9 + 16 + 4 + 1 + 4 = 34$.
  The standard deviation is $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{34}{4}} = \sqrt{8.5} \approx 2.92$.
  This means that for these games, the numbers of runs scored typically deviate from the mean of 4 runs by about 3 runs.
• Properties of the Standard Deviation
  - The larger the spread of the data, the larger is $s$.
  - $s \ge 0$, with $s = 0$ if all data values are the same.
  - $s$ can be influenced by outliers, and thus it is not a resistant measure.
• Example: Estimation of Domangue's Height
  [Figure: histogram of estimated heights (vertical axis: Frequency), N = 29]
C. Empirical Rule: Applying the Standard Deviation
• Empirical Rule: For bell-shaped distributions, approximately
  - 68% of the observations fall within 1 standard deviation of the mean, that is, between $\bar{x} - s$ and $\bar{x} + s$;
  - 95% of the observations fall within 2 standard deviations of the mean, that is, between $\bar{x} - 2s$ and $\bar{x} + 2s$;
  - all or almost all of the observations fall within 3 standard deviations of the mean, that is, between $\bar{x} - 3s$ and $\bar{x} + 3s$.
• Example (Dad's Ages Data): 41 42 45 46 46 46 47 47 48 49 49 50 50 51 51 51 52 52 52 52 52 53 53 57 58 58 58 59 60 62
  [Figure: histogram of dadsAge (vertical axis: Frequency), Mean = 51.23, N = 30]
• Caution (page 63): The Empirical Rule may not work well if the distribution is highly skewed, or highly discrete with the variable taking relatively few values.

2.5 How Can Measures of Position Describe Spread?
A. Measures of Position for Describing Spread: these measure spread by giving the relative position of an observation in the data set.
• Two types of measures of position:
  - Percentiles
  - z-scores
• Percentiles as Measures of Position
  - The $p$th percentile of a data set is a value such that $p$ percent of the observations fall below or at that value.
  - Example: Suppose that your height is at the 85th percentile in the distribution of heights of students at JMU.
  - The median is the 50th percentile.
• Quartiles: three useful percentiles, usually used as measures of spread when the median is used as the measure of center.
  - 1st quartile: $Q_1$ = 25th percentile
  - 2nd quartile: $Q_2$ = 50th percentile = median
  - 3rd quartile: $Q_3$ = 75th percentile
• Finding Quartiles (page 65)
  Example (Class Data: Number of Miles Hometown from Harrisonburg):
  11 60 60 63 105 108 110 120 120 120 120 125 150 150 157 180 180 200 205 230 230 240 280 320 400 600 2000 3000
  $n = 28$, $Q_2$ = median $= (150 + 157)/2 = 153.5$, $Q_1 = 115$, $Q_3 = 235$
• Interquartile Range: $IQR = Q_3 - Q_1$, used as a measure of spread.
  Miles data: $IQR$
    $= 235 - 115 = 120$ miles
• Using quartiles to identify potential outliers:
  - Low side: $x < Q_1 - 1.5 \times IQR$
  - High side: $x > Q_3 + 1.5 \times IQR$
  Miles data: potential outliers are values with $x < 115 - 180 = -65$ (low side) or $x > 235 + 180 = 415$ (high side), so 600, 2000, and 3000 are potential outliers.
• Five-Number Summary: Min, $Q_1$, Median, $Q_3$, Max. (Min, $Q_1$, $Q_3$, and Max describe spread; the median describes center.)
  - Hometown from Harrisonburg data: 11, 115, 153.5, 235, 3000
• Boxplot: a graphical depiction of the five-number summary.
  [Figure: boxplots of homemiles and dadsAge]
• Side-by-Side Boxplots
  [Figure: side-by-side boxplots of fastSped by gender (female, male)]
D. Z-Scores as Measures of Spread
• z-score (page 71): The z-score for an observation is the number of standard deviations that it falls from the mean. For sample data, the z-score is calculated as
    $z = \dfrac{\text{observation} - \text{mean}}{\text{standard deviation}}$
• Example (Dad's Ages Data): 41 42 45 46 46 46 47 47 48 49 49 50 50 51 51 51 52 52 52 52 52 53 53 57 58 58 58 59 60 62
  $\bar{x} = 51.2$ years, $s = 5.3$ years
  - Age 58: $z = (58 - 51.2)/5.3 \approx 1.28$
  - Age 47: $z = (47 - 51.2)/5.3 \approx -0.79$
• Mean and Standard Deviation of the z Distribution: the mean of the z-scores equals 0, and the standard deviation of the z-scores equals 1.
• Empirical Rule for z-scores: If the $x$ distribution is bell-shaped, then the $z$ distribution is also bell-shaped, and
  - about 68% of z-scores are between -1 and 1;
  - about 95% of z-scores are between -2 and 2;
  - about 100% of z-scores are between -3 and 3.
• Example (Source: Peck, Olsen, Devore, page 186): Suppose that your statistics professor returned your first midterm exam with only a z-score written on it. She also told you that a histogram of the scores was symmetric and bell-shaped. How would you interpret each of the following z-scores? (a) 2.2 (b) 0.4 (c) 1.8 (d) 1.0 (e) 0

SPSS for Chapter 2
• Frequency Table for a Categorical or Quantitative Discrete Variable; Bar Chart or Pie Chart for Categorical Data
  Analyze > Descriptive Statistics > Frequencies. Move your variable into the VARIABLES box. If you want a bar chart or pie chart, click on CHARTS, select BAR CHART or PIE CHART, and click CONTINUE. Click on OK.
• Stem-and-Leaf, Histogram, Boxplot, or Numerical Summaries of Quantitative
  Variable
  Analyze > Descriptive Statistics > Explore. Move your variable into the DEPENDENT LIST box. If you want percentiles, click on STATISTICS, check PERCENTILES, and click CONTINUE. If you want a histogram, click on PLOTS and check HISTOGRAM. Click on OK.
• Histogram for Quantitative Data (another way)
  Graphs > Legacy Dialogs > Histogram. Move your variable into the VARIABLE box. Click on OK.
• Bar Chart for Categorical Data (another way)
  Graphs > Legacy Dialogs > Bar. Click on SIMPLE and DEFINE. Move your variable into the CATEGORY AXIS box. Click on OK.
• Dot Plot for Quantitative Data
  Graphs > Legacy Dialogs > Scatter/Dot

Chapter 7: Statistical Inference: Confidence Intervals

7.1 What are Point and Interval Estimates of Population Parameters?
A. Point Estimate of a Population Parameter
• Definition: a single number that is our best guess for the unknown population parameter.
• Example: When sampling from a categorical S/F population, the sample proportion of S's is a point estimate of the population proportion $p$.
• Example: When sampling from a quantitative population, the sample mean is a point estimate of the population mean.
• Properties of Good Point Estimators
  - The estimator is unbiased: the mean of its sampling distribution equals the parameter being estimated.
  - The estimator has a small standard error compared to other estimators.
B. Confidence Interval Estimate of a Population Parameter
• Definition: an interval of numbers within which the parameter is believed to fall with a certain amount of confidence.
• Margin of Error: a multiple of the standard error of the point estimate, used to indicate how accurate the point estimate is likely to be in estimating the parameter.
• Left endpoint = point estimate - margin of error; right endpoint = point estimate + margin of error.
• Confidence Level of an Interval Estimate: the probability that the method produces an interval that contains the parameter. Usually chosen close to 1, such as 0.95 or 0.90.

7.2 How Can We Construct a Confidence Interval to Estimate a Population Proportion?
A. Finding the 95% Confidence Interval for a
   Population Proportion
• The population proportion is symbolized by $p$.
• The sample proportion is symbolized by $\hat{p}$; $\hat{p}$ is the point estimate of $p$.
• A large-sample 95% confidence interval for the population proportion is
    $\hat{p} \pm 1.96(se)$, where $se = \sqrt{\hat{p}(1 - \hat{p})/n}$
  is the estimated standard deviation of the sampling distribution of $\hat{p}$. The quantity $1.96(se)$ is the 95% margin of error for $\hat{p}$ (or for the confidence interval).
B. Sample Size Needed for Validity of the Confidence Interval
• $n\hat{p}$ is the number of successes in the sample.
• $n(1 - \hat{p})$ is the number of failures in the sample.
• Need BOTH $n\hat{p} \ge 15$ AND $n(1 - \hat{p}) \ge 15$ for the confidence interval to be valid.
C. Confidence Levels Other than 95%
• 99% confidence level: use $\hat{p} \pm 2.58(se)$. 0.99 is the middle area under the standard normal curve between -2.58 and 2.58, and $2.58(se)$ is the 99% margin of error for $\hat{p}$ (or for the confidence interval).
• 90% confidence level: use $\hat{p} \pm 1.645(se)$. 0.90 is the middle area under the standard normal curve between -1.645 and 1.645, and $1.645(se)$ is the 90% margin of error for $\hat{p}$ (or for the confidence interval).
D. Effects of Confidence Level and Sample Size on Margin of Error
• The margin of error for a confidence interval increases as the confidence level increases.
• The margin of error for a confidence interval decreases as the sample size increases.
E. Interpretation of the Confidence Level
• The confidence level, such as 95% or 99%, gives the success rate of the confidence interval formula. That is, if the formula is used for many different samples, then in the long run 95% (or 99%) of the resulting intervals would succeed in containing the population proportion $p$.

7.3 How Can We Construct a Confidence Interval to Estimate a Population Mean?
1. The t distribution
• Suppose we are random sampling from a normal population with mean $\mu$ and standard deviation $\sigma$.
• Let $n$ be the sample size, $\bar{x}$ the sample mean, and $s$ the sample standard deviation.
• The ratio $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ has a probability distribution called the t distribution, with $df = n - 1$ degrees of freedom.
• Properties of t distributions (page 336)
2. Finding percentiles from a t distribution: Table B
3. Example: A study of the ability of
   individuals to walk in a straight line. An article reported the following data on cadence (strides per second) for a sample of $n = 20$ randomly selected men:
   0.95 0.85 0.92 0.95 0.93 0.86 1.00 0.92 0.85 0.81 0.78 0.93 0.93 1.05 0.93 1.06 1.06 0.96 0.81 0.96
   Calculate a 99% confidence interval for the population mean cadence.
• From SPSS: $\bar{x} = 0.926$, $s = 0.0809$. A stem-and-leaf display of the data shows an approximate bell shape.
• From Table B, the 0.995 percentile of the t distribution with $df = 20 - 1 = 19$ is 2.861.
• Endpoints of the confidence interval: $\bar{x} \pm 2.861\,(s/\sqrt{20})$. After calculation, the endpoints are $0.926 \pm 0.052$.
• The 99% confidence interval for the population mean cadence is (0.874, 0.978).
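The hand calculation above can be cross-checked with a short Python sketch that uses only the standard library (`statistics.mean`, `statistics.stdev`, `math.sqrt`). The helper name `t_interval` is hypothetical, and the critical value 2.861 is taken from Table B rather than computed, since the Python standard library has no t-distribution quantile function.

```python
import math
import statistics

# Cadence data (strides per second) for the n = 20 men in the example.
cadence = [0.95, 0.85, 0.92, 0.95, 0.93, 0.86, 1.00, 0.92, 0.85, 0.81,
           0.78, 0.93, 0.93, 1.05, 0.93, 1.06, 1.06, 0.96, 0.81, 0.96]


def t_interval(data, t_star):
    """Return (lower, upper) for x-bar +/- t_star * s / sqrt(n).

    t_star must be looked up for df = n - 1 (here, from Table B).
    """
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)          # sample SD, divisor n - 1
    margin = t_star * s / math.sqrt(n)  # t* times the standard error
    return xbar - margin, xbar + margin


# 0.995 percentile of the t distribution with df = 19, read from Table B.
lo, hi = t_interval(cadence, 2.861)
print(f"99% CI: ({lo:.3f}, {hi:.3f})")
```

This should print `99% CI: (0.874, 0.977)`. The upper endpoint differs slightly from 0.978 because the code carries the unrounded sample mean (0.9255) instead of 0.926; the difference is only rounding.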