This 21-page set of class notes was uploaded by Aileen Davis on Tuesday, October 20, 2015. The notes belong to MATH0013 (Elementary Statistics) at Sierra College, taught by Staff in Fall.
Date Created: 10/20/15
Abe Mirza
Topics Review Part I

"To understand God's thoughts we must study statistics, for these are the measure of His purpose." (Florence Nightingale)

Topics
General Introduction
Qualitative Data
Descriptive Statistics
Grouped Data (Frequency Table)
Histogram
Correlation and Regression
Steps to Do a Correlation and Regression Problem
Different Shapes of a Scatter Diagram
Basic Probability (Tree Diagram)
Multiplication Rule

Topics Review Part 1, 1/30/2009

Statistics: General Introduction

The Purpose of Statistics
Statistics has many uses, but perhaps its most important purpose is to help us make decisions about issues that involve uncertainty.

Definition of Statistics
1. Numerical facts, for example:
   a. The average price of a one-bedroom apartment in the city of Rocklin is 895 dollars.
   b. 80% of Sierra students graduate in 2 years.
2. C-O-D-A: the Collection, Organization, Description, and Analysis and interpretation of data.
   Collection: data sampling.
   Organization: frequency table, bar chart, pie chart, histogram, frequency polygon, ogive curve.
   Description: mean, mode, median, range, variance, standard deviation (SD), quartiles, percentiles, box plot.
   Analysis: correlation and regression, estimation, test of hypothesis, analysis of variance.

Types of Statistics
Descriptive: collection, organization, and description of data.
Inferential: analysis and interpretation of data.

What Is Statistics All About?
It is about how we test whether a new drug is effective in treating cancer.
It is about opinion polls, pre-election polls, and exit polls.
It is about sports, where we rank players and teams primarily through their statistics.
It is about market research and the effectiveness of advertising.
It is about how agricultural inspectors ensure the safety of the food supply.

Population versus Sample
Population: the entire set of elements or subjects under study that share one or more common characteristics, such as age, gender, major, or race. Keyword: all. Examples: all college students; all Sierra College students; all male Sierra College students who are taking statistics and
majoring in business. A population is defined by two elements: time and place.
Sample: a portion of the population.
Census: the collection of data from every element in a population.

Parameter vs. Statistic
A parameter is a numerical measurement describing some characteristic of a population; a statistic is a numerical measurement describing some characteristic of a sample. Parameters are written with Greek letters ($\mu$ for the average, $\sigma$ for the standard deviation; $\chi^2$ is chi-squared), and statistics with lowercase English letters ($\bar{x}$, $s$, $r$).
HW: Answer questions A from page 2 of the practice problems, Part 1.

Types of Data
Qualitative: names or labels, e.g., pass/fail; Democrat/Republican/Independent; yes/no; grades A, B, C, D, F.
Quantitative:
1. Discrete (countable): number of accidents in Rocklin each day, number of emergency calls to the 911 center each day, number of students who will pass Abe's stat class.
2. Continuous (measurable): speed, weight, time, capacity, length, volume, area.
HW: Answer questions B from page 2 of the practice problems, Part 1.

Types of Sampling (RSSCC)
1. Random: every member of the population has an equal chance to be selected.
   How: every member is assigned a different number; we then select random numbers by computer or from a table and match them with the members' numbers.
2. Systematic: we select some starting point and then select every kth member of the population (such as every 20th member).
   How: every 10th customer or client is selected to be asked questions.
3. Stratified: subdivide the population into at least two different subgroups (strata) whose members share the same characteristic, such as gender or age bracket, and then draw a sample from each stratum.
   How: (a) divide the police officers in Sacramento into a male group and a female group; (b) select a random sample from each group and collect data on years in service.
4. Cluster: divide the population into sections (clusters), randomly select some of those clusters, and then choose all the members of the selected clusters.
   How: to see customer feedback on a new menu, (a) divide Sacramento into different zones; (b) randomly select some of those zones; (c) collect data from all fast-food branches in the selected zones.
5. Convenience: use results that are readily available.
   How: a math
instructor asks some of his students whether they use the student solution manual to do their homework.
HW: Answer questions C from page 2 of the practice problems, Part 1.

Qualitative Data

Example 1
Grade | f (students) | Rel. freq. = f/n × 100 | Angle = Rel. freq. × 360
A | 6 | 6/50 × 100 = 12% | 0.12 × 360 = 43.2°
B | 10 | 10/50 × 100 = 20% | 0.20 × 360 = 72°
C | 16 | 16/50 × 100 = 32% | 0.32 × 360 = 115.2°
D | 14 | 14/50 × 100 = 28% | 0.28 × 360 = 100.8°
F | 4 | 4/50 × 100 = 8% | 0.08 × 360 = 28.8°
Total | n = Σf = 50 | 100% | 360°
[Figure: bar chart and pie chart of the grade distribution]

Practice 1
Grade | f (students) | Rel. freq. = f/n × 100 | Angle = Rel. freq. × 360
A | 22 | |
B | 26 | |
C | 20 | |
D | 8 | |
F | 4 | |
Total | n = Σf = | 100% | 360°
Complete the table and draw the bar chart and the pie chart.

Descriptive Statistics

A. Measures of Central Tendency: Mean, Median, Mode
Mean: the sum of the data divided by the number of data points, written $\mu$ for a population and $\bar{x}$ for a sample:
$$\mu = \frac{\sum x}{N}, \qquad \bar{x} = \frac{\sum x}{n}$$
Example: for the data 5, 6, 3, 9, 7, $\bar{x} = \frac{30}{5} = 6$.
Median: the middle data point in ranked data (ranked largest to smallest or smallest to largest). The median cuts the ranked data in half: one half below it and one half above it.
Example 1: if the median score for the first test was 73, it simply means half the class scored below 73 and the other half above it.
How to find it:
Odd number of data: 2, 5, 11, 16, 8, 9, 3, 7, 5. Ranked: 2, 3, 5, 5, 7, 8, 9, 11, 16. Median = 7.
Even number of data: 2, 3, 5, 5, 7, 8, 9, 11, 16, 4. Ranked: 2, 3, 4, 5, 5, 7, 8, 9, 11, 16. Median = (5 + 7)/2 = 6.
Hint: if the data set contains extreme values (too large or too small relative to the rest of the data), the median is a better measure of central tendency than the mean.
Mode: the data value(s) with the highest occurrence; a data set can be bimodal or multimodal.
2, 8, 11, 7, 8, 13: mode = 8.
3, 12, 5, 14, 9, 12, 7, 16, 7: bimodal, 7 and 12.
11, 15, 7, 2, 6, 16, 15, 3, 2, 11, 19, 5, 4: multimodal, 2, 11, and 15.
HW: Answer the questions in columns A–G of the practice problems, Part 1. All the answers are on p. 18.

B. Measures of Variation: Range, Standard Deviation, Variance
Range: shows how far apart the data points are. Range = highest value − smallest value.
Standard deviation ($\sigma$ or $s$): measures the typical (roughly average) dispersion of the data around the mean.
Example: consider 3 random delivery times (in days) taken by 2 different
companies, A and B:
 | A | B
Mean | 5 | 5
Median | 5 | 5
Mode | 5 | none
At first it seems there is not much difference between the delivery times of these two companies, but now let's look at their actual data:
Company A delivery times (days): 5, 5, 5
Company B delivery times (days): 0, 5, 10
[Figure: dot plots of the delivery times for companies A and B on a 0 to 10 scale]
Now it seems there is no dispersion for company A but an average dispersion of 5 for company B. The formulas for the standard deviation (average dispersion) are:
$$\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}}, \qquad s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}}$$
Company A: every deviation is 0, so $s = 0$.
Company B: $s = \sqrt{\frac{(0-5)^2 + (5-5)^2 + (10-5)^2}{3-1}} = \sqrt{\frac{50}{2}} = \sqrt{25} = 5$.

Find the mean and standard deviation for 5, 6, 3, 9, 10, 3, and also draw the dot plot.
$\bar{x} = \frac{36}{6} = 6$
x | x − x̄ | (x − x̄)²
5 | −1 | 1
6 | 0 | 0
3 | −3 | 9
9 | 3 | 9
10 | 4 | 16
3 | −3 | 9
Σ(x − x̄)² = 44
$$s = \sqrt{\frac{44}{6-1}} = \sqrt{8.8} \approx 2.97, \qquad s^2 = 8.8$$
Variance ($\sigma^2$ or $s^2$) is the square of the standard deviation.
Rule of thumb to estimate s: $s \approx \frac{\text{Range}}{4}$. Generally, the larger the data set, the closer the estimate will be to the exact value.
HW: Answer the questions in columns A–G of the practice problems, Part 1. All the answers are on p. 18.

C. Measures of Position: Quartiles, Percentiles, Box Plot, Z-Score
Quartiles: the three quartiles Q1, Q2, and Q3 break the ranked data into four parts, each containing 25% of the data:
Data: 25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
How to find quartiles:
1. Rank the data.
2. Find Q2 (the median).
3. Find the new medians on either side of Q2: Q1 is the median of the lower half and Q3 is the median of the upper half.
Example 1 (odd number of data): data 2, 5, 11, 16, 8, 9, 3, 7, 5, 4, 13.
Ranked: 2, 3, 4, 5, 5, 7, 8, 9, 11, 13, 16. Q1 = 4, Q2 = 7, Q3 = 11.
Example 2 (even number of data): data 2, 3, 5, 5, 7, 8, 9, 11, 16, 4.
Ranked: 2, 3, 4, 5, 5, 7, 8, 9, 11, 16. Q2 = median = (5 + 7)/2 = 6, Q1 = 4, Q3 = 9.

Box plot: used to show how the data are distributed by showing the center, spread, and skewness. The center is Q2; the spread is how wide the box is; skewness describes the shape of the distribution.
To construct a box plot:
1. Find the Min, Q1, Q2, Q3, and Max of the data.
2. Plot these points on a scaled number line.
3. Construct a box using Q1, Q2, and Q3.
The box may be located in many different positions. If the box is located to the far left, it suggests that the distribution of the data is skewed to the right.
Diagram, skewed to the right: Min, Q1, Q2,
Q3, Max.
If the box is located in the center, it suggests that the distribution of the data is centered.
Diagram, centered: Min, Q1, Q2, Q3, Max.
If the box is located to the far right, it suggests that the distribution of the data is skewed to the left.
Diagram, skewed to the left: Min, Q1, Q2, Q3, Max.
HW: Answer the questions in columns A–G of the practice problems, Part 1. All the answers are on p. 18.

Z-score: shows the relative position of a data point with respect to the rest of the data by measuring how many standard deviations the point is away from the mean. To apply the z-score, the box plot or histogram must be centered (bell-shaped).
$$z = \frac{x - \bar{x}}{s} \qquad \left(\text{or } z = \frac{x-\mu}{\sigma} \text{ for a population}\right)$$
The possible range of z-values:
Unusual values: $z < -2$
Ordinary values: $-2 \le z \le 2$
Unusual values: $z > 2$
Example 1: Find the z-score on the final exam for Tommy Yank in a stat class at CSUS if his score was 87 when the class average was 72 and the standard deviation was 8.
$z = \frac{87 - 72}{8} = \frac{15}{8} = 1.875$. Ordinary or unusual value? Since $-2 \le 1.875 \le 2$, it is ordinary: his performance is ordinary relative to the rest of his class.
Example 2: Find the z-score on the final exam for Marcy Tank in a stat class at UC Davis if her score was 82 when the class average was 71 and the standard deviation was 4.
$z = \frac{82 - 71}{4} = \frac{11}{4} = 2.75$. Ordinary or unusual value? Since $2.75 > 2$, it is unusual: she did unusually well relative to the rest of her class.
HW: Answer questions D from page 3 of the practice problems, Part 1.

Empirical Rules
When the box plot is centered (a bell-shaped distribution), we can apply the following three empirical rules:
Approximately 99.7% of the data are within 3 s of the mean: $\bar{x} \pm 3s$.
Approximately 95% of the data are within 2 s of the mean: $\bar{x} \pm 2s$.
Approximately 68% of the data are within 1 s of the mean: $\bar{x} \pm s$.
Example: Find all three empirical rules for Abe's stat class if the average was 72 and the standard deviation was 8, assuming the box plot was centered.
99.7%: $72 \pm 3(8) = 72 \pm 24$, so 99.7% of the class scored between 48 and 96.
95%: $72 \pm 2(8) = 72 \pm 16$, so 95% of the class scored between 56 and 88.
68%: $72 \pm 1(8) = 72 \pm 8$, so 68% of the class scored between 64 and 80.
HW: Answer questions C of the practice
problems, Part 1. All the answers are on p. 18.

Grouped Data: Frequency Table, Histogram, Frequency Polygon, Mean, and Standard Deviation
Quiz score | f (students) | m (midpoint) | Rel. freq. = f/n × 100 | f·m | f·m²
0–4 | 6 | 2 | 12% | 12 | 24
4–8 | 10 | 6 | 20% | 60 | 360
8–12 | 16 | 10 | 32% | 160 | 1600
12–16 | 14 | 14 | 28% | 196 | 2744
16–20 | 4 | 18 | 8% | 72 | 1296
Σ | n = Σf = 50 | | 100% | Σfm = 500 | Σfm² = 6024
5. Mean: $\bar{x} = \frac{\sum fm}{n} = \frac{500}{50} = 10$
6. Standard deviation:
$$s = \sqrt{\frac{n\sum fm^2 - \left(\sum fm\right)^2}{n(n-1)}} = \sqrt{\frac{50(6024) - 500^2}{50(49)}} = \sqrt{\frac{51200}{2450}} \approx 4.57$$
7. Variance: $s^2 \approx 4.57^2 \approx 20.9$

Practice
Quiz score | f (students) | m (midpoint) | Rel. freq. = f/n × 100 | f·m | f·m²
0–10 | 8 | | 20% | |
10–20 | 12 | | | 180 |
20–30 | 14 | 25 | | |
30–40 | 6 | | | | 7350
Σ | n = Σf = 40 | | 100% | Σfm = 780 | Σfm² = 19000
5. Mean: $\bar{x} = \frac{\sum fm}{n} = \frac{780}{40} = 19.5$
6. Standard deviation: $s = \sqrt{\frac{40(19000) - 780^2}{40(39)}} \approx 9.86$
7. Variance: $s^2 = \frac{151600}{1560} \approx 97.18$
HW: Answer questions A, B, C, D from pages 4–5 of the practice problems, Part 1. All the answers are on pp. 10–13.

Histogram
[Figure: relative-frequency histogram of the quiz scores (x axis: quiz scores 0 to 20; y axis: relative frequency, %)]
The histogram looks close to a centered, or bell-shaped, distribution.
[Figure: different possible shapes of a histogram: uniformly distributed, bimodal, irregular with no particular pattern, and skewed]

Regression
Correlation and regression is the study of the relationship between two variables, x and y, with the following objectives:
1. To find the nature of the relationship (linear or nonlinear; positive or negative) by drawing a scatter diagram of y versus x.
2. To measure the strength of this relationship by computing the correlation coefficient r.
3. To find the slope and y-intercept of the best-fitted line (regression equation) $y' = ax + b$ between the x and y variables.
4. To use the regression equation to estimate or predict one variable from the other.

Nature of the relationship
Positive: both variables increase together or decrease together (x ↑, y ↑ or x ↓, y ↓).
Negative: when one variable increases the other decreases, or vice versa (x ↑, y ↓ or x ↓, y ↑).
Determine the nature of the relationship between x
and y in each of the following pairs: test score for a stat class; parents vs. sons; ... a semester vs. test scores; temperature (summer) vs. amount spent on ...

Why do we need a scatter diagram?
a. To see whether the data exhibit a linear pattern.
b. To see whether a linear pattern is positive or negative.
c. To see how closely (strongly) the data are clustered around a straight line.
d. To detect any outlier: a point lying far away from the other data points.

Different Possible Shapes of a Scatter Diagram
[Figure: sample scatter diagrams: r = 1, perfect positive linear correlation (Midterm vs. Final); positive linear correlation (Test Scores vs. Hours of Study); r = −1, perfect negative linear correlation (Absent vs. Score); strong negative linear correlation (Absent vs. Score); no correlation; nonlinear relationship]

Steps to Do a Correlation and Regression Problem
1. Construct a scatter diagram and comment on its nature: linear or nonlinear, positive or negative, strong or weak relationship.
2. Compute the correlation coefficient r and comment on its strength. $-1 \le r \le 1$: $r = -1$ is a perfect negative linear correlation, $r = 0$ is no correlation, and $r = 1$ is a perfect positive linear correlation.
3. Compute $\bar{x}$, $s_x$, $\bar{y}$, $s_y$.
4. Compute the slope a and y-intercept b for the regression equation $y' = ax + b$.
5. Use the regression equation $y' = ax + b$ to estimate or predict one variable from the other. Estimated values are labeled $y'$ (y prime) and $x'$ (x prime).

Guidelines for using the regression line
1. If there is no significant linear correlation, do not use the regression equation.
2. When using the regression equation for prediction, stay within the range of the available sample data.
3. A regression equation based on old data is not necessarily valid now.

Marginal change (slope): the amount by which the y variable changes when the x variable increases by one unit.
Outlier: a point lying far away from the other data points.
Coefficient of determination: $r^2 = \frac{\text{explained variation}}{\text{total variation}}$, so $r^2 \times 100$ is the percentage of the variation in y that is
explained by the regression line.

Example: x = hours of study per week, y = test score, for n = 4 students:
$\sum x = 36$, $\sum y = 332$, $\sum x^2 = 358$, $\sum y^2 = 27792$, $\sum xy = 3076$
1. Plot the data as a scatter diagram and comment on the pattern of the points.
Strong positive linear correlation.
2. Compute the correlation coefficient and comment on it.
$$r = \frac{n\sum xy - \sum x\sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{4(3076) - 36(332)}{\sqrt{4(358) - 36^2}\,\sqrt{4(27792) - 332^2}} = \frac{352}{\sqrt{136}\,\sqrt{944}} \approx 0.9824$$
A very strong positive linear correlation.
3. Compute the slope and y-intercept and write the equation of the regression line.
$$a = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{352}{136} \approx 2.59$$
$$b = \frac{\sum y\sum x^2 - \sum x\sum xy}{n\sum x^2 - \left(\sum x\right)^2} = \frac{332(358) - 36(3076)}{136} = \frac{8120}{136} \approx 59.71$$
$$y' = 2.59x + 59.71$$
4. Explain the slope based on the regression equation and the relation of the x and y variables.
In general, for every additional hour of study per week, the score goes up by about 2.59 points.
5. Compute the mean and standard deviation for both the x and y variables.
$\bar{x} = 9$ hrs, $s_x \approx 3.37$; $\bar{y} = 83$, $s_y \approx 8.87$
6. If a student studies 10 hours a week, use the regression equation to estimate her test score.
$x = 10$ hrs: $y' = 2.59(10) + 59.71 = 85.61$
7. If a student has a test score of 90, use the regression equation to estimate the number of hours he spends studying per week.
$y = 90$: $x' = \frac{90 - 59.71}{2.59} \approx 11.69$ hrs
8. Compute the coefficient of determination $r^2 \times 100$ and comment on it.
$r^2 \times 100 = 0.9824^2 \times 100 \approx 96.5\%$: 96.5% of the variation in test scores is explained by the regression equation.

Best Fit Line (Regression Line)
We start with an attempt to construct a linear demand function. Suppose that your market research on real estate investments reveals the following sales figures for new homes of different prices over the past year:
Price (thousands of dollars) | 160 | 180 | 200 | 220 | 240 | 260 | 280
Sales of new homes this year | 126 | 103 | 82 | 75 | 82 | 40 | 20
We would like to use these data to construct a demand function for the real estate market. Recall that a demand function gives demand y, measured
here by annual sales as a function of unit price X Here is a plot ofy versus X The data de nitely suggest a straight line moreorless and hence a linear relationship between p and q Here are several possible quotstraight line ts Q Which line best ts the data A We would like the sales predicted by the best t line predicted values to be as close to the actual sales observed valuesas possible The differences between the predicted values and the observed values appear as the vertical distances shown in the gure below m 3929 1 quot quot foam i a M r H 5 m quotquot o a Ingmar aw r r r l m r r m 1m 24 23gt 2 2w 2 Flaw Q Since we want the vertical distances to be as small as possible why can39t we set them all to zero and solve for the slope and intercept of the straight line A If this were possible then there would be a straight line that passes through all the data points A look at the graph shows that this is not the case Q Then why not nd the line that minimizes all the vertical distances A This is not possible either The line that minimizes the rst two distances is the line that passes through the rst two data points since it makes the distances 0 But this line certainly does not the distance to the third point In other words there is a tradeoff making some distances smaller makes others larger Q So what do we do A Since we cannot minimize all of the distances we minimize some reasonable combination of them Now one reasonable combination of the distances would be their sum but that turns out the be difficult to work with because distances are measured in terms of absolute values Instead we use the sum of the squares of the distances no absolute values required The line that minimizes this sum is called the best t line regression line or least squares line associated with the given data Principles ofCausation Types of association An association may be found between two variables for several reasons show causal modeling gures Topics Review Part 1 1302009 15 there may be 
direct causation, e.g., smoking causes lung cancer.
There may be a common cause, e.g., ice cream sales and the number of drownings both increase with temperature.
There may be a confounding factor, e.g., highway fatalities decreased when speed limits were reduced to 55 mph, at the same time that the oil crisis reduced supplies and people drove fewer miles.
There may be a coincidence, e.g., the population of Canada has increased at the same time as the moon has gotten closer by a few miles.

Establishing Cause and Effect
How do we establish a cause-and-effect relationship? It is generally agreed that most or all of the following must be considered before causation can be declared:
Strength of the association: the stronger an observed association appears over a series of different studies, the less likely it is that the association is spurious because of bias.
Dose-response effect: the value of the response variable changes in a meaningful way with the dose (or level) of the suspected causal agent.
Lack of temporal ambiguity: the hypothesized cause precedes the occurrence of the effect. The ability to establish this time pattern depends on the study design used.
Consistency of the findings: most or all studies concerned with a given causal hypothesis produce similar findings. Of course, studies dealing with a given question may all have serious bias problems that can diminish the importance of the observed associations.
Biological or theoretical plausibility: the hypothesized causal relationship is consistent with current biological or theoretical knowledge. Note that the current state of knowledge may be insufficient to explain certain findings.
Coherence of the evidence: the findings do not seriously conflict with accepted facts about the outcome variable being studied.
Specificity of the association: the observed effect is associated with only the suspected cause (or a few other causes that can be ruled out).
IMPORTANT: NO CAUSATION WITHOUT MANIPULATION.
Examples: discuss the above in relation to smoking vs. lung
cancer; amount of studying vs. grades in a course; sex education in school vs. having premarital intercourse; fossil fuel burning and the greenhouse effect; free trade vs. plant closings.

Basic Probability
Probability of an event A:
$$P(A) = \frac{\text{number of ways event A can occur}}{\text{total number of possible outcomes}}, \qquad 0 \le P(A) \le 1$$
If an event has $0 \le P(A) < 0.05$ (less than 5%), its occurrence is called unusual.

Definitions
An experiment is a situation involving chance or probability that leads to results called outcomes. Example: in the problem below, the experiment is tossing a coin.
An outcome is the result of a single trial of an experiment. The possible outcomes are getting a tail or a head.
An event is one or more outcomes of an experiment. One event of this experiment is getting a tail.
Probability is the measure of how likely an event is. The probability of getting a tail is one half.

Tossing one fair coin
Possibilities: T, H. Probabilities: P(T) = 50%, P(H) = 50%.
Tossing two fair coins (tree diagram; each branch has probability 0.5)
Possibility | Probability
TT | P(TT) = 0.5 × 0.5 = 25%
TH | P(TH) = 0.5 × 0.5 = 25%
HT | P(HT) = 0.5 × 0.5 = 25%
HH | P(HH) = 0.5 × 0.5 = 25%

Multiplication Rule
Keywords: and, both, all.
$$P(A \text{ and } B \text{ and } C \text{ and } \dots) = P(A)\,P(B)\,P(C)\cdots \quad \text{(for independent events)}$$
Example: According to the DMV, 70% of applicants pass the written test the first time. If 10 applicants are taking the test, find the probability that all 10 applicants will pass.
P(all 10 applicants will pass) = $0.70^{10} \approx 0.02825 \approx 2.82\%$
Calculator hint: 0.7 ^ 10 = 0.02825
Interpretation: 2.82% means it is quite unlikely that all 10 applicants will pass the test.

A. A system has 7 major parts, and the working probability of each part is 88%. The system works only if all the parts work, and all parts work independently of each other. Let P(S) be the working probability of the system and P(P) the working probability of each part.
1. What is the working probability of the system?
$P(S) = P(P)^n = 0.88^7 \approx 0.4087 = 40.87\%$
Calculator hint: 0.88 ^ 7 = 0.4087
Practice: A system has 10 parts, and the working probability of each part is 93%. The system works if all the parts work. Also, all parts are working
independently of each other.
1. What is the working probability of the system? Answer: approximately 48.4%.

B. A deck of 52 cards has 13 diamonds, 12 face cards, and 4 aces. If we draw 4 cards at random (without replacement), then:
a. What is the probability that all 4 are diamonds, and how likely is this?
$P = \frac{13}{52}\cdot\frac{12}{51}\cdot\frac{11}{50}\cdot\frac{10}{49} = \frac{17160}{6497400} \approx 0.0026 = 0.26\%$: very unlikely.
b. What is the probability that all 4 are aces, and how likely is this?
$P = \frac{4}{52}\cdot\frac{3}{51}\cdot\frac{2}{50}\cdot\frac{1}{49} = \frac{24}{6497400} \approx 0.00000369 = 0.000369\%$: very much unlikely.
c. What is the probability that all 4 are face cards, and how likely is this?
$P = \frac{12}{52}\cdot\frac{11}{51}\cdot\frac{10}{50}\cdot\frac{9}{49} = \frac{11880}{6497400} \approx 0.001828 = 0.1828\%$: very unlikely.
d. What is the probability that all 4 are non-face cards (40 cards), and how likely is this?
$P = \frac{40}{52}\cdot\frac{39}{51}\cdot\frac{38}{50}\cdot\frac{37}{49} = \frac{2193360}{6497400} \approx 0.3376 = 33.76\%$: likely (not unusual).

A. There are 10 men and 8 women in a group. If two people are selected at random without replacement, then:
1. Write all the possibilities.
2. Compute all the probabilities.
Possibility | First pick | Second pick | Probability
MM | P(M) = 10/18 | P(M) = 9/17 | P(MM) ≈ 29.4%
MW | P(M) = 10/18 | P(W) = 8/17 | P(MW) ≈ 26.14%
WM | P(W) = 8/18 | P(M) = 10/17 | P(WM) ≈ 26.14%
WW | P(W) = 8/18 | P(W) = 7/17 | P(WW) ≈ 18.3%
Total | | | 100%
Then find the following probabilities:
5. Both are men: 29.4%
6. Both are women: 18.3%
7. At least one woman: P(MW) + P(WM) + P(WW) = 26.14% + 26.14% + 18.3% ≈ 70.6%
8. At most one woman: P(MW) + P(WM) + P(MM) = 26.14% + 26.14% + 29.4% ≈ 81.7%
9. One man and one woman: P(MW) + P(WM) = 26.14% + 26.14% = 52.28%

B. In a box there are 10 red and 8 blue balls. If two balls are drawn at random with replacement, then:
1. Write all the possibilities.
2. Compute all the probabilities.
Possibility | First draw | Second draw | Probability
RR | P(R) = 10/18 | P(R) = 10/18 | P(RR) ≈ 30.86%
RB | P(R) = 10/18 | P(B) = 8/18 | P(RB) ≈ 24.69%
BR | P(B) = 8/18 | P(R) = 10/18 | P(BR) ≈ 24.69%
BB | P(B) = 8/18 | P(B) = 8/18 | P(BB) ≈ 19.75%
Total | | | 100%
Then find the following
probabilities:
5. Both are red: P(RR) ≈ 30.86%
6. Both are blue: P(BB) ≈ 19.75%
7. At least one red: P(RB) + P(BR) + P(RR) = 24.69% + 24.69% + 30.86% ≈ 80.25%
8. At most one red: P(RB) + P(BR) + P(BB) = 24.69% + 24.69% + 19.75% ≈ 69.1%
9. One red and one blue: P(RB) + P(BR) = 24.69% + 24.69% = 49.38%

C. There are two boxes: the first has 4 red and 2 blue balls, and the second has 5 red and 2 blue balls. One ball is drawn from the first box and put into the second box; then we draw one ball at random from the second box. List all the possibilities and compute all the probabilities for selecting a ball from the second box.
Possibility | Ball moved from box 1 | Ball drawn from box 2 | Probability
RR | P(R) = 4/6 | P(R) = 6/8 | P(RR) = 50%
RB | P(R) = 4/6 | P(B) = 2/8 | P(RB) ≈ 16.67%
BR | P(B) = 2/6 | P(R) = 5/8 | P(BR) ≈ 20.83%
BB | P(B) = 2/6 | P(B) = 3/8 | P(BB) = 12.5%
Total (must add up to 100%) | | | 100%
a. Find the probability that the ball selected from the second box is red: P(RR) + P(BR) = 50% + 20.83% = 70.83%
b. Find the probability that the ball selected from the second box is blue: P(RB) + P(BB) = 16.67% + 12.5% = 29.17%
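The Qualitative Data table (Example 1) computes each relative frequency as f/n × 100 and each pie-chart angle as that percentage of 360°. A minimal Python sketch of the same arithmetic follows; the function name `freq_table` is illustrative, not something from the notes.

```python
def freq_table(counts):
    """Map each category to (relative frequency in %, pie-chart angle in degrees)."""
    n = sum(counts.values())  # n = sum of all frequencies f
    table = {}
    for category, f in counts.items():
        rel = f / n * 100        # relative frequency, in percent
        angle = rel / 100 * 360  # that share of a full 360-degree circle
        table[category] = (rel, angle)
    return table


grades = {"A": 6, "B": 10, "C": 16, "D": 14, "F": 4}
for grade, (rel, angle) in freq_table(grades).items():
    print(f"{grade}: {rel:.0f}%  {angle:.1f} degrees")
```

Passing the Practice 1 frequencies instead of `grades` is one way to check a hand-completed table.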
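The mean, median, and mode examples under Measures of Central Tendency can be checked with Python's standard `statistics` module. This is a sketch under the assumption of Python 3.8 or later (needed for `statistics.multimode`). The helper name `central_tendency` is hypothetical, and the modes are sorted only so that ties read in ascending order.

```python
import statistics


def central_tendency(data):
    """Return (mean, median, modes) for a list of numbers.

    statistics.multimode returns every value tied for the highest count;
    sorting makes bimodal/multimodal results read in ascending order.
    """
    return (
        statistics.mean(data),
        statistics.median(data),  # averages the two middle values when n is even
        sorted(statistics.multimode(data)),
    )


print(central_tendency([5, 6, 3, 9, 7]))
print(statistics.median([2, 5, 11, 16, 8, 9, 3, 7, 5, 4]))
print(central_tendency([11, 15, 7, 2, 6, 16, 15, 3, 2, 11, 19, 5, 4])[2])
```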
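For Measures of Variation, `statistics.stdev` and `statistics.variance` divide by n − 1, matching the sample formula for s used in the notes (`statistics.pstdev` divides by N instead). The sketch below reproduces the delivery-time comparison and the 5, 6, 3, 9, 10, 3 example. The helper `variation` is a hypothetical name, and its last return value applies the range/4 rule of thumb.

```python
import statistics


def variation(data):
    """Return (range, s, s^2, range/4) for a sample.

    stdev/variance use n - 1 in the denominator (sample formulas);
    the last item is the rule-of-thumb estimate s ~ range / 4.
    """
    data_range = max(data) - min(data)
    s = statistics.stdev(data)
    s2 = statistics.variance(data)
    return data_range, s, s2, data_range / 4


# Company A vs. company B delivery times
print(statistics.stdev([5, 5, 5]), statistics.stdev([0, 5, 10]))
print(variation([5, 6, 3, 9, 10, 3]))
```

For this small data set the range/4 estimate (1.75) falls well short of the exact s ≈ 2.97, which is why the notes treat it only as a rough estimate.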
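The quartile steps in Measures of Position (rank, find Q2, then take the medians of the halves on either side) can be sketched as below, together with the five-number summary used for a box plot. The function name is illustrative. Other quartile conventions, such as the default of `statistics.quantiles` or spreadsheet functions, can give slightly different Q1 and Q3 values.

```python
from statistics import median


def five_number_summary(data):
    """Return (Min, Q1, Q2, Q3, Max) using the median-of-halves method.

    For an odd number of values the middle value (Q2) belongs to neither
    half. Needs at least 2 values so that each half is non-empty.
    """
    x = sorted(data)
    n = len(x)
    half = n // 2
    lower = x[:half]
    upper = x[half + 1:] if n % 2 == 1 else x[half:]
    return x[0], median(lower), median(x), median(upper), x[-1]


print(five_number_summary([2, 5, 11, 16, 8, 9, 3, 7, 5, 4, 13]))  # Example 1, odd n
print(five_number_summary([2, 3, 5, 5, 7, 8, 9, 11, 16, 4]))      # Example 2, even n
```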
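The z-score examples (Tommy and Marcy) combine the formula $z = (x - \bar{x})/s$ with the notes' cutoff: values with $|z| > 2$ are unusual. A small sketch follows; the function names are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def classify(z):
    """Ordinary if -2 <= z <= 2, otherwise unusual (the cutoff used in these notes)."""
    return "ordinary" if -2 <= z <= 2 else "unusual"


for name, score, mean, sd in [("Tommy", 87, 72, 8), ("Marcy", 82, 71, 4)]:
    z = z_score(score, mean, sd)
    print(name, z, classify(z))
```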
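The Empirical Rules example builds the intervals $\bar{x} \pm ks$ for k = 1, 2, 3. The sketch below does the same arithmetic; it assumes, as the notes do, that the distribution is centered and bell-shaped, since otherwise the 68/95/99.7 percentages do not apply. The function name is illustrative.

```python
def empirical_intervals(mean, sd):
    """Intervals mean +/- k*sd for k = 1, 2, 3, labeled with the
    approximate share of bell-shaped data each one contains."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in [(1, "68%"), (2, "95%"), (3, "99.7%")]}


for pct, (low, high) in empirical_intervals(72, 8).items():
    print(f"{pct} of scores between {low} and {high}")
```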
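The Grouped Data tables use class midpoints m and the shortcut formula for s. The sketch below reproduces both the worked example and the Practice answers. It assumes each class is given as (lower limit, upper limit, frequency), and the midpoint is the average of the two limits, as in the tables. The function name `grouped_stats` is hypothetical.

```python
import math


def grouped_stats(classes):
    """classes: list of (lower limit, upper limit, frequency f).

    Uses midpoints m = (lower + upper) / 2 and the shortcut formula
    s = sqrt((n*sum(f*m^2) - (sum(f*m))^2) / (n*(n - 1))).
    Returns (n, mean, s, s^2).
    """
    n = sum(f for _, _, f in classes)
    sum_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fm2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    mean = sum_fm / n
    var = (n * sum_fm2 - sum_fm ** 2) / (n * (n - 1))
    return n, mean, math.sqrt(var), var


example = [(0, 4, 6), (4, 8, 10), (8, 12, 16), (12, 16, 14), (16, 20, 4)]
practice = [(0, 10, 8), (10, 20, 12), (20, 30, 14), (30, 40, 6)]
print(grouped_stats(example))
print(grouped_stats(practice))
```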
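The hours-of-study example gives only the summary sums, so the sketch below works directly from n, Σx, Σy, Σx², Σy², and Σxy. It uses the same summation formulas for r, the slope a, and the intercept b that appear in the notes. The function and variable names are illustrative.

```python
import math


def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (r, a, b) for y' = a*x + b using the summation formulas in these notes."""
    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    r = numerator / (math.sqrt(x_term) * math.sqrt(y_term))
    a = numerator / x_term                              # slope
    b = (sum_y * sum_x2 - sum_x * sum_xy) / x_term      # y-intercept
    return r, a, b


r, a, b = regression_from_sums(4, 36, 332, 358, 27792, 3076)
print(f"r = {r:.4f}, y' = {a:.2f}x + {b:.2f}, r^2 = {r * r:.1%}")
print("predicted score for 10 hours:", a * 10 + b)
```

Because the code keeps unrounded coefficients, its prediction for 10 hours comes out near 85.59 rather than the 85.61 obtained from the rounded equation $y' = 2.59x + 59.71$.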
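The Best Fit Line discussion describes the least-squares line as the one that minimizes the sum of squared vertical distances. The sketch below fits that line to the home-sales table using the deviation form of the slope, which is algebraically equivalent to the summation formula. It then checks that nudging either coefficient increases the sum of squares. The fitted coefficients come from this computation; the notes do not state them. Function names are hypothetical.

```python
def least_squares(xs, ys):
    """Slope a and intercept b of y' = a*x + b minimizing sum((y - y')^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx
    return a, y_bar - a * x_bar


def sum_sq(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y' = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


price = [160, 180, 200, 220, 240, 260, 280]  # thousands of dollars
sales = [126, 103, 82, 75, 82, 40, 20]
a, b = least_squares(price, sales)
print(f"y' = {a:.4f}x + {b:.2f}")
# Nudging either coefficient away from the least-squares values increases the sum.
print(sum_sq(price, sales, a, b) < sum_sq(price, sales, a + 0.01, b))
```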
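The coin-tossing tree diagrams in Basic Probability list equally likely outcomes. The sketch below enumerates them with `itertools.product` and stores exact probabilities as `fractions.Fraction` values. It assumes fair coins, as the notes do. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def coin_outcomes(n_coins):
    """Equally likely outcomes of tossing n fair coins, each with probability 1/2**n."""
    outcomes = ["".join(seq) for seq in product("TH", repeat=n_coins)]
    return {outcome: Fraction(1, len(outcomes)) for outcome in outcomes}


print(coin_outcomes(1))
print(coin_outcomes(2))
```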
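The Multiplication Rule examples (DMV applicants, the 7-part system, and the 10-part practice system) all raise a single success probability to the number of independent trials. The sketch below does this and applies the notes' "below 0.05 is unusual" cutoff. Function names are hypothetical.

```python
def all_succeed(p, n):
    """P(all n independent trials succeed) when each succeeds with probability p."""
    return p ** n


def is_unusual(prob):
    """The notes call an event unusual when its probability is below 0.05."""
    return prob < 0.05


cases = [("DMV, 10 applicants", 0.70, 10),
         ("system, 7 parts", 0.88, 7),
         ("practice, 10 parts", 0.93, 10)]
for label, p, n in cases:
    prob = all_succeed(p, n)
    print(f"{label}: {prob:.4%}  unusual={is_unusual(prob)}")
```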
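In the card-drawing problems (part B), both the group and the deck shrink by one card after each successful draw, because the cards are drawn without replacement. The sketch below multiplies those shrinking fractions exactly using `fractions.Fraction`. The helper name `all_from_group` is illustrative.

```python
from fractions import Fraction


def all_from_group(group_size, deck_size, draws):
    """P(every one of `draws` cards, drawn without replacement, comes from
    a group of `group_size` cards in a deck of `deck_size` cards)."""
    prob = Fraction(1)
    for i in range(draws):
        # after i successful draws, both the group and the deck are i cards smaller
        prob *= Fraction(group_size - i, deck_size - i)
    return prob


for label, size in [("diamonds", 13), ("aces", 4), ("faces", 12), ("non-faces", 40)]:
    p = all_from_group(size, 52, 4)
    print(f"all 4 {label}: {float(p):.6%}")
```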
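The two tree-diagram problems differ only in whether the first item goes back before the second draw. They are the 10 men and 8 women selected without replacement, and the 10 red and 8 blue balls drawn with replacement. The sketch below builds every branch probability for either case. The function name and the `replace` flag are illustrative, not from the notes.

```python
from fractions import Fraction


def two_draw_tree(counts, replace):
    """Branch probabilities for two draws from a group.

    counts: {label: how many}; returns {(first, second): probability}.
    Without replacement, the first item is removed before the second draw.
    """
    total = sum(counts.values())
    tree = {}
    for first, n_first in counts.items():
        p_first = Fraction(n_first, total)
        for second, n_second in counts.items():
            if replace:
                p_second = Fraction(n_second, total)
            else:
                left = n_second - 1 if second == first else n_second
                p_second = Fraction(left, total - 1)
            tree[(first, second)] = p_first * p_second
    return tree


people = two_draw_tree({"M": 10, "W": 8}, replace=False)
balls = two_draw_tree({"R": 10, "B": 8}, replace=True)
at_least_one_woman = 1 - people[("M", "M")]
at_most_one_red = 1 - balls[("R", "R")]
print(float(at_least_one_woman), float(at_most_one_red))
```

Using the complement (1 − P(MM), 1 − P(RR)) gives the same result as adding the three remaining branches, as the notes do.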
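Problem C (moving one ball from the first box to the second, then drawing from the second box) is a two-stage tree. In it, the contents of box 2 depend on which ball was moved. The sketch below updates box 2 for each branch and multiplies the branch probabilities exactly. It assumes each ball is equally likely to be chosen at each stage. The function name is hypothetical.

```python
from fractions import Fraction


def transfer_then_draw(box1, box2):
    """Move one random ball from box1 into box2, then draw one ball from box2.

    box1, box2: {color: count}. Returns {(moved, drawn): probability}.
    """
    n1 = sum(box1.values())
    tree = {}
    for moved, c1 in box1.items():
        p_moved = Fraction(c1, n1)
        new_box2 = dict(box2)
        new_box2[moved] = new_box2.get(moved, 0) + 1  # box 2 now holds one extra ball
        n2 = sum(new_box2.values())
        for drawn, c2 in new_box2.items():
            tree[(moved, drawn)] = p_moved * Fraction(c2, n2)
    return tree


tree = transfer_then_draw({"R": 4, "B": 2}, {"R": 5, "B": 2})
p_red = sum(p for (_, drawn), p in tree.items() if drawn == "R")
print({k: float(v) for k, v in tree.items()}, float(p_red))
```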