
# INTRO STATISTICS STAT 2000

UGA

This 31-page set of class notes was uploaded by Ethel Hermiston on Saturday, September 12, 2015. The notes belong to STAT 2000 at the University of Georgia, taught by Oneal in the Fall. Since upload, they have received 33 views. For similar materials, see /class/202530/stat-2000-university-of-georgia in Statistics at the University of Georgia.

Date Created: 09/12/15

## Important Terms

- **Population:** the total set of subjects in which we are interested.
- **Sample:** a subset of the population for which we have data.
- **Subject:** the entities (individuals) we measure.

## Histogram Interpretation

*[Histogram: "Sampled IQs of 7th Graders"; IQ classes 90–99, 100–109, 110–119, 120–129 on the horizontal axis, frequency on the vertical axis.]*

- How many total students are there?
- Which class has the highest frequency, and which has the lowest? What are those frequencies?
- How many students have an IQ between 110 and 129?

## Stem-and-Leaf Plot

- A bar chart on its side.
- The stem is all digits except the last one, and the last digit is the leaf (for example, 199 has stem 19 and leaf 9).
- Leaves go in ascending order, with no commas.
- If a row has no leaves, write the row but leave it blank.

Example (HW 2.1–2.2): eBay selling prices: 199, 210, 210, 223, 225, 225, 225, 228, 232, 235.

## Sampling Methods

- **Simple random sampling:** each subject has an equally likely chance of being selected. This is often done with a random number table. Example: choosing a company somewhere in the US.
- **Systematic:** selecting every $k$th subject. Example: surveying every 10th person we meet downtown.
- **Convenience:** individuals are easily found (e.g., internet surveys). This is often the laziest method, so its answers are less reliable.
- **Stratified sampling:** taking some subjects from all possible groups.
- **Cluster sampling:** taking all subjects from some possible groups.

## Symmetric and Skewed Distributions

- **Symmetric:** mean ≈ median.
- **Skewed left:** mean < median.
- **Skewed right:** mean > median.

## Outliers

- The mean is sensitive to outliers, while the median is resistant to them.
- When outliers are present, it is best to use the median as the measure of central tendency. Example: the average selling price of homes in the US.

## Standard Deviation

- Roughly, the typical distance between a data point and the mean of the data.
- Measures how much (or how little) the data distribution is spread out.

## Summary Stats Interpretation

- **Mean:** the average of the data set.
- **Median (also called Q2):** about 50% of the data lie below this value and about 50% lie above it.
- **Range:** the difference between the maximum and the minimum.
- **Max & Min:** the highest and lowest points in the data set.
- **Q1 and Q3:** the 25th and 75th percentiles.
- **Interquartile range (IQR):** the difference between Q3 and Q1.

## Box Plot (HW 2.5–2.6)
Distribution of taxes (in cents): minimum 2.6, Q1 31, median 55, Q3 105, maximum 206.

- Construct a box plot for this data.
- What proportion of states have taxes greater than 31 cents? Greater than 105 cents ($1.05)?
- Find the range and the interquartile range (IQR).

## Box Plot Outlier Test (HW 2.5–2.6)

- Any point lying above $Q_3 + 1.5 \times \text{IQR}$ is an outlier.
- Any point lying below $Q_1 - 1.5 \times \text{IQR}$ is also an outlier.

Are there any outliers on this box plot?

*[Box plot with labeled values Q1 = 25.6, Q3 = 110.5, and a maximum of 320,000.]*

## Mean & Median (HW 2.3–2.4)

This chart shows the number of grams of protein in various brands of loaves of bread.

| Protein (g) | Count |
|---|---|
| 0 | 15 |
| 1 | 16 |
| 2 | 21 |
| 3 | 4 |
| Total | 56 |

- Compute the mean and median of the data set.
- What can you say about the shape of the distribution?

## Empirical Rule

This rule is only used for bell-shaped distributions.

- About 68% of all data points lie within one standard deviation of the mean.
- About 95% of all data points lie within two standard deviations of the mean.
- Almost all data points lie within three standard deviations of the mean. Anything beyond that is an outlier.

Summary: $\bar{x} \pm 1s$ holds about 68%, $\bar{x} \pm 2s$ holds about 95%, and $\bar{x} \pm 3s$ holds almost all of the data.

## Z-Scores

$$z = \frac{\text{data point} - \text{mean}}{\text{st. dev.}}$$

- The z-score is the number of standard deviations a data point lies from the mean.
- A negative z-score means the data point is below the mean. A positive z-score means the data point is above the mean.
- A data point is an outlier if (a) $z > 3$ or (b) $z < -3$.

Example (HW 2.3–2.4): the weight of an armadillo is bell-shaped with mean 14 pounds and standard deviation 2.5.

- Find an interval within which about 95% of armadillo weights will fall.
- What z-score represents an armadillo that is 2.8 standard deviations to the right of the mean? What weight is that?

## Relative Risk (HW 3.1)

$$\text{relative risk} = \frac{\text{conditional proportion for one group (larger number)}}{\text{conditional proportion for another group (smaller number)}}$$

Relative risk tells us how many times more likely the outcome is for one group than for the other group. Three facts follow:

1. Relative risk ≥ 1, because the larger proportion goes on top.
2. When the numerator and denominator proportions are very similar, relative risk will be very close to 1.
3. When the numerator is quite a bit larger, relative risk will be quite a bit greater than 1.

| | Good Adjustment | Bad Adjustment | Total |
|---|---|---|---|
| Orientation | 72 | 14 | 86 |
| No Orientation | 28 | 45 | 73 |
| Total | 100 | 59 | 159 |

- Find the proportion of orientation students that adjusted well.
- Find the proportion of no-orientation students that adjusted well.
- Find the relative risk of adjusting well to college for both groups of students: students that ___ are ___ times more likely to adjust well to college than students that ___.
- Find the probability that someone selected did orientation and adjusted well.
- Find the probability that someone selected did orientation or adjusted well.
- If a subject selected adjusted poorly, what's the probability he/she did not do orientation?

## Scatter Plots

*[Scatter plots illustrating a strong correlation and a weak correlation.]*

## Correlation

- $-1 \le r \le 1$.
- If $r$ is positive, then so is the slope. If $r$ is negative, then so is the slope.
- The closer $r$ is to −1 or 1, the stronger the correlation. The closer $r$ is to 0, the weaker the correlation.
- $r$ is unitless.
- $r$ measures only a LINEAR relationship.
- $r$ is not proof of causality.

## Least-Squares Regression

$$\hat{y} = a + bx$$

- $x$: given data point. $\hat{y}$: predicted response.
- $a$: the intercept, which is the predicted response when $x = 0$. It may not always have a practical interpretation.
- $b$: the slope, which is how much the predicted response increases (or decreases) for every one-unit increase in $x$.
- Residual = observed − predicted = $y - \hat{y}$.

## Regression (HW 3.2–3.4)

Analysis says that we can use the length of an alligator (in feet) to predict its weight (in pounds). The equation is $\hat{y} = 10 + 40x$.

- Find the expected weight of an alligator that's 10 feet long.
- Suppose an alligator that's 10 feet long actually weighs 402 pounds. Calculate the residual.
- Interpret the slope.
- Interpret the intercept.

## Probability

Probability is the likelihood of a particular outcome occurring. For equally likely outcomes:

$$\text{probability} = \frac{\text{number of ways an event can occur}}{\text{number of possible outcomes}}$$

Example: the probability of drawing a club from a deck of cards is $p = \frac{13 \text{ clubs}}{52 \text{ total cards}} = \frac{1}{4}$.

**A complement ($A^c$):** all possible events that are not in A. Example: if A = it's raining, then $A^c$ = it's not raining.
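The orientation table in HW 3.1 uses several probability ideas: conditional proportions, relative risk, and "and", "or", and "given" probabilities. The Python sketch below computes all of them. The dictionary layout and the helper names (`row_total`, `column_total`) are illustrative choices and do not come from the notes.

```python
from fractions import Fraction

# Orientation table (HW 3.1): rows are groups, columns are adjustment outcomes.
table = {
    "orientation": {"good": 72, "bad": 14},
    "no orientation": {"good": 28, "bad": 45},
}


def row_total(group):
    return sum(table[group].values())


def column_total(outcome):
    return sum(row[outcome] for row in table.values())


grand_total = sum(row_total(g) for g in table)  # 159

# Conditional proportions of good adjustment within each group.
p_orient = table["orientation"]["good"] / row_total("orientation")        # 72/86
p_none = table["no orientation"]["good"] / row_total("no orientation")    # 28/73

# Relative risk: larger conditional proportion over the smaller one (always >= 1).
relative_risk = max(p_orient, p_none) / min(p_orient, p_none)

# "And", "or" (addition rule), and "given" probabilities as exact fractions.
p_and = Fraction(table["orientation"]["good"], grand_total)
p_or = Fraction(row_total("orientation") + column_total("good")
                - table["orientation"]["good"], grand_total)
p_none_given_bad = Fraction(table["no orientation"]["bad"], column_total("bad"))

print(round(p_orient, 5), round(p_none, 5), round(relative_risk, 5))
print(p_and, p_or, p_none_given_bad)
```

`Fraction` reduces automatically, so 72/159 prints as 24/53 and 114/159 prints as 38/53.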
## Complement Probability

$$P(A^c) = 1 - P(A)$$

## Probability (HW 5.1–5.3)

We have an urn full of 12 blue, 10 red, and 8 black marbles. We reach in and draw a marble at random.

- What's the probability of drawing a marble that's one of UGA's colors (red or black)?
- If the marble drawn was a UGA color, what's the probability it was not red?

## Discrete Probability Distribution

Two requirements:

1. Each individual $p(x)$ is between 0 and 1, inclusive.
2. All probabilities sum to 1.

The mean of a discrete distribution is $\mu = \sum x \, p(x)$.

## Mean of a Distribution (HW 6.1–6.2)

Here's a table for the probability of different categories of hurricanes. Find the missing value and the mean (expected value) of this data set.

## Normal Distribution (Continuous)

Three rules:

1. The total probability (the area under the normal curve) is 1.
2. The normal curve is symmetric.
3. In the StatCrunch normal calculator, the x-value goes in the left box and the probability goes in the right box.

*[Figure: standard normal curve from −3 to 3, and the StatCrunch normal calculator with Mean and Std. Dev. inputs.]*

## Normal (HW 6.1–6.3)

A study of entrance exam scores finds that the mean is 120 with sd 11. Anything above 145 is considered superior.

- Find the z-score for a score of 145. Interpret it.
- What score is 1 standard deviation to the left of the mean?

## Area Between Two Lines

Find the probability within 1.28 standard deviations of the mean. Make a sketch of this, and try to think of a strategy for it.

## Percentiles

The $P$th percentile is the $x$ that gives $P\%$ below it on the normal curve (always *below*). Example: an $x$ is the 30th percentile because 30% of the data falls below $x$.

## 3 Types of Distributions

1. **Population:** the distribution of all points in the population.
2. **Sample data (data distribution):** the distribution of one particular sample.
3. **Sampling distribution of sample means:** the distribution of the sample means of a given size $n$.

## Means Problem

| | Mean | St. Dev. |
|---|---|---|
| Population | $\mu$ | $\sigma$ |
| Sample data (data distribution) | $\bar{x}$ | $s$ |
| Sampling distribution of $\bar{x}$ | $\mu$ | $\sigma/\sqrt{n}$ |

## Distribution Shapes

- The sample data (data distribution) has the same shape as the population. If the population is skewed, so is the sample data.
- The sampling distribution of sample means is normal if the population is normal or $n > 30$ (by the Central Limit Theorem).
Otherwise, no conclusion can be drawn about its shape.

## Two Important Properties

As the sample size $n$ increases:

- The mean of the sampling distribution of sample means does not change.
- The standard error (sd) of the sampling distribution decreases. Since the standard error is $\sigma/\sqrt{n}$, a larger denominator gives a smaller overall fraction.

## Distributions (HW 7.1–7.2)

The lifetime of a certain type of tumble dryer until failure has a distribution skewed right with mean 62 months and sd 11. A sample of 98 dryers is selected, and this sample has mean 57.2 and sd 12.1.

- What are the center and spread for the population? What shape is the population?
- What are the center and spread for the sample data selected? What shape is the sample data?
- What are the center and spread for the sampling distribution of sample means with size 98? What shape is the sampling distribution of sample means?

## StatCrunch (HW 7.1–7.2)

The average household temperature in Chattanooga is 67.6 degrees and the sd is 4.2. A sample of 51 households is selected.

- What's the probability the average of this sample will be above 68.1? Fill in the inputs on the StatCrunch box (Mean, Std. Dev., and the x-value).
- What's the probability the average of the sample will be within 1.5 degrees of the population mean? Think of a strategy for answering this.

## Proportions Problem

The sampling distribution of the sample proportion $\hat{p}$ is normal when $np \ge 15$ and $n(1-p) \ge 15$.

## Proportions (HW 7.1–7.3)

57% of students at an academy are female. In a random sample of 55 students, 26 of them are female. Let 1 = female and 0 = male.

- Identify the population distribution of gender $X$.
- Identify the data distribution of gender $X$.
- What are the mean and standard error of the sampling distribution of the sample proportion? Is the sampling distribution approximately normal?
- Now, given that the mean is 0.57 and the standard error is 0.06676, find the probability that a sample of size 55 has a sample proportion of 0.51 or less.

## True/False Questions

- Probability is always between −1 and 1, while correlation is always between 0 and 1.
- On the normal distribution, the probability above 2.2 is equal to the probability below −2.2.
- As sample size decreases, standard error increases.
- When calculating the probability that a sample average will fall above/below a given number, we have to divide the population mean by the square root of the sample size.

## Notation

We use different letters for population parameters versus sample statistics:

| | Population parameter | Sample statistic |
|---|---|---|
| Mean | $\mu$ | $\bar{x}$ |
| St. dev. | $\sigma$ | $s$ |
| Proportion | $p$ | $\hat{p}$ |

Know the differences among these.

## Notation (HW 7.1–7.3)

- What symbol is used to denote the population mean?
- What symbol is used to estimate the population proportion?
- What symbol is used to describe the spread in one sample?

## Confidence Intervals

1. Calculate the sample mean/proportion in your sample. This is the point estimate.
2. Calculate the width based on the level of confidence and the standard error.
3. You get a range of plausible values for the true population mean/proportion.

## CI for Proportions

$$\text{point estimate} \pm (\text{confidence level}) \times (\text{standard error}) = \hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

- $\hat{p}$: the point estimate.
- $z$: depends on the confidence level.
- $\sqrt{\hat{p}(1-\hat{p})/n}$: the standard error.
- $z\sqrt{\hat{p}(1-\hat{p})/n}$: the margin of error (half the width of the CI).

## Properties of a CI

- The sample proportion/mean is ALWAYS inside the confidence interval. In fact, it's always right in the center.
- The population proportion/mean may or may not be inside the confidence interval.

## Interpretation of a CI

- A 95% CI means that about 95% of all CIs constructed contain the true population proportion/mean, and about 5% do not (the long-run definition).
- "We are 95% confident the true proportion lies somewhere inside our CI" (the definition for an individual interval).
- A 99% CI means that about 99% of all CIs constructed contain the true population proportion/mean, and about 1% do not.

Example with 1,000 intervals: at 95%, about 950 (maybe 940–960) contain the true proportion. At 99%, about 990 (maybe 985–995) contain it.

## Determining z

$z$ depends on the level of confidence. For a 95% CI, $z = 1.96$. To get this number: at 95%, 5% is left over, and half of that is 2.5%. So $z$ is the value with $P(Z > z) = 0.025$.
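The normal-distribution questions above (HW 6.1–6.3 and 7.1–7.3) can be checked with Python's standard-library `statistics.NormalDist`, which stands in here for the StatCrunch normal calculator. This is a sketch that assumes the population values stated in the problems. Applying the 30th percentile to the exam scores is only an illustration and is not a question from the notes.

```python
from math import sqrt
from statistics import NormalDist

# Normal (HW 6.1-6.3): entrance exam scores with mean 120 and sd 11.
exam = NormalDist(mu=120, sigma=11)
z_145 = (145 - exam.mean) / exam.stdev          # z = (x - mean) / sd
score_minus_1sd = exam.mean - 1 * exam.stdev    # x for z = -1
p30 = exam.inv_cdf(0.30)                        # 30th percentile: 30% below

std_normal = NormalDist()                       # mean 0, sd 1
within_1_28 = std_normal.cdf(1.28) - std_normal.cdf(-1.28)

# Sampling distribution of x-bar (HW 7.1-7.2): mean mu, sd sigma / sqrt(n).
se_temp = 4.2 / sqrt(51)
xbar = NormalDist(mu=67.6, sigma=se_temp)
p_above_68_1 = 1 - xbar.cdf(68.1)
p_within_1_5 = xbar.cdf(67.6 + 1.5) - xbar.cdf(67.6 - 1.5)

# Sampling distribution of p-hat (HW 7.1-7.3): mean p, sd sqrt(p(1-p)/n).
p, n = 0.57, 55
se_phat = sqrt(p * (1 - p) / n)
roughly_normal = n * p >= 15 and n * (1 - p) >= 15
p_phat_le_0_51 = NormalDist(mu=p, sigma=se_phat).cdf(0.51)

# z for a 95% CI: 2.5% in each tail, so P(Z > z) = 0.025.
z_95 = std_normal.inv_cdf(1 - 0.025)

print(round(z_145, 5), score_minus_1sd, round(within_1_28, 5), round(p30, 2))
print(round(se_temp, 5), round(p_within_1_5, 5), round(se_phat, 5), roughly_normal)
print(round(z_95, 2), round(p_above_68_1, 4), round(p_phat_le_0_51, 4))
```

Up to rounding, the printed values agree with the numbers that appear in the notes' worked answers: 2.27273, 109, 0.79945, 0.58812, 0.98924, 0.06676 and 1.96.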
## Proportions CI (HW 8.1–8.2)

A random sample of 970 people were asked if they owned a pet hamster. 19 said yes and 951 said no.

- Find a point estimate for the proportion of people who said yes.
- If the margin of error is 0.00872, find the 95% confidence interval.

Suppose that in a new sample about owning a pet hamster we get a 95% confidence interval of (0.03, 0.09).

- Can we find the sample proportion? If so, find it.
- Can we find the population proportion? If so, find it.
- Can we conclude that fewer than 12% of people own a pet hamster? How about more than 2%?
- What is the margin of error?

## CI Properties

- Increasing the level of confidence ($z$) widens the interval. Decreasing it shortens the interval. Intuition: narrowing your field for the true proportion means you're not as certain it really does fall inside the interval.
- Increasing the sample size shortens the CI. Decreasing the sample size widens it. This is because the standard error decreases as $n$ increases, so the margin of error (width) decreases as well. Intuition: a larger sample gives a more accurate estimate and lets you zero in on the true proportion.

## Summary of CI Width Factors

| Confidence level ($z$) | Sample size ($n$) |
|---|---|
| As $z$ increases, the CI widens | As $n$ increases, the CI shortens |
| As $z$ decreases, the CI shortens | As $n$ decreases, the CI widens |

## Assumptions for a Proportion CI

1. The sample is randomly selected.
2. $n\hat{p} \ge 15$.
3. $n(1-\hat{p}) \ge 15$.

## CI with Means

The general idea is the same (point estimate ± confidence level × standard error), but the formula is different:

$$\bar{x} \pm t\,\frac{s}{\sqrt{n}}$$

With proportions use $z$. With means use $t$. The $t$-values change as the degrees of freedom change (unlike the normal calculator), and the degrees of freedom are $n - 1$.

Assumptions for a CI for means:

- Random sample.
- One of these two should be true: sampling from a normal population, or $n > 30$.

## Choosing Sample Size

Idea: we have a given confidence level and a desired margin of error. What sample size is needed to achieve that? The formula is different for proportions and means.
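The proportion CI and the sample-size idea can be sketched in Python under a few assumptions. First, $z$ comes from the standard normal via `statistics.NormalDist` rather than a table. Second, the sample size uses the usual proportion formula $n = \hat{p}(1-\hat{p})(z/m)^2$, with $\hat{p} = 0.5$ when nothing is known. Third, the result is rounded up to a whole subject, which is a common convention the notes do not state. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_star(confidence):
    """z with P(-z < Z < z) = confidence, e.g. about 1.96 for 95%."""
    tail = (1 - confidence) / 2          # e.g. 2.5% left over in each tail
    return NormalDist().inv_cdf(1 - tail)


def proportion_ci(successes, n, confidence=0.95):
    """Return (p_hat, margin, lower, upper) for p_hat +/- z*sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    margin = z_star(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, p_hat - margin, p_hat + margin


def sample_size_proportion(margin, confidence=0.95, p_hat=0.5):
    """n = p_hat(1-p_hat)(z/m)^2, rounded up; p_hat = 0.5 if nothing is known."""
    z = z_star(confidence)
    return ceil(p_hat * (1 - p_hat) * (z / margin) ** 2)


# Hamster survey (HW 8.1-8.2): 19 of 970 said yes.
p_hat, margin, lo, hi = proportion_ci(19, 970)
print(round(p_hat, 5), round(margin, 5), round(lo, 5), round(hi, 5))

# Hypothetical target: a margin of error of 0.03 at 95% confidence.
print(sample_size_proportion(0.03))
```

For the hamster survey, the margin of error comes out at about 0.00872, which matches the value given in HW 8.1–8.2. The point estimate sits exactly in the center of the interval. The 0.03 margin in the last line is a made-up target.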
## Sample Size Needed Formulas

See the formula sheet.

$$n = \hat{p}(1-\hat{p})\left(\frac{z}{m}\right)^2 \quad \text{(proportions)}, \qquad n = \left(\frac{z\,s}{m}\right)^2 \quad \text{(means)}$$

- $n$: the sample size needed.
- $z$: the z-score for the confidence level (1.96 for 95%, etc.).
- $s$: the sample standard deviation.
- $m$: the desired margin of error.

What do we choose for the sample proportion $\hat{p}$?

1. The proportion from a previous study.
2. If nothing is known, $\hat{p} = 0.5$.

## Proportions Summary

- **Assumptions for a valid CI:** random sample, $n\hat{p} \ge 15$, and $n(1-\hat{p}) \ge 15$.
- **Point estimate:** $\hat{p}$. **Standard error:** $\sqrt{\hat{p}(1-\hat{p})/n}$. **Level of confidence:** use $z$.
- **Margin of error:** $z\sqrt{\hat{p}(1-\hat{p})/n}$. **Lower limit:** $\hat{p} - z\sqrt{\hat{p}(1-\hat{p})/n}$. **Upper limit:** $\hat{p} + z\sqrt{\hat{p}(1-\hat{p})/n}$.
- **Finding sample size:** $n = \hat{p}(1-\hat{p})(z/m)^2$.

## Means Summary

- **Assumptions for a valid CI:** random sample, plus one of these two: a normal population or $n > 30$.
- **Point estimate:** $\bar{x}$. **Standard error:** $s/\sqrt{n}$. **Level of confidence:** depends on $t$.
- **Margin of error:** $t\,s/\sqrt{n}$. **Lower limit:** $\bar{x} - t\,s/\sqrt{n}$. **Upper limit:** $\bar{x} + t\,s/\sqrt{n}$.
- **Finding sample size:** $n = (z\,s/m)^2$.

## Designed Experimental Study

- Manipulates the subjects somehow.
- Can be used to prove causation.
- Subjects are randomly divided into groups.
- Examples: Does a coupon attached to a catalogue make recipients more likely to order? Does a new medicine reduce the frequency of headaches?

## Observational Study

- Measures qualities of subjects without manipulating them.
- Cannot be used to prove causation, only that the variables are related.
- Subjects cannot be randomly assigned to groups.
- Examples: whether or not smoking has an effect on heart disease (we can't assign groups); whether higher SAT scores are positively correlated with higher college GPAs.

## Designed Experiments

- **Experimental unit (subject):** the person/object that receives the treatment.
- **Treatment:** a condition (drug, etc.) applied to the subject.
- **Response variable:** the variable we are interested in studying.
- **Explanatory variable:** the variable we believe influences the response.

## Designed Experiment (HW 4.1–4.4)

We are testing the effects of a new energy drink on heart rate. 50 subjects are assigned the energy drink, while a different 50 have a similar-tasting drink that is not an energy booster. The subjects' heart rates are recorded, and the researchers know which drink each subject gets.

- Identify the response, the explanatory variable, the treatments, and the experimental units.
- Is this completely randomized or matched pairs?
- If the subjects don't know which drink they get, is the study single- or double-blind?

## Experimental Designs

- **Completely randomized:** experimental units are randomly assigned to treatments, with no overlap in groups. That is, everybody gets just one treatment and nobody gets both.
- **Matched pairs:** subjects are somehow matched before the experiment happens, so that differences between the two can be measured (twins, or the same person in two treatment groups).
- **Crossover design:** a matched-pairs design in which a subject receives both treatments at some point in the experiment (e.g., the cereal lab).
- **Block:** a set of matched experimental units (subjects).
- **Randomized block design:** using blocks, but randomly assigning the order in which each block receives the treatments. This reduces possible bias (e.g., the cereal lab again, where the order was random).

## Hypotheses

- The alternative hypothesis ($H_A$) tests whether a parameter is greater than, less than, or not equal to a suspected number.
- The null hypothesis ($H_0$) sets the parameter equal to the suspected number stated in the alternative. The null is always "equals."
- We test parameters ($p$ or $\mu$). We never test statistics ($\hat{p}$ or $\bar{x}$).

Example hypotheses:

| Left-tailed | Right-tailed | Two-tailed |
|---|---|---|
| $H_0: p = 0.31$ | $H_0: p = 0.56$ | $H_0: \mu = 11$ |
| $H_A: p < 0.31$ | $H_A: p > 0.56$ | $H_A: \mu \ne 11$ |

## Hypothesis Testing Notation

- $p$ = population proportion; $p_0$ = hypothesized proportion under $H_0$; $\hat{p}$ = sample proportion; z stat = test statistic (proportions).
- $\mu$ = population mean; $\mu_0$ = hypothesized mean under $H_0$; $\bar{x}$ = sample mean; t stat = test statistic (means).
- p-value = the probability that you observe this sample proportion/mean, or one further away, when $H_0$ is true.

## P-Value Interpretation

The p-value is the probability that you could observe a given sample mean, or one further away, if in fact the null is true.

Example: $H_0: \mu = 40$, $H_A: \mu > 40$, $\bar{x} = 50$, p-value = 0.12.

- The probability of observing a sample mean of 50 or higher, if the true mean were 40, is 0.12.
- Approximately 12% of sample means can be expected to be at least this much higher than 40.

## Hypothesis Testing Steps

Hypotheses: $H_0: p = p_0$ (proportions) or $H_0: \mu = \mu_0$ (means), with $H_A$ using $>$, $<$, or $\ne$.

Conclusions:

| If p-value ≤ α | If p-value > α |
|---|---|
| Reject $H_0$ | Fail to reject $H_0$ |
| Test is significant | Test is not significant |
| There is enough evidence to suggest a change (increase or decrease) | |
| Strong evidence for $H_A$ and against $H_0$ | No strong evidence for $H_A$ or against $H_0$ |
| Possible Type I error | Possible Type II error |

## Assumptions for Testing

| Proportions | Means |
|---|---|
| Random sample | Random sample |
| $np_0 \ge 15$ and $n(1-p_0) \ge 15$ | One of these two: $n > 30$, or the population is normal |

## Errors

- **Type I:** rejecting the null when in fact you shouldn't have (the null was actually the true conclusion).
- **Type II:** failing to reject the null when in fact you should have (the alternative was the correct one).

## Hypothesis (HW 9.1–9.5)

$H_0: p = 0.5$, $H_A: p \ne 0.5$. A researcher gets […]. Of […] people, there were 22 […].

- Find the point estimate of the population proportion.
- The margin of error is […]. Find the confidence interval and interpret it.
- What level of confidence is this interval?
- Will a 95% CI be wider or narrower?

## Hypothesis Testing: 2 Dependent Samples

Example: twins used for two groups, or the same person in both groups, are dependent samples. Matched pairs is for dependent samples, so use a matched-pairs test (means only).

- $H_0: \mu_1 = \mu_2$ (or $\mu_1 - \mu_2 = 0$)
- $H_A: \mu_1 > \mu_2$ (or $\mu_1 - \mu_2 > 0$)
- $H_A: \mu_1 < \mu_2$ (or $\mu_1 - \mu_2 < 0$)
- $H_A: \mu_1 \ne \mu_2$ (or $\mu_1 - \mu_2 \ne 0$)

## Matched Pairs (HW 10.1–10.4)

Blood pressure was recorded for 3 different people before and after a medicine was taken.

| | Before | After | Difference |
|---|---|---|---|
| Subject 1 | 151 | 125 | 26 |
| Subject 2 | 167 | 136 | 31 |
| Subject 3 | 137 | 120 | 17 |
| Average | 151.6667 | 127 | 24.6667 |

Notice that the before average minus the after average equals the average difference. A matched-pairs test reduces two samples to one by taking differences, so it becomes a one-sample test with sample size 3.

## Comparing 2 Proportions or 2 Means (Independent)

Testing 2 independent proportions:

- $H_0: p_1 = p_2$ (or $p_1 - p_2 = 0$)
- $H_A: p_1 > p_2$ (or $p_1 - p_2 > 0$)
- $H_A: p_1 < p_2$ (or $p_1 - p_2 < 0$)
- $H_A: p_1 \ne p_2$ (or $p_1 - p_2 \ne 0$)

Testing 2 independent means:

- $H_0: \mu_1 = \mu_2$ (or $\mu_1 - \mu_2 = 0$)
- $H_A: \mu_1 > \mu_2$ (or $\mu_1 - \mu_2 > 0$)
- $H_A: \mu_1 < \mu_2$ (or $\mu_1 - \mu_2 < 0$)
- $H_A: \mu_1 \ne \mu_2$ (or $\mu_1 - \mu_2 \ne 0$)

## 2 Proportions (HW 10.1–10.4)

Does a new medicine help lower cholesterol? People with high cholesterol were randomly assigned to receive either the new medicine or a placebo. After 5 weeks, 106 of the 8499 on the new medicine had lower cholesterol, and 86 of the 8091 in the placebo group had lower cholesterol. Is this a significant difference?

- Set up the hypotheses.
- The p-value is 0.2673. Interpret it by choosing one option in each statement:
  - We (do / do not) have strong evidence that there are different results.
  - We (don't reject / reject) the null at α = 0.05.
  - The test (is / is not) significant.
  - The 95% confidence interval for the difference in proportions (will / will not) contain 0.
  - If we make the wrong conclusion, it would be a (Type I / Type II) error.

## 2 Means (HW 10.1–10.4)

A summary for types of sales for an iPod:

| Type of sale | Mean | n |
|---|---|---|
| Bid | 231.611 | 96 |
| Buy it now | 221.667 | 128 |

- Find the point estimate for the difference in population means.
- The p-value is 0.006. Can we conclude there's a population difference?
- Without finding the CI, determine whether it will contain 0 or not.

## Chi-Square Goodness of Fit

This test is used for testing whether category proportions/counts are equal to specified values, or whether they are all equal to each other. Example hypotheses:

- $H_0$: the proportions are as stated (e.g., 20%, 30%, 50%); $H_A$: otherwise.
- $H_0$: the proportions are all equal to one another; $H_A$: otherwise.

## Goodness of Fit Steps

1. List hypotheses: $H_0$: the proportions are as claimed; $H_A$: otherwise.
2. Check assumptions: (A) the sample is randomly selected; (B) each expected cell count is at least 5.
3. Compute the test statistic: $X^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$.
4. State the degrees of freedom: $df = c - 1$ (the number of cells minus 1).
5. Find the p-value on the chi-square distribution with $df = c - 1$ by finding the probability above $X^2$.
6. State the conclusion.

## Computing the P-Value

Suppose we have 6 categories and a computed $X^2$. Then $df = 6 - 1 = 5$. Look up the probability above the test statistic $X^2$, never below.

## Conclusions for Goodness of Fit

| If p-value ≤ α | If p-value > α |
|---|---|
| Reject $H_0$ | Fail to reject $H_0$ |
| Test is significant | Test is insignificant |
| There is enough evidence to suggest the proportions are different than what's specified | There is insufficient evidence to suggest the proportions are different than what's specified |
| Possible Type I error | Possible Type II error |

## Goodness of Fit (HW 11.1–11.2)

It is thought that a certain type of cookie box should contain the following percentages of three varieties: Chocolate Chip 40%, Oatmeal 30%, Sugar 30%. A box is selected at random and opened, and the observed counts of its 50 cookies are recorded.

- Compute the expected category counts.
- Will we have a valid chi-square test?
- True/False: the degrees of freedom for this problem would be 50 − 1 = 49.

We want to see if a 20-sided die is fair (balanced). To investigate this, we roll it 80 times and record the number of times each face comes up. The $X^2$ statistic is 21.007. Fill in the calculator boxes to find the p-value.

## Chi-Square Test for Independence

This test is used on an $r \times c$ contingency table to test whether there's an association between the categorical variables (contingency tables and association are from Test 1).

- The null hypothesis is that the explanatory and response variables are independent.
- The alternative hypothesis is that there is an association between them.

## Worked Answers

### Histogram Interpretation

Class frequencies for "Sampled IQs of 7th Graders": 90–99: 60; 100–109: 82; 110–119: 60; 120–129: 41.

- Total students: $60 + 82 + 60 + 41 = 243$.
- Highest frequency: 100–109 with 82. Lowest frequency: 120–129 with 41.
- Students with an IQ between 110 and 129: $60 + 41 = 101$.
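The next exercise in the deck is the stem-and-leaf plot for the eBay prices (HW 2.1–2.2). Below is a minimal Python sketch that applies the stem-and-leaf rules from the notes to those prices. It assumes non-negative integer data, and `stem_and_leaf` is a hypothetical helper name.

```python
def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers.

    Stem = all digits except the last one; leaf = the last digit.
    Leaves are in ascending order, and stems with no leaves are kept blank.
    """
    data = sorted(values)
    stems = {}
    for v in data:
        stem, leaf = divmod(v, 10)  # e.g. 199 -> stem 19, leaf 9
        stems.setdefault(stem, []).append(leaf)
    rows = []
    for stem in range(data[0] // 10, data[-1] // 10 + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


# eBay selling prices (HW 2.1-2.2).
prices = [199, 210, 210, 223, 225, 225, 225, 228, 232, 235]
for row in stem_and_leaf(prices):
    print(row)
# Prints:
# 19 | 9
# 20 |
# 21 | 0 0
# 22 | 3 5 5 5 8
# 23 | 2 5
```

Row 20 is printed with no leaves, following the "write the row but leave it blank" rule.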
Example HW212 2 eBay selling prices 199210210223225 225 225 228 232 235 Sampling Methods Simple Random Sampling Each subject everywhere has an equally likely chance of being selecte Often done with a random numbertable Choosing a company somewhere in the US Systematic Selecting every k th subject Surveying every 10m person we meet downtown Convenience Individuals are easily found eg internet surveys Often the laziest way so less reliable answers Sampling Methods Strati ed Sampling 39 Cluster Sampling Taking some Taking all subjects from subjects from all some POSSible QTOUPS possible groups A Symm Ek ic skewed Left Skewed Ri gin m ewi m edran mean lt medial m eau gt m 21111 9099 100109 110119 120429 Outliers The mean is sensitive to outliers The median is resistant to outliers When outliers are present best to use median as measure of central tendency Example average selling price of homes in the US Standard Deviation The average distance between any data point and the mean of the data Measures how muchlittle the data distribution is spread out in an hi gym standard deviation Summary Stats Interpretation Mean Average ofthe data set Median also called Q2 About 50 of data lie below and above his value Range Difference between maximum and minimum Max amp Min Highest and lowest points in data set Q1 and Q3 25 and 75 percentiles lnterquartile Range IQR Difference between Q3 and Q1 BoxPlot HW 2526 Distribution of taxes in cents Minimum 26 Q3 105 Q1 31 Maximum 206 Median 55 Construct a boxplot forthis data What proportion of states have taxes Greaterthan 31 cents Greater than 1 05 105 cents Find the range and the interquartile range IQR BoxPlot HW 2526 25 l 25 25 25 16 51 SS 105 206 Greaterthan 31 cents 75 Greater than 105 25 Range max min 206 26 2034 lQRQ3 Q1 105 31 74 IQR range for the middle half of the data BOXPlot Outlier Test HW 2526 Any point lying above Q3 15 x lQR is an outlier Any point lying below Q1 15x lQR is also an outlier Are there any outliers on this 
boxplot J 256 SM 1105 320000 Q1 15 x IQR 256 15 x 1105 256 10175 Because there are no points beneath this cutoff we have no lower outliers Q3 15 x IQR 1105 15 x 1105 256 23785 Because the max is greaterthan this cutoff 320000 gt 23785 we have an upper outlier Mean amp Median HW 23 24 This chart shows the number of grams of protein in various brands of loafs of bread Compute the mean and median ofthe data set What can you say about the shape of the distribution Proteing Count 0 15 1 16 2 21 3 4 Total 56 mean Mean amp Median HW 23 24 Forthe median nd halfthe Momma Count total count about 28 so we need to nd where 0 1 bread 28 is 1 15 It s not in Row 0 since we 1 31 have the rst 15 only 3 4 After Row1we have15 16 31 loafs Total 36 o Median 1 since bread 28 falls in Row 1 Mean gt median gt somewhat skewed right OX151X162gtlt213gtlt4 56 125 Empirical Rule Only used for bell shaped distributions Within one standard deviation from the mean we have 68 of all data points Within two standard deviations from the mean we have 95 of all data points Empirical Rule Within three standard deviations from the mean we have almost all data points Anything else is an outlier SUMMARY 1 s 68 2 s 95 3 s Almost all data point mean s st dev Z The number of standard deviations away from the mean is the Zsoore Zsoore negative data point below mean Zsoore positive data point above mean Data point is an outlier if a Zscore gt 3 or b Zscore lt 3 Example HW 2324 The weight of an armadillo is bellshaped with mean 14 ounces and standard deviation 25 Find an interval within which about 95 of armadillo weights will fall By the Empirical Rule we go out 2 deviations from the mean 14 2gtlt25142gtlt25919 What zscore represents an armadillo that is 28 standard deviations to the right of the mean What weight is that z 28 14 x xgt28x gt x 282514 21 Relative Risk HW 31 conditional proportion for one group larger number conditional proportion for another group smaller number Relative risktells us how many 
times more likely the outcome is for one group than the other group The following three facts therefore follow 1 Relative risk 1 2 When the numerator and denominator proportions are very similar relative risk will be very close to 1 3 However when the numerator is quite a bit larger then relative risk will be quite a bit greater than 1 relative risk Adj gtiriie39nt 1331 at Total Orientation 72 1 4 86 28 45 73 7 No Orientation Total 1 00 59 159 Find the proportion of orienta ionstudents that adjusted well 7286 083721 Find the proportion of no orientationstudents that adjusted well 2873 038356 Find the relative risk of adjusting well to college for both groups of 5 students Look at the Good Adjustment proportion LargerSmaller 083721 038356 218272 Students that did orientation are 218272 times more likely to adjust well to college than students that did not do orientation Adjigtiir ijent AdiEsiiiiani Total Orientation 72 1 4 86 N Orientoa pn 28 45 73 Total 1 00 59 1 59 Find the probability that someone selected did orientation and adjusted well 72 159 Find the probability that someone selected did orientation or 4 I159 adjusted well 72 28 39 If a subject selected adjusted pooriy what s the probability heshe did not do orientation 4559 Scatter Plots Van vaiz 25 t m 2 5 2 I 6 Strong correlation Weak correlation 8 Correlation r 1 5 r5 1 If r is positive then so is the slope If r is negative then so is the slope Closer r is to 1 or 1 strong correlation Closer r is to 0 weak correlation ris unitless r measures only LINEAR relationship r is not a proof of causality LeastSquares Regression 31abx x given data point 31 predicted response a intercept Predicted response when x 0 May not always have a practical interpretation b slope Slope is how much the predicted response increases or decreases for every unit increase in x Residual observed predicted or y Regression HW 3234 Analysis says that we can use the length of an alligator in feet to predict its weight in pounds The 
equation is given by y 10 40x Find the expected weight ofan alligator that s 10 feet long 104010 410 pounds Suppose an alligator hat s 10 feet long actually weighs 402 pounds Calculate the residual Observed Predicted 402 410 8 so we overes imated Interpret the slope For every additional foot in length an alligator s weight is expected to increase b unds Interpret the intercept Literally an alligatorwith leng h 0 will weigh 10 poundsmakes no sense So the intercept has no interpretation here Probability Probability is the likelihood of a particular outcome occurring a 3116 entcanoccur mobablhwwysv poss1ble outcomes Example probability of drawing a club from a deck of cards 13 clubs 1 p 7 15 52 total cards 4 A complement All possible events that are not in A Example A it s raining AB it s not raining Complement probability F AC 1 F A Probability HW 51 53 We have an urn full of 12 blue 10 red and 8 black marbles We reach in and draw a marble at random What s the probability of drawing a marble that s one of UGA s colors 30 total marbles and 10 8 18 of them are red or black 6 30 If the marble drawn was a UGA color what s the probability it was not red Out of 18 marbles of UGA colors red or black 8 of them are black so not red 8 18 44444 Discrete Probability Distribution Two requirements 1 Each individual px is between 0 and 1 inclusive 2 All probabilities sum to 1 The mean ofa discrete distribution MEAN Zx px Mean of a Distribution HW6162 Here s a table for the probability of different category hurricanes Add up and get 26 The meanexpected value is the category strength of a hurricane we will expect to see on average lfwe average a large number of hurricanes the longrun average will be about 26 Normal Distribution Continuous Derrle n v Three rules Total probability ms area underthe normal curve is 1 n2 2 Normal curve is symmetric m 3 Xvalue goes in left box probability n 3 2 D 1 2 3 goes in right box on 39 39 39 x 7 5m Dev 171 StatCrunch 7 413 M j gum mm V Normal HW 
#### Normal HW 6.1-6.3

A study of entrance exam scores has mean 120 and standard deviation 11. Anything above 145 is considered superior.

- Find the z-score for a score of 145 and interpret it: $z = \frac{x - \mu}{\sigma} = \frac{145 - 120}{11} = 2.27273$. A score of 145 is 2.27 standard deviations above the mean.
- What score is 1 standard deviation to the left of the mean? Try to think of a strategy for this. Use z = -1 (negative because it's below the mean): $x = \mu + z\sigma = 120 + (-1)(11) = 109$.

#### Area Between Two Lines

Find the probability within 1.28 standard deviations of the mean. Make a sketch of this. Each tail beyond ±1.28 holds about 0.10027, so $P(-1.28 < Z < 1.28) \approx 0.79945$.

(Figure: standard normal curve shaded between -1.28 and 1.28.)

#### Percentiles

The Pth percentile is the x that gives P% below it on the normal curve (always below). Example: in the sketch, x is the 30th percentile because 30% of the data falls below x.

#### 3 Types of Distributions (Means Problem)

1. Population: distribution of all points in the population.
2. Sample data (data distribution): distribution of one particular sample.
3. Sampling distribution of sample means: distribution of the sample means of a given size n.

| | Mean | St. dev. |
|---|---|---|
| Population | μ | σ |
| Sample data | $\bar{x}$ | s |
| Sampling distribution | μ | $\sigma/\sqrt{n}$ |

#### Distribution Shapes

- The sample data (data distribution) has the same shape as the population. If the population is skewed, so is the sample data.
- The sampling distribution of sample means is normal if the population is normal, or if n > 30 (by the Central Limit Theorem). Otherwise, no conclusion about shape.

#### Two Important Properties

As the sample size n increases:

- The mean of the sampling distribution of sample means does not change.
- The standard error (sd) of the sampling distribution, $\sigma/\sqrt{n}$, decreases: a larger denominator gives a smaller overall fraction.

#### Distributions HW 7.1-7.2

The lifetime of a certain type of tumble dryer until failure has a distribution skewed right with mean 62 months and sd 11. A sample of 98 dryers is selected, and this sample has mean 57.2 and sd 12.1.

- What are the center and spread for the population? Center: 62; spread: 11.
- What shape is the population? Skewed right.
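The z-score, area-between-two-lines, and standard-error calculations from the exam and dryer problems can be reproduced with Python's `statistics.NormalDist` (Python 3.8+), used here as a stand-in for the StatCrunch normal calculator. The 30th-percentile line is an added illustration that applies the percentile idea to the exam-score distribution; the notes' percentile slide only shows a sketch.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 120, 11               # entrance exam scores (Normal HW 6.1-6.3)
exam = NormalDist(mu, sigma)

z_145 = (145 - mu) / sigma        # z = (x - mu) / sigma
x_one_below = mu + (-1) * sigma   # x = mu + z * sigma with z = -1

Z = NormalDist()                  # standard normal: mean 0, sd 1
within_128 = Z.cdf(1.28) - Z.cdf(-1.28)   # area between -1.28 and 1.28

# Illustration: 30th percentile, the score with 30% of the distribution below it.
p30 = exam.inv_cdf(0.30)

# Dryer problem: sd of the sampling distribution of the mean for n = 98.
se_dryers = 11 / sqrt(98)

print(f"z = {z_145:.5f}, x = {x_one_below}, area = {within_128:.5f}")
print(f"30th percentile = {p30:.2f}, standard error = {se_dryers:.5f}")
```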
#### Distributions HW 7.1-7.2 (continued)

- What are the center and spread for the sample data selected? Center: 57.2; spread: 12.1.
- What shape is the sample data? Skewed right (same as the population).
- What are the center and spread for the sampling distribution of sample means with size 98? Center: 62; spread: $11/\sqrt{98} = 1.11117$.
- What shape is the sampling distribution of sample means? Normal, by the Central Limit Theorem (n > 30).

#### StatCrunch HW 7.1-7.2

The average household temperature in Chattanooga is 67.6 degrees and the sd is 4.2. A sample of 51 households is selected.

- What's the probability the average of this sample will be above 68.1? Fill in the inputs in the StatCrunch normal calculator for the sampling distribution with n = 51: mean = 67.6 (the population's), SD = standard error = $4.2/\sqrt{51} = 0.58812$.
- What's the probability the average of the sample will be within 1.5 degrees of the population mean? Hint: draw a sketch, and think of a strategy for answering this. Between 66.1 and 69.1, each tail holds about 0.00538, so the probability is about 0.98924.

(Figure: StatCrunch normal calculator with mean 67.6 and std. dev. 0.58812.)

#### Proportions Problem

| | Mean | St. error |
|---|---|---|
| Population | p | |
| Sample data | $\hat{p}$ | |
| Sampling distribution | p | $\sqrt{p(1-p)/n}$ |

The sampling distribution of the sample proportion $\hat{p}$ is normal when $np \ge 15$ and $n(1-p) \ge 15$.

#### Proportions HW 7.1-7.3

57% of students at an academy are female. In a random sample of 55 students, 26 of them are female. Let 1 = female and 0 = male.

- Identify the population distribution of gender: P(X = 1) = 0.57 and P(X = 0) = 1 - 0.57 = 0.43.
- Identify the data distribution of gender: X = 1 with proportion 26/55 = 0.47273, and X = 0 with proportion 1 - 0.47273 = 0.52727.
- What are the mean and standard error of the sampling distribution of the sample proportion? Mean = 0.57; standard error = $\sqrt{0.57(0.43)/55} = 0.06676$.
- Is the sampling distribution approximately normal? np = 55(0.57) = 31.35 and n(1 - p) = 55(0.43) = 23.65. Yes, both are greater than 15.
- Given that the mean is 0.57 and the standard error is 0.06676, find the probability that a sample of size 55 has a sample proportion of 0.51 or less.
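The sampling-distribution probabilities from the temperature and academy problems can be sketched with `statistics.NormalDist` in place of StatCrunch. This assumes, as the notes do, that both sampling distributions are approximately normal (n = 51 > 30 for the mean; np and n(1 - p) at least 15 for the proportion).

```python
from math import sqrt
from statistics import NormalDist

# Household temperatures: population mean 67.6, sd 4.2, samples of n = 51.
se_temp = 4.2 / sqrt(51)
xbar_dist = NormalDist(67.6, se_temp)   # sampling distribution of the sample mean
p_above = 1 - xbar_dist.cdf(68.1)       # P(sample mean > 68.1)
p_within = xbar_dist.cdf(67.6 + 1.5) - xbar_dist.cdf(67.6 - 1.5)

# Academy: population proportion female p = 0.57, samples of n = 55.
p, n = 0.57, 55
se_phat = sqrt(p * (1 - p) / n)
normal_ok = n * p >= 15 and n * (1 - p) >= 15   # condition from the notes
p_le_051 = NormalDist(p, se_phat).cdf(0.51)     # P(p-hat <= 0.51)

print(f"SE = {se_temp:.5f}, above 68.1: {p_above:.4f}, within 1.5: {p_within:.5f}")
print(f"SE = {se_phat:.5f}, normal OK: {normal_ok}, p-hat <= 0.51: {p_le_051:.4f}")
```

Note that the spread passed to `NormalDist` is the standard error, not the population sd: using 4.2 directly would describe one household, not the average of 51.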
Using the StatCrunch normal calculator with mean 0.57 and std. dev. 0.06676: $P(\hat{p} \le 0.51) \approx 0.1844$.

#### True/False Questions

- Probability is always between -1 and 1, while correlation is always between 0 and 1. False: it's the other way around.
- On the normal distribution, the probability above z = 2 is equal to the probability below z = -2. True, because the normal is symmetric.
- As sample size decreases, standard error increases. True, because the opposite statement is: as sample size increases, standard error decreases.
- When calculating the probability that a sample average will fall above/below a given number, we have to divide the population mean by the square root of the sample size. False: we divide the population standard deviation by $\sqrt{n}$.

#### Notation

We use different letters for population parameters versus sample statistics:

| Population parameter | Sample statistic |
|---|---|
| μ: population mean | $\bar{x}$: sample mean |
| σ: population st. dev. | s: sample st. dev. |
| p: population proportion | $\hat{p}$: sample proportion |

Know the differences among these.

#### Notation HW 7.1-7.3

- What symbol is used to denote the population mean? μ (population mean).
- What symbol is used to estimate the population proportion? $\hat{p}$ (the sample proportion estimates p).
- What symbol is used to describe the spread in one sample? s (sample standard deviation).

#### Confidence Intervals

- Calculate the sample mean/proportion in your sample: point estimate = sample mean/proportion.
- Calculate the width based on the level of confidence and the standard error.
- You get a range of plausible values for the true population mean/proportion.

#### CI for Proportions

CI = point estimate ± (level of confidence)(standard error):

$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

- $\hat{p}$: point estimate
- z: depends on the confidence level
- $\sqrt{\hat{p}(1-\hat{p})/n}$: standard error
- $z\sqrt{\hat{p}(1-\hat{p})/n}$: margin of error (half the width of the CI)

#### Properties of a CI

- The sample proportion/mean is ALWAYS inside the confidence interval. In fact, it's always right in the center.
- The population proportion/mean may or may not be inside the confidence interval.

#### Interpretation of a CI

A 95% CI means that about 95% of all CIs constructed contain the true population proportion/mean, and about 5% do not (long-run definition).
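The long-run definition of a 95% CI can be illustrated by simulation: draw many samples from a population whose true proportion is known, build $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$ for each, and count how many intervals capture the truth. The true proportion (0.4), sample size (200), number of intervals (1000), and seed are hypothetical choices for this sketch; the printed fraction should land near 0.95, but the exact value depends on the random draws.

```python
import random
from math import sqrt

def wald_interval(successes, n, z=1.96):
    """p-hat +/- z * sqrt(p-hat (1 - p-hat) / n)."""
    p_hat = successes / n
    moe = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

def coverage(true_p, n, reps, seed=1):
    """Fraction of simulated 95% intervals that contain true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One simulated random sample: count successes among n subjects.
        successes = sum(rng.random() < true_p for _ in range(n))
        lo, hi = wald_interval(successes, n)
        hits += lo <= true_p <= hi
    return hits / reps

# Hypothetical settings: true proportion 0.4, samples of 200, 1000 intervals.
print(coverage(true_p=0.4, n=200, reps=1000))
```

Any single interval either contains 0.4 or it doesn't; the 95% describes the long-run fraction, which is exactly what this loop counts.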
We are 95% confident the true proportion lies somewhere inside our CI (definition of an individual interval).

A 99% CI means that about 99% of all CIs constructed contain the true population proportion/mean, and about 1% do not.

Example with 1000 intervals: at 95%, about 950 (maybe 940 to 960) contain the true proportion. At 99%, about 990 (maybe 985 to 995) contain the true proportion.

#### Determining z

z depends on the level of confidence; for a 95% CI, z = 1.96. To get these numbers: at 95%, 5% is left over, and half of that is 2.5%, so find the z with P(Z > z) = 0.025 in StatCrunch (mean 0, std. dev. 1).

#### Proportions CI HW 8.1-8.2

A random sample of 970 people were asked if they owned a pet hamster: 19 said yes and 951 said no.

- Find a point estimate for the proportion of people who said yes: $\hat{p} = 19/970 = 0.01959$.
- If the margin of error is 0.00872, find the 95% confidence interval: 0.01959 ± 0.00872 = (0.01087, 0.02831).

Suppose in a new sample for owning a pet hamster we get a 95% confidence interval of (0.03, 0.09).

- Can we find the sample proportion? If so, find it. (0.03 + 0.09)/2 = 0.06, because the sample proportion is always in the center.
- Can we find the population proportion? If so, find it. We cannot, because the population proportion is unknown. It may or may not be inside the interval.
- Can we conclude that fewer than 12% of people own a pet hamster? Yes, because 0.12 lies above this interval. How about more than 2%? Yes, because 0.02 lies below this interval.
- What is the margin of error? The difference between an endpoint and the center: 0.09 - 0.06 = 0.03.

#### CI Properties

- Increasing the level of confidence (z) widens the interval; decreasing the level of confidence (z) shortens the interval. Intuition: narrowing your field for the true proportion means you're not as certain it really does fall inside the interval.
- Increasing the sample size shortens the CI; decreasing the sample size widens the CI. This is because the standard error decreases as n increases, so the margin of error (width) decreases as well. Intuition: a larger sample size gives a more accurate estimate and allows you to zero in on the true proportion.
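The hamster interval and the "higher confidence means wider" property can be checked together in a short Python sketch. The helper names `z_for` and `proportion_ci` are illustrative; `z_for` finds the z that leaves half of the leftover probability in the upper tail, the same recipe as the "Determining z" step, using `statistics.NormalDist.inv_cdf`.

```python
from math import sqrt
from statistics import NormalDist

def z_for(confidence):
    """z with (1 - confidence) / 2 in the upper tail, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_ci(successes, n, confidence=0.95):
    """Point estimate, margin of error, and (lower, upper) limits."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    moe = z_for(confidence) * se
    return p_hat, moe, (p_hat - moe, p_hat + moe)

# Hamster survey (Proportions CI HW 8.1-8.2): 19 of 970 said yes.
for conf in (0.90, 0.95, 0.99):
    p_hat, moe, (lo, hi) = proportion_ci(19, 970, conf)
    print(f"{conf:.0%}: p-hat={p_hat:.5f} moe={moe:.5f} CI=({lo:.5f}, {hi:.5f})")
```

The 95% row reproduces the margin of error 0.00872 and interval (0.01087, 0.02831) from the homework; the 90% and 99% rows show the interval narrowing and widening around the same center.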
#### Summary of CI Width Factors

| | Confidence level (z) | Sample size (n) |
|---|---|---|
| Increases | CI widens | CI shortens |
| Decreases | CI shortens | CI widens |

#### Assumptions for a Proportion CI

1. The sample is randomly selected.
2. $n\hat{p} \ge 15$.
3. $n(1-\hat{p}) \ge 15$.

#### CI with Means

Same general idea: point estimate ± (level of confidence)(standard error). But we have a different formula:

$\bar{x} \pm t\frac{s}{\sqrt{n}}$

With proportions, use z. With means, use t.

T-values change as the degrees of freedom change (unlike the normal calculator). Degrees of freedom = n - 1.

Assumptions for doing a CI for means:

- Random sample.
- One of these two should be true: sampling from a normal population, or n > 30.

#### Choosing Sample Size

Idea: we have a given confidence level and a desired margin of error. What sample size is needed to achieve that? The formula is different for proportions and means (see formula sheet).

Sample size needed for proportions: $n = \frac{\hat{p}(1-\hat{p})z^2}{m^2}$

- n: sample size we need
- z: level of confidence (1.96, etc.)
- m: margin of error we want to have
- $\hat{p}$: guess for the sample proportion

Sample size needed for means: $n = \frac{\sigma^2 z^2}{m^2}$

- n: sample size needed
- z: z-score for the confidence level
- σ: standard deviation, estimated with a sample standard deviation s
- m: desired margin of error

What do we choose for the sample proportion?

1. The proportion from a previous study.
2. If nothing is known, use $\hat{p} = 0.50$.

#### Proportions Summary

Assumptions for a valid confidence interval:

- Random sample
- We need $n\hat{p} \ge 15$
- We need $n(1-\hat{p}) \ge 15$

Confidence interval:

- Point estimate: $\hat{p}$
- Standard error: $\sqrt{\hat{p}(1-\hat{p})/n}$
- Level of confidence: use z
- Margin of error: $z\sqrt{\hat{p}(1-\hat{p})/n}$
- Lower limit: $\hat{p} - z\sqrt{\hat{p}(1-\hat{p})/n}$
- Upper limit: $\hat{p} + z\sqrt{\hat{p}(1-\hat{p})/n}$

Finding sample size: $n = \frac{\hat{p}(1-\hat{p})z^2}{m^2}$

#### Means Summary

Assumptions for a valid confidence interval:

- Random sample
- One of these two: normal population, or n > 30

Confidence interval:

- Point estimate: $\bar{x}$
- Standard error: $s/\sqrt{n}$
- Level of confidence: depends on t
- Margin of error: $t \cdot s/\sqrt{n}$
- Lower limit: $\bar{x} - t \cdot s/\sqrt{n}$
- Upper limit: $\bar{x} + t \cdot s/\sqrt{n}$

Finding sample size: $n = \frac{\sigma^2 z^2}{m^2}$

#### Designed Experimental Study

- Manipulates the subjects somehow.
- Can be used to prove causation.
- Subjects are randomly divided into groups.
- Examples: Does a coupon attached to a catalogue make recipients more likely to order? Does a new medicine reduce the frequency of headaches?
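Going back to the sample-size formulas in the proportions and means summaries, a minimal Python sketch is below. Rounding up to the next whole subject is the usual convention and is an assumption here, since the slides do not state it. The targets in the example calls (margin of error 0.03, sd about 15, margin of error 2) are hypothetical.

```python
from math import ceil

def n_for_proportion(p_guess, z, m):
    """n = p(1 - p) z^2 / m^2, rounded up to a whole subject."""
    return ceil(p_guess * (1 - p_guess) * z**2 / m**2)

def n_for_mean(sigma, z, m):
    """n = sigma^2 z^2 / m^2, rounded up to a whole subject."""
    return ceil(sigma**2 * z**2 / m**2)

# Hypothetical: 95% confidence (z = 1.96), margin of error 0.03,
# nothing known about the proportion, so p-hat = 0.50.
print(n_for_proportion(0.50, 1.96, 0.03))

# Hypothetical: standard deviation about 15, margin of error 2, 95% confidence.
print(n_for_mean(15, 1.96, 2))
```

Using $\hat{p} = 0.50$ maximizes $\hat{p}(1-\hat{p})$, so it gives the most conservative (largest) sample size when nothing is known.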
#### Observational Study

- Measures qualities of subjects without manipulating them.
- Cannot be used to prove causation, only that the variables are related.
- Subjects cannot be randomly assigned to groups.
- Examples: Whether or not smoking has an effect on heart disease (we can't assign groups). Are higher SAT scores positively correlated with higher college GPAs?

#### Designed Experiments

- Experimental unit (subject): the person/object that receives the treatment.
- Treatment: a condition, drug, etc. applied to the subject.
- Response variable: the variable we are interested in studying.
- Explanatory variable: the variable we believe influences the response.

#### Designed Experiment HW 4.1-4.4

We are testing the effects of a new energy drink on heart rate. 50 subjects are assigned the energy drink, while a different 50 have a similar-tasting drink that is not an energy booster. The subjects' heart rates are recorded, and the researchers know which drink each subject gets.

- Response: heart rate. Explanatory variable: type of drink received.
- Treatments: energy and generic drinks.
- Experimental units: 100 subjects.
- Is this completely randomized or matched pairs? Completely randomized (no overlap in groups).
- If the subjects don't know which drink they get, is the study single- or double-blind? Single, since the researchers know.

#### Experimental Designs

- Completely randomized design: experimental units are randomly assigned to treatments, with no overlap in groups. That is, everybody gets just one treatment; nobody gets both.
- Matched pairs design: subjects are somehow matched before the experiment happens, for measuring differences between the two treatments. Examples: twins, or the same person in two treatment groups.
- Crossover design: a matched pairs design where a subject receives both treatments at some point in the experiment (e.g., the cereal lab).
- Block: a set of matched experimental units (subjects).
- Randomized block design: using blocks, but randomly assigning the order in which each block receives the treatments. This reduces possible bias (cereal lab again: the order was random).
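The randomization in a completely randomized design and in a randomized block (crossover-style) design can be sketched with Python's `random` module. The subject IDs, block names, and seeds are hypothetical; the point is the structure: shuffle once and split for a completely randomized design, versus giving every block all treatments in its own random order.

```python
import random

def completely_randomized(subjects, seed=0):
    """Shuffle, then split in half: each subject gets exactly one treatment."""
    rng = random.Random(seed)
    pool = list(subjects)   # copy so the caller's list is not reordered
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

def randomized_block_order(blocks, treatments, seed=0):
    """Each block receives every treatment, in its own random order."""
    rng = random.Random(seed)
    return {block: rng.sample(treatments, k=len(treatments)) for block in blocks}

# Energy drink HW: 100 hypothetical subject IDs, 50 per treatment.
energy, generic = completely_randomized(range(1, 101))
print(len(energy), len(generic))

# Crossover-style blocks: each hypothetical subject tries both drinks.
print(randomized_block_order(["S1", "S2", "S3"], ["energy", "generic"]))
```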
#### Hypotheses

- The alternative hypothesis $H_A$ tests whether a parameter is greater than, less than, or not equal to a suspected number.
- The null hypothesis $H_0$ sets the parameter equal to the suspected number stated in the alternative. The null is always "equals".
- We test parameters (p or μ). We never test statistics ($\hat{p}$ or $\bar{x}$).

Example hypotheses:

| Left-tailed | Right-tailed | Two-tailed |
|---|---|---|
| $H_0: p = 0.31$ | $H_0: p = 0.56$ | $H_0: \mu = 11$ |
| $H_A: p < 0.31$ | $H_A: p > 0.56$ | $H_A: \mu \ne 11$ |

#### Hypothesis Testing Notation

- p: population proportion
- $p_0$: hypothesized proportion under $H_0$
- $\hat{p}$: sample proportion
- z stat: test statistic (proportions)
- μ: population mean
- $\mu_0$: hypothesized mean under $H_0$
- $\bar{x}$: sample mean
- t stat: test statistic (means)
- p-value: probability that you observe this sample proportion/mean or one further away, when $H_0$ is true

#### P-Value Interpretation

The p-value is the probability that you could observe a given sample mean or one further away if, in fact, the null is true.

Example: $H_0: \mu = 40$, $H_A: \mu > 40$, $\bar{x} = 50$, p-value = 0.12. The probability of observing a sample mean of 50 or higher if the true mean were 40 is 0.12. Approximately 12% of sample means can be expected to be at least this much higher than 40.

#### Hypothesis Testing Steps

| | Proportions | Means |
|---|---|---|
| Hypotheses | $H_0: p = p_0$; $H_A: p >, <, \ne p_0$ | $H_0: \mu = \mu_0$; $H_A: \mu >, <, \ne \mu_0$ |
| Test statistic | $z = \frac{\hat{p} - p_0}{se}$, where $se = \sqrt{p_0(1-p_0)/n}$ | $t = \frac{\bar{x} - \mu_0}{se}$, where $se = s/\sqrt{n}$ |
| Then | p-value, conclusions | p-value, conclusions |

#### Conclusions

| If p-value ≤ α (alpha) | If p-value > α |
|---|---|
| Reject $H_0$ | Fail to reject $H_0$ |
| Test is significant | Test is not significant |
| There is enough evidence to suggest a change (increase, decrease, etc.) | There is insufficient evidence to suggest a change |
| Strong evidence for $H_A$ | No strong evidence for $H_A$ |
| Strong evidence against $H_0$ | No strong evidence against $H_0$ |
| Possible Type I error | Possible Type II error |

#### Assumptions for Testing

| Proportions | Means |
|---|---|
| Random samples | Random samples |
| $np_0 \ge 15$ and $n(1-p_0) \ge 15$ | One of these two: n > 30, or the population is normal |

#### Errors

- Type I: reject the null when in fact you shouldn't have (the null was actually the true conclusion).
- Type II: fail to reject the null when in fact you should have (the alternative was the correct one).
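The proportions column of the testing steps, together with the conclusion table, can be sketched as a small Python function. The function names and the string labels for the alternative are illustrative. The example call uses the HW 9.1-9.5 setting that follows (22 successes out of 40, $H_0: p = 0.5$ versus $H_A: p \ne 0.5$), where the notes report z = 0.632 and a p-value of 0.527.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """z = (p-hat - p0) / sqrt(p0 (1 - p0) / n) and its p-value."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
    z = (p_hat - p0) / se
    Z = NormalDist()
    if alternative == "greater":
        p_value = 1 - Z.cdf(z)
    elif alternative == "less":
        p_value = Z.cdf(z)
    else:  # two-sided: at least this far from p0 in either direction
        p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

def decide(p_value, alpha=0.05):
    """Conclusion wording from the table above."""
    if p_value <= alpha:
        return "reject H0 (significant; possible Type I error)"
    return "fail to reject H0 (not significant; possible Type II error)"

z, p_value = one_prop_z_test(22, 40, 0.5)
print(f"z={z:.3f} p-value={p_value:.3f} -> {decide(p_value)}")
```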
#### Hypothesis HW 9.1-9.5

$H_0: p = 0.5$ versus $H_A: p \ne 0.5$. A researcher gets z = 0.632 and a p-value of 0.527.

- What is her decision? The test is insignificant: fail to reject the null; no strong evidence for the alternative; possible Type II error.
- Out of 40 people there were 22 successes. Find the point estimate: $\hat{p} = 22/40 = 0.55$. The result is insignificant: there is insufficient evidence that the population proportion differs from 0.5.
- Given the margin of error, find the confidence interval and interpret it: point estimate ± margin of error. It is plausible the true proportion could be 0.5 (the null value), since 0.5 is inside the interval, so we can't rule it out. This is equivalent to failing to reject the null.
- Will a 99% confidence interval be wider or narrower than the 95% one? Wider: higher confidence means a wider interval.

#### Hypothesis Testing: 2 Dependent Samples

Two populations and dependent samples. Example: twins used for two groups, or the same person in both groups, means the samples are dependent. Use a matched pairs test (means); matched pairs is for dependent samples.

- $H_0: \mu_d = 0$
- $H_A: \mu_d \ne 0$, or $H_A: \mu_d > 0$, or $H_A: \mu_d < 0$
- Test statistic: $t = \frac{\bar{x}_d - 0}{s_d/\sqrt{n}}$

#### Matched Pairs HW 10.1-10.4

Blood pressure was recorded for 3 different people, taken before and after a medicine.

| | Before | After | Difference |
|---|---|---|---|
| Subject 1 | 151 | 125 | 26 |
| Subject 2 | 167 | 136 | 31 |
| Subject 3 | 137 | 120 | 17 |
| Average | 151.6667 | 127 | 24.6667 |

The before average minus the after average equals the difference average. A matched pairs test reduces two samples to one by taking differences, so it reduces to a one-sample test with sample size 3.

#### Comparing 2 Proportions or 2 Means (Independent)

Testing 2 independent proportions:

- $H_0: p_1 = p_2$ or $p_1 - p_2 = 0$
- $H_A: p_1 > p_2$ or $p_1 - p_2 > 0$
- $H_A: p_1 < p_2$ or $p_1 - p_2 < 0$
- $H_A: p_1 \ne p_2$ or $p_1 - p_2 \ne 0$

Testing 2 independent means:

- $H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
- $H_A: \mu_1 > \mu_2$ or $\mu_1 - \mu_2 > 0$
- $H_A: \mu_1 < \mu_2$ or $\mu_1 - \mu_2 < 0$
- $H_A: \mu_1 \ne \mu_2$ or $\mu_1 - \mu_2 \ne 0$

#### 2 Proportions HW 10.1-10.4

Does a new medicine help lower cholesterol? People with high cholesterol were randomly assigned to receive either the new medicine or a placebo. After 5 weeks, 106 of the 8499 on the new medicine had lower cholesterol, and 86 of the 8091 in the placebo group had lower cholesterol. Is this a significant difference?

Set up the hypotheses: $H_0: p_1 = p_2$ (or $p_1 - p_2 = 0$); $H_A: p_1 \ne p_2$ (or $p_1 - p_2 \ne 0$).
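For the cholesterol hypotheses just set up, a two-sided two-proportion z-test can be sketched in Python. This uses the pooled proportion in the standard error, the common choice when $H_0$ says $p_1 = p_2$; with these counts it lands on the p-value of about 0.2673 reported in the notes. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided test of H0: p1 = p2, pooling the proportions for the SE."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # common proportion assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1 - p2, z, p_value

# Cholesterol HW: 106 of 8499 (medicine) vs 86 of 8091 (placebo).
diff, z, p_value = two_prop_z_test(106, 8499, 86, 8091)
print(f"diff={diff:.5f} z={z:.3f} p-value={p_value:.4f}")
```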
#### 2 Proportions HW 10.1-10.4 (continued)

The p-value is 0.2673. Interpret:

- We do not have strong evidence that there are different results. We don't reject the null at α = 0.05.
- The test is not significant.
- The 95% confidence interval for the difference in proportions will contain 0.
- If we make the wrong conclusion, it would be a Type II error.

#### 2 Means HW 10.1-10.4

A summary for types of sales for an iPod:

| | Mean | n |
|---|---|---|
| Bid | 231.611 | 96 |
| Buy-it-now | 221.667 | 128 |

- Find the point estimate for the difference in population means: 231.611 - 221.667 = 9.944.
- The p-value is 0.006. Can we conclude there's a population difference? Yes, since 0.006 < 0.05.
- Without finding the CI, determine whether it will contain 0 or not. We just concluded that the means are most likely different, so 0 is not in the interval.

#### Chi-Square Goodness of Fit

Used for testing whether category proportions/counts are equal to specified values, or whether category proportions/counts are all equal to each other.

Example hypotheses:

- $H_0$: proportions are as stated (example: 20%, 30%, 50%); $H_A$: otherwise.
- $H_0$: proportions are all equal to one another; $H_A$: otherwise.

#### Goodness of Fit Steps

1. List hypotheses. $H_0$: the proportions are as claimed. $H_A$: otherwise.
2. Check assumptions: (A) the sample is randomly selected; (B) each expected cell count is at least 5.
3. Compute the test statistic: $\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$.
4. State the degrees of freedom: c - 1 (number of cells minus 1).
5. Find the p-value on the chi-square distribution with df = c - 1 by finding the probability above $\chi^2$.
6. State the conclusion.

#### Computing the p-value

Suppose we have 6 categories and $\chi^2 = 2.12164$. Then df = 6 - 1 = 5. Look up the probability above the test statistic $\chi^2$ (never below): $P(\chi^2 > 2.12164) = 0.83207$.

#### Conclusions for Goodness of Fit

| If p-value ≤ α | If p-value > α |
|---|---|
| Reject $H_0$ | Fail to reject $H_0$ |
| Test is significant | Test is insignificant |
| There is enough evidence to suggest the proportions are different from what's specified | There is insufficient evidence to suggest the proportions are different from what's specified |
| Possible Type I error | Possible Type II error |
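The goodness-of-fit statistic and the "probability above $\chi^2$" step can be sketched in plain Python, standing in for the StatCrunch chi-square calculator. The tail probability uses the closed forms of the regularized upper incomplete gamma function $Q(df/2, \chi^2/2)$, which hold for whole-number degrees of freedom only (all the notes need). The observed cookie counts in the last line are hypothetical, chosen only to exercise `chi2_stat`.

```python
from math import erfc, exp, gamma, sqrt

def chi2_stat(observed, expected):
    """Sum of (obs - exp)^2 / exp over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), whole-number df only."""
    y = x / 2
    if df % 2 == 0:
        # Integer shape k = df/2: Q = e^(-y) * sum_{i=0}^{k-1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return exp(-y) * total
    # Half-integer shape: Q = erfc(sqrt(y)) + e^(-y) * sum_{i=1}^{j} y^(i - 1/2) / Gamma(i + 1/2)
    total = erfc(sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += exp(-y) * y ** (i - 0.5) / gamma(i + 0.5)
    return total

# p-values for the notes' examples: 6 categories, and a 20-sided die.
print(round(chi2_upper_tail(2.12164, df=6 - 1), 5))
print(round(chi2_upper_tail(21.007, df=20 - 1), 5))

# Cookie HW expected counts: 50 cookies split 40% / 30% / 30%.
expected = [50 * share for share in (0.40, 0.30, 0.30)]
print(expected, chi2_stat([25, 10, 15], expected))  # hypothetical observed counts
```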
#### Goodness of Fit HW 11.1-11.2

It is thought that a certain type of cookie box should contain the following percentages of three varieties: Chocolate Chip 40%, Oatmeal 30%, Sugar 30%. A box is selected at random and opened, and the observed counts of its 50 cookies are recorded.

(Table: observed counts of the 50 cookies by variety.)

- Compute the expected category counts: 50(0.40) = 20, 50(0.30) = 15, and 50(0.30) = 15.
- Will we have a valid chi-square test? Yes, because the sample is random and each expected count is at least 5.
- True/False: the degrees of freedom for this problem would be 50 - 1 = 49. False: it is 3 - 1 = 2, since we use (categories - 1) for goodness of fit.

We want to see if a 20-sided die is fair (balanced). To investigate this, we roll it 80 times and record the number of times each face comes up. The $\chi^2$ statistic is 21.007. Fill in the StatCrunch boxes to find the p-value: df = 20 - 1 = 19 (categories - 1), and $P(\chi^2 > 21.007) = 0.33641425$.

#### Chi-Square Test for Independence

Used in an r × c contingency table to test whether there's an association between the categorical variables (cf. contingency tables and association from Test 1).

- The null hypothesis is that the explanatory and response variables are independent.
- The alternative hypothesis is that there is an association between them.

#### Independence Test Steps

- $H_0$: the variables are independent; $H_A$: there is an association.
- Assumptions: random sample, and each expected count is at least 5.

1. Compute each expected count: $\text{exp} = \frac{(\text{row total})(\text{column total})}{\text{overall total}}$.
2. Compute the test statistic: $\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$.
3. Compute df = (r - 1)(c - 1).
4. Find the p-value: the probability above $\chi^2$.
5. State your conclusion.
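The independence-test steps can be sketched for any r × c table of counts. As an illustration only, it is applied here to the orientation/adjustment table from the relative-risk section; the notes do not run this test on that table. The p-value line is valid only for df = 1, where $P(\chi^2 > x) = P(|Z| > \sqrt{x}) = \text{erfc}(\sqrt{x/2})$; for larger tables you would need a general chi-square tail function or StatCrunch.

```python
from math import erfc, sqrt

def independence_test(table):
    """Expected counts, chi-square statistic, and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # exp = (row total)(column total) / overall total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi2, df

# Illustration: orientation / no orientation (rows) by good / poor adjustment.
expected, chi2, df = independence_test([[72, 14], [28, 45]])
# Tail probability shortcut valid only when df == 1.
p_value = erfc(sqrt(chi2 / 2)) if df == 1 else None
print([[round(e, 2) for e in row] for row in expected], round(chi2, 2), df, p_value)
```

Every expected count here is well above 5, so the assumption in the steps is satisfied for this illustration.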
