Statistical Methods MATH 2300
This 5 page Class Notes was uploaded by Ms. Ally Koelpin on Thursday October 22, 2015. The Class Notes belongs to MATH 2300 at Texas Tech University taught by Staff in Fall.
Date Created: 10/22/15
Chapter 1-3 Review

Data is the information we gather with experiments and with surveys.
Ex: Say we want to know how well a group of students did in a statistics course. The data could be every student's final exam grade.

3 Statistical Methods
- Design: planning how to obtain data
  Ex: conducting an experiment or survey
- Description: summarizing the raw data and presenting it in a useful format
  Ex: charts, graphs, average (mean), median, etc.
- Inference: making decisions or predictions based on data

Subjects are the entities that we measure in a study.
Ex: people, schools, rats, etc.

Population vs. Sample
- Population: all subjects of interest
- Sample: subset of the population for whom we have data
Ex: Let's say we want to know how many Texas Tech students like coffee. To figure this out, we surveyed 50 random students in the SUB.
  - Population: every Texas Tech student
  - Sample: the 50 random students we surveyed

Parameter vs. Statistic
- Parameter: numerical summary of the population (usually unknown)
- Statistic: numerical summary of a sample taken from the population
Ex:
  - The average number of cigarettes smoked by all teenagers last year: parameter
  - The average number of cigarettes smoked by a sample of teenagers last year: statistic
  - The proportion of all teenagers who smoked in the last month: parameter
  - The proportion of teenagers who smoked last month out of 50 teenagers: statistic
  - After looking at the cars in the North Commuter parking lot, we conclude that 67% of the people who park in North Commuter drive trucks: 67% is a parameter (it describes everyone who parks there)
  - A survey of 50 car lots in America found that 35% of the cars in those lots are BMWs: 35% is a statistic (it comes from a sample of car lots)

Randomness: each subject in the population has the same chance of being included in the sample. Random sampling enables the sample to be a good reflection of the population.
Ex: Let's say I want to know if everyone in the class understands a question. I could ask:
  - The top 10 scorers on the first exam: not random
  - Everyone sitting in the last row: not random
  - Picking 10 names off the attendance sheet: random

Variability
Measurements may vary from subject to subject and from sample to sample.
Ex: Suppose I want to know how the class did on an exam. If I take the average of everyone whose name starts with an "S", the result will differ from the average of everyone whose name starts with an "M".
Because of this, we get a more accurate idea of the population if we take larger samples.

Computers and Statistics
- Data file: large sets of data are typically organized in a spreadsheet format
- Database: an existing archive (collection) of data files
- Applet: short application program for performing a specific task
  Ex: random number generator

Variable: any characteristic that is recorded for the subjects in a study
- Categorical Variable: described by words
  Ex: gender, marital status
- Quantitative Variable: described by numbers
  Ex: number of pets in a household, height
  - Discrete Variable: there is a finite (or at least countable) number of possible values, such as 0, 1, 2, 3
    Ex: number of pets in a household
  - Continuous Variable: the possible values form an interval
    Ex: height
  All forms of measurement are continuous variables (time, height, volume).

Proportion
- Frequency: number of times an observation has occurred
- Proportion (relative frequency): $\text{proportion} = \frac{\text{frequency of a certain class}}{\text{total number of observations}}$
  The proportion will always be between 0 and 1.
- Percentage: proportion multiplied by 100
Ex: 4 students out of 40 received an A.
  - The frequency of getting an A is 4.
  - The proportion of students who got an A is $4/40 = 0.1$.
  - The percentage of students who got an A is $0.1 \times 100 = 10\%$.

A frequency table lists the possible values of the variable in one row and the frequency (or relative frequency/proportion) of each value in the other.

Ex: Frequency Table
The president of student council wanted to know how many hours Tech students party. Here were his results:

Number of Party Hours | 0-1 | 2-3 | 3-4 | 4 or more
Count                 | 4   | 10  | 22  | 44

Variable of interest: number of hours Tech students party
Type of variable: quantitative
Discrete or continuous: continuous

Add proportions to the frequency table (each count divided by the total of 80 observations):

Number of Party Hours | 0-1  | 2-3   | 3-4   | 4 or more
Relative Frequency    | 0.05 | 0.125 | 0.275 | 0.55

Distribution
- A distribution tells us the possible values a variable takes as well as the occurrence of those values (frequency or relative frequency).
- A graph or frequency table describes a distribution.

Graphs for Categorical Variables
- Pie charts
- Bar graphs

Graphs for Quantitative Data
- Dot plot: small data set, discrete variable
- Stem-and-leaf plot: small data set, discrete variable
- Histogram: large data set, discrete or continuous variable
- Time plot: used for displaying a time series (a data set collected over time)

Center of Data
- Mean: $\bar{x} = \frac{\sum x}{n}$
- Median: the middle observation (odd number of observations), or the average of the two middle observations (even number of observations)
- Mode: the value that occurs most often; the highest point in a distribution or the highest bar in a histogram

Mean vs. Median and the Shape of a Distribution
- Symmetric: mean and median are about equal
- Skewed to the right (longer right tail): mean > median
- Skewed to the left (longer left tail): mean < median
- Symmetric: prefer the mean; highly skewed: prefer the median

Spread of Data
- Range: $\text{range} = \max - \min$
- Deviation: $x - \bar{x}$
  The sum of the deviations is always 0: $\sum (x - \bar{x}) = 0$
- Standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
  $s$ is always nonnegative.
- Empirical Rule: if a distribution of data is bell-shaped, then approximately
  - 68% of the data falls within 1 standard deviation of the mean
  - 95% of the data falls within 2 standard deviations of the mean
  - 99.7% of the data falls within 3 standard deviations of the mean
- Interquartile range: $IQR = Q_3 - Q_1$
- z-score: $z = \frac{x - \bar{x}}{s}$

Outliers
- An outlier falls far from the rest of the data.
- Outliers are represented in the tails of a distribution.

Detecting Potential Outliers
- 1.5 x IQR rule: $x < Q_1 - 1.5 \times IQR$ or $x > Q_3 + 1.5 \times IQR$
- z-score rule: $z < -3$ or $z > 3$

Five-number summary: min (not including potential outliers), $Q_1$, $Q_2$ (the median), $Q_3$, max (not including potential outliers)
Box plot: drawn using the five-number summary; each horizontal line extending from the box is a tail.

Resistant Measures
A numerical summary measure is resistant if extreme observations (outliers) have little if any influence on its value.
Ex:
  - Resistant to outliers: median
  - Not resistant to outliers: mean, range, standard deviation, linear correlation

Two variables
- Response variable
- Explanatory variable

Association
An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.

Association between Two Categorical Variables
- Contingency table
- Conditional proportion: $\text{conditional proportion} = \frac{\text{frequency of a class in a specified category}}{\text{total number of observations in a specified category}}$

Association between Two Quantitative Variables
- Scatter plot
  - Horizontal axis: explanatory variable, x
  - Vertical axis: response variable, y
- Trend: linear, curved, clusters, no pattern
- Direction
  - Positively associated: y increases as x increases
  - Negatively associated: y decreases as x increases
- Strength: linear correlation
  $r = \frac{1}{n - 1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right)$
  - r only measures the strength of a linear relationship
  - r is always between -1 and 1
  - r > 0 => positive association
  - r < 0 => negative association
  - r is close to -1 or 1 => strong relationship
  - r is close to 0 => weak relationship
  - r is unitless (does not depend on the variables' units)
  - Two variables have the same correlation no matter which is treated as the response variable
- Squared correlation $r^2$: $r^2 \times 100\%$ of the variation in y can be explained by x
  $r = \pm\sqrt{r^2}$: positive association => $+$, negative association => $-$

Regression Line: $\hat{y} = a + bx$
- $\hat{y}$ is the predicted value of y when x is given
- Residual: $y - \hat{y}$ measures the size of the prediction error (the vertical distance between the point and the regression line)
- Slope: $b = r \frac{s_y}{s_x}$
  - b > 0 => positive association
  - b < 0 => negative association
- y-intercept: $a = \bar{y} - b\bar{x}$, the predicted value for y when x = 0

Regression Outlier: an outlier that lies far away from the trend that the rest of the data follows.

An observation is influential if both of the following hold:
- Its x value is relatively low or high compared to the remainder of the data
- The observation is a regression outlier

Lurking Variable: a variable, usually unobserved, that influences the association between the variables of primary interest.

Simpson's Paradox: when the direction of an association between two variables changes after we include a third variable and analyze the data at separate levels of that variable.
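
The frequency, proportion and percentage definitions in these notes can be checked with a few lines of Python. This is only a sketch: the counts are the ones from the party-hours frequency table (4, 10, 22, 44), while the names `counts` and `relative_frequencies` are made up for illustration.

```python
# Counts from the party-hours frequency table in these notes.
counts = {"0-1": 4, "2-3": 10, "3-4": 22, "4 or more": 44}


def relative_frequencies(freqs):
    """Proportion = frequency of a class / total number of observations."""
    total = sum(freqs.values())
    return {label: count / total for label, count in freqs.items()}


proportions = relative_frequencies(counts)
# Percentage = proportion multiplied by 100.
percentages = {label: p * 100 for label, p in proportions.items()}

for label in counts:
    print(f"{label:>10}: count={counts[label]:>2}  "
          f"proportion={proportions[label]:.3f}  "
          f"percentage={percentages[label]:.1f}%")
```

Every proportion lands between 0 and 1 and the proportions add up to 1, which is a quick sanity check on any relative frequency table.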
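
The two outlier rules from these notes (1.5 x IQR and z-score) can be compared side by side. The sketch below uses a hypothetical data set, and it assumes quartiles are found as the median of the lower half and the median of the upper half; other software may use a different quartile convention and give slightly different fences.

```python
import statistics


def quartiles(data):
    """Return (Q1, median, Q3), taking Q1 and Q3 as the medians of the
    lower and upper halves of the sorted data (an assumed convention)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # observations below the median position
    upper = xs[half + (n % 2):]  # observations above the median position
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)


def iqr_outliers(data):
    """Flag x < Q1 - 1.5*IQR or x > Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


def z_scores(data):
    """z = (x - mean) / s, with s the sample standard deviation (n - 1)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - mean) / s for x in data]


def z_outliers(data):
    """Flag z < -3 or z > 3."""
    return [x for x, z in zip(data, z_scores(data)) if z < -3 or z > 3]


# Hypothetical data with one value far from the rest.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]

mean = statistics.mean(data)
median = statistics.median(data)
deviation_sum = sum(x - mean for x in data)  # should be 0

print("mean:", mean, "median:", median)
print("sum of deviations:", deviation_sum)
print("quartiles:", quartiles(data))
print("1.5 x IQR rule flags:", iqr_outliers(data))
print("z-score rule flags:", z_outliers(data))
```

In this made-up data the mean is pulled well above the median by the single large value, which matches the note that the mean is not resistant. The large value also inflates the standard deviation, so the two rules need not agree on which observations they flag.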
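
The correlation and regression formulas above can be worked through numerically. The following is a minimal sketch on hypothetical (x, y) pairs; the function names are illustrative, and the code simply applies the formulas for r, the slope b and the intercept a as written in these notes.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of (z-score of x) * (z-score of y)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


def regression_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, with b = r*sy/sx and a = ybar - b*xbar."""
    r = correlation(xs, ys)
    b = r * sample_sd(ys) / sample_sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


# Hypothetical data: x is the explanatory variable, y the response.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
a, b = regression_line(xs, ys)
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # y - y-hat

print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
print(f"regression line: y-hat = {a:.2f} + {b:.2f}x")
print("residuals:", [round(e, 2) for e in residuals])
```

Swapping the roles of `xs` and `ys` in `correlation` gives the same r, in line with the note that the correlation does not depend on which variable is treated as the response. The regression line, in contrast, does change when the roles are swapped.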