S301 Week 3 Lecture & Textbook Notes
These 4 pages of class notes were uploaded by Lauren Detweiler on Thursday, January 29, 2015. The notes belong to STAT-S301 at Indiana University, taught by Hannah Bolte in Spring 2015. Since upload, they have received 226 views.
Date Created: 01/29/15
S301 Textbook Week 3: Ch. 5.1 & 6 (pgs. 77-85, 94-95, 105-124)

Required concepts not covered in lecture

I. Best Practices: Contingency Tables
   a. Use contingency tables to find and summarize association between categorical variables
   b. Be on the lookout for lurking variables
   c. Use plots to show association
   d. Exploit the absence of association
II. Pitfalls: Contingency Tables
   a. Don't interpret association as causation
   b. Don't display too many numbers in a table
III. Scatterplot
   a. A graph that displays pairs of values as points on a 2D grid
IV. Response Variable
   a. Placed on the y-axis in scatterplots
   b. The variable that has the variation we want to understand, explain, or predict
V. Explanatory Variable
   a. Placed on the x-axis in scatterplots
   b. The variable we use to explain variation in the response
VI. Association in Scatterplots
   a. Association: the value of the x-coordinate tells us about the value of the y-coordinate (i.e., they are related)
   b. Visual test for association
      i. A method for identifying a pattern in a plot of numerical variables. Compare the original scatterplot to artificial plots in which the variables are unrelated.
   c. Describing the association
      i. Once you decide the scatterplot shows association, you need to describe the association
         1. Direction: does the pattern trend up, down, or both?
            a. Positive: points concentrate in the lower left and upper right. As the explanatory variable increases, so does the response.
            b. Negative: pattern running the other way. As x increases, y tends to decrease.
         2. Curvature: does the pattern appear to be linear or does it curve?
            a. Linear: patterns have consistent direction
            b. Curved: direction changes
         3. Variation: are the points tightly clustered along the pattern?
            a. Strong association means little variation around the trend
         4. Outliers and surprises: did you find something unexpected?
            a. An outlying point is almost always interesting and deserves special attention
VII. Best Practices: Correlation Matrix
   a. To understand the relationship between two numerical variables, start with a scatterplot
   b. Look at the plot, look at the plot, look at the plot
   c. Use clear labels for the scatterplot
   d. Describe a relationship completely
   e. Consider the possibility of lurking variables
   f. Use a correlation to quantify the association between two numerical variables that are linearly related
VIII. Pitfalls: Correlation Matrix
   a. Don't use the correlation if the data are categorical
   b. Don't treat association and correlation as causation
   c. Don't assume that a correlation of zero means that the variables are not associated
   d. Don't assume that a correlation near -1 or 1 means near perfect association

Required concepts covered in lecture

I. Standardized Data Values
   a. It's hard to compare distributions on different scales or w/ diff. units
   b. Standard normal distribution: mean = 0, standard deviation = 1
   c. If a histogram looks like the normal histogram:
      i. About 68% of data is within one SD of the mean
      ii. About 95% of data is within two SDs of the mean
      iii. About 99.7% of data (i.e., almost all the data) is within three SDs of the mean
      iv. And: the convexity of the curve changes at 1 SD
      v. You should be able to look at a histogram and estimate the SD from the data
II. Contingency Tables
   a. Cells are mutually exclusive
   b. i.e., Total Televisions Sold by Region and Store
III. Frequency Distribution Table
   a. Marginal distributions and conditional distributions can both be found on a frequency distribution table
      i. Marginal distributions: sum totals for the probabilities
         1. Found in the margins
         2. Also, marginal distributions can be displayed in different types of visuals (see textbook and/or slides)
IV. If the relationship seems linear
   a. How do we measure the strength of the relationship?
      i. Covariance: quantifies the strength of association between numeric variables
      ii. Recall variance: find deviations, square them, divide by n - 1
          $s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})(x_1 - \bar{x}) + (x_2 - \bar{x})(x_2 - \bar{x}) + \cdots + (x_n - \bar{x})(x_n - \bar{x})}{n - 1}$
      iii. Now replace one of the terms with y's:
          $\mathrm{Cov}(x, y) = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_n - \bar{x})(y_n - \bar{y})}{n - 1}$
      iv. See Week 3 Correlation Example.xlsx → Correlation InClass A
      v. In Excel, use =COVARIANCE.S(DataColumn1, DataColumn2)
V. Correlation
   a. Covariance, like variance, has strange units (x-units × y-units)
   b. Correlation adjusts for this:
      $r = \mathrm{Corr}(x, y) = \frac{\mathrm{Cov}(x, y)}{s_x s_y}$
   c. Correlation = covariance divided by the product of the standard deviations
   d. This is normalized, so it is unitless no matter what: $-1 \le r \le 1$
   e. See Week 3 Correlation Example.xlsx → Correlation InClass A
   f. In Excel, use =CORREL(DataColumn1, DataColumn2)
VI. Scatterplots
   a. Look at different scatterplots to practice recognizing correlation
   b. Terms used to describe scatterplots
      i. Direction: positive, negative, both, neither
      ii. Trend: linear, curved (nonlinear)
      iii. Variation: spread or not
      iv. Outliers
   c. Correlation is a measure of linear association
      i. Not very meaningful if the trend is nonlinear
      ii. View lecture slides or textbook for graphical examples of other correlation issues