# ProbStats MATH 1530

ETSU


These 9-page class notes were uploaded by Ms. Ismael Spinka on Sunday, October 11, 2015. They belong to MATH 1530 at East Tennessee State University, taught by Deborah Hosler in the fall.

Date Created: 10/11/15

## Probability and Statistics, Week 1, Spring 2005

Learning statistics is quite like learning a foreign language. Here is a list of just some of the terms and symbols that are introduced during the semester. It may help to print out this list and jot down the meanings of the terms as you come across them in lectures and in reading the text. Then keep the list in a handy place so that you can refer to it as needed.

### Glossary of terms

Data, Individual, Variable, Observation, Categorical variable, Quantitative variable, Distribution, Count (noun), Proportion, Deviation, Symmetric, Skewed right, Skewed left, Center, Spread, Outlier, Bar chart, Pie chart, Histogram, Stem-and-leaf plot, Time plot, Mean, Median, Five-number summary, Boxplot, Standard deviation, Variance, Resistant measure, Sample, Population, Statistic, Parameter, Descriptive statistics, Inferential statistics

### Notation glossary

Give basic definitions of these statistical and mathematical symbols and abbreviations and, when applicable, indicate whether the symbol refers to a population parameter or a sample statistic.

Here is an example:

- $\bar{x}$ (x bar). Definition: sample mean. Circle one: Parameter or Statistic. (Answer: Statistic)

Now you do the rest:

- $\mu$ (Greek lower case mu). Definition: ______ Circle one: Parameter or Statistic
- $s$ (English lower case s). Definition: ______ Circle one: Parameter or Statistic
- $\sigma$ (Greek lower case sigma). Definition: ______ Circle one: Parameter or Statistic
- $n$ (English lower case n). Definition: ______ Circle one: Parameter or Statistic
- $N$ (English upper case N). Definition: ______ Circle one: Parameter or Statistic
- $s^2$ (s squared). Definition: ______ Circle one: Parameter or Statistic
- $\sigma^2$ (sigma squared). Definition: ______ Circle one: Parameter or Statistic
- $p$ (English lower case p). Definition: ______ Circle one: Parameter or Statistic
- $\hat{p}$ (p hat). Definition: ______ Circle one: Parameter or Statistic

Below are more statistical and mathematical symbols that will be used during the course; we won't apply the terms "statistic" or "parameter" to these:

- $\Sigma$ (Greek upper case sigma)
- $Q_1$
- $Q_3$
- MED
- Min
- $\le$ (at most)
- $\ge$ (at least)

## Probability & Statistics, Fall 2004, Week 7: Learning Objectives, Chapters 4, 5 & 6

### I. Relationships between variables

Recognize the difference between quantitative and categorical variables. The first step of bivariate analysis is to make an appropriate graphical display.

1. When you are looking for associations between two categorical variables, either a stacked bar chart or a clustered bar chart will often provide the best display.
2. On the other hand, a scatterplot is the tool of choice when looking for an association between two quantitative variables.
3. Identify the explanatory ($x$) and response ($y$) variables in situations where one variable may be used to explain or to predict another.

### II. Looking for associations between two quantitative variables

**A. Scatterplots**

1. Make a scatterplot for two quantitative variables, always placing the explanatory variable (if any) on the horizontal axis.
2. Add a categorical variable to the scatterplot by using different plotting symbols or colors.
3. Recognize a positive vs. a negative association, a linear vs. a curved form, and a weak vs. a strong relationship.
4. From scatterplots, identify linear associations for which further analysis (i.e., correlation and linear regression) should be carried out.

**B. Correlation**

1. Compute the linear correlation coefficient $r$ for a set of bivariate data.
2. Recognize that $r$ gives a numerical indication of the strength and direction of the linear association between two quantitative variables.
3. Compare values of $r$, along with the patterns shown in scatterplots, to pick the best explanatory variable out of several alternatives that can be used to predict values of a given response variable.
4. Know the basic facts about correlation listed and discussed in class and on pages 90–91 of the text.

### III. Least-squares linear regression

**A. Build the model**

1. Understand what is meant by the slope $b$ and the y-intercept $a$ in the equation of the least-squares regression line.
2. Calculate the least-squares regression line
of $y$ on $x$ from a set of bivariate data; that is, construct the linear model that best describes the data, i.e., the best-fit line $\hat{y} = a + bx$.
3. Give the equation of the regression line of $y$ on $x$ from the means $\bar{x}$ and $\bar{y}$, the standard deviations $s_x$ and $s_y$, and the correlation coefficient $r$. Use the formulas given on page 107 with these five statistics to find first the slope $b$ and then the y-intercept $a$.
4. Know that the regression line of $y$ on $x$ always passes through the point $(\bar{x}, \bar{y})$.
5. Be aware of the other facts about least-squares regression listed on pages 110 & 111 of the text.

**B. Use the model**

1. Use the least-squares regression equation to draw the best-fit line through the points in a scatterplot.
2. Predict $y$ for a given value of $x$, either from the equation or from a graph of the fitted line in a scatterplot.

**C. Assess the model**

1. Recognize potential outliers and influential points in a scatterplot with the regression line drawn on it.
2. Compute a residual when given the observed value and the predicted value of $y$ for some particular value of $x$.
3. Calculate all the residuals for a bivariate data set, plot them against the observed values of $x$, and recognize unusual, nonrandom, or nonlinear patterns in the residual plot.
4. Use $R^2$ to describe the extent to which the variation in the response variable can be accounted for by its straight-line relationship with the explanatory variable.

**D. Limitations of correlation and regression**

1. Understand that $r$ and the least-squares regression line are not resistant to the effects of outliers and extreme observations. Both can be strongly influenced by just a few extreme observations.
2. Understand that even a strong correlation does not necessarily mean that there is a cause-and-effect relationship between two variables.
3. Understand the dangers presented by extrapolation and lurking variables.

If you haven't done so already, check out the Links to Applets to explore regression and correlation, located under the material for Chapters 4 & 5 on the
course web page (not on your instructor's page) for Math 1530.

### IV. Looking for relationships between two categorical variables

**A. Two-way tables (a.k.a. contingency tables)**

1. Recognize that two-way tables are useful because they help organize large amounts of data by grouping responses or outcomes into categories.
2. Notice that the categories of the row variable label the rows that run across the table.
3. Likewise, the categories of the column variable label the columns that run down the table.

**B. Marginal distributions**

1. Know how to compute the marginal distributions, in either frequencies or relative frequencies, for a two-way table.
2. Understand that the row totals give the distribution of the row variable and the column totals give the distribution of the column variable.

**C. Conditional distributions**

1. Realize that associations between two categorical variables shown in a two-way table are explored by examining the conditional distributions within the rows and columns of the table.
2. Know how to find the conditional distribution of the row variable for one specific category of the column variable by looking only at that one column in the table. Calculate the relative frequency for each cell in the column by dividing the count in the cell by the column total.
3. Know how to find the conditional distribution of the column variable for one specific category of the row variable by looking only at that one row in the table. Calculate the relative frequency for each cell in the row by dividing the count in the cell by the row total.
4. Be able to compare the conditional distributions in order to describe the association between the variables. Notice that there is a conditional distribution of the row variable for each column in the table, and a conditional distribution of the column variable for each row.

**D. Limitations of two-way tables**

1. Beware of lurking variables.
2. Understand that the term *Simpson's paradox* refers to a change or reversal in the association
between two variables that may occur when the influence of a third variable is taken into account.

### Warm-up problems

1. An old study in Iowa produced the following data on corn yield (bushels per acre) in 1910–1919 and value per acre in 1920 for farmland in 10 counties. (This is actual data; times have changed since 1920.)

| County | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Yield (bushels per acre) | 40 | 36 | 34 | 41 | 39 | 42 | 40 | 31 | 36 | 30 |
| Value per acre | 87 | 133 | 174 | 285 | 263 | 274 | 235 | 104 | 141 | 115 |

These data were collected in order to determine whether corn yield is a good predictor of land value. Sketch a scatterplot of these data. Is the overall pattern roughly linear? Are there any outliers or other unusual features?

Using the data given in problem 1, answer the following questions.

2. Compute the least-squares line if corn yield is used to predict land value. Plot the line on your scatterplot. State the correct interpretation of the slope of this equation.
3. If you were to compute the least-squares regression line and the residual for each point (you need not actually perform these calculations), which observation would you expect to have the largest residual? Explain how you found your answer.
4. Farmer McDonald had a yield of 39 bushels per acre. What is the residual for McDonald's farmland? (Note: McDonald's farm is the observation in the data set from county 5.)
5. Compute the correlation between corn yield and land value.
6. Omit the suspected outlier from your data and recalculate the correlation between corn yield and land value. Explain why the two correlations differ.
7. About what percent of the variation in the value of the farmland is explained by the linear relationship between corn yield and land value?
8. For a scatterplot showing a linear relationship between two quantitative variables and having a correlation of $r = 0.98$, which of the following can be said about the data?
   - There is a strong linear association.
   - There is a weak linear association.
   - You would expect to see a great deal of scatter among the points in the scatterplot.
   - This is clearly a case of Simpson's paradox.
9. If $r = 0.98$, the linear association is:
   - positive
   - negative
   - symmetric
   - insufficient information to tell
10. Given these sample statistics from a bivariate data set, $\bar{x} = 500$, $\bar{y} = 1099$, $s_x = 194365$, $s_y = 3811226$, and $r = 0.997462$, compute the slope and the y-intercept in the equation of the least-squares regression line of $y$ on $x$ that models these data.

The Roper Organization (1992) conducted a study, as part of a larger survey, to ascertain the number of American adults who had experienced phenomena such as seeing a ghost, feeling as if they had left their body, and seeing a UFO. Adults age 18 and over in a representative sample from the continental USA were interviewed in their homes during July, August, and September 1991. The results when respondents were asked about seeing a ghost are shown in the following table.

[Table: responses to "Have you ever seen a ghost?" by age group]

11. How many American adults responded in some way to the question "Have you ever seen a ghost?"
12. Calculate the marginal distribution of the variable age group, in both counts and relative frequencies.
13. Within each age group, find the percent who reported seeing a ghost.
14. Of the respondents who reported never seeing a ghost, what percentage were aged 30 or over?
15. Of the respondents aged 30 or over, what percentage reported never seeing a ghost?
16. Explain how 14 and 15 are not asking the same question.
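Objectives IV.B and IV.C, and warm-up problems 12 through 15, come down to dividing cell counts by the right total. The following is a minimal Python sketch of that bookkeeping. It assumes the table is stored as a list of rows. The function names are our own, and the 2 × 2 counts are hypothetical, made up only to exercise the code. They are not the Roper survey data.

```python
# Marginal and conditional distributions for a two-way table stored as a
# list of rows. The counts in the demo are HYPOTHETICAL (made up to exercise
# the code); they are not the Roper (1992) survey results.


def marginal_distributions(table):
    """Return (row_totals, col_totals, grand_total) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


def relative(counts, total):
    """Turn counts into relative frequencies (proportions of total)."""
    return [c / total for c in counts]


def conditional_in_column(table, j):
    """Distribution of the row variable within column j: cell / column total."""
    column = [row[j] for row in table]
    return relative(column, sum(column))


def conditional_in_row(table, i):
    """Distribution of the column variable within row i: cell / row total."""
    return relative(table[i], sum(table[i]))


# Hypothetical 2 x 2 table. Rows: response ("yes", "no");
# columns: two made-up age groups.
counts = [
    [20, 10],  # "yes"
    [80, 90],  # "no"
]

rows, cols, n = marginal_distributions(counts)
print("row marginal:", rows, relative(rows, n))
print("column marginal:", cols, relative(cols, n))
print("within age group 0:", conditional_in_column(counts, 0))
print("within 'no' answers:", conditional_in_row(counts, 1))
```

With these made-up counts, the share of "no" answers that fall in age group 0 (row-wise, 80/170, about 0.47) differs from the share of age group 0 that answered "no" (column-wise, 80/100 = 0.8). The two calculations divide the same cell count by different totals. This is the distinction that separates problems 14 and 15.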

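Objectives II.B and III (build, use, and assess the model), and warm-up problems 2 through 7, all rest on a handful of sums of deviations from the means. The following is a minimal standard-library Python sketch that uses the Iowa corn data from problem 1. It is not the text's prescribed procedure. It fits the least-squares line with the usual formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$, and computes $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ and $R^2 = r^2$, where $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx}$, $S_{yy}$ are defined likewise. It then lists the residuals and refits after dropping the point with the largest residual. The function names are our own.

```python
import math

# Iowa corn data from warm-up problem 1, counties 1-10 in order.
yield_bu = [40, 36, 34, 41, 39, 42, 40, 31, 36, 30]        # x: corn yield (bushels per acre)
value = [87, 133, 174, 285, 263, 274, 235, 104, 141, 115]  # y: land value per acre


def mean(v):
    return sum(v) / len(v)


def least_squares(x, y):
    """Return (a, b, r): intercept and slope of y-hat = a + b*x, and correlation r."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope
    a = ybar - b * xbar    # intercept, so the line passes through (xbar, ybar)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def residuals(x, y, a, b):
    """Residual = observed y minus predicted y, for each point."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


a, b, r = least_squares(yield_bu, value)
res = residuals(yield_bu, value, a, b)
worst = max(range(len(res)), key=lambda i: abs(res[i]))  # index of largest |residual|

print(f"y-hat = {a:.2f} + {b:.3f} x   r = {r:.3f}   R^2 = {r * r:.3f}")
print(f"residual for county 5: {res[4]:.2f}")
print(f"largest |residual|: county {worst + 1} ({res[worst]:.2f})")

# Refit without that point to see how much one observation moves r.
x2 = [xi for i, xi in enumerate(yield_bu) if i != worst]
y2 = [yi for i, yi in enumerate(value) if i != worst]
print(f"r without county {worst + 1}: {least_squares(x2, y2)[2]:.3f}")
```

Working with deviations from the means, rather than the "shortcut" raw sums such as $\sum x^2$ and $\sum xy$, keeps the code close to the definitions. It also avoids the cancellation errors those shortcuts can introduce. The printout is best used as a check on hand calculations, not a replacement for them.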

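Objective III.A.3 and warm-up problem 10 build the regression line from five summary statistics. The handout cites the formulas on page 107 without reproducing them. The standard versions, which we assume are the ones meant, are

$$b = r\,\frac{s_y}{s_x}, \qquad a = \bar{y} - b\,\bar{x}.$$

Because $a$ is defined this way, the line always passes through $(\bar{x}, \bar{y})$ (objective III.A.4). The following Python sketch uses the standard library only, and its function names are hypothetical. It applies these formulas, then sanity-checks them on the corn data from problem 1 by confirming that they give the same slope as the deviation-sum formula applied to the raw data.

```python
import math
import statistics


def line_from_summary(xbar, ybar, sx, sy, r):
    """Least-squares line of y on x from five summary statistics: returns (a, b)."""
    b = r * sy / sx        # slope first ...
    a = ybar - b * xbar    # ... then the intercept
    return a, b


def pearson_r(x, y):
    """r = sum of products of standardized values, divided by n - 1."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)  # sample SDs (n - 1)
    return sum((xi - xbar) / sx * (yi - ybar) / sy for xi, yi in zip(x, y)) / (len(x) - 1)


# Sanity check on the Iowa corn data from warm-up problem 1.
x = [40, 36, 34, 41, 39, 42, 40, 31, 36, 30]
y = [87, 133, 174, 285, 263, 274, 235, 104, 141, 115]

xbar, ybar = statistics.mean(x), statistics.mean(y)
sx, sy = statistics.stdev(x), statistics.stdev(y)
r = pearson_r(x, y)
a, b = line_from_summary(xbar, ybar, sx, sy, r)
print(f"y-hat = {a:.2f} + {b:.3f} x")

# Same slope as the deviation-sum formula b = Sxy / Sxx on the raw data:
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
print(math.isclose(b, sxy / sxx))
# The line passes through the point (xbar, ybar):
print(math.isclose(a + b * xbar, ybar))
```

For problem 10, pass its five statistics to `line_from_summary` in place of the corn-data values.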