Class Note for STAT 528 at OSU 62
This 21 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at Ohio State University taught by a professor in Fall. Since its upload, it has received 21 views.
Date Created: 02/06/15
Statistics 528: Data Analysis
Lecture 2, June 22, 2006

Overview of Today's Lecture
- Example of use of median in a production process
- IPS Sections 1.3 and 2.1
  - The Normal Distribution
  - Scatterplots

Production Process: Median Example
- Two groups of workers, each with five people
- Group A is trained using one method and Group B is trained using another method
- Over the next 5 days we monitor how many completed products each worker makes on each day
- We want to know if the method used to train Group A results in more output than the method used to train Group B
[Figure: histograms of Output (0 to 35) against Frequency. Training Method A: median 19.22, mean 18.19. Training Method B: mean 18.43, median 18.8.]
[Figure: Group A and Group B compared]

Review: Strategy for Exploring Data
- Plot the data
  - Categorical data: bar chart or pie chart
  - Quantitative data: stem-and-leaf plot, histogram, time plot, etc.
- Look for the overall pattern and deviations from that pattern
- Calculate numerical summaries to describe the center and spread

Next step: Apply a Mathematical Model
- Why would you do this?
  - Most obvious: it is easier to summarize the information this way than by reporting all the values
  - More useful: if the data are representative of a larger group, the mathematical model is useful for describing the larger group
[Figure: Histogram of Toddler Full-Time Weekly Rates (y-axis: Proportion)]

Density Curves
- Density curves are the mathematical models used to represent the distribution of data
- The link between the curve and the histogram is the proportion of data that falls between two values
[Figure: histogram and density curve with an area shaded (x-axis: Grade equivalent vocabulary score)]
- Density curve facts:
  - Always on or above the horizontal axis (no such thing as a negative frequency)
  - Has exactly area 1 underneath it (the total range covers 100% of the data)
  - Area under
    the curve over a range of values is the relative frequency of observations in that range
  - Outliers are not represented by a density curve

Types of density curves
- Any curve satisfying the properties listed above can be a density curve
- Some common density curves:
  - Uniform
  - Chi-squared
  - Normal
[Figure: example uniform, chi-squared, and normal density curves (x-axis: Value)]

Numerical summaries of distributions also work for density curves
- Some numerical summaries are easy to understand for a density curve:
  - Mode: a peak point in the curve
  - Median: the point with 50% of the area on each side
  - Quartiles, percentiles
[Figure: density curve with the 25th, 50th, and 75th percentiles marked (x-axis: Value)]

Mean of a density curve
- The mean of a density curve is the balance point, at which the curve would balance if made of solid material

Standard Deviation of a Density Curve
- The concept: approximately the average distance from the mean
- Difficult to approximate by eye, but can be calculated mathematically

Notation: Observation summaries vs. density properties
- For observations of a variable: mean $\bar{x}$, standard deviation $s$
- For a density curve: mean $\mu$, standard deviation $\sigma$

Normal Distribution Density
- The normal density is a symmetric, bell-shaped curve that is useful for describing many types of data:
  $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Why is it important?
1. Good description of real data
2. Good approximation to the results of chance outcomes
3. Statistical inference procedures rely heavily on the normal distribution

68-95-99.7 rule
[Figure: normal curve with 68% of the data within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3]

Example: Heights of women age 18 to 24
- The distribution is approximately normal: $X \sim N(\mu, \sigma^2) = N(64.5, 6.25)$
  - Measurements are in inches
- How tall would a woman aged 18 to 24 need to be to be in the top 5% of heights?

Standard Normal
- If a variable follows a normal distribution, then $Z = (X - \mu)/\sigma$ follows a standard normal distribution: $Z \sim N(0, 1)$
- This fact is very useful for finding areas under a normal curve other than the ones exactly at the 1, 2, and 3 SD marks
- When an
  observation is transformed by subtracting the mean and dividing by the standard deviation, the resulting value is called the z-score

Example: IQ scores
- IQ scores are normally distributed with a mean of 100 and a standard deviation of 10: $X \sim N(100, 100)$
- What fraction of people have an IQ score under 85?
  - Draw a picture
  - Shade the region of interest
  - Look up the areas you need in Table A
- What if you need the area to the right of a point, or on an interval?
  - Use symmetry, or the fact that the total area has to sum to 1
- What fraction of people have IQ scores between 98 and 115?

Normal Quantile Plots
- Histograms and stemplots can find obvious violations of normality, but we need a better tool for subtle problems
- Instead, use a normal quantile plot
  - X-axis: data values
  - Y-axis: z-scores of the percentiles of the data values
  - Note: In Minitab, the y-axis is converted to percentages rather than the raw z-scores, but the spacing is based on the z-scores
[Figure: Probability Plot of ToddlerFTW (Normal, 95% CI). Mean 159.9, StDev 29.19, N 137, AD 1.350, P-value < 0.005. x-axis: ToddlerFTW (100 to 300); y-axis: Percent]

Normal Quantile Plot
- If the points lie close to a straight line, the distribution is approximately normal
- Do not be worried if the observations deviate a little bit

Normal Quantile Plot Examples
[Figure: example normal quantile plots]

Relationships Between Variables

                      1 Variable (Chapter 1)      2 Variables (Chapter 2)
Graphical summaries   Histograms, dotplot, etc.   Scatterplots
Numerical summaries   Center, spread              Correlation
Models                Density curve               Regression

Exploring the Relationship
- Generically, we call two variables X and Y
- Are the variables associated?
  - When the value of one increases, does the other increase?
  - When the value of one increases, does the other decrease?

Scatterplot
- ODJFS child care data
  - X: Full-time weekly rate for infants
  - Y: Full-time weekly rate for toddlers
[Figure: Scatterplot of ToddlerFTW vs InfantFTW]

Association or Explanation
- In some cases we are only
  interested in understanding whether the variables are associated
  - ODJFS is a good example
- In some cases one variable is thought to explain another
  - Example: Pressure treatment on plastic
    - Response variable (dependent variable): migration of chemical after 24 hours
    - Explanatory variable (independent variable): pressure level for treatment
- Note: Do not equate explanation with causation!

Examples
- Time spent studying vs. grade on exam
- Height of husband vs. height of wife
- Percent of districts voting majority Republican in 2000 vs. percent of districts voting majority Republican in 2004

What to look for in a scatterplot
- Overall pattern, and deviations from the pattern
- Form of relationship: linear, curved, etc.
- Direction and strength of relationship
  - Positively associated: an increase in X is seen with an increase in Y
  - Negatively associated: an increase in X is seen with a decrease in Y
  - Do the points closely follow this pattern, or loosely?
- Outliers
[Figure: Pattern: Linear]
[Figure: Pattern: Curved]
[Figure: Pattern: Clustered]
[Figure: Association: Positive/Negative. Example scatterplots of Y against X showing positive and negative association]
[Figure: Outliers]
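The direction and strength questions above can also be checked numerically. Correlation is named in the overview table as the numerical summary for two variables but is not defined in this lecture, so the following is only a minimal sketch that assumes the standard Pearson formula. The data values and the function names `correlation` and `direction` are hypothetical, chosen purely for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists (standard formula, assumed here)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def direction(xs, ys):
    """Label the association by the sign of the correlation."""
    r = correlation(xs, ys)
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"

# Hypothetical data, not taken from the lecture
x = [1, 2, 3, 4, 5]
y_up = [2.1, 3.9, 6.2, 8.1, 9.8]    # Y tends to increase with X
y_down = [9.5, 8.1, 6.0, 4.2, 1.9]  # Y tends to decrease with X

print(direction(x, y_up), round(correlation(x, y_up), 3))
print(direction(x, y_down), round(correlation(x, y_down), 3))
```

A correlation near +1 or -1 means the points follow a straight line closely, while a value near 0 means they follow it loosely or not at all. A single number cannot reveal curved or clustered patterns or outliers, so it complements the scatterplot and does not replace it.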
Adding categorical variables
- Use different colors or symbols to add a categorical variable to a scatterplot (don't forget to label!)
[Figure: Scatterplot of ToddlerFTW vs InfantFTW, with a categorical variable shown by symbol]

A note of caution: lurking variables
- Factors other than the main ones of interest may have an effect

TABLE 2.1 Corn yields (bushels per acre) in an agricultural experiment

Plants per acre    1956    1958    1959    1960    Mean
12000             150.1   113.0   118.4   142.6   131.0
16000             166.9   120.7   135.2   149.8   143.2
20000             165.3   130.1   139.6   149.9   146.2
24000                     134.7   138.4   156.1   143.1
28000                             119.0   150.5   134.8
Mean              160.8   124.6   130.1   149.8

[Figure: plot of the corn yields (x-axis: Rate in thousands of plants per acre, 12 to 28)]

Categorical Explanatory Variables
- Side-by-side boxplots
  - For just a few measurements we could plot the actual values (previous example)
- Back-to-back stemplots
- For nominal variables it makes no sense to talk about positive or negative associations
- For ordinal variables we can make a statement about positive or negative associations

Example: Boxplots for an ordinal variable vs. a continuous variable
[Figure: side-by-side boxplots of Personal income ($/year, 0 to 200,000) by education level: No HS, Some HS, HS grad, Some college, BS degree, Higher degree]
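The normal-distribution questions posed earlier in these notes (the top 5% of women's heights, IQ scores under 85, IQ scores between 98 and 115) are meant to be answered with z-scores and Table A. As a cross-check, here is a minimal Python sketch that uses the standard library's `statistics.NormalDist` instead of the table. The variable names and the helper `z_score` are illustrative assumptions. Note that `NormalDist` takes the standard deviation, whereas the notes write N(mean, variance).

```python
from statistics import NormalDist

# The notes write X ~ N(mean, variance); NormalDist expects the standard deviation.
heights = NormalDist(mu=64.5, sigma=2.5)  # inches; variance 6.25
iq = NormalDist(mu=100, sigma=10)         # variance 100

def z_score(x, dist):
    """Subtract the mean and divide by the standard deviation."""
    return (x - dist.mean) / dist.stdev

# Height that cuts off the top 5%: the 95th percentile of the height distribution
top5_cutoff = heights.inv_cdf(0.95)

# Fraction of people with IQ under 85: area to the left of 85
p_under_85 = iq.cdf(85)

# Fraction with IQ between 98 and 115: difference of two left-hand areas
p_98_to_115 = iq.cdf(115) - iq.cdf(98)

print(f"z-score of IQ 85: {z_score(85, iq):.2f}")
print(f"Top 5% height cutoff: {top5_cutoff:.2f} inches")
print(f"P(IQ < 85): {p_under_85:.4f}")
print(f"P(98 < IQ < 115): {p_98_to_115:.4f}")
```

Under these assumptions the cutoff comes out at about 68.6 inches, the fraction under 85 at about 0.067, and the fraction between 98 and 115 at about 0.512. Table A gives the same answers after rounding each z-score to two decimal places.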