# Class Note for STAT 528 at OSU 33

This 11 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at Ohio State University taught by a professor in Fall. Since its upload, it has received 23 views.

Date Created: 02/06/15

## Statistics 528: Data Analysis I, Lecture 2 (June 22, 2006)

### Overview of Today's Lecture

- Example of use of the median in a production process
- IPS Sections 1.3 and 2.1
  - The Normal Distribution
  - Scatterplots

### Production Process: Median Example

- Two groups of workers, each with five people.
- Group A is trained using one method and Group B is trained using another method.
- Over the next 5 days we monitor how many completed products each worker makes on each day.
- We want to know if the method used to train Group A results in more output than the method used to train Group B.

[Figure: histogram of Output (0 to 35) against Frequency for Training Method A, annotated with the group mean and median]

[Figure: histogram of Output (0 to 35) against Frequency for Training Method B, annotated with the group mean and median]

[Figure: output for workers 1 to 10, Group A and Group B]

### Next Step: Apply a Mathematical Model

Why would you do this?

- Most obvious: it is easier to summarize the information this way than to report all the values.
- More useful: if the data are representative of a larger group, the mathematical model is useful for describing the larger group.

[Figure: histogram of Toddler Full-Time Weekly Rates with a density curve overlaid; x-axis WeeklyRate]

### Review: Strategy for Exploring Data

- Plot the data.
  - Categorical data: bar chart or pie chart
  - Quantitative data: stem-and-leaf plot, histogram, time plot, etc.
- Look for the overall pattern and deviations from that pattern.
- Calculate numerical summaries to describe the center and spread.

### Density Curves

- Density curves are the mathematical models used to represent the distribution of data.
- The link between the curve and the histogram is the proportion of data that falls between two values.

[Figure: histogram and density curve; x-axis Grade equivalent vocabulary score]

Density curve facts:

- Always on or above the horizontal axis (there is no such thing as a negative frequency).
- Has area exactly 1 underneath it (the total range covers 100% of the data).
- The area under the curve over a range of values is the relative frequency of observations in that range.
- Outliers are not represented by a density curve.

Numerical summaries of distributions also work for density curves. Some numerical summaries are easy to understand for a density curve:

- Mode: a peak point in the curve
- Median: the point with 50% of the area on each side
- Quartiles and percentiles

[Figure: three density curves over Value (0 to 10) with the 25th, 50th, and 75th percentiles marked]

### Types of Density Curves

- Any curve satisfying the properties above can be a density curve.
- Some common density curves:
  - Uniform
  - Chi-squared
  - Normal

[Figure: uniform, chi-squared, and normal density curves; x-axis Value]

### Mean of a Density Curve

- The mean of a density curve is the balance point, at which the curve would balance if made of solid material.

### Standard Deviation of a Density Curve

- The concept: approximately the average distance from the mean.
- Difficult to approximate by eye, but can be calculated mathematically.

### Normal Distribution Density

- The normal density is a symmetric, bell-shaped curve that is useful for describing many types of data:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

### Notation: Observation Summaries vs. Density Properties

- For observations of a variable: mean $\bar{x}$, standard deviation $s$
- For a density curve: mean $\mu$, standard deviation $\sigma$

### Why Is It Important?

1. Good description of real data
2. Good approximation to the results of chance outcomes
3. Statistical inference procedures rely heavily on the normal distribution

### 68-95-99.7 Rule

[Figure: normal curve with the central 68%, 95%, and 99.7% of the data marked at 1, 2, and 3 standard deviations from the mean]

### Standard Normal

- If a variable $X$ follows a normal distribution, then $Z = (X - \mu)/\sigma$ follows a standard normal distribution, $Z \sim N(0, 1)$.
- This fact is very useful for finding areas under a normal curve other than the ones exactly at the 1, 2, and 3 SD marks.
- When an observation is transformed by subtracting the mean and dividing by the standard deviation, the resulting value is called the z-score.

### Example: Heights of Women Age 18 to 24

- The distribution is approximately normal: $X \sim N(\mu, \sigma^2) = N(64.5, 6.25)$. Measurements are in inches.
- How tall would a woman aged 18 to 24 need to be to be in the top 5% of heights?

### Example: IQ Scores

- IQ scores are normally distributed with a mean of 100 and a standard deviation of 10: $X \sim N(100, 100)$.
- What fraction of people have an IQ score under 85?
  - Draw a picture.
  - Shade the region of interest.
  - Look up the areas you need in Table A.
- What if you need the area to the right of a point, or on an interval? Use symmetry, or the fact that the total area has to sum to 1.
- What fraction of people have IQ scores between 98 and 115?

### Normal Quantile Plots

- Histograms and stem plots can find obvious violations of normality, but we need a better tool for subtle problems.
- Instead, use a normal quantile plot.
  - X-axis: data values
  - Y-axis: z-scores of the percentiles of the data values
- Note: in Minitab the y-axis is converted to percentages rather than the raw z-scores, but the spacing is based on the z-scores.
- If the points lie close to a straight line, the distribution is approximately normal.
- Do not be worried if the observations deviate a little bit.

[Figure: Minitab probability plot of ToddlerFTW (Normal, 95% CI), N = 137; x-axis ToddlerFTW (100 to 300), y-axis Percent]

[Figure: Normal Quantile Plot Examples]

### Relationships Between Variables

| | 1 Variable (Chapter 1) | 2 Variables (Chapter 2) |
|---|---|---|
| Graphical summaries | Histograms, dotplots, etc. | Scatterplots |
| Numerical summaries | Center, spread | Correlation |
| Models | Density curve | Regression |

### Exploring the Relationship

- Generically, we call the two variables X and Y.
- Are the variables associated? When the value of one increases, does the other increase? When the value of one increases, does the other decrease?

### Scatterplot

- ODJFS child care data
  - X: full-time weekly rate for infants
  - Y: full-time weekly rate for toddlers

[Figure: scatterplot of ToddlerFTW vs InfantFTW]

### Association or Explanation?

- In some cases we are only interested in understanding whether the variables are associated. ODJFS is a good example.
- In some cases one variable is thought to explain another. Example: pressure treatment on plastic.
  - Response variable (dependent variable): migration of chemical after 24 hours
  - Explanatory variable (independent variable): pressure level for treatment
- Note: do not equate explanation with causation.

### What to Look for in a Scatterplot

- Overall pattern and deviations from the pattern
- Form of relationship: linear, curved, etc.
- Direction and strength of relationship
  - Positively associated: an increase in X is seen with an increase in Y.
  - Negatively associated: an increase in X is seen with a decrease in Y.
  - Do the points follow this pattern closely or loosely?
- Outliers

### Examples

- Time spent studying vs. grade on exam
- Height of husband vs. height of wife
- Percent of districts voting majority Republican in 2000 vs. percent of districts voting majority Republican in 2004

[Figure: example scatterplots of Y against X with panels titled Pattern: Linear; Pattern: Curved; Pattern: Clustered; Association: Positive and Negative; Stronger and Weaker; Outliers]

### A Note of Caution: Lurking Variables

- Factors other than the main ones of interest may have an effect.

Table 2.1: Corn yields (bushels per acre) in an agricultural experiment

| Plants per acre | 1956 | 1958 | 1959 | 1960 | Mean |
|---|---|---|---|---|---|
| 12,000 | 150.1 | 113.0 | 118.4 | 142.6 | 131.0 |
| 16,000 | 166.9 | 120.7 | 135.2 | 149.8 | 143.2 |
| 20,000 | 165.3 | 130.1 | 139.6 | 149.9 | 146.2 |
| 24,000 | | 134.7 | 138.4 | 156.1 | 143.1 |
| 28,000 | | | 119.0 | 150.5 | 134.8 |
| Mean | 160.8 | 124.6 | 130.1 | 149.8 | |

[Figure: corn yield plotted against planting rate in thousands of plants per acre (12 to 28)]

### Adding Categorical Variables

- Use different colors or symbols to add a categorical variable to a scatterplot (don't forget to label).

[Figure: scatterplot of ToddlerFTW vs InfantFTW with groups marked by different symbols]

### Categorical Explanatory Variables

- Side-by-side boxplots
- For just a few measurements, we could plot the actual values (previous example).
- Back-to-back stem plots
- For nominal variables, it makes no sense to talk about positive or negative associations.
- For ordinal variables, we can make a statement about positive or negative associations.

### Example: Boxplots for an Ordinal Variable vs. a Continuous Variable

[Figure: side-by-side boxplots of personal income per year (up to 200,000) by education level: No HS, Some HS, HS grad, Some college, BS degree, Higher degree]
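The normal-distribution questions in the notes (IQ under 85, IQ between 98 and 115, the height needed for the top 5%) are meant to be answered with Table A. As a rough cross-check, the same areas can be sketched with Python's standard-library `statistics.NormalDist`. The variable names below are illustrative and not part of the lecture; the parameters are taken from the notes, reading $N(\mu, \sigma^2)$ as mean and variance.

```python
from statistics import NormalDist

# IQ example from the notes: mean 100, standard deviation 10.
iq = NormalDist(mu=100, sigma=10)

# z-score of 85: subtract the mean, then divide by the standard deviation.
z_85 = (85 - iq.mean) / iq.stdev

# Area to the left of 85 (the Table A lookup for this z-score).
p_under_85 = iq.cdf(85)

# Area on an interval: the difference of two left-tail areas.
p_98_to_115 = iq.cdf(115) - iq.cdf(98)

# Heights example: N(64.5, 6.25) lists the variance, so sigma = sqrt(6.25) = 2.5.
heights = NormalDist(mu=64.5, sigma=6.25 ** 0.5)

# The top 5% of heights starts at the 95th percentile.
height_top_5 = heights.inv_cdf(0.95)

# 68-95-99.7 rule: area within 1, 2, and 3 standard deviations of the mean.
std_normal = NormalDist()
rule = [std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)]

print(f"z-score of IQ 85:       {z_85:.2f}")
print(f"P(IQ < 85):             {p_under_85:.4f}")
print(f"P(98 < IQ < 115):       {p_98_to_115:.4f}")
print(f"Height for top 5% (in): {height_top_5:.2f}")
print("Within 1, 2, 3 SD:     ", [round(a, 4) for a in rule])
```

Under these parameters the areas come out to roughly 0.067 and 0.512, and the height cutoff to roughly 68.6 inches; Table A values may differ slightly because the table rounds z to two decimals.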
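The normal quantile plot described in the notes pairs each data value (x-axis) with the z-score of its percentile (y-axis). A minimal sketch of how those coordinates might be computed follows. The function name `normal_quantile_points`, the plotting position $(i - 0.5)/n$, and the sample values are all assumptions made for illustration; Minitab and other packages may use a slightly different percentile formula.

```python
from statistics import NormalDist


def normal_quantile_points(values):
    """Return (data value, z-score) pairs for a normal quantile plot.

    Assumption: the i-th smallest of n values is assigned the
    percentile (i - 0.5) / n. Other conventions exist.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal, N(0, 1)
    points = []
    for i, x in enumerate(ordered, start=1):
        percentile = (i - 0.5) / n
        # z-score whose left-tail area equals this percentile
        points.append((x, std_normal.inv_cdf(percentile)))
    return points


# Made-up weekly rates, used only to show the shape of the output.
sample_rates = [150, 120, 185, 135, 160]
points = normal_quantile_points(sample_rates)

for x, z in points:
    print(f"{x:6.1f}  {z:+.3f}")
```

If the printed pairs fall close to a straight line when plotted, the data are approximately normal, which is the reading rule given in the notes.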
