
# 258 Class Note for STAT 30100 with Professor Zhao at Purdue

This 15 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at Purdue University taught by a professor in Fall. Since its upload, it has received 25 views.


Date Created: 02/06/15

## Chapter 1. Looking at Data: Distributions

- Section 1.1: Introduction; Displaying Distributions with Graphs
- Section 1.2: Describing Distributions with Numbers

### Big picture: what do we learn in this chapter?

- Individuals vs. variables
- Categorical vs. quantitative variables
- Graphs
  - Bar graphs and pie charts: categorical variables
  - Histograms and stemplots: quantitative variables (good for checking for symmetry and skewness)
  - Boxplots: quantitative variables (graphical display of the 5-number summary); modified boxplots show outliers
- Describing distributions
  - Shape: symmetric, skewed, unimodal/bimodal/multimodal
  - Center: mean or median
  - Spread: usually standard deviation/variance, or the IQR from the 5-number summary
  - Outliers

If you have a symmetric distribution with no outliers, use the mean and standard deviation. If you have a skewed distribution and/or you have outliers, use the 5-number summary instead.

### Two components in describing data or information

- Individuals: objects being described by a set of data (people, households, cars, animals, corn, etc.)
- Variables: characteristics of individuals (height, yield, length, age, eye color, etc.)
  - Categorical: places an individual into one of several groups (gender, eye color, college major, hometown, etc.)
  - Quantitative: attaches a numerical value to a variable so that adding or averaging the values makes sense (height, weight, age, income, yield, etc.)

The distribution of a variable describes what values the variable takes and how often it takes those values.

If you have more than one variable in your problem, you should look at each variable by itself before you look at relationships between the variables.

Example: Identify whether the following questions would give you categorical or quantitative data. If it is categorical, state the possible answers.

- a. What letter grade did you get in your Calculus class last semester?
- b. What was your score on the last exam?
- c. What is your GPA?
- d. Did you vote for John Kerry?
- e. Who did you vote for in the last election?
- f. How many votes did George Bush get?
- g. How many red M&Ms are in this bag?
- h. Is this a red M&M?
- i. What color is the M&M you just ate?
- j. Which type of M&Ms has more red ones, peanut or plain?

### Displaying distributions with graphs

It's always a good idea to start by displaying variables graphically before you do any other statistical analysis. What kind of graph should you use? That depends on whether you have a categorical or a quantitative variable.

#### Categorical variables

- Bar graphs or pie charts
- Messy room example: In a poll of 200 parents of children ages 6 to 12, respondents were asked to name the most disgusting things ever found in their children's rooms. The results are below (J&C 2005).

| Most disgusting thing | # of parents | % of parents |
|---|---|---|
| Food-related | 106 | 53 |
| Animal- and insect-related nuisances | 22 | 11 |
| Clothing (dirty socks and underwear especially) | 22 | 11 |
| Other | 50 | 25 |

A bar graph can use either # of parents (like below) or % of parents.

[Figure: bar graph of # of parents by type of disgusting mess; cases weighted by # of parents]

A pie chart needs % of parents.

[Figure: pie chart of type of disgusting mess (animal, clothing, food, other); cases weighted by # of parents]

#### Quantitative variables

- Stemplots, histograms, and boxplots (boxplots are discussed a little later)
- Example: You investigate the amount of time students spend on the internet (in minutes). You study 28 students, and their times in minutes are listed below. Show the distribution of times with a stemplot and a histogram.

7 20 24 25 25 28 28 30 32 35 42 43 44 45 46 47 48 48 50 51 72 75 77 78 79 83 87 88

To create a stemplot by hand:

1. Put the data in order from smallest to largest.
2. The stem will be all digits of a data point except the last one. Write the stems in a vertical line.
3. The leaf will be the last digit of each data point. Write the leaves after the appropriate stem in increasing order.

| Stem | Leaves |
|---|---|
| 0 | 7 |
| 1 | |
| 2 | 045588 |
| 3 | 025 |
| 4 | 23456788 |
| 5 | 01 |
| 6 | |
| 7 | 25789 |
| 8 | 378 |

A split stemplot just has more stems. There are several ways to split the stems; in the one below, leaves 0 to 4 and leaves 5 to 9 go on separate rows.

| Stem | Leaves |
|---|---|
| 0 | 7 |
| 1 | |
| 1 | |
| 2 | 04 |
| 2 | 5588 |
| 3 | 02 |
| 3 | 5 |
| 4 | 234 |
| 4 | 56788 |
| 5 | 01 |
| 5 | |
| 6 | |
| 6 | |
| 7 | 2 |
| 7 | 5789 |
| 8 | 3 |
| 8 | 78 |

Why do we need split stemplots? Sometimes it is easier to
see the shape of the data with more stems. Sometimes a regular stemplot is better. If you're not sure, try it both ways and see if a pattern appears.

To create a histogram by hand:

1. Order the data from smallest to largest.
2. Your range is the max data point minus the min data point ($88 - 7 = 81$ here).
3. Decide how many intervals you want (9 here).
4. The width of your intervals is just range / # of intervals ($81 / 9 = 9$ here).
5. List your intervals and the # of individuals in each interval in tabular form.
6. Draw the histogram.

| Interval range | # of times in that range |
|---|---|
| 7 ≤ X ≤ 16 | 1 |
| 16 < X ≤ 25 | 4 |
| 25 < X ≤ 34 | 4 |
| 34 < X ≤ 43 | 3 |
| 43 < X ≤ 52 | 8 |
| 52 < X ≤ 61 | 0 |
| 61 < X ≤ 70 | 0 |
| 70 < X ≤ 79 | 5 |
| 79 < X ≤ 88 | 3 |
| Total | 28 |

How is a histogram different from a stemplot? They basically show the same information, except stemplots use numbers and histograms use shaded rectangles to show where the quantitative data fall.

How is a histogram different from a bar graph? Histograms have the bars for each interval touching each other; bar graphs do not have the bars touching. Histograms have a continuous x-axis with the x-values in order, while bar graphs can have the categories on the x-axis listed in any order. Histograms are for quantitative variables, and bar graphs are for categorical variables.

### Describing distributions with numbers

You've drawn your graphs (histogram or stemplot); now what? Look for the overall pattern and any outliers. The pattern is described by shape, center, and spread.

1. Shape: unimodal, bimodal, multimodal, symmetric, right skewed, left skewed
2. Center: If the distribution is symmetric, the mean will equal the median; otherwise these numbers are usually not the same.
   - a. Mean (arithmetic average): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $n$ = the total # of observations and $x_i$ = an individual observation.
   - b. Median $M$: the midpoint of the distribution, such that 1/2 of the observations are smaller and 1/2 of the observations are larger. $M$ is not as affected by outliers. To find the median:
     - i. Order the data from smallest to largest.
     - ii. Count the # of observations, $n$.
     - iii. Calculate $(n+1)/2$ to find the center of the data set.
     - iv. If $n$ is odd, $M$ is the data point at
       the center of the data set.
     - v. If $n$ is even, $(n+1)/2$ falls between 2 data points, called the middle pair; $M$ = the average of the middle pair.

Examples of center:

- Find the mean and median of the following 7 numbers in Dataset A: 23, 25, 32.5, 33, 67, 1, 20
- Find the mean and median of the following 8 numbers in Dataset B: 1, 2, 4, 6, 8, 9, 12, 13

3. Spread:
   - a. Range = max - min (simplest, but not always the most helpful)
   - b. Variance $s^2$: the average of the squared deviations of the observations from the mean, dividing by $n - 1$: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
   - c. Standard deviation $s = \sqrt{s^2}$: the square root of the variance; the common way of measuring how far observations are from the mean.
     - Example of finding the standard deviation by hand for the data 0, 2, 4: (1) calculate the mean, (2) calculate the variance, (3) take the square root of the variance.
   - d. $p$th percentile: the value such that $p$% of the observations fall at or below it.
     - Median $M$ = 50th percentile
     - First quartile $Q_1$ = 25th percentile
     - Third quartile $Q_3$ = 75th percentile
     - How do you find quartiles? Think of them as mini-medians. Leave the median out (this only removes a data point when $n$ is odd), and then find the median of what is left over on the left side ($Q_1$) and of what is left over on the right side ($Q_3$).
     - Find the 1st and 3rd quartiles of the following 7 numbers in Dataset A: 20, 1, 23, 25, 32.5, 33, 67
     - Find the 1st and 3rd quartiles of the 8 numbers in Dataset B (1, 2, 4, 6, 8, 9, 12, 13).
   - e. 5-number summary: Min, $Q_1$, $M$, $Q_3$, Max
   - f. Interquartile range: $IQR = Q_3 - Q_1$
   - g. Call an observation a suspected outlier if it is $> Q_3 + 1.5 \times IQR$ or $< Q_1 - 1.5 \times IQR$.

#### Boxplots

Boxplots use the 5-number summary:

- A central box spans the quartiles $Q_1$ and $Q_3$.
- A line in the box marks the median $M$.
- Lines extend from the box out to the smallest and largest observations.

Modified boxplots have lines extending from the box out to the smallest and largest observations that are NOT outliers. Dots mark any outliers.

For the internet time example with 2 additional data points added, list the 5-number summary, find any outliers present, and show a boxplot and a modified boxplot.

7 20 24 25 25 28 28 30 32 35 42 43 44 45 46 47 48 48 50 51 72 75 77 78 79 83 87 88 135 151

How do you know which method is best for
determining center and spread? The 5-number summary is better for skewed distributions or distributions with outliers. The mean and standard deviation are good for reasonably symmetric distributions free of outliers. Always start with a graph!

In the internet time example, here is how the mean, standard deviation, and 5-number summary are affected by the outlier:

| | With outlier 151 | With outlier removed from dataset |
|---|---|---|
| Mean | 54.77 | 51.45 |
| Standard deviation | 32.647 | 27.600 |
| 5-number summary | 7, 30, 46.5, 77, 151 | 7, 29, 46, 76, 135 |

"The Median vs. the Mean in the Age of Average," by Mike Pesca on NPR's Day to Day, 7/19/06: http://www.npr.org/templates/story/story.php?storyId=5567890

Do you always have to do all of this by hand? NO. Statistical software packages like SPSS can make life much easier for you, but it's a good idea to know how to do these by hand so you can make sense of your output. Also, on the exam you won't have access to a computer.

Read over your SPSS manual (part of the HW) and get comfortable using it. You will have a chance to practice on the HW for this week, and you will work on it in lab on Friday. Enter your data, then choose Analyze > Descriptive Statistics > Explore. Follow the instructions on p. 48 of the SPSS manual. The SPSS output for the internet time problem looks like this:

Descriptives: Time spent on the web

| | Statistic | Std. Error |
|---|---|---|
| Mean | 54.77 | 5.961 |
| 95% Confidence Interval for Mean: Lower Bound | 42.58 | |
| 95% Confidence Interval for Mean: Upper Bound | 66.96 | |
| 5% Trimmed Mean | 52.13 | |
| Median | 46.50 | |
| Variance | 1065.840 | |
| Std. Deviation | 32.647 | |
| Minimum | 7 | |
| Maximum | 151 | |
| Range | 144 | |
| Interquartile Range | 48 | |
| Skewness | 1.314 | .427 |
| Kurtosis | 1.977 | .833 |

[Figure: SPSS histogram of time spent on the web (frequency vs. time), with the mean and standard deviation shown in the legend]

Time spent on the web Stem-and-Leaf Plot

| Frequency | Stem & Leaf |
|---|---|
| 1.00 | 0 . 0 |
| 9.00 | 0 . 222222333 |
| 10.00 | 0 . 4444444455 |
| 5.00 | 0 . 77777 |
| 3.00 | 0 . 888 |
| 1.00 | 1 . 3 |
| 1.00 | Extremes (>=151) |

Stem width: 100. Each leaf: 1 case(s).

Notice that on the boxplot it is easy to identify the potential outlier. This would be your indication that the 5-number summary would be the best way to describe your data. You could also try calculating
the mean and standard deviation without the outlier for comparison.

SPSS can also give you the quartiles (listed under "Percentiles"), but these are not necessarily the same answers as what you would get by hand. The "Weighted Average" and "Tukey's Hinges" are not the same method we use. For this class, whenever we ask you to calculate the quartiles, we want you to do them by hand.

What if you want to compare the results from two or more different groups? Use side-by-side boxplots or back-to-back stemplots for your graphs.

[Figure: side-by-side boxplots of miles per gallon for Two city, Two hwy, Mini city, and Mini hwy]

[Figure: back-to-back stemplot comparing Female and Male]

### Features of bell-shaped distributions (from Section 1.3)

A z-score tells us how many standard deviations away from the mean an observation is: $z = \frac{x - \mu}{\sigma}$. This is also called getting a standardized value.

Why is standardization useful? For comparing apples to oranges.

Example (p. 88, Problem 1.99): Jacob scores 16 on the ACT. Emily scores 670 on the SAT. Assuming that both tests measure scholastic aptitude, who has the higher score? The SAT scores for 1.4 million students in a recent graduating class were roughly normal with a mean of 1026 and a standard deviation of 209. The ACT scores for more than 1 million students in the same class were roughly normal with a mean of 20.8 and a standard deviation of 4.8.

How else can we use standardization? If the distribution of observations has a bell shape, then these standardized values have some special properties. One of these is the 68-95-99.7 rule:

- Approximately 68% of the observations fall within $1\sigma$ of $\mu$.
- Approximately 95% of the observations fall within $2\sigma$ of $\mu$.
- Approximately 99.7% of the observations fall within $3\sigma$ of $\mu$.

[Figure: bell curve illustrating the 68-95-99.7 rule; the horizontal axis shows z-scores (standard deviations away from the mean) from -3 to 3, with the mean at 0]

The most famous bell-shaped distribution is the Normal distribution. We will spend a whole week talking about it for Section 1.3, and it will be important to
everything we do for the rest of the semester.

Example: Checking account balances are approximately Normally distributed with a mean of 1325 and a standard deviation of 25.

- a. Between what numbers do 68% of the balances fall?
- b. Above what number do 2.5% of the balances lie?
- c. Approximately what % of balances are between 1250 and 1400?
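The % column in the messy-room table is each count divided by the 200 parents polled. A minimal Python sketch of that arithmetic (standard library only), with a rough text bar graph; the one-`#`-per-5-parents scale and the shortened category labels are arbitrary choices for illustration, not from the notes:

```python
# Counts from the messy-room poll (200 parents, J&C 2005).
counts = {
    "Food-related": 106,
    "Animal and insect-related nuisances": 22,
    "Clothing": 22,
    "Other": 50,
}

total = sum(counts.values())  # should equal the 200 parents polled

# % of parents in each category: what a pie chart needs.
percents = {k: 100 * v / total for k, v in counts.items()}

# Rough text bar graph of # of parents: one '#' per 5 parents (arbitrary scale).
for category, n in counts.items():
    print(f"{category:<38} {'#' * (n // 5)} {n} ({percents[category]:.0f}%)")
```

Because the percents are computed from the same total, they always add to 100%, which is what makes a pie chart valid here.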
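The three by-hand stemplot steps for the internet-time data can be sketched in Python as below. This is an illustration under assumptions: the data are non-negative integers, `stemplot` is a hypothetical helper (not an SPSS feature), and `split=True` uses the leaves 0 to 4 / leaves 5 to 9 split, which is only one of the several ways to split stems:

```python
from collections import defaultdict

# Internet-time data (minutes) for the 28 students in the Section 1.1 example.
times = [7, 20, 24, 25, 25, 28, 28, 30, 32, 35, 42, 43, 44, 45,
         46, 47, 48, 48, 50, 51, 72, 75, 77, 78, 79, 83, 87, 88]


def stemplot(data, split=False):
    """Return the rows of a stemplot for non-negative integer data.

    Stem = every digit except the last (x // 10); leaf = last digit (x % 10).
    With split=True each stem gets two rows: leaves 0-4, then leaves 5-9.
    """
    leaves = defaultdict(list)
    for x in sorted(data):              # step 1: put the data in order
        stem, leaf = divmod(x, 10)      # steps 2-3: stem and leaf
        half = int(leaf >= 5) if split else 0
        leaves[(stem, half)].append(str(leaf))
    halves = (0, 1) if split else (0,)
    rows = []
    for stem in range(min(data) // 10, max(data) // 10 + 1):
        for half in halves:             # empty stems still get a row
            rows.append(f"{stem} | {''.join(leaves[(stem, half)])}".rstrip())
    return rows


for row in stemplot(times):
    print(row)
```

Calling `stemplot(times, split=True)` gives the split version; unlike the hand-drawn plot, it also prints the empty 0 to 4 row for stem 0.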
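The six histogram steps can be checked with a short Python sketch that builds the frequency table for the 28 internet times. It assumes right-closed intervals (left, right], with the minimum placed in the first interval so that it is counted; `histogram_table` is a hypothetical helper name:

```python
import math

# Internet-time data (minutes) for the 28 students in the Section 1.1 example.
times = [7, 20, 24, 25, 25, 28, 28, 30, 32, 35, 42, 43, 44, 45,
         46, 47, 48, 48, 50, 51, 72, 75, 77, 78, 79, 83, 87, 88]


def histogram_table(data, k):
    """Edges and counts for k equal-width intervals, following the by-hand steps.

    Intervals are (left, right]; the minimum goes in the first interval so it
    is counted.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / k                 # range / # of intervals
    edges = [lo + j * width for j in range(k + 1)]
    counts = [0] * k
    for x in data:
        # Index of the interval whose right edge is the first edge >= x.
        j = 0 if x == lo else math.ceil((x - lo) / width) - 1
        counts[j] += 1
    return edges, counts


edges, counts = histogram_table(times, 9)
for j, c in enumerate(counts):
    op = "<=" if j == 0 else "<"
    print(f"{edges[j]:g} {op} X <= {edges[j + 1]:g}: {c}")
print("Total:", sum(counts))
```

With 9 intervals the width is 81 / 9 = 9, so every edge is a whole number; with other choices of k the edges may be fractional.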
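The mean formula and median steps i to v translate almost line for line into Python. A minimal sketch, assuming plain lists of numbers; it uses Dataset B and the 30-value internet-time data (with 135 and 151 added) to show that the median is less affected by large values than the mean. The function names are illustrative:

```python
def mean(data):
    """Arithmetic average: (1/n) * sum of the observations."""
    return sum(data) / len(data)


def median(data):
    """Median via the by-hand steps i-v."""
    xs = sorted(data)                  # i. order the data
    n = len(xs)                        # ii. count the observations
    center = (n + 1) / 2               # iii. 1-based position of the center
    if n % 2 == 1:                     # iv. n odd: the value at the center
        return xs[int(center) - 1]
    # v. n even: the center falls between the middle pair; average them
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


dataset_b = [1, 2, 4, 6, 8, 9, 12, 13]
print(mean(dataset_b), median(dataset_b))

# Internet times with the two added points; the mean is pulled up by 135 and 151.
times30 = [7, 20, 24, 25, 25, 28, 28, 30, 32, 35, 42, 43, 44, 45, 46,
           47, 48, 48, 50, 51, 72, 75, 77, 78, 79, 83, 87, 88, 135, 151]
print(round(mean(times30), 2), median(times30))
```

The second line printed reproduces the SPSS Mean (54.77) and Median (46.50) for the internet-time problem.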
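A sketch of the three steps for the standard deviation (mean, variance with the n - 1 divisor, square root), applied to the 0, 2, 4 example and checked against Python's `statistics.stdev`, which also divides by n - 1. The helper names are illustrative, not from the notes:

```python
import math
import statistics


def sample_variance(data):
    """s^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n                           # 1. calculate the mean
    squared_devs = [(x - xbar) ** 2 for x in data]
    return sum(squared_devs) / (n - 1)             # 2. calculate the variance


def sample_sd(data):
    return math.sqrt(sample_variance(data))        # 3. take the square root


data = [0, 2, 4]
print(sample_variance(data), sample_sd(data))      # mean 2; deviations -2, 0, 2

# The standard library uses the same n - 1 divisor.
assert math.isclose(sample_sd(data), statistics.stdev(data))

# Internet times with and without the outlier 151 (last value in the list).
times30 = [7, 20, 24, 25, 25, 28, 28, 30, 32, 35, 42, 43, 44, 45, 46,
           47, 48, 48, 50, 51, 72, 75, 77, 78, 79, 83, 87, 88, 135, 151]
print(round(sample_sd(times30), 3), round(sample_sd(times30[:-1]), 3))
```

The last line reproduces the standard deviations in the with/without-outlier comparison table (32.647 and 27.600).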
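The class's quartile rule (medians of the left and right halves, leaving the median out when n is odd), the 5-number summary, and the 1.5 × IQR outlier rule can be sketched in Python as below. This follows the by-hand method in the notes, not SPSS's "Weighted Average" or "Tukey's Hinges", which is likely why the SPSS Interquartile Range of 48 differs from the by-hand 77 - 30 = 47 here. The helper names are hypothetical:

```python
# Internet-time data with the two added points (135 and 151), n = 30.
times30 = [7, 20, 24, 25, 25, 28, 28, 30, 32, 35, 42, 43, 44, 45, 46,
           47, 48, 48, 50, 51, 72, 75, 77, 78, 79, 83, 87, 88, 135, 151]


def _median_sorted(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2


def five_number_summary(data):
    """(Min, Q1, M, Q3, Max) with quartiles as 'mini-medians'.

    Q1 and Q3 are the medians of the observations left and right of the
    median; when n is odd, the median itself is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    left = xs[: n // 2]
    right = xs[(n + 1) // 2:]
    return (xs[0], _median_sorted(left), _median_sorted(xs),
            _median_sorted(right), xs[-1])


def suspected_outliers(data):
    """Observations above Q3 + 1.5*IQR or below Q1 - 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sorted(data) if x < low_fence or x > high_fence]


print(five_number_summary(times30))         # with the outlier
print(suspected_outliers(times30))          # fences: -40.5 and 147.5
print(five_number_summary(times30[:-1]))    # 151 removed
```

Here IQR = 77 - 30 = 47, so the upper fence is 77 + 70.5 = 147.5: 151 is flagged as a suspected outlier, while 135 is not.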
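Standardizing takes one subtraction and one division, z = (x - μ)/σ. A minimal Python sketch for the Jacob/Emily example, using the means and standard deviations given in the problem; `z_score` is an illustrative name:

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma


jacob = z_score(16, 20.8, 4.8)    # ACT: mean 20.8, sd 4.8
emily = z_score(670, 1026, 209)   # SAT: mean 1026, sd 209

print(f"Jacob z = {jacob:.2f}, Emily z = {emily:.2f}")
print("Higher standardized score:", "Jacob" if jacob > emily else "Emily")
```

Both scores are below their test's mean, but Jacob's is only about 1 standard deviation below while Emily's is about 1.7 below, so on the standardized scale Jacob's score is higher.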
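For a roughly Normal distribution, the 68-95-99.7 rule turns each question in the checking-account example into arithmetic with μ and σ. A sketch, assuming the question's endpoints sit exactly 1, 2, or 3 standard deviations from the mean (otherwise the rule alone does not answer it and the function raises an error); all names are hypothetical:

```python
# Approximate percent of observations within k standard deviations of the mean.
RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def interval(mu, sigma, k):
    """Range mu +/- k*sigma holding about RULE[k] percent of the observations."""
    return mu - k * sigma, mu + k * sigma


def percent_above(mu, sigma, k):
    """Cutoff mu + k*sigma and the approximate percent above it (by symmetry)."""
    return mu + k * sigma, (100 - RULE[k]) / 2


def percent_between(mu, sigma, lo, hi):
    """Approximate percent between lo and hi when they are mu -/+ 1, 2, or 3 sigma."""
    k = (hi - mu) / sigma
    if (mu - lo) / sigma != k or k not in RULE:
        raise ValueError("the rule only covers mu +/- 1, 2, or 3 standard deviations")
    return RULE[int(k)]


MU, SIGMA = 1325, 25                              # checking account example
print(interval(MU, SIGMA, 1))                     # a. middle 68%
print(percent_above(MU, SIGMA, 2))                # b. 2.5% lie above mu + 2 sigma
print(percent_between(MU, SIGMA, 1250, 1400))     # c. both are 3 sd from the mean
```

Part b uses symmetry: 95% lie within 2σ, so the remaining 5% splits evenly, 2.5% in each tail.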
