# 24 Class Note for STAT 30100 with Professor Gundlach at Purdue

This 16-page set of class notes was uploaded by an elite notetaker on Friday, February 6, 2015. The notes belong to a course at Purdue University taught in the fall. Since upload, they have received 20 views.

Date Created: 02/06/15

## Chapter 1: Looking at Data: Distributions

Section 1.1: Introduction, Displaying Distributions with Graphs

Section 1.2: Describing Distributions with Numbers

### Learning goals for this chapter

- Identify categorical and quantitative variables.
- Interpret, create (by hand and with SPSS), and know when to use: bar graphs, pie charts, stemplots (standard, back-to-back, split), histograms, and boxplots (regular, modified, side-by-side).
- Describe the shape, center, and spread of data distributions.
- Define, calculate (by hand and with SPSS), and know when to use measures of center (mean vs. median) and spread (range, 5-number summary, IQR, variance, standard deviation).
- Understand what a resistant measure of center and spread is and when this is important.
- Use the 1.5 × IQR rule to look for outliers.
- Draw a Normal curve in correct proportions and identify the mean/median, standard deviation, middle 68%, middle 95%, and middle 99.7%.
- Perform calculations with the empirical rule, both backwards and forwards.
- Understand the need for standardization.

### Big picture: what do we learn in this chapter?

- Individuals vs. variables
- Categorical vs. quantitative variables
- Graphs:
  - Bar graphs and pie charts: categorical variables
  - Histograms and stemplots: quantitative variables; good for checking for symmetry and skewness
  - Boxplots: quantitative variables; graphical display of the 5-number summary (modified boxplots show outliers)
- Describing distributions:
  - Shape: symmetric/skewed, unimodal/bimodal/multimodal
  - Center: mean or median
  - Spread: usually standard deviation/variance, or IQR from the 5-number summary
  - Outliers

If you have a symmetric distribution with no outliers, use the mean and standard deviation. If you have a skewed distribution and/or you have outliers, use the 5-number summary instead.

### Individuals and variables

There are two components in describing data or information:

- Individuals: objects being described by a set of data (people, households, cars, animals, corn, etc.).
- Variables: characteristics of individuals (height, yield, length, age, eye color, etc.).
  - Categorical: places an individual into one of several groups (gender, eye color, college major, hometown, etc.).
  - Quantitative: attaches a numerical value to a variable so that adding or averaging the values makes sense (height, weight, age, income, yield, etc.).

The distribution of a variable describes what values the variable takes and how often it takes those values.

If you have more than one variable in your problem, you should look at each variable by itself before you look at relationships between the variables.

Example: Identify whether the following questions would give you categorical or quantitative data.

- a) What letter grade did you get in your Calculus class last semester?
- b) What was your score on the last exam?
- c) Who will you vote for in the next election?
- d) How many votes did George W. Bush get?
- e) How many red M&Ms are in this bag?
- f) Which type of M&Ms has more red ones: peanut or plain?

It's always a good idea to start by displaying variables graphically before you do any other statistical analysis. What kind of graph should you use? That depends on whether you have a categorical or quantitative variable.

### Categorical variables: bar graphs or pie charts

Messy room example: In a poll of 200 parents of children ages 6 to 12, respondents were asked to name the most disgusting things ever found in their children's rooms. The results are below (J&C, 2005).

| Most disgusting thing | # of parents | % of parents |
|---|---|---|
| Food-related | 106 | 53 |
| Animal- and insect-related nuisances | 22 | 11 |
| Clothing (dirty socks and underwear especially) | 22 | 11 |
| Other | 50 | 25 |

Bar graph: can use either # of parents (like below) or % of parents.

[Figure: bar graph of count by type of disgusting mess (animal, clothing, food, other); cases weighted by # of parents]

Pie chart: needs % of parents.

[Figure: pie chart of type of disgusting mess (animal, clothing, food, other); cases weighted by # of parents]

### Quantitative variables: stemplots, histograms, and boxplots (discussed a little later)

Example: You investigate the amount of time students spend online, in minutes. You study 28 students, and their times are listed below. Show the distribution of times with a stemplot.

7, 20, 24, 25, 25, 28, 28, 30, 32, 35, 42, 43, 44, 45, 46, 47, 48, 48, 50, 51, 72, 75, 77, 78, 79, 83, 87, 88

To create a stemplot by hand:

1. Put the data in order from smallest to largest.
2. The stem will be all digits for a data point except for the last one. Write the stems in a vertical line. Think of 7 as being 07 so that all the numbers have a digit in the tens place.
3. The leaf will be the next digit (in this case, the ones place) from each data point. Write the leaves after the appropriate stem in increasing order.

It is possible to trim any digits that you feel may be unnecessary. For example, if our second data point had been 20.3, we would probably choose to ignore the .3 for the purposes of the stemplot so that we could create a more reasonable stemplot. If we did not ignore this .3, then our stems would have been 07, 08, 09, 10, 11, 12, 13, ..., 88, with decimal numbers as our leaves. This would show a very uniform stemplot with only one leaf for each stem (all leaves would be 0 except for the 3). This would not be helpful to us at all. It makes much more sense to use the tens place for the stem and the ones place as the leaves in this example.

Stemplot:

| Stem | Leaves |
|---|---|
| 0 | 7 |
| 1 | |
| 2 | 0 4 5 5 8 8 |
| 3 | 0 2 5 |
| 4 | 2 3 4 5 6 7 8 8 |
| 5 | 0 1 |
| 6 | |
| 7 | 2 5 7 8 9 |
| 8 | 3 7 8 |

A split stemplot just has more stems. There are several ways to split the stems. Here they are split by fives:

| Stem | Leaves |
|---|---|
| 0 | 7 |
| 1 | |
| 1 | |
| 2 | 0 4 |
| 2 | 5 5 8 8 |
| 3 | 0 2 |
| 3 | 5 |
| 4 | 2 3 4 |
| 4 | 5 6 7 8 8 |
| 5 | 0 1 |
| 5 | |
| 6 | |
| 6 | |
| 7 | 2 |
| 7 | 5 7 8 9 |
| 8 | 3 |
| 8 | 7 8 |

Why do we need split stemplots? Sometimes it is easier to see the shape of the data with more stems. Sometimes a regular stemplot is better. If you're not sure, try it both ways and see if a pattern appears.

Try a stemplot and a split stemplot with this data (use the hundreds place for stems):

3, 4, 17, 18, 39, 93, 102, 110, 143, 178, 250, 278, 299, 3001

### Histograms

A histogram sorts the quantitative data into bins. How many bins?

- Not too many bins with either 0 or 1 counts
- Not overly summarized, so that you lose all the information
- Not so detailed that it is no longer a summary

[Figure: three histograms of Beak Length (x-axis from 15 to 40) labeled "Too few bins", "OK", and "Too many bins"]

| Histograms | Bar graphs |
|---|---|
| The bars for each interval touch each other. | The bars for each category do not touch each other. There are spaces between the bars. |
| Histograms have a continuous quantitative x-axis with the x-values in order. | Bar graphs can have the categories on the x-axis listed in any order (alphabetical, biggest to smallest, etc.). |
| Quantitative variables | Categorical variables |

[Figure: example histogram of Beak Length (x-axis from 15 to 40)]

| Histograms | Stemplots |
|---|---|
| Quantitative variables | Quantitative variables |
| Good for big data sets, especially if technology is available | Good for small data sets, convenient for back-of-the-envelope calculations |
| | Rarely found in scientific or laymen publications |
| Uses a box to represent each data point | Uses a digit to represent each data point |

[Figure: histogram of Beak Length (x-axis from 15 to 40) next to a stemplot of the same data]

### Describing a distribution

You've drawn your graph (histogram or stemplot). Now what? Look for the overall pattern and any outliers. The pattern is described by shape, center, and spread.

#### 1. Shape

- Number of peaks: unimodal (1), bimodal (2), multimodal (> 2)
- Where the long tail is:

| Symmetric | Right skewed (long tail on the right) | Left skewed (long tail on the left) |
|---|---|---|
| Median = Mean | Median < Mean | Median > Mean |

[Figure: three curves labeled Symmetric, Right skewed, and Left skewed]

To describe the shape, use a histogram with a smoothed curve highlighting the overall pattern of the distribution (don't get overly detailed).

#### 2. Center

If the distribution is symmetric, the mean will equal the median, but otherwise these numbers are not the same.

a) Mean: the arithmetic average,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where $n$ = the total # of observations and $x_i$ = an individual observation.

b) Mode: the most common number (biggest peak).

c) Median (M): the midpoint of the distribution, such that 1/2 the observations are smaller and 1/2 the observations are larger. The median is not as affected by outliers as the mean is (the median is resistant to outliers). To find the median:

1. Order the data from smallest to largest.
2. Count the # of observations, $n$.
3. Calculate $\frac{n+1}{2}$ to find the center of the data set.
4. If $n$ is odd, M is the data point at the center of the data set.
5. If $n$ is even, $\frac{n+1}{2}$ falls between 2 data points (called the middle pair). M = the average of the middle pair.

Examples of center:

- Find the mean and median of the following 7 numbers in Dataset A: 23, 25, 32.5, 33, 67, 1, -20
- Find the mean and median of the following 8 numbers in Dataset B: 1, 2, 4, 6, 8, 9, 12, 13

#### 3. Spread

a) Range = max - min (simplest, not always the most helpful).

b) Variance ($s^2$): the average of the square of deviations of observations from the mean,

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

c) Standard deviation ($s$): the square root of the variance; a common way for measuring how far observations are from the mean.

Example of finding the standard deviation by hand, for the data 0, 2, 4:

1. Calculate the mean.
2. Calculate the variance.
3. Take the square root of the variance.

d) pth percentile: the value such that p% of the observations fall at or below it.

- Median M = 50th percentile
- First quartile Q1 = 25th percentile
- Third quartile Q3 = 75th percentile

How do you find quartiles? Think of them as mini-medians. Leave the median out, and then find the median of what is left over on the left side (Q1) and what is left over on the right side (Q3).

Find the 1st and 3rd quartiles of the following 7 numbers in Dataset A.

e) 5-number summary: Min, Q1, M, Q3, Max.

f) Interquartile range: IQR = Q3 - Q1. Call an observation a suspected outlier if it is greater than Q3 + 1.5 × IQR or less than Q1 - 1.5 × IQR.

g) Boxplots: a graph of the 5-number summary. Modified boxplots have lines that extend from the box out to the smallest and largest observations which are NOT outliers. Dots mark any outliers. We will always ask for the modified boxplot, but if there are no outliers, the modified and regular boxplots look exactly the same.

[Figure: boxplot for Dataset A with 5-number summary -20, 1, 25, 33, 67 (Min = -20, Q1 = 1, M = 25, Q3 = 33, Max = 67)]

Since there were no outliers in this dataset, a regular boxplot and a modified boxplot look exactly the same for this data.

For the online time example with 2 additional data points added in, list the 5-number summary, find any outliers present, and show a boxplot and modified boxplot.

7, 20, 24, 25, 25, 28, 28, 30, 32, 35, 42, 43, 44, 45, 46, 47, 48, 48, 50, 51, 72, 75, 77, 78, 79, 83, 87, 88, 135, 151

### How do you know which method is best for determining center and spread?

- 5-number summary: better for skewed distributions or distributions with outliers.
- Mean and standard deviation: good for reasonably symmetric distributions free of outliers.

Always start with a graph.

In the internet time example, here is how the mean, standard deviation, and 5-number summary are affected by the outlier:

| | With outlier 151 | With outlier removed from dataset |
|---|---|---|
| Mean | 54.77 | 51.45 |
| Standard deviation | 32.647 | 27.600 |
| 5-number summary | 7, 30, 46.5, 77, 151 | 7, 29, 46, 76, 135 |

"The Median vs. the Mean in the Age of Average" by Mike Pesca on NPR's Day to Day, 7/19/06: http://www.npr.org/templates/story/story.php?storyId=5567890

### Do you always have to do all of this by hand?

NO! Statistical software packages like SPSS can make life much easier for you, but it's a good idea to know how to do these by hand so you can make sense of your output. Also, on the exam you won't have access to a computer.

Read over your SPSS manual and get comfortable with using SPSS. You will have a chance to practice on the HW for this week, and you will work on it in lab on Friday.

Enter your data, then choose Analyze > Descriptive Statistics > Explore. Follow the instructions on p. 48 of the SPSS manual.

The output from SPSS for the internet time problem looks like this:

Descriptives

| Time spent on the web | Statistic | Std. Error |
|---|---|---|
| Mean | 54.77 | 5.961 |
| 95% Confidence Interval for Mean: Lower Bound | 42.58 | |
| 95% Confidence Interval for Mean: Upper Bound | 66.96 | |
| 5% Trimmed Mean | 52.13 | |
| Median | 46.50 | |
| Variance | 1065.840 | |
| Std. Deviation | 32.647 | |
| Minimum | 7 | |
| Maximum | 151 | |
| Range | 144 | |
| Interquartile Range | 48 | |
| Skewness | 1.314 | .427 |
| Kurtosis | 1.977 | .833 |

Time spent on the web: Stem-and-Leaf Plot

| Frequency | Stem & Leaf |
|---|---|
| 1.00 | 0 . 0 |
| 9.00 | 0 . 222222333 |
| 10.00 | 0 . 4444444455 |
| 5.00 | 0 . 77777 |
| 3.00 | 0 . 888 |
| .00 | 1 . |
| 1.00 | 1 . 3 |
| 1.00 | Extremes (>=151) |

Stem width: 100. Each leaf: 1 case(s).

[Figure: SPSS boxplot of time spent on the web]

Notice on the boxplot it is easy to identify the potential outlier. This would be your indication that the 5-number summary would be the best way to describe your data. You could also try calculating the mean and standard deviation without the outlier for comparison.

SPSS can also give you the quartiles (listed under Percentiles), but these are not necessarily the same answers as what you would get by hand. The weighted average and Tukey's Hinges are not the same method we use. For this class, whenever we ask you to calculate the quartiles, we want you to do them by hand.

### Comparing two or more groups

What if you want to compare the results from two or more different groups? Use side-by-side boxplots or back-to-back stemplots for your graphs.

[Figure: side-by-side boxplots of miles per gallon for four groups (Two city, Two hwy, Mini city, Mini hwy). Source: Introduction to the Practice of Statistics, W. H. Freeman and Company]

[Figure: back-to-back stemplot comparing Female (leaves on the left) and Male (leaves on the right)]

### Preview of Section 1.3

From Section 1.3: a z-score tells us how many standard deviations away from the mean an observation is. This is also called getting a standardized value.

Why is standardization useful? For comparing apples to oranges.

Example (p. 88, Problem 1.99): Jacob scores 16 on the ACT. Emily scores 670 on the SAT. Assuming that both tests measure scholastic aptitude, who has the higher score? The SAT scores for 1.4 million students in a recent graduating class were roughly normal with a mean of 1026 and standard deviation of 209. The ACT scores for more than 1 million students in the same class were roughly normal with a mean of 20.8 and standard deviation of 4.8.

How else can we use standardization? If the distribution of observations has a bell shape, then these standardized values have some special properties. One of these is the 68-95-99.7 Empirical Rule:

- Approximately 68% of the observations fall within 1σ of the mean (between μ - 1σ and μ + 1σ).
- Approximately 95% of the observations fall within 2σ of the mean (between μ - 2σ and μ + 2σ).
- Approximately 99.7% of the observations fall within 3σ of the mean (between μ - 3σ and μ + 3σ).

$$P(\mu - 1\sigma < X < \mu + 1\sigma) \approx 0.68$$

$$P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95$$

$$P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.997$$

[Figure: bell-shaped curve marking the middle 68%, 95%, and 99.7% of data, with the horizontal axis labeled in standard deviations away from the mean (z-score): -3, -2, -1, 0, 1, 2, 3]

A z-score of -2 could also be written as -2σ, for example.

The mean and the median of a bell-shaped curve are in the middle. This is shown with a 0 because the mean is 0 standard deviations away from itself.

The most famous bell-shaped distribution is the Normal distribution. We will spend several lectures talking about it for Section 1.3, and it will be important to everything we do for the rest of the semester.

Example: Checking account balances are approximately Normally distributed with a mean of 1325 and a standard deviation of 25.

- a) Between what numbers do 68% of the balances fall?
- b) Above what number do 2.5% of the balances lie?
- c) Approximately what percent of balances are between 1250 and 1400?
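The standardization and empirical-rule calculations described above can be sketched in a few lines of Python. This is only an illustration, not part of the course materials: the function names `z_score` and `empirical_rule_intervals` are made up for this sketch, the z-score is computed with the usual definition (observation minus mean, divided by standard deviation), and the inputs are the means and standard deviations quoted in the ACT/SAT and checking-account examples.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / sd


def empirical_rule_intervals(mean, sd):
    """Intervals holding roughly 68%, 95%, and 99.7% of a bell-shaped distribution.

    Returns a dict mapping the approximate percentage to (low, high).
    """
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}  # k standard deviations -> approx. percent
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in coverage.items()}


# ACT vs. SAT comparison (values taken from the example in the notes)
jacob_z = z_score(16, mean=20.8, sd=4.8)    # ACT score of 16
emily_z = z_score(670, mean=1026, sd=209)   # SAT score of 670
print(f"Jacob z = {jacob_z:.2f}, Emily z = {emily_z:.2f}")

# Checking-account example: mean 1325, standard deviation 25
for pct, (low, high) in empirical_rule_intervals(1325, 25).items():
    print(f"about {pct}% of balances fall between {low} and {high}")
```

Under these assumptions Jacob's z-score is about -1.0 and Emily's is about -1.7, so Jacob's score is the higher of the two once both are put on the same standardized scale. The same intervals can be read "backwards": since about 95% of balances fall within two standard deviations of the mean, roughly 2.5% lie above the upper end of that interval.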
