# Week 2 notes STAT 121

BYU

This 7 page Class Notes was uploaded by Amanda Berg on Saturday September 19, 2015. The Class Notes belongs to STAT 121 at Brigham Young University taught by Dr. Christopher Reese in Fall 2015. Since its upload, it has received 68 views. For similar materials see Principles of Statistics in Statistics at Brigham Young University.

Date Created: 09/19/15

## Lesson 4: Numerical Measures to Summarize the Distribution of Quantitative Variables

1. Measures of center
    - a. Mode: the value corresponding to a "peak" of the distribution
        - i. Most common value (the value with the highest frequency)
    - b. Median: the middle value
        - i. If n (the number of data values) is odd, the median is the middle value
        - ii. If n is even, the median is the mean of the two middle values
        - iii. Denoted by the symbol "M"
        - iv. Half of the histogram area (data values) is to the right and half is to the left
        - v. The median is always a number, even if the data shows ...
        - vi. You can only find the median when the data are ordered
    - c. Mean: the center of gravity (average) of the histogram
        - i. On a skewed graph, the mean follows the tail
        - ii. Calculated by summing the values, then dividing the sum by the number of values
        - iii. Denoted by x̄ (x-bar)
    - d. The mean, mode, and median can be the same on a symmetric, mound-shaped graph
2. Should we use the mean or the median?
    - a. Use the median when the graph is skewed, because it is "resistant" to long tails and outliers
        - i. Examples: home prices and salaries
    - b. Use the mean if the graph is roughly symmetric
3. Measures of spread
    - a. IQR rather than range
        - i. Why not the range?
            1. It is highly affected by outliers
            2. It only measures overall spread
        - ii. What is the IQR?
            1. The range occupied by the middle 50% of the data (in numbers)
                - a. If you have 100 individuals studied, the IQR would contain the middle 50 individuals
                - b. IQR = 3rd quartile - 1st quartile
                - c. If small relative to the range: highly clustered data set
                - d. If large relative to the range: less clustered data set
                - e. Resistant to outliers

                [Figure: distribution split into the bottom 25% of the data, the middle 50% of the data, and the top 25% of the data]

                - f. Quartiles

                [Figure: Q1 (first quartile), Q2 (second quartile), and Q3 (third quartile) marking 25%, 50%, and 75% of the values]

                    1. Q1 is the median of the smallest half of the observations
                    2. Q3 is the median of the largest half of the observations
                    3. Q2 is the median of the data
                        - When n is odd, the median is not included in either the bottom or the top half of the data
                        - When n is even, the data are naturally divided into 2 halves
                - g. Outliers
                    - i. Values that are not consistent with the rest of the distribution
                        1. Sometimes difficult to judge
                            - That's why we define outliers as values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR

                            [Figure: any observation falling in one of these regions is considered a suspected outlier]

                        2. Reasons to keep or remove outliers
                            - i. Keep if the distribution is long-tailed and the value is legitimate
                            - ii. Remove if the values were produced under different conditions than the rest of the data
                            - iii. Remove (or correct, if possible) if the value is a mistake or typo
    - b. Standard deviation: "average distance" from the mean

### Vocabulary

- Clustering: large amounts of data in one area
- Minimum: the smallest number in the data
- Outlier: a piece of data that is more than 1.5 × IQR above Q3 or more than 1.5 × IQR below Q1
- x-bar (x̄): symbol for the mean
- Mean: the numerical average of the data, found by adding all of the data and dividing by the number of individuals
- Maximum: the largest number in the data
- Range: maximum - minimum
- Median: the middle value of the data
- Summation: all of the values added together
- Resistant: unaffected. For example, the median is resistant to outliers, whereas the mean is not
- Mode: the value with the most data; the peak of the graph
- Q1, Q3, IQR (interquartile range): see notes

## Lesson 5: Numerical Measures to Summarize the Distribution of Quantitative Variables, Part 2

1. 5-Number Summary
    - a. The median, range, and IQR are determined by 5 numbers
        - i. Minimum
        - ii. Q1 (1st quartile)
        - iii. Median (2nd quartile, the center; find it first)
        - iv. Q3 (3rd quartile)
        - v. Maximum
    - b. Complete numerical description from the 5-number summary
        - i. Center: median
        - ii. Spread
            1. Overall: max - min
            2. Clustering: Q3 - Q1
        - iii. Shape
            1. Median - Q1 versus Q3 - median
            2. Median - minimum versus maximum - median
            3. If the lower numbers contain more data, the graph is skewed right
            4. If the higher numbers contain more data, the graph is skewed left
2. Boxplots
    - a. Represent the density of data by the thickness of boxes and the length of lines
    - b. How is it made?
        - i. The central box contains the interquartile range
        - ii. A line in the box marks the median
        - iii. The right whisker extends from the box to the largest non-flagged value (no outliers)
        - iv. The left whisker extends from the box to the smallest non-flagged value (no outliers)
        - v. Flagged values (outliers) are marked by asterisks
        - vi. A boxplot can be horizontal or vertical

        [Figure: boxplot of average January temperatures]

        For this data: Q1 = 25, median = 29, Q3 = 33, max = 45, flagged value (minimum) = 11. The line at 13 marks Q1 - 1.5 × IQR = 25 - 1.5 × 8.

    - c. Examples of questions to be asked about boxplots
        - i. What is the median January temperature in SLC?
        - ii. What is the first quartile of average January temperatures in SLC?
        - iii. About what percent of years have average January temperatures above freezing?
    - d. A major advantage: easy comparison of several distributions using side-by-side boxplots

        [Figure: side-by-side boxplots for married females, single females, married males, and single males, with a question asking which group has the largest spread]

        The answer is D because it spans the longest out of all the data. The least spread is married males.

3. Standard deviation as a measure of spread
    - a. What is standard deviation?
        - i. A single measure that responds to both aspects of spread
            1. Overall spread
            2. Clustering
        - ii. Some facts about standard deviation that will help you interpret it
            1. Does not only measure clustering
            2. Can be 0
            3. Has the same units as the data
            4. Is not resistant to outliers
            5. Should be paired with the mean
            6. Should be used when the data is symmetric and mound-shaped
                - a. Should not be used when the data is skewed or there are outliers
                    - i. The median and IQR should be used in that case, not the mean and SD
    - b. 68-95-99.7 rule
        - i. For symmetric, mound-shaped distributions
            1. Approximately 68% of the data falls within 1 SD of the mean
            2. Approximately 95% falls within 2 SD
            3. Approximately 99.7% falls within 3 SD

        [Figure: mound-shaped curve marked at the mean and at 1, 2, and 3 standard deviations on either side]

### Vocabulary

- IQR: interquartile range, Q3 - Q1. The middle 50% of the data lies in this range
- Boxplot: graphical representation of the 5-number summary. Used to determine the shape and spread of the graph
- 2-number summary: mean and standard deviation
- 5-number summary: min, Q1, median, Q3, max
- s: standard deviation
- Side-by-side boxplots: used to compare data sets that use the 5-number summary
- Standard deviation: average distance of values from the mean. When the value is small, the data is closer to the mean. When it is large, the data is farther from the mean
- Standard deviation rule (68-95-99.7): 68% of the data falls within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3. This only applies when the data is symmetric and mound-shaped
- Variance: variation in the data values
- Deviation from mean: movement away from the mean. When the data has more deviation from the mean, it is farther away
- Flagged value: a value that is more than 1.5 × IQR above Q3 or more than 1.5 × IQR below Q1. Can be considered an outlier

## Lesson 6: Examining Relationships for Pairs of Variables

1. When examining 2 variables, we look for a relationship
    - a. 2 variables measured on the same individual are called bivariate data
2. Relationships
    - a. 2 variables for each individual
    - b. We want to investigate the relationship between the variables using visual displays and numerical summaries
3. Goals for relationships
    - a. Characterize the relationship
    - b. Predict one variable from another
    - c. Investigate a cause-effect relationship
        - i. We normally want this, but it is often not achievable
4. Explanatory/response variables
    - a. Used if prediction or cause-effect analysis is the goal
    - b. Explanatory: happens first in time; used to predict or explain changes in the response
        - i. In the scientific method, called the independent variable
    - c. Response: happens second in time; the outcome of the study
        - i. In the scientific method, called the dependent variable
    - d. Explanatory = x, response = y

    [Table: explanatory variable (categorical or quantitative) versus response variable (categorical or quantitative)]

5. Role-type classification
    - a. In order to figure out how to analyze a relationship, we must determine the role-type classification
6. C → Q
    - a. Categorical explanatory variable
    - b. Quantitative response variable
    - c. Visual display tool: side-by-side boxplots
    - d. Numerical summary tool: 5- or 2-number summary for each category

### Vocabulary

- 2-variable data: studying 2 variables on one individual
- Explanatory variable (X): the first variable to occur in time. In cause-effect relationships, the explanatory variable attempts to explain the data findings for the response variable
- Response variable (Y): the second variable to occur in time. In cause-effect relationships, the response variable is what happens because the explanatory variable changes
- Role-type classification (explanatory → response): shows which kind of value each variable is, categorical or quantitative
- Side-by-side boxplots: used to compare data sets that use the 5-number summary
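
The quartile and outlier rules from Lessons 4 and 5 can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the function names (`five_number_summary`, `outlier_fences`, `flagged_values`) are hypothetical, and the sample data is made up so that its summary matches the values quoted for the January temperature boxplot (Q1 = 25, median = 29, Q3 = 33, max = 45, flagged minimum = 11). Quartiles follow the median-of-halves rule from the notes; other software may use slightly different quartile conventions.

```python
from statistics import mean, median, stdev


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)  # the median can only be found on ordered data
    n = len(data)
    half = n // 2
    lower = data[:half]  # smallest half of the observations
    # When n is odd, skip the middle value so it belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def flagged_values(values):
    """Return the values lying strictly outside the 1.5 x IQR cutoffs."""
    _, q1, _, q3, _ = five_number_summary(values)
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Made-up sample (an assumption, not data from the notes).
sample = [11, 24, 25, 26, 28, 29, 30, 32, 33, 35, 45]

print(five_number_summary(sample))  # (11, 25, 29, 33, 45)
print(outlier_fences(25, 33))       # (13.0, 45.0)
print(flagged_values(sample))       # [11]

# 2-number summary (mean and sample standard deviation); per the notes,
# prefer the median and IQR when the data is skewed or has outliers.
print(mean(sample), stdev(sample))
```

In this sketch the value 45 sits exactly on the upper cutoff and is not flagged, because only values strictly beyond a cutoff count as suspected outliers here; that strictness is a design choice of the example.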
