# STATISTICAL METHODS STAT 302

Texas A&M

These 20 pages of class notes were uploaded by Darien Kutch on Wednesday, October 21, 2015. The notes belong to STAT 302 at Texas A&M University, taught by Ellen Toby in the fall semester.

Date Created: 10/21/15

## Chapter 2: Exploring Data with Graphs and Numerical Summaries

Recall that statistics is how we use data to answer questions about a population or process. Data sets contain information about some group of individuals or experimental units. This information is organized into variables.

- A variable is a characteristic of an individual.

There are two main types of variables:

- Categorical variables
  - A categorical variable places an individual into one of several groups.
  - Examples:
- Numerical variables
  - A variable is called numerical (the book uses the term *quantitative*) if each variable value is a number.
  - Examples:

Two subtypes of numerical variables:

- Discrete
  - Numerical variables that can only take certain fixed values, with no intermediate values possible.
  - Examples:
- Continuous
  - Continuous variables can take on any real numerical value over an interval.
  - Examples:

Example 2.1. An oceanographer studying sharks wanted to know which state has the highest percentage of shark attacks in the US. For each individual shark attack in the sample, the state where the attack occurred was identified. There were 289 attacks in Florida, 44 in Hawaii, and 34 in California.

- What is a subject in this sample?
- What is the variable?
- What type of variable is it?
- What variable values are taken by this data set?

### Describing a Single Categorical Variable

Graphical summaries:

- Frequency table
- Pie chart
- Bar chart

The statistics of interest:

- Counts: 289 shark attacks occurred in Florida.
- Percentages: 9% of all shark attacks in the sample occurred in California.
- Proportions: the proportion of shark attacks in the sample that occurred in Hawaii is 0.12.

### Describing a Single Numerical Variable

Example 2.2. A nutritionist collected data on 20 popular cereals; her data are below. The categorical variable CODE takes the values A (adult cereal) and C (cereal intended for children).

| CEREAL | SODIUM (mg) | SUGAR (g) | CODE |
|---|---|---|---|
| Frosted Mini Wheats | 0 | 7 | A |
| Apple Bran | 260 | 5 | A |
| Apple Jacks | 125 | 14 | C |
| Capt Crunch | 220 | 12 | C |
| Cheerios | 290 | 1 | C |
| Cinnamon Toast | 210 | 13 | C |
| Corn Flakes | 290 | 2 | A |
| Raisin Bran | 210 | 12 | A |
| Crackling Oat Bran | 140 | 10 | A |
| Crispix | 220 | 3 | A |
| Frosted Flakes | 200 | 11 | C |
| Fruit Loops | 125 | 13 | C |
| Grape Nuts | 170 | 3 | A |
| Honey Nut Cheerios | 250 | 10 | C |
| Honeycomb | 180 | 11 | C |
| Life | 150 | 6 | A |
| Oatmeal Raisin Crisp | 170 | 10 | A |
| Sugar Smacks | 70 | 15 | C |
| Special K | 230 | 3 | A |
| Wheaties | 200 | 3 | A |

The distribution of numerical data: the distribution of a variable tells us what values the variable takes and how often it takes these values. The most common graph used to describe the distribution of a numerical variable is the histogram.

Reading histograms:

- Variable values are plotted on the horizontal axis. The height of each bar is its frequency, or the proportion of subjects taking values in that range.
- The difference between bar charts (used for categorical variables) and histograms (used for numerical variables) is that there are spaces between the bars in bar charts. Also, in bar charts the variable value for each bar is written below the bar.

[Figure: Histogram of SUGAR (grams); horizontal axis from 1 to 15 grams, vertical axis Frequency]

### Describing the Shape of the Distribution of Numerical Data

Modality (number of modes):

- Unimodal
- Bimodal
- Uniform

Skewness of unimodal distributions:

- Symmetric
- Skewed right
- Skewed left

Example 2.3. How would you describe the shape of the distribution of these data sets? Identify the modality and, for data that are unimodal, the skewness.

[Figure: histograms of the data sets for Example 2.3]

### Describing a Single Quantitative Variable

We generally use four different things to describe numerical variables:

- Shape
- Center
- Spread
- Unusual observations

Three statistics measure the center of a data set:

- Mode (the least important statistic)
- Mean: the average value of a variable, $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
- Median ($M$): the middle value of a variable once the data values are sorted. With an odd number of values it is the single middle value; with an even number it is the average of the two middle values.

Find the median of 0, 2, 4, 6, 8, 10 and of 0, 2, 4, 6, 8, 10, 12.

Example 2.4. Find the mean and median for the two sets of data given below.

- 1, 2, 3, 4, 5
  - Mean:
  - Median:
- 1, 2, 3, 4, 100
  - Mean:
  - Median:

Outliers: outliers are extreme observations, taking values far away from the bulk of the data values.

- Which value in Example 2.4 is an outlier?

### Comparing the Mean & Median

In general, if the shape of the distribution is:

- Symmetric: the mean and median are approximately equal.
- Skewed right: the mean is greater than the median.
- Skewed left: the mean is less than the median.

A statistic is called resistant if it is not strongly influenced by extreme values such as outliers.

- The median is resistant to outliers. The medians of the data sets 1, 2, 3, 4, 5 and 1, 2, 3, 4, 100 are the same.
- The mean is not resistant to outliers. The means of the data sets 1, 2, 3, 4, 5 and 1, 2, 3, 4, 100 are definitely not the same.
  - Each observation has the same weight, $1/n$, when calculating the mean.
  - Consequently, the larger the sample size, the less the mean will be influenced by a single outlier.

A data value is called influential if it noticeably shifts the value of the sample mean toward it.

Example 2.5. An example of a large data set and how its size affects whether or not an outlier is influential.

- Original data: mean = 1581, median = 1575.
- Next, one artificial outlier was added to this data set: mean = 1590, median = 1578.

[Figure: histograms (Frequency vs. Plant Weight) of the data set before and after the outlier was added]

- Is this outlier influential?
- Approximately how large does a data set need to be for an outlier NOT to be influential?

### Describing the Variability (Spread) of Numerical Data

Range = maximum - minimum

- The range is not resistant to outliers.

Interquartile range (abbr. IQR)

- The IQR is the range of the middle 50% of the data values: IQR = Q3 - Q1.
- The IQR is resistant to outliers.

First quartile (Q1) is the 25th percentile.

- Q1 is the median of the observations whose position in the ordered list of variable values is to the left of the location of the overall median.

Third quartile (Q3) is the 75th percentile.

- Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.
- Both Q1 and Q3 are measures of location.

Example 2.6. Find the median, Q1, Q3, and the IQR of the 20 sodium values in the cereal data (sorted):

0, 70, 125, 125, 140, 150, 170, 170, 180, 200, 200, 210, 210, 220, 220, 230, 250, 260, 290, 290

- The IQR is resistant to outliers, which is good, but it gives no information about the spread of the data values less than Q1 or greater than Q3.
- The median and IQR are usually paired together because both are resistant to outliers.

### Measures of Variability Associated with the Mean

Variance

- The variance is essentially the average of the squared deviations from the sample mean (the sample variance divides the sum of squared deviations by $n - 1$ rather than $n$).
- $x_i - \bar{x}$ is the deviation of the $i$th data value from the sample mean.
- The variance is not resistant to outliers.

Standard deviation

- The standard deviation is the square root of the variance.
- The standard deviation measures how tightly the data values are clustered around the mean.
- The standard deviation is very close in value to the average distance of the data from $\bar{x}$.
- The standard deviation is NOT resistant to outliers.

| | Variance | Standard deviation |
|---|---|---|
| Population parameter | $\sigma^2$ | $\sigma$ |
| Sample statistic | $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ | $s = \sqrt{s^2}$ |

The mean and standard deviation are usually paired together. Both use all the data values in their calculation, which is good, but both have the disadvantage that they are not resistant to outliers.

Below, each data set consists of 80 numbers, and $\bar{x} = 0$ for both data sets.

- The horizontal x-axis scale is the same in both plots.
- In data set 1, the histogram is narrow and tall, with standard deviation s = 22.
- In data set 2, the histogram is broad and flat, with standard deviation s = 90.

[Figure: Histogram of Data set 1 and Histogram of Data set 2]

Example 2.7.

| Data set | Values | $\bar{x}$ | $s$ | $M$ | Q1 | Q3 | IQR |
|---|---|---|---|---|---|---|---|
| Set 2 | 1, 2, 3, 4, 5, 6, 7, 8, 9 | 5 | 2.7 | 5 | 2.5 | 7.5 | 5 |
| Set 3 | -9, -8, -7, -6, -5, -4, -3, -2, -1 | -5 | 2.7 | -5 | -7.5 | -2.5 | 5 |
| Set 4 | 5, 10, 15, 20, 25, 30, 35, 40, 45 | 25 | 13.7 | 25 | 12.5 | 37.5 | 25 |
| Set 5 | 5, 10, 15, 20, 25, 30, 35, 40, 150 | 36.7 | 44.0 | 25 | 12.5 | 37.5 | 25 |

The calculation of the IQR for data set 3:

- IQR = Q3 - Q1 = -2.5 - (-7.5) = 5
- The IQR is positive, but all the data values are negative.

### Five-Number Summary & Outliers

Five-number summary:

- Minimum
- Q1
- Median
- Q3
- Maximum

Outliers: the formal definition of an outlier is any observation that meets one of the following criteria:

- It is less than Q1 - 1.5 × IQR.
- It is greater than Q3 + 1.5 × IQR.

### Box Plots

Box plots are graphical versions of the five-number summary.

- Box plots can be used to identify outliers in the data.
- The lower whisker ends at the smallest non-outlier data value, and the upper whisker ends at the largest non-outlier data value.

[Figure: annotated box plot showing the 1st quartile, median, mean, 3rd quartile, interquartile range (IQR), lower and upper whiskers, and an extreme outlier]

You can also use box plots to identify the shape of a distribution:

- Symmetric: the whiskers will be approximately equal in length.
- Skewed: one whisker will be much longer than the other.

[Figure: comparing histograms and the corresponding box plots]

### Choosing the Best Measure of Center and Spread

- Because the mean and standard deviation are affected by extreme observations, they can be misleading when a distribution is strongly skewed or has outliers.
- The five-number summary best describes the data when the distribution is strongly skewed or has extreme outliers, because the median and quartiles are resistant to outliers.
- When the distribution is approximately symmetric and there are no outliers, use $\bar{x}$ and $s$ to summarize the center and spread of the data.
- When the data are somewhat skewed, or there are outliers but not extreme outliers, and the data set is relatively large, it isn't always obvious which set of statistics is most appropriate. First compare the mean and median: if the difference between them is large, use the five-number summary.

### Summary

Exploring single categorical variables:

- Graphical summary: use frequency tables, pie charts, and bar charts.
- Description: discuss counts, percentages, and proportions.

Exploring single quantitative variables:

- Graphical summary: use histograms and box plots.
- Description: discuss the shape, center, spread, and any unusual observations such as outliers.
- Usually all of the following statistics will be included in any description: $\bar{x}$, $s$, $M$, minimum, maximum, Q1, Q3, IQR.
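The counts, percentages, and proportions for the shark-attack data in Example 2.1 can be checked with a short Python sketch. It assumes, as the 9% and 0.12 figures imply, that the sample consists of exactly the 367 attacks in the three listed states; `frequency_table` is an illustrative helper name, not something from the course materials.

```python
from collections import Counter

# Example 2.1: each attack is recorded by the state where it occurred.
# Assumption: the sample contains only the three states listed in the notes.
attacks = ["Florida"] * 289 + ["Hawaii"] * 44 + ["California"] * 34

def frequency_table(values):
    """Return {category: (count, percent, proportion)} for a categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {
        category: (count, 100 * count / n, count / n)
        for category, count in counts.most_common()
    }

for state, (count, percent, proportion) in frequency_table(attacks).items():
    print(f"{state:<11} count={count:>3}  percent={percent:5.1f}%  proportion={proportion:.2f}")
```

Each row of this table is exactly what a bar chart or pie chart of the same variable would display.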
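A minimal Python check of Example 2.4 and the median exercise, using the standard-library functions `statistics.mean` and `statistics.median`. It makes the resistance point numerical: replacing 5 with 100 changes the mean but not the median.

```python
from statistics import mean, median

# Example 2.4: the two data sets differ only in their largest value.
set_a = [1, 2, 3, 4, 5]
set_b = [1, 2, 3, 4, 100]  # 100 is the outlier

for name, data in [("1, 2, 3, 4, 5", set_a), ("1, 2, 3, 4, 100", set_b)]:
    print(f"{name}: mean = {mean(data)}, median = {median(data)}")

# Median exercise: an even count averages the two middle values,
# an odd count takes the single middle value.
print(median([0, 2, 4, 6, 8, 10]))
print(median([0, 2, 4, 6, 8, 10, 12]))
```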
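The statement that each observation carries weight 1/n can be made concrete. Appending one value v to n observations with mean x̄ gives a new mean of (n·x̄ + v)/(n + 1), so the mean moves toward v by (v - x̄)/(n + 1). The Python sketch below uses hypothetical sample sizes and illustrative function names; with n = 4 and x̄ = 2.5 (the values 1, 2, 3, 4) plus v = 100, it reproduces the mean of 22 for the data set 1, 2, 3, 4, 100 from Example 2.4.

```python
def mean_after_adding(n, x_bar, new_value):
    """Mean of n observations with mean x_bar after one extra value is appended."""
    return (n * x_bar + new_value) / (n + 1)

def mean_shift(n, x_bar, new_value):
    """How far the mean moves toward the new value: (new_value - x_bar) / (n + 1)."""
    return mean_after_adding(n, x_bar, new_value) - x_bar

# Hypothetical illustration: the same outlier (100) appended to samples
# with mean 2.5, for increasing sample sizes.
for n in (4, 40, 400, 4000):
    print(f"n = {n:>4}: mean shifts by {mean_shift(n, 2.5, 100):.3f}")
```

The shift shrinks roughly in proportion to 1/n, which is why a single outlier is less influential in a large data set such as the one in Example 2.5.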
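The quartile rule used in these notes (Q1 is the median of the observations to the left of the overall median's position, Q3 is the median of those to the right) can be sketched in Python as below. The sketch applies that rule to the sodium values from Example 2.6, together with the 1.5 × IQR outlier rule and the box-plot whisker rule. The helper names are hypothetical. Statistical software, including Python's `statistics.quantiles`, may use a different quartile convention and report slightly different Q1 and Q3.

```python
from statistics import median

# Sodium (mg) for the 20 cereals in Example 2.2, as listed in Example 2.6.
sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the 'median of each half' rule.

    With an odd count, the overall median is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # observations left of the median's position
    upper = data[(n + 1) // 2 :]  # observations right of the median's position
    return data[0], median(lower), median(data), median(upper), data[-1]

def outlier_fences(q1, q3):
    """1.5 * IQR rule: values below the lower fence or above the upper fence are outliers."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

minimum, q1, med, q3, maximum = five_number_summary(sodium)
low, high = outlier_fences(q1, q3)
outliers = [x for x in sodium if x < low or x > high]
non_outliers = [x for x in sodium if low <= x <= high]

print("five-number summary:", minimum, q1, med, q3, maximum)
print("IQR:", q3 - q1, " fences:", low, high, " outliers:", outliers)
# Box-plot whiskers end at the most extreme non-outlier values.
print("whiskers:", min(non_outliers), max(non_outliers))
```

The same function reproduces the quartiles in the Example 2.7 table, for instance Q1 = 12.5 and Q3 = 37.5 for Set 5.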
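The sample variance and standard deviation (sum of squared deviations divided by n - 1, then the square root) can be computed step by step. This Python sketch recomputes x̄ and s for the Example 2.7 data sets, with the set values read from that example. Python's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor; the hand-written version only makes each step visible.

```python
from math import sqrt

def sample_variance(values):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """s: square root of the sample variance."""
    return sqrt(sample_variance(values))

# Data sets from Example 2.7.
sets = {
    "Set 2": list(range(1, 10)),       # 1, 2, ..., 9
    "Set 3": list(range(-9, 0)),       # -9, -8, ..., -1
    "Set 4": list(range(5, 50, 5)),    # 5, 10, ..., 45
    "Set 5": [5, 10, 15, 20, 25, 30, 35, 40, 150],
}

for name, data in sets.items():
    x_bar = sum(data) / len(data)
    print(f"{name}: mean = {x_bar:.1f}, s = {sample_sd(data):.1f}")
```

Comparing Set 4 with Set 5 shows the non-resistance directly. Changing one value from 45 to 150 raises s from about 13.7 to about 44.0, while the median and IQR stay the same.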
