# STAT 1051, Balaji, Week 3

GWU

This 8-page set of class notes was uploaded by skenan on Thursday, September 22, 2016. The notes belong to STAT 1051 at George Washington University, taught by Dr. Srinivasan Balaji in Fall 2016. Since upload, they have received 11 views.

Date Created: 09/22/16

## STAT 1051 - Week 3: Chapter 2 (Continued)

### Descriptive Statistics: Summation Notation

Observations in a dataset are denoted by $\{x_1, x_2, x_3, x_4, \ldots, x_n\}$, where $n$ is the sample size.

- $x_1$ is the first observation, $x_2$ is the second, and so on.
- We use $\sum x$ to denote $x_1 + x_2 + x_3 + x_4 + \cdots + x_n$.
- In particular,

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$

Example: compute the following sums for the dataset below.

7 11 3 4 5 6 13 ($n = 7$)

- $\sum x = 7 + 11 + 3 + 4 + 5 + 6 + 13 = 49$
- $\sum (x - 2) = (7-2) + (11-2) + (3-2) + (4-2) + (5-2) + (6-2) + (13-2) = 35$
- $\sum x^2 = 7^2 + 11^2 + 3^2 + 4^2 + 5^2 + 6^2 + 13^2 = 425$
- $\left(\sum x\right)^2 = 49^2 = 2401$
- $\dfrac{\sum x}{n} = \dfrac{49}{7} = 7$

### Populations, Samples, and Numerical Summaries

- Population: the collection of all the units that we are interested in studying.
- Sample: a subset of the units of the population.
- Numerical summary: a summary of the data using numerical descriptive measures. Both population and sample data can be summarized.

Two quantities to measure:

1. Center: measures the central tendency.
   - Mean: the average of a group of numbers.
     - Population mean ($\mu$): the mean of population data, $\mu = \dfrac{\sum X}{N} = \dfrac{X_1 + X_2 + X_3 + \cdots + X_N}{N}$
     - Sample mean ($\bar{x}$): the mean of sample data, $\bar{x} = \dfrac{\sum x}{n} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
     - Mean and distribution: the mean is the point where the histogram balances. For a positively skewed distribution, the extreme observations pull the mean up.
   - Median: the middle observation. The median partitions the histogram into two halves of equal area.
   - Mode: the most frequent observation. For continuous variables, the mode is the point where the histogram has its peak.
2. Variability: the spread of the data.

Example: 1 3 5 6 8 8 9 11 12

- Mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{1+3+5+6+8+8+9+11+12}{9} = \dfrac{63}{9} = 7$
- Median: 8
- Mode: 8

Comparing the three m's (these orderings typically hold):

- For negatively (left) skewed distributions: mean < median < mode
- For positively (right) skewed distributions: mean > median > mode
- For symmetric (not skewed) distributions: mean = median = mode

### Measures of Spread

These are different ways of computing the spread (variability) of a dataset.

- a) Range: maximum minus minimum. Two very different datasets can have the same range.
- b) Variance: the average squared distance between the observations and the mean of the data.
  - Population variance ($\sigma^2$) measures the spread in the population: $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$
  - Sample variance ($s^2$) measures the spread in the sample, where $\bar{x}$ is the sample mean: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
- c) Standard deviation (SD): the square root of the variance. It gives the "average" distance between a typical observation and the mean of the dataset.
  - Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}}$
  - Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
- d) Interquartile range (IQR)

Example: find the mean, median, and standard deviation for the following datasets.

- a) 3 7 11 2 5 4 3
- b) 4 -2 5 8 12 5 7 4 9 8

(a) Ordered data: 2 3 3 4 5 7 11

- Mean: $\bar{x} = \dfrac{35}{7} = 5$
- Median: 4
- Mode: 3
- Range: $11 - 2 = 9$
- Sample variance: $s^2 = \dfrac{(2-5)^2 + (3-5)^2 + (3-5)^2 + (4-5)^2 + (5-5)^2 + (7-5)^2 + (11-5)^2}{7 - 1} = \dfrac{58}{6} \approx 9.67$
- Sample standard deviation: $s = \sqrt{s^2} = \sqrt{9.67} \approx 3.1$

(b) Ordered data: -2 4 4 5 5 7 8 8 9 12

- Mean: $\bar{x} = \dfrac{60}{10} = 6$
- Median: $(5 + 7)/2 = 6$
- Mode: 4, 5, 8
- Range: $12 - (-2) = 14$
- Sample variance: $s^2 = \dfrac{(-2-6)^2 + (4-6)^2 + (4-6)^2 + (5-6)^2 + (5-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2 + (12-6)^2}{10 - 1} = \dfrac{128}{9} \approx 14.22$
- Sample standard deviation: $s = \sqrt{s^2} = \sqrt{14.22} \approx 3.77$

In-class example: find the mean, median, range, and standard deviation for the following datasets.

- a) 2 -1 4 7 4 3 11 2
- b) 52 49 54 57 54 53 61 52
- c) 4 -2 8 14 8 6 22 4

a) Ordered data: -1 2 2 3 4 4 7 11

- Range: $11 - (-1) = 12$
- Mean: $\bar{x} = \dfrac{32}{8} = 4$
- Median: $(3 + 4)/2 = 3.5$
- Mode: 2, 4
- Sample variance: $s^2 = \dfrac{(2-4)^2 + (-1-4)^2 + (4-4)^2 + (7-4)^2 + (4-4)^2 + (3-4)^2 + (11-4)^2 + (2-4)^2}{8 - 1} = \dfrac{92}{7} \approx 13.14$
- Sample standard deviation: $s = \sqrt{s^2} = \sqrt{13.14} \approx 3.63$

b) Each observation in this dataset is the corresponding observation of dataset a plus 50. In other words, dataset a has been shifted 50 units along the number line. If a dataset is shifted by a constant, the mean, median, and mode of the new dataset are shifted by the same constant, but the range and the standard deviation remain the same.

- Range: 12 (same as the range of dataset a)
- Mean: $\bar{x}_a + 50 = 4 + 50 = 54$
- Median: $3.5 + 50 = 53.5$ (median of a, plus 50)
- Mode: 52, 54 (modes of a, plus 50)
- SD: 3.63 ($s_b = s_a$)

c) Each observation in this dataset is the corresponding observation of dataset a multiplied by 2. If a dataset is multiplied by a constant $c$, the mean, median, and mode of the new dataset are multiplied by the same constant. The range and the standard deviation are multiplied by the absolute value of the constant.

- Range: 24 (range of a, times 2)
- Mean: $\bar{x}_a \times 2 = 4 \times 2 = 8$
- Median: $3.5 \times 2 = 7$ (median of a, times 2)
- Mode: 4, 8 (modes of a, times 2)
- SD: $3.63 \times 2 = 7.26$ ($s_c = |c| \times s_a$; without intermediate rounding the value is about 7.25)

### The Empirical Rule

When the distribution is symmetric and bell-shaped, the mean and the SD together describe the distribution fairly well. Most of the observations lie near the center (the mean) of the data, for example within the interval $(\bar{x} - s, \bar{x} + s)$.

- For data with a bell-shaped distribution, approximately 68% of the observations lie within one standard deviation of the mean.
- For data with a bell-shaped distribution, approximately 95% of the observations lie within two standard deviations of the mean.
- For data with a bell-shaped distribution, approximately 99.7% of the observations lie within three standard deviations of the mean.

Example: in a class of 500 students, the mean score is 84 and the standard deviation is 5. Then, by the empirical rule:

- Approximately 68% of the students scored between 79 and 89.
- Approximately 95% of the students scored between 74 and 94.
- Approximately 99.7% of the students scored between 69 and 99.

Question: what proportion of students scored

- a) below 84?
- b) between 84 and 89?
- c) less than 74?
- d) between 79 and 94?

Solution:

- a) 50%
- b) ½ × 68% = 34%
- c) ½ × (100% − 95%) = ½ × 5% = 2.5%
- d) ½ × 68% + ½ × 95% = 34% + 47.5% = 81.5%

### Chebyshev's Rule

For any dataset, the following are true:

- At least 75% of the data values fall between $\bar{x} - 2s$ and $\bar{x} + 2s$.
- At least 89% of the data values fall between $\bar{x} - 3s$ and $\bar{x} + 3s$.

Continuation of the previous example: suppose the dataset above is not symmetric and bell-shaped. Here, $\bar{x} = 84$ and $s = 5$, so:

- At least 75% of the data values fall between 74 and 94.
- At least 89% of the data values fall between 69 and 99.

Solution:

- a) Nothing decisive can be said about the proportion of students who score below 84.
- b) No answer can be given for the range (84, 89).
- c) At most 25% of students score below 74.

### Measures of Relative Standing

Z-score: for any data value $x$,

$$z = \frac{x - \bar{x}}{s}$$

Example: find the z-scores of 7 and 11 in the dataset 3 7 11 4 5.

- $\bar{x} = \dfrac{3 + 7 + 11 + 4 + 5}{5} = \dfrac{30}{5} = 6$
- $s^2 = \dfrac{(3-6)^2 + (7-6)^2 + (11-6)^2 + (4-6)^2 + (5-6)^2}{5 - 1} = \dfrac{40}{4} = 10$, so $s = \sqrt{10} \approx 3.16$
- z-score of 7: $\dfrac{7 - 6}{3.16} \approx 0.316$
- z-score of 11: $\dfrac{11 - 6}{3.16} \approx 1.58$

For symmetric, bell-shaped data, nearly 68% of the data values have z-scores between -1 and 1, nearly 95% have z-scores between -2 and 2, and almost all have z-scores between -3 and 3. Hence, for such data, any value with a z-score above 3 or below -3 is considered an outlier.

Percentiles: percentile rankings make use of the $p$th percentile. For example, the median is the 50th percentile: 50% of the observations lie above it and 50% lie below it. For any $p$, the $p$th percentile has $p$% of the measurements lying below it and $(100 - p)$% above it.

Quartiles: there are three quartiles, which partition the dataset into four equal parts.

- $Q_L$ (or $Q_1$), the lower quartile: 25% of the observations lie below the first quartile.
- $M$ (or $Q_2$), the median: the 2nd quartile is the same as the median.
- $Q_U$ (or $Q_3$), the upper quartile: 75% of the observations lie below the 3rd quartile.
- The IQR gives the range of the middle 50% of the data.

How to find the quartiles:

1. Find the median (2nd quartile).
2. Find the median of the observations below the 2nd quartile: this is the 1st quartile.
3. Find the median of the observations above the 2nd quartile: this is the 3rd quartile.
4. Find IQR $= Q_U - Q_L$.

### Box Plot

A box (rectangle) is drawn with its two ends (hinges) at the lower and the upper quartiles. The median is shown inside the box by a line.

- The points at a distance of 1.5 × IQR from each hinge mark the "inner fences".
- Lines are drawn from each hinge to the most extreme observation inside the inner fence (these observations are called adjacent values).
- The points at a distance of 3 × IQR from each hinge mark the second pair of fences, called the "outer fences".
- Observations outside the outer fences are called extreme outliers and are marked using '0'.
- Observations outside the inner fences but inside the outer fences are called mild outliers and are marked using '*'.

[Figure: box plots for a positively skewed, a negatively skewed, and a symmetric distribution]

Example: compute the five-number summary of the dataset below after identifying any outliers.

15 19 25 30 33 45 52 56 59 62

- Minimum: 15
- 1st quartile ($Q_L$): 25
- Median (2nd quartile): $(33 + 45)/2 = 39$
- 3rd quartile ($Q_U$): 56
- Maximum: 62
- IQR $= Q_U - Q_L = 56 - 25 = 31$
- Lower inner fence: $25 - (1.5 \times 31) = -21.5$
- Upper inner fence: $56 + (1.5 \times 31) = 102.5$

All observations lie inside the inner fences, so there are no outliers. The data are not symmetric; they are positively skewed (the upper quartile is farther from the median than the lower quartile, 17 versus 14).
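
The hand calculations in these notes can be checked with a few lines of Python. The block below is a minimal sketch, not part of the original lecture: it uses Python's standard `statistics` module for the sample formulas (divisor $n - 1$), and the helper names `summarize`, `z_score`, `quartiles`, and `inner_fences` are hypothetical names chosen for illustration. The quartiles follow the median-of-halves method described in the notes, which can differ from the default methods in statistical software.

```python
from statistics import mean, median, multimode, stdev, variance


def summarize(data):
    """Numerical summaries from the notes, using sample formulas (n - 1)."""
    return {
        "range": max(data) - min(data),
        "mean": mean(data),
        "median": median(data),
        "modes": sorted(multimode(data)),  # all most-frequent values
        "variance": variance(data),        # sample variance s^2
        "sd": stdev(data),                 # sample standard deviation s
    }


def z_score(x, data):
    """z-score of x relative to the sample mean and sample SD of data."""
    return (x - mean(data)) / stdev(data)


def quartiles(data):
    """Median-of-halves quartiles (Q_L, M, Q_U), as described in the notes."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]             # observations below the median
    upper = xs[len(xs) - half:]   # observations above the median
    return median(lower), median(xs), median(upper)


def inner_fences(data):
    """Inner fences: 1.5 * IQR below Q_L and above Q_U."""
    q_l, _, q_u = quartiles(data)
    iqr = q_u - q_l
    return q_l - 1.5 * iqr, q_u + 1.5 * iqr


# In-class example: dataset a, a shifted by 50 (b), and a multiplied by 2 (c).
a = [2, -1, 4, 7, 4, 3, 11, 2]
b = [x + 50 for x in a]
c = [2 * x for x in a]
for name, d in (("a", a), ("b", b), ("c", c)):
    s = summarize(d)
    print(name, s["range"], s["mean"], s["median"], s["modes"], round(s["sd"], 2))

# Five-number-summary example: quartiles, fences, and any outliers.
scores = [15, 19, 25, 30, 33, 45, 52, 56, 59, 62]
low, high = inner_fences(scores)
print(quartiles(scores), (low, high), [x for x in scores if x < low or x > high])

# z-scores of 7 and 11 in the dataset 3 7 11 4 5.
print(round(z_score(7, [3, 7, 11, 4, 5]), 3), round(z_score(11, [3, 7, 11, 4, 5]), 3))
```

Running the loop shows the shift and scale properties directly: datasets a and b report the same range and SD, while every summary for c is twice the value for a. Because the code doubles the unrounded SD, it reports about 7.25 for dataset c rather than the 7.26 obtained by doubling the rounded value 3.63.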
