Rutgers - STAT 960 - Class Notes - Week 2

Stats 401 Week 2

Relative frequency = How many times a value appeared in the category ÷ Total number of items in the data set (like finding a percent without multiplying by 100 at the end)
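A minimal sketch of the relative frequency calculation in Python. The list of colors is made up for illustration, and the function name `relative_frequencies` is hypothetical.

```python
from collections import Counter

def relative_frequencies(items):
    """Return {category: count in category / total number of items}."""
    counts = Counter(items)
    total = len(items)
    return {category: count / total for category, count in counts.items()}

# Hypothetical survey answers, made up for illustration.
colors = ["red", "blue", "red", "green", "red", "blue", "red", "blue"]
freqs = relative_frequencies(colors)
print(freqs)  # red appears 4 times out of 8, so its relative frequency is 0.5
```

The relative frequencies of all categories always add up to 1 (or 100% once multiplied by 100).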

For the median, if there is an even number of values, add the two middle terms and then divide by 2.

EX: 1 2 3 4 5 6. The two middle terms are 3 and 4 so 3 + 4 = 7. Then 7/2 = 3.5 (finding the average of those numbers).

If there isn’t a mode (1 2 3 4 5 6) then write “NO MODE”, not “The mode is zero”, because the latter implies that the number 0 is in the data and appears the most often.
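The median and mode rules can be sketched in Python as follows. The helper `modes` is a hypothetical name, and treating "no value repeats" as "NO MODE" is an assumption based on the example in these notes.

```python
from collections import Counter
from statistics import median

def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:  # every value appears once: report "NO MODE"
        return []
    return [v for v, c in counts.items() if c == highest]

data = [1, 2, 3, 4, 5, 6]
print(median(data))              # average of the two middle terms, 3 and 4
print(modes(data) or "NO MODE")  # no value repeats
print(modes([1, 2, 2, 3, 0, 0, 0]))  # here 0 really is the mode
```

The last line shows why "The mode is zero" is a different statement from "NO MODE": it is only true when 0 is in the data and appears the most.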

Peak of curve is in the middle

When the mean = median = mode the distribution is bell shaped or symmetric.

If median > mean, then the distribution is left skewed.

If median < mean, then the distribution is right skewed.

APPROPRIATE MEASURE OF CENTER

1. You have to check to see if it is Categorical or Numerical

a. Categorical

i. Use MODE

b. Numerical

i. Check to see if there are extremes (numbers that are very far from the rest of the data; they may or may not be outliers).

∙ If Yes then use Median

∙ If No then use the mean.

You don’t use the mode to find the center of numerical values because it doesn’t take into account all of the values provided. When there are extreme observations, you don’t use the mean because the average gets pulled toward the extreme (which can be a really high or really low number).
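A small illustration of how one extreme value pulls the mean but barely moves the median. The numbers are hypothetical, made up for this sketch.

```python
from statistics import mean, median

# Hypothetical values; 200 plays the role of an extreme observation.
typical = [20, 22, 23, 25]
with_extreme = typical + [200]

print(mean(typical), median(typical))            # mean and median agree
print(mean(with_extreme), median(with_extreme))  # mean is pulled toward 200
```

With the extreme value included, the mean is larger than the median, which also matches the rule for a right-skewed distribution.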

A percentile tells you where you are with respect to others. If a person is in the 80th percentile, then 80% of people are below them and they are in the top 20%. Percentile is measured out of 100 because it is a percent.
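A minimal sketch of a percentile calculation, assuming "percentile" means the percent of values strictly below a given score (textbooks and software use slightly different definitions). The scores and the name `percentile_rank` are hypothetical.

```python
def percentile_rank(values, x):
    """Percent of the values that fall strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical test scores, made up for illustration.
scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
rank = percentile_rank(scores, 90)
print(rank)        # 8 of the 10 scores are below 90
print(100 - rank)  # share of scores at or above 90
```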

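QUARTILES

The quartiles Q1, Q2, and Q3 split the ordered data into four parts, and each part holds about 25% of the data.

Q2 is the median. The lower half of the data lies below Q2 and the upper half lies above it.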

From the median, split the data in two: a lower half and an upper half. Q1 is the median of the lower half and Q3 is the median of the upper half.

Ex: 15, 18, 56, 78, 26, 43, 29

First rearrange the data from lowest to highest: 15, 18, 26, 29, 43, 56, 78. Then find the median: 29. The lower half is (15, 18, 26) and its median is 18. The upper half is (43, 56, 78) and its median is 56. So Q1 = 18, Q2 = 29, and Q3 = 56.
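The quartile steps can be sketched in Python, taking Q1 and Q3 as the medians of the lower and upper halves. Leaving the median itself out of both halves when the count is odd is an assumption (software packages use several slightly different conventions), and the function name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)          # step 1: lowest to highest
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    # For an odd count, skip the middle value so it is in neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

q1, q2, q3 = quartiles([15, 18, 56, 78, 26, 43, 29])
print(q1, q2, q3)
```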

Five Number Summary

(Min, Q1, Q2, Q3, Max), in that order. These five numbers are used to make a box plot (box-and-whisker plot).

1. Scale an even number line.

2. Plot Q1, Q2, Q3 (from above), max, and min.

3. Draw vertical lines through Q1, Q2, and Q3.

4. Connect those lines to make a box.

5. Draw horizontal lines from the box out to the max and the min.
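A short sketch of the five number summary for the example data 15, 18, 26, 29, 43, 56, 78. It uses `statistics.quantiles` from the Python standard library, which applies its own interpolation rule; that rule may not match a hand calculation on every data set, so treat the choice as an assumption. The function name `five_number_summary` is hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max), in the order used for a box plot."""
    q1, q2, q3 = quantiles(values, n=4)  # three cut points = the quartiles
    return min(values), q1, q2, q3, max(values)

summary = five_number_summary([15, 18, 56, 78, 26, 43, 29])
print(summary)
```

The box runs from Q1 to Q3 with a line at Q2, and the whiskers run out to the min and the max.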

[Figure: box plot drawn above a number line scaled from 0 to 80 in steps of 10]

Measures of Variation (spread of data)

Range is Max – Min

Standard Deviation

σ = population standard deviation (s = sample standard deviation)

Outliers: How to Find an Outlier

Using the example data 15, 18, 26, 29, 43, 56, 78, with Q1 = 18 (median of the lower half) and Q3 = 56 (median of the upper half):

Inter Quartile Range (IQR) is Q3 – Q1: 56 – 18 = 38

Lower Limit = Q1 – ((1.5)(IQR)): 18 – ((1.5)(38)) = -39

Upper Limit = Q3 + ((1.5)(IQR)): 56 + ((1.5)(38)) = 113

Any value below the lower limit or above the upper limit is an outlier. Here the min (15) and the max (78) are both inside the limits, so this data set has no outliers.
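The outlier check can be sketched as below, assuming Q1 = 18 and Q3 = 56 (the medians of the lower and upper halves of 15, 18, 26, 29, 43, 56, 78). The function name `outlier_limits` and the extra value 150 are hypothetical.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [15, 18, 26, 29, 43, 56, 78]
lower, upper = outlier_limits(18, 56)
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper)
print(outliers)  # empty list: every value is inside the limits

# A hypothetical extra value of 150 would fall above the upper limit.
flagged = [x for x in data + [150] if x < lower or x > upper]
print(flagged)
```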

Standard Deviation

Variance

There are two equations used to find Variance

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$

Sample variance = the sum of (x minus mean)² divided by (# of units minus 1)

| x | (x − x̄) | (x − x̄)² |
|---|---|---|
| 1 | -1 | 1 |
| 2 | 0 | 0 |
| 3 | 1 | 1 |
| Total: 6 | 0 | 2 |

1. Add all the numbers in the x column.

2. Find the mean, which is 6/3 = 2.

3. Fill in the next column by subtracting the mean (2) from x.

4. Square your answer from the previous column.

5. Plug the total of the last column (2) into the numerator of the equation, then divide by 2, because the denominator is the # of units minus 1 (3 − 1 = 2).

6. So s² = 2/2 = 1, and s = 1 because the square root of 1 is just 1.

The second equation:

$$ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} $$

Sample variance = the sum of (x squared) minus ((the sum of x)² divided by # of units), then all divided by (# of units minus 1).

| x | x² |
|---|---|
| 2 | 4 |
| 4 | 16 |
| 6 | 36 |
| 8 | 64 |
| 10 | 100 |
| Total: 30 | 220 |

1. Add all of x (total 30).

2. Add all of x² (total 220).

3. Plug in, with # of units = 5:

s² = (220 − (30)²/5) / (5 − 1) -> s² = (220 – 180) / 4 -> s² = 40 / 4 = 10 -> s (sample standard deviation) = √10 ≈ 3.162
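Both variance equations can be checked against each other with a short Python sketch. The function names are hypothetical; the two data sets are the ones from the tables in these notes.

```python
from math import sqrt

def variance_definition(xs):
    """Sum of (x - mean)^2, divided by (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_shortcut(xs):
    """(Sum of x^2 minus (sum of x)^2 / n), divided by (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

first = [1, 2, 3]
second = [2, 4, 6, 8, 10]
print(variance_definition(first), variance_shortcut(first))
print(variance_definition(second), variance_shortcut(second))
print(round(sqrt(variance_shortcut(second)), 3))  # sample standard deviation
```

The two equations always give the same variance; the second one just avoids computing each deviation from the mean.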

Interpretation of Data using mean and Standard Deviation

Empirical Rule: for a bell-shaped distribution

1. About 68% of data lie within one standard deviation to either side of the mean.

2. About 95% of data lie within two standard deviations to either side of the mean.

3. About 99.7% of data lie within three standard deviations to either side of the mean.

(https://saylordotorg.github.io/text_introductory-statistics/s06-05-the-empirical-rule-and-chebysh.html)
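A minimal sketch of the Empirical Rule intervals. The mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration, and the rule only applies when the distribution is roughly bell shaped.

```python
def empirical_intervals(mean, sd):
    """Map each approximate percent to its (low, high) interval."""
    return {
        pct: (mean - k * sd, mean + k * sd)
        for k, pct in ((1, 68), (2, 95), (3, 99.7))
    }

# Hypothetical bell-shaped data with mean 100 and standard deviation 15.
intervals = empirical_intervals(100, 15)
for pct, (low, high) in intervals.items():
    print(f"About {pct}% of data lie between {low} and {high}")
```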

Chebyshev’s Theorem

For any numerical data (the distribution does not have to be bell shaped)

At least 75% of the data lie within two standard deviations to either side of the mean.

At least 89% of the data lie within three standard deviations to either side of the mean.

A z-score is a measure of how many standard deviations a raw score is above or below the population mean.
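The 75% and 89% figures come from the general Chebyshev bound 1 − 1/k² for k standard deviations, which is not written out in these notes but is the standard statement of the theorem. The sketch below also computes a z-score as (raw score − population mean) ÷ population standard deviation. The function names and the example numbers (mean 100, standard deviation 15) are hypothetical.

```python
def chebyshev_bound(k):
    """Minimum share of data within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2

def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(chebyshev_bound(2))            # at least 75%
print(round(chebyshev_bound(3), 2))  # at least about 89%
print(z_score(130, 100, 15))         # 130 is two standard deviations above 100
print(z_score(85, 100, 15))          # 85 is one standard deviation below 100
```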