Stats 401 Week 2
Relative frequency = Number of times a category appears ÷ Total number of items in the data set (like finding a percent without multiplying by 100 at the end)
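A small sketch of the relative frequency calculation. The function name `relative_frequencies` and the color data are made up for illustration; they are not from the course.

```python
from collections import Counter

def relative_frequencies(items):
    """Return {category: count of that category / total number of items}."""
    counts = Counter(items)
    total = len(items)
    return {category: count / total for category, count in counts.items()}

# Hypothetical survey answers, invented for this example.
colors = ["red", "blue", "red", "green", "red", "blue", "red", "blue"]
freqs = relative_frequencies(colors)
print(freqs)  # red appears 4 times out of 8, so its relative frequency is 0.5
```

The relative frequencies of all categories always add up to 1.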
For the median, if there is an even number of values, add the two middle terms and then divide by 2.
EX: 1 2 3 4 5 6. The two middle terms are 3 and 4, so 3 + 4 = 7. Then 7/2 = 3.5 (the average of those two numbers).
If there isn’t a mode (1 2 3 4 5 6), write “NO MODE”, not “The mode is zero”, because that would imply that the number 0 is in the data and appears the most.
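The median and mode examples above can be checked with a short sketch. `mode_or_none` is a made-up helper, and treating "every value appears equally often" as the no-mode case is an assumption about the convention used in class.

```python
from collections import Counter
from statistics import median

def mode_or_none(values):
    """Return the most frequent value(s), or None when every value
    appears equally often (the "NO MODE" case)."""
    counts = Counter(values)
    highest = max(counts.values())
    if all(count == highest for count in counts.values()):
        return None
    return [value for value, count in counts.items() if count == highest]

data = [1, 2, 3, 4, 5, 6]
print(median(data))                # 3.5, the average of 3 and 4
print(mode_or_none(data))          # None, so report "NO MODE"
print(mode_or_none([1, 2, 2, 3]))  # [2]
```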
When the mean = median = mode, the distribution is symmetric (bell shaped) and the peak of the curve is in the middle.
If median > mean, the distribution is left skewed.
If median < mean, the distribution is right skewed.
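A rough sketch of the mean versus median comparison. The function name `skew_direction` and the sample lists are invented for illustration, and the rule is only a rule of thumb.

```python
from statistics import mean, median

def skew_direction(values):
    """Compare median and mean to guess the direction of skew."""
    m = mean(values)
    med = median(values)
    if med > m:
        return "left skewed"
    if med < m:
        return "right skewed"
    # Exact equality is rare with real data; this branch is for tidy examples.
    return "symmetric"

print(skew_direction([1, 2, 3, 4, 100]))     # right skewed (mean 22, median 3)
print(skew_direction([1, 97, 98, 99, 100]))  # left skewed (mean 79, median 98)
print(skew_direction([1, 2, 3, 4, 5]))       # symmetric (mean 3, median 3)
```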
APPROPRIATE MEASURE OF CENTER
1. Check whether the data are Categorical or Numerical.
i. If Categorical: use the MODE.
ii. If Numerical: check to see if there are extremes (numbers that are very far from the rest of the data; they may or may not be outliers).
∙ If Yes then use the Median.
∙ If No then use the Mean.
You don’t use the mode to find the center of numerical values because it doesn’t take into account all of the values provided. With extreme observations you don’t use the mean because the average is pulled toward the extreme (which can be a really high or really low number).
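The decision steps above can be written as a small sketch. The function `measure_of_center` and its `categorical` and `has_extremes` flags are hypothetical; deciding whether extremes are present is left to the person looking at the data.

```python
from statistics import mean, median, multimode

def measure_of_center(values, categorical, has_extremes=False):
    """Pick the measure of center following the checklist:
    categorical -> mode, numerical with extremes -> median, otherwise mean."""
    if categorical:
        return ("mode", multimode(values))
    if has_extremes:
        return ("median", median(values))
    return ("mean", mean(values))

# All data below is invented for illustration.
print(measure_of_center(["cat", "dog", "cat"], categorical=True))
print(measure_of_center([30, 32, 35, 31, 400], categorical=False, has_extremes=True))
print(measure_of_center([2, 4, 6], categorical=False))
```

In the second call the value 400 would drag the mean up to 105.6, while the median stays at 32.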
A percentile tells you where you stand relative to others. If a person is in the 80th percentile, then 80% of people are below them and they are in the top 20%. Percentiles are measured out of 100 because they are percents.
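A minimal sketch of the percentile idea, assuming "percentile" means the percent of values strictly below a given value (other conventions exist). The function name `percentile_rank` and the scores are made up.

```python
def percentile_rank(scores, value):
    """Percent of scores that are strictly below `value`."""
    below = sum(1 for score in scores if score < value)
    return 100 * below / len(scores)

# Ten hypothetical test scores.
scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
rank = percentile_rank(scores, 90)
print(rank)        # 80.0: eight of the ten scores are below 90
print(100 - rank)  # 20.0: a score of 90 is in the top 20%
```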
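QUARTILES
The quartiles Q1, Q2, and Q3 split the ordered data into four parts, and each part holds 25% of the data.
Q2 is the median. The lower half of the data lies below Q2 and the upper half lies above Q2.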
Split the ordered data at the median into a lower half and an upper half. Q1 is the median of the lower half and Q3 is the median of the upper half.
Ex: 15, 18, 56, 78, 26, 43, 29
First rearrange the data from lowest to highest: 15, 18, 26, 29, 43, 56, 78. Then find the median: 29. The lower half is (15, 18, 26) and its median is 18. The upper half is (43, 56, 78) and its median is 56. So Q1 = 18, Q2 = 29, and Q3 = 56.
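A minimal sketch of the quartile calculation on the example data, assuming the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves, with the middle value left out of both halves when the count is odd. The function name `quartiles` is made up; software packages may use slightly different quartile rules.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

data = [15, 18, 56, 78, 26, 43, 29]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # 18 29 56
```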
Five Number Summary
(Min, Q1, Q2, Q3, Max), in that order. These five numbers are used to make a box plot (box-and-whisker plot).
1st, draw an evenly scaled number line. 2nd, plot the min, Q1, Q2, Q3 (from above), and max. Then draw vertical lines through Q1, Q2, and Q3 and connect those lines to make a box. Finally, draw horizontal lines (the whiskers) from the box out to the min and the max.
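The drawing steps can be imitated with a rough text sketch. The helper `text_box_plot` is invented for illustration, and the five numbers passed in are assumed to be Min = 15, Q1 = 18, Q2 = 29, Q3 = 56, Max = 78 for the example data (quartiles taken as medians of the halves).

```python
def text_box_plot(summary, low=0, high=80, width=41):
    """Draw a one-line box plot: '-' for whiskers, '=' for the box,
    '|' at each of the five summary numbers."""
    minimum, q1, q2, q3, maximum = summary

    def column(value):
        # Map a data value onto a character position along the number line.
        return round((value - low) / (high - low) * (width - 1))

    row = [" "] * width
    for i in range(column(minimum), column(maximum) + 1):
        row[i] = "-"
    for i in range(column(q1), column(q3) + 1):
        row[i] = "="
    for value in summary:
        row[column(value)] = "|"
    return "".join(row)

summary = (15, 18, 29, 56, 78)  # assumed (Min, Q1, Q2, Q3, Max)
plot = text_box_plot(summary)
print(plot)
print("0" + " " * 38 + "80")    # ends of the number line
```

Each character stands for 2 units on the number line, so the picture is only approximate.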
[Figure: box plot of the example data drawn on a number line scaled from 0 to 80]
Measures of Variation (spread of data)
Range is Max – Min
σ = population standard deviation; s = sample standard deviation
Outliers- How to Find Outlier
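Inter Quartile Range (IQR) is Q3 – Q1. For the data 15, 18, 26, 29, 43, 56, 78, Q1 = 18 and Q3 = 56 (the medians of the lower half 15, 18, 26 and the upper half 43, 56, 78), so IQR = 56 – 18 = 38
Lower Limit = Q1 – ((1.5)(IQR)): 18 – ((1.5)(38)) = 18 – 57 = -39
Upper Limit = Q3 + ((1.5)(IQR)): 56 + ((1.5)(38)) = 56 + 57 = 113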
Any value below the lower limit or above the upper limit is an outlier. Here the min (15) and the max (78) are both inside the limits, so there are no outliers.
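A short sketch of the 1.5 × IQR outlier check. The function names are made up, and Q1 = 18 and Q3 = 56 are assumed for the example data (medians of the lower and upper halves).

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """List every value that falls outside the limits."""
    lower, upper = outlier_limits(q1, q3)
    return [v for v in values if v < lower or v > upper]

data = [15, 18, 26, 29, 43, 56, 78]
print(outlier_limits(18, 56))       # (-39.0, 113.0)
print(find_outliers(data, 18, 56))  # [] because nothing is outside the limits

# Hypothetical extra value, checked against the same limits for illustration.
print(find_outliers(data + [150], 18, 56))  # [150]
```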
There are two equations used to find the sample variance (s²). The definition form is:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample variance = the sum of (x minus mean)² divided by (# of units minus 1)
1. Add all the numbers in the x column (in this example the sum is 6).
2. Find the mean, which is 6/3 = 2.
3. Fill in the next column by subtracting the mean (2) from each x.
4. Square each answer from the previous column.
5. Add up the squared column (the total is 2), plug that (2) into the numerator of the equation, and then divide by 2, because the denominator says to divide by # of units minus 1 (3 – 1 = 2).
6. So s² = 1, and then s = 1 because the square root of one is just 1.
The shortcut form is:
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Sample variance = the sum of (x squared) minus ((the sum of x)² divided by # of units), then all divided by (# of units minus 1).
1. Add all of x (here ∑x = 30).
2. Add all of x² (here ∑x² = 220).
3. Plug in, with 5 units:
$s^2 = \frac{220 - \frac{(30)^2}{5}}{5 - 1} = \frac{220 - 180}{4} = \frac{40}{4} = 10$
So s² (sample variance) = 10 and s (sample standard deviation) = √10 ≈ 3.162
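Both variance equations can be checked against each other with a short sketch. The notes give only the totals, so the data lists below are assumptions chosen to match them: 1, 2, 3 has a sum of 6, and 2, 4, 6, 8, 10 has ∑x = 30 and ∑x² = 220. The function names are made up.

```python
from math import sqrt

def variance_definition(xs):
    """s^2 = sum of (x - mean)^2, divided by (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_shortcut(xs):
    """s^2 = (sum of x^2 - (sum of x)^2 / n), divided by (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

small = [1, 2, 3]          # assumed data: sum = 6, n = 3
large = [2, 4, 6, 8, 10]   # assumed data: sum = 30, sum of squares = 220

print(variance_definition(small), variance_shortcut(small))  # 1.0 1.0
print(variance_definition(large), variance_shortcut(large))  # 10.0 10.0
print(round(sqrt(variance_shortcut(large)), 3))              # 3.162
```

The two forms always give the same answer; the shortcut form just avoids building the (x minus mean) column.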
Interpretation of Data using mean and Standard Deviation
Empirical Rule - For a bell shaped distribution
1. About 68% of the data lie within one standard deviation to either side of the mean.
2. About 95% of the data lie within two standard deviations to either side of the mean.
3. About 99.7% of the data lie within three standard deviations to either side of the mean.
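A quick sketch that turns the Empirical Rule into actual ranges. The function name `empirical_rule_ranges` and the example mean of 100 with standard deviation 15 are hypothetical.

```python
def empirical_rule_ranges(mean, sd):
    """Return (percent label, low, high) for 1, 2, and 3 standard deviations."""
    rules = [(1, "about 68%"), (2, "about 95%"), (3, "about 99.7%")]
    return [(label, mean - k * sd, mean + k * sd) for k, label in rules]

# Hypothetical bell shaped data with mean 100 and standard deviation 15.
for label, low, high in empirical_rule_ranges(100, 15):
    print(label, "of the data lie between", low, "and", high)
```

For this made-up example the ranges are 85 to 115, 70 to 130, and 55 to 145.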
For any numerical data, whatever the shape of the distribution (Chebyshev's rule)
1. At least 75% of the data lie within two standard deviations to either side of the mean.
2. At least 89% of the data lie within three standard deviations to either side of the mean.
Z score is a measure of how many standard deviations a raw score is above or below the population mean.
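The 75% and 89% figures come from the formula 1 − 1/k², where k is the number of standard deviations, and a z score is (raw score − mean) ÷ standard deviation. A minimal sketch follows; the function names and the example numbers are made up.

```python
def chebyshev_minimum(k):
    """Smallest fraction of the data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only useful for k greater than 1")
    return 1 - 1 / k ** 2

def z_score(x, mean, sd):
    """How many standard deviations x is above (+) or below (-) the mean."""
    return (x - mean) / sd

print(chebyshev_minimum(2))            # 0.75, so at least 75%
print(round(chebyshev_minimum(3), 2))  # 0.89, so at least 89%

# Hypothetical population with mean 100 and standard deviation 15.
print(z_score(130, 100, 15))  # 2.0, two standard deviations above the mean
print(z_score(85, 100, 15))   # -1.0, one standard deviation below the mean
```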