# Stats 401, Week 2 Notes 01:960:401

These class notes were uploaded by Chelsea Notetaker on Friday, September 16, 2016. They belong to 01:960:401 (Basic Statistics for Research) at Rutgers University, taught by Michael Miniere in Fall 2016.

Date Created: 09/16/16

Relative frequency = number of times a category appears ÷ total number of items (like finding a percent without multiplying by 100 at the end).

For the median, if there is an even number of values, add the two middle terms and divide by 2. EX: 1 2 3 4 5 6. The two middle terms are 3 and 4, so 3 + 4 = 7. Then 7/2 = 3.5 (the average of those two numbers).

If there isn't a mode (1 2 3 4 5 6), write "NO MODE", not "The mode is zero", because that implies the number 0 is in the data and appears the most.

When mean = median = mode, the distribution is symmetric (bell shaped) and the peak of the curve is in the middle.

- Median > mean: left skewed
- Median < mean: right skewed

## APPROPRIATE MEASURE OF CENTER

1. Check whether the data is categorical or numerical.
   a. Categorical
      i. Use the MODE.
   b. Numerical
      i. Check whether there are extremes (numbers that are very far from the rest of the data; they may or may not be outliers). If yes, use the median. If no, use the mean.

Don't use the mode as the center of numerical values, because it doesn't take into account all of the values provided. Don't use the mean when there are extreme observations, because the average floats toward the extreme (which can be a really high or really low number).

A percentile tells you where you stand with respect to others. If a person is in the 80th percentile, then 80% of people are below them and they are in the top 20%. Percentiles are measured out of 100 because they are percents.

## QUARTILES

Each quartile marks off 25% of the data: Q1, Q2, Q3. Q2 is the median. The median splits the data in two, a lower half and an upper half. Q1 is the median of the lower half and Q3 is the median of the upper half.

Ex: 15, 18, 56, 78, 26, 43, 29

1. Rearrange the data from lowest to highest: 15, 18, 26, 29, 43, 56, 78.
2. Find the median: 29.
3. Lower half (15, 18, 26): the median is 18. Upper half (43, 56, 78): the median is 56.

So Q1 = 18, Q2 = 29, and Q3 = 56.

## Five Number Summary

(Min, Q1, Q2, Q3, Max), in that order. For the example above: (15, 18, 29, 56, 78). Use it to make a box plot (box and whisker plot):

1. Scale an even number line (for this example, 0 to 80 in steps of 10).
2. Plot Q1, Q2, Q3 (from above), max, and min.
3. Draw vertical lines through Q1, Q2, and Q3, then connect those lines to make a box.
4. Finally, draw horizontal lines from the box out to the max and the min.

## Measures of Variation (spread of data)

Range = Max − Min

Standard deviation: σ is for a population; s is for a sample.

## Outliers: How to Find an Outlier

Interquartile range (IQR) = Q3 − Q1: 56 − 18 = 38

Lower limit = Q1 − (1.5)(IQR): 18 − (1.5)(38) = −39

Upper limit = Q3 + (1.5)(IQR): 56 + (1.5)(38) = 113

If the max or min goes past these limits, it is an outlier. Here the min (15) and the max (78) are both inside the limits, so there are no outliers.

## Standard Deviation and Variance

There are two equations used to find the variance.

First equation:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Sample variance = the sum of (x minus mean)² divided by (number of units minus 1).

| x | x − x̄ | (x − x̄)² |
|---|---|---|
| 1 | −1 | 1 |
| 2 | 0 | 0 |
| 3 | 1 | 1 |
| Total: 6 | 0 | 2 |

1. Add all the numbers in the x column (total 6).
2. Find the mean, which is 6/3 = 2.
3. Fill in the next column by subtracting the mean (2) from x.
4. Square your answer from the previous column.
5. Plug the total of the last column (2) into the numerator of the equation, then divide by 2, because the denominator says to divide by the number of units minus 1.
6. So s² = 1, and then s = 1, because the square root of 1 is just 1.

Second equation:

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

Sample variance = the sum of (x squared) minus ((the sum of x)² divided by the number of units), then all divided by (number of units minus 1).

| x | x² |
|---|---|
| 2 | 4 |
| 4 | 16 |
| 6 | 36 |
| 8 | 64 |
| 10 | 100 |
| Total: 30 | 220 |

1. Add all of x (total 30).
2. Add all of x² (total 220).
3. Plug in:

$$s^2 = \frac{220 - \frac{(30)^2}{5}}{5 - 1} = \frac{220 - 180}{4} = 10, \qquad s = \sqrt{10} \approx 3.162$$

Here s² = 10 is the sample variance and s = 3.162 is the sample standard deviation.

## Interpretation of Data using Mean and Standard Deviation

Empirical Rule: for a bell shaped distribution

1. About 68% of the data lie within one standard deviation to either side of the mean.
2. About 95% of the data lie within two standard deviations to either side of the mean.
3. About 99.7% of the data lie within three standard deviations to either side of the mean.

(https://saylordotorg.github.io/text_introductory-statistics/s06-05-the-empirical-rule-and-chebysh.html)

Chebyshev's Rule: for any numerical data

- At least 75% of the data lie within two standard deviations to either side of the mean.
- At least 89% of the data lie within three standard deviations to either side of the mean.

A z score is a measure of how many standard deviations above or below the population mean a raw score is.
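The quartile, five number summary, and outlier steps can be checked with a short Python sketch. It uses the example data from the notes (15, 18, 56, 78, 26, 43, 29) and assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves, with the middle value left out of both halves when the count is odd. The function names `five_number_summary` and `outlier_limits` are made up for this illustration.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves convention."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]          # values below the median position
    upper = values[half + n % 2:]  # skip the middle value when n is odd
    return values[0], median(lower), median(values), median(upper), values[-1]


def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [15, 18, 56, 78, 26, 43, 29]
low, q1, q2, q3, high = five_number_summary(data)
lower_limit, upper_limit = outlier_limits(q1, q3)
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print("Five number summary:", (low, q1, q2, q3, high))
print("Limits:", (lower_limit, upper_limit))
print("Outliers:", outliers)
```

Other conventions for quartiles exist (software packages often interpolate), so results from a calculator or spreadsheet may differ slightly from this hand method.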

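The two sample variance equations should always give the same answer. The following Python sketch checks that on the two small data sets from the notes (1, 2, 3 and 2, 4, 6, 8, 10). The function names are made up for this illustration, and both functions assume sample data with at least two values.

```python
from math import sqrt


def variance_definition(xs):
    """s^2 = sum of (x - mean)^2, divided by (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """s^2 = (sum of x^2 - (sum of x)^2 / n), divided by (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


for xs in ([1, 2, 3], [2, 4, 6, 8, 10]):
    s2_a = variance_definition(xs)
    s2_b = variance_shortcut(xs)
    # Standard deviation is the square root of the variance.
    print(xs, "variance:", s2_a, s2_b, "std dev:", round(sqrt(s2_a), 3))
```

The second form is usually quicker by hand because it needs only the totals of the x and x² columns.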

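As a worked sketch of the last two ideas: the z score has a standard formula, and the 75% and 89% figures in Chebyshev's rule come from the bound 1 − 1/k² for k standard deviations. The numbers in the z score example (mean 70, standard deviation 10, raw score 85) are made up for illustration and are not from the class.

```latex
% z score: number of standard deviations a raw score x lies from the mean
z = \frac{x - \mu}{\sigma}
\qquad \text{e.g. } \mu = 70,\ \sigma = 10,\ x = 85
\ \Rightarrow\ z = \frac{85 - 70}{10} = 1.5

% Chebyshev's rule: at least 1 - 1/k^2 of the data lie within k standard deviations
k = 2:\quad 1 - \frac{1}{2^2} = \frac{3}{4} = 75\%
\qquad
k = 3:\quad 1 - \frac{1}{3^2} = \frac{8}{9} \approx 89\%
```

A positive z score means the raw score is above the mean, and a negative z score means it is below.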