Chapter 2 Notes
• Deviation—Given a value y and a data point x, the difference x – y represents how far x deviates from y.
o Ex. Given x=10 and y=5. 10 deviates by 5 from 5.
• Bar Charts—Display where the length of a bar corresponds to the frequency or number of observations in a category
• Pie Charts—Each slice is proportional to the amount in its category
• Relative Frequency—The proportion of observations in a class, calculated by relative frequency = # in the class / # total.
• Cumulative Frequency—Sum of frequencies of a particular class and all preceding classes.
• Cumulative Relative Frequency—Make the cumulative frequency relative.
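The three frequency definitions above can be sketched as a small table. This is a minimal illustration, assuming a made-up list of letter grades (the data and the variable names are not from the notes).

```python
from collections import Counter
from itertools import accumulate

# Hypothetical categorical data, made up for illustration.
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]

counts = Counter(grades)
classes = sorted(counts)                      # class order: A, B, C, D
freq = [counts[c] for c in classes]           # frequency of each class
total = sum(freq)

rel_freq = [f / total for f in freq]          # # in the class / # total
cum_freq = list(accumulate(freq))             # this class plus all preceding classes
cum_rel_freq = [c / total for c in cum_freq]  # cumulative frequency made relative

for row in zip(classes, freq, rel_freq, cum_freq, cum_rel_freq):
    print(row)
```

The last cumulative frequency always equals the total number of observations, and the last cumulative relative frequency always equals 1, which is a quick check on the table.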
• Histogram—A bar graph of frequency or relative frequency
o Algorithm
1. Determine the number of classes
2. Find the smallest and largest value
3. Class width = (largest value – smallest value) / number of classes
4. The first class usually contains the smallest value and starts at a multiple of the class width
5. Find the class boundaries, which are the midpoints between the upper limit of one class and the lower limit of the next
a. Each data point xi in a class satisfies: lower class boundary < xi < upper class boundary
6. Calculate the frequency or relative frequency of each class
7. Create a bar graph!
***Class boundaries are sometimes called cut points. Classes are sometimes called bins.
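The histogram steps above can be sketched in code. This is a rough illustration, not a definitive procedure: the data set is made up, the class width is rounded up to a whole number (an assumption, since the notes do not say how to round), and the data are assumed to be integers so that boundaries sit half a unit below each lower class limit.

```python
import math

# Hypothetical integer data, made up for illustration.
data = [10, 15, 21, 22, 25, 28, 31, 33, 34, 38, 41, 49]
num_classes = 4                                        # step 1: chosen by hand

smallest, largest = min(data), max(data)               # step 2
width = math.ceil((largest - smallest) / num_classes)  # step 3, rounded up (assumption)
start = (smallest // width) * width                    # step 4: multiple of the width

# Step 5: for integer data, each boundary is the midpoint between adjacent classes.
boundaries = [start - 0.5 + i * width for i in range(num_classes + 1)]

# Step 6: count the points with lower boundary < x < upper boundary.
freq = [sum(1 for x in data if lo < x < hi)
        for lo, hi in zip(boundaries, boundaries[1:])]
rel_freq = [f / len(data) for f in freq]

# Step 7: a plain-text bar graph, one row per class (bin).
for lo, hi, f in zip(boundaries, boundaries[1:], freq):
    print(f"{lo:>5} to {hi:<5} | {'#' * f}")
```

Because the boundaries fall between possible data values, no data point can land exactly on a cut point, so every point belongs to exactly one bin.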
• Ordered Array—List of all data points in order
o Rank order: increasing order
o Reverse rank order: decreasing order
• Dot Plot—Graph where each data point is a point above a horizontal axis (usually a number line). If multiple entries have the same value, they are stacked.
• Probability Distribution—Assigns a probability to a set of possible outcomes
• Symmetric Distribution—If one were to draw a line down the middle of the distribution, the two sides would mirror each other
• Skewed (asymmetrical) Distribution—Not symmetric; the observations are not distributed equally on both sides
1. Left Skewed—The left side (tail) is longer
2. Right Skewed—The right side (tail) is longer
• Unimodal—Distribution has exactly one “peak”
• Bimodal—Has exactly two “peaks”
• Multimodal—Has more than one “peak”
• Mode—The value that occurs most frequently, not necessarily unique
• Mean—The average
o Sample Mean—x̄ = (x1 + x2 + … + xn)/n, where xi is the ith data point in a sample of size n
o Population Mean—μ = (x1 + x2 + … + xN)/N, where N is the number of elements in the population. Defined this way for a finite population.
o Weighted Mean—Suppose the ith observation xi is given a weight wi. The weighted mean is (Σ wi·xi)/(Σ wi).
o Trimmed Mean—Ignores equal percentages of the highest and lowest data points.
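The mean variants above can be sketched as short functions. This is a minimal illustration: the function names and data are made up, the weighted mean uses the standard form (sum of weight times value, divided by the sum of the weights), and the trimmed mean rounds the number of trimmed points down, which is an assumption since conventions differ.

```python
def sample_mean(xs):
    """Sum of the data points divided by how many there are."""
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

def trimmed_mean(xs, percent):
    """Drop `percent` % of the points from each end, then average the rest."""
    ordered = sorted(xs)
    k = int(len(ordered) * percent / 100)   # points trimmed per end (rounded down)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical scores with one extreme value.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 95]
print(sample_mean(scores))                        # pulled upward by the 95
print(trimmed_mean(scores, 10))                   # ignores the 2 and the 95
print(weighted_mean([80, 90, 70], [2, 3, 5]))     # weights 2, 3, 5
```

Comparing the first two printed values shows why a trimmed mean is used: dropping the extreme points keeps a single unusual value from dominating the average.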
• Median—Data value in the center of an ordered list
o Ex. The data 1,3,4,6,3,4,5 in order is 1,3,3,4,4,5,6, so 4 is the median.
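The median example above, along with the mode, can be checked with Python's standard `statistics` module. A small sketch using the same seven data points:

```python
import statistics

data = [1, 3, 4, 6, 3, 4, 5]            # the example from the notes
ordered = sorted(data)                  # ordered array: [1, 3, 3, 4, 4, 5, 6]

median = statistics.median(data)        # middle value of the ordered list
modes = statistics.multimode(data)      # every value tied for most frequent

print(ordered, median, modes)

# With an even number of points, the median is the average of the two middle values.
print(statistics.median([1, 3, 4, 6]))
```

Here both 3 and 4 occur twice, which illustrates that the mode is not necessarily unique.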
• Outlier—Data points that are extremely small or large relative to the data set.
• Resistant—Statistics not affected by outliers are called resistant
• Range—The difference between the largest and smallest value
• Empirical Rule—Derived from a bell curve (normal distribution)
o One-sigma rule—68% of the data lives within 1 standard deviation of the mean
o Two-sigma rule—95% of the data lives within 2 standard deviations of the mean
o Three-sigma rule—99.7% of the data lives within 3 standard deviations of the mean
• Chebyshev’s Theorem—The proportion of any data set lying within k standard deviations of the mean is at least 1 – 1/k^2, for k > 1
• Percentiles—Given a set of data x1, …, xN, the Pth percentile is a value, say x, such that approximately P% of the data is less than or equal to x and (100 – P)% is greater
• Percentile of a Value—Percentile of x = (# of data pts. ≤ x) / (total # of data pts.) × 100
• Quartiles—The 25th, 50th, and 75th percentiles are the first, second, and third quartiles: Q1, Q2, Q3
• Interquartile Range (IQR)—Difference between the third and first quartiles: IQR = Q3 – Q1
• Outliers—A data point is considered an outlier if it is more than 1.5 times the IQR above Q3 or more than 1.5 times the IQR below Q1
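The percentile, quartile, and outlier definitions above can be sketched as follows. This is an illustration under stated assumptions: the data are made up, `percentile_of` is a hypothetical helper that counts the points less than or equal to x, and quartiles come from `statistics.quantiles` with `method="inclusive"`. Textbooks and software use several slightly different quartile conventions, so other tools may give slightly different Q1 and Q3 for the same data.

```python
import statistics

# Hypothetical data with one suspiciously large value.
data = [7, 3, 20, 4, 1, 5, 3, 6, 4]

def percentile_of(values, x):
    """Percent of data points that are less than or equal to x."""
    return 100 * sum(1 for v in values if v <= x) / len(values)

# Quartiles: the 25th, 50th, and 75th percentiles (one convention among several).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Outlier fences: 1.5 * IQR below Q1 and above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr)
print(lower_fence, upper_fence, outliers)
print(percentile_of(data, 5))
```

Only the value 20 falls outside the fences here, so it is the only point flagged as an outlier.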
• Z-score—The number of standard deviations that x is away from the mean: z = (x – mean) / standard deviation.
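Z-scores also give a convenient way to check Chebyshev's Theorem on a data set, since "within k standard deviations of the mean" is the same as |z| ≤ k. The sketch below assumes made-up data and uses the population standard deviation (`statistics.pstdev`); the notes do not say which standard deviation is intended, so that choice is an assumption.

```python
import statistics

# Hypothetical data, made up for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
sd = statistics.pstdev(data)            # population standard deviation (assumption)

z_scores = [(x - mean) / sd for x in data]

def proportion_within(k):
    """Fraction of the data lying within k standard deviations of the mean."""
    return sum(1 for z in z_scores if abs(z) <= k) / len(z_scores)

def chebyshev_bound(k):
    """Chebyshev's lower bound 1 - 1/k^2, valid for k > 1."""
    return 1 - 1 / k**2

for k in (1.5, 2, 3):
    print(k, proportion_within(k), chebyshev_bound(k))
```

The observed proportion is never below the Chebyshev bound, for any data set. The 68%, 95%, and 99.7% figures of the Empirical Rule are tighter, but they are only expected to hold for roughly bell-shaped data.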
• Mean of Grouped Data—Data points might be binned in classes
• Random Experiment—An activity or event where the outcome is uncertain
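For the grouped-data mean, the notes do not give a formula. A common approach, assumed here, is to treat every point in a class as if it sat at the class midpoint and then take a frequency-weighted mean of the midpoints. The classes and frequencies below are made up for illustration.

```python
# Hypothetical classes given as (lower limit, upper limit, frequency).
classes = [(10, 19, 2), (20, 29, 4), (30, 39, 4), (40, 49, 2)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]   # class midpoints
freqs = [f for _, _, f in classes]

# Approximate mean: sum of (midpoint * frequency) divided by the total frequency.
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)

print(midpoints, grouped_mean)
```

The result is only an approximation of the true mean, because the individual data values inside each class are not known.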
• Sample Space—The set of all distinct outcomes of an experiment
• Relative Frequency—Rel. freq. of A = # of times A occurs divided by the # of times the experiment is performed
• Set Theory—A set is a list without repeats
o Compound event: A combination of two or more events
o Union of Events: The union of A & B is the set of outcomes that are in A or B or both. Denoted A ∪ B
• Intersection—Intersection of events A & B is the set of all outcomes that are in both A & B.
• Complement of Event A—The set of all outcomes not in A
• Mutually Exclusive—The two sets A and B are mutually exclusive if they have no points in common
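The event operations above map directly onto Python's built-in `set` type. The sketch below assumes a single roll of a six-sided die as the random experiment; the events A, B, C and the list of rolls are made up for illustration.

```python
# Sample space for one roll of a six-sided die (illustrative choice).
sample_space = {1, 2, 3, 4, 5, 6}

A = {2, 4, 6}        # event: roll is even
B = {4, 5, 6}        # event: roll is greater than 3
C = {1, 3}           # event: roll is 1 or 3

union = A | B                          # outcomes in A or B or both
intersection = A & B                   # outcomes in both A and B
complement_A = sample_space - A        # outcomes not in A
mutually_exclusive = A.isdisjoint(C)   # True when A and C share no outcomes

# Relative frequency of A over a hypothetical record of ten rolls.
rolls = [2, 5, 6, 1, 4, 4, 3, 6, 2, 5]
rel_freq_A = sum(1 for r in rolls if r in A) / len(rolls)

print(union, intersection, complement_A, mutually_exclusive, rel_freq_A)
```

A set and its complement are always mutually exclusive, and their union is the whole sample space.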