MAT 221 Study Guide – Test 1

Syracuse


This Study Guide belongs to MAT 221 - M200 at Syracuse University taught by X. Au in Spring 2016.

Date Created: 02/12/16

Statistics – MAT 221

Statistics – the science of learning from data

2 main activities of statistics:
1. Estimating a characteristic of the population
2. Testing a hypothesis or claim about a population

Chapter 1 – Looking at Data

1.1 – Distribution

Ex 1) Amount spent on textbooks

Person                    1    2    3    4    5
Amount spent (dollars)    539  628  489  716  641

- this table is DATA
- each person is a CASE
- the person numbers (1, 2, 3, …) are LABELS
- amount spent is a VARIABLE
- the values (539, 628, 489, …) form a DISTRIBUTION
*Example 1 is a QUANTITATIVE VARIABLE
*Example 2 is a CATEGORICAL VARIABLE

1.2 – Displaying Distributions with Graphs

Categorical Data:
- Bar graph
- Pie chart
*Be aware of misleading graphs (scaling)
- To de-emphasize a difference, zoom out
- To emphasize a difference, zoom in
- Don't make 3D graphs

Quantitative Data:
- Stem plots
- Histograms

Outliers – observations (numbers) that lie outside the overall pattern of the distribution
Symmetric – you could draw a line down the middle and both sides look the same
Skewed Right – the right side of the histogram extends much farther than the left
- The outliers are on the right, the majority is on the left
Skewed Left – the left side of the histogram extends much farther than the right
- The outliers are on the left, the majority is on the right

1.3 – Describing Distributions with Numbers

Measures of Center:
1. Mean or Average – add all the numbers and divide by how many numbers (cases) there are
2. Median – the midpoint of the distribution (put the numbers in order first; if there is no single middle number, take the average of the two middle numbers)

Ex. 0, 1, 2, 3, 100
Median – 2
Mean – 21.2 (106 / 5)

*Median is resistant to (not affected by) outliers
*Mean is the "center of gravity"

Symmetric – Mean = Median
Skewed Right – Mean > Median
Skewed Left – Mean < Median

Measures of Spread: Quartiles
1. First Quartile (Q1) – the value with 25% of the data below it; the median of the lower half of the data
2. Third Quartile (Q3) – the value with 25% of the data above it; the median of the upper half of the data
*Exclude the overall median when splitting the data into halves

Interquartile Range (IQR) – the distance between the quartiles: $IQR = Q_3 - Q_1$
Five-number Summary – min, Q1, median, Q3, max
Boxplot – a picture of the five-number summary drawn above a number line
Rule of Thumb for Identifying Outliers: any number more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3

Standard Deviation (SD) – measures how far the observations are from their mean
*Affected by skewness and outliers
1. Calculate the mean (average), $\bar{x}$
2. Calculate the variance: $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$
3. Take the square root: $s = \sqrt{s^2}$
- SD is 0 when all the numbers have the same value; otherwise it is positive
- SD has the same unit of measurement as the original observations

1.4 – Density Curves and Normal Distributions

Density Curve – a smooth approximation of a histogram
- An estimate: you don't have to draw the histogram, just the smooth curve
- The total area under the curve is equal to 1, or 100%

         Density curve    Histogram
Mean     $\mu$            $\bar{x}$
SD       $\sigma$         $s$

- The median of a density curve is the point that divides the area under the curve in half
- The mean is the point at which the curve would balance if it were made of solid material

Normal Distributions – special symmetric, bell-shaped distributions whose density curve is completely determined by its mean ($\mu$) and SD ($\sigma$)

The 68-95-99.7% Rule for Normal Distributions:
- 68% of observations are within 1 SD of the mean
- 95% of observations are within 2 SD of the mean
- 99.7% of observations are within 3 SD of the mean

Z-score – the number of standard deviations that x is from the mean: $z = \frac{x - \mu}{\sigma}$
*When x > the mean, z is positive
*When x < the mean, z is negative

Chapter 2 – Looking at Data: Relationships

2.1 & 2.2 – Relationships & Scatterplots

Scatterplot – one axis represents each of the two variables, and each case is plotted as a point on the graph

Three Aspects of a Relationship:
1. Direction – positive or negative
   a. Positive: greater values of one variable tend to occur with greater values of the other variable (ex. house size and price)
   b. Negative: greater values of one variable tend to occur with smaller values of the other variable (ex. weight of cars and fuel efficiency)
2. Form – linear, curved, clusters, no pattern
3. Strength – how closely the points fit the form

No relationship – the variables are independent
Explanatory (independent) variable – the one that explains or influences changes in the other variable [x-axis]
Response (dependent) variable – the one that changes in response to the other variable [y-axis]

2.3 – Correlation

Correlation (coefficient) r – a numerical measure of the direction and strength of the linear relationship between 2 quantitative variables

Properties:
- The value of r ranges from -1 to 1
- The sign of r gives the direction of the relationship
- Closer to 1 or -1 means a strong relationship
- Closer to 0 means a weak relationship
- r has no unit of measure
- Correlation only describes linear relationships
- Not resistant: r is very sensitive to outliers

How to calculate:
- For each case in the sample we have a pair of values (x, y)
- Suppose there are n cases: $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$
[Image: formula for r, from Professor Xu's online notes: https://blackboard.syr.edu/bbcswebdav/pid-3995343-dt-content-rid-12064908_1/courses/35384.1162/Ch2Part2.pdf]

2.4 – Least-Squares Regression

Regression Line – a straight line that describes the relationship between the x and y variables
- The distinction between explanatory and response variables is important

Which line "best fits"?
- We need the line to be as close to all the points as possible
Residual – the vertical distance from a point to the line
Least-squares Regression Line – the unique line for which the sum of the squared vertical distances between the data points and the line is as small as possible
- A straight line is simply a picture of a relationship between two variables

Straight Line: y = (slope) x + (y-intercept)
- The y-intercept is where the line crosses the y-axis
- The slope tells us which way and by how much the line is tilted

Finding the equation of the regression line:
1. Find the slope ($b_1$): $b_1 = r \frac{s_y}{s_x}$, where r = correlation coefficient, $s_x$ = SD of the x-values, $s_y$ = SD of the y-values
2. Find the y-intercept ($b_0$): $b_0 = \bar{y} - b_1 \bar{x}$, where $\bar{y}$ = average of the y-values and $\bar{x}$ = average of the x-values
3. The equation is: $y = b_1 x + b_0$

Chapter 3 – Producing Data

3.1 & 3.2 – Sources of Data & Design of Experiments

Anecdotal Data – unusual individual cases, often from past experience, that we are tempted to draw conclusions from
- May not be representative of any larger group of cases
Available Data – data produced in the past that may help us
Population – the entire group of individuals we are studying
Sample – the part of the population we are studying and have data for
Parameter – a number describing a characteristic of the population
Statistic – a number describing a characteristic of a sample
Experimental units – the individuals in an experiment
- Called Subjects if they are human
- Treatment or factor: the "something" we do to a subject whose response is then measured

Observational Study – simply observing and recording data on individuals without influencing their responses
- Cannot establish cause-and-effect relationships
Experimental Study – deliberately giving individuals some treatment and recording their responses
Control – a situation where no treatment is given; serves as a reference mark/basis
Placebo – a fake treatment used to check that the results come from the actual treatment and not from the subjects' belief that they are being treated
Ronald Fisher (1890 – 1962) – randomized comparative experiments; fertilizer

Principles of Experimental Design:
1. Control the effects of lurking variables
2. Randomize
3. Replicate the treatment on enough subjects to reduce chance variation in the results

Biased – systematically favoring certain outcomes
- Random assignment is the best way to avoid bias
Blind Experiment – one in which the subjects do not know which treatment they are getting until the experiment is completed
Double-Blind Experiment – neither the subjects nor the experimenter know who has the treatment until the experiment is over

3.3 – Sampling Design
- Nonresponse: we don't always get a response from everyone in our sample
- Response bias: people don't always respond truthfully
- Wording effects: the way a question is worded may influence the response

3.4 – Toward Statistical Inference

Statistical Inference – the process of drawing conclusions about a population from data obtained from a sample
Sampling Variability – every time we take a random sample from a population we are likely to get a different set of individuals and different statistics
Sampling Distribution – the distribution of a statistic's values obtained by repeating the study many times with the same sample size
- The larger the sample size, the lower the sampling variability
- The better the data-collecting technique, the lower the bias

Exam covers Chapters 1, 2, and 3
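
Worked example for Section 1.3: the five-number summary and the 1.5 × IQR rule can be checked with a short Python sketch. The data set below is made up for illustration (it is not from the course), and the quartiles use the method from these notes: the median of each half, with the overall median excluded. Other textbooks and software may use slightly different quartile rules.

```python
from statistics import mean, median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles taken as the
    medians of the lower and upper halves (overall median excluded)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the middle value when forming the upper half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return min(data), median(lower), median(data), median(upper), max(data)

def outliers_by_iqr(values):
    """Flag values more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical data, chosen so that one value is clearly unusual.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 50]

print(five_number_summary(sample))   # (1, 2.5, 5, 7.5, 50)
print(outliers_by_iqr(sample))       # [50]

# The mean is pulled toward the outlier; the median is not.
print(mean(sample), median(sample))
```

With this sample the IQR is 7.5 - 2.5 = 5, so the fences sit at -5 and 15, and only 50 falls outside them.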

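
Worked example for Section 1.4: a minimal sketch of the z-score formula and a numerical check of the 68-95-99.7% rule, using Python's standard-library `statistics.NormalDist`. The mean of 100 and SD of 15 are hypothetical values picked only for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def share_within(k):
    """Share of a Normal distribution within k SDs of its mean."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Hypothetical Normal distribution: mean 100, SD 15.
mu, sigma = 100, 15

print(z_score(130, mu, sigma))   # 2.0  (x above the mean, z positive)
print(z_score(85, mu, sigma))    # -1.0 (x below the mean, z negative)

# Roughly 0.68, 0.95, and 0.997 for k = 1, 2, 3.
for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

Because z-scores put every Normal distribution on the same scale, the shares printed by the loop do not depend on the particular mean and SD chosen.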

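
Worked example for Sections 2.3 and 2.4: the sketch below computes r and then the regression line using the slope and intercept steps from Section 2.4. The formula for r in the notes was an image that did not survive, so this assumes the common textbook form, r = (1 / (n - 1)) × the sum of (standardized x) × (standardized y); the (x, y) pairs are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: average product of standardized x and y values,
    dividing by n - 1 (assumed textbook formula)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

def regression_line(xs, ys):
    """Return (slope b1, intercept b0) of the least-squares line."""
    r = correlation(xs, ys)
    b1 = r * stdev(ys) / stdev(xs)       # step 1: slope
    b0 = mean(ys) - b1 * mean(xs)        # step 2: y-intercept
    return b1, b0

# Hypothetical data: x is explanatory, y is the response.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
b1, b0 = regression_line(xs, ys)
print(round(r, 4))                   # about 0.7746
print(round(b1, 4), round(b0, 4))    # about 0.6 and 2.2
```

For these pairs the line is approximately y = 0.6x + 2.2, and the positive r matches the upward tilt of the slope.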