Math 221 Study Guide test 1
This 7-page Study Guide was uploaded by Niki Neidhart on Friday, February 12, 2016. The Study Guide belongs to MAT 221 - M200 at Syracuse University, taught by X. Au in Spring 2016.
Date Created: 02/12/16
Statistics – MAT 221

Statistics – the science of learning from data

2 Main activities of Statistics:
1. Estimating a characteristic of the population
2. Testing a hypothesis or claim about a population

Chapter 1 – Looking at Data

1.1 – Distribution

ex 1) amount spent on textbooks

Person     1    2    3    4    5
$ Spent  539  628  489  716  641

- this table is DATA
- each person is a CASE
- the person numbers (1, 2, 3, ...) are LABELS
- $ spent is a VARIABLE
- the values (539, 628, 489, ...) form a DISTRIBUTION
*example 1 is a QUANTITATIVE VARIABLE
*example 2 is a CATEGORICAL VARIABLE

1.2 – Displaying Distributions w/ Graphs

Categorical Data:
- Bar Graph
- Pie Chart
*Be aware of misleading graphs (scaling)
- To deemphasize a difference, zoom out
- To emphasize a difference, zoom in
- Don’t make 3D graphs

Quantitative Data:
- Stem plots
- Histograms

Outliers – observations (numbers) that lie outside the overall pattern of the distribution
Symmetric – you could draw a line down the middle and both sides look the same
Skewed Right – the right side of the histogram extends much farther than the left
- The outliers are on the right, the majority is on the left
Skewed Left – the left side of the histogram extends much farther than the right
- The outliers are on the left, the majority is on the right

1.3 – Describing Distributions with Numbers

Measures of Center:
1. Mean or Average – to calculate, add all the numbers and divide by how many numbers (cases) there are
2. Median – the midpoint of the distribution (put the numbers in order first; if there is no single middle value, take the average of the two middle values)

Ex. 0, 1, 2, 3, 100
Median – 2
Mean – 21.2 (106 / 5)

*Median is resistant (not affected much) to outliers
*Mean is the “center of gravity”

Symmetric – Mean = Median
Skewed Right – Mean > Median
Skewed Left – Mean < Median

Measures of Spread: Quartiles
1. First Quartile (Q1) – the value below which the first 25% of the data falls; the median of the lower half of the data
2. Third Quartile (Q3) – the value above which the last 25% of the data falls; the median of the upper half of the data
- Exclude the overall median from both halves when finding the quartiles

Interquartile Range (IQR) – the distance between the quartiles: $IQR = Q_3 - Q_1$
Five-number Summary – min, Q1, median, Q3, max
Boxplot – a picture of the five-number summary drawn above a number line
Rule of Thumb for Identifying Outliers: any number more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3

Standard Deviation – measures how far the observations are from their mean
*affected by skewness and outliers
1. Calculate the mean (average) $\bar{x}$
2. Calculate the variance: $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$
3. Take the square root: $s = \sqrt{s^2}$
- SD is 0 when all the numbers have the same value; otherwise it’s positive
- SD has the same unit of measurement as the original observations

1.4 – Density Curves and Normal Distributions

Density Curve – a smooth approximation of a histogram
- It is an estimate: you don’t have to draw the histogram, just the curve
- The total area under the curve is equal to 1, or 100%

        Density curve   Histogram
Mean    $\mu$           $\bar{x}$
SD      $\sigma$        $s$

- The median of a density curve is the point that divides the area under the curve in half
- The mean is the point at which the curve would balance if it were made of solid material

Normal Distributions – a special symmetric, bell-shaped distribution whose density curve is completely determined by its mean ($\mu$) and SD ($\sigma$)

The 68-95-99.7% Rule for Normal Distributions:
- 68% of observations are within 1 SD of the mean
- 95% are within 2 SD of the mean
- 99.7% are within 3 SD of the mean

Z-score: the number of standard deviations that x is from the mean
$z = \frac{x - \mu}{\sigma}$
*when x > the mean, z is positive
*when x < the mean, z is negative

Chapter 2 – Looking at Data: Relationships

2.1 & 2.2 – Relationships & Scatterplots

Scatterplot – one axis is used to represent each of the variables, and the data are plotted as points on the graph

Three Aspects of a Relationship:
1. Direction – positive or negative
   a. Positive: greater values of one variable tend to occur w/ greater values of the other variable (ex. house size and price)
   b. Negative: greater values of one variable tend to occur w/ smaller values of the other variable (ex. weight of cars and fuel efficiency)
2. Form – linear, curved, clusters, no pattern
3. Strength – how closely the points fit the form

No relationship – the variables are independent
Explanatory (independent) variable – the one that explains or influences changes in the other variable [x-axis]
Response (dependent) variable – the one that moves based on the other variable [y-axis]

2.3 – Correlation

Correlation (coefficient) r – a numerical measure of the direction and strength of the linear relationship between 2 quantitative variables

Properties:
- The value of r ranges from -1 to 1
- The sign of r gives the direction of the relationship
- Closer to 1 or -1 is a strong relationship
- Closer to 0 is a weak relationship
- Very sensitive to outliers

How to calculate:
- For each case in the sample we have a pair of values (x, y)
- Suppose there are n cases: $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$
- The standard formula is $r = \frac{1}{n - 1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
(The original formula image was taken from Professor Xu’s online notes: https://blackboard.syr.edu/bbcswebdav/pid-3995343-dt-content-rid-12064908_1/courses/35384.1162/Ch2Part2.pdf)

- r has no unit of measure
- Correlation only describes linear relationships
- Not resistant to outliers: a single outlier can change r a lot

2.4 – Least-Squares Regression

Regression Line – a straight line that describes the relationship between the x and y variables
- The distinction between explanatory and response variables is important

Which line “best fits”?
- We need the line to be as close to all the points as possible
Residual – the vertical distance from a point to the line
Least-squares Regression Line – the unique line for which the sum of the squared vertical distances between the data points and the line is as small as possible
- A straight line is simply a picture of a relationship between two variables

Straight Line: y = (slope) x + (y-intercept)
- The y-intercept is where the line crosses the y-axis
- The slope tells us which way and by how much the line is tilted

Finding the equation of the regression line:
1. Find the slope ($b_1$): $b_1 = r \frac{s_y}{s_x}$, where r = correlation coefficient, $s_x$ = SD of the x-values, $s_y$ = SD of the y-values
2. Find the y-intercept ($b_0$): $b_0 = \bar{y} - b_1 \bar{x}$, where $\bar{y}$ = average of the y-values and $\bar{x}$ = average of the x-values
3. The equation is: $\hat{y} = b_1 x + b_0$

Chapter 3 – Producing Data

3.1 & 3.2 – Sources of Data & Design of Experiments

Anecdotal Data – unusual cases from past experience that we draw conclusions from
- May not be representative of any larger group of cases
Available Data – data that were produced in the past, often for another purpose, that may help us
Population – the entire group of individuals we are studying
Sample – the part of the population we are studying and have data for
Parameter – a number describing a characteristic of the population
Statistic – a number describing a characteristic of a sample
Experimental units – the individuals in an experiment
- Called Subjects if they are human
- Treatment or factor: the “something” we do to a subject whose response gets measured

Observational Study – simply observing and recording data on individuals without influencing their responses
- Cannot establish cause-and-effect relationships
Experimental Study – deliberately giving individuals some treatment and recording their responses
Control – a situation where no treatment is given; serves as a reference mark/basis for comparison
Placebo – a fake treatment used to check that the results come from the actual treatment and not from the subjects’ belief that they are being treated
Ronald Fisher (1890 – 1962) – randomized comparative experiments; fertilizer

Principles of Experimental Design:
1. Control the effects of lurking variables
2. Randomize
3. Replicate the treatment on enough subjects to reduce chance variation in the results

Biased – systematically favoring certain outcomes
- Random assignment is the best way to avoid bias
Blind experiment – one in which the subjects do not know which treatment they are getting until the experiment is completed
Double-Blind Experiment – neither the subjects nor the experimenter know who has the treatment until the experiment is over

3.3 – Sampling Design

- Nonresponse: we don’t always get a response from everyone in our sample
- Response Bias: people don’t always respond truthfully
- Wording effects: the way a question is worded may push people toward a certain response

3.4 – Toward Statistical Inference

Statistical Inference – the process of drawing conclusions about a population from data obtained from a sample
Sampling Variability – every time we take a random sample from a population we are likely to get a different set of individuals and therefore different statistics
Sampling Distribution – the distribution of a statistic’s values obtained by repeating the study many times with the same sample size
- The larger the sample size, the lower the sampling variability
- The better the data-collecting technique, the lower the bias

Exam covers Chapters 1, 2, and 3
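
For practice, the Chapter 1 and Chapter 2 formulas in this guide can be checked numerically. What follows is only a minimal sketch in Python, not part of the course materials: the function names are made up for illustration, the quartiles follow this guide's "median of each half, excluding the median" rule (software often uses other quartile rules and may give slightly different values), and the z-scores use the sample mean and sample SD in place of the population mean and SD. The (x, y) pairs are hypothetical practice numbers; only the textbook-spending values come from example 1.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the guide's quartile rule."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]    # lower half, overall median excluded when n is odd
    upper = xs[-half:]   # upper half, overall median excluded when n is odd
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


def outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3 (rule of thumb)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


def z_scores(data):
    """Standardize each value: (x - mean) / SD, with the sample mean and SD."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample SD, divides by n - 1
    return [(x - mean) / s for x in data]


def correlation(xs, ys):
    """r = sum of products of paired z-scores, divided by n - 1."""
    products = (zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys)))
    return sum(products) / (len(xs) - 1)


def regression_line(xs, ys):
    """Return (b0, b1) with b1 = r * s_y / s_x and b0 = ybar - b1 * xbar."""
    b1 = correlation(xs, ys) * statistics.stdev(ys) / statistics.stdev(xs)
    b0 = statistics.mean(ys) - b1 * statistics.mean(xs)
    return b0, b1


# Textbook spending from example 1 in section 1.1
spent = [539, 628, 489, 716, 641]
print("five-number summary:", five_number_summary(spent))
print("mean:", statistics.mean(spent), "SD:", statistics.stdev(spent))
print("outliers:", outliers(spent))

# Hypothetical (x, y) pairs, made up purely for practice
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = regression_line(x, y)
print("r:", correlation(x, y), "slope:", b1, "intercept:", b0)
```

Working the same numbers by hand first and then comparing against the printed values is a quick way to catch arithmetic slips before the exam.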