# Statistics 401, Midterm I 01:960:401

Rutgers

This 5 page Study Guide was uploaded by Wendy Liu on Friday September 30, 2016. The Study Guide belongs to 01:960:401 at Rutgers University taught by Hei-ki Dong in Fall 2016. Since its upload, it has received 412 views. For similar materials see Basic Statistics for Research in Statistics at Rutgers University.

Date Created: 09/30/16

Midterm I: Study Guide, 4 October 2016, Basic Statistics for Research, Professor HK Dong, Wendy Liu

## Common Notations

- $\sum$: summation, the addition of a set of values
- $x$: variable representing individual data values
- $\bar{x}$ ("x bar"): mean of all $x$ values

Standard notation for sample vs. population:

| Variable | Sample | Population |
| --- | --- | --- |
| Mean | $\bar{x}$ | $\mu$ |
| Standard deviation | $s$ | $\sigma$ |
| Variance | $s^2$ | $\sigma^2$ |
| Data set size | $n$ | $N$ |
| Correlation coefficient | $r$ | $\rho$ |

- **Unit/subject**: single entity from which you collect data. Ex: a person
- **Population of units/subjects**: complete collection of units from which you collect data. Ex: American citizens
- **Population**: large set of all potential measurements corresponding to the population of units. Ex: ages of all American citizens
- **Sample**: subset of measurements that are actually collected during the investigation. Ex: ages of 40 American citizens
- **Statement of purpose**: the reason to collect data; must be specific and unambiguous
- **Statistics**: collecting, summarizing, and interpreting data, and then drawing conclusions

## Objectives of Statistics

1. Make inferences about a population by analyzing information from a sample.
   - Includes assessing the extent of uncertainty involved in these inferences
2. Design the process and extent of sampling so that the sample is representative of the population, and thus inferences are valid.

- **Inferential statistics**: evaluate the information present in data and assess the new learning gained from it
- **Descriptive statistics**: summarize and describe prominent features of data

## 3 S's of descriptive statistics: shape, center, spread

### Shape of the distribution

- **Normal**: bell shaped, symmetric about the center
  - Area under entire curve = 1 = 100%
  - Mean = median = mode (or very close)
- **Skewed left/negatively skewed**: long tail on left
  - Few data points to the left (more negative) of the majority
- **Skewed right/positively skewed**: long tail on right
  - Few data points to the right (more positive) of the majority
- **Uniform**: nearly equal frequency of all values
  - Flat-topped
- **Bimodal**: distribution with two peaks
  - Looks like two normal distributions merged together

### Spread (variation): how far apart the data values are from each other

- **Range**: difference between the largest and smallest observations
  - Range = max − min
- **Deviation from the mean**: $x_i - \bar{x}$
  - Measure of variation for one data point (not the entire data set)
  - Total deviation for any data set: $\sum (x_i - \bar{x}) = 0$
  - Positive and negative deviations of different data points cancel out
  - Average deviation from the mean: $\frac{1}{n}\sum (x_i - \bar{x}) = 0$
- **Interquartile range (IQR)**: spread of the middle 50% of the data
  - $\text{IQR} = Q_3 - Q_1$
- **Standard deviation**: roughly the average distance of the scores in a distribution from their mean
  - Sample standard deviation: $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
  - Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$
- **Variance**: standard deviation squared
  - Sample variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
  - Population variance: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$
- **Bessel's correction** for standard deviation and variance: divide by $n - 1$ instead of $n$ for samples, reflecting the $n - 1$ degrees of freedom
  - Samples generally won't have as many outliers as the population (if any at all)
  - Dividing by a smaller value ($n - 1$) results in a larger standard deviation, which will be more similar to the true population standard deviation

### Central tendency: where the data is clustered

- **Mean**: average of the set, $\bar{x} = \dfrac{\sum x_i}{n}$
- **Median** ($Q_2$): middle value of the ordered measurements
  - Positioning point: $\dfrac{n + 1}{2}$ = position of the value corresponding to the median
  - Positioning points ending in .5 occur for even data sets (sets with even $n$)
  - In that case, take the average of the data values at positions $x.5 \pm 0.5$
- **Mode**: most frequent data measurement

## Types of Data

1. **Qualitative**: classified in categories, not numerically measured
2. **Quantitative/numerical/measurement**: variables measured with numbers
   - Discrete: gaps between neighboring distinct values
   - Continuous: no gaps between neighboring values

## Organizing quantitative data

- Ordered array: list all data smallest to largest or largest to smallest
- Visual representations:
  - Frequency distributions and cumulative distributions
    - Histogram: like a bar graph, but for quantitative data (number line on the x-axis)
    - Polygon
    - Ogive: like a polygon, but cumulative
  - Stem-and-leaf plot
    - For smaller sets of data
  - Dot plot: dots on top of a number line representing each data point

## Frequency distributions for continuous variables

- **Class intervals**: cover ranges of equal length without overlapping
- **Class boundaries**: endpoints of the intervals
- **Class frequency**: number of observations belonging to each class interval
- **Relative frequency**: percentage of observations in each class out of total observations

## 5 number summary (forms the box-and-whisker plot)

- **Min**: smallest value in the data set
- **$Q_1$**: 25th percentile; 25% of the data is below it
  - aka the median of min to $Q_2$
- **$Q_2$**: aka the median; splits the data in half equally at the 50% mark
- **$Q_3$**: 75th percentile; 75% of the data is below it
  - aka the median of $Q_2$ to max
- **Max**: largest value in the data set
- **Outlier data**: marked with an asterisk (*) on the box-and-whisker plot
  - $Q_3 + 1.5\,\text{IQR}$ = upper limit: data points above the upper limit are outliers
  - $Q_1 - 1.5\,\text{IQR}$ = lower limit: data points below the lower limit are outliers

## z-scores, the Empirical Rule, and Chebyshev's Rule

- **z-score**: a data point's distance away from the mean, measured in units of standard deviation
  - Allows for comparison across data sets
  - z-score of the mean: $z = 0$
  - Sample: $z = \dfrac{x - \bar{x}}{s}$
  - Population: $z = \dfrac{x - \mu}{\sigma}$
- **Empirical Rule**: for normal distributions
  - 68% of the data will be within 1 standard deviation of the mean
  - 95% of the data will be within 2 standard deviations of the mean
  - 99.7% of the data will be within 3 standard deviations of the mean
  - Data points more than 2 standard deviations away from the mean are considered outliers
- **Chebyshev's Rule**: for any type of distribution, at least $1 - 1/k^2$ of the data lies within $z = \pm k$
  - No useful information for $z = \pm 1$ (the bound is 0%)
  - At least 75% of the data will be within $z = \pm 2$
  - At least 89% of the data will be within $z = \pm 3$

## Bivariate/multivariate data

- **Bivariate/multivariate data**: observations on two or more variables
- **Marginal totals**: total frequency of any row or column of a data table, given in the right-hand margin or bottom margin
- **Simpson's paradox**: reversal of the conclusions drawn from data tables after several tables are combined into one, caused by an unreported variable

## Experimentation

- **Predictor/input/independent variable**: denoted by $x$
- **Response/output/dependent variable**: denoted by $y$
- **Random assignment**: subjects are placed randomly into control/experimental groups
- **Placebo effect**: a subject's expectation that a treatment will work causes positive results, even though the treatment itself has no therapeutic value
  - Placebo: treatment that has no physiological effect; usually a sugar pill in drug testing
- **Double-blind procedure**: experimenters don't know which subjects are in which group, and subjects themselves don't know which group they are in
  - Controls for the placebo effect and experimenter bias

## Correlation

- **Scatter diagram/plot**: pairs of observations plotted as dots on a graph, with each pair $(x, y)$ giving one dot
- **Positive correlation**: $x$ and $y$ increase/decrease together
- **Negative correlation**: $x$ and $y$ increase/decrease in opposite directions
- **Correlation coefficient $r$**: measures the strength and direction of the linear relationship between $x$ and $y$
  - Ranges over $-1 \le r \le 1$
  - Magnitude of $r$ indicates strength: $|r| = 1$ is a perfect linear relationship
  - Sign of $r$ indicates direction: $r > 0$ is a positive correlation, $r < 0$ is a negative correlation
  - $r = 1$: perfect positive correlation
  - $r = -1$: perfect negative correlation
  - $r = 0$: no linear correlation
- **Calculating $r$**: $r = \dfrac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$
  - Definitional formulas:
    - Sum of squared deviations of $x$: $S_{xx} = \sum (x_i - \bar{x})^2$
    - Sum of squared deviations of $y$: $S_{yy} = \sum (y_i - \bar{y})^2$
    - Sum of cross products of $x$ and $y$ deviations: $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$
  - Alternative formulas:
    - Sum of squared deviations of $x$: $S_{xx} = \sum x_i^2 - \dfrac{(\sum x_i)^2}{n}$
    - Sum of squared deviations of $y$: $S_{yy} = \sum y_i^2 - \dfrac{(\sum y_i)^2}{n}$
    - Sum of cross products of $x$ and $y$ deviations: $S_{xy} = \sum x_i y_i - \dfrac{(\sum x_i)(\sum y_i)}{n}$
- **Spurious correlation**: observed correlation between two variables that is false, due to the influence of a third variable (the lurking variable)

## Method of Least Squares

- **Method of Least Squares**: finds the line of best fit by minimizing the sum of squared residuals
  - Residual: vertical error between a data point and the line of best fit, $e_i = y_i - \hat{y}_i$
  - Sum of squared error: $\text{SSE} = \sum (y_i - \hat{y}_i)^2$
- **Regression equation** for the line of best fit: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
  - Slope: $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}$
  - Intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
- **Coefficient of determination $r^2$**: proportion (%) of the variation in $y$ explained by the linear relationship with $x$
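As a rough illustration of the center, spread, five-number-summary, and z-score definitions in this guide, the sketch below works through them on a small made-up sample. The data values and variable names are hypothetical, and the quartiles use one common "median of each half" convention (the median itself is left out of both halves when n is odd); other textbooks and software may give slightly different quartiles.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11, 30]
n = len(data)
ordered = sorted(data)

# Center: mean, and median via the positioning point (n + 1) / 2.
mean = sum(data) / n
position = (n + 1) / 2
if position.is_integer():
    median = ordered[int(position) - 1]
else:
    # Position ends in .5: average the values at position - 0.5 and position + 0.5.
    lo = ordered[int(position - 0.5) - 1]
    hi = ordered[int(position + 0.5) - 1]
    median = (lo + hi) / 2

# Spread: deviations cancel out, so they are squared before averaging.
deviations = [x - mean for x in data]
total_deviation = sum(deviations)            # zero up to rounding
sum_sq = sum(d ** 2 for d in deviations)
sample_variance = sum_sq / (n - 1)           # Bessel's correction
population_variance = sum_sq / n             # treats the data as a whole population
sample_sd = math.sqrt(sample_variance)

# Quartiles as medians of the lower and upper halves (assumed convention).
lower_half = ordered[: n // 2]
upper_half = ordered[(n + 1) // 2:]
q1 = statistics.median(lower_half)
q3 = statistics.median(upper_half)
iqr = q3 - q1

# Outlier limits from the 1.5 * IQR rule.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in ordered if x < lower_limit or x > upper_limit]

# Sample z-scores: distance from the mean in units of standard deviation.
z_scores = [(x - mean) / sample_sd for x in data]

print("mean:", mean, "median:", median)
print("sample variance:", sample_variance, "sample sd:", sample_sd)
print("Q1, Q3, IQR:", q1, q3, iqr)
print("limits:", lower_limit, upper_limit, "outliers:", outliers)
```

For this sample the mean is 9 and the median is 6, and the single value 30 falls above the upper limit of 19, so it is the only point flagged as an outlier. The gap between mean and median is what a right-skewed shape would be expected to produce.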
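The correlation and least-squares quantities can be sketched the same way. The example below assumes the standard textbook forms (r as the cross-product sum divided by the square root of the product of the two squared-deviation sums, slope as the cross-product sum over the x squared-deviation sum, intercept as mean of y minus slope times mean of x); the paired data and all variable names are hypothetical.

```python
import math

# Hypothetical paired observations (x, y), chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Definitional formulas: sums of squared deviations and cross products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Alternative (computational) formulas give the same values.
s_xx_alt = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
s_yy_alt = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
s_xy_alt = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

# Correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares line: y_hat = intercept + slope * x.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residuals and sum of squared error for the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Coefficient of determination: share of the variation in y explained by x.
r_squared = r ** 2

print("Sxx, Syy, Sxy:", s_xx, s_yy, s_xy)
print("r:", r, "r^2:", r_squared)
print("slope:", slope, "intercept:", intercept, "SSE:", sse)
```

With these numbers r is about 0.775, the fitted line is roughly y = 2.2 + 0.6x, and r squared is about 0.6, which matches 1 − SSE/Syy = 1 − 2.4/6. That identity is a handy check that the slope, intercept, and SSE were computed consistently.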
