Week 1 Notes - Lecture 2
This 6-page set of class notes was uploaded by Debra Tee on Wednesday, September 21, 2016. The notes belong to STATS 250 at the University of Michigan, taught by Brenda Gunderson in Fall 2016.
Date Created: 09/21/16
Lecture 2: Turning Data into Information

"Simple summaries of data can tell an interesting story and are easier to digest than long lists."

2.1 Raw Data
- Raw data correspond to numbers and category labels that have been collected or measured but have not yet been processed in any way.

Definitions:
- A variable is a characteristic that differs from one individual to the next.
- Sample data are collected from a subset of a larger population.
- Population data are collected when all individuals in a population are measured.
- A statistic is a summary measure of sample data.
- A parameter is a summary measure of population data.

2.2 Types of Variables
- We have 2 variables in our data set. We want to distinguish between the different types of variables: different types of variables provide different kinds of information, and the type will guide what kinds of summaries (graphs/numerical) are appropriate.
- Think about it:
  - Could you compute the "AVERAGE AMOUNT OF SLEEP" for these 86 students? YES
  - Could you compute the "AVERAGE SLEEP DEPRIVED STATUS" for these 86 students? NO (could code, but it would be arbitrary: 0 and 1, or could use any two values like 1 and 203)
- SLEEP DEPRIVED STATUS is said to be a CATEGORICAL variable.
- AMOUNT OF SLEEP is a QUANTITATIVE variable.

Definitions:
- A categorical variable places an individual or item into one of several groups or categories.
- When a categorical variable has ordered categories, it is called an ordinal variable.
- A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense.
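The point about arbitrary coding can be checked with a small sketch. The six status labels below are made up for illustration (they are not the 86-student class data), and the helper name mean_under_coding is hypothetical. The "average" of the coded categories changes with the codes chosen, while the proportion in each category does not.

```python
from statistics import mean

# Hypothetical sleep-deprived status for six students (NOT the class data).
status = ["yes", "no", "no", "yes", "no", "no"]

def mean_under_coding(labels, code_for_no, code_for_yes):
    """Average the labels after replacing each category with a numeric code."""
    codes = {"no": code_for_no, "yes": code_for_yes}
    return mean(codes[label] for label in labels)

# Same data, two equally arbitrary codings, two different "averages".
mean_01 = mean_under_coding(status, 0, 1)
mean_1_203 = mean_under_coding(status, 1, 203)

# A summary that does not depend on any coding: the proportion of "yes".
proportion_yes = status.count("yes") / len(status)

print("mean with 0/1 coding:  ", mean_01)
print("mean with 1/203 coding:", mean_1_203)
print("proportion of 'yes':   ", proportion_yes)
```

With the 0/1 coding the mean happens to equal the proportion of "yes" responses, which is why proportions and percentages, not averages, are the usual numerical summaries for a categorical variable.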
Other names for a quantitative variable are: measurement variable and numerical variable.

Exercise:
- Age (years): QUANTITATIVE
- Typical Classroom Seat Location (Front, Middle, Back): CATEGORICAL
- Number of songs on an iPod: QUANTITATIVE
- Time spent studying material for this class in the last 24-hour period (in hours): QUANTITATIVE
- Soft Drink Size (small, medium, large, super-sized): CATEGORICAL (ordinal)

2.3 Summarizing One or Two Categorical Variables
- Numerical summaries: percentages, proportions, frequency distributions
- Visual summaries: bar graphs, pie charts (histograms are for quantitative variables)
- Categorical data: use bar graphs

Interpreting Histograms:
1. Location (center, average)
   - Approximately the middle value, or where the histogram would balance
2. Spread (variability)
   - Range (overall, and then where most of the observations are)
3. Deviations from the overall pattern
   - Outliers

Numerical Summaries:
- Mean: the numerical average value
- Median: the middle value when the data are arranged from smallest to largest
- Maximum, Minimum
- Range = Maximum - Minimum
- Percentiles: the pth percentile is the value such that p% of the observations fall at or below that value.
  - Median: 50th percentile
  - First quartile (Q1): 25th percentile
  - Third quartile (Q3): 75th percentile
- Interquartile Range: measures the spread over the middle 50% of the data. IQR = Q3 - Q1
- Outliers: use the 1.5 * IQR rule (an observation is flagged as an outlier if it falls more than 1.5 * IQR below Q1 or more than 1.5 * IQR above Q3)

BOXPLOT:
- Side-by-side boxplots are good for comparing 2 or more sets of observations.
- Can't confirm shape from a boxplot alone (histograms are better for showing shape).

Quantitative Variables:
- Histogram
- QQ Plots
- Time Plots
- Boxplots
- Scatter plots

Categorical Variables:
- Bar Charts
- Pie Charts
- Freq. Tables
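The numerical summaries above (median, quartiles, IQR, and the 1.5 * IQR rule) can be sketched in a few lines of Python. This is a minimal illustration, not the course's official procedure: the sleep_hours values are made up, the function names are hypothetical, and the quartiles use the "median of each half" convention (excluding the overall median when the number of observations is odd). Software packages and textbooks use several slightly different quartile conventions, so results may differ a little from other tools.

```python
from statistics import mean, median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data, leaving out the middle value when n is odd.
    Requires at least 2 observations.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]        # observations below the median position
    upper = data[n - half:]    # observations above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

def iqr_outliers(values):
    """Flag observations more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical hours of sleep for 11 students (NOT the class data).
sleep_hours = [4, 5, 6, 6, 7, 7, 7, 8, 8, 9, 14]

minimum, q1, med, q3, maximum = five_number_summary(sleep_hours)
print("mean:    ", mean(sleep_hours))
print("median:  ", med)
print("range:   ", maximum - minimum)
print("IQR:     ", q3 - q1)
print("outliers:", iqr_outliers(sleep_hours))
```

For this made-up data set Q1 = 6 and Q3 = 8, so IQR = 2 and the fences sit at 3 and 11; the value 14 falls above the upper fence and is flagged as an outlier. These five numbers are also what a boxplot draws.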