Intr Stat&Data Anlys
Intr Stat&Data Anlys STATS 250
This 5 page Class Notes was uploaded by Easton Mayert on Thursday October 29, 2015. The Class Notes belongs to STATS 250 at University of Michigan taught by Thomas Venable Jr in Fall. Since its upload, it has received 19 views. For similar materials see /class/231658/stats-250-university-of-michigan in Statistics at University of Michigan.
Date Created: 10/29/15
Stats 250 Exam 1 Study Guide

Chapter 1
• Statistics are numbers measured for some purpose.
• Statistics is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty.

Chapter 2
• Simple summaries of data can tell an interesting story and are easier to digest than long lists.
• Raw data are numbers and category labels that have been collected or measured but not yet processed in any way.
  ◦ Data are typically taken from a sample.
• A variable is a characteristic that differs from one individual to the next.
• Sample data are collected from a subset of a larger population.
• Population data are collected when all individuals in a population are measured.
• A statistic is a summary measure of sample data.
• A parameter is a summary measure of population data.
• A categorical variable places an individual or item into one of several groups or categories.
  ◦ An ordinal variable is a categorical variable whose groups have an order or ranking.
    ▪ Ex: small, medium, large, extra-large
• A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense.
  ◦ Also known as a measurement variable or numerical variable.
  ◦ It is a discrete variable if it can only take whole-number values.
    ▪ Ex: number of CDs you own
  ◦ It is a continuous variable if it can take decimal values.
    ▪ Ex: time, age, etc.
• Visual summaries give us a visual representation of the data we are examining.
  ◦ A bar graph gives a bar with height corresponding to the number of items in that group.
    ▪ Used for categorical data
  ◦ A pie chart helps us see what part of the whole each group forms.
  ◦ Histograms give us a graph for quantitative data.
    ▪ They show the distribution of the quantitative variable.
• To interpret histograms, we examine shape, location, and spread.
  ◦ Shape can be symmetric, skewed, bell-shaped, or uniform.
    ▪ Symmetric (bell shape)
    ▪ Left skew
    ▪ Right skew
    ▪ Uniform
  ◦ Location refers to the center or the average.
    ▪ The mean is usually used
as the average, but it is pulled by outliers. With a skewed graph, the median will be more accurate.
  ◦ Spread (variability) refers to how far scores vary from the mean. Use the standard deviation to calculate this.
• There are two basic measures of location or center:
  ◦ Mean, which is the numerical average value.
    ▪ $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
  ◦ Median, which is the middle value when the data are arranged from smallest to largest. This doesn't allow high or low observations to skew it.
  ◦ Note: the mean is sensitive to extreme observations, while the median is resistant to extreme observations.
• Range measures the spread over 100% of the data.
  ◦ Range = high value $-$ low value = maximum $-$ minimum
• Percentiles: the $p$th percentile is the value such that $p\%$ of the observations fall at or below that point.
  ◦ Common percentiles
    ▪ Median = 50th percentile
    ▪ First quartile $Q_1$ = 25th percentile
    ▪ Third quartile $Q_3$ = 75th percentile
• The five-number summary lists the median, the first and third quartiles, and the highest and lowest values to give a quick overview of the data values and information about the center and spread.
• Interquartile range (IQR) measures the spread over the middle 50% of the data.
  ◦ $\text{IQR} = Q_3 - Q_1$
• Boxplots are graphical representations of the five-number summary.
  ◦ Steps
    ▪ Label an axis with values to cover the minimum and maximum of the data.
    ▪ Make a box with ends at the quartiles $Q_1$ and $Q_3$.
    ▪ Draw a line in the box at the median $M$.
    ▪ Check for possible outliers using the $1.5 \times \text{IQR}$ rule and, if there are any, plot them individually.
    ▪ Extend lines from the ends of the box to the smallest and largest observations that are not possible outliers.
  ◦ Note: possible outliers are observations that are below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$.
  ◦ Side-by-side boxplots are good for comparing data.
  ◦ Watch out, because points plotted individually are still part of the data.
    ▪ They are just outliers.
  ◦ Boxplots cannot confirm shape.
• Possible reasons for outliers and possible actions
  ◦ The outlier is a legitimate data value and represents natural variability for the group and the variables measured.
    ▪ Values may not be discarded, for they provide
important information about location and spread.
  ◦ A mistake was made while taking a measurement or entering it into the computer.
    ▪ If this can be verified, the value should be corrected or discarded.
  ◦ The individual in question belongs to a different group than the bulk of individuals measured.
    ▪ Values may be discarded if a summary is desired and reported for the majority group only.
• Standard deviation is the measure of spread of the observations from the mean.
  ◦ It is roughly the average distance the observations fall from the mean.
  ◦ The squared standard deviation is the variance.
  ◦ Interpretation of the standard deviation: on average, the values of the variable vary by about (standard deviation) from their mean of (mean).
    ▪ Ex: on average, the weights of small orders of French fries vary by about 6 g from their mean weight of 73 g.
  ◦ Notes
    ▪ $s = 0$ means there is no variation and all the scores are the same.
    ▪ Like the mean, $s$ is sensitive to outliers.
• We use the mean and standard deviation for data that are bell-shaped, and we use a five-number summary for skewed data.
• The empirical rule states that for bell-shaped curves, approximately
  ◦ 68% of the values fall within 1 standard deviation of the mean
  ◦ 95% of the values fall within 2 standard deviations of the mean
  ◦ 99.7% of the values fall within 3 standard deviations of the mean

Chapter 5
• Descriptive statistics describe data using numerical summaries and graphical summaries.
• Inferential statistics use sample information to make conclusions about a larger group of items or individuals than just those in the sample.
• Population: the entire group of individuals that we want information about, about which inferences are to be made.
• Sample: the smaller group, the part of the population we actually examine in order to gather information.
• Variable: the characteristic of the items or individuals that we want to learn about.
• The fundamental rule for using data for inference is that available data can be used to make inferences about a much larger group if the data can
be considered representative with regard to the question of interest.
• Bias
  ◦ Selection bias occurs if the method for selecting the participants produces a sample that does not represent the population of interest.
  ◦ Nonparticipation (nonresponse) bias occurs when a representative sample is chosen for a survey but a subset cannot be contacted or does not respond.
  ◦ Biased response occurs when participants respond differently from how they truly feel.
    ▪ The way the questions are worded, the way the interviewer behaves, etc. can lead to individuals providing false information.
• Margin of error refers to how close the sample proportion comes to the truth for the entire population.
  ◦ The conservative margin of error is $1/\sqrt{n}$, where $n$ is the sample size.
• Most inference methods require the data to be considered a random sample.
  ◦ Independence means that the response you obtain from one individual doesn't influence the response you get from another individual.
  ◦ Identically distributed means all of the responses come from the same distribution.

Chapter 6
• Observational studies
  ◦ The researchers simply observe or measure the participants and do not assign any treatments or conditions.
  ◦ Participants are not asked to do anything differently.
• Experiments
  ◦ The researchers manipulate something and measure the effect of the manipulation on some outcome of interest.
    ▪ Participants are randomly assigned to different treatments.
• The explanatory variable is the variable whose effect we are interested in learning.
  ◦ It has its effect on the outcome, or response, variable.
• A confounding variable is a variable that affects the response variable and is related to the explanatory variable.
  ◦ The effect of a confounding variable cannot be separated from the effect of the explanatory variable.
  ◦ Confounding variables might be measured and accounted for in the analysis, or they could be lurking variables.
• Randomized experiments control the influence of confounding variables.

Chapter 7
• Probability is the proportion of times an event occurs.
  ◦ It applies to the population, not the
sample.
• Two events are mutually exclusive if they do not contain any of the same outcomes, so their intersection is empty.
• Two events are independent if knowing that one will occur or has occurred does not change the probability that the other occurs.
  ◦ Mutually exclusive does not imply independent.
• A sample is drawn with replacement if individuals are returned to the eligible pool for each selection.
  ◦ If they are not eligible for subsequent selection, it is a sample drawn without replacement.

Chapter 8
• A random variable assigns a number to each outcome of a random circumstance, or equivalently, a number to each unit in the population.
  ◦ A discrete random variable can take one of a countable list of distinct values.
    ▪ Two conditions: the sum of all individual probabilities must equal 1, and each individual probability must be between 0 and 1.
  ◦ A continuous random variable can take any value in an interval or collection of intervals.
• The expected value of a random variable is the mean value of the variable in the sample space.
  ◦ It can be interpreted as the mean value that would be obtained from an infinite number of observations of the random variable.
• Binomial random variables count the number of times a certain event occurs out of a particular number of observations or trials of a random experiment.
  ◦ The conditions are
    ▪ There are $n$ trials, where $n$ is determined in advance and is not a random value.
    ▪ There are two possible outcomes on each trial: success and failure.
    ▪ The outcomes are independent from one trial to the next.
    ▪ The probability of a success, $p$, remains the same from one trial to the next.
    ▪ The probability of a failure remains $1 - p$ for every trial.
  ◦ A binomial random variable is defined as $X$ = number of successes in the $n$ trials of a binomial experiment.
• The probability distribution of a continuous random variable is described by a density curve.
  ◦ The curve must lie on or above the horizontal axis.
  ◦ The area under the curve is equal to 1.

Chapter 9
• The
distribution of all possible values of a statistic for repeated samples of the same size from a population is called the sampling distribution of the statistic.

Chapter 10
• The sample estimate provides our best guess as to the value of the population parameter, but it is not 100% accurate.
• The value of the sample estimate will vary from one sample to the next.
  ◦ Values often vary around the population parameter.
  ◦ The standard deviation gives an idea of how far the sample estimates tend to be from the true population proportion on average.
  ◦ The standard error of the sample estimate gives an idea of how far the estimate would tend to fall from the parameter value.
• The confidence level is the probability that the procedure used to determine the interval will provide an interval that includes the population parameter.
  ◦ It applies to the procedure, not to an individual interval.

Chapter 12
• The null hypothesis is a statement that there is nothing happening.
  ◦ Generally, the researcher hopes to reject this.
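
Several of the formulas in Chapters 2, 5, and 8 above can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original notes. The function names and the `weights` data are made up for illustration. It also assumes that the quartiles are the medians of the lower and upper halves of the sorted data; textbooks and software use slightly different quartile conventions, so other tools may give slightly different values.

```python
import math
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data, leaving out the middle value when n is odd.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


def possible_outliers(data):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


def conservative_margin_of_error(n):
    """Conservative margin of error for a sample proportion: 1/sqrt(n)."""
    return 1 / math.sqrt(n)


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical data: weights in grams, with one unusually large value.
weights = [60, 65, 68, 70, 72, 73, 74, 76, 78, 80, 110]

print(five_number_summary(weights))          # (60, 68, 73, 78, 110)
print(possible_outliers(weights))            # [110]
print(round(statistics.mean(weights), 1))    # about 75.1, above the median of 73
print(statistics.stdev(weights))             # sample standard deviation s
print(conservative_margin_of_error(1600))    # 0.025
print(binomial_pmf(2, 3, 0.5))               # 0.375
```

In this made-up data set the single large value pulls the mean above the median, which matches the note that the mean is sensitive to extreme observations while the median is resistant.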