INTR STATISTCL REASONING
This 17-page set of class notes was uploaded by Shane Marks on Monday, October 26, 2015. The notes belong to STAT 110 at the University of South Carolina - Columbia, taught by B. Habing in Fall.
Date Created: 10/26/15
STAT 110 Chapter 6 Definitions
single blind - an experiment is single blind if the subjects are unaware of the exact treatment being imposed on them → controls for subject bias
double blind - an experiment is double blind if both the subjects and the experimenter are unaware of the exact treatment being imposed → controls for subject and experimenter bias
nonadherers - subjects who participate but do not follow the experimental treatment
refusals - subjects we want in our study who refuse to participate
dropouts - subjects who start the study and later drop out → especially common in experiments that last over an extended period of time
completely randomized design - all the experimental subjects are allocated at random among all the treatments
matched pairs design - subjects are matched to form pairs, or each subject receives both treatments; randomization occurs within each pair
block - a group of experimental subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments
block design - the random assignment of subjects to treatments is carried out separately within each block

STAT 110 Chapter 15 Definitions
regression line - a straight line that describes how a response variable y changes as an explanatory variable x changes. The equation of a line is $y = mx + b$, where m is the slope and b is the intercept.
slope (m) - the amount by which y changes when x increases by one unit
intercept (b) - the value of y when $x = 0$
Three Things to Understand about Prediction
1. Prediction is based on fitting some model to a set of data.
2. Prediction works best when the model fits the data closely.
3. Prediction outside the range of the available data is risky.
$r^2$ - the fraction of the variation in the values of y that is explained by the least-squares regression of y on x
What are the criteria for giving evidence about causation when we can't do an experiment?
1. Strong association
2. Consistent association
3. Higher doses are associated with stronger responses
4. The alleged cause precedes the effect in time
5. The alleged cause is plausible

STAT 110 Section 13, Chapters 8 & 10 Definitions
measure - to assign a number to represent a property of a person or thing
instrument - a device used to make a measurement
units - used to record the measurements (feet, pounds, inches, gallons, etc.)
variable - the result of a measurement that takes different values for people or things that differ in whatever we're measuring
valid - a variable is a valid measure of a property if it is relevant or appropriate as a representation of that property
bias - systematic deviation from the true value of the property
random error - repeated measurements on the same individual give different results
reliable - a measurement is reliable when its random error is small
$\text{measured value} = \text{true value} + \text{bias} + \text{random error}$
categorical variable - places an individual into one of several groups or categories
quantitative variable - takes numerical values for which arithmetic operations (adding, averaging, etc.) make sense
frequency - the number of times a value occurs in the data
relative frequency - for a value, the proportion (fraction or percent) of all observations that have that value
distribution - tells us what values a variable takes and how often it takes those values
roundoff error - error introduced as we do arithmetic

STAT 110 Chapter 4 Definitions
Population size does not significantly affect confidence interval width as long as the population is at least 100 times larger than the sample.
sampling errors - errors caused by the act of taking a sample → they cause sample results to differ from the results of a census
random sampling error - results from chance selection in the simple random sample
sampling frame - a list of individuals from which we will draw our sample → should list every individual in the population
undercoverage - occurs when some groups in the population are left out of the process of choosing the sample
nonsampling errors - errors not related to the act of selecting a sample from the population → can be present even in a census
multistage sample - used to select a sample in stages from a very large population where certain groups and subgroups are available
cluster sample - divide the population into clusters, select one or more clusters, and include everyone in those clusters in the sample
systematic sample - take every nth item from the sampling frame
stratified random sample
Step 1 - divide the sampling frame into groups of individuals called strata. The strata are chosen using some characteristic of the individuals that is already known and of special interest (examples: race, gender, location).
Step 2 - take a separate SRS in each stratum and combine these to make up the stratified random sample.
probability sample - a sample chosen by chance, in such a way that we know what samples are possible and what chance (probability) each possible sample has of being chosen; not all samples need be equally probable

STAT 110 Chapter 12 Definitions
median (M) - the midpoint of a distribution: the number such that half the observations are smaller and the other half are larger
To find the median of a distribution:
1. Arrange all observations in order from smallest to largest.
2. Is the number of observations odd or even? If odd, M is the center observation in the ordered list. If even, M is the average of the two center observations.
$Q_1$ - the point that is one-quarter of the way up the ordered list of observations
$Q_3$ - the point that is three-quarters of the way up the ordered list of observations
five-number summary - minimum, $Q_1$, median, $Q_3$, and maximum
boxplot - a graph of the five-number summary
interquartile range (IQR) - the distance between the first and third quartiles: $\text{IQR} = Q_3 - Q_1$
the $1.5 \times \text{IQR}$ criterion for outliers - call an observation an outlier if it falls more than $1.5 \times \text{IQR}$ above the third quartile or below the first quartile
mean ($\bar{x}$) - the average of a set of observations: $\bar{x} = \dfrac{\text{sum of the observations}}{\text{sample size}}$
standard deviation (s) - measures, roughly, the average distance of the observations from their mean

STAT 110 Chapter 3 Definitions
parameter - a number that describes the population → a parameter is a fixed number, but in practice we don't know its value
statistic - a number that describes a sample → the value of a statistic is known once we have taken a sample, but it can change from sample to sample → we often use a statistic to estimate an unknown population parameter
bias - consistent, repeated deviation of the sample statistic from the population parameter in the same direction when we take many samples → closeness to the truth on average
variability - describes how spread out the values of the sample statistic are when we take many samples → scatter around the truth plus bias
margin of error - "margin of error plus or minus three percentage points" is shorthand for this statement: if we took many samples using the same method we used to get this one sample, 95% of the samples would give a result within plus or minus 3 percentage points of the truth about the population. We say we are 95% confident that the true value of the parameter lies within the margin of error of the sample result.
confidence statement - made up of a margin of error and a level of confidence. The margin of error measures how close the sample statistic lies to the population parameter. The level of confidence says what percent of all possible samples satisfy the margin of error.

STAT 110 Chapter 22 Definitions
The claim or assumption being tested is called the null hypothesis, such as $H_0: p = 0.50$. The null hypothesis is the status quo; it contains the = sign.
The statement we are looking for evidence of is called the alternative hypothesis. Three possible alternative hypotheses for the null hypothesis above are:
1. $H_a: p < 0.50$
2. $H_a: p > 0.50$
3. $H_a: p \neq 0.50$
The alternative hypothesis is the experimental hypothesis.
The p-value is the probability that we would see a statistic at least as extreme as the one observed if the null hypothesis were true.
$\alpha$ is the significance level. It is how rare something needs to be before we say it is not likely to happen just by chance. It is the probability we are willing to risk of saying $H_0$ is false when it is really true.
If the p-value is less than $\alpha$, then we have statistically significant evidence against the null hypothesis.
General Procedure
1. Set up the hypothesis statements.
2. Set the level of significance $\alpha$.
3. Gather the data.
4. Calculate the p-value.
5. Draw your conclusion.
Possible Conclusions
We have significant evidence against the null hypothesis in favor of the alternative hypothesis (p-value $\le \alpha$): we reject $H_0$.
or
We do not have significant evidence against the null hypothesis (p-value $> \alpha$): we fail to reject $H_0$.

STAT 110 Chapter 14 Definitions
x-axis - typically the variable doing the explaining; called the explanatory, predictor, or independent variable
y-axis - typically the variable being explained; called the response or dependent variable
positively associated - when above-average values of one variable tend to accompany above-average values of the other, and below-average values also tend to occur together → the scatterplot slopes upward as you move from left to right
negatively associated - when above-average values of one variable tend to accompany below-average values of the other → the scatterplot slopes downward as you move from left to right
correlation (r) - describes the direction and strength of a straight-line relationship between two quantitative variables
the sign of r indicates negative or positive association
$r = 0$ → no linear association
$r = 1$ or $r = -1$ → perfect straight line
r has no units and won't change if we change the units of measurement
r ignores the distinction between explanatory and response variables
r is strongly affected by outliers

STAT 110 Chapter 1 Definitions
individuals - the objects described by a set of data
variable - any characteristic of an individual
observational study - observes individuals and measures variables of interest but does not attempt to influence the responses → the purpose of this type of study is to describe some group or situation
sample survey - a type of observational study in which a sample is selected and asked to respond to questions
population - the entire group of individuals about which we want information
sample - a part of the population from which we actually collect information, used to draw conclusions about the whole
census - a sample survey that attempts to include the entire population in the sample
experiment - deliberately imposes some treatment on individuals in order to observe their responses → the purpose of an experiment is to study whether the treatment causes a change in the response

STAT 110 Chapter 2 Definitions
biased - a sampling design is biased if it systematically favors certain outcomes
convenience sampling - selection of whichever individuals are easiest to reach
voluntary response sample - a sample that chooses itself by responding to a general appeal
simple random sample (SRS) of size n - consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected
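The sampling designs in the Chapter 2 and Chapter 4 definitions (SRS, systematic, stratified, and cluster samples) can be sketched with Python's standard `random` module. This is a minimal illustration, not course material: the 100-person sampling frame, its `location` variable, the clusters of 10, and the function names are all hypothetical.

```python
import random

# Hypothetical sampling frame: 100 individuals tagged with a made-up "location".
frame = [{"id": i, "location": "urban" if i % 4 else "rural"} for i in range(100)]

def simple_random_sample(frame, n, rng):
    """SRS: every set of n individuals has the same chance of being chosen."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Take every kth item (k = len(frame) // n) after a random starting point."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]

def stratified_sample(frame, n_per_stratum, key, rng):
    """Step 1: split the frame into strata by `key`. Step 2: take an SRS in each."""
    strata = {}
    for person in frame:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random and include everyone in them."""
    chosen = rng.sample(clusters, n_clusters)
    return [person for cluster in chosen for person in cluster]

rng = random.Random(110)  # fixed seed so the example is repeatable
srs = simple_random_sample(frame, 10, rng)
sys_s = systematic_sample(frame, 10, rng)
strat = stratified_sample(frame, 5, "location", rng)
clusters = [frame[i:i + 10] for i in range(0, len(frame), 10)]  # 10 clusters of 10
clus = cluster_sample(clusters, 2, rng)
print(len(srs), len(sys_s), len(strat), len(clus))  # 10 10 10 20
```

The notes only say "every nth item" for a systematic sample; starting at a random position among the first k items is an assumption made here so that the choice is still made by chance.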
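The Chapter 3 reading of a margin of error ("95% of the samples would give a result within plus or minus 3 percentage points of the truth") can be checked by simulation. A minimal sketch, assuming a true population proportion of 0.50, samples of 1,100, and a population so large that each response can be drawn independently (close to the notes' "at least 100 times larger than the sample" situation). These numbers and the function name are illustrative choices, not from the notes.

```python
import random

def coverage_of_margin(p_true, n, margin, reps, seed=0):
    """Fraction of simulated samples whose sample proportion lands within
    +/- margin of the true population proportion p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Each response is "yes" with probability p_true (very large population).
        yes = sum(rng.random() < p_true for _ in range(n))
        p_hat = yes / n
        if abs(p_hat - p_true) <= margin:
            hits += 1
    return hits / reps

# Assumed setup: true proportion 0.50, samples of 1,100, margin of 3 points.
rate = coverage_of_margin(p_true=0.50, n=1100, margin=0.03, reps=2000, seed=3)
print(f"{rate:.1%} of samples fell within 3 percentage points of the truth")
```

The printed fraction should come out near 95%. Trying a smaller n shows the variability of the sample proportion growing, so fewer samples land within 3 points.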
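The Chapter 6 designs differ mainly in where the randomization happens. This hypothetical sketch (the subject labels, the "women"/"men" blocks, and the function names are invented for illustration) shows a completely randomized assignment, a block design that repeats the randomization separately inside each block, and a matched pairs design that flips a coin within each pair.

```python
import random

def completely_randomized(subjects, treatments, rng):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    return {s: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}

def block_design(blocks, treatments, rng):
    """Carry out the random assignment separately within each block."""
    assignment = {}
    for members in blocks.values():
        assignment.update(completely_randomized(members, treatments, rng))
    return assignment

def matched_pairs(pairs, rng):
    """Within each pair, a coin flip decides which member gets treatment A."""
    assignment = {}
    for first, second in pairs:
        if rng.random() < 0.5:
            first, second = second, first
        assignment[first] = "A"
        assignment[second] = "B"
    return assignment

rng = random.Random(6)
# Hypothetical blocks formed by a variable expected to affect the response.
blocks = {"women": ["W1", "W2", "W3", "W4"], "men": ["M1", "M2", "M3", "M4"]}
by_block = block_design(blocks, ["treatment", "placebo"], rng)
pairs = matched_pairs([("W1", "W2"), ("M1", "M2")], rng)
print(by_block)
print(pairs)
```

Reusing the completely randomized routine inside each block mirrors the definition in the notes: a block design is just a completely randomized design carried out within every block.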
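Frequency and relative frequency from the Chapters 8 & 10 definitions can be tabulated with `collections.Counter`. The pet data below is invented for illustration, and the function name is hypothetical.

```python
from collections import Counter

def frequency_table(observations):
    """Return (value, frequency, relative frequency) rows, most common first."""
    counts = Counter(observations)
    total = len(observations)
    return [(value, freq, freq / total) for value, freq in counts.most_common()]

# Hypothetical categorical data: favorite pet of 8 students.
pets = ["dog", "cat", "dog", "fish", "dog", "cat", "dog", "cat"]
for value, freq, rel in frequency_table(pets):
    print(f"{value:<5} {freq}  {rel:.1%}")
```

When relative frequencies are rounded for display, the percents may not add to exactly 100%; that small gap is an instance of roundoff error.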
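A minimal Python sketch of the Chapter 12 calculations: the five-number summary, the $1.5 \times \text{IQR}$ outlier rule, the mean, and the standard deviation. Two assumptions are built in: quartiles are the medians of the lower and upper halves of the ordered list, leaving out the overall median when n is odd (a common textbook convention; software such as Python's `statistics.quantiles` interpolates and can give slightly different values), and s uses the usual divisor n - 1. The `scores` data and function names are invented.

```python
import math

def median(sorted_values):
    """Median of a sorted list: center value (odd n) or average of the two centers (even n)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum); needs at least two observations.

    Q1 and Q3 are medians of the lower and upper halves of the ordered list;
    when n is odd, the overall median is excluded from both halves.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outliers(data):
    """Observations more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

def mean_and_sd(data):
    """Sample mean and sample standard deviation (divisor n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    return xbar, s

scores = [2, 4, 4, 5, 7, 8, 9, 10, 30]  # hypothetical data with one unusual value
print(five_number_summary(scores))  # (2, 4.0, 7, 9.5, 30)
print(outliers(scores))             # [30]
print(mean_and_sd(scores))
```

Here IQR = 9.5 - 4.0 = 5.5, so anything above 9.5 + 8.25 = 17.75 is flagged, which catches 30. The outlier also pulls the mean (about 8.78) above the median (7).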
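For the Chapter 14 and 15 definitions, this sketch computes the correlation r, the least-squares line $y = mx + b$, and $r^2$ for a tiny made-up data set. The formulas it uses (r as the sum of products of standardized values divided by n - 1, slope $m = r \, s_y / s_x$, intercept $b = \bar{y} - m\bar{x}$) are standard textbook formulas that the notes do not write out, so treat them as supplied assumptions; the data and function names are hypothetical.

```python
import math

def mean_and_sd(values):
    """Mean and sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd

def correlation(xs, ys):
    """r: sum of products of standardized x and y values, divided by n - 1."""
    mx, sx = mean_and_sd(xs)
    my, sy = mean_and_sd(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

def least_squares_line(xs, ys):
    """Slope m and intercept b of the least-squares line y = m*x + b."""
    mx, sx = mean_and_sd(xs)
    my, sy = mean_and_sd(ys)
    m = correlation(xs, ys) * sy / sx
    b = my - m * mx
    return m, b

# Hypothetical data: hours studied (x) and quiz score (y).
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = correlation(hours, score)
m, b = least_squares_line(hours, score)
print(f"r = {r:.3f}, r^2 = {r**2:.2f}, line: y = {m:.2f}x + {b:.2f}")
# Predicting inside the observed x range (1 to 5) is reasonable; far outside it is risky.
print(f"predicted score at x = 3.5: {m * 3.5 + b:.2f}")
```

For this data r is about 0.775 and $r^2 = 0.60$, so the line explains 60% of the variation in the scores. Swapping the roles of x and y, or rescaling x (for example hours to minutes), leaves r unchanged, matching the Chapter 14 properties.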
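The Chapter 22 procedure for $H_0: p = 0.50$ can be sketched with `statistics.NormalDist` from Python's standard library. The notes do not say how the p-value is computed, so this uses the common large-sample normal approximation $z = (\hat{p} - p_0) / \sqrt{p_0(1 - p_0)/n}$ as an assumption; it is reasonable only when $n p_0$ and $n(1 - p_0)$ are both fairly large (10 or more is a usual rule of thumb). The data (60 "yes" answers out of 100) and function names are hypothetical.

```python
from statistics import NormalDist

def proportion_p_value(successes, n, p0, alternative):
    """Approximate p-value for H0: p = p0 via the normal approximation."""
    p_hat = successes / n
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":       # Ha: p < p0
        return std_normal.cdf(z)
    if alternative == "greater":    # Ha: p > p0
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":  # Ha: p != p0
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

def conclusion(p_value, alpha):
    """Reject H0 when p-value <= alpha; otherwise fail to reject H0."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# Hypothetical data: 60 "yes" answers in a sample of 100, tested at alpha = 0.05.
alpha = 0.05
for alt in ("less", "greater", "two-sided"):
    pv = proportion_p_value(60, 100, 0.50, alt)
    print(f"Ha: {alt:<9} p-value = {pv:.4f} -> {conclusion(pv, alpha)}")
```

The same data give very different p-values depending on which alternative hypothesis was set up in step 1, which is why the hypotheses must be stated before the data are gathered.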