Midterm 1 study guide
Midterm 1 study guide STAT 1350 Intro to Stats
This 10-page study guide was uploaded by Katie Catipon on Thursday, February 19, 2015. The study guide belongs to STAT 1350 Intro to Stats at Ohio State University, taught by Alice Miller in Spring 2015. Since its upload, it has received 1567 views.
Date Created: 02/19/15
CHAPTER 1, 2, 3

• Individuals: people, animals, or things described by the set of data in a statistical study.
• Variables: any characteristic of an individual. Can take on different values for different individuals; can be numerical or other.
  • Response: a variable that measures an outcome/result of a study.
• Observational study: observes individuals and measures variables of interest but does not intervene in order to influence the responses. The purpose is to describe some group or situation.
• Sample survey: an important kind of observational study. You study a group of individuals by studying only some of its members that represent the larger group.
• Population: the entire group of individuals about which we want information in a statistical study.
• Sample: the part of the population from which we actually collect information; it is used to draw conclusions about the whole.
• Parameter: a fixed number that describes the population.
• Statistic: a number that describes a sample. The value of a statistic changes from sample to sample. We use a statistic to estimate an unknown parameter.
• Census: a sample survey that attempts to include the entire population in the sample. Expensive, takes a long time, difficult.
• Experiment: deliberately imposes some treatment on individuals in order to observe their responses. The purpose is to study whether the treatment causes a change in the response.
• Bias: the design is biased if it systematically favors certain outcomes.
• Convenience sampling: selection of whichever individuals are easiest to reach. Often biased.
• Voluntary response sample: the sample chooses itself by way of voluntary participation. Write-in or call-in opinion polls are examples. Usually people with strong opinions volunteer. Often biased; however, it is often ethical.
• Random sampling: eliminates bias in choosing the sample; all individuals have an equal chance to be chosen.
• Simple random sampling (SRS): starts with a list of all individuals in the population. Use a random method to select n individuals to be in the sample.
  Choosing an SRS:
  1. Assign a
     numerical label to each individual in the population. All labels must have the same number of digits if using a table of random digits (a string of the digits 0-9 used to randomly select sample participants).
  2. Use random digits to select labels at random.
• Stratified random sampling: divide the sampling frame into groups (strata). Take a separate SRS in each stratum and combine the results to make one sample.
  • Advantages: allows us to draw separate conclusions about each stratum; usually has a smaller margin of error. The individuals in each stratum are more alike than the population as a whole → eliminates some variability in the sample.
  • Disadvantages: not all individuals in a population may be given the same chance to be chosen. Some strata may be deliberately overrepresented in the sample.

Bias & Variability

• Bias: systematic deviation of the sample statistic from the population parameter. To reduce bias, use random sampling. An SRS produces unbiased estimates: the values of a statistic computed from an SRS neither consistently overestimate nor consistently underestimate the value of the population parameter.
• Variability: describes how spread out the values of the sample statistic are when we take many samples. Large variability → the result of sampling is not repeatable. To reduce variability, use a larger sample. You can make the variability as small as you want by taking a large enough sample. Variability is unaffected by the size of the population as long as the population is at least 100 times larger than the sample. Measured using the IQR (interquartile range; see Chapter 12).
• A good sampling method has both small bias and small variability. Large random samples almost always give an estimate that is close to the truth.
• Margin of error: says how close the sample statistic is to the population parameter.
  • "Margin of error plus or minus two percentage points" means: "If we took many samples using the same method we used to get this one sample, 95% of the samples would give a result within
    plus or minus 2 percentage points of the truth about the population."
  • The margin of error for 95% confidence is roughly equal to 1/√n, where n = sample size. Ex: for sample size 1013, 1/√1013 ≈ 0.031 = 3.1%.
  • Because n is always in the denominator, larger samples have smaller margins of error. To cut the margin of error in half, we must use a sample four times as large.
• Confidence statements: say what percentage of all possible samples satisfy the margin of error.
  • Ex: 95% confident, within ±2 percentage points of the sample statistic.
  • The conclusion of a confidence statement applies to the population, not the sample. It uses the sample result to say something about the population.
  • A sample survey can choose to use a confidence level other than 95%. Higher confidence → larger margin of error. Lower confidence → smaller margin of error.
  • To have a smaller margin of error with the same confidence level, use a larger sample (it has less variability).
  • If the confidence level is not specified, assume it's 95%.

CHAPTER 4

TYPES OF ERRORS

• Sampling errors: errors caused by the act of taking a sample. They cause sample results to be different from the results of a census.
  • Random sampling error: the deviation between the sample statistic and the population parameter caused by chance in selecting a random sample. The margin of error in a confidence statement includes only random sampling error. Can be controlled by changing the size of the random sample.
  • Use of bad sampling methods (ex: voluntary response).
  • Undercoverage: occurs when some groups in the population are left out of the process of choosing a sample. The sampling frame (the list of individuals from which we will draw our sample) should, at the start of a study, include every individual in the population; however, such a list is rarely available. If the sampling frame leaves out certain classes of people, even random sampling from that frame will be biased. Ex: using telephone directories as the frame for a telephone survey. More than half the households in many large cities have unlisted numbers.
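A quick way to check the margin-of-error arithmetic from Chapters 1-3 (the error that comes from random sampling alone) is a minimal Python sketch like the one below. The function name quick_margin_of_error is made up for illustration, and 1/√n is only the rough 95% rule of thumb from these notes, not an exact formula.

```python
import math


def quick_margin_of_error(n):
    """Rough 95% margin of error (as a proportion) for a sample of size n.

    Uses the rule of thumb 1 / sqrt(n) from the notes; this is an
    approximation, not an exact confidence-interval calculation.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)


# The guide's example (n = 1013), then a sample four times as large.
for n in (1013, 4 * 1013):
    print(f"n = {n}: about ±{quick_margin_of_error(n):.1%}")
```

For n = 1013 this gives about 3.1%, matching the example in the notes, and quadrupling the sample size cuts the margin of error in half (about 1.6%).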
• Nonsampling errors: errors that aren't related to how the sample is chosen. They can be present even in a census.
  • Processing errors: mechanical mistakes when doing arithmetic or entering data into a computer.
  • Response error: when a subject gives an incorrect response. Caused when subjects lie, misremember, misunderstand the question, or make a guess so they don't look ignorant.
  • Nonresponse: the most serious nonsampling error. It is the failure to obtain data from an individual selected for a sample. Mostly happens when subjects can't be contacted or refuse to cooperate. Different groups have different rates of nonresponse (refusals are higher in large cities and among the elderly) → bias.
    Tricks for reducing nonresponse:
    • Carefully trained interviewers to keep callers on the line.
    • Calling back over longer time periods.
    • Letters sent in advance help.
    • Note: these methods slow down the survey; polls that want fast answers (often for the media) do not use them.
    • Substitute other households for nonresponses if it comes to it. Replacing them with households in the same neighborhoods or small cities may reduce bias.
  • Question wording: question wording may influence answers if slanted to favor one response. Wording can also cause confusion from misinterpretation.
• When nonsampling errors cause bias, weight the responses to correct the bias. Ex: if many urban households did not respond, more weight is given to the urban households that did respond. If too many women are in the sample, more weight is given to men.
  Note: weighting reduces bias but increases variability. This must be taken into account when computing the margin of error.

CHAPTER 5 & 6

TERMS

• Response variable (DV): a variable that measures the results of a study.
• Explanatory variable (IV): a variable that explains/causes the change in the response variable.
• Treatment: any experimental condition applied to the subjects of the study.
• Lurking variable: a variable that is not an explanatory variable but has an effect on the experiment.
• Confounding variables: variables whose
  individual effects on the response variable can't be distinguished from each other. They may be explanatory or lurking variables.
• Placebo: dummy treatment with no active ingredients → has no actual effect.
• Placebo effect: response to the placebo.
• Control groups: allow us to control lurking variables. Can be a placebo group or other.
• Nonadherers: subjects who participate but don't follow the experimental treatment. Can cause bias.
• Dropouts: subjects who begin a treatment but do not complete it.

Principles of experimental design

• Control the effects of lurking variables on response variables by making sure that all individuals are affected by the lurking variables in the same way, and then comparing treatments.
• Randomize: assign subjects to treatments via impersonal chance so that treatment groups are similar and can be compared.
• Use enough subjects in each treatment group → reduces chance variation in results.
• Statistical significance: when there is an effect on the response variable of a size that would rarely happen by chance alone.

DESIGNS/EXPERIMENTS

• Double-blind experiment: when neither the subjects nor those conducting the study know which treatment (actual or placebo) was received by the subject.
• Randomized comparative experiment: an experiment comparing 2 treatments to examine a cause/effect relationship.
  1. Randomly assign individuals to two groups; the subjects in each group should be similar.
  2. Each group receives a different treatment.
  3. Compare the results of the two treatment groups.
• Completely randomized design: all experimental subjects are assigned at random among all the treatments.
• Block design: a block is a group of subjects known to be similar in some respect that might affect the response to the treatments. In this design, the random assignment of subjects to treatments happens within each block.
  • Matched pairs design: compares just 2 treatments. Choose pairs of subjects that are as closely matched as possible in terms of certain criteria. Randomly assign a different treatment to each of the two subjects. Sometimes one "pair" is one
    subject who takes one treatment after the other, in randomized order. Each subject is their own control.
• Observational studies are impressive if they compare matched groups and measure as many lurking variables as possible → allow statistical adjustment → answer cause/effect questions.

CHAPTER 8

• Validity: whether a measurement is relevant/appropriate as a representation of a property.
  • Rate: the fraction, proportion, or percentage at which something occurs. Often a more valid measurement than a count of occurrences.
  • Predictive validity: a measurement has predictive validity when it can predict success on tasks that are related to the measured property. Ex: IQ test score for intelligence; SAT score for college grades.
• Errors in measurement: measured value = true value + bias + random error.
  • A measurement is biased when it systematically overstates or understates the true value of the property.
  • A measurement has random error if repeating the measurement on the same individual gives different results. If the random error is small, the measurement is reliable.
• How do we know if the random error is small? Variance:
  1. Find the average (mean) of the n measurements.
  2. Find the difference between each observation and the mean. Square each of these differences.
  3. Average the squared differences by dividing their sum by n − 1. This number is the variance.
  Reliable measurement → small variance.
• Bias vs. Reliability vs. Validity
  • Bias: tendency to overstate/understate the true value. To lessen it, get a better measuring instrument.
  • Reliability: says that the result is repeatable. Concerns random error and variance. To improve it, take the average of several measurements for the same individual rather than a single measurement.
  • Validity: appropriateness of the measurement in terms of the property being measured.

CHAPTER 10, 11, 12

• Distribution of a variable: tells us what values the variable takes and how often.
• Categorical variables: sort individuals into groups/categories. To display, use:
  • Pie chart
  • Bar graph. All bars should have the same width.
• Quantitative variables: have numerical values, so we can perform arithmetic operations on them. To display, use:
  • Line graph: plots observations against the time at which they were measured. Time goes on the horizontal axis. Data points are connected by lines.
    • Trend: long-term upward or downward movement over time.
    • Deviations: sharp increases/decreases from the overall pattern.
    • Seasonal variation: regular pattern of change that repeats each year.
    • Seasonal adjustment: expected seasonal variation is removed from the data before publishing.
  • Histogram: most common graph. Classes should have equal widths (like a bar graph). Difference from a bar graph: the x-axis has a continuous numerical scale.
• To describe a distribution:
  • Center: the midpoint of the distribution. Measured by the median or mean. Can also be the mode, which is the highest peak/most common value.
  • Spread (aka variability): interquartile range or standard deviation.
  • Shape:
    • Symmetrical: both sides are mirror images.
    • Skewed right: the tail extends to the right. The bulk of the values is on the left. Skewed left: the opposite.
    • Unimodal, bimodal, multimodal.
• Outliers: an individual observation that is far outside the overall pattern of the graph. To find outliers:
  • Create the range [Q1 − 1.5 × IQR, Q3 + 1.5 × IQR].
    Note: IQR = interquartile range; it measures variability. Large IQR → large variability. On a boxplot, it is the width of the box. It's calculated by Q3 − Q1.
  • Sort the data numerically. Find the median, Q1, Q3, and IQR.
  • Plug the values into the range. All numbers within the range are not outliers.
• Median: the midpoint of the values of the observations. Note: when n is even, the median lies between two values; take the average of the two values to find the median.
• Q1 and Q3: the midpoints between the min and the median (Q1) and between the median and the max (Q3). Note: when calculating quartile values, do not include the overall median.
• Five-number summary consists of:
  • Smallest observation (min)
  • Q1
  • Median
  • Q3
  • Largest observation (max)
• Boxplot: graphs the five-number summary. The central box spans the quartiles. The line in the center of the box marks the median. Lines extending from the box show the min and
  max. If the median is closer to Q1 → skewed right. Closer to Q3 → skewed left.
• Boxplots vs. histograms: boxplots show symmetry vs. skewness, center (median), and spread (IQR), and are useful for making comparisons among groups. They do not show whether the distribution is unimodal, bimodal, etc., nor the size of the data set or how often values fall in each range.
• Mean (x̄): average of the observations = sum/n.
• Standard deviation (s): roughly the average distance of the observations from the mean. As the distances increase → s becomes larger.
  • To find: find the variance, then take the square root of the variance.
  • s = 0 only when there is no spread, AKA all observations have the exact same value.
• Median/quartiles vs. mean/SD
  • Mean/SD → strongly affected by outliers or long-tailed skewed distributions. Better for reasonably symmetric distributions without outliers.
  • Median/quartiles →
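
The calculations from Chapters 8 and 10-12 (variance with n − 1, standard deviation, five-number summary, IQR, and the 1.5 × IQR outlier range) can be sketched in Python as follows. This is only an illustration: all function names and the example data are made up, and the quartiles follow the rule in these notes (leave the overall median out when n is odd), which can differ slightly from what statistical software reports.

```python
import math


def median(sorted_values):
    """Median of an already-sorted list; averages the two middle values if n is even."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), excluding the overall median from the halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # When n is odd, skip the middle value (the overall median).
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def find_outliers(values):
    """Values outside the range [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical example data with one unusually large value.
scores = [2, 4, 4, 5, 7, 8, 9, 11, 30]
print("five-number summary:", five_number_summary(scores))
print("outliers:", find_outliers(scores))
print("mean:", sum(scores) / len(scores))
print("standard deviation:", sample_sd(scores))
```

With this made-up data, Q1 = 4, the median is 7, and Q3 = 10, so the IQR is 6 and the outlier range runs from −5 to 19; only the value 30 falls outside it. Comparing the mean with the median here shows how a single outlier pulls the mean upward.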