bio exam 1 study plan
This 24 page Study Guide was uploaded by Katie Catipon on Tuesday February 10, 2015. The Study Guide belongs to a course at Ohio State University taught by a professor in Fall. Since its upload, it has received 16 views.
Date Created: 02/10/15
CHAPTER 1, 2, 3

o Individuals: people, animals, or things described by the set of data in a statistical study.
o Variables: any characteristic of an individual. A variable can take on different values for different individuals and can be numerical or not.
  Response: a variable that measures an outcome/result of a study.
o Observational study: observes individuals and measures variables of interest but does not intervene in order to influence the responses. The purpose is to describe some group or situation.
o Sample survey: an important kind of observational study. You study a group of individuals by studying only some of its members, chosen to represent the larger group.
o Population: the entire group of individuals about which we want information in a statistical study.
o Sample: the part of the population from which we actually collect information; it is used to draw conclusions about the whole.
  Parameter: a fixed number that describes the population.
o Statistic: a number that describes a sample. The value of a statistic changes from sample to sample. We use a statistic to estimate an unknown parameter.
o Census: a sample survey that attempts to include the entire population in the sample. Expensive, takes a long time, difficult.
o Experiment: deliberately imposes some treatment on individuals in order to observe their responses. The purpose is to study whether the treatment causes a change in the response.
  Bias: the design is biased if it systematically favors certain outcomes.
  Convenience sampling: selection of whichever individuals are easiest to reach. Often biased.
o Voluntary response sample: the sample chooses itself by way of voluntary participation. Write-in or call-in opinion polls are examples. Usually people with strong opinions volunteer. Often biased; however, it is often ethical.
o Random sampling: eliminates bias in choosing the sample; all individuals have an equal chance to be chosen.
o Simple random sampling (SRS): starts with a list of all individuals in the population. Use a random method to select n individuals to be in the sample.
  Choosing an SRS:
  1. Assign a numerical label to each individual in the population. All labels must have the same number of digits if using a table of random digits (a string of digits 0-9 used to randomly select sample participants).
  2. Use random digits to select labels at random.
o Stratified random sampling: divide the sampling frame into groups (strata). Take a separate SRS in each stratum and combine the results to make one sample.
  Advantages: allows us to draw separate conclusions about each stratum; usually has a smaller margin of error, because the individuals in each stratum are more alike than the population as a whole → eliminates some variability in the sample.
  Disadvantages: not all individuals in a population may be given the same chance to be chosen. Some strata may be deliberately overrepresented in the sample.

Bias & Variability

o Bias: systematic deviation of the sample statistic from the population parameter. To reduce bias, use random sampling. An SRS produces unbiased estimates: the values of a statistic computed from an SRS neither consistently overestimate nor consistently underestimate the value of the population parameter.
o Variability: describes how spread out the values of the sample statistic are when we take many samples. Large variability → the result of sampling is not repeatable. To reduce variability, use a larger sample. You can make the variability as small as you want by taking a large enough sample. Variability is unaffected by the size of the population as long as the population is at least 100 times larger than the sample. Measured using the IQR (interquartile range); see Chapter 12.
o A good sampling method has both small bias and small variability. Large random samples almost always give an estimate that is close to the truth.
o Margin of error: says how close the sample statistic is to the population parameter.
  o "Margin of error plus or minus two percentage points" means: "If we took many samples using the same method we used to get this one sample, 95% of the samples would give a result within plus or minus 2 percentage points of the truth about the population."
  o The margin of error for 95% confidence is roughly equal to $1/\sqrt{n}$, where n = sample size. Ex: for sample size 1013, $1/\sqrt{1013} \approx 0.031 = 3.1\%$.
  o Because $\sqrt{n}$ is in the denominator, larger samples have smaller margins of error. To cut the margin of error in half, we must use a sample four times as large.
o Confidence statements: say what percentage of all possible samples satisfy the margin of error.
  o Ex: 95% confident, ±2 percentage points of the sample statistic.
  o The conclusion of a confidence statement applies to the population, not the sample. It uses the sample result to say something about the population.
  o A sample survey can choose to use a confidence level other than 95%. Higher confidence → larger margin of error. Lower confidence → smaller margin of error.
  o To have a smaller margin of error with the same confidence level, use a larger sample (it has less variability).
  o If the confidence level is not specified, assume it's 95%.

CHAPTER 4: TYPES OF ERRORS

Sampling errors: errors caused by the act of taking a sample. They cause sample results to be different from the results of a census.
o Random sampling error: the deviation between the sample statistic and the population parameter caused by chance in selecting a random sample. The margin of error in a confidence statement includes only random sampling error. It can be controlled by changing the size of the random sample.
o Use of bad sampling methods (ex: voluntary response).
o Undercoverage: occurs when some groups in the population are left out of the process of choosing a sample. The sampling frame (the list of individuals from which we will draw our sample at the start of a study) tries to include every individual in a population; however, such a list is rarely available. If the sampling frame leaves out certain classes of people, even random sampling from that frame will be biased. Ex: using telephone directories as the frame for a telephone survey. More than half the households in many large cities have unlisted numbers.
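Because the margin of error covers only random sampling error, the quick rule from Chapters 1-3 depends on nothing but the sample size. The sketch below is a minimal illustration in Python, assuming the rough 95% rule of one over the square root of n; the function name `quick_margin_of_error` is made up for this example and is not from the course.

```python
import math


def quick_margin_of_error(n: int) -> float:
    """Rough 95% margin of error for a sample of size n: 1 / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)


# The example from the notes (n = 1013), then a sample four times as large.
small = quick_margin_of_error(1013)
large = quick_margin_of_error(4 * 1013)

print(f"n = 1013: about {small:.3f}")
print(f"n = 4052: about {large:.3f}")
```

Quadrupling the sample size halves the result, which matches the "four times as large" rule. The rule says nothing about undercoverage, nonresponse, or other sources of bias.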
Nonsampling errors: errors that aren't related to how the sample is chosen. They can be present even in a census.
o Processing errors: mechanical mistakes when doing arithmetic or entering data into a computer.
o Response error: when a subject gives an incorrect response. Caused when subjects lie, misremember, misunderstand the question, or make a guess so they don't look ignorant.
o Nonresponse: the most serious nonsampling error. It is the failure to obtain data from an individual selected for a sample. It mostly happens when subjects can't be contacted or refuse to cooperate. Different groups have different rates of nonresponse (refusals are higher in large cities and among the elderly) → bias.
  Tricks for reducing nonresponse:
  o Carefully trained interviewers to keep callers on the line.
  o Calling back over longer time periods.
  o Letters sent in advance help.
  o Note: these methods slow down the survey; polls that want fast answers (often for the media) do not use them.
  o Substitute other households for nonresponses if it comes to it. Replacing them with households in the same neighborhoods or small cities may reduce bias.
o Question wording: question wording may influence answers if slanted to favor one response. Wording can also cause confusion through misinterpretation.
o When nonsampling errors cause bias, weight the responses to correct the bias. Ex: if many urban households did not respond, more weight is given to the urban households that did respond. If too many women are in the study, more weight is given to men. Note: weighting reduces bias but increases variability. This must be taken into account when computing the margin of error.

CHAPTER 5 & 6

TERMS

Response variable (DV): a variable that measures the results of a study.
Explanatory variable (IV): a variable that explains/causes the change in the response variable.
Treatment: any experimental condition applied to the subjects of the study.
Lurking variables: variables that are not explanatory variables in the study but still have an effect on the response.
Confounding variables: variables whose individual effects on the response variable can't be distinguished from each other. They may be explanatory or lurking variables.
Placebo: dummy treatment with no active ingredients → has no actual effect.
Placebo effect: response to the placebo.
Control groups: allow us to control lurking variables. Can be a placebo group or other.
Nonadherers: subjects who participate but don't follow the experimental treatment. Can cause bias.
Dropouts: subjects who begin a treatment but do not complete it.

Principles of experimental design:
o Control the effects of lurking variables on response variables by making sure that all individuals are affected by the lurking variables in the same way, and then comparing treatments.
o Randomize: assign subjects to treatment groups by impersonal chance so that the groups are similar and can be compared.
o Use enough subjects in each treatment group → reduces chance variation in results.

Statistical significance: when there is an effect on the response variable of a size that would rarely happen by chance.

DESIGNS/EXPERIMENTS

Double-blind experiment: when neither the subjects nor those conducting the study know which treatment (actual or placebo) was received by the subject.
Randomized comparative experiment: an experiment comparing 2 treatments to examine a cause/effect relationship. Randomly assign individuals to two groups; the subjects in each group should be similar. Each group receives a different treatment. Compare the results of the two treatment groups.
Completely randomized design: all experimental subjects are allocated at random among all the treatments.
Block design: a block is a group of subjects known to be similar in some respect that might affect the response to the treatments. In this design, the random assignment of subjects to treatments happens within each block.
o Matched pairs design: compares just 2 treatments. Choose pairs of subjects that are as closely matched as possible in terms of certain criteria. Randomly assign a different treatment to each of the two subjects. Sometimes one "pair" is one subject who takes one treatment after the other in randomized order; each subject is then their own control.

Observational studies are more convincing if they compare matched groups and measure as many lurking variables as possible → allows statistical adjustment → helps answer cause/effect questions.

CHAPTER 8

o Validity: whether a measurement is relevant/appropriate as a representation of a property.
o Rate: fraction/proportion/percentage at which something occurs. Often a more valid measurement than a count of occurrences.
o Predictive validity: a measurement has predictive validity when it can predict success on tasks that are related to the measured property. Ex: IQ test score for intelligence, SAT score for college grades.
o Errors in measurement: measured value = true value + bias + random error.
o A measurement is biased when it systematically overstates or understates the true value of the property being measured.
o A measurement has random error if repeating the measurement on the same individual gives different results. If the random error is small, the measurement is reliable.
  How do we know if the random error is small? Variance:
  1. Find the average of the n measurements.
  2. Find the difference between each observation and the mean.
  3. Square each of these differences.
  4. Average the squared differences by dividing their sum by n − 1. This number is the variance.
  Reliable measurement → small variance.
o Bias vs. Reliability vs. Validity
  o Bias: tendency to overstate/understate the true value. To lessen it, get a better measuring instrument.
  o Reliability: says that the result is repeatable. Concerns random error and variance. To improve it, take the average of several measurements for the same individual rather than a single measurement.
  o Validity: appropriateness of the measurement in terms of the property being measured.

CHAPTER 10, 11, 12

o Distribution of a variable: tells us what values the variable takes and how often.
o Categorical variables: sort individuals into groups/categories. To display, use:
  o Pie chart
  o Bar graph. All bars should have the same width.
o Quantitative variables: have numerical values, so we can perform arithmetic operations on them. To display, use:
  o Line graph: plots observations against the time when they were measured. Time goes on the horizontal axis. Data points are connected by lines.
    Trend: long-term upward or downward movement over time.
    Deviations: sharp increases/decreases from the overall pattern.
    Seasonal variation: regular pattern of change that repeats each year.
    Seasonal adjustment: expected seasonal variation is removed from the data before publishing.
  o Histogram: most common graph. Classes should have equal widths (like a bar graph). Difference from a bar graph: the x-axis has a continuous numerical scale.

To describe a distribution:
o Center: midpoint of the distribution. Median or mean. Can also be the mode, which is the highest peak/most common value.
o Spread (aka variability): interquartile range or standard deviation.
o Shape:
  o Symmetrical: both sides are mirror images.
  o Skewed right: the tail extends to the right. The bulk of the values is on the left. Skewed left: the opposite.
  o Unimodal, bimodal, multimodal.
o Outliers: an individual observation that is far outside the overall pattern of the graph.

To find outliers:
o Create the range from $Q_1 - 1.5 \times IQR$ to $Q_3 + 1.5 \times IQR$.
  Note: the IQR (interquartile range) measures variability. Large IQR = large variability. On a boxplot it is the width of the box. It's calculated by $IQR = Q_3 - Q_1$.
o Sort the data numerically.
o Find the median, $Q_1$, $Q_3$, and IQR.
o Plug the values into the range. All numbers within the range are not outliers.

Median: midpoint in the values of the observations. Note: when n is even, the median lies between two values; take the average of the two values to find the median.
$Q_1$ and $Q_3$: midpoints between the min and the median, and between the max and the median. Note: when calculating quartile values, do not include the overall median.

Five-number summary consists of:
o Smallest observation (min)
o $Q_1$
o Median
o $Q_3$
o Largest observation (max)

Boxplot: graphs the five-number summary. The central box spans the quartiles. The line in the center of the box marks the median. Lines extending from the box show the min and max. If the median is closer to $Q_1$ → skewed right. Closer to $Q_3$ → skewed left.
o Boxplots vs. histograms: boxplots show symmetry vs. skewness, center (median), and spread (IQR), and are useful for making comparisons among groups. They do not show whether the distribution is unimodal, bimodal, etc., nor the size of the data set or the frequency with which values fall in each region.

Mean ($\bar{x}$): average of the observations = sum / n.
Standard deviation (s): roughly the average distance of the observations from the mean. As the distances increase → s becomes larger.
o To find it: find the variance, then take the square root of the variance.
o s = 0 only when there is no spread, AKA all observations have the exact same value.

Median/quartiles vs. mean/SD:
o Mean/SD → strongly affected by outliers or long-tailed skewed distributions. Better for reasonably symmetric distributions without outliers.
o Median/quartiles → little affected by outliers. Better for skewed distributions or distributions with outliers.

CHAPTER 13

Density curves: formed by smoothing out the edges of a histogram. They show the proportion of observations in any region by areas under the curve (vs. the counts of observations shown by the heights and areas of bars in a histogram). Most useful for describing large numbers of observations.
o Choose a scale so that the total area under the curve is exactly 1.
o Median of a density curve: divides the area under the curve into halves. Quartiles divide the area under the curve into quarters.
o Mean of a density curve: the point at which the curve would balance if made of solid material. For a symmetric curve, this point is the center. The mean of a skewed distribution is pulled toward the long tail, away from the median.
  Note: in a symmetric curve, the median and mean are the same value (the center).

Normal curves: symmetric (mean = median = peak), single-peaked, bell-shaped. The tails fall off quickly, so do not expect outliers. Described using the mean $\mu$ and standard deviation $\sigma$.
o Finding the standard deviation by eye on the curve: the points at which the curvature changes are located one standard deviation on either side of the mean/median.
o The mean fixes the center; the standard deviation fixes the shape → changing the mean does not change the shape, only the location on the axis. However, changing the standard deviation does change the shape.
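The two claims above (total area of 1 under a density curve, and "the mean fixes the center, the standard deviation fixes the shape") can be checked numerically. This is a rough sketch using the standard Normal density formula, which the notes do not write out; the function names are illustrative, and the area is only approximated with a midpoint sum over the mean plus or minus 8 standard deviations.

```python
import math


def normal_density(x: float, mean: float, sd: float) -> float:
    """Height at x of the Normal curve with the given mean and standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def area_under_curve(mean: float, sd: float, steps: int = 20000) -> float:
    """Approximate the total area with a midpoint sum over mean +/- 8 sd."""
    lo = mean - 8 * sd
    width = 16 * sd / steps
    heights = (normal_density(lo + (i + 0.5) * width, mean, sd) for i in range(steps))
    return sum(heights) * width


# Same SD, different means: the peak height (shape) is unchanged, only the location moves.
peak_at_0 = normal_density(0, mean=0, sd=1)
peak_at_10 = normal_density(10, mean=10, sd=1)

# Same mean, doubled SD: the curve is flatter and wider (the peak is half as tall).
peak_wide = normal_density(0, mean=0, sd=2)

print(peak_at_0 == peak_at_10)
print(peak_wide / peak_at_0)
print(round(area_under_curve(0, 1), 6), round(area_under_curve(10, 2), 6))
```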
68-95-99.7 Rule (the "Empirical Rule"): in any normal distribution, approximately
o 68% of observations fall within 1 standard deviation of the mean
o 95% of observations fall within 2 standard deviations of the mean
o 99.7% of observations fall within 3 standard deviations of the mean
o 100% of the total area is under the curve

Standard scores (z): observations expressed in standard deviations above or below the mean of a distribution.
  $z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$
A z-score translates directly into a percentile. If you're looking for the percent of observations below the z-score, the answer = the percentile. If you're looking for the percent above the z-score, the answer = 100 − the percentile.
o Negative z-score → percentile below 50
o z-score of zero → percentile = 50
o Positive z-score → percentile above 50

Solving problems:
o Looking for a percent → forward problem. To solve: know the observed value. Calculate the z-score. Use the table to convert it into a percent.
o Looking for an observed value → backward problem. To solve: know the percent. Use the table to convert it to a z-score. Plug into the equation to solve for the observed value.

CHAPTER 14 & 15

Scatterplots: the most common way to display the relationship between two quantitative variables measured on the same individuals. Each individual in the study appears as a point in the plot. The point is determined by both variables for that one individual.

How to examine a scatterplot: look for the overall pattern (described by form, direction, and strength) and striking deviations from the pattern (outliers).
o Form: linear, nonlinear, or no obvious pattern.
o Direction: positive, negative, or no association.
o Strength: strong/moderate/weak pattern.

Linear (straight-line) relations: simple and common. The relation is strong if the points all lie close to the line and weak if they are widely scattered about the line.

Association (direction): a relationship between x and y. Association ≠ causation.
o Positive association: as x (explanatory/independent variable) increases, so does y (response/dependent variable). The plot slopes upward from left to right.
o Negative association: as x increases, y decreases. The plot slopes downward from left to right.

Correlation (r): numerical description of the direction and strength of a linear relationship between x and y.
  $r = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$
The symbol $\sum$ (called sigma) means "add them all up." x and y are both quantitative.
o r positive → positive association. r negative → negative association.
o r always falls between −1 and 1. Values near 0 indicate a weak straight-line relationship. r = 1 or r = −1 occurs only when the points lie exactly on a straight line.
o Correlation is not affected by a change in units of measurement.
o Correlation ignores the distinction between variables: changing which is labeled x and which y does not affect it.
o Correlation is strongly affected by outliers.
o It is only for straight-line relationships.
o It has no units of its own; it is just a number.

Category of strength, based on the absolute value of the correlation:
  0.0-0.2   Very weak to negligible correlation
  0.2-0.4   Weak, low correlation
  0.4-0.7   Moderate correlation
  0.7-0.9   Strong, high correlation
  0.9-1.0   Very strong correlation

Regression line: a straight line that describes how a response variable y changes with the explanatory variable x. Often used to predict the value of y for a given value of x.
Least-squares regression line: the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
Regression equation: $y = a + bx$ = y-intercept + slope × x.
To find it:
1. Create a scatterplot and describe the form, direction, and strength of the relationship. Look for outliers. Remember, you must have a fairly linear form to use correlation and regression.
2. Compute the correlation coefficient r.
3. Obtain the equation of the regression line: $y = a + bx$ = y-intercept + slope × x.
Note about slope: the bigger its absolute value, the steeper the line. Slope and correlation always have the same sign.

Prediction: plug new values of x into the regression line to predict new y values. Prediction outside the range of the data is extrapolation and often leads to false predictions; it's bad.

Regression in terms of correlation: correlation measures the direction and strength of a straight-line relationship. Regression draws a line to describe this relationship.
o Both are strongly affected by outliers.
o The usefulness of the regression line for prediction depends on the strength of the association (aka correlation) between the variables.
o The square of the correlation ($r^2$): the percent of variation in the y variable that is explained by the regression line. Find the correlation, square it, then multiply it by 100. It is always between 0% and 100%; the closer to 100%, the stronger the linear relationship between x and y.

Causation: a strong relationship between variables does not necessarily mean a cause/effect relationship.
o Relationships between variables are often influenced by lurking variables.
o The best evidence of causation comes from randomized comparative experiments.
o Even when direct causation is present, it is rarely the complete explanation for the variables' relationship.

Three types of causation (note: two or more of these may happen simultaneously):
o Direct causation
o Common response
o Confounding

Observed relationships can be used to make predictions without worrying about causation, as long as the patterns continue to hold true in the data.

Establishing causation without an experiment. Criteria:
o The association between the variables is strong.
o The association is consistent throughout many studies (reduces the effects of lurking variables).
o Higher doses → stronger responses.
o The alleged cause comes before the effect chronologically.
o The alleged cause is plausible.

CHAPTER 17 & 18

Chance: random behavior is unpredictable in the short run. In the long run it shows a regular and predictable pattern.
o Random: individual outcomes are uncertain, but there is nonetheless a regular distribution of outcomes in a large number of repetitions.

Probability: the probability of any outcome of a random phenomenon is a number between 0 and 1 that describes the proportion of times the outcome would occur in a very long series of repetitions.
o Probability 0.5 means the outcome occurs half the time in a very large number of trials.
o An outcome with probability 0 never occurs.
o An outcome with probability 1 happens on every repetition.

"Law of averages" (aka law of large numbers): averages or proportions are likely to be more stable when there are more trials, while sums or counts are likely to be more variable. This does not happen by compensation for a bad run of luck, since independent trials have no memory.
Note: stable = the average or proportion is likely to be closer and closer to the true population value. Unstable = the sum/count tends to vary farther and farther from the true value.

Personal probability: a number between 0 and 1 that expresses an individual's judgment of how likely the outcome is. Advantage: personal probabilities aren't limited to repeatable settings. They are useful because we base decisions on them. They are opinions and thus cannot be said to be right or wrong.
Note: probability in terms of "personal judgment of how likely" and probability in terms of "what happens in many repetitions" are two completely different ideas. They are not two explanations of the same thing.

Probability model: for a random phenomenon, describes all the possible outcomes and says how to assign probabilities to any collection of outcomes. We sometimes call a collection of outcomes an event.

Rules that all probability models must obey:
1. Any probability is a number between 0 and 1, because any proportion is a number between 0 and 1.
2. All possible outcomes together must have probability 1.
3. The probability that an event does not occur = 1 − the probability that the event does occur.
4. If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities.
A set of probabilities is incoherent when it does not follow rules 1 & 2: the probabilities don't go together in a way that makes sense.

Sampling distribution: shows us how sample statistics can vary. It is a probability model in that it assigns probabilities to the values the sample statistic can take. In other words, it is a collection of the values the sample statistic can take and how often it takes those values if we repeatedly sample from the same population. Because there are usually many possible values, sampling distributions are often described by a density curve such as a Normal curve. In other words, we can apply the rules of Normal distributions to answer questions about sample statistics.

Describing sampling distributions by shape, center, and spread (variability):
o In certain conditions the sampling distribution will have a normal shape → can apply the rules of normal distributions (see above).
o The sampling distribution is centered where the population is centered.
o The sampling distribution is less spread out (less variable) than the population.

CHAPTER 21

Sample proportion ($\hat{p}$): the statistic that estimates the parameter p. n = sample size.
Distribution of a statistic: tells what values the statistic takes on and how often it would take each value if we took a lot of samples.
Sampling distribution: the distribution of a statistic. Sampling distributions of proportions and means follow the normal curve pattern.

Sampling distribution of the sample proportion. When the sample size is large enough:
o Shape: the sampling distribution is approximately normal.
o Center: the mean of the sampling distribution is equal to the population parameter p.
o Spread: the standard deviation of the sampling distribution is $\sqrt{\frac{p(1-p)}{n}}$ (use $\hat{p}$ in place of p if p is not known).
$\hat{p}$ = sample proportion, p = population proportion, n = sample size.

Sampling distribution of the sample mean. When the sample size is large enough:
o Shape: the sampling distribution is approximately normal.
o Center: the mean of the sampling distribution is equal to the population parameter $\mu$.
o Spread: the standard deviation of the sampling distribution is $\frac{\sigma}{\sqrt{n}}$ or $\frac{s}{\sqrt{n}}$.
$\bar{x}$ = sample mean, $\mu$ = population mean, s = sample standard deviation, $\sigma$ = population standard deviation, n = sample size.

What makes a sample size "large enough"? (Central limit theorem)

  Statistic     "Large enough" sample                                  Mean                            SD
  Mean          n ≥ 25 if the population is unknown;                   $\mu$ (population mean)         $\frac{\sigma}{\sqrt{n}}$
                n ≥ 15 if the population is normal/sample is normal
  Proportion    np ≥ 5 and n(1 − p) ≥ 5                                p (population proportion)       $\sqrt{\frac{p(1-p)}{n}}$
                (or np ≥ 15 and n(1 − p) ≥ 15)
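The spread formulas for the two sampling distributions can be checked against a small simulation. The following is a sketch under stated assumptions: the population proportion 0.5, the sample size 100, the 2000 repetitions, and the function names are all chosen for illustration and do not come from the notes.

```python
import math
import random
import statistics


def sd_of_sample_proportion(p: float, n: int) -> float:
    """Standard deviation of the sampling distribution of p-hat: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


def sd_of_sample_mean(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of x-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulate_sample_proportions(p: float, n: int, repetitions: int, seed: int = 0) -> list:
    """Draw many samples of size n from a yes/no population and return each p-hat."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(repetitions)]


p, n = 0.5, 100
p_hats = simulate_sample_proportions(p, n, repetitions=2000)

print("formula SD:   ", sd_of_sample_proportion(p, n))
print("simulated SD: ", statistics.pstdev(p_hats))
print("simulated mean:", statistics.mean(p_hats))
print("SD of x-bar for sigma = 15, n = 25:", sd_of_sample_mean(15, 25))
```

The simulated mean should land near p and the simulated SD near the formula value, which is the "center" and "spread" behavior listed above. The agreement is approximate and improves with more repetitions.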
Statistical inference: draws conclusions about a population parameter based on a sample statistic.

Confidence interval: takes the sample statistic and accounts for random variability via the margin of error.
  Confidence interval = sample statistic ± MoE
Thus the sample statistic determines the center of the confidence interval, and the MoE (identified by the ± sign) determines its width.
  MoE = $z^* \times$ (standard deviation of the sampling distribution); $z^*$ is determined by the confidence level C you wish to use:

  C       90%     95%     99%
  $z^*$   1.645   1.960   2.576

Thus the equation for a confidence interval for a proportion is
  $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
or, for means,
  $\bar{x} \pm z^* \frac{s}{\sqrt{n}}$

95% confidence interval: an interval calculated from sample statistics by a process that is guaranteed to capture the true population parameter in 95% (NOT 100%) of samples. It is an estimate and takes into account random sampling variability.
Note: we cannot say that we know the true population parameter unless we take a census.

Confidence statements:
o Say how accurate our conclusions about the population are.
o Apply to the population, not the sample.
o Larger confidence level → wider confidence interval. Smaller confidence level → narrower confidence interval.
o Larger sample size n → narrower confidence interval (assuming the same confidence level).

Proportion vs. Mean

Proportions: categorical data from each individual in a sample (only 2 answer choices). Determine the percentage/proportion of individuals who choose each choice/belong to each category.
  $\hat{p}$ = sample proportion, p = population proportion, n = sample size (total sample)
  $\hat{p} = \frac{\text{count of successes in the sample}}{n}$

Means: quantitative data from each individual is obtained. Find the average of all the data.
  $\bar{x}$ = sample mean, $\mu$ = population mean, s = sample standard deviation, $\sigma$ = population standard deviation, n = sample size
  Remember the central limit theorem: if you know that the population is normal, then the sample mean $\bar{x}$ is normal. If you do not know whether the population is normal, an SRS with n ≥ 25 means the sample mean $\bar{x}$ is approximately normally distributed. SEE ABOVE CHART.

Cautions about confidence intervals:
o Need an SRS.
o Data must be collected correctly (no bias).
o Outliers are BAD.

CHAPTER 22

Confidence intervals allow us to estimate the population parameter. Hypothesis tests help us answer specific questions about the population parameter or test a claim about a population parameter.

STEPS in hypothesis testing:
1. Determine the null hypothesis ($H_0$) and alternative hypothesis ($H_a$).
2. Collect data, summarize the data with a statistic, and consider what the sampling distribution looks like (find the mean and standard deviation, map a normal curve → look at where $\hat{p}$ falls on the curve).
3. Determine how unlikely your result would be if the null hypothesis were true. This involves calculating a probability we call the p-value.
4. Reach a conclusion about the null hypothesis based on the p-value.

Null hypothesis ($H_0$): the claim being tested about the population parameter. Usually a statement of "no effect," "no difference," "no relationship," or "the status quo." Written with an equals sign in terms of the population parameter (p or $\mu$).
Alternative hypothesis ($H_a$): the statement we hope or suspect is true instead of $H_0$. Written with an inequality sign in terms of the population parameter (p or $\mu$).

How to write hypotheses:

               $H_0$ (proportion)   $H_0$ (mean)          $H_a$ (proportion)                       $H_a$ (mean)
  One-sided    $H_0: p = p_0$       $H_0: \mu = \mu_0$    $H_a: p > p_0$ or $H_a: p < p_0$         $H_a: \mu > \mu_0$ or $H_a: \mu < \mu_0$
  Two-sided    (same)               (same)                $H_a: p \neq p_0$                        $H_a: \mu \neq \mu_0$

How to judge if you can reject a null hypothesis: convert the sample proportion $\hat{p}$ or sample mean $\bar{x}$ to a z-score (then called the "test statistic"). Look at Table B to determine the probability of getting this result or a more extreme one (the p-value).
  Standard score (z-score) = $\frac{\text{observation} - \text{mean}}{\text{standard deviation}}$

  Proportions:
  o Observation: $\hat{p}$ (sample proportion)
  o Mean: $p_0$ (from $H_0$)
  o Standard deviation: $\sqrt{\frac{p_0(1-p_0)}{n}}$
  o z-score: $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$

  Means:
  o Observation: $\bar{x}$ (sample mean)
  o Mean: $\mu_0$ (from $H_0$)
  o Standard deviation: $\frac{\sigma}{\sqrt{n}}$
  o z-score: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

P-value: the probability, assuming $H_0$ is true, that the sample outcome would be as extreme or more extreme than the actually observed outcome. Smaller p-value → stronger evidence against $H_0$.

To calculate a two-sided p-value:
o Compute the test statistic (z-score).
o If the z-score is positive, find the percent above the z-score. If it is negative, find the percent below it.
o Double that percent.

Significance level ($\alpha$): usually use 0.05 unless the problem says otherwise.
o P-value > $\alpha$ → do not reject $H_0$. "We do not have enough evidence to say that [$H_a$]."
o P-value ≤ $\alpha$ → reject $H_0$. "We have evidence that [$H_a$]."
o DO NOT SAY "prove" or "accept"; say "fail to reject."

CHAPTER 23

Hypothesis testing caveats:
o Large sample caution: significant results based on large samples may not be of practical significance. A large sample size makes the significance test more sensitive, so it's easier to obtain a small p-value and to reject the null hypothesis.
o Small sample caution: results that are not significant in small samples may still be of practical significance. A small sample size can make it harder to obtain a small p-value even when an important difference exists in the population.

Confidence interval caveats: high confidence is not free. A larger level of confidence means a wider interval (larger MoE), which is less precise than a narrow interval.
Note: larger sample sizes give narrower intervals, so to raise the level of confidence and keep a narrow interval, we should increase the sample size.
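The large-sample caution and the "high confidence is not free" caveat can both be seen with a few lines of Python. This is a hedged sketch rather than the course's procedure: it swaps Table B for the Normal CDF computed with `math.erf`, the observed proportion 0.52 and the sample sizes are invented for illustration, and the function names are hypothetical.

```python
import math

# z* values from the confidence-level table in the notes
Z_STAR = {90: 1.645, 95: 1.960, 99: 2.576}


def normal_cdf(z: float) -> float:
    """Proportion of a standard Normal distribution below z (stands in for Table B)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def two_sided_test(p_hat: float, p0: float, n: int):
    """Return (z, p_value) for H0: p = p0 against Ha: p != p0."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    tail = 1 - normal_cdf(abs(z))  # area beyond |z| in one tail
    return z, 2 * tail             # double it for a two-sided p-value


def proportion_interval(p_hat: float, n: int, level: int = 95):
    """Return (low, high) for p_hat +/- z* sqrt(p_hat(1 - p_hat)/n)."""
    moe = Z_STAR[level] * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe


# Same observed proportion, very different sample sizes.
for n in (100, 10_000):
    z, p_value = two_sided_test(p_hat=0.52, p0=0.50, n=n)
    low, high = proportion_interval(0.52, n, level=95)
    print(f"n = {n}: z = {z:.2f}, p-value = {p_value:.5f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With the small sample, the 2-point difference is not significant at the 0.05 level; with the large sample it is, even though the difference itself has not changed. Whether a 2-point difference matters in practice is a separate question that the p-value does not answer.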