Midterm Study Guide AMS 5
This 6 page Study Guide was uploaded by Sandy Nguyen on Tuesday October 27, 2015. The Study Guide belongs to AMS 5 at University of California - Santa Cruz taught by Prof. Bruno Mendes in Fall 2015. Since its upload, it has received 186 views. For similar materials see Statistics in Applied Mathematics at University of California - Santa Cruz.
Date Created: 10/27/15
Monday, October 19, 2015, 4:48 PM
Contents: Chapters 1 through 12 & Chapter 19

CHAPTER ONE: Controlled Experiments

Randomized Controlled Experiments
- Put subjects into the treatment or control group at random to ensure that the two groups are similar
  1. Treatment group: treated test subjects
  2. Control group: untreated test subjects
     > When possible, the control group should receive a placebo (neutral, but resembles the treatment)
- Experiments should be run double-blind: neither the subjects nor the evaluators know who is in the treatment group and who is in the control group

CHAPTER TWO: Observational Studies

Observational Studies
- Different from controlled experiments
- Subjects assign themselves to the different groups & investigators just observe what happens
- Treatment group: the subjects having the condition whose effects are being studied
  > The other subjects are the controls

Confounding Factors
- Hidden confounding factors are a major problem in observational studies
- Differences between the treatment group & the control group aside from the treatment
- Affect the responses
- Can sometimes be controlled for by comparing smaller groups which are relatively homogeneous with respect to the factor
- Example: Controls = nonsmokers, Treatment = smokers

Association
- One thing linked to another; established by observational studies
- Association DOES NOT prove causation
- Cross-sectional study: different subjects are compared to each other at one point in time
- Longitudinal study: subjects are observed over time and compared with themselves at different times
  > Better than cross-sectional

CHAPTER NINETEEN: Sample Surveys

Parameters
- Numerical fact about a population which can only be estimated
- Cannot be determined exactly
- What an investigator wants to know
- Can be estimated from a sample; the major issue is accuracy

Sample
- Part of a population used to make inferences about the population
- Can be used to compute a statistic

Types of Sampling
- Simple random sample (SRS): a sample of size n from a population of size N is obtained through SRS if every possible sample of size n has an equally likely chance of occurring
- Stratified sampling: obtained by separating the population into nonoverlapping groups called strata (layers) and then obtaining an SRS from each stratum
  > Individuals in each stratum should be homogeneous or similar in some way
- Systematic sampling: obtained by selecting every kth individual from a population or list
  > The 1st is chosen randomly
- Cluster sampling: obtained by selecting all individuals within a randomly selected collection or group of individuals
- Quota sampling: not a probability method
  > Interviewers have a lot of discretion in choosing subjects

Typical Errors
- Bias: systematic tendency to over- or underestimate the true value of a variable
- Nonresponse bias: systematically not hearing from subgroups that are initially selected for the sample
- Selection bias: systematically leaving out subgroups of a population
- Response bias: questionnaire design; order & wording of questions

Probability Methods
- Interviewers have no discretion whatsoever as to whom they interview
- There's a definite procedure for selecting the sample
  > Involves the planned use of chance
- To minimize bias, an impartial & objective probability method should be used

CHAPTER THREE: Histograms

Histograms
- Represent percentages by area
- No vertical scale is required
- Horizontal axis consists of class intervals
- Consist of a set of blocks
  > Area of each block represents the percentage of cases in the corresponding class interval (total area of the histogram is 100%)
  > Height of each block represents the percentage of cases per horizontal unit (crowding)
- Density scale: height of each block equals the percentage of cases in the corresponding class interval divided by the length of that interval

Drawing a Histogram
  1. Begin with a table giving the percentages of cases in each class interval
  2. Draw the horizontal axis
  3. Determine the height of the blocks
  4. Draw the vertical axis
  5. Draw the blocks

Variables
- Variable: a characteristic which changes from subject to subject in a study
- Qualitative variable: word or phrase
  > Gender, marital status, employment, etc.
- Quantitative variable: a number; can be discrete or continuous
  > Discrete: values can differ by fixed amounts only (family size, # of rooms in a house, etc.)
  > Continuous: values can differ by any amount (age, time, etc.)

CHAPTER FOUR: The Average and Standard Deviation

Standard Deviation
- Measures the spread around the average in a list of entries
- To calculate, take entries from the list one at a time; each deviates from the average by some amount
  > Deviation from average = entry - average
- Root Mean Square (RMS): used to calculate the standard deviation
  1. Square all entries, getting rid of the signs
  2. Take the mean (average) of the squares
  3. Take the square root of the mean
- SD = rms size of the deviations from the average
  > SD = square root of (sum of the squared deviations from the average / nr. of entries)
- Roughly 68% of a list are within 1 SD of the average
- Roughly 95% of a list are within 2 SDs of the average

The Average, Median & Histogram
- Average (mean): for a list of numbers, the sum of entries divided by how many entries there are
  > Average = sum of entries / number of entries
- Median: a number such that 50% of the data is smaller and 50% of the data is larger than this number
- A histogram balances at the average
- Median of the histogram: value such that half of the area under the blocks is to the left and the other half is to the right
- The average is pulled towards the tail:
  > Symmetric: average = median
  > Right skewed: average > median
  > Left skewed: average < median

CHAPTER FIVE: Normal Approximation for Data

Normal curve
- Can be used as an "ideal" histogram to which histograms for data can be compared

Normal Approximation for Data
- Replacing the original histogram by the normal curve before finding the area
- Using the normal curve to estimate the percentage of entries in an interval:
  1. Convert the interval to standard units
  2. Find the corresponding area under the normal curve
- Total area under the graph is equal to 100%
  > Area under the normal curve between -1 & 1 is about 68%
  > Area under the normal curve between -2 & 2 is about 95%
- A value is converted to standard units by finding how many standard deviations (SDs) it is above or below the average
  > Values above average are positive
  > Values below average are negative

Percentiles
- The kth percentile is a number such that k% of the entries in a list are smaller than the number
  > (100 - k)% are larger
  > A number
- Percentile rank: the percentage of entries smaller than that value
  > A percentage
- Estimating percentiles: if a histogram follows the normal curve, the normal curve can be used to estimate percentiles
  1. Sketch a normal curve and find the correct "z value" by using the normal table at the back of the textbook
  2. "z" is given in standard units; convert it back to the units in the problem

Quartiles
- 1st quartile: number such that 1/4 of the data are smaller and 3/4 are larger (25th percentile)
- 2nd quartile: number such that 2/4 of the data are smaller and 2/4 are larger (50th percentile, median)
- 3rd quartile: number such that 3/4 of the data are smaller and 1/4 are larger (75th percentile)
- Interquartile Range (IQR): another measure of the spread of the data
  > IQR = 3rd quartile - 1st quartile

Changes of Scale
- Adding a constant to all entries of a list
  > Average increases by this constant
  > Standard deviation and standard units don't change
- Multiplying all entries in a list by a positive number
  > Average is also multiplied by this number
  > Standard deviation is multiplied by this number
  > Standard units don't change
- Multiplying all entries in a list by a negative constant
  > In standard units, the signs are reversed
- Adding a constant or multiplying by a positive number doesn't change standard units; only the original units change

CHAPTER SIX: Measurement Error

Chance Errors
- In reality, results are thrown off by chance error
  > The error changes from measurement to measurement
- No matter how careful, a measurement could have come out slightly differently
- If the measurement is repeated, it will come out slightly differently
  > Replicating the measurement shows the difference
- The standard deviation of a series of repeated measurements estimates the likely size of the chance error in a single measurement
  > Individual measurement = exact value + chance error

Outliers
- In careful measurement work, a small percentage of outliers is expected
- Faced with outliers, investigators either ignore them or concede that the measurements don't follow the normal curve
  > The 1st choice is more usual (a triumph of theory over experience)
- The average and standard deviation can be heavily influenced by outliers

Bias (systematic error)
- Affects all measurements in the same way
  > Pushes measurements in the same direction
- The basic equation must be modified when each measurement is altered by both bias and chance error
  > Individual measurement = exact value + bias + chance error
- If there is no bias in a measurement procedure, the long-run average of repeated measurements should give the exact value of the thing being measured (chance errors should cancel out)
- If bias is present, the long-run average will itself be either too high or too low
- Usually, bias can't be detected by just looking at the measurements themselves

CHAPTER EIGHT: Correlation

The Scatter Diagram
- Illustrates the relationship between two variables

Association
- If there's strong association between two variables, knowing one helps in predicting the other
- If there's weak association between two variables, knowing one variable doesn't help much in guessing the other
- Association is NOT causation

Dependent and Independent Variables
- The independent variable influences the dependent variable
- The independent variable is put on the horizontal x-axis
- The dependent variable is put on the vertical y-axis

Summarizing the Data
- Point of averages: shows the average of the x-values and the average of the y-values
  > It locates the center of the point cloud
- Measuring the spread of the cloud from side to side: use the SD of the x-values (the horizontal SD) and the SD of the y-values (the vertical SD)
- Tightly clustered scatter points: strong linear association
- Loosely clustered scatter points: weak linear association
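
The summary quantities above (average, SD as the rms size of the deviations, standard units, point of averages, horizontal and vertical SD) can be sketched in a few lines of Python. This is only an illustration: the function names and the five-point data set are made up for this example and are not part of the original notes.

```python
import math

def average(values):
    """Sum of entries divided by the number of entries."""
    return sum(values) / len(values)

def sd(values):
    """SD = rms size of the deviations from the average (divides by n)."""
    avg = average(values)
    squared_deviations = [(v - avg) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / len(values))

def standard_units(values):
    """How many SDs each entry is above (+) or below (-) the average."""
    avg, spread = average(values), sd(values)
    return [(v - avg) / spread for v in values]

# Illustrative data only (an assumption, not from the notes).
x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]

point_of_averages = (average(x), average(y))  # center of the point cloud
horizontal_sd = sd(x)                         # side-to-side spread
vertical_sd = sd(y)                           # top-to-bottom spread

print(point_of_averages, horizontal_sd, vertical_sd)
print(standard_units(x))
```

Note that this SD divides by the number of entries, matching the rms definition in Chapter 4; many calculators and libraries divide by n - 1 instead, so their results can differ slightly.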

Correlation Coefficient
- Quantity measuring the strength of the linear association between the two variables
- Correlation coefficient (r): measure of linear association, or clustering around a line
- To calculate:
  1. Convert each variable into standard units
  2. Take the average of the products
  > r = average of [(x in standard units) × (y in standard units)]

Interpreting the Correlation Coefficient
- Correlations are always between -1 & 1 and can take any value within that range
- Positive correlation: cloud slopes up; as one variable increases, so does the other (direct relationship)
- Negative correlation: cloud slopes down; as one variable increases, the other decreases (inverse relationship)
- If "r" is closer to 1 or -1, the linear association between the variables is stronger and the points are more tightly clustered on the scatter diagram
- If "r" is closer to 0, the linear association between the variables is weaker and the points are more loosely clustered on the scatter diagram

CHAPTER NINE: More Correlation

Correlation Coefficient
- r is a pure number without units & is not affected by:
  1. Interchanging the 2 variables
  2. Adding or subtracting a constant to or from all the values of one variable
  3. Multiplying or dividing all the values of one variable by a positive constant
- r is the average of the products of x & y after being converted to standard units
- By converting x & y to standard units before calculating r, the value of r is independent of the units used
- The correlation coefficient measures linear association only
- The correlation coefficient measures clustering relative to the SDs

Ecological Correlation
- Based on rates & averages; often used in political science and sociology
- Tends to overstate the strength of an association
- There is a considerable amount of variation between individuals
- Taking the rates or averages of groups removes some of the variation and makes it seem like there is more clustering

CHAPTER TEN: Regression

Regression
- Two correlated variables: with knowledge of the value of one variable, we can make better predictions about the value of the other variable

The Regression Line
- For y on x: estimates the average value of y corresponding to each value of x
  > An increase of 1 SD in x results, on average, in an increase of only r SDs in y
- If the graph of averages follows a straight line, that line is the regression line
  > The regression line is a smoothed version of the graph of averages
- The regression line goes through the point of averages
- [Figure: scatter diagram with two regression lines]
  > Red line: x on y; begins on the horizontal x-axis
  > Black line: y on x; begins on the vertical y-axis
- Regression method: way of using the correlation coefficient r to estimate the average value of y for each value of x
- Regression estimate: the resulting value of y

Regression for Percentiles & Percentile Ranks
- Change the regression method to work without x: use its percentile rank instead
- Not interested in finding y: look for its percentile rank instead
  1. Find z_x by using the normal table at the back of the textbook
  2. Calculate r × z_x = z_y
  3. Convert z_y to a percentile rank by using the normal table
- Do not use the average or SD. Only use the normal table and the correlation coefficient, since everything is calculated in standard units
- Use the normal table only if the scatter diagram is "football shaped"

Regression Fallacy
- Regression effect: in virtually all test-retest situations, the bottom group on the first test will, on average, show some improvement on the second test
  > The top group will, on average, fall back
- The regression fallacy is thinking that the regression effect must be a result of something important, instead of just the spread around the line

CHAPTER ELEVEN: RMS Error for Regression

RMS Error for Regression Line
- Used to determine how precise estimates are when using the regression method to predict the value of y from a given value of x
- Actual values are different from predictions due to errors
  > Error = actual value - predicted value
- The error is simply the distance of a point above or below the regression line
- The rms error is smaller than the SD of y (unless r = 0)
  > The regression line gets closer to the points than the horizontal line through the average of y
- square root of (1 - r^2) × SD of y = rms error for the regression line of y on x
  > Units for the rms error are the same as the units for the variable being predicted
- The rms error measures the distance of the points from the regression line
  > If the rms error is small, the points are closer to the line
  > Measures spread in the original units of y

CHAPTER TWELVE: The Regression Line
- Slope of Regression Line
- Intercept of Regression Line
- Equation of Regression Line
- Calculating Regression Estimate
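
The correlation and regression recipes from Chapters 8 through 12 can be put together in one short Python sketch. The formulas for the Chapter 12 topics did not survive in these notes, so the slope (r × SD of y / SD of x) and intercept (the line passes through the point of averages) used below are the standard textbook forms, stated here as an assumption. The function names and the data are hypothetical, chosen only for illustration.

```python
import math

def average(values):
    return sum(values) / len(values)

def sd(values):
    """rms size of the deviations from the average (divides by n)."""
    avg = average(values)
    return math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))

def correlation(x, y):
    """r = average of (x in standard units) * (y in standard units)."""
    ax, ay, sx, sy = average(x), average(y), sd(x), sd(y)
    products = [((a - ax) / sx) * ((b - ay) / sy) for a, b in zip(x, y)]
    return sum(products) / len(products)

def regression_line(x, y):
    """Slope and intercept of the regression line of y on x.

    Assumed standard form: slope = r * SD_y / SD_x, and the line
    passes through the point of averages.
    """
    slope = correlation(x, y) * sd(y) / sd(x)
    intercept = average(y) - slope * average(x)
    return slope, intercept

def rms_error(x, y):
    """sqrt(1 - r^2) * SD of y, in the original units of y."""
    return math.sqrt(1 - correlation(x, y) ** 2) * sd(y)

# Illustrative data only (an assumption, not from the notes).
x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]

r = correlation(x, y)
slope, intercept = regression_line(x, y)
estimate_at_6 = slope * 6 + intercept  # regression estimate of y when x = 6

# Cross-check: rms of the actual errors (actual - predicted) should match.
errors = [b - (slope * a + intercept) for a, b in zip(x, y)]
rms_of_errors = math.sqrt(sum(e ** 2 for e in errors) / len(errors))

print(r, slope, intercept, estimate_at_6, rms_error(x, y), rms_of_errors)
```

For this data, r works out to 0.4, so one SD of x (2 units) goes with only 0.4 SDs of y (1.6 units), giving a slope of 0.8. The shortcut formula for the rms error and the direct calculation from the individual errors agree, which is a useful check when working problems by hand.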