# STAT 217 FINAL EXAM STUDY GUIDE STAT 217-05

Cal Poly

This 158 page Study Guide was uploaded by Sierra Taylor on Wednesday June 1, 2016. The Study Guide belongs to STAT 217-05 at California Polytechnic State University San Luis Obispo taught by Dr. Karen McGaughey in Spring 2016. Since its upload, it has received 60 views. For similar materials see Statistical Concepts and Reasoning in Statistics at California Polytechnic State University San Luis Obispo.

Date Created: 06/01/16

## STAT217: STATISTICAL METHODS: Information and Topics for the Final Exam

### TOPICS

#### I. Collecting Data

- Populations/Parameters
- Samples/Statistics
- Categorical vs. Numerical Data
- Data Collection Methods: Observational study vs. Designed Experiment
- Observational units
- Explanatory variables
- Response variables
- Confounding variables
- Why do we need random assignment? What is the purpose?
- Why do we want a random sample? What kind of conclusion does random sampling allow and why?
- Sampling bias
- Blinding in statistical studies: why should subjects be blinded to the treatment they are receiving?
- Purpose of Control groups
- Cause-and-effect: when can we draw cause-and-effect conclusions? What types of studies? Association vs. Causation
- Generalization: what population can we generalize study conclusions to and why

#### II. Graphical Methods

- Categorical Data: bar charts
- Numerical Data: Histograms, dot plots
- Common distributional shapes
- Scatterplots

#### III. Numerical Methods for Describing Data

- Center: Mean, median
- Variability: IQR, Std. Dev
- Proportions vs. percentages vs. counts
- Two-way tables & conditional proportions
- Empirical Rule & the normal distribution
- Z-scores & t-scores: what do these quantities measure?
- $\chi^2$-statistic: what does this quantity measure?
- F-statistic: what does this quantity measure? Be able to decide which of two sets of data will have a larger/smaller F-statistic.
- Correlation
- Slope and intercept of the regression line
- Using the regression line to predict the value of a new observation
- Residuals and the method of least squares

#### IV. Null Distributions

- Describe in general what a null distribution is: how do we simulate them (cards/coins) for different types of tests?
- What is the purpose of the simulations we did this quarter? How did the simulations work? What process was simulated?
- Could you explain how to simulate a null distribution for a new statistic? (Recall the last question on Exam 2)
- Be able to describe/compute (shape, mean, std dev) the null distribution of $\hat{p}$

#### V. Inferential Methods

- What is a CI? What is its purpose?
- Effects of sample size, confidence level, standard deviation on the MoE and width of interval
- Difference between CI and hypothesis test (when to use each)
- Interpretation of p-value in context
- Test and CI for the proportion, $\pi$
- Test and CI for the difference in 2 means, $\mu_1 - \mu_2$ (independent t-test)
- Test and CI for the mean difference, $\mu_{\text{diff}}$ (paired t-test)
- Test and CI for the difference in 2 proportions, $\pi_1 - \pi_2$
- $\chi^2$-test for 3 or more proportions, $\pi_1, \pi_2, \pi_3, \ldots$
- F-test for 3 or more means, $\mu_1, \mu_2, \mu_3, \ldots$
- All steps and technical conditions of the above hypothesis tests and CI's
- Know when to carry out each of the above tests and CI's
- Impacts of sample size and standard deviation on the t, z, $\chi^2$, F, p-value
- What do we mean by "statistically significant"?
- What is the purpose of doing a hypothesis test?
- What does a p-value measure?
- Type I and Type II errors

### COVERAGE

The final exam will cover the entire course. Approximately 1/3 of the exam will cover the Exam #1 material, 1/3 the Exam #2 material, and 1/3 the material since Exam #2.

Pay particular attention to the following topics:

- Confidence intervals for 2 means and 2 proportions
- Simple Linear Regression and correlation
- Matched Pairs vs. Independent t-tests for 2 groups (JMP output)
- Analysis of Variance
- Type I vs. Type II errors
- Interpreting the p-value and the relationship to sample size and variability of samples
- Study design and confounding variables

### FORMAT OF THE EXAM

The Final Exam will consist of 8-12 problems, some with multiple parts, and will count for 25% of your final course grade. The exam is written to take about 1.5-2 hours. Partial credit will be available. Do not expect "plug-and-chug" questions. All questions will have a context and it is expected that your answers will be in context.

There will be multiple choice and possibly T/F questions. You will be graded both on the work you show and on your final answers. A correct final answer with little or no work to support that answer will receive little credit.

This exam will test your conceptual understanding of statistics. Definitions, terms, steps of hypothesis tests, etc. will help, but these will not be the sole focus. Instead, you need to understand WHY a hypothesis test works; what the premise and purpose of hypothesis testing are (similarly for confidence intervals); how variability and sample size influence the results we get and conclusions we draw in statistical studies; why we use simulation to determine the null distribution; what the null distribution is and what information it provides to us; why we have a null and alternative hypothesis, etc. Also, you should be prepared to apply what you have learned to a novel problem. Keep in mind, we have done the same thing all quarter long.

You will be asked to decide what method (type of test or CI) should be used for a given research question. This will be set up as a multiple choice question and it will be worth a significant number of points.

### HOW TO STUDY

Your best study guides are (1) the course notes, (2) labs and exercises, (3) quizzes and exams, and (4) the practice problems. Re-work examples from the notes, re-work practice and quiz problems; work ALL of the review problems. Focus on conceptual understanding and the big picture. Be able to answer the question "Why do we do this???" "Why do we analyze sample data using the techniques we have discussed this quarter?"

Write out your note sheet early rather than late. Practice old problems with your note sheet next to you. You should get accustomed to looking up formulas from the note sheet quickly.

You will be tested on your understanding of null distributions, confidence intervals and hypothesis testing. Computation is part of that understanding, but certainly not all. I may provide computer output for you, and you may be asked to interpret that output.

### WHAT TO BRING

- a calculator
- a pencil or pen
- 8.5x11 note sheet, front and back

### WHAT NOT TO BRING

- other electronics (see ELECTRONICS below)

### BRINGING NOTES TO THE EXAM

The exam is closed-book, but you may bring one 8.5x11 page of notes (front and back). These notes may include formulas, definitions, and any miscellaneous notes you think are important. NO WORKED OUT EXAMPLES. NOTHING IN CONTEXT.

Your notes must be hand-written. You may not photocopy notes from a classmate, the textbook, or your own notes. Nothing may be typed.

Any violation of these guidelines will result in an automatic deduction of at least 10 points from your exam grade. You must turn in your note sheet with the final exam.

### ELECTRONICS

Cell phones, iPods, etc. are not allowed at any exam. If you have any such electronics out at any time during the exam, your exam will be confiscated, and you will automatically receive a score of 0 points.

If you ordinarily use your cell phone as a clock, you should wear a watch to the exam.

## STAT 217: STATISTICAL METHODS FINAL EXAM Review Questions

Note: These are just a few review questions covering selected material only. Not all of the material that will be on the final exam is covered here, and these questions are mostly mechanical and not conceptual. You should go back and work all of the practice problems previously provided.

1. A study of potential age discrimination considers promotions among middle managers in a large company. Researchers would like to determine if age discrimination is happening in the company. The JMP analysis is shown to the right. Carry out the appropriate test to answer this question. Show all steps.

2. A random sample of 66 Cal Poly STAT 217 students was taken in Spring 2011.
Students were asked two questions: What is your gender?, and How many units are you currently enrolled in? Two analyses are shown below from JMP. One is the correct analysis and one is incorrect. The goal is to determine if there is a difference, on average, in the number of units enrolled between males and females at Cal Poly.

a. Which is the appropriate analysis of this sampled data? Why?

b. Carry out a hypothesis test to determine if there is a significant difference in the number of units enrolled, on average, between males and females at Cal Poly in Spring 2011. (Show all steps: $H_0$ and $H_a$, define the parameter, check technical conditions, find p-value, make a decision and draw a conclusion in context.)

c. Interpret the 95% confidence interval in the appropriate output above.

3. The journal Academy of Management (1989) reported on a study which investigated the relationship between smoking status and short-term absenteeism rates (measured in hours per month absent) in office workers. The results of the JMP analysis are shown below.

a. Identify the observational unit, explanatory variable and response variable.

b. What are the null and alternative hypotheses for this study?

c. Do the data support the alternative hypothesis? Explain why or why not, supporting your answer.

d. Are the technical conditions met for the use of the F-test?

e. Using the p-value, provide a conclusion for this study, in context. Be sure to discuss whether a cause-and-effect conclusion is valid and be clear about the population to which you are generalizing.

f. Looking just at the continuous smoker group, write an interpretation for the 95% confidence interval that is provided in the Means and Std Deviations output.

g. Suppose the means had been 1.6, 1.4, 1.3 and 1.7 for the continuous, long-term ex-smoker, never smoked and recent ex-smoker groups. Suppose further that the standard deviations stayed the same (0.64, 0.71, 0.77, and 0.70) and the sample sizes stayed the same (96, 86, 206, 34). What effect would this change in means have on the F-statistic and p-value? Explain.

4. For each of the following cases, decide what type of analysis method should be used. Possible methods are listed below. Not every method will be used.

Inference Methods:

- Matched Pairs t confidence interval ($\mu_{\text{Difference}}$)
- Matched Pairs t hypothesis test ($\mu_{\text{Difference}}$)
- 1-sample z confidence interval ($\pi$)
- 1-sample z hypothesis test ($\pi$)
- 2-sample t confidence interval ($\mu_1 - \mu_2$)
- 2-sample t hypothesis test ($\mu_1 - \mu_2$)
- 2-sample z confidence interval ($\pi_1 - \pi_2$)
- 2-sample z hypothesis test ($\pi_1 - \pi_2$)
- Chi-square hypothesis test ($\pi_1, \pi_2, \ldots, \pi_k$)
- F-test hypothesis test ($\mu_1, \mu_2, \mu_3, \ldots$)

a. A Kinesiology student at CP has carried out a study on two different bike chain rings to see if their time in a 1-km time trial would be faster on one of the rings than the other. The study was carried out in such a way that 12 cyclists did the 1-km time trial twice (once on the cylindrical chain ring and again on an oval chain ring).

b. A student at Cal Poly carried out a study to determine if there were any differences in the true mean strength gains for two different exercise plans. He took a group of 50 volunteers, and randomly assigned them to one of the two types of exercise. After 12 weeks, he measured the improvement in strength.

c. A random sample of 1000 U.S. adults taken in February by Gallup found that 45% of those surveyed approve of the job Obama is doing as president. What is Obama's approval proportion among all U.S. adults?

d. A Biology student at CP has determined the carapace (hard shell of a tortoise) length (cm) for 20 land-dwelling tortoises of 3 different species. The student wishes to determine if the average carapace length differs among the 3 species.

e. A survey of 100 randomly selected full-time CP students, and 100 randomly selected part-time CP students was carried out. Each student was asked their level of agreement with a fee increase in Fall 2016. Possible responses were agree, no opinion, disagree. Researchers wished to determine if there were any differences in the fee increase opinions between the two student groups (full-time and part-time).

f. A senior project student at CP takes a random sample of 100 students in each college at CP in Winter 2016. Each student is asked whether or not s/he spent more than $500 on books and supplies for the quarter. The senior project student would like to know if the cost of textbook/supplies is associated with college.

5. In a random sample of 1147 U.S. adults ages 18 and older living in all 50 states and the District of Columbia, a USA Today/Gallup poll taken Jan 28-30, 2011 found that 60% of the sampled individuals use Google (at least once) in a typical week.

a. Describe the population and parameter of interest.

b. Estimate with 95% confidence the true proportion of Google users in the U.S. Show that the technical conditions are met for this study. Write a sentence interpreting the interval.

c. Are the technical conditions for this interval met? Explain, fully supporting your answer.

6. The article "The Association Between Television Viewing and Irregular Sleep Schedules Among Children Less than 3 Years of Age" (Pediatrics, 2005), reported the following 95% confidence intervals for the difference in the mean TV viewing times (in hours per day) between males and females for three different age groups:

| Age Group | 95% CI for $\mu_{\text{male}} - \mu_{\text{female}}$ |
|---|---|
| 18 - 25 | (-0.2, 0.8) |
| 26 - 35 | (0.5, 1.2) |
| 36 - 50 | (-1.25, -0.65) |

a. Which of the three groups has the largest margin of error?

b. Suppose the sample sizes for each of the three groups were equal. Which age group would have had the greatest variability (i.e., the largest standard deviation)?

c. Suppose the standard deviations were all the same. Which group would have had the largest sample size?

d. For which group is the average viewing time higher for females than for males? Why?

7. The following data (Consumer Reports 1999 New Car Buying Guide) reports the EPA's city miles per gallon rating and the weight (in pounds) of a random sample of 15 sports cars.

| Model | City MPG | Weight (lb) |
|---|---|---|
| BMW 318Ti | 23 | 2790 |
| BMW Z3 | 19 | 2960 |
| Chev Camaro | 17 | 3545 |
| Chev Corvette | 17 | 3295 |
| Ford Mustang | 17 | 3270 |
| Honda Prelude | 22 | 3040 |
| Hyundai Tiburon | 22 | 2705 |
| Mazda MX-5 Miata | 25 | 2365 |
| Mercedes Benz SLK | 22 | 3020 |
| Mercury Cougar | 20 | 3140 |
| Mitsubishi Eclipse | 23 | 3235 |
| Pontiac Firebird | 18 | 3545 |
| Porsche Boxster | 19 | 2905 |
| Saturn SC | 27 | 2420 |
| Toyota Celica | 22 | 2720 |

a. What is the explanatory variable? What is the response variable? Observational unit?

b. Using the scatterplot above, does there appear to be a correlation between weight and city MPG? If so, is the correlation negative or positive? Explain.

c. Interpret the sample slope and intercept of the least squares line.

d. Predict the EPA's city gas mileage for a sports car, the BMW Z3. Find the residual for this car. Interpret the residual.

## STAT 217: STATISTICAL METHODS FINAL EXAM Review Questions

Note: These are just a few review questions covering selected material only. Not all of the material that will be on the final exam is covered here. You should go back and work all of the practice problems previously provided.

1. A study of potential age discrimination considers promotions among middle managers in a large company. Researchers would like to determine if age discrimination is happening in the company. The JMP analysis is shown to the right. Carry out the appropriate test to answer this question. Show all steps.
$H_0$: $\pi_{\text{under 30}} = \pi_{30\text{-}39} = \pi_{40\text{-}49} = \pi_{\text{over 50}}$

$H_a$: At least one $\pi$ is different

$\pi$ = the true proportion of promotions (or no promotions) for all middle managers in a large company within each age group

Tech conditions are met since we have at least 10 promotions and 10 not promoted within each age group.

Our small p-value of 0.0072 means we have evidence to believe the true promotion proportions are not all the same among the 4 age groups. Therefore, yes, we do have evidence to believe there could be some age discrimination in the company. (It appears people under 30 and over 50 have less chance of being promoted.) However, since there was no random assignment in this study (it's observational), we cannot conclude cause-and-effect. Thus, there could be some other factor responsible for the differences we see in the promotion proportions among age groups.

2. A random sample of 66 Cal Poly STAT 217 students was taken in Spring 2011. Students were asked two questions: What is your gender?, and How many units are you currently enrolled in? Two analyses are shown below from JMP. One is the correct analysis and one is incorrect. The goal is to determine if there is a difference, on average, in the number of units enrolled between males and females at Cal Poly.

a. Which is the appropriate analysis of this sampled data? Why?

The OneWay analysis is the correct analysis because the 2 samples (male, female) are independent and not paired.

b. Carry out a hypothesis test to determine if there is a significant difference in the number of units enrolled, on average, between males and females at Cal Poly in Spring 2011. (Show all steps: $H_0$ and $H_a$, define the parameter, check technical conditions, find p-value, make a decision and draw a conclusion in context.)

$H_0$: $\mu_{\text{Male}} = \mu_{\text{Female}}$

$H_a$: $\mu_{\text{Male}} \ne \mu_{\text{Female}}$

$\mu$ = true average number of units all males (or females) are enrolled in, in Spring 2011.

Tech conditions:

1. The 2 samples are independent.
2. The sample sizes appear to be quite small (see boxplots) and the boxplots are somewhat right skewed. This condition is most likely not met and the analysis via the t-distribution is suspect. A more appropriate analysis would simulate the null distribution. The JMP t-test analysis will be used for example purposes only.

p-value from JMP = 0.8565

Fail to reject (FTR) $H_0$. With such a large p-value (0.8565), there is not sufficient evidence to conclude there is a difference, on average, in the number of units enrolled, for all males and females at CP in Spring 2011, similar to those in this study. Cause and effect is not valid here as this study was observational and the p-value was large. We do not have evidence to suggest an association between gender and the number of units enrolled.

c. Interpret the 95% confidence interval in the appropriate output above.

We are 95% confident the true average difference in units enrolled between all males and all females (similar to those studied) at CP in Spring 2011 is between -1.1 and 1.3 units.

3. The journal Academy of Management (1989) reported on a study which investigated the relationship between smoking status and short-term absenteeism rates (measured in hours per month absent) in office workers. The results of the JMP analysis are shown below.

a. Identify the observational unit, explanatory variable and response variable.

b. What are the null and alternative hypotheses for this study?

c. Do the data support the alternative hypothesis? Explain why or why not, supporting your answer.

d. Are the technical conditions met for the use of the F-test?

e. Using the p-value, provide a conclusion for this study, in context. Be sure to discuss whether a cause-and-effect conclusion is valid and be clear about the population to which you are generalizing.

f. Looking just at the continuous smoker group, write an interpretation for the 95% confidence interval that is provided in the Means and Std Deviations output.

g. Suppose the means had been 1.6, 1.4, 1.3 and 1.7 for the continuous, long-term ex-smoker, never smoked and recent ex-smoker groups. Suppose further that the standard deviations stayed the same (0.64, 0.71, 0.77, and 0.70) and the sample sizes stayed the same (96, 86, 206, 34). What effect would this change in means have on the F-statistic and p-value? Explain.

4. For each of the following cases, decide what type of analysis method should be used. Possible methods are listed below. Not every method will be used.

Inference Methods:

- Matched Pairs t confidence interval ($\mu_{\text{Difference}}$)
- Matched Pairs t hypothesis test ($\mu_{\text{Difference}}$)
- 1-sample z confidence interval ($\pi$)
- 1-sample z hypothesis test ($\pi$)
- 2-sample t confidence interval ($\mu_1 - \mu_2$)
- 2-sample t hypothesis test ($\mu_1 - \mu_2$)
- 2-sample z confidence interval ($\pi_1 - \pi_2$)
- 2-sample z hypothesis test ($\pi_1 - \pi_2$)
- Chi-square hypothesis test ($\pi_1, \pi_2, \ldots, \pi_k$)
- F-test hypothesis test ($\mu_1, \mu_2, \mu_3, \ldots$)

a. A Kinesiology student at CP has carried out a study on two different bike chain rings to see if their time in a 1-km time trial would be faster on one of the rings than the other. The study was carried out in such a way that 12 cyclists did the 1-km time trial twice (once on the cylindrical chain ring and again on an oval chain ring).

Matched pairs hypothesis test

b. A student at Cal Poly carried out a study to determine if there were any differences in the true mean strength gains for two different exercise plans. He took a group of 50 volunteers, and randomly assigned them to one of the two types of exercise. After 12 weeks, he measured the improvement in strength.

2-sample t hypothesis test for $\mu_1 - \mu_2$

c. A random sample of 1000 U.S. adults taken in February by Gallup found that 45% of those surveyed approve of the job Obama is doing as president. What is Obama's approval proportion among all U.S. adults?

1-sample z confidence interval for $\pi$

d. A Biology student at CP has determined the carapace (hard shell of a tortoise) length (cm) for 20 land-dwelling tortoises of 3 different species. The student wishes to determine if the average carapace length differs among the 3 species.

F-test (ANOVA)

e. A survey of 100 randomly selected full-time CP students, and 100 randomly selected part-time CP students was carried out. Each student was asked their level of agreement with a fee increase in Fall 2012. Possible responses were agree, no opinion, disagree. Researchers wished to determine if there were any differences in the fee increase opinions between the two student groups (full-time and part-time).

2-sample z-test for $\pi_1 - \pi_2$

f. A senior project student at CP takes a random sample of 100 students in each college at CP in Winter 2016. Each student is asked whether or not s/he spent more than $500 on books and supplies for the quarter. The senior project student would like to know if the cost of textbook/supplies is associated with college.

$\chi^2$-test

Note: Some good practice would be to take each of these scenarios, identify the response variable (RV) and explanatory variable (EV, if there is one), and, if it is a hypothesis test, provide $H_0$ and $H_a$.

5. In a random sample of 1147 U.S. adults ages 18 and older living in all 50 states and the District of Columbia, a USA Today/Gallup poll taken Jan 28-30, 2011 found that 60% of the sampled individuals use Google (at least once) in a typical week.

a. Describe the population and parameter of interest.

Population: All U.S. adults 18 and over.

Parameter: $\pi$ = proportion of all U.S. adults ages 18 and older who use Google in a typical week.

b. Estimate with 95% confidence the true proportion of Google users in the U.S. Show that the technical conditions are met for this study. Write a sentence interpreting the interval.

$$0.60 \pm 2\sqrt{\frac{0.60(1-0.60)}{1147}} = 0.60 \pm 0.029 = (0.571,\ 0.629)$$

Technical Conditions: $1147(0.60) \approx 688$ and $1147(0.40) \approx 459$ are both greater than 10.
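As a quick check of the arithmetic above, here is a minimal Python sketch that recomputes the interval and the success/failure counts. The function name `two_sd_interval` is hypothetical, and the multiplier of 2 is the approximation used in the calculation above rather than the exact Normal value of 1.96.

```python
import math


def two_sd_interval(p_hat, n, multiplier=2.0):
    """Return (low, high, margin_of_error) for p_hat +/- multiplier * SE.

    SE is the estimated standard deviation of the sample proportion.
    The default multiplier of 2 is an assumption matching the worked answer.
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    moe = multiplier * se
    return p_hat - moe, p_hat + moe, moe


n = 1147
p_hat = 0.60

low, high, moe = two_sd_interval(p_hat, n)
print(round(low, 3), round(high, 3), round(moe, 3))  # 0.571 0.629 0.029

# Technical conditions: at least 10 "successes" and 10 "failures" in the sample
successes = n * p_hat
failures = n * (1 - p_hat)
print(successes >= 10 and failures >= 10)  # True
```

Raising `n` while holding `p_hat` fixed shrinks `moe`, which is the sample-size effect on interval width listed in the topics.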
We are 95% confident that between 57.1% and 62.9% of all U.S. adults ages 18 and older use Google in a typical week.

c. Are the technical conditions for this interval met? Explain, fully supporting your answer.

Shown above.

6. The article "The Association Between Television Viewing and Irregular Sleep Schedules Among Children Less than 3 Years of Age" (Pediatrics, 2005), reported the following 95% confidence intervals for the difference in the mean TV viewing times (in hours per day) between males and females for three different age groups:

| Age Group | 95% CI for $\mu_{\text{male}} - \mu_{\text{female}}$ |
|---|---|
| 18 - 25 | (-0.2, 0.8) |
| 26 - 35 | (0.5, 1.2) |
| 36 - 50 | (-1.25, -0.65) |

a. Which of the three groups has the largest margin of error?

The margin of error is half the width of the interval. The interval is widest in the 18-25 age group. Thus, this group has the largest margin of error.

b. Suppose the sample sizes for each of the three groups were equal. Which age group would have had the greatest variability (i.e., the largest standard deviation)?

Age group 18-25, since it has the largest width.

c. Suppose the standard deviations were all the same. Which group would have had the largest sample size?

If the standard deviations are all the same, the interval with the smallest MoE would have the largest sample size. Hence, the 36-50 age group would have the largest sample size.

d. For which group is the average viewing time higher for females than for males? Why?

The average viewing time for females is higher than for males in the 36-50 age group, since this interval is entirely negative.

7. The following data (Consumer Reports 1999 New Car Buying Guide) reports the EPA's city miles per gallon rating and the weight (in pounds) of a random sample of 15 sports cars.

| Model | City MPG | Weight (lb) |
|---|---|---|
| BMW 318Ti | 23 | 2790 |
| BMW Z3 | 19 | 2960 |
| Chev Camaro | 17 | 3545 |
| Chev Corvette | 17 | 3295 |
| Ford Mustang | 17 | 3270 |
| Honda Prelude | 22 | 3040 |
| Hyundai Tiburon | 22 | 2705 |
| Mazda MX-5 Miata | 25 | 2365 |
| Mercedes Benz SLK | 22 | 3020 |
| Mercury Cougar | 20 | 3140 |
| Mitsubishi Eclipse | 23 | 3235 |
| Pontiac Firebird | 18 | 3545 |
| Porsche Boxster | 19 | 2905 |
| Saturn SC | 27 | 2420 |
| Toyota Celica | 22 | 2720 |

a. What is the explanatory variable? What is the response variable? Observational unit?

Explanatory variable: weight

Response variable: city MPG

Obs unit: car

b. Using the scatterplot above, does there appear to be a correlation between weight and city MPG? If so, is the correlation negative or positive? Explain.

Yes, there appears to be a negative correlation between weight of the car and city MPG. In other words, as the weight of the car increases the city MPG decreases.

c. Interpret the sample slope and intercept of the least squares line.

b = sample slope = -0.00695: For cars in the sample, as the weight of the car increases by 1 pound, the city MPG is predicted to decrease by 0.00695 mpg. Or, as the weight of the car increases by 1000 pounds, the city MPG is predicted to decrease by 6.95 mpg.

a = sample intercept = 41.7: For cars in the sample, a car of 0 pounds has a predicted city MPG of 41.7 mpg.

d. Predict the EPA's city gas mileage for a sports car, the BMW Z3. Find the residual for this car. Interpret the residual.

$$\widehat{\text{city}} = 41.7 - 0.00695(2960) = 21.13 \text{ mpg}$$

residual = observed - predicted = 19 - 21.13 = -2.13 mpg

The observed city mileage of 19 mpg is 2.13 mpg below the predicted city mileage for this car.

## STAT 217: STATISTICAL METHODS A Few Exam 1 Review Questions

1. In the following scenario describe the population, sample, observational unit, variable(s), parameter of interest, and the statistic which could be used to estimate the parameter.
Is (are) the variable(s) quantitative or categorical? A study is to be done on the social interaction skills of California second graders. A random sample of 50 second graders is taken. A psychologist watches each of the selected children for a period of 3 days, and records whether or not each child had a negative interaction with any of the other children in his/her class at school. Of interest is to estimate the true proportion of negative interactions all second graders have with their classmates during a period of 3 days. Pop = All California second graders. Sample = 50 California second graders. OU = one second grader Variable = whether or not the child had a negative interaction during the 3-day period Parameter = π = true proportion of negative interactions all CA second graders would have with their classmates during a period of 3 days. Statistic = p= proportion of negative interactions the sampled CA second graders had with their classmates during a period of 3 days. 2. Intuitively, using what you know about WHAT the standard deviation measures, what do you believe is the standard deviation of the following data set: 6, 6, 6, 6, 6, 6, 6. Explain. Since the standard deviation measures the typical distance to the mean, and all of the data values are the same and equal to the mean (of 6), the standard deviation should be 0. 3. Describe the differences/similarities between statistics and parameters. Parameters are summary numbers of the population. Statistics are summary numbers of the sample. 4. The following graph summarizes data collected on a sample of 432 domestic wines. 1 domestic wine (a bottle of that wine, actually) 1 b. What variable(s) is/are graphed? Be sure to note the variable type. Only one variable is graphed, the lead concentration (measured in ppb). This variable is quantitative. c. How would you describe the shape of this graph? Right skewed d. Between what two numbers would you expect the median lead content to be located? i. 0-50 ii. 
50-100 iii. 100-150 iv. 150-200 e. Would you expect the average lead content to be located above or below the median? Explain. The average lead content should be located above the median, since there are wines that have extremely large lead contents which will ‘pull’ the average to a higher value. The results below are taken from the n=66 Cal Poly Stat 217 students sampled in Spring 2010. The table and graph show the relationship status by gender. Rows: Relationship Columns: Gender Comparisons of Relationship Status by Gender Cal Poly STAT 217 Spring 2010 100 Relationship nos Female Male All 80 No 31 12 43 Yes 15 8 23 t 60 r All 46 20 66 P 40 20 5. Let’s consider the females ONLY. Gender F M Percent within levels of Gender. a. What proportion of sampled females were in a relationship Spring 2010? Is this number a parameter or statistic? What symbol do we use for it? 15 pYes = = 0.326 It is a statistic because it is computed from the sample data. 46 b. Use the information provided, to estimate the proportion of all Cal Poly Stat 217 female students who were in a relationship in Spring 2010. p(1− ) − (0.326)(1 0.326) p±2 = ± 0.326 2 = ±.=26 0.138 (0.188,0.464) n 46 c. Write a sentence interpreting your estimate from part (b). We are 95% confident between 18.8 and 46.4% of all female Stat 217 students in Spring 2010 would have been in a relationship, for students similar to those sampled. d. If the sample size had been larger, would the confidence interval from (b) be narrower, wider or the same width? Explain. 2 If the sample size had been larger, the interval in (b) would be narrower. Because n is in the denominator of the standard deviation, the standard deviation will be smaller with larger n. 6. Now, let’s consider the males ONLY. Were significantly more than 50% of all male Stat 217 students in Spring 2010 unattached? a. Provide the appropriate null and alternative hypotheses. H 0 π No = 0.50 H A πNo > 0.50 b. 
What proportion of sampled males were NOT in a relationship in Spring 2010? ˆ 12 p No= 20 = 0.60 c. Is it appropriate to use the Normal model approximation for the null distribution? Explain why or why not, supporting your answer with appropriate computations. Perhaps not, since the technical conditions are not quite met. Note that for males, there are 12 “no” and 8 “yes” responses. This is right on the boarder of what is acceptable and what is not for use of the Normal model, so it may be more appropriate to use the coin tossing applet to simulate the null distribution, rather than approximate it with a Normal model. d. Use the coin tossing applet to find the p-value. Your results should be similar to that shown below, but not necessarily exactly the same since the 1000 repetitions are random. e. Write a sentence interpreting the appropriate p-value from part (d). In a simulation of 20 male STAT 217 students where we assume 50% of all male STAT 217 students are unattached, we could expect a sample proportion of 0.60 unattached or something more extreme, about 25.1% of the time. f. Use the p-value to draw a conclusion regarding the research conjecture. Support your answer with your p-value. Be sure to consider the population to which you are willing to generalize the conclusion. (Note: Since 0.251 > 0.05 we should FTR Ho.) 3 Our large p-value, 0.251, means we do NOT havestatistically significant evidence to believe that more than 50% of all male Stat 217 students in Spring 2010 would have been unattached, for students similar to those sampled. 7. A recent Gallup poll of 1500 randomly sampled US adults, shows 52% approve of the job Obama is currently doing as president. a. Suppose we wish to determine if his approval rate is significantly more than 50%. Write out the null and alternative hypotheses (sentences and symbols) we should use to make this determination. H :π = 0.50 0 Approve H Aπ Approve0.50 b. 
Determine the z-score of the observed sample proportion and write a sentence interpreting it in the context of the study.

$z = \frac{0.52 - 0.50}{\sqrt{\frac{(0.50)(1-0.50)}{1500}}} = \frac{0.02}{0.0129} = 1.55$

The observed sample proportion of US adults who approve of the job Obama is doing as president, $\hat{p} = 0.52$, is 1.55 standard deviations above the mean of 0.5.

c. Use the Normal model (Theory-Based Inference Calculator) to find the p-value for this study.

d. Use the p-value to draw a conclusion regarding the research conjecture. Support your answer with your p-value. Be sure to state the population to which you are willing to generalize the conclusion.

The p-value of 0.0607 is larger than 0.05. This means we do NOT have statistically significant evidence that Obama’s approval rating (by all US adults) would be significantly more than 50%.

e. Suppose Gallup had used only a sample of 500 adults. Assuming 52% of this sample approved of the job Obama is currently doing as president, explain the effect of this smaller sample on the p-value. Be sure to explain why this change would happen.

The smaller sample size means the standard deviation of the null distribution will be larger. Since the z-score would then be smaller, the p-value will be larger.

8. Explain why it is definitely okay to use the Normal model to find the p-value in #7, but it might not be okay to use the Normal model to find the p-value in #6.

This is due to the technical conditions. Since we require 10 or more successes and failures, with a sample size of 1500 and 52% who approve, we have 780 ‘successes’ and 720 ‘failures’, so we definitely can use the Normal model in #7. However, as shown in part (c) of #6, we only have 12 successes and 8 failures. It would perhaps be more appropriate to use the coin tossing applet to generate the null distribution, rather than the Normal model, in #6.

9. Write a few sentences explaining what we use the One Proportion Simulation Applet for. That is, what is its purpose?
In addition, discuss why we set it up the way we do and what information we use it to get.

I will let you work on this one.

10. Suppose a polling organization in California polls residents of southern California (at random) about their opinions on water conservation. Would it be okay to generalize the results of this poll to all Californians? Why or why not?

No, even though the sample is taken randomly, it is only taken randomly from southern California. Opinions on water conservation may be different in southern California than in northern California or the Central Valley; thus, this sample may not be representative of the opinions of all Californians on this issue.

11. Suppose the administration at Cal Poly wants to know how students feel about food quality on campus. To gauge opinion, they set up a booth in the UU on Thursdays from 11-12 for the month of April, collecting the opinions of 100 students. Would it be okay to generalize the results of this poll to all Cal Poly students? Why or why not?

No. This is a convenience sample. Setting up a booth in the UU means only students who happen to be in the UU on Thursdays from 11-12 have the opportunity to respond to the poll. In addition, it is up to the student to come over to the booth, which means only students who are interested or have strong opinions on the matter may choose to participate. Thus, the sample data may not be representative of the opinions of all CP students on this issue.

12. Referring back to #11, would increasing the number of students to 1,000 improve the ability to generalize the results of the poll to all Cal Poly students? Why or why not?

No, we still have a biased sampling method. The method of collecting opinions hasn’t changed. It still uses a convenience sample from a booth in the UU on Thursdays from 11-12, relying on volunteers to answer the poll. With a larger sample, we just have a larger BIASED sample.
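The arithmetic in questions 5 through 8 of this practice set can be checked with a short script. The following is a minimal Python sketch, not the course's method: the course uses the coin tossing applet and the Theory-Based Inference Calculator, and every function name here is hypothetical. The fixed seed is an arbitrary choice so that reruns agree; a different seed gives a slightly different simulated p-value, just as the applet does.

```python
import math
import random


def two_sd_interval(successes, n):
    """Return p-hat, the margin of error, and the interval p-hat +/- 2*SD."""
    p_hat = successes / n
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


def simulated_p_value(n, null_pi, observed_successes, reps=1000, seed=217):
    """Estimate P(count >= observed) by repeating n 'coin tosses' under the null."""
    rng = random.Random(seed)  # arbitrary fixed seed (an assumption, for repeatability)
    at_least_as_extreme = 0
    for _ in range(reps):
        # One repetition: n independent tosses, each a "success" with chance null_pi.
        count = sum(rng.random() < null_pi for _ in range(n))
        if count >= observed_successes:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps


def exact_upper_tail(n, null_pi, observed_successes):
    """Exact binomial P(count >= observed), for comparison with the simulation."""
    return sum(
        math.comb(n, k) * null_pi**k * (1 - null_pi) ** (n - k)
        for k in range(observed_successes, n + 1)
    )


def normal_upper_tail_test(p_hat, null_pi, n):
    """z-score and one-sided (greater-than) p-value from the Normal model."""
    sd_null = math.sqrt(null_pi * (1 - null_pi) / n)
    z = (p_hat - null_pi) / sd_null
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # area above z under N(0, 1)
    return z, p_value


if __name__ == "__main__":
    p_hat, margin, (low, high) = two_sd_interval(15, 46)
    print(f"Q5: p-hat = {p_hat:.3f}, interval = ({low:.3f}, {high:.3f})")

    print(f"Q6 simulated p-value: {simulated_p_value(20, 0.50, 12):.3f}")
    print(f"Q6 exact p-value:     {exact_upper_tail(20, 0.50, 12):.3f}")

    z, p = normal_upper_tail_test(0.52, 0.50, 1500)
    print(f"Q7 (n = 1500): z = {z:.2f}, p-value = {p:.4f}")

    z_small, p_small = normal_upper_tail_test(0.52, 0.50, 500)
    print(f"Q7e (n = 500): z = {z_small:.2f}, p-value = {p_small:.4f}")
```

Running the last two calls side by side shows the effect described in 7(e): with the same sample proportion, the smaller sample gives a smaller z-score and a larger p-value.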
STAT217: STATISTICAL METHODS Practice: Day 02

The following problems are intended to provide practice with the material we discuss in class and labs. These problems are practice only, not graded. Solutions are provided.

1. The histogram below shows the amount of money spent by 601 visitors to Disney World last year.

a. In this graph, what is the observational unit and variable? Is the variable categorical or quantitative?

OU: 1 visitor to Disney World. Var: Amount spent ($), quantitative.

b. What is the shape of this histogram?

Bell-shaped and symmetric.

c. At what values (approximately) would we expect the mean and median to be located? Why?

Both the mean and median should be located at about $90-$100. Since the distribution is symmetric, the left half looks like the right half, thus the median (the amount which divides the data in half) should be in the middle. The mean is the balance point. Imagine the bars are actually bricks stacked on top of one another. Since the distribution is symmetric, the mean (balance point) is around $90-100.

2. Below are two bar charts displaying the exact same data (the home state of visitors to Disney World).

a. Explain why the home state variable is graphed using a bar chart, rather than a histogram.

Home state is categorical, not quantitative.

b. Explain how the two graphs above are different.

The order of the categories along the x-axis has changed.

c. Does it make sense to talk about the shape (e.g., left skewed or right skewed) of this data? Explain why or why not.

No. With a categorical variable such as this one, the order along the x-axis is arbitrary. In the first graph the categories are in alphabetical order. Since the order is arbitrary, the ‘shape’ would change depending on this ordering. Hence, we don’t discuss the ‘shape’ of a bar chart. We only discuss ‘shape’ of a histogram, since quantitative variables have a natural (smallest to largest) ordering.

d.
Write a brief summary of the information portrayed in the graphs above. Use numbers to support your statements.

Visitors from southern states had the highest representation at 41% in this group of 601 visitors to Disney World. The fewest visitors came from the West; the Pacific Northwest and the Pacific Southwest each accounted for only about 2% of the visitors in this group.

3. The histograms below display the yearly income ($) for a group of female and male executives.

a. Describe the shape of each salary distribution.

Both salary distributions are basically symmetric and bell-shaped.

b. At what salary are the means and medians located (approximately) for both male and female executives?

For female executives, both the mean and median are between $450,000-$500,000. For the male executives, the mean and median are higher, both of which are between $550,000-$600,000.

c. Which salary distribution is more variable? That is, for which salary distribution would you expect the standard deviation to be larger?

The male executive salary distribution is more variable. This is evident from the wider spread in the x-direction. The male salary distribution would have a larger standard deviation than the female salary distribution.

4. Open the Backpack.jmp data file posted on PolyLearn. The data come from 100 Cal Poly students.

a. Create a graph of BackpackWt (measured in pounds). Write a brief description of the backpack weight distribution. Be sure to include a discussion on shape, typical value and variability. Cite numbers to support your description.

The backpack weight distribution for these 100 Cal Poly students is bell-shaped and symmetric (mostly) with an outlier at 35 lbs. The average backpack weight is 11.7 lbs and the standard deviation of the backpack weights is 5.8 lbs.

b. Create a graph of the BackProblems variable. Write a brief description of the BackProblems distribution. Be sure to cite numbers in your summary.
Out of the 100 Cal Poly students in this data set, 68% reported no back problems, while 32% reported having back problems.

STAT217: STATISTICAL METHODS Practice: Day 01

1. In a survey of 100 people who had recently purchased motorcycles, data on the following variables were recorded:

- Gender of purchaser
- Brand of motorcycle purchased
- Number of previous motorcycles owned by the purchaser
- Telephone area code of purchaser
- Weight of motorcycle as equipped at purchase
- Opinion (on a scale of 1 to 5) of the buying experience
- Name of the purchaser

a. Which of the variables above is categorical?

Gender, brand, telephone area code, opinion

b. Which of the variables above is quantitative?

Number of previous motorcycles owned, weight of the motorcycle purchased

2. Researchers collected data on each of 1,000 hospitals in California.

a. Describe the observational unit in this study.

One hospital in the state of California

b. Identify whether each of the following is a variable for the observational unit in (a).

i. The number of emergency room visits in the last 6 months at each hospital. variable

ii. The average amount of money spent on bandages for all hospitals in California. no

iii. The percentage of sampled hospitals with neo-natal intensive care units. no

iv. For each hospital, the total cost of care for uninsured patients in the last 6 months. variable

v. The average cost of care for uninsured patients in the last 6 months at all hospitals in California. no

vi. The type of each hospital in the sample (e.g., general, teaching, specialized, district). variable

vii. The location of each hospital in the sample (e.g., northern California, southern California, etc.). variable

3. Researchers in Kinesiology at Cal Poly carried out a study on college health. In the study, they collected information on each of 450 Cal Poly students.

a. Describe the observational unit in this study.

One Cal Poly student

b.
Identify whether each of the following is a possible variable for this research study.

i. The number of students who reported being vegetarian. no, describes many students

ii. Whether or not a student smokes. yes, since the characteristic describes a student

iii. The percentage of sampled students who exercise at least 4 times per week. no, describes multiple students

iv. The average number of calories the student eats each day. yes, since the characteristic describes a student

v. The blood pressure of the student. yes, since the characteristic describes a student

STAT 217: STATISTICAL METHODS Practice: Day 03

The following problems are intended to provide practice with the material we discuss in class and labs. These problems are practice only, not graded. Solutions are provided.

1. Open the file Movies.jmp located on PolyLearn. Information was collected on 277 movies.

a. Graph the amount of money made domestically (in millions of dollars) for the movies in this data set, then write a summary of this distribution.

The distribution of domestic receipts for these 277 movies is right skewed. The median amount made domestically for a movie is $135,300,000. The interquartile range for the domestic receipts per movie is $64,150,000. Note: Because this distribution is right skewed, report the median instead of the mean, the IQR instead of the standard deviation.

b. Graph the types of movies in this data set, then write a summary of this distribution.

Dramas made up the largest portion in this set of movies at 28%, followed closely by comedies, which were 25% of the movies. Only 11% of these movies were Mystery-suspense.

2. Open the file USDemographics.jmp located on PolyLearn. This file contains demographic information on each state in the U.S.

a. Describe the observational unit in this data set.

A state

b. Are the variables quantitative or categorical?

The data set contains both quantitative (e.g., household income, vegetable consumption, etc.)
and categorical variables (e.g., Region). State is not a variable since this identifies the observational unit.

c. Graph the Vegetable Consumption variable. This variable shows the percentage of the population in each state that reports eating at least 5 servings of vegetables each day. Write a summary of this distribution.

The distribution of vegetable consumption is bell-shaped and symmetric. The average % of the population consuming at least 5 servings of vegetables per day is 23.4%. The standard deviation of the % of the population consuming at least 5 servings of vegetables per day is 3.3%. Note: Because this distribution is symmetric, use the mean and standard deviation.

3. Based on a study of 2,121 children between the ages of one and four, researchers at the Medical College of Wisconsin concluded that there was an association between iron deficiency and the length of time that a child is bottle-fed (Milwaukee Journal Sentinel, November 26, 2005). Describe the population of interest and the sample for this study.

Population of interest: All children between the ages of one and four. Sample: 2,121 children between the ages of one and four.

4. A professor in Psychology at Cal Poly carried out a study investigating the use of online social networking applications like Facebook and loneliness in college students. Of particular interest was whether college students who are more lonely tend to have fewer Facebook friends than students who are not as lonely. She collected data on the 100 Cal Poly students enrolled in her Psych 201 class in Spring 2011.
In addition to several demographic variables, such as year in school (1st, 2nd, 3rd, 4th, etc.), major, where the student lives (on-campus, off-campus), number of units enrolled, and gender, she had students rate their loneliness on a scale from 1-4 where 1 was not lonely, 2 was somewhat lonely, 3 was very lonely, and 4 was extremely lonely, and then give information on their usage of Facebook, including the number of Facebook friends. Describe the population of interest and the sample for this study.

Population of interest: All college students. Sample: 100 college students at Cal Poly enrolled in Psych 201 in Spring 2011.

STAT217: STATISTICAL METHODS Practice Set #5 Solutions

1. Pepsi Challenge. The Pepsi Challenge is a marketing campaign run by Pepsi that was started in 1975. In taste-tests, consumers are given a small sip of each soft drink, in random order, and blinded. Then they are asked which sample they prefer. In sample after sample, consumers overwhelmingly chose Pepsi. As a result, in 1985 Coke changed its recipe, pulled Coke off the shelves, and introduced “New Coke”. After only 4 months, and thousands of complaints, Coke reintroduced their original Coke, “Coke Classic”. Let’s see if Coke is preferred by students at Cal Poly. In the Spring of 2013, 58 students were asked which soft drink they prefer (Coke or Pepsi). The data are located in the JMP file, initialsurveyS13.jmp on PolyLearn. Open the data file on a computer which has JMP:

a. Describe:

i. Observational unit: a student

ii. Variable: Do you prefer Coke or Pepsi?

iii. Sample: 58 Cal Poly students in Spring 2013

iv. Population of interest: All CP students in Spring 2013

b. In JMP create an appropriate graphical display and summary statistics for this data. (Hint: What statistics do we compute to summarize sample data when the variable is categorical?) Copy and paste the graph and summary statistics from JMP.

c.
What do the graph and summary statistics imply about the taste preferences of the sampled students? Is Coke or Pepsi the preferred soft drink for STAT 217? Explain, using the values of the statistics in your explanation.

The graph and statistics indicate that in the sample Coke is preferred by about 81% of the students, while Pepsi is only preferred by 19% of the students sampled.

There are only two (hopefully) possible reasons for the outcome above:

$H_0$, Reason 1: Students have no preference; they are choosing equally between Coke and Pepsi (50/50), so the observed outcome happened by random chance alone.

$H_A$, Reason 2: Students actually do prefer Coke.

d. Determine the p-value for this study.

e. Write a sentence interpreting the p-value (found in part (d)) in the context of this study. That is, write a sentence that describes what the p-value measures. Be sure to address what random process is repeated, what the probability measures, and any assumptions that were made to generate the p-value. Do not use the word probability or any synonyms for the word probability.

In a simulation of 58 students choosing equally (50/50) between Coke and Pepsi, we could expect 81% or more students to choose Coke in about 0 out of 1,000 repetitions.

f. Citing your p-value as evidence, draw a conclusion (beyond the sampled students) about the preference for Coke or Pepsi. What larger group of students are you willing to generalize to? Why?

Since our p-value (0/1,000) is so small, the study outcome of 81% or more students choosing Coke could possibly have happened by random chance under 50/50 choice, but it is very unlikely to have done so. Thus, we have statistically significant evidence to conclude that Coke is preferred among all Cal Poly students in Spring 2013 who are similar to those actually sampled. (Note: There is no mention that the sample was taken randomly. Therefore, we will qualify our conclusion.
Instead of making conclusions about ALL CP students in Spring 2013, we will draw conclusions about all CP students in Spring 2013 who are SIMILAR to those in the study.)

2. Provide the null and alternative hypotheses (in symbols and numbers) for each scenario below.

a. Researchers in a political polling organization wish to know if a majority of Californians favor more stringent gun control.

$H_0: \pi_{\text{Favor}} = 0.50$

$H_A: \pi_{\text{Favor}} > 0.50$

b. Researchers at Cal Poly wish to know if significantly fewer than 33% of all Cal Poly students binge drink.

$H_0: \pi_{\text{Binge}} = 0.33$

$H_A: \pi_{\text{Binge}} < 0.33$

c. The rate of home foreclosures over the last 5 years is about 8%. Researchers at a mortgage lending company wish to determine if this rate has changed significantly.

$H_0: \pi_{\text{Foreclosures}} = 0.08$

$H_A: \pi_{\text{Foreclosures}} \neq 0.08$

STAT217: STATISTICAL METHODS Practice Set #4

The following problems are intended to provide practice with the material we discuss in class and labs. These problems are practice only, not graded. Solutions are provided.

1. Gallup reported the results of a recent survey of 1,003 US adults on immigration. In answer to the question, “should immigration be kept at its present level, increased, or decreased?” 41% of respondents in June 2014 said decreased, 22% said increased, 33% said it should be kept at the same level, and 4% had no opinion.

a. Describe the observational unit in this study.

1 US adult

b. Describe the sample in this study.

1,003 US adults

c. Describe the population in this study.

All US adults

d. Describe the variable of interest. Is the variable quantitative or categorical?

Immigration opinion, categorical

2. Coin Tossing.

a. Suppose we toss a fair coin (50/50) 10 times. What do you
