STAT 217 EXAM 2
This 35 page Study Guide was uploaded by Sierra Taylor on Sunday May 15, 2016. The Study Guide belongs to STAT 217-05 at California Polytechnic State University San Luis Obispo taught by Dr. Karen McGaughey in Spring 2016. Since its upload, it has received 107 views. For similar materials see Statistical Concepts and Reasoning in Statistics at California Polytechnic State University San Luis Obispo.
Date Created: 05/15/16
STAT 217: STATISTICAL METHODS
MIDTERM #2 Review Questions

Note: Do NOT limit your studying to the problems on this review. This is not an exhaustive list of concepts.

1. Explain briefly, so that a non-statistician could understand, what a confidence interval is.

2. Consider a 90% confidence interval vs. a 95% confidence interval for the difference in 2 population means. Everything else being equal, which of these intervals will be wider? Why?

3. Consider a 95% confidence interval. How can we make this interval more precise, i.e., narrower, without changing the confidence level?

4. Explain what we mean by 99% confident.

5. A study was carried out to determine if there is a significant difference in the average time spent studying for students in Science and Math vs. students in Engineering at Cal Poly. Data was collected from 50 students in Science and Math and 50 students in Engineering. Each student was asked how many hours total s/he had spent studying in the previous 7 days. Studying was defined as working alone to understand course material (e.g., reading, reviewing notes, working problems, etc.). The researchers determined a 95% confidence interval for the difference in means (Science & Math minus Engineering): (2.5, 4.3).
a. Are the technical conditions met for the validity of the interval shown above? Explain.
b. Provide a one-sentence interpretation of the interval.
c. Since this was an observational study, there could be additional reasons for why the average study time for Science & Math students exceeds that of Engineering students. Provide a plausible confounding variable in this study.

6. The results below are taken from the n=66 Cal Poly Stat 217 students sampled in Spring 2010. The table and graph show the relationship status by gender.
Rows: Relationship   Columns: Gender

Relationship    F    M    All
no             31   12    43
yes            15    8    23
All            46   20    66

Cell Contents: Count

[Figure: bar chart of relationship status (yes/no) by gender (F, M), Cal Poly STAT 217 Spring 2010; vertical axis: percent within levels of Gender.]

Let's consider whether there's a difference in the proportion of male and female STAT 217 students who were in a relationship in Spring 2010.
a. Identify the observational unit, explanatory variable (if there is one), and the response variable.
b. Provide the null and alternative hypotheses to address the question above.
c. The simulated null distribution is shown below.
d. Would you expect the p-value to be larger or smaller than 0.05? Explain.
e. Explain why it's not appropriate to approximate the null distribution with a Normal model.

7. A study of potential age discrimination considers promotions among middle managers in a large company. Researchers would like to determine if age discrimination is happening in the company. To assess this question, data was collected on the age of each employee and whether or not s/he had been promoted in the last 3 years. The JMP analysis is shown to the right. Carry out the appropriate test to answer this question. Show all steps.

STAT 217: STATISTICAL METHODS
TOPIC LIST for EXAM #2

TOPICS
- Populations, samples, parameters, statistics
- Variables (RV, EV), observational units
- Should be comfortable looking at histograms, dotplots and interpreting/comparing.
- Be able to discuss the purpose of random selection and random assignment
- Be able to discuss the meaning of 'controlled-randomized trial'; 'blinding'
- Be able to describe what confounding variables are and be able to identify plausible confounding variables in an observational study.
- Know when to draw cause-and-effect conclusions
- Know to what population conclusions are applicable
- Observational vs. Designed experiments
- Tests: π1 − π2, µ1 − µ2, χ²
  - Be able to find p-values, interpret p-values
  - Draw conclusions about the study from the p-value
  - Set up H0 and HA
  - Describe the parameter(s) of interest
  - Know and find (via JMP) the statistic we use to assess statistical significance
  - Find p-value from JMP output or from applet output
  - Explain the purpose/process of each of the simulations
  - Why do we carry out the simulations? What is the goal?
  - Be able to use the simulated null distribution to find the p-value.
  - Be able to compute the z-score or t-score and interpret in context
  - Necessary technical conditions for using the t-distribution, Chi-square distribution or Normal distribution for each test, rather than simulating the null.
- Confidence Intervals:
  - For π1 − π2 and µ1 − µ2
  - What is a confidence interval?
  - Interpreting the interval
  - Find the CI in JMP output
  - Understand how to compute a confidence interval
  - Be able to decide when to use each of the above intervals.
  - Effects of sample size, confidence level, standard deviation on the width of the interval
  - Necessary technical conditions for each interval.
- Hypothesis Testing
  - Difference b/w CI and a HT
  - Carry out a test for 2 independent means, µ1 − µ2, showing all steps using JMP or simulation output
  - Carry out a test for 2 independent proportions, π1 − π2, showing all steps using JMP or simulation output
  - Carry out a test for 3 or more proportions, χ², showing all steps using JMP or simulation output
  - Effects of sample size on the p-value, null standard deviation (std error), and t-score or z-score
- Apply the methods of this course to novel tests/situations

COVERAGE
Exam #2 will cover the material in Weeks 5 – 8.

WHAT TO BRING
- a calculator
- a pencil or pen
- one 8.5x11 page of notes (see below)

FORMAT OF THE EXAM
Exam #2 will consist of 4-5 problems, some with multiple parts (at most 15-18 parts total). Partial credit will be available on most parts of most problems. All questions will have a context.

There may be some multiple choice and/or T/F questions. You will be graded both on the work you show and on your final answers. A correct final answer with little or no work to support that answer will not receive much credit.

Much of the exam will require interpretation and explanation. There will be some computation. You will be expected to use JMP and applet output to help you answer the exam questions, but you will NOT be expected to use JMP or the applets on the exam.

Be prepared to carry out hypothesis tests and confidence intervals from start to finish.

NOTE SHEET
You are allowed one 8.5x11 page of notes; front and back. Worked out examples with context are not allowed. You will be expected to turn in your note sheet with the exam. Failure to comply with the note sheet guidelines will result in at least a 10% deduction from your exam score.

HOW TO STUDY
Your best study guides are (1) the course notes, (2) quizzes, (3) labs, and (4) the practice problems. If you can solve these problems, you should do well on the exam. To that end you should re-work examples from the notes, labs, practice problems and quiz problems. Do not expect "plug-and-chug" questions. The exam will be conceptual and will require you to communicate clearly. You will be asked to interpret and draw conclusions.

ELECTRONICS
Cell phones, iPods, etc. are not allowed at any exam. If you have any such electronics out at any time during the exam, your exam will be confiscated, and you will automatically receive a score of 0 points.

AND FINALLY…
If you have any questions about this exam, please do not hesitate to ask me in class, during office hours, or via e-mail.

EXAM #2: Thursday, May 19

STAT 217: STATISTICAL METHODS
Practice Set #10

The following problems are intended to provide practice with the material we discuss in class and labs.
These problems are practice only – not graded. Solutions are provided.

1. Student researchers were interested in whether how a question was phrased affected people's responses. In particular, they asked a group of 30 students a question about how their year was going. Two different phrasings of the question were used, one positively phrased ("Are you having a good year?") and the other negatively phrased ("Are you having a bad year?"). They randomly assigned 15 of the students to receive the positively phrased question and 15 to receive the negatively phrased question. The student researchers then recorded whether the response for each participant was positive or negative. The students wished to determine if the positively worded questions are more effective at eliciting a positive response (= 'success'). The data are located in the file wording.jmp. Use this data to help you answer the questions below.

a) Describe the observational unit, explanatory variable and response variable in this study.
Obs unit: 1 student
EV: Whether the question was phrased positively or negatively
RV: Whether the response to the question was positive or negative

b) Write out the null and alternative hypotheses which are necessary to answer the research question above. Be sure to also write a sentence which describes your parameters of interest. Be sure it is clear how you are defining a 'success' in this study.
(1) H0: π_PosWorded = π_NegWorded, HA: π_PosWorded > π_NegWorded
or
(2) H0: π_NegWorded = π_PosWorded, HA: π_NegWorded < π_PosWorded
π = proportion of all students who would respond positively to a positively (or negatively) worded question.

c) Using a numerical and/or graphical summary of the data, determine one number which can be used to support or refute the research question. Write a sentence interpreting this number and state whether it supports the research question or not and why.
(Note: You will need to subtract in the same order as the null and alternative hypotheses are written.)
p̂_Positively − p̂_Negatively = 0.667 − 0.400 = 0.267
This supports the research conjecture because the proportion of students who responded positively was 26.7 percentage points higher when the question was worded positively than when it was worded negatively.

d) Find the p-value for this study using the Analyzing Two-way Tables Applet from Lab 4.
My simulated p-value is 126/1000 = 0.126. (Yours should be close to this.)

e) Using your p-value and a 0.05 level of significance, write a conclusion which answers the researchers' question of interest. Be sure to consider significance, causation and generalizability.
With a large p-value of 0.126 we do not have statistically significant evidence to say that Positively Worded questions elicit a more positive response than Negatively Worded questions. This conclusion applies to all students similar to those in the study. Cause and effect is not valid here. While this was a designed experiment, using random assignment, we had a large p-value so there is 'no significant effect'. (The large p-value means the observed difference could plausibly have occurred due to the random assignment alone, rather than an effect due to the question wording.)

STAT 217: STATISTICAL METHODS
Practice Set #11

The following problems are intended to provide practice with the material we discuss in class and labs. These problems are practice only – not graded. Solutions are provided.

1. A pilot study was carried out in the Fall of 2009 on a group of 911 mostly freshmen and sophomores at Cal Poly. The study participants filled out an online health and fitness assessment and then underwent a physical assessment. One of the measurements of interest was whether or not the participants had high blood pressure (Prehypertension or Hypertension). The data can be found in the file hypertension2.jmp.
Suppose that we wish to determine if males are more prone to hypertension/pre-hypertension than females. Identify each of the following:

a) Observational units?
A Cal Poly freshman or sophomore

b) Variables?
RV: whether or not the person has hypertension/prehypertension; EV: gender

c) Population of interest?
All Cal Poly freshmen and sophomores in Fall of 2009.

d) Null hypothesis and alternative hypothesis in symbols?
H0: π_Males = π_Females
HA: π_Males > π_Females

e) Write a sentence interpreting your parameters from your null and alternative hypotheses in the context of this study.
π_Females = proportion of ALL female Cal Poly freshmen and sophomores in Fall 2009 who have pre-hypertension or hypertension.
π_Males = proportion of ALL male Cal Poly freshmen and sophomores in Fall 2009 who have pre-hypertension or hypertension.

f) Determine the difference in the proportion of sampled female and male students with hypertension or pre-hypertension. Does this difference support the conjecture that males are more prone to hypertension or pre-hypertension? Explain. (Note 1: You'll need to use the data in the JMP file. Note 2: When you find this difference, you must subtract in the same order as your null/alternative hypothesis.)
p̂_Females = 220/388 = 0.567
p̂_Males = 355/523 = 0.679
p̂_Males − p̂_Females = 0.112
Yes, this difference supports the conjecture since it is positive, indicating the proportion of males in the sample who have pre- or hypertension is larger than the proportion of females.

Consider the null distribution:

g) At what numerical value will the null distribution be centered? Why?
0, since we assume no difference between females and males in the null hypothesis.

h) Will the normal distribution be an adequate probability model of the simulated null distribution in this study? Explain. (Hint: Are the tech conditions met? Explain how.)
Yes, since we have 168 students with Normal bp and 220 with Pre/Hypertension in the female group, and 168 students with Normal bp and 355 with Pre/Hypertension in the male group, all of which are larger than 10.

In JMP: From the data table, select Analyze > Fit Y by X. Put BloodPressure in for the Y-variable and Gender in for the X-variable. Select Run. In the output window, under the hot spot at the very top, select Two Sample Test for Proportions.

i) What is the p-value for this test?
The p-value is 0.0003. Note that in JMP, the male group is first followed by the female group. Thus, we need the probability of a difference greater than or equal to our observed difference of p̂_Males − p̂_Females = 0.112, assuming no effect of gender on blood pressure, which is 0.0003.

j) Using your p-value and a significance level of 0.05, answer the research question of interest in the context of the study. (Are males at Cal Poly more prone to hypertension or pre-hypertension than females?) Be sure to consider significance, causation and generalizability.
With a small p-value of 0.0003, we have statistically significant evidence to say that male freshmen and sophomores at Cal Poly had a higher incidence of pre/hypertension than female freshmen and sophomores at Cal Poly in 2009. This conclusion applies only to students similar to those studied, since we do not have a random sample. A cause and effect conclusion is NOT valid here because this was not a designed study, and no random assignment to gender was used. This was an observational study.

k) Now consider this study compared to the study in problem #1 of Practice Set 10. Why is it okay to use the Normal model to find our p-value here, but not in problem #1 of Set 10?
In the first study, we had 10 and 6 Yes responses and 5 and 9 No responses. We need the number of successes and failures to all be larger than 10.
Thus, the tech conditions are not met in problem #1 of Set 10, so the Normal model would NOT be a good approximation to the simulated null distribution in that study. However, because the S/F condition is met here, we can use the Normal model (via JMP) to determine our p-value.

STAT 217: STATISTICAL METHODS
Practice Set #12 Solutions

The following problems are intended to provide practice with the material we discuss in class and labs to help prepare you for quizzes and exams. Solutions will be posted so that you may check your answers.

1. The paper "No Evidence of Impaired Neurocognitive Performance in Collegiate Soccer Players" (American Journal of Sports Medicine (2002)), compared random samples of collegiate soccer players, collegiate athletes in sports other than soccer, and a group of students who were not involved in collegiate sports with respect to history of head injuries. Each student in the sample was classified as to whether they had previously suffered 0 or 1 concussion versus 2 or more concussions. The data are shown in the table below.

                                        Type of Student
# of previous concussions   Collegiate Soccer   Other Collegiate Athlete   College Non-Athlete
0 or 1 concussion                  70                     84                       50
2 or more concussions              21                     13                        3

The researchers wish to determine if there is an association between the type of student and concussions.

a. Describe the observational unit, response variable and the explanatory variable. Are the variables quantitative or categorical?
Obs unit: one student
EV: Type of student (3 levels: collegiate soccer, other collegiate athlete, college non-athlete) - categorical
RV: # of previous concussions (0-1, 2+) - categorical

b. Provide the null and alternative hypotheses.
H0: π_soccer = π_other athlete = π_non-athlete
HA: At least one π is different

c. Compute the conditional proportions of 2 or more concussions for each of the student groups. Do these proportions support the research conjecture? Explain.
p̂(2+ | soccer) = 21/91 = 0.231, p̂(2+ | other athlete) = 13/97 = 0.134, p̂(2+ | non-athlete) = 3/53 = 0.057
Yes, since it appears there are some differences in these conditional proportions.

d. The simulated null distribution for the chi-square statistic is shown below. The observed chi-square statistic is 8.293. Approximately where would the p-value cut-off be located in this distribution? Would you expect the p-value to be smaller than 0.05, or larger than 0.05?
The p-value is determined as the proportion of simulated outcomes at or beyond 8.293. The p-value is expected to be smaller than 0.05 since there appear to be only a few outcomes beyond 8.293. Note that the approximate p-value is 0.012.

e. Use this p-value to draw a conclusion in the context of this study. Be sure to indicate the population you are willing to generalize the conclusion to and state whether a cause-and-effect conclusion is valid in this study.
Since the p-value = 0.012 < 0.05 we should reject H0. With our small p-value of 0.012, we have evidence to believe that concussions are associated with the type of student. No cause-and-effect conclusion is valid here, because the students were not randomized to these groups. (This is an observational study.) This conclusion applies to all students similar to those in this study, because the sampled students were not randomly selected.

2. The polling organization Gallup surveyed US adults about their support for free public college education. The results broken out by income level are shown below. Do these data suggest that opinion on a free public college education is associated with income?

                          Income Level
Opinion     Less than $36K   $36K-$89,999   $90K or more   Total
Agree            198              186            127         511
Disagree         127              201            175         503
Total            325              387            302        1014

a. Describe the observational unit, response variable and the explanatory variable. Are the variables quantitative or categorical?
Obs unit: one US adult
EV: income level (categorical)
RV: Does the person favor free public college education? (categorical)

b. Provide the null and alternative hypotheses. Describe the parameter of interest.
H0: π_Less$36K = π_$36K−$89,999 = π_$90KorMore
HA: at least one π is different
π = the proportion of all US adults who would favor free public college education within each income level.

c. For each income level, what proportion of surveyed adults are in favor of a free public college education? Do these proportions support the research conjecture? Explain.
p̂_Less$36K = 198/325 = 0.61, p̂_$36K−$89,999 = 186/387 = 0.48, p̂_$90KorMore = 127/302 = 0.42
Yes, since it appears there are some differences in these conditional proportions.

d. Are the technical conditions met so that we could use the chi-square distribution to find the p-value, instead of simulating the null distribution? Explain.
Yes, all of the cell counts in the observed data table are larger than 10 (201, 186, 175, 127, 127, 198).

e. Use the JMP analysis below to find the p-value. Use this p-value to draw a conclusion for this study. Be sure to address significance, generalization and causation.
Our small p-value < 0.0001 provides strong evidence of an association between opinion on a free public college education and income level. This conclusion applies to all US adults since the sample was randomly selected. There is not a cause-and-effect conclusion here because this is an observational study. Participants were not randomly assigned (and could not be randomly assigned) to income levels.

3. Consider the study above on opinion on a free public college education and income level. The plot in the JMP output is called a mosaic plot. What would this plot need to look like if there is no association between opinion and income level?
The mosaic plot should show equal proportions who agree across the 3 income groups. The proportions who agree do not need to be 0.50. They can be anything, but they must be equal for the 3 groups.
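The chi-square statistics in Practice Set #12 can be checked by hand from the two tables. The sketch below is not the JMP or applet procedure; it is a plain-Python illustration, and the function names are hypothetical. It computes expected counts as (row total × column total) / n and sums (observed − expected)² / expected. For the p-value it uses the fact that, with exactly 2 degrees of freedom, the upper-tail area of the chi-square distribution is exp(−x/2); that shortcut does not hold for other degrees of freedom.

```python
import math

def chi_square_stat(table):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unassociated
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_p_value_df2(stat):
    """Upper-tail chi-square area; valid ONLY when df == 2."""
    return math.exp(-stat / 2)

# Counts copied from the two tables in Practice Set #12
concussions = [[70, 84, 50], [21, 13, 3]]
gallup = [[198, 186, 127], [127, 201, 175]]

stat_c, df_c = chi_square_stat(concussions)
stat_g, df_g = chi_square_stat(gallup)
p_c = chi_square_p_value_df2(stat_c)
p_g = chi_square_p_value_df2(stat_g)

print(f"Concussions: chi-square = {stat_c:.3f}, df = {df_c}, p = {p_c:.4f}")
print(f"Gallup:      chi-square = {stat_g:.2f}, df = {df_g}, p = {p_g:.6f}")
```

For the concussion table the theoretical p-value comes out a little larger than the simulated 0.012. That table has a cell count of 3, so the course's condition for using the chi-square distribution is not met there, and the theoretical value is shown only for comparison with the simulation.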
STAT 217: STATISTICAL METHODS
Practice Set 13 Solutions

The following problems are intended to provide practice with the material we discuss in class and labs to help prepare you for quizzes and exams.

1. The New England Journal of Medicine (Feb 9, 2012) reported the results of a study on the effectiveness of Tai Chi on postural stability of patients with Parkinson's disease. In the study 130 Parkinson's patients were randomly assigned to either do Tai Chi 2 times per week, or a resistance training routine 2 times per week. After 6 months, the change from baseline was measured in each person's functional reach (measured in cm). Positive changes indicate improvement. (Functional reach is how far a person can lean over to reach for something without losing their balance.) The data are provided in the file taichi.jmp.

The researchers believe the practice of Tai Chi 2 times per week will result in significantly greater improvement to functional reach than resistance training 2 times per week for the same period of time.

a. Describe:
i. Observational unit: a Parkinson's patient
ii. Variable: RV = change in functional reach (cm), EV = exercise group (Tai Chi or resistance training)
iii. Sample: 130 Parkinson's patients
iv. Population of interest: All Parkinson's patients

b. Write out the null and alternative hypotheses using symbolic notation (i.e., H0, HA, π, µ, etc.) which should be used to test the research conjecture above.
H0: µ_TaiChi = µ_Resistance
HA: µ_TaiChi > µ_Resistance

c. Write a sentence describing the parameter(s) of interest in the context of this study.
µ = true mean change in functional reach of all Parkinson's patients after 6 months practicing Tai Chi or resistance training twice per week.

d. Use JMP to find numerical/graphical summaries of the data for the Tai Chi group and the resistance training group.
Shown below. Could also create boxplots.

e. Use JMP to find the p-value. Report the value of the p-value here.
p-value = 0.0726

f. Check the technical conditions which are necessary for the p-value in part (e) to be valid.
Both sample sizes are 65, so both are greater than 30. This means it is reasonable to believe the null distribution is well-approximated by a normal distribution. We can employ the t-methods to test the statistical significance of the study result.

g. Use the p-value to answer the research conjecture above. Be sure to address statistical significance, generalizability, and causation (if appropriate).
With a p-value of 0.0726, we do not have sufficient evidence to say that the true mean change in functional reach will be greater after practicing Tai Chi for 6 months than after resistance training. This conclusion is applicable to all Parkinson's patients like those in the study, since most likely these were volunteers and not randomly selected. (A cause-and-effect conclusion is not valid here, because the p-value suggests the results could have happened as a result of the random assignment.)
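As a closing check on the two-proportion problems in Practice Sets #10 and #11, the following sketch contrasts the two ways of getting a p-value that part (k) of Set #11 discusses. It is an illustration under stated assumptions, not the applet or JMP procedure, and the function names are hypothetical. For Set #10 (counts too small for the Normal model) it enumerates every possible random assignment exactly, which is what the applet's simulation approximates. For Set #11 it uses a Normal-model z-score with a pooled standard error; whether JMP pools in exactly this way is an assumption.

```python
import math

def exact_randomization_p_value(success_a, n_a, success_b, n_b):
    """One-sided P(group A gets >= success_a successes) over all random assignments."""
    total_success = success_a + success_b
    n = n_a + n_b
    total_ways = math.comb(n, n_a)
    upper = min(n_a, total_success)
    favorable = sum(
        # Ways to place k successes and (n_a - k) failures in group A
        math.comb(total_success, k) * math.comb(n - total_success, n_a - k)
        for k in range(success_a, upper + 1)
    )
    return favorable / total_ways

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, one-sided upper-tail p-value) using a pooled standard error."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_error
    # Upper-tail area of the standard Normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Set #10: 10 of 15 positive responses (positive wording) vs. 6 of 15 (negative wording)
exact_p = exact_randomization_p_value(10, 15, 6, 15)

# Set #11: 355 of 523 males vs. 220 of 388 females with pre/hypertension
z, normal_p = two_proportion_z_test(355, 523, 220, 388)

print(f"Set #10 exact randomization p-value: {exact_p:.3f}")
print(f"Set #11 z-score: {z:.2f}, Normal-model p-value: {normal_p:.4f}")
```

The exact value for Set #10 should land near the simulated 0.126, and the Set #11 value should round to the 0.0003 reported from JMP.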