# BASIC STATISTICS STAT 220


These 12 pages of class notes were uploaded by Providenci Mosciski Sr. on Wednesday, September 9, 2015. They belong to STAT 220 at the University of Washington, taught by Staff in Fall.


Date Created: 09/09/15

## Chapter 19: Sampling

### Outline

- Context
- Terminology
- Some basics: population and parameter; sample and statistic
- The 1936 election: summary; the Digest's sampling method; problems; "But..."; Gallup
- The 1948 election
- Quota sampling: the method; example; pros and cons; back to the 1948 election; Republican bias
- Probability methods: simple random sampling; problem; solution (multistage cluster sampling); election predictions; Iraq
- Miscellaneous: some other problems with election polls; how to find out if a sample is good
- Chance error: chance error; questions

### Context

- Data collection
  - Experimental design
    - Controlled experiments (ch. 1)
    - Observational studies (ch. 2)
  - Next: sampling (ch. 19)
- Main lessons
  - The method of choosing the sample matters a lot.
  - The best methods involve the planned use of chance and leave no room for personal choice.
- Examples we'll look at
  - Predicting US presidential elections
  - Estimating the number of casualties in Iraq due to the war

### Terminology

- New terminology
  - Population, parameter, sample, statistic, inference
  - Sources of bias: selection bias, non-response bias, response bias
  - Sampling methods
    - Non-probability methods: quota sampling, convenience sampling
    - Probability methods: simple random sampling, multistage cluster sampling, stratified sampling (we won't go into detail on this one, but it is good to know it exists)
  - Chance error

### Some basics

#### Population and parameter

- We want to know a parameter (a numerical fact) about a population.
- Example: What is the percentage of Republican voters in the next presidential election?
  - Population: voters in the next presidential election
  - Parameter: percentage of Republican voters
- Example: What is the number of deaths in Iraq due to the war?
  - Population: people of Iraq
  - Parameter: the number of deaths due to the war

#### Sample and statistic

- It is infeasible to look at the entire population. Hence we'll never know the population parameter exactly.
- We only examine part of the population: a sample.
- We compute a statistic from the sample:
  - Percentage of Republican voters in the sample
  - Number of deaths due to the war in sampled families
- The statistic is used to estimate the population parameter. Statistics are what we know; parameters are what we want to know.
- We then make inference: we generalize the results of the sample to the population.
- Good inference is only possible if the sample resembles the population. We need a good sample.

### The 1936 election

#### Summary

- Roosevelt (Democrat) vs. Landon (Republican)
- The Digest's prediction: Landon wins and Roosevelt gets 43% of the votes.
- Election result: Roosevelt wins with 62% of the votes.
- The Digest predicted the wrong winner, and the predicted percentage was off by almost 20 percentage points.
- What went wrong?

#### The Digest's sampling method

- The Digest made a list of people by combining phone books and lists of club memberships.
- They randomly picked 10 million people from this list and mailed them a questionnaire.
- 2.4 million people returned the questionnaire.
- 43% of these people planned to vote for Roosevelt. That was their prediction.

#### Problems

- Two main problems with the Digest's sampling method:
  - Selection bias: Only 1 in 4 households had a phone at the time. The Digest's list of eligible voters tended to screen out the poor, because they had no phone or club membership.
  - Non-response bias: Only about 1 in 4 households that received a questionnaire returned it. The people who did not respond may have different voting habits than the people who responded.
- So the 2.4 million people who returned the questionnaire did not represent the 10 million people who were polled, let alone the population of US voters. The sample was biased.

#### But...

- We had a big sample: 2.4 million. Doesn't that fix the problems?
  - No. When a procedure is biased, taking a larger sample does not help. This just repeats the basic mistake on a larger scale.
- Why did we first see this problem in the 1936 elections?
  - Before 1936, the rich and poor tended to vote similarly. In 1936, the poor voted overwhelmingly for Roosevelt and the rich for Landon.

#### Gallup

- In the same year, Gallup predicted:
  - the Digest's prediction: 44% for Roosevelt (the Digest's actual prediction: 43%)
  - the election result: Roosevelt wins with 56% (actual result: 62%)
- How did he predict the Digest's prediction so well?
  - He took a random sample of 3,000 people (much smaller than 10 million) from the same list that the Digest used and mailed those a questionnaire.
- How did he predict the election results?
  - He used a method called quota sampling. His sample size was 50,000.
  - His method worked better than the Digest's method. He predicted the correct winner but was still off by 6 percentage points.

### The 1948 election

#### 1948 presidential elections

- Truman (Democrat) vs. Dewey (Republican)
- Three major polls predicted Dewey as the winner.

| Poll | Dewey | Truman |
|---|---|---|
| Crossley | 50% | 45% |
| Gallup | 50% | 44% |
| Roper | 53% | 38% |
| Election result | 45% | 50% |

- Truman won, against the prediction of all three polls.
- What went wrong with the polls?
  - All polls used quota sampling. This did not give good samples.

### Quota sampling

#### The method

- Each interviewer is assigned a fixed number of people to interview. The numbers falling into certain categories (sex, age, residence, economic status, etc.) are also fixed.
- The interviewers are free to interview anybody they like, as long as they keep to these quotas.

#### Example

- A Gallup poll interviewer in St. Louis had to interview 13 people, of whom:
  - 6 were to live in the suburbs and 7 in the central city
  - 7 were to be men and 6 women
  - Of the 7 men (with similar quotas for the women):
    - 3 were to be under 40 years old and 4 over 40
    - 1 was to be black and 6 white
  - Monthly rentals for the 6 white men:
    - 1 was to pay $44.01 or more
    - 3 were to pay $18.01 to $44.00
    - 2 were to pay $18.00 or less

#### Pros and cons

- The quotas ensure that the sample looks like the population with respect to some key characteristics.
- This is why Gallup had a better prediction for the 1936 elections than the Digest. Recall that the Digest's sample contained mostly rich people.
- But there are problems with quota sampling:
  - We may forget to set quotas for some important characteristics.
  - The method still leaves a lot of freedom to the interviewers to pick people. This can lead to unintentional selection bias.

#### Back to the 1948 election

- Why did the polls predict the wrong winner?
  - Republicans were wealthier than Democrats.
  - Hence Republicans were more likely to have phones, nicer houses, and permanent addresses.
  - Within each demographic group, the Republicans were a bit easier to interview.
  - That's why the samples included too many Republicans and predicted the Republican candidate to win.

#### Republican bias

| Year | Gallup's prediction of Republican vote | Actual Republican vote | Error in favor of the Republicans |
|---|---|---|---|
| 1936 | 44% | 38% | 6% |
| 1940 | 48% | 45% | 3% |
| 1944 | 48% | 46% | 2% |
| 1948 | 50% | 45% | 5% |

### Probability methods

#### Probability methods

- A probability method has the following two properties:
  - it incorporates the planned use of chance
  - it leaves no room for personal choice by the investigators/interviewers
- Examples:
  - simple random sampling
  - multistage cluster sampling
  - stratified sampling (we won't go into this)

#### Simple random sampling

- Method for simple random sampling:
  1. Write the name of each person in the population on a ticket.
  2. Put all the tickets in a box.
  3. Shake the box and draw a ticket.
  4. Shake the box again and draw another ticket.
  5. Continue until we have the sample size we want.
- This is called simple random sampling: drawing at random without replacement.
- Nowadays it is usually done with computers.
- Each person has the same chance to get into the sample. There is no selection bias. Hence the sample is likely to be a good representation of the population.

#### Problem

- Sometimes simple random sampling is not possible.
  - Elections
    - In the 1930s there was no list of all eligible voters, and there were no computers to draw a random sample from these voters.
    - Moreover, there were many people without phones. Sending interviewers to people all over the US would be very expensive.
  - Estimating casualties in Iraq
    - Investigators wanted to interview a sample of people and ask them about family members who died.
    - There is no accurate list of all people in Iraq.
    - Interviewing people all over the country would involve a lot of travel, and that is dangerous.

#### Solution: multistage cluster sampling

- Divide the population into clusters (usually geographically).
- Randomly select a number of clusters.
- It is possible to repeat this step several times (see page 341 of the book).
- For each selected cluster, interview a random sample of people in the cluster.
- Applications:
  - Election polls
  - Estimating casualties in Iraq
- Advantage: interviewers only have to be stationed in the selected clusters.

#### Election predictions

| Year | Sample size | Winner | Gallup prediction | Election result | Error |
|---|---|---|---|---|---|
| 1964 | 6600 | Johnson | 64% | 61% | 3% |
| 1968 | 4400 | Nixon | 43% | 44% | 1% |
| 1972 | 3700 | Nixon | 62% | 62% | 0% |
| 1976 | 3500 | Carter | 50% | 51% | 1% |
| 1980 | 3500 | Reagan | 52% | 55% | 3% |
| 1984 | 3500 | Reagan | 59% | 59% | 0% |
| 1988 | 4000 | Bush | 56% | 54% | 2% |
| 1992 | 2000 | Clinton | 49% | 43% | 6% |
| 1996 | 2900 | Clinton | 52% | 49% | 3% |
| 2000 | 3600 | Bush | 48% | 48% | 0% |
| 2004 | 2000 | Bush | 49% | 51% | 2% |

- The methods work very well, and with small sample sizes.
- Note the larger error in 1992. This had to do with undecided voters.

#### Iraq

- Article in the Lancet
- Cluster sampling: 33 clusters, in each of which they interviewed 30 households
- Estimated 100,000 more deaths than expected in the first 1.5 years of the war (confidence interval: 8,000 to 194,000)
- Homework: point out some strong and weak points of the study.

### Miscellaneous

#### Some other problems with election polls

See Section 19.6 of the book.

- Non-voters
  - We must screen out people who won't vote.
  - But there is a stigma to non-voting, so people may not answer honestly if we ask them about it.
- Undecided voters
  - What to do with people who don't know yet how they are going to vote?
- Response bias
  - Phrasing of the question or behavior of the interviewer may influence the response.
- Phone interviews
  - Interviews are usually done by phone.
  - Most people have a phone, so that does not give a large bias.
  - But about 1/3 of residential phone numbers are not listed. The rich and the poor tend not to be listed, so the phone book is tilted towards the middle class.
  - Solution: random digit dialing (RDD)
- Non-response bias
  - Even phone interviews have about 20% non-responders.
  - Solution: call back more often.
- Problem: people without land lines

#### How to find out if a sample is good

- You often cannot see this by looking at the data.
- So you should find out how the data were gathered.
- Questions to ask:
  - What is the population? What is the parameter?
  - How was the sample chosen? Was there room for personal choice? Did it involve the planned use of chance?
  - What was the response rate?
  - How were the questions phrased?
- Be aware that medical studies often use convenience samples (for example, all patients of a certain hospital/doctor).

### Chance error

#### Chance error

- Even if the sampling method is perfect, we still cannot find the population parameter exactly.
- Example:
  - Suppose there is a big box with tickets, each ticket marked with 0 or 1 (population).
  - We want to estimate the proportion of 1's (parameter).
  - We take a random sample of 1000 tickets. This is a good sample, without bias.
  - We compute the proportion of 1's in our sample (statistic). This is our estimate for the parameter.
  - Due to chance, the estimate won't equal the parameter exactly:

    estimate = parameter + chance error

- In more complicated situations:

  estimate = parameter + bias + chance error

#### Questions

- How big is the chance error?
- How does it depend on sample size?
- How big does our sample need to be to keep the error under control?
- You'll learn this in chapters 20 to 23.
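The lesson from 1936 (a large sample does not repair a biased procedure) and the relation "estimate = parameter + bias + chance error" can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population size, the share of phone owners, and the support levels within each group are hypothetical numbers chosen so that overall support is 62% while support among phone owners is 43%. They are not historical data.

```python
import random

rng = random.Random(220)  # fixed seed so the sketch is repeatable

# Hypothetical population of 200,000 voters, stored as (has_phone, votes_roosevelt).
# The counts are invented: 62% of everyone supports Roosevelt,
# but only 43% of the 50,000 phone owners do.
population = (
    [(True, True)] * 21_500 + [(True, False)] * 28_500
    + [(False, True)] * 102_500 + [(False, False)] * 47_500
)
parameter = sum(votes for _, votes in population) / len(population)


def sample_fraction(people, n):
    """Draw n people at random without replacement; return the fraction for Roosevelt."""
    sample = rng.sample(people, n)
    return sum(votes for _, votes in sample) / n


# Biased procedure: a large sample, but drawn only from phone owners.
phone_owners = [person for person in population if person[0]]
biased_estimate = sample_fraction(phone_owners, 20_000)

# Probability method: a much smaller simple random sample from everyone.
srs_estimate = sample_fraction(population, 1_000)

print(f"parameter               : {parameter:.3f}")
print(f"phone owners, n = 20000 : {biased_estimate:.3f}")
print(f"random sample, n = 1000 : {srs_estimate:.3f}")
```

In this setup the large sample stays close to the phone owners' 43% rather than the population's 62%, while the small random sample differs from the parameter only by chance error.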
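The multistage cluster sampling steps described in these notes (randomly select clusters, then randomly sample within each selected cluster) can be sketched in a few lines. The function name `multistage_cluster_sample`, the cluster labels, and the population of 100 clusters with 200 households each are hypothetical; only the 33 clusters of 30 households mirror the design mentioned for the Iraq study. This is a two-stage illustration, not the procedure any particular poll or study actually used.

```python
import random


def multistage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: pick units at random within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


rng = random.Random(19)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 clusters, each containing 200 households.
clusters = {
    f"cluster_{i:03d}": [f"c{i:03d}_household_{j}" for j in range(200)]
    for i in range(100)
}

sample = multistage_cluster_sample(clusters, n_clusters=33, n_per_cluster=30, rng=rng)

total_households = sum(len(households) for households in sample.values())
print(f"{len(sample)} clusters selected, {total_households} households in total")
```

Interviewers would then only need to visit the selected clusters, which is the advantage noted above.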
