Class Note for STAT 528 at OSU 65
This 16 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at Ohio State University taught by a professor in Fall. Since its upload, it has received 18 views.
Date Created: 02/06/15
Statistics 528: Data Analysis I
Lecture 4, June 29, 2006
Christopher Holloman, The Ohio State University, Summer 2006

Overview of Today's Lecture
- IPS Sections 3.1-3.4: Producing Data
  - Designing Good Experiments
  - Sampling Designs
  - The Basics of Statistical Inference

General Overview of Analysis
Collect Data (experiment, sample, available data)
  -> Exploratory Data Analysis (EDA)
  -> Formal Statistical Inference

Data Collection
- Why is data collection important?
  - The way in which data are collected determines what can be done with them.
  - Formal statistical inference: making statements about a large group of individuals based on observations of only a few of them.
- In order to perform meaningful statistical inference, we need to be able to collect trustworthy data and to judge the quality of data produced by others.
- We do not want to base conclusions or inferences on anecdotal evidence.
- Sources of available data:
  - Statistical Abstract of the United States
  - Government databases
  - Internet

Observation vs. Experiment
- An observational study observes individuals and measures variables of interest but does not attempt to influence the individuals' responses.
- An experiment imposes some treatment on individuals in order to observe their responses. The experimenter picks the treatment levels and assigns them to the individuals.
- Both of these are more expensive to collect than available data.

Sampling
- Population: The collection of individuals we are interested in studying.
- Sample: A subset of the population of interest.
- If you measure every individual in the population, that's called a census.

Example
An educational software company wants to compare the
effectiveness of its computer animation for teaching cell biology with that of a textbook presentation. The company tests the biological knowledge of each of a group of first-year college students, then divides them into two groups. One group uses the animation, and the other studies the text. The company retests all the students and compares the increase in understanding of cell biology in the two groups.
- Is this an experiment?
- What are the explanatory and response variables?

Design of Experiments (DOE/DOX)
General steps in design:
- Define the response
- Define factors and the treatment layout
- Determine blocking factors
- Assign experimental units to treatments

Terminology
- The individuals on which the experiment is done are called the experimental units. When the units are human beings, they are called subjects.
- The explanatory variables being investigated are called factors. These factors are each tested at multiple levels.
- By combining levels of different factors, we create individual experimental conditions that are applied to the units. Each of these unique combinations is called a treatment.
- An experimental design describes how the treatments are assigned to the experimental units.

Example
What are the effects of repeated exposure to an advertising message? The answer depends both on the length of the ad and on how often it is repeated. An experiment is conducted with undergraduate students at OSU to investigate this question. All subjects viewed a 60-minute episode of a television show that included ads for a new ice cream. Some subjects saw a 30-second ad; others saw a 90-second version. The same ad was repeated 1, 3, or 5 times during the program. After viewing, all subjects answered questions about their recall of the ad and their intention to try the ice cream.

Experimental
units: the subjects are OSU undergrads.
Step 1: Define the response.
- Recall of the ad and intention to buy the ice cream
Step 2: Define factors, their levels, and the treatment layout.
Factors:
- Length of the ad
- Number of repetitions
Levels:
- Length: the levels are 30 and 90 seconds
- Repetitions: the levels are 1, 3, and 5 times

Treatments
Make a diagram to lay out the treatments (cells are treatment numbers):

                           Repetitions
Commercial length    1 time    3 times    5 times
30 seconds              1         2          3
90 seconds              4         5          6

Example
Suppose we want to answer the following question: Does aspirin help prevent heart attacks? Let's design a good experiment to answer this question.
- Explanatory variable: Taking aspirin
- Response variable: Whether or not a subject has a heart attack

- Recruit subjects: 20,000 doctors volunteer to be our subjects.
Q: Should all the subjects take an aspirin every day?
- No. We want to determine whether people who take aspirin have fewer heart attacks than people who do not take aspirin, so we need to observe subjects not taking aspirin.
- Another way to say this is that the experiment should be comparative: it should compare two treatments.
- Control group: the subjects who are not receiving the treatment of interest.

Q: How do we assign the subjects to the treatment groups?
- Let them decide for themselves?
- Try to balance the groups using our judgment?
- Let a chance process, like flipping a coin, decide?

Say we let them decide for themselves. If the aspirin group then has fewer heart attacks, do we have evidence that the aspirin was responsible?
- No. Maybe those who chose to take aspirin care more about their health and also take vitamins. In this case, the effect of aspirin is "mixed up" with another variable.
- This is an example of confounding. We won't know whether changes
in the response are due to changes in the explanatory variable or to the lurking variables.

Suppose we get an expert to divide the subjects using his judgment, and he happens to be interested in the study.
- Consciously or unconsciously, our expert may stack the groups in a way that favors a certain result.
- This bias could introduce lurking variables. For example, he could assign all overweight men to the no-aspirin group.
- The design of a study is said to be biased if it systematically favors certain outcomes.

Use randomization.
- Allowing a chance system to choose the groups will eliminate bias due to self-selection or researcher judgment.
- Randomization scatters lurking variables and keeps the experiment free from bias.
- Example: We are very unlikely to get all old people in one group and all young people in the other group.

Review: Where are we so far?
- Experiment should be comparative: some get aspirin and some don't.
- Experiment should be randomized: we let a chance system decide who gets the treatment and who doesn't.
- Experiments that combine these fundamental concepts are called randomized comparative experiments.
Now that the subjects are split into groups, should we tell them?

Blind
Q: If we decide not to tell the subjects which group they are in, how do we fool the ones who don't get aspirin?
- It would probably be better to give them a placebo: a dummy pill that looks and tastes like the aspirin but has no active ingredient.
- Placebo effect: Many patients respond favorably to any treatment, even a dummy treatment, presumably because of trust in the researcher/doctor and the expectation of a cure.

Q: Should we tell the subjects which pill they are getting?
- If we tell the subjects whether they are getting a placebo or aspirin, they
will react according to their expectations and ruin the experiment.
- When you do not tell the subjects which treatment they are receiving, the experiment is said to be blind.

Double-blind
Q: Should we let the researcher giving the pills to the subjects know what pill the subject is getting?
- If the person giving the treatment to the subjects does not know what treatment the subject is receiving, the experiment is double-blind.

Replication
Q: Do we really need all those subjects, or can we just assign one person to each group?
- Assigning multiple experimental units to each treatment is called replication.
- Assigning more units to each treatment condition makes the results more stable (less random variation).

More than one factor
- An experiment is called completely randomized if all the experimental units are randomly assigned to the treatments.
Example: An experiment is conducted to compare a new variety of corn, called opaque-2, with normal corn as food for chicks. The researchers decide to serve each of the two types of corn at two protein levels: 15% protein and 20% protein. They feed each diet to 10 one-day-old chicks and record their weight gains after 21 days.
[Diagram: chicks are randomly assigned to a variety (normal or opaque-2, 20 chicks each); within each variety, chicks are randomly assigned to a protein level (10 chicks at 15% protein, 10 chicks at 20% protein); finally, all chicks are weighed.]

Statistical Significance
- In any experiment, there will be natural variability within each of the treatment groups. For example, not all the chicks in the opaque-2, 15% protein group will gain exactly the same amount of weight.
- There will also (hopefully) be variation between the groups due to the treatments. For example, the chicks in the opaque-2 groups will have
more weight gain than those in the normal corn groups.
- When the variation between the treatment groups is so large that it would rarely occur by chance, we say that the result is statistically significant.
- If we observe statistically significant differences among the groups in a randomized comparative experiment with double-blinding, we have good evidence that the treatments actually caused these differences.

Cautions about experiments
- Generalizing: The results of an experiment only apply to people or units that are similar to the subjects or units used in the experiment. For example, a study uses males in their 40's as subjects and concludes that taking aspirin daily reduces the risk of having a heart attack. Can we conclude that taking aspirin daily will reduce the risk of heart attack for women? What about for men in their 80's? No.
- Lack of realism: The subjects, treatments, or setting of an experiment may not realistically duplicate the conditions we really want to study. For example, when political scientists study how voters form opinions of candidates, participants know they are part of an experiment and often watch commercials or read pamphlets for fake candidates.

Steps to randomly assign units to treatments
1. Assign each unit a numerical label, starting with 1 (or 01, or 001, etc.). All labels should have the same number of digits.
2. Start anywhere in Table B and read numbers in groups of the same length as your labels. If you want to assign, for example, 3 units to your first group, read across until you find three unique numbers corresponding to labels in your list.

Randomization
- Now that you're sold on randomization, how do you do it?
- Need to generate some random numbers, either using a computer or using a table of random digits (Table B).
- Table B: List of
digits 0 1 2 3 4 5 6 7 8 9 with these properties:
  - The digit in any position has the same chance of being any one of 0-9.
  - The digits in different positions are independent.

Example
Suppose that a newspaper is interested in convincing its advertisers to buy larger ad spaces. The newspaper has 20 local companies (see below) that regularly place small ads in the paper. The newspaper decides to do an experiment in which 5 companies are selected and get free upgrades to a large ad. All of the companies are asked to report their revenue in the weeks before and after the experiment is performed.

Art Plumbing        Computer Answers          Photo Arts
Accent Printing     Darlene's Dolls           River City Books
Balloons Inc        Hernandez Electronics     Riverside Tavern
Bailey Trucking     Johnson Commodities       Rustic Boutique
Bennett Hardware    JL Records                Satellite Services
Best Camera Shop    Liu's Chinese Restaurant  Von's Video Store
Classic Flowers     Magic Tan

- Which 5 businesses should be selected to get the large ads? We need to select five businesses at random.

Step 1: Label the businesses 01 to 20.

01 Art Plumbing        08 Computer Answers          15 Photo Arts
02 Accent Printing     09 Darlene's Dolls           16 River City Books
03 Balloons Inc        10 Hernandez Electronics     17 Riverside Tavern
04 Bailey Trucking     11 Johnson Commodities       18 Rustic Boutique
05 Bennett Hardware    12 JL Records                19 Satellite Services
06 Best Camera Shop    13 Liu's Chinese Restaurant  20 Von's Video Store
07 Classic Flowers     14 Magic Tan

Step 2: Choose a line in Table B to start (we'll use line 128, and 129 if necessary).

128  15689 14227 06565 14374 13352 49367 81982 87209
129  36759 58984 68288 22913 18638 54303 00795 08727

Begin reading the digits in groups of 2: 15 68 91 42 27 etc. Skip any pair that is not a label (00 or above 20) and any label that has already been chosen.
The first five labels chosen are 15, 06, 13, 09, 18.
These labels correspond to Best Camera Shop, Darlene's Dolls, Liu's Chinese Restaurant, Photo Arts, and Rustic
Boutique. So that's our random sample.

Random Sampling in Minitab
Using MINITAB instead of a table to select 5 businesses:
1. If all of the experimental units are in a column of a dataset, then select Calc > Random Data > Sample from Columns (we want to select without replacement).
2. If the experimental units are not in a column of a dataset, then select Calc > Random Data > Integer. This function basically gives you a table of random numbers as large as you want. It can produce repeated values, so generate more random numbers than you need.

Matched Pairs Designs
- More complex design than a completely randomized design.
- Experimental units are matched together as closely as possible into groups of size 2. One unit in each group gets each treatment.
- Why is this better? The variability between units is distributed across treatments, so we can get more precise results.
- Special case: One unit gets both treatments.
  - Treatments given sequentially: randomize the order.
  - Treatments given simultaneously, if they don't interfere with each other.

Example (matched pairs design)
A manufacturer of children's shoes wants to test whether a new brand of shoe laces is more durable. They perform an experiment that measures how long the regular laces and the new laces last on a group of 6-year-old boys. Each boy wears one regular shoe lace on one shoe and one of the new laces on the other shoe. To control for a foot effect, the new laces are randomly worn on either the right shoe or the left shoe.

Block Design
- A matched pairs design is a simple block design where the block is a pair of experimental units.
- In general, a block is a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments.
- A block
design assigns units to treatments randomly within each block.
- Blocks are another way to control for lurking variables: by ensuring that each block gets all the treatments, we avoid confounding.

Example
A lawn care company wants to test a new chemical for treating weeds to see if it is better than their old chemical. The chemicals are applied in the morning, and the effectiveness is assessed at the end of the day. The experiment is performed on several days. Since daily high temperature is known to affect the outcome, days are used as blocks, and each chemical is randomly assigned to several plots on each day.

Sampling Design
- When we want to gather information about a large group of individuals, it is often impractical or impossible to take measurements on all of them.
  - Example: How many people watched the latest episode of Lost?
  - Example: What fraction of trees in a forest have a disease?
- Solution: Take measurements on some individuals and draw conclusions about the whole group.
  - May be more accurate than measuring the entire population.

Sampling terminology
- Population: The entire group of individuals that we want information about.
- Sample: A part of the population we actually examine in order to gather information.
- Tree example:
  - Population: All trees in the forest
  - Sample: We have to have a method for picking these.
- The method used to choose the sample from the population is called the design.

- What kind of a sample do we want?
  - The sample should be representative of the population we are interested in.
  - The sample should be unbiased. That is, it should not systematically favor certain outcomes.

Voluntary Response Sample / Probability Sampling
- One option is to allow the sampling units to choose
themselves.
  - Example: TV call-in survey
- Voluntary response samples are often biased: those with strong opinions, especially negative ones, are more likely to respond.
- A related idea, the convenience sample: allow the person performing the sampling to choose.
  - May choose individuals that are easy to reach
  - May choose individuals who look safe
  - May choose individuals who will produce the desired result

- To avoid the problems of favoritism by the sampler or self-selection by respondents, we allow random chance to select our sample.
- In a probability sample, each individual in the population has a known probability of being selected.
- Three types of probability samples:
  - Simple Random Sample (SRS)
  - Stratified Sample
  - Multistage Sample

Simple Random Sample
- A simple random sample (SRS) is one in which every set of n individuals has the same chance of being chosen as any other set of n individuals.
- The sample can be chosen the same way we assigned individuals to treatment groups: assign a label to each member and get the values from Table B, or use Minitab.

Stratified Sample
- Divide the population into groups of similar individuals, called strata. Choose a separate SRS in each stratum and combine them to form a full sample.
- If individuals within a stratum are very similar to each other but widely different across strata, this can provide better estimates than an SRS for the same sample size.

Multistage Sample / Sampling Cautions
- Choose samples in stages to get a sample in clusters, which are less time-consuming and costly to visit.
  1. Split the population into primary sampling units (PSUs): groups of individuals that are close together.
  2. Within each PSU, randomly select some secondary sampling units (SSUs) to visit.
- Estimation of population values from these samples can be
very complex.

- Even if a random sample is well designed, some problems can arise to bias the results:
  - Undercoverage: Some groups in the population are left out of the process of choosing the sample.
  - Nonresponse: An individual chosen for the sample cannot be contacted or refuses to participate.
  - Response bias: The behavior of the individual changes or is not revealed truthfully because he or she is in the sample.

Example
Suppose we conduct a telephone survey to find out about the credit use of households in Columbus. We take an SRS of listings in the white pages of the telephone book and call to ask about credit use.
- Undercoverage: Not all households have a phone.
- Nonresponse: Some people may not be home or may not want to discuss their credit.
- Response bias: The individual may lie about their credit usage to appear more wealthy.

- A special note for surveys: wording of questions can strongly influence the responses given on a survey. Confusing or leading questions can introduce strong bias, and even minor changes in wording can change a survey's outcome.

Statistical Inference
- When we measure something about individuals in a sample, we often want to generalize the results to a population.
- This process is called statistical inference: we infer something about the population based on what we observe in the sample.

Inference terminology
- Parameter: a number that describes a population. A parameter is a fixed, true value that is often unknown.
- Statistic: a number that describes a sample. The value of a statistic is known for a specific sample, but it can change depending on the sample we happen to get.
- Statistical inference (alternative definition): we infer something about a parameter based on a statistic.
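The difference between a parameter and a statistic can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration and not part of the lecture: the population is hypothetical (10,000 individuals, 60% coded as 1), and the names `population` and `sample_proportion` are invented here. The parameter is computed once from the whole population, while the statistic is recomputed for each simple random sample.

```python
import random

def sample_proportion(population, n, rng):
    """Statistic: the proportion of 1s in a simple random sample of size n."""
    sample = rng.sample(population, n)  # sampling without replacement
    return sum(sample) / n

# Hypothetical population: 10,000 individuals, 60% of whom have the trait (coded 1).
population = [1] * 6000 + [0] * 4000

# Parameter: a fixed number describing the whole population.
parameter = sum(population) / len(population)

rng = random.Random(528)  # arbitrary seed so the run is repeatable
statistics_seen = [sample_proportion(population, 100, rng) for _ in range(3)]

print("parameter p =", parameter)
print("statistics p-hat =", statistics_seen)
```

In practice the parameter is usually unknown; it is computable here only because the whole hypothetical population is stored in a list.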
Sampling Variability
- If we draw different random samples from a population, the value of a statistic will vary. This variation is called sampling variability.
- Sampling variability does not preclude us from drawing conclusions about a population.
- Ask the question: What would happen if we repeatedly drew random samples from the population?

Examples
- A carload lot of ball bearings has a mean diameter of 2.503 centimeters (cm). To decide whether or not to accept the shipment, an inspector examines 100 randomly selected bearings. The selected bearings have a mean diameter of 2.515 cm.
- A telemarketing firm in Los Angeles uses a device that dials residential telephone numbers in that city at random. Of the first 100 numbers dialed, 43 are unlisted. This is not surprising because 52% of all Los Angeles residential phones are unlisted.

Suppose that 60% of people in the US population find shopping too time consuming (p = 0.6).
1. Randomly select 100 people and see how many find shopping too time consuming. For our first sample, 56 people find shopping too time consuming, so p-hat = 0.56.
2. Randomly select 100 more: 46 people in this sample find shopping too time consuming, so p-hat = 0.46.
3. Continue this procedure many times and make a histogram of the statistics.
[Figure: repeated SRSs of size n = 100 (such as p-hat = 0.56 and p-hat = 0.46) feed a histogram of sample proportions centered near 0.6; horizontal axis: sample proportion.]

- The histogram of the sample statistics follows a sampling distribution.
- The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

- The sampling distribution is affected by:
  - The true value in the population
  - The sampling strategy
  - The number of individuals in each sample: the more individuals that
are selected, the narrower the density.
[Figure: repeated SRSs of size n = 2500 (such as p-hat = 0.609, 0.625, and 0.579) feed a histogram of sample proportions centered near 0.6; horizontal axis: sample proportion.]

Describing the sampling distribution
- Shape: Looks like the statistics follow a normal distribution.
- Center: Statistics are centered around the true value; this means they are not biased.
- Spread: Statistics exhibit less spread when we sample more individuals.

Bias and Variability
- When we talk about a sampling distribution, bias concerns the center of the distribution. A statistic is unbiased if the sampling distribution is centered on the true parameter value.
- The variability of a statistic is described by its sampling distribution. The spread is determined by sample size and sampling design.
- Reducing bias: Use random sampling.
- Reducing variability (if using an SRS): Draw larger samples.
[Figure: four panels: (a) high bias, low variability; (b) low bias, high variability; (c) high bias, high variability; (d) the ideal: low bias, low variability.]

Population Size Doesn't Matter
- The variability of a statistic from a random sample does not depend on the size of the population, as long as the population is at least 100 times larger than the sample.
- The variability of a statistic does depend on the sample size.
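The shopping example (p = 0.6, samples of size 100 and 2500) can be imitated with a small simulation. This is only a sketch under stated assumptions, not the lecture's own procedure: each individual is modeled as an independent draw with probability p, which approximates an SRS from a population much larger than the sample, and the function name `simulate_sample_proportions`, the seed, and the choice of 1000 repeats are all invented for illustration.

```python
import random
import statistics

def simulate_sample_proportions(p, n, repeats, rng):
    """Draw `repeats` samples of size n and return the sample proportion of each."""
    proportions = []
    for _ in range(repeats):
        # Assumption: independent draws stand in for an SRS from a very large population.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

rng = random.Random(2006)  # arbitrary seed so the run is repeatable
p = 0.6                    # true population proportion from the example
results = {}
for n in (100, 2500):
    p_hats = simulate_sample_proportions(p, n, repeats=1000, rng=rng)
    # Center (mean) and spread (standard deviation) of the simulated p-hat values.
    results[n] = (statistics.mean(p_hats), statistics.stdev(p_hats))
    print(f"n={n}: center={results[n][0]:.3f}, spread={results[n][1]:.4f}")
```

Both centers should land close to 0.6, matching the claim that the statistic is unbiased, and the spread for n = 2500 should be noticeably smaller than for n = 100. A histogram of `p_hats` would play the role of the figures on the slides.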