# Applied Statistics with Computers MATH 254



This 51-page set of class notes was uploaded by Breanne Schaden PhD on Saturday, October 3, 2015. The class notes belong to MATH 254 at Boise State University, taught by Kathrine Johnson in the fall.

Date Created: 10/03/15

## Chapter 3: Producing Data

Anh Dao, July 10, 2009

- An **experimental unit** is an individual on which the experiment is performed and on which measurements are recorded. When the individuals are human beings, they are called **subjects**.
- A **treatment** is the experimental condition applied to the units.
- **Factors** are the explanatory variables in an experiment.
- A **level** is a specific value of a factor. When there are several factors, each treatment is formed by combining one level of each factor.
- In an **experimental study**, we record information about the units after applying treatments to them.
  - Examples: clinical trials and other kinds of experiments.
- In an **observational study**, we record information about the units (the response) without imposing any treatment; the researcher participates only passively.
  - Examples: opinion polls, market studies, and surveys.

*(Figure: a simple random sample is taken from Population 1 and from Population 2, and the two samples are compared.)*

For observational studies:

- Advantage: the data are much easier to obtain.
- Disadvantages:
  - We cannot say that the explanatory variable caused the response.
  - There may be lurking or confounding variables.
  - Observational studies are better suited to describing the past than to predicting the future.

There are 3 basic types of observational studies:

- **Sample survey:** provides a sample observed at one point in time.
- **Prospective study:** follows a sample forward in time (cohort studies).
- **Retrospective study:** follows a sample backward in time (case-control studies).

**Case-control study:** a study in which cases having a particular condition are compared to controls who do not have it. The purpose is to find out whether one or more explanatory variables are related to a certain disease.

- Although you usually cannot determine cause and effect, these studies are more efficient, and they can reduce the potential for confounding variables.

**Examples**

Suppose we are interested in comparing GRE scores for students in five different majors.

- We cannot do a randomized experiment, because we cannot randomly assign individuals to a specific major; the individuals decide that for themselves.
- Thus we observe students
from 5 different preexisting populations (the five majors).
- We obtain a random sample of size 15 from each of the five majors.
- We calculate statistics and compare the 5 groups.
- Can we say that being in a specific major causes someone to get a higher GRE score?

Suppose we are interested in finding out which age group talks the most on the telephone: 0-10 years, 10-20 years, 20-30 years, or 30-40 years.

- We cannot do a randomized experiment, because we cannot randomly assign individuals to an age group.
- Thus we observe (through polling or wiretapping) individuals from 4 different preexisting populations (the four age groups).
- We obtain a random sample of size 25 from each of the four age groups.
- We calculate statistics and compare the 4 groups.
- Can we say that being in a specific age group causes someone to talk more on the telephone?
- What are some possible confounding variables?

**Design of experiments** refers to the process of planning, designing, and analyzing an experiment so that valid and objective conclusions can be drawn efficiently.

- **Randomization** is the random allotment of units to treatments through some random mechanism, such as random numbers. The main purpose of randomization is to treat all the experimental units identically in every possible way except for the actual treatments being compared.
- **Replication** is the repetition of each treatment on many experimental units. The number of experimental units in each treatment group is the number of replications.
- **Blocking** is the arrangement of experimental units into groups (blocks) that are homogeneous, that is, similar to one another.

**The key to a randomized experiment:** the treatment (explanatory variable) is randomly assigned to the experimental units or subjects.

**Simple random sampling (example).** Suppose that, before we test the effect of aspirin on physicians, we wish to study the effect of aspirin on the heart rates of mice.

- We obtain a random sample of 100 mice.
- We randomly assign 50 mice to receive a placebo.
- We randomly
assign the remaining 50 mice to receive aspirin.
- After 20 days of administering the placebo and aspirin, we measure the heart rates and obtain summary statistics for comparison.
- The single greatest advantage of a randomized experiment is that we can infer causation.
- Through random assignment to groups, all other factors tend to be balanced across the groups, so apart from chance differences they cannot act as confounding variables.
- Unfortunately (or perhaps fortunately), we cannot always use a randomized experiment: it is often impossible or unethical, particularly with humans.

**Random number table:** a list of the digits 0, 1, ..., 9 in which each digit is equally likely and the probability of any given digit is unaffected by the digits that precede it (the digits are independent). Example: e-book Chapter 3.1, 58; Example 3.10.

Depending on the method of randomization, there are three types of experimental designs:

- **Completely randomized design:** the subjects are randomly assigned to one of the treatments.
- **Matched pairs design:** each subject is matched with another subject who is similar in terms of age, health, etc. (example: twins). Within each pair, the subjects are randomly assigned to different treatments. Matching helps control lurking variables.
- **Block design:** similar subjects are grouped into sets of experimental units called blocks. Then treatments are randomly assigned to the units within each block.

Elements of a well-designed experiment:

- **Control group:** allows us to compare against an existing treatment and enables us to control the placebo effect (the placebo effect occurs when patients seem to improve regardless of the treatment they receive).
- **Randomization:**
  - eliminates bias that can result when researchers assign treatments to the subjects;
  - balances the groups on variables that you know affect the response;
  - balances the groups on lurking variables that may be unknown to you.
- **Replication:** assigns several experimental units to each treatment.
- **Blinding:** increases the reliability of the results.
  - **Single-blind:** the subjects do not know the treatment assignment.
  - **Double-blind:** neither the subjects nor those in
contact with the subjects know the treatment assignment.

### Sampling Design

- The entire group of individuals that we want information about is called the **population**.
- A **sample** is the part of the population that we actually examine in order to gather information.
- Sampling should not be biased: no individual in the population should be favored.
  - Example of a biased sample: selecting goldfish from one particular store.
- The selection of one individual in the population should not affect the selection of the next individual (independence).
  - Example of a nonindependent sample: choosing cards from a deck without replacement.
- The sample should be large enough to adequately cover the population.
  - Example of a small sample: suppose only 20 physicians were used in the aspirin study.
- Sampling should have the smallest variability possible.
  - We know there is some error; we want to minimize it.

### Sampling Design: Sampling Techniques for a Single Population

#### Simple Random Sample (SRS)

In a simple random sample, every possible set of $n$ members of the population has an equal chance of being the sample selected; in particular, every member has an equal chance of being selected.

- Assign every individual a number and randomly select 30 numbers using a random number table or computer-generated random numbers.
- Example: obtain a list of all SSNs for individuals in the US who are over 65. Using a random number table, select 50 of them.
- Example: choose 3 of 28 hotels (textbook 3.2). Table B at the back of the book gives random digits; alternatively, use a computer package to choose the numbers randomly.

#### Stratified Random Sample

Divide the population into several strata, then take an SRS from each stratum.

*(Figure: a population divided into Strata 1 through 5, with an SRS taken from each stratum.)*

- Advantage: every stratum is guaranteed to be represented, and each is sampled at random.
- Example: obtain a list of all SSNs for individuals in the US who are over 65. Divide the SSNs into regions of the country (time zones). Then randomly sample 30 from each time zone.
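The simple random sample and the stratified random sample above, as well as the random assignment of 100 mice to placebo and aspirin described earlier, can all be carried out with computer-generated random numbers instead of Table B. The sketch below is a minimal illustration using Python's standard `random` module; the function names, the seed, the hotel labels 1 to 28, and the time-zone ID numbers are hypothetical stand-ins (not real SSNs) chosen only for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS of size n: every set of n members is equally likely to be chosen."""
    # Random.sample draws without replacement, uniformly over all n-member subsets.
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified random sample: a separate SRS from each stratum."""
    return {name: simple_random_sample(members, n_per_stratum, rng)
            for name, members in strata.items()}

def completely_randomized_assignment(units, treatments, rng):
    """Randomly split units into equal-sized treatment groups.

    Assumes len(units) is a multiple of len(treatments); any leftover
    units would be dropped by the integer division below.
    """
    shuffled = list(units)
    rng.shuffle(shuffled)  # a random ordering of the units
    size = len(shuffled) // len(treatments)
    return {t: shuffled[i * size:(i + 1) * size]
            for i, t in enumerate(treatments)}

rng = random.Random(2009)  # fixed seed so the run is reproducible

# Choose 3 of 28 hotels, labeled 1..28 (hypothetical labels).
hotels = list(range(1, 29))
print(simple_random_sample(hotels, 3, rng))

# Hypothetical ID numbers grouped by time zone (NOT real SSNs).
strata = {
    "Eastern": list(range(1000, 2000)),
    "Central": list(range(2000, 3000)),
    "Mountain": list(range(3000, 4000)),
    "Pacific": list(range(4000, 5000)),
}
by_zone = stratified_sample(strata, 30, rng)
print({zone: len(ids) for zone, ids in by_zone.items()})
# {'Eastern': 30, 'Central': 30, 'Mountain': 30, 'Pacific': 30}

# Completely randomized design: 100 mice, 50 to placebo and 50 to aspirin.
mice = list(range(1, 101))
groups = completely_randomized_assignment(mice, ["placebo", "aspirin"], rng)
print({t: len(g) for t, g in groups.items()})
# {'placebo': 50, 'aspirin': 50}
```

Stratifying only changes which pool each SRS is drawn from; the same `rng.sample` call does the work in both cases. Fixing the seed makes the draw repeatable, much like recording where in a random number table you started reading.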
#### Cluster Sample

Divide the population into several clusters, then take an SRS of clusters and use all the observations in each selected cluster.

*(Figure: cluster sampling from a population.)*

- Advantage: may be the only feasible method given the available resources.
- Example: obtain a list of all SSNs for individuals in the US who are over 65. Sort the SSNs by their last 4 digits, making each set of 100 a cluster. Use a random number table to pick the clusters; you may get the 4100s, 5600s, and 8200s, for example.

#### Multistage Sample

Divide the population into several groups (strata), choose a random subset of the groups, then take an SRS from each chosen group.

*(Figure: multistage sampling from a population.)*

- Advantage: may be the only feasible method given the available resources.
- Example: obtain a list of all SSNs for individuals in the US who are over 65. Divide the SSNs by state (50 states). Randomly select 10 states. Then randomly sample 40 from each of the selected states.

#### Sources of Bias

- **Voluntary response:** the response of people who select themselves by responding to a general appeal. Voluntary response samples are biased. Examples:
  - Internet surveys
  - Call-in surveys
- **Convenience sampling**, for example:
  - Sampling friends
  - Sampling at the mall
- **Dishonesty**, which can arise from:
  - Asking personal questions
  - Not allowing enough time to respond honestly
- **Undercoverage:** some groups in the population are left out when the sample is taken.
- **Nonresponse:** an individual chosen for the sample can't be contacted or does not cooperate.
- **Response bias:** results that are influenced by the behavior of the respondent or interviewer. Examples:
  - The wording of questions can influence the answers.
  - Respondents may not want to give truthful answers to sensitive questions.

### Sampling More Than One Population

- We sample from more than one population when we are interested in more than one variable.
- As previously discussed, one variable is chosen to be the response variable and the
other is selected as the explanatory variable.

Examples:

| Study | Explanatory variable | Response variable | Number of populations | Number of samples needed |
|---|---|---|---|---|
| Comparing decibel levels of 4 different brands of speakers | Brand | Decibel level | Four | Four |
| Determining time to failure of 3 different types of lightbulbs | Type | Time to failure | Three | Three |
| Comparing GRE scores for students from 5 different majors | Major | GRE score | Five | Five |

- Each sample should represent its own population well.
- Samples from more than one population should be as similar to each other as possible in every respect except for the explanatory variable; otherwise we may have confounding variables.
- Two variables are **confounded** if we cannot determine which one caused the differences in the response.

**Examples**

- Suppose we compared the decibel levels of the four speaker brands, measuring each brand with a different instrument.
  - We would not know whether the differences were due to the different brands or the different instruments.
  - Brand and instrument are then confounded.
- Suppose we compared the time to failure of the three types of lightbulbs, testing each type in a different light socket.
  - We would not know whether the differences were due to the different types of lightbulbs or the different light sockets.
  - Type and socket are then confounded.
- Suppose we obtained the GRE scores for each major from a different university.
  - We would not know whether the differences were due to the different majors or the different universities.
  - Major and university are
then confounded.

Confounding can be avoided by using good sampling techniques.

It is also possible that more than one (possibly several) explanatory variables influence a given response variable. For example, perhaps both the type of lightbulb and the type of light socket influence the time to failure of a lightbulb; it is likely that different types of lightbulbs work better in different sockets. This concept is known as **interaction**: the effect of one explanatory variable on the response differs across the levels of another explanatory variable.

- The groups we are interested in are called the **populations**.
- A **sample** is a subset of a population.
- **Parameters** are numbers describing the population. Parameters are fixed numbers, usually unknown.
- **Statistics** are numbers describing the sample. The value of a statistic is known once we have taken a sample. Statistics change from sample to sample.
- **Statistical inference** uses facts about a sample to estimate the truth about the whole population; that is, it uses a statistic to estimate an unknown parameter.
- There may be some error in making an inference, due to chance variation from sample to sample, improper sampling techniques, use of an inappropriate statistical method, etc.

**Example 1:** What is the average number of car accidents for a person over 65 in the United States?

- How many populations are of interest? One.
- What is the population of interest? All people in the US over age 65.

**Example 2:** For the entire world, is the IQ of women the same as the IQ of men?

- How many populations are of interest? Two.
- What are the populations of interest? All women and all men.

**Example 3:** How many times a day should I feed my goldfish?

- How many populations are of interest? One.
- What is the population of interest? All pet goldfish.

**Example 4:** Which is most effective at lowering the heart rate of mice: no drug (control), drug A, drug B, or drug C?

- How many populations are of interest? Four.
- What are the populations of interest? All mice taking no drug, all mice taking drug A, all mice taking drug B, and all mice taking drug C.

Suppose
we have no previous information about the questions addressed in these examples. How could we answer them?

- **Census**
  - Advantages: we get everyone, so we know the truth.
  - Disadvantages: expensive and difficult to obtain; may be impossible.
- **Sample**
  - Advantages: less expensive; feasible.
  - Disadvantages: uncertainty about the truth; instead of certainty, we may have error.
- If we take a census, we have everyone and have no need for inference.
- If we take a sample, we make inferences from the sample to the whole population.
- For these four questions, it is not likely that we can take a census; we will need to use a sample.
- For each population we are interested in, we must obtain a separate sample.

**Question:** Can aspirin reduce the risk of heart attack in humans?

**Sample:** 22,071 male physicians between the ages of 40 and 84 were randomly assigned to one of two groups. One group took an ordinary aspirin tablet every other day, headache or not. The other group took a placebo every other day; this group is the control group.

**Summary statistic:** the rate of heart attacks in the group taking aspirin was only 55% of the rate of heart attacks in the placebo group.

**Inference to population:** taking aspirin causes a lower rate of heart attacks in humans.

- It is common practice to use Greek letters for population parameters. We call the mean of a population $\mu$, the standard deviation of a population $\sigma$, and the variance $\sigma^2$. When we are talking about percentages, we call the population proportion $\pi$.
- For a given population there is only one true mean, one true standard deviation and variance, or one true proportion: $\mu$, $\sigma$, $\sigma^2$, and $\pi$ are parameters (fixed numbers).
- It is common practice to use Roman letters for statistics. We call the mean of a sample $\bar{x}$, the standard deviation of a sample $s$, and the variance $s^2$. When we are talking about percentages, we call the sample proportion $p$.
- There are many different possible samples that could be taken from a given population. For
each sample there may be a different sample mean, sample standard deviation, sample variance, or sample proportion: $\bar{x}$, $s$, $s^2$, and $p$ are statistics.

### Statistical Inference: Overview

We use sample statistics to make inferences about population parameters.

| | Population (parameter) | Sample (statistic) |
|---|---|---|
| Mean | $\mu$ | $\bar{x}$ |
| Proportion | $\pi$ | $p$ |

- There are many different samples that you can take from the population.
- Statistics can be computed on each sample.
- Since different members of the population are in each sample, the value of a statistic varies from sample to sample.
- Statistics are random variables, while parameters are constants.
- The **sampling distribution** of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
- We can then examine the shape, center, and spread of the sampling distribution of a statistic.
- **Bias** concerns the center of the sampling distribution. A statistic is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
- To reduce bias, use random sampling: the values of a statistic computed from an SRS neither consistently overestimate nor consistently underestimate the value of the population parameter.
- **Variability** is described by the spread of the sampling distribution. To reduce the variability of a statistic from an SRS, use a larger sample. You can make the variability as small as you want by taking a large enough sample.

*(Figure: bias and variability illustrated, from high bias with high variability to the ideal case of low bias with low variability.)*
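The sampling distribution is defined over *all* possible samples, which is usually far too many to list, but a simulation can approximate it. The sketch below works under stated assumptions: it builds a hypothetical population of 10,000 normally distributed values (mean 100, standard deviation 15, made up for illustration), draws 1,000 SRSs of each size $n$, and summarizes the resulting sample means. The function name and all numbers are illustrative choices, not part of the notes.

```python
import random
import statistics

def simulated_sample_means(population, n, reps, rng):
    """Approximate the sampling distribution of x-bar with `reps` SRSs of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

rng = random.Random(254)  # fixed seed for reproducibility

# Hypothetical population (made-up values, not real data).
population = [rng.gauss(100, 15) for _ in range(10_000)]
mu = statistics.fmean(population)  # the parameter: one fixed number

results = {}
for n in (10, 100, 1000):
    means = simulated_sample_means(population, n, reps=1000, rng=rng)
    center = statistics.fmean(means)   # should sit close to mu: x-bar is unbiased
    spread = statistics.stdev(means)   # shrinks as n grows (roughly sigma / sqrt(n))
    results[n] = (center, spread)
    print(f"n={n:4d}  center={center:7.2f}  spread={spread:5.2f}  mu={mu:7.2f}")
```

The centers should stay close to $\mu$ for every $n$ (no bias), while the spread should fall by roughly a factor of $\sqrt{10}$ each time $n$ grows tenfold, which is the "larger sample, less variability" point above.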

