lock5 datasets

Description

School: Northern Arizona University
Department: Statistics
Course: Applied Statistics
Professor: Nellie Gopaul
Term: Fall 2016
Tags:
Cost: Free
Name: Stats 270, week one notes
Description: These are notes on the first chapter, covering the very basics of statistics.
Uploaded: 04/07/2017
Section 1.1 The Structure of Data
Statistics: Unlocking the Power of Data Lock5

Outline
● Data
● Cases and variables
● Categorical and quantitative variables
● Explanatory and response variables
● Using data to answer a question

Why Statistics?
● Statistics is all about DATA
○ Collecting DATA
○ Describing DATA: summarizing, visualizing
○ Analyzing DATA
● Data are everywhere! Regardless of your field, interests, lifestyle, etc., you will almost definitely have to make decisions based on data, or evaluate decisions someone else has made based on data

Data
● Data are a set of measurements taken on a set of individual units
● Usually data are stored and presented in a dataset, composed of variables measured on cases

Cases and Variables
A case or unit is a subject/object in the study that we want information on.
A variable is any characteristic that is recorded for each case.
Ex: If I ask 5 students for their favorite flavor of ice cream:
There are 5 cases: the 5 students.
The variable is “favorite flavor of ice cream”.

Intro Statistics Survey Data

Categorical versus Quantitative
● Variables are classified as either categorical or quantitative:
○ A categorical variable divides the cases into groups
○ A quantitative variable measures a numerical quantity for each case

Categorical and Quantitative
Classify each of the following variables from the StudentSurvey data as either categorical or quantitative:
● Year in School
● Gender
● HigherSAT (which is higher: Math or Verbal?)
● SAT score
● GPA
● # of siblings
● Height
● Weight
● Exercise
● Hours of TV per week
● Pulse rate
● Award preference (Olympic Gold, Academy Award, or Nobel Prize?)
Sort each variable into one of two columns: Categorical or Quantitative.

Categorical variables
● Ordinal variables: measurements have a meaningful order.
Ex: Letter grade: A, B, C, D, F. Patient condition: Good, Fair, Serious, Critical.
● Nominal variables: measurements are unordered.
Ex: Gender: male, female. Eye color: blue, brown, green, black.

Quantitative variables
● Discrete variables: they take on only finite or countably infinite “isolated” values.
Ex: number of siblings, number of dogs
● Continuous variables: they take on any value in an interval.
Ex: height, weight, etc.

Using Data to Answer a Question
QUESTION: If you are romantically interested in someone, should you be obvious about it, or should you play hard to get?
Let’s Collect Some Data!

Romance
What type of person are you generally more romantically interested in?
(a) Someone who is obviously into you
(b) Someone who plays hard to get

One or Two Variables
● Sometimes we are interested in one variable, as in whether people prefer obvious romantic interest or hard to get
● Other times we are interested in the relationship between two variables, such as
1) prefer obvious interest or hard to get?
2) gender

Explanatory and Response
If we are using one variable to help us understand or predict values of another variable, we call the former the explanatory variable and the latter the response variable.
The variable that helps us understand is the explanatory variable. The variable we predict is the response variable.
Examples:
● Does meditation help reduce stress?
● Does sugar consumption increase hyperactivity?

Summary
● Data are everywhere, and pertain to a wide variety of topics
● A dataset is usually composed of variables measured on cases
● Variables are either categorical or quantitative
● Data can be used to provide information about essentially anything we are interested in and want to collect data on!

Section 1.2 Sampling from a Population

Outline
● Sample versus Population
● Statistical Inference
● Sampling Bias
● Simple Random Sample
● Other Sources of Bias

Sample versus Population
A population includes all individuals or objects of interest.
A sample is all the cases that we have collected data on (a subset of the population).
Statistical inference is the process of using data from a sample to gain information about the population.

The Big Picture
Diagram: Population → (sampling) → Sample; Sample → (statistical inference) → Population

Dewey Defeats Truman?
● The newspaper carrying this headline was published before the conclusion of the 1948 presidential election, and was based on the results of a large telephone poll which showed Dewey sweeping Truman
● However, Harry S. Truman won the election
● What went wrong?

Sampling Bias
Sampling bias occurs when the method of selecting a sample causes the sample to differ from the population in some relevant way.
● If sampling bias exists, we cannot trust generalizations from the sample to the population

Sampling
GOAL: Select a sample that is similar to the population, only smaller

Random Sampling
● How can we make sure to avoid sampling bias? Take a RANDOM sample!
● Imagine putting the names of all the units of the population into a hat, and drawing out names at random to be in the sample
● More often, we use technology

Random Sampling
● Before the 2008 election, the Gallup Poll took a random sample of 2,847 Americans. 52% of those sampled supported Obama
● In the actual election, 53% voted for Obama
● Random sampling is a very powerful tool!

Random vs Non-Random Sampling
● Random samples have averages that are centered around the correct number
● Non-random samples may suffer from sampling bias, and averages may not be centered around the correct number
● Only random samples can truly be trusted when making generalizations to the population!

Bowl of Soup Analogy
Think of tasting a bowl of soup…
● Population = entire bowl of soup
● Sample = whatever is in your tasting bites
● If you take bites non-randomly from the soup (if you stab with a fork, or prefer noodles to vegetables), you may not get a very accurate representation of the soup
● If you take bites at random, only a few bites can give you a very good idea of the overall taste of the soup

Simple Random Sample
In a simple random sample, each unit of the population has the same chance of being selected, regardless of the other units chosen for the sample
● More complicated random sampling schemes exist, but will not be covered in this course

Realities of Sampling
● While a random sample is ideal, often it isn’t feasible. A list of the entire population may not be available, or it may be impossible or too difficult to contact all members of the population.
● Sometimes, your population of interest has to be altered to something more feasible to sample from. Generalizations of results are limited to the population that was actually sampled from.
● In practice, think hard about potential sources of sampling bias, and try your best to avoid them

Non-Random Samples
Suppose you want to estimate the average number of hours that students spend studying each week. Which of the following is the best method of sampling?
(a) Go to the library and ask all the students there how much they study
(b) Email all students asking how much they study, and use all the data you get
(c) Give a clicker question in this class and force every student to respond
(d) Stand outside the student center and ask everyone going in how much they study

Bad Methods of Sampling
● Sampling units based on something obviously related to the variable(s) you are studying
○ Sampling only students in the library when asking how much they study, or sampling only students taking a statistics class
○ “Today’s Poll” on fitnessmagazine.com asked “Have you ever hired a personal trainer?”
27% of respondents said “yes”. Can we infer that 27% of all humans have hired a personal trainer?

Bad Methods of Sampling
● Letting your sample be composed of whoever chooses to participate (volunteer bias)
● People who choose to participate or respond are probably not representative of the entire population
○ Emailing or mailing the entire population, and then making conclusions about the population based on whoever chooses to respond
○ Example: An airline emails all of its customers asking them to rate their satisfaction with their recent travel

Data Collection and Bias
Diagram: Population → (sampling bias?) → Sample → (other forms of bias?) → DATA

Other Forms of Bias
● Even with a random sample, data can still be biased, especially when collected on humans
● Other forms of bias to watch out for in data collection:
○ Question wording
○ Context
○ Inaccurate responses
○ Many other possibilities: examine the specifics of each study!

Question Wording
● “Do you think the US should allow public speeches against democracy?”
21% said speeches should be allowed
● “Do you think the US should not forbid public speeches against democracy?”
39% said speeches should not be forbidden
Source: Rugg, D. (1941). “Experiments in wording questions,” Public Opinion Quarterly, 5, 91-92.
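
As an aside on the sampling ideas from earlier in this section: the claim that random samples are centered on the correct number, while non-random samples may not be, can be checked with a small simulation. The sketch below is illustrative only. The population of 10,000 students, their study hours, the rule that library students are those who study 20 or more hours a week, and the sample size of 500 are all made-up assumptions, not data from the course.

```python
import random
from statistics import mean

rng = random.Random(270)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 students whose weekly study hours
# cycle evenly through 0, 1, ..., 39 (an assumption for illustration).
population = [i % 40 for i in range(10_000)]
true_mean = mean(population)  # 19.5 for this made-up population

# Simple random sample: every student has the same chance of selection.
srs = rng.sample(population, 500)
srs_mean = mean(srs)

# Non-random sample: only students found in the library, assumed here
# to be exactly those who study 20 or more hours per week.
library_students = [hours for hours in population if hours >= 20]
biased = rng.sample(library_students, 500)
biased_mean = mean(biased)

print(f"Population mean:     {true_mean:.1f} hours")
print(f"Random sample mean:  {srs_mean:.1f} hours")
print(f"Library sample mean: {biased_mean:.1f} hours")
```

In this setup the random sample lands near the population mean, while the library sample overestimates it no matter how many library students are asked. A larger biased sample does not remove the bias.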
Question Wording
● A random sample was asked: “Should there be a tax cut, or should money be used to fund new government programs?”
Tax Cut: 60%   Programs: 40%
● A different random sample was asked: “Should there be a tax cut, or should money be spent on programs for education, the environment, health care, crime-fighting, and military defense?”
Tax Cut: 22%   Programs: 78%

Summary
Always think critically about how the data were collected, and recognize that not all forms of data collection lead to valid inferences
● This is the easiest way to instantly become a more statistically literate individual!

Section 1.3 Experiments and Observational Studies

Outline
● Association versus Causation
● Confounding Variables
● Observational Studies vs Experiments
● Randomized Experiments

Association and Causation
Two variables are associated if values of one variable tend to be related to values of the other variable.
Two variables are causally associated if changing the value of the explanatory variable influences the value of the response variable.

Causal Association?
“Daily Exercise Improves Mental Performance”
The wording of this headline implies…
a) Association (not necessarily causal)
b) Causal Association
This implies that exercising daily will improve (change) your mental performance.

Causal Association?
“Want to lose weight? Eat more fiber!”
The wording of this headline implies…
a) Association (not necessarily causal)
b) Causal Association
This implies that eating fiber will cause you to lose weight.

Causal Association?
“Cat owners tend to be more educated than dog owners”
The wording of this headline implies…
a) Association (not necessarily causal)
b) Causal Association
There is no claim that owning a cat will change your education level.

TVs and Life Expectancy
Should you buy more TVs to live longer?
Association does not imply causation!

Confounding Variable
A third variable that is associated with both the explanatory variable and the response variable is called a confounding variable
● A confounding variable can offer a plausible explanation for an association between the explanatory and response variables
● Whenever confounding variables are present (or may be present), a causal association cannot be determined

TVs and Life Expectancy
Diagram: Wealth is linked to both Number of TVs per capita and Life Expectancy; the direct link between those two is marked “?”

Experiment vs Observational Study
An observational study is a study in which the researcher does not actively control the value of any variable, but simply observes the values as they naturally exist.
An experiment is a study in which the researcher actively controls one or more of the explanatory variables.

Observational Studies
● There are almost always confounding variables in observational studies
● Observational studies can almost never be used to establish causation

It’s a Common Mistake!
“The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.” - Stephen Jay Gould

Randomization
● How can we make sure to avoid confounding variables?
RANDOMLY assign values of the explanatory variable

Randomized Experiment
In a randomized experiment the explanatory variable for each unit is determined randomly, before the response variable is measured

Randomized Experiment
● The different levels of the explanatory variable are known as treatments
● Randomly divide the units into groups, and randomly assign a different treatment to each group
● If the treatments are randomly assigned, the treatment groups should all look similar

Randomized Experiments
● Because the explanatory variable is randomly assigned, it is not associated with any other variables. Confounding variables are eliminated!
Diagram: Confounding Variable, Explanatory Variable and Response Variable in a RANDOMIZED EXPERIMENT

Randomized Experiments
● If a randomized experiment yields a significant association between the two variables, we can establish causation from the explanatory to the response variable
Randomized experiments are very powerful! They allow you to infer causality.

How to Randomize?
● Option 1: As with random sampling, we can put all the names/numbers into a hat, and randomly pull out names to go into the different groups
● Option 2: Put names/numbers on cards, shuffle, and deal out the cards into as many piles as there are treatments
● Option 3: Use technology

Knee Surgery for Arthritis
Researchers conducted a study on the effectiveness of a knee surgery to cure arthritis. It was randomly determined whether people got the knee surgery. Everyone who underwent the surgery reported feeling less pain.
Is this evidence that the surgery causes a decrease in pain?
(a) Yes
(b) No
Need a control or comparison group.
What would happen without surgery?

Control Group
● When determining whether a treatment is effective, it is important to have a comparison group, known as the control group
● It isn’t enough to know that everyone in one group improved; we need to know whether they improved more than they would have improved without the surgery
● All randomized experiments need either a control group, or two different treatments to compare

Knee Surgery for Arthritis
● In the knee surgery study, those in the control group received a fake knee surgery. They were put under and cut open, but the doctor did not actually perform the surgery. All of these patients also reported less pain!
● In fact, the improvement was indistinguishable between those receiving the real surgery and those receiving the fake surgery!
Source: “The Placebo Prescription,” NY Times Magazine, 1/9/00.

Placebo Effect
● Often, people will experience the effect they think they should be experiencing, even if they aren’t actually receiving the treatment.
● This is known as the placebo effect
● One study estimated that 75% of the effectiveness of anti-depressant medication is due to the placebo effect
● For more information on the placebo effect (it’s pretty amazing!) read “The Placebo Prescription”

Placebo and Blinding
● Control groups should be given a placebo, a fake treatment that resembles the active treatment as much as possible
● Using a placebo is only helpful if participants do not know whether they are getting the placebo or the active treatment
● If possible, randomized experiments should be double-blinded: neither the participants nor the researchers involved should know which treatment the patients are actually getting

Green Tea and Prostate Cancer
● A study was conducted on 60 men with PIN lesions, some of which turn into prostate cancer
● Half of these men were randomized to take 600 mg of green tea extract daily, while the other half were given a placebo pill
● The study was double-blind: neither the participants nor the doctors knew who was actually receiving green tea
● After one year, only 1 person taking green tea had gotten cancer, while 9 taking the placebo had gotten cancer

Green Tea and Prostate Cancer
A difference this large is unlikely to happen just by random chance. Can we conclude that green tea really does help prevent prostate cancer?
(a) Yes
(b) No
Good randomized experiments allow conclusions about causality.

Types of Randomized Experiments
● Randomizing cases into different treatment groups is called a randomized comparative experiment
● We can also give each treatment to each case, and just randomize the order in which treatments are received: a matched pairs experiment
● Both are valid randomized experiments!

Why not always randomize?
● Randomized experiments are ideal, but sometimes not ethical or possible
● Often, you have to do the best you can with data from observational studies
● Example: research for the Supreme Court case as to whether preferences for minorities in university admissions help or hurt the minority students

Randomization in Data Collection
Was the sample randomly selected?
○ Yes: Possible to generalize to the population
○ No: Should not generalize to the population
Was the explanatory variable randomly assigned?
○ Yes: Possible to make conclusions about causality
○ No: Cannot make conclusions about causality

Two Fundamental Questions in Data Collection
Diagram: Population → (random sample?) → Sample → (randomized experiment?) → DATA

Randomization
● Doing a randomized experiment on a random sample is ideal, but rarely achievable
● If the focus of the study is using a sample to estimate a quantity for the entire population, you need a random sample, but do not need a randomized experiment (example: election polling)
● If the focus of the study is establishing causality from one variable to another, you need a randomized experiment and can settle for a non-random sample (example: drug testing)

Summary
● Association does not imply causation!
● In observational studies, confounding variables almost always exist, so causation cannot be established
● Randomized experiments involve randomly determining the level of the explanatory variable
● Randomized experiments prevent confounding variables, so causality can be inferred
● A control or comparison group is necessary
● The placebo effect exists, so a placebo and blinding should be used

http://xkcd.com/552/
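
Two ideas from Section 1.3 can be sketched in Python: using technology to randomize (Option 3 under “How to Randomize?”), and checking how surprising the green tea result would be if the extract did nothing. This is a hedged sketch, not the method used in the study. The function names, participant labels and seed are hypothetical, and the chance calculation assumes that, with no treatment effect, the 10 cancer cases would be equally likely to fall on any of the 60 men.

```python
import random
from math import comb


def random_assign(units, treatments, seed=None):
    """Shuffle the units and deal them into one group per treatment,
    like shuffling name cards and dealing them into piles."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for position, unit in enumerate(shuffled):
        groups[treatments[position % len(treatments)]].append(unit)
    return groups


def prob_at_most(k_max, cases, group_size, total):
    """Chance that at most k_max of the `cases` fall in one group of
    `group_size` units when cases land at random among `total` units."""
    favorable = sum(
        comb(group_size, k) * comb(total - group_size, cases - k)
        for k in range(k_max + 1)
    )
    return favorable / comb(total, cases)


# Hypothetical labels for the 60 men in the study.
men = [f"participant_{i}" for i in range(1, 61)]
groups = random_assign(men, ["green tea", "placebo"], seed=2016)
print({name: len(members) for name, members in groups.items()})

# 10 cancers in total (1 + 9); how likely is 1 or fewer in the
# green tea group of 30 if the treatment made no difference?
p_chance = prob_at_most(1, cases=10, group_size=30, total=60)
print(f"Chance of a split this lopsided: {p_chance:.4f}")
```

Under these assumptions the probability comes out well below 1%, which is consistent with the statement in the notes that a difference this large is unlikely to happen just by random chance.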
