# INTRO TO STATISTICS STAT 104

ISU

These 26 pages of class notes were uploaded by Giovani Ullrich PhD on Saturday, September 26, 2015. The notes belong to STAT 104 at Iowa State University, taught by Anna Peterson in Fall.

Date Created: 09/26/15

## Stat 104, Section B

Anna Peterson

## Stat 104

- Instructor: Anna Peterson
- E-mail: ericksainastateedu, annaericksengmailcom
- Office hours: Snedecor 2211, MWF 100071100
- Lecture: MWF 814079140, MoleBio 1420
- Laboratory: TF 120320, Carver 0305
- Required text: Just the Essentials of Elementary Statistics, 10th Edition, Robert Johnson & Patricia Kuby, Thomson Brooks/Cole

## Prerequisites

- Make sure you can do basic algebra. There will be a pretest passed out.
- Understand summation notation.
- Order of operations.
- Make sure you can use a calculator. Bring your calculator to class and lab.

## How can you do well in this class?

- Attend all lectures and pay attention.
- Attend all labs and participate.
- Complete all assignments.
- Go over answers to assignments.
- READ and STUDY the textbook.
- Come to office hours with questions.
- Form study groups with fellow students.

## Course Information and Policies

### Exams

- In-class exams will be given during the lab period.
- Bring a calculator, pencil/pen, formula paper, and tables.
- In class: exams will be given during the 2-hour lab period on Fridays.
- Final exam: August 7th.
- Three 8x11 sheets of paper, typed/written on one side each, may be used on the final exam: one from each of the two previous exams and one for the new material.

### Lab

- There are two weekly two-hour laboratory sessions scheduled.
- Bring your book, class notes, tables, and a calculator to the lab.
- Class participation points are given for presenting a homework problem solution during lab. Each student is only required to present one solution. You will receive 10 points for presenting.
- There are 11 labs this summer. I will take your top 8 lab scores towards your lab grade. Each lab is worth 5 points.
- Extra presentations and better lab attendance will influence boundary grades at the end of the semester.

### Homework

- Individual practice is an important part of learning. For this reason, homework problems will be assigned throughout the semester.
- I encourage everyone to attempt the problems.
- Please note that homework will not be
collected. Attach your completed homework to a quiz to receive a minimum of 5 points on a quiz.

### Quizzes

- Quizzes will be administered during lab/class after homework is due.
- The quizzes will be open note, open book.
- You will have 10 minutes to complete the quiz.
- The quizzes will be composed of questions based on the homework.
- Each quiz is worth 10 points. Up to 8 quizzes will be given. Only your top five scores will be used to determine your total points (max 50 points).
- Attach your completed homework to the quiz and receive a minimum of 5 points on the quiz.

### Project

- A project will be assigned during the semester. This project is intended to expose students to the collection and statistical analysis of data to solve real-world problems.
- Students will work in groups.
- Specific details will be given later in the semester.

### Grading

Letter grades, including plus/minus, will be given based on performance on exams, class participation, and the project.

## Tentative Schedule/Grading

- Exam I: 100 points
- Exam II: 100 points
- Quizzes: 50 points (10 pts each)
- Project: 50 points
- Class participation: 50 points (10 for the presentation and 40 for labs)
- Final exam: 150 points
- Total: 500 points

## What Is Statistics?

- Wikipedia: "A mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the natural and social sciences to the humanities, government and business."
- Statistics is the science of collecting, describing, and interpreting data, allowing for data-based decision making.
- "I like to think of statistics as the science of learning from data." (Jon Kettenring, ASA President, 1997)
- In business and industry, statistics can be used to quantify unknowns in order to optimize resources, e.g.:
  - Predict the demand for products and services.
  - Check the quality of items manufactured in a facility.
- In agriculture,
statistics can be used to:
  - Predict crop yields.
  - Estimate the minimum fertilizer needed.

Statistics is about variation.

- The world is full of data.
- Data exhibit variation.
- Recognizing, displaying, and quantifying variation in data can help us make sense of the world.
- Try to explain variation.

## We Distinguish Between Descriptive and Inferential Statistics

### Descriptive Statistics

The collection, presentation, and description of data in the form of graphs, tables, and numerical summaries such as averages, variances, etc.

- Look for patterns.
- Summarize and present data.
- Quick information.
- Compare several groups, i.e., one can easily look for differences and similarities.

### Inferential Statistics

Deals with the interpretation of data as well as drawing conclusions and making generalizations based on data for a larger group of subjects.

- Making data-based decisions.
- A larger group of individuals.

### Example

Before movies are released, they are previewed by a selected audience. Assume 200 people are asked to provide an overall rating for a movie.

Results: 24% very satisfied, 26% satisfied.

- "24% of the 200 previewers were very satisfied with the movie": this is a descriptive statement based on a sample of 200 previewers.
- "24% of all people going to see the movie will be very satisfied": this is an inferential statement for the entire population of individuals.

## Statistics is the science of

- Collecting
- Describing (displaying)
- Interpreting

data.

We collect data to answer a specific question of interest:

- Does nitrogen improve corn yield?
- What seed is best?
- What is the relationship between rainfall and yield?
- Does this new drug cure the disease? Is it safe?
- What do voters think about a candidate or an issue?

## Does nitrogen improve corn yield?

We have a question that we would like to answer. Are we interested in all corn, just one brand of corn, or only corn grown in Iowa?

The group that is to be studied is called the population, and each element of the population is called an individual.
Suppose we decide that we are specifically interested in all corn types grown in Iowa. Is it feasible to collect data from every single cornfield in the state of Iowa? No: not enough time or money.

We look for a reasonable subset of the population, called a sample. Perhaps one farm from each county in Iowa.

- Population: all farms in Iowa.
- Individual: a single farm.

Key terms:

- Parameter: a numerical value summarizing all the data of the entire population. Example: the population mean yield of corn.
- Population: all items of interest. Example: all farms in Iowa.
- Statistic: a numerical value summarizing the sample. Example: the sample mean yield of corn.
- Sample: a few items from the population. Example: 10 farms in Iowa.

Once we have data collected from our sample, we can look at the statistics.

- Statistics are the numbers summarizing the data. They are sometimes referred to as point estimates.
- We hope these statistics are good estimates of the parameters, which are the numbers summarizing the population.

What do we need to measure to answer the question of interest?

- Variables are the characteristics of the individuals within the population.
- What would be the variable for the question "Does nitrogen improve corn yield?" Yield.
- Is this variable qualitative or quantitative? Quantitative.

## Data

Information: context is important.

- Who are we collecting data on? Cases (rows in a data table).
- What data are we collecting? Variables (columns in a data table).

Example:

- Who: people in Ames.
- What: age, number of children, happiness, job title, gender.

## Variables

What: a variable is a characteristic of interest about each individual element of a population or sample.

- Categorical vs. numeric
- Nominal vs. ordinal
- Continuous vs. discrete

### Variable: Qualitative vs. Quantitative

- Qualitative (categorical): describes or categorizes an element of a population. Examples: gender, hair color, eye color.
- Quantitative (numerical): quantifies an element of a population. Examples: age, height, shoe size.

### Qualitative (Categorical)

- Nominal: names an element. Examples: gender, hair color, type of vehicle you own, eye color.
- Ordinal: incorporates an ordered position or ranking. Examples: level of satisfaction with a product,
heat setting on a microwave (low, med, high).

### Quantitative

- Discrete: assumes a countable number of values. Examples: age, number of siblings, dozens of eggs (things you can count).
- Continuous: assumes an uncountable number of values. Examples: height, weight, distance (measurements).

### Summary

- Variable
  - Qualitative (categorical): describes or categorizes an element of a population.
    - Nominal: names an element.
    - Ordinal: incorporates an ordered position or ranking.
  - Quantitative (numerical): quantifies an element of a population.
    - Discrete: assumes a countable number of values.
    - Continuous: assumes an uncountable number of values.

Applied to the Ames data:

- Nominal: gender, job title.
- Ordinal: happiness (ordered by amount of happiness).
- Continuous: average wage.
- Discrete: age, number of children.

### Example

Gallup News Service conducted a survey of 1012 adults aged 18 years or [...] the household had experienced some type of crime during the preceding [...] concluded that 24% of all households had [...].

- a. Identify the research objective.
- b. Identify the sample.
- c. List the descriptive statistics.
- d. What is the corresponding parameter?
- e. State the conclusions made in the study.

## Data Collection

- Sampling studies (surveys): Ch 14
- Experiments: notes only

## Sample Surveys

Idea 1: Examine a part of the whole.

- Easier to obtain.
- Easier to work with.
- Population: all items of interest.
- Sample: a few items from the population.

## Properties of a Sample (Part of a Whole)

- We would like the sample to be representative of the population. It should look like a smaller version of the population.
- This may not be possible, but at least we would like a sample that is not biased.
- A biased sample is one that over- or under-represents a certain portion of the population.
- Telephone surveys: how biased?

## Sample Surveys

Idea 2: Random selection.

- Selecting items from the population should be done at random so as to reduce the chance of getting a biased sample.
- Random selection is a KEY idea in data collection.

## Sample Surveys

Idea
3: It's the sample size.

- What fraction of the population is sampled is not important.
- The size of the sample is the important thing, i.e., 1,000 items in a sample tell you just as much about a population of size 1,000,000 as they do about a population of size 100,000,000.

## What About a Census?

Would a census (complete enumeration of the population) be a better way to go?

- Difficult to do.
- Populations are often dynamic.
- Can be more complex.
- More expensive.
- Example: the Decennial US Census (next one in 201).

## Example

- Population: all students at ISU.
- Question: Have you posted a video on YouTube?
- Population parameter: the proportion of all ISU students who would answer yes.
- Sample: 400 ISU students.
- Sample statistic: the proportion of the 400 students in the sample who say yes.

How should we select the 400?

## Two Weak Approaches (Don't Do These)

- Voluntary response: put an ad in the ISU Daily with the question and ask students to drop off their answers. Problems?
- Convenience sampling: go to computer labs across campus and ask the first 400 students you meet. Ask 100 people entering a football game about their favorite sport. Problems?

## Sampling Studies

- Single stage: the elements of the sampling frame are treated equally and there is no subdividing or partitioning of the frame.
  - Simple random sample
  - Systematic
- Multiple stage: the elements are subdivided and the sample is chosen in more than one stage.
  - Stratified
  - Cluster

## Simple Random Sample (SRS)

[The text of the following slides on sampling methods is illegible in the source.]

## Nonsampling and Sampling Errors

Nonsampling errors are errors that result from the survey process. They are due to the
nonresponse of individuals selected to be in the survey, to inaccurate responses, to poorly worded questions, to bias in the selection of individuals to be given the survey, and so on. There is a multitude of reasons. We want to eliminate them if possible.

Sampling error is the error that results from using sampling to estimate information regarding a population. This type of error occurs because a sample gives incomplete information about the population.

### Example

The following surveys are flawed. Determine whether the sampling method or the survey itself is flawed. For flawed surveys, identify the cause of the error and suggest a remedy.

1. A magazine is conducting a study on the effects of infidelity in a marriage. The editors randomly select 400 women whose husbands were unfaithful and ask, "Do you believe a marriage can survive when the husband destroys the trust that must exist between husband and wife?"
2. A college vice president wants to conduct a study regarding the achievement of undergraduate students. He selects the first 50 students who enter the building on a given day and administers his survey.
3. A polling organization is going to conduct a study to estimate the percentage of households that speak a foreign language as the primary language. It mails a questionnaire to 1023 randomly selected households throughout the United States and asks the head of household if a foreign language is the primary language spoken in the home. Of the 1023 households selected, 12 responded.

## Observational Studies

- Observational studies are those in which the researcher is a passive observer, simply observing what happens.
- A sample survey is an observational study. There are other observational studies that are not surveys.
- We can't make cause-and-effect inferences based on observations.

### Tanning and Skin Cancer

- 1500 people.
- Some had skin cancer and some did not have skin cancer.
- Asked all participants whether they used tanning beds.

### Diet and Blood Pressure

- Enroll 100 individuals in the study.
- Give each a diet diary.
- Everything eaten each
day is recorded.
- From the diary entries, the amount of sodium in the diet is calculated.
- Measure blood pressure.

## Differences

- Retrospective: look at past records and historical data (Tanning and Skin Cancer).
- Prospective: identify subjects and collect data as events unfold (Diet and Blood Pressure).

## Experiments

- Intentionally apply a treatment to individuals (referred to as experimental units).
- Attempts to isolate the effects of the treatment on a response variable.
- Terminology:
  - Explanatory variable / factor
  - Response variable
  - Subjects / participants / experimental units
  - Treatments

Definitions:

- Designed experiment: a controlled study in which one or more treatments are applied to experimental units. The experimenter then observes the effect of varying these treatments on a response variable.
- Experimental unit: a person, object, or some other well-defined item upon which a treatment is applied.
- Predictor (explanatory) variables: the factors that affect the response variable. Also referred to as independent variables.
- Treatment: a condition applied to the experimental unit (levels of the factors).
- Response variable: a quantitative or qualitative variable that represents the variable of interest. Also referred to as the dependent variable.
- Extraneous variables: variables that are neither response nor predictor variables. They may affect the outcome of the experiment but are not controlled by the experimenter.

More on experiments:

- The experimenter must actively and deliberately manipulate the factors to establish the method of treatment.
- Interested in: what might happen if I change this factor?
- Experimental units are assigned at random to the treatments.

### Controlling Cholesterol

Does a higher dose of a new drug lower cholesterol more?

- 30 participants.
- Factor: drug dose.
- Treatments: 10 mg or 20 mg.
- 15 subjects randomly assigned to each treatment.
- Response: change in cholesterol.

## 3 Experimental Principles

- Control: outside variables; control group.
- Random assignment.
- Replication: within an experiment; repeating an entire
experiment.

### 1a. Control Outside Variables

- Control outside variables that may affect the response.
- Have subjects of the same age, gender, general health, and ethnic group.
- By controlling outside variables, you prevent those variables from causing variation in the response.
- This limits the extent to which you can generalize results.

### 1b. Control Group

- Have a group that receives 0 mg.
- The 0 mg pill is called a placebo (no active ingredient).
- The control group allows the experimenter to establish whether the drug is effective at all in reducing cholesterol, by providing a means to see what the natural change in average cholesterol is during the experiment.

### 2. Random Assignment

- Tends to spread the effects of uncontrolled outside variables evenly across the treatment groups.
- Reduces the chance that an uncontrolled outside variable will bias the results.
- Does NOT prevent uncontrolled variables from changing, but does lessen the impact of these changes due to the even spread.

### 3. Replication

Within an experiment:

- Have several experimental units in each treatment group.
- Able to assess the natural variation in the response for units treated the same way.

Repeating the entire experiment:

- This is especially important if the subjects in an experiment are not randomly selected from a population.
- Are the results of the entire experiment repeatable?

## Diagram of an Experiment

- The subjects are split into Group 1 (several subjects), which receives Treatment 1, and Group 2 (several subjects), which receives Treatment 2.
- The responses of the two groups are then compared.
- Notice that we have both random assignment AND replication within the experiment (several subjects within each group).

## Example

A school psychologist wants to test the effectiveness of a new method for teaching reading. She selects five hundred first-grade students in District 203 [...]. Group 1 is taught by the new method, while Group 2 is taught via traditional methods. The same teacher is assigned to teach both groups. At the end of the year, an achievement test is administered and the results of the two groups are compared.

1. What is the response variable in this experiment?
2. What is the
treatment? How many levels does the treatment have?
3. Are any of the predictor variables controlled?
4. How does the researcher's design handle students from different socioeconomic levels?
5. Identify the experimental units.

## Example

In a study, [...] of eight bags of the same brand of microwave popcorn is used. The first bag [...] and the number [...]. The next bag, selected at random, is popped for 2 minutes and 15 seconds [...]. The remaining bags are selected at random and popped for 15 seconds longer than the previous bag, and the number of unpopped kernels is counted. The same microwave, set on high, is used for all bags.

- Is this an observational study or a designed experiment? How do you know?
- What are the experimental units?
- What is the explanatory variable?
- What is the treatment, and how many levels does the treatment have?
- What is the response variable?
- Is the response variable qualitative or quantitative? If it is quantitative, is it discrete or continuous?
- Are the three key "ingredients" taken care of?
  - Control
  - Random assignment
  - Replication

## Multiple Factors

Use of exam aids on scores on a Statistics exam.

- Experimental units: students in the Statistics class.
- Factors:
  - Calculator: Yes or No (2 levels)
  - Formula sheet: Yes or No (2 levels)
- Treatments: combinations of using or not using each exam aid.
- The students will be randomly assigned to the treatment groups so that there are several students in each group.
- All students will be given the same statistics exam, and the score on the exam will be the response variable.
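
The multiple-factor design above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster of 20 students, the function names `build_treatments` and `assign_at_random`, and the seed are all hypothetical and do not come from the notes. It lists every combination of the two exam-aid factors and then deals shuffled students out to the treatments so that each group gets several students (random assignment plus replication).

```python
import itertools
import random

# Factors and levels taken from the exam-aids example above.
factors = {
    "calculator": ["Yes", "No"],
    "formula_sheet": ["Yes", "No"],
}


def build_treatments(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in itertools.product(*factors.values())]


def assign_at_random(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn.

    Group sizes differ by at most one unit.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {index: [] for index in range(len(treatments))}
    for position, unit in enumerate(shuffled):
        groups[position % len(treatments)].append(unit)
    return groups


treatments = build_treatments(factors)

# Hypothetical roster of 20 students; the notes do not give a class size.
students = [f"student_{i:02d}" for i in range(1, 21)]
groups = assign_at_random(students, treatments, seed=104)

for index, treatment in enumerate(treatments):
    print(treatment, "->", groups[index])
```

Two factors with two levels each give 2 x 2 = 4 treatments, so 20 students end up in four groups of five. The same `assign_at_random` helper would also cover the cholesterol example (30 participants, two doses, 15 per group).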

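
For the YouTube survey example earlier in these notes ("How should we select the 400?"), a simple random sample might be drawn as in the sketch below. Everything numeric here apart from the sample size of 400 is an assumption made so the code runs: the frame of 30,000 student IDs, the roughly 30% "yes" rate, and the names `simple_random_sample` and `sample_proportion` are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame.

    random.sample selects without replacement, so every subset of
    size n is equally likely to be chosen.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)


def sample_proportion(responses):
    """Fraction of 'yes' answers (True values) among the responses."""
    return sum(responses) / len(responses)


# Hypothetical sampling frame: ID numbers for 30,000 students (made-up size).
frame = list(range(30_000))

# Made-up answers so the sketch runs: roughly 30% of the frame would say yes.
answer_rng = random.Random(0)
would_say_yes = {student: answer_rng.random() < 0.30 for student in frame}

chosen = simple_random_sample(frame, 400, seed=1)
p_hat = sample_proportion([would_say_yes[s] for s in chosen])      # sample statistic
parameter = sum(would_say_yes.values()) / len(frame)              # population parameter

print(f"population parameter = {parameter:.3f}, sample statistic = {p_hat:.3f}")
```

In a real survey the parameter is unknown; it is computed here only because the population answers were simulated, which makes it possible to see the statistic landing near the parameter.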

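
The claim in "Idea 3" (the sample size matters, not the fraction of the population sampled) can be checked numerically. The sketch below is not part of the original notes. It uses the standard formula for the standard error of a sample proportion from a simple random sample drawn without replacement, including the finite population correction. The value p = 0.5 and the function name `standard_error` are assumptions chosen for illustration.

```python
import math


def standard_error(p, n, population_size):
    """Standard error of a sample proportion under SRS without replacement.

    sqrt(p(1-p)/n) multiplied by the finite population correction
    sqrt((N-n)/(N-1)).
    """
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(p * (1 - p) / n) * fpc


n = 1000
p = 0.5  # assumed value; 0.5 gives the largest standard error

# Same sample size, two very different population sizes.
for population_size in (1_000_000, 100_000_000):
    print(population_size, standard_error(p, n, population_size))

# Same population, smaller sample: the standard error grows noticeably.
print("n = 100:", standard_error(p, 100, 1_000_000))
```

With n = 1000 the two standard errors are both about 0.0158, because the correction factor is almost exactly 1 whenever the sample is a tiny fraction of the population. Cutting the sample to 100 roughly triples the standard error.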