# Introduction to Statistical Methods for Life and Health Sciences STATS 13

UCLA

These 35 pages of class notes were uploaded by Isobel Stanton on Friday, September 4, 2015. The notes belong to STATS 13 at the University of California - Los Angeles, taught by I. Dinov in Fall. Since upload, they have received 106 views. For similar materials see /class/177952/stats-13-university-of-california-los-angeles in Statistics at University of California - Los Angeles.

Date Created: 09/04/15

## UCLA STAT 13: Introduction to Statistical Methods for the Life and Health Sciences

Instructor: Ivo Dinov, Asst. Prof. of Statistics and Neurology

Teaching Assistants: Fred Phoa, Kirsten Johnson, Ming Zheng & Matilda Hsieh

University of California, Los Angeles, Fall 2005

http://www.stat.ucla.edu/~dinov/courses_students.html

## Sample Size Calculations; Confidence Intervals for Proportions

### Planning a Study to Estimate $\mu$

- It is important, before you begin collecting data, to consider whether the estimates will be sufficiently precise.
- Two factors to consider:
  - the population variability of $Y$
  - the sample size

First: in certain situations the variability of $Y$ should not be controlled for (e.g., the response to treatment in a medical study). However, in most studies it is important to reduce the variability of $Y$ by holding extraneous conditions as constant as possible.

- For example, a study of breast cancer might want to examine only women.

Second: once the experiment is planned to reduce the variability of $Y$ as much as possible, we consider the sample size.

- For example, how many women should we sample to achieve the desired precision for our estimate?
- RECALL: $SE_{\bar{y}} = \dfrac{s}{\sqrt{n}}$

To decide on a proper value of $n$ we must specify what value of SE is desirable and have a guess of $s$.

- For SE, we need to ask what value we would tolerate.
- For $s$, we could use information from a pilot study or previous research.

$$\text{Desired SE} = \frac{\text{guessed } s}{\sqrt{n}}$$

Example: Reindeer (cont.). Here $s = 8.83$ and $SE = 0.874$. We would like to estimate the sample size necessary for next year's roundup to keep $SE \le 0.6$:

$$\frac{8.83}{\sqrt{n}} \le 0.60 \quad\Rightarrow\quad \sqrt{n} \ge 14.72 \quad\Rightarrow\quad n \ge 216.58$$

So $n = 217$ reindeer. We can't have a fraction of a reindeer, so we should ALWAYS round sample size calculations up.
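The reindeer calculation above is easy to script. The following is a minimal sketch in Python, assuming the guessed standard deviation from the slide ($s = 8.83$); the function name `required_sample_size` is a hypothetical helper introduced for illustration, not something from the course materials.

```python
import math


def required_sample_size(guessed_sd: float, desired_se: float) -> int:
    """Smallest whole n satisfying guessed_sd / sqrt(n) <= desired_se."""
    if guessed_sd <= 0 or desired_se <= 0:
        raise ValueError("guessed_sd and desired_se must be positive")
    # Solve s / sqrt(n) <= SE for n, then round up: a fractional
    # animal cannot be sampled.
    return math.ceil((guessed_sd / desired_se) ** 2)


n_for_se_06 = required_sample_size(8.83, 0.6)  # 217, matching the slide
# Halving the tolerated SE roughly quadruples the required n.
n_for_se_03 = required_sample_size(8.83, 0.3)  # 867
print(n_for_se_06, n_for_se_03)
```

Because $n$ grows with $1/SE^2$, small reductions in the tolerated SE can become expensive quickly, which is why the guess of $s$ deserves some care.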
### Planning a Study to Estimate $\mu$

- What happens to $n$ as the desired precision gets smaller?

Example: Reindeer (cont.). Suppose we would like to estimate the sample size necessary for next year's roundup to keep $SE \le 0.3$:

$$\frac{8.83}{\sqrt{n}} \le 0.30 \quad\Rightarrow\quad \sqrt{n} \ge 29.43 \quad\Rightarrow\quad n \ge 866.32$$

So $n = 867$ reindeer.

- When we double the precision (i.e., cut SE in half), it requires 4 times as many reindeer.
  - This is the result of the $\sqrt{n}$ in the SE formula.

### Decisions About SE

- How do we make the decision of what SE we will tolerate in the estimation of $\mu$?
  - RECALL: a 95% confidence interval is $\bar{y} \pm t(df)_{0.025} \times SE$. The $\pm$ part is called the margin of error and is equivalent to $t(df)_{0.025} \times SE$ for a 95% confidence interval.
  - If we scan the 0.025 (or 95%) column of the t table, the t multipliers are roughly equal to 2.
- So then, for example, maybe we reason that we want our estimate to be within $\pm 1.2$ with 95% confidence.
  - Using the logic from the previous slide, and thinking of the span of the CI, suppose a total span of 2.4 (or $\pm 1.2$) is desired: $\bar{y} \pm 1.2$. Then SE would need to be $\le 0.60$:

$$t(df)_{0.025} \times SE \approx 2 \times SE = 1.2 \quad\Rightarrow\quad SE = 0.6$$

### Conditions for Validity of Estimation Methods

- We have to be careful when making estimations:
  - computers make it easy
  - interpretations are valid only under certain conditions

### Conditions of Validity of the SE Formula

- For $\bar{y}$ to be an estimate of $\mu$ we must have sampled randomly from the population.
  - If not, the inference is questionable (biased).
- The validity of SE also requires:
  - The population is large when compared to the sample size.
    - It is rare that this is a problem.
    - The sample size can be as much as 5% of the population without seriously inflating SE.
  - Observations must be independent of each other.
    - We want the $n$ observations to give $n$ independent pieces of information about the population.
- Definition: A hierarchical structure exists when observations are nested within the sampling units.
  - This is a common problem in the sciences.

Example: Measure the pulse of 10 patients 3 times each.

- We don't have 30 pieces of independent information.
  - One possible naive solution: we could use each person's average.
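The naive fix for the pulse example, replacing each patient's repeated readings by their average, can be sketched as follows. This is only an illustration: the readings are invented, and the names `pulse_readings`, `per_unit_means`, and `standard_error` are hypothetical.

```python
import math
from statistics import mean, stdev

# Hypothetical pulse readings (beats per minute): 10 patients, 3 readings each.
pulse_readings = {
    "patient_01": [72, 75, 71],
    "patient_02": [64, 66, 65],
    "patient_03": [80, 78, 83],
    "patient_04": [70, 69, 72],
    "patient_05": [90, 88, 91],
    "patient_06": [58, 60, 61],
    "patient_07": [76, 77, 74],
    "patient_08": [68, 70, 67],
    "patient_09": [85, 82, 84],
    "patient_10": [73, 71, 74],
}


def per_unit_means(readings):
    """Collapse nested readings to one value per sampling unit (patient)."""
    return [mean(values) for values in readings.values()]


def standard_error(values):
    """SE of the mean, s / sqrt(n), treating the values as independent."""
    return stdev(values) / math.sqrt(len(values))


patient_means = per_unit_means(pulse_readings)
# n is the number of patients (10), not the number of readings (30).
print(len(patient_means), round(standard_error(patient_means), 2))
```

Treating all 30 readings as independent would divide by $\sqrt{30}$ instead of $\sqrt{10}$ and so would tend to understate the SE.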
### Conditions of Validity of a CI for $\mu$

- Data must be from a random sample and observations must be independent of each other.
  - If the data is biased, the sampling distribution concepts on which the CI method is based do not hold.
  - Knowing the average of a biased sample does not provide information about $\mu$.
- We also need to consider the shape of the data for Student's T distribution:
  - If $Y$ is normally distributed, then Student's T is exactly valid.
  - If $Y$ is approximately normal, then Student's T is approximately valid.
  - If $Y$ is not normal, then Student's T is approximately valid only if $n$ is large (CLT).
    - How large? It really depends on the severity of the non-normality; however, our rule of thumb is $n \ge 30$.
  - Page 202 has a nice summary of these conditions.
  - NOTE: if the sampling distribution cannot be considered normal, Student's T will not hold.

### Verification of Conditions

- In practice these conditions are often assumptions, but it is important to check to make sure they are reasonable.
  - Scrutinize the study design for:
    - possible bias
    - non-independent observations
  - Population normal?
    - previous experience with other similar data
    - histogram / normal probability plot
    - increase the sample size
    - try a transformation and analyze on the transformed scale

### CI for a Population Proportion

- So far we have discussed a confidence interval using quantitative data.
- There is also a CI for a dichotomous categorical variable when the parameter of interest is a population proportion.
  - $\hat{p}$ is the sample proportion.
  - $p$ is the population proportion.
- When the sample size is large, the sampling distribution of $\hat{p}$ is approximately normal.
  - Related to the CLT.
- When the sample size is small, the normal approximation may be inadequate.
  - To accommodate this we will modify $\hat{p}$ slightly.

The adjustment we are going to make to $\hat{p}$ is to use $\tilde{p}$ instead, where $y$ is the number of observations in the category of interest and $n$ is the sample size:

$$\tilde{p} = \frac{y + 0.5\, z_{\alpha/2}^2}{n + z_{\alpha/2}^2}$$

- So what is the $z_{\alpha/2}$ bit?
  - RECALL: In chapter 4, $z_\alpha$ was the cut point of the upper part of the standard normal distribution for a given $\alpha$.
  - Now we want $z_{\alpha/2}$ because we are calculating a confidence interval and need to account for both sides of the distribution.
  - So with $\alpha = 0.05$ split between the two tails, we get a 95% confidence interval.

The standard error of $\tilde{p}$ also needs a slight modification:

$$SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + z_{\alpha/2}^2}}$$

- A sample value $\tilde{p}$ is typically within $\pm 2\, SE_{\tilde{p}}$ of $p$.
- Before we define the formula for a CI for $p$, let's remember the formula for a CI for $\mu$.
  - RECALL: $\bar{y} \pm t(df)_{\alpha/2} \dfrac{s}{\sqrt{n}}$, where $100(1 - \alpha)\%$ is the desired confidence.
- If we pick this apart, we are really saying that a CI for $\mu$ is: the estimate of $\mu$ $\pm$ an appropriate multiplier $\times$ SE.

Incorporate that logic and we get

$$\tilde{p} \pm z_{\alpha/2}\, SE_{\tilde{p}}$$

where $100(1 - \alpha)\%$ is the desired confidence. This time we will use a z multiplier instead of a t multiplier.

### Application to Data

Example: Suppose a researcher is interested in studying the effect of aspirin in reducing heart attacks. He randomly recruits 500 subjects with evidence of early heart disease and has them take one aspirin daily for two years. At the end of the two years he finds that during the study only 17 subjects had a heart attack. Calculate a 95% confidence interval for the true proportion of subjects with early heart disease that have a heart attack while taking aspirin daily.

Example: Heart Attacks (cont.)

First we need to find $\tilde{p}$:

$$\tilde{p} = \frac{y + 0.5\, z_{\alpha/2}^2}{n + z_{\alpha/2}^2}$$

- That's just the formula for $\tilde{p}$; now we actually have to find $z_{\alpha/2}$.
- Because this is a 95% CI, $\alpha$ will be 0.05 and $z_{\alpha/2}$ will be $z_{0.025}$.
- In this case $z_{0.025} = 1.96$.

Next solve for $\tilde{p}$ (NOTE: the text rounds this to $(y + 2)/(n + 4)$):

$$\tilde{p} = \frac{y + 0.5(1.96^2)}{n + 1.96^2} = \frac{y + 1.92}{n + 3.84} = \frac{17 + 1.92}{500 + 3.84} = 0.038$$

Next solve for $SE_{\tilde{p}}$:

$$SE_{\tilde{p}} = \sqrt{\frac{0.038\,(0.962)}{500 + 3.84}} = 0.0085$$

Finally, the 95% CI for $p$:

$$\tilde{p} \pm z_{0.025}\, SE_{\tilde{p}} = 0.038 \pm 1.96\,(0.0085) = 0.038 \pm 0.0167 = (0.0213,\ 0.0547)$$

- What is our interpretation of this interval?

CONCLUSION: We are highly confident, at the 0.05 level (95% confidence), that the true proportion of subjects with early heart disease that have a heart attack after taking aspirin daily is between 0.0213 and 0.0547.

- Is this meaningful?
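The three steps of the aspirin example (find $z_{\alpha/2}$, compute $\tilde{p}$, compute $SE_{\tilde{p}}$) can be collected into one small function. The sketch below assumes Python 3.8+ for `statistics.NormalDist`; the function name `adjusted_proportion_ci` is hypothetical. It carries the unrounded $z$ and $\tilde{p}$ through, so its endpoints differ slightly in the third decimal place from the slide values, which round $\tilde{p}$ to 0.038 before continuing.

```python
import math
from statistics import NormalDist


def adjusted_proportion_ci(y: int, n: int, confidence: float = 0.95):
    """Adjusted CI for a proportion: p~ +/- z * SE, as in the formulas above.

    Returns (p_tilde, se, (lower, upper)).
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    if n <= 0 or not 0 <= y <= n:
        raise ValueError("need n > 0 and 0 <= y <= n")
    alpha = 1 - confidence
    # z_{alpha/2}: upper cut point of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_tilde = (y + 0.5 * z ** 2) / (n + z ** 2)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + z ** 2))
    return p_tilde, se, (p_tilde - z * se, p_tilde + z * se)


# Aspirin example: 17 heart attacks among 500 subjects.
p_tilde, se, (lower, upper) = adjusted_proportion_ci(17, 500, confidence=0.95)
print(round(p_tilde, 4), round(se, 4), round(lower, 4), round(upper, 4))
```

Calling the same function with `confidence=0.99` uses $z_{0.005} \approx 2.58$ and gives a wider interval.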
### Application to Data

Calculate $\tilde{p}$ and SE for a 99% confidence interval. Here $\alpha = 0.01$ and $z_{\alpha/2} = z_{0.005}$, so $z_{0.005}$ is 2.58:

$$\tilde{p} = \frac{y + 0.5\,(2.58^2)}{n + 2.58^2} = \frac{y + 3.33}{n + 6.66}, \qquad SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 2.58^2}} = \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 6.66}}$$

- This is a lot of work.
- Consider the following shortcuts:
  - The value of $z$ can be carried through for all three formulas: $\tilde{p} = \dfrac{y + 0.5 z^2}{n + z^2}$, $SE_{\tilde{p}} = \sqrt{\dfrac{\tilde{p}(1 - \tilde{p})}{n + z^2}}$, and $\tilde{p} \pm z\, SE_{\tilde{p}}$. Just don't forget to square it in $\tilde{p}$ and SE.
  - RECALL: The t distribution approaches a z distribution when $df \to \infty$. This means that at the bottom of the t table there are several t multipliers that can be substituted for z (use the $df = \infty$ row).
    - CAUTION: this will only work for certain levels of $\alpha$. If the level is not found on the t table, you must go back and solve with the z table!

## UCLA STAT 13: Introduction to Statistical Methods for the Life and Health Sciences

Instructor: Ivo Dinov, Asst. Prof. in Statistics and Neurology

Teaching Assistants: Tom Daula and Kaiding Zhu, UCLA Statistics

University of California, Los Angeles, Fall 2002

http://www.stat.ucla.edu/~dinov

## Chapter 2: Tools for Exploring Univariate Data

- Types of variables
- Presentation of data
- Repeated and grouped data
- Qualitative variables

A subset of the data collected at a hospital is summarized in this table. Each patient has measurements recorded for a number of variables: ejection factor (ventricular output), blood systolic/diastolic pressure, etc.

- Reading the table: which of the measured variables (age, ejection, etc.) are useful in predicting how long the patient may live?
- Are there relationships between these predictors?
- Variability and noise in the observations hide the message.

### Types of Variable

- Quantitative variables are measurements and counts.
  - Variables with few repeated values are treated as continuous.
  - Variables with many repeated values are treated as discrete.
- Qualitative variables (a.k.a. factors or class variables) describe group membership.

### Distinguishing Between Types of Variable

Figure 2.1.1: Tree diagram of types of variable.

- Quantitative (measurements and counts)
  - Continuous (few repeated values)
  - Discrete (many repeated values)
- Qualitative (define groups)
  - Categorical (no idea of order)
  - Ordinal (fall in natural order)
- What is the difference between quantitative and qualitative variables?
- What is the difference between a discrete variable and a continuous variable?
- Name two ways in which observations on qualitative variables can be stored on a computer. (Strings / indexes.)
- When would you treat a discrete random variable as though it were a continuous random variable?
  - Can you give an example?

Storing and reporting numbers:

- Round numbers for presentation.
- Maintain complete accuracy in numbers to be used in calculations. If you need to round off, this should be the very last operation.

[Table: values by country (including the U.S., Switzerland, France, Italy, Japan, the Netherlands, Belgium, the U.K., and Canada) for several years; units: millions of troy ounces. The individual values are not recoverable from the source.]

- For what two purposes are tables of numbers presented? (To convey information about trends in the data; for detailed analysis.)
- How should you arrange the numbers you are most interested in comparing? (Arrange numbers you want to compare in columns, not rows. Provide written/verbal summaries and footnotes. Show row/column averages.)
- When should you round numbers and when should you preserve full accuracy?
- Should a table be left to tell its own story?

Figure 2.6.3: Percentages of the world's gold production in 1991: (a) bar graph, (b) pie chart, (c) segmented bar.

Figure 2.3.1: Dot plot.

Figure 2.3.2: Dot plot showing special features (cluster, gap, outlier, atypical observation).

Figure 2.3.3: Grading of a university course.

Figure 2.3.4: Dot plot with and without a scale break: (a) unbroken scale, (b) broken scale.

Figure 2.3.5: Forecast of percent growth in GDP for 1990 for some South-East Asian and Pacific countries.

[Figure 2.3.7: stem-and-leaf plots of death-rate data, with panels illustrating rounding off and collapsing of stems. The plotted values are not recoverable from the source.]

Table 2.3.2: Coyote lengths data (cm), for coyotes captured in Nova Scotia, Canada. Data courtesy of Dr Vera Eastwood. [The individual lengths are not recoverable from the source.]

Table 2.3.3: Frequency table for female coyote lengths, with columns Class interval, Tally, Frequency, and Stem-and-leaf plot. [The cell values are not recoverable from the source.]

Figure 2.3.8: Histogram of the female coyote-lengths data: (a) histogram, (b) stem-and-leaf plot rotated. Axes: length (cm) and frequency.

- What advantages does a stem-and-leaf plot have over a histogram? (Stem-and-leaf plots retain information on individual values, are quick to produce by hand, and provide data sorting mechanisms. But histograms are more attractive and more understandable.)
- The shape of a histogram can be quite drastically altered by choosing different class-interval boundaries. What type of plot does not have this problem? (Density trace.) What other factor affects the shape of a histogram? (Bin size.)
- What was another reason given for plotting data on a variable, apart from interest in how the data on that variable behaves? (It shows features such as clusters, gaps, and outliers, as well as trends.)

Figure 2.3.9: Histograms and density trace of the female coyote-lengths data.

- Histogram bin-size change: (a) original histogram, (b) changed class-interval width.
- Histogram bin-boundary change: (c) same width, different boundaries, (d) density trace.
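The sensitivity of a histogram to bin size and bin boundaries, noted in the review questions above, can be seen by counting the same values under different class intervals. This is a minimal sketch: the `lengths` values are invented for illustration (they are not the coyote data), and `bin_counts` is a hypothetical helper.

```python
import math
from collections import Counter


def bin_counts(values, start, width):
    """Count values per class interval [start + k*width, start + (k+1)*width).

    Returns {left_boundary: frequency} for the non-empty intervals only.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    index_counts = Counter(math.floor((v - start) / width) for v in values)
    return {start + k * width: index_counts[k] for k in sorted(index_counts)}


# Hypothetical lengths in cm, for illustration only.
lengths = [71.0, 78.5, 81.3, 84.0, 85.0, 86.5, 88.0,
           90.0, 91.4, 93.0, 93.5, 97.0, 102.5]

original = bin_counts(lengths, start=70, width=5)            # baseline bins
wider = bin_counts(lengths, start=70, width=10)              # bin size changed
shifted = bin_counts(lengths, start=67.5, width=5)           # boundaries changed

for label, table in [("original", original), ("wider", wider), ("shifted", shifted)]:
    print(label, table)
```

With these values the baseline bins show a single peak in the 90 to 95 interval, while shifting the boundaries by 2.5 flattens it into three equal bins, even though the data are unchanged.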
Figure 2.3.10: Features to look for in histograms and stem-and-leaf plots:

- Unimodal
- Bimodal
- Trimodal
- Symmetric
- Positively skewed (long upper tail)
- Negatively skewed (long lower tail)
- Bimodal with gap
- Exponential shape
- Spike in pattern
- Outliers
- Truncation plus outlier

Review questions:

- What does it mean for a histogram or stem-and-leaf plot to be bimodal? What do we suspect when we see a bimodal plot?
- What are outliers and how do they show up in these plots? What should we try to do when we see them?
- What do we mean by symmetry and positive and negative skewness?
- What shape do we call exponential?
- Should we be suspicious of abrupt changes? Why? (Yes! Try to establish the reason; the jump may have to be rectified!)

### The Sample Mean

The sample mean is denoted by $\bar{x}$:

$$\bar{x} = \frac{\text{Sum of the observations}}{\text{Number of observations}}$$

Figure 2.4.1: Mechanical construction representing a dot plot: (a) shows a balanced rod, while (b) and (c) show unbalanced rods.

### The Sample Median

The sample median is the $\frac{n+1}{2}$-th largest observation. If $\frac{n+1}{2}$ is not a whole number, the median is the average of the two observations on either side.

Beware of inappropriate averaging.

### Effect of Outliers on the Mean and Median

Figure 2.4.2: The mean and the median: (a) data symmetric about P, (b) two largest points moved to the right. Grey disks in (b) are the ghosts of the points that were moved.

### The Five-Number Summary

The five-number summary is (Min, $Q_1$, Med, $Q_3$, Max).

Figure 2.4.3: Box plot for SYSVOL.
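The median rule and the effect of outliers described above can be checked with a few lines of Python. This is a sketch under stated assumptions: the two small data sets are invented, `sample_median` and `five_number_summary` are hypothetical helpers, and the quartiles use the "inclusive" method of `statistics.quantiles` (Python 3.8+), which is only one of several quartile conventions and may not match the textbook's.

```python
import math
from statistics import mean, quantiles


def sample_median(values):
    """Median as the observation at position (n + 1) / 2 in the sorted data."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2  # 1-based position
    if position.is_integer():
        return ordered[int(position) - 1]
    # Not a whole number: average the observations on either side.
    below = ordered[math.floor(position) - 1]
    above = ordered[math.ceil(position) - 1]
    return (below + above) / 2


def five_number_summary(values):
    """(Min, Q1, Med, Q3, Max); the quartile convention is an assumption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, sample_median(values), q3, max(values))


symmetric = [1, 2, 3, 4, 5]    # hypothetical data, symmetric about 3
shifted = [1, 2, 3, 14, 15]    # the two largest points moved to the right

# The mean follows the moved points; the median stays where it was.
print(mean(symmetric), sample_median(symmetric))
print(mean(shifted), sample_median(shifted))
print(five_number_summary(symmetric))
```

The mean moves from 3 to 7 when the two largest points are shifted, while the median remains 3, which is the behaviour the figure of the mean and the median illustrates.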
w 2 3 I 2252 222 255 225 225 222 222 CayutzscapmredmNmScmu c2222 DatacamesyafDxVenEasmad 22 v TABLE 233 FrequzmyTah 2 M Femalecm L 2 compare F 2 23922 Frequency 22 4 38 72775 2 7 2 gagE 7522 2 7 D 2222 Body 2 2 2 2 2 2 2 2 22 2 22 3 length2522 mm 22 2 5 5 5 2 2 7 7 7 7 2 2 2 lawman m mm 13 9 2 2 2 2 2 2 2 2 2 2 2 4 4 4 a Hmumm 2522me222222rp222222222 25222 77 5 2 2 7 7 2 2 2227225 2 m 2 3 Figuxeux Histugmmufthefanalecuyut Engthsdata 22222 42 Histogram binsize change 22 22 7 i A 2 E E I What advantages does a stemandleaf plot have over 2 4 i a histogram 5ampL P1222 retum 22222 222 2ud2v2du21 values quickto produce by 122221 provide 21222 sorting mechanisms But H2222 are more 5 9 m 7 E 9 m attractive andmore understandable 2222212222 22222222 I 0 1h it b Chan 1 r ta39wl dth a 3335222 322 Ogimrilsiligh9m O The shape of ahistogram can be quite drastically 39 Hlsmgmm binboundary Change altered by choosing different classinterval 22 2 boundaries What type of plot does not have this 2 E problem deus2 trace What other factor affects the 2 2 shape of ahjstogmm 22pm u 7 Eu 9H m u 7H m 9H m I t Was another reason giyen for plotting data on a 6 ch Laws variable apart from interestinhow the data on at 2 Same wmms differentbuundanes 2 Density Lace 22222222 width5 windawwidth5 variable behaves shows features clustergaps outliers 2s wen 22 Figure 232 Histograms and density Lace 2r female coyoterlmgths 2222 trends A 2 1122222221 2 22222221 2 122222221 2 2 sp2ke 222 pattern 2 Symmeme e Fusmvely skewed 2 Negatively skewed lung uppa 2221 12222122222221 AA g Symmeme h 22222221212222 gap 2 Fxpuueuuai shape k Dumas 1 Tmncauunplusuu ier Figure 2312 Features 2212222 222222 2222232222 and starnrandrlafpluts Mean 45 50133 mmmum 35000 Maxlmum 59000 Figure 241 Meehameal construction I What does it mean for ahistogram or steI39nandleaf plot to be bimodal What do we suspect when we see a bimodal plot I What are outliersandhow do they show up in these plots What should wevtry to do when we see 
them I What do we meanbysymmetry and positive and negative skewness I What shape do we call exponential I Should webe suspicious of abruptchanges Why Yes Try to estahhsh th xeasonthejump may have to maximum I The samplemeangs denoted by E Standarddevmnan Sum of the observatlons Number of observations The W12 mean Median 51000 TrMean 50355 Qa 55000 s wpw mung StDev SE Mean 5092 0909 on 45500 Lower quamle representing a dot plot a shows abalanced rod whxle b and E Show unbalancedrods umdmm M 1m 1 If 1s not awholenurnber the medlangs the average of the two observations on either side Beware of inappropriate averaging I P I IIWII II Med 2 a Datasymmetric about P P nunwu 7 o Med 7 b Two largestpoints moved to the right Figure 242 m mean and the median Grey disks m h are me gangs arms pmms Lhatwere muved anylnt Dntylnt SYSVOL Figure 243 Box plot for SYSVOL Scale FigureZAA Construction of box plot
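
The definitions of the sample mean, the sample median and the five-number summary given in these notes can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the quartile convention (medians of the lower and upper halves of the sorted data, leaving out the middle observation when n is odd) is an assumption, since the notes do not say which convention the course uses.

```python
from typing import Sequence, Tuple


def sample_mean(xs: Sequence[float]) -> float:
    """Sum of the observations divided by the number of observations."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)


def sample_median(xs: Sequence[float]) -> float:
    """The (n + 1) / 2-th observation in sorted order.

    If (n + 1) / 2 is not a whole number (n even), average the two
    observations on either side of that position.
    """
    if not xs:
        raise ValueError("need at least one observation")
    ordered = sorted(xs)
    n = len(ordered)
    if n % 2 == 1:
        # (n + 1) / 2 is a whole number; subtract 1 for 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # n even: positions n/2 and n/2 + 1 (1-based) straddle (n + 1) / 2.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def five_number_summary(xs: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return (Min, Q1, Med, Q3, Max).

    ASSUMPTION: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data; other quartile conventions give slightly
    different values.
    """
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return (
        ordered[0],
        sample_median(lower_half),
        sample_median(ordered),
        sample_median(upper_half),
        ordered[-1],
    )


if __name__ == "__main__":
    # Made-up data: symmetric values, then the two largest moved to the right.
    symmetric = [1, 2, 3, 4, 5, 6, 7]
    shifted = [1, 2, 3, 4, 5, 16, 17]
    for label, data in (("symmetric", symmetric), ("shifted", shifted)):
        print(label, sample_mean(data), sample_median(data))
    print(five_number_summary(symmetric))
```

With the made-up data in the demo, moving the two largest points to the right pulls the mean upward while the median stays where it was, which is the effect of outliers that the mean-and-median figure illustrates.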
