Statistics AMS 5
This 58 page Class Notes was uploaded by Milton Sawayn DVM on Monday September 7, 2015. The Class Notes belongs to AMS 5 at University of California - Santa Cruz taught by Staff in Fall. Since its upload, it has received 542 views. For similar materials see /class/182149/ams-5-university-of-california-santa-cruz in Applied Math And Statistics at University of California - Santa Cruz.
Date Created: 09/07/15
AMS-5: STATISTICS

Central Limit Theorem

In this class we will introduce the notion of a probability histogram. These are histograms that describe the behavior of random variables. We will consider approximating probability histograms using a normal curve. This is based on a mathematical result known as the Central Limit Theorem.

Probability histograms

Consider a box of tickets in which the chances of obtaining a ticket with a 1 are 4/7, the chances of a 3 are 1/7, and the chances of a 4 are 2/7. We can display that information graphically in a probability histogram.

[Figure: probability histogram for the box]

Each rectangle of the histogram is centered at a number and its area corresponds to the probability of that number. The sum of the areas of the rectangles is equal to one, because the areas are associated with probabilities, or chances.

Probability histograms are used to represent chance, not the frequency of data. Histograms based on sampled data represent how the data are distributed over their range. Probability histograms correspond to the chances that a random variable takes some specific values.

Empirical histograms, based on the frequencies of observed outcomes of an experiment, converge to the corresponding probability histograms, as can be seen in the example of rolling two dice and adding the results.

In the previous example, consider taking the product of the two dice instead. The convergence also holds for the product. In this case we notice that the probability histogram is much more irregular than the one obtained for the sum. The regularity is a general feature related to the sum.

Consider the problem of tossing a fair coin a certain number of times, n. We can obtain the probability histogram for each n. We observe that the probability histogram of the number of tails converges to a very regular curve as the number of tosses is increased. This curve is a common probability density named the Gaussian curve.

Using the normal approximation

We can approximate the probability histogram of the number of heads in a large number of coin tosses using the normal curve.

Q: A coin is tossed 100 times. What is the probability of getting exactly 50 heads?
A: We can look at the probability histogram for this case. We observe that the chances corresponding to 50 are equal to the area of the rectangle that has a base from 49.5 to 50.5. The area of this rectangle is 7.96%.

Q: What about an approximation using the normal curve?
A: The first step is to calculate the mean and standard deviation. Consider a box model with a ticket marked 0 for the tail and a ticket marked 1 for the head. We draw a ticket from this box 100 times. The expected number of heads is $100 \times \tfrac{1}{2} = 50$ and the standard error is given by the square root law, $\sqrt{100} \times \tfrac{1}{2} = 5$. Now we have to convert to standard units:

$(49.5 - 50)/5 = -0.1$ and $(50.5 - 50)/5 = 0.1$

So the normal approximation is the area under the normal curve for the interval (-0.1, 0.1). According to the table this is equal to 7.965%.

Q: What are the approximate chances of getting between 45 and 55 heads inclusive?
A: The probability of getting between 45 and 55 heads is equal to the areas of the rectangles between 45 and 55 in the probability histogram. This is approximated by the area under the normal curve for the interval (44.5, 55.5). In standard units this corresponds to the interval (-1.1, 1.1), which has a probability of 72.87% according to the table.

Q: What are the approximate chances of getting between 45 and 55 heads exclusive?
A: This time the probability is given by the areas of the rectangles between 46 and 54, which is approximately the area under the curve corresponding to the interval (45.5, 54.5). This is the interval (-0.9, 0.9) in standard units, which has a probability of 63.19%.

Very often it is not specified whether the end points are included or not. In that case we use the approximation over the given interval. So for the previous example we would have (45, 55), which is converted to (-1, 1) in standard units and yields a probability of 68.27%.

The Central Limit Theorem

In general, the probability histogram of the sum of draws from a box of tickets will be approximated by the normal curve. This is a mathematical fact that can be expressed and proved as a theorem.

The Central Limit Theorem: When drawing at random with replacement from a box, the probability histogram for the sum will follow a normal curve in the limit.

This holds even if the histogram of the contents of the box is not approximately normal.

The reason why the CLT is used as an approximation for distributions of lists of numbers is that the uncertainty in the data can often be thought of as the sum of several sources of randomness.

When can we use the normal approximation?

Consider a box whose probability histogram is far from being normal.

[Figure: box of tickets and its probability histogram]

Nevertheless, if we consider the experiment of drawing tickets from the box and summing the results over and over again, then the probability histogram of the sum will be approximated by the normal curve.

What if we consider the product of the tickets? In that case the probability histogram will not be approximated by a normal curve, no matter how many draws from the box we take.

Q: Four hundred draws will be made at random with replacement from the box. Estimate the chance that the sum of the draws will be more than 1500.
A: The average in the box is 4 and the SD is about 2.24. The expected value for the sum is $4 \times 400 = 1600$ and the SE is $\sqrt{400} \times 2.24 \approx 45$. Converting 1500 to standard units we have

$(1500 - 1600)/45 \approx -2.22$

According to the normal curve, the chance of being above -2.22 is about 99%.
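The normal-approximation answers in this lecture can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (normal_cdf, normal_area, exact_heads) are illustrative and not part of the course material, and the normal table is replaced by the error function, so the last digits may differ slightly from table lookups.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_area(lo, hi, expected, se):
    """Normal-curve area between lo and hi, after converting to standard units."""
    z_lo = (lo - expected) / se
    z_hi = (hi - expected) / se
    return normal_cdf(z_hi) - normal_cdf(z_lo)


def exact_heads(n, k_lo, k_hi):
    """Exact chance of k_lo..k_hi heads (inclusive) in n tosses of a fair coin."""
    return sum(math.comb(n, k) for k in range(k_lo, k_hi + 1)) / 2 ** n


# 100 tosses of a fair coin: a box with one ticket 0 (tail) and one ticket 1 (head).
n = 100
expected = n * 0.5          # number of draws x average of box
se = math.sqrt(n) * 0.5     # square root law: sqrt(draws) x SD of box

exactly_50 = normal_area(49.5, 50.5, expected, se)
inclusive_45_55 = normal_area(44.5, 55.5, expected, se)
exclusive_45_55 = normal_area(45.5, 54.5, expected, se)
unspecified_45_55 = normal_area(45, 55, expected, se)

# 400 draws from a box with average 4 and SD 2.24 (the values quoted in the notes).
sum_expected = 400 * 4
sum_se = math.sqrt(400) * 2.24
above_1500 = 1.0 - normal_cdf((1500 - sum_expected) / sum_se)

print(f"exactly 50 heads: normal {exactly_50:.4f}, exact {exact_heads(n, 50, 50):.4f}")
print(f"45 to 55 inclusive: normal {inclusive_45_55:.4f}, exact {exact_heads(n, 45, 55):.4f}")
print(f"45 to 55 exclusive: normal {exclusive_45_55:.4f}, exact {exact_heads(n, 46, 54):.4f}")
print(f"45 to 55, end points unspecified: normal {unspecified_45_55:.4f}")
print(f"sum of 400 draws above 1500: normal {above_1500:.4f}")
```

Comparing the normal areas with the exact binomial chances shows why the half-unit adjustment at the end points (49.5, 44.5, 45.5) matters when the question is about individual rectangles of the histogram.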
[Handwritten class notes follow in the scan; they are largely illegible in this transcription. Legible fragments mention confounding factors, randomized controlled trials, and placebos.]

The Law of Averages

In this class we will consider the law of averages. We will introduce the concepts of expected value and standard error. We will see that random variables can be represented using a box model. This in turn can be used to calculate expected values and standard errors, and to approximate the probabilities of relevant intervals.

Q: A coin is tossed and you win a dollar if there are more than 60% heads. Which is better: 10 tosses or 100?
A: 10 tosses is better. As the number of tosses increases you are more likely to be close to 50%, according to the law of averages.

Q: Same as before, but you win one dollar if there are exactly 50% heads.
A: 10 tosses is better, since in absolute terms you are more likely to be off the expected value when the number of tosses is large.

Q: There are two boxes. One contains two tickets with -1 and two with 1. The other contains one ticket with -1 and one with 1. One hundred tickets will be drawn at random with replacement from one of them, and the amount on each ticket will be paid to you. Which box do you prefer?
A: In this case the expected payoff is the same for both boxes, since they both have 50% of tickets with -1 and 50% with 1.

Averages

When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So if the coin is tossed a large number of times, the number of heads and the number of tails should be approximately equal. This is the law of averages.

The number of heads will be off half the number of tosses by some amount. That amount is called chance error. So we have

number of heads = half the number of tosses + chance error

The chance error increases with the number of tosses in absolute terms, but it decreases in relative terms.

A box model

A box model is a useful device to understand the properties of random variables. These are variables whose results depend on the outcome of a random experiment. Two quantities that describe the behavior of a random variable are the expected value and the standard error.

A box model represents a process that can be simulated as follows: there are a number of tickets in a box, each of them has a number, and they are drawn at random. The numbers that result are added.

Consider a box model for roulette. A roulette wheel has 38 pockets. Pockets 1 through 36 are alternately colored red and black, and 0 and 00 are colored green. So there are 18 red pockets and 18 black ones. Suppose you win $1 if red comes out and lose $1 if either a black number or 0 or 00 comes out. Your chance of winning is 18 in 38 and your chance of losing is 20 in 38. A box representation is

18 tickets marked +$1, 20 tickets marked -$1

Suppose now that you bet a dollar on a single number: if you win you get your $1 back plus $35, but you lose your $1 if any other number comes up. A box model of this bet is given by

1 ticket marked +$35, 37 tickets marked -$1

What is your net gain after 100 plays? This corresponds to the sum of 100 draws with replacement from the box. To calculate this amount we need the concept of expected value.

Expected value

Count the number of heads in 100 tosses of a coin. The law of large numbers tells you that the expected value of the number of heads is 50. You actually toss the coin and the results are:

- 57 heads: you are off by 7
- 46 heads: you are off by -4
- 47 heads: you are off by -3

The amounts off are similar in size to the standard error, which is defined below. The expected value and the standard error depend on the random process that generates the numbers.

Consider a box in which the chance of drawing a 1 is 75% and the chance of drawing a 5 is 25%, and draw a ticket at random with replacement 100 times. What is the expected value of the sum of the tickets? We expect to see

$25 \times 5 + 75 \times 1 = 200$

Notice that this number is equal to $100 \times 2 = 200$, which is the number of draws times the average number in the box. As a general rule, the expected value for the sum is given by

number of draws $\times$ average of box

Q: Suppose you play Keno, a game where you win $2 and pay $1 to play. You have 1 chance in 4 to win. How much should you expect to win if you play 100 times?
A: A box representation of the game has one ticket marked 2 and three tickets marked -1, so the average in the box is

$(2 - 3 \times 1)/4 = -0.25$

So in 100 plays your expected net gain is $100 \times (-0.25) = -25$ dollars, that is, you expect to lose about $25. Of course, if you keep playing you are expected to lose more money.

Q: Consider the box with tickets 0, 2, 3, 4, 6 and suppose 25 draws with replacement are made from the box. What is the expected sum?
A: Each number should appear 1/5 of the time, that is, 5 times on average. So the expected value of the sum is

$5 \times 0 + 5 \times 2 + 5 \times 3 + 5 \times 4 + 5 \times 6 = 75 = 25 \times 3$

The standard error

In the last example we will not see each ticket appearing exactly 5 times. The actual sum we observe will be off by a chance error:

sum (observed value) = expected value + chance error

The standard error gives a measure of how large the chance error is likely to be. When drawing at random with replacement from a box, we can calculate the standard error for the sum of the draws as

$\sqrt{\text{number of draws}} \times$ SD of box

where SD of box stands for the standard deviation of the list of numbers in the box. Notice that the SE increases as the number of draws increases, but only by a factor equal to the square root of the number of draws.

Back to the example of the box with 5 numbers. To compute the SD we need the average, which is equal to 3. The deviations from the average are -3, -1, 0, 1, 3, so the SD is

$\sqrt{(9 + 1 + 0 + 1 + 9)/5} = 2$

If we do 25 draws from the box, the SE is $\sqrt{25} \times 2 = 10$. So we expect the sum of the draws to be around 75, plus or minus 10.

Observed values are rarely more than 2 or 3 SEs away from the expected value.

Consider again the box with tickets 0, 2, 3, 4, 6. We know that in 25 draws the expected value is 75 and the SE is 10. We also observe that in 25 draws the sum ranges from 0 (all 0s) to 150 (all 6s). Can we make the previous statements more precise? What are the chances that the sum will be between 50 and 100?

To answer this question we observe that

$50 = 75 - 2.5 \times 10$ and $100 = 75 + 2.5 \times 10$

so 50 and 100 are 2.5 SEs away from the expected value. Expressed in standard units, 50 is -2.5 and 100 is 2.5.
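The box-model rules above (expected value = number of draws times average of box, SE = square root of the number of draws times SD of box) can be sketched in a few lines of Python. The function names are illustrative assumptions, not part of the course material; the boxes are the ones described in the notes.

```python
import math


def box_average(tickets):
    """Average of the numbers on the tickets in the box."""
    return sum(tickets) / len(tickets)


def box_sd(tickets):
    """SD of the list of numbers in the box (divide by the number of tickets)."""
    avg = box_average(tickets)
    return math.sqrt(sum((t - avg) ** 2 for t in tickets) / len(tickets))


def expected_sum(tickets, draws):
    """Expected value of the sum: number of draws x average of box."""
    return draws * box_average(tickets)


def se_sum(tickets, draws):
    """Standard error of the sum: sqrt(number of draws) x SD of box."""
    return math.sqrt(draws) * box_sd(tickets)


# Box with tickets 0, 2, 3, 4, 6 and 25 draws with replacement.
box = [0, 2, 3, 4, 6]
ev = expected_sum(box, 25)
se = se_sum(box, 25)
z_50 = (50 - ev) / se      # 50 in standard units
z_100 = (100 - ev) / se    # 100 in standard units

# Keno as described in the notes: one ticket +2, three tickets -1, 100 plays.
keno = [2, -1, -1, -1]
keno_ev = expected_sum(keno, 100)

# Roulette, betting $1 on red: 18 tickets +1 and 20 tickets -1, 100 plays.
red_bet = [1] * 18 + [-1] * 20
red_ev = expected_sum(red_bet, 100)
red_se = se_sum(red_bet, 100)

print(f"box 0,2,3,4,6: expected sum {ev}, SE {se}, standard units {z_50} and {z_100}")
print(f"Keno, 100 plays: expected net gain {keno_ev}")
print(f"Red bet, 100 plays: expected net gain {red_ev:.2f}, SE {red_se:.2f}")
```

Representing a box as a plain list of ticket values keeps the code close to the drawings on the slides: repeating a value in the list plays the role of several identical tickets.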
[Handwritten class notes follow in the scan; they are not legible in this transcription.]

Average and standard deviation

In this class we will give the definition of average as a measure of the center of the distribution of the data. We will consider the relationship between the average and the histogram. We will also give the definition of the standard deviation as a measure of the spread of the distribution of the data. We will compare the notions of average and median. We will see how the relationship between average and median determines the shape of the histogram. The concepts of longitudinal studies versus cross-sectional studies will also be discussed.

To obtain an estimate of the center of the distribution we can calculate an average. The average of a list of numbers equals their sum divided by how many there are. Thus, if

18, 18, 21, 20, 19, 20, 20, 20, 19, 20

are the ages of 10 students in this class, the average is given by

$(18 + 18 + 21 + 20 + 19 + 20 + 20 + 20 + 19 + 20)/10 = 19.5$

In the hospital data that we considered in the previous class, the data corresponded to the average length of stay of patients in each hospital in the survey. This means that the lengths of stay of all patients in a given hospital were added and the sum divided by the number of patients in that hospital.

Average and spread in a histogram

A histogram provides a graphical description of the distribution of a sample of data. If we want to summarize the properties of such a distribution, we can measure the center and the spread of the histogram.

[Figure: two histograms, one above the other]

These two histograms correspond to samples with the same center. The spread of the sample on top is smaller than that of the sample at the bottom.

Longitudinal versus cross-sectional studies

Suppose the university conducts a survey of all students, faculty and staff on campus and records the height, age and weight of each person. Such a study is a cross section of the campus population. It gives a picture of the characteristics of the population at a given time.

Suppose you group your sample in three age groups: below 30, between 30 and 50, and above 50. Suppose you observe that the average weight in the first group is smaller than in the second, and this in turn is smaller than the average weight in the third group.

You cannot conclude that age is responsible for an increase in weight. This is because the effect may be confounded with the fact that eating habits may have changed during the last decades, and this may have an effect on the average weight of the population.

Warning: You cannot draw conclusions about the effects of age from such a study, since you are comparing different people of possibly different ages.

To draw conclusions about the effect of age you need to conduct a longitudinal study. That is, you follow the evolution of a person's weight in time for each person in the sample.

Average and median

[Figure: histogram of rainfall in Guarico, Venezuela, with the median and mean marked]

This histogram corresponds to the rainfall over periods of 10 days in an area of the central plains of Venezuela. The average rainfall is 37.65 mm. We observe that only about 30% of the observations are above the average.

Notice that this histogram is not symmetric with respect to the average.

[Figure: a symmetric histogram]

A symmetric histogram will look like this. In this case 50% of the data are above the average.

The median of a histogram is the value with half the area to the left and half to the right. In a symmetric histogram the median and the average coincide.

The relationship between the average and the median determines the shape of the tails of a histogram:

- Average bigger than median: long right tail
- Average about the same as median: symmetry
- Average smaller than median: long left tail

The average is very sensitive to extreme observations, so when dealing with variables like income or rainfall that exhibit very long tails, it is preferable to use the median as a measure of centrality.

A measure of size

Consider the sample

0, 5, -8, 7, -3

How big are these five numbers? If we consider the average as a measure of size, then we obtain 0.2, which is a fairly small value compared to 7. The trouble is that in the average, large negative quantities cancel large positive ones. To avoid this problem we need a measure of size that disregards signs. We proceed as follows:

1. Square all values.
2. Calculate the average of the resulting numbers.
3. Take the square root of the resulting mean.

This is called the root mean square (r.m.s.) size of the sample.

For the previous data set we have

$\sqrt{(0^2 + 5^2 + (-8)^2 + 7^2 + (-3)^2)/5} \approx 5.4$ = r.m.s. size

We could have also considered the average disregarding the signs, which amounts to

$(0 + 5 + 8 + 7 + 3)/5 = 4.6$

Unfortunately, the mathematical properties of this way of measuring size are not as appealing as those of the r.m.s.

Spread

As we saw at the beginning of the lecture, two samples can have the same center and be scattered along their ranges in different ways. To measure the way a sample is spread around its average we can use the standard deviation, or SD.

The SD of a list of numbers measures how far away they are from their average. Thus a large SD implies that many observations are far from the overall average.

Most observations will be within one SD of the average. Very few will be more than two SDs away.
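The three-step recipe for the r.m.s. size, and the SD as the r.m.s. size of the deviations from the average, can be sketched as follows. This is a minimal illustration with the sample and the ages from the notes; the function names are assumptions, and statistics.pstdev is used only as a cross-check because it divides by the number of observations, which matches the definition used here.

```python
import math
import statistics


def rms_size(values):
    """Root mean square: square, average, then take the square root."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))


def sd(values):
    """SD of a list: r.m.s. size of the deviations from the average."""
    avg = sum(values) / len(values)
    return rms_size([v - avg for v in values])


sample = [0, 5, -8, 7, -3]
average = sum(sample) / len(sample)                     # signs cancel
rms = rms_size(sample)                                  # disregards signs
mean_abs = sum(abs(v) for v in sample) / len(sample)    # average ignoring signs
sample_sd = sd(sample)

ages = [18, 18, 21, 20, 19, 20, 20, 20, 19, 20]
ages_average = sum(ages) / len(ages)
ages_median = statistics.median(ages)

print(f"average {average}, r.m.s. size {rms:.2f}, average ignoring signs {mean_abs}")
print(f"SD {sample_sd:.2f} (statistics.pstdev gives {statistics.pstdev(sample):.2f})")
print(f"ages: average {ages_average}, median {ages_median}")
```

For the ages, the average falls slightly below the median, which by the rule above points to a somewhat longer left tail.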
We can quantify what is written above as:

- Roughly 68% of the observations are within one SD of the average.
- Roughly 95% of the observations are within two SDs of the average.
- Roughly 99% of the observations are within three SDs of the average.

These statements are more accurate when the distribution is symmetric.
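The rule of thumb above can be compared with the normal curve and with an actual list of numbers. The sketch below is only an illustration: the function names are assumptions, the normal-curve areas come from the error function rather than a table, and the ages are the ten values used earlier in the notes, which is far too small a sample to expect close agreement.

```python
import math
import statistics


def normal_within(k):
    """Area under the normal curve within k SDs of the average."""
    return math.erf(k / math.sqrt(2.0))


def fraction_within(values, k):
    """Fraction of the observations within k SDs of their average."""
    avg = sum(values) / len(values)
    spread = statistics.pstdev(values)  # SD dividing by the number of observations
    inside = [v for v in values if abs(v - avg) <= k * spread]
    return len(inside) / len(values)


ages = [18, 18, 21, 20, 19, 20, 20, 20, 19, 20]

for k in (1, 2, 3):
    print(f"within {k} SD: normal curve {normal_within(k):.4f}, "
          f"ages {fraction_within(ages, k):.2f}")
```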
[Handwritten class notes follow in the scan; they are not legible in this transcription.]
Margaqu M 1W lbs The steps of the process of learning and making inferences from data can be summarized as 0 Designing the data collection process Collecting data In this class we will give de nitions for randomized controlled experiments and observational studies We will compare the two Describe the information graphically and numerically types of studies using some famous case studies We will discuss how a 0 Build statistical models for inference causal effect can be established and how confounding factors can be the hidden cause of association between two variables We will also consider the problem of reporting percentages for the different groups 0 Preparing the data for analysis 0 Test the validity of the models 0 Report conclusions in a Study In this class we ll discuss experimental design issues and strategies I Longitudinal versus cross sectional studies Suppose the university conducts a survey of all students faculty and No you can not conclude that age is responsible for an increase in staff on campus and records the height7 age and Weight of eaCh weight This is because the effect may be confounded with the fact person that eating habits may have changed during the last decades and this Such a study is a cross section of the campus population It gives a may have an eet in the average might 0f the popl ation39 picture of the characteristics of the population at a given time Suppose you group your sample in three age groups below 30 Warning You can not draw conclusions about the ef between 30 and 50 and above 50 And you observe that feets 0f age from SUCh a StUdy Since you are Comparing different people of possibly different ages o The average weight in the rst group is smaller than in the second To draw conclusions about the effect of age you need to conduct a longitudinal study That is you follow the evolution of a persons weight in time for each person in the sample 0 The average weight in the second group is smaller than the third group Can you conclude 
that age is responsible for an increase in weight Observational studies In a controlled experiment the researcher decides who gets assigned to which group But there are many situations in which the researcher can just watch what happens In an observational study the subjects assign them selves to the treatment and control groups Studies related to accidents or smoking are examples of observational studies People are not usually willing to be randomized to smoke or have an accident just to participate in a study Observational studies can nd evidence of association between a treatment variable and an outcome response variable For example between smoking and lung cancer Collecting data experimental design Suppose a new drug is introduced how do we gather evidence that it is effective to treat a given disease The key idea is comparison A group of patients suffering from the disease is divided into two groups a treatment group where patients get the drug and a control group of patients that are not treated To eliminate bias subjects are assigned to each group at random and the experiment is run double blind That is neither the patients nor the doctors know who is in the control and who is in the treatment This is called a Controlled Randomized Experiment and can establish a causal effect of the treatment on the response But there may be hidden factors that make people smoke and also make them get sick Association is not causation I To reduce the effect of confounding factors in observational studies we have to make the control and the treatment groups as similar as possible that is we have to control for confounding factors In the case of smoking age and gender can be confounding factors So the right thing to do is to compare subjects of the same age and gender who smoke and do not smoke Case Study The Salk vaccine Polio was an epidemic in the US during forty years that started in 1916 In the 750s several vaccines were developed and the most successful one was the Salk 
vaccine When it was ready to be tried on humans a eld trial was conducted Why not give the vaccine to every child Because even after laboratory experimentation the effectiveness and the risks associated with the vaccine were unclear Also giving the vaccine to every child in a given year could confound the effect of the vaccine with the cycles of the epidemic Why not give the vaccine only to children of consenting parents Because the incidence of polio was higher among higher income families which were more likely to give their consent So the effect of the vaccine would have been confounded with that of social class All children in the study had to receive an injection Children IAM875 in the control group were given salty water to avoid confounding with the psychological effect of receiving a shot This is called a placebo Who decided which children were going to be under treatment This was decided at random to avoid biases due to human judgment Doctors in charge of diagnosing children were not told if they were vaccinated or not This again was done to avoid biases since doctors may have had a preconception about the validity of the vaccine A different eld study was conducted giving the vaccine to all second graders whose parents consented and leaving all rst and third graders unvaccinated in areas with high risk of polio Notice that this study not only had the problem of consenting parents being confounded with social class it also had the problem that children were grouped in different grades and since polio is a contagious disease this can have an effect on the way it spreads I We can instead compare rates number of polio cases per 100000 in each group controlled randomized study on 1st 2nd and experiment 3rd graders Sample Size Rate Sample Size Rate Treatment 200000 28 Grade 2 225000 25 Control 200000 71 Grades 1 amp 3 725000 54 No consent 350000 46 Grade 2 no consent 125000 44 Notice that in the randomized controlled double blind RCDB experiment the rate drops 
from 71 to 28. This is a much larger drop than the one from 54 to 25 shown in the other study. Since the other study had treatment and control groups which were not comparable, it resulted in a bias against the vaccine. The key to a good design is that both groups have to be as similar as possible.

The numbers of children who contracted polio in each group, as implied by the rates per 100,000 and the sample sizes, are:

Randomized controlled experiment
                      Sample size    Polio cases
Treatment                 200,000          56
Control                   200,000         142
No consent                350,000         161

Study on 1st, 2nd and 3rd graders
                      Sample size    Polio cases
Grade 2                   225,000          56.25
Grades 1 & 3              725,000         391.5
Grade 2, no consent       125,000          55

Notice that different groups have different sample sizes, so it doesn't make sense to compare the numbers of children who contracted polio. What's a better way to compare the numbers? Rates per 100,000 put groups of different sizes on the same scale.

Historical controls

Since randomized controlled trials are hard and expensive, sometimes other designs are used to assess the validity of a treatment. One possibility is to compare the treatment with historical data, that is, patients who were treated in an old way in the past are compared with patients treated with a new drug or surgery. The problem with this approach is that treatment and control may differ in important ways.

A controlled experiment begins with a well-defined patient population. The first step is to decide which patients are eligible. Among these patients, a control group and a treatment group are chosen at random and contemporaneously. If, for example, all patients who are too sick to undergo surgery are treated as controls, then the trial is biased towards the surgery.

The Clofibrate trial

The Coronary Drug Project was a randomized controlled double blind experiment to compare five drugs for the prevention of heart attacks. Of the 8341 middle-aged men with heart problems, 5552 were assigned to one of the five treatments and 2789 to the control group. The patients were followed for 5 years.

The group on Clofibrate, a drug that reduces the level of cholesterol in blood, did not do very well. The death rate was 20%, compared to 21% in the control group. It was suggested that this was due to the fact that many subjects did not take their medicine. The following table reports the death rates of subjects who took and did not take the medicine.

                 Clofibrate               Placebo
               number   deaths (%)     number   deaths (%)
taking            708       15           1813       15
not taking        357       25            882       28
total            1103       20           2789       21

Comparing subjects that took the medicine with those that did not is an observational study. This is because subjects assigned themselves to one of the "treatments".

Among the subjects under Clofibrate there is a drop from 25% to 15% when the subjects not taking the medicine are compared to those who really took it. This seems to be strong evidence that Clofibrate works. Is this correct? In fact we see a similar drop in deaths (28% to 15%) among subjects under placebo. Is there a confounding factor present? If so, what could it be?

The confounding factor in this observational study could be the level of health consciousness that made some subjects more willing to follow the prescription than others. The conclusions are:

1. Clofibrate does not have an effect.
2. The subjects that stick to the prescription are different from the ones that don't.

Lifestyle or health consciousness may be the hidden factor that makes people take their prescription regularly and also lowers their mortality rate. If a variable is a confounding factor, it must be associated with BOTH the treatment and the response variables.

Cervical cancer and circumcision

Fact: In the 1950s cervical cancer was found to be fairly rare among Jews in different countries. A similar pattern was observed among Muslim women. As a result of these observations, several researchers concluded that male circumcision was the protective factor.

But: There are differences between Jews or Muslims and members of other communities besides circumcision, so there are many confounding factors to consider. It turns out that cervical cancer has a causal agent in the human papilloma virus, which is sexually transmitted. So women who are more sexually active are more exposed to the virus and thus possibly more likely to develop cervical cancer. So the confounding factor was probably sexual behavior in the 1930s and 1940s.

Collecting data: experimental design

See handout: Case Study: The Contraceptive Drug Study.

Simpson's Paradox

The following table shows the admission rates of the six largest majors at UC Berkeley.

                 Men                        Women
Major     applicants   % admitted    applicants   % admitted
A             825          62            108          82
B             560          63             25          68
C             325          37            593          34
D             417          33            375          35
E             191          28            393          24
F             373           6            341           7
totals       2691          44           1835          30

The totals suggest that there is strong sex bias in the admission system, with 30% of women admitted against 44% of men. When we look at the percents major by major, we observe that they are pretty comparable; actually, in some cases men have a substantially lower percent, as in major A. What is going on?

Let's have a look at the table showing the percent of men and women applying to each major.

Major    Men    Women
A         31       4
B         21       2
C         12      32
D         15      21
E          7      22
F         14      19

Notice that women apply to the majors that have lower acceptance rates, whilst men apply to the "easy" ones. The effect of the choice of major is confounded with gender.

Simpson's Paradox can be stated as: relationships between percentages in subgroups can be reversed when the subgroups are combined.

Tables with many entries are hard to read. Can we produce a single number that represents the information contained in the table of acceptance rates? If, instead of calculating the total percent admitted among male and female applicants, we consider an average of the rates per major, we could get such a number. The trouble is that we have to give a different weight to each major, corresponding to the total number of applicants to that major:

major         A     B     C     D     E     F
applicants   933   585   918   792   584   714

The weighted average admission rates are

Men: $\dfrac{0.62 \times 933 + 0.63 \times 585 + 0.37 \times 918 + 0.33 \times 792 + 0.28 \times 584 + 0.06 \times 714}{4526} \approx 0.39$

Women: $\dfrac{0.82 \times 933 + 0.68 \times 585 + 0.34 \times 918 + 0.35 \times 792 + 0.24 \times 584 + 0.07 \times 714}{4526} \approx 0.43$

where 4526 is the total number of applicants to the six majors. The results are 39% for the men and 43% for the women, a very different picture from the one based on the total admission rates.

University Problem

A university has two
departments, A and B. There are 2000 in-state applicants, of whom half apply to each department. There are 1100 out-of-state applicants: 100 apply to department A and 1000 to department B. Department A admits 60% of the in-state and 60% of the out-of-state students who apply. Department B admits 30% of the in-state and 30% of the out-of-state students who apply. "Since for each department the % of in-state applicants admitted is equal to the % of out-of-state applicants admitted, the same must be true for the two departments together." Answer yes or no, and explain briefly.
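One way to check the claim in the University Problem is to pool the admissions by hand. The sketch below does that in Python, using only the numbers stated in the problem; the dictionary layout and the function name `overall_rate` are illustrative choices, not part of the notes.

```python
# Applicants and admission rates from the University Problem.
# Keys: (department, residency) -> (number of applicants, admission rate).
applicants = {
    ("A", "in-state"): (1000, 0.60),
    ("A", "out-of-state"): (100, 0.60),
    ("B", "in-state"): (1000, 0.30),
    ("B", "out-of-state"): (1000, 0.30),
}


def overall_rate(residency):
    """Pooled admission rate (in percent) for one residency group."""
    applied = sum(n for (_, r), (n, _) in applicants.items() if r == residency)
    admitted = sum(n * p for (_, r), (n, p) in applicants.items() if r == residency)
    return 100 * admitted / applied


in_state = overall_rate("in-state")
out_of_state = overall_rate("out-of-state")
print(f"in-state: {in_state:.1f}%  out-of-state: {out_of_state:.1f}%")
```

The pooled rates differ even though the rates agree within each department, because most out-of-state applicants apply to the department that admits fewer students. This is the same mechanism as in the Berkeley admissions example.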
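The weighted average admission rates for the Berkeley data can be recomputed from the admissions table. The following is a minimal sketch, assuming the applicant counts and percents given in that table; the variable names and the helper `weighted_rate` are hypothetical.

```python
# Applicants and percent admitted for majors A..F, from the admissions table.
men_applicants = [825, 560, 325, 417, 191, 373]
women_applicants = [108, 25, 593, 375, 393, 341]
men_pct = [62, 63, 37, 33, 28, 6]
women_pct = [82, 68, 34, 35, 24, 7]

# Weight of each major = total number of applicants (men + women) to it.
weights = [m + w for m, w in zip(men_applicants, women_applicants)]


def weighted_rate(percents, weights):
    """Weighted average of the per-major admission percents."""
    return sum(p * w for p, w in zip(percents, weights)) / sum(weights)


men_rate = weighted_rate(men_pct, weights)
women_rate = weighted_rate(women_pct, weights)
print(weights, sum(weights))
print(f"men: {men_rate:.1f}%  women: {women_rate:.1f}%")
```

Using the same weights for both groups removes the effect of men and women applying to different majors, which is why the comparison reverses relative to the raw totals of 44% and 30%.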
Models for Averages

In this class we consider situations where we are interested in the average value of a population. We sample at random from a
box containing tickets with different values, and we want to estimate the accuracy of the average. We consider the following specific topics:

• Mean and standard deviation for averages
• Approximations based on the Central Limit Theorem
• Confidence intervals for averages
• Gauss models

Estimating Averages

We want to estimate the accuracy of samples obtained from box models where the population is not divided in two groups; that is, a sample is obtained by drawing tickets with replacement from a box and the numbers are recorded. What can we say about the average?

Consider a box of numbered tickets, and suppose 25 draws are made from the box with replacement. The average of the box is 4, so the expected value of the sum is $25 \times 4 = 100$. The SD of the box is 2, so the SE for the sum is $\sqrt{25} \times 2 = 10$.

What about the average?

expected value for the average of draws = average of the box

$\text{SE for the average of draws} = \dfrac{\text{SE for the sum}}{\text{number of draws}}$

So for the previous box, the expected value of the average is 4 and the SE is $10 / 25 = 0.4$, so the average of the draws will be equal to 4, give or take 0.4.

Q: What if we want to calculate the probability that the average of the draws will be above 4.4?

A: We can use the normal curve as an approximation. This is because the normal approximation is valid for the sum, and the average is just a change of scale. Changing to standard units we have

$\dfrac{4.4 - 4.0}{0.4} = 1$

and the probability that a standard normal will be above 1 is approximately 16%.

When the number of draws is increased by a factor of $k$, the SE for the average decreases by a factor of $\sqrt{k}$.

The sample average

Suppose a city manager wants to know the average income of the 25,000 families living in his town. A simple random sample of 1000 families is taken, and the total income of those families turns out to be 32,396,714. So we can obtain the average income as $32{,}396{,}714 / 1000 \approx 32{,}400$. We can use this number as an approximation to the average income of the 25000
families, but we would like to have an estimate of the chance error. We need the SE of the sample average, but this depends on the SD of the box that produced the sample, which is not available. For a large sample size, the SD of the box can be estimated using the SD of the sample.

Suppose the SD of the sample is 19,000. The SE for the sum is equal to $\sqrt{1000} \times 19{,}000 = 600{,}832.8 \approx 600{,}000$, and the SE for the average can thus be estimated as $600{,}000 / 1000 = 600$, so the average income in the town is $32{,}400 \pm 600$.

We can now use the normal approximation to create confidence intervals. In fact, a 95% confidence interval for the average is given by $32{,}400 \pm 2 \times 600 = (31{,}200,\ 33{,}600)$.

NOTICE: This does NOT mean that 95% of the families in town have an income between 31,200 and 33,600. This is an interval for the average income of the families in town.

NOTICE: The normal approximation may not be valid for the sample but still be valid for the sample average. That is, using the CLT we can approximate the probability histogram of the sample average with a normal curve, even if the probability histogram of the box is far from normal.

Sodium Chloride Concentration

In 36 randomly selected samples of seawater, the mean sodium chloride concentration was 23 cc/cubic meter and the SD was 6.7 cc/cubic meter. Find a 95% confidence interval for the mean sodium chloride concentration.

We can estimate the SE as $6.7 / \sqrt{36} \approx 1.12$ cc/cubic meter; then a 95% confidence interval is approximately $23 \pm 2 \times 1.12 = (20.76,\ 25.24)$ cc/cubic meter.

What is the probability that the average concentration will be above 24 cc/cubic meter? Changing to standard units, $(24 - 23) / 1.12 \approx 0.89$. The probability that a standard normal will be above 0.89 is about 18%.

Standard errors

The standard error is interpreted as the likely size of the errors. So far we have considered several possible standard errors; each corresponds to a different model.

$\text{SE for the sum} = \sqrt{\text{number of draws}} \times \text{SD of the box}$

$\text{SE for average} = \dfrac{\text{SD of the box}}{\sqrt{\text{number of draws}}}$

SE for count = SE for the sum from a 0-1 box

$\text{SE for percent} = \dfrac{\text{SE for count}}{\text{number of draws}} \times 100$

The SE can be used to convert to standard units in order to build confidence intervals based on the normal curve when the number of draws is large.

Measurement error

Any measurement is subject to chance error. To estimate the size of the chance error, the best thing to do is to repeat the measurement several times.

100 measurements of the NB 10, a weight owned by the National Bureau of Standards, are considered. The nominal weight of the NB 10 is 10 grams. The units recorded are micrograms below 10 grams. The average equals 404.6 micrograms and the SD equals 6 micrograms. Since there are 100 measurements, the SE for the average equals $6 / \sqrt{100} = 0.6$ micrograms.

Q: What is the difference between the SD and the SE?

A:
• The SD says that a single measurement is accurate up to 6 micrograms or so.
• The SE says that the average of all 100 measurements is accurate up to 0.6 micrograms or so.

The SD is related to the precision of single measurements; the SE is related to the precision of the average. In the previous example, any specific measurement is only accurate to within about 6 micrograms. The estimated weight of the NB 10, based on the average of 100 measurements, is accurate to within about 0.6 micrograms.

Chance models

The methods developed to estimate averages and accuracies are useful for samples that are obtained as draws from a box where all tickets have the same chance.

[Figure: population of the US from 1970 to 1990.] These data do not look like random draws from a box, since they show a very significant trend.

If the data show a trend or a pattern over time, then a box model does not apply. Consider the daily maximum temperatures at San Francisco Airport. These do not correspond to a box model, since during the summer we expect to see higher temperatures than during the winter, so the data will show a
seasonal pattern.

Data that correspond to box models are irregularly scattered around their mean value. About the same proportion of the data will be below and above the average.

Gauss Models

A Gauss Model is a model for measurement error. Each time a measurement is made, a ticket is drawn at random with replacement from the error box. The number on the ticket is the chance error. This is added to the exact value. The average of the error box is equal to 0.

When the Gauss model is applied, the SD of a series of repeated measurements can be used to estimate the SD of the error box. The estimate is good if there are enough measurements.

We can write the Gauss model as

measurement = exact value + chance error

where the chance error has an expected value of 0.

Problems

Problem 1: A survey organization takes a simple random sample of 625 households from a city of 80,000 households. On average there are 2.30 persons per sample household, and the SD is 1.75. Say whether each of the following statements is true or false, and explain.

1. The SE for the sample average is 0.07.
   This is true, since $\text{SE} = 1.75 / \sqrt{625} = 0.07$.

2. A 95% confidence interval for the sample average is 2.16 to 2.44.
   This is false, since the sample average is a known quantity.

3. A 95% confidence interval for the average household size in the city is 2.16 to 2.44.
   True: the average household size in the city is the population parameter that we are trying to estimate.

4. 95% of the households in the city contain between 2.16 and 2.44 persons.
   This is false. The confidence interval is a statement about the average household size.

5. The 95% confidence level is about right because the household size follows the normal curve.
   This is false. It is the sample average that is approximately normal; the household sizes themselves need not be.

Problem 2: In a long series of trials, a computer program is found to take on average 58 seconds of CPU time to execute, and the SD is 2 seconds. There is no trend or pattern in the data.

1. How long can you expect it to take in order for
the program to run 100 times? This is like considering a box with an average of 58 seconds.
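The standard-error arithmetic behind the confidence-interval examples in these notes (household size, seawater concentration, family income) can be sketched in a few lines of Python. The function names `se_of_average` and `interval_95` are hypothetical helpers, and the interval uses the "average plus or minus 2 SE" rule from the notes, with the sample SD standing in for the SD of the box.

```python
import math


def se_of_average(sd, n):
    """SE for the average of n draws: SD of the box / sqrt(number of draws)."""
    return sd / math.sqrt(n)


def interval_95(average, sd, n):
    """Approximate 95% confidence interval: average +/- 2 SE."""
    se = se_of_average(sd, n)
    return (average - 2 * se, average + 2 * se)


# (sample average, sample SD, sample size) for each worked example.
examples = {
    "household size": (2.30, 1.75, 625),
    "seawater NaCl": (23, 6.7, 36),
    "family income": (32396714 / 1000, 19000, 1000),
}

for name, (avg, sd, n) in examples.items():
    low, high = interval_95(avg, sd, n)
    print(f"{name}: SE = {se_of_average(sd, n):.2f}, "
          f"95% CI = ({low:.2f}, {high:.2f})")
```

The printed values use unrounded intermediate results, so they can differ slightly from the figures in the notes, which round the average income to 32,400 and its SE to 600 before building the interval.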
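The normal approximation for the average of draws, as in the example of 25 draws from a box with average 4 and SD 2, can be checked numerically. This is a minimal sketch that assumes only those summary numbers; the helper `normal_upper_tail` is an illustrative name, and it uses the standard library's complementary error function rather than a normal table.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


box_average = 4   # average of the box
box_sd = 2        # SD of the box
draws = 25        # number of draws, with replacement

se_sum = math.sqrt(draws) * box_sd   # SE for the sum of the draws
se_avg = se_sum / draws              # SE for the average of the draws

# Chance that the average of the draws is above 4.4, in standard units.
z = (4.4 - box_average) / se_avg
prob = normal_upper_tail(z)
print(f"SE for sum = {se_sum}, SE for average = {se_avg}")
print(f"z = {z:.2f}, P(average > 4.4) is about {100 * prob:.0f}%")
```

Multiplying `draws` by 4 in this sketch halves `se_avg`, which illustrates the rule that increasing the number of draws by a factor of k shrinks the SE for the average by a factor of the square root of k.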