Statistics AMS 5
These 25 pages of class notes were uploaded by Milton Sawayn DVM on Monday, September 7, 2015. The notes belong to AMS 5 at University of California - Santa Cruz, taught by Staff in Fall. Since upload, they have received 54 views. For similar materials see /class/182149/ams-5-university-of-california-santa-cruz in Applied Math And Statistics at University of California - Santa Cruz.
Date Created: 09/07/15
AMS 5: STATISTICS

The normal density

The Gaussian or normal curve corresponds to the following formula:

$y = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad e = 2.71828\ldots$

[Figure: graph of the normal curve]

The area below the curve is equal to one. We observe that the curve is symmetric around zero and that most of the area is concentrated between -4 and 4. The probability of an interval is the corresponding area under the curve.

Percentages

A box contains tickets with 0s and 1s. The SD of the box is given by

$\text{SD} = \sqrt{(\text{fraction of 0s}) \times (\text{fraction of 1s})}$

The SE for the sum of 1s is $\sqrt{\text{number of draws}} \times \text{SD}$.

The SE for the percentage of 1s is $\frac{\text{SD}}{\sqrt{\text{number of draws}}} \times 100\%$.

- Sample percentage ± 1 SE is a 68% confidence interval for the percentage.
- Sample percentage ± 2 SE is a 95% confidence interval for the percentage.
- Sample percentage ± 3 SE is a 99.7% confidence interval for the percentage.

Calculations with the normal curve

Doing calculations with the normal curve requires the use of a table. Tables are available for the standard normal curve, and they require that observations be transformed to standard units. Given a list of numbers, we convert to standard units by subtracting the average and dividing by the SD.

Writing $P(a, b)$ for the area under the standard normal curve between $a$ and $b$, the symmetry of the curve gives:

- $P(0, z) = \frac{1}{2} \times P(-z, z)$
- $P(z, \infty) = 1 - P(-\infty, z)$
- $P(-\infty, z) = P(-\infty, 0) + P(0, z)$
- $P(z, x) = \frac{1}{2} \times [P(-x, x) - P(-z, z)]$ for $0 < z < x$

Estimating Averages

The expected value for the average of draws = average of box.

$\text{SE for the average of draws} = \frac{\text{SD of box}}{\sqrt{\text{number of draws}}}$

For a large sample size, the SD of the box can be estimated using the SD of the sample. The normal approximation can be used to create confidence intervals for the average.

Remember that a confidence interval of, say, 95% means that if the experiment is repeated 100 times, about 95 of the resulting intervals will contain the true value of the average.

Standard errors

$\text{SE for sum} = \sqrt{\text{number of draws}} \times \text{SD of the box}$

$\text{SE for average} = \frac{\text{SE for sum}}{\text{number of draws}} = \frac{\text{SD of the box}}{\sqrt{\text{number of draws}}}$

$\text{SE for count} = \text{SE for sum, from a 0-1 box}$

$\text{SE for percent} = \frac{\text{SE for count}}{\text{number of draws}} \times 100\% = \frac{\text{SD of the box}}{\sqrt{\text{number of draws}}} \times 100\%$

The observed significance level

The observed significance level is the chance of getting a test statistic as extreme as, or more extreme than, the observed one. This is usually denoted as P and referred to as the P-value.

The smaller the P-value, the stronger the evidence against the null, but the P-value is NOT the chance of the null hypothesis being right.

Test of Significance

- Set up the null hypothesis.
- Pick a test statistic to measure the difference between the data and what is expected under the null hypothesis.
- Compute the test statistic and the corresponding observed significance level.

In general we are calculating a test statistic given by

$z = \frac{\text{observed} - \text{expected}}{\text{SE}}$

which is referred to as the z-test.

The t-test

Step 1. Consider a different estimate of the SD:

$\text{SD}^+ = \sqrt{\frac{\text{number of measurements}}{\text{number of measurements} - 1}} \times \text{SD}$

Notice that $\text{SD}^+ > \text{SD}$.

Step 2.

$t = \frac{\text{observed} - \text{expected}}{\text{SE}^+}$

where $\text{SE}^+$ corresponds to $\text{SD}^+$.

Step 3. To find the observed significance level we cannot use the normal curve any more. We need to use a Student's t curve. This curve depends on the degrees of freedom, which are calculated as

degrees of freedom = number of measurements - 1

Tests for differences

The standard error for the difference of two independent quantities is

$\sqrt{a^2 + b^2}$

where
- $a$ is the SE for the first quantity
- $b$ is the SE for the second quantity

$z = \frac{\text{observed difference} - \text{expected difference under } H_0}{\text{SE for difference}}$

Two-tailed tests

One-tailed test:
H0: the difference is 0
H1: the average of one group is bigger than that of the other

Two-tailed test:
H0: the difference is 0
H1: the average of one group is different from that of the other

When a two-tailed test is used, the P-value is calculated by adding the areas that correspond to both tails of the normal curve.

The decision about using a one-tailed or a two-tailed test should be made before looking at the data.

Correlation

The correlation coefficient gives a measure of the linear association of two variables. It is usually denoted by $r$ and takes values between -1 and 1.

- The correlation is not affected when the two variables are interchanged.
- The correlation is not changed if the same number is added to all the values of one of the variables.
- The correlation is not changed if all the values of one of the variables are multiplied by the same positive number. It will change sign if the number is negative.
- The correlation coefficient is 1 if the variables have perfect positive linear association and -1 if they have perfect negative linear association.

The chi-square test

$\chi^2 = \text{sum of } \frac{(\text{observed frequency} - \text{expected frequency})^2}{\text{expected frequency}}$

When testing for independence in a table with $m$ rows and $n$ columns there are $(m - 1) \times (n - 1)$ DF.

H0: the two variables in the table are independent
H1: the two variables in the table are not independent

$\text{expected value of one cell in the table} = \frac{\text{row total} \times \text{column total}}{\text{total of the table}}$

Regression

The regression line for y on x estimates the average value of y corresponding to each value of x.

Associated with an increase of one SD in x there is an increase of $r$ SDs in y, on average.

error = actual value of y - predicted value of y

$\text{RMS error} = \sqrt{1 - r^2} \times \text{SD of } y$

The average of the residuals is 0 and the regression plot for the residuals is horizontal.

The formula for the slope of a regression line is

$\frac{r \times \text{SD of } y}{\text{SD of } x}$

The intercept of the regression line is the predicted value of y for $x = 0$.

Among all possible lines through a cloud, the regression line is the one that has the smallest RMS error in predicting y from x.

Problem 1

The speed of light is measured 25 times by a new procedure. The 25 measurements are recorded and show no trend or pattern. The average of the measurements is 299789.2 kilometers per second and the SD is 12 kilometers per second. Find an approximate 95% confidence interval for the speed of light.

1. Calculate the SE of the average.
   The SE is given by $\frac{12 \times \sqrt{25}}{25} = \frac{12}{5} = 2.4$.
2. Find an approximate 95% confidence interval for the speed of light.
   Two SEs correspond to 4.8 km per second. Thus a 95% confidence interval is given by 299784.4 to 299794.0 kilometers per second.

Problem 2

A simple random sample of size 400 was taken from the population of all establishments in a certain state. The results are that 16 establishments had 250 employees or more.

1. Estimate the percentage of establishments with 250 employees or more.
   16 / 400 = 4%
2. Attach a standard error to the estimate.
   $\frac{\sqrt{0.04 \times 0.96}}{\sqrt{400}} \approx 0.01$, that is, about 1%.

Problem 3

Find the area under a Student's t curve with 3 degrees of freedom in the following cases:

1. To the right of 2.35: 5%
2. To the left of -2.35: 5%
3. Between -2.35 and 2.35: 90%
4. Are these values higher or lower than the ones that correspond to the standard normal curve?
   The areas in 1 and 2 are smaller for the normal curve; as a consequence, the area in 3 is larger.

Problem 4

Looking at data and making sense of them is the first step of a statistical analysis.

The scatter diagram below shows the ages of 1000 husbands and wives in a town in California. Explore the plot. Is there anything wrong with the data?

[Scatter diagram: age of husbands in years (horizontal axis) against age of wives in years (vertical axis)]

The range of x does not correspond to the usual range of ages of married men. In particular, there is a 5-year-old man married to a 20-year-old woman.

Problem 5

True or false?

1. To make a t-test with 4 measurements, use a Student's t curve with 4 degrees of freedom. F
2. For a given experiment the null hypothesis is that the average is equal to 231 units. The alternative hypothesis is that the average is above 231 units. You compute a z-test and the corresponding P-value is 2.5%. The conclusion is that the probability that the average is equal to 231 units is 2.5%. F
3. The r.m.s. error for a regression line of y on x is less than or equal to the SD of y. T
4. The correlation between the daily minimum temperatures of LA and San Francisco is higher when measured in Fahrenheit than when it is measured in Celsius. F
5. The correlation between two variables is -0.92; this implies that there is a strong negative linear association between the variables. T

Problem 6

Freshmen at public universities work 12.2 hours a week for pay, on average, and the SD is 10.5 hours; at private universities the average is 9.2 hours and the SD is 9.9 hours. Assume the data are based on two independent samples, each of size 1000. Is the difference due to chance?

1. Formulate the null and the alternative hypothesis.
   H0: There is no difference between public and private universities.
   H1: Students at public universities work longer hours than those at private universities.
2. Calculate the SE for the difference of the averages.
   SE public $\approx \frac{10.5}{\sqrt{1000}} \approx 0.33$
   SE private $\approx \frac{9.9}{\sqrt{1000}} \approx 0.31$
   Then the SE of the difference is $\sqrt{0.33^2 + 0.31^2} \approx 0.45$.
3. Calculate the appropriate test statistic.
   $z = \frac{12.2 - 9.2}{0.45} \approx 6.7$
4. What is your conclusion?
   The null hypothesis is rejected, since the P-value is VERY close to 0.

Problem 7

A statistical analysis is made of the midterm and final scores in a large class. The results are:

average midterm score ≈ 60, SD ≈ 15
average final score ≈ 65, SD ≈ 20
r ≈ 0.50

1. Using the normal approximation, about what percentage of the students scored over 80 on the midterm?
   80 points on the midterm corresponds to $\frac{80 - 60}{15} = 1.33$ standard units. Using the normal curve we obtain that approximately 9% of the students scored over 80 on the midterm.
2. What is the RMS error?
   $\sqrt{1 - 0.5^2} \times 20 = 17.32$
3. What is the slope of the regression line?
   $0.5 \times \frac{20}{15} = 0.67$
4. What is the predicted final score for a student who scored 80 on the midterm?
   80 points on the midterm is 1.33 SD units above average. This corresponds to $1.33 \times 0.5 = 0.67$ SDs above average on the final. That corresponds to 13.4 points over the average on the final, so the students who scored 80 on the midterm scored on average 65 + 13.4 = 78.4 on the final.
5. Of the students who scored 80 on the midterm, about what percentage scored over 80 on the final?
   In standard units we have $\frac{80 - 78.4}{17.32} \approx 0.09$, and there is an area of about 46% to the right of this value under the normal curve.

Problem 8

Each respondent in the Current Population Survey of March 1993 was classified as employed, unemployed, or outside the labor force. The results for men in California aged 35-44 can be cross-tabulated by marital status as follows (expected values under the null hypothesis in parentheses):

                          Married      Widowed, divorced   Never married   Total
                                       or separated
Employed                  679 (654)    103 (109)           114 (133)        896
Unemployed                 63 (68)      10 (11)             20 (14)          93
Not in the labor force     42 (62)      18 (10)             25 (13)          85
Total                     784          131                 159             1074

Men of different marital status seem to have different distributions of labor force status.

1. What is the null hypothesis relevant to the table above?
   H0: marital status and employment status are independent.
2. The expected values under the null hypothesis are given in parentheses; calculate the $\chi^2$ test.
   $\chi^2 = \frac{(679 - 654)^2}{654} + \frac{(103 - 109)^2}{109} + \frac{(114 - 133)^2}{133} + \frac{(63 - 68)^2}{68} + \frac{(10 - 11)^2}{11} + \frac{(20 - 14)^2}{14} + \frac{(42 - 62)^2}{62} + \frac{(18 - 10)^2}{10} + \frac{(25 - 13)^2}{13} \approx 31$
3. How many degrees of freedom does the test have?
   $(3 - 1) \times (3 - 1) = 4$
4. What are your conclusions?
   The P-value is smaller than 1%, so the null hypothesis is rejected.
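The chi-square calculation in Problem 8 can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function name `chi_square_independence` is made up for this example, and the expected counts are left unrounded, so the statistic comes out slightly different from the hand calculation with rounded expected values (the notes report about 31).

```python
# Observed counts from Problem 8.
# Rows: employed, unemployed, not in the labor force.
# Columns: married, widowed/divorced/separated, never married.
observed = [
    [679, 103, 114],
    [63, 10, 20],
    [42, 18, 25],
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected counts).

    Hypothetical helper written for this example, not part of the notes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # expected cell = row total x column total / total of the table
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # sum of (observed - expected)^2 / expected over all cells
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

    # (rows - 1) x (columns - 1) degrees of freedom
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


chi2, df, expected = chi_square_independence(observed)
print(f"chi-square = {chi2:.1f} on {df} degrees of freedom")
```

The statistic can then be compared with a chi-square table for 4 degrees of freedom to read off the P-value, as in step 4 of the problem.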
Review

- In a histogram the areas of the blocks represent percentages.
- Variables can be classified as:
  - Quantitative data: correspond to observations measured on a numerical scale. These can be
    - Discrete: when the values can differ only by fixed amounts, like in size.
    - Continuous: when differences in values can be arbitrarily small, like in age.
  - Qualitative data: correspond to observations classified in groups or categories, like in sex and marital status.

Collecting data: design of experiments

To eliminate bias, subjects are assigned to each group at random and the experiment is run double blind. This is called a Controlled Experiment, and it makes it possible to establish a causal effect of the treatment on the response.

In an observational study the subjects assign themselves to the different groups.

- Association is not causation.
- Relationships between percentages in subgroups can be reversed when the subgroups are combined.

Average and standard deviation

The average of a list of numbers equals their sum divided by how many they are.

The median of a histogram is the value with half the area to the left and half to the right.

In a symmetric histogram the median and the average coincide.

The SD of a list of numbers measures how far away they are from their average.

SD = r.m.s. deviation from average

Collecting data: Sample Surveys

A population is a class of individuals that an investigator is interested in. A full examination of a population requires a census. If only one part of the population is examined, then we are looking at a sample.

There are usually some numerical characteristics of the population that we are interested in. These are called parameters. Parameters are unknown quantities, which are estimated using statistics: numbers that can be computed from the sample.

Taking a large number of samples with a biased procedure does not improve the results.

When considering the quality of a survey, keep in mind two possible sources of bias:
- Selection bias
- Non-response bias

Probability

How do we quantify chance?

- The chance of a given event is the percentage of times the event is expected to happen when the process is repeated over and over, independently and under the same conditions.
- The chance of a given event is the amount you would be willing to bet in favour of that event to obtain a reward of one unit if the event happens and nothing if it doesn't happen.

Chance has to be a number between 0% and 100%, or between 0 and 1.

If an event has a given chance $p$ of happening, the opposite event has chance $1 - p$.

Notation

Consider an event A; then the probability of A is denoted as $P(A)$.

Consider two events A and B; then the conditional probability of A given B is denoted as $P(A|B)$.

The multiplication rule can be written as

$P(A \text{ and } B) = P(A|B)P(B) = P(B|A)P(A)$

A and B are independent if

$P(A|B) = P(A)$ and $P(B|A) = P(B)$

When two events are independent, the multiplication rule is

$P(A \text{ and } B) = P(A)P(B)$

When all outcomes are equally likely, the chance of an event is equal to the ratio of the number of outcomes corresponding to the event over the number of all possible outcomes.

The probability that two events will both happen equals the probability that the first will happen times the probability that the second will happen given that the first one has happened.

Two events are independent if the probabilities of the second given the first are the same, regardless of the outcome of the first event.

Drawing at random with replacement produces independent events. Drawing without replacement produces dependent events.

The addition rule

Two events are mutually exclusive, or disjoint, when the occurrence of one prevents the occurrence of the other.

If two events are disjoint, then the probability that at least one will happen is obtained by adding the probabilities of each event.

In general, the mathematical notation is

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$

and for disjoint events the last term is 0.

Expected value and standard error

$\text{Expected value for the sum} = \text{number of draws} \times \text{average of box}$

$\text{SE for the sum} = \sqrt{\text{number of draws}} \times \text{SD of box}$

About 68% of the draws will be within one standard unit of the expected value.
About 95% of the draws will be within two standard units of the expected value.
About 99% of the draws will be within 2.5 standard units of the expected value.

Problem 1

The figure below is a histogram for the scores on the final of a certain class.

[Histogram: Final Score]

1. What percentage of the students scored between 20 and 40 points?
   The boxes in the histogram correspond, from left to right, to 10%, 10%, 10%, 20%, 25%, 12.5% and 12.5% of the scores. From 20 to 40 there are 30% of the scores.
2. What is the median score of the class?
   The median is 40.

Problem 2

Which of the following statements is true and which is false?

1. If two events are independent, then they are mutually exclusive. F
2. If A and B are two events, then according to the multiplication rule the probability that both A and B happen equals the probability of A times the probability of B. F
3. The ages of 10 freshmen and two professors are recorded; then the average age of the group is larger than the median age. T
4. One kilogram is approximately equal to 2.2 pounds; this implies that the standard deviation of the amount of fish consumed in a restaurant per day is larger if measured in kilos than if measured in pounds. F
   Amount of fish in pounds = 2.2 × amount of fish in kilos. Then SD in pounds = 2.2 × SD in kilos, so the SD of the amount in kilos = 1/2.2 times the SD of the amount consumed in pounds. The answer is FALSE. The important part of this question is that the SD changes when the units are changed.
5. 100 tickets are drawn at random with replacement from the box containing [...] and you win the sum of the tickets. The same game is repeated using the box [...]. You expect to win the same amount in both cases. T
6. A high non-response rate is a serious problem for survey [...]
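The true-or-false statements about the average versus the median, and about the SD in kilos versus pounds, can be illustrated numerically. The sketch below uses made-up data (the ages and the daily fish amounts are hypothetical, chosen only for illustration) and Python's standard `statistics` module, where `pstdev` is the r.m.s. deviation from the average.

```python
from statistics import mean, median, pstdev

# Hypothetical ages: 10 freshmen and two professors (made-up numbers).
ages = [18] * 6 + [19] * 4 + [45, 60]
mean_age = mean(ages)
median_age = median(ages)
# The two large values pull the average up but barely move the median.
print(f"average age = {mean_age:.1f}, median age = {median_age}")

# Hypothetical daily fish consumption in kilos (made-up numbers).
kilos = [12.0, 15.5, 9.0, 20.0, 14.5]
pounds = [2.2 * k for k in kilos]  # 1 kilogram is about 2.2 pounds

sd_kilos = pstdev(kilos)    # SD as r.m.s. deviation from the average
sd_pounds = pstdev(pounds)
# Multiplying every value by 2.2 multiplies the SD by 2.2.
print(f"SD in kilos = {sd_kilos:.2f}, SD in pounds = {sd_pounds:.2f}")
```

With any such data the SD in pounds is 2.2 times the SD in kilos, so the SD is smaller, not larger, when measured in kilos.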