UCLA STAT 13 Statistical Methods Midterm Review Solutions

Chapter 1 What is Statistics

4 3
(i) The treatment being compared is:
    Study 1: the number of storeys from which the cat fell
    Study 2: the gender of the student
    Study 3: the style of the commercial
(ii) Study 1: An observational study. There is no allocation by the researcher of cats to the number of storeys of the fall. Results are simply observed for cases that happen.
    Study 2: An observational study. There is no allocation by the researcher of subjects (the students) to the groups (male or female).
    Study 3: An experiment. The researcher allocates which commercial is to be watched by each subject (shopper).
(iii) It is not possible to do an experiment for Study 1 due to ethical and moral considerations. To experiment, a sample of cats would have to be allocated a height and then thrown out of a window at that height.
    It is not possible to do an experiment for Study 2, as the researcher cannot allocate a gender to a student.

5 3 4 Self-selection (selection bias)
(i) False. A sample of 500 is large enough to give a useful indicative result.
(ii) True. Homeless people cannot be contacted by telephone.
(iii) True. Telephone polls will usually have some nonresponse bias. There is no indication of how vigorously nonrespondents were followed up.
(iv) False. Control groups are not necessary in polls or surveys. Control groups are often used when comparisons are to be made, i.e. in observational studies and experiments.

Chapter 3 Exploratory Tools for Relationships

Section A Types of Variables

1 (a) Quantitative variables are measurements and counts.
  (b) Qualitative variables describe group membership.
2 (a) Variables with few repeated values are treated as continuous.
  (b) Variables with many repeated values are treated as discrete.
3 (a) Variables with order are called ordinal.
  (b) Variables without order are called categorical.
4 (a) To explore the relationship between two
quantitative variables, we use a scatter plot.
  (b) To explore relationships between a qualitative variable and a quantitative variable, we use dot plots, stem-and-leaf plots and box plots.
  (c) To explore the relationship between two qualitative variables, we use a two-way table of counts.

Section B Two Variables

1 2 3 3 5 4

[Figure: scatter plot of Sales Revenue ('000s) versus Advertising Costs ('000s)]

Interpretation: As advertising costs increase, sales revenue also increases. The relationship is not linear: the increase in sales revenue decreases as advertising costs increase. There are no outliers in this data. The amount of scatter about the trend curve is small.

(a) Pacific Ocean rivers / Tasman Sea rivers: Med 64, Q1 48, Q3 76
(b) [Figure: box plots of the lengths of major rivers flowing into the Pacific Ocean and the Tasman Sea]

Note:
For the rivers flowing into the Pacific Ocean:
    IQR = 105, 1.5 × IQR = 157.5, Q3 + 1.5 × IQR = 326.5, Q1 − 1.5 × IQR = −93.5
    There are no outside values; the whiskers end at 48 (lower) and 322 (upper).
For the rivers flowing into the Tasman Sea:
    IQR = 28, 1.5 × IQR = 42, Q3 + 1.5 × IQR = 118, Q1 − 1.5 × IQR = 6
    177 and 121 are outside values; the whiskers end at 32 (lower) and 108 (upper).

On average, the rivers flowing into the Pacific Ocean are longer.
The lengths of the rivers flowing into the Pacific Ocean have a larger spread than the lengths of the rivers flowing into the Tasman Sea.
The lengths of the rivers flowing into the Pacific Ocean are skewed to the right (positively skewed).
The Grey River and Buller River are outliers amongst the rivers flowing into the Tasman Sea.

Chapter 4 Probabilities and Proportions

           Females   Males    Total
Stat 11    0.3003    0.2949   0.5952
Stat 13    0.1924    0.2123   0.4048
Total      0.4928    0.5072   1

(i) […]/2011 = 0.2949   (ii) […]/2011 = 0.4928   (iii) […]/2011 = 0.1924
[…]/814 = 0.5246
[…]/1020 = 0.4186

                           40 or under   Over 40   Total
Wearing a seat belt        0.484         0.369     0.853
Not wearing a seat belt    0.066         0.081     0.147
Total                      0.55          0.45      1

0.147
[…]/0.147 = 0.551
0.55

                 Under 40   40 or over   Total
Mild cases       16         20           36
Serious cases    15         35           50
Total            31         55           86

(i) 16/86 = 0.1860   (ii) (20 + 35 + 15)/86 = 0.8140   (iii) 35/86 = 0.4070
[…]/50 = 0.3
[…]/55 = 0.3636

                  High-risk             Low-risk                Total
In default        40% of 0.05 = 0.02    0.03                    0.05
Not in default    0.2185                77% of 0.95 = 0.7315    0.95
Total             0.2385                0.7615                  1

0.2385
0.02/0.2385 = 0.0839

                   Female                     Male                          Total
Business degree    27% of 0.175 = 0.04725     0.12775                       0.175
Other degree       0.42281                    48.75% of 0.825 = 0.40219     0.825
Total              0.47006                    0.52994                       1

0.52994
0.12775
0.04725/0.47006 = 0.1005

Chapter 5 Discrete Random Variables

Section A Discrete Random Variables

1 (a)
    x            0      1      2
    pr(X = x)    0.25   0.5    0.25
  (b) (i) pr(X > 1) = 0.25
      (ii) pr(X ≥ 1) = 0.75
      (iii) pr(X ≤ 2) = 1
2 (a) pr(Y > 12) = 0.03 + 0.28 + 0.12 = 0.43
  (b) pr(Y ≤ 10) = 0.10 + 0.07 + 0.25 = 0.42
  (c) pr(Y ≥ 6) = 1 − pr(Y < 6) = 1 − 0.10 = 0.9
  (d) pr(6 ≤ Y ≤ 12) = 0.07 + 0.25 + 0.15 = 0.47
  (e) pr(10 ≤ Y ≤ 12) = 0.25 + 0.15 = 0.4
  (f) pr(13 < Y < 25) = 0.28

Section B Binomial Distribution

1 Let X be the number of customers who purchase at least one book. X ~ Binomial(n = 7, p = 0.3).
  pr(X ≥ 2) = 1 − pr(X ≤ 1) = 1 − 0.329 = 0.671
2 (a) n = 10, p = 0.05
  (b) There is a fixed number of trials (10). Each disk drive is a trial.
      Each trial has 2 outcomes: the disk drive malfunctions or the disk drive does not malfunction.
      The disk drives are independent.
      The probability that a disk drive malfunctions is constant.
  (c) The first two assumptions will be satisfied.
      The disk drives may not be independent. Disk drives could be made from the same batch of materials or may have the same systematic fault.
      The probability of a disk drive malfunctioning will not be constant because it will depend on how a disk drive is used.
  (iii) pr(X ≥ 2) = 1 − pr(X ≤ 1) = 1 − 0.914 = 0.086
  (iv) pr(2 ≤ X ≤ 5) = pr(X ≤ 5) − pr(X ≤ 1) = 1.00 − 0.914 = 0.086

Chapter 6 Continuous Random Variables

Section A Probability Density Function Quiz

1 Areas under the density curve represent probabilities. The probability that a random observation falls between a and b is equal to the area between the density curve and the x-axis from x = a to x = b.
2 The total area under the curve equals 1.
3 The population mean μ is the point where the density curve balances.
4 No, because for a continuous random variable pr(a ≤ X ≤ b) = pr(a < X ≤ b) = pr(a ≤ X < b) = pr(a < X < b) = area under the curve between a and b.
5 The curve is bell-shaped, symmetrical and centred at μ. The standard deviation σ governs the spread.
6
The parameters are μ and σ.

Section B Normal Distribution

1 (a) pr(X < 245) = 0.0947
  (b) pr(255 < X < 280) = pr(X < 280) − pr(X < 255) = 0.8092 − 0.2459 = 0.5633
  (c) pr(X > 287) = 1 − pr(X < 287) = 1 − 0.9053 = 0.0947
2 Let X be the survival time (in months) of a cancer patient on this drug.
  (a) (i) pr(X < 12) = 0.1163
      (ii) pr(12 < X < 24) = pr(X < 24) − pr(X < 12) = 0.3286 − 0.1163 = 0.2123
      (iii) pr(X > x) = 0.8, therefore pr(X < x) = 0.2 and so x = 17.6341.
            80% of the patients live beyond 17.6 months.
      (iv) pr(a < X < b) = 0.8
           pr(X < a) = 0.1 and so a = 10.5932
           pr(X < b) = 0.9 and so b = 51.6048
           The range of the central 80% of survival times is from 10.6 to 51.6 months.
  (b) There are some doubts about the validity of the assumption that survival times are Normally distributed. Although the data is roughly symmetrical, there is a gap in the centre which could indicate bimodality of survival times. The tails seem too short for the underlying distribution to have a Normal distribution.
3 Let X be the maximum distance reached by a pilot without moving the seat.
  (a) pr(X ≥ 120) = 1 − pr(X ≤ 120) = 1 − 0.3085 = 0.6915
  (b) pr(X ≥ x) = 0.95, therefore pr(X < x) = 0.05 and so x = 108.5515.
      The maximum distance at which the switch should be placed is 109 cm.
  (c) (i) That this pilot's maximum reach is 1.5 standard deviations above the mean.
      (ii) x = 125 + 1.5 × 10 = 140 cm
           A z-score of 1.5 corresponds to a maximum reach of 140 cm.

Section C Combining Random Variables

4 1
5 1
6 (a) Let G be the charge for a randomly chosen gardening job. G ~ Normal(μ = 25, σ = 3).
      X = G1 + G2 + G3 + G4 + G5 + G6
      E(X) = E(G1 + G2 + G3 + G4 + G5 + G6) = 6 × E(G) = 6 × 25 = 150
      sd(X) = sd(G1 + G2 + G3 + G4 + G5 + G6) = √6 × sd(G) = √6 × 3 = 7.35
  (b) (i) Let M be the charge for a randomly chosen mowing job. M ~ Normal(μ = 15, σ = 2).
          Y = M1 + M2 + ... + M11
          E(Y) = E(M1 + M2 + ... + M11) = 11 × E(M) = 11 × 15 = 165
          sd(Y) = sd(M1 + M2 + ... + M11) = √11 × sd(M) = √11 × 2 = 6.63
      (ii) T = X + Y
      (iii) T has a Normal distribution because it is a combination of Normally distributed random variables.
      (iv) E(T) = E(X + Y) = E(X) + E(Y) = 150 + 165 = 315
           sd(T) = sd(X + Y) = √(sd(X)² + sd(Y)²) = √(7.35² + 6.63²) = 9.90
           In order to calculate the standard deviation of T we had to assume that X and Y are independent random variables.
  (c) Let M be the charge for a randomly chosen mowing job. M ~ Normal(μ = 15, σ = 2).
      W = 52M
      E(W) = 52 × E(M) = 52 × 15