# STAT METH RESEARCH 1 STA 6166

UF

These 250 pages of class notes for STA 6166 at the University of Florida, taught by Salvador Gezan in the Fall, were uploaded by Golden Bernhard on Friday, September 18, 2015.

Date Created: 09/18/15

## Analysis of Variance: Comparing More Than Two Means

In a **designed experiment**, the analyst controls the treatments and the selection of experimental units for each treatment. In an **observational study**, the analyst observes the treatment and the response on a sample of experimental units.

- **Quantitative factors** are measured on a numerical scale.
- **Qualitative factors** are those that are not naturally measured on a numerical scale.
- **Factor levels** are the values of the factor used in the experiment.

*[Figure: population of experimental units → sample of experimental units → factor-level combinations (treatments) applied to the sampled units.]*

Analysis of variance (ANOVA) is the tool used to evaluate the significance of quantitative or qualitative factors.

*[Figure: example layouts in which treatments are randomly applied to whole pots or to single plants within pots (12 pots, 4 plants per pot, 4 treatments), with all units handled as one batch.]*

Comparing the means of several groups is a hypothesis-testing problem. The variability between group means is compared with the background noise (natural variation among units and measurement error) through an F test, using ANOVA: if the observed variation between group means is greater than the background variation, then we reject $H_0$.

The setting determines how experimental units are selected for each treatment:

- **Designed experiments:** subjects are assigned at random to one of the $k$ treatments to be compared.
- **Observational studies:** subjects are sampled from $k$ existing groups.

**Statistical model:**

$$y_{ij} = \mu_i + \varepsilon_{ij} \quad\text{or}\quad y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, 2, \dots, k;\; j = 1, 2, \dots, n_i,$$

where $y_{ij}$ is the measurement on the $j$th subject from group $i$, $\alpha_i$ is the effect of treatment $i$, and $\varepsilon_{ij}$ is a random error for subject $j$ in group $i$.

### Example Part I: calcium pot trial

This example is based on a pot experiment to investigate the effect of different calcium levels in the soil on plant growth. Four relative concentrations were tested (A, B, C, D), each with five replicates arranged in a completely randomized design (CRD). The total root lengths obtained from each pot on a given date are shown below together with the design (treatment and root length for each pot of the layout):

| Row | Pot 1 | Pot 2 | Pot 3 | Pot 4 | Pot 5 |
|---|---|---|---|---|---|
| 1 | D: 49 | A: 72 | D: 45 | C: 72 | C: 76 |
| 2 | D: 63 | A: 57 | D: 39 | B: 72 | B: 66 |
| 3 | B: 80 | B: 75 | C: 74 | D: 56 | B: 80 |
| 4 | A: 56 | C: 70 | A: 68 | C: 65 | A: 64 |

From these data we want to extract information about:

- the mean response for each calcium concentration;
- the variability due to differences between the calcium concentrations (the signal);
- the background variability due to differences between replicate plots (the noise).

Therefore we need to partition the total variability, and its degrees of freedom, into two components:

$$SS_{Total} = SS_T + SS_E, \qquad df_{Total} = df_T + df_E,$$

$$SS_{Total} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar y_{..})^2, \qquad df_{Total} = n - 1,$$

$$SS_T = \sum_{i=1}^{k} n_i (\bar y_{i.} - \bar y_{..})^2, \qquad df_T = k - 1,$$

$$SS_E = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar y_{i.})^2, \qquad df_E = n - k,$$

where $n = \sum_i n_i$ is the total number of observations, $\bar y_{i.}$ is the mean of group $i$ and $\bar y_{..}$ is the grand mean. The mean squares and the F ratio are

$$MST = \frac{SS_T}{k-1}\ \text{(mean square for treatments)}, \qquad MSE = \frac{SS_E}{n-k}\ \text{(mean square error)}, \qquad F_0 = \frac{MST}{MSE},$$

with $k - 1$ numerator and $n - k$ denominator degrees of freedom. If $F_0$ is close to 1 (or below 1), the differences between the treatment means may be attributable to sampling error.

### Example Part II: ANOVA for the calcium pot trial

**Step 1. Defining the hypotheses.**

- $H_0: \mu_A = \mu_B = \mu_C = \mu_D$ (there are no differences among the four means)
- $H_a$: at least one pair of means is different

**Step 2. Treatment structure:** simple treatments.

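
The partition and F ratio defined above can be checked numerically before working through Steps 3 to 5 by hand. This is a minimal sketch in plain Python (standard library only) applied to the calcium pot trial data; `one_way_anova` is an illustrative helper, not a library function, and the critical value $F_{0.05,3,16} = 3.24$ is copied from an F table rather than computed.

```python
from statistics import mean

# Calcium pot trial: total root length of each replicate pot, by treatment
data = {
    "A": [72, 57, 56, 68, 64],
    "B": [72, 66, 80, 75, 80],
    "C": [72, 76, 74, 70, 65],
    "D": [49, 45, 63, 39, 56],
}


def one_way_anova(groups):
    """Split SS_Total into SS_T (between treatments) and SS_E (within treatments)."""
    all_y = [y for ys in groups.values() for y in ys]
    n, k = len(all_y), len(groups)
    grand = mean(all_y)
    ss_total = sum((y - grand) ** 2 for y in all_y)
    ss_trt = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ss_err = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    mst, mse = ss_trt / (k - 1), ss_err / (n - k)
    return {"grand_mean": grand, "ss_total": ss_total, "ss_trt": ss_trt,
            "ss_err": ss_err, "df": (k - 1, n - k, n - 1),
            "mst": mst, "mse": mse, "F": mst / mse}


result = one_way_anova(data)
for name, value in result.items():
    print(f"{name}: {value}")

F_CRIT = 3.24  # F_{0.05,3,16}, copied from an F table (not computed here)
print("reject H0 at alpha = 0.05:", result["F"] > F_CRIT)
```

The sums of squares should agree with the hand calculation (2496.95 = 1744.15 + 752.80) up to floating-point rounding, giving $F_0 \approx 12.36$.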
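**Design structure:** no blocks (CRD).

**Step 3. Calculating the means.**

| | A | B | C | D |
|---|---|---|---|---|
| Root lengths | 72, 57, 56, 68, 64 | 72, 66, 80, 75, 80 | 72, 76, 74, 70, 65 | 49, 45, 63, 39, 56 |
| Mean $\bar y_{i.}$ | 63.4 | 74.6 | 71.4 | 50.4 |

Grand mean: $\bar y_{..} = 64.95$.

**Step 4. Calculation of the sums of squares.**

Total sum of squares, $SS_{Total} = \sum_i\sum_j (y_{ij} - \bar y_{..})^2$ (first and last rows shown):

| $i$ | $j$ | $y_{ij}$ | $y_{ij} - \bar y_{..}$ | $(y_{ij} - \bar y_{..})^2$ |
|---|---|---|---|---|
| 1 | 1 | 72 | 7.05 | 49.7025 |
| 1 | 2 | 57 | −7.95 | 63.2025 |
| 1 | 3 | 56 | −8.95 | 80.1025 |
| 1 | 4 | 68 | 3.05 | 9.3025 |
| 1 | 5 | 64 | −0.95 | 0.9025 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 4 | 4 | 39 | −25.95 | 673.4025 |
| 4 | 5 | 56 | −8.95 | 80.1025 |
| **Sum** | | | | **2496.95** |

Treatment sum of squares, $SS_T = \sum_i n_i(\bar y_{i.} - \bar y_{..})^2$:

| Treatment | $\bar y_{i.}$ | $\bar y_{i.} - \bar y_{..}$ | $(\bar y_{i.} - \bar y_{..})^2$ |
|---|---|---|---|
| A | 63.4 | −1.55 | 2.4025 |
| B | 74.6 | 9.65 | 93.1225 |
| C | 71.4 | 6.45 | 41.6025 |
| D | 50.4 | −14.55 | 211.7025 |
| **Sum** | | | **348.83** |

so $SS_T = 5 \times 348.83 = 1744.15$ and, since $SS_{Total} = SS_{Treatment} + SS_{Error}$, $SS_E = 2496.95 - 1744.15 = 752.80$.

**Step 5. The ANOVA table.** The variance ratio compares the signal with the noise, $F_0 = \text{signal}/\text{noise} = MST/MSE$:

| Source of variation | df | Sum of squares | Mean square | F ratio | Probability |
|---|---|---|---|---|---|
| Treatment | 3 | 1744.15 | 581.38 | 12.36 | < 0.001 |
| Residual | 16 | 752.80 | 47.05 | | |
| Total | 19 | 2496.95 | | | |

Critical values: $F_{0.05,3,16} = 3.24$ and $F_{0.01,3,16} = 5.29$. Since $F_0 = 12.36$ exceeds both, $H_0$ is rejected: at least one calcium concentration gives a different mean root length.

**Step 6. Validating the ANOVA.** Check the residuals before anything else.

### Summary: ANOVA F test to compare $k$ treatment means

Model: $y_{ij} = \mu_i + \varepsilon_{ij}$ or $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$.

- $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ (equivalently, $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$)
- $H_a$: at least one pair of $\mu_i$ (equivalently, of $\alpha_i$) is different
- Test statistic: $F_0 = MST/MSE$
- Rejection region: $F_0 \ge F_{\alpha,\,k-1,\,n-k}$
- p-value: $P(F_{k-1,\,n-k} \ge F_0)$

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Treatments | $k - 1$ | $SS_T$ | $MST = SS_T/(k-1)$ | $F_0 = MST/MSE$ |
| Error | $n - k$ | $SS_E$ | $MSE = SS_E/(n-k)$ | |
| Total | $n - 1$ | $SS_{Total}$ | | |

Expected mean squares (with $\mu_i = \mu + \alpha_i$ and $\sum_i n_i\alpha_i = 0$):

$$E(MSE) = \sigma^2, \qquad E(MST) = \sigma^2 + \frac{\sum_{i=1}^{k} n_i\alpha_i^2}{k-1}, \qquad \frac{E(MST)}{E(MSE)} = 1 + \frac{\sum_{i=1}^{k} n_i\alpha_i^2}{(k-1)\,\sigma^2}.$$

When $H_0$ is true ($\alpha_1 = \cdots = \alpha_k = 0$) the ratio equals 1; otherwise $H_a$ is true and the ratio is greater than 1.

**Conditions required for a valid ANOVA F test:**

1. The samples are randomly selected in an independent manner from the $k$ treatment (or group) populations.
2. All $k$ sampled populations have distributions that are approximately normal.
3. The $k$ population variances are equal.

**Examination of residuals** detects potential departures from these basic assumptions. For the model $y_{ij} = \mu_i + \varepsilon_{ij}$ the residuals are $\hat\varepsilon_{ij} = y_{ij} - \hat y_{ij} = y_{ij} - \bar y_{i.}$, and we check that:

- Residual errors are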
independent;
- the variability of the residual errors is homogeneous;
- the residual errors belong to a Normal distribution, $\varepsilon_{ij} \sim N(0, \sigma^2)$.

**Randomization** ensures a fair allocation of treatments to experimental units and avoids experimenter bias in the estimated treatment differences. It does not guarantee that any individual randomization will be unbiased.

*[Figure: an unfavourable randomized layout, which occurs with probability 1 in 12,870.]*

**Step 6. Validating the ANOVA (very important).** Using the residuals $\hat\varepsilon_{ij} = y_{ij} - \hat y_{ij} = y_{ij} - \bar y_{i.}$, check independence, homogeneous variability and normality with:

- a histogram of the residuals;
- a plot of the residuals against the fitted values;
- a normal probability plot;
- a half-normal plot (absolute values of the residuals against half-normal quantiles).

*[Figure: the four residual diagnostic plots for the calcium pot trial.]*

### Example Part III: SAS analysis of the calcium pot trial

```sas
/* Calcium pot trial: treatment (A-D) and total root length per pot */
data root;
  input trt $ length;
  datalines;
A 72
A 57
A 56
A 68
A 64
B 72
B 66
B 80
B 75
B 80
C 72
C 76
C 74
C 70
C 65
D 49
D 45
D 63
D 39
D 56
;
run;

proc sort data=root;
  by trt;
run;

proc means data=root mean min max std n;
  by trt;
run;
```

Output of PROC MEANS (analysis variable: `length`):

| trt | Mean | Minimum | Maximum | Std Dev | N |
|---|---|---|---|---|---|
| A | 63.4000000 | 56.0000000 | 72.0000000 | 6.9137544 | 5 |
| B | 74.6000000 | 66.0000000 | 80.0000000 | 5.8991525 | 5 |
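| C | 71.4000000 | 65.0000000 | 76.0000000 | 4.2190046 | 5 |
| D | 50.4000000 | 39.0000000 | 63.0000000 | 9.3701654 | 5 |

PROC GLM then fits the one-way model (number of observations used: 20; dependent variable: `length`) and prints the ANOVA table (source, DF, sum of squares, mean square, F value, Pr > F). The residual diagnostics are obtained by saving the residuals and predicted values:

```sas
proc glm data=root;
  class trt;
  model length = trt;
  output out=resdata p=pred r=resid;   /* save fitted values and residuals */
run;

/* Plot of resid*pred */
proc plot data=resdata;
  plot resid*pred;
run;

/* Normal probability plot and histogram of the residuals */
proc univariate data=resdata;
  var resid;
  probplot resid / normal(mu=est sigma=est);
  histogram resid;
run;
```

*[Figure: SAS plot of residuals against predicted values, normal probability plot (residuals against normal percentiles) and histogram of the residuals.]*

### Comparing treatment means

The standard error of the difference (SED) between the means of treatment $i$ and treatment $j$ is

$$SED = \sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)},$$

and a $(1-\alpha)100\%$ confidence interval for $\mu_i - \mu_j$ is $(\bar y_{i.} - \bar y_{j.}) \pm t_{\alpha/2} \times SED$. If this interval does not contain 0, the two means are considered statistically different at significance level $\alpha$. Remember that $s^2 = MSE$ is a pooled estimate of the variance for ALL treatments (here $MSE = 47.05$ with 16 df).

A 95% CI for the mean of treatment B ($t_{0.025,16} = 2.12$):

$$SEM = \sqrt{\frac{MSE}{n}} = \sqrt{\frac{47.05}{5}} = 3.07, \qquad \bar y_B \pm t_{\alpha/2} \times SEM = 74.6 \pm 2.12 \times 3.07 = (68.09,\ 81.11).$$

A 95% CI for the difference between B and D:

$$SED = \sqrt{47.05\left(\tfrac{1}{5} + \tfrac{1}{5}\right)} = 4.34, \qquad (\bar y_B - \bar y_D) \pm t_{\alpha/2} \times SED = 24.2 \pm 2.12 \times 4.34 = (15.00,\ 33.40).$$

### Multiple comparisons

With $\alpha = 0.05$, each test makes a mistake 5% of the time (a FALSE POSITIVE: declaring means different when in fact they are not different). Over $c$ comparisons, treated as independent, the probability of at least one false positive under $H_0$ is $1 - (1-\alpha)^c$, and the number of pairwise comparisons grows quickly with the number of treatments $k$: $c = k(k-1)/2$. For $k = 4$ ($c = 6$) this probability is about 0.26; notice how this changes when $k = 15$ ($c = 105$): about 0.995. This seems like too much error in reporting our results; hence some control of the experimentwise error is needed.

*[Figure: probability of at least one false positive against the number of treatments and the number of comparisons of means.]*

Methods for pairwise comparisons among all pairs of groups:

- **Tukey's method:** specifically compares all $c = k(k-1)/2$ pairs of groups; it uses a special table (of the studentized range distribution).
- **Bonferroni's method:** adjusts the individual comparison error rates (each of the $c$ comparisons is tested at $\alpha/c$).
- **Fisher's LSD** (used after a significant F test): for each pair of groups, compute the least significant difference (LSD) that the sample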
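means need to differ by to conclude that the population means are not equal:

$$LSD_{ij} = t_{\alpha/2}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} = t_{\alpha/2} \times SED, \qquad df = n - k.$$

Conclude $\mu_i \ne \mu_j$ if $|\bar y_{i.} - \bar y_{j.}| \ge LSD_{ij}$. Fisher's confidence interval is $(\bar y_{i.} - \bar y_{j.}) \pm LSD_{ij}$.

With an $\alpha = 0.01$ significance level the rejection-region cutoff is more extreme than when $\alpha = 0.05$. Critical t values are given in special tables.

**Comparisons of means for the calcium trial.** With $\bar y_A = 63.4$, $\bar y_B = 74.6$, $\bar y_C = 71.4$, $\bar y_D = 50.4$, $SEM = s_p/\sqrt n = 3.068$, $SED = \sqrt 2 \times SEM = 4.34$ and $t_{0.025,16} = 2.12$:

| Trt | Mean | SE | $t \times SE$ | LL | UL |
|---|---|---|---|---|---|
| A | 63.4 | 3.068 | 6.503 | 56.90 | 69.90 |
| B | 74.6 | 3.068 | 6.503 | 68.10 | 81.10 |
| C | 71.4 | 3.068 | 6.503 | 64.90 | 77.90 |
| D | 50.4 | 3.068 | 6.503 | 43.90 | 56.90 |

Absolute differences between means ($SED = 4.34$, $LSD = 2.12 \times 4.34 = 9.20$):

| | A (63.4) | B (74.6) | C (71.4) | D (50.4) |
|---|---|---|---|---|
| A (63.4) | 0.0 | | | |
| B (74.6) | 11.2 | 0.0 | | |
| C (71.4) | 8.0 | 3.2 | 0.0 | |
| D (50.4) | 13.0 | 24.2 | 21.0 | 0.0 |

Differences larger than the LSD (A vs B, A vs D, B vs D and C vs D) are significant at $\alpha = 0.05$; A vs C and B vs C are not.

SAS reports both versions of these pairwise tests (alpha = 0.05):

- *t Tests (LSD) for length.* Note: this test controls the Type I comparisonwise error rate, not the experimentwise error rate.
- *Bonferroni (Dunn) t Tests for length.* Note: this test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

## Chapter 6: Sampling Distributions

| | Parameter | Sample statistic |
|---|---|---|
| Mean | $\mu$ | $\bar x$ |
| Variance | $\sigma^2$ | $\hat\sigma^2$, $s^2$ |
| Standard deviation | $\sigma$ | $\hat\sigma$, $s$ |
| Proportion | $p$ | $\hat p$ |

**Sampling distribution.** When we take samples from a population, the sample statistic used to estimate the population parameter is itself a random variable. The sampling distribution of a sample statistic calculated from a sample of $n$ measurements is the probability distribution of the statistic.

*[Figure: sampling-distribution simulation (Rice University, D. Lane): parent population, a sample, and the distribution of sample means.]*

**Properties of sampling distributions.** A statistic is said to be an unbiased estimate of the parameter $\theta$ if the mean of its sampling distribution is $\theta$. If two alternative sample statistics are both unbiased, the one with the smaller standard deviation is preferred (minimum variance).

*[Figure: sampling distributions of an unbiased and a biased estimator.]*

**Sampling distribution of $\bar x$.** The mean of the sampling distribution equals the mean of the population, and the standard deviation of the sampling distribution (the standard error of the mean) equals $\sigma/\sqrt n$. For independent observations,

$$E(\bar x) = E\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) = \mu, \qquad \operatorname{Var}(\bar x) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i) = \frac{\sigma^2}{n}.$$

If $x$ is normally distributed, $x \sim N(\mu, \sigma^2)$, then $\bar x \sim N(\mu, \sigma^2/n)$.

**Example.** The mean and standard deviation of the caffeine content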
were 100 mg and 7.1 mg, respectively.

(a) Find the probability that the average caffeine content will be between 99 and 101 mg.
(b) How likely are you to get, over those 50 cups, an average caffeine content higher than 103 mg?

**Central Limit Theorem (CLT).** If $x_1, \dots, x_n$ are drawn from a population with mean $\mu$ and variance $\sigma^2$, then for large $n$

$$\bar x \approx N\!\left(\mu, \frac{\sigma^2}{n}\right).$$

Sampling distributions of averages become more like a normal distribution as $n$ increases, regardless of the shape of the population of individual measurements!

*[Figure: (a) population relative frequency distribution; (b) sampling distribution of $\bar x$. Original populations of different shapes and the sampling distributions of $\bar x$ for increasing $n$ (simulations: Rice University sampling-distribution applet; University of Victoria Central Limit Theorem applet).]*

**Example.** A region has an average IMECA reading of 102 with a standard deviation of 40. (a) If a sample of 50 days is measured, what is the sampling distribution of the sample mean? (b) Find the probability that, for a sample of 50 days, the average IMECA reading is less than 100.

(a) By the CLT, $\bar x \approx N(102,\ 40^2/50)$: a normal distribution with mean 102 and standard error $40/\sqrt{50} = 5.66$.

(b) $P(\bar x < 100) = P\!\left(z < \dfrac{100 - 102}{40/\sqrt{50}}\right) = P(z < -0.35) \approx 0.36$.

## Chapter 8: Inferences Based on a Single Sample: Tests of Hypotheses

A scientist is planning a sampling in a river where she plans to catch 200 fish. Typically only 6% of the specimens belong to a particular species that she is interested in studying further. After the field work, only 2 of the 200 fish were of the species of interest. Do you think this is an unusual outcome?

Using Wilson's adjustment (see the confidence-interval section), a 95% CI for the proportion is

$$\tilde p = \frac{x + 2}{n + 4} = \frac{2 + 2}{200 + 4} = 0.0196, \qquad CI(p) = \tilde p \pm z_{\alpha/2}\sqrt{\frac{\tilde p(1 - \tilde p)}{n + 4}} = 0.0196 \pm 1.96 \times 0.0097 = (0.0006,\ 0.0386).$$

Since 0.06 lies outside this interval, we have some doubts about the population proportion $p$ being 6%.

**Null hypothesis ($H_0$):** states a specified value for the parameter, e.g., $H_0: p = p_0$.

**Alternative hypothesis ($H_a$):** contradicts the null hypothesis and is stated in terms of parameters; it will only be accepted if strong evidence refutes $H_0$ based on the sample data, e.g., $H_a: p \ne p_0$.

- Null hypothesis $H_0$: this will be supported unless the data provide evidence that it is false (the status quo).
- Alternative hypothesis $H_a$: this will
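be supported if the data provide sufficient evidence that it is true (the research hypothesis).

Rejection of the null hypothesis is a decision in favor of the alternative hypothesis.

**p-value:** the probability, assuming $H_0$ is true, that we would observe sample data (a test statistic) this extreme or more extreme in favor of the alternative hypothesis $H_a$.

*[Figure: sampling distribution of the test statistic under $H_0$, with the observed value and the tail area beyond it.]*

**Type I error:** a test resulting in rejection of $H_0$ in favor of $H_a$ when $H_0$ is in fact true. If the test statistic has a very low probability when $H_0$ is true, then $H_0$ is rejected; $P(\text{Type I error}) = \alpha$, typically 0.10, 0.05 or 0.01.

**Type II error:** a test resulting in failure to reject $H_0$ in favor of $H_a$ when in fact $H_a$ is true ($H_0$ is false); $P(\text{Type II error}) = \beta$ depends on the true parameter value.

| Reality | Do not reject $H_0$ | Reject $H_0$ |
|---|---|---|
| $H_0$ is true | CORRECT | Type I error ($\alpha$) |
| $H_0$ is false | Type II error ($\beta$) | CORRECT |

Note: null hypotheses are either rejected or else there is insufficient evidence to reject them; i.e., we don't accept null hypotheses.

**Two-tailed test:** a test where the alternative hypothesis is that the parameter is not equal to the null value; it simultaneously tests "greater than" and "less than". The null hypothesis is usually stated as an equality, even though for a one-tailed test it can be written as an inequality (e.g., $H_0: \mu \le \mu_0$).

### Population mean (large sample): z test

| | One-tailed test | Two-tailed test |
|---|---|---|
| Hypotheses | $H_0: \mu = \mu_0$ vs $H_a: \mu < \mu_0$ (or $\mu > \mu_0$) | $H_0: \mu = \mu_0$ vs $H_a: \mu \ne \mu_0$ |
| Test statistic | $z_0 = (\bar x - \mu_0)/(\sigma/\sqrt n)$ | $z_0 = (\bar x - \mu_0)/(\sigma/\sqrt n)$ |
| Rejection region | $z_0 < -z_\alpha$ (or $z_0 > z_\alpha$) | $z_0 < -z_{\alpha/2}$ or $z_0 > z_{\alpha/2}$ |

Condition: the sample size $n$ is large.

Rejection regions for common values of $\alpha$:

| $\alpha$ | $1 - \alpha$ | Lower-tailed | Upper-tailed | Two-tailed |
|---|---|---|---|---|
| 0.10 | 0.90 | $z_0 < -1.28$ | $z_0 > 1.28$ | $z_0 < -1.645$ or $z_0 > 1.645$ |
| 0.05 | 0.95 | $z_0 < -1.645$ | $z_0 > 1.645$ | $z_0 < -1.96$ or $z_0 > 1.96$ |
| 0.01 | 0.99 | $z_0 < -2.33$ | $z_0 > 2.33$ | $z_0 < -2.575$ or $z_0 > 2.575$ |

*[Figure: upper-tailed test at significance level $\alpha = 0.05$ (n = 25): the rejection region lies beyond $z = 1.645$.]*

**Steps of any hypothesis test.** After stating $H_0$ and the alternative hypothesis $H_a$ and computing the test statistic, ask: how likely is it to observe a sample mean as far or farther from the value of the parameter under the null hypothesis, i.e., assuming the null hypothesis is true? This is the p-value:

- two-sided test: p-value $= P(|Z| \ge |z_0| \mid H_0)$;
- lower-tailed test: p-value $= P(Z \le z_0 \mid H_0)$;
- upper-tailed test: p-value $= P(Z \ge z_0 \mid H_0)$.

*[Figure: p-value areas under the standard normal curve for the two-sided, lower-tailed and upper-tailed cases.]*
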
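
The p-value formulas and the critical values in the rejection-region table can be reproduced with Python's standard library (`statistics.NormalDist`, Python 3.8+). This is a minimal sketch; `z_statistic` and `p_value` are illustrative helper names, and the numbers in the last lines are those of the mercury example that follows.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_statistic(xbar, mu0, sigma, n):
    """Large-sample test statistic z0 = (xbar - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / n ** 0.5)


def p_value(z0, alternative="two-sided"):
    """p-value of z0 under H0 for Ha: mu != mu0, mu < mu0 ("less") or mu > mu0 ("greater")."""
    if alternative == "two-sided":
        return 2 * (1 - Z.cdf(abs(z0)))
    if alternative == "less":
        return Z.cdf(z0)
    if alternative == "greater":
        return 1 - Z.cdf(z0)
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")


# Critical values behind the rejection-region table
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: one-tailed z = {Z.inv_cdf(1 - alpha):.3f}, "
          f"two-tailed z = {Z.inv_cdf(1 - alpha / 2):.3f}")

# Mercury example: mu0 = 1.20, sigma = 0.32, n = 15, xbar = 1.406
z0 = z_statistic(1.406, 1.20, 0.32, 15)
print(f"z0 = {z0:.2f}, two-sided p-value = {p_value(z0):.4f}")
```

Because $z_0$ is not rounded to 2.49 before the normal CDF is evaluated, the p-value comes out near 0.0127 rather than the table-based 0.0128.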
**Confidence intervals and two-sided tests** (a $(1-\alpha)100\%$ CI and a test at level $\alpha$):

- If the entire interval is above $\mu_0$ ⇒ p-value < $\alpha$ ⇒ $z_0 > z_{\alpha/2}$: conclude $\mu > \mu_0$.
- If the entire interval is below $\mu_0$ ⇒ p-value < $\alpha$ ⇒ $z_0 < -z_{\alpha/2}$: conclude $\mu < \mu_0$.
- If the interval contains $\mu_0$ ⇒ p-value > $\alpha$ ⇒ $-z_{\alpha/2} < z_0 < z_{\alpha/2}$: fail to reject the null hypothesis (based only on a two-sided test).

**Example (mercury concentrations).** With $\mu_0 = 1.20$, $\sigma = 0.32$, $n = 15$ and a sample mean after the accident of $\bar x = 1.406$:

(a) Construct a 95% confidence interval on the mean mercury concentration after the accident.
(b) Is there sufficient evidence that the mean mercury concentration has changed since the accident? Use $\alpha = 5\%$.
(c) Calculate the p-value associated with this test.

(a) $\bar x \pm 1.96\,\sigma/\sqrt n = 1.406 \pm 1.96 \times 0.32/\sqrt{15} = 1.406 \pm 0.162 = (1.244,\ 1.568)$.

(b) $H_0: \mu = 1.20$ vs $H_a: \mu \ne 1.20$:

$$z_0 = \frac{\bar x - \mu_0}{\sigma/\sqrt n} = \frac{1.406 - 1.20}{0.32/\sqrt{15}} = 2.49 > z_{0.025} = 1.96,$$

hence we reject $H_0$ (consistent with the CI in (a), which lies entirely above 1.20).

(c) p-value $= P(|Z| \ge |z_0| \mid H_0) = 2 \times P(Z \ge 2.49) = 2 \times (0.5 - 0.4936) = 0.0128$.

### Population mean (small sample): t test

| | One-tailed test | Two-tailed test |
|---|---|---|
| Hypotheses | $H_0: \mu = \mu_0$ vs $H_a: \mu < \mu_0$ (or $\mu > \mu_0$) | $H_0: \mu = \mu_0$ vs $H_a: \mu \ne \mu_0$ |
| Test statistic | $t_0 = (\bar x - \mu_0)/(s/\sqrt n)$ | $t_0 = (\bar x - \mu_0)/(s/\sqrt n)$ |
| Rejection region | $t_0 < -t_\alpha$ (or $t_0 > t_\alpha$) | $t_0 < -t_{\alpha/2}$ or $t_0 > t_{\alpha/2}$ |

with $n - 1$ degrees of freedom; as before, reject $H_0$ when the p-value is below $\alpha$.

*[Figure: t distribution with the rejection region(s) for one-tailed and two-tailed tests.]*

**Example (copper mining).** If a new process for mining copper is to be put into full-time operation, it must produce an average of more than 50 tons of ore per day. A 15-day trial period gave the results shown in the table.

| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yield (tons) | 57.8 | 58.3 | 50.3 | 38.5 | 47.9 | 157.0 | 38.6 | 140.2 | 39.3 | 138.7 | 49.2 | 139.7 | 48.3 | 59.2 | 49.7 |

Here $n = 15$, $\bar x = 74.18$, $s = 44.18$ and $\alpha = 0.05$; $H_0: \mu = 50$ vs $H_a: \mu > 50$.

$$t_0 = \frac{\bar x - \mu_0}{s/\sqrt n} = \frac{74.18 - 50}{44.18/\sqrt{15}} = \frac{24.18}{11.41} = 2.120.$$

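
As a check on the arithmetic, this sketch recomputes $\bar x$, $s$ and $t_0$ from the 15 daily yields. The standard library has no t quantile function, so the critical value $t_{0.05,14} = 1.761$ is assumed from a t table; `t_statistic` is an illustrative helper.

```python
from statistics import mean, stdev

# Daily yields (tons of ore) from the 15-day trial of the new copper process
yields = [57.8, 58.3, 50.3, 38.5, 47.9, 157.0, 38.6, 140.2,
          39.3, 138.7, 49.2, 139.7, 48.3, 59.2, 49.7]


def t_statistic(sample, mu0):
    """One-sample t0 = (xbar - mu0) / (s / sqrt(n)); returns (t0, df) with df = n - 1."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / n ** 0.5), n - 1


t0, df = t_statistic(yields, mu0=50)
T_CRIT = 1.761  # t_{0.05,14} for an upper-tailed test, read from a t table
print(f"xbar = {mean(yields):.2f}, s = {stdev(yields):.2f}, t0 = {t0:.3f}, df = {df}")
print("reject H0: mu = 50 in favour of mu > 50" if t0 > T_CRIT else "fail to reject H0")
```

`statistics.stdev` uses the $n - 1$ divisor, matching the sample standard deviation $s$ used in the notes.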
Since $t_0 = 2.120 > t_{0.05,14} = 1.761$, we reject $H_0$. Equivalently, p-value $= P(t \ge t_0 \mid H_0) = P(t \ge 2.120)$ with $df = n - 1 = 14$, so $0.025 <$ p-value $< 0.05$ (exact value 0.0262); hence p-value $< \alpha$.

### Population proportion: z test

| | One-tailed test | Two-tailed test |
|---|---|---|
| Hypotheses | $H_0: p = p_0$ vs $H_a: p < p_0$ (or $p > p_0$) | $H_0: p = p_0$ vs $H_a: p \ne p_0$ |
| Test statistic | $z_0 = (\hat p - p_0)/\sigma_{\hat p}$ | $z_0 = (\hat p - p_0)/\sigma_{\hat p}$ |
| Rejection region | $z_0 < -z_\alpha$ (or $z_0 > z_\alpha$) | $z_0 < -z_{\alpha/2}$ or $z_0 > z_{\alpha/2}$ |

where $p_0$ is the hypothesized value of $p$, $\sigma_{\hat p} = \sqrt{p_0q_0/n}$ and $q_0 = 1 - p_0$. Conditions: $np_0 \ge 15$ and $n(1 - p_0) \ge 15$.

**Example (milk intolerance).** A dairy commissioned market research in its sales area. A random sample of 250 individuals showed that 86 of them suffer from milk intolerance. The marketing of this product will be convenient if at least 30% of individuals suffer from milk intolerance.

(a) Calculate a 90% confidence interval for the population proportion that suffers from milk intolerance.
(b) Evaluate the marketing hypothesis using $\alpha = 0.05$.

(a) $\hat p = 86/250 = 0.344$ and $\hat\sigma_{\hat p} = \sqrt{0.344 \times (1 - 0.344)/250} = 0.030$, so

$$CI(p) = \hat p \pm z_{\alpha/2}\,\hat\sigma_{\hat p} = 0.344 \pm 1.645 \times 0.030 = 0.344 \pm 0.049 = (0.295,\ 0.393).$$

(b) $H_0: p = 0.30$ vs $H_a: p > 0.30$, with $\sigma_{\hat p} = \sqrt{p_0q_0/n} = \sqrt{0.30 \times 0.70/250} = 0.029$:

$$z_0 = \frac{\hat p - p_0}{\sigma_{\hat p}} = \frac{0.344 - 0.30}{0.029} = 1.518.$$

Since $z_0 = 1.518 < z_{0.05} = 1.645$, we fail to reject $H_0$: at $\alpha = 0.05$ there is not enough evidence that the proportion exceeds 30% (p-value $= P(Z \ge z_0 \mid H_0) \approx 0.064$).

### Power of a test

Recall that a Type I error (probability $\alpha$) rejects a true $H_0$, and a Type II error (probability $\beta$, which depends on the true parameter value) fails to reject a false $H_0$.

**Power** $= 1 - \beta$: the probability that the test will correctly lead to the rejection of the null hypothesis for a particular value $\mu_a$ in the alternative hypothesis.

Illustration for a one-sided test ($H_a: \mu > \mu_0$):

$$\beta = P(Z \le z_\alpha \mid H_a), \qquad \text{Power} = 1 - \beta = P(Z > z_\alpha \mid H_a).$$

*[Figure: sampling distributions of $\bar x$ under $H_0$ and under $H_a$ for an upper-tailed test at $\alpha = 0.05$ (n = 25), showing $\alpha$, $\beta$ and the power.]*

Evaluate the power of this study considering that the
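true value of the alternative hypothesis is $\mu_a = 550$: $H_0: \mu = 520$ vs $H_a: \mu > 520$ bushels per acre, with $\bar x \sim N(520, 10^2)$ under $H_0$ and $\bar x \sim N(550, 10^2)$ under $H_a$ (standard error $\sigma_{\bar x} = 10$).

$$\begin{aligned}
\beta &= P(Z \le z_\alpha \mid H_a) = P(Z \le 1.645 \mid H_a) = P(\bar x \le \mu_0 + 1.645\,\sigma_{\bar x} \mid H_a) \\
&= P(\bar x \le 520 + 1.645 \times 10 \mid H_a) = P(\bar x \le 536.45 \mid H_a) \\
&= P\!\left(Z \le \frac{536.45 - 550}{10}\right) = P(Z \le -1.355) = 0.5 - 0.4121 = 0.0879,
\end{aligned}$$

so Power $= 1 - \beta = 0.9121$.

- For fixed $n$, $\mu_0$ and $\mu_a$, the power decreases as the value of $\alpha$ is decreased.
- For fixed $n$, $\alpha$, $\mu_0$ and $\mu_a$, the power increases as the population variance is decreased.
- For fixed $\alpha$, $\mu_0$ and $\mu_a$, the power increases as $n$ is increased.

(Interactive demonstration: W. H. Freeman statistical applets, "power of a test".)

**Sample size for a required power.** Suppose we want to detect an increase of 5 g/kg or more, i.e., a mean of 67 g/kg against $H_0: \mu = 62$ ($H_a: \mu > 62$), with a power of 80% and a confidence level of 95% (significance level $\alpha = 0.05$, power $1 - \beta = 0.80$). The calculation gives $n = 26.2028$, i.e., 27 observations after rounding up. SAS has implemented a procedure based on a t test, not a z test: PROC POWER (one-sample t test for a mean; fixed scenario elements: normal distribution, exact method, one-sided test, null mean 62, mean 67, $\alpha = 0.05$, nominal power 0.80) reports the computed total N.

**Remarks.** A meaningful effect should be determined, and a confidence interval reported, whenever possible. A nonsignificant test result does not imply no effect (i.e., that $H_0$ is true). Carrying out many tests inflates the overall Type I error rate (i.e., the experimentwise error rate).

### Population variance

The sample variance $s^2$ varies from sample to sample, just as the sample mean does. When $x \sim N(\mu, \sigma^2)$, the distribution of a multiple of $s^2$ is chi-square:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} \quad (df = n - 1).$$

Chi-square distributions:

- are positively skewed, with positive values over $(0, \infty)$;
- are indexed by their degrees of freedom (df);
- have mean $= df$ and variance $= 2 \times df$.

*[Figure: four chi-square distributions with different degrees of freedom.]*

| | One-tailed test | Two-tailed test |
|---|---|---|
| Hypotheses | $H_0: \sigma^2 = \sigma_0^2$ vs $H_a: \sigma^2 < \sigma_0^2$ (or $\sigma^2 > \sigma_0^2$) | $H_0: \sigma^2 = \sigma_0^2$ vs $H_a: \sigma^2 \ne \sigma_0^2$ |
| Test statistic | $\chi_0^2 = (n-1)s^2/\sigma_0^2$ | $\chi_0^2 = (n-1)s^2/\sigma_0^2$ |
| Rejection region | $\chi_0^2 < \chi^2_{1-\alpha}$ (or $\chi_0^2 > \chi^2_\alpha$) | $\chi_0^2 < \chi^2_{1-\alpha/2}$ or $\chi_0^2 > \chi^2_{\alpha/2}$ |

Conditions: a random sample from a normal population; the statistic has $n - 1$ df.

*[Table: critical values of the chi-square distribution, $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$, by degrees of freedom.]*
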
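
The chi-square statistic is simple enough to compute directly. This sketch uses the numbers of the container example that follows; `chi2_variance_statistic` is an illustrative helper, and the critical value $\chi^2_{0.05,29} = 42.557$ is assumed from a chi-square table because the standard library has no chi-square quantile function.

```python
def chi2_variance_statistic(s, sigma0, n):
    """Return chi2_0 = (n - 1) * s**2 / sigma0**2 and its degrees of freedom, n - 1."""
    return (n - 1) * s ** 2 / sigma0 ** 2, n - 1


# Container example: n = 30 containers, s = 4.433 g, nominal sigma0 = 4.0 g
chi2_0, df = chi2_variance_statistic(s=4.433, sigma0=4.0, n=30)

# Upper-tailed test (Ha: sigma^2 > sigma0^2) at alpha = 0.05.
# chi^2_{0.05,29} = 42.557 is read from a chi-square table, not computed here.
CHI2_CRIT = 42.557
print(f"chi2_0 = {chi2_0:.3f} on {df} df")
print("reject H0" if chi2_0 > CHI2_CRIT else "fail to reject H0")
```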
**Example (container filling).** A random sample of 30 containers was selected and will be used to determine whether the variability is maintained at its nominal value (nominal mean 500.0 g, nominal standard deviation 4.0 g). For these data the sample mean was 500.453 g and the standard deviation 4.433 g. Is there sufficient evidence to think that the variability is greater than its nominal value? Use $\alpha = 0.05$.

With $\mu = 500.0$, $\sigma_0 = 4.0$, $\bar x = 500.453$, $s = 4.433$ and $n = 30$, test $H_0: \sigma^2 = 4.0^2$ vs $H_a: \sigma^2 > 4.0^2$:

$$\chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(30-1) \times 4.433^2}{4.0^2} = 35.618.$$

The rejection region is $\chi_0^2 > \chi^2_{0.05,29} = 42.557$; since $35.618 < 42.557$, we fail to reject $H_0$: there is not sufficient evidence that the variability is greater than its nominal value.

**Confidence interval for $\sigma^2$ and $\sigma$.** Obtain the critical values from the chi-square distribution with $n - 1$ df. Step 4: compute the confidence interval for $\sigma^2$ based on the formula below. Step 5: obtain the confidence interval for the standard deviation $\sigma$ by taking square roots of
the bounds for $\sigma^2$:

$$(1-\alpha)100\%\ CI(\sigma^2) = \left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right).$$

## Confidence Intervals

A **point estimator** is a single number that can be used to estimate the population parameter. An **interval estimator** (or **confidence interval**) is a formula that tells us how to use sample data to calculate an interval that estimates a population parameter.

For the mean, $\bar x \sim N(\mu, \sigma^2/n)$, so $\mu \approx \bar x \pm z\,\sigma_{\bar x}$, where the standard error $\sigma_{\bar x} = \sigma/\sqrt n$ measures the uncertainty of this estimator. Notice that for a normally distributed random variable, the probability that it will fall within 2 standard deviations of its mean is about 0.95.

### Population mean

The **confidence coefficient** is the probability that a randomly selected confidence interval encloses the unknown population parameter, e.g., 90%, 95% or 99%.

$$(1-\alpha)100\%\ CI(\mu) = \bar x \pm z_{\alpha/2}\,\sigma_{\bar x} = \bar x \pm z_{\alpha/2}\frac{\sigma}{\sqrt n},$$

hence we are left with $100\alpha\%$ uncertainty about $\mu$.

*[Figure: standard normal distribution and normal distribution of $\bar x$, with central area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$.]*

**Example (caffeine).** The mean and standard deviation of the caffeine content in a sample of 50 cups were 100 mg and 7.1 mg, respectively. Assuming a normal distribution for the caffeine content, or using the CLT, we can be 95% sure that

$$\mu \in 100 \pm 1.96 \times \frac{7.1}{\sqrt{50}} = 100 \pm 1.97 = (98.03,\ 101.97).$$

| Confidence level | Interval |
|---|---|
| 90% | $\bar x \pm 1.645\,\sigma/\sqrt n$ |
| 95% | $\bar x \pm 1.960\,\sigma/\sqrt n$ |
| 99% | $\bar x \pm 2.576\,\sigma/\sqrt n$ |
| $(1-\alpha)100\%$ | $\bar x \pm z_{\alpha/2}\,\sigma/\sqrt n$ |

**Interpreting a 95% confidence interval for the mean.** What do you think about the following statements?

1. *The probability that the population mean is contained in this interval is 95%.* TRUE in the long-run sense: in repeated sampling, 95% of the intervals built this way will contain $\mu$ (any one computed interval either contains $\mu$ or does not).
2. *The interval can be used to give an indication of whether the sample mean is a precise estimate of the population mean.* TRUE: the width of the CI is related to the magnitude of the standard deviation.

**Example (tomatoes).** The average weight in a sample of boxes of tomatoes is 33.5 pounds per box, with a standard deviation of 2.1 pounds. Find the 90% confidence interval for the true average weight of the boxes of tomatoes.

With $\bar x \sim N(\mu, \sigma^2/n)$:

$$90\%\ CI(\mu) = \bar x \pm z_{0.05}\frac{\sigma}{\sqrt n} = 33.5 \pm 1.645 \times \frac{2.1}{\sqrt n} = 33.5 \pm 0.55 = (32.95,\ 34.05).$$

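
The z interval can be sketched in a few lines with `statistics.NormalDist`, which supplies $z_{\alpha/2}$ for any confidence level instead of the three tabulated values. `z_interval` is an illustrative name, and the formula assumes $\sigma$ is known (or $n$ is large), as in the caffeine example.

```python
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """(1 - alpha)100% CI for mu: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


# Caffeine example: xbar = 100 mg, sigma = 7.1 mg, n = 50 cups
for conf in (0.90, 0.95, 0.99):
    low, high = z_interval(100, 7.1, 50, conf)
    print(f"{conf:.0%} CI: ({low:.2f}, {high:.2f})")
```

The 95% line reproduces (98.03, 101.97); the 90% and 99% intervals show the width growing with the confidence level.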
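### Population proportion

If we know that $E(x_i) = \mu$ and $\operatorname{Var}(x_i) = \sigma^2$, then for the sample mean $E(\bar x) = \mu$ and $\operatorname{Var}(\bar x) = \sigma^2/n$.

For a binomial count, $x \sim \text{Bin}(n, p)$ with $E(x) = np$ and $\operatorname{Var}(x) = npq$. But for this variable, what is the meaning of the sample mean? Coding each trial as $x_i = 1$ (success) or $x_i = 0$ (failure),

$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{\text{number of successes in the sample}}{\text{sample size}} = \hat p,$$

and it can be demonstrated that

$$E(\hat p) = p, \qquad \operatorname{Var}(\hat p) = \sigma^2_{\hat p} = \frac{p(1-p)}{n}, \qquad \hat p \approx N\!\left(p,\ \frac{p(1-p)}{n}\right).$$

We can be $(1-\alpha)100\%$ confident that

$$p \in \hat p \pm z_{\alpha/2}\,\hat\sigma_{\hat p} = \hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p\,\hat q}{n}}, \qquad \text{where } \hat p = \frac{x}{n} \text{ and } \hat q = 1 - \hat p.$$

**Example (Gallup Poll).** In the *Gallup Poll Monthly* it was reported that 31% of the people surveyed in a recent poll claimed that vegetables were their least favorite food. The poll was based upon a sample of 1001 people. Assuming that a random sample was chosen, construct a 90% confidence interval for the percent of all Americans who say that vegetables are their least favorite food.

$\hat p = 0.31$; $n\hat p = 310.3$ and $n(1 - \hat p) = 690.7$, both large enough for the normal approximation.

$$\hat\sigma_{\hat p} = \sqrt{\frac{\hat p(1-\hat p)}{n}} = \sqrt{\frac{0.31 \times 0.69}{1001}} = 0.0146, \qquad 90\%\ CI(p) = 0.31 \pm 1.645 \times 0.0146 = 0.31 \pm 0.024 = (0.286,\ 0.334).$$

**Wilson's adjustment (small sample sizes).** Pretend you have 4 extra individuals, 2 successes and 2 failures:

$$\tilde p = \frac{x + 2}{n + 4}, \qquad CI(p) = \tilde p \pm z_{\alpha/2}\sqrt{\frac{\tilde p(1 - \tilde p)}{n + 4}}.$$

For the fish example, $x = 2$ and $n = 200$ give $\hat p = 0.01$, $n\hat p = 2$ and $n(1 - \hat p) = 198$: too few successes for the usual approximation. Then $\tilde p = (2 + 2)/(200 + 4) = 0.0196$ and $\sigma_{\tilde p} = \sqrt{0.0196 \times (1 - 0.0196)/204} = 0.0097$, so the 95% CI is $0.0196 \pm 1.96 \times 0.0097 = (0.0006,\ 0.0386)$.

**Sample size for estimating a mean.** To be within a certain sampling error $SE$ of $\mu$ with a level of confidence equal to $(1-\alpha)100\%$, we need to solve

$$SE = z_{\alpha/2}\frac{\sigma}{\sqrt n} \quad\Rightarrow\quad n = \frac{z_{\alpha/2}^2\,\sigma^2}{SE^2}.$$

Note: always round the calculated value of $n$ upwards, to be sure you don't have too small a sample.

**Example (lesions).** A sample yielded a mean number of lesions of 22 with a standard deviation of 3. How many observations will be needed to estimate the population mean to within ±1 lesion with 95% confidence? With $\alpha = 0.05$, $\sigma = 3$, $SE = \pm 1$ and $z_{0.025} = 1.96$:

$$n = \frac{z_{0.025}^2\,\sigma^2}{SE^2} = \frac{(1.96 \times 3)^2}{1^2} = 34.57,$$

so 35 observations are needed.

**Sample size for estimating a proportion.**

$$SE = z_{\alpha/2}\sqrt{\frac{pq}{n}} \quad\Rightarrow\quad n = \frac{z_{\alpha/2}^2\,pq}{SE^2},$$

where $SE$ is the desired level of error and $p$ could be (a) estimated from a prior study, or (b) fixed at $p = 0.5$, which gives the maximum variance. Again, round $n$ upwards.

**Example.** How large a sample is needed to estimate a proportion with a sampling error of $SE = \pm 0.002$ and a 90% degree of confidence, if the true proportion is approximately 0.01 ($p = 0.01$, $z_{0.05} = 1.645$)?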
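$$n = \frac{z_{0.05}^2\,pq}{SE^2} = \frac{1.645^2 \times 0.01 \times 0.99}{0.002^2} = 6697.4 \ \Rightarrow\ n = 6698.$$

Without a prior estimate, fixing $p = 0.5$ gives $n = 1.645^2 \times 0.50 \times 0.50 / 0.002^2 = 169{,}127$.

### Population mean: large and small samples

| | Large sample | Small sample |
|---|---|---|
| Sampling distribution of $\bar x$ | normal | normal, with $\sigma$ unknown |
| Confidence interval | $CI(\mu) = \bar x \pm z_{\alpha/2}\,\sigma/\sqrt n$ | $CI(\mu) = \bar x \pm t_{\alpha/2,\,n-1}\,s/\sqrt n$ |

**t distributions:**

- are indexed by degrees of freedom (df), the number of independent observations (deviations) comprising the estimated standard deviation;
- have heavier tails (more probability over extreme ranges) than the z distribution;
- converge to the z distribution as df gets large.

| df | ∞ | 64 | 32 | 16 | 8 | 4 | 2 |
|---|---|---|---|---|---|---|---|
| $t_{0.025,\,df}$ | 1.960 | 1.998 | 2.037 | 2.120 | 2.306 | 2.776 | 4.303 |

**Example (length of stay).** A sample of $n = 15$ records gave $\bar x = 6.4$ days and $s = 1.4$ days. With $t_{0.025,14} = 2.145$:

$$CI(\mu) = \bar x \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt n} = 6.4 \pm 2.145 \times \frac{1.4}{\sqrt{15}} = 6.4 \pm 0.78 = (5.62,\ 7.18).$$

**Sample size with the t distribution.** To be within a certain sampling error $SE$ of $\mu$ with a level of confidence equal to $(1-\alpha)100\%$, we need to solve

$$SE = t_{\alpha/2,\,n-1}\frac{s}{\sqrt n} \quad\Rightarrow\quad n = \frac{t_{\alpha/2,\,n-1}^2\,s^2}{SE^2},$$

but $t_{\alpha/2,\,n-1}$ also depends on $n$, so the equation is solved by trial; round up to be sure you don't have too small a sample.

**Example.** With a sample mean of 6.4 days and a sample standard deviation of 1.4 days, how many records will need to be collected in order to estimate the mean stay with an error of 1 day? Use $\alpha = 0.05$ ($s = 1.4$, $SE = \pm 1$).

- 1st try, $n = 10$ ($t_{0.025,9} = 2.262$): $n = (2.262 \times 1.4)^2/1^2 = 10.03$, slightly more than 10.
- 2nd try, $n = 11$ ($t_{0.025,10} = 2.228$): $n = (2.228 \times 1.4)^2/1^2 = 9.73$, below 11.

In practice about 10 records are needed; strictly, $n = 10$ gives an error of $2.262 \times 1.4/\sqrt{10} = 1.0014$ days, just above 1 day, so $n = 11$ guarantees an error of at most 1 day.

### Factors affecting the width of a confidence interval

- **Confidence level $(1-\alpha)$:** increasing $(1-\alpha)$ implies increasing the width of the interval; a narrower interval comes at a cost in confidence.
- **Sample size $n$:** increasing $n$ decreases the standard error of the estimate, the margin of error and the width of the interval; quadrupling $n$ halves the width.
- **Variability of the measurements ($\sigma$ or $s$):** the more variable the measurements, the wider the interval. Potential ways to reduce $\sigma$ (or $s$) are to focus on a more precise target population or to use a more precise measuring instrument. Often nothing can be done, as nature determines $\sigma$ (or $s$).

## Multiple Regression and Model Building

**Dummy (indicator) variables for a qualitative factor.** Assign a value of 0 to one category (the base level) and 1 to the other categories, using one indicator variable for each non-base category:

$$y = \beta_0 + \beta_1x_1 + \cdots + \beta_kx_k + \varepsilon.$$
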
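
The 0/1 coding can be tried on the calcium pot trial (data from Example Part I). This minimal sketch in plain Python builds the design matrix with A as the base level and solves the normal equations $X'X\beta = X'y$; `design_row` and `solve` are illustrative helpers (a small Gaussian elimination, not a library API). With this coding the estimates should equal $\bar y_A$ and the differences $\bar y_B - \bar y_A$, $\bar y_C - \bar y_A$, $\bar y_D - \bar y_A$.

```python
# Calcium pot trial (Example Part I): root lengths by treatment
data = {
    "A": [72, 57, 56, 68, 64],
    "B": [72, 66, 80, 75, 80],
    "C": [72, 76, 74, 70, 65],
    "D": [49, 45, 63, 39, 56],
}
LEVELS = ["A", "B", "C", "D"]  # A is the base level (all indicators 0)


def design_row(trt):
    """Row [1, x1, x2, x3] of the design matrix: x_j = 1 if trt is level B, C or D."""
    return [1.0] + [1.0 if trt == level else 0.0 for level in LEVELS[1:]]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return x


X = [design_row(t) for t in LEVELS for _ in data[t]]
y = [float(v) for t in LEVELS for v in data[t]]
p = len(X[0])

# Normal equations X'X beta = X'y
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
beta = solve(xtx, xty)

fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (len(y) - p)
print("estimates (b0, b1, b2, b3):", [round(b, 4) for b in beta])
print("MSE:", round(mse, 4))
```

The residual mean square equals the ANOVA MSE (47.05), which illustrates that the dummy-variable regression and the one-way ANOVA are the same model.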
The total root lengths obtained from each pot on a given date are shown below, together with the design (treatment letter in each pot):

| Row | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | D 49 | A 72 | D 45 | C 72 | C 76 |
| 2 | D 63 | A 57 | D 39 | B 72 | B 66 |
| 3 | B 80 | B 75 | C 74 | D 56 | B 80 |
| 4 | A 56 | C 70 | A 68 | C 65 | A 64 |

| Treatment | Root lengths | Mean |
|---|---|---|
| A | 72, 57, 56, 68, 64 | 63.4 |
| B | 72, 66, 80, 75, 80 | 74.6 |
| C | 72, 76, 74, 70, 65 | 71.4 |
| D | 49, 45, 63, 39, 56 | 50.4 |

The one-way model $y_{ij} = \mu_i + \varepsilon_{ij}$ can be written as a regression on dummy variables, $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$. Here $x_1 = 1$ if y is observed at level B (0 otherwise), $x_2 = 1$ if at level C, and $x_3 = 1$ if at level D. Hence $\mu_A = \beta_0$, $\mu_B - \mu_A = \beta_1$, $\mu_C - \mu_A = \beta_2$ and $\mu_D - \mu_A = \beta_3$. Note: treatment A is the base level.

The GLM procedure, least squares means for length:

| Level | length LSMEAN | Standard Error | Pr > \|t\| |
|---|---|---|---|
| A | 63.40 | 3.0676 | <.0001 |
| B | 74.60 | 3.0676 | <.0001 |
| C | 71.40 | 3.0676 | <.0001 |
| D | 50.40 | 3.0676 | <.0001 |

The REG procedure (`model length = x1 x2 x3; output out=resdata p=pred r=resid;`), parameter estimates:

| Variable | DF | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 63.40000 | 3.06757 | 20.67 | <.0001 |
| x1 | 1 | 11.20000 | 4.33820 | 2.58 | 0.0201 |
| x2 | 1 | 8.00000 | 4.33820 | 1.84 | 0.0838 |
| x3 | 1 | -13.00000 | 4.33820 | -3.00 | 0.0085 |

Models with quantitative and qualitative variables: regression with groups. When fitting the relationship between y and x, there may be several groups or treatments, and the relationship may not be the same in every group. The effect of groups is assessed through a parallel-model analysis. Slopes and intercepts are estimated separately for each group, and a series of hierarchical models is evaluated, from Model 0 (separate lines) to Model 1 (parallel lines). Note: this is a more powerful analysis than dividing the data set and fitting each line separately.

Example: Arabidopsis bolt height (y) against thermal time (x, the actual times of measurement) for k = 2 genotypes, A1 and A2. The dummy variable is $x_d = 1$ if y belongs to A2 and 0 otherwise. The data are read in SAS with `data arab; input x y Genotype $ xd xixd; datalines;`:

| x | y | Genotype | xd | xixd |
|---|---|---|---|---|
| 45 | 38 | A1 | 0 | 0 |
| 75 | 52 | A1 | 0 | 0 |
| 95 | 72 | A1 | 0 | 0 |
| 105 | 87 | A1 | 0 | 0 |
| 130 | 102 | A1 | 0 | 0 |
| 150 | 135 | A1 | 0 | 0 |
| 180 | 150 | A1 | 0 | 0 |
| 45 | 50 | A2 | 1 | 50 |
| 80 | 85 | A2 | 1 | 85 |
| [illegible] | 91 | A2 | 1 | 91 |
| 115 | 120 | A2 | 1 | 120 |
| 130 | 125 | A2 | 1 | 125 |
| 140 | 133 | A2 | 1 | 133 |
| 155 | 152 | A2 | 1 | 152 |
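Returning to the calcium pot trial, the following plain-Python sketch reproduces the dummy-variable regression. It codes A as the base level and builds the design matrix with x1, x2, x3 for B, C, D. It then solves the normal equations $X'X\,b = X'y$ by Gauss-Jordan elimination and takes the standard errors from $MSE \cdot (X'X)^{-1}$. It illustrates the least-squares arithmetic only and is not how SAS computes the fit; in practice a library such as NumPy would be used.

```python
import math

# Root lengths from the calcium pot trial (five replicates per level).
DATA = {"A": [72, 57, 56, 68, 64], "B": [72, 66, 80, 75, 80],
        "C": [72, 76, 74, 70, 65], "D": [49, 45, 63, 39, 56]}
DUMMY_LEVELS = ("B", "C", "D")  # A is the base level: all dummies are 0


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_dummy_regression(data):
    # Design matrix: intercept plus one 0/1 column per non-base level.
    X, y = [], []
    for level, values in data.items():
        for v in values:
            X.append([1.0] + [1.0 if level == d else 0.0 for d in DUMMY_LEVELS])
            y.append(float(v))
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mse = sse / (len(y) - p)  # error df = 20 - 4 = 16
    # Diagonal of (X'X)^-1, obtained one column at a time from unit vectors.
    diag = [solve(xtx, [1.0 if k == j else 0.0 for k in range(p)])[j]
            for j in range(p)]
    se = [math.sqrt(mse * d) for d in diag]
    return beta, se, mse


if __name__ == "__main__":
    beta, se, mse = fit_dummy_regression(DATA)
    for name, b, s in zip(("Intercept", "x1", "x2", "x3"), beta, se):
        print(f"{name:9s} {b:8.3f} {s:8.4f}")
    print("MSE", round(mse, 2))
```

The estimates come out as 63.4, 11.2, 8.0 and −13.0, with standard errors 3.0676 (intercept) and 4.3382 (each dummy) and MSE = 47.05 on 16 df. These agree with the REG output above.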
Model 0 (separate lines): $y_{A1} = \beta_{0A1} + \beta_{1A1}x + \varepsilon$ and $y_{A2} = \beta_{0A2} + \beta_{1A2}x + \varepsilon$, fitted jointly as

$$y = \beta_{0A1} + \delta_{0A2}\,x_d + \beta_{1A1}\,x + \delta_{1A2}\,x_d x + \varepsilon,$$

so that $\beta_{0A2} = \beta_{0A1} + \delta_{0A2}$ and $\beta_{1A2} = \beta_{1A1} + \delta_{1A2}$.

Model 1 (parallel lines): $y = \beta_{0A1} + \delta_{0A2}\,x_d + \beta_1 x + \varepsilon$. This gives a common slope $\beta_1$ with intercepts $\beta_{0A1}$ (genotype A1) and $\beta_{0A1} + \delta_{0A2}$ (genotype A2).

Model 0 is fitted with `proc reg data=arab; model y = xd x xixd; output out=resdata p=pred r=resid; run;`. For Model 1, the interaction term xixd is dropped from the model statement. The conclusion is that the relationship between Arabidopsis bolt height and thermal time is two parallel lines, one for each genotype, i.e. Model 1.

[Figure: fitted parallel lines for genotype A1 and genotype A2]

Screening a set of candidate independent variables to identify the most important ones (those with a significant impact on y) is known as stepwise regression.

Assumptions about the random error ε:

- The mean is equal to 0.
- The variance is equal to $\sigma^2$.
- The probability distribution is a normal distribution.
- Random errors are independent of one another.

If these assumptions are not valid, the results of the regression estimation are called into question. Checking the validity of the assumptions involves analyzing the residuals of the regression (see chapter 11 for more details).

Regression and ANOVA diagnostics: a pattern in the residual plot may indicate a problem with the model. Slight departures from normality will not seriously harm the validity of the estimates, i.e. regression analysis is robust to mild departures from normality.

Multicollinearity arises when explanatory variables are closely related to one another, for example through a mathematical function. Example: insect count prediction in a field based on temperature and [illegible]. Consequences:

- One or more of the parameters will have large variance.
- Parameter estimates will change considerably if a variable is added or dropped, or if small changes in the observations (even rounding errors) are introduced.

Detecting multicollinearity (collinearity):

- By inspecting the matrix of correlations between explanatory variables: variables with high correlations (> 0.80) will be potential candidates.
- Presence of a significant overall F-test in the ANOVA table in combination with all nonsignificant slope parameters.
- Signs of the $\beta$'s are opposite to those expected.
- Large standard errors for the $\beta$ parameters.
- Variance inflation factor: $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of determination from the regression of explanatory variable $j$ against all other explanatory variables. Values > 10 suggest a problem. Note: when the variables are orthogonal (i.e., independent), VIF = 1.

Example: the response variable is weight and the explanatory variables are diameter, moisture, hardness and length. Correlations:

| | weight | moisture | length | hardness | diameter |
|---|---|---|---|---|---|
| length | 0.883 | [illegible] | 1.000 | | |
| hardness | 0.207 | 0.112 | 0.124 | 1.000 | |
| diameter | 0.887 | 0.021 | 0.999 | 0.125 | 1.000 |

| Variable | $R_j^2$ | VIF |
|---|---|---|
| Diameter | 0.9983 | 602.87 |
| Length | 0.9983 | 602.75 |
| Moisture | 0.011 | 1.01 |
| Hardness | 0.0286 | 1.03 |

Diameter and length are almost perfectly correlated (r = 0.999), and both of their VIFs far exceed 10. The REG procedure gives $\widehat{weight} = -47.19 + 16.93\,length - 0.049\,hardness$, with a standard error of about 0.65 for the length coefficient. When diameter is added, the length coefficient becomes negative (roughly −40.6) and its standard error grows to about 15.5. Diameter enters with a coefficient of about 92 (standard error about 24.7, t = 3.72).

When two explanatory variables are highly correlated, it is difficult or impossible to estimate the effect of changing either one while holding the other constant. Remedies:

- Use some transformation to expand the range of the explanatory variables.
- Drop some variables from the model (loss of information).
- Implement other statistical techniques, such as ridge regression.

Comparing proportions (qualitative data), independent sampling. For the difference of sample proportions, $E(\hat p_1 - \hat p_2) = p_1 - p_2$ and the standard deviation of the difference is

$$\sigma_{\hat p_1 - \hat p_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \approx \sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}}.$$

For large samples the sampling distribution is approximately normal, $\hat p_1 - \hat p_2 \sim N(p_1 - p_2,\ \sigma^2_{\hat p_1 - \hat p_2})$. The large-sample $(1-\alpha)100\%$ confidence interval for $p_1 - p_2$ is

$$(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}}.$$
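The following Python sketch computes the large-sample interval for $p_1 - p_2$ just given, together with the pooled z-test for $H_0: p_1 = p_2$. It applies both to the mite-insecticide counts from the example that follows in these notes (45 of 50 females and 30 of 50 males dead). It uses only the standard library, with `statistics.NormalDist` (Python 3.8+) supplying the normal quantile and CDF. The function names are illustrative and do not come from any statistics package.

```python
import math
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, conf=0.90):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    diff = p1 - p2
    return diff - z * se, diff + z * se


def two_prop_ztest(x1, n1, x2, n2):
    """Pooled z statistic and two-sided p-value for H0: p1 = p2 (D0 = 0)."""
    p_pool = (x1 + x2) / (n1 + n2)
    se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z0 = (x1 / n1 - x2 / n2) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
    return z0, p_value


if __name__ == "__main__":
    # Females first, so the difference is p_f - p_m.
    print(two_prop_ci(45, 50, 30, 50, conf=0.90))
    print(two_prop_ztest(45, 50, 30, 50))
```

This gives an interval of about (0.166, 0.434) and $z_0 \approx 3.464$, matching the hand calculations. The sketch uses the exact quantile 1.6449 rather than the rounded 1.645.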
Large-sample test of hypothesis about $p_1 - p_2$:

| | One-tailed test | Two-tailed test |
|---|---|---|
| Hypotheses | $H_0: p_1 - p_2 = D_0$ vs $H_a: p_1 - p_2 > D_0$ (or $< D_0$) | $H_0: p_1 - p_2 = D_0$ vs $H_a: p_1 - p_2 \ne D_0$ |
| Rejection region | $z_0 > z_\alpha$ (or $z_0 < -z_\alpha$) | $\lvert z_0 \rvert > z_{\alpha/2}$ |

Test statistic: $z_0 = \dfrac{(\hat p_1 - \hat p_2) - D_0}{\sigma_{\hat p_1 - \hat p_2}}$. When $D_0 = 0$, the pooled proportion $\hat p = \dfrac{x_1 + x_2}{n_1 + n_2}$ is used, giving $z_0 = \dfrac{\hat p_1 - \hat p_2}{\sqrt{\hat p \hat q\,(1/n_1 + 1/n_2)}}$.

Example: a scientist decided to evaluate the survival of mites to the application of a new insecticide. Because males and females are known to react differently to this chemical, 50 individuals of each sex were selected. She placed each individual mite on a small piece of rug and applied the recommended dose of insecticide. After 48 hours she recorded the status of each insect (alive/dead). The summary table is shown below.

| | Alive | Dead | Total |
|---|---|---|---|
| Male | 20 | 30 | 50 |
| Female | 5 | 45 | 50 |
| Total | 25 | 75 | 100 |

(a) Obtain a 90% confidence interval for the population difference in the proportion of dead mites between females and males. With $\alpha = 0.10$, $\hat p_f = 45/50 = 0.90$ and $\hat p_m = 30/50 = 0.60$:

$$\hat\sigma_{\hat p_f - \hat p_m} = \sqrt{\frac{0.90 \times 0.10}{50} + \frac{0.60 \times 0.40}{50}} = 0.0812, \qquad (0.90 - 0.60) \pm 1.645 \times 0.0812 = 0.30 \pm 0.134 = (0.166,\ 0.434).$$

(b) Conduct a test to evaluate whether there is a significant difference in the proportion of dead mites between females and males, using $\alpha = 0.10$. The hypotheses are $H_0: p_f = p_m$ vs $H_a: p_f \ne p_m$, and the pooled proportion is $\hat p = (30 + 45)/100 = 0.75$:

$$\hat\sigma_{\hat p_f - \hat p_m} = \sqrt{0.75 \times 0.25 \left(\frac{1}{50} + \frac{1}{50}\right)} = 0.0866, \qquad z_0 = \frac{0.30}{0.0866} = 3.464.$$

Since $\lvert z_0 \rvert = 3.464 > z_{0.05} = 1.645$, we reject $H_0$.

Sample size. For a given level of confidence and a specified sampling error $SE$, it is possible to calculate the required sample size. Typically $n_1 = n_2 = n$. For two means, $\bar x_1 - \bar x_2 \sim N(\mu_1 - \mu_2,\ \sigma_1^2/n_1 + \sigma_2^2/n_2)$, so $SE = z_{\alpha/2}\sqrt{(\sigma_1^2 + \sigma_2^2)/n}$ and

$$n = n_1 = n_2 = \frac{z_{\alpha/2}^2\,(\sigma_1^2 + \sigma_2^2)}{SE^2}.$$

Estimates of $\sigma_1^2$ and $\sigma_2^2$ will be needed. For small sample sizes we use $t_{\alpha/2,\,n_1+n_2-2}$ and an iterative process.
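The iterative t-based calculation for two means can be sketched as a search for the smallest common group size n whose margin $t_{\alpha/2,\,2n-2}\sqrt{(\sigma_1^2+\sigma_2^2)/n}$ meets the target. The inputs below are those of the dairy-cattle example in these notes: σ ≈ 0.8 in both groups, a margin of 0.5 and 95% confidence. The sketch assumes hard-coded standard table values for $t_{0.025,df}$, because the standard library provides no t quantile. The function names are hypothetical.

```python
import math

Z_025 = 1.960
# t_{0.025, df} for the even df values 2n - 2 that the search visits.
T_025 = {34: 2.032, 36: 2.028, 38: 2.024, 40: 2.021, 42: 2.018, 44: 2.015}


def n_two_means_z(var1, var2, se, z=Z_025):
    """Large-sample n per group: z^2 (var1 + var2) / SE^2, rounded up."""
    return math.ceil(z ** 2 * (var1 + var2) / se ** 2)


def n_two_means_t(var1, var2, se, table=T_025):
    """Smallest n per group with t_{0.025, 2n-2} * sqrt((var1+var2)/n) <= se."""
    for df in sorted(table):
        n = df // 2 + 1            # df = n1 + n2 - 2 with n1 = n2 = n
        if table[df] * math.sqrt((var1 + var2) / n) <= se:
            return n
    raise ValueError("target not reached within the tabulated df")


if __name__ == "__main__":
    var = 0.8 ** 2                 # sigma approximately 0.8 in each group
    print(n_two_means_z(var, var, 0.5))
    print(n_two_means_t(var, var, 0.5))
```

Both results agree with the notes: 20 per group from the z formula, and 21 per group once the t correction is applied.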
Example: an experiment was done to determine the effect of a diet supplement on dairy cattle. Two groups were studied: standard diet and liquid whey diet. Suppose that we test for a difference in means and want a margin of error of at most 0.5 for a 95% confidence interval. Previous experimentation has shown $\sigma$ to be approximately 0.8 in each group. With $\alpha = 0.05$ and $SE = 0.5$:

$$n = n_1 = n_2 = \frac{1.96^2\,(0.8^2 + 0.8^2)}{0.5^2} = 19.67 \approx 20.$$

If we do this experiment, we might want to use a t distribution with $n_1 + n_2 - 2 = 38$ df. Then $t_{0.025,38} = 2.024$ and $n = 2.024^2 (0.8^2 + 0.8^2)/0.5^2 = 20.97 \approx 21$. With $2(21) - 2 = 40$ df, $t_{0.025,40} = 2.021$ gives $n = 20.91 \approx 21$ again, so 21 animals per group are needed. How about the power of the test?

For two proportions, $\hat p_1 - \hat p_2 \sim N(p_1 - p_2,\ \sigma^2_{\hat p_1 - \hat p_2})$ and

$$n = n_1 = n_2 = \frac{z_{\alpha/2}^2\,(p_1 q_1 + p_2 q_2)}{SE^2}.$$

Values of $p_1$ and $p_2$ will be needed, so use the conservative values $p_1 = p_2 = 0.50$. For small sample sizes we do not use t, but the general conditions $n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$ and $n_2 q_2 \ge 5$ must hold.

Comparing two variances. Because of the squaring involved, we use a different technique for inferences about variances. With independent sampling, the estimator is $s_1^2/s_2^2$, the observed ratio of sample variances. Under the assumption that we have normal data, a multiple of this estimator follows an F distribution:

$$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{df_1,\,df_2}, \qquad df_1 = n_1 - 1,\ df_2 = n_2 - 1.$$

Special case: if $\sigma_1^2 = \sigma_2^2$, then $s_1^2/s_2^2 \sim F_{n_1-1,\,n_2-1}$. The further the ratio is from 1, the lower the likelihood that both samples share the same population variance.

The F distribution:

- has range $(0, \infty)$ and cannot take on negative values;
- is nonsymmetric (skewed right);
- is indexed by two degrees of freedom, $df_1$ and $df_2$.

$(1-\alpha)100\%$ confidence interval for $\sigma_1^2/\sigma_2^2$:

1. Obtain the ratio of sample variances $s_1^2/s_2^2$. For convenience, the larger sample variance is put in the numerator.
2. For a given $\alpha$, obtain $F_U = F_{\alpha/2,\,n_2-1,\,n_1-1}$ and $F_L = 1/F_{\alpha/2,\,n_1-1,\,n_2-1}$, where $F_{\alpha/2,\,a,\,b}$ leaves $\alpha/2$ in the upper tail of the $F_{a,b}$ distribution.
3. Compute $\dfrac{s_1^2}{s_2^2}F_L \le \dfrac{\sigma_1^2}{\sigma_2^2} \le \dfrac{s_1^2}{s_2^2}F_U$.

The variances are judged unequal if the interval does not contain 1.

[Figure: F-distribution density]

Example: company officials are concerned about the
length of time a particular drug retains its potency. A random sample (sample 1) of 50 bottles of the product is drawn from the current production and analyzed for potency. A second sample (sample 2) of 50 bottles is obtained, stored for 1 year and then analyzed. The summary readings are $n_1 = 50$, $n_2 = 50$, $\bar x_1 = 10.37$, $\bar x_2 = 9.83$, $s_1 = 0.32$, $s_2 = 0.24$. Obtain a 95% confidence interval for the ratio of the population variances and use it to test their equality.

[Table: critical values of the F-distribution for $\alpha = 0.025$, by numerator and denominator degrees of freedom]

The variance ratio is $s_1^2/s_2^2 = 0.32^2/0.24^2 = 1.778$. With 49 and 49 df, $F_U = 1.762$ and $F_L = 1/1.762 = 0.567$, so

$$1.778/1.762 = 1.009 \le \frac{\sigma_1^2}{\sigma_2^2} \le 1.778 \times 1.762 = 3.133.$$

The hypotheses are $H_0: \sigma_1^2/\sigma_2^2 = 1$ vs $H_a: \sigma_1^2/\sigma_2^2 \ne 1$. The interval does not contain 1, so we reject $H_0$.

Example: in 2007 the ten fast-growing economies had an average GDP growth rate of 8.69% with a standard deviation of 1.70. The ten slow-growing economies had an average GDP growth rate of 2.29% with a standard deviation of 0.58. Do slow- and fast-growing economies have similar variability? Use $\alpha = 0.05$.

The hypotheses are $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_a: \sigma_1^2 \ne \sigma_2^2$, and $F = 1.70^2/0.58^2 = 8.59 > F_{0.025,9,9} = 4.03$. Hence we reject $H_0$.

Simple Linear Regression. Properties of the correlation coefficient:

- It is not affected by linear transformations of y or x, except that multiplying a variable by a negative constant reverses its sign.
- It does not distinguish between the dependent and the independent variable (e.g., height and weight).
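The two variance-ratio procedures above, the drug-potency interval and the GDP F test, can be sketched in a few lines of Python. The standard library has no F quantile function, so the critical values are passed in from a table. The sketch uses the values quoted in the notes: 1.762 for 49 and 49 df, and 4.03 for 9 and 9 df. The interval uses $F_L = 1/F_{\alpha/2,\,n_1-1,\,n_2-1}$ and $F_U = F_{\alpha/2,\,n_2-1,\,n_1-1}$, and the function names are illustrative.

```python
def var_ratio_ci(s1, s2, f_upper_12, f_upper_21):
    """CI for sigma1^2 / sigma2^2.

    f_upper_12 = F_{alpha/2, n1-1, n2-1}; f_upper_21 = F_{alpha/2, n2-1, n1-1}.
    """
    ratio = s1 ** 2 / s2 ** 2
    return ratio / f_upper_12, ratio * f_upper_21


def f_test_two_sided(s_large, s_small, f_crit):
    """Two-sided F test with the larger variance in the numerator.

    f_crit = F_{alpha/2, df_large, df_small}; returns (F, reject H0?).
    """
    f = s_large ** 2 / s_small ** 2
    return f, f > f_crit


if __name__ == "__main__":
    # Drug potency: s1 = 0.32, s2 = 0.24, n1 = n2 = 50, so 49 and 49 df.
    lo, hi = var_ratio_ci(0.32, 0.24, 1.762, 1.762)
    print(round(lo, 3), round(hi, 3), "contains 1:", lo <= 1 <= hi)
    # GDP growth: fast-growing s = 1.70, slow-growing s = 0.58, 9 and 9 df.
    print(f_test_two_sided(1.70, 0.58, 4.03))
```

This prints an interval of about (1.009, 3.132) that excludes 1, and F ≈ 8.59 with H0 rejected. Both conclusions match the notes; the small difference in the upper limit comes from the notes rounding the ratio to 1.778.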
Pearson's correlation coefficient:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}, \qquad -1 \le r \le 1.$$

Regression describes y through a deterministic component, based on one or more explanatory variables, plus random error. This is done by finding the line that comes closest to all the points in the scatter plot. Uses of regression: it helps to answer scientific questions about relationships between variables.

Model specification (form of the mathematical relationship): $y = \beta_0 + \beta_1 x + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$, where

- y is the response or dependent variable;
- x is the explanatory or independent variable;
- $\beta_0$ is the y-intercept and $\beta_1$ the slope, and both are parameters to be estimated;
- $\varepsilon$ is the random deviation (error or residual), caused by uncontrolled factors, measurement errors, missing variables in the model, rounding of numbers, etc.

$\beta_1 > 0$: positive association; $\beta_1 < 0$: negative association; $\beta_1 = 0$: no association.

The model is expressed statistically as $E(y \mid x)$, read as "the expectation of y given x". For the simple linear model, $E(y \mid x) = \beta_0 + \beta_1 x$, and y is assumed to follow a normal distribution. If we have no information on x, our best estimate of y is the mean of y, $\bar y$. If we have additional information from a correlated variable x, we can improve the estimate by using $E(y \mid x) = \beta_0 + \beta_1 x$, for which the distributions are much narrower.

Fitting the model by least squares (LS) minimizes the sum of squared deviations of the observed values from the straight line.

- Model: $y = \beta_0 + \beta_1 x + \varepsilon$; estimates: $\hat y = \hat\beta_0 + \hat\beta_1 x$; deviation: $y - \hat y$.
- $SSE = \sum (y_i - \hat y_i)^2 = \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$.
- $\hat\beta_1 = SS_{xy}/SS_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$, with $SS_{xx} = \sum x^2 - (\sum x)^2/n$, $SS_{xy} = \sum xy - (\sum x)(\sum y)/n$ and $SS_{yy} = \sum y^2 - (\sum y)^2/n$.
- $SSE = SS_{yy} - \hat\beta_1 SS_{xy}$ and $s^2 = MSE = SSE/(n - 2)$.

Example: Kernel Characterization System. The variables recorded were diameter, length, weight, moisture content and hardness of each seed. Weight is the response (y) and length the predictor (x), with model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.

[Figure: scatter plot of weight against length]

Summary statistics: $n = 190$, $\sum x = 626.08$, $\sum y = 5445$, $\sum x^2 = 2082.30$, $\sum y^2 = 163336$, $\sum xy = 18273.03$.
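The hand calculations for this example can be reproduced from the summary sums with a small Python function. This sketch works only from the sums listed for the kernel example (n, Σx, Σy, Σx², Σy², Σxy). It returns the least-squares estimates and the ANOVA quantities: SSR, SSE, MSE, F, r², adjusted r², r and the slope t statistic. The function name is illustrative.

```python
import math


def slr_from_sums(n, sx, sy, sxx, syy, sxy):
    """Simple linear regression by least squares from summary sums."""
    ss_xx = sxx - sx ** 2 / n
    ss_yy = syy - sy ** 2 / n
    ss_xy = sxy - sx * sy / n
    b1 = ss_xy / ss_xx
    b0 = (sy - b1 * sx) / n
    ssr = b1 * ss_xy          # regression sum of squares
    sse = ss_yy - ssr         # SSE = SS_yy - b1 * SS_xy
    mse = sse / (n - 2)
    return {
        "ss_xx": ss_xx, "ss_yy": ss_yy, "ss_xy": ss_xy,
        "b0": b0, "b1": b1, "ssr": ssr, "sse": sse, "mse": mse,
        "F": ssr / mse,                             # F with 1 and n - 2 df
        "r2": ssr / ss_yy,
        "r2_adj": 1 - mse / (ss_yy / (n - 1)),
        "r": math.copysign(math.sqrt(ssr / ss_yy), b1),
        "t_b1": b1 / math.sqrt(mse / ss_xx),        # t = b1 / (s / sqrt(SS_xx))
    }


if __name__ == "__main__":
    fit = slr_from_sums(190, 626.08, 5445, 2082.30, 163336, 18273.03)
    for key in ("ss_xx", "ss_xy", "b1", "b0", "mse", "F", "r2", "r"):
        print(key, round(fit[key], 3))
```

The printed values agree with the hand results up to rounding. Because the notes carry $SS_{xx} = 19.268$, they report $\hat\beta_1 = 17.173$ and $\hat\beta_0 = -27.931$; full precision gives about 17.174 and −27.932.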
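$SS_{xx} = 2082.30 - 626.08^2/190 = 19.268$

$SS_{yy} = 163336 - 5445^2/190 = 7293.763$

$SS_{xy} = 18273.03 - 626.08 \times 5445/190 = 330.895$

$\hat\beta_1 = SS_{xy}/SS_{xx} = 330.895/19.268 = 17.173$

$\hat\beta_0 = (\sum y - \hat\beta_1 \sum x)/n = (5445 - 17.173 \times 626.08)/190 = -27.931$

In SAS, residuals are checked by plotting studentized residuals against predicted values (`plot studres*pred / vref=0; run;`). The distribution of the residuals is then examined with `proc univariate data=resdata noprint; var studres; probplot studres / normal(mu=est sigma=est); histogram studres / normal; run;`.

[Figure: studentized residuals against predicted value of weight]

Assumptions of the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \dots, n$:

- Assumption 1: $E(\varepsilon_i) = 0$. The expected mean of the residuals is assumed to be zero.
- Assumption 2: $Var(\varepsilon_i) = \sigma^2$. The variance of each residual is equal to a constant value common to all residuals (homoscedasticity, or homogeneity of variances).
- Assumption 3: $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \ne j$. The residuals are independent.
- Assumption 4: the $x_i$ are nonstochastic. The explanatory variable x is measured without error.
- Assumption 5: $Cov(x_i, \varepsilon_i) = 0$.
- Assumption 6: $\varepsilon_i \sim N(0, \sigma^2)$. The residuals follow a normal distribution with mean 0 and variance $\sigma^2$.

A critical step in the evaluation of the model is to test whether $\beta_1 = 0$: $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \ne 0$.

[Figure: scatter plots showing positive association ($\beta_1 > 0$), no association ($\beta_1 = 0$) and negative association ($\beta_1 < 0$)]

The test statistic is $t = \hat\beta_1 / s_{\hat\beta_1}$, with $s_{\hat\beta_1} = s/\sqrt{SS_{xx}}$ and $s = \sqrt{MSE} = \sqrt{SSE/(n-2)}$. A $(1-\alpha)100\%$ confidence interval on $\beta_1$ is $\hat\beta_1 \pm t_{\alpha/2}\, s/\sqrt{SS_{xx}}$, where $t_{\alpha/2}$ is based on $n - 2$ degrees of freedom.

- One-tailed test, $H_a: \beta_1 > 0$ (or $< 0$): RR $t > t_\alpha$ (or $t < -t_\alpha$); p-value $= P(t_{n-2} \ge t_0)$ (or $P(t_{n-2} \le t_0)$).
- Two-tailed test, $H_a: \beta_1 \ne 0$: RR $\lvert t \rvert > t_{\alpha/2}$; p-value $= 2P(t_{n-2} \ge \lvert t_0 \rvert)$.

ANOVA table:

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Model | k − 1 | SSR | MSR = SSR/(k − 1) | $F_0$ = MSR/MSE |
| Error | n − k | SSE | MSE = SSE/(n − k) | |
| Total | n − 1 | SS Total | | |

Hence we can test whether the variability explained by the regression is significant, using an F distribution with k − 1 and n − k df for the numerator and denominator, respectively. The hypotheses are $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \ne 0$, with test statistic $F_0 = MSR/MSE$, RR $F_0 \ge F_{\alpha,\,k-1,\,n-k}$ and p-value $= P(F_{k-1,\,n-k} \ge F_0)$. Note: for SLR, k = 2.

Goodness-of-fit measures allow us to assess the quality of fit of a model, or to compare it against competing alternatives. The coefficient of determination measures the proportion of variability that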
is accounted for by the fitted statistical model:

$$r^2 = \frac{SSR}{SS_{Total}} = 1 - \frac{SSE}{SS_{Total}}.$$

The adjusted coefficient of determination, $r^2_{adj} = 1 - \dfrac{MSE}{SS_{Total}/(n-1)}$, guards against over-fitting. The mean square error is $MSE = SSE/(n - k) = s^2$.

Example (kernel data):

| Source | df | Sum of squares | Mean square | F |
|---|---|---|---|---|
| Regression | 1 | 5682.70 | 5682.70 | 663.1 |
| Residual | 188 | 1611.06 | 8.57 | |
| Total | 189 | 7293.76 | | |

The fit statistics are $r^2 = 5682.70/7293.76 = 0.779$, $r^2_{adj} = 1 - 8.57/(7293.76/189) = 0.778$ and $r = \operatorname{sign}(\hat\beta_1)\sqrt{r^2} = 0.883$.

$(1-\alpha)100\%$ confidence intervals for the regression line:

- get increasingly wider as we move away from the center;
- are narrower in the center, where neighbouring points provide more information;
- are wider at the ends, where information becomes more sparse.

Extrapolation must be done with care, because the same relationship may not hold elsewhere.

At level $x_p$, the mean response is $E(y \mid x_p) = \beta_0 + \beta_1 x_p$, and a new observation is $y(x_p) = \beta_0 + \beta_1 x_p + \varepsilon$.

- Confidence interval: estimates the mean of y for a specific value of x ($E(y \mid x)$ over many experiments with this x value).
- Prediction interval: estimates an individual value of y for a given x value (for a single experiment with this value of x).

Note: prediction intervals for individual new values of y are wider than confidence intervals on the mean of y, because of the extra source of error.

The estimated mean response and its standard error are obtained by replacing the unknown $\beta_0$ and $\beta_1$ with estimates:

$$\hat y = \hat\beta_0 + \hat\beta_1 x_p, \qquad SE_{\hat y} = s\sqrt{\frac{1}{n} + \frac{(x_p - \bar x)^2}{SS_{xx}}}, \qquad \text{CI: } \hat y \pm t_{\alpha/2,\,n-2}\, SE_{\hat y}.$$

The predicted future response has standard error

$$SE_{pred} = s\sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar x)^2}{SS_{xx}}}, \qquad \text{PI: } \hat y \pm t_{\alpha/2,\,n-2}\, SE_{pred}.$$
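This Python sketch computes the two intervals just defined, again from the kernel example's summary sums. The critical value is a parameter. The sketch uses 1.96 because the worked example in the notes uses it, although $t_{0.025,188} \approx 1.97$ would be slightly more exact. The function name is illustrative.

```python
import math


def mean_ci_and_pi(n, sx, sy, sxx, syy, sxy, xp, crit):
    """Fitted value at xp, CI for the mean response and PI for a new y."""
    ss_xx = sxx - sx ** 2 / n
    ss_xy = sxy - sx * sy / n
    ss_yy = syy - sy ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = (sy - b1 * sx) / n
    s = math.sqrt((ss_yy - b1 * ss_xy) / (n - 2))   # s = sqrt(MSE)
    xbar = sx / n
    yhat = b0 + b1 * xp
    lev = 1 / n + (xp - xbar) ** 2 / ss_xx
    half_ci = crit * s * math.sqrt(lev)        # mean response
    half_pi = crit * s * math.sqrt(1 + lev)    # individual future response
    return yhat, (yhat - half_ci, yhat + half_ci), (yhat - half_pi, yhat + half_pi)


if __name__ == "__main__":
    yhat, ci, pi = mean_ci_and_pi(190, 626.08, 5445, 2082.30, 163336, 18273.03,
                                  xp=2.5, crit=1.96)
    print(round(yhat, 2), [round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

The prediction interval, about 9.16 to 20.85, is much wider than the confidence interval for the mean, about 13.88 to 16.12. The difference reflects the extra error term in a single new observation.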
For the kernel data, $\hat y = -27.931 + 17.173x$, with $\bar x = 626.08/190 = 3.2952$ and $s = \sqrt{8.57} = 2.928$. At $x_p = 2.5$, $\hat y = -27.931 + 17.173 \times 2.5 = 15.00$. The value 1.96 is used as a large-sample approximation to $t_{0.025,188} \approx 1.97$.

A 95% confidence interval for the mean response is

$$15.00 \pm 1.96 \times 2.928 \times \sqrt{\frac{1}{190} + \frac{(2.5 - 3.2952)^2}{19.268}} = 15.00 \pm 1.12 = (13.88,\ 16.12).$$

A 95% prediction interval for a future response is

$$15.00 \pm 1.96 \times 2.928 \times \sqrt{1 + \frac{1}{190} + \frac{(2.5 - 3.2952)^2}{19.268}} = 15.00 \pm 5.85 = (9.15,\ 20.85).$$

Regression diagnostics for $y = \beta_0 + \beta_1 x + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$, are methods of detecting potential problems with a regression analysis and ANOVA. They are used to check that individual data values are not affecting the fit of the model. Types of regression diagnostic tools:

- analysis of residuals;
- influence statistics;
- procedures: graphical and statistical tests.

Types of regression residuals:

- The simple residual, $e = y - \hat y$, is the discrepancy between the data and the fitted component.
- Observations with extreme values of the explanatory variable are known as leverage points.
- An outlier in the residuals is an observation whose residual is more extreme than would be expected from random variation alone.

Reasons for outliers include a wrong theory and a small sample size. Outliers might affect the parameter estimates and might increase or decrease the MSE.

Dealing with outliers:

- Check original sources and correct erroneous values.
- Repeat the analysis with the offending points removed.
- Assess the effect on the final conclusion, parameter estimates, etc. If the results are markedly different, one may need to report both analyses.

Warning: do not discard outliers without careful consideration, as data are often expensive to obtain. An outlier might be the most important point in the study.

[Figure: studentized residuals against predicted values, and normal percentiles plot]

Complete example: $\log(Count) = \beta_0 + \beta_1 Dist + \varepsilon$, fitted with `proc reg data=s2;`
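`model logCount = Dist; plot logCount*Dist; output out=resdata p=pred student=studres; run;`

[Figures: log(Count) against Dist with the fitted line; residual diagnostics]

Chapter 4: Discrete Random Variables

A random variable is discrete if it can assume a countable number of values, and continuous if it can assume any value in a given interval of a number line. Discrete examples: number of sales, people in line, mistakes per page. Continuous examples: depth, volume, time, weight.

The probability distribution of a discrete random variable is a graph, table or formula that specifies the probability of each possible outcome the random variable can assume. It satisfies $p(x) \ge 0$ for all values of x and $\sum_x p(x) = 1$. Notation: X takes the values $x_1, x_2, \dots, x_k$ with probabilities $P(X = x_1) = p_1, \dots, P(X = x_k) = p_k$.

Example: two marbles are drawn, without replacement, from 2 blue marbles and 3 red marbles. The sample space is S = {B1B2, B1R1, B1R2, B1R3, B2R1, B2R2, B2R3, R1R2, R1R3, R2R3}. Define the random variable X as the number of red marbles drawn:

| x | 0 | 1 | 2 |
|---|---|---|---|
| p(x) | 0.10 | 0.60 | 0.30 |

For four tosses of a fair coin, P(0 heads) = P(TTTT) = 1/16 and P(1 head) = P(HTTT, THTT, TTHT or TTTH) = 4/16, and so on. The number of heads in 4 tosses is a binomial random variable.

The mean or expected value of a discrete random variable is $\mu = E(X) = \sum x\,p(x)$. Its variance is $\sigma^2 = E[(X - \mu)^2] = \sum (x - \mu)^2 p(x)$, and its standard deviation is $\sigma = \sqrt{\sigma^2}$. For the marbles:

$$\mu = 0(0.10) + 1(0.60) + 2(0.30) = 1.2, \qquad \sigma^2 = (0 - 1.2)^2(0.10) + (1 - 1.2)^2(0.60) + (2 - 1.2)^2(0.30) = 0.360, \qquad \sigma = 0.60.$$

| | Chebyshev's rule | Empirical rule |
|---|---|---|
| $P(\mu - \sigma < X < \mu + \sigma)$ | ≥ 0 | ≈ 0.68 |
| $P(\mu - 2\sigma < X < \mu + 2\sigma)$ | ≥ 0.75 | ≈ 0.95 |
| $P(\mu - 3\sigma < X < \mu + 3\sigma)$ | ≥ 0.89 | ≈ 0.997 |

Linear functions of random variables:

- For $W = a + bX$, where a and b are constants, $E(W) = a + b\mu$ and $V(W) = b^2\sigma^2$.
- For $W = a_1X_1 + \dots + a_nX_n$, where $E(X_i) = \mu_i$, $E(W) = a_1\mu_1 + \dots + a_n\mu_n$.

For two variables,

$$V(aX + bY) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\rho\,\sigma_X\sigma_Y,$$

where $\rho$ is the correlation between X and Y. If X and Y are independent, $\rho = 0$ and the last term drops out:

- $a = b = 1$, $\rho = 0$: $V(X + Y) = \sigma_X^2 + \sigma_Y^2$;
- $a = 1$, $b = -1$, $\rho = 0$: $V(X - Y) = \sigma_X^2 + \sigma_Y^2$;
- $a = b = 1$, $\rho \ne 0$: $V(X + Y) = \sigma_X^2 + \sigma_Y^2 + 2\rho\sigma_X\sigma_Y$;
- $a = 1$, $b = -1$, $\rho \ne 0$: $V(X - Y) = \sigma_X^2 + \sigma_Y^2 - 2\rho\sigma_X\sigma_Y$.

Binomial random variable, notation $X \sim Bin(n, p)$:

- n identical trials (e.g., flip a coin 3 times);
- trials are independent;
- two outcomes: success or failure;
- X = number of successes in n trials, $x = 0, 1, 2, \dots, n$;
- $P(S) = p$ and $P(F) = q = 1 - p$.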
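The marble example above can be checked by enumeration. This Python sketch lists all unordered pairs of the five marbles with `itertools.combinations`, which corresponds to drawing two without replacement as the sample space S implies. It then builds p(x) for X = number of red marbles and computes μ and σ² exactly with `fractions.Fraction`. The helper names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

MARBLES = ["B1", "B2", "R1", "R2", "R3"]


def red_count_pmf(marbles, draws=2):
    """p(x) for X = number of red marbles in an unordered draw."""
    outcomes = list(combinations(marbles, draws))     # equally likely
    counts = Counter(sum(m.startswith("R") for m in pair) for pair in outcomes)
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}


def mean_var(pmf):
    """mu = sum x p(x); sigma^2 = sum (x - mu)^2 p(x)."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var


if __name__ == "__main__":
    pmf = red_count_pmf(MARBLES)
    mu, var = mean_var(pmf)
    print({x: float(p) for x, p in pmf.items()}, float(mu), float(var))
```

The enumeration gives p(0) = 0.1, p(1) = 0.6 and p(2) = 0.3, with μ = 1.2 and σ² = 0.36, as in the hand calculation.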
In the coin example, a head on one flip doesn't affect the other flips, the two outcomes are heads and tails, and $P(S) = P(F) = 0.5$. For n = 4 trials:

| X | Events | Probability of each | Combined |
|---|---|---|---|
| 0 | TTTT | $q^4$ | $q^4$ |
| 1 | HTTT, THTT, TTHT, TTTH | $pq^3$ | $4pq^3$ |
| 2 | HHTT, HTHT, HTTH, THHT, THTH, TTHH | $p^2q^2$ | $6p^2q^2$ |
| 3 | THHH, HTHH, HHTH, HHHT | $p^3q$ | $4p^3q$ |
| 4 | HHHH | $p^4$ | $p^4$ |

In general, $P(X = x) = \binom{n}{x} p^x q^{n-x}$, where $\binom{n}{x}$ is the number of ways of getting the desired result. The mean is $\mu = np$, the variance is $\sigma^2 = npq$ and the standard deviation is $\sigma = \sqrt{npq}$.

For $X \sim Bin(4, 0.5)$, $P(X = x) = \binom{4}{x} 0.5^x\, 0.5^{4-x}$, so $P(2) = \binom{4}{2} 0.5^2\, 0.5^2 = 6/16 = 0.375$. The mean is $\mu = np = 4 \times 0.5 = 2$ and the variance is $\sigma^2 = npq = 4 \times 0.5 \times 0.5 = 1$. See also the Quincunx (Galton board) demonstration of the binomial distribution.

Example: a tree has probability 0.3 of having a disease and dying within a year.

(a) Define the random variable X. (b) If a buyer purchases 10 trees, what is the distribution of X? (c) Calculate $\mu$ and $\sigma$. (d) What is the probability that 2 trees will be infected? (e) What is the probability that at least 1 out of the 10 trees is infected?

- (a) X = number of trees infected.
- (b) $X \sim Bin(10, 0.3)$.
- (c) $\mu = np = 10 \times 0.3 = 3$; $\sigma = \sqrt{10 \times 0.3 \times 0.7} = 1.45$.
- (d) $P(X = 2) = \binom{10}{2} 0.3^2\, 0.7^8 = 45 \times 0.09 \times 0.05765 = 0.2335$.
- (e) $P(X \ge 1) = P(X = 1) + P(X = 2) + \dots + P(X = 10) = 1 - P(X = 0) = 1 - 0.0282 = 0.9718$.

Cumulative binomial probabilities $P(X \le k)$ for n = 10:

| k | p = 0.01 | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.95 | 0.99 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.904 | 0.599 | 0.349 | 0.107 | 0.028 | 0.006 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 1 | 0.996 | 0.914 | 0.736 | 0.376 | 0.149 | 0.046 | 0.011 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |

From the table, $P(X = 2) = P(X \le 2) - P(X \le 1) = 0.383 - 0.149 = 0.234$.
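This minimal Python sketch reproduces the binomial calculations for the tree example. It uses `math.comb` (Python 3.8+) for the binomial coefficient and returns both the formula answers and the cumulative-table values. The helper names are illustrative.

```python
import math


def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def binom_cdf(k, n, p):
    """P(X <= k), the quantity tabulated in cumulative binomial tables."""
    return sum(binom_pmf(x, n, p) for x in range(k + 1))


if __name__ == "__main__":
    n, p = 10, 0.3                       # X ~ Bin(10, 0.3): infected trees
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    print("mu", mu, "sigma", round(sigma, 3))
    print("P(X = 2)", round(binom_pmf(2, n, p), 4))
    print("P(X >= 1)", round(1 - binom_pmf(0, n, p), 4))
    print("table check", round(binom_cdf(2, n, p) - binom_cdf(1, n, p), 4))
```

It prints σ ≈ 1.449, P(X = 2) ≈ 0.2335 and P(X ≥ 1) ≈ 0.9718. The difference of the cumulative values reproduces P(X = 2) without the three-decimal rounding of the table.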
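Mean and variance of the binomial: for trial i, let $x_i = 1$ if the trial is a success and $x_i = 0$ otherwise. Then $P(x_i = 1) = p$ and $P(x_i = 0) = 1 - p$. Note that $X = x_1 + \dots + x_n$ and that the trials are independent. Since $E(x_i) = p$ and $V(x_i) = E(x_i^2) - [E(x_i)]^2 = p - p^2 = p(1 - p)$, it follows that $E(X) = np$ and $V(X) = \sigma_X^2 = np(1 - p) = npq$.

Poisson random variable: a probability distribution used to describe the number of events that will occur during a specific period, or in a specific area or volume. Examples:

- number of traffic accidents per month at an intersection;
- number of parts per million (ppm) of some toxin found in a sample of water;
- number of diseased trees per acre of a certain woodland.

$$p(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots,$$

where $\lambda$ is the mean number of occurrences in the given unit of time, area, volume, etc. The mean and variance are $\mu = \lambda$ and $\sigma^2 = \lambda$.

Example: ecologists often use the number of reported sightings of a rare species of animal to estimate the remaining population size. For example, suppose the number x of reported sightings per week of blue whales is recorded, with a mean of $\lambda = 2.6$ sightings per week. Then

$$P(x = 0) = \frac{2.6^0 e^{-2.6}}{0!} = 0.0743, \quad P(x = 1) = \frac{2.6^1 e^{-2.6}}{1!} = 0.1931, \quad P(x = 2) = \frac{2.6^2 e^{-2.6}}{2!} = 0.2510,$$

so $P(x \le 2) = 0.5184$ and $P(x > 2) = 1 - 0.5184 = 0.4816$.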
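The Poisson probabilities for the blue-whale example can be computed directly from $p(x) = \lambda^x e^{-\lambda}/x!$. This sketch uses λ = 2.6 sightings per week, the mean used in the example, and the helper names are illustrative.

```python
import math


def poisson_pmf(x, lam):
    """p(x) = lam^x e^(-lam) / x! for x = 0, 1, 2, ..."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def poisson_cdf(k, lam):
    """P(X <= k)."""
    return sum(poisson_pmf(x, lam) for x in range(k + 1))


if __name__ == "__main__":
    lam = 2.6
    for x in range(3):
        print(x, round(poisson_pmf(x, lam), 4))
    print("P(X <= 2)", round(poisson_cdf(2, lam), 4))
    print("P(X > 2)", round(1 - poisson_cdf(2, lam), 4))
```

It returns 0.0743, 0.1931 and 0.2510 for x = 0, 1, 2, with P(X ≤ 2) ≈ 0.5184 and P(X > 2) ≈ 0.4816.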
