Statistics Guide, PCB 4044, Fall 1999, p. 14

Stats Manual, Part 2: Statistical Tests

The following are a number of commonly used statistical tests; you can use some of these in your lab write-ups. This information will be most useful as you begin work on your independent projects. You are strongly encouraged to do statistical analyses in your independent projects, and you should also present your data in figures and/or tables.

t-Test

Let us assume that we have two samples of numbers that we know to come from normally distributed populations. In many cases we are interested in knowing whether or not the means of the two distributions are different. In other words, do we have sufficient evidence to reject the null hypothesis that the two samples were drawn from a single distribution? To answer this question we can compare the two samples using a t test. There are several types of t tests, each designed mathematically for a specific application. Here we will use a version of the test that assumes the variances of the two groups being examined are approximately equal. If this assumption is in gross error, we recommend that you use the nonparametric Mann-Whitney U test described below.

CALCULATIONS

1. Calculate the mean (Ym) and variance (s²) of each sample:

   Y1m = ΣY1i / n1
   Y2m = ΣY2i / n2
   s1² = Σ(Y1i − Y1m)² / (n1 − 1)
   s2² = Σ(Y2i − Y2m)² / (n2 − 1)

2. Calculate a pooled estimate of the standard deviation, sp:

   sp = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]

3. Calculate the t statistic:

   t = (Y1m − Y2m) / (sp √(1/n1 + 1/n2))

4. Calculate the degrees of freedom:

   df = n1 + n2 − 2

5. To test the null hypothesis that the population means are equal against the alternative that they are not equal, compare the calculated t value with a t table in a statistical book of tables for the appropriate degrees of freedom and a significance level of P < 0.05. Reject the null hypothesis if the absolute value of the t that you calculated is greater than the value in the table.

Rationale: We can always calculate an estimate of variance. It does not matter if we are
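The calculation steps above translate directly into code. The following is a minimal sketch (the function name and the toy numbers are our own illustration, not part of the course materials):

```python
import math

def pooled_t_test(sample1, sample2):
    """Two-sample t statistic assuming roughly equal variances (steps 1-4 above)."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    # step 1: sample variances
    s1sq = sum((y - m1) ** 2 for y in sample1) / (n1 - 1)
    s2sq = sum((y - m2) ** 2 for y in sample2) / (n2 - 1)
    # step 2: pooled standard deviation
    sp = math.sqrt(((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2))
    # steps 3-4: t statistic and degrees of freedom
    t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# toy data only (not the seed-count data from the example below)
t, df = pooled_t_test([470, 520, 430, 510, 480], [240, 260, 220, 250, 230])
```

Compare the returned t (or |t|, for a two-tailed test) with the tabled critical value for the returned df, exactly as in step 5.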
dealing with sets of observations or with sets of means of observations. Since our null hypothesis assumes that the samples were drawn from a common distribution, we can better estimate the variance of that distribution by pooling the data. In calculating the denominator of the t statistic, we are pooling the variance of the means. The t test itself can be phrased as follows: what is the likelihood of drawing two means as different as these from the same population of means? The greater the difference, the less likely it is that they come from the same population.

EXAMPLE

A scientist was interested in whether competition with neighboring plants affected the seed production of focal plants. She selected 28 focal plants and randomly assigned 14 to one treatment and 14 to another. The treatments were: (1) all neighboring plants within 1 m were weeded out; (2) neighboring plants were left in place, but the soil was disturbed (to control for the soil disturbance resulting from weeding). After the plants had flowered and set seed, she harvested the plants and counted the seeds produced by each plant. She expected plants grown in isolation to produce more seeds than plants grown in close vicinity to others (competition reduces plant size, and small plants produce fewer seeds). As a result, the statistical null hypothesis (Ho) was that isolated plants do not produce more seeds than plants grown at ambient density. This is a 1-tailed test, because we predict that the single plants will produce MORE seeds than those grown under ambient conditions.

1. Calculate the means and variances for the two samples.

   [Table of seed counts for the Isolated (Y1) and Ambient (Y2) treatments not reproduced here.]

2. Calculate the pooled standard deviation, using the sample variances from step 1:

   sp = √[ (13 s1² + 13 s2²) / 26 ] ≈ 267

3. Calculate the t value:

   t = (474 − 243) / (267 × √(1/14 + 1/14)) = 231 / (267 × 0.38) ≈ 2.3

4. Calculate the degrees of freedom:

   df = 14 + 14 − 2 = 26

5. Compare the calculated t value with that for 26 degrees of freedom in a stats table. Remember that this is a one-tailed hypothesis. Since
the calculated t (2.3) is greater than the table value (1.71), we reject the null hypothesis and conclude that the removal of neighboring plants increases the seed production of focal plants.

Paired t-test

The paired t test allows you to compare two related samples. It tests the null hypothesis that the means of these samples are equal against the alternative hypothesis that the means are not equal. More precisely, it tests the null hypothesis that the mean difference between paired samples is equal to 0 against the alternative hypothesis that the mean difference between pairs is not equal to 0. This test is commonly used in clinical trials of drugs, where two drugs are given to the same patient and the response of that patient is examined. It is "paired" in the sense that you are examining the responses within each patient; you are not concerned with the variance that occurs among patients. These paired designs are often more powerful because they remove the variation among individuals. For example, individuals vary in blood pressure, so by looking at the blood pressure of a patient under two drug regimes and taking the difference, we effectively remove the fact that this individual might have, on average, unusually high or low blood pressure.

For each individual that you examine, two different values are measured: the value under condition 1 (xi) and the value under condition 2 (yi). In our example above, these might be the blood pressure of a patient when given drug 1 and the blood pressure of the same patient when given drug 2 (in this design, the variance in blood pressure between individuals is not considered).

Procedure

i. For each individual, determine the value under condition 1 (xi) and under condition 2 (yi).

ii. Calculate the difference (di) between condition 1 and condition 2: di = xi − yi.

iii. Calculate dm, the mean of the di's: dm = Σdi / n, where n = total number of pairs sampled.

iv. Calculate the standard error of the mean difference, denoted se
(or sdm):

   se = sdm = √[ Σ(di − dm)² / ((n − 1) n) ]

v. Calculate the t statistic:

   t = dm / se

vi. The degrees of freedom are equal to one less than your sample size: df = n − 1.

Mann-Whitney U Test

If we have two distributions of numbers (2 groups) and we want to know whether the averages of the two groups differ, we can compare the median values using the Mann-Whitney U statistic. This nonparametric test compares the medians of two distributions. It is analogous to a t test that compares two means from a normal distribution.

Rationale: What we do is arrange our observations in a linear sequence and use ranks to indicate the overlap between the distributions. If there is little overlap, then we should reject the null hypothesis in favor of the alternative. For small samples you can even calculate the P-value by hand using basic probability theory. For example, let's say you have two groups of four observations each, and all four observations from one group are large in value while the four observations from the other group are all low in value. Hence, labeling each observation by its group, the combined ranking from smallest to largest is 1 1 1 1 2 2 2 2. This is extreme, but it makes the point. What's the probability of getting this ranking, or something more extreme, just by chance (i.e., under the null)? Well, this is fairly easy, because there is nothing more extreme. The probability of getting this particular arrangement under the null is (4 × 3 × 2 × 1) / (8 × 7 × 6 × 5) = 1/70, or 0.0143. Because this is so unlikely (P < 0.05), we reject the null. Note that because we have converted the observations to ranks, the scale of measurement becomes irrelevant. If the scale is in logarithms, or if the largest value is 100 times larger than the next largest value, those differences do not affect the ranks assigned to the observations. The ranks deal with the sequence of observations, not the magnitude of the differences (this is also one of the limitations of the approach).

Calculations

1. Order the combined sets of data in sequence from smallest to largest, and assign a rank to each observation. If two or more observations have the same value,
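The paired-sample procedure (steps i-vi above) is equally short in code. This sketch, with hypothetical blood-pressure-style readings, is our own illustration:

```python
import math

def paired_t(x, y):
    """Paired t statistic: mean difference divided by its standard error."""
    n = len(x)
    d = [xi - yi for xi, yi in zip(x, y)]    # step ii: differences
    dm = sum(d) / n                          # step iii: mean difference
    # step iv: standard error of the mean difference
    se = math.sqrt(sum((di - dm) ** 2 for di in d) / ((n - 1) * n))
    return dm / se, n - 1                    # steps v-vi: t and df

# hypothetical paired readings: the same patient under drug 1 and drug 2
t, df = paired_t([140, 150, 135, 160, 145], [132, 144, 130, 151, 140])
```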
then each receives the average of the tied ranks. For example, if two values are tied for ranks 5 and 6, then each observation is assigned the rank of 5.5.

2. Sum the ranks for each sample: R1 = the sum of ranks for the first sample, R2 = the sum of ranks for the second sample. The ranks are used in all subsequent calculations.

3. Calculate U1 and U2, where

   U1 = N1N2 + N1(N1 + 1)/2 − R1
   U2 = N1N2 + N2(N2 + 1)/2 − R2

   As a check, the following should be true: U1 + U2 = N1N2. The test statistic U is the larger of the two values U1 or U2.

4. To test the null hypothesis that the two medians are equal against the alternative that they are not equal, compare U to the values in a table of critical values for the Mann-Whitney test in a statistics book. If the calculated value is greater than or equal to the table value for the appropriate sample sizes, the null hypothesis is rejected at P = 0.05. Read the legend of the statistical tables carefully: the table may be one-tailed or two-tailed. In addition, some tables report the critical value for the smaller of the two U values, in which case a significant difference results when the observed U is smaller than the critical value.

Example

Say we were interested in whether salinity affects the reproduction of Florida flagfish, a fish that lives in water that varies in salinity. We set up pairs of fish in aquaria at two different salinities (0 ppt and 15 ppt), and we monitor the number of eggs produced by each pair of fish over a two-week period. We have 10 pairs of fish in each treatment group. We test the null hypothesis that the median production of eggs is the same for the two groups of fish.

1. Rank all of the observations in sequence and sum the ranks (R1 and R2) for each sample.

   [Table of egg production and ranks in the two treatment groups not reproduced here.]

   R1 = 127; R2 = 83. Note that the three 98's were tied for ranks 12, 13, and 14; each was therefore assigned a rank of 13.

2. Calculate U1 and U2:

   U1 = (10)(10) + 10(11)/2 − 127 = 28
   U2 = (10)(10) + 10(11)/2 − 83 = 72

3. Compare U = 72 with the tabulated value for
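The ranking-and-U recipe above (midranks for ties, U taken as the larger of U1 and U2) can be sketched as follows; the helper function is our own illustration:

```python
def mann_whitney_u(sample1, sample2):
    """Return the larger of U1, U2, using midranks for tied values."""
    pooled = sorted(sample1 + sample2)
    # assign each distinct value the average of the rank positions it occupies
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank_of[v] for v in sample1)
    r2 = sum(rank_of[v] for v in sample2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    assert u1 + u2 == n1 * n2   # the consistency check from step 3
    return max(u1, u2)
```

For the completely separated ranking discussed in the rationale (four small values versus four large ones), `mann_whitney_u([1, 2, 3, 4], [10, 11, 12, 13])` returns 16, the maximum possible U for two samples of four.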
sample sizes of 10 and 10. Since 72 is less than 77 (77 is the critical value of U associated with α = 0.05 and a two-tailed test from the table), we cannot reject the null hypothesis. Even though the estimated medians of these groups differed by approximately 10 eggs, this difference, or an even greater one, is likely to occur just due to sampling error more than 10% of the time.

Chi-Square

There are different types of chi-square tests, and this is a useful class of statistics for many kinds of contrasts that involve analyses of frequencies. We will illustrate one type: the chi-square test of independence. Suppose we have a set of observations that can be classified according to two different attributes. For example, suppose we have captured a large sample of frogs from a patch of forest. The frogs can be divided by species and by microhabitat (e.g., swamp, shrub, epiphyte, etc.). We may be interested in knowing whether there is any association between species and microhabitat. We can evaluate the independence of these factors using a contingency table and the χ² (chi-square) statistic. We ask whether species and microhabitat type have independent effects on frog abundance; i.e., are frogs of each species found in each habitat in proportion to the overall abundance of each species (across all habitats) and the overall number of frogs in each habitat (summed across all species)?

Calculations

1. Arrange the data in tabular form, where count_ij is the number of frogs of species i observed in microhabitat j:

                               First Factor (e.g., species)
                               Level A     Level B     Level C
   Second Factor    Level 1    count_A1    count_B1    count_C1
   (e.g.,           Level 2    count_A2    count_B2    count_C2
   microhabitat)    Level 3    count_A3    count_B3    count_C3

2. Calculate the sums of each row and column of the table.

3. Calculate the expected frequencies. The proportion of all observations
that fall into level A, for example, is just SUM_A/TOTAL, and the proportion of all observations that fall into level 1, for example, is just SUM_1/TOTAL. If the two factors are independent, then the probability of observing a specific combination of both factors is equal to the probability of the first event times the probability of the second event. This simple rule allows us to calculate expected frequencies if we use independence as our null hypothesis. The expected frequency of each of the 9 outcomes in our table can be calculated from the column and row totals. For example, let's predict, under the assumption of independence, the number of observations that should have occurred at Level A and Level 1. The probability of Level A is SUM_A/TOTAL; the probability of Level 1 is SUM_1/TOTAL. Hence the proportion of observations that should have fallen into this category is the product of the two, and the expected number of observations is obtained by multiplying this product by the TOTAL. Hence the expected number for cell ij is:

   Expected_ij = (SUM_i × SUM_j) / TOTAL

4. Compare observed and expected frequencies using the χ² statistic:

   χ² = Σ (observed − expected)² / expected

5. Determine the degrees of freedom for the test:

   df = (number of rows − 1) × (number of columns − 1)

6. Compare the calculated χ² with the table value for the appropriate degrees of freedom. Reject the null hypothesis if the calculated value exceeds the table value at α = 0.05.

Example

A fisheries biologist samples adult fish from two lake populations: 100 from Lake 1 and 150 from Lake 2. He records whether or not they are infested with a nematode parasite that encysts in their muscle. He wants to know whether the presence of the parasite is independent of the lake from which the fish were taken.

1. Arrange the data in tabular form.

2. Then calculate the sums for each row and column of the table:

   Table of observed values
                 parasites   no parasites   total
   Lake 1           15            85         100
   Lake 2           50           100         150
   total            65           185         250

3. Compute the table of expected values. For example, the expected value for the number of fish with parasites in Lake 1 is (100 × 65)/250 = 26; i.e., the row total × the column
total, divided by the grand total.

   Table of expected values
                 parasites   no parasites   total
   Lake 1           26            74         100
   Lake 2           39           111         150
   total            65           185         250

4. Compare the observed and expected frequencies using the χ² statistic:

   χ² = (15 − 26)²/26 + (85 − 74)²/74 + (50 − 39)²/39 + (100 − 111)²/111 = 10.5

5. Determine the degrees of freedom for the test: (2 rows − 1) × (2 columns − 1) = 1 df.

6. Compare the calculated χ² value (10.5) with the value for 1 degree of freedom from a stats table. Since our calculated value is greater than 3.84 (from the table), we can reject our null hypothesis that the presence of parasites in fish is independent of the lake.

Correlation (Parametric): Pearson's r

Pearson's correlation coefficient (r) is used when you want to see if two variables covary, i.e., if, when one variable increases or decreases, a second variable also changes in a predictable manner. For example, you could ask the question: does the weight of an animal's brain increase with its body weight? Correlation allows us to evaluate the degree to which two variables vary together; it does not allow us to evaluate causation. When a positive relationship exists, we will find a positive correlation (r > 0). When one variable increases while the other decreases, we will find a negative correlation (r < 0). When the two are independent, there will be no correlation (r = 0). Both the sign (+ or −) and the tendency for the two variables to change together are summarized by the "correlation coefficient" (see Figure 1), which varies from −1 to +1. This is an extremely useful statistic and one that is commonly used.

Calculations

You have two variables, Y1 and Y2, measured on a set of n objects (e.g., brain size and body mass measured across n different individuals). Then the correlation between Y1 and Y2 is

   r = Σ(Y1i − Y1m)(Y2i − Y2m) / √[ Σ(Y1i − Y1m)² × Σ(Y2i − Y2m)² ]
     = Σ y1i y2i / √( Σy1i² × Σy2i² )

where yi = Yi − Ym (the deviations from the means). You can find r by a series of calculations that are derived algebraically from the above equations:

1. Calculate the sums and the sums of squared observations for each variable, i.e., ΣY1, ΣY2, ΣY1², ΣY2².

2. Calculate the sum of the
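The expected-frequency recipe above fits in a few lines of code. Here is a sketch (the helper is our own), applied to the observed fish counts from the example:

```python
def chi_square_independence(observed):
    """Chi-square statistic and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # expected = row total x column total / grand total
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# rows = lakes, columns = (parasitized, unparasitized)
chi2, df = chi_square_independence([[15, 85], [50, 100]])
```

This reproduces the worked example: χ² ≈ 10.5 with 1 df, which exceeds the tabled 3.84.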
products of Y1 and Y2: ΣY1Y2.

3. Calculate Σy1², Σy2², and Σy1y2:

   Σy1² = ΣY1² − (ΣY1)²/N
   Σy2² = ΣY2² − (ΣY2)²/N
   Σy1y2 = ΣY1Y2 − (ΣY1)(ΣY2)/N

4. Calculate r:

   r = Σy1y2 / (Σy1² × Σy2²)^0.5

5. To test the null hypothesis that the true correlation is not different from zero, compare the calculated value with the tabulated r value for the appropriate sample size (the number of pairs of observations).

Example

In a field study of primate diets, say we estimated the average amount of food (kg) a juvenile monkey ate during a month and the weight that the animal gained over that month. We want to know if there is a relationship between the amount that the juvenile eats and the amount of weight it gains.

1. Calculate the sums of Y1, Y2, Y1², Y2², and the sum of the products Y1Y2:

   ΣY1 = 310.6; ΣY2 = 5281.0; ΣY1² = 14068; ΣY2² = 2407577; ΣY1Y2 = 150910; N = 12

2. Calculate Σy1², Σy2², and Σy1y2:

   Σy1² = 14068 − (310.6)(310.6)/12 = 6029
   Σy2² = 2407577 − (5281.0)(5281.0)/12 = 83497
   Σy1y2 = 150910 − (310.6)(5281.0)/12 = 14220

3. Calculate r:

   r = 14220 / √[(6029)(83497)] = 0.63

4. Compare the calculated r (0.63) with the critical value for a sample of 12 in a book of statistical tables. Since our calculated value of 0.63 is bigger than the table value of 0.58, we reject the null hypothesis and conclude that the more a juvenile primate eats, the more weight it gains.

A correlation coefficient tells us something of value in addition to the probability level with which it is associated. If you square the r value (r² = 0.63 × 0.63 = 0.40), that value is the proportion of the variation in the dependent variable that can be attributed to variation in the independent variable; r² is called the coefficient of determination. Thus in our example, 40% of the variation in the weight gain of juvenile primates can be attributed to how much food they eat. Presumably the remaining 60% of the variation could be attributed to other variables, such as how active the animal was, a genetic tendency to gain weight, etc.

Correlation (Nonparametric): Spearman's rho

If your data do not
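The computational sums in steps 1-4 for Pearson's r translate directly into code; this helper is our own sketch, shown with a toy perfectly linear relationship:

```python
import math

def pearson_r(y1, y2):
    """Pearson correlation via the computational formulas in step 3."""
    n = len(y1)
    s1, s2 = sum(y1), sum(y2)
    ss1 = sum(v * v for v in y1) - s1 * s1 / n             # corrected sum of squares, variable 1
    ss2 = sum(v * v for v in y2) - s2 * s2 / n             # corrected sum of squares, variable 2
    sp = sum(a * b for a, b in zip(y1, y2)) - s1 * s2 / n  # corrected sum of cross-products
    return sp / math.sqrt(ss1 * ss2)

# a perfectly linear positive relationship gives r = 1
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```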
meet the assumptions of a parametric correlation, you can compute a nonparametric equivalent very easily using basically the same technique. Take your data and rank the values within each of the two variables. Now repeat the procedures outlined for the Pearson correlation coefficient, but use the ranks instead of the actual values. This will give you what is called Spearman's rank correlation coefficient (rs), also sometimes called Spearman's rho. In general terms it is interpreted similarly to its parametric analogue, except that you cannot state how much variation in the dependent variable is explained by the other variable.

Calculations

You could use the equation for Pearson's correlation coefficient, substituting ranks. However, it's much easier to use another, equivalent expression for rs:

   rs = 1 − (6 Σdi²) / (n³ − n)

where n is the sample size (the number of paired observations) and di is the difference in ranks for the ith observation.

Example

As an example, let's use a hypothetical dataset on activity level and frequency of songs in males of a species of territorial songbird. Assume that we have hypothesized that more active males also call more often, i.e., that there's a positive correlation between activity and song frequency. This data set consists of matched pairs of measurements of the two dependent variables: (1) activity (flights/minute) and (2) song frequency (songs/minute). The data are based on medians from several one-hour samples for each bird.

   [Table of activity and song-frequency measurements and their ranks not reproduced here.]

To test the null hypothesis that the two variables are independent of one another:

1. Rank the measurements for each variable. If two or more measurements within a variable are equal, then assign to each of them the average of the ranks that would have been assigned had they not been equal.

2. Determine the difference between the ranks for each pair of measurements (di) and calculate the square of that difference (di²).

3. Sum the squared differences.

4. Calculate Spearman's correlation coefficient, rs. In
our example,

   rs = 1 − 6(6) / (7³ − 7) = 1 − 0.107 = 0.89

5. Compare the calculated rs (0.89) with the tabular value in the back of a stats book for n = 7 paired measurements. Note that this is a one-tailed test. Since our calculated rs (0.89) is larger than the tabular value (0.714), we reject the Ho of no correlation and conclude that there is a positive correlation between activity and song frequency.

ANALYSIS OF VARIANCE (ANOVA)

The analysis of variance is a test for differences among population means. There are many types of ANOVA, which comprise a large class of related statistical tools. A t test is a member of this class of statistics, but unlike a t test, ANOVA can compare two or more means. (When you have just two groups, the t test and ANOVA are identical.) Suppose, for example, that you wanted to know if the body weights of different populations of fruit bats differed. If you only had two populations you could use a t test: that test uses the within-group variances to estimate the likely distribution of differences between two means drawn from a single distribution with that variance. But with more than two groups we cannot simply compare the difference between the group means (this is only possible with two groups; how would you calculate the "difference" among the means of four groups?). Instead we require a different approach. As its name suggests, ANOVA uses the variance among group means to quantify the divergence among groups. In other words, given a particular within-group variance (i.e., the spread of observations among replicates within a treatment group), you could envision the analysis as (1) estimating the variation in means that would arise if you repeatedly sampled n observations from this distribution, and (2) comparing your observed variance among the means to this expectation. (The nonparametric analogue to ANOVA is the Kruskal-Wallis test; for details see a statistics book.)

In ANOVA the null hypothesis is always that all populations being compared have the same
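Ranking first and then applying the shortcut formula gives Spearman's rs. The sketch below is our own and assumes no tied values (with ties, use midranks and the Pearson formula on the ranks instead):

```python
def spearman_rho(x, y):
    """Spearman's rs = 1 - 6*sum(d^2)/(n^3 - n); assumes no tied values."""
    n = len(x)
    def ranks(values):
        order = sorted(range(n), key=lambda i: values[i])
        r = [0] * n
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r
    rx, ry = ranks(x), ranks(y)                        # step 1: rank each variable
    d_sq = sum((a - b) ** 2 for a, b in zip(rx, ry))   # steps 2-3: squared rank differences
    return 1 - 6 * d_sq / (n ** 3 - n)                 # step 4: the shortcut formula

# completely reversed orderings give the minimum possible value, -1
rho = spearman_rho([1.2, 3.4, 5.6], [9.9, 7.7, 5.5])
```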
mean. The alternative hypothesis is that at least one of the means is different. To discover which means are different, you use a statistical test called an a posteriori test, like a Scheffe or Duncan's test (see a stats book).

CALCULATIONS

The test statistic in ANOVA is an "F ratio," which is the ratio of two variances. In the ANOVA we will consider (a single-factor, or one-way, ANOVA), the F ratio is constructed as a measure of the variance among groups (times n) divided by the variance within groups. If there are no treatment effects, then the expected variance among group means is just the within-group variance divided by n (i.e., it's the square of the standard error of the mean). In other words, under the null hypothesis the numerator and denominator of this F ratio should be equal; hence the expected value of F under the null hypothesis is 1. A much greater value indicates departure from the null. The primary challenge, then, lies in calculating the F ratio for the data. In ANOVA jargon, the numerator and denominator of F are not called variances but "mean squares" (MS), which are derived from "sums of squares" (SS) and the degrees of freedom (df): MS = SS/df.

Below we provide the calculations that must be done to perform a one-way ANOVA. In reality, many of these statistical calculations can be done easily using a statistical package on a computer. However, it is important that you understand what the statistic is actually doing.

1. Calculate the sum of Y and the sum of squared Y's for each group. (I won't be showing all of the subscripts here, but technically each Y should be denoted Yij, i.e., the jth observation from group i, where there are c groups and n samples within each group; j is indexed from 1 to n, and i is indexed from 1 to c.)

2. Calculate the total sum, the total sum of squared Y's, and the total sample size. The total sum, ΣΣY, is the sum of all the observations combined; the total sum of squared observations, ΣΣY², is the sum of all the squared
observations; the total sample size is N = Σn, where n is the sample size within each group; and the number of treatments (i.e., groups being compared) is c.

3. Calculate the correction term, CT:

   CT = (ΣΣY)² / N

4. Calculate the total sum of squares, SStotal:

   SStotal = ΣΣY² − CT

5. Calculate the treatment sum of squares, SStreatment:

   SStreatment = Σ(Ti² / ni) − CT

   where Ti is the sum of the observations in group i and ni is the sample size of group i.

6. Calculate the error sum of squares, SSerror:

   SSerror = SStotal − SStreatment

7. Fill in your ANOVA table:

   Source of Variation               df      Sum of squares (SS)   Mean square (MS = SS/df)   F (ratio of two MS's)
   Among groups (i.e., treatments)   c − 1   SStreatment           SStreatment/(c − 1)        MSamong / MSwithin
   Within groups                     N − c   SSerror               SSerror/(N − c)
   Total                             N − 1   SStotal

8. Compare the calculated F value with the value in the statistical tables for c − 1 (numerator) and N − c (denominator) degrees of freedom. If the calculated F is larger than the one in the table, the null hypothesis (equal means) is rejected.

The ratio SStreatment/SStotal indicates the proportion of the total variability among observations that is explained by the treatments. This value is sometimes called the coefficient of determination and is given the symbol r² (yup, just like the square of the correlation coefficient, which has a similar interpretation).

EXAMPLE

Say we wanted to compare the body weights of three groups of fruit bats that we fed different types of tropical fruits. We have collected all of the data and made all of the measurements (to save space, we report the appropriate sums rather than all the data).

1. Calculate the appropriate sums for each group:

   Group sums: T1 = 413 (n1 = 25); T2 = 174 (n2 = 17); T3 = 53 (n3 = 12)

2. Calculate the total sum, the total sum of the squared observations, and the grand sample size:

   ΣΣY = 640; ΣΣY² = 13396; N = 54

3. Calculate the correction term:

   CT = (640)(640)/54 = 7585.19

4. Calculate the total sum of squares:

   SStotal = 13396 − 7585.19 = 5810.81

5. Calculate the treatment (among-groups) sum of squares:

   SStreatment = (413)²/25 + (174)²/17 + (53)²/12 − 7585.19 = 8837.78 − 7585.19 = 1252.59

6. Calculate the error (within-groups) sum of squares:

   SSerror = 5810.81 − 1252.59 = 4558.22

7. Complete the ANOVA table:

   Source of Variation               df      Sum of squares (SS)   Mean square (MS)   F
   Among groups (i.e., treatments)    2      1252.6                626.3              7.01
   Within groups                     51      4558.2                 89.4
   Total                             53      5810.8

8. Compare the value of the calculated F statistic with the critical value in a stats table with 2 and 51 degrees of freedom. In this case we can conclude that the populations are different: the observed F of 7.01 is greater than the critical value of 3.2 (the 3.2 is actually for df = 2 and 40, because the tables don't give all possible df's for large sample sizes).

9. We can also calculate the coefficient of determination:

   r² = SStreatment / SStotal = 1252.59 / 5810.81 = 0.22

Thus we can attribute 22% of the variation in body weights among bats to the diets that they were fed.
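The whole sums-of-squares recipe can be sketched in code. This helper is our own illustration (the toy groups below are not the fruit-bat data):

```python
def one_way_anova(groups):
    """One-way ANOVA F ratio via CT, SStotal, SStreatment, and SSerror."""
    all_obs = [y for g in groups for y in g]
    n_total = len(all_obs)
    ct = sum(all_obs) ** 2 / n_total                           # correction term
    ss_total = sum(y * y for y in all_obs) - ct                # total SS
    ss_treat = sum(sum(g) ** 2 / len(g) for g in groups) - ct  # among-groups SS
    ss_error = ss_total - ss_treat                             # within-groups SS
    df_among, df_within = len(groups) - 1, n_total - len(groups)
    f = (ss_treat / df_among) / (ss_error / df_within)
    return f, df_among, df_within

# two well-separated toy groups: almost all variation is among groups
f, df1, df2 = one_way_anova([[1, 2, 3], [7, 8, 9]])
```

With just two groups, the F value equals the square of the two-sample t statistic, consistent with the remark above that the t test and ANOVA coincide in that case.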