Class Note for ECOL 485 at UA
This 15 page Class Notes was uploaded by an elite notetaker on Friday February 6, 2015. The Class Notes belongs to a course at University of Arizona taught by a professor in Fall. Since its upload, it has received 19 views.
Date Created: 02/06/15
Statistical Analysis and Representation of Data

One of the most remarkable developments in recent years has been a tremendous increase in the availability of calculators, computers, terminals, and other devices as aids to analyses of numerical data. Still, many persons avoid learning methods for handling and interpreting numbers because the terminology looks strange or because they presume that they do not have an adequate mathematical background. In most instances these fears are unwarranted, because one can learn many techniques for analyses quickly and easily.

The purpose of this chapter is to provide an overview of statistics, the scientific analysis of numerical data. Learning how to collect, organize, analyze, and interpret numerical data is vitally important if one is to be an effective scientist and researcher. As biologists, we are increasingly involved with the collection and analysis of large quantities of data on environmental, morphological, and physiological variables. Statistics provides techniques for objectively identifying sources of variation, for comparing data sets, and for establishing confidence limits for various estimates.

Students are strongly urged to enroll in a statistics course or courses sometime during their tenure at college. There are many statistics books on the market, but the following deserve special comment. Steel and Torrie (1980), while technically good, is oriented toward applied and experimental agriculture. Three books that directly concern biologists include the introductory account by Simpson et al. (1960) and the more extensive treatments by Sokal and Rohlf (1969) and Zar (1974).

Data and Sampling Units

Like most scientific disciplines, statistics has special terminology to aid standardization in publications and clarity in communication between workers in the field. In many instances these words are familiar to you but may be defined more rigorously here.

Data (singular, datum) are observations or measurements taken on objects or entities termed sampling units. Sampling units could be individuals, leaves, soil samples, or other objects. A variable or character is the property being measured or observed on the sampling unit. For example, a mammalogist interested in the variable body weight (in grams) weighed ten rats (the sampling units), producing the following set of data: 105, 110, 90, 85, 120, 115, 110, 100, 90, 95.

Many people use the term statistics to refer to the observations or data. Statisticians restrict the usage of this term to apply only to the discipline or to a computed quantity such as the mean (arithmetic average).

29-A. In the following list, circle all words or statements representing variables and underline all those representing sampling units: Rat #421, Weight, Female 201, Row T14, Temperature, Glucose Level, Ear Length, Sex, Quadrat 1102.

Kinds of Variables

Sokal and Rohlf (1969:11-12) stated that variables may be classified into (1) those resulting from measurements (continuous or discrete), (2) counts based on attributes, and (3) those based on relative rankings. Discontinuous variables (meristic or discrete) have fixed values with no intermediate values possible, e.g., 1, 2, or 3 toes present, but never 2.4 toes present. Continuous variables can assume, theoretically, an infinite number of values between any two fixed points, e.g., between points 1 and 2 there exist 1.1, 1.112, 1.0113, etc., depending on the accuracy of the measuring instrument and the patience of the data recorder. An attribute is a qualitative measure or property, e.g., black, brown, or blue eyes; male or female; dead or alive. When attributes are combined with frequencies into tables of numbers, they are referred to as enumeration data. Ranked variables can be placed on an ordinal scale, but differences in ranks remain relative rather than absolute. For example, the widths of body stripes on a skunk might be ranked from 1 to 5, but a stripe in class 1 may not be five times narrower than a class 5 stripe. The letter grades that you receive in a
course are ranked values derived from measurement values, e.g., percentages.

29-B. Using the letters in parentheses, classify the following data as continuous (C), discrete (D), or rank (R) variables: 901 C Code 3 density 8 m 2302 grams 14 stems 8 bites 1 Class male 8 Largest 11422 mm 242 Kites 007 grams incisors 33

Accuracy and Precision

The researcher must strive for accuracy and precision in measurements. Accuracy is the nearness of a measured value to its true value. Precision is the nearness to one another of successive measurements of the same character of the same specimen. Both accuracy and precision depend on the skill of the person making the measurement. Accuracy, however, can also be lost by a measuring device that is incapable of measuring to the researcher's goal of accuracy, e.g., vernier calipers accurate to 0.1 mm cannot produce results accurate to 0.01 mm. Conversely, an improperly calibrated balance could produce precise successive measurements even though these measurements are inaccurate, e.g., the balance may show that animals weigh 0.5 grams less than their true weights.

29-C. To the right of each numerical observation listed, record the range of accuracy implied by the value. For example, for the value 0.1 the range of accuracy implied is 0.050 to 0.149. 5 1000 001 100 100 10 123 0001 105 206

Populations, Samples, Sampling

In statistics, a population consists of all values of a particular variable within a specified space or time. For practical and logistical reasons, it is usually impossible to record observations on all individuals in the population. Instead, the biologist works with randomly selected individuals from the population. A set of such individual observations is called a sample. The biologist then makes inferences about the entire population based on the available samples. Thus sample statistics, usually symbolized by Latin letters, are estimators of population parameters, usually symbolized by Greek letters. A population or universe may refer to all possible individuals or
objects (units) or may be restricted by a space or time limitation. For example, a population could consist of all human inhabitants of the world or of all males 15 years of age in New York City. It would be extremely difficult to devise a sampling scheme to estimate the population parameters of the entire world population. Most scientists and statisticians make estimates of populations that have been more narrowly defined.

Many statistical procedures assume that samples will be obtained in a random fashion, such that each member of a population has an equal and independent chance of being selected. Randomness in sampling might be achieved by assigning all individuals in a population a number and then drawing numbers representing the individuals out of a hat. Better still, a table of random digits (Table 29.1 in this text; Table 0 in Rohlf and Sokal 1969; Table D45 in Zar 1974) should be consulted for selecting samples. Suppose that we wish to sample plants in 15 of 64 possible quadrats in an 8 × 8 grid. Since each quadrat can be represented by an X and a Y coordinate, pairs of numbers can be selected in order from the table of random numbers. Prior to consulting the table, determine the direction that you will follow in the table, e.g., across the page, top to bottom, etc. Then select a starting point in the table, proceed in the predetermined direction, and write down pairs of numbers between 1 and 8, skipping any pair that has already been drawn. Fifteen of these unique pairs will then indicate the quadrats that should be sampled.

The number of sampling units, or the sample size (N), that is used to estimate the population parameters is a matter of prime importance in any statistical analysis. In many instances the biologist has no control over the sample size available for study. This frequently occurs in taxonomic studies when the scientist must work with existing collections of specimens (see Simpson et al. 1960:102-107). Ideally, the investigator should work with a sample size sufficient for

Table 29.1. Two thousand five hundred random
digits Brewer and Zar 197424 72965 25182 78812 87264 21571 98532 38981 11305 96753 28316 24390 23995 41920 78281 92910 29265 60422 42748 39611 74011 49056 06572 32726 13800 09838 86499 19618 04145 44083 13883 08697 86447 37914 08771 65529 53783 40881 81424 47362 79898 98433 79849 26004 46218 49618 66259 65170 82679 37900 27111 92280 09959 39100 75327 57796 11191 76006 19964 89989 27206 09214 32726 60706 15410 17017 89779 65242 43783 11320 92403 17282 13935 45220 03061 95794 23583 23145 26409 35657 40605 34971 56887 39110 12569 84747 03060 01466 81842 92940 44180 36491 94549 14598 40835 10730 25266 81485 72969 90316 31679 85318 91375 81576 92529 67813 63198 33931 22932 67869 32507 19493 41075 55864 26154 92704 95437 57037 94238 52913 54878 52320 69948 41600 28494 39792 61444 32406 44737 49215 76333 19204 61107 60363 06379 61160 00563 66439 17993 89774 49706 48288 69691 80743 82386 47690 88651 14727 04512 47434 71539 98478 97794 84683 25409 88705 79306 22225 62300 65743 96140 94975 32118 70343 70445 25210 51929 95091 97764 20490 91689 73306 12322 61236 09432 06406 72616 91793 98157 93131 56473 70701 63246 95348 51277 19575 21869 92600 63784 05283 58783 53653 11789 25043 91946 44746 56018 22898 11079 60701 61375 05200 50193 47466 52589 52576 04193 00014 64508 65353 83430 71393 63946 61238 27828 63833 75534 25582 64110 84147 20402 91759 84900 55701 95359 81584 78692 50163 14158 41815 27866 56065 88350 96498 88233 98709 41559 95878 39351 49461 47012 77220 43233 45287 14266 09566 68181 63815 95969 89649 58691 26558 25930 04204 20914 51712 00859 37716 32996 55722 75357 54675 62464 06810 38282 04909 70858 76743 68935 59510 20287 85329 85760 08181 92550 49541 50822 72615 94981 34454 16074 23839 51579 17447 45879 23055 85468 43878 41580 21521 90892 82969 46831 35345 77484 36769 45119 17317 87389 51773 20215 04697 47938 86339 58768 12407 83906 67499 05699 91650 57822 51712 60599 53263 29051 02571 54623 95890 21057 45967 05402 88229 67583 26259 11251 20520 10283 61939 66518 
46347 00939 45794 95387 18058 89353 37992 37401 76006 89006 72572 89032 70063 08737 46914 02759 39108 91387 68293 46263 26139 91170 63195 95633 31919 46171 00644 30625 90402 06536 77833 75247 00581 81670 07815 13433 58402 99661 09033 14994 93742 16617 62615 36717 49841 76533 91941 23499 18183 84956 02783 05149 62036 03708 36020 82759 82397 81331 93166 47888 37403 56904 58551 25992 71487 92164 47001 37257 06449 96780 39231 44290 41679 22271 83404 32657 81748 67175 08962 22619 05353 93486 11187 73097 56588 84405 62649 87146 08368 94235 71756 86101 25802 75897 88968 51923 43448 17883 20368 65372 27988 14090 07594 16441 57301 16691 62884 62733 20603 92753 90574 56473 20025 11296 10345 77842 62935 83610 91890 78124 72264 91396 24133 15628 48293 89750 04204 37774 88602 89725 05950 20481 73464 78553 29384 82969 86771 16775 62677 53722 09298 28192 28655 39169 83197 34450 91692 21908 51482 00578 86461 70080 36604 64848 50492 20680 63738 10999 76173 45323 22562 38246 83414 69195 48236 21600 10227 53138 49994 04120 17654 90173

the chosen statistical technique and aims of the investigator. Sample sizes numbering fewer than 6 or 8 individuals are rarely useful for statistical purposes, since the investigator can place little confidence in the statistics derived from such small numbers. Most statistics textbooks give guidelines for determining adequate sample sizes and discuss the value of larger sample sizes. A procedure for estimating adequate sample sizes is described in Sokal and Rohlf (1969:246-249).

29-D. Examine a table of random digits (e.g., Table 29.1). Select numbers to take 20 random samples from each of the following situations: (1) Location of vegetation plots along a 100-meter transect divided into 100 one-meter segments. (2) Locations for taking ten soil samples in a 100 × 100 meter grid divided into 100 10 × 10 meter quadrats. What was similar about the methods used to select the locations? What was different?

Frequency
Distributions

Inspection of the distribution of a variable is an important step in the analysis of data, since it frequently guides our decisions on what statistical analyses to utilize. A frequency distribution is prepared by listing all observed values and then noting how many times each value is observed. For example, we autopsy a sample of 29 female rats and wish to examine the frequencies of various numbers of embryos as an indication of potential litter size. A frequency table of the distribution of these data appears below (Table 29.2).

If the shape of a frequency distribution needs to be examined, a graphical technique is employed. For discrete, ranked, or attribute data, a bar graph (Fig. 29.1) should be prepared, keeping each vertical bar separate. For continuous data, a histogram (Fig. 29.2) is prepared with the vertical bars adjoining one another. Data of continuous variables may also be represented by a frequency polygon (Fig. 29.3), formed by a line that connects the plotted frequency points. A relative cumulative frequency polygon (Fig. 29.4) may be plotted when you wish to examine the contribution of particular values to overall totals.

Table 29.2. Frequency table of numbers of embryos in a sample of the cricetine rodent Holochilus sciureus berbicensis (Twigg 1965:21).

Number of Embryos    Frequency
        1                19
        2                94
        3               135
        4               639
        5                25
        6                10
        7                 3
        8                 1

Figure 29.1. Bar graph of the frequency data in Table 29.2. Axes: number of embryos (horizontal), frequency (vertical). (Mary Ann Cramer)

Figure 29.2. Histogram of body weights of the degu, Octodon degus. Horizontal axis: body weight (grams). (Mary Ann Cramer)

Figure 29.3. Frequency polygon of body weights of the degu, Octodon degus. Horizontal axis: body weight (grams). (Mary Ann Cramer)

Figure 29.4. Cumulative frequency polygon of body weights of the degu, Octodon degus. Axes: body weight in grams (horizontal), relative cumulative frequency (vertical). (Mary Ann Cramer)

Central Tendency and
Dispersion

In analyses of data, we generally wish to know something about central tendency (the localization of values near a central point) and dispersion (the scatter of these values from the central region). The sample mean ($\bar{X}$), the average of a set of N numbers, is one of the best estimates of central tendency and the best and most consistent estimator of the population mean ($\mu$):

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} \tag{29.1}$$

where $\sum$ denotes summation. Other measures of central tendency will be discussed in the section on Basic Statistics.

The range, the difference between the highest and lowest values, is a measure of dispersion or variability familiar to most persons. For statistical purposes it is a crude measure, since it frequently underestimates the range of the population. The sum of all deviations from the mean is equal to zero:

$$\sum_{i=1}^{N} (X_i - \bar{X}) = 0 \tag{29.2}$$

The population variance ($\sigma^2$, sigma squared) is defined as the mean sum of squares of the deviations from the population mean:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \tag{29.3}$$

The population (parametric) standard deviation ($\sigma$), the square root of the variance, is a useful measure since it is expressed in the same scale as the population values. The sample variance ($s^2$) is the best estimate of the population variance and is distinctly superior to the mean deviation for hypothesis testing (Zar 1974). Computational methods for calculating $s^2$ and its derivatives will be described in the section on Basic Statistics.

Basic Statistics

Descriptive statistics provide a numerical summary of the properties of an observed frequency distribution. Measurements made for computing these statistics are generally recorded in tabular form (Table 29.3). Standard descriptive statistics include sample size (N), degrees of freedom (generally N − 1), range, arithmetic mean ($\bar{X}$ or $\bar{Y}$), variance ($s^2$), standard deviation ($s$), standard error of the mean ($s_{\bar{X}}$), and the coefficient of variation (CV). The symbol $\sum$ indicates that a set of observations must be summed. By reference to Table 29.3, the mean for the character condylobasal length can
be computed as follows:

$$\bar{X} = \frac{\sum X}{N} = \frac{45.4 + 48.7 + \cdots + 43.9}{8} = \frac{384.0}{8} = 48.0$$

The range of values for character 1 of taxon B is 43.9 to 51.8.

Variance is a measure of dispersion of a set of data about the mean. The variance is expressed in squared units. The standard deviation is the square root of the variance and is expressed in the same units as the original observations (X). From data in Table 29.3, the values for variance and standard deviation are calculated using the computational formulas:

$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1} \tag{29.4}$$

$$s^2 = \frac{(45.4^2 + 48.7^2 + \cdots + 43.9^2) - \frac{(384.0)^2}{8}}{8 - 1} = \frac{18478.54 - 18432.00}{7} = 6.648$$

$$s = \sqrt{s^2} \tag{29.5}$$

$$s = \sqrt{6.648} = 2.578$$

The standard error of the mean ($s_{\bar{X}}$) estimates the standard deviation of the means of samples of size N. From the quantities above, $s_{\bar{X}}$ is computed as follows:

$$s_{\bar{X}} = \sqrt{\frac{s^2}{N}} = \sqrt{\frac{6.648}{8}} = 0.912 \tag{29.6}$$

The coefficient of variation is the standard deviation expressed as a percentage of the mean. This permits comparison of variation in data when the mean or standard deviation values are very different, e.g., the measurements of a horse can be compared with those of a small shrew. The CV is computed on the basis of the above data as follows:

$$CV = \frac{s}{\bar{X}} \times 100 = \frac{2.578}{48.0} \times 100 = 5.37 \tag{29.7}$$

One disadvantage of the coefficient of variation, as pointed out by Lewontin (1966) and Moriarty (1977), is the inability to perform exact statistical tests to compare CV values. Lewontin suggests transforming the measurements to logarithms (to any base) and then computing descriptive statistics on the characters, e.g., A and B. Then the ratio $s^2_{\log A} / s^2_{\log B}$ can be compared with an F distribution to test the magnitude of the difference, if any. Lande (1977) points out other precautions that must be observed when using coefficients of variation, e.g., one should not compare CVs based on discrete data.

Table 29.3. Cranial and bacular measurements and descriptive statistics based on them. Measurements are of a small sample of a single species at a single locality. Refer to text for explanation of symbols.

Identification
Number of          Condylobasal   Zygomatic   Bacular   Bacular
Individual (X_i)   Length         Breadth     Length    Width
A01                45.4           25.3        6.3       1.6
A02                48.7           25.8        -         -
A03                51.8           27.3        8.3       1.7
A04                49.3           25.5        8.4       2.0
A05                47.5           25.4        6.3       1.5
A06                47.1           24.6        -         -
A07                50.3           26.4        6.7       1.7
A08                43.9           23.8        7.4       1.7

Sample Statistics
N                  8              8           6         6
Sum of X           384.0          204.1       43.4      10.2
Mean               48.00          25.51       7.23      1.70
s^2                6.648          1.127       0.911     0.028
s                  2.578          1.062       0.954     0.167
s of mean          0.912          0.375       0.390     0.068
CV                 5.37           4.16        13.19     9.82
Mean ± t(s of mean) 48.0 ± 2.2    25.5 ± 0.9  7.2 ± 1.0 1.7 ± 0.2

To compute the confidence limits on these values, we utilize the t-distribution (see section on Two-Sample Comparisons). Thus, to compute the 95% confidence limits for the measurement condylobasal length, we must utilize the following values: degrees of freedom (df), t-statistic from the table ($t_{tab}$), standard error of the mean ($s_{\bar{X}}$), and the mean:

df = N − 1 = 7
$s_{\bar{X}}$ = 0.912
$t_{tab}$ (α = 0.05, df = 7) = 2.365
$\bar{X}$ = 48.0

The general formula for obtaining the confidence limits for the mean is the following:

$$\bar{X} - t_{\alpha,df}\, s_{\bar{X}} \le \mu \le \bar{X} + t_{\alpha,df}\, s_{\bar{X}} \tag{29.8}$$

Thus 48.0 − (2.365)(0.912) ≤ μ ≤ 48.0 + (2.365)(0.912), or in abbreviated form $\bar{X} \pm t_{\alpha,df}\, s_{\bar{X}}$, or 48.0 ± 2.16.

29-E. Verify this calculation and then substitute an α level of 0.01 to see how this changes the value ($\bar{X} \pm t\, s_{\bar{X}}$, or 48.0 ± 3.19).

29-F. Compute the mean, standard deviation, standard error of the mean, coefficient of variation, and confidence limits for the measurements of bacular width and bacular length in Table 29.3. Check your answers with the values in the table. Compare the values of coefficient of variation for all four measurements. What do these values indicate?

Descriptive statistics such as mean, range, standard deviation, and standard error can be presented in the form of Dice-Leraas diagrams or Dice-grams (Fig. 29.5). These diagrams are helpful to see overall patterns of variation but should not be used for extensive testing of differences between means (see section on Multiple Samples and Comparisons).

Probability Distributions

Probability and Binomial Distribution

Probability and probability distributions are important concepts for understanding many statistical procedures. Probability is the chance for the occurrence of a particular event
given the total number of possible outcomes of all events. For example, when a die (pl., dice) is thrown, one of six possible numbers may appear. Thus the probability (p) that the number 5 will appear is

$$p_{(5)} = \frac{1}{6} = 0.167$$

The probability (q) of any of the other numbers occurring, i.e., 1, 2, 3, 4, or 6, is 1 − 0.167 = 0.833. Probability values always range between 0 and 1.

Figure 29.5. Variation in measurements of two external characters and one cranial character between populations of the rodent Phyllotis andium. Descriptive statistics are indicated by modified Dice-Leraas diagrams. Each bar shows the mean (vertical line), twice the standard error of the mean (black rectangle), and standard deviation (black plus open rectangles). Sample sizes (N) are indicated in the parentheses above the diagrams. (Pearson 1958:488)

When a pair of dice is thrown, the probability of obtaining a pair of 5's is the product of the independent probabilities:

$$p_{(5,5)} = p_{(5)} \times p_{(5)} = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36} = 0.028$$

Further details on methods for computing probabilities can be found in textbooks such as Snedecor and Cochran (1967:199-202) and Sokal and Rohlf (1969:69-71).

The theoretical frequency distribution or probability distribution of events that can occur in two classes is known as the binomial distribution (Sokal and Rohlf 1969:181). Actual proportional data of a given sample size for two classes can then be compared with the theoretical distribution.

29-G. Compute probability values for the following situations: (1) Probability of obtaining two heads in two tosses
of a coin. (2) Probability of selecting, on one occasion, a male from a cage containing five male and 10 female rats. (Answers: (1) 0.25; (2) 0.33.)

Normal Distribution

Data that approximate a normal probability density function, or bell-shaped normal distribution (Fig. 29.6), are necessary for conducting most parametric kinds of statistical tests. Many kinds of biological data, such as lengths, weights, heights, and rates, conform reasonably well to these distributions (Brewer and Zar 1974). Data based on counts, frequencies, and percentages generally are not normally distributed, and thus nonparametric methods of data analysis must be utilized unless these data can be transformed into approximate normal distributions by the use of logarithms or square roots (Sokal and Rohlf 1969:380-388).

29-H. Examine Figure 29.6. What would happen to the shape of the normal distribution if the mean (μ) was 10 and the standard deviation (σ) 0.5? If μ was 10 and σ 0.15?

Figure 29.6. A normal distribution. These data are a hypothetical population of tree heights (X) with a mean (μ) of 11.72 m and a standard deviation (σ) of 1.16 m. The mean ± 1 standard deviation includes 68.3% of any normal curve, the mean ± 2 standard deviations encompasses 95.5%, and μ ± 3σ includes 99.7%. Axes: tree height in meters (horizontal), frequency of occurrence (vertical). (Brewer and Zar 1974)

A standardized normal distribution is one in which μ = 0 and σ = 1. Standard tables (e.g., Table 139 in Zar 1974; Appendix I in Simpson et al. 1960) enable one to determine proportions of normal distributions. Thus for a normal population with a mean μ and standard deviation σ, the expression

$$Z = \frac{X_i - \mu}{\sigma} \tag{29.9}$$

yields a Z-value, which indicates the number of standard deviations from the mean at which an X value is located. These Z-values are termed normal deviates or standard scores, and the calculation is referred to as normalizing or standardizing $X_i$ values. Since we rarely know the population mean μ and the standard
deviation σ, we must use the sample approximations $\bar{X}$ and $s$, respectively. However, for small samples these statistics are poor approximations of the population parameters (Zar 1974).

Testing Hypotheses

In statistics, a hypothesis is phrased very carefully and consists of two components, the null hypothesis and the alternative hypothesis. The null hypothesis, abbreviated $H_0$, is a statement that there is no difference, e.g., between sample groups, and is formulated for the purpose of being rejected. The alternative hypothesis, abbreviated $H_A$, is the operational statement or hypothesis that the researcher is testing. Thus, to test the assertion that differences exist between the mean body weights of two groups of rats, the hypotheses would be stated as follows:

$H_0$: Group A = Group B, or $H_0$: $\bar{X}_A = \bar{X}_B$
$H_A$: $\bar{X}_A \ne \bar{X}_B$

Once the null hypothesis ($H_0$) has been formulated, there must be an objective method for determining when to reject this hypothesis. First, let us examine the two types of errors that can be made when hypotheses are tested. A Type I error, whose probability is symbolized by α, is the rejection of $H_0$ when it is true, or accepting a difference when there is none. A Type II error, whose probability is symbolized by β, is the acceptance of $H_0$ when it is false, or failing to find a difference when there is one.

Since our primary goal in hypothesis testing is to reject the null hypothesis when it is false, we generally wish to keep the probability of a Type I error minimized to a stated α level. The larger the value of α, the more probable it is that the null hypothesis will be rejected falsely, i.e., that a Type I error will be committed. Remember that all probability values, whether α or β or some other parameter, range from 0 to 1. The levels of α and β are inversely related to each other and dependent on the sample size (N). Thus, to decrease the possibility of both types of error, N must be increased. The power of a statistical test is the probability of rejecting $H_0$ when it is false, or the probability of finding a difference when there is one. Stated in another form, the power
of a test is

$$\text{Power} = 1 - \text{probability of a Type II error} = 1 - \beta \tag{29.10}$$

Generally, the power of a test increases with an increase in the sample size N (Siegel 1956).

A test of significance evaluates the probability of rejecting the null hypothesis when it is true. The probability of making a Type I error (α), expressed as a percentage, is termed the significance level. For example, if α is 0.01, then the significance level is 1% for a given test. If we choose a 5% level of significance, then we expect that, when the null hypothesis is true, only 5 of 100 samples examined will result in a Type I error, i.e., rejection of a true null hypothesis. In many scientific disciplines, α-levels of 0.05, 0.01, and 0.001 are utilized for hypothesis testing. However, the choice of an appropriate α-level is somewhat arbitrary and will depend on the nature of the investigation and the degree of predictability required. Refer to Sokal and Rohlf (1969:153-166) for further discussion on the selection of appropriate α levels.

29-I. If the significance level was set at 1% (α = 0.01) rather than 5%, would you be more likely to make a Type I error at the 1% level if the sample size was smaller? How can you decrease the probability of making Type I and Type II errors?

Once the hypothesis has been formulated, an appropriate statistical procedure and test must be utilized. Data obtained from continuous variables are generally analyzed by parametric statistics, since these variables most nearly follow a normal distribution (see section on Probability Distributions). Data from enumeration, discontinuous, and ranked types are generally analyzed by nonparametric statistics, since no assumptions about the shape of the distributions are required to utilize these procedures. Large samples of enumeration and discontinuous data often have a nearly normal distribution and can be analyzed as if they were continuous. Ranked data, however, can never be analyzed with parametric statistics, since ranks are relative and cannot be multiplied or divided.

Two-Sample Comparisons

In biological problems we
frequently wish to know whether or not the means of two sample groups are significantly different from one another, e.g., $H_0$: $\bar{X}_A = \bar{X}_B$ vs. $H_A$: $\bar{X}_A \ne \bar{X}_B$. If the sample values for these two groups (1) follow a normal distribution, or nearly so, and (2) have sample variances that are not significantly different from one another, then parametric statistical procedures can generally be used for testing. If not, nonparametric statistics must be utilized.

One of the most common and useful tests for comparing sample means is Student's t-test, or simply the t-test. The t-distribution is like a normal distribution when sample sizes approach infinity, but the curve is more flattened for smaller sample sizes. Prior to utilizing the t-test or another parametric test for two samples, the homogeneity of variances (equality of variances) between the groups must be tested. For this purpose we use the following statistic:

$$F = \frac{s^2_{larger}}{s^2_{smaller}} \quad \text{for } df_{larger},\ df_{smaller} \tag{29.11}$$

Then, substituting the appropriate variances ($s^2$) from Table 29.5 into the formula, we obtain the following:

$$F = \frac{5.67}{2.00} = 2.84 \quad \text{with 9, 8 df (numerator, denominator)}$$

Since the $F_{tab}$ (α = 0.05) value for 9 numerator and 8 denominator degrees of freedom is 3.39, we accept the null hypothesis that the variances are equivalent and thus can proceed with testing the equality of the group means.

To calculate a t-statistic ($t_{cal}$), we utilize the same procedures that were employed for obtaining basic statistics, with these exceptions. A sum of squares (SS) is calculated for each group utilizing the basic formula for estimating variance ($s^2$), but omitting the step where the quantity is divided by N − 1. Thus, utilizing the values from Table 29.5, the sums of squares for the two groups are calculated as follows:

$$SS_A = \sum X_A^2 - \frac{(\sum X_A)^2}{N_A} = 23622.07 - \frac{(485.5)^2}{10} = 51.045$$

$$SS_B = \sum X_B^2 - \frac{(\sum X_B)^2}{N_B} = 20227.37 - \frac{(426.5)^2}{9} = 16.009 \tag{29.12}$$

To evaluate the means of the two groups, it is necessary to obtain an estimate of the pooled variance ($s^2_{pooled}$) and from this the
standard error of the pooled mean ($s_{\bar{X}_A - \bar{X}_B}$), as follows:

$$s^2_{pooled} = \frac{SS_A + SS_B}{df_A + df_B} = \frac{51.045 + 16.009}{9 + 8} = 3.94$$

$$s_{\bar{X}_A - \bar{X}_B} = \sqrt{s^2_{pooled}\left(\frac{1}{N_A} + \frac{1}{N_B}\right)} = 0.912 \qquad (29\text{-}14)$$

Then $s_{\bar{X}_A - \bar{X}_B}$ is substituted into the formula below to obtain $t_{cal}$:

$$t_{cal} = \frac{\bar{X}_A - \bar{X}_B}{s_{\bar{X}_A - \bar{X}_B}} = \frac{48.6 - 47.4}{0.912} = 1.32$$

Inspection of Table 29-4 reveals that the tabulated value of the t statistic for 17 df and $\alpha$ = 0.05 is 2.11. Thus, the null hypothesis of no difference in mean skull lengths of the male and female samples is accepted at the 5% level because $t_{cal} < t_{tab}$.

Table 29-4. Critical values of Student's t (Brewer and Zar 1974:10).

DF     α = 0.10   α = 0.05   α = 0.02   α = 0.01
1      6.31       12.71      31.82      63.66
2      2.92       4.30       6.96       9.92
3      2.35       3.18       4.54       5.84
4      2.13       2.78       3.75       4.60
5      2.01       2.57       3.36       4.03
6      1.94       2.45       3.14       3.71
7      1.89       2.36       3.00       3.50
8      1.86       2.31       2.90       3.36
9      1.83       2.26       2.82       3.25
10     1.81       2.23       2.76       3.17
11     1.80       2.20       2.72       3.11
12     1.78       2.18       2.68       3.06
13     1.77       2.16       2.65       3.01
14     1.76       2.14       2.62       2.98
15     1.75       2.13       2.60       2.95
16     1.75       2.12       2.58       2.92
17     1.74       2.11       2.57       2.90
18     1.73       2.10       2.55       2.88
19     1.73       2.09       2.54       2.86
20     1.72       2.09       2.53       2.85
22     1.72       2.07       2.51       2.82
24     1.71       2.06       2.49       2.80
26     1.71       2.06       2.48       2.78
28     1.70       2.05       2.47       2.76
30     1.70       2.04       2.46       2.75
35     1.69       2.03       2.44       2.72
40     1.68       2.02       2.42       2.70
45     1.68       2.01       2.41       2.69
50     1.68       2.01       2.40       2.68
60     1.67       2.00       2.39       2.66
70     1.67       1.99       2.38       2.65
80     1.66       1.99       2.37       2.64
90     1.66       1.99       2.37       2.63
100    1.66       1.98       2.36       2.63
120    1.66       1.98       2.36       2.62
150    1.66       1.98       2.35       2.61
200    1.65       1.97       2.35       2.61
300    1.65       1.97       2.34       2.59
500    1.65       1.96       2.33       2.59
∞      1.65       1.96       2.33       2.58

The above values were computed as described by Zar (1974:414). More extensive tables of Student's t are found in Rohlf and Sokal (1969:160-161) and Zar (1974:413-414).

Table 29-5. Skull measurements (condylobasal lengths) of two samples (males and females) of a single species from the same locality.

Males (A)                       Females (B)
Specimen No.   CB length (mm)   Specimen No.   CB length (mm)
068            51.4             059            48.2
064            51.6             05             48.9
087            51.4             073            49.0
056            49.8             009            47.3
048            46.7             062            46.2
071            46.4             061            45.2
053            49.1             064            47.7
065            47.2             057            48.5
072            45.6             052            45.7
054            46.3

Statistic                          Males (A)      Females (B)
N                                  10             9
df                                 9              8
Mean ($\bar{X}$)                   48.6           47.4
SS                                 51.045         16.009
$s^2$                              5.67           2.00
s                                  2.38           1.41
$s_{\bar{X}}$                      0.753          0.47
$\bar{X} \pm t_{.05} s_{\bar{X}}$  48.6 ± 1.70    47.4 ± 1.09

$t_{.05}$ (17 df) = 2.110

$t_{cal} = (\bar{X}_A - \bar{X}_B) / s_{\bar{X}_A - \bar{X}_B}$

Since $t_{tab} >$
$t_{cal}$ [(48.6 - 47.4)/0.912 = 1.32], the null hypothesis (i.e., $H_0: \bar{X}_A = \bar{X}_B$) is accepted.

A single-classification analysis of variance (ANOVA) can also be used to compare group means (Steel and Torrie 1960; Sokal and Rohlf 1969:218-219). For nonparametric data, the Mann-Whitney U test (Siegel 1956:116-127; Sokal and Rohlf 1969:392-394) and the Kolmogorov-Smirnov two-sample test (Siegel 1956:127-136) are appropriate. The latter test should be applied only to nonparametric data of a continuous variable (e.g., continuous data not meeting the assumptions of a normal distribution).

For nonparametric discrete data, the chi-square test for independent samples is frequently very useful. Suppose, for example, that we wish to compare the sex ratios in several litters (pooled) of a species of rodent. The null hypothesis to be tested states that half of the sample will be male and the other half female ($H_0$: P = 0.5). Upon examination of the litters we discover that 20 are male and 17 are female. The chi-square ($X^2$) statistic* is computed according to the following formula:

$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad (29\text{-}16)$$

where $O_{ij}$ = observed number of cases in the ith row (horizontal) of the jth column (vertical), $E_{ij}$ = number of cases expected under $H_0$ in the ith row and jth column, and $\sum_{i=1}^{r} \sum_{j=1}^{k}$ indicates to sum over all rows (r) and all columns (k), i.e., over all cells.

Since there are only two cells, there is only one k category, and the expected value is determined by multiplying N times the predicted probability of occurrence, i.e., 0.5. Thus, the expected value for each cell is 18.5 (0.5 × 37). Substituting the observed and expected values into the equation, the $X^2$ statistic is computed as follows:

$$X^2 = \frac{(20 - 18.5)^2}{18.5} + \frac{(17 - 18.5)^2}{18.5} = 0.1216 + 0.1216 = 0.243$$

In order to compare the calculated value of $X^2$ with the tabulated value, we must determine the number of degrees of freedom and then utilize Table 29-6. The general formula for determining the degrees of freedom is (r - 1)(k - 1). Since k = 1 in the present example, the appropriate degrees of freedom is r - 1, or 2 - 1 = 1. Thus, the tabulated value of chi-square for an
$\alpha$-level of 0.05 is 3.84. Since $X^2_{cal} < X^2_{tab}$, we accept the null hypothesis that the sex ratio does not differ significantly from a 1:1 ratio.

There are special formulas for calculating $X^2$ values when data are arranged in 2 × 2 contingency tables and when the expected frequencies must be calculated from the marginal totals. Refer to Siegel (1956:42-47, 104-111, and 175-179) or Sokal and Rohlf (1969:549-620) for additional information.

* See Sokal and Rohlf (1969:553) for the rationale behind using the symbol $X^2$ rather than $\chi^2$ for this quantity.

Table 29-6. Critical values of chi-square (Brewer and Zar 1974:15-16).

DF     α = 0.10   α = 0.05   α = 0.025   α = 0.01
1      2.706      3.841      5.024       6.635
2      4.605      5.991      7.378       9.210
3      6.251      7.815      9.348       11.345
4      7.779      9.488      11.143      13.277
5      9.236      11.070     12.833      15.086
6      10.645     12.592     14.449      16.812
7      12.017     14.067     16.013      18.475
8      13.362     15.507     17.535      20.090
9      14.684     16.919     19.023      21.666
10     15.987     18.307     20.483      23.209
11     17.275     19.675     21.920      24.725
12     18.549     21.026     23.337      26.217
13     19.812     22.362     24.736      27.688
14     21.064     23.685     26.119      29.141
15     22.307     24.996     27.488      30.578
16     23.542     26.296     28.845      32.000
17     24.769     27.587     30.191      33.409
18     25.989     28.869     31.526      34.805
19     27.204     30.144     32.852      36.191
20     28.412     31.410     34.170      37.566
21     29.615     32.671     35.479      38.932
22     30.813     33.924     36.781      40.289
23     32.007     35.172     38.076      41.638
24     33.196     36.415     39.364      42.980
25     34.382     37.652     40.646      44.314
26     35.563     38.885     41.923      45.642
27     36.741     40.113     43.195      46.963
28     37.916     41.337     44.461      48.278
29     39.087     42.557     45.722      49.588
30     40.256     43.773     46.979      50.892
31     41.422     44.985     48.232      52.191
32     42.585     46.194     49.480      53.486
33     43.745     47.400     50.725      54.776
34     44.903     48.602     51.966      56.061
35     46.059     49.802     53.203      57.342
36     47.212     50.998     54.437      58.619
37     48.363     52.192     55.668      59.893
38     49.513     53.384     56.896      61.162
39     50.660     54.572     58.120      62.428
40     51.805     55.758     59.342      63.691

The above values were computed as described by Zar (1974:411). More extensive tables of chi-square are found in Rohlf and Sokal (1969:164-167) and Zar (1974:409-410).

Multiple Samples and Comparisons

Several statistics can be used to test for the significance of differences between the means of samples. An analysis of variance (ANOVA) not only gives an indication of differences between means but also provides a measure of variation within samples. Sokal and Rohlf (1969:173-388), Steel and Torrie (1960:99-160), and Zar (1974) give extensive accounts of the use of ANOVAs.

A t-test is inappropriate for making multiple paired comparisons of means (Sokal 1965). When more than two samples are involved, an ANOVA can be used to test for overall difference between the means, although significant differences between pairs of means cannot be established. A posteriori tests, such as the sum of squares simultaneous test procedures (SS-STP) described in Sokal and Rohlf (1969), permit determination of homogeneous subsets of means within the total collection of means (e.g., between means from different geographic localities). The Student-Newman-Keuls (SNK) test (Sokal and Rohlf 1969) and Duncan's multiple range test (Steel and Torrie 1960) have also been used to test multiple means. Some researchers believe that the last two tests are more useful since the experimental error rate is not altered. In contrast, the SS-STP procedure generates more possible answers than can be realistically evaluated.

Covariate Analysis

Correlation and regression are techniques of covariate analysis. Correlation analysis is an investigation of the degree of association between pairs of variables (e.g., forelimb length versus hind limb length). Correlation analysis estimates the strength of the relationship between variables but implies no cause and effect relationship between the two. Regression analysis seeks to estimate the dependence of one variable (Y) on another, independent (X) variable (Fig. 29-7). Such a relationship, expressed mathematically, is generally written as a function termed the regression equation, such as Y = f(X), where the magnitude of a given Y is dependent upon the value of a given X. The slope of a regression line is termed the regression coefficient. Regression analysis can be used in studies
of differential growth (allometry) of body parts or regions. Differences between regression coefficients can be tested using a t-test or analysis of covariance (Steel and Torrie 1960; Sokal and Rohlf 1969).

A correlation coefficient (r) is a measure of interrelation between two variables, independent of the scale of measurement. The most commonly used correlation coefficient is Pearson's product-moment correlation coefficient. Procedures for the calculation of this statistic may be found in Sokal and Rohlf (1969:508-515) and Zar (1974:236-240). In addition, many of the programmable calculators currently on the market have routines to calculate this statistic.

Figure 29-7. Relationship between mean body weights of three species of Neotoma from ten populations and mean annual temperatures. The regression line, its slope (b), and the correlation coefficient (r) are given. N. cinerea, circles; N. albigula, squares; N. lepida, triangles (Brown and Lee 1969). [Figure not reproduced; axes: body weight (g) versus mean annual temperature (°C).]

Multivariate Analysis

Multivariate analysis simultaneously considers variation and covariation of two or more variables. Computations involving three or more variables are extremely complex and time consuming. Thus, the wide application of multivariate statistical techniques in biological studies awaited the development of electronic digital computers, with their capability for rapidly processing numerous variables and data points. General references on multivariate analysis include Cooley and Lohnes (1971), Morrison (1963), Seal (1964), and Anderson (1958). A working knowledge of matrix algebra is helpful, though not essential, for using multivariate statistical procedures. Searle (1966) is a useful reference on matrix manipulations. Sneath and Sokal (1973) provide information on multivariate analyses in phenetic classification studies. A useful key for determining what types of multivariate analyses to utilize may be found in Atchley and Bryant (1975:3-4) and Bryant and Atchley (1975:2-3).

In multivariate
statistical analyses the biologist is interested in one or all of the following: (1) a measure of similarity between groups (e.g., taxa), (2) reduction in the number of variables, and (3) discrimination between groups. Similarity can be measured by correlation, association, or distance coefficients. Phenograms are frequently constructed using the Unweighted Pair-Group Method of Analysis (UPGMA) on correlation and average taxonomic distance matrices. Choate (1970), Genoways and Jones (1971), Johnson and Selander (1970), and Patton et al. (1975) are examples of phenetic cluster schemes using the UPGMA technique. Seal (1964), in contrast, recommended the use of a distance coefficient that considers relative correlation, such as generalized or Mahalanobis distance ($D^2$).

Factor analysis is a general term for several multivariate techniques that convert a large number of original variables into a smaller set of new variables. Two techniques that are used in systematics research are principal components analysis and multiple-factor analysis, frequently with rotation to simple structure. Principal components analysis has a sound mathematical basis, although interpretation is sometimes difficult. Multiple-factor analysis is less exact mathematically, since there are no unique solutions for obtaining communalities (summarization of intercorrelations among variables) or for estimating the number of factors to extract from the many potential factors. Despite these difficulties, factor analysis is an important summarization technique. Genoways and Choate (1972) utilized principal components analysis in a study of geographic variation in Nebraska populations of Blarina. In their study the first three principal components accounted for approximately ninety-two percent of the total variance in nine cranial and three external measurements (Fig. 29-8). Multiple-factor analysis with rotation to simple structure was used by Wallace and Bader (1967) in a study of twenty-seven morphometric variables in a single
sample of the house mouse, Mus musculus. To improve interpretability and understanding of the forces affecting tooth size, the twenty-seven variables were reduced to five factors, of which the first three were identified as width, anterior length, and posterior length factors. Poole (1971, 1974) utilized factor analysis for modeling natural communities of plants and animals and for measuring the structural similarity of communities composed of the same species.

Figure 29-8. Three-dimensional projection of 83 specimens of Blarina onto the first three principal components; the third component is indicated by height. Solid circles, B. b. brevicauda reference sample; half-solid circles, B. b. carolinensis reference sample; open circles, test specimens of both taxa collected near zones of contact (Genoways and Choate 1972). [Figure not reproduced.]

Discriminant functions were developed by Fisher as a means to distinguish members of closely related taxa. The computations produce differential weights for the various characters. Those characters with the highest weights (loadings) are the most useful discriminators for separating two groups or taxa. A stepwise discriminant analysis can be used if more than two reference groups or taxa must be separated. Summed values of the discriminant scores are frequently plotted on a frequency histogram (Y-axis, individuals; X-axis, discriminant scores) to illustrate the separation between taxa (Fig. 11-8). Genoways and Choate (1972) were interested in analyzing the specific relationships of two previously defined subspecies of short-tailed shrews (Blarina) occurring in a contact zone in Nebraska. After collecting specimens of these shrews from the contact zone, they wished to compare the morphology of these specimens with the morphology of reference specimens representing each of the two subspecies. The technique of discriminant function analysis permitted the calculation of discriminant scores for each of the two reference samples.
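As a rough illustration of how such discriminant scores can be obtained, the following Python sketch computes a two-group linear discriminant function for two characters. Everything in it is an assumption made for illustration: the measurements are invented (they are not data from Genoways and Choate 1972), the names `ref_a`, `ref_b`, and `test_specimens` are hypothetical, and the midpoint cutoff is only one simple way to compare test scores with the reference samples.

```python
# Hypothetical measurements (mm) on two characters for two reference samples.
# These values are invented for illustration; they are not from any study.
ref_a = [(20.1, 11.2), (20.8, 11.6), (19.9, 11.0), (20.5, 11.5), (21.0, 11.4)]
ref_b = [(18.2, 10.1), (18.9, 10.4), (18.0, 10.3), (18.6, 10.0), (18.4, 10.6)]
test_specimens = [(20.4, 11.3), (18.5, 10.2), (19.4, 10.8)]


def mean_vector(sample):
    """Mean of each of the two characters."""
    n = len(sample)
    return (sum(x for x, _ in sample) / n, sum(y for _, y in sample) / n)


def scatter(sample, mean):
    """Sums of squares and cross-products about the group mean."""
    sxx = sum((x - mean[0]) ** 2 for x, _ in sample)
    sxy = sum((x - mean[0]) * (y - mean[1]) for x, y in sample)
    syy = sum((y - mean[1]) ** 2 for _, y in sample)
    return sxx, sxy, syy


def discriminant_weights(a, b):
    """Weights w = S^-1 (mean_a - mean_b), with S the pooled covariance matrix."""
    mean_a, mean_b = mean_vector(a), mean_vector(b)
    df = len(a) + len(b) - 2
    sxx, sxy, syy = (
        (p + q) / df for p, q in zip(scatter(a, mean_a), scatter(b, mean_b))
    )
    det = sxx * syy - sxy * sxy  # determinant of the 2 x 2 pooled matrix
    dx, dy = mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]
    weights = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    return weights, mean_a, mean_b


def score(weights, specimen):
    """Discriminant score: weighted sum of the specimen's characters."""
    return weights[0] * specimen[0] + weights[1] * specimen[1]


weights, mean_a, mean_b = discriminant_weights(ref_a, ref_b)
# Midpoint between the two group mean scores, used here as a simple cutoff.
cutoff = (score(weights, mean_a) + score(weights, mean_b)) / 2
scores_a = [score(weights, s) for s in ref_a]
scores_b = [score(weights, s) for s in ref_b]
test_scores = [score(weights, s) for s in test_specimens]

for specimen, value in zip(test_specimens, test_scores):
    side = "A" if value > cutoff else "B"
    print(f"{specimen}: score {value:.2f}, nearer reference sample {side}")
```

Scores falling close to the cutoff, rather than well inside the range of either reference sample, are the ones that would merit a closer look as possible intergrades.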
Then, when the discriminant scores for the specimens in the contact zone were compared with the scores of the reference samples, the taxa were easily separated and potential hybrids or intergrades spotted (Fig. 11-8). Discriminant analyses were also utilized by Jolicoeur (1959) and Lawrence and Bossert (1967) in studies of canid populations, and by Robinson and Hoffmann (1975) in studies of geographical and interspecific cranial variation in big-eared ground squirrels (Spermophilus).

Supplementary Readings

Bliss, C. I. 1967. Statistics in biology. Vol. 1. McGraw-Hill Book Company, New York. 558 pp.
Campbell, R. C. 1974. Statistics for biologists. 2nd ed. Cambridge Univ. Press, London. 385 pp.
Rohlf, F. J., and R. R. Sokal. 1969. Statistical tables. W. H. Freeman and Co., San Francisco. 253 pp.
Siegel, S. 1956. Nonparametric statistics for the behavioral sciences. McGraw-Hill Book Co., New York. 312 pp.
Simpson, G. G., A. Roe, and R. C. Lewontin. 1960. Quantitative zoology. Rev. ed. Harcourt, Brace and Co., New York. 440 pp.
Sokal, R. R. 1965. Statistical methods in systematics. Biol. Rev. 40:337-391.
Sokal, R. R., and F. J. Rohlf. 1969. Biometry. W. H. Freeman and Co., San Francisco. 776 pp.
Steel, R. G. D., and J. H. Torrie. 1960. Principles and procedures of statistics. McGraw-Hill Book Co., New York. 481 pp.
Zar, J. H. 1974. Biostatistical analysis. Prentice-Hall, Englewood Cliffs, New Jersey. 620 pp.