Stat/For/Hort 571: Data transformations (Ané/Keuler, Fall 2008)

This handout lists commonly used data transformations. They are used to address two issues: (1) the data or the residuals not following a normal distribution, and/or (2) variances not being similar across samples. Indeed, the one-sample t-test assumes that the data come from a normal distribution. The two-sample t-test, and especially ANOVA, assume that the variances are equal across all groups being compared, in addition to a normal distribution in each group. Here we comment on using transformations to obtain normally distributed values. As a disclaimer, results from an analysis of transformed data may not be easy to interpret. In particular, confidence intervals for the difference between group means may not be easy to transfer back to the original data scale.

1. Logarithmic
Transformed values are $\log(x)$. This transformation can be useful when the data show a right skew. The graph below shows a data set with a right skew and the distribution of the log values. This example uses the log in base 10, but exactly the same histogram shape and QQ plot shape would be obtained with a log in base 2 (as is used in microarray data) or with the natural log (base $e$), because changing the base only multiplies every log value by the same constant.

[Figure: histograms and normal QQ plots of a right-skewed data set with positive values: original data, log-transformed data (base 10 here) and square-root-transformed data. Histogram x-axes: original data, log in base 10 of data, square root of data; QQ plot x-axes: theoretical quantiles.]

Since we can only take the logarithm of a positive value, the transformation $\log(x)$ can only be
applied when all data values are positive. The presence of zeros prevents the direct use of this transformation. One way around this limitation is to use $\log(x + c)$, where $c$ is a number (0.1, for example) that the user has to choose. This number has to make sense biologically and has to be on the same unit scale as the original observations. However, the result of the analysis (t-test, ANOVA) is most often very sensitive to the choice of $c$, so I would recommend staying away from the $\log(x + c)$ transformation: data dredging is a nearby danger.

2. Square root
Transformed values are $\sqrt{x}$. The square-root transformation can also be useful when the data show a right skew, and the presence of zeros in the data is not a problem. This transformation is typically used for counts (number of herb species, number of larvae, etc.). The previous graph showed the square-root transformation applied to continuous data; the graph below shows it applied to count data.

[Figure: histograms (frequency) and normal QQ plots of count data: original data (13 zeros here) and square-root-transformed data. Histogram x-axes: original data, square root of data; QQ plot x-axes: theoretical quantiles.]

3. Arcsine
This transformation is sometimes applied to proportion data, when all observations are between 0 and 1. It might help when the data are evenly distributed between 0 and 1 (flat histogram) or when there are many observations close to 0 or close to 1 (U-shaped histogram). This transformation "expands" the region of values close to 0 and to 1. The transformed value is the angle whose sine is the square root of $x$:

$\arcsin(\sqrt{x})$ in radians, or $\arcsin(\sqrt{x}) \times \frac{360}{2\pi}$ in degrees.

The graph below shows an example where this transformation helps, because the original data show a rather even distribution between 0 and 1.
[Figure: histograms and normal QQ plots of proportion data: original data (in [0, 1]) and arcsine-transformed data (in [0, 90] degrees). Histogram x-axes: original data, angle; QQ plot x-axes: theoretical quantiles.]
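The three transformations in this handout can be sketched in Python as follows. This is a minimal illustration using only the standard `math` module, not code from the handout; the function names (`log_transform`, `sqrt_transform`, `arcsine_transform`) and the sample values are hypothetical. Each function enforces the domain restriction described in its section: strictly positive values for $\log(x)$, non-negative values for $\sqrt{x}$, and proportions in [0, 1] for the arcsine. The first usage lines also check the claim that switching from base 10 to base 2 only rescales the log values by a constant.

```python
import math


def log_transform(values, base=10):
    """Hypothetical helper: log(x) in the given base; all values must be > 0."""
    if any(v <= 0 for v in values):
        raise ValueError("log(x) needs all values > 0 (zeros or negatives present)")
    return [math.log(v, base) for v in values]


def sqrt_transform(values):
    """Hypothetical helper: sqrt(x); zeros are fine, which suits count data."""
    if any(v < 0 for v in values):
        raise ValueError("sqrt(x) needs all values >= 0")
    return [math.sqrt(v) for v in values]


def arcsine_transform(proportions, degrees=False):
    """Hypothetical helper: arcsin(sqrt(x)) for proportions x in [0, 1]."""
    if any(p < 0 or p > 1 for p in proportions):
        raise ValueError("the arcsine transformation needs proportions in [0, 1]")
    angles = [math.asin(math.sqrt(p)) for p in proportions]  # radians, in [0, pi/2]
    if degrees:
        # Convert radians to degrees by multiplying by 360 / (2 * pi).
        angles = [a * 360 / (2 * math.pi) for a in angles]
    return angles


skewed = [1.2, 3.5, 8.0, 40.0]  # hypothetical right-skewed positive values
log10_vals = log_transform(skewed, base=10)
log2_vals = log_transform(skewed, base=2)
# Every ratio equals log2(10): a constant rescaling, so plot shapes are unchanged.
print([b / a for a, b in zip(log10_vals, log2_vals)])

counts = [0, 0, 1, 4, 9]  # hypothetical counts with zeros
print(sqrt_transform(counts))  # [0.0, 0.0, 1.0, 2.0, 3.0]

print(arcsine_transform([0, 0.5, 1], degrees=True))  # approximately 0, 45 and 90
```

Calling `log_transform` on data containing a zero raises an error instead of silently shifting the values, which mirrors the advice in section 1 to avoid an ad hoc $\log(x + c)$.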

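The warning in section 1 that results are very sensitive to the choice of $c$ in $\log(x + c)$ can be illustrated numerically. The sketch below is an assumption-laden toy example, not an analysis from the handout: the two groups of counts are made up, `log_shift` and `welch_t` are hypothetical helpers, and Welch's t statistic is used here as one possible two-sample test. It recomputes the difference in mean log10 values for several values of $c$.

```python
import math
import statistics


def log_shift(values, c):
    """Hypothetical helper: log10(x + c), the shifted log discussed in section 1."""
    return [math.log10(v + c) for v in values]


def welch_t(a, b):
    """Welch two-sample t statistic (no equal-variance assumption)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical count data containing zeros, for illustration only.
group_a = [0, 0, 10, 100]
group_b = [10, 10, 100, 100]

results = {}
for c in (0.01, 0.1, 1):
    la, lb = log_shift(group_a, c), log_shift(group_b, c)
    # Difference in mean log10 values (group B minus group A) and Welch t.
    results[c] = (statistics.mean(lb) - statistics.mean(la), welch_t(lb, la))

for c, (diff, t) in results.items():
    print(f"c = {c}: difference in mean log10 values = {diff:.3f}, Welch t = {t:.2f}")
```

With these made-up numbers the estimated difference grows from about 0.76 (c = 1) to about 1.75 (c = 0.01). The zeros are mapped to $\log_{10}(c)$, which decreases without bound as $c$ shrinks, so trying several values of $c$ and keeping the most convenient one is exactly the data dredging the handout warns about.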

