Phylogenetic Analysis of Molecular Data
Phylogenetic Analysis of Molecular Data BOTANY 563
This 16 page Class Notes was uploaded by Etha Kassulke on Thursday September 17, 2015. The Class Notes belongs to BOTANY 563 at University of Wisconsin - Madison taught by Staff in Fall. Since its upload, it has received 36 views. For similar materials see /class/205329/botany-563-university-of-wisconsin-madison in Botany at University of Wisconsin - Madison.
Date Created: 09/17/15
D. A. Baum, Mar. 2004

Trees and tree-thinking

Phylogenetics is the study of the relationships of organisms to each other. A central concept in this field is a phylogenetic tree, a depiction of the inferred evolutionary relationships of species to each other. Phylogenetic trees are very useful for organizing knowledge of biological diversity and for providing insights into events that occurred during evolution. Thus, being able to correctly interpret such trees is an essential tool for a modern biologist. This document is intended to provide you with a brief overview of the important principles and a list of specific skills you need to acquire to be able to extract information from phylogenies.

What a tree represents

Start by imagining one generation of plants of a particular species, for example shepherd's purse, growing side by side in a meadow and producing offspring by exchanging pollen. If we focus on five individual plants in the parental generation (G1) and the offspring generation (G2), the pedigree could look like the following. (Here we have assumed that each individual has two different parents, whereas in many plants self-pollination can occur.)

[Figure: pedigree linking Parents (Generation 1) to Offspring (Generation 2)]

Now expand your image to encompass all the plants in this population and several generations. It might look something like the following. Note that each individual has two parents but gives rise to a variable number of offspring in the next generation.

[Figure: pedigree of the whole population over several generations]
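The pedigree picture described above can be imitated with a small simulation. This is only a sketch under stated assumptions: constant population size, random mating, and two distinct parents per individual (no self-pollination). The function names and parameters are hypothetical and do not come from the handout.

```python
import random


def simulate_pedigree(n_individuals, n_generations, seed=0):
    """Simulate descent relationships between successive generations.

    Returns one list per parent-offspring generation pair. Entry i of such
    a list is a tuple (parent_a, parent_b): the indices, in the previous
    generation, of the two distinct parents of offspring i.
    Assumptions (not from the handout): constant population size and
    random mating without self-pollination.
    """
    rng = random.Random(seed)
    pedigree = []
    for _ in range(n_generations - 1):
        generation = [
            tuple(rng.sample(range(n_individuals), 2))  # two different parents
            for _ in range(n_individuals)
        ]
        pedigree.append(generation)
    return pedigree


def offspring_counts(generation, n_individuals):
    """Count how many offspring each parent contributed to one generation."""
    counts = [0] * n_individuals
    for parent_a, parent_b in generation:
        counts[parent_a] += 1
        counts[parent_b] += 1
    return counts


if __name__ == "__main__":
    ped = simulate_pedigree(n_individuals=5, n_generations=2)
    print("parents of each G2 individual:", ped[0])
    print("offspring per G1 individual:  ", offspring_counts(ped[0], 5))
```

Every offspring has exactly two parents, while the per-parent counts vary from run to run, which is the asymmetry the text points out.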
Imagine taking the preceding figure and getting rid of the organisms, keeping only the descent relationships, the glue that holds the population together. This would look like the following.

[Figure: descent relationships only, with the individual organisms removed]

Now expand your field of view to include many more individuals and generations. For example, the image to the right is derived from a diagram similar to the one above, but now includes about 250 individuals and 80 generations. As you can see, if one were to try to represent a typical population of several thousand individuals that persists for hundreds or thousands of generations, all one could see would be a fuzzy line.

[Figure: descent relationships for about 250 individuals over 80 generations, with time running from Past to Present]

Individual populations may be fairly isolated for some period of time. However, on an evolutionary timescale, seeds and pollen occasionally move between the discrete populations that make up a typical species. This gene flow between populations has the effect of braiding the population lineages into a single species lineage, which might be thought of as resembling the graphic to the left.

[Figure: population lineages braided into a single species lineage, with time running from Past to Present]

As a
matter of convention, when we start looking at longer time frames it is normal to invert the arrow of time, placing the past at the bottom and the present at the top. This convention probably arose because, in fossil beds, older ancestral fossils tend to lie in lower strata than younger fossils. Thus the preceding figure should be redrawn [...]

[...] from different lineages to mate successfully and create viable offspring. [...] To the right is an example of what we might see if we followed the fate of one initial species for long enough that it gave rise to four species. The example includes three lineages (species) that became established but became extinct before the end of the observation period.

[Figure: tree showing the descendant species A, B, C and D and three extinct lineages]

This is a simple phylogenetic tree. Ignoring the extinct species, we can summarize the tree as follows. The initial speciation event gave rise to two lineages, one of which later [...] gave rise to descendant species A and [...] species C and D. [...] A and B are more closely related to each other [...]. The issue of relationships will be discussed further below.

In practice we are not able to watch lineages evolving. Instead of starting from one ancestor and observing evolution occurring in a forward direction, phylogenies are generally approached in the reverse direction. We start from a sample of living species and try to infer the phylogenetic tree that links them. As a result we pretty much ignore extinct species [...]. Suppose we were studying species A, B, C and D. In this case all the relevant information would be summarized in the simplified diagram shown [...].

[Figure: simplified tree of species A, B, C and D]

Evolutionary relatedness and phylogenies

[...] relatedness [...] the more distant [...]. To make the logic clear, think about the relationships within families. The MRCA (most recent common ancestor) of you [...] cousins are your great-grandparents. Your grandparents are situated only two generations before you, whereas your great-grandparents are situated three generations back. This provides a solid foundation for the claim that you are more closely related to your first than to your second cousins. Phylogenetic trees contain information about the relative recency of common ancestry and
thus provide a succinct way of determining the degree of relationship among species. For example, in the tree above you should be able to see that species A shares a more recent common ancestor with species B than with species C. Hence the tree above shows us definitively that species A is more closely related to species B than to species C.

Definitions and conventions

Most of the trees that you will encounter are rooted, meaning that one branch (usually unlabelled) is taken to correspond to the common ancestor of all the species included in the tree. Here is a simple rooted tree.

[Figure: rooted tree with terminals bacteria, birds, marsupials and Homo sapiens; labels mark the terminals, a node and the root]

The labels at the top of the tree could be individual species or sets of species that comprise one branch of the tree of life. General terms for the items represented by these labels are terminals or taxa; in more mathematical circles they are called leaves. The branching points, corresponding to inferred speciation events, are called nodes. Internal branches, or internodes, connect two nodes, whereas external branches, or tips, connect a terminal and a node.

Unless indicated otherwise, a phylogenetic tree drawing depicts branching relationships only. The pattern of relationships is what matters, and branch lengths are irrelevant: they are just drawn in such a way that the tree looks tidy. Thus the following three trees contain the same information.

[Figure: three drawings of a tree relating taxa A, B, C and D]

Similarly, the same information is depicted if the tree is oriented differently or is drawn with rectangular branches.

[Figure: the same tree of A, B, C and D in other orientations and with rectangular branches]

Sometimes rectangular phylogenetic trees are drawn so that branch lengths do mean something. These are often called phylograms. They generally depict either the amount of evolution occurring in a particular gene sequence or the estimated duration of branches. You will see relatively few phylograms in this class, but they are abundant in the scientific literature.

How to read a tree

When looking at a tree, the most important thing to look at is the
relative branching order. In doing so, one must take care not to be distracted by the shape of the tree and how close two tips are to each other. Considering the tree to the right and looking at taxa A and B, one might think that they are closely related because the tip labels are right next to each other. In fact, A and B are as distantly related as any pair of taxa on the tree. Indeed, B is more closely related to E than to A.

[Figure: tree of taxa A, B, C, D and E]

The problem with looking at the order of taxa along the tips is that two trees showing the same fundamental relationships can have the taxa in different orders. If you recall the way that a phylogeny grows, by ancestral lineages splitting into descendant lineages, it is arbitrary which descendant lineage one draws to the right or left. Thus one can spin parts of the tree around any internode without changing the implied relationships. If you can change one tree into another simply by twisting or bending branches, without ever having to cut and reattach branches, then the two trees depict the same relationships; indeed, they are really just different views of the same tree. For example, these three trees are one and the same.

[Figure: three drawings of the same tree with branches rotated]

Trees and similarity

In interpreting phylogenetic trees it is also important to remember that what is depicted is the inferred pattern of lineage branching. While closely related organisms usually look quite similar, sometimes they do not, because morphological evolution can occur at different rates on different branches of a phylogeny. As a result, two taxa can be quite similar but distantly related or, conversely, quite different but closely related. This is well illustrated by the following phylogeny, which correctly depicts the currently accepted relationships among these familiar organisms (evidence for this tree comes from both molecular and morphological data).

[Figure: tree of lizard, crocodile and bird]

According to this tree, crocodiles are more closely related to birds than to lizards. How can that be? It is, after all, an indisputable fact that a
crocodile has more morphological similarities to a lizard than to a bird. However, the similarities of crocodiles and lizards, such as the sprawling gait, elongated tail and scales, are features that trace back to the MRCA of lizards, crocodiles and birds. While the lizard and crocodile lineages have both retained this ancestral body form, birds underwent dramatic evolutionary divergence. Birds evolved such divergent features as feathers, flight, a wishbone, a keeled sternum, warm-bloodedness and a four-chambered heart, while losing a tail and teeth. Nonetheless, the presence of unique features of birds does not change the fact that a crocodile is more closely related to a bird than to a lizard. Relatedness is about descent, not similarity.

Clades and monophyly

A clade is a piece of a phylogeny that includes an ancestral lineage and all the descendants of that ancestor. This group of organisms has the property of monophyly (from the Greek for "single clan") and thus may also be called a monophyletic group. A clade (monophyletic group) is easy to identify visually: it is simply a piece of a larger tree that can be cut off with a single cut. If one needs to cut the tree in two places to extract a set of taxa, then that group is non-monophyletic. Nowadays systems of classification strive to give formal names only to monophyletic groups.

[Figure: tree contrasting a monophyletic group (clade) with a non-monophyletic group; one terminal is labelled mammal]

Monophyly is important for two reasons. First, all members of a clade are more closely related to each other than to any organisms outside the clade. Second, if phylogenies are fully divergent (lineages never converge), then a classification composed of only monophyletic groups will be perfectly hierarchical. A hierarchical classification is one in which groups can include one another but never overlap in content. Hierarchical classifications have a unique classificatory path to each terminal, and such a structure provides the clarity needed for scientific purposes.

Collapsing and pruning

Phylogenetic trees only depict
relationships among the terminals that are included. Nonetheless, the tree-like form has the desirable property that pruning taxa off a tree does not change the implied relationships of the remaining taxa. Thus, given the tree on the left, the two trees on the right would both correctly represent the phylogeny for the remaining species.

[Figure: tree of taxa A to G (left) and two pruned trees containing A, B, C, G and A, D, F, G (right)]

Similarly, if one collapses a monophyletic group to a single taxon, its relationship to the remainder of the tree is unchanged.

Polytomies and the representation of uncertainty

All the trees shown so far are strictly dichotomously branching: each node has only two descendant lineages. However, there is no reason to assume that speciation events always occur dichotomously; one can easily imagine cases where an ancestral lineage splits simultaneously into three or more descendants. The result would be a tree with one or more polytomies. For example, the two trees to the right both include two polytomies (circled). They are actually the same tree drawn in two different styles.

[Figure: two drawings of a tree of taxa A to H with two circled polytomies]

While a polytomy can represent actual polytomous branching, it is more commonly indicative of phylogenetic uncertainty. Thus the tree above can be read as stating that we know that EFGH and GH form clades, but we do not know whether E is more closely related to F or to G + H (although we might assume that the true tree is fully dichotomous). As a general rule, when you see trees with polytomies you should assume that this indicates ignorance as to the resolution of this part of the tree, rather than a claim that a polytomous speciation occurred.

Inferring character evolution using phylogenetic trees

One of the main reasons that phylogenetic trees are useful is that they provide a simple way to infer when particular characteristics of living organisms evolved. For example, given the following tree, when did seeds evolve?

[Figure: tree of seven terminals; three are labelled "No seeds" and four are labelled "seeds"]

Any number of scenarios are possible; for example, seeds could have
evolved four times in the four species to the right or, alternatively, seeds could have been present in the common ancestor of all seven taxa but have been lost in the three leftmost terminals. However, the simplest or, to use the technical term, most parsimonious explanation is that there was a single origin of seeds.

[Figure: the same tree with a single change, "Seeds evolve", marked on one branch]

The parsimony criterion states that the most plausible mapping of a character onto a tree is that which invokes the fewest changes. This does not guarantee that this is what actually happened, but simply means that, in the absence of contrary evidence, this is generally the best bet for the true evolutionary history. As you will see later, the principle of parsimony also provides a criterion that can be used to estimate the correct tree for a group of terminals.

Further reading

Judd, W. S., Campbell, C. S., Kellogg, E. A. and Stevens, P. F. 1999. Plant Systematics: A Phylogenetic Approach. Sinauer Assoc., Sunderland, MA. Chapter 2 provides a nice introduction to phylogenies and how they are inferred.

Maddison, W. P. and Maddison, D. R. 1992. MacClade: Analysis of Phylogeny and Character Evolution. Sinauer Assoc., Sunderland, MA. Chapter 3 provides a nice summary of why phylogenies are useful.

Maximum likelihood (cont.)

Stepwise view
• Propose a tree with branch lengths
• Consider the first character
• Sum the probability of the data arising via each possible history
• Sum over all other combinations = 7.09 × 10^-3

AA = 5.87232 × 10^-6     CA = 1.1204 × 10^-12
AG = 7.064163 × 10^-3    CG = 1.78372 × 10^-7
AC = 4.43719 × 10^-8     CC = 1.48277 × 10^-10
AT = 4.43719 × 10^-8     CT = 1.1204 × 10^-12
GA = 1.1204 × 10^-12     TA = 1.1204 × 10^-12
GG = 2.36063 × 10^-5     TG = 1.78372 × 10^-7
GC = 1.1204 × 10^-12     TC = 1.1204 × 10^-12
GT = 1.1204 × 10^-12     TT = 1.48277 × 10^-10

The maximum likelihood criterion
• The optimal tree is that which would be most likely to give rise to the observed data under a given model of evolution

Worked example

[Figure: worked example on a tree with branch lengths 0.01, 0.02 and 0.03, showing the probability of no change and of change along branches joining tip states A and G]
Likelihood scores
• Raw likelihood of the data at this site, given this tree, branch lengths and model: 0.25 × 7.09 × 10^-3
• Log-likelihood: -6.334787983

What does this number mean?
• -6.334787983 is the log-likelihood of this character's tip values given
  - this tree topology
  - these branch lengths
  - the model of molecular evolution

Multiplying across sites

$$L = L_1 \cdot L_2 \cdots L_N = \prod_{i=1}^{N} L_i$$

$$\ln L = \ln L_1 + \ln L_2 + \cdots + \ln L_N = \sum_{i=1}^{N} \ln L_i$$

To make it easier, we can lump characters with the same pattern:

$$\ln L = \ln L_{(0000)} N_{0000} + \ln L_{(0001)} N_{0001} + \ln L_{(0010)} N_{0010} + \ln L_{(0100)} N_{0100} + \ln L_{(0111)} N_{0111} + \ln L_{(0011)} N_{0011} + \ln L_{(0101)} N_{0101} + \ln L_{(0110)} N_{0110}$$

What branch lengths should we assume?
• Under the principle of maximum likelihood, we use the set of branch lengths that maximizes the likelihood
• Once we find those branch lengths, the likelihood score is taken as being the likelihood of the data given this tree topology

Stepwise view
• Propose a tree with branch lengths
• Consider the first character
• Sum the probability of the data arising via each possible history
• Multiply across all sites
• Find the branch lengths that yield the highest likelihood for the entire data set
• Search among trees

More complicated (realistic) models for DNA
• Allow deviation from equiprobable base frequencies
  - HKY85, F81, GTR
• Allow two substitution types (ti and tv)
  - K2P, HKY85
• Allow for six substitution types
  - GTR

Site-to-site rate heterogeneity
• The easiest is to assume that all characters have the same intrinsic rate of evolution
  - but this is unrealistic
• What can we do instead?

Long Branch Attraction

[Figure: true tree for taxa A, B, C and D compared with the parsimony estimate]
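Returning to the stepwise view of the likelihood calculation, the first four steps can be sketched in a few lines. This is a minimal illustration, assuming the Jukes-Cantor (JC69) model and a four-taxon tree ((t1,t2),(t3,t4)) with two internal nodes, so that each site has 16 possible histories. The function names and the branch lengths in the usage lines are hypothetical, and the sketch is not meant to reproduce the numbers on the slides.

```python
import math
from itertools import product

BASES = "ACGT"


def jc69_prob(start, end, d):
    """JC69 probability of observing `end` given `start` after a branch
    of length d (expected substitutions per site)."""
    decay = math.exp(-4.0 * d / 3.0)
    if start == end:
        return 0.25 + 0.75 * decay   # no net change
    return 0.25 - 0.25 * decay       # change to one specific other base


def site_likelihood(tips, branches):
    """Likelihood of one site on the tree ((t1,t2),(t3,t4)).

    tips     -- the four observed bases, e.g. "AAGG"
    branches -- (b1, b2, b3, b4, b_internal), assumed branch lengths
    """
    t1, t2, t3, t4 = tips
    b1, b2, b3, b4, b5 = branches
    total = 0.0
    # Sum the probability of the data over all 16 histories, i.e. all
    # assignments of bases (x, y) to the two internal nodes.
    for x, y in product(BASES, repeat=2):
        total += (jc69_prob(x, t1, b1) * jc69_prob(x, t2, b2)
                  * jc69_prob(x, y, b5)
                  * jc69_prob(y, t3, b3) * jc69_prob(y, t4, b4))
    return 0.25 * total  # 0.25 = equal base frequency at the starting node


def log_likelihood(pattern_counts, branches):
    """Multiply across sites by summing logs, lumping identical patterns:
    ln L = sum over patterns of N_pattern * ln L_pattern."""
    return sum(n * math.log(site_likelihood(pattern, branches))
               for pattern, n in pattern_counts.items())


if __name__ == "__main__":
    branches = (0.02, 0.01, 0.03, 0.01, 0.02)          # hypothetical values
    counts = {"AAAA": 10, "AAGG": 3, "AGAG": 1}        # hypothetical data
    print(log_likelihood(counts, branches))
```

The remaining steps (finding the branch lengths that maximize the likelihood, then searching among trees) would wrap a numerical optimizer and a tree search around `log_likelihood`; they are left out to keep the sketch short.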
Relationship among models

[Figure: diagram relating substitution models by number of substitution types (a single type, or two types: transitions vs. transversions, and so on) and by equal vs. unequal base frequencies]

Long Branch Attraction

[Figure: true tree for taxa A, B, C and D with unequal branch lengths]

Data patterns and their frequencies:
0011   0.0682
0101   0.0522
0110   0.1242

Parsimony will be positively misleading.

Why does parsimony fail?
• Parsimony is blind to branch lengths
• It cannot see cases in which the data are better explained by rapid evolution and unequal branch lengths
• ML can, if you have the right model of evolution

Parsimony-like likelihood model (see Lewis 1998 for more)
• Estimate branch length independently for each character (or force lengths to be equal)
• Only sum over maximum likelihood ancestral states

Relationship between MP and ML
• One argument: MP is inherently nonparametric
  - No direct comparison possible
• MP is an ML model that makes particular assumptions

Why use MP?
• The model is less realistic, but
  - we can do more thorough searches and data exploration (computational efficiency)
  - robust results will usually still be supported
• Even the most complex model is still unrealistically simple

Why use ML?
• The model is explicit
• We can statistically compare alternative models of molecular evolution
• We can conduct parametric statistical tests

Comparative methods wrap-up and key innovations

The study of character evolution
• Many methods allow one to study the evolution of single traits or pairs of potentially correlated traits, given a phylogeny

What do we do about phylogenetic uncertainty?

One approach
• Repeat the analysis over a number of plausible trees and see if the results are robust to phylogenetic uncertainty

A better approach
• Treat the phylogeny as a nuisance parameter
• E.g., estimate the posterior probability of two traits being correlated, given the trait data, some sequence data for the same species, a model for the evolution of both traits and sequences, and priors on all the parameters

Stochastic mapping: obtain a distribution of histories (Huelsenbeck et al. 2004, Syst. Biol.)
• Use a posterior distribution of trees
• Simulate evolution up each tree, only keeping simulations that arrive at the observed data
• Look at the simulations to see if they show evidence of directionality or correlated evolution
• Implemented in SimMap (brahms.ucsd.edu/simmap.html)

Resources
• "Stochastic Mapping of Morphological Characters", John P. Huelsenbeck, Rasmus Nielsen and Jonathan P. Bollback
• http://www.simmap.com

Are opposable thumbs in humans adaptations for tool use?
• Opposable thumb improves performance of this biological role
• Making better tools improves fitness
• Opposable thumbs evolved after tool use

The concept of key innovation
• Originally referred to traits that permitted invasion of a new "adaptive zone" (Simpson)
• Nowadays associated with changes in diversification rate
• Adaptations are to natural selection what key innovations are to lineage (species) selection

Testing causal hypotheses
• Taking account of phylogeny, we can establish the extent of correlations between traits
• What about causal statements?
  - "This trait evolved because it improved performance of biological role a" (a hypothesis of adaptation)
• Can be assessed if one has a clear causal model

Are flexible shoulders in humans adaptations for throwing?
• Flexible shoulder improves performance of this biological role
• Throwing projectiles better improves fitness
• Flexible shoulder evolved before tool use

How do we show that a trait is a key innovation?
• Correlation: the trait evolves on a branch to which we map a jump in the rate of diversification
• Causation: the trait is repeatedly associated with accelerated diversification and/or we have a model that predicts a causal effect

Detecting changes in the rate of
species accumulation

Under a constant rate of speciation with no extinction, which of the following trees is more likely?

[Figure: two trees with different degrees of balance]

They are equally likely.

Explanation
• What is the probability that the next speciation event will be on the A side? 66%

The counterintuitive result
• All basal splits are equally likely, e.g. for 100 taxa a 1:99 basal split is as likely as a 50:50 split
• If we have a prior hypothesis, we need a 95:5 imbalance or more to statistically support the hypothesis
• Becomes stronger when there is a repeated pattern

[Table: numbers of species in clades and in their sister groups]

Taking account of time helps
• If the delay until the second speciation (t) is long, then acceleration in the ingroup is implied (Sanderson and Donoghue 1994, Science)

How can one study changes in diversification rate?
• Look for a repeated pattern that trait X correlates with clades that are bigger than their non-X sister groups
• Provides evidence that the character is a key innovation

Application to Aquilegia
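The counterintuitive result above (all basal splits equally likely) can be checked with exact arithmetic. The sketch below assumes the same model as the slides, a constant speciation rate with no extinction, so that the next speciation event falls on one side of the root with probability proportional to the number of lineages already on that side (for example 2/3 when that side holds two of three lineages). The function names are hypothetical.

```python
from fractions import Fraction


def basal_split_distribution(n_tips):
    """Exact distribution of the number of tips on the 'left' side of the
    root once the tree has n_tips tips, under pure birth (no extinction).

    Starts from the two lineages created by the basal split and adds one
    speciation event at a time; each existing lineage is equally likely
    to be the one that speciates.
    """
    dist = {1: Fraction(1)}  # with 2 tips, the left side has exactly 1
    for total in range(2, n_tips):
        new_dist = {}
        for left, p in dist.items():
            right = total - left
            # Next speciation on the left side: probability left / total.
            new_dist[left + 1] = new_dist.get(left + 1, 0) + p * Fraction(left, total)
            # Next speciation on the right side: probability right / total.
            new_dist[left] = new_dist.get(left, 0) + p * Fraction(right, total)
        dist = new_dist
    return dist


def prob_left_at_least(n_tips, k):
    """One-tailed probability that the left side holds at least k tips."""
    dist = basal_split_distribution(n_tips)
    return sum(p for left, p in dist.items() if left >= k)


if __name__ == "__main__":
    dist = basal_split_distribution(100)
    print(dist[1], dist[50])              # a 1:99 and a 50:50 split
    print(prob_left_at_least(100, 95))    # imbalance of 95:5 or more
```

Because every split from 1:99 to 99:1 has the same probability, only very lopsided splits are surprising, and only when the direction of the imbalance was predicted in advance; this is why the repeated-pattern approach across several sister-group pairs carries more weight than any single tree.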