Biological SequnceGenome Anly
Biological SequnceGenome Anly BINF 730
Popular in Course
Popular in BioInformatics
This 20 page Class Notes was uploaded by Nathanael Schowalter on Monday September 28, 2015. The Class Notes belongs to BINF 730 at George Mason University taught by Staff in Fall. Since its upload, it has received 35 views. For similar materials see /class/215257/binf-730-george-mason-university in BioInformatics at George Mason University.
Reviews for Biological SequnceGenome Anly
Report this Material
What is Karma?
Karma is the currency of StudySoup.
Date Created: 09/28/15
Lecture 6 Phylogenetic Analysis Algorithms Types of Data Distances Nucleotide sites UPGMA Neighbor Joining urqluo w39 uualstq Maximum Minimum Parsimony EVOImlOH Maximum Likelihood 0 90mm gammaml 101121113 X11113 urgd From Page and Holmes ilfolecular Evolution A Phylogenetic Approach Phylogeny Distance Methods Parsimony oMaximum Likelihood Look at changes in each column of alignment Metric to estimate Population Drift Computationally more expensive Neighbor Joining Combines computational speed with uniqueness of result 0 Clustering method hence has no optimality criteria 0 Often used in conjunction with Minimum Evolution to estimate the minimum evolution tree Neighbor Joining and Minimum Evolution Compute the Neighbor Joining Tree and see if any local rearrangement produces a shorter tree Not guaranteed to give the minimum evolution tree Neighbor Joining Algorithm Related to cluster analysis but removes the assumption of ultrametric data Does not assume data comes close to fitting an additive tree need to use an appropriate model of evolution Keeps track of nodes on tree Considers onlv closest pairs and not all possible pairs in each step of star decomposition Nelghbor J01111ng tie Since B and D hm cumulated nutatio t a big ate rm A wint on en39on 1gt UPGM A method Luce 39 Nleghbor J01111ng The raw data of the tree are represented b tollowmg dlstance matll A B C DE 7 10 9 l 5 4 7 6 8 7 65 1898 We have in total 6 OTUs N6 Neighbor Joining Step 1 We calculate the net divergence r i for each OTU from all other OTUs HA 5476830 rrB 42 rrC 32 Neighbor Joining Step 2 Now we calculate a new distance matrix using for each pair of OTUs the formula MIj dij ri r0s N2 or in the case ofthe pair AB MABdAB rA rBs N2 13 A B C D E B 13 C ll5 ll5 D 10 10 105 E 10 10 lO5 13 F lO5 lO5 ll ll5 ll5 Now we start with a star tree A l F B Neighbor Joining Step 3 Now we choose as neighbors those two OTUs for which Mj is the smallest These are A and B and D and E Let s take A and B as neighbors and we form a new node called U Now we calculate the branch length from the internal node U to the external OTUs A and B SAU dAB 2 rA rB 2N 2 1 SBU dAB SAU 4 Neighbor Joining 0 Step 4 Now we define new distances from U to each other terminal node dCU dAC dBC dAB 2 3 dDU dAD dBD dAB 2 6 dEU dAE dBE dAB 2 5 dFU dAF dBF dAB 2 7 and we create a new matrix Neighbor Joining Quartet Puzzling Quartet pulmg is a le omputationally expensive method than maximum likelihood to determine the phylogentic tree Procedure 1 Compute the 1 maximum likelihood trees for all possible quartets 2 Quartet Puzzling step Combine the quartet trees into a n taxon tree that tries to conform to all the neighbor relations of all the quartet trees 3 Repeat steps 1 and 2 many times and use the majority consensus tree Quartet Puzzling Given the original tree topology for 5 taxa Two pos a Quartet Puzzling All 339 Possible quartets N1 abch N1 aech N1 ebc11 391 N1 aebdl b N1 aebc1 Quartet Puzzling Quartet puzzling step procedure 1 Take one of the quartet with the neighbor relation Nab39cd 2 Add a penalty of l to every edge such that the addition of the newt a e will yield the incorrect topology 3 Repeat for all the neighbor relations 4 The branch with the lowest weight is the branch where the taxa e show be added Quartet Puzzling Quartet with neighbor Addin0 taxa e between relatlon Nabcd b c and d e Yieldgt rung topolug Quartet Puzzling C N1 aeClb gt N1 216 b l Quartet Puzzling Chose one quartet tree Pick the taxa to add Use all neighbor relations other than the one deciding the quartet tree used to find weights 011 branches Add the taxa to the branch with the lowest penalty Minimum Evolution Given an unrooted metric tree for 11 sequences there are 2113 branches each with branch length ei The sum of these branch lengths is the length L of the tree The minimum evolution tree is the tree which minimizes L Minimum Evolution similar to parsimony But length comes from pairwise distances between the sequences not from t of nucleotide sites Use linear programming or least squares to nd optimal solution Minimum Evolution Phylogeny Character State Methods Parsim0ny oMaxmum Likelihood Look at changes in each column of alignment Metric to estimate Population Drift Computationally more expensive PHYLOGENY Character States Taxa 1 Taxa 2 Taxa 3 Taxa 4 Taxa 5 ATTGCCATT ATG GCATT ATCTATCTT ATCAAATCTT ACTG ACC Informative characters columns Look at all possible trees For each column calculate cost Minimum score best tree Maximum Parsimony Taxa 1 Taxa 2 Taxa 3 Taxa 4 Taxa 5 ATTGCCATT ATGGCATT ATCTATCTT ATCAAATCTT ACTGACC Informative characters h nimum number of changes Multiple substitutions homoplasy Maximum Parsimony Smallest number of evolutionary changes 0 First used 011 protein data Eck amp Dayhoff 1966 0 Applied to Nucleotide data Fitch 1977 0 Brute force search of tree space 00V 39 39 E 39lOV LLO lWVOlV JLO LVJ 39OiV LiV 39 OS39SLV llVOOE 39JJV g z0g00 9103s A uoulgsmd Iaplzmqo A39q 1910121qu 991i uouusmd 30 1803 00v s 10v 110 iWVOiV 110 m 01v 11v os mv iiVOOE 11v g00 9103s A uoulgsmd g uumloD Iaplzmqo A39q IOJOIZJIZIID 991i uouusmd 30 1803 Tree Space Search Tree Space 0 Exhaustive Search Brute Force 0 Branch and Bound Ef cient Heuristic Methods Hill Climbing 0 Genetic Algorithms GAML Maximum Likelihood Goal Construct a phylogenetic tree from DNA sequences whose likelihood is a maximum Felsenstein 1981 Procedure 7 Start with a given topology and use the maximum likelihood method to optimize branch lengths 7 Make local modi cations to the topology and reoptimize the branch lengths 7 New taxa are added one by one optimizing branch lengths and topologies each time 7 Assumes an evolutionary process that is a reversible Markov process 7 V61 y computationally expensive to use Likelihood of a Tree We want to find Ltree Prdatatree Given the data a1CT a2CG and a3AT Consider the tree T CG We can calculate the likelihood of this tree if we fill in the internal nodes Likelihood of a Tree Since this is a Markov proce e can consider each site separately from the other 1ch reduces the complexity of the calculation Example C ldZ 1 A T r I A T CT CG T G tree 1 tree3 Prdataltree l Prdataltree 2 Prdataltree 3 Likelihood of a site speci c tree We can calculate from the transition matrix and the distances on each branch the probability of each change The product of these multiplied by the probability of the original base gives the likelihood of a site specific tree Since there are two unknown nodes the double sum of all possible values for each ACTG gives the likelihood for the original tree Maximum Likelihood Statistical model for changes in nucleotides Likelihood that that change occurred 0 Much more computational intensive than parsimony Hypothesis Testing TransitionsTransversions HKY Kimura 2 parameter model Jukes Cantor 1 parameter Maximum Likelihood Taxa 1 ATTGCCATT Taxa 2 ATGGCATT Taxa 3 ATCTATCTT Taxa 4 ATCAAATCTT Taxa 5 ACTGACC Statistical model for changes in nucleotides TransitionsTransversions HKY Kimura 2 parameter model Jukes Cantor 1 parameter Likelihood that that change occurred 0 Much more computational intensive than parsimony Likelihood of a Tree Ltree Pr datatree lVIultiplv likelihood for each character position Recursive de nition of Likelihood Saves computational time Likelihood of a Tree Ltree Pr datatree lVIultiply likelihood for each character position Recursive de nition of Likelihood 20