Bioinformatics Methods BINF 630
These 12 pages of class notes were uploaded by Nathanael Schowalter on Monday, September 28, 2015. The notes belong to BINF 630 at George Mason University, taught by Iosif Vaisman in Fall.
Date Created: 09/28/15
Lecture 7: Phylogenetic Analysis

Additional reference: Molecular Evolution: A Phylogenetic Approach, Roderic D. M. Page and Edward C. Holmes.

Uses of Phylogenetic Analysis
i. Evolutionary trees
ii. Multiple sequence alignment

Evolutionary Problems
i. The fossil record suggests that modern man diverged from apes about 5-6 million years ago. Modern Homo sapiens emerged between 100,000 and 60,000 years ago.
ii. DNA and sequence alignment work by Pääbo supports this.
iii. Work based on mitochondrial DNA by Wilson et al. suggests that modern man emerged only 200,000 years ago, with the divergence into different races 50,000 years ago. Relevant properties of mitochondrial DNA:
   1. it is circular
   2. maternal inheritance
   3. 10X faster mutation rate than nuclear DNA

Algorithms
[Figure, from Page and Holmes, Molecular Evolution: A Phylogenetic Approach: tree-building methods grouped by type of data. Distances: UPGMA, neighbor joining, minimum evolution. Nucleotide sites: maximum parsimony, maximum likelihood.]

Preliminaries
A taxon (plural: taxa), or operational taxonomic unit, is an entity whose distance from other entities can be measured (e.g., a species, an amino acid sequence, a language).
Comparisons are made on measurements of, or assumptions concerning, rates of evolutionary change. This is complicated by back mutations, parallel mutations, and variations in mutation rate. We will only consider substitutions.

Amino Acid Sequences
i. For example, the amino acid substitution rate per site per year is $5.3 \times 10^{-9}$ for guinea pig but only $0.33 \times 10^{-9}$ for other organisms.
ii. The evolutionary time is the average time to produce one substitution per 100 amino acids.
   Example: there are 2 differences in a sequence of 100 amino acids when comparing calf and pea histone H4. Since plants and animals diverged 1 billion years ago, $T_u = 0.5$ billion years (2 substitutions per 100 sites in 1 billion years is 1 substitution per 100 sites every 0.5 billion years).
iii. Probability of substitution: there are several ways to calculate it. The best way is to use the PAM matrices.

Nucleotide Sequences
Nucleotide sequences differ from amino acid sequences because of the redundancy in the genetic code, i.e., several codons can code for a particular amino acid. Most substitutions in the 3rd position are synonymous. For example, UC- (any third base) is the RNA coding for serine; the corresponding DNA would be AG-.
Since evolution should depend on function, and function is conferred by the amino acid sequence, it has been suggested that the molecular clock should be based on the substitution rate in the third position of the codon. In fact, in the fibrinopeptides this is as high as the amino acid substitution rate.
In the definition of PAM matrices one assumes a discrete Markov chain, with the PAM matrix being the transition matrix for the Markov chain.

Markov Chains
Assume that we have a process that has discrete observable states $X_1, X_2, \ldots$ When we monitor this over time we get a sequence of the states occupied, $q_1, q_2, \ldots$, where each $q_i$ is any of $X_1, X_2, \ldots$ This sequence is a Markov chain. Note that while there can be an infinite number of states, the Markov chain has a countable number of elements.
Another property of a Markov process is that history does not matter. This means that the state assumed at time $t+1$ depends only on the state assumed at time $t$, not on any earlier state. This is called the Markov property.
Let $X = \{X_n, n \ge 0\}$ be a discrete-time random process with state space $S$ whose elements are $s_1, s_2, \ldots$ $X$ is a Markov chain if, for any $n \ge 0$, the probability that $X_{n+1}$ takes on any value $s_k \in S$ is conditional on the value of $X_n$ but does not depend on the values of $X_0, \ldots, X_{n-1}$.
The one-time-step transition probabilities are
$$p_{jk}(n) = \Pr(X_n = s_k \mid X_{n-1} = s_j), \quad j, k = 1, 2, \ldots; \quad n = 1, 2, \ldots$$
Since $X_0$ is a random variable, called the initial condition,
$$p_j(0) = \Pr(X_0 = s_j), \quad j = 1, 2, \ldots$$
Transition matrix: put the $p_{jk}$ into a matrix $P$. A sequence of amino acids can be thought of as a Markov chain.
Stationary Markov process: the probabilities $p_{jk}(n)$ do not depend on $n$, that is, they are constant. A related notion: an initial distribution $\pi$ is said to be stationary if $\pi P^t = \pi$.
Irreducible: every state can be reached from every other state.

Application of Markov processes to evolutionary models
i. The PAM matrix has its substitution probabilities determined from closely related amino acid sequences. It assumes that the substitutions have occurred through one application of the transition matrix (i.e., no multiple substitutions at a given site) and that evolutionary distance results from repeated application of the same PAM matrix.
ii. A better evolutionary model is needed (text, pp. 140-144). This requires the use of a continuous Markov process rather than a discrete Markov chain. This still has the Markov property.
A time-homogeneous Markov process for the stochastic function $X(t)$ consists of a set of states $Q = \{1, 2, \ldots, n\}$, a set of initial state distributions $\pi = (\pi_1, \ldots, \pi_n)$, and transition probability functions
$$P(t) = \begin{pmatrix} P_{11}(t) & \cdots & P_{1n}(t) \\ \vdots & & \vdots \\ P_{n1}(t) & \cdots & P_{nn}(t) \end{pmatrix}$$
We can apply this to nucleotide sequences. Let $Q = \{1, 2, 3, 4\}$ correspond to $\{A, C, G, T\}$:
$$P(t) = \begin{pmatrix} P_{11}(t) & \cdots & P_{14}(t) \\ \vdots & & \vdots \\ P_{41}(t) & \cdots & P_{44}(t) \end{pmatrix} = \begin{pmatrix} P(A \mid A, t) & P(C \mid A, t) & P(G \mid A, t) & P(T \mid A, t) \\ P(A \mid C, t) & P(C \mid C, t) & P(G \mid C, t) & P(T \mid C, t) \\ P(A \mid G, t) & P(C \mid G, t) & P(G \mid G, t) & P(T \mid G, t) \\ P(A \mid T, t) & P(C \mid T, t) & P(G \mid T, t) & P(T \mid T, t) \end{pmatrix}$$

Jukes-Cantor Model
[Figure: the four nucleotides A, G, C, T, with every substitution, whether a transition or a transversion, occurring at the same rate $\alpha$.]

Rates of Nucleic Acid Change
The Jukes-Cantor model assumes that $u_1 = u_2 = u_3 = u_4 = \alpha$, yielding the rate matrix (rows and columns ordered A, C, G, T)
$$\begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix}$$
Then $P_1 = P_2 = P_3 = P_4 = \alpha$. This is used in the maximum likelihood calculation.

HKY Model
[Figure: purines (A, G) and pyrimidines (C, T); transitions occur at rate $\alpha$ and transversions at rate $\beta$.]
Transitions > transversions: $\alpha > \beta$.

Definitions
Taxa: entities whose distance from other entities can be measured.
A directed graph $G = (V, E)$ consists of a set $V$ of nodes (or vertices) and a set $E \subseteq V \times V$ of directed edges. Then $(i, j) \in E$ means that there is a directed edge from $i$ to $j$.
A graph is undirected if the edge relation is symmetric, that is, $(i, j) \in E$ iff $(j, i) \in E$.
A directed graph is connected if there is a directed path between any two nodes.
A directed graph is acyclic if it does not contain a cycle (e.g., $(i, j)$, $(j, k)$, and $(k, i)$ all belonging to $E$).
A tree is an undirected, connected, acyclic graph.
A rooted tree has a starting node called the root.
The parent of a node is the node immediately before it on the path from the root. A child of a node is a node that immediately follows it.
An ancestor of a node is any node that comes before it on the path from the root.
A leaf, or external node, is a node that has no children. Non-leaf nodes are called internal nodes.
The depth of a tree is one less than the maximal number of nodes on a path from the root to a leaf.
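
The tree definitions above (leaf, ancestor, depth) can be made concrete with a minimal Python sketch. It assumes a rooted tree stored as a dictionary mapping each node to its list of children. The node names and the helper functions `leaves`, `depth`, and `ancestors` are hypothetical, chosen only for illustration, and do not come from the lecture.

```python
# Hypothetical rooted tree: each node maps to the list of its children.
# The taxon names are illustrative only and are not taken from the lecture.
children = {
    "root": ["n1", "gorilla"],
    "n1": ["human", "chimp"],
    "human": [],
    "chimp": [],
    "gorilla": [],
}


def leaves(tree, node):
    """Return the leaves (external nodes) of the subtree rooted at node."""
    if not tree[node]:
        return [node]
    found = []
    for child in tree[node]:
        found.extend(leaves(tree, child))
    return found


def depth(tree, node):
    """One less than the maximal number of nodes on a path from node to a leaf."""
    if not tree[node]:
        return 0
    return 1 + max(depth(tree, child) for child in tree[node])


def ancestors(tree, node):
    """Return the ancestors of node, nearest first, by inverting the child lists."""
    parent = {c: p for p, kids in tree.items() for c in kids}
    path = []
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


print(leaves(children, "root"))      # the external nodes
print(depth(children, "root"))       # longest root-to-leaf path has 3 nodes
print(ancestors(children, "human"))  # internal nodes on the path to the root
```

In this example the longest path from the root to a leaf (root, n1, human) contains three nodes, so the depth is 2, matching the "one less than the maximal number of nodes" definition.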