### Create a StudySoup account

#### Be part of our community, it's free to join!

Already have a StudySoup account? Login here

# Biological SequnceGenome Anly BINF 730

Mason

GPA 3.64

### View Full Document

## 35

## 0

## Popular in Course

## Popular in BioInformatics

This 20 page Class Notes was uploaded by Nathanael Schowalter on Monday September 28, 2015. The Class Notes belongs to BINF 730 at George Mason University taught by Staff in Fall. Since its upload, it has received 35 views. For similar materials see /class/215257/binf-730-george-mason-university in BioInformatics at George Mason University.

## Popular in BioInformatics

## Reviews for Biological SequnceGenome Anly

### What is Karma?

#### Karma is the currency of StudySoup.

#### You can buy or earn more Karma at anytime and redeem it for class notes, study guides, flashcards, and more!

Date Created: 09/28/15

Lecture 6 Phylogenetic Analysis Algorithms Types of Data Distances Nucleotide sites UPGMA Neighbor Joining urqluo w39 uualstq Maximum Minimum Parsimony EVOImlOH Maximum Likelihood 0 90mm gammaml 101121113 X11113 urgd From Page and Holmes ilfolecular Evolution A Phylogenetic Approach Phylogeny Distance Methods Parsimony oMaximum Likelihood Look at changes in each column of alignment Metric to estimate Population Drift Computationally more expensive Neighbor Joining Combines computational speed with uniqueness of result 0 Clustering method hence has no optimality criteria 0 Often used in conjunction with Minimum Evolution to estimate the minimum evolution tree Neighbor Joining and Minimum Evolution Compute the Neighbor Joining Tree and see if any local rearrangement produces a shorter tree Not guaranteed to give the minimum evolution tree Neighbor Joining Algorithm Related to cluster analysis but removes the assumption of ultrametric data Does not assume data comes close to fitting an additive tree need to use an appropriate model of evolution Keeps track of nodes on tree Considers onlv closest pairs and not all possible pairs in each step of star decomposition Nelghbor J01111ng tie Since B and D hm cumulated nutatio t a big ate rm A wint on en39on 1gt UPGM A method Luce 39 Nleghbor J01111ng The raw data of the tree are represented b tollowmg dlstance matll A B C DE 7 10 9 l 5 4 7 6 8 7 65 1898 We have in total 6 OTUs N6 Neighbor Joining Step 1 We calculate the net divergence r i for each OTU from all other OTUs HA 5476830 rrB 42 rrC 32 Neighbor Joining Step 2 Now we calculate a new distance matrix using for each pair of OTUs the formula MIj dij ri r0s N2 or in the case ofthe pair AB MABdAB rA rBs N2 13 A B C D E B 13 C ll5 ll5 D 10 10 105 E 10 10 lO5 13 F lO5 lO5 ll ll5 ll5 Now we start with a star tree A l F B Neighbor Joining Step 3 Now we choose as neighbors those two OTUs for which Mj is the smallest These are A and B and D and E Let s take A and B as neighbors and we form a new node called U Now we calculate the branch length from the internal node U to the external OTUs A and B SAU dAB 2 rA rB 2N 2 1 SBU dAB SAU 4 Neighbor Joining 0 Step 4 Now we define new distances from U to each other terminal node dCU dAC dBC dAB 2 3 dDU dAD dBD dAB 2 6 dEU dAE dBE dAB 2 5 dFU dAF dBF dAB 2 7 and we create a new matrix Neighbor Joining Quartet Puzzling Quartet pulmg is a le omputationally expensive method than maximum likelihood to determine the phylogentic tree Procedure 1 Compute the 1 maximum likelihood trees for all possible quartets 2 Quartet Puzzling step Combine the quartet trees into a n taxon tree that tries to conform to all the neighbor relations of all the quartet trees 3 Repeat steps 1 and 2 many times and use the majority consensus tree Quartet Puzzling Given the original tree topology for 5 taxa Two pos a Quartet Puzzling All 339 Possible quartets N1 abch N1 aech N1 ebc11 391 N1 aebdl b N1 aebc1 Quartet Puzzling Quartet puzzling step procedure 1 Take one of the quartet with the neighbor relation Nab39cd 2 Add a penalty of l to every edge such that the addition of the newt a e will yield the incorrect topology 3 Repeat for all the neighbor relations 4 The branch with the lowest weight is the branch where the taxa e show be added Quartet Puzzling Quartet with neighbor Addin0 taxa e between relatlon Nabcd b c and d e Yieldgt rung topolug Quartet Puzzling C N1 aeClb gt N1 216 b l Quartet Puzzling Chose one quartet tree Pick the taxa to add Use all neighbor relations other than the one deciding the quartet tree used to find weights 011 branches Add the taxa to the branch with the lowest penalty Minimum Evolution Given an unrooted metric tree for 11 sequences there are 2113 branches each with branch length ei The sum of these branch lengths is the length L of the tree The minimum evolution tree is the tree which minimizes L Minimum Evolution similar to parsimony But length comes from pairwise distances between the sequences not from t of nucleotide sites Use linear programming or least squares to nd optimal solution Minimum Evolution Phylogeny Character State Methods Parsim0ny oMaxmum Likelihood Look at changes in each column of alignment Metric to estimate Population Drift Computationally more expensive PHYLOGENY Character States Taxa 1 Taxa 2 Taxa 3 Taxa 4 Taxa 5 ATTGCCATT ATG GCATT ATCTATCTT ATCAAATCTT ACTG ACC Informative characters columns Look at all possible trees For each column calculate cost Minimum score best tree Maximum Parsimony Taxa 1 Taxa 2 Taxa 3 Taxa 4 Taxa 5 ATTGCCATT ATGGCATT ATCTATCTT ATCAAATCTT ACTGACC Informative characters h nimum number of changes Multiple substitutions homoplasy Maximum Parsimony Smallest number of evolutionary changes 0 First used 011 protein data Eck amp Dayhoff 1966 0 Applied to Nucleotide data Fitch 1977 0 Brute force search of tree space 00V 39 39 E 39lOV LLO lWVOlV JLO LVJ 39OiV LiV 39 OS39SLV llVOOE 39JJV g z0g00 9103s A uoulgsmd Iaplzmqo A39q 1910121qu 991i uouusmd 30 1803 00v s 10v 110 iWVOiV 110 m 01v 11v os mv iiVOOE 11v g00 9103s A uoulgsmd g uumloD Iaplzmqo A39q IOJOIZJIZIID 991i uouusmd 30 1803 Tree Space Search Tree Space 0 Exhaustive Search Brute Force 0 Branch and Bound Ef cient Heuristic Methods Hill Climbing 0 Genetic Algorithms GAML Maximum Likelihood Goal Construct a phylogenetic tree from DNA sequences whose likelihood is a maximum Felsenstein 1981 Procedure 7 Start with a given topology and use the maximum likelihood method to optimize branch lengths 7 Make local modi cations to the topology and reoptimize the branch lengths 7 New taxa are added one by one optimizing branch lengths and topologies each time 7 Assumes an evolutionary process that is a reversible Markov process 7 V61 y computationally expensive to use Likelihood of a Tree We want to find Ltree Prdatatree Given the data a1CT a2CG and a3AT Consider the tree T CG We can calculate the likelihood of this tree if we fill in the internal nodes Likelihood of a Tree Since this is a Markov proce e can consider each site separately from the other 1ch reduces the complexity of the calculation Example C ldZ 1 A T r I A T CT CG T G tree 1 tree3 Prdataltree l Prdataltree 2 Prdataltree 3 Likelihood of a site speci c tree We can calculate from the transition matrix and the distances on each branch the probability of each change The product of these multiplied by the probability of the original base gives the likelihood of a site specific tree Since there are two unknown nodes the double sum of all possible values for each ACTG gives the likelihood for the original tree Maximum Likelihood Statistical model for changes in nucleotides Likelihood that that change occurred 0 Much more computational intensive than parsimony Hypothesis Testing TransitionsTransversions HKY Kimura 2 parameter model Jukes Cantor 1 parameter Maximum Likelihood Taxa 1 ATTGCCATT Taxa 2 ATGGCATT Taxa 3 ATCTATCTT Taxa 4 ATCAAATCTT Taxa 5 ACTGACC Statistical model for changes in nucleotides TransitionsTransversions HKY Kimura 2 parameter model Jukes Cantor 1 parameter Likelihood that that change occurred 0 Much more computational intensive than parsimony Likelihood of a Tree Ltree Pr datatree lVIultiplv likelihood for each character position Recursive de nition of Likelihood Saves computational time Likelihood of a Tree Ltree Pr datatree lVIultiply likelihood for each character position Recursive de nition of Likelihood 20

### BOOM! Enjoy Your Free Notes!

We've added these Notes to your profile, click here to view them now.

### You're already Subscribed!

Looks like you've already subscribed to StudySoup, you won't need to purchase another subscription to get this material. To access this material simply click 'View Full Document'

## Why people love StudySoup

#### "There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

#### "Selling my MCAT study guides and notes has been a great source of side revenue while I'm in school. Some months I'm making over $500! Plus, it makes me happy knowing that I'm helping future med students with their MCAT."

#### "There's no way I would have passed my Organic Chemistry class this semester without the notes and study guides I got from StudySoup."

#### "It's a great way for students to improve their educational experience and it seemed like a product that everybody wants, so all the people participating are winning."

### Refund Policy

#### STUDYSOUP CANCELLATION POLICY

All subscriptions to StudySoup are paid in full at the time of subscribing. To change your credit card information or to cancel your subscription, go to "Edit Settings". All credit card information will be available there. If you should decide to cancel your subscription, it will continue to be valid until the next payment period, as all payments for the current period were made in advance. For special circumstances, please email support@studysoup.com

#### STUDYSOUP REFUND POLICY

StudySoup has more than 1 million course-specific study resources to help students study smarter. If you’re having trouble finding what you’re looking for, our customer support team can help you find what you need! Feel free to contact them here: support@studysoup.com

Recurring Subscriptions: If you have canceled your recurring subscription on the day of renewal and have not downloaded any documents, you may request a refund by submitting an email to support@studysoup.com

Satisfaction Guarantee: If you’re not satisfied with your subscription, you can contact us for further help. Contact must be made within 3 business days of your subscription purchase and your refund request will be subject for review.

Please Note: Refunds can never be provided more than 30 days after the initial purchase date regardless of your activity on the site.