Computational Biology Tools
These 10-page class notes were uploaded by Jacky Emmerich on Monday, September 7, 2015. The notes belong to BME 110 at University of California - Santa Cruz, taught by Staff in Fall. Since their upload, they have received 89 views.
Date Created: 09/07/15
Phylogeny (For Dummies)

Modified by Dietlind Gerloff for use in BME110/BIOL181, Winter 2009
© Wiley Publishing 2007. All Rights Reserved.

Learning Objectives
- Understand the most basic concepts of phylogeny
- Be able to compute simple phylogenetic trees
- Understand and remember the difference between orthologs and paralogs
- Understand what bootstrapping means in phylogeny

Outline
- Finding out what to do with phylogeny
- Gathering sequences to make a tree
- Preparing your multiple-sequence alignment
- Computing a tree
- Bootstrapping your tree to check its reliability
- Displaying your tree

Why Build a Phylogenetic Tree?
- Phylogenetic trees reconstruct the evolutionary history of your sequences
- They tell you who is closer to whom in the big tree of life
- Phylogenetic trees are based on sequence similarity rather than morphologic characters

Morphology Characters
[Figure: tree based on morphological characters]

Species Trees
- Use a clock with the appropriate gradation
- For the full tree of life, very few genes change slowly enough
- Ribosomal RNA is a favorite

Modern Tree of Life
- Molecular characters: ribosomal RNA
[Figure: tree of life based on ribosomal RNA, with the Eukarya, Eubacteria, and Archaea branching from the origin of life]

3 Ways to Use Your Gene Tree
- Finding the closest relative of your organism: usually done with a tree based on the ribosomal RNA
- Discovering the function of a gene: finding the orthologues of your gene
- Finding the origin of your gene: finding whether your gene comes from another species

Orthology and Paralogy
- Orthologous genes: separated by speciation. They often have the same function, but this is not a requirement.
- Paralogous genes: separated by duplications. They can have different functions.
- In the graph: A is paralogous with B; A1 is orthologous with A2; A1 is also paralogous with B2.
[Figure: gene tree in which a duplication separates A from B and a later speciation separates A1 from A2 and B1 from B2]

Working on the Right Data
- Garbage in, garbage out
- The quality of your tree depends on the quality of the data
- Your first task is to assemble a very accurate MSA

DNA or Proteins?
- Most phylogenetic methods work on protein and DNA sequences
- If possible, always compute the multiple-sequence alignment on the protein sequences:
  - Translate the sequences if the DNA is coding
  - Align the sequences
  - Thread the DNA sequences back onto the protein MSA
- If your DNA sequences are coding and have more than 70% identity, compute the tree on the DNA multiple-sequence alignment
- If your DNA sequences are coding and have less than 70% identity, compute the tree on the protein multiple-sequence alignment

Which Sequences?
- Orthologous sequences: use them if you want to produce a species tree, which shows how the considered species have diverged
- Paralogous sequences: include them if you want to produce a gene tree, which shows the evolution of a protein family and may help you speculate about the function of a protein of interest

Establishing Orthology
- Establishing orthology is very complicated
- It is common practice to establish orthology using the best reciprocal BLAST hit:
  - A is a gene of Genome X; B is a gene of Genome Y
  - BLAST gene A against Genome Y: the best hit is B
  - BLAST gene B against Genome X: the best hit is A
  - A is B's best friend and B is A's best friend
- Phylogeny purists dislike this method

Creating the Perfect Dataset
Problem: reason and solution
- Avoid sequence fragments: sequence fragments produce low-quality MSAs. Use the same fragments for all the sequences.
- Avoid xenologs: avoid genes resulting from a horizontal gene transfer (HGT).
- Avoid recombinant sequences: recombinant sequences have two ancestors. Note: "recombinant" here means hybrid or chimeric sequences; a cloned or expressed sequence is OK if the sequence is not altered.
- Avoid large complex families: avoid proteins that have many paralogs in each genome. It is hard to find orthologs in large families.
- Avoid multidomain proteins: work on one domain at a time.
- Try to make a small set: large datasets are difficult to align.
- Add an outgroup to your dataset: an outgroup is an organism whose last common ancestor with the dataset is older than the common ancestor of this dataset. For instance, chicken is an outgroup for man, dog, and horse.

Beware Horizontal Transfer
[Figure: tree of Bacteria, Eukarya, and Archaea illustrating horizontal transfer]

Building the Right MSA
- Your MSA should have as few gaps as possible (typical alignment editing step: consolidating gaps)
- Some variability, but not too much
- Some conservation, but not too much
[Figure: example protein multiple-sequence alignment with sequences from several species, including wheat and mouse]

Building the Right Tree
- There are two types of tree-reconstruction methods:
  - Distance-based methods
  - Statistical methods
- Statistical methods are the most accurate:
  - Maximum likelihood
  - Parsimony
  - Note: parsimony methods are intuitive (they favor the tree that needs the minimum number of substitutions) but are only deemed to be accurate for very closely related sequences
- Statistical methods take more time, so they are limited to small datasets

Distance-based Methods for Tree Reconstruction
- Distance-based methods are the most popular:
  - Neighbor Joining (NJ)
  - UPGMA
  - Note: most MSA guide trees are UPGMA trees. NEVER use those for actual phylogeny and/or publications.
- Distance-based methods involve 2 steps:
  - Measure the distances between pairs of sequences in the MSA
  - Transform the distance matrix into a tree
  - Note: some distance-based methods do not use MSAs but pairwise alignments. This is not optimal, but one can obtain decent trees that way if the proper statistics and computation are applied.
- The two most popular packages for making trees are:
  - ClustalW: very simple, not very sophisticated, best avoided for phylogeny (see above)
  - PHYLIP: very powerful, less easily obtained and learned
- TreeTop (Genebee server) is a web server that produces high-quality trees

Which Format?
- Trees are displayed in graphic formats
- Always keep a version of your tree in Newick format
  - Also called New Hampshire or nh
  - Note the parentheses in this format
- Display your nh tree with "Phylodendron", for example: use iubio.bio.indiana.edu/treeapp
[Figure: example tree shown in Newick format and as a graphic]

Reading Your Tree
- There's a lot of vocabulary in a tree
- Nodes correspond to common ancestors
- The root is the oldest ancestor
  - Often artificial
  - Only meaningful with a good outgroup
- Trees can be unrooted
- Branch lengths are only meaningful when the tree is scaled
  - Cladograms are often scaled
  - Phenograms are usually unscaled
  - Spotting the difference is easy: unscaled trees look very (too) regular
- A good tree figure contains an indication of distance (scale) and branching confidence (e.g. bootstrap values); sadly, we only rarely see these
[Figure: example tree with Human, Frog, and Mouse leaves, labeling the ROOT, NODE, BRANCH, BRANCH LENGTH, LEAVES (OTUs), and OUTGROUP]
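The two steps of a distance-based method named in these notes (measure the distances between pairs of sequences in the MSA, then transform the distance matrix into a tree) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `p_distance`, `distance_matrix`, and `neighbor_joining` are hypothetical, the distance is the plain uncorrected fraction of mismatches, the toy sequences are invented, and the output is an unrooted tree written in Newick format. For real analyses, an established package such as PHYLIP remains the sensible choice.

```python
from itertools import combinations


def p_distance(a, b):
    """Step 1 helper: fraction of mismatches over columns ungapped in both sequences."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        raise ValueError("sequences share no ungapped columns")
    return sum(x != y for x, y in cols) / len(cols)


def distance_matrix(msa):
    """msa maps name -> aligned sequence; returns {frozenset({n1, n2}): distance}."""
    return {frozenset(pair): p_distance(msa[pair[0]], msa[pair[1]])
            for pair in combinations(msa, 2)}


def neighbor_joining(names, dist):
    """Step 2: turn a distance matrix into an unrooted tree in Newick format."""
    if len(names) < 3:
        raise ValueError("need at least three sequences")
    d = dict(dist)          # working copy; new internal nodes are added to it
    nodes = list(names)     # current node labels (leaf names or Newick subtrees)

    def D(i, j):
        return 0.0 if i == j else d[frozenset((i, j))]

    while len(nodes) > 3:
        n = len(nodes)
        # r[i] is the sum of distances from node i to every other node
        r = {i: sum(D(i, k) for k in nodes) for i in nodes}
        # join the pair that minimizes the neighbor-joining criterion Q
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        li = 0.5 * D(i, j) + (r[i] - r[j]) / (2 * (n - 2))
        lj = D(i, j) - li
        new = f"({i}:{li:.4f},{j}:{lj:.4f})"
        # distances from the new internal node to every remaining node
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (D(i, k) + D(j, k) - D(i, j))
        nodes = [k for k in nodes if k not in (i, j)] + [new]

    # three nodes left: connect them to a single central node
    a, b, c = nodes
    la = 0.5 * (D(a, b) + D(a, c) - D(b, c))
    lb = 0.5 * (D(a, b) + D(b, c) - D(a, c))
    lc = 0.5 * (D(a, c) + D(b, c) - D(a, b))
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


if __name__ == "__main__":
    # Invented toy alignment, for illustration only
    toy_msa = {
        "human":   "ACGTACGTACGT",
        "mouse":   "ACGTACGAACGT",
        "frog":    "ACCTACGAAGGT",
        "chicken": "TCCTAGGAAGGA",
    }
    print(neighbor_joining(list(toy_msa), distance_matrix(toy_msa)))
```

The printed string uses the nested parentheses of the Newick format and can be pasted into a tree viewer. Neighbor joining was chosen for the sketch because the notes warn against using UPGMA guide trees for actual phylogeny; note that the method can produce negative branch lengths on noisy data, which this sketch does not correct.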