Computational Biology Tools

by: Jacky Emmerich

Computational Biology Tools BME 110


Dietlind Gerloff



This 12 page Class Notes was uploaded by Jacky Emmerich on Monday September 7, 2015. The Class Notes belongs to BME 110 at University of California - Santa Cruz taught by Dietlind Gerloff in Fall. Since its upload, it has received 63 views. For similar materials see /class/182230/bme-110-university-of-california-santa-cruz in Biomolecular Engineering at University of California - Santa Cruz.



Date Created: 09/07/15
Linear Sequence Analysis

What can you learn from a single protein sequence?
  Calculate its physical properties
    Molecular weight (MW), isoelectric point (pI), amino acid content, hydropathy (hydrophilic vs. hydrophobic regions)
    These calculations do not take post-translational modifications of the protein into account, so they are usually not 100% accurate
  Identify sequence motifs and families
    Signal sequences, transmembrane domains, coiled coils, post-translational modification sites, secondary structure (non-homologous)
    Domains, functional motifs (homologous)

3D Structure Analysis
  Visualization
    Domain structure, global fold, active sites, point mutations (SNPs), splice sites
  Evaluate structure quality
  Calculate physical properties
    Surface areas, distances, side-chain conformations, contact maps
  Structural alignment (i.e., similarity to other structures)
  Prediction
    Physical properties: binding affinity, pKa values, stability, specificity
    3D structure: homology modeling, fold recognition, de novo
    Advanced: protein design, docking of two proteins, active site modeling

Sequence Databases
  SwissProt (ExPASy): highly curated, updated less frequently
  TrEMBL (ExPASy): translated nucleotide sequences; automatic translation is fast but gives less info
  UniProt (EBI): Universal Protein Resource; combines SwissProt, TrEMBL, and PIR sequences

Sequence Analysis Sites
  For protein sequences and tools to analyze them, the two major centers are:
  ExPASy (Expert Protein Analysis System)
    Many tools: http://ca.expasy.org/tools/
    Databases: SwissProt, TrEMBL
  NCBI
    Entrez: Protein and Domains
  PIR (Protein Information Resource): folded into the UniProt consortium; no longer a major resource site

More Sequence Databases
  Non-redundant: NR (NCBI), UniRef (PIR/EBI)
  Reference: RefSeq (NCBI), reannotated by NCBI
  Domains/Families
    Pfam: protein families (Sanger Center, 4 mirror sites)
    SMART: Simple Modular Architecture Research Tool
    CDD: Conserved Domain Database (NCBI); combines the Pfam, SMART, and COGs databases
    InterPro: based on UniProt, at EMBL-EBI
    Many others

Structure Databases
  Experimental: PDB (Protein Data Bank)
  Families: SCOP, CATH, Dali database, Homstrad
  Models/Predictions: ModBase, SwissModel

NOTE: All of these databases, plus other kinds of databases, are described in the January Database issue of Nucleic Acids Research, which also links to them.

Protein Sequence Analysis Tools
  ExPASy Proteomics Tools
    Calculate physical properties
    Predict sequence motifs (what ExPASy calls "Topology"): localization, TM domains, signal sequences, post-translational modifications
    Search pattern and profile collections
  PredictProtein and MetaPP
    A metaserver providing access to many servers with one submission form

Secondary Structure Prediction
  Three good methods: Psipred, SAM-T02/T04/T06, PHD (PredictProtein)
  Compare a couple of methods
  Use the three-state predictions

SEQUENCE <-> STRUCTURE <-> FUNCTION
  Evolutionary selection operates on function
  Structure is more closely linked to function than sequence is, so structure tends to be more conserved than sequence
  Need to search farther in sequence space to find proteins with related structures and functions

Detecting Remote Similarities
  Remote similarities can more easily be detected by comparing protein sequences
  DNA sequences change faster than protein sequences (wobble position, redundant codons)
  The 4-letter DNA code vs. the 20-letter amino acid code means that matches by chance are more likely in DNA
  The protein code has more information in it

Multiple Sequence Alignment
  Multiple sequence alignment is probably the single most important bioinformatics tool
  Many applications require accurate MSAs:
    PSI-BLAST
    Family and domain classification
    Pattern identification
    Structure prediction (secondary structure, fold recognition)
    Phylogeny
    Full-genome alignments in browsers

Conservation Patterns
  Cys pairs: disulfide bonds
  His, Ser: catalytic sites
  Cys, His: metal binding sites
  Gly, Pro: ends of secondary (2°) structure elements, turns
  Lys, Arg, Asp, Glu: ligand binding
  Lys/Arg-Asp/Glu pairs: salt bridges
  Leu: coiled coils, leucine zippers
  Motifs, secondary structure, indels

PSI-BLAST Alignments
  The goal of BLAST is rapid detection, achieved by finding high-scoring local alignments
  It doesn't necessarily find the optimal global or local alignment
  Profiles throw away information for regions that are insertions relative to the query

Methods
  Dynamic programming: gives the optimal solution, but is prohibitively slow
  Progressive
    ClustalW [URL illegible]: most commonly used
    T-Coffee (http://igs-server.cnrs-mrs.fr/Tcoffee/): a little better, but slower
  Iterative: better than progressive methods, but slower
  Dialign
  HMMs

Progressive Alignment
  1. Calculate global pairwise alignments for all pairs (Needleman and Wunsch); N(N-1)/2 alignments are required
  2. Use the pairwise alignment scores to calculate a guide tree describing the distance between all pairs of sequences
  3. Align the sequences progressively
       Start with the two most closely related sequences
       Add in sequences in order of increasing distance
  ClustalW uses this method

ClustalW Example
  Input: 5 sequences detected by BLASTp using human SNAP25 as a query
  Default parameters (output order: input)
  [Figure: FASTA input for the five sequences; sequence text not recoverable]

Input Formats for Clustal programs
  FASTA format
  Download from NCBI, ExPASy, EBI, Pfam
  Sequence names should be:
    Unique
    15 characters or less
    Comprised of only A-Z, a-z, 0-9, and [symbols illegible]
    Do not use [symbols illegible] or spaces

ClustalW Output
  [Figure: ClustalW log. Recoverable content: CLUSTAL W (1.82) Multiple Sequence Alignments; sequence format is Pearson; five sequences read (sequence 1 is P13795; sequences 2 to 4 are 213, 195, and 235 aa); pairwise alignments reported for all 10 pairs (1:2 through 4:5); guide tree file created; multiple alignment built in 4 groups; alignment score 7423; CLUSTAL-Alignment file created]

ClustalW Guide Tree
  The guide tree shows the distances between sequences obtained from the initial pairwise alignments
  This is the order in which sequences were added into the MSA
  The guide tree is not a phylogenetic tree; it's just a rough estimate of similarity. However, a true phylogenetic tree can be generated after making an alignment

Progressive Alignment
  Greedy algorithm
    Breaks the problem up into smaller problems
    Finds the best solution to each small problem
    Combines the solutions to get an answer to the whole problem
  Not necessarily the global answer
    Doesn't use all information in solving subproblems
    Suboptimal answers for small problems may combine to give a better overall answer
  Gaps: once created, they stay as part of the alignment for the rest of the alignment iterations

ClustalW Alignment
  [Figure: ClustalW alignment output, headed "CLUSTAL W multiple sequence alignment"]

Interleaved Formats
  The most common output formats for MSAs are interleaved: MSF, ASN, BLAST query-anchored formats
  All sequences are stacked up and chopped into blocks of 60 residues
  Easy for humans to read, but difficult to edit
  Tools for converting formats are available on the web

Aligned FASTA (A2M) Format
  [Example: five aligned FASTA entries from the SNAP-25 family (names partly legible, e.g. SN25_RAT, SN25_HUMAN, SN25_TORMA, SN25_DROME); sequence text not recoverable]
  Uppercase and "-" characters are alignment columns
  There must be the same number of aligned characters in all sequences
  Insertions that are not part of the alignment are indicated with lowercase and "." characters
  These are not read [rest of sentence illegible]
  Benefits
    Easily machine readable
    Readable by most programs that read FASTA format, e.g. Jalview

Graphical (ClustalX or others)
  PostScript, PDF, HTML
  Looks pretty and is very visually informative
  Completely useless for further computational analysis
  DO NOT SAVE GRAPHICS AS YOUR ONLY OUTPUT

Jalview
  Java alignment editor: http://www.jalview.org
  Available as an online applet or as an application
  Makes nice pictures and allows interactive editing
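
Step 1 of the progressive alignment procedure in these notes (a global Needleman and Wunsch alignment for every pair of sequences, N(N-1)/2 in total) can be sketched in Python as follows. This is a minimal illustration, not ClustalW's implementation: the scoring values (match +1, mismatch -1, linear gap penalty -2), the function names, and the toy sequences are all assumptions made for the example, whereas ClustalW uses substitution matrices and separate gap opening and extension penalties. Picking the highest raw score as the "most closely related" pair is also a simplification, since ClustalW converts scores to distances before building its guide tree.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not ClustalW defaults.
    """
    # Row 0 of the DP table: the empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] against the empty prefix of b: i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairwise_scores(seqs):
    """Score every unordered pair of sequences: N(N-1)/2 alignments."""
    return {
        (x, y): nw_score(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }


def closest_pair(scores):
    """Pair with the highest score, i.e. where progressive alignment starts."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy sequences invented for illustration; they are not the SNAP25 hits.
    toy = {
        "seq1": "MAEDADMRNELEE",
        "seq2": "MAEDADMRNELEE",
        "seq3": "MAEDSDMRNQLEE",
        "seq4": "MPADPSEELEE",
        "seq5": "MDNLSPEEVQ",
    }
    scores = pairwise_scores(toy)
    print(len(scores), "pairwise alignments for", len(toy), "sequences")
    print("start the progressive alignment with:", closest_pair(scores))
```

For five input sequences this performs ten pairwise alignments, which matches the ten sequence pairs (1:2 through 4:5) listed in the ClustalW log above. A guide tree (step 2) would then be built from these scores, for example by a clustering method applied to score-derived distances.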

