# Algorithms for Molecular Biology CSCI 4314

These 28-page class notes were uploaded by Allie West II on Thursday, October 29, 2015. They belong to CSCI 4314 (Computer Science) at the University of Colorado at Boulder, taught by Debra Goldberg in Fall.

Date Created: 10/29/15

## Network models of evolution

Todd A. Gibson, Wednesday, April 19, 2006

### Gene Duplication

[Figure: (a) unequal crossing over; (b) transcription and RNA splicing. The remaining figure labels and slide text are illegible in the source.]

- Duplicate genes can be produced via unequal crossing over and retropositioning.

### Notes

Mutations frequently arise through copying errors during DNA replication. Neutral mutations are nucleotide changes that do not alter the amino acid translated from the three-nucleotide codon. For example, isoleucine is translated from any of the following nucleotide triples: AUU, AUC, and AUA.

Calibrating a molecular clock is problematic because neutral mutation rates change over time. Also, different species can have different rates due to differing reproductive rates. Likewise, calibrating a molecular clock to fossil data is an inexact science.

### Genetic Evolution

- **Neutral theory of evolution [6]:** The large majority of mutations which occur to a gene do not affect the fitness of the organism. Genetic drift then occurs as these neutral mutations drift through the population.
- **Molecular clock hypothesis:** Using the principles of neutral evolution to measure time by counting the genetic differences between two homologous genes.
- **Positive selection:** A particular allele conveys an advantage to an individual and therefore becomes more frequent in the population.
- Duplicated genes are thought to be under relaxed selection, which allows non-neutral mutations to drift into the population.

### Preferential Attachment

- Watts and Strogatz [12] observed properties of small-world networks: short path lengths and large clustering coefficients.
  - Assumed a fixed number of vertices.
  - Assumed a uniform probability of wiring/rewiring edges.
- Barabási and Albert [2] presented a model for growing or evolving networks which exhibit scale-free properties observed in real networks, including
biological networks.
  - Networks grow with the addition of new vertices.
  - New vertices attach preferentially to existing vertices that are already well-attached.

### Notes

Although the preferential attachment model is not biologically faithful, it is faithful in other domains such as author citations and Web pages.

Another consideration of observed protein-protein interaction networks is that the tail decays faster than a power-law. There is an upper limit to the number of interactions a protein can participate in due to steric clashes in protein complexes.

As the table below shows, the power-law exponent for biological networks tends to be less than two.

[Table: power-law exponents of biological networks. The table contents are illegible in the source.]

### Biologically faithful models of evolution

- Observed protein-protein interaction (PPI) networks exhibit scale-free behavior.
- Preferential attachment does not reflect the process of gene duplication.
- Preferential attachment lacks the small-world properties present in protein-protein interaction networks: preferential attachment produces a small clustering coefficient [8].
- Vertex duplication with complete duplication of edges produces power-law exponents > 2. Partial duplication is required to achieve exponents < 2 [3].
- The Duplication and Divergence model [9] introduced partial duplication and random edge addition.
- What is the starting graph to evolve from?

### Notes

Compressing a new vertex's edge addition and deletion events into a single timestep can still faithfully reproduce the topology of a PPI network, but it frustrates attempts to measure dynamical properties of the network as it grows.

Whole-genome duplications are thought to trigger important evolutionary events such as the development of vertebrates [10].

### Limitations of models

- Statistical physics models present the addition and deletion of all interactions in a single timestep.
- No consideration for homodimers.
- No consideration for genome duplication events.

### Notes

**Diploid:** Having two copies of each
chromosome. Humans are diploid. **Polyploid:** Having more than two copies of each chromosome. Polyploidy results from one or more ancient genome duplication events. Many plants are polyploid and are therefore popular as model organisms for studying genome duplication.

### Modeling gene and genome duplications

Maere et al., "Modeling gene and genome duplications in eukaryotes" (2005) [7]

- Gene duplication is ubiquitous.
- Select an organism with well-defined genome duplication events.
- Develop and execute a method for measuring the age of genes on the selected organism.
- Build a mathematical model characterizing gene duplication, genome duplication, and gene loss.
- Fit the model to the observed age distribution.
- Analyze distributions by functional group.

### Notes

The distribution of synonymous substitutions ($K_S$) serves as a surrogate for the age of the genes within *Arabidopsis thaliana*. The goal is to model the gene and genome duplications that occurred for *Arabidopsis thaliana*. Therefore, the $K_S$ values between members of paralogous families were used to infer the number of duplications and their age.

Individual paralogous protein families were identified using all-against-all pairwise protein alignments and clustering sequences by similarity.

To build the phylogenetic tree, an iterative clustering method was used. Starting with each individual protein as a cluster, the two clusters with the lowest average $K_S$ were combined into a single cluster. The resulting branches in the tree represent the $n - 1$ gene duplications.

### Building the Arabidopsis thaliana $K_S$ distribution

Maere et al., "Modeling gene and genome duplications in eukaryotes" (2005) [7]

Calculated the age distribution for the *Arabidopsis thaliana* paranome:

- Identified all paralogous protein families in *Arabidopsis thaliana*.
- For each paralogous protein family with $n$ proteins, $n(n-1)/2$ pairwise estimates of synonymous substitutions $K_S$ were calculated.
- A phylogenetic tree was built, identifying $n - 1$ protein duplication events.
- From the tree, a distribution of $K_S$ values was built.

[Figure: $K_S$ distribution with a "3R" annotation. x-axis: synonymous substitutions per synonymous site ($K_S$); y-axis: number of duplicates (label partly illegible).]

### Notes

$x$ is age in $K_S$ increments; $t$ is the timestep. The subscript on $D_0$, etc., indicates the event to which the addition or loss of gene duplicates belongs.

In genome duplication parlance, the number of genome duplication events which have occurred for an organism are termed 1R, 2R, etc., indicating the number of Rounds of genome duplication that have occurred. For *Arabidopsis thaliana* there is compelling evidence for the 3R hypothesis, that three rounds of genome duplication occurred [7]. In the context of the mathematical model, 0R represents the regular gene duplications that occur throughout evolution; 1R, 2R, and 3R refer to the first, second, and third genome duplication events, respectively.

The $\alpha$ and $\nu$ parameters were optimized to produce curves close to the observed distributions.

### Mathematical Model of Arabidopsis thaliana Evolution

Goal: to model the gene and genome duplications which result in the observed $K_S$ distribution.

- Continuous gene duplication:

  $$D_0(1, t) = \nu \sum_{x} D_{tot}(x, t-1)$$

- Genome duplication event $i$ occurring at timestep $t$:

  $$D_i(1, t) = \sum_{x} D_{tot}(x, t-1), \qquad t = t_{1R},\ t_{2R},\ t_{3R}$$

- Continuous loss of duplicates:

  $$D_i(x, t) = D_i(x-1, t-1) \left(\frac{x-1}{x}\right)^{\alpha_i}, \qquad x > 1,\ i \in \{0, 1, 2, 3\}$$

- Total duplicates of age $x$ at time $t$:

  $$D_{tot}(x, t) = \sum_{i} D_i(x, t)$$

- Convert the age distribution of genes into a $K_S$ distribution via Poisson smoothing. [Equation illegible in the source.]

### Notes

In plants, having backup (i.e., secondary metabolism) is helpful in surviving attacks such as an insect eating leaves.

### Results of mathematical model

[Figure: whole paranome; number of retained duplicates versus $K_S$, with the genome duplication events marked. Remaining labels are illegible in the source.]

- Three rounds of genome duplication produce the best fit.
- Genes which are required in stoichiometric quantities (e.g., transcription factors) show a high decay rate from small-scale duplication events and a low decay rate from genome duplication events.
- Other genes (e.g., secondary metabolism) show a low decay rate in both small-scale and genome duplication events.

### Notes

Although the *Arabidopsis thaliana* genome has been completely
sequenced, few protein-protein interactions have been identified.

### Research Directions

The mathematical model provides a tool for exploring the evolution of an actual biological network:

- Add/delete vertices according to calculated rates.
- Use duplication-divergence to model edges.
- Model genetic mutations to approximate observed $K_S$ values.
- Fit the mathematical model to yeast, which has well-defined interactions.
- Attach a network to the yeast model.
- Run the model in reverse to reveal the ancestral network.
- Use a genetic algorithm model to simulate network evolution in a population.

### Notes

In the PDF plot, the red line is the linear least squares fit. The slope of this fit is $m = -0.9548$. The green line is a linear least squares fit of the first five points, and its slope is $m = -1.996$.

In the Complementary CDF plot, the red line is the linear least squares fit and has a slope of $m = -0.9$. However, because the CDF is complementary, the exponent is not the slope itself but $1 - m = 1.9$.

Of the methods listed, the Complementary CDF is preferred. The basic linear least squares fit yields too shallow a line due to the fat tail. The first-5-points linear fit is better but suffers from high variance. Like the linear fit, logarithmic binning also shallows the slope due to the fat tail. Maximum Likelihood also produces good results but is not as straightforward to compute as the Complementary CDF.

### Determining fit of data to power-law

[Figure: two log-log panels, "PDF" and "Complementary CDF", each titled with a true exponent of 1.9. Axes: number of vertices versus number of edges (PDF) and $P(X > x)$ versus number of edges (Complementary CDF).]

- Probability Density Function, $P(k) \propto k^{-\beta}$
  - Linear least squares fit
  - First 5 points linear fit
  - Logarithmic binning
  - Maximum Likelihood [5]
- Complementary Cumulative Distribution Function, $P(X > x)$
  - Linear least squares fit

### Notes

The first equation shown on the facing page is the lognormal distribution. It is identical to the normal distribution except that the log of the random variable is taken. Note also the $1/x$ term, which results from the derivative required
for the change of variable.

The second equation is the logarithm of the first equation, consistent with taking the logarithm of a power-law distribution. The notable feature is that a high standard deviation causes the third term to drop to nearly zero for a large range of $x$. The remaining first two terms appear linear on a log-log plot, leading to the mischaracterization of the plot as derived from a power-law.

### Data Bias: Power-law versus lognormal

- Coulomb et al. [4] show that essentiality studies are flawed because the data is biased.
- Stumpf et al. [11] and Arita [1] show that a lognormal can often fit biological data better than a scale-free distribution.

$$f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$$

$$\ln f(x) = -\ln x - \ln\left(\sigma \sqrt{2\pi}\right) - \frac{(\ln x - \mu)^2}{2\sigma^2}$$

### References

1. Masanori Arita. Scale-freeness and biological networks. J Biochem (Tokyo), 138(1):1-4, Jul 2005.
2. A. L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509-512, Oct 1999.
3. Fan Chung, Linyuan Lu, T. Gregory Dewey, and David J. Galas. Duplication models for biological networks. J Comput Biol, 10(5):677-687, 2003.
4. Stéphane Coulomb, Michel Bauer, Denis Bernard, and Marie-Claude Marsolier-Kergoat. Gene essentiality and the topology of protein interaction networks. Proc Biol Sci, 272(1573):1721-1725, Aug 2005.
5. M. L. Goldstein, S. A. Morris, and G. G. Yen. Problems with fitting to the power-law distribution. The European Physical Journal B, 41(2):255-258, September 2004.
6. M. Kimura. Evolutionary rate at the molecular level. Nature, 217(129):624-626, Feb 1968.
7. Steven Maere, Stefanie De Bodt, Jeroen Raes, Tineke Casneuf, Marc Van Montagu, Martin Kuiper, and Yves Van de Peer. Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci U S A, 102(15):5454-5459, Apr 2005.
8. E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabási. Hierarchical organization of modularity in metabolic networks. Science, 297(5586):1551-1555, 2002.
9. Ricard V. Solé, Romualdo Pastor-Satorras, Eric Smith, and Thomas B. Kepler. A model of large-scale proteome evolution. Advances in Complex Systems, 5:43, 2002.
10. Jürg Spring. Genome duplication strikes back. Nat Genet, 31(2):128-129, Jun 2002.
11. Michael P. H. Stumpf, Piers J. Ingram, Ian Nouvel, and Carsten Wiuf. Statistical model selection methods applied to biological networks. In Corrado Priami, Pedro Pablo Gonzalez, and Andrea Omicini, editors, Transactions on Computational Systems Biology, volume 3737 of Lecture Notes in Computer Science, pages 65-77. Springer, 2005.
12. D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks. Nature, 393:440-442, 1998.
13. Jianzhi Zhang. Evolution by gene duplication: an update. Trends in Ecology & Evolution, 18(6):292-298, June 2003.
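
The Duplication and Divergence model [9] described in the notes (partial duplication plus random edge addition) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the published model: the parameter names `delta` and `alpha`, the `alpha / n` scaling for new edges, and the two-vertex starting graph are all choices made here for illustration, and the notes themselves leave the starting graph as an open question.

```python
import random


def duplication_divergence(n_final, delta=0.5, alpha=0.05, seed=0):
    """Grow an undirected graph by vertex duplication with divergence.

    delta: probability that each edge inherited from the parent is dropped
           (partial duplication). Hypothetical parameter name.
    alpha: controls random edge addition; each existing vertex is linked to
           the new vertex with probability alpha / n. Assumed scaling.
    Returns an adjacency dict {vertex: set_of_neighbours}.
    """
    rng = random.Random(seed)
    # Assumed starting graph: two vertices joined by one edge.
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_final:
        n = len(adj)
        parent = rng.randrange(n)  # vertex (gene) chosen for duplication
        new = n                    # label of the duplicate
        # Partial duplication: keep each parental edge with probability 1 - delta.
        kept = {v for v in adj[parent] if rng.random() >= delta}
        # Random edge addition to any existing vertex.
        added = {v for v in range(n) if v not in kept and rng.random() < alpha / n}
        adj[new] = kept | added
        for v in adj[new]:
            adj[v].add(new)
    return adj


if __name__ == "__main__":
    graph = duplication_divergence(500, delta=0.6, alpha=0.1, seed=42)
    degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
    print("largest degrees:", degrees[:5])
```

Note that each duplication here applies edge loss and edge gain in a single timestep, which is exactly the compression the notes flag as a limitation when studying the dynamics of a growing network.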

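
The Complementary CDF method preferred in the notes can be sketched as follows: fit a line to the empirical complementary CDF on log-log axes and recover the exponent as one minus the slope, as in the notes' $1 - m = 1.9$ example. The function names and the synthetic Pareto sampler are assumptions made for this illustration, and the code uses the fraction of samples greater than or equal to $x$ (rather than strictly greater) so that the logarithm is always defined.

```python
import math
import random


def pareto_samples(gamma, n, seed=0):
    """Draw n values >= 1 from a continuous power law p(x) ~ x^(-gamma), gamma > 1."""
    rng = random.Random(seed)
    # Inverse-transform sampling: P(X > x) = x^(-(gamma - 1)) for x >= 1.
    return [(1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


def ccdf_exponent(samples):
    """Estimate the power-law exponent from the log-log slope m of the
    empirical complementary CDF, using exponent = 1 - m.

    All samples must be positive.
    """
    xs = sorted(samples)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        if i == 0 or x != xs[i - 1]:
            # Fraction of samples >= x; never zero, so the log is defined.
            points.append((math.log10(x), math.log10((n - i) / n)))
    # Ordinary least squares slope of log CCDF against log x.
    mean_x = sum(px for px, _ in points) / len(points)
    mean_y = sum(py for _, py in points) / len(points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    slope = sxy / sxx
    return 1.0 - slope


if __name__ == "__main__":
    data = pareto_samples(1.9, 20000, seed=1)
    print("estimated exponent:", round(ccdf_exponent(data), 3))
```

For integer-valued data such as vertex degrees, the relation between the CCDF slope and the exponent holds only approximately, so this should be read as a demonstration of the idea rather than a calibrated estimator.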