by: Kyler Ondricka

Bioinformatics BIOL 7023

This 0 page Class Notes was uploaded by Kyler Ondricka on Monday November 2, 2015. The Class Notes belongs to BIOL 7023 at Georgia Institute of Technology - Main Campus taught by Staff in Fall. Since its upload, it has received 11 views. For similar materials see /class/233999/biol-7023-georgia-institute-of-technology-main-campus in Biology at Georgia Institute of Technology - Main Campus.



Date Created: 11/02/15
A Model of Evolutionary Change in Proteins
M. O. Dayhoff, R. M. Schwartz, and B. C. Orcutt
Atlas of Protein Sequence and Structure, 1978

In the eight years since we last examined the amino acid exchanges seen in closely related proteins [1], the information has doubled in quantity and comes from a much wider variety of protein types. The matrices derived from these data, which describe the amino acid replacement probabilities between two sequences at various evolutionary distances, are more accurate, and the scoring matrix derived from them is more sensitive in detecting distant relationships than the one that we previously derived [3]. The method used in this chapter is essentially the same as that described in the Atlas, Volume 3 [4] and Volume 5 [1].

Accepted Point Mutations

An accepted point mutation in a protein is a replacement of one amino acid by another, accepted by natural selection. It is the result of two distinct processes: the first is the occurrence of a mutation in the portion of the gene template producing one amino acid of a protein; the second is the acceptance of the mutation by the species as the new predominant form. To be accepted, the new amino acid usually must function in a way similar to the old one: chemical and physical similarities are found between the amino acids that are observed to interchange frequently.

Any complete discussion of the observed behavior of amino acids in the evolutionary process must consider the frequency of change of each amino acid to each other one and the propensity of each to remain unchanged. There are 20 × 20 = 400 possible comparisons. To collect a useful amount of information on these, a great many observations are necessary. The body of data used in this study includes 1,572 changes in 71 groups of closely related proteins appearing in the Atlas volumes through Supplement 2.

The mutation data were accumulated from the phylogenetic trees and from a few pairs of related sequences. The sequences of all the nodal common ancestors in each tree are routinely generated. Consider, for example, the much simplified artificial phylogenetic tree of Figure 78. The matrix of accepted point mutations calculated from this tree is shown in Figure 79. We have assumed that the likelihood of amino acid X replacing Y is the same as that of Y replacing X, and hence 1 is entered in box YX as well as in box XY. This assumption is reasonable because this likelihood should depend on the product of the frequencies of occurrence of the two amino acids and on their chemical and physical similarity. As a consequence of this assumption, no change in amino acid frequencies over evolutionary distance will be detected.

By comparing observed sequences with inferred ancestral sequences, rather than with each other, a sharper picture of the acceptable point mutations is obtained. In the first amino acid position of the illustration in Figure 78, A changes to D and A changes to C, but C and D do not directly interchange. If we had compared the observed sequences directly, we would have inferred the change of C to D also. In practice, some of the positions in the nodal sequences are blank (ambiguous). For these we have treated the changes statistically, distributing them among all observed alternatives.

Figure 78. Simplified phylogenetic tree. Four "observed" proteins are shown at the top. Inferred ancestors are shown at the nodes. Amino acid exchanges are indicated along the branches.

Figure 79. Matrix of accepted point mutations derived from the tree of Figure 78.

The total numbers of accepted point mutations observed between closely related sequences from 34 superfamilies, grouped into 71 evolutionary trees, are shown in Figure 80. In order to minimize the occurrence of changes caused by successive accepted mutations at one site, the sequences within a tree were less than 15% different from one another, and ancestral sequences were even closer. Of the 190 possible exchanges shown in Figure 80, 35 were never observed. These usually involved the amino acids that occur
infrequently and are not highly mutable, and exchanges where more than one nucleotide of the codon must change. Of the 1,572 exchanges, the largest number, 83, was observed between Asp and Glu, two chemically very similar amino acids with codons differing by one nucleotide. About 20% of the interchanges, far more than one would expect for such similar sequences, involved amino acids whose codons differed by more than one nucleotide. Presumably, in any one tree, changes at some of the amino acid positions are rejected by selection and multiple changes at the mutable sites are favored.

Many of the changes expected from the mutations of one nucleotide in a codon are seldom observed. Presumably these mutations have occurred often but have been rejected by natural selection acting on the proteins. For example, there were no exchanges between Gly and Trp.

Figure 80. Numbers of accepted point mutations (×10) accumulated from closely related sequences. Fifteen hundred and seventy-two exchanges are shown. Fractional exchanges result when ancestral sequences are ambiguous.

Mutability of Amino Acids

A complete picture of the mutational process must include a consideration of the amino acids that did not change, as well as those that did. For this we need to know the probability that each amino acid will change in a given small evolutionary interval. We call this number the "relative mutability" of the amino acid. In order to compute the relative
mutabilities of the amino acids, we simply count the number of times that each amino acid has changed in an interval and the number of times that it has occurred in the sequences and thus has been subject to mutation. The relative mutability of each amino acid is proportional to the ratio of changes to occurrences. Figure 81 illustrates the computation for a simple case in which B changes relatively often, A less often, and D never.

Aligned sequences:   A D A
                     A D B

Amino acids                                    A      B      D
Changes                                        1      1      0
Frequency of occurrence (total composition)    3      1      2
Relative mutability                            0.33   1      0

Figure 81. Sample computation of relative mutability. The two aligned sequences may be two experimentally observed sequences or an observed sequence and its inferred ancestor.

In calculating relative mutabilities from many trees, the information from sequences of different lengths and evolutionary distances is combined. Each relative mutability is still a ratio. The numerator is the total number of changes of this amino acid on all branches of all protein trees considered. The denominator is the total exposure of the amino acid to mutation, that is, the sum for all branches of its local frequency of occurrence multiplied by the total number of mutations per 100 links for that branch.

The relative mutabilities of the amino acids are shown in Table 21. On the average, Asn, Ser, Asp, and Glu are most mutable, and Trp and Cys are least mutable. The immutability of cysteine is understandable. Cysteine is known to have several unique, indispensable functions. It is the attachment site of heme groups in cytochrome and of FeS clusters in ferredoxin. It forms cross-links in other proteins such as chymotrypsin or ribonuclease. It seldom occurs without having an important function. The substitution of one of the larger amino acids of distinctive shape and chemistry for any other is rather uncommon. At the other extreme, the low mutability of glycine must be due to its unique smallness that is advantageous in many places. Even though serine
sometimes functions in the active center, it much more often performs a function of lesser importance, easily mimicked by several other amino acids of similar physical and chemical properties. On the average, it is highly mutable.

Table 21. Relative Mutabilities of the Amino Acids (a)

Asn  134     His  66
Ser  120     Arg  65
Asp  106     Lys  56
Glu  102     Pro  56
Ala  100     Gly  49
Thr   97     Tyr  41
Ile   96     Phe  41
Met   94     Leu  40
Gln   93     Cys  20
Val   74     Trp  18

(a) The value for Ala has been arbitrarily set at 100.

Amino Acid Frequencies in the Mutation Data

The relative frequencies of exposure to mutation of the amino acids are shown in Table 22. These frequencies, $f_i$, are approximately proportional to the average composition of each group multiplied by the number of mutations in the tree. The sum of the frequencies is 1.

Table 22. Normalized Frequencies of the Amino Acids in the Accepted Point Mutation Data

Gly  0.089     Arg  0.041
Ala  0.087     Asn  0.040
Leu  0.085     Phe  0.040
Lys  0.081     Gln  0.038
Ser  0.070     Ile  0.037
Val  0.065     His  0.034
Thr  0.058     Cys  0.033
Pro  0.051     Tyr  0.030
Glu  0.050     Met  0.015
Asp  0.047     Trp  0.010

Mutation Probability Matrix for the Evolutionary Distance of One PAM

We can combine information about the individual kinds of mutations and about the relative mutability of the amino acids into one distance-dependent "mutation probability matrix" (see Figure 82). An element of this matrix, $M_{ij}$, gives the probability that the amino acid in column j will be replaced by the amino acid in row i after a given evolutionary interval, in this case 1 PAM. The nondiagonal elements have the values

$$M_{ij} = \frac{\lambda m_j A_{ij}}{\sum_i A_{ij}}$$

where $A_{ij}$ is an element of the accepted point mutation matrix of Figure 80, $\lambda$ is a proportionality constant, and $m_j$ is the mutability of the jth amino acid (Table 21). The diagonal elements have the values

$$M_{ii} = 1 - \lambda m_i$$

Consider a typical column, that for alanine. The total probability, the sum of all the elements, must be 1. The probability of observing a change in a
site containing alanine (the sum of all the elements except $M_{AA}$) is proportional to the mutability of alanine. The same proportionality constant, $\lambda$, holds for all columns. The individual nondiagonal terms within each column bear the same ratio to each other as do the observed mutations in the matrix of Figure 80.

The quantity $100 \times \sum_i f_i M_{ii}$ gives the number of amino acids that will remain unchanged when a protein 100 links long, of average composition, is exposed to the evolutionary change represented by this matrix. This apparent evolutionary change depends upon the choice of $\lambda$, in this case chosen so that this change is 1 mutation. Since there are almost no superimposed changes, this also represents 1 PAM of change. If it had been four times as large, the initial matrix would have represented 4 PAMs; the discussion which follows would not be changed noticeably.

Figure 82. Mutation probability matrix for the evolutionary distance of 1 PAM. An element of this matrix, $M_{ij}$, gives the probability that the amino acid in column j will be replaced by the amino acid in row i after a given evolutionary interval, in this case 1 accepted point mutation per 100 amino acids. Thus, there is a 0.56% probability that Asp will be replaced by Glu. To simplify the appearance, the elements are shown multiplied by 10,000.

Simulation of the Mutational Process

For evaluating statistical methods of detecting relationships, for developing methods of measuring evolutionary distances between proteins, and for determining the accuracy of programs to construct evolutionary trees, we need to have examples of proteins at known evolutionary distances. The mutation probability matrix provides the information with which to simulate any amount of evolutionary change in an unlimited number of proteins. Further, we can start with one protein and simulate its separate evolution in duplicated genes or in divergent organisms. By considering many groups of sequences related by the same evolutionary history, a measure is readily obtained of the expected deviations due to random fluctuations in the evolutionary process.

If we only require that, on the average, one mutation takes place in the evolutionary interval of 1 PAM, we can use a simulation requiring one random number for each amino acid in the sequence, as follows. To determine the fate of the first amino acid, say Ala, a uniformly distributed random number between 0 and 1 is obtained. The first column of the mutation probability matrix (Figure 82) gives the relative probability of each possible event that may befall Ala (neglecting deletion for simplicity). If the random number falls between 0 and 0.9867, Ala is left unchanged. If the number is between 0.9867 and 0.9868, it is replaced with Arg; if it is between 0.9868 and 0.9872, it is replaced with Asn; and so forth. Similarly, a random
number is produced for each amino acid in the sequence, and action is taken as dictated by the corresponding column of the matrix. The result is a simulated mutant sequence. Any number of these can be generated; their average distance from the original is 1 PAM, although some may have no mutations and some may have two or more.

The effects on the sequence of a longer period of evolution may be simulated by successive applications of the matrix to the sequence resulting from the last application.

For simulations in which a predetermined number of changes are required, a two-step process involving two random numbers for each mutation can be used. Starting with a given sequence, the first amino acid that will mutate is selected; the probability that any one will be selected is proportional to its mutability (Table 21). Then the amino acid that replaces it is chosen. The probability for each replacement is proportional to the elements in the appropriate column of Figure 82. Starting with the resultant sequence, a second mutation can be simulated, and so on, until a predetermined number of changes have been made. In this process, superimposed and back mutations may occur.

The 1-PAM matrix can be multiplied by itself N times to yield a matrix that predicts the amino acid replacements to be found after N PAMs of evolutionary change in a sequence of average composition. On the average, the results of the simulations above match the predictions of the corresponding matrices.

Mutation Probability Matrices for Other Distances

The mutation probability matrix M corresponding to 1 PAM has a number of interesting properties (see Figure 82). If, in a simulation, it is applied to a protein with the average amino acid composition given in Table 22, on the average the composition of the resulting mutated proteins will be unchanged. Repeated applications of the matrix to proteins of any other composition will give mutants that change toward average composition.
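The one-random-number-per-residue simulation and the repeated matrix multiplication described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the three-letter alphabet and the matrix `M1` are hypothetical toy values (the real 20 × 20 matrix of Figure 82 is not reproduced here), and the function names are invented for the example. As in the text, each column of the matrix is treated as a probability distribution over the fate of one residue.

```python
import random

# Hypothetical toy alphabet and one-interval mutation matrix (illustration only).
# M1[i][j] = probability that the residue in column j is replaced by the residue in row i.
ALPHABET = "ABD"
M1 = [
    [0.98, 0.03, 0.01],
    [0.01, 0.96, 0.01],
    [0.01, 0.01, 0.98],
]

def mutate(sequence, matrix, rng):
    """Apply one evolutionary interval using one random number per residue."""
    out = []
    for residue in sequence:
        j = ALPHABET.index(residue)
        r = rng.random()
        cumulative = 0.0
        chosen = residue  # fallback guards against floating-point round-off
        # Walk down column j; the random number selects the sub-interval it falls in.
        for i, letter in enumerate(ALPHABET):
            cumulative += matrix[i][j]
            if r < cumulative:
                chosen = letter
                break
        out.append(chosen)
    return "".join(out)

def mat_mul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(matrix, n):
    """Matrix for n successive intervals; n = 0 gives the unit diagonal."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, matrix)
    return result

rng = random.Random(0)
mutant = mutate("ABDABDABDA", M1, rng)   # one simulated mutant sequence
M250 = mat_pow(M1, 250)                  # matrix after 250 intervals
column_sums = [sum(M250[i][j] for i in range(3)) for j in range(3)]
```

Each column of `M250` still sums to 1, and for this toy matrix the columns are already nearly identical to one another, which is the drift toward a single asymptotic composition that the section describes.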
Any such matrix has implicit in it some particular asymptotic composition.

There is a different mutation probability matrix for each evolutionary interval. These can be derived from the one for 1 PAM by matrix multiplication. If the 1-PAM matrix is multiplied by itself an infinite number of times, each column of the resulting matrix approaches the asymptotic amino acid composition:

$$M^{\infty} = \begin{pmatrix} f_A & f_A & f_A & \cdots \\ f_R & f_R & f_R & \cdots \\ f_N & f_N & f_N & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}$$

At a great distance, there is very little relationship information left in the matrix. For example, at a distance of 2,034 PAMs, all of the matrix values are within 5% of their limiting values, except for the Trp-Trp element, which is 75% higher than the limit, and the Cys-Cys element, which is 11% higher. The matrix for 0 PAMs is simply a unit diagonal: no amino acid would have changed.

$$M^{0} = \begin{pmatrix} 1 & 0 & 0 & \cdots \\ 0 & 1 & 0 & \cdots \\ 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

The mutation probability matrix for 250 PAMs is shown in Figure 83. At this evolutionary distance, only one amino acid in five remains unchanged. However, the amino acids vary greatly in their mutability: 55% of the tryptophans, 52% of the cysteines, and 27% of the glycines would still be unchanged, but only 8% of the highly mutable asparagines would remain. Several other amino acids, particularly alanine, aspartic acid, glutamic acid, glycine, lysine, and serine, are more likely to occur in place of an original asparagine than asparagine itself at this evolutionary distance. This is understandable from the data giving the preferred mutations and the relative mutabilities. Asparagine is highly mutable; therefore, it changes to other amino acids. These are less mutable and may not change again. This effect is much more conspicuous in the case of methionine. Surprisingly, a methionine originally present would have changed to leucine in 20% of the cases, but would remain methionine in only 6%. Over one-third of the mutations in methionine are specifically to leucine (Figure 80). Leucine is less than one-half as mutable as methionine (Table 21).

From the series of
distance-dependent mutation probability matrices, we can compute detailed answers to the question, "How does the evolutionary process affect the similarity of related protein sequences?"

Figure 83. Mutation probability matrix for the evolutionary distance of 250 PAMs. To simplify the appearance, the elements are shown multiplied by 100. In comparing two sequences of average amino acid frequency at this evolutionary distance, there is a 13% probability that a position containing Ala in the first sequence will contain Ala in the second. There is a 3% chance that it will contain Arg, and so forth. The relationship of two sequences at a distance of 250 PAMs can be demonstrated by statistical methods.

Estimation of Evolutionary Distance

There is a different mutation probability matrix for each evolutionary interval measured in PAMs. For each such matrix, we can calculate the percentage of amino acids that will be observed to change, on the average, in the interval by the formula

$$100 \left( 1 - \sum_i f_i M_{ii} \right)$$

Table 23 shows the correspondence between the observed percent difference between two sequences and the evolutionary distance in PAMs. We use this scale to estimate evolutionary distances from matrices of percent difference between sequences. These estimated distances were used in the computations of evolutionary trees in this book. The differences predicted for a given PAM distance differ by up to 25 from those that we reported in Volume 5. A more complete scale is given in Table 36 of the Appendix.

Table 23. Correspondence between Observed Differences and the Evolutionary Distance

Observed Percent Difference     Evolutionary Distance in PAMs
 1                                1
 5                                5
10                               11
15                               17
20                               23
25                               30
30                               38
35                               47
40                               56
45                               67
50                               80
55                               94
60                              112
65                              133
70                              159
75                              195
80                              248
85                              328

Relatedness Odds Matrix

The elements, $M_{ij}$, of the mutation probability matrix for each distance give the probability that amino acid j will change to i in a related sequence in that interval. The normalized frequency $f_i$ gives the probability that i will occur in the second sequence by chance. The terms of the relatedness odds matrix are then

$$R_{ij} = \frac{M_{ij}}{f_i}$$

The odds matrix is symmetrical. Each term gives the probability of replacement per occurrence of i per occurrence of j. Amino acid pairs with scores above 1 replace each other more often as alternatives in related sequences than in random sequences of the same composition, whereas those with scores below 1 replace each other less often.

The information in the 250-PAM odds matrix has proven very useful in detecting distant relationships between sequences. When one protein is compared with another, position by position, one should multiply the odds for each position to calculate an odds for the whole protein. However, it is more convenient to add the logarithms of the matrix elements. The log of the 250-PAM odds matrix is shown in
Figure 84.

The Chemical Meaning of Amino Acid Mutations

Patterns have been visible in the accepted point mutations since the beginning of protein sequence work. Isoleucine-valine and serine-threonine were frequently observed alternatives; it was obvious that this interchangeability had something to do with their chemical similarities. In the large amount of information that now exists, far more detailed correlations are visible and many more functional inferences can be made.

In the log odds matrix of Figure 84, the order of the amino acids has been rearranged to show clearly the groups of chemically similar amino acids that tend to replace one another: the hydrophobic group, the aromatic group, the basic group, the acid, acid-amide group, cysteine, and the other hydrophilic residues. Some groups overlap: the basic and acid, acid-amide groups tend to replace one another to some extent, and phenylalanine interchanges with the hydrophobic group more often than chance expectation would predict.

These patterns are imposed principally by natural selection and only secondarily by the constraints of the genetic code; they reflect the similarity of the functions of the amino acid residues in their weak interactions with one another in the three-dimensional conformation of proteins. Some of the properties of an amino acid residue that determine these interactions are: size, shape, and local concentrations of electric charge; the conformation of its van der Waals surface; and its ability to form salt bonds, hydrophobic bonds, and hydrogen bonds.

Computing Relationships between Sequences

We use log odds matrices as scoring matrices for detecting very distant relationships between proteins. Such scoring matrices, based ultimately on accepted point mutations, can discriminate significant relationships from


Simultaneous structure prediction and multiple alignment of structural RNAs
Talk originally presented by Anders Krogh

Outline
- RNA Revolution
- Finding RNA genes
- Structure prediction
- Comparative RNA structure analysis
- RNAalifold
- MASTR algorithm
- Results

RNA Revolution
- Most of the human genome gets transcribed.
- Noncoding RNAs are important in evolution and regulation (e.g., miRNAs, siRNAs).
- Genome > Transcriptome > Proteome
- Massive regulation by RNA.

Finding RNA genes
- They have no equivalent of ORFs.
- They are not conserved in sequence the way protein-coding genes are, but for many the structure is conserved.
- Bioinformatics methods are needed to understand RNA structures.

Prediction of structure
- 2D structure is predicted by minimizing free energy (e.g., MFOLD and RNAfold).
- Based on the free energy of stacking base pairs and loops.
- Most predictions are not correct.
- 3D structure predictions are not mature yet.

Comparative RNA structure analysis
- Covariance between alignment columns.
- Mutual information: needs a lot of sequences for estimating the 16 base-pair frequencies. For alignment columns i and j, with $f_i(a)$ the frequency of base a in column i and $f_{ij}(ab)$ the frequency of the pair ab:
  $$M_{ij} = \sum_{a,b} f_{ij}(ab) \log \frac{f_{ij}(ab)}{f_i(a) f_j(b)}$$
- Base pair entropy works better, but is not perfect.
- Noise is always positively amplified in entropy measures.

RNAalifold measure
- Predicts structure from alignments.
- Calculate covariance for mutations.
- Divide by possible pairs (Ck).
- Subtract the fraction of non-base-pairs.
- The structure of all sequences is averaged over all the alignments.
- RNAalifold results are a lot better than those of standard methods (gap penalty, WC pairs).

MASTR: Combined approach
- Goal: get the optimal alignment and structure.
- The objective function has 3 components: a sequence alignment term, a structure term (base-pair free energy), and a covariance term.
- E_total combines E_alignment with a sum over base pairs (bp) of E_structure and E_covariance.

Optimizing alignment and structure
- Start with a random alignment and no base pairs.
- Simulated annealing: a random change with energy change ΔE is accepted with probability exp(-ΔE/T), where T is an artificial temperature; it is always accepted if the change is negative.
- Repeat until T is close to 0.
- The best solution is saved.


Bioinformatics Approach to Chemical Genomics
Minoru Kanehisa, Institute of Chemical Research, Kyoto University

Outline
- Management of large-scale data
- Building blocks to biological systems
- Linking biological system and environment

[Diagram: basic principles linking the biological world and the natural world. System space (PATHWAY, BRITE; systems information), genomic space (GENES, KO; genomic information; genome, transcriptome, proteome), and chemical space (LIGAND; chemical information; metabolome, glycome). Screening connects disease genes, imaging probes, and drug targets with drug leads; system analysis, modeling, and simulation lead toward virtual cells and organisms.]

New Approach
- Target, Perturbant, Perturbed System

Kyoto Encyclopedia of Genes and Genomes (KEGG)
- Collection of 20 databases
- KEGG Disease arriving Jan 2008
- KEGG: Pathways and Networks (system space), Genes, Ligands

KEGG Databases

Database          Content
KEGG PATHWAY      Pathway maps
KEGG BRITE        Functional hierarchies
KEGG MODULE       Pathway modules (to be released)
KEGG DISEASE      Diseases (to be released)
KEGG ORTHOLOGY    KEGG orthology (KO) groups
KEGG GENES        Genes in high-quality genomes
KEGG DGENES       Genes in draft genomes
KEGG EGENES       Genes as EST contigs
KEGG GENOME       Organisms with complete genomes
KEGG SSDB         Sequence similarities and best hit relations
KEGG COMPOUND     Metabolites
KEGG DRUG         Drugs
KEGG GLYCAN       Glycans
KEGG ENZYME       Enzymes
KEGG REACTION     Enzymatic reactions
KEGG RPAIR        Reactant pairs and chemical transformations

KEGG Pathway
- Network of biochemical pathways
- Created manually from published literature

KEGG BRITE
- Functional hierarchies of biological systems
- Similar to Gene Ontology
- More highly connected ontologies in genes than GO

KEGG Identifiers
- Identifiers map between databases (GENES, LIGAND); for example, a K number corresponds to a set of GENES entry names.

KEGG Orthology (KO)
- K number: KEGG Pathway node, BRITE hierarchy node
- Over 700 genes: GENES, DGENES (draft genes), EGENES (ESTs)
- Over 10,000 KO groups

What Next
- Linking gene expression profile to reaction pattern profile
- Co-expression of genes; co-occurrence of adjoining reactions

Chemical Annotation in KEGG
- Linking exogenous chemical substances to genomes

Chemical Structure Similarity
- Comparison of bit representation vectors (fingerprints)
- Comparison of graph objects

Chemical Building Blocks
- Conserved substances: building blocks of compounds
- Variable substances: reactions
- Interaction prediction, network prediction
- Network models

Chemical Representation
- Chemical structure based on atom type (i.e., hybridization)
- KEGG atom types
- Used in enzymatic reaction representation, where a reaction is represented as a combination of reactant pairs
- Representation: reaction center, boundary of mismatch, matched atom, difference atom
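The comparison of bit-vector fingerprints mentioned under Chemical Structure Similarity can be illustrated with a short Python sketch. The slides do not say which similarity measure is used, so the Tanimoto (Jaccard) coefficient below is an assumption, chosen only because it is a common choice for binary fingerprints. The two fingerprints are made-up 8-bit vectors, not real KEGG data.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two equal-length bit vectors.

    Assumed measure for illustration; the source does not name one.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)     # bits set in both
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)    # bits set in at least one
    # Two all-zero fingerprints are treated as identical by convention here.
    return both / either if either else 1.0

# Hypothetical fingerprints: each position marks presence of some substructure.
fp_compound_1 = [1, 1, 0, 1, 0, 0, 1, 0]
fp_compound_2 = [1, 0, 0, 1, 0, 1, 1, 0]

similarity = tanimoto(fp_compound_1, fp_compound_2)
```

Here three bits are shared and five are set in at least one vector, so the score is 3/5. Graph-based comparison, the other approach listed on the slide, is not covered by this sketch.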

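The simulated-annealing loop described in the MASTR slides (accept a random change with probability exp(-ΔE/T), always accept if the change is negative, lower T toward 0, keep the best solution) can be sketched generically in Python. This is not the MASTR implementation: the integer state, the toy energy function, the ±1 move, and the geometric cooling schedule are all assumptions standing in for the alignment-plus-structure state and its three-term energy.

```python
import math
import random

def accept(delta_e, temperature, rng):
    """Metropolis rule: always accept non-positive changes, else accept with exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def anneal(start, energy, propose, rng, t_start=10.0, t_end=1e-3, cooling=0.99):
    """Generic annealing loop; the geometric cooling schedule is an assumption."""
    state, e = start, energy(start)
    best_state, best_e = state, e
    t = t_start
    while t > t_end:                      # repeat until T is close to 0
        candidate = propose(state, rng)   # random change
        e_new = energy(candidate)
        if accept(e_new - e, t, rng):
            state, e = candidate, e_new
            if e < best_e:                # the best solution seen so far is saved
                best_state, best_e = state, e
        t *= cooling
    return best_state, best_e

# Hypothetical toy problem standing in for the alignment/structure energy.
def toy_energy(x):
    return (x - 3) ** 2

def toy_propose(x, rng):
    return x + rng.choice((-1, 1))

best_state, best_e = anneal(20, toy_energy, toy_propose, random.Random(1))
```

At high temperature the walk accepts some uphill moves, which is what lets the method escape local minima; as T shrinks it behaves like a greedy descent.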
