BIOINFORMATICS II BIOL 4550
This 262 page Class Notes was uploaded by Ezequiel Schaefer on Monday October 19, 2015. The Class Notes belongs to BIOL 4550 at Rensselaer Polytechnic Institute taught by Staff in Fall. Since its upload, it has received 43 views. For similar materials see /class/224832/biol-4550-rensselaer-polytechnic-institute in Biology at Rensselaer Polytechnic Institute.
Date Created: 10/19/15
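Bioinformatics 2, lecture 4
Rotation and translation
Least squares superposition

What happens when you move the mouse to rotate a molecule

1. The mouse sends mouse coordinates ($\Delta x$, $\Delta y$) to the running program.
2. Rotation angles are calculated: $\theta_x = \Delta x \cdot \text{scale}$, $\theta_y = \Delta y \cdot \text{scale}$.
3. Rotation matrices are calculated:

$$R_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{pmatrix}, \qquad R_y = \begin{pmatrix} \cos\theta_y & 0 & -\sin\theta_y \\ 0 & 1 & 0 \\ \sin\theta_y & 0 & \cos\theta_y \end{pmatrix}$$

What happens when you move the mouse (cont'd)

4. New atom coordinates are calculated: $\vec{r}\,' = R_y R_x \vec{r}$.
5. The scene is rendered using the new coordinates.

All of this happens in a fraction of a second.

Rotation is angular addition

The atom starts at $x = |r|\cos\alpha$, $y = |r|\sin\alpha$. The axis of rotation is the Cartesian origin. The atom rotates to $x' = |r|\cos(\alpha+\beta)$, $y' = |r|\sin(\alpha+\beta)$.

Convention: angles are measured counter-clockwise.

Sum of angles formula:

$$\cos(\alpha+\beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta$$
$$\sin(\alpha+\beta) = \sin\alpha\cos\beta + \sin\beta\cos\alpha$$

A rotation matrix

With $x = |r|\cos\alpha$ and $y = |r|\sin\alpha$:

$$x' = |r|\cos(\alpha+\beta) = |r|(\cos\alpha\cos\beta - \sin\alpha\sin\beta) = |r|\cos\alpha\cos\beta - |r|\sin\alpha\sin\beta = x\cos\beta - y\sin\beta$$
$$y' = |r|\sin(\alpha+\beta) = |r|(\sin\alpha\cos\beta + \sin\beta\cos\alpha) = |r|\sin\alpha\cos\beta + |r|\cos\alpha\sin\beta = y\cos\beta + x\sin\beta$$

In matrix form:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix} \begin{pmatrix} |r|\cos\alpha \\ |r|\sin\alpha \end{pmatrix} = \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

The rotation matrix is the same for any $r$ and any $\alpha$.

A rotation around a principal axis

The Z coordinate stays the same; X and Y change:

$$R_z = \begin{pmatrix} \cos\beta & -\sin\beta & 0 \\ \sin\beta & \cos\beta & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

The Y coordinate stays the same; X and Z change:

$$R_y = \begin{pmatrix} \cos\gamma & 0 & -\sin\gamma \\ 0 & 1 & 0 \\ \sin\gamma & 0 & \cos\gamma \end{pmatrix}$$

The X coordinate stays the same; Y and Z change:

$$R_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix}$$

A 3D rotation matrix is the product of 2D rotation matrices

$$R_z R_y = \begin{pmatrix} \cos\beta & -\sin\beta & 0 \\ \sin\beta & \cos\beta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\gamma & 0 & -\sin\gamma \\ 0 & 1 & 0 \\ \sin\gamma & 0 & \cos\gamma \end{pmatrix} = \begin{pmatrix} \cos\beta\cos\gamma & -\sin\beta & -\cos\beta\sin\gamma \\ \sin\beta\cos\gamma & \cos\beta & -\sin\beta\sin\gamma \\ \sin\gamma & 0 & \cos\gamma \end{pmatrix}$$

When multiplying matrices, the order matters

$$R = R_y R_x = \begin{pmatrix} \cos\theta_y & 0 & -\sin\theta_y \\ 0 & 1 & 0 \\ \sin\theta_y & 0 & \cos\theta_y \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{pmatrix} = \begin{pmatrix} \cos\theta_y & -\sin\theta_x\sin\theta_y & -\sin\theta_y\cos\theta_x \\ 0 & \cos\theta_x & -\sin\theta_x \\ \sin\theta_y & \sin\theta_x\cos\theta_y & \cos\theta_x\cos\theta_y \end{pmatrix}$$

This is the matrix if the X rotation is first, then the Y rotation. Rotating in the opposite order gives a different matrix.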
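The mouse-rotation steps and the order dependence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the default `scale` value, and the test vector are hypothetical, and the sign placement (the negative sine in the upper right of each matrix) is assumed to match the convention used in these notes.

```python
import math

def rot_x(theta):
    """Rotation about X: the X coordinate stays the same, Y and Z change."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def rot_y(theta):
    """Rotation about Y, with -sin in the upper right (assumed notes convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, -s],
            [0.0, 1.0, 0.0],
            [s, 0.0, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def mouse_rotation(dx, dy, scale=0.01):
    """Steps 2-4: mouse deltas -> angles -> matrices -> R = Ry Rx.

    The scale factor (radians per unit of mouse motion) is a made-up value.
    """
    theta_x = dx * scale
    theta_y = dy * scale
    return matmul(rot_y(theta_y), rot_x(theta_x))

if __name__ == "__main__":
    quarter = math.pi / 2
    v = [0.0, 1.0, 0.0]
    # X rotation first, then Y: the matrix product is Ry Rx.
    x_then_y = apply(matmul(rot_y(quarter), rot_x(quarter)), v)
    # Y rotation first, then X: the matrix product is Rx Ry.
    y_then_x = apply(matmul(rot_x(quarter), rot_y(quarter)), v)
    print([round(c, 3) for c in x_then_y])
    print([round(c, 3) for c in y_then_x])
```

For two 90-degree rotations applied to the vector (0, 1, 0), the first order ends up at approximately (-1, 0, 0) and the second at approximately (0, 0, 1), so the two products are not interchangeable.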
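With the Y rotation first and then the X rotation:

$$R = R_x R_y = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{pmatrix} \begin{pmatrix} \cos\theta_y & 0 & -\sin\theta_y \\ 0 & 1 & 0 \\ \sin\theta_y & 0 & \cos\theta_y \end{pmatrix} = \begin{pmatrix} \cos\theta_y & 0 & -\sin\theta_y \\ -\sin\theta_x\sin\theta_y & \cos\theta_x & -\sin\theta_x\cos\theta_y \\ \sin\theta_y\cos\theta_x & \sin\theta_x & \cos\theta_x\cos\theta_y \end{pmatrix}$$

Reversing the rotation

For the opposite rotation, flip the matrix (take the transpose):

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^T = \begin{pmatrix} A & C \\ B & D \end{pmatrix}$$

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \qquad \text{(this is the transpose)}$$

The inverse matrix = the transposed matrix:

$$\begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

NOTE: $\cos\beta\cos\beta + \sin\beta\sin\beta = 1$.

Example: rotation in 2 steps

Rotate the vector $v = (1, 2, 3)$ around Z by 60°, then around Y by 60°.

$$\begin{pmatrix} \cos 60^\circ & -\sin 60^\circ & 0 \\ \sin 60^\circ & \cos 60^\circ & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 1(0.5) - 2(0.866) + 3(0) \\ 1(0.866) + 2(0.5) + 3(0) \\ 0 + 0 + 3(1) \end{pmatrix} = \begin{pmatrix} -1.232 \\ 1.866 \\ 3 \end{pmatrix}$$

$$\begin{pmatrix} \cos 60^\circ & 0 & -\sin 60^\circ \\ 0 & 1 & 0 \\ \sin 60^\circ & 0 & \cos 60^\circ \end{pmatrix} \begin{pmatrix} -1.232 \\ 1.866 \\ 3 \end{pmatrix} = \begin{pmatrix} -1.232(0.5) + 1.866(0) - 3(0.866) \\ -1.232(0) + 1.866(1) + 3(0) \\ -1.232(0.866) + 1.866(0) + 3(0.5) \end{pmatrix} = \begin{pmatrix} -3.214 \\ 1.866 \\ 0.433 \end{pmatrix}$$

Right-handed 90° rotations

$$\text{around X: } \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \qquad \text{around Y: } \begin{pmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \qquad \text{around Z: } \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Helpful hint: for a right-handed rotation, the $-\sin$ is up and to the right of the $+\sin$. (Many references write $R_y$ with the two sine signs swapped, which is the right-handed rotation about +Y; the Y matrices here follow the hint as it is applied in the worked example above.)

In-class exercise: rotate a point

$(x, y, z) = (1, 4, 7)$. Rotate this point by 90° around the Z-axis. Then rotate the new point by 90° around the Y-axis. What are the new coordinates?

3D rotation conventions

Euler angles $\alpha$, $\beta$, $\gamma$. Axes of rotation: Z, X, Z.

$$R = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\beta & -\sin\beta \\ 0 & \sin\beta & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Order of rotations (left to right): 3, 2, 1. Each rotation is around a principal axis.

Polar angles $\varphi$, $\psi$, $\kappa$. Axes of rotation: Z, Y, Z, Y, Z.

$$R = R_z(\psi)\, R_y(\varphi)\, R_z(\kappa)\, R_y(-\varphi)\, R_z(-\psi)$$

Order of rotations (left to right): 5, 4, 3, 2, 1. Net rotation = $\kappa$ around an axis; the axis is defined by $\varphi$ and $\psi$.

Polar angles (figure labels: z = north pole, x = prime meridian, equator): a rotation of $\kappa$ degrees around an axis that is located at $\psi$ degrees longitude and $\varphi$ degrees latitude.

Special properties of rotation matrices

• They are square: 2×2 or 3×3 (higher dimensions in principle).
• The product of any two rotation matrices is a rotation matrix.
• The inverse equals the transpose: $R^{-1} = R^T$.
• Every row (column) is a unit vector.
• Any two rows (columns) are orthogonal vectors.
• The cross-product of any two rows, taken in cyclic order, equals the third.
• $|x| = |Rx|$, where R is a rotation matrix.

Read more about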
rotation matrices at http://mathworld.wolfram.com/RotationMatrix.html

RMSD

Root Mean Square Deviation (RMSD) in superimposed coordinates is the standard measure of structural difference. It is similar to the standard deviation, which is the square root of the variance.

$$\text{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left| x_i - y_i \right|^2} \qquad \text{(Orengo, p. 88)}$$

where $x_i$ are the coordinates from molecule 1 and $y_i$ are the equivalent* coordinates from molecule 2.

*Which atoms are equivalent is based on an alignment.

Pseudo-pseudocode program for computing rmsd

    sum = 0
    N = 0
    while <there is data to read>
        # Read coordinates for two aligned positions
        read x1,y1,z1, x2,y2,z2
        # Compute the distance^2
        d = (x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2
        # sum it
        sum = sum + d
        # keep track of how many pairs there are
        N = N + 1
    # Average and take square root
    rmsd = sqrt(sum/N)

Least squares superposition

Problem: find the rotation matrix M and a vector v that minimize the following quantity:

$$E = \sum_i \left| M x_i + v - y_i \right|^2$$

where $x_i$ are the coordinates from one molecule and $y_i$ are the equivalent* coordinates from another molecule.

*equivalent based on alignment

Mapping structural equivalence = aligning the sequence

Any position that is aligned is included in the sum of squares.

    4DFRA ISLIAALAVDRVI ENAMPWNLPADLAWFKRNTLDKPVIMGRHTWESIG RPLPGRKNI
    1DFR  TAFLWAQNRNGLTwDGHLPWHLPDDLHYFRAQTVGKIMVVGRRTYESFPKRPLPERTNV

    4DFRA ILSSQ PGTDDRVTWVKSVDEAIAAC GDVPEIMVIGGGRVYEQFLPKAQKLYLTHIDA
    1DFR  VLTHQEDYQAQGAVVVHDVAAVFAYAKQHLDQELVIAGGAQIFTAFKDDVDTLLVTRLAG

    4DFRA EVEGDTHFPDYEPDDWESVFSEFHDAI QN HSYCFKILERR
    1DFR  SFEGDTKMIPLNWDDFTKVSSRTVED PALTHTYEVWQKK

Unaligned positions are not.

Least squares

At the position of best superposition, we have an approximate equality:

$$M x_i + v \approx y_i$$

First we eliminate v by translating the center of mass of both molecules to the origin. Now we have:

$$M x_i \approx y_i$$

We have one equation for each atom i. M has 9 unknowns. If there are more equations than unknowns, there is a unique solution. What is it? See the next two slides.

Least squares

Least squares solves a set of linear equations in the form

$$y = x\,a$$

This is "shorthand" notation for the equations, where y and a are vectors.

Least squares, continued

Green elements
are known. Orange are unknown. Fat rectangles are matrices. Thin rectangles are vectors.

Multiply both sides by the transpose of x ("squaring"). The "squared" matrix is now square, which means we can invert it. We can use the "LU decomposition" method. Multiplying both sides by the inverse of the "squared" matrix solves for a.

Summary:

$$a = (x^T x)^{-1} x^T y$$

[Figure: least squares superimposed molecules]

In-class exercise: Continue MOE Tour

• Start MOE using "moe ngVisual 0X3l" if using the SGI machines.
• Open the MOE Tour web page: Help > Tutorials > Getting Started.
• Go through the following sections at your own pace:
  Saving and Loading a Molecule File
  Saving a Molecule in a Database
  Rendering the Molecule
  Selecting Atoms
  Moving Selected Atoms
  Rotating About a Bond
  Introducing the Atom Manager
  Measuring Angles and Distances
  Measuring Energy

Read Ch. 6 and 7 for next time.

[Figure: structure superposition and classification; recoverable labels: Sandwich, Roll, flavodoxin, β-lactamase]

STUDIES ON THE PRINCIPLES THAT GOVERN THE FOLDING OF PROTEIN CHAINS

Nobel Lecture, December 11, 1972
by CHRISTIAN B. ANFINSEN
National Institutes of Health, Bethesda, Maryland

The telegram that I received from the Swedish Royal Academy of Sciences specifically cites "... studies on ribonuclease, in particular the relationship between the amino acid sequence and the biologically active conformation ..." The work that my colleagues and I have carried out on the nature of the process that controls the folding of polypeptide chains into the unique three-dimensional structures of proteins was, indeed, strongly influenced by observations on the ribonuclease molecule. Many others, including Anson and Mirsky (1) in the '30s and Lumry and Eyring (2) in the '50s, had observed and discussed the reversibility of denaturation of proteins. However, the true elegance of this consequence of natural selection was dramatized by the ribonuclease work, since the refolding of this molecule, after full denaturation by reductive cleavage of its four
disulfide bonds (Figure 1), required that only one of the 105 possible pairings of eight sulfhydryl groups to form four disulfide linkages take place. The original observations that led to this conclusion were made together with my colleagues Michael Sela and Fred White in 1956-1957. These were, in actuality, the beginnings of a long series of studies that rather vaguely aimed at the eventual total synthesis of the protein. As we all know, Gutte and Merrifield (7) at the Rockefeller Institute, and Ralph Hirschman and his colleagues at the Merck Research Institute (8), have now accomplished this monumental task.

Fig. 1. The amino acid sequence of bovine pancreatic ribonuclease (3, 4, 5).

The studies on the renaturation of fully denatured ribonuclease required many supporting investigations (9, 10, 11, 12) to establish, finally, the generality which we have occasionally called (13) the "thermodynamic hypothesis". This hypothesis states that the three-dimensional structure of a native protein in its normal physiological milieu (solvent, pH, ionic strength, presence of other components such as metal ions or prosthetic groups, temperature, etc.) is the one in which the Gibbs free energy of the whole system is lowest; that is, that the native conformation is determined by the totality of interatomic interactions and hence by the amino acid sequence, in a given environment. In terms of natural selection through the "design" of macromolecules during evolution, this idea emphasized the fact that a protein molecule only makes stable, structural sense when it exists under conditions similar to those for which it was selected, the so-called physiological state.

After several years of study on the ribonuclease molecule it became clear to us, and to many others in the field of protein conformation, that proteins devoid of restrictive disulfide bonds or other covalent cross-linkages would make more convenient models for the study of the thermodynamic and kinetic aspects of the nucleation, and subsequent
pathways, of polypeptide chain folding. Much of what I will review will deal with studies on the flexible and convenient staphylococcal nuclease molecule, but I will first summarize some of the older, background experiments on bovine pancreatic ribonuclease itself.

SUPPORT FOR THE "THERMODYNAMIC HYPOTHESIS"

An experiment that gave us a particular satisfaction in connection with the translation of information in the linear amino acid sequence into native conformation involved the rearrangement of so-called "scrambled" ribonuclease (12). When the fully reduced protein, with 8 SH groups, is allowed to reoxidize under denaturing conditions such as exist in a solution of 8 molar urea, a mixture of products is obtained containing many or all of the possible 105 isomeric disulfide-bonded forms (schematically shown at the bottom right of Figure 2). This mixture is essentially inactive, having on the order of 1% the activity of the native enzyme. If the urea is removed and the "scrambled" protein is exposed to a small amount of a sulfhydryl group-containing reagent such as mercaptoethanol, disulfide interchange takes place and the mixture eventually is converted into a homogeneous product, indistinguishable from native ribonuclease. This process is driven entirely by the free energy of conformation that is gained in going to the stable, native structure.

Fig. 2. Schematic representation of the reductive denaturation, in 8 molar urea solution containing 2-mercaptoethanol, of a disulfide-cross-linked protein. The conversion of the extended, denatured form to a randomly cross-linked, "scrambled" set of isomers is depicted in the lower right portion of the figure. (Panel labels: Unfolding, with urea and mercaptoethanol; Refolding.)

These experiments, incidentally, also make unlikely a process of obligatory, progressive folding during the elongation of the polypeptide chain, during biosynthesis, from the NH2- to the COOH-terminus. The "scrambled" protein appears to be essentially devoid of the various aspects of structural regularity that characterize the
native molecule.

A disturbing factor in the kinetics of the process of renaturation of reduced ribonuclease, or of the "unscrambling" experiments described above, was the slowness of these processes, frequently hours in duration (11). It had been established that the time required to synthesize the chain of a protein like ribonuclease, containing 124 amino acid residues, in the tissues of a higher organism would be approximately 2 minutes (14, 15). The discrepancy between the in vitro and in vivo rates led to the discovery of an enzyme system in the endoplasmic reticulum of cells (particularly in those concerned with the secretion of extracellular, SS-bonded proteins) which catalyzes the disulfide interchange reaction and which, when added to solutions of reduced ribonuclease or to protein containing randomized SS bonds, catalyzed the rapid formation of the correct, native disulfide pairing in a period less than the requisite two minutes (16, 17). The above discrepancy in rates would not have been observed in the case of the folding of non-crosslinked structures and, as discussed below, such motile proteins as staphylococcal nuclease or myoglobin can undergo virtually complete renaturation in a few seconds or less.

The disulfide interchange enzyme subsequently served as a useful tool for the examination of the thermodynamic stability of disulfide-bonded protein structures. This enzyme, having a molecular weight of 42,000 and containing three half-cystine residues, one of which must be in the SH form for activity (18, 19), appears to carry out its rearranging activities on a purely random basis. Thus, a protein whose SS bonds have been deliberately broken and reformed in an incorrect way need only be exposed to the enzyme (with its essential half-cystine residue in the pre-reduced, SH form) and interchange of disulfide bonds occurs until the native form of the protein substrate is reached. Presumably, SS bonds occupying solvent-exposed or other thermodynamically unfavorable positions are constantly probed and
progressively replaced by more favorable half-cystine pairings, until the enzyme can no longer contact bonds because of steric factors, or because no further net decrease in conformational free energy can be achieved. Model studies on ribonuclease derivatives had shown that, when the intactness of the genetic message represented by the linear sequence of the protein was tampered with by certain cleavages of the chain, or by deletions of amino acids at various points, the added disulfide interchange enzyme, in the course of its "probing", discovered this situation of thermodynamic instability and caused the random reshuffling of SS bonds with the formation of an inactive cross-linked network of chains and chain fragments (e.g., 20). With two naturally occurring proteins, insulin and chymotrypsin, the interchange enzyme did, indeed, induce such a randomizing phenomenon (21). Chymotrypsin, containing three SS-bonded chains, is known to be derived from a single-chained precursor, chymotrypsinogen, by excision of two internal bits of sequence. The elegant studies of Steiner and his colleagues subsequently showed that insulin was also derived from a single-chained precursor, proinsulin (Figure 3), which is converted to the two-chained form, in which we normally find the active hormone, by removal of a segment from the middle of the precursor strand after formation of the 3 SS bonds (22). In contrast, the multichained immune globulins are not scrambled and inactivated by the enzyme, reflecting the fact that they are normal products of the disulfide bonding of 4 preformed polypeptide chains.

Fig. 3. The structure of porcine proinsulin (R. E. Chance, R. M. Ellis and W. W. Bromer, Science, 161, 165 (1968)). (Recoverable label: connecting peptide.)

FACTORS CONTRIBUTING TO THE CORRECT FOLDING OF POLYPEPTIDE CHAINS

The results with the disulfide interchange enzyme discussed above suggested that the correct and unique translation of the genetic message for a particular protein backbone is no longer possible when the linear
information has been tampered with by deletion of amino acid residues. As with most rules, however, this one is susceptible to many exceptions. First, a number of proteins have been shown to undergo reversible denaturation, including disulfide bond rupture and reformation, after being shortened at either the NH2- or COOH-terminus (23). Others may be cleaved into two (24, 25, 26), or even three, fragments which, although devoid of detectable structure alone in solution, recombine through noncovalent forces to yield biologically active structures with physical properties very similar to those of the parent protein molecules. Richards and his colleagues (24) discovered the first of these recombining systems, ribonuclease-S (RNase-S), which consists of a 20-residue fragment from the NH2-terminal end held by a large number of noncovalent interactions to the rest of the molecule, which consists of 104 residues and all four of the disulfide bridges. The work by Wyckoff, Richards and their associates on the three-dimensional structure of this two-fragment complex (27), and on the identification of many of the amino acid side chains that are essential for complementation, is classical, as are studies by Hofmann (28) and Scoffone (29) and their colleagues on semisynthetic analogues of this enzyme derivative.

Fig. 4. Covalent structure of the major extracellular nuclease of Staphylococcus aureus (32, 33).

Fig. 5. Isolation of semisynthetic nuclease-T on a phosphocellulose column following "functional" purification by trypsin digestion in the presence of calcium ions and thymidine-3',5'-diphosphate (41). (Horizontal axis: volume of eluate, ml.)

Studies in our own laboratory (30) showed that the 20-residue "RNase-S-peptide" fragment could be reduced by 5 residues at its COOH-terminus without loss of enzymic activity in
the complex, or of its intrinsic stability in solution.

Other examples of retention of native structural "memory" have been found with complexing fragments of the staphylococcal nuclease molecule (25, 31). This calcium-dependent, RNA and DNA cleaving enzyme (Figure 4) consists of 149 amino acids and is devoid of disulfide bridges and SH groups (32, 33). Although it exhibits considerable flexibility in solution, as evidenced by the ready exchange of labile hydrogen atoms in the interior of the molecule with solvent hydrogen atoms (34), only a very small fraction of the total population deviates from the intact, native format at any moment. Spectral and hydrodynamic measurements indicate marked stability up to temperatures of approximately 55°. The protein is greatly stabilized, both against hydrogen exchange (34) and against digestion by proteolytic enzymes (35), when calcium ions and the inhibitory ligand, 3',5'-thymidine diphosphate (pdTp), are added. Trypsin, for example, then only cleaves at very restricted positions: the loose amino-terminal portion of the chain, and a loop of residues that protrudes out from the molecule as visualized by X-ray crystallography. Cleavage occurs between lysine residues 5 and 6, and in the sequence -Pro-Lys-Lys-Gly- (residues 47 through 50) between residues 48 and 49 or 49 and 50 (25). The resulting fragments, (6-48) and (49-149) or (50-149), are devoid of detectable structure in solution (36). However, as in the case of RNase-S, when they are mixed in stoichiometric amounts, regeneration of activity (about 10%) and of native structural characteristics occurs (the complex is called nuclease-T). Nuclease-T has now been shown (37a) to be closely isomorphous with native nuclease (37b).

Fig. 6. Use of fluorescence measurements to determine the relative hydrophobicity (presumably reflecting "nativeness" in the case of nuclease) of the molecular environment of the single tryptophan residue in this protein (39, 41). (Axes: relative fluorescence intensity against wavelength of emission, mμ, from 300 to 400.)

Thus, the cleavages and
deletions do not destroy the geometric "sense" of the chain. Recently, it was shown that residue 149 may be removed by carboxypeptidase treatment of nuclease, and that residues 45 through 49 are dispensable, the latter conclusion the result of solid-phase synthetic studies (38) on analogues of the fragment (6-47).

Earlier studies by David Ontjes (39) had established that the rapid and convenient solid-phase method developed by Merrifield (40) for peptide synthesis could be applied to the synthesis of analogues of the (6-47) fragment of nuclease-T. The products, although contaminated by sizeable amounts of "mistake sequences" which lack an amino acid residue due to slight incompleteness of reaction during coupling, could be purified by ordinary chromatographic methods to a stage that permitted one to make definite conclusions about the relative importance of various components in the chain. Taking advantage of the limited proteolysis that occurs when nuclease is treated with trypsin in the presence of the stabilizing ligands, calcium and pdTp, Chaiken (41) was able to digest away those aberrant synthetic molecules of (6-47) that did not form a stable complex with the large, native fragment, (49-149). After digestion of the complex, chromatography on columns of phosphocellulose (Figure 5) yielded samples of semisynthetic nuclease-T that were essentially indistinguishable from native nuclease-T. For example, the large enhancement of fluorescence of the single tryptophan residue in nuclease (located at position 140 in the (50-149) fragment) upon addition of the native (6-49) fragment was also shown when, instead, synthetic (6-47) peptide isolated from semisynthetic nuclease-T that had been purified as described above was added (Figure 6).

Fig. 7. Amino acid residues in the sequence of nuclease that are of particular importance in the catalytic activity and binding of substrate and calcium ions (42). (Recoverable labels: Tyr 113, Arg 35, Asp 21, Asp 19.)

The dispensability or replaceability of a number of residues to the stability of the nuclease-T
complex was established by examining the fluorescence, activity, and stability to enzymatic digestion of a large number of semisynthetic analogues (42). As illustrated in Figure 7, interaction with the calcium atom required for nuclease activity normally requires the participation of four dicarboxylic amino acids. Although the activities of complexes containing synthetic (6-47) fragments in which one of these had been replaced with an asparagine or glutamine residue were abolished (with one partial exception, asparagine at position 40), three-dimensional structure and complex stability was retained for the most part. Similarly, replacement of arginine residue 35 with lysine yielded an inactive complex, but nevertheless one with strong three-dimensional similarity to native nuclease-T.

A second kind of complementing system of nuclease fragments (31) consists of tryptic fragment (1-126) and a partially overlapping section of the sequence, (99-149), prepared by cyanogen bromide treatment of the native molecule (shown schematically in Figure 8). These two peptides form a complex with about 15% the activity of nuclease itself, which is sufficiently stable in the presence of pdTp and calcium ions to exhibit remarkable resistance to digestion by trypsin.

Fig. 8. A schematic view of the three-dimensional structure of staphylococcal nuclease (37b, 53).

Thus, many of the overlapping residues in the complex (1-126):(99-149) may be trimmed away with the production of a derivative, (1-126):(111-149). Further degradation of each of the two components, the former with carboxypeptidases A and B and the latter with leucine aminopeptidase, permits the preparation of (1-124):(114-149), which is as active and as structurally similar to native nuclease (as evidenced by estimates of hydrodynamic, spectral and helical properties) as the parent, undegraded complex. A number of synthetic analogues of the (114-149) sequence have been prepared (43), which also exhibit activity and "native" physical properties when added to (1-126). I will discuss
below the manner in which these complexing fragments have been useful in devising experiments to study the processes of nucleation and folding of polypeptide chains.

MUTABILITY OF INFORMATION FOR CHAIN FOLDING

Biological function appears to be more a correlate of macromolecular geometry than of chemical detail. The classic chemical and crystallographic work on the large number of abnormal human hemoglobins, the species variants of cytochrome c and other proteins from a very large variety of sources, and the isolation of numerous bacterial proteins after mutation of the corresponding genes have made it quite clear that considerable modification of protein sequence may be made without loss of function. In those cases where crystallographic studies of three-dimensional structure have been made, the results indicate that the geometric problem of "designing", through natural selection, molecules that can subserve a particular functional need can be solved in many ways. Only the geometry of the protein and its active site need be conserved, except, of course, for such residues as actually participate in a unique way in a catalytic or regulatory mechanism (44). Studies of model systems have led to similar conclusions. In our own work on ribonuclease, for example, it was shown that fairly long chains of poly-D,L-alanine could be attached to eight of the eleven amino groups of the enzyme without loss of enzyme activity (45). Furthermore, the polyalanylated enzyme could be converted to an extended chain by reduction of the four SS bridges in 8 M urea, and this fully denatured material could then be reoxidized to yield the active, correctly folded starting substance. Thus, the chemistry of the protein could be greatly modified, and its capacity to refold after denaturation seemed to be dependent only on internal residues and not those on the outside, exposed to solvent. This is, of course, precisely the conclusion reached by Perutz and his colleagues (46) and by others (47) who have reviewed and correlated the
data on various protein systems. Mutation and natural selection are permitted a high degree of freedom during the evolution of species, or during accidental mutation, but a limited number of residues, destined to become involved in the internal, hydrophobic core of proteins, must be carefully conserved (or at most replaced with other residues with a close similarity in bulk and hydrophobicity).

THE COOPERATIVITY REQUIRED FOR FOLDING AND STABILITY OF PROTEINS

The examples of noncovalent interaction of complementing fragments of proteins quoted above give strong support to the idea of the essentiality of cooperative interactions in the stability of protein structure. As in the basic rules of languages, an incomplete sentence frequently conveys only gibberish. There appears to exist a very fine balance between stable, native protein structure and random, biologically meaningless polypeptide chains.

A very good example of the inadequacy of an incomplete sequence comes from our observations on the nuclease fragment (1-126). This fragment contains all of the residues that make up the active center of nuclease. Nevertheless, this fragment, representing about 85% of the total sequence of nuclease, exhibits only about 0.12% the activity of the native enzyme (48).

Fig. 9. Changes in reduced viscosity and molar ellipticity at 220 nm during the acid-induced transition from native to denatured nuclease. Separate symbols mark reduced viscosity and molar ellipticity at 220 nm, and distinguish measurements made during the addition of acid from measurements made during the addition of base. (A. N. Schechter, H. F. Epstein and C. B. Anfinsen, unpublished results.)

The further addition of 23 residues during biosynthesis, or the addition, in vitro, of residues 99-149 as a complementing fragment (31), restores the stability required for activity to this unfinished gene translation. The transition from
incomplete, inactive enzyme with random structure to competent enzyme with unique and stable structure is clearly a delicately balanced one.

The sharpness of this transition may be emphasized by experiments of the sort illustrated in Figure 9. Nuclease undergoes a dramatic change from native globular structure to random, disoriented polypeptide over a very narrow range of pH, centered at pH 3.9. The transition has the appearance of a "two-state" process, either all native or all denatured, and, indeed, two-state mathematical treatment has classically been employed to describe such data. In actuality, it has been possible to show by NMR and spectrophotometric experiments (49) that one of the 4 histidines and one tyrosine residue of the 7 in nuclease become disoriented before the general and sudden disintegration of organized structure. However, such evidences of a stepwise denaturation and renaturation process are certainly not typical of the bulk of the cooperatively stabilized molecule.

The experiments in Figure 9, involving measurements of intrinsic viscosity and helix-dependent circular dichroism, are typical of those obtained with most proteins. In the case of nuclease, not only is the transition from native to denatured molecule, during transfer from solution at pH 3.2 to 6.7, very abrupt, but the process of renaturation occurs over a very short time period. I will not discuss these stop-flow kinetic experiments (50) in detail in this lecture. In brief, the process can be shown to take place in at least two phases: an initial rapid nucleation and folding with a half-time of about 50 milliseconds, and a second, somewhat slower transformation with a half-time of about 200 milliseconds. The first phase is essentially temperature independent (and therefore possibly entropically driven) and the second temperature dependent.

NUCLEATION OF FOLDING

A chain of 149 amino acid residues with two rotatable bonds per residue, each bond probably having 2 or 3 permissible or favored orientations, would be able to assume on
the order of $4^{149}$ to $9^{149}$ different conformations in solution. The extreme rapidity of the refolding makes it essential that the process take place along a limited number of "pathways", even when the statistics are severely restricted by the kinds of stereochemical ground rules that are implicit in a so-called Ramachandran plot. It becomes necessary to postulate the existence of a limited number of allowable initiating events in the folding process. Such events, generally referred to as nucleations, are most likely to occur in parts of the polypeptide chain that can participate in conformational equilibria between random and cooperatively stabilized arrangements. The likelihood of a requirement for cooperative stabilization is high because, in aqueous solution, ionic or hydrogen-bonded interactions would not be expected to compete effectively with interactions with solvent molecules, and anything less than a sizeable nucleus of interacting amino acid side chains would probably have a very short lifetime.

Fig. 10. How protein chains might fold (see the text for a discussion of this fairly reasonable, but subjective proposal). (Recoverable labels: native format, random, NH2, COOH.)

Furthermore, it is important to stress that the amino acid sequences of polypeptide chains designed to be the fabric of protein molecules only make functional sense when they are in the three-dimensional arrangement that characterizes them in the native protein structure. It seems reasonable to suggest that portions of a protein chain that can serve as nucleation sites for folding will be those that can "flicker" in and out of the conformation that they occupy in the final protein, and that they will form a relatively rigid structure, stabilized by a set of cooperative interactions. These nucleation centers, in what we have termed their "native format" (Figure 10), might be expected to involve such potentially self-dependent substructures as helices,
pleated sheets, or beta-bends. Unfortunately, the methods that depend upon hydrodynamic or spectral measurements are not able to detect the presence of these infrequent and transient nucleations. To detect the postulated "flickering equilibria" and to determine their probable lifetimes in solution requires indirect methods that will record the brief appearance of individual "native format" molecules in the population under study. One such method, recently used in our laboratory in a study of the folding of staphylococcal nuclease and its fragments, employs specific antibodies against restricted portions of the amino acid sequence (51).

Figure 8 depicts the three-dimensional pattern assumed by staphylococcal nuclease in solution. Major features involving organized structure are the three-stranded antiparallel pleated sheet approximately located between residues 12 and 35, and the three alpha-helical regions between residues 54-67, 99-106, and 121-134. Antibodies against specific regions of the nuclease molecule were prepared by immunization of goats with either polypeptide fragments of the enzyme or by injection of the intact, native protein with subsequent fractionation of the resulting antibody population on affinity chromatography columns consisting of agarose bearing the covalently attached peptide fragment of interest (51, 52). In the former manner there was prepared, for example, an antibody directed against the polypeptide, residues 99-149, known to exist in solution as a random chain without the extensive helicity that characterizes this portion of the nuclease chain when present as part of the intact enzyme.

[Fig. 11. Conformational specificity of inactivating antibodies (activity versus time in minutes). Inhibition of nuclease activity by anti-(99-149)_n, and lack of inhibition by anti-(99-149)_r made against the peptide (99-149), presumably in a random conformation (51).]

Such an antibody preparation is referred to as anti-(99-149)_r, the
subscript indicating the disordered state of the antigen. When, on the other hand, a fraction of anti-native-nuclease serum isolated on an agarose-nuclease column was further fractionated on agarose-(99-149), a fraction was obtained which was specific for the sequence (99-149), but presumably only when this bit of sequence occupied the "native format". This conclusion is based on the observation that this fraction, termed anti-(99-149)_n (the subscript n referring to the native format), exhibited a strong inhibitory effect on the enzymic activity of nuclease, whereas anti-(99-126)_r or anti-(99-149)_r were devoid of such an effect (see Figure 11). The conclusion was further supported by the observation that the conformation-stabilizing ligands, pdTp and calcium ions, showed a marked inhibitory effect on the precipitability of nuclease by anti-(1-126)_r and anti-(99-149)_r, but had little effect, if any, on such precipitability by anti-(1-149)_n (51). This finding reinforced the idea that many of the antigenic determinants recognized by the anti-fragment antibodies are present only in the "unfolded" or "non-native" conformation of nuclease.

The reaction between anti-(99-149)_n and nuclease could be shown, by measurements of changes in the kinetics of inhibition of enzyme activity (Fig. 12), to be extremely rapid; $k_{\text{off}}$, on the other hand, is negligibly small.

[Fig. 12. Inhibition of antibody-induced inactivation. Semilogarithmic plot of activity vs. time (seconds) for assays of 0.05 µg of nuclease in the presence of: no antibody; 6 µg of anti-(99-126)_n; 6 µg of anti-(99-126)_n plus 12 µg of (99-149); and 6 µg of anti-(99-126)_n plus 48 µg of (99-149). The dotted line represents one-half of the initial activity.]

The system may be described by two simultaneous equilibria, the first concerned with the "flickering" of fragment (99-149), which we shall term P, from random to native
format, and the second with the association of anti-(99-149)_n, which we shall term simply Ab, with fragment P in its native format, i.e., P_n:

$$P_r \rightleftharpoons P_n, \qquad K_{\text{conf}} = \frac{[P_n]}{[P_r]}$$

$$Ab + P_n \rightleftharpoons AbP_n, \qquad K_{\text{assoc}} = \frac{[AbP_n]}{[Ab][P_n]}$$

$$K_{\text{conf}} = \frac{[AbP_n]}{K_{\text{assoc}}\,[Ab]\,[P_r]}$$

Two equilibria involving fragment (99-149) of nuclease, with the corresponding equilibrium-constant expressions.

The amount of unbound antibody in the second equilibrium may be estimated from measurements of the kinetics of inactivation of the digestion of denatured DNA substrate by a standard amount of nuclease added to the preincubated mixture of fragment (99-149) and anti-(99-149)_n. Making the assumption that the affinity of anti-(99-149)_n for (99-149)_n in its folded (P_n) form is the same as that determined for this antigenic determinant in native nuclease, the value for the term K_conf may be calculated from measurable parameters. A series of typical values, shown in Table 1, suggests that approximately 0.02% of fragment (99-149) exists in the native format at any moment. Such a value, although low, is very large relative to the likelihood of a peptide fragment of a protein being found in its native format on the basis of chance alone.

Table 1. Studies of the equilibrium between the peptide fragment (99-149) in its random form, (99-149)_r, and in the form this fragment assumes in the native structure of nuclease, (99-149)_n. Abbreviations: P, fragment (99-149); Ab, antibody. K_conf = [AbP_n] / (K_assoc [Ab] [P_r]).

    [Ab] total sites   [P_r]   t_1/2   [Ab] free sites   [Ab] bound sites   K_conf          % P_n
    0.076              0       18      0.076             0                  -               -
    0.076              0.65    20      0.068             0.0080             2.20 x 10^-4    0.022
    0.076              2.0     24      0.057             0.019              2.02 x 10^-4    0.020
    0.076              2.6     27      0.051             0.025              2.29 x 10^-4    0.023
    0.076              7.8     35      0.039             0.037              1.47 x 10^-4    0.015
    0.076              6.5     33      0.042             0.034              1.51 x 10^-4    0.015

    K_conf = (2.0 ± 0.4) x 10^-4

Empirical considerations of the large amount of data now available on correlations between sequence and three-dimensional structure (54), together with an increasing sophistication in the theoretical treatment of the energetics of polypeptide chain folding (55), are beginning to make more realistic the idea of the a priori
prediction of protein conformation. It is certain that major advances in the understanding of cellular organization, and of the causes and control of abnormalities in such organization, will occur when we can predict, in advance, the three-dimensional, phenotypic consequences of a genetic message.

BIBLIOGRAPHY

1. Anson, M. L., Advan. Protein Chem. 2, 361 (1945).
2. Lumry, R. and Eyring, H., J. Phys. Chem. 58, 110 (1954).
3. Hirs, C. H. W., Moore, S. and Stein, W. H., J. Biol. Chem. 235, 633 (1960).
4. Potts, J. T., Berger, A., Cooke, J. and Anfinsen, C. B., J. Biol. Chem. 237, 1851 (1962).
5. Smyth, D. G., Stein, W. H. and Moore, S., J. Biol. Chem. 238, 227 (1963).
6. Sela, M., White, F. H. and Anfinsen, C. B., Science 125, 691 (1957).
7. Gutte, B. and Merrifield, R. B., J. Biol. Chem. 246, 1922 (1971).
8. Hirschmann, R., Nutt, R. F., Veber, D. F., Vitali, R. A., Varga, S. L., Jacob, T. A., Holly, F. W. and Denkewalter, R. G., J. Am. Chem. Soc. 91, 507 (1969).
9. White, F. H., Jr. and Anfinsen, C. B., Ann. N. Y. Acad. Sci. 81, 515 (1959).
10. White, F. H., Jr., J. Biol. Chem. 236, 1353 (1961).
11. Anfinsen, C. B., Haber, E., Sela, M. and White, F. H., Jr., Proc. Nat. Acad. Sci. U. S. 47, 1309 (1961).
12. Haber, E. and Anfinsen, C. B., J. Biol. Chem. 237, 1839 (1962).
13. Epstein, C. J., Goldberger, R. F. and Anfinsen, C. B., Cold Spring Harbor Symp. Quant. Biol. 28, 439 (1963).
14. Dintzis, H. M., Proc. Nat. Acad. Sci. U. S. 47, 247 (1961).
15. Canfield, R. E. and Anfinsen, C. B., Biochemistry 2, 1073 (1963).
16. Goldberger, R. F., Epstein, C. J. and Anfinsen, C. B., J. Biol. Chem. 238, 628 (1963).
17. Venetianer, P. and Straub, F. B., Biochim. Biophys. Acta 67, 166 (1963).
18. Fuchs, S., DeLorenzo, F. and Anfinsen, C. B., J. Biol. Chem. 242, 398 (1967).
19. DeLorenzo, F., Goldberger, R. F., Steers, E., Givol, D. and Anfinsen, C. B., J. Biol. Chem. 241, 1562 (1966).
20. Kato, I. and Anfinsen, C. B., J. Biol. Chem. 244, 5849 (1969).
21. Givol, D., DeLorenzo, F., Goldberger, R. F. and Anfinsen, C. B., Proc. Nat. Acad. Sci. U. S. 53, 766 (1965).
22. Steiner, D. F., Trans. N. Y. Acad. Sci. Ser. II 30, 60 (1967).
23. Anfinsen, C. B., Developmental Biology Supplement 2, 1 (1968), Academic Press Inc., U.S.A., 1968.
24. Richards, F. M., Proc. Nat. Acad. Sci. U. S. 44, 162 (1958).
25. Taniuchi, H., Anfinsen, C. B. and Sodja, A., Proc. Nat. Acad. Sci. U. S. 58, 1235 (1967).
26. Kato, I. and Tominaga, N., FEBS Letters 10, 313 (1970).
27. Wyckoff, H. W., Tsernoglou, D., Hanson, A. N., Knox, J. R., Lee, B. and Richards, F. M., J. Biol. Chem. 245, 305 (1970).
28. Hofmann, K., Finn, F. M., Linetti, M., Montibeller, J. and Zanetti, C., J. Amer. Chem. Soc. 88, 3633 (1966).
29. Scoffone, E., Rocchi, R., Marchiori, F., Moroder, L., Marzotto, A. and Tamburro, A. M., J. Amer. Chem. Soc. 89, 5450 (1967).
30. Potts, J. T., Jr., Young, D. M. and Anfinsen, C. B., J. Biol. Chem. 238, 2593 (1963).
31. Taniuchi, H. and Anfinsen, C. B., J. Biol. Chem. 246, 2291 (1971).
32. Cone, J. L., Cusumano, C. L., Taniuchi, H. and Anfinsen, C. B., J. Biol. Chem. 246, 3103 (1971).
33. Bohnert, J. L. and Taniuchi, H., J. Biol. Chem. 247, 4557 (1972).
34. Schechter, A. N., Moravek, L. and Anfinsen, C. B., J. Biol. Chem. 244, 4981 (1969).
35. Taniuchi, H., Moravek, L. and Anfinsen, C. B., J. Biol. Chem. 244, 4600 (1969).
36. Taniuchi, H. and Anfinsen, C. B., J. Biol. Chem. 244, 3864 (1969).
37. Taniuchi, H., Davies, D. and Anfinsen, C. B., J. Biol. Chem. 247, 3362 (1972).
37a. Arnone, A., Bier, C. J., Cotton, F. A., Hazen, E. E., Jr., Richardson, D. C., Richardson, J. S. and Yonath, A., J. Biol. Chem. 246, 2302 (1971).
38. Sanchez, G. R., Chaiken, I. M. and Anfinsen, C. B., J. Biol. Chem. (1973), in press.
39. Ontjes, D. and Anfinsen, C. B., J. Biol. Chem. 244, 6316 (1969).
40. Merrifield, R. B., Science 150, 178 (1965).
41. Chaiken, I. M., J. Biol. Chem. 246, 2948 (1971).
42. Chaiken, I. M. and Anfinsen, C. B., J. Biol. Chem. 246, 2285 (1971).
43. Parikh, I., Corley, L. and Anfinsen, C. B., J. Biol. Chem. 246, 7392 (1971).
44. Fitch, W. M. and Margoliash, E., Evolutionary Biology 4, 67 (1970). Edited by Th. Dobzhansky, M. K. Hecht and W. C. Steere, Appleton-Century-Crofts, New York, 1970.
45. Cooke, J. F., Anfinsen, C. B. and Sela, M., J. Biol. Chem. 238, 2034 (1963).
46. Perutz, M. F., Kendrew, J. C. and Watson, H. C., J. Mol. Biol. 13, 669 (1965).
47. Epstein, C. J., Nature 210, 25 (1966).
48. Sachs, D., Taniuchi, H., Schechter, A. N. and Eastlake, A., unpublished work.
49. Epstein, H. F., Schechter, A. N. and Cohen, J. S., Proc. Nat. Acad. Sci. U. S. 68, 2042 (1971).
50. Epstein, H. F., Schechter, A. N., Chen, R. F. and Anfinsen, C. B., J. Mol. Biol. 60, 499 (1971).
51. Sachs, D. H., Schechter, A. N., Eastlake, A. and Anfinsen, C. B., Proc. Nat. Acad. Sci. U. S. 69, 3790 (1972).
52. Sachs, D. H., Schechter, A. N., Eastlake, A. and Anfinsen, C. B., J. Immunol. 109, 1300 (1972).
53. Sachs, D. H., Schechter, A. N., Eastlake, A. and Anfinsen, C. B., Biochemistry 11, 4268 (1972).
54. Anfinsen, C. B. and Scheraga, H., Adv. in Prot. Chem., Vol. 27 (1973), in
preparation.
55. Scheraga, H. A., Chemical Reviews 71, 195 (1971).

Bioinformatics 2: Molecular Modeling
Intro to protein and amino acid structure

Purpose of the course
Learn the theory and application of modeling methods.
Understand the successes, failures, limitations and shortcomings of modeling algorithms.
Recognize a good versus a bad model, and know how to fix errors.
Draw biological conclusions from molecular models.
Design experiments based on modeling.

Problems addressable by molecular modeling
What does B look like?
Does A bind to B?
What molecules bind to B?
How would the structure of B change if ...?
What molecules bind to B but not to C?

Sequence determines structure. Structure determines function.
[Diagram: sequence database, force fields, protein structure database, experimental data, function.]

What a protein looks like
MSAIQASWPSGTECIAKYNFHGTAEQDLPFCKGDVLTIVAVTKDPNWYKAKNKVGREGIIPANYVQKREGV
Evolutionary models (i.e., alignment) consider a protein to be a string of characters, similar to DNA. Why?
Because evolution happens at the DNA level.
Because of simplicity.
Because, in theory, this is all the information you need to make a protein.

What a protein really looks like
Proteins fold into compact, solvent-excluded globules (mostly).
Most atoms are highly ordered; some are flexible (disordered).
Proteins can be completely solvated, or they can be embedded in a membrane.
Proteins may contain ligands or covalent modifications.

... or, if you prefer
[Figure: three renderings of the same protein: a backbone trace colored from N-terminus to C-terminus, a cartoon showing secondary structure, and a ball-and-stick model showing covalent bonds.]

The backbone atoms
[Figure: the polypeptide backbone with sidechains R1 to R3; the peptide bond is a partial double bond.]
Backbone atom names are N, C-alpha and C (or N, CA, C). O is also considered a backbone atom, though not in the covalent path of the chain. All atoms in all amino acids have conventional names.

C-alpha is chiral (except for glycine)
When an L-amino acid is drawn with the alpha-H forward and the R-group in the back, the letters, read clockwise, spell CORN. "The Corn Crib" is a good
way to remember which side the R-group (i.e., sidechain) goes on.

Hydrogen bonds
Donor and acceptor must be electronegative atoms (N or O).
Acceptor must have a free electron pair.
Hydrogen bonds are a linear arrangement of three atoms: two electronegative (O or N) and an electropositive hydrogen in the middle. The atoms are closer together than expected for a non-bonded interaction, but not close enough for a covalent interaction.

Typical backbone H-bond
[Figure: a backbone C=O accepting an H-bond from a backbone N-H. These atoms are arranged in a line, approximately.]
H-bonds between backbone atoms define secondary structure.

Alpha helix
Right-handed helix.
H-bond is from the oxygen at i to the nitrogen at i+4.
α helices have an overall dipole because the H-bonds are all in the same direction.
Helices are not as cylindrical as the cartoon suggests.

Beta sheets
[Figure: paired strands with the chain direction marked on each.]
Two types of pairings: parallel and antiparallel.
R-groups lie above and below the plane of the sheet.

Antiparallel beta sheet
[Figure: antiparallel sheet viewed on end.]
Looked at on end, the peptide bonds form a zigzag. Parallel sheets look the same.

Sidechains
ACDEFGHIKLMNPQRSTVWY
There are 20 natural amino acids, meaning those recognized by tRNA synthetases (some say selenocysteine is also a natural amino acid; I don't know). The chemical nature of these 20 sidechains accounts for the folding and function of proteins. Memorize them.

Cartesian coordinates have an implied reference frame

                                 X        Y        Z
    ATOM   1  N    VAL   1     0.616    1.613   20.826   1.00  68.81   8DFR 152
    ATOM   2  CA   VAL   1     0.737    1.197   19.414   1.00  65.36   8DFR 153
    ATOM   3  C    VAL   1     0.597    2.511   18.644   1.00  62.65   8DFR 154
    ATOM   4  O    VAL   1     1.207    3.526   18.989   1.00  65.13   8DFR 155
    ATOM   5  CB   VAL   1     1.994    0.410   19.048   1.00  67.55   8DFR 156
    ATOM   6  CG1  VAL   1     2.452   -0.572   20.132   1.00  68.01   8DFR 157
    ATOM   7  CG2  VAL   1     3.154    1.279   18.586   1.00  66.94   8DFR 158

Å = angstrom = 10^-10 m. 1 Å = 0.1 nm.
Coordinates are relative
to a reference frame.
X-ray crystallography solves structures in Cartesian coordinates.

Internal coordinates are independent of reference frame
Internal coordinates model the covalent structure of the molecule.
Components:
bond lengths
bond angles
torsion (dihedral) angles
planar groups
pairwise distances
[Figure: a C-C bond of 1.54 Å and a bond angle of 109°.]
NMR structures are solved in internal coordinates.

Short peptides can be expressed as a set of torsion angles
[Table: internal coordinates (torsion angles per residue) of a 5-residue peptide, beside the Cartesian coordinates of the same peptide, a type I β-hairpin. The numeric entries are not recoverable.]
If there are sufficient constraints, then internal coordinates may be converted to Cartesian coordinates.

Measuring a torsion angle by eye
[Figure: examples of +90° and -90° torsion angles.]
Use the right-hand rule. Positive torsion is in the direction of the fingers on the thumb side of the rotation axis.

In-class exercise
Make a right-handed 60° torsion angle using a paperclip. Compare with your neighbor by superposition. Are they superimposable?
If your paperclip was the alanine backbone, where would Cβ be?
Is +60° more or less stable than -60° for Ala? What about for Gly?

Converting internal coordinates to Cartesian
These two molecules have identical torsion angles and only slight differences in backbone bond lengths and bond angles.
Errors accumulate.

Before Cartesian cartography: internal coordinates versus global coordinates
A. Vespucci's map of the world, made before J. Harrison's clock (1735) using internal coordinates, and a map made after, using global coordinates.

In-class exercise: viewing and drawing amino acids
Use RasMol to open a protein file.
Select each of the amino acids, one at a time; draw them accurately and label rotatable bonds, H-bond donors and acceptors, and chiral centers.
Do all 20 amino acids.
Rank the amino acids from polar to nonpolar. Make educated guesses.

RasMol: how to select one amino acid
To open file XXXX.pdb in RasMol, type
    rasmol XXXX.pdb
In RasMol (the RasMol prompt is >):
    > restrict ala
(or any amino acid 3-letter code). Click on one
of the displayed residues.
    > restrict nnn
(where nnn is the residue number). To return to the whole protein:
    > select protein
    > wireframe 100
(or select Display > Sticks).

Drawing amino acids
Draw using approximately correct bond angles.
Draw only polar hydrogens (H-bond donors).
Mark each H-bond acceptor.
Circle each chiral center.
Draw a rotating arrow around each rotatable bond.
Rank the amino acids from polar to nonpolar and small to large.
[Figure: drawing tryptophan.]

Bioinformatics 2, lecture 8
Energy minimization, Molecular dynamics, Monte Carlo, Simulated annealing
Choose term projects

Hypothesis-driven modeling
Modeling is all about experimental design.
Ask a question, get an answer.
The value of the answer depends on how well designed the question is.

An example of a molecular modeling hypothesis
Ligand A binds to protein X.
How to test this hypothesis:
Build a model for protein X.
Try to fit ligand A into a site on X.
Energy minimize.
Calculate the energy.
Simulate binding.
Calculate the on-rate.
Calculate the predicted Kd.
Etc.

Validating the method
Modeling "experiments" can answer questions. But how good is the answer?
To validate the method, we "predict" whether some hypothesis is true or false.
We do this several times, and we ask how many times we are right.
The accuracy of a method in validation studies is the "confidence".

Simulation methods: step size
Time is continuous but simulations are discrete. The smaller the step size (time unit), the more closely it approximates the true dynamics. But it takes more compute power to calculate.
[Figure: with a large step size, atoms pass through each other; with a small step size (much less than the atom size), atoms collide; true molecular dynamics is continuous.]

Energy minimization finds the local minimum in an "energy landscape"
Q: What is an energy landscape?
A: The directions in the landscape are the variable parameters. The height is the energy.
Q: What are the variable parameters?
A: This depends on the experimental design. It could be all atom positions (high dimension) or just one
distance.

Example: energy landscape for an ion in an electric field
[Figure: an ion between two membranes; energy scale from 0 eV to 50 eV.]
The variable parameters are x, y, z for the calcium ion. The energy landscape is calculated by placing the ion at every position and calculating E using a force field such as AMBER.

Monte Carlo simulation
Also called the Metropolis algorithm. Given an energy landscape (or a force field, which is simply a way to calculate the energy landscape) and a temperature T:
1. Make a random change (move).
2. Calculate the new energy and the old energy.
3. If the new energy is lower, accept the move.
4. If the new energy is higher, calculate $P = e^{-\Delta E/T}$.
5. Pick a random number between 0 and 1. If the number is less than P, keep the move; otherwise reject it.
6. Repeat.
Note: ΔE is the new energy minus the old energy.

[Figures: an energy landscape for an ion; a 2D landscape for an amino acid.]
An MC simulation will sample the energy landscape.

In-class exercise: Monte Carlo simulation
Work in groups of three.
Go to random.org and generate a page of random numbers in the range 1 to 4. In a separate window, generate a page of random numbers in the range 1 to 100.
Get a piece of graph paper with a diagonal line and a box that extends 3 squares above the line. Each square in the graph paper has an energy proportional to the Y-coordinate. Any point off of the graph, below the line, or on the barrier has an infinite energy.
Choose a T: 0.5, 1.0, 2.0 or 4.0 (each group chooses a different T). Calculate the acceptance cutoff x = 100 exp(-1/T).
Student 1: Start with the atom (pencil) in the middle of the graph and move according to instructions from Student 2, unless Student 3 says "reject".
Student 2: Select a random direction using random number range 1-4: 1 = north, 2 = east, 3 = south, 4 = west.
Student 3: If the energy goes up (1 = north), choose a random number 1-100. If it is greater than x, instruct Student 1 to reject (go back); otherwise "accept".

Probability of crossing a barrier is a
function of temperature but not step size
Which is more probable?
Probability of accepting 1 step of 1 square: $P_1 = e^{-\Delta E/T}$
Probability of accepting 3 steps of 1 square each: $P_1^3 = e^{-\Delta E/T} \cdot e^{-\Delta E/T} \cdot e^{-\Delta E/T}$
Probability of accepting 1 step of 3 squares: $P_3 = e^{-3\Delta E/T}$

Molecular Dynamics
"MD" attempts to simulate the motions of atoms subject to forces using Newton's Second Law.
Reminder of the "players" involved: bond stretching, bond bending, torsion, van der Waals, Coulombic. Other forces may also be used: H-bonds, mean field.

Newton's Second Law: F = ma
Force equals mass times acceleration. Force units: N = kg·m/s².
Two atoms attract or repel, causing motion along the vector joining the atoms.

Summing forces = summing vectors
Two atoms attract or repel, causing motion along the vector joining the atoms (red). The force on the center atom is the sum of the three vectors (blue). The actual amount the atom moves depends on its mass (m) and the time step.

van der Waals (VDW) energy & force

$$E = \varepsilon\left(\frac{r^{12}}{x^{12}} - \frac{r^{6}}{x^{6}}\right)$$

$$F = -\frac{dE}{dx} = \varepsilon\left(\frac{12\,r^{12}}{x^{13}} - \frac{6\,r^{6}}{x^{7}}\right)$$

where x is the distance between the two atoms and $r = r_1 + r_2$, where $r_1$ and $r_2$ are the VDW radii.
Typical VDW radii: oxygen 1.52 Å, nitrogen 1.55 Å, carbon 1.70 Å, hydrogen 1.2 Å, sulfur 1.80 Å.
For example, an oxygen and a nitrogen have zero VDW energy at a distance of 3.07 Å.

Exercise: calculate the VDW force vector
[Figure: worksheet with black, blue and red atoms on a ruled grid; the printed parameter values are not legible.]
Measure the distances. Calculate the force.

Peptides sampling an energy landscape using MD (movies not shown).

Interpreting an MD trajectory
A simulation is sampled at regular intervals. The samples may be clustered. Each cluster represents a small region of the energy landscape.
The larger the cluster, the lower the free energy at that point in the landscape.
The number of times the simulation crossed from one cluster to another is a measure of the energy barrier.

The special case of an "equilibrium" simulation
[Figure: cluster number versus time for several peptide simulations.]
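The link between cluster size and free energy noted under "Interpreting an MD trajectory" can be made quantitative with the Boltzmann relation. The sketch below is an illustration only, not part of the course materials. It assumes the trajectory has already been clustered into one integer label per sampled frame, and that the run is long enough for cluster populations to approximate equilibrium probabilities. The function names, the sample labels and the value of kT (0.593 kcal/mol, roughly 298 K) are all assumptions made for this example.

```python
import math
from collections import Counter

KT_KCAL_PER_MOL = 0.593  # assumed value of k_B * T near 298 K


def relative_free_energies(labels, kt=KT_KCAL_PER_MOL):
    """Return {cluster: dG} relative to the most populated cluster.

    Uses dG_i = kT * ln(n_max / n_i): the largest cluster gets dG = 0
    and smaller clusters get a positive (less favorable) free energy.
    """
    counts = Counter(labels)
    n_max = max(counts.values())
    return {c: kt * math.log(n_max / n) for c, n in counts.items()}


def count_transitions(labels):
    """Count how often consecutive frames fall in different clusters."""
    crossings = Counter()
    for a, b in zip(labels, labels[1:]):
        if a != b:
            crossings[(a, b)] += 1
    return crossings


# Hypothetical cluster assignment for 12 evenly spaced frames.
frames = [0, 0, 0, 1, 1, 0, 0, 0, 2, 0, 0, 1]
dg = relative_free_energies(frames)
for cluster in sorted(dg):
    print(f"cluster {cluster}: dG = {dg[cluster]:.2f} kcal/mol")
print(dict(count_transitions(frames)))
```

Few crossings between two well-populated clusters would suggest a high barrier between them, but a real analysis would need many more frames than this toy list.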
When is a simulation at "equilibrium"?
When the second half of the simulation looks identical to the first half.

What happens if we sample this landscape at different temperatures?
[Figure: energy landscape with a local minimum (misfolded) and the global minimum.]

Heat flattens the landscape, lowers barriers
[Figure: the same landscape at higher temperature, with the local minimum (misfolded) and the global minimum marked.]

Simulated annealing = cooling
[Figure: the landscape at successively lower temperatures.]

Types of MD simulations
Constrained: rigid-body, torsion space, molten zone, fixed trajectory, etc. (invent your own).
Unconstrained: one temperature, equilibrium, simulated annealing, temperature exchange.

Temperature exchange
Parallel simulations as an alternative to simulated annealing.
T = 273, 283, 293, 303, 313, 323, 333
Each computer runs a simulation and periodically queries its neighbor about the energy of its simulation. If the energy of the lower-T simulation is higher than the energy of the higher-T simulation, the simulations are swapped. The higher-energy simulation goes to the higher temperature; the lower-energy one goes to the lower temperature.
if E_lowT > E_highT then trade jobs (Monte Carlo option)

Effect of temperature exchange is to flatten the barriers, but not change the minima
Normal one-temperature landscape.
High T, high energy: sampled as if flatter.
Low T, low energy: unchanged.

Periodic boxes are used to create a continuous system with no edges (pictures cannot be printed).

Choose term projects
Form groups.
Choose a topic that interests you, or choose one from the list.
NOTE the deadlines for completing different stages.

Bioinformatics 2, lecture 10
Ramachandran angles, Sidechain chi angles, Rotamers, Dead End Elimination Theorem

Backbone angles phi and
psi
In 1968 G. N. Ramachandran built a model like this: Ala-Ala-Ala, with two freely rotatable backbone angles, phi and psi. Atom-atom distances that were too close were not permissible. What angles were permissible?

Ramachandran Plot
[Figure: psi versus phi, each from -180 to 180, with "Best" and "Allowed" regions. This plot is for all amino acids except Pro and Gly.]
The regions labeled alpha and beta represent valleys of stability surrounded by a high energy plateau. Values of phi are limited primarily to the range between -60 degrees and -150 degrees. For psi, the range is limited to regions centered about -50 degrees and +120 degrees.
Use SE: Measure > Ramachandran Plot to view all residues in your protein, plotted phi versus psi.

Backbone angle statistics
[Figure: phi-psi frequency map with regions labeled E, B, H, L, l and x.]
Colors represent the frequency, in bins of 10° × 10°, of phi-psi angles.
E, B and H are most common. L, l and x are found most often in Gly.
Allowed regions are islands. Are bonds really "freely rotatable"?

Sidechain angle space: rotamers
A random sampling of phenylalanine sidechains, when superimposed, falls into three classes: rotamers. This simplifies the problem of sidechain modeling. All we have to do is select the right rotamers and we're close to the right answer.

Sidechain modeling
Given a backbone conformation and the sequence, can we predict the sidechain conformations?
Energy calculations are sensitive to small changes, so the wrong sidechain conformation will give the wrong energy.

Goal of sidechain modeling
Given the sequence and only the backbone atom coordinates, accurately model the positions of the sidechains.
[Figure: fine lines = true structure; thick lines = sidechain predictions using the method of Desmet et al.]
Desmet et al., Nature v. 356, pp. 539-542 (1992).

Steric interactions determine allowed rotamers
3-bond (or 1-4) interactions define the preferred angles, but these may differ greatly in energy depending on the atom groups involved.
"m" = -60° (gauche), "t" = 180° (anti/trans),
"p" = +60° (gauche).

Exercise: measure a rotamer
Create a tripeptide TWV using Protein Builder.
Now create "meters" for the chi1 and chi2 angles: Dihedral (from the right-side menu).
For chi1, select N, CA, CB, CG (atoms 1, 2, 3, 4 in the figure).
For chi2, select CA, CB, CG, CD1 (atoms 2, 3, 4, 5).

Trp sidechain is hard to rotate
[Figure: the W sidechain is shown here lying over Thr.]
Rendering the molecule as space-filling (Render > Space filling) allows you to better visualize the contacts.

Rotamers of W
    name     chi1    chi2
    p-90       60     -90
    p90        60      90
    t-105     180    -105
    t90       180      90
    m0        -65       5
    m95       -65      95

Rotamer Libraries
Rotamer libraries have been compiled by clustering the sidechains of each amino acid over the whole database. Each cluster is a representative conformation (or rotamer), and is represented in the library by the best sidechain angles (chi angles), the "centroid" angles, for that cluster.
Two commonly used rotamer libraries:
Jane & David Richardson: http://kinemage.biochem.duke.edu/databases/rotamer.php
Roland Dunbrack: http://dunbrack.fccc.edu/bbdep/index.php
The rotamers of W on the previous page are from the Richardson library.

Exploring rotamers using MOE
[Figure: the environment of a buried leucine in 1A07.]
The interior of a protein is tightly packed. Bad packing produces voids or collisions.

Exercise: Rotamer Explorer
Open 1A07 from the Protein Database.
Edit > Add hydrogens.
Compute > Partial charges.
Select an amino acid in the interior.
SE: Edit > Rotamer Explorer (get from MOE).
Select the rotamer with the lowest energy.
Are the current chi angles close to the angles of a rotamer? How close? Is it the lowest-energy rotamer?
Select Mutate. The coordinates are permanently changed.

Exercise: Rotamer Explorer
Select an amino acid on the surface.
SE: Edit > Rotamer Explorer (get from MOE).
Are the current angles close to a rotamer? Is it the lowest-energy rotamer? What interactions does the best rotamer have?
Mutate. Then select a nearby sidechain and do the same thing.
How many times would you have to mutate before you could be sure of having the lowest-energy rotamer set?
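The chi1 and chi2 "meters" in the rotamer-measuring exercise report dihedral angles, and the same quantity can be computed directly from four atom positions. The sketch below is a generic illustration and makes no claim about how MOE computes it. `dihedral` uses the standard cross-product and atan2 construction, and `rotamer_class` bins a chi angle into the p / t / m classes used in the rotamer names, with 120°-wide windows that are an assumption of this example. `demo_atoms` is a made-up helper that builds four points with a known torsion so the result can be checked.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def rotamer_class(chi):
    """Bin a chi angle (degrees): 'p' near +60, 'm' near -60, else 't'."""
    if 0.0 <= chi < 120.0:
        return "p"
    if -120.0 <= chi < 0.0:
        return "m"
    return "t"


def demo_atoms(theta_deg):
    """Four hypothetical points whose torsion angle is theta_deg."""
    t = math.radians(theta_deg)
    return ((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
            (0.0, 0.0, 1.0), (math.cos(t), math.sin(t), 1.0))


for theta in (60.0, 180.0, -65.0):
    chi = dihedral(*demo_atoms(theta))
    print(round(chi, 1), rotamer_class(chi))
```

For a real sidechain, the four points would be the N, CA, CB and CG coordinates for chi1, or CA, CB, CG and CD1 for chi2 of Trp, read from the structure file.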
Dead end elimination theorem
There is a global minimum energy conformation (GMEC), where each residue has a unique rotamer. In other words, the GMEC is the set of rotamers that has the lowest energy.
Energy is a pairwise thing: total energy can be broken down into pairwise interactions. Each atom is either fixed (backbone) or movable (sidechain).
fixed-fixed: E is a constant, E_template.
fixed-movable: E depends on the rotamer, but is independent of other rotamers.
movable-movable: E depends on the rotamer, and depends on surrounding rotamers.

Theoretical complexity of sidechain modeling
The Global Minimum Energy Configuration (GMEC) is one unique set of rotamers. How many possible sets of rotamers are there?

$$n_1 \times n_2 \times \cdots \times n_L$$

where $n_1$ is the number of rotamers for residue 1, and so on.
Estimated complexity for a protein of 100 residues, with an average of 5 rotamers per position: $5^{100} \approx 8 \times 10^{69}$.
DEE reduces the complexity of the problem from $5^L$ to approximately $(5L)^2$.

Dead end elimination theorem
Each residue is numbered i or j, and each residue has a set of rotamers r, s or t. So the notation $i_r$ means "choose rotamer r for position i".
The total energy is the sum of the three components (fixed-fixed, fixed-movable, movable-movable):

$$E_{\text{global}} = E_{\text{template}} + \sum_i E(i_r) + \sum_i \sum_{j<i} E(i_r, j_s)$$

where r and s are any choice of rotamers.
$E_{\text{global}} \ge E_{\text{GMEC}}$ for any choice of rotamers.

Dead end elimination theorem
If $i_g$ is in the GMEC and $i_r$ is not, then we can separate the terms that contain $i_g$ or $i_r$ and rewrite the inequality. $E_{\text{GMEC}}$ is less than $E_{\text{notGMEC}}$:

$$E_{\text{template}} + E(i_g) + \sum_j E(i_g, j_g) + \sum_j \sum_{k<j} E(j_g, k_g) < E_{\text{template}} + E(i_r) + \sum_j E(i_r, j_g) + \sum_j \sum_{k<j} E(j_g, k_g)$$

Canceling the terms common to both sides, we get

$$E(i_r) + \sum_j E(i_r, j_g) > E(i_g) + \sum_j E(i_g, j_g)$$

So if we find two rotamers $i_r$ and $i_t$, and

$$E(i_r) + \sum_j \min_s E(i_r, j_s) > E(i_t) + \sum_j \max_s E(i_t, j_s)$$

then $i_r$ cannot possibly be in the GMEC.

This can be translated into plain English as follows: if the "worst case scenario" for rotamer t is better than the "best case scenario" for rotamer r, then you can eliminate r.

Exercise: Dead End Elimination
Using the DEE worksheet:
1. Find a rotamer that satisfies the DEE theorem.
2.
Eliminate it.
3. Repeat until each residue has only one rotamer.
What is the final GMEC energy?

DEE exercise
Three sidechains, each with three rotamers. Therefore, there are 3 × 3 × 3 = 27 ways to arrange the sidechains.
Each rotamer has an energy E(r), which is the non-bonded energy between sidechain and template.
Each pair of rotamers has an interaction energy E(r1, r2), which is the non-bonded energy between sidechains.
[Figure: DEE worksheet, a grid of rotamer energies for positions 1 to 3 and rotamers a to c.]

DEE exercise instructions
If the "best case scenario" for r1 is worse than the "worst case scenario" for r2, you can eliminate r1.
1. The best (worst) energies are found using the worksheet. Add E(r1) to the sum of the lowest (highest) E(r1, r2) that have not been previously eliminated.
2. There are 9 possible DEE comparisons to make: 1a versus 1b, 1a versus 1c, 1b versus 1c, 2a versus 2b, etc. For each comparison, find the minimum and maximum energy choices of the other rotamers. If the maximum energy of r1 is less than the minimum energy of r2, eliminate r2.
3. Scratch out the eliminated rotamer and repeat until one rotamer per position remains.

Sequence design using DEE
Did you notice that Rotamer Explorer in MOE allows you to choose a different sidechain?

Bioinformatics 2, lecture 9
Ramachandran angles, Sidechain chi angles, Rotamers, Dead End Elimination Theorem

Backbone angles phi and psi
Atom-atom distances that were too close were not permissible. What angles were permissible?

Ramachandran Plot
[Figure: psi versus phi, each from -180 to 180, with "Best" and "Allowed" regions. This plot is for all amino acids except Pro and Gly.]
The regions labeled alpha and beta represent valleys of stability surrounded by a high energy plateau. Values of phi are limited primarily to the range between -60 degrees and -150 degrees. For psi, the range is limited to regions centered about -50 degrees and +120 degrees.
Use SE: Measure > Ramachandran Plot to view all residues in your protein, plotted phi versus psi.

Backbone angle statistics
Colors represent the frequency, in bins of 10° × 10°, of phi-psi angles.
E, B and H are most common. L, l and x are found most often in Gly.
Allowed regions are islands. Are bonds really "freely rotatable"?

Sidechain angle space: rotamers
A random sampling of phenylalanine sidechains, when superimposed, falls into three classes: rotamers. This simplifies the problem of sidechain modeling. All we have to do is select the right rotamers and we're close to the right answer.

Sidechain modeling
Given a backbone conformation and the sequence, can we predict the sidechain conformations?
Energy calculations are sensitive to small changes, so the wrong sidechain conformation will give the wrong energy.

Goal of sidechain modeling
Given the sequence and only the backbone atom coordinates, accurately model the positions of the sidechains.
[Figure: fine lines = true structure; thick lines = sidechain predictions using the method of Desmet et al.]
Desmet et al., Nature v. 356, pp. 539-542 (1992).

Steric interactions determine allowed rotamers
3-bond (or 1-4) interactions define the preferred angles, but these may differ greatly in energy depending on the atom groups involved.
"m" = -60° (gauche), "t" = 180° (anti/trans), "p" = +60° (gauche).

Exercise: measure a rotamer
Create a tripeptide TWV using Protein Builder.
Now create "meters" for the chi1 and chi2 angles: Dihedral (from the right-side menu).
For chi1, select N, CA, CB, CG (atoms 1, 2, 3, 4 in the figure).
For chi2, select CA, CB, CG, CD1 (atoms 2, 3, 4, 5).

Trp sidechain is hard to rotate

Rotamers of W
    name     chi1    chi2
    p-90       60     -90
    p90        60      90
    t-105     180    -105
    t90       180      90
    m0        -65       5
    m95       -65      95
[Figure: the W sidechain is shown here lying over Thr.]
Rendering the molecule as space-filling (Render > Space filling) allows you to better visualize the contacts.

Rotamer Libraries
Rotamer libraries have been compiled by clustering the sidechains of each amino acid over the whole database. Each cluster is a representative conformation (or rotamer), and is represented in the library by the best sidechain angles (chi angles), the "centroid" angles, for that
cluster.

Two commonly used rotamer libraries:
- Jane & David Richardson: http://kinemage.biochem.duke.edu/databases/rotamer.php
- Roland Dunbrack: http://dunbrack.fccc.edu/bbdep/index.php
The rotamers of W on the previous page are from the Richardson library.

Exploring Rotamers using MOE
[Figure: the environment of a buried leucine in 1A07.]
The interior of a protein is tightly packed. Bad packing produces voids or collisions.

Exercise: Rotamer explorer
- Open 1A07 from the Protein Database.
- Edit > Add hydrogens
- Compute > Partial charges
- Select an amino acid in the interior.
- SE: Edit > Rotamer Explorer (get from MOE)
- Select the rotamer with the lowest energy.
- Are the current chi angles close to the angles of a rotamer? How close? Is it the lowest energy rotamer?
- Select Mutate. The coordinates are permanently changed.

Exercise: Rotamer explorer
- Select an amino acid on the surface.
- SE: Edit > Rotamer Explorer (get from MOE)
- Are the current angles close to a rotamer? Is it the lowest energy rotamer? What interactions does the best rotamer have?
- Mutate. Then select a nearby sidechain and do the same thing.
How many times would you have to mutate before you could be sure of having the lowest energy rotamer set?

Dead end elimination theorem
There is a global minimum energy conformation (GMEC), where each residue has a unique rotamer. In other words, the GMEC is the set of rotamers that has the lowest energy.
Energy is a pairwise thing. Total energy can be broken down into pairwise interactions. Each atom is either fixed (backbone) or movable (sidechain).
- fixed-fixed: E is a constant, $E_{template}$.
- fixed-movable: E depends on the rotamer but is independent of other rotamers.
- movable-movable: E depends on the rotamer and depends on surrounding rotamers.

Theoretical complexity of sidechain modeling
The Global Minimum Energy Configuration (GMEC) is one unique set of rotamers. How many possible sets of rotamers are there?
\[ n_1 \times n_2 \times \cdots \times n_L \]
where $n_1$ is the number of rotamers for residue 1, and so on.
Estimated complexity for a protein of 100 residues with an
average of 5 rotamers per position: $5^{100} \approx 8 \times 10^{69}$.
DEE reduces the complexity of the problem from $5^L$ to approximately $(5L)^2$.

Dead end elimination theorem
Each residue is numbered i or j, and each residue has a set of rotamers r, s or t. So the notation $i_r$ means "choose rotamer r for position i".
The total energy is the sum of the three components (fixed-fixed, fixed-movable, movable-movable):
\[ E_{global} = E_{template} + \sum_i E(i_r) + \sum_i \sum_{j>i} E(i_r, j_s) \]
where r and s are any choice of rotamers.
\[ E_{global} \ge E_{GMEC} \quad \text{for any choice of rotamers} \]

Dead end elimination theorem
If $i_g$ is in the GMEC and $i_r$ is not, then we can separate the terms that contain $i_g$ or $i_r$ and rewrite the inequality:
\[ E_{GMEC} = E_{template} + E(i_g) + \sum_{j \ne i} E(i_g, j_g) + \sum_{j \ne i} E(j_g) + \sum_{j \ne i} \sum_{k>j,\, k \ne i} E(j_g, k_g) \]
is less than
\[ E_{notGMEC} = E_{template} + E(i_r) + \sum_{j \ne i} E(i_r, j_g) + \sum_{j \ne i} E(j_g) + \sum_{j \ne i} \sum_{k>j,\, k \ne i} E(j_g, k_g) \]
Canceling all terms that appear on both sides (shown in black on the slide), we get
\[ E(i_r) + \sum_{j \ne i} E(i_r, j_g) > E(i_g) + \sum_{j \ne i} E(i_g, j_g) \]
So if we find two rotamers $i_r$ and $i_t$, and
\[ E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s) \]
then $i_r$ cannot possibly be in the GMEC.

Dead end elimination theorem
\[ E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s) \]
This can be translated into plain English as follows: if the "worst case scenario" for rotamer t is better than the "best case scenario" for rotamer r, then you can eliminate r.

Exercise: Dead End Elimination
Using the DEE worksheet:
1. Find a rotamer that satisfies the DEE theorem.
2. Eliminate it.
3. Repeat until each residue has only one rotamer.
What is the final GMEC energy?

DEE exercise
Three sidechains, each with three rotamers. Therefore there are 3 x 3 x 3 = 27 ways to arrange the sidechains.
- Each rotamer has an energy E(r), which is the non-bonded energy between sidechain and template.
- Each pair of rotamers has an interaction energy E(r1,r2), which is the non-bonded energy between sidechains.

DEE exercise
[Worksheet figure: rotamer energies E(r) and pairwise energies E(r1,r2) for rotamers a, b, c of residues 1, 2, 3.]

DEE exercise: instructions
If the best case scenario for r1 is worse than the worst case scenario for r2, you can eliminate r1.
1. The best (worst) energies are found using the worksheet: add E(r1) to the sum of the lowest (highest) E(r1,r2) that have not been previously eliminated.
2. There are 9 possible DEE comparisons to make: 1a versus 1b, 1a
versus 1c, 1b versus 1c, 2a versus 2b, etc. For each comparison, find the minimum and maximum energy choices of the other rotamers. If the maximum energy of r1 is less than the minimum energy of r2, eliminate r2.
3. Scratch out the eliminated rotamer and repeat until one rotamer per position remains.

Sequence design using DEE
Did you notice that Rotamer Explorer in MOE allows you to choose a different sidechain?

Building a small molecule
Secondary structure prediction

Building aspartame
Starting with an empty MOE window:
- Edit > Build > Molecule (or Builder)
- Create the backbone using the atom buttons: N C C N C C O C
Notice the chain is made in the fully reduced state.

Building aspartame
- Select the carbonyl groups. Click double bonds.
- Add sidechains: select the back H on the first alpha-carbon and click C. Select the front H on the second alpha-carbon, click C, then benzene.

Building aspartame
- Fix the ionization of NH3: select N. In Builder, click "+1". A proton is added.
- Fix the hybridization of NH: double-click the second N. Choose Geometry "sp2". Click "Apply".
- Click "Minimize".

What is energy minimization?
Energy minimization is a molecular simulation that leads the system to a lower potential energy. This is similar to the problem of finding the parameters that minimize a function, but there are generally too many parameters. No optimal solution is possible. Energy minimization is a heuristic method.

How is the energy of a molecular model calculated?
Energy is a function of:
1. The coordinates of the atoms
2. Their names
3. Their numbers
The names and numbers tell the program what element the atoms are, how they are bonded, and what oxidation state they have.

Molecular mechanics energy
An energy function is a sum over a set of simple functions. This sum is the so-called energy of the system.
\[ E = f(a_1,a_2) + f(a_1,a_3) + f(a_2,a_3) + f(a_1,a_2,a_3) + \cdots \]
Each simple energy function f may have 2, 3 or more atoms as parameters (coordinates, names and numbers). Each function uses stored information about each atom name to choose constants within
each function. Together, the entire set of functions and constants is called a force field.

Molecular mechanics
A molecular mechanics energy function includes the following components (and others):
- bonded: bond lengths, bond angles, torsion angles
- non-bonded: Lennard-Jones or van der Waals; Coulomb or electrostatic
[Figure labels: van der Waals spheres; charge-charge interaction.]

constraint/restraint
restraint: a function that approaches a minimum as the parameters approach ideal values. For example, the distance AB is restrained to 3.8 Å using the restraint
\[ E_{AB} = (D(A,B) - 3.8)^2 \]
constraint: a function that reduces the number of variable parameters in the system. For example, atoms A, B, C and D are constrained to be in the same plane. Planar groups may be constrained.

Distance restraints
Harmonic and Morse potentials are restraint functions. Restraint forces are applied to move the atoms to their ideal distances/angles.
\[ E_{ij} \propto (x_{ij} - T)^2 \]
where $x_{ij}$ is the distance between i and j, and T is the ideal distance between i and j.

Building aspartame the easy way
- Close the current system.
- Edit > Build > Protein
- Click ASP, PHE.
- Unselect by clicking in empty space.
- Click "C". A methane appears. Select it and use the meta + middle mouse button to move it close to the COO group.
- Select the methane C and one O from the Phe COO. In Builder, click single bond.
- Minimize.

Representing protein structure
- Secondary structure: 1D, three states
- Local structure: motifs, backbone angles
- Supersecondary structure: beta-alpha-beta motif, etc.
- Inter-residue distances: 2D contact maps
- Tertiary structure: backbone only, 3D coordinates
- Sidechain conformation: rotamers
- Domain-domain interactions: interface
- Quaternary structure: protein-protein interactions

Predicting protein structure
- Secondary structure: neural nets, HMMs
- Local structure: sequence profiles
- Supersecondary structure: simulations, rules
- Inter-residue distances: neural nets, rules, covariance
- Tertiary structure: simulations, homology
- Sidechain conformation: energy calculations
- Domain-domain interactions: energy
calculations
- Quaternary structure: energy calculations

Secondary structure alphabet
3D protein coordinates may be converted to a 1D secondary structure representation using DSSP or STRIDE.
[Example string: EEEE SS EEEE GGT EE E HHHHHH HHHHHHHHHGGTT]
DSSP = Database of Secondary Structure in Proteins
Both programs use H-bonding patterns (see next slide).

DSSP symbols
H        helix: backbone angles (-50, -60) and H-bonding pattern (i -> i+4)
E        extended strand: backbone angles (-120, 120) with beta-sheet H-bonds (parallel/antiparallel are not distinguished)
B        beta bridge: isolated backbone H-bonds
T        beta turn: specific sets of angles and one i -> i+3 H-bond
G        3-10 helix or turn: i, i+3 H-bonds
I        Pi helix: i, i+5 H-bonds (rare)
(blank)  unclassified: none of the above; generic loop, or beta strand with no regular H-bonding
The last five are collectively called L, for Loop.

Accuracy of 3-state predictions
[Figure: a true secondary structure string compared with a 3-state predicted string.]
Q3 score = % of 3-state symbols that are correct, measured on a "test set".
Test set = an independent set of cases (proteins) that were not used to train, or in any way derive, the method being tested.
Best methods:
- PHD (Burkhard Rost): 72-74% Q3
- HMMSTR (Bystroff): 74-75% Q3
- Psi-pred (David T. Jones): 76-78% Q3

PSIpred: a secondary structure predictor
http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
[Figure: screenshot of PSIPRED server output.]
PSIPRED (Jones et al.) is currently the best server for secondary structure prediction, according to CASP results.

PsiPred: a neural network
[Figure: sequence profile -> input units -> hidden units -> output units, connected by input-to-hidden-unit weights and hidden-unit-to-SS-state weights.]
The prediction at each position is the state with the greatest sum of weights.

Psipred: a neural net
Step 1: Run PSI-Blast. Output: sequence profile.
Step 2: A 15-residue sliding window = 315 weights, multiplied by hidden weights in the 1st neural net. Output is 3 weights (1 weight for each state H, E or L) per position.
Step 3: 45 input weights, multiplied by weights in the 2nd neural network,
summed. Output is the final 3-state prediction.

Reminder
[Figure: sequence profile, a weighted sum over the aligned sequences for each amino acid at each position, using sequence weights. Red = high probability ratio (LLR > 1); green = background probability ratio (LLR ≈ 0); blue = low probability ratio (LLR < -1).]

PsiPred: training the neural network
NN weights are found that minimize error. NN output is compared with the true secondary structure.
[Figure: sequence profile -> hidden units -> output units; rows labeled True SS, Prediction and Errors, where Errors is a string of 0s and 1s marking the mispredicted positions.]

What can you do with a secondary structure prediction?
1. Find out if a homolog of unknown structure is missing any of the SS (secondary structure) units, i.e. a helix or a strand.
2. Find out whether a helix or strand is extended or shortened in the homolog.

Why does it work?
Proteins fold via a 2-state model (folded or unfolded). Usually no intermediates are observed. If secondary structure depends on the entire sequence, then why is a 15-residue window enough to predict SS? Maybe SS forms first.

Simple homology-based model: a MOE exercise (1)
- Open the course web page and link to Sequence 1.
- Open MOE and go to the Sequence Editor (Ctrl-Q).
- SE: Edit > Create Sequence. Peptide sequence only. Paste Sequence 1 from the web page. Call it Strepto.
- SE: Measure > Predict secondary structure. Look at it.
- SE: Homology > Search PDB. Load chain 1. Search. Load alignment, then Load all (don't load query).
- Display > compound name
- Delete all sequences except the query and 1EM7A.

Simple homology-based model: a MOE exercise (2)
- Align the sequences: SE: Homology > Align
- Color the residues: SE: Display > Color residues > Function
- Change 1EM7A into your query sequence, one residue at a time, using SE: Edit > Protein > Mutate. Select an amino acid, then click on a residue in 1EM7A to mutate it. Do not mutate the query.
- Watch the MOE window swap sidechains as you mutate.
- When done, look at it. Are there any bad contacts?

Simple homology-based model: a MOE exercise (3)
When done mutating, the new 1EM7A has the same sequence as the query.
Energy minimization:
MOE: Selection > Protein
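> backbone
MOE: Edit > Fix. This prevents the backbone atoms from moving during energy minimization.
MOE: Selection > Invert
MOE: Edit > Unfix. This allows sidechains to move.
Compute > Energy minimize > Forcefield > AMBER94. Look at this window. Do you know what the weights do?

Simple homology-based model: a MOE exercise (4)

Bioinformatics 2, lecture 3
Rotation and translation
Superposition, structure comparison
Structure classification

Rotation is angular addition
The atom starts at $x = |r|\cos\alpha$, $y = |r|\sin\alpha$. The axis of rotation is the Cartesian origin. The atom rotates to $x' = |r|\cos(\alpha+\beta)$, $y' = |r|\sin(\alpha+\beta)$.
Convention: angles are measured counter-clockwise.
Sum of angles formula:
\[ \cos(\alpha+\beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta \]
\[ \sin(\alpha+\beta) = \sin\alpha\cos\beta + \sin\beta\cos\alpha \]

A rotation matrix
With $x = |r|\cos\alpha$ and $y = |r|\sin\alpha$:
\[ x' = |r|\cos(\alpha+\beta) = |r|\cos\alpha\cos\beta - |r|\sin\alpha\sin\beta = x\cos\beta - y\sin\beta \]
\[ y' = |r|\sin(\alpha+\beta) = |r|\sin\alpha\cos\beta + |r|\cos\alpha\sin\beta = y\cos\beta + x\sin\beta \]
\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix} \begin{pmatrix} |r|\cos\alpha \\ |r|\sin\alpha \end{pmatrix} = \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \]
The rotation matrix is the same for any r and any $\alpha$.

A rotation around a principal axis
The Z coordinate stays the same. X and Y change.
\[ R_z = \begin{pmatrix} \cos\beta & -\sin\beta & 0 \\ \sin\beta & \cos\beta & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
The Y coordinate stays the same. X and Z change.
\[ R_y = \begin{pmatrix} \cos\gamma & 0 & -\sin\gamma \\ 0 & 1 & 0 \\ \sin\gamma & 0 & \cos\gamma \end{pmatrix} \]
The X coordinate stays the same. Y and Z change.
\[ R_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix} \]

A 3D rotation matrix is the product of 2D rotation matrices
\[ R_z R_y = \begin{pmatrix} \cos\beta & -\sin\beta & 0 \\ \sin\beta & \cos\beta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\gamma & 0 & -\sin\gamma \\ 0 & 1 & 0 \\ \sin\gamma & 0 & \cos\gamma \end{pmatrix} = \begin{pmatrix} \cos\beta\cos\gamma & -\sin\beta & -\cos\beta\sin\gamma \\ \sin\beta\cos\gamma & \cos\beta & -\sin\beta\sin\gamma \\ \sin\gamma & 0 & \cos\gamma \end{pmatrix} \]

What happens when you move the mouse
1. The mouse sends mouse coordinates ($\Delta x$, $\Delta y$) to the running program.
2. Rotation angles are calculated: $\theta_x = \Delta x \cdot scale$, $\theta_y = \Delta y \cdot scale$.
3. Rotation matrices are calculated:
\[ R_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{pmatrix}, \quad R_y = \begin{pmatrix} \cos\theta_y & 0 & -\sin\theta_y \\ 0 & 1 & 0 \\ \sin\theta_y & 0 & \cos\theta_y \end{pmatrix} \]

What happens when you move the mouse (cont'd)
4. New atom coordinates are calculated: $r' = R_y R_x r$.
5. The scene is rendered using the new coordinates.

When multiplying matrices, the order matters
\[ R_y R_x = \begin{pmatrix} \cos\theta_y & 0 & -\sin\theta_y \\ 0 & 1 & 0 \\ \sin\theta_y & 0 & \cos\theta_y \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{pmatrix} = \begin{pmatrix} \cos\theta_y & -\sin\theta_x\sin\theta_y & -\sin\theta_y\cos\theta_x \\ 0 & \cos\theta_x & -\sin\theta_x \\ \sin\theta_y & \sin\theta_x\cos\theta_y & \cos\theta_x\cos\theta_y \end{pmatrix} \]
This is the matrix if the X rotation is first, then the Y rotation.

Rotating in opposite order gives a different matrix
\[ R_x R_y = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{pmatrix} \begin{pmatrix} \cos\theta_y & 0 & -\sin\theta_y \\ 0 & 1 & 0 \\ \sin\theta_y & 0 & \cos\theta_y \end{pmatrix} = \begin{pmatrix} \cos\theta_y & 0 & -\sin\theta_y \\ -\sin\theta_x\sin\theta_y & \cos\theta_x & -\sin\theta_x\cos\theta_y \\ \sin\theta_y\cos\theta_x & \sin\theta_x & \cos\theta_x\cos\theta_y \end{pmatrix} \]

Reversing the rotation
For the opposite rotation, flip the matrix. This is the transpose:
\[ \begin{pmatrix} A & B \\ C & D \end{pmatrix}^T = \begin{pmatrix} A & C \\ B & D \end{pmatrix} \]
\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{pmatrix} \begin{pmatrix} x' \\ y' \end{pmatrix} \]
The inverse matrix = the transposed matrix:
\[ \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix} \begin{pmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
NOTE: $\cos\beta\cos\beta + \sin\beta\sin\beta = 1$.

Example: rotation in 2 steps
Rotate the vector v = (1, 2, 3) around Z by 60°, then around Y by 60°.
\[ \begin{pmatrix} \cos 60° & -\sin 60° & 0 \\ \sin 60° & \cos 60° & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 1(0.5) - 2(0.866) + 3(0) \\ 1(0.866) + 2(0.5) + 3(0) \\ 0 + 0 + 3(1) \end{pmatrix} = \begin{pmatrix} -1.232 \\ 1.866 \\ 3 \end{pmatrix} \]
\[ \begin{pmatrix} \cos 60° & 0 & -\sin 60° \\ 0 & 1 & 0 \\ \sin 60° & 0 & \cos 60° \end{pmatrix} \begin{pmatrix} -1.232 \\ 1.866 \\ 3 \end{pmatrix} = \begin{pmatrix} -1.232(0.5) + 1.866(0) - 3(0.866) \\ -1.232(0) + 1.866(1) + 3(0) \\ -1.232(0.866) + 1.866(0) + 3(0.5) \end{pmatrix} = \begin{pmatrix} -3.214 \\ 1.866 \\ 0.433 \end{pmatrix} \]

Right-handed 90° rotations
90° rotation around X:
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \]
90° rotation around Y:
\[ \begin{pmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \]
90° rotation around Z:
\[ \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
Helpful hint: for a right-handed rotation, the -sine is up and to the right of the +sine.

In class exercise: rotate a point
(x, y, z) = (1, 4, 7). Rotate this point by 90° around the Z axis. Then rotate the new point by 90° around the Y axis. What are the new coordinates?

3D rotation conventions
Euler angles ($\alpha$, $\beta$, $\gamma$): the order of rotations matters. Each rotation is around a principal axis.
Polar angles ($\phi$, $\psi$, $\kappa$): a net rotation $\kappa$ around an axis. The axis is defined by $\phi$ and $\psi$.

Polar angles
[Figure: globe with z as the north pole, the prime meridian and the equator.]
Rotation of $\kappa$ degrees around an axis located at $\phi$ degrees longitude and $\psi$ degrees latitude.

Special properties of rotation matrices
Read more about rotation matrices at http://mathworld.wolfram.com/RotationMatrix.html

RMSD
Root Mean Square Deviation in superimposed coordinates is the standard measure of structural difference. It is similar to the standard deviation, which is the square root of the variance.
\[ \mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} |x_i - y_i|^2} \]
(Orengo p. 88) where $x_i$ are the coordinates from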
molecule 1 and $y_i$ are the equivalent coordinates from molecule 2.

Pseudo-pseudocode program for computing rmsd:
    sum = 0
    N = 0
    while (there is data to read)
        read coordinates for two aligned positions: x1, y1, z1, x2, y2, z2
        compute the distance squared: d = (x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2
        sum it: sum = sum + d
        keep track of how many pairs there are: N = N + 1
    average and take the square root: rmsd = sqrt(sum/N)

Least squares superposition
Problem: find the rotation matrix M and a vector v that minimize the following quantity:
\[ \sum_i |M x_i + v - y_i|^2 \]
where $x_i$ are the coordinates from one molecule and $y_i$ are the equivalent coordinates from another molecule (equivalent based on alignment).

Mapping structural equivalence = aligning the sequence
Any position that is aligned is included in the sum of squares. Unaligned positions are not.
[Figure: sequence alignment of 4DFR:A with a second DHFR chain.]

Least squares
At the position of best superposition, we have an approximate equality:
\[ M x_i + v \approx y_i \]
First, we eliminate v by translating the center of mass of both molecules to the origin. Now we have
\[ M x_i \approx y_i \]
We have one such equation for each atom. M has 9 unknowns. If there are more equations than unknowns, there is a unique least-squares solution. What is it? See the next two slides.

Least squares
Least squares solves a set of linear equations of the form:
\[ x_{11} a_1 + x_{12} a_2 + \cdots + x_{1N} a_N = y_1 \]
\[ x_{21} a_1 + x_{22} a_2 + \cdots + x_{2N} a_N = y_2 \]
\[ \vdots \]
\[ x_{M1} a_1 + x_{M2} a_2 + \cdots + x_{MN} a_N = y_M \]
This is "shorthand" notation for the equations (matrix times vector = vector):
\[ X a = y \]

Least squares, continued
[Figure: green elements are known. Fat rectangles are matrices. Thin rectangles are vectors.]
Multiply both sides by $X^T$ (the transpose of X). This is "squaring":
\[ X^T X a = X^T y \]
The "squared" matrix is now square, which means we can invert it. We can use the "LU decomposition" method. Multiplying both sides by the inverse of the "squared" matrix solves for a.
Summary:
\[ a = (X^T X)^{-1} X^T y \]
[Figure: least-squares superimposed molecules.]

In class
exercise: Superimpose 2 molecules by hand
- Open InsightII.
- Open two PDB files: 3sdh.pdb, 1jzm.pdb
- Display the first chain ("A", usually) as trace only.
- Choose a better color if necessary.
- Adjust the depth cueing if necessary.
- Superimpose the two molecules. Use the reference pages that follow.
Do these pairs:
- 3sdh.pdb, 1jzm.pdb (hard)
- 3sdh.pdb, 1h97.pdb (harder)
- 3sdh.pdb, 1phn.pdb (hardest)

InsightII reference: getting a molecule
Molecule > Get > Get; file type = PDB; Filename
Trace of chain A only:
Molecule > Display > Display; operation: Only; Molec attrib: Atoms; Atom set: Trace; Molec Spec: A
Left-click on a molecule to see the object name, chain ID and atom spec. Look for the chain ID, if any.

InsightII reference
Change depth cueing at the slab: hit F12.
Color molecule: Molecule > Color > Attribute: Atoms; Molec Spec: object name; Atom set: specified; Color: <select>
Delete objects: type Delete Object * or Delete Object <object name>

InsightII reference: how to superimpose two molecules
- Select the first molecule: hit F10 and click on it. Only the first molecule will move. Move it to superimpose on the 2nd molecule.
- Select "world": hit F11. Rotate both together to get a different view.
- Select either molecule (F10, click). Refine the superposition.
- Repeat until converged.
Note: you can connect to a molecule by typing connect XXX, where XXX is the "object name" for your molecule. You can repeat and edit this, or any command, using the arrow keys.

InsightII reference: how to save your work
File > save_folder. Don't check "replace" unless the file already exists.
How to save coordinates in PDB format:
This will create a file with just the atoms that are displayed, i.e. first display all heavy atoms.
Molecule > Put > put; file type: PDB; Assembly/molec; Filename; transformed: checked; displayed: checked; reference_obj: checked; reference name
Don't check "replace" unless the file already exists.
Command: put molecule PDB XXX file.pdb transformed displayed reference YYY
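
Looking back at the RMSD and least-squares slides of this lecture, the following is a minimal Python sketch of those steps: translate both aligned coordinate sets to the origin, solve the normal equations a = (X^T X)^-1 X^T y for the 3x3 matrix M, and compare the RMSD before and after. The function names (center, rmsd, fit_linear_map, transform) and the four-atom test molecule are invented for illustration. Note one assumption: plain least squares returns the best linear map, which is an exact rotation only when the two sets really are related by one. Superposition programs normally add an orthogonality constraint (for example, the Kabsch method), which this sketch does not do.

```python
import math

def centroid(coords):
    """Return the mean (x, y, z) of a list of 3D points."""
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))

def center(coords):
    """Translate the points so that their centroid sits at the origin."""
    c = centroid(coords)
    return [tuple(p[k] - c[k] for k in range(3)) for p in coords]

def rmsd(a, b):
    """Root mean square deviation between two equally long point lists."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equally long coordinate lists")
    total = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(total / len(a))

def inverse3(m):
    """Invert a 3x3 matrix with the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("X^T X is singular (too few or coplanar points)")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def fit_linear_map(x, y):
    """Least-squares M with M x_i ~ y_i, from the normal equations."""
    xtx = [[sum(p[i] * p[j] for p in x) for j in range(3)] for i in range(3)]
    xty = [[sum(p[i] * q[j] for p, q in zip(x, y)) for j in range(3)]
           for i in range(3)]
    inv = inverse3(xtx)
    # a = (X^T X)^-1 X^T Y is the transpose of M, because points are rows of X.
    a = [[sum(inv[i][k] * xty[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    return [[a[j][i] for j in range(3)] for i in range(3)]

def transform(m, coords):
    """Apply the 3x3 matrix m to every point."""
    return [tuple(sum(m[r][k] * p[k] for k in range(3)) for r in range(3))
            for p in coords]

# Made-up example: molecule 2 is molecule 1 rotated 90 degrees about Z,
# then shifted by an arbitrary vector.
mol1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
rz90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
mol2 = [(px + 5.0, py - 2.0, pz + 1.0) for px, py, pz in transform(rz90, mol1)]

x, y = center(mol1), center(mol2)   # eliminates the translation v
M = fit_linear_map(x, y)
print("RMSD before fitting:", round(rmsd(x, y), 3))
print("RMSD after fitting: ", round(rmsd(transform(M, x), y), 3))
```

Because the two test molecules differ by an exact rotation plus a shift, the fitted M should reproduce the 90° Z rotation matrix from the rotation slides. With noisy coordinates M would generally not be orthogonal, which is the reason for the constraint mentioned above.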
Superimpose 2 molecules automatically
Structure superposition programs have to do two things:
1. Align the sequence
2. Minimize the RMSD
Can't do the first without the second. Can't do the second without the first.
Often it is impossible to get a good sequence alignment, even though there is structural homology.

Remote homologs are more likely than close homologs
The existence of large numbers of remote homologs shows us that true structural similarity is hard to see in the amino acid sequence. Structural conservation is stronger than sequence conservation.
[Figure: likelihood versus percent identity for structural homologs, with the "twilight zone" marked.]

Example of structural homologs/analogs
4DFR: Dihydrofolate reductase
1YAC: Octameric Hydrolase Of Unknown Specificity
5.9% sequence identity (best alignment)
The 1YAC structure was solved without knowing its function. Alignment to 4DFR and others implies it is a hydrolase of some sort that probably uses NAD cofactors.
[Figure: DHFR in yellow and orange, YAC in green and purple.]

Example of structural homologs/analogs
[Figure panels: sheets only; helices only.]

Structural alignment algorithms
Alignment algorithms create a one-to-one mapping of subsets of one sequence to subsets of another sequence.
Structure-based alignment types:
- Geometric/intermolecular: algorithms may do this by minimizing the intermolecular distances, or root-mean-square deviation (rmsd), in superimposed alpha-carbon positions.
- Geometric/intramolecular: algorithms minimize the difference between aligned contact maps or distance matrices. Intramolecular distances are used.
- Non-geometric: algorithms align structural properties such as burial or secondary structure type, usually using dynamic programming (DP).

DALI: an intramolecular geometric structural alignment algorithm
DALI = Distance matrix based ALIgnment (Liisa Holm & Chris Sander)
1. Generate a distance matrix for each protein. $D_{ij}$ = distance between alpha carbon i and alpha carbon j. The distance matrix contains all pairwise distances (symmetrical).

Shapes in distance matrices
In this slide, a marked cell means close
in space.
[Figure panels: hairpin; helix; parallel strands; antiparallel strands.]

Aligning two distance matrices
Cut-and-paste alignment of distance matrices. Resulting sequence alignment.

DALI algorithm
S is a score that is a maximum when the alignment is optimal:
\[ S = \sum_i \sum_j \phi(i,j), \qquad \phi(i,j) = \theta - |D^A_{ij} - D^B_{ij}| \]
where $\phi$ is a constant ($\theta$) minus the absolute difference of the two distance matrices after they are aligned to each other.
(Red squares are 6x6.)

Making the DALI pairs list
[Figure: 6x6 submatrices of Structure A scored against 6x6 submatrices of Structure B, giving a pairs list sorted by S, with example scores 333, 202, 125.]
Each pair of 6x6's corresponds to a gapped alignment.

DALI alignment
[Figure: distance matrices for 4dfr and 1yac.]
- generate pairs list
- extend pairs list

Can non-sequential alignments be found?
[Figure: one pair; an overlapping pair; the two pairs combined.]

Non-sequential similarities exist
Two different folds, superimposed, with the loops removed.

SSAP alignment
A View is the set of all vectors from one residue. Each residue has its own View.

SSAP alignment
Views i and j must have similar backbone angles, otherwise the score is zero.
[Figure: View for Template residue i; View for Target residue j.]
The difference between the two Views is a measure of how similar the structures are when viewed from i and j.

SSAP alignment: residue-level score matrix
For each potential ij pair, we find the best DP alignment that includes it, i.e. global DP starting at (0,0) and ending at (i,j), plus global DP starting at (i,j) and ending at the lower right-hand corner.

SSAP alignment: summary matrix
The DP score for ij goes into the summary matrix. The summary matrix is made up of DP scores for each ij position. Then a second round of DP is run through the summary matrix to give a single alignment.
The final score is scaled to the range 0-100, where 100 is an identical match.

Two other servers for structure-based alignment
CE (Combinatorial Extension): http://cl.sdsc.edu/ce.html
VAST: http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml

CE interface
[Figure: screenshot of the CE web interface.]

CE alignment
[Figure: CE structure alignment output for 2DRCA and its neighbors, reporting the length of the alignment.]

CE alignment of 15 analogs

Exercise: use CE to get structural homologs
- Set your browser to http://cl.sdsc.edu/ce.html
- Find structural alignments by selecting from ALL or REPRESENTATIVES from the PDB.
- Submit your protein and chain (or use 4dfr:A).
- Select 2 structures, then hit Get alignment (or use 4DFR and 1YAC).
- Download as a PDB file. Save it.
- For use in InsightII, divide the file into 2, one for each protein.

Structure classification
The SCOP database
Contains information about the classification of protein structures and, within that classification, their sequences.
http://scop.berkeley.edu

SCOP classification hierarchy
1. class: global characteristics (no evolutionary relation)
2. fold: similar topology
3. superfamily: distant evolutionary cousins?
4. family: clear structural homology
5. protein: clear sequence homology
6. species: functionally identical, unique sequences

domain
Most genes represent multidomain proteins. 40% of known structures (crystal, NMR) are multidomain proteins, but most of all proteins are multidomain (60% in unicellular organisms, 90% in eukaryotes).
Where do my domains start and end?

Multidomain proteins
Domain boundaries can be seen as "weak" connections in the structure. "Weak" means few contacts and few chain cross-overs.
Domain boundaries can be seen in multiple sequence alignments, if the alignments are of whole genes.

In class exercise: find the domains
- Download 1akl from the PDB (or choose randomly, it doesn't matter).
- Open it in InsightII.
- Display it as a "trace".
- Are there multiple domains? What residues lie at the domain boundary?
- Color domains differently.

protein classes
class                    number of sub-categories
1. all α                 126
2. all β                 81
3. α/β                   87
4. α+β                   151
5. multidomain           21
6. membrane              21
7. small                 10
8. coiled coil           4
9. low-resolution        4
10. peptides             61
11. designed proteins    17
The later entries are not true classes, and are possibly not complete, or erroneous.

Class α/β proteins
Mainly parallel beta sheets (beta-alpha-beta units).
Folds:
- TIM-barrel (22)
- swivelling beta/beta/alpha domain (5)
- spoIIaa-like (2)
- flavodoxin-like (10)
- restriction endonuclease-like (2)
- ribokinase-like (2)
- chelatase-like (2)
Many folds have historical names. The "TIM" barrel was first seen in TIM. These classifications are done by eye, mostly.

fold: flavodoxin-like
3 layers, α/β/α; parallel beta sheet of 5 strands, order 21345.
Superfamilies:
1. Catalase, C-terminal domain (1)
2. CheY-like (1)
3. Succinyl-CoA synthetase domains (1)
4. Flavoproteins (3)
5. Cobalamin (vitamin B12)-binding domain (1)
6. Ornithine decarboxylase N-terminal "wing" domain (1)
7. Cutinase-like (1)
8. Esterase/acetylhydrolase (2)
9. Formate/glycerate dehydrogenase catalytic domain-like (3)
10. Type II 3-dehydroquinate dehydratase (1)
Note the term "layers". These are not domains. There is no implication of structural independence.
Note how beta sheets are described: number of strands, order (N -> C).

fold-level similarity: common topological features
[Figure: catalase and flavodoxin side by side.]
At the fold level, a common core of secondary structure is conserved. Outer secondary structure units may not be conserved.

Superfamily: Flavoproteins
- Flavodoxin-related (7)
- NADPH cytochrome p450 reductase, N-terminal domain
- Quinone reductase
These molecules do not superimpose well, but side by side you can easily see the similar topology. Secondary structure elements align one-to-one, mostly.

Family: quinone reductases (binds FAD)
Proteins: NADPH quinone reductase; quinone reductase type 2
Different members of the same family superimpose well. At this level, a structure may be used as a molecular replacement model.

CATH
Class, Architecture, Topology, Homology
[Figure: CATH hierarchy with example architectures (TIM barrel, sandwich, roll) and example topologies.]
http://www.biochem.ucl.ac.uk/bsm/cath_new/index.html

TOPS topology cartoons
A simple way to draw a protein.
Symbols: A and V (triangles) are beta strands; also shown are an alpha helix and
connections. Strands are drawn pointing up (A) or pointing down (V).
A A A A   A parallel beta sheet
V A V A   An anti-parallel beta sheet

TOPS topology cartoons
A connection in the middle means on top. A connection on the side means on the bottom.
[Figure: a right-handed βαβ unit; a left-handed βαβ unit (rarely seen).]

TOPS and contact maps
[Figure: a "contact map" for a βαβ unit.]

In class exercise: TOPS cartoon
- Display 4dfr chain A in InsightII.
- Use Molecule > render_secondary > helices, sheets, turns, loops.
- Draw a TOPS cartoon of this protein.

Bioinformatics 2, lecture 6
Loop modeling
Energy minimization

Steps in homology modeling
- Identify a sequence of interest
- Search database for homologs of known structure
- Align homologs with each other and with query
- Add structural homologs, if necessary
- Define SCRs and Designated Loops. Assign coordinates.
- Loop search, or loop generate. Assign loop coordinates.
- End repair. Splice repair. Other repairs.
- Energy minimization. Analysis. Interpretation.

Why am I doing this?
Reminder: the reason we do homology-based modeling is that we want to predict the structure of a protein, and we know a homolog structure. If it is enough to simply predict that it is "homologous", then we don't need to make a model. We just make an alignment. We make a model in order to predict how the query structure differs from the template. Structural differences suggest functional differences.

The ycaC gene
What is it? ycaC is a 621 bp ORF in E. coli, uncharacterized, with no assigned function. It has distant homology with bacterial hydrolase genes (20% identity), but the substrate cannot be determined. What does it do? Is it an antibiotic resistance gene?

First step: database search
[Figure: database search hits aligned to the consensus, including 1yacA and several gi entries.]

Multiple sequence alignment
[Figure: multiple sequence alignment.]
Tools used to
build MSA: Psi-Blast, Pfam, CDD, COGS, SeqLab
Useful models found: 1im5A, 1nf9A. But these are missing large pieces.
[Figure: continuation of the multiple sequence alignment.]

Steps in homology modeling
- Identify a sequence of interest
- Search database for homologs of known structure
- Align homologs with each other and with query
- Add structural homologs, if necessary
- Define SCRs and Designated Loops. Assign coordinates.
- Loop search, or loop generate. Assign loop coordinates.
- End repair. Splice repair. Other repairs.
- Energy minimization. Analysis. Interpretation.

CE structural alignment
Since the basis set sequences have very low similarity, we want to collect some more structures of the same kind, to enrich the basis set. Even distant homologs, or structural homologs, are better than building loops from scratch.
Insert CE web site demonstration here. For those of you reading this at home, check out cl.sdsc.edu/ce.html

Transferring an externally generated alignment into InsightII
Sadly, InsightII cannot import alignments, even of the most common formats. Instead, we must do the alignment from within. But InsightII's alignment tools are shoddy at best. Here's how to import an alignment from, say, a CE structural superposition:
1. Load the basis set structures and extract sequences in InsightII. Load the query sequence.
2. Open two windows on the same screen: one for the alignment (SeqLab, Netscape, whatever), one for the sequence window of InsightII. If possible, show only the basis set sequences in SeqLab etc.
3. For each basis set sequence, use middle-mouse to scroll sideways and right-mouse to insert gaps.

After the alignment
We have already discussed the next two steps:
- Defining the SCRs (these should be SSEs).
- Assigning the Designated Loops (these are not necessarily loops in the secondary structure sense).
Open and study yac_assigned.psv. Note the placement of the boxes in the structure by using HOMOLOGY > Sequences > Color, and hit Color_by_Calpha. Choose a color and a box or an
atom, and execute. The associated structure will have that color.

When to use SCRs from multiple templates
Normally it is best to pick SCRs from one template, the best template. In this case, the best template (1nba) does not have a helix at position 70-76. And let's suppose it is predicted to be a helix by PSIPRED, and there is a basis set member (1hso) that has a helix in the right place. So we use it. But most of the time we would try to pick SCRs from one template.

Steps in homology modeling
- Identify a sequence of interest
- Search database for homologs of known structure
- Align homologs with each other and with query
- Add structural homologs, if necessary
- Define SCRs and Designated Loops. Assign coordinates.
- Loop search, or loop generate. Assign loop coordinates.
- End repair. Splice repair. Other repairs.
- Energy minimization. Analysis. Interpretation.

Loop Search
If any member of the basis set (not necessarily the primary source of your SCR boxes) has a loop of the right length, you should use it. Box it and assign coordinates as a Designated Loop (to be explained later).
If no member of the basis set has a loop of the right length, then we try to find a loop of the right size in the database. Use Homology > Loops > Search. Click on the two boxes bordering the loop. When you hit execute, the program will search the database.
You cannot use this for the N and C termini. It will fail if the loop is very long, and it cannot be zero length.

Loop Search
How Loop Search works:
- Start residue and Stop residue are defined by the SCR boxes, which must have coordinates already assigned.
- Flex residues is filled in automatically when you pick the start and stop. This is the length of the loop to search for.
- Preflex and Postflex are the number of residues before and after the loop. We will try to find a loop from the database that fits these pre- and post-flex atoms.
We hit execute and...

Loop Search
Up to 10 loops of the right length in the database are superimposed on the pre- and post-flex residues, and the RMSD is calculated.
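
The scoring step just described can be sketched in Python. This is only an illustration under stated assumptions, not InsightII's actual implementation: it assumes each database loop has already been superimposed onto the model's pre- and post-flex residues, and it only scores the fit over those anchor atoms and keeps the best few. The names anchor_rmsd and rank_loops, the loop labels and the coordinates are all hypothetical.

```python
import heapq
import math

def anchor_rmsd(candidate, model):
    """RMSD over matching anchor atoms (pre-flex followed by post-flex)."""
    if len(candidate) != len(model) or not model:
        raise ValueError("anchor atom lists must be non-empty and equally long")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(candidate, model))
    return math.sqrt(total / len(model))

def rank_loops(candidates, model_anchor, keep=10):
    """Return up to `keep` (rmsd, name) pairs, best (lowest RMSD) first.

    candidates maps a loop name to its anchor coordinates, assumed to be
    already superimposed on the model's pre- and post-flex residues.
    """
    scored = ((anchor_rmsd(anchor, model_anchor), name)
              for name, anchor in candidates.items())
    return heapq.nsmallest(keep, scored)

# Hypothetical anchor atoms: two pre-flex and two post-flex alpha carbons.
model_anchor = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0),
                (10.0, 2.0, 0.0), (13.8, 2.0, 0.0)]
candidates = {
    "loop_a": [(0.1, 0.0, 0.0), (3.9, 0.0, 0.0),
               (10.1, 2.0, 0.0), (13.9, 2.0, 0.0)],
    "loop_b": [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0),
               (10.0, 3.0, 0.0), (13.8, 3.0, 0.0)],
    "loop_c": list(model_anchor),
}

ranking = rank_loops(candidates, model_anchor, keep=2)
for score, name in ranking:
    print(f"{name}: anchor RMSD {score:.2f} A")
```

A low anchor RMSD only says that the ends of a candidate fit the flanking residues. Whether the loop collides with the rest of the model still has to be checked by eye, as in the exercises.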
InsightII keeps the loops with the best RMSDs.
[Figure: flex residues bridging the gap distance between the preflex and postflex residues.]
If the gap distance is less than 3.8 Å x number of flex residues, then the pre- and postflex residues must move.

Exercise: Loop Search
In InsightII:
  delete, then File>restore_folder yac_loop1.psv
  Turn off the display of NBACEB, HSOCEC and YACCEA.
  Zoom in on the loops (red).
  Use Loops>Display to show one loop at a time. Select one that fits around the previously defined loop nearby.
  Loops>AssignCoords, choose loop 3.
NOTE: YACCEA is the true structure of the ycaC gene product. You can use it to check the quality of the model.

Exercise: Loop Search
Proceed to do a Loop Search for each segment: Loops>Search, then click on the two neighboring boxes. Zoom in on the loop region. Then use Loops>Display to visualize individual loop candidates. Write down the number of your favorite loop.
If you can't see the loops well, reduce them to the trace only (Molecule>Display > ONLY trace, select each loop object) or to backbone atoms rendered as sticks (Molecule>Render > sticks, 0.1, low quality).
Keep notes on the growing model.
[Table on slide with columns "loop" and "search choice"; entries as extracted: 2, 70 76, 79 84, 95 103, 107 113, end repair, 117 130, 132 138, 140 157, 165 173.]

What makes a good loop?
  Aligns well with pre- and postflex residues.
  Does not collide with other loops or the backbone.
  Fills space. No big voids.
  Has polar sidechains out, non-polar sidechains in.
  Travels over non-polar surfaces, mostly; i.e. does not bury charged residues.
  Has positive phi angles at glycines, mostly.

Loop Generate
Occasionally InsightII cannot find a good loop in the database. A "last resort" option for building coordinates into a loop region is Loops>Generate. It works just like Loops>Search but uses a different algorithm: Levinthal's random tweak method. (Levinthal should have known better.)

Loop quality and info level
Q: Why are randomly generated loops likely to be worse than database-derived loops?

Exercise: Loop Generate
In InsightII: use your
current model, or optionally delete and File>restore_folder yac_loop2.psv
  Turn off the display of NBACEB, HSOCEC and YACCEA.
  Loops>Generate
  Loops>Display; write down the number of your favorite loop.
  Loops>AssignCoords, choose loop 3.

Steps in homology modeling: Identify a sequence of interest (done); Search database for homologs of known structure (done); Align homologs with each other and with query (done); Add structural homologs if necessary; Define SCRs and Designated Loops; Assign coordinates; Loop search or loop generate; Assign loop coordinates; End repair; Splice repair; Other repairs; Energy minimization; Analysis; Interpretation.
Done modeling loops.

End repair
Homology>Refine>End_repair is an automatic function. It fills in any missing coordinates at the N and C termini.
If there is a large piece of the structure missing in the alignment, we need external tools to model it (to be discussed later). For today we can ignore the C-terminal part. You may un-display it if you like.

Constrained energy minimization (constraint/restraint)
Energy minimization using molecular mechanics repairs the following:
  bond lengths
  bond angles
  torsion angles
  nonbonded collisions (overlapping Van der Waals spheres)

Constrained energy minimization is energy minimization in the context of constraints. For example, in splice repair some atoms are constrained to their current positions. They are used in the energy calculations but they cannot move.

Harmonic potentials and Morse potentials
Harmonic and Morse potentials are restraint functions. A force is applied to move the atoms to their ideal distances/angles.
[Figure: Morse potential and its harmonic approximation; potential energy versus bond length.]
See Orengo p. 129.

Why do a simulation?
Why not just put every atom at its ideal position? Can't we solve for this position using calculus?
Yes, it is possible to build a molecule with exactly ideal bond lengths and angles. Start at one end and connect one amino acid at a time. But this would not produce the
same 3D coordinates as a simulation. Why? See next slides.

Internal coordinates for proteins
Ideal bond lengths and angles plus torsion angles are enough to build the 3D structure.
[Figure: peptide backbone labeled with ideal bond lengths and angles; a type III hairpin.]

The strange properties of internal coordinates when linked in a chain
Cartographers before 1733 used internal coordinates. Errors in internal coordinates accumulate along the chain.
A global reference point (the stars) and Harrison's perfect clock in 1733 removed the error accumulation problem.

Splice repair: a constrained simulation
The bonds between modeled segments (SCRs, designated loops, other loops) may be distorted. Refine>Splice_repair does a limited simulation to fix the stereochemistry around these splice points.
Run Splice repair and watch the stereochemistry fix itself:
  Refine>Splice_repair > add all, execute
  end definition > steepest, 5, 100, execute
  end definition > conjugate, 5, 100, execute

Manual repositioning
Crossing chains can never be repaired by energy minimization, since atoms would have to cross a very high energy barrier.
To move the backbone to a new position, we first make a cut, then rotate the backbone around selected torsion angles. Then we repair the cut.

Exercise: manual repositioning using Transform>torsion
  delete, then File>restore_folder yac_torsion.psv
  Find the crossed loops.
  Cut the peptide bond between residues 26 and 27. Set torsions around 20 (phi/psi) and 30 (psi/phi).
  BIOPOLYMER: Modify>Bond>Break, select atoms.
  Transform>Torsion>Add, select 20: N, CA, C.
  Do the same thing, adding torsions for 30: C, CA, N.
Be sure to select atoms in the direction of the movable part. If you choose atoms C, CA, N then the N-terminal side will move. If you choose N, CA, C then the C-terminal side will move.
Move the chain using middle-mouse. Toggle torsions using F7. Move the chain to where it does not cross any chains.

Exercise: manual repositioning using Transform>torsion (continued)
Put up a distance monitor: Measure>Distance > select C26, N27. Try to make this distance short (<3 Å).
When the loop is untangled, save the new torsion
angles: Transform>Torsion>Clear, keep.
Re-create the broken bond: BIOPOLYMER: Modify>Bond>Create, select atoms.

Exercise: relaxation
Relaxation as performed by InsightII is constrained energy minimization. You choose the unconstrained atoms (the ones that are allowed to move); InsightII will then move these atoms subject to restraints and molecular mechanics forces. Van der Waals repulsion and torsion angles are molecular mechanics forces; bond lengths and bond angles are restraints.
If necessary, delete and restore yac_relax.psv
  Refine>Relax > add loop sides, execute
  end definition > steepest, 5, 100, execute
  end definition > conjugate, 5, 100, execute
  Refine>Relax > add loop backbone, execute
  end definition, as before
  Then add SCR sides.

Steps in homology modeling: Identify a sequence of interest (done); Search database for homologs of known structure (done); Align homologs with each other and with query (done); Add structural homologs if necessary (done); Define SCRs and Designated Loops; Assign coordinates; Loop search or loop generate; Assign loop coordinates; End repair; Splice repair; Other repairs; Energy minimization; Analysis; Interpretation.

Setting the potentials for energy minimization
Unconstrained energy minimization is the next and final step. Before we can do energy minimization, InsightII needs to know a few things:
  Coordinates for all atoms, including hydrogens
  Atom types
  Bonds and bond types for all atoms
  Charges or partial charges for all atoms
These are set using the Force Field button ("FF") on the left-side menu bar. Select "Potentials". Set all to "fix". Execute. Set all to "accept". Execute.

Energy minimization using Discover
Discover_3>Strategy>Simple Minimize
Discover will not let you minimize unless the potentials have been set. You may have to add hydrogens (Biopolymer module) and/or set the potentials (FF menu).

Review
X-ray crystallography: R-factor, resolution, Bragg's law, B-factor.
NMR: ensemble, NOESY, TOCSY, spin system, distance geometry.
PDB website: how to find, how to download.
Rotation,
superimposition, RMSD, Dali method.
Contact maps.
Secondary structure propensity/prediction: GOR, PsiPred, Chou-Fasman, Q3 score, Ch, Ce score.
Protein structure classification: SCOP and CATH databases; class, fold, architecture, topology, analog, homolog.
Levinthal's Paradox. Anfinsen's thermodynamic hypothesis.
Molecular mechanics: force field, simulation, constraint/restraint.
InsightII: basis set, structure alignment, SCR, designated loop, loop search, etc.
Spelunking. Secondary structure: H-bonds, torsion angles. Sidechains (memorize them!). TOPS diagrams.

Bioinformatics 2, lecture 20

Sequence space maps to structure space as many to one (sequence families).
[Figure: sequences mapped onto structures by folding.]

Short history of protein design
  Site-directed mutagenesis, minimization: J. Wells, 1980s-90s
  Coiled coils, helix bundles: DeGrado, 1980s-90s
  Binary patterning: Hecht, 1990s
  Extreme protein stabilization: Mayo, 1990s
  Binding pocket design: Hellinga, 2000
  New fold design: Kuhlman & Baker, 2002
  Protein-protein interface design: Gray & Baker, 2004
  Open source protein design algorithm (EGAD): Pokala, 2005
  Novel enzyme by design: Baker, 2008

Experimental approaches: in vitro evolution; phage display.
Computational approaches: dead-end elimination; binary patterning.
Rational design.

Minimizing proteins
  Step 1: Identify binding determinants by alanine scan.
  Step 2: Design smaller ring to display primary determinants.
  Step 3: Optimize affinity by phage display.
  Step 4: Remove residues 20 to 28 and re-optimize.
  Step 5: Remove residues 1 to 4 to make mini-ANP.
ANP = atrial natriuretic peptide.

ANP variants and their relative affinities for NPR-A (ND = not determined). Sequences are shown as extracted; digits inside a sequence are residue-number subscripts.
  No.  ANP variant                               Relative phage affinity   Relative peptide affinity
   1   SLRRSSC7FGGRMDRIGAQSGLGCZSNSFRY           1                         1
   2   SLRRSSC7FGGRMDRIGC17QSGLGSNSFRY           >1500                     ND
   3   SLRRSCGSFGGRMDRIGQ7QSGLGSNSFRY            104                       ND
   4   SLRRCSSSFGGRMDRIGQ7QSGLGSNSFRY            480                       ND
   5   SLRRSCGHFGGRMDRlAQ7QSGLGSNSFRY            1                         ND
   6   SLRRSCGHFGGRMDRlAC                        300                       40000
   7   SLRRSCSHFGGRMDRIAQ7NR (short spacer)      8                         43
   8   SLRRSCGHFGGRMDRlAQ7NR (long spacer)       50                        43
   9   SLRRMCGHFGGRMDRISQ7YR (long spacer)       8                         6
  10   MCGHFGGRMDRISCWYR                         ND                        7
  11   FiC7ChaGGRIDRIFRC18                       ND                        7 200
Bing Li et al., Science 1995.

Binary patterning in proteins
Kamtekar et al., Science 1993.
[Figure: designed sequences for helices 1 to 4 of a four-helix bundle, with polar and nonpolar residues marked.]

Computational protein design using Dead-End Elimination
  Select positions for mutating.
  Select allowed amino acids at those positions.
  For the selected amino acids, try all sidechain orientations (rotamers).
  Choose the set of rotamers that gives the lowest "energy".

Sidechain rotamers
Sidechain conformations fall into three classes called rotational isomers, or rotamers.
[Figure: a random sampling of phenylalanine sidechains with the backbone superimposed.]

Sidechain rotamers
1-4 interactions differ greatly in energy depending on the moieties involved.
  "m": -60°, gauche
  "t": 180°, anti/trans
  "p": +60°, gauche

Rotamer stability is dependent on the backbone phi/psi angles.
[Figure: a W (Trp) sidechain shown lying over the backbone.]
[Table: rotamers of W (Trp): chi angles and their frequencies at two backbone phi/psi settings (0.372 and 0.238 versus 0.079 and 0.005).]

Rotamer libraries
Rotamer libraries have been compiled by clustering the sidechains of each amino acid over the whole database. Each cluster is a representative conformation (or rotamer), and is represented in the library by the best sidechain angles (chi angles), the "centroid" angles for that cluster.
Two commonly used rotamer libraries:
  Jane & David Richardson: http://kinemage.biochem.duke.edu/databases/rotamer.php
  Roland Dunbrack: http://dunbrack.fccc.edu/bbdep/index.php
(The rotamers of W on the previous page are from the Richardson library.)

Sidechain prediction
Given the sequence and only the backbone atom coordinates, accurately model the positions of the sidechains.
[Figure: thick lines, sidechain predictions using the method of Desmet et al.; fine lines, true structure.]
Desmet et al., Nature v. 356, pp. 339-342, 1992.

Theoretical complexity of sequence design
  Estimated number of sidechain rotamers: R = 193
  Typical small protein length: L = 100 residues
  Sequence complexity: 20^100 = 1.3 x 10^130
  Rotamer complexity: 193^100 = 3.6 x 10^228
  Complexity of the DEE algorithm: O(R^2 L^2), about 3.7 x 10^8

Dead-end elimination theorem
Each residue is numbered i or j, and each residue has a set of rotamers r, s, or t. So the notation i_r means "choose rotamer r for position i".
The total energy is the sum of three components (fixed-fixed, fixed-movable, movable-movable):
\[ E_{global} = E_{template} + \sum_i E(i_r) + \sum_i \sum_{j>i} E(i_r, j_s) \]
where r and s are any choice of rotamers. For any choice of rotamers,
\[ E_{global} \ge E_{GMEC} \]
where GMEC is the global minimum energy conformation.

Dead-end elimination theorem
If i_g is in the GMEC and i_r is not, then we can separate the terms that contain i_g or i_r and rewrite the inequality "E_GMEC is less than E_notGMEC" as
\[ E_{template} + E(i_g) + \sum_{j \ne i} E(i_g, j_g) + \sum_{j \ne i} E(j_g) + \sum_{j \ne i} \sum_{k>j,\, k \ne i} E(j_g, k_g) < E_{template} + E(i_r) + \sum_{j \ne i} E(i_r, j_g) + \sum_{j \ne i} E(j_g) + \sum_{j \ne i} \sum_{k>j,\, k \ne i} E(j_g, k_g) \]
Canceling all terms that appear on both sides (shown in black on the slide), we get
\[ E(i_r) + \sum_{j \ne i} E(i_r, j_g) > E(i_g) + \sum_{j \ne i} E(i_g, j_g) \]
So if we find two rotamers i_r and i_t with
\[ E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s) > E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s) \]
then i_r cannot possibly be in the GMEC.

The DEE theorem can be translated into plain English as follows: if the "worst case scenario" for t is better than the "best case scenario" for r, then you always choose t over r.

DEE algorithm
  Find two columns (rotamers) within the same residue where one is always better than the other.
  Eliminate the rotamer that can always be beat.
  Repeat until only 1 rotamer per residue remains.

DEE algorithm: find two
columns (rotamers) within the same residue where one is always better than the other. Eliminate the rotamer that can always be beat. Repeat until only 1 rotamer per residue remains.

Sequence design using DEE
[Figure: matrix of self energies E(r1) and pairwise energies E(r1, r2) for rotamers a, b, c at each position; one position carries both Asp and Leu rotamers.]
Rotamers within the DEE framework can have different atoms, i.e. they can be different amino acids. Using DEE we choose the best set of rotamers. Now we have the sequence of the lowest energy structure. In the example we have D or L at position 3.

Some amazing accomplishments in protein design: proteins can be made superstable.
[Figure: folding curves for the natural sequence and the redesigned protein.]

Some amazing accomplishments in protein design: computationally redesigned proteins are consistently stable.
[Figure: redesigned acylphosphatase; mean residue ellipticity versus wavelength (nm).]
Dantas et al., J. Mol. Biol. 2003, 332, 449-460.

αMβ2 integrin I domain in 2 conformations
2 crystal structures are known. They differ in the highlighted region. Shimaoka et al. designed sequences for each form, open and closed. The two designs were shown to have different physiological properties.
PDB codes: 1an5, 1n92.
Shimaoka M., Shifman J. M., Takagi J., Mayo S. L., Springer T. A. (2000) "Computational design of an integrin I domain stabilized in the high affinity conformation". Nature Struct. Biol. 7(8), 674-678.

Some amazing accomplishments in protein design: new binding sites can be designed.
It used to bind arabinose; now it binds serotonin.
Looger L. L., Dwyer M. A., Smith J. J. & Hellinga H. W., Nature 423, 185-190 (2003).

Recent success of sequence design: redesigning a binding site
[Figure: target ligands: serotonin, TNT, L-lactate.]

DEE with alternative sequences and ligands
[Figure: DEE energy matrix in which ligand conformers appear as additional rows and columns.]
Each alternative ligand position is another rotamer.

An appropriate binding site was found
The native ligand, arabinose, is approximately the same size as the targeted ligand, serotonin.
Nature 423, 185-190.

A space was carved out for the ligand
All sidechains in the binding site were truncated to alanines, and a space was defined (yellow) for the new ligand.

Lots of possible ligand orientations were made
Ligand orientations were treated like rotamers in DEE.
Looger L. L., Dwyer M. A., Smith J. J. & Hellinga H. W., Nature 423, 185-190 (2003).

Sidechains were chosen using DEE
Ligand position, sidechain identity and sidechain rotamer are chosen simultaneously.
Nature 423, 185-190.

H-bonding is key
Successful designs have all H-bond donors and acceptors satisfied.

Some amazing accomplishments in protein design: new folds can be designed.
New proteins can be designed that have never been seen before. The designs are accurate (compare red and blue in the figure) and they are highly stable.
Kuhlman et al., Science v. 302 (5649), 1364-1368, 2003.

Designing an enzyme
An enzyme must bind the transition state tighter than the ground states.
The high-resolution crystal structure closely matches the computational design.

In-class exercise: redesign an active site
  Make cocaine. Use Builder. See diagram.
  Bind it to a G-protein coupled receptor (adrenergic receptor), PDB code 1BAK.
  Avoid collisions with the backbone. Ignore sidechains. Bury it deep for maximum affinity.
  Use the rotamer explorer to modify the sequence of the binding site.
Two considerations:
  The shape of the site must fit the ligand.
  The protein must still fold; changes should be to similar sidechains if possible.

Bioinformatics 2, lecture 11
Molecular surfaces. Electrostatic maps. The Hydrophobic Effect.

What is a molecular surface?
A molecular surface is a closed 3D "manifold". What's a
"manifold"? Here is a 2D manifold. Extrapolate.

What does it mean?
A molecular surface is the border or interface between a molecule and its environment. Water that sits on the surface behaves differently than water that is surrounded by water. Surfaces can be used to model solvation.
Surfaces have:
  size/area
  electrostatic properties
  shape properties

Rendering a surface
A surface of any shape may be composed of 3D triangles, that is, 3 sets of xyz coordinates, one for each vertex.
To display the surface on the screen, each triangle is rotated and translated according to the current frame of reference.
Contiguous triangles make a continuous surface.
Then each pixel is assigned a brightness according to the angle between the triangle and the light source.
Phong shading may be applied to simulate curvature. In this case each pixel in the triangle has a different brightness depending on where it is.

[Figure: a complex surface made with triangles, a cow-shaped manifold.]
Phong-shaded triangles give the illusion of higher resolution.

The Connolly surface
Conceptually, roll a probe sphere over the molecule.
Everywhere the center of the sphere goes is the Solvent Accessible Surface (SAS).
Everywhere the sphere touches (including empty space) is the Solvent Excluded or "Connolly" Surface (SES).
[Figure labels: atomic coordinates, atom radius.]

Surface shapes (rolling probe sphere)
  Green: convex-convex. Contact surface of probe with atom.
  Brown: convex-concave. Toroidal surface, formed when the probe touches two atoms.
  Yellow: concave-concave. Reentrant surface, formed when the probe touches three atoms.
Coloring by atom, by shape.

How do surfaces interact?
Molecular surfaces dictate what the function of a molecule is: whether it is a binding/signaling protein, an enzyme, a channel, or an integral membrane protein.
  Shape interactions
  Charge interactions

Complementary surfaces
Complementary surfaces leave relatively less unfilled (void) space. Void space is unfavorable: "Nature abhors a vacuum".
There is only one way to make space
empty, but many ways to fill it. The entropy measures the number of different states available to the system. Lower entropy implies higher free energy.

Better fit, less wasted space, lower energy
If we compare two surface-surface interactions, all else being equal, the one with more void spaces has a higher free energy and is therefore less favorable. The one that fills space has a lower energy.

The Hydrophobic Effect
What happens to the surface and volume as two hydrophobic atoms come together?
  The white area is excluded volume. Water can't go closer than one radius (1.4 Å).
  First, excluded surface (green) is approximately compensated by new "toroidal" surface (brown).
  The toroidal surface grows faster than atom surface is lost. SES goes up.
  At short D, the toroidal surface stops growing and the atom surface shrinks faster.

A barrier to hydrophobic collapse
[Plot: ΔG, SES and excluded volume versus distance. Black: ΔG calculations using explicit waters. Blue: SES. Green: excluded volume.]
When hydration shells first touch, surface goes up but volume goes down.

Amino acid hydrophobicity scales
[Figure: the amino acids ranked from most hydrophobic to least hydrophobic on four scales, numbered 1 to 4.]
1. J. Janin, "Surface and Inside Volumes in Globular Proteins", Nature 277 (1979) 491.
2. R. Wolfenden, L. Andersson, P. Cullis and C. Southgate, "Affinities of Amino Acid Side Chains for Solvent Water", Biochemistry 20 (1981) 849-855.
3. J. Kyte and R. Doolittle, "A Simple Method for Displaying the Hydropathic Character of a Protein", J. Mol. Biol. 157 (1982) 105-132.
4. G. Rose, A. Geselowitz, G. Lesser, R. Lee and M. Zehfus, "Hydrophobicity of Amino Acid Residues in Globular Proteins", Science 229 (1985) 834-838.
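To make one of these scales concrete, here is a small sketch of a sliding-window hydropathy profile using the Kyte-Doolittle scale (reference 3). The table holds the standard Kyte-Doolittle hydropathy indices; the function name, the default window length and the test sequence are choices made for this example only.

```python
# Kyte-Doolittle hydropathy indices (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=9):
    """Average hydropathy over each window of `window` consecutive residues.

    Returns one value per window position (len(sequence) - window + 1 values).
    Raises KeyError on a non-standard residue letter.
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]

# Made-up four-residue example with a window of 2.
profile = hydropathy_profile("AILK", window=2)
print([round(value, 2) for value in profile])
```

Plotting such a profile along a sequence is the usual way to spot buried or membrane-spanning stretches; a different scale can be swapped in by replacing the table.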
Crystallographic waters fill holes in the surface.
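Looking back at the dead-end elimination criterion from lecture 20, the elimination loop can be sketched in a few lines. This is a minimal illustration of the original criterion (best case for r worse than worst case for t), not the code of any design package; the data layout, the function names and the tiny two-position energy table are all invented for the example.

```python
def dee_eliminate(self_energy, pair_energy):
    """Repeatedly apply the dead-end elimination criterion.

    self_energy: {position: {rotamer: E(i_r)}}
    pair_energy: function (i, r, j, s) -> E(i_r, j_s)
    Returns {position: set of rotamers that could not be eliminated}.
    """
    alive = {i: set(rotamers) for i, rotamers in self_energy.items()}
    changed = True
    while changed:
        changed = False
        for i in alive:
            others = [j for j in alive if j != i]
            for r in list(alive[i]):
                # Best case scenario for rotamer r at position i.
                best_r = self_energy[i][r] + sum(
                    min(pair_energy(i, r, j, s) for s in alive[j]) for j in others
                )
                for t in list(alive[i]):
                    if t == r:
                        continue
                    # Worst case scenario for the competing rotamer t.
                    worst_t = self_energy[i][t] + sum(
                        max(pair_energy(i, t, j, s) for s in alive[j]) for j in others
                    )
                    if best_r > worst_t:
                        alive[i].discard(r)  # r can never be in the GMEC
                        changed = True
                        break
    return alive

# Invented example: two positions, two rotamers each.
SELF = {1: {"a": 0.0, "b": 5.0}, 2: {"a": 0.0, "b": 1.0}}
PAIR = {
    (1, "a", 2, "a"): 0.0, (1, "a", 2, "b"): 1.0,
    (1, "b", 2, "a"): 0.0, (1, "b", 2, "b"): -1.0,
}

def pair_lookup(i, r, j, s):
    """Symmetric lookup into the invented pair-energy table."""
    if (i, r, j, s) in PAIR:
        return PAIR[(i, r, j, s)]
    return PAIR[(j, s, i, r)]

survivors = dee_eliminate(SELF, pair_lookup)
print(survivors)
```

In this toy case one rotamer survives at each position, and it matches the lowest-energy combination found by enumerating all four. On real problems the simple criterion often stalls with several rotamers left per position, and the survivors then have to be searched by other means.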