STTEACHLEARN SCIEN BSC 5936
This 46 page Class Notes was uploaded by Kari Harber Jr. on Thursday September 17, 2015. The Class Notes belongs to BSC 5936 at Florida State University taught by Staff in Fall.
Date Created: 09/17/15
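Basic math for biology
Lei Li, Florida State University, Feb 6, 2002

The EM algorithm: setup
- Parametric models: $P_\theta$
- Data: full data $(Y, X)$; partial data $Y$; missing data $X$
- Likelihood and maximum likelihood estimate:
  $\log P_\theta(Y) = \log P_\theta(Y, X) - \log P_\theta(X \mid Y)$
  $\log L(Y; \theta) = \log L(Y, X; \theta) - \log P(X \mid Y; \theta)$
- Maximum likelihood estimate: $\hat\theta$ maximizes $\log L(Y; \theta)$. Maximizing it directly is often hard; however, the MLE based on the full data usually has a closed form.

The EM algorithm: key idea
Take the conditional expectation with respect to $Y = y$ at the parameter $\theta'$ and get
  $Q(\theta, \theta') = E_{\theta'}[\log L(Y, X; \theta) \mid Y = y]$
  $H(\theta, \theta') = E_{\theta'}[\log P(X \mid Y; \theta) \mid Y = y]$
  $\log L(y; \theta) = Q(\theta, \theta') - H(\theta, \theta')$
EM algorithm: starting from an initial value $\theta^{(0)}$, iterate between the following two steps:
- E-step: calculate $Q(\theta, \theta^{(k)})$ for the current value $\theta^{(k)}$.
- M-step: maximize $Q(\theta, \theta^{(k)})$ with respect to $\theta$; the maximizer becomes $\theta^{(k+1)}$.

The EM algorithm: the magic
- The conditional expectation can be calculated in cases such as the exponential family.
- Don't worry about $H(\theta, \theta')$: by Jensen's inequality (Shannon's first theorem), $H(\theta, \theta^{(k)}) \le H(\theta^{(k)}, \theta^{(k)})$ for every $\theta$.
- Hence the partial-data likelihood always goes up: $\log L(y; \theta^{(k+1)}) \ge \log L(y; \theta^{(k)})$.
- Convergence: to a local maximum.

Bayesian inference
- Parametric models: $p(\text{data} \mid \theta)$
- A prior distribution of $\theta$: $\pi(\theta)$; $\theta$ is a random variable.
- Posterior distribution of $\theta$:
  $p(\theta \mid \text{data}) = \frac{p(\text{data} \mid \theta)\, \pi(\theta)}{\int p(\text{data} \mid \theta')\, \pi(\theta')\, d\theta'}$
- MAP solution: the $\theta$ that maximizes the posterior.
- Posterior mean: the expected value of $\theta$ with respect to the posterior distribution.

Gibbs sampling
- The conditional distributions of $(X_1, X_2)$ are known: $p(X_1 = x_1 \mid X_2 = x_2)$ and $p(X_2 = x_2 \mid X_1 = x_1)$.
- Gibbs sampling scheme: start with $x_2^{(0)}$; then, for $n = 1, 2, \dots$:
  1. Generate $x_1^{(n)}$ according to $p(x_1 \mid X_2 = x_2^{(n-1)})$.
  2. Generate $x_2^{(n)}$ according to $p(x_2 \mid X_1 = x_1^{(n)})$, and go back to Step 1.
- The joint distribution of $(X_1, X_2)$ is given by the limiting distribution of $(x_1^{(n)}, x_2^{(n)})$ as $n \to \infty$.

Bayesian treatment of the missing data problem
- Data: full data $(Y, X)$; partial data $Y$
- Parametric models: $p(Y, X \mid \theta)$
- What we need: $p(X, \theta \mid Y)$, and hence $p(\theta \mid Y)$
- We can apply Gibbs sampling if we know the following two conditional distributions: $p(X \mid \theta, Y)$ and $p(\theta \mid X, Y)$

Markov chain
- Markov property: the future and the past are independent given the present.
- Markov models: why do we need them?
  - Time dependence and stochastic processes
  - The Markov property is simple but general enough
  - Characterized by the transition matrix $(p_{ij})$

Structure and notation
- Hidden process: a Markov chain $X_t$ taking values from $n$ states $s_1, \dots, s_n$, with transition probability matrix $p_{ij} = P(X_{t+1} = s_j \mid X_t = s_i)$.
- Observation: each hidden state $X_t$ emits a random variable $O_t$ taking values from $m$ letters $v_1, \dots, v_m$, with emission probability $e_i(k) = P(O_t = v_k \mid X_t = s_i)$; below, $e_i(o_t)$ denotes $e_i(k)$ for the letter $v_k$ observed at time $t$.
- Parameters $\lambda$: the initial distribution of hidden states $\pi_i = P(X_1 = s_i)$, the transition probabilities $p_{ij}$ and the emission probabilities $e_i(k)$.
- The topology of the hidden Markov chain represents our a priori knowledge.
- The time scale of the observation process is not necessarily 1-D (e.g., the alignment of two sequences).

The three basic problems in HMM
- Likelihood: what is the probability of a sequence of observations? (forward-backward algorithm)
- Parameter estimation: what are the maximum likelihood estimates of the parameters? (EM algorithm)
- Decoding: what is the most likely sequence of states that produced a given sequence of observations? (Viterbi decoding, marginal decoding)

The forward algorithm
Let $\alpha_t(i) = P(O_1 = o_1, \dots, O_t = o_t, X_t = s_i \mid \lambda)$.
1. Initialization: $\alpha_1(i) = \pi_i\, e_i(o_1)$
2. Induction: $\alpha_{t+1}(j) = \left[\sum_{i=1}^{n} \alpha_t(i)\, p_{ij}\right] e_j(o_{t+1})$, for $t = 1, \dots, T-1$
3. Termination: $P(O = o \mid \lambda) = \sum_{i=1}^{n} \alpha_T(i)$
Complexity: $n(n+1)(T-1) + n$ multiplications and $n(n-1)(T-1)$ additions.

The backward algorithm
Let $\beta_t(i) = P(O_{t+1} = o_{t+1}, \dots, O_T = o_T \mid X_t = s_i, \lambda)$.
1. Initialization: $\beta_T(i) = 1$
2. Induction: $\beta_t(i) = \sum_{j=1}^{n} p_{ij}\, e_j(o_{t+1})\, \beta_{t+1}(j)$, for $t = T-1, \dots, 1$
3. Termination: $P(O = o \mid \lambda) = \sum_{i=1}^{n} \alpha_1(i)\, \beta_1(i) = \sum_{i=1}^{n} \pi_i\, e_i(o_1)\, \beta_1(i)$
Complexity: of order $n^2 T$ computations.

Marginal decoding
Let $\gamma_t(i) = P(X_t = s_i \mid O = o)$; then
  $\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{P(O = o)} = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{n} \alpha_t(j)\, \beta_t(j)}$
The state that maximizes this marginal posterior probability at each time $t$ gives the solution of marginal decoding.

The Viterbi decoding
- Goal: find $x^* = \arg\max_x P(X = x, O = o \mid \lambda)$.
- Solution: an optimal path on a directed acyclic graph (DAG).
- Intermediate variables in the recursion: let $\delta_t(i)$ be the probability of the most probable path ending in state $s_i$ at time $t$, namely
  $\delta_t(i) = \max_{x_1, \dots, x_{t-1}} P(X_1 = x_1, \dots, X_{t-1} = x_{t-1}, X_t = s_i, O_1 = o_1, \dots, O_t = o_t \mid \lambda)$
- Keep track of the argument that maximizes the above quantity in $\psi_t(i)$.

The Viterbi decoding recursion
1. Initialization: $\delta_1(i) = \pi_i\, e_i(o_1)$, $\psi_1(i) = 0$
2. Induction: $\delta_t(j) = \max_{i} \left[\delta_{t-1}(i)\, p_{ij}\right] e_j(o_t)$ and $\psi_t(j) = \arg\max_{i} \left[\delta_{t-1}(i)\, p_{ij}\right]$, for $t = 2, \dots, T$
3. Termination: $P^* = \max_i \delta_T(i)$, $x_T^* = \arg\max_i \delta_T(i)$
4. Traceback: $x_t^* = \psi_{t+1}(x_{t+1}^*)$, for $t = T-1, \dots, 1$

The EM algorithm in HMM
- Missing data in HMM: the hidden states $X$.
- Conditional expectation:
  $\gamma_t(i) = P(X_t = s_i \mid O = o, \lambda)$
  $\xi_t(i, j) = P(X_t = s_i, X_{t+1} = s_j \mid O = o, \lambda) = \frac{\alpha_t(i)\, p_{ij}\, e_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i'=1}^{n} \sum_{j'=1}^{n} \alpha_t(i')\, p_{i'j'}\, e_{j'}(o_{t+1})\, \beta_{t+1}(j')}$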
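The forward and backward recursions, the conditional expectations $\gamma_t(i)$ and $\xi_t(i, j)$, and one EM (Baum–Welch) update can be sketched in plain Python. This is an illustrative sketch, not the author's code: states and letters are integer indices, the lists `pi`, `P`, `E` hold $\pi_i$, $p_{ij}$, $e_i(k)$, and the function names are hypothetical. To sidestep underflow it rescales $\alpha_t$ to sum to 1 at every step (one common remedy; log-space arithmetic is another), so that $\log P(O = o \mid \lambda) = \sum_t \log c_t$ for the scale factors $c_t$.

```python
import math


def forward_backward(pi, P, E, obs):
    """Scaled forward-backward pass for a discrete HMM.

    pi[i]: initial probability pi_i; P[i][j]: transition p_ij;
    E[i][k]: emission e_i(k); obs: observed letter indices o_1..o_T.
    Returns (log P(O = o | lambda), gamma, xi).
    """
    n, T = len(pi), len(obs)
    alpha, scale = [], []
    # Forward pass; each alpha_t is rescaled to sum to 1 and its factor c_t is kept
    for t in range(T):
        if t == 0:
            row = [pi[i] * E[i][obs[0]] for i in range(n)]
        else:
            row = [sum(alpha[t - 1][i] * P[i][j] for i in range(n)) * E[j][obs[t]]
                   for j in range(n)]
        c = sum(row)
        scale.append(c)
        alpha.append([a / c for a in row])
    # Backward pass with the same scale factors; beta_T(i) = 1
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(P[i][j] * E[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   / scale[t + 1] for i in range(n)]
    # The log-likelihood is the sum of the log scale factors (no underflow)
    loglik = sum(math.log(c) for c in scale)
    # With scaled variables, alpha_t(i) * beta_t(i) already equals gamma_t(i)
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    xi = [[[alpha[t][i] * P[i][j] * E[j][obs[t + 1]] * beta[t + 1][j] / scale[t + 1]
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    return loglik, gamma, xi


def baum_welch_step(pi, P, E, obs):
    """One EM iteration: E-step via forward_backward, M-step via re-estimation.

    Assumes every state keeps non-zero posterior mass (otherwise a denominator is 0).
    Returns ((new_pi, new_P, new_E), log-likelihood under the old parameters).
    """
    n, m, T = len(pi), len(E[0]), len(obs)
    loglik, gamma, xi = forward_backward(pi, P, E, obs)
    new_pi = list(gamma[0])
    new_P = [[sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_E = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) / sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return (new_pi, new_P, new_E), loglik
```

Repeating `baum_welch_step` until the log-likelihood stops increasing gives the EM iteration; by the EM argument the returned log-likelihood never decreases from one step to the next.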
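The Viterbi recursion above (initialization, induction with $\psi_t$, termination, traceback) can be sketched in a few lines. This is a minimal illustration under assumed conventions: parameters are plain lists `pi[i]`, `P[i][j]`, `E[i][k]`, observations are letter indices, and `viterbi` and `safe_log` are hypothetical names. It works with log probabilities, which turns the products into sums and avoids underflow on long sequences.

```python
import math


def safe_log(p):
    """log(p), with log(0) = -inf so that impossible moves are never chosen."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(pi, P, E, obs):
    """Return (x*, log P(X = x*, O = o | lambda)) for a discrete HMM."""
    n, T = len(pi), len(obs)
    # Initialization: delta_1(i) = pi_i e_i(o_1), psi_1(i) = 0 (log space)
    delta = [[safe_log(pi[i]) + safe_log(E[i][obs[0]]) for i in range(n)]]
    psi = [[0] * n]
    # Induction: delta_t(j) = max_i [delta_{t-1}(i) p_ij] e_j(o_t)
    for t in range(1, T):
        prev = delta[-1]
        row, back = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + safe_log(P[i][j]))
            back.append(best)
            row.append(prev[best] + safe_log(P[best][j]) + safe_log(E[j][obs[t]]))
        delta.append(row)
        psi.append(back)
    # Termination: x_T* = argmax_i delta_T(i)
    last = max(range(n), key=lambda i: delta[-1][i])
    # Traceback: x_t* = psi_{t+1}(x_{t+1}*)
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]
```

For example, with `pi = [0.6, 0.4]`, `P = [[0.7, 0.3], [0.4, 0.6]]`, `E = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]` and `obs = [0, 1, 2]`, `viterbi` returns the path `[0, 0, 1]` with log-probability `log(0.01512)`.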
- MLE of the full data (re-estimation formulas):
  $\hat\pi_i = \gamma_1(i)$
  $\hat p_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$
  $\hat e_i(k) = \frac{\sum_{t:\, o_t = v_k} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}$
- Computation: underflow. The products in $\alpha_t$ and $\beta_t$ shrink geometrically with $T$, so in practice they are rescaled at each step or computed in log space.

Homology Modeling: A Brief Introduction
W. Ross Ellington, Biology/IMB

Traditional approaches to 3D structure determination
X-ray crystallography
- Sufficient protein? Natural protein is often heterogeneous.
- Can it be expressed? This requires a full-length expression construct.
- If so, in a soluble state? Recombinant protein is often present as insoluble inclusion bodies.
- Will it crystallize? Perhaps the most time-consuming task.
- Will the crystals be of diffraction quality? Some crystals will simply not diffract.
- How do you solve the phase problem? Heavy metals, selenomethionine, molecular replacement.
NMR spectroscopy
- Same problems as above with respect to protein availability.
- The protein must be 15N- and/or 13C- and/or 2H-labeled: bacteria must grow in isotopically enriched media, and samples may cost > 2000 to prepare.
- The protein must be stable for weeks.
- Restriction in terms of protein size: < 25–30 kD.
All methods are time- and labor-intensive but yield atomic-level resolution (1.5–3 Å).

Proliferation of deduced amino acid sequences
1. RT-PCR: PCR amplification of RT products using redundant or gene-specific primers; very rapid; generates large amounts of data.
2. cDNA libraries: cloning of RT products; when screened, they yield large numbers of cDNA sequences.
3. Genomic DNA sequencing efforts:
   - Eukaryotes: man, mouse, chicken, pufferfish, zebrafish, Ciona (tunicate), sea urchin, fruit fly, C. elegans (nematode) and many unicellular pathogenic species. Gene-finder software extracts exons and assembles ORFs.
   - Prokaryotes: > 25 genomes have been sequenced.
   - The human genome codes for > 30,000 proteins.

Convert sequences to 3D structure
- > 40% sequence identity: HOMOLOGY MODELING (actually extends to much lower degrees of identity)
- 20–40% sequence identity: THREADING
- < 25% sequence identity: AB INITIO modeling

Homology modeling
- assumes the query sequence and its homologs share the same basic fold
- the approach is to pick out the best template from a suite of homologs

Homologous proteins share a common evolutionary origin (ancestor).
- Orthologs: the same gene present in different species, e.g., hemoglobin in fish vs. camels
- Paralogs: products of gene duplication events, e.g., the brain, muscle, sarcomeric mitochondrial and ubiquitous mitochondrial creatine kinases in man

The potential for homology modeling
- > 500,000 protein sequences are known.
- There are > 10,000 experimentally determined 3D structures of proteins.
- 33% of known protein sequences have similarities to a protein for which the 3D structure has been determined.
- Therefore, there is the potential for homology modeling of > 150,000 proteins.

Homology modeling workflow (figure):
1. Identify related structures (templates) for the target sequence.
2. Select templates.
3. Align the target sequence with the template structures.
4. Build a model for the target using information from the template structures.
5. Evaluate the model.

Find homologs
1. Pairwise comparison: compare the target sequence individually with other sequences in a sequence database (FASTA, BLAST). Easy-to-use online sites; probe with protein (BLASTp, tBLASTn) or with nucleotides (BLASTn):
   NCBI: http://www.ncbi.nlm.nih.gov/BLAST
   EMBnet: http://www.ch.embnet.org/software/BottomBLAST.html
   ExPASy: http://us.expasy.org/tools/blast
2. PSI-BLAST: constructs multiple sequence alignments of many sequences, which are iteratively sampled from a database; a position-specific scoring matrix is used to sample the database for additional homologs; very useful when percent identity is low.
   NCBS (National Centre for Biological Sciences, Bangalore): http://caps.ncbs.res.in/campass/psi_blast.html
3. 3D template matching: pairwise comparison of the query sequence and a protein of known structure; a structure-dependent scoring function is used to ascertain whether or not a query protein adopts any one of the library of 3D folds; useful when pairwise or multiple-sequence approaches have not identified homologs.
   Imperial College, 3D-PSSM: http://www.sbg.bio.ic.ac.uk/3dpssm
4. PROPSEARCH: uses the amino acid composition instead. In addition, other properties such as molecular weight, content of bulky residues, content of small residues, average hydrophobicity, average charge, etc., and the content of selected dipeptide groups are calculated from the sequence as well. 144 such properties are weighted individually and used as the query vector. The weights were trained on a set of protein families with known structures using a genetic algorithm. Sequences in the database are transformed into vectors as well, and the Euclidean distance between the query and each database sequence is calculated. Distances are rank-ordered, and the sequences with the lowest distance are reported at the top.
   University of Montpellier: http://www.infobiosud.univ-montp1.fr/SERVEUR/PROPSEARCH/Presentation.html

Template choice
1. The higher the sequence identity, the more likely the template will be the most closely related from a phylogenetic point of view.
   [Figure: phylogenetic tree of phosphagen kinases (arginine, creatine, lombricine and glycocyamine kinases) from animals including sea anemone, sea cucumber, Chaetopterus and Branchiostoma]
2. Template environment: solvent, pH, temperature, quaternary structure.
3. Quality of the template structure: resolution and R factor.

Alignment of the target query with the template using CLUSTAL, GAP, NCBI, etc., with manual adjustments
[Figure: example GAP pairwise alignment output (quality, length, ratio, gaps, percent similarity and percent identity, followed by the aligned target and template sequences)]
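The PROPSEARCH idea described above (turn each sequence into a property vector, then rank database sequences by Euclidean distance to the query) can be sketched as follows. This is a deliberately simplified, hypothetical illustration: it uses only the 20 amino acid composition fractions with optional per-property weights, not PROPSEARCH's 144 properties or its genetically trained weights, and `composition_vector`, `weighted_distance` and `rank_by_composition` are names invented here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(seq):
    """Fraction of each standard amino acid in seq (other letters are ignored).

    Assumes seq contains at least one standard residue.
    """
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    total = len(residues)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


def weighted_distance(u, v, weights=None):
    """Euclidean distance, optionally with one weight per property."""
    if weights is None:
        weights = [1.0] * len(u)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))


def rank_by_composition(query, database, weights=None):
    """Rank {name: sequence} entries by distance to the query, lowest first."""
    q = composition_vector(query)
    hits = [(weighted_distance(q, composition_vector(s), weights), name)
            for name, s in database.items()]
    return sorted(hits)
```

Composition alone cannot tell a sequence from any shuffle of it (both get distance 0), which is one reason PROPSEARCH also scores properties such as the content of selected dipeptide groups.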
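The pairwise alignment step and the percent identity used to judge templates can be illustrated with a textbook Needleman–Wunsch global alignment, the kind of global alignment GAP performs. This is a minimal sketch under stated assumptions: a toy match/mismatch score and a linear gap penalty stand in for the substitution matrices (e.g., BLOSUM62) and affine gap penalties that real tools use, and the function names are hypothetical. Percent identity here is computed over gap-free aligned columns; tools differ on the denominator (alignment length, shorter sequence length), so reported values are not always directly comparable.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty.

    Returns (aligned_a, aligned_b, score); '-' marks a gap.
    """
    def s(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j]: best score for aligning the prefixes a[:i] and b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Traceback from F[n][m] back to F[0][0]
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), F[n][m]


def percent_identity(aln_a, aln_b):
    """Identical residue pairs as a percentage of gap-free aligned columns."""
    cols = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in cols) / len(cols)
```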
Look for highly conserved regions: this helps with the manual adjustment of alignments.
[Figure: multiple sequence alignment with highly conserved regions marked]

EXAMPLE: Chaetopterus variopedatus (polychaete marine worm) mitochondrial creatine kinase vs. chicken sarcomeric MiCK, whose X-ray crystal structure was published in 1996 by Fritz-Wolf et al. (PDB 1CRK)
[Figure: GAP alignment of Chaetopterus MiCK with chicken sarcomeric MiCK (2 gaps)]

Generate coordinates for SCRs (structurally conserved regions) and VRs (variable regions) using templates
Web-based:
1. SWISS-MODEL (Glaxo Wellcome)
2. WHAT IF (EMBL)
Programs:
1. MODELLER (free; Unix/Linux; Rockefeller U.)
2. Insight II Homology (Unix; MSI)
SWISS-MODEL: http://www.expasy.org/swissmod
WHAT IF: http://www.cmbi.kun.nl/whatif
MODELLER: http://www.salilab.org/modeller/modeller.html
Insight II: http://www.accelrys.com/insight/homology.html

PDB file: header
[Figure: header records of PDB entry 1CRK, annotated as background information: HEADER TRANSFERASE; TITLE MITOCHONDRIAL CREATINE KINASE; COMPND MOLECULE: CREATINE KINASE, CHAIN: A, B, C, D, EC: 2.7.3.2, BIOLOGICAL_UNIT: OCTAMER; SOURCE ORGANISM_COMMON: CHICKEN, TISSUE: MUSCLE, ORGANELLE: MITOCHONDRIA; KEYWDS TRANSFERASE, CREATINE KINASE; EXPDTA X-RAY DIFFRACTION; AUTHOR K.FRITZ-WOLF, T.SCHNYDER, T.WALLIMANN, W.KABSCH; JRNL TITL STRUCTURE OF MITOCHONDRIAL CREATINE KINASE, REF NATURE V. 381 341 1996]

PDB file: remarks
[Figure: REMARK 3 refinement records of 1CRK, annotated as properties of the structure (resolution, error, etc.): resolution range, data cutoffs, completeness (working + test) 99.6%, cross-validation method, R value (working set) 0.217, free R value 0.234]

PDB file: helix/sheet
[Figure: HELIX and SHEET records of PDB entry 1CRK, which define the secondary structure (α-helices and β-sheets)]
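PDB entries such as 1CRK are fixed-column text: in ATOM records the atom name occupies columns 13–16, the chain identifier column 22, the residue number columns 23–26, and x, y, z columns 31–38, 39–46 and 47–54. The sketch below, with hypothetical names `read_ca_atoms` and `ca_rmsd`, collects Cα coordinates from ATOM records and computes the Cα rmsd used when evaluating models. Assumptions: it ignores HETATM records and insertion codes, keeps the last alternate location it sees, and expects the model and reference to be superposed already; a real evaluation would first fit one onto the other (for example with the Kabsch algorithm).

```python
import math


def read_ca_atoms(pdb_lines, chain=None):
    """Collect C-alpha coordinates from fixed-column PDB ATOM records.

    Returns {(chain_id, residue_number): (x, y, z)}.
    """
    ca = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":          # atom name, columns 13-16
            continue
        chain_id = line[21]                       # chain identifier, column 22
        if chain is not None and chain_id != chain:
            continue
        key = (chain_id, int(line[22:26]))        # residue number, columns 23-26
        ca[key] = (float(line[30:38]),            # x, columns 31-38
                   float(line[38:46]),            # y, columns 39-46
                   float(line[46:54]))            # z, columns 47-54
    return ca


def ca_rmsd(model, reference):
    """RMSD over residues present in both dicts (no superposition is done)."""
    common = sorted(set(model) & set(reference))
    if not common:
        raise ValueError("no residues in common")
    sq = sum(sum((m - r) ** 2 for m, r in zip(model[k], reference[k]))
             for k in common)
    return math.sqrt(sq / len(common))
```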
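The φ and ψ angles that are carried over in SCRs and examined in a Ramachandran plot can be computed from backbone coordinates read out of ATOM records: φ(i) is the torsion C(i−1)–N(i)–Cα(i)–C(i) and ψ(i) is N(i)–Cα(i)–C(i)–N(i+1). A minimal sketch, assuming each residue is given as a dict with hypothetical keys "N", "CA" and "C" holding (x, y, z) tuples, listed in chain order with no breaks; `dihedral` and `phi_psi` are names invented here. The sign follows the usual convention (positive for clockwise rotation when viewed along the central bond).

```python
import math


def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (between -180 and 180) for four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """Return [(phi, psi)] per residue; None where the neighbouring residue is missing."""
    angles = []
    for i, res in enumerate(backbone):
        phi = (dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"])
               if i > 0 else None)
        psi = (dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"])
               if i + 1 < len(backbone) else None)
        angles.append((phi, psi))
    return angles
```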
PDB file: atomic coordinates
[Figure: ATOM records of 1CRK giving the coordinates of each atom (serial number, atom name, residue name, chain, residue number, x, y, z, occupancy, B-factor), beginning with THR 1 and VAL 2 of chain A]

Three general methods for initial model construction
1. Assembly of rigid bodies: the backbone of the target is arranged to the template positions of the Cα carbons.
   RAPPER (Cambridge University) is a discrete conformational sampling algorithm for restraint-based protein modelling. It has been used for all-atom loop modelling, whole-protein modelling under limited C-alpha restraints, comparative modelling, ab initio structure prediction, structure validation, and experimental structure determination with X-ray crystallography and nuclear magnetic resonance spectroscopy. http://raven.bioc.cam.ac.uk
2. Segment matching (coordinate reconstruction): most hexapeptide segments of
protein structure can be clustered into approximately 100 structural classes; a subset of atomic positions in the template is used as guiding positions, and segments from the target are fitted into these positions; this can be used to model main-chain and side-chain atoms.
   SegMod, a component available in UCLA's GeneMine: http://www.bioinformatics.ucla.edu/genemine
3. Satisfaction of spatial restraints: the model is built by minimizing violations of spatial restraints (constraints) defined from the template (homology-derived restraints: main-chain and side-chain dihedrals, main-chain Cα–Cα distances, side-chain–main-chain distances, etc.) and from stereochemistry (bond angles, dihedral angles, non-bonded atom–atom contacts).
   MODELLER:
   A. Sali & T.L. Blundell (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815.
   A. Fiser, R.K. Do & A. Sali (2000) Modeling of loops in protein structures. Protein Science 9, 1753–1773.

The initial model will yield a framework of SCRs consisting of the following:
- the backbone of the target protein is arranged to the α carbons of the template
- secondary structural elements are present
- φ and ψ angles are arranged in SCRs
Initial modeling will yield a framework that lacks two important components: (a) VRs (non-conserved loops) and (b) side chains.
[Figure: alignment of the target Chaetopterus MiCK with the chicken sarcomeric MiCK template]

Loop modeling
- changes in loops usually
occur in exposed regions that connect secondary structure elements
- no reliable methods are available for constructing loops longer than 5 residues
Two general approaches:
1. Ab initio loop prediction: conformational search or enumeration of conformations in a given environment, guided by a scoring or energy function. Also available on the Cambridge University RAVEN site: http://raven.bioc.cam.ac.uk/loop2.php
2. Database approaches: consist of finding a portion of main chain that fits the stem regions of a particular loop. The database of protein folds is searched, and candidate segments are fitted within the constraints of loop sequence, stem structure and energy optimization criteria. Loop database at PKU BIOS, China: http://mdl.ipc.pku.edu.cn/moldes/oldmem/liwz/home/loop.htm

Side chains: defined by the χ1, χ2, χ3 and χ4 dihedral angles
[Figure: a side chain with its χ1–χ4 dihedral angles labeled]
- protein side chains play a key role in molecular recognition and in the packing of the hydrophobic cores of globular proteins
- significant correlations exist between χ1, χ2 and the φ and ψ angles
- side-chain conformations exist in a limited number of canonical shapes (rotamers)
- rotamer libraries can be constructed in which only 3–50 conformations are taken into account for each side chain
- approaches involve iterative sampling of rotamers for each side chain on the target backbone, with scoring matrices and attention paid to steric clashes of the side chain with the main chain

Refinement by molecular mechanics
- restrain the region of the model that is most likely correct and focus on suspect areas, or perform on the entire molecule
- idealize bond geometry and remove unfavorable non-bonded contacts using protein force fields:
  AMBER (Assisted Model Building with Energy Refinement): http://amber.scripps.edu
  CHARMm (Chemistry at HARvard Macromolecular mechanics): http://www.accelrys.com/insight/charmm.html
  GROMOS (molecular dynamics): http://www.igc.ethz.ch/gromos
- this is essentially an energy minimization approach
- during force-field calculations there is a tendency for the structure to drift away from the control structure. This
drift can be reduced by (a) limiting the number of minimization steps and (b) restraining many of the alpha carbons.

Errors in homology models
a. errors in side-chain packing: as proteins diverge, the packing of side chains in the core changes
b. distortions or shifts in correctly aligned regions: sequence divergence may result in main-chain conformation changes even though the overall fold is the same
c. errors in regions without a template: these occur in segments of a target protein for which there are no equivalents in the template
d. errors due to misalignments: the most common error
e. incorrect templates: a problem that occurs when using distantly related templates
[Figure: example of a misalignment (d) between target and template sequences in an α-helical region]

Evaluating models
Explicit approach: overlay the model on the template and evaluate the rmsd (root-mean-square deviation) of the α carbons.
- Experimental rmsd values from X-ray crystallography range from 0.5 Å for the same protein to 1 Å for proteins with > 50% identity.
- A successful model has < 2 Å rmsd from the template.
- If the template has a sequence identity > 60%, the above criterion is met with a success rate > 70%.
- The sequence identity of target and template is critical.
[Figure: model–target and template–target differences plotted against sequence identity (20–100%), separated into alignment-error and structure-overlap contributions; example proteins include EDN and NM23]

Judging quality of homology models
1. ACCURACY: how well the model fits the templates upon which it was built (rmsd)
2. CONFIDENCE FACTOR (model B-factor): SWISS-MODEL provides a B-factor, which is an uncertainty factor; the higher the value, the less actual structural support is present for that portion of the model
3. INCORRECTNESS: hydrophobic groups on the surface, or polar groups in the interior whose hydrogen-bonding or ionic-bonding capabilities are not satisfied by their neighbors; steric clashes; or unreasonable structural parameters such as bond angles/lengths
4. Ramachandran
diagram (plot): shows the main-chain conformational angles. There is a finite number of φ and ψ angle combinations, as defined in the plot. Glycines, which lack side chains, often fall outside the allowed regions; residues other than glycine that fall outside the allowed regions in a model are suspect.
[Figure: the Ramachandran plot, ψ vs. φ (−180° to 180°), with the β-sheet, right-handed α-helix and left-handed α-helix regions labeled]
5. REASONABLENESS: whether the model is in keeping with expectations for similar proteins. It is assessed by summing up the probabilities that each residue should occur in the environment in which it is found in the model. For all PDB models, each of the 20 amino acids has a certain probability of belonging to one of the following classes: solvent-accessible surface, buried polar, exposed non-polar, helix, sheet or turn. Regions in the model that do not fit expectations are suspect.
[Figure: threading energy as a criterion for reasonableness: the higher the energy, the lower the reasonableness]

Detection of errors
1. manual inspection
2. checking stereochemistry: PROCHECK (www.biochem.ucl.ac.uk), WHATCHECK (www.sander.embl-heidelberg.de), SQUID (www.yorvik.york.ac.uk)
3. statistical analyses of structures based on compiled 3D features of many proteins: VERIFY3D (www.doe-mbi.ucla.edu), ProsaII (www.came.sbg.ac.at)

SWISS-MODEL: First Approach Mode
Please fill in these fields:
- Your e-mail address (MUST be correct)
- Your name
- Request title (added to the results)
- Remember my name and e-mail for next login (checkbox)
- Provide a sequence or a SWISS-PROT AC code. Notes: a SWISS-PROT AC code looks like this: P04406. Sequences can be provided in RAW, SWISS-PROT, FASTA or GCG format.
- Send Request or Reset Form
Options: define the lower
BLAST P(N) limit for template selection (field: lower BLAST limit).
Define the templates you wish to use for this request: in some cases you may wish to define a set of template structures to be used for a modelling attempt. As an example, modelling a serpin (serine protease inhibitor) similar to the plasminogen activator inhibitor, antithrombin III, etc. will generally fail, since these proteins have two distinct structures in the database: (1) the activated form of all true serpins and (2) the precursor form as found in the serpin analogue ovalbumin. It is thus best to choose the correct templates you wish to base your model on. To do so, you may use ExPDB templates, your own templates, or a combination of both.
Using ExPDB templates:
1. Search for suitable templates in the ExNRL-3D database.
2. Select the entries you consider appropriate from the hit list and/or check whether their codes exist in the ExPDB database.
3. List up to 5 entries in the window below, separated by spaces.
NOTE: The ExPDB database is derived from the PDB database. Each chain is in a separate file, and the residues have been renumbered continuously. The ExPDB codes are built from the PDB codes as explained here.

Results options
The SWISS-MODEL server will return all results via e-mail:
- Swiss-PdbViewer mode: returns the model and the templates as a Swiss-PdbViewer project file, plus a log file tracing all actions taken by the server
- Normal mode: includes the final model coordinates file in PDB format and a log file tracing all actions taken by the server
- Short mode: returns only the final model coordinates file
- Send the results as plain ASCII mail instead of e-mail attachments
- Include a WhatCheck report of the final model

SWISS-MODEL: Optimise Mode
Please fill in these fields: your e-mail address (MUST be correct), your name (will be added to the results), request title (added to the results header), remember my name and e-mail for next login, and a SWISS-MODEL project file; then send the request.

Alignment statistics (Chaetopterus MiCK vs. chicken sarcomeric MiCK):
Quality: 1415   Length: 419   Ratio: 3.763   Gaps: 2
Percent Similarity: 77.660   Percent Identity: 68.883
[Figure: GAP alignment of correctchaetMiCK.txt (Chaetopterus MiCK) with chickenMiCKsar.pep (chicken sarcomeric MiCK), February 25, 2002, followed by views of the resulting Chaetopterus MiCK model and the chicken sarcomeric MiCK template]

Where do we go from here?
- High-throughput structure determination efforts will never come close to determining the bulk of structures.
- However, a dictionary of most of the possible folds is a realistic goal.
- Given these folds, homology modeling affords great promise in the identification and characterization of structural genes whose products have an unknown function.