Comp Structural Bioinfo
These 13 pages of class notes were uploaded by Ashleigh Dare on Tuesday, September 8, 2015. They belong to ECS 129 at the University of California, Davis, taught by Staff in Fall.
Date Created: 09/08/15
Name:

ECS 129 Structural Bioinformatics
Final, December 10, 2008

1. The final exam is open book, open notes.
2. You have 2 hours, no more. I will strictly enforce this.
3. The final is divided into three parts and graded over 110 points, with 10 additional possible extra-credit points at the end of Part II.
4. You can answer directly on these sheets (preferred) or on loose paper.
5. Please write your name at the top right of each page you turn in.
6. Please check your work. If possible, show your work when multiple steps are involved.
7. I will send you an email with your overall grade (homework, midterm, final). Answers to the questions in the final will be available on the class web page on Thursday.

Part I (12 questions, 5 points each, total 60 points)

Most of these questions are multiple choice. In each case, find the most plausible answer.

1. The current estimate for the number of human genes is 30,000. Assuming that a gene contains on average 1,000 nucleotides, genes occupy 30 million bases out of the estimated 3.2 billion total nucleotides of the genome. The remaining nucleotides are:
   A. alien DNA that currently remains dormant
   B. junk DNA
   C. the real genetic information, as genes are only backup storage which is only used in case of emergency
   D. a combination of control regions for genes, RNA-coding regions, and regions whose purpose is still not known

2. The figure below shows a nonstandard nucleotide base pair. Identify it. (Note that dX indicates a deoxyribonucleotide, as contained in a DNA molecule, while rX refers to a ribonucleotide, as found in an RNA molecule.)
   [Figure: nonstandard base pair]
   A. dA-dC
   B. rA-rC
   C. rA-dT
   D. rA-dC

3. The figure below shows a small peptide of six amino acids. Give its sequence. (Hint: there is one charged amino acid at physiological pH, around pH 7, i.e., from pH 5.5 to pH 8.0.)
   [Figure: six-residue peptide]
   A. AWEFGF
   B. AWDFPY
   C. AYDYPW
   D. AWNFPY

4. You want to design a small peptide that can interact with the major groove of a double-stranded DNA molecule. Your constraints are: the peptide should be helical (at least predicted to be mostly helical based on Chou and Fasman; see Appendix D), it should contain 12 residues, and it should be charged at physiological pH (pH 5.5 to pH 8). Which of the following peptides would be the best candidate?
   A. MPGCLWQALGLP
   B. MPGLEWQLEGLP
   C. MPGYTWTTVGST
   D. MPGLEDELEDG

5. Which of the following statements is probably false?
   A. RNA can impair gene expression.
   B. A protein may exist in different conformations inside a cell.
   C. All proteins start with a methionine.
   D. Many bacteria on Earth have their own genetic code that differs from the genetic code we know.

6. Peptide Nucleic Acids, or PNAs, are synthetic oligomers with a protein-like backbone on which bases (purines and pyrimidines) are linked to every second backbone nitrogen (N). Unlike DNA, PNAs do not contain sugars or phosphate groups. PNAs are written like proteins, from the N-terminus (Nter) to the C-terminus (Cter). Find the sequence, from Nter to Cter, of the PNA shown below.
   [Figure: PNA oligomer]
   A. Nter-TACGTA-Cter
   B. Nter-CGTACG-Cter
   C. Nter-CATGCA-Cter
   D. Nter-TGCATG-Cter

7. One of these statements about protein structure prediction is incorrect:
   A. Homology modeling predicts the structure of a target protein using the structure of a homologous protein, the template.
   B. Chou and Fasman developed a method for secondary structure prediction.
   C. Folding@home is a program that predicts the structure of proteins using distributed computing.
   D. Ab initio methods predict the structure of a protein by interpolation from the structure of its messenger RNA.

8. We want to find the best alignments between the two DNA sequences AATGTC and AGCTC. The scoring scheme S is defined as follows: S(i,i) = 10; S(i,j) = 5 if i and j are both purines or both pyrimidines; and S(i,j) = 0 if i is a purine and j is a pyrimidine, or if i is a pyrimidine and j is a purine. There are no gap penalties. The score Sbest and the number N of optimal alignments are (show your final dynamic programming matrix for full credit):
   A. Sbest = 40, N = 1
   B. Sbest = 45, N = 2
   C. Sbest = 45, N = 1
   D. Sbest = 40, N = 3

9. The same small RNA molecule with a specific catalytic activity has been found in 4 different species. The four corresponding sequences are different, but it was possible to align them without gaps. This alignment is given below. Which of the two structural models shown below fits best with this alignment? (Hint: only look for perfect co-variation.)
   Sequence alignment:
   CCCGCCCCUUUUCCGAGGGUCAUCGGAACCA
   CCGUUCGCUUGCGAUCCGAACAGAUCGCCCU
   GCCUGAGCUUCGGGACCUCACUGUCCCGCCA
   CCCUAUACUUAAUAGUUAUACAACAAUUCCA
   [Figure: structural models A and B]

10. The parenthesis diagram for model B above (question 9) is:
   CCCGCCCCUUUUCCGAGGGUCAUCGGAACCA
   [Bracket diagram not preserved in this copy]
   What can you say about this representation?
   A. We had to use two types of symbols for representing the base pairs: the structure contains non-nested base pairs.
   B. This is a typo, and all brackets should be replaced with parentheses.
   C. [Partly illegible in this copy] ... consecutive nucleotides (the A in CGA and the first G in GGG).
   D. This parenthesis representation corresponds to model A above and not to model B.

11. The so-called Rosetta stone for predicting protein-protein interactions is:
   A. Gene fusion
   B. Gene co-expression
   C. Presence of the names of the two proteins concerned in the same scientific paper
   D. A very old stone recently found in Gizeh, Egypt, next to the Sphinx, that describes the code for protein-protein interactions in three languages: hieroglyphic, demotic, and Greek

12. Only one of these statements is correct:
   A. Dynamic programming is used to predict tertiary contacts in RNA, such as kissing hairpins and pseudoknots.
   B. BLAST aligns protein sequences using the Smith and Waterman dynamic programming approach.
   C. Most RNA molecules are single stranded, while DNA is usually double stranded.
   D. All genes have at least one, but fewer than ten, introns.

Part II (one problem, 4 questions, 10 points each, total 40 points)

The following eukaryotic DNA sequence was given to you:
5' CCCTTAATGCGTATCGCTCACGAGATGTTGGGCGGCTAA 3'
You are told that this sequence or its complementary strand codes for one gene.

1. Find the longest gene, or open reading frame (ORF), corresponding to this DNA sequence (remember that there are 6 possibilities, i.e., 3 possible reading frames for one strand and 3 possible reading frames for its complementary strand). Transcribe this ORF into an RNA sequence.
2. As this is a eukaryotic sequence, the gene you have identified may contain an intron. For simplicity, we will assume that introns always start with GU and end with AG in the RNA sequence. Identify all possible introns and explain why their removal would result in the loss of the gene.
3. Based on question 2 just above, we know that the RNA is not spliced. Find the sequence of the protein it encodes.
4. Predict the secondary structure of this protein using the Chou and Fasman method with the propensities given in Appendix D.
5. Extra credit (10 points): Can you find a single mutation, at the DNA level, of that gene that will modify the corresponding protein such that it will be predicted to be fully extended, i.e., predicted to be a strand by Chou and Fasman? Note: there are several possible answers.

Part III (10 points)

Peptide Nucleic Acids (PNAs; see question 6, Part I above) attract a lot of attention (their inventors have already deposited twenty patents) because of two main properties: (1) they are not degraded by nucleases (proteins that degrade nucleic acids) or proteases (proteins that degrade other proteins); (2) they can bind to DNA and RNA. Propose different applications of PNAs, both for fundamental research and for medical research.

Appendix A: Amino Acids
[Figure: atom-labeled amino acid structures in three panels: Hydrophobic Amino Acids (including Gly G, Ala A, Leu L, Val V, Pro P, Phe F, Met M), Polar Amino Acids 1 (Ser S, Thr T, His H, Asn N, Gln Q), Polar Amino Acids 2 (including Glu E, Cys C, Asp D)]

Appendix B: Nucleotides
[Figure: chemical structures of the nucleotide bases]

Appendix C: Genetic Code
(rows: first base; columns: second base; last column: third base)

First   Second base                    Third
base    U      C      A      G         base
U       Phe    Ser    Tyr    Cys       U
        Phe    Ser    Tyr    Cys       C
        Leu    Ser    STOP   STOP      A
        Leu    Ser    STOP   Trp       G
C       Leu    Pro    His    Arg       U
        Leu    Pro    His    Arg       C
        Leu    Pro    Gln    Arg       A
        Leu    Pro    Gln    Arg       G
A       Ile    Thr    Asn    Ser       U
        Ile    Thr    Asn    Ser       C
        Ile    Thr    Lys    Arg       A
        Met    Thr    Lys    Arg       G    (AUG = Met/Start)
G       Val    Ala    Asp    Gly       U
        Val    Ala    Asp    Gly       C
        Val    Ala    Glu    Gly       A
        Val    Ala    Glu    Gly       G

Appendix D: Chou and Fasman Propensities
[Table not preserved in this copy]

Solutions to Part I and Part II of the final

Part I

Question 1: D. Option A was appealing, but the correct answer is D.

Question 2: C (rA-dT). Look at the sugars to tell ribose from deoxyribose; the pyrimidine is thymine, as it contains a CH3 group.

Question 3: C. C and D both match the figure, and the only ambiguity is for amino acid 2: since the peptide contains a charged residue at physiological pH, it must be E at position 2 and not Q.

Question 4: B. C is predicted to be in an extended strand configuration. A, B and D are predicted as helices, but D only contains 11 residues, and A is mostly hydrophobic and does not contain charged residues.

Question 5: D. The genetic code is essentially universal, i.e., it applies to all forms of life on Earth with only a few minor variants, so it is false that many bacteria have their own genetic code.

Question 6: B.

Question 7: D. There is no known correlation between the structure of a protein and the structure of its corresponding RNA.

Question 8: D (Sbest = 40, N = 3). The final dynamic programming matrix (rows: AATGTC; columns: AGCTC) is:

        -    A    G    C    T    C
   -    0    0    0    0    0    0
   A    0   10   10   10   10   10
   A    0   10   15   15   15   15
   T    0   10   15   20   25   25
   G    0   10   20   20   25   25
   T    0   10   20   25   30   30
   C    0   10   20   30   30   40

In the original matrix figure (not preserved here), one traceback was shown in red and the two other options with arrows. The three optimal alignments are:

   AATG-TC      AATG-TC      AATGTC
   A--GCTC      -A-GCTC      AGC-TC

Question 9: The correct structural model is B. In the original solution, the columns of the alignment that co-vary are labeled with letters and numbers (two columns with the same label co-vary). Reading 5' to 3', the labels appear in the order 1 2 3, a b c d e f, 3 2 1, f e d c b a, so the two stems interleave. [The annotated alignment is not fully preserved in this copy.]

Question 10: A. Model B is a pseudoknot, which cannot be represented with parentheses only: we need both parentheses and brackets.

Question 11: A. The existence of gene fusion is considered the Rosetta stone for understanding protein-protein interactions. It is definitely not a stone found in Egypt (though that would be nice).

Question 12: C. Most RNA molecules are indeed single stranded. RNA secondary structure prediction using dynamic programming only detects secondary structures and cannot predict pseudoknots. BLAST is a heuristic approach that does not perform a full Smith-Waterman dynamic programming alignment. Bacterial genes generally do not have introns.

Part II

The following eukaryotic DNA sequence was given to you:
5' CCCTTAATGCGTATCGCTCACGAGATGTTGGGCGGCTAA 3'

Question 1: We don't know whether the given sequence corresponds to the coding strand, so we need to check both this sequence (S) and its complementary strand (C):
C: 5' TTAGCCGCCCAACATCTCGTGAGCGATACGCATTAAGGG 3'
The complementary strand C does not contain any ATG start codon. The initial sequence S contains two ATGs in phase and one TAA stop codon in phase with both ATGs. Consequently, the longest ORF goes from the first ATG to TAA:
5' ATG CGT ATC GCT CAC GAG ATG TTG GGC GGC TAA 3'
The corresponding RNA sequence is:
5' AUG CGU AUC GCU CAC GAG AUG UUG GGC GGC UAA 3'

Question 2: There are two GU and one AG in the RNA sequence:
5' AUG CGU AUC GCU CAC GAG AUG UUG GGC GGC UAA 3'
Based on the positions of these markers, there is only one putative intron, since the second GU has no AG downstream of it: GU AUC GCU CAC GAG (14 nucleotides). If we remove this putative intron from the RNA sequence, we get:
5' AUG CAU GUU GGG CGG CUA A 3'
Because 14 is not a multiple of 3, the removal shifts the reading frame: the start and stop codons are no longer in phase, and we have lost the gene. Therefore this is not an intron, and the RNA sequence remains intact.

Question 3: The protein sequence is obtained directly using the genetic code:
Nter - Met Arg Ile Ala His Glu Met Leu Gly Gly - Cter

Question 4: To predict the secondary structure of this peptide, we use the Chou and Fasman propensities:

         M     R     I     A     H     E     M     L     G     G
   Pα  1.47  0.96  0.97  1.29  1.22  1.44  1.47  1.30  0.56  0.56
   Pβ  0.97  0.99  1.45  0.90  1.08  0.75  0.97  1.02  0.92  0.92

Helix: nucleation sequence IAHEML; extension: add R and M on the Nter side and the first G on the Cter side. The average over the first 9 residues is <Pα> = 10.68 / 9 = 1.19 > 1.03, so the first 9 residues are predicted to be helical.
Strand: no nucleation site.
The prediction is therefore HHHHHHHHHC (H = helix, C = coil).
Note: if you use the nucleation sequence HEMLGG to predict the helical content, you find that the whole peptide is helical. This is a problem of the Chou and Fasman scheme; I counted both answers as correct.

Question 5: Both Glu and Met are strong helix stabilizers according to the Chou and Fasman propensities. If we replace Glu with a strong strand stabilizer, we create a strand nucleation site that can be extended up to the Nter Met and to the first Gly on the Cter side. Glu can be mutated to Val with a single mutation (GAG → GTG in the DNA, i.e., GAG → GUG in the RNA), and Val has a small Pα (0.91) and a strong Pβ (1.47). With the new sequence MRIAHVMLGG, the propensities are:

         M     R     I     A     H     V     M     L     G     G
   Pα  1.47  0.96  0.97  1.29  1.22  0.91  1.47  1.30  0.56  0.56
   Pβ  0.97  0.99  1.45  0.90  1.08  1.49  0.97  1.02  0.92  0.92
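The six-frame ORF search of Part II, Question 1, together with the translation of Question 3, can be checked with a short Python sketch. This illustration is not part of the original exam or its solutions. The helper names (`reverse_complement`, `find_orfs`, `translate`) are hypothetical. The codon table is the standard genetic code of Appendix C in one-letter codes, and an ORF is assumed to run from an ATG to the first in-frame stop codon.

```python
# Sketch (hypothetical helpers): six-frame ORF search, transcription and translation.
# Assumes the standard genetic code (Appendix C); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna: str) -> str:
    """Complementary strand, written 5' to 3'."""
    return dna[::-1].translate(str.maketrans("ACGT", "TGCA"))


def find_orfs(dna: str) -> list[tuple[str, int, str]]:
    """Return (strand, frame, orf) for every ATG ... first in-frame stop."""
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for start in range(frame, len(seq) - 2, 3):
                if seq[start:start + 3] != "ATG":
                    continue
                for end in range(start, len(seq) - 2, 3):
                    if seq[end:end + 3] in STOP_CODONS:
                        orfs.append((strand, frame, seq[start:end + 3]))
                        break
    return orfs


def translate(orf: str) -> str:
    """Translate codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "CCCTTAATGCGTATCGCTCACGAGATGTTGGGCGGCTAA"
orfs = find_orfs(dna)
strand, frame, longest = max(orfs, key=lambda orf: len(orf[2]))
rna = longest.replace("T", "U")  # mRNA has the coding-strand sequence with U for T
protein = translate(longest)
print(strand, frame, longest)
print(rna)
print(protein)
```

Each in-frame ATG opens its own ORF, so the sketch also finds the shorter ORF that starts at the second ATG. Keeping the longest ORF reproduces the answer to Question 1.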
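The intron reasoning of Part II, Question 2 can be sketched the same way, using the exam's simplifying assumption that an intron is any GU ... AG segment of the mRNA. The helper names are hypothetical. "Intact" here is taken to mean a whole number of codons running from AUG to a single terminal stop codon.

```python
# Sketch (hypothetical helpers): list every GU...AG segment as a candidate intron
# and check whether splicing it out keeps an intact AUG ... stop reading frame.
RNA_STOPS = {"UAA", "UAG", "UGA"}


def candidate_introns(rna: str) -> list[tuple[int, int]]:
    """0-based, end-exclusive (start, end) slices that begin with GU and end with AG."""
    donors = [i for i in range(len(rna) - 1) if rna[i:i + 2] == "GU"]
    acceptors = [j for j in range(len(rna) - 1) if rna[j:j + 2] == "AG"]
    return [(i, j + 2) for i in donors for j in acceptors if j >= i + 2]


def is_intact_orf(rna: str) -> bool:
    """True if rna is AUG ... stop, a whole number of codons, with no premature stop."""
    if len(rna) % 3 != 0 or not rna.startswith("AUG"):
        return False
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    return codons[-1] in RNA_STOPS and not any(c in RNA_STOPS for c in codons[:-1])


mrna = "AUGCGUAUCGCUCACGAGAUGUUGGGCGGCUAA"
for start, end in candidate_introns(mrna):
    spliced = mrna[:start] + mrna[end:]
    print(mrna[start:end], end - start, spliced, is_intact_orf(spliced))
```

The only candidate removes 14 nucleotides. Since 14 is not a multiple of 3, the spliced sequence fails the check, while the unspliced mRNA passes.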
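The Chou and Fasman steps of Part II, Questions 4 and 5 depend on two nucleation tests and a window average. The sketch below reproduces only those tests. It does not implement the full extension and overlap rules of the method. Propensity values come from the solution tables in this document, with Pβ(V) = 1.49 as given in the table. The thresholds (a helix nucleus is 4 of 6 residues with Pα > 1.03; a strand nucleus is 3 of 5 residues with Pβ > 1.00) follow the common textbook simplification. They are an assumption here, not values read from Appendix D.

```python
# Sketch: nucleation tests and window averages for the Chou-Fasman answers.
# Propensities copied from the solution tables; thresholds are an assumed
# simplification: helix = 4 of 6 with P_alpha > 1.03, strand = 3 of 5 with P_beta > 1.00.
P_ALPHA = {"M": 1.47, "R": 0.96, "I": 0.97, "A": 1.29, "H": 1.22,
           "E": 1.44, "L": 1.30, "G": 0.56, "V": 0.91}
P_BETA = {"M": 0.97, "R": 0.99, "I": 1.45, "A": 0.90, "H": 1.08,
          "E": 0.75, "L": 1.02, "G": 0.92, "V": 1.49}


def nucleation_sites(seq, table, window, needed, threshold):
    """0-based window starts with at least `needed` residues above `threshold`."""
    return [s for s in range(len(seq) - window + 1)
            if sum(table[aa] > threshold for aa in seq[s:s + window]) >= needed]


def mean_propensity(seq, table):
    """Average propensity of a segment."""
    return sum(table[aa] for aa in seq) / len(seq)


wild_type = "MRIAHEMLGG"
mutant = "MRIAHVMLGG"  # Glu6 -> Val (GAG -> GTG in the DNA)
helix_sites = nucleation_sites(wild_type, P_ALPHA, 6, 4, 1.03)
strand_sites_wt = nucleation_sites(wild_type, P_BETA, 5, 3, 1.00)
strand_sites_mut = nucleation_sites(mutant, P_BETA, 5, 3, 1.00)
p_alpha_9 = mean_propensity(wild_type[:9], P_ALPHA)
p_beta_9 = mean_propensity(wild_type[:9], P_BETA)
print(helix_sites, strand_sites_wt, strand_sites_mut)
print(round(p_alpha_9, 2), round(p_beta_9, 2))
```

With these thresholds, every six-residue window of MRIAHEMLGG can nucleate a helix. This explains why the solution to Question 4 accepts both IAHEML and HEMLGG as nucleation sites. Only the Glu → Val mutant contains strand nucleation windows.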
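A minimal Needleman-Wunsch sketch can generate the dynamic programming matrix for Part I, Question 8, using the question's scoring scheme and a gap penalty of 0. In addition to the best score, it counts optimal traceback paths. For each cell, it adds up the counts of every predecessor that ties for the maximum. The function and variable names are hypothetical.

```python
# Sketch: Needleman-Wunsch global alignment with zero gap penalty,
# plus a count of optimal traceback paths.
PURINES = set("AG")


def score(x: str, y: str) -> int:
    if x == y:
        return 10
    if (x in PURINES) == (y in PURINES):  # both purines or both pyrimidines
        return 5
    return 0


def align(s: str, t: str, gap: int = 0):
    n, m = len(s), len(t)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(s[i - 1], t[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # count[i][j] = number of optimal paths from (0, 0) to (i, j)
    count = [[0] * (m + 1) for _ in range(n + 1)]
    count[0][0] = 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
                count[i][j] += count[i - 1][j - 1]
            if i > 0 and F[i][j] == F[i - 1][j] + gap:
                count[i][j] += count[i - 1][j]
            if j > 0 and F[i][j] == F[i][j - 1] + gap:
                count[i][j] += count[i][j - 1]
    return F, F[n][m], count[n][m]


matrix, best, n_optimal = align("AATGTC", "AGCTC")
for row in matrix:
    print(row)
print("Sbest =", best, "N =", n_optimal)
```

Because the gap penalty is zero and no score is negative, the global score equals the best local score. Choosing global or local alignment therefore does not change Sbest for this question.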
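The co-variation hint of Part I, Question 9 can be turned into a brute-force scan over column pairs. The sketch defines "perfect co-variation" as follows: all four sequences carry a Watson-Crick pair (A-U, U-A, G-C or C-G; G-U wobble is not counted), and at least two different pair types occur across the sequences. That definition and the minimum hairpin loop of 3 are both assumptions. The second sequence comes from the solution's copy of the alignment, which has all 31 nucleotides.

```python
# Sketch: find column pairs that co-vary perfectly in a gapless RNA alignment.
# Assumption: Watson-Crick pairs only (no G-U wobble), pair type must vary.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

alignment = [
    "CCCGCCCCUUUUCCGAGGGUCAUCGGAACCA",
    "CCGUUCGCUUGCGAUCCGAACAGAUCGCCCU",
    "GCCUGAGCUUCGGGACCUCACUGUCCCGCCA",
    "CCCUAUACUUAAUAGUUAUACAACAAUUCCA",
]


def covarying_pairs(seqs, min_loop=3):
    """1-based (i, j) column pairs with j - i > min_loop that co-vary perfectly."""
    length = len(seqs[0])
    pairs = []
    for i in range(length):
        for j in range(i + min_loop + 1, length):
            column_pairs = [(seq[i], seq[j]) for seq in seqs]
            if all(p in WATSON_CRICK for p in column_pairs) and len(set(column_pairs)) > 1:
                pairs.append((i + 1, j + 1))
    return pairs


pairs = covarying_pairs(alignment)
print(pairs)
```

Among the reported pairs are columns 5-7 with 19-17 and columns 11-16 with 28-23. The exception is 14/25, which is rejected because the fourth sequence has A-A at those positions. Since 5 < 11 < 19 < 28, the two stems cross, giving the pseudoknot of model B.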