Introduction to Bioinformatics
Introduction to Bioinformatics BIOL 473
Cal State Fullerton
This 11-page set of class notes was uploaded by Mozelle Windler on Wednesday, September 30, 2015. The class notes belong to BIOL 473 at California State University - Fullerton, taught by Staff in Fall. Since its upload, it has received 52 views.
Date Created: 09/30/15
Developed by Charlotte Hall, James Kirkpatrick, Cassandra Krupansky, and Julie Pomerleau

Purpose

The purpose of these modules is to introduce the fundamentals of bioinformatics to Master of Arts in Teaching Science (MAT-S) students. Three modules will be presented: an introduction (Introductory Module); basic gene-finding skills using the word processor and a review of basic concepts of cellular biology (Module 1); and operating bioinformatics tools, including BLAST, TACG, and NCBI (Module 2). The goal of the modules is to stimulate the interest of the MAT-S students, so that they become competent with the bioinformatics tools and are able to share this information with their own students.

Introductory Module

The purpose of this module is to stimulate interest in bioinformatics by demonstrating to the learner what can be discovered and applied, using the hemoglobin protein molecule as an example. This will include analyzing the three-dimensional structure of the molecule, giving the ability to see the protein beyond the surface. Evolutionary relationships among various organisms, in terms of their variation in hemoglobin, will also be presented.

Activity 1: What is Hemoglobin?
(http://sickle.bwh.harvard.edu/hemoglobin.html)

Hemoglobin is a protein that is carried by red cells. It picks up oxygen in the lungs and delivers it to the peripheral tissues to maintain the viability of cells. Hemoglobin is made from two similar proteins that "stick together." Both proteins must be present for the hemoglobin to pick up and release oxygen normally. One of the component proteins is called alpha; the other is beta. Before birth, the beta protein is not expressed. A hemoglobin protein found only during fetal development, called gamma, substitutes up until birth.

[Figure: red blood cell with normal hemoglobin (http://biology.fullerton.edu/biol473/sicklecell.htm)]

How is hemoglobin made?

Like all proteins, the "blueprint" for hemoglobin exists in DNA, the material that makes up genes. Normally, an individual has four genes that code for the alpha
protein, or alpha chain. Two other genes code for the beta chain. Two additional genes code for the gamma chain in the fetus. The alpha chain and the beta chain are made in precisely equal amounts, despite the differing number of genes. The protein chains join in developing red blood cells and remain together for the life of the red cell.

How do abnormal hemoglobins arise?

The composition of hemoglobin is the same in all people. The genes that code for hemoglobin are identical throughout the world. Occasionally, however, one of the genes is altered by any of a variety of "accidents" that can occur in nature. These alterations in the genes, called "mutations," are very rare. Because genes are inherited and contain the information needed to make a protein, if a mutation produces an abnormal hemoglobin gene in a person, the gene will be passed on to his or her children. The children will produce a modified hemoglobin identical to that of the parent. Most mutations in hemoglobin produce no problem. Occasionally, however, the alteration in the protein changes aspects of its behavior. The types of disorders that can result include sickle cell disease and thalassemia.

[Figure: sickle red blood cell (http://biology.fullerton.edu/biol473/sicklecell.htm)]

Activity 2: Viewing the Three-Dimensional Structure of the Hemoglobin Molecule
(http://info.bio.cmu.edu/Courses/BiochemMols/BuildBlocks/Hb.html#hbtop)

Note: See the bottom of the webpage if you need to download the Chime program for molecule viewing.

While viewing the molecule, look for the following information:
1. Number of subunits
2. Number of heme groups
3. Metal ion found in each heme group

Be sure to right-click on the molecule in protein-only view to see the following options:
1. Display (wireframe, sticks, etc.)
2. Color (chain vs. amino acid)

You can also control the rotation with the buttons or by left-clicking and dragging with the mouse. What information does this perspective give you about the macromolecular structure of hemoglobin?

Activity 3: Web-based Activity

Using a search engine such as
google.com, conduct a search to find information about the genetic and physiological differences between normal hemoglobin, sickle cell hemoglobin, and thalassemia. Write a brief paragraph on your findings.

Module 1

Introduction

The main goal of Module 1 is to introduce the basic skills and knowledge in molecular biology needed to be functional in bioinformatics. Module 1 consists of four activities:
- Activity 1: Word processing
- Activity 2: Central dogma (basic molecular biology knowledge)
- Activity 3: Reading single-frame (Part 1) and three-frame (Part 2) translations
- Activity 4: Interpreting codon degeneracy

Objectives
1. Use the word processor as a bioinformatics tool
2. Reinforce the central dogma (transcription/translation): DNA -> PROTEIN
3. Compare 1-frame and 3-frame translations to a reference protein
4. Introduce codon degeneracy

Activity 1

The following exercise is based on the article "Computer Applications in Biomolecular Sciences. Part 2: Bioinformatics and Genome Projects," excerpted from Biochemical Education (2000).

Microsoft Word can be an important tool for analyzing and working with the gene files used in bioinformatics. The purpose of this exercise is to become familiar with the "Find" and "Replace" functions, as well as with setting the Word document to the correct font and margins, both of which are important in bioinformatics. To use MS Word as a bioinformatics tool, change the font of sequences to Courier New and the font size to 10. Courier New is a fixed-width font, so every character occupies the same width and sequence lines stay aligned. The margins should also be changed so that the text fits correctly.

If you are already familiar with the Microsoft Word functions for changing font and size, setting page margins, and Find/Replace, skip Activity 1 and go on to Activity 2. For a quick review of how to use these functions (change font type and size, page margins, 'Find' or 'Find and Replace'), click here.

Activity 2

The goal of Activity 2 is to review important concepts of the central dogma. To do this, you will
explore the following web site: http://biology.fullerton.edu/biol473/dogmatranslate.htm

Note: To gain a deeper knowledge of a given concept, utilize the links provided. Once you have become familiar with the concepts of the central dogma, try to answer the following questions.

Activity 3

The goal of Activity 3 is to analyze a single-frame and a three-frame translated nucleotide sequence for the presence of a reference protein.

Reference protein

The purpose of using a reference protein is to provide a known protein sequence from one organism and compare it to a protein sequence from another organism for similarities (homologies). The reference protein provided below is a sequence from the human hemoglobin beta chain. Each amino acid is represented by a single-letter code. See below for the universal genetic code chart.

translation="MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFE
SFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPE
NFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"

Nucleotide sequence to be translated

gcctattggtctattttccc
acccttaggctgctggtggtctacccttggacccagaggttctttgagtcctttggggat
ctgtccactcctgatgctgttatgggcaaccctaaggtgaaggctcatggcaagaaagtg
ctcggtgcctttagtgatggcctggctcacctggacaacctcaagggcacctttgccaca
ctgagtgagctgcactgtgacaagctgcacgtggatcctgagaacttcagggtgagtcta
tgggacccttgatgttttctttccccttcttttctatggttaagttcatgtcataggaag

Part 1: Translated sequence from TACG (single frame)

Note: The first two nucleotides shown, g and c, are the first two nucleotides of the sequence to be translated above; however, they are not part of the translation because the sequence was translated in the third frame. Each amino acid is printed under the first nucleotide of its codon.

  1 gcctattggtctattttcccacccttaggctgctggtggtctacccttggacccagaggt 60
      L  L  V  Y  F  P  T  L  R  L  L  V  V  Y  P  W  T  Q  R  F
 61 tctttgagtcctttggggatctgtccactcctgatgctgttatgggcaaccctaaggtga 120
      F  E  S  F  G  D  L  S  T  P  D  A  V  M  G  N  P  K  V  K
121 aggctcatggcaagaaagtgctcggtgcctttagtgatggcctggctcacctggacaacc 180
      A  H  G  K  K  V  L  G  A  F  S  D  G  L  A  H  L  D  N  L
181 tcaagggcacctttgccacactgagtgagctgcactgtgacaagctgcacgtggatcctg 240
      K  G  T  F  A  T  L  S  E  L  H  C  D  K  L  H  V  D  P  E
241 agaacttcagggtgagtctatgggacccttgatgttttctttccccttcttttctatggt 300
      N  F  R  V  S  L  W  D  P  *  C  F  L  S  P  S  F  L  W  L
301 taagttcatgtcataggaag 360
      S  S  C  H  R  K

Find the asterisk in the translation. What does it signify? What is the corresponding codon?

Do you see any segment of the reference protein in this single-frame translation? If so, underline it or color code it in the reference protein.

Part 2: Translated sequence (three frame)

You now have three translated frames of the same nucleotide sequence, and you need to determine which one of the three frames corresponds to the reference protein using the word processor. Once you find the correct frame containing the amino acid sequence from the reference protein, color code it in the translated sequence using the text color function of your word processor.

Note: The three frames of translation allow the researcher to obtain all possible translated combinations in one direction (from 5' to 3').

Steps for the search-and-find function using the word processor:
1. Use the Edit menu and highlight the Find function.
2. Identify a piece of the reference protein, about three amino acids long, from the middle of the sequence and place it in the Find box. Make sure to put two spaces between each letter to correspond with the translation output.
3. Hit the Find Next key and the word processor will locate the corresponding sequence in the translation.

Steps for color coding using the word processor:
1. Highlight the appropriate sequence in the translation, then
2. From the menu bar, click on the letter tab (A) and choose your color, OR
3. From the Format menu, click the Font tab, and then from that menu click Font Color and choose your color.

In the output below, the rows labeled 1, 2, and 3 under each nucleotide line are the three reading frames. An asterisk marks a stop codon.

  1 gcctattggtctattttcccacccttaggctgctggtggtctacccttggacccagaggt 60
  1 A  Y  W  S  I  F  P  P  L  G  C  W  W  S  T  L  G  P  R  G
  2  P  I  G  L  F  S  H  P  *  A  A  G  G  L  P  L  D  P  E  V
  3   L  L  V  Y  F  P  T  L  R  L  L  V  V  Y  P  W  T  Q  R  F
 61 tctttgagtcctttggggatctgtccactcctgatgctgttatgggcaaccctaaggtga 120
  1 S  L  S  P  L  G  I  C  P  L  L  M  L  L  W  A  T  L  R  *
  2  L  *  V  L  W  G  S  V  H  S  *  C  C  Y  G  Q  P  *  G  E
  3   F  E  S  F  G  D  L  S  T  P  D  A  V  M  G  N  P  K  V  K
121 aggctcatggcaagaaagtgctcggtgcctttagtgatggcctggctcacctggacaacc 180
  1 R  L  M  A  R  K  C  S  V  P  L  V  M  A  W  L  T  W  T  T
  2  G  S  W  Q  E  S  A  R  C  L  *  *  W  P  G  S  P  G  Q  P
  3   A  H  G  K  K  V  L  G  A  F  S  D  G  L  A  H  L  D  N  L
181 tcaagggcacctttgccacactgagtgagctgcactgtgacaagctgcacgtggatcctg 240
  1 S  R  A  P  L  P  H  *  V  S  C  T  V  T  S  C  T  W  I  L
  2  Q  G  H  L  C  H  T  E  *  A  A  L  *  Q  A  A  R  G  S  *
  3   K  G  T  F  A  T  L  S  E  L  H  C  D  K  L  H  V  D  P  E
241 agaacttcagggtgagtctatgggacccttgatgttttctttccccttcttttctatggt 300
  1 R  T  S  G  *  V  Y  G  T  L  D  V  F  F  P  L  L  F  Y  G
  2  E  L  Q  G  E  S  M  G  P  L  M  F  S  F  P  F  F  S  M  V
  3   N  F  R  V  S  L  W  D  P  *  C  F  L  S  P  S  F  L  W  L
301 taagttcatgtcataggaag 360
  1 *  V  H  V  I  G
  2  K  F  M  S  *  E
  3   S  S  C  H  R  K

Activity 4

The goal of Activity 4 is to become familiar with the concept of codon degeneracy.

Codon degeneracy

Universal Genetic Code: Translation & Degeneracy

Single-letter code   3-letter abbrev.   RNA codons (5' to 3')
A                    Ala                GCU GCC GCA GCG
R                    Arg                CGU CGC CGA CGG AGA AGG
N                    Asn                AAU AAC
D                    Asp                GAU GAC
C                    Cys                UGU UGC
Q                    Gln                CAA CAG
E                    Glu                GAA GAG
G                    Gly                GGU GGC GGA GGG
H                    His                CAU CAC
I                    Ile                AUU AUC AUA
L                    Leu                CUU CUC CUA CUG UUA UUG
K                    Lys                AAA AAG
M                    Met                AUG
F                    Phe                UUU UUC
P                    Pro                CCU CCC CCA CCG
S                    Ser                UCU UCC UCA UCG AGU AGC
T                    Thr                ACU ACC ACA ACG
W                    Trp                UGG
Y                    Tyr                UAU UAC
V                    Val                GUU GUC GUA GUG

Termination signals (STOP): UAA, UAG, UGA

Key for the above table: amino acids specified by each codon sequence on mRNA.

Questions

When examining the codon chart, you will find that many amino acids have more than one coding option. This concept is called degeneracy: more than one coding option exists for a given amino acid.

1. What amino acid is represented by V? Using the one-frame translation, give the genomic codon option and frequency for each V.
2. Are any possible codons for V not found in this sequence?

Summary

Now that you have completed Module 1 and the introductory activity, you should have a solid grasp of what bioinformatics is about and, more importantly, have started to interpret and use key tools, such as a word processor, for finding genetic sequences. You have also developed vital skills that will be helpful in searching for important sequences and analyzing genetic relationships.

Module 2: Bioinformatics for MAT-S Students

Objectives
1. Analyzing features in a GenBank record
2. Correlating GenBank information
about introns and exons to a graphical representation of the human beta globin gene

Begin by linking to the nucleotide sequence record for the human hemoglobin beta gene region (GenBank database entry). This is a particularly well-studied gene; thus the database record is unusually lengthy, with plenty of information useful to researchers. Acquaint yourself with the parts of the GenBank database record for a nucleotide sequence by answering the following questions. The questions are edited from those presented at the Darwin2000 website (http://www.rickhershberger.com/darwin2000, Rick Hershberger).

Activity 1

1. Fill in the information contained in the following parts of the human hemoglobin beta gene record:
   - Locus, Definition
   - Accession and NID
   - Organism
   - Comments
     Answer the following question: How many proteins are encoded within this region of human chromosomal DNA? How are they different from each other?
   - Features
     - CDS (coding sequence): exon sequences joined to form the protein code
     - translation: resulting amino acid sequence
     - mutation
     Answer the following questions:
     a. How many different mutations exist in the beta globin gene that result in thalassemia?
     b. What numerical nucleotide positions are affected? What are the resulting base changes?

2. a. Scroll down through the features table until you reach the gene entry at nucleotide 62137. This is the region of the genome that encodes the beta chain of hemoglobin. Record the nucleotide numbers that correspond to the gene: 62137 to ____
   b. There are three segments that correspond to the mRNA for this gene. Record the nucleotide numbers in the join statement, which denotes the regions that have been joined together: join(____)
   c. Scroll down to the next CDS entry (note: use the CDS for beta-globin, not beta-globin thalassemia) and record the nucleotide positions found in the join sequence: join(62187..____)

   Note: There is a difference of fifty nucleotides between the gene and mRNA entries (62137) and the CDS entry (62187). Using Figure 1 below, explain the significance of this difference.

   [Figure 1: Human chromosome region containing the hemoglobin genes. Schematic of the relationships between the five members of the globin gene family and a pseudogene, with a depiction of the exon/intron structure and relative size of the exons of the beta-globin gene; the 5' untranslated and 3' untranslated regions are labeled.]

   d. Using the diagram above, explain the difference observed between the end nucleotide in the gene and mRNA versus the CDS. Use your CDS join segments from c above to identify the intron and exon sequences. The first and last nucleotides of each exon are joined by the dots, and the introns are represented by the nucleotides missing where the commas are.

3. Based on the features table, what should be the sequence of nucleotides beginning at position 62187? What nucleotides should be just before position 63610? Check your answer using the full nucleotide sequence found at the bottom of the record.

4. Nucleotide 62285 is not present in the CDS. Why?

Activity 2

In Activity 2 you will retrieve a nucleotide sequence from Entrez and use BLAST (Basic Local Alignment Search Tool) to find closely related sequences. There are several types of BLAST analysis options, as listed below:
- blastn: for nucleotide-nucleotide comparisons
- blastp: for protein-protein comparisons
- tblastn: compares the protein "Sequence 1" against the nucleotide "Sequence 2," which has been translated in all six reading frames
- blastx: compares the nucleotide "Sequence 1" against the protein "Sequence 2"
- tblastx: compares nucleotide "Sequence 1," translated in all six reading frames, against the nucleotide "Sequence 2," translated in all six reading frames

To learn about the BLAST program, use the BLAST manual.

Once you become familiar with BLAST and the GenBank record of human hemoglobin, use them to solve the following problem, taken from http://biology.fullerton.edu/biol473/week5.htm

1. Obtain the nucleotide record for human hemoglobin found on chromosome 11. Get it from Entrez using the
abbreviation "HUMHBB".
2. Copy the nucleotide sequence for G-gamma globin into a Word document.
3. Use blastn to obtain the top 5 nonhuman hits that are genetically similar to human G-gamma globin.

Works Cited

Hershberger, Rick. Darwin2000 website, Module 1. http://www.rickhershberger.com/darwin2000

The Central Dogma of Biology. http://biology.fullerton.edu/biol473/dogmatranslate.htm

Sansom, C. E. and Smith, C. A. (2000). Computer applications in biomolecular sciences. Part 2: bioinformatics and genome projects. Biochemical Education 28, 127-131.
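
Supplement: three-frame translation in code

The word-processor search in Module 1 (Activity 3) and the valine codon count (Activity 4) can also be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the original modules and not the TACG program: the function names codons, translate, find_frames, and codon_usage are hypothetical, the codon table is the standard genetic code written with DNA letters (t instead of U), and DNA is the nucleotide sequence given in Activity 3.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in the conventional t, c, a, g codon order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotide sequence from Module 1, Activity 3 (320 nucleotides).
DNA = (
    "gcctattggtctattttccc"
    "acccttaggctgctggtggtctacccttggacccagaggttctttgagtcctttggggat"
    "ctgtccactcctgatgctgttatgggcaaccctaaggtgaaggctcatggcaagaaagtg"
    "ctcggtgcctttagtgatggcctggctcacctggacaacctcaagggcacctttgccaca"
    "ctgagtgagctgcactgtgacaagctgcacgtggatcctgagaacttcagggtgagtcta"
    "tgggacccttgatgttttctttccccttcttttctatggttaagttcatgtcataggaag"
)


def codons(dna, frame):
    """Split dna into complete codons, reading in frame 1, 2, or 3."""
    start = frame - 1
    return [dna[i:i + 3] for i in range(start, len(dna) - 2, 3)]


def translate(dna, frame):
    """Translate one forward frame; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(c, "X") for c in codons(dna, frame))


def find_frames(dna, fragment):
    """Return the forward frames whose translation contains fragment."""
    return [f for f in (1, 2, 3) if fragment in translate(dna, f)]


def codon_usage(dna, frame, amino_acid):
    """Count which codons encode amino_acid in the given frame."""
    return Counter(c for c in codons(dna, frame)
                   if CODON_TABLE.get(c) == amino_acid)


if __name__ == "__main__":
    for frame in (1, 2, 3):
        print(frame, translate(DNA, frame))
    # A short piece taken from the middle of the reference protein.
    print(find_frames(DNA, "HGKKVLGAF"))
    print(codon_usage(DNA, 3, "V"))
```

Unlike the word-processor search, the fragment here is written without spaces between letters, because the translation is stored as a plain string instead of TACG's spaced output.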