Tuesday Week 4, Tuesday Week 5 Notes
This 7 page Class Notes was uploaded by Anastassia Erudaitius on Wednesday April 27, 2016. The Class Notes belongs to Biol 119 at University of California Riverside taught by Dr. Hayashi and Dr. Stajich in Spring 2016. Since its upload, it has received 14 views. For similar materials see Genomics and Bioinformatics in Biology at University of California Riverside.
Date Created: 04/27/16
Genomics Lecture 4/19/16

• Suzie the gorilla – long-read sequencing technology
• Nanopore sequencing
  o You know the sequence of the hairpin, so once you see that sequence you know it is the beginning of the complementary sequence
  o The nanopore doesn't read the nucleotide sequence the way the other DNA sequencing technologies do; it just records the differences in electrical disruption
• Comparison with known sequence databases
  o There might be a lot of single-gene studies, so we would want to look at these databases
  o GenBank – NCBI
  o International collaboration is key to moving genomics forward
  o GenBank is accessible to anyone in the world with an internet connection
• Searching for sequence similarity
  o We don't have to write the code to search the database (existing algorithms are used)
  o BLAST – finds regions of similarity between biological sequences (protein or nucleotide sequences)
  o BLAST can be used to infer functional and evolutionary relationships between sequences
  o BLAST gives you a functional annotation
  o blastn – you have a nucleotide sequence and you are searching a nucleotide database; this requires just one search of the database (one pass)
  o blastp – searching a protein against a protein database; takes one pass
  o blastx – you have a nucleotide sequence, you translate it, and then you search a protein database
    ▪ 3 ways to translate the forward frames
    ▪ 3 ways to translate the reverse frames
    ▪ Must try coding amino acids in all possible reading frames; that is 6 total possibilities
    ▪ Must search the protein database 6 times, once with each forward translation and each reverse translation
  o tblastn – you have a protein and you search a nucleotide database translated in all 6 reading frames; this takes a lot longer than blastx because 6 different subdatabases are made from the database
  o tblastx – the slowest BLAST search (both the query and the database are translated)
    ▪ Used when very little is known about the species and the database you are comparing it to
    ▪ You are not sure where the open reading frames are, and you are comparing to genome sequences that no one has annotated
  o Should know all the BLAST programs
• BLAST – local alignment
  o Looking for smaller regions that align
  o Breaks the query down into small units and looks for a good match for each unit
• Global alignment (for contrast with BLAST's local alignment)
  o Trying to match up the whole stretch
  o Trying to find a match of an entire sequence
  o Have a sequence (could be a genomic sequence, could be a translation)
  o What if your database is made from cDNA? You might get local alignments but not global alignments for some sequences
    ▪ A gene can have introns but cDNA does not have introns
    ▪ Alternative splicing of exons in a gene results in a cDNA sequence that differs from the DNA
    ▪ Therefore it may match better locally than globally
    ▪ Common in cDNA studies: you don't get the whole transcript, only part of it
• Local alignment with “words” – slide 9
  o Word size = length of sequence fragments
  o Scoring matrix – what is the cost (how do you score) of a misalignment (G changed to C)?
    ▪ Might have been a deletion or an insertion of a base
    ▪ In this example, though, we are talking about protein, not nucleotides
  o Substitution matrix – people downloaded protein sequences that they knew were related to each other (dog cytochrome b, human cytochrome b, goldfish cytochrome b) and looked at the frequency of one amino acid changing to another
    ▪ If a change happens frequently in nature, it gets a higher score (a lower cost)
    ▪ If a change rarely happens in nature, it gets a lower score (a higher cost)
• Slide 11
  o GTW
    ▪ G matching G is a 6
    ▪ T matching T is a 5
    ▪ W matching W is an 11
    ▪ Total is 22
  o GSW
    ▪ The only one that has changed is S
    ▪ T to S has a score of 1
    ▪ So this has a lower score (6 + 1 + 11 = 18)
• Slide 12
  o Example to try: come up with the total score
• Slide 13
  o You can set the minimum threshold higher or lower
    ▪ May want to be more specific – set the threshold higher
    ▪ Maybe you get nothing back – set it lower
  o Threshold means do not consider matches below the threshold (these matches are so weak there is no reason to pursue them)
  o Probably going to be a lot of entries that match GTW
  o You go to the left and right and see what the score is; you keep extending while your score is getting better and better, but you stop once your score starts dropping
  o The BLAST hit might actually cover your entire query, as if you did a global alignment (even though you did a local one)
  o You don't pursue matches that didn't make the minimum threshold
• Slide 14
  o A horizontal line appears wherever you have a block match
  o When you see an E-value of 1.3, probability-wise you may expect 1.3 sequences to match by chance
  o You want LOW E-values; we don't want to pursue things that look relevant only by chance
• Slide 15
  o Probability formula: can plug in the E-value to come up with a probability; want things that are statistically significant
  o An E-value of 1 gives a P-value of 0.63
  o Low E-value = statistically significant

Genomics Lecture 4/26/16

• What can happen to duplicated genes?
  o Within a genome a gene can duplicate several times
• Divergent evolution
  o After gene duplication events each gene copy becomes independent
  o A speciation event results in two daughter species
  o Each circle represents a gene
  o Species 1 has 4 gene copies, and species 2 has 4 gene copies
  o The 4 genes within species 1 are all paralogs of one another – they are within the same genome and can be traced back to one ancestral gene
  o Once a gene is duplicated, each gene copy takes its own independent path
    ▪ Frog and frog are paralogs; mouse and mouse are paralogs, etc.
    ▪ Often regulation of the gene pair is different
    ▪ When developmental times became longer there was opportunity for specialization
  o Genes active in feet vs. hands
• Concerted evolution
  o Ancestral genome has one gene, then there are gene duplication events, and the gene copies remain very similar to one another
  o If you have multiple copies of one gene, and a mutation occurs in one of the copies and damages it, it is okay because there are other copies that are still functional
  o If a mutation arises and it is beneficial, then that beneficial one might spread via unequal crossover events
  o What you would expect – all the gene copies within a species are identical or nearly identical to one another – so you can no longer trace back the orthologs between species 1 and 2 just by looking at the sequences alone (if you do a blast sequence …)
  o The ribosomal RNA genes: there are multiple copies of rRNA genes, these copies are in a group, the groups themselves are repeated, and they are evolving by concerted evolution
  o Let's say you use PCR and want to characterize the large subunit RNA gene – the gene product will look as if it comes from a single locus, because the different RNA genes all have the same sequence; this is because of concerted evolution
  o You're getting more gene product because of concerted evolution
• Birth-and-death evolution
  o White – functional genes
  o Black – non-functional genes (pseudogenes)
  o If a gene is not functional, there is no natural selection to retain the reading frame, for example
  o It is possible that a premature stop codon, or another type of mutation, killed the gene and made it non-functional
  o Duplications are happening (some copies are non-functional, and not all the gene copies in each species can be traced back to the ancestor)
  o Very dynamic process, where genes are being duplicated and genes are going pseudo (being inactivated)
  o It is difficult to trace back the exact history of the gene because so much scrambling is going on (so much dying of genes)
  o Unprocessed pseudogene – a segment of a chromosome might duplicate; that gene is initially functional and might later lose function
    ▪ Looks very similar to the ancestral gene (can detect it by frameshifts and other phenomena)
  o Processed pseudogene – the gene copy is created from a mature mRNA by reverse transcriptase
    ▪ Acquires mutations very quickly; there is no natural selection on it
    ▪ Colored blocks correspond to the exons
    ▪ When you do your BLAST search you may find a globin-like gene, but when you look at the whole gene you may not actually find the regular things you would expect of a functional gene (TATA box, etc.)
    ▪ Can start sequencing the mammalian genome, and when you do a functional sequencing of the gene it will tell you a lot about the gene

Review

• What genomes look like
• Justifying a genome project
• How to practically go about getting a genome sequence (get money, know something about repetitive DNA, what vectors you will use to make clones, etc.)
• Sequence matching (BLAST)
  o Unlikely to find something totally novel, because if it is totally novel it won't exist within the database
• Gene families – humans have 23,000-25,000 genes (the likely way to get a new gene is to copy another gene)
• Multiple Choice – 40 questions, 2.5 points each
• Inverted repeats are repetitive regions
• Practical
  o Why use those programs, how to use them, and how to interpret the result that comes out of the program
  o Define terms, like databases
  o No computer activity
  o Fill in the blank, T/F, short answers, and MC
  o 14 questions, write in pen
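
Worked sketch for the BLAST scoring notes (slides 11, 13 and 15): the word score is the sum of the per-position substitution scores, a word is pursued only if that sum meets the threshold, and an E-value can be turned into a P-value. This is a study aid, not how BLAST itself is implemented. Assumptions: the score table holds only the four values quoted in the notes (they agree with the BLOSUM62 matrix), the threshold of 20 is a made-up value for illustration, and the slide 15 formula is taken to be the standard relation P = 1 - e^(-E), which reproduces the notes' "E-value of 1 gives P-value of 0.63". The function names are hypothetical.

```python
import math

# Only the substitution scores quoted in the notes (slide 11).
# A real search would use a full 20 x 20 amino acid matrix.
SCORES = {
    ("G", "G"): 6,
    ("T", "T"): 5,
    ("W", "W"): 11,
    ("T", "S"): 1,
}


def pair_score(a, b):
    """Look up one substitution score; the matrix is symmetric."""
    if (a, b) in SCORES:
        return SCORES[(a, b)]
    return SCORES[(b, a)]  # KeyError for pairs the notes do not list


def word_score(query_word, db_word):
    """Add up the per-position scores of two equal-length words."""
    if len(query_word) != len(db_word):
        raise ValueError("words must have the same length")
    return sum(pair_score(a, b) for a, b in zip(query_word, db_word))


def passes_threshold(query_word, db_word, threshold):
    """Slide 13: matches scoring below the threshold are not pursued."""
    return word_score(query_word, db_word) >= threshold


def evalue_to_pvalue(e_value):
    """Assumed slide 15 formula: P = 1 - e^(-E)."""
    return 1.0 - math.exp(-e_value)


if __name__ == "__main__":
    threshold = 20  # hypothetical value, chosen only for illustration
    for db_word in ("GTW", "GSW"):
        score = word_score("GTW", db_word)
        keep = passes_threshold("GTW", db_word, threshold)
        print(db_word, score, "pursue" if keep else "skip")
    print(round(evalue_to_pvalue(1.0), 2))
```

With the hypothetical threshold of 20, the exact match GTW (score 22) would be extended while GSW (score 18) would be dropped; lowering the threshold to 18 would keep both, which is the trade-off between specificity and getting nothing back described under slide 13.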