Tuesday Week 4, Tuesday Week 5 Notes

by: Anastassia Erudaitius


About this Document

These notes cover the Tuesday lectures of Week 4 and Week 5, based on what Dr. Hayashi discusses in lecture.
Genomics and Bioinformatics
Dr. Hayashi and Dr. Stajich
Class Notes
UCR, Bio, 119, genomics, bioinformatics

This 7-page set of class notes was uploaded by Anastassia Erudaitius on Wednesday, April 27, 2016. The notes belong to Biol 119 at University of California Riverside, taught by Dr. Hayashi and Dr. Stajich in Spring 2016.



Date Created: 04/27/16
Genomics Lecture 4/19/16

• Suzie the gorilla – long-read sequencing technology
• Nanopore sequencing
    o You know the sequence of the hairpin, so once you see that sequence you know it is the beginning of the complementary sequence
    o The nanopore isn't reading the nucleotide sequence the way other DNA sequencing technologies do → it just records the differences in electrical disruption
• Comparison with known sequence databases
    o There might be a lot of single-gene studies, so we would want to look at these databases
    o GenBank – NCBI
    o International collaboration is key to moving genomics forward
    o GenBank is accessible to anyone in the world with an internet connection
• Searching for sequence similarity
    o We don't have to write the code to search the database (existing algorithms are used)
    o BLAST – finds regions of similarity between biological sequences → protein or nucleotide sequences
    o BLAST can be used to infer functional and evolutionary relationships between sequences
    o BLAST hits can give you a putative functional annotation
    o blastn – you have a nucleotide sequence and you are searching a nucleotide database → this requires just one search of the database (one pass)
    o blastp – searching a protein query against a protein database → takes one pass
    o blastx – you have a nucleotide query, you translate it, and then you search a protein database
        ▪ 3 ways to translate the forward frames
        ▪ 3 ways to translate the reverse frames
        ▪ Must try coding amino acids in all possible reading frames → that is 6 total possibilities
        ▪ Must search the protein database 6 times, once with each forward translation and once with each reverse translation
    o tblastn – a protein query searched against a translated nucleotide database; this takes a lot longer than blastx because 6 different sub-databases (one per reading frame) are made from the database
    o tblastx – this is the slowest BLAST search (translated nucleotide query against a translated nucleotide database) → used when very little is known about both the species and the database you are comparing it to: you are not sure where the open reading frames are, and you are comparing against genome sequences that no one has annotated
    o Should know all the BLAST programs
• BLAST – local alignment
    o Looking for smaller regions that align
    o Breaks the query down into small units and looks for a good match for each unit
• Global alignment (in contrast to BLAST's local alignment)
    o Trying to match up the whole stretch
    o Trying to find a match for an entire sequence
    o Have a sequence (could be a genomic sequence, could be a translation)
    o What if your database is made from cDNA?
        ▪ You might get local alignments but not global alignments for some sequences
        ▪ A gene can have introns, but cDNA does not have introns
        ▪ Alternative splicing of exons in a gene results in cDNA sequences that differ from the genomic DNA
        ▪ Therefore it may match better locally than globally
        ▪ It is common in cDNA studies not to get the whole transcript, only part of it
• Local alignment with "words" – slide 9
    o Word size = length of the sequence fragments
    o Scoring matrix – what is the cost (how do you score) of a mismatch (e.g., G changed to C)?
        ▪ There might have been a deletion or an insertion of a base
        ▪ In this example, though, we are talking about protein, not nucleotides
    o Substitution matrix – people downloaded protein sequences that they knew were related to each other (dog cytochrome b, human cytochrome b, goldfish cytochrome b) and looked at the frequency with which one amino acid changes to another
        ▪ A substitution that occurs frequently in nature gets a higher score (a lower cost)
        ▪ A substitution that rarely occurs in nature gets a lower score (a higher cost)
• Slide 11
    o GTW
        ▪ G matching G is a 6
        ▪ T matching T is a 5
        ▪ W matching W is an 11
        ▪ Total is 22
    o GSW
        ▪ The only position that has changed is the S
        ▪ T to S has a score of 1
        ▪ So this word has a lower total score (6 + 1 + 11 = 18)
• Slide 12
    o Example to try: come up with the total score
• Slide 13
    o You can set the minimum threshold higher or lower
        ▪ May want to be more specific – set the threshold higher
        ▪ Maybe you get nothing back – set it lower
    o Threshold means do not consider matches below the threshold (these matches are so weak there is no reason to pursue them)
    o There are probably going to be a lot of entries that match GTW
    o You go to the left and right and see what the score is → you keep extending while your score keeps getting better, but you stop once your score starts dropping
    o The BLAST hit might actually cover your entire query, as if you did a global alignment (even though you did a local one)
    o You don't pursue matches that didn't make the minimum threshold
• Slide 14
    o You see a horizontal line wherever you have a block match
    o When you see an E-value of 1.3 → probability-wise, you would expect 1.3 sequences to match by chance
    o You want LOW E-values → we don't want to pursue matches that arise by chance
• Slide 15
    o Probability formula: can plug in the E-value to come up with a probability → want things that are statistically significant
    o An E-value of 1 gives a P-value of 0.63
    o Low E-value = statistically significant

Genomics Lecture 4/26/16

• What can happen to duplicated genes?
    o Within a genome a gene can duplicate several times
• Divergent evolution
    o After gene duplication events each gene copy becomes independent
    o A speciation event results in two daughter species
    o Each circle represents a gene
    o Species 1 has 4 gene copies, and species 2 has 4 gene copies
    o The 4 genes within species 1 are all paralogs of one another – they are within the same genome and can be traced back to one ancestral gene
    o Once a gene is duplicated, each gene copy takes its own independent path
        ▪ The two frog gene copies are paralogs; the two mouse gene copies are paralogs, etc.
        ▪ Often the regulation of the two genes in a pair is different
        ▪ When developmental times became longer there was opportunity for specialization
            o Genes active in feet vs. hands
• Concerted evolution
    o The ancestral genome has one gene, then there are gene duplication events, and the gene copies remain very similar to one another
    o If you have multiple copies of one gene and a mutation occurs in one of the copies and damages it, that is okay because there are other copies that are still functional
    o If a mutation arises and it is beneficial, then the beneficial copy might spread via unequal crossover events
    o What you would expect – all the gene copies within a species are identical or nearly identical to one another – so you can no longer trace the orthologs between species 1 and 2 just by looking at the sequences alone (if you do a BLAST sequence …)
    o The ribosomal RNA genes → there are multiple copies of the rRNA genes; these copies are in a group, the groups themselves are repeated, and they are evolving by concerted evolution
    o Let's say you use PCR and want to characterize the large subunit rRNA gene – the PCR product will look as if it comes from a single locus, because the different rRNA gene copies all have the same sequence as a result of concerted evolution
    o You're getting more gene product because of concerted evolution
• Birth-and-death evolution
    o White – functional genes
    o Black – non-functional genes → pseudogenes
    o If a gene is not functional, there is no natural selection to retain the reading frame, for example
    o It is possible that a premature stop codon or another type of mutation killed the gene and made it non-functional
    o Duplications are happening (some copies are non-functional, and not all the gene copies in each species can be traced back to the ancestor)
    o Very dynamic process, where genes are being duplicated and genes are going pseudo (being inactivated)
    o It is difficult to trace back the exact history of a gene because so much scrambling is going on (so much dying of genes)
    o Unprocessed pseudogene → a segment of a chromosome might duplicate → the gene copy is initially functional and might later lose function
        ▪ Looks very similar to the ancestral gene (can detect it by frameshifts and other phenomena)
    o Processed pseudogene – the gene copy is created from a mature mRNA by reverse transcriptase
        ▪ Acquires mutations very quickly; there is no natural selection on it
        ▪ Colored blocks correspond to the exons
        ▪ When you do your BLAST search you may find a globin-like gene, but when you examine the whole gene you may not actually find the regular features you would expect of a functional gene (TATA box, etc.)
        ▪ Can start sequencing the mammalian genome, and when you do a functional sequencing of the gene it will tell you a lot about the gene

Review

• What genomes look like
• Justifying a genome project
• How to practically go about getting a genome sequence (get money, know something about repetitive DNA, what vectors you will use to make clones, etc.)
• Sequence matching (BLAST)
    o Unlikely to find something totally novel, because if it is totally novel it won't exist within the database
• Gene families → humans have 23,000-25,000 genes (the likely way to get a new gene is to copy another gene)
• Multiple choice – 40 questions, 2.5 points each
• Inverted repeats are repetitive regions
• Practical
    o Why to use those programs, how to use them, and how to interpret the result that comes out of the program
    o Define terms, like databases
    o No computer activity
    o Fill in the blank, T/F, short answers, and MC
    o 14 questions, write in pen
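
As a rough illustration of the blastx bullet from the 4/19 lecture (translate the nucleotide query in all six reading frames before searching a protein database), here is a minimal Python sketch of six-frame translation. The function names and the example sequence are hypothetical and chosen only for illustration; the codon table is the standard genetic code. The sketch covers only the translation step, not the database search itself.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the start of seq ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Return the 3 forward and 3 reverse translations, keyed '+1'..'-3'."""
    frames = {}
    rev = reverse_complement(seq)
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical query sequence, for illustration only.
    for name, protein in six_frame_translations("ATGGGTACCTGGTAA").items():
        print(name, protein)
```

Each of the six protein strings would then be searched against the protein database, which is why the notes say blastx needs six searches.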
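
The word-scoring example from slides 11 to 13 (GTW scores 22 against itself, GSW scores lower, and a threshold decides which words are pursued) can be sketched as follows. The four pair scores are the ones quoted in the notes; the function names and the threshold values are assumptions made for illustration, not values from the lecture.

```python
# Pair scores quoted in the slide-11 example of the notes.
# Keys are alphabetically sorted amino-acid pairs.
PAIR_SCORES = {
    ("G", "G"): 6,
    ("T", "T"): 5,
    ("W", "W"): 11,
    ("S", "T"): 1,
}


def pair_score(a, b):
    """Look up the score for aligning amino acid a with amino acid b."""
    return PAIR_SCORES[tuple(sorted((a, b)))]


def word_score(query_word, candidate_word):
    """Sum the pair scores position by position for two equal-length words."""
    if len(query_word) != len(candidate_word):
        raise ValueError("words must have the same length")
    return sum(pair_score(a, b) for a, b in zip(query_word, candidate_word))


def words_above_threshold(query_word, candidates, threshold):
    """Keep only candidate words whose score meets the minimum threshold."""
    return [w for w in candidates if word_score(query_word, w) >= threshold]


if __name__ == "__main__":
    print(word_score("GTW", "GTW"))  # 6 + 5 + 11
    print(word_score("GTW", "GSW"))  # 6 + 1 + 11
    # Hypothetical thresholds: a higher one is more specific.
    print(words_above_threshold("GTW", ["GTW", "GSW"], threshold=20))
    print(words_above_threshold("GTW", ["GTW", "GSW"], threshold=13))
```

Raising the threshold drops the weaker GSW word; lowering it lets both words through, which mirrors the "set it higher to be more specific, lower if you get nothing back" advice in the notes.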
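
The extension step described under slide 13 (go left and right from a matching word, keep extending while the score improves, stop once it drops) can be sketched roughly as below. This is a simplified, ungapped illustration and not BLAST's actual implementation: the match/mismatch scores, the `x_drop` cutoff, the function name, and the example sequences are all assumptions.

```python
def extend_seed(query, subject, q_start, s_start, word_size,
                match=1, mismatch=-1, x_drop=2):
    """Extend a word hit in both directions without gaps.

    Returns (query_begin, query_end, best_score), where query_end is exclusive.
    Extension in a direction stops once the running score has fallen more than
    x_drop below the best score seen so far.
    """
    def pair(a, b):
        return match if a == b else mismatch

    # Score of the seed word itself.
    best = sum(pair(query[q_start + k], subject[s_start + k])
               for k in range(word_size))

    # Extend to the right of the seed.
    running, right, steps = best, 0, 0
    i, j = q_start + word_size, s_start + word_size
    while i < len(query) and j < len(subject):
        running += pair(query[i], subject[j])
        steps += 1
        if running > best:
            best, right = running, steps
        elif best - running > x_drop:
            break
        i += 1
        j += 1

    # Extend to the left of the seed.
    running, left, steps = best, 0, 0
    i, j = q_start - 1, s_start - 1
    while i >= 0 and j >= 0:
        running += pair(query[i], subject[j])
        steps += 1
        if running > best:
            best, left = running, steps
        elif best - running > x_drop:
            break
        i -= 1
        j -= 1

    return q_start - left, q_start + word_size + right, best


if __name__ == "__main__":
    # Hypothetical sequences sharing the word "ACGT" at position 3.
    query = "TTGACGTACCA"
    subject = "GGGACGTACTT"
    begin, end, score = extend_seed(query, subject, 3, 3, word_size=4)
    print(query[begin:end], score)
```

In this toy case the four-letter seed grows into a seven-letter block; with a longer stretch of similarity the extension could cover the entire query, which is the situation the notes describe as a local hit that looks like a global alignment.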
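
For the slide-15 point that an E-value can be plugged into a formula to get a probability, the notes do not write the formula out. The standard relation P = 1 - e^(-E) is consistent with the quoted figures (an E-value of 1 gives a P-value of about 0.63), so the sketch below assumes that this is the formula meant. The function name is hypothetical.

```python
import math


def evalue_to_pvalue(e_value):
    """Probability of at least one chance hit, assuming P = 1 - exp(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


if __name__ == "__main__":
    for e in (1.3, 1.0, 0.01, 1e-10):
        print(e, evalue_to_pvalue(e))
```

For small E-values the P-value is nearly equal to the E-value, which is why a low E-value is read directly as statistical significance.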

