4/5/2016
Bio 119 – Genomics and Bioinformatics
Lecture 3 – Week 2

Genome size
In bacteria and archaea there is a rough relationship between genome size, gene content, and genome complexity (free-living vs. obligate, etc.)
In eukaryotes the range in genome size can be very extreme
Smaller: a metazoan mitochondrial genome or a plant chloroplast genome
  o Animal mitochondrial genomes are very streamlined (smaller)
  o Chloroplast genomes have a size range, but they do allow for flexibility
Eukaryotic genomes
  o Often consist of repetitive sequences
  o Can vary in size by many orders of magnitude
  o Organized into chromosomes (which are linear)
  o Telomeres on the ends of chromosomes
  o Chromosomes are composed of chromatin (DNA + protein)
Chromosome morphology
  o The location of the centromere can vary; depending on where it is, you can have different arm lengths
  o If the centromere is in the center: metacentric
  o If it is off-center, there is a short arm and a long arm: acrocentric
  o Telocentric: centromere way off at one end
  o Centromeres have characteristic DNA sequences
      - Made up of repetitive sequence
      - A typical repetitive centromere sequence: (GGAAT)n
      - Unclear where the first GGAAT started
Structure and morphology of telomeres
  o Repetitive sequences like (TTAGGG)n and (TG1-3)n
  o Some species have very long telomere sequences, others have short ones
  o The 3' end has an overhang (it is longer than the 5' strand)
      - Very important for the regulation of telomeres
      - Needs to be exposed so telomerase can modify it
      - Telomerase extends it by adding more repeats
      - The strands are asymmetrical: they differ in length, and one strand has a higher G content than the other
      - As cell lineages age, the telomeres erode
      - As cells proliferate the telomeres get shorter; when cells are no longer proliferating the telomeres are at their shortest
      - To immortalize a cell line you need to add telomerase to your cell culture
      - Organisms who
        continue to grow throughout their whole life have more active telomerase
      - Humans have determinate growth
      - Some reptiles and blue whales don't seem to have determinate growth
Variation in chromosome number within a species
  o In the honey bee the male is haploid, so he has only half the chromosomes of a female
  o A female spider has more chromosomes than the male; they are the same in genome content except that males have 2 fewer sex chromosomes
  o In humans, chromosome number variation within the species is usually developmentally lethal
  o Polysomy
      - Humans have 46 chromosomes, but with trisomy 21 you have one extra copy of chromosome 21
      - Just one extra copy, a little bit too much DNA, has profound consequences
  o Translocation event in image: one of the chromosome 21 copies got stuck onto one of the chromosome 14 copies
      - This individual doesn't have any extra DNA, but the translocation must somehow affect gene regulation, because the individual does have some kind of disorder
      - The number and location of chromosomes have very profound effects for us within a species
Variation in chromosome number between species
  o Polyploidy means an entire genome duplicates (a whole extra copy of the genome)
      - May be useful under different types of environmental challenges
      - A substantial percentage of angiosperms have polyploid origins
      - Cultivated Easter lily: tetraploid, which makes the flower much bigger than a regular lily
      - Seedless citrus: almost all are polyploid (triploid). Seeds must be formed through meiosis, and in triploids meiosis isn't functional
      - How do you farm these? Sexual reproduction fails, but the plant can be propagated: you graft it, which is like cloning it
      - Some lizards are polyploid, but only in female lineages
      - A mammalian genome tends to have 4 copies of a particular gene that Drosophila has only 1 copy of. How can you explain that? By gene duplication through polyploidy
      - Think of an evolutionary
        tree: a common ancestor splits into one lineage leading to insects and another leading to vertebrates
      - A whole-genome duplication can happen in one lineage and not the other, and this can happen again and again
      - There is good evidence that polyploidy events occurred (in the distant past) in the lineage leading to us
Slide 8 – How do you explain why humans have 46 chromosomes rather than 48?
May have had a translocation event (2 separate chromosomes fused with one another)
May have lost one
Our genome sizes are the same, so which hypothesis is the most likely?
  o The translocation event, because our chromosome 2 is much larger than the corresponding chromosomes of apes
  o When chromosome 2 was sequenced, remnants of extra telomeres and centromeres were found
  o When you look at a genome you can find things that are active, but also evidence of things that happened in the past
Muntjac deer: all species have approximately the same genome size
  o 12 species
  o Other species are like the Chinese muntjac (2n = 46)
  o Indian muntjac has 2n = 6 (female) and 2n = 7 (male)
  o The Indian muntjac experiment
      - Its chromosomes are very large; each pair was painted a different color
      - Split the chromosomes up into little pieces
      - Hybridized the pieces to the Chinese muntjac chromosomes
      - Looked at the painting pattern on the Chinese muntjac chromosomes: does each one get painted with one color or with multiple colors?
      - They get mostly painted with one color, which shows that there has been a tremendous number of fusion events
Distribution of repetitive DNA
  o DNA denaturation-renaturation experiments, monitored with UV light, can show how repetitive the DNA in a genome is
  o Where is the repetitive DNA located in the genome?
      - Clustered repeated DNA around the centromeres and telomeres, both highly repetitive
      - In the human genome
          - The mitochondrial genome has very little repetitive DNA
          - Because we have more than one mitochondrion, an individual may not have 100% mutant mitochondrial genomes; they may only have a
            percentage of mutant mitochondria
          - We have highly repetitive DNA (aka satellite DNA) and middle repetitive DNA
          - Satellite DNA is often associated with the centromeres and telomeres
          - Density-gradient centrifugation can separate DNA by its buoyant density
          - Middle repetitive: tandem repeats (in a row and all facing the same direction) and non-tandem repeats
Tandemly arrayed multiple-copy genes
  o These rRNA genes are always found linked to each other, so if you transcribe this region you transcribe all three at the same time
  o Human mitochondrial DNA has just one copy of the rRNA gene
  o A common feature of plant DNA is inverted repeats
  o Corn has the most copies of the rRNA gene cluster
Microsatellites
  o Microsatellite variation is one of the ways we can tell people apart from one another
  o Replication slippage: when cells divide and replicate their nuclear genome, DNA polymerase occasionally makes a mistake (adds or deletes some copies of the repeat unit)
      - Results in microsatellite variation
      - Is biased toward having a lot of As
Trinucleotide repeats: Huntington's disease
  o The gene for the huntingtin protein varies in its number of CAG repeats (CAG encodes glutamine)
  o With more than 35 repeats you will have some level of severity of Huntington's disease
  o You want glutamine, but not too much, because the protein will misfold
  o Someone with an expanded CAG repeat will develop Huntington's disease as they get older (the disease gets worse with age)
DNA transposons
  o If the elements are able to move about on their own they are autonomous
      - Need transposase to be autonomous
  o This transposase is able to recognize the flanking regions of the genomic region it is derived from; it can cut that piece out of the genome and insert it at a different place in the genome
      - Could have evolutionary consequences: may disrupt a promoter, etc.
  o A particular transposase corresponds to a specific element (it recognizes that element's flanking repeats)
  o So a P element transposase can only recognize P elements
Retrotransposons
  o Reverse transcriptase turns RNA into DNA
  o Autonomous
      - Long terminal repeat (LTR) or non-LTR elements; can replicate themselves
      - LTRs flank the edges of the element
      - The pol gene is the reverse transcriptase gene
      - A retrovirus integrates its genome into our own genome (env)
  o Non-autonomous
      - Cannot move around on their own; these cannot replicate themselves
      - A SINE is associated with a LINE; the SINE is missing its own enzyme gene
      - SINEs are extremely effective at getting replicated
      - A SINE always depends on a LINE to get transposed
      - Processed pseudogenes: RNA goes back into DNA and then gets reinserted into the genome
      - Can estimate a pseudogene's age by how much sequence change has accumulated: over a long time, mutations accumulate within the region

4/7/2016
Bio 119 – Genomics and Bioinformatics
Lecture 4 – Week 2

How do we get a genome sequence?
  o You need people to initiate a genome project, and they need to have some ideas
  o Money and resources are needed to initiate a genome project
  o Some projects are government funded
Cannabis sativa
  o What are some reasons for interest in sequencing the cannabis genome?
      - Cannabis is important as hemp; the hemp plant is a good source of oils, and in some parts of the world hemp is a food source
      - Marijuana strain (high level of THC) vs. hemp strain: comparing gene expression levels in the transcriptome
      - THC is the compound that is very prominent in medical and recreational marijuana
      - Hemp strains make the compound cannabidiol; they don't have the active THC compound
      - Even though these two are within the same species, there is genetic variation between them
      - You might want to sequence one genome and also multiple genomes of the same species, because that gives you much more information about the species
Slide 4 – Cacao
  o One is more difficult to grow than the
    other
  o But the one that is difficult to grow produces the better product
  o May want to study the genome sequence to determine how to make the one that yields the better product sturdier and easier to grow
  o When you read this article, focus on why the study was done and on the conclusions
  o Know the major concepts (e.g. they found a lot of transposable elements), but you don't need to know the names of the transposable elements or the computer programs they used
  o Comparing your species to other species of interest is a really big part of understanding what is going on in the genome
Slide 6
  o You're going to want to know your genome size
  o Know whether or not your genome contains transposable elements and repetitive sequences; these may affect your cost
  o There is always competition for money: most grant proposals that get written don't get funded, so when you write one you need as many selling points as possible
      - Is it relevant to agriculture?
      - Is it relevant to human disease?
      - Is it relevant to other factors (will we learn something about stress tolerance in relation to pollution or climate change)?
  o Resequencing studies
      - Say someone has already published a genome sequence, but now you want to know something the initial study didn't cover
  o Transcriptome studies
Slide 7
  o Honeybee chromosome number: 1n = 16
  o Genome size: ~300 Mb
  o Males are called drones and are haploid, while the females are 2n
  o In a colony the queen and a drone reproduce; the vast majority of the colony is made up of workers
  o The workers all have the same genome but are given different roles (soldiers, collectors, etc.)
  o Purpose of the honeycomb: each honeycomb section gets an egg, and the honey is a food source for the bees and larvae
  o Important role that bees play in our environment: pollinators
  o Bees live at very high density
  o Why do we want to sequence the honeybee genome?
      - What is the difference in gene expression level between drones, workers, females, etc.?
      - People were concerned about how to keep bee populations healthy
      - The queen has the same genome as a worker but lives much longer, so why do some individuals within the same species live longer than others?
      - How do bees interact with one another?
      - Could you train a bee to find things other than pollen, such as landmines or bombs?
  o A group of scientists did propose this and did get funded (by the US Human Genome sequencing project and the US Department of Agriculture) because of all the implications that knowing this genome could have
Genomic libraries are made from genomic DNA
  o Finding any particular piece of information in a genomic library is very difficult (like finding a specific grain of sand on a beach)
  o A genomic library = a collection of smaller DNA fragments
  1. Cut the genomic DNA into smaller DNA fragments
     a. Can be done mechanically
     b. Agitate it, shear it into little pieces
     c. A honey bee chromosome is really large, and the only way to read the entire thing is to cut it into smaller fragments; a full genome cannot be read from start to finish because it is too long
  2. Purchase a vector (often but not always circular DNA)
     a. Enzymatically cut it open; now it has two receiving arms that can receive genomic DNA
     b. You basically paste a genomic DNA fragment into the vector
     c. This is how you make a library: all these different parts of the genome
  3. Then you transform the vectors (insert them) into some kind of bacteria that can be selected for via antibiotic resistance
     a.
        The bacteria carrying the vectors you want must be ampicillin (or another antibiotic) resistant
  4. Grow the bacteria on nutrient media
     a. Only bacteria carrying plasmids with genomic DNA fragments in them are going to grow
Why do people use vectors of different sizes?
  o If you use a plasmid you have to have more clones in the library (because plasmids carry so much less DNA)
Visualize a microscope slide with small pieces of DNA printed on it; we can now work with that to determine the sequence
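
A rough way to put numbers on the vector-size point above is the Clarke-Carbon estimate, N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one clone and P is the desired probability that any given sequence is in the library. This formula is not from the lecture; it is a standard textbook approximation that assumes random, uniform fragmentation. The sketch below uses the ~300 Mb honeybee genome size from the notes. The function name and the insert sizes (5 kb for a plasmid, 150 kb for a BAC) are illustrative assumptions only.

```python
import math


def clones_needed(genome_size_bp: int, insert_size_bp: int, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of library size (hypothetical helper, not from the lecture).

    Returns the number of independent clones N such that any given sequence
    is present at least once with the requested probability, assuming
    fragments are sampled randomly and uniformly from the genome.
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < insert_size_bp < genome_size_bp:
        raise ValueError("insert size must be positive and smaller than the genome")
    fraction = insert_size_bp / genome_size_bp  # share of the genome in one clone
    # N = ln(1 - P) / ln(1 - f); log1p(-x) computes ln(1 - x) accurately for small x
    return math.ceil(math.log1p(-probability) / math.log1p(-fraction))


# Genome size taken from the notes (~300 Mb honeybee genome).
HONEYBEE_GENOME_BP = 300_000_000

# Assumed, illustrative insert sizes; real vectors vary.
ASSUMED_INSERT_BP = {"plasmid": 5_000, "BAC": 150_000}

if __name__ == "__main__":
    for vector, insert_bp in ASSUMED_INSERT_BP.items():
        n = clones_needed(HONEYBEE_GENOME_BP, insert_bp)
        print(f"{vector}: about {n:,} clones for a 99% chance of including a given sequence")
```

Under these assumptions the plasmid library needs far more clones than the large-insert library, which is the trade-off the notes describe: smaller inserts mean more clones to cover the same genome.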