Comp For Science

by: Ms. Nathen O'Keefe



These 10 pages of class notes were uploaded by Ms. Nathen O'Keefe on Thursday, September 17, 2015. The notes belong to CSC 210 at the University of Miami, taught by Staff in Fall.

Date Created: 09/17/15
32609

Gene Expression and Their Computational Analysis
An Overview of Gene Expression

Gene Expression
- A biological cell's function is thought of as determined by the proteins that are expressed in it.
- The encoding DNA is the same for every cell.
  - Four different molecules: A, T, C, G.
  - A is complementary to T; C is complementary to G.
  - DNA molecules are polarized and thus have directions: the 5' end and the 3' end.
- Watson-Crick complementarity: a DNA sequence and its reverse complement are paired.

DNA
- A DNA molecule is a chain of bases attached to a sugar-phosphate backbone.
- The numbers assigned to the carbon atoms of the backbone sugar indicate direction: the 5' end and the 3' end.
- DNA molecules are charged.
- There are four molecules: A, C, G, and T.

Complementarity of DNA
- A is complementary to T; C is complementary to G.
- 5' AAAACCCCGTGTGT 3' is complementary (antiparallel) to 5' ACACACGGGGTTTT 3', and they hybridize with each other to form

    5' AAAACCCCGTGTGT 3'
    3' TTTTGGGGCACACA 5'

RNA-DNA
- RNA has four molecules: A, C, G, and U.
- DNA 5' AAAACCCCGTGTGT 3' is complementary (antiparallel) to RNA 5' ACACACGGGGUUUU 3', and they hybridize with each other to form

    5' AAAACCCCGTGTGT 3'  (DNA)
    3' UUUUGGGGCACACA 5'  (RNA)

Gene Expression
- The protein synthesis process consists of:
  - DNA -> RNA (transcription). Four different molecules: A, U, C, G.
  - RNA -> Protein (translation). Each consecutive triple (codon) of RNA encodes one amino acid molecule.
- Genes contain parts that are not used in encoding proteins; these parts are removed (spliced out) when the mature RNA is produced.

Gene Expression
- Ideally, if you can identify all the proteins that are "expressed" in a cell and count them, you get a pretty good idea of what is going on in the cell.
- Factors exist that control expression after transcription (e.g., microRNA), so it is now known that the quantity of RNA may not directly correlate with the quantity of protein.

Capturing RNA
- Proteins are hard to capture, and there aren't many molecules.
- RNA molecules can be reverse-transcribed back to DNA.
- DNA can be amplified.
- DNA can be dyed and captured.

Array Technologies
- Separation of RNA from cells.
- Creation of DNA whose transcription matches the RNA, called cDNA (copy DNA), using reverse transcriptase.
- Creation of many copies of the DNA by PCR (Polymerase Chain Reaction). There are enzymes that are able to assemble the reverse complement of DNA sequences.
- Dye the copy DNA sequences.
- Trap the sequences on probes tethered on a glass surface.
- Measure the reflection of light at certain frequencies.

Microarray Technology: mRNA Quantification
- Target preparation by cDNA amplification.
- Hybridization of target to probe.
- Measurement of fluorescence.
- Measurements for thousands of genes can be obtained.
[Figure labels: target, probe, fluorescent tag, substrate]

Various Gene Expression Quantification Methods
- Whole-sequence methods: cDNA arrays (two-color systems)
  - Use the entire cDNA sequence as the probe.
  - Must know the exact sequence.
  - Lower density of probes on the surface, so as to avoid probe-probe interactions.
- Sequence-fragment methods: Affymetrix arrays, Agilent arrays
  - Design a collection of probes.
  - A probe might be shared among genes.
  - The set as a whole is unique to its corresponding gene.

Important Points about Gene Expression Data
- The values are not absolute but relative.
  - Within the data of a single array, the values can be compared against each other.
  - Across multiple arrays, the values can be compared assuming that the total amount is equal.
  - One can assume that a certain group of genes has consistent expression ("housekeeping genes").
- Much room for errors.
  - The speed of PCR depends on the base length of the DNA.
  - Replication error.
  - Human error.
  - Expression is probabilistic.

Important Points (cont'd)
- Gene expression of one sample is a vector $A = (a_1, \ldots, a_n)$, where the indices 1 through n correspond to the genes of interest.
- Gene expression of multiple samples is a collection of vectors $A_i = (a_{i1}, \ldots, a_{in})$, $1 \le i \le m$, where m is the number of samples.

Exploration of Gene Expression Data
- Fundamental data layout: principal component analysis.
- Identifying genes that exhibit similar expression patterns: clustering.
- Correlating gene expression with a certain cell type/condition: classification.

Principal Component Analysis
- Understanding the underlying structure of the data.
- Covariance matrix C of the data $A_i = (a_{i1}, \ldots, a_{in})$, where $1 \le i \le m$:
  - The (p, q) entry is the average of $(a_{ip} - x_p)(a_{iq} - x_q)$ over all i, where $x_p$ is the average of $a_{kp}$, $1 \le k \le m$.
- An eigenvector is a vector $V = (v_1, \ldots, v_n)$ such that $CV = \lambda V$ for some constant $\lambda > 0$.
  - Since C is symmetric (the (p, q) entry is identical to the (q, p) entry), such vectors do exist.
  - Normalize the vector so that its norm (the square root of the sum of the squares of the entries) is 1.

\[
C = \begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{nn}
\end{pmatrix},
\qquad
c_{pq} = c_{qp} = \frac{1}{m} \sum_{k=1}^{m} (a_{kp} - x_p)(a_{kq} - x_q)
\]

\[
V = (v_1, \ldots, v_n), \qquad
CV = \Bigl( \sum_{k=1}^{n} c_{1k} v_k, \; \ldots, \; \sum_{k=1}^{n} c_{nk} v_k \Bigr)
   = (\lambda v_1, \ldots, \lambda v_n) = \lambda V
\]

PCA (cont'd)
- The importance of an eigenvector is measured by the value $\lambda$.
- Each input vector decomposes into a weighted sum of the eigenvectors.
- Given a data vector $A = (a_1, \ldots, a_n)$ and an eigenvector $V = (v_1, \ldots, v_n)$, the weight of V in A is defined to be $\sum_{i=1}^{n} v_i a_i$. This is called the projection of A onto V.

PCA (cont'd)
- Use the first two or three components.
- Project the data onto each component and use the projections to plot the data.
- This may delineate the underlying structure of the data.
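The covariance, eigenvector, and projection steps from the PCA notes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the course: the notes do not say how eigenvectors are computed, so the sketch uses power iteration (a simple technique that finds only the eigenvector with the largest $\lambda$), and the function names and the four two-gene samples are made up for the example.

```python
import math

def covariance_matrix(samples):
    """Entry (p, q) is the average over samples of (a_ip - x_p)(a_iq - x_q)."""
    m, n = len(samples), len(samples[0])
    means = [sum(a[p] for a in samples) / m for p in range(n)]
    return [[sum((a[p] - means[p]) * (a[q] - means[q]) for a in samples) / m
             for q in range(n)] for p in range(n)]

def mat_vec(c, v):
    """Matrix-vector product CV."""
    return [sum(c_pq * v_q for c_pq, v_q in zip(row, v)) for row in c]

def normalize(v):
    """Scale v so that its norm (root of the sum of squares) is 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dominant_eigenpair(c, iterations=200):
    """Power iteration (an assumption of this sketch): repeatedly apply C
    and renormalize. It can fail if the start vector happens to be
    orthogonal to the dominant eigenvector."""
    v = normalize([1.0 / (k + 1) for k in range(len(c))])
    for _ in range(iterations):
        w = mat_vec(c, v)
        if all(abs(x) < 1e-15 for x in w):  # guard against a zero matrix
            break
        v = normalize(w)
    lam = sum(x * y for x, y in zip(v, mat_vec(c, v)))  # lambda = V . CV
    return lam, v

def projection(a, v):
    """Weight of V in A: the sum of v_i * a_i."""
    return sum(v_i * a_i for v_i, a_i in zip(v, a))

# Hypothetical data: 4 samples, 2 genes, with gene 2 always twice gene 1.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
C = covariance_matrix(samples)
lam, V = dominant_eigenpair(C)
weights = [projection(a, V) for a in samples]
print(C, lam, V, weights)
```

Because the made-up samples lie exactly on one line, a single component captures all of the variation, and the `weights` list gives the one-dimensional coordinates that would be plotted.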
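The Watson-Crick complementarity rule from the DNA part of these notes can be illustrated the same way. The sketch below is only an illustration: the function names are hypothetical, and it assumes sequences are written 5' to 3' using uppercase A, C, G, T.

```python
# Base pairing from the notes: A pairs with T, C pairs with G.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(dna):
    """Return the antiparallel complementary DNA strand, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))

def complementary_rna(dna):
    """Return the RNA strand that hybridizes with dna, written 5' to 3'.
    RNA uses U where DNA uses T."""
    return reverse_complement(dna).replace("T", "U")

strand = "AAAACCCCGTGTGT"
dna_partner = reverse_complement(strand)
rna_partner = complementary_rna(strand)
print(dna_partner)
print(rna_partner)
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check that the pairing table is symmetric.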


