Statistical Bioinformatics (STAT 5570), Utah State University
Class notes for the course taught by John Stevens; uploaded by Anita Hettinger on October 28, 2015

Recent Tools for Preprocessing Mass Spectrometry Data
Utah State University, Spring 2009
STAT 5570: Statistical Bioinformatics, Notes 5.1

Outline
- Introduction to Mass Spectrometry
- Issues in Preprocessing
- Recent Software Tools
- Sample Analysis
- Misc. Notes

Mass Spectrometry
- Technology to assess the composition of a complex mixture of proteins and metabolites
- MALDI: matrix-assisted laser desorption and ionization
  - Biological sample mixed with a crystal-forming energy absorbing matrix (EAM)
  - Mixture crystallizes on a metal plate (chip or slide)
  - In a vacuum, the plate is hit with pulses from a laser
  - Molecules in the matrix are released, producing a gas plume of ions
  - An electric field accelerates the ions into a flight tube towards a detector, which records their time of flight
(Dijkstra 2008; Coombes et al. 2007)

SELDI-TOF
- Surface-enhanced laser desorption and ionization
- Special case of MALDI: Ciphergen (Bio-Rad) ProteinChip eight-spot array
- Surface of the metal plate is chemically modified to favor particular classes of proteins
(Coombes et al. 2007; Tibshirani et al. 2004; image from AMANpasteur)

SELDI ProteinChip Technology
- Within narrow time intervals (1-4 nanoseconds), the detector records the number of particles arriving (time of flight)
[Figure: ProteinChip array, detector, and data analysis workstation; output plotted against mass/charge]
- Animation: www.learner.org/channel/courses/biology/archive/animations/hires/a_proteo3_h.html
(Coombes et al. 2007; Tibshirani et al. 2004; image from Yasui et al. 2003)

Other Separation Techniques
- Gas chromatography (GC)
  - also called gas-liquid chromatography
- Liquid chromatography (LC)
  - also called high-performance liquid chromatography (HPLC)
- Common features:
  - molecules pass through a chromatographic column
  - time of passage depends on molecule characteristics
  - coupled with a detector to record time of flight and report mass spectra (GC-MS, LC-MS)
- Successful separation reduces the number of overlapping peaks
(Dijkstra 2008)

Mass-to-Charge (m/z) Ratio
- Each molecule has a mass m and a charge z
- The m/z ratio affects the molecule's velocity in the flight tube, and consequently its time of flight t
- Based on the law of energy conservation:
  $$\frac{m}{z} = aV(t - t_0)^2 + B$$
- Parameters $t_0$, $a$, and $B$ are estimated using instrument-specific calibration data; $V$ is the accelerating voltage, so an ion of charge z gains kinetic energy zV (in electron-volts)
(Dijkstra 2008)

Sample Spectra
[Figure: "Spectrum" (intensity vs. m/z, full range up to about 30000) and "Partial Spectrum" (m/z 1000-2000)]

Two-Step Analysis Approach
1. Preprocess mass spectrometry data
   - Identify peak locations and quantify each peak in each spectrum
1.5. Identify components
   - Determine which molecule (protein, metabolite) caused each peak
2. Test for differences
   - Similar to differential expression of genes between treatment and control
(Morris et al. 2005; Coombes et al. 2007)

Preprocessing Issues
- Calibration
- Filtering (denoise spectra)
- Detrending (remove baseline from spectra)
- Normalization of multiple spectra
- Peak detection
- Peak alignment
- Peak quantification
(Coombes et al. 2007)

Calibration
- Mapping observed time of flight to m/z values
- Experimentally:
  - create a sample containing a small number of proteins of known mass
  - obtain a spectrum from the sample using the mass spectrometry instrument
  - estimate the parameters $t_0$, $a$, and $B$ of $m/z = aV(t - t_0)^2 + B$ from these instrument-specific calibration data
- Also refers to finding common m/z values for multiple spectra (the msPrepare function uses linear interpolation)
(Dijkstra 2008; Coombes et al. 2007; Morris et al. 2005)

Preprocessing Strategies
- Choices:
  - how to approach each preprocessing issue
  - order in which to address the preprocessing issues
- Current software:
  - Commercial: usually manufacturer-specific
  - R packages (R Development Core Team 2007):
    - msProcess (CRAN; Lixin Gong): examples used here
    - PROcess (Bioconductor; Xiaochun Li)
    - caMassClass (CRAN; Jarek Tuszynski)
    - MassSpecWavelet (Bioconductor; Pan Du)
    - FTICRMS (CRAN; Don Barkauskas)
    - RProteomics (caBIG; Rich Haney)

Sample Data and Code
- Reproducibility of the results in these slides: R code is included in the slides
- R package msBreast: dataset of 96 protein mass spectra generated from a pooled sample of nipple aspirate fluid (NAF) from healthy breasts and breasts with cancer
  - Observations with m/z below 950 are eliminated: they are just noise from matrix molecules, and can reflect saturation (too many ions hitting the detector for it to count them)
(Coombes et al. 2003; Coombes et al. 2005)

Sample Data Format
- An msSet object with a numeric vector of m/z values, a factor vector of spectra types, and a numeric matrix of intensities
  - columns: 96 samples (spectra)
  - rows: 15466 m/z values

    library(msProcess)
    # load sample data
    data(Breast2003QC, package="msBreast")
    z <- Breast2003QC
    # look at this
    class(z)             # "msSet"
    names(z)             # "mz" "intensity" ...
    class(z$mz)          # "numeric"
    length(z$mz)         # 15466
    class(z$intensity)   # "matrix" (c("matrix", "array") in R >= 4.0)
    dim(z$intensity)     # 15466 96
    class(z$type)        # "factor"
    length(z$type)       # 96

Visualize Two Spectra
[Figure: "Spectrum 5", "Partial Spectrum 5", "Spectrum 50", and "Partial Spectrum 50" (intensity vs. m/z); spectrum 5 is used for all sample plots here unless otherwise noted]

    # Visualize two spectra
    par(mfrow=c(2,2))
    sub <- 5
    plot(z$mz, z$intensity[,sub], xlab='m/z', ylab='Intensity',
      type='l', col='red', lwd=2, lty=1, main='Spectrum 5')
    t <- z$mz > 950 & z$mz < 2000
    plot(z$mz[t], z$intensity[t,sub], xlab='m/z', ylab='Intensity',
      type='l', col='red', lwd=2, lty=1, main='Partial Spectrum 5')
    sub <- 50
    plot(z$mz, z$intensity[,sub], xlab='m/z', ylab='Intensity',
      type='l', col='red', lwd=2, lty=1, main='Spectrum 50')
    t <- z$mz > 950 & z$mz < 2000
    plot(z$mz[t], z$intensity[t,sub], xlab='m/z', ylab='Intensity',
      type='l', col='red', lwd=2, lty=1, main='Partial Spectrum 50')

Filtering (Denoising) Spectra
- Spectra contain random noise
  - technical sources of variability (chemical, electronic)
  - remove by smoothing the spectra
- Smoothing options:
  - wavelet shrinkage (default)
  - multiresolution decomposition
  - robust running median
(Coombes et al. 2007)

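To see what the robust running median option does without msProcess, base R's stats package offers Tukey's 3RS3R smoother through smooth(), where twiceit = TRUE smooths the residuals and adds them back ("twicing"), and a plain running median through runmed(). This is a sketch under stated assumptions, not msDenoise's implementation: the spectrum is simulated (one peak on a flat background plus Gaussian noise), and mse() is a hypothetical helper that compares each version with the known true signal.

```r
# Sketch only: robust running-median denoising with base R (stats package).
# Assumptions: a simulated spectrum; this is not the code behind msDenoise().
set.seed(1)
mz <- seq(1190, 1225, length.out = 200)

# true signal: one smooth peak on a flat background
signal <- 10200 + 300 * exp(-((mz - 1207) / 2)^2)
y <- signal + rnorm(length(mz), sd = 40)   # add random noise

# Tukey's 3RS3R; twiceit = TRUE smooths the residuals and adds them back
y.3rs3r <- as.numeric(smooth(y, kind = "3RS3R", twiceit = TRUE))

# a single running median of length 3, for comparison
y.run3 <- as.numeric(runmed(y, k = 3))

# hypothetical helper: mean squared error against the known true signal
mse <- function(est) mean((est - signal)^2)
round(c(raw = mse(y), run3 = mse(y.run3), tukey = mse(y.3rs3r)), 1)
```

On data like these both smoothers should land well below the raw mean squared error; with real spectra the true signal is unknown, so this kind of comparison is only possible in simulation.
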
[Figure: "Denoising Options" for spectrum 5, m/z 1190-1225: original, denoised by wavelet shrinkage, and denoised by robust running median; here MRD = original, so it is not shown]

    # Denoising (30 sec)
    z1 <- msDenoise(z)   # default FUN="wavelet"
    z1.mrd <- msDenoise(z, FUN="mrd")
    z1.smooth <- msDenoise(z, FUN="smooth")
    # Visualize the denoising
    par(mfrow=c(1,1), cex=1.5)
    t <- z$mz > 1190 & z$mz < 1225
    sub <- 5
    plot(z1$mz[t], z1$intensity[t,sub], xlab='m/z', ylab='Intensity',
      type='l', col='darkred', lwd=2.5, lty=2, main='Denoising Options')
    points(z$mz[t], z$intensity[t,sub], pch=16, cex=1, col='black')
    lines(z$mz[t], z$intensity[t,sub], lty=1, col='black', lwd=2)
    lines(z1.smooth$mz[t], z1.smooth$intensity[t,sub], col='darkblue',
      lty=3, lwd=2.5)
    lines(z1.mrd$mz[t], z1.mrd$intensity[t,sub], col='orange', lty=4, lwd=2)
    legend(x=1190, y=10260, c('original', 'denoise: wavelet shrinkage',
      'denoise: robust running median'),
      col=c('black','darkred','darkblue'), lty=c(1,2,3), bty='n', lwd=2, cex=.8)

Denoising: What Do the Options Do?
- Wavelet shrinkage (discrete wavelet transform, DWT):
  - calculate the DWT
  - shrink the wavelet coefficients (using a calculated noise threshold and a specified shrinkage function)
  - invert the DWT to get a denoised version of the series
- Multiresolution decomposition:
  - calculate the DWT
  - invert the components
  - sum the non-noisy components
- Robust running median:
  - Tukey's 3RS3R: repeat running medians of length 3 to convergence; split horizontal stretches of length 2 or 3; repeat running medians of length 3 to convergence
  - "twiced": add smoothed residuals to the smoothed values

Local Noise Estimation
- May be interested in where the noise is
  - local noise = smoothed noise
- Smoothing options:
  - spline (default): cubic spline interpolation
  - supsmu: Friedman's super smoother
  - ksmooth: kernel regression smoother
  - loess: local polynomial regression smoother
  - mean: moving average

    # local noise estimation (same call as in the start-to-finish analysis)
    z2 <- msNoise(z1, FUN="spline")

Detrending (Baseline Subtraction)
- Technical artifacts of mass spectrometry data:
  - a cloud of matrix molecules hitting the detector at early times
  - detector or ion overload
  - chemical noise in the EAM
- No model for full generalizability of the baseline; it is only required to be smooth
- Observed signal at time t:
  $$f(t) = B(t) + N\,S(t) + \varepsilon(t)$$
  where $B(t)$ is the baseline, $N$ the normalization factor, $S(t)$ the true signal, and $\varepsilon(t)$ the noise
[Figure: a raw spectrum (intensity vs. m/z, up to about 30000) showing the baseline]
(Li et al. 2005; Morris et al. 2005; Coombes et al. 2007)

Baseline Options
- loess (default): local polynomial regression smoother
- spline: cubic spline interpolation
- supsmu: Friedman's super smoother
- approx: linear or constant interpolation of local minima
- monotone: cumulative minimum
- mrd: multiresolution decomposition
- With loess, spline, and supsmu, fitting errors can give negative signal after subtraction; monotone avoids this
(Coombes et al. 2005; Randolph & Yasui 2006)
[Figure: "Baseline Estimation" for m/z 1000-5000 and 3000-4000 (denoised spectrum with monotone, loess, supersmoother, and spline baselines; check the tuning/smoothing parameters in these options), "Monotone-Detrended Signal", and "Loess-Detrended Signal" (m/z 3000-4000)]

    # baseline subtraction (15 min)
    z3 <- msDetrend(z2, FUN="monotone")
    # how to make the other options work
    bg <- apply(z2$intensity, 2, msSmoothLoess, x=z2$mz)
    z3.loess <- msSet(z2, intensity=z2$intensity - bg, baseline=bg)
    bg <- apply(z2$intensity, 2, msSmoothSupsmu, x=z2$mz)
    z3.supsmu <- msSet(z2, intensity=z2$intensity - bg, baseline=bg)
    bg <- apply(z2$intensity, 2, msSmoothSpline, x=z2$mz)
    z3.spline <- msSet(z2, intensity=z2$intensity - bg, baseline=bg)
    # Visualize baseline subtraction
    par(mfrow=c(2,2))
    t1 <- z$mz > 1000 & z$mz < 5000
    t2 <- z$mz > 3000 & z$mz < 4000
    sub <- 5
    ulwd <- 1
    cols <- c('black','darkred','darkblue','darkorange','darkgreen')
    for (t in list(t1, t2)) {
      plot(z2$mz[t], z2$intensity[t,sub], xlab='m/z', ylab='Intensity',
        type='l', col=cols[1], lwd=1, lty=1, main='Baseline Estimation')
      lines(z3$mz[t], z3$baseline[t,sub], col=cols[2], lty=2, lwd=ulwd)
      lines(z3$mz[t], z3.loess$baseline[t,sub], col=cols[3], lty=3, lwd=ulwd)
      lines(z3$mz[t], z3.supsmu$baseline[t,sub], col=cols[4], lty=4, lwd=ulwd)
      lines(z3$mz[t], z3.spline$baseline[t,sub], col=cols[5], lty=5, lwd=ulwd)
      ulwd <- ulwd + 1
    }
    legend(x=3400, y=4400, c('denoised','monotone','loess',
      'supersmoother','spline'), lty=1:5, col=cols, lwd=2, bty='n')
    # Visualize detrended signal
    t <- t2
    plot(z3$mz[t], z3$intensity[t,sub], xlab='m/z', ylab='Intensity',
      type='l', col='darkred', lwd=2, lty=1, main='Monotone-Detrended Signal')
    plot(z3$mz[t], z3.loess$intensity[t,sub], xlab='m/z', ylab='Intensity',
      type='l', col='darkblue', lwd=2, lty=1, main='Loess-Detrended Signal')

Intensity Normalization
- Make comparisons of multiple spectra meaningful
- Basic assumption: the total amount of protein desorbed from the sample plate should be the same for all samples
  - amount of protein desorbed ≈ TIC (total ion current)
- Normalization options ($Y_i$ = vector of intensities for spectrum i):
  - tic (default): total ion current; all spectra have the same area under the curve. For spectrum i,
    $$Y_i^{*} = \frac{Y_i}{\sum Y_i}\,\operatorname{median}_j\Big(\sum Y_j\Big)$$
  - snv: standard normal variate; all spectra have the same mean and standard deviation. For spectrum i,
    $$Y_i^{*} = \frac{Y_i - \operatorname{mean}(Y_i)}{\operatorname{sd}(Y_i)}$$
(Morris et al. 2005; Randolph & Yasui 2006)
[Figure: "TIC Normalization Factor 1.611": detrended, TIC-normalized, and SNV-normalized spectrum 5, m/z 3400-3600]

    # intensity normalization
    z4 <- msNormalize(z3, FUN="tic")
    z4.snv <- msNormalize(z3, FUN="snv")
    # Visualize normalization
    par(mfrow=c(1,1), cex=1.5)
    sub <- 5
    t <- z$mz > 3400 & z$mz < 3600
    sc.sub <- round(median(z4$tic)/z4$tic[sub], 3)   # 1.611
    plot(z3$mz[t], z3$intensity[t,sub], xlab='m/z', ylab='Intensity',
      type='l', col='black', lwd=1, lty=1, ylim=c(0,1100),
      main=paste('TIC Normalization Factor', sc.sub))
    lines(z4$mz[t], z4$intensity[t,sub], col='darkred', lty=2, lwd=2)
    lines(z4$mz[t], z4.snv$intensity[t,sub], col='darkblue', lty=3, lwd=2)
    legend(x=3500, y=1000, c('detrended','TIC-normalized','SNV-normalized'),
      lty=1:3, col=c('black','darkred','darkblue'), lwd=c(1,2,2), bty='n')

Normalization and Quality
- Spectra with extreme normalization factors may suggest poor quality
- May need to eliminate some spectra or arrays
(Bio-Rad 2008)
[Figure: boxplot of "Normalization Factors: 96 Spectra", and "TIC Normalization Factor 5.976": detrended, TIC-normalized, and SNV-normalized versions of the spectrum with the largest factor, m/z 3400-3600]

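The two normalization formulas, and the normalization factor median(TIC)/TIC used for the quality check, can be written directly in base R. This is a sketch under stated assumptions, not msNormalize's source: the intensity matrix is simulated (rows = m/z sites, columns = spectra, matching the msSet layout), TIC is taken as the plain column sum, and spectrum 3 is deliberately made four times brighter so that its factor stands out.

```r
# Sketch only: TIC and SNV normalization in base R.
# Assumptions: simulated intensities (rows = m/z sites, columns = spectra);
# TIC computed as the column sum; not the msProcess implementation.
set.seed(2)
Y <- matrix(rexp(50 * 4, rate = 1 / 100), nrow = 50, ncol = 4)
Y[, 3] <- 4 * Y[, 3]   # spectrum 3 is unusually bright

# TIC: divide each spectrum by its total, then multiply by the median total
tic <- colSums(Y)
Y.tic <- sweep(Y, 2, tic, "/") * median(tic)

# SNV: center and scale each spectrum to mean 0, standard deviation 1
Y.snv <- apply(Y, 2, function(y) (y - mean(y)) / sd(y))

# normalization factors; extreme values flag spectra for a quality check
norm.fac <- median(tic) / tic
round(norm.fac, 3)
which.min(norm.fac)   # 3: the bright spectrum gets the smallest factor
```

After TIC scaling every column sums to the same value, and after SNV every column has mean 0 and standard deviation 1, which is exactly what the two formulas promise.
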
    # Normalization factors for quality check
    norm.fac <- median(z4$tic)/z4$tic
    boxplot(norm.fac, main='Normalization Factors: 96 Spectra')
    sub <- which.max(norm.fac)   # 30
    t <- z$mz > 3400 & z$mz < 3600
    sc.sub <- round(median(z4$tic)/z4$tic[sub], 3)   # 5.976 (see figure)
    plot(z3$mz[t], z3$intensity[t,sub], xlab='m/z', ylab='Intensity',
      type='l', col='black', lwd=3, lty=1,
      main=paste('TIC Normalization Factor', sc.sub), ylim=c(0,300))
    lines(z4$mz[t], z4$intensity[t,sub], col='darkred', lty=2, lwd=3)
    lines(z4$mz[t], z4.snv$intensity[t,sub], col='darkblue', lty=3, lwd=3)
    legend(x=3400, y=300, c('detrended','TIC-normalized','SNV-normalized'),
      lty=1:3, col=c('black','darkred','darkblue'), lwd=3, bty='n')

Peak Detection
- Need to detect peaks in sets of spectra
- Options:
  - simple: a local maximum over a span of 3 sites whose signal-to-local-noise ratio (snr) is at least 2
  - search: elevated intensity (as in simple) that is also higher than the estimated average background across spectra at that site
  - cwt: continuous wavelet transform; no denoising or detrending necessary
  - mrd: multiresolution decomposition; must have used MRD at the denoising step
(Coombes et al. 2005; Tibshirani et al. 2004; Du et al. 2006; Randolph & Yasui 2006)
[Figure: "Detected Peaks with Intervals" (spectrum 5, m/z 3400-4000). Closed circles identify detected peaks; the intervals here are based on the nearest local minima at least some number (41) of sites away; the blue line represents the average background. The random seed matters here.]

    # peak detection (30 sec)
    set.seed(1234)
    z5 <- msPeak(z4, FUN="search")
    # Visualize peak detection
    par(mfrow=c(1,1), cex=1.5)
    t <- z$mz > 3400 & z$mz < 4050
    sub <- 5
    plot(z5$mz[t], z5$intensity[t,sub], xlab='m/z', ylab='Intensity',
      type='l', col='black', lwd=1, lty=1, main='Detected Peaks with Intervals')
    loc <- z5$peak.list[[sub]]$mass.loc
    use.t <- loc > 3400 & loc < 4050
    use.loc <- loc[use.t]
    use.intensity <- z5$peak.list[[sub]]$intensity[use.t]
    left <- z5$peak.list[[sub]]$mass.left
    use.left <- left[use.t]
    right <- z5$peak.list[[sub]]$mass.right
    use.right <- right[use.t]
    points(use.loc, use.intensity, col='darkred', pch=16, cex=1.5)
    abline(v=c(use.left, use.right), col='darkred', lty=2, lwd=3)
    # See 'average background'
    smoothed <- msSmoothSupsmu(x=z4$mz, y=z4$intensity[,sub], span=0.5)
    lines(z5$mz[t], smoothed[t], col='darkblue', lty=3, lwd=3)

Peak Alignment
- Align detected peaks from multiple spectra, using only detected peaks with signal-to-noise ratio above some threshold
- Options:
  - cluster: 1-dimensional hierarchical clustering, with cuts between clusters based on technology precision (Coombes et al. 2005; Tibshirani et al. 2004)
  - gap: adjacent peaks are joined if within technology precision
  - vote: iterative peak clustering (Yasui et al. 2003)
  - mrd (Randolph & Yasui 2006): smooth the histogram of peak locations for all spectra; take the midpoints of valleys as common locations
- m/z is on the log scale at this step, so peak width is roughly constant (Tibshirani et al. 2004)
- Precision: the expected relative mass drift for SELDI data (the code here uses mz.precision = 0.003, i.e., 0.3%)

Peak Alignment (example)
- Here: detected peaks from spectra 2-5
- 239 common peaks
[Figure: "Peak Alignment": spectra 2-5, shifted vertically and stacked bottom to top (m/z 3490-3635); circles identify detected peaks; vertical lines mark the common peak locations (solid) and their ranges (dashed)]

    # peak alignment (10 sec)
    z6 <- msAlign(z5, FUN="cluster", snr.thresh=10, mz.precision=0.003)
    # Visualize peak alignment
    t <- z5$mz > 3490 & z5$mz < 3635
    plot(z5$mz[t], z5$intensity[t,2], type='l', xlab='m/z',
      ylab='Intensity (Shifted)', main='Peak Alignment', ylim=c(0,950))
    library(RColorBrewer)
    use.col <- rev(brewer.pal(4, "Set2"))
    for (i in 2:5) {
      lines(z6$mz[t], 150*(i-2) + z6$intensity[t,i], col=use.col[i-1], lwd=3)
      loc <- z6$peak.list[[i]]$mass.loc
      use.t <- loc > 3490 & loc < 3635
      points(loc[use.t], 150*(i-2) + z6$peak.list[[i]]$intensity[use.t],
        col=use.col[i-1], pch=1, cex=1.5, lwd=3)
    }
    legend(x=3512, y=950, rev(c('spectrum 2','spectrum 3','spectrum 4',
      'spectrum 5')), col=rev(use.col), lty=1, lwd=3, bty='n')
    # add common locations
    pc <- as.data.frame(z6$peak.class)
    t <- pc$mass.loc > 3490 & pc$mass.loc < 3635
    use.col <- brewer.pal(5, "Set1")
    abline(v=pc$mass.loc[t], lty=1, col=use.col[1:5], lwd=2)
    abline(v=pc$mass.left[t], lty=2, col=use.col[1:5], lwd=2)
    abline(v=pc$mass.right[t], lty=2, col=use.col[1:5], lwd=2)

Peak Quantification
- Peak area is assumed to be proportional to the corresponding number of detected molecules
- Based on the common set of peak classes, quantify each peak by one of:
  - intensity: returns a matrix of the maximum peak intensities for each spectrum within each common peak
  - count: returns a matrix of the number of peaks for each spectrum within each common peak
(Dijkstra 2008)
[Figure: "Visualize Peak Quantities": spectrum 5 (intensity vs. m/z, log scale, 1000-20000) and its quantified peaks; "Peak Intensity Matrix" (spectrum index vs. peak class index for the 239 common peaks): the 24 original spectra came from 3 arrays that each used all 8 spots, and the subsequent experiments from 36 arrays that each used 2 spots (Coombes et al. 2003)]

    # peak quantification (10 sec)
    z7 <- msQuantify(z6, measure="intensity")
    # Look at final peaks for one spectrum
    par(mfrow=c(1,1), cex=1.5)
    sub <- 5
    plot(z$mz, z$intensity[,sub], xlab='m/z', ylab='Intensity',
      type='l', col='red', lwd=2, lty=1, main='Spectrum 5', log='x')
    peak.loc <- as.data.frame(z7$peak.class)$mass.loc
    peak.quant <- z7$peak.matrix[sub,]
    plot(peak.loc, peak.quant, log='x', type='n', xlab='m/z',
      ylab='Peak Quantified', main='Spectrum 5')
    for (i in seq_along(peak.loc)) {
      lines(x=c(peak.loc[i], peak.loc[i]), y=c(0, peak.quant[i]), col='blue')
    }
    # Visualize peak quantities for all spectra
    blues.ramp <- colorRampPalette(brewer.pal(5, "Blues"))
    p.matrix <- t(z7$peak.matrix)
    image(seq(nrow(p.matrix)), seq(ncol(p.matrix)), p.matrix,
      xaxs="i", yaxs="i", main='Peak Intensity Matrix',
      xlab="Peak Class Index", ylab="Spectrum Index", col=blues.ramp(200))

Peak Identification
- Determining the exact species of protein or metabolite molecule that caused a peak to be detected
- Requires additional experimentation and database searches
- Results have to be compared with fragmentation patterns of known proteins or metabolites
- A single protein or metabolite may appear as more than one peak, due to complexes and/or multiple charges
(Coombes et al. 2007; Dijkstra 2008)

A Sidebar Caveat
- Original time-of-flight (tof) values are evenly spaced
- m/z values are not evenly spaced
  - this may give disproportionate weight to some m/z values at normalization (AUC), since
    $$\frac{m}{z} = aV(t - t_0)^2 + B$$
[Figure: m/z vs. time index (quadratic) and square root of m/z vs. time index (approximately linear)]
(Coombes et al. 2007; Dijkstra 2008)

Alternative View on the Time vs. m/z Scale
- If we replace m/z with its square root (i.e., preprocess on the time scale; code follows):
  - no difference in TIC normalization
  - would affect detrending (except for monotone)
  - could affect peak detection and peak alignment
- But at the peak alignment step, m/z on the log scale is supposed to make peak widths roughly constant, so the maximum intensity means something similar to peak area

    # Look at doing this on the time scale (15 min)
    zt <- Breast2003QC
    zt$mz <- sqrt(zt$mz)
    z1t <- msDenoise(zt)
    z2t <- msNoise(z1t)
    z3t <- msDetrend(z2t, FUN="monotone")
    z4t <- msNormalize(z3t, FUN="tic")
    mean(z4$intensity == z4t$intensity)   # 1
    set.seed(1234)
    z5t <- msPeak(z4t, FUN="search")
    orig.peak.loc <- z5$peak.list[[5]]$mass.loc
    new.peak.loc <- z5t$peak.list[[5]]$mass.loc^2
    # 96 peaks in orig.peak.loc; 96 in new.peak.loc
    # compare them
    mean(round(orig.peak.loc, 5) == round(new.peak.loc, 5))   # 1
    # Up through peak detection, everything's basically the same
    # (although alternative seeds may cause slight differences)

Mean Spectrum for Detection & Alignment
- Peak detection using the mean spectrum is superior to methods that work with individual spectra and then match or bin peaks across spectra:
  - increases sensitivity in peak detection, especially for low-intensity peaks
  - avoids messy and error-prone peak alignment
- Spectra must first be aligned on the time scale (small misalignments are okay; they just broaden peaks in the mean)
- But when to take the mean: before or after detrending, denoising, and normalizing? No definitive answer yet, but after seems reasonable
(Coombes et al. 2007; Morris et al. 2005)
[Figure: "Peak Intensity Matrix" from individual spectra (239 peaks) vs. "Peak Intensity Matrix: Mean" (95 peaks), with the quantified peaks of spectrum 5 under each approach ("Spectrum 5" and "Spectrum 5: Mean")]

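The mean-spectrum idea can be sketched in base R: average the (already aligned) spectra, detect peaks once on the average, then quantify every spectrum at those common locations. Everything here is an illustrative assumption rather than msPeak's algorithm: the spectra are simulated on a shared m/z grid with three peaks, one of them low; bump(), noise.sd(), and find.peaks() are hypothetical helpers; a single global noise level (the MAD of first differences divided by sqrt(2)) stands in for the local noise estimate; and the detection rule copies the "simple" option (local maximum over 3 sites with SNR of at least 2). With a threshold that low, a few pure-noise maxima can also pass.

```r
# Sketch only: detect peaks on the mean spectrum, then quantify each spectrum.
# Assumptions: simulated, already-aligned spectra; a global noise estimate
# instead of a local one; not the msProcess msPeak()/msQuantify() code.
set.seed(3)
mz <- seq(3400, 3600, by = 0.5)
bump <- function(center, height) height * exp(-((mz - center) / 1.5)^2)
true.signal <- bump(3450, 60) + bump(3500, 20) + bump(3550, 80)

# 10 spectra: same signal, independent noise (columns = spectra)
n.spec <- 10
Y <- sapply(seq_len(n.spec), function(j) true.signal + rnorm(length(mz), sd = 10))
mean.spec <- rowMeans(Y)

# robust noise level: MAD of first differences, rescaled by sqrt(2)
noise.sd <- function(y) mad(diff(y)) / sqrt(2)

# "simple"-style rule: local maximum over 3 sites with SNR >= snr.min
find.peaks <- function(y, snr.min = 2) {
  is.max <- c(FALSE, diff(sign(diff(y))) == -2, FALSE)
  which(is.max & y / noise.sd(y) >= snr.min)
}
peak.idx <- find.peaks(mean.spec)

# quantify: max intensity of each spectrum within +/- 2 sites of each common peak
peak.matrix <- vapply(peak.idx, function(i) {
  w <- max(1, i - 2):min(length(mz), i + 2)
  apply(Y[w, , drop = FALSE], 2, max)
}, numeric(n.spec))
colnames(peak.matrix) <- mz[peak.idx]   # rows = spectra, columns = peak m/z

# averaging shrinks the noise by about sqrt(10), which is why weak peaks survive
c(single = noise.sd(Y[, 1]), mean = noise.sd(mean.spec))
```

Calling find.peaks(Y[, 1]) on a single spectrum uses a noise level about sqrt(10) times larger, so the SNR rule there is far harder for the low peak at m/z 3500 to pass.
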
    # Peak detection on mean spectrum
    set.seed(123)
    z5.mean <- msPeak(z4, FUN="search", use.mean=TRUE)
    z7.mean <- msQuantify(z5.mean, measure="intensity")
    # image matrix
    p.matrix <- t(z7.mean$peak.matrix)
    image(seq(nrow(p.matrix)), seq(ncol(p.matrix)), p.matrix,
      xaxs="i", yaxs="i", main='Peak Intensity Matrix: Mean',
      xlab="Peak Class Index", ylab="Spectrum Index", col=blues.ramp(200))
    # individual spectrum
    sub <- 5
    peak.loc <- as.data.frame(z7.mean$peak.class)$mass.loc
    peak.quant <- z7.mean$peak.matrix[sub,]
    plot(peak.loc, peak.quant, log='x', type='n', xlab='m/z',
      ylab='Peak Quantified', main='Spectrum 5: Mean')
    for (i in seq_along(peak.loc)) {
      lines(x=c(peak.loc[i], peak.loc[i]), y=c(0, peak.quant[i]), col='blue')
    }

Sample Analysis: Start to Finish
- Same example: 96 protein mass spectra generated from a pooled sample of nipple aspirate fluid (NAF) from healthy breasts and breasts with cancer
- Starting point: 96 separate .txt files, each with two space-delimited columns (m/z, intensity) and no header row, all in the same directory (C:\DataFiles\NAFms)
- msProcess can also import other formats

    # Startup
    memory.limit(size=4000)   # Windows only; no longer has an effect in R >= 4.2
    print(date())
    library(msProcess)
    # read in .txt files to create msList object
    file.path <- "C:/DataFiles/NAFms"
    z.list <- msImport(path=file.path, pattern=".txt")
    # convert msList object to msSet object
    z <- msPrepare(z.list, mass.min=950, data.name='example')
    # define type of spectra
    use.type <- rep("QC", 96)
    z$type <- as.factor(use.type)
    # then z is equivalent to the Breast2003QC msSet object
    # preprocess
    print(date())   # about 5 minutes to here
    z1 <- msDenoise(z, FUN="wavelet")
    z2 <- msNoise(z1, FUN="spline")
    z3 <- msDetrend(z2, FUN="monotone")
    z4 <- msNormalize(z3, FUN="tic")
    set.seed(1234)
    z5 <- msPeak(z4, FUN="search")
    z6 <- msAlign(z5, FUN="cluster", snr.thresh=10, mz.precision=0.003)
    z7 <- msQuantify(z6, measure="intensity")
    print(date())   # about 2 minutes for preprocessing

R Objects of Interest (pseudo-results)

    z$intensity
    z1$intensity
    z1$noise
    z2$noise.local             # spline(z1$noise), by spectra
    z2$intensity
    z3$intensity
    z3$baseline
    z4$intensity               # z3$intensity, transformed
    z5$peak.list[[i]]          # data frame with locations and ranges of peaks for spectrum i
    z6$peak.class              # matrix with locations and ranges of peaks for all spectra
    z7$peak.matrix             # matrix that quantifies common peaks (col) for each spectrum (row)
    colnames(z7$peak.matrix)   # locations (m/z) of common peaks; see z6$peak.class for their ranges

[Figure: preprocessing steps for a subset of spectra, intensity vs. mass/charge (m/z 3400-4000): msPrepare (z), msDenoise (z1), msNoise (z2), msDetrend (z3), msNormalize (z4), msPeak (z5), msAlign (z6), and the peak.matrix of z7 (spectrum index vs. peak class index)]

Final object of interest: z7$peak.matrix (row = spectrum (sample), column = peak; colnames = m/z of each peak)

    # visualize the steps of preprocessing for a subset
    windows(record=TRUE)       # Windows-only graphics device
    par(mfrow=c(1,1), cex=1.5)
    sub <- 1:6                 # which spectra to display
    off <- 150                 # displayed shift between spectra
    x.range <- c(3400, 4000)   # m/z range to display
    plot(z, process="msPrepare", subset=sub, offset=off, xlim=x.range)
    plot(z1, process="msDenoise", subset=sub, offset=off, xlim=x.range)
    off <- 40
    plot(z2, process="msNoise", subset=sub, offset=off, xlim=x.range)
    off <- 150
    plot(z3, process="msDetrend", subset=sub, offset=off, xlim=x.range)
    plot(z4, process="msNormalize", subset=sub, offset=off, xlim=x.range)
    plot(z5, process="msPeak", subset=sub, offset=off, xlim=x.range)
    plot(z6, process="msAlign", subset=sub, offset=off, xlim=x.range)
    image(z7, what="peak.matrix", col=heat.colors(50))

Misc Notes
- May consider log-transforming intensities prior to preprocessing (Morris et al. 2005)
- After preprocessing, may refine the list of peaks by identifying some whose m/z values are nearly exact multiples of others, and hence potentially represent the same protein (msCharge function in the msProcess package)
- After preprocessing, note that peaks are not independent, an assumption casually made in the usual per-gene tests for differential expression with microarray data (Coombes et al. 2007)
- Denoising is more important for MALDI than for SELDI data (smoothing over the isotopic envelope) (Tibshirani et al. 2004)

Misc Notes (continued)
- MALDI produces mainly singly-charged ions, so we can think of m rather than m/z of a molecule (Kaltenbach et al. 2007)
- Other quality checks of spectra are available (Coombes et al. 2003), such as distance from the first principal components, implemented in the msQualify function of the msProcess package
- A non-monotone baseline may be more appropriate when raw spectra are not generally monotone decreasing (Li et al. 2005)
- There are no clear best preprocessing choices, but many reasonable ones

References
- Bio-Rad (2008). Biomarker Discovery Using SELDI Technology: A Guide to Successful Study and Experimental Design. http://www.bio-rad.com/cmc_upload/Literature/212362/Bulletin_5642.pdf
- Coombes et al. (2003). Quality Control and Peak Finding for Proteomics Data Collected from Nipple Aspirate Fluid by Surface-Enhanced Laser Desorption and Ionization. Clinical Chemistry 49(10):1615-1623.
- Coombes et al. (2005). Improved Peak Detection and Quantification of Mass Spectrometry Data Acquired from Surface-Enhanced Laser Desorption and Ionization by Denoising Spectra with the Undecimated Discrete Wavelet Transform. Proteomics 5:4107-4117.
- Coombes et al. (2007). Pre-Processing Mass Spectrometry Data. Chapter 4 in Fundamentals of Data Mining in Genomics and Proteomics, edited by Dubitzky et al. Springer.
- Dijkstra (2008). Bioinformatics for Mass Spectrometry: Novel Statistical Algorithms. Dissertation, University of Groningen. http://irs.ub.rug.nl/ppn/30666660X
- Du et al. (2006). Improved Peak Detection in Mass Spectrum by Incorporating Continuous Wavelet Transform-Based Pattern Matching. Bioinformatics 22(17):2059-2065.
- Kaltenbach et al. (2007). SAMPI: Protein Identification with Mass Spectra Alignments. BMC Bioinformatics 8:102.
- Li et al. (2005). SELDI-TOF Mass Spectrometry Protein Data. Chapter 6 in Bioinformatics and Computational Biology Solutions Using R and Bioconductor, edited by Gentleman et al.
- Morris et al. (2005). Feature Extraction and Quantification for Mass Spectrometry in Biomedical Applications Using the Mean Spectrum. Bioinformatics 21(9):1764-1775.
- R Development Core Team (2007). R: A Language and Environment for Statistical Computing. www.R-project.org
- Randolph & Yasui (2006). Multiscale Processing of Mass Spectrometry Data. Biometrics 62:589-597.
- Tibshirani et al. (2004). Sample Classification from Protein Mass Spectrometry by Peak Probability Contrasts. Bioinformatics 20(17):3034-3044.
- Yasui et al. (2003). An Automated Peak Identification/Calibration Procedure for High-Dimensional Protein Measures from Mass Spectrometers. Journal of Biomedicine and Biotechnology 4:242-248.

