week 4 bio-computing: working on tables and gene expression
These 5 pages of class notes were uploaded by Marina Notetaker on Friday, September 9, 2016. They belong to CSE 6613 at Mississippi State University, taught by Andy Perkins in Fall 2016.
Date Created: 09/09/16
September 7, 2016
a. Start by asking the user, at the keyboard, for the name of the file to open.
b. Open that file with read-only access ('r' mode).
c. Why use both infilename and inf instead of just one? infilename is only a string that contains the name of the file, while inf is a file object. The file object is what lets your program actually read from (or write to) the file.
d. File dot commands:
f.read() = read the whole file (returned as one string)
f.readline() = read only one line
f.write() = what goes inside the ( ) is what it needs to write to the file
f.close() = does not need anything inside the ( ), because it knows it is supposed to close the whole file
f.closed = does not need ( ). It is not really a method; it is an attribute that says whether the file is closed or not (True or False). Use it when you have more than one file and need to know whether the one you don't want to use is closed. You can check it with a conditional:
    if f.closed:
        print(True)
e. How to make the program read your file:
mylines = f.readlines(): returns all the lines of the file as a list, so each item in the list is one line of the file. (f.read() returns the whole file as a single string instead.)
myline = f.readline(): reads one line from the file.
for line in inf: then indent (tab over) whatever you want the program to do with each line.
f. To avoid blank lines between the lines when you print them, strip the newline off each line with line = line.strip() before printing. The name does not have to be line; you could call each string x, so it would be "for x in inf:", x = x.strip(), print(x), etc.
g. To compute the GC content, use .count(). It is not the most efficient way, but it is the easiest: you tell it what you want it to count. The loop goes from the first to the last sequence listed in the file and prints each line (sequence) with its GC count, that is, the number of G and C in that line. GC content is (G + C) / total bases in the sequence, so as a percentage:
    GC content = (count of G + count of C) / len(sequence) * 100
The program then prints the percentage of GC content per line, i.e. per sequence. The same thing can be applied to a sequence you type into the interpreter. Note: line is just the name of the loop variable; it takes each item of whatever you loop over, so if you loop over a single string, it is each character, not a line.
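The file loop and the GC-content formula above can be put together in a short script. This is a minimal sketch, not the code shown in class: the function names gc_content, gc_content_loop and report_gc, the file name sequences.txt and the three sequences are all hypothetical, and it assumes one DNA sequence per line. gc_content uses .count() as in item g; gc_content_loop gets the same number by looping over the string one character at a time.

```python
import os
import tempfile


def gc_content(seq):
    """GC content of seq as a percentage: (G + C) / total bases * 100."""
    seq = seq.upper()
    if len(seq) == 0:
        return 0.0
    return (seq.count('G') + seq.count('C')) / len(seq) * 100


def gc_content_loop(seq):
    """Same result, counting by hand.
    Looping over a string gives one character (base) at a time."""
    seq = seq.upper()
    if len(seq) == 0:
        return 0.0
    gc = 0
    for base in seq:
        if base == 'G' or base == 'C':
            gc += 1
    return gc / len(seq) * 100


def report_gc(infilename):
    """Print every sequence in the file next to its GC content."""
    inf = open(infilename, 'r')      # 'r' = read-only access
    for line in inf:                 # one line (sequence) per pass
        line = line.strip()          # remove the trailing newline
        if line:                     # skip empty lines
            print(line, round(gc_content(line), 1))
    inf.close()
    print('closed:', inf.closed)     # True after close()


if __name__ == '__main__':
    # Hypothetical input file, written here so the example runs on its own.
    infilename = os.path.join(tempfile.gettempdir(), 'sequences.txt')
    with open(infilename, 'w') as f:
        f.write('ATGCGC\nAATTAA\nGGCC\n')
    report_gc(infilename)
```

Because strip() runs before print, each sequence prints on its own line with no blank line after it.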
Printing codons from a list: for each string in the list, step through the string three bases at a time:
    for index in range(0, len(string), 3):
        codon = string[index:index+3]
GENE EXPRESSION DATA:
Mainly you work on a table, which is basically a matrix with genes as rows and samples as columns. Example: 3 samples (S1, S2, S3) and 3 genes; Gene1 has an RNA concentration of 1.5 in S1. When working on tables, you need code that can reach each number of the table separately.
1. Open your file and print each line.
2. Split each line into a list so each field can be analyzed separately: fields = line.split(). For the header line, fields[0] = GeneID, fields[1] = S1, fields[2] = S2, fields[3] = S3. The next time through the loop, fields[0] = Gene1, fields[1] = 1.5, etc.
3. If you want only the results from sample 1, print fields[1].
4. Average sample 1 across the different genes (a column):
a. First say that the sum starts at zero: avg = 0 and count = 0, where count counts the number of lines (genes).
b. Indicate that you are going to add up the fields in a loop: avg += float(fields[1]), and count += 1 each time through.
c. HOWEVER, you don't want to include the first line of your file, because it is not numbers! One way to skip it is to call inf.readline() once before the loop: it reads a line and moves on to the next. This is different from header = inf.readline(), because the second one also stores the line it reads.
d. Don't forget to use float(): after splitting, every field in the list is a string, and you need to turn it into a number. When the loop is done, avg / count is the average.
5. Averaging all the samples together:
a. Save the header line: header = inf.readline()
b. Split the header into its parts: headerfields = header.split(), so GeneID is headerfields[0], S1 is headerfields[1], etc.
c. Because there are 4 header fields but only 3 numbers per line, set the number of samples with numsamples = len(headerfields) - 1
d. Turn avg into a list: start with avg = [], then build it to the length you need by appending a starting 0 for each sample:
    avg = []
    for index in range(numsamples):
        avg.append(0)
e. Now you need to indicate which fields to add to each average. You could write out each column:
    avg[0] += float(fields[1])
    avg[1] += float(fields[2])
    avg[2] += float(fields[3])
But this only works for tables with exactly 3 sample columns.
f. You can generalize it by looping over the samples:
    for index in range(numsamples):
        avg[index] += float(fields[index+1])
g. The index is shifted by one because the numbers start at fields[1]; fields[0] is the gene name.
6. Averaging each gene (rows):
a. Because now you have 2 types of average, you must name them differently, e.g. colavg and rowavg.
b. Start by making each average an empty list: rowavg = [] and colavg = []
c. Change every avg from before to colavg.
d. Compute rowavg: the average of each gene's values across all the samples. Note: it prints each gene's average on a separate line.
7. Printing everything as a table. You will need to include:
a. header = header.strip()
b. A list that keeps every data line:
    mylines = []
    for line in inf:
        line = line.strip()
        fields = line.split()
        mylines.append(line)
Now that you have identified the columns and rows, you can make the program print them properly:
c. Print the header with "avg" as an extra column after the last one:
    print(header + '\tavg')
d. Loop over the lines, printing each one followed by its row average:
    for index in range(0, len(mylines)):
        print(str(mylines[index]) + '\t' + str(rowavg[index]))
e. Print a line of dashes to separate the column averages: print('-'*38). The number will vary depending on the width of your table.
f. Print the column averages on one line:
    print('avg', end=' ')
    for avg in colavg:
        print('\t' + str(avg), end=' ')
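The codon loop and step 4 (averaging only sample 1) can be tried out with the sketch below. It is a minimal sketch under assumptions, not the class code: the function names codons and average_sample1, the file name expression.txt and every value except Gene1 = 1.5 in S1 are hypothetical, and the table is assumed to be whitespace-separated with a header line GeneID S1 S2 S3.

```python
import os
import tempfile


def codons(string):
    """Split a sequence into codons (3 bases each), stepping by 3.
    A leftover of 1 or 2 bases at the end is kept as a short last piece."""
    result = []
    for index in range(0, len(string), 3):
        result.append(string[index:index + 3])
    return result


def average_sample1(infilename):
    """Average of the S1 column (fields[1]) over all genes."""
    inf = open(infilename, 'r')
    inf.readline()                   # read the header line and discard it
    avg = 0
    count = 0                        # number of gene lines added
    for line in inf:
        fields = line.split()        # e.g. ['Gene1', '1.5', '2.0', '0.5']
        if not fields:               # skip blank lines
            continue
        avg += float(fields[1])      # fields are strings; convert first
        count += 1
    inf.close()
    return avg / count               # sum divided by number of genes


if __name__ == '__main__':
    for string in ['ATGAAATGA', 'ATGCC']:
        print(codons(string))        # ['ATG', 'AAA', 'TGA'] then ['ATG', 'CC']
    # Hypothetical expression table (only Gene1 = 1.5 in S1 comes from the notes).
    infilename = os.path.join(tempfile.gettempdir(), 'expression.txt')
    with open(infilename, 'w') as f:
        f.write('GeneID\tS1\tS2\tS3\n'
                'Gene1\t1.5\t2.0\t0.5\n'
                'Gene2\t3.0\t1.0\t4.0\n'
                'Gene3\t3.0\t3.0\t1.5\n')
    print(average_sample1(infilename))   # (1.5 + 3.0 + 3.0) / 3 = 2.5
```

Calling inf.readline() without storing the result is exactly the "skip the header" trick from step 4c.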
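Steps 5 to 7 (column averages, row averages, and printing the table) fit together roughly as follows. Again this is a hedged sketch assuming a hypothetical whitespace-separated table with a GeneID column followed by sample columns; table_averages and print_table are made-up names, and rounding the printed averages to 2 decimals is an added choice to keep the table narrow. This version divides the column sums by the number of genes at the end, so colavg holds averages rather than sums.

```python
import os
import tempfile


def table_averages(infilename):
    """Return header, data lines, row averages and column averages."""
    inf = open(infilename, 'r')
    header = inf.readline().strip()          # keep the header this time
    headerfields = header.split()            # ['GeneID', 'S1', 'S2', 'S3']
    numsamples = len(headerfields) - 1       # all columns except GeneID
    colavg = []
    for index in range(numsamples):
        colavg.append(0.0)                   # one running sum per sample
    rowavg = []
    mylines = []
    for line in inf:
        line = line.strip()
        if not line:                         # skip blank lines
            continue
        fields = line.split()
        mylines.append(line)
        rowsum = 0.0
        for index in range(numsamples):
            value = float(fields[index + 1]) # numbers start at fields[1]
            colavg[index] += value
            rowsum += value
        rowavg.append(rowsum / numsamples)   # this gene's average
    inf.close()
    for index in range(numsamples):
        colavg[index] = colavg[index] / len(mylines)  # sum -> average
    return header, mylines, rowavg, colavg


def print_table(header, mylines, rowavg, colavg):
    """Print the table with an extra avg column and a final avg row."""
    print(header + '\tavg')
    for index in range(0, len(mylines)):
        print(mylines[index] + '\t' + str(round(rowavg[index], 2)))
    print('-' * 38)
    print('avg', end='')
    for avg in colavg:
        print('\t' + str(round(avg, 2)), end='')
    print()                                  # finish the last line


if __name__ == '__main__':
    # Hypothetical data file, written here so the example runs on its own.
    infilename = os.path.join(tempfile.gettempdir(), 'expression.txt')
    with open(infilename, 'w') as f:
        f.write('GeneID\tS1\tS2\tS3\n'
                'Gene1\t1.5\t2.0\t0.5\n'
                'Gene2\t3.0\t1.0\t4.0\n'
                'Gene3\t3.0\t3.0\t1.5\n')
    print_table(*table_averages(infilename))
```

Reading the header with readline() before the for loop means the loop only sees data lines, so float() is never called on GeneID, and numsamples comes from the header, so the same code works for any number of sample columns.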