Solution Found!
Sequence alignment. When a new gene is discovered, a standard approach to understanding
Chapter 6, Problem 6.26(choose chapter or problem)
Sequence alignment. When a new gene is discovered, a standard approach to understanding itsfunction is to look through a database of known genes and find close matches. The closenessof two genes is measured by the extent to which they are aligned. To formalize this, think ofa gene as being a long string over an alphabet = {A, C, G, T}. Consider two genes (strings)x = ATGCC and y = TACGCA. An alignment of x and y is a way of matching up these twostrings by writing them in columns, for instance: A T G C CT A C G C AHere the indicates a gap. The characters of each string must appear in order, and eachcolumn must contain a character from at least one of the strings. The score of an alignment isspecified by a scoring matrix of size (|| + 1) (|| + 1), where the extra row and column are toaccommodate gaps. For instance the preceding alignment has the following score:(, T) + (A, A) + (T, ) + (, C) + (G, G) + (C, C) + (C, A).Give a dynamic programming algorithm that takes as input two strings x[1 . . . n] and y[1 . . . m]and a scoring matrix , and returns the highest-scoring alignment. The running time should beO(mn).
Questions & Answers
QUESTION:
Sequence alignment. When a new gene is discovered, a standard approach to understanding itsfunction is to look through a database of known genes and find close matches. The closenessof two genes is measured by the extent to which they are aligned. To formalize this, think ofa gene as being a long string over an alphabet = {A, C, G, T}. Consider two genes (strings)x = ATGCC and y = TACGCA. An alignment of x and y is a way of matching up these twostrings by writing them in columns, for instance: A T G C CT A C G C AHere the indicates a gap. The characters of each string must appear in order, and eachcolumn must contain a character from at least one of the strings. The score of an alignment isspecified by a scoring matrix of size (|| + 1) (|| + 1), where the extra row and column are toaccommodate gaps. For instance the preceding alignment has the following score:(, T) + (A, A) + (T, ) + (, C) + (G, G) + (C, C) + (C, A).Give a dynamic programming algorithm that takes as input two strings x[1 . . . n] and y[1 . . . m]and a scoring matrix , and returns the highest-scoring alignment. The running time should beO(mn).
ANSWER:Step 1 of 2
The two sequences given are and with sizes and respectively. We are defining a matrix that stores the score of aligning the sequence. Each value defines the score of aligning the sequence with the sequence. We can define the recursive function to find the score of alignment as follows:
The case defines that the alignment of the sequence is copied as it is till the second last element and align the characters and. Theis determined based on the match or miss of aligning the sequence.
The case is to align with a gap.
The case is to align with a gap.