Edit distance In order to transform one source string of

Chapter 15, Problem 15-5

(choose chapter or problem)

Edit distance In order to transform one source string of text x1 : : m to a target string y1 : : n, we can perform various transformation operations. Our goal is, given x and y, to produce a series of transformations that change x to y. We use an array assumed to be large enough to hold all the characters it will needto hold the intermediate results. Initially, is empty, and at termination, we should have j D yj for j D 1; 2; : : : ; n. We maintain current indices i into x and j into , and the operations are allowed to alter and these indices. Initially, i D j D 1. We are required to examine every character in x during the transformation, which means that at the end of the sequence of transformation operations, we must have i D m C 1. We may choose from among six transformation operations: Copy a character from x to by setting j D xi and then incrementing both i and j . This operation examines xi. Replace a character from x by another character c, by setting j D c, and then incrementing both i and j . This operation examines xi. Delete a character from x by incrementing i but leaving j alone. This operation examines xi. Insert the character c into by setting j D c and then incrementing j , but leaving i alone. This operation examines no characters of x. Twiddle (i.e., exchange) the next two characters by copying them from x to but in the opposite order; we do so by setting j D xi C 1 and j C 1 D xi and then setting i D i C 2 and j D j C 2. This operation examines xi and xi C 1. Kill the remainder of x by setting i D m C 1. This operation examines all characters in x that have not yet been examined. This operation, if performed, must be the final operation. for Chapter 15 407 As an example, one way to transform the source string algorithm to the target string altruistic is to use the following sequence of operations, where the underlined characters are xi and j after the operation: Operation x initial strings algorithm copy algorithm a copy algorithm al replace by t algorithm alt delete algorithm alt copy algorithm altr insert u algorithm altru insert i algorithm altrui insert s algorithm altruis twiddle algorithm altruisti insert c algorithm altruistic kill algorithm altruistic Note that there are several other sequences of transformation operations that transform algorithm to altruistic. Each of the transformation operations has an associated cost. The cost of an operation depends on the specific application, but we assume that each operations cost is a constant that is known to us. We also assume that the individual costs of the copy and replace operations are less than the combined costs of the delete and insert operations; otherwise, the copy and replace operations would not be used. The cost of a given sequence of transformation operations is the sum of the costs of the individual operations in the sequence. For the sequence above, the cost of transforming algorithm to altruistic is .3 cost.copy// C cost.replace/ C cost.delete/ C .4 cost.insert// C cost.twiddle/ C cost.kill/ : a. Given two sequences x1 : : m and y1 : : n and set of transformation-operation costs, the edit distance from x to y is the cost of the least expensive operation sequence that transforms x to y. Describe a dynamic-programming algorithm that finds the edit distance from x1 : : m to y1 : : n and prints an optimal operation sequence. Analyze the running time and space requirements of your algorithm. The edit-distance problem generalizes the problem of aligning two DNA sequences (see, for example, Setubal and Meidanis [310, Section 3.2]). There are several methods for measuring the similarity of two DNA sequences by aligning them. One such method to align two sequences x and y consists of inserting spaces at 408 Chapter 15 Dynamic Programming arbitrary locations in the two sequences (including at either end) so that the resulting sequences x0 and y0 have the same length but do not have a space in the same position (i.e., for no position j are both x0 j and y0 j a space). Then we assign a score to each position. Position j receives a score as follows: C1 if x0 j D y0 j and neither is a space, 1 if x0 j y0 j and neither is a space, 2 if either x0 j or y0 j is a space. The score for the alignment is the sum of the scores of the individual positions. For example, given the sequences x D GATCGGCAT and y D CAATGTGAATC, one alignment is G ATCG GCAT CAAT GTGAATC -*++*+*+-++* A + under a position indicates a score of C1 for that position, a - indicates a score of 1, and a * indicates a score of 2, so that this alignment has a total score of 6 1 2 1 4 2 D 4. b. Explain how to cast the problem of finding an optimal alignment as an edit distance problem using a subset of the transformation operations copy, replace, delete, insert, twiddle, and kill. 15-6 Plannin

Unfortunately, we don't have that question answered yet. But you can get it answered in just 5 hours by Logging in or Becoming a subscriber.

Becoming a subscriber
Or look for another answer

×

Login

Login or Sign up for access to all of our study tools and educational content!

Forgot password?
Register Now

×

Register

Sign up for access to all content on our site!

Or login if you already have an account

×

Reset password

If you have an active account we’ll send you an e-mail for password recovery

Or login if you have your password back