Note Taker: Srivani Adathakula
Lecture # 4
Date : Sep 9th ’98.
AUGMENTING DATA STRUCTURES
Some engineering situations need no more than a "textbook" data structure such as a doubly linked list, a hash table, or a binary search tree, but many others require a dash of creativity. Often it is enough to augment a textbook data structure by storing additional information in it and designing new operations that use this information to support the desired application. Any data structure can be augmented, but doing so is not always straightforward, since the added information must be updated and maintained by the ordinary operations on the data structure.
       How to augment a data structure:
           The steps involved in augmenting a data structure.
            1    Choosing an underlying data structure.
            2    Determining additional information to be maintained in the underlying data structure.
             3    Verifying that the additional information can be maintained for the basic modifying operations on the underlying data structure.
            4    Developing new operations.
 
             It is not necessary to follow the above steps in this exact order; most design work contains an element of trial and error, and progress on all steps usually proceeds in parallel. Now consider the above four steps applied to red-black trees. For step 1, a red-black tree is chosen as the underlying data structure.
For step 2, the size of the subtree rooted at node x is stored at every node x in the tree.
For step 3, we verify that insertion and deletion can maintain the size fields while still running in O(lg n) time.
For step 4, new operations are developed: OS-SELECT and OS-RANK.
      Definitions of Rank and Select:

        Rank: The rank of an element is its position in the linear order of the set.
        Select: Select is the operation of retrieving an element of a given rank from a linearly ordered set.
        Median: Median is a special case of select. It is the operation of retrieving the item with rank ⌈n/2⌉ (the lower median) in a set of ‘n’ ordered elements.

The rank itself cannot be stored at each node, since a single insertion or deletion could change the ranks of many nodes and make the stored values invalid.

Each node of the augmented red-black tree therefore stores two things: its value (key) and the size of the subtree rooted at that node, i.e. the number of nodes that subtree contains.
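As a concrete illustration, here is a minimal Python sketch of such an augmented node and of how the size field is restored after a rotation (step 3). The names Node, subtree_size, update_size and left_rotate are ours, the red-black colour field is omitted, and this is only a sketch of the bookkeeping, not a full red-black tree.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        key: int
        left: Optional["Node"] = None
        right: Optional["Node"] = None
        size: int = 1                     # number of nodes in this subtree

    def subtree_size(x: Optional[Node]) -> int:
        return x.size if x is not None else 0

    def update_size(x: Node) -> None:
        # size[x] = size[left[x]] + size[right[x]] + 1
        x.size = subtree_size(x.left) + subtree_size(x.right) + 1

    def left_rotate(x: Node) -> Node:
        # Rotate x's right child y above x and return y as the new subtree root.
        y = x.right
        assert y is not None
        x.right = y.left
        y.left = x
        # Only x and y acquired new subtrees, so only their size fields need
        # recomputing -- child first, then parent -- O(1) work per rotation.
        update_size(x)
        update_size(y)
        return y

Since each of the O(lg n) nodes on an insertion or deletion path, and each rotation, can have its size field fixed in constant time, the modifying operations still run in O(lg n) time.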

Operations:
Retrieving an element with a given rank. OS-SELECT(x,i)
Determining the rank of an element. OS-RANK(x, y)

For the procedures OS-SELECT(x, i) and OS-RANK(x, y), please refer to p 282 of [CLR].
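As a rough guide to what those procedures do, here is a hedged Python sketch of OS-SELECT, assuming nodes carry key, left, right and size fields as above, with None playing the role of the sentinel NIL. It is a sketch, not the book's exact pseudocode.

    def os_select(x, i):
        # Return the node holding the i-th smallest key in the subtree rooted at x.
        r = (x.left.size if x.left else 0) + 1    # rank of x within its own subtree
        if i == r:
            return x
        elif i < r:
            return os_select(x.left, i)
        else:
            return os_select(x.right, i - r)

Each step descends one level of the tree, so the running time is O(lg n) on a red-black tree.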

Here is a recursive version of the OS-RANK algorithm.

Given a node x, OS-RANK(x, y) returns the rank of x's key among the elements stored in the subtree rooted at y.

OS-RANK(x,y)
1    if x = y
2    then return size[left[y]] + 1
3    else if ( key[x] < key[y] )
4    then return OS-Rank(x,left[y])
5    else return OS-Rank ( x,right[y]) + size[left[y]] + 1

          In the above algorithm, we are required to find the rank of x in the subtree rooted at y.
 
                   Initially the algorithm is called with y pointing to the root of the tree. The two nodes x and y are compared. If they are the same, then the rank of the required element within the subtree rooted at y is size[left[y]] + 1, i.e. the number of elements in y's left subtree plus y itself. Otherwise the values key[x] and key[y] are compared. If key[x] < key[y], the element lies in the left subtree of y, so OS-Rank(x, left[y]) is called. Otherwise the element lies in the right subtree, so OS-Rank(x, right[y]) + size[left[y]] + 1 is returned: on moving to the right, all the nodes in y's left subtree and y itself precede x, so they are added to its rank. The recursion continues until the rank of the required element is found.
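The same recursion can be written directly in Python. This is a sketch under the same assumptions as above (nodes with key, left, right and size fields, distinct keys, and y initially pointing to the root of the tree):

    def os_rank(x, y):
        # Rank of node x within the subtree rooted at y.
        if x is y:
            return (y.left.size if y.left else 0) + 1
        if x.key < y.key:
            return os_rank(x, y.left)
        # x lies in y's right subtree: every node in y's left subtree,
        # plus y itself, precedes x and is added to its rank.
        return os_rank(x, y.right) + (y.left.size if y.left else 0) + 1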
 
INTERVAL TREES
                          An interval tree is a red-black tree that maintains a dynamic set of elements, where each element x contains an interval int[x]. In addition, every node x stores max[x], the maximum value of all right endpoints of the intervals stored in the subtree rooted at x.

          o    Why can we maintain the max value?

Since any interval's high endpoint is at least as large as its low endpoint, max[x] is the maximum of all right endpoints in the subtree rooted at x, and it can be computed purely from information available at x and its children:

    max[x] = max( high[int[x]], max[left[x]], max[right[x]] )

so insertion, deletion and the rotations they trigger can keep it up to date in O(lg n) time. Ex: A schedule of seminars can be represented by an interval tree: each node of the red-black tree stores the start and finish time of one seminar as its interval, and the tree is augmented with max[x] as described above. Refer to Figure 15.4 on p 291 of [CLR] for a picture of the augmented red-black tree.

Operations:

Interval trees support the following three operations.

Interval-Insert: Interval-Insert (T,x) adds the element x, whose int field is assumed to contain an interval, to the interval tree T.

Interval-Delete: Interval-Delete(T,x) removes the element x from the interval tree T.

Interval-Search(T,i): Interval-Search(T,i) returns a pointer to an element x in the interval tree T such that int[x] overlaps interval i, or NIL if no such element is in the set. Note that this operation cannot be efficiently implemented without the augmented information.

 Refer to p 292 of [CLR] for algorithms for the above operations.
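To make the role of max[x] concrete, here is a hedged Python sketch of the overlap test and the search descent. The attribute names interval, low, high and max stand in for the text's int[x], the interval endpoints and max[x]; building and rebalancing the tree are omitted, and this is not the book's exact pseudocode.

    def overlaps(a, b):
        # Closed intervals [a.low, a.high] and [b.low, b.high] overlap
        # iff each one starts no later than the other one ends.
        return a.low <= b.high and b.low <= a.high

    def interval_search(root, i):
        # Descend from the root looking for any stored interval that overlaps i.
        # Whenever the tree changes, each node keeps
        #     x.max = max(x.interval.high, x.left.max, x.right.max)
        # over whichever children exist, so the test below is always valid.
        x = root
        while x is not None and not overlaps(x.interval, i):
            if x.left is not None and x.left.max >= i.low:
                # The left subtree might still contain an overlapping interval.
                x = x.left
            else:
                x = x.right
        return x        # a node whose interval overlaps i, or None if none exists

The search follows a single root-to-leaf path, so it runs in O(lg n) time.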

 
DYNAMIC PROGRAMMING

 

Dynamic Programming, like Divide and Conquer, solves a problem by combining the solutions of its subproblems. If the subproblems overlap considerably, however, Divide and Conquer does more work than necessary, solving the same subproblems again and again. Dynamic Programming in this context solves every subproblem just once and saves its answer in a table, thereby avoiding the work of recomputing the answer every time the subproblem is encountered.

Dynamic Programming can be expensive in space, since it stores a table of subproblem answers, but it is very powerful and solves many fairly complicated problems efficiently.

Dynamic Programming is typically applied to optimization problems. In such problems there can be many possible solutions, and the objective is to find a solution with the optimal value. There may be more than one solution that achieves the optimal value.

  Longest Common Subsequence Problem

Description: A subsequence of a given sequence is just the given sequence with some elements left out.

Formally, given a sequence X = < x1, x2, …, xm >, a sequence Z = < z1, z2, …, zk > is a subsequence of X if there exists a strictly increasing sequence < i1, i2, …, ik > of indices of X such that x_ij = zj for all j = 1, 2, …, k.

 Note: A subsequence always reads from left to right, and the elements of a subsequence need not be adjacent in the original sequence.

Common Subsequence: Given two sequences X of length ‘m’ and Y of length ‘n’, the sequence Z is said to be a common subsequence of X and Y if Z is a subsequence of both X and Y.

Ex: X = < A, B, R, A, C, A, D, A, B, R, A>

Y = < R, A, B, A, D, A, B >

The sequence <A, B, A, D> is a common subsequence of both X and Y, but it is not the longest common subsequence.
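The definition can be checked mechanically. A small Python sketch (the function name is ours) verifies that a candidate appears, in order, inside a longer sequence:

    def is_subsequence(z, x):
        # True iff z can be obtained from x by deleting elements, i.e. the
        # elements of z occur in x in the same left-to-right order.
        it = iter(x)
        return all(c in it for c in z)

    X = "ABRACADABRA"
    Y = "RABADAB"
    print(is_subsequence("ABAD", X), is_subsequence("ABAD", Y))   # True True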

  In the Longest Common Subsequence problem two sequences X and Y are given and the objective is to find the longest common subsequence of X and Y.

For the above example

Input: The two sequences X and Y.

Output: < A, B, A, D, A, B >

For Algorithms refer to pages 316 and 317 of [CLR].

  The recursive version of the algorithm LCS-Length(X, Y, m, n) is as follows, where X and Y are the given sequences and m and n are their respective lengths.

  LCS-Length(X, Y, m, n)
1    if m = 0 or n = 0
2        then return 0
3    if Xm = Yn
4        then return LCS-Length(X, Y, m-1, n-1) + 1
5        else return max { LCS-Length(X, Y, m, n-1),
6                          LCS-Length(X, Y, m-1, n) }
 
But this method is not preferable, since many subproblems are revisited.
Dynamic Programming avoids recomputation of subproblems.
For computing the values bottom-up, formula 16.5 on p 316 of [CLR] is used.
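For concreteness, here is a short Python sketch of that bottom-up computation (variable and function names are ours). It fills the c table of LCS lengths and a parallel b table of arrows, solving each subproblem exactly once in O(mn) time:

    def lcs_length(x, y):
        m, n = len(x), len(y)
        # c[i][j] = length of an LCS of x[:i] and y[:j]; row 0 and column 0 stay 0.
        c = [[0] * (n + 1) for _ in range(m + 1)]
        b = [[None] * (n + 1) for _ in range(m + 1)]    # traceback arrows
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if x[i - 1] == y[j - 1]:
                    c[i][j] = c[i - 1][j - 1] + 1
                    b[i][j] = "\\"                      # diagonal
                elif c[i - 1][j] >= c[i][j - 1]:
                    c[i][j] = c[i - 1][j]
                    b[i][j] = "up"
                else:
                    c[i][j] = c[i][j - 1]
                    b[i][j] = "left"
        return c, b

    c, b = lcs_length("ABRACADABRA", "RABADAB")
    print(c[-1][-1])    # 6, the length of a longest common subsequence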
For the above example the tables c and b are computed as follows. The c table of LCS lengths is shown below, with rows indexed by the elements of Y and columns by the elements of X (the implicit row 0 and column 0 are all zeros):

         A  B  R  A  C  A  D  A  B  R  A
    R    0  0  1  1  1  1  1  1  1  1  1
    A    1  1  1  2  2  2  2  2  2  2  2
    B    1  2  2  2  2  2  2  2  3  3  3
    A    1  2  2  3  3  3  3  3  3  3  4
    D    1  2  2  3  3  3  4  4  4  4  4
    A    1  2  2  3  3  4  4  5  5  5  5
    B    1  2  2  3  3  4  4  5  6  6  6

The companion table b records, for each entry, which case of formula 16.5 produced it: a diagonal arrow "\" where the row and column characters match, and an up or left arrow otherwise, pointing to the entry whose value was carried over. The diagonal entries visited by the traceback are exactly the characters of a longest common subsequence.

Note that 6 is the length of the longest common subsequence. It is the entry on the last row, last column.

To print the Longest Common Subsequence from the above table the following algorithm is used.

PRINT-LCS(b, X, i, j)

1    if i = 0 or j = 0
2        then return
3    if b[i,j] = "\"
4        then PRINT-LCS(b, X, i-1, j-1)
5             print xi
6    elseif b[i,j] = "↑"
7        then PRINT-LCS(b, X, i-1, j)
8    else PRINT-LCS(b, X, i, j-1)
        This algorithm simply follows the directional arrows starting from the entry in the last row, last column.

From the above tables this procedure prints < A, B, A, D, A, B >. The procedure takes O(m+n) time, since at least one of i and j is decremented in each stage of the recursion.
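A matching Python sketch of the traceback, using the c and b tables produced by the lcs_length sketch above (so it is not self-standing), mirrors PRINT-LCS directly:

    def print_lcs(b, x, i, j):
        # Follow the arrows from b[i][j] back to row 0 or column 0, printing the
        # matched characters in left-to-right order (cf. PRINT-LCS above).
        if i == 0 or j == 0:
            return
        if b[i][j] == "\\":
            print_lcs(b, x, i - 1, j - 1)
            print(x[i - 1], end="")
        elif b[i][j] == "up":
            print_lcs(b, x, i - 1, j)
        else:
            print_lcs(b, x, i, j - 1)

    c, b = lcs_length("ABRACADABRA", "RABADAB")
    print_lcs(b, "ABRACADABRA", 11, 7)
    print()    # prints ABADAB, one longest common subsequence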

 

 

 

 

 

 
