Note taken by Yanxia Liu
Lecture #4
September 14, 1998
 

Dynamic programming discussion:

What are the features of a problem that can be solved using dynamic programming?

(Elements of Dynamic Programming)
There are three properties that we will see in most of the problems that can be solved using dynamic programming: the optimal solution can be built from optimal solutions of subproblems (optimal substructure); the same subproblems come up over and over again (overlapping subproblems); and, to make the solution efficient and avoid revisiting subproblems, we use a table (memo) to store the optimal solutions of the subproblems already solved.
The table needs to be organized properly for easy access.

Examples of problems solved using Dynamic Programming:

A triangulation of a polygon is a subdivision of the polygon into triangles. The length of the triangulation is simply the sum of the lengths of all the edges in the triangulation.

An example shown below is a polygon with two different triangulations:

[Figure: a polygon with two different triangulations]

The problem is to minimize the total length of all the lines you draw. The most efficient solution uses dynamic programming.
The moment you draw one line, you break the polygon into two pieces, and you can then simply solve two subproblems. The hard part is deciding where to draw the first line. Once you draw the first line, you can "recursively" solve the subproblems.
This problem is very similar to previous problems we discussed, so it is easy to write a recursive solution, but it will not be efficient; what you are going to do is use dynamic programming to solve the problem.
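As a concrete illustration (not from the lecture), here is a minimal dynamic-programming sketch in Python for the minimum-length triangulation of a convex polygon. It assumes the vertices are given in order around the polygon and counts only the chords drawn (polygon edges cost nothing); the function name and input format are just for illustration. The table m[i][j] is the memo: it stores the optimal cost for the sub-polygon spanned by vertices i through j, so each subproblem is solved only once.

    from math import dist

    def min_triangulation(pts):
        """Minimum total chord length needed to triangulate a convex polygon.

        pts: list of (x, y) vertices, listed in order around the polygon.
        """
        n = len(pts)

        def chord(i, j):
            # A polygon edge costs nothing; a true chord costs its length.
            if abs(i - j) in (1, n - 1):
                return 0.0
            return dist(pts[i], pts[j])

        # m[i][j] = optimal cost of triangulating the sub-polygon i..j.
        m = [[0.0] * n for _ in range(n)]
        for gap in range(2, n):                 # size of the sub-polygon
            for i in range(n - gap):
                j = i + gap
                m[i][j] = min(m[i][k] + m[k][j] + chord(i, k) + chord(k, j)
                              for k in range(i + 1, j))
        return m[0][n - 1]

    # Example: a unit square needs exactly one diagonal of length sqrt(2).
    print(min_triangulation([(0, 0), (1, 0), (1, 1), (0, 1)]))   # ~1.414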
 

Greedy Algorithm:

Seminar scheduling problem:

    Input:     n seminars [si, fi], i = 1, 2, …, n.
                 1 seminar room.
    Output:  the maximum number of seminars that can be scheduled in one day.
                 (No two scheduled seminars may overlap.)

Analysis of the problem:

The problem is given as input a set of seminars. Each seminar is defined by what time it starts and what time it ends. We assume that we are only dealing with seminars for one particular day and that we only have one seminar room. We want to schedule as many seminars as possible in that one room in that one day.

In this problem, you are only asked for the maximum number of seminars that can be scheduled. You could also be asked which of the seminars you are going to schedule that day.

The objective is to schedule as many seminars as possible. The obvious requirement is that no two scheduled seminars can overlap, i.e., the seminars chosen for output must be pairwise disjoint.

Algorithm:

GREEDY-ACTIVITY-SELECTOR(s, f)
Comment: Assume that f1 ≤ f2 ≤ … ≤ fn
          1.  n ← length[s]
          2.  A ← {1}
          3.  j ← 1
          4.  for i ← 2 to n
          5.      do if si ≥ fj
          6.            then A ← A ∪ {i}
          7.                 j ← i
          8.  return A

Explanation of the algorithm:

We assume that the seminars are sorted according to their finishing times.
Pick the seminar that finishes first and schedule it. Any seminar that overlaps it is discarded. Then we have a set of seminars left, among which we again greedily pick the one that finishes first.
s1 has the smallest finishing time, so schedule s1; then we look at seminars s2 to sn in order. For each si we check whether it starts before or after the last seminar we picked: if its starting time is at or after the finishing time of the last seminar we picked, schedule it; otherwise discard it, and continue.
The variable j keeps track of the last seminar scheduled; j is updated whenever a new seminar is scheduled.
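The same greedy selection can be written as a short, runnable Python sketch (the function name and the input format are illustrative; unlike the pseudocode above, it sorts the seminars by finishing time itself and returns the indices of the seminars it schedules):

    def schedule_seminars(seminars):
        """Greedy activity selection.

        seminars: list of (start, finish) pairs.
        Returns the indices of a maximum-size set of non-overlapping seminars.
        """
        # Consider seminars in order of increasing finishing time.
        order = sorted(range(len(seminars)), key=lambda i: seminars[i][1])
        chosen = []
        last_finish = float("-inf")
        for i in order:
            start, finish = seminars[i]
            if start >= last_finish:        # does not overlap the last one picked
                chosen.append(i)
                last_finish = finish
        return chosen

    # Example: of these three seminars, at most two fit in one room.
    print(schedule_seminars([(9, 11), (10, 12), (11, 13)]))   # [0, 2]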

Proof of Correctness:

The proof of correctness takes two steps:
1. Seminar 1 is in some optimal solution (it is a "good" choice).
2. If A is an optimal solution for S containing seminar 1, then A' = A - {1} is optimal for S' = S - {i ∈ S : si < f1}.

First, we prove that seminar 1 is a good choice to include in an optimal solution: if you pick the seminar that finishes first, you leave yourself the best chance of scheduling the maximum number of seminars after it finishes.
Second, we show that once we schedule that seminar, we can remove everything that overlaps it and solve the rest of the problem; an optimal solution for what is left, together with that seminar, is an optimal solution for the whole problem.
If the whole strategy is correct, it should be correct for whatever is left over, and we use induction on what is left over to solve the remaining subproblems.

Proof by contradiction of claim 1:
Assume [s1, f1] is not in any optimal solution. Consider an optimal solution T.
Let [si, fi] be the interval that finishes first in T.
We claim that [s1, f1] must overlap [si, fi]; otherwise you could add [s1, f1] to T and schedule one extra seminar, contradicting the assumption that T is optimal.
We now claim that [s1, f1] cannot overlap any interval in T other than [si, fi]. This is because f1 ≤ fi and the starting times of all the other intervals in T are after fi.
Therefore we can replace [si, fi] by [s1, f1] in T and obtain an optimal solution of the same size that does contain [s1, f1], a contradiction.

            s1        f1
            |---------|
                   si            fi
                   |-------------|   |-----------| … |-------------|

What are the features of a problem that can be solved using a greedy algorithm?

(Elements of Greedy Strategy) The optimal solution can be obtained by putting together optimal solutions of subproblems. In other words, if you look at the optimal solution, it contains within it a substructure which is itself an optimal solution for some subproblem.

Let's see how this is true in the seminar case:

Think about the subproblem you get after picking the seminar that finishes first and removing all the seminars that overlap it: you can get the optimal solution for whatever is left in that subproblem. If you attach the first seminar to it, then you have the optimal solution for the whole problem.
In a sense, you solve that subproblem and add to it this one particular seminar, and what you get is the optimal solution for the whole problem.

Examples of using the greedy algorithm:

We give four examples of problems that can be solved by the greedy method.

        1. Seminar Scheduling with one seminar room.

        2. Minimum spanning tree.

        3. Huffman codes problem:

    You want to send a message to somebody across a perfect channel. Assume the message has a limited alphabet consisting of the four characters A, C, G, T, and that the message is ATTCGAATTGGAAACCACACG. What you would like to do is send this message using as few bits as possible.
    So what you actually want to do is compress the information using an encoding strategy, i.e., find a code for A, C, G, T.
    The problem, then, is to find the optimal encoding: one that uses as few bits as possible to encode the message while still allowing the encoded message to be decoded.

    One simple way to do the encoding is to use exactly two bits per character:
            A     00
            C     01
            G     10
            T     11
    When all the codewords have the same length, it is easy to decode the encoded message.

    Let's assume you are told the frequencies of the characters in the message. In that case, it really doesn't make sense to use the above code. You can do much better by using shorter codes for the more frequent characters and longer codes for the less frequent characters.
    The question is how to design these codes so that the encoded message is as short as possible and can still be decoded.

    Following is an example:
            Character     Frequency     Code1     Code2     Code3
                A            70%          00        0          0
                C            20%          01        1          10
                G             5%          10        11         110
                T             5%          11        10         111

    With code2 it is easy to encode the message and the total length is shorter, but decoding is difficult, or even impossible: for example, the encoded string 11 could be either CC or a single G.
    Code3 is the best of the three: no codeword is a prefix of another, so decoding is unambiguous, and the expected length is 0.70·1 + 0.20·2 + 0.05·3 + 0.05·3 = 1.4 bits per character, versus 2 bits per character for code1.
    This problem can be solved using a greedy algorithm (Huffman's algorithm); a sketch appears after this list of examples.

        4. Knapsack problem:

            Input:     item     value     weight
                           X1       v1           w1
                           X2       v2           w2
                            .          .              .
                            .          .              .
                            .          .              .
                           Xn       vn           wn

                        Knapsack's weight limit is B
        Output: Combination of items such that (1) total weight ≤ B
                                                (2) total value is maximized.

The problem is to find such a combination of items. This is a hard problem: there is a way of solving it using dynamic programming, but there is no known polynomial-time solution.
There is a different version of the knapsack problem known as the Fractional Knapsack problem, which can be solved using a greedy algorithm (a sketch is given below). In this version, you are allowed to put fractions of items into the knapsack. For example, you could put in the knapsack half of item X1 and three-quarters of item X2, as long as the total weight is bounded by B,
i.e.,
(1/2)w1 + (3/4)w2 ≤ B.
The value of such a choice of items would then be:
(1/2)v1 + (3/4)v2.

The original knapsack problem is sometimes called the 0-1 knapsack problem, since you must either pick an item in its entirety or not pick it at all.
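For the fractional version, here is a minimal greedy sketch (not from the lecture notes; the function name and input format are illustrative): repeatedly take as much as possible of the item with the highest value per unit weight.

    def fractional_knapsack(items, capacity):
        """Greedy solution to the fractional knapsack problem.

        items: list of (value, weight) pairs.
        capacity: the weight limit B.
        Returns the maximum total value when fractions of items are allowed.
        """
        # Consider items in decreasing order of value per unit weight.
        items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
        total_value = 0.0
        remaining = capacity
        for value, weight in items:
            if remaining <= 0:
                break
            take = min(weight, remaining)   # whole item, or whatever still fits
            total_value += value * (take / weight)
            remaining -= take
        return total_value

    # Example with B = 50: take all of the first two items and 2/3 of the third.
    print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))   # 240.0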
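Returning to example 3, the optimal prefix code can be built greedily by Huffman's algorithm: repeatedly merge the two least frequent subtrees. The notes above only state that a greedy algorithm applies; the sketch below (with illustrative names) is the standard construction, assuming the character frequencies are known.

    import heapq

    def huffman_code(freq):
        """Build a prefix code by greedily merging the two least frequent subtrees.

        freq: dict mapping each character to its frequency.
        Returns a dict mapping each character to its codeword (a bit string).
        """
        # Heap entries are (frequency, tie-breaker, {char: codeword-so-far}).
        heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # least frequent subtree
            f2, _, right = heapq.heappop(heap)   # second least frequent subtree
            # Prefix 0 onto the left subtree's codewords and 1 onto the right's.
            merged = {ch: "0" + c for ch, c in left.items()}
            merged.update({ch: "1" + c for ch, c in right.items()})
            heapq.heappush(heap, (f1 + f2, counter, merged))
            counter += 1
        return heap[0][2]

    # Example with the frequencies from the table above.
    print(huffman_code({"A": 0.70, "C": 0.20, "G": 0.05, "T": 0.05}))
    # e.g. {'G': '000', 'T': '001', 'C': '01', 'A': '1'}: same codeword lengths
    # as code3, so the expected length is again 1.4 bits per character.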

Proof of correctness of the greedy algorithm for the seminar scheduling problem: (by Congjun Yang)

Let's assume that [s1, f1], [s2, f2], …, [sn, fn] is the output of the greedy algorithm and that [s'1, f'1], [s'2, f'2], …, [s'm, f'm] is any other schedule, both listed in the order scheduled. We'll prove n ≥ m.
Claim: fi ≤ f'i for all 1 ≤ i ≤ min(n, m).

The fact that the greedy algorithm stops after scheduling the nth seminar means that no seminar in the input starts at or after time fn. By the claim fi ≤ f'i, a seminar [s'n+1, f'n+1] would have to start at or after f'n ≥ fn, so it cannot exist. Thus n ≥ m.

Proof of claim by induction:
For i = 1, f1 ≤ f'1 is true, since [s1, f1] is the seminar that finishes the earliest.
Assume fi ≤ f'i. Then fi+1 ≤ f'i+1, since [si+1, fi+1] is the seminar that starts at or after time fi and finishes the earliest, while [s'i+1, f'i+1] also starts at or after f'i ≥ fi and is therefore one of the candidates the greedy algorithm could have chosen.
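As a quick sanity check of this claim (not part of the original notes; names and the test setup are illustrative), the following sketch compares the greedy count against exhaustive search on small random instances:

    from itertools import combinations
    import random

    def greedy_count(seminars):
        # Greedy: sort by finishing time, take each seminar that fits.
        count, last = 0, float("-inf")
        for s, f in sorted(seminars, key=lambda x: x[1]):
            if s >= last:
                count, last = count + 1, f
        return count

    def brute_force_count(seminars):
        # Size of the largest pairwise non-overlapping subset, by exhaustive search.
        ordered = sorted(seminars, key=lambda x: x[1])
        for k in range(len(seminars), 0, -1):
            for subset in combinations(ordered, k):
                if all(subset[i][1] <= subset[i + 1][0] for i in range(k - 1)):
                    return k
        return 0

    random.seed(0)
    for _ in range(200):
        sems = [(s, s + random.randint(1, 5)) for s in random.sample(range(20), 6)]
        assert greedy_count(sems) == brute_force_count(sems)
    print("greedy matches exhaustive search on all random instances tried")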