Dec, 7
On-line Algorithms

In general term, on-line problems are problems that occur when the input is not completely know at beginning. For example, ski problem and paging problem are on-line problems. Sorting actually is not an on-line problem because we have to know a complete input before we can make input sorted.

  1. Ski Problem.
Suppose that you are deciding to try the sport of skiing but you do not own a pair of skis. You are not sure how many times you will go skiing before you finally give up (put it at another way, you really do not know after how many times of skiing you will break your legs and can not go to skiing anymore.) Since you need a pair of skis before skiing with your friends, you will have to get a pair of skies somewhere. Here, you have two choices: It is very hard to decide which way you want to do. If you buy a pair of skis at the beginning and then you find out that you do not like this sport at all after a couple of runs, renting should be a better choice. On the other hand, if you rent the skis every time and finally found out skiing is a great sport for you after so many times, the right choice should be purchasing a pair of skis at the beginning. So, now comes the question, should you buy a pair of skis or rent them?

The answer to this ski problem is to rent skis until the total cost of renting is greater or equal to the price of a new pair skis.
 

Is this algorithm is the best answer in general for ski problem? The answer is yes. Because by doing that our net cost will never exceed twice the minimum possible cost no matter when we finally decide to stop skiing, In a typical setting, an on-line algorithm has to face with sequence of different requests that it must service. Each request must be serviced before the next is received. The algorithm incurs a cost for servicing each request. The figure of merit we use here is to compare the total cost of the on-line algorithm on a sequence of requests with the total cost of off-line algorithm that services the same sequence of request. Such an analysis is an amortized analysis. In this sense, long-term benefit is what we interested in. We do not compare the on-line algorithm to the off-line algorithm on any single request, but the whole request sequence.
  1. Paging Problem.
To make on-line algorithm more clearly, we move to paging problem to explain on-line algorithm in details. Consider a computer memory organized as two levels: If we have multi-applications running on same computer and main memory is small, the main memory can not hold all information at once. For example, currently computer running n applications and all k units main memory already used. Then (n+1)th application is requested by user to run and n+1 application needs 1 units memory to support. In this case, we use paging method to page some information currently in main memory and write them into memory backup. Later, if information we backed up required by application processing, we need to rewrite them back into main memory again. In our example, we need at least dump 1 unit information from k units main memory into memory backup if we need (n+1)th application run correctly. This idea is simple but it is hard to figure out which unit we should to page, because the memory unit you dumped from main memory used by a part of n applications. You do not know when you need this memory unit again.

In the worst case, the information you paged maybe needed by a application processing as soon as you write them into memory backup, then you need write it back into main memory just after you wrote it into memory backup. This can really slow down your processing time for your applications, if you have to do such thing too many times.

In our paging problem, each request in sequence specifies a memory unit. If memory unit is in main memory, request can directly processed, and no cost on that request. If not, this memory unit is fetched form the memory backup to the main memory at a unit cost. When the requested memory unit is fetched form the memory backup to the main memory and main memory is full, a memory unit currently in main memory must be evicted to make room for the new unit. A paging algorithm must provide an eviction rule for deciding what memory unit currently in the main memory is evicted to make room for the new memory unit. An on-line paging algorithm must make this decision without any knowledge of future requests.

There are some typical on-line algorithms for paging problem:

Are these online algorithms good enough when they compare with off-line algorithms? Answer is yes. Let k be the minimum number of misses made by the optimal off-line algorithm. We can find out that: if there are arbitrarily long sequence on which an on-line algorithm also makes k times as many misses as the offline algorithm. At this point, we claim follow theorem:

Let A be a deterministic on-line algorithms for paging problem, Then CA >= k
(CA is the competitiveness coefficient of A)

Proof:

Assume both optimal off-line algorithm and A(on-line algorithm) are managing separate main memory for the same request sequence. We will construct a sequence of k requests to items during which A makes k misses but the optimal off-line algorithm makes only one miss. Assume that to start with, both the off-line algorithm and A have the same set of k items in their caches. The first request is to an memory unit not in either main memory; each algorithm makes a miss. Let S be the set of k+1 items consisting of the k memory units initially in the off-line algorithm's cache together with the new memory unit. Each of the next k-1 requests is to a memory unit not currently in A's main memory. On these k-1 requests, A misses every memory unit whereas the off-line algorithm makes no misses for the following reason. When the off-line algorithm makes a miss at the beginning of the round, it must evict a memory unit. How does it decide which memory unit it should evict? In order to force the lower bound, it should presumably make this choice in a manner that depends on the on-line algorithm A. It does so as follow: it observes the behavior of A during the round and sees which memory unit in S is last evicted by A in the round. The off-line algorithm selects this "last memory unit" for eviction at the beginning of the round; because A is deterministic it can predict the behavior of A during the round and the identity of the last item.