Dec. 7
On-line Algorithms
In general terms, on-line problems are problems in which the input is not completely known at the beginning. For example, the ski problem and the paging problem are on-line problems. Sorting, by contrast, is not an on-line problem, because we must see the complete input before we can produce it in sorted order.
-
Ski Problem.
Suppose that you decide to try the sport of skiing but you do not own a pair
of skis. You are not sure how many times you will go skiing before you finally
give up (put another way, you really do not know after how many ski trips you
will break your legs and be unable to ski anymore). Since you need a pair of
skis before skiing with your friends, you will have to get a pair of skis
somewhere. Here, you have two choices:
-
First, you can spend a certain, larger amount of money (say P units) to
buy a new pair of skis and use them whenever you want to go skiing.
-
Second, you can spend a smaller amount of money (say R units, with R < P) to
rent a pair of skis for this trip and not worry about what you will do
next time.
It is very hard to decide which way to go. If you buy a pair of skis at the
beginning and then find out after a couple of runs that you do not like this
sport at all, renting would have been the better choice. On the other hand,
if you rent skis every time and finally find out, after many trips, that
skiing is a great sport for you, the right choice would have been to buy a
pair of skis at the beginning. So now comes the question: should you buy a
pair of skis or rent them?
The answer to this ski problem is to rent skis until the total cost of
renting reaches the price of a new pair of skis. At each run, decide as follows:
IF P <= nR
    Buy a new pair of skis
ELSE
    Rent a pair of skis
(n is the number of runs so far, counting the current one)
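The same rule can also be written as a tiny Python sketch (the function name
should_buy and the 1-indexed run counter are our own illustrative choices, not
part of the notes):

def should_buy(run, P, R):
    # Buy on the first run where the cumulative rent run * R would
    # reach the purchase price P; otherwise keep renting this run.
    return run * R >= P

# Example with P = 100 and R = 10: rent on runs 1..9, buy on run 10.
print([run for run in range(1, 13) if should_buy(run, 100, 10)])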
Is this algorithm the best answer for the ski problem in general? The answer
is yes, because by following it our net cost never exceeds twice the minimum
possible cost, no matter when we finally decide to stop skiing. Consider the
cases below (a small numeric check follows them):
-
In the best case: you ski only once and find out this sport is not for you
at all. You spend only R.
-
In a good case: after you have skied m times (m < n), you decide not
to ski anymore. You spend mR (mR < P).
-
In a general case: after you have skied t times (t > n), you quit skiing for
whatever reason, or you simply stop skiing as often. You spend (n-1)R + P,
which is at most 2P when nR = P: the total amount of money you spent renting
skis before the nth run plus the price of the pair of skis you bought at the
nth run. No matter how large t is, you already bought a pair of skis at the
nth run, so after the nth run you do not have to spend anything more on skis.
-
In a very good case: sometime after the nth run, you find out skiing is a
great sport for you and you try to ski as many times as possible for the rest
of your life. What you spend is still at most 2P (if nR = P): the rent paid
before the nth run plus the price of the pair of skis you bought at the nth run.
-
In the worst case: just after you buy a pair of skis (just after the nth run),
you find out you do not like skiing anymore and never go skiing again in your
life (or you break your legs and cannot ski anymore). In this case you also
spend at most 2P (if nR = P): the rent paid before the nth run plus the price
of the pair of skis you bought at the nth run.
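To check the factor of two numerically, here is a small hedged sketch (the
helper names strategy_cost and optimal_cost and the sample values P = 100,
R = 10 are ours): it computes the cost of the rent-then-buy rule after t trips
and compares it with the best off-line cost min(tR, P).

import math

def strategy_cost(t, P, R):
    # Rent on runs 1..n-1 and buy on run n, where n is the first run
    # with n * R >= P; t is the total number of trips actually taken.
    n = math.ceil(P / R)
    if t < n:
        return t * R              # quit before ever buying
    return (n - 1) * R + P        # n - 1 rentals plus one purchase

def optimal_cost(t, P, R):
    # Best possible cost if t were known in advance:
    # rent every time or buy right away, whichever is cheaper.
    return min(t * R, P)

for t in (1, 5, 9, 10, 50):
    on, off = strategy_cost(t, 100, 10), optimal_cost(t, 100, 10)
    print(t, on, off, on / off)   # the ratio never exceeds 2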
In a typical setting, an on-line algorithm faces a sequence of requests that
it must service. Each request must be serviced before the next one is received,
and the algorithm incurs a cost for servicing each request. The figure of merit
we use here compares the total cost of the on-line algorithm on a sequence of
requests with the total cost of an off-line algorithm that services the same
sequence of requests. Such an analysis is an amortized analysis; in this sense,
long-term behavior is what we are interested in. We do not compare the on-line
algorithm to the off-line algorithm on any single request, but on the whole
request sequence.
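For reference (this definition is not stated explicitly in these notes, but it
is the standard one and matches the competitiveness coefficient used in the
theorem below): an on-line algorithm A is called c-competitive if, for every
request sequence sigma and some constant b independent of sigma,

    cost_A(sigma) <= c * cost_OPT(sigma) + b

where cost_OPT(sigma) is the cost of the optimal off-line algorithm on sigma.
The competitiveness coefficient C_A is the smallest such c.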
-
Paging Problem.
To make on-line algorithms clearer, we turn to the paging problem
and explain on-line algorithms in more detail.
Consider a computer memory organized as two levels:
-
Main memory, which directly supports CPU processing. Normally this part of
memory is fast but limited in size; for example, it could be the RAM in a PC.
Here, we assume we have only k units of this kind of memory.
-
Backup memory, or virtual memory, which has a large size but is slow. In a PC,
this part is temporary space on the hard disk allocated by the operating system.
We assume this part is infinite when we consider the paging problem.
If we have multiple applications running on the same computer and main memory
is small, main memory cannot hold all the information at once. For example,
suppose the computer is currently running n applications and all k units of
main memory are already in use. Then the user asks to run an (n+1)th
application, which needs 1 unit of memory. In this case, we use paging: we
take some of the information currently in main memory and write it out to
backup memory. Later, if information we backed up is required by a running
application, we need to bring it back into main memory again. In our example,
we need to dump at least 1 unit of information from the k units of main memory
into backup memory if the (n+1)th application is to run correctly. The idea is
simple, but it is hard to figure out which unit we should page out, because
the memory unit we dump from main memory is being used by one of the n
applications, and we do not know when we will need that memory unit again.
In the worst case, the information you paged out may be needed by an
application as soon as you write it to backup memory, and then you must write
it back into main memory right after writing it out. This can really slow down
the processing time of your applications if you have to do this too many times.
In our paging problem, each request in the sequence specifies a memory unit.
If the memory unit is in main memory, the request can be processed directly
at no cost. If not, the memory unit is fetched from backup memory into main
memory at a unit cost. When the requested memory unit is fetched from backup
memory and main memory is full, a memory unit currently in main memory must be
evicted to make room for the new unit. A paging algorithm must provide an
eviction rule for deciding which memory unit currently in main memory is
evicted to make room for the new memory unit. An on-line paging algorithm must
make this decision without any knowledge of future requests.
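The cost model just described can be captured in a short Python sketch (our
own illustration; the function simulate and its evict parameter are not from
the notes). A request already in main memory costs nothing; any other request
costs one unit and, when main memory is full, forces the eviction chosen by
the given rule.

def simulate(requests, k, evict):
    # requests: sequence of requested memory units
    # k: number of units that fit in main memory
    # evict(memory, history): picks the unit to throw out on a miss
    memory, history, misses = [], [], 0
    for r in requests:
        history.append(r)
        if r in memory:
            continue                   # hit: processed at no cost
        misses += 1                    # miss: fetch at unit cost
        if len(memory) == k:
            memory.remove(evict(memory, history))
        memory.append(r)
    return misses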
There are some typical on-line algorithms for the paging problem (sketches of
these rules, plugged into the simulator above, follow the list):
-
Least recently used (LRU): Evict the item in the main memory
whose most recent reference occurred longest ago.
-
First-in, first-out (FIFO): Evict the item that has been in the
cache for the longest period.
-
Least frequently used (LFU): Evict the item in the cache that
has been requested least often.
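For instance, the three rules above can be written as eviction functions for
the simulate sketch given earlier (again our own illustrative code; LFU here
counts references seen so far):

def lru(memory, history):
    # evict the unit whose most recent reference is furthest in the past
    return min(memory,
               key=lambda u: max(i for i, r in enumerate(history) if r == u))

def fifo(memory, history):
    # memory is kept in insertion order, so the first entry is the oldest
    return memory[0]

def lfu(memory, history):
    # evict the unit that has been requested least often so far
    return min(memory, key=history.count)

# Using simulate from the sketch above:
print(simulate([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3, lru))   # LRU misses
print(simulate([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3, fifo))  # FIFO misses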
Are these on-line algorithms good enough when compared with off-line
algorithms? The answer is yes, in the amortized sense described above. But
there is a limit to how well any deterministic on-line algorithm can do. Let k
be the number of memory units that fit in main memory. There are arbitrarily
long request sequences on which any deterministic on-line algorithm makes at
least k times as many misses as the off-line algorithm. At this point, we
claim the following theorem:
Let A be a deterministic on-line algorithm for the paging problem. Then
C_A >= k
(C_A is the competitiveness coefficient of A and k is the size of main memory.)
Proof:
Assume that both the optimal off-line algorithm and the on-line algorithm A
manage separate main memories for the same request sequence. We will construct
a round of k requests to memory units during which A makes k misses but the
optimal off-line algorithm makes only one miss. Assume that, to start with,
both the off-line algorithm and A have the same set of k memory units in their
main memories. The first request is to a memory unit not in either main memory;
each algorithm makes a miss. Let S be the set of k+1 memory units consisting of
the k memory units initially in the off-line algorithm's memory together with
the new memory unit. Each of the next k-1 requests is to a memory unit of S
that is not currently in A's main memory (such a unit always exists, since S
has k+1 units and A's main memory holds only k of them). On these k-1 requests,
A misses every time, whereas the off-line algorithm makes no misses, for the
following reason.
When the off-line algorithm makes its miss at the beginning of the round, it
must evict a memory unit. How does it decide which memory unit to evict? In
order to force the lower bound, it should presumably make this choice in a
manner that depends on the on-line algorithm A. It does so as follows: it
observes the behavior of A during the round and sees which memory unit of S is
evicted last by A during the round. The off-line algorithm selects this "last
memory unit" for eviction at the beginning of the round; because A is
deterministic, the off-line algorithm can predict A's behavior during the
round and hence the identity of this last unit. Since every request in the
round is to a unit that is not in A's main memory, and this last unit stays in
A's memory until the very end of the round, it is never requested during the
round; so the off-line algorithm, having evicted it, makes no further misses.
Repeating such rounds yields arbitrarily long request sequences on which A
makes k times as many misses as the off-line algorithm, which gives C_A >= k.
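The construction in the proof can also be simulated; the sketch below is our
own illustration (the helper adversary_round and the choice of FIFO as the
deterministic on-line algorithm A are assumptions made for the example). Given
A's eviction rule and its current k-unit main memory, it issues one round of k
requests, each to the single unit of S that A is missing, so A misses every
time, while the off-line algorithm needs only the one miss at the start of the
round.

def adversary_round(memory, evict, k):
    # memory: A's current main memory, a list of k numbered units
    # evict(memory): A's deterministic eviction rule
    S = set(memory) | {max(memory) + 1}   # k+1 units: A's memory plus one new unit
    requests = []
    for _ in range(k):
        r = next(u for u in S if u not in memory)   # the one unit of S A lacks
        requests.append(r)                          # A misses on this request
        memory.remove(evict(memory))                # A evicts by its own rule
        memory.append(r)
    return requests

# Example with k = 3 and FIFO as the on-line rule A (memory[0] is the oldest):
print(adversary_round([1, 2, 3], lambda m: m[0], 3))   # gives [4, 1, 2]
# A misses on all three requests; the off-line algorithm, seeing the whole
# round in advance, evicts the one unit of S that is never requested (here 3)
# and therefore misses only on the first request.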