Dec. 7
On-line Algorithms
In general terms, on-line problems are problems in which the input is not completely known at the beginning. For example, the ski problem and the paging problem are on-line problems. Sorting, by contrast, is not an on-line problem, because we must see the complete input before we can produce it in sorted order.
-
Ski Problem.
Suppose that you decide to try the sport of skiing but you do not own a pair
of skis. You are not sure how many times you will go skiing before you finally
give up (put another way, you really do not know after how many ski trips you
will break your legs and be unable to ski anymore). Since you need a pair of
skis before skiing with your friends, you will have to get a pair of skis
somewhere. Here, you have two choices:
-
First, you can spend a certain, larger amount of money (say P units) to
buy a new pair of skis and use them whenever you want to go skiing.
-
Second, you can spend a smaller amount of money (say R units, with R < P) to
rent a pair of skis for this trip and not worry about what you will do
next time.
It is very hard to decide which way to go. If you buy a pair of skis at the
beginning and then find out after a couple of runs that you do not like this
sport at all, renting would have been the better choice. On the other hand,
if you rent skis every time and finally find out, after many trips, that
skiing is a great sport for you, the right choice would have been to buy a
pair of skis at the beginning. So now comes the question: should you buy a
pair of skis or rent them?
The answer to this ski problem is to rent skis until the total cost of
renting reaches the price of a new pair of skis. At each run, decide as follows:
IF P <= nR
    Buy a new pair of skis
ELSE
    Rent a pair of skis
(n is the number of runs so far, counting the current one)
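The same rule can also be written as a tiny Python sketch (the function name
should_buy and the 1-indexed run counter are our own illustrative choices, not
part of the notes):

def should_buy(run, P, R):
    # Buy on the first run where the cumulative rent run * R would
    # reach the purchase price P; otherwise keep renting this run.
    return run * R >= P

# Example with P = 100 and R = 10: rent on runs 1..9, buy on run 10.
print([run for run in range(1, 13) if should_buy(run, 100, 10)])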
Is this algorithm the best answer for the ski problem in general? The answer
is yes, because by following it our net cost never exceeds twice the minimum
possible cost, no matter when we finally decide to stop skiing. Consider the
cases below (a small numeric check follows them):
-
In the best case: you ski only once and find out this sport is not for you
at all. You spend only R.
-
In a good case: after you have skied m times (m < n), you decide not
to ski anymore. You spend mR (mR < P).
-
In a general case: after you have skied t times (t > n), you quit skiing for
whatever reason, or you simply stop skiing as often. You spend (n-1)R + P,
which is at most 2P when nR = P: the total amount of money you spent renting
skis before the nth run plus the price of the pair of skis you bought at the
nth run. No matter how large t is, you already bought a pair of skis at the
nth run, so after the nth run you do not have to spend anything more on skis.
-
In a very good case: sometime after the nth run, you find out skiing is a
great sport for you and you try to ski as many times as possible for the rest
of your life. What you spend is still at most 2P (if nR = P): the rent paid
before the nth run plus the price of the pair of skis you bought at the nth run.
-
In the worst case: just after you buy a pair of skis (just after the nth run),
you find out you do not like skiing anymore and never go skiing again in your
life (or you break your legs and cannot ski anymore). In this case you also
spend at most 2P (if nR = P): the rent paid before the nth run plus the price
of the pair of skis you bought at the nth run.
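To check the factor of two numerically, here is a small hedged sketch (the
helper names strategy_cost and optimal_cost and the sample values P = 100,
R = 10 are ours): it computes the cost of the rent-then-buy rule after t trips
and compares it with the best off-line cost min(tR, P).

import math

def strategy_cost(t, P, R):
    # Rent on runs 1..n-1 and buy on run n, where n is the first run
    # with n * R >= P; t is the total number of trips actually taken.
    n = math.ceil(P / R)
    if t < n:
        return t * R              # quit before ever buying
    return (n - 1) * R + P        # n - 1 rentals plus one purchase

def optimal_cost(t, P, R):
    # Best possible cost if t were known in advance:
    # rent every time or buy right away, whichever is cheaper.
    return min(t * R, P)

for t in (1, 5, 9, 10, 50):
    on, off = strategy_cost(t, 100, 10), optimal_cost(t, 100, 10)
    print(t, on, off, on / off)   # the ratio never exceeds 2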
In a typical setting, an on-line algorithm faces a sequence of requests that
it must service. Each request must be serviced before the next one is received,
and the algorithm incurs a cost for servicing each request. The figure of merit
we use here compares the total cost of the on-line algorithm on a sequence of
requests with the total cost of an off-line algorithm that services the same
sequence of requests. Such an analysis is an amortized analysis; in this sense,
long-term behavior is what we are interested in. We do not compare the on-line
algorithm to the off-line algorithm on any single request, but on the whole
request sequence.
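For reference (this definition is not stated explicitly in these notes, but it
is the standard one and matches the competitiveness coefficient used in the
theorem below): an on-line algorithm A is called c-competitive if, for every
request sequence sigma and some constant b independent of sigma,

    cost_A(sigma) <= c * cost_OPT(sigma) + b

where cost_OPT(sigma) is the cost of the optimal off-line algorithm on sigma.
The competitiveness coefficient C_A is the smallest such c.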
-
Paging Problem.
To make on-line algorithms clearer, we turn to the paging problem
and explain on-line algorithms in more detail.
Consider a computer memory organized as two levels:
-
Main memory, which directly supports CPU processing. Normally this part of
memory is fast but limited in size; for example, it could be the RAM in a PC.
Here, we assume we have only k units of this kind of memory.
-
Backup memory, or virtual memory, which has a large size but is slow. In a PC,
this part is temporary space on the hard disk allocated by the operating system.
We assume this part is infinite when we consider the paging problem.
If we have multiple applications running on the same computer and main memory
is small, main memory cannot hold all the information at once. For example,
suppose the computer is currently running n applications and all k units of
main memory are already in use. Then the user asks to run an (n+1)th
application, which needs 1 unit of memory. In this case, we use paging: we
take some of the information currently in main memory and write it out to
backup memory. Later, if information we backed up is required by a running
application, we need to bring it back into main memory again. In our example,
we need to dump at least 1 unit of information from the k units of main memory
into backup memory if the (n+1)th application is to run correctly. The idea is
simple, but it is hard to figure out which unit we should page out, because
the memory unit we dump from main memory is being used by one of the n
applications, and we do not know when we will need that memory unit again.
In the worst case, the information you paged out may be needed by an
application as soon as you write it to backup memory, and then you must write
it back into main memory right after writing it out. This can really slow down
the processing time of your applications if you have to do this too many times.
In our paging problem, each request in the sequence specifies a memory unit.
If the memory unit is in main memory, the request can be processed directly
at no cost. If not, the memory unit is fetched from backup memory into main
memory at a unit cost. When the requested memory unit is fetched from backup
memory and main memory is full, a memory unit currently in main memory must be
evicted to make room for the new unit. A paging algorithm must provide an
eviction rule for deciding which memory unit currently in main memory is
evicted to make room for the new memory unit. An on-line paging algorithm must
make this decision without any knowledge of future requests.
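The cost model just described can be captured in a short Python sketch (our
own illustration; the function simulate and its evict parameter are not from
the notes). A request already in main memory costs nothing; any other request
costs one unit and, when main memory is full, forces the eviction chosen by
the given rule.

def simulate(requests, k, evict):
    # requests: sequence of requested memory units
    # k: number of units that fit in main memory
    # evict(memory, history): picks the unit to throw out on a miss
    memory, history, misses = [], [], 0
    for r in requests:
        history.append(r)
        if r in memory:
            continue                   # hit: processed at no cost
        misses += 1                    # miss: fetch at unit cost
        if len(memory) == k:
            memory.remove(evict(memory, history))
        memory.append(r)
    return misses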
There are some typical on-line algorithms for the paging problem (sketches of
these rules, plugged into the simulator above, follow the list):
-
Least recently used (LRU): Evict the item in the main memory
whose most recent reference occurred longest ago.
-
First-in, first-out (FIFO): Evict the item that has been in the
cache for the longest period.
-
Least frequently used (LFU): Evict the item in the cache that
has been requested least often.
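For instance, the three rules above can be written as eviction functions for
the simulate sketch given earlier (again our own illustrative code; LFU here
counts references seen so far):

def lru(memory, history):
    # evict the unit whose most recent reference is furthest in the past
    return min(memory,
               key=lambda u: max(i for i, r in enumerate(history) if r == u))

def fifo(memory, history):
    # memory is kept in insertion order, so the first entry is the oldest
    return memory[0]

def lfu(memory, history):
    # evict the unit that has been requested least often so far
    return min(memory, key=history.count)

# Using simulate from the sketch above:
print(simulate([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3, lru))   # LRU misses
print(simulate([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3, fifo))  # FIFO misses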
Are these on-line algorithms good enough when compared with off-line
algorithms? The answer is yes, in the amortized sense described above. But
there is a limit to how well any deterministic on-line algorithm can do. Let k
be the number of memory units that fit in main memory. There are arbitrarily
long request sequences on which any deterministic on-line algorithm makes at
least k times as many misses as the off-line algorithm. At this point, we
claim the following theorem:
Let A be a deterministic on-line algorithm for the paging problem. Then
C_A >= k
(C_A is the competitiveness coefficient of A and k is the size of main memory.)
Proof:
Assume that both the optimal off-line algorithm and the on-line algorithm A
manage separate main memories for the same request sequence. We will construct
a round of k requests to memory units during which A makes k misses but the
optimal off-line algorithm makes only one miss. Assume that, to start with,
both the off-line algorithm and A have the same set of k memory units in their
main memories. The first request is to a memory unit not in either main memory;
each algorithm makes a miss. Let S be the set of k+1 memory units consisting of
the k memory units initially in the off-line algorithm's memory together with
the new memory unit. Each of the next k-1 requests is to a memory unit of S
that is not currently in A's main memory (such a unit always exists, since S
has k+1 units and A's main memory holds only k of them). On these k-1 requests,
A misses every time, whereas the off-line algorithm makes no misses, for the
following reason.
When the off-line algorithm makes its miss at the beginning of the round, it
must evict a memory unit. How does it decide which memory unit to evict? In
order to force the lower bound, it should presumably make this choice in a
manner that depends on the on-line algorithm A. It does so as follows: it
observes the behavior of A during the round and sees which memory unit of S is
evicted last by A during the round. The off-line algorithm selects this "last
memory unit" for eviction at the beginning of the round; because A is
deterministic, the off-line algorithm can predict A's behavior during the
round and hence the identity of this last unit. Since every request in the
round is to a unit that is not in A's main memory, and this last unit stays in
A's memory until the very end of the round, it is never requested during the
round; so the off-line algorithm, having evicted it, makes no further misses.
Repeating such rounds yields arbitrarily long request sequences on which A
makes k times as many misses as the off-line algorithm, which gives C_A >= k.
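The construction in the proof can also be simulated; the sketch below is our
own illustration (the helper adversary_round and the choice of FIFO as the
deterministic on-line algorithm A are assumptions made for the example). Given
A's eviction rule and its current k-unit main memory, it issues one round of k
requests, each to the single unit of S that A is missing, so A misses every
time, while the off-line algorithm needs only the one miss at the start of the
round.

def adversary_round(memory, evict, k):
    # memory: A's current main memory, a list of k numbered units
    # evict(memory): A's deterministic eviction rule
    S = set(memory) | {max(memory) + 1}   # k+1 units: A's memory plus one new unit
    requests = []
    for _ in range(k):
        r = next(u for u in S if u not in memory)   # the one unit of S A lacks
        requests.append(r)                          # A misses on this request
        memory.remove(evict(memory))                # A evicts by its own rule
        memory.append(r)
    return requests

# Example with k = 3 and FIFO as the on-line rule A (memory[0] is the oldest):
print(adversary_round([1, 2, 3], lambda m: m[0], 3))   # gives [4, 1, 2]
# A misses on all three requests; the off-line algorithm, seeing the whole
# round in advance, evicts the one unit of S that is never requested (here 3)
# and therefore misses only on the first request.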