NOTES FOR CLASS TAKEN ON 11/18/98
Taken by: Anup T Vachali
TOPIC
INTRODUCTION TO RANDOMIZED ALGORITHMS
INTRODUCTION
A randomized algorithm is one that makes random choices during its execution. For many problems a randomized algorithm is the simplest, the fastest, or both. Currently the best known algorithms for many geometric problems are randomized.
Complex problems often demand very slow deterministic procedures. A randomized algorithm saves time by choosing its starting point (or pivot, or sample) at random, rather than spending work on deciding where the best starting point is. In exchange, it only guarantees a high probability of producing the correct solution. Several iterations may be required to be confident of a correct answer, but this is often still much faster than one execution of a complicated deterministic algorithm. Moreover, because the failure probabilities of independent runs multiply, after three or four iterations of a well-designed randomized algorithm the probability of success is already near 100%.
As stated earlier, a randomized algorithm makes random choices during its execution, and this allows savings in the execution time of a program. The disadvantage of this method is the possibility that an incorrect solution may be produced. A well-designed randomized algorithm has a very high probability of returning a correct answer; a failure simply results in another execution of the algorithm, and multiple executions will almost certainly include at least one successful pass.
Randomized or probabilistic algorithms are those containing statements of the form:
x := outcome of a fair coin toss.
Monte Carlo and Las Vegas algorithms are the two standard kinds of randomized algorithm, and they are closely related. A Monte Carlo algorithm always produces some solution to the given problem; occasionally, however, the solution may be incorrect. A Las Vegas algorithm only produces a solution when the right answer has been found, and never returns an incorrect solution. In essence, therefore, a Las Vegas algorithm is a Monte Carlo algorithm with verification. In a nutshell: Monte Carlo algorithms are always fast and probably correct, while Las Vegas algorithms are probably fast and always right.
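A minimal sketch of both flavors in Python, using the classic toy problem of locating a value that fills half of an array (the array contents and trial count below are invented for illustration):

    import random

    def monte_carlo_find(arr, target, trials=10):
        # Monte Carlo: always fast (at most `trials` probes), probably correct.
        # May return None even though `target` is present in `arr`.
        for _ in range(trials):
            i = random.randrange(len(arr))
            if arr[i] == target:
                return i
        return None

    def las_vegas_find(arr, target):
        # Las Vegas: always right, probably fast.  The answer is verified
        # before it is returned, but the running time is a random variable.
        # (If `target` is absent, this loops forever.)
        while True:
            i = random.randrange(len(arr))
            if arr[i] == target:
                return i

With arr = ['a', 'b'] * 8 and target 'a', each probe succeeds with probability 1/2, so monte_carlo_find fails with probability 2^-10, while las_vegas_find takes two probes on average and never fails.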
EXAMPLE
Consider an algorithm to find the median of an array (or an element at least as large as the median). A deterministic approach is to choose a pivot element near the median of the list and partition the list around that element. Most of the overall computation time is spent on finding a pivot near the median; once this pivot is found, the rest of the process runs quickly. The time complexity is O(n log n).
The randomized approach to this problem is to choose the pivot at random, saving the up-front cost; it finds the median in expected linear time, O(n). This simple example already shows a dramatic improvement in time complexity. The disadvantage is that a terrible pivot choice (say, the highest or lowest element) makes a step of the process run very slowly. However, the probability of repeatedly choosing bad pivots is very low, so the small risk of a slow run is worth taking in order to save time.
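A minimal sketch of this randomized selection idea (often called quickselect) in Python; the function names here are ours:

    import random

    def quickselect(items, k):
        # Return the k-th smallest element (0-based) in expected O(n) time.
        # The pivot is random, so no fixed input is bad in expectation;
        # only an unlucky run of pivot choices is slow.
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k < len(smaller):
            return quickselect(smaller, k)
        if k < len(smaller) + len(equal):
            return pivot
        larger = [x for x in items if x > pivot]
        return quickselect(larger, k - len(smaller) - len(equal))

    def median(items):
        return quickselect(items, len(items) // 2)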
See Reference 3 for a more comprehensive explanation.
DEFINITIONS
The analyses below use a tail bound of the following form:

    Prob{ Σ_i X_i ≥ c m } ≤ e^( −m [ 1 + c ln(c/e) ] )

where m is the expected value of the random variable Σ_i X_i; in the analyses below, m = H_n, the n-th harmonic number 1 + 1/2 + ... + 1/n.
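A short derivation sketch, assuming the standard Chernoff bound for a sum X of independent indicator variables with mean m:

    % Chernoff bound and the rewriting used above
    \[
      \Pr\{X \ge c\,m\} \;\le\; \Bigl(\frac{e^{\,c-1}}{c^{\,c}}\Bigr)^{m}
      \;=\; e^{-m\,(c\ln c - c + 1)}
      \;=\; e^{-m\,[\,1 + c\ln(c/e)\,]}.
    \]
    % With m = H_n \approx \ln n and c = 4 the right-hand side is about
    % n^{-2.5}, so the expected-time bounds below hold with high probability.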
We looked at the Sorting Problem in class
Sorting problem
Given a set N of n points on the real line R, find the partition H(N) of R formed by the given set of points N, i.e., the set of intervals into which the points of N cut R.
This is in fact very much like quicksort, where an array is partitioned into two subarrays around a pivot and each subarray is then sorted recursively.
The sorting problem was solved using randomized quicksort in both its recursive and incremental forms.
Randomized Quicksort (Recursive)
Algorithm:
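A minimal sketch of the recursive version in Python: pick a uniformly random pivot, partition around it, and recurse on each side.

    import random

    def randomized_quicksort(points):
        # Pick a random pivot, partition, and recurse on each side.
        # Expected running time is O(n log n) for every input.
        if len(points) <= 1:
            return points
        pivot = random.choice(points)
        left = [p for p in points if p < pivot]
        mid = [p for p in points if p == pivot]
        right = [p for p in points if p > pivot]
        return randomized_quicksort(left) + mid + randomized_quicksort(right)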
UNFOLD THE RECURSION
If, in every recursive call, the pivot chosen is the element of that subproblem that occurs earliest in the random permutation, then the recursive and unfolded (incremental) versions are equivalent, in that they cause exactly the same set of comparisons.
Analysis: expected time complexity of quicksort.
Model of Analysis
The analysis must be independent of how the points are distributed in space; no "uniform distribution" is assumed. We do assume that insertions happen in random order, i.e., that all permutations of the input order are equally likely.
Naïve Analysis of recursive version
Each sub-problem has expected size half that of the original set, which suggests an expected time complexity of O(n log n), as the recurrence below indicates.
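A heuristic version of this argument as a worked recurrence (only a plausibility sketch; expected sub-problem sizes cannot rigorously be substituted into a recurrence, which is why the non-recursive analysis below is preferred):

    \[
      T(n) \;\approx\; 2\,T(n/2) + O(n) \;=\; O(n \log n).
    \]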
Non-Recursive version
The cost of the (i+1)-th iteration is the size of the conflict list, i.e., the set of not-yet-inserted points, of the interval from which the (i+1)-th point was picked.
Expected time complexity of quicksort = sum of the expected sizes of the conflict sets from which the points are chosen.
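A runnable sketch of this incremental version in Python, assuming distinct input points; the intervals are kept implicit, and only their conflict lists and a point-to-interval map are stored (all names here are ours):

    import random

    def incremental_quicksort_cost(points):
        # Insert the points in random order, refining the partition of R.
        # Each interval keeps a conflict list of not-yet-inserted points
        # lying inside it; the cost charged to an insertion is the size of
        # the conflict list of the interval that the insertion splits.
        order = list(points)
        random.shuffle(order)                  # random insertion order
        interval_of = {p: 0 for p in points}   # point -> interval id
        conflicts = {0: set(points)}           # interval id -> conflict list
        next_id, total_cost = 1, 0
        for p in order:
            bucket = conflicts.pop(interval_of[p])
            bucket.discard(p)
            total_cost += len(bucket) + 1      # cost of this iteration
            lo = {q for q in bucket if q < p}  # split the conflict list
            hi = bucket - lo
            conflicts[next_id], conflicts[next_id + 1] = lo, hi
            for q in lo:
                interval_of[q] = next_id
            for q in hi:
                interval_of[q] = next_id + 1
            next_id += 2
        return total_cost                      # expected O(n log n)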
Observation:
Given a random permutation p_1, p_2, ..., p_n of the n points:
1. For each i, every point not yet inserted is equally likely to be the (i+1)-th point.
2. Fixing the set N_{i+1}, every point of N_{i+1} is equally likely to have been the (i+1)-th point inserted.
For reverse (backwards) analysis, we use the second statement in the observation above instead of the first.
First fix N_{i+1}. Then each point in N_{i+1} is equally likely to have occurred as the (i+1)-th point. The expected cost of the (i+1)-th iteration is therefore

    1/(i+1) · Σ_{p ∈ N_{i+1}} cost(p) ≤ 2/(i+1) · Σ_{J ∈ H(N_{i+1})} size(J) ≤ 2n/(i+1)

where size(J) is the size of the conflict set of interval J. The factor 2 appears because each interval is adjacent to at most two points of N_{i+1}, and the final bound holds because every not-yet-inserted point lies in exactly one interval, so the sizes sum to at most n.
Total expected cost = Σ_i 2n/(i+1) = O(n log n).

This is because Σ_i 1/(i+1) ≤ 1/1 + 1/2 + 1/3 + ... + 1/n = H_n ≈ ln n.
Searching Problem
Associate a search structure S(N) with H(N) so that, given any point q ∈ R, one can "efficiently" locate the interval of H(N) containing q.
H(N) → S(N)
Randomized Binary Search Trees
H(N) → B(N)
Idea
As the points are inserted and the partition is refined, construct a binary search tree; in this way quicksort can also produce B(N).
There is a recursive view as well as an incremental view of this process.
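A minimal sketch of the incremental view in Python: insert the points into an ordinary (unbalanced) binary search tree in random order, then locate a query point's interval by walking down the tree (the class and function names are ours):

    import random

    class Node:
        # One comparison node of B(N); its key is an inserted point.
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(node, p):
        if node is None:
            return Node(p)
        if p < node.key:
            node.left = insert(node.left, p)
        else:
            node.right = insert(node.right, p)
        return node

    def build_bst(points):
        # Random insertion order makes the expected depth O(log n),
        # exactly as random pivots make quicksort O(n log n).
        order = list(points)
        random.shuffle(order)
        root = None
        for p in order:
            root = insert(root, p)
        return root

    def locate(node, q):
        # Walk down B(N) to the interval of H(N) containing query q,
        # returned as (predecessor, successor); None means unbounded.
        lo, hi = None, None
        while node is not None:
            if q < node.key:
                hi = node.key
                node = node.left
            else:
                lo = node.key
                node = node.right
        return (lo, hi)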
Incremental View: HISTORY
H(N_0) → H(N_1) → ... → H(N_n) = H(N)
B(N_0) → B(N_1) → ... → B(N_n) = B(N)
Construction time for B(N): O(n log n).
Query Time Analysis
Assume the points p_1, p_2, ..., p_n are inserted in that order.
History: H(N_0) → H(N_1) → ... → H(N_n) = H(N)
The query point p lies in some interval of each partition H(N_i). Let Y_i = 1 if the interval containing p is split when the i-th point is inserted (i.e., p's interval changes from H(N_{i-1}) to H(N_i)), and Y_i = 0 otherwise. The query time for point p is then Σ_i Y_i.
Reverse Analysis
Assume that the points p_1, p_2, ..., p_n are deleted in the reverse of the insertion order.
Reverse history: H(N) = H(N_n) → ... → H(N_1) → H(N_0)
Fixing N_i, each of its i points is equally likely to be the one deleted, and Y_i = 1 only if the deleted point is one of the (at most two) endpoints of the interval of H(N_i) containing p. Hence E[Y_i] ≤ 2/i, and the expected query cost is Σ_i O(1/i) = O(log n), as the computation below shows.
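The backwards-analysis computation, written out under the definition of Y_i above:

    \[
      \Pr\{Y_i = 1\} \;\le\; \frac{2}{i},
      \qquad
      E\Bigl[\sum_{i=1}^{n} Y_i\Bigr]
      \;=\; \sum_{i=1}^{n} \Pr\{Y_i = 1\}
      \;\le\; \sum_{i=1}^{n} \frac{2}{i}
      \;=\; 2H_n \;=\; O(\log n).
    \]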
Conclusion:
Randomized algorithms can be used to solve a wide variety of real-world problems. Like approximation algorithms, they can be used to attack tough (e.g., NP-complete) problems more quickly. An advantage over approximation algorithms, however, is that a randomized algorithm of the Las Vegas kind will eventually yield an exact answer if executed enough times.
Additional References:
1. http://www.cs.bham.ac.uk/teaching/examples/simjava/ : Java simulations of a few randomized algorithms.
2. http://www.cs.wvu.edu/~wallacec/a3/a3.html : A survey of randomized algorithms.
3. http://www.ics.uci.edu/~eppstein/161/960125.html : Comprehensive explanation of the median problem and related problems.