NP AND NP-Completeness
 
Date: 10/07/98
Notes By: Sri Satish Ambati
 

Some Definitions

Problem : An abstract problem is defined as a function from a set I of instances to a set S of problem solutions.

Problem Instance: An instance of the problem is obtained by assigning values to the parameters of the problem.

P : I → S
 
Algorithm: An algorithm solves a problem if and only if it gives correct solutions to all instances of the problem in finite time.

Input Length is the length of an encoding of an instance of the problem. Time and space complexities are written in terms of the input length.

Worst Case Complexity is the maximum time/space required for any instance of length n. The Time Complexity of a problem, P, is the running time of the "best" algorithm for P. The Space Complexity of a problem, P, is the amount of memory used by the "best" algorithm for P.

Complexity Class P is the set of problems that are solvable by polynomial-time algorithms. A problem is polynomial-time solvable if there exists an algorithm to solve it in time O(n^k) for some constant k.

A Decision Problem takes as input a finite-length binary string and returns as output a 0 (NO) or a 1 (YES). Hence a decision problem is a function

P : { 0, 1 }* → { YES, NO }

A Decision Problem may be seen as an infinite sequence of decision problems [P_n], where

P_n : { 0, 1 }^n → { 0, 1 }

is the problem P limited to inputs of length n.
 

An Overview of NP and NP-Completeness

The term "NP" refers to the class of problems which can be solved by a nondeterministic algorithm in time polynomial in the length of the input.

NP-Completeness

So far we've seen a lot of good news: such-and-such a problem can be solved quickly (in close to linear time, or at least a time that is some small degree polynomial of the input length).

NP-completeness is a form of bad news: evidence that many important problems can't be solved quickly.

Why should we care?

These NP-complete problems really come up all the time. Knowing they're hard lets us stop beating our head against a wall trying to solve them, and do something better.

What to do once we know a problem is hard?

Classification of problems

The subject of computational complexity theory is dedicated to classifying problems by how hard they are. There are many different classifications; some of the most common and useful are the following. (One technical point: these are all really defined in terms of yes-or-no problems, that is, decision problems.) NP does not stand for "non-polynomial"; it stands for "nondeterministic polynomial time". There are many complexity classes that are much harder than NP. NP-completeness theory is concerned with the distinction between P and NP.

 

The Complexity Class NP
 
Any set L ⊆ Σ* is said to be a language over the alphabet Σ.

The Complexity Class P is the class of languages that are decided in polynomial time by some algorithm A, i.e.,

P = { L ⊆ {0,1}* : there exists an algorithm A that decides L in polynomial time }.

co-P is the complexity class of languages L such that the complement of L is in P. Since the complement of a language in P can be decided in polynomial time (run the algorithm for L and flip its answer), it follows that co-P = P.
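
The argument that co-P = P can be sketched in a few lines of Python. This is only an illustration: the names complement_decider and is_even_length, and the toy language of even-length binary strings, are assumptions that do not come from the notes.

```python
from typing import Callable

# A decider maps a binary string to True (1) or False (0).
Decider = Callable[[str], bool]


def complement_decider(decide_l: Decider) -> Decider:
    """Build a decider for the complement of L from a decider for L."""
    def decide_complement(x: str) -> bool:
        # One extra step on top of decide_l, so a polynomial
        # running time stays polynomial.
        return not decide_l(x)
    return decide_complement


# Toy language (an assumption for illustration): strings of even length.
def is_even_length(x: str) -> bool:
    return len(x) % 2 == 0


is_odd_length = complement_decider(is_even_length)
```

The same trick is not known to work for NP, because flipping the answer of a verifier does not turn "some certificate is accepted" into "no certificate is accepted".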

Polynomial-time verification algorithms verify membership in a language in polynomial time, given a certificate.

The complexity class NP is the class of languages that can be verified by a polynomial-time algorithm. More precisely, a language L belongs to NP if and only if there exists a two input polynomial-time algorithm A and constant c such that

L = { x ∈ {0,1}* : there exists a certificate y with |y| = O(|x|^c) such that A(x, y) = 1 }.

The complexity class co-NP is defined as the set of languages L such that the complement of L is in NP. It is an open question whether NP = co-NP.
 

Reducibility and NP-Completeness

A language L1 is polynomial-time reducible to a language L2, written L1 ≤_p L2, if there exists a polynomial-time computable function

f : {0,1}* → {0,1}* such that for all x ∈ {0,1}*,

x ∈ L1 if and only if f(x) ∈ L2. We call the function f the reduction function, and a polynomial-time algorithm that computes f a reduction algorithm.

A problem p1 is reducible to a problem p2 if there exists an algorithm R that takes any instance i1 of p1 as input and produces an instance i2 of p2 as output, with the constraint that the solution for i1 is YES if and only if the solution for i2 is YES. Thus R converts YES (NO) instances of p1 to YES (NO) instances of p2.

Polynomial-time reductions provide a formal means for showing that one problem is at least as hard as another. If L1 ≤_p L2 then L1 is not more than a polynomial factor harder than L2, i.e., T(p1) ≤ T(p2) + polynomial, where T(p2) is measured on the transformed instance.
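
As a rough sketch of how a reduction is used, the Python below decides one problem by transforming each instance and handing it to a decider for another problem. The target problem, INDEPENDENT-SET (is there a set of k nodes with no edge between any two of them?), does not appear in these notes and is chosen only because the reduction from CLIQUE is short: take the complement graph. All function names are hypothetical, nodes are assumed to be numbered 0..n-1, and the decider for the target problem is a brute-force stand-in, not a polynomial-time algorithm.

```python
from itertools import combinations


def reduce_clique_to_independent_set(n, edges, k):
    """Reduction function f: CLIQUE instance -> INDEPENDENT-SET instance.

    Builds the complement graph in O(n^2) time; k is unchanged.
    """
    present = {frozenset(e) for e in edges}
    complement = [(u, v) for u, v in combinations(range(n), 2)
                  if frozenset((u, v)) not in present]
    return n, complement, k


def has_independent_set(n, edges, k):
    """Stand-in decider for the target problem (brute force, exponential)."""
    present = {frozenset(e) for e in edges}
    return any(all(frozenset(pair) not in present
                   for pair in combinations(subset, 2))
               for subset in combinations(range(n), k))


def has_clique(n, edges, k):
    """Decide CLIQUE as: transform the instance, then ask the other decider."""
    return has_independent_set(*reduce_clique_to_independent_set(n, edges, k))


# A triangle 0-1-2 with an extra node 3 attached to node 2.
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
```

The running time of has_clique is the time to compute the reduction plus the time of the target decider on the transformed instance, which is the informal bound stated above.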

NP-Completeness

NP-Complete languages are the hardest of the NP languages.

A language L ⊆ {0,1}* is NP-Complete if

  1. L ∈ NP, and
  2. L' ≤_p L for every L' ∈ NP.
A language that satisfies the second condition but not necessarily the first is called NP-hard.

Consequences of NP completeness

Here is a major consequence of NP completeness. Suppose that A is an NP-complete problem. Then

  1. If A is in P, then P = NP. That is, if there is a feasible algorithm to solve A, then there are feasible algorithms for every problem in NP, and in particular for every single NP-complete problem.
  2. If any problem in NP is not polynomial-time solvable, then no NP-complete problem is polynomial-time solvable.

The question of whether P=NP is still open.

Transforming infinitely many decision problems into just one decision problem

In 1971 Stephen Cook proved one of the most important results in modern computational complexity theory. This result is (now) known as

Cook's Theorem: SAT is NP-complete

The circuit satisfiability problem is: "Given a boolean combinational circuit composed of AND, OR and NOT gates, is it satisfiable?" This circuit form of satisfiability is the one used in the proof below.

Proof: See Lemmas 36.5 and 36.6 (pg 934-935), which put together prove that SAT, in this circuit form, is NP-complete.

[Let A be a two-input polynomial-time algorithm that can verify SAT. A needs a boolean circuit C and a certificate as inputs. The certificate assigns a boolean value to each wire of the circuit. For each logic gate in the circuit, A checks that the value provided by the certificate on the output wire is correctly computed as a function of the values on the input wires. Then, if the output of the entire circuit is 1, the algorithm outputs 1; otherwise, A outputs 0. Whenever a satisfiable circuit is input to A, there is a certificate whose length is polynomial in the size of C that causes A to output 1. Whenever an unsatisfiable circuit is input, no certificate can fool A into believing that the circuit is satisfiable. Algorithm A runs in polynomial time, and thus SAT ∈ NP.

Proving SAT is NP-hard involves taking any language in NP, with its own polynomial-time verification algorithm (also called A here), and representing the computation of A as a sequence of configurations, each configuration broken down into parts. Each configuration represents the state of the computer for one step of the computation. Besides the program for A, the input and the certificate, it includes the program counter, auxiliary machine state and the working storage. The combinational circuit implementing the computer hardware has size polynomial in the length of a configuration, which is O(n^k) and hence polynomial in n.]
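
The verification step described in the proof can be sketched as follows. The circuit encoding used here (a list of gates, each given as an output wire, an operation and a list of input wires) and the name verify_circuit are assumptions made for illustration; the notes do not fix a representation.

```python
def verify_circuit(gates, output_wire, certificate):
    """Return 1 if the certificate is a consistent wire assignment
    under which the circuit outputs 1, and 0 otherwise.

    gates: list of (out_wire, op, in_wires), op in {"AND", "OR", "NOT"}.
    certificate: dict mapping every wire name to 0 or 1.
    """
    ops = {
        "AND": lambda vals: int(all(vals)),
        "OR": lambda vals: int(any(vals)),
        "NOT": lambda vals: 1 - vals[0],
    }
    # Reject certificates that use anything other than boolean values.
    if any(value not in (0, 1) for value in certificate.values()):
        return 0
    for out_wire, op, in_wires in gates:
        try:
            vals = [certificate[w] for w in in_wires]
            claimed = certificate[out_wire]
        except KeyError:
            return 0  # the certificate leaves a wire unassigned
        if claimed != ops[op](vals):
            return 0  # this gate's output is not computed correctly
    return 1 if certificate.get(output_wire) == 1 else 0


# Hypothetical example circuit: out = x1 AND (NOT x2).
example_gates = [("g1", "NOT", ["x2"]), ("out", "AND", ["x1", "g1"])]
good = {"x1": 1, "x2": 0, "g1": 1, "out": 1}
bad = {"x1": 1, "x2": 1, "g1": 1, "out": 1}  # g1 is inconsistent with x2
```

Each gate is inspected once, so the check takes time linear in the size of the circuit, and the certificate has one bit per wire.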

Through the efforts of Richard Karp, the full consequences of Cook's theorem became well understood.

Within a few months of Cook's result dozens of new NP-complete problems had been identified. By 1994 well over 10,000 basic NP-complete decision problems were known.
 

The following well-known problems are all NP-complete

A proof that a decision problem is NP-complete is accepted as evidence that the problem is intractable, since a fast method of solving a single NP-complete problem would immediately give fast algorithms for all NP-complete problems. To prove that a given problem p is NP-complete, we have to find a polynomial-time reduction from a known NP-complete problem to p, and then prove that p is in NP.

We look at some of the problems:

3-SAT

A literal in a boolean formula is an occurrence of a variable or its negation. A boolean formula is in conjunctive normal form, or CNF, if it is expressed as an AND of clauses, each of which is the OR of one or more literals. A boolean formula is in 3-conjunctive normal form, or 3-CNF, if each clause has at most three distinct literals.

For example,

(x1 ∨ x2 ∨ ¬x3) ∧ (¬x1 ∨ ¬x2 ∨ x3) ∧ (x1 ∨ ¬x2 ∨ x3)

is in 3-CNF. Satisfiability of boolean formulas in 3-conjunctive normal form is NP-complete, i.e., deciding whether there is an assignment of boolean values to the variables that makes the formula true is an NP-complete problem.

Proof: 3-CNF SAT ∈ NP follows from an argument similar to the one for SAT ∈ NP. To show that 3-CNF SAT is NP-hard, it is enough to prove that SAT is polynomial-time reducible to 3-CNF SAT.
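
To make the membership argument concrete, here is a minimal sketch of a verifier for CNF formulas, applied to the example formula above. The encoding is an assumption made for illustration: a literal is a nonzero integer, +i for the variable xi and -i for its negation, and a formula is a list of clauses. The name satisfies is hypothetical.

```python
def satisfies(clauses, assignment):
    """Return True if the assignment makes every clause true.

    clauses: list of clauses, each a list of signed integers.
    assignment: dict mapping variable index i to True or False.
    """
    def literal_value(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value

    # A CNF formula is true when every clause has at least one true literal.
    return all(any(literal_value(lit) for lit in clause)
               for clause in clauses)


# (x1 OR x2 OR NOT x3) AND (NOT x1 OR NOT x2 OR x3) AND (x1 OR NOT x2 OR x3)
formula = [[1, 2, -3], [-1, -2, 3], [1, -2, 3]]
```

The check reads each literal once, so it runs in time linear in the length of the formula; the certificate is just one truth value per variable.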

CLIQUE

Input Instance: n-node graph G( V, E ); positive integer k

Question: Does the graph have a clique of size at least k?

Output: 1 if there is a set of k nodes, W, in V such that every pair of nodes in W is joined by an edge in E; 0 otherwise.

CLIQUE is NP-complete. We first show that CLIQUE is in NP. For a given graph G = (V, E), we use the set V' ⊆ V of the vertices in the clique as a certificate for G. Checking whether V' is a clique can be accomplished in polynomial time by checking whether, for every pair u, v ∈ V', the edge (u, v) belongs to E. We can then show that the CLIQUE problem is NP-hard by showing that 3-CNF SAT ≤_p CLIQUE (refer pg 947).
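
The certificate check for CLIQUE described above might look like this in Python. The graph is assumed to be undirected and given as a collection of vertices and a list of edges; the name verify_clique is hypothetical.

```python
from itertools import combinations


def verify_clique(vertices, edges, k, candidate):
    """Return 1 if candidate is a set of at least k vertices of the graph
    in which every pair is joined by an edge, and 0 otherwise."""
    edge_set = {frozenset(e) for e in edges}
    nodes = set(candidate)
    if len(nodes) < k or not nodes <= set(vertices):
        return 0
    # Check every pair u, v in the candidate set, as in the text.
    for u, v in combinations(nodes, 2):
        if frozenset((u, v)) not in edge_set:
            return 0
    return 1


# A triangle 0-1-2 with an extra node 3 attached to node 2.
example_vertices = {0, 1, 2, 3}
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
```

A candidate of size s requires s(s-1)/2 edge lookups, which is polynomial in the size of the graph.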

Hamiltonian Cycle (HC)

Input Instance: n-node graph, G( V, E )

Question: Is there a cycle of edges in G that includes each of the n nodes exactly once?

Output: 1 if there is a cycle of edges in G that includes each of the n nodes exactly once; 0 otherwise.
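
A certificate for HC is simply the cycle itself, written as an ordering of the nodes. The following is a minimal sketch of the corresponding check, assuming a simple undirected graph; the name verify_hamiltonian_cycle and the input encoding are assumptions made for illustration.

```python
def verify_hamiltonian_cycle(vertices, edges, cycle):
    """Return 1 if cycle lists every vertex exactly once and each pair of
    consecutive vertices (including last back to first) is an edge."""
    edge_set = {frozenset(e) for e in edges}
    n = len(vertices)
    # A cycle in a simple graph needs at least three nodes.
    if n < 3 or len(cycle) != n or set(cycle) != set(vertices):
        return 0
    for i in range(n):
        u, v = cycle[i], cycle[(i + 1) % n]
        if frozenset((u, v)) not in edge_set:
            return 0
    return 1


# A square 0-1-2-3-0.
square_vertices = [0, 1, 2, 3]
square_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

The check does n edge lookups, while no polynomial-time method is known for finding such a cycle in the first place.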

Three Processor Scheduling (O3PS)

Instance: a three by n matrix of times for n tasks on three processors, and an integer k;

Question: is there an assignment of tasks to processors such that the maximum runtime over all processors is at most k?

Multi-processor scheduling (mPS)

Instance: a multiset A of tasks, a measurement of the time required for each task l : A → N, a deadline (real number) D;

Question: is there a partition of A into m disjoint sets such that the total time (the sum of l(a) over the tasks a in the set) of every set in the partition is at most D?

Satisfiability (SAT)

Instance: a boolean formula;

Question: is there an assignment of truth values to the boolean variables which will make the formula true?

There are many more. Note that by equating problems with sets of "yes" instances we mean, for example, that SAT is the set of boolean formulae which are satisfiable.
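
Viewing SAT as a set of "yes" instances, a membership test can always be written by brute force, as in the sketch below. It assumes the formula is given in CNF with signed-integer literals (+i for variable xi, -i for its negation), which is an encoding chosen for illustration rather than one fixed by the notes; the name is_satisfiable is hypothetical.

```python
from itertools import product


def is_satisfiable(clauses, num_vars):
    """Decide membership in SAT (CNF input) by trying all 2^n assignments."""
    for values in product([False, True], repeat=num_vars):
        # values[i - 1] is the truth value of variable x_i; a literal is
        # true when its variable's value matches the literal's sign.
        if all(any(values[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False
```

The loop may run 2^n times for n variables, so this decides SAT but not in polynomial time; whether a polynomial-time decider exists is exactly the P versus NP question.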

For a problem A in NP, think of y as a certificate that x is in A, and of V(x,y) as a test that the proof is convincing.

Note that all the following problems are in NP:

Problem            Role of y                                                         V(x,y)
-----------------  ----------------------------------------------------------------  ----------------------------------------
Composite          y encodes the factors in a prime factorization of x               checks that there are at least two factors, multiplies them and checks that the product is x
Hamiltonian Cycle  y encodes a cycle                                                 tests that y is a cycle in the input graph x visiting every node exactly once
3Processors        y is a schedule assigning tasks to processors                     verifies that schedule y runs all the tasks in x with a runtime of at most k
3Partitions        y describes the sets in the partition of A                        adds the times of the tasks in each set of the partition, to be sure each total is at most D
SAT                y encodes an assignment of truth values to the boolean variables  tests that y makes x true
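
Two of the tests V(x,y) listed above can be sketched directly. The function names and input encodings are assumptions made for illustration. For Composite, the sketch only requires at least two factors greater than 1; it does not check that they are prime, since that is not needed to show that x is composite.

```python
from math import prod


def verify_composite(x, factors):
    """Return 1 if factors are at least two integers > 1 with product x."""
    if len(factors) < 2 or any(f <= 1 for f in factors):
        return 0
    return 1 if prod(factors) == x else 0


def verify_schedule(times, m, deadline, parts):
    """Return 1 if parts splits the task indices 0..len(times)-1 into m
    disjoint sets whose total times are each at most deadline."""
    assigned = [task for part in parts for task in part]
    # Every task must appear exactly once across the m sets.
    if len(parts) != m or sorted(assigned) != list(range(len(times))):
        return 0
    within = all(sum(times[t] for t in part) <= deadline for part in parts)
    return 1 if within else 0
```

Both checks take time polynomial in the length of the input and the certificate, which is what places the corresponding problems in NP.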