Problem: COMPOSITE-NUMBER
Instance: A positive integer n
Question: Is n a composite number? (I.e., n is neither prime nor 1.)
The question can be answered with a simple "yes" or "no." It is useful to think of a decision problem, such as COMPOSITE-NUMBER, as a set containing all the "yes" instances. Then the decision problem is equivalent to testing an instance for membership in the set; we say, e.g., that 21 is "in" COMPOSITE-NUMBER and 19 isn't.
Some problems are not usually thought of as decision problems, but can be framed in this context so we can study them as decision problems. One example is the Travelling Salesman Problem. Given a weighted graph G, we want to know the length of the shortest path going from a starting vertex (the "first" vertex) through all the other vertices exactly once and ending up back at the first. We can frame it as a decision problem like this:
Problem: TSP
Instance: A graph G=(V,E), a function w:E -> R giving the weight of an edge, a vertex v0 in V and a number k in R.
Question: Is there a path through G that starts at vertex v0, visits every other vertex exactly once, and returns to v0, such that the length of that path is at most k?
(R is the real numbers.)
If we have an algorithm that answers this decision problem, it's likely that the algorithm computes the minimum length and compares it with k, so we can solve the general problem with the decision problem. If not, we can use a simple binary search to find the minimum length with arbitrary precision; we ask, e.g., "Is it less than 10?" No. "Is it less than 20?" Yes. "Is it less than 15?" Yes. "Is it less than 12?" No. And so forth.
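The binary search over answers to the decision problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: lengths are taken to be integers (so the search terminates exactly rather than "with arbitrary precision"), and `decide` is a hypothetical stand-in for whatever procedure answers the TSP decision problem for a fixed graph. The name `toy_oracle` and the minimum of 13 are made up for the example.

```python
from typing import Callable


def minimum_length(decide: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the smallest k in [lo, hi] for which decide(k) is True.

    Assumes decide is monotone (once it says "yes" for some k it says
    "yes" for every larger k) and that decide(hi) is True.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if decide(mid):
            hi = mid          # a tour of length <= mid exists; try smaller
        else:
            lo = mid + 1      # no such tour; the minimum must be larger
    return lo


# Hypothetical oracle: pretends the shortest tour has length 13.
def toy_oracle(k: int) -> bool:
    return k >= 13


best = minimum_length(toy_oracle, 0, 20)
```

Each question halves the remaining range, so the number of calls to the decision procedure is logarithmic in the width of the range.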
Structural complexity theorists are interested in knowing, for decision problems, how hard it is to answer them with algorithms.
A decision problem can be answered by a C program that takes the instance of the problem as input and prints out "yes" or "no."
We measure the running times of these algorithms by counting the number of fundamental operations they perform as a function of the size of the instances they are presented. We can use all the familiar asymptotic notations to simplify characterizing these functions. However, we must be very specific about the encoding or representation of the instances so that we can talk meaningfully about their sizes.
For COMPOSITE-NUMBER, a reasonable encoding is an integer in binary or decimal form. The size of an integer n encoded in binary is Θ(log n). (An unreasonable encoding of integers would be unary, i.e., a string of n 1's.)
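To see why the encoding matters, here is a small sketch comparing the two sizes for the same instance, together with the obvious trial-division test. The function names are made up for illustration. Trial division performs roughly sqrt(n) divisions, which is polynomial in the unary size n but about 2^(b/2) for a b-bit binary input, so it is not a polynomial time algorithm under the reasonable encoding.

```python
def binary_size(n: int) -> int:
    """Number of bits needed to write the positive integer n in binary."""
    return n.bit_length()


def unary_size(n: int) -> int:
    """Length of n written as a string of n 1's."""
    return n


def is_composite_by_trial_division(n: int) -> bool:
    """Decide COMPOSITE-NUMBER by trying every divisor up to sqrt(n)."""
    if n < 4:
        return False          # 1, 2 and 3 are not composite
    d = 2
    while d * d <= n:
        if n % d == 0:
            return True       # found a nontrivial factor
        d += 1
    return False


for n in (21, 19, 1_000_003):
    print(n, binary_size(n), unary_size(n), is_composite_by_trial_division(n))
```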
For TSP, we can represent real numbers with, for example, rational numbers that are the ratios of arbitrary sized integers. We can represent the graph and weight function as an ASCII file giving the list of edges and weights.
This definition of algorithms becomes insufficient (and, as it turns out, overly complex) sometimes. An idealized idea of an algorithm implemented on a Turing Machine is often used; you'll see this if you take the Formal Languages class.
Consider this algorithm for COMPOSITE-NUMBER:
Decide-Primality
    return "yes"
Unfortunately the algorithm is incorrect; it also answers "yes" for prime numbers. Since we are cynical structural complexity theorists, we suspect people are out there pulling stunts like this, so we require that the algorithm give us some proof that an instance really is a "yes" instance (we may not care if the algorithm can prove that an instance is a "no" instance).
How can an algorithm prove an instance is in the set of "yes" instances? By providing a certificate that vouches for the instance. The certificate should be something we can use to verify, with a different (perhaps more trusted) algorithm, that the instance really is in the set. For COMPOSITE-NUMBER, a certificate would be a nontrivial factor of n. We can easily check whether this factor really divides n evenly, proving n is really composite.
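The certificate check for COMPOSITE-NUMBER is short enough to write out. The sketch below (the function name is ours, not from the notes) accepts an instance n and a claimed certificate c, and says "yes" only when c is a nontrivial factor of n. Note that it never has to find a factor itself.

```python
def check_composite_certificate(n: int, c: int) -> bool:
    """Return True only if c proves that n is composite.

    c must be a nontrivial factor: strictly between 1 and n,
    and dividing n evenly.
    """
    return 1 < c < n and n % c == 0


print(check_composite_certificate(21, 3))   # 3 divides 21
print(check_composite_certificate(19, 3))   # 3 does not divide 19
```

The always-"yes" algorithm above cannot fool this checker: for a prime such as 19, no value of c passes.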
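Definition: P is the class of decision problems D such that there exists an algorithm A such that A correctly answers "yes" or "no" for every instance of D, and A runs in O(n^k) time on instances of size n, for some constant k.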
Here are some examples of problems that are in P:
Problem: SINGLE-SOURCE-SHORTEST-PATH
Instance: A weighted, directed graph G = (V, E), a pair of vertices v and w, both in V, and a number k in R.
Question: Is there a path in G from v to w of length at most k?
This is in P because we can use Dijkstra's Algorithm to answer the question, and Dijkstra's Algorithm runs in time polynomial in the size of the instance. If we are interested in proving the result of the decision problem, we can use the sequence of vertices along a path of length at most k as a certificate; we can easily verify the length of this path without having to trust Dijkstra's Algorithm.
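As a rough illustration, the decision problem can be answered by running Dijkstra's Algorithm from v and comparing the distance to w with k. The sketch below assumes nonnegative edge weights (which Dijkstra's Algorithm needs) and represents the graph as a dictionary from directed edges to weights; both the representation and the function name are choices made for this example.

```python
import heapq


def path_of_length_at_most(edges, v, w, k):
    """Decide whether some path from v to w has length at most k.

    edges maps a directed edge (a, b) to its nonnegative weight.
    """
    adjacency = {}
    for (a, b), weight in edges.items():
        adjacency.setdefault(a, []).append((b, weight))

    dist = {v: 0}
    heap = [(0, v)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, skip it
        if u == w:
            return d <= k                 # d is the shortest distance to w
        for x, weight in adjacency.get(u, []):
            nd = d + weight
            if nd < dist.get(x, float("inf")):
                dist[x] = nd
                heapq.heappush(heap, (nd, x))
    return False                          # w is not reachable from v


example_edges = {("s", "a"): 1, ("a", "t"): 2, ("s", "t"): 5}
print(path_of_length_at_most(example_edges, "s", "t", 3))
```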
Problem: LESS-THAN
Instance: Two integers, n and m.
Question: Is n less than m?
The size of the problem is O(log n + log m), since we are using a binary representation of the numbers. This problem can be decided in O(log n + log m) time, which is linear in the sizes of the numbers (i.e., k=1 in the definition of P).
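To make the linear running time concrete, here is a sketch that decides LESS-THAN directly on the binary encodings. It assumes nonnegative integers written as strings of '0' and '1' with no leading zeros; that restriction, and the function name, are ours. The loop looks at each bit at most once.

```python
def less_than(n_bits: str, m_bits: str) -> bool:
    """Decide whether n < m, given both numbers as binary strings.

    Assumes nonnegative integers with no leading zeros.
    """
    if len(n_bits) != len(m_bits):
        return len(n_bits) < len(m_bits)   # fewer bits means smaller
    for a, b in zip(n_bits, m_bits):
        if a != b:
            return a < b                   # first differing bit decides
    return False                           # the numbers are equal


print(less_than("101", "110"))   # is 5 less than 6?
```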
Lots of interesting problems are in P. Sometimes k in the definition is large; e.g. we don't relish having to wait for an O(n^6) algorithm to finish, but it is better than any exponential function. P roughly corresponds to our notion of efficient computation; if a problem is in P, then it can be solved efficiently.
Some problems are not in P. For example, consider this problem:
Problem: HALTING
Instance: A C program A.
Question: Will A ever call exit()?
Some instances can easily be decided; programs with no loops and no recursion, programs that don't have the exit() call in them, and other trivial examples. However, there is no general algorithm for deciding this problem in polynomial time. (It turns out that there is no algorithm for doing this at all, in any kind of time, but you don't have to believe that until later.)
It isn't known for many problems whether or not they are in P. These are very interesting problems. For example:
Problem: GRAPH-ISOMORPHISM
Instance: Two graphs G and H.
Question: Are G and H isomorphic? (I.e., are they the same shape? Can we relabel the vertices of one to make it identical to the other?)
It isn't known whether GRAPH-ISOMORPHISM is in P. Someone tomorrow could come up with a polynomial time algorithm for it (or something stronger than it), or a proof that it isn't in P, but today we just don't know.
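For example, a polynomial proof system for COMPOSITE-NUMBER would go like this: the certificate c for an instance n is a nontrivial factor of n, and the checking algorithm verifies that 1 < c < n and that c divides n evenly, which takes time polynomial in the size of n.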
Definition: NP is the class of decision problems for which there exists a polynomial time proof system.
So NP is the class of problems for which we can check proofs of "yes" instances in polynomial time. Note that P is the class of problems for which we can find proofs in polynomial time; we can just let c be empty and use the polynomial time algorithm to yield a trivial polynomial proof system. (NP stands for nondeterministic polynomial time, which refers to an alternate but equivalent definition that, believe it or not, tends to give students more headaches than this one.)
Clearly, COMPOSITE-NUMBER is in NP; we just saw a polynomial proof system for it. This doesn't mean we can solve COMPOSITE-NUMBER in polynomial time, just check proofs for it in polynomial time.
TSP is also in NP. A certificate for TSP would be a path (represented as a sequence of vertices) whose length is at most k. This list of vertices is of polynomial size, being the same size as the set of vertices. It can be checked in polynomial (indeed linear) time by just adding up the relevant weights.
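A checker for TSP certificates might look like the sketch below. The representation is an assumption made for the example: the weight function is a dictionary keyed by directed pairs (an undirected edge appears in both directions), and the certificate `tour` lists every vertex once, starting at v0, with the return to v0 left implicit.

```python
def check_tsp_certificate(vertices, weights, v0, k, tour):
    """Return True only if tour proves the instance is a "yes" instance."""
    # The tour must list every vertex exactly once and start at v0.
    if not tour or len(tour) != len(vertices) or set(tour) != set(vertices):
        return False
    if tour[0] != v0:
        return False
    total = 0
    # Walk consecutive pairs, closing the cycle back to v0.
    for a, b in zip(tour, tour[1:] + [v0]):
        if (a, b) not in weights:
            return False              # the tour uses an edge G doesn't have
        total += weights[(a, b)]
    return total <= k


vertices = ["a", "b", "c", "d"]
undirected = {("a", "b"): 1, ("b", "c"): 2, ("c", "d"): 3,
              ("d", "a"): 4, ("a", "c"): 10, ("b", "d"): 10}
weights = dict(undirected)
weights.update({(v, u): w for (u, v), w in undirected.items()})

print(check_tsp_certificate(vertices, weights, "a", 10, ["a", "b", "c", "d"]))
```

The checker only adds up the weights along the given tour; it never searches for a tour, which is the hard part.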
GRAPH-ISOMORPHISM is also in NP. A certificate would be a method of relabelling the vertices. This method can be represented as a sequence of vertices, again, polynomial in the size of the graph. The algorithm to check the certificate simply relabels all the vertices accordingly and sees if both graphs have the same set of edges, a polynomial time operation (we can just sort the edges and compare the sets).
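The certificate check for GRAPH-ISOMORPHISM can be sketched the same way. Here the relabelling is given as a dictionary `mapping` from G's vertices to H's vertices, and the graphs are undirected with edges given as pairs; these representation choices, like the function name, are assumptions for the example. Instead of sorting the edge lists, the sketch compares them as sets.

```python
def check_isomorphism_certificate(g_vertices, g_edges, h_vertices, h_edges, mapping):
    """Return True only if mapping relabels G into exactly H."""
    # The mapping must be a bijection from G's vertices onto H's vertices.
    if set(mapping.keys()) != set(g_vertices):
        return False
    images = set(mapping.values())
    if len(images) != len(mapping) or images != set(h_vertices):
        return False
    # Relabel every edge of G and compare with H's edge set.
    relabelled = {frozenset((mapping[u], mapping[v])) for u, v in g_edges}
    return relabelled == {frozenset(e) for e in h_edges}


g_vertices, g_edges = [1, 2, 3], [(1, 2), (2, 3)]
h_vertices, h_edges = ["a", "b", "c"], [("a", "c"), ("c", "b")]

print(check_isomorphism_certificate(
    g_vertices, g_edges, h_vertices, h_edges, {1: "a", 2: "c", 3: "b"}))
```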
It is somewhat frustrating to computer scientists that the question of whether P = NP hasn't been put to rest. There are very many problems in NP; if we can show that just one of them isn't also in P, we have proven that P != NP. Also, there is a class of problems we will see later, called the NP-complete problems, for which one can show that if just one of them has a polynomial time algorithm, then P = NP.
This question of P ?= NP has important practical consequences; if P = NP, then RSA is in big trouble because all of a sudden we know that it's possible to factor large numbers in polynomial time. If P = NP, we know we can find an efficient algorithm to solve the Travelling Salesman Problem, something important in many real world applications. Many other NP problems would also become tractable. If P != NP, then we know that some very important problems just can't be solved efficiently.
The question of whether P ?= NP is (not so arguably) the most important open problem in computer science. There is a self-fulfilling myth that "most computer scientists believe that P != NP," and plenty of circumstantial evidence in the form of "we can prove P != NP if <insert plausible but equally hard to prove assertion here> is true," but as of today no proof.