As you probably know, whether P = NP is a major unsolved problem in computer science.

Even if you believe P ≠ NP, it is tempting to think that NP = P ∪ NP-complete – that every problem in NP can either be solved in polynomial time or is expressive enough to encode SAT.

Ladner’s theorem states that this intuition isn’t true by showing the existence of NP-intermediate problems: problems that are in NP but are neither in P nor NP-complete (assuming P ≠ NP). The proof in the Arora/Barak book is quite interesting, and this is a short writeup on how I understood it intuitively. I have sacrificed rigor for intuition; the full proof can be found in the book and various other places.

The proof describes a language that is NP-intermediate by construction. The construction almost feels like cheating as you’ll see. But the proof is technically correct, which is the best kind of correct.

Influencing Complexity by Padding

The proof utilizes a trick of padding problem instances to enable “faster” algorithms.

Consider a problem, POLYSAT, with instances of the form φ1^(2^|φ|), where

  • φ ∈ SAT, the set of all SAT instances
  • φ1^k means φ followed by k 1s
  • |φ| is the length of φ

Given a string s, the goal is to return 1 if s is of the form φ1^(2^|φ|) and φ is satisfiable, and 0 otherwise.

POLYSAT can be solved in “polynomial time”: we can naively check all 2^|φ| possible assignments in time polynomial in the input length, because the size of the input is |φ| + 2^|φ|.

For contrast, we wouldn’t have gotten this exponential speedup had the instances been of the form φ1^(|φ|^k) where k is some constant – the extra padding of size |φ|^k lets us “hide” only a polynomial number of steps, and so this variation is not in P unless P = NP.
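To make the brute-force argument concrete, here is a toy Python sketch of a POLYSAT decider. The encoding is my own simplification (the SAT instance is a Python boolean expression evaluated with `eval`, and the variable list is passed in separately), not anything from the proof:

```python
from itertools import product

def is_satisfiable(phi: str, variables: list[str]) -> bool:
    """Brute force over all 2^|variables| assignments.

    phi is a Python boolean expression over the variable names --
    a toy stand-in for a real SAT encoding.
    """
    for values in product([False, True], repeat=len(variables)):
        if eval(phi, {}, dict(zip(variables, values))):
            return True
    return False

def polysat(s: str, variables: list[str]) -> bool:
    """Decide the padded language: s must be phi followed by 2^|phi| 1s."""
    for n in range(len(s) + 1):
        pad_len = len(s) - n
        if (1 << n) != pad_len:  # compare lengths before building strings
            continue
        phi, pad = s[:n], s[n:]
        if pad == "1" * pad_len:
            # Brute force costs about 2^|phi| evaluations, which is
            # polynomial in the padded input length |phi| + 2^|phi|.
            return is_satisfiable(phi, variables)
    return False  # not of the required form

phi = "(x or y) and not x"            # satisfiable with x=False, y=True
instance = phi + "1" * (2 ** len(phi))
print(polysat(instance, ["x", "y"]))  # True
```

Note that the split point is unambiguous: n + 2^n is strictly increasing in n, so at most one prefix length is consistent with the total length of s.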

Choosing the Correct Amount of Padding

The proof[1] of Ladner’s theorem constructs a problem (henceforth referred to as SAT_H) by padding SAT by just the right amount: not enough padding to put the language in P (so less than 2^|φ|), but enough that it isn’t NP-complete (so more than |φ|^k for any constant k). And it uses a clever self-referential trick to accomplish this.

Instances of SAT_H are of the form φ1^(n^H(n)) with n = |φ|, and (like POLYSAT) the goal is to decide whether the embedded SAT expression is satisfiable. In other words, given a string s the algorithm has to return 1 if:

  • s is of the form φ1^(n^H(n)) with n = |φ|
  • φ is a valid SAT problem
  • φ is satisfiable

Otherwise the algorithm has to return 0.

H is defined so that if some Turing Machine M_i solves SAT_H in polynomial time then H(n) = i for sufficiently large n; otherwise H is a monotonically increasing non-constant function of n. The exact definition is listed later.

We will show that both “SAT_H is in P” and “SAT_H is NP-complete” imply P = NP. This means SAT_H must be NP-intermediate if P ≠ NP.

(1) SAT_H is not in P

If SAT_H is in P then H(n) is O(1), since for large enough n it evaluates to the fixed index of the Turing Machine that solves SAT_H in polynomial time. But then SAT_H is just SAT with a polynomial amount of padding, so if SAT_H is in P so is SAT. This leads to P = NP.

(2) SAT_H is not NP-complete

If SAT_H is NP-complete then there is (by definition) a polynomial-time reduction, f, from SAT instances to instances of SAT_H. Let’s say f runs in O(n^k) steps where n is the size of the SAT instance being reduced. And f maps a SAT instance, φ, to the string ψ1^(|ψ|^H(|ψ|)) where ψ is also a SAT instance. We will show that there is a constant c such that if |φ| > c then |ψ| < |φ|. This means we could build a polynomial time SAT solver that repeatedly applies f to reduce a SAT instance until it is of size ≤ c, after which the problem can be brute forced in constant time. This again leads to P = NP.
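The solver loop this describes can be sketched as follows. Everything named here is a stand-in of mine: `f` represents “apply the reduction and keep the embedded SAT instance ψ”, and `brute_force` is any constant-size SAT decider — the real f would come from the assumed NP-completeness:

```python
from typing import Callable

def sat_via_reduction(phi: str, f: Callable[[str], str], c: int,
                      brute_force: Callable[[str], bool]) -> bool:
    """If f maps SAT instances to equi-satisfiable, strictly smaller
    instances whenever the input is longer than c, this loop reaches a
    constant-size instance after at most |phi| applications of f."""
    while len(phi) > c:
        smaller = f(phi)
        assert len(smaller) < len(phi)  # the shrinking property shown below
        phi = smaller
    return brute_force(phi)  # constant size => constant-time brute force
```

With a polynomial-time f, each iteration costs polynomial time and there are at most |φ| iterations, so the whole loop is a polynomial-time SAT solver.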

|ψ| < |φ| can be shown as follows:

f runs in O(n^k), so there are constants a and k such that the reduction algorithm runs in at most a·n^k steps. f can write at most one character in each step of execution, so the “reduced” instance of SAT_H can be at most a·|φ|^k characters long. I.e. |ψ|^H(|ψ|) ≤ a·|φ|^k.

We can manipulate the above to get |ψ| ≤ (a·|φ|^k)^(1/H(|ψ|)).

H is not O(1) (as shown in the “SAT_H is not in P” section), so there is an N such that H(n) > 2k for all n > N. Suppose |φ| > max(N, a^(1/k)). Then |ψ| < |φ| by contradiction: |ψ| ≥ |φ| implies H(|ψ|) > 2k and hence |ψ| ≤ (a·|φ|^k)^(1/(2k)) = a^(1/(2k))·|φ|^(1/2), which is strictly less than |φ| because |φ| > a^(1/k); but we also have |ψ| ≥ |φ|.

Finally, we set c to max(N, a^(1/k)) to get the result we set out to prove.
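As a numeric sanity check of this inequality chain (a = 100 and k = 3 are arbitrary values I picked for illustration, not constants from the proof):

```python
a, k = 100, 3  # hypothetical constants of the reduction f (my choice)

def psi_bound(phi_len: int) -> float:
    """Upper bound on |psi| when H(|psi|) > 2k: (a * |phi|^k)^(1/(2k))."""
    return (a * phi_len ** k) ** (1 / (2 * k))

# Once |phi| exceeds a^(1/k) (about 4.64 here), the bound drops below
# |phi|, so each application of the reduction f strictly shrinks the
# instance.
for phi_len in (10, 100, 10_000):
    assert psi_bound(phi_len) < phi_len
```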

Defining H

H was defined so that “if the Turing Machine M_i solves SAT_H in polynomial time then H(n) = i for sufficiently large n, otherwise H is a monotonically increasing non-constant function of n”. As written, this does not work: SAT_H needs to be in NP but this definition of H does not even look decidable!

To get around this, we approximate the above definition so that it is computable. Specifically, H(n) finds the smallest Turing Machine index i < log log n such that M_i computes SAT_H correctly within i·|s|^i steps for all strings s with |s| ≤ log n. If there is no such Turing Machine then H(n) is log log n.

Concretely, H (of type ℕ → ℕ) is computed as follows:

  1. For i = 1 to log log n:
  2.       For each string s with |s| ≤ log n:
  3.             Run M_i[2] for i·|s|^i steps with input s
  4.       If M_i halted with a result matching SAT_H(s)[3] for every s above
  5.             Return i
  6. Return log log n
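The self-referential flavor of H is easier to see in running code. Below is a toy Python model of mine: real Turing Machines are replaced by a fixed list of Python functions, the i·|s|^i step bound is elided (plain Python calls can’t easily be step-limited), and the language to be decided is passed in as an oracle callable rather than being SAT_H itself:

```python
import math
from itertools import product
from typing import Callable

Machine = Callable[[str], bool]

def compute_H(n: int, machines: list[Machine], oracle: Machine) -> int:
    """Toy model of H(n): the smallest machine index i < log log n that
    agrees with the oracle on every string of length <= log n.

    Differences from the real proof (elided here): machines come from a
    fixed enumeration of all Turing Machines, each is run for only
    i*|s|^i steps, and the oracle is SAT_H itself, computed by brute
    force on these short strings.
    """
    if n < 4:
        return 0
    limit = int(math.log2(math.log2(n)))
    max_len = int(math.log2(n))
    short_strings = [
        "".join(bits)
        for length in range(max_len + 1)
        for bits in product("01", repeat=length)
    ]
    for i in range(limit):
        if all(machines[i](s) == oracle(s) for s in short_strings):
            return i
    return limit  # no small machine agrees: H keeps growing with n

parity = lambda s: s.count("1") % 2 == 1  # toy stand-in language
machines = [lambda s: False, parity, lambda s: True]
print(compute_H(256, machines, parity))  # 1: machine 1 decides the language
```

If no machine in the list agrees, `compute_H` returns log log n, mirroring how H grows without bound when no polynomial-time machine decides SAT_H.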

This algorithm takes a number of steps polynomial in n: step 3 is run at most (log log n)·2^(log n + 1) ≈ 2n·log log n times, and each execution takes at most i·|s|^i ≤ (log log n)·(log n)^(log log n) steps, times a polynomial simulation overhead[4] – all polynomial in n. Furthermore, H(n) is at most log log n since the loop index i never exceeds log log n.

To prove (log n)^(log log n) = O(n), first apply log twice to both expressions: (log n)^(log log n) becomes log((log log n)^2) = 2·log log log n, and n becomes log log n. And 2·log log log n ≤ log log n for sufficiently large n, so (log n)^(log log n) ≤ n for sufficiently large n.
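A quick numeric check of this growth gap (using base-2 logs, which only changes constants):

```python
import math

def simulation_cost(n: int) -> float:
    """(log n)^(log log n): the dominant per-simulation term above."""
    log_n = math.log2(n)
    return log_n ** math.log2(log_n)

# At n = 2^64: log n = 64 and log log n = 6, so the cost is 64^6 = 2^36,
# vastly smaller than n = 2^64.
assert math.isclose(simulation_cost(2 ** 64), 64 ** 6)
assert simulation_cost(2 ** 64) < 2 ** 64
```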

Since H(n) is computable in time polynomial in n, SAT_H is in NP: given a string s of length m, a non-deterministic Turing Machine will “guess” the length n of the embedded SAT expression along with a satisfying assignment, and:

  • Check that the first n characters of the string form a SAT expression satisfied by the guessed assignment
  • Check that the last m − n characters of the string are 1 repeated n^H(n) times
  • where H(n), and hence n^H(n), can be computed in time polynomial in m
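These checks can be packaged as a deterministic verifier, which is the usual way to exhibit NP membership. The encoding below is a toy simplification of mine (the SAT expression is a Python boolean expression, the certificate is the split point n plus a truth assignment, and H is passed in as an assumed polynomial-time black box, stubbed here with a constant function):

```python
from typing import Callable

def verify(s: str, n: int, assignment: dict[str, bool],
           H: Callable[[int], int]) -> bool:
    """NP verifier for SAT_H: accept iff the certificate (n, assignment)
    demonstrates that s = phi 1^(n^H(n)) with phi satisfied."""
    phi, pad = s[:n], s[n:]
    if len(pad) != n ** H(n) or pad != "1" * len(pad):
        return False  # padding has the wrong shape
    try:
        # Toy encoding: phi is a Python boolean expression.
        return bool(eval(phi, {}, dict(assignment)))
    except Exception:
        return False  # prefix is not a well-formed expression

H_stub = lambda n: 2                       # stand-in for the real H
phi = "(x or y) and not x"                 # n = 18 characters
s = phi + "1" * (len(phi) ** H_stub(len(phi)))
print(verify(s, len(phi), {"x": False, "y": True}, H_stub))  # True
```

Each step runs in time polynomial in m, so guessing the certificate non-deterministically puts SAT_H in NP.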
  1. There are at least two proofs of Ladner’s theorem. In this blog post I’m only discussing the proof in the Arora/Barak book. 

  2. M_i is the i-th Turing Machine in some fixed enumeration of all Turing Machines. 

  3. SAT_H(s) is 1 if the SAT expression embedded in s is satisfiable, and 0 if not. 

  4. This polynomial depends on how “efficiently” M_i can be simulated.