# Understanding Ladner's Theorem

As you probably know, whether *P* = *NP* is a major unsolved problem in computer science.

Even if you believe *P* ≠ *NP*, it is tempting to think that *NP* is just *P* plus the *NP*-complete problems – that every problem in *NP* can either be solved in polynomial time or is expressive enough to encode *SAT*.

Ladner’s theorem states that this intuition isn’t true by showing the existence of *NP*-intermediate problems: problems that are in *NP* but are neither in *P* nor *NP*-complete (assuming *P* ≠ *NP*). The proof in the Arora/Barak book is quite interesting, and this is a short writeup on how I understood it intuitively. I have sacrificed rigor for intuition; the full proof can be found in the book and various other places.

The proof describes a language that is *NP*-intermediate by construction. The construction almost feels like cheating as you’ll see. But the proof is technically correct, which is the best kind of correct.

## Influencing Complexity by Padding

The proof utilizes a trick of padding problem instances to enable “faster” algorithms.

Consider a problem, *POLYSAT*, with instances of the form $\psi 0 1^{2^n}$, where

- $\psi \in SAT$, where *SAT* is the set of all SAT instances
- $x 0 1^k$ means $x$ followed by a $0$ and then $k$ $1$s
- $n$ is the length of $\psi$

Given a string $s$, the goal is to return $1$ if $s$ is of the form $\psi 0 1^{2^n}$ and $\psi$ is satisfiable, and $0$ otherwise.

*POLYSAT* can be solved in “polynomial time”: we can naively check all $2^n$ possible variable assignments in polynomial time because the size of the input is $n + 1 + 2^n$.

For contrast, we wouldn’t have gotten this exponential speedup had the instances been of the form $\psi 0 1^{n^c}$ where $c$ is some constant – the extra padding of size $n^c$ lets us “hide” only a polynomial number of steps, and so this variation is not in *P* unless *P* = *NP*.
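To make the padding trick concrete, here is a toy sketch of my own (not from the proof): a naive satisfiability check that tries all $2^n$ assignments, which is exponential in the formula size $n$ but only linear in the $\approx 2^n$ length of a padded *POLYSAT* instance.

```python
from itertools import product

def brute_force_sat(clauses, n_vars):
    """Naive satisfiability check for a CNF formula. A clause is a list of
    non-zero ints; positive means the variable, negative its negation.
    Trying all 2^n_vars assignments is exponential in the formula size,
    but only linear in the size of an instance padded with 2^n ones."""
    for assignment in product([False, True], repeat=n_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False
```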

## Choosing the Correct Amount of Padding

The^{1} proof of Ladner’s theorem constructs a problem (henceforth referred to as $SAT_H$) by padding *SAT* by just the right amount: not enough to put the language in *P* (so less than $2^n$), but enough that it isn’t *NP*-complete (so more than $n^c$ for any constant $c$). And it uses a clever self-referential trick to accomplish this.

Instances of $SAT_H$ are of the form $\psi 0 1^{n^{H(n)}}$ (with $n = |\psi|$) and (like *POLYSAT*) the goal is to decide whether the embedded *SAT* expression is satisfiable. In other words, given a string $s$ the algorithm has to return $1$ if:

- $s$ is of the form $\psi 0 1^{n^{H(n)}}$
- $\psi$ is a valid *SAT* problem
- $\psi$ is satisfiable

Otherwise the algorithm has to return $0$.

$H$ is defined so that if the Turing Machine $M_i$ solves $SAT_H$ in polynomial time then $H(n) = i$ for sufficiently large $n$; otherwise $H$ is a monotonically increasing non-constant function. The exact definition is listed later.

We will show that both “$SAT_H$ is in *P*” and “$SAT_H$ is *NP*-complete” imply *P* = *NP*. This means $SAT_H$ must be *NP*-intermediate if *P* ≠ *NP*.

**(1) $SAT_H$ is not in P**

If $SAT_H$ is in *P* then $H(n)$ is $O(1)$, since for large enough $n$ it computes the fixed index of the Turing Machine that solves $SAT_H$ in polynomial time. But then $SAT_H$ is just *SAT* with a polynomial amount of padding, so if $SAT_H$ is in *P* so is *SAT*. This leads to *P* = *NP*.

**(2) $SAT_H$ is not NP-complete**

If $SAT_H$ is *NP*-complete then there is (by definition) a polynomial-time reduction, $f$, from *SAT* instances to instances of $SAT_H$. Let’s say $f$ runs in $O(n^k)$ steps, where $n$ is the size of the *SAT* instance being reduced. And $f$ maps a *SAT* instance, $\psi$, to the string $\psi' 0 1^{m^{H(m)}}$, where $\psi'$ is also a *SAT* instance and $m = |\psi'|$. We will show that there is a constant $n_0$ such that if $n > n_0$ then $m < n$. This means we could build a polynomial time *SAT* solver that repeatedly applies $f$ to shrink a *SAT* instance until it is of size at most $n_0$, after which the problem can be brute forced in constant time. This again leads to *P* = *NP*.
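The hypothetical solver this argument describes can be sketched as follows; `reduce_f` and `extract_sat` are made-up stand-ins for the reduction $f$ and for pulling the embedded *SAT* instance out of the padded string:

```python
def solve_sat_via_shrinking(psi, reduce_f, extract_sat, n0, brute_force):
    """Hypothetical polynomial-time SAT solver built from a reduction f
    that shrinks instances: the embedded instance is strictly smaller
    whenever the input is larger than n0."""
    while len(psi) > n0:
        # One application of f, then discard the padding to recover
        # the strictly smaller embedded SAT instance.
        psi = extract_sat(reduce_f(psi))
    # Instances of size <= n0 are brute forced in constant time.
    return brute_force(psi)
```

The loop runs at most $|\psi|$ times, and each iteration is polynomial, so the whole solver is polynomial.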

$m < n$ (for $n > n_0$) can be shown as follows:

$f$ runs in $O(n^k)$, so there are constants $c$ and $k$ such that the reduction algorithm runs in at most $c \cdot n^k$ steps. $f$ can write at most one character in each step of execution, so the “reduced” instance of $SAT_H$ can at most be $c \cdot n^k$ characters longer than $\psi$. I.e. $m + 1 + m^{H(m)} \le n + c \cdot n^k$.

We can manipulate the above to get $m^{H(m)} \le 2c \cdot n^k$ (using $n \le n^k$ and $c \ge 1$).

$H$ is not $O(1)$ (as shown in the “$SAT_H$ is not in *P*” section) so there is an $n_1$ such that $n > n_1 \implies H(n) > 2k$. Let $n_2 = \max(n_1, (2c)^{1/k})$. Then $n > n_2 \implies m < n$. This follows by contradiction: $m \ge n$ and $n > n_2$ imply $m^{H(m)} \ge m^{2k} \ge n^{2k} = n^k \cdot n^k > 2c \cdot n^k$, but we also have $m^{H(m)} \le 2c \cdot n^k$.
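Under the assumptions above ($f$ runs in at most $c \cdot n^k$ steps, $H(n) > 2k$ whenever $n > n_1$, and writing $n_2 = \max(n_1, (2c)^{1/k})$), the contradiction can be restated as a single chain:

```latex
m \ge n > n_2 \;\Longrightarrow\;
m^{H(m)} \;\ge\; m^{2k} \;\ge\; n^{2k}
\;=\; n^k \cdot n^k \;>\; 2c \cdot n^k \;\ge\; m^{H(m)}
```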

Finally, we set $n_0$ to $n_2$ to get the result we set out to prove.

### Defining H

$H$ was defined as “if the Turing Machine $M_i$ solves $SAT_H$ in polynomial time then $H(n) = i$ for sufficiently large $n$; otherwise $H$ is a monotonically increasing non-constant function of $n$”. As written, this does not work: $SAT_H$ needs to be in *NP*, but this definition of $H$ does not even look decidable!

To get around this, we *approximate* the above definition so that it is computable. Specifically, $H(n)$ finds the smallest Turing Machine index $i < \log \log n$ such that $M_i$ computes $SAT_H(x)$ correctly in at most $i \cdot |x|^i$ steps for all $x$ with $|x| \le \log n$. If there is no such Turing Machine then $H(n)$ is $\log \log n$.

Concretely, $H$ (of type $\mathbb{N} \to \mathbb{N}$) is computed as follows:

- For $i = 1, 2, \ldots, \log \log n$:
    - For every $x$ with $|x| \le \log n$:
        - Run $M_i$^{2} for $i \cdot |x|^i$ steps with input $x$
        - If $M_i$ fails to halt or its result does not match $SAT_H(x)$^{3}, move on to the next $i$
    - Return $i$
- Return $\log \log n$
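The loop above translates fairly directly into code. In this sketch of mine, `machines[i](x, steps)` stands in for running the hypothetical machine $M_i$ on $x$ with a step budget (returning `None` if it doesn’t halt in time), and `sat_h` stands in for deciding $SAT_H$ on short strings:

```python
import math
from itertools import product

def all_strings_up_to(max_len):
    """All binary strings of length 0..max_len."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

def H(n, machines, sat_h):
    """Sketch of H(n): smallest i < log log n such that machine i is
    correct (within its step budget) on every string of length <= log n;
    log log n if no machine qualifies."""
    bound = max(1, int(math.log2(math.log2(n))))  # ~ log log n
    for i in range(1, bound + 1):
        if all(
            machines[i](x, steps=i * max(1, len(x)) ** i) == sat_h(x)
            for x in all_strings_up_to(int(math.log2(n)))
        ):
            return i  # M_i was correct on every short input
    return bound
```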

This algorithm takes a number of steps polynomial in $n$: the “Run $M_i$” step is executed at most $\log \log n \cdot 2^{\log n + 1}$ times and each execution takes a polynomial function^{4} of $i \cdot |x|^i$ steps. Furthermore, $i \cdot |x|^i \le \log \log n \cdot (\log n)^{\log \log n}$ is $o(n)$ since $(\log n)^{\log \log n} < n$ for sufficiently large $n$.

To prove $(\log n)^{\log \log n} < n$, first apply $\log$ twice on both expressions to get $(\log n)^{\log \log n}$ as $\log\left((\log \log n)^2\right) = 2 \log \log \log n$ and $n$ as $\log \log n$. $2 \log \log \log n$ is $o(\log \log n)$, so it is smaller than $\log \log n$ for sufficiently large $n$; since $\log$ is monotonic, the original inequality follows.
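As a quick numeric sanity check of my own (not part of the proof) – note that “sufficiently large” is doing real work here, since at $n = 2^{16}$ the two sides are exactly equal:

```python
import math

def lhs(n):
    """(log n)^(log log n), all logs base 2."""
    return math.log2(n) ** math.log2(math.log2(n))

assert lhs(2 ** 16) == 2 ** 16        # equality at n = 2^16 ...
for n in [2 ** 32, 2 ** 64, 2 ** 128]:
    assert lhs(n) < n                 # ... strictly below afterwards
```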

Since computing $H(n)$ is polynomial in $n$, $SAT_H$ is in *NP*: given a string of length $N$, a non-deterministic Turing Machine will “guess” $n$ (the length of $\psi$) and:

- Check that the first $n$ characters of the string form a satisfiable *SAT* expression (non-deterministically guessing a satisfying assignment)
- Check that the last $N - n - 1$ characters of the string are $1$ repeated $n^{H(n)}$ times, where $n^{H(n)}$ can be computed in polynomial time
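A deterministic verifier corresponding to this machine might look like the sketch below, where the guessed boundary `n` and the satisfying `assignment` are passed in as the certificate, and `H`, `parse_sat`, and `check_assignment` are hypothetical helpers (not real library functions):

```python
def verify_sat_h(s, n, assignment, H, parse_sat, check_assignment):
    """Verifier sketch for SAT_H: accepts iff s = psi 0 1^(n^H(n)) with
    psi satisfiable. `n` and `assignment` play the role of the
    non-deterministic guesses."""
    psi, rest = s[:n], s[n:]
    clauses = parse_sat(psi)
    if clauses is None:                      # not a valid SAT encoding
        return False
    if not check_assignment(clauses, assignment):
        return False
    # The padding must be exactly a 0 followed by n^H(n) ones.
    return rest == "0" + "1" * (n ** H(n))
```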