Advanced Complexity Theory
Markus Bl¨ aser aser & Bodo Manthey Universit¨at at des Saarlandes Draft—February 27, 2010 and forever
2
1
Com Complex plexit ityy of opti optimi miza zati tion on pro probblems
1.1
Optimi Optimizat zation ion problem roblemss
The study of the complexity of solving optimization problems is an important practical aspect of complexity theory. A good textbook on this topic is the one by Ausiello et al. [ACG + 99]. The book by Vaziran Vaziranii [Vaz01] [Vaz01] is also recommend, but its focus is on the algorithms side. Definition 1.1. An optimization problem P is a 4-tuple (IP , SP , mP , goalP )
where 1. IP
⊆ {0, 1}∗ is the set of valid instances of P , P ,
2. SP is a function that assigns to each valid instance x the set of feasible solutions SP (x) of x, which is a subset of 0, 1 ∗ .1
{ }
3. mP : (x, y) x IP and y SP (x) objective function function N+ is the objective or measure function. function. mP (x, y ) is the objective value of the feasible solution y (with respect to x).
{
| ∈
∈{
∈
}→
}
4. goalP min, min, max spe specifies cifies the type type of the optimiz optimizatio ation n probl problem. em. Either it is a minimization or a maximization problem. When When the context context is clear, clear, we will drop the subscript subscript P . P . Formally ormally,, an optimization problem is defined over the alphabet 0, 1 . But as usual usual,, when we talk about concrete problems, we want to talk about graphs, nodes, weights, etc. In this case, we tacitly assume that we can always find suitable encodings of the objects we talk about. Given an instance x of the optimization problem P , P , we denote by S ∗P (x) the set of all optimal solutions, that is, the set of all y SP (x) such that
{ }
∈ mP (x, y ) = goal{mP (x, z ) | z ∈ SP (x)}.
(Note that the set of optimal solutions could be empty, since the maximum need not exist. The minimum minimum always always exists, since we m P only attains val+ ues in N . In the follo followin wingg we will assume assume that there are alway alwayss optima optimall 1
Some authors also assume that for all x ∈ IP , SP (x) = ∅. In this case the class NPO defined in the next section would be equal to the class exp-APX (defined somewhere else).
3
4
1. Complexity of optimization problems
solutions provided that SP (x) = .) The objective value of any y S∗ (x) is denoted by OPTP (x).2 Given an optimization problem P , P , there are (at least) three things one could do given a valid instance x:
∅
1. compute compute an optimal solution solution y
∈
∈ S∗(x) (construction problem ).).
2. com comput putee OPT(x OPT(x) (evaluation problem )
≥
3. given given an additional additional bound B , decide whether OPT(x OPT( x) B (if goal = max) or whether OPT(x OPT(x) B (if goal = min) (decision (decision problem ). ).
≤
The first task seems seems to be b e the most natural one. Its precise precise formalization formalization is however a subtle task. One could compute the function F : I ( 0, 1 ∗ ) mapping mapping each x to its set of optimal solutions S ∗ (x). Howeve Howeverr S∗ (x) could be very large (or even infinite). Moreover, one is almost always content with only one optimal solution. A cut of F is any function f : I Σ∗ that maps every x to some y S∗ (x). We say that we solve solve the construction construction problem problem associated with P if there is a cut of F that we can compute efficiently. 3 It turns out to be very useful to call such a cut again P and assume that positive statements containing P are implicitly -quantified and negative statements are -quantified -quantified.. (Do not worry worry too to o much now, everything everything will become clear.) The second second task is easy to model. model. We want want to compute compute the function function x OPT(x OPT(x). We denote this function by P eval eval. The third task can be modelled as a decision problem. Let
→ P { }
→
∈
∃
∀
→
P dec dec =
{
| OPT(x OPT(x) ≥ B } | OPT(x OPT(x) ≤ B }
x, bin(B bin(B ) x, bin(B bin(B )
{
if goal = max if goal = min
Our task is now to decide membership in P dec dec .
1.2
PO
and
NPO
We now define optimization analogs of P and NP. al l optimization problems problems P = (I, (I, S, m, goal) Definition 1.2. NPO is the class of all such that 2
∗
The name mP (x) would be more consequent, but OPT P (X ) is so intuitive and convenient. 3 Note that not every cut can be computable, even for very simple optimization problems like computing computing minimum spanning spanning trees and even on very simple instances. instances. Consider Consider a complete complete graph K n , all edges with weight one. Then every spanning tree is optimal. But a cut that maps K n to a line if the nth Turing machine halts on the empty word and to a star otherwise is certainly not computable.
1.3. Example: Example: TSP
5
∈
1. I P, i.e., we can decide in polynomial time whether a given x is a valid instance, 2. ther there is a polynom olynomial ial p such that for all x I and y p( x ), and for all y with y p( x ), we can decide y polynomial in x ,
||
||
∈
| |≤ | |
S(x), |y | ≤ ∈ S(x ∈ S(x S(x) in time
3. m is computable in polynomial time. Definition Definition 1.3. PO is the class of all optimization problems P
∈ NPO
such that the construction problem P is deterministically polynomial time computab omputable. le. (Re (Recall that this means means that there there is a cut that is polynom olynomial ial time computable). We will see the relation of PO and NPO to P and NP in Section 1.5. Even though it is not explicit in Definition 1.2, NPO is a nondeterministic complexity class. Theorem 1.4. For each P
∈ NPO, P dec dec ∈ NP.
Proof. Let p be the polynomial in the definition of NPO. The following nondeterministic Turing machine M decides P dec dec in polynomial time:
∈ I and bound B 1. M guesses a string y with |y | ≤ p(|x|). 2. M deterministically tests whether y ∈ S(x S(x).
Input: instance x
If not, M rejects.
≤
3. If y is, then then M computes computes m(x, m(x, y) and tests whether m(x, m( x, y ) (minimization problem) or m(x, m(x, y ) B (maximization problem).
≥
B
4. If the test is positive positive,, then M accepts, otherwise, M rejects. It is easy to see that M indeed decides P dec dec and that its running time is polynomial.
1.3 1.3
Exam Exampl ple e:
TSP
Problem Problem 1.5 (TSP, ∆-TSP). The Traveling Salesperson Problem ( TSP) is
defined as follows: Given a complete loopless (undirected) graph G = (V, E ) N+ assigning each edge a positive weight, and a weight function w : E find a Hamilto Hamiltonia nian n tour of minimum minimum weight. weight. If in addition addition w fulfills the triangle triangle inequality inequality,, i.e.,
→
{ } ≤ w({u, x}) + w({x, v})
w( u, v )
for all nodes u,x,v, u,x,v,
then we speak of the Metric Traveling Salesperson Problem (∆- TSP).
6
1. Complexity of optimization problems
In the example of the Traveling Salesperson Problem TSP, we have the following:
• The set of all valid instances is the set (of suitable encodings) of all
edge-weighted complete loopless graphs G. In the special case ∆- TSP, the edge weights should also fulfill the triangle inequality (which can be easily checked).
• Given Given an instance instance x, a feasible solution is any Hamiltonian tour of G,
i.e., a permutation of the vertices of G. (Note that for TSP, the set of feasible solutions only depends on the number of nodes of G.)
• The objective value of a solution y of an instance x is the sum of the edges used in the tour specified by y . (This can be interpreted as the length of the tour.)
• Finally, TSP and ∆-TSP are minimization problems. It is easy to verify, that TSP ∈ NPO. Howeve Howeverr it is very unlikely unlikely that it
is in PO. Even finding a very rough approximate solution seems to be very hard.
Exercise 1.1. Assume that there is a polynomial time algorithm that given
an instance x of TSP, returns a Hamiltonian tour whose weight is at most 2p(n) OPT for some polynomial p, where n is the number of nodes of the given given graph. graph. Then Then P = NP. (Hint: (Hint: Show Show that under this this assumptio assumption, n, one can decide whether a graph has a Hamiltonian circuit.)
·
1.4
Constr Construct uction ion,, eval evaluat uation ion,, and and decisi decision on
Let us investiga investigate te the relation between between the construction construction,, evaluati evaluation, on, and decision problem associated with a problem P NPO.
∈
∈ NPO. Then T T 1. P dec dec ≤P P eval eval and P eval eval ≤P P dec dec . T P . (Since this is a negative statement about P , 2. P eval P , it means that eval ≤P P . T P eval P .) eval ≤P P holds for all cuts P .) T Proof. We start with the first statement: P dec dec ≤P P eval eval is seen easily: On input x, bin(B bin(B ), we can compute OPT(x OPT( x) using the oracle P eval eval , and compare it with B . T P eval trickier: Since m is polynomial polynomial time computable, eval ≤P P dec dec is a little trickier: | | q ( x ) OPT(x OPT(x) ≤ 2 for some polynomial q . Using Using binary binary search, search, we can find
Let P Theorem 1.6. Let P
OPT(x OPT(x) with q (n) oracle queries.
1.5. NP-hard optimization problems
7
For the second statement, note that when we have an optimum solution, then we can compute OPT(x OPT(x). If P If P dec -complete, then the optimization optimization problem is not harder than dec is NP-complete, the decision decision problem. problem. Theorem 1.7. Let P
P dec dec .
-complete.. ∈ NPO such that P dec dec is NP-complete
Then P
≤PT
Proof. Assume that P is a max maximi imizati zation on proble problem, m, the minimi minimizat zation ion case is symmetric. Let q be a polynomial such for every x I and y S(x S(x), | | q ( x ) y q ( x ) and m(x, m(x, y) is bounded by 2 . For given x, fix some polynomial time computable total order on the set 0, 1 ≤q(|x|) . For y 0, 1 ≤q(|x|) , let λ(y) be the rank that y has with respect to this order. ˆ from P by defining a new objective function. We derive a new problem P ˆ is given by The objective function of P
∈
| |≤ | | { }
∈
∈{ }
m(x, m( ˆ x, y) = 2q(n)+1 m(x, m(x, y ) + λ(y ). Note that the first summand is always bigger than 2 q(n)+1 > λ(y ). This This implies that for all x and y1 , y2 S(x S(x), m(x, m( ˆ x, y1 ) = m(x, m( ˆ x, y2 ). Furthermore, urthermore, if m(x, m( ˆ x, y1 ) m(x, m( ˆ x, y2) then m(x, m(x, y1 ) m(x, m(x, y2 ). Thus Thus if y Sˆ∗ (x) then y S∗ (x), too. too. (Her (Heree Sˆ∗ (x) is the set of optimum solutions of x as an ˆ .) instance instance of P .) P ˆ x): We An optimal solution y of Sˆ∗ (x) can be easily derived from OPT(x OPT( ˆ x) with 2 q(n)+1. This remainder compute the remainder of the division OPT(x OPT( T P ˆ T P ˆeval . is λ(y) from which we can obtain y. Thus P P P eval T ˆ . Since P ˆeval ˆdec By Theorem 1.6, P NP and P dec eval P P dec dec dec dec is NP-complete T ˆ by assumption, P dec dec P P dec dec . Using transitivity, we get P P P dec dec .
∈
≥
∈
≤
1.5
NP-hard
≥
≤
≤ ∈
∈
≤
≤
optimization problems
Definition 1.8. An optimization problem P is NP-hard if for all L
L
≤
T P
P . P .
Theorem 1.9. If P is NP-hard and P
∈ NP,
∈ PO, then P = NP.
Exercise 1.2. Prove Theorem 1.9.
∈ NPO. If P dec dec is NP-hard, then P is NP-hard. Proof. Since P dec dec is NP-hard, L ≤P P dec dec for all L ∈ NP. Since many-one T is transitive, we reducibility is a special case of Turing reducibility and ≤P T P . get L ≤P P .
Let P Theorem 1.10. Let P
Some authors prefer to call an optimization problem NP-hard if P dec dec is NP-hard. -hard. Theore Theorem m 1.1 1.100 states states that this definitio definition n is potential potentially ly more restrictive than our definition.
8
1. Complexity of optimization problems
Corollary 1.11. If P = NP, then PO = NPO
Proof. There is a problem P in NPO such that P dec dec is NP-hard, for instance ∆-TSP. If P If P would belong to PO, then also P dec P by Theorem 1.7, dec a contradiction.
∈
2
App Approxi roxima mati tion on algo algori rith thms ms and and approximation classes
In the most general sense, an approximation algorithm is an algorithm that given a valid instance x is able to compute some feasible solution. Definition Definition 2.1. A deterministic Turing machine A is an approximation
algorithm for an optimization problem P = (I, (I, S, m, goal) if 1. the running running time A is polynomial, 2. A(x)
∈ S(x S(x) for all x ∈ I.
Of course, there are good and not so good approximation algorithms and we develop a framework to measure the quality or approximation performance of such an algorithm.
∈
1. Let P be an optimization problem, x I, and y S(x S(x). The performance performance ratio of y with respect to x is defined as
Definition 2.2.
PR(x, PR(x, y ) = max
→
m(x, m(x, y ) OPT(x OPT(x) , OPT(x OPT(x) m(x, m(x, y)
.
∈
1
2. Let α et α : N algorithm A is an α-approximation Q. An approximation algorithm A algorithm, algorithm, if for all x I,
∈
PR(x, PR(x, A(x))
≤ α(|x|).
The definition of PR(x, PR(x, y) basically means that in the case of a minimization problem, we measure how many times the objective value of the computed computed solution exceeds exceeds the objective objective value of an optimum optimum solution. In the case of a maximization problem, we do the same but we take the reciprocal. This may seem strange at a first glance but it has the advantage that we can treat minimization and maximization problems in a uniform way. (Be aware though that some authors use m( x, y )/ OPT(x OPT(x) to measure the approximati approximation on performance in case of maximization maximization problems. But this is merely a question of faith.)
→
1. Let F et F be some set of functions N Q. An optimization problem P NPO is contained in the class F F -APX if there is an f F such that there exists an f -approximation f -approximation algorithm for P . P .
Definition 2.3.
∈
1
∈
Note that m only attains positive values. Thus, the quotient is always defined.
9
10
2. Approximation algorithms and approximation classes 2. APX := O(1)(1)-APX.
(I hope that the elegant definition above clarifies why PR was defined for maximization maximization problems as it is.) There is a well-kno well-known wn 2-approximati 2-approximation on algorithm for ∆-TSP that is based on minimum spanning trees, thus ∆-TSP
∈ APX.
Even stronger is the concept of a polynomial time approximation scheme. scheme. Definition 2.4. A deterministic Turing machine A is a polynomial time ap-
proximation scheme (PTAS) (PTAS) for an optimization problem P problem P = (I,S,m, goal) if on input x, for all small enough > 0,
1. the running running time time of A of A is polynomial polynomial in the size of x of x (but not necessarily in ), and 2. A(x, ) is a feasible solution for x with performance ratio 1 + . We do not have to distinguish between minimization and maximization probl problem ems. s. If a solu soluti tion on y has performan performance ce ratio ratio 1 + in the case of a 1 maximization problem, then we know that m(x, m( x, y ) 1+ OPT(x). We have 1+ OPT(x
≥
1 =1 1+
− 1 + ≥ 1 − ,
which is exactly what we want. Definition 2.5. PTAS is the class of all problems in NPO that have a PTAS.
We have PO
⊆ PTAS ⊆ APX
If P = NP, then both inclusions are strict. Under this assumption, a problem in APX PTAS is Maximum Satisfiabili Satisfiability ty (see the next chapters chapters for a proof ), a problem in PTAS PO is Knapsack (solve the next exercise for a proof).
\
\
Problem 2.6. Knapsack is the following problem:
Instan Instancces: Solutions: Measure:
rational ational numb numbers w1 , . . . , wn (weights), p1 , . . . , pn (profits), and B (capacity bound) such that wν B for all ν . I 1, . . . , n such that wi B
⊆{
}
∈
i I
≤
≤
pi , the total profit of the items packed
i I
Goal:
∈
max
We may assume w.l.o.g. that all the pν are natural numbers numbers.. If this is not the case, assume that pν = xν /yν with gcd(x gcd(xν , yν ) = 1. 1. Let Let Y = y1 yn . We now replace pν by pν Y N. Any Any knapsac knapsack k that maximizes maximizes the old objective objective function also maximizes maximizes the new one. The size of the instance is only polynomially larger. (Note that we encode all inputs in binary.)
· ∈
···
2.1. Gap problems
11
1. Show that there there is an algorithm for Knapsack with running time polynomial in n and P := max1≤ν ≤n pν . (Com (Comput putee by dynamic programming sets of indices I (i, p) such that
Exercise 2.1.
• ν ≤ i for all ν ∈ I (i, p), • the sum of the pν with ν ∈ I (i, p) is exactly p, and • the sum of all wν with ν ∈ I (i, p) is minimum among all such set of indices.)
2. Show Show that we get a PTAS PTAS out of this pseudop pseudopoly olynom nomial ial algorithm algorithm as follows:
• Let S Let S = P/n and pˆν = pν /S for 1 ≤ ν ≤ n. • Find an optimum solution for the instance instance w1 , . . . , wn , pˆ1 , . . . , pˆn , and B .
Remark 2.7. The running of the PTAS constructed in the previous exercise
is also polynomial in 1 . This is called an fully polynomial time approximaapproximation scheme scheme (FPTAS) (FPTAS).. The corresp orrespond onding ing complex omplexity ity class class is denote denoted d by FPTAS. Exercise Exercise 2.2. A super fully polynomial time approximation scheme is a
PTAS whose running time is polynomial in log 1 . Show Show that that if Knapsack has a super fully polynomial time approximation scheme, then P = NP.
2.1 2.1
Gap Gap probl roblem emss
A promise problem is a tuple of languages Q = (L, U ) with L U . U . (Think of U as the univers universee of admiss admissibl iblee inputs inputs.) .) A Turing uring mac machin hinee decide decidess a promise problem, if for all x U , U , M ( M (x) = 1 if if x L and M ( M (x) = 0 if if x U L. On inpu inputs ts not not in U , U , M may output output whatever whatever it wants. wants. Since we do not have to care about the behaviour of M on inputs not in U , U , we can also think that we get an input with the additional promise that it is in U . U . The The eleme element ntss in L are often called yes-inst yes -instances, ances, the elements elements in U L are called no-instances, no -instances, and the elements not in U are called don’t care-inst care -instances ances.. ”Ordinary” ”Ordinary” decision problems problems are a special case of promise ∗ problems, we just set U = 0, 1 . Many-one-reductions can be extended to promise problems in a natural way way. Let Q = (L, U ) and Q = (L , U ) be two promise problems. Q is polynomial time many-one reducible to Q if there is a polynomial time computable function f such that
⊆
∈
∈ \
∈
\
{ }
∈ L =⇒ f ( f (x) ∈ L and x ∈ U \ L =⇒ f ( f (x) ∈ U \ L x
12
2. Approximation algorithms and approximation classes
That means that yes-instances are mapped to yes-instances and no-instances are mapped to no-instances. A promise problem Q is C-hard for some class C of decision or promise problems, if every problem in C is polynom p olynomial ial time many-one reducible to Q. Let P = (IP , SP , mP , goal) be an optimization optimization problem and Definition 2.8. Let P a < b. b . gap(a, gap(a, b)-P is the promise problem (L, U ) where
{ |
≤ a or OPT(x OPT(x) ≥ b}
{ |
≥ b} ≤ a}
U = x OPT(x OPT(x) and L=
x OPT(x OPT(x) x OPT(x OPT(x)
{ |
if goal = max if goal = min
That is, we get an instance x and the promise that the objective value is at most a or at least b and we shall decide which of these two options is the case. case. There There is a differe difference nce in the definit definition ion of L for maximization and minimization problems because the yes-instances shall be the inputs with solutions solutions that have a ”good” objective objective value. We will also allow a and b two N that depend on x . be functions N
→
||
gap(a, b)-P is NP-hard for polynomial time computable Theorem Theorem 2.9. If gap(a, functions a and b with input given in unary and output given in binary, then there is no α-approximati -approximation on algorithm algorithm for P with α < b/a, b/a , unless unless P = NP. Proof. Suppose on the contrary that such an algorithm A exists exists.. We only show the case goal P = min, the other case is treated similarly. similarly. Since gap(a, gap(a, b)-P )-P is NP-hard, there is a polynomial time many-one reduction f from SAT to gap(a, gap(a, b)-P )-P .. We design a polynomial time algorithm for SAT as follows: Input: formula φ in CNF
1. Compute x = f ( f (φ) and y = A(x).
||
2. If mP (x, y) < b( b ( x ), then accept, else reject.
∈
Let us see why why this this algorithm algorithm is correct correct.. If φ SAT, then OPTP (x) a(x) and mP (x, y) α( x ) OPTP (x) < b( b ( x ).
≤
≤ ||· || If φ ∈ / SAT, then OPTP (x) ≥ b(|x|) and mP (x, y) ≥ OPTP (x) ≥ b(|x|).
Thus the algorithm works correctly. It is obviously polynomial time. Therefore, P = NP.
2.2. Approximation preserving reductions and hardness
13
In Exercise 1.1, we have seen that there is no polynomial p such that TSP can be approximated within 2 p(n) where n is the number of nodes of the given given graph. Howeve However, r, note that n is not the size of the instance but 1 2 O(p O(p(n)n ). Thus gap(n, gap(n, 2n )-TSP is NP-hard Since TSP NPO, we get the following result. −
∈
Theorem 2.10. If P = NP, then APX NPO.
We can always approximate TSP within 2O(|x|) where x is the given instance, since with x symbols we can encode integers up to 2 O(|x|) . Thus Thus TSP is contained in the class exp-APX, as defined below.
||
Definition 2.11. exp-APX = 2p p is a polynomial -APX.
{ |
}
Thus the theorem above can be strengthened to the following statement.
Theorem 2.12. If P = NP, then APX exp-APX. Exercise 2.3. What is the difference between exp-APX and NPO?
2.2
Appro Approxima ximation tion preser preserving ving reductions reductions and hardnes hardnesss
Let P and P be two two optimi optimizat zation ion problems problems.. If P is reducible to P (in some sense to be defined), then we would like to turn approximate solution of P back into approximate approximate solutions of P . P . That is, we do not only need a function that maps instances of P of P to instances of P of P , we also need to transfer solutions of P back to solutions of P like we did for # P functions. Many of the reductions between NP-complete problems give also this second function for free. But what they usually do not do is that they preserve approximation factors. Problem 2.13. Maximum Clique ( Clique) is the following problem:
Inst Instan ancces: es: Solu Soluti tion ons: s: Measure: Goal:
grap graph h G = (V, E ) all cliq clique uess of G, i.e., all C with u = v , u, v E #C , the size of the clique max
{ }∈
⊆ V such that for all u, v ∈ C
Problem 2.14. Vertex Cover ( VC) is the following problem:
Inst Instan ancces: es: Solu Soluti tion ons: s: Measure: Goal:
grap graph h G = (V, E ) all subs subset etss C of V of V such that for each u, v
∅
{ } ∈ E , C ∩ {u, v} =
#C min
Exercise Exercise 2.4. There is an easy reduction Cliquedec
maps G to its complement.
≤P VCdec that simply
14
2. Approximation algorithms and approximation classes 1. How does does one get a clique clique of G of G from a vertex cover of the complement? 2. Assume we have a vertex cover that is a 2-approximatio -approximation. n. What approximation do we get for Clique from this?
Let P, P Definition 2.15. Let P,
∈ NPO. P is reducible to P by an approximation preserving reduction (short: P is AP-reducible to P or even shorter, P ≤AP P ) if there are two functions f, g : {0, 1}∗ × Q+ → {0, 1}∗ and an α ≥ 1 such that 1. for all x ∈ IP and β > 1, f ( f (x, β ) ∈ IP , 2. for all x ∈ IP and β > 1, if SP (x) = ∅ then SP (f ( f (x, β )) = ∅, 3. for all x ∈ IP , y ∈ SP (f ( f (x, β )), )), and β > 1, g(x,y,β) x,y,β ) ∈ SP (x),
4. f and g are are deterministic deterministically ally polynomi polynomial al time computabl computablee for fixed fixed β > 1, 5. for all x all x IP and all y all y SP (f ( f (x, β )), )), if y if y is a β a β -approximate solution of f ( f (x, β ), then g(x,y,β) x,y,β ) is an (1 + α(β 1))-approximate 1))-approximate solution of x.
∈
∈
−
(f , g , α) α) is called an AP-reduction from P to P .2 Lemma 2.16. If P
≤AP P and P ∈ APX, then P ∈ APX.
Proof. Let (f (f , g , α) α) be an AP-reduction from P to P and let A be a β approximati approximation on algorithm algorithm for P . Given x IP , A(x) := g(x, A (f ( f (x, β )), )), β ) is a (1+ α(β 1))-approximate solution for x. This follows directly from the definition of AP-reduction. Furthermore, A is polynomial polynomial time computable.
−
Let P Exercise 2.5. Let P
∈
P . ≤AP P . Show that if P ∈ PTAS, so is P .
The reduction in Exercise 2.4 is not an AP-reduction. This has a deeper reason. While there is a 2-approximation algorithm for VC, Clique is much harder harde r to approximate. approxim ate. H˚ astad [H˚ [H˚ as99] shows that any approximation approxi mation algo− 1 rithm with performance ratio n 0 for some 0 > 0 would imply ZPP = NP (which is almost as unlikely as P = NP). 2
The functions f, g depend on the quality β of the solution y . I am only aware of one example where this dependence seems to be necessary, so usually, f and g will not depend on β .
2.2. Approximation preserving reductions and hardness
15
Problem 2.17. Maximum Independent Set ( IS) is the following problem:
Inst Instan ancces: es: Solut Solution ions: s: Measure: Goal:
grap graph h G = (V, E ) indep indepen ende dent nt sets sets of G of G, i.e., all S S with u = v , u, v / E #S max
{ }∈
⊆ V such that for all u, v ∈
Exercise 2.6. Essentially the same idea as in Exercise 2.4 gives a reduction
from Clique to IS. Show that this is an AP-reduction. Definition Definition 2.18. 2.18. Let C
reductions) if for all P and C-hard.
proble oblem m P is C-hard (under (under AP⊆ NPO. A pr ∈ C, P ≤AP P . P . P is C-complete if it is in C
≤AP is transitive. Proof. Let P ≤AP P and P ≤AP P . Let (f (f , g , α) α) and (f (f , g , α ) be the corresponding corresponding reductions. reductions. Let γ = 1 + α (β − 1) We claim that (F,G,αα (F,G,αα )
Lemma 2.19.
is an AP-reduction from P to P where
F ( F (x, β ) = f (f ( f (x, γ ), β ), G(x,y,β) x,y,β) = g(x, g (f ( f (x, γ ), y , γ ) , β ). We verify that (F,G,αα ( F,G,αα ) is indeed an AP-reduction by checking the five conditions in Definition 2.15: 1. Obvious. Obvious. 2. Obvious, Obvious, too.
∈
∈ ∈
3. Almost Almost obvious, obvious, thus thus we we give give a proof. Let x IP and y SP (F ( F (x, β )). We know that g (f ( f (x, γ ), y , β) β) SP (f ( f (x, γ )), ) ), since (f (f , g , α ) is an AP-red AP-reduct uction ion.. But then also also g(x, g (f ( f (x, γ ), y , γ ) , β ) SP (x), since (f , g , α) α) is an AP-reduction.
∈
4. Obvious. Obvious. 5. Finally Finally, if y if y is a β -approximation to f (f ( f (x, γ ), β ), then g (f ( f (x, γ ), y , β) β) is a (1+α (1+α (β 1))-approximation to f ( f (x). But then g(x, g (f ( f (x, γ ), y , β) β ), γ ) is a (1 + αα (β 1))-approximation to x, as
−
−
1 + α(1 + α (β
Lemma 2.20. Let C C-hard.
1). − 1) − 1) = 1 + αα(β − 1).
⊆ NPO. If P ≤AP P and P is C-hard, then P is also
16
2. Approximation algorithms and approximation classes
∈ C be arbitrary. Since P is C-hard, Q ≤AP P . P . Since ≤AP ≤AP P .
Proof. Let Q is transitive, Q
Thus once we have identified one APX-hard problem, we can prove the -hardness using AP-reductions. AP-reductions. A canonical candidate is of course the APX-hardness following problem: Problem 2.21 (Max-SAT). The Maximum Satisfiability problem ( Max-SAT)
is defined as follows: Inst Instan ancces: es: Solutio Solutions: ns: Meas Measur ure: e: Goal:
form formul ulas as in CNF CNF Boole Boolean an assign assignmen ments ts to the variab variables les the the numb number er of claus clauses es satis satisfie fied d max
Proposition 2.22. Max-SAT is APX-hard.
The proof of this proposition above is very deep, we will spend the next few weeks with it. Exercise 2.7. Give a simple 2-approximation algorithm for SAT.
2.3 2.3
Furth urther er exer exerci cise sess
Here in an NPO-complete problem. Problem 2.23. Maximum Weighted Satisfiability is the following prob-
lem: Instan Instancces: Solutio Solutions: ns: Measure: Goal:
Boole Boolean an formul formula a φ with variables x1 , . . . , xn having nonnegative weights w1 , . . . , wn Boole Boolean an assign assignmen ments ts α : x1, . . . , xn 0, 1 that satisfy φ n max 1, i=0 wi α(xi ) max
{
}
{
}→{ }
1. Show that every maximization maximization problem problem in NPO is APreducible ducible to Maximum Maximum Weighte Weighted d Satisfi Satisfiabil ability. ity. (Hint: (Hint: Constr Construct uct an -machine that guesses guesses a solution y solution y to input x input x and computes m(x, m(x, y). NP-machine Use a variant of the proof of the Cook-Karp-Levin Theorem to produce an appropriate formula in CNF. Assign only nonzero weights to variables that contain the bits of m(x, m(x, y).)
Exercise 2.8.
2. Show that every minimizatio minimization n problem problem in NPO is AP-reducible to Minimum Weighted Satisfiability. 3. Show that Maximum Weighted Weighted Satisfiability Satisfiability is AP-re AP-reducible to Minimum Weighted Satisfiability and vice versa. 4. Conclude Conclude that Maximum (Minimum) (Minimum) Weighted Weighted Satisfiability Satisfiability is NPOcomplete
2.3. Further exercises
17
The world of optimization classes
⊆ PTAS ⊆ APX ⊆ exp-APX ⊆ NPO All of these inclusion are strict, provided that P = NP. PO
Under Under this this
assumption, assumption, we have have for instance instance
• Knapsack ∈ PTAS \ PO • TSP ∈ exp-APX \ APX • Weighted Satisfiability Satisfiability ∈ NPO \ exp-APX. The The goal goal of the the next next chapt hapter erss is to pro prove that that Max-SAT is in APX PTAS provided that P = NP.
\
3
Prob Probab abil ilis isti tica call llyy chec check kable able proofs roofs and inapproximability
3.1
Probab Probabili ilisti stical cally ly check checkabl able e proofs proofs (PCP (PCPs) s)
3.1.1 3.1.1
Probab Probabilis ilistic tic verifier verifierss
A polynomial time probabilistic verifier is a polynomial time probabilistic Turing machine that has oracle access to a proof π 0, 1 ∗ in the following way: way: The The proof π induces a function 0, 1 log(|π|) 0, 1 by mapping | | log( π ) b 0, 1 to the bit of π of π that stands in the position that is encoded by the binary representation b. By abuse of notation, we will call this function again π . If the verifier queries a bit outside the range of π, then the answer will be 0. A verifier described above may query π several times and each query may ma y depend depend on previo previous us queries. queries. Such Such a behavio behaviorr is called adaptive. adaptive. We need a more restricted kind of verifiers, called nonadaptive: nonadaptive : A nonadaptive verifier gets the proof π again as an oracle, but in a slightly different form: The verifier can write down several positions of π at one time. time. If it enters enters the query state, it gets the values values of all the positions that it queries. But the verifier may enter the query state only once, i.e., the verifier has to decide in advance which bits it wants to query. A nonadaptive probabilistic verifier is called (r ( r(n), q (n))-restricted if it uses r (n)-bits of randomness and queries q (n) bits of π for all n and all inputs x of of length n.
{ }
∈{ }
∈{ } →{ }
→
languag guagee L belongs elongs to the class class Definition Definition 3.1. Let r, q : N N. A lan PCP[r, q ] if there exists a (r, q )-restricted nonadaptive polynomial time probabilistic verifier such that the following holds: 1. For any x any x
∈ L, there is a proof π such that Pr[V π (x, y) = 1] = 1. 1. y
∈
2. For any x any x / L and for all proofs π , Pr[V π (x, y ) = 0] y
≥ 1/2.
The probabilities are taken over the the random strings y . 18
3.1. Probabilistically checkable pro ofs (PCPs)
19
In other words, if x is in L, then there is a proof π that convinces the verifier regardless of the random string y . If x is not in L, then the verifier will detect a “wrong” proof with probability at least 1/ 1 /2, that is, for half of the random strings. Since the verifier is r(n)-restricted, there are only 2 r(n) (relevant) (relevant) random strings. For any fixed random string, the verifiers queries at most q (n) bits of the proof. Therefore, for an input x of length n, we only have to consider proofs of length q(n)2r(n) , since the verifier cannot query more bits than that.
3.1.2 3.1.2
A differen differentt charac character teriza izatio tion n of NP
Once we have defined the PCP classes, classes, the obvious obvious question question is: What is this good for and how is it related to other classes? While complexit complexity y theorists also like to answer the second part of the question without knowing an answer to the first part, here the answer to the second part also gives the answer to the first part. Let R and Q denote sets of functions N N. We generalize the notion of PCP[r, q] in the obvious way:
→
PCP[R, Q] =
PCP[r, q ].
r R,q Q
∈ ∈
The characterization of NP by polynom p olynomial ial time verifiers immediately immediately yields yields the following following result. result. [0, poly(n poly(n)]. )]. Proposition 3.2. NP = PCP[0, In the theore theorem m above, above, we do not use the randomne randomness ss at all. all. The next next result, the celebrated PCP theorem [ ?], shows that allowing a little bit of randomness reduces the number of queries dramatically. O(1)]. Theorem 3.3 (PCP Theorem). NP = PCP[O(log n), O(1)]. What What does does this this me mean? an? By allo allowi wing ng a litt little le rand random omne ness ss—n —not otee that that O(log n) are barely sufficient to choose O(1) bits of the proof at random— and a bounded bounded probabili probability ty of failure failure,, we can chec check k the proof π by just just reading a constant number of bits of π ! This is really astonishing. Exercise Exercise 3.1. Show that PCP[O(log n), O(1)]
random strings are there?)
⊆ NP.
(Hin (H int: t: How How man many y
The other direction is way more complicated, we will spend the next few lecture lecturess with with its proof. We will will not present present the original original proof by Arora et + al. [ALM 98] but a recent and—at least compared to the first one—elegant proof by Irit Dinur [Din07].
20
3. PCP and inapproximability
3.2 3.2
PCPs PCPs and and gap gap pro probl blem emss
The PCP theorem is usually used to prove hardness of approximation results. Dinur’s proof goes the other way around, we show that the statement of the PCP theorem is equivalent to the NP-hardness of some gap problem Theorem 3.4. The following two statements are equivalent:
1. NP = PCP[O(log n), O(1)]. O(1)]. 2. There is an > 0 such that gap(1
− , 1)1)-Max-3-SAT is NP-hard.1
⇒
Proof. “= ”: Let L be any NP-complete -complete language. language. By assumption assumption,, there is an (r ( r(n), q )-restricte )-restricted d nonadaptiv nonadaptivee polynomial polynomial time probabilisti probabilisticc verifier V with r (n) = O(log n) and q = O(1). O(1). We can assum assumee that V always queries exactly q bits. Let x be an input for L of length n. We will will constru construct ct a form formul ulaa in 3-CNF φ in polynomial time such that if x L, then φ is satisfiable and if x / L, then every assignment can satisfy at most a fraction of 1 of the clauses for some fixed > 0. For each position i in the proof, there will be one Boolean variable vi . If vi is set to 1, this will mean that the corresponding ith bit is 1; if it is set to zero, then this bit is 0. Since we can restrict ourselves to proofs of length q 2r(n) = poly(n poly(n), the number of these variables is polynomial. For a random string y, let i(y, 1), 1), . . . , i( i(y, q ) denote the positions of the bits bits that that the verifier verifier will query. query. (Note (Note that that the verifier verifier is nonadap nonadaptiv tive, e, hence these position can only depend on y .) Let Ay be the set of all q tuples (b (b1, . . . , bq ) 0, 1 q such that if the i(y, j )th bit of the proof is bj for 1 j q , then the verifier will reject (with random string y). For each tuple (b (b1 , . . . , bq ) Ay , we construct a clause of q literals, that is true iff the variables vi(y,1) y, 1) , . . . , vi(y,q) y,q ) do not take the value b1 , . . . , bq , i.e,
∈
∈
−
≤ ·
≤ ≤
b1 vi1(−y,1) y, 1)
∈{ }
∈
b . ∨ · · · ∨ vi1(−y,q) y,q ) q
(Here, (Here, for a Boolean Boolean varia variable ble v , v 1 = v and v 0 = v¯.)
The formula φ has Ay 2r(n) 2q+r(n) = poly(n poly(n) many clause clauses. s. These These clauses have length q. Like Like in the reduction reduction of SAT to 3SAT, for each such clause c, there are q 2 clauses c1 , . . . , cq−2 of length three in the variables of c and some additional variables such that any assignment that satisfies c can be extended to an assignment that satisfies c1 , . . . , cq−2 and conversely, the restriction of any assignment that satisfies c1 , . . . , cq−2 satisfies c, too. This replacement can be computed in polynomial time. The formula φ can be computed computed in polynomial polynomial time: We enumerate enumerate all (polynomially (polynomially many) random strings. For each such string y, we simulate 1
≤| | −
≤
Instead of stating the absolute bounds (1 − )m and m, where m is the number of clauses of the given instance, we just state the relative bounds 1 − and 1. This is very very convenient here, since there is an easy upper bound of the objective value, namely m.
3.2. PCPs and gap problems
21
the verifier V to find out which bits he will query. Then we can give him all the possible answers to the bits he queried to compute the sets Ay . If x L, then there will be a proof π such that V π (x, y) = 1 for every random string y . Theref Therefore, ore, if we set the variab variables les of φ as given by this proof π, then φ will be satisfied. If x / L, then for any proof π, there are at least 2 r(n) /2 random strings y for which V π (x, y ) = 0. For each each such such y , one clause corresponding to a tuple in Ay will not be satisfied. In other words, for any assignment, 2 r(n) /2 clause clausess will will not be satisfi satisfied. ed. The total total numbe numberr of clauses clauses is bounded bounded by q + r ( n ) (q 2)2 . The fraction of unsatisfied clauses therefore is
∈
∈
−
r(n)
≥ (q −2 2)2/q+2r(n) ≥ 2−q−1/(q − 2), 2),
which is a constant. “ =”: By Exercise 3.1, it suffices to show that NP PCP[O(log n), O(1)]. Let L NP. By assumption, there is a polynomial time computable function f such that
⇐ ∈
⊆
∈ L =⇒ f ( f (x) is a satisfiable formula in 3-CNF, x∈ / L =⇒ f ( f (x) is a formula in 3-CNF such that every assignment satisfies at most (1 − ) of the clauses. x
We construct construct a probabilistic probabilistic verifier verifier as follows: follows: Input: input x, proof π
1. Comput Computee f ( f (x). 2. Randomly Randomly select select a clause c from f ( f (x). 3. Interpret Interpret π as an assignment to f ( f (x) and read the bits that belong to the variables in c. 4. Accept Accept if the selected clause clause c is satisfied. Reject otherwise. Let m be the number of clauses of f ( f (x). To select select a clause clause at random random,, the verifier reads log m rando random m bits bits and inte interpr rpret etss it as a number umber.. If it “selec “selects” ts” a nonexis nonexistin tingg clause clause,, then then it will accept. accept. So we can think think of m being a power of two at the expense of replacing by /2. /2. Now assume x L. Then f ( f (x) is satisfiable and therefore, there is a proof that will make the verifier always accept, namely a satisfying assignment of f ( f (x). If x If x / L, then no assignment assignment will satisfy satisfy more than 1 of the clauses. In particular, the probability that the verifier selects a clause that is satisfied is at most 1 . By repeating this process for a constant number of times, we can bring the error probability down to 1/ 1 /2. Since f ( f (x) is in 3-CNF, the verifier needs O(log m) = O(log x ) random bits, and it only queries O(1) bits of the proof.
∈
∈
−
−
||
22
3. PCP and inapproximability
Let c be a clause of length q length q. Construct clauses c1 , . . . , cq−2 of Exercise 3.2. Let c length three in the variables of c and some additional variables such that any assignment that satisfies c can be extended to an assignment that satisfies c1 , . . . , cq−2 and conversely, the restriction of any assignment that satisfies c1 , . . . , cq−2 satisfies c, too.
Note that we get an explicit value for in terms of q . Thus Thus in order order to get good nonapproximability results from the PCP theorem, we want q to be as small as possible.
3.3 3.3
Furth urther er exer exerci cise sess
Exercise 3.3. Show that PCP[O(log n), 2] = P.
It can be shown—tadah!—that three queries are enough to capture NP; however, it is not possible to get error probability 1/ 1 /2 and one-sided error, see [GLST98] for further discussions.
A
Max-3-SAT
is
APX-hard
In this chapter, we will strengthen the result of the previous one by showing that Max-3-SAT is in fact APX-hard. -hard. We do this in severa severall steps. steps. First, First, we show that any maximization problem in APX is AP-reducible AP-reducible to Max-3-SAT. Second, we show that for every minimization problem P , P , there is a maxi mization problem P such that P AP P . This will conclude the proof. Our proof of the PCP-Theorem will also yield the following variant, which we will use in the following.
≤
There are are > 0 and polynomi polynomial al time Theorem Theorem A.1 (PCP-Theorem’). There computable functions f PCP PCP and gPCP such that for every formula ψ in 3CNF: 1. f PCP PCP (ψ ) is a formula in 3-CNF, 2. if ψ is satisfiable, so is f PCP PCP (ψ ), 3. if ψ if ψ is not satisfiable, then any assignment can satisfy at most a fraction of 1 of the clauses in f PCP PCP (ψ ),
−
4. if a is an assignment for f PCP PCP (ψ ) that satisfies more than a fraction of 1 of the clauses, then gPCP(ψ, a) is an assignment that satisfies ψ.
−
Theorem A.2. Let P = (IP , SP , mP , max) be a maximization problem in APX. Then P AP Max-3-SAT.
≤
Proof. Our goal is to construct an AP reduction (f (f , g , α) α) from P to Max-3-SAT. Let f PCP constructed in Theorem A.1 PCP and gPCP be the functions constructed and let be the corresponding corresponding constant. constant. Let A be a b-approximation algorithm for P . P . Let 1+ α = 2(b 2(b log b + b 1) . Our goal is to define the functions f and g given β . Let r = 1 + α(β 1). If r < b, b , then
−
−
β=
r
−1 +1 = α
−
r 1 2(1 + ) b log b + b
·
−1
+1 <
+1 2k (1 + )
where k = logr b . The last inequality follows from k
log b r log b b log b + b − 1 ≤ log +1 ≤ +1 ≤ . r r−1 r−1 23
(A.1)
24
A. Max-3-SAT is APX-hard
Let µ(x) = mP (x, A(x)). Since A is a b-approximation algorithm, µ(x) OPTP (x) bµ( bµ(x). The following Turing machine computes f : f :
≤
Input: x
≤
∈ {0, 1}∗, β ∈ Q+
1. Construct Construct formulas formulas φx,i in 3-CNF that are true if OPT P (x) i. (These formulas φx,i can be uniformly uniformly constructed constructed in polynomial polynomial time, cf. the proof of Cook’s theorem.)
≥
≤ ≤
2. Let ψx,κ = f PCP κ k. PCP (φx,µ( x,µ(x)rκ ), 1 By padding with dummy clauses, we may assume that all the ψx,κ have the same number of clauses c. 3. Return ψx =
k κ=1 ψx,κ .
The function function g is computed as follows:
∈ {0, 1}∗, assignment a with performance ratio β 1. If b If b ≤ 1 + α(β − 1), then return A(x).1
Input: x
2. Else let κ0 be the largest κ such that gPCP(φx,µ( x,µ(x)rκ ,a ) satisfies φx,µ( x,µ(x)rκ . (We restrict a to the variables of φx,µ( x,µ(x)rκ .) 3. This satisfying satisfying assignment assignment corresponds corresponds to a feasible feasible solution y with κ mP (x, y ) µ(x)r 0 . Return y .
≥
≤
−
If b 1 + α(β 1), then we return A(x). This is a b-approximation by assumption. Since b 1 + α(β 1), we are done. Therefore, Therefore, assume that b > 1 + α(β 1). We have OPTMax-3-SAT (ψx )
≤
−
−m
Max-3-SAT
−
(ψx , a)
≤ OPT
Max-3-SAT
(ψx )
β
− 1 ≤ kc β − 1 . β
β
Let βκ denote the performance ratio of a with respect to ψx,κ , i.e., we view a as an assignment of ψx,κ . We have OPTMax-3-SAT (ψx )
−m
Max-3-SAT
(ψx , a)
≥ OPT
Max-3-SAT
= OPTMax-3-SAT
≥ 2c · βκβ−κ 1 . 1
Here is the promised dependence on β .
− m -3βκ − 1 (ψx,κ) (ψx,κ)
Max
βκ
SAT
(ψx,κ , a)
25 The last inequality follows from the fact that any formula in CNF has an assignment that satisfies at least half of the clauses. This yields
· − ≤ kc β −β 1
c βκ 1 2 βκ and finally βκ
≤ 1 − 2k(β1 − 1)/β . 1)/β
Exploiting (A.1), we get, after some routine calculations, βκ
≤ 1 + . ≥ −
This means that a satisfies at least a fraction of 1/β 1 /βκ 1 of the clauses of ψx,κ . Then gPCP(a) satisfies satisfies φx,µ( x,µ(x)rκ if and only if φx,µ( x,µ(x)rκ is satisfiable. κ This is equivalent to the fact that OPT P (x) µ(x)r . By the definition of κ0 , µ(x)rκ0 +1 > OPTP (x) µ(x)r κ0 .
≥
≥
This means that m P (x, y) µ(x)rκ0 . But then y is an r-approximate solution. Then we are done, since r = 1 + α(β 1) by definition.
≥
−
Theorem A.3. For every minimization problem P imization problem P APX such that P AP P .
∈
≤
∈ APX, there is a max-
Proof. Let A be a b-approximation algorithm for P . P . Let µ(x) = mP (x, A(x)) for all x IP . Then µ(x) b OPTP (x). P has the same instances and feasible solutions as P . P . The objective function is however different:
∈
≤
mP (x, y ) =
(k + 1)µ 1)µ(x) µ(x)
≤
− k mP (x, y)
if mP (x, y) otherwise
≤
≤ µ(x)
where k = b . We have µ(x) OPTP (x) (k + 1)µ 1)µ(x). This means, that A is a (k (k + 1)-approximation algorithm for P . Hence, P APX. The AP reduction (f ( f , g , α) α) from P to P is defined as follows: f ( f (x, β ) = x for all x IP . (Note that we do not need any dependence on β here). Next, we set y if mP (x, y ) µ(x) g(x,y,β) x,y,β) = A(x) othe otherw rwis isee
∈
∈
≤
And finally, α = k + 1. Let y be a β -approximate solution to x under mP , that is, RP (x, y) = OPTP (x)/ mP (x, y ) β . We have to show that RP (x, y) 1 + α(β 1).
≤
≤
−
26
A. Max-3-SAT is APX-hard We distinguish two cases: The first one is m P (x, y ) mP (x, y ) =
≤ µ(x). In this case, case,
(k + 1)µ 1)µ(x) mP (x, y ) k (k + 1)µ 1)µ(x) OPTP (x)/β k (k + 1)µ 1)µ(x) (1 (β 1)) OPT OPTP (x) k β 1 OPTP (x) + OPTP (x) k β 1 OPTP (x) + (k + 1)µ 1)µ(x) k OPTP (x) + (β ( β 1)(k 1)(k + 1)µ 1)µ(x)/r
− − − − − − − −
≤ ≤ ≤ ≤ ≤ ≤ (1 + α(β − 1)) 1)) OPTP (x).
This completes the first case. For the second case, note that mP (x, g(x, y )) = mP (x, A(y)) Thus, P
≤ b OPTP (x) ≤ (1 + α(β − 1)) OPT OPTP (x).
≤AP P .
Now Theorems A.2 and A.3 imply the following result. Theorem A.4. Max-3-SAT is APX-hard.
A.1 A.1
Furth urther er exer exerci cise sess
particular ular,, Clique Exercise A.1. Show that Max-3-SAT AP Clique. (In partic does not have a PTAS, unless P = NP.)
≤
The k th cartesian product of a graph G = (V, E ) is a graph with nodes and there is an edge between ( u1 , . . . , uk ) and (v (v1, . . . , vk ) if either ui = vi or ui , vi E for all 1 i k . V k
{
}∈
≤ ≤
1. Prove Prove that if G has a clique of size s, then Gk has a clique of size sk .
Exercise A.2.
2. Use this to show that if Clique apply Exercise A.1.
∈ APX, then Clique ∈ PTAS.
Now
H˚ astad [H˚ [H˚ as99] shows that any approximation approxim ation algorithm algor ithm with perforperf or− 1 mance ratio n 0 for some 0 > 0 would imply ZPP = NP On the other hand, achieving a performance ratio of n is trivial. trivial.
4
The long code ode
The next chapters follow follow the ideas of Dinur Dinur quite closely closely.. The Diplomarbeit Diplomarbeit by Stefan Senitsch [Sen07] contains a polished proof with many additional details which was very helpful in preparing the next chapters. Let n denote the set of all Boolean functions 0, 1 n 0, 1 and n− the set of all functions 1, 1 n 1, 1 . These are essentia essentially lly the same x objects, x 2x + 1 (or x ( 1) ) is a bijection that maps 1 to 1 and 0 to 1. Note that we identify 1 (true) with 1 and 0 (false) with 1. For our purposes, it is more convenient to work with n− and this interpretation of true and false.
B
→ −
{− } → {− } → −
Definition Definition 4.1. Let x
∈ {−1, 1}n.
{ } →{ }
−
B
−
B
The long code of x is the function
LCx : Bn− → {−1, 1} given by LCx (f ) f ) = f ( f (x) for all f ∈ Bn− . The long code was invented by Bellare, Goldreich, and Sudan [BGS98]. 2n By ordering the functions in n− , we can view LC x as a vector in 1, 1 2 and we will tacitly switch between these two views. 2n The relative distance between two elements in A, B 1, 1 2 is
B
{− }
∈ {− }
δ(A, B ) = Pr [A(f ) f ) = B (f )] f )],, −
∈Bn
f
i.e., it is the probability that the vectors A and B differ at a random position. position. 2n 2 Furthermore, we define a scalar product on 1, 1 by
{− }
A, B = f ∈BE
−
n [AB] AB ] = 2−2
n
⊆ {− }
Note that A, A = 1 for all A. For a set S 1, 1 n , let χS :
−
∈B ∈Bn
f
·
A(f ) f ) B (f ) f ).
Bn− → {−1, 1} be defined by
χS (f ) f ) =
f ( f (x).
∈
x S n
V n = {A : Bn− → R}. V n is a vector space of dimension 2 2 . Lemma 4.2. {χS | S ⊆ {−1, 1}n } is an orthonormal basis of V n . Proof. Let S, T ⊆ {−1, 1}n with S = T . T . First, χS , χS = 2−2 χS (f ) f )2 = 1. Let
n
f
−
∈B ∈Bn
27
28
4. The long co de
Second,
χS , χT = 2−2
n
·
χS (f ) f )χT (f ) f ) =
f ( f (x)
∈B ∈Bn x∈S
−
∈B ∈Bn
f
f
f ( f (x) =
∈
f ( f (x).
∈B ∈Bn x∈S ∆T
x T
−
f
−
Choose an x S ∆T . T . Note that such an x exists, since S = T . T . Consider the − mapping mapping on n that maps a function f to the function function g with f ( f (x) = g(x) and f ( f (y ) = g(y ) for all y = x. This This mapping mapping is an involut involution ion (i.e., (i.e., self inverse) inverse) that does not have any fixed points. Such Such an involution involution separates separates − the functions in n into two sets of the same size such that for all functions f , f , the corresponding function g is in the other set. 1 Hence
∈ B
−
B
∈Bn x∈S ∆T
f
f ( f (x) =
−
f ( f (x)
∈
f ( f (y ) = 0 .
\{x}
y S ∆T
−
∈Bn
f
Thus, Thus, the χS form an orthonorma orthonormall family family.. Since Since its size equals equals the 2n 2 dimension dimension of 1, 1 , it is spanning, too.
{− }
Once we have an orthonormal family, family, we can look at Fourier expansions. expansions. − The Fourier coefficients coefficients of a function A : n 1, 1 are given by
B → {− }
n AˆS = A, χS = 2−2
A(f ) f )
∈
x S
−
∈B ∈Bn
f
f ( f (x).
The Fourier expansion of A is A(f ) f ) =
AˆS χS (f ) f ) =
⊆{− ⊆{−1,1}n
S
AˆS
⊆{− ⊆{−1,1}n
S
f ( f (x).
x S
∈
Furthermore, Parceval’s identity holds, that is,
S
2 AˆS = A, A = 1. n
⊆{− ⊆{−1,1}
n
In what follows, we will usually consider folded strings. A 1, 1 2 is called folded over true if for all f , f , A( f ) f ) = A(f ). f ). Let ψ : 1, 1 n 1, 1 . A is called folded over ψ if for all f , f , A(f ) f ) = A(f ψ). A is simply called folded , if it is folded over true and over ψ . (This (This assumes assumes that that ψ is clear clear from the contex context.) t.) If a string string is folded folded,, then then we only need need to specify specify it on a smaller set of positions Dψ defined defined as follows: follows: Let D be a set of functions that contains exactly one function of every pair f and f and let Dψ = f D f = f ψ .
−
{− }
{ ∈ |
1
∧ }
−
∧
∈ {− } {− } →
−
To achie achieve ve this, pick pick a functio function, n, put it into into the one set and its image image under under the involutio involution n into the other. This is possible, since the involution involution has no fixed points. Repeat until until all functi function on are put into into one of the two sets. sets. I am sorry that you had to read this, this, but I was puzzled.
4.1. Properties of folded strings
4.1 4.1
29
Prope Propert rtie iess of fold folded ed str strin ings gs
Lemma 4.3. If A = LCa for some a
folded.
∈ {−1, 1}n and ψ(a) = −1 then A is
{−1, 1}n → {−1, 1}. We have A(f ) f ) = LCa (f ) f ) = f ( f (a) = (f ∧ ψ )(a )(a) = LCa (f ∧ ψ) = A(f ∧ ψ )
Proof. Let f :
and
−
−
A( f ) f ) = LCa ( f ) f ) = Lemma 4.4. Let ψ
have:
−f ( f (a) = − LCa (f ) f ) = −A(f ) f ).
∈ Bn−, let A ∈ V n be folded, and let S ⊆ {−1, 1}n. We
1. Ef ∈B f )] = 0. 0. ∈Bn [A(f )] −
2. If S is even, then AˆS = 0.
||
∈ S with ψ(y) = 1, then AˆS = 0. Proof. We start with 1: Let f ∈ Bn− . Since A is folded, A(f )+ f )+A A(−f ) f ) = 0.
3. If there is a y
From this, it follows easily that the expected value is 0. Next comes 2: We have
−
χS ( f ) f ) = Let f
−
f ( f (x) = ( 1)|S |
−
∈
x S
f ( f (x) = χS (f ) f ).
∈
x S
∈ Bn−. We have −
−
−
A(f ) f )χS (f ) f ) + A( f ) f )χS ( f ) f ) = (A(f ) f ) + A( f )) f ))χ χS (f ) f ) = 0. Thus AˆS = E[Aχ E[AχS ] = 0. Finall Finally y, we show show 3: Let f differs only at y from f . f . We have
∈ B n− and let g ∈ Bn− be the function that
χS (g) =
g (x) = g (y)
∈
x S
Since A is folded and f A(g )χS (g) = 0.
x S y
∈ \{ \{ }
g(x) =
−f ( f (y )
x S y
f ( f (x) =
∈ \{ \{ }
f ) = A(g). ∧ ψ = g ∧ ψ, A(f )
−χS (f ) f )
Thus Thus A(f ) f )χS (f ) f ) +
5
Long code ode tests
− We will use the long code to encode assignments of a formula ψ n . We will design a test T that gets a string A and test whether A is the long code of a satisfying assignment of ψ. The The test test will will only only query query three three bits bits of A! Howe Ho weve ver, r, the long code is simply simply to long long . . . but this this will not matter matter in the end.
∈B
5.1 5.1
First irst test test
Input: folded string A :
Bn− → {−1, 1}, ψ : {−1, 1}n → {−1, 1}.
1. Let τ = 1/100.
∈ Bn− uniformly at random. 3. Define µ ∈ Bn− as follows: follows: If f ( f (x) = 1, then let µ(x) = −1. If f ( f (x) = −1, then let
2. Choose f, g
µ(x) =
1
−
with with prob probab abil ilit ity y 1 τ , τ , 1 with with proba probabi bili lity ty τ . τ .
−
·
4. Let h = µ g . 5. If A If A(f ) f ) = A(g) = A(h) = 1, then reject. Else accept. The following lemma show that if the test T accepts, then A is close to the long code of a satisfying assignment of A or its negation. There exists exists a constan onstant t K ∗ such such that the following following holds: Lemma Lemma 5.1. 5.1. There
≤ −
∈
If Pr[T Pr[T rejects (A, ψ)] for small enough > 0, then there is an a n 1, 1 with ψ with ψ (a) = 1 such that either δ either δ (A, LCa ) < K ∗ or δ or δ (A, LCa ) < ∗ K .
{− }
−
Proof. T accepts iff not all of A(f ), f ), A(g) and A(h) equa equall 1. This This is 1 equivalent to 1 f ))(1 + A(g))(1 + A(h)) = 1 (and not 0; the 8 (1 + A(f ))(1
−
30
5.1. First test
31
{ }
lefthand side is 0, 1 -valued). Thus
− −
Pr[T Pr[T accepts (A, (A, ψ)] = Pr =E =
7 8
1 1 (1 + A(f ))(1 f ))(1 + A(g))(1 + A(h)) = 1 8 1 1 (1 + A(f ))(1 f ))(1 + A(g ))(1 + A(h)) 8 1 (E[A (E[A(f )] f )] + E [A(g)] + E [A(h)] 8 + E[A E[A(f ) f )A(g )] + E[A E[A(f ) f )A(h)] + E[A E[A(f ) f )A(h)]
−
+ E[A E[A(f ) f )A(g )A(h)])
(5.1)
As A is folded, E[A E[A(f )] f )] = 0 by Lemm Lemma a 4.4. Sinc Sincee f , f , g, and h (check the latter!) are drawn uniformly at random, E[A E[A(f )] f )] = E[A E[A(g )] = E[A E[A(h)] = 0. 0. The pairs (f, (f, g ) and (f, (f, h) are independent ((g, (( g, h) is, however, not!), thus E[A E[A(f ) f )A(g)] = E[A E[A(f )]E[ f )]E[A A(g )] = 0, 0, and in the same way E[A E[ A(f ) f )A(h)] = 0. Therefore, it remains to estimate E[A E[ A(g )A(h)] and E[A E[A(f ) f )A(g )A(h)] in (5.1). We start with E[A E[A(g)A(h)] and will use Fourier analysis:
AˆS χS (g)AˆT χT (h)
E[A E[A(g)A(h)] = E
⊆{−1,1}n
S,T
=
AˆS AˆT E[χ E[χS (g)χT (h)]. )].
(5.2)
S,T
So we should analyze the terms E[χ E[ χS (g)χT (h)]. )]. We start start with the the case S = T . T . Let z S T , T , the other case is symmetric. We have
∈ \
E[χ E[χS (g )χT (h)] = E
g(x)
x S
h(y )
y T
∈
∈
= E g(z )
g (x)
x S z
∈ \{ \{ }
= E[g E[g(z )] E
y T
∈
g (x)
∈ \{ \{ }
x S z
= 0,
h(y )
h(y )
y T
∈
32
5. Long code tests
since g(z ) and the remaining product are independent and E[ g(z )] = 0, because g is random. If T = S , then
− − −
E[χ E[χS (g)χS (h)] = E
g (x)h(x)
∈
x S
g 2 (x)µ(x)
=E
∈
x S
=E
µ(x)
x S
∈
=
E[µ E[µ(x)]
∈
x S
=
(Pr[µ (Pr[µ(x) = 1]
∈
x S
=
Pr[µ Pr[µ(x) =
= 12 (1 τ ) τ )
1])
= 12 (1+τ (1+τ ))
−
( τ ) τ )
∈
x S
= ( τ ) τ )|S | ,
−
(5.3)
Above, we used g 2 (x) = 1 for all x and the independence of µ(x) and µ(y) for x = y . If we plug everything into (5.2), we get
E [A(g)A(h)] =
2 AˆS ( τ ) τ )|S | . n
⊆{− ⊆{−1,1}
S
Because Aˆ∅ = 0 by Lemma 4.4,
|E [A(g)A(h)]| ≤ τ
−
Aˆ2s = τ,
(5.4)
⊆{− ⊆{−1,1}n
S
where the last inequality follows from Parceval’s identity. Next comes E[A E[A(f ) f )A(g )A(h)] =: W . W . Like Like befo b efore, re, it can be shown that that W =
Next comes E[A(f)A(g)A(h)] =: W. Like before, it can be shown that

  W = Σ_{R⊆S⊆{−1,1}^n} Â_S² Â_R E[χ_S(µ)χ_R(f)],

see Exercise 5.1. Now,

  E[χ_S(µ)χ_R(f)] = E[ Π_{x∈R} f(x)µ(x) · Π_{y∈S\R} µ(y) ]
    = Π_{x∈R} ( Pr[f(x)µ(x) = 1] − Pr[f(x)µ(x) = −1] ) · (−τ)^{|S\R|}
    = Π_{x∈R} ( ½τ − (½ + ½(1 − τ)) ) · (−τ)^{|S\R|}
    = (τ − 1)^{|R|} (−τ)^{|S\R|}.

Note that E[ Π_{y∈S\R} µ(y) ] = (−τ)^{|S\R|} has already been shown, see (5.3). Thus

  W = Σ_{R⊆S⊆{−1,1}^n} Â_S² Â_R (τ − 1)^{|R|} (−τ)^{|S\R|}

and

  |W| ≤ Σ_{R⊆S⊆{−1,1}^n} Â_S² |Â_R| (1 − τ)^{|R|} τ^{|S\R|} ≤ Σ_S Â_S² · Σ_{R⊆S} |Â_R| (1 − τ)^{|R|} τ^{|S\R|}.   (5.5)

By the Cauchy–Schwarz inequality (and Parseval's identity for the first factor),

  Σ_{R⊆S} |Â_R| (1 − τ)^{|R|} τ^{|S\R|} ≤ sqrt( Σ_{R⊆S} Â_R² ) · sqrt( Σ_{R⊆S} ((1 − τ)^{|R|} τ^{|S\R|})² )
    ≤ sqrt( Σ_{i=0}^{|S|} (|S| choose i) ((1 − τ)²)^i (τ²)^{|S|−i} )
    = sqrt( (τ² + (1 − τ)²)^{|S|} )
    ≤ (1 − τ)^{|S|/2}.
The last inequality follows by the choice of τ (for τ = 1/100, we have τ² + (1 − τ)² ≤ 1 − τ). Thus

  |W| ≤ Σ_{S⊆{−1,1}^n} Â_S² (1 − τ)^{|S|/2} ≤ Σ_{|S|=1} Â_S² (1 − τ) + Σ_{|S|≥3} Â_S² (1 − τ)^{|S|/2},

since Â_T = 0 for even |T|. (We get the better bound for the first sum by analyzing (5.5) directly, since |Â_R| ≤ 1.) Set ε = Pr[T rejects]. Then (5.1) yields 1 − τ − 8ε ≤ |W|. For small enough ε, this yields W < 0, and therefore we get

  1 − τ − 8ε ≤ |W| ≤ (1 − ρ)(1 − τ) + ρ(1 − τ)^{3/2},

where ρ = Σ_{|S|≥3} Â_S². From this, we get

  ρ ≤ 8ε / ((1 − τ)(1 − sqrt(1 − τ))) ≤ K·ε

with K = 8/((1 − τ)(1 − sqrt(1 − τ))), which is constant.

Finally, we apply Theorem 5.2. For small enough ε, 1 − Lρ will be greater than 0. Since Â_∅ = 0, the first case in the theorem cannot happen. Hence Â_{a}² ≥ 1 − Lρ for some a ∈ {−1, 1}^n. Thus, either Â_{a} ≥ sqrt(1 − Lρ) ≥ 1 − Lρ or −Â_{a} ≥ 1 − Lρ. In the first case, we get

  1 − Lρ ≤ Â_{a} = ⟨A, χ_{a}⟩ = ⟨A, LC_a⟩,

because χ_{a}(f) = f(a) = LC_a(f). Thus δ(A, LC_a) ≤ Lρ/2 ≤ (KL/2)·ε. In the second case, we get δ(A, −LC_a) ≤ (KL/2)·ε in the same way. By Lemma 4.4, ψ(a) = −1, since Â_{a} ≠ 0.
Here is the theorem that we used in the proof above. It is essentially the only result that we will not prove. The theorem says that whenever a function A has only small Fourier coefficients corresponding to sets S with |S| > 1, then most of the mass is concentrated in one Fourier coefficient with |S| ≤ 1.

Theorem 5.2 (Friedgut, Kalai & Naor [FKN02]). There is a constant L > 0 such that for all ρ > 0 and A : B_n^- → {−1, 1} with Σ_{|S|>1} Â_S² ≤ ρ the following holds: Either Â_∅² ≥ 1 − Lρ or Â_{a}² ≥ 1 − Lρ for some a ∈ {−1, 1}^n.

Exercise 5.1. Show that W = Σ_{R⊆S⊆{−1,1}^n} Â_S² Â_R E[χ_S(µ)χ_R(f)].
Theorem 5.3. The long code test T has the following properties:

1. If a ∈ {−1, 1}^n with ψ(a) = −1, then T accepts LC_a and ψ with probability 1.

2. There is a constant c > 0 such that for all 0 < δ ≤ 1, the following holds: if A is folded and δ(A, LC_a) ≥ δ for all a ∈ {−1, 1}^n with ψ(a) = −1, then T rejects A and ψ with probability ≥ c·δ.
Proof. We first prove 1: Let a ∈ {−1, 1}^n with ψ(a) = −1 and let A = LC_a. By Lemma 4.3, A is folded. If A(f) = f(a) = −1, then T accepts. If A(f) = f(a) = 1, then µ(a) = −1. Hence, A(h) = h(a) = −g(a) = −A(g). Thus one of these two values equals −1, and T accepts, too.

Now we come to 2: Assume that the assertion does not hold. Then for all c > 0, there is a δ_c such that δ(A, LC_a) ≥ δ_c for all a ∈ {−1, 1}^n with ψ(a) = −1 and T rejects with probability < c·δ_c. We choose c < 1/K* small enough and apply Lemma 5.1. There is an a ∈ {−1, 1}^n such that ψ(a) = −1 and δ(A, LC_a) ≤ K*·cδ_c < δ_c or δ(A, −LC_a) ≤ K*·cδ_c < δ_c. The first possibility is ruled out by the assumption about δ_c. Hence δ(A, −LC_a) < δ_c.

We have

  Pr[T rejects (−LC_a, ψ)] = Pr[−LC_a(f) = −LC_a(g) = −LC_a(h) = 1]
    = Pr[f(a) = g(a) = h(a) = −1]
    = Pr[f(a) = −1] · Pr[g(a) = −1] · Pr[h(a) = −1 | f(a) = g(a) = −1]
    = ¼ · Pr[µ(a) = 1 | f(a) = −1]
    = ¼·(1 − τ).

This implies that, with ε = Pr[T rejects (A, ψ)],

  ε ≥ Pr[T rejects (−LC_a, ψ) and −LC_a(f) = A(f) and −LC_a(g) = A(g) and −LC_a(h) = A(h)]
    ≥ 1 − Pr[T accepts (−LC_a, ψ) or −LC_a(f) ≠ A(f) or −LC_a(g) ≠ A(g) or −LC_a(h) ≠ A(h)]
    ≥ 1 − Pr[T accepts (−LC_a, ψ)] − 3·δ(A, −LC_a)
    ≥ 1 − (1 − ¼(1 − τ)) − 3K*·ε
    = ¼(1 − τ) − 3K*·ε.

For ε small enough, the right-hand side is about 1/4 and therefore greater than ε, a contradiction.
5.2 Second test

Let e_j : {−1, 1}^n → {−1, 1} be the projection onto the jth component. We define a second test T′ that is based on T:

Input: a ∈ {−1, 1}^n, folded A : B_n^- → {−1, 1}, ψ ∈ B_n^-

1. With probability 1/2, run T on (A, ψ).
2. With probability 1/2, choose j ∈ {1, ..., n} and f ∈ B_n^- at random. Accept if a_j = A(f)·A(f·e_j). Else reject.

Theorem 5.4.

1. If ψ(a) = −1, then there is an A such that Pr[T′ accepts (a, A, ψ)] = 1.

2. There is a c′ > 0 such that for all 0 < δ ≤ 1, the following holds: If δ(a, a′) ≥ δ for all a′ with ψ(a′) = −1, then for all folded A : B_n^- → {−1, 1}, Pr[T′ rejects (a, A, ψ)] ≥ c′·δ.
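A toy sketch of the second part of T′ (again ignoring folding; illustration only, the helper names are ad hoc): for A = LC_a, step 2 accepts with probability 1, since A(f)·A(f·e_j) = f(a)·f(a)·e_j(a) = a_j.

    import itertools, random

    # Step 2 of the test T': check a_j = A(f) * A(f * e_j) for random j, f.

    def step2(A, a, n):
        points = list(itertools.product([-1, 1], repeat=n))
        f = {x: random.choice([-1, 1]) for x in points}
        j = random.randrange(n)
        f_ej = {x: f[x] * x[j] for x in points}   # (f * e_j)(x) = f(x) * x_j
        return a[j] == A(f) * A(f_ej)             # True = accept

    a = (1, -1, 1)
    A = lambda f: f[a]                            # A = LC_a
    print(all(step2(A, a, 3) for _ in range(1000)))   # True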
Proof. We start with 1: Let A = LC_a. Then A is folded. If T is executed, then T accepts by Theorem 5.3. Otherwise,

  A(f)·A(f·e_j) = f(a)·(f·e_j)(a) = f(a)·f(a)·e_j(a) = a_j.

Thus T′ accepts with probability 1.

Next comes 2: Let δ(a, a′) ≥ δ for all a′ with ψ(a′) = −1, and let A be folded.

First case: δ(A, LC_{a′}) ≥ δ/4 for all a′ with ψ(a′) = −1. With probability 1/2, T is executed, and T rejects with probability ≥ (c/4)·δ by Theorem 5.3. Hence T′ rejects with probability ≥ (c/8)·δ.

Second case: There is an a′ ∈ {−1, 1}^n with δ(A, LC_{a′}) < δ/4 and ψ(a′) = −1. If a_j ≠ a′_j and A(f)·A(f·e_j) = a′_j, then T′ will reject. Thus,

  Pr[T′ rejects] ≥ Pr[step 2 is executed] · Pr[step 2 rejects]
    ≥ ½ · Pr_{j,f}[ a_j ≠ a′_j ∧ A(f)·A(f·e_j) = a′_j ]
    ≥ ½ · Pr_j[a_j ≠ a′_j] · min_j Pr_f[A(f)·A(f·e_j) = a′_j]
    ≥ ½ · δ · (1 − δ/2) ≥ δ/4.

For the second-to-last inequality, we have to show that Pr_f[A(f)·A(f·e_j) = a′_j] ≥ 1 − δ/2 for every j. The event A(f)·A(f·e_j) = a′_j is implied by A(f) = f(a′) and A(f·e_j) = (f·e_j)(a′) (multiply the two equations). Therefore,

  Pr[A(f)·A(f·e_j) = a′_j] ≥ Pr[A(f) = f(a′) ∧ A(f·e_j) = (f·e_j)(a′)]
    = 1 − Pr[A(f) ≠ f(a′) ∨ A(f·e_j) ≠ (f·e_j)(a′)]
    ≥ 1 − Pr[A(f) ≠ LC_{a′}(f)] − Pr[A(f·e_j) ≠ LC_{a′}(f·e_j)]
    ≥ 1 − δ/2.

The last inequality is true since we can bound both probabilities by δ/4: the first one by assumption, and also the second one, since f ↦ f·e_j is a bijection of B_n^-.
6 Assignment Tester

6.1 Constraint graph satisfiability

A constraint graph G over some alphabet Σ is a directed graph (V, E) together with a mapping c : E → P(Σ × Σ). An assignment is a mapping a : V → Σ. The assignment a satisfies the (constraint at) edge e = (u, v) if (a(u), a(v)) ∈ c(e). The unsatisfiability value of a is the number of constraints not satisfied by a divided by the total number of constraints (edges). This value is denoted by UNSAT_a(G). The unsatisfiability value of G is UNSAT(G) = min_a UNSAT_a(G).
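For very small instances, both quantities can be computed by brute force. The following sketch (an illustration, not from the original text; the enumeration is exponential in |V|) spells out the definitions:

    from itertools import product

    # Brute-force computation of UNSAT_a(G) and UNSAT(G) for tiny
    # constraint graphs; constraints[i] is the set of allowed value
    # pairs for the i-th (directed) edge.

    def unsat_a(edges, constraints, a):
        bad = sum(1 for i, (u, v) in enumerate(edges)
                  if (a[u], a[v]) not in constraints[i])
        return bad / len(edges)

    def unsat(nodes, edges, constraints, alphabet):
        return min(unsat_a(edges, constraints, dict(zip(nodes, vals)))
                   for vals in product(alphabet, repeat=len(nodes)))

    # Inequality constraints on a triangle with Sigma = {0,1}: the
    # triangle is not 2-colourable, so the best assignment violates
    # exactly one of the three edges.
    nodes = ["x", "y", "z"]
    edges = [("x", "y"), ("y", "z"), ("z", "x")]
    neq = {(0, 1), (1, 0)}
    print(unsat(nodes, edges, [neq, neq, neq], [0, 1]))   # 1/3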
Problem 6.1. Maximum constraint graph satisfiability (Max-CGS) is the following problem:

  Instances: constraint graphs ((V, E), Σ, c)
  Solutions: assignments a : V → Σ
  Measure: (1 − UNSAT_a(G)) · |E|
  Goal: max

Exercise 6.1. Show that the following two statements are equivalent:

1. There is an ε > 0 such that gap(1 − ε, 1)-Max-3-SAT is NP-hard.
2. There is an ε > 0 such that gap(1 − ε, 1)-Max-CGS is NP-hard.

Thus, to prove the PCP theorem, we can show the NP-hardness of gap(1 − ε, 1)-Max-CGS instead of gap(1 − ε, 1)-Max-3-SAT. The former has the advantage that it is easier to apply results from graph theory, in particular expander graphs, which we will introduce in the next chapter.

6.2 Assignment testers
Definition 6.2. An assignment tester over Σ is a deterministic algorithm that, given a Boolean formula ψ in variables X = {x₁, ..., x_n}, outputs a constraint graph G = ((V, E), Σ, c) with X ⊆ V such that there is an ε > 0 such that for all a : X → {−1, 1}:¹

1. If ψ(a) = −1, then there is a b : V \ X → Σ with UNSAT_{a∪b}(G) = 0.
2. If ψ(a) = 1, then for all b : V \ X → Σ, we have UNSAT_{a∪b}(G) ≥ ε · min{δ(a, a′) | ψ(a′) = −1}.

ε is called the rejection probability.

¹ We identify two values of Σ with −1 and 1.
Above, a ∪ b : V → Σ is the assignment that maps v ∈ X to a(v) and v ∈ V \ X to b(v). Again, we do not care about running times, since we will apply the assignment tester only to constant-size instances.

Given a Boolean formula, an assignment tester constructs a graph such that every satisfying assignment of the formula can be extended to a satisfying assignment of the graph. Every non-satisfying assignment a, however, cannot be extended to an assignment of the graph that fulfills a fraction > 1 − ε·δ of the constraints, where δ is the distance of a to any satisfying assignment.

Our construction takes the test T′ and models its behaviour in a graph. We set Σ = {−1, 1}³ and let Y be a set of 2^{2^n − 1} Boolean variables; these variables correspond to the bits of the string A. The test T′ makes some random experiments: it first flips a coin and then, depending on the outcome, either chooses f, g, and h or chooses j and f at random. For each of the possible outcomes r, there will be one variable z_r. Let Z be the set of all these variables. We will interpret the three bits of the value of z_r as "guesses" of the three bits queried. The variables in X and Y will get Boolean values; that is, whenever they get values from Σ that do not represent −1 or 1, then all constraints containing them will not be satisfied.

If in the outcome r, T′ queries A(f), A(g), and A(h), then z_r will be connected to the three nodes in Y that correspond to these positions. (More precisely, since we only consider folded strings A, we will consider only positions in D_ψ and might replace f, g, and h by the corresponding elements of D_ψ.) The constraints on these three edges are satisfied if the three bits at z_r correspond to the bits of A(f), A(g), and A(h) and T′ would accept when reading these three bits. If in the outcome r, T′ queries A(f), A(f·e_j), and a_j, then z_r will be connected to two nodes of Y and one node of X. The rest of the construction is essentially the same.
Theorem 6.3. The construction above is an assignment tester.

Proof. First we have to show that if a is a satisfying assignment of ψ, then we can extend it to a satisfying assignment of G. To the variables of Y, we assign the values according to LC_a. By Theorem 5.4, T′ will accept with probability 1. This means that if we assign to Z values matching the values of X ∪ Y, then all constraints will be satisfied.

If a does not satisfy ψ, then T′ will reject every A with probability ≥ c′·δ, where δ(a, a′) ≥ δ for all satisfying assignments a′. This means that for a fraction of c′·δ of the z_r's, at least one constraint is not satisfied. (Either we choose the values consistently with the values of X and Y, in which case all three constraints are not satisfied, or we try to cheat, but then at least one is not satisfied.) Thus UNSAT_{a∪b}(G) ≥ (c′/3)·δ.
7 Expander graphs

Throughout this chapter, we consider undirected multigraphs G = (V, E) with self-loops. The degree d(v) of a node v is the number of edges that v belongs to. This particularly means that a node with a self-loop and no other edges has degree 1 (and not 2, which would be a meaningful definition, too). This definition of degree will be very convenient in the following. A graph is called d-regular if d(v) = d for all v ∈ V. It is a well-known fact that for graphs without self-loops, the sum of the degrees of the nodes is twice the number of edges (proof by double counting). With self-loops, the following bounds hold.

Fact 7.1.

1. We have |E| ≤ Σ_{v∈V} d(v) ≤ 2|E|.
2. If G is d-regular, then |E| ≤ d·|V| ≤ 2|E|.

A walk in a graph G = (V, E) is a sequence (v₀, e₁, v₁, e₂, ..., e_ℓ, v_ℓ) such that e_λ = {v_{λ−1}, v_λ} for all 1 ≤ λ ≤ ℓ. v₀ is the start node and v_ℓ is the end node of the walk; its length is ℓ. A walk can visit the same node or edge several times, i.e., it is allowed that v_i = v_j or e_i = e_j for some i ≠ j. A graph is connected if for all pairs of nodes u and v, there is a walk from u to v. The neighbourhood N(v) of v is the set of all nodes u such that {v, u} ∈ E. In general, the t-neighbourhood N_t(v) is the set of all nodes u such that there is a walk from v to u of length t.
7.1 Algebraic graph theory

The adjacency matrix of G is the |V| × |V| matrix A = (a_{u,v})_{u,v∈V}, where a_{u,v} is the number of edges between u and v. We will usually index the rows and columns by the nodes themselves and not by indices from {1, ..., |V|}. But we will assume that the nodes have some ordering, so that, when we need it, we can also index the rows by 1, ..., |V|. We will now apply tools from linear algebra to A in order to study properties of G. This is called algebraic graph theory. The book by Biggs [Big93] is an excellent introduction to this field. Everything you want to know about expander graphs can be found in [HLW06]. Because G is undirected, A is symmetric. Therefore, A has n real eigenvalues λ₁ ≥ λ₂ ≥ ... ≥ λ_n, and there is an orthonormal basis consisting of eigenvectors.
Lemma 7.2. Let G be a d-regular graph with adjacency matrix A and eigenvalues λ₁ ≥ λ₂ ≥ ... ≥ λ_n.

1. λ₁ = d, and 1_n = (1, ..., 1)^T is a corresponding eigenvector.
2. G is connected if and only if λ₂ < d.

Proof. We start with 1: Since G is d-regular,

  d = d(v) = Σ_{u∈V} a_{v,u}  for all v ∈ V,  and  A·1_n = d·1_n.

Thus d is an eigenvalue and 1_n is an associated eigenvector. Let λ be any eigenvalue and b an associated eigenvector. We can scale b in such a way that the largest entry of b is 1. Let this entry be b_v. Then

  λ = λ·b_v = Σ_{u∈V} a_{v,u} b_u ≤ Σ_{u∈V} a_{v,u} = d.

Therefore, d is also the largest eigenvalue.

Now comes 2. "⇒": Let b be an eigenvector associated with the eigenvalue d. As above, we scale b such that the largest entry is 1. Let b_v be this entry. We next show that b_u = 1 for every node u ∈ N(v), too. Since G is connected, b = 1_n then follows by induction. But this means that d has multiplicity 1 and λ₂ < d. A·b = d·b implies

  d = d·b_v = Σ_{u∈V} a_{v,u} b_u = Σ_{u∈N(v)} a_{v,u} b_u.

Since b_u ≤ 1 for all u and since d = Σ_{u∈N(v)} a_{v,u}, this equation can only be fulfilled if b_u = 1 for all u ∈ N(v).

"⇐": If the graph G is not connected, then, after reordering the nodes, A = [[A₁, 0], [0, A₂]]. Therefore (1, ..., 1, 0, ..., 0) and (0, ..., 0, 1, ..., 1) (with the appropriate numbers of 1's and 0's) are linearly independent eigenvectors associated with d.
Let ‖·‖ denote the Euclidean norm on R^{|V|}, that is, ‖b‖ = sqrt(Σ_{v∈V} b_v²).

Definition 7.3. Let G be a graph with adjacency matrix A. Then

  λ(G) = max_{b ⊥ 1_n} ‖Ab‖ / ‖b‖.

Theorem 7.4. Let G be a d-regular graph with adjacency matrix A and eigenvalues λ₁ ≥ λ₂ ≥ ... ≥ λ_n.

1. λ(G) = |λ_j| for some j.
2. The maximum max_{b⊥1_n} ‖Ab‖/‖b‖ is attained for any eigenvector b associated with λ_j.
3. λ(G) = max{|λ₂|, |λ_n|}.
4. λ(G) ≤ d.
Proof. Let b be a vector for which the maximum is attained in the definition of λ(G). W.l.o.g. let ‖b‖ = 1. Let c₁, ..., c_n be an orthonormal basis consisting of eigenvectors of A. W.l.o.g. let c₁ = (1/√n)·1_n. Since b is orthogonal to 1_n, we have

  b = β₂c₂ + ... + β_n c_n.

Since c₁, ..., c_n is an orthonormal family,

  1 = ‖b‖² = β₂² + ... + β_n².

Let λ_j be the eigenvalue c_j is associated with. We have

  λ(G) = ‖Ab‖ = ‖β₂Ac₂ + ... + β_nAc_n‖ = sqrt((β₂λ₂)² + ... + (β_nλ_n)²).

Since b is a vector for which the maximum is attained, β_j can only be nonzero for a λ_j whose absolute value is maximal among λ₂, ..., λ_n. It is an easy exercise to derive the statements 1–4 from this.

Exercise 7.1. Prove statements 1–4 of Theorem 7.4.

λ(G) is also called the second largest eigenvalue. (More correctly, it should be called the second largest absolute value of an eigenvalue, but this is even longer.)
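Numerically, λ(G) is easy to compute. The following sketch (an illustration, not from the original text) checks statement 3 of Theorem 7.4 on the 5-cycle, a 2-regular graph where λ(G) = |λ_n|:

    import numpy as np

    # lambda(G) for the 5-cycle: compare max{|l_2|, |l_n|} with the
    # operator norm of A restricted to the space orthogonal to 1_n.

    n = 5
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

    eig = np.sort(np.linalg.eigvalsh(A))[::-1]   # eig[0] = d = 2
    lam = max(abs(eig[1]), abs(eig[-1]))         # = 2*cos(pi/5) ~ 1.618
    P = np.eye(n) - np.ones((n, n)) / n          # projection onto 1_n^perp
    print(lam, np.linalg.norm(A @ P, 2))         # the two values agree

The second printed value is the maximum of ‖Ab‖/‖b‖ over b ⊥ 1_n, since A and the projection P commute for regular graphs.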
7.2 Edge expansion

Definition 7.5. Let G be a d-regular graph. The edge expansion h(G) of G is defined as

  h(G) = min_{S⊆V : |S| ≤ |V|/2} |E(S, S̄)| / |S|.

E(S, S̄) is the set of all edges with one endpoint in S and one endpoint in S̄. G is called an h-expander if h(G) ≥ h.

Large edge expansion means that any set S has many neighbours that are not in S. This will be a very useful property. Families of expanders can be constructed in polynomial time; one construction is [RVW02]. We will not prove it here.
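For intuition, h(G) can be computed by brute force on tiny graphs (a sketch, not from the original text; the enumeration over subsets is exponential in |V|):

    from itertools import combinations

    # Brute-force edge expansion: minimize |E(S, S-bar)| / |S| over all
    # S with |S| <= |V|/2 (self-loops never cross a cut).

    def edge_expansion(nodes, edges):
        best = float("inf")
        for k in range(1, len(nodes) // 2 + 1):
            for S in map(set, combinations(nodes, k)):
                cut = sum(1 for u, v in edges if (u in S) != (v in S))
                best = min(best, cut / len(S))
        return best

    cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(edge_expansion(range(4), cycle4))   # 1.0, attained by S = {0, 1}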
Theorem 7.6. There are constants d₀ ∈ N and h₀ > 0 and a deterministic algorithm that, given n ∈ N, constructs in time polynomial in n a d₀-regular graph G_n with h(G_n) > h₀.

Large edge expansion means small second largest eigenvalue and vice versa. We will need the following bound.

Theorem 7.7. Let G be a d-regular graph. If λ(G) < d,¹ then

  λ(G) ≤ d − h(G)²/(2d).

¹ We have to exclude bipartite graphs, which have λ_n = −d but can have edge expansion > 0. Our proof breaks down if λ_n = −d, because (d + λ) must not be zero when proving the counterpart of (7.3).

To prove the theorem, it is sufficient to prove, for every eigenvalue λ of A with |λ| < d,

  h(G)² ≤ 2d(d − λ),   (7.1)
  h(G)² ≤ 2d(d + λ);   (7.2)

the theorem then follows with Theorem 7.4. The proofs of both inequalities are very similar; we will only show the first one. Let

  B = dI − A  and  B′ = dI + A,

where I is the n × n identity matrix. Let f ∈ R^n. Later, we will derive f from an eigenvector of A. In the following, a summation over "e = {u, v}" is a sum over all edges in E with two distinct end nodes, and a summation over "e = {v}" is a sum over all self-loops in E.
Lemma 7.8.

  f^T B f = Σ_{e={u,v}} (f_u − f_v)²  and  f^T B′ f ≥ Σ_{e={u,v}} (f_u + f_v)².

Proof. We have

  f^T B f = Σ_{v∈V} d f_v² − f^T A f
    = Σ_{e={u,v}} (f_u² + f_v²) + Σ_{e={v}} f_v² − Σ_{e={u,v}} 2 f_u f_v − Σ_{e={v}} f_v²
    = Σ_{e={u,v}} (f_u − f_v)².
The second inequality is proven in a similar manner.

For a given f, let

  F = Σ_{e={u,v}} |f_u² − f_v²|.

Let β₀ < β₁ < ... < β_r be the different values that f attains. Let

  U_j = {u ∈ V | f_u ≥ β_j}  and  U′_j = {u ∈ V | f_u ≤ β_j}

be the sets of all nodes whose value f_u is at least or at most β_j, respectively.

Lemma 7.9.

  F = Σ_{j=1}^{r} |E(U_j, Ū_j)| (β_j² − β_{j−1}²),
  F = Σ_{j=0}^{r−1} |E(U′_j, Ū′_j)| (β_{j+1}² − β_j²).

Proof. Let e = {u, v} ∈ E be an edge that is not a self-loop. Assume that f_u = β_i ≥ β_j = f_v. The contribution of e to F is β_i² − β_j². On the other hand, e crosses the cut (U_k, Ū_k) exactly for j + 1 ≤ k ≤ i. Thus the contribution of e to the right-hand side of the first equation in the statement of the lemma is

  (β_i² − β_{i−1}²) + (β_{i−1}² − β_{i−2}²) + ... + (β_{j+1}² − β_j²) = β_i² − β_j².
Thus both sides of the equation are equal. The second equation is proven analogously.

Lemma 7.10. We have

  F ≤ sqrt(2d · f^T B f) · ‖f‖.

If f_v ≤ 0 for all v, then also

  F ≤ sqrt(2d · f^T B′ f) · ‖f‖.
Proof. We have

  F = Σ_{e={u,v}} |f_u² − f_v²| = Σ_{e={u,v}} |f_u − f_v| · |f_u + f_v|
    ≤ sqrt( Σ_{e={u,v}} (f_u − f_v)² ) · sqrt( Σ_{e={u,v}} (f_u + f_v)² )
    = sqrt( f^T B f ) · sqrt( Σ_{e={u,v}} (f_u + f_v)² )

by the Cauchy–Schwarz inequality and Lemma 7.8. We can bound the second factor by

  Σ_{e={u,v}} (f_u + f_v)² ≤ 2 Σ_{e={u,v}} (f_u² + f_v²) ≤ 2d Σ_{v∈V} f_v² = 2d ‖f‖².

The second inequality is proven in a similar manner.
Lemma 7.11. Let f_v ≥ 0 for all v ∈ V or f_v ≤ 0 for all v ∈ V. If |supp(f)| ≤ n/2, then F ≥ h(G)·‖f‖².

Proof. We only show the statement for f_v ≥ 0; the other case is completely similar. Since |supp(f)| ≤ n/2, we have β₀ = 0 and |U_j| ≤ n/2 for j > 0. We have |E(U_j, Ū_j)| ≥ h(G)·|U_j|. By Lemma 7.9,

  F = Σ_{j=1}^{r} |E(U_j, Ū_j)| (β_j² − β_{j−1}²)
    ≥ h(G) Σ_{j=1}^{r} |U_j| (β_j² − β_{j−1}²)
    = h(G) ( Σ_{j=1}^{r−1} β_j² (|U_j| − |U_{j+1}|) + β_r² |U_r| )
    = h(G) Σ_{j=1}^{r} β_j² |{v | f_v = β_j}|
    = h(G) ‖f‖².
Finally, we will now prove (7.1) and (7.2). We only show (7.1); (7.2) is proven in the same manner. Let λ < d be an eigenvalue of A. Then d − λ is an eigenvalue of B = dI − A, and every eigenvector of A associated with λ is an eigenvector of B associated with d − λ. Let g be such an eigenvector; g is orthogonal to 1_n. We can assume that g has at most n/2 entries that are ≥ 0; otherwise, we consider −g instead. We define

  f_v = g_v if g_v ≥ 0, and f_v = 0 otherwise,

and W = supp(f). By construction, |W| ≤ n/2. For v ∈ W, we have

  (Bf)_v = d f_v − Σ_{u∈V} a_{v,u} f_u
    = d g_v − Σ_{u∈W} a_{v,u} g_u
    ≤ d g_v − Σ_{u∈V} a_{v,u} g_u
    = (d − λ) g_v.

Since f_v = 0 for v ∉ W, this implies

  f^T B f = Σ_{v∈V} f_v (Bf)_v ≤ (d − λ) Σ_{v∈V} f_v g_v = (d − λ) Σ_{v∈V} f_v² = (d − λ) ‖f‖².   (7.3)

By Lemmas 7.10 and 7.11,

  h(G) ‖f‖² ≤ F ≤ sqrt(2d · f^T B f) · ‖f‖.

Squaring this and exploiting the inequality before, we get

  h(G)² ‖f‖⁴ ≤ 2d · f^T B f · ‖f‖² ≤ 2d (d − λ) ‖f‖⁴.

Because g is orthogonal to 1_n and nonzero, it has at least one entry > 0. Therefore, ‖f‖ > 0 and we get

  h(G)² ≤ 2d(d − λ).

The second inequality is proven in the same manner.
8 Random walks on expanders

Consider the following method RW to generate a walk in a d-regular graph G = (V, E).

Input: d-regular graph G, ℓ ∈ N

1. Randomly choose a vertex v₀.
2. For λ = 1, ..., ℓ: choose v_λ ∈ N(v_{λ−1}) uniformly at random.
3. Return (v₀, ..., v_ℓ).

Let W_ℓ be the set of all walks of length ℓ in G. We have |W_ℓ| = n·d^ℓ, since a walk is uniquely specified by its start node (n choices) and a vector in {1, ..., d}^ℓ which tells us which of the d edges of the current node is the next one in the walk. For this, we number the d edges incident with a node arbitrarily from 1 to d. It is now clear that method RW generates the walks according to the uniform distribution on W_ℓ.

Lemma 8.1. Method RW returns each walk W ∈ W_ℓ with probability 1/(n·d^ℓ).
Instead of choosing a start node, we can choose a node in the middle. This modified method RW′ also generates the uniform distribution on all walks of length ℓ.

Input: d-regular graph G, ℓ ∈ N, 0 ≤ j ≤ ℓ

1. Randomly choose a vertex v_j.
2. For λ = j − 1, ..., 0: choose v_λ ∈ N(v_{λ+1}) uniformly at random.
3. For λ = j + 1, ..., ℓ: choose v_λ ∈ N(v_{λ−1}) uniformly at random.
4. Return (v₀, ..., v_ℓ).

In the following, let G be a d-regular graph with adjacency matrix A. Let Ã = (1/d)·A be the normalized adjacency matrix. Ã is doubly stochastic, i.e., all entries are nonnegative and all row sums and all column sums are 1. λ is an eigenvalue of Ã iff d·λ is an eigenvalue of A. Let x = (x_v)_{v∈V} be a probability distribution on the nodes and consider x as an element of R^n. If we now select a node v according to x, then select an edge {v, u} (u = v is allowed) incident with v uniformly at random, and then go to u, the probability distribution that we get is given by Ã·x. Applying induction, we get the following result.

Lemma 8.2. Let x be a probability distribution on V. If we run method RW for ℓ steps and draw the start node according to x, then this induces a probability distribution on V given by Ã^ℓ·x.
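A small sketch of method RW and of Lemma 8.2 (an illustration, not from the original text; the adjacency lists carry multiplicities, one entry per incident edge):

    import random
    import numpy as np

    def rw(adj, length):
        # method RW: uniform start node, then a uniform incident edge
        # at every step
        v = random.choice(list(adj))
        walk = [v]
        for _ in range(length):
            v = random.choice(adj[v])
            walk.append(v)
        return walk

    # Lemma 8.2 numerically: the node distribution after l steps is A~^l x.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}          # triangle, 2-regular
    A_norm = np.array([[0, .5, .5], [.5, 0, .5], [.5, .5, 0]])
    x = np.array([1.0, 0.0, 0.0])                    # start in node 0
    print(np.linalg.matrix_power(A_norm, 3) @ x)     # [0.25, 0.375, 0.375]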
Let F ⊆ E be a set of edges. We want to estimate the probability that a random walk that starts with an edge of F has its (j+1)th edge in F, too. To this aim, we first calculate the probability x_v that a random walk that starts with an edge of F starts in v. The distribution x = (x_v)_{v∈V} is generated by the following process: First choose an edge f ∈ F at random. Then choose one of its nodes uniformly at random as the start node (and the other node as the second node). Here it makes a difference whether f is a self-loop or not. We have

  x_v = (1/|F|) · ( ½·|{e = {u, v} | e ∈ F, u ≠ v}| + |{e = {v} | e ∈ F}| ) ≤ d/|F|.   (8.1)

By symmetry, this is also the probability that v is the second node of a walk that starts with an edge in F. Second, we compute the probability y_v that, if we choose a random edge incident to v, we have chosen an edge in F. This is simply

  y_v = (1/d) · |{e | v ∈ e, e ∈ F}| = (2|F|/d) · (1/(2|F|)) · |{e | v ∈ e, e ∈ F}| ≤ (2|F|/d) · x_v.   (8.2)

Now the probability distribution on the nodes after performing j − 1 further steps of a walk that starts with an edge of F is given by Ã^{j−1}·x. (Note that x_v is also the probability that v is the second node in the walk.) The probability that the (j+1)th edge is in F is then given by

  ⟨y, Ã^{j−1} x⟩,   (8.3)
where ⟨·, ·⟩ is the ordinary scalar product in R^n.

To estimate (8.3), we will exploit the Cauchy–Schwarz inequality. For this, we need to estimate ‖x‖. The vector x₁ = (1/n)·1_n is an eigenvector of Ã associated with 1. Let x^⊥ = x − x₁. x^⊥ is orthogonal to 1_n, because

  ⟨1_n, x^⊥⟩ = Σ_{v∈V} (x_v − 1/n) = 1 − 1 = 0.

Ã^k x^⊥ is also orthogonal to 1_n for every k, since

  ⟨1_n, Ã^k x^⊥⟩ = ⟨(Ã^k)^T 1_n, x^⊥⟩ = ⟨Ã^k 1_n, x^⊥⟩ = ⟨1_n, x^⊥⟩ = 0.

Let λ̃ = λ(G)/d. We have

  ‖Ã^{j−1} x^⊥‖ ≤ |λ̃|^{j−1} ‖x^⊥‖

and

  ‖x^⊥‖² = ‖x − x₁‖² = ‖x‖² − 2⟨x, x₁⟩ + ‖x₁‖² = ‖x‖² − (2/n) Σ_{v∈V} x_v + 1/n < ‖x‖².

By (8.1),

  ‖x‖² ≤ max_{v∈V} x_v · Σ_{v∈V} x_v = max_{v∈V} x_v ≤ d/|F|.

Altogether, we have

  ⟨y, Ã^{j−1} x^⊥⟩ ≤ ‖y‖ · ‖Ã^{j−1} x^⊥‖ ≤ (2|F|/d) ‖x‖ · |λ̃|^{j−1} ‖x‖ ≤ 2 |λ̃|^{j−1}.

Finally, the probability that e_{j+1} ∈ F can be bounded by

  ⟨y, Ã^{j−1} x⟩ = ⟨y, Ã^{j−1} x₁⟩ + ⟨y, Ã^{j−1} x^⊥⟩
    ≤ ⟨y, x₁⟩ + 2 |λ̃|^{j−1}
    = (1/n) Σ_{v∈V} y_v + 2 |λ̃|^{j−1}
    ≤ 2|F|/(dn) + 2 |λ̃|^{j−1}
    ≤ 2 ( |F|/|E| + (λ(G)/d)^{j−1} ).
Lemma 8.3. Consider a random walk on a d-regular graph G = (V, E) starting with an edge from a set F ⊆ E. Then the probability that the (j+1)th edge of the walk is again in F is bounded by

  2 ( |F|/|E| + (λ(G)/d)^{j−1} ).

If F does not contain any self-loops, then (8.1) can be bounded by d/(2|F|), and we can get rid of the factor 2 in the estimate. This bound says that even after a logarithmic number of steps, the (j+1)th edge is almost drawn at random.
9 The final proof

Finally, we can start with the proof of the PCP theorem. We begin with the observation that gap(1 − 1/|E|, 1)-Max-CGS is NP-hard (over the alphabet Σ = {0, 1}³). Let G be a given constraint graph. We apply three procedures to G:

  G
  ↓ Preprocessing (G becomes an expander)
  G_prep
  ↓ Amplification (the UNSAT value gets larger, but so does Σ)
  G_amp
  ↓ Alphabet reduction (Σ = {0, 1}³ again)
  G_red

If we do this O(log |E|) times, then we bring the (relative) size of the gap from 1/|E| to constant, and we are done.
9.1 Preprocessing

Throughout this chapter, d₀ and h₀ will be "global" constants that come out of the construction of a d₀-regular expander X_n with constant edge expansion h₀, see Theorem 7.6.

Lemma 9.1. Let G = ((V, E), Σ, c) be a constraint graph. There is a constant γ₁ > 0 such that we can construct in polynomial time a (d₀+1)-regular constraint graph G₁ = ((V₁, E₁), Σ, c₁) with size(G₁) = O(size(G)) and

  γ₁ · UNSAT(G) ≤ UNSAT(G₁) ≤ UNSAT(G).
Proof. Let X_n be the expander from Theorem 7.6. G₁ is constructed as follows:

1. Replace each v ∈ V by a copy Y_v of X_{d(v)}.
2. For each edge {u, v} ∈ E, insert an edge from Y_u to Y_v. Do this in such a way that every node of Y_v is incident with exactly one such extra edge. In this way, the resulting graph will be (d₀+1)-regular.
3. Let E_int be the edges within the copies Y_v and E_ext be the edges between two different copies. For all e ∈ E_int, c₁(e) is an equality constraint that is satisfied iff both nodes have the same value ("internal constraints"). For all e ∈ E_ext, c₁(e) is the same constraint as the original edge has ("external constraints").
We have |V₁| = Σ_{v∈V} d(v) ≤ 2|E| and |E₁| ≤ |V₁|·(d₀+1) ≤ 2|E|·(d₀+1). Thus size(G₁) = O(size(G)).

Next, we show that UNSAT(G₁) ≤ UNSAT(G). Choose an assignment σ : V → Σ with UNSAT(G) = UNSAT_σ(G) (i.e., an optimal assignment). We define σ₁ : V₁ → Σ by σ₁(u) = σ(v) for u ∈ V(Y_v), the vertex set of the copy Y_v that replaces v. In this way, all internal constraints are fulfilled by construction. Every external constraint is fulfilled iff it was fulfilled under σ in G. Therefore,

  UNSAT(G₁) ≤ UNSAT_{σ₁}(G₁) ≤ UNSAT(G),

where the second inequality follows from the fact that |E| ≤ |E₁|.

The interesting case is γ₁·UNSAT(G) ≤ UNSAT(G₁). Let σ₁ : V₁ → Σ be an optimal assignment. We define σ : V → Σ by a majority vote: σ(v) is the value a ∈ Σ that is the most common among all values σ₁(u) with u ∈ V(Y_v). Ties are broken arbitrarily. Let F ⊆ E be the set of all constraints unsatisfied under σ and F₁ ⊆ E₁ the set of all constraints unsatisfied under σ₁. Let

  S = {u ∈ V(Y_v) | v ∈ V, σ₁(u) ≠ σ(v)}  and  S^v = S ∩ V(Y_v),

i.e., S^v contains all the loser nodes of Y_v that voted for a different value than σ(v). Let α := |F|/|E| = UNSAT_σ(G). We have

  α·|E| = |F| ≤ |F₁| + |S|,
since, if a constraint in F is not satisfied, then either the corresponding external constraint in F₁ is not satisfied or one of the nodes is a loser node.

Case 1: |F₁| ≥ (α/2)·|E|. We have

  UNSAT(G₁) = |F₁|/|E₁| ≥ (α/2)·|E|/|E₁| ≥ α/(4(d₀+1)) ≥ UNSAT(G)/(4(d₀+1)).

Case 2: |F₁| < (α/2)·|E|. In this case, we have

  (α/2)·|E| + |S| > |F₁| + |S| ≥ α·|E|.

Thus |S| > (α/2)·|E|. Consider some v ∈ V and let S^v_a = {u ∈ S^v | σ₁(u) = a}. We have S^v = ∪_{a≠σ(v)} S^v_a. Because we took a majority vote, |S^v_a| ≤ |V(Y_v)|/2 for all a ≠ σ(v). As Y_v is an expander,
  |E(S^v_a, S̄^v_a)| ≥ h₀ · |S^v_a|,

where the complement is taken "locally", i.e., S̄^v_a = V(Y_v) \ S^v_a. Since we have equality constraints on all internal edges, all edges in E(S^v_a, S̄^v_a) are not satisfied. Thus,

  |F₁| ≥ ½ Σ_{v∈V} Σ_{a≠σ(v)} |E(S^v_a, S̄^v_a)|
    ≥ ½ h₀ Σ_{v∈V} Σ_{a≠σ(v)} |S^v_a|
    = ½ h₀ Σ_{v∈V} |S^v|
    = ½ h₀ |S|
    > (α/4) h₀ |E|

(the factor ½ accounts for internal edges that appear in the cuts of two different values a).
Thus

  UNSAT(G₁) = |F₁|/|E₁| > α h₀ |E| / (4 |E₁|) ≥ α h₀ / (8(d₀+1)) ≥ (h₀/(8(d₀+1))) · UNSAT(G).

We set γ₁ to be the minimum of the constants in the two cases.

Lemma 9.2. Let G be a d-regular constraint graph. We can construct in polynomial time a constraint graph G₂ such that
- G₂ is (d + d₀ + 1)-regular,
- every node of G₂ has a self-loop,
- λ(G₂) ≤ d + d₀ + 1 − h₀²/(2(d + d₀ + 1)),
- size(G₂) = O(size(G)),
- (d/(d + 2(d₀+1))) · UNSAT(G) ≤ UNSAT(G₂) ≤ UNSAT(G).

Proof. Assume that G has n nodes. We take the union of G and X_n (both graphs have the same node set) and attach to each node a self-loop. The edges from X_n and the self-loops get trivial constraints that are always fulfilled. G₂ = ((V, E₂), Σ, c₂) is clearly (d + d₀ + 1)-regular. We have h(G₂) ≥ h(X_n) ≥ h₀. Since G₂ is not bipartite,

  λ(G₂) ≤ d + d₀ + 1 − h₀²/(2(d + d₀ + 1))

by Theorem 7.7. Finally,

  |E₂| = |E| + |E(X_n)| + n ≤ |E| + (d₀ + 1)·(2|E|/d) = ((d + 2(d₀+1))/d)·|E|.

Thus the size increase is linear. Furthermore, the UNSAT value can shrink at most by this factor, since the new constraints are always satisfied.

By combining these two lemmas, we get the following result.

Theorem 9.3. There are constants β_prep > 0 and 0 < λ_prep < δ_prep such that for all constraint graphs G, we can construct in polynomial time a constraint graph G_prep over the same alphabet with

- G_prep is δ_prep-regular,
- every node in G_prep has a self-loop,
- λ(G_prep) ≤ λ_prep,
- size(G_prep) = O(size(G)),
- β_prep · UNSAT(G) ≤ UNSAT(G_prep) ≤ UNSAT(G).

We set δ_prep = d + d₀ + 1 with d = d₀ + 1 (the degree of G₁), β_prep = γ₁ · d/(d + 2(d₀+1)), and λ_prep = δ_prep − h₀²/(2δ_prep).
9.2 Gap amplification

Definition 9.4. Let G = ((V, E), Σ, c) be a d-regular constraint graph such that every node has a self-loop. Let t ∈ N be even. The t-fold amplification product

  G^t = ((V, E^t), Σ^{d^{t/2}}, c^t)

is defined as follows (a toy enumeration of E^t follows the definition):

- For every walk W of length t from u to v, there is an edge {u, v} in E^t. If there are several walks between u and v, we introduce several parallel edges between u and v. But we disregard the directions of the walks; that is, for every walk W and its reverse, we put only one edge into E^t.

- An assignment σ̂ maps every node to a vector from Σ^{d^{t/2}}. We index the entries of σ̂(v) with the walks of length t/2 starting in v. (There are exactly d^{t/2} such walks.) Let W be such a walk and let u be its other end node. σ̂(v)_W is called "the opinion of v about u with respect to W". Since there might be many walks from v to u, v can have many opinions about u. We will usually assume that nodes are not "schizophrenic", i.e., that they always have the same opinion about u. In this case, we will also write σ̂(v)_u for the opinion of v about u.

- It remains to define c^t. Let e = {u, v} ∈ E^t and let σ̂ be an assignment. Let G_e be the subgraph of G induced by N_{t/2}(u) ∪ N_{t/2}(v). c^t(e) is satisfied by σ̂ iff all opinions (of u and v) about every node x of G_e are consistent and all constraints in G_e are satisfied. (Since G is an expander, one constraint of G "appears" in many constraints of G^t.)
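Here is a toy construction of the edge set E^t (an illustration, not from the original text; all names are ad hoc, and the enumeration grows like d^t, so it is only usable for tiny graphs):

    # One (parallel) edge of E^t per length-t walk, with a walk and
    # its reverse identified.

    def walks(adj, t):
        ws = [[v] for v in adj]
        for _ in range(t):
            ws = [w + [u] for w in ws for u in adj[w[-1]]]
        return ws

    def amplified_edges(adj, t):
        edges = set()
        for w in walks(adj, t):
            w = tuple(w)
            edges.add(min(w, w[::-1]))     # identify W with its reverse
        return edges

    adj = {0: [0, 1], 1: [1, 0]}           # 2-regular: self-loop + one edge
    print(len(amplified_edges(adj, 2)))    # 6 parallel edges for t = 2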
If t is a constant, G^t is polynomial-time computable from G, and we have size(G^t) = O(size(G)).

Theorem 9.5. Let λ < d be two constants and Σ an alphabet. There is a constant β_amp solely depending on λ, d, and |Σ| such that for all d-regular constraint graphs G with self-loops at every node and λ(G) ≤ λ:

1. UNSAT(G^t) ≥ β_amp · √t · min{UNSAT(G), 1/(2t)}.
2. UNSAT(G) = 0 ⇒ UNSAT(G^t) = 0.

Proof. We start with showing 2: Let σ be a satisfying assignment for G. We define σ̂ : V → Σ^{d^{t/2}} by setting σ̂(v)_W = σ(u), where W is a walk of length t/2 starting in v and u is the other end node of W. By construction, σ̂ fulfills all constraints of G^t.

For 1, let σ̂ be an optimal assignment for G^t. We can assume that there are no schizophrenic nodes v, because otherwise all constraints involving v are unsatisfied, and therefore we cannot increase the UNSAT value by changing the assignment to v. We will define an assignment σ with

  UNSAT_σ̂(G^t) ≥ Ω(√t) · min{UNSAT_σ(G), 1/(2t)}.
σ is again defined by a majority vote: σ(v) is the majority of all opinions about v of the nodes u that are reachable by a walk of length t/2 from v. (These are exactly the nodes that have an opinion about v.) If several walks go from v to u, then each walk contributes one opinion.

We choose an F ⊆ E as large as possible such that all constraints in F are unsatisfied by σ and |F|/|E| ≤ 1/t. Then

  min{UNSAT_σ(G), 1/(2t)} ≤ |F|/|E| ≤ 1/t.

Let W_t denote the set of all walks of length t.

Definition 9.6. W = (v₀, e₁, v₁, ..., v_t) ∈ W_t is hit at j if e_j ∈ F and the opinions of v₀ about v_{j−1} and of v_t about v_j are equal to σ(v_{j−1}) and to σ(v_j), respectively. (In particular, both nodes have an opinion about the corresponding node.)

If a walk is hit at j, then e_j is unsatisfied, and it is unsatisfied because it is really unsatisfied and not just because σ̂ and σ are inconsistent. We set

  I = {j ∈ N | t/2 − √t < j ≤ t/2 + √t},

the set of "middle indices". For a walk W, we set

  N(W) = |{j ∈ I | W is hit at j}|.
Let e_W be the edge in G^t corresponding to W. If N(W) > 0, then e_W is not satisfied by σ̂, since e_j is not satisfied in G under σ, and σ is consistent with σ̂ on v_j and v_{j−1}. In formulas,

  Pr[N(W) > 0] ≤ Pr_{ê∈E^t}[σ̂ does not satisfy ê] = UNSAT_σ̂(G^t) = UNSAT(G^t).

We will show that Ω(√t)·|F|/|E| ≤ Pr[N(W) > 0]. This will finish the proof. Let

  N_j(W) = 1 if W is hit at j, and N_j(W) = 0 otherwise.

Then N(W) = Σ_{j∈I} N_j(W). Lemma 9.7 below shows Pr_{W∈W_t}[N_j(W) = 1] = Ω(|F|/|E|). With this, we can bound Pr[N(W) > 0]. We use the following conditional expectation inequality:

  Pr[N(W) > 0] ≥ Σ_{j∈I} Pr[N_j(W) = 1] / E[N(W) | N_j(W) = 1].
By linearity of expectation,

  E[N(W) | N_j(W) = 1] = Σ_{k∈I} E[N_k(W) | N_j(W) = 1].

For every summand on the right-hand side, we have

  E[N_k(W) | N_j(W) = 1] ≤ Pr[the (|k−j|+1)th edge of a random walk is in F | its first edge is in F] ≤ 2 ( |F|/|E| + (λ/d)^{|k−j|−1} )

by Lemma 8.3 (for k = j, the left-hand side is 1, and the bound holds trivially). Hence,

  E[N(W) | N_j(W) = 1] ≤ 2|I|·(|F|/|E|) + 2 Σ_{k∈I} (λ/d)^{|k−j|−1} ≤ 4/√t + c₁,

where c₁ is a constant depending only on λ and d (the second sum is dominated by a geometric series), and we used |I| ≤ 2√t and |F|/|E| ≤ 1/t. Thus

  Pr[N(W) > 0] ≥ Σ_{j∈I} Pr[N_j(W) = 1] / (4/√t + c₁) ≥ Ω(|F|/|E|) · |I| / (4/√t + c₁) = Ω(√t) · |F|/|E|
by the exercise below.

Exercise 9.1. Show that for every c > 0, there is a constant a > 0 such that x/(2/x + c) ≥ a·x for all x ≥ 1.
Lemma 9.7. For all j ∈ I, Pr_{W∈W_t}[N_j(W) = 1] = Ω(|F|/|E|).

Proof. Fix j ∈ I. We generate a walk W = (v₀, e₁, v₁, ..., v_t) uniformly at random by using the method RW′ with parameter j. Then the edge e_j is chosen uniformly at random. Furthermore, v₀ only depends on v_{j−1} and v_t only depends on v_j. Therefore,

  Pr_{W∈W_t}[N_j = 1] = (|F|/|E|) · p · q,

where p = Pr_{W∈W_t}[σ̂(v₀)_{v_{j−1}} = σ(v_{j−1})] and q = Pr_{W∈W_t}[σ̂(v_t)_{v_j} = σ(v_j)]. We are done if we can show that p and q are constant. Since both cases are symmetric, we will only present a proof for p.

Let X_{j−1} be the random variable generated by the following process: We start in v_{j−1} and perform a random walk of length j − 1. Let u be the node that we reach. We output the opinion σ̂(u)_{v_{j−1}} of u about v_{j−1}. If u has no opinion about v_{j−1} (this can happen, since j − 1 can be greater than t/2, but it will not happen too often), then we output some dummy value not in Σ. Obviously, p = Pr[X_{j−1} = σ(v_{j−1})].

We start with the case j − 1 = t/2. In this case, the walk reaches exactly the nodes that have an opinion about v_{j−1}. Since σ(v_{j−1}) is chosen by a majority vote, p ≥ 1/|Σ| in this case. We will now show that for all j ∈ I, the probability Pr[X_{j−1} = σ(v_{j−1})] cannot differ too much from this; in particular, it is Ω(1/|Σ|). The self-loops play a crucial role here, since they ensure that a random walk with ℓ edges visits, in expectation, no more than (1 − 1/d)·ℓ + 1 different nodes. We leave the rest of this proof as an exercise.

Exercise 9.2. Show that Pr[X_{j−1} = σ(v_{j−1})] ≥ Ω(1/|Σ|) for j ∈ I.
9.3 Alphabet reduction

In the last section, we increased the UNSAT value of the constraint graph but also enlarged the alphabet. To apply the construction iteratively, we need that, in the end, the alphabet is again Σ = {0, 1}³. This is achieved by the procedure in this section.

For this, we need a little coding theory. An encoding of {0, 1}^k is an injective mapping E : {0, 1}^k → {0, 1}^ℓ. Its image C is called a code, and an element of C is a code word. The (relative) distance of a code is δ(C) = min{δ(x, y) | x, y ∈ C, x ≠ y}. For our purposes, we need an encoding E : {0, 1}^k → {0, 1}^{O(k)} with relative distance ≥ ρ for some constant ρ > 0. For a construction of such a code, see e.g. [SS96]. If we have a relative distance of ρ, then this in particular means that whenever we take a code word and change an arbitrary fraction of less than ρ/2 of the bits, we can recover the original code word, since there is only one code word within relative distance less than ρ/2.
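A minimal sketch of this unique-decoding argument (toy code, not from the original text; real constructions such as the one in [SS96] decode far more efficiently than this brute-force search over C):

    # Nearest-codeword decoding; the result is unique whenever the
    # input differs from a code word in less than rho/2 of the bits.

    def rel_dist(a, b):
        return sum(u != v for u, v in zip(a, b)) / len(a)

    def decode(x, code):
        return min(code, key=lambda c: rel_dist(x, c))

    code = ["000000", "111000", "000111", "111111"]   # relative distance 1/2
    print(decode("101000", code))   # "111000": one flipped bit, 1/6 < rho/2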
Lemma 9.8. There is a constant β_red such that for all constraint graphs G = ((V, E), Σ̂, c), we can construct in polynomial time a constraint graph G_red = ((V′, E′), {0, 1}³, c′) such that

1. size(G_red) ≤ O(size(G)), where the constant only depends on |Σ̂|,
2. β_red · UNSAT(G) ≤ UNSAT(G_red) ≤ UNSAT(G).

Proof. Let k = ⌈log |Σ̂|⌉. We identify every element of Σ̂ with a string in {0, 1}^k. Then we map each such string to {0, 1}^ℓ with ℓ = O(k), using the code from the beginning of this section. We replace every node v ∈ V by a sequence of nodes v₁, ..., v_ℓ. With every edge e = (u, v) ∈ E, we identify a function φ_e : {0, 1}^ℓ × {0, 1}^ℓ → {0, 1}; φ_e(x, y) is true iff x and y are code words corresponding to values a, b ∈ Σ̂ such that (a, b) ∈ c(e). For each such φ_e, we construct an assignment tester (see Theorem 6.3) G_e = ((V_e, E_e), {0, 1}³, c_e). The graph G_red is the union of all these assignment testers. The nodes v₁, ..., v_ℓ representing v, v ∈ V, are shared by all assignment testers corresponding to an edge that contains v. The constraints of G_red are the constraints of the G_e.

We can assume that each G_e has the same number of edges, say r. Thus G_red has r·|E| edges. Each assignment tester is a constant-size graph whose size only depends on |Σ̂|. This immediately yields the upper bound on the size of G_red.

For the second statement of the lemma, consider an optimal assignment σ of G. We construct an assignment σ′ for G_red as follows: If σ satisfies the constraint c(e), then, by the properties of assignment testers, we can extend the encoding of σ in such a way that all constraints of G_e are satisfied. If σ does not satisfy c(e), then we extend it in an arbitrary way; in the worst case, no constraint of G_e is satisfied. Thus for every constraint not satisfied in G, at most r constraints are unsatisfied in G_red. Thus

  UNSAT(G_red) ≤ UNSAT_{σ′}(G_red) ≤ (r·|E|·UNSAT_σ(G)) / (r·|E|) = UNSAT_σ(G) = UNSAT(G).

For the other inequality, let σ′ be an optimal assignment for G_red. For each set of nodes v₁, ..., v_ℓ representing some v ∈ V, we interpret the assignment σ′ restricted to these nodes as a string x ∈ {0, 1}^ℓ. We set σ(v) to be the element a ∈ Σ̂ whose encoding x̂ minimizes δ(x, x̂). We will now show that for every constraint c(e) that is not satisfied by σ, a constant fraction of the constraints of G_e is not satisfied by σ′. This will complete the proof.

Let v and w be the two nodes of e, let x and y be the strings given by σ′, and let x̂ and ŷ be the nearest code words. Since e is not satisfied, either x or y differs from each satisfying assignment of φ_e in at least a ρ/2 fraction of its bits. If this were not the case, then x and y would have been decoded to an assignment satisfying e. Thus x and y in total differ from any satisfying assignment of φ_e in at least a ρ/4 fraction of the bits. But then, by the properties of an assignment tester with rejection probability ε, also a fraction of at least ε·ρ/4 of the constraints of G_e is not satisfied.
9.4 Putting everything together

If we put together the constructions of the three previous sections, we get the following result.

Lemma 9.9. There are constants C > 0 and 1 > a > 0 such that for every constraint graph G over the alphabet Σ = {0, 1}³, we can construct a constraint graph G′ over the same alphabet in polynomial time such that

1. size(G′) ≤ C · size(G),
2. if UNSAT(G) = 0, then UNSAT(G′) = 0,
3. UNSAT(G′) ≥ min{2·UNSAT(G), a}.

Proof. We start with G, make it an expander, then amplify the gap (the value t is yet to be chosen), and finally reduce the alphabet. It is clear that if we choose t to be a constant, then the first two statements are fulfilled. It remains to choose t in such a way that the third statement is fulfilled. We have

  UNSAT(G′) ≥ β_red · β_amp · √t · min{UNSAT(G_prep), 1/(2t)}
    ≥ β_red · β_amp · √t · min{β_prep · UNSAT(G), 1/(2t)}.

If we now set t = ⌈4·(1/(β_prep·β_amp·β_red))²⌉, we get

  UNSAT(G′) ≥ min{2·UNSAT(G), a}

with a = β_prep·β_amp²·β_red²/4.
With this lemma, the proof of the PCP theorem follows easily. We start with the observation that the decision version of constraint graph satisfiability is NP-complete, i.e., gap(1 − 1/|E|, 1)-Max-CGS is NP-hard. Let G be an input graph. If we now apply the above lemma ⌈log |E|⌉ times, we get a graph G′ that can be computed in time polynomial in size(G) with the property that

  UNSAT(G′) ≥ min{2^{log |E|} · (1/|E|), a} = min{1, a} = a,

which is constant. Thus we have a reduction from gap(1 − 1/|E|, 1)-Max-CGS to gap(1 − a, 1)-Max-CGS. In particular, the latter problem is also NP-hard. But this is equivalent to the statement of the PCP theorem.
10 Average-case complexity

Being intractable, e.g., NP-complete, does not completely reflect the difficulty of a problem. Approximability is one way of refining the notion of intractability: We have seen some NP-hard optimization problems for which finding a close-to-optimal solution is easy, and others for which finding even a very weak approximation is as hard as solving an NP-complete problem exactly.

Average-case complexity is another way of refining the intractability of problems: Unless P = NP, no efficient algorithm exists that solves an NP-complete problem on all instances. However, we may still hope to solve the problem efficiently on most instances or on typical instances, where "typical" here means something like "sampled from a distribution that reflects practical instances."

Another motivation for studying average-case complexity is cryptography: A cryptographic system is secure only if any efficient attempt to break it succeeds only with a very small probability. Thus, it does not help if a cryptographic system is hard to break only in the worst case. In fact, most of cryptography assumes that NP problems exist that are intractable not only in the worst but even in the average case, i.e., on random inputs.

Bogdanov and Trevisan give a well-written survey about average-case complexity [BT06]. They also cover connections of average-case complexity to areas like cryptography.
10.1 Probability distributions and distributional problems

What probability distribution should we take? What probability distributions should we allow? How should we model that we have inputs of various sizes? There are essentially two ways to deal with inputs of various sizes: First, for each n ∈ N, we can have a distribution D_n from which we draw the instances of "size" n (e.g., n can be the length of the strings). Combining D₁, D₂, D₃, ..., we get an ensemble D = (D_n)_{n≥1} of probability distributions. The second possibility is to have a single probability distribution for strings of all lengths. This is convenient in some applications, and it leads to a simple notion of reducibility that preserves average-case tractability. But it is difficult, e.g., to define circuit complexity for this case, and it is also sometimes counterintuitive. For instance, since {0, 1}* is a countably infinite set, there is no "uniform" distribution on {0, 1}*. Instead, the distribution that is commonly called the "uniform distribution" assigns probability (6/π²)·n^{−2}·2^{−n} to every string of length n ≥ 1 (or, to get rid of the factor 6/π², we assign probability (1/(n(n+1)))·2^{−n} to strings of length n).

We will use ensembles of distributions, as this will be more convenient. However, most results hold independently of which possibility we choose. The uniform distribution U = (U_n)_{n∈N} is then given by U_n(x) = 2^{−n} for x ∈ {0, 1}^n. In order to allow for different probability distributions, we do not fix one, but we will consider distributional problems in the following.

Definition 10.1. A distributional decision problem is a pair Π = (L, D), where L is a language and D = (D_n)_{n≥1} is an ensemble of probability distributions, where each D_n has finite support.
By supp(D) = {x | D(x) > 0}, we denote the support of a probability distribution. You can think of supp(D_n) ⊆ {0, 1}^n, but this will not always be the case. However, we will have supp(D_n) ⊆ {0, 1}^{≤p(n)} for some polynomial p for the distributions that we consider.

What we would now like to have is something like an "average-case NP-hard" distributional problem Π = (L, D): If Π is average-case tractable, then (L′, D′) is average-case tractable for every L′ ∈ NP and every ensemble D′. However, as we will show, a statement like "every (L′, D′) is average-case tractable" is the same as "P = NP". (The previous sentence is non-trivial: For the same language L′, we can use different algorithms for different distributions D′.) Thus, the average-case analog of NP-completeness cannot refer to arbitrary probability distributions. We will restrict ourselves to two possible sets of distributions, namely polynomial-time samplable and polynomial-time computable ensembles.

Definition 10.2. An ensemble D = (D_n) is polynomial-time samplable if there exists a randomized algorithm A that takes an input n ∈ N and produces a string A(n) ∈ {0, 1}* with the following properties:

- There exists a polynomial p such that A on input n is p(n)-time-bounded, regardless of the random bits A reads.
- For every n ∈ N and every x ∈ {0, 1}*, we have Pr(A(n) = x) = D_n(x).

PSamp denotes the set of all polynomial-time samplable ensembles.

Several variants of the definition of polynomial-time samplable distributions exist. For instance, one can relax the strict bound on the running time and require only that A runs in expected polynomial time. Such finer distinctions are important, e.g., in the study of zero-knowledge proofs, but we will not elaborate on this.
To define the second set of distributions, let ≤ denote the lexicographic ordering between bit strings. (We have x ≤ y if |x| < |y|, or if |x| = |y| and x appears no later than y in alphabetical order.) Then the cumulative probability of x with respect to a probability distribution D is defined by

  f_D(x) = Σ_{y≤x} D(y).

Definition 10.3. An ensemble D = (D_n) is polynomial-time computable if there exists a deterministic algorithm A that, on input n ∈ N and x ∈ {0, 1}*, runs in time poly(n) and computes f_{D_n}(x). PComp denotes the set of all polynomial-time computable ensembles.

If the distribution function f_{D_n} is computable in time poly(n), then also the density function D_n is computable in time poly(n). The converse does not hold unless P = NP.

Exercise 10.1. Show that there exists an ensemble D = (D_n)_{n∈N} such that the density functions D_n are polynomial-time computable but D ∉ PComp unless P = NP. The latter means that if the functions f_{D_n} are computable in polynomial time, then P = NP.

In the following, we will focus on ensembles from PComp. Many results will nevertheless carry over to the wider class PSamp.
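The easy direction above follows from D_n(x) = f_{D_n}(x) − f_{D_n}(pred(x)), where pred(x) is the predecessor of x in the ordering. A small sketch (an illustration, not from the original text; pred and f_unif are ad hoc helpers):

    # Recover the density from the cumulative function.

    def pred(x):
        i = x.rfind("1")
        if i >= 0:                           # same length: binary predecessor
            return x[:i] + "0" + "1" * (len(x) - i - 1)
        return "1" * (len(x) - 1) or None    # all zeros: largest shorter string

    def density(f, n, x):
        p = pred(x)
        return f(n, x) - (f(n, p) if p is not None else 0.0)

    # cumulative function of the uniform distribution U_n
    def f_unif(n, x):
        if len(x) < n:
            return 0.0
        return (int(x, 2) + 1) / 2**n if len(x) == n else 1.0

    print(density(f_unif, 3, "101"))   # 0.125 = 2^-3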
Exercise 10.2. Show that PComp ⊆ PSamp.

The converse is unlikely to be true.

Exercise 10.3. Show that PComp = PSamp if and only if P = P^{#P}.

Hint: For "⇒", define an appropriate distribution on ⟨φ, a⟩ (choose the pairing function and the encoding carefully), where φ is a Boolean formula and a is an assignment for φ. Given φ, the probability of ⟨φ, a⟩ should essentially depend on whether a satisfies φ. Show that the cumulative distribution function can be used to compute the number of satisfying assignments.

One can argue that PSamp is the class of natural distributions: A distribution is natural if we can efficiently sample from it. This, however, does not mean that its density or distribution function is efficiently computable.

10.2 Average polynomial time and heuristics
62
10. Average-case complexity
Exercise 10.4. Show that the class
{(L, D) | L can be solved in expected polynomial time with respect to D} is not invariant under changes of the machine model: Let A be an algorithm with running-time t, and let B let B be a simulation of A of A that is, say, quadratically slower. Give a function t such that A has expected polynomial running-time but B has not. Another Another way would be to consider consider the median of the running-time. running-time. At least, this would be a definition that is robust against changes of the machine model. Howeve However, r, you probability probability would not call an algorithm efficient efficient if it requires linear time on 70% of the instances and exponential time on the rest. rest. So what about about requiri requiring ng that the algorith algorithm m runs runs in polynomial polynomial time on 99% of the instances? The threshold 99% would be arbitrary, there is no reason why 99% is preferable preferable to 98% or 99. 99.9%. But with any such such threshold, threshold, there would still be a constant fraction of the inputs, for which the algorithm performs poorly. poorly. But what would be a natural way of defining “average tractability”? Intuitively, one would probably say that an algorithm is efficient on average if instances that require longer and longer running time show up with smaller and smaller probability probability:: An algorithm A has polynomial average runningtime if there exists a constant c > 0 such that the probability that A requires more than time T is at most poly(n poly(n)/T c . In this way, we have a polynomial trade-off between running-time and fraction of inputs. Definition 10.4. An algorithm A with running-time tA has average poly-
nomial running-time with respect to the ensemble and a polynomial p such that, for all n and t, Pr (tA (x, n)
∼
x Dn
D if there exists an ε > 0
≥ t) ≤ p(tnε ) .
If an algorithm A has average polynomial running-time, then its median running-time is polynomial, it runs in polynomial time on all but a 1 / poly(n poly(n) − polylog(n polylog( n ) polylog(n polylog( n ) fraction of inputs, and in time n on all but an n fraction of the inputs, and so on. The whole field of average-case complexity theory was essentially founded by Leonid Levin Levin [Lev86]. (Leonid (Leonid Levin is the guy who did not get a Turing award for inventing the Russian analog of NP com comple pleten teness ess.. His paper [Lev86] is a good candidate for the shortest paper in theoretical computer science science that founded an area: The conference conference version version of the paper is one page long, the final journal paper has two pages.) Levin’s original definition is different, different, but turns out to be equivalen equivalent: t: An algorithm has average average polynomial running-time with respect to if there exists an ε > 0 such that
D
E (tA (x, n)ε ) = O(n) .
x Dn
∼
10.2. Average polynomial time and heuristics
63
Exercise 10.5. Prove that Definition 10.4 and Levin’s definition are indeed
equivalent. Exercise 10.6. Actually, Levin’s original definition uses a single distribu-
tion D tion D : 0, 1 + [0, [0, 1]. 1]. Given an ensemble = (Dn )n≥1 with supp( with supp(D Dn ) n + 0, 1 , we obtain obtain a sing single le dist distrib ribut ution ion D : 0, 1 by settin setting g D(x) = 6 D (x). π2 |x|2 |x| A function t : 0, 1 + N is called polynomial on D-average if there exists constants k and c such that
{ }
{ } →
D
⊆
{ }
{ } →
∈{0,1}+
x
t(x)1/k D(x) x
||
≤c.
Prove that t is polynomial on D-average if and only if Prx∼Dn t(x, n) p(n) tε for all n, some polynomial p, and some ε > 0.
≤
t
≥
In addition to how to measure time, we have the choice to use deterministic or randomized algorithms or even non-uniform families of circuits. “In practice”, we would probably not run an algorithm forever, but only for a polynomial number of steps. We can model this by an algorithm with worst-cas worst-casee polynomial polynomial running-time running-time that “fails” on some inputs. This leads to the following following definition, definition, where “failure” “failure” means that the algorithm algorithm says “I don’t know.” Let Π = (L, Definition 10.5. Let Π
D) be a distributional problem with D = (Dn).
An algorithm A is a (fully polynomial-time) errorless heuristic scheme for Π if there is a polynomial p such that the following holds: supp(Dn ), A(x,n,δ) x,n,δ) outputs • For every n, every δ > 0, and every x ∈ supp(D either L(x) or the failure symbol ⊥. • For every n, every δ > 0, and every x ∈ supp(D supp(Dn ), A(x,n,δ) x,n,δ) runs in time p(n, 1/δ) /δ ).
• For every n and every δ > 0, we have Prx∼D (A(x,n,δ) x,n,δ) = ⊥) ≤ δ . n
This, however, is yet another way of defining average-polynomial time. Exercise Exercise 10.7. 10.7. Show that a distributional problem Π admits an errorless
heuristic scheme if and only if it admits an algorithm whose running-time is average-polynomial according to Definition 10.4. Now that we have three equivalent definitions of “tractable on average”, we will finally define a complexity class of distributional problems that are tractable on average. Definition Definition 10.6. 10.6. The class AvgP is the set of all distributional problems
that admit an errorless heuristic scheme.
64
10. Average-case complexity
Exercise 10.8. Let
3COL = G G is 3-colorable ,
{ |
}
let Gn,1 uniform distributi distribution on on graphs graphs with n vertic vertices, es, and let n,1/2 be the uniform AvgP. 1/2 = (Gn,1 n,1/2 )n∈N . Show that (3COL, 1/2 ) n Note: If we include every edge of 2 with a probability of p, i.e., the n probability of getting G = (V, E ) is p|E | (1 p)( 2 )−|E | , then this probability
G
G
∈ −
distribution is called Gn,p.
Instead of having δ as part of the input, we can also have a function δ:N (0, (0, 1] of failure probabilities. This leads to the following definition.
→
D
Definition Definition 10.7. 10.7. Let Π = (L, ) be a distributional problem, and let δ : (0, (0, 1]. 1]. An algor algorithm ithm A is an errorless heuristic algorithm for Π with N
→
failure probability at most δ if the following properties are fulfilled:
• For every n ∈ N and every x ∈ supp(D supp(Dn ), A(x, n) outputs either L(x) or ⊥. • For every n ∈ N, we have Prx∼D (A(x, n) = ⊥) ≤ δ(n). For a time bound t, we say that Π ∈ Avgδ DTime(t) if there exists an errorless heuristic deterministic algorithm A that, for every n, runs on time at most t(n) for all x ∈ supp(D supp(Dn ) and has failure probability at most δ. n
Let Avgδ P =
p:polynomial Avgδ DTime(p).
So far, all algorithms that we considered considered never produced wrong answers. They might return “don’t know”, but if they give an answer, it is the correct answer. Weakening the requirement yields the following definition. algorithm A is called a (fully polynomial-time) heurisDefinition 10.8. An algorithm A
D
tic scheme for Π = (L, ) if there exists a polynomial p such that the following holds: 1. For every every n, every x time p(n, 1/δ) /δ ).
supp(Dn ), and every δ > 0, A(x,n,δ) x,n,δ ) runs in ∈ supp(D
··
2. For every every δ > 0, A( , , δ ) is a heuristic algorithm for Π with error probability at most δ . Let HeurP be the set of all distributional problems that admit a heuristic scheme.
D
Definition Definition 10.9. 10.9. Let Π = (L, ) be a distributional problem, and let δ : (0, (0, 1]. 1]. An algor algorithm ithm A is a heuristic algorithm with error probability N
→
at most δ for Π if, for all n,
≤
Pr A(x, n) = L(x)
∼
x Dn
δ (n).
10.3 10.3.. A dist distri ribu buti tion on for for whic which h wors orst case case equa equals ls averag eragee case case
→
65
∈
For a time bound t and δ : N (0, (0, 1], 1], we have Π Heurδ DTime(t) if there there exists a heuristic heuristic deterministic deterministic algorithm A algorithm A such that, for every n and every x supp(D supp(Dn ), A(x, n) runs in time t(n) with failure probability at most δ (n). Let Heurδ P = p:polynomial Heurδ DTime(p).
∈
Prove the following: For every constan constant t cc, AvgP Exercise 10.9. Prove and HeurP Heurn c P.
⊆
−
⊆ Avgn
−
c
P
So far, we have defined classes of problems that are tractable (to various degrees) degrees) on average. average. Now we define the averageaverage-case case analog of NP, which is called DistNP. Definition 10.10. DistNP is the class of all distributional problems Π = (L, ) with L NP and PComp.
D
∈
D∈
Similar to P versus NP, the central question in average-case complexity is whether DistNP AvgP. Note that AvgP = DistNP does not hold: First, there is no restriction on the probability distributions for the problems in AvgP. Second, AvgP might contain problems that are not even in NP.
⊆
10.3 10.3
A distrib distributi ution on for for which which worst worst case case equals equals aver aver-age case
There exists an ensemble for which worst case and average case are equivalent. Thus, the study of average-case complexity with respect to all (instead of only samplable or computable) distributions reduces to worst-case complexity. To get meaningful results in average-case complexity, we thus have to restrict ourselves to sufficiently simple sets of ensembles like PComp or PSamp.
K such that if L is a decidable language and the distributional problem (L, K) is in Heuro(1)P, then L ∈ P.
Theorem 10.11. There exists an ensemble
For the proof of this theorem, we need Kolmogorov complexity (see also the script of the lecture lecture “Theoretical “Theoretical Computer Science”). Science”). To define Kolmogorov mogorov complexit complexity y, let us fix a universal universal Turing Turing machine machine U . U . The Ko Koll mogorov complexity of a string x 0, 1 is the length of the shortest string c such that U on input c = g, y outputs x. Here, g is a G¨odel odel number of a Turing machine and y is an input for the Turing machine M g . We denote the length c by K (x). The conditional Kolmogorov complexity K (x z ) is the length of the shortest string c = g, y such that M g on input y and z outputs x.
∈{ }
||
|
(0n ) = log n + O(1) and K (0 (0n bin(n bin(n)) = O (1) Example 10.12. We have K (0 and K (x)
≤ |x| + O(1). (1).
|
66
10. Average-case complexity
|
|
||
By abusing notation, we write K (x n) instead of K of K (x bin(n bin(n)) for x = n. This is also called the length-conditioned Kolmogorov complexity .
K
Proof Proof of Theor Theorem em 10.11. The universal universal prob probability ability distribution distribution = (K n )n∈N assigns every string x of length n a probability proportional to −K (x|n) = 2−K (x|n) : We hav have 1 since , is a x∈{0,1}n 2 x K n (x) prefix-free prefix-free code (this follows from Kraft’s inequalit inequality). y). By scaling with an appropriate constant 1, we make sure that the sum equals 1. Since L Heuro(1)P, there exists a heuristic algorithm A for L that fails with probability at most o(1). (1). Consid Consider er a string string x of length n such that A(x) = L(x). Since the overall overall probabilit probability y of such such strings is at most o(1), we have K n (x) = o(1). Thus,
≥
∈
≤
· ·
|
K (x n) =
− log K n(x) + O(1) = ω(1). (1).
This lower bound for the Kolmogorov complexity holds for all strings on which A fails. Now let x0 be the lexicographically first string of length n on which A fails. fails. Since Since L is decidable, x0 can be computed given n. This This impl implie iess K (x0 n) = O(1). To conclude, we observe observe that for sufficient sufficiently ly large n, no string exists on which A fails. Thus, Thus, A fails only on finitely many strings, which proves L P.
|
∈
Exercise 10.9 immediately yields the following result. Corollary 10.13. If (L,
K) ∈ HeurP, then L ∈ P.
Exercise 10.10. An interesting feature of the universal distribution is the following: Let Let A be any algorithm with running-time t : 0, 1 N. Then
the expected running-time with respect to worst-case running-time: E t(x) = Θ
∼
x K n
K is asymptotically equal to the
max t(x)
∈{0,1}n
x
{ } →
,
where the constant hidden by Θ depends only on the algorithm A. Prove this! encoding ding of M is an injective Exercise 10.11. Let M = 1, . . . , m . An enc
{ } mapping c : M → {0, 1, . . . , γ − 1} . The code c is called prefix-free if there are no i = j in M such that c(i) is a prefix of c(j ). Prove Prove the following following:: Assume Assume that we are are given given length lengthss 1 , . . . , m ∈ N. Then there exists a prefix-free code c for M with |c(i)| = i if and only if m − ≤ 1. (This is called Kraft’s inequality.) i=1 γ
i
11
Aver Averag agee-ca case se comp comple lete tene ness ss
The goal of this section is to prove that DistNP = (NP, PComp) contains a complete problem. There are three issues that we have to deal with: First, we need an appropriate notion of reduction. reduction. Second, Second, we have to take care of the different probability distributions, which can differ vastly. Third, and this is the easiest task, we need an appropriate problem with an appropriate appropriate probability distribution.
11.1 11.1
Redu Reduct ctio ions ns
For reductions between distributional problems, the usual many-one reductions do not suffice: A feature that a suitable reduction should have is that if (A, ) reduces to (A ( A , ) and (A (A , ) AvgP, then (A, (A, ) AvgP. Let us consider the following following example: example: We use the identity identity mapping as reduc tion between (A, ( A, ) and (A, (A, ). But assigns high probability to the hard instances, instances, whereas whereas assigns assigns small small probabi probabilit lity y to hard instanc instances. es. Then Then (A, ) is tractable on average, but (A, ( A, ) is not. What we learn is that a meaningful notion of reduction must take into account the probability distributions.
D
D
D
D
D
D
D ∈ D D
D ∈
D) and Π = (L, D) be distributional problems. lems. Then Then Π reduces to Π , denoted by Π ≤AvgP Π, if there is a function
Definition 11.1. Let Π = (L,
f that, for every n and every x in the support of Dn, can be computed in time polynomial in n such that 1. (correctness) (correctness) x
∈ L if and only if f ( f (x, n) ∈ L and
2. (domination) (domination) there exist polynomi polynomials als p and m such that, for every n and every y in the support of Dm(n) ,
Dn (x)
x:f (x,n)= x,n)=y y
≤ p(n) · Dm (n)(y) .
The first condition is usual for many-one reductions. The second condition forbids the scenario sketched sketched above: Drawing Drawing strings according to Dn and then using f ( f ( , n) yields a probability distribution of instances for L . (Of course, we just get binary strings again. But we view them as instances for L .) Then the second property makes sure that no string y is generated generated . with much larger a probability than if y had been drawn according to Dm (n)
·
67
68
11. Average-case completeness
Lemma Lemma 11.2. 11.2. Let C Π C.
∈
∈ { AvgP, HeurP}.
If Π
≤AvgP Π and Π ∈ C, then
Proof. We only consider the case C = AvgP. The other other case is similar. similar. Suppose Π AvgP, and let A be a fully polynomial-time polynomial-time errorless errorless heuristic heuristic scheme for Π . Let f be a reduction from Π to Π , and let p and m be the polynomials of Definition 11.1. We claim claim that that A(x,n,δ) x,n,δ) = A f ( f (x, n), m(n),δ/p( ,δ/p(n) is a fully polynomial-time nomial-time errorless heuristic heuristic scheme for Π. To prove prove this, let Bn = y supp(D supp(Dm(n) ) A (y, m(n),δ/p( ,δ/p(n)) = be the set of strings on which A fails. Since A is a fully polynomial-time errorless heuristic scheme, we have (Bn ) δ/p( Dm δ/p (n). With this, we get (n)
∈
|
⊥}
{ ∈
≤
Pr A(x,n,δ) x,n,δ) =
∼
x Dn
⊥
≤
⊥
= Pr A f ( f (x, n), m(n),δ/p( ,δ/p(n) =
∼
x Dn
=
Dn (x)
∈
x:f (x,n) x,n) Bn
(y) p(n)Dm (n)
∈
y Bn
(Bn) = p(n) Dm (n)
·
≤ δ.
The inequality holds because the reduction must fulfill the domination condition. Thus, Π AvgP.
∈
It is also not hard to show that
≤AvgP is transitive. Lemma 11.3. Let Π = (L, D ), Π = (L , D ), and Π = (L , D ) be distributional problems with Π ≤AvgP Π and Π ≤ Π . Then Π ≤AvgP Π .
Proof. Let f be a reduction reduction from Π to Π , and let g be a reduction from Π to Π . Let p and m the polynomials of Definition 11.1 for f , f , and let p and m be the polynomials for g. Obviously, Obviously, h given by h(x, n) = g(f ( f (x, n), m(n)) is polynomial-time computable and a many-one reduction from L to L . To prove prove that it fulfill fulfillss domina domination tion remains remains to be done. done. Theref Therefore, ore, Let q (n) = p(n) p (m(n)), and let (n) = m (m(n)). The functi functions ons q and are obviously polynomials. Now let n be arbitrary, and let z supp(D supp(D(n) ). Then
·
∈
Dn (x) =
x:h(x,n)= x,n)=zz
Dn (x)
y :g(y,m( y,m(n))= ))=zz x:f ( f (x,n)= x,n)=y y
≤
(y) p(n)Dm (n)
y :g(y,m( y,m(n))= ))=zz
≤ p(n)p(m(n))D ))Dm (m(n)) (z ) = q (n)D(n) (z ) .
11.2. Bounded halting
11.2 11.2
69
Boun Bounde ded d halt haltin ing g
In this section, we present a problem that is complete for DistNP. It is the bounded halting problem, which is the generic NP-complete problem: BH =
g,x, 1t
|
M g is a non-deterministic Turing machine
that accepts x in at most t steps .
We will show that ( BH, BH ) is DistNP-complete, where BH = (U nBH )n∈N is some kind of uniform distribution on the inputs for BH. The main challenge in the proof that ( BH, BH ) is DistNP-complete is that problems Π = (L, ( L, ) from DistNP have very different distributions. Thus, just using the many-one reduction from L to BH is unlikely to work. The key idea is to find an injective mapping C with the property that C (x) is almost uniformly distributed if x is distributed distributed according to . The following lemma makes this more precise and proves that such a function exists.
U
U
U
D
D
Lemma Lemma 11.4. 11.4. Let
D = (Dn)n∈ ∈ PComp be an ensemb ensemble. le. N
Then ther there
exists an algorithm C with the following properties:
1. C (x, n) runs in time polynomial in n for all x
∈ supp(D supp(Dn ),
2. for every n and x, x supp(D supp(Dn ), C (x, n) = C (x , n) implies x = x (so C is somewhat “injective”), and
∈
| ≤ 1 + min |x|, log |D 1(x)| . Proof. Consider any x ∈ supp(D supp(Dn ). If D If Dn (x) ≤ 2−|x| , then let C (x, n) = |
3. C (x, n)
n
0x. If D If Dn (x) > 2−|x| , then let y be the string that precedes x in lexicographic order, and let p = f Dn (y ). Then we set C (x, n) = 1z , where z is the longest common prefix of the binary representation of p and f Dn (x) = p + Dn (x). PComp, the string z can be computed in polynomial time. Thus, Since C can be computed in polynomial polynomial time. (This also also shows that C (x, n) is bounded by a polynomial in x .) It remains to prove that C is injective and fulfills the length condition. C is injective because no three strings can have the same longest common prefix: If z is the longest common prefix of x1 and x2 and z is also a prefix of x3 , then also either z 0 or z 1 is a prefix of x3 and it is also a prefix of either x1 and x2 . Finally, we observe that either C (x, n) = 0x or C (x, n) = 1z for the z descri described bed above. above. In the former former case, case, we have have C (x, n) 1 + x and −| | x Dn (x) 2 . Thus, Thus, also 1 + log(1 log(1/Dn (x)) x + 1 = C (x, n) . In the the −| | z latter case, we have Dn (x) 2 . Thus, C (x, n) = 1 + z log(1/D log(1/Dn (x)) −| | x and, since Dn (x) 2 , C (x, n) 1+ x .
D∈
|
||
≤
≥
|
≤
|≤
≥| | | | ||
|
|≤ | | |≤
|
|
||
70
11. Average-case completeness
Now let us focus on BH . The The instan instance cess of BH are triples g,x, 1t of leng length th 2 log log g + 2 log log x + 2 log t + x + g + t + Θ(1). Θ(1). Note Note that this this representa representation tion is prefix-free. prefix-free. We draw such instances of length at most N as follows: follows: We flip random bits b1 , b2 , . . . until either i = N or b1 . . . bi has the form g, x . In the forme formerr case case,, we output output b1 . . . bN , in the latter we − N i BH output g,x, 1 . We denote this distribution distribution by U N . The probability of t an instance g,x, 1 under this distribution is
U ||
||
U N g,x, 1t where t = N − |g, x|. BH
|| ||
= 2−(2log |g|+2log |x|+|g|+|x|+Θ(1) ,
With this preparation, we can prove the main theorem of this section. The key idea is to use C to “compress” “compress” the inputs: While Dn can be any distribution, the images C (x, n) with x drawn drawn according according to Dn are, more or BH less, uniformly distributed in the sense of : If x is likely, then C (x, n) is short. short. If x is unlikely, then C (x, n) is long. long. Thus, Thus, if we draw random random bits until we have seen an image y = C (x, n), then the probability of seeing y is roughly Dn (x).
U
BH
U ) is DistNP with respect to ≤AvgP. Proof. Let Π = (L, ( L, D) ∈ DistNP be arbitrary, i.e., L ∈ NP and D ∈ PComp. Let M be a nondeterministic Turing machine that accepts an input string y if and only if there exists a string x ∈ L with C (x, n) = y. Since C is polynomial-time computable and L ∈ NP, we can assume that M obeys a polynomial time bound q . Let g be the G¨odel odel number of M . M . Let us describe the reduction from ( L, D) to (BH, U ): On input x and parameter n, the reduction outputs an instance g, C (x, n), 1t(x) of length N = N ( N (n). We choose N to be a sufficiently large polynomial to make sure that t(x) ≥ q (n). Obviously, we have x ∈ L if and only if g, C (x, n), 1t(x) ∈ BH. To verify the domination condition, we exploit that C is injective injective.. Thus, Thus, it suffices to chec check k that, that, for every every n and every every x ∈ supp(D supp(Dn ), we hav have Dn (x) ≤ t ( x ) poly(n poly(n) · U N (g,x, 1 ). Let = |g | be the length of the encoding of M , M , which is fixed. Then x,n)|++|C (x,n) x,n)|+Θ(1)) U N g, C (x, n), 1q(n) = 2−(2log +2log |C (x,n) . Now log |C (x, n)| ≤ log(m log(m(n)) + 1 and |C (x, n)| ≤ log D 1(x) + 1 yields
Theorem 11.5. (BH,
BH
BH
BH
BH U N g, C (x, n), 1q(n)
n
≥ 2−(2log +) · (m(n)1+ 1)2 · Dn(x) · Ω(1) ,
=Θ(1)
which proves that domination is fulfilled.
Let us make a final remark on the function C . This function C is sometimes called a compression function for . It play playss a cruc crucia iall role in the the
D
11.3. Heuristic algorithms vs. heuristic schemes
71
completen comple teness ess proof, as it mak makes es the reduction reductions, s, which which have have to meet meet the domination domination requiremen requirement, t, possible in the first place. Why is C called compression function? Assume that we are given samples drawn according to . If we compress them using C , we have a compression with close to optimal compression rate.
D
11.3 11.3
Heuris Heuristic tic algo algorit rithms hms vs. vs. heuris heuristic tic schem schemes es
In the previous chapter, we distinguished between algorithms and schemes: An algorithm has a fixed failure probability (fixed means a fixed function, not a fixed constant), whereas a scheme works for all failure probabilities δ, but the running-time depends on δ . By Exer Exerci cise se 10. 10.9, 9, if a prob proble lem m Π admi admits ts a heur heuris istic tic schem scheme, e, then then it − c admits heuristic algorithms with error probabilities n for every constant c. The contain containmen mentt in the other directio direction n does not hold. hold. For instan instance, ce, Avg1/n P contains undecidable problems, whereas AvgP does not. 1. Show that there exists an undecidable undecidable problem problem L L with Exercise 11.1. (L, ) Avg1/nP.
U ∈
U
2. Show Show that AvgP does not contain undecidable problems (L, ). But if we restrict ourselves to problems in DistNP, the other containment can be proved: DistNP as a whole admits heuristic schemes if and only if it admits heuristic algorithms. arbitrary ary.. If (BH, BH ) Theorem Theorem 11.6. 11.6. Let c > 0 be arbitr DistNP AvgP. The same holds for Heurn c P and HeurP.
⊆
U
−
∈ Avgn
−
c
P, then
Proof. For simplicity, we will only consider AvgP and c = 1. By th the BH BH completeness of (BH, ), it suffices to show ( BH, ) AvgP. BH Let A be an errorless heuristic algorithm for ( BH, ) with error probability 1/n 1/n.. We will will use use A to construct an errorless heuristic scheme A . The idea is to use padding to map short instances of BH to longer instances. Then we exploit that the error probability of A decreases with growing input length. Let N be the length of an instance I = g,x, 1t of BH. Then we set
U
A (I , N , δ) δ) = A
U ∈ U
t+1/δ +1/δ
g,x, 1
, N +
1 δ
.
(A immediately rejects inputs that are not syntactically correct.) Note that
+1/δ BH U N (I ) = U BH 1 g,x, 1t+1/δ N + δ
BH BH by the definition of U N . On inputs inputs from from U N +1 N +1/δ /δ , algorithm A outputs with a probability of at most 1/ 1 /(N + 1/δ 1 /δ)) < δ . Thus, Thus, A outputs on at BH most a δ fraction of the instances obtained from U N .
⊥
⊥
72
11. Average-case completeness
11.4 11.4
More More DistNP DistNP-co -compl mplete ete probl problems ems
Here, we list some more DistNP-complete problems without proving that they are. For the proofs as well as some other problems, we refer to Wang’s survey of DistNP-complete problems [Wan97]. There are not too many DistNP-complete distributional problems (L, ( L, ), where both L and are natural. natural. The main main issue is that we lack lack a powpowerful tool to prove the a problem is hard-on-average like the PCP theorem for (in)approx (in)approximabil imability ity.. So, in some sense, sense, averageaverage-case case complexity complexity is in a similar state as the complexity of optimization was before the PCP theorem.
D
D
Tiling A tile is a square with a symbol on each of its four sides. Tiles must not be rotated or turned over. If we have a set T of tiles, we assume that we have an infinite number of tiles of any kind in T . T . A tiling of an n n square is 2 an arrangement of n tiles that cover the square such that the symbols of the adjacen adjacentt sides sides of the tiles tiles agree. The size size of a tile tile is the length length of the binary representation of its four symbols.
×
sequence s1 , . . . , sk Instance: A finite set T of tiles, an integer n > 0, a sequence
∈ T
of tiles that match each other (the right side of si matches the left side of si+1 ) The size of the instance is n plus the sizes of the tiles in T plus the sizes of s1 , . . . , sk . Question: Can s1 , . . . , sk be extended to a tiling of the n
only tiles from the set T ? T ?
× n square using
Distribution: Given n, select T using your favorite probability distribution
(this really does not matter much; for the reduction, T just represents the Turing machine deciding a language, and this Turing machine has constant constant size for any fixed language). language). Select Select k uniformly at random from 1, . . . , n . Finally Finally, select select s1 uniformly from T and select si+1 randomly randomly from T such that it matches si .
{
}
Levin’s original DistNP-complete problem was a variant of tiling, where the corners instead of the sides of the tiles had to match.
Post correspondence The Post correspondence problem is one of the better known undecidable problems. problems. In a restricted varian variant, t, it becomes b ecomes NP-complete. -complete. Together ogether with an appropriate probability distribution, it becomes DistNP-complete.
|xi| + |yi|.
Instance: A positive integer n, and a list x1 , y1 , . . . , xm , ym of pairs of
strings. The length N of the instance is n +
m i=1
11.4. More DistNP-complete problems Question: Is there a sequence i1 , . . . , in
xi1 xi2 . . . xin = yi1 yi2 . . . yin ?
73
∈ {1, . . . , m} of indices such that
Pr(m = µ) = Θ(1/µ Θ(1/µ2 ). Then Then dra draw w Distribution: Draw m according to Pr(m x1 , . . . , xm and y1 , . . . , ym according to the uniform distribution on 0, 1 + defined in Section 10.1.
{ }
Arbitrary NP-complete problems
D E
∈
D∈
If some problem (L, ( L, ) with L NP and PComp is hard-on-average, then every NP-complete language A is hard-on-average with respect to some samplable ensemble . The ensemble , however, might look a bit unnatural. In particular, for every NP-complete language A, there exists an ensemble PSamp such that (A, (A, ) is DistNP-hard.
E∈
E
E
12
Aver Averag age e case case vers versus us wors rstt case case
In this section, we will show some connections between average-case and worst-case complexity. We will first provide a condition under which a distributional problem is not DistNP-complete unless EXP = NEXP. Second, we will show that DistNP is not contained in AvgP unless E = NE. (Recall that poly(n) ) whereas EXP = DTime(2poly(n E = DTime(2O(n) ) and NE = NTime(2O(n) ), whereas poly(n) ).) and NEXP = DTime(2poly(n
12.1 12.1
Flatne Flatness ss and and DistNP DistNP-co -compl mplete ete prob problem lemss
So under under which which condit condition ionss is a distrib distributi utiona onall problem problem DistNP-complete? Gurevich gave a partial answer: Π = (L, ( L, DistNP) cannot be DistNP-complete if assigns only very little weight to all strings in supp( Dn ). The intuition is the following: Assume that we have a distributional problem Ψ = ( A, ) that reduces reduces to Π. Assume Assume further that assigns high weight to few strings. Then, in order to satisfy the domination requirement, also must assign somewhat somewhat high weight weight to some strings. strings. The following following definition makes makes the notion of “assigns very little weight to all strings” precise.
D
E
E
D
D = (Dn)n∈ is flat if there exists an ε > 0 such that, for all n and x, Dn (x) ≤ 2−n . Exercise 12.1. Show that G1/2 (introduced in Exercise 10.8) is flat. Exercise 12.2. Show that U (see Section 11.2) is not flat. Definition 12.1. An ensemble
N
ε
BH
Theorem 12.2 (Gurevich [Gur91]). If there is a DistNP-complete problem (L, ) with a flat ensemble , then NEXP = EXP.
D
D
D
D
Proof overview: Assume that Π = (L, ( L, ) is DistNP-complete, is flat, and there exists a reduction from Ψ = ( A, ) to Π, where = (E n )n∈N is non-flat. Let f be a reduction from Ψ to Π. In order to maintain maintain domination, domination, f must map strings x supp(E supp(E n ) to very short strings f ( f (x). Short strings, however, mean (relatively) short running-time.
E
E
∈
Proof. Assume that there exists a DistNP-complete distributional problem Π = (L, (L, ), where is a flat ensemble. ensemble. Obviously Obviously,, L EXP. Now let A NEXP be arbitrary. Let p be a polynomial such that A NTime(2p(n) ). p(n) For a string x 0, 1 with x = n, let x = x012 −n−1 . Let A = x
∈
D
D
∈{ }
∈ ∈
||
74
{ |
12.1. Flatness and DistNP-complete problems
75
x A . Sinc Sincee A NEXP, the language A is in NP. Let the following ensemble:
∈ }
∈
E 2p(n) (z ) =
2−|x| 0
E = (E n)n∈
N
be
if z = x for some string x and otherwise.
Since is com comput putabl able, e, we have have ( A , ) Thus,, ther theree exist existss a DistNP. Thus reduction f from (A (A , ) to (L, (L, ). Let us make a few observations.
E
E
E ∈
D
• Given x of length n, f ( f (x , 2p(n) ) can be computed in time 2 q(n) for some polynomial polynomial q .
• The function x → f ( f (x , 2p(n) ) is a many-one reduction from A to L, i.e., x ∈ A if and only if f ( f (x , 2p(n) ) ∈ L. polynomials m and r such that • There exist polynomials E 2 (z ) ≤ r(2p(n) ) · Dm(2 ) (f ( f (x , 2p(n) )) z:f ( f (x
p(n)
p(n)
,2p(n) )=f )=f ((z,2 z, 2p(n) )
for all n and x. (This (This might might look confusin confusingg at first glance since since the strings x , z and f ( f (x ) are exponentially long, but it is just the domination condition.) Now we have E 2p(n) (x ) = 2−n . Thus, domination implies that Dm(2p(n) ) (f ( f (x , 2p(n) ))
−n
≥ r(22p(n)) = 2−s(n)
(12.1)
D is flat, there exists an ε > 0 such that f (x , 2p(n) )) ≤ 2−(m(2 )) (12.2) ) (f (
for some polynomial s. Since Dm(2p(n)
p(n)
ε
From the two bounds (12.1) and (12.2) on Dm(2p(n) ) , we get s(n)
≥ m(2p(n))ε .
(12.3)
Now we are almost done. Since the images f ( f (x , 2p(n) ) are in supp(D supp( Dm(2p(n) ) ), they are polynomially bounded in m(2p(n) ). By (12.3 (12.3), ), this quan quanti tity ty is bounded from above by a polynomial s(n). Hence, Hence, A EXP: (1) x A if and only if f ( f (x , 2p(n) ) L. (2) (2) We can can comput computee y = f ( f (x , 2p(n) ) in time p(n) 2q(n) . (3) We We can can decide decide wheth whether er y L in time 2p(|y|) 2p(m(2 )) poly(n) , where the second inequality holds because of (12.3). 2p(s(n)) = 2poly(n
∈
∈
∈
∈
≤
≤
Since the uniform distribution = (U n )n∈N is flat, we immediate get the following result as a special case.
U
Corollary 12.3. There is no L unless NEXP = EXP.
∈ NP such that (L, U ) is DistNP-complete
76
12. Average case versus worst case
12.2 12.2
Coll Collap apse se in in expon exponen enti tial al tim time e
Our second result concerning connections between average-case and worstcase complexity shows that it is unlikely that DistNP is a subset of AvgP. If this is the case, then nondeterministic exponential time collapses to deterministic exponential time. To show this, we need the following two lemmas. Lemma Lemma 12.4. 12.4. E = NE if and only of there exists a unary language L NP P.
\
∈
Proof. “= ”: Assume Assume that that NE = E, and let L NE E. Let L = cod(x cod( x ) 1 x L be a unary language. Let us first show that L NP: Given cod(x cod( x ) , x can be computed in polynomial time. We have x = O(log y ). y=1 Since L NE, there exists a 2 O(|y|) time bounded nondeterministic Turing machine that decides L . On inp input ut x, this machine needs time 2 O(log n) = nO(1) . Now let us prove that L / P. Assu Assume me to the contr contrary ary that that L P. cod(x cod( x ) O ( n ) Then, given any string x, we can compute y = 1 in time 2 . By definition, y L if and only if x L . Since Since L P, we can decide in time | | O (1) O ( x ) y =2 if y L. This would imply L E – a contradiction. “ =”: Assume Assume that there exists a unary language L 1 in NP P. Consider the language L = bin(n bin(n) 1n L . We will show that L NE E. On input y, we can compute an x = 1n with bin(n bin(n) = y in time 2O(|y|) . Then we can use the nondeterministic polynomial-time Turing machine that witnesses L NP to decide x L in time nO(1) = 2O(|y|). Thus, L NE. Lastly Lastly,, we have have to show show that L / E. Assu Assume me to the contra contrary ry that that O ( n ) E. Then L Then there is a determ determini inisti sticc 2 time bounded Turing machine n that decides L . No Now, w, on inp input ut x = 1 , we can compute y = bin(n bin(n) in polynomial polynomial time. We have y = O (log n). Thus, Thus, y L can be decided in O (log n ) O (1) time 2 =n . Sinc Sincee y L if and only if x L, this would imply L P – again a contradiction.
⇒ | ∈ } ∈
{
∈
\ ∈ ||
||
∈
∈
||
∈
∈
∈
⇐
{
∈
⊆{ }
| ∈ }
∈
||
∈
\ ∈ \ ∈
∈
∈
∈
∈
∈
∈ ∈
Q = (Qn)n∈ be given by Qn(1n) = 1. Then, Then, for every unary language language L ⊆ {1} , we have L ∈ P if and only if (L, Q) ∈ AvgP. Proof. Clearly, if L ∈ P, then (L, ( L, Q) ∈ AvgP. To see that the convers conversee also holds, consider any algorithm A that witnesses (L, ( L, Q) ∈ AvgP. Let t be the running-time of A. Then we have, for some ε > 0, Ex∼Q (tε (x, n)) = O (n). Sinc Sincee supp supp((Qn ) = {1n }, this is equivalent to t(1n , n) = O(n1/ε ). Lemma 12.5. Let
N
n
Thus, A runs in worst-case polynomial time.
With these two lemmas, the main theorem of this section can be proved easily.
Theorem Theorem 12.6 (Ben-David, Chor, Goldreich, Luby [BDCGL92]) . If E = NE, then DistNP AvgP.
⊆
12.2. Collapse in exp onential time
Q
77
D∈ ∈
Proof. Let be the ensemble ensemble of Lemma 12.5. Obviously Obviously,, PComp. Thus, L NP if and only if (L, (L, ) DistNP. We have E = NE if and only if there exists a unary language L NP P by Lemma Lemma 12.4. This This in turn holds if and only of (L, ( L, ) DistNP AvgP by Lemma 12.5.
∈
D∈
Q ∈
\
\
13
Decis ecisio ion n vers versu us sea search rch
A search search algorithm for an NP relation V is an alg algorit orithm hm that, on input input x, computes a witness w of length poly( x ) such that x, w V . V . Reca Recall ll P that the corresponding language L NP is L = x w : x, w V . By abusing notation, we will call A also a search algorithm for a language L NP. This is ambiguou ambiguouss since L does not uniquely define a corresponding NP relation. Obviously, if we have an efficient search algorithm for a language L NP, then we can use it to get an efficient decision algorithm for L. What about the the opposi opposite te?? If L is NP-complete, then we can use an efficient decision algorithm algorithm for L to efficiently compute witnesses (see the script of the lecture “Computational Complexity Theory”). Thus, if P = NP, then every problem in NP admits efficient efficient search search algorithms. algorithms. So for NP as a whole, decision and search search are equally hard. Neverthel Nevertheless, ess, it is believ b elieved ed that in general, efficient decision does not imply efficient search. For instance, one-way permutations, if they exist, yield problems for which decision is easy but search is hard.
∈
||
{ |∃
∈ ∈ }
∈
∈
permutation on is a bijective bijective function f function f : 0, 1 Exercise 13.1. A one-way permutati
{ } →
{0, 1} such that • |f ( f (x)| = |x| for every x ∈ {0, 1} , • given x, f ( f (x) can be computed in polynomial time, and f (x) = y for a given y cannot be • the problem of finding an x with f (
solved solved in polynomi polynomial al time. (Since (Since f is bijective, we can equivalently − 1 say that f cannot be computed in polynomial time.)
Show that the existence existence of one-way one-way permutati permutations ons implies that there are problems for which search is harder than decision. In this section, we consider the question of decision versus search in the averageaverage-case case setting: setting: Assume Assume that all DistNP problems admit efficient-onaverage decision algorithms, i.e., DistNP AvgP. Do then all proble problems ms in DistNP also have efficient-on-average search algorithms? We will give a partial answer to this question: If all languages in NP with the uniform distribution admit efficient-on-average randomized algorithms, then all languages in NP with the uniform distribution distribution admit efficientefficient-on-a on-aver verage age randomized randomized search algorithms. We have not yet defined what an efficient-on-average randomized algorithm is. (Here, the instances instances are drawn at random and, in addition, addition, also the
⊆
78
13.1. Randomized decision algorithms
79
algorithm itself is allowed to use randomness to solve the instance.) Furthermore, we also do not know yet what an efficient-on-average (randomized) search algorithm is. We will define all this in the next section and postpone the main theorem of this section to Section 13.3.
13.1 13.1
Random Randomize ized d deci decisio sion n algo algorit rithms hms
We first generalize AvgP, Avgδ P and so on to randomized algorithms.
D) be a distrib distributio utional nal proble problem. m. An algoalgorithm A is a randomized errorless heuristic scheme for Π if A runs in time polynomial in n and 1/δ for every δ > 0 and x ∈ supp(D supp(Dn ) and 1 Pr A(x,n,δ) x,n,δ) ∈ / {L(x), ⊥} ≤ (13.1) A 4 Definition Definition 13.1. 13.1. Let Π = (L,
(the probability is taken over A’s coin tosses) and Pr
x Dn
∼
Pr A(x,n,δ) x,n,δ ) = A
⊥ ≥ ≤ 1 4
δ
(13.2)
(the inner probability is again over A’s coin tosses, the outer probability over the random instances). AvgBPP is the class of all distributional problems that admit a randomized errorless heuristic scheme. We stress that “errorless” “errorless” refers to the random input, not to the internal internal coin tosses of the algorithm. Definit Definition ion 13.1 probab probably ly need need some some explan explanatio ation. n. Fix some input input x supp(D supp(Dn ), consider running A(x, n) k times for some large k. If significantly more than k/4 k/4 of these runs return , then we can interpret this as A not knowing the answer for x. This This follows follows from the second second condit condition ion of Definit Definition ion 13.1. On the other hand, if is returned fewer than k/4 k/4 times, then the first condition guarantees that we will see the right answer at least k/2 k/2 times (with high probability probability due to Chernoff Chernoff ’s bound). The choice choice of the constant 1/ 1/4 in Definition Definition 13.1 is arbitrary: arbitrary: Any constant constant strictly strictly smaller than 1/ 1/3 serves well.
∈
⊥ ⊥
Excursus: Excursus: Chernoff Chernoff bounds
Chernoff bounds are frequently used to bound large deviations from the expected value value of random random variables variables that are sums of independent independent indicator indicator variables. variables. The rough statemen statementt is: If we toss n unbiased coins, we see n/2 n/2 O( n) heads with high probability. probability. More precisely: Let X 1, . . . , Xn be independent random variables that assume n only values in 0, 1 . Let Let Pr( Pr(X X i = 1) = pi , let X = i=1 X i , and let E(X E(X ) = n p = µ. Then i=1 i
± √
{ }
−
Pr X > E(X E(X ) + a < exp
2a2 n
80
13. Decision versus search
−
for all a > 0. By symmetry, we have the same bound for Pr(X Pr( X < E(X E(X ) a). There are many variants of Chernoff bounds. Sometimes, they lead to slightly different bounds. For most applications, however, it does not matter which version we use.
Let A be a randomized errorless heuristic scheme. Let A be Exercise 13.2. Let A the an algorithm that executes A k = k(n) times on inputs from supp(D supp(Dn ) and outputs the majority vote. Prove that
Pr A (x,n,δ) x,n,δ) / L(x), A
and Pr
x Dn
∼
Pr A
∈{
A (x,n,δ) x,n,δ) = ⊥
Ω(k(n)) ⊥} ≤ 2−Ω(k
≥
Ω(k(n)) 2−Ω(k
≤
δ.
As in Definition 10.2, it is also possible to define randomized errorless heuristic algorithms or randomized heuristics that are allowed to make errors, rors, but but we will will not do so here. here. We can can also also repl replac acee the the cons consta tant nt 1/ 1 /4 in (13.1) (13.1) by 0. Then Then we obtain obtain zero-e zero-error rror randomiz randomized ed errorless errorless heuristi heuristicc schemes. Now what is the difference difference between between errorless and zero-error? zero-error? Note that we have two types of “errors” or “failure”: We can be unlucky to get a hard instance, instance, and the algorithm, algorithm, since randomized, randomized, may fail. Errorless Errorless means that there is no instance on which the algorithm A errs. errs. It is just allowe allowed d to produce . However, if A is randomized it may still have bad luck with its coin tosses, which may cause it to output a wrong answer. If this is not the case, then A is called zero-error.
⊥
Exercise 13.3. We can also define a non-uniform variant of AvgP: A distributional problem Π = (L, ) is in AvgP/poly if there exists an algorithm A and an advice function a : N (0, (0, 1] 0, 1 with a(n, δ ) poly(n, poly(n, 1/δ) /δ )
D
such that the following holds:
×
→{ }
∈
|
|≤
1. For every n, every δ > 0, and every x supp(D supp(Dn ), A(x,n,δ,a( x,n,δ,a(n, δ)) outputs either L(x) or the failure symbol . 2. For every n, every δ > 0, and every x runs in time p(n, 1/δ) /δ ).
⊥
∈ supp(D supp(Dn ), A(x,n,δ,a( x,n,δ,a(n, δ))
3. For every every n and every δ > 0, we have Prx∼Dn (A(x,n,δ,a( x,n,δ,a(n, δ )) = ) δ.
⊥≤
(One might prefer to define AvgP/poly in terms of circuits rather than Turing machines machines that take advic advice. e. But we want want to have have one circuit circuit for each n each n and δ , and supp(D supp(Dn ) can can contain contain strings of different different lengths. This technical technical problem can be solved, but why bother?) Prove that AvgBPP AvgP/poly.
⊆
13.2. Search algorithms
13.2 13.2
81
Search Search algo algorithms rithms
Now we turn to the definition of search algorithms. In order to avoid confusion, we will first define deterministic search algorithms that are efficient on average, average, although we will never use them. After that, we allow our search search algorithms to use randomness. Definition 13.2. Let Π = (L,
D) be a distributional problem with L ∈ NP.
An algorithm A is a deterministic errorless search scheme for Π if there is a polynomial p such that the following holds: 1. For every n, δ > 0, and every x p(n, 1/δ) /δ ).
∈ supp(D supp(Dn ), A runs in time at most
2. For every n, δ > 0, and every x witness for x L or .
∈ L ∩ supp(D supp(Dn ), A(x,n,δ) x,n,δ) outputs a
∈
⊥
3. For every n and δ > 0, we have Prx∼Dn (A(x,n,δ) x,n,δ) =
⊥) ≤ δ.
∈
For x / L, the algorithm A(x,n,δ) x,n,δ) can output anyth anything ing.. The above above definition definition is not completely precise precise since the witness language is not unique. However, this really makes no difference hear. In the next definition, we allow our search algorithm to use randomness. Definition 13.3. Let Π = (L,
D) be a distributional problem with L ∈ NP.
An algorithm A is a randomized errorless search scheme for Π if there is a polynomial p such that the following is true: 1. For every n and δ > 0, A runs in time p(n, 1/δ) /δ ) and outputs either a string w or .
⊥
2. For every n, δ > 0, and x
∈ L ∩ supp(D supp(Dn ),
Pr A(x,n,δ) x,n,δ ) outputs a witness for x or A(x,n,δ) x,n,δ) = A
⊥
>
1 . 2
3. For every n and δ > 0, Pr
x Dn
∼
Pr A(x,n,δ) x,n,δ) = A
⊥ ≤ >
1 4
δ.
∈
What does this definition definition mean? For any x L, A may output a nonwitness w. According According to item (2), this happens with bounded probability probability.. This is an internal failure of the algorithm A and not due to x being a hard instance. Item (3) bounds the probability that A outputs . Intuitively, Intuitively, A outputs not because of internal failure, but because x is a hard instance.
⊥
⊥
82
13. Decision versus search
This This is called called an external external failure. failure. Ho Howe weve ver, r, there there is at most a δ fraction of strings x (measured with respect to Dn ) on which A outputs with significant probability. probability. The constants 1/ 1/2 and 1/ 1/4 in the definition are to some extent arbitrary. We can replace them by any constants c and c with 1 > c > c > 0. Ω(k) by Furthermore, these two failure probabilities can be decreased to 2 −Ω(k executing the algorithm k times: times: If we ever get a witnes witness, s, we output output this witness. If we see more than c k times, then we output . Otherwise, we output an arbitrary string. Definition 13.3 allows the algorithm A to output anything on input x / L (but even then only only with with bounded bounded probabili probability ty). ). Thus, Thus, a random randomize ized d errorless search scheme can be used as a randomized decision algorithm: If we get a witness, then we know that x L. If we get neither a witness nor , then we claim that x / L. If the answer is , then we do not know. By amplifying probabilities, we can make sure that the probability of outputting x / L though there exists a witness for x is small.
⊥
⊥
⊥
∈
⊥
⊥
∈
∈
⊥
∈
13.3 13.3
Search Search-to -to-de -decis cision ion reduct reduction ion
Recall that = (U n )n∈N is what we call the uniform distribution on 0, 1 . Namely, U n (x) = 2−n for x = n and U n (x) = 0 otherwise. In the following, we will reduce search to decision in the average-case setting. setting. Let us first see why the usual approach from worst-case worst-case complexit complexity y P does does not work work.. Let Let L = x w : x, w V NP with being V the corresponding witness language. Then, given x and y, deciding if there exists a witness w that is lexicographically smaller than y is an NP language as well. Let W = x, y w:w y x, w V ,
U
{ }
||
{ |∃
≤
|∃
∈ }∈
≤ ∧
∈
where w y means “lexicographica “lexicographically lly smaller”. Assuming Assuming that decision decision is easy, namely P = NP, we can use binary search to find a witness w with x, w L. What about the average-case? Let wx be the lexicographically smallest witness for x. Suppose our efficient-on-average algorithm for W works well on all instances x, y except for those y that are close to wx . Then Then our our algorithm is able to find the most significant bits of wx , but it fails to find a few least significant bits of wx . Since Since most strings strings are not close to wx , our algorithm can still be efficient on average. Our goal in the remainder of this section is still to prove that search-todecision reduction is possible in the average-case setting, despite what we sketched above. To do this, let us first consider the scenario where every x L has a unique witness wx . Then Then we we can ask ask NP questions like “is the ith bit of the witness for x a 1?” 1?” Let Let p be the (polynomial) length length of witnesses. witnesses. By
∈
∈
13.3. Search-to-decision reduction
83
∈{
| |}
querying the above for i 1, 2, . . . , p( p( x ) , we can find the witness. witness. In the following, let x = n and p = p(n). Of course, we cannot assume in general that witnesses witnesses are unique. unique. But we know tools to make witnesses unique (recall the Valiant–Vazirani theorem from the lecture “Computational “Computational Complexity Complexity Theory” Theory” [VV86]). We use a family of pairwise independent hash functions h : 0, 1 p 0, 1 p : consists of all functions x Ax + b, where A 0, 1 p×p and b 0, 1 p . By restricting h to the first j bits, we obtain h|j . Also h|j h is a family of pairwise independent hash functions. Now we consider the language
||
H
→
∈H
W =
{ } →{ } H ∈{ } ∈{ } { | ∈ H}
| ∃w : x, w ∈ V ∧ wi = 1 ∧ h|j (w) = 0j . We build the quadruple x,h,i,j such that, for |x| = n, it always has length q = q (n) for some polynomial q. Further urthermore more,, we make make sure sure that that x, h, i and j are independent. independent. This means that we can equivalen equivalently tly draw x{0, 1}n , h ∈ H as well as i, j ∈ {1, . . . , p} uniformly and independently at random and pair them to x,h,i,j . In this way, we get the same distribution. This x,h,i,j
can be done since the lengths of x and h is fixed once n is known. (We can, for instance, assume that p be a power power of 2. Then Then we can write write i and j as binary string of length log p, possibly with leading 0s.) It can be shown (see again the script of the lecture “Computational Complexity Theory”) that if j is the logarithm of the number of witnesses for x, then, with at least constant, positive probability (taken over the choice of h ), there is a unique witness w for x that also satisfies h|j (w) = 0. Now we proceed as follows:
∈H
1. Draw Draw h
∈ H uniformly at random. 2. If, for some j ∈ {1, . . . , p}, the sequenc sequencee of answe answers rs to the queries queries x,h, 1, j , . . . , x,h,p,j yields a witness w for x ∈ L, then we output this this witness witness.. (Note (Note that that x, w ∈ V can be checked in polynomial time.)
3. If an answer answer to to some x,h,i,j is we output an arbitrary string.
⊥, then we also output ⊥. Otherwise,
We call this algorithm B . Apart Apart from from techni technical cal details, details, which which we will prove prove belo b elow, w, this proves proves the following following theorem. The essential essential subtlety subtlety is that h is part of the (random) input for W , where h is part of the internal randomness of B for L. This This means means that h appears in the outer probability in (13.2) of Definition 13.1 and in the inner probability of item (3) of Definition Definition 13.3.
U ⊆
Theorem 13.4 (Ben-David et al. [BDCGL92]) . If (NP, ) AvgBPP, then every problem in (NP, ) has an errorless randomized search algorithm.
U
84
13. Decision versus search
Proof. We have already done the lion’s share of the work. It remains to estimate estimate the failure probabilities. probabilities. Let L NP be arbitrary such that witnesses for L have length p for some polynomial polynomial p, and let A be a randomized errorless heuristic scheme for (W ( W , ) with
∈
U W = x,h,i,j | ∃w : x, w ∈ V ∧ wi = 1 ∧ h|j (w ) = 0j . Let x ∈ L ∩ supp(U supp(U n ) = L ∩ {0, 1}n, and let δ > 0 be arbitr arbitrary ary..
Our Our search search algorithm algorithm B proceeds as described described above. We call A with a failure probability of α to achieve a failure probability of δ for B . Furthermore, we amplify item (2) of Definition 13.1 to
Pr A(y,q,α) y,q,α) / W (y ), A
∈{
for every y = x,h,i,j and item (3) to Pr
y= x,h,i,j
∼U q
Pr A(y,q,α) y,q,α) = A
⊥} ≤ β
(13.3)
⊥ ≥ ≤ γ
α.
We will specify α, β , and γ later on. The algorithm described above, which we will call B, runs obviously in polynomial time. The failure probabilities remain to be analyzed: We have to find constants c and c with 0 < c < c < 1 such that
Pr B (x,n,δ) x,n,δ) yields a witness or B
for each x
∈ L ∩ {0, 1}n and Pr
x U n
∼
Pr B (x,n,δ) x,n,δ) = B
⊥ ≥c
⊥ ≤ > c
δ
(13.4)
(13.5)
for every n and δ > 0. To show (13.4), consider any x L. The probabi probabilit lity y that we draw draw a hash function h with the property that there exists exists a j such that x possesses a unique witness w with h|j (w) = 0 is at least 1/ 1 /8 (script of the lecture “Computational Complexity Theory”, Lemma 17.3). We call such an h good for x. Fix an arbitrar arbitrary y good h. If B If B draws this h, then a sufficient sufficient condition condition for B output a witness or is that A never never outputs a wrong answer. The probability that A outputs a wrong answer (i.e., neither correct nor ) is at most p2 β by a union bound over all i and j . Thus, the probability that
∈
⊥
1. B samples an h that is good for x and 2. A never gives a wrong answer
⊥
13.3. Search-to-decision reduction
85
is at least c = 18 (1 p2 β ). We choose β = 51p2 , which yields c = 1/10. Before specifying the parameters parameters α and γ , let us also analyze (13.5). To do this, let
· −
Z =
∈{ x
n
} | Prh
0, 1
∃
i, j : Pr A( x,h,i,j , q , α) α) = A
⊥ ≥ ≥ γ
be the set of bad strings. strings. Let us analyze the probabilit probability y Pr x∼U n (x a random x is bad. We have Pr
∃
i, j : Pr A( x,h,i,j , q , α) α) =
A
x,h
Thus,
⊥ ≥ ≥ ⊥ ≥ ≥ ·
Pr
y = x,h,i,j
Pr A(y,q,α) y,q,α) = A
≤α by (13.3)
γ
γ
φ
φ
∈ Z ) that
· x∼PrU (x ∈ Z ) . n
φ Prx∼U n (x p2
∈ Z ) .
2
∈ Z ) ≤ αpφ . We want Pr (x ∈ Z ) ≤ δ , x∼U
From this, we learn Pr x∼U n (x
(13.6)
n
αp2 φ
≤ δ. For x ∈/ Z , we have Pr B (x,n,δ) x,n,δ) = ⊥ ≤ φ + (1 − φ)p2 γ . B
thus we put the constraint
Now we choose φ = 1/40 and γ = which satisfies our constraint
αp2 φ
1 . 40p 40p2
This also specifies α to α = δφ/p2 ,
≤ δ This specification of φ and γ yields
Pr B (x,n,δ) x,n,δ) = B
⊥ ≤ 201 .
(13.7)
for x / Z . We set set c = 1/20 < c. Then Then item item (3) of Definit Definition ion 13.3 follows follows from (13.6) and (13.7).
∈
14
Hard Hardne ness ss ampl amplifi ifica cati tion on
Assume that we have a function f such that f is hard-on-average in a weak sense. sense. This means that every algorithm algorithm has a non-negligible non-negligible chance chance of making a mistake when evaluating f on a random input. But there still still might might be algorithms that get a huge portion (for instance, a 1 1/ poly(n poly(n) fraction) of the inputs inputs right. right. The goal of hardne hardness ss amplifica amplificatio tion n is the follo followin wing: g: If there is such a function f , f , then we can get a related problem g from f such that g is hard-on-average in the strongest possible sense: No algorithm can do signifi significan cantly tly better than simply simply tossin tossingg a fair coin. coin. To put it the other way way round: If, for some class of functions, we can compute every every function in this class with a non-trivial error probability (i.e., significantly less than 1/2). Then we can amplify this to get the error probability very small (i.e., 1 1/ poly(n poly(n)). (Note that this does not work by simple simple Chernoff bounds: it is the hardness of the instance that causes the algorithm to fail, not bad luck with its random bits. In fact, our algorithms in this section will always be deterministic.) Yao’s XOR lemma is a powerful tool for hardness amplification. The idea is is simple: If f is slightly hard on average, then g, given by g(x1 , . . . , xk ) = f ( f (x1 ) . . . f ( f (xk ), is very hard on random x1 , . . . , xk . The intuitive reason is as follows: follows: Although Although the probability probability that a specific x is hard, the probability that at least one of x1 , . . . , xk is hard is much much higher. Howeve However, r, intuition says that we need all f ( f (x1 ), . . . , f ( xk ) to compute g(x1 , . . . , xk ) correctly.
−
−
⊕ ⊕
Exercise 14.1. Let X 1 , . . . , Xn
∈ {0, 1}n be independent random variables
with Pr(X Pr(X i = 1) = p. Prove that
n
Pr
X i is even =
i=1
1 + (1
− 2p)n .
2
For simplicity, we restrict ourselves to (non-uniform) circuits in the section. In the next section, section, we will state hardness amplificatio amplification n results for NP languages. Definition Definition 14.1. 14.1. We say that a Boolean function f : 0, 1 n
{ } → {0, 1} is
(s, δ )-hard with respect to a distribution D if, for every circuit C of size at most s, we have Pr f ( f (x) = C (x) > δ .
∼
x D
What does this mean? For every circuit C of size at most s, there exists a set H of size 2δ 2δ 2n such that using C to compute f ( f (x) for x H is about as good as tossing a fair coin.
∈
86
14.1. Impagliazzo’s hard-core set lemma
87
We also also need need the the adv advantag antagee of a circ circui uitt has has in comp comput utin ingg a cert certai ain n function. Let f be a function, C be a circuit, and D be a distribution Definition 14.2. Let f on inputs for f and C . If 1 P rx∼D f ( f (x) = C (x) = (1 + ε) , 2
then we say that C has an advantage of ε with respect to D. (By definition, the advantage ε is a number in the interval [0, [0, 1].) 1].) We will will prove prove Yao’ Yao’ss XO XOR R lemma lemma in Section Section 14.2 14.2.. There There are severa severall differe different nt proofs of this this lemma. lemma. An elegant elegant (and quite quite intui intuitiv tive) e) proof is via Impagliaz Impagliazzo’ zo’ss hard-c hard-core ore set lemma lemma (Secti (Section on 14.1 14.1). ). This This lemma lemma is also also interesting in its own right and a somewhat surprising result. There are at least least two two differe different nt proofs of this this lemma. lemma. An elegan elegantt (and quite quite intui intuitiv tive) e) proof is via von Neumann’s Neumann’s min-max theorem. Impagliazzo Impagliazzo attributes attributes this proof to Nisan.
14.1 14.1
Impagl Impagliaz iazzo’ zo’ss hardhard-co core re set set lemm lemma a
Note the quant quantifie ifiers rs in Definit Definition ion 14.1: For all circuits circuits C , a hard set H exists exists.. Impagli Impagliazz azzo’s o’s hard-core hard-core set lemma lemma states states that we can switc switch h the quantifiers quantifiers:: There exists exists a set H such that for all C computing f on H is hard. We will prove the hard-core hard-core set lemma in two steps: First, we show show n that there is a probability distribution over 0, 1 such that f is hard with respect to this probability probability distribution distribution.. Second, Second, we show how to get a set from this distribution distribution..
{ }
{0, 1}n → {0, 1} be an (s, δ)-har -hard function function with n respect to the uniform distribution on {0, 1} , and let ε > 0. Then Then there there is n a probability distribution D on {0, 1} with the following properties: −ε , 1 − ε -hard with respect to D. 1. f is s · 8·log(εδ log(εδ)) 2 2. D(x) ≤ 1δ · 2−n for all x ∈ {0, 1}n .
Lemma Lemma 14.3. 14.3. Let f :
2
Proof overview: We have to switch quantifiers: We have “for all circuits, there exists a hard set”, and we want “there exists a hard set such that for all circuits”. circuits”. We model this by a zero-sum zero-sum game (see excursus below): below): One player’s strategies are circuits C , the other player’s strategies are sets H . The amount amount that the first player player (who plays plays C ) gets from the second player (who plays H ) is proportional to the number of inputs of H that C gets gets right. right. Then Then we can use von von Neuman Neumann’s n’s min-max min-max theorem theorem to switc switch h quantifiers.
88
14. Hardness amplification
Proof. Consider the following two-player game: Player D picks a set T of −ε2 strings from 0, 1 n . Player C picks a circuit of size s = s 8·log(εδ log(εδ)) . The payoff for C is Prx∼{0,1} (f ( f (x) = C (x)). This This is a zero-su zero-sum m game. game. and we we can apply the min-max theorem: Either the player D has a mixed strategy so that there is no mixed strategy for player C for which C achieves a payoff of at least 12 + ε. Or there is a mixed mixed strategy strategy for player player C with with which which C 1 gets a payoff of at least 2 + ε for any (mixed) strategy of player D. Consid Consider er the first first case. case. This This mea means ns the follow following ing:: There There exists exists a disdis n tribution D on sets of size δ 2 such that every circuit C of size at most s, which corresponds to the pure strategies of player C, achieves only δ 2n
{ }
·
E
≤
Pr f ( f (x) = C (x)
∼
T D x T
∼
1 +ε. 2
This is the same as the probability that f ( f (x) = C (x) if x if x is drawn according to the following distribution D: First, First, draw draw T D . Second Second,, draw x T uniformly uniformly at random. This probabilit probability y distribution distribution D is as stated in the n lemma: lemma: For each x0 0, 1 , we have Pr x∼D (x = x0 ) = PrT ∼D (x0 1 −n 1 1 − n T ) T ) δ 2 ε)-hard with respect to D. The lemma δ 2 . Thus, f is (s, 2 follows for this case since s s. Now consider the second case. There exists a probability distribution on circuits of size s such that, for every subset T 0, 1 n of cardinality δ 2n , we have 1 Pr (C (x) = f ( f (x)) +ε, x∼T,C ∼C 2
∼
·
∈{ } ≤
≤ ·
∈
−
∈ C
⊆{ }
≥
which corresponds to an average advantage of 2ε 2 ε. Let U be the set of inputs x for which the distribution on circuits achieve achievess an advantag advantagee of at most ε in computing f . f . (“Advantage” is generalized to distributions over circuits in the obvious way.)
C
| | ≤ δ(1 − ε)2n.
Claim 14.4. U
Proof of Claim 14.4. Assume to the contrary that U > δ (1 ε)2n . If U δ 2n , then U would give rise to a strategy of player D to keep the payoff to at most 12 (1 + ε), which which contradict contradictss the assumption. assumption. Otherw Otherwise ise,, consid consider er any any set T U of cardinalit cardinality y δ 2n for which which n achieves the smallest advantage. Since U > δ (1 ε)2 , we have T U < εδ2 εδ 2n . Then the advantage of on T would be smaller than
| |
| |≥
⊇
C
1 δ 2n
· |T ∩ U | · ε + |T \ U |
| |
<
−
−
| \ |
C
1 (εδ2 εδ 2n + εδ2 εδ 2n ) = 2ε . n δ2
·
This contradicts the average advantage of at least 2ε 2 ε on this set T (which was the assumption of the second case).
14.1. Impagliazzo’s hard-core set lemma
89
˜ of size s that gets f ( Now we construct a circuit C f (x) right for more than a 1 δ fraction of the inputs, which contradicts the assumption that f is (s, δ )-hard. )-hard. The idea idea is as follows: follows: On U , U , we might have only little chance to get the right right answe answer. r. Thus, Thus, we ignore inputs inputs from U . U . For inputs inputs from n 0, 1 U , U , however, we have a non-trivial chance of computing the right answer if we sample circuits according to the distribution , which is an optimal strategy for player C. Then we amplify probabilities by sampling more than one circuit and taking the majority outcome. More precis precisely ely,, let us draw draw t independen independentt random circuits circuits according according ˜ to the distrib distributi ution on . Our Our new new circ circui uitt C output outputss the majority majority output output of these t circui circuits. ts. Fix any any x / U . U . We can can bound bound the probab probabil ilit ity y that that ˜ ˜ C gets f ( f (x) wrong using Chernoff bounds: C errs only if at most t/2 t/2 of its subcircui subcircuits ts give a wrong wrong output. output. The expected expected number number of correc correctt out1 puts is at least 2 (1 + ε). Thus, Thus, Chernoff Chernoff bounds yields yields an upper b ound ound of − · − · 4 log(εδ log( εδ) ) 2 log(εδ/ log( εδ/2) 2) exp( 2(tε/ 2(tε/2) 2)2 /t) /t) = exp( tε2 /2). We set t = . This This ε2 ε2 ˜ that errs on only a εδ/2 gives a probabilistic construction of C εδ/2 fraction of ˜ explicitly. the inputs not from U . U . (Note that we do not need to construct C ˜ errs with a probability of Its existence existence suffices.) suffices.) Since U δ (1 ε)2n , C at most δ(1 ε) + εδ/2 εδ/2 δ for random x 0, 1 n . ˜ consists of t circuits of size s , its size is at most 2 ts = s. This Since C contradicts the assumption that f is (s, δ )-hard.
−
{ } \
C
C
∈
−
−
−
≤
≥
| |≤
−
∈{ }
Excursus: Excursus: Min-max Min-max theorem theorem
A zero-sum game is a game between two players such that the loss of one player is the gain of the other. other. A zero zero-s -sum um game game can can be modele modeled d by a ma matr trix ix A = m×n (ai,j )1≤i≤m,1≤j ≤n R . The game consists of one player, called the maximizer, choosing an i 1, . . . , m and the other other playe player, r, called called minimi minimizer zer,, choosi choosing ng a j 1, . . . , n . Then the minimizer minimizer has to pay pay ai,j to the maximizer. maximizer. (If ai,j < 0, then the maximizer has to pay ai,j to the minimizer.) The set 1, . . . , m is the set of pure strategies of the maximizer. The set 1, . . . , n is the set of pure strategies of the minimizer. The order in which the players choose matters, as can be seen easily from the simple game with m = n = 2 and ai,j = ( 1)i+j . However, if we allow the players to use randomized strategies (so-called mixed strate strategie gies), s), then the order order of play play does not matter. matter. This This is what the min-ma min-max x theorem by von Neumann [vN28] says. More More precisel precisely: y: A mixed strategy strategy is a probability distribution over the pure n strategies strategies of a player player.. In our case, it is simply a vector p [0, [0, 1]n with j =1 pj = 1 m for the minimizer and a vector q [0, [0, 1]m with i=1 qi = 1 for the maximizer. The outcome of the game is then qT Ap. Ap. Then the min-max theorem says
∈{
∈ ∈{
}
}
−
{
{
}
}
−
∈
min
max
[0, 1]n q [0, [0, 1]m pn [0, m j =1 pj = 1 i=1 qi = 1
∈
∈
qT Ap =
max
∈
min
q [0, [0, 1]m p [0, [0, 1]n m n i=1 qi = 1 j =1 pj = 1
∈
∈
q T Ap .
The number minp maxq qT Ap = maxq minp qT Ap is called the value of the game.
90
14. Hardness amplification
The goal of the next lemma is to get a hard-core set from the hard-core distribution just constructed. Let D : 0, 1 n Lemma 14.5. Let D
{ } → [0, [0, 1] be a probability distribution such that n ∈ {0, 1} . Let f : {0, 1}n → {01 , 1} be a function such D(x) ≤ n εδ )2 that f respect to D for 2n < s < 16n − 16n 2 (εδ) Then there exists a set H ⊆ {0, 1}n such that f is (s, 12 − ε)-hard with 1 n for all x δ2 is (s, 12 2ε ) with
−
respect to the uniform distribution on H .
Proof Proof overview: We use the probabilistic method: We draw a set according to the hard-core distribution. Then we take a union bound over all possible circuits to bound the probability that there exists a circuit that achieves a signifi significan cantt advan advantag tagee on this this (random (random)) set. set. Since Since this this probabi probabilit lity y will will be bounded away from 1, there exists a set such that no circuit achieves a significant advantage on this set. This will be our hard-core set. Proof. The construction of our set H will will again be probabi probabilis listic tic.. Let H be the random set obtained by placing x into H with a probability of δ 2n D(x). The expecte expected d number number of elemen elements ts in H is δ2n . With With non-z non-zer eroo probability, this set H will have the desired property. The number of circuits of size s is upper-bounded by
2(2n 2(2n + s)
2s
1 < exp 4
2ns
≤2
2n ε2 δ 2 8
.
Let C be any circuit of size s. Let
{ ∈ | } ≤ ·
AC (H ) =
x
H f ( f (x) = C (x)
be the number of strings x that C gets right. We have E AC (H )
H
1 ε + 2 2
δ 2n
by the assumption that Prx∼D (C (x) = f ( f (x)) 12 + 2ε for every C . The random variable AC (H ) consists of 2 n independent indicator random variables, one for each string x. This brings Chernoff bounds into play:
≤
Pr AC (H ) H
≥ · ≤ − − 1 3ε + 2 4
δ 2n
Pr AC (H ) H
< exp
= exp
2
εδ E AC (H ) + 2n 4
≥
εδ n 2 42 2n
ε2 δ 2 2n 8
.
14.1. Impagliazzo’s hard-core set lemma
91
Furthermore, urthermore, also by Chernoff Chernoff bounds, H is unlikely to be small: Pr H
| |
n
H < δ 2
· − − ε 4
1
ε2 δ 2 2n 8
< exp
.
Now we take a union bound over all circuits of size s: The probability that, for a random set H , there exists a circuit C with AC (H ) ( 12 + 34ε ) δ 2n or H < δ 2n (1 4ε ) is bounded by
| |
≥
· −
Pr H
∃
≥ · ∨| | · · −
C : AC (H )
1 exp 4
≤ ·
ε2 δ 2 2n 8
1 3ε + 2 4
2 exp
δ 2n
ε2 δ 2 2n 8
H < δ 2n =
1 . 2
·
· − 1
ε 4
We can conclude that a set H with the desired properties exists with the following properties:
• |H | ≥ δ2n · (1 − 4ε ). • There is no circuit C of size s that gets more than ( 12 + 34ε ) · δ2n strings from x right.
If H = δ 2n , then we we are done. done. If H < δ 2n , we add δ 2n H arbitrary elements to H and call the new set again H . No circu circuit it gets gets more than than 1 n n ( 2 + ε) δ 2 strings of H right right.. If H > δ 2 , then we remove H δ 2n elements elements from H and again call the new set H . Sinc Sincee we only only remo remove ve elements, no circuits gets more strings of the new set right that it got for the old set. Thus, this set also meets our requirements.
| |
| | | |
·
−| |
| |−
From the two lemmas above, the main result of this section follows easily. Theorem 14.6 (Impagliazzo [Imp95]). Let f : 0, 1 n
{ } → {0, 1} be a func-
tion that is (s, δ )-har -hard with with respe espect to the unifor uniform m distrib distributio ution. n. Then, Then, for n every ε > 0, there exists a set H 0, 1 (called hard-core set for f ) f ) of − ε2 1 n cardinality at least δ 2 such that f is s 64 log(εδ ε -hard with respect log(εδ)) , 2 to the uniform distribution over H .
⊆{ } ·
−
Proof. By Lem Lemma ma 14. 14.3, 3, there there exists exists a distri distribut bution ion D such that f is − ε2 1 ε s 64·log(εδ 2 -hard with respect to D and D ’s density is bounded by log(εδ)) , 2 1 −n . (Note that we use ε/2 ε/2 instead of ε, which yields the worse constant.) δ2 Now Lemma 14.5 shows the existence of a set H of cardinality δ 2n such that −ε2 , 1 ε -hard with respect to the uniform distribution on H . f is s 64·log(εδ log(εδ)) 2
·
−
·
−
Exercise 14.2. It is, in fact, possible to show an even stronger statement:
Assume Assume that f that f : 0, 1 n 0, 1 is (s, δ)-hard, and let η > 0 be an arbitrary constant onstant.. Then there there exists exists a set H 0, 1 n of cardinality at least (2
{ } →{ }
⊆{ }
−
92
14. Hardness amplification
η )δ2n such that f is (s poly(ε,δ,η poly(ε,δ,η)), 12 ε)-hard with respect to the uniform distribution. Prove this! Hint: Modify Lemma 14.3, then the rest follows.
·
−
This is close to optimal: Assume that there is a circuit C that errs only with a probability of δ, and consider any set H of cardinality significantly larger than 2δ 2δ 2n . Then Then the probabili probability ty that that C errs on a random input is significantly smaller than 1/ 1 /2.
14.2 14.2
Yao’s ao’s XOR lemm lemma a
The XOR lemma is attributed to Yao [Yao82]. The version that we present here is due to Impagliazzo [Imp95]. Let f : 0, 1 n Theorem 14.7 (Yao’s XOR lemma). Let f with respec respectt to the uniform distribution. distribution. 0, 1 be given by
{ }
{ } → {0, 1} be (s, δ)-hard Let k ≥ 1, and let g : {0, 1}kn →
g (x1 , . . . , xk ) = f ( f (x1 )
f (x2 ) ⊕ . . . ⊕ f ( f (xk ) . ⊕ f ( −ε 1 Then, for every ε > 0, the function g is s · 100log(εδ (1 − δ )k -hard 100log(εδ)) , 2 − ε − with respect to the uniform distribution.
2
Proof overview: Let H be a hard-core set of f as in Theorem Theorem 14.6. The probability probability that one specific xi is in H is δ . We ignore ε for the moment. If xi H , then computing g (x1 , . . . , xk ) correctly stay about the same if we replace f ( f (xi ) by a random bit b and compute
∈
f ( f (x1 )
⊕ . . . ⊕ f ( f (xi−1 ) ⊕ b ⊕ f ( f (xi+1 ) ⊕ . . . ⊕ f ( f (xk ) .
A random bit xor-ed with something is still a random bit. Thus, we get the right answer in this case only with probability 1 /2. Our only hope is that none of x1 , . . . , xk is in H . This This happens happens with with a k probability of (1 δ ) . Thus, we compute g correctly only with a probability of 12 + (1 δ )k .
−
−
The problem with this proof idea is that a circuit for computing g does not necessarily proceeds by first computing f ( f (x1 ), . . . , f ( xk ). It is allow allowed ed to do anyth anything ing.. Nevert Neverthel heless ess this this idea idea can be turned turned into a proof of the XOR lemma. Proof. Let H be a hard-core set for f of cardinality at least δ 2n as in Theorem Theorem 14.6. Assume Assume to the contrary contrary that there exists a circuit circuit C of size − ε2 s = s 100log(εδ 100log(εδ)) such that
·
Pr
x1 ,...,xk
C (x1 , . . . , xk ) = g(x1 , . . . , xk ) >
1 + (1 2
− δ )k + ε .
(14.1)
14.2. Yao’s XOR lemma
93
Let D be the uniform distribution over (x ( x1 , . . . , xk ) with xi 0, 1 n and conditioned on at least one xi being in the hard-core set H . This yields
∈{ }
Pr
∼
(x1 ,...,xk ) D
≥ x ,...,x Pr 1
>
k
C (x1 , . . . , xk ) = g(x1 , . . . , xk )
(14.2)
− ∃
C (x1, . . . , xk ) = g(x1 , . . . , xk )
1 +ε. 2
Pr
x1 ,...,xk
i : xi
∈ H
Let us take a different view on the distribution D: Firs First, t, we pick pick a nonnonempty set T 1, . . . , k with an appropriate distribution. distribution. Then, we choose 0, 1 n H for i / T . xi H for i T uniformly at random and xi T . Let the latter distribution be DT . Thus, we can rewrite (14.2) as
⊆{ ∈
∈
E
}
Pr
T (x1 ,...,xk ) DT
∼
∈{ } \
C (x1 , . . . , xk ) = g(x1 , . . . , xk ) >
∈
1 +ε. 2
Fix a set T that maximizes the inner probability. Without loss of generality, we assume that 1 T . T . Then we can further rewrite the probability as
∈
E
Pr C (x1 , . . . , xk ) = g(x1, . . . , xk ) >
x2 ,...,xk x1 H
∼
∼
1 +ε, 2
where, where, by abusing notation, x1 H means that x1 is drawn uniformly at random from H . No Now w let let aj for j > 1 be the assignment for xj that maximizes the above probability. This yields
Pr C (x1 , a2 , . . . , ak )
x1 H
∼
⊕ f ( f (a2 ) ⊕ . . . ⊕ f ( f (ak ) = f ( f (x1 )
>
1 +ε, 2
where we have rearranged terms to isolate f ( f (x1 ). To get get a circ circui uitt for for f from C , we replace x2 , . . . , xk by the constants a2 , . . . , ak . Then we observe that f ( f (a2 ) . . . f ( f (ak ) is a constant. Thus, we possibly have to negate the output of C . This This increase increasess the size size by at most most 1. Thus, Thus, we have have a circui circuitt − ε2 C of size s + 1 s 64log(εδ success probability probability greater than 64log(εδ)) for f that has success 1 f . This would contradict Theorem 14.6. 2 + ε on the hard-core set H for f .
⊕ ⊕ ≤ ·
Using Exercise 14.2, we can even improve the hardness almost to 12 ε (1 2δ )k . Analogously to Exercise 13.3, we can generalize Heurδ P to non-uniform non-uniform circuits circuits of polynomial polynomial size in a straight-f straight-forwa orward rd way way. In this way way, we obtain Heurδ P/poly.
− −
−
class of Boole Boolean an functio functions ns with the followCorollary Corollary 14.8. 14.8. Let C be a class ing prop propert erty: y: If f = (f n )n∈N C, then also g C with g (x1 , . . . , xk ) =
∈
k f (xi ), i=1 f (
∈
| |
where k can be a function of xi . Suppose there exists a family of functions f = (f n )n∈N C such that f / Heur1/p( /p(n) P/poly . Then, for every constant c > 0, there exists a family g of functions such that g C and g / Heur 1 −n c P/poly.
∈
∈
∈
2
∈
−
94
14. Hardness amplification
Exercise 14.3. Prove Corollary 14.8.
We can phrase the XOR lemma and Corollary 14.8 also the other way round: round: Assume Assume that a class class of functions is closed under , and suppose that that every function in can be computed with a success probability of at least 12 + ε for some not too small ε > 0. (Say, ε = 1/ poly(n poly(n).) Then we can reduce the error probability to 1/ 1 / poly(n poly(n). (If (If this this were were not the case, case, 1 then we would be able to amplify the hardness of 1 / poly(n poly(n) to 2 ε – a contrad contradict iction ion.) .) So if we are able to compute compute a functi function on with a non-tr non-trivi ivial al advantage, then we can bring the advantage close to 1. This is closely related to boosting , which which is a concept in computational computational learning learning theory. theory. Klivans Klivans and Servedio [KS03] explain the connections between boosting and hard-core sets.
C
C
⊗
−
15
Ampli mplific ficat atio ion n withi ithin n NP
Our goal in this section is to show a statement of the form “if there is a language in NP that is mildly hard on average, then there is a language in NP that is very hard on average.” Unfortunately, the XOR lemma does not yield such a result: If L NP, then it is unclear if computing L(x) L(y ) is also in NP. For instance, if L is co-NP-complete, then L(x) L(y) can only be computed in NP if NP = co-NP. We circumvent this problem by replacing parity by a monotone function g : 0, 1 k 0, 1 . If L NP, then computing g(L(x1 ), . . . , L( L(xk )) from x1 , . . . , xk is also in NP.
∈
{ } →{ }
⊕
⊕
∈
Prove the above above statement. More More precisely precisely,, prove prove the folExercise 15.1. Prove lowing stronger statement: Assume that NP = co-NP. Prove Prove that the following following two statement statementss are are k equivalent for any function g : 0, 1 0, 1 :
{ } →{ } 1. For all L ∈ NP, also {(x1 , . . . , xk ) | g(L(x1 ), . . . , L( L(xk )) = 1} ∈ NP. 2. g is monotonically monotonically incre increasing. This means means that for any y, z ∈ { 0, 1} with y ≤ z (component-wise), we have g(y ) ≤ g(z ).
15.1 15.1
The The main main idea idea
For the results of this chapter, which mainly due to O’Donnell [O’D04], we need need some preparat preparation ion.. Let f : 0, 1 n 0, 1 , and let g : 0, 1 k 0, 1 . Then g f : ( 0, 1 n )k 0, 1 denote denotess the functi function on given given by (g f )( f )(x x1 , . . . , xk ) = g(f ( f (x1 ), . . . , f ( xk )). Our goal is to analyze the hardness of g f in terms of properties of g and the hardness of f . f . The The proper property ty of g that we need is the bias of g or, more precisely, the expected bias of g subject to a random restriction.
{ } ⊗ ⊗
⊗
{ }
{ } →{ } →{ }
{ } →
Definition 15.1. The bias of a Boolean function h is
∈
bias(h bias(h) = max Pr(h(x) = 0), 0), Pr(h(x) = 1) x
x
The function h is called balanced if bias(h bias(h) = 1/2.
[1/ [1/2, 1] .
In fact, not the bias of g is the parameter that plays a role, but the expected bias of g with respect to a random restriction. 95
96
15. Amplification within NP
{0, 1}m → {0, 1} is a mapping ρ : {1, 2, . . . , m} → {0, 1, }. Then hρ denotes the subfunction of h obtained by substituting each coordinate i with ρ(i) ∈ {0, 1} by ρ(i). For a δ ∈ [0, [0, 1], 1], we denote be P δm the probability space over all restricDefinition Definition 15.2. 15.2. A restriction ρ of a function h :
tions, where a restriction ρ is drawn according to the following rules:
• ρ(1), (1), . . . , ρ( ρ(m) are independent. • Pr(ρ Pr(ρ(i) = ) = δ . • Pr(ρ Pr(ρ(i) = 0) = Pr(ρ Pr(ρ(i) = 1) = 1−2 δ . If ρ ∼ P δm , then ρ is called a random restriction with parameter parameter δ . The expected expected bias of h at δ is
EBiasδ (h) =
E m bias(h bias(hρ ) .
∼
ρ P δ
bits. Comp Compute ute Exercise Exercise 15.2. 15.2. Let paritym be the parity function of m bits. EBiasδ (paritym ). Give estimates for EBiasδ (andm ), where andm is the AND of m bits. The main result result of this this chapt chapter er is the follo followin wingg theore theorem, m, which which is a general generalize ized d versi version on of the XOR lemma. lemma. It states states the hardnes hardnesss of g f in terms of the hardness of f and the expected bias of g. The techni technical cal restriction is that we require that the function f be balanced.
⊗
Let f : 0, 1 n Theorem 15.3. Let f
{ } → {0, 1} be an ( an (s, δ )-hard balanced balanced function, and let g : {0, 1} → {0, 1} be arbitrary. Then, for every η > 0, the function ε g ⊗ f is (s , 1 − EBias(2−η )δ (g) − ε)-hard, where s = Ω s · log(1/δ . log(1/δ))k k
2
We will will not give give a full full proof proof of the result result.. Ra Rath ther er,, we will will give give an intuition why it should be true. In the next sections, we will discuss which functions g are suitable to amplify hardness. Proof Proof overview: Suppose that x1, . . . , xk are draw drawn n at random random.. Our Our task is to compute (g ( g f )( f )(x x1 , . . . , xk ), where f is both balanced and δ hard. We model the hardness of f of f by computing g (y1 , . . . , yk ) with imperfect information information about y1 , . . . , yk . This means that yi = f ( f (xi), but the hardness of f obscures the true values and we see only corrupted values z1 , . . . , zk . Since f is balanced, Pr(y Pr(yi = 1) = Pr(y Pr(yi = 0) = 1/ 1/2, i.e., the values y1 , . . . , yk are drawn uniformly uniformly and independently independently at random. Since f is δ -hard, we have Pr(z Pr(zi = yi ) 1 δ . We abstract away f and just use the δ -hardness -hardness of f . f . We model this by setting drawing zi according to Pr(z Pr(zi = yi ) = 1 δ and Pr(z Pr(zi = yi ) = δ . Now we might take simply output g (z1 , . . . , zk ). Then we would compute the correct value g (y1 , . . . , yk ) with a probability of NStabδ (g ). (The (The noise noise stability, denoted by NStab, is defined below.) More sophisticated, we might
⊗
≤ −
−
15.2. Noise stability and expected bias
97
compute g(z ) for different z close to z = (z1 , . . . , zk ) and output a maximum likelihood answer. However, in the true setting involving f , f , we might not only have zi = yi , but we may also know that zi is correct. Taking this to its extreme, we get the following following scenario: scenario: With a probability probability of 1 2δ , we have zi = yi , and we know for sure that zi = yi . With a probability of 2δ 2 δ , zi is a random bit, and we know that zi is a corrupt corrupted ed bit. (But, (But, of course course,, we do not know know if zi = yi or zi = yi . Question: Why 2δ 2δ , where f is only δ -hard?) So what can we do now? We take the values zi for which we are certain that zi = yi for granted granted,, and we replace replace the corrupted corrupted values alues by . In this way, we obtain a restriction ρ. Then Then we compute compute Pra (gρ (a) = 0) and Pra (gρ (a) = 1). If the first is larger, then we output 0, otherwise, we output 1. The error probability is thus 1 bias(g bias(gρ ). To compute compute the overall probability probability that this strategy succeeds, we must must take into account that ρ is in fact a random restriction, drawn from P k2δ . Thus, the probability of outputting the correct answer is nothing else but EBias(g EBias(g ). The other way round, this looks as if g f be EBias2δ (g)-hard.
−
−
⊗
The The idea idea sketc sketche hed d abov above can can be turn turned ed into into a proof. proof. It proceed proceedss as follow follows: s: First, First, one can show that f does not only possess a hard-core set, but a balanced hard-core set H . This This is not surprisi surprising: ng: If a hard-co hard-core re set H is not balanced, then either always outputting 1 or always outputting 0 gives gives a non-trivial non-trivial advanta advantage. ge. Then we transfer the idea using arguments arguments similar to those of the proof of Theorem 14.7. Exercise Exercise 15.3. 15.3. Use Theorem 15.3 and Exercise 15.2 to derive a weaker
form form of the XOR XOR lemma lemma (The (Theorem orem 14.7), 14.7), which which holds holds only only for balanc alanced functions.
15.2 15.2
Noise Noise stabil stabilit ityy and and expec expected ted bias bias
The expected bias is closely related to another measure for Boolean functions, called noise stability stability. Lemma 15.5 states this connection connection precisely precisely.. We use noise stability since it is sometimes easier to compute, although the expected bias is the “right” parameter for Theorem 15.3. Definition 15.4. The noise stability of a Boolean function h : 0, 1 m
{ } →
{0, 1} is defined as NStabδ (h) =
x
Pr
∼{0,1}m,y∼N δ (x)
f ( f (x) = f ( f (y) ,
where x is drawn uniformly at random and y is obtained from x by flipping each bit of x independently with a probability of δ.
98
15. Amplification within NP The quantity NSensδ (h) = 1
− NStabδ (h) = x∼{0,1}P,yr ∼N (x) f ( f (x) = f ( f (y) m
δ
is called the noise sensitivity of h.
Exercise 15.4. Compute NStabδ (paritym ) and NStabδ (andm ).
Depending on the context, either noise stability or noise sensitivity will be more convenient to use or to analyze. In the following, let x = 2x 1 [0, [0, 1] for any quantity x [1/ [1/2, 1].
− ∈
Lemma 15.5. For any Boolean function h : 0, 1 m
NStabδ (h)
≤ EBias2δ (h)
Proof. We exploit the following fact.
{ } ≤
∈
→ {0, 1}, we have
NStabδ (h) .
Prove the following: For any Boole Boolean an function h and inExercise Exercise 15.5. 15.5. Prove dependen dependently tly and uniformly uniformly drawn drawn x and y , we have h(x) = h(y) with a 1 1 2 probability of 2 + 2 bias(h bias(h) . In other words, 1 1 2 + bias(h bias(h) = NStab1/2 (h) . 2 2 We take a different view on NStab δ (h): First, we draw ρ P 2mδ . We set xi = yi = ρi if ρi = . For ρi = , we draw xi , yi 0, 1 uniformly and indepen independen dently tly at random. random. Then Then x is drawn uniformly at random and yi differs from xi with a probability of δ. Furthermore, x and y are identically distributed. distributed. And given given ρ, they are drawn independent independently ly.. Let x and y be the vectors obtained by removing all positions i with ρi = . Together with Exercise 15.5, we get
∈{ }
∈
NStabδ (h) = =
E m Pr hρ (x ) = hρ (y )
ρ P 2δ x ,y
∼
E
ρ P 2m δ
∼
1 1 + bias(h bias(hρ )2 2 2
Thus, NStabδ (h) =
.
E m bias (hρ )2 .
∼
ρ P 2δ
by linearity of expectation. expectation. Since bias (hρ ) [0, [0, 1], we have bias (hρ )2 bias (hρ ), which yields the first inequality. The second inequality follows from Jensen’s inequality since squaring is a convex function:
∈
NStabδ (h)
≤
=
E m (bias (hρ )2 )
∼
ρ P 2δ
≥ ρ∼EP
m 2δ
bias (hρ )2
= EBias2δ (h) .
=
E m bias (hρ )
∼
ρ P 2δ
15.3. Recursive ma jority-of-three and tribes
99
The following lemma will be very useful to compute the noise stability and noise sensitivity of the functions g that we use to amplify hardness. Let h : 0, 1 m Lemma 15.6. Let h
{ } → {0, 1} be a balanced Boolean function, and
k
let g : 0, 1
{ } → {0, 1} be an arbitrary Boolean function. Then NSensδ (g ⊗ h) = NSensNSens (h) (g ) . δ
Proof. Let us take a closer look at NSensδ (g =
Pr
⊗ h)
∈ { }m
x1 , . . . , xk 0, 1 y1 N δ (x1 ), . . . , yk N δ (xk )
∼ ∼
g h(x1 ), . . . , h( h(xk ) = g h(y1 ), . . . , h( h(yk )
.
Let zi = h(xi ) and zi = h(yi ). Then Pr(z Pr( zi = 1) = Pr(z Pr(zi = 0) = 1/ 1/2 since h is balanced. Furthermore, urthermore, the probabilit probability y that zi = zi is just NSens δ (h) = 1 NStabδ (h). Thus,
−
NSensδ (g
⊗ h) = z ∈ {0,P1}r
k
z
∼ N NSens (h) δ
g (z ) = g (z )
= NSensNSensδ (h) (g) as claimed. claimed.
15.3 15.3
Recurs Recursive ive majo majorit rity-o y-of-t f-thre hree e and tribe tribess
The function g should be nearly balanced subject to a random restriction in order to keep EBiasδ (g) close to 1/ 1/2. Two Two funct function ionss that that turn out to be very very useful useful:: The The first first one is the “recurs “recursiv ivee majority majority of 3” functi function, on, which which we will will define define recurs recursiv ively ely:: Let k 3 M k : 0, 1 0, 1 . Them M 1 (x,y,z) x,y,z) = 1 if and only if at least two of x, y, and z are set to 1. For k 1, we have M k+1 = M 1 M k . This means that
{ } →{ }
≥
⊗
M k+1 (x1 , . . . , x3k , y1 , . . . , y3k , y1 , . . . , y3k )
= M 1 M k (x1 , . . . , x3k ), M k (y1, . . . , y3k ), M k (y1 , . . . , y3k ) . Lemma 15.7. For
(1/δ)), we have NStabδ (M ) ≤ δ −1.1 · (3 )−0.15 . ≥ log1.1(1/δ
Prove Lemma Lemma 15.7. You can also show show a slightl slightlyy weaker weaker Exercise Exercise 15.6. 15.6. Prove
≥
variant: variant: Prove Prove that there there exist constan constants ts a > 1, b 1, c > 0 such that for − − b c loga (1/δ (1/δ)), we have NStabδ (M ) δ (3 ) . Hint: Calculate NSensδ (M 1 ) explicit explicitly. ly. Then Then use Lemma emma 15.6. Make Make a case distinction whether NSensδ (M k ) is large or small for k .
≥
≤
≤
100
15. Amplification within NP
Majority-of-three is helpful to amplify a (1/ (1 / poly(n poly(n))-hard language to 1 − α become somewhat hard, namely to ( 2 n )-hard for some small constant α > 0. 2+η )-hardness, we use our second function, To amplify further to ( 12 n−1/2+η which is called “tribes”. Tribes does particularly well if the function whose hardness we want to amplify is already somewhat hard. Let w N, and let n = n(w ) N be the smallest multiple multiple of w such that − w n/w (1 2 ) 1/2. Then the tribes function T n of n of n variables is defined as
−
−
−
∈
∈
≤
−
n/w 1 w
T n (x1 , . . . , xn ) =
i=0
xiw+ iw+j .
j =1
If we write w as a function of n, we get w = log n log log ln n + o(1). To estimate the expected bias of tribes is technically more challenging than it was for majority-of-three, and we will omit a proof here.
−
Lemma 15.8. For every constant η > 0, there is a constant r > 0 such that 2+η . ≤ 12 + n−1/2+η Boolean function. The influExercise 15.7. Let f : {0, 1}n → {0, 1} be a Boolean
EBias1−r (T n )
ence of the j -th variable is defined as I j =
x
Pr
∼{0,1}n
f ( f (x) = f ( f (x(j ) ) ,
where x(j ) is obtained obtained from from x by flipping the j -th entry of x. The The tota total l n influence of f is I = j =1 I j . Compute (or give as good as possible estimates for) I (and (andm ), I (parity (paritym ), I (majority (majoritym ), I (M k ), I (T k ). ( majority majority m (x1 , . . . , xm ) is 1 if and only if at least m/2 m/2 of the xi are 1.)
Exercise 15.8. Show that
NSensδ (f ) f )
15.4 15.4
≤ δ · I (f ) f ) .
Hard Hardne ness ss with within in NP
Recursive majority-of-three turns out to be very useful to amplify δ -hardness for relatively small values of δ . More precisely, if f is 1/ poly(n poly(n)-hard, then recursive recursive majority-of-three majority-of-three we can amplify amplify its hardness to 12 n−α for some small constant α. Then Then tribes tribes comes into into play, play, which which can, can, if δ is not too 1 − 1 / 2+ε 2+ ε small, bring the hardness to 2 n for arbitrarily small ε > 0. Using Lemma 15.7 and Theorem 15.3, we can show that majority-ofthree can amplify hardness close to 1/ 1 /2.
−
−
15.4. Hardness within within NP
101
infinitely often Lemma 15.9. If there is a family of functions in NP which is infinitely balanced and (poly(n (poly(n), 1/ poly(n poly(n))-ha ))-harrd. (This (This means means that this function function is 1/ poly(n poly(n)-har -hard for circuit circuitss of polynomi olynomial al size.) size.) Then Then there there is a family family of − 0 functions (hm ) in NP that is infinitely often balanced and (1/ (1/2+ 2+m m .07 )-hard for circuits of polynomial size. Proof. Suppose a family f = (f n ) of functions is infinitely often balanced and 1/n 1/nc -hard for polynomial-size polynomial-size circuits. circuits. Choose k = nC for some sufficiently large constant C , and set = log3 (k ) = C log C log3 n for some sufficiently large constant C . +1 . Let hm = M f n . The functi function on hm has input length m = nk = nC +1 The family h = (hm )m∈N is in NP since f is in NP and M is monotone and in P. Moreover, hm is balanced whenever f n is balanced. balanced. − 0 . 07 We have to show that hm is (1/ (1/2 m )-hard whenever f n is hard and balanced. We apply Theorem 15.3 with η = 1, ε = n−C , and δ = n−c . Lemmas 15.5 (for converting noise stability to expected bias) and 15.7 (for the noise stability of M ) yield
⊗
−
EBiasδ (M )
≤
1 1 + 2 2
δ 2
−0.55
075 3−0.075 .
This assumes that is sufficiently large, which can be ensured by choosing C large. Now we observe that 1 2
δ 2
−0.55
075 ≤ (2n 3−0.075 (2nc )0.55
nC 3
−0.075
+ n−C
074C for large enough (but still constant) C . Finally, n−0.074C +1 . enough C since m = nC +1
074C ≤ n−0.074C
≤ m−0.07 for large
Using the tribes function, we can further improve the hardness. family f = (f n )n∈N of functions in Theorem 15.10. Suppose that there is a family f (poly(n), 1/ poly(n poly(n))-har ))-hard. d. (This NP which is infinitely often balanced and (poly(n means that f n is 1/ poly(n poly(n)-hard -hard for circuits circuits of polynomial polynomial size.) Then there 2+ε )is a family of functions in NP which is infinitely infinitely often (poly( often (poly(n n), 12 n−1/2+ε hard for any ε > 0.
−
Exercise 15.9. Prove Theorem 15.10 using the tools Lemma 15.8, Lemma 15.9,
and Theorem 15.3. At the expense of a small loss in the final hardness, we get even get rid of the requirement that the initial function is balanced. Theorem 15.11. Suppose that there is a family of functions in NP which
is infinitely often (poly(n (poly(n), 1/ poly(n poly(n))-hard. ))-hard. Then there is a family of func3+ε )-hard. tions in NP which is infinitely often (poly(n (poly(n), 12 n−1/3+ε
−
102
15. Amplification within NP
We can rephrase Theorem 15.11 in “boosting form”, which sounds more positive. Theorem Theorem 15.11’. 15.11’. Suppose that (L, )
U ∈ Heur
1 2
−n
−
0.33
P/poly for all L
∈
U ∈ Heur1/pP/poly for every polynomial p and every language
Then (L, ) NP. Then ( L NP.
∈
Exercise Exercise 15.10. 15.10. If we allow arbitrary Boolean functions (which are not required to be from, say, NP or PSPACE), we can even find a function that
is exponentially close to 1/2-hard for circuits of exponential size. Prove Prove the following: There There exists a universal universal constant constant γ γ 1/8 such that, for all sufficiently large n N, there exists a function h : 0, 1 n 0, 1 1 − γn γn which is (2 , 2 2 )-hard. This is almost as hard as possible: No function is harder than 1/2-hard even even for very very small small circui circuits. ts. And And just just by hard-w hard-wiri iring ng the corre orrect functio function n value for one input and outputting either 0 or 1 on all other inputs (depending on whether more function values are 0 or 1), we can bring the hardness down to 12 2−n .
−
−
∈
≥ { } →{ }
16
RL
and undirected connectivity
The problem problem
{
|
CONN = (G,s,t) G,s,t) G is a directed graph with a path from s to t
}
is NL-complete. What about its undirected counter part
{
|
}
UCONN = (G,s,t) G,s,t) G is a undirected graph with a path from s to t ?
It is of course in NL, but the NL-hardness proof for CONN does not work for undirected G, since the configuration graph of a nondeterministic Turing machine is a directed graph. In this chapter, we will show that UCONN can be decided in randomized logarithmic logarithmic space. We define RSpace(s(n)) to be the class of all languages that can be decided by an s(n)-space bounded probabilistic Turing machine with one-sided one-sided error. The Turing Turing machine machine has a separate separate input tape and the space used on the random tape is not counted. In the same way, way, we define BPSpace(s(n)), the only difference is that we allow two-sided errors. Definition 16.1.
1. RL = RSpace(O(log n)). )).
2. BPL = BPSpace(O(log n)). )). Both RL and BPL allow probability amplification. Obviously RL
⊆ NL.
For randomized computations with small space, it is important that the randomness is created “on the fly”, that is, that the random tape is oneway. For instance, one can show that BP- L = BPP (the BP-operator applied to the class L) but BPL is not likely to be BPP. Theorem 16.2. UCONN
∈ RL ∈
The algorithm showing that UCONN RL is very simple simple.. We perform perform a random walk starting in s. If we reach the node t, we accept. If we do not reach t after a polynomial number of steps, we reject. ), nodes s, t Input: undirected graph G = (V, E ), 1. Let v := s. 2. For i := 1 to poly(n poly(n) do 103
∈ V . V .
104
16. RL and undirected connectivity (a) Replace Replace v by a random neighbour of v . (b) If v If v = t, then accept.
3. Reject Reject.. The algorithm is obviously logarithmic space bounded, since we only have to store one node and a counter that counts to a polynomially large value. It is clear that if there is no path betw b etween een s and t, then the algorithm is alway alwayss right. right. The hard part part is to show that that if there there is a path, path, then it is also right with constant constant probability probability. Along the proof, we will also give an explicit bound for the poly(n poly( n) term. Let G = (V, E ) be a d-regular graph and A be its adjacency matrix. Recall that A˜ = d1 A is the normali normalized zed adjacenc adjacency y matrix matrix.. It is a doubly doubly t ˜ stochastic stochastic matrix. matrix. If p is a probability distribution on V , V , then A p is the probability probability distribution distribution that we get when drawing a starting starting vertex vertex according to p and then performing a random walk of length t. As a first step, we will show that A˜t p converges to the uniform distribution 1˜ on V . V . We will will need the following relation between 1-norm and 2-norm:
·
x2 ≤ x1 ≤ √n · x2 for all x ∈ Rn. Whenever we write just x, we mean the 2-norm in the following. Let G = (V, E ) be a d-regular -regular connected connected graph with adjacency Lemma 16.3. Let G matrix A and let p be a probability distribution on V . V . Then A˜t p
− ≤ ˜1
2
λ(G) d
t
for all t
∈ N.
˜ Proof. Let λ = λ(G)/d. /d. By the definition of λ, we have Ax λ x if ˜ If x ˜ then Ax 1, ˜ since x belongs to the direct sum of the eigenspaces x 1. If x 1, of the eigenvalues γ = 1. Therefore, by induction we get:
⊥
≤
⊥
⊥ 1. A˜t x ≤ λt x if x⊥1˜ and 2. A˜t ˜1 = 1.
˜ q 1˜ means that q, ˜1 = 0, that is, We decompose p = α1˜ + q with q 1. n probability distribution. distribution. i=1 qi = 0. This means that α = 1, since p is a probability Thus A˜t p = A˜t (1˜ + q ) = 1 + A˜t q. We have p 2 = ˜1 2 + q 2 , sinc sincee 1˜ and q are orthogon orthogonal. al. Theref Therefore, ore, q p . Since p is a probabilit probability y distribution distribution,, p p 1 1. Hence,
⊥
≤
⊥
≤ ≤
A˜tp − ˜1 = A˜tq ≤ λtq ≤ λt.
Next Next we will show show that that many many graphs graphs are “sligh “slight” t” expand expanders ers,, that is, λ(G)/d is bounded away from 1 by 1/ 1/ poly(n poly(n).
105 Lemma 16.4. Let G be a connected d-regular graph with self-loops at each
node. Then
λ(G) d
≤ 1 − 8dn1 3 .
˜ . We have Proof. Let x 1˜ with x = 1 and let y = Ax Ax.
⊥
1 − y 2 = x2 − y 2 = x2 − 2y2 + y2 ˜ + y2 = x2 − 2Ax,y n n n n 2 ˜ = Ai,j xj − 2
i=1 j =1 n n
=
n
A˜i,j xj yi +
i=1 j =1
A˜i,j (xj
i=1 j =1
n
A˜i,j yj2
i=1 j =1
− vi)2.
We now claim that there are indices i and j such that A˜i,j (xj
− yi)2 ≥ 4dn1 3 .
Since the sum above only contains nonnegative terms, this will also be a lower bound for 1 y 2 . We sort the nodes (indices) such that x1 x2 xn . Sinc Sincee ni=1 xi = 0, we have x1 0 xn . Be Beca caus usee x 2 = 1, x1 1/ n or xn 1/ n. Thus
− √
·· · ≥ √ ≥
≥ ≥
≥ ≥
≤−
x1
− xn ≥ √1n .
1 Thus there is an i0 such that xi0 xi0 +1 . Set U = 1, . . . , i0 and n1.5 ¯ U = i0 + 1, . . . , n . Sinc Sincee G is connected, there is and edge j, i with ¯ j U and i U . U . Then
∈
{
∈
−
}
|xj − yi| ≥
≥
| − | xj
xi
{
} { }
−|xi − yi|.
≥xi0 −xi0+1≥1/n1.5 1 1 1 If xi yi , then xj yi and A˜i,j (xj yi )2 4dn 3 , because 2n1.5 2n1.5 1 ˜ Ai,j 1/d since there is an edge j, i . If xi yi , then A˜i,i (xi 2n1.5 1 ˜ yi )2 4dn 1/d since the graph has all self loops. 3 , because Ai,i Thus 1 y 2 1 4dn3 and 1 y 1 . 8dn3
| − |≤ ≥ ≥
| − |≥ { } ≥
≤ − ≤ −
− | − |≥
≥
−
106
16. RL and undirected connectivity
˜ with x = 1 and x 1, ˜ this is also an upper Since this holds for all y = Ax bound for λ(G)/d. /d.
⊥
Assume G = (V, E ) is a connected d-regular graph such that every node has a self loop. Let p be any probability distribution on V . V . By Lemmas 16.3 and 16.4, t 1 1 t t/(8dn dn3 ) ˜ ˜ Ap 1 1 e−t/(8 3 8dn 2n1.5
for t
− ≤ − ≤
≤
≥ 12 12dn dn3 ln n + 8dn 8dn3 ln2. Thus A˜tp − ˜11 ≤ 21n
and (A˜t p)i 1/n 1/(2n (2n) = 1/(2n (2n). Thus Thus the proba probabi bili lity ty that that we hit hit any particular node i is at least 1/ 1 /(2n (2n). If we repeat this for 2 n times, the probability that we hit i is at least 1 1/e 1/2. This proves the correctness of the algorithm in the beginning of our chapt chapter. er. The input input graph G need not to be regular or have self loops at ever every y node. node. But But we can ma mak ke it regu regular lar with self self loo loops ps by attac attachi hing ng an appropriate appropriate number number of self loops to each node. The degree of the resulting resulting graph is at most n. This This does not change change the connecti connectivit vity y properti properties es of the the graph graph.. (In (In fact fact,, we even even do not not hav have to do the prep preproc roces essi sing, ng, sinc sincee if a node is hit in the new graph, it is only hit earlier in the old graph.) Then we apply the analysis above to the connected component that contains s. If t is in this component, too, then we hit it with probability at least 1/2. Note that instead of restarting restarting the random walk, we can perform one longer random walk, since the analysis does not make any assumption on the starting probability except that the mass should be in the component of s.
≥
−
−
≥
17
Expl Explic icit it cons constr truc uctio tions ns of expa expand nder erss
We call a family of (multi)graphs (G (Gn )n∈N a family of d of d-regular λ-expanders if 1. Gn has n nodes 2. Gn is d-regular 3. λ(Gn )
≤λ
for all n. Here, d and λ are constants. The family is called explicit if the function 1n
→ Gn
is polynomial time computable. It is called strongly explicit if (n,v,i) n,v,i)
→ the ith neighbour of v in Gn
is polynomia polynomiall time time computabl computable. e. Here Here the input input and output size is only O(log n), so the algorithm runs in time only poly(log n). In our our case, case, it is also possible to return the whole neighbourhood, since d is constant. Let G be a d-regular -regular graph with adjacency adjacency matrix In this chapter, chapter, it will we very convenient to work with the normalized adjacency matrices A˜ = d1 A. These matrices are also called random walk matrices, since they describe the ˜ (G) is the second transition probabilities of one step of a random walk. λ ˜ (G) = λ(G)/d. largest (absolute value of an) eigenvalue of A˜. Obviously, λ /d. We will now describe three graph transformations. One of them increases the number number of nodes. This will be used to construct construct larger expanders expanders from smalle smallerr ones. ones. The second second one will reduce reduce the degree degree.. This This is used used to keep keep the degree of our family constant. constant. An the last one reduces reduces the second largest eigenvalue. This is needed to keep λ(G) below λ.
17.1 17.1
Matr Matrix ix produc roducts ts
Let G be a d-regular graph with normalized adjacency matrix A˜. The k fold matrix product Gk of G of G is the graph given by the normalized adjacency k ˜ matrix A . This This transfor transformat mation ion is also also called called path product, product, since since there is k an edge between u and v in G if there is path of length k in G between u and v . It is obvious that the number of nodes stays the same and the degree becomes dk . 107
108
17. Explicit constructions of expanders
˜ (Gk ) = λ ˜ (G)k for all k Lemma 17.1. λ
≥ 1.
Proof. Let x be an eigenvector of A˜ associated with the eigenvalue λ ˜ (G). Then ˜ (Gk ) λk . such that λ = λ Then A˜k x = λk x (induction (induction in k ). Thus Thus λ ˜ (G) > λ. It cannot be larger, since otherwise λ λ.
≥
Matrix product G Gk
node nodess n n
degr degree ee d dk
˜ (G) λ λ λk
Given oracle access to the neighbourhoods of G, that is, we may ask queries “Give me a list of all neighbours of v !”, we can compute the neighbourhood of a node v in Gk in time O(d O(dk log n) by doing a breadth first search starting in v . From rom v, we can reach at most dk vertices and the description size of a node is O(log n).
17.2 17.2
Tenso ensorr produc roducts ts
Let G be a d-regular graph with n nodes and normalized adjacency matrix A˜ and let G be a d -regular graph with n nodes and normalized normalized adjacency matrix A˜ . The tensor product G G is the graph given by the normalized adjacency matrix A˜ A˜ . Here A˜ A˜ denotes denotes the Kronecker Kronecker product of the two matrices, which is given by
⊗ ⊗
⊗
A˜
⊗ A˜ =
a1,1 A˜ . . . .. .. . . ˜ ... an,1 n,1 A
a1,n A˜ .. . an,n A˜
,
where A = (ai,j ). The new graph has nn nodes and its degree is dd .
× m-matrix and B be a n × n-matrix with eigenvalues λ1 , . . . , λm and µ1 , . . . , µn . The eigenva eigenvalue luess of A ⊗ B are λi µj , 1 ≤ i ≤ m, 1 ≤ j ≤ n. Lemma Lemma 17.2. 17.2. Let A be a m
Proof. Let x be an eigenvector of A associated with the eigenvalue λ. and y be an eigenvector of B associated with the eigenvalue µ. Let z := x y be the vector x1 y .. . . xn y
⊗
17.3. Replacement pro duct
1 09
⊗ B associated associated with λµ: λµ: a1,1 x1 By + · · · + a1,m xm By
where x = (xi ). z is an eigenvector of A
A
⊗B ·z =
=µ
.. .
am,1 m,1 x1 By +
+ am,m xm By
··· ··· · ··· · (a1,1 x1 +
.. .
(am,1 m,1 x1 +
+ a1,m xm )y + am,m xm )y
x1 y .. .
= λµ
xm y
= λµz.
These are all eigenvalues, since one can show that if x1 , . . . , xm and y1, . . . , yn are bases, then xi yj , 1 i m, 1 j n, form a basis, too.
⊗
≤ ≤
≤ ≤ ˜ (G ⊗ G ) = max{λ ˜ (G), λ ˜ (G )}, since From the lemma, it follows that λ ˜ (G ) and λ ˜ (G) · 1 are eigenvalues of A˜ ⊗ A˜ , but the eigenvalue 1 · 1 is 1·λ ˜ (G ⊗ G ). excluded in the definition of λ Tensor product G G G
⊗ G
node nodess n n nn
degr degree ee d d dd
˜ (G) λ λ λ max λ, λ
{
}
Given oracle access to the neighbourhoods of G and G , we can compute the neighbourhood of a node v in G G in time O(d O(d2 log log max n, n ). (This assume assume that from the names of the nodes v in G and v in G we can compute in linear time a name of the node that corresponds to v v .)
⊗
{
}
⊗
17.3 17.3
Repl Replac acem emen entt product roduct
Let G be a D-regular -regular graph with n nodes and adjacency adjacency matrix A and H be a d-regular graph with D nodes and adjacency matrix B . The replacement product G H is defined as follows:
• For every node v of G, we have one copy H v of H . • For every edge {u, v} of G, there are d parallel edges between node i in H u and node j in H v where v is the ith neighbour of u of u and u is the j th neighbour of v .
110
17. Explicit constructions of expanders
We assume that the nodes of H are the number from 1 to D and that the neighbours of each node of G are ordered. Such Such an ordering ordering can for instance instance be induced by an ordering of the nodes of G. We can think of G H of having an inner and an outer structure. The inner structures are the copies of H and the outer structure is given by G. For every edge of G, we put d parallel parallel edges into G H . This ensures that when we choose a random neighbour of some node v , the probability that we stay in H v is the same as the probability that we go to another H u . In other words, words, with probability probability 1/2, we perform an inner step and with probability 1/2, we perform an outer step. The normalized adjacency matrix of G H is given by 1˜ 1 A + I B, 2 2
⊗
where I is the n n-identity -identity matrix. The nD nD-matrix nD-matrix Aˆ is defined as follows: Think of the rows and columns labeled with pairs ( v, j ), v is a node of G of G and j is a node of H of H . Then there is a 1 in the position ((u, (( u, i), (v, j )) if v if v is the ith neighbour of u of u and u is the j th neighbour of v of v . Aˆ is a permutation permutation matrix. Obviously, G H has nD nodes and it is 2d 2 d-regular.
×
×
Excursus: Excursus: Induced Induced matrix norms
For a norm . on Rn , the induced matrix norm on Rn×n is defined by
Ax = max Ax. A = sup x x=0 =0
x =1
It is a norm that is subadditive and submultiplicative. By definition, it is compatible with the vector norm, that is,
Ax ≤ A · x. It is the “smallest” norm that is compatible with the given vector norm. For the Euclidian norm . 2 on Rn , then induced norm is the so-called spectral norm , the square root of the largest of the absolute values of the eigenvalues of AH A. If A is symmetric, then this is just the largest of the absolute values of the eigenvalues of A. In particular,
λ(G)
≤ A . If A is symmetric and doubly stochastic, then A ≤ 1. 2
2
˜ (G) Lemma Lemma 17.3. 17.3. If λ δ 2 /24 24..
≤ 1 − and λ˜(H ) ≤ 1 − δ, then λ˜(G H ) ≤ 1 −
17.3. Replacement pro duct
1 11
¯ (G H )3 Proof. By Bernoulli’s inequality, inequality, it is sufficient to show that λ ¯ (G H )3 = λ ¯ ((G 1 δ 2 /8. Since λ ((G H )3 ), we analyze analyze the threefold threefold matrix power of G H . Its normalized adjacency matrix is given by
≤
−
1ˆ 1 A + I 2 2
⊗
˜ B
3
.
(17.1)
˜ are doubly stochastic, so their spectral norm is bounded by 1. Aˆ and I B Since the spectral norm is submultiplicative, we can expand (17.1) into
⊗
1 = sum of seven matrices of spectral norm 8 7 1 ˜ )Aˆ(I B ˜) = M + (I B 8 8
≤ 1 + (I ( I ⊗
˜ )Aˆ(I B
⊗ ⊗
⊗
˜) B
=:( )
∗
˜ = (1 with M 1. By Exerci Exercise se 17.1, 17.1, we can write write B C 1. Thus
≤ − δ)C + δJ with ≤ (∗) = (I ⊗ (1 − δ )C + C + I ⊗ δJ ) δJ )Aˆ(I ⊗ (1 − δ )C + C + I ⊗ δJ ) δJ ) = (1 − δ 2 )M + δ 2 (I ⊗ J )Aˆ(I ⊗ J ) with M ≤ 1. A direct calculation shows that (I ⊗ J )Aˆ(I ⊗ J ) = A ⊗ J where the entries of J are all equal to 1/D 1/D2 . Thus Thus,, the the seco second nd larges largestt eigenvalue of λ((I ((I
⊗ J )Aˆ(I ⊗ J )))) = λ(A ⊗ J ) ≤ λ(A˜).
Hence,
1ˆ 1 A + I 2 2
⊗
˜ B
3
= (1
−
δ2 δ2 )M + (A 8 8
⊗ J )
with M
≤ 1 and λ
because λ(M )
1ˆ 1 A + I 2 2
≤ M .
⊗
˜ B
≤ 3
1
−
=1
−
δ2 δ 2 + (1 8 8 δ2 , 8
− )
˜ )Aˆ(I B ˜) The only term in the analysis that we used was the ( I B term. This corresponds to doing an “inner” step in H , then an “outer step” in G and again an “inner” step in H . The The so-ca so-call lled ed zig-zag product is a product similar to the replacement product that only allows such steps.
⊗
⊗
112
17. Explicit constructions of expanders
Exercise Exercise 17.1. 17.1. Let A be the normalized adjacency matrix of a d-regular
λ-expander. Let J = Then
1 n
... .. . ...
.. .
1 n
A = (1
1 n
.. .
1 n
.
− λ)J + λC
≤ 1.
for some matrix C with C
Replacement product G H GH
nodes nodes n D nD
degr degree ee D d 2d
˜ (G) λ 1 1 δ 1 δ 2 /24
−
− −
Given oracle access to the neighbourhoods of D and H , we can compute the neighbourhood of a node v in G G in time O((D O((D + d)log n). (Thi (Thiss assume that the oracle gives us the neighbourhoods in the same order than the one used when building the replacement product.)
⊗
17.4 17.4
Explic Explicit it constr construct uction ion
We first construct a family of expanders (G ( Gm ) such that Gm has cm nodes. In a second step (Exercise!), we will show that we can get expanders from Gm of all sizes between cm−1 + 1 and cm . The consta constant ntss occurring occurring in the proof are fairly arbitrary, they are just chosen in such a way that the proof works. We have taken them from the book by Arora and Barak. For the start, we need the following constant size expanders. Since they have constant size, we do not need a constructive proof, since we can simply enumerate all graphs of the particular size and check whether they have the mentioned properties. Exercise 17.2. For large enough d, there are
1. a d-regular 0.01 01-expander -expander with (2d (2d)100 nodes. 2. a 2d-regular (1
− 501 )-expander with (2d (2d)200 nodes
We now construct the graphs Gk inductively: 1. Let H be a d-regular 0. 0.01-expander with (2d (2d)100 nodes. 1 2. Let G1 be a 2d 2 d-regular (1 50 )-expander with (2d (2d)100 nodes and G2 1 be a 2d 2d-regular (1 50 )-expander with (2d (2 d)200 nodes.
−
−
17.4. Explicit construction 3. For k
≥ 3, let
1 13
Gk := ((G ((G k
−
1
2
−
Theorem 17.4. Every Gk is a 2d-regular (1
nodes. Furthermore, the mapping (bin k, bin i, bin j )
50
⊗ G k 2 1 )
) H
100k − 501 )-expander with (2d (2d)100k
→ jth neighbour of node i in Gk
is computable in time polynomial in k . (N (Not otee that that k is logarithmic in the size of Gk !) Proof. The proof of the first part is by induction in k. Let nk denote the number of nodes of Gk . Induction base: Clear from construction construction.. Induction Induction step: The number of nodes of Gk is n k
1
−
2
100
(2d) · n k 2 1 · (2d −
The degree of G k
1
−
2
100(k−1) 100k = (2d (2d)100(k (2d (2d)100 (2d (2d)100k .
and G k
·
−
1
2
·
is 2d by the induction induction hypothesis. hypothesis.
The degree of their tensor product is (2 d)2 and of the 50th matrix power is (2d (2d)100 . Then we take take the replacement replacement product with H and get the graph Gk of degree 2d 2d. 1 Finally, the second largest eigenvalue of G k 1 G k 1 is 1 50 . 2 2 Thus, 1 50 1 1 ˜ ((G λ ((G k 1 G k 1 )50 ) (1 ) 2 2 50 e 2 1 ˜ (Gk ) 1 1 0.992 /24 1 Thus λ 2 50 . For the seco second nd part part note note that that the defin definit itio ion n of Gk gives gives a recursive recursive schem schemee to compute compute the neighbourh neighbourhood ood of a node. node. The recursi recursion on depth is log k. We have shown how to compute the neighbourhoods of G50 , G G , and G H from the neighbourhoods of the given graphs. The total size of the neighbourhood of a node in Gk is Dlog k = poly(k poly(k) for some constant D. −
−
≤ − ·
⊗
−
≤ −
⊗
−
≤ −
≤ ≤
≤ −
⊗
18
UCONN
∈
L
We modify the transition relation of k -tape nondeterministic Turing machines as follows: A transition is a tuple (p, ( p, p , t1, . . . , tk ) where p and p are states and tκ are triples of the form (αβ,d,α ( αβ,d,α β ). The interpretation is the following: if d = 1, the head of M stands on α, and β is the symbol to the right of the head, then M may go to the right and replace the two symbols by α and β . If d = 1, then the head has to be on β and M goes to the left. In both cases, the machine machine changes changes it state from p to p . An “ordinary” Turing machine can simulate such a Turing machine by always first looking at the symbols to the left and right of the current head position and storing them in its finite control. By definin definingg a transit transition ion like like above, above, every every transi transitio tion n T has a reverse − 1 transition T that undoes what T did. M is now called symmetric if for every T in the transition relation ∆, T −1 ∆.
−
∈
Definition 18.1.
SL = L there is a logarithmic space bounded symmetric
{ |
Turing machine M such that L = L(M ) M )
}
transition relation of a deterL is a subset of SL. We simply make the transition ministic Turing machine M symmetric by adding T −1 to it for every T in it. Note that the weakly connected components of the configuration graph of M are directed trees that converge into a unique accepting or rejecting configuration. figuration. We cannot reach any other accepting or rejecting rejecting configuration by making edges in the configuration graph bidirectional, so the accepted language is the same. In the same way, we can see that UCONN SL: Just always guess a neighbour of the current node until we reach the target t. The guessing step can be made reversible and the deterministic steps between the guessing steps can be made reversible, too. UCONN is also hard for SL under deterministic logarithm logarithmic ic space reduction reductions. s. The NL-hardness proof CONN works, we use the fact that the configuration graph of a symmetric Turing machine is undirected. Finally, if A SL and B log A, then B SL. Less obvious are the facts that
∈
∈
≤
• planarity testing is in SL, • bipartiteness testing is in SL, 114
∈
18.1. Connectivity in expanders
1 15
• a lot of other interesting problems are contained in SL, see the compendium by [AG00].
• SL is closed closed under complementation complementation [NTS95]. In this chapter, we will show that UCONN ∈ L.
This This immediat immediately ely also also yields space efficient algorithms for planarity or bipartiteness testing.
18.1 18.1
Connec Connectiv tivit ityy in expand expanders ers
Let c < 1 and d Lemma 18.2. Let c
∈ N. The following promise problem can be
decided by a logarithmic space bounded deterministic Turing machine:
a d-regular graph, such that every connected component is a λ-expander with λ/d c, nodes s and t. Output: accept if there is a path between s and t, otherwise reject. Input:
≤
Proof. The Turing uring mac machin hinee enume enumerate ratess all paths paths of length length O(log O(log n) starting in s. If it see seess the the node node t, it accept accepts; s; after after enume enumerati rating ng all the paths without seeing seeing t, it rejects. Since G has constant degree, we can enumerate all paths in space O(log n). Every path is described by a sequence 1, . . . , d O(log n) . Suc Such a sequ sequen ence ce δ0 , δ1 , . . . is interpreted as “Take the δ0 th neighbour of s, then the δ1 th neig neighb hbou ourr of this this node, node, . . . ”. If the machine accepts, then there certainly is a path between s and t. For the other direction note that, by Lemma 16.3, a random walk on G that starts in s converges to the uniform distribution on the connected component containing s. After After O(log O(log n) steps, every node in the same connected component of s has a positiv positivee probabi probabilit lity y of being being reach reached. ed. In particula particularr there is some path of length O(log n) to it.
{
18.2 18.2
}
Conver Convertin ting g graph graphss into into expand expanders ers
Lemma 18.3. There is a logarithmic space computable transformation that
transforms any graph G = (V, E ) into a cubic regular graph G = (V , E ) such that V V and for any pair of nodes s, t V , V , there is a path between s and t in G iff there is one in G .
⊆
∈
Proof. If a node v in G 1. has degr degree ee d > 3, then we replace v by a cycle of length d and connect every node of the cycle to one of the neighbours of v . 2. has degre degreee d
≤ 3, then we add 3 − d self loops.
116
18. UCONN
∈L
For every node v with degree > 3, we identify one of the new nodes of the cycle with v . Let the the resulti resulting ng graph graph be G . By construc construction tion,, G is cubic and if there is a path between s and t in G then there is one between in G and vice versa. With a little care, the construction can be done in logarithmic space. (Recall that the Turing machine has a separate output tape that is writeonly and oneway, so once it decided to output an edge this decision is not reversible reversible.) .) We process each node in the order given by the representa representation tion of G. For each node v, we count the number m of its neighbours. If m 3, then we just copy the edges containing v to the output tape and output the additional self loops. If m > 3, then we output the edges (v, i), (v, i + 1) , 1 i < m and (v, m), (v, 1) . Then we go through all neighbours of v . If u is the ith neighbour of v , then we determine which neighbour v of u is, say the j th, and output the edge (v, i), (u, j ) . (We only need to do this if v is processed before u because otherwise, the edge is already output.)
≤
≤
{
{
}
{
}
}
Let d be large enough such that there is a d/2-regular d/2-regular 0. 0.01-expander H 50 with d nodes. nodes. (Again, (Again, the constan constants ts are chose chosen n in such a way way that that the proof works; works; they they are fairly arbitrary arbitrary and we have have taken taken them them from the book by Arora and Barak.) We can make our cubic graph G d50-regular by adding d50 3 self loops per node. Recursively define
−
G0 := G Gk := (Gk−1 H )50 .
≥ 1,
Lemma 18.4. For all k
50k n nodes, 1. Gk has d50k
·
2. Gk is d50 -regular, ˜ (Gk ) 3. λ
k 50 3
≤ 1 − k , where k = min{ 201 , 8d1.5n }.
Proof. The proof is by induction in k . Let nk be the number of nodes of Gk . ˜ (G0 ) Induction base: G0 has n nodes and degree d50 . By Lemma Lemma 16.4, 16.4, λ 1 8d501 n3 1 0 . Induction step: The replacement product Gk H has nk d50 = nk+1 nodes. Its degree is d. Gk+1 has the same number of nodes and the degree becomes d50 . We have ˜ (Gk H ) 1 k 0.992 1 k λ 24 25
−
≤
≤ −
·
≤ − ·
≤ −
and ˜ (Gk ) λ
≤ − 1
k 25
50
≤ e−2 ≤ 1 − 2k + 2 2k2 = 1 − 2k (1 − k ). k
18.2. Converting graphs into expanders If k =
1 20 ,
117 k 50 3
˜ (Gk ) then λ
≤ 1 − 201 . If k = 8d1.5n < 201 , then ˜ (Gk ) ≤ 1 − 1.5k = 1 − k+1 . λ
If we set k = O(log n), then then Gk is a constant degree expander with 19 ¯ λ(Gk ) connectivity can be decided decided in deterministic deterministic 20 . For such graphs, connectivity logarithmic logarithmic space by Lemma Lemma 18.2. So we could first make our input graph cubic, then compute Gk for k = O(log n) and finally use the connectivity algorit algo rithm hm for expand expander er graphs. graphs. Since Since L is closed under logarithmic space computable reductions, this would show UCONN SL. But there one problem: problem: To compute Gk , we cannot compute G0, then G1 , then G2, and so on, since L is only closed under application of a constant number number of many-one-r many-one-reduct eductions. ions. Thus Thus we have have to compute Gk from G0 in one step.
≤
∈
Lemma Lemma 18.5. 18.5. The mapping G0
logarithmic space computable.
→ Gk with k = O(log n) is deterministic
Proof. Assume that G0 has nodes 1, . . . , n . Then the nodes of Gk are from 1, . . . , n 1, . . . , d50 k . The descri descripti ption on length length of a node of Gk is log n +50log d k = O(log n). We will identify 1, . . . , d50 with 1, . . . , d 50 , since an edge in Gk corresponds to a path of length 50 in Gk−1 H . Now given a node v = (i, δ1 , . . . , δk ) of Gk and j 1, . . . , d50 , we want to compute the j th neighbour of v in Gk . We inter interpre prett j as a sequence 50 (j1 , . . . , j50 ) 1, . . . , d .
{
}×{ ·
}
{
} {
}
∈{
∈{
{
}
}
}
Input: node v = (i, δ1 , . . . , δk ) of Gk , index j = (j1 , . . . , j50 ) Output: the j th neighbour of v in Gk
1. For h = 1, . . . , 50 compute the jh neighbour of the current node in Gk−1 H . So it remains to compute the neighbours in Gk−1 H . Input: node v = (i, δ1 , . . . , δk ) of Gk , index j Output: the j th neighbour of v in Gk−1 H
1. If j If j d/2, ( i, δ1 , . . . , δk−1 , δ ) where δ is the j neighbour d/2, then return (i, of δk in H . Since H is constant, this can be hard-wired. (We perform an internal step inside a copy of H .) .)
≤
2. Otherwise, Otherwise, recursivel recursively y compute the δk th neighbour of (i, (i, δ1 , . . . , δk−1 ) in Gk−1 . (We perform an external step between two copies of H .) .) Note that we can view (v, ( v, δ1 , . . . , δk ) as a stack and all the recursive calls operate on the same step. Thus we only have to store one node at a time.
118 Theorem 18.6 (Reingold [Rei08]). UCONN Corollary 18.7. L = SL.
18. UCONN
∈ SL.
∈L
19
Extractors
Extractors are a useful tool for randomness efficient error probability amplifica plificatio tion. n. To define define extractors extractors,, we first have have to be able able to measure measure the closeness of probability distributions. Definition Definition 19.1. 19.1. Let X and Y two random variables with range S . The
statistical statistical difference difference of X and Y is Diff(X, Diff(X, Y ) Y ) = maxT ⊆S Pr[X Pr[X Pr[Y Pr[Y T ] T ] . X and Y are called -close if Diff(X, Diff(X, Y ) Y ) .
∈ |
≤
|
T ] − ∈ T ]
In the same way, we can define the statistical difference of two probability distributions. We can think of T as a statistical test which tries to distinguish the distributions of X and Y . Y . The L1 -distance of X and Y is defined as
|X − Y |1 =
|
Pr[X Pr[X = s]
s S
∈
− Pr[Y Pr[Y = s]|
L1 -distance and statistical difference are related as stated below. Prove the following: Two random random variables X and Y are Exercise 19.1. Prove
| − Y |1 ≤ 2.
-close if and only if X
Statistical Statistical closeness is preserved preserved under application application of functions. functions. Exercise 19.2. Prove the following statements:
1. Let X et X and Y be random variables with range S that are -close. Let f be a function with domain S . Then f ( f (X ) and f ( f (Y ) Y ) are -close. 2. If Z is a random variable independent of X and Y , Y , then the random variables (X, Z ) and (Y, Z ) are -close. A classical measure for the amount of randomness contained in a random source X is the Shannon entropy H (X ) = Pr[X = s]logPr[X ]logPr[X = s]. s∈S Pr[X This is however however not a suitable measure in our context. context. Consider Consider for instance instance the followin followingg source source:: With With probab probabili ility ty 0. 0 .99 it returns the all-zero string. With probability probability 0.01 is returns a string in 0, 1 N chosen uniformly at random. random. The Shannon Shannon entrop entropy y of this this source is 0.01 01N N which is quite large, in particular unbounded. If we want to use this source for simulating randomized algorithms, we will take one sample from this source. But with probability 0.99, we see a string that contains no randomness at all which is not very useful useful for derandomi derandomizati zation. on. The Shannon Shannon entro entropy py measures measures “randomness “randomness on the average” average” and particularly particularly does not talk about variance. variance. It is useful when one draws many samples from a source. For our purposes, the following definition is more useful.
−
{ } ≥
119
120
19. Extractors
Definition 19.2. Let X be a random variable with range S .
1. The min-entropy of X is mins∈S
− log log Pr[ Pr[X = s].
2. If X If X has min-entropy at least k, then X then X will be called a k-source. If in addition its range is contained in 0, 1 N , then X is an (N, k )-source.
{ }
Note Note that that the min-en min-entrop tropy y of the source source above above is only log 1 /0.99 which is constant. constant. In some sense, the min-entropy min-entropy measures measures “randomness “randomness in the worst-case”. Let U d be the uniform distribution on 0, 1 d. A function Definition 19.3. Let U N
d
{ }
m
{ } ×{ } → { }
Ext : 0, 1 0, 1 0, 1 is called a (k, )-extractor if for any (N, k )-source X , Ext(X, Ext(X, U d ) is -close to uniform. Above, we call a source -close to uniform, if it and U m are -close. Our aim is to construct extractors with small d and large m. An extractor extracts the randomness of the weak source in the sense that given a sample of the weak random source and a short truly random string, it produces a string that is nearly uniformly distributed. Sometimes Sometimes it is convenien convenientt to view an extractor extractor Ext as a bipartite bipartite multiN m graph. The nodes are 0, 1 on the one and 0, 1 on the other other side. Each N d node v 0, 1 has degree 2 . It is inciden incidentt with the edges edges (v, ( v, Ext(v, Ext(v, i)) d for all i 0, 1 . A famil family y of extra extract ctors ors Ext Extm : 0, 1 N (m) 0, 1 d(m) 0, 1 m is called explicit , if the mapping (m,v,e (m,v,e)) Extm (v, e) is computable in time poly(N poly(N ((m), d(m), m). (Usuall (Usually y, N m for an extracto extractor. r. Theref Therefore, ore, we parameterize the family by the size of the image.)
∈{ } ∈{ }
19.1 19.1
{ }
{ }
{ } → ≥
×{ }
→{ }
Extra Extracto ctors rs from from expan expander derss
Let > 0. Let k Let k (n) Lemma 19.4. Let
all n. There is an explicit family ≤ n for all n t of (k, )-extractors Extn : {0, 1} × {0, 1} → { 0, 1}n with t = O(n O(n − k − n
log1/ log1 /)).
Proof. Let X be an (n, (n, k )-source, and let v be a sample drawn from X . Let Let G = (V, E ) be a d-regular 12 -expander with 2 n nodes. nodes. (We (We do not constr construct uct this graph, graph, since since it is too large. large. We just just perform perform a random random walk on it. This is possible, possible, since strongly strongly explicit explicit expanders expanders exist.) exist.) Let z be a truly random string of length 1 · n2 − k2 + log 1 + 2) = O(n O(n − k + log ). We interpret z as a random walk in G of length = n2 − k2 + log 1 + 1 and t = log d (
set
Ext(v, Ext(v, z ) = label of the node reached from v by a walk as given by z
19.2. Randomness efficient probability amplification
121
Let p be the probability distribution on V induced by X and A be the adjacency matrix of G. Let p = 1˜ + p with ˜1 p . We have
⊥ ˜ ≤ A˜ p − 1 ≤ 2− p − 1˜. A˜p − 1 ≤ A˜(p − 1) Since X is an (n, (n, k)-source, we have Pr[X Pr[ X = s] ≤ 2−k for every s in the range of X . Thus p2 ≤ 2−k . Therefore, k/2 n/2 k/2+1 p − ˜1 ≤ p + ˜1 ≤ 2−k/2 ≤ 2−k/2+1 + 2−n/2 and
n/2+k/ k/2 2−log1/ log1/−2 k/2+1 n/2−1 A˜ − ˜1 ≤ 2−n/2+ · 2−k/2+1 ≤ · 2−n/2 .
Finally, Diff(A˜ p, U n ) = 2 A˜ p
n/2 − ˜11 ≤ 2A˜p − ˜12 · 2n/2 ≤ .
The extractor constructed above is only efficient if the k is large, at least (1 )n. For small k , better constructions are known.
−
19.2
Randomness Randomness efficient efficient probabil probabilit ityy amplificat amplification ion
8)-extractors Extr : Lemma 19.5. If there is an explicit family of (k (r), 1/8)-extractors N ( r ) d ( r ) r 0, 1 0, 1 0, 1 , then for any BPP-Turing machine M that
{ }
×{ } → { }
runs in time t, uses r random bits, and has error probability 1/3, there is a BPP-machine M with L(M ) M ) = L(M ) that runs in time poly(N poly(N ((r ), 2d(r) , t), uses N ( N (r ) random bits, and has error probability bounded by 2k(r)−N (r) . Proof. M uses its N ( N (r) random bits and interprets it as a string x N ( r ) 0, 1 . Let yi = Ext(x, Ext(x, i) for all i 0, 1 d(r) . M now simulates 2 d(r) runs of M , M , each one with a different string yi as random string. M accepts if the majority of these runs lead to an accepting configuration and rejects otherwise. The The bound bound on the the runn runnin ingg time time is clear clear from from the the cons constr truc ucti tion on.. We have have to estimate estimate the error probabilit probability y. Assume Assume that a given given input u is in L(M ), M ), i.e., M accepts u with probability at least 2/ 2 /3. The The case case u / L(m) is symmet symmetric ric.. To show show the bound on the error probabili probability ty,, it is sufficie sufficient nt k ( r ) to show that less than 2 of the random strings x lead to a rejecting configu configurati ration. on. Suppose Suppose on the contrar contrary y that this is not the case. case. Let S be the set of all such x. Then the uniform distribut distribution ion X on S has min-entropy at least k (r). Thus Ext(X, Ext(X, U d(r) ) is 1/ 1/6-close to uniform. Let T 0, 1 r be the statistical test that consists of all random strings that make M accept. The probability that a string drawn uniformly at random from 0, 1 r is in T is at least 2/ 2 /3. By definitio definition, n, the probabil probabilit ity y that that the yi are in T is 2/3 1/8 > 1/2.
{ }
∈
∈{ }
∈
⊆{ } { }
≥
−
122
19. Extractors
This is a contradiction, since for each choice of x that makes M reject, more than half of the string Ext(x, Ext( x, i) lead to a rejecting configuration, i.e., are not in T . T . If we take the extractor from the previous section, we have N ( N (r ) = r. To achieve d(r) = O(log n) (and get polynomial running time), we have to set k (r ) = r log r. To get a k (r) source, we can use k (r ) random bits and fill the remaining log r bits bits with with zeroes. zeroes. The error error probabi probabilit lity y is 2 r−log r−r = 1/r. /r. So we get a polynom polynomial ial error reducti reduction on with less random bits! bits! (Note (Note that that one can always save log r random bits by trying all possibilities for them and then making making a majority majority vote. vote. But it is not clear clear that that this this reduces reduces the error probability, since the trials are not independent.) Extractors can also be used to run PTMs with a weak random source instea instead d of a prefec prefectt random random string. string. The proof of the follow following ing lemma lemma is similar to the proof of the previous one and is left as an exercise.
−
6)-extractors Extr : Lemma 19.6. If there is an explicit family of (k (r ), 1/6)-extractors N ( N ( r ) d ( r ) r 0, 1 0, 1 0, 1 then for any BPP-machine M that runs in
{ }
×{ } → { }
time t, uses r random bits, and has error probability 1/3, there is a Turing machine M with L(M ) M ) = L(M ) that runs in time poly(N poly(N ((r ), 2d(r) , t), uses one sample of an ( an (N ( N (r), k (r )+ )+(r))-sourc ))-source, e, and has error probability probability bounded bounded − ( r ) by 2 . Exercise 19.3. Prove Lemma 19.6
20
Circ Circui uits ts and and fir first st-o -ord rder er logi logicc
One can (quite easily) find AC0 circuits circuits for addition. addition. Multiplicati Multiplication on seems a little harder, but there are constant depth circuits with unbounded fan-in for multiplication, if we use not only and, or, and not gates, but also threshold gates. gates. But for a long time, time, it was was not even known known how to divide divide even in logspac logspace, e, let alone with with consta constant nt-de -depth pth circuits circuits of polynom polynomial ial size. size. This This changed The goal of this and the following section is to develop threshold circuits circuits for division. division. It will sometimes be more convenient to do this in the framework of logic. Thus, Thus, we will show the equivalence equivalence of constant-dep constant-depth th circuits circuits to firstorder logic in this section, which has been proved by Barrington and Immerman [BI90]. In the next section, we will then show that division can be performed by constant-depth circuits. This chapter is far from being a complete introduction to complexity theory in terms of logic, which is called descriptive complexity. We will only cover cover first-order logic with some extensions extensions.. For a more detailed introducintroduction, we refer to Immerman [Imm99].
20.1 20.1
Firs Firstt-o order rder logi logicc
First-order logic is logic, where the quantifiers range only over elements of the domain and not (as in second-order logic) over sets of elements. Since we want to express properties of strings over 0, 1 , we introduce a unary predicate X . For an input string string x = x0 . . . xn−1 , we have X (i) = 1 if and only if xi = 1. (We (We will will use use 1 and and true true as well well as 0 and and false false inter interch change angeabl ably y.) We will will have have constan constants ts 0, 1, and x = n and binary predicates = and on numbers 0, . . . , n . Finally Finally, we include the binary predicate BIT, where BIT(i, BIT( i, m) = 1 if and only if the ith bit of the binary expansion of m is 1. (For (For instance, instance, BIT(0, BIT(0, m) = 1 if and only if m is odd. The role of BIT will soon become clear.) Our first-order language is the set of formulas that we can build using 0, n, , =, BIT, X () ( ) as well as , , and variables x , y , z , . . . and the quantifiers and . The quantifier always range over 0, 1, . . . , n 1 , i.e., x means x 0, 1, . . . , n 1 . To mak makee the notation notation less cumbersom cumbersome, e, we add syntactic sugar like , where a b means a b. Furthermore, urthermore, we abbreviate (a ( a b) ( a b) by a = b, knowing well that this might cause confusion confusion with the “official” “official” binary predicate =. The exclusiveexclusive-or or is denoted by . We will will also also use the binary binary predic predicate ate < on numbe numbers. rs. Not
{ }
≤
≤
∃
∀ ∃ ∃ ∈{ − } → ∧ ∨ ¬ ∧¬ ⊕
{
}
∧ ∨ ¬
→
123
||
{ ¬ ∨
− }
124
20. Circuits and first-order logic
≤
¬
surprisingly, a < b if and only of a b and (a = b). Analogously Analogously,, we will use =, =, >, and . To increase confusion, we identify 0 and false as well as 1 and true, we will thus treat true and false also as numbers. A sentence is a closed formula of the first-order language that contains no free variables. (A variable x is free if there is no quantifier x or x to the left left of it that that binds binds this this variable ariable.) .) Senten Sentences ces express express properti properties es of binary binary strings strings:: A string string x specifies X and n = x . Then Then the senten sentence ce is either either true or false. false. In the former former case, case, x has the property (or is in the language specified by the sentence). In the latter, not. We denote by FO the set of all languages that can be expressed by firstorder sentences. sentences. In particular, the predicates and BIT are allowed allowed for FO. If we want to make clear that only built-in predicates P 1 , . . . , Pc are allowed, we call the corresponding class FO[P 1 , . . . , Pc ]. Thus Thus,, FO = FO[ , BIT]. Furthermore, FO[], which forbids and BIT, is a strict subclass of FO. In FO[], only =, 0, 1, and n are available.
≥
∀
∃
||
≤
≤
≤
Example 20.1. The regular language L((00 + 11) ) can be expressed by
∀i : BIT(0, BIT(0, i) → ∃j : succ(i, succ(i, j ) ∧ X (i) = X (j ) . We have to specify succ(i, succ(i, j ), which should be 1 if and only if i + 1 = j : succ(i, succ(i, j )
≡ ∀k : (k < j ) → (k ≤ i) .
The class FO suffices to perform (or describe) addition. We assume that three inputs x, y, and z (each (each an n digit number) number) are each given as separate separate unary predicates X , Y , Y , and Z , respectively. We have x + y = z if and only if X (0) (0) Y (0) Y (0) = Z (0) (0) and X (i) Y ( Y (i) C (i) = Z (i) for i 1, 2, 3, . . . , n 1 and C (n) = 0. He Here re,, C (i) denotes the ith carry bit. We can express this as
∈{
− }
⊕
⊕
⊕
⊕ Y (0) Y (0) = Z (0) (0) ∧ ∀i : i = 0 ∨ X (i) ⊕ Y ( Y (i) ⊕ C (i) = Z (i) ∧ ¬C (n) . X (0) (0)
This leaves us with two problem of how to compute C ? As a first attempt, one might try
≡
− 1) ∧ Y ( Y (i − 1) ∨ C (i − 1) ∧ (X (i − 1) ∨ Y ( Y (i − 1)) . (We can compute i − 1 usin usingg succ succ.) .) But But this this does not work work:: C is not C (i)
X (i
a pred predic icat atee that that we are allo allow wed to use. Inst Instea ead, d, it is a plac placeh ehol olde derr for for something, and we are only allowed to replace it by a first-order formula. Thus, we replace C (i) by the first-order sentence
∃j < i : X (j ) ∧ Y ( Y (j ) ∧ ∀k ∈ {j , . . . , i − 1} : X (k ) ∨ Y ( Y (k ) .
20.1. First-order logic
125
We call something like C , which looks like a predicate, will be used like a predicate, but is no predicate, a pseudo predicate in the following. In the same way, we can add numbers from 0, 1, . . . , n 1 using BIT. We call the corresponding ternary predicate +, written in the usual way x + y = z. Just to get a better feeling, assume that our input consists consists of three parts, parts, representing n/3-bit n/3-bit numbers a, b, and c. We want to test if a + b = c. First, we compute m = n/3 n/3 and m = 2n/3 n/3 = 2m, which can be done by
{
− }
∃m∃m : m + m = m ∧ m + m = n . Then we add three pseudo predicates A, B , C as follows:
≡ (i < m) ∧ X (i) , B (i) ≡ ∃j : i + m = j ∧ (i < m) m ) ∧ X (j ) , and C (i) ≡ ∃j : i + m = j ∧ (i < m) ∧ X (j ) . A(i)
Now we can use addition using the pseudo predicates.
{
first-order der sentence sentence for the language COPY = ww Exercise 20.1. Give a first-or w
∈ {0, 1} }.
|
Exercise 20.2. Give a first-order sentence for the following variant of par-
ity:
∈{ x
} | x = x1 . . . xn
0, 1
∧
log2 n x i=1
i
=1
.
(We will soon prove that FO equals AC0 (with appropriate uniformity). We already know that parity is not in AC0 (not even in non-uniform AC0 ). Inε deed, every constant depth circuit for parity has to be of size 2n for some constant ε > 0. Why is this not a contradiction?) Do the proof without the following Lemma 20.2. Using BIT, we can also perform multiplication of numbers between 0 and n 1. To do this, we need the following following result, which which is called the bitsum lemma.
−
predicate BSUM, BSUM, which is defined by BSUM(x, BSUM(x, y ) = Lemma 20.2. The binary predicate 1 if and only if y is equal to the number of 1s in the binary representation of x, is in FO. Proof overview: The idea is to keep a running-sum s1 , s2 , s3 , . . . of the first first,, second second,, third third,, . . . log log log log n bits of x. Then Then we only have have to com compare pare whether si equals si−1 plus the ith block block of log log n bits of x. This reduces the problem problem of counting counting 1s in blocks blocks of log log n bits. bits. We apply the same idea again.
126
20. Circuits and first-order logic
Now we encode an array containing the prefix sums within each block into a single variable variable.. Furthermore, urthermore, we have have a variable that represents represents an array containing s1 , s2 , . . . Proof. It is fairly easy to express a predicate Pow2 that some number m is a power of two:
≡ ∃i : BIT(i, BIT(i, m) ∧ ∀j : j = i → ¬ BIT(j, BIT(j, m) . The number x consists of at most log2 n bits. Let L be the smallest power of two larger than log2 n. The number L can be expressed as follows: ∃L : Pow2(L Pow2(L) ∧ BIT(L BIT(L − 1, n) = 1 . (Strictly speaking, we have to translate something like BIT( L − 1, n) into ∃q : q + 1 = L ∧ BIT(q, BIT(q, n). But will not do so for the sake of readability.) In the following, we assume that a variable can hold up to L bits. This is not precisely true, but we can use a fixed number c of variables to store c · log n Pow2(m Pow2(m)
bits. Addressing them is not too complicated since c is a constant and log n can be computed. Given any power of two A = 2a , we can multiply with and divide by A. We express x = Ay as
BIT(i, y ) = BIT(i BIT(i + A, x) ∧ (i < a → BIT(i, BIT(i, x) = 0 ∀i : BIT(i,
.
We have to add less than L bits. This is a number with at most log2 L bits. Let L be the smallest power of two larger than log2 L . The number L can be expressed in the same way as L. The idea is to keep keep a runningrunning-sum sum:: Using Using one existen existential tially ly quant quantifie ified d variable s (which represents L bits), we can guess (roughly) L/L numbers s1 , s2 , . . . , sL/L such that si = si−1 + ti , where ti is the number of 1s of x(i−1)L urthermore, s1 = t1 . Giv Given i, the bits of si are the 1)L +1 , . . . , xiL . Furthermore, bits s(i−1)L address them since since L is a power of two. 1)L , . . . , siL −1 . We can address Thus, Thus, for instance, instance, we can add them or compare them. We assume for the moment that the numbers ti are given. Then we can express this as
∃s : s1 = t1 ∧ ∀i : (i > 1) → si = ti + si−1 . Thus, we can express BSUM: BSUM(x, BSUM(x, y) = s : s1 = t1
∃
∧ ∀i : (i > 1) → si = ti + si−1 ∧ sL
= y.
The t1 , t2 , . . . remain to be computed. computed. We can compute ti by a running-sum, this time over over single single bits. The numbers numbers of roughly roughly at most log log log n < L bits, and there are only L partial sums. We assume that L L < L, L , which is true for sufficiently large n. Then all partial sums fit into into a single single variable. variable.
·
20.2. First-order logic with ma jority
127
We call the kth partial sums tkj . We know how to address since L is a power of two:
∃tj : BIT( BIT(00, tj1) = BIT(jL BIT(jL , x) ∧ ∀k ∈ {1, . . . , L − 1} BIT(i, BIT(i, tj1 ) = 0∧ ∀i∀k : tkj + BIT(jL BIT(jL + k, x) = tkj +1 . (We have used + with a Boolean value BIT(), but it should be clear how to interpret this.) Noting that we can deal with not-large-enough values of n of n by hard-wiring the results completes the proof. Lemma 20.3. The ternary multiplication predicate
·
only if x y = z , is first-order definable.
×, which is true if and
Proof. Multiplication is equivalent to adding log n numbers a1 , . . . , alog n of O (log n) bits, where ai = 2i xyi . Let L = Θ(log n) and L = Θ(log Θ(log log n) be as in the proof of Lemma 20.2. We add the numbers a1 , a2 , . . . as follows: First, we split ai = bi +ci such that the binary representations of bi and ci consist each of L bits of a of ai separated by L bits of 0s: bi contai contains ns the first, first, third, third, fifth, fifth, . . . block block of L bits of a of ai , ci the second, second, fourth, fourth, sixth, sixth, . . . block. block. From now on, we treat treat b1 , b2 , b3 , . . . and c1 , c2 , . . . separately. By symmetry, we only have to describe how to add the bi s. If we have the two sums of the bi s and ci ’s, then we can add them sums using a final +. We consider the bi s as written written below below each other. other. Then Then we count count the num number of 1s in each each colum column n usin usingg BSUM BSUM.. Due Due to the stru struct ctur uree of the the bi s, no carry is propagated more than L bits. This reduces the problem to adding log n numbers (L (L for each of the log n/L blocks), each of length L (becaus (becausee of possible possible carries). carries). But if fact, fact, we have have only only L/L blocks of L numbers of L bits to add. We can guess all sums in a single variable. This is then the sum of the bi s. Then Then we can verify verify that we guessed guessed correct correctly ly using BSUM and running-sums, as we did for Lemma 20.2.
≤, we can directly use + and ×. This is equivalent: equivalent: We can implement implement BIT using + and ×, and we have already seen that BIT suffices to implement + and ×. Thus Thus,, FO = FO[≤, BIT] = [+, ×]. FO[+, [+]. Exercise 20.3. Show that FO[≤] ⊆ FO[+]. Remark 20.4. Instead of BIT and
20.2 20.2
FirstFirst-or order der logic logic with with majo majorit rityy
We can extend extend first-orde first-orderr log logic ic in differe different nt ways. ways. First, First, we can add new predicates and constants, which would allow us to specify properties of more
128
20. Circuits and first-order logic
complicated complicated structures. structures. For instances, instances, to specify properties properties of graphs, graphs, it is more convenient to specify a graph G = (V, E ) on n vertices by a binary relation E with E (i, j ) = 1 if and only if i, j E .
{ }∈
Exercise 20.4. Show that using a binary input predicate E instead of the
unary predicate X does not increase the expressive power of first-order logic. Second, and more importantly for our purposes, we can introduce new quantifiers quantifiers.. We will mak makee heavy use of the threshold quantifier quantifier M. It has has the following following interpretat interpretation: ion: Mx : P ( P (x) is true if and only if P ( P (x) = 1 for more than half of the possible x, i.e., for at least (n + 1)/ 1)/2 of the n possible values for x. We denote by FOM the set of all languages that can be expressed by first-order sentences with and BIT as well as M. Another quantifier, which is only of temporary use for us, is H: Hx : P ( P (x) is true if and only of P ( P (x) is true for n/2 n/2 of the n possible values of x. “Hx : P ( P (x)” can be expressed by saying “P “ P ((x) is not true for the majority of x, but it becomes true if we add one more element x for which P ( P (x ) is true”:
≤
P (x) Hx : P (
≡ ∃x :
P (x) Mx : P (
∨ x = x ∧ ¬ Mx : P ( P (x)
.
The quantifier H is useful to express the following predicates: 1. F ( F (x, y)P ( P (x): “There “There are exactl exactly y y values of x with x P ( P (x).”
n/2 and ≤ n/2
2. S (x, y)P ( P (x): “There “There are exact exactly ly y values of x with x > n/2 n/2 and P ( P (x).” 3. y = #x : P ( P (x): “There are exactly y values of x for which P ( P (x).” We only show how to express the first expression: F ( F (x, y )P ( P (x)
≡
≤
Hx : x
∧ P ( P (x) ∨ n/2 n/2 < x ≤ n − y
n/2 n/2
.
The second expression expression follows by symmetry. symmetry. The third is addition addition of variables, which we have seen in Section 20.1. Using #, we can add not only two numbers, but a sequence of numbers: bers: Let ITADD( ITADD(X X 1 , . . . , Xn , Y ) Y ) be the predicate that is true if the sum of the numbers x1 , . . . , xn , represented by the unary predicates X 1 , . . . , Xn , is equal to y, represented by Y . Y . (ITADD (ITADD stands for iterated iterated addition. For convenience, we assume that we have a binary predicate X (i, j ) = X i (j ).) As we have seen, multiplication reduces to adding sequences of numbers. Let MULT(X,Y,Z MULT(X,Y,Z ) be true if and only if x y = z .
·
20.3. Uniformity
1 29
ITADD, and hence also MULT, MULT, can be expressed in FOM. Lemma 20.5. ITADD, Proof. The lemma lemma can be prove proved d in a simila similarr way way as Lemma 20.3. 20.3. We already have # because we are allowed to use M, which is in this setting the equivalent of BSUM. Exercise 20.5. Fill in the details of the proof of Lemma 20.5.
∀
∃
For and , quantification over pairs of variables can be replaced by two two quantifiers quantifiers.. For instance, instance, xy is equivalent to x y. For M, it is not immediately clear how to get rid of quantifiers over two or more variables. However, we can do so by using BIT.
∃
∃∃
Lemma 20.6. Mxy can be expressed using FOM. Exercise 20.6. Prove Lemma 20.6.
Hint: First express express the pred predic icate ate u, v = # x, y : P ( P (x, y) with with the meaning “there are exactly n(u 1) + v pairs x, y for which P ( P (x, y) is true.
−
Exercise 20.7. Show that PARITY
∈ FOM.
In the next chapter, we will show that division is in FOM.
20.3 20.3
Unif Unifo ormit rmityy
The circuit circuit complexit complexity y classes classes NCi and ACi all come in different flavors corresponding to different conditions on uniformity. Recall: A family C = (C n )n∈N of circuits is called polynomial-time uniform, if the mapping 1 n C n (where we identify C n and an appropriate encoding of C of C n ) can be computed in polynom p olynomial ial time. C is log-space uniform if the mapping can be computed in logarithmic space. From the two uniformity conditions, we get ptime-u ACi and ptime-u NCi as well as logspace-u ACi and logspace-u NCi . Ho Howe weve ver, r, both uniformi uniformity ty conditions have drawbacks if we want to analyze subclasses of L or NC1 . The reason is, for instance, for ptime-u AC0 , constructing the circuit can be much harder than evaluating it. Thus, in order to study subclasses of NC1 , we need a more restrictive variant of uniformity. It turns out that a good choice is DLOGTIME uniformity:
→
1. It is restricted restricted in the sense that constructing constructing the circuit is very easy. easy. 1 In particular, it allows us to distinguish subclasses of NC . 2. It yields yields circuit circuit com comple plexit xity y classe classess equal equal to FO and FOM. So it it is is robust. 3. Many ptime-u or logspace-u circuits are actually DLOGTIME uniform.
130
20. Circuits and first-order logic
Of course, in logarithmic time, we cannot construct a circuit of polynomial nomial size. size. But we are able able to answer answer questions questions about specific specific gates. gates. To make this more precise, we need the following definition. Definition Definition 20.7. 20.7. The connection language of a family C = (C n )n∈N of
circuits is the set of all tuples z = t,a,b,y , such that a and b are numbers (in binary) of gates of C n such that b is an input for a and gate a has type t. The string y is arbitrary such that the whole string z has length n.
In time logarithmic logarithmic in n (which is by construction also the input length), we are able to read the relevant parts of the instance t,a,b,n . (We (We will will see that the necessary steps can be performed in logarithmic time below.) Let us make more precise what we mean by deterministic logarithmic time: time: A log log-ti -time me Turing uring mac machin hinee has a read-o read-only nly input tape, a consta constant nt number number of work-tapes, work-tapes, and a read-write address address tape. The address address tape is used to select bits from the input. On a given given time step, the Turing Turing machine machine has access access to the input input bit specifie specified d by the conten contents ts of its address address tape. If the number on the address tape is too large, the Turing machine will get the information that this is the case. Deterministic logarithmic time Turing machines look quite limited at first glance, but they can perform some non-trivial basic tasks:
1. They can determine determine the length length of their input. 2. They can add and subtract subtract numbers of O(log n) bits. 3. They can compute compute the logarithm of numbers numbers of O (log n) bits. 4. They can decode simple pairing pairing functions. functions. This suffices to recognize the relevant parts of connection languages. Exercise 20.8. Prove that deterministic log-time Turing machines can in-
deed do what we claimed above. We denote by DLOGTIME the set of languages that can be decided in deterministic logarithmic time. Definition Definition 20.8. 20.8. A family C = (C n )n∈N of circuits is DLOGTIME uni-
form (DLOGTIME-u), if the connection language of C can be decided in DLOGTIME. In the following, we are mainly concerned with AC0 and TC0 . If nothing else is said, we assume DLOGTIME uniformity. AC0 contains all languages that can be decided by DLOGTIME-u families of circuits with unbounded fan-in, constant depth, and polynomial size. TC0 is defined similarly similarly.. The exception is that we also have threshold gates of unbounded fanin. A threshold gate of m inputs outputs a 1 if and only if at least (m + 1)/ 1) /2 of its inputs are 1. (This means, more than half of its inputs must be 1.)
20.4. Circuits vs. logic
20.4 20.4
131
Circ Circui uits ts vs vs.. logi logicc
The main result of this section is that AC0 = FO and TC0 = FOM. To show this, we first prove that our uniformity condition is restricted enough to allow for “construction in FO”. Lemma 20.9. DLOGTIME
⊆ FO.
Proof. Let M be a DLOGTIME Turing machine with k work tapes. We will have to write down a first-order sentence ϕ such that for all input strings x, M accepts x if and only ϕ is satisfied by X . The main idea is simple: simple: Since Since the mac machin hinee runs runs in logarithmi logarithmicc time, time, we can encode encode its behavior behavior into a consta constant nt number number of variable ariables. s. (Recal (Recalll that a variable can hold values between 0 and n 1, thus log n bits. Using BIT, we can specify individual bits.) Each step t is described by a constant number number of bits: M ’s M ’s state qt , the k symbols w1 , . . . , wk that M writes on its work tapes, the k directions d1, . . . , dk to which M moves its head on tape 1, . . . , k, k, respectively, as well as the position I t of the input head, which is controlled by the address tape. The sentence sentence ϕ begins with existent existential ial quantifiers quantifiers over the variables ariables z1 , . . . , zc (c is a suitable constant) that describe the behavior of M . M . This This means that ϕ = z1 z2 . . . zk ψ(z1 , . . . , zk ) for some first-order sentence ψ. The sentence ψ must assert that z = (z1 , . . . , zk ) forms a valid accepting computation computation of M of M .. To do this, we define two first-order formulas: C (p,t,a) p,t,a) is true if and only if the contents of cell p at time t is a. P ( P (p, t) is true if and only if the appropriate head is at position p at time t. (The position p also contains contains the information information on which which tape the position is.) Given Given P and T , T , we can write ψ as follows. Let us fix a time t.
−
∃ ∃
∃
1. We have have to assert assert that I t is correct for all t. To do this this,, we have have a variable y with an existential quantifier, and we condition y to be equal to the contents of the address tape at time t, which can be verified using C . Then we set I t = C (y). 2. The step step from time t to time t + 1 should be according to M ’s M ’s finite control control.. The step step depends depends on I t , the current state qt , k work tape symbols (a current tape symbol is an a such that there exists a p with C (p,t,a) p,t,a) P ( P (p, t)).
∧
Using P , P , we can write C : The conten contents ts of cell cell p of tape i at time t is wi,t , where t is the most recent visit of head i to position p. If M has not yet visited p, then wi,t is the blank symbol. Finally, to get P , P , we have to sum up O(log n) values of dt for t < t. This can be done as we have already seen.
Theorem 20.10. AC0 = FO and TC0 = FOM.
132
20. Circuits and first-order logic
Proof. We only only prove prove the first statem statemen ent. t. The second second follows follows with an almost identical proof, where we have to replace the quantifier M by threshold threshold gates and vice versa. (Since (Since threshold gates can take more than n inputs, we need Lemma 20.6 for TC0 = FOM.) Let us first prove FO AC0 . Let ϕ be any first-order formula of quantifier depth d. Withou Withoutt loss loss of generali generality ty,, we can assume assume that ϕ is in prenex normal form. This means that ϕ = Q1 y1 Q2 y2 . . . Qd yd : ψ (y1 , . . . , yd ), where ψ is quantifier-free and Q1 , . . . , Qd are any quantifiers. For such a ϕ, there is a canonical constant-depth circuit C n for every n. A tree of fan-out n and depth d corresponds corresponds to the quantifiers. quantifiers. At each leaf, there is a constant-size circuit corresponding to ψ (y1 , . . . , yd ). This This circuit circuit consists of Boolean operators, input nodes, and constants corresponding to the value of atomic formulas (=, , and BIT), where the constants depend on y1 , . . . , yd . What remains to be done is to show that this canonical circuit family is indeed DLOGTIME DLOGTIME uniform. uniform. The address address of a node will consist consist of log n bits for each quantifier (this is needed to specify the respective value of yi ) as well as a constant number of bits specifying which node of the respective copy of the constantconstant-size size circuit circuit we are considering. considering. In order to answer answer queries for the connection language, our DLOGTIME machine has to be able to (i) compare O(log n) bit numbers and do arithmetic with them (like dividing them into their parts for the several quantifiers) and (ii) compute from the numbers y1, . . . , yd to which which input nodes the respective respective constant-siz constant-sizee circuit circuit has to be connected. The latter is possible because the DLOGTIME machine can perform BIT and all the other operations on O (log n) bit numbers that a first-o first-orde rderr formu formula la can do. It is not difficult difficult but tedious tedious to work work out the details. To prove AC0 FO, we first observe that the connection language is in Lemma 20.9. 20.9. Let Let C = (C n )n∈N be a DLOGTIME uniform family FO by Lemma of constantconstant-depth, depth, polynomial-size polynomial-size circuits. Since C n is of polynomial size, we can refer to its nodes by tuples tuples of (a consta constant nt number number of ) variable ariables. s. In order to devise a first-order formula for C n , we will express the predicate AccGate(a AccGate(a), which is true if and only if gate a accepts, i.e., outputs 1. If we have AccGate, then we just have to evaluate it for the output gate. To get AccGate, we define inductively predicates AccGate d (a) with the meaning “gate a on level d outputs outputs 1”. For level 0, AccGate0 (a) is true if and only if (i) a is a number of a gate on level 0 and (ii) a is connected to some input xi = 1. To express AccGate d (a), we have to evaluate if a is a gate on level d in the first first place. place. Since Since d is constan constant, t, this can be expres expressed sed.. If gate a is indeed on level d, then then we procee proceed d as follo follows ws.. If a is a NOT gate, then AccGated (a) = AccGated−1 (b) for some gate b on level d 1. (We (We can can easily find out which gate b we need using the connection connection language, language, which is in FO by assumption.) 
If a is an AND gate, we use a b : ξ (b) to range over
⊆
≤
⊆
¬
−
∀
20.4. Circuits vs. logic
133
all other gates b. The expression ξ (b) is true if and only if (i) b is not a gate on level d 1, or (ii) b is not a predecessor of a, or (iii) b is a predecessor of a and AccGated−1 (b) = 1.
−
Exercise Exercise 20.9. 20.9. Let LH be the class of languages that can be decided by
an alternating log-time log-time Turing Turing machine. machine. (Such machines machines work similar to deterministic log-time Turing machines, except that they are alternating.) Show that FO = LH = AC0 . (This requir requires es only little extra extra work, given 0 that we know FO = AC and DLOGTIME FO.) Thus, DLOGTIME-u AC0 is indeed indeed a very robust robust class: We can define it in terms of circuits, logic, or using Turing machines.
⊆
Exercise 20.10. Construct family of circuits of polynomial size and depth
log log n) for parity. O(log n/ log Note: This is asymptotically optimal (see Corollary 12.7 of the lecture notes of “Computational Complexity Theory”).
21
Thre Thresh shol old d circ circui uits ts for for divi divisi sion on
For addition and multiplication, and also for subtraction, it is not too hard to come up with AC0 circui circuits. ts. But divisi division on seems seems to be much much harder. harder. It is fairly fairly easy easy to see that divisio division n can be done done in polynomial polynomial time. time. But for a long time, it was unknown unknown if division division is possibl p ossiblee in logarithmic space. space. In the 1980s, it has been shown that there are polynomial-time uniform TC0 circuits. (We will define below, what TC0 means.) However, even this does not prove prove that division can be done in logarithmic space. space. This was shown shown by Chiu et al. [CDL01], who proved that division lies in log-space uniform Finally y, it was shown shown that division division is in TC0 (which is a subclass of L). Finall 0 DLOGTIME uniform TC , which which is optima optimal: l: the problem problem is com comple plete te for 0 DLOGTIME uniform TC . (We (We will will also also define define below what what DLOGTI DLOGTIME ME uniform uniform means. means. Let us just just remark remark that it is even even weaker weaker than loglog-spa space ce uniform.) The goal of this chapter is to prove that division is in DLOGTIME uniform TC0 = FOM. We will do so in three steps: First, we reduce division to iterated multiplication. Second, we will introduce a new predicate POW (see below). Let us write FOMP = FOM[BIT, [BIT, <, POW] and FOP = FO[BIT, [BIT, <, POW] for short. short. We will will show show that divisio division n can be descri described bed in FOMP. This This places places 0 division in L since FOM = TC L and POW can easily be seen to be in L. Third, we will show that in fact POW can also be expressed in FOM, which places division in FOM = TC0 . In the remainder of this chapter, variables in capital letters, such as X and Y , Y , denote numbers of polynomial lengths. We call them also long numbers. Small letters letters represent represent short numbers, numbers, which which are of length O(log n). If there are numbers of length poly(log n), we will mention their lengths explicitly.
⊆
21.1 21.1
Divisi Division, on, iterat iterated ed multip multiplic licati ation, on, and poweri powering ng
Division is closely related to two other problems: iterated multiplication and powering: Division( X,Y,i)) is 1 if and only if the ith bit of Division: The predicate Division(X,Y,i
X/Y is 1. Powering(X,k,i)) is 1 if and only if the ith bit of X of X k is 1. (Note Powering: Powering(X,k,i that X has n bits and k has length O(log n). 134
21.2. Division in FOM + POW
135
ItMult(X 1 , X 2 , . . . , Xn , i) = 1 if and only if the Iterated multiplication: ItMult(X ith bit of
n j =1 X j
is a 1.
If we want to compute X/Y and have 1/Y 1/Y with sufficient precision, then division division reduces to multiplic multiplication. ation. And we already know how to multiply in 0 TC . Now observe that ∞ 1 (1 α)i = α
i=0
for α
−
∈ (0, (0, 1). If we assume further that α ∈ [1/ [1/2, 1), then we have n
1 = α
(1
i=0
− α)i + O(2−n) .
Now let j = log2 Y be roughly the number of bits of Y . Y . Then use 2 −j Y [ 12 , 1) as α in the preceding equation. This yields nj
2
·
X = X Y = X
∈
n
· ·
(1
i=0 n
− 2−j Y ) Y )i + O(X · 2nj −n ) − Y ) Y )i · (2j )n−i + O (X · 2nj −n ) .
(2j
i=0
(21.1)
This is equivalent to X = X Y
n
·
(2j
i=0
− Y ) Y )i · 2−ij + O(X · 2−n ) .
Thus, X/Y is approximated within an additive error of O (X 2−n ). If we we can evaluate evaluate the sum in (21.1), then we can proceed as follows: follows: We calculate X/Y with a precision of O(1), and then we compute the exact value of X/Y by hand. hand. (There (There is only only a consta constant nt number number of candid candidate atess and multiplication can be done in FOM.) So far, we have reduced division to computing an iterated sum of powers. Of course course,, poweri powering ng reduce reducess to iterate iterated d multi multipli plicat cation ion.. Thus, Thus, we mainly focus on iterated multiplication in the following.
21.2 21.2
Divi Divisi sion on in FOM + POW
The central tool for iterated multiplication, and thus also for division, is the Chinese remainder representation (CRR): An n-bit number is uniquely uniquely determ determine ined d by its residu residues es modulo modulo polynom polynomiall ially y prime prime numbe numbers, rs, each each of length O(log n). (There are enough such primes.) Assume that we are given primes m1 , . . . , mk , each a short number, and let M = ki=1 mi be their product. Any number X 0, 1, . . . , M 1 can
∈{
− }
136
21. Threshold circuits for division
≡
be represented uniquely as (x ( x1 , . . . , xk ) with X xi (mod mi ) for each i. Let C i = M/m i , and let hi be the inverse of C i modulo mi , i.e., C i hi 1 (mod mi ). Then, for any i, we have
≡
k
X
≡
xi hi C i
(mod M ) M ) .
i=1
Even more, k
X =
xi hi C i
i=1
− rM
for some number r = rankM (X ), ), called the rank of X with respect to M . M . Note that r is a short number, equal to the sum of the integer parts of xi hi C i /M = xi hi /mi , which is in 0, 1, . . . , mi 1 . What does CRR help? We have reduced iterated multiplication to iterated multiplication of short numbers, which is considerably easier. The algorithm for iterated multiplication is now easy to describe:
{
− }
1. Convert Convert the input from binary to CRR. 2. Compute Compute the iterated product in CRR. 3. Convert Convert the answer from CRR back to binary. binary. As a tool, we assume that the following predicate is given: POW(a,i,b,p POW(a,i,b,p))
≡ ai ≡ b
(mod p) .
(All four numbers here are short numbers.)
21.2 21.2.1 .1
The The secon second d step step
If p is a prime, then the multiplicative group Zp is cyclic and of order p 1. This This allow allowss us to take take discre discrete te log logarit arithms hms:: First, First, we find g , the smallest generator of Zp : g is the smallest number with g i 1 (mod p) for 0 < i < p 1. This yields a FOP formula GEN(g, GEN(g, p) that yields true if and only if g is the smallest generator of Zp . If g is a generator, then g i a (mod p) has a unique solution for every a. Using Using POW and GEN, GEN, we can take take discrete discrete logarithms: GEN(g, GEN(g, p) POW(g,i,a,p POW(g,i,a,p))
−
≡
−
≡
∧
is a FOP predicate that states that i is the discrete logarithm of a. Now, if the input is in CRR, then iterated multiplication simply reduces to iterat iterated ed addition: addition: We just just have have to add the discre discrete te logarithm logarithms. s. Since Since iterated addition is in FOM, this would put iterated multiplication in FOMP. This gives us the second step of our algorithm. However, we still have to be able to perform the first and third step of the algorithm.
21.2. Division in FOM + POW
21.2 21.2.2 .2
137
The The first first step step
The first step of our algorithm is easy to accomplish in FOMP, as we see from the following lemma. If X , m1 , . . . , mk are given in binary and X and X < M = Lemma 21.1. If X then we can compute CRR(X CRR(X ) = (x1 , . . . , xk ) in FOMP.
k i=1 mi ,
Proof. For each mi and each j < n, we can compute 2 j mod mi using POW. In this way, way, we obtain values values yi,j 0, 1, . . . , mi 1 for 1 i k and 0 j < n. n . Then we add yi,1 i,1 + . . . + yi,n−1 using iterated addition (which is in FOM) and take the sum modulo mi to obtain xi .
∈{
≤
− }
≤ ≤
In the lemma above, above, the prime numbers numbers m1 , . . . , mk are giv given en.. You might wonder how we actually get them since, of course, they are not part of the input. This is not very difficult, but we will nevertheless deal with it later on.
21.2 21.2.3 .3
The The thir third d step step
We will prove that we can perform the third step of our algorithm in FOMP by a series of lemmas. First, we observe that we, in fact, already know how to divide, albeit only by short primes. short prime. prime. Then Then the binary binary represe epresenta ntation tion of Lemma Lemma 21.2. 21.2. Let p be a short O (1) 1/p can be computed to n bits of accuracy in FOP.
∈
Proof. We can assume that p is odd. odd. Let Let s N be arbitrary. arbitrary. We write s = ap + b with b = 2 mod p. Then Then the the sth bit of the binary expansion of 1/p is equal to the low-order bit of a: We are interested in the low-order s bit of 2s /p . Now we have 2p = a + pb . Since b 0, 1, . . . , p 1 , we have b/p = 0. Thus, 2s b = a+ =a. p p 2s
∈{
− }
We observe that ap + b = 2s is even. even. Thus, Thus, because because p is odd, b mod 2 = 1/p.. a mod 2. Therefore, the low-order bit of b is the sth bit of 1/p In binary representation, it is very easy to test if a number is smaller than another. In Chinese remainder remainder representat representation, ion, this is more difficult, difficult, although it can be done.
∈{
− }
Let X, Y 0, 1, . . . , M 1 given in CRRM form. Testing Lemma 21.3. Let X, whether X < Y is in FOMP. Proof. Of course, X < Y if and only if X/M < Y/M . Y/M . Thus, Thus, it suffices suffices to show that we can compute X/ X/M M to polynomially many bits of accuracy.
138
21. Threshold circuits for division
Recall Recall that X = Thus,
k i=1 xi hi C i
X = M
k
i=1
− rankM (X )M and that C i
xi hi mi
= M/m i .
− rankM (x) .
The numbers x1, . . . , xk are given to us since CRR M (X ) is part of the input. put. The numbe numbers rs C 1 , . . . , Ck can be computed in FOMP: For C i , we add the discrete logarithms of mj for j = i. And h1 , . . . , hk are the inverse of C 1 , . . . , Ck , respectively, which can also easily be computed in FOMP. By Lemma 21.2, we can compute each summand can be computed to polynomially polynomially many bits of accuracy accuracy. We know that iterated iterated addition is in FOM, thus we can compute polynomially many bits of the binary representation of k x i hi X = + rankM (X ) . mi M
i=1
Since rankM (X ) is an integer, X/ X/M M is just the fractional part of this sum, of which we have sufficiently many bits. A useful consequence of us being able to compare numbers in CRR is that it allows allows us to chang changee the CRR basis: basis: If we have have primes primes primes p1, . . . , p with i=1 pi = P , P , then we can get CRR P (X ) from CRRM (X ). ). The crucial ingredient for this is that, given CRR M (X ) we can compute X mod p for a short prime p.
Lemma Lemma 21.4. 21.4. Given CRRM (X ) and a short prime p,, we can compute X mod p in FOMP.
Proof. If p = mi for some i, then we know the answer from the input. Thus, we can assume that p does not divide M . M . Let P = M p. If we we can can compute CRRP (X ), ), then this gives us X mod p. We turn to brutebrute-forc force: e: We try all p poly(n poly(n) possible values q for X mod p. This gives us the CRR P of numbers X 0 , . . . , Xp −1. One One of these these numbers is X . Moreover Moreover,, X is the only number among X 0 , . . . , Xp −1 that is smaller than M . M . (This (This follows follows that numbers numbers smaller smaller than M differ in CRRM and all X 0 , . . . , Xp −1 have a unique representation in CRR P .) We can compute CRR P (M ) M ) by adding the discrete logarithms of the primes m1 , . . . , mk modulo p. We carry out comparisons with M in CRRP . Thus, we can compute X mod p by finding the unique X i that is smaller than M . M . All of this can be done in FOMP.
≤
The last lemma towards implementing the third step of our division algorithm is dividing by products of short primes. Lemma 21.5. Let b1 , . . . , b be distinct short primes, let B =
i=1 bi ,
and let CRRM (X ) be given. Then we can compute CRRM ( X/B ) in FOMP.
21.2. Division in FOM + POW
139
Proof. We can assume that $B$ divides $M$; otherwise we apply Lemma 21.4 and extend our CRR basis. Let $M = BP$. By dropping the primes of $B$ from our basis, we can compute $\mathrm{CRR}_B(X \bmod B)$ in FOMP. From this, we can compute $\mathrm{CRR}_M(X \bmod B)$ by extending the basis again according to Lemma 21.4. Then we compute $X - (X \bmod B) = B \cdot \lfloor X/B \rfloor$ in $\mathrm{CRR}_M$.

By assumption, $B$ and $P$ are relatively prime. Thus, there exists a $B^{-1}$ with $B B^{-1} \equiv 1 \pmod{P}$. We can find $\mathrm{CRR}_P(B^{-1})$ in FOMP: this amounts to inverting each component of the $\mathrm{CRR}_P$ representation of $B$, using discrete logarithms. Now we have
$$B^{-1} \cdot B \left\lfloor \frac{X}{B} \right\rfloor \equiv \left\lfloor \frac{X}{B} \right\rfloor \pmod{P} \tag{21.2}$$
in $\mathrm{CRR}_P$ representation. The final step is to observe that $X < M$ implies $\lfloor X/B \rfloor < P$. Thus, we can extend the basis to get the $\mathrm{CRR}_M$ representation of $\lfloor X/B \rfloor$.
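A sketch of this division step follows, again with integer reconstruction standing in for the componentwise CRR operations; the names are ours.

```python
from math import prod

def crt(residues, moduli):
    # Chinese remainder reconstruction of the unique x < prod(moduli).
    P = prod(moduli)
    return sum(r * pow(P // m, -1, m) * (P // m)
               for r, m in zip(residues, moduli)) % P

def crr_floor_div(residues, moduli, B_primes):
    # CRR_M(floor(X/B)) from CRR_M(X), following Lemma 21.5; we assume
    # every prime of B occurs in the basis, so that B divides M.
    M = prod(moduli)
    B = prod(B_primes)
    P = M // B
    # X mod B is determined by the residues at the primes of B.
    x_mod_B = crt([residues[moduli.index(b)] for b in B_primes], B_primes)
    # X - (X mod B) = B * floor(X/B); dividing by B corresponds, modulo P,
    # to multiplying with B^{-1} (B and P are coprime), and floor(X/B) < P.
    X = crt(residues, moduli)
    q = ((X - x_mod_B) // B) % P
    return [q % m for m in moduli]

moduli = [3, 5, 7, 11]
X = 200
assert crr_floor_div([X % m for m in moduli], moduli, [3, 5]) == \
    [(X // 15) % m for m in moduli]
```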
Finally, we are able to prove that the third step of our algorithm, converting CRR numbers into binary representation, can also be expressed in FOMP.

Theorem 21.6. Given $\mathrm{CRR}_M(X)$ with $0 \leq X < M$, we can compute the binary representation of $X$ in FOMP.
Proof. It suffices to compute $\lfloor X/2^s \rfloor$ for any $s$. Then the $s$th bit of $X$ is given as $\lfloor X/2^s \rfloor - 2 \lfloor X/2^{s+1} \rfloor$. We get this number in $\mathrm{CRR}_M$, but it is easy to distinguish 0 from 1, even in CRR.

First, we create numbers $A_1, \ldots, A_s$. Each $A_j$ is the product of polynomially many distinct short primes that do not divide $M$, and we want $A_j > M$. Recall that $M = \prod_{i=1}^{k} m_i$ for short primes $m_i$. Let $m_1 < m_2 < \ldots < m_k$ be the first $k$ odd primes. (We can assume this without loss of generality.) Then we set $A_j = \prod_{\ell=1}^{k} p_{jk+\ell}$, where $p_\ell$ denotes the $\ell$th smallest prime number. The prime number theorem guarantees that there are enough short primes for our purposes, and these $A_j$ fulfill our conditions. Furthermore, a list of all (short) primes smaller than $\mathrm{poly}(n)$ can easily be computed by a TC$^0$ circuit, thus also in FOM. Thus, we know how to get these primes.

Assume that the $A_j$ are very large. Then $\frac{1+A_j}{2A_j} \approx \frac{1}{2}$. Thus,
$$\frac{X}{2^s} \approx X \cdot \prod_{j=1}^{s} \frac{1+A_j}{2A_j}.$$
It might look as if we are complicating the problem, but it turns out that, on the one hand, this quantity involving the $A_j$ is easier to compute and, on the other hand, it is precise enough to give us $\lfloor X/2^s \rfloor$.

Let $P = M \cdot \prod_{j=1}^{s} A_j$. We extend the basis to get $\mathrm{CRR}_P(X)$. Since $M < A_j$ for all $j$, we have
$$\prod_{j=1}^{s} \frac{1+A_j}{A_j} < \left(1 + \frac{1}{M}\right)^{s}.$$
Furthermore, for every $K \geq 1$ and every $s$ with $s/M \leq 1/K$,
$$\left(1 + \frac{1}{M}\right)^{s} < \exp\!\left(\frac{s}{M}\right) \leq 1 + \frac{s}{M} \cdot \frac{K+1}{K}.$$
Setting $K = M/(s+1)$ and exploiting that $s(s+1) \leq M$, this yields
$$\prod_{j=1}^{s} \frac{1+A_j}{A_j} < 1 + \frac{s+1}{M}. \tag{21.3}$$
Using Lemma 21.5, we can compute the $\mathrm{CRR}_P$ of
$$Q = \left\lfloor X \cdot \frac{\prod_{j=1}^{s} (1+A_j)}{2^s \prod_{j=1}^{s} A_j} \right\rfloor.$$
By (21.3), we have
$$\frac{X}{2^s} \leq X \cdot \frac{\prod_{j=1}^{s} (1+A_j)}{2^s \prod_{j=1}^{s} A_j} < \frac{X}{2^s} \cdot \left(1 + \frac{s+1}{M}\right) < \frac{X}{2^s} + 1$$
(the last inequality holds since $X < M$ and $s + 1 \leq 2^s$). Thus, $Q \in \{\lfloor X/2^s \rfloor, \lfloor X/2^s \rfloor + 1\}$. We determine which one of $Q$, $Q - 1$ is correct by checking whether $Q \cdot 2^s > X$ (using $\mathrm{CRR}_P$).
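The following numeric check illustrates this approximation. The values are ours, chosen for illustration: in the theorem the $A_j$ are products of many short primes, while here single large primes suffice to demonstrate the estimate.

```python
from math import prod

M = 3 * 5 * 7 * 11 * 13          # = 15015
A = [15017, 15073, 15077]        # primes larger than M, coprime to M
s = len(A)
for X in (0, 1, 12345, M - 1):
    approx = X * prod(1 + a for a in A)
    Q = approx // (2 ** s * prod(A))       # done in CRR via Lemma 21.5
    assert Q in (X // 2 ** s, X // 2 ** s + 1)
    answer = Q if Q * 2 ** s <= X else Q - 1   # the disambiguation step
    assert answer == X // 2 ** s
```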
Exercise 21.1. Using Lemma 21.1 and Theorem 21.6, one can convert numbers in FOMP from any base to any other base. Prove this!

Corollary 21.7. Division, iterated multiplication, and powering can be expressed in FOMP.
21.3 POW is in FO

21.3.1 Two special cases in FO
The first step towards proving POW $\in$ FO will be to show that POW, as well as division and iterated multiplication of very short numbers, can be performed in FO. We start by showing that this is true for POW.

Lemma 21.8. POW$(a, r, b, p)$, where $a$, $r$, $b$, and $p$ have $O(\log \log n)$ bits each, is in FO.

Proof. Let us assume that $a$, $b$, $p$, and $r$ have $k \log \log n$ bits each. We can compute $a^r \bmod p$ in FO by using repeated squaring. To do this, we consider the sequence $r_0, r_1, \ldots, r_{k \log \log n}$ of exponents with $r_i = \lfloor r/2^i \rfloor$. Thus, $r_0 = r$ and $r_{k \log \log n} = 0$. Moreover, $r_i = 2r_{i+1}$ or $r_i = 2r_{i+1} + 1$, depending on the corresponding bit of $r$.
Now we compute all values $a_i = a^{r_i} \bmod p$. We have to check that $a_{k \log \log n} = 1$ and that $a_i = a_{i+1}^2 \bmod p$ or $a_i = a \cdot a_{i+1}^2 \bmod p$, depending on the corresponding bit of $r$. Each check can be performed easily in FO. Since each $a_i$ needs at most $k \log \log n$ bits and there are $k \log \log n$ such numbers, all $a_i$ easily fit into a single variable for sufficiently large $n$. Thus, we can perform all checks in parallel, which completes the proof of the lemma. (A small sketch of these checks appears after Theorem 21.9 below.)

Now we use the fact that POW for very short numbers can be expressed in FO to show that division and iterated multiplication of short numbers can be done in FO as well.

Theorem 21.9. ItMult and Division, where the inputs have $(\log n)^{O(1)}$ bits, are in FO.

Proof. We know from Corollary 21.7 that division and iterated multiplication with inputs of length $r$ can be done in FOMP over the universe $\{0, \ldots, r-1\}$. We set $r = (\log n)^k$. Then Division and ItMult can be expressed in FOMP over the universe $\{0, \ldots, (\log n)^k - 1\}$. We will show that such FOMP formulas can be expressed in FO over the universe $\{0, 1, \ldots, n-1\}$. Note that all uses of POW in these formulas are called with inputs of $O(\log (\log n)^k) = O(\log \log n)$ bits. Thus, we can replace POW by FO formulas according to Lemma 21.8. In the same way, the threshold quantifier can be replaced by an FO formula, since the range of the quantified variables is $\{0, \ldots, (\log n)^k - 1\}$: BSUM can be expressed in FO as long as there are at most $(\log n)^{O(1)}$ ones to count, which is the result of Exercise 21.2. This completes the proof.
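To make the certificate from the proof of Lemma 21.8 concrete, here is a small sketch of the checks. They are performed sequentially here, whereas the FO formula performs them in parallel on packed variables; the function name is ours.

```python
def check_pow_certificate(a, r, b, p, bits):
    # Verifies POW(a, r, b, p) via the exponent sequence r_i = floor(r/2^i):
    # a^{r_i} must equal (a^{r_{i+1}})^2, times a if the low bit of r_i is 1.
    rs = [r >> i for i in range(bits + 1)]       # r_0 = r, ..., r_bits = 0
    As = [pow(a, ri, p) for ri in rs]            # the guessed values a_i
    ok = (As[bits] == 1)                         # a^0 = 1
    for i in range(bits):
        expected = As[i + 1] ** 2 % p
        if rs[i] % 2 == 1:                       # low-order bit of r_i
            expected = expected * a % p
        ok = ok and (As[i] == expected)          # one local check per bit
    return ok and As[0] == b % p

assert check_pow_certificate(3, 10, pow(3, 10, 17), 17, bits=4)
```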
Exercise 21.2. Prove the following: In FO over the universe $\{0, 1, \ldots, n-1\}$, we can count the number of 1s in binary strings of length $(\log n)^{O(1)}$. Even more, we can count the number of 1s in a binary string of length $n$ if this number is at most $(\log n)^{O(1)}$.

Remark 21.10. Beyond being a tool for showing that division is in FOM, this theorem is also interesting in its own right: it gives tight bounds on the size of the numbers for which Division and ItMult are in FO. On the one hand, the theorem shows that this is the case for numbers consisting of $(\log n)^{O(1)}$ bits. On the other hand, we have FO = AC$^0$, and any circuit of constant depth $d$ that computes the parity of $m$ bits must have size $2^{\Omega(\sqrt[2d]{m})}$. For parity of $m$ bits to be in FO, we need $2^{\Omega(\sqrt[2d]{m})} \leq \mathrm{poly}(n)$, which implies $m \leq \mathrm{poly}(\log n)$.
21.3.2 POW is in FO
What remains to be done is to show that POW is in FO. In order to prove this, we first show something slightly more general: powering in groups of order $n$ is FO Turing reducible to finding the product of $\log n$ elements of this group. This needs some clarification: First, FO Turing reducible essentially means that we are allowed to use a predicate for the product of $\log n$ elements of this group. Second, we restrict ourselves to groups that can be represented in FO. This means that the group elements can be labeled by the numbers $0, \ldots, n-1$ in such a way that the product operation is FO definable.

Exercise 21.3. Show that for any group that can be represented in FO, the inverse and the neutral element can be defined in FO.

Lemma 21.11. Finding small powers in any group of order $n$, i.e., computing $a^r$ for a group element $a$ and a small number $r$, is FO Turing reducible to finding products of $\log n$ elements.

Proof. Our goal is to compute $a^r$. The way we do this is to compute group elements $a_1, \ldots, a_k$ as well as numbers $u, u_1, \ldots, u_k$ for $k = o(\log n)$ such that
$$a^r = a^u \cdot \prod_{i=1}^{k} a_i^{u_i}.$$
In addition, we want $u_i < 2 \log n$ and $u < 2(\log n)^2$. Given these elements and numbers, we can easily compute $a^r$: computing $a_i^{u_i}$ for $1 \leq i \leq k$ as well as $a^u$ amounts to computing products of small numbers of group elements. For $a_i^{u_i}$, this follows from $u_i < 2 \log n$. And for $a^u$, we use two rounds of multiplying at most $2 \log n$ elements. The result $a^r$ is then also just a product of $k + 1$ group elements.

We will choose the group elements $a_i$ to be $d_i$th roots of unity for small primes $d_i$. The numbers $u_i$ can then be computed using Chinese remaindering.

Our first step consists of finding a CRR basis $D$ consisting of primes, each of which is at most $O(\log n)$. More precisely, we choose a set of $k = o(\log n)$ primes $d_1, \ldots, d_k$ such that $d_i < 2 \log n$ for each $i$ and each $d_i$ is relatively prime to $n$. Furthermore, we want $n < D = \prod_{i=1}^{k} d_i < n^2$. We can compute these $d_i$ by an FO formula that finds the first $D > n$ that is square-free, relatively prime to $n$, and such that all its prime factors are smaller than $2 \log n$. To compute the number $k$ and the relation between the $d_i$ and $i$, we count, for each prime $p_0$, the number of primes dividing $D$ that are smaller than $p_0$. We can do this using BSUM.

The second step consists of computing $a_i = a^{\lfloor n/d_i \rfloor}$. We do this as follows: First, we compute $a^{-1}$ (see Exercise 21.3). Second, we compute $n_i = n \bmod d_i$. Third, we compute $a^{-n_i}$ by multiplying $n_i$ copies of $a^{-1}$. We can do this by the assumption of this lemma because $n_i < d_i < 2 \log n$. Now we come to computing $a^{\lfloor n/d_i \rfloor}$. To do this, we observe that
$$\left(a^{\lfloor n/d_i \rfloor}\right)^{d_i} = a^{d_i \lfloor n/d_i \rfloor} = a^{n - n_i} = a^{-n_i}.$$
The last equality holds because $a^n = 1$ in any group of order $n$.
Now let $d_i^{-1}$ be the multiplicative inverse of $d_i$ modulo $n$, i.e., there exists a number $m$ with $d_i d_i^{-1} = mn + 1$. There exists exactly one group element $x$ with $x^{d_i} = a^{-n_i}$, and this group element $x$ is the one we are looking for: We have
$$x = x^{mn+1} = \left(x^{d_i}\right)^{d_i^{-1}} = \left(a^{-n_i}\right)^{d_i^{-1}}.$$
Thus, we can express $a_i$ as the unique group element $x$ with
$$x^{d_i} = a^{-n_i}.$$
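A quick check of the uniqueness claim (our own illustration): when $\gcd(d_i, n) = 1$, the map $x \mapsto x^{d_i}$ permutes the group, so the equation $x^{d_i} = a^{-n_i}$ has exactly one solution.

```python
# In Z_p^* (order n = p - 1), x -> x^d is a bijection iff gcd(d, n) = 1.
p = 101
n = p - 1
for d in (3, 7, 11):                       # all coprime to n = 100
    images = {pow(x, d, p) for x in range(1, p)}
    assert len(images) == n                # a permutation of the group
```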
Note that we can compute $x^{d_i}$ using multiplication of $O(\log n)$ elements, but we cannot compute $a^{\lfloor n/d_i \rfloor}$ directly, since it might happen that $d_i^{-1}$ is not $O(\log n)$.

Our third step consists of finding the exponents $u, u_1, \ldots, u_k$. By the choice of the $a_i$ in the second step, we have
$$a_1^{u_1} \cdots a_k^{u_k} = a^{\sum_{i=1}^{k} u_i \lfloor n/d_i \rfloor}.$$
Thus, we have to choose $u_1, \ldots, u_k$ such that
$$u \equiv r - \sum_{i=1}^{k} u_i \left\lfloor \frac{n}{d_i} \right\rfloor \pmod{n}. \tag{21.4}$$
In order to get a small value for $u$, we have to choose $\sum_{i=1}^{k} u_i \lfloor n/d_i \rfloor$ close to $r$. To achieve this, we approximate $r$ as a linear combination of the $\lfloor n/d_i \rfloor$: we compute $f = \lfloor rD/n \rfloor$. (We can compute this since $r$ has only $O(\log n)$ bits, by Theorem 21.9.) Let $D_i = D/d_i$. Then we compute $u_i = f D_i^{-1} \bmod d_i$. This gives us
$$\sum_{i=1}^{k} u_i D_i \equiv f \pmod{D}.$$
Let $m$ be a number that satisfies $\sum_{i=1}^{k} u_i D_i = f + mD$. Now we can calculate $u$ from $u_1, \ldots, u_k$ according to (21.4). (This is a sum of $k$ short numbers, which can be computed in FO since $k = o(\log n)$.) What remains to be done is to show that $u < 2(\log n)^2$. To show this,
we calculate the difference between $r$ and the sum of the $u_i \lfloor n/d_i \rfloor$:
$$\begin{aligned}
r - \sum_{i=1}^{k} u_i \left\lfloor \frac{n}{d_i} \right\rfloor
&= r - \sum_{i=1}^{k} u_i \frac{n}{d_i} + \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \left\lfloor \frac{n}{d_i} \right\rfloor \right) \\
&= r - \frac{n}{D} \sum_{i=1}^{k} u_i D_i + \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \left\lfloor \frac{n}{d_i} \right\rfloor \right) \\
&= r - \frac{n}{D} (f + mD) + \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \left\lfloor \frac{n}{d_i} \right\rfloor \right) \\
&= r - \frac{n}{D} \left\lfloor \frac{rD}{n} \right\rfloor - nm + \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \left\lfloor \frac{n}{d_i} \right\rfloor \right).
\end{aligned}$$
This yields
$$u = \frac{n}{D} \left( \frac{rD}{n} - \left\lfloor \frac{rD}{n} \right\rfloor \right) + \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \left\lfloor \frac{n}{d_i} \right\rfloor \right).$$
For any number $x$, we have $x - \lfloor x \rfloor \in [0, 1)$. Furthermore, $n/D < 1$ by our choice of $D$, $u_i < 2 \log n$ for each $i$, and $k = o(\log n)$. Thus, we have $u < 2(\log n)^2$, which finishes the proof.

Now we note that, first, FO is closed under polynomial changes of the input size and, second, the product of $\log(n^k) = k \log n$ group elements is FO Turing reducible to the product of $\log n$ group elements. This yields that finding powers in any group of order $n^k$ is FO Turing reducible to finding the product of $\log n$ elements.

We now apply the above result, that powering reduces to iterated multiplication, to the multiplicative groups of integers modulo $p$ for a prime $p = O(n^k)$. The multiplicative group $\mathbb{Z}_p^*$ contains the integers $1, \ldots, p-1$. Multiplication in $\mathbb{Z}_p^*$ is FO-definable since multiplication is FO-definable. For evaluating POW$(a, r, b, p)$, we now proceed as follows: If $a = 0$, then we just have to check whether also $b = 0$. Otherwise, we can find $a^r$ in $\mathbb{Z}_p^*$, provided that the product of $\log n$ group elements can be computed with inputs of size $\log^2 n$. However, this can be done according to Theorem 21.9. This immediately yields the main results of this section and of this chapter.
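The following numeric sanity check runs the exponent decomposition from the proof of Lemma 21.11 in a concrete group $\mathbb{Z}_p^*$. The particular values are ours; the sketch verifies the identity $a^r = a^u \prod_i a_i^{u_i}$, not the FO implementation.

```python
from math import prod

p, a, r = 101, 3, 57
n = p - 1                                    # order of Z_p^*, here 100
ds = [3, 7, 11]                              # primes coprime to n, n < prod(ds) < n^2
D = prod(ds)                                 # = 231
f = r * D // n                               # f = floor(r*D/n)
us = [f * pow(D // d, -1, d) % d for d in ds]    # u_i = f * D_i^{-1} mod d_i
u = (r - sum(ui * (n // d) for ui, d in zip(us, ds))) % n    # (21.4)

rhs = pow(a, u, p)
for ui, d in zip(us, ds):
    a_i = pow(a, n // d, p)                  # a_i = a^{floor(n/d_i)}
    rhs = rhs * pow(a_i, ui, p) % p
assert pow(a, r, p) == rhs                   # a^r = a^u * prod_i a_i^{u_i}
# Each u_i < d_i is tiny and u is small (here u = 1), so all the powers
# reduce to short iterated products, as required by the reduction.
```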
Theorem 21.12. POW is in FO.

Theorem 21.13. Division is in FO.