Electronic Colloquium on Computational Complexity, Report No. 60 (2010)

A Separation of NP and coNP in Multiparty Communication Complexity

We prove that NP differs from coNP and coNP is not a subset of MA in the number-on-forehead model of multiparty communication complexity for up to k = (1-\epsilon)log(n) players, where \epsilon>0 is any constant. Specifically, we construct a function F with co-nondeterministic complexity O(log(n)) and Merlin-Arthur complexity n^{\Omega(1)}. The problem was open for k>2.


Introduction
The number-on-forehead model of multiparty communication complexity [CFL] features k communicating players whose goal is to compute a given distributed function. More precisely, one considers a Boolean function F : ({0,1}^n)^k → {−1,+1} whose arguments x_1, ..., x_k ∈ {0,1}^n are placed on the foreheads of players 1 through k, respectively. Thus, player i sees all the arguments except for x_i. The players communicate by writing bits on a shared blackboard, visible to all. Their goal is to compute F(x_1, ..., x_k) with minimum communication. The multiparty model has found a variety of applications, including circuit complexity, pseudorandomness, and proof complexity [Y, HG, BNS, RW, BPS]. This model draws its richness from the overlap in the players' inputs, which makes it challenging to prove lower bounds. Several fundamental questions in the multiparty model remain open despite much research.
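To make the overlap in the players' inputs concrete, here is a minimal Python sketch of who sees what in the number-on-forehead model. The function name `forehead_view` and the sample inputs are illustrative assumptions, not notation from the paper.

```python
from typing import Sequence, Tuple


def forehead_view(inputs: Sequence[str], player: int) -> Tuple[str, ...]:
    """Return what `player` (0-indexed) sees: every input except its own."""
    return tuple(x for j, x in enumerate(inputs) if j != player)


# Toy instance with k = 3 players and n = 4 bits per input (hypothetical data).
inputs = ["0110", "1010", "0011"]
views = [forehead_view(inputs, i) for i in range(len(inputs))]

# Any two players share k - 2 inputs in their views, which is the overlap
# that makes lower bounds hard in this model.
shared_0_1 = set(views[0]) & set(views[1])
```

With k = 2 the views are disjoint and the model reduces to ordinary two-party communication.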

Previous Work and Our Results
The k-party number-on-forehead model naturally gives rise to the complexity classes NP^cc_k, coNP^cc_k, BPP^cc_k, and MA^cc_k, corresponding to communication problems F : ({0,1}^n)^k → {−1,+1} with efficient nondeterministic, co-nondeterministic, randomized, and Merlin-Arthur protocols, respectively. An efficient protocol is one with communication cost log^{O(1)} n. Determining the exact relationships among these classes is a natural goal in complexity theory.
For example, it had been open to show that nondeterministic protocols can be more powerful than randomized ones for k ≥ 3 players. This problem was recently solved in [LS, CA] for up to k = (1 − o(1)) log_2 log_2 n players, and later strengthened in [DP] to k = (1 − ε) log_2 n players, where ε > 0 is any given constant. An explicit separation for the latter case was obtained in [DPV].
The contribution of this paper is to relate the power of nondeterministic, co-nondeterministic, and Merlin-Arthur protocols. For k = 2 players, the relations among these models are well understood [KN, K2]: it is known that coNP^cc_2 ≠ NP^cc_2 and further that coNP^cc_2 ⊄ MA^cc_2. Starting at k = 3, however, it has been open to even separate NP^cc_k and coNP^cc_k. Our main result is that coNP^cc_k ⊄ MA^cc_k for up to k = (1 − ε) log_2 n players, where ε > 0 is an arbitrary constant. The separation is by an explicitly given function.
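In particular, our work shows that NP^cc_k ≠ coNP^cc_k and also subsumes the separation in [DP, DPV], since NP^cc_k ⊆ MA^cc_k and BPP^cc_k ⊆ MA^cc_k. Let the symbols N(F), N(−F), and MA(F) denote the nondeterministic, co-nondeterministic, and Merlin-Arthur complexity of F in the k-party number-on-forehead model.
Theorem 1.1 (Main result). Let k ≤ (1 − ε) log_2 n, where ε > 0 is any given constant. Then there is an explicitly given function F : ({0,1}^n)^k → {−1,+1} with N(−F) = O(log n) and MA(F) = n^{Ω(1)}. In particular, coNP^cc_k ⊄ MA^cc_k and NP^cc_k ≠ coNP^cc_k.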
It is a longstanding open problem to exhibit a function with nontrivial multiparty complexity for k ≥ log_2 n players. Therefore, the separation in Theorem 1.1 is state-of-the-art with respect to the number of players.
The proof of Theorem 1.1, to be described shortly, is based on the pattern matrix method [S1, S2] and its multiparty generalization in [DPV]. In the final section of this paper, we revisit several other multiparty generalizations [C, LS, CA, BH] of the pattern matrix method. By applying our techniques in these other settings, we are able to obtain similar exponential separations by functions as simple as constant-depth circuits. However, these new separations only hold up to k = ε log n players, unlike the separation in Theorem 1.1.

Previous Techniques
Perhaps the best-known method for communication lower bounds, both in the number-on-forehead multiparty model and various two-party models, is the discrepancy method [KN]. The method consists in exhibiting a distribution P with respect to which the function F of interest has negligible discrepancy, i.e., negligible correlation with all low-cost protocols. A more powerful technique is the generalized discrepancy method [K1, R3]. This method consists in exhibiting a distribution P and a function H such that, on the one hand, the function F of interest is well-correlated with H with respect to P, but on the other hand, H has negligible discrepancy with respect to P.
In practice, considerable effort is required to find suitable P and H and to analyze the resulting discrepancies. In particular, no strong bounds were available on the discrepancy or generalized discrepancy of constant-depth circuits AC^0. The recent pattern matrix method [S1, S2] solves this problem for AC^0 and a large family of other matrices. More specifically, the method uses standard analytic properties of Boolean functions (such as approximate degree or threshold degree) to determine the discrepancy and generalized discrepancy of the associated communication problems.
Originally formulated in [S1, S2] for the two-party model, the pattern matrix method has been adapted to the multiparty model by several authors [C, LS, CA, DP, DPV, BH]. The first adaptation of the method to the multiparty model gave improved lower bounds for the multiparty disjointness function [LS, CA]. This line of work was combined in [DP, DPV] with probabilistic arguments to separate the classes NP^cc_k and BPP^cc_k for up to k = (1 − ε) log_2 n players, by an explicit function. A new paper [BH] gives polynomial lower bounds for constant-depth circuits, in the model with up to k = ε log n players. Further details on this body of research and other duality-based approaches [SZ] can be found in the survey article [S3].

Our Approach
To obtain our main result, we combine the work in [DP, DPV] with several new ideas. First, we derive a new criterion for high nondeterministic communication complexity, inspired by the Klauck-Razborov generalized discrepancy method [K1, R3]. Similar to Klauck-Razborov, we also look for a hard function H that is well-correlated with the function F of interest, but we additionally quantify the agreement of H and F on the set F^{−1}(−1). This agreement ensures that F^{−1}(−1) does not have a small cover by cylinder intersections, thus placing F outside NP^cc_k. To handle the more powerful Merlin-Arthur model, we combine this development with an earlier technique [K2] for proving lower bounds against two-party Merlin-Arthur protocols.
In keeping with the philosophy of the pattern matrix method, we then reformulate the agreement requirement for H and F as a suitable analytic property of the underlying Boolean function f and prove this property directly, using linear programming duality. The function f in question happens to be OR.
Finally, we apply our program to the specific function F constructed in [DPV] for the purpose of separating NP^cc_k and BPP^cc_k. Since F has small nondeterministic complexity by design, the proof of our main result is complete once we apply our machinery to −F and derive a lower bound on MA(−F).

Organization
We start in Section 2 with relevant technical preliminaries and standard background on multiparty communication complexity. In Section 3, we review the original discrepancy method, the generalized discrepancy method, and the pattern matrix method. In Section 4, we derive the new criterion for high nondeterministic and Merlin-Arthur communication complexity. The proof of Theorem 1.1 comes next, in Section 5. In the final section of the paper, we explore some implications of this work in light of other multiparty papers [C, LS, CA, BH].
We will need the following observation regarding discrete probability distributions on the hypercube, cf. [S1].
For functions f, g : X_1 × ··· × X_k → R (where X_i is a finite set, i = 1, 2, ..., k), we define ⟨f, g⟩ = Σ_{(x_1,...,x_k)} f(x_1, ..., x_k) g(x_1, ..., x_k). When f and g are vectors or matrices, this is the standard definition of inner product. The Hadamard product of f and g is the tensor f ∘ g : X_1 × ··· × X_k → R given by (f ∘ g)(x_1, ..., x_k) = f(x_1, ..., x_k) g(x_1, ..., x_k).
The symbol R^{m×n} refers to the family of all m × n matrices with real entries. The (i, j)th entry of a matrix A is denoted by A_ij. In most matrices that arise in this work, the exact ordering of the columns (and rows) is irrelevant. In such cases, we describe a matrix using the notation [F(i, j)]_{i∈I, j∈J}, where I and J are some index sets.
We conclude with a review of the Fourier transform over Z_2^n. Consider the vector space of functions {0,1}^n → R, equipped with the inner product ⟨f, g⟩ = 2^{−n} Σ_{x∈{0,1}^n} f(x) g(x). For S ⊆ [n], define χ_S : {0,1}^n → {−1,+1} by χ_S(x) = (−1)^{Σ_{i∈S} x_i}. Then {χ_S}_{S⊆[n]} is an orthonormal basis for the inner product space in question. As a result, every function f : {0,1}^n → R has a unique representation of the form f = Σ_{S⊆[n]} f̂(S) χ_S, where f̂(S) = ⟨f, χ_S⟩. The reals f̂(S) are called the Fourier coefficients of f. The following fact is immediate from the definition of f̂(S):
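As a small numerical illustration of the Fourier expansion over Z_2^n, the following Python sketch computes every Fourier coefficient of a function by brute force and checks that the coefficients reproduce the function. It assumes the normalized inner product ⟨f, g⟩ = 2^{−n} Σ_x f(x) g(x) and the characters χ_S(x) = (−1)^{Σ_{i∈S} x_i}, which are the usual conventions; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def chi(S, x):
    """Character chi_S(x) = (-1)^(sum of x_i over i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1


def fourier_coefficients(f, n):
    """Map each subset S (as a sorted tuple) to <f, chi_S> = 2^-n * sum_x f(x) chi_S(x)."""
    points = list(product((0, 1), repeat=n))
    return {
        S: Fraction(sum(f(x) * chi(S, x) for x in points), 2 ** n)
        for r in range(n + 1)
        for S in combinations(range(n), r)
    }


def OR(x):
    """OR in the -1/+1 convention of the paper: -1 means true."""
    return -1 if any(x) else 1


n = 2
coeffs = fourier_coefficients(OR, n)

# Rebuild f from its coefficients: f(x) = sum_S coeff(S) * chi_S(x).
reconstructed = {
    x: sum(c * chi(S, x) for S, c in coeffs.items())
    for x in product((0, 1), repeat=n)
}
```

Exact rational arithmetic is used so that the reconstruction can be compared with the original function without rounding concerns.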

Communication Complexity
An excellent reference on communication complexity is the monograph by Kushilevitz and Nisan [KN]. In this overview, we will limit ourselves to key definitions and notation. The simplest model of communication in this work is the two-party randomized model. Consider a function F : X × Y → {−1,+1}, where X and Y are finite sets. Alice receives an input x ∈ X, Bob receives y ∈ Y, and their objective is to predict F(x, y) with high accuracy. To this end, Alice and Bob share a communication channel and have an unlimited supply of shared random bits. Alice and Bob's protocol is said to have error ε if on every input (x, y), the computed output differs from the correct answer F(x, y) with probability no greater than ε. The cost of a given protocol is the maximum number of bits exchanged on any input. The randomized communication complexity of F, denoted R_ε(F), is the least cost of an ε-error protocol for F. It is standard practice to use the shorthand R(F) = R_{1/3}(F). Recall that the error probability of a protocol can be decreased from 1/3 to any other positive constant at the expense of increasing the communication cost by a constant factor. We will use this fact in our proofs without further mention.
A generalization of two-party communication is the multiparty number-on-forehead model of communication. Here one considers a function F : X_1 × ··· × X_k → {−1,+1} for some finite sets X_1, ..., X_k. There are k players. A given input (x_1, ..., x_k) ∈ X_1 × ··· × X_k is distributed among the players by placing x_i on the forehead of player i (for i = 1, ..., k). In other words, player i knows x_1, ..., x_{i−1}, x_{i+1}, ..., x_k but not x_i. The players communicate by writing bits on a shared blackboard, visible to all. They additionally have access to a shared source of random bits. Their goal is to devise a communication protocol that will allow them to accurately predict the value of F on every input. Analogous to the two-party case, the randomized communication complexity R_ε(F) is the least cost of an ε-error communication protocol for F in this model, and R(F) = R_{1/3}(F).
Another model in this paper is the number-on-forehead nondeterministic model. As before, one considers a function F : X_1 × ··· × X_k → {−1,+1} for some finite sets X_1, ..., X_k. An input from X_1 × ··· × X_k is distributed among the k players as before. At the start of the protocol, c_1 unbiased nondeterministic bits appear on the shared blackboard. Given the values of those bits, the players behave deterministically, exchanging an additional c_2 bits by writing them on the blackboard. A nondeterministic protocol for F must output the correct answer for at least one nondeterministic choice of the c_1 bits when F(x_1, ..., x_k) = −1 and for all possible choices when F(x_1, ..., x_k) = +1. The cost of a nondeterministic protocol is defined as c_1 + c_2. The nondeterministic communication complexity of F, denoted N(F), is the least cost of a nondeterministic protocol for F. The co-nondeterministic communication complexity of F is the quantity N(−F).
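The error-reduction fact recalled above is usually obtained by running the protocol several times independently and taking a majority vote. The Python sketch below computes the exact error of such a vote in the worst case where every run errs with probability exactly ε; the function name and the choice of majority voting are illustrative assumptions, since the paper does not spell out the amplification.

```python
from fractions import Fraction
from math import comb


def majority_error(eps: Fraction, t: int) -> Fraction:
    """Probability that more than half of t independent runs err (t odd),
    when each run errs independently with probability exactly eps."""
    if t % 2 == 0:
        raise ValueError("use an odd number of runs to avoid ties")
    return sum(
        comb(t, j) * eps ** j * (1 - eps) ** (t - j)
        for j in range((t + 1) // 2, t + 1)
    )


one_third = Fraction(1, 3)
errors = {t: majority_error(one_third, t) for t in (1, 3, 5, 7)}
```

Because the number of repetitions needed to reach any fixed constant error is itself a constant, the communication cost grows only by a constant factor.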
The number-on-forehead Merlin-Arthur model combines the power of randomized and nondeterministic models. Similar to the nondeterministic case, the protocol starts with a nondeterministic guess of c_1 bits, followed by c_2 bits of communication. However, the communication can be randomized, and the requirement is that the error probability be at most ε for at least one nondeterministic choice when F(x_1, ..., x_k) = −1 and for all possible nondeterministic choices when F(x_1, ..., x_k) = +1. The cost of a protocol is defined as c_1 + c_2.
Analogous to computational complexity, one defines BPP^cc_k, NP^cc_k, coNP^cc_k, and MA^cc_k as the classes of functions F : ({0,1}^n)^k → {−1,+1} with complexity log^{O(1)} n in the randomized, nondeterministic, co-nondeterministic, and Merlin-Arthur models, respectively.

Generalized Discrepancy and Pattern Matrices
A common tool for proving communication lower bounds is the discrepancy method. Given a function F : X × Y → {−1,+1} and a distribution µ on X × Y, the discrepancy of F with respect to µ is defined as
disc_µ(F) = max_{S⊆X, T⊆Y} |Σ_{x∈S} Σ_{y∈T} µ(x, y) F(x, y)|.
This definition generalizes to the multiparty case as follows. Consider a function F : X_1 × ··· × X_k → {−1,+1} and a distribution µ on X_1 × ··· × X_k. The discrepancy of F with respect to µ is defined as
disc_µ(F) = max_χ |Σ_{(x_1,...,x_k)} µ(x_1, ..., x_k) F(x_1, ..., x_k) χ(x_1, ..., x_k)|,
where the maximum ranges over functions χ : X_1 × ··· × X_k → {0,1} of the form
χ(x_1, ..., x_k) = φ_1(x_2, ..., x_k) φ_2(x_1, x_3, ..., x_k) ··· φ_k(x_1, ..., x_{k−1})   (3.1)
for some functions φ_i : X_1 × ··· × X_{i−1} × X_{i+1} × ··· × X_k → {0,1}, i = 1, 2, ..., k. A function χ of the form (3.1) is called a rectangle for k = 2 and a cylinder intersection for k ≥ 3. Note that for k = 2, the multiparty definition of discrepancy agrees with the one given earlier for the two-party model. We put disc(F) = min_µ disc_µ(F).
Discrepancy is difficult to analyze as defined. Typically, one uses the following estimate, derived by repeated applications of the Cauchy-Schwarz inequality.
Theorem 3.1 ([BNS, CT, R1]). Fix F : X_1 × ··· × X_k → {−1,+1} and a distribution µ on X_1 × ··· × X_k. Put ψ(x_1, ..., x_k) = F(x_1, ..., x_k) µ(x_1, ..., x_k). Then
(disc_µ(F) / (|X_1| ··· |X_k|))^{2^{k−1}} ≤ 𝔼_{x_1^0, x_1^1 ∈ X_1} ··· 𝔼_{x_{k−1}^0, x_{k−1}^1 ∈ X_{k−1}} | 𝔼_{x_k ∈ X_k} Π_{z ∈ {0,1}^{k−1}} ψ(x_1^{z_1}, ..., x_{k−1}^{z_{k−1}}, x_k) |.
In the case of k = 2 parties, there are other ways to estimate the discrepancy, including the spectral norm of a matrix (e.g., see [S2]).
For a function F : X_1 × ··· × X_k → {−1,+1} and a distribution µ over X_1 × ··· × X_k, let D^µ_ε(F) denote the least cost of a deterministic protocol for F whose probability of error with respect to µ is at most ε. This quantity is known as the µ-distributional complexity of F. Since a randomized protocol can be viewed as a probability distribution over deterministic protocols, we immediately have that R_ε(F) ≥ max_µ D^µ_ε(F). We are now ready to state the discrepancy method.
Theorem 3.2 (Discrepancy method). For every function F : X_1 × ··· × X_k → {−1,+1}, every distribution µ on X_1 × ··· × X_k, and every ε with 0 ≤ ε < 1/2,
R_ε(F) ≥ D^µ_ε(F) ≥ log_2((1 − 2ε) / disc_µ(F)).
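For intuition, two-party discrepancy can be computed by brute force on tiny matrices. The Python sketch below assumes the standard definition, namely the maximum over rectangles S × T of |Σ_{x∈S, y∈T} µ(x, y) F(x, y)|; the function name and the 2 × 2 examples are illustrative assumptions. The search enumerates only the row sets S, because for a fixed S the best column set T consists of all columns with positive partial sum or all columns with negative partial sum.

```python
from fractions import Fraction
from itertools import product


def discrepancy(F, mu):
    """Brute-force two-party discrepancy of a sign matrix F under distribution mu.
    Runs in time exponential in the number of rows, so it suits toy sizes only."""
    rows, cols = len(F), len(F[0])
    best = Fraction(0)
    for S in product((0, 1), repeat=rows):  # indicator vector of the row set S
        col_sums = [
            sum(mu[i][j] * F[i][j] for i in range(rows) if S[i])
            for j in range(cols)
        ]
        positive = sum(c for c in col_sums if c > 0)
        negative = -sum(c for c in col_sums if c < 0)
        best = max(best, positive, negative)
    return best


uniform = [[Fraction(1, 4)] * 2 for _ in range(2)]
hadamard = [[1, 1], [1, -1]]   # inner product mod 2 on one bit, in +-1 form
constant = [[1, 1], [1, 1]]    # a function with trivial communication

disc_hadamard = discrepancy(hadamard, uniform)
disc_constant = discrepancy(constant, uniform)
```

A constant function attains the maximum possible discrepancy of 1, which matches the intuition that large discrepancy is what cheap protocols exploit.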

Generalized Discrepancy Method
The discrepancy method is particularly strong in that it gives communication lower bounds not only for bounded-error protocols but also for protocols with error vanishingly close to 1/2. This strength of the discrepancy method is at once a weakness. For example, the disjointness function disj(x, y) = ∨_{i=1}^n (x_i ∧ y_i) has a randomized protocol with error 1/2 − Ω(1/n) and communication O(log n). As a result, the disjointness function has high discrepancy, and no strong lower bounds can be obtained for it via the discrepancy method. Yet it is well-known that disj has communication complexity Θ(n) in the randomized model [KS, R2] and Ω(√n) in the quantum model [R3] and Merlin-Arthur model [K2].
The generalized discrepancy method is an extension of the traditional discrepancy method that avoids the difficulty just cited. This technique was first applied by Klauck [K1] and reformulated in its current form by Razborov [R3]. The development in [K1, R3] takes place in the quantum model of communication. However, the same idea works in a variety of models, as illustrated in [S2]. The version of the generalized discrepancy method for the two-party randomized model is as follows.
Theorem 3.3 ([S2]). Fix a function F : X × Y → {−1,+1} and a parameter ε with 0 ≤ ε < 1/2. Then for all functions H : X × Y → {−1,+1} and all probability distributions P on X × Y,
R_ε(F) ≥ log_2((⟨F, H ∘ P⟩ − 2ε) / disc_P(H)).
The usefulness of Theorem 3.3 stems from its applicability to functions that have efficient protocols with error close to random guessing, such as 1/2 − Ω(1/n) for the disjointness function. Note that one recovers Theorem 3.2, the ordinary discrepancy method, by setting H = F in Theorem 3.3.
Proof of Theorem 3.3 (adapted from [S2], pp. 88-89). Put c = R_ε(F). A public-coin protocol with cost c can be thought of as a probability distribution on deterministic protocols with cost at most c. In particular, there are random variables χ_1, χ_2, ..., χ_{2^c} : X × Y → {0,1}, each a rectangle, as well as random variables σ_1, σ_2, ..., σ_{2^c} ∈ {−1,+1}, such that
|𝔼[Σ_{i=1}^{2^c} σ_i χ_i(x, y)] − F(x, y)| ≤ 2ε for all (x, y) ∈ X × Y.
Therefore,
⟨Σ_{i=1}^{2^c} 𝔼[σ_i χ_i], H ∘ P⟩ ≥ ⟨F, H ∘ P⟩ − 2ε.
On the other hand,
⟨Σ_{i=1}^{2^c} 𝔼[σ_i χ_i], H ∘ P⟩ ≤ 2^c disc_P(H)
by the definition of discrepancy. The theorem follows at once from the last two inequalities.
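The weak protocol for disjointness mentioned in this subsection, with error 1/2 − Ω(1/n), can be realized in more than one way. The Python sketch below analyzes one simple public-coin variant, which is an assumption of this example rather than a protocol taken from the paper: the players sample a shared uniformly random index i, exchange x_i and y_i, answer "intersecting" if x_i = y_i = 1, and otherwise answer "intersecting" with a fixed probability p = (n − 1)/(2n − 1). The code computes the exact success probability on every input pair.

```python
from fractions import Fraction
from itertools import product


def success_probability(x, y):
    """Exact probability that the sampled-coordinate protocol answers correctly.
    Here disj(x, y) is true when x_i = y_i = 1 for some i, as in the text."""
    n = len(x)
    p = Fraction(n - 1, 2 * n - 1)            # bias used when no witness is seen
    t = sum(a & b for a, b in zip(x, y))      # number of common coordinates
    hit = Fraction(t, n)                      # chance the sampled index is a witness
    if t > 0:
        return hit + (1 - hit) * p            # correct answer is "intersecting"
    return 1 - p                              # correct answer is "not intersecting"


n = 4
cube = list(product((0, 1), repeat=n))
worst = min(success_probability(x, y) for x in cube for y in cube)
advantage = worst - Fraction(1, 2)
```

The worst-case success probability equals n/(2n − 1), so the advantage over random guessing is 1/(2(2n − 1)) = Ω(1/n), with only a constant number of bits exchanged once the index is drawn from the shared randomness.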

Pattern Matrix Method
To apply the generalized discrepancy method to a given Boolean function F, one needs to identify a Boolean function H which is well correlated with F under some distribution P but has low discrepancy with respect to P. The pattern matrix method [S1, S2] is a systematic technique for finding such H and P. To simplify the exposition of our main results, we will now review this method and sketch its proof.
Recall that the ε-approximate degree of a function f : {0,1}^n → R, denoted deg_ε(f), is the least degree of a polynomial p with ‖f − p‖_∞ ≤ ε. A starting point in the pattern matrix method is the following dual formulation of the approximate degree.
See [S2] for a proof of this fact using linear programming duality. The crux of the method is the following theorem. Let N be a given integer. Define where the rows are indexed by x ∈ {0,1}^N and columns by V ∈ \binom{[N]}{n}. Then
At last, we are ready to state the pattern matrix method. If N ≥ 16en^2/d, then
Proof (adapted from [S2]). Let ε = 1/10. By Fact 3.5, there exists a function h : {0,1}^n → {−1,+1} and a probability distribution µ on {0,1}^n such that (3.5) The theorem now follows from (3.4) and (3.5) in view of the generalized discrepancy method, Theorem 3.3.
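The matrices in this subsection have rows indexed by x ∈ {0,1}^N and columns indexed by n-element subsets V of [N]. The Python sketch below builds such a matrix under the assumption that the (x, V) entry is f(x|_V), the base function applied to the restriction of x to the coordinates in V, in line with the notation h(x|_{α(y)}) used later in the paper. The function names are hypothetical, and OR is used only as a sample base function.

```python
from itertools import combinations, product


def pattern_matrix(f, N, n):
    """Matrix [f(x|_V)] with rows x in {0,1}^N and columns V, an n-subset of range(N)."""
    rows = list(product((0, 1), repeat=N))
    cols = list(combinations(range(N), n))
    return [[f(tuple(x[i] for i in V)) for V in cols] for x in rows]


def OR(z):
    """OR in the -1/+1 convention: -1 means true."""
    return -1 if any(z) else 1


M = pattern_matrix(OR, N=4, n=2)
num_rows, num_cols = len(M), len(M[0])
```

Every column selects a different copy of the base function inside the long string x, which is what lets analytic properties of f transfer to the communication problem.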

Remark.
Presented above is a weaker, combinatorial version of the pattern matrix method. The communication lower bounds in Theorems 3.6 and 3.7 were improved to optimal in [S2] using matrix-analytic techniques. Unlike the combinatorial argument above, however, the matrix-analytic proof is not known to extend to the multiparty model and is not used in the follow-up multiparty papers [C, LS, CA, DP, DPV, BH] or our work.
An alternate technique based on Fact 3.5 is the block-composition method [SZ], developed independently of the pattern matrix method. See [S3, §5.3] for a comparative discussion.
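Dual formulations of approximate degree are typically used through a witness ψ : {0,1}^n → R with Σ_x |ψ(x)| = 1, correlation Σ_x ψ(x) f(x) > ε, and vanishing Fourier coefficients on all characters of order less than d; such a ψ certifies deg_ε(f) ≥ d. Assuming that this standard form is what Fact 3.5 refers to, the Python sketch below checks the three conditions by brute force. The function names and the parity example are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, product


def chi(S, x):
    """Character chi_S(x) = (-1)^(sum of x_i over i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1


def is_dual_witness(psi, f, n, d, eps):
    """Check unit l1 norm, correlation with f above eps, and orthogonality
    of psi to every character of order less than d."""
    points = list(product((0, 1), repeat=n))
    unit_norm = sum(abs(psi(x)) for x in points) == 1
    correlated = sum(psi(x) * f(x) for x in points) > eps
    orthogonal = all(
        sum(psi(x) * chi(S, x) for x in points) == 0
        for r in range(d)
        for S in combinations(range(n), r)
    )
    return unit_norm and correlated and orthogonal


n = 3


def parity(x):
    return -1 if sum(x) % 2 else 1


def or_fn(x):
    return -1 if any(x) else 1


def psi(x):
    """Normalized parity: orthogonal to all characters of order below n."""
    return Fraction(parity(x), 2 ** n)


parity_certified = is_dual_witness(psi, parity, n, d=3, eps=Fraction(1, 3))
or_certified = is_dual_witness(psi, or_fn, n, d=3, eps=Fraction(1, 3))
```

The same ψ certifies full approximate degree for parity but fails for OR on three bits, since its correlation with OR is only 1/4; a failed check says nothing about the true approximate degree of OR, only that this particular witness is not good enough.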
Since sign tensors H and −H have the same discrepancy under any given distribution, we have the following alternate form of Theorem 4.1.
Corollary 4.2. Let F : X → {−1,+1} be given, where X = X_1 × ··· × X_k. Fix a function H : X → {−1,+1} and a probability distribution P on X. Put .
At first glance, it is unclear how the nondeterministic bound of Theorem 4.1 and its counterpart Corollary 4.2 relate to the generalized discrepancy method. We now pause to make this relationship quite explicit. Recall that nondeterminism is a kind of randomized computation, viz., a nondeterministic protocol with cost c for a function F is a kind of cost-c randomized protocol with error probability at most ε = 1/2 − 2^{−c} on F^{−1}(−1) and error probability ε = 0 elsewhere. This is the setting of Theorem 4.1. The generalized discrepancy method, on the other hand, has a single error parameter ε for all inputs. To best convey this distinction between the two methods, we formulate a more general criterion yet, which allows for different errors on each input.
Theorem 4.3. Let F : X → {−1,+1} be given, where X = X_1 × ··· × X_k. Let c be the least cost of a public-coin protocol for F with error probability E(x) on x ∈ X, for some E : X → [0, 1/2]. Then for all functions H : X → {−1,+1} and all probability distributions P on X,
2^c ≥ (⟨F, H ∘ P⟩ − 2⟨P, E⟩) / disc_P(H).
Proof. A public-coin protocol with cost c is a probability distribution on deterministic protocols with cost at most c. Then by hypothesis, there are random variables χ_1, χ_2, ..., χ_{2^c} : X → {0,1}, each a cylinder intersection, and random variables σ_1, σ_2, ..., σ_{2^c} ∈ {−1,+1}, such that
|𝔼[Σ_{i=1}^{2^c} σ_i χ_i(x)] − F(x)| ≤ 2E(x) for all x ∈ X.
Therefore,
⟨Σ_{i=1}^{2^c} 𝔼[σ_i χ_i], H ∘ P⟩ ≥ ⟨F, H ∘ P⟩ − 2⟨P, E⟩.
On the other hand,
⟨Σ_{i=1}^{2^c} 𝔼[σ_i χ_i], H ∘ P⟩ ≤ 2^c disc_P(H)
by the definition of discrepancy. The theorem follows at once from the last two inequalities.

Main Result
We now prove the claimed separations of nondeterministic, co-nondeterministic, and Merlin-Arthur communication complexity. It will be easier to first obtain these separations by a probabilistic argument and only then sketch an explicit construction. We start by deriving a suitable analytic property of the OR function.
Proof. If |⋃_z S_z| > m2^k − d2^{k−1}, then some S_z must feature more than m − d elements that do not occur in ⋃_{u≠z} S_u. But this forces Γ(Y) = 0 since the Fourier transform of ψ is supported on characters of order d and higher.
In view of (5.4) and Claims 5.3 and 5.4, we have It remains to bound the probabilities in the last expression. With probability at least 1 − k2^{−n} over the choice of Y, we have y_i^0 ≠ y_i^1 for each i = 1, 2, ..., k. Conditioning on this event, the fact that α is chosen uniformly at random means that the 2^k sets S_z are distributed independently and uniformly over \binom{[n]}{m}. A calculation now reveals that
We are ready to prove our main result. It may be helpful to contrast the proof to follow with the proof of the pattern matrix method (Theorem 3.7).
On the other hand, as observed in [DP], the function F_α has an efficient nondeterministic protocol. Namely, player 1 (who knows y_1, ..., y_k) nondeterministically selects an element i ∈ α(y_1, ..., y_k) and writes i on the shared blackboard. Player 2 (who knows x) then announces x_i as the output of the protocol. This yields the desired upper bound in (5.5).
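The nondeterministic protocol just described can be simulated directly. The Python sketch below assumes that F_α(x, y_1, ..., y_k) is the OR of x restricted to the set α(y_1, ..., y_k), with −1 meaning true, and it uses a made-up mapping `toy_alpha` in place of the random or pseudorandom α of the paper. All names here are hypothetical.

```python
from itertools import product
from math import ceil, log2


def toy_alpha(ys):
    """Hypothetical stand-in for alpha: maps (y_1, ..., y_k) to a 2-subset of range(n)."""
    n = len(ys[0])
    start = sum(sum(y) for y in ys) % n
    return frozenset({start, (start + 1) % n})


def F_alpha(x, ys, alpha=toy_alpha):
    """Assumed form of F_alpha: OR of x on alpha(ys), with -1 meaning true."""
    return -1 if any(x[i] for i in alpha(ys)) else 1


def accepts_on_guess(x, ys, i, alpha=toy_alpha):
    """One run: the guess i is valid only if it lies in alpha(ys), which player 1
    can check because it sees every y_j; player 2 sees x and reports x_i."""
    return i in alpha(ys) and x[i] == 1


def nondeterministic_output(x, ys, alpha=toy_alpha):
    """Output -1 exactly when some guess leads to acceptance."""
    n = len(x)
    return -1 if any(accepts_on_guess(x, ys, i, alpha) for i in range(n)) else 1


def protocol_cost(n):
    """Guess bits for an index in range(n), plus one announced bit."""
    return ceil(log2(n)) + 1
```

The cost is logarithmic in n regardless of how complicated α is, because the only communication is the guessed index and a single bit of x.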
As promised, we will now sketch an explicit construction of the function whose existence has just been proven. For this, it suffices to invoke previous work by David, Pitassi, and Viola [DPV], who derandomized the choice of α in Theorem 5.2. More precisely, instead of working with a family {H_α} of functions, each given by H_α(x, y_1, ..., y_k) = h(x|_{α(y_1,...,y_k)}), the authors of [DPV] posited a single function H(α, x, y_1, ..., y_k) = h(x|_{α(y_1,...,y_k)}), where the new argument α is known to all players and ranges over a small, explicitly given subset A of all mappings ({0,1}^n)^k → \binom{[n]}{m}. By choosing A to be pseudorandom, the authors of [DPV] forced the same qualitative conclusion in Theorem 5.2. This development carries over unchanged to our setting, and we obtain our main result. In particular, coNP^cc_k ⊄ MA^cc_k and NP^cc_k ≠ coNP^cc_k.
Proof. Identical to Theorem 5.5, with the described derandomization of α.

On Disjointness and Constant-Depth Circuits
In this final section, we revisit recent multiparty analyses of the disjointness function and other constant-depth circuits [C, LS, CA, BH]. We will see that the program of the previous sections applies essentially unchanged to these other functions. We start with some notation. Fix a function φ : {0,1}^m → R and an integer N with m | N. Define the (k, N, m, φ)-pattern tensor as the k- [S1, S2] to higher dimensions. The two-party Theorem 3.6 has been adapted as follows to k ≥ 3 players. Let N be a given integer, m | N. Let H be the (k, N, m, h)-pattern tensor. Let P be the (k, N, m, 2^{−m(N/m)^{k−1}+m} (N/m)^{−m(k−1)} µ)-pattern tensor. If N ≥ 4em^2(k − 1)2^{2^{k−1}}/d, then disc_P(H) ≤ 2^{−d/2^{k−1}}.
A proof of this exact formulation is available in the survey article [S3], pp. 85-86. We are now prepared to apply our techniques to the disjointness function.
Theorem 6.2. Let N be a given integer, m | N. Let F be the (k, N, m, OR_m)-pattern tensor. If N ≥ 4em^2(k − 1)2^{2^{k−1}}/d, then
In view of (6.1)-(6.3) and Corollary 4.2, the proof is complete.
Recall that disjointness has trivial nondeterministic complexity, O(log n). In particular, Theorem 6.2 shows that the disjointness function separates NP^cc_k from coNP^cc_k and witnesses that coNP^cc_k ⊄ MA^cc_k for up to k = Θ(log log n) players. Our technique similarly applies to the follow-up work on disjointness by Beame and Huynh-Ngoc [BH], whence we obtain the stronger consequence that the disjointness function separates NP^cc_k from coNP^cc_k and witnesses that coNP^cc_k ⊄ MA^cc_k for up to k = Θ(log^{1/3} n) players.
We conclude this section with a remark on constant-depth circuits. Let ε be a sufficiently small absolute constant, 0 < ε < 1. For each k = 2, 3, ..., ε log n, the authors of [BH] construct a constant-depth circuit F : ({0,1}^n)^k → {−1,+1} with N(F) = log^{O(1)} n and R(F) = n^{Ω(1)}. A glance at the proof in [BH] reveals, once again, that the program of our paper is readily applicable to F, with the consequence that MA(−F) = n^{Ω(1)}.
In particular, our work shows that NP^cc_k ≠ coNP^cc_k and coNP^cc_k ⊄ MA^cc_k for up to k = ε log n players, as witnessed by a constant-depth circuit.