The layer complexity of Arthur-Merlin-like communication

In communication complexity the Arthur-Merlin (AM) model is the most natural one that allows both randomness and non-determinism. Presently we do not have any super-logarithmic lower bound for the AM-complexity of an explicit function. Obtaining such a bound is a fundamental challenge to our understanding of communication phenomena. In this article we explore the gap between the known techniques and the complexity class AM. In the first part we define a new natural class, Small-advantage Layered Arthur-Merlin (SLAM), that has the following properties:
- SLAM is (strictly) included in AM and includes all previously known subclasses of AM with non-trivial lower bounds.
- SLAM is qualitatively stronger than the union of those classes.
- SLAM is subject to the discrepancy bound: in particular, the inner product function does not have an efficient SLAM-protocol.
Structurally this can be summarised as SBP $\cup$ UAM $\subset$ SLAM $\subseteq$ AM $\cap$ PP. In the second part we ask why proving a lower bound of $\omega(\sqrt n)$ on the MA-complexity of an explicit function seems to be difficult. Both of these results are related to the notion of layer complexity, which is, informally, the number of "layers of non-determinism" used by a protocol.

• SLAM is subject to the discrepancy bound: for any f, SLAM(f) ∈ Ω(log(1/disc(f))).
In particular, the inner product function does not have an efficient SLAM-protocol.
Structurally this can be summarised as SBP ∪ UAM ⊂ SLAM ⊆ AM ∩ PP.

Introduction
The communication model Arthur-Merlin (AM) is beautiful. It is the most natural regime that allows both randomness and nondeterminism. Informally,
• BPP, the canonical complexity class representing randomised communication, contains those bipartite functions f that admit an approximate partition of the set f⁻¹(1) into quasi-polynomially many rectangles;
• NP, the canonical complexity class representing nondeterministic communication, contains those f that admit an exact cover of the set f⁻¹(1) by quasi-polynomially many rectangles;
• AM contains those f that admit an approximate cover of the set f⁻¹(1) by quasi-polynomially many rectangles.
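These three regimes can be made concrete on toy matrices. The following sketch (purely illustrative and not part of the formal development; all names are ours) represents a bipartite function as a 0/1 matrix and a nondeterministic protocol as a list of combinatorial rectangles, and checks exactness versus approximation of a cover:

```python
from itertools import product

# A combinatorial rectangle is a pair (A, B) of a row subset and a column subset.
def covers(rects, x, y):
    """Number of rectangles of the list containing the entry (x, y)."""
    return sum(1 for (A, B) in rects if x in A and y in B)

def is_exact_cover(f, rects, rows, cols):
    """NP-style: every 1-entry is covered and no 0-entry is covered."""
    return all((covers(rects, x, y) > 0) == (f[x][y] == 1)
               for x, y in product(rows, cols))

def cover_error(f, rects, rows, cols):
    """AM-style: fraction of entries on which the cover disagrees with f
    (under the uniform distribution)."""
    wrong = sum(1 for x, y in product(rows, cols)
                if (covers(rects, x, y) > 0) != (f[x][y] == 1))
    return wrong / (len(rows) * len(cols))

# 2-bit equality: EQ(x, y) = 1 iff x == y; the four diagonal singletons
# form an exact cover, and dropping one of them leaves a small error.
rows = cols = range(4)
EQ = [[1 if x == y else 0 for y in cols] for x in rows]
diag = [({i}, {i}) for i in range(4)]
assert is_exact_cover(EQ, diag, rows, cols)
assert cover_error(EQ, diag[:3], rows, cols) == 1 / 16  # one 1-entry missed
```

The same data structure reappears below when layer complexity is discussed: there one counts, instead of mere coverage, how many rectangles contain a given entry.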
While both BPP and NP are relatively well understood and many strong and tight lower bounds are known, we do not have any non-trivial lower bound for the AM-complexity of an explicit function. Obtaining such a bound is a fundamental challenge to our understanding of communication complexity.
Among the numerous analytical efforts that have been made to understand AM, in this paper we pay special attention to these two:
• In 2003 Klauck [13] studied the class Merlin-Arthur (MA): while (again, informally) AM can be viewed as "randomness over nondeterminism," MA is "nondeterminism over randomness." Klauck found an elegant way to exploit this difference in order to prove strong lower bounds against MA.
• In 2015 Göös, Pitassi and Watson [10] demonstrated strong lower bounds against the class Unambiguous Arthur-Merlin (UAM), which was defined in the same paper. Similarly to AM (and unlike MA), their class can be viewed as "randomness over nondeterminism," but only a very special form of nondeterminism is allowed: namely, only the (erroneously accepted) elements of f −1 (0) may belong to several rectangles; every element of f −1 (1) can belong to at most one rectangle of the nondeterministic cover. In other words, a UAM-protocol must correspond to an approximate partition of f −1 (1), but at the same time it may be an arbitrary cover of a small fraction of f −1 (0). Intuitively, a UAM-protocol must "behave like BPP " over f −1 (1) and is unrestricted over the small erroneously accepted fraction of f −1 (0).
Interestingly, the classes MA and UAM are incomparable: from the lower bounds demonstrated in [10] and in [9] it follows that UAM ⊄ MA and MA ⊄ UAM.
In the first half of this article (Section 3) we try to find a communication model that would be as close to AM as possible, while staying within the reach of our analytic abilities. Inspired by the (somewhat Hegelian) metamorphosis of "easy" BPP and NP into "hard" AM, we will try to apply a similar "fusion" procedure to the classes MA and UAM, hoping that the outcome will give us some new insight into the mystery of AM.
Namely, we start by looking for a communication complexity class, defined as naturally as possible and containing both MA and UAM. We will call it Layered Arthur-Merlin (LAM) (Definition 3.3). Informally, it can be described as letting a protocol behave like MA over f −1 (1) and arbitrarily over the erroneously accepted small fraction of f −1 (0). Note that it follows trivially from the previous discussion (at least on the intuitive level) that MA ∪ UAM ⊆ LAM .
Then we will add a few rather technical "enhancements" to LAM in order to get a class that includes all previously known classes "under AM" with non-trivial lower bounds: most notably, the class SBP, which is known to be strictly stronger than MA and strictly weaker than AM (see [11, 9, 14]).
We call the resulting model Small-advantage Layered Arthur-Merlin (SLAM) (Definition 3.4) and it holds that MA, UAM, SBP, LAM ⊆ SLAM ⊂ AM .
Moreover, we will demonstrate a partial function f ∈ SLAM \ (UAM ∪ SBP), that is, SLAM will be strictly stronger than the union of all subclasses of AM with previously known non-trivial lower bounds (as UAM ∪ SBP includes them all).

Both LAM and SLAM seem to require a new approach for proving lower bounds. It will be developed in Section 3.1, showing that these classes are still subject to the discrepancy bound: for any function f,

SLAM(f) ∈ Ω(log(1/disc(f))),

where SLAM(f) denotes the "SLAM-complexity" of f. In particular, the inner product function does not have an efficient SLAM-protocol. These properties of SLAM can be summarised structurally as

SBP ∪ UAM ⊂ SLAM ⊆ AM ∩ PP,

where PP is the class consisting of functions with high discrepancy.

The problem of proving a lower bound of ω(√n) on the MA-complexity of an explicit function has been open since 2003, when Klauck [13] showed that the MA-complexity of Disj and IP was in Ω(√n). At that point a number of researchers believed that the actual MA-complexity of these problems was in Ω(n), so it was surprising when Aaronson and Wigderson [1] demonstrated MA-protocols for Disj and IP of cost O(√n · log n), which was later improved by Chen [6] to O(√(n · log n · log log n)).

In the second part of this article (Section 4) we try to understand why proving a super-√n lower bound against MA seems to be difficult. We will define a communication model MA* that can be viewed, in a certain sense, as a non-uniform MA. On the one hand, we will see that imposing the corresponding uniformity constraint on MA*-protocols makes them no stronger than MA-protocols; on the other hand, all known lower bounds on MA(f) readily translate to similar lower bounds on MA*(f).
Intuitively, a complexity analysis that would exploit the uniformity of MA (as opposed to MA*) must have a very unusual structure: the difference between the classes is subtle and we are not aware of any examples where this type of argument is used. At the same time, we will see that MA*(f) ∈ O(√n · AM(f)) for any function f; that is, any lower bound of the form MA*(f) ∈ ω(√n) would have non-trivial implications for AM(f). This partially explains why no super-√n lower bound against MA has been found yet. 2

Why LAM is interesting. In the hope that it will benefit the reader, let us explain the motivation for defining and studying the communication models presented in the first part of this article. The strong lower bounds that were shown earlier for both MA and UAM were in the first place steps towards AM. Both MA and UAM have very natural definitions, both can be viewed as weakened versions of AM, and the authors of both [13] and [10] invented new insightful approaches while analysing these models.

2 It is relatively easy to show AM(f) ∈ Ω(log n) for an explicit function (see Footnote 10); to improve that, a lower bound of the form MA*(f) ∈ ω(√n · log n) would be needed. However, it seems that proving any MA*(f₀) ∈ ω(√n) and deriving from it, via the argument of Section 4, that AM(f₀) ∈ ω(1) would by itself shed some new light on the enigma of AM. As one of the concluding open problems (Section 5), we suggest proving a lower bound of the form Ω(√(n · log n)) on the MA-complexity of an explicit function.

THEORY OF COMPUTING, Volume 17 (8), 2021, pp. 1-28

The model LAM, in turn, has been defined as a natural "junction" of MA and UAM, at least as strong as either of the predecessors. 3 As the known approaches for analysing MA and UAM were rather different qualitatively, we expected the new model to be challenging enough to justify defining it.
Our experience of proving a strong lower bound for the newly defined model has confirmed those expectations.
We hope that studying LAM will serve as the next step towards understanding AM.

Preliminaries and definitions
For x ∈ {0, 1}ⁿ and i ∈ [n] = {1, . . . , n}, we will write x_i or x(i) to address the i-th bit of x (preferring "x_i" unless it may cause ambiguity). Similarly, for S ⊆ [n], let both x_S and x(S) denote the |S|-bit string consisting of the (naturally ordered) bits of x whose indices are in S. For a (discrete) set A and k ∈ N, we denote by Pow(A) the set of subsets of A and by $\binom{A}{k}$ the set {a ∈ Pow(A) : |a| = k}.
Our primary objects of computation will be bipartite Boolean functions of the form A × B → {0, 1} (typically, {0, 1}ⁿ × {0, 1}ⁿ → {0, 1}). At times we will consider partial bipartite Boolean functions, where some of the pairs are excluded: this can be interpreted either as assuming that those pairs are never given as input, or as allowing any output of a communication protocol when those pairs are received. We will view partial Boolean functions as total ones that take values in {0, 1, ⊥}, where "⊥" marks the excluded input values. Note that total functions are a special case, so f : A × B → {0, 1, ⊥} can be either total or partial. When we refer to an input distribution for a function f, we mean a distribution supported on f⁻¹(0) ∪ f⁻¹(1) (that is, the excluded pairs receive zero weight). We will use the logical OR (∨) operator with respect to partial Boolean functions, defined as follows (note the asymmetry between the first two cases):

(h₁ ∨ · · · ∨ h_m)(x, y) = 1 if h_i(x, y) = 1 for some i; 0 if h_i(x, y) = 0 for all i; ⊥ otherwise.   (2.1)
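Since the case analysis of this operator is easy to misread, here is one natural reading (our assumption, as the displayed definition was damaged in extraction: a 1 absorbs every ⊥, while a definite 0 requires all arguments to be 0), sketched in code:

```python
BOT = "⊥"  # marks the excluded inputs of a partial function

def por(*vals):
    """OR of partial Boolean values: a 1 absorbs every ⊥, while an output
    of 0 requires all arguments to equal 0 (assumed reading of (2.1))."""
    if any(v == 1 for v in vals):
        return 1
    if all(v == 0 for v in vals):
        return 0
    return BOT

assert por(0, 1, BOT) == 1   # one accepting value decides
assert por(0, 0) == 0
assert por(0, BOT) == BOT    # an undefined input blocks a definite 0
```

The asymmetry between the first two branches is the point: acceptance needs a single witness, while rejection is a statement about all arguments.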

Communication complexity
We refer to [15] for a classical background on communication complexity and to [11] for a great survey of the more recent developments.
Communication problems. We will repeatedly consider the following two communication problems.

Definition 2.1 (Set disjointness, Disj). For every n ∈ N, let (x, y) ∈ {0, 1}ⁿ × {0, 1}ⁿ. Then Disj(x, y) = 1 if there is no i ∈ [n] with x_i = y_i = 1, and Disj(x, y) = 0 otherwise.

3 Here we are referring to LAM, as its definition is more natural and less technically involved than that of SLAM; on the other hand, the difference between the two models is, in our opinion, merely formal (as explained above, we wanted the corresponding complexity class to contain all previously studied subclasses of AM, including SBP, and that was the reason for "boosting" LAM, which resulted in SLAM).
Definition 2.2 (Inner product function, IP). For every n ∈ N, let (x, y) ∈ {0, 1}ⁿ × {0, 1}ⁿ. Then IP(x, y) = (∑_{i=1}^{n} x_i · y_i) mod 2.

Both Disj and IP are total bipartite Boolean functions; that is, their input sets are bipartite and the function values are defined for every possible input pair.
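For concreteness, both functions are one-liners on bit vectors (an illustrative sketch of ours):

```python
def disj(x, y):
    """Set-disjointness: 1 iff the supports of x and y do not intersect."""
    return int(not any(a and b for a, b in zip(x, y)))

def ip(x, y):
    """Inner product of x and y modulo 2."""
    return sum(a & b for a, b in zip(x, y)) % 2

assert disj([1, 0, 1], [0, 1, 0]) == 1
assert disj([1, 0, 1], [1, 1, 0]) == 0
assert ip([1, 0, 0], [1, 1, 0]) == 1   # a single common coordinate
assert ip([1, 1, 0], [1, 1, 1]) == 0   # two common coordinates cancel mod 2
```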
Communication models. The study of communication complexity was initiated by Abelson [2] in the regime of real-valued messages and adapted by Yao [17] to the discrete regime that we are interested in. The models P and BPP that capture one's intuition of efficient communication (respectively, deterministic and randomised) date back to [17]. Later Babai, Frankl and Simon [3] introduced a number of stronger communication models, in particular AM and MA, that intuitively corresponded to some classes studied in the context of structural computational complexity.

Definition 2.3 (Polylogarithmic, P). We call deterministic bipartite communication protocols P-protocols.
We denote by P the class of functions solved by P-protocols of cost at most poly-log(n). If for every input distribution µ_n there exists a P-protocol of cost at most k_ε(n) that solves f with error at most ε, then we say that the BPP_ε-complexity of f, denoted by BPP_ε(f), is at most k_ε(n).
We let the BPP-complexity of f be its BPP_{1/3}-complexity. We denote by BPP the class of functions whose BPP-complexity is at most poly-log(n).
The above definition of BPP, as well as those among the following model definitions that are distribution-dependent, can be phrased in "worst-case" formulations that do not make a reference to input distributions. Those variants usually correspond to the closures of our definitions with respect to mixed strategies, which, in turn, do not affect the resulting models, due to Von Neumann's minimax principle [16]. We call such Π an NP-protocol of cost k(n) that solves the function f = ⋁_{i=1}^{2^{k(n)}} r_i(x, y) (as well as any partial g that is consistent with f on g⁻¹(0) ∪ g⁻¹(1)).
We denote by NP the class of functions solved by NP-protocols of cost at most poly-log(n). If for every input distribution µ_n there exists an NP-protocol of cost at most k(n) that solves f with error at most 1/3, then we say that the AM-complexity of f, denoted by AM(f), is at most k(n). We denote by AM the class of functions whose AM-complexity is at most poly-log(n).

As we mentioned already, AM is a very strong model of communication, for which we currently do not have any non-trivial lower bound. All the following classes can be viewed (and some of them have been defined) as "weaker forms" of AM: for all of them we already have strong lower bounds.
We call Merlin-Arthur (MA) the class of functions whose MA-complexity is at most poly-log(n).
Note that "∨" of partial functions is defined as in (2.1).
If for input distribution µ_n and some α > 0 there exists a P-protocol Π of cost at most k(n) such that

Pr_µ[Π(X, Y) = 1 | f(X, Y) = 1] ≥ α and Pr_µ[Π(X, Y) = 1 | f(X, Y) = 0] ≤ α/2,

then we call Π an SBP-protocol for f with respect to µ_n. The complexity of this protocol is k(n) + log(1/α) (note that the value of α may depend on both n and µ_n).
If with respect to every µ_n there exists an SBP-protocol for f of cost at most k(n), then we say that the SBP-complexity of f, denoted by SBP(f), is at most k(n).
We denote by SBP the class of functions whose SBP-complexity is at most poly-log(n).
It was shown in [8, 4] that MA ⊆ SBP ⊆ AM, in [14] that SBP ≠ AM and in [9] that SBP ≠ MA. Therefore,

MA ⊂ SBP ⊂ AM.

The following complexity measure is a core methodological notion for this work. We say that the protocol Π
• has layer complexity l if every (x, y) ∈ {0, 1}ⁿ × {0, 1}ⁿ belongs to at most l rectangles of Π;
• has 0-layer complexity l₀ if every (x, y) ∈ f⁻¹(0) belongs to at most l₀ rectangles of Π;
• has 1-layer complexity l₁ if every (x, y) ∈ f⁻¹(1) belongs to at most l₁ rectangles of Π.
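For a small worked example (illustrative only; names and parameters are ours), one can compute these three quantities directly for a rectangle cover of the two-bit OR function:

```python
from itertools import product

def layer_complexities(f, rects, rows, cols):
    """Return (l, l0, l1): the maximum number of rectangles of the cover
    containing any input pair, any 0-input, and any 1-input, respectively."""
    l = l0 = l1 = 0
    for x, y in product(rows, cols):
        c = sum(1 for (A, B) in rects if x in A and y in B)
        l = max(l, c)
        if f[x][y] == 0:
            l0 = max(l0, c)
        else:
            l1 = max(l1, c)
    return l, l0, l1

# OR of two bits, covered by the rectangles {1} x {0,1} and {0,1} x {1};
# the pair (1, 1) lies in both, so the cover is not unambiguous.
rows = cols = (0, 1)
OR = [[0, 1], [1, 1]]
rects = [({1}, {0, 1}), ({0, 1}, {1})]
assert layer_complexities(OR, rects, rows, cols) == (2, 0, 2)
```

Here the 0-layer complexity is 0 (the single 0-input is covered by no rectangle) while the 1-layer complexity is 2, so this cover would not qualify as "unambiguous" in the sense discussed below.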
The concept of layer complexity in the context of nondeterministic communication is very natural and not new, dating back at least to [12] by Karchmer, Newman, Saks and Wigderson. We will use it extensively in order to analyse some previously known subclasses of AM with strong lower bounds and to define some new ones.
The following two classes were introduced quite recently by Göös, Pitassi and Watson [10].
If for some constant ε < 1/2 and every input distribution µ_n there exists an NP-protocol of cost at most k(n) and 1-layer complexity 1 that solves f with error at most ε, then we say that the UAM-complexity of f, denoted by UAM(f), is at most k(n).
We denote by UAM the class of functions whose UAM-complexity is at most poly-log(n).
Definition 2.11 (Unambiguous Arthur-Merlin with perfect completeness, UAM_compl). If for every input distribution µ_n there exists an NP-protocol of cost at most k(n) and 1-layer complexity 1 that solves f with perfect completeness (that is, it accepts every (x, y) ∈ f⁻¹(1)) and soundness error at most 1/2, then we say that the UAM_compl-complexity of f, denoted by UAM_compl(f), is at most k(n). We denote by UAM_compl the class of functions whose UAM_compl-complexity is at most poly-log(n).
The classes AM, MA, UAM compl and UAM can be defined in an alternative, more "narrative" way, where an almighty prover Merlin interacts with a limited verifier Arthur (who, in turn, is a two-headed union of the players Alice and Bob). In the cases of AM, UAM compl and UAM these variants correspond to the closures with respect to mixed strategies (that are equivalent to our definitions, as mentioned earlier).
Note that the error parameter in the definitions of AM and UAM_compl is fixed without loss of generality, while for UAM it may be any constant ε < 1/2. In the first two cases the error can be trivially reduced to an arbitrary constant by repeating the protocol a constant number of times; on the other hand, in the case of UAM the possibility of efficient error reduction is not known, so fixing a specific ε might result in weakening the model.

It was shown in [10] that NP ⊄ UAM. They also showed that UAM ⊄ SBP held in the context of query complexity; later in [9] this separation was generalised to the case of communication complexity, thus implying that UAM and SBP are incomparable:

UAM ⊄ SBP and SBP ⊄ UAM.

On the other hand, UAM and SBP are the strongest previously known communication complexity classes contained in AM with non-trivial lower bounds, which makes it interesting to look for their "natural merge" and try to prove good lower bounds there. That will be the quest of the next section.
3 Layered Arthur-Merlin: getting as close to AM as we can

Let us try to construct as strong a communication model "under AM" as we can analyse.
We start by considering several slightly stronger modifications of MA that will emphasise the intuition behind the main definitions that will follow.
If for some k(n) and t(n) there exist functions h₁, . . . , h_t such that f = ⋁_{i∈[t]} h_i and BPP_{1/3t²}(h_i) ≤ k(n) for every i ∈ [t], then we call ⟨h_i | i ∈ [t]⟩ an MA′-protocol for f. We address the value t as the layer complexity of this protocol.

Observe that MA ⊆ MA′ always holds: this follows from the definitions together with the well-known fact that for every function h and ε > 0, BPP_ε(h) ∈ O(BPP(h) · log(1/ε)). So, MA′ is the class of functions whose MA′-complexity is at most poly-log(n).
Suppose that MA′(f) ≤ k(n): what does it imply with respect to a known input distribution µ? In this case for every h_i there is a P-protocol of cost at most k(n) that computes a function g_i such that Pr_µ[h_i(X, Y) ≠ g_i(X, Y)] ≤ 1/(3t²(n)); accordingly, the union bound gives

Pr_µ[(g₁ ∨ · · · ∨ g_t)(X, Y) ≠ f(X, Y)] ≤ t(n) · 1/(3t²(n)) = 1/(3t(n)).
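The arithmetic of this union bound is easy to check numerically (a throwaway sanity check of ours; the parameter t = 8 is an arbitrary choice):

```python
t = 8
per_layer_err = 1 / (3 * t * t)   # error of each g_i against h_i under mu

# Union bound over the t layers:
union_bound = t * per_layer_err
assert union_bound == 1 / (3 * t)

# Even if the t disagreement events were independent, the exact failure
# probability 1 - (1 - p)^t never exceeds the union bound t * p:
exact_independent = 1 - (1 - per_layer_err) ** t
assert exact_independent <= union_bound
```

The per-layer error budget 1/(3t²) is chosen exactly so that t layers together stay within the 1/(3t) soundness target used below.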
What can we say about a communication complexity class that only requires that the above holds for every µ: in particular, what will be its relation to MA′? Let us define it.
For some k(n): if for every input distribution µ_n there exists an MA′-protocol of cost k(n) for f with respect to µ_n, then we say that the MA′′-complexity of f, denoted by MA′′(f), is at most k(n).
We denote by MA′′ the class of functions whose MA′′-complexity is at most poly-log(n).
Note that

MA ⊆ MA′ ⊆ MA′′

follows from the definitions and the previous discussion. Towards our goal to construct a communication model under AM as strong as we can analyse, let us look at UAM_compl: together with MA these are, arguably, the two most natural (though not the strongest) "sub-AM" models for which we have good lower bounds. Conceptually, the insightful lower bounds given by Klauck [13] for MA and by Göös, Pitassi and Watson [10] for UAM_compl can be viewed as two different approaches to analysing strong "sub-AM" models of communication complexity.
On the one hand, the more recently defined UAM compl has at least one important "AM-like" property that MA lacks: AM puts no limitations on the layer complexity of protocols; MA limits the number of "layers" over any input pair; UAM compl and UAM only limit the 1-layer complexity (that is, they let the 0-layer complexity of a protocol be arbitrary, like AM and unlike MA). This difference seems to be rather crucial: • While the lower-bound argument of [13] against MA can be generalised to work against a communication model that would limit only the 0-layer complexity of a protocol, it does not seem to go through if only the 1-layer complexity is limited.
• If we consider the natural (and the most common) situation when the target function is balanced with respect to its "hard" distribution, which is the case, for instance, for all functions with low discrepancy, then the "expected density" of the protocol's rectangles over the points in the (erroneously) accepted ε-fraction of f⁻¹(0) would be much higher than the density in the (rightly) accepted majority of f⁻¹(1). In other words, the expected number of the protocol's rectangles that an accepted (x, y) ∈ f⁻¹(0) belongs to would be considerably higher than the analogous value for (x, y) ∈ f⁻¹(1). Accordingly, limiting only the 1-layer complexity feels like a weaker restriction (i.e., one resulting in a stronger model) than limiting only the 0-layer complexity (or both).
On the other hand, even though the classes UAM_compl and UAM limit only the 1-layer complexity of a protocol, the actual quantitative limitation that they impose is way too strong: it is 1, as opposed to the quasi-polynomial limitation on the (total) layer complexity of MA′′ (as emphasised by Definition 3.2). For instance, it has been shown in [10] that NP ⊄ UAM (note that NP ⊆ MA and UAM_compl ⊆ UAM). To include NP, an "NP-like" class must allow protocols with super-constant 1-layer complexity.
On the technical level, comparing the definitions of MA′′ (Definition 3.2) and of UAM_compl (Definition 2.11), we can see that in both cases the membership of a function f implies the existence of a family of rectangles whose union approximates f, that is, the existence of good NP-approximations of f:
• if f ∈ MA′′, then for some t(n) ∈ N and every input distribution µ_n there exists an NP-protocol of cost at most poly-log(n) and layer complexity at most t(n) that solves f with error at most 1/(3t(n)) with respect to µ_n;
• if f ∈ UAM_compl, then for every input distribution µ_n there exists an NP-protocol of cost at most poly-log(n) and 1-layer complexity 1 that solves f with perfect completeness and soundness error at most 1/2 with respect to µ_n.
Note that the above membership condition of UAM_compl is sufficient, while that of MA′′ is just necessary. Let us use this intuition to define a new communication complexity class that includes both UAM_compl and MA′′. If for input distribution µ_n there exists an NP-protocol Π of 1-layer complexity t that solves f with completeness error at most 1/3 and soundness error at most 1/(3t), then we call Π a LAM-protocol for f with respect to µ_n. If Π contains K rectangles, then its complexity is log(K).
If with respect to every µ n there exists a LAM-protocol for f of cost at most k(n), then we say that the LAM-complexity of f , denoted by LAM( f ), is at most k(n).
We denote by LAM the class of functions whose LAM-complexity is at most poly-log(n).
It follows readily from the previous discussion that MA′′ ∪ UAM_compl ⊆ LAM. To make the model somewhat stronger and to simplify its definition, we have granted LAM a few additional relaxations (not needed in order to include MA′′ and UAM_compl): most significantly, in LAM the layer complexity bound t can be chosen per distribution µ_n, and it does not have to be error-independent, unlike in the cases of both MA′′ and UAM_compl (for the latter it equals 1).
Let us further strengthen the model, so that the corresponding complexity class would include all previously known subclasses of AM with strong lower bounds. The following definition can be viewed as LAM with relaxed accuracy requirements. If for input distribution µ_n and some α > 0 there exists an NP-protocol Π of 1-layer complexity t such that

Pr_µ[Π(X, Y) = 1 | f(X, Y) = 1] ≥ α and Pr_µ[Π(X, Y) = 1 | f(X, Y) = 0] ≤ α/(3t),

then we call Π a SLAM-protocol for f with respect to µ_n. If Π contains K rectangles, then its complexity is log(K/α) (the value of α may depend on both n and µ_n). If with respect to every µ_n there exists a SLAM-protocol for f of cost at most k(n), then we say that the SLAM-complexity of f, denoted by SLAM(f), is at most k(n).
We denote by SLAM the class of functions whose SLAM-complexity is at most poly-log(n).
As any LAM-protocol of cost k is also a SLAM-protocol of cost at most k + O(1), SLAM(f) ∈ O(LAM(f)) holds for all f. Later (Section 3.2) we will see that SLAM indeed is a proper subclass of AM that includes all previously known (as far as we are aware) subclasses of AM with strong lower bounds:

NP, MA, MA′′, UAM_compl, LAM, UAM, SBP ⊆ SLAM ⊂ AM;

moreover, it is strictly stronger than their union: UAM ∪ SBP ⊂ SLAM.

Limitations of LAM and SLAM
Let us see that the SLAM-complexity is subject to the discrepancy bound. The discrepancy of f with respect to µ_n is defined as

disc_µ(f) = max_r | Pr_µ[f(X, Y) = 1 and (X, Y) ∈ r] − Pr_µ[f(X, Y) = 0 and (X, Y) ∈ r] |,

where r ranges over the combinatorial rectangles over {0, 1}ⁿ × {0, 1}ⁿ; disc(f) denotes the minimum of disc_µ(f) over all input distributions µ.
That is, SLAM(f) ∈ Ω(log(1/disc(f))); in particular, SLAM ⊆ PP, where PP is the class consisting of functions with high discrepancy. Along with the other mentioned properties, this implies SLAM ⊂ AM, as AM ⊄ PP is known [14].
To prove the theorem we will use the following combinatorial lemma.
Let µ be a distribution on W such that the displayed condition holds for some λ > 0. 8 Then for any γ > λ there exists J ⊆ [m] of size at most k = log_{β+1}(2γ/λ), such that for C_J = ⋂_{j∈J} C_j the conclusions (3.1) and (3.2) hold.

Informally, the lemma says that for any family of sets C₁, . . . , C_m there exists C_J = ⋂_{j∈J} C_j that "highlights" the points that are contained in more than a certain threshold number of the sets C_i; moreover, the fraction of such points of W that end up in C_J ∩ W is not too small.
Proof. We will find a set C_{i₀} such that µ(C_{i₀} ∩ W₁) is not too small and, at the same time, the ratio µ(C_{i₀} ∩ W₁)/µ(C_{i₀} ∩ W₀) is significantly larger than µ(W₁)/µ(W₀). The result will follow by induction on t.
The first part of the argument is the same for the base case (t = 1) and the inductive step (t ≥ 2). Let C_i(·) denote the characteristic function of C_i. Here we let x/0 > y hold for any x, y > 0.
That is, (3.3) and (3.4) hold. At this point we check whether letting J = {i₀} would satisfy the statement of the lemma. Assume that it would not; as γ > λ implies k ≥ 1, this necessarily means that (3.3) is insufficient to guarantee (3.1). In other words, the latter inequality, which is the contrapositive of (3.1) with respect to J = {i₀}, holds, and therefore the corresponding bound holds for all j ≠ i₀. Note that C_{i₀} ⊆ W₀.

How we continue from here depends on the value of t. First suppose that t = 1 (the base case of the induction); as t = 1, the resulting set satisfies (3.1) (according to Footnote 8). This finishes the proof of the base case.

Now suppose t ≥ 2. That is, we are inside the inductive step, so let us apply Lemma 3.8 inductively to the family {C_j : j ≠ i₀} with modified parameters, where the last equality follows from (3.3). Note that λ′ < γ′ follows from (3.4). The lemma guarantees the existence of a (non-empty) J′ ⊆ [m] \ {i₀} of size at most k − 1 (where the inequality follows from β > β′), such that for C_{J′} = ⋂_{j∈J′} C_j the required bound holds, where the last inequality follows from (3.2). Letting J = J′ ∪ {i₀} finishes the proof.
We are ready to prove the lower bound.
Proof. The argument is as follows. Recall that the core advantage of the models LAM and SLAM over MA is allowing arbitrarily high 0-layer complexity in efficient protocols: if the 0-layer complexity of a given protocol Π is not above its 1-layer complexity, then Klauck's argument for MA limits Π's strength.
The case when the 0-layer complexity is high was the main challenge of this work and the reason why it was written. Here we further assume that the average 0-layer complexity of Π is noticeably higher than its average 1-layer complexity (the other cases can be handled by a straightforward adaptation of Klauck's argument). The layer complexity measures the density of Π's rectangles, and the assumed difference in the average densities between f⁻¹(0) and f⁻¹(1) implies that the membership of a pair of random variables (X, Y) in "too many" rectangles makes the event [f(X, Y) = 0] more likely. Lemma 3.8 gives us a "not too small" rectangle intersection (therefore a rectangle by itself) where many elements belong to many of the original rectangles. The discrepancy assumption applied to this new rectangle concludes the argument.
Let µ be a distribution that achieves disc(f) = disc_µ(f), and let Π be a SLAM-protocol for f with respect to µ of cost k(n) + log(1/α(n)) that accepts an α(n)-fraction of the elements of f⁻¹(1), and whose 1-layer complexity is t(n). By the definition of SLAM, the soundness error of Π in solving f with respect to µ is at most α(n)/(3t(n)). By the definition of disc_µ (and the fact that {0, 1}ⁿ × {0, 1}ⁿ is a rectangle),

| Pr_µ[f(X, Y) = 1] − Pr_µ[f(X, Y) = 0] | ≤ disc(f).
Let l₀ᵃᵛ(n) denote the average 0-layer complexity of Π, where "Π⁻¹(1)" denotes the set of input pairs accepted by Π. Fix n ∈ N. We will consider two cases, distinguished by the value of l₀ᵃᵛ(n).

First suppose that l₀ᵃᵛ(n) is low. Then, where the last inequality follows from (3.6), the first from (3.5) and (3.7), for some r₀ ∈ Π it holds that k(n) + log(1/α(n)) ∈ Ω(log(1/disc(f))), as required.

Now suppose that l₀ᵃᵛ(n) is high. Let us see that µ(A) cannot be too small. On the other hand, the µ-weight of the largest rectangle of Π is, due to (3.6), at least α(n)/2^{k(n)}, and the relative µ-weight of f⁻¹(0) in it is significant. Assuming disc(f) ≤ 1/2 (otherwise the desired statement holds trivially), we get the required bound, as Π accepts, with respect to µ, an α(n)-fraction of f⁻¹(1) and a smaller fraction of f⁻¹(0).
Note that as follows from their definitions and the fact that the 1-layer complexity of Π is t(n).
Let us make use of the difference in the "rectangle density" of A and B via applying Lemma 3.8, with the parameters as follows from (3.8), (3.9) and the statement of the lemma. As s ⊆ Π⁻¹(1), it follows from (3.10) that s is unbalanced towards f⁻¹(0); since s is an intersection of rectangles and therefore a rectangle itself, the discrepancy bound applies, as required.

So, SLAM ⊆ PP, as PP is the class consisting of functions with high discrepancy, and AM ⊄ PP is known [14]. It remains to see that SBP ∪ UAM ⊂ SLAM ⊆ AM, which will be implied by the upcoming Claims 3.9 and 3.13.
Claim 3.9. For any bipartite Boolean function f it holds that AM(f) ∈ O(SLAM(f) + log n).

Proof. The proof combines the "randomness sparsification" method of Goldwasser and Sipser [8] with NP-witnessing.
Assume SLAM(f) = k(n). Then by Von Neumann's minimax principle [16] there exists a family Π = {h₁, . . . , h_m} for some m ∈ N, where every h_i is a bipartite Boolean function computable by an NP-protocol of cost at most k(n), such that the SLAM-acceptance condition holds for some α ≥ 2^{−k(n)}. By a standard Bernstein-type concentration argument (e.g., [7], Lemma 1, Hoeffding inequality), there exists l ∈ O(n/α) ⊆ O(2^{k(n)+log n}) such that the condition is approximately preserved for some Π′ ⊆ Π of size l, where we have assumed without loss of generality that Π′ = {h₁, . . . , h_l}. By another application of the Hoeffding inequality, for some s ∈ O(n + 1/α) ⊆ O(2^{k(n)+log n}) and a uniformly random function g : [l] → [s] it holds with positive probability that (3.11) is satisfied. Fix any such g. Consider the following AM-protocol (described below in a distribution-free regime, which is the dual equivalent of Definition 2.6).
• The players pick a uniformly random Z ∈ [s] and send it to Merlin.
• Merlin responds with some i ∈ [l] and a witness w.
• The players accept if and only if h_i(X, Y) = 1 and g(i) = Z, where the former is witnessed by w (recall that NP(h_i) ≤ k(n)).
By (3.11), this is an AM-protocol for f with error at most 2/5; repeating it several times and taking the majority vote brings the error bound to at most 1/3. The cost of the resulting protocol is in O (k(n) + log n), as required.
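The role of the random function g can be seen in a toy simulation (the parameters l, s and the seed below are arbitrary choices of ours, not the ones dictated by the proof):

```python
import random
random.seed(1)

# Sketch of the sparsification step: l candidate witness functions, a random
# hash g: [l] -> [s], and acceptance iff Merlin can point to some accepting
# h_i whose hash matches the players' random challenge Z.
l, s = 40, 64
g = [random.randrange(s) for _ in range(l)]

def accept_prob(accepting_indices):
    """Pr over a uniform Z in [s] of: exists i with h_i accepting and g(i) == Z."""
    hashes = {g[i] for i in accepting_indices}
    return len(hashes) / s

# With many accepting h_i the matching probability is large; with none it is
# zero. This gap is what the majority-vote repetition amplifies.
assert accept_prob(range(l)) > accept_prob(range(3))
assert accept_prob([]) == 0.0
```

The point of hashing into [s] is that Merlin's freedom to choose i no longer helps him on 0-inputs: few accepting h_i means few hash values that can match the challenge Z.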
To see that UAM ∪ SBP ⊂ SLAM, we prove a somewhat stronger separation: LAM ⊄ UAM ∪ SBP. For that we will use several results from [10, 9]. The following partial function has been used to show that UAM_compl ⊄ SBP ([10, 9]).
Claim 3.13. For any n ∈ N such that Gut-IP_n is defined, let f((x₁, x₂), (y₁, y₂)) = ¬Disj(x₁, y₁) ∧ Gut-IP_n(x₂, y₂). Then LAM(f) ∈ O(log² n), while UAM(f) ∈ Ω(n) and SBP(f) ∈ Ω(n^{1/4} · log^{3/4} n).

Proof. Consider an input distribution µ that fixes X₁ = Y₁ = 1ⁿ and makes the pair (X₂, Y₂) come from a hard distribution for Gut-IP_n(X₂, Y₂); then any SBP-protocol that solves f with respect to µ must have complexity Ω(n^{1/4} · log^{3/4} n), according to Fact 3.12. Similarly, a distribution that fixes (X₂, Y₂) ∈ Gut-IP_n⁻¹(1) arbitrarily and makes Disj(X₁, Y₁) hard for UAM witnesses that UAM(f) ∈ Ω(n), according to Fact 3.10.
To see that $\mathsf{LAM}(f) \in O(\log^2 n)$, let $\mu$ be any input distribution for $f$ and let $\mu'$ be the marginal distribution of $(X_2, Y_2)$ when $((X_1, X_2), (Y_1, Y_2)) \sim \mu$. Consider a $\mathsf{UAM}^{compl}$-protocol $\Pi$ of complexity $O(\log n)$ that solves $\text{Gut-IP}_n$ with perfect completeness and soundness error at most $\frac{1}{2}$ with respect to $\mu'$, and let $\Pi'$ be its amplified version of complexity $O(\log^2 n)$ that solves $\text{Gut-IP}_n$ with soundness error at most $\frac{1}{3n}$ with respect to $\mu'$. Let $\Pi''((X_1, X_2), (Y_1, Y_2))$ be a nondeterministic protocol for $f$ that does the following:
• emulates the behaviour of $\Pi'(X_2, Y_2)$;
• runs the trivial NP-protocol for $\neg\text{Disj}(X_1, Y_1)$;
• accepts if and only if the two steps above have accepted.
The complexity of $\Pi''$ is in $O(\log^2 n)$.
Since an NP-protocol for $\neg\text{Disj}$ is exact (though nondeterministic), an error can come only from the first step; since $\Pi'$ has perfect completeness, so does $\Pi''$. The soundness error of $\Pi''$ in solving $f$ with respect to $\mu$ equals that of $\Pi'$ in solving $\text{Gut-IP}_n$ with respect to $\mu'$, which is at most $\frac{1}{3n}$. Since $\Pi'$ has 1-layer complexity 1, the 1-layer complexity of $\Pi''$ equals that of the NP-protocol for $\neg\text{Disj}$, which is $n$. So, $\Pi''$ is a valid LAM-protocol for $f$ with respect to $\mu$, as required.
4 On proving super-$\sqrt{n}$ lower bounds against MA

When Klauck [13] showed that $\mathsf{MA}(\text{Disj}), \mathsf{MA}(\text{IP}) \in \Omega(\sqrt{n})$, many believed that the actual MA-complexity of these problems was in $\Omega(n)$. So, it came as a surprise when Aaronson and Wigderson [1] demonstrated MA-protocols for Disj and IP of cost $O(\sqrt{n} \log n)$, later improved by Chen [6] to $O\big(\sqrt{n \log n \log\log n}\big)$. That highlighted the importance of proving the "ultimate" lower bound of $\Omega(n)$ for the MA-complexity of any explicit communication problem.
We will define a communication model $\widetilde{\mathsf{MA}}$ (Definition 4.1) that can be viewed as "non-uniform MA." Non-uniformity is the only possible source of advantage of $\widetilde{\mathsf{MA}}$ over MA: we will see (Claim 4.2) that imposing the "uniformity constraint" on $\widetilde{\mathsf{MA}}$-protocols makes them no stronger than MA-protocols. All known lower bounds on $\mathsf{MA}(f)$ readily translate to $\widetilde{\mathsf{MA}}(f)$. Intuitively, a lower-bound argument that exploits the uniformity of MA (as opposed to $\widetilde{\mathsf{MA}}$) must have a very unusual structure.
We will see (Theorem 4.3) that for any $f$ it holds that $\widetilde{\mathsf{MA}}(f) \in O\big(\sqrt{n \cdot \mathsf{AM}(f)}\big)$; in other words, any lower bound of the form $\widetilde{\mathsf{MA}}(f) \in \omega(\sqrt{n})$ will have non-trivial consequences for $\mathsf{AM}(f)$.¹⁰ Furthermore, according to Claim 4.2, any lower bound of the form $\mathsf{MA}(f) \in \omega(\sqrt{n})$ either should exploit the uniformity of MA (the only difference between MA and $\widetilde{\mathsf{MA}}$), or it will have non-trivial consequences for $\mathsf{AM}(f)$. This partially explains why no such lower bound has been found yet.

Definition 4.1. If for some $k(n)$, every input distribution $\mu_n$ and every $\varepsilon > 0$ there are functions $h_1, \ldots, h_{2^{k(n)}}$ satisfying condition (4.2), then we say that the $\widetilde{\mathsf{MA}}$-complexity of $f$, denoted by $\widetilde{\mathsf{MA}}(f)$, is in $O(k(n))$.
The intuition behind the above formulation is the following.¹¹ Any MA-protocol allows for error reduction at the cost of (at most) a multiplicative factor of $O\big(\log\frac{1}{\varepsilon}\big)$, where $\varepsilon$ is the "target error." It has been intuitively clear that this property of MA is important: in particular, it was used by Klauck [13] to prove his lower bound on the MA-complexity. The concept of $\widetilde{\mathsf{MA}}$-complexity isolates this property, effectively allowing for its direct analysis, which is the main subject of this part of our work.
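To make the multiplicative $O\big(\log\frac{1}{\varepsilon}\big)$ overhead concrete, here is the standard Hoeffding-style calculation behind it (a sketch with illustrative constants, not taken from the paper):

```latex
% Majority vote over t independent runs of a protocol with error 1/3.
% The vote errs only if at least t/2 runs err, i.e., the number of
% faulty runs exceeds its mean t/3 by at least t/6. By Hoeffding,
\Pr[\text{majority errs}]
  \;\le\; \exp\!\bigl(-2t\cdot(1/6)^2\bigr)
  \;=\;   e^{-t/18},
% so t = \lceil 18\ln(1/\varepsilon) \rceil repetitions bring the error
% below \varepsilon, a multiplicative overhead of O(\log(1/\varepsilon)).
```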
First of all, let us see that non-uniformity is the only possible source of advantage of $\widetilde{\mathsf{MA}}$ over MA.
Then the MA-complexity of $f$ is in $O(k(n))$.

¹⁰ It is not too hard to demonstrate $\mathsf{AM}(f) \in \Omega(\log n)$ for an explicit $f$: for example, it holds for the so-called index function $\mathrm{Ind}(x, i) \stackrel{\mathrm{def}}{=} x_i$; however, it is not clear how to use such examples to obtain $\mathsf{MA}(f) \in \Omega\big(\sqrt{n \log n}\big)$.
¹¹ The author thanks the anonymous referee whose comment has resulted in the appearance of this paragraph.
Note that the functions $g_1, \ldots, g_{2^{k(n)}}$ are fixed (for every $n$); in particular, they do not depend on $\mu_n$ or $\varepsilon$. The statement says that in order to become sufficient for MA, the definition of $\widetilde{\mathsf{MA}}$ should be restricted by the additional requirement that all the $h_i$ are approximations of the corresponding $g_i$. That is why we view $\widetilde{\mathsf{MA}}$ as a non-uniform modification of MA.
Proof. Assume $\widetilde{\mathsf{MA}}(f) \in O(k(n))$. For every input distribution $\mu$ and $\varepsilon > 0$, let $h_i^{\mu,\varepsilon}$ denote the function $h_i$ corresponding to these $\mu$ and $\varepsilon$ from the definition of $\widetilde{\mathsf{MA}}(f)$.
Next we claim that a super-$\sqrt{n}$ lower bound on $\widetilde{\mathsf{MA}}(f)$ would have non-trivial consequences for $\mathsf{AM}(f)$.

Proof. Let $\mathsf{AM}(f) = k(n)$; that is, for every input distribution $\nu$ there is an NP-protocol of cost at most $k(n)$ that solves $f$ with error at most $\frac{1}{3}$ with respect to $\nu$. Via the standard accuracy-amplification technique this implies that for any input distribution $\nu$ and $\varepsilon > 0$ there is an NP-protocol of cost $O\big(k(n) \cdot \log\frac{1}{\varepsilon}\big)$ that solves $f$ with error at most $\varepsilon$ with respect to $\nu$. In particular, for every input distribution $\nu$ there is an NP-protocol $\Pi_\nu$ of cost $O\big(\sqrt{n \cdot k(n)}\big)$ that solves $f$ with error at most $2^{-\sqrt{n/k(n)}}$ with respect to $\nu$.
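The error bound $2^{-\sqrt{n/k(n)}}$ is exactly the balancing choice for the amplification cost: substituting $\varepsilon = 2^{-\sqrt{n/k(n)}}$ into the $O\big(k(n) \cdot \log\frac{1}{\varepsilon}\big)$ bound gives

```latex
k(n)\cdot\log\frac{1}{\varepsilon}
  \;=\; k(n)\cdot\sqrt{\frac{n}{k(n)}}
  \;=\; \sqrt{\frac{k(n)^2\, n}{k(n)}}
  \;=\; \sqrt{n\cdot k(n)}.
```

Any smaller $\varepsilon$ would push the amplification cost above $O\big(\sqrt{n \cdot k(n)}\big)$, which is why errors below this threshold are handled by a separate case.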
Let us see that $\widetilde{\mathsf{MA}}(f) \in O\big(\sqrt{n \cdot k(n)}\big)$. For $n \in \mathbb{N}$, take any input distribution $\mu_n$ and any $\varepsilon > 0$. If $\varepsilon \le 2^{-\sqrt{n/k(n)}}$, then let $h_1 = f$: as its P-complexity is at most $n \in O\big(\sqrt{n \cdot k(n)} \cdot \log\frac{1}{\varepsilon}\big)$, this "decomposition" satisfies the requirements of Definition 4.1 with respect to (4.2). Now suppose that $\varepsilon > 2^{-\sqrt{n/k(n)}}$. Then $\Pi_\mu$ is an NP-protocol of cost $O\big(\sqrt{n \cdot k(n)}\big)$ that solves $f$ with error less than $\varepsilon$ with respect to $\mu$. Let $K \in 2^{O(\sqrt{n \cdot k(n)})}$ be the number of rectangles contained in $\Pi_\mu$, and denote their characteristic functions by $h_1, \ldots, h_K$. As the P-complexity of every such $h_i$ is 1 and
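The inequality behind the first branch of this case analysis can be checked numerically; the sketch below uses illustrative values of $n$ and $k$ (not from the paper) and verifies that $n \le \sqrt{n \cdot k} \cdot \log_2\frac{1}{\varepsilon}$ whenever $\varepsilon \le 2^{-\sqrt{n/k}}$.

```python
from math import log2, sqrt

def branch_one_ok(n: int, k: int, eps: float) -> bool:
    """In the case eps <= 2^{-sqrt(n/k)}, the P-complexity n of f itself
    must fit into the cost budget sqrt(n*k) * log2(1/eps)."""
    assert eps <= 2 ** (-sqrt(n / k))  # we are in the first branch
    return n <= sqrt(n * k) * log2(1 / eps)

# Illustrative parameters: n = 10000, k = 16, so sqrt(n/k) = 25
# and the threshold error is 2^{-25}.
for eps in (2**-25, 2**-30, 2**-100):
    assert branch_one_ok(10000, 16, eps)
```

At the threshold $\varepsilon = 2^{-\sqrt{n/k}}$ the budget is exactly $\sqrt{nk} \cdot \sqrt{n/k} = n$, and smaller $\varepsilon$ only increases it.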

Conclusions
Among those communication complexity regimes that reside well beyond our current level of understanding, the model of Arthur-Merlin (AM) may be the closest to us. The motivation of this work has been to explore the "neighbourhood" of AM that we might be able to analyse.
• We have defined and analysed a new communication complexity class, SLAM, strictly included in AM and strictly stronger than the union of all previously known subclasses of AM.
• We have identified one possible source of hardness in proving $\omega(\sqrt{n})$ lower bounds against MA: such a bound would either be of a "very special form," or imply a non-trivial lower bound against AM.
A few questions that have remained open can be viewed as possible further steps towards understanding AM. For instance:
• What is the SLAM-complexity of Disj? Note that even its UAM-complexity is not understood yet (see [10] for details).
• Can we prove a lower bound of $\Omega\big(\sqrt{n \log n}\big)$ on the MA-complexity of an explicit function (see Footnote 10)?
• What approaches to understanding AM look promising?
-Shall we try hard to prove a lower bound of $n^{1/2+\Omega(1)}$ on the MA-complexity of an explicit function?
-Are there complexity classes inside AM with non-trivial advantage over SLAM (or incomparable to it), which we can analyse?
Finally, we would like to mention a result that is conceptually somewhat dual to this work: Bouland, Chen, Holden, Thaler and Vasudevan [5] define communication complexity classes $\mathsf{NISZK}^{cc}$ and $\mathsf{SZK}^{cc}$, and show that $\mathsf{NISZK}^{cc} \subseteq \mathsf{SZK}^{cc} \subseteq \mathsf{AM}$ and $\mathsf{NISZK}^{cc} \not\subseteq \mathsf{UPP}$, where UPP is the class consisting of functions with low sign-rank; UPP is known to strictly contain PP. Accordingly, the quest of understanding AM is at least as hard as that of understanding $\mathsf{NISZK}^{cc}$, and the latter might be simpler if $\mathsf{NISZK}^{cc} \subset \mathsf{AM}$.

ABOUT THE AUTHOR
In the good old days DMITRY GAVINSKY studied at the Technion – Israel Institute of Technology and at the University of Calgary. Thanks to the support of his scientific advisers Nader Bshouty, Richard Cleve and John Watrous, he graduated from both institutions, and now he lives and works in the best city in the world. There he enjoys the best beer and loves good music and good books.