Separation of AC$^0[\oplus]$ Formulas and Circuits

This paper gives the first separation between the power of {\em formulas} and {\em circuits} of equal depth in the $\mathrm{AC}^0[\oplus]$ basis (unbounded fan-in AND, OR, NOT and MOD$_2$ gates). We show, for all $d(n) \le O(\frac{\log n}{\log\log n})$, that there exist {\em polynomial-size depth-$d$ circuits} that are not equivalent to {\em depth-$d$ formulas of size $n^{o(d)}$} (moreover, this is optimal in that $n^{o(d)}$ cannot be improved to $n^{O(d)}$). This result is obtained by a combination of new lower and upper bounds for {\em Approximate Majorities}, the class of Boolean functions $\{0,1\}^n \to \{0,1\}$ that agree with the Majority function on $3/4$ fraction of inputs. $\mathrm{AC}^0[\oplus]$ formula lower bound: We show that every depth-$d$ $\mathrm{AC}^0[\oplus]$ formula of size $s$ has a {\em $1/8$-error polynomial approximation} over $\mathbb{F}_2$ of degree $O(\frac{1}{d}\log s)^{d-1}$. This strengthens a classic $O(\log s)^{d-1}$ degree approximation for \underline{circuits} due to Razborov. Since the Majority function has approximate degree $\Theta(\sqrt n)$, this result implies an $\exp(\Omega(dn^{1/2(d-1)}))$ lower bound on the depth-$d$ $\mathrm{AC}^0[\oplus]$ formula size of all Approximate Majority functions for all $d(n) \le O(\log n)$. Monotone $\mathrm{AC}^0$ circuit upper bound: For all $d(n) \le O(\frac{\log n}{\log\log n})$, we give a randomized construction of depth-$d$ monotone $\mathrm{AC}^0$ circuits (without NOT or MOD$_2$ gates) of size $\exp(O(n^{1/2(d-1)}))$ that compute an Approximate Majority function. This strengthens a construction of \underline{formulas} of size $\exp(O(dn^{1/2(d-1)}))$ due to Amano.


Introduction
The relative power of formulas versus circuits is one of the great mysteries in complexity theory. The central question in this area is whether $\mathrm{NC}^1$ (the class of languages decidable by polynomial-size Boolean formulas) is a proper subclass of $\mathrm{P/poly}$ (the class of languages decidable by polynomial-size Boolean circuits). Despite decades of efforts, this question remains wide open. In the meantime, there has been progress on analogues of the $\mathrm{NC}^1$ vs. $\mathrm{P/poly}$ question in certain restricted settings. For instance, in the monotone basis (with AND and OR gates only), the power of polynomial-size formulas vs. circuits was separated by the classic lower bound of Karchmer and Wigderson [8] (on the monotone formula size of st-Connectivity).
The bounded-depth setting is another natural venue for investigating the question of formulas vs. circuits. Consider the elementary fact that every depth-$d$ circuit of size $s$ is equivalent to a depth-$d$ formula of size at most $s^{d-1}$, where we measure size by the number of gates. This observation is valid with respect to any basis (i.e. set of gate types). In particular, we may consider the $\mathrm{AC}^0$ basis (unbounded fan-in AND, OR, NOT gates) and the $\mathrm{AC}^0[\oplus]$ basis (unbounded fan-in MOD$_2$ gates in addition to AND, OR, NOT gates). With respect to either basis, there is a natural depth-$d$ analogue of the $\mathrm{NC}^1$ vs. $\mathrm{P/poly}$ question (where $d = d(n)$ is a parameter that may depend on $n$), namely whether every language decidable by polynomial-size depth-$d$ circuits is decidable by depth-$d$ formulas of size $n^{o(d)}$ (i.e. better than the trivial $n^{O(d)}$ upper bound).
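The circuit-to-formula simulation can be illustrated on a toy example. The following Python sketch is our own illustration, not taken from the paper: the dictionary encoding of circuits and the helper names `layered_circuit` and `formula_gate_count` are assumptions made for this example. It unfolds a layered circuit into a formula by duplicating shared gates and compares the two gate counts with the $s^{d-1}$ bound.

```python
from functools import lru_cache


def layered_circuit(width, depth):
    """Hypothetical layered circuit: layers 1..depth-1 have `width` gates each,
    the top layer is a single output gate, and every gate reads all gates
    (or inputs) of the layer directly below it."""
    circuit = {}
    below = [f"x{i}" for i in range(width)]  # input variables
    for layer in range(1, depth):
        names = [f"g{layer}_{i}" for i in range(width)]
        for name in names:
            circuit[name] = list(below)
        below = names
    circuit["out"] = list(below)
    return circuit


def formula_gate_count(circuit, output):
    """Number of gates in the formula obtained by unfolding the circuit,
    i.e. by giving every wire its own private copy of the gate it reads."""
    @lru_cache(maxsize=None)
    def count(node):
        if node not in circuit:  # an input variable is not counted as a gate
            return 0
        return 1 + sum(count(child) for child in circuit[node])
    return count(output)


width, depth = 3, 4
circuit = layered_circuit(width, depth)
circuit_size = len(circuit)                        # (depth-1)*width + 1 gates
formula_size = formula_gate_count(circuit, "out")  # 1 + 3 + 9 + 27 gates
print(circuit_size, formula_size, circuit_size ** (depth - 1))  # prints 10 40 1000
```

Sharing is what keeps the circuit small: each layer is built once, whereas the unfolded formula pays for a fresh copy of every subcircuit at every wire.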
It is reasonable to expect that this question could be resolved in the sub-logarithmic depth regime ($d(n) \ll \log n$), given the powerful lower bound techniques against $\mathrm{AC}^0$ circuits (Håstad's Switching Lemma [5]) and $\mathrm{AC}^0[\oplus]$ circuits (the Polynomial Method of Razborov [12] and Smolensky [15]). However, because the standard way of applying these techniques does not distinguish between circuits and formulas, it is not clear how to prove quantitatively stronger lower bounds on formula size vis-a-vis circuit size of a given function. Recent work of Rossman [13] developed a new way of applying Håstad's Switching Lemma to $\mathrm{AC}^0$ formulas, in order to prove an $\exp(\Omega(dn^{1/(d-1)}))$ lower bound on the formula size of the Parity function for all $d \le O(\log n)$. Combined with the well-known $\exp(O(n^{1/(d-1)}))$ upper bound on the circuit size of Parity, this yields an asymptotically optimal separation in the power of depth-$d$ $\mathrm{AC}^0$ formulas vs. circuits for all $d(n) \le O(\frac{\log n}{\log\log n})$, as well as a super-polynomial separation for all $\omega(1) \le d(n) \le o(\log n)$.
In the present paper, we carry out a similar development for formulas vs. circuits in the $\mathrm{AC}^0[\oplus]$ basis, obtaining both (i) an asymptotically optimal separation for all $d(n) \le O(\frac{\log n}{\log\log n})$ and (ii) a super-polynomial separation for all $\omega(1) \le d(n) \le o(\log n)$. Our target functions lie in the class of Approximate Majorities, here defined as Boolean functions $\{0,1\}^n \to \{0,1\}$ that agree with the Majority function on a $3/4$ fraction of inputs. First, we show how to apply the Polynomial Method to obtain better parameters in the approximation of $\mathrm{AC}^0[\oplus]$ formulas by low-degree polynomials over $\mathbb{F}_2$. This leads to an $\exp(\Omega(dn^{1/2(d-1)}))$ lower bound on the $\mathrm{AC}^0[\oplus]$ formula size of all Approximate Majority functions. The other half of our formulas vs. circuits separation comes from an $\exp(O(n^{1/2(d-1)}))$ upper bound on the $\mathrm{AC}^0[\oplus]$ circuit size of some Approximate Majority function. In fact, this upper bound is realized by a randomized construction of monotone $\mathrm{AC}^0$ circuits (without NOT or MOD$_2$ gates). Together these upper and lower bounds give our main result: circuits (in fact, monotone $\mathrm{AC}^0$ circuits) of depth $d$ and size $\mathrm{poly}(n)$ that are not equivalent to any $\mathrm{AC}^0[\oplus]$ formulas of depth $d$ and size $n^{o(d)}$.
Separation (i) is asymptotically optimal, in view of the aforementioned simulation of $\mathrm{poly}(n)$-size depth-$d$ circuits by depth-$d$ formulas of size $n^{O(d)}$. Separation (ii) resembles an analogue of $\mathrm{NC}^1 \ne \mathrm{P/poly}$ (or rather $\mathrm{NC}^1 \ne \mathrm{AC}^1$) within the class $\mathrm{AC}^0[\oplus]$. In fact, extending separation (ii) from depth $o(\log n)$ to depth $\log n$ is equivalent to the separation of $\mathrm{NC}^1$ and $\mathrm{AC}^1$.

Proof outline
Improved polynomial approximation. The lower bound for $\mathrm{AC}^0[\oplus]$ formulas follows the general template due to Razborov [12] on proving lower bounds for $\mathrm{AC}^0[\oplus]$ circuits using low-degree polynomials over $\mathbb{F}_2$. Razborov showed that for any Boolean function $f : \{0,1\}^n \to \{0,1\}$ that has an $\mathrm{AC}^0[\oplus]$ circuit of size $s$ and depth $d$, there is a randomized polynomial $P$ of degree $O(\log s)^{d-1}$ that computes $f$ correctly on each input with probability $7/8$ (we call such polynomials $1/8$-error probabilistic polynomials). By showing that some explicit Boolean function $f$ (e.g. the Majority function or the MOD$_q$ function for $q$ odd) on $n$ variables does not have such an approximation of degree less than $\Omega(\sqrt{n})$ [12,15,16], we get that any $\mathrm{AC}^0[\oplus]$ circuit of depth $d$ computing $f$ must have size $\exp(\Omega(n^{1/2(d-1)}))$.
In this paper, we improve the parameters of Razborov's polynomial approximation from above for $\mathrm{AC}^0[\oplus]$ formulas: every such formula of depth $d$ and size $s$ has a $1/8$-error probabilistic polynomial of degree $O(\frac{1}{d}\log s)^{d-1}$.
We illustrate the idea behind this improved polynomial approximation with the special case of a balanced formula (i.e. all gates have the same fan-in) of fan-in $t$ and depth $d$. Note that the size of the formula (number of gates) is $\Theta(t^{d-1})$ and hence it suffices in this case to show that it has a $1/8$-error probabilistic polynomial of degree $O(\log t)^{d-1}$. We construct the probabilistic polynomial inductively. Given a balanced formula $F$ of depth $d$ and fan-in $t$, let $F_1, \ldots, F_t$ be its subformulas of depth $d-1$. Inductively, each $F_i$ has a $1/8$-error probabilistic polynomial of degree $O(\log t)^{d-2}$ and hence, by a standard error-reduction [10], a $(1/16t)$-error probabilistic polynomial $P_i$ of degree $O(\log t)^{d-1}$ (in particular, at any given input $x \in \{0,1\}^n$, the probability that there exists an $i \in [t]$ such that $P_i(x) \ne F_i(x)$ is at most $1/16$). Using Razborov's construction of a $1/16$-error probabilistic polynomial of degree $O(1)$ for the output gate of $F$ and composing this with the probabilistic polynomials $P_i$, we get the result for balanced formulas. This idea can be extended to general (i.e. not necessarily balanced) formulas with a careful choice of the error parameter for each subformula $F_i$ to obtain the stronger polynomial approximation result.
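The degree count in the balanced case can be summarized as a one-line recurrence. This is a sketch in our own notation: $D(d)$ denotes the degree obtained for a balanced formula of depth $d$ and fan-in $t \ge 2$, and $c$ is an unspecified absolute constant (neither symbol appears in the original).
\[
D(1) \le c, \qquad D(d) \;\le\; \underbrace{c}_{\text{output gate}} \cdot \underbrace{c \log(16t)}_{\text{error reduction}} \cdot\, D(d-1) \quad\Longrightarrow\quad D(d) \le O(\log t)^{d-1}.
\]
For comparison, $s = \Theta(t^{d-1})$ gives $\log s = \Theta(d \log t)$, so the bound $O(\log s)^{d-1}$ for circuits reads $O(d \log t)^{d-1}$ in this setting; the formula bound saves a factor of order $d$ in the base of the power.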
Improved formula lower bounds. Combining the above approximation result with known lower bounds for polynomial approximation [12,15,16], we can already obtain stronger lower bounds for $\mathrm{AC}^0[\oplus]$ formulas than are known for $\mathrm{AC}^0[\oplus]$ circuits. For instance, it follows that any $\mathrm{AC}^0[\oplus]$ formula of depth $d$ computing the Majority function on $n$ variables must have size $\exp(\Omega(dn^{1/2(d-1)}))$ for all $d \le O(\log n)$, which is stronger than the corresponding circuit lower bound. Similarly stronger formula lower bounds also follow for the MOD$_q$ function ($q$ odd).
Separation between formulas and circuits. However, the above improved lower bounds do not directly yield the claimed separation between $\mathrm{AC}^0[\oplus]$ formulas and circuits. This is because we do not have circuits computing (say) the Majority function of the required size. To be able to prove our result, we would need to show that the Majority function has $\mathrm{AC}^0[\oplus]$ circuits of depth $d$ and size $\exp(O(n^{1/2(d-1)}))$ (where the constant in the $O(\cdot)$ is independent of $d$). However, as far as we know, the strongest result in this direction [9] only yields $\mathrm{AC}^0[\oplus]$ circuits of size greater than $\exp(\Omega(n^{1/(d-1)}))$, which is superpolynomially larger than the upper bound.
To circumvent this issue, we change the hard functions to the class of Approximate Majorities, which is the class of Boolean functions that agree with the Majority function on most inputs. While this has the downside that we are no longer dealing with an explicitly defined function, the advantage is that the polynomial approximation method of Razborov yields tight lower bounds for some functions from this class.
Indeed, since the method of Razborov is based on polynomial approximations, it immediately follows that the same proof technique also yields the same lower bound for computing Approximate Majorities. Formally, any $\mathrm{AC}^0[\oplus]$ circuit of depth $d$ computing any Approximate Majority must have size $\exp(\Omega(n^{1/2(d-1)}))$. On the upper bound side, it is known from the work of O'Donnell and Wimmer [11] and Amano [1] that there exist Approximate Majorities that can be computed by monotone $\mathrm{AC}^0$ formulas of depth $d$ and size $\exp(O(dn^{1/2(d-1)}))$. (Note that the double exponent $1/2(d-1)$ in this upper bound matches the one in the lower bound.)
We use the above ideas for our separation between $\mathrm{AC}^0[\oplus]$ formulas and circuits. Plugging in our stronger polynomial approximation for $\mathrm{AC}^0[\oplus]$ formulas, we obtain that any $\mathrm{AC}^0[\oplus]$ formula of depth $d$ computing any Approximate Majority must have size $\exp(\Omega(dn^{1/2(d-1)}))$. In particular, this implies that Amano's construction is tight (up to the universal constant in the exponent) even for $\mathrm{AC}^0[\oplus]$ formulas.
Further, we also modify Amano's construction [1] to obtain better constant-depth circuits for Approximate Majorities: we show that there exist Approximate Majorities that are computed by monotone $\mathrm{AC}^0$ circuits of depth $d$ and size $\exp(O(n^{1/2(d-1)}))$.
Smaller circuits for Approximate Majority. Our construction closely follows Amano's, which in turn is related to Valiant's probabilistic construction [18] of monotone formulas for the Majority function. However, we need to modify the construction in a suitable way that exploits the fact that we are constructing circuits. This modification is in a similar spirit to a construction of Hoory, Magen and Pitassi [6], who modify Valiant's construction to obtain smaller monotone circuits (of depth $\Theta(\log n)$) for computing the Majority function exactly.
At a high level, the difference between Amano's construction and ours is as follows. Amano constructs random formulas $F_i$ of each depth $i \le d$ as follows. The formula $F_1$ is the AND of $a_1$ independent and randomly chosen variables. For even (respectively odd) $i > 1$, $F_i$ is the OR (respectively AND) of $a_i$ independent and random copies of $F_{i-1}$. For suitable values of $a_1, \ldots, a_d \in \mathbb{N}$, the random formula $F_d$ computes an Approximate Majority with high probability. In our construction, we build a depth-$i$ circuit $C_i$ for each $i \le d$ in a similar way, except that each $C_i$ now has $M$ different outputs. Given such a $C_{i-1}$, we construct $C_i$ by taking $M$ independent randomly chosen subsets $T_1, \ldots, T_M$ of $a_i$ many outputs of $C_{i-1}$ and, for each $j \in [M]$, adding a gate that computes either the OR or the AND (depending on whether $i$ is even or odd) of the gates in $T_j$. Any of the $M$ final gates of $C_d$ now serves as the output gate. By an analysis similar to Amano's (see also [6]) we can show that this computes an Approximate Majority with high probability, which finishes the proof.
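The layered structure of the circuit construction can be sketched in a few lines of Python. This is a toy illustration under our own assumptions: the function names, the tiny parameters, and the choice to sample every subset with replacement at a fixed size are ours, and the parameters are far too small for the output to be an Approximate Majority. The sketch only shows how $M$ shared outputs per level keep the gate count at (number of levels) $\times\, M$.

```python
import random


def sample_layers(n, M, fan_ins, rng):
    """Sample the wiring: level i has M gates, and each gate reads fan_ins[i-1]
    uniformly random outputs of the level below (inputs for level 1)."""
    layers, width_below = [], n
    for a in fan_ins:
        layers.append([[rng.randrange(width_below) for _ in range(a)]
                       for _ in range(M)])
        width_below = M
    return layers


def evaluate(layers, x):
    """Evaluate all M output gates on input x (AND at odd levels, OR at even)."""
    values = [bool(b) for b in x]
    for level, layer in enumerate(layers, start=1):
        combine = all if level % 2 == 1 else any
        values = [combine(values[k] for k in wires) for wires in layer]
    return values


rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
layers = sample_layers(n=8, M=5, fan_ins=[2, 3, 2], rng=rng)
gate_count = sum(len(layer) for layer in layers)  # 3 levels * 5 gates
output = evaluate(layers, [1, 0, 1, 1, 0, 1, 0, 1])[0]  # any of the M outputs
```

Because only AND and OR gates are used, the circuit is monotone: it rejects the all-zeros input and accepts the all-ones input regardless of the sampled wiring.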

Preliminaries
Throughout, $n$ will be a growing parameter. We will consider Boolean functions on $n$ variables, i.e. functions of the form $f : \{0,1\}^n \to \{0,1\}$. We will sometimes identify $\{0,1\}$ with the field $\mathbb{F}_2$ in the natural way and consider functions $f : \mathbb{F}_2^n \to \mathbb{F}_2$ instead. Given a Boolean vector $y \in \{0,1\}^n$, we use $|y|_0$ and $|y|_1$ to denote the number of 0s and number of 1s respectively in $y$.
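The Majority function on $n$ variables, denoted $\mathrm{MAJ}_n$, is the Boolean function that maps inputs $x \in \{0,1\}^n$ to 1 if and only if $|x|_1 > n/2$.
An $(\varepsilon, n)$-Approximate Majority is a Boolean function $f : \{0,1\}^n \to \{0,1\}$ that agrees with $\mathrm{MAJ}_n$ on at least a $1 - \varepsilon$ fraction of inputs. As far as we know, the study of this class of functions was initiated by O'Donnell and Wimmer [11]. See also [1,4].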
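For small $n$, membership in this class can be checked by brute force. The sketch below assumes that an $(\varepsilon, n)$-Approximate Majority is a function that disagrees with $\mathrm{MAJ}_n$ on at most an $\varepsilon$ fraction of inputs (the reading suggested by the $3/4$ agreement threshold in the abstract); the helper names are ours.

```python
from itertools import product


def maj(x):
    """MAJ_n: 1 if and only if strictly more than half of the bits are 1."""
    return int(sum(x) > len(x) / 2)


def disagreement(f, n):
    """Fraction of inputs in {0,1}^n on which f differs from MAJ_n."""
    inputs = list(product((0, 1), repeat=n))
    return sum(f(x) != maj(x) for x in inputs) / len(inputs)


def is_approximate_majority(f, n, eps):
    return disagreement(f, n) <= eps


def dictator(x):
    """Hypothetical candidate function: output the first input bit."""
    return x[0]


print(disagreement(dictator, 5))  # prints 0.3125
```

On five bits the dictator function disagrees with Majority on $10$ of the $32$ inputs, so under this reading it is a $(1/3, 5)$-Approximate Majority but not a $(1/4, 5)$-Approximate Majority.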
We refer the reader to [2,7] for standard definitions of Boolean circuits and formulas. We use $\mathrm{AC}^0$ circuits (respectively formulas) to denote circuits (respectively formulas) of constant depth made up of AND, OR and NOT gates. Similarly, $\mathrm{AC}^0[\oplus]$ circuits (respectively formulas) will be circuits (respectively formulas) of constant depth made up of AND, OR, MOD$_2$ and NOT gates.
The size of a circuit will denote the number of gates in the circuit and the size of a formula will denote the number of its leaves, which is within a constant multiplicative factor of the number of gates in the formula.$^4$

Lower Bound
In this section, we show that any $\mathrm{AC}^0[\oplus]$ formula of depth $d$ computing a $(1/4, n)$-Approximate Majority must have size at least $\exp(\Omega(dn^{1/2(d-1)}))$ for all $d \le O(\log n)$.
We work over the field $\mathbb{F}_2$ and identify it with $\{0,1\}$ in the natural way. The following concepts are standard in circuit complexity (see, e.g., Beigel's survey [3]).

Lemma 4 (Smolensky
$^4$ We assume here without loss of generality that the formula does not contain a gate of fan-in 1 feeding into another.
Definition 6. An $\varepsilon$-error probabilistic polynomial of degree $D$ for a Boolean function $f : \{0,1\}^n \to \{0,1\}$ is a random variable $P$ taking values from polynomials in $\mathbb{F}_2[X_1, \ldots, X_n]$ of degree at most $D$ such that for all $x \in \{0,1\}^n$, we have $\Pr[f(x) = P(x)] \ge 1 - \varepsilon$.

Definition 7. Let $D_\varepsilon(f)$ be the minimum degree of an $\varepsilon$-error probabilistic polynomial for $f$.
We will make use of the following two lemmas concerning $D_\varepsilon(\cdot)$.
Lemma 8 (Razborov [12]). Let $\mathrm{OR}_n$ and $\mathrm{AND}_n$ be the OR and AND functions on $n$ variables respectively. Then $D_\varepsilon(\mathrm{OR}_n), D_\varepsilon(\mathrm{AND}_n) \le \lceil \log(1/\varepsilon) \rceil$.
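The bound for $\mathrm{OR}_n$ is usually obtained from the standard random-parity construction: pick $k$ uniformly random subsets $S_1, \ldots, S_k \subseteq [n]$ and set $P(x) = 1 - \prod_{i=1}^{k} (1 - \sum_{j \in S_i} x_j)$ over $\mathbb{F}_2$, which has degree at most $k$ and errs with probability $2^{-k}$ on every nonzero input. The Python sketch below (helper names are ours, and the exhaustive enumeration is only feasible for tiny $n$ and $k$) checks this error probability exactly.

```python
from fractions import Fraction
from itertools import product


def or_polynomial(subsets):
    """P(x) = 1 - prod_i (1 - sum_{j in S_i} x_j), evaluated over F_2."""
    def P(x):
        prod = 1
        for S in subsets:
            parity = sum(x[j] for j in S) % 2  # degree-1 polynomial over F_2
            prod *= 1 - parity                 # product of k degree-1 factors
        return 1 - prod
    return P


def exact_error(n, k, x):
    """Exact probability, over k independent uniform subsets of [n],
    that P(x) differs from OR(x)."""
    masks = list(product((0, 1), repeat=n))  # indicator vectors of subsets
    bad = total = 0
    for choice in product(masks, repeat=k):
        subsets = [[j for j in range(n) if mask[j]] for mask in choice]
        bad += or_polynomial(subsets)(x) != int(any(x))
        total += 1
    return Fraction(bad, total)


n, k = 3, 2
errors = {x: exact_error(n, k, x) for x in product((0, 1), repeat=n)}
```

Taking $k = \lceil \log_2(1/\varepsilon) \rceil$ gives error at most $\varepsilon$ on every input; the statement for $\mathrm{AND}_n$ follows by De Morgan's law, which does not change the degree.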
Lemma 9 (Kopparty and Srinivasan [10]). There is an absolute constant $c_1$ such that for any $\varepsilon \in (0, 1)$,
Proof. The proof is an induction on the depth $d$ of the formula. The base case $d = 0$ corresponds to the case when the formula is a single AND, OR or MOD$_2$ gate and we need to show that $D_{1/8}(f) \le 3$. In the case that the formula is an AND or OR gate, this follows from Lemma 8. If the formula is a MOD$_2$ gate, this follows from the fact that the MOD$_2$ function is exactly a polynomial of degree 1.
Then $(P_1, \ldots, P_m)$ jointly computes $(f_1, \ldots, f_m)$ with error $1/16$ $(= \sum_{i=1}^{m} s_i/(16s))$. By a reasoning identical to the base case, it follows that there exists a $1/16$-error probabilistic polynomial $Q$ of degree 4 for the output gate of the formula.
Then $Q(P_1, \ldots, P_m)$ is a $1/8$-error probabilistic polynomial for $f$ of degree
So long as $c_2 \ge 20c_1$, it suffices to show that for all $i$,
Consider any $i$ and let $a, b \ge 0$ such that $s_i = 2^a$ and $s = 2^{a+b}$. We must show
For fixed $a \ge 0$, as a polynomial in $b$, the function
which is zero iff $b = a/d$; this value is a minimum of $p_{a,d}$ with $p_{a,d}(a/d) = 0$.
Proof. Say that $F$ is an $\mathrm{AC}^0[\oplus]$ formula of depth $d$ and size $s$ computing $f$. Then, by Lemma 9, we see that $F$ has a $1/8$-error probabilistic polynomial $P$ of degree $D \le O(\frac{1}{d}\log s + 1)^{d-1}$. In particular, by an averaging argument, there is some fixed polynomial $P \in \mathbb{F}_2[X_1, \ldots, X_n]$ of degree at most $D$ such that $P$ is a $1/8$-error approximating polynomial for $f$. Corollary 5 implies that the degree of $P$ must be $\Omega(\sqrt{n})$. Rearranging $O(\frac{1}{d}\log s + 1)^{d-1} \ge \Omega(\sqrt{n})$ gives $\log s \ge \Omega(dn^{1/2(d-1)}) - O(d)$. Observe that $\Omega(dn^{1/2(d-1)})$ dominates $O(d)$ so long as $d \le \varepsilon \log n$ for some absolute constant $\varepsilon > 0$ (depending on the constants in $\Omega(\cdot)$ and $O(\cdot)$). Hence, we get the claimed lower bound $s \ge \exp(\Omega(dn^{1/2(d-1)}))$ for all $d \le \varepsilon \log n$.

Upper Bound
In this section, we show that for any constant $\varepsilon$, there are $(\varepsilon, n)$-Approximate Majorities that can be computed by depth-$d$ $\mathrm{AC}^0$ circuits of size $\exp(O(n^{1/2(d-1)}))$.
Let $\varepsilon_0 \in (0, 1)$ be a small enough constant so that the following inequalities hold for any $\beta \le \varepsilon_0$
(It suffices to take $\varepsilon_0 = 1/2$.) We need the following technical lemma.
Proof. We give the proof only for $I_0(\gamma)$ and $I_1(\gamma)$. The proof for $J_0(\gamma)$ and $J_1(\gamma)$ is similar.
Consider first the case that $x \in I_1(\gamma)$. In this case, we have the following computation.
The above implies the first upper bound on $\Pr_S[\bigvee_{j \in S} x_j = 0]$ from the lemma statement. When $s\gamma \le \varepsilon_0$, we further have $\exp(-s\gamma) \le 1 - s\gamma \exp(-s\gamma)$, which implies the second upper bound. This proves the lemma when $x \in I_1(\gamma)$.
Now consider the case that $x \in I_0(\gamma)$. We have
where for the second inequality we have used the fact that since $e^{-A} \le 1$ . Since $e^{-A} \le \frac{1}{n^3} \le \frac{1}{4n} \le \gamma/4$, we can lower bound the right hand side of (2) by $\exp(-s) \cdot \exp(s\gamma/2)$. Also, note that
This implies that the RHS of (2) can also be lower bounded by $\exp(-s)\exp(s\gamma \exp(-s\gamma)) \ge \exp(-s) \cdot (1 + s\gamma \exp(-s\gamma))$, which implies the claim about $\Pr_S[\bigvee_{j \in S} x_j = 0]$ assuming that $x \in I_0(\gamma)$.
We now prove the main result of this section.
Proof. We assume throughout that $\varepsilon$ is a small enough constant and that $n$ is large enough for various inequalities to hold. We will actually construct a monotone circuit of depth $d$ and size $\exp(O(n^{1/2(d-1)} \log(1/\varepsilon)/\varepsilon))$ computing a $(4\varepsilon, n)$-Approximate Majority, which also implies the theorem.
Fix parameters $A = \lfloor n^{1/2(d-1)} \rfloor$ and $M = \lceil e^{10A} \rceil$. We assume that $A \ge 10 \log n$ (which holds as long as $d \le \frac{c \log n}{\log\log n}$ for an absolute constant $c > 0$) and that $\varepsilon \le \varepsilon_0$. Define a sequence of real numbers $\gamma_0, \gamma_1, \ldots, \gamma_{d-2}$ as follows:
As a result we also obtain
Let
The idea is to define a sequence of circuits $C_1, C_2, \ldots, C_{d-2}$ with $n$ inputs and $M$ outputs such that $C_i$ has depth $i$ and $iM$ many (non-input) gates. Further, for odd $i$
and similarly for even $i$
After this is done, we will add on top a depth-2 circuit that will reject most inputs from $I_0(\gamma_{d-2})$ or $J_0(\gamma_{d-2})$ (depending on whether $d - 2$ is odd or even respectively) and accept most inputs
We begin with the construction of $C_1, \ldots, C_{d-2}$, which is done by induction.
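To get a feel for these parameters, the following Python sketch computes $A$ and $\ln M$ for concrete $n$ and $d$ and checks the assumption $A \ge 10 \log n$. It assumes that $\log$ denotes the natural logarithm here (so that the assumption gives $e^A \ge n^{10}$) and that $n^{1/2(d-1)}$ means $n^{1/(2(d-1))}$; the helper names are ours, and the ceiling in the definition of $M$ is ignored.

```python
import math


def integer_root(n, k):
    """Largest integer a >= 0 with a**k <= n, using exact integer arithmetic."""
    lo, hi = 0, 1
    while hi ** k <= n:
        hi *= 2
    while hi - lo > 1:  # invariant: lo**k <= n < hi**k
        mid = (lo + hi) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid
    return lo


def choose_parameters(n, d):
    """Return (A, ln M, whether A >= 10 ln n) for depth d >= 2."""
    A = integer_root(n, 2 * (d - 1))  # A = floor(n^(1/(2(d-1))))
    log_M = 10 * A                    # M = e^(10A) is far too large to store
    assumption_holds = A >= 10 * math.log(n)
    return A, log_M, assumption_holds


print(choose_parameters(10**4, 2))  # prints (100, 1000, True)
print(choose_parameters(10**4, 3))  # prints (10, 100, False)
```

The second call shows why the depth has to stay below a constant multiple of $\log n / \log\log n$: for fixed $n$, increasing $d$ quickly pushes $A$ below $10 \log n$.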
Construction of $C_1$. The base case of the induction is the construction of $C_1$, which is done as follows. We choose $M$ i.i.d. random subsets $T_1, \ldots, T_M \subseteq [n]$ in the following way: for each $i \in [M]$, we sample $A$ random elements of $[n]$ with replacement. Let $b_i^x = \bigwedge_{j \in T_i} x_j$. If $x \in N_\varepsilon$, then the probability that $b_i^x = 1$ is given by
where the last inequality follows from the fact that $(1 - z)^A \le (1 - zA + \frac{A^2 z^2}{2})$. Let $\delta = 1/n^3$. Note in particular that $2\delta/(\gamma_0 A) \le \varepsilon_0$ for large enough $n$. By a Chernoff bound, the probability that
is bounded by $\exp(-\Omega(\delta^2 M/e^A)) \le \exp(-\Omega(e^{9A}/n^6)) \le \exp(-n)$, since $e^A \ge n^{10}$. Thus, with probability at least $1 - \exp(-n)$, we have
Above, we have used the fact that $(1 - \frac{\delta}{\gamma_0 A}) \ge \exp(-\frac{2\delta}{\gamma_0 A})$ since $\delta/(\gamma_0 A) \le \varepsilon_0$ for large enough $n$, as noted above.
If $x \in Y_\varepsilon$, then the probability that $b_i^x = 1$ is given by
As above, we can argue that the probability that
is at most $\exp(-n)$. Thus, with probability $1 - \exp(-n)$
Thus, by a union bound over $x$, we can fix a choice of $T_1, \ldots, T_M$ so that (6) holds for all $x \in N_\varepsilon$ and (7) holds for all $x \in Y_\varepsilon$. Hence, (4) holds for $i = 1$ as required. This concludes the construction of $C_1$, which just outputs the values of $\bigwedge_{j \in T_i} x_j$ for each $i$.
Construction of $C_{i+1}$. For the inductive case, we proceed as follows. We assume that $i$ is odd (the case that $i$ is even is similar). So by the inductive hypothesis, we know that (4) holds and hence that $C_i(x) \in I_0(\gamma_i)$ or $I_1(\gamma_i)$ depending on whether $x \in N_\varepsilon$ or $Y_\varepsilon$. Let $\gamma := \gamma_i$. Let the output gates of $C_i$ be $g_1, \ldots, g_M$.
We choose $T_1, \ldots, T_M \subseteq [M]$ randomly as in the statement of Lemma 12 with $s = A$. Note that the chosen parameters satisfy all the hypotheses of Lemma 12. Further we also have $s\gamma \le A \cdot A^i \gamma_0 \le A^{d-1} \cdot \frac{\varepsilon}{\sqrt{n}} \le \varepsilon_0$. The random circuit $C'$ is defined to be the circuit obtained by adding $M$ OR gates to $C_i$ such that the $j$th OR gate computes $\bigvee_{k \in T_j} g_k$. Let $b_j^x$ be the output of the $j$th OR gate on $C_i(x)$. By Lemma 12, we have
Let $\delta = \frac{1}{n^3}$. Note that $A\gamma \in [\frac{1}{\sqrt{n}}, \frac{1}{n^{1/2(d-1)}}]$ and hence for large enough $n$, $\frac{2\delta}{A\gamma \exp(-A\gamma)} \le \varepsilon_0$. Assume $x \in N_\varepsilon$. In this case, the Chernoff bound implies that the probability that $\sum_{j \in [M]} b_j^x \ldots$ is at most $\exp(-\Omega(\delta^2 M/e^A)) \le \exp(-n)$. When this event does not occur, we have
We have used above that for large enough $n$,
Similarly when $x \in Y_\varepsilon$, the Chernoff bound tells us that the probability that $\sum_{j \in [M]} b_j^x \ge M \exp(-A) \cdot (1 - A\gamma \exp(-A\gamma))(1 + \delta)$ is at most $\exp(-n)$. In this case, we get
By a union bound, we can fix $T_1, \ldots, T_M$ so that (9) and (10) are true for all $x \in N_\varepsilon$ and $x \in Y_\varepsilon$ respectively. This gives us the circuit $C_{i+1}$ which satisfies all the required properties.
The top two levels of the circuit. At the end of the above procedure we have a circuit $C_{d-2}$ of depth $d - 2$ and at most $(d - 2)M$ gates that satisfies one of (4) or (5) depending on whether $d - 2$ is odd or even respectively. We assume that $d - 2$ is even (the other case is similar). Define
We choose $M'$ many subsets $T_1, \ldots, T_{M'} \subseteq [M]$ i.i.d. so that each $T_j$ is picked as in Lemma 12 with $s = 10A \log(1/\varepsilon)/\varepsilon$. Note that
Say $g_1, \ldots, g_M$ are the output gates of $C_{d-2}$. We define the random circuit $C'$ (with $n$ inputs and $M'$ outputs) to be the circuit obtained by adding $M'$ AND gates such that the $j$th AND gate computes $\bigwedge_{k \in T_j} g_k$. Let $b_j^x$ be the output of the $j$th AND gate on $C_{d-2}(x)$. By Lemma 12, we have
Say $x \in N_\varepsilon$. By a Chernoff bound, the probability that $\sum_j b_j^x \ge 2\varepsilon^2 M' \exp(-s)$ is at most $\exp(-\Omega(\varepsilon^2 M' \exp(-s))) \le \exp(-\Omega(\varepsilon^2 e^{10A})) \le \exp(-n)$. Similarly, when $x \in Y_\varepsilon$, the probability is also bounded by $\exp(-n)$. By a union bound, we can fix $T_1, \ldots, T_{M'}$ to get a circuit $C_{d-1}$ such that
From (12) it follows that
The final inequalities in each case above hold as long as $\varepsilon$ is a small enough constant. It follows from the above that there is a choice for $T$ such that $C'_d$ makes an error (i.e. $C'_d(x) = 1$ for $x \in N_\varepsilon$ or $C'_d(x) = 0$ for $x \in Y_\varepsilon$) on at most a $2\varepsilon$ fraction of inputs from $N_\varepsilon \cup Y_\varepsilon$. We fix such a choice for $T$ and the corresponding circuit $C$.
Hence we see that the circuit $C$ computes a $(4\varepsilon, n)$-Approximate Majority, which proves Theorem 13.

Conclusion
Our main results extend straightforwardly to $\mathrm{AC}^0[\mathrm{MOD}_p]$ for any fixed prime $p$. The proofs are exactly the same except for the fact that the approximating polynomials of degree $O(\frac{1}{d}\log s)^{d-1}$ from Section 3 are constructed over $\mathbb{F}_p$.
Using the fact [15] that any $(1/4)$-approximating polynomial over $\mathbb{F}_p$ ($p$ odd) for the Parity function on $n$ variables must have degree $\Omega(\sqrt{n})$, we see that any polynomial-sized $\mathrm{AC}^0[\mathrm{MOD}_p]$ formula computing the Parity function on $n$ variables must have depth $\Omega(\log n)$. This strengthens a result of Rossman [13] which gives this statement for $\mathrm{AC}^0$ formulas.