On the Complexity of Computing a Random Boolean Function Over the Reals

We say that a first-order formula $A(x_1, \dots, x_n)$ over $\mathbb{R}$ defines a Boolean function $f : \{0,1\}^n \to \{0,1\}$ if for every $x_1, \dots, x_n \in \{0,1\}$, $A(x_1, \dots, x_n)$ is true iff $f(x_1, \dots, x_n) = 1$. We show that: (i) every $f$ can be defined by a formula of size $O(n)$, (ii) if $A$ is required to have at most $k \ge 1$ quantifier alternations, there exists an $f$ which requires a formula of size $2^{\Omega(n/k)}$. The latter result implies several previously known as well as some new lower bounds in computational complexity: the nonconstructive lower bound on span programs of Babai, Gál, and Wigderson (Combinatorica 1999); Rothvoß's result (CoRR 2011) that there exist 0/1 polytopes that require exponential-size linear extended formulations; a similar lower bound by Briët et al. (Math. Program. 2015) on semidefinite extended formulations; and a new result stating that a random Boolean function has exponential linear separation complexity. We note that (i) holds over any field of characteristic zero, and (ii) holds over any real closed or algebraically closed field.
ACM Classification: F.1.3, F.2.3
AMS Classification: 68Q17, 03C10


Introduction
In computational complexity, we are typically interested in computing a Boolean function f : {0, 1} n → {0, 1}. The central computational model is a Boolean circuit which computes the function by means of the elementary operations ∧, ∨, ¬. The major open problem is to prove super-polynomial (or even super-linear) lower bounds on the circuit size of an explicit function f . On the other hand, it is easy to prove, non-constructively, that hard Boolean functions exist: comparing the number of circuits of a given size with the total number of functions, there must exist Boolean functions which require circuits of exponential size.
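To make the counting argument concrete, the following is a minimal sketch. It assumes a crude bound that is not taken from the text: a circuit with $s$ gates over $n$ inputs is described by choosing, for each gate, one of three operations and two predecessors among the $n + s$ available nodes, so there are at most $(3(n+s)^2)^s$ such circuits. The function names are hypothetical.

```python
import math


def circuit_count_log2_upper(n: int, s: int) -> float:
    """log2 of the assumed bound (3 * (n + s)**2)**s on the number of circuits."""
    return s * math.log2(3 * (n + s) ** 2)


def min_size_forced_by_counting(n: int) -> int:
    """Smallest s for which the assumed circuit count reaches 2**(2**n).

    For every smaller s there are fewer circuits than Boolean functions,
    so some n-variate function needs at least this many gates.
    """
    target = 2 ** n  # log2 of the number of n-variate Boolean functions
    s = 1
    while circuit_count_log2_upper(n, s) < target:
        s += 1
    return s


if __name__ == "__main__":
    for n in (6, 8, 10, 12):
        print(n, min_size_forced_by_counting(n))
```

The returned threshold grows roughly like $2^n / n$, which is the exponential behaviour the counting argument predicts. The sketch breaks down exactly when the number of distinct gate descriptions is infinite, as in the algebraic models discussed next.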
The counting argument relies on the fact that the elementary operations used are functions over a small finite set. In the complexity literature, we also encounter algebraic models of computation which do not have this property. While we are still interested in computing a Boolean function, we are allowed to use intermediate operations over an infinite domain, typically the real numbers or some other infinite field. To give a simple example: suppose we want to obtain $f$ by computing a real polynomial $g$ by means of an arithmetic circuit (see [20,11] for details) such that $f(x) = g(x)$ holds for all $x \in \{0,1\}^n$. Since an arithmetic circuit can use arbitrary real numbers as constants, we can no longer apply the counting argument in this case. A similar phenomenon occurs in the case of span programs [13,2], and in other models.
A well-known strategy is to replace the counting argument with Warren's theorem [22], or some variant of it [17,1] (see also Section 5). Warren's theorem tells us how many sign-patterns can be achieved in the image of a polynomial map, which is quite enough to prove the existence of hard functions in the aforementioned models [11,2,17]. There is, however, at least one instance where this tool is apparently insufficient. Suppose we want to compute f by means of a parametrized linear program as follows. We have a system L(x, y) of linear inequalities over R in the variables x = x 1 , . . . , x n and y = y 1 , . . . , y m . We require that for every x ∈ {0, 1} n , f (x) = 1 iff the system L(x, y) has a solution y ∈ R m . Is there a function f such that f requires an exponential number of inequalities to be defined this way? This measure, which we call linear separation complexity, has been considered at least in [23,15] and arises in the context of the so-called extension complexity of polytopes (see Section 3 for details). The author does not know how to resolve this question directly using Warren's theorem. Nor does he know how to extend the closely related result of Rothvoß [18] to this situation.
We can view these algebraic models a bit more abstractly. Consider a Boolean function defined by a first-order formula $A(x_1, \dots, x_n)$ over the reals. The function accepts on $x_1, \dots, x_n \in \{0,1\}$ iff $A(x_1, \dots, x_n)$ is true. Here, the formula $A$ may contain constant symbols representing arbitrary real numbers as well as quantifiers over $\mathbb{R}$. In all the above examples, we are in fact defining $f$ in terms of an existentially quantified formula over the reals. Are there functions which are hard for this model? As we will see, this depends on whether we bound the quantifier complexity of $A$. First, if no restriction is imposed, then every Boolean function can be defined by a linear-size formula. Second, if $A$ is required to have at most $k \ge 1$ quantifier alternations in the prenex form, then there is a Boolean function requiring a formula of size $2^{\Omega(n/k)}$. The latter implies an exponential lower bound on the linear separation complexity as well as on the other models discussed. Our first result is achieved by a direct construction; the second one is a corollary of known results on quantifier elimination over the reals. In this respect, our question is closely related to the problem of whether $\mathrm{P}_{\mathbb{R}} = \mathrm{NP}_{\mathbb{R}}$ in the real Turing machine model (see [4] and [14] for a survey). We will see that both results hold in greater generality, in other fields besides the reals.

Preliminaries
Let $F$ be a field. An $F$-formula, or simply a formula, is a first-order formula built from the function and predicate symbols $+$, $\times$, $=$, constant symbols $c_a$ for every element $a$ of the field, as well as the usual logical symbols (variables, Boolean connectives, and quantifiers $\exists$, $\forall$). If $F$ is an ordered field, we also allow the predicate symbols $<$, $\le$ representing the ordering. We define the size of a formula as the number of symbols in the formula (constants and variables having a unit cost). Every formula with no free variables is either true or false, under the intended interpretation of symbols as operations over $F$.
Every quantifier-free formula over a field is of the form $B(t_1 = t'_1, \dots, t_m = t'_m)$, where $B$ is a propositional formula defining a Boolean function and $t_i, t'_i$ are terms defining polynomials with coefficients from $F$. Over an ordered field, we may also encounter the atomic formulas $t_i < t'_i$, $t_i \le t'_i$. We will take the liberty to identify the constant $c_a$ with $a$ and, occasionally, identify terms with the polynomials they represent. A $\Sigma_1$-formula is a formula of the form $\exists x_1 \dots \exists x_n\, A$, where $A$ is quantifier-free (a.k.a. a $\Sigma_0$-formula). Similarly, a $\Sigma_2$-formula is of the form $\exists x_1 \dots \exists x_n \forall y_1 \dots \forall y_m\, A$, and so on: a $\Sigma_{k+2}$-formula is of the form $\exists x_1 \dots \exists x_n \forall y_1 \dots \forall y_m\, A$ with $A$ being $\Sigma_k$. Every formula can be converted to an equivalent $\Sigma_k$-formula of nearly the same size, for some $k$. One could also define $\Pi_k$-formulas, but we have no need for that.
Let $F$ be a field or an ordered field. Let $A(x_1, \dots, x_n)$ be an $F$-formula with no free variables other than $x_1, \dots, x_n$. We will say that $A$ defines a Boolean function $f : \{0,1\}^n \to \{0,1\}$ if the following holds: for every $x_1, \dots, x_n \in \{0,1\}$, $A(x_1, \dots, x_n)$ is true iff $f(x_1, \dots, x_n) = 1$.
In Sections 4 and 5, we will prove the following main results:
Theorem 2.1. Let $F$ be a field of characteristic zero. Given an $n$-variate Boolean function $f$ and $1 \le k \le n$, $f$ can be defined by a $\Sigma_{2k-1}$-formula of size $O(k 2^{n/k})$.
Theorem 2.2. Let $F$ be either a real closed field or an algebraically closed field. Then for every $k > 0$ and $n$, there exists a Boolean function $f$ in $n$ variables such that every $\Sigma_k$-formula defining $f$ must have size at least $2^{\Omega(n/k)}$.
Setting k = n, Theorem 2.1 implies that every n-variate Boolean function can be defined by a formula of linear size. We emphasize that Theorem 2.1 is possible, and Theorem 2.2 is non-trivial, only due to the fact that we allow arbitrary constants from F to appear in the formula defining f . Let us also note that Theorem 2.2 requires some assumption on the underlying field. Remarkably, it is false over the field of rationals.
Observation 2.3. Over Q (as an unordered field), every Boolean function in n variables can be defined by a Σ 4 -formula of size O(n).
Proof sketch. This relies on a beautiful result of Julia Robinson [16], who showed that the integers can be defined inside $\mathbb{Q}$ by a single first-order formula. The same applies to the non-negative integers. Working over $\mathbb{N}$, a Boolean function can be defined by a $\Sigma_1$-formula of linear size. This is because the truth table of $f$ can be encoded by a single natural number from which the values of $f$ can be recovered by a $\Sigma_1$-formula (cf. the proof of Theorem 4.2). This gives a linear-size $\Sigma_k$-formula over $\mathbb{Q}$ for some fixed $k$. The more accurate bound $k = 4$ is achieved by inspecting the quantifier complexity of Robinson's formula.
The power of $\Sigma_1$-formulas
We note that already the class of $\Sigma_1$-formulas is quite robust. That is, many syntactic restrictions or relaxations of the definition lead essentially to the same class. Recall that a $\Sigma_1$-formula is of the form $\exists y \in F^r\, B(t_1 = t'_1, \dots, t_m = t'_m)$, where $B$ is a Boolean formula and $t_i, t'_i$ are terms. The latter can be seen as the so-called arithmetic formulas defining polynomials over $F$. Note that if we allow $B$ to be a Boolean circuit instead, we do not get a stronger model: introducing new variables representing the gates of the circuit, we can rewrite $B$ as a $\Sigma_1$-formula of linear size. The same applies if we allow the terms $t_i, t'_i$ to be computed by arithmetic circuits. In fact, all polynomial-time computations in the sense of [4] can be expressed as small $\Sigma_1$-formulas. In turn, every $\Sigma_1$-formula $A(x_1, \dots, x_n)$ of size $s$ can equivalently be written as $\exists y_1$, and $h_1, \dots, h_t$ are polynomials of degree two. This is true both in an ordered and an unordered field. In the ordered case, this can furthermore be written as $\exists y_1 \dots \exists y_m\, (h = 0)$, where $h$ is a single polynomial of degree four. That is, the complexity of a $\Sigma_1$-formula can be captured as the number of bound variables in an expression involving only low-degree polynomials. This allows us to redefine $\Sigma_1$-complexity in a mathematically cleaner way.

An application: extension and separation complexity
As mentioned in the introduction, Theorem 2.2 has several obvious applications, and we focus on just one. Suppose we want to compute a Boolean function $f(x)$, $x \in \{0,1\}^n$, by the following parametrized linear program. We have new variables $y = y_1, \dots, y_m$ and a set $L(x, y)$ of linear inequalities or equalities over $\mathbb{R}$. We say that $L(x, y)$ computes $f$ if for every $x \in \{0,1\}^n$,
$$f(x) = 1 \iff \exists y \in \mathbb{R}^m\; L(x, y).$$
In other words, $f$ accepts precisely on the Boolean inputs $x$ for which the system
$$Ax + By \le a, \qquad Cx + Dy = b \tag{3.1}$$
has a solution $y \in \mathbb{R}^m$, where $A, B, C, D, a, b$ are real matrices and vectors describing the linear system. We define the linear separation complexity of $f$ as the smallest $r$ such that $f$ can be computed as in (3.1) by a linear system with $r$ inequalities. Note that we disregard $m$, the number of extra variables, as well as $t$, the number of equalities, in the definition. This is because both these parameters can be bounded in terms of $n$ and $r$.
The geometric interpretation is as follows. A polyhedron $P \subseteq \mathbb{R}^n$ will be called a separating polyhedron for $f$ if
$$f^{-1}(1) \subseteq P \quad \text{and} \quad f^{-1}(0) \cap P = \emptyset,$$
i.e., the polyhedron contains all accepting inputs of $f$ and excludes all its rejecting inputs. Following [23,18,8], define the extension complexity of $P$ as the smallest $r$ such that $P$ is a linear projection of a polyhedron $Q \subseteq \mathbb{R}^m$ where $Q$ can be defined using $r$ inequalities (and any number of equalities). In this language, the linear separation complexity of $f$ equals the smallest $r$ such that there exists a separating polyhedron for $f$ of extension complexity $r$.
While the phrase "linear separation complexity" is introduced here, the same concept has appeared earlier. Already in [21], Valiant observed that linear separation complexity is, up to a constant factor, a lower bound on the Boolean circuit complexity of $f$. This appears again in the seminal paper of Yannakakis [23]. A similar quantity was also investigated by Pudlák and Oliveira in [15] in the context of proof complexity.
THEORY OF COMPUTING, Volume 16 (9), 2020, pp. 1-12
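As an illustration of the geometric definition, here is a minimal sketch that checks by brute force whether a polyhedron given directly by inequalities $Ax \le a$ separates a Boolean function. It covers only the special case without extra variables $y$ and without equalities; handling the general system with extra variables would require a linear-programming solver, which this sketch omits. The names `in_polyhedron` and `separates` are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def in_polyhedron(A: Sequence[Vector], a: Vector, x: Vector) -> bool:
    """Check whether x satisfies every inequality row . x <= bound."""
    return all(
        sum(coef * xi for coef, xi in zip(row, x)) <= bound
        for row, bound in zip(A, a)
    )


def separates(
    f: Callable[[Tuple[int, ...]], int],
    n: int,
    A: Sequence[Vector],
    a: Vector,
) -> bool:
    """True iff {x : Ax <= a} contains exactly the accepting 0/1 inputs of f."""
    return all(
        in_polyhedron(A, a, x) == bool(f(x))
        for x in product((0, 1), repeat=n)
    )


if __name__ == "__main__":
    # OR on three bits: the single inequality x1 + x2 + x3 >= 1 suffices.
    print(separates(lambda x: int(any(x)), 3, [[-1, -1, -1]], [-1]))
    # XOR on two bits: 1 <= x1 + x2 <= 1 uses two inequalities.
    xor = lambda x: x[0] ^ x[1]
    print(separates(xor, 2, [[-1, -1], [1, 1]], [-1, 1]))
```

In these toy cases the number of rows of $A$ is an upper bound on the linear separation complexity of the function; the lower bound discussed in this section says that for some functions no such small description exists, even with extra variables.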
Yannakakis's paper started a fruitful direction of research into the extension complexity of 0/1-polytopes. Rothvoß [18] has shown that there exists a polytope $P \subseteq \mathbb{R}^n$ with vertices in $\{0,1\}^n$ and extension complexity $2^{\Omega(n)}$. Since then, the same was proved for explicit polytopes (see, e.g., [8,19] and references in the latter).
In our setting, the smallest separating polyhedron for $f$ is simply the convex hull of the accepting inputs of $f$, $P_0 = \mathrm{conv}(f^{-1}(1))$. Hence, the result of [18] says that there exists an $f$ such that $P_0$ has exponential extension complexity. This, however, does not imply a lower bound on the linear separation complexity, for there are infinitely many other separating polytopes. Furthermore, it is not apparent to the author how to adapt Rothvoß's proof to this setting. On the other hand, Theorem 2.2 readily implies the following result.
Theorem 3.1. There exists a Boolean function $f : \{0,1\}^n \to \{0,1\}$ with linear separation complexity $2^{\Omega(n)}$.
Proof. Assume that $f$ can be computed by a linear system $L(x, y)$ as in (3.1) with $r$ inequalities. It is easy to see that the number of extra variables $y$ can be bounded by $r$ and the number of equalities by $n$. Hence, $f$ can be defined by a $\Sigma_1$-formula of size $O((r + n)^2)$. By Theorem 2.2, this means that $r \ge 2^{\Omega(n)}$ for some $f$.
Theorem 3.1 also implies the result in [18]. However, Rothvoß's proof achieves better constants hidden in $\Omega(n)$ and is definitely more informative. The reasoning of Theorem 3.1 could also be applied to "semidefinite separation complexity" as considered in [5].

Proof of Theorem 2.1
We now show that quantifier alternations allow us to define every Boolean function $f$ efficiently. The idea is to encode the truth table of $f$ as a natural number $a_f$, so that the values of $f$ can be efficiently recovered from $a_f$. The main ingredient is to show that, over the field, we can reason about integers of doubly exponential size. This part is reminiscent of the construction in [10,7].
If k divides n, we can take B 2 n/k ,k as the formula A n,k . In general, let A n,k (x) := ∃u x + u + 1 = 2 2 n ∧ B 2 n/k ,k (u) ∧ B 2 n/k ,k (x) .
It remains to eliminate the constants $\tau_i$. To this end, view them as free variables and let $T_n$ be the conjunction of the equations $\tau_0 = 2$, $\tau_1 = \tau_0^2$, $\dots$, $\tau_n = \tau_{n-1}^2$.
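The effect of these equations can be checked numerically: starting from 2 and squaring repeatedly forces the $i$-th value to equal $2^{2^i}$, so $n + 1$ short equations pin down a number of doubly exponential size. A minimal sketch follows; the helper name `tau_chain` is hypothetical.

```python
from typing import List


def tau_chain(n: int) -> List[int]:
    """Return [tau_0, ..., tau_n] with tau_0 = 2 and tau_i = tau_{i-1} ** 2."""
    taus = [2]
    for _ in range(n):
        taus.append(taus[-1] ** 2)  # one squaring per equation
    return taus


if __name__ == "__main__":
    for i, tau in enumerate(tau_chain(5)):
        # Each value is 2 ** (2 ** i); its bit length is 2 ** i + 1.
        print(i, tau.bit_length())
```

Python integers are arbitrary-precision, so the values are exact; the point is only that the description length grows linearly in $n$ while the magnitude grows doubly exponentially.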
The following is a stronger version of Theorem 2.1.
In other words, $a_f$ is the integer such that for every $x$, the $b(x)$-th bit of $a_f$ is $f(x)$. Note that $a_f$ lies in $\{0, 1, \dots, 2^{2^n} - 1\}$. Furthermore, $f(x) = 1$ if and only if
Using the previous lemma, the conditions $y_1, y_2 \in \{0, 1, \dots, 2^{2^n} - 1\}$ can be replaced by $A_{n,k}(y_1)$, $A_{n,k}(y_2)$. Also, the ordering $y_1 < z$ on $\{0, 1, \dots, 2^{2^n} - 1\}$ can be defined as $\exists u\, (z = y_1 + u + 1 \wedge A_{n,k}(u))$. Finally,
This allows us to write $2^{b(x)}$ and $2^{b(x)+1} = 2 \cdot 2^{b(x)}$ as $O(n)$-size terms using the auxiliary constants $\tau_i = 2^{2^i}$, $i \le n - 1$. As noted in the proof of the previous lemma, the constants can be defined by the formula $T_{n-1}$. Altogether, condition (4.1) can be written as a $\Sigma_{2k-1}$-formula of size $O(n + k 2^{n/k})$, which in turn simplifies to $O(k 2^{n/k})$.
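The encoding can be illustrated by a small sketch. It rests on assumptions not confirmed by the text above: we take $b(x) = \sum_i x_i 2^{i-1}$, we test the $b(x)$-th bit of $a_f$ by asking for a decomposition $a_f = y_1 + 2^{b(x)} + 2^{b(x)+1} y_2$ with $0 \le y_1 < 2^{b(x)}$ and $y_2 \ge 0$, and we write $2^{b(x)}$ as the product $\prod_i \bigl(1 + x_i(\tau_{i-1} - 1)\bigr)$. These are one natural way to realize the steps described, not necessarily the exact formulas of the proof, and all function names are hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def b(x: Bits) -> int:
    """Assumed index of input x in the truth table: sum of x_i * 2**(i-1)."""
    return sum(bit << i for i, bit in enumerate(x))


def encode(f: Callable[[Bits], int], n: int) -> int:
    """Truth table of f as one integer a_f whose b(x)-th bit is f(x)."""
    return sum(int(f(x)) << b(x) for x in product((0, 1), repeat=n))


def power_of_two_term(x: Bits) -> int:
    """2**b(x) as a product of n factors built from tau_i = 2**(2**i)."""
    result, tau = 1, 2
    for bit in x:
        result *= 1 + bit * (tau - 1)  # factor is tau if bit == 1, else 1
        tau *= tau                     # next constant by squaring
    return result


def bit_is_set(a_f: int, x: Bits) -> bool:
    """Decide whether a_f = y1 + z + 2*z*y2 has a solution with 0 <= y1 < z."""
    z = power_of_two_term(x)
    y1 = a_f % z               # the only possible value of y1
    rest = a_f - y1 - z        # must equal 2 * z * y2 for some y2 >= 0
    return rest >= 0 and rest % (2 * z) == 0


if __name__ == "__main__":
    majority = lambda x: int(sum(x) >= 2)
    a = encode(majority, 3)
    print(all(bit_is_set(a, x) == bool(majority(x))
              for x in product((0, 1), repeat=3)))
```

The product in `power_of_two_term` uses $n$ multiplications, which matches the claim that $2^{b(x)}$ has an $O(n)$-size term once the constants $\tau_i$ are available.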
Let us remark that in the definition of a constant-free formula, one can insist that the formula contain no constants at all: this is because 0, 1 and −1 can be defined by such a formula. Furthermore, in the proof of Theorem 4.2, we did not use the fact that F is a field. It would be quite enough to assume that F is a ring or even a semiring with multiplicative unit 1 such that the "natural numbers" 1, 1 + 1, 1 + 1 + 1, . . . are distinct.

Proof of Theorem 2.2
Our proof of Theorem 2.2 uses tools from algebraic geometry and real algebraic geometry, namely, counting the sign-patterns or zero-patterns of a polynomial map and quantifier elimination. The author would be happy to see a more direct and self-contained proof at least for the case of Σ 1 -formulas.
We first give an overview of the results required.

Sign-patterns of a polynomial map
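Let $f = f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n)$ be a sequence of real polynomials of degree at most $d$. For $a \in \mathbb{R}^n$, let $\operatorname{sgn} f(a) := (\operatorname{sgn} f_1(a), \dots, \operatorname{sgn} f_m(a)) \in \{-1, 0, 1\}^m$ be the sign-pattern of $f$ at $a$. By Warren's theorem [22] and its variants [17,1], the number of sign-patterns can be bounded (assuming $d \ge 1$, $m \ge n$) by
$$|\{\operatorname{sgn} f(a) : a \in \mathbb{R}^n\}| \le (8edm/n)^n. \tag{5.1}$$
The same estimate clearly holds over any real closed field. A similar bound holds for the number of zero-patterns over every field $F$.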
For $a \in F^n$, let $\operatorname{sgn}^* f(a) := (\operatorname{sgn}^* f_1(a), \dots, \operatorname{sgn}^* f_m(a)) \in \{0,1\}^m$ be the zero-pattern of $f$ at $a$. A bound on the number of zero-patterns of $f$ has been obtained by Heintz [10]. An elementary linear algebra proof of improved estimates was found more recently by Rónyai et al. in [17]. By [17], the number of zero-patterns can be bounded (assuming $d \ge 1$, $m \ge n$) by $|\{\operatorname{sgn}^* f(a) : a \in F^n\}| \le (edm/n)^n$.

Quantifier elimination
The celebrated Tarski-Seidenberg theorem asserts that every formula over a real closed field is equivalent to a quantifier-free formula. We are interested in the size of the resulting formula. It is known ([10,7]) that in general, the size can increase double-exponentially if we allow a linear number of quantifier alternations. The situation is better if the number of quantifier alternations is small. The result of Grigoriev [9] (see also [3], Chapter 14, Theorem 14.16) implies the following: every $\Sigma_k$-formula $A$ of size $s$ is equivalent to a quantifier-free formula of size $2^{s^{O(k)}}$. More specifically, $A$ can be written as
A similar result holds over any algebraically closed field, as shown by Chistov and Grigoriev [6] (see also Corollary 6.4 in [12]). Every $\Sigma_k$-formula $A$ of size $s$ is equivalent to a quantifier-free formula of the form
where $m$, the degrees of the $f_i$, and the formula size of $G$ can again be bounded by $2^{s^{O(k)}}$. Let us remark that the cited papers contain more detailed information than presented here: they bound the number of the $f_i$ in (5.2) and their degrees separately, in terms of the number of atomic formulas in $A$, their degrees, and the number of quantifier alternations. Moreover, the constants in the big-O are different in the two cases (algebraically closed versus real closed field).
We now proceed to the proof of Theorem 2.2. At a high level, we use quantifier elimination to reduce to the quantifier-free case, and apply Warren's theorem to atoms of the quantifier-free formula.
For a formula $A$ with no free variables, let $[A] \in \{0,1\}$ denote its truth-value. Let
Given $a \in F^n$, the truth-pattern of $\beta$ at $a$ is determined by the sign-pattern of $f$ at $a$, and hence the number of truth-patterns is at most the number of sign-patterns of $f$. Using (5.1), the latter can be bounded by $(8edM)^n$, which can be written as $(2^{s^{O(k)}} m)^n$. This completes the proof for the real closed case. In the case of algebraically closed fields the argument is the same, replacing sign-patterns by zero-patterns and using the algebraically closed version of quantifier elimination.
Proof of Theorem 2.2. Assume that $s \ge n$ is such that every Boolean function in $n$ variables can be defined by a $\Sigma_k$-formula of size at most $s$. Let $\mathcal{F}$ be the set of such formulas with free variables among $x_1, \dots, x_n$. Introduce fresh variables $y = y_1, \dots, y_s$ and $z = z_1, \dots, z_s$. A formula $S(x, y)$, with $x = x_1, \dots, x_n$, will be called a skeleton if (a) it contains only variables from $x, y, z$ and no constant symbols, and (b) its free variables are from $x$ or $y$. We think of $y$ as representing constants from $F$ and $z$ as names of bound variables. Let $\mathcal{S}$ be the set of $\Sigma_k$-skeletons of size at most $s$. Hence, for every $A(x) \in \mathcal{F}$ there exist $S(x, y) \in \mathcal{S}$ and $a \in F^s$ such that $A(x) = S(x, a)$ (up to renaming of the bound variables $z$). Unlike $\mathcal{F}$, $\mathcal{S}$ is a finite set. A skeleton is a string of length at most $s$ over the alphabet $x, y, z, \forall, \exists, \wedge, \dots$ of size $O(s)$. Therefore, $|\mathcal{S}| \le 2^{O(s \log s)}$.
We will say that a skeleton $S(x, y)$ defines a Boolean function $f : \{0,1\}^n \to \{0,1\}$ if there exists $a \in F^s$ such that $S(x, a)$ defines $f$. Hence, every $f$ is defined by some skeleton in $\mathcal{S}$. We now want to bound the number of functions defined by a given skeleton $S(x, y) \in \mathcal{S}$. Let $\beta$ be the sequence of the $2^n$ formulas $S(\sigma, y)$, $\sigma \in \{0,1\}^n$. Each formula in $\beta$ has free variables in $y$. For a given $a \in F^s$, the function defined by $S(x, a)$ is uniquely determined by the truth-pattern of $\beta$ at $a$: indeed, $S(x, a)$ defines the function $f$ such that $f(\sigma) = [S(\sigma, a)]$ for all $\sigma$. Hence, the number of functions defined by $S(x, y)$ is at most the number of truth-patterns of $\beta$. By the previous lemma, this can be bounded by $(2^{s^{O(k)}} 2^n)^s$, which is of the form $2^{s^{O(k)}}$ (we assumed $s \ge n$).
Altogether, skeletons in $\mathcal{S}$ can define at most $2^{O(s \log s)} \cdot 2^{s^{O(k)}}$ Boolean functions. Since the total number of functions is $2^{2^n}$, we must have $s \ge 2^{\Omega(n/k)}$.
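The final step can be illustrated numerically. Taking logarithms, the count of definable functions can reach $2^{2^n}$ only if $c\, s \log_2 s + s^{ck} \ge 2^n$ for the constant $c$ hidden in the big-O terms. The sketch below assumes $c = 1$ purely for illustration (the actual constant is not specified in the text) and searches for the smallest $s \ge n$ satisfying this necessary condition; the function name is hypothetical.

```python
import math


def min_formula_size(n: int, k: int, c: float = 1.0) -> int:
    """Smallest s >= n with c*s*log2(s) + s**(c*k) >= 2**n.

    The constant c stands in for the unspecified constants in
    2^{O(s log s)} and 2^{s^{O(k)}}; c = 1 is an illustrative assumption.
    """
    target = 2 ** n
    s = max(n, 2)
    while c * s * math.log2(s) + s ** (c * k) < target:
        s += 1
    return s


if __name__ == "__main__":
    n = 16
    for k in (1, 2, 4, 8):
        s = min_formula_size(n, k)
        # Compare with 2**(n/k), the shape of the bound in Theorem 2.2.
        print(k, s, round(2 ** (n / k)))
```

For $k \ge 2$ the term $s^{ck}$ dominates, so the threshold tracks $2^{n/(ck)}$, in line with the bound $s \ge 2^{\Omega(n/k)}$; this is a sanity check on the arithmetic only, not a computation of the actual constants.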

Open problems
The proof of Theorem 2.2 uses machinery from algebraic geometry and real algebraic geometry as a black box. As a consequence, it teaches us little about the nature of the problem. For example, the aforementioned result of Rothvoß [18] is formally a consequence of Theorem 2.2. But his proof provides an additional insight into the geometry of polytopes which is completely lacking in our setting.
Problem 6.1. Find a more direct and self-contained proof of Theorem 2.2, at least for $\Sigma_1$-formulas.
This would be interesting especially over the reals. Already the case of linear separation complexity from Section 3, where the Σ 1 -formula contains only linear inequalities or equalities, seems to require a new insight. On the other hand, the first ingredient of the current proof of Theorem 2.2 over algebraically closed fields, counting zero-patterns of a polynomial map, has been given a simple proof in [17]. Hence, the question may be easier in the algebraically closed setting.
Second, the upper and lower bounds given by Theorems 2.1 and 2.2 do not match exactly (they hardly can, for the constants are hidden in Theorem 2.2). Theorem 2.1 implies that every Boolean function can be defined by a $\Sigma_3$-formula of size $O(2^{n/2})$. But for $\Sigma_1$- or $\Sigma_2$-formulas, we are left with the trivial upper bound of $2^{n(1-o(1))}$. Whether we can improve the constant in the exponent is intriguing already in the case of $\Sigma_1$-formulas.
Problem 6.2. Over $\mathbb{R}$, can every $f : \{0,1\}^n \to \{0,1\}$ be defined by a $\Sigma_1$-formula of size $O(2^{(1-\varepsilon)n})$ for some $0 < \varepsilon < 1$?