The shifted partial derivative complexity of Elementary Symmetric Polynomials

We continue the study of the shifted partial derivative measure, introduced by Kayal (ECCC 2012), which has been used to prove many strong depth-4 circuit lower bounds starting from the work of Kayal, and that of Gupta et al. (CCC 2013). We show a strong lower bound on the dimension of the shifted partial derivative space of the Elementary Symmetric Polynomials of degree $d$ in $N$ variables for $d < \log N / \log\log N$. This extends the work of Nisan and Wigderson (Computational Complexity 1997), who studied the partial derivative space of these polynomials. Prior to our work, there have been no results on the shifted partial derivative measure of these polynomials. Our result implies a strong lower bound for Elementary Symmetric Polynomials in the homogeneous ΣΠΣΠ model with bounded bottom fan-in. This strengthens (under our degree assumptions) a lower bound of Nisan and Wigderson, who proved the analogous result for the homogeneous ΣΠΣ model (i.e. ΣΠΣΠ formulas with bottom fan-in 1). Our main technical lemma gives a lower bound for the ranks of certain inclusion-like matrices, and may be of independent interest.


Introduction
Motivation. In an influential paper of Valiant [23] the two complexity classes VP and VNP were defined, which can be thought of as algebraic analogues of the Boolean complexity classes P and NP, respectively. Whether VP equals VNP or not is one of the most fundamental problems in the study of algebraic computation. It follows from the work of Valiant [23] that a super-polynomial lower bound for arithmetic circuits computing the Permanent implies VP ≠ VNP.
The best known lower bound on uniform polynomials for general arithmetic circuits is $\Omega(N \lg N)$ [3], which is unfortunately quite far from the desired super-polynomial lower bound. Over the years, though there has been no stronger lower bound for general arithmetic circuits, many super-polynomial lower bounds have been obtained for special classes of arithmetic circuits [15,17,16].
A very interesting such subclass of arithmetic circuits is the class of bounded-depth arithmetic formulas. The question of proving lower bounds for bounded-depth formulas, and in particular depth-3 and depth-4 formulas, has received a lot of attention subsequent to the recent progress in efficient depth reduction of arithmetic circuits [24,1,11,22]. This sequence of results essentially implies that "strong enough" lower bounds for depth-4 homogeneous formulas suffice to separate VP from VNP. More formally, it proves that any sequence $\{f_N\}_N$ of homogeneous $N$-variate degree $d = N^{O(1)}$ polynomials in VP has depth-4 homogeneous formulas of size $N^{O(\sqrt{d})}$. Hence, proving an $N^{\omega(\sqrt{d})}$ lower bound for depth-4 homogeneous formulas suffices to separate VP from VNP.
Even more can be said about the depth-4 formulas obtained in the above results. For any integer parameter $t \le d$, they give a ΣΠΣΠ formula for $f_N$ where the layer 1 product gates (just above the inputs) have fan-in at most $t$ and the layer 3 gates are again Π gates with fan-in $O(d/t)$. We will refer to such formulas as $\Sigma\Pi^{[O(d/t)]}\Sigma\Pi^{[t]}$ formulas. The depth-reduction results mentioned above produce a depth-4 homogeneous $\Sigma\Pi^{[O(d/t)]}\Sigma\Pi^{[t]}$ formula of size $N^{O((d/t)+t)}$ and top fan-in $N^{O(d/t)}$; at $t = \sqrt{d}$, we get the above depth-reduction result. The tightness of these results follows from recent progress on lower bounds for the model of $\Sigma\Pi^{[O(d/t)]}\Sigma\Pi^{[t]}$ circuits. A flurry of results followed the groundbreaking work of Kayal [7], who augmented the partial derivative method of Nisan and Wigderson [15] to devise a new complexity measure called the shifted partial derivative measure, using which he proved an exponential lower bound for a special class of depth-4 circuits. Building on this, the first non-trivial lower bound for $\Sigma\Pi^{[O(d/t)]}\Sigma\Pi^{[t]}$ formulas was proved by Gupta, Kamath, Kayal, and Saptharishi [5] for the determinant and permanent polynomials. This was further improved by Kayal, Saha, and Saptharishi [9], who gave a family of explicit polynomials in VNP whose shifted partial derivative complexity was (nearly) as large as possible, and hence showed a lower bound of $N^{\Omega(d/t)}$ for the top fan-in of $\Sigma\Pi^{[O(d/t)]}\Sigma\Pi^{[t]}$ formulas computing these polynomials. Later, a similar result for a polynomial in VP was proved in [4], and this was subsequently strengthened by Kumar and Saraf [12], who gave a polynomial computable by homogeneous ΠΣΠ formulas which has no $\Sigma\Pi^{[O(d/t)]}\Sigma\Pi^{[t]}$ formulas of top fan-in smaller than $N^{\Omega(d/t)}$. Finally, using a variant of the shifted partial derivative measure, Kayal et al. [8] and Kumar and Saraf [13] were able to prove similar lower bounds for general depth-4 homogeneous formulas as well.
In this work, we investigate the shifted partial derivative measure of the Elementary Symmetric Polynomials, which form a very natural family of polynomials whose complexity has been the focus of many previous works [15,21,20,6]. Nisan and Wigderson [15] proved tight lower bounds on the depth-3 homogeneous formula complexity of these polynomials. Shpilka and Wigderson [21] and Shpilka [20] studied the general (i.e. possibly inhomogeneous) depth-3 circuit complexity of these polynomials, and showed that for certain degrees, the $O(N^2)$ upper bound due to Ben-Or (see [21]) is tight.
Under some degree constraints, we show strong lower bounds on the dimension of the shifted partial derivative space of these polynomials, which imply that the Elementary Symmetric Polynomial on $N$ variables of degree $d$ cannot be computed by a $\Sigma\Pi^{[O(d/t)]}\Sigma\Pi^{[t]}$ circuit of top fan-in less than $N^{\Omega(d/t)}$. This strengthens the result of Nisan and Wigderson [15] for these degree parameters.
By the upper bound of Ben-Or mentioned above, this also gives the first example of an explicit polynomial with small ΣΠΣ circuits for which such a strong lower bound is known.
Results. We show that, for a suitable range of parameters, the shifted partial derivative measure of the $N$-variate elementary symmetric polynomial of degree $d$ is large. [...] It was observed by [5] that for any homogeneous multilinear polynomial $f$ on $N$ variables of degree $d$, we have $\dim(\langle \partial_k f \rangle_{\le \ell}) \le \binom{N+\ell}{\ell} \cdot \binom{N}{k}$, which is close to the numerator in the above expression. The theorem above should be interpreted as saying that the dimension is not too far from this upper bound.
A corollary of our main result is an $N^{\Omega(d/t)}$ lower bound on the top fan-in of any $\Sigma\Pi^{[O(d/t)]}\Sigma\Pi^{[t]}$ formula computing $S^d_N$ (Theorem 2). In particular, any homogeneous ΣΠΣΠ circuit $C$ with bottom fan-in bounded by $t$ computing $S^d_N$ must have top fan-in at least $N^{\Omega(d/t)}$.
By the above depth reduction results, this lower bound is tight up to the constant factor in the exponent. Before our work, [15] proved a lower bound of $N^{\Omega(d)}$ for $S^d_N$ for all $d$, but only with respect to ΣΠΣ circuits (i.e. the case $t = 1$).
Techniques. The analysis of the shifted partial derivative measure for any polynomial essentially requires the analysis of the rank of a matrix arising from the shifted partial derivative space. In this work, we analyse the matrix arising from the shifted partial derivative space of the symmetric polynomials. Our analysis is quite different from previous works (such as [4,8,13]), which are based on either monomial counting (meaning that one finds a large identity or upper triangular submatrix inside the matrix) or an analytic inequality of Alon [2].
In our analysis of the shifted partial derivative space, we define a more complicated version of the Inclusion matrix (known to be full rank) and lower bound its rank by using a novel technique, which we describe in the next section.
Disjointness and inclusion matrices arise naturally in other branches of theoretical computer science such as Boolean circuit complexity [18] and communication complexity [14, Chapter 2], and also in combinatorics [25,10]. Therefore, we believe that our analysis of the inclusion-like matrix arising from the symmetric polynomial may find other applications.
Organisation of the paper. In Section 2, we set up basic notation, fix the main parameters, and give a high-level outline of our proof of Theorem 1. In Section 3 we give the actual proof. The formula size lower bound from Theorem 2 is established in Section 4. Due to space constraints, several proofs are omitted.

Proving Theorem 1: High-level outline
Notation: For a positive integer $n$, we let $[n] = \{1, \dots, n\}$. [...] Our complexity measure is the dimension of this space, i.e., $\dim(\langle \partial_k f \rangle_{\le \ell})$ [7,5]. Let $\mathcal{M}_N^{\ell}$ be the set of monomials of degree at most $\ell$ over the variables $X$. For integers $n_1, \dots, n_p$, let [...] Given $p > 0$, a monomial $m \in \mathcal{M}_N^{\ell}$ can be uniquely written as [...] For a finite set $S$, let $U(S)$ denote the uniform distribution over the set $S$. We assume that we are working over a field $\mathbb{F}$ of characteristic zero. Our results also hold in non-zero characteristic, but the first step of our proof (Lemma 1) becomes a little more cumbersome (this part is omitted in this version).
Proof Outline. Our lower bound on $\dim(\langle \partial_k S^d_N \rangle_{\le \ell})$ proceeds in 3 steps.
Step 1: We choose a suitable subset $S$ of the partial derivative space. It is convenient to work with a set that is slightly different from the set of partial derivatives themselves. To understand the advantage of this, consider the simple setting where we are looking at the partial derivatives of order 1 of the degree-2 polynomial $S^2_N$. It is not difficult to show that the partial derivative with respect to the variable $x_i$ is $r_i := \sum_{j \ne i} x_j$. Over characteristic zero, this set of polynomials is known to be linearly independent. One way to show this is by showing that each polynomial $x_i$ can be written as a linear combination of the $r_j$s; explicitly, one can check that $x_i = \frac{1}{N-1} \sum_{j} r_j - r_i$. Since the $x_i$s are distinct monomials, they are clearly linearly independent and we are done. This illustrates the advantage in moving to a "sparser" basis for the partial derivative space. We do something like this for larger $d$ and $k$ (Lemma 1).
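The degree-2 example can be checked mechanically. Since $\sum_j r_j = (N-1) \sum_j x_j$, the combination $\frac{1}{N-1}\sum_j r_j - r_i$ should equal $x_i$ whenever $N - 1$ is invertible. The following is a minimal sketch of that check over the rationals; the function name `check_sparse_basis` and the coefficient-vector representation are illustrative choices, not notation from the paper.

```python
from fractions import Fraction


def check_sparse_basis(n):
    """Check that x_i = (1/(n-1)) * sum_j r_j - r_i, where r_i = sum_{j != i} x_j.

    Each linear form is stored as its coefficient vector over x_1, ..., x_n.
    Requires n >= 2 so that n - 1 is non-zero.
    """
    # r[i][c] is the coefficient of x_c in r_i.
    r = [[Fraction(0 if c == i else 1) for c in range(n)] for i in range(n)]
    # Coefficient vector of sum_j r_j (every entry equals n - 1).
    total = [sum(r[j][c] for j in range(n)) for c in range(n)]
    for i in range(n):
        combo = [total[c] / (n - 1) - r[i][c] for c in range(n)]
        unit = [Fraction(1 if c == i else 0) for c in range(n)]
        if combo != unit:
            return False
    return True


if __name__ == "__main__":
    print(all(check_sparse_basis(n) for n in range(2, 10)))
```

The division by $n - 1$ is where characteristic zero (or, more generally, a characteristic not dividing $N - 1$) enters in this small case.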
Step 2: After choosing the set $S$, we construct the set $P$ of shifts of $S$ (actually, we will only consider a subset of $P$) and lower bound the rank of the corresponding matrix $M$. To do this, we also prune the set of rows of the matrix $M$. In other words, we consider a carefully chosen set of monomials $\mathcal{M}$ and project each polynomial in $P$ down to these monomials. The objective in doing this is to infuse some structure into the matrix while at the same time preserving its rank (up to small losses). Having chosen $\mathcal{M}$, we show that the corresponding submatrix can be block-diagonalized into matrices each of which is described by a simple inclusion pattern between the (tuples of) sets labelling its rows and columns. This is done in Lemmas 4, 5, 6.
Step 3: The main technical step in the proof is to lower bound the rank of the inclusion pattern matrix mentioned above with an algebraic trick. We first find a full-rank matrix that is closely related to our matrix and then show that the columns of our matrix can (with the aid of just a few other columns) generate the columns of the full-rank matrix.
The main parameters. Let $N$, $d$ and $t$ be fixed. Throughout, we assume that $d \le (1/10) \lg N / \lg\lg N$. Let $\tau = 4t + 1$, $\delta = 1/(2\tau)$ and $\ell = N^{1-\delta}$. Finally, let $k$ be such that $d - k = \tau k$. (In particular, we assume that $4t + 2 \le d$.) The following are easy to verify for our choice of parameters: [...]
Remark 1. In the above setting of parameters, $d$ has to be divisible by $\tau + 1 = 4t + 2$. For ease of exposition, we present the proof for these parameters. Our proof can be modified so that it works for any $d$ large enough as compared to $\tau$ (this part is omitted in this short version).
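To make the dependencies among the parameters concrete, the sketch below computes $\tau$, $\delta$, $\ell$ and $k$ from $N$, $d$ and $t$ exactly as fixed above. The function name `main_parameters`, the returned dictionary, and the decision to leave $\ell$ as an unrounded real number are assumptions made for illustration; the text does not specify a rounding convention.

```python
import math


def main_parameters(N, d, t):
    """Compute tau, delta, ell, k from N, d, t following the stated choices.

    Raises ValueError when d is not divisible by tau + 1 = 4t + 2,
    since then no integer k satisfies d - k = tau * k.
    """
    tau = 4 * t + 1
    if d % (tau + 1) != 0:
        raise ValueError("d must be divisible by tau + 1 = 4t + 2")
    k = d // (tau + 1)          # d - k = tau * k  <=>  d = (tau + 1) * k
    delta = 1 / (2 * tau)
    ell = N ** (1 - delta)      # left unrounded (assumption)
    # Standing degree assumption: d <= (1/10) * lg N / lg lg N.
    lg_n = math.log2(N)
    degree_ok = lg_n > 1 and d <= 0.1 * lg_n / math.log2(lg_n)
    return {"tau": tau, "delta": delta, "ell": ell, "k": k,
            "degree_ok": degree_ok}


if __name__ == "__main__":
    print(main_parameters(N=2 ** 200, d=6, t=1))
```

The `degree_ok` flag only reports whether the standing degree assumption holds for the given inputs; for small $N$ it will typically be false, which reflects how restrictive the degree regime is.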
Hence, a lower bound on the dimension of span($P$) is also a lower bound on $\dim(\langle \partial_k S^d_N \rangle_{\le \ell})$.

Choice of shifts: Step 2 of the proof
Instead of considering arbitrary shifts $m$ as in the definition of $P$, we will consider shifts by monomials $m$ with various values of $|\mathrm{supp}_i(m)|$ for $i \in [\tau]$. We first present a technical lemma that is needed to establish the lower bound. It is a concentration bound for support sizes in random monomials.
Remark 2. By Lemma 2, for any good signature $(s_1, \dots, s_\tau)$, we have [...] $r_i(s) = s_i$ for $i \le \tau - 1$ and $r_\tau(s) = s_\tau + k$; also, let $r(s) = \sum_i r_i(s) = \sum_i s_i + k$. Usually the signature $s$ will be clear from context, and we use $r_i$ and $r$ instead of $r_i(s)$ and $r(s)$ respectively. The matrix $M(s_1, \dots, s_\tau)$ is the matrix whose columns are indexed by polynomials $m \cdot p_T \in P(s_1, \dots, s_\tau)$ and rows by the monomials $w \in \mathcal{M}_N^{\ell+d-k}(r_1, \dots, r_\tau)$. The entry in row $w$ and column $m \cdot p_T$ is the coefficient of the monomial $w$ in the polynomial $m \cdot p_T$.
Note that the columns of $M(s_1, \dots, s_\tau)$ are simply the polynomials in $P(s_1, \dots, s_\tau)$ projected to the monomials that label the rows. In particular, a lower bound on the rank of $M(s_1, \dots, s_\tau)$ implies a lower bound on the dimension of the vector space spanned by $P(s_1, \dots, s_\tau)$.
It is not too hard to see that $M(s_1, \dots, s_\tau)$ has $|P(s_1, \dots, s_\tau)|$ columns but only $|P(s_1, \dots, s_\tau)| / \binom{s_\tau + k}{k}$ rows. Hence, the rank of the matrix is no more than the number of rows in the matrix. The following lemma, proved in Section 3.3, shows a lower bound that is quite close to this trivial upper bound.
Since $P(s_1, \dots, s_\tau) \subseteq P \subseteq \langle \partial_k f \rangle_{\le \ell}$, the above immediately yields a lower bound on $\dim(\langle \partial_k f \rangle_{\le \ell})$. Our final lower bound, which further improves this, is proved by considering polynomials corresponding to a set of signatures.

Bounding the rank of M : Step 3 of the proof
We now prove the lower bound on the rank of the matrix $M(s_1, \dots, s_\tau)$ as claimed in Lemma 3. We block diagonalize it with matrices that have a simple combinatorial structure (their entries are 0 or 1 depending on intersection patterns of the sets that label the rows and columns). We then lower bound the ranks of these matrices: this is the main technical step in the proof.
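For orientation, the simplest matrix of this 0/1 inclusion type is the classical inclusion matrix, whose rows are indexed by $a$-subsets and columns by $b$-subsets of $[n]$, with entry 1 exactly when the row set is contained in the column set. Over characteristic zero it is known to have full row rank $\binom{n}{a}$ when $a \le b$ and $a + b \le n$. The sketch below checks this for small parameters by exact Gaussian elimination over the rationals. It is not the inclusion-like matrix analysed in this section, and the helper names `rank` and `inclusion_matrix` are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def rank(rows):
    """Rank of a matrix (list of rows) by exact elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rk = 0
    for c in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[rk])]
        rk += 1
    return rk


def inclusion_matrix(n, a, b):
    """0/1 matrix with rows = a-subsets, columns = b-subsets, entry 1 iff A is a subset of B."""
    row_sets = [set(A) for A in combinations(range(n), a)]
    col_sets = [set(B) for B in combinations(range(n), b)]
    return [[1 if A <= B else 0 for B in col_sets] for A in row_sets]


if __name__ == "__main__":
    for n, a, b in [(5, 1, 2), (6, 2, 3), (6, 2, 4)]:
        print((n, a, b), rank(inclusion_matrix(n, a, b)), comb(n, a))
```

Exact rational arithmetic is used instead of floating point so that the computed rank is not affected by rounding.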
[...] $k$ by our choice of parameters $r_\tau$ and $k$. Putting this together with the fact that $w = X_A \cdot m$ for $|A| = d - k$, we see that $X_A$ can only 'contribute' to the "degree at most $\tau$" part of $m$: formally, $w = X_A \cdot m$ and hence $\tilde{w} = \tilde{m}$.
Conversely, assume that $\tilde{w} = \tilde{m}$ and the inclusions $T \subseteq R_1$, [...] Hence the entry in row $w$ and column $m \cdot p_T$ is 1.
We now lower bound the rank of each block in the block diagonalization.
Lemma 7. For a good signature $s = (s_1, \dots, s_\tau)$, and the corresponding $(r_1, \dots, r_\tau)$ as in Definition 3, [...]
Proof Sketch. Let $M$ be a diagonal block of the matrix $M(s_1, \dots, s_\tau)$. Recall from Lemma 6 that such a diagonal block is defined by a monomial $\tilde{w}$ and a subset $R \subseteq [N]$. Rows of this block are labelled with all monomials $w \equiv [\tilde{w}, R_1, \dots, R_\tau]$ such that $R_1 \cup \dots \cup R_\tau = R$, and columns of this block are labelled with all polynomials $m \cdot p_T$ where $m \equiv [\tilde{w}, S_1, \dots, S_\tau]$ is such that $T \cup S_1 \cup \dots \cup S_\tau = R$. First, we set up some notation.
The rows and columns of $M$ are indexed by elements of $\mathcal{X}$ and $\mathcal{Y}$ respectively (Lemma 6). Let $I$ be the identity matrix with rows/columns indexed by elements of $\mathcal{X}$. We define two auxiliary matrices $M_1$ and $M_2$ as follows. The rows and columns of $M_1$ are indexed by elements of $\mathcal{X}$. The entries of $M_1$ are in $\{0, 1\}$ and are defined as follows: [...] The rows and columns of $M_2$ are indexed by elements of $\mathcal{X}$ and $\mathcal{Z}$ respectively. The entries of $M_2$ are in $\{0, 1\}$ and are defined as follows: [...] Our proof proceeds as follows:
1. Show that the columns of $M$ and $M_2$ together span the columns of $M_1$.
2. Show that the columns of $M_1$ and $M_2$ together span the columns of $I$.
It then follows that [...] which is what we had set out to prove. To prove steps 1 and 2, we describe columns of $M$, $M_1$, $M_2$, $I$ using functions that express whether two partitions are related in a certain way. In particular, we express the inclusion relations described in Lemma 5, which characterise the non-zeroes in $M$. The functions we use are multivariate polynomials, whose evaluations at the characteristic vectors of row indices give the entries in the rows. A careful choice of a small basis for these functions yields the result (details are omitted in this short version).
Lemma 3 can now be proved using the block-diagonal decomposition (Lemma 6) and the rank lower bound (Lemma 8).

Putting it together
We now have all the ingredients to establish that the shifted partial derivative measure of $S^d_N$ is large.
Proof (of Theorem 1). By Lemma 1, $\dim(\langle \partial_k S^d_N \rangle_{\le \ell}) \ge \dim(\mathrm{span}(P))$. This in turn is at least as large as $\mathrm{rank}(M(S))$, since $M(S)$ is a submatrix of the matrix that describes a basis for $P$. We now choose a well-separated set of good signatures $S$ and apply Lemmas 4, 3, and 2 to lower bound the rank of $M(S)$. This will give us our lower bound on $\dim(\langle \partial_k S^d_N \rangle_{\le \ell})$.
Let us see how to choose $S$. Let $S_0$ denote the set of all good signatures. For integers $d_1, \dots, d_\tau \in [2d]$, denote by $S(d_1, \dots, d_\tau)$ the set of signatures $(s_1, \dots, s_\tau) \in S_0$ such that $s_i \equiv d_i \pmod{2d}$ for each $i \in [\tau]$. It is easily checked that for any choice of $d_1, \dots, d_\tau \in [2d]$, the set of signatures $S(d_1, \dots, d_\tau)$ is well separated. Since there are $(2d)^\tau$ choices for $d_1, \dots, d_\tau$, there must be one such that [...] We fix $d_1, \dots, d_\tau$ so that the above holds and let $S = S(d_1, \dots, d_\tau)$. This is the set of signatures we will consider.
By Lemma 4, we know that the rank of $M(S)$ is equal to the sum of the ranks of $M(s)$ for each $s \in S$. Hence, by Lemma 3, we have [...] Plugging this bound into (3), we see that [...] where the second inequality follows from the fact that all signatures $s \in S$ are good, so $s_\tau \le 3\hat{s}_\tau/2$ and hence $\binom{s_\tau + k}{k} \le s_\tau^k \le (3\hat{s}_\tau/2)^k \le (3\sqrt{N}/2)^k$ (using Lemma 2); the final inequality is a consequence of Equation (2).
Finally, by Lemma 2 (see also Remark 2), [...] (1)) · $\binom{N+\ell}{\ell}$, which along with the above computation yields the claimed lower bound on $\mathrm{rank}(M(S))$.

Remark 3. For any multilinear polynomial $F(X)$ on $N$ variables, the quantity $\dim(\langle \partial_k F \rangle_{\le \ell})$ is at most the number of monomial shifts, which is $\binom{N+\ell}{\ell}$, times the number of possible partial derivatives of order $k$, which is at most $\binom{N}{k}$. Our result says that this trivial upper bound is (in some sense) close to optimal for the polynomial $S^d_N$ (the $(\sqrt{N})^k$ factor in the denominator can be made $N^{\varepsilon k}$ for any constant $\varepsilon > 0$; see the discussion at the end of the proof of Theorem 2). All previous lower bound results using the shifted partial derivative method also obtain similar statements [5,4,12,13].
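For very small parameters the measure can be computed by brute force and compared with the trivial upper bound $\binom{N+\ell}{\ell} \cdot \binom{N}{k}$. The sketch below does this for $S^d_N$, using the fact that the derivative of $S^d_N$ with respect to $k$ distinct variables indexed by $T$ is the elementary symmetric polynomial of degree $d - k$ in the remaining variables, and taking shifts by all monomials of degree at most $\ell$. The function names and the exponent-vector encoding are illustrative, and the parameters used are far outside the regime of Theorem 1, so this only illustrates the definitions and does not test the theorem.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement
from math import comb


def rank(rows):
    """Rank of a matrix (list of rows) by exact elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rk = 0
    for c in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[rk])]
        rk += 1
    return rk


def exponent_vector(N, indices):
    """Exponent vector of the monomial that multiplies x_i once per listed index i."""
    e = [0] * N
    for i in indices:
        e[i] += 1
    return tuple(e)


def shifted_partial_dim(N, d, k, ell):
    """Dimension of span{ m * dS/dT : |T| = k, deg(m) <= ell } for S = S^d_N."""
    shifts = [exponent_vector(N, c)
              for deg in range(ell + 1)
              for c in combinations_with_replacement(range(N), deg)]
    polys = []
    for T in combinations(range(N), k):
        rest = [i for i in range(N) if i not in T]
        # dS/dT: all multilinear monomials of degree d - k avoiding T, coefficient 1.
        deriv = [exponent_vector(N, A) for A in combinations(rest, d - k)]
        for m in shifts:
            polys.append({tuple(a + b for a, b in zip(m, e)) for e in deriv})
    monomials = sorted(set().union(*polys))
    index = {w: j for j, w in enumerate(monomials)}
    rows = []
    for p in polys:
        row = [0] * len(monomials)
        for w in p:
            row[index[w]] = 1
        rows.append(row)
    return rank(rows)


if __name__ == "__main__":
    N, d, k, ell = 4, 2, 1, 1
    print(shifted_partial_dim(N, d, k, ell), comb(N + ell, ell) * comb(N, k))
```

For $N = 4$, $d = 2$, $k = 1$, $\ell = 1$ the span consists of all linear forms together with all quadratic monomials, so the dimension is $4 + 10 = 14$, against the trivial upper bound of $5 \cdot 4 = 20$.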

Lower bound on the size of depth four formulas
In this section, we establish the lower bounds claimed in Theorem 2. As in [5], we say that a ΣΠΣΠ formula $C$ is a $\Sigma\Pi^{[D]}\Sigma\Pi^{[t]}$ formula if the product gates at level 1 (just above the input variables) have fan-in at most $t$ and the product gates at level 3 have fan-in bounded by $D$.
The following is implicit in [5] and is stated explicitly in [9].
It is not hard to see that the above proof idea (with some changes in parameters) can be made to give lower bounds of $N^{\Omega(d/t)}$ for $D \le N^{1-\varepsilon}$ for any constant $\varepsilon > 0$. Specifically, choose $\tau = \Theta(t/\varepsilon)$ (instead of $4t + 1$) and $\delta$ such that $\delta\tau = 1 - \Theta(\varepsilon)$ (instead of $\frac{1}{2}$) in the entire proof. We omit the details.