Anti-concentration for polynomials of independent random variables

We prove anti-concentration results for polynomials of independent random variables with arbitrary degree. Our results extend the classical Littlewood-Offord result for linear polynomials, and improve several earlier estimates. We discuss applications in two different areas. In complexity theory, we prove near-optimal lower bounds for computing the Parity function, addressing a challenge posed by Razborov and Viola, and also address a problem concerning OR functions. In random graph theory, we derive a general anti-concentration result on the number of copies of a fixed graph in a random graph.


Introduction
Let ξ be a Rademacher random variable (taking values ±1 with probability 1/2 each) and let A = {a_1, . . . , a_n} be a multi-set in R (here n → ∞). Consider the random sum S := a_1 ξ_1 + · · · + a_n ξ_n, where the ξ_i are iid copies of ξ.
In 1943, Littlewood and Offord, in connection with their studies of random polynomials [20], raised the problem of estimating P(S ∈ I) for arbitrary coefficients a_i. They proved the following remarkable theorem.

Theorem 1.1. There is a constant B such that the following holds for all n. If all coefficients a_i have absolute value at least 1, then for any open interval I of length 1, P(S ∈ I) ≤ B n^{−1/2} log n.
Shortly after the Littlewood-Offord result, Erdős [12] removed the log n term, obtaining the optimal bound O(n^{−1/2}) via an elegant combinatorial proof. Littlewood-Offord type results are commonly referred to as anti-concentration (or small-ball) inequalities. Anti-concentration results have been developed by many researchers through the decades, and have recently found important applications in the theories of random matrices and random polynomials; see, for instance, [22] for a survey.
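To make the n^{−1/2} rate concrete, here is a small numerical sketch in Python (an illustration, not part of the original argument): for a_i ≡ 1, the largest point mass of S is the central binomial probability, which matches the Erdős bound.

```python
import math

def max_atom_probability(n):
    # For a_i = 1 the sum S = ξ_1 + ... + ξ_n has its largest point mass
    # at 0 (for n even), with P(S = 0) = C(n, n/2) / 2^n.
    return math.comb(n, n // 2) / 2 ** n

# By Stirling, C(n, n/2) / 2^n ~ sqrt(2/(pi*n)), matching the n^{-1/2}
# rate in the Erdős-Littlewood-Offord bound.
for n in (10, 100, 1000):
    p = max_atom_probability(n)
    print(n, p, p * math.sqrt(n))  # third column approaches sqrt(2/pi)
```

The third printed column stabilizes near sqrt(2/π) ≈ 0.7979, illustrating that the n^{−1/2} bound is attained up to a constant.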
The goal of this paper is to extend Theorem 1.1 to higher degree polynomials. Consider a multilinear polynomial of degree d,

P(ξ_1, . . . , ξ_n) = Σ_{S ⊆ [n], |S| ≤ d} a_S Π_{i∈S} ξ_i. (1)

The first result in this direction, due to Costello, Tao, and the third author [9], is the following.

Theorem 1.2. There is a constant B such that the following holds for all d, n. If there are m n^{d−1} coefficients a_S with absolute value at least 1, then for any open interval I of length 1, P(P(ξ_1, . . . , ξ_n) ∈ I) ≤ B m^{−1/2^{(d²+d)/2}}.

The exponent 1/2^{(d²+d)/2} tends very fast to zero with d, and it is desirable to improve this bound. For the case d = 2, Costello [8] obtained the optimal bound n^{−1/2+o(1)}. In a more recent paper [23], Razborov and Viola proved Theorem 1.3, which improves the bound in Theorem 1.2 to m^{−1/(d 2^{d+1})} via a simple counting argument.
Their result has been extended by Mossel, O'Donnell, and Oleszkiewicz [21] to general variables, at the cost of an extra term on the right-hand side involving the regularity of P (see Section 3).
The goal of this paper is to further improve these anti-concentration bounds, with several applications in complexity theory. Our new results will be nearly optimal in a wide range of parameters. Let [n] = {1, 2, . . . , n}. Following [23], we first introduce a definition.

Definition 1.5. For a degree-d multilinear polynomial of the form (1), the rank of P, denoted by rank(P), is the largest integer r such that there exist disjoint sets S_1, . . . , S_r ⊆ [n] of size d with |a_{S_j}| ≥ 1 for all j ∈ [r].
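The rank can be bounded from below mechanically: any disjoint family of size-d sets with coefficients of magnitude at least 1 certifies a lower bound on rank(P). A Python sketch, using a hypothetical dictionary encoding of the coefficients (frozenset of indices to coefficient, our convention rather than the paper's):

```python
def greedy_rank_lower_bound(coeffs, d):
    # coeffs: dict mapping frozenset of variable indices -> coefficient a_S.
    # Greedily collect pairwise disjoint sets S of size d with |a_S| >= 1;
    # the size of any such disjoint family is a lower bound on rank(P).
    used = set()
    count = 0
    for S in sorted(coeffs, key=sorted):  # deterministic iteration order
        if len(S) == d and abs(coeffs[S]) >= 1 and not (S & used):
            used |= S
            count += 1
    return count

example = {
    frozenset({1, 2}): 1.0,
    frozenset({1, 3}): 5.0,   # overlaps {1, 2}, skipped by greedy
    frozenset({3, 4}): -2.0,
    frozenset({5, 6}): 0.5,   # |a_S| < 1, does not count toward rank
}
print(greedy_rank_lower_bound(example, 2))
```

Greedy gives a valid certificate but not necessarily the maximum family, so it is a lower bound only.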
Our first main result concerns the Rademacher case. Let ξ i , i = 1, . . . , n be iid Rademacher random variables.
Theorem 1.6. There is an absolute constant B such that the following holds for all d, n. Let P be a polynomial of the form (1) whose rank r is at least 2. Then for any interval I of length 1,

P(P(ξ_1, . . . , ξ_n) ∈ I) ≤ min{ B d^{4/3} log^{1/2} r · r^{−1/(4d+1)}, exp(B d² (log log r)²) · r^{−1/2} }.

For the case when d is fixed, it has been conjectured [22] that P(P(ξ_1, . . . , ξ_n) ∈ I) = O(r^{−1/2}). This conjectural bound is a natural generalization of the Erdős-Littlewood-Offord result and is optimal, as shown by taking P = (ξ_1 + · · · + ξ_n)^d with n even. For this P, the rank r = Θ(n) and P(|P| ≤ 1/2) = P(P = 0) = Θ(n^{−1/2}). Our result confirms this conjecture up to the subpolynomial term exp(B d² (log log r)²).
In applications it is important that we can allow the degree d to tend to infinity with n. Our bounds in Theorem 1.6 are non-trivial for degrees up to c log r/log log r, for some positive constant c. Up to the log log term, this is as good as it gets, as one cannot hope for any non-trivial bound for polynomials of degree log_2 r. For example, the degree-d polynomial on 2^d · d variables defined by P(ξ) = Σ_{i=1}^{2^d} Π_{j=1}^{d} (ξ_{ij} + 1), where the ξ_{ij} are iid Rademacher random variables, has r = 2^d and P(P(ξ) = 0) = Ω(1).
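Under our reading of the construction, each of the 2^d blocks contributes Π_j (ξ_{ij} + 1), which equals 2^d when all its variables are 1 and 0 otherwise, so P(P = 0) = (1 − 2^{−d})^{2^d} → e^{−1}. The following Python sketch checks the closed form against simulation:

```python
import math
import random

def p_zero_exact(d):
    # P(ξ) = Σ_{i=1}^{2^d} Π_{j=1}^{d} (ξ_ij + 1) vanishes iff no block i
    # has all ξ_ij = 1; blocks are independent, each all-ones w.p. 2^{-d}.
    return (1.0 - 2.0 ** (-d)) ** (2 ** d)

def p_zero_monte_carlo(d, trials, rng):
    # Direct simulation of the polynomial over Rademacher inputs.
    hits = 0
    for _ in range(trials):
        value = sum(
            math.prod(rng.choice((-1, 1)) + 1 for _ in range(d))
            for _ in range(2 ** d)
        )
        hits += (value == 0)
    return hits / trials

print(p_zero_exact(8))                                # close to 1/e
print(p_zero_monte_carlo(8, 2000, random.Random(0)))  # agrees up to noise
```

So a degree-log_2 r polynomial can have constant probability of hitting a point, which is why the degree barrier above is essentially necessary.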
Next, we generalize our results to non-Rademacher distributions. As a first step, we consider the p-biased distribution on the hypercube. For p ∈ (0, 1), let µ_p denote the Bernoulli variable with p-biased distribution, P_{x∼µ_p}(x = 0) = 1 − p and P_{x∼µ_p}(x = 1) = p, and let µ_p^n be the product distribution on {0, 1}^n.

Theorem 1.7. There is an absolute constant B such that the following holds. Let P be a polynomial of the form (1) whose rank r is at least 2. Let p be such that r̃ := 2^d α^d r ≥ 3, where α := min{p, 1 − p}. Then for any interval I of length 1, the bounds of Theorem 1.6 hold with r replaced by r̃.

The distribution µ_p^n plays an essential role in probabilistic combinatorics. For example, it is the ground distribution for the random graphs G(N, p) (with n := N(N−1)/2). We discuss an application in the theory of random graphs in the next section.
Finally, we present a result that applies to virtually all sets of independent random variables, under a weak requirement that these variables do not concentrate on a short interval.

Theorem 1.8. There is an absolute constant B such that the following holds. Let ξ_1, . . . , ξ_n be independent (but not necessarily iid) random variables. Let P be a polynomial of the form (1) whose rank r is at least 2. Assume that there are positive numbers p and ε such that for each 1 ≤ i ≤ n there is a number y_i with min{P(ξ_i ≤ y_i), P(ξ_i > y_i)} = p and P(|ξ_i − y_i| ≥ 1) ≥ ε. Assume furthermore that r̃ := (pε)^d r ≥ 3. Then for any interval I of length 1, the bounds of Theorem 1.6 hold with r replaced by r̃.

Notice that even in the Gaussian case, Theorem 1.8 is incomparable to Theorem 1.4. If we use Theorem 1.4 to bound P(P ∈ I) for an interval I of length 1, then we need to set ε = Var(P)^{−1/2}, and the resulting bound becomes B (Var P)^{−1/(2d)}. For sparse polynomials it is typical that r is much larger than (Var P)^{1/d}, and in this case our bound is superior. To illustrate this point, let us fix constants d > c > 0 and consider P = Σ_{S ⊆ [n], |S| = d} a_S Π_{i∈S} ξ_i, where the a_S are iid Bernoulli random variables with P(a_S = 1) = n^{−c}. It is easy to show that the following holds with probability 1 − o(1):
• For any set X ⊂ {1, . . . , n} of size at least n/2, there is a subset S ⊂ X, |S| = d, such that a_S = 1.
• The number of nonzero coefficients is at most n^{d−c}.
In other words, these two conditions are typical for a sparse polynomial with roughly n^{d−c} nonzero coefficients. On the other hand, if the above two conditions hold, then we have Var(P) ≤ n^{d−c} and r ≥ n/2d (by a trivial greedy algorithm). Our bound implies that P(P ∈ I) ≤ C(d) n^{−1/2+o(1)}, while the Carbery-Wright bound only gives C(d) n^{−(d−c)/(2d)} = C(d) n^{−1/2+c/(2d)}.

The rest of the paper is organized as follows. In Section 2 below, we discuss applications in complexity theory and graph theory, with one long proof delayed to Section 7. Sections 3 and 4 are devoted to some combinatorial lemmas. In Section 5, we treat polynomials with Rademacher variables. The generalizations are discussed in Section 6. All asymptotic notations are used under the assumption that n tends to infinity. All constants are absolute unless otherwise noted.

Applications
2.1. Applications in complexity theory. We use our anti-concentration results to prove lower bounds for approximating Boolean functions by polynomials in the Hamming metric. The notion of approximation we consider is as follows.
We define d_{µ,ε}(f) to be the least d such that there is a degree-d polynomial which ε-approximates f with respect to µ.
An alternate (dual) way to view the above notion is in terms of distributions over low-degree polynomials ("randomized polynomials") which approximate the function in the worst case. In particular, by Yao's min-max principle, d_{µ,ε}(f) ≤ d for every distribution µ if and only if there exists a distribution D over polynomials of degree at most d which approximates f in the worst case: for all x, P_{Q∼D}(Q(x) ≠ f(x)) ≤ ε. Approximating Boolean functions by polynomials in the Hamming metric was first considered in the works of Razborov [24] and Smolensky [25] over fields of finite characteristic, as a technique for proving lower bounds for small-depth circuits. This was also studied in a similar context over the real numbers in [4] and [2]; the latter work uses such approximations to prove lower bounds for AC(0). More recently, in a remarkable result, Williams [27] (see also [28, 1]) used polynomial approximations in the Hamming metric to obtain the best known algorithms for all-pairs shortest paths and other related algorithmic questions. Here, we study lower bounds for the existence of such approximations.
In [23], Razborov and Viola introduced another way to look at this problem. For two functions f, g : {0, 1}^n → R, define their "correlation" to be the quantity Cor(f, g) := P_x(f(x) = g(x)), where x is uniformly distributed over {0, 1}^n. They highlighted the following challenge.

Challenge. Exhibit an explicit Boolean function f : {0, 1}^n → {0, 1} such that for any real polynomial P of degree log_2 n, one has Cor(f, P) ≤ 1/2 + o(1).

This challenge is motivated by studies in complexity theory and has connections to many other problems, such as the famous rigidity problem; see [23] for more discussion.
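For small n, correlations of this kind can be checked by exhaustive enumeration. The sketch below assumes the reading Cor(f, g) = P_x(f(x) = g(x)), the agreement probability used above:

```python
from itertools import product

def cor(f, g, n):
    # Cor(f, g) = P_x(f(x) = g(x)) with x uniform over {0, 1}^n.
    agree = sum(f(x) == g(x) for x in product((0, 1), repeat=n))
    return agree / 2 ** n

def parity(x):
    return sum(x) % 2

# A degree-1 guess such as x -> x_0 agrees with parity exactly when the
# remaining n-1 bits have even parity, i.e., with probability 1/2.
print(cor(parity, lambda x: x[0], 6))
```

Any constant polynomial also achieves correlation exactly 1/2 with parity, which is why the challenge asks to rule out anything noticeably better.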
The Parity function seems to be a natural candidate for problems like this. Razborov and Viola, using Theorem 1.3, proved that for all sufficiently large n, Cor(par_n, P) ≤ 1/2 + o(1) for any real polynomial P of degree at most (1/2) log_2 log_2 n.
With Theorem 1.6, we obtain the following improvement, which gets us within a log log n factor of the Challenge: for all sufficiently large n, Cor(par_n, P) ≤ 1/2 + o(1) for any real polynomial P of degree at most log n/(15 log log n).

Proof. Let d be the degree of P. Following the arguments in the proof of [23, Theorem 1.1], we can assume that P contains at least √n pairwise disjoint subsets S_i, each of size d and with non-zero coefficients. It suffices to show that the probability that P outputs a Boolean value is at most 1/2. By replacing P by q(x_1, . . . , x_n) := P((x_1 + 1)/2, . . . , (x_n + 1)/2), one converts the problem into one about a polynomial of the same degree defined on {±1}^n, in other words, on Rademacher variables. Then by Theorem 1.6, this probability is bounded by 2B d^{4/3} log^{1/2} n / n^{1/(8d+2)}, which is less than 1/2 for every d ≤ log n/(15 log log n) when n is sufficiently large.
Approximating AND/OR. One of the main building blocks in obtaining polynomial approximations in the Hamming metric is the following result for approximating the OR function.
By iteratively applying the above claim, Aspnes, Beigel, Furst, and Rudich [2] showed that AC(0) circuits of size s and depth d have ε-approximating polynomials of degree at most O((log s)(log(1/ε)) · (log(s/ε))^{d−1}). We prove the following lower bound for such approximations: there is a constant c > 0 and a distribution µ on {0, 1}^n such that any polynomial P : {0, 1}^n → R of degree d < c(log log n)/(log log log n) fails to 1/3-approximate the OR function with respect to µ. To the best of our knowledge, no ω(1) lower bound was known for approximating the OR function. We give an explicit distribution (directly motivated by the upper bound construction in [2]) under which OR has no 1/3-error polynomial approximation. The distribution µ on {0, 1}^n we consider is as follows:
(1) With probability 1/2 output x = 0.
(2) With probability 1/2 pick an index i ∈ [D] uniformly at random and output x ∼ µ^n_{2^{−a^i}}, for some suitably chosen parameters a, D.
The analysis then proceeds at a high level as in the lower bound for parity. However, we need some extra care with the inductive argument since, unlike for parity, we cannot consider arbitrary fixings of subsets of the coordinates of the OR function. We get around this hurdle by only considering fixings of parts of the input to 0, and by decreasing the bias p to make sure that these coordinates are indeed set to 0 with high probability. The details are deferred to Section 7.

2.2. The number of small subgraphs in a random graph. Consider the Erdős-Rényi random graph G(N, p). Let H be a small fixed graph (a triangle or C_4, say). The problem of counting the number of copies of H in G(N, p) is a fundamental topic in the theory of random graphs (see, for instance, the textbooks [5, 16]). In fact, one can consider the more general problem of counting the number of copies of H in a random subgraph of an arbitrary deterministic graph G on N vertices, formed by choosing each edge of G independently with probability p. We denote this random variable by F(H, G, p). In this setting we understand that H has constant size, and the size of G tends to infinity.
It has been observed that F can be written as a polynomial in terms of the edge-indicator random variables. For example, the number of copies of C_4 (the cycle of length 4) is F = Σ_{ijkl} ξ_{ij} ξ_{jk} ξ_{kl} ξ_{li}, where the summation is over all quadruples ijkl which form a C_4 in G, and the Bernoulli random variable ξ_{ij} represents the edge ij. Clearly, any polynomial of this type has n = e(G) iid Bernoulli p-biased variables ξ_{ij}, and its degree equals the number of edges of H. The rank r of F is exactly the size of the largest collection of edge-disjoint copies of H in G.
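As an illustration, the C_4-count polynomial can be evaluated directly from a 0/1 adjacency matrix by summing the monomial ξ_{ij}ξ_{jk}ξ_{kl}ξ_{li} over the three cyclic pairings of each 4-set of vertices. This sketch is only illustrative (the representation is ours):

```python
from itertools import combinations

def count_c4(adj):
    # F = Σ ξ_ij ξ_jk ξ_kl ξ_li over quadruples ijkl forming a C_4:
    # each 4-set of vertices admits exactly 3 pairings into a 4-cycle.
    n = len(adj)
    total = 0
    for a, b, c, d in combinations(range(n), 4):
        for p, q, r, s in ((a, b, c, d), (a, c, b, d), (a, b, d, c)):
            total += adj[p][q] * adj[q][r] * adj[r][s] * adj[s][p]
    return total

# K_4 contains exactly 3 copies of C_4.
k4 = [[int(i != j) for j in range(4)] for i in range(4)]
print(count_c4(k4))
```

Replacing the deterministic entries adj[i][j] by independent Bernoulli(p) variables turns this sum into exactly the degree-4 polynomial F(C_4, G, p) discussed above.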
The polynomial representation has been useful in proving concentration (i.e., large deviation) results for F (see [19, 26], for instance). Interestingly, it has turned out that one can also use it to derive anti-concentration results, in particular bounds on the probability that the random graph has exactly m copies of H.
By Theorem 1.7, we have:

Corollary 2.6. Assume that p is a constant in (0, 1). Then for fixed H and any integer m (which may depend on G), P(F(H, G, p) = m) ≤ r^{−1/2+o(1)}, where r is the size of the largest collection of edge-disjoint copies of H in G. In particular, if G = K_N, then P(F(H, K_N, p) = m) ≤ N^{−1+o(1)}.

A similar argument can be used to handle the number of induced copies of H, which can also be written as a polynomial, of degree at most v(v−1)/2, with v being the number of vertices of H. Details are left as an exercise.
Finally, let us mention that in a recent paper [13], Gilmer and Kopparty obtained a precise estimate for P(F(H, K_N, p) = m) in the case when H is a triangle. Their approach relies on a careful treatment of the characteristic function. It remains to be seen whether this method applies to our more general setting.

Regular polynomials
Our proofs of anti-concentration bounds use the techniques developed in the context of bounding the noise sensitivity of polynomial threshold functions in the works [10, 15, 18]. In particular, we use the concept of regular polynomials, the invariance principle of Mossel, O'Donnell, and Oleszkiewicz [21], and the regularity lemma of [10, 15]. In this and the following section, we discuss these tools.
To start, we define regular polynomials and discuss an anti-concentration result for them. The influence of the i-th variable on P is defined to be Inf_i = Inf_i(P) = Σ_{S: i∈S} a_S². Since Var(P) = Σ_{S≠∅} a_S², we have Σ_{i=1}^{n} Inf_i ≤ d · Var(P). Assume the random variables are ordered such that Inf_1 ≥ Inf_2 ≥ · · · ≥ Inf_n. For τ > 0, the τ-critical index of P is the least i such that Inf_{i+1} ≤ τ Σ_{j=i+1}^{n} Inf_j. If no such i exists, we say that P has τ-critical index ∞. If P has τ-critical index 0, we say that P is τ-regular. The following is a corollary of strong results from [7] and [21].
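Both quantities are directly computable from the coefficient representation; here is a Python sketch (the dictionary encoding of the coefficients is a hypothetical convention of ours, not from the text):

```python
def influences(coeffs, n):
    # Inf_i(P) = Σ_{S ∋ i} a_S^2 for P(ξ) = Σ_S a_S Π_{i∈S} ξ_i.
    inf = [0.0] * n
    for S, a in coeffs.items():
        for i in S:
            inf[i] += a * a
    return inf

def critical_index(inf, tau):
    # Least i with Inf_{i+1} <= τ Σ_{j=i+1}^n Inf_j, after sorting the
    # influences in decreasing order; returns float('inf') if none exists.
    # A return value of 0 means the polynomial is τ-regular.
    inf = sorted(inf, reverse=True)
    suffix = sum(inf)  # at step i this equals Σ_{j >= i+1} Inf_j
    for i, v in enumerate(inf):
        if v <= tau * suffix:
            return i
        suffix -= v
    return float('inf')

# A linear form with equal coefficients is τ-regular whenever τ >= 1/n,
# while one dominant variable pushes the critical index up.
print(critical_index([1.0] * 10, 0.5))
print(critical_index([100.0] + [1.0] * 30, 0.1))
```

The bound Σ_i Inf_i ≤ d Var(P) holds because each a_S² is counted once per element of S, i.e., at most d times.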

A regularization lemma
Proposition 3.1 would yield our desired bound in Theorem 1.6 if τ were small (say, at most r^{−1}). However, there is no guarantee of this. In order to go from the regular case to the general case, we will use the following regularization lemma, whose proof is a slight modification of [10, Theorem 1.1] (the version below gives better quantitative bounds in our applications). The main idea is to condition on the random variables with large influence. With high probability, the resulting polynomial is either regular or dominated by its constant part.
For a set S ⊂ [n], we consider a random assignment ρ ∈ {±1}^{|S|} which assigns values ±1 to the variables (ξ_i)_{i∈S}; we say that "ρ fixes S". For each such ρ, the polynomial P becomes a polynomial in (ξ_i)_{i∉S}, denoted by P_ρ. We write P_ρ = P*(ρ) + q_ρ((ξ_i)_{i∉S}), where P*(ρ) is the constant part of P_ρ, consisting of the monomials in (ξ_i)_{i∈S} only. For C > 0 and 0 < β < 1, we say that P_ρ is (C, β)-tight if conditions (5) and (6) hold. Note that it is always true that E_{(ξ_i)_{i∉S}} q_ρ = 0. We shall see later that (5) actually implies (6).

Proposition 4.1. There exist absolute constants C and C' such that the following holds. Let P(ξ_1, . . . , ξ_n) be a degree-d polynomial, and let 0 < τ, β < 1/3. Let α = C(d log log(1/β) + d log d) and τ' = (C' d log d log(1/τ))^d τ. Let M ∈ N be such that Mα/τ ≤ n. Then there exists a decision tree of depth at most Mα/τ with P at the root, variables ξ_i at the internal nodes, and a degree-d polynomial P_ρ at each leaf ρ, with the following property: with probability at least 1 − (1 − 1/(2C^d))^M, a random path from the root reaches a leaf ρ such that P_ρ is either τ'-regular or (C, β)-tight.
Proof. First, we consider the case when the τ -critical index of P is large. For a positive integer K, denote by [K] the set {1, . . . , K}.
Lemma 4.2. There exists a constant C such that the following holds. Let 0 < τ, β < 1/3 be parameters that may depend on n. Suppose that P has τ-critical index at least K = α/τ, where α = C(d log log(1/β) + d log d). Then for at least a 1/(2C^d) fraction of the restrictions ρ fixing [K], the polynomial P_ρ is (C, β)-tight.
Roughly speaking, (C, β)-tightness asserts that the resulting polynomial P_ρ has a large constant term compared to its random part, and therefore it concentrates around the constant part.
From (7) and (9), with probability at least 1/(2C^d) over the choice of ρ, (5) holds. For each such ρ, applying Theorem 4.4 to q_ρ yields (6), which completes the proof of Lemma 4.2.
Next, we consider the case when P has small critical index. We will use the following lemma ([10, Lemma 3.9]), which asserts that by assigning values to the random variables with large influences, one obtains a regular polynomial with significant probability.

Lemma 4.5. There exists an absolute constant C' such that the following holds. Let 0 < τ < 1/3. Assume that P has τ-critical index k ∈ [n]. Let ρ be a random restriction fixing [k], and let τ' = (C' d log d log(1/τ))^d τ. With probability at least 1/(2C^d) over the choice of ρ, the restricted polynomial P_ρ is τ'-regular.
Combining the two cases, we obtain the following lemma.

Lemma 4.6. For any P, at least one of the following holds:
(1) P is τ-regular.
(2) The τ-critical index of P is at least α/τ and the conclusion of Lemma 4.2 holds.
(3) The τ-critical index of P is k < α/τ and the conclusion of Lemma 4.5 holds.
Now we are ready for the proof of Proposition 4.1. The strategy is to apply Lemma 4.6 repeatedly, M times. First, if P is not τ-regular, we apply Lemma 4.6 to P and obtain an initial tree of depth at most α/τ. We know that at least a 1/(2C^d) fraction of the restricted polynomials P_ρ are "good", i.e., either τ'-regular or (C, β)-tight. We keep these as leaves of our final tree and leave them untouched during the subsequent stages. At the second stage, for each of the remaining "bad" polynomials P_ρ, we order the unrestricted variables in decreasing order of their influences in P_ρ, and then apply Lemma 4.6 to it. Note that the probability of reaching a bad leaf in this second tree is at most (1 − 1/(2C^d))². Continuing in this manner M times, we get the desired tree and complete the proof of Proposition 4.1.
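The restriction operation P ↦ P_ρ = P*(ρ) + q_ρ used throughout this section can be carried out symbolically. A sketch, again with a hypothetical coefficient-dictionary encoding of our own:

```python
def restrict(coeffs, assignment):
    # Split P_ρ into its constant part P*(ρ) and the random part q_ρ:
    # fixed variables (assignment: index -> ±1, i.e., ρ fixes its keys)
    # are substituted, and surviving monomials are re-keyed by their
    # remaining free variables.
    const = 0.0
    q = {}
    for S, a in coeffs.items():
        val = a
        free = set(S)
        for i in S:
            if i in assignment:
                val *= assignment[i]
                free.discard(i)
        if free:
            key = frozenset(free)
            q[key] = q.get(key, 0.0) + val
        else:
            const += val
    return const, q

P = {frozenset({1, 2}): 1.0, frozenset({2, 3}): 1.0}
print(restrict(P, {1: 1, 2: -1}))  # P*(ρ) = -1 and q_ρ = -ξ_3
```

Iterating this operation along a root-to-leaf path is exactly how the decision tree of Proposition 4.1 produces its leaf polynomials P_ρ.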

Proof of Theorem 1.6
The high-level argument for the first bound of Theorem 1.6 is as follows. If the polynomial is sufficiently regular, we apply the anti-concentration property of regular polynomials; this property in turn follows from the invariance principle and a similar anti-concentration property for polynomials with respect to the Gaussian distribution.
To complete the argument, we use the regularity lemma, which shows that any polynomial can be written as a small-depth decision tree in which most leaves are labeled by polynomials that are either (1) regular, or (2) fixed in sign with high probability over a uniformly random input. In the first case, we obtain a regular polynomial of high rank (as the tree is shallow) and apply the previous argument. In the second case, we argue directly that the probability of taking the value 0 is small.
To prove the second bound of Theorem 1.6, we follow the same conceptual approach but adopt a more careful analysis, following the work of Kane [17]. We defer the details to the actual proof.

First bound.
Without loss of generality, we can assume that I is centered at 0 and that r is larger than some constant. We can also assume that d ≤ 2 log r/log log r, because otherwise d r^{−1/(4d+1)} ≥ 1 and the desired bound becomes trivial.
Let τ ∈ (0, 1/3) and let β = 1/r. We will use Proposition 4.1 to reduce to the regular case. Let α, τ' be as in that proposition, i.e., α = C(d log log(1/β) + d log d) and τ' = (C' d log d log(1/τ))^d τ, and let M = rτ/(2α). Call a leaf ρ of the decision tree good if P_ρ is either τ'-regular or (C, β)-tight, and bad otherwise. Following our decision tree, we have

P(P ∈ I) ≤ P(reaching a bad leaf) + Σ_{ρ a good leaf} P(reaching ρ and P_ρ ∈ I). (10)

Now, for each good leaf ρ, P_ρ is either (C, β)-tight or τ'-regular. Let S be the set of indices i of the internal nodes ξ_i that lead to ρ; in other words, ρ fixes S. Since the depth of the decision tree is at most Mα/τ ≤ r/2, one has |S| ≤ r/2, and so q_ρ contains at least r/2 monomials of degree d, with mutually disjoint sets of random variables and with coefficients of magnitude at least 1. Therefore rank(P_ρ) ≥ r/2 and Var(q_ρ) ≥ r/2.

Assume first that P_ρ is (C, β)-tight. Then by (5) one has |P*(ρ)| = Ω(√r) ≥ 2, and this together with (6) gives

P(reaching ρ and P_ρ ∈ I) = P_{ξ_i, i∈S}(reaching ρ) · P_{ξ_i, i∉S}(P_ρ ∈ I) ≤ β · P_{ξ_i, i∈S}(reaching ρ). (11)

Next, assume that P_ρ is τ'-regular. By Proposition 3.1,

P(reaching ρ and P_ρ ∈ I) = P_{ξ_i, i∈S}(reaching ρ) · P_{ξ_i, i∉S}(P_ρ ∈ I) ≤ P_{ξ_i, i∈S}(reaching ρ) · Cd(r^{−1/(2d)} + (τ')^{1/(4d+1)}). (12)

Since the events that the root P reaches different leaves of the tree are disjoint, from (10), (11), and (12) we get that for any 0 < τ < 1/3,

P(P ∈ I) ≤ (1 − 1/(2C^d))^M + β + Cd(r^{−1/(2d)} + (τ')^{1/(4d+1)}). (13)

Set τ = 8C^{d+1} log r (d log log r + d log d)/r; then τ < 1/3 because we assumed that d ≤ 2 log r/log log r. The first term on the right-hand side of (13) becomes at most 2r^{−2}, and the third term is bounded from above by B d^{4/3} log^{1/2} r / r^{1/(4d+1)}. This completes the proof of the first bound.

Second bound.
We next build on the arguments in the previous section to prove the second bound in Theorem 1.6.
The main ingredient in proving the second bound is the following technical lemma of [18], which says that a random restriction of a sufficiently regular polynomial will likely have a much larger expectation than its standard deviation. This is useful because polynomials with large expectation relative to standard deviation have a small probability of vanishing, by tail bounds such as Theorem 4.4. In case the tail bound does not give a sufficiently good bound, we recurse on the new restricted polynomial. To state the lemma we need the following definition: for γ ≥ 0, call a polynomial P : R^n → R γ-spread if Var(P(ξ_1, . . . , ξ_n))^{1/2} ≥ |E(P(ξ_1, . . . , ξ_n))|/γ.

Proposition 5.1. Let b, n be such that b | n. Let P : R^n → R be a non-constant τ-regular degree-d polynomial. Let S_1, . . . , S_b be a partition of [n] into equal-sized blocks. For ℓ ∈ [b] and an assignment ξ^ℓ ∈ {±1}^{[n]\S_ℓ} to the variables not in S_ℓ, let P_{ξ^ℓ} : R^{S_ℓ} → R denote the polynomial obtained by fixing the variables not in S_ℓ to ξ^ℓ. Then (14) holds, where for clarity, the assignments ξ^ℓ for different ℓ are independent.
In particular, there exists an index ℓ ∈ [b] such that the restricted polynomial P_{ξ^ℓ} is γ-spread with only small probability. For the proof, we need the following definitions from [17]:
• For a function f : R^n → R and a vector v ∈ R^n, D_v f(x) = v · ∇f(x).
where for clarity, the assignments ξ^ℓ for different ℓ are independent.
Proof. Notice that the right-hand side of (14) does not change if the assignments ξ^ℓ are obtained by choosing n random variables ξ_1, . . . , ξ_n and then looking at the b different restrictions ξ^ℓ. The lemma is then proved in [17, Proposition 19] (essentially Equation (4) there).
Combining the above two claims gives us the proposition.
Proof of Proposition 5.1. For any index ℓ ∈ [b], we have the corresponding estimate; the claim now follows as α(P) ≤ 1 by definition.
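For multilinear polynomials in Rademacher variables, the γ-spread condition is fully explicit, since E P = a_∅ and Var P = Σ_{S≠∅} a_S² by orthonormality of the monomials. A sketch of the check (coefficient-dictionary encoding as in the earlier sketches, a convention of ours):

```python
def mean_and_variance(coeffs):
    # For P(ξ) = Σ_S a_S Π_{i∈S} ξ_i with Rademacher ξ_i:
    # E P = a_∅ and Var P = Σ_{S≠∅} a_S² (monomials are orthonormal).
    mean = coeffs.get(frozenset(), 0.0)
    var = sum(a * a for S, a in coeffs.items() if S)
    return mean, var

def is_gamma_spread(coeffs, gamma):
    # γ-spread: Var(P)^{1/2} >= |E P| / γ.
    mean, var = mean_and_variance(coeffs)
    return var ** 0.5 >= abs(mean) / gamma

P = {frozenset(): 10.0, frozenset({1}): 1.0}
print(is_gamma_spread(P, 2.0), is_gamma_spread(P, 20.0))
```

A polynomial that is not γ-spread has mean more than γ times its standard deviation, which is exactly the regime where the tail bounds used below force it away from any short interval around 0.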
We are now ready to prove the second bound of Theorem 1.6. As in the proof of the first bound, without loss of generality we can assume that I = [−1, 1], that r is sufficiently large, and that d ≤ √(log r)/log log r. Let f(r, d) = max{P(P(ξ) ∈ I) : P a degree-d polynomial with rank(P) ≥ r}.
Let P be a degree-d multilinear polynomial with rank(P) = r attaining the maximum f(r, d). For fixed parameters τ ∈ (0, 1/3) and γ > 2 to be chosen later, let β = 1/r and let T be a decision tree as guaranteed by Proposition 4.1 with M = rτ/(2α), where α and τ' are as in that proposition. Then the depth of the tree is at most r/2, and we proceed as in the proof of the first bound. Now, consider a leaf ρ such that Q ≡ P_ρ is τ'-regular. Note that rank(Q) ≥ r/2; in particular, Q is non-constant. Fix b < r/4, a parameter to be chosen later, and fix a partition S_1, . . . , S_b of the variables of Q such that for each ℓ ∈ [b] the restricted polynomial Q_ℓ, obtained by fixing the variables not in S_ℓ, satisfies rank(Q_ℓ) ≥ rank(Q)/b (this can be done, for instance, by first partitioning the variables witnessing rank(Q)). If the number of variables in Q is not divisible by b, we simply add a few dummy variables to Q, affecting neither its output nor its regularity. Now, by Proposition 5.1 applied to the polynomial Q, there exists ℓ ∈ [b] such that the polynomial Q_ℓ obtained by a random assignment to the variables not in S_ℓ is γ-spread with only small probability. Finally, to bound the last term, observe that if Q_ℓ is not γ-spread and not identically zero, then the tail bound applies, where in the next-to-last inequality we use |E(Q_ℓ)| ≥ γ.
General distributions

6.1. Proof of Theorem 1.7. We reduce the p-biased case to the uniform distribution, at the expense of a loss in the rank of the polynomial, and then apply Theorem 1.6.
Our assumption 2^d p^d r ≥ 3 guarantees that log log(2^d p^d r) = Ω(1), and hence, by choosing the implicit constants on the right-hand side of Theorem 1.7 to be sufficiently large, we can assume that 2^d p^d r is greater than 100 (say).
By the classical Chernoff bound we have, for 0 < γ < 1, P(|X − EX| ≥ γ EX) ≤ 2e^{−γ² EX/3}. Thus we conclude that with probability at least 1 − exp(−2^{d−1} p^d r/6), there are at least 2^{d−1} p^d r indices j with |b_j| ≥ 1. Conditioning on this event, we obtain a polynomial of degree d in the variables η_1, . . . , η_n which has rank at least 2^{d−1} p^d r. The theorem now follows by applying Theorem 1.6 to this polynomial and noting that the additional error exp(−2^{d−1} p^d r/6) is smaller than both terms coming from Theorem 1.6.
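The Chernoff step can be sanity-checked numerically. The following sketch (illustrative parameters, not those of the proof) compares the empirical deviation probability of a binomial variable with the bound 2e^{−γ²EX/3}:

```python
import math
import random

def empirical_deviation(m, q, gamma, trials, rng):
    # X ~ Binomial(m, q); estimate P(|X - EX| >= γ EX) by simulation.
    ex = m * q
    bad = 0
    for _ in range(trials):
        x = sum(rng.random() < q for _ in range(m))
        bad += (abs(x - ex) >= gamma * ex)
    return bad / trials

m, q, gamma = 1000, 0.3, 0.2
bound = 2 * math.exp(-gamma ** 2 * m * q / 3)  # Chernoff bound, = 2e^{-4}
emp = empirical_deviation(m, q, gamma, 2000, random.Random(0))
print(emp, "<=", bound)
```

The empirical frequency sits far below the bound here, as expected: Chernoff is loose but more than sufficient for the conditioning step above.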
6.2. Proof of Theorem 1.8. By replacing P(x_1, . . . , x_n) by Q(x_1, . . . , x_n) = P(x_1 + y_1, . . . , x_n + y_n) and ξ_i by ξ_i − y_i, we can assume without loss of generality that y_i = 0 for all i. Furthermore, we can assume that P(ξ_i ≤ 0) = p for all i. Indeed, if for some i we have P(ξ_i > 0) = p, we replace ξ_i by −ξ_i and modify the polynomial P accordingly, reducing to the case P(ξ_i ≤ 0) = p; the proof then runs along the same lines.
Now, since the sets S_j are disjoint, the events |b_{S_j}| ≥ 1 are independent. Therefore, using a Chernoff-type bound as in the proof of Theorem 1.7, one can conclude that with probability at least 1 − exp(−2^{−d} ε^d r/12), there are at least 2^{−d} ε^d r/2 indices j with |b_j| ≥ 1. Conditioning on this event, we obtain a polynomial of degree d in the variables η_1, . . . , η_n which has rank at least 2^{−d} ε^d r/2. Using Theorem 1.7, one obtains the desired bound.

Proof of Theorem 2.5
Let a be an integer to be chosen later, and let D = ⌊log_a(log_2 n − 1)⌋ be the largest integer such that 2^{−a^D} ≥ 2/n. Let µ be the distribution obtained by the following procedure:
(1) With probability 1/2 output x = 0 (the all-0's vector).
(2) With probability 1/2 pick an index i ∈ [D] uniformly at random and output x ∼ µ^n_{2^{−a^i}}.
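A sampler for µ, under our reading of the construction (the helper names are ours):

```python
import math
import random

def largest_D(n, a):
    # Largest integer D with 2^{-a^D} >= 2/n, i.e., a^D <= log2(n) - 1.
    D = 0
    while a ** (D + 1) <= math.log2(n) - 1:
        D += 1
    return D

def sample_mu(n, a, rng):
    # With probability 1/2 output the all-0 vector; otherwise pick i in [D]
    # uniformly and output a 2^{-a^i}-biased point of {0,1}^n.
    if rng.random() < 0.5:
        return [0] * n
    i = rng.randint(1, largest_D(n, a))
    p = 2.0 ** (-a ** i)
    return [int(rng.random() < p) for _ in range(n)]

x = sample_mu(2 ** 17, 2, random.Random(0))
print(len(x), set(x) <= {0, 1})
```

The geometrically decaying biases 2^{−a^i} are what make the inductive argument work: fixing coordinates to 0 at one scale leaves the distribution at the smaller scales essentially intact.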
Iterating the argument with P_1 and so forth, we get a sequence of polynomials P_1, P_2, . . . , P_{k−1} such that for 1 ≤ j ≤ min(d, k − 1), P_j is of degree at most d − j, P_j(0) = 0, and the stated correlation bound holds for x ∼ µ. This clearly leads to a contradiction if k > d and a ≥ Cd log d for a large enough constant C (so that the right-hand side of the corresponding inequality is non-zero for j = d).
Therefore, setting a = Cd log d for a sufficiently large constant C, we must have k = Ω(D) and k ≤ d, hence D = O(d). That is, log_2 n − 1 = a^{O(d)} = d^{O(d)}. Thus we must have d = Ω((log log n)/(log log log n)).