Dual Polynomials for Collision and Element Distinctness

The approximate degree of a Boolean function $f: \{-1, 1\}^n \to \{-1, 1\}$ is the minimum degree of a real polynomial that approximates $f$ to within error $1/3$ in the $\ell_\infty$ norm. In an influential result, Aaronson and Shi (J. ACM 2004) proved tight $\tilde{\Omega}(n^{1/3})$ and $\tilde{\Omega}(n^{2/3})$ lower bounds on the approximate degree of the Collision and Element Distinctness functions, respectively. Their proof was non-constructive, using a sophisticated symmetrization argument and tools from approximation theory. More recently, several open problems in the study of approximate degree have been resolved via the construction of dual polynomials. These are explicit dual solutions to an appropriate linear program that captures the approximate degree of any function. We reprove Aaronson and Shi's results by constructing explicit dual polynomials for the Collision and Element Distinctness functions.


Introduction
The $\varepsilon$-approximate degree of a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$ is the least degree of a real polynomial that approximates $f$ to within error $\varepsilon$ in the $\ell_\infty$ norm. Approximate degree is a fundamental measure of the complexity of a Boolean function, and has wide-ranging applications in theoretical computer science. For example, approximate degree upper bounds underlie several of the best known algorithms for PAC learning [24], agnostic learning [22,23], learning in the presence of irrelevant information [25,31], and differentially private data release [19,44]. Meanwhile, lower bounds on approximate degree imply many optimal lower bounds on quantum query complexity, circuit complexity, and communication complexity (see for example [4, 10-12, 16, 33-35, 40]).
In an influential result, Aaronson and Shi proved tight $\tilde{\Omega}(n^{1/3})$ and $\tilde{\Omega}(n^{2/3})$ lower bounds on the approximate degree of the Collision and Element Distinctness functions [4]. The Collision lower bound matched an earlier $O(n^{1/3})$ upper bound due to Brassard et al. [15], while the lower bound for Element Distinctness was later shown to be tight by Ambainis [8].
The Collision lower bound subsequently found many applications and extensions in quantum complexity theory; Aaronson recently provided a retrospective overview of these developments [3]. Moreover, the $\tilde{\Omega}(n^{2/3})$ lower bound for Element Distinctness remains the best known approximate degree lower bound for any function in $\mathrm{AC}^0$.
Aaronson and Shi proved their lower bound for Collision with a symmetrization argument. This style of argument proceeds in two steps. First, a polynomial p on n variables (which is assumed to approximate the target function f ) is transformed into a polynomial q on m < n variables in such a way that deg(q) ≤ deg(p). Second, a lower bound on deg(q) is proved, typically by applying Markov-Bernstein type inequalities from approximation theory. Aaronson and Shi's proof of the Collision lower bound is a particularly sophisticated application of this style of argument.
The lower bound for Element Distinctness follows from a reduction to the lower bound for Collision. This reduction is discussed in Section 5.
The Method of Dual Polynomials. Despite the many applications of approximate degree in theoretical computer science, significant gaps remain in our understanding of this complexity measure, and there are many simple functions whose approximate degree remains unknown. The slow nature of progress can be attributed in part to the limitations of symmetrization arguments. At an intuitive level, the process of symmetrization is inherently lossy: by turning a polynomial p on n variables into a polynomial q on m < n variables, information about p is necessarily thrown away. Hence, several works have identified that an important research direction is to develop techniques beyond symmetrization for lower bounding the approximate degree of Boolean functions [2,17,38].
The last few years have seen significant progress toward this goal. In particular, a series of works has proved new approximate degree lower bounds for important classes of functions by constructing explicit dual polynomials, which are dual solutions to a certain linear program capturing the approximate degree of any function. These polynomials act as certificates of the high approximate degree of a function. Moreover, strong LP duality implies that the technique is lossless, in contrast to symmetrization. That is, for any function f and any ε, there is always some dual polynomial φ that witnesses a tight approximate degree lower bound for f ; the challenge is to construct φ.
This "method of dual polynomials" was recently used to resolve the approximate degree of the AND-OR tree [17,37], closing a long line of incrementally larger lower bounds [6,21,28,38,39]. It has also been used

Preliminaries

Approximate Degree and its Dual Characterization
The $\varepsilon$-approximate degree of $f$, denoted $\widetilde{\deg}_\varepsilon(f)$, is the minimum degree of an $\varepsilon$-approximation for $f$. We use $\widetilde{\deg}(f)$ to denote $\widetilde{\deg}_{1/3}(f)$, and refer to this quantity without qualification as the approximate degree of $f$. The choice of $1/3$ is arbitrary, as $\widetilde{\deg}(f)$ is related to $\widetilde{\deg}_\varepsilon(f)$ by a constant factor for any constant $\varepsilon \in (0,1)$.
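Given a partial Boolean function $f : Y \to \{-1,1\}$ with $Y \subseteq \{-1,1\}^n$, recall that $p$ is an $\varepsilon$-approximation for $f$ if $|p(x) - f(x)| \le \varepsilon$ for all $x \in Y$ and $|p(x)| \le 1 + \varepsilon$ for all $x \in \{-1,1\}^n \setminus Y$. Let $p$ be a real polynomial that attains the smallest $\varepsilon$ subject to these constraints, over all polynomials of degree at most $d$. Since we work over $x \in \{-1,1\}^n$, we may assume without loss of generality that $p$ is multilinear with the representation $p(x) = \sum_{|S| \le d} c_S \chi_S(x)$, where the coefficients $c_S$ are real numbers and $\chi_S(x) = \prod_{i \in S} x_i$. Then $p$ is an optimum of the following linear program.

$$\begin{array}{lll}
\min & \varepsilon & \\
\text{s.t.} & \left| f(x) - \sum_{|S| \le d} c_S \chi_S(x) \right| \le \varepsilon & \text{for each } x \in Y, \\
& \left| \sum_{|S| \le d} c_S \chi_S(x) \right| \le 1 + \varepsilon & \text{for each } x \in \{-1,1\}^n \setminus Y, \\
& c_S \in \mathbb{R} & \text{for each } |S| \le d, \\
& \varepsilon \ge 0. &
\end{array}$$

The dual linear program is as follows.

$$\begin{array}{lll}
\max & \sum_{x \in Y} \phi(x) f(x) - \sum_{x \in \{-1,1\}^n \setminus Y} |\phi(x)| & \\
\text{s.t.} & \sum_{x \in \{-1,1\}^n} |\phi(x)| = 1, & \\
& \sum_{x \in \{-1,1\}^n} \phi(x) \chi_S(x) = 0 & \text{for each } |S| \le d, \\
& \phi(x) \in \mathbb{R} & \text{for each } x \in \{-1,1\}^n.
\end{array}$$

Strong LP-duality thus implies the following dual characterization of approximate degree: $\widetilde{\deg}_\varepsilon(f) > d$ if and only if there is a function $\phi : \{-1,1\}^n \to \mathbb{R}$ such that

$$\sum_{x \in Y} \phi(x) f(x) - \sum_{x \in \{-1,1\}^n \setminus Y} |\phi(x)| > \varepsilon, \qquad (1)$$

$\sum_{x \in \{-1,1\}^n} |\phi(x)| = 1$, and

$$\sum_{x \in \{-1,1\}^n} \phi(x) \chi_S(x) = 0 \quad \text{for each } |S| \le d. \qquad (2)$$

If $\phi$ satisfies Eq. (1), we say that $\phi$ has correlation greater than $\varepsilon$ with $f$. If $\phi$ satisfies Eq. (2), we say $\phi$ has pure high degree $d$. We refer to any feasible solution $\phi$ to the dual linear program as an $(\varepsilon, d)$-dual polynomial for $f$.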
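The two defining properties of a dual polynomial can be checked by brute force on very small instances. The following is a minimal sketch, not part of the original construction: it assumes a total function (so there is no off-domain penalty term), and the helper names `pure_high_degree` and `correlation` are hypothetical. As a toy witness it uses normalized parity on three bits.

```python
from itertools import combinations, product
from math import prod


def character_sum(phi, n, S):
    """Return sum_x phi(x) * chi_S(x) over {-1,1}^n, where chi_S(x) = prod_{i in S} x_i."""
    return sum(phi(x) * prod(x[i] for i in S) for x in product((-1, 1), repeat=n))


def pure_high_degree(phi, n, tol=1e-12):
    """Largest d such that phi is orthogonal to every chi_S with |S| <= d (-1 if none)."""
    d = -1
    for size in range(n + 1):
        if any(abs(character_sum(phi, n, S)) > tol
               for S in combinations(range(n), size)):
            break
        d = size
    return d


def correlation(phi, f, n):
    """sum_x phi(x) * f(x) for a total function f (assumption: no promise, so no penalty term)."""
    return sum(phi(x) * f(x) for x in product((-1, 1), repeat=n))


# Toy example (an assumption for illustration): parity on n = 3 bits.
n = 3
parity = lambda x: prod(x)
phi = lambda x: prod(x) / 2 ** n  # l1 norm equals 1 by construction
l1_norm = sum(abs(phi(x)) for x in product((-1, 1), repeat=n))
```

Here `phi` has unit $\ell_1$ norm, correlation 1 with parity, and is orthogonal to every character on at most two variables, so it certifies that parity on three bits has $\varepsilon$-approximate degree greater than 2 for every $\varepsilon < 1$.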

The Collision and Element Distinctness Functions
Let $[N] = \{1, \ldots, N\}$, and fix a triple of positive integers $n, N, R$ such that $R \ge N$ and $n = N \cdot \log_2 R$. For simplicity throughout, we assume that $R$ is a power of 2. The Collision and Element Distinctness functions are typically thought of as properties of functions mapping $[N]$ to $[R]$. However, it will be convenient for us to think of them instead as functions on the Boolean hypercube $\{-1,1\}^n$. To this end, given an input $x \in \{-1,1\}^n$, we interpret $x$ as the evaluations of a function $g_x$ mapping $[N] \to [R]$. That is, we break $x$ up into $N$ blocks, each of length $\lceil \log_2 R \rceil$, and regard each block $x_i$ as the binary representation of $g_x(i)$.
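The block encoding can be made concrete with a short decoder. This is only a sketch: the text does not fix a bit convention, so the code assumes that $-1$ encodes the bit 1, that bits are read most-significant first, and that values are shifted into $\{1, \ldots, R\}$. The function name `decode_blocks` is hypothetical.

```python
def decode_blocks(x, N, R):
    """Interpret x in {-1,1}^n as g_x : [N] -> [R]; returns [g_x(1), ..., g_x(N)]."""
    bits = R.bit_length() - 1
    if 2 ** bits != R:
        raise ValueError("R is assumed to be a power of 2")
    if len(x) != N * bits:
        raise ValueError("x must have length N * log2(R)")
    g = []
    for i in range(N):
        block = x[i * bits:(i + 1) * bits]
        value = 0
        for entry in block:
            # Assumed convention: -1 encodes bit 1, +1 encodes bit 0, most significant bit first.
            value = 2 * value + (1 if entry == -1 else 0)
        g.append(value + 1)  # shift from 0..R-1 to 1..R
    return g
```

For instance, with $N = 2$ and $R = 4$ each block has two entries, so an input of length 4 decodes to a pair of values in $\{1, 2, 3, 4\}$.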

Definition 2 (Collision Function).
The Collision function, denoted $\mathrm{Col}_{N,R}$, is the partial Boolean function corresponding to the property that $g_x$ is a 1-to-1 function, with the promise that $g_x$ is either 1-to-1 or 2-to-1.

Definition 3 (Element Distinctness Function).
The Element Distinctness function, denoted $\mathrm{ED}_{N,R}$, is the total Boolean function defined such that $\mathrm{ED}_{N,R}(x) = 1$ if and only if $g_x$ is 1-to-1. That is, $\mathrm{ED}_{N,R}$ is the total Boolean function corresponding to the property that $g_x$ is 1-to-1.
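Both functions are easy to evaluate once $g_x$ is available as a list of values. The sketch below is illustrative only; it assumes the sign convention that 1-to-1 inputs map to $1$ and 2-to-1 inputs map to $-1$, and it returns `None` for inputs that violate the Collision promise. The helper names are hypothetical.

```python
from collections import Counter


def multiplicity(g):
    """Return k if g is exactly k-to-1 (every attained value has k preimages), else None."""
    counts = set(Counter(g).values())
    return counts.pop() if len(counts) == 1 else None


def element_distinctness(g):
    """Total function: 1 iff g is 1-to-1, and -1 otherwise."""
    return 1 if multiplicity(g) == 1 else -1


def collision(g):
    """Partial function: 1 on 1-to-1 inputs, -1 on 2-to-1 inputs (assumed sign), None off the promise."""
    k = multiplicity(g)
    if k == 1:
        return 1
    if k == 2:
        return -1
    return None
```

An input such as `[1, 1, 1, 2]` is neither 1-to-1 nor 2-to-1, so it lies outside the domain of the Collision function while Element Distinctness still assigns it a value.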
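Let $T_k \subseteq \{-1,1\}^n$ denote the set of inputs $x$ such that $g_x$ is $k$-to-1, and let $B \subset \{-1,1\}^n$ denote the set of inputs $x$ such that $g_x$ is neither 1-to-1 nor 2-to-1. Then an $(\varepsilon, d)$-dual polynomial $\phi$ for $\mathrm{Col}_{N,R}$ has the following properties (cf. Section 2.2):

$$\sum_{x \in T_1} \phi(x) - \sum_{x \in T_2} \phi(x) - \sum_{x \in B} |\phi(x)| > \varepsilon, \qquad \sum_{x \in \{-1,1\}^n} |\phi(x)| = 1, \qquad \sum_{x \in \{-1,1\}^n} \phi(x)\chi_S(x) = 0 \ \text{ for each } |S| \le d.$$

Similarly, an $(\varepsilon, d)$-dual polynomial for $\mathrm{ED}_{N,R}$ satisfies:

$$\sum_{x \in T_1} \phi(x) - \sum_{x \notin T_1} \phi(x) > \varepsilon, \qquad \sum_{x \in \{-1,1\}^n} |\phi(x)| = 1, \qquad \sum_{x \in \{-1,1\}^n} \phi(x)\chi_S(x) = 0 \ \text{ for each } |S| \le d.$$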

Overview of the Symmetrization-Based Proof of the Collision Lower Bound
Kutin's simplified proof of the Collision lower bound [26] proceeds in two steps. The first step is a symmetrization step, which establishes the following remarkable result (we state this result slightly informally in this overview).
Note in the above lemma that the sets $R_{m,a,b}$ are not uniquely determined; for instance $R_{m,1,1} = R_{0,a,1} = R_{N,1,b} = T_1$ for every triple $(m, a, b)$.
The second step of Kutin's proof argues that if $p$ is a $(1/3)$-approximating polynomial for the Collision function, then $P$ must have degree $\Omega(N^{1/3})$. Hence by Lemma 4, $p$ must have degree $\Omega(N^{1/3})$ as well.
In more detail, the second step of Kutin's proof proceeds via a case analysis. Four cases are considered.
A key technical complication that must be dealt with in the argument above is that $|P(m, a, b)|$ may be much larger than 1 for invalid triples $(m, a, b)$. This may seem like a minor technicality, but in fact it is a central issue: if $P(m, a, b)$ were bounded for all invalid triples, then it would be possible to argue that the total degree of $P$ is $\Omega(N^{1/2})$, which would imply a (false) lower bound of $\Omega(N^{1/2})$ on the approximate degree of $\mathrm{Col}_{N,R}$.

Overview of Our Construction for the Collision Function
Like Kutin's proof, our construction also makes essential use of Lemma 4. Whereas Kutin used Lemma 4 to reduce to a setting where Markov-Bernstein inequalities could be applied in a non-constructive manner, we instead use Lemma 4 to argue that the dual polynomial $\phi$ that we construct has pure high degree $\Omega(N^{1/3})$.
In more detail, we present our construction in two stages, in order to highlight distinct ideas that go into the proof. In the first stage, we construct a simpler dual polynomial $\phi : \{-1,1\}^n \to \mathbb{R}$ that exhibits an $\Omega(\sqrt{\log N / \log\log N})$ lower bound on the approximate degree of $\mathrm{Col}_{N,R}$. The second stage constructs a dual polynomial $\psi$ exhibiting the optimal $\Omega(N^{1/3})$ lower bound.
Overview of the first stage. Let $H_k \subseteq \{-1,1\}^n$ denote the set of inputs of Hamming weight $k$. The symmetrization-based proof of the Collision lower bound from [4,26] carries the strong intuition that the sets $T_k$ should play the same role that $H_k$ plays in Nisan and Szegedy's seminal symmetrization-based lower bound for the OR function [28]. We direct the interested reader to Aaronson's lecture notes [27] for a detailed explanation of this intuition. The construction of our simpler dual witness $\phi$ instantiates this intuition in the dual setting.
Recall that a dual polynomial $\phi$ witnessing the fact that $\widetilde{\deg}(\mathrm{Col}_{N,R}) \ge d$ must satisfy two properties: (1) it must have correlation greater than $\varepsilon$ with $\mathrm{Col}_{N,R}$, and (2) it must have pure high degree at least $d$. We define $\phi$ in a way that mimics the structure of known dual witnesses for symmetric functions, even though $\phi$ is not itself symmetric. Specifically, our construction ensures that the analysis establishing Properties (1) and (2) becomes similar to the analyses of known dual polynomials for the OR function [17,41].
In more detail, our prior work [17] built on work of Špalek [41] to give a dual witness $\gamma$ for the fact that $\widetilde{\deg}_\varepsilon(\mathrm{OR}_n) = \Omega(\sqrt{n})$ for any constant $\varepsilon < 1$; moreover, $\gamma$ places non-zero weight only on sets $H_k$, for values of $k$ equal (up to scaling factors) to perfect squares. The pure high degree of $\gamma$ is shown to be equal to (at least) the number of sets $H_k$ upon which $\gamma$ places non-zero weight.
Call an input $x \in \{-1,1\}^n$ valid if it is in $R_{m,a,b}$ for some valid triple $(m, a, b)$. By analogy with $\gamma$, the dual witness $\phi$ that we construct in Stage 1 places weight only on inputs $x \in T_k$ for divisors $k$ of $N$ that are also (up to scaling factors) perfect squares. In particular, our definition of $\phi$ ensures that: We are able to combine Eq. (3) with Lemma 4 and a basic combinatorial identity (cf. Lemma 9) to show that the pure high degree of $\phi$ is at least $|S|$, where $S$ denotes the set of $T_k$'s upon which $\phi$ places non-zero weight. Moreover, our definition of $\phi$ is carefully chosen to ensure that its correlation with $\mathrm{Col}_{N,R}$ is large: the precise calculation is closely analogous to the analysis from [17,41] showing that $\gamma$ is well-correlated with the OR function.
Overview of the second stage. In the second stage, we construct a dual polynomial $\psi$ that exhibits the optimal $\Omega(N^{1/3})$ lower bound. Rather than only weighting inputs in $T_k$ for some divisors $k$ of $N$, $\psi$ weights inputs in $R_{m,a,b}$ for many valid triples $(m, a, b)$. There are two key ideas that go into the construction of $\psi$.
The first idea is to define $\psi$ as the sum of two simpler dual polynomials $\psi_1$ and $\psi_2$, each with pure high degree $\Omega(N^{1/3})$, so that the sum $\psi$ also has pure high degree $\Omega(N^{1/3})$ (see Lemma 15). The first polynomial $\psi_1$ places a large constant fraction (close to $1/2$) of its $L_1$ mass on $T_1$, whereas $\psi_2$ places a large constant fraction of its $L_1$ mass on $T_2$. Neither $\psi_1$ nor $\psi_2$ is well-correlated with $\mathrm{Col}_{N,R}$ in the sense of Eq. (3). However, they each place a constant fraction of their $L_1$ mass on $R_{N/2,2,1}$, and they are designed so that their values exactly cancel out on inputs in $R_{N/2,2,1}$. This allows us to show that $\psi = \psi_1 + \psi_2$ satisfies Eq. (3), even though $\psi_1$ and $\psi_2$ individually do not.
The second idea goes into the construction of $\psi_1$ and $\psi_2$ themselves. Specifically, we think of $\psi_1$ and $\psi_2$ as each being constructed in a two-step process. We focus on $\psi_1$ in this discussion, since the construction of $\psi_2$ is similar. Very roughly speaking, in the first step, we consider a "polynomial" $\psi'$ of pure high degree $\Omega(N^{1/3})$ that places a large constant fraction of its $L_1$ mass on $T_1$; the construction of $\psi'$ is closely related to our construction of the simpler dual polynomial $\phi$ from Stage 1.
The reason we place the term "polynomial" in quotes above is that there is an important technical caveat to our construction of $\psi'$: we think of $\psi'$ as placing weight on sets $R_{N/2,a,1}$ for many invalid triples $(N/2, a, 1)$, in addition to some valid ones. Of course, if $(N/2, a, 1)$ is invalid, then $R_{N/2,a,1} = \emptyset$, so $\psi'$ cannot place non-zero weight on the set. To address this issue, in Step 2, we add to $\psi'$ a collection of polynomials $\psi''_{N/2,a,1}$, each of pure high degree $\Omega(N^{1/3})$. For each invalid triple $(N/2, a, 1)$, $\psi''_{N/2,a,1}$ is specifically constructed to cancel out the weight that $\psi'$ "places" on $R_{N/2,a,1}$.
Analogously to how our constructions of $\phi$ and $\psi'$ were closely related to the dual witness for OR constructed in our earlier work [17], our construction of $\psi''_{N/2,a,1}$ is closely related to a dual witness $\eta$ for the Majority function, MAJ, that we constructed in the same work. Each $\psi''_{N/2,a,1}$ places additional nonzero mass on (non-empty) sets of the form $R_{m,a,1}$ for some $a \ne 1$ and $m \in [N]$, but we are able to show that the total mass placed on such sets is small, using an analysis closely related to the analysis of $\eta$ from [17]. Hence we are able to show that $\psi_1 = \psi' + \sum_{\text{invalid triples } (N/2,a,1)} \psi''_{N/2,a,1}$ still places a large constant fraction of its $L_1$ mass on $T_1$.

Discussion
On Kutin's second step. Our construction of the optimal dual witness $\psi$ for the Collision function mimics the second step of Kutin's symmetrization argument in three important ways described below. We find this mimicry to be somewhat surprising: in our earlier work [17], we constructed an optimal dual polynomial for symmetric Boolean functions that bore little relation to Paturi's well-known symmetrization-based proof of the same result [30]. We believe that this mimicry sheds new light, or at least gives a new perspective, on why Kutin's proof takes the structure that it does.
Recall that the second step of Kutin's proof (cf. Section 2.4) proceeds via a case analysis. The first "branch" in the case analysis depends on whether the expected value of the assumed $n$-variate approximation $p$ to $\mathrm{Col}_{N,R}$ on the set $R_{N/2,2,1}$ is large or small. This is mimicked in our construction of $\psi$ as a sum of two dual polynomials $\psi_1$ and $\psi_2$, both of which individually place a lot of weight on $R_{N/2,2,1}$, but whose sum places zero weight on $R_{N/2,2,1}$.
The second "branch" in Kutin's case analysis depends on whether $|P(N/2, a, 1)|$ or $|P(N/2, 2, b)|$ is small for all $a, b \le N^{2/3}$. He needs to consider this second branch because $P(m, a, b)$ is not guaranteed to be bounded for invalid triples $(m, a, b)$.
Finally, recall that Kutin applied Markov's inequality from approximation theory in two of the four cases considered in his analysis, and Bernstein's inequality in the other two cases. Markov's inequality underlies Nisan and Szegedy's standard symmetrization-based proof that the approximate degree of OR is $\Omega(\sqrt{n})$ [28], while Bernstein's inequality underlies Paturi's proof that the approximate degree of MAJ is $\Omega(n)$ [30]. This is mimicked in our construction of $\psi_1$ and $\psi_2$ as the sum of $\psi'$ and the $\psi''_{N/2,a,1}$ and $\psi''_{N/2,2,b}$ polynomials: the construction of $\psi'$ is closely analogous to the dual witness for OR from [17], while the construction of the $\psi''_{N/2,a,1}$ and $\psi''_{N/2,2,b}$ polynomials is based on the dual witness for MAJ from [17].
On the first step, or why k-to-1 inputs matter. As noted by several authors (e.g., [2, Slide 36]), the most miraculous element of the symmetrization-based proof of the Collision lower bound is the first step (cf. Lemma 4). The crux of this step is to establish, roughly speaking, that for any $n$-variate polynomial $p$ of total degree $d$, the function $P(k) := \mathbb{E}_{x \in T_k}[p(x)]$ is a polynomial in $k$ of degree at most $d$. Why should this hold? More basically, why should inputs that are $k$-to-1 even play a prominent role in the proof? We provide some partial intuition for this in Section 6. Specifically, we explain that there is an (asymptotically) optimal approximation $p$ for $\mathrm{Col}_{N,R}$ such that $k$-to-1 inputs correspond to constraints that are made tight by the solution corresponding to $p$ in the primal linear program of Section 2.2. Hence, complementary slackness suggests that there should be a corresponding dual witness $\psi$ that places weight only on inputs that are $k$-to-1, or nearly so, justifying the prominent role that $k$-to-1 inputs play in both the symmetrization-based proof and our new dual proof.
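The polynomial dependence on $k$ can be observed directly for one very simple statistic. The sketch below is an illustration rather than a proof of Lemma 4: it takes the indicator that two fixed domain points collide, averages it exactly over all relabelings of the domain of a canonical $k$-to-1 function on $N = 6$ points, and compares the result with the linear function $(k-1)/(N-1)$. Relabeling the range does not affect whether two points collide, so it is omitted. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def collision_probability(N, k):
    """Exact Pr[g(1) = g(2)] when g is a uniformly random relabeling of a canonical k-to-1 function."""
    if N % k != 0:
        raise ValueError("k must divide N for k-to-1 functions to exist")
    canonical = [i // k for i in range(N)]  # k consecutive indices share a value
    hits = 0
    total = 0
    for perm in permutations(range(N)):
        total += 1
        hits += canonical[perm[0]] == canonical[perm[1]]
    return Fraction(hits, total)


N = 6
values = {k: collision_probability(N, k) for k in (1, 2, 3, 6)}
# Each entry agrees with the degree-1 polynomial (k - 1) / (N - 1) evaluated at k.
```

The expectation is defined only at divisors $k$ of $N$, yet it is interpolated by a single low-degree polynomial in $k$, which is the phenomenon the first step exploits.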

Formal Statement of Lemma 4
Following Kutin [26], we define a special collection of functions which are $a$-to-1 on one part of the domain and $b$-to-1 on the other part. For $N > 0$, recall that a triple of numbers $(m, a, b)$ is valid if $a | m$ and $b | (N - m)$. For each valid triple $(m, a, b)$, we define a function $g_{m,a,b} : [N] \to [R]$ that is $a$-to-1 on $m$ points of the domain and $b$-to-1 on the remaining $N - m$ points. Moreover, for each valid triple $(m, a, b)$, we define a set $R_{m,a,b}$ that is the orbit of $g_{m,a,b}$ under the automorphism group $S_N \times S_R$. Namely, $R_{m,a,b} = \{\sigma \circ g_{m,a,b} \circ \tau : \sigma \in S_R, \tau \in S_N\}$. Note that the sets $R_{m,a,b}$ are not uniquely determined; for instance $R_{m,1,1} = R_{0,a,1} = R_{N,1,b} = T_1$ for every $m, a, b$.
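The notion of a valid triple, and one possible orbit representative, can be sketched as follows. The explicit formula for $g_{m,a,b}$ is not reproduced above, so `canonical_function` is a hypothetical representative chosen for illustration: it is $a$-to-1 on the first $m$ points, $b$-to-1 on the rest, and uses disjoint images on the two parts.

```python
def is_valid_triple(N, m, a, b):
    """(m, a, b) is valid when a divides m and b divides N - m."""
    return 0 <= m <= N and a >= 1 and b >= 1 and m % a == 0 and (N - m) % b == 0


def canonical_function(N, m, a, b):
    """Hypothetical representative: a-to-1 on the first m points, b-to-1 on the remaining N - m."""
    if not is_valid_triple(N, m, a, b):
        raise ValueError("invalid triple")
    first = [i // a + 1 for i in range(m)]
    offset = m // a  # number of values already used by the first part
    second = [offset + j // b + 1 for j in range(N - m)]
    return first + second


def valid_triples(N):
    """Enumerate all valid triples with 0 <= m <= N and 1 <= a, b <= N."""
    return [(m, a, b)
            for m in range(N + 1)
            for a in range(1, N + 1)
            for b in range(1, N + 1)
            if is_valid_triple(N, m, a, b)]
```

With this representative, the triples $(m, 1, 1)$ all produce a 1-to-1 function, consistent with the remark that several triples describe the same set $T_1$.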

Lemma 5.
Let $p(x)$ be a real polynomial over $\{-1,1\}^n$ of degree $d$. There is a trivariate polynomial $P$ of degree at most $d$ with the property that for all valid triples $(m, a, b)$, $P(m, a, b) = \mathbb{E}_{x \in R_{m,a,b}}[p(x)]$.
The statement of Lemma 5 differs slightly from the corresponding lemma in Kutin's work [26] (Lemma 7 below). Lemma 5 follows by combining Kutin's formulation with the following simple lemma from [18].
Then P is a degree d polynomial in m, a, b.

An $\Omega(\sqrt{\log N / \log\log N})$ Lower Bound for the Collision Function
The following lemma is a refinement of [17,Proposition 14], which was used there to construct a dual polynomial for OR.

Lemma 8.
There exists a constant $\zeta > 0$ such that for all $\delta \in (0, 1)$ and $L \ge 1$, there is an explicit $\omega : \{1, \ldots, L\} \to \mathbb{R}$ with
The proof will make use of the following combinatorial identity, a simple proof of which can be found in [29, Appendix A].
On the other hand, for $k = c\ell^2$ with $\ell > 0$, $|\hat{\omega}(k)|$ equals: where the last inequality follows because is a product of factors that are each smaller than 1. Thus, the total contribution of terms excluding 0 and 1 to the $\ell_1$ mass of $\hat{\omega}$ is at most Now let $b = 0$ if $|T|$ is even, and $b = 1$ otherwise, and define $\omega : \{1, \ldots, L\} \to \mathbb{R}$ via: This yields the first two claims about $\omega$. The third claim follows immediately from the definition. Finally, let $p$ be a polynomial of degree strictly less than $|T| - 1$. Then where is a polynomial of degree less than $L$. Since $q(L) = 0$, the right hand side of Eq. (4) is zero by Lemma 9. This gives the last claim.
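The orthogonality argument at the end of this proof can be checked numerically. The sketch below rests on two assumptions that are not confirmed by the text above: that Lemma 9 is the standard identity $\sum_{t=0}^{L} (-1)^t \binom{L}{t} q(t) = 0$ for polynomials $q$ of degree less than $L$, and that the unnormalized weights have the form $(-1)^t \binom{L}{t} \prod_{r \notin T} (t - r)$ for a support set $T$ containing 0, 1 and scaled perfect squares. Points are indexed from 0 here, and the names `omega_hat` and `alternating_binomial_sum` are hypothetical.

```python
from math import comb


def alternating_binomial_sum(L, q):
    """sum_{t=0}^{L} (-1)^t C(L, t) q(t); this vanishes when q is a polynomial of degree < L."""
    return sum((-1) ** t * comb(L, t) * q(t) for t in range(L + 1))


def omega_hat(L, T):
    """Unnormalized weights (-1)^t C(L, t) prod_{r not in T} (t - r) on the points 0..L."""
    outside = [r for r in range(L + 1) if r not in T]

    def weight(t):
        value = (-1) ** t * comb(L, t)
        for r in outside:
            value *= t - r  # zero whenever t itself lies outside T
        return value

    return [weight(t) for t in range(L + 1)]


L = 16
T = {0, 1} | {l * l for l in range(2, 5)}  # {0, 1, 4, 9, 16}, taking c = 1
w = omega_hat(L, T)
support = {t for t in range(L + 1) if w[t] != 0}
# Moments against t^j for every j < |T| - 1, computed in exact integer arithmetic.
moments = [sum(w[t] * t ** j for t in range(L + 1)) for j in range(len(T) - 1)]
```

The weights are supported exactly on $T$, and every moment of degree less than $|T| - 1$ vanishes, because the product over points outside $T$ has degree $L + 1 - |T|$ and the identity applies to the combined polynomial of degree less than $L$.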
Our prior work [17], building on work of Špalek [41], obtained a dual polynomial $\gamma$ for $\mathrm{OR}_L$ by setting the total weight of $\gamma$ on inputs in $H_k$ (the set of inputs of Hamming weight $k$) to be $\omega(k + 1)$. In that work, the first three properties of $\omega$ ensured that $\gamma$ had high correlation with OR, while the fourth ensured that it had pure high degree $\Omega(\sqrt{L})$. Analogously, our dual polynomial $\phi$ for $\mathrm{Col}_{N,R}$ below sets the total weight of $\phi$ on $T_k$ to be $\omega(k)$. Then again, the first three properties of $\omega$ ensure that $\phi$ is well-correlated with $\mathrm{Col}_{N,R}$, and the fourth ensures that it has pure high degree $\Omega(\sqrt{L})$. However, there is the complication that $T_k$ must be non-empty, i.e., $k$ must divide $N$, for every $k$ in the support of $\omega$. To handle this complication, we take $N$ large enough so that all $k = 1, 2, \ldots, L$ divide $N$, yielding an $\Omega(\sqrt{\log N / \log\log N})$ lower bound.
Proof. First, notice that $k | N$ for all $k \in [L]$, so $T_k \ne \emptyset$ for every such $k$.
Define $\phi(x) = \omega(k)/|T_k|$ if $x \in T_k$ for some $k \in [L]$, and $\phi(x) = 0$ otherwise, where $\omega$ is obtained by applying Lemma 8. Note that $\phi(x)$ is well-defined since $|T_k| \ne 0$ for all $k \in [L]$, and each $x \in \{-1,1\}^n$ is in $T_k$ for at most one value of $k$.
We check: where the inequality holds by combining Parts 1-3 of Lemma 8. Thus, Second, where the final equality holds by Part 3 of Lemma 8. Finally, let $d = \zeta\sqrt{\delta L}$ where $\zeta$ is as in the statement of Lemma 8, and let $S \subseteq [n]$ with $|S| \le d$. We must show that $\sum_{x \in T_1 \cup T_2 \cup B} \phi(x)\chi_S(x) = 0$. Note that: where the first equality holds because $\phi(x) = 0$ if $x$ is not in $T_k$ for some $k \in [L]$.
By Lemma 5, there is a trivariate polynomial $P$ of total degree at most $d$ such that $P(m, a, b) = \mathbb{E}_{x \in R_{m,a,b}}[\chi_S(x)]$ for all valid triples $(m, a, b)$. In particular, since $k | N$ for all $k \in [L]$, $q(k) := P(N, k, 1)$ is a univariate polynomial in $k$ such that $q(k) = \mathbb{E}_{x \in T_k}[\chi_S(x)]$ for all $k \in [L]$. Hence, Part 4 of Lemma 8 implies that $\sum_{k=1}^{L} \omega(k) \cdot \mathbb{E}_{x \in T_k}[\chi_S(x)] = 0$.
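The divisibility requirement on $N$ is what limits the first-stage bound, and it is easy to quantify for concrete values. The following sketch (with hypothetical helper names) computes the largest $L$ such that every $k \le L$ divides a given $N$, and conversely the smallest $N$ that works for a given $L$, namely the least common multiple of $1, \ldots, L$.

```python
from math import gcd


def largest_all_divide(N):
    """Largest L such that every k in {1, ..., L} divides N (requires N >= 1)."""
    L = 0
    while N % (L + 1) == 0:
        L += 1
    return L


def smallest_N(L):
    """Smallest positive N divisible by every k in {1, ..., L}, i.e. lcm(1, ..., L)."""
    n = 1
    for k in range(2, L + 1):
        n = n * k // gcd(n, k)
    return n
```

Since the usable $L$ grows only logarithmically (up to lower-order factors) in $N$, and the pure high degree obtained is on the order of $\sqrt{L}$, this stage cannot by itself reach the optimal bound.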

An $\Omega(N^{1/3})$ Lower Bound for the Collision Function
The following lemma is a refinement of [18, Proposition 10], which constructed an explicit dual polynomial for MAJ.

For every polynomial
Proof. Throughout the proof, we assume for simplicity that N/2 is not a multiple of 2k. The analysis when N/2 is a multiple of 2k is similar. Let c = ⌈ 10 √ δ ⌉ and t = 2⌊N/(4k)⌋k and define the set S = {t ± 2cℓk : 0 ≤ ℓ ≤ ⌊t/(2ck)⌋}.
Note that $|S| = \Omega(N/ck)$. We claim that $\pi_S(i) := \prod_{j \in S, j \ne i} |j - i|$ is minimized at $i = t$. Notice that translating all points in $S$ by a constant does not affect $\pi_S(i)$, and scaling all points in $S$ by a constant does not affect $\operatorname{argmin}_i \pi_S(i)$. Thus, it is enough to show that $\pi_{S^*}(i)$ is minimized at $i = 0$ for the set $S^* = \{\pm\ell : \ell \le t\}$. In this case, $\pi_{S^*}(i)$ takes the simple form $(t - i)!(t + i)!$, and we see that for all nonzero $i \in S^*$, the ratio $\pi_{S^*}(0)/\pi_{S^*}(i) = (t!)^2/((t - i)!(t + i)!) = \prod_{j=1}^{|i|} \frac{t - j + 1}{t + j}$ is a product of terms smaller than 1, so $\pi_{S^*}(i)$ is indeed minimized at $i = 0$. Now let $T = S \cup \{t - 2k, N/2\}$ and define the function where $h = \lfloor t/2ck \rfloor$. The normalization is chosen so that $|\hat{\eta}(t)| = 1$.
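The closed form for $\pi_{S^*}$ and the location of its minimum can be confirmed by direct computation for a small $t$. This is only a sanity check of the claim above; the value $t = 6$ is an arbitrary choice, and the code names the product function `pi`.

```python
from math import factorial


def pi(S, i):
    """prod over j in S with j != i of |j - i|."""
    value = 1
    for j in S:
        if j != i:
            value *= abs(j - i)
    return value


t = 6
S_star = range(-t, t + 1)  # the symmetric set {-t, ..., t}
values = {i: pi(S_star, i) for i in S_star}
minimizer = min(values, key=values.get)
```

Every entry of `values` matches $(t - i)!(t + i)!$, and the minimum is attained only at the center of the set.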
The reason that we include both $(r - (t - 2k))$ and $(r - (N/2))$ in the denominator of $\hat{\eta}$ is to ensure that the rate of decay of $\hat{\eta}(r)$ is at least quadratic as $r$ moves away from $t$. This will ultimately allow us to show that a large fraction of the $\ell_1$ mass of $\hat{\eta}$ comes from the point $r = N/2$.
For $r = t - 2k$, the mass $|\hat{\eta}(r)|$ is where the first inequality holds because $N/2 - t \le 2k$, combined with the fact that $\prod_{\ell=1}^{h} (1 + a_\ell) \le \exp(\sum_{\ell=1}^{h} a_\ell)$ for nonnegative $a_\ell$. For $r = N/2$, we get Now we analyze the remaining summands, and show that their total contribution is much smaller than 1. Recall that the choice $i = t$ minimizes $\pi_S(i)$, and that $\pi_S(t) = (2ck)^{2h}(h!)^2$. Therefore, where the final inequality exploits the fact that $N/2 - t < 2k$. Similarly, We can use this quadratic decay to bound the total $\ell_1$ mass of the points outside of $\{t - k, t, N/2\}$: Now let $\eta_k(r) = (-1)^{r+h+N/2}\hat{\eta}(r)/\|\hat{\eta}\|_1$. Since $\hat{\eta}$ is supported on $T \subseteq \{2k, 4k, \ldots, 2\lfloor N/2k \rfloor k\} \cup \{N/2\}$, the function $\eta_k$ is as well, giving the first claim. Moreover, This yields the second claim about $\eta_k$. The third claim follows immediately from the definition. Finally, let $p$ be a polynomial of degree strictly less than $|T|$ (where $|T| \ge \rho N/k$ for a constant $\rho$). Then for a polynomial $q$ of degree strictly less than $N$. This is equal to zero by Lemma 9, giving the final claim.
We obtain our dual polynomial $\psi$ for $\mathrm{Col}_{N,R}$ as a linear combination of two simpler functions $\psi_1$ and $\psi_2$. These functions have the following properties.

Remark 14.
The dependence of the lower bound in Theorem 13 on both parameters $\delta$ and $N$, for $\frac{1}{N} \le \delta \le \frac{1}{10}$, is tight up to a logarithmic factor in the size of the range. We show this in Appendix A by constructing an explicit approximating polynomial for $\mathrm{Col}_{N,R}$ of the appropriate degree, by building on the ideas underlying the quantum query algorithm of Brassard et al. [15].
. This inequality uses Properties 2 and 3 of Lemma 12.
We start by defining a function $\Psi(m, k)$ as follows. Here, We first show how to use $\Psi$ to construct the polynomial $\psi_1$. Analogously to our construction of $\phi$, we want $\psi_1$ to place a total weight of $\Psi(m, a)$ on each set $R_{m,a,1}$. Recall from our overview in Section 2.5 that we think of $\psi_1 = \psi' + \sum_{\text{invalid triples } (N/2,a,1)} \psi''_{N/2,a,1}$, where $\psi'$ looks like the simpler "first stage" dual polynomial $\phi$ from our informal overview (which we constructed in Section 3) and each $\psi''_{N/2,a,1}$ cancels out the weight $\phi$ places on values of $k$ that do not divide $N$. This structure underlies our construction of $\Psi$, where we add multiples of the polynomials $\eta_k(m)$ to cancel out the weight $\omega(k)$ places on invalid triples. Now we construct and analyze the polynomial $\psi_1$. Define Notice that $\hat{\psi}_1$ is well-defined, because any $x \in T_1$ is in $R_{m,k,1}$ for at most one triple $(m, k, 1)$. We collect several calculations with $\hat{\psi}_1$. First, where the penultimate inequality exploits Properties 2 and 3 of Lemma 11, and the final inequality exploits Properties 1-3 of Lemma 8. Noting that $|\omega(1)| + |\omega(2)| \le 1$, it follows that $\|\hat{\psi}_1\|_1 \le 1 + \delta/2$. So setting $\psi_1 = \hat{\psi}_1/\|\hat{\psi}_1\|_1$, it is immediate that $\psi_1$ satisfies the first three properties in the statement of the lemma. $\psi_1$ also satisfies the fourth property, since for any $x \in T_2$, $\psi_1(x) = \Psi(N, 2)/|T_2| = 0$. Now we will show that $\hat{\psi}_1$, and hence $\psi_1$, has pure high degree at least $d$. We require two observations.
Fix any $S \subseteq [n]$ with $|S| \le d$. Let $Q(m, k)$ be a polynomial of degree at most $d - 1$ in each variable such that, for all pairs $(m, k)$ with $k | m$, $Q(m, k) = \mathbb{E}_{x \in R_{m,k,1}}[\chi_S(x)]$. The existence of such a bivariate polynomial $Q$ is guaranteed by Lemma 5. Then the previous two observations together imply that: We remark that a key point in the derivation of Eq. (6) is that we have no control over the evaluations $Q(m, k)$ when $k$ does not divide $m$, yet this is rendered irrelevant because $\Psi(m, k) = 0$ for all such pairs. The right hand side of Eq. (6) equals: The first sum in Eq. (7) is zero by Lemma 8 since $Q(N/2, k)$ is a polynomial of degree at most $d$ in $k$. The second sum is also zero because for each fixed $k$, $Q(r, k)$ is a polynomial of degree at most $d$ in the variable $r$, and hence the term in parentheses is zero by Lemma 11 (Parts 1 and 4). Thus $\hat{\psi}_1$ has pure high degree at least $d$.
The construction of $\psi_2$ is similar. This time, we let Note that $\hat{\psi}_2$ is well-defined, because any $x \in T_2$ is in $R_{m,k,2}$ for at most one triple $(m, k, 2)$. We define $\psi_2 = \hat{\psi}_2/\|\hat{\psi}_2\|_1$. Showing that $\psi_2$ satisfies Properties 1-4 of the lemma follows from the same calculations we used for $\psi_1$.
To show that $\psi_2$ has pure high degree at least $d$, we require the following additional observations.
• $\Psi$ is supported on pairs $(m, k)$ for which $k | m$ and $2 | (N - m)$. To see the latter property, note that if $\Psi(m, k) \ne 0$, then $m$ is even (this holds because $N/2$ is even, which follows from our requirement that $N$ is a multiple of 4), and hence $N - m$ is as well.

A Dual Polynomial for Element Distinctness
We first recall the reduction from Collision to Element Distinctness given in [4]. The reduction shows how to turn a polynomial $p$ approximating $\mathrm{ED}_{M,R}$ into a polynomial $q$ approximating $\mathrm{Col}_{N,R}$, with $N \approx M^2$ and $\deg q \le \deg p$. That is, $q(y)$ is the expected value of $p(x)$ where $x$ is the concatenation of a random subset of $M$ of the blocks $y_1, \ldots, y_N$. To simplify notation, for a set $S = \{i_1, i_2, \ldots, i_M\}$, let $y|_S = (y_{i_1}, y_{i_2}, \ldots, y_{i_M})$. Note that $\deg q \le \deg p$. Moreover, since $q$ is an average of values in $[-7/6, 7/6]$, it is always in $[-7/6, 7/6]$ itself. To finish arguing that $q$ is a $(1/3)$-approximation to $\mathrm{Col}_{N,R}$, we consider two cases:
1. If $y \in T_1$, i.e., $y$ is a 1-to-1 input, then $y|_S$ is always 1-to-1. Hence $p(y|_S) \in [5/6, 7/6]$ for every subset of indices, so $q(y) \in [2/3, 4/3]$.
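The remaining case hinges on how often a random sample of $M$ blocks from a 2-to-1 input happens to be collision-free. That probability has a simple closed form, sketched below under the assumption that the $M$ indices are drawn uniformly without replacement: choose $M$ of the $N/2$ colliding pairs and one index from each. The function names are hypothetical, and the brute-force routine is included only as a cross-check.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def prob_subset_collision_free(N, M):
    """Exact probability that M of the N indices of a 2-to-1 function carry M distinct values."""
    return Fraction(comb(N // 2, M) * 2 ** M, comb(N, M))


def brute_force(N, M):
    """Same probability by enumerating all M-subsets of a canonical 2-to-1 function."""
    g = [i // 2 for i in range(N)]
    subsets = list(combinations(range(N), M))
    good = sum(len({g[i] for i in S}) == M for S in subsets)
    return Fraction(good, len(subsets))
```

When $M$ is on the order of $\sqrt{N}$ this probability is bounded away from 1 by the birthday paradox, which is why the averaged polynomial can distinguish 2-to-1 inputs from 1-to-1 inputs.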
The construction we give in this section takes a dual view of the reduction above. Namely, we show how to transform a dual polynomial $\psi$ for $\mathrm{Col}_{N,R}$ into a dual polynomial $\varphi$ for $\mathrm{ED}_{M,R}$, with $M^2 \approx N$. In the primal reduction, we constructed $q(y)$ from $p(x)$ by averaging $p$ over all subsets of size $M$. The right analogue in the dual reduction is to construct $\varphi(x)$ by averaging $\psi(y)$ over a carefully constructed set of extensions from $x$ to a longer input $y$. In particular, $\varphi(x)$ averages $\psi(y)$ over all $y$ for which $x$ could have been produced by taking a subset of $M$ blocks of $y$.
We give this reduction formally below. Let $\varphi(x) = 0$ for $x \notin \{-1,1\}^m$. We claim that $\varphi$ is a good dual polynomial for the Element Distinctness function ED, which requires us to show 1.
To verify the first property, define We collect a few observations about A.
3. If $y \in T_2$, then Pr Hence, Therefore we get For the second property, let $T$ be a subset of $[N]$ with $|T| \le d$. Then where $T|_S$ denotes the subset of $T$ contained in the blocks specified by $S$.

On Complementary Slackness
Recalling that any bounded-error quantum query algorithm can be converted into an approximating polynomial [10], the collision-finding algorithm of Brassard, Høyer, and Tapp [15] yields an explicit, asymptotically optimal approximating polynomial for Col N,R . We describe this polynomial p below.
Recall that any approximating polynomial for $\mathrm{Col}_{N,R}$ represents a feasible solution to the primal linear program considered in Section 2.2. If the polynomial $p$ were an exactly optimal $\varepsilon$-approximation for $\mathrm{Col}_{N,R}$, then complementary slackness would imply that the optimal dual polynomial $\psi$ for $\mathrm{Col}_{N,R}$ is supported on the points corresponding to constraints made tight by $p$. That is, $\psi : \{-1,1\}^n \to \mathbb{R}$ is supported on $x \in \{-1,1\}^n$ for which $|p(x) - \mathrm{Col}_{N,R}(x)| = \varepsilon$. We refer to these as the maximum-error points of $p$.
While we do not know whether $p$ is an exactly optimal approximating polynomial for $\mathrm{Col}_{N,R}$, we might still expect an approximate version of complementary slackness to hold, in the sense that a "good" dual polynomial should place all or most of its weight on points that are "nearly" maximum-error points of $p$. Indeed, this intuition has proven accurate for all of the dual polynomials constructed in prior work, including for symmetric functions (see [17, Section 4.5]), block-composed functions (see [43, Section 1.2.4]), and the intersection of two majorities [38]. Below, we argue that $k$-to-1 inputs are nearly maximum-error points for $p$, which explains why our dual polynomials for Collision are supported on inputs that are roughly $k$-to-1, in addition to why these inputs play a prominent role in the original symmetrization-based proof.
An asymptotically optimal approximation $p$ for $\mathrm{Col}_{N,R}$. For a subset $S \subset [N]$, define $\mathrm{cross}_S : \{-1,1\}^n \to \mathbb{R}$ via: $\mathrm{cross}_S(x) = \sum_{i \in S} \sum_{j \notin S} \mathrm{EQ}(x_i, x_j)$, where EQ denotes the equality function. That is, $\mathrm{cross}_S(x)$ counts the number of cross-collisions between indices in $S$ and indices outside of $S$. Notice that $\mathrm{EQ}(x_i, x_j)$ is a function of only $2 \cdot \log R$ variables, and hence $\mathrm{cross}_S(x_1, \ldots, x_N)$ is exactly computed by a polynomial of degree $2 \cdot \log R$. In addition, for a subset $S \subset [N]$, define the function $I_{\mathrm{ED},S}(x_1, \ldots, x_N)$ to be 1 if $x_i \ne x_j$ for all pairs $i, j \in S$ with $i \ne j$, and 0 otherwise. That is, $I_{\mathrm{ED},S}$ indicates whether $x$ is 1-to-1 on the indices in $S$. Notice that $I_{\mathrm{ED},S}$ is a function of only $|S| \cdot \log R$ variables, and hence is exactly computed by a polynomial of degree $|S| \cdot \log R$.
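Both quantities are simple to compute when the input is given as the list of values of $g_x$. The sketch below assumes that cross-collisions are counted as pairs $(i, j)$ with $i \in S$, $j \notin S$ and equal values, matching the verbal description above; indices are 0-based and the function names are hypothetical.

```python
def cross(g, S):
    """Number of pairs (i, j) with i in S, j not in S, and g[i] == g[j]."""
    S = set(S)
    return sum(1 for i in S for j in range(len(g)) if j not in S and g[i] == g[j])


def indicator_distinct(g, S):
    """1 if g is 1-to-1 on the indices in S, and 0 otherwise."""
    values = [g[i] for i in S]
    return 1 if len(set(values)) == len(values) else 0


one_to_one = [0, 1, 2, 3, 4, 5]
three_to_one = [0, 0, 0, 1, 1, 1]
```

On a $k$-to-1 input with $S$ collision-free, each index in $S$ contributes exactly $k - 1$ cross-collisions, so the count is $|S|(k - 1)$; this is the quantity that the outer polynomial is applied to.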
For the remainder of the discussion, let $r = N^{1/3}$; we focus on the quantity $\mathrm{cross}_S(x)$ when $|S| = r$. We will need the following simple observations.
1. If $x \in T_1$, i.e., $x$ is a 1-to-1 input, then $\mathrm{cross}_S(x) = 0$ and $I_{\mathrm{ED},S}(x) = 1$ for any $S$.
Let $T_d : \mathbb{R} \to \mathbb{R}$ denote the degree-$d$ Chebyshev polynomial of the first kind. This polynomial has the following properties:
• $T_d(1 + M/d^2) \ge 10$ for a constant $M$ independent of $d$.
• The extreme points of $T_d$ in $[-1, 1]$ are the degree-$d$ Chebyshev nodes, which take the form $\cos(i\pi/d)$ for $0 \le i \le d$.
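These two properties can be checked numerically using the three-term recurrence. In the sketch below the constant $M = 9$ is an assumption chosen for illustration; it suffices because $T_d(1 + y) \ge 1 + d^2 y$ for $y \ge 0$, which follows from $T_d'(1) = d^2$ and convexity of $T_d$ on $[1, \infty)$. The original text does not specify the value of $M$.

```python
from math import cos, pi


def chebyshev(d, x):
    """Evaluate T_d(x) via T_0 = 1, T_1 = x, T_{j+1} = 2 x T_j - T_{j-1}."""
    prev, cur = 1.0, x
    if d == 0:
        return prev
    for _ in range(d - 1):
        prev, cur = cur, 2 * x * cur - prev
    return cur


d = 12
# At the Chebyshev nodes cos(i * pi / d) the polynomial alternates between +1 and -1.
extreme_values = [chebyshev(d, cos(i * pi / d)) for i in range(d + 1)]

M = 9  # assumed constant, not taken from the source
growth = [chebyshev(k, 1 + M / k ** 2) for k in range(1, 40)]
```

The rapid growth just outside $[-1, 1]$, combined with boundedness inside, is what lets an affinely transformed $T_d$ separate the value 0 from values at distance roughly $1/d^2$ (after rescaling).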
Applying an appropriate affine transformation to $T_d$, we obtain a polynomial $A_d$ with the following properties:
• $A_d(0) = 1$.
Then $p$ is a polynomial of degree $|S| \log R + 2 \cdot d \cdot \log R = O(N^{1/3} \log R)$. We argue that $p$ approximates $\mathrm{Col}_{N,R}$ to error $\varepsilon$ for some $\varepsilon \le 1/3$. The analysis falls into three cases.
Identifying maximum-error points of $p$. For any fixed $S$, the maximum-error points of $p_S$ are well-approximated by the $x \in \{-1,1\}^n$ for which the following two equations hold:

$$\mathrm{cross}_S(x) = c \cdot i^2 \cdot r \ \text{ for some } i \in \{0, 1, \ldots, \lfloor d \cdot M^{-1/2} \rfloor\} \qquad (8)$$

and

$$I_{\mathrm{ED},S}(x) = 1. \qquad (9)$$

(This follows from the fact that the extreme points of $A_d$ are roughly of the form $c \cdot i^2$ for $0 \le i \le d \cdot M^{-1/2}$.) However, the maximum-error points for the averaged polynomial $p(x) = \mathbb{E}_{|S|=r}[p_S(x)]$ are the points $x$ that satisfy Eq. (8) and Eq. (9) with high probability over the choice of $S$. Indeed, for these points $x$, the error of $p(x)$ is at least $\varepsilon \cdot (1 - o(1)) \approx \varepsilon$.
When $k$ takes the form $k = c \cdot i^2 + 1$, this means that $x$ satisfies Eq. (8). Hence, $x$ has nearly maximal error even for the averaged polynomial $p$.