Dimension-free L2 maximal inequality for spherical means in the hypercube

We establish the result of the title. In combinatorial terms this has the following implication: for sufficiently small ε > 0 and for all n, any marking of an ε fraction of the vertices of the n-dimensional hypercube necessarily leaves a vertex x such that marked vertices are a minority of every sphere centered at x.


Introduction
Let $I^n$ be the $n$-dimensional hypercube: the set $\{0,1\}^n$ equipped with the Hamming metric $d(x,y) = |\{i : x_i \neq y_i\}|$. Let $V = \mathbb{R}^{I^n}$ be the vector space of real-valued functions on the hypercube. For $x \in I^n$, let $\pi_x$ denote the evaluation map from $V$ to $\mathbb{R}$ defined by $\pi_x f = f(x)$, for $f \in V$. If $\mathcal{A} \subseteq \mathrm{Hom}(V, V)$ (the vector space consisting of all linear mappings from $V$ into itself), the maximal operator $M_{\mathcal{A}} : V \to V$ is the sublinear operator in which $M_{\mathcal{A}} f$ is defined by
$$\pi_x M_{\mathcal{A}} f = \sup_{A \in \mathcal{A}} |\pi_x A f|.$$
Of interest is the family $S = \{S_k\}_{k=0}^n$ of spherical means, the stochastic linear operators $S_k : V \to V$ given by
$$\pi_x S_k f = \binom{n}{k}^{-1} \sum_{y : d(x,y) = k} \pi_y f.$$
Applying $M$, we have the spherical maximal operator $M_S : V \to V$ defined by
$$\pi_x M_S f = \max_{0 \le k \le n} |\pi_x S_k f|.$$
The result of this paper is the following dimension-free bound.
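To make the definitions concrete, here is a brute-force numerical rendering (an illustration only, not part of the proof): the spherical means $S_k$ for $n = 3$ as stochastic matrices, and the spherical maximal operator applied to the indicator of a single vertex.

```python
import numpy as np
from itertools import product
from math import comb

# Brute-force illustration of the definitions: the spherical means S_k on
# {0,1}^n as stochastic matrices, and M_S applied to a point indicator.
n = 3
verts = list(product([0, 1], repeat=n))
dist = lambda x, y: sum(a != b for a, b in zip(x, y))

S = [np.array([[(dist(x, y) == k) / comb(n, k) for y in verts] for x in verts])
     for k in range(n + 1)]

# Each S_k is stochastic: its rows sum to 1.
assert all(np.allclose(Sk.sum(axis=1), 1.0) for Sk in S)

f = np.zeros(2 ** n)
f[0] = 1.0                      # indicator of the all-zeros vertex
MSf = np.max([np.abs(Sk @ f) for Sk in S], axis=0)

# For this f, pi_x M_S f = 1 / C(n, d(x,0)): the best sphere about x is
# the one passing through the single marked vertex.
assert all(abs(MSf[i] - 1 / comb(n, dist(x, verts[0]))) < 1e-12
           for i, x in enumerate(verts))
```

The maximum over radii is attained, for each $x$, at the unique radius whose sphere contains the marked vertex.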
Theorem 1. There is a constant $A_I$ such that for all $n$, $\|M_S\|_{2\to 2} < A_I$.
Equivalently, for all $n$ and $f$, $\|M_S f\|_2 \le A_I \|f\|_2$. Maximal inequalities for various function spaces such as $L^p(\mathbb{R}^n)$, and with stochastic linear operators defined by various distributions, such as uniform on spheres (as above), uniform on balls, or according to the distribution of a random walk of specified length (ergodic averaging), have been extensively studied; see [26] for a good review. Most previous work does not explicitly consider finite or discrete metric spaces; however, see [27] for a maximal inequality on the free group on finitely many generators.
One may ask whether the hypercube bound should follow from known results for larger spaces. The hypercube of dimension greater than 1 does not embed isometrically in Euclidean space of any dimension [9,23], so inequalities for Euclidean spaces do not seem to be a useful starting point. The hypercube does embed isometrically in $\mathbb{R}^n$ with the $L^1$ norm, but there is no maximal inequality for this metric. To see this, still in the context of discrete metric spaces, consider the space $\mathbb{Z}^2$ with the $L^1$ distance. Fixing any nonnegative integer $N$, let $f$ be the indicator function of the diagonal segment $\{x : x_1 + x_2 = 0,\ |x_1| \le N\}$. Then $\|f\|_2^2 = 2N + 1$ while $\|M_S f\|_2^2 \in \Omega(N^2)$, since the segment occupies a quarter of an $L^1$-sphere about each of $\Omega(N^2)$ centers. A similar gap (between $O(N^{n-1})$ and $\Omega(N^n)$) occurs in any fixed dimension $n$, because there exists a set of size $O(N^{n-1})$ constituting a positive fraction of $\Omega(N^n)$ $L^1$-spheres, necessarily of many radii. It is therefore not the $L^1$ metric structure of the hypercube which makes a maximal inequality possible but, essentially, its bounded side-length.
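This gap can be checked numerically. The sketch below assumes the marked set is the diagonal segment $\{x_1 + x_2 = 0, |x_1| \le N\}$ (an assumption about the intended example); it counts centers at which some $L^1$-sphere is at least one-quarter occupied by the segment.

```python
# Numerical sketch of the Z^2 counterexample, assuming the marked set is
# the diagonal segment {x1 + x2 = 0, |x1| <= N}: ||f||_2^2 = 2N+1, yet the
# maximal function is >= 1/4 on Omega(N^2) centers.
N = 8
segment = {(i, -i) for i in range(-N, N + 1)}

def sphere(center, k):
    """Points of Z^2 at L1 distance exactly k from center."""
    a, b = center
    if k == 0:
        return [(a, b)]
    pts = []
    for i in range(k):
        pts.extend([(a + i, b + k - i), (a + k - i, b - i),
                    (a - i, b - k + i), (a - k + i, b + i)])
    return pts

def maximal_fn(center, radii):
    """sup over radii of the fraction of the sphere lying in the segment."""
    best = 0.0
    for k in radii:
        pts = sphere(center, k)
        best = max(best, sum(p in segment for p in pts) / len(pts))
    return best

# Count centers in a (2N+1) x (2N+1) box where the maximal function is >= 1/4:
# one full side of a diamond-shaped sphere can lie along the segment.
good = sum(maximal_fn((a, b), range(0, 2 * N + 1)) >= 0.25
           for a in range(-N, N + 1) for b in range(-N, N + 1))
```

The count `good` grows quadratically in $N$ while $\|f\|_2^2$ grows only linearly, exhibiting the failure of any dimension-free bound.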

Combinatorial interpretation
In the special case that $f$ is the indicator function of a set of vertices $F$ in $I^n$, Theorem 1 has the following consequence: for nonnegative $\varepsilon$ less than some $\varepsilon_0$ and for all $n$, if $|F| < \varepsilon 2^n$ then there exists $x \in I^n$ such that in every sphere about $x$, the fraction of points which lie in $F$ is $O(\sqrt{\varepsilon})$.
The aspect of interest is that this holds for every sphere about x. The analogous claim for a fixed radius is a trivial application of the Markov inequality; by a union bound the same holds for any constant number of radii. Avoiding the union bound over all n + 1 radii is the essence of the maximal inequality.
The combinatorial interpretation also has an edge version. Let $F'$ be a set of edges in $I^n$. Let the distance from a point to an edge be the distance to the closer endpoint of that edge. Theorem 1 has the following consequence: for nonnegative $\varepsilon$ less than some $\varepsilon_1$ and for all $n$, if $|F'| < \varepsilon n 2^n$ then there exists $x \in I^n$ such that in every sphere about $x$, the fraction of edges which lie in $F'$ is $O(\sqrt{\varepsilon})$. (Define a function $f$ on vertices by $f(y) :=$ the fraction of edges adjacent to $y$ that lie in $F'$. Note that $\|f\|_2 \in O(\sqrt{\varepsilon})$. Apply Theorem 1 to $f$. For the desired conclusion observe that in the sphere of edges at distance $k$ from $x$, the fraction of edges lying in $F'$ is bounded for $k \le n/2$ by $2\pi_x S_k f$, and for $k > n/2$ by $2\pi_x S_{k+1} f$.)

Maximal inequalities and Unique Games on the Hypercube
In this section we discuss one of the motivations for the present work, although our main result does not directly yield progress on the question. Khot's Unique Games Conjecture (UGC) [13] has for a decade been the focus of great attention. Either resolution of the conjecture would have implications for the hardness of approximating NP-hard problems. Falsification, in particular, is likely to provide powerful new algorithmic techniques. On the other hand, verification of UGC would imply that improving the best currently known approximation algorithms for many important computational problems, such as Min-2Sat-Deletion [13], Vertex Cover [15], Maximum Cut [14] and non-uniform Sparsest Cut [6,16], is NP-hard. In addition, in recent years UGC has proved to be intimately connected to the limitations of semidefinite programming (SDP). Making this connection precise, the authors of [4] and [28] show that if UGC is true, then for every constraint satisfaction problem (CSP) the best approximation ratio is achieved by a certain simple SDP. In short, while the UGC question is arguably among the most important challenges in computational complexity, the current evidence in either direction is not strong, and there is no consensus regarding the likely answer.
A Unique Games instance is specified by an undirected constraint graph $G = (V, E)$, an integer $k$ (the alphabet size), a set of variables $\{x_u\}_{u \in V}$, one for each vertex $u$, and a set of permutations (constraints) $\pi_{uv} : [k] \to [k]$, one for each ordered pair $(u, v)$ with $\{u, v\} \in E$, satisfying $\pi_{uv} = (\pi_{vu})^{-1}$. An assignment of values in $[k]$ to the variables is said to satisfy the constraint on the edge $\{u, v\}$ if $\pi_{uv}(x_u) = x_v$. The optimization problem is to assign a value in $[k]$ to each variable $x_u$ so as to maximize the number of satisfied constraints.
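The definition can be illustrated by a toy instance (the particular graph and permutations below are invented for illustration): a triangle with one permutation constraint per edge, where an exhaustive search finds the best assignment.

```python
from itertools import product

# Toy Unique Games instance (illustrative only): alphabet size k = 3,
# constraint graph a triangle on vertices {0, 1, 2}, one permutation per
# (ordered) edge; pi_vu would be the inverse of pi_uv.
k = 3
constraints = {
    (0, 1): (1, 2, 0),   # pi_01: 0->1, 1->2, 2->0
    (1, 2): (0, 2, 1),   # pi_12: swaps 1 and 2
    (0, 2): (2, 0, 1),   # pi_02: 0->2, 1->0, 2->1
}

def satisfied(assignment):
    """Number of edges {u,v} whose constraint pi_uv(x_u) == x_v holds."""
    return sum(pi[assignment[u]] == assignment[v]
               for (u, v), pi in constraints.items())

# The optimization problem: maximize the number of satisfied constraints.
best = max(satisfied(dict(enumerate(a))) for a in product(range(k), repeat=3))
```

This particular instance is completely satisfiable (for example by $x_0 = 0$, $x_1 = 1$, $x_2 = 2$); the conjecture concerns distinguishing almost-satisfiable instances from almost-unsatisfiable ones.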
Khot [13] conjectured that it is NP-hard to distinguish between the cases when almost all, or very few, of the constraints of a Unique Games instance are satisfiable:
Conjecture 2 (UGC). For any constants ε, δ > 0, there is a k(ε, δ) such that for any k > k(ε, δ), it is NP-hard to distinguish between instances of Unique Games with alphabet size k in which at least a 1 − ε fraction of the constraints is satisfiable and those in which at most a δ fraction is satisfiable.
Numerous works have focused on designing approximation algorithms for Unique Games. While earlier papers [13,32,12,5,7] presented efficient algorithms for arbitrary instances, with approximation guarantees that may be very poor on worst-case inputs, recent works have focused on ruling out hardness for large classes of instances. Such instances include expanders [3,24], local expanders [2,29] and, more generally, graphs with few large eigenvalues [17]. In [18], hardness of random and semi-random distributions over instances is also ruled out. We note that in [1], a sub-exponential time algorithm for general instances is given.
This recent line of work can be seen as a strategy to disprove UGC by ruling out hardness of instances largely on the basis of their spectral profile, and more specifically the number of "large enough" eigenvalues (small-set expansion) of the instance's constraint graph. Following this strategy, the natural next question is on what type of graphs all those techniques fail. The hypercube is typical of such graphs, with its spectrum lying in a "temperate zone" in terms of expansion. This property, along with the high symmetry of the hypercube, makes it a natural next frontier toward disproving UGC.
It is typical when dealing with a 1 − ε satisfiable instance to think of it as originating from a completely satisfiable instance in which a malicious adversary picks an ε fraction of the edges of the constraint graph and "spoils" them by modifying their corresponding constraints. Some likely algorithms for Unique Games on the hypercube are seed-and-propagate (exhaustively range over assignments for a small subset of nodes and propagate the assignment from those nodes according to the constraints on the edges, using some fixed conflict-resolution strategy), or "local SDP". In either case the performance of these algorithms depends on whether the adversary can "rip up" the graph enough that the algorithm has difficulty propagating the seeded values, or otherwise combining local solutions into larger regions, even though in a pure expansion sense the graph is not so easy to rip up that the algorithm could afford to work just in local patches of the graph and throw away all the edges between patches. A good strategy to show that the adversary can win is to show that the adversary can select an ε fraction of the edges of the graph in such a way that around every seed vertex x there is a sphere in which a majority of the edges have been selected by the adversary.
Theorem 1 shows that, to the contrary, no adversary has this power. It is a challenging problem to analyze the performance of seed-and-propagate algorithms, but the maximal inequality is a favorable indication for the research program aimed at showing their effectiveness.
Regarding seed-and-propagate on the hypercube we add only that seeding at a single vertex is insufficient, as the graph is sufficiently weakly connected that a sub-linear number of edge removals suffices to partition it into components each of sub-linear size. However, seeding at multiple vertices remains a very viable strategy.

Possible generalizations
Let $G = (V, E)$ be any finite connected graph, with shortest-path metric $d_G$. Let $G^n$ be the $n$th Cartesian power of $G$: the graph on $V^n$ in which $(v, w)$ is an edge if there is exactly one coordinate $i$ with $v_i \neq w_i$, and for that $i$, $(v_i, w_i) \in E$. The shortest-path metric on $G^n$ is therefore the $L^1$ metric induced by $d_G$. Spherical operators and spherical maximal operators are defined as before, and we conjecture that the analogue of Theorem 1 holds for a suitable constant $A_G$. Our proof techniques would require modification even for the next simplest cases, $G$ taken to be a path or the complete graph on three vertices.
In a different direction, the existence of a dimension-free bound for all $I^n$ raises the question of whether there is a natural limit object $T$ such that for every $I^n$ there exists a single morphism $I^n \to T$, and thus each $n$ occurs in it as a special case.

Proof overview
Our proof is in two main steps, in each of which we obtain a maximal inequality for one class of operators by comparison with another, more tractable class. To introduce the first of these reductions we need to define the senate operator Sen. Let $T = \{T_k\}$ be any family of operators indexed by a parameter $k$ which varies over an interval $[0, a]$ ($a$ possibly infinite) of either nonnegative reals or nonnegative integers. (E.g., $S = \{S_k\}_{k=0}^n$ as above.) Then the family $\mathrm{Sen}(T) = \{\mathrm{Sen}(T)_k\}$, indexed by $k$ in the same range, consists of the averaging operators
$$\mathrm{Sen}(T)_k = \frac{1}{k+1}\sum_{j=0}^{k} T_j \quad \text{(integer case)}, \qquad \mathrm{Sen}(T)_k = \frac{1}{k}\int_0^k T_t \, dt \quad \text{(continuous case)},$$
as $k$ ranges over integers or reals, taking the limit from above at $k = 0$ in the continuous case.
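In the discrete case the senate operator is simply a running unweighted average; a minimal sketch (assuming the reconstruction $\mathrm{Sen}(T)_k = \frac{1}{k+1}\sum_{j \le k} T_j$ stated above):

```python
import numpy as np

# Sketch of the senate operator, discrete case: Sen(T)_k gives each of
# T_0, ..., T_k equal weight 1/(k+1).
def senate(ops):
    """Given matrices [T_0, T_1, ...], return [Sen(T)_0, Sen(T)_1, ...]."""
    out, running = [], np.zeros_like(ops[0], dtype=float)
    for k, T in enumerate(ops):
        running = running + T
        out.append(running / (k + 1))
    return out

I = np.eye(2)
A = np.array([[0.0, 1.0], [1.0, 0.0]])
sen = senate([I, A])      # Sen(T)_0 = I, Sen(T)_1 = (I + A) / 2
```

If each $T_j$ is stochastic, so is each $\mathrm{Sen}(T)_k$, since a convex combination of stochastic matrices is stochastic.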
In the first step of our argument we follow a comparison method due to Stein [31] to bound $\|M_S\|_{2\to 2}$ in terms of $\|M_{\mathrm{Sen}(S)}\|_{2\to 2}$ (Proposition 3). Bounds on the Krawtchouk polynomials play a key role in this argument. We will introduce the polynomials and prove these bounds in Sec. 2, and then use them to prove Proposition 3 in Sec. 3.
To introduce the second reduction we need to define the family of stochastic noise operators $N = \{N_t\}_{t \ge 0}$. This has the following interpretation: $\pi_x N_t f$ is the expectation of $\pi_y f$ where $y$ is obtained by running $n$ independent Poisson processes with parameter 1 from time 0 to time $t$, and re-randomizing the $i$th bit of $x$ as many times as there are events in the $i$th Poisson process. The $N_t$'s form a semigroup: $N_s N_t = N_{s+t}$. The process is equivalent to a Poisson-clocked random walk on the hypercube. We show (in Sec. 4) by a direct pointwise comparison that $\|M_{\mathrm{Sen}(S)}\|_{2\to 2}$ is bounded by a constant multiple of $\|M_{\mathrm{Sen}(N)}\|_{2\to 2}$ (Proposition 4). Finally (see [21]), $\|M_{\mathrm{Sen}(N)}\|_{2\to 2} \le 2\sqrt{2}$ (indeed, $\|M_{\mathrm{Sen}(N)}\|_{p\to p} \le 2(p/(p-1))^{1/p}$ for $p > 1$) by previous results: the Hopf–Kakutani–Yosida maximal inequality and Marcinkiewicz's interpolation theorem. For the reader's convenience, we restate these results in the Appendix.
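The semigroup law can be verified directly on small instances. In this sketch the single-bit noise matrix flips the bit with probability $p = (1 - e^{-t})/2$, matching the re-randomization interpretation above, and the $n$-bit operator is its $n$-fold tensor power.

```python
import numpy as np

# Single-bit noise: after Poisson(t) re-randomizations, the bit differs
# from its starting value with probability p = (1 - e^{-t})/2; the n bits
# are independent, so N_t on n bits is an n-fold Kronecker power.
def noise(t, n):
    p = (1 - np.exp(-t)) / 2
    N1 = np.array([[1 - p, p], [p, 1 - p]])
    N = np.array([[1.0]])
    for _ in range(n):
        N = np.kron(N, N1)
    return N

# Semigroup law: N_s N_t = N_{s+t}.
s, t, n = 0.3, 0.7, 3
assert np.allclose(noise(s, n) @ noise(t, n), noise(s + t, n))
```

The law holds because the single-bit matrix has eigenvalues $1$ and $e^{-t}$, which multiply under composition.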
Combining these results, we obtain Theorem 1 by Remark 5. While our main result is in terms of the 2 → 2 norm, many of our techniques generalize to other norms. Here we are limited by Proposition 3, which does not conveniently generalize to other norms. Very recently this difficulty has been overcome by Krause; for a preliminary version of his work see [20]. On the other hand, $\|M_S\|_{1\to 1} = n + 1$, as can be seen by taking $f$ to be nonzero only at a single point.
We are concerned in this paper solely with maximal operators for sets A of nonnegative matrices. For any such maximal operator M A , |π x M A f | ≤ π x M A |f |. So it suffices to show Theorem 1 for nonnegative f ; this simplifies some expressions and will be assumed throughout.

Fourier analysis and Krawtchouk polynomials
For $y \in I^n$, define the character $\chi_y \in \mathbb{R}^{I^n}$ by $\pi_x \chi_y = (-1)^{x \cdot y}/\sqrt{2^n}$. The normalization is chosen so that the $\chi_y$ form an orthonormal basis of $\mathbb{R}^{I^n}$. This basis simultaneously diagonalizes each $S_k$, since the $S_k$ commute with the translation action of $I^n$ as an abelian group. A direct calculation (see also [10]) shows that
$$S_k \chi_y = \kappa^{(n)}_k(|y|)\, \chi_y, \qquad (3)$$
where $|y| = |\{i : y_i = 1\}|$ and $\kappa^{(n)}_k$ is the normalized $k$th Krawtchouk polynomial, defined by
$$\kappa^{(n)}_k(x) = \binom{n}{k}^{-1} \sum_{j=0}^{k} (-1)^j \binom{x}{j} \binom{n-x}{k-j}. \qquad (4)$$
We collect here some facts about Krawtchouk polynomials.
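The diagonalization (3), with the normalization of (4) as reconstructed above (an assumption about the intended normalization, chosen so that $\kappa_k(0) = 1$), can be verified by direct computation on a small hypercube:

```python
import numpy as np
from itertools import product
from math import comb

# Check that every character chi_y is an eigenvector of every S_k, with
# eigenvalue the normalized Krawtchouk polynomial evaluated at |y|.
n = 4
verts = list(product([0, 1], repeat=n))
dist = lambda x, y: sum(a != b for a, b in zip(x, y))

def kappa(k, x):
    return sum((-1) ** j * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1)) / comb(n, k)

for k in range(n + 1):
    Sk = np.array([[(dist(x, y) == k) / comb(n, k) for y in verts]
                   for x in verts])
    for y in verts:
        chi = np.array([(-1) ** sum(a * b for a, b in zip(x, y))
                        for x in verts]) / 2 ** (n / 2)
        # S_k chi_y = kappa_k(|y|) chi_y, Eqn. (3)
        assert np.allclose(Sk @ chi, kappa(k, sum(y)) * chi)
```

In particular $\kappa_k(0) = 1$ for every $k$, reflecting that $S_k$ is stochastic.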

Lemma 6.
1. Symmetry: $\binom{n}{k}\, \kappa^{(n)}_k(x) = \binom{n}{x}\, \kappa^{(n)}_x(k)$.
2. Reflection symmetry: $\kappa^{(n)}_k(n - x) = (-1)^k \kappa^{(n)}_k(x)$.
3. Orthogonality: $\sum_{x=0}^n \binom{n}{x}\, \kappa^{(n)}_k(x)\, \kappa^{(n)}_\ell(x) = \delta_{k\ell}\, 2^n / \binom{n}{k}$.
4. Roots: the roots of $\kappa^{(n)}_k(x)$ are real, distinct, and lie in the range $n/2 \pm \sqrt{k(n-k)}$.
The proofs of the first three claims are straightforward (see [22]), and the fourth claim is a weaker version of Theorem 8 of [19]. We sometimes abbreviate $\kappa_k(x) = \kappa^{(n)}_k(x)$. Before going into further technical detail, we give an overview of our goals in this section. As we have noted in Sec. 1, maximal inequalities are easily proved for semigroups, such as the noise operators $N_t$. In some ways $S_k$ resembles $N_{k/n}$, since $N_t$ is approximately an average of $S_k$ for $k = nt \pm O(\sqrt{nt(1-t)})$. While direct comparison is difficult (e.g. writing $S_k$ as a linear combination of $N_t$ necessarily entails large coefficients), we can argue that the spectra of these operators should be qualitatively similar.
Indeed, the $N_t$ are also diagonal in the $\chi_y$ basis, and for $|y| = x$, their eigenvalue on $\chi_y$ is $(1 - 2t)^x$. Thus our goal in this section will be to show that $\kappa_k(x)$ has similar behavior to $(1 - 2k/n)^x$. More precisely, we prove:
Lemma 7. There is a constant $c > 0$ such that for all $n$ and integers $0 \le x, k \le n/2$,
$$\kappa^{(n)}_k(x)^2 \le e^{-ckx/n}.$$
By Lemma 6.1 and 6.2 it suffices to bound $\kappa_k(x)$ only when $0 \le k \le x \le n/2$.
Proof. The main complication in working with Krawtchouk polynomials is that they have several different forms of asymptotic behavior depending on whether $x$ and $k$ are in the lower, middle or upper part of their range; indeed, [8] breaks the asymptotic properties of $\kappa_k(x)$ into 12 different cases. However, for our purpose we need only two different upper bounds on the Krawtchouk polynomials, according to whether $k/n$ is greater or less than 0.14, a somewhat arbitrary threshold that we will see the justification for below.
Case I: $k, x \ge 0.14n$. This is the simpler upper bound, which relies only on the orthogonality property, Lemma 6.3. Setting $k = \ell$ in Lemma 6.3 and observing that all of the terms on the LHS are nonnegative, it follows that
$$\kappa_k^2(x) \le \frac{2^n}{\binom{n}{x}\binom{n}{k}}. \qquad (7)$$
Based on Stirling's formula, Lemma 17.5.1 of [25] states that
$$\binom{n}{m} \ge \frac{e^{n H_2(m/n)}}{\sqrt{2n}},$$
where $H_2$ is the binary entropy function (in nats). Note that $H_2(0.14) > \ln(2)/2$, so, for $k, x \ge 0.14n$, (7) implies that $\kappa_k^2(x) \le 2n\, e^{(\ln(2) - 2H_2(0.14))n} \le 2n\, e^{-c_1 n}$ for some $c_1 > 0.116$. Now let $n_0$ be sufficiently large ($n_0 = 100$ suffices) that $c_1 \ge 2\ln(2n_0)/n_0$; then for all $n \ge n_0$, $2n\, e^{-c_1 n} \le e^{-c_1 n/2}$. So for all $n \ge n_0$ and all $0.14n \le k, x \le n/2$,
$$\kappa_k^2(x) \le e^{-c_1 n/2} = e^{-2c_1 (n/2)(n/2)/n} \le e^{-2c_1 kx/n}.$$
To handle the $n < n_0$ case, we define
$$c_2 := \min \left\{ -\frac{n}{kx} \ln \kappa^{(n)}_k(x)^2 \; : \; n < n_0, \; 1 \le k, x \le n - 1, \; \kappa^{(n)}_k(x) \neq 0 \right\}.$$
It is immediate from Definition (4) that $|\kappa^{(n)}_k(x)| < 1$ if $1 \le k, x \le n - 1$, so $c_2 > 0$. Finally, the lemma follows with $c = \min\{2c_1, c_2\}$.
Case II: $k \le 0.14n$ or $x \le 0.14n$. By the symmetry between $k$ and $x$, we can assume WLOG that $k \le 0.14n$. It is convenient to make the change of variable $z = 1 - 2x/n$, defining $\mu_k(z) := \kappa^{(n)}_k(n(1 - z)/2)$, and to expand $\mu_k$ as $\mu_k(z) = \sum_{i=0}^k \alpha_{k,i} z^i$. $\mu_k$ is either symmetric or anti-symmetric about 0, and we focus on bounding $|\mu_k|$ in the range $0 \le z \le 1$, corresponding to $0 \le x \le n/2$.
Let $y_1, \ldots, y_k$ be the roots of $\mu_k$. By Lemma 6.2 the multiset $\{y_1, \ldots, y_k\}$ is identical to the multiset $\{-y_1, \ldots, -y_k\}$. So we can write
$$\mu_k^2(z) = \mu_k^2(1) \prod_{i=1}^{k} \frac{z^2 - y_i^2}{1 - y_i^2}. \qquad (9)$$
It is immediate from Definition (4) that
$$\mu_k^2(1) = 1. \qquad (10)$$
Furthermore, by Lemma 6.4,
$$|y_i| \le y_{\max} := \frac{2\sqrt{k(n-k)}}{n} \quad \text{for all } i. \qquad (11)$$
We now obtain an upper bound on $\mu_k^2(z)$ simply by maximizing (9) subject to the constraints (10) and (11). Observe that the term $|z^2 - y_i^2|/(1 - y_i^2)$ is decreasing in $y_i^2$ for $y_i^2 < z^2$ and increasing for $y_i^2 > z^2$. Consider the problem of choosing $y_i$ to maximize a single term $|z^2 - y_i^2|/(1 - y_i^2)$. As a result of the monotonicity, the maximum over $|y_i| < z$ is found at $y_i = 0$ and the maximum over $|y_i| > z$ is found at $y_i = y_{\max}$. In the former case, $|z^2 - y_i^2|/(1 - y_i^2) = z^2$. In the latter case,
$$\frac{y_{\max}^2 - z^2}{1 - y_{\max}^2} \le \frac{y_{\max}^2}{1 - y_{\max}^2} \le 0.93.$$
The last inequality uses the fact that $y_{\max} \le 2\sqrt{0.14 \cdot 0.86}$ (recalling that $k \le 0.14n$). So $|z^2 - y_i^2|/(1 - y_i^2) \le \max\{z^2, 0.93\}$ for every $i$. If $z^2 \ge 0.93$ then we use $\mu_k^2(z) \le z^{2k} = (1 - 2x/n)^{2k} \le e^{-4kx/n}$; otherwise $\mu_k^2(z) \le 0.93^k$ which, since $z^2 < 0.93$ forces $x \in \Omega(n)$, is again at most $e^{-ckx/n}$ for a suitable constant $c > 0$. The threshold of 0.14 used here could be replaced by any $p$ satisfying $H_2(p) > 1/2 > 4p(1 - p)$ (with $H_2$ in bits). We prove the claim in the following two subsections.

A method of Stein
The bounds on $\|S_\ell\|_{2\to 2}$ for even and odd radius $\ell$ are technically distinct (but not in any interesting way). We present the arguments in parallel.
3.1.1 Even radius: $S_{2r}$ for $0 \le r \le r_{\max} = \lfloor\lfloor n/2\rfloor/2\rfloor$
Abel's lemma gives the following easily verified identity: Hence we have the following pointwise (that is to say, valid at each point $x$) inequality for $0 \le r \le r_{\max}$: The first inequality is by Cauchy–Schwarz. This suggests defining an "error term" operator $R_0 : V \to V$ by so that for any $r \le r_{\max}$,
3.1.2 Odd radius: $S_{2r+1}$ for $0 \le r \le r_{\max} = \lfloor(\lfloor n/2\rfloor - 1)/2\rfloor$
Note that for $n = 4m + a$, $0 \le a \le 3$, this gives Abel's lemma gives: Hence we have the following pointwise inequality for $0 \le r \le r_{\max}$: This suggests defining an "error term" operator $R_1 : V \to V$ by so that for any $r \le r_{\max}$,
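Abel's lemma here is summation by parts; in its standard form (stated generically, since only the name of the lemma is invoked above) it reads $\sum_{k=0}^r a_k b_k = A_r b_r - \sum_{k=0}^{r-1} A_k (b_{k+1} - b_k)$ with $A_k = a_0 + \cdots + a_k$, which can be sanity-checked directly:

```python
# Abel's lemma (summation by parts), stated generically:
#   sum_{k=0}^r a_k b_k = A_r b_r - sum_{k=0}^{r-1} A_k (b_{k+1} - b_k),
# where A_k = a_0 + ... + a_k.  A quick deterministic check.
def abel_lhs(a, b):
    return sum(ak * bk for ak, bk in zip(a, b))

def abel_rhs(a, b):
    r = len(a) - 1
    A = [sum(a[:k + 1]) for k in range(r + 1)]   # partial sums A_k
    return A[r] * b[r] - sum(A[k] * (b[k + 1] - b[k]) for k in range(r))

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 1.0, 2.0, 7.0]
assert abel_lhs(a, b) == abel_rhs(a, b)
```

In Stein's method the role of summation by parts is to exchange a single spherical mean for an average of spherical means plus a telescoping error, which is what motivates the error operators $R_0$ and $R_1$.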

Bounding the error term
Define $Rf$ by $\pi_x Rf := \max\{\pi_x R_0 f, \pi_x R_1 f\}$. Combining (16) and (19) we have, for each $x \in I^n$, each $f \in \mathbb{R}^{I^n}$ and each $r \le n/2$, a pointwise comparison; maximizing the LHS over $r$, squaring and summing over $x$, we obtain the comparison underlying Proposition 3. Claim 8 (and Proposition 3) now follow from:
Lemma 9. There is a $C < \infty$ such that $\|R_0 f\|_2, \|R_1 f\|_2 \le C \|f\|_2$.
Proof. As seen in the preliminaries, the operators $S_k$ commute and share the eigenvectors $\{\chi_y\}_{y \in I^n}$, with eigenvalues given by Eqn. (3): $S_k \chi_y = \kappa^{(n)}_k(|y|)\chi_y$. Let $E_x$ be the projection operator onto the $\binom{n}{x}$-dimensional eigenspace spanned by $\{\chi_y\}_{|y|=x}$, so that $S_k = \sum_{x=0}^n \kappa^{(n)}_k(x) E_x$. We calculate $\|R_0 f\|_2^2$ and $\|R_1 f\|_2^2$ in this basis (here and below the value of $r_{\max}$ depends on whether $R_0$ or $R_1$ is being bounded). Since $\|f\|_2^2 = \sum_{x=0}^n \|E_x f\|_2^2$, it suffices to show that there is a $C < \infty$ such that (20) holds for every $0 \le x \le n$. Recall that by Lemma 6.2 it suffices to consider $x \le n/2$. For $x = 0$, (20) is trivial as the LHS is 0. For $x > 0$ we use Lemma 6.1 to rewrite the parenthesized term (with $\ell = 2k$ or $\ell = 2k + 1$). To see this, recall that $\binom{n}{x} \kappa^{(n)}_x(\ell)$ counts $x$-subsets of $\{1, \ldots, n\}$ with sign according to the parity of their intersection with $\{1, \ldots, \ell\}$; now condition on whether the $x$-subset contains the element $\ell$. The two terms on the LHS of (20) are then bounded as follows. For $x = 1$, the quantities (22) come to $\sum_{k=1}^{r_{\max}} 16 k n^{-2}$, which is upper bounded by a constant. For $x > 1$ we upper bound (22) by terms involving $(2k - 1)^2$, which in turn are upper bounded by applying each value of $r_{\max}$. Now apply Lemma 7 to upper bound this, where in (23) we have defined $\alpha = 2c(x - 1)/(n - 2)$ and in (24) we have summed the resulting series. This establishes Eqn. (20).
Proof. The proof relies on pointwise comparison of maximal functions. If $A, B$ are matrices, write $A \le B$ if $B - A$ is a nonnegative matrix. If $\mathcal{A}, \mathcal{B}$ are sets of nonnegative matrices indexed by integers or reals, we write $\mathcal{A} \preceq \mathcal{B}$ if for every $A \in \mathcal{A}$ there is a probability measure $\mu_A$ on $\mathcal{B}$ such that $A \le \int B \, d\mu_A(B)$. Observe that in this case, for any nonnegative function $f$ and any $x$, $\sup_{A \in \mathcal{A}} \pi_x A f \le \sup_{B \in \mathcal{B}} \pi_x B f$, and therefore $\|M_{\mathcal{A}}\| \le \|M_{\mathcal{B}}\|$ for any norm (in particular for $\|\cdot\|_{2\to 2}$).
For any $k > \lfloor n/2 \rfloor$, $\pi_x \mathrm{Sen}(S)_k f \le \pi_x \mathrm{Sen}(S)_{\lfloor n/2 \rfloor}(f + \iota f)$, and therefore $\|M_{\mathrm{Sen}(S)}\|_{2\to 2} \le 2 \|M_{\{\mathrm{Sen}(S)_k\}_{k \le \lfloor n/2 \rfloor}}\|_{2\to 2}$. However, we will not compare Sen(S) and Sen(N) directly. Instead, we will introduce a variant of $N$ that more closely resembles $S$ but is no longer a semigroup.
Recall that $N_t$ represents the average over independently flipping each bit with probability $p = (1 - e^{-t})/2$. Define $\tilde{N}_p$ to represent the same noise process but parameterized by $p$ instead of $t$; thus $\tilde{N}_p = N_{-\ln(1 - 2p)}$. While the sets $\{N_t\}_{t \ge 0}$ and $\{\tilde{N}_p\}_{p \in [0, 1/2)}$ are of course the same, their senate operators $\mathrm{Sen}(N)$ and $\mathrm{Sen}(\tilde{N})$ are different:
$$\mathrm{Sen}(N)_T = \frac{1}{T} \int_0^T N_t \, dt, \qquad (25)$$
$$\mathrm{Sen}(\tilde{N})_P = \frac{1}{P} \int_0^P \tilde{N}_p \, dp = \frac{1}{2P} \int_0^{-\ln(1 - 2P)} e^{-t} N_t \, dt. \qquad (26)$$
In (26), $P$ varies over $(0, 1/2)$, and in (25), $T$ can be any positive real number. Hence Proposition 4 is established in two subsidiary claims:
Lemma 10. $\mathrm{Sen}(\tilde{N}) \preceq \mathrm{Sen}(N)$.
Lemma 11. $\mathrm{Sen}(S) \preceq C \cdot \mathrm{Sen}(\tilde{N})$ for a universal constant $C$.
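The change of variables behind the second equality in (26) can be checked numerically on a single bit (the factor $e^{-t}/2$ is exactly $dp/dt$):

```python
import numpy as np

# Numerical check, on one bit, of the change of variables in (26): with
# p = (1 - e^{-t})/2, the flat average (1/P) int_0^P N~_p dp equals
# (1/2P) int_0^{-ln(1-2P)} e^{-t} N_t dt.
def Ntilde(p):
    return np.array([[1 - p, p], [p, 1 - p]])

P, M = 0.3, 50001
ps = np.linspace(0.0, P, M)
lhs = np.mean([Ntilde(p) for p in ps], axis=0)          # (1/P) int N~_p dp

T = -np.log(1 - 2 * P)
ts = np.linspace(0.0, T, M)
rhs = np.mean([np.exp(-t) * Ntilde((1 - np.exp(-t)) / 2) for t in ts],
              axis=0) * T / (2 * P)                      # (1/2P) int e^{-t} N_t dt

assert np.allclose(lhs, rhs, atol=1e-3)
```

The decreasing weight $e^{-t}$ on the right-hand side is exactly the feature exploited in the proof of Lemma 10 below.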
Proof of Lemma 10. For each P we will write Sen(Ñ ) P as a convex combination of Sen(N ) T for different values of T . By considering the action of each side on the constant function we will see that it suffices to write Sen(Ñ ) P as a bounded nonnegative combination of Sen(N ) T for different values of T (i.e. a linear combination with coefficients that are nonnegative and whose sum is bounded).
Since f ′ (T ) < 0, (31) is the desired nonnegative combination. We have written the proof in this way to stress that the only features of (25) and (26) used are that Sen(N ) T is a "flat" distribution over N t and Sen(Ñ ) P has N t weighted by a weakly decreasing function.
Proof of Lemma 11. To compare Sen(S) and Sen(Ñ), we need to show that for any $K \le n/2$ we can find a distribution over $P$ such that $\mathrm{Sen}(S)_K$ is pointwise at most a constant times the corresponding average of $\mathrm{Sen}(\tilde{N})_P$. In fact, it will suffice to consider a distribution concentrated on a single value of $P$. Define $P_K := \min\big(\frac{K + \sqrt{K}}{n}, \frac{1}{2}\big)$. In Lemma 12 we will show that $\mathrm{Sen}(S)_K \le C \cdot \mathrm{Sen}(\tilde{N})_{P_K}$, thus implying that $\mathrm{Sen}(S) \preceq C \cdot \mathrm{Sen}(\tilde{N})$. The idea behind Lemma 12 is that for each $k \le K$, there are significant contributions to the $S_k$ coefficient of $\mathrm{Sen}(\tilde{N})_{P_K}$ for $p$ throughout the range $[k/n, (k + \sqrt{k})/n]$. This window has width $\sqrt{k}/n$, contributes $\Omega(1/\sqrt{k})$ weight to $S_k$ at each point, and is normalized by $\frac{1}{P_K} \approx \frac{n}{K}$. The total contribution is thus $\Omega(1/K)$.
Proof. Observe that $P := P_K \le 2K/n$. For $0 \le k \le K$, we now compare the coefficient of $S_k$ in $\mathrm{Sen}(S)_K$ (where it is $1/(K+1)$) to its value in $\mathrm{Sen}(\tilde{N})_P$, where it is $\frac{1}{P} \int_0^P B(n, p, k) \, dp$, with $B(n, p, k) = \binom{n}{k} p^k (1-p)^{n-k}$. Denote this latter quantity by $a_k$. Consider first $k = 0$. If $K = 0$ and $P = 0$, then $S_0$ has weight 1 in both cases. Otherwise $P \ge 1/n$, and
$$a_0 \ge \frac{1}{P} \int_0^{1/n} (1 - p)^n \, dp \ge \frac{n}{2K} \cdot \frac{1}{n} \cdot \left(1 - \frac{1}{n}\right)^n \ge \frac{1}{8K}.$$
This last inequality uses the fact that (1 − 1/n) n is increasing with n and thus is ≥ 1/4 for n ≥ 2.
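The elementary fact invoked in the last inequality is easy to confirm numerically:

```python
# Check that (1 - 1/n)^n is increasing in n, hence at least
# (1 - 1/2)^2 = 1/4 for all n >= 2 (it increases toward 1/e).
vals = [(1 - 1 / n) ** n for n in range(2, 200)]
assert all(x < y for x, y in zip(vals, vals[1:]))
assert min(vals) >= 0.25
```

Monotonicity follows from the AM–GM inequality, but for the bound $\ge 1/4$ only the value at $n = 2$ and monotonicity are needed.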
In our case, a maximal operator $M_{\mathcal{A}}$ has $\|M_{\mathcal{A}}\|_{\infty \to \infty} = 1$, and if $\mathcal{A}$ is a positive contractive semigroup then Theorem 13 implies that $\|M_{\mathcal{A}}\|_{1 \to 1, w} \le 1$ as well. This implies that for any $p > 1$ we have
$$\|M_{\mathcal{A}}\|_{p \to p} \le 2 \left(\frac{p}{p-1}\right)^{1/p},$$
which is $2\sqrt{2}$ when $p = 2$.