Outlaw distributions and locally decodable codes

Locally decodable codes (LDCs) are error correcting codes that allow for decoding of a single message bit using a small number of queries to a corrupted encoding. Despite decades of study, the optimal trade-off between query complexity and codeword length is far from understood. In this work, we give a new characterization of LDCs using distributions over Boolean functions whose expectation is hard to approximate (in~$L_\infty$~norm) with a small number of samples. We coin the term `outlaw distributions' for such distributions since they `defy' the Law of Large Numbers. We show that the existence of outlaw distributions over sufficiently `smooth' functions implies the existence of constant query LDCs and vice versa. We give several candidates for outlaw distributions over smooth functions coming from finite field incidence geometry, additive combinatorics and from hypergraph (non)expanders. We also prove a useful lemma showing that (smooth) LDCs which are only required to work on average over a random message and a random message index can be turned into true LDCs at the cost of only constant factors in the parameters.


Introduction
Error correcting codes (ECCs) solve the basic problem of communication over noisy channels. They encode a message into a codeword from which, even if the channel partially corrupts it, the message can later be retrieved. With one of the earliest applications of the probabilistic method, formally introduced by Erdős in 1947, pioneering work of Shannon [Sha48] showed the existence of optimal (capacity-achieving) ECCs. The problem of explicitly constructing such codes has fueled the development of coding theory ever since. Similarly, the exploration of many other fascinating structures, such as Ramsey graphs, expander graphs, two-source extractors, etc., began with a striking existence proof via the probabilistic method, only to be followed by decades of catch-up work on explicit constructions. Locally decodable codes (LDCs) are a special class of error correcting codes whose development has not followed this line. The defining feature of LDCs is that they allow for ultra-fast decoding of single message bits, a property that typical ECCs lack, as their decoders must read an entire (possibly corrupted) codeword to achieve the same. They were first formally defined in the context of channel coding in [KT00], although they (and the closely related locally correctable codes) implicitly appeared in several previous works in other settings, such as program checking [BK95], probabilistically checkable proofs [AS98, ALM+98] and private information retrieval schemes (PIRs) [CKGS98]. More recently, LDCs have even found applications in Banach-space geometry [BNR12], and LDC-inspired objects called local reconstruction codes have found applications in fault-tolerant distributed storage systems [GHSY12]. See [Yek12] for a survey of LDCs and some of the applications.
Despite their many applications, our knowledge of LDCs is very limited; the best-known constructions are far from what is currently known about their limits. Although standard random (linear) ECCs do allow for some weak local decodability, they are outperformed by even the earliest explicit constructions [KS07]. All the known constructions of LDCs were obtained by explicitly designing such codes using algebraic objects like low-degree polynomials or matching vectors [Yek12].
In this paper, we give a characterization of LDCs in probabilistic and geometric terms, making them amenable to probabilistic constructions. On the flip side, these characterizations might also be easier to work with for the purpose of showing lower bounds. We will make this precise in the next section. Let us first give the formal definition of an LDC.

Definition 1.1 (Locally decodable code). For positive integers k, n, q and η, δ ∈ (0, 1/2], a map C : {0, 1}^k → {0, 1}^n is a (q, δ, η)-locally decodable code if, for every i ∈ [k], there exists a randomized decoder (a probabilistic algorithm) A_i such that:
• For every message x ∈ {0, 1}^k and string y ∈ {0, 1}^n that differs from the codeword C(x) in at most δn coordinates, Pr[A_i(y) = x_i] ≥ 1/2 + η.
• The decoder A_i (non-adaptively) queries at most q coordinates of y.

Known results. The main parameters of LDCs are the number of queries q and the length of the encoding n as a function of k and q; typically the parameters δ, η are fixed constants. The simplest example is the Hadamard code, which is a 2-query LDC with n = 2^k. The 2-query regime is the only nontrivial case where optimal lower bounds are known: it was shown in [KW04, GKST02] that exponential length is necessary. In general, Reed-Muller codes of degree q − 1 are q-query LDCs of length n = exp(O(k^{1/(q−1)})). For a long time, these were the best constructions for constant q, until in a breakthrough work [Yek07, Efr09], 3-query LDCs were constructed with subexponential length n = exp(exp(O(√(log k)))). More generally, they constructed 2^r-query LDCs with length n = exp(exp(O(log^{1/r} k))). For q ≥ 3, the best-known lower bounds leave huge gaps, giving only polynomial bounds. Any 3-query LDC must have length n ≥ Ω(k^2) [Woo12], and more generally any q-query LDC must have length n ≥ Ω(k^{1+1/(⌈q/2⌉−1)}) [KT00, Woo07]. LDCs where the codewords are over a large alphabet are also studied because of their relation to private information retrieval schemes [CKGS98, KT00]. In [DG15], 2-query LDCs
of length n = exp(k^{o(1)}) over an alphabet of size exp(k^{o(1)}) were constructed. There is also some exciting recent work on LDCs where the number of queries can grow with k, in which case there are explicit constructions with constant rate (that is, n = O(k)) and query complexity q = exp(O(√(log n))); in fact, one can even achieve the optimal rate-distance tradeoff of traditional error correcting codes [KSY10, KMRS16, GKdO+16]. We cannot yet rule out the exciting possibility that constant-rate LDCs with polylogarithmic query complexity exist.
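To make the locality in Definition 1.1 concrete, the following is a self-contained sketch of the classical 2-query decoder for the Hadamard code mentioned above (standard textbook material, not code from this work; the message length and the corrupted coordinate are chosen arbitrarily for illustration):

```python
def hadamard_encode(x):
    """Encode a k-bit message x as the list of all 2^k parities <x, a> mod 2."""
    k = len(x)
    return [sum(x[i] for i in range(k) if (a >> i) & 1) % 2
            for a in range(2 ** k)]

# The 2-query decoder for bit i picks a uniformly random a and outputs
# y[a] XOR y[a XOR e_i]; on an uncorrupted codeword this equals x_i exactly.
k = 4
x = [1, 0, 1, 1]
y = hadamard_encode(x)
y[5] ^= 1  # corrupt one of the 2^k codeword bits

for i in range(k):
    # a single corruption spoils at most 2 of the 2^k query pairs,
    # so the decoder errs with probability at most 2/2^k over its randomness
    votes = [y[a] ^ y[a ^ (1 << i)] for a in range(2 ** k)]
    assert votes.count(x[i]) >= 2 ** k - 2
```

Each query pair (a, a XOR e_i) is uniformly distributed over the coordinates, which is exactly the "smoothness" property of decoders discussed later.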

LDCs from distributions over smooth Boolean functions
Our main result shows that LDCs can be obtained from "outlaw" distributions over "smooth" functions. The term outlaw refers to the Law of Large Numbers, which says that the average of independent samples tends to the expectation of the distribution from which they are drawn. Roughly speaking, a probability distribution is an outlaw if many samples are needed for a good estimate of its expectation, and a smooth function over the n-dimensional Boolean hypercube is one that has no influential variables. Paradoxically, while many instances of the probabilistic method use the fact that sample means of a small number of independent random variables tend to concentrate around the true mean, as captured for example by the Chernoff bound, our main result requires precisely the opposite. We show that if at least k samples from a distribution over smooth functions are needed to approximate the mean, then there exists an O(1)-query LDC sending {0, 1}^{Ω(k)} to {0, 1}^n, where the hidden constants depend only on the smoothness and mean-estimation parameters.
To make this precise, we now formally define smooth functions and outlaw distributions. Given a function f : {−1, 1}^n → R, its spectral norm (also known as the algebra norm or Wiener norm) is defined as $\|f\|_{\mathrm{sp}} = \sum_{S \subseteq [n]} |\hat f(S)|$, where $\hat f(S)$ are the Fourier coefficients of f (see Section 2 for some preliminaries in Fourier analysis). We also consider the supremum norm $\|f\|_{L_\infty} = \max_{x \in \{-1,1\}^n} |f(x)|$ and the ith discrete derivative $(D_i f)(x) = (f(x) - f(x^i))/2$, where $x^i$ is the point that differs from x on the ith coordinate. Smooth functions are functions whose discrete derivatives have small spectral norms.
Definition 1.2 (σ-smooth functions). For σ > 0, a function f : {−1, 1}^n → R is σ-smooth if $\|D_i f\|_{\mathrm{sp}} \le \sigma/n$ for every i ∈ [n].

Intuition for the above definition may be gained from the fact that smooth functions have no influential variables. The influence of the ith variable, $(\mathbb{E}_{x\in\{-1,1\}^n}[(D_i f)(x)^2])^{1/2}$, measures the extent to which changing the ith coordinate of a randomly chosen point changes the value of f. Since $\|D_i f\|_{L_\infty} \le \|D_i f\|_{\mathrm{sp}}$, the directional derivatives of σ-smooth functions are uniformly bounded by σ/n, which is a much stronger condition than saying that the derivatives are small on average. Outlaws are defined as follows.
Definition 1.3 (Outlaw). Let n be a positive integer and µ be a probability distribution over real-valued functions on {−1, 1}^n with mean $\bar f = \mathbb{E}_{f\sim\mu}[f]$. For a positive integer k and ε > 0, say that µ is a (k, ε)-outlaw if for independent random µ-distributed functions f_1, . . ., f_k, we have
$$\mathbb{E}\Big\|\frac{1}{k}\sum_{i=1}^{k} f_i - \bar f\Big\|_{L_\infty} > \varepsilon.$$
Denote by κ_µ(ε) the largest integer k such that µ is a (k, ε)-outlaw.
To approximate the true mean of an outlaw µ to within ε on average in the L_∞-distance, one thus needs κ_µ(ε) + 1 samples. Note that if µ is a distribution over σ-smooth functions, then the distribution μ̃ obtained by scaling functions in the support of µ by 1/σ is a distribution over 1-smooth functions and κ_{μ̃}(ε/σ) = κ_µ(ε).
Our main result is then as follows.

Theorem 1.4 (informal). Let ε > 0 and let µ be a (k, ε)-outlaw over 1-smooth functions on {−1, 1}^n. Then there exists a q-query LDC sending {0, 1}^l to {0, 1}^n with q = O(1/ε) and l = Ω(k), where the hidden constants depend only on ε.
Note that the smoothness requirement is essential. For example, the uniform distribution over the n dictator functions f_i(x) = x_i for i ∈ [n] is an (n/2, 1)-outlaw, but it cannot imply constant-rate, constant-query LDCs, which we know do not exist. In fact, we establish a converse to Theorem 1.4, showing that its hypothesis is essentially equivalent to the existence of LDCs in the small query complexity regime.

Theorem 1.5 (informal). If there exists a (q, δ, η)-LDC sending {0, 1}^k to {0, 1}^n, then there exists a probability distribution μ over 1-smooth degree-q functions on {−1, 1}^n with $\kappa_\mu(\eta\delta/(q2^{q/2})) \ge \eta k$.
Let us remark in passing that Theorem 1.5 can in turn convert the problem of proving lower bounds on the length of LDCs to a problem in Banach-space geometry. In particular, it can be shown that for a distribution µ over 1-smooth degree-q functions on {0, 1}^n, one can upper bound κ_µ(ε) in terms of type constants of the space of q-linear forms on $\ell_q^{n+1}$ [Bri15].
Candidate outlaws. One scenario in which outlaw distributions can be obtained is using incidence geometry over finite fields. In particular, the following result can be derived from our main theorem (stated a bit informally here; see Section 6.1 for the formal version).
Corollary 1.6. Let p > 2 be a fixed prime. Suppose that for every set of directions D ⊂ F_p^n of size |D| ≤ k, there exists a set B ⊂ F_p^n of size |B| ≥ Ω(p^n) which does not contain any lines with direction in D. Then, there exists a p-query LDC sending {0, 1}^{Ω(k)} to {0, 1}^{p^n}.
Another setting in which our approach leads to interesting open problems is expansion in hypergraphs. Consider a partition of the complete bipartite graph K_{n,n} into n perfect matchings. It is known that picking k = O(log n) of these matchings at random gives an expander graph (of degree k). For some particular partitions (e.g., those given by an Abelian group) this bound is tight. The question arising from our approach can be briefly summarized as follows: can one find an n-vertex hypergraph H (say 3-uniform, to be concrete) and a partition of H into matchings so that, to get an expander (defined appropriately), one needs at least k random matchings? This would give a code sending k-bit messages with encoding length n and so becomes interesting when k is super-polylogarithmic in n. We elaborate on this in Section 6.2.

Techniques
Our proof of Theorem 1.4 proceeds in two steps. The first step consists of turning an outlaw over smooth functions into a seemingly crude type of LDC that is only required to work on average over a uniformly distributed message and a uniformly distributed message index. We call such codes average-case smooth codes (see below). The second step consists of showing that such codes are in fact not much weaker than honest LDCs.
From outlaws to average-case smooth codes. The key ingredient for the first step is symmetrization, a basic technique from high-dimensional probability. We briefly sketch how this is used (we refer to Section 3 for the full proof). Suppose that f_1, . . ., f_k are independent smooth functions distributed according to a (k, ε)-outlaw with expectation $\bar f$. We introduce an independent copy f′_i of f_i for each i ∈ [k] and consider the symmetrically distributed random functions f_i − f′_i. Jensen's inequality and Definition 1.3 imply that $\mathbb{E}\|\frac{1}{k}\sum_{i=1}^k (f_i - f'_i)\|_{L_\infty} > \varepsilon$. Now, using the fact that the random functions f_i − f′_i are independent and symmetric, for independent uniformly distributed random signs x_1, . . ., x_k ∈ {−1, 1}, the above left-hand side equals $\mathbb{E}\|\frac{1}{k}\sum_{i=1}^k x_i(f_i - f'_i)\|_{L_\infty}$. Now, the triangle inequality and the Averaging Principle show that there exist fixed smooth functions f⋆_1, . . ., f⋆_k such that on average over the random signs, we have
$$\mathbb{E}_x\Big\|\frac{1}{k}\sum_{i=1}^{k} x_i f^\star_i\Big\|_{L_\infty} > \varepsilon/2. \qquad (2)$$
To get an average-case smooth code out of this, we view each sequence x = (x_1, . . ., x_k) as a k-bit message and choose an arbitrary n-bit string at which the L_∞-norm in (2) is achieved to be its encoding, C(x). This gives a map C : {−1, 1}^k → {−1, 1}^n such that, on average over x, the smooth function $\frac{1}{k}\sum_i x_i f^\star_i$ takes a large value at C(x). Finally, we use the smoothness property to transform the f⋆_i into decoders with the desired properties. This is done in Section 3. Let us point out that it is in the application of the Averaging Principle where the probabilistic method appears in our construction of LDCs.
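The symmetrization argument sketched above can be written out as the following chain of inequalities (a reconstruction from the sketch; $\bar f$ denotes the mean of the outlaw and $x_1,\dots,x_k$ are independent uniform random signs):

```latex
\begin{align*}
\varepsilon
  &< \mathbb{E}\Big\|\tfrac{1}{k}\textstyle\sum_{i=1}^{k} f_i - \bar f\Big\|_{L_\infty}
     && \text{($\mu$ is a $(k,\varepsilon)$-outlaw)} \\
  &\le \mathbb{E}\Big\|\tfrac{1}{k}\textstyle\sum_{i=1}^{k} (f_i - f_i')\Big\|_{L_\infty}
     && \text{(Jensen, since $\mathbb{E}[f_i'] = \bar f$)} \\
  &= \mathbb{E}_{x}\,\mathbb{E}\Big\|\tfrac{1}{k}\textstyle\sum_{i=1}^{k} x_i (f_i - f_i')\Big\|_{L_\infty}
     && \text{(symmetry of $f_i - f_i'$)} \\
  &\le 2\,\mathbb{E}_{x}\,\mathbb{E}\Big\|\tfrac{1}{k}\textstyle\sum_{i=1}^{k} x_i f_i\Big\|_{L_\infty}
     && \text{(triangle inequality).}
\end{align*}
```

Fixing the outcomes of the f_i by the Averaging Principle then yields deterministic functions f⋆_1, . . ., f⋆_k whose random-sign average exceeds ε/2 in expectation.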
Average-case smooth codes are LDCs. Katz and Trevisan [KT00] observed that LDC decoders must have the property that they select their queries according to distributions that do not favor any particular coordinate. The intuition for this is that if they did favor a certain coordinate, then corrupting that coordinate would cause the decoder to err with too high a probability. If instead queries are sampled according to a "smooth" distribution, they will all fall on uncorrupted coordinates with good probability, provided the fraction of corrupted coordinates δ and the query complexity q are not too large. The following definition allows us to make this intuition precise.
Definition 1.7 (Smooth code). For positive integers k, n, q, a real c ≥ 1 and η ∈ (0, 1/2], a map C : {0, 1}^k → {0, 1}^n is a (q, c, η)-smooth code if, for every i ∈ [k], there exists a randomized decoder A_i that (non-adaptively) queries at most q coordinates of its input such that:
• For every message x ∈ {0, 1}^k, we have Pr[A_i(C(x)) = x_i] ≥ 1/2 + η. (3)
• For each j ∈ [n], the probability that A_i queries coordinate j is at most c/n.

The formal version of Katz and Trevisan's observation is as follows.

Theorem 1.8 ([KT00]). Every (q, δ, η)-LDC is a (q, q/δ, η)-smooth code. Conversely, for every δ ∈ (0, η/c), every (q, c, η)-smooth code is a (q, δ, η − cδ)-LDC.
Our second step in the proof of Theorem 1.4 is a stronger form of the converse part of Theorem 1.8, showing that even smooth codes that are only required to work on average can be turned into LDCs, losing only a constant factor in the rate and success probability.

Definition 1.9 (Average-case smooth code). A code as in Definition 1.7 is a (q, c, η)-average-case smooth code if, instead of the first item, (3) is required to hold only on average over uniformly distributed x ∈ {0, 1}^k and uniformly distributed i ∈ [k], which is to say that
$$\Pr[A_i(C(x)) = x_i] \ge 1/2 + \eta,$$
where the probability is taken over x, i and the randomness used by A_i.

Lemma 1.10. Let C be a (q, c, η)-average-case smooth code sending {0, 1}^k to {0, 1}^n. Then, there exists a (q, Ω(η/c), Ω(η))-LDC sending {0, 1}^l to {0, 1}^n with l = Ω(η^2 k/log(1/η)).
The idea behind the proof of Lemma 1.10 is as follows. We first switch the message and codeword alphabets to {−1, 1} and show that the set T of values attained by the (averaged) decoding functions has large Gaussian width; in particular, for a standard k-dimensional Gaussian vector g, we have $\mathbb{E}[\sup_{t\in T}\langle g, t\rangle] \gtrsim \eta k$. Next, we employ a powerful result of [MV03] showing that T contains an l-dimensional hypercube-like structure with edge length some absolute constant c ∈ (0, 1], for $l \gtrsim \eta^2 k/\log(1/\eta)$. Roughly speaking, this implies that C is a smooth code on {−1, 1}^l whose decoding probability depends on η and c. Finally, we obtain an LDC via an application of Theorem 1.8. The full proof is given in Section 4.

Organization
Section 2 contains some preliminaries in Fourier analysis over the Boolean cube. In Section 3, we prove our main theorem (Theorem 1.4) by first showing that outlaw distributions over smooth functions imply the existence of average-case smooth codes and then using Lemma 1.10 to convert them to LDCs. In Section 4, we prove Lemma 1.10, showing how to convert average-case smooth codes to LDCs. In Section 5, we prove the converse to our main theorem (Theorem 1.5), showing how to get outlaw distributions over smooth functions from LDCs. Finally, in Section 6, we give some candidate constructions of outlaw distributions over smooth functions using incidence geometry and Cayley hypergraphs.

Preliminaries
We recall a few basic definitions and facts from analysis over the n-dimensional Boolean hypercube {−1, 1}^n. Equipped with the coordinate-wise multiplication operation, the hypercube forms an Abelian group whose group of characters is formed by the functions $\chi_S(x) = \prod_{i\in S} x_i$ for all S ⊆ [n]. The characters form a complete orthonormal basis for the space of real-valued functions on {−1, 1}^n endowed with the inner product $\langle f, g\rangle = \mathbb{E}_{x\in\{-1,1\}^n}[f(x)g(x)]$, where we use the notation $\mathbb{E}_{a\in S}$ to denote the expectation with respect to a uniformly distributed element a over a set S. The Fourier transform of a function f is given by $\hat f(S) = \langle f, \chi_S\rangle$. The Fourier inversion formula (which follows from orthonormality of the character functions) asserts that $f(x) = \sum_{S\subseteq[n]} \hat f(S)\chi_S(x)$. Parseval's identity relates the L_2-norms of f and its Fourier transform by $\mathbb{E}_x[f(x)^2] = \sum_{S\subseteq[n]} \hat f(S)^2$. A function f has degree q if $\hat f(S) = 0$ whenever |S| > q, and the degree-q truncation of f, denoted $f^{\le q}$, is the degree-q function defined by $f^{\le q} = \sum_{|S|\le q} \hat f(S)\chi_S$. A function f is a q-junta if it depends only on a subset of q of its variables, or equivalently, if there exists a subset T ⊆ [n] of size |T| ≤ q such that $\hat f(S) = 0$ for every $S \not\subseteq T$. The ith discrete derivative is $(D_i f)(x) = (f(x) - f(x^i))/2$, where $x^i$ is the point that differs from x on the ith coordinate. It is easy to show that the ith discrete derivative of a function f is given by $D_i f = \sum_{S\ni i} \hat f(S)\chi_S$. Hence, it follows that $\|D_i f\|_{\mathrm{sp}} = \sum_{S\ni i} |\hat f(S)|$.
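The identities above are easy to verify numerically by brute force. The sketch below checks Fourier inversion, Parseval's identity and the discrete-derivative formula for an arbitrarily chosen illustrative function on {−1, 1}^3 (the function and the dimension are assumptions made for the example only):

```python
from itertools import product, combinations

n = 3
points = list(product([-1, 1], repeat=n))
subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]

def chi(S, x):
    # character chi_S(x) = product of x_i over i in S
    out = 1
    for i in S:
        out *= x[i]
    return out

# an illustrative real-valued function on {-1,1}^3
f = {x: x[0] * x[1] + 0.5 * x[2] + 0.25 for x in points}

# Fourier coefficients: fhat(S) = <f, chi_S> = E_x[f(x) chi_S(x)]
fhat = {S: sum(f[x] * chi(S, x) for x in points) / len(points)
        for S in subsets}

# Fourier inversion: f(x) = sum_S fhat(S) chi_S(x)
for x in points:
    assert abs(f[x] - sum(fhat[S] * chi(S, x) for S in subsets)) < 1e-9

# Parseval: E_x[f(x)^2] = sum_S fhat(S)^2
lhs = sum(f[x] ** 2 for x in points) / len(points)
rhs = sum(c ** 2 for c in fhat.values())
assert abs(lhs - rhs) < 1e-9

def flip(x, i):
    return x[:i] + (-x[i],) + x[i + 1:]

# discrete derivative: (f(x) - f(x^i))/2 = sum over S containing i of fhat(S) chi_S(x)
for i in range(n):
    for x in points:
        d = (f[x] - f[flip(x, i)]) / 2
        assert abs(d - sum(fhat[S] * chi(S, x) for S in subsets if i in S)) < 1e-9
```

The same brute-force routine also makes the spectral norm and σ-smoothness of small examples directly computable.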

From outlaws to LDCs
In this section we prove Theorem 1.4. For convenience, in the remainder of this paper, we switch the message and codeword alphabets of all codes from {0, 1} to {−1, 1}. We begin by showing that outlaw distributions over degree-q functions give q-query average-case smooth codes. Combined with Lemma 1.10, this implies the second part of Theorem 1.4.
Theorem 3.1. Let µ be a probability distribution on 1-smooth degree-q functions on {−1, 1}^n, let ε ∈ (0, 1] and let k = κ_µ(ε). Then, there exists a (q, 1, ε/4)-average-case smooth code sending {−1, 1}^k to {−1, 1}^n.

Proof: Let f_1, . . ., f_k be independent µ-distributed random functions, let f′_1, . . ., f′_k be independent copies of them and let $\bar f$ be the mean of µ. Then, by definition of κ_µ(ε) and Jensen's inequality, $\mathbb{E}\|\frac1k\sum_{i=1}^k (f_i - f'_i)\|_{L_\infty} \ge \mathbb{E}\|\frac1k\sum_{i=1}^k f_i - \bar f\|_{L_\infty} > \varepsilon$. The random variables f_i − f′_i are symmetrically distributed, which is to say that they have the same distribution as their negations f′_i − f_i. Since they are independent, it follows that for every x ∈ {−1, 1}^k, the sum $\sum_i x_i(f_i - f'_i)$ has the same distribution as $\sum_i (f_i - f'_i)$, and by the triangle inequality, $\mathbb{E}_x\,\mathbb{E}\|\frac1k\sum_{i=1}^k x_i f_i\|_{L_\infty} > \varepsilon/2$. Applying the Averaging Principle to the outer expectation, we find that there exist 1-smooth degree-q functions f⋆_1, . . ., f⋆_k : {−1, 1}^n → R such that
$$\mathbb{E}_x\Big\|\frac1k\sum_{i=1}^{k} x_i f^\star_i\Big\|_{L_\infty} > \varepsilon/2. \qquad (4)$$
Define the code C : {−1, 1}^k → {−1, 1}^n by letting C(x) be a point z at which the norm in (4) is attained. For each i ∈ [k], define the decoder A_i as follows. Let $\nu_i(S) = |\hat f^\star_i(S)|/\|f^\star_i\|_{\mathrm{sp}}$. With probability $1 - \|f^\star_i\|_{\mathrm{sp}}$, the decoder A_i returns a uniformly random sign, and with probability $\|f^\star_i\|_{\mathrm{sp}}$, it samples a set S ⊆ [n] according to ν_i and returns $\mathrm{sign}(\hat f^\star_i(S))\,\chi_S(z)$. This is a valid probability distribution since for any 1-smooth function f, we have $\sum_{S\neq\emptyset} |\hat f(S)| \le \sum_{j=1}^{n} \|D_j f\|_{\mathrm{sp}} \le 1$. Then, A_i queries at most q coordinates of z and, since f⋆_i is 1-smooth, the probability that it queries any coordinate j ∈ [n] is at most $\|D_j f^\star_i\|_{\mathrm{sp}} \le 1/n$. We also have $\mathbb{E}[A_i(z)] = f^\star_i(z)$. Therefore, by (4), on average over uniformly distributed x and i, the decoder A_i(C(x)) outputs x_i with probability at least 1/2 + ε/4. Hence, C is a (q, 1, ε/4)-average-case smooth code. ✷

The final step before the proof of Theorem 1.4 is to show that for any distribution µ over smooth functions, there exists a distribution μ̃ over smooth functions of bounded degree that is not much more concentrated than µ.

Lemma 3.2. Let µ be a probability distribution over 1-smooth functions on {−1, 1}^n and let ε > 0. Then, there exists a probability distribution μ̃ over 1-smooth functions of degree q = 4/ε such that κ_{μ̃}(ε/2) ≥ κ_µ(ε).
For the last part of the theorem we can simply apply Theorem 3.1 directly. ✷

From average-case smooth codes to LDCs

In this section, we prove Lemma 1.10. For this, we need the notion of the Vapnik-Chervonenkis dimension (VC-dimension).
Definition 4.1 (VC-dimension). For T ⊂ [−1, 1]^k and w > 0, vc(T, w) is defined as the size of the largest subset σ ⊂ [k] such that there exists a shift s ∈ [−1, 1]^k satisfying the following: for every x ∈ {−1, 1}^σ, there exists t ∈ T such that for every i ∈ σ, $(t_i - s_i)x_i \ge w/2$.

Observe that if T is convex, then vc(T, w) is the maximum dimension of a shifted hypercube with edge lengths at least w contained in T.

Definition 4.2 (Gaussian width). Let g be a k-dimensional standard Gaussian vector, with independent standard normally distributed entries. The Gaussian width of a set T ⊆ R^k is defined as $w(T) = \mathbb{E}[\sup_{t\in T}\langle g, t\rangle]$.

It is easy to see that a large VC-dimension implies a large Gaussian width. The following theorem of [MV03] shows the converse: containing a hypercube-like structure is the only way to have large Gaussian width.

Theorem 4.3 ([MV03]). There exists an absolute constant α ∈ (0, 1) such that the following holds. Let T ⊆ [−1, 1]^k and let h = w(T)/k. Then, $vc(T, \alpha h) \ge \Omega(h^2 k/\log(1/h))$.
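The relation between the two notions is easy to see on the solid cube T = [−1, 1]^k itself, where vc(T, 2) = k and $\sup_{t\in T}\langle g, t\rangle = \sum_i |g_i|$, so that w(T) = k·E|g_1| = k·√(2/π). A Monte Carlo sketch (dimension, trial count and seed are arbitrary choices for the illustration):

```python
import math
import random

random.seed(0)
k = 50
trials = 20000

# For the solid cube T = [-1,1]^k, the supremum of <g, t> over T is
# attained at t = sign(g), giving sup = sum_i |g_i|.
est = 0.0
for _ in range(trials):
    g = [random.gauss(0.0, 1.0) for _ in range(k)]
    est += sum(abs(gi) for gi in g)
est /= trials

# Gaussian width of the cube: k * E|g_1| = k * sqrt(2/pi)
exact = k * math.sqrt(2.0 / math.pi)
assert abs(est - exact) / exact < 0.02
```

This matches the qualitative content of Theorem 4.3: Gaussian width of order k forces a hypercube-like structure of dimension of order k.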
Finally, we use the fact that, as for LDCs, we can assume that on input y ∈ {−1, 1}^n, the decoder A_i of a smooth code first samples a set S ⊆ [n] of at most q coordinates according to a probability distribution that depends on i only and then returns a random sign depending only on i, S and the values of y at S.

Proof of Lemma 1.10: The proof works by showing that the average-case smooth code property implies that the image of the (average) decoding functions has large Gaussian width. We then use Theorem 4.3 to find a hypercube-like structure inside the image, which we use to construct a smooth code. Finally, we use Theorem 1.8 to convert the smooth code to an LDC.

Recall the switch of the message and codeword alphabets to {−1, 1}.
Let g be a standard k-dimensional Gaussian vector and T = {(f_1(z), . . ., f_k(z)) : z ∈ {−1, 1}^n}, where f_i denotes the average decoding function of the ith decoder. By the definition of an average-case smooth code, we have $\mathbb{E}[\sup_{t\in T}\langle g, t\rangle] \gtrsim \eta k$ (see for instance [Tal14, Lemma 3.2.10] for the last inequality). By Theorem 4.3, for some constant α > 0, we have $vc(T, \alpha\eta) \gtrsim \eta^2 k/\log(1/\eta)$, where we used the fact that vc(T, w) is decreasing in w. So for τ = αη, we have $vc(T, \tau) \gtrsim \eta^2 k/\log(1/\eta)$. By the definition of VC-dimension, there exists a subset σ ⊂ [k] of size |σ| ≥ vc(T, τ) and a shift s ∈ [−1, 1]^k such that for every x ∈ {−1, 1}^σ there exists t ∈ T such that (t_i − s_i)x_i ≥ τ/2 for every i ∈ σ. Now we will define the code C′ : {−1, 1}^σ → {−1, 1}^n by letting C′(x) be a point z ∈ {−1, 1}^n for which the corresponding point t = (f_1(z), . . ., f_k(z)) ∈ T satisfies these inequalities. Let W_p denote a {−1, 1}-valued random variable with mean p. The decoding algorithms A′_i(y) run A_i(y) internally and give their output as follows: with probability 1/2, A′_i outputs A_i(y), and with probability 1/2 it outputs an independent sample of W_{−s_i}, so that $\mathbb{E}[A'_i(y)] = (f_i(y) - s_i)/2$. Therefore, for every x ∈ {−1, 1}^σ and for every i ∈ σ, $\Pr[A'_i(C'(x)) = x_i] \ge 1/2 + \tau/8$. Since the probability that A′_i(C′(x)) queries any particular location of C′(x) is still at most c/n, it follows that C′ is a (q, c, Ω(η))-smooth code. By Theorem 1.8, C′ is also a (q, Ω(η/c), Ω(η))-LDC. ✷
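The gadget W_p used in the proof is just a {−1, 1}-valued Bernoulli variable with a prescribed mean; a minimal sketch (the means, trial count and seed are arbitrary illustrative choices):

```python
import random

def W(p_mean, rng):
    """A {-1,1}-valued random variable with mean p_mean in [-1,1]:
    output +1 with probability (1 + p_mean)/2, else -1."""
    return 1 if rng.random() < (1 + p_mean) / 2 else -1

rng = random.Random(0)
trials = 200000
for target in (-0.5, 0.0, 0.8):
    avg = sum(W(target, rng) for _ in range(trials)) / trials
    # empirical mean concentrates near the prescribed mean
    assert abs(avg - target) < 0.01
```

Mixing the output of A_i with such a variable shifts the expectation of the decoder's output, which is what cancels the shift s_i in the hypercube structure.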

From LDCs to outlaws
In this section we prove Theorem 1.5, the converse of our main result.
Proof of Theorem 1.5: By Theorem 1.8, C : {−1, 1}^k → {−1, 1}^n is also a (q, q/δ, η)-smooth code. For each i ∈ [k], let B_i be its decoder for the ith index. Let ν_i : 2^{[n]} → [0, 1] be the probability distribution used by B_i to sample a set S ⊆ [n] of at most q coordinates and let f_{i,S} : {−1, 1}^n → [−1, 1] be the function whose value at y ∈ {−1, 1}^n is the expectation of the random sign returned by B_i(y) conditioned on the event that it samples S. Since this value depends only on the coordinates in S, the function f_{i,S} is a q-junta.
For each i ∈ [k], let $f_i = \mathbb{E}_{S\sim\nu_i}[f_{i,S}]$. Then, since a q-junta has degree at most q, so does f_i. We claim that f_i is $(q2^{q/2}/\delta)$-smooth. Since the functions f_{i,S} : {−1, 1}^n → [−1, 1] are q-juntas, it follows from Parseval's identity that they have spectral norm at most $2^{q/2}$. Moreover, for each j ∈ [n], we have $\Pr_{S\sim\nu_i}[j \in S] \le q/(\delta n)$. Hence, since f_{i,S} depends only on the coordinates in S, we have $\|D_j f_i\|_{\mathrm{sp}} \le \Pr_{S\sim\nu_i}[j\in S]\cdot\max_S \|f_{i,S}\|_{\mathrm{sp}} \le q2^{q/2}/(\delta n)$, which gives the claim. By (3), it holds for every x ∈ {−1, 1}^k and every i ∈ [k] that
$$f_i(C(x))\,x_i \ge 2\eta. \qquad (7)$$
Define the distribution µ to correspond to the process of sampling i ∈ [k] uniformly at random and returning f_i, and let $\bar f = \frac1k\sum_{i=1}^k f_i$ be the mean of µ. We show that κ_µ(η) ≥ ηk. To this end, let l = ηk, let σ : [l] → [k] be an arbitrary map and define the functions g_1, . . ., g_l by g_i = f_{σ(i)}. Let x ∈ {−1, 1}^k be such that for each i ∈ [l], we have x_{σ(i)} = 1 and x_j = −1 elsewhere. It follows from (7) that $f_{\sigma(i)}(C(x)) \in [2\eta, 1]$ for every i ∈ [l] and that $f_i(C(x)) \le 0$ for every other i ∈ [k]. If σ maps each element of [l] to a uniformly random element of [k], then g_1, . . ., g_l are independent, µ-distributed and satisfy
$$\mathbb{E}\Big\|\frac1l\sum_{i=1}^{l} g_i - \bar f\Big\|_{L_\infty} \ge \mathbb{E}\Big[\frac1l\sum_{i=1}^{l} g_i(C(x)) - \bar f(C(x))\Big] \ge 2\eta - \eta = \eta,$$
which shows that κ_µ(η) ≥ l. Finally, we can scale all the functions in µ to make them 1-smooth, and get a distribution μ̃ over 1-smooth functions with κ_{μ̃}(ηδ/(q2^{q/2})) ≥ ηk. ✷
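The bound used above, that a {−1, 1}-valued q-junta has spectral norm at most $2^{q/2}$, follows from Cauchy-Schwarz and Parseval: $\sum_S |\hat f(S)| \le \sqrt{2^q \sum_S \hat f(S)^2} = 2^{q/2}$. A brute-force check for q = 3 over a batch of random Boolean functions (the batch size and seed are arbitrary):

```python
import itertools
import math
import random

q = 3
points = list(itertools.product([-1, 1], repeat=q))
subsets = [S for r in range(q + 1) for S in itertools.combinations(range(q), r)]
rng = random.Random(2)

for _ in range(50):
    # random {-1,1}-valued function on q variables (a q-junta on its own cube)
    f = {x: rng.choice([-1, 1]) for x in points}
    spectral = 0.0
    for S in subsets:
        coeff = sum(f[x] * math.prod(x[i] for i in S) for x in points) / len(points)
        spectral += abs(coeff)
    # Cauchy-Schwarz + Parseval: sum_S |fhat(S)| <= 2^{q/2} since E[f^2] = 1
    assert spectral <= 2 ** (q / 2) + 1e-9
```

The bound is tight, e.g., for "bent"-like functions whose Fourier mass is spread evenly over all $2^q$ coefficients.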

Candidate outlaws
In this section we elaborate on the candidate outlaws mentioned in the introduction.

Incidence geometry
We begin by describing a variant of Corollary 1.6 based on a slightly different assumption and show conditions under which this assumption holds. Let p be an odd prime, let F_p be the finite field with p elements and let n be a positive integer. For x, y ∈ F_p^n, the line with origin x in direction y, denoted ℓ_{x,y}, is the sequence (x + λy)_{λ∈F_p}. A line is nontrivial if y ≠ 0.

Corollary 6.1. For every odd prime p and ε ∈ (0, 1], there exist a positive integer n_1(p, ε) and a c = c(p, ε) ∈ (0, 1/2] such that the following holds. Let n ≥ n_1(p, ε) and k be positive integers. Assume that for every set A ⊆ F_p^n of size |A| ≤ k, there exists a set B ⊆ F_p^n of size at least εp^n such that every nontrivial line through A contains at most p − 2 points of B. Then, there exists a (p − 1, c, c)-LDC sending {0, 1}^l to {0, 1}^{p^n}, where l = Ω(c^2 k/log(1/c)).
The proof uses the following version of Szemerédi's theorem [Tao12, Theorem 1.5.4] and its standard "Varnavides-type" corollary (see for example [TV06, Exercise 10.1.9]).

Theorem 6.2 (Szemerédi's theorem). For every odd prime p and any ε ∈ (0, 1], there exists a positive integer n_0(p, ε) such that the following holds. Let n ≥ n_0(p, ε) and let S ⊆ F_p^n be a set of size |S| ≥ εp^n. Then, S contains a nontrivial line.

Corollary 6.3. For every odd prime p and any ε ∈ (0, 1], there exist a positive integer n_1(p, ε) and a c(p, ε) ∈ (0, 1] such that the following holds. Let n ≥ n_1(p, ε) and let S ⊆ F_p^n be a set of size |S| ≥ εp^n. Then, S contains at least c(p, ε)p^{2n} nontrivial lines.

Proof of Corollary 6.1: With some abuse of notation, we identify functions f : F_p^n → {−1, 1} with points in {−1, 1}^{p^n}, and we let φ(f) = (1 + f)/2 denote the corresponding {0, 1}-valued function. For x ∈ F_p^n, define
$$F_x(f) = \mathbb{E}_{y\in F_p^n\setminus\{0\}}\Big[\prod_{\lambda\in F_p^*} \varphi(f)(x+\lambda y)\Big]. \qquad (8)$$
Then, for a set B ⊆ F_p^n, the value F_x(φ^{−1}(1_B)) equals the fraction of all nontrivial lines ℓ_{x,y} through x of which B contains the p − 1 points {x + λy : λ ∈ F_p^*}. If B has size at least εp^n, it thus follows from Corollary 6.3 that $\mathbb{E}_{x\in F_p^n}[F_x(\varphi^{-1}(1_B))] \ge c(p,\varepsilon)$. Moreover, since the monomials in the expectation of (8) involve disjoint sets of variables, F_x can be expanded as a polynomial of degree p − 1 in the variables f(z), z ∈ F_p^n. Let µ be the uniform probability distribution over all F_x. We claim that κ_µ(c(p, ε)) ≥ k, which implies the result by Theorem 1.4 since µ is supported on degree-(p − 1) functions. For every set A ⊆ F_p^n of size |A| ≤ k, let B ⊆ F_p^n be an arbitrary set as in the assumption of the corollary and let f_A = φ^{−1}(1_B). Let z be a uniformly distributed random variable over F_p^n, let z_1, . . ., z_k be independent copies of z and let A = {z_1, . . ., z_k}. Then, F_{z_1}, . . ., F_{z_k} are independent µ-distributed random functions and, since every nontrivial line through A meets B in at most p − 2 points, we have $F_{z_i}(f_A) = 0$ for every i ∈ [k], while the mean of µ takes value at least c(p, ε) at f_A; this gives the claim. ✷

The proof of the formal version of Corollary 1.6 (given below) is similar to that of Corollary 6.1, so we omit it. In the following, $PF_p^{n-1}$ is the projective space of dimension n − 1, which is the space of directions in F_p^n. The formal version of Corollary 1.6 is then as follows.
Corollary 6.4. For every odd prime p and ε ∈ (0, 1], there exist a positive integer n_1(p, ε) and a c = c(p, ε) ∈ (0, 1/2] such that the following holds. Let n ≥ n_1(p, ε) and k be positive integers. Suppose that for every set of directions D ⊂ PF_p^{n−1} of size |D| ≤ k, there exists a set B ⊂ F_p^n of size |B| ≥ εp^n which does not contain any lines with direction in D. Then, there exists a (p, c, c)-LDC sending {0, 1}^l to {0, 1}^{p^n}, where l = Ω(c^2 k/log(1/c)).
Feasible parameters for Corollary 6.1. Proving lower bounds on k for which the assumption of Corollary 6.1 holds thus allows one to infer the existence of (p − 1)-query LDCs with rate Ω(k/N) for N = p^n, provided p and ε are constant with respect to n. We establish the following bounds, which imply the (well-known) existence of (p − 1)-query LDCs with message length k = Ω((log N)^{p−2}).

Theorem 6.5. For every odd prime p there exists an ε(p) ∈ (0, 1] such that the following holds. For every set A ⊆ F_p^n of size $|A| \le \binom{n+p-3}{p-2} - 1$, there exists a set B ⊆ F_p^n of size ε(p)p^n such that every line through A contains at most p − 2 points of B.
The proof uses some basic properties of polynomials over finite fields. For an n-variate polynomial f ∈ F_p[x_1, . . ., x_n], denote by Z(f) = {x ∈ F_p^n : f(x) = 0} its zero set. The starting point of the proof is a standard homogeneous interpolation result (Lemma 6.6 below; see for example [Tao14]), showing that small sets can be 'captured' by zero sets of nonzero, homogeneous polynomials of low degree. The next two lemmas show that if f is nonzero, homogeneous and of degree d, and if a ∈ F_p^* is such that f^{−1}(a) is nonempty, then lines through Z(f) meet f^{−1}(a) in at most d points.
Lemma 6.7. Let f ∈ F_p[x_1, . . ., x_n] be a nonzero homogeneous polynomial of degree d. Let a ∈ F_p^* be such that the set f^{−1}(a) is nonempty. Then, every line that meets f^{−1}(a) in d + 1 points must have direction in Z(f).
Proof: The univariate polynomial g(λ) = f(x + λy) formed by the restriction of f to a line ℓ_{x,y} has degree at most d. By the Factor Theorem, such a polynomial must be the constant polynomial g(λ) = a if it assumes the value a for d + 1 values of λ. Since f is homogeneous, the coefficient of λ^d, which must then be zero, equals f(y), giving the result. ✷

The following lemma is essentially contained in [BR].
Lemma 6.8 (Briët-Rao). Let f ∈ F_p[x_1, . . ., x_n] be a nonzero homogeneous polynomial of degree d. Let a ∈ F_p^* be such that f^{−1}(a) is nonempty. Then, there exists no line that intersects Z(f), meets f^{−1}(a) in at least d points and has direction in Z(f).
Proof: Towards a contradiction, suppose there exists a line ℓ_{x,y} through Z(f) that meets f^{−1}(a) in d points and has direction y ∈ Z(f). Observe that for every λ ∈ F_p, the shifted line ℓ_{x+λy,y} also meets f^{−1}(a) in d points. Hence, without loss of generality we may assume that the line starts in Z(f), that is, f(x) = 0. Consider the restriction g(λ) = f(x + λy). Since f is homogeneous of degree d and y ∈ Z(f), the coefficient of λ^d in g equals f(y) = 0, so g has degree at most d − 1. Then g(λ) − a is a polynomial of degree at most d − 1 with d distinct roots, and hence must be the zero polynomial. But it cannot be the zero polynomial, since it takes the value −a ≠ 0 when λ = 0. ✷

The final ingredient for the proof of Theorem 6.5 is the DeMillo-Lipton-Schwartz-Zippel Lemma as it appears in [CT14].

Lemma 6.9 (DeMillo-Lipton-Schwartz-Zippel). Let f ∈ F_p[x_1, . . ., x_n] be a nonzero polynomial of degree d and denote r = |F_p|. Then, $\Pr_{x\in F_p^n}[f(x) = 0] \le 1 - r^{-d/(r-1)}$.

Proof of Theorem 6.5: Let f ∈ F_p[x_1, . . ., x_n] be a nonzero degree-(p − 2) homogeneous polynomial such that A ⊆ Z(f), as promised to exist by Lemma 6.6. By Lemma 6.9, there exists an a ∈ F_p^* such that the set B = f^{−1}(a) has size at least |B| ≥ p^n/p^{(2p−3)/(p−1)}. By Lemma 6.7, every line that meets B in p − 1 points must have direction in Z(f), but by Lemma 6.8 no such line can pass through Z(f). Hence, every line through A meets B in at most p − 2 points. ✷
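The mechanism of the proof can be checked exhaustively in a toy case. The sketch below takes p = 5 and an arbitrarily chosen homogeneous polynomial f(x, y) = x^2 y of degree p − 2 = 3 over F_5^2 (assumptions made for the example only), and verifies the conclusion of Lemmas 6.7 and 6.8 combined: every nontrivial line through Z(f) meets B = f^{−1}(1) in at most p − 2 points.

```python
from itertools import product

p = 5                                    # odd prime
points = list(product(range(p), repeat=2))

def f(v):
    x, y = v
    return (x * x * y) % p               # homogeneous of degree p - 2 = 3

Z = [v for v in points if f(v) == 0]     # zero set Z(f); plays the role of A
B = {v for v in points if f(v) == 1}     # B = f^{-1}(a) for a = 1
assert B                                 # f^{-1}(1) is nonempty

def line(x, y):
    # the line l_{x,y} = (x + lam * y)_{lam in F_p}
    return [((x[0] + lam * y[0]) % p, (x[1] + lam * y[1]) % p)
            for lam in range(p)]

# every nontrivial line through Z(f) meets B in at most deg(f) = p - 2 points
for x in Z:
    for y in points:
        if y != (0, 0):
            assert sum(v in B for v in line(x, y)) <= p - 2
```

Since A ⊆ Z(f) in the proof of Theorem 6.5, the same check applies verbatim to the captured set A.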

Uniformity of random Cayley hypergraphs
A second candidate for constructing outlaws comes from quasirandom properties of Cayley graphs and hypergraphs.
Random Cayley graphs and 2-query LDCs. For graphs, an important quasirandom property is spectral expansion. If for a regular graph G we let $1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge -1$ denote the eigenvalues of the normalized adjacency matrix, then the second eigenvalue is defined as $\lambda(G) = \max_{i\in\{2,\dots,n\}} |\lambda_i(G)|$. The importance of this parameter stems from the fact that if it is small, then every large subset of vertices is connected to its complement by a large number of edges [Tan84, AM85], a property that sparse random graphs have with high probability (we refer to [HLW06] for a survey on expander graphs). The Alon-Roichman theorem (Theorem 6.10) shows that for every finite group Γ of cardinality n, a Cayley graph G = Cay(Γ, {g_1, . . ., g_k}) generated by k = O(log n) independent uniformly random group elements satisfies λ(G) ≤ ε with high probability.
The link with Theorem 1.4 follows from the fact that the above result can equivalently be phrased as saying that the normalized adjacency matrix A_G of the random graph G as in Theorem 6.10 concentrates around its expectation. The normalized adjacency matrix of Cay(Γ, {g_i}), denoted A_i, has expectation J/n, where J is the all-ones matrix, and A_G is the average (A_1 + ⋯ + A_k)/k. The Schatten-∞ norm (also known as the spectral norm or operator norm) of a matrix B is given by $\|B\|_{S_\infty} = \sup\{x^T B y : \|x\|_{\ell_2} \le 1, \|y\|_{\ell_2} \le 1\}$. Due to the characterization $\lambda(G) = \|A_G - J/n\|_{S_\infty}$, Theorem 6.10 says that the value $\|\frac1k\sum_{i=1}^k (A_i - J/n)\|_{S_\infty}$ is small with good probability provided k is large enough. Good concentration is thus good for expansion. Our Theorem 1.4 implies that poor concentration is good for LDCs!

Corollary 6.11. Let Γ be a finite group of cardinality n and let ε ∈ (0, 1). Let k be the largest positive integer such that with probability at least 1/2, for independent uniformly distributed elements g_1, . . ., g_k ∈ Γ, the graph G = Cay(Γ, {g_1, . . ., g_k}) satisfies λ(G) > ε. Then, there exists a (2, 1, Ω(ε))-LDC sending {0, 1}^l to {0, 1}^n, where l = Ω(ε^2 k/log(1/ε)).
It is not hard to show that if Γ is Abelian, then any Cayley graph G = Cay(Γ, S) generated by |S| ≤ (log |Γ|)/3 group elements satisfies λ(G) ≥ 1/2 [HLW06, Proposition 11.5]. Together with Corollary 6.11, this fact implies the existence of 2-query LDCs of exponential length; arguably the most round-about way to prove this! Below, we prove a more general version of Corollary 6.11, giving the existence of q-query LDCs from lower bounds on the required degree for uniformity of random q-uniform Cayley hypergraphs. For this, we gather the following definitions.
Proof: Associate with every q-linear form A : R^Γ × ⋯ × R^Γ → R a homogeneous q|Γ|-variate degree-q function f : {−1, 1}^{q|Γ|} → R in the obvious way. It follows from (10) and the normalization of the adjacency forms that for every fixed g ∈ Γ, the function f_g associated with the adjacency form of the hypergraph Cay^{(q)}(Γ, r, {g}) is 1-smooth.
The form A_G is the average of k independent identically distributed adjacency forms A_1, . . ., A_k that have expectation A_H. Letting f_1, . . ., f_k and f be the functions associated with A_1, . . ., A_k and A_H, respectively, it follows that the outlaw parameters of the corresponding distribution are lower bounded in terms of the uniformity of the random Cayley hypergraph, where the last step uses the fact that uniformity is nonnegative. The result now follows from Theorem 1.4. ✷

Spectral expansion and uniformity. For regular graphs, the famous Expander Mixing Lemma [HLW06] shows that ∆(G) ≤ λ(G) holds in general. Whereas the reverse inequality does not hold in general [CZ16], the reason why Corollary 6.11 could be stated in terms of the second eigenvalue is that for Cayley graphs a near-reverse inequality does hold: for some absolute constant c ∈ (0, ∞), we have ∆(G) ≥ cλ(G) [KRS16, CZ16]. A notion of second eigenvalue for hypergraphs, analogous to uniformity, was defined and studied in [BR], where first steps towards generalizing the Alon-Roichman theorem were taken. While the Expander Mixing Lemma easily generalizes, it is unknown whether the analogue of [KRS16, CZ16] holds for Cayley hypergraphs.
Lemma 6.6 (Homogeneous Interpolation). For every A ⊆ F_p^n of size $|A| \le \binom{n+d-1}{d} - 1$, there exists a nonzero homogeneous polynomial f ∈ F_p[x_1, . . ., x_n] of degree d such that A ⊆ Z(f).