From Weak to Strong Linear Programming Gaps for All Constraint Satisfaction Problems

We study the approximability of constraint satisfaction problems (CSPs) by linear programming (LP) relaxations. We show that for every CSP, the approximation obtained by a basic LP relaxation is at least as strong as the approximation obtained using relaxations given by c · log n / log log n levels of the Sherali–Adams hierarchy (for some constant c > 0) on instances of size n. It was proved by Chan et al. [FOCS 2013] (and recently strengthened by Kothari et al. [STOC 2017]) that for CSPs, any polynomial-size LP extended formulation is at most as strong as the relaxation obtained by a constant number of levels of the Sherali–Adams hierarchy (where the number of levels depends on the exponent of the polynomial in the size bound). Combining this with our result also implies that any polynomial-size LP extended formulation is at most as strong as the basic LP, which can be thought of as the base level of the Sherali–Adams hierarchy.

A conference version of this paper appeared in the Proceedings of the 32nd Computational Complexity Conference (CCC'17) [14]. Supported by NSF award number CCF-1254044.

THEORY OF COMPUTING, Volume 14 (10), 2018, pp. 1–33

ACM Classification: F.2.2, G.1.6

AMS Classification: 68Q17, 90C05


Introduction
Given a finite alphabet [q] = {0, . . ., q − 1} and a predicate f : [q]^k → {0, 1}, an instance of the problem MAX k-CSP(f) consists of m constraints over a set of n variables, x_1, . . ., x_n, taking values in the set [q]. Each constraint C_i is of the form f(x_{i_1} + b_{i_1}, . . ., x_{i_k} + b_{i_k}) for some k-tuple (x_{i_1}, . . ., x_{i_k}) of variables, and constants b_{i_1}, . . ., b_{i_k} ∈ [q], where the addition is taken to be modulo q. We say that an assignment σ to the variables satisfies the constraint C_i if C_i(σ(x_{i_1}), . . ., σ(x_{i_k})) = 1. Given an instance Φ of the problem, the goal is to find an assignment σ to the variables satisfying as many constraints as possible. The approximability of the MAX k-CSP(f) problem has been extensively studied for various predicates f (see, e.g., [30] for a survey), and special cases include several interesting and natural problems such as MAX 3-SAT, MAX 3-XOR and MAX-CUT.
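To make the setup concrete, the objective sat(σ) can be sketched as below. The encoding of a constraint as a pair (variable tuple, shift tuple) and the toy XOR predicate are illustrative assumptions, not notation from the paper:

```python
def sat_fraction(constraints, sigma, f, q):
    """Fraction of constraints satisfied by the assignment sigma.

    Each constraint is a pair (vs, bs): a k-tuple of variable indices
    and a k-tuple of shifts in [q]; it is satisfied exactly when
    f(sigma[v_1] + b_1 mod q, ..., sigma[v_k] + b_k mod q) == 1.
    """
    satisfied = sum(
        f(tuple((sigma[v] + b) % q for v, b in zip(vs, bs)))
        for (vs, bs) in constraints
    )
    return satisfied / len(constraints)

# Toy example: MAX 3-XOR over [2] (f checks the parity of its inputs).
f = lambda xs: int(sum(xs) % 2 == 1)
constraints = [((0, 1, 2), (0, 0, 0)), ((0, 1, 2), (1, 0, 0))]
sigma = {0: 1, 1: 0, 2: 0}
print(sat_fraction(constraints, sigma, f, 2))  # one of two constraints holds -> 0.5
```

An optimal assignment is one maximizing this fraction, which is the quantity OPT(Φ) above.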
A topic of much recent interest has been the efficacy of Linear Programming (LP) and Semidefinite Programming (SDP) relaxations. For a given instance Φ of MAX k-CSP(f), let OPT(Φ) denote the fraction of constraints satisfied by an optimal assignment, and let FRAC(Φ) denote the value of the convex (LP/SDP) relaxation for the problem. Then, the performance guarantee of this algorithm is given by the integrality gap, which equals the supremum of FRAC(Φ)/OPT(Φ) over all instances Φ.
The study of unconditional lower bounds for general families of LP relaxations was initiated by Arora, Bollobás and Lovász [2] (see also [3]). They studied the Lovász-Schrijver [25] LP hierarchy and proved lower bounds on the integrality gap for Minimum Vertex Cover (their technique also yields similar bounds for MAX-CUT). De la Vega and Kenyon-Mathieu [12] and Charikar, Makarychev and Makarychev [9] proved a lower bound of 2 − o(1) for the integrality gap of the LP relaxations for MAX-CUT given respectively by O(log log n) and n^{Ω(1)} levels of the Sherali-Adams LP hierarchy [29]. Several follow-up papers have also shown lower bounds for various other special cases of the MAX k-CSP problem, both for LP and SDP hierarchies [1, 28, 33, 27, 6, 5, 22].
An LP extended formulation of a polytope P ⊆ R^d is a linear program of the form Ex + Fy = g and E′x + F′y ≤ g′, where x ∈ R^d, y ∈ R^t, and x ∈ P if and only if there exists y such that (x, y) is a solution to the above LP. We take the size of an LP extended formulation to be the sum of the number of variables and the number of constraints (equalities plus inequalities). We refer the interested reader to a discussion in [13] for different (equivalent) notions of size. A recent result by Chan et al. [7] shows a connection between strong lower bounds for the Sherali-Adams hierarchy, and lower bounds on the size of LP extended formulations for the corresponding problem. In fact, their result proved a connection not only for a lower bound on the worst case integrality gap, but for the entire approximability curve. We say that Φ is a (c, s)-integrality gap instance for a relaxation of MAX k-CSP(f), if we have FRAC(Φ) ≥ c and OPT(Φ) < s.
And we say that Φ is (c, s)-approximable by a relaxation of MAX k-CSP(f), if for instances with OPT(Φ) < s, we have FRAC(Φ) ≤ c. They showed that for any fixed t ∈ N, if there exist (c, s)-integrality gap instances of size n for the relaxation given by t levels of the Sherali-Adams hierarchy, then for all ε > 0 and sufficiently large N, there exists a (c − ε, s + ε)-integrality gap instance of size (number of variables) N, for any linear extended formulation of size at most N^{t/2}. They also give a trade-off when t is a function of n. This was recently improved by Kothari et al. [21] and we describe the improved trade-off later.
We strengthen the above results by showing that for all c, s ∈ [0, 1], (c, s)-integrality gap instances for a "basic LP" can be used to construct (c − ε, s + ε)-integrality gap instances for Ω_ε(log n / log log n) levels of the Sherali-Adams hierarchy. The basic LP uses only a subset of the constraints in the relaxation given by k levels of the Sherali-Adams hierarchy for MAX k-CSP(f). In particular, this shows that a lower bound on the integrality gap for even the basic LP implies a similar lower bound on the integrality gap of any polynomial-size extended formulation. This can also be viewed as a dichotomy result showing that for any predicate f, either MAX k-CSP(f) is (c, s)-approximable by the basic LP relaxation (where the number of LP variables and constraints are linear in the size of the instance) or for all ε > 0, a (c − ε, s + ε)-approximation cannot be achieved by any polynomial-size LP extended formulation. We note that both the above results and our result apply to all f, q and all c, s ∈ [0, 1].

Comparison with (implications of) Raghavendra's UG-hardness result
A remarkable result by Raghavendra [26] shows that a (c, s)-integrality gap instance for a "basic SDP" relaxation of MAX k-CSP(f) implies Unique-Games-hardness (UG-hardness) [16] of distinguishing instances Φ with OPT(Φ) < s from instances with OPT(Φ) ≥ c. The basic SDP considered by Raghavendra involves moments for all pairs of variables and all subsets of variables included in a constraint. The basic LP we consider is weaker than this SDP and does not contain the positive semidefiniteness constraint.
Combining Raghavendra's result with known constructions of integrality gaps for Unique Games by Raghavendra and Steurer [27], and by Khot and Saket [17], one can obtain a result qualitatively similar to ours, for the mixed hierarchy. In particular, a (c, s)-integrality gap for the basic SDP implies a (c − ε, s + ε)-integrality gap for Ω((log log n)^{1/4}) levels of the mixed hierarchy.
Note however, that the above result is incomparable to our result, since it starts with a stronger hypothesis (a basic SDP gap) and yields a gap for the mixed hierarchy as opposed to the Sherali-Adams hierarchy. While the above can also be used to derive lower bounds for linear extended formulations, one needs to start with an SDP gap instance to derive an LP lower bound. The basic SDP is known to be provably stronger than the basic LP for several problems including various 2-CSPs. Also, for the worst case f for q = 2, the integrality gap of the basic SDP is O(2^k / k) [10], while that of the basic LP is 2^{k−1}. The integrality gap for the basic LP is achieved by the "all zero or all one" predicate.
A recent result by Khot and Saket [18] shows a connection between the integrality gaps for the basic LP and those for the basic SDP. They prove that a (c, s)-integrality gap instance for the basic LP implies UG-hardness of distinguishing instances Φ with OPT(Φ) ≥ Ω(c/(k^3 · log q)) from instances with OPT(Φ) ≤ 4s. Their result also shows that a (c, s)-integrality gap instance for the basic LP can be used to produce a (Ω(c/(k^3 · log q)), 4s)-integrality gap instance for the basic SDP, and hence for Ω((log log n)^{1/4}) levels of the mixed hierarchy.

Other related work
The power of the basic LP for solving valued CSPs to optimality has been studied in several previous papers. These results concern the problem of minimizing the penalty for unsatisfied constraints, where the penalties take values in Q ∪ {∞}. Also, they study the problem not only in terms of a single predicate f, but rather in terms of the constraint language generated by a given set of (valued) predicates.
It was shown by Thapper and Živný [31] that when the penalties are finite-valued, if the problem of finding the optimum solution cannot be solved by the basic LP, then it is NP-hard. Kolmogorov, Thapper and Živný [20] give a characterization of CSPs where the problem of minimizing the penalty for unsatisfied constraints can be solved exactly by the basic LP. Also, a recent result by Thapper and Živný [32] shows that the valued CSP problem for a constraint language can be solved to optimality by a bounded number of levels of the Sherali-Adams hierarchy if and only if it can be solved by a relaxation obtained by augmenting the basic LP with constraints implied by three levels of the Sherali-Adams hierarchy. However, the above papers only consider the case when the LP gives an exact solution, and do not focus on approximation.
The techniques from [9] used in our result were also extended by Lee [24] to prove hardness for the Graph Pricing problem. Kenkre et al. [15] also applied these to show the optimality of a simple LP-based algorithm for Digraph Ordering.

Our results
Our main result is the following theorem, which shows that for every CSP, for instances of size n, the basic LP is at least as strong as any relaxation given by o(log n/ log log n) levels of the Sherali-Adams hierarchy.
Theorem 1.1. Let f : [q]^k → {0, 1} be any predicate. Let Φ_0 be a (c, s)-integrality gap instance for the basic LP relaxation of MAX k-CSP(f). Then for every ε > 0, there exists c_ε > 0 such that for infinitely many N ∈ N, there exist (c − ε, s + ε)-integrality gap instances of size N for the LP relaxation given by c_ε · log N / log log N levels of the Sherali-Adams hierarchy.
Combining the above with the connection between Sherali-Adams gaps and extended formulations by [7, 21] also yields that the basic LP is at least as strong as any LP extended formulation of size n^{o(log n / log log n)}.

Corollary 1.2. Let f : [q]^k → {0, 1} be any predicate. Let Φ_0 be a (c, s)-integrality gap instance for the basic LP relaxation of MAX k-CSP(f). Then for every ε > 0, there exists c_ε > 0 such that for infinitely many N ∈ N, there exist (c − ε, s + ε)-integrality gap instances of size N, for every linear extended formulation of size at most N^{c_ε · log N / log log N}.

As an application of our methods, we also simplify and strengthen the approximation resistance results for LPs proved by Khot et al. [19]. They studied predicates f : {0, 1}^k → {0, 1} and provided a necessary and sufficient condition for the predicate to be strongly approximation resistant for the Sherali-Adams LP hierarchy. One says a predicate is strongly approximation resistant if for all ε > 0, it is hard to distinguish instances Φ for which OPT(Φ) ≥ 1 − ε from instances for which |OPT(Φ) − ρ(f)| ≤ ε, where ρ(f) denotes the expected fraction of constraints satisfied by a uniformly random assignment. In the context of the Sherali-Adams hierarchy, they showed that when this condition is satisfied, there exist instances Φ satisfying FRAC(Φ) ≥ 1 − ε and |OPT(Φ) − ρ(f)| ≤ ε, where FRAC(Φ) is the value of the relaxation given by O_ε(log log n) levels of the Sherali-Adams hierarchy. We strengthen their result (and provide a simpler proof) to prove the following.
Theorem 1.3. Let f : {0, 1}^k → {0, 1} be any predicate satisfying the condition for strong approximation resistance for LPs, given by [19]. Then for every ε > 0, there exists c_ε > 0 such that for infinitely many N ∈ N, there exists an instance Φ of MAX k-CSP(f) of size N, satisfying FRAC(Φ) ≥ 1 − ε and |OPT(Φ) − ρ(f)| ≤ ε, where FRAC(Φ) is the value of the relaxation given by c_ε · log N / log log N levels of the Sherali-Adams hierarchy.
As before, the above theorem also yields a corollary for extended formulations.

Proof overview and techniques
The gap instance. The construction of our gap instances is inspired by the construction by Khot et al. [19]. They gave a generic construction to prove integrality gaps for any approximation resistant predicate (starting from certificates of hardness in the form of certain "vanishing measures"), and we use similar ideas to give a construction which can start from a basic LP integrality gap instance as a certificate, to produce a gap instance for a large number of levels. This construction is discussed in Section 5.
Given an integrality gap instance Φ_0 on n_0 variables, we treat this as a "template" (as in Raghavendra [26]) and generate a random instance using this. Concretely, we generate a new instance Φ on n_0 sets of n variables each. To generate a constraint, we sample a random constraint C_0 ∈ Φ_0, and pick a variable randomly from each of the sets corresponding to variables in C_0. Thus, the instances generated are n_0-partite random hypergraphs, with each edge being generated according to a specified "type" (indices of sets to choose vertices from).
Note that previous instances of gap constructions for LP and SDP hierarchies were hypergraphs generated according to the model G_{n,p} with the signs of the literals chosen independently at random. However, proving an LP/SDP lower bound using such instances implies a strong result: The predicate f is useless for the corresponding relaxation, in the sense defined by [4] (replacing the "P ≠ NP" assumption by the assumption that UG does not belong to P). Uselessness only holds for a limited class of predicates f (when f^{−1}(1) supports a balanced pairwise independent distribution on [q]^k) [4]. Thus, proving an SDP lower bound for predicates which are not expected to be useless requires a new construction of instances, which cannot be generated uniformly at random. Our construction provides such a generalization, and may be useful in proving new SDP lower bounds. The properties of random G_{n,p} hypergraphs easily carry over to our instances, and we collect these properties in Section 3.
The above construction ensures that if the instance Φ_0 does not have an assignment satisfying more than an s fraction of the constraints, then OPT(Φ) ≤ s + ε with high probability. Also, it is well-known that providing a good LP solution to the relaxation given by t levels of the Sherali-Adams hierarchy is equivalent to providing distributions D_S on [q]^S for all sets of variables S with |S| ≤ t, such that the distributions are consistent restricted to subsets, i.e., for all S with |S| ≤ t and all T ⊆ S, we have D_{S|T} = D_T. Thus, in our case, we need to produce such consistent local distributions such that the expected probability that a random constraint C ∈ Φ is satisfied by the local distribution on the set of variables involved in C (which we denote as S_C) is at least c − ε.
Local distributions from local structure. Most papers on integrality gaps for CSPs utilize the local structure of random hypergraphs to produce such distributions. Since the girth of a sparse random hypergraph is Ω(log n), any induced subgraph on o(log n) vertices is simply a forest. In case the induced hypergraph G_S on a set S is a tree, there is an easy distribution to consider: simply choose an arbitrary root and propagate down the tree by sampling each child conditioned on its parent. It is also easy to see that for T ⊆ S, if the induced hypergraph G_T is a subtree of G_S, then the distributions D_S and D_T produced as above are consistent.
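The root-and-propagate distribution described above can be sketched as follows. The tree encoding and the particular Markov kernel are illustrative assumptions; any conditional distribution of a child given its parent would do:

```python
import random

def sample_tree_assignment(tree, root, q, root_marginal, kernel):
    """Sample an assignment in [q] to the vertices of a tree by
    downward propagation: draw the root from root_marginal, then draw
    each child from kernel[parent_value], conditioned on its parent.

    tree: dict mapping each vertex to a list of its children.
    kernel: dict mapping a value a in [q] to the distribution of a
    child's value given that its parent has value a.
    """
    assignment = {root: random.choices(range(q), weights=root_marginal)[0]}
    stack = [root]
    while stack:
        u = stack.pop()
        for child in tree.get(u, []):
            weights = kernel[assignment[u]]
            assignment[child] = random.choices(range(q), weights=weights)[0]
            stack.append(child)
    return assignment

# A path a - b - c over alphabet [2]; a child copies its parent w.p. 0.9.
tree = {'a': ['b'], 'b': ['c'], 'c': []}
kernel = {0: [0.9, 0.1], 1: [0.1, 0.9]}
print(sample_tree_assignment(tree, 'a', 2, [0.5, 0.5], kernel))
```

When the root marginal is stationary for the kernel (as with the uniform marginal and the symmetric kernel above), marginalizing this distribution to a subtree gives the distribution obtained by running the same process on the subtree, which is the consistency property used in the text.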
The extension of this idea to forests requires some care. One can consider extending the distribution to forests by propagating independently on each tree in the forest. However, if for T ⊆ S, G_T is a forest while G_S is a tree, then a pair of vertices disconnected in G_T will have no correlation in D_T but may be correlated in D_S. This was handled, for example, in [19] by adding noise to the propagation and using a large ball B(S) around S to define D_S. Then, if two vertices of T are disconnected in B(T) but connected in B(S), then they must be at a large distance from each other. Thus, because of the noise, the correlation between them (which is zero in D_T) will be very small in D_S. However, correcting approximate consistency to exact consistency incurs a cost which is exponential in the number of levels (i.e., the sizes of the sets), which is what limits the results in [19, 12] to O(log log n) levels. This also makes the proof more involved since it requires a careful control of the errors in consistency.
Consistent partitioning schemes. We resolve the above consistency issue by first partitioning the given set S into a set of clusters, each of which has diameter ∆_H = o(log n) in the underlying hypergraph H. Since each cluster has bounded diameter, it becomes a tree when we add all the missing paths between any two vertices in the cluster. We then propagate independently on each cluster (augmented with the missing paths). This preserves the correlation between any two vertices in the same cluster, even if the path between them was not originally present in G_S.
Of course, the above plan requires that the partition obtained for T ⊆ S is consistent with the restriction to T of the partition obtained for the set S. In fact, we construct distributions over partitions {P_S}_{|S|≤t}, which satisfy the consistency property P_{S|T} = P_T. These distributions over partitions, which we call consistent partitioning schemes, are constructed in Section 4.
In addition to being consistent, we require that the partitioning scheme cuts only a small number of edges in expectation, since these contribute to a loss in the LP objective. We remark that such low-diameter decompositions (known as separating and padded decompositions) have been used extensively in the theory of metric embeddings (see, e.g., [23] and the references therein). The only additional requirement in our application is consistency.
We obtain the decompositions by proving the (easy) hypergraph extensions of the results of Charikar, Makarychev and Makarychev [11], who exhibit a metric which is similar to the shortest path metric on graphs at small distances, and has the property that its restriction to any subset of size at most n^ε (for an appropriate ε < 1) is ℓ_2-embeddable. This is proved in Section 3. A variant of this metric was used by Charikar, Makarychev and Makarychev [9] to prove lower bounds for MAX-CUT, for n^ε levels of the Sherali-Adams hierarchy. They used the embedding to construct a "local SDP solution" for any n^ε variables (with value 1 − ε′) and produced the distributions required for Sherali-Adams by rounding the SDP solutions (which gives value 1 − O(√ε′)). However, rounding an SDP solution with a high value does not always produce a good integral solution for other CSPs.
Instead, we use these metrics in Section 4 to construct the consistent partitioning schemes as described above, by applying a result of Charikar et al. [8] giving separating decompositions for finite subsets of ℓ_2. We remark that it is the consistency requirement of the partitioning procedure that limits our results to O(log n / log log n) levels. The separation probability in the decomposition procedure grows with the dimension of the ℓ_2 embedding, while (to the best of our knowledge) dimension reduction procedures seem to break consistency.

Preliminaries
We use [n] to denote the set {1, . . ., n}. The only exception is [q], where we overload this notation to denote the set {0, . . ., q − 1}, which corresponds to the alphabet for the Constraint Satisfaction Problem under consideration. We use D_S and P_S to denote probability distributions over assignments to and partitions of a set S, respectively. For T ⊆ S, the notation D_{S|T} is used to denote the restriction (marginal) of the distribution D_S to the set T (and similarly for P_{S|T}).

Constraint Satisfaction Problems
An instance Φ of MAX k-CSP_q(f) consists of m constraints C_1, . . ., C_m over n variables x_1, . . ., x_n taking values in [q]. Each constraint C_i is specified by a k-tuple of variables (x_{i_1}, . . ., x_{i_k}) and constants b_{i_1}, . . ., b_{i_k} ∈ [q], and the constraint is of the form f(x_{i_1} + b_{i_1}, . . ., x_{i_k} + b_{i_k}), where the addition is modulo q. For an assignment σ : [n] → [q], let sat(σ) denote the fraction of constraints satisfied by σ (where x_i gets assigned σ(i)). The maximum fraction of constraints that can be simultaneously satisfied is denoted by OPT(Φ), i.e., OPT(Φ) = max_σ sat(σ).
For a constraint C of the above form, we use x_C to denote the tuple (x_{i_1}, . . ., x_{i_k}) of variables, and b_C to denote the tuple (b_{i_1}, . . ., b_{i_k}). We then write the constraint as f(x_C + b_C). We also denote by S_C the set of indices, {i_1, . . ., i_k}, of the variables participating in the constraint C.

The LP relaxations for Constraint Satisfaction Problems
Below we present various LP relaxations for the MAX k-CSP_q(f) problem that are relevant in this paper.
We start with the level-t Sherali-Adams relaxation. The intuition behind it is the following. Note that an integer solution to the problem can be given by an assignment σ : [n] → [q]. Using this, we can define {0, 1}-valued variables x_{(S,α)} for each S ⊆ [n], 1 ≤ |S| ≤ t and α ∈ [q]^S, with the intended solution x_{(S,α)} = 1 if σ(S) = α and 0 otherwise. We also introduce a variable x_{(∅,∅)}, which equals 1. We relax the integer program and allow variables to take real values in [0, 1]. Now the variables {x_{(S,α)}}_{α∈[q]^S} give a probability distribution D_S over assignments to S. We can enforce consistency between these local distributions by requiring that for T ⊆ S, the distribution over assignments to S, when marginalized to T, is precisely the distribution over assignments to T, i.e., D_{S|T} = D_T. The relaxation is shown in Figure 1.
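As a sanity check, the consistency condition D_{S|T} = D_T can be verified mechanically for explicitly given local distributions. The dictionary encoding below (a set of variables mapped to a distribution over assignment tuples) is an illustrative assumption:

```python
from itertools import product

def marginal(dist, S, T, q):
    """Marginalize a distribution over assignments to S (a dict mapping
    each alpha in [q]^S, as a tuple ordered like S, to its probability)
    onto the subset T of S."""
    idx = [S.index(v) for v in T]
    out = {beta: 0.0 for beta in product(range(q), repeat=len(T))}
    for alpha, p in dist.items():
        out[tuple(alpha[i] for i in idx)] += p
    return out

def is_consistent(dists, q, tol=1e-9):
    """Check D_{S|T} = D_T for every pair T ⊂ S of sets present in
    `dists` (keys are tuples of variable names)."""
    for S, dS in dists.items():
        for T, dT in dists.items():
            if set(T) < set(S):
                m = marginal(dS, list(S), list(T), q)
                if any(abs(m[b] - dT[b]) > tol for b in dT):
                    return False
    return True

# Two perfectly correlated binary variables with uniform marginals.
dists = {
    (0, 1): {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0},
    (0,): {(0,): 0.5, (1,): 0.5},
    (1,): {(0,): 0.5, (1,): 0.5},
}
print(is_consistent(dists, q=2))  # -> True
```

Note that consistency does not require the local distributions to come from a single global distribution; that is exactly the slack the relaxation exploits.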
The basic LP relaxation is a reduced form of the above relaxation where only those variables x_{(S,α)} are included for which S = S_C is the set of CSP variables for some constraint C. The consistency constraints are included only for singleton subsets of the sets S_C. We note here that for any feasible solution to the basic LP relaxation, the local distributions {x_{(S,α)}} assign the same value to the repeated variables of a constraint. Note that all the constraints for the basic LP are implied by the relaxation obtained by level k of the Sherali-Adams hierarchy.
For an LP/SDP relaxation of MAX k-CSP_q, and for a given instance Φ of the problem, we denote by FRAC(Φ) the LP/SDP (fractional) optimum. A relaxation is said to have a (c, s)-integrality gap if there exists a CSP instance Φ such that FRAC(Φ) ≥ c and OPT(Φ) < s.

Hypergraphs
An instance Φ of MAX k-CSP defines a natural associated hypergraph H = (V, E) with V being the set of variables in Φ and E containing one hyperedge for every constraint C ∈ Φ. We remind the reader of the familiar notions of degree, paths, and cycles for the case of hypergraphs:

• For a vertex v ∈ V, the degree of the vertex v is defined to be the number of distinct hyperedges containing it.
• A simple path P is a finite alternating sequence of distinct vertices and distinct edges starting and ending at vertices, i.e., P = v_1, e_1, v_2, . . ., v_ℓ, e_ℓ, v_{ℓ+1}, where the v_i are distinct and the e_i are distinct. Furthermore, e_i contains v_i, v_{i+1} for each i. Here ℓ is called the length of the path P. All paths discussed in this paper will be simple paths.
• A simple cycle C is an alternating sequence of distinct vertices and distinct edges, C = v_1, e_1, v_2, . . ., v_ℓ, e_ℓ, where e_i contains v_i, v_{i+1} for each i ∈ [ℓ − 1], and v_ℓ, v_1 ∈ e_ℓ. We note that we don't include hyperedges with only one vertex towards forming cycles. For a path P (or cycle C), we use V(P) (or V(C)) to denote the set of all the vertices that occur in the edges, i.e., the set {v : v ∈ e_i for some i ∈ [h]}, where e_1, . . ., e_h are the hyperedges included in P (or C). For a path P, |P| denotes the number of hyperedges on the path P.
• For a given hypergraph H, the length of the smallest cycle in H is called the girth of H.
To observe the difference between the notions of cycle in graphs and hypergraphs, it is instructive to consider the following example: let u, v be two distinct vertices in a k-uniform hypergraph for k ≥ 3, and let e_1, e_2 be two distinct hyperedges both containing u and v. Then u, e_1, v, e_2, u is a cycle of length 2, which cannot occur in a graph.
We shall also need the following notion of the closure of a set S ⊆ V in a given hypergraph H, defined by [9] for the case of graphs. A stronger notion of closure was also considered by [5].

Definition 2.3. For a given hypergraph H and R ∈ N, and a set S ⊆ V(H), we denote by cl_R(S) the R-closure of S, obtained by adding all the vertices in all the paths of length at most R connecting two vertices of S, i.e., cl_R(S) = S ∪ ⋃ {V(P) : P is a path of length at most R with both endpoints in S}.
For ease of notation, we use cl(S) to denote cl_1(S).
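For R = 1 the closure is easy to compute: a path of length one between two vertices of S is a single hyperedge containing both of them, so cl_1(S) adds the vertices of every hyperedge meeting S in at least two vertices. A small sketch, encoding hyperedges as vertex sets:

```python
def closure_1(S, edges):
    """Compute cl_1(S) for a hypergraph given by its list of hyperedges
    (each a set of vertices): add all vertices of every hyperedge that
    contains at least two vertices of S."""
    S = set(S)
    out = set(S)
    for e in edges:
        if len(S & set(e)) >= 2:
            out |= set(e)
    return out

edges = [{1, 2, 3}, {3, 4, 5}, {5, 6, 7}]
print(sorted(closure_1({1, 2, 6}, edges)))  # edge {1,2,3} is added -> [1, 2, 3, 6]
```

For larger R one would instead enumerate the (simple) paths of length at most R between pairs of vertices of S.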

Properties of random hypergraphs
In this section we collect various properties of the hypergraphs corresponding to our integrality gap instances. The gap instances we generate contain several disjoint collections of variables. Each constraint in the instance has a specified "type", which specifies which of the collections each of the participating k variables must be sampled from. The constraint is generated by randomly sampling each of the k variables from the collections specified by its type. This is captured by the generative model described below.
In the model below and in the construction of the gap instance, the parameter n_0 should be thought of as constant, while the parameters n and m should be thought of as growing to infinity. We will choose m = γ · n for γ = O_{k,q}(1).

Definition 3.1. Let n_0, k ∈ N with k ≥ 2. Let m, n ∈ N and let Γ be a distribution on [n_0]^k. We define a distribution H_k(m, n, n_0, Γ) on n_0-partite hypergraphs on N = n_0 · n vertices, divided into n_0 sets, X_1, . . ., X_{n_0}, of size n each. Each H ∈ Supp(H_k(m, n, n_0, Γ)) has m edges and each edge has at most k vertices. A random hypergraph H ∼ H_k(m, n, n_0, Γ) is generated by sampling m random hyperedges independently as follows:

• Sample a type (i_1, . . ., i_k) according to the distribution Γ.

• For all distinct i_j, sample v_{i_j} independently and uniformly from X_{i_j}.
• Add the edge e_i = {v_{i_1}, . . ., v_{i_k}} to H.
Note that as specified above, the model may generate a multi-hypergraph. However, the number of such repeated edges is likely to be small, and we will bound these, and in fact the number of cycles of size o(log n), in Lemma A.2.
We will study the metrics d_µ^H and ρ_µ^H (Definition 3.2), similar to the ones defined in [11]. We primarily need the local ℓ_2-embeddability of the metric ρ_µ^H. The following theorem captures various properties of random hypergraphs required for our construction. The proof of the theorem heavily uses results proved in [3] and [9] and we defer the details to Appendix A.

Theorem 3.3. Let H ∼ H_k(m, n, n_0, Γ) with m = γ · n edges and let ε > 0. Then for large enough n, with high probability (at least 1 − ε, over the choice of H), there exist δ > 0, constants c = c(k, γ, n_0, ε), θ = θ(k, γ, n_0, ε) and a subhypergraph H′ ⊆ H with V(H′) = V(H) satisfying the following:

• For all t ≤ n^θ, for µ ≥ c · (log t + log log n) / log n, for all S ⊆ V(H′) with |S| ≤ t, the metric ρ_µ^{H′} restricted to S is isometrically embeddable into the unit sphere in ℓ_2.

Decompositions of hypergraphs from local geometry
We will construct the Sherali-Adams solution by partitioning the given subset of vertices into trees, and then creating a natural distribution over satisfying assignments on trees. We define below the kind of partitions we need.

Definition 4.1 (Consistent Partitioning Scheme). Let X be a finite set. For a set S, let P_S denote a distribution over partitions of S. For T ⊆ S, let P_{S|T} be the distribution over partitions of T obtained by restricting the partitions in P_S to the set T. We say that a collection of distributions {P_S}_{|S|≤t} forms a consistent partitioning scheme of order t, if for all S ⊆ X with |S| ≤ t and all T ⊆ S, we have P_T = P_{S|T}.
In addition to being consistent as described above, we also require the distributions to have a small probability of cutting the hyperedges of the hypergraphs corresponding to our CSP instances. We define this property below.

Definition 4.2. Let H = (V, E) be a hypergraph with each hyperedge having at most k vertices. Let {P_S}_{|S|≤t} be a consistent partitioning scheme of order t for the vertex set V, with t ≥ k. We say the scheme is ε-sparse if for all S with |S| ≤ t and every hyperedge e ∈ E with e ⊆ S, we have Pr_{P∼P_S}[e is cut by P] ≤ ε (where e is cut by P if its vertices lie in more than one cluster of P).
In this section, we will prove that the hypergraphs arising from random CSP instances admit sparse and consistent partitioning schemes. Recall the metrics d_µ^H and ρ_µ^H defined for a hypergraph H (Definition 3.2).

Lemma 4.3. Let H = (V, E) be a hypergraph with each hyperedge containing at most k vertices and let ρ_µ^H be the metric defined above. Further, let H be such that for all sets S ⊆ V with |S| ≤ t, the metric induced by ρ_µ^H on S is isometrically embeddable into ℓ_2. Then, there exist ε ≤ 10k · √(µt) and ∆_H = O(1/µ) such that H admits an ε-sparse consistent partitioning scheme of order t, with each partition consisting of clusters of diameter at most ∆_H in H.
We use the following result of Charikar et al. [8] which shows that low-dimensional metrics have good separating decompositions with bounded diameter, i. e., decompositions which have a small probability of separating points at a small distance.

Theorem 4.4 ([8]). Let W be a finite collection of points in R^d and let ∆ > 0 be given. Then there exists a distribution P over partitions of W such that

• ∀P ∈ Supp(P), each cluster in P has ℓ_2 diameter at most ∆, and

• for all x, y ∈ W, Pr_{P∼P}[P separates x and y] ≤ 2√d · ||x − y||_2 / ∆.

We also need the observation that the partitions produced by the above theorem are consistent, assuming the set S considered above lies in a fixed bounded set (using a trivial modification of the procedure in [8]). For the sequel, we use B(x, δ) to denote the ℓ_2 ball around x of radius δ and B_H(u, r) to denote a ball of radius r around a vertex u ∈ V(H), i.e., B_H(u, r) = {v ∈ V(H) : dist_H(u, v) ≤ r}. The balls B(S, δ) and B_H(S, r) are defined similarly, as unions of the corresponding balls over all points of S.
Claim 4.5. Let S and T be sets such that T ⊆ S. Let W_S = {w_u}_{u∈S} and W_T = {w_u}_{u∈T} be ℓ_2-embeddings of S and T satisfying φ(W_T) ⊆ W_S ⊆ B(0, R_0) ⊂ R^d, for some unitary transformation φ and R_0 > 0. Let P_S and P_T be distributions over partitions of S and T respectively, induced by partitions on W_S and W_T as given by Theorem 4.4. Then P_{S|T} = P_T.
Proof. The claim follows simply by considering (a trivial modification of) the algorithm of [8]. For a given set W and a parameter ∆, they produce a partition using the following procedure:

• Repeat until W = ∅:

  – Pick a random point x in B(W, ∆/2) according to the Haar measure. Let C_x = B(x, ∆/2) ∩ W.

  – If C_x ≠ ∅, set W = W \ C_x. Output C_x as a cluster in the partition.
[8] show that the above procedure produces a distribution over partitions satisfying the conditions in Theorem 4.4. We simply modify the procedure to sample a random point x in B(0, R_0 + ∆/2) instead of B(W, ∆/2). This does not affect the separation probability of any two points, since the only non-empty clusters are still produced by the points in B(S, ∆/2). Since R_0 + ∆ < ∞, the above procedure almost surely terminates in finitely many steps.
Let P be a partition of S produced by the above procedure when applied to the point set W_S, and let P′ be a random partition produced when applied to the point set φ(W_T). It is easy to see from the above procedure that the distribution P_T is invariant under a unitary transformation of W_T. By coupling the random choice of a point in B(0, R_0 + ∆/2) chosen at each step in the procedures applied to W_S and φ(W_T) ⊆ W_S, we get that P(T) = P′, i.e., the partition P restricted to T equals P′. Thus, we get P_{S|T} = P_T.
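The modified carving procedure can be sketched as follows. For simplicity the sketch samples centers uniformly from a fixed box rather than via the Haar measure on the ball B(0, R_0 + ∆/2) (a simplification we introduce); the point that matters for Claim 4.5 is that the sampling region is fixed in advance, so the carvings of nested point sets can be coupled:

```python
import math
import random

def ball_carving(points, delta, R0, rng=None):
    """Partition a finite point set (dict: name -> coordinates) into
    clusters of l2 diameter at most delta: repeatedly pick a random
    center x in a fixed box containing B(0, R0 + delta/2) and carve
    off the ball B(x, delta/2) intersected with the remaining points."""
    rng = rng or random.Random(0)
    dim = len(next(iter(points.values())))
    remaining = dict(points)
    clusters = []
    while remaining:
        x = [rng.uniform(-R0 - delta / 2, R0 + delta / 2) for _ in range(dim)]
        ball = [u for u, p in remaining.items() if math.dist(p, x) <= delta / 2]
        if ball:  # non-empty intersection: output it as a cluster
            clusters.append(sorted(ball))
            for u in ball:
                del remaining[u]
    return clusters

points = {'a': (0.0, 0.0), 'b': (0.1, 0.0), 'c': (2.0, 0.0)}
for cluster in ball_carving(points, delta=0.5, R0=2.0):
    print(cluster)  # every cluster has diameter <= 0.5, so 'c' is always alone
```

Running the same sequence of random centers on a subset of the points carves off exactly the restricted clusters, which is the coupling used in the proof above.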
We can use the above to prove Lemma 4.3.
Proof of Lemma 4.3. Given a set S, let W_S be an ℓ_2 embedding of the metric ρ_µ restricted to S. Since |S| ≤ t, we can assume W_S ⊆ R^t. We apply the partitioning procedure of Charikar et al. from Theorem 4.4 with ∆ = 1/2. From the definition of the metric ρ_µ^H, we get that each cluster, having ℓ_2 diameter at most 1/2, has diameter at most ∆_H = O(1/µ) in H. Moreover, for u, v contained in an edge e, we have that ρ_µ(u, v) ≤ √(5µ) and hence the probability that u and v are separated is at most 10√(µt). Thus, the probability that any vertex in e is separated from u is at most 10k · √(µt), as e contains at most k vertices. Finally, for any T ⊆ S, if W_S and W_T denote the corresponding ℓ_2 embeddings, by the rigidity of ℓ_2 we have that φ(W_T) ⊆ W_S for some unitary transformation φ. Thus, by Claim 4.5, we get that this is a consistent partitioning scheme of order t.

Integrality gaps from the basic LP
Recall the basic LP relaxation for MAX k-CSP_q(f) given in Figure 2. In this section, we prove Theorem 1.1, whose statement we recall below.
Theorem 1.1. Let f : [q]^k → {0, 1} be any predicate, and let Φ_0 be a (c, s)-integrality gap instance for the basic LP relaxation of MAX k-CSP(f). Then for every ε > 0, there exists c_ε > 0 such that for infinitely many N ∈ ℕ, there exist (c − ε, s + ε)-integrality gap instances of size N for the LP relaxation given by c_ε · log N / log log N levels of the Sherali–Adams hierarchy.
Let Φ_0 be a (c, s)-integrality gap instance for the basic LP relaxation of MAX k-CSP_q(f), with n_0 variables and m_0 constraints. We use it to construct a new integrality gap instance Φ. The construction is similar to the gap instances constructed by Khot et al. [19], discussed in the next section; however, we describe this construction first since it is simpler. The procedure for constructing the instance Φ is described in Figure 3.
Output: An instance Φ with N = n · n_0 variables and m constraints.
The variables are divided into n_0 sets X_1, ..., X_{n_0} of size n each, one for each variable of Φ_0. We generate the m constraints independently at random as follows:
1. Sample a uniformly random constraint C_0 ∈ Φ_0, and let x_{i_1}, ..., x_{i_k} be the variables in this constraint.
2. For each distinct i_j for j ∈ [k], sample a random variable x'_{i_j} ∈ X_{i_j}. We note that if i_j = i_{j'}, then we set x'_{i_j} = x'_{i_{j'}}, so that repeated variables of C_0 receive the same sampled variable.
3. Add to Φ the constraint obtained from C_0 by replacing each x_{i_j} with x'_{i_j} (keeping the same shifts b_{i_j}).
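The sampling of Φ from Φ_0 can be illustrated in code. The encoding below is hypothetical: each Φ_0-constraint is represented as a pair of a tuple of variable indices and a tuple of shifts, and each variable of Φ is a pair (j, r) denoting the r-th copy in the block X_j.

```python
import random

def blow_up(phi0_constraints, n, m, seed=0):
    """Sample the blown-up instance: each constraint picks a uniformly
    random constraint of Phi_0 and replaces each Phi_0-variable j by a
    uniformly random copy (j, r) from the block X_j; repeated occurrences
    of the same Phi_0-variable receive the same copy."""
    rng = random.Random(seed)
    instance = []
    for _ in range(m):
        idxs, shifts = rng.choice(phi0_constraints)
        copy = {j: (j, rng.randrange(n)) for j in set(idxs)}
        instance.append((tuple(copy[j] for j in idxs), shifts))
    return instance
```

Note that the dictionary `copy` is what enforces step 2: two occurrences of the same Φ_0-variable in a constraint are mapped to the same sampled copy.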

Soundness
We first prove the following lemma, showing that no assignment satisfies more than an (s + ε) fraction of the constraints of the above instance.

Lemma 5.1. For every ε > 0, there exists γ = γ(ε) such that for an instance Φ generated by choosing m ≥ γ · n constraints independently at random as above, with probability 1 − exp(−Ω(n)), every assignment σ ∈ [q]^N satisfies at most an (s + ε) fraction of the constraints of Φ.
Proof. Fix an assignment σ ∈ [q]^N. We first consider E[sat_Φ(σ)] for a randomly generated Φ as above. Here, for each i ∈ [n_0], Z_i is an independent random variable distributed as σ(x) for a uniformly random x ∈ X_i, and Z_{C_0} denotes the collection of these variables corresponding to the variables in the constraint C_0. Thus, the random variables Z_1, ..., Z_{n_0} define a random assignment to the variables of Φ_0, and since any fixed assignment to Φ_0 satisfies at most an s fraction of its constraints, a randomly added constraint C of Φ is satisfied by σ with probability at most s. Thus, for an instance Φ with m independently and randomly generated constraints, E[sat_Φ(σ)] ≤ s, and the probability that sat_Φ(σ) exceeds s + ε is exponentially small in m. Taking a union bound over all assignments completes the proof.
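The key step of the soundness argument, the random assignment to Φ_0 induced by an assignment to the blown-up instance, can be sketched as follows. Here `sigma` is assumed to be a dict mapping variables (j, r) of Φ to values in [q].

```python
import random

def induced_assignment(sigma, n0, n, seed=0):
    """The random variables Z_1, ..., Z_{n0} from the soundness proof:
    Z_j is the value sigma assigns to a uniformly random variable of the
    block X_j, which gives a random assignment to Phi_0."""
    rng = random.Random(seed)
    return [sigma[(j, rng.randrange(n))] for j in range(n0)]
```

If σ is constant on each block, the induced assignment is deterministic, which is exactly the case in which the blown-up instance inherits an assignment of Φ_0.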

Completeness
To prove the completeness, we first observe that the instance Φ as constructed above is also a gap instance for the basic LP.We will then "boost" this hardness to many levels of the Sherali-Adams hierarchy.
Lemma 5.2. For every ε > 0, there exists γ = γ(ε) such that for an instance Φ generated by choosing at least γ · n constraints independently at random as above, with probability 1 − exp(−Ω(n)) there exist distributions D_{S_C} over [q]^{S_C} for each C ∈ Φ, and distributions D_i over [q] for each variable i ∈ [n · n_0], satisfying:
- For all C ∈ Φ and all i ∈ S_C, D_{S_C}|{i} = D_i;
- The distributions give an objective value of at least c − ε/4 for the basic LP relaxation of Φ.

Proof. For each C_0 ∈ Φ_0 and each j ∈ [n_0], let D^(0)_{S_{C_0}} and D^(0)_j denote the distributions given by an optimal solution to the basic LP relaxation of Φ_0, which has value at least c. Each constraint C ∈ Φ is sampled according to some constraint C_0 ∈ Φ_0, and we take D_{S_C} := D^(0)_{S_{C_0}} for the corresponding constraint C_0 ∈ Φ_0. Also, each variable x_i for i ∈ [n_0 · n] belongs to one of the sets X_j for j ∈ [n_0], and we take D_i := D^(0)_j for the corresponding j ∈ [n_0]. The consistency of the distributions follows immediately from the construction of the instance Φ. Let C ∈ Φ be any constraint and let C_0 be the corresponding constraint in Φ_0. If x_{C_0} = (j_1, ..., j_k), then x_C = (i_1, ..., i_k), where i_r ∈ {j_r} × [n] for all distinct j_r. We note that repeated variables in x_{C_0} get the same value under the basic LP solution D^(0)_{S_{C_0}}, and therefore the same is true for D_{S_C}. To bound the objective value, we again consider its expectation over a randomly generated instance Φ. Let C be a random constraint added to Φ. Defining D_{S_C} as above for this constraint, the expected contribution of each constraint is at least c. The probability that the average over m constraints deviates by at least ε/10 from the expectation is at most exp(−Ω(ε² · m)). There exists γ = O(1/ε²) such that for m ≥ γ · n, this probability is at most exp(−Ω(n)).
To construct local distributions for the Sherali-Adams hierarchy, we will consider (a slight modification of) the hypergraph H corresponding to the instance Φ.We first show that distributions on hyperedges of this hypergraph can be consistently propagated in a tree, provided they agree on intersecting vertices.
For a set U ⊆ V(H) in a hypergraph H, recall that cl(U) includes all paths of length at most 1 between any two vertices in U; thus, E(cl(U)) = {e ∈ E | |e ∩ U| ≥ 2}. Note that Lemma 5.2 implies that hyperedges forming a tree in H satisfy the hypothesis of Lemma 5.3 below.

Proof of Lemma 5.3. We define the distribution by starting with an arbitrary hyperedge and traversing the tree in an arbitrary order. Let e_1, ..., e_r be a traversal of the hyperedges in E(cl(U)) such that for all i, |(⋃_{j<i} e_j) ∩ e_i| = 1.
Let U_0 = ⋃_{j<i} e_j be the set of vertices for which we have already sampled an assignment, and let e_i be the next hyperedge in the traversal, with u being the unique vertex in e_i ∩ U_0. We sample an assignment to the vertices in e_i conditioned on the value for the vertex u; formally, we extend the distribution D_{U_0} to U_0 ∪ e_i by sampling from D_{e_i} conditioned on the value already assigned to u. The above process defines a distribution D_{cl(U)} on cl(U), with

D_{cl(U)}(α) = ( ∏_{e ∈ E(cl(U))} D_e(α|_e) ) / ( ∏_u D_u(α(u))^{deg(u)−1} ),

where we use deg(u) to denote the degree of the vertex u in the tree formed by the hyperedges in E(cl(U)), i.e., deg(u) = |{e ∈ E(cl(U)) | u ∈ e}|. We then define the distribution D_U as the marginalized distribution D_{cl(U)|U}. Note that the distribution D_{cl(U)}, and hence also the distribution D_U, is independent of the order in which we traverse the hyperedges in E(cl(U)). Also, since the above process samples each hyperedge according to the distribution D_e, we have that for any e ∈ E(U), D_{cl(U)|e} = D_e, and thus also D_{U|e∩U} = D_{e|e∩U}. Let U' ⊆ U be any set such that E(cl(U')) forms a subtree of E(cl(U)). Then there exists a traversal e_1, ..., e_r and an index i ∈ [r] such that e_j ∈ E(cl(U')) for all j ≤ i and e_j ∉ E(cl(U')) for all j > i. However, the distribution defined by the partial traversal e_1, ..., e_i is precisely D_{cl(U')}. Thus, we get that D_{U|U'} = D_{U'}.

We can now prove the completeness for our construction using consistent decompositions.

Lemma 5.4. Let ε > 0 and let Φ be a random instance of MAX k-CSP_q(f) generated by choosing γ · n constraints independently at random as above, for large enough n. Then there is a t = Ω_{ε,k,n_0}(log n / log log n) such that with probability 1 − ε over the choice of Φ, there exist distributions {D_S}_{|S|≤t} satisfying:
- For all S ⊆ V with |S| ≤ t, D_S is a distribution on [q]^S.
-For all T ⊆ S ⊆ V with |S| ≤ t, D S|T = D T .
- The distributions give an objective value of at least c − ε for the LP relaxation given by t levels of the Sherali–Adams hierarchy.

Proof. By Theorem 3.3, we know that there exists δ such that with probability 1 − ε/4, after removing a set of constraints C_B of size at most (ε/4) · m, we can assume that the remaining instance has girth at least g = δ · log n. Also, there exist θ, c_0 > 0 such that for all large enough n, for all t ≤ n^θ, the metric ρ^H_μ restricted to any set S of size at most t embeds isometrically into the unit sphere in ℓ₂, for all μ ≥ c_0 · (log t + log log n) / log n.
Given a set S, the distribution D_S is a convex combination of distributions D_{S,P} corresponding to different partitions P sampled from P_S. We describe the distribution D_S by giving the procedure to sample an α ∈ [q]^S. Given the set S with |S| ≤ t:
- Sample a partition P = (U_1, ..., U_r) from the distribution P_S.
- For each set U_i, consider the set C(U_i) obtained by including the vertices contained in all the hyperedges on the shortest paths between all u, v ∈ U_i. Since U_i has low diameter and the girth of the instance is large, the hyperedges in E(C(U_i)) form a tree, and by Lemma 5.3 there exists a distribution D_{C(U_i)} satisfying D_{C(U_i)|e} = D_e for all e ∈ E(C(U_i)). Here, the D_e are the distributions given by Lemma 5.2, which form a solution to the basic LP for Φ with value at least c − ε/4. For each U_i, define the distribution D_{U_i} := D_{C(U_i)|U_i}.
- Sample α ∈ [q]^S according to the product distribution D_{U_1} × ... × D_{U_r}.
Thus, we have D_S = E_{P ∼ P_S}[D_{U_1} × ... × D_{U_r}], where the distributions D_{U_i} are defined as above.
We first prove that the distributions are consistent on intersections, i.e., D_{S|T} = D_T for any T ⊆ S. Note that by Lemma 4.3, the distributions P_S and P_T satisfy P_{S|T} = P_T. Each partition (U_1, ..., U_r) of S also induces a partition of T; for ease of notation, we assume that the first (say) r' clusters have non-empty intersection with T, and let V_i = U_i ∩ T for i ≤ r'. The consistency then follows from a direct calculation, using the fact that C(V_i) is a subtree of C(U_i), and thus D_{U_i|V_i} = D_{V_i} by Lemma 5.3, together with the fact that P_{S|T} = P_T by Lemma 4.3.

We now argue that the LP solution corresponding to the above distributions {D_S}_{|S|≤t} has value at least c − ε. Recall that the value of the LP solution is the average over the constraints of the probability that the corresponding local distribution satisfies them. Consider any constraint C in Φ, with the corresponding set of variables S_C and the corresponding hyperedge e. When defining the distribution D_{S_C}, we partition S_C according to the distribution P_{S_C}; by Lemma 4.3 and our choice of parameters, the hyperedge e is split by the sampled partition only with small probability. For a constraint not in the deleted set C_B, if the hyperedge e corresponding to the constraint C is not split by a partition P sampled according to P_{S_C}, then by Lemma 5.3 the restriction of the sampled distribution to e agrees with D_{S_C}, the distribution given by Lemma 5.2. Since f is Boolean, the loss in the objective from partitions that do split e is bounded by the probability of this event. Using Lemma 5.2 again, we get that the value of the LP solution is at least c − ε, where we also use the fact that the fraction of constraints in the initially deleted set C_B is at most ε/4 (for large enough n).
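The propagation step from the proof of Lemma 5.3, extending a partial assignment hyperedge by hyperedge along a tree, can be sketched as follows. The representation is an assumption for illustration: each distribution D_e is a dict mapping assignment tuples (over the sorted vertices of e) to probabilities, keyed by the frozenset of the edge.

```python
import random

def sample_on_tree(edges, dists, seed=0):
    """Traverse hyperedges e_1, ..., e_r (each new edge sharing exactly
    one already-assigned vertex with the union of the previous ones) and
    sample each D_{e_i} conditioned on the value already fixed at the
    shared vertex, as in the proof of Lemma 5.3."""
    rng = random.Random(seed)
    assignment = {}
    for e in edges:
        verts = sorted(e)
        shared = [v for v in verts if v in assignment]
        support, weights = [], []
        for alpha, p in dists[frozenset(e)].items():
            # keep only assignments consistent with the fixed values
            if all(alpha[verts.index(v)] == assignment[v] for v in shared):
                support.append(alpha)
                weights.append(p)
        alpha = rng.choices(support, weights=weights)[0]
        for v, a in zip(verts, alpha):
            assignment[v] = a
    return assignment
```

For two edges sharing one vertex and carrying perfectly correlated marginals, the sampled joint assignment is constant across all three vertices, matching the consistency guarantee of the lemma.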

Integrality gaps for resistant predicates
Let f : {0, 1}^k → {0, 1} be a Boolean predicate and let ρ(f) = |f^{−1}(1)|/2^k be the fraction of satisfying assignments of f. Then f is approximation resistant if it is hard to distinguish instances of MAX k-CSP(f) which are at least (1 − o(1))-satisfiable from those which are at most (ρ(f) + o(1))-satisfiable.
In [19], the authors introduce the notion of a vanishing measure (on a polytope defined by f) and use it to characterize a variant of approximation resistance, called strong approximation resistance, assuming the Unique Games Conjecture. They also gave a weaker notion of vanishing measures, which they used to characterize strong approximation resistance for LP hierarchies. In particular, they proved that when the condition in their characterization is satisfied, there exists a (1 − o(1), ρ(f) + o(1))-integrality gap for O(log log n) levels of the Sherali–Adams hierarchy for the predicate f. Here, we show that using Theorem 1.1, their result can be simplified and strengthened to O(log n / log log n) levels.
Let us first recall some useful notation defined by Khot et al. [19] before defining the notion of a vanishing measure.

Definition 5.5. For a predicate f : {0, 1}^k → {0, 1}, let C(f) be the convex polytope of first moments (biases) of distributions supported on satisfying assignments of f. Further, for a set S, a string b, and a permutation π : S → S, let Λ_{S,π,b} denote the measure induced on R^S by considering vectors with coordinates obtained from ζ via S, π and b, where ζ ∼ Λ.
We recall below the definition of a vanishing measure for LPs from [19] (see Definition 1.3 there).

Definition 5.6. A measure Λ on C(f) is called vanishing (for LPs) if for every 1 ≤ t ≤ k, the corresponding signed combination of the measures Λ_{S,π,b} is identically 0. We say f has a vanishing measure if there exists a vanishing measure Λ on C(f).
In particular, they prove the following theorem.

Theorem 5.7. Let f : {0, 1}^k → {0, 1} be a k-ary Boolean predicate that has a vanishing measure. Then for every ε > 0, there is a constant c_ε > 0 such that for infinitely many N ∈ ℕ, there exists an instance Φ of MAX k-CSP(f) on N variables satisfying the following:
• The optimum of the LP relaxation given by c_ε · log log N levels of the Sherali–Adams hierarchy has value at least 1 − ε.
• Every integral assignment of Φ satisfies at most a ρ(f) + ε fraction of the constraints.

Combining this with our Theorem 1.1 already gives us the following stronger result.

Corollary 5.8. Let f : {0, 1}^k → {0, 1} be a k-ary Boolean predicate that has a vanishing measure. Then for every ε > 0, there is a constant c_ε > 0 such that for infinitely many N ∈ ℕ, there exists an instance Φ of MAX k-CSP(f) on N variables satisfying the following:
• Every integral assignment of Φ satisfies at most a ρ(f) + ε fraction of the constraints.
• The LP relaxation given by c_ε · log N / log log N levels of the Sherali–Adams hierarchy has value at least 1 − ε.

However, note that to apply Theorem 1.1, one only needs a gap for the basic LP, which is a much weaker requirement than the O(log log N)-level gap given by Theorem 5.7. We observe below that the gap for the basic LP follows very simply from the construction by Khot et al. [19]. One can then directly use this gap when applying Theorem 1.1, instead of going through Theorem 5.7.
Khot et al. [19] use the probabilistic construction given in Figure 4, for a given ε > 0. The construction actually requires Λ to be a vanishing measure over the polytope C_δ(f). Let n_0 = ⌈1/ε⌉. Partition the interval [0, 1] into n_0 + 1 disjoint intervals I_0, I_1, ..., I_{n_0}, where I_0 = {0} and I_i = ((i − 1)/n_0, i/n_0] for 1 ≤ i ≤ n_0. For each interval I_i, let X_i be a collection of n variables (disjoint from all X_j for j ≠ i).
Generate m constraints independently according to the following procedure:
• Sample a point ζ from the measure Λ.
• For each j ∈ [k], let i_j be the index of the interval which contains |ζ(j)|. Sample a variable y_j uniformly from the set X_{i_j}, negating it when sign(ζ(j)) < 0.
• Introduce the constraint f on the sampled k-tuple of literals.

They show that for a sufficiently large constant γ, an instance Φ with m = γ · n constraints satisfies, with high probability, that for all assignments σ, |sat_Φ(σ) − ρ(f)| ≤ ε (see Lemma 4.4 in [19]). The proof is similar to that of Lemma 5.1.

Additionally, we need the following claim from [19] (see Claim 4.7 there), which allows one to "round" the coordinates of the vectors ζ ∈ C_δ(f) to the end-points of the intervals I_0, ..., I_{n_0}. This ensures that any two variables in the same collection X_i have the same bias. The proof of the claim follows simply from a hybrid argument; we include it here for completeness.

Claim 5.9. Let ζ ∈ C_δ(f) and let ν be the corresponding distribution supported on f^{−1}(1) such that for all i ∈ [k], the bias of the i-th coordinate under ν equals ζ(i). Then there exists a distribution ν' on {0, 1}^k whose i-th coordinate has bias r_i = sign(ζ_i) · t_i, where t_i is the right end-point of the interval containing |ζ(i)|, and which satisfies ‖ν − ν'‖_1 = O(k · ε/δ).

Proof. Let r_j = sign(ζ_j) · t_j be the desired bias of the j-th coordinate. We construct a sequence of distributions ν_0, ..., ν_k such that ν_0 = ν and ν_k = ν'. In ν_j, the biases are (r_1, ..., r_j, ζ_{j+1}, ..., ζ_k).
The biases in ν_0 satisfy the above by definition. We obtain ν_j from ν_{j−1} as

ν_j = (1 − τ_j) · ν_{j−1} + τ_j · D_j,

where D_j is the distribution in which all bits except the j-th one are sampled independently according to their biases in ν_{j−1}, and the j-th bit is fixed to sign(r_j − ζ(j)) (if r_j − ζ(j) = 0, we can simply proceed with ν_j = ν_{j−1}). Since bias is linear, the biases of all bits except the j-th are unchanged, and the bias of the j-th bit becomes r_j for a suitable choice of τ_j. Since ζ ∈ C_δ(f), we know that |sign(r_j − ζ(j)) − ζ(j)| ≥ δ/2; also, |r_j − ζ(j)| ≤ ε by assumption. Thus, we can choose τ_j = O(ε/δ), which gives ‖ν_j − ν_{j−1}‖_1 = O(ε/δ). The final bound then follows by the triangle inequality.
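The mixing-weight calculation in the hybrid step above can be checked with a few lines of code. This is a sketch for a single ±1-valued bit, assuming |bias|, |target| ≤ 1; it solves for the weight τ that moves the bias to the target exactly.

```python
def shift_bias(bias, target):
    """Solve (1 - t) * bias + t * s = target for the mixing weight t,
    where s = sign(target - bias) is the value the fixed bit takes in
    the mixture distribution D_j of the hybrid argument."""
    if target == bias:
        return 0.0, 1.0  # no mixing needed
    s = 1.0 if target > bias else -1.0
    t = (target - bias) / (s - bias)
    return t, s
```

When |s − bias| ≥ δ/2 and |target − bias| ≤ ε, the returned weight satisfies t ≤ 2ε/δ, which is the bound τ_j = O(ε/δ) used in the proof.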
We can now use the above to give a simplified proof of Corollary 5.8.
Proof of Corollary 5.8. We exhibit a solution of the basic LP (Figure 2) for the instance given in Figure 4. For each variable y_j coming from the set X_j, for j ∈ {0, 1, ..., n_0}, we fix the bias t_j of the variable to be the right end-point of the interval I_j, i.e., we fix x_{(y_j,−1)} = (1 − j/n_0)/2 and x_{(y_j,1)} = (1 + j/n_0)/2.
For each constraint C of the form f(y_{i_1} + b_1, ..., y_{i_k} + b_k), let ζ(C) be the point used to generate it, and let ν(C) denote the corresponding distribution on {0, 1}^k. By Claim 5.9, there exists a distribution ν'(C) such that ‖ν(C) − ν'(C)‖_1 = O(k · ε/δ), and such that the biases of the literals satisfy E_{α∼ν'(C)}[(−1)^{α_j + b_j}] = t_{i_j}, where t_{i_j} denotes the bias for the interval to which y_{i_j} belongs. When t_{i_j} = 0, we negate a variable only when sign(ζ_j) < 0. This is consistent with the bias given by the singleton variables x_{(y_{i_j},1)} and x_{(y_{i_j},−1)}. We thus define the local distribution on the set S_C to be the one induced by ν'(C). Since ν(C) is supported on satisfying assignments, for all C ∈ Φ the value of the constraint under this solution is at least 1 − O(k · ε/δ). Taking δ = √ε proves the claim.

Lower bounds for LP extended formulations
A connection between LP integrality gaps for the Sherali–Adams hierarchy and lower bounds on the size of LP extended formulations was first established by Chan et al. [7] and later improved by Kothari et al. [21]. In [21], the authors proved the following:

Theorem 5.10 ([21], Theorem 1.2). There exist constants 0 < h < H such that the following holds. Consider a function g : ℕ → ℕ. Suppose that the g(n)-level Sherali–Adams relaxation for a CSP cannot achieve a (c, s)-approximation on instances on n variables. Then no LP extended formulation of size at most n^{h·g(n)} can achieve a (c, s)-approximation for the CSP on n^H variables.
Combining Theorem 1.1 with Theorem 5.10 (taking g(N) := c_ε · log N / log log N) yields the following corollary.
Corollary 1.2. Let f : [q]^k → {0, 1} be any predicate, and let Φ_0 be a (c, s)-integrality gap instance for the basic LP relaxation of MAX k-CSP(f). Then for every ε > 0, there exists c_ε > 0 such that for infinitely many N ∈ ℕ, there exist instances of size N which are (c − ε, s + ε)-integrality gap instances for every LP extended formulation of size at most N^{c_ε · log N / log log N}.

Conclusions and open problems
This work shows a dichotomy result for approximating CSPs using linear programs, proving that if a (c, s)-approximation is not achievable using the basic LP, then for every ε > 0, a (c − ε, s + ε)-approximation is not achievable using O_ε(log n / log log n) levels of the Sherali–Adams hierarchy. A natural open problem is to extend this result to n^{O(1)} levels of the Sherali–Adams hierarchy. Using the results of [21], this would also show that even exponential-size LP extended formulations are at most as strong as the basic LP.
As mentioned above, the current limitation on the number of levels in our result comes from the consistency requirement of the low-diameter decompositions. Given a d-dimensional ℓ₂ embedding of the restriction of the metric ρ^H_μ to a set of vertices S, the fraction of edges cut by the decomposition procedure of [8] grows as √d. Our current proof only uses the trivial bound that when |S| = t, the metric admits a t-dimensional ℓ₂ embedding. Even though for a single set S this bound can be improved to O(log t), at the cost of slight errors in the distances (using randomized dimension reduction), we do not know how to do this consistently across the various sets S. In particular, since we want to always obtain low-diameter components, we may need to reject a small fraction of the dimension-reduction maps for S if they shrink the distances too much. However, such maps may not necessarily be rejected when considering T ⊆ S, which can violate the consistency requirement P_{S|T} = P_T. Understanding how and when randomized dimension reduction can be combined with the consistency requirement is an intriguing question, which may also be useful in other applications.
A Local ℓ₂-embeddability of the Metric ρ^H_μ

The goal of this section is to prove the following result about the local ℓ₂-embeddability of the metric ρ^H_μ.
To prove the above theorem, we will use the local structure of random hypergraphs. We first prove that, with high probability, a few hyperedges can be removed from a random hypergraph (sampled from H_k(m, n, n_0, Γ)) to obtain a hypergraph whose girth is Ω(log n) and whose degree is bounded. The following lemma shows a possible trade-off between the degree of the hypergraph and the number of hyperedges required to be removed.
Lemma A.1. Let H ∼ H_k(m, n, n_0, Γ) be a random hypergraph with m = γ · n hyperedges, for large enough γ. Then for any ε > 0, with probability 1 − ε there exists a sub-hypergraph H' with V(H') = V(H), maximum degree at most D = 100 · log(n_0/ε) · k · γ, and |E(H) \ E(H')| ≤ ε · m.

Proof. By linearity of expectation, the expected degree of any vertex v in H is at most k · γ. Let D = 100 · log(n_0/ε) · k · γ, and let S be the set of all vertices u with deg_H(u) > D. Let E_S be the set of all hyperedges with at least one vertex in S. We take E(H') = E(H) \ E_S. Note that for any u ∈ V(H), the probability that deg_H(u) > D is small by a Chernoff–Hoeffding bound. We use this to bound the expected number of edges deleted.
The penultimate inequality uses the independence of the hyperedges in the generation process. Thus, the number of edges deleted is at most ε · m with probability at least 1 − ε.
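The deletion step in the proof of Lemma A.1 translates directly into code; a minimal sketch, with hyperedges represented as tuples of vertex indices:

```python
def prune_high_degree(edges, n_vertices, D):
    """Delete every hyperedge containing a vertex whose degree in the
    original hypergraph exceeds D, as in the proof of Lemma A.1; the
    surviving sub-hypergraph then has maximum degree at most D."""
    deg = [0] * n_vertices
    for e in edges:
        for v in set(e):  # count each vertex once per hyperedge
            deg[v] += 1
    return [e for e in edges if all(deg[v] <= D for v in e)]
```

Note that degrees are computed in the original hypergraph, so every kept edge touches only vertices that already had degree at most D.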
The following lemma shows that the expected number of short cycles in a random hypergraph is small.

Lemma A.2. Let H ∼ H_k(m, n, n_0, Γ) be a random hypergraph and, for ℓ ≥ 2, let Z_ℓ(H) denote the number of cycles of length at most ℓ in H. For m, n and k such that k² · (m/n) > 2, we have E[Z_ℓ(H)] ≤ O((k² · m/n)^ℓ).

Proof. Given any pair (u', v') of distinct vertices of H', the probability of the pair belonging together to some hyperedge of H' is at most m·k²/n², since each hyperedge e contains at most k vertices. Consider a given h-tuple u = (u_{i_1}, ..., u_{i_h}) of vertices. Note that we require the hyperedges participating in a cycle to be distinct, and we do not count hyperedges containing only one vertex of the cycle. So the probability that u forms a cycle in H', i.e., that there exist distinct hyperedges e_j ∈ H' for j ∈ [h] such that u_{i_j}, u_{i_{j+1}} ∈ e_j for j ∈ [h − 1], and u_{i_1}, u_{i_h} ∈ e_h, is at most (m·k²/n²)^h. As a result, the expected number of cycles of length h in H' is bounded above by n^h · (m·k²/n²)^h = (k²·m/n)^h. From the geometric form of the bound, it follows that the expected number of cycles of length at most ℓ in H' is at most O((k²·m/n)^ℓ).
Using the above lemma, it is easy to show that one can remove all short cycles in a random hypergraph by deleting only a small number of hyperedges.

Corollary A.3. Let H ∼ H_k(m, n, n_0, Γ) be a random hypergraph with m = γ · n for γ > 1. Then there exists δ = δ(γ, k) > 0 such that with probability 1 − n^{−1/6}, all cycles of length at most δ · log n in H can be removed by deleting at most n^{2/3} hyperedges.

Proof. As above, let Z_ℓ denote the number of cycles of length at most ℓ. With this choice of m, n and k, taking ℓ = δ · log n for a small enough constant δ = δ(γ, k) gives E[Z_ℓ] ≤ n^{1/2}. Thus, by Markov's inequality, with probability 1 − n^{−1/6} one can remove all cycles of length at most δ · log n by deleting at most n^{2/3} edges.

Charikar et al. [11] prove an analogue of Theorem 3.3 for metrics defined on locally sparse graphs (see Definition A.5). In fact, they use a consequence of sparsity, which they call ℓ-path decomposability. To this end, we define the incidence graph associated with a hypergraph, on which we will apply their result.
Definition A.4. Let H = (V(H), E(H)) be a hypergraph. We define its incidence graph as the bipartite graph G_H on the vertex sets V(H) and E(H), with edge set {(v, e) | v ∈ V(H), e ∈ E(H), v ∈ e}. We now define local sparsity of the incidence graph.

Definition A.5 (Local Sparsity). A graph G is said to be (τ, η)-sparse if every set S of at most τ · |V(G)| vertices satisfies |E(S)| < |S|/(1 − η), where E(S) denotes the set of edges contained in S.
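Definition A.4 above translates directly into code; a sketch, with hyperedges given as tuples of vertex indices and the two sides of the bipartite graph tagged by 'v' and 'e' (this tagged representation is an assumption made for illustration):

```python
def incidence_graph(hyperedges):
    """Build the bipartite incidence graph G_H of Definition A.4:
    one side indexes the vertices, the other the hyperedges, and
    ('v', v) is adjacent to ('e', i) iff vertex v lies in hyperedge i."""
    return [(('v', v), ('e', i))
            for i, e in enumerate(hyperedges)
            for v in sorted(set(e))]
```

The number of edges of G_H equals the sum of the (distinct-vertex) arities of the hyperedges, so a hyperedge of arity at most k contributes at most k incidences.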
We note that we will require the sparsity parameter η to be O_{k,γ}(1/log n). This gives sparsity only for sublinear-size sets, as compared to sets of size Ω(n) in previous results, where η is a constant. We also observe that if G_1 is (τ, η)-sparse and G_2 is a subgraph of G_1, then G_2 is also (τ, η)-sparse. We now prove that the incidence graph of a hypergraph sampled from H_k(m, n, n_0, Γ) is locally sparse with high probability. The proof of the following lemma closely follows the proofs of analogous statements in [1, 6].
Lemma A.6. For every γ > 1 and m = γ · n · n_0, let η < 1/4. Then there exists a constant τ > 0 such that, with high probability, G_H is (τ, η)-sparse. Hence the probability that a set T of h vertices induces at least r = h/(1 − η) edges is bounded by a union bound over the choices of these edges.
We split the above sum into two parts depending on the range of h, and bound each part separately.
The above property was also implicitly used by Arora et al. [3] in proving the following lemma (see Lemma 2.12 in [3]).
Lemma A.7. Let ℓ > 0 be an integer and 0 < η < 1/(3ℓ − 1) < 1. Let G be an η-sparse graph with girth g > ℓ. Then G is ℓ-path decomposable.

We note that if H is a hypergraph and G_H is its incidence graph, then the metrics d^{G_H}_μ and ρ^{G_H}_μ restricted to V(H) coincide with the metrics d^H_μ and ρ^H_μ defined on H. Charikar et al. proved the following theorem (see Theorem 5.2 in [9]).
Theorem A.8 ([9]). Let G be a graph on n vertices with maximum degree D. Let t < √n and ℓ > 0 be such that for t' = D^{ℓ+1} · t, every subgraph of G on at most t' vertices is ℓ-path decomposable. Also, let μ, t and ℓ satisfy the relation (1 − μ)^{ℓ/9} ≤ μ/(2t + 2). Then for every subset S of at most t vertices, there exists a mapping ψ_S from S to the unit sphere in ℓ₂ such that for all u, v ∈ S, ‖ψ_S(u) − ψ_S(v)‖² = ρ^G_μ(u, v).
We use this theorem to prove the main theorem of this section. By Lemma A.6, there exists η = Ω_{n_0,k,γ}(1/log n) such that for all large enough n, G_H is (τ, η)-sparse with probability at least 1 − ε/4, for τ ≥ n^{−1/4}.
Hence with probability 1 − ε, we have that H' = (V(H), E(H_1) ∩ E(H_2)) satisfies:
• The maximum degree of H' is bounded above by D.
• The girth of H' is at least g > δ · log n.
We now show that the metric ρ^H_μ is locally ℓ₂-embeddable. For ease of notation, let us denote the incidence graph G_H of the hypergraph H by G. Note that N ≤ |V(G)| ≤ N · (1 + γ), and the maximum degree of G is also bounded by D. Since a cycle in G is also a cycle in H, the girth of G is also at least g ≥ δ · log n.
We can now apply Theorem A.8 to construct the embedding. Given any subset S of V(H) of size at most t ≤ n^θ, note that S is also a subset of V(G). Moreover, we have t ≤ n^θ ≤ (N + m)^{1/2}. Also, we have t · D^{ℓ+1} ≤ n^{1/2} · n^{1/6} = n^{2/3} ≤ τ · (N + m). Thus, any subgraph of G on t · D^{ℓ+1} vertices is ℓ-path decomposable.
Thus, when μ ≥ μ_0, by Theorem A.8 there exists a mapping ψ_S from S to the unit sphere such that for all u, v ∈ S, we have ‖ψ_S(u) − ψ_S(v)‖² = ρ^G_μ(u, v) = ρ^H_μ(u, v), where the last equality uses the fact that the two metrics coincide for all u, v ∈ V(H).

MADHUR TULSIANI is an assistant professor at TTI-Chicago, interested in various aspects of approximability and pseudorandomness. Madhur went to college at IIT Kanpur and spent some wonderful years at (the coffee shops around) UC Berkeley while working on his Ph.D. with Luca Trevisan. Madhur enjoys biking, running, and aspires to one day learn some music (though it's perhaps better for his neighbors that he hasn't).

Figure 3: Construction of the gap instance Φ.

Lemma 5.3. Let H = (V, E) be a hypergraph, and let U ⊆ V be such that the set of hyperedges E(cl(U)) forms a tree. For each e ∈ E(cl(U)), let D_e be a distribution on [q]^e such that for any u ∈ U and e_1, e_2 ∈ E(cl(U)) with e_1 ∩ e_2 = {u}, we have D_{e_1}|u = D_{e_2}|u = D_u. Then:
- there exists a distribution D_U on [q]^U such that D_{U|e∩U} = D_{e|e∩U} for all e ∈ E(U);
- if U' ⊆ U is such that the hyperedges in E(cl(U')) form a subtree of E(cl(U)), then D_{U|U'} = D_{U'}.
http://ttic.uchicago.edu/~madhurt

ABOUT THE AUTHORS

MRINALKANTI GHOSH (called "Mrinal" by friends and colleagues) is a fifth-year graduate student at TTI-C, working with Madhur Tulsiani. Before joining the Ph.D. program, he completed his Masters at IIT Kanpur, where he worked on topics in the intersection of ergodic theory and computability theory. Later, Mrinal switched to the more practical field of computational complexity theory. Currently, when taking a break from his busy TV-watching schedule, he tries to think about approximation algorithms.
since each vertex on an included path is within distance at most ∆_H/2 of an end-point, and U_i has diameter at most ∆_H, we know that the diameter of C(U_i) is at most 2 · ∆_H; hence C(U_i) is a tree. Finally, we must have cl(C(U_i)) = C(U_i): any additional path of length 1 would create a cycle of length at most 2 · ∆_H + 1.
Thus, by Lemma 5.2 and Lemma 5.3 (with probability at least 1 − ε/4), there exists a distribution D_{C(U_i)} for each U_i, satisfying D_{C(U_i)|e} = D_e.

Let the vertices of H correspond to the set [n_0] × [n]. Suppose we contract each set [n_0] × {j} of vertices into a single vertex j ∈ [n], to get a random multi-hypergraph H' on vertex set [n]. An equivalent way to view the sampling of H' is: for each i ∈ [m], the i-th hyperedge e_i of H' is sampled by independently sampling its (at most k) vertices, with replacement, uniformly at random from [n]. Note that the sampling of H' is independent of Γ in the definition of H_k(m, n, n_0, Γ). Clearly, a cycle of length at most ℓ in H produces a cycle of length at most ℓ in H'. Hence, it suffices to bound the expected number of cycles in H'.

For a randomly generated hypergraph H ∼ H_k(m, n, n_0, Γ), we can think of the incidence graph as a randomly generated bipartite graph on the vertex sets [m] and V(H). We denote the second vertex set by [m] instead of E(H), since the set E(H) is only defined after sampling H. For each i ∈ [m], we randomly sample a type from Γ, and sample at most k neighbors according to the type. Conditioned on a fixed type, we have for any v ∈ V(H) that P[(v, i) ∈ E] ≤ 1/n. Let T be a set of vertices of G_H and let v, i ∈ T be fixed. Since vertices in each bucket are chosen independently, and choices for different indices i ∈ [m] are made independently, we have that the probability that