Small Extended Formulation for Knapsack Cover Inequalities from Monotone Circuits

Initially developed for the min-knapsack problem, the knapsack cover inequalities are used in the current best relaxations for numerous combinatorial optimization problems of covering type. In spite of their widespread use, these inequalities yield linear programming (LP) relaxations of exponential size, over which it is not known how to optimize exactly in polynomial time. In this paper we address this issue and obtain LP relaxations of quasi-polynomial size that are at least as strong as that given by the knapsack cover inequalities. For the min-knapsack cover problem, our main result can be stated formally as follows: for any $\varepsilon>0$, there is a $(1/\varepsilon)^{O(1)}n^{O(\log n)}$-size LP relaxation with an integrality gap of at most $2+\varepsilon$, where $n$ is the number of items. Prior to this work, there was no known relaxation of subexponential size with a constant upper bound on the integrality gap. Our construction is inspired by a connection between extended formulations and monotone circuit complexity via Karchmer-Wigderson games. In particular, our LP is based on $O(\log^2 n)$-depth monotone circuits with fan-in~$2$ for evaluating weighted threshold functions with $n$ inputs, as constructed by Beimel and Weinreb. We believe that a further understanding of this connection may lead to more positive results complementing the numerous lower bounds recently proved for extended formulations.


Introduction
Capacitated covering problems play a central role in combinatorial optimization. These are the problems modeled by Integer Programs (IPs) of the form $\min\{\sum_{i=1}^n c_i x_i \mid Ax \geq b,\ x \in \{0,1\}^n\}$, where $A$ is a nonnegative $m \times n$ matrix, $b$ is a nonnegative vector of size $m$, and $c$ is a nonnegative vector of size $n$. The min-knapsack problem is the special case arising when there is a single covering constraint, that is, when $m = 1$. This is arguably the simplest interesting capacitated covering problem.
In terms of complexity, the min-knapsack problem is well-understood: on the one hand it is weakly NP-hard [26] and on the other hand it admits an FPTAS [27,32]. However, for its own sake and since it appears as a key substructure of numerous other IPs, improving our polyhedral understanding of the problem is important. By this, we mean finding "good" linear programming (LP) relaxations for the min-knapsack problem. Indeed, the polyhedral study of this problem has led to the development of important tools, such as the knapsack cover inequalities, for the strengthening of LP relaxations. These inequalities and generalizations thereof are now used in the current best known relaxations for several combinatorial optimization problems, such as single-machine scheduling [5] and capacitated facility location [1]. However, despite this important progress in the past, many fundamental questions remain open even in the most basic setting.
State of the Art. The feasible region of a min-knapsack instance is specified by positive item sizes $s_1, \dots, s_n$ and a positive demand $D$. In this context, a vector $x \in \{0,1\}^n$ is feasible if $\sum_{i=1}^n s_i x_i \geq D$. To specify completely an instance of the min-knapsack problem, we are further given nonnegative item costs $c_1, \dots, c_n$. Solving the resulting instance then amounts to solving the IP $\min\{\sum_{i=1}^n c_i x_i \mid \sum_{i=1}^n s_i x_i \geq D,\ x \in \{0,1\}^n\}$. The basic LP relaxation, $\min\{\sum_{i=1}^n c_i x_i \mid \sum_{i=1}^n s_i x_i \geq D,\ x \in [0,1]^n\}$, provides an estimate on the optimum value that can be quite bad. More precisely, defining the integrality gap as the supremum over all instances of the ratio of the optimum value of the IP to the optimum value of the LP relaxation, it is easy to see that the integrality gap is unbounded.
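To make the unbounded gap concrete, here is a small sketch (our own illustration, not from the source) on a standard two-item family: a free item of size $D-1$ and a unit-cost item of size $D$. The LP takes the free item fully and only a $1/D$ fraction of the other item, paying $1/D$, whereas every integral solution pays 1. The helper names `lp_opt`, `ip_opt` and `gap` are hypothetical; `lp_opt` uses the fractional greedy rule (items in increasing cost-per-size order), which is optimal for this single-constraint LP with $[0,1]$ bounds, assuming the instance is feasible.

```python
from fractions import Fraction
from itertools import product


def lp_opt(sizes, costs, demand):
    """Optimum of min{c.x : s.x >= D, x in [0,1]^n}, assuming sum(s) >= D.

    Fractional greedy: cover the demand with items in increasing order of
    cost per unit of size, taking the last item used only partially.
    """
    order = sorted(range(len(sizes)), key=lambda i: Fraction(costs[i], sizes[i]))
    remaining, value = Fraction(demand), Fraction(0)
    for i in order:
        if remaining <= 0:
            break
        take = min(Fraction(1), remaining / sizes[i])  # fraction of item i used
        value += take * costs[i]
        remaining -= take * sizes[i]
    return value


def ip_opt(sizes, costs, demand):
    """Optimum of the min-knapsack IP by brute force over {0,1}^n."""
    return min(
        sum(c * xi for c, xi in zip(costs, x))
        for x in product((0, 1), repeat=len(sizes))
        if sum(s * xi for s, xi in zip(sizes, x)) >= demand
    )


def gap(demand):
    """IP/LP ratio on the two-item instance with sizes (D-1, D) and costs (0, 1)."""
    sizes, costs = [demand - 1, demand], [0, 1]
    return ip_opt(sizes, costs, demand) / lp_opt(sizes, costs, demand)


for D in (2, 10, 100):
    print(D, gap(D))  # the ratio equals D
```

Since the ratio grows linearly with $D$, no constant bounds the gap of the basic LP.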
Several inequalities have been proposed for strengthening this basic LP relaxation. Already in the 1970s, Balas [2], Hammer, Johnson and Peled [23] and Wolsey [35] independently proposed to add the uncapacitated knapsack cover inequalities: for every subset $A \subseteq [n]$ of the items such that $\sum_{i\in A} s_i < D$, add the inequality $\sum_{i \notin A} x_i \geq 1$ (saying that at least one item in $[n] \setminus A$ needs to be picked in order to satisfy the demand). Unfortunately, these (exponentially many) inequalities are not sufficient for bringing down the integrality gap to a constant. A strengthening of these inequalities was therefore proposed more recently by Carr, Fleischer, Leung and Phillips [13]. They defined the following valid inequalities: for every set of items $A \subseteq [n] := \{1, \dots, n\}$ such that $\sum_{i\in A} s_i < D$, there is a corresponding (capacitated) knapsack cover inequality
$$\sum_{i \notin A} s'_i x_i \geq U, \tag{1}$$
where $U = U(A) := D - \sum_{i\in A} s_i$ is the residual demand and $s'_i = s'_i(A) := \min\{s_i, U\}$. The validity of (1) is due to the fact that every feasible solution $x \in \{0,1\}^n$ has to contain some object $i \notin A$. This object can be large, that is, have $s_i \geq U$, and in this case the inequality is clearly satisfied. Otherwise, in case every object $i \notin A$ is small, the total size of the objects $i \notin A$ picked by $x$ has to be at least the residual demand $U$.
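As a small illustration (ours, not the source's), consider the two-item instance with sizes $(D-1, D)$, demand $D$ and costs $(0,1)$, whose basic LP optimum is $x = (1, 1/D)$. This point satisfies $\sum_i s_i x_i \geq D$, but for $A = \{1\}$ (the first item, index 0 in the code) we get $U = 1$, and inequality (1) reads $\min\{D, 1\}\, x_2 \geq 1$, which the point violates. The function name `kc_lhs_rhs` is hypothetical.

```python
from fractions import Fraction


def kc_lhs_rhs(sizes, demand, A, x):
    """Both sides of the knapsack cover inequality (1) for an infeasible set A:
    returns (sum_{i not in A} min(s_i, U) x_i, U) with U = D - s(A)."""
    U = demand - sum(sizes[i] for i in A)
    if U <= 0:
        raise ValueError("A must satisfy s(A) < D")
    lhs = sum(min(sizes[i], U) * x[i] for i in range(len(sizes)) if i not in A)
    return lhs, U


D = 10
sizes = [D - 1, D]
x_lp = [Fraction(1), Fraction(1, D)]  # basic LP optimum for costs (0, 1)

# The original covering constraint holds: 9 * 1 + 10 * (1/10) = 10 >= D.
assert sum(s * xi for s, xi in zip(sizes, x_lp)) >= D

lhs, rhs = kc_lhs_rhs(sizes, D, {0}, x_lp)
print(lhs, rhs)  # 1/10 1  (inequality violated)
```

The integral point $(1,1)$ gives $(1, 1)$ for the same $A$, so the inequality cuts off the fractional point without removing feasible solutions.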
Carr et al. [13] proved that whenever $x \in \mathbb{R}^n_{\geq 0}$ satisfies all knapsack cover inequalities, $2x$ dominates a convex combination of feasible solutions, that is, there exist feasible solutions $x^{(j)} \in \{0,1\}^n$ ($j \in [q]$) and coefficients $\lambda_j \geq 0$ summing up to 1 such that $2x \geq \sum_{j=1}^q \lambda_j x^{(j)}$. Given any nonnegative item costs, one of the $x^{(j)}$ will have a cost that is at most 2 times that of $x$. This implies that the integrality gap of the corresponding LP relaxation is at most 2.
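The step from domination to the cost bound is a short averaging argument; written out for a cost vector $c \geq 0$:

```latex
\min_{j\in[q]} c^\top x^{(j)}
  \;\le\; \sum_{j=1}^{q} \lambda_j\, c^\top x^{(j)}
  \;=\; c^\top\Big(\sum_{j=1}^{q} \lambda_j x^{(j)}\Big)
  \;\le\; c^\top (2x) \;=\; 2\, c^\top x .
```

The first inequality uses $\lambda_j \geq 0$ and $\sum_j \lambda_j = 1$; the second uses $c \geq 0$ together with $2x \geq \sum_j \lambda_j x^{(j)}$. Applied to an optimal LP solution $x$, this bounds the IP optimum by twice the LP optimum.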
The LP relaxation defined by the knapsack cover inequalities is "good" in the sense that it has a constant integrality gap. However, it has exponential size, that is, exponentially many inequalities, over which it is not known how to optimize exactly in polynomial time; in particular, it is not known how to employ the Ellipsoid algorithm because the problem of separating the knapsack cover inequalities reduces to another knapsack problem (which is NP-hard in general).
In contrast, for the max-knapsack problem, Bienstock [9] proved that for all $\varepsilon > 0$ there exists a size-$n^{O(1/\varepsilon)}$ LP relaxation whose integrality gap is at most $1 + \varepsilon$. That LP is defined by an extended formulation that uses $n^{O(1/\varepsilon)}$ extra variables besides the $x$-variables. We remark that it is a notorious open problem to prove or disprove the existence of an $f(1/\varepsilon) \cdot n^{O(1)}$-size LP relaxation for max-knapsack with integrality gap at most $1 + \varepsilon$, see e.g. the survey on extended formulations by Conforti, Cornuéjols and Zambelli [18]. Coming back to the min-knapsack problem, it is not known whether there exists a polynomial-size LP relaxation with constant integrality gap.

Main Result. We come close to resolving this question and show that min-knapsack admits a quasi-polynomial-size LP relaxation with integrality gap at most $2 + \varepsilon$. The upper bound on the integrality gap originates from the fact that our LP relaxation is at least as strong as that provided by a slightly weakened form of the knapsack cover inequalities. We point out that, under some conditions, we can bound the size of our relaxation by a polynomial, see Section 3.2. A more precise statement of our main result is as follows.

Theorem 1. For every $\varepsilon > 0$, there exists a size-$(1/\varepsilon)^{O(1)} n^{O(\log n)}$ LP relaxation for min-knapsack with integrality gap at most $2 + \varepsilon$, where $n$ is the number of items.
As the result is obtained by giving quasi-polynomially many inequalities of roughly the same strength as the exponentially many knapsack cover inequalities, our techniques also lead to relaxations of quasi-polynomial size for the numerous applications of these inequalities. We mention some of these applications below when we discuss related works.
Beyond the result itself, the novelty of our approach lies in the concepts we rely on and the techniques we develop. Our starting point is a connection between monotone circuits and extended formulations that we explain below. This connection was instrumental in the recent lower bounds of Göös, Jain and Watson on the extension complexity of independent set polytopes [22], and can be traced back to a paper of Hrubeš [24]. Here we use it for the first time to prove an upper bound.
From Monotone Circuits to Extended Formulations. Each choice of item sizes and demand gives rise to a weighted threshold function $f: \{0,1\}^n \to \{0,1\}$ defined as
$$f(x) = 1 \iff \sum_{i=1}^n s_i x_i \geq D. \tag{2}$$
Since we assume that the item sizes and demand are nonnegative, $f$ is monotone in the sense that $x \leq y$ (componentwise) implies $f(x) \leq f(y)$. Clearly, $x \in \{0,1\}^n$ is feasible if and only if $x \in f^{-1}(1)$. Furthermore, for $a \in f^{-1}(0)$, we can rewrite the uncapacitated knapsack cover inequalities as $\sum_{i: a_i = 0} x_i \geq 1$. Consider the slack matrix $S$ whose rows are indexed by $a \in f^{-1}(0)$, whose columns are indexed by $b \in f^{-1}(1)$, and whose entries are $S_{a,b} := \sum_{i: a_i = 0} b_i - 1$. By Yannakakis' factorization theorem [36], the existence of a size-$r$ LP relaxation of min-knapsack that is at least as strong as that given by the uncapacitated knapsack cover inequalities is equivalent to the existence of a decomposition of the slack matrix $S$ as a sum of $r$ nonnegative rank-1 matrices.

Now suppose that there exists a depth-$t$ monotone circuit (that is, using only AND gates and OR gates) of fan-in 2 for computing $f(x)$. A result of Karchmer and Wigderson [25] then implies a partition of the entries of $S$ into at most $2^t$ rectangles $R \subseteq f^{-1}(0) \times f^{-1}(1)$ such that in each one of these rectangles $R$, there exists some index $i^* = i^*(R)$ such that $a_{i^*} = 0$ and $b_{i^*} = 1$ for all $(a,b) \in R$. Then we may write, for $(a,b) \in R$,
$$S_{a,b} = \sum_{i: a_i = 0} b_i - 1 = \sum_{i \neq i^*} (1 - a_i)\, b_i,$$
so that $S$ restricted to the entries of $R$ can be expressed as a sum of at most $n - 1$ nonnegative rank-1 matrices of the form $\big((1 - a_i)\, b_i\big)_{(a,b) \in R}$ with $i \neq i^*$. This implies a decomposition of the whole slack matrix $S$ as a sum of at most $2^t(n-1)$ nonnegative rank-1 matrices, and thus the existence of a $2^t(n-1)$-size LP relaxation of min-knapsack that captures the uncapacitated knapsack cover inequalities. Since $f$ is a weighted threshold function, we can take $t = O(\log^2 n)$, as proved by Beimel and Weinreb [8]. Therefore, we obtain an $n^{O(\log n)}$-size extended formulation for the uncapacitated knapsack cover inequalities. Unfortunately, these inequalities do not suffice to guarantee a bounded integrality gap. For the full-fledged knapsack cover inequalities (1), the simple idea described above breaks down.
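If the special index $i^* = i^*(R)$ for some rectangle $R$ corresponds to a large object, that is, $s_{i^*} \geq U$ where $U = U(a)$, then the slack of $b$ with respect to inequality (1) for $A = \{i : a_i = 1\}$ can be written as
$$\sum_{i: a_i = 0} s'_i b_i - U = \sum_{i \neq i^*} (1 - a_i)\, s'_i\, b_i,$$
where $s'_i = s'_i(a)$ depends on $a$ only. However, $i^*$ may correspond to a small object, in which case we cannot decompose the slack matrix as above.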
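Both identities above can be checked by brute force on a small instance. The sketch below (helper names and instance data are illustrative) enumerates every pair $a \in f^{-1}(0)$, $b \in f^{-1}(1)$ and every index $i^*$ with $a_{i^*} = 0$ and $b_{i^*} = 1$, which is the only property of a Karchmer-Wigderson rectangle that the decomposition uses. It confirms that such an index always exists (by monotonicity), that the uncapacitated slack equals $\sum_{i \neq i^*}(1-a_i)b_i$, and that the capacitated slack equals $\sum_{i \neq i^*}(1-a_i)s'_i b_i$ whenever $i^*$ is large. It does not construct the rectangles themselves.

```python
from itertools import product


def weight(x, sizes):
    return sum(s * xi for s, xi in zip(sizes, x))


def verify(sizes, demand):
    """Check the rank-1 decompositions for all (a, b) and every index i*
    with a_{i*} = 0 and b_{i*} = 1. Returns how many (a, b, i*) triples
    had a large i*, where the capacitated identity was also checked."""
    n = len(sizes)
    cube = list(product((0, 1), repeat=n))
    zeros = [a for a in cube if weight(a, sizes) < demand]   # f^{-1}(0)
    ones = [b for b in cube if weight(b, sizes) >= demand]   # f^{-1}(1)
    large_cases = 0
    for a in zeros:
        U = demand - weight(a, sizes)                        # residual demand U(a)
        s_prime = [min(s, U) for s in sizes]                 # s'_i(a)
        for b in ones:
            candidates = [i for i in range(n) if a[i] == 0 and b[i] == 1]
            assert candidates, "monotonicity guarantees a separating index"
            uncap = sum(b[i] for i in range(n) if a[i] == 0) - 1
            cap = sum(s_prime[i] * b[i] for i in range(n) if a[i] == 0) - U
            for i_star in candidates:
                # Uncapacitated: one nonnegative rank-1 term (1 - a_i) * b_i per i != i*.
                assert uncap == sum((1 - a[i]) * b[i] for i in range(n) if i != i_star)
                if sizes[i_star] >= U:
                    # Capacitated with a large i*: terms (1 - a_i) s'_i(a) * b_i.
                    assert cap == sum((1 - a[i]) * s_prime[i] * b[i]
                                      for i in range(n) if i != i_star)
                    large_cases += 1
    return large_cases


print(verify([3, 5, 2, 4], 7))
```

When $i^*$ is small, the capacitated identity generally fails, which is exactly the obstacle described above.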
Nevertheless, we prove that it is possible to overcome this difficulty. Two key ideas we use to achieve this are to discretize some of the quantities (which explains why we lose an $\varepsilon$ in the integrality gap) and resort to several weighted threshold functions instead of just one. If all these functions admit $O(\log n)$-depth monotone circuits of fan-in 2, then we obtain a size-$n^{O(1)}$ LP relaxation.
Related Works. Knapsack cover inequalities and their generalizations such as flow cover inequalities were used as a systematic way to strengthen LP formulations of other (seemingly unrelated) problems [13,12,29,3,4,14,5,17,19]. By strengthening we mean that one would start with a polynomial-size LP formulation with a potentially unbounded integrality gap for some problem of interest, and then show that adding (adaptations of) knapsack cover inequalities reduces this integrality gap (we illustrate in Section 4 how this strengthening works for the Single-Demand Facility Location problem, reducing the integrality gap down to 2). However, similar to the case of min-knapsack discussed above, the drawback of this approach is that the size of the resulting LP formulation becomes exponential. Our result extends to yield quasi-polynomial-size LP formulations for many such applications. To name a few:
• Carr et al. [13] applied these inequalities to the Generalized Vertex Cover problem, Multicolor Network Design problem and the Fixed Charge Flow problem, and showed how these inequalities reduce the integrality gap of the starting LP formulations.
• Bansal and Pruhs [5] studied the Generalized Scheduling Problem (GSP), which captures many interesting scheduling problems such as Weighted Flow Time, Flow Time Squared and Weighted Tardiness. In particular, they showed a connection between GSP and a certain geometric covering problem, and designed an LP-based approximation algorithm for the latter that yields an approximate solution for GSP. The LP formulation that they use for the intermediate geometric covering problem is strengthened using knapsack cover inequalities, and yields an $O(\log\log nP)$-approximation for GSP, where $n$ is the number of jobs and $P$ is the maximum job size. In the special case of identical release times, their LP formulation yields a 16-approximation algorithm. This constant-factor approximation was later improved by Cheung and Shmoys [17] and Mestre and Verschae [30] to a $(4+\varepsilon)$-approximation, where the authors added the knapsack cover inequalities directly to the LP formulation of the scheduling problem, i.e., without resorting to the intermediate geometric covering problem as in [5]. For both GSP and its special case, our method yields an LP formulation whose size is quasi-polynomial in $n$ and polynomial in both $\log P$ and $\log W$, where $W$ is the maximum increase in the cost function of a job at any point in time.
• Esfandiari et al. [19] used a knapsack-cover-strengthened LP formulation to design an $O(\log k)$-approximation algorithm for the Precedence-Constrained Single-Machine Deadline scheduling problem, where $k$ is the number of distinct deadlines.
• Carnes and Shmoys [12] designed primal-dual algorithms for the Single-Demand Facility Location problem, where the primal LP formulation is strengthened by adding (generalizations of) knapsack cover inequalities.
Extended formulations have received a considerable amount of attention recently, mostly for proving impossibility results. Pokutta and Van Vyve [33] proved a worst-case $2^{\Omega(\sqrt{n})}$ size lower bound for extended formulations of the max-knapsack polytope, which directly implies a similar result for the min-knapsack polytope. Other recent works include [21,11,15,34,28,6].
Outline. We prove our main result in Section 3, after giving preliminaries in Section 2. Instead of explicitly constructing our extended formulation, we provide a nonnegative factorization of the appropriate slack matrix. For this, we use the language of communication complexity: we give an $O(\log^2 n + \log(1/\varepsilon))$-complexity two-party communication protocol with private randomness and nonnegative outputs whose expected output is the slack of a given feasible solution with respect to a given (weakened) knapsack cover inequality.
Next, in Section 4, we extend our communication protocol to the flow cover inequalities for the Single-Demand Facility Location problem, and show how to approximate the exponentially many flow cover inequalities using a smaller LP formulation.
Finally, in Section 5, we show that although we do not know how to write down our extended formulation for min-knapsack in quasi-polynomial time, we can at least compute a $(2+\varepsilon)$-approximation of the optimum from the extended formulation in quasi-polynomial time, given any cost vector, without relying on the ellipsoid algorithm. This is done via a new cutting-plane algorithm that might be of independent interest.

Preliminaries.
In this section, we introduce some key notions related to our problem. We review extended formulations and extension complexity of pairs of polyhedra in Section 2.1. Next, in Section 2.2, we define randomized communication protocols with non-negative outputs that compute entries of matrices in expectation. Finally, in Section 2.3, we review some constructions of low-depth monotone circuits, and the Karchmer-Wigderson game that relates circuit complexity and communication complexity.

Polyhedral Pairs, Extended Formulations and Slack Matrices.
Let P ⊆ R n be a polytope and Q ⊆ R n be a polyhedron containing P . The complexity of the polyhedral pair (P, Q) can be measured by its extension complexity, which roughly measures how compactly we can represent a relaxation of P contained in Q. The formal definition is as follows.
Definition 1. Given a polyhedral pair $(P, Q)$ where $P \subseteq Q \subseteq \mathbb{R}^n$, we say that a system $E^{\leq} x + F^{\leq} y \leq g^{\leq}$, $E^{=} x + F^{=} y = g^{=}$ in $\mathbb{R}^{n+k}$ is an extended formulation of $(P, Q)$ if the polyhedron $R := \{x \in \mathbb{R}^n \mid \exists y \in \mathbb{R}^k : E^{\leq} x + F^{\leq} y \leq g^{\leq},\ E^{=} x + F^{=} y = g^{=}\}$ contains $P$ and is contained in $Q$. The size of the extended formulation is the number of inequalities in the system. The extension complexity of $(P, Q)$, denoted by $\mathrm{xc}(P, Q)$, is the minimum size of an extended formulation of $(P, Q)$.
Although the case P = Q is probably the most frequent, we will need polyhedral pairs here. In a seminal paper, Yannakakis [36] showed that one can study the extension complexity of a polytope P through the non-negative rank of a matrix associated with P , namely, its slack matrix.
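Definition 2. Let $P = \mathrm{conv}\{v_1, \dots, v_p\}$ be an inner description of $P$ and $Q = \{x \in \mathbb{R}^n \mid Ax \geq b\}$ be an outer description of $Q$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. We now define the slack matrix $S \in \mathbb{R}^{m \times p}_{\geq 0}$ of the pair $(P, Q)$ with respect to the given representations of $P$ and $Q$. The $i$th row of $S$ corresponds to the constraint $A_i x \geq b_i$, the $j$th column corresponds to the point $v_j$, and $S_{i,j} := A_i v_j - b_i$. Note that the slack matrix is not unique as it depends on the choices of points $v_1, \dots, v_p$ and linear description $Ax \geq b$.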
Definition 3. Given a non-negative matrix $M \in \mathbb{R}^{m \times n}_{\geq 0}$, we say that a pair of matrices $T, U$ is a rank-$r$ non-negative factorization of $M$ if $T \in \mathbb{R}^{m \times r}_{\geq 0}$, $U \in \mathbb{R}^{r \times n}_{\geq 0}$, and $M = TU$. We define the non-negative rank of $M$ as $\mathrm{rk}_+(M) := \min\{r : M \text{ has a rank-}r \text{ non-negative factorization}\}$. Notice that a non-negative factorization of $M$ of rank at most $r$ is equivalent to a decomposition of $M$ as a sum of at most $r$ non-negative rank-1 matrices.
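For a concrete instance of Definition 3 (an illustrative example, not taken from the source), the sketch below multiplies two small non-negative factors and checks that $TU$ coincides with the sum of the $r$ outer products of the columns of $T$ with the rows of $U$. For this $3 \times 3$ matrix it certifies $\mathrm{rk}_+(M) \leq 2$. The function names are hypothetical.

```python
def matmul(T, U):
    """Plain matrix product T U for lists of lists."""
    return [[sum(T[i][k] * U[k][j] for k in range(len(U))) for j in range(len(U[0]))]
            for i in range(len(T))]


def outer_sum(T, U):
    """Sum over k of the rank-1 matrix (column k of T) times (row k of U)."""
    m, n, r = len(T), len(U[0]), len(U)
    M = [[0] * n for _ in range(m)]
    for k in range(r):
        for i in range(m):
            for j in range(n):
                M[i][j] += T[i][k] * U[k][j]
    return M


T = [[1, 0], [2, 1], [0, 3]]  # 3 x 2, non-negative
U = [[1, 0, 2], [0, 1, 1]]    # 2 x 3, non-negative
M = matmul(T, U)
print(M)  # [[1, 0, 2], [2, 1, 5], [0, 3, 3]]
```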
Yannakakis [36] proved that for a polytope $P$ of dimension at least 1 and any of its slack matrices $S$, the extension complexity of $P$ is equal to the non-negative rank of $S$, namely, $\mathrm{xc}(P) = \mathrm{rk}_+(S)$. In particular, all the slack matrices of $P$ have the same non-negative rank.

Randomized Communication Protocols.
We now define a certain two-party communication problem and relate it to the non-negative rank discussed earlier, following the framework in Faenza, Fiorini, Grappe and Tiwary [20].
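Let $S \in \mathbb{R}^{\mathcal{A} \times \mathcal{B}}_{\geq 0}$ be a non-negative matrix whose rows and columns are indexed by $\mathcal{A}$ and $\mathcal{B}$, respectively. Let $\Pi$ be a communication protocol with private randomness between two players Alice and Bob. Alice gets an input $a \in \mathcal{A}$ and Bob gets an input $b \in \mathcal{B}$. They exchange bits in a pre-specified way according to $\Pi$, and at the end either one of the players outputs some non-negative number $\xi \in \mathbb{R}_{\geq 0}$. We say that $\Pi$ computes $S$ in expectation if for every $a$ and $b$, the expectation of the output $\xi$ equals $S_{a,b}$. The complexity of $\Pi$ is the maximum number of bits exchanged, and $R^{cc}_{\exp}(S)$ denotes the minimum complexity of a protocol that computes $S$ in expectation.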
Faenza et al. [20] relate the non-negative rank of a non-negative matrix $S$ to the communication complexity $R^{cc}_{\exp}(S)$. In particular, they prove that if $\mathrm{rk}_+(S) \neq 0$, then $R^{cc}_{\exp}(S) = \log_2 \mathrm{rk}_+(S) + \Theta(1)$. Combining this with the factorization theorem, we get $R^{cc}_{\exp}(S) = \log_2 \mathrm{xc}(P, Q) + \Theta(1)$ whenever $(P, Q)$ is a polyhedral pair with slack matrix $S$, provided that $\mathrm{xc}(P, Q) \neq 0$.
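The direction from a factorization to a protocol can be sketched as follows. This is one standard construction written under our own assumptions, not necessarily the exact one in [20]. Given $S = TU$ with $T, U \geq 0$ and inner dimension $r$, first rescale so that every entry of $T$ lies in $[0,1]$. Alice then samples $k \in [r]$ uniformly, keeps it with probability $T_{a,k}$ (otherwise she outputs 0), and sends $k$ to Bob, who outputs $r \cdot U_{k,b}$. One extra bit tells Bob whether to output anything, so the protocol uses $\lceil \log_2 r \rceil + 1$ bits, and its expected output is $\sum_k \frac{1}{r} T_{a,k}\, r\, U_{k,b} = S_{a,b}$. The function names are hypothetical.

```python
import random
from fractions import Fraction


def rescale(T, U):
    """Return (T2, U2) with T2 U2 = T U and every entry of T2 in [0, 1]:
    column k of T is divided by its maximum, row k of U multiplied by it."""
    T2 = [[Fraction(v) for v in row] for row in T]
    U2 = [[Fraction(v) for v in row] for row in U]
    for k in range(len(U)):
        mx = max(row[k] for row in T2)
        if mx == 0:
            continue  # column k is zero, so the k-th term contributes nothing
        for row in T2:
            row[k] /= mx
        U2[k] = [v * mx for v in U2[k]]
    return T2, U2


def run_once(T2, U2, a, b, rng):
    """One execution of the protocol on inputs (a, b) with private randomness."""
    r = len(U2)
    k = rng.randrange(r)            # Alice: uniform k in [r]
    if rng.random() >= T2[a][k]:    # Alice: stop with probability 1 - T2[a][k]
        return Fraction(0)          # Alice outputs 0 (one bit tells Bob to stay silent)
    return r * U2[k][b]             # Bob receives k and outputs r * U2[k][b]


def expected_output(T2, U2, a, b):
    """Exact expectation of run_once, summing over Alice's random choices."""
    r = len(U2)
    return sum(Fraction(1, r) * T2[a][k] * r * U2[k][b] for k in range(r))


T = [[1, 0], [2, 1], [0, 3]]
U = [[1, 0, 2], [0, 1, 1]]
T2, U2 = rescale(T, U)
rng = random.Random(0)
samples = [run_once(T2, U2, 1, 2, rng) for _ in range(20000)]
print(float(sum(samples) / len(samples)), expected_output(T2, U2, 1, 2))
```

The empirical mean fluctuates around the exact expectation $S_{1,2} = 5$; the rescaling is what lets Alice turn an entry of $T$ into a probability.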

Weighted Threshold Functions and Karchmer-Wigderson Game.
An important part of our protocol depends on the communication complexity of (monotone) weighted threshold functions. We start with the following result from [7,8], which gives low-depth circuits for such functions. Another construction was given in [16]. The circuits as stated in [7,8,16] have logarithmic depth, polynomial size and unbounded fan-in; replacing each gate by a balanced binary tree of fan-in-2 gates of the same type converts them into circuits with fan-in 2 whose depth is larger by a factor of $O(\log n)$. In particular, every weighted threshold function with $n$ inputs admits a monotone circuit of fan-in 2 and depth $O(\log^2 n)$, which is the form in which the result will be used later. Recall that a circuit is monotone if it uses only AND and OR gates, but no NOT gates.
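To illustrate how a monotone circuit yields the index $i^*$ of a rectangle, here is a sketch of the Karchmer-Wigderson game on a hand-built depth-3 circuit for the threshold function with sizes $(2,2,3)$ and demand $4$ (which happens to be the majority of three bits). The nested-tuple circuit encoding and the function names are our own choices, and the circuit is not the Beimel-Weinreb construction. Starting at the output gate, the players keep the invariant that the current gate is 0 on $a$ and 1 on $b$: at an AND gate Alice names a child that is 0 on $a$, at an OR gate Bob names a child that is 1 on $b$, so after at most depth-many bits they reach an input $x_i$ with $a_i = 0$ and $b_i = 1$. Grouping input pairs by transcript gives at most $2^{\text{depth}}$ rectangles.

```python
from itertools import product

# A monotone fan-in-2 circuit as nested tuples:
# ("var", i), ("and", left, right) or ("or", left, right).


def evaluate(gate, x):
    if gate[0] == "var":
        return x[gate[1]]
    left, right = evaluate(gate[1], x), evaluate(gate[2], x)
    return left & right if gate[0] == "and" else left | right


def kw_game(circuit, a, b):
    """Return (i, transcript) with a_i = 0 and b_i = 1, given f(a) = 0 and f(b) = 1."""
    assert evaluate(circuit, a) == 0 and evaluate(circuit, b) == 1
    gate, transcript = circuit, []
    while gate[0] != "var":
        kind, left, right = gate
        if kind == "and":  # Alice: some child is 0 on a (both children are 1 on b)
            bit = 0 if evaluate(left, a) == 0 else 1
        else:              # Bob: some child is 1 on b (both children are 0 on a)
            bit = 0 if evaluate(left, b) == 1 else 1
        transcript.append(bit)
        gate = left if bit == 0 else right
    return gate[1], tuple(transcript)


x0, x1, x2 = ("var", 0), ("var", 1), ("var", 2)
circuit = ("or", ("or", ("and", x0, x1), ("and", x0, x2)), ("and", x1, x2))
sizes, demand = (2, 2, 3), 4


def f(x):
    return int(sum(s * xi for s, xi in zip(sizes, x)) >= demand)


inputs = list(product((0, 1), repeat=3))
rectangles = {}  # transcript -> set of (a, b) pairs that produce it
for a in inputs:
    for b in inputs:
        if f(a) == 0 and f(b) == 1:
            i, transcript = kw_game(circuit, a, b)
            assert a[i] == 0 and b[i] == 1 and len(transcript) <= 3
            rectangles.setdefault(transcript, set()).add((a, b))
print(len(rectangles))
```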

Small LP relaxation for Min-Knapsack.
In this section, we show the existence of a $(1/\varepsilon)^{O(1)} n^{O(\log n)}$-size LP relaxation of min-knapsack with integrality gap at most $2 + \varepsilon$, proving Theorem 1. First, we give a high-level overview of the construction in Section 3.1. The actual protocol is described and analyzed in Section 3.2.

Overview.
Consider the slack matrix $S$ that has one row for each knapsack cover inequality and one column for each feasible solution of min-knapsack. More precisely, let $f: \{0,1\}^n \to \{0,1\}$ denote the weighted threshold function defined by the item sizes $s_i$ ($i \in [n]$) and demand $D$ as in (2). The rows and columns of $S$ are indexed by $a \in f^{-1}(0)$ and $b \in f^{-1}(1)$, respectively. The entries of $S$ are given by
$$S_{a,b} := \sum_{i: a_i = 0} s'_i b_i - U,$$
where $U = U(a) := D - \sum_{i=1}^n s_i a_i$ and $s'_i = s'_i(a) := \min\{s_i, U\}$. Geometrically, $S$ is the slack matrix of the polyhedral pair $(P, Q)$ in which $P$ is the min-knapsack polytope and $Q$ is the (unbounded) polyhedron defined by the knapsack cover inequalities.
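For a small instance, the matrix $S$ can be written out explicitly. The sketch below (illustrative names; sizes $(3,5,2,4)$ and demand $7$ chosen arbitrarily) builds it by enumerating $f^{-1}(0)$ and $f^{-1}(1)$ and checks that all entries are non-negative, which is exactly the validity of the knapsack cover inequalities on this instance. Its dimensions grow exponentially with $n$, which is why a compact non-negative factorization is needed.

```python
from itertools import product


def kc_slack(a, b, sizes, demand):
    """S_{a,b} = sum_{i: a_i = 0} min(s_i, U) b_i - U, where U = D - sum_i s_i a_i."""
    U = demand - sum(s * ai for s, ai in zip(sizes, a))
    return sum(min(s, U) * bi for s, ai, bi in zip(sizes, a, b) if ai == 0) - U


def slack_matrix(sizes, demand):
    """Rows indexed by a in f^{-1}(0), columns by b in f^{-1}(1)."""
    cube = list(product((0, 1), repeat=len(sizes)))

    def weight(x):
        return sum(s * xi for s, xi in zip(sizes, x))

    rows = [a for a in cube if weight(a) < demand]
    cols = [b for b in cube if weight(b) >= demand]
    return [[kc_slack(a, b, sizes, demand) for b in cols] for a in rows]


S = slack_matrix([3, 5, 2, 4], 7)
print(len(S), len(S[0]), min(min(row) for row in S))  # 7 9 0
```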
Ideally, we would like to design a communication protocol for S, as those discussed in Section 2.2, with low communication complexity. This would imply a low-rank non-negative factorization of S. From the factorization theorem of Section 2.1, it would follow that there exists a small-size extended formulation yielding a polyhedron R containing the min-knapsack polytope P and contained in the knapsack-cover relaxation Q. Hence, we would get a small-size LP relaxation for min-knapsack that implies the exponentially many knapsack cover inequalities, and thus have integrality gap at most 2.
However, since the quantities involved can be exponential in $n$, making them too expensive to communicate directly, we have to settle for showing the existence of a small-size extended formulation that approximately implies the knapsack cover inequalities. Before discussing these complications further, we give an idealized version of the protocol to help with the intuition. Assume for now that all item sizes and the demand are polynomial in $n$. Thus Alice and Bob can communicate them with $O(\log n)$ bits.
The goal of the two players is to compute the slack when Alice is given an infeasible a ∈ {0, 1} n and Bob is given a feasible b ∈ {0, 1} n . That is, after several rounds of communication, either one of them outputs some non-negative value ξ, such that the expectation of ξ equals S a,b .
We define for a set of items $J \subseteq [n]$ the quantities $s(J) := \sum_{j \in J} s_j$ and $s'(J) := \sum_{j \in J} s'_j$. Let $A$ and $B$ be the subsets of $[n]$ corresponding to Alice's input $a$ and Bob's input $b$, respectively. The slack we want to compute thus becomes $s'(B \setminus A) - U$.
At the beginning, Alice computes the residual demand $U$ and sends it to Bob. Now observe that if there is some $i^* \in B \setminus A$ such that $s_{i^*} \geq U$, then we have $s'_{i^*} = U$, and we can easily write the slack as $s'(B \setminus A) - U = s'(B \setminus (A \cup \{i^*\})) = \sum_{i \neq i^*} (1 - a_i)\, s'_i\, b_i$ (similarly to the uncapacitated case discussed in the introduction). Recall that we call an item $i$ large if $s_i \geq U$ and small otherwise. Let $I_{\text{large}}$ be the set of large items and $I_{\text{small}}$ be the set of small items. The rest of the protocol is divided into two cases as follows, depending on whether Alice and Bob can easily find a large item $i^* \in B \setminus A$. To this end, Alice sends $s(I_{\text{large}} \cap A)$ to Bob. Note that now Bob can compute $s(I_{\text{small}} \cap A) = D - U - s(I_{\text{large}} \cap A)$. Bob also computes the contribution of large items in $B$, that is, $s(I_{\text{large}} \cap B)$.
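If $s(I_{\text{large}} \cap B) > s(I_{\text{large}} \cap A)$, then we are guaranteed that there is some $i^* \in I_{\text{large}} \cap (B \setminus A)$. Moreover, defining the threshold function
$$g(x) = 1 \iff \sum_{i \in I_{\text{large}}} s_i x_i > s(I_{\text{large}} \cap A), \tag{4}$$
we have $g(a) = 0$ and $g(b) = 1$, so Alice and Bob can find such an index $i^*$ by playing the Karchmer-Wigderson game on a monotone circuit for $g$, and then proceed as above. Otherwise, $s(I_{\text{large}} \cap B) \leq s(I_{\text{large}} \cap A)$, and we write the slack as
$$s'(B \setminus A) - U = \sum_{i \in I_{\text{large}} \setminus A} s'_i b_i + \big(s(I_{\text{small}} \cap B) - s(I_{\text{small}} \cap A) - U\big) + \sum_{i \in I_{\text{small}} \cap A} s_i (1 - b_i),$$
where the middle term is non-negative because $s(I_{\text{small}} \cap B) \geq D - s(I_{\text{large}} \cap B) \geq D - s(I_{\text{large}} \cap A) = U + s(I_{\text{small}} \cap A)$. Alice and Bob can compute the first and the last term in expectation using a protocol similar to that in the previous case. The term in the middle can be computed by Bob with all the information he has at this stage. To conclude, in both cases, Alice and Bob can compute the exact slack $S_{a,b}$ with $O(\log^2 n)$ bits of communication.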
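The idealized case analysis rests on two identities for the exact slack $s'(B \setminus A) - U$. When some large $i^* \in B \setminus A$ exists, its term cancels $-U$. Otherwise the slack splits as $s'(I_{\text{large}} \cap (B \setminus A)) + \big(s(I_{\text{small}} \cap B) - s(I_{\text{small}} \cap A) - U\big) + s(I_{\text{small}} \cap (A \setminus B))$, with a non-negative middle term. Below is a brute-force check on small instances; instance data and names are illustrative, and in the first case the sketch simply picks the smallest valid $i^*$ instead of running the Karchmer-Wigderson game.

```python
from itertools import product


def check_cases(sizes, demand):
    """Verify the slack identities of both cases for all infeasible A and feasible B."""
    n = len(sizes)
    subsets = [frozenset(i for i in range(n) if bits[i])
               for bits in product((0, 1), repeat=n)]

    def s(X):
        return sum(sizes[i] for i in X)

    for A in subsets:
        if s(A) >= demand:
            continue
        U = demand - s(A)                                  # residual demand
        s_prime = [min(si, U) for si in sizes]             # s'_i
        large = frozenset(i for i in range(n) if sizes[i] >= U)
        small = frozenset(range(n)) - large
        for B in subsets:
            if s(B) < demand:
                continue
            slack = sum(s_prime[i] for i in B - A) - U
            if s(large & B) > s(large & A):
                # Case 1: a large i* in B \ A exists and its term cancels -U.
                i_star = min((large & B) - A)
                assert slack == sum(s_prime[i] for i in (B - A) - {i_star})
            else:
                # Case 2: first + middle + last, each non-negative.
                first = sum(s_prime[i] for i in (large & B) - A)
                middle = s(small & B) - s(small & A) - U
                last = s((small & A) - B)
                assert middle >= 0
                assert slack == first + middle + last
    return True


print(check_cases([3, 5, 2, 4], 7))
```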

The Protocol.
The actual slack matrix $S^\varepsilon$ we work with is defined as
$$S^\varepsilon_{a,b} := s'(B \setminus A) - \alpha U, \qquad \alpha := \frac{2}{2+\varepsilon},$$
where $\varepsilon > 0$ is any small constant, $a \in f^{-1}(0)$ and $b \in f^{-1}(1)$. $S^\varepsilon$ is the slack matrix of the polyhedral pair $(P, Q^\varepsilon)$, where $P$ is the min-knapsack polytope and $Q^\varepsilon$ is the polyhedron defined by a slight weakening of the knapsack cover inequalities obtained by replacing the right-hand side of (1) by $\frac{2}{2+\varepsilon} U < U$. For every $x \in \mathbb{R}^n_{\geq 0}$ that satisfies all weakened knapsack cover inequalities, we have that $\frac{2+\varepsilon}{2} x$ satisfies all original knapsack cover inequalities, and thus $(2+\varepsilon)x$ dominates a convex combination of feasible solutions. Therefore the integrality gap of the resulting LP relaxation (obtained from a non-negative factorization of $S^\varepsilon$) is at most $2 + \varepsilon$.
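Spelled out, the scaling step reads as follows for every infeasible set $A$, writing $\alpha := 2/(2+\varepsilon)$:

```latex
\sum_{i\notin A} s'_i x_i \ge \alpha U
\;\Longrightarrow\;
\sum_{i\notin A} s'_i \hat x_i \ge U,
\qquad \hat x := \tfrac{2+\varepsilon}{2}\, x = x/\alpha \ \ge 0 .
```

So $\hat x$ satisfies all knapsack cover inequalities, hence $2\hat x = (2+\varepsilon)x \geq \sum_j \lambda_j x^{(j)}$ for some convex combination of feasible solutions, and for any cost vector $c \geq 0$ some $x^{(j)}$ costs at most $(2+\varepsilon)\, c^\top x$.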
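In order to refer to the "derived" weighted threshold functions $g$ as in (4), we need a last bit of terminology. We say that $g: \{0,1\}^n \to \{0,1\}$ is a truncation of $f$ if there exist a subset $I \subseteq [n]$ and a threshold $T$ such that $g(x) = 1$ if and only if $\sum_{i \in I} s_i x_i \geq T$. We are now ready to state our main technical lemma.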
Lemma 4. For all constants $\varepsilon \in (0,1)$, item sizes $s_i \in \mathbb{Z}_{>0}$ ($i \in [n]$), all smaller than $2^{\lceil n \log n \rceil}$, and demand $D \in \mathbb{Z}_{>0}$ with $\max\{s_i \mid i \in [n]\} \leq D \leq \sum_{i=1}^n s_i$, such that all truncations of the corresponding weighted threshold function admit depth-$t$ monotone circuits of fan-in 2, there is an $O(\log(1/\varepsilon) + \log n + t)$-complexity randomized communication protocol with non-negative outputs that computes the slack matrix $S^\varepsilon$ in expectation. Since we may always take $t = O(\log^2 n)$, this gives an $O(\log(1/\varepsilon) + \log^2 n)$-complexity protocol, unconditionally.
Before giving the proof, let us remark that Theorem 1 follows directly from this lemma. Indeed, the extra assumptions in the lemma are without loss of generality: the fact that we may assume that the item sizes $s_i$ are positive integers that can be written down with at most $\lceil n \log n \rceil$ bits is due to a classic result from [31]; and we may also assume that the demand $D$ is a positive integer with $\max\{s_i \mid i \in [n]\} \leq D \leq \sum_{i=1}^n s_i$, since replacing each $s_i > D$ by $D$ does not change the set of feasible solutions, and the instance has no feasible solution when $D > \sum_{i=1}^n s_i$.

Alice sends Bob the unique nonnegative integer $k$ such that $(1+\delta)^k \leq U < (1+\delta)^{k+1}$, and we let $\tilde U := (1+\delta)^k$. This sets the scale at which the protocol is operating. Since $U \leq n \cdot 2^{\lceil n \log n \rceil} \leq 2^{n^2}$, we have $(1+\delta)^k \leq 2^{n^2}$. This implies that $k = O((1/\varepsilon) n^2)$, thus $k$ can be sent to Bob with $O(\log(1/\varepsilon) + \log n)$ bits.

To efficiently communicate an approximate value of $s(I_{\text{large}} \cap A)$, Alice sends the unique nonnegative integer $\ell$ such that $(1+\ell\delta)\tilde U < D - s(I_{\text{large}} \cap A) \leq (1+(\ell+1)\delta)\tilde U$. Since small items have size at most $U$ and we have at most $n$ of them, we have $s(I_{\text{small}} \cap A) \leq Un$. Hence, $D - s(I_{\text{large}} \cap A) = U + s(I_{\text{small}} \cap A) \leq (n+1)U \leq (n+1)(1+\delta)\tilde U$. Since $(1+\ell\delta)\tilde U < (n+1)(1+\delta)\tilde U$, we have $\ell = O((1/\varepsilon) n)$. This means that Alice can communicate $\ell$ to Bob with only $O(\log(1/\varepsilon) + \log n)$ bits. Let $\tilde\Delta = \tilde\Delta(\delta) := (1+\ell\delta)\tilde U$. This is Bob's strict under-approximation of $D - s(I_{\text{large}} \cap A)$, so that $D - \tilde\Delta$ is a strict over-approximation of $s(I_{\text{large}} \cap A)$. Bob checks if $s(I_{\text{large}} \cap B) \geq D - \tilde\Delta$. If this is the case, then the weighted threshold function $g$ such that $g(x) = 1$ iff $\sum_{i \in I_{\text{large}}} s_i x_i \geq D - \tilde\Delta$ separates $a$ from $b$ in the sense that $g(a) = 0$ and $g(b) = 1$. Since $g$ is a truncation of $f$, Alice and Bob can exchange $t$ bits to find an index $i^* \in I_{\text{large}}$ such that $a_{i^*} = 0$ and $b_{i^*} = 1$.

We can rewrite the slack $S^\varepsilon_{a,b} = s'(B \setminus A) - \alpha U$ as
$$S^\varepsilon_{a,b} = \sum_{i \in [n] \setminus (A \cup \{i^*\})} s'_i b_i + (1 - \alpha) U,$$
since $s'_{i^*} = U$ and $b_{i^*} = 1$. With the knowledge of $i^*$, Alice and Bob can compute the slack as follows:
1. Alice samples a uniformly random number $i \in [n]$. If $i \notin A$, continue to the next step, otherwise Alice outputs 0 and terminates the communication.

where $\sigma$ is the unique integer multiple of $\delta\tilde U$ such that $\sigma \leq s(I_{\text{small}} \cap A) < \sigma + \delta\tilde U$. Since $\sigma \leq s(I_{\text{small}} \cap A) \leq Un \leq (1+\delta)\tilde U n$, Alice can communicate $\sigma$ to Bob with $O(\log(1/\varepsilon) + \log n)$ bits.

Recall that by definition of $\tilde U$, we have $(1+\delta)\tilde U > U$. We now rewrite the slack, using (8) and (9).

Similarly to the previous case, we design a protocol to compute the slack as follows:
1. Alice samples a uniformly random number $i \in [n+2]$. If $i = n+2$, Alice outputs the normalized value of the last term, i.e., $(n+2) \cdot \big(\sigma - s(I_{\text{small}} \cap A) + (1-\delta)\tilde U - \alpha U\big)$, and terminates the communication. Otherwise, she sends $i$ to Bob using $O(\log n)$ bits.
Otherwise, he replies to Alice with $b_i$.
3. If $i \in I_{\text{large}} \setminus A$, Alice outputs $(n+2) \cdot s'_i b_i$; if $i \in I_{\text{small}} \cap A$, she outputs $(n+2) \cdot s_i (1 - b_i)$; otherwise she outputs 0.

We can verify that the outputs of both players can be computed with information available to them, and that the outputs are non-negative due to Equations (7), (8) and (9), and the definition of the variables.
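Two ingredients of the protocol can be checked numerically. The sketch below is illustrative: $\delta = 1/10$ is a hypothetical value, and `case1_expected_output` completes the first case with assumed steps that the text does not spell out. After step 1, if $i = i^*$ Alice outputs $n(1-\alpha)U$; otherwise she sends $i$, Bob replies with $b_i$, and Alice outputs $n\, s'_i b_i$. With these steps the expected output is $\sum_{i \notin A \cup \{i^*\}} s'_i b_i + (1-\alpha)U = s'(B \setminus A) - \alpha U$ whenever $i^* \in B \setminus A$ is large. `scale_index` computes the integer $k$ with $(1+\delta)^k \leq U < (1+\delta)^{k+1}$ in exact arithmetic. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def scale_index(U, delta):
    """Unique k >= 0 with (1 + delta)^k <= U < (1 + delta)^(k + 1), for integer U >= 1.
    Also returns (1 + delta)^k, the rounded-down scale."""
    base, power, k = 1 + delta, Fraction(1), 0
    while power * base <= U:
        power *= base
        k += 1
    return k, power


def case1_expected_output(sizes, demand, A, B, i_star, alpha):
    """Exact expectation of the (partly assumed) first-case protocol."""
    n = len(sizes)
    U = demand - sum(sizes[i] for i in A)
    total = Fraction(0)
    for i in range(n):                    # Alice samples i uniformly from [n]
        if i in A:
            out = 0                       # step 1: Alice outputs 0
        elif i == i_star:
            out = n * (1 - alpha) * U     # assumed step: Alice outputs the constant term
        else:
            b_i = 1 if i in B else 0      # assumed step: Bob replies with the bit b_i
            out = n * min(sizes[i], U) * b_i
        total += Fraction(1, n) * out
    return total


def check_case1(sizes, demand, eps):
    """Compare the expectation with s'(B \\ A) - alpha U for every large i* in B \\ A."""
    alpha = Fraction(2) / (2 + eps)
    n = len(sizes)
    subsets = [frozenset(i for i in range(n) if bits[i])
               for bits in product((0, 1), repeat=n)]
    checked = 0
    for A in subsets:
        U = demand - sum(sizes[i] for i in A)
        if U <= 0:
            continue
        for B in subsets:
            if sum(sizes[i] for i in B) < demand:
                continue
            for i_star in B - A:
                if sizes[i_star] >= U:    # i* must be a large item of B \ A
                    slack = sum(min(sizes[i], U) for i in B - A) - alpha * U
                    assert case1_expected_output(sizes, demand, A, B, i_star, alpha) == slack
                    checked += 1
    return checked


k, U_tilde = scale_index(1000, Fraction(1, 10))
print(k, U_tilde <= 1000 < U_tilde * Fraction(11, 10))  # 72 True
print(check_case1([3, 5, 2, 4], 7, Fraction(1, 2)))
```

Since $\alpha \leq 1$, every output of this completion is non-negative, as a protocol computing a slack matrix in expectation requires.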

Flow-cover inequalities.
A variant of the knapsack cover inequalities, known as the flow cover inequalities, was also used to strengthen LPs for many problems such as the Fixed Charge Network Flow problem [13] and the Single-Demand Facility Location problem [12]. In this section, we describe the application of flow cover inequalities to the Single-Demand Facility Location problem as used in [12], and then give an $O(\log^2 n)$-bit two-party communication protocol that computes a weakened version of these inequalities.
In the Single-Demand Facility Location problem, we are given a set $F$ of $n$ facilities, such that each facility $i \in F$ has a capacity $s_i$, an opening cost $f_i$, and a per-unit cost $c_i$ to serve the demand. The goal is to serve the demand $D$ by opening a subset $S \subseteq F$ of facilities such that the combined cost of opening these facilities and serving the demand is minimized. The authors of [12] cast this problem as an Integer Program, and showed that its natural LP relaxation has an unbounded integrality gap. To reduce this gap, they strengthened the relaxation by adding the so-called flow cover inequalities that we define shortly (see Section 3 in [12] for a more elaborate discussion).
A feasible solution $(x, y)$ with $y \in \{0,1\}^n$ and $x \in [0,1]^n$ for the Single-Demand Facility Location LP can be thought of as follows: for each $i \in F$, $y_i \in \{0,1\}$ indicates if the $i$-th facility is open, and $x_i \in [0,1]$ indicates the fraction of the demand $D$ being served by the $i$-th facility. A feasible solution $(x, y)$ must then satisfy that
1. The demand is met, i.e., $\sum_i x_i = 1$.
2. No facility supplies more than its capacity, i.e., $0 \leq x_i D \leq y_i s_i$ for all $i \in F$.

For a subset $J \subseteq F$ of facilities and a feasible solution $(x, y)$, we denote by $B = \{i \in F : y_i = 1\} \subseteq F$ the set of open facilities according to $y$, and we define the quantity $x(J)$ to be the overall demand served by the facilities in $J$, i.e., $x(J) := \sum_{i \in J} x_i D$. We also define the quantities $s(\cdot)$ and $s'(\cdot)$ as in Section 3.1.
Carnes and Shmoys [12] showed that adding the flow cover inequalities (FCI) reduces the integrality gap of the natural LP relaxation down to 2. These inequalities are defined as follows: for any infeasible set $A \subseteq F$ (i.e., $A \subseteq F$ such that $s(A) < D$), and for all partitions $F \setminus A = F_1 \sqcup F_2$, the following inequality holds for all feasible solutions $(x, y)$:
$$\sum_{i \in F_1} s'_i y_i + \sum_{i \in F_2} x_i D \geq U, \tag{FCI}$$
where $U = D - s(A)$ is the residual demand and $s'_i = \min\{s_i, U\}$. For brevity, we refer to an infeasible set $A$ along with some partition $F_1 \sqcup F_2 = F \setminus A$ as an infeasible tuple $(A, F_1, F_2)$. Note that for $F_2 = \emptyset$, the flow cover inequalities are the same as the knapsack cover inequalities.
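For a fixed open set $B$, the left-hand side of (FCI) is smallest when as much demand as possible is routed through $A \cap B$ and $F_1 \cap B$, so the least possible value of $\sum_{i \in F_2} x_i D$ is $\max\{0, D - s(A \cap B) - s(F_1 \cap B)\}$. The sketch below uses this observation (ours, for illustration) to check the inequality by brute force over all infeasible $A$, all partitions $F \setminus A = F_1 \sqcup F_2$ and all open sets $B$ with $s(B) \geq D$ on small instances. Function names and data are illustrative.

```python
def min_fci_lhs(sizes, demand, A, F1, B):
    """Minimum over feasible flows x (open set B, x_i D <= s_i on B) of
    sum_{i in F1} s'_i y_i + sum_{i in F2} x_i D, where F2 is the rest of F
    outside A and F1. Returns (minimum left-hand side, U)."""
    U = demand - sum(sizes[i] for i in A)
    routed_elsewhere = sum(sizes[i] for i in B if i in A or i in F1)
    forced_through_F2 = max(0, demand - routed_elsewhere)
    return sum(min(sizes[i], U) for i in F1 if i in B) + forced_through_F2, U


def check_fci(sizes, demand):
    n = len(sizes)
    subsets = [frozenset(i for i in range(n) if (m >> i) & 1) for m in range(1 << n)]

    def s(X):
        return sum(sizes[i] for i in X)

    checked = 0
    for A in subsets:
        if s(A) >= demand:
            continue                                  # A must be infeasible
        rest = [i for i in range(n) if i not in A]
        for mask in range(1 << len(rest)):            # all partitions F \ A = F1 + F2
            F1 = frozenset(rest[j] for j in range(len(rest)) if (mask >> j) & 1)
            for B in subsets:
                if s(B) < demand:
                    continue                          # B must be able to serve D
                lhs, U = min_fci_lhs(sizes, demand, A, F1, B)
                assert lhs >= U
                checked += 1
    return checked


print(check_fci([3, 5, 2, 4], 7))
```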
Similarly to the knapsack cover inequalities, the goal is to compute the slack of a relaxed version of (FCI) in expectation for any feasible solution $(x, y)$ and any infeasible tuple $(A, F_1, F_2)$. Namely, for any $\varepsilon \in (0,1)$, let $\alpha = 2/(2+\varepsilon)$; our goal is then to design an $O(\log^2 n + \log(1/\varepsilon))$-complexity two-party communication protocol with private randomness and nonnegative outputs whose expected output equals $s'(F_1 \cap B) + x(F_2 \cap B) - \alpha U$. That is, we want to compute the slack with respect to a given (weakened) flow cover inequality $s'(F_1 \cap B) + x(F_2 \cap B) \geq \alpha U$, where the RHS of (FCI) is replaced by $\alpha U$. This implies the existence of an LP of size $(1/\varepsilon)^{O(1)} n^{O(\log n)}$ with integrality gap at most $2 + \varepsilon$ for the Single-Demand Facility Location problem.
In Section 4.1, we set up the notation and define a class of feasible solutions with a certain special structure which we refer to as canonical feasible solutions. We design the promised communication protocol restricted to canonical solutions in Section 4.2, and extend it to arbitrary feasible solutions in Section 4.3.

Preliminaries.
Let (x, y) be a feasible solution for the flow-cover problem with demand D, and let B = {i ∈ F : y i = 1} denote the support of y. In this terminology, B only indicates which facilities are open, but it does not capture the relative demand being served through each of them. However this distinction will be essential for designing the protocol, hence we partition B into three disjoint sets B = F 1 ⊔ F 2 ⊔ F 3 as follows: We first focus on feasible solutions (x, y) that exhibit a certain structure, and then generalize to arbitrary solutions. Namely, we restrict our attention here and in Section 4.2 to canonical feasible solutions defined as follows: In other words, in a canonical feasible solution, there is at most one facility j that supplies a non-zero demand x j D > 0 which is not equal to its full capacity s j .
Recall that we are interested in computing
$$s'(F_1 \cap B) + x(F_2 \cap B) - \alpha U \qquad (10)$$
in expectation, which can be expanded as follows:
$$s'(F_1 \cap \bar F_1) + s'(F_1 \cap \bar F_2) + s'(F_1 \cap \bar F_3) + x(F_2 \cap \bar F_1) + x(F_2 \cap \bar F_2) + x(F_2 \cap \bar F_3) - \alpha U. \qquad (11)$$
By the definition of $\bar F_3$, the second-to-last term in (11) is 0. In fact, one can get rid of the contribution of $\bar F_3$ altogether: intuitively, closing down the facilities in $\bar F_3$ does not alter the feasibility of the solution, and hence Equation (11) should remain non-negative even without the contribution of $s'(F_1 \cap \bar F_3)$. In the communication protocol setting, this intuition translates to designing a protocol that only deals with canonical feasible solutions with $\bar F_3 = \emptyset$. To see that this is without loss of generality, consider a canonical feasible solution $(x, y)$ such that $\bar F_3$ is not empty, and let $(x, \bar y)$ be the projection of $(x, y)$ on $\bar F_1 \cup \bar F_2$; that is, set $\bar y_i = y_i$ for all $i \in B \setminus \bar F_3$, and $\bar y_i = 0$ for all $i \in \bar F_3$. Then $(x, \bar y)$ is also a canonical feasible solution, as the facilities in $\bar F_3$ serve no demand and $\bar F_2$ is unchanged. Thus, for any infeasible tuple $(A, F_1, F_2)$, Equation (11) applied to $(x, \bar y)$ can be written as
$$s'(F_1 \cap \bar F_1) + s'(F_1 \cap \bar F_2) + x(F_2 \cap \bar F_1) + x(F_2 \cap \bar F_2) - \alpha U, \qquad (12)$$
which is also non-negative, as it is the slack of $(x, \bar y)$ with respect to $(A, F_1, F_2)$. Therefore, for any feasible solution $(x, y)$, the slack (11) is the sum of (12) and the non-negative term $s'(F_1 \cap \bar F_3)$. The latter is easy to compute with a small communication protocol⁷; thus if Alice and Bob can devise a communication protocol $\Pi$ that computes (12) in expectation, they can easily compute (11) in expectation as well. For example, Alice can generate a uniformly random bit $b \in \{0, 1\}$, and
• if $b = 0$, then Alice and Bob run the protocol that computes $s'(F_1 \cap \bar F_3)$, and return twice its output.
• if b = 1, then Alice and Bob run the protocol Π that computes (12), and return twice its output.
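To double-check footnote 7 and the random-bit combination above, one can enumerate the private randomness and compute expectations exactly. In this minimal sketch, the names are hypothetical, `F3bar` stands for $\bar F_3$, `s_prime[i]` for $s'_i$, and indices are 0-based; the combined protocol is summarized by the two expectations it mixes.

```python
from fractions import Fraction

def footnote7_expectation(n, F1, F3bar, s_prime):
    """Exact expected output of the footnote-7 protocol for s'(F1 ∩ F3bar).
    Bob samples i uniformly from range(n); he outputs 0 if i is not in F3bar,
    otherwise he sends i and Alice outputs n * s'_i if i is in F1, else 0."""
    expectation = Fraction(0)
    for i in range(n):                       # each i has probability 1/n
        if i in F3bar and i in F1:
            expectation += Fraction(1, n) * n * s_prime[i]
    return expectation

def combine(expectation_0, expectation_1):
    """Alice flips a fair bit b and the players run protocol b, doubling its output.
    The expected output is (1/2)(2 E0) + (1/2)(2 E1) = E0 + E1."""
    return Fraction(1, 2) * 2 * expectation_0 + Fraction(1, 2) * 2 * expectation_1
```

Doubling each output exactly compensates for running each sub-protocol with probability 1/2, which is why the combined expectation is the sum.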
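Moreover, since $|\bar F_2| \leq 1$, and using the fact that $x_i D = s_i y_i$ for $i \in \bar F_1$, we can further simplify Equation (12) as follows:
$$\sum_{i \in F_1 \cap \bar F_1} s'_i y_i + \sum_{i \in F_2 \cap \bar F_1} s_i y_i + \gamma - \alpha U, \qquad (13)$$
where the function $\gamma := \gamma(x, y, A, F_1, F_2)$ is defined as
$$\gamma = \begin{cases} 0 & \text{if } \bar F_2 = \emptyset, \text{ or } \bar F_2 = \{j\} \text{ and } j \in A,\\ s'_j y_j & \text{if } \bar F_2 = \{j\} \text{ and } j \in F_1,\\ x_j D & \text{if } \bar F_2 = \{j\} \text{ and } j \in F_2. \end{cases} \qquad (14)$$
For simplicity of notation, we drop the parameters from $\gamma(x, y, A, F_1, F_2)$ when they are clear from the context.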
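The simplification from (11) to (13) can be sanity-checked numerically. The sketch below assumes that (13) reads $\sum_{i \in F_1 \cap \bar F_1} s'_i y_i + \sum_{i \in F_2 \cap \bar F_1} s_i y_i + \gamma - \alpha U$, with $\gamma$ equal to 0, $s'_j y_j$ or $x_j D$ according to whether the unique $j \in \bar F_2$ (if any) lies in $A$, $F_1$ or $F_2$. It then verifies on a toy canonical solution that the slack (11) equals (13) plus $s'(F_1 \cap \bar F_3)$. All names and numbers are hypothetical.

```python
from fractions import Fraction

def partition_support(s, D, x, y):
    """Bob's partition of B = {i : y_i = 1} into F1bar (x_i D = s_i),
    F2bar (0 < x_i D < s_i) and F3bar (x_i = 0)."""
    B = [i for i in range(len(s)) if y[i] == 1]
    F1bar = {i for i in B if x[i] * D == s[i]}
    F2bar = {i for i in B if 0 < x[i] * D < s[i]}
    F3bar = {i for i in B if x[i] == 0}
    return F1bar, F2bar, F3bar

def slack_11(s, D, x, y, A, F1, F2, alpha):
    """s'(F1 ∩ B) + x(F2 ∩ B) - alpha U, with x(S) = D * sum_{i in S} x_i."""
    U = D - sum(s[i] for i in A)
    sp = lambda i: min(s[i], U)              # s'_i
    return (sum(sp(i) * y[i] for i in F1)
            + D * sum(x[i] for i in F2 if y[i] == 1) - alpha * U)

def slack_13(s, D, x, y, A, F1, F2, alpha):
    """Expression (13) with gamma from the case rule; needs a canonical solution."""
    U = D - sum(s[i] for i in A)
    sp = lambda i: min(s[i], U)
    F1bar, F2bar, _ = partition_support(s, D, x, y)
    assert len(F2bar) <= 1, "not a canonical feasible solution"
    gamma = 0
    for j in F2bar:                          # at most one iteration
        if j in F1:
            gamma = sp(j) * y[j]
        elif j in F2:
            gamma = x[j] * D                 # j in A leaves gamma = 0
    return (sum(sp(i) * y[i] for i in F1 & F1bar)
            + sum(s[i] * y[i] for i in F2 & F1bar) + gamma - alpha * U)
```

In the toy instance used for testing, $\bar F_3 = \{2, 3\}$ and the difference between (11) and (13) is exactly $s'_3$.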

Randomized Protocol for Canonical Feasible Solutions.
In what follows, we define a randomized communication protocol where Alice gets an infeasible tuple $(A, F_1, F_2)$, Bob gets a canonical feasible solution $(x, y)$ with $\bar F_3 = \emptyset$, and the goal is to compute the value of (13) in expectation.

⁷ To compute $s'(F_1 \cap \bar F_3)$, Bob samples an index $i \in [n]$ uniformly at random. If $i \notin \bar F_3$, he outputs 0 and terminates the protocol; otherwise he sends $i$ to Alice. If $i \in F_1$, Alice outputs $n \cdot s'_i$; otherwise, she outputs 0.
For a fixed $\varepsilon > 0$, we define $\alpha := \alpha(\varepsilon) = 2/(2 + \varepsilon)$ and $\delta := \delta(\varepsilon) = \varepsilon/(6 + 2\varepsilon)$, as in the min-knapsack case. As in the protocol for the knapsack cover inequalities, Alice sends Bob $O(\log n)$ bits at the beginning so that Bob knows $I_{\text{large}}$, $I_{\text{small}}$, $\tilde U$, $\sigma$ and $\Delta$. Recall that $I_{\text{large}}$ is the set of large items (i.e., $i \in F$ such that $s_i \geq \tilde U$), $I_{\text{small}}$ is the set of small items, $\tilde U$ is an under-approximation of the residual demand $U$, $D - \Delta$ is an over-approximation of $s(I_{\text{large}} \cap A)$, and $\sigma$ is an under-approximation of $s(I_{\text{small}} \cap A)$. Moreover, knowing his input $(x, y)$, Bob can construct the sets $\bar F_1$ and $\bar F_2$. Thus, by exchanging an additional $O(\log n)$ bits, Alice and Bob can both figure out which case of Equation (14) applies.
To compute the value of (13) in expectation, we distinguish between the following cases.
Case 1: Either $\bar F_2 = \emptyset$, or $\bar F_2 = \{j\}$ and $j \in A \cup F_1$. In this case, the value $\gamma$ is either 0 or $s'_j y_j$. Bob now checks whether Equation (15) holds. If it does, then, in the same way as in the min-knapsack protocol, Alice and Bob exchange $O(\log^2 n)$ bits to identify an index $i^* \in I_{\text{large}}$ such that $i^* \in (\bar F_1 \cup \bar F_2) \setminus A$. More precisely, this index $i^*$ belongs to one of the following three sets: either $i^* \in F_1 \cap \bar F_1$, or $i^* \in F_2 \cap \bar F_1$, or $i^* = j$ and $\bar F_2 = \{j\}$. Alice and Bob can thus exchange $O(1)$ more bits to figure out which condition $i^*$ satisfies. In what follows, we design an $O(\log n)$-communication protocol to handle each of these cases.
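If $i^* \in F_2 \cap \bar F_1$, then Equation (13) can be rewritten as
$$\sum_{i \in F_1 \cap \bar F_1} s'_i y_i + \sum_{i \in (F_2 \cap \bar F_1) \setminus \{i^*\}} s_i y_i + \gamma + \left(s_{i^*} - \alpha U\right). \qquad (16)$$
Each of the above four terms is non-negative, and, as in the min-knapsack protocol, Alice and Bob can exchange $O(\log n)$ bits and compute the value of (16) as follows:
1. Bob sends Alice the index $j$ and the bit $y_j$ using $\lceil \log n \rceil + 1$ bits if $\bar F_2 = \{j\}$, and he sends 0 if $\bar F_2 = \emptyset$.
2. Alice samples a uniformly random index $i \in [n+1]$. If $i = n+1$, Alice uses her knowledge of $\bar F_2$ (and thus of $\gamma$) to compute the normalized value of the last two terms, that is, she outputs $(n + 1) \cdot (\gamma + s_{i^*} - \alpha U)$, and terminates the communication. Otherwise, she sends $i$ to Bob using $\lceil \log n \rceil$ bits.
3. If $i \in \bar F_1$, Bob sends $y_i$ to Alice; otherwise, Bob outputs 0 and terminates the communication.
4. Alice outputs $(n+1) \cdot s'_i y_i$ if $i \in F_1$, $(n+1) \cdot s_i y_i$ if $i \in F_2 \setminus \{i^*\}$, and 0 otherwise.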
The above protocol communicates $O(\log n)$ bits, all outputs are non-negative and can be computed with the information available to the outputting player, and, by linearity of expectation, the expected output is exactly the slack (13) when $i^* \in F_2 \cap \bar F_1$.
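The following sketch computes the exact expected output of the Case 1 protocol for $i^* \in F_2 \cap \bar F_1$ by enumerating Alice's sample $i$. It assumes $\bar F_3 = \emptyset$ and, as its final step, that after receiving $y_i$ Alice outputs $(n+1) s'_i y_i$ for $i \in F_1$, $(n+1) s_i y_i$ for $i \in F_2 \setminus \{i^*\}$, and 0 otherwise; this output rule is our reading of the protocol rather than a quotation. Indices are 0-based, and the extra sample "$n+1$" is represented by the index `n`. Function names and instances are hypothetical.

```python
from fractions import Fraction

def case1_protocol_expectation(s, D, x, y, A, F1, F2, alpha, i_star):
    """Exact expected output of the Case 1 protocol when i* lies in F2 ∩ F1bar.
    ASSUMED final step: on receiving y_i, Alice outputs (n+1) s'_i y_i if i is in F1,
    (n+1) s_i y_i if i is in F2 minus {i*}, and 0 otherwise."""
    n = len(s)
    U = D - sum(s[i] for i in A)
    sp = lambda i: min(s[i], U)                      # s'_i
    B = {i for i in range(n) if y[i] == 1}
    F1bar = {i for i in B if x[i] * D == s[i]}
    F2bar = {i for i in B if 0 < x[i] * D < s[i]}
    assert all(x[i] > 0 for i in B), "sketch assumes F3bar is empty"
    assert not (F2bar & F2) and i_star in (F2 & F1bar), "not this sub-case of Case 1"
    # Step 1: Bob reveals F2bar, so Alice knows gamma (0, or s'_j y_j if j in F1).
    gamma = sum(sp(j) * y[j] for j in F2bar if j in F1)
    expectation = Fraction(0)
    for i in range(n + 1):                           # Step 2: uniform over n+1 values
        if i == n:                                   # the extra value "n+1"
            out = (n + 1) * (gamma + s[i_star] - alpha * U)
        elif i not in F1bar:                         # Step 3: Bob outputs 0
            out = 0
        elif i in F1:                                # assumed final step (Alice)
            out = (n + 1) * sp(i) * y[i]
        elif i in F2 and i != i_star:
            out = (n + 1) * s[i] * y[i]
        else:
            out = 0
        expectation += Fraction(1, n + 1) * out
    return expectation

def direct_slack(s, D, x, y, A, F1, F2, alpha):
    """s'(F1 ∩ B) + x(F2 ∩ B) - alpha U, which equals (13) when F3bar is empty."""
    U = D - sum(s[i] for i in A)
    return (sum(min(s[i], U) * y[i] for i in F1)
            + D * sum(x[i] for i in F2 if y[i] == 1) - alpha * U)
```

On both toy instances in the test, the expected output coincides with the directly computed slack, as linearity of expectation predicts.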
The case where $i^* \in F_1 \cap \bar F_1$ is handled similarly.
In the remaining case, we have $\bar F_2 = \{j\}$ and $i^* = j \in F_1 \cap I_{\text{large}}$, and hence $\gamma = s'_j y_j > \alpha U$. This case can be handled by changing the second step of the protocol described earlier so that Alice outputs $(n + 1) \cdot (s'_j - \alpha U)$ if $i = n + 1$.

Equation (15) does not hold:
Recall that since $(x, y)$ is a feasible solution (and $\bar F_3 = \emptyset$), we have $s(\bar F_1) + x(\bar F_2) = D$. By the assumption that Equation (15) does not hold, together with the argument in Equation (7), we conclude that Note that since $|\bar F_2| \leq 1$, we get that We also have that $x(I_{\text{small}} \cap \bar F_1) = s(I_{\text{small}} \cap \bar F_1)$ by the definition of $\bar F_1$. Together, this gives that the sum $s(I_{\text{small}} \cap \bar F_1) + x(I_{\text{small}} \cap \bar F_2)$ is lower bounded by $\sigma + (1 - \delta)\tilde U$. We rewrite (13) as The non-negativity of the first three terms is straightforward, and Alice and Bob can compute them by exchanging $O(\log n)$ bits⁸. By adding and subtracting $(\sigma + (1 - \delta)\tilde U - x(I_{\text{small}} \cap \bar F_2))$ to the remaining terms in (18), we can rearrange the terms and rewrite the rest as the sum of the following three non-negative terms that we can easily compute: The non-negativity of the first part follows from (17), and Bob has all the information to compute it on his own. The non-negativity of the second part follows from our definition of $\sigma$ and $\tilde U$ and their relation to $\delta$ and $\alpha$. Moreover, Alice has all the information to compute this part.
To see that the third part, $s(I_{\text{small}} \cap A \cap \bar F_2) + \gamma - x(I_{\text{small}} \cap \bar F_2)$, is also non-negative and can easily be computed by one of the players, note the following.
1. If $x(I_{\text{small}} \cap \bar F_2) = 0$, then it is clearly non-negative. In this case, Bob communicates the set $\bar F_2$ to Alice using $O(\log n)$ bits, so that she knows whether $\bar F_2 = \emptyset$, or the item $j$ if $\bar F_2 = \{j\}$ and $j \in I_{\text{large}}$. Once $\bar F_2$ is known to Alice, she can compute both $s(I_{\text{small}} \cap A \cap \bar F_2)$ and $\gamma$ (recall that $\gamma$ would be either 0 or $s'_j y_j = U$).
2. If $x(I_{\text{small}} \cap \bar F_2) = x_j D \neq 0$, then $\bar F_2 = \{j\}$ and $j \in I_{\text{small}}$. From our assumption of Case 1, we also have that $j \in A \cup F_1$. Since $A$ and $F_1$ are disjoint, we get that:
(a) If $j \in A$, then $\gamma = 0$ and the third part equals $s_j - x_j D$, which is positive since $j \in \bar F_2$. Thus it is also non-negative, and Bob can compute it on his own in this case.
This concludes the protocol for Case 1, that is, when either $\bar F_2 = \emptyset$, or $\bar F_2 = \{j\}$ and $j \in A \cup F_1$.
Case 2: $\bar F_2 = \{j\}$ and $j \in F_2$. In this case $\gamma = x_j D$. This case is quite similar to Case 1, the difference being that Bob checks at the beginning whether the analogue of (15) without $\bar F_2$ holds.
If this condition is satisfied, then the same reasoning as in the first part of Case 1 resolves this case. Otherwise, we get and using Equation (18) from the second part of Case 1 yields that the first four terms in this case are non-negative and easy to compute. Similarly, adding and subtracting $(\sigma + (1 - \delta)\tilde U)$ to the last four terms of (18), and rearranging the terms, we get The first part of the summation is non-negative by Equation (20) and can be computed by Bob. The second part is the same as the second part in Equation (19); it is non-negative by definition and can be computed by Alice. This completes the proof.
This concludes the promised communication protocol in the case where Alice is given an infeasible tuple $(A, F_1, F_2)$ and Bob is given a canonical feasible solution with $\bar F_3 = \emptyset$. As argued in Section 4.1, this generalizes to any canonical feasible solution without any restriction on $\bar F_3$.

Randomized Protocol for Arbitrary Feasible Solutions.
We now extend the communication protocol for canonical feasible solutions to arbitrary feasible solutions. To that end, we denote by $R = \{(x^1, y^1), (x^2, y^2), \ldots, (x^r, y^r)\}$ the set of all canonical feasible solutions.
In this non-restricted setting, Alice still gets an infeasible tuple $(A, F_1, F_2)$, but Bob gets a feasible solution $(x, y)$ that is not necessarily canonical, and the goal remains to compute the slack of the corresponding flow cover inequality (i.e., Equation (10)) in expectation. We show that the communication protocol developed in the previous section can be used as a black box to handle this general case, by noting that any feasible solution $(x, y)$ can be written as a convex combination of the canonical feasible solutions $(x^1, y^1), \ldots, (x^r, y^r)$. In other words, there exist $\lambda_1, \lambda_2, \ldots, \lambda_r \geq 0$ with $\sum_{k=1}^r \lambda_k = 1$ such that
$$(x, y) = \sum_{k=1}^r \lambda_k (x^k, y^k). \qquad (21)$$
To see that this is enough, note that the expansion (21) of $(x, y)$ allows us to rewrite the slack (10) of the flow cover inequality as
$$\sum_{i \in F_1} s'_i y_i + \sum_{i \in F_2} x_i D - \alpha U = \sum_{k=1}^r \lambda_k \Big( \sum_{i \in F_1} s'_i y^k_i + \sum_{i \in F_2} x^k_i D - \alpha U \Big),$$
since the slack is affine in $(x, y)$ and $\sum_k \lambda_k = 1$. Thus, in order to compute the slack in expectation, Bob samples a canonical feasible solution $(x^k, y^k) \in R$ with probability $\lambda_k$, and then, together with Alice, computes the slack of $(x^k, y^k)$ with respect to $(A, F_1, F_2)$ as discussed in the previous section. It remains to prove that any feasible solution can indeed be written as a convex combination of canonical feasible solutions. This is formalized in Lemma 5.
Lemma 5. Let $R = \{(x^1, y^1), (x^2, y^2), \ldots, (x^r, y^r)\}$ be the set of all canonical feasible solutions for the flow cover problem. Then any feasible solution $(x, y)$ can be written as
$$(x, y) = \sum_{k=1}^r \lambda_k (x^k, y^k)$$
such that $\lambda_k \geq 0$ for all $1 \leq k \leq r$, and $\sum_k \lambda_k = 1$.
Proof. Given a feasible solution $(x, y)$, define its support $F_{x,y} = \{i \in F : y_i = 1\}$, and let $R_{x,y}$ be the set of all canonical feasible solutions whose support equals $F_{x,y}$, i.e., $R_{x,y} = \{(x', y') \in R : F_{x',y'} = F_{x,y}\}$. Without loss of generality, we assume that $F_{x,y} = [n]$ to simplify the presentation.
We now consider the following polytope $P(y)$:
$$P(y) = \Big\{ x \in \mathbb{R}^n : \sum_{i=1}^n x_i = 1 \;\; (*), \qquad 0 \leq x_i D \leq s_i y_i \;\; \forall i \in [n] \;\; (**) \Big\}.$$
Note that for any feasible solution $(x, y)$ to the flow cover problem, we have $x \in P(y)$. Moreover, we get from Definition 5 that for any canonical feasible solution $(x', y) \in R_{x,y}$, every item $i \in [n]$ except at most one has either $x'_i = 0$ or $x'_i D = s_i y_i$. Thus $x'$ satisfies at least $n - 1$ linearly independent constraints of type $(**)$ with equality. Conversely, if a point $x \in P(y)$ satisfies at least $n - 1$ constraints of type $(**)$ with equality, then $(x, y) \in R_{x,y}$.
Recall that a point $z$ is an extreme point of $P(y)$ if and only if $z$ satisfies with equality $n$ linearly independent constraints defining $P(y)$. Since constraint $(*)$ is an equality constraint and is linearly independent from any set of $n - 1$ constraints from $(**)$, we conclude that $\{x' : (x', y) \in R_{x,y}\}$ is the set of all extreme points of $P(y)$. Since $P(y)$ is a polytope, this implies that for any $x \in P(y)$, there exist $\lambda_k \geq 0$, one for each $(x^k, y) \in R_{x,y}$, such that $\sum_k \lambda_k = 1$ and $x = \sum_k \lambda_k x^k$. Since all these points have the same $y$-support, it follows that $(x, y) = \sum_{k=1}^r \lambda_k (x^k, y^k)$, where $\lambda_k = 0$ for the canonical solutions outside $R_{x,y}$.
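Lemma 5 is proved via extreme points, but an explicit convex combination can also be built constructively. The sketch below is an alternative, Carathéodory-style procedure of our own, not the argument above: while at least two indices are fractional ($0 < x_i D < s_i$), it moves along $e_i - e_j$ (which preserves $\sum_i x_i = 1$) until a bound becomes tight in each direction, and recurses on the two endpoints. It assumes, as in the proof, that the support is $[n]$, so the capacities are simply $s_i$. It also checks that the weakened slack is preserved in expectation when a canonical solution is sampled with probability $\lambda_k$. Names and data are hypothetical.

```python
from fractions import Fraction

def fractional(x, s, D):
    """Indices i with 0 < x_i D < s_i (the set F2bar when the support is [n])."""
    return [i for i in range(len(x)) if 0 < x[i] * D < s[i]]

def decompose(x, s, D, weight=Fraction(1)):
    """Return [(lambda_k, x^k)] with x = sum_k lambda_k x^k, lambda_k > 0,
    sum_k lambda_k = weight, and each x^k canonical (at most one fractional index)."""
    frac = fractional(x, s, D)
    if len(frac) <= 1:
        return [(weight, list(x))]
    i, j = frac[0], frac[1]
    # Largest steps along +(e_i - e_j) and -(e_i - e_j) keeping 0 <= x_k D <= s_k.
    t_plus = min(Fraction(s[i], D) - x[i], x[j])
    t_minus = min(x[i], Fraction(s[j], D) - x[j])
    x_plus, x_minus = list(x), list(x)
    x_plus[i] += t_plus
    x_plus[j] -= t_plus
    x_minus[i] -= t_minus
    x_minus[j] += t_minus
    # x = (t_minus * x_plus + t_plus * x_minus) / (t_plus + t_minus)
    total = t_plus + t_minus
    return (decompose(x_plus, s, D, weight * t_minus / total)
            + decompose(x_minus, s, D, weight * t_plus / total))

def weakened_slack(x, y, s, D, A, F1, F2, alpha):
    """sum_{F1} s'_i y_i + D sum_{F2} x_i - alpha U, which is affine in (x, y)."""
    U = D - sum(s[k] for k in A)
    return (sum(min(s[k], U) * y[k] for k in F1)
            + D * sum(x[k] for k in F2) - alpha * U)
```

Each recursive call makes at least one of $x_i, x_j$ tight, so the number of fractional indices strictly decreases and the recursion terminates.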

Algorithmic Aspects.
Theorem 1 relies on the existence of a quasi-polynomial size extended formulation for the weakened knapsack cover inequalities. However, we do not know how to construct the full extended formulation in quasi-polynomial time. Nevertheless, there is a way to use the extended formulation algorithmically, which we describe here.
We adopt a more general point of view, since the findings of this section are applicable beyond the context of the knapsack cover inequalities. Consider any system of $p$ inequalities $A_1 x \geq b_1, \ldots, A_p x \geq b_p$, and $q$ solutions $x^{(1)}, \ldots, x^{(q)} \in \mathbb{R}^n$ of this system. In the context of the min-knapsack problem, the inequalities $A_i x \geq b_i$ ($i \in [p]$) are all the weakened knapsack cover inequalities and the solutions $x^{(j)}$ ($j \in [q]$) are all the feasible solutions $x \in \{0, 1\}^n$. Typically, both $p$ and $q$ are exponentially large as functions of $n$.
To this data corresponds a slack matrix $S \in \mathbb{R}^{p \times q}_{\geq 0}$ defined by $S_{ij} := A_i x^{(j)} - b_i$. As observed by Yannakakis [36], every non-negative factorization $S = FV$, where $F \in \mathbb{R}^{p \times r}_{\geq 0}$ and $V \in \mathbb{R}^{r \times q}_{\geq 0}$, determines a system
$$Ax - b = Fy, \quad y \geq 0 \qquad (22)$$
whose projection to the $x$-space gives a polyhedron $\{x \in \mathbb{R}^n \mid \exists y \in \mathbb{R}^r : Ax - b = Fy,\ y \geq 0\}$ containing each of the solutions $x^{(j)}$ (for $x^{(j)}$, take $y$ to be the $j$-th column of $V$) and contained in each of the halfspaces $A_i x \geq b_i$.
Usually, the number $p$ of equations in (22) is much bigger than both the number $n$ of $x$-variables and the rank $r$ of the non-negative factorization. Thus the equation system is largely overdetermined and can be replaced by an equivalent subsystem with at most $n + r$ equations. However, it is not obvious how to determine efficiently the indices $i$ for which the corresponding equation in (22) should be kept.
To avoid this difficulty, we assume that we want to use the extended formulation (EF for short) $Ax - b = Fy,\ y \geq 0$ to solve the LP $\min\{c^\intercal x \mid Ax \geq b\}$ for a given objective vector $c \in \mathbb{R}^n$.
For $I \subseteq [p]$, consider the linear program
$$\text{LP}(I): \quad \min\{c^\intercal x \mid A_i x - b_i = F_i y \ \ \forall i \in I,\ \ y \geq 0\},$$
where $F_i$ denotes the $i$-th row of $F$. In fact, we will only need to consider sets $I$ of size at most $n + r \ll p$. Algorithm 1 solves the LP $\min\{c^\intercal x \mid Ax \geq b\}$ in several steps. In each step, it solves the smaller LP($I$) for the current $I \subseteq [p]$ and calls a separation routine to check whether $x^*$, the $x$-part of the optimum solution found, satisfies $Ax \geq b$. If so, it returns $x^*$ and stops. Otherwise, it adds the index $i^*$ of any violated constraint to $I$ and continues. At the beginning of the algorithm, $I$ is initialized to $[n]$. To avoid technicalities, we assume that LP($[n]$) is bounded. For the sake of concreteness, we furthermore assume that the first $n$ inequalities of the system $Ax \geq b$ are the non-negativity inequalities $x_1 \geq 0, \ldots, x_n \geq 0$, and that $c \in \mathbb{R}^n_{\geq 0}$.

Algorithm 1
  $I \leftarrow [n]$
  feasible ← false
  repeat
    let $x^*$ be the $x$-part of an optimum solution of LP($I$)
    if there exists $i^* \in [p]$ such that $A_{i^*} x^* < b_{i^*}$ then
      add $i^*$ to $I$
    else
      feasible ← true
    end if
  until feasible = true
  return $x^*$

To analyze the running time of the algorithm, we make the following assumptions:
• the size of each coefficient in (22) and of each $c_i$ is upper-bounded by $\Delta = \Delta(n)$;
• the separation problem (given $x^* \in \mathbb{R}^n$, find an index $i^* \in [p]$ such that $A_{i^*} x^* < b_{i^*}$, or report that no such index exists) can be solved in time $T_{\text{sep}}(n)$;
• each single equation in (22) can be written down in time $T_{\text{constr}}(n)$;
• LP($I$) can be solved in time $T_{\text{solve}}(n)$ for any set $I$ of size at most $n + r$, where $r = r(n)$ is the rank of the nonnegative factorization giving rise to the extended formulation $Ax - b = Fy,\ y \geq 0$.
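The loop of Algorithm 1 can be sketched as follows. To keep the example self-contained, we make two simplifying assumptions that depart from the text: LP($I$) is replaced by the plain restricted LP $\min\{c^\intercal x \mid A_i x \geq b_i,\ i \in I\}$ in the original variables (no extended variables $y$), and this LP is solved in two dimensions by brute-force vertex enumeration with exact rational arithmetic. The separation routine simply scans all $p$ inequalities. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def dot(row, x):
    return row[0] * x[0] + row[1] * x[1]

def solve_restricted_lp(A, b, c, I):
    """min c.x s.t. A[i].x >= b[i] for i in I, x in R^2, by vertex enumeration.
    Assumes the restricted LP is bounded with an optimal vertex (true when I
    contains both non-negativity constraints and c >= 0)."""
    best_val, best_x = None, None
    for i, j in combinations(sorted(I), 2):
        (a11, a12), (a21, a22) = A[i], A[j]
        det = a11 * a22 - a12 * a21
        if det == 0:
            continue                                  # parallel lines: no vertex
        x = (Fraction(b[i] * a22 - a12 * b[j], det),  # Cramer's rule
             Fraction(a11 * b[j] - a21 * b[i], det))
        if all(dot(A[k], x) >= b[k] for k in I):
            val = dot(c, x)
            if best_val is None or val < best_val:
                best_val, best_x = val, x
    return best_x

def separate(A, b, x):
    """Exact separation by scanning: index of a violated inequality, or None."""
    return next((i for i in range(len(A)) if dot(A[i], x) < b[i]), None)

def algorithm1(A, b, c, n=2):
    I = set(range(n))                  # the first n rows are x_k >= 0
    while True:
        x_star = solve_restricted_lp(A, b, c, I)
        i_star = separate(A, b, x_star)
        if i_star is None:
            return x_star, I
        I.add(i_star)
```

On the toy system $x \geq 0$, $x_1 + x_2 \geq 2$, $x_1 + 3x_2 \geq 3$ with $c = (1, 2)$, the loop adds both non-trivial inequalities before returning $(3/2, 1/2)$; with a genuine extended formulation, `solve_restricted_lp` would be replaced by an LP solver over $(x, y)$.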
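Notice that $T_{\text{solve}}(n) = O(n^3 (n + r)\Delta)$ if an interior point method is used to solve LP($I$).
Lemma 6. Under the above assumptions, the main loop of Algorithm 1 is executed at most $r + 1$ times. Thus the complexity of Algorithm 1 is $O(r \cdot (T_{\text{solve}}(n) + T_{\text{sep}}(n) + T_{\text{constr}}(n)))$.
Proof. The result follows directly from the simple observation that each time a new equation is added to LP($I$), it is linearly independent from the current equations in the system. Indeed, since the full system (22) is feasible, an equation that is linearly dependent on the current ones would be implied by them, and hence satisfied by the current optimum $(x^*, y^*)$; this would give $A_{i^*} x^* - b_{i^*} = F_{i^*} y^* \geq 0$, contradicting $A_{i^*} x^* < b_{i^*}$. Notice that, by assumption, the algorithm starts with $n$ linearly independent equations. Since the system has $n + r$ variables, the above observation implies that we always have $|I| \leq n + r$.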
From now on, we assume that the non-negative factorization of the slack matrix $S$ comes from a communication protocol with non-negative outputs computing $S$ in expectation. The protocol is specified by a binary protocol tree, in which each internal node is owned either by Alice or by Bob, and each leaf corresponds to an output of the protocol. At each internal node $u$ owned by Alice, a branching probability $p_{\text{branch}}(i, u) \in [0, 1]$ is given for each input $i \in [p]$ of Alice. Similarly, for each internal node $v$ owned by Bob, we are given a branching probability $q_{\text{branch}}(j, v) \in [0, 1]$, where $j \in [q]$ is Bob's input. These branching probabilities specify the probability that the protocol follows the left branch. Finally, each leaf $\ell$ has a nonnegative number $\lambda(\ell) \in \mathbb{R}_{\geq 0}$ attached to it.
The corresponding extended formulation can be written as
$$A_i x - b_i = \sum_{\ell \text{ leaf}} \lambda(\ell)\, p_{\text{reach}}(i, \ell)\, y_\ell \quad \forall i \in [p], \qquad y_\ell \geq 0 \quad \forall \ell \text{ leaf}, \qquad (23)$$
where $p_{\text{reach}}(i, u)$ denotes the probability of reaching node $u$ of the protocol tree on any input pair of the form $(i, *)$.
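The factorization behind (23) can be illustrated on a toy protocol tree. In this minimal sketch, the tree encoding and all names are hypothetical; `reach` computes reaching probabilities top-down with the recursion used in the proof of Lemma 7, where only the nodes of the given player split the probability, and $q_{\text{reach}}(j, \ell)$ denotes the analogous quantity for Bob. The test checks that $\sum_\ell \lambda(\ell)\, p_{\text{reach}}(i, \ell)\, q_{\text{reach}}(j, \ell)$ equals the expected output on every input pair $(i, j)$.

```python
from fractions import Fraction

# Hypothetical encoding of a protocol tree:
#   ("alice", p_branch, left, right): p_branch(i) = Pr[go left] on Alice's input i
#   ("bob",   q_branch, left, right): q_branch(j) = Pr[go left] on Bob's input j
#   ("leaf", value):                  nonnegative output lambda(leaf)

def reach(node, owner, inp, prob=Fraction(1)):
    """(lambda(leaf), reach probability) for every leaf, left to right, where only
    nodes owned by `owner` split the probability (as in the proof of Lemma 7)."""
    if node[0] == "leaf":
        return [(node[1], prob)]
    kind, branch, left, right = node
    if kind == owner:
        p_left = branch(inp)
        return (reach(left, owner, inp, prob * p_left)
                + reach(right, owner, inp, prob * (1 - p_left)))
    return reach(left, owner, inp, prob) + reach(right, owner, inp, prob)

def expected_output(node, i, j):
    """Expected output of the protocol on the input pair (i, j), by direct recursion."""
    if node[0] == "leaf":
        return Fraction(node[1])
    kind, branch, left, right = node
    p_left = branch(i) if kind == "alice" else branch(j)
    return p_left * expected_output(left, i, j) + (1 - p_left) * expected_output(right, i, j)

def factorize(tree, alice_inputs, bob_inputs):
    """Nonnegative factorization S = F V with F[i][l] = lambda(l) p_reach(i, l)
    and V[l][j] = q_reach(j, l)."""
    F = [[lam * p for lam, p in reach(tree, "alice", i)] for i in alice_inputs]
    columns = [[q for _, q in reach(tree, "bob", j)] for j in bob_inputs]
    V = [list(row) for row in zip(*columns)]
    return F, V
```

Exact fractions are used so that the identity $S = FV$ is checked without rounding.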

Lemma 7.
Let $\Delta$ be any number that is at least $\max\{-\log p_{\text{reach}}(i, \ell) \mid i \in [p],\ \ell \text{ leaf},\ p_{\text{reach}}(i, \ell) > 0\}$ and let $h$ denote the height of the protocol tree. For any fixed $i \in [p]$, one can write down the right-hand side of the corresponding equation in (23) in $O(2^h \Delta \log \Delta \log\log \Delta)$ time and $O(2^h \Delta)$ space.
Proof. Clearly, at the root of the protocol tree, we have $p_{\text{reach}}(i, \text{root}) = 1$. At an internal node $u$ owned by Alice with left child $v$ and right child $w$, we have $p_{\text{reach}}(i, v) = p_{\text{reach}}(i, u) \cdot p_{\text{branch}}(i, u)$ and $p_{\text{reach}}(i, w) = p_{\text{reach}}(i, u) \cdot (1 - p_{\text{branch}}(i, u)) = p_{\text{reach}}(i, u) - p_{\text{reach}}(i, v)$. In case $u$ is owned by Bob, we simply have $p_{\text{reach}}(i, v) = p_{\text{reach}}(i, w) = p_{\text{reach}}(i, u)$, since the behavior of the communication protocol at node $u$ on input pair $(i, j)$ is independent of $i$. Using this, we can recursively compute $p_{\text{reach}}(i, u)$ for all nodes $u$ of the protocol tree, and in particular for its leaves. All arithmetic operations are performed on numbers of at most $O(\Delta)$ bits. Using the schoolbook algorithm for subtraction and the Schönhage-Strassen algorithm for multiplication, we obtain the claimed time and space bounds.

Now, we discuss how Algorithm 1 and its analysis apply to the (weakened) knapsack cover inequalities and the corresponding slack matrix $(S^\varepsilon_{ab})_{a \in f^{-1}(0),\, b \in f^{-1}(1)}$ as in (5), where $f$ is the weighted threshold function defining the knapsack. To do so, we first have to construct the protocol tree of the protocol described in the proof of Lemma 4. We claim that this can be done in time $(1/\varepsilon)^{O(1)} n^{O(\log n)}$.
The protocol has several deterministic parts (in which the branching probabilities are locally in $\{0, 1\}$). Each corresponds to the resolution of a Karchmer-Wigderson game. To write down the corresponding subtrees of the protocol tree, we just need $O(\log^2 n)$-depth monotone circuits of fan-in 2 for computing certain truncations of the weighted threshold function $f$. The translation of a circuit into a protocol tree follows the standard construction of Karchmer and Wigderson [25]. For constructing the circuits, we rely either on the construction of Beimel and Weinreb [7,8] or on the simpler and more recent construction of Chen, Oliveira and Servedio [16]. Both constructions can be executed in $n^{O(1)}$ time.
The remaining parts of the protocol can be readily translated into the corresponding subtrees of the protocol tree.
Since the reaching probabilities in the protocol tree can be written down with $O(\log n)$ bits, each coefficient in the right-hand side of (23) can be written down with $O(\log n)$ bits. Assuming as before that all item sizes and the demand can be written down with $O(n \log n)$ bits (which is without loss of generality), the coefficients of the left-hand side of (23) can be written down with $O(n \log n)$ bits. Therefore, we can take $\Delta = O(n \log n)$. From what precedes and Lemma 7, we have that $T_{\text{constr}}(n) = (1/\varepsilon)^{O(1)} n^{O(\log n)}$. Moreover, Lemma 4 gives $r(n) = (1/\varepsilon)^{O(1)} n^{O(\log n)}$.
For the separation routine, we deviate significantly from Algorithm 1: instead of using an exact separation routine (efficient exact separation of the knapsack cover inequalities is an open problem), we rely on a separation trick from Carr et al. [13]. That is, we simply check whether the knapsack cover inequality for $A := \{i \in [n] \mid x^*_i \geq 1/2\}$ is satisfied. This is enough to guarantee that the modified Algorithm 1 computes a quantity that is within a factor $2 + \varepsilon$ of the integer optimum for that particular cost function $c$. Unfortunately, by relying on the pseudo-separation of Carr et al., we cannot guarantee that the modified Algorithm 1 optimizes exactly over all weakened knapsack cover inequalities.
If we further assume that the coefficients of $c$ can be written with $O(n \log n)$ bits, we conclude that one can find a $(2 + \varepsilon)$-approximation of $\min\{\sum_{i=1}^n c_i x_i \mid \sum_{i=1}^n s_i x_i \geq D,\ x \in \{0, 1\}^n\}$ in time $(1/\varepsilon)^{O(1)} n^{O(\log n)}$ using our extended formulation, without relying on the ellipsoid algorithm.
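A minimal sketch of the pseudo-separation step of Carr et al. as described above: it builds $A = \{i : x^*_i \geq 1/2\}$ and checks only the single knapsack cover inequality $\sum_{i \notin A} \min\{s_i, U\}\, x^*_i \geq \alpha U$ with $U = D - s(A)$, where $\alpha = 1$ gives the standard inequality. It is not an exact separation routine, and the function name is hypothetical.

```python
from fractions import Fraction

def carr_pseudo_separation(x_star, s, D, alpha=1):
    """Check only the knapsack cover inequality for A = {i : x*_i >= 1/2}.
    Return A if sum_{i not in A} min(s_i, U) x*_i >= alpha U is violated,
    and None otherwise (including when A is feasible, i.e. s(A) >= D)."""
    A = {i for i, xi in enumerate(x_star) if xi >= Fraction(1, 2)}
    U = D - sum(s[i] for i in A)             # residual demand
    if U <= 0:
        return None                          # no knapsack cover inequality for A
    lhs = sum(min(s[i], U) * xi for i, xi in enumerate(x_star) if i not in A)
    return A if lhs < alpha * U else None
```

Returning the set $A$ (possibly empty) rather than a boolean makes it easy to add the violated inequality to $I$ in a cutting-plane loop.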

Conclusion.
After the recent series of strong negative results on extended formulations, we have presented a positive result inspired by a connection to monotone circuits. Namely, we obtain the first quasipolynomial-size LP relaxation of min-knapsack with constant integrality gap from polylog-depth circuits for weighted threshold functions.
This result sheds new light on the approximability of min-knapsack via small LPs by connecting it to the complexity of monotone circuits. For instance, it follows from our results that proving that no $n^{O(1)}$-size LP relaxation for min-knapsack can have integrality gap at most $\alpha$ for some $\alpha > 2$ would rule out the existence of $O(\log n)$-depth monotone circuits with bounded fan-in for weighted threshold functions on $n$ inputs, which is an open problem.
Finally, let us further mention two open questions following this work. First, it would be interesting to find an efficient (quasi-polynomial time) procedure to explicitly write down our linear program for min-knapsack. Second, it would be interesting to understand whether there is a "combinatorial" interpretation of our relaxation.