Concentration for Limited Independence via Inequalities for the Elementary Symmetric Polynomials

We study the extent of independence needed to approximate the product of bounded random variables in expectation. This natural question has applications in pseudorandomness and min-wise independent hashing. For random variables with absolute value bounded by 1, we give an error bound of the form σ^{Ω(k)} when the input is k-wise independent and σ² is the variance of their sum. Previously known bounds only applied in more restricted settings, and were quantitatively weaker. Our proof relies on a new analytic inequality for the elementary symmetric polynomials S_k(x) for x ∈ ℝⁿ. We show that if |S_k(x)| and |S_{k+1}(x)| are small relative to |S_{k−1}(x)| for some k > 0, then |S_ℓ(x)| is also small for all ℓ > k. We use this to give a simpler and more modular analysis of a construction of min-wise independent hash functions and pseudorandom generators for combinatorial rectangles due to Gopalan et al., which also improves the overall seed-length.

ACM Classification: G.3

AMS Classification: 68Q87


Introduction
The power of independence in probability and randomized algorithms stems from the fact that it lets us control expectations of products of random variables. If X_1, ..., X_n are independent random variables, then E[∏_{i=1}^n X_i] = ∏_{i=1}^n μ_i, where the μ_i are their respective expectations. (To avoid measurability issues, we assume all random variables have finite support.) However, there are numerous settings in computer science where true independence either does not hold, or is too expensive (in terms of memory or randomness).
Motivated by this, we explore settings where approximate versions of the product rule for expectations hold even under limited independence. Concretely, let X_1, ..., X_n be random variables lying in the range [−1, 1], where X_i has mean μ_i and variance σ_i². We are interested in the smallest k = k(δ) such that whenever the X_i are drawn from a k-wise independent distribution D, it holds that

(1.1)    |E_{X∼D}[∏_{i=1}^n X_i] − ∏_{i=1}^n μ_i| ≤ δ.

As stated, we cannot hope to make do even with k = n − 1. Consider the case where each of X_1, ..., X_{n−1} is a uniformly random {±1} bit. If X_n = ∏_{i=1}^{n−1} X_i, then the resulting distribution is (n − 1)-wise independent, but E[∏_i X_i] = E[X_n²] = 1, whereas ∏_i μ_i = 0. So we need some additional assumptions about the random variables.
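This counterexample is small enough to check by brute force. The sketch below (our own illustration, not part of the paper) enumerates the (n − 1)-wise independent "parity" distribution and confirms that the product rule fails badly:

```python
from itertools import product
from math import prod

n = 5
# Support of the parity distribution: X_n is forced to equal the product
# of the first n-1 coordinates; all 2^(n-1) outcomes are equally likely.
support = [xs + (prod(xs),) for xs in product([-1, 1], repeat=n - 1)]

# Every single coordinate has mean 0, so the product of the means is 0 ...
means = [sum(x[i] for x in support) / len(support) for i in range(n)]
product_of_means = prod(means)

# ... yet the expected product is 1, since X_1 * ... * X_n = X_n^2 = 1.
expected_product = sum(prod(x) for x in support) / len(support)
print(product_of_means, expected_product)  # 0.0 1.0
```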
The main message of this paper is that small total variance is sufficient to ensure that the product rule holds approximately even under k-wise independence.

Theorem 1.1. Let X_1, ..., X_n be random variables each distributed in the range [−1, 1], where X_i has mean μ_i and variance σ_i². Let σ² = ∑_i σ_i². There exist constants c_1 > 1 and 1 > c_2 > 0 such that under any k-wise independent distribution D,

|E_{X∼D}[∏_{i=1}^n X_i] − ∏_{i=1}^n μ_i| ≤ c_1^k · σ^{c_2 k}.
An important restriction that naturally arises is positivity, where each X_i lies in the interval [0, 1]. This setting of parameters (positive variables, small total variance) is important for the applications considered in this paper: pseudorandom generators (PRGs) for combinatorial rectangles [7,12] and min-wise independent permutations [4]. The former is an important problem in the theory of unconditional pseudorandomness which has been studied intensively [7,12,19,3,13,9]. Min-wise independent hashing was introduced by Broder et al. [4], motivated by similarity estimation, and further studied by [11,5,19]. The authors of [19] showed that PRGs for rectangles give min-wise independent hash functions.
The results of [7,11] tell us that under k-wise independence, positivity and boundedness, the LHS of Equation (1.1) is bounded by exp(−Ω(k)); hence k = O(log(1/δ)) suffices for error δ. In contrast, we have seen that such a bound cannot hold in the [−1, 1] case. However, once the variance is smaller than some constant, our bound beats this bound even in the [0, 1] setting. Concretely, when σ² < n^{−ε} for some ε > 0, our result says that O(1)-wise independence suffices for inverse-polynomial error in Equation (1.1), as opposed to O(log(n))-wise independence. This improvement is crucial in analyzing PRGs and hash functions in the polynomially small error regime. A recent result of [9] achieves near-logarithmic seed-length for both these problems, even in the regime of inverse polynomial error. Their construction is simple, but its analysis is not. Using our results, we give a modular analysis of the pseudorandom generator construction for rectangles of [9], using the viewpoint of hash functions. Our analysis also improves the seed-length of the construction, getting the dependence on the dimension n down to O(log log(n)) as opposed to O(log(n)), which (nearly) matches a lower bound due to [12].

THEORY OF COMPUTING, Volume 16 (17), 2020, pp. 1-29
The main technical ingredient in our work is a new analytic inequality about symmetric polynomials in real variables, which we believe is of independent interest. The k-th elementary symmetric polynomial in a = (a_1, a_2, ..., a_n) is defined as

S_k(a) = ∑_{T ⊆ [n], |T| = k} ∏_{i ∈ T} a_i

(we let S_0(a) = 1). We show that for any real vector a, if |S_k(a)| and |S_{k+1}(a)| are small relative to |S_{k−1}(a)| for some k > 0, then |S_ℓ(a)| is also small for all ℓ > k. This strengthens and generalizes a result of [9] for the case k = 1.
We give an overview of the new inequality, its use in the derivation of bounds under limited independence, and finally the application of these bounds to the construction of pseudorandom generators and hash functions.

The elementary symmetric polynomials
The elementary symmetric polynomials appear as coefficients of a univariate polynomial with real roots, since ∏_{i∈[n]} (ξ + a_i) = ∑_{k=0}^n ξ^k S_{n−k}(a). They have been well studied in mathematics, dating back to classical results of Newton and Maclaurin (see [20] for a survey). This work focuses on their growth rates. Specifically, we study how local information on S_k(a) for two consecutive values of k implies global information for all larger values of k.
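The coefficient identity also gives a convenient O(n²)-time way to compute all the S_k(a) at once; the following sketch (our own, for illustration) multiplies out ∏_i(ξ + a_i) and reads off the coefficients:

```python
def elementary_symmetric(a):
    """Return [S_0(a), S_1(a), ..., S_n(a)] via the identity
    prod_i (xi + a_i) = sum_k xi^k S_{n-k}(a)."""
    n = len(a)
    S = [1] + [0] * n
    for ai in a:
        # incorporate the factor (xi + ai); update highest index first
        for k in range(n, 0, -1):
            S[k] += ai * S[k - 1]
    return S

# Sanity check against the definition: S_2(1,2,3) = 1*2 + 1*3 + 2*3 = 11,
# matching (xi+1)(xi+2)(xi+3) = xi^3 + 6 xi^2 + 11 xi + 6.
print(elementary_symmetric([1, 2, 3]))  # [1, 6, 11, 6]
```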
It is easy to see that symmetric polynomials over the real numbers have the following property.

Fact 1.2. For every a ∈ ℝⁿ, if S_1(a) = S_2(a) = 0, then S_k(a) = 0 for all k ≥ 1.

This is equivalent to saying that if p(ξ) is a real univariate polynomial of degree n with n nonzero real roots and p′(0) = p″(0) = 0, then p ≡ 0. This does not hold over all fields; for example, the polynomial p(ξ) = ξ³ + 1 has three nonzero complex roots and p′(0) = p″(0) = 0.
A robust version of Fact 1.2 was recently proved in [9]: for every a ∈ ℝⁿ and k ∈ [n], if |S_1(a)| and |S_2(a)| are small in absolute value, then so is everything that follows. We provide an essentially optimal bound (Theorem 1.3). The parameters promised by Theorem 1.3 are tight up to a factor exponential in k, which is often too small to matter (we do not attempt to optimise the constants). For example, if a_i = (−1)^i for all i ∈ [n], then |S_1(a)| ≤ 1 and |S_2(a)| ≤ n + 1, but |S_k(a)| is roughly (n/k)^{k/2}.
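The alternating-signs example is easy to verify numerically. The sketch below (ours) expands ∏_i(ξ + a_i) for a = (−1, 1, −1, 1, ...); for even n this polynomial is (ξ² − 1)^{n/2}, so S_1 and S_2 stay O(n) while the even-indexed |S_k| are binomial coefficients of order (n/k)^{k/2}:

```python
from math import comb

def elementary_symmetric(a):
    """Return [S_0(a), ..., S_n(a)] by expanding prod_i (xi + a_i)."""
    n = len(a)
    S = [1] + [0] * n
    for ai in a:
        for k in range(n, 0, -1):
            S[k] += ai * S[k - 1]
    return S

n = 20
a = [(-1) ** i for i in range(1, n + 1)]  # -1, +1, -1, +1, ...
S = elementary_symmetric(a)

# prod_i (xi + a_i) = (xi^2 - 1)^(n/2), so S_k = 0 for odd k and
# |S_k| = C(n/2, k/2) for even k, which is large for k near n/2.
print(S[1], S[2], abs(S[10]), comb(n // 2, 5))  # 0 -10 252 252
```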
A more general statement than Fact 1.2 actually holds (see Section 2.1 for a proof). We prove a robust version of this fact as well: a twice-in-a-row bound on the growth of the symmetric polynomials implies a bound on all that follows (Theorem 1.5). Theorem 1.5 is proved by reduction to Theorem 1.3. The proof of Theorem 1.3 is analytic and uses the method of Lagrange multipliers; it is different from that of [9], which relied on the Newton-Girard identities. The argument is quite general, and similar bounds may be obtained for functions that are recursively defined.
Stronger bounds are known when the inputs are nonnegative. When a_i ≥ 0 for all i ∈ [n], the classical Maclaurin inequalities [20] imply that S_k(a) ≤ (e/k)^k (S_1(a))^k. In contrast, when we do not assume nonnegativity, one cannot hope for such bounds to hold under the assumption that |S_1(a)| alone, or any single |S_k(a)|, is small (see the alternating-signs example above).
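A quick numeric sanity check of this consequence of the Maclaurin inequalities (our own sketch; (e/k)^k is the convenient weakening used in the text, not the sharpest form):

```python
import random
from math import e

def elementary_symmetric(a):
    n = len(a)
    S = [1] + [0] * n
    for ai in a:
        for k in range(n, 0, -1):
            S[k] += ai * S[k - 1]
    return S

random.seed(0)
a = [random.random() for _ in range(12)]  # nonnegative inputs
S = elementary_symmetric(a)

# Check S_k(a) <= (e/k)^k * S_1(a)^k for every k >= 1.
ok = all(S[k] <= (e / k) ** k * S[1] ** k for k in range(1, len(a) + 1))
print(ok)  # True
```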

Tail bounds
We return to the question alluded to earlier about how much independence is required for the approximate product rule of expectation. This question arises in the context of min-wise hashing [11], PRGs for combinatorial rectangles [7,9], read-once DNFs [9] and more.
One could derive bounds of similar shape to ours using the work of [9], but with much stronger assumptions on the variables (roughly, good moment bounds for each variable), and get an error bound of roughly k^{O(k)} σ^{Ω(k)}. These stronger assumptions limit the settings where their bound can be applied (biased variables typically do not have good moment bounds), and ensuring that these conditions hold led to tedious case analysis in analyzing their PRG construction.
We briefly outline our approach. We start from the results of [7,11], which give an error bound of δ ≤ exp(−k) in (1.1). To prove this, they consider the random variables Y_i = 1 − X_i and expand

(1.5)    ∏_{i=1}^n X_i = ∏_{i=1}^n (1 − Y_i) = ∑_{ℓ=0}^n (−1)^ℓ S_ℓ(Y).

By the inclusion-exclusion/Bonferroni inequalities, the sum on the right gives alternating upper and lower bounds, and the error incurred by truncating to k terms is bounded by S_k(Y). So we can bound the expected error by E[S_k(Y)], for which k-wise independence suffices. Our approach replaces inclusion-exclusion by a Taylor-series-style expansion about the mean, as in [9]. Assume that μ_i ≠ 0 for all i and write X_i = μ_i(1 + Z_i). Thus,

(1.6)    ∏_{i=1}^n X_i = ∏_{i=1}^n μ_i · ∏_{i=1}^n (1 + Z_i) = ∏_{i=1}^n μ_i · ∑_{ℓ=0}^n S_ℓ(Z).

In this approach, it is usually not sufficient to bound E[|S_k(Z)|], since Z may have negative entries (even if we start with X_i that are all positive). So, to argue that the first k terms are a good approximation, we need to bound the tail ∑_{ℓ≥k} S_ℓ(Z). At first, this seems problematic, since it involves high-degree polynomials, and it seems hard to get their expectations right assuming just k-wise independence.¹ Even though we cannot bound E[S_ℓ(Z)] under k-wise independence once ℓ > k, we use our new inequalities for symmetric polynomials to get strong tail bounds on them. This lets us show that truncating Equation (1.6) after k terms gives error roughly σ^k, and thus k = O(log(1/δ)/log(1/σ)) suffices for error δ. We next describe these tail bounds in detail.
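The expansion in (1.6) is just the coefficient identity for elementary symmetric polynomials evaluated at ξ = 1; a short check (ours):

```python
import random
from itertools import combinations
from math import prod

random.seed(1)
z = [random.uniform(-1, 1) for _ in range(8)]

def S(l, z):
    """S_l(z) straight from the definition, as a sum over l-subsets."""
    return sum(prod(z[i] for i in T) for T in combinations(range(len(z)), l))

lhs = prod(1 + zi for zi in z)
rhs = sum(S(l, z) for l in range(len(z) + 1))
print(abs(lhs - rhs) < 1e-12)  # True
```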
We assume the following setup: Z = (Z_1, ..., Z_n) is a vector of real-valued random variables where Z_i has mean 0 and variance σ_i², and σ² = ∑_i σ_i² < 1. Let U denote the distribution where the coordinates of Z are independent. One can show that E_{Z∈U}[|S_ℓ(Z)|] ≤ σ^ℓ/√(ℓ!), and hence by Markov's inequality (see Corollary 3.2), when t > 1 and tσ ≤ 1/2,

(1.7)

Although k-wise independence does not suffice to bound E[S_ℓ(Z)] for ℓ > k, we use Theorem 1.5 to show that a similar tail bound holds under limited independence.

Theorem 1.6. Let D denote a distribution over Z = (Z_1, ..., Z_n) as above where the Z_i are (2k + 2)-wise independent. For t > 0 and 16etσ < 1,²

(1.8)

Typically, proofs of tail bounds under limited independence proceed by bounding the expectation of some suitable low-degree polynomial. The proof of Theorem 1.6 does not follow this route. In Section 3.2, we give an example of Z_i and a (2k + 2)-wise independent distribution on them where E[|S_ℓ(Z)|] for ℓ ∈ {2k + 3, ..., n − 2k − 3} is much larger than under the uniform distribution. The same example also shows that our tail bounds are close to tight.

¹ We formally show this in Section 3.2.
² A weaker but more technical assumption on t, σ, k suffices; see Equation (3.11).
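Behind the bound E_U[|S_ℓ(Z)|] ≤ σ^ℓ/√(ℓ!) is the observation that, for independent mean-zero coordinates, the monomials of S_ℓ are orthogonal, so E[S_ℓ(Z)²] = S_ℓ(σ_1², ..., σ_n²) ≤ σ^{2ℓ}/ℓ!. The sketch below (ours) verifies this orthogonality identity exactly for small Rademacher-type variables by enumerating the sample space:

```python
from itertools import combinations, product
from math import prod, factorial

def S(l, z):
    """S_l(z) straight from the definition."""
    return sum(prod(z[i] for i in T) for T in combinations(range(len(z)), l))

sigma = [0.5, 0.3, 0.7, 0.2, 0.4, 0.6]
n = len(sigma)
var_total = sum(s * s for s in sigma)

ok = True
for l in range(n + 1):
    # Exact E[S_l(Z)^2] for Z_i uniform on {-sigma_i, +sigma_i}
    # (mean 0, variance sigma_i^2), enumerating all 2^n sign patterns.
    m2 = sum(S(l, [e * s for e, s in zip(eps, sigma)]) ** 2
             for eps in product([-1, 1], repeat=n)) / 2 ** n
    target = S(l, [s * s for s in sigma])  # S_l of the variances
    if abs(m2 - target) > 1e-9 or target > var_total ** l / factorial(l) + 1e-9:
        ok = False
print(ok)  # True
```

By Markov's inequality, E[|S_ℓ(Z)|] ≤ (E[S_ℓ(Z)²])^{1/2} then gives the σ^ℓ/√(ℓ!) bound quoted in the text.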

Applications
The notion of min-wise independent hashing was introduced by Broder et al. [4].
Combinatorial rectangles are a well-studied class of tests in pseudorandomness [7,12,19,3,13,9]. In addition to being a natural class of statistical tests, constructing generators for them with optimal seeds (up to constant factors) will improve on Nisan's generator for logspace [3], a long-standing open problem in derandomization.
A generator G : {0, 1}^r → [m]ⁿ can naturally be thought of as a collection of 2^r hash functions, one for each seed. For y ∈ {0, 1}^r, let G(y) = (x_1, ..., x_n); the corresponding hash function is given by g_y(i) = x_i. The resulting hash functions have the property that they fool all test functions given by combinatorial rectangles. Saks et al. [19] showed that this suffices for ℓ-minima-wise independence. They state their result for ℓ = 1, but their proof extends to all ℓ.
Constructions of PRGs for rectangles and min-wise hash functions that achieve seed-length O(log(mn) log(1/ε)) were given by [7,11] using limited independence. The first construction, G_MR, to achieve seed-length Õ(log(mn/ε)) was given recently by [9]. We use our results to give an analysis of their generator which we believe is simpler and more intuitive, and which also improves the seed-length to (nearly) match the lower bound from [12].
We take the view of G_MR as a collection of hash functions g : [n] → [m], based on iterative applications of an alphabet-squaring step. We describe the generator formally in Section 5. We start by observing that fooling rectangles is easy when m is small: O(log(1/δ))-wise independence suffices. The key insight in [9] is that gradually increasing the alphabet is also easy (in that it requires only logarithmic randomness). Assume that we have a hash function g_0 : [n] → [m], and from it we define a hash function g : [n] → [m²] using an auxiliary function g_1. The key observation is that it suffices to pick g_1 using only O(log(1/δ)/log(m))-wise independence, rather than the O(log(1/δ))-wise independence needed in one shot (the larger m is, the less independence is required). To see why this is so, fix subsets S_i ⊆ [m²] for each co-ordinate and pretend that g_0 is truly random. One can show that the random variable Pr_{g_0}[g(i) ∈ S_i], over the choice of g_1, has variance 1/poly(m). Since we are interested in ∏_i Pr_{g_0}[g(i) ∈ S_i], which is the product of n small-variance random variables, Theorem 1.1 says it suffices to use limited independence.³

Theorem 1.9. Let G_MR be the family of hash functions from [n] to [m] defined in Section 5.2 with error parameter δ > 0. The seed length is at most O((log log(n) + log(m/δ)) log log(m/δ)). Then, for every combinatorial rectangle, the family fools the rectangle test with error at most δ.

This improves the bound from [9] in the dependence on n and δ. In particular, the dependence on n reduces from log(n) to log log(n). The authors of [12] showed a lower bound of Ω(log(m) + log(1/ε) + log log(n)) even for hitting sets, so our bound is tight up to the log log(m/δ) factor. While [12] constructed hitting-set generators for rectangles with near-optimal seed-length, we are unaware of previous constructions of pseudorandom generators for rectangles where the dependence of the seed-length on n is o(log(n)).
Saks et al. [19] showed how to translate a PRG for combinatorial rectangles to an approximately minima-wise independent family (for completeness, see Section 5.5 for a proof). We thus get the following corollary.

Corollary 1.11. For every ℓ, there is a family of approximately ℓ-minima-wise independent hash functions with error ε and seed length at most O((log log(n) + log(mℓ/ε))(log log(mℓ/ε))).

Follow-up work
The basic nature of the questions we consider has led to follow-up work which we now briefly describe. (A preliminary version of this article appeared as [10]).
Gopalan, Kane and Meka [8] constructed the first PRG with seed-length O(log(n/δ) (log log(n/δ))²) for several classes of functions, including halfspaces, modular tests and combinatorial shapes. The key technical ingredient of their work is a generalization of Theorem 1.1 to the complex numbers. Their proof, however, is different from ours, and in particular it does not imply the inequalities and tail bounds for symmetric polynomials that are proved here.
Understanding the tradeoff between space and randomness as computational resources is an important problem in computational complexity theory. A central technique for understanding this tradeoff is via PRGs for branching programs [17]. Meka, Reingold and Tal [14] constructed the first PRG for width-3 branching programs with nearly optimal seed length. Their proof relies on the proof strategy described here.
The iterated restrictions approach is one of the few general mechanisms for constructing PRGs. It was suggested by Ajtai and Wigderson [1] and in most applications does not yield truly optimal seed length. Doron, Hatami and Hoza [6] showed that the iterated restrictions approach can achieve optimal seed length (in a specific scenario). A key step in their proof is an extension of our results on the elementary symmetric polynomials to subset-wise symmetric polynomials.

Organization
We present the proofs of our inequalities for symmetric polynomials (Theorems 1.3 and 1.5) in Section 2 and tail bounds for symmetric polynomials (Theorem 1.6) in Section 3. We use these bounds to prove the bound on products of low-variance variables (Theorem 1.1) in Section 4, and to analyze the generator from [9] in Section 5.

Inequalities for symmetric polynomials
The proof of our inequality for the elementary symmetric polynomials is by induction on k, and uses the method of Lagrange multipliers together with the Maclaurin identities.
Proof of Theorem 1.3. It will be convenient to use E_2(a) = ∑_i a_i². By Newton's identity, E_2 = S_1² − 2S_2, so for all a ∈ ℝⁿ, bounds on |S_1(a)| and |S_2(a)| yield bounds on S_1²(a) + E_2(a). It therefore suffices to prove a bound on |S_k(a)| in terms of S_1²(a) + E_2(a) for all a ∈ ℝⁿ and k ∈ [n]. We prove this by induction. For k ∈ {1, 2}, it indeed holds. Let k > 2. Our goal will be upper bounding the maximum of the projectively defined⁵ function φ_k(a) = S_k(a)/(S_1²(a) + E_2(a))^{k/2} under the constraint that S_1(a) is fixed. Since φ_k is projectively defined, its supremum is attained on the (compact) unit sphere, and is therefore a maximum. Choose a ≠ 0 to be a point that achieves the maximum of φ_k. We assume, without loss of generality, that S_1(a) is non-negative (if S_1(a) < 0, consider −a instead of a). There are two cases to consider. In the first case, we do not need the induction hypothesis and can in fact replace each a_i by its absolute value.
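The Newton identity used here, E_2 = S_1² − 2S_2 with E_2(a) = ∑_i a_i², is easy to confirm numerically (our sketch); it is just the expansion of (∑_i a_i)²:

```python
import random

random.seed(2)
a = [random.uniform(-3, 3) for _ in range(10)]

S1 = sum(a)
S2 = sum(a[i] * a[j] for i in range(len(a)) for j in range(i + 1, len(a)))
E2 = sum(x * x for x in a)  # the second power sum

# (sum a_i)^2 = sum a_i^2 + 2 * sum_{i<j} a_i a_j, i.e. E_2 = S_1^2 - 2 S_2.
print(abs(E2 - (S1 ** 2 - 2 * S2)) < 1e-9)  # True
```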
We then bound the maximum directly. The second case is that there exists i_0 ∈ [n] for which this fails. In this case we use induction and Lagrange multipliers. For simplicity of notation, for a function F on ℝⁿ, denote F(−i) = F(a_1, a_2, ..., a_{i−1}, a_{i+1}, ..., a_n) for i ∈ [n]. For every δ ∈ ℝⁿ such that ∑_i δ_i = 0, we have φ_k(a + δ) ≤ φ_k(a).
Hence, for all δ close enough to zero with ∑_i δ_i = 0, the inequality holds up to second-order terms. For it to hold for all such δ, there must be a single λ such that the following holds for all i ∈ [n]. To see why this is true, set λ_i = a_i S_k(a) k − (S_1²(a) + E_2(a)) S_{k−1}(−i). We now have λ_1, ..., λ_n such that ∑_i λ_i δ_i = O(δ²) for every δ_1, ..., δ_n of sufficiently small norm with ∑_i δ_i = 0. We claim that this implies that in fact λ_i = λ for every i. To see this, assume towards a contradiction that λ_1 ≠ λ_2, say |λ_1| > |λ_2|; a suitable choice of δ then violates the inequality. Sum over i to determine λ. Thus, for all i ∈ [n], a_i S_k(a) k − (S_1²(a) + E_2(a)) S_{k−1}(−i) = λ.

⁶ Here and below, O(δ²) denotes a quantity of absolute value at most C · ‖δ‖_∞² for some C = C(n, k) ≥ 0.
This holds in particular for i = i_0, so using Equation (2.2) we obtain a bound relating S_k(a) to S_{k−1}(−i_0). To apply induction we need to bound S_1²(−i_0) + E_2(−i_0) from above, and this follows from the choice of i_0. Finally, by induction and Equation (2.5), the proof is complete.

The proof of the more general inequality (Theorem 1.5) is by reduction to Theorem 1.3, and uses the connection between real polynomials in one variable and the symmetric polynomials.
Proof of Theorem 1.5. Assume a_1, ..., a_m are nonzero and a_{m+1}, ..., a_n are zero. Denote a′ = (a_1, ..., a_m) and notice that for all⁷ k ∈ [n], S_k(a) = S_k(a′). Write p(ξ) = ∏_{i∈[m]} (ξ + a_i). Differentiating k times, since p has m real roots, p^{(k)} has m − k real roots. The claim then follows from the assumption by applying Theorem 1.3 to the roots of p^{(k)}.

Zeros of polynomials
We conclude the section with a proof of Fact 1.4, which states that over the reals, if S_k(a) = S_{k+1}(a) = 0 for some k > 0, then S_ℓ(a) = 0 for all ℓ ≥ k. For a univariate polynomial p(ξ) and a root y ∈ ℝ of p, denote by mult(p, y) the multiplicity of the root y in p. We use the following property of polynomials p(ξ) with real roots (see, e. g., [18]), which can be proved using the interlacing of the zeroes of p(ξ) and p′(ξ): if mult(p′, y) ≥ 2, then mult(p, y) ≥ mult(p′, y) + 1.

Tail bounds under limited independence
In this section we work with the following setup. Let X = (X 1 , . . . , X n ) be a vector of real valued random variables so that E[X i ] = 0 for all i ∈ [n]. Let σ 2 i denote the variance of X i , and let σ 2 = ∑ n i=1 σ 2 i . The goal is proving a tail bound on the behavior of the symmetric functions under limited independence.
We start by obtaining tail estimates, under full independence. Let U denote the distribution over X = (X 1 , . . . , X n ) where X 1 , . . . , X n are independent.
Proof. Since the expectation of X_i is zero for all i ∈ [n],

(3.1)

If 2e^{1/2} tσ ≤ k^{1/2}, then by the union bound the claim follows. We now consider limited independence.
We claim that there must exist k_0 ∈ {0, ..., k − 1} for which the following bounds hold. To see this, mark a point j ∈ {0, ..., k + 1} as high or low according to whether the corresponding bound holds; a point is marked both high and low if equality holds. Observe that 0 is marked high (and low), since S_0(x) = 1, and that k and k + 1 are marked low by Equation (3.4). This implies the existence of a triple k_0, k_0 + 1, k_0 + 2 where the first point is high and the next two are low.

Proof of tail bounds
Proof of Theorem 1.6. As in Lemma 3.3, fix x = (x_1, ..., x_n) such that Equation (3.4) holds (the random vector X has this property with D-probability at least 1 − 2t^{−2k}). By the proof of the lemma, since by assumption 8etσ < 1/2, the desired bound follows.

On the tightness of the tail bounds
We conclude by showing that (2k + 2)-wise independence is insufficient to fool |S_ℓ| for ℓ > 2k + 2 in expectation. We use a modification of a simple proof, due to Noga Alon, of the Ω(n^{k/2}) lower bound on the support size of a k-wise independent distribution on {−1, 1}ⁿ, which was communicated to us by Raghu Meka. For this section, let X_1, ..., X_n be such that each X_i is uniform over {−1, 1}. In contrast, we have the following: there is a (2k + 2)-wise independent distribution on X = (X_1, X_2, ..., X_n) in {−1, 1}ⁿ such that for every ℓ ∈ [n],

E[|S_ℓ(X)|] ≥ (n choose ℓ) / (3n^{k+1}).

Proof. Let D be a (2k + 2)-wise independent distribution on {−1, 1}ⁿ that is uniform over a set D of size 2(n + 1)^{k+1} ≤ 3n^{k+1}. Such distributions are known to exist [2]. Further, by translating the support by some fixed vector if needed, we may assume that (1, 1, ..., 1) ∈ D. It is easy to see that every such translate also induces a (2k + 2)-wise independent distribution. The claim holds since S_ℓ(1, ..., 1) = (n choose ℓ).
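The translation trick is concrete: multiplying every support vector coordinatewise by a fixed sign pattern preserves k-wise independence and can force the all-ones vector into the support. A small check (ours, using the (n−1)-wise independent parity distribution from the introduction as the example family):

```python
from itertools import combinations, product
from math import comb, prod

n = 5
# Parity distribution: (n-1)-wise independent, uniform over 2^(n-1) points.
D = [xs + (prod(xs),) for xs in product([-1, 1], repeat=n - 1)]

# Translate so that (1,...,1) is in the support: multiply coordinatewise
# by any fixed support vector v (v maps to the all-ones vector).
v = D[0]
Dt = [tuple(x * y for x, y in zip(x_vec, v)) for x_vec in D]
assert (1,) * n in Dt

# The translate is still (n-1)-wise independent: every (n-1)-coordinate
# marginal is uniform over {-1,1}^(n-1).
for idx in combinations(range(n), n - 1):
    marg = sorted(tuple(x[i] for i in idx) for x in Dt)
    assert marg == sorted(product([-1, 1], repeat=n - 1))

# And S_l evaluated at the all-ones point is the binomial coefficient.
S3 = sum(prod(1 for i in T) for T in combinations(range(n), 3))
print(S3 == comb(n, 3))  # True
```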
When, for example, k = O(log n), which is often the case of interest, for 2k + 3 ≤ ℓ ≤ n − (2k + 3) the RHS of (3.13) is much larger than the bound guaranteed by Equation (3.12). The tail bound provided by Lemma 3.3 therefore cannot be extended to a satisfactory bound on the expectation. Furthermore, a similar argument implies that for any (2k + 2)-wise independent distribution, when k = o(n), the corresponding quantity is at most O(n^{−k+o(1)}). Comparing this to the bound given in Lemma 3.4, we see that the bound provided by Lemma 3.3 is nearly tight.

Limited independence fools products of variables
In this section we work with the following setup. We have n random variables X_1, ..., X_n, each distributed in the interval [−1, 1]. Let μ_i and σ_i² denote the mean and variance of X_i, and let σ² = ∑_{i=1}^n σ_i². The following theorem shows that limited independence fools products of bounded variables with low total variance.

Theorem 4.1. There exists C > 0 such that under any Ck-wise independent distribution D,

|E_{X∼D}[∏_{i=1}^n X_i] − ∏_{i=1}^n μ_i| ≤ σ^{Ω(k)}.

Proof. Denote by U the distribution on (X_1, ..., X_n) in which the X_i are independent with the same marginal distributions as in D. Define H ⊆ [n] to be the set of indices such that |μ_i| ≤ √σ. There are two cases to consider.
Case one: The first case is that |H| ≥ 2k. In this case, let H′ be a subset of H of size 2k. Since the variables are bounded in [−1, 1] and |μ_i| ≤ √σ for i ∈ H′, the 2k-wise independence implies that |E_D[∏_{i=1}^n X_i]| is small; the same bound also holds under U, and the claim follows by combining the two.

Case two: The second case is that |H| < 2k. Let T = [n] \ H. For ease of notation, we shall assume that T = [m] for some m ≤ n. We may assume that m > 10k, since otherwise there is nothing to prove. Even after conditioning on the outcome of the variables in H, the resulting distribution on X_1, ..., X_m is 10k-wise independent. Since the variables have absolute value at most 1, it suffices to prove the bound for a 10k-wise independent distribution D on X_1, ..., X_m. Write X_i = μ_i(1 + Z_i), so that Z_i has mean 0 and variance σ_i²/μ_i². Define the random variables P = ∏_{i=1}^m (1 + Z_i) and P′ = ∑_{ℓ=0}^{4k} S_ℓ(Z), where Z = (Z_1, ..., Z_m). We will prove the following claim.
Claim 4.2. For a 4k-wise independent distribution D, E[|P − P′|] ≤ (cσ^k)/2.

We first show how to finish the proof of Theorem 4.1 with this claim. We have

|E_D[P] − E_U[P]| ≤ E_D[|P − P′|] + E_U[|P − P′|] + |E_D[P′] − E_U[P′]|.

The first two terms are bounded from above by (cσ^k)/2 by the claim, and the last is 0, since 10k-wise independence fools degree-4k polynomials, such as P′.
Proof of Claim 4.2. Denote by σ̄_i² the variance of Z_i. By the definition of T, we have σ̄_i² = σ_i²/μ_i² ≤ σ_i²/σ. The total variance of Z can therefore be bounded by σ̄² = ∑_i σ̄_i² ≤ σ²/σ = σ. Let G denote the event that |P − P′| ≤ 2(8e√σ)^{4k}, and denote by ¬G the complement of G. Write

E[|P − P′|] = E[|P − P′| · 1_G] + E[|P − P′| · 1_{¬G}].

By the definition of G, the first term is at most 2(8e√σ)^{4k}. Bound the second term as follows. First, since −1 ≤ P ≤ 1, the contribution of ¬G can be controlled via the tail bounds of Section 3.

Analyzing the PRG for rectangles
Gopalan et al. [9] proposed and analyzed a PRG for combinatorial rectangles, which we denote by G MR .
In this section, we provide a different presentation and analysis of their construction, which is based on our results concerning the symmetric polynomials. Our analysis is simpler and follows the intuition that products of low-variance events are easy to fool using limited independence. It also improves on their seed-length in the dependence on n and δ, as discussed above.

Preliminaries
Let U denote the uniform distribution on [m]ⁿ, and let D be a distribution on [m]ⁿ. We denote by Pr_{x∈D} the probability distribution induced by choosing x according to D. For K ⊆ [n], denote by D_K the marginal distribution of D on the co-ordinates in K. We say that D is (k, ε)-wise independent if, for every K ⊆ [n] of size at most k, the total variation distance between D_K and U_K is at most ε.
Naor and Naor [16] showed that such distributions (for m a power of two) can be generated using seed-length O(log log(n) + k log(m) + log(1/ε)). Indeed, such distributions can be generated by taking a (k log(m))-wise ε-dependent string of length n log(m). We can also assume that every co-ordinate is uniformly random in [m], by adding the string (a, a, ..., a) modulo m, where a ∈ [m] is uniformly random.
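The final shift is a cheap way to fix marginals: adding one uniformly random symbol a to every co-ordinate modulo m makes each co-ordinate exactly uniform, since the shift is a bijection on each marginal. A small check of the marginal claim (ours):

```python
from collections import Counter

m, n = 4, 3
# Any fixed base string at all works -- even a highly non-uniform one.
base = (2, 0, 1)

# Shift every coordinate by the same a, for each of the m choices of a.
shifted = [tuple((x + a) % m for x in base) for a in range(m)]

# Each coordinate of the shifted distribution is exactly uniform on [m].
uniform = all(
    Counter(s[i] for s in shifted) == Counter(range(m)) for i in range(n)
)
print(uniform)  # True
```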
The following property holds. Let P be a real linear combination of combinatorial rectangles, P(x) = ∑_S c_S ∏_{i∈[n]} f_{S,i}(x_i), where f_{S,i}(x_i) ∈ {0, 1} for all S, i. Let L_1(P) = ∑_S |c_S|. The degree of P is the maximum size of S for which c_S ≠ 0. Convexity implies that if D is (k, ε)-wise independent and P has degree at most k, then |E_{x∈D}[P(x)] − E_{x∈U}[P(x)]| ≤ L_1(P)·ε.

The generator
We use an alternate view of G_MR as a collection of hash functions g : [n] → [m]. The generator G_MR is based on iterative applications of an alphabet-increasing step. The first alphabet size m_0 is chosen to be large enough, and at each step t ≥ 1 the size of the alphabet is squared: m_t = m_{t−1}². There is a constant C > 0 so that the following holds. Denote by δ the error parameter of the generator. Let T ≤ C log log(m) be the first integer so that m_T ≥ m, and let δ′ = δ/T.

Squaring the alphabet: define a hash function g_t : [m_{t−1}] × [n] → [m_t]. This requires seed length O(log log(n) + log(m_t) + log(log log(m)/δ)).
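The squaring step composes the old hash with a fresh one drawn over the doubled-size alphabet. A minimal sketch (ours; here `g1` is a fully random table standing in for the limited-independence family the construction actually samples from, and packing the pair into one symbol is one concrete encoding):

```python
import random

def square_alphabet(g0, g1, m):
    """Combine g0 : [n] -> [m] with g1 : [m] x [n] -> [m] into g : [n] -> [m^2].

    The pair (g0(i), g1(g0(i), i)) is packed into a single symbol of [m^2]."""
    def g(i):
        a = g0(i)
        b = g1[(a, i)]
        return a * m + b  # encode the pair as one symbol in [m^2]
    return g

random.seed(0)
n, m = 16, 4

def g0(i):
    return (3 * i + 1) % m  # some fixed initial hash [n] -> [m]

g1 = {(a, i): random.randrange(m) for a in range(m) for i in range(n)}
g = square_alphabet(g0, g1, m)

values = [g(i) for i in range(n)]
print(all(0 <= v < m * m for v in values))  # True
```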

Two lemmas
We first analyze the base case using the inclusion-exclusion approach of [7]. We need to extend their analysis to the setting where the co-ordinates are only approximately k-wise independent.
Lemma 5.2. Let D be a (k, ε)-wise independent distribution on [m]ⁿ with k odd. Then, for every S_1, ..., S_n ⊆ [m],

Proof. Let p_i = |S_i|/m and q_i = 1 − p_i. Observe that Pr_{x∈U}[x_i ∈ S_i for all i] = ∏_i p_i = ∏_i (1 − q_i). We consider two cases based on ∑_i q_i.

Case 1: ∑_i q_i ≤ k/(2e). Since every non-zero q_i is at least 1/m, there can be at most mk/(2e) indices i such that q_i > 0. For i such that q_i = 0, we have S_i = [m], so we can drop such indices and assume n ≤ mk/(2e). By the Bonferroni inequalities, since k is odd, the inclusion-exclusion expansion truncated after k terms gives an upper bound. A similar bound holds for h. Using the (k, ε)-wise independence, and since (n choose k) ≤ (en/k)^k, the truncation can be estimated under D. The second term is twice S_k(q_1, ..., q_n), which we can bound by Maclaurin's inequality as S_k(q) ≤ (e ∑_i q_i / k)^k ≤ 2^{−k}. The lemma is proved since n ≤ mk/(2e).
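The Bonferroni step can be checked numerically: the partial sums of ∑_ℓ (−1)^ℓ S_ℓ(q) alternately over- and under-estimate ∏_i(1 − q_i), with the sign of the error after the degree-ℓ term equal to (−1)^ℓ. A sketch (ours):

```python
import random
from itertools import combinations
from math import prod

def S(l, q):
    return sum(prod(q[i] for i in T) for T in combinations(range(len(q)), l))

random.seed(3)
q = [random.random() for _ in range(7)]
exact = prod(1 - qi for qi in q)

ok = True
partial = 0.0
for l in range(len(q) + 1):
    partial += (-1) ** l * S(l, q)
    # After adding the degree-l term, the sign of the error is (-1)^l:
    # an upper bound for even l, a lower bound for odd l.
    if (-1) ** l * (partial - exact) < -1e-12:
        ok = False
print(ok)  # True
```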
Case 2: When ∑_i q_i > k/(2e). Once again, we drop indices i such that q_i = 0. Consider the largest n′ such that ∑_{i≤n′} q_i ≤ k/(2e). Repeating the argument from Case 1 for this n′, we get

Similarly to Equation (5.2), the corresponding probability under D obeys a matching bound. Finally, combining these bounds, the lemma is proved.
To analyze the iterative steps, we use the following lemma. To simplify notation, for a finite set X, we denote by Pr_{x∈X} the probability distribution induced by choosing x uniformly in X.

Lemma 5.3. There is C > 0 so that the following holds. Let 0 < δ < 1/C.

The head is large. If |H| ≥ k, we show that both probabilities are small, which means that they are close. Indeed, let H′ be the first k indices in H. By the definition of H, both probabilities are small, so the proof of this case is complete.
The head is small. From now on, we may assume that |H| < k. We may also assume that q_i ≥ 1/m and p_i > 0 for all i ∈ T, since otherwise S_i is trivial and we can drop such an index. As in the proof of Lemma 5.2, by restricting to a subset if necessary, we can also assume that ∑_{i∈T} q_i ≤ C log(1/δ). For i ∈ T, define the corresponding random variable, and define the random variable Q as the relevant conditional probability. The degree of P′ is at most 2k. We will show that P′ is a good approximation to P.
The claim completes the proof: bound the first and third terms by O(m^{−0.2k}) using the claim (U is also (Ck, ε)-wise independent). Bound the second term as follows. Since k ≥ 2, for all i, we can plug in the bounds from Equations (5.3). Thus, since the degree of P′ is 2k, the terms combine. Bound the first term as follows: if the co-ordinates of A are (Ck, 0)-wise independent then Lemma 3.1 applies, and hence a corresponding bound holds under (Ck, ε)-wise independence. As in the proof of Theorem 1.6, we condition on G. It remains to bound the second term from above. Note that 0 ≤ P ≤ 1, since it is the probability of an event.

Let h_0, h_1, ..., h_T be truly random hash functions with similar domains and ranges. For 0 ≤ t, q ≤ T, define the hybrid family G_t^q = {f_t^q :

Completing the analysis
[n] → [m_t]} as follows: for t = 0 and every q, define f_0^q = g_0 if q = 0, and f_0^q = h_0 if q > 0; for t > 0 and every q, f_t^q(i) = g_t(f_{t−1}^q(i), i) if t ≥ q, and f_t^q(i) = h_t(f_{t−1}^q(i), i) if t < q. For every q, let G^q = G_T^q. Thus, G⁰ = G_MR and G^T = U. We will show that for every q ≥ 0, adjacent hybrids are close. The desired bound then follows by the triangle inequality.
In the case q = 0, couple G⁰ and G¹ by picking the same g_1, ..., g_T, and use them to define the coupling function f. For the case q > 0, couple G^{q+1} and G^q by picking the same g_{q+1}, ..., g_T, and pick x ∈ [m_{q−1}]ⁿ uniformly at random. There is a function f : [m_q] × [n] → [m] so that f^q(i) = f(h_q(x_i, i), i) and f^{q−1}(i) = f(g_q(x_i, i), i).

Minima-wise independence
Saks et al. [19] showed how to translate a PRG for combinatorial rectangles to an approximately minima-wise independent family. We conclude with a routine extension of their result to large ℓ.
Proof of Theorem 1.10. Fix S ⊆ [n] and a sequence T = (t_1, ..., t_ℓ) of distinct elements from S. The event g(t_1) < ··· < g(t_ℓ) < min g(S \ T) can be viewed as the disjoint union of (m choose ℓ) events, by fixing the set A = {a_1 < ··· < a_ℓ} that T maps to.
The indicator 1_A of the event g(t_1) = a_1, ..., g(t_ℓ) = a_ℓ, g(S \ T) > a_ℓ is a combinatorial rectangle: it can be written as f_A(x) = ∏_i f_i(x_i) for suitable indicators f_i.
Since g(i) = x_i, it follows that 1_A(g) = f_A(x). Further, choosing g ∈ U is equivalent to choosing x ∈ [m]ⁿ uniformly at random. Hence, Pr_{g∈G}[g(t_1) < ··· < g(t_ℓ) < min g(S \ T)] can be compared with the corresponding probability under U, one rectangle at a time.
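The disjoint-union decomposition is easy to verify exhaustively for tiny parameters (our own check): the event that the ℓ marked elements land below everything else, in a prescribed order, holds if and only if exactly one of the rectangle events fires.

```python
from itertools import combinations, product

n, m, l = 4, 5, 2
S_idx = [0, 1, 2, 3]   # S = [n]
T_idx = [1, 3]         # t_1 = 1, t_2 = 3
rest = [i for i in S_idx if i not in T_idx]

def event(g):
    """g(t_1) < g(t_2) < min of g over S \\ T."""
    return g[T_idx[0]] < g[T_idx[1]] < min(g[i] for i in rest)

def rectangle_event(g, A):
    """g maps (t_1, t_2) to A = {a_1 < a_2} in order, rest above a_2."""
    a1, a2 = sorted(A)
    return g[T_idx[0]] == a1 and g[T_idx[1]] == a2 and all(g[i] > a2 for i in rest)

ok = True
for g in product(range(m), repeat=n):
    hits = [A for A in combinations(range(m), l) if rectangle_event(g, A)]
    # Disjoint union: the event holds iff exactly one rectangle fires.
    if event(g) != (len(hits) == 1) or len(hits) > 1:
        ok = False
print(ok)  # True
```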