Optimality of Correlated Sampling Strategies

In the correlated sampling problem, two players are given probability distributions P and Q, respectively, over the same finite set, with access to shared randomness. Without any communication, the two players are each required to output an element sampled according to their respective distributions, while trying to minimize the probability that their outputs disagree. A well-known strategy due to Kleinberg–Tardos and Holenstein, with a close variant (for a similar problem) due to Broder, solves this task with disagreement probability at most 2δ/(1 + δ), where δ is the total variation distance between P and Q. This strategy has been used in several different contexts, including sketching algorithms, approximation algorithms based on rounding linear programming relaxations, the study of parallel repetition, and cryptography. In this paper, we give a surprisingly simple proof that this strategy is essentially optimal. Specifically, for every δ ∈ (0, 1), we show that any correlated sampling strategy incurs a disagreement probability of at least (1 − o(1)) · 2δ/(1 + δ) on some pair of distributions, where the o(1) term vanishes as the size of the set grows.

∗ Work done while at MIT. Supported in part by NSF Award CCF-1420692.
† Work done while at MIT. Supported in part by NSF CCF-1420956, NSF CCF-1420692 and CCF-1217423.
‡ Work done while at Harvard. Supported by NSF Award CCF-1565641.
§ Work done while at MIT. Supported in part by NSF CCF-1420956 and NSF CCF-1420692.
¶ This work was supported by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-0939370.
‖ This work was supported by NSF Awards CCF-1565641 and CCF-1715187, and a Simons Investigator Award.

ACM Classification: F.0, G.3
AMS Classification: 68Q99, 94A20, 68W15


Introduction
In this paper, we study correlated sampling, a basic task, variants of which have been considered in the context of sketching algorithms [2], approximation algorithms based on rounding linear programming relaxations [7,3], the study of parallel repetition [6,12,1] and cryptography [13].
The correlated sampling problem is defined as follows. Alice and Bob are given probability distributions P and Q, respectively, over a finite set Ω. Without any communication, using only shared randomness as the means to coordinate, Alice is required to output an element a distributed according to P and Bob is required to output an element b distributed according to Q. Their goal is to minimize the disagreement probability Pr[a ≠ b], which we will compare with the total variation distance between P and Q, defined as

d_TV(P, Q) := max_{S ⊆ Ω} |P(S) − Q(S)| = (1/2) · Σ_{ω ∈ Ω} |P(ω) − Q(ω)|.

A correlated sampling strategy is formally defined below, where ∆_Ω denotes the set of all probability distributions over Ω and (R, F, µ) denotes the probability space corresponding to the randomness shared by Alice and Bob. Here, R is the sample space, F is a σ-algebra over R and µ is a probability measure over (R, F). Even though Ω is finite, we allow R to be infinite. For simplicity, we abuse notation and use R to denote both the sample space and the probability space.
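Concretely, for distributions represented as dictionaries mapping elements to probabilities, the total variation distance can be computed as follows (a minimal sketch; the representation and names are our own):

```python
def tv_distance(P, Q):
    """d_TV(P, Q) = (1/2) * sum over w of |P(w) - Q(w)|, for dict-valued distributions."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(w, 0.0) - Q.get(w, 0.0)) for w in support)

P = {"a": 0.5, "b": 0.5}
Q = {"a": 0.25, "b": 0.25, "c": 0.5}
# tv_distance(P, Q) = 0.5 * (0.25 + 0.25 + 0.5) = 0.5
```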
Definition 1.1. A correlated sampling strategy for a finite set Ω with error ε : [0, 1] → [0, 1] is specified by a probability space R and a pair of functions f, g : ∆_Ω × R → Ω, such that for all P, Q ∈ ∆_Ω with d_TV(P, Q) ≤ δ, the following hold.

1. Correctness: {f(P, r)}_{r∼R} = P and {g(Q, r)}_{r∼R} = Q.
2. Error guarantee: Pr_{r∼R}[f(P, r) ≠ g(Q, r)] ≤ ε(δ).
In the above, {f(P, r)}_{r∼R} denotes the push-forward measure, that is, the distribution of the random variable f(P, r) over Ω, where r ∼ R is the source of shared randomness. We require both f and g to be measurable in their second argument. For simplicity, we will often not mention R explicitly when talking about correlated sampling strategies. While we defined correlated sampling strategies for finite sets only, it is possible to define them for infinite sets Ω; see Section 5 for a discussion. In this paper we consider finite sets Ω only, except where otherwise stated.
THEORY OF COMPUTING, Volume 16 (12), 2020, pp. 1–18

A correlated sampling strategy is notably different from the fundamental notion of a coupling (see, e.g., [14] for an introduction), where we require a single coupling function h : ∆_Ω × ∆_Ω → ∆_{Ω×Ω} such that for any distributions P and Q, the marginals of h(P, Q) are P and Q, respectively. In other words, a coupling function has knowledge of both P and Q, whereas a correlated sampling strategy operates locally on the knowledge of P and on the knowledge of Q. It is well known that for any coupling function h, it holds that Pr_{(a,b)∼h(P,Q)}[a ≠ b] ≥ d_TV(P, Q), and that this bound is achievable. Observe that a correlated sampling strategy induces a coupling given by {(f(P, r), g(Q, r))}_{r∼R}. Thus, it follows that ε(δ) ≥ δ. And yet, a priori, it is unclear whether any non-trivial correlated sampling strategy can even exist, since the error ε is not allowed to depend on the size of Ω.
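For intuition, the standard construction of an optimal coupling, which achieves disagreement probability exactly d_TV(P, Q), can be sketched as follows (our own illustrative code, not from the paper): with probability 1 − δ, sample a common value from the overlap min(P, Q); otherwise, sample the two outputs from the normalized excesses of P and Q.

```python
import random

def optimal_coupling_sample(P, Q, rng=random):
    """Sample (a, b) with a ~ P, b ~ Q, achieving Pr[a != b] = d_TV(P, Q)."""
    support = sorted(set(P) | set(Q))
    overlap = [min(P.get(w, 0.0), Q.get(w, 0.0)) for w in support]
    mass = sum(overlap)                      # equals 1 - d_TV(P, Q)
    if rng.random() < mass:
        a = rng.choices(support, weights=overlap)[0]
        return a, a                          # agree on the common mass
    # otherwise, sample independently from the normalized excess parts
    a = rng.choices(support, weights=[max(P.get(w, 0.0) - Q.get(w, 0.0), 0.0) for w in support])[0]
    b = rng.choices(support, weights=[max(Q.get(w, 0.0) - P.get(w, 0.0), 0.0) for w in support])[0]
    return a, b

P = {"heads": 0.5, "tails": 0.5}
Q = {"heads": 0.25, "tails": 0.25, "edge": 0.5}   # d_TV(P, Q) = 0.5
```

Note that the coupling uses both P and Q at once; this is exactly the global knowledge a correlated sampling strategy is denied.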
Somewhat surprisingly, there exists a simple strategy whose error can be bounded by roughly twice the total variation distance (and in particular does not degrade with the size of Ω). Variants of this strategy have been rediscovered multiple times in the literature, yielding the following theorem.

Theorem 1.2 ([2, 7, 6]). For all n ∈ Z_{≥0}, there exists a correlated sampling strategy for sets of size n, with error ε given by ε(δ) = 2δ/(1 + δ).

Strictly speaking, Broder's paper [2] did not consider the general correlated sampling problem. Rather, it introduced the MinHash strategy, which can be interpreted as a correlated sampling strategy for the special case where P and Q are flat distributions, i.e., they are uniform over some subsets of Ω. In particular, if P = U(A) and Q = U(B) are distributions that are uniform over sets A, B ⊆ Ω, respectively, then the MinHash strategy gives an error probability of 1 − |A ∩ B|/|A ∪ B|, also known as the Jaccard distance between A and B. In the special case when |A| = |B|, this is equivalent to the bound above.
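A minimal sketch of the MinHash strategy for flat distributions (our own illustrative code; a shared random permutation plays the role of the shared randomness):

```python
import random

def minhash_output(S, pi):
    """Output the element of S that appears earliest in the shared permutation pi."""
    return min(S, key=pi.index)

def jaccard_distance(A, B):
    return 1 - len(A & B) / len(A | B)

Omega = list(range(10))
A, B = {0, 1, 2, 3}, {2, 3, 4, 5}            # Jaccard distance 1 - 2/6
rng = random.Random(1)
trials, disagree = 5000, 0
for _ in range(trials):
    pi = Omega[:]
    rng.shuffle(pi)                          # pi is the shared randomness
    if minhash_output(A, pi) != minhash_output(B, pi):
        disagree += 1
# disagree / trials concentrates around jaccard_distance(A, B)
```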
The technique can be generalized to other (non-flat) distributions to get the bound in Theorem 1.2, thereby yielding the strategy due to Kleinberg–Tardos and Holenstein. Several variants of this strategy (sometimes referred to as "consistent sampling" protocols) have been used in applied work, including [10, 4, 9, 5].
Given Theorem 1.2, a natural and basic question is whether the bound on the error can be improved; the only lower bound we are aware of is the one that arises from coupling, namely ε(δ ) ≥ δ . This question was recently raised by Rivest [13] in the context of a new encryption scheme and was one of the motivations for this work. We give a surprisingly simple proof that the bound in Theorem 1.2 is essentially tight.
Theorem 1.3. For all δ, γ ∈ (0, 1) and all sufficiently large n, any correlated sampling strategy for a set of size n has error ε satisfying ε(δ) ≥ (1 − γ) · 2δ/(1 + δ).

Organization of the paper. In Section 2, we prove Theorem 1.3. In Section 3, we consider the setting where Ω is of a fixed finite size, which was the question originally posed by Rivest [13]. In this regime, there turns out to be a surprising strategy that achieves better error than Theorem 1.2 in a very special case.
However, it was conjectured in [13] that in fact a statement like Theorem 1.3 holds in every other case and we make progress on this conjecture by proving it in one such case. For completeness, in Section 4 we describe the correlated sampling strategies of Broder, Kleinberg-Tardos, and Holenstein, underlying Theorem 1.2. We conclude with some more observations and open questions in Section 5.

Lower bound on correlated sampling
In order to prove Theorem 1.3, we first introduce the constrained agreement problem, a relaxation of the correlated sampling problem. In this problem, Alice and Bob are given sets A ⊆ Ω and B ⊆ Ω, respectively, where the pair (A, B) is sampled from some (known) distribution D. Alice and Bob are required to output elements a ∈ A and b ∈ B, respectively, so as to minimize the disagreement probability Pr_{(A,B)∼D}[a ≠ b]. This can be viewed as a relaxation of the correlated sampling problem by first restricting Definition 1.1 to flat distributions, and then relaxing the requirements {f(P, r)}_{r∼R} = P and {g(Q, r)}_{r∼R} = Q to only requiring that f(P, r) ∈ supp(P) and g(Q, r) ∈ supp(Q) for all r ∈ R. This makes it a constraint satisfaction problem, and we consider a distributional version of the same.
In the following definition, we use 2 Ω to denote the powerset of Ω.
Definition 2.1. A constrained agreement strategy for a finite set Ω and a probability distribution D over 2^Ω × 2^Ω is specified by a pair of functions f, g : 2^Ω → Ω such that f(A) ∈ A and g(B) ∈ B for all (A, B) in the support of D, and has error err_D(f, g) := Pr_{(A,B)∼D}[f(A) ≠ g(B)]. Note that since the constrained agreement problem is defined with respect to a (known) probability distribution D on pairs of sets, we can require, without loss of generality, that the strategies (f, g) be deterministic (since any randomized strategy can be derandomized with no degradation in the error).
In order to prove Theorem 1.3, we characterize the optimal constrained agreement strategy in the special case when D = D_p, where every element ω ∈ Ω is independently included in each of A and B with probability p.

Lemma 2.2. Let f*, g* : 2^[n] → [n] be the strategies f*(S) = g*(S) := min{i : i ∈ S} (with the output on the empty set chosen arbitrarily). Then every constrained agreement strategy (f, g) for [n] and D_p satisfies err_{D_p}(f, g) ≥ err_{D_p}(f*, g*).
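The optimizer turns out to be the greedy strategy that always outputs the minimum available element. The following Monte Carlo sketch (our own; the strategy names and parameter choices are made up) compares it against a naive randomized alternative on D_p:

```python
import random

def sample_Dp(n, p, rng):
    """Sample (A, B) ~ D_p: each i in [n] joins A and B independently w.p. p."""
    A = {i for i in range(n) if rng.random() < p}
    B = {i for i in range(n) if rng.random() < p}
    return A, B

def disagreement(n, p, f, g, trials, rng):
    """Monte Carlo estimate of Pr[f(A) != g(B)] over (A, B) ~ D_p."""
    bad = 0
    for _ in range(trials):
        A, B = sample_Dp(n, p, rng)
        if not A or not B:              # empty sets are vanishingly rare here
            continue
        bad += f(A, rng) != g(B, rng)
    return bad / trials

rng = random.Random(0)
greedy = lambda S, r: min(S)                  # the minimum-element strategy
naive = lambda S, r: r.choice(sorted(S))      # a naive randomized alternative
err_greedy = disagreement(50, 0.5, greedy, greedy, 4000, rng)
err_naive = disagreement(50, 0.5, naive, naive, 4000, rng)
# err_greedy comes out markedly smaller than err_naive
```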
Proof of Lemma 2.2. For ease of notation, let Ω = [n]. Let ( f , g) be a constrained agreement strategy.
We will construct functions f* and g* such that err_{D_p}(f, g) ≥ err_{D_p}(f*, g*). For every i ∈ [n], let β_i := Pr_B[g(B) = i]. Without loss of generality (by suitably permuting [n]), we can assume that β_1 ≥ β_2 ≥ · · · ≥ β_n. Since A and B are independently sampled in D_p, the agreement probability is Pr[f(A) = g(B)] = E_A[β_{f(A)}]. It follows that when Bob's strategy is fixed to g, the strategy of Alice that results in the largest agreement probability is simply f*(A) := argmax_{i∈A} β_i = min{i : i ∈ A} for all A ⊆ [n].
So far we have err_{D_p}(f, g) ≥ err_{D_p}(f*, g). We can repeat the same process again. For every i ∈ [n], define α_i := Pr_A[f*(A) = i]. Due to the specific choice of f*, it holds that α_i = (1 − p)^{i−1} · p, and hence α_1 ≥ α_2 ≥ · · · ≥ α_n. Thus, when Alice's strategy is fixed to f*, the strategy of Bob that results in the largest agreement probability is given by g*(B) := argmax_{i∈B} α_i = min{i : i ∈ B}. Thus, we conclude that err_{D_p}(f, g) ≥ err_{D_p}(f*, g) ≥ err_{D_p}(f*, g*).

Before turning to the proof of Theorem 1.3, we note a couple of basic facts. The following concentration bound, due to Sergei Bernstein from the 1920s, is often referred to as Chernoff's or Hoeffding's bound. The Bernoulli random variable X ∼ Ber(p) is a 0-1 random variable with Pr[X = 1] = p.
Fact 2.4 (see, e.g., Cor. 4.6 in [11]). For X_1, . . . , X_n drawn i.i.d. from Ber(p), it holds for all τ ∈ (0, 1) that

Pr[|Σ_{i=1}^n X_i − p·n| ≥ τ·p·n] ≤ 2·e^{−p·n·τ²/3}.

Proof of Theorem 1.3. Fix δ, γ ∈ (0, 1). Assume, for the sake of contradiction, that for infinitely many values of n, there is a correlated sampling strategy (f*, g*) for a set of size n with error ε(δ) < (1 − γ) · 2δ/(1 + δ). Let δ' ∈ (0, δ) be such that 2δ'/(1 + δ') > (1 − γ/2) · 2δ/(1 + δ), set p := 1 − δ', and sample (A, B) ∼ D_p over Ω = [n]. Applying Fact 2.4 with τ = n^{−0.2} to |A|, |B| and |A ∩ B| (the last being a sum of Ber(p²) variables), by the union bound and using p² ≤ p, we get that with probability at least 1 − 6·e^{−p²·n^{0.6}/3}, we have ||A| − p·n| ≤ p·n^{0.8}, ||B| − p·n| ≤ p·n^{0.8} and ||A ∩ B| − p²·n| ≤ p²·n^{0.8}. Thus, for the distributions P = U(A) and Q = U(B), it holds with probability at least 1 − 6·e^{−p²·n^{0.6}/3} that d_TV(P, Q) = 1 − |A ∩ B|/max(|A|, |B|) ≤ δ' + o_n(1) ≤ δ for large enough n. The assumed strategy (f*, g*) has disagreement probability at most ε(δ) when d_TV(P, Q) ≤ δ, and at most 1 otherwise. In our random choice of the pair of distributions (P, Q), the probability of d_TV(P, Q) > δ is at most o_n(1). Thus, the disagreement probability is at most ε(δ) + o_n(1) when the strategy is applied on the randomly sampled (P, Q). In particular, by averaging over the shared randomness, there exists a deterministic constrained agreement strategy (f, g) with no worse disagreement probability. That is, err_{D_p}(f, g) ≤ ε(δ) + o_n(1). But from Lemma 2.2 we have that err_{D_p}(f, g) ≥ err_{D_p}(f*, g*) = 2δ'/(1 + δ') − o_n(1). Putting the two together gives ε(δ) ≥ 2δ'/(1 + δ') − o_n(1) > (1 − γ) · 2δ/(1 + δ) for large enough n, contradicting our assumption.

Correlated sampling over a fixed set of finite size

Theorem 1.3 establishes that the correlated sampling strategy underlying Theorem 1.2 is nearly optimal for Ω that is sufficiently large in size. However, it does not say that the strategy underlying Theorem 1.2 is exactly optimal for a fixed set of finite size. The quest for understanding optimality in this setting was motivated by a new encryption scheme proposed by Rivest [13]. But as we will see shortly, this quest is not entirely straightforward! In order to elaborate on this, it will be useful to formally define restricted versions of correlated sampling strategies which are required to work only when the input pair (P, Q) is promised to lie in a given relation G ⊆ ∆_Ω × ∆_Ω.
Definition 3.1. For a finite set Ω and a relation G ⊆ ∆_Ω × ∆_Ω, a G-restricted correlated sampling strategy with error ε is specified by a probability space R and a pair of functions f, g : ∆_Ω × R → Ω such that the following hold for all pairs of distributions (P, Q) ∈ G:

1. {f(P, r)}_{r∼R} = P and {g(Q, r)}_{r∼R} = Q, and
2. Pr_{r∼R}[f(P, r) ≠ g(Q, r)] ≤ ε.

For example, letting G be the set of all pairs (P, Q) with d_TV(P, Q) ≤ δ essentially recovers the original setting of correlated sampling, for a fixed bound on the total variation distance between the input distributions.
For the rest of this section, we will consider a special kind of G-restriction corresponding to Alice and Bob having flat distributions over Ω = [n]: for parameters a, b, c, let G^n_{a,b,c} denote the set of all pairs (U(A), U(B)) with A, B ⊆ [n], |A| = a, |B| = b and |A ∩ B| = c. Recall from Theorem 2.3 that for all (P, Q) ∈ G^n_{a,b,c} with P = U(A) and Q = U(B), the total variation distance is given by d_TV(P, Q) = 1 − c/max(a, b). Moreover, the MinHash strategy applied on input pairs (P, Q) ∈ G^n_{a,b,c} has a disagreement probability of 1 − c/(a + b − c), the Jaccard distance between A and B. One might suspect that this is optimal for all values of n, a, b and c. But rather surprisingly, in the very special case where |A ∩ B| = 1 and |A ∪ B| = n, Rivest [13] gave a strategy with smaller error probability than the above! While we don't know of any applications for this strategy itself, its purpose here is to illustrate that there can be strategies which do better than the MinHash strategy in some special cases. For completeness, we describe this strategy in Section 3.1. Note that for (P, Q) ∈ G^{a+b−1}_{a,b,1}, Rivest's strategy achieves a disagreement probability of 1 − 1/max(a, b), which matches the total variation distance and is smaller than the Jaccard distance 1 − 1/(a + b − 1). This naturally leads to the question: Is there a better correlated sampling strategy for larger intersection sizes? In fact, the MinHash strategy was conjectured to be optimal in every other case (in particular, for all c > 1) by Rivest [13], as this is sufficient for proving the security of his proposed encryption scheme. As partial progress towards this conjecture, we prove that in the other extreme (as compared to Theorem 3.3), the above conjecture does hold. In particular, we show the following theorem in Section 3.2.

Correlated sampling strategy of Rivest [13]
In order to prove Theorem 3.3, we recall Philip Hall's "Marriage Theorem," in the following form: every d-regular bipartite graph is the disjoint union of d perfect matchings. Consider first the case a = b, so that Ω = [2a − 1]. Form the bipartite graph whose vertices on each side are the a-element subsets of Ω, with an edge between Ã and B̃ whenever |Ã ∩ B̃| = 1 and Ã ∪ B̃ = Ω (equivalently, B̃ = (Ω ∖ Ã) ∪ {x} for some x ∈ Ã). Each Ã has exactly a neighbors, one for each choice of the shared element x ∈ Ã, so the graph is a-regular and decomposes into perfect matchings M_1, . . . , M_a. The shared randomness is a uniformly random index r ∈ [a]: Alice outputs the unique element of A ∩ B̃, where B̃ is the partner of A in M_r, and Bob outputs the unique element of Ã ∩ B, where Ã is the partner of B in M_r. It is easy to see that both Alice's and Bob's outputs are uniformly distributed in A and B, respectively. Moreover, the probability that they output the same element is exactly 1/a, which is the probability of choosing the unique matching M_r which contains the edge (A, B) (i.e., enforcing Ã = A and B̃ = B).
The strategy in the general case of a ≠ b is obtained by a simple reduction to the case above. Suppose, without loss of generality, that a > b. Pad B with a − b fresh elements outside A ∪ B to obtain a set B̃ of size a with |A ∩ B̃| = 1 and |A ∪ B̃| = 2a − 1, and run the strategy above on the pair (A, B̃). Alice's output is then uniform over A, but Bob's output is uniform over B̃ and not B. To fix this, Bob can simply output a uniformly random element of B whenever the above strategy requires him to return an element of B̃ ∖ B. It is easy to see that this doesn't change the error probability. Let ω* ∈ Ω be the element that is most frequently output by Alice's strategy f, and denote its number of occurrences by m*.

Proof of Theorem 3.5
where the last inequality holds for all n ≥ 2, which contradicts Equation (3.1).
Remark. In [7], the correlated sampling strategy is used to give a randomized rounding procedure for a linear program. The factor 2 loss in the correlated sampling strategy translates into an integrality gap of at most 2. In fact, they also prove that the integrality gap is roughly tight. As pointed out by an anonymous reviewer, their proof essentially establishes Theorem 3.5 under the assumption that f = g.
Correlated sampling strategies of [2, 7, 6]

For the sake of completeness, we describe the correlated sampling strategies of Broder and of Kleinberg–Tardos and Holenstein, thereby proving Theorem 1.2. First consider the case of flat distributions P = U(A) and Q = U(B). Strategy: using the shared randomness, sample a uniformly random permutation π of Ω; Alice outputs f(A, π) := π(i_A), where i_A is the smallest index such that π(i_A) ∈ A, and Bob outputs g(B, π) := π(i_B), where i_B is the smallest index such that π(i_B) ∈ B. This is the MinHash strategy (Algorithm 2). In this case, it is easy to show that the strategy achieves an error probability of 1 − |A ∩ B|/|A ∪ B|. Since π is a random permutation, f(A, π) is uniformly distributed over A and g(B, π) is uniformly distributed over B. Let i_0 be the smallest index such that π(i_0) ∈ A ∪ B. The probability that π(i_0) ∈ A ∩ B is exactly |A ∩ B|/|A ∪ B|, and this happens precisely when f(A, π) = g(B, π). Hence, we get the claimed error probability.

The correlated sampling strategy of [7, 6] follows a similar approach. Proof of Theorem 1.2. Given a finite set Ω and probability distributions P and Q over Ω, define A := {(ω, t) ∈ Ω × [0, 1] : t ≤ P(ω)} and B := {(ω, t) ∈ Ω × [0, 1] : t ≤ Q(ω)}. The strategy of [7, 6] can be intuitively understood as follows. Alice and Bob use the MinHash strategy on inputs A and B over Ω × [0, 1] to obtain elements (ω_A, p_A) and (ω_B, p_B), respectively, and simply output ω_A and ω_B, respectively. However, this by itself is not well defined, since Ω × [0, 1] is not a finite set. Nevertheless, the MinHash strategy can be modified to instead use a (countably) infinite sequence of points sampled i.i.d. from the uniform distribution over Ω × [0, 1], instead of a permutation π: each player outputs the first point of the sequence that lands in their own set. This strategy is summarized in Algorithm 3. The analysis of the MinHash strategy carries over, giving an error probability of 1 − Σ_ω min(P(ω), Q(ω)) / Σ_ω max(P(ω), Q(ω)) = 1 − (1 − δ)/(1 + δ) = 2δ/(1 + δ), where δ = d_TV(P, Q).
We can ignore the possibility that no index i_A exists satisfying (ω_{i_A}, r_{i_A}) ∈ A (and similarly for B), since this happens with probability 0.

Discussion and open questions
An immediate open question is to resolve Conjecture 3.4. We reflect on some further open questions that are raised by the results discussed in this paper.

Case of negatively correlated sets
In the context of Conjecture 3.4, even in the setting where the set sizes are allowed to vary slightly, our knowledge is somewhat incomplete. Lemma 2.2 shows optimality of the MinHash strategy when (A, B) ∼ D_p: in this case, A and B are independent and each of them is p-biased, so |A| ≈ p·n, |B| ≈ p·n and |A ∩ B| ≈ p²·n. A simple reduction to Lemma 2.2 also implies the optimality of the MinHash strategy in the case where A and B are positively correlated. Specifically, for a parameter α > p, consider the following distribution D_{p,α} on pairs (A, B) of subsets of [n]: first sample S ⊆ [n] by independently including each element of [n] with probability p/α, and then independently include every i ∈ S in each of A and B with probability α. In this case, |A| ≈ p·n, |B| ≈ p·n and |A ∩ B| ≈ αp·n > p²·n.
Even if we reveal S to both Alice and Bob, Lemma 2.2 implies a lower bound of 2δ/(1 + δ) on the error probability (for large enough n). It is unclear if the optimality holds in the case where A and B are negatively correlated, i.e., when |A| ≈ p·n, |B| ≈ p·n and |A ∩ B| ≪ p²·n.

Fine-grained understanding of G-restricted correlated sampling
As alluded to in the Introduction, in the setting where P and Q are flat distributions on subsets of Ω of different sizes, there is a strategy with lower error than that provided by Theorem 1.2. In particular, for P = U(A) and Q = U(B), the MinHash strategy gives an error probability of 1 − |A ∩ B|/|A ∪ B| (which is the Jaccard distance between A and B). However, naïvely using the strategy of Kleinberg–Tardos and Holenstein would give an error probability of 2·d_TV(P, Q)/(1 + d_TV(P, Q)), which is higher than the Jaccard distance when |A| ≠ |B|. This implies that the strategy of Kleinberg–Tardos and Holenstein is not always optimal. Thus, it will be interesting to identify the right measure that captures the minimum error of a general G-restricted correlated sampling strategy.
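To make the comparison concrete (our own numerical check, using the standard formula d_TV(U(A), U(B)) = 1 − |A ∩ B|/max(|A|, |B|)): for flat distributions over sets of different sizes, the Jaccard distance can be strictly smaller than 2δ/(1 + δ).

```python
from fractions import Fraction

def jaccard_distance(a, b, c):
    """1 - |A∩B|/|A∪B| for |A| = a, |B| = b, |A∩B| = c."""
    return Fraction(a + b - 2 * c, a + b - c)

def tv_flat(a, b, c):
    """d_TV(U(A), U(B)) = 1 - c / max(a, b)."""
    return 1 - Fraction(c, max(a, b))

a, b, c = 10, 4, 2
delta = tv_flat(a, b, c)
kt_bound = 2 * delta / (1 + delta)
# jaccard_distance(a, b, c) = 5/6, while kt_bound = 8/9: MinHash does better here
```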

Correlated sampling for infinite spaces
While this paper dealt with correlated sampling for finite sets Ω, it might also be interesting to study it for infinite sets. This needs to be defined carefully in a measure theoretic sense, which could be done as follows. Consider a measure space (Ω, F, µ), where Ω is the sample space, F is a σ -algebra over Ω and µ is a finite measure on (Ω, F). Let ∆ (Ω,F,µ) be the set of all probability measures over (Ω, F) that are absolutely continuous with respect to µ. The respective inputs of Alice and Bob are probability measures P and Q in ∆ (Ω,F,µ) . A correlated sampling strategy for (Ω, F, µ) is given by a pair of functions f , g : ∆ (Ω,F,µ) × R → Ω, where f and g are required to be measurable in their second argument r ∈ R. In order to define the error guarantee in terms of Pr r∼R [ f (P, r) = g(Q, r)], however, we require that the event {(ω, ω) : ω ∈ Ω} be measurable in (Ω × Ω, F ⊗ F). This is true, for example, when Ω is a separable metric space equipped with the standard Borel algebra (see Chapters 3,4 in [14]). We will assume this to be the case in the discussion henceforth and it might be useful to keep in mind a concrete example such as Ω = [0, 1], equipped with the Lebesgue measure.
To the best of our knowledge, it remains open whether there exists a correlated sampling strategy for general measure spaces (Ω, F, µ) with any non-trivial error bound, that is, to get ε(δ ) < 1 for all δ < 1. This is in sharp contrast to coupling, where any two probability measures P and Q with d TV (P, Q) = δ over (Ω, F) can be coupled with a disagreement probability of at most δ .
Suppose the inputs P and Q are promised to be such that the corresponding Radon-Nikodym derivatives (a. k. a. densities) dP/dµ and dQ/dµ are bounded everywhere by a known constant c. Then it is possible to generalize the strategy of Kleinberg-Tardos and Holenstein (Algorithm 3) and get the same error guarantee as in Theorem 1.2; this can be done by using µ instead of the uniform measure on Ω and replacing [0, 1] by [0, c].
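Under this bounded-density promise, the generalization can be sketched for Ω = [0, 1] with the Lebesgue measure (our own illustration; the constant c and the particular densities are made-up assumptions): the shared stream now consists of i.i.d. points (ω_i, r_i) uniform over [0, 1] × [0, c], and each player outputs the first ω_i falling under their own density.

```python
import random

def correlated_sample_density(pdf, stream):
    """Output the first shared point (omega, r) lying under pdf's graph.

    Assumes pdf is a probability density on [0, 1] bounded by the constant c
    used to generate the shared stream of (omega, r) pairs.
    """
    for omega, r in stream:
        if r <= pdf(omega):
            return omega
    raise RuntimeError("shared stream exhausted (make it longer)")

c = 2.0                            # promised bound on both densities
p = lambda x: 2.0 * x              # density of P on [0, 1]
q = lambda x: 2.0 - 2.0 * x        # density of Q on [0, 1]

rng = random.Random(0)
stream = [(rng.random(), c * rng.random()) for _ in range(200)]
a = correlated_sample_density(p, stream)   # Alice's output, distributed as P
b = correlated_sample_density(q, stream)   # Bob's output, distributed as Q
```

The two players agree exactly when the first accepted point lies under both densities, which is what yields the Theorem 1.2 guarantee in this setting.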
However, the problem gets challenging if there is no promised upper bound on the Radon-Nikodym derivatives. One explanation for why this challenge is not faced in obtaining a coupling is because knowing both P and Q, we can always take µ = (P + Q)/2 as a measure with respect to which both P and Q are absolutely continuous and more strongly, the Radon-Nikodym derivatives dP/dµ and dQ/dµ are never greater than 2. On the other hand, for correlated sampling, the players do not have access to such a common µ .
It might also be interesting to study a generalized notion of error for correlated sampling strategies, where we wish to minimize E_{r∼R}[d(f(P, r), g(Q, r))] for some metric d : Ω × Ω → R_{≥0} over Ω. The error guarantee studied in this paper corresponds to the discrete metric d(x, y) = 1{x ≠ y}. For Ω ⊆ R, such as Ω = [0, 1], we might alternatively want to consider d(x, y) = |x − y|. Since a correlated sampling strategy induces a coupling, this notion of error can never be lower than the Wasserstein distance W_1(P, Q) (also known as the Earth-Mover distance) between the distributions P and Q. To the best of our knowledge, it remains open in this setting whether correlated sampling strategies can achieve an error that is never larger than some function of W_1(P, Q).

About the authors

Ron has current research interests in cryptography, computer and network security, voting systems, and algorithms. In the past he has also worked extensively in the area of machine learning.
Ron is a coauthor (with Thomas Cormen, Charles Leiserson, and Clifford Stein) of the well-known text Introduction to Algorithms, published by MIT Press. Over 500,000 copies of this text have been sold. It has been translated into 12 languages.
Ron is an inventor of the RSA public-key cryptosystem. He has extensive experience in cryptographic design and cryptanalysis, and has published numerous papers in these areas. He is a founder of RSA Data Security (RSA was bought by Security Dynamics; the combined company was renamed RSA Security, and later purchased by EMC), and is also a co-founder of Verisign and of Peppercoin.
Ron is a member of the CalTech/MIT Voting Technology Project. He served 2004-2009 on the Technical Guidelines Development Committee (TGDC), advisory to the Election Assistance Commission, developing recommendations for voting system certification standards; he was chair of the TGDC's Computer Security and Transparency Subcommittee. He also serves on the Board of the Verified Voting Foundation. He is a member of a Scantegrity team developing and testing voting systems that are verifiable "end-to-end." He has worked extensively on statistical post-election tabulation audits, of both the "risk-limiting audit" and "Bayesian" flavors.
Ron is a member of the Center for Science of Information.

While Madhu's research explores communication and computational complexity, he prefers simplicity, and is especially proud of his exposition (with Peter Gemmell) of the Berlekamp–Welch decoding algorithm and his exposition (with David Xiang) of the analysis of the Lempel–Ziv compression algorithm.