A communication game related to the sensitivity conjecture

One of the major outstanding foundational problems about boolean functions is the sensitivity conjecture, which (in one of its many forms) asserts that the degree of a boolean function (i.e. the minimum degree of a real polynomial that interpolates the function) is bounded above by some fixed power of its sensitivity (which is the maximum vertex degree of the graph defined on the inputs where two inputs are adjacent if they differ in exactly one coordinate and their function values are different). We propose an attack on the sensitivity conjecture in terms of a novel two-player communication game. A lower bound of the form $n^{\Omega(1)}$ on the cost of this game would imply the sensitivity conjecture. To investigate the problem of bounding the cost of the game, three natural (stronger) variants of the question are considered. For two of these variants, protocols are presented that show that the hoped-for lower bound does not hold. These protocols satisfy a certain monotonicity property, and (in contrast to the situation for the two variants) we show that the cost of any monotone protocol satisfies a strong lower bound. There is an easy upper bound of $\sqrt{n}$ on the cost of the game. We also improve slightly on this upper bound.

1. Introduction
1.1. A Communication Game. The focus of this paper is a somewhat unusual cooperative two-player communication game. The game is parameterized by a positive integer $n$ and is denoted $G_n$. Alice receives a permutation $\sigma = (\sigma_1, \ldots, \sigma_n)$ of $[n] = \{1, \ldots, n\}$ and a bit $b \in \{0, 1\}$, and sends Bob a message (which is restricted in a way that will be described momentarily). Bob receives the message from Alice and outputs a subset $J$ of $[n]$ that must include $\sigma_n$, the last element of the permutation. The cost to Alice and Bob is the size $|J|$ of this set.
The message sent by Alice is constrained as follows: Alice constructs an array $v$ consisting of $n$ cells, which we will refer to as locations, where each location $v_\ell$ is initially empty, denoted by $v_\ell = *$. Alice gets the input as a data stream $\sigma_1, \ldots, \sigma_n, b$ and is required to fill the cells of $v$ in the order specified by $\sigma$. After receiving $\sigma_i$ for $i < n$, Alice fills location $\sigma_i$ with 0 or 1; once written, this cannot be changed. Upon receiving $\sigma_n$ and $b$, Alice writes $b$ in location $\sigma_n$. The message Alice sends to Bob is the completed array in $\{0, 1\}^n$.
A protocol $\Pi$ is specified by Alice's algorithm for filling in the array and Bob's function mapping the received array to the set $J$. The cost $c(\Pi)$ of a protocol is the maximum of the output size $|J|$ over all inputs $\sigma_1, \ldots, \sigma_n, b$. For example, consider the following protocol. Let $k = \lceil \sqrt{n} \rceil$. Alice and Bob fix a partition of the locations of $v$ into $k$ blocks, each of size at most $k$. Alice fills $v$ as follows: when $\sigma_i$ arrives, if $\sigma_i$ is the last location of its block to arrive, then she fills the entry with 1; otherwise she fills it with 0. Notice that if $b = 1$ then the final array $v$ will have a single 1 in each block, while if $b = 0$ then $v$ will have a unique all-0 block (the block containing $\sigma_n$).
Bob chooses $J$ as follows: if there is an all-0 block, then $J$ is set to be that block; otherwise $J$ is set to be the set of locations containing 1's. It is clear that $\sigma_n \in J$, and so this is a valid protocol. In all cases the size of $J$ is at most $k$, and so the cost of the protocol is $\lceil \sqrt{n} \rceil$. We will refer to this protocol as the AND-OR protocol. In Section 2.1 we remark on this protocol's connection to the boolean function $\bigwedge_{i=1}^{k} \bigvee_{\ell \in B_i} x_\ell$, where $B_1, \ldots, B_k$ are the blocks of the partition.
Let us define $C(n)$ to be the minimum cost of any protocol for $G_n$. We are interested in the growth rate of $C(n)$ as a function of $n$. In particular, we propose:
Question 1. Is there a $\delta > 0$ such that $C(n) = \Omega(n^{\delta})$?
1.2. Connection to the Sensitivity Conjecture. Why consider such a strange game? The motivation is that the game provides a possible approach to the well-known sensitivity conjecture from boolean function complexity.
Recall that the sensitivity of an $n$-variate boolean function $f$ at an input $x$, denoted $s_x(f)$, is the number of locations $\ell$ such that flipping the bit of $x$ in location $\ell$ changes the value of the function. (Equivalently, this is the number of neighbors of $x$ in the Hamming graph whose $f$-value differs from $f(x)$.) The sensitivity of $f$, $s(f)$, is the maximum of $s_x(f)$ over all boolean inputs $x$.
The degree of a function $f$, $\deg(f)$, is the smallest degree of a (real) polynomial $p$ in variables $x_1, \ldots, x_n$ that agrees with $f$ on the boolean cube. An easy argument (given in Section 2) connects the cost function $C(n)$ of the game $G_n$ to the sensitivity conjecture:
Proposition 3. For any boolean function $f$ on $n$ variables, $s(f) \geq C(\deg(f))$.
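In particular, an affirmative answer to Question 1 would imply the sensitivity conjecture: since $C(m) \geq 1$ for all $m$, such an answer gives a constant $c > 0$ with $C(m) \geq c\, m^{\delta}$ for all $m \geq 1$, and then Proposition 3 yields $\deg(f) \leq (s(f)/c)^{1/\delta}$ for every boolean function $f$. We note that Andy Drucker [5] independently formulated the above communication game and observed its connection to the sensitivity conjecture.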
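For very small examples both measures can be computed by brute force. The Python sketch below is only illustrative (its running time is exponential in $n$); it assumes a boolean function is given as a callable on 0/1 tuples, and the helper names `sensitivity`, `degree` and `and_of_ors` are hypothetical. The degree is read off from the multilinear coefficients $c_S = \sum_{T \subseteq S} (-1)^{|S|-|T|} f(1_T)$.

```python
from itertools import product


def sensitivity(f, n):
    """s(f): max over x in {0,1}^n of the number of single-bit flips that change f."""
    best = 0
    for x in product((0, 1), repeat=n):
        s_x = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for i in range(n))
        best = max(best, s_x)
    return best


def degree(f, n):
    """deg(f): largest |S| whose multilinear coefficient c_S is nonzero."""
    deg = 0
    for s_bits in product((0, 1), repeat=n):
        S = [i for i in range(n) if s_bits[i]]
        c_S = 0
        for t_bits in product((0, 1), repeat=len(S)):  # T ranges over subsets of S
            x = [0] * n
            for i, b in zip(S, t_bits):
                x[i] = b
            c_S += (-1) ** (len(S) - sum(t_bits)) * f(tuple(x))
        if c_S != 0:
            deg = max(deg, len(S))
    return deg


def and_of_ors(x):
    """(x0 or x1) and (x2 or x3): the function behind the AND-OR protocol for n = 4."""
    return int((x[0] or x[1]) and (x[2] or x[3]))


print(degree(and_of_ors, 4), sensitivity(and_of_ors, 4))  # 4 2
```

Here the function has full degree 4 but sensitivity only $2 = \lceil \sqrt{4} \rceil$.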
1.3. Background on the Sensitivity Conjecture. Sensitivity and degree belong to a large class of complexity measures for boolean functions that seek to quantify, for each function $f$, the amount of knowledge about individual variables needed to evaluate $f$. Other such measures include decision tree complexity and its randomized and quantum variants, certificate complexity, and block sensitivity. The value of such a measure is at most the number of variables. There is a long line of research aimed at bounding one such measure in terms of another. For measures $a$ and $b$, let us write $a \leq_r b$ if there are constants $C_1, C_2$ such that for every total boolean function $f$, $a(f) \leq C_1 b(f)^r + C_2$. For example, the decision tree complexity of $f$, $D(f)$, is at least its degree $\deg(f)$, and thus $\deg \leq_1 D$. It is also known [11] that $D \leq_3 \deg$. We say that $a$ is polynomially bounded by $b$ if $a \leq_r b$ for some $r > 0$, and that $a$ and $b$ are polynomially equivalent if each is polynomially bounded by the other.
The measures mentioned above, with the notable exception of sensitivity, are known to be polynomially equivalent. For example, writing $\mathrm{bs}$ for block sensitivity, Nisan and Szegedy [13] proved $\mathrm{bs} \leq_2 \deg$, and also proved a result in the other direction, which was improved in [2] to $\deg \leq_3 \mathrm{bs}$. For a survey of such results, see [3] and [8]; some recent results include [1, 6].
The sensitivity conjecture, posed as a question by Nisan [12], asserts that $s(f)$ is polynomially equivalent to at least one (and therefore all) of the other measures mentioned. There are many reformulations and related conjectures; see [8] for a survey.
The sensitivity conjecture perhaps more commonly appears as a question about the relationship between sensitivity and block sensitivity. For example, Nisan and Szegedy [13] asked specifically whether $\mathrm{bs}(f) = O(s(f)^2)$ for all functions, and as of this writing no counterexample has been given. The best known bound relating sensitivity to another measure was given by Kenyon and Kutin [9], who proved that $\mathrm{bs}(f) \leq \frac{e}{\sqrt{2\pi}}\, e^{s(f)} \sqrt{s(f)}$ for all boolean functions.
1.4. Outline of the Paper. In Section 2 we prove that a positive answer to Question 1 would imply the sensitivity conjecture. We show that adversary arguments for proving that boolean functions are evasive (that is, have decision tree complexity $D(f) = n$) provide strategies for the communication game. We also prove that it suffices to answer Question 1 for the restricted class of order oblivious protocols.
In Section 3 we present three stronger variants of Question 1. We exhibit protocols showing that two of these variants have negative answers. One might then expect that variants of one of these protocols would lead to a negative answer to Question 1. However, we observe that these protocols satisfy a property called monotonicity, and in Section 4 we prove a $\lfloor \sqrt{n} \rfloor$ lower bound on the cost of any monotone protocol. Thus a protocol that gives a negative answer to Question 1 must look quite different from the two protocols that refuted the strengthenings. We also prove a rather weak lower bound for a special class of protocols called assignment oblivious protocols. Finally, in Section 5 we construct a protocol with cost $0.8\sqrt{n}$, thus beating the AND-OR protocol by a constant factor. Let $r(k) = \log C(k) / \log k$. After a preliminary version of our paper appeared, Szegedy [14] showed that for any $k$, $C(n) = O(n^{r(k)})$. Our example shows that there is a $k$ for which $r(k) < 1/2$, and so it follows that $C(n) = O(n^{1/2 - \delta})$ for some $\delta > 0$. Szegedy further showed that $C(30) \leq 5$, which gives the best currently known upper bound $C(n) = O(n^{0.4732})$.
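As a quick arithmetic check of the exponent, the sketch below evaluates $r(k) = \log C(k)/\log k$ (the base of the logarithm cancels) from an upper bound on $C(k)$, using the bound $C(30) \le 5$ quoted above; the function name `exponent` is ours.

```python
import math


def exponent(k, cost_bound):
    """r(k) = log C(k) / log k, given an upper bound on C(k)."""
    return math.log(cost_bound) / math.log(k)


print(f"{exponent(30, 5):.4f}")             # 0.4732, from C(30) <= 5
print(f"{exponent(9, math.isqrt(9)):.4f}")  # 0.5000: the AND-OR protocol alone gives 1/2
```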

Connection between the Sensitivity Conjecture and the Game
In this section we prove Proposition 3, which connects the sensitivity conjecture with the two player game described in the introduction.
We use $e_\ell$ to denote the assignment in $\{0, 1\}^n$ that is 1 in location $\ell$ and 0 elsewhere. Given $v, w \in \{0, 1\}^n$, $v \oplus w$ denotes their bitwise mod-2 sum.
Alice's strategy maps the permutation-bit pair $(\sigma, b)$ to a boolean array $v$, and Bob's strategy maps the array $v$ to a subset of $[n]$. We now show that for each strategy for Alice there is a canonical best strategy for Bob. For a permutation $\sigma$, $\Pi_A(\sigma)$ denotes the array Alice writes while receiving $\sigma_1, \ldots, \sigma_{n-1}$ (so location $\sigma_n$ is labeled $*$). Thus $\Pi_A(\sigma)$ can be viewed as an edge in the Hamming graph $H_n$, whose vertex set is $\{0, 1\}^n$, with two vertices adjacent if they differ in one coordinate. The edge set $E(\Pi)$ of a protocol $\Pi$ is the set of edges $\Pi_A(\sigma)$ over all permutations $\sigma$; this defines a subgraph of $H_n$. Given Alice's output $v$, the possible values for $\sigma_n$ are precisely those locations $\ell$ such that $(v, v \oplus e_\ell)$ is an edge in $E(\Pi)$. Thus the best strategy for Bob is to output this set of locations, and it follows that $c(\Pi)$ is equal to the maximum vertex degree of the graph $E(\Pi)$.
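To see Bob's canonical strategy in action, here is a minimal brute-force sketch in Python (feasible only for very small $n$). It assumes Alice's strategy is a function `alice(prefix, v, loc)` returning the bit for location `loc` given the arrivals so far and the current partial array, with locations numbered from 0. These names, and the choice of blocks $\{0, \ldots, k-1\}, \{k, \ldots, 2k-1\}, \ldots$ (at most $k$ of them) for the AND-OR protocol, are illustrative assumptions.

```python
import math
from collections import defaultdict
from itertools import permutations


def partial_array(alice, sigma):
    """Pi_A(sigma): Alice fills sigma_1..sigma_{n-1}; location sigma_n stays None ('*')."""
    v = [None] * len(sigma)
    for i, loc in enumerate(sigma[:-1]):
        v[loc] = alice(sigma[: i + 1], tuple(v), loc)
    return tuple(v)


def bob_sets(alice, n):
    """Bob's canonical sets: J(w) = {l : (w, w xor e_l) is an edge of E(Pi)}."""
    J = defaultdict(set)
    for sigma in permutations(range(n)):
        v = list(partial_array(alice, sigma))
        for b in (0, 1):                 # both endpoints of the edge Pi_A(sigma)
            v[sigma[-1]] = b
            J[tuple(v)].add(sigma[-1])
    return J


def cost(alice, n):
    """c(Pi) = maximum vertex degree of the graph E(Pi)."""
    return max(len(s) for s in bob_sets(alice, n).values())


def and_or_alice(n):
    """AND-OR rule with blocks {0..k-1}, {k..2k-1}, ... and k = ceil(sqrt(n))."""
    k = math.isqrt(n - 1) + 1
    def alice(prefix, v, loc):
        mates = [m for m in range(n) if m // k == loc // k and m != loc]
        return int(all(v[m] is not None for m in mates))  # 1 iff loc is last in block
    return alice


for n in (4, 5, 6, 7):
    print(n, cost(and_or_alice(n), n))   # 4 2 / 5 3 / 6 3 / 7 3
```

The maximum degree matches $\lceil \sqrt{n} \rceil$ for these small cases.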
Proposition 3 will therefore follow from the following claim: given a boolean function with degree $n$ and sensitivity $s$, there is a strategy $\Pi$ for Alice in the game $G_n$ such that the graph $E(\Pi)$ has maximum degree at most $s$.
We need a few preliminaries. A subfunction of a boolean function $f$ is a function $g$ obtained from $f$ by fixing some of the variables of $f$ to 0 or 1. For a subfunction $g$ of $f$, $s(f) \geq s(g)$. We say a function $f$ has full degree if $\deg(f)$ is equal to the number of variables of $f$. We start by recalling some well-known facts.

Lemma 4.
For any boolean function $f$ there exists a subfunction $g$ on $\deg(f)$ variables that has full degree.
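Proof. If $p$ is the (unique) multilinear real polynomial that agrees with $f$ on the boolean cube, then $p$ contains a monomial $\prod_{\ell \in S} x_\ell$ with nonzero coefficient, where $|S| = \deg(f)$. Let $g$ be the function obtained by fixing the variables in $[n] \setminus S$ to 0. This kills every monomial containing a variable outside $S$ but leaves the coefficient of $\prod_{\ell \in S} x_\ell$ unchanged, so $g$ is a function on $\deg(f)$ variables that has full degree.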
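The construction in this proof can be sketched by brute force: compute all multilinear coefficients, take a largest monomial $S$, and fix every other variable to 0. The Python below is a small illustration under the assumption that functions are callables on 0/1 tuples; `coefficients`, `full_degree_subfunction` and the example function are hypothetical names.

```python
from itertools import product


def coefficients(f, n):
    """Nonzero multilinear coefficients c_S of f, keyed by frozenset S."""
    coeffs = {}
    for s_bits in product((0, 1), repeat=n):
        S = [i for i in range(n) if s_bits[i]]
        c = 0
        for t_bits in product((0, 1), repeat=len(S)):
            x = [0] * n
            for i, b in zip(S, t_bits):
                x[i] = b
            c += (-1) ** (len(S) - sum(t_bits)) * f(tuple(x))
        if c != 0:
            coeffs[frozenset(S)] = c
    return coeffs


def full_degree_subfunction(f, n):
    """Lemma 4: fix every variable outside a largest monomial S to 0."""
    coeffs = coefficients(f, n)
    S = sorted(max(coeffs, key=len)) if coeffs else []

    def g(y):  # y is a 0/1 tuple giving the values of the variables in S
        x = [0] * n
        for i, b in zip(S, y):
            x[i] = b
        return f(tuple(x))

    return g, S


def example(x):
    """(x0 AND x1) XOR x2 on 4 variables; x3 is irrelevant, so deg = 3 < 4."""
    return (x[0] & x[1]) ^ x[2]


g, S = full_degree_subfunction(example, 4)
print(S, max(len(T) for T in coefficients(g, len(S))))  # [0, 1, 2] 3
```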

Lemma 5.
Given a function $f$ with full degree and a location $\ell$, there exists a bit $b$ such that the function obtained from $f$ by fixing $x_\ell = b$ is also of full degree.
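Proof. The multilinear polynomial for $f$ (viewed as a function from $\{0, 1\}^n$ to $\{0, 1\}$) may be written in the form $p(x) = p_1(x_1, \ldots, \widehat{x_\ell}, \ldots, x_n) + x_\ell\, p_2(x_1, \ldots, \widehat{x_\ell}, \ldots, x_n)$. Here $p_1(x_1, \ldots, \widehat{x_\ell}, \ldots, x_n)$ indicates that the variable $x_\ell$ is not an input to the polynomial. If $p_1$ has a nonzero coefficient on the monomial $\prod_{k \neq \ell} x_k$, then we set $x_\ell = 0$; the resulting polynomial is $p_1$, so the resulting function has full degree. Otherwise, note that $p_2$ must have a nonzero coefficient on $\prod_{k \neq \ell} x_k$, because $f$ has full degree. Thus, setting $x_\ell = 1$ yields the polynomial $p_1 + p_2$, whose coefficient on $\prod_{k \neq \ell} x_k$ equals that of $p_2$ and is therefore nonzero, so the resulting function has full degree. The proof of this lemma is essentially the same as the standard argument that the decision tree complexity of any function $f$ is at least $\deg(f)$.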
We are now ready to prove Proposition 3.
Proof. Let $d = \deg(f)$ and, by Lemma 4, let $g$ be a subfunction of $f$ on $d$ variables with full degree. We construct a protocol $\Pi$ for $G_d$ that satisfies $E(\Pi) \subseteq E(g)$, where $E(g)$ denotes the set of sensitive edges of $g$, i.e. the edges of $H_d$ whose endpoints are mapped to different values by $g$. This implies $c(\Pi) \leq s(g) \leq s(f)$, and thus proves the proposition. As Alice receives $\sigma_1, \sigma_2, \ldots, \sigma_d$, she fills in $v$ so that the restriction of $g$ to each successive partial assignment remains a full degree function, which is possible by Lemma 5. After Alice fills location $\sigma_{d-1}$, the function $g$ restricted to $v$ is a full degree, and hence nonconstant, function of one variable, and so the edge $\Pi_A(\sigma)$ is a sensitive edge for $g$. This implies that $E(\Pi) \subseteq E(g)$, as required.
The proof shows that a degree $n$ boolean function having sensitivity $s$ can be converted into a strategy for Alice for the game $G_n$ of cost at most $s$. We don't know whether this connection goes the other way, i.e., we can't rule out the possibility that the answer to Question 1 is negative (there is a very low cost protocol for $G_n$) but the sensitivity conjecture is still true.
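The strategy in this proof can be simulated for a tiny full-degree function. The sketch below is a hedged illustration, not the authors' code: it assumes functions are callables on 0/1 tuples and uses hypothetical helper names. Alice tries the bit 0 and keeps it whenever the coefficient of the product of the remaining free variables stays nonzero (i.e. the restriction keeps full degree); otherwise Lemma 5 guarantees that 1 works. The check confirms that every edge of $E(\Pi)$ is sensitive.

```python
from itertools import permutations, product


def top_coefficient(f, n, fixed):
    """Coefficient of the product of the free variables in f restricted by `fixed`.

    The restriction has full degree exactly when this coefficient is nonzero.
    """
    free = [i for i in range(n) if i not in fixed]
    total = 0
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, b in fixed.items():
            x[i] = b
        for i, b in zip(free, bits):
            x[i] = b
        total += (-1) ** (len(free) - sum(bits)) * f(tuple(x))
    return total


def alice_from_full_degree(f, n):
    """Alice's rule from the proof: keep the restriction of f full degree (Lemma 5)."""
    def alice(prefix, v, loc):
        fixed = {i: b for i, b in enumerate(v) if b is not None}
        fixed[loc] = 0
        return 0 if top_coefficient(f, n, fixed) != 0 else 1  # else 1 works by Lemma 5
    return alice


def protocol_edges(alice, n):
    """E(Pi): every edge Pi_A(sigma) as its (0-endpoint, 1-endpoint) pair."""
    edges = set()
    for sigma in permutations(range(n)):
        v = [None] * n
        for i, loc in enumerate(sigma[:-1]):
            v[loc] = alice(sigma[: i + 1], tuple(v), loc)
        lo, hi = list(v), list(v)
        lo[sigma[-1]], hi[sigma[-1]] = 0, 1
        edges.add((tuple(lo), tuple(hi)))
    return edges


def max_degree(edges):
    """Maximum vertex degree of an edge set given as endpoint pairs."""
    deg = {}
    for lo, hi in edges:
        for w in (lo, hi):
            deg[w] = deg.get(w, 0) + 1
    return max(deg.values())


def and_of_ors(x):
    """(x0 or x1) and (x2 or x3): full degree 4, sensitivity 2."""
    return int((x[0] or x[1]) and (x[2] or x[3]))


E = protocol_edges(alice_from_full_degree(and_of_ors, 4), 4)
print(all(and_of_ors(lo) != and_of_ors(hi) for lo, hi in E), max_degree(E))  # True 2
```

For this function the resulting strategy coincides with the AND-OR protocol for $n = 4$.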

Connection to Decision Tree Complexity.
An $n$-variate boolean function is evasive if its decision tree complexity is $n$. A common method for proving evasiveness is via an adversary argument. View the problem of evaluating the function by a decision tree as a game between the querier, who wishes to evaluate the function and decides which variable to read next, and the adversary, who decides the value of the variable. A function is evasive if there is a strategy for the adversary that forces the querier to ask all $n$ questions. For example, to prove that the AND of ORs function $\bigwedge_{i=1}^{k} \bigvee_{\ell \in B_i} x_\ell$ (for a partition of the variables into blocks $B_1, \ldots, B_k$) is evasive, the adversary can use the strategy: answer 0 to every variable unless the variable is the last variable in its block to be queried, in which case answer 1. This adversary is exactly Alice's part of the AND-OR protocol described in the introduction. For more examples of adversary arguments, see [10].
Every evasive function $f$ by definition admits an adversary argument, and this corresponds to a protocol $\Pi$ for Alice. In fact, a function $f$ is evasive if and only if there exists a protocol $\Pi$ for which $E(\Pi) \subseteq E(f)$ (recall that $E(f)$ is the set of sensitive edges of the function $f$), and in that case the cost (the size of the set chosen by Bob) is at most the sensitivity of $f$. This work explores whether we can use the structure of an arbitrary adversary (or protocol) to exhibit a lower bound on sensitivity.
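Evasiveness of small functions can be checked directly by computing $D(f)$ as a minimax over partial assignments, which is exactly the querier/adversary game described above. The Python below is a brute-force sketch for tiny $n$ only; the function names and the tuple-over-$\{0, 1, \text{None}\}$ encoding of partial assignments are our own assumptions.

```python
from functools import lru_cache
from itertools import product


def decision_tree_complexity(f, n):
    """D(f) by exhaustive minimax over partial assignments (tuples over {0, 1, None})."""

    @lru_cache(maxsize=None)
    def D(partial):
        free = [i for i in range(n) if partial[i] is None]
        values = set()
        for bits in product((0, 1), repeat=len(free)):
            x = list(partial)
            for i, b in zip(free, bits):
                x[i] = b
            values.add(f(tuple(x)))
        if len(values) == 1:            # value already determined: no more queries
            return 0
        # the querier picks the variable, the adversary picks the worse answer
        return min(
            1 + max(D(partial[:i] + (b,) + partial[i + 1:]) for b in (0, 1))
            for i in free
        )

    return D((None,) * n)


def and_of_ors(x):
    return int((x[0] or x[1]) and (x[2] or x[3]))


print(decision_tree_complexity(and_of_ors, 4))               # 4: evasive
print(decision_tree_complexity(lambda x: x[0] & x[1], 3))    # 2: not evasive
```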

Order Oblivious Protocols.
In the game $G_n$, at each step $i < n$, the value written by Alice at location $\sigma_i$ may depend on her knowledge up to that step, which includes both the sequence $\sigma_1, \ldots, \sigma_i$ and the partial assignment already made to $v$ at locations $\sigma_1, \ldots, \sigma_{i-1}$. A natural way to restrict Alice's strategy is to require that the bit she writes in location $\sigma_i$ depends only on $\sigma_i$ and the current partial assignment to $v$, but not on the order in which $\sigma_1, \ldots, \sigma_{i-1}$ arrived. A protocol satisfying this restriction is said to be order oblivious. The following easy proposition shows that it suffices to answer Question 1 for order oblivious protocols.
Proposition 6. Given any protocol $\Pi$ there exists an order oblivious protocol $\Pi'$ with $c(\Pi') \leq c(\Pi)$.
Proof. First some notation. Given a permutation $\sigma$, let $\sigma_{\leq k}$ denote the prefix consisting of the first $k$ elements of $\sigma$. We let $\Pi_A(\sigma_{\leq k})$ denote the partial assignment written on $v$ after Alice has been streamed $\sigma_1, \ldots, \sigma_k$.
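Given $\Pi$, we define an order oblivious protocol $\Pi'$ of cost at most that of $\Pi$. We define $\Pi'$ in steps (where in step $i$ Alice receives $\sigma_i$ and writes a bit in that location). Given $k \geq 0$, we assume that $\Pi'$ has been defined up through step $k$ and has the property that for every permutation $\sigma$ there is a permutation $\tau$ of $\sigma_1, \ldots, \sigma_k$ such that $\Pi_A(\tau) = \Pi'_A(\sigma_{\leq k})$. Suppose $\sigma_{k+1}$ arrives and the current state of the array is $v := \Pi'_A(\sigma_{\leq k})$. From $v$ Alice can deduce the set $\{\sigma_1, \ldots, \sigma_k\}$ (the set of locations not labeled $*$). Alice then considers all permutations $\tau$ of $\sigma_1, \ldots, \sigma_k$ such that $\Pi_A(\tau) = \Pi'_A(\sigma_{\leq k})$, picks the lexicographically smallest permutation in that set (call it $\tau^*$), and writes on location $\sigma_{k+1}$ according to what $\Pi$ does after $\tau^*$. Note that the bit written on location $\sigma_{k+1}$ does not depend on the relative order of $\sigma_1, \sigma_2, \ldots, \sigma_k$. Moreover, $\tau^*$ followed by $\sigma_{k+1}$ witnesses the inductive property for step $k + 1$. Taking $k = n - 1$, every edge $\Pi'_A(\sigma)$ equals the edge $\Pi_A(\tau')$, where $\tau'$ is a permutation of $\sigma_1, \ldots, \sigma_{n-1}$ followed by $\sigma_n$; hence $E(\Pi') \subseteq E(\Pi)$ and $c(\Pi') \leq c(\Pi)$.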
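The conversion in this proof can be sketched by brute force for tiny $n$. In the Python below (an illustration under our own conventions, with locations numbered from 0 and a hypothetical order-dependent rule `order_sensitive`), the new rule ignores the arrival order and replays $\Pi$ on the lexicographically smallest $\tau^*$ consistent with the current array; the final line checks $E(\Pi') \subseteq E(\Pi)$.

```python
from itertools import permutations


def fill(alice, order, n):
    """Array after Alice is streamed `order` (locations not in `order` stay None)."""
    v = [None] * n
    for i, loc in enumerate(order):
        v[loc] = alice(tuple(order[: i + 1]), tuple(v), loc)
    return tuple(v)


def make_order_oblivious(alice, n):
    """Pi' from the proof of Proposition 6, by brute force (tiny n only)."""
    def alice_prime(prefix, v, loc):           # ignores `prefix`: order oblivious
        filled = sorted(i for i in range(n) if v[i] is not None)
        # permutations() of a sorted list is produced in lexicographic order,
        # so the first tau reproducing v is the lexicographically smallest tau*.
        for tau in permutations(filled):
            if fill(alice, tau, n) == v:
                return alice(tau + (loc,), v, loc)   # what Pi writes after tau*
        raise AssertionError("no tau reproduces v; the invariant was violated")
    return alice_prime


def edges(alice, n):
    """E(Pi) as pairs (Pi_A(sigma) with None at sigma_n, direction sigma_n)."""
    return {(fill(alice, s[:-1], n), s[-1]) for s in permutations(range(n))}


def cost(alice, n):
    """Maximum vertex degree of E(Pi)."""
    deg = {}
    for v, d in edges(alice, n):
        for b in (0, 1):
            w = v[:d] + (b,) + v[d + 1:]
            deg[w] = deg.get(w, 0) + 1
    return max(deg.values())


def order_sensitive(prefix, v, loc):
    """Hypothetical non-oblivious rule: write 1 iff the previous arrival was larger."""
    return int(len(prefix) >= 2 and prefix[-2] > loc)


oblivious = make_order_oblivious(order_sensitive, 4)
print(edges(oblivious, 4) <= edges(order_sensitive, 4),
      cost(oblivious, 4), cost(order_sensitive, 4))
```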

Stronger Variants of Question 1
We now present three natural variants of Question 1, and refute two of them by exhibiting and analyzing some specific protocols.
The cost function $c(\Pi)$ of a protocol is the worst-case cost over all choices of $\sigma_1, \ldots, \sigma_n, b$. Alternatively, we can consider the average size (with respect to random $\sigma$ and $b$) of the set Bob outputs. We call this the expected cost of $\Pi$ and denote it by $\bar{c}(\Pi)$. Let $\bar{C}(n)$ denote the minimum expected cost of a protocol for $G_n$.
Question 7. Is there a $\delta > 0$ such that $\bar{C}(n) = \Omega(n^{\delta})$?
An affirmative answer to this question would give an affirmative answer to Question 1. It is well known that the natural probabilistic version of the sensitivity conjecture, where sensitivity is replaced by average sensitivity (with respect to the uniform distribution over $\{0, 1\}^n$), is trivially false (for example, for the OR function). However, there is apparently no connection between average sensitivity and average protocol cost. For example, the protocol induced by the decision tree adversary for OR has Alice write a 0 at each step. Note that $E(\Pi)$ is exactly the set of sensitive edges for the OR function. However, the average cost $\bar{c}(\Pi)$ is $(n+1)/2$ (Bob must output $[n]$ when $b = 0$ and a single location when $b = 1$), whereas the average sensitivity of the OR function is $2n/2^n = o(1)$.
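The gap between the two averages can be checked exactly for small $n$. The sketch below (Python with exact fractions; function names are hypothetical) computes $\bar{c}(\Pi)$ with Bob's canonical sets for the all-zeros rule, and the average sensitivity of OR.

```python
from fractions import Fraction
from itertools import permutations, product


def expected_cost(alice, n):
    """c-bar(Pi): average |J| over uniform sigma and b, with Bob's canonical sets."""
    J, seen = {}, []
    for sigma in permutations(range(n)):
        v = [None] * n
        for i, loc in enumerate(sigma[:-1]):
            v[loc] = alice(sigma[: i + 1], tuple(v), loc)
        for b in (0, 1):
            v[sigma[-1]] = b
            w = tuple(v)
            J.setdefault(w, set()).add(sigma[-1])  # sigma_n is possible given w
            seen.append(w)
    return Fraction(sum(len(J[w]) for w in seen), len(seen))


def average_sensitivity(f, n):
    """Average of s_x(f) over uniform x in {0,1}^n."""
    total = sum(
        f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:])
        for x in product((0, 1), repeat=n)
        for i in range(n)
    )
    return Fraction(total, 2 ** n)


def always_zero(prefix, v, loc):
    """The OR adversary as Alice's rule: write 0 at every step."""
    return 0


def or_n(x):
    return int(any(x))


print(expected_cost(always_zero, 5), average_sensitivity(or_n, 5))  # 3 5/16
```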
We also remark that an analog of Proposition 6 holds for the cost function $\bar{c}(\Pi)$, and therefore it suffices to answer the question for order oblivious protocols. (The proof of the analog is similar to the proof of Proposition 6, except that when modifying the protocol, $\tau^*$ is not selected to be the lexicographically smallest permutation in the indicated set, but rather the permutation in the indicated set that minimizes the expected cost conditioned on the first $k$ steps.)
There is another natural variant of Question 1 based on the average case. When we run a fixed protocol $\Pi$ on a random permutation $\sigma$ and bit $b$, we can view the array $v$ produced by Alice as a random variable. Let $\bar{h}(\Pi)$ be the conditional entropy of $\sigma_n$ given $v$; intuitively, this measures the average number of bits of uncertainty that Bob has about $\sigma_n$ after seeing $v$. It is easy to show (by Jensen's inequality) that this is bounded above by $\log \bar{c}(\Pi)$. Let $\bar{H}(n)$ be the minimum of $\bar{h}(\Pi)$ over all protocols $\Pi$ for $G_n$. The analog of Question 1 is whether there is a positive constant $\delta$ such that $\bar{H}(n) \geq \delta \log n$ for all sufficiently large $n$. An affirmative answer to this would have implied an affirmative answer to Question 1, but the answer to this new question turns out to be negative.
Theorem 8. There is an order oblivious protocol $\Pi$ for $G_n$ with $\bar{h}(\Pi) = O(\log \log n)$.
Remark: It might seem that this could be proved by giving a protocol $\Pi$ that is not order oblivious and converting it into an order oblivious protocol as described earlier. However, while we know that this can be done without increasing worst-case cost or average cost, it is possible that $\bar{h}$ may increase. Therefore, we construct the desired order oblivious protocol directly.
Proof. Let $k = \lfloor \log_2 n \rfloor + 1$ and associate each location $\ell \in [n]$ with its binary expansion, viewed as a vector $b(\ell) \in \mathbb{F}_2^k$. Note that $0 \notin [n]$, and thus each vector $b(\ell)$ is nonzero. For an array $v \in \{0, 1\}^n$ we define $\Gamma(v) = \sum_{\ell : v_\ell = 1} b(\ell)$, i.e. the vector in $\mathbb{F}_2^k$ obtained by summing the vectors corresponding to the 1 entries of $v$. Say that an array $v \in \{0, 1, *\}^n$ is admissible if there is a way of filling in the $*$'s (a completion) so that the resulting array $w$ satisfies $\Gamma(w) = 0^k$, where $0^k$ is the all-0 vector in $\mathbb{F}_2^k$. For an admissible array $v$, let $\hat{v}$ be the unique completion of $v$ such that (1) $\Gamma(\hat{v}) = 0^k$, (2) the number $r$ of 0's in $\hat{v}$ is minimum, and (3) the ordered sequence $\ell_1 < \cdots < \ell_r$ of locations of the 0's in $\hat{v}$ is lexicographically minimum subject to conditions (1) and (2), i.e., for each $j \in [r]$, $\ell_j$ is minimum possible given $\ell_1, \ldots, \ell_{j-1}$.
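The map $\Gamma$ and the canonical completion $\hat{v}$ are easy to sketch because summing binary expansions in $\mathbb{F}_2^k$ is just the XOR of the integers $\ell$. The Python below is a brute-force illustration (not the authors' code; names are hypothetical): it finds the zero set of $\hat{v}$ inside the $*$ positions by trying subsets in order of size and then lexicographically. The usage simulates the last phase of the protocol for $n = 32$ and checks that recomputing $\hat{w}$ after each write never changes it, which is the order-obliviousness invariant used below.

```python
import math
import random
from functools import reduce
from itertools import combinations
from operator import xor


def gamma(ones):
    """Gamma(v): sum in F_2^k of the binary expansions b(l), i.e. the XOR of the l's."""
    return reduce(xor, ones, 0)


def canonical_zeros(ones, free):
    """Zero set of v-hat inside the * positions `free` (None if v is not admissible).

    `ones` lists locations already holding 1; locations are 1..n so b(l) != 0.
    A completion that sets exactly Z (a subset of free) to 0 has Gamma = 0 iff
    gamma(Z) == gamma(ones) ^ gamma(free).  We minimize |Z| and, among those,
    take Z lexicographically smallest; combinations() of a sorted list is
    generated in lexicographic order, so the first hit is the right one.
    """
    target = gamma(ones) ^ gamma(free)
    free = sorted(free)
    for r in range(len(free) + 1):
        for Z in combinations(free, r):
            if gamma(Z) == target:
                return set(Z)
    return None


def final_phase(T):
    """Alice's writes on the * positions T (in arrival order), recomputing v-hat each step.

    Returns (zero set of u-hat, locations Alice wrote 1, whether v-hat never changed).
    """
    Z = canonical_zeros([], T)
    ones, stable = [], True
    for step, loc in enumerate(T[:-1]):      # T[-1] = sigma_n receives the bit b
        current = canonical_zeros(ones, T[step:])
        stable = stable and current == Z & set(T[step:])
        if loc not in current:
            ones.append(loc)
    return Z, ones, stable


n = 32
k = n.bit_length()                   # floor(log2 n) + 1 bits per location
t = math.ceil(math.log2(n) ** 2)     # number of * positions after the all-0 phase
sigma = random.Random(0).sample(range(1, n + 1), n)
Z, ones, stable = final_phase(sigma[n - t:])
print(len(Z) <= k, stable)           # True True
```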
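We now describe the protocol. Let $t > k$ be an integer (which we'll choose to be $\lceil \log^2 n \rceil$; for all sufficiently large $n$ this satisfies $k < t \leq n$). Alice writes 0 for the first $n - t$ steps. The resulting array $u$ has $n - t$ 0's and $t$ $*$'s; let $T$ be the set of $*$ positions. Since $u$ can be completed to the all-0 array, $u$ is admissible. Furthermore, any more than $k$ vectors in $\mathbb{F}_2^k$ contain a nonempty subset summing to $0^k$, so by repeatedly removing such subsets from $T$ we obtain a set $Z \subseteq T$ of at most $k$ positions with $\sum_{\ell \in Z} b(\ell) = \sum_{\ell \in T} b(\ell)$; setting $Z$ to 0 and the rest of $T$ to 1 gives a completion $w$ with $\Gamma(w) = 0^k$. Hence $\hat{u}$ has at most $k$ 0's among the $*$ positions. Alice fills in the remaining positions to agree with $\hat{u}$. This strategy is order oblivious: a simple induction shows that for each array $w$ reached under the above strategy, $w$ is admissible and $\hat{w} = \hat{u}$, so Alice's strategy is equivalent to filling position $\sigma_i$ (for $i > n - t$) according to $\hat{w}$, where $w$ is the array after $i - 1$ steps. This is clearly an order oblivious strategy.
Let $v$ denote the array in $\{0, 1\}^n$ received by Bob. We now obtain an upper bound on the conditional entropy of $\sigma_n$ given $v$. Let $u = u(\sigma)$ be the array obtained after the first $n - t$ steps and let $T(\sigma)$ be the set of positions of $*$'s in $u$. Let $S(\sigma)$ be the subset of $T(\sigma)$ consisting of those positions set to 0 in $\hat{u}$, so $|S(\sigma)| \leq k$. Let $L$ be the random variable that is 1 if $\sigma_n \in S(\sigma)$ and 0 otherwise. Since $S(\sigma)$ depends only on the set $T(\sigma)$ and not on the order of the last $t$ locations, the probability that $L = 1$ is $|S(\sigma)|/|T(\sigma)| \leq k/t = O(1/\log n)$. If $\Gamma(v) \neq 0^k$, then $v$ differs from $\hat{u}$ exactly in location $\sigma_n$, so $\Gamma(v) = b(\sigma_n)$ determines $\sigma_n$. If $\Gamma(v) = 0^k$, then $v = \hat{u}$, and when $L = 0$ the location $\sigma_n$ is one of the at most $t$ positions of $T(\sigma)$ holding a 1 in $v$. We have:
$$H(\sigma_n \mid v) \leq H(L) + \Pr[L = 1]\, H(\sigma_n \mid v, L = 1) + \Pr[L = 0]\, H(\sigma_n \mid v, L = 0) \leq 1 + \frac{k}{t} \log n + \log t = O(\log \log n).$$
For our last variant, suppose Alice can communicate to Bob with a $w$-ary alphabet instead of a binary alphabet. Thus, Alice is streamed a permutation $\sigma$, and when $\sigma_i$ arrives she may write any of the symbols $\{1, \ldots, w\}$ on location $\sigma_i$ in $v$. At the last step $b \in \{1, \ldots, w\}$ arrives and Alice must write it in location $\sigma_n$. Bob sees $v$ and has to output a set $J$ that contains $\sigma_n$. The cost of the protocol is the maximum size of $J$ over all $\sigma$ and $b$.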
We will show that Question 1 has a negative answer in this setting. To state our result we need some definitions. Fix $r > 1$ and a positive integer $k_0$. For each integer $j \geq 0$ we define a function $t_j$ on integers $n \geq k_0$. The function $t_0$ is given by $t_0(n) = n$ for all $n$. For $j \geq 1$, $t_j$ is defined inductively by $t_j(n) = \max(k_0, \lceil \log_r(t_{j-1}(n)) \rceil)$. Observe that for $j \geq 2$ we have $t_j(n) = t_{j-1}(t_1(n)) = t_1(t_{j-1}(n))$. Thus $t_j$ depends on the parameters $r$ and $k_0$, and is a minor variant of the base-$r$ iterated log function.
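To get a feel for how fast $t_j$ shrinks, the sketch below evaluates it for $r = 2^{1/4}$ (the value used in Theorem 9), computing $\log_r m$ as $4 \log_2 m$. The value $k_0 = 10$ is only an illustrative assumption; it is not the $k_0$ that Proposition 11 provides.

```python
import math


def t(j, n, k0):
    """t_j(n) with r = 2**(1/4): t_0(n) = n, t_j(n) = max(k0, ceil(log_r t_{j-1}(n))).

    log_r(m) is computed as 4*log2(m), which avoids floating-point error at powers of 2.
    """
    value = n
    for _ in range(j):
        value = max(k0, math.ceil(4 * math.log2(value)))
    return value


print([t(j, 10**6, 10) for j in range(6)])  # [1000000, 80, 26, 19, 17, 17]
```

With this $r$ the sequence stalls near 17 (the fixed point of $m \mapsto \lceil 4 \log_2 m \rceil$) even when $k_0$ is smaller.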
Theorem 9. For each $j \geq 0$ there is a protocol $\Pi_j$ using the alphabet $\{1, \ldots, 2j + 1\}$ that has cost at most $t_j(n)$, where the parameters needed to define $t_j$ are $r = 2^{1/4}$ and some sufficiently large $k_0$.
For example, for a ternary alphabet the cost of the protocol is $O(\log n)$, and for a 5-ary alphabet the cost is $O(\log \log n)$. To prove this, we'll need a few elementary standard facts about error-correcting codes. We include proofs to make the presentation self-contained.
Proposition 10. For each $n \geq 2$ there is a coloring $\chi_n$ of the subsets of $[n]$ by the set $[n^2]$ such that any two distinct sets whose symmetric difference has size at most 2 get different colors.
Proof. Construct the graph whose vertices are subsets of $[n]$, with two vertices joined by an edge if their symmetric difference has size 1 or 2. The degree of any vertex is $n + \binom{n}{2} = n(n+1)/2 < n^2$, and so the graph has a proper coloring with color set $[n^2]$ (for instance, greedy coloring uses at most one more color than the maximum degree).
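The coloring can be produced greedily. The Python below is a small sketch that encodes subsets of $[n]$ as bitmasks and uses colors $0, 1, 2, \ldots$ as a 0-based stand-in for $[n^2]$; the function name is hypothetical.

```python
def greedy_coloring(n):
    """Greedy proper coloring of the subsets of [n] (as bitmasks), where two subsets
    are adjacent when their symmetric difference has size 1 or 2."""
    flips = [1 << i for i in range(n)]
    flips += [(1 << i) | (1 << j) for i in range(n) for j in range(i + 1, n)]
    color = {}
    for S in range(1 << n):
        used = {color[S ^ m] for m in flips if (S ^ m) in color}
        c = 0
        while c in used:          # smallest color not used by an already colored neighbor
            c += 1
        color[S] = c
    return color, flips


chi, flips = greedy_coloring(6)
print(len(flips), max(chi.values()) + 1 <= 6 ** 2)  # 21 True
```

Each subset has exactly $n(n+1)/2$ neighbors, so greedy coloring never needs more than $n(n+1)/2 + 1 \le n^2$ colors.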
If $\Sigma$ is a finite alphabet and $s \in \Sigma^k$, a deletion error is the removal of some symbol from the string (shrinking the length by 1). We need the following proposition (which is much weaker than what is possible, but is all we need).
Proposition 11. There is a $k_0$ such that for all integers $k \geq k_0$ there is a code $C_k \subseteq \{0, 1\}^k$ of size at least $2^{k/2}$ that can correct $\lceil 4 \log_2 k \rceil$ deletion errors.
We note that the k 0 that is needed for Theorem 9 will be the k 0 provided by this Proposition.
Proof. We can choose $C_k$ to be a maximal independent set in the graph on $\{0, 1\}^k$ in which two strings $x$ and $y$ are joined if there is a string $z$ that can be obtained from each of them by at most $\lceil 4 \log_2 k \rceil$ deletions. If $\Delta$ is the maximum degree of the graph, then any maximal independent set has size at least $2^k/(\Delta + 1)$, and $\Delta$ is at most $\binom{k}{\lceil 4 \log_2 k \rceil}^2 \, 2^{\lceil 4 \log_2 k \rceil}$ (since, given $x$, each neighbor $y$ of $x$ can be constructed by selecting the subset of $\lceil 4 \log_2 k \rceil$ positions to delete from $x$, the subset of $\lceil 4 \log_2 k \rceil$ positions to delete from $y$, and the values of the bits deleted from $y$). For sufficiently large $k$ this is at most $2^{k/2} - 1$.
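How large "sufficiently large" is for this counting bound can be checked numerically with exact integers. The sketch below tests $\Delta + 1 \le 2^{k/2}$ as $(\Delta + 1)^2 \le 2^k$, using the bound on $\Delta$ from the proof, and scans down from an arbitrary cutoff `limit` (our assumption); it only certifies the inequality on the finite range it scans, and it says nothing about better codes that could give a smaller $k_0$.

```python
import math


def budget(k):
    """Number of deletions to correct: ceil(4 * log2(k))."""
    return math.ceil(4 * math.log2(k))


def delta_bound(k):
    """The bound binom(k, d)^2 * 2^d on the maximum degree, with d = budget(k)."""
    d = budget(k)
    return math.comb(k, d) ** 2 * 2 ** d


def bound_holds(k):
    """Delta + 1 <= 2^(k/2), checked exactly as (Delta + 1)^2 <= 2^k."""
    return (delta_bound(k) + 1) ** 2 <= 2 ** k


def smallest_valid_k(limit=5000):
    """Smallest k0 such that bound_holds(k) for every k0 <= k <= limit."""
    if not bound_holds(limit):
        raise ValueError("the counting bound fails at the cutoff; raise `limit`")
    k0 = limit
    while k0 > 1 and bound_holds(k0 - 1):
        k0 -= 1
    return k0


print(smallest_valid_k())
```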
Proof of Theorem 9. Fix $k_0$ according to Proposition 11 and let $r = 2^{1/4}$. Note that $\log_r(n) = 4 \log_2(n)$. Define the functions $t_j$ as above.
For $n \leq k_0$ our protocol will just have Alice write the same symbol every time and Bob output $[n]$. So assume $n > k_0$.
We prove the theorem by induction on $j$. For the induction we need to strengthen the theorem to say that the constructed protocol $\Pi_j$ works in $j + 1$ phases numbered 0 to $j$, where during phase 0 Alice sees $t_0(n) - t_1(n)$ permutation values and writes only $2j + 1$, and during phase $i \in [1, j - 1]$ Alice processes the next $t_i(n) - t_{i+1}(n)$ permutation values and writes only symbols $2(j - i) + 1$ and $2(j - i) + 2$. During phase $j$, Alice processes $t_j(n) - 1$ permutation values and writes symbols 1 and 2.
The protocol $\Pi_0$ is trivial: the alphabet is $\{1\}$, Alice writes only 1's, and Bob outputs the set $[n]$, so the cost is $n = t_0(n)$. Now suppose $j \geq 1$ and that $\Pi_{j-1}$ has been defined. Phase 0 of $\Pi_j$ is prescribed. Let $t = t_1(n)$ and let $S = \{s_1 < \cdots < s_t\}$ be the unfilled positions after phase 0. Alice identifies the set $S$ with the set $[t]$ by the correspondence $s_m \leftrightarrow m$ and views the remaining $t$ symbols of $\sigma$ as a permutation $\sigma'$ of $[t]$. The remaining $j$ phases of $\Pi_j$ correspond to the protocol $\Pi_{j-1}$ run on $\sigma'$, so phase $i$ of $\Pi_j$ corresponds to phase $i - 1$ of $\Pi_{j-1}$ run on $\sigma'$. For $i \geq 2$, phase $i$ of $\Pi_j$ is exactly the same as phase $i - 1$ of $\Pi_{j-1}$. However, phase 1 of $\Pi_j$ is different from phase 0 of $\Pi_{j-1}$: in phase 0 of $\Pi_{j-1}$ the only symbol written is $2j - 1$, but in phase 1 of $\Pi_j$ both symbols $2j - 1$ and $2j$ are used. Since $t \geq k_0$, we can construct $C_t$ as in Proposition 11 and, by changing the alphabet, view $C_t$ as a subset of $\{2j - 1, 2j\}^t$. By the choice of $t = t_1(n) \geq \lceil \log_r n \rceil \geq 4 \log_2 n$, we have $n^2 \leq 2^{t/2}$, so we can fix a one-to-one map $g$ from $[n^2]$ to $C_t$. Alice computes $g(\chi_n(S))$, where $\chi_n$ comes from Proposition 10. This is a string $y \in \{2j - 1, 2j\}^t$, and during phase 1, whenever a location $s_m$ arrives, Alice writes $y_m$ on location $s_m$. This completes the specification of $\Pi_j$.
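We now turn to Bob's strategy for choosing the set $J$ to output. Let $A_i$ be the set $\{2i + 1, 2i + 2\}$. During phase $i$, Alice only writes symbols from $A_{j-i}$, so the number of symbols from $A_{j-i}$ written by Alice is $d_i(n) = t_i(n) - t_{i+1}(n)$ for $0 \leq i < j$, and $d_j(n) = t_j(n) - 1$. The final symbol $b$ lies in $A_{j-i^*}$ for a unique index $i^*$. When receiving Alice's output array, Bob can count the number of symbols from each $A_{j-i}$. For all but one $i$ this count will be $d_i(n)$, and it will be $1 + d_i(n)$ if and only if $i = i^*$; thus Bob learns $i^*$.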
If $i^* \neq 0$, then Bob knows the set of positions to which Alice wrote $2j + 1$ during phase 0, and therefore knows the set $S$ of $t_1(n)$ positions that remained unfilled at the end of phase 0. Since $b < 2j + 1$, by identifying symbols $2j$ and $2j - 1$, Bob can interpret the array restricted to $S$ as the output of $\Pi_{j-1}$ on a set of size $t_1(n)$. By induction he can determine a set of size at most $t_{j-1}(t_1(n)) = t_j(n)$ that contains $\sigma_n$.
This leaves the case $i^* = 0$, i.e. $b = 2j + 1$. Then Bob sees $n - t_1(n) + 1$ positions that contain $2j + 1$, one of which is $\sigma_n$. Let $S'$ be the set of positions that don't have $2j + 1$ written on them. Then Bob knows $S'$.
We argue that Bob can recover the set $S$ of positions not written during phase 0. From this, Bob will know $\sigma_n$, since $S \setminus S' = \{\sigma_n\}$.
For those positions $s_m \in S$ that Alice wrote during phase 1, Alice wrote $y_m$ in position $s_m$, where $y = g(\chi_n(S))$. The number of symbols written during phase 1 is $t_1(n) - t_2(n) = t - \lceil 4 \log_2 t \rceil$ (unless $j = 1$, in which case $t - 1$ symbols were written in phase 1). Thus the string $z$ seen by Bob (the symbols from $\{2j - 1, 2j\}$, read in increasing order of position) is obtained from $y$ by deleting at most $\lceil 4 \log_2 t \rceil$ symbols. Since $C_t$ is robust against $\lceil 4 \log_2 t \rceil$ deletions, Bob can recover $y$ from $z$. He then knows $g^{-1}(y) = \chi_n(S)$. Any two candidates for $S$ are obtained from $S'$ by adding a single element, so they have symmetric difference of size 2; the choice of $\chi_n$ therefore implies that $S$ is uniquely determined by $S'$ and $\chi_n(S)$. Thus Bob recovers $S$ and therefore $\sigma_n$.

Lower Bounds for Restricted Protocols
In the previous section, two stronger variants of Question 1 turned out to have negative answers, which may suggest that Question 1 also has a negative answer. In this section, however, we prove a lower bound which implies that any counterexample to Question 1 will need to look quite different from the two protocols provided in the last section.
An order oblivious protocol can be specified by a sequence of maps $A_1, \ldots, A_n$, where each $A_i$ maps partial assignments on the set $[n]$ to a single bit. When location $\sigma_i$ arrives, the bit Alice writes is $A_{\sigma_i}(v)$. For partial assignments $\alpha$ and $\beta$, we say that $\beta$ is an extension of $\alpha$, denoted $\beta \geq \alpha$, if $\beta$ is obtained from $\alpha$ by fixing additional variables. An order oblivious protocol is monotone if each of the maps $A_1, \ldots, A_n$ is monotone with respect to the extension partial order; that is, if $\beta \geq \alpha$ are partial assignments, then $A_i(\beta) \geq A_i(\alpha)$ for each $i$. As a remark, when running the protocol there may be assignments that are never written on $v$; however, defining each $A_i$ to have domain all partial assignments is still valid and simplifies notation.
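Monotonicity of a given family of maps can be checked exhaustively for tiny $n$. The Python below is a sketch under our own encoding (partial assignments as tuples over $\{0, 1, \text{None}\}$, locations from 0, blocks of size $k$ by integer division); it confirms that the AND-OR maps are monotone and that a hypothetical "write 1 only if no 1 has been written yet" rule is not.

```python
from itertools import product


def extends(beta, alpha):
    """beta >= alpha: beta agrees with alpha wherever alpha is fixed."""
    return all(a is None or a == b for a, b in zip(alpha, beta))


def is_monotone(maps, n):
    """Check A_i(beta) >= A_i(alpha) for every pair of partial assignments beta >= alpha."""
    partials = list(product((0, 1, None), repeat=n))
    return all(
        A(beta) >= A(alpha)
        for alpha in partials
        for beta in partials
        if extends(beta, alpha)
        for A in maps
    )


def and_or_maps(n, k):
    """AND-OR protocol: A_i(v) = 1 iff every other location in i's block is filled."""
    def make(i):
        mates = [m for m in range(n) if m // k == i // k and m != i]
        return lambda v: int(all(v[m] is not None for m in mates))
    return [make(i) for i in range(n)]


def first_one_maps(n):
    """Hypothetical non-monotone rule: write 1 only if no 1 has been written yet."""
    return [lambda v: int(1 not in v) for _ in range(n)]


print(is_monotone(and_or_maps(4, 2), 4), is_monotone(first_one_maps(3), 3))  # True False
```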
Both the AND-OR protocol described in the introduction and the protocol constructed in Theorem 8 are examples of monotone protocols. Monotonicity generalizes to protocols on $w$-ary alphabets, and the $w$-ary protocol of Theorem 9 is monotone (if we order the alphabet symbols in reverse, $2j + 1 < 2j < \cdots < 1$). Our main result in this section is that monotone protocols on binary alphabets have cost at least $\lfloor \sqrt{n} \rfloor$. In particular, Question 1 has an affirmative answer for such protocols. For the rest of the paper, all protocols will be on binary alphabets.
Theorem 12. All monotone protocols have cost at least $\lfloor \sqrt{n} \rfloor$.
Proof. Let $\Pi$ be a monotone protocol. We show that $E(\Pi)$ has a vertex of degree at least $\lfloor \sqrt{n} \rfloor$.
For a permutation $\sigma$, denote by $\mathrm{bump}_k(\sigma)$ the permutation obtained from $\sigma$ by "bumping" the element $k$ to the end of $\sigma$ and maintaining the same relative order for the rest of $\sigma$. For example, $\mathrm{bump}_1(321654) = 326541$.
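The bump operation is a one-liner on tuples. The sketch below (hypothetical names `bump` and `bump_all`) reproduces the example above and shows that a sequence of bumps moves the bumped elements to the end in the order they were bumped, while the never-bumped elements keep their relative order, which is the property used in Claim 13.

```python
def bump(k, sigma):
    """bump_k(sigma): move element k to the end, keeping the others in order."""
    return tuple(x for x in sigma if x != k) + (k,)


def bump_all(ks, sigma):
    """Apply bump_k for each k in ks, in order."""
    for k in ks:
        sigma = bump(k, sigma)
    return sigma


print(bump(1, (3, 2, 1, 6, 5, 4)))           # (3, 2, 6, 5, 4, 1)
print(bump_all([2, 1], (3, 2, 1, 6, 5, 4)))  # (3, 6, 5, 4, 2, 1)
```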
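We let $w(\sigma)$ denote the array $\Pi_A(\sigma)$ with the entries sorted in $\sigma$ order. In other words, $w(\sigma)$ is the array defined by $w(\sigma)_i = \Pi_A(\sigma)_{\sigma_i}$.
Claim 13. Let $\sigma$ be any permutation and let $\tau$ be obtained from $\sigma$ by performing some sequence of bumps on $\sigma$. Suppose that $\tau$ and $m < n$ satisfy the following:
• The elements $\tau_1, \tau_2, \ldots, \tau_m$ were never bumped.
• $\Pi_A(\sigma)_{\tau_i} = 0$ for each $i \leq m$ (that is, Alice writes 0 on each of $\tau_1, \ldots, \tau_m$ when streamed $\sigma$).
Then $w(\tau)$ begins with $m$ 0's.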
Proof. The claim follows easily by induction on $i$. Suppose we have already shown that $w(\tau)$ begins with $i - 1$ 0's, for some $i \leq m$. Let $v(\sigma, k)$ denote the partial assignment written on $v$ just before Alice receives the index $k$ (here the reader should take care to distinguish this from the partial assignment just before Alice receives $\sigma_k$). Consider the partial assignment $v(\tau, \tau_i)$, which sets $\tau_1, \ldots, \tau_{i-1}$ to 0 by the inductive hypothesis. Since bumping preserves the relative order of elements that are never bumped, $\tau_1, \ldots, \tau_{i-1}$ all arrive before $\tau_i$ in $\sigma$, where they receive 0 by the second assumption; hence $v(\sigma, \tau_i)$ is an extension of $v(\tau, \tau_i)$. Thus, since Alice originally wrote a 0 on location $\tau_i$, by monotonicity she continues to write a 0 on that location when being streamed $\tau$ (that is, $\Pi_A(\tau)_{\tau_i} = 0$).
Let $\sigma$ be a permutation for which $w(\sigma)$ is lexicographically minimum.
Claim 14. $w(\sigma)$ consists of a block of 0's, followed by a block of 1's, followed by a single $*$.
Proof. The result is trivial if there are no 1's. Let $j$ be the position of the first 1, and let $k$ be the last position in the block of 1's beginning at $j$. We claim $k = n - 1$. Suppose $k < n - 1$. Then there is a 0 in position $k + 1$. Let $\tau$ be obtained from $\sigma$ by bumping $\sigma_j, \ldots, \sigma_k$. By Claim 13 (with $m = j$), $w(\tau)$ begins with $j$ 0's, contradicting the lexicographic minimality of $w(\sigma)$.
Let $n - t$ be the number of initial 0's in $w(\sigma)$, so the number of 1's is $t - 1$. Let $T = \{\sigma_{n-t+1}, \ldots, \sigma_n\}$ and let $x$ be the vector that is 1 in those positions and 0 elsewhere. For $k$ between 1 and $n$, let $\tau^{(k)} = \mathrm{bump}_k(\sigma)$. The vectors of the form $\Pi_A(\phi)$ and $w(\phi)$ have a single $*$. For $b \in \{0, 1\}$ we write $\Pi_A(\phi, b)$ and $w(\phi, b)$ for the vectors obtained by replacing the $*$ by $b$.
Claim 15. The vertices $\Pi_A(\tau^{(k)}, 1)$ for $k \in T$ are all equal to $x$. Therefore $x$ belongs to an edge in direction $k$ for each $k \in T$, and so has degree at least $t$ in $E(\Pi)$.
Proof. Let $k \in T$. By Claim 13, $w(\tau^{(k)}, 1)$ has its first $n - t$ bits equal to 0, and so by the choice of $\sigma$ the remaining bits are 1. This implies that $\Pi_A(\tau^{(k)}, 1)$ has 1's exactly in the positions indexed by the last $t$ elements of $\tau^{(k)}$, which form the set $T$.
To conclude the proof of the theorem we will find an assignment y that has degree at least (n − t)/(t + 1) in the graph E(Π).
Claim 16. For $k$ among the first $n - t$ elements of $\sigma$, $w(\tau^{(k)})$ has its first $n - t - 1$ bits equal to 0 and at most one 0 among the next $t$ bits (its last bit is $*$).
Proof. Claim 13 immediately implies that the first $n - t - 1$ bits of $w(\tau^{(k)})$ are 0. Now take all of the locations that are labeled 1 in $\Pi_A(\tau^{(k)})$, bump them to the end, and let $\rho$ be the resulting permutation. Claim 13 implies that all of these 0's remain 0, so $w(\rho)$ begins with every 0 of $w(\tau^{(k)})$. By the lexicographic minimality of $w(\sigma)$, $w(\rho)$ begins with at most $n - t$ 0's, so $w(\tau^{(k)})$ contains at most $n - t$ 0's; since its first $n - t - 1$ bits are 0, at most one 0 lies in positions $n - t$ through $n - 1$.
Now classify each of the first $n - t$ elements of $\sigma$ into the $t + 1$ sets $S_{n-t}, \ldots, S_n$: element $k \in S_n$ if $w(\tau^{(k)})$ has no 0 in positions $n - t$ through $n - 1$ (that is, it has $t$ 1's); otherwise $k \in S_j$, where $j$ is the position of the unique 0 of $w(\tau^{(k)})$ among positions $n - t$ to $n - 1$. Choose $j^*$ so that $|S_{j^*}|$ is maximum and let $m = |S_{j^*}|$; by the pigeonhole principle, $m \ge (n - t)/(t + 1)$. For $k \in S_{j^*}$, let $y^{(k)} = \Pi_A(\tau^{(k)}, 0)$. Let $u = \sigma_{j^*+1}$ and let $y$ be the vector that is 1 on the locations of $T \setminus \{u\}$ and 0 elsewhere.
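The classification step and the pigeonhole choice of $j^*$ can be sketched in Python as follows. The input, a dictionary mapping each of the first $n - t$ elements $k$ of $\sigma$ to a precomputed $w(\tau^{(k)})$, and the function name `classify` are assumptions of ours; positions are 1-based as in the text. Because $n - t$ elements are distributed among the $t + 1$ classes $S_{n-t}, \ldots, S_n$, the largest class has at least $(n - t)/(t + 1)$ members.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def classify(n: int, t: int,
             w_of_tau: Dict[int, Sequence[object]]) -> Tuple[int, int, Dict[int, List[int]]]:
    """Place each k into S_j as in the text and return j*, m = |S_{j*}| and the classes.

    Each w(tau^(k)) is a 0-indexed list whose 1-based positions are 1..n.
    """
    classes: Dict[int, List[int]] = defaultdict(list)
    for k, wk in w_of_tau.items():
        window = wk[n - t - 1:n - 1]  # 1-based positions n-t, ..., n-1
        zero_positions = [n - t + i for i, c in enumerate(window) if c == 0]
        assert len(zero_positions) <= 1, "Claim 16: at most one 0 in this window"
        j = zero_positions[0] if zero_positions else n  # no 0 means k is in S_n
        classes[j].append(k)
    j_star = max(classes, key=lambda j: len(classes[j]))
    return j_star, len(classes[j_star]), dict(classes)


# Made-up data with n = 6, t = 2 (so four elements k to classify).
ws = {1: [0, 0, 0, 1, 1, "*"], 2: [0, 0, 0, 0, 1, "*"],
      3: [0, 0, 0, 0, 1, "*"], 4: [0, 0, 0, 1, 0, "*"]}
print(classify(6, 2, ws))
```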
Claim 17. The assignments $y^{(k)}$ for $k \in S_{j^*}$ are all equal to $y$, and thus $y$ has degree at least $m$ in $E(\Pi)$.
Proof. By the definition of the bump operation, the sequence of elements appearing in positions $n - t, \ldots, n - 1$ of $\tau^{(k)}$ is $\sigma_{n-t+1}, \ldots, \sigma_n$, and the element in position $j^*$ of $\tau^{(k)}$ is $\sigma_{j^*+1} = u$. Thus $y^{(k)}$ is 1 on the elements of $T \setminus \{u\}$ and 0 elsewhere.
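We thus have a point $x$ of degree at least $t$ and a point $y$ of degree at least $(n - t)/(t + 1)$ in $E(\Pi)$. This implies that the cost of $\Pi$ is at least $\max(t, (n - t)/(t + 1)) > \sqrt{n} - 1$ (either $t > \sqrt{n} - 1$, or $t + 1 \le \sqrt{n}$ and then $(n - t)/(t + 1) \ge (n - t)/\sqrt{n} > \sqrt{n} - 1$), and since the cost is an integer, it is at least $\lfloor \sqrt{n} \rfloor$.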
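The final inequality is elementary but easy to get wrong by one. The short Python check below is our own sanity check, not part of the original argument: it minimises $\max(t, (n - t)/(t + 1))$ over all $t$ with exact rational arithmetic and confirms, for $n < 300$, that the minimum exceeds $\sqrt{n} - 1$ and hence forces an integer cost of at least $\lfloor \sqrt{n} \rfloor$.

```python
import math
from fractions import Fraction


def worst_case_bound(n: int) -> Fraction:
    """min over t = 1..n of max(t, (n - t)/(t + 1)), computed exactly."""
    return min(max(Fraction(t), Fraction(n - t, t + 1)) for t in range(1, n + 1))


def bound_holds(n: int) -> bool:
    """Check that the bound exceeds sqrt(n) - 1 and that the smallest
    integer at least as large as it is at least floor(sqrt(n))."""
    b = worst_case_bound(n)
    return b > math.sqrt(n) - 1 and math.ceil(b) >= math.isqrt(n)


print(worst_case_bound(4), all(bound_holds(n) for n in range(1, 300)))
```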
As demonstrated by the AND-OR protocol, Theorem 12 is tight up to a constant factor. We remark that the monotone protocols we consider here seem to have no general connection to the class of monotone boolean functions, and our result for monotone protocols seems to be unrelated to the easy and well known fact that the sensitivity conjecture is true for monotone functions.
We conclude this section with a lower bound for a second class of protocols. Although the lower bound is only logarithmic, proving a logarithmic lower bound for all protocols with a large enough constant would improve on the best known bounds relating degree and sensitivity.
We need a few definitions. Recall that an edge $e \in H_n$ may be written as an array in $\{0, 1, *\}^n$ for which $e_\ell = *$ at exactly one location $\ell$. We call this location $\ell$ the free location of that edge. We say two edges $e, e'$ collide if $e_\ell = e'_\ell$ for every location $\ell$ that is not the free location of either edge. Equivalently, two edges collide if they share at least one vertex (each edge collides with itself). Both of the lower bounds in this section will follow by finding an edge $e \in E(\Pi)$ that collides with $m$ other edges in $E(\Pi)$. This implies that at least one of the vertices of $e$ has degree at least $m/2$ in the graph $E(\Pi)$, which in turn lower bounds the cost of the protocol.
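The collision relation and the degree bound it feeds into can be checked mechanically. The Python sketch below writes edges as tuples over $\{0, 1, *\}$ (with `"*"` for $*$), tests collision directly from the definition, and confirms by brute force on $H_3$ that colliding coincides with sharing a vertex. The helper names are ours.

```python
from collections import Counter
from itertools import product
from typing import Iterable, Iterator, Sequence, Tuple


def collide(e: Sequence[object], f: Sequence[object]) -> bool:
    """Edges collide if they agree at every location free in neither edge."""
    free = {e.index("*"), f.index("*")}
    return all(a == b for i, (a, b) in enumerate(zip(e, f)) if i not in free)


def endpoints(e: Sequence[object]) -> Tuple[tuple, tuple]:
    """The two vertices of an edge: its free location set to 0 and to 1."""
    return (tuple(0 if c == "*" else c for c in e),
            tuple(1 if c == "*" else c for c in e))


def max_degree(edges: Iterable[Sequence[object]]) -> int:
    """Maximum vertex degree of the graph formed by a set of hypercube edges."""
    deg: Counter = Counter()
    for e in set(map(tuple, edges)):  # duplicates count once
        for vertex in endpoints(e):
            deg[vertex] += 1
    return max(deg.values(), default=0)


def all_edges(n: int) -> Iterator[tuple]:
    """Every edge of H_n, written as a tuple over {0, 1, '*'}."""
    for i in range(n):
        for bits in product((0, 1), repeat=n - 1):
            yield bits[:i] + ("*",) + bits[i:]


# Brute-force check on H_3 that colliding is the same as sharing a vertex.
edges = list(all_edges(3))
print(all(collide(e, f) == bool(set(endpoints(e)) & set(endpoints(f)))
          for e in edges for f in edges), max_degree(edges))
```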
For a permutation $\sigma$, we write $\ell <_\sigma k$ to denote that the element $\ell$ comes before the element $k$ in $\sigma$. Let $S_k(\sigma) = \{\ell : \ell <_\sigma k\}$. For example, if $\sigma = 321654$ then $S_1(\sigma) = \{2, 3\}$. We say a protocol is assignment oblivious if the bit written by Alice in location $k$ depends only on the set $S_k(\sigma)$ (and not on the assignment of bits to that set). Such protocols can be described by a collection of $n$ hypergraphs $H_1, H_2, \ldots, H_n$, where each $H_\ell$ is a hypergraph with vertex set $[n] \setminus \{\ell\}$. When $k$ arrives, Alice writes a 1 if and only if the set $S_k(\sigma)$ is in $H_k$.
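A minimal Python sketch of an assignment oblivious Alice, assuming the hypergraphs $H_1, \ldots, H_n$ are supplied as a membership test `in_H(k, A)` (a hypothetical interface of ours, not from the text). It reproduces the example $S_1(321654) = \{2, 3\}$.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical representation of H_1, ..., H_n: in_H(k, A) says whether A is in H_k.
InH = Callable[[int, FrozenSet[int]], bool]


def S(sigma: Sequence[int], k: int) -> FrozenSet[int]:
    """S_k(sigma): the elements that come before k in sigma."""
    return frozenset(sigma[:list(sigma).index(k)])


def oblivious_pi_A(sigma: Sequence[int], in_H: InH) -> List[object]:
    """Pi_A(sigma) for the assignment oblivious protocol described by in_H."""
    v: List[object] = [None] * len(sigma)
    for k in sigma[:-1]:
        # The bit depends only on which elements arrived before k, not on their bits.
        v[k - 1] = 1 if in_H(k, S(sigma, k)) else 0
    v[sigma[-1] - 1] = "*"
    return v


print(sorted(S([3, 2, 1, 6, 5, 4], 1)),
      oblivious_pi_A([3, 2, 1], lambda k, A: len(A) % 2 == 1))
```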
Proof. Let Π be an assignment oblivious protocol.
Given a permutation $\sigma = \sigma_1 \sigma_2 \cdots \sigma_n$ and $k \in [n]$, we define $\mathrm{swap}_k(\sigma)$ to be the permutation obtained by swapping the positions of the elements $k$ and $\sigma_n$ within $\sigma$ and keeping every other element in the same place. For example, $\mathrm{swap}_3(654321) = 654123$. The lemma will follow by constructing a permutation $\sigma$ such that $\Pi_A(\sigma)$ and $\Pi_A(\mathrm{swap}_k(\sigma))$ collide for each $k \in \{\sigma_{n-1}, \ldots, \sigma_{n-\lceil \log_2 n \rceil}\}$. We build up such a $\sigma$ greedily. We start by setting $\sigma_{n-1} = 1$. With $\sigma_{n-1}$ fixed, the bit Alice writes in location 1 is completely determined by $\sigma_n$ (and does not depend on the values we later choose for $\sigma_1, \ldots, \sigma_{n-2}$). This holds by the assignment oblivious property and because $S_1(\sigma) = \{\ell : \ell \ne 1, \sigma_n\}$. Let $R_1$ be the set of locations $\ell$ for which setting $\sigma_n = \ell$ results in Alice writing a 1 in location 1, and let $R_1^c = ([n] \setminus \{1\}) \setminus R_1$. At least one of $R_1$ and $R_1^c$ has size at least $\lceil (n - 1)/2 \rceil$; let $T_1$ be that set. Now we fix $\sigma_{n-2}$ to be any element of $T_1$.
Having fixed $\sigma_{n-1}$ and $\sigma_{n-2}$, the bit Alice writes on location $\sigma_{n-2}$ also depends only on the value of $\sigma_n$. Now let $R_2$ be the set of indices $j \in T_1 \setminus \{\sigma_{n-2}\}$ such that setting $\sigma_n = j$ would cause Alice to write a 1 in location $\sigma_{n-2}$. At least one of $R_2$ and its complement $R_2^c$ in $T_1 \setminus \{\sigma_{n-2}\}$ has size at least $\lceil (|T_1| - 1)/2 \rceil$; let $T_2 \subseteq T_1$ be that set. This process is repeated iteratively. At step $i$ we set $\sigma_{n-i}$ to be an arbitrary element of $T_{i-1}$. With $\sigma_{n-1}, \ldots, \sigma_{n-i}$ now fixed, the value written in location $\sigma_{n-i}$ depends only on the value of $\sigma_n$. The set $R_i$ is defined to be all such values of $\sigma_n$ in $T_{i-1} \setminus \{\sigma_{n-i}\}$ that result in Alice writing a 1 in location $\sigma_{n-i}$, and $T_i \subseteq T_{i-1}$ is defined to be the larger of $R_i$ and $R_i^c$. We proceed until the set $T_i$ has only one element, and then assign $\sigma_n$ to be that element. This process will take at least $\lceil \log_2 n \rceil$ steps. We then assign the remaining elements to $\sigma_1, \ldots, \sigma_{n-i-1}$ in an arbitrary order.
Proof. Let $\sigma' = \mathrm{swap}_k(\sigma)$. If $\ell <_\sigma k$ then $S_\ell(\sigma) = S_\ell(\sigma')$, and so Alice writes the same bit to location $\ell$ under both permutations. Suppose instead that $k <_\sigma \ell$ and $\ell \ne \sigma_n$. Let $j$ be such that $\sigma_{n-j} = \ell$. Note that $\sigma_{n-1} = \sigma'_{n-1}, \ldots, \sigma_{n-j} = \sigma'_{n-j}$. Recall that, holding $\sigma_{n-1}, \ldots, \sigma_{n-j}$ fixed, the bit Alice writes at location $\ell$ depends only on the value of $\sigma_n$, and furthermore that bit is the same for all settings of $\sigma_n \in T_j$. Since both $\sigma_n$ and $\sigma'_n = k$ are in the set $T_j$, it follows that $\Pi_A(\sigma)_\ell = \Pi_A(\sigma')_\ell$.
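By the above claim, $\Pi_A(\sigma)$ collides with $\Pi_A(\mathrm{swap}_k(\sigma))$ for at least $\lceil \log_2 n \rceil$ values of $k$, and these are distinct edges of $E(\Pi)$ because their free locations $k$ are distinct. Consequently, at least one of the vertices of $\Pi_A(\sigma)$ has degree more than $\lceil \log_2(n)/2 \rceil$. This concludes the proof.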
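The greedy construction and the collision claim can be exercised on a random assignment oblivious protocol. In the Python sketch below, the protocol is a hypothetical, lazily sampled membership test `random_in_H(k, A)`, the "arbitrary" choices are made by taking the smallest candidate, and the final check confirms that $\Pi_A(\sigma)$ collides with $\Pi_A(\mathrm{swap}_k(\sigma))$ for every $k$ fixed during the construction. It is a sanity check on small $n$, not a proof.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

InH = Callable[[int, FrozenSet[int]], bool]  # hypothetical: is the set A in H_k?


def pi_A(sigma: Sequence[int], in_H: InH) -> list:
    """Pi_A(sigma) for the assignment oblivious protocol described by in_H."""
    v: list = [None] * len(sigma)
    for pos, k in enumerate(sigma[:-1]):
        v[k - 1] = 1 if in_H(k, frozenset(sigma[:pos])) else 0  # uses S_k(sigma)
    v[sigma[-1] - 1] = "*"
    return v


def collide(e: Sequence[object], f: Sequence[object]) -> bool:
    """Edges collide if they agree at every location free in neither edge."""
    free = {e.index("*"), f.index("*")}
    return all(a == b for i, (a, b) in enumerate(zip(e, f)) if i not in free)


def swap(sigma: Sequence[int], k: int) -> list:
    """swap_k(sigma): exchange the positions of k and sigma_n."""
    s = list(sigma)
    i = s.index(k)
    s[i], s[-1] = s[-1], s[i]
    return s


def greedy_sigma(n: int, in_H: InH) -> Tuple[List[int], List[int]]:
    """Greedy construction from the proof.

    Returns sigma and the list [sigma_{n-1}, sigma_{n-2}, ...] of elements fixed
    by the loop; "arbitrary" choices are resolved by taking minima.
    """
    everything = set(range(1, n + 1))
    T = set(everything)  # candidates; step 1 picks sigma_{n-1} = 1 = min(T)
    tail: List[int] = []
    while len(T) > 1:
        p = min(T)  # sigma_{n-i}, an arbitrary element of T_{i-1}
        after = set(tail)  # elements already fixed to come after p (besides sigma_n)
        tail.append(p)
        rest = T - {p}  # remaining candidates for sigma_n
        # With sigma_n = j, S_p(sigma) is everything except p, `after` and j.
        R = {j for j in rest if in_H(p, frozenset(everything - after - {p, j}))}
        T = R if len(R) >= len(rest - R) else rest - R  # T_i: the larger side
    last = T.pop()  # sigma_n
    head = sorted(everything - set(tail) - {last})  # sigma_1, ... in any order
    return head + tail[::-1] + [last], tail


# A random assignment oblivious protocol, sampled lazily but consistently.
rng = random.Random(0)
table: Dict[Tuple[int, FrozenSet[int]], bool] = {}


def random_in_H(k: int, A: FrozenSet[int]) -> bool:
    if (k, A) not in table:
        table[(k, A)] = rng.random() < 0.5
    return table[(k, A)]


sigma, tail = greedy_sigma(8, random_in_H)
edge = pi_A(sigma, random_in_H)
print(sigma, tail, all(collide(edge, pi_A(swap(sigma, k), random_in_H)) for k in tail))
```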

A Protocol with Lower Cost than the AND-OR Protocol
The AND-OR protocol has cost $\lceil \sqrt{n} \rceil$, which matches our lower bound for monotone protocols (within 1). In this section we show that non-monotone protocols can give at least a small advantage:
Theorem 20. For some $\varepsilon > 0$ and all sufficiently large $n$ there is a protocol $\Pi$ with $c(\Pi) \le (1 - \varepsilon)\sqrt{n}$.
Proof. The construction is a variant of the AND-OR protocol. An $(n, m)$-proper code is an indexed family $\{x_S \in \{0, 1\}^n : S \in \binom{[n]}{m}\}$ of vectors such that the support of $x_S$ is a subset of $S$. We need the following fact: for $n$ sufficiently large and $n \ge k^2 \ge .8n$, there is an $(n, k^2)$-proper code in which any two codewords are at Hamming distance at least $2k + 1$.
(The routine proof of this is given below.) Choose the least $k$ such that $k^2 \ge .8n$ and construct such an $(n, k^2)$-proper code.
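The two requirements on the code (support inside $S$, and pairwise Hamming distance at least $2k + 1$) can be checked by brute force for tiny parameters. The Python helpers below are our own illustration, with a codeword $x_S$ stored as a 0-indexed list and the index set $S$ as a `frozenset`; they enumerate all $\binom{n}{m}$ index sets and are only meant for experimentation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def is_proper(code: Dict[FrozenSet[int], List[int]], n: int, m: int) -> bool:
    """Check that code is indexed by all m-subsets S of [n] and supp(x_S) lies in S."""
    subsets = {frozenset(c) for c in combinations(range(1, n + 1), m)}
    return set(code) == subsets and all(
        x[loc - 1] == 0 for S, x in code.items() for loc in range(1, n + 1) if loc not in S)


def min_distance(code: Dict[FrozenSet[int], List[int]]) -> int:
    """Minimum Hamming distance between two codewords (assumes at least two)."""
    words = list(code.values())
    return min(sum(a != b for a, b in zip(u, v)) for u, v in combinations(words, 2))


# A toy (3, 2)-proper code; the protocol would need min_distance >= 2k + 1.
toy = {frozenset({1, 2}): [1, 1, 0], frozenset({1, 3}): [0, 0, 0], frozenset({2, 3}): [0, 0, 1]}
print(is_proper(toy, 3, 2), min_distance(toy))
```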
Protocol $\Pi$ is as follows. Alice writes 0 in the first $n - k^2$ locations to arrive. Let $S$ be the set of the remaining $k^2$ locations. View $S$ as split into $k$ blocks, where each successive block consists of the $k$ smallest indices of $S$ not yet assigned to a block. For the last $k^2$ elements of the permutation, when index $j$ arrives Alice writes $x_{S,j}$ unless $j$ is the final element of its block to arrive, in which case Alice writes $1 - x_{S,j}$ (on $\sigma_n$ itself she writes $b$, as the rules of the game require).
The word received by Bob differs from $x_S$ in at most $k$ places (one for each block), and so by the distance property of the code Bob can deduce the set $S$. If there is a block of $S$ on which the received vector agrees with $x_S$ entirely, then Bob outputs that block (since that block must include $\sigma_n$); otherwise Bob outputs the set of positions (one per block) in which the received vector disagrees with $x_S$ (which again must include $\sigma_n$).
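The following Python sketch simulates Alice and Bob in this protocol for a small $n$. Two simplifications are assumptions of ours: the code $x_S$ is a lazily sampled random vector supported on $S$ (so, for small $n$, it need not have the distance property), and Bob is handed $S$ directly instead of decoding it from the nearest codeword. What the simulation checks is the block-flipping idea itself: the received word differs from $x_S$ in at most $k$ places, and Bob's output has exactly $k$ elements and contains $\sigma_n$.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Set

CodeFn = Callable[[FrozenSet[int]], List[int]]  # hypothetical: S -> x_S (0-indexed)


def choose_k(n: int) -> int:
    """Least k with k^2 >= .8n, using integer arithmetic (5k^2 >= 4n)."""
    k = 1
    while 5 * k * k < 4 * n:
        k += 1
    return k


def blocks_of(S: Set[int], k: int) -> List[List[int]]:
    """Split S into k blocks: the k smallest indices, the next k, and so on."""
    s = sorted(S)
    return [s[i * k:(i + 1) * k] for i in range(k)]


def alice(sigma: Sequence[int], b: int, k: int, code: CodeFn) -> List[int]:
    n = len(sigma)
    assert k * k <= n
    v = [0] * n  # the first n - k^2 locations to arrive simply receive 0
    S = frozenset(sigma[n - k * k:])
    x = code(S)
    block_id = {loc: i for i, blk in enumerate(blocks_of(S, k)) for loc in blk}
    arrived = [0] * k  # number of elements of each block seen so far
    for loc in sigma[n - k * k:n - 1]:
        i = block_id[loc]
        arrived[i] += 1
        # Copy x_S, except flip the bit on the final element of a block to arrive.
        v[loc - 1] = 1 - x[loc - 1] if arrived[i] == k else x[loc - 1]
    v[sigma[-1] - 1] = b  # location sigma_n receives Alice's input bit b
    return v


def bob(received: Sequence[int], S: Set[int], k: int, code: CodeFn) -> Set[int]:
    """Bob's output set J, assuming S has already been decoded (not shown)."""
    x = code(frozenset(S))
    blocks = blocks_of(S, k)
    for blk in blocks:
        if all(received[loc - 1] == x[loc - 1] for loc in blk):
            return set(blk)  # only the block of sigma_n can agree with x_S entirely
    return {loc for blk in blocks for loc in blk if received[loc - 1] != x[loc - 1]}


n = 20
k = choose_k(n)  # k = 4, since 16 >= 0.8 * 20
rng = random.Random(1)
cache: Dict[FrozenSet[int], List[int]] = {}


def random_code(S: FrozenSet[int]) -> List[int]:
    """Lazily sampled random vector supported on S (distance NOT guaranteed)."""
    if S not in cache:
        cache[S] = [rng.randint(0, 1) if loc in S else 0 for loc in range(1, n + 1)]
    return cache[S]


sigma = rng.sample(range(1, n + 1), n)
for b in (0, 1):
    J = bob(alice(sigma, b, k, random_code), set(sigma[n - k * k:]), k, random_code)
    print(b, sorted(J), sigma[-1] in J, len(J) == k)
```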
Finally, we prove the existence of the desired $(n, k^2)$-proper code using a standard random construction. For each $S \in \binom{[n]}{k^2}$, let $x_S$ be a uniformly random vector supported on $S$. Call a pair of sets $S, T \in \binom{[n]}{k^2}$ bad if $x_S$ and $x_T$ differ in fewer than $2k + 1$ positions. The number of coordinates on which $x_S$ and $x_T$ differ is at least the number of coordinates in $S$ on which they differ. Holding $x_T$ fixed, we see that the probability that $S, T$ is bad is at most the probability of fewer than $2k + 1$ heads in $k^2$ coin tosses, which is $2^{-k^2(1 - o(1))}$. Taking a union bound over all pairs of $k^2$-sets, we get that the probability that there is a bad pair is at most $\binom{n}{.2n}^2 2^{-.8n(1 - o(1))} = o(1)$, and so with positive probability there are no bad pairs, and the desired code exists.
As mentioned in the introduction, after a preliminary version of this paper appeared, Mario Szegedy [14] gave a protocol of cost $O(n^{0.4732})$.