NOTE Decision Trees and Influence: an Inductive Proof of the OSSS Inequality

We give a simple proof of the OSSS inequality (O'Donnell, Saks, Schramm, Servedio, FOCS 2005). The inequality states that for any decision tree T calculating a Boolean function f : {0, 1}^n → {−1, 1}, we have Var[f] ≤ ∑_i δ_i(T) Inf_i(f), where δ_i(T) is the probability that the input variable x_i is read by T and Inf_i(f) is the influence of the i-th variable on f.


Introduction
Let T be a decision tree computing a function f : {0, 1}^n → {−1, 1}. We write δ_i(T) for the probability that the variable x_i is queried by the decision tree on a uniform random input, and we write

Δ(T) = ∑_{i=1}^n δ_i(T)

for the expected number of variables queried. Δ(T) can also be thought of as the average depth of the decision tree, or as a refinement of the notion of the size of the decision tree, since Δ(T) ≤ log(size(T)) [4].
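As a concrete illustration, δ_i(T) can be computed by brute force for small trees. The following sketch uses a tuple encoding of trees that is our own assumption (not from the paper): a tree is ("leaf", v) or ("query", i, t0, t1), meaning "query x_i and recurse on t0 if x_i = 0, on t1 if x_i = 1", with 0-indexed variables.

```python
from itertools import product

# Minimal sketch: compute delta_i(T) by enumerating all inputs.
# The tuple encoding of decision trees is an illustrative assumption.

def queried(tree, x):
    """Return the set of variable indices that the tree reads on input x."""
    read = set()
    while tree[0] == "query":
        _, i, t0, t1 = tree
        read.add(i)
        tree = t1 if x[i] else t0
    return read

def delta(tree, n):
    """delta[i] = Pr over a uniform x in {0,1}^n that x_i is queried."""
    inputs = list(product((0, 1), repeat=n))
    return [sum(i in queried(tree, x) for x in inputs) / len(inputs)
            for i in range(n)]

# Example: a tree for AND(x0, x1) that queries x0 first: x0 is always
# read, x1 only when x0 = 1, so delta = [1, 1/2] and Delta(T) = 3/2.
T = ("query", 0, ("leaf", -1), ("query", 1, ("leaf", -1), ("leaf", 1)))
print(delta(T, 2))
```

Here Δ(T) = 3/2 matches the intuition that this tree reads 1 variable half the time and 2 variables the other half.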
The influence of a variable x_i on a Boolean function f is defined to be

Inf_i(f) = Pr_x[f(x) ≠ f(x^(i))],

where x^(i) denotes x with its i-th bit flipped [1]. Recall that the variance of a function f is Var[f] = E[f²] − E[f]², and that the covariance of two functions f and g is Cov[f, g] = E[fg] − E[f]E[g]. O'Donnell et al. [3] proved the following inequality:

Theorem 1.1. Let f : {0, 1}^n → {−1, 1} be a Boolean function, and let T be a decision tree computing f. Then

Var[f] ≤ ∑_{i=1}^n δ_i(T) Inf_i(f).

This inequality can be viewed as a refinement of the Efron-Stein inequality [2, 5] for the discrete cube (i.e., Var[f] ≤ ∑_{i=1}^n Inf_i(f)) that takes into account the complexity of the function's representation. O'Donnell et al. [3] also proved the following generalization of Theorem 1.1, which is what we reprove in the next section.

Theorem 1.2. Let f, g : {0, 1}^n → {−1, 1} be Boolean functions, and let T be a decision tree computing f. Then

|Cov[f, g]| ≤ ∑_{i=1}^n δ_i(T) Inf_i(g).
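To make the quantities concrete, here is a hedged brute-force check of Theorem 1.1 for the three-bit majority function. The tree and its query probabilities are our own example (0-indexed bits): a tree for maj3 that queries x0 and x1 and reads x2 only on a tie has δ = (1, 1, 1/2).

```python
from itertools import product

# Brute-force influence and variance for f : {0,1}^n -> {-1,+1}, with f
# given as a Python function on bit tuples (an illustrative encoding).

def influence(f, n, i):
    """Inf_i(f) = Pr_x[f(x) != f(x^(i))], x^(i) = x with bit i flipped."""
    inputs = list(product((0, 1), repeat=n))
    return sum(f(x) != f(x[:i] + (1 - x[i],) + x[i+1:])
               for x in inputs) / len(inputs)

def variance(f, n):
    """Var[f] = E[f^2] - E[f]^2 under the uniform distribution."""
    inputs = list(product((0, 1), repeat=n))
    mean = sum(map(f, inputs)) / len(inputs)
    return sum(f(x) ** 2 for x in inputs) / len(inputs) - mean ** 2

maj3 = lambda x: 1 if sum(x) >= 2 else -1

# Our example tree for maj3 queries x0 and x1, reading x2 only on a tie,
# giving delta = (1, 1, 1/2); each Inf_i(maj3) = 1/2.
deltas = [1.0, 1.0, 0.5]
rhs = sum(d * influence(maj3, 3, i) for i, d in enumerate(deltas))
print(variance(maj3, 3), rhs)  # Var = 1.0 <= 1.25 = OSSS bound
```

Note that the Efron-Stein bound for this example would be 3/2, so the decision-tree weights δ_i(T) genuinely sharpen it.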

Inductive Proof
The original proofs of both Theorems 1.1 and 1.2 relied on some delicate probabilistic reasoning about the independence of certain hybrid inputs to the decision tree. We will prove Theorem 1.2 using induction. To do so, we will consider the function's behavior under the two cases when the root variable takes the value 0 and the value 1. First we will review a fact from probability theory.
For a function f : {0, 1}^n → R, let

c_i(x) = E[f(y) | y_1 = x_1, …, y_i = x_i] − E[f(y) | y_1 = x_1, …, y_{i−1} = x_{i−1}],

where y is drawn uniformly from {0, 1}^n; in particular, ∑_{i=1}^n c_i = f − E[f]. The sequence {c_i} is a martingale difference sequence. Let g be another real-valued function, and let {d_i} be its martingale difference sequence. Then Cov[f, g] = ∑_{i=1}^n E[c_i d_i]. We'll prove this fact for the sake of completeness.
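The decomposition just defined can be checked by brute force on small examples; in this sketch the two functions are our own illustrative choices, and bits are 0-indexed.

```python
from itertools import product

# Brute-force check of Cov[f,g] = sum_i E[c_i d_i] via the martingale
# difference sequences; the example functions below are ours.

def cond_mean(f, n, prefix):
    """E[f(y)] over uniform y in {0,1}^n agreeing with the fixed prefix."""
    tails = list(product((0, 1), repeat=n - len(prefix)))
    return sum(f(prefix + t) for t in tails) / len(tails)

def mds(f, n, x, i):
    """i-th martingale difference c_i(x) (i is 1-indexed)."""
    return cond_mean(f, n, x[:i]) - cond_mean(f, n, x[:i - 1])

def cov(f, g, n):
    inputs = list(product((0, 1), repeat=n))
    ef = sum(map(f, inputs)) / len(inputs)
    eg = sum(map(g, inputs)) / len(inputs)
    return sum(f(x) * g(x) for x in inputs) / len(inputs) - ef * eg

n = 3
f = lambda x: 1 if sum(x) >= 2 else -1       # majority of three bits
g = lambda x: 1 if x[0] and x[1] else -1     # AND of the first two bits

inputs = list(product((0, 1), repeat=n))
total = sum(mds(f, n, x, i) * mds(g, n, x, i)
            for x in inputs for i in range(1, n + 1)) / len(inputs)
print(cov(f, g, n), total)  # the two quantities agree
```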
Fact 2.1. Let f, g : {0, 1}^n → R be real-valued functions, with martingale difference sequences {c_i} and {d_i}. Then Cov[f, g] = ∑_{i=1}^n E[c_i d_i].

Proof. For j < k, we have

E[c_j d_k] = E[c_j E[d_k | x_1, …, x_{k−1}]] = 0,

since E[d_k | x_1, …, x_{k−1}] = 0 by the martingale difference property. Hence

Cov[f, g] = E[(f − E[f])(g − E[g])] = E[(∑_{j=1}^n c_j)(∑_{k=1}^n d_k)] = ∑_{i=1}^n E[c_i d_i].

We now relate the last martingale difference term with the influence of the variable x_n.

Lemma 2.2. Let f, g : {0, 1}^n → {−1, 1} be Boolean functions, with martingale difference sequences {c_i} and {d_i}. Then E[c_n d_n] ≤ min(Inf_n(f), Inf_n(g)), and in the special case f = g we have E[c_n²] = Inf_n(f).
Proof. Let f_0(x_1, …, x_n) denote f(x_1, …, x_{n−1}, 0), and let f_1, g_0, and g_1 be defined similarly. Then we have that

c_n = f − (f_0 + f_1)/2 and d_n = g − (g_0 + g_1)/2,

so c_n d_n = (f_0 − f_1)(g_0 − g_1)/4 regardless of the value of x_n. We can rewrite E[c_n d_n] as

E[c_n d_n] = (1/4) E[(f_0 − f_1)(g_0 − g_1)].

Unless both f_0 ≠ f_1 and g_0 ≠ g_1, the quantity inside the expectation is 0, and when both differ it has absolute value 4; thus Inf_n(f) is an upper bound on E[c_n d_n] (as is Inf_n(g)). Note that this upper bound is an equality when we consider the special case of f = g, and we have that

E[c_n²] = (1/4) E[(f_0 − f_1)²] = Pr[f_0 ≠ f_1] = Inf_n(f).

We are now ready to prove Theorem 1.2.
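Lemma 2.2 can be spot-checked numerically through the identity E[c_n d_n] = (1/4) E[(f_0 − f_1)(g_0 − g_1)]; the example functions (majority and AND on three 0-indexed bits) are our own.

```python
from itertools import product

# Sketch check of Lemma 2.2 via c_n d_n = (f0 - f1)(g0 - g1)/4, where
# f0(y) = f(y, 0) and f1(y) = f(y, 1); the example functions are ours.

n = 3
maj3 = lambda x: 1 if sum(x) >= 2 else -1
and3 = lambda x: 1 if all(x) else -1

prefixes = list(product((0, 1), repeat=n - 1))

def last_diff(h, y):
    """h(y, 0) - h(y, 1): the difference of the two restrictions at y."""
    return h(y + (0,)) - h(y + (1,))

def e_cn_dn(f, g):
    """E[c_n d_n] = (1/4) E_y[(f0(y) - f1(y)) (g0(y) - g1(y))]."""
    return sum(last_diff(f, y) * last_diff(g, y)
               for y in prefixes) / (4 * len(prefixes))

def inf_n(g):
    """Inf_n(g) = Pr_y[g(y, 0) != g(y, 1)]."""
    return sum(last_diff(g, y) != 0 for y in prefixes) / len(prefixes)

print(e_cn_dn(maj3, and3), inf_n(and3))  # bound |E[c_n d_n]| <= Inf_n
print(e_cn_dn(maj3, maj3), inf_n(maj3))  # equality in the case f = g
```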
Proof. We'll prove the statement by induction on the number of variables. For the base case of one variable, recall that both δ_i(T) and Inf_i(g) are always non-negative, and that the covariance of two functions with range {−1, 1} is a value in [−1, 1]. A Boolean function on only one variable is either constant, the single variable x_1, or its negation. If either f or g is constant, then Cov[f, g] = 0, and the inequality holds. If neither f nor g is constant, then Inf_1(g) and δ_1(T) must both be 1, and the inequality holds.

Now we'll consider f and g on n variables. We can assume that f and g are non-constant, or the inequality trivially holds as before. Thus, T must query at least one variable, and we will assume without loss of generality that the root of T queries x_n. Let T_0 be the left subtree and let T_1 be the right subtree. Then for i ≠ n, we have δ_i(T) = (1/2)δ_i(T_0) + (1/2)δ_i(T_1). As in the proof of Lemma 2.2, let f_0(x_1, …, x_n) denote f(x_1, …, x_{n−1}, 0), and let f_1, g_0, and g_1 be defined similarly. For i ≠ n, we also get the following expression: Inf_i(g) = (1/2) Inf_i(g_0) + (1/2) Inf_i(g_1).
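The averaging identity for influences used above can be spot-checked by brute force; this short sketch (with our own example g and 0-indexed bits, the last bit playing the role of x_n) verifies Inf_i(g) = (1/2) Inf_i(g_0) + (1/2) Inf_i(g_1) for i ≠ n.

```python
from itertools import product

# Spot-check of the influence averaging identity from the inductive
# step; the example function g is our own choice.

n = 3
g = lambda x: 1 if sum(x) >= 2 else -1   # majority of three bits
g0 = lambda y: g(y + (0,))               # restriction x_n = 0
g1 = lambda y: g(y + (1,))               # restriction x_n = 1

def influence(h, m, i):
    """Inf_i(h) for h on m variables, by enumeration."""
    inputs = list(product((0, 1), repeat=m))
    return sum(h(x) != h(x[:i] + (1 - x[i],) + x[i+1:])
               for x in inputs) / len(inputs)

for i in range(n - 1):
    lhs = influence(g, n, i)
    rhs = (influence(g0, n - 1, i) + influence(g1, n - 1, i)) / 2
    assert lhs == rhs
print("identity holds for all i != n")
```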
By Fact 2.1, we can write Cov[f, g] = ∑_{i=1}^n E[c_i d_i]. Let c_{i,0} denote the i-th martingale difference of f_0 for 1 ≤ i ≤ n − 1, and define c_{i,1}, d_{i,0}, and d_{i,1} similarly. Then we have c_i = (c_{i,0} + c_{i,1})/2 and d_i = (d_{i,0} + d_{i,1})/2, and we can write the covariance as:

Cov[f, g] = ∑_{i=1}^{n−1} E[c_i d_i] + E[c_n d_n] = (1/4) ∑_{a,b∈{0,1}} ∑_{i=1}^{n−1} E[c_{i,a} d_{i,b}] + E[c_n d_n] = (1/4) ∑_{a,b∈{0,1}} Cov[f_a, g_b] + E[c_n d_n],

where the last equality applies Fact 2.1 to each pair (f_a, g_b) of functions on n − 1 variables.
By the triangle inequality, |Cov[f, g]| ≤ (1/4) ∑_{a,b∈{0,1}} |Cov[f_a, g_b]| + |E[c_n d_n]|. Since f_a and g_b are functions on n − 1 variables, we can use the induction hypothesis, and we have:

|Cov[f, g]| ≤ (1/4) ∑_{a,b∈{0,1}} ∑_{i=1}^{n−1} δ_i(T_a) Inf_i(g_b) + |E[c_n d_n]| = ∑_{i=1}^{n−1} δ_i(T) Inf_i(g) + |E[c_n d_n]|,

where the equality uses δ_i(T) = (1/2)δ_i(T_0) + (1/2)δ_i(T_1) and Inf_i(g) = (1/2) Inf_i(g_0) + (1/2) Inf_i(g_1) for i ≠ n. As Inf_n(g) = Inf_n(−g), we have that |E[c_n d_n]| ≤ Inf_n(g) by Lemma 2.2, and δ_n(T) = 1 because x_n is the root of the tree. Thus |Cov[f, g]| ≤ ∑_{i=1}^n δ_i(T) Inf_i(g), and the inductive step holds.

THEORY OF COMPUTING, Volume 6 (2010), pp. 81-84
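As a sanity check on the final statement, here is a small end-to-end numerical example of our own (0-indexed bits; the tree and its query probabilities are our assumptions): f = AND(x0, x1), computed by a tree that queries x0, then x1 only if x0 = 1, and never reads x2, bounded against g = maj3.

```python
from itertools import product

# End-to-end numerical check of Theorem 1.2 on a small example of ours:
# f = AND(x0, x1) computed by a tree T with delta(T) = (1, 1/2, 0),
# bounded against g = majority of three bits.

n = 3
inputs = list(product((0, 1), repeat=n))

f = lambda x: 1 if x[0] and x[1] else -1
g = lambda x: 1 if sum(x) >= 2 else -1

# T queries x0, then x1 only if x0 = 1; x2 is never read.
deltas = [1.0, 0.5, 0.0]

def cov(f, g):
    ef = sum(map(f, inputs)) / len(inputs)
    eg = sum(map(g, inputs)) / len(inputs)
    return sum(f(x) * g(x) for x in inputs) / len(inputs) - ef * eg

def influence(g, i):
    return sum(g(x) != g(x[:i] + (1 - x[i],) + x[i+1:])
               for x in inputs) / len(inputs)

bound = sum(d * influence(g, i) for i, d in enumerate(deltas))
print(abs(cov(f, g)), bound)  # |Cov[f,g]| = 0.5 <= 0.75 = the bound
```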