How Many Bits Can a Flock of Birds Compute?

We derive a tight bound on the time it takes for a flock of birds to reach equilibrium in a standard model. Birds navigate by constantly averaging their velocities with those of their neighbors within a fixed distance. It is known that the system converges after a number of steps no greater than a tower-of-twos of height logarithmic in the number of birds. We show that this astronomical bound is actually tight in the worst case. We do so by viewing the bird flock as a distributed computing device and deriving a sharp estimate on the growth of its busy-beaver function. The proof highlights the use of spectral techniques in natural algorithms.

ACM Classification: F.2.0

AMS Classification: 68W25


Introduction
In a standard flocking model, n birds fly in the sky and modify their velocities at each time step by averaging them with those of their neighbors within a fixed distance (Figure 1). It has been shown that the system always tends to equilibrium and that the relaxation time is bounded by a tower-of-twos of height O(log n), which is "two to the two to the two..." repeated on the order of log n times [2]. Past that worst-case limit, equilibrium is approached exponentially fast: the birds fly with constant velocities subject to fast-decaying deviations. We show here that, oddly enough, this astronomical upper bound is in fact optimal.
Figure 1: Each bird averages its velocity with those of its neighbors within a fixed radius.

This work is part of an effort to understand the worst-case behavior of bird flocking viewed as a natural algorithm. Because the actual biology of bird flight is beyond anybody's grasp at this point, a reasonable approach is to specify an idealized model of the phenomenon that, one hopes, captures some essential aspect of the behavior. In this case, the model tries to express the birds' uncanny ability to reach consensus about their velocities despite the ever-changing topology of their communication channels. To prove that the time to reach equilibrium can be very high, one must carefully engineer a starting configuration of the birds and show how their behavior remains essentially nontrivial for a long time. What does this mean? Because the velocities and positions of the birds can be assumed to be rational numbers, the evolution of the group can be modeled by a distributed algorithm operating over the rationals. Since it is known that the flocking algorithm will always converge, the question is: how long can it take in the worst case?
As it happens, this is equivalent to asking how many bits are required to encode the final stationary velocity of the birds. If three birds are initialized with m bits each, none of them can compute a single extra bit on its own; together, however, they can produce 2^Ω(m) brand-new bits! It is in this sense that one can talk about the number of bits "computed" by a bird flock. This is similar to what is known in computational complexity as the "busy-beaver problem": given a program of size n, how long can it run before halting? In our case, the birds never stop flying, so, technically, the program never halts. The creation of new flocks does stop, however. Natural algorithms, by definition, never halt. Their busy-beaver functions, therefore, merely count how much time can elapse before their behaviors become trivial, i.e., fixed or asymptotically periodic.
The model and the main result. Most models of bird flocking studied in the literature follow the boids framework of Reynolds [9, 12]. Roughly, birds try to (i) align their headings, (ii) stay grouped together, and (iii) avoid collisions. Applying all three rules together produces spectacular visuals: unsurprisingly, virtually all CGI bird-flocking animations in Hollywood are based on them. Analyzing their joint behavior has proven to be a challenge, however. Of the three rules, the first one drives the dynamics, so it is customary to dismiss the other two as merely corrective: indeed, the bulk of the mathematical work on bird flocking has focused on variants of rule (i) [2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13]. This bold choice was validated recently by the empirical findings of the STARFLAG project, the most comprehensive experimental investigation of bird flocking to date [1].
The model is easy to describe. We give a formal description below, but a few words of explanation suffice to tell the whole story. A bird is represented by a point in Euclidean 3-space along with a three-dimensional vector indicating its velocity. For a system of n birds, the initial condition is thus entirely specified by 6n numbers. The time t is discrete, so the position of a bird i at time t is given by its placement x_i at time t − 1 shifted by its current velocity v_i; in other words, the bird's location x_i becomes x_i + v_i. To update v_i, we form an undirected graph, called the flocking network, by connecting any two birds within unit distance by an edge, and we express the new velocity of bird i by averaging its current velocity v_i with those of its neighbors in the graph. The system reaches equilibrium when the flocking network no longer changes. The terminology is justified by the fact that, once the network settles on a final configuration, the birds soon begin to fly (essentially) in a straight line at constant speed: from that point on, nothing of interest happens. The model is robust in that the averaging can be weighted, if so desired, and a moderate amount of decaying noise can be tolerated as well. The connected components of the flocking network are called the flocks of the system. They will typically fragment and merge in erratic ways, and it is this evolving topology that makes such systems hard to analyze. Indeed, if the flocking network were fixed once and for all, the velocities would evolve by iterated averaging in a manner lending itself to the theory of random walks. The crux of the matter is that the flocking network may be constantly changing. If the changes were random, some of the tools for Markov chains could be rescued, but the difficulty is that the changes are endogenous. There is a feedback loop from the positions of the birds, which determine the flocking network, which in turn specifies the evolution of the velocities, which then
determine the new bird positions. To show that the system always reaches equilibrium calls on tools from different areas [2]: combinatorics, linear algebra, circuit complexity, computational geometry, even elimination theory; to determine whether two birds will ever be within unit distance of each other requires root separation bounds for various characteristic polynomials.
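The update rule just described can be sketched in a few lines. The following is an illustrative implementation of the averaging dynamics; uniform weights over each bird's closed neighborhood are an assumption made here for concreteness, not the paper's exact (weighted) choice:

```python
import numpy as np

def flocking_step(x, v):
    """One step of the averaging model: x, v are (n, 3) position/velocity arrays.

    Each bird links to every other bird within unit distance (the flocking
    network), averages its velocity with its neighbors', then moves.
    """
    n = len(x)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    nbrs = (dist <= 1.0) & ~np.eye(n, dtype=bool)      # flocking network
    new_v = np.empty_like(v)
    for i in range(n):
        group = np.append(np.flatnonzero(nbrs[i]), i)  # neighbors plus self
        new_v[i] = v[group].mean(axis=0)               # uniform averaging
    return x + new_v, new_v
```

Two birds within unit distance and with opposite velocities immediately agree on the zero velocity, while an isolated bird keeps flying straight: the feedback loop arises because the next network is computed from the new positions.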
In the field of dynamics, attraction to a fixed point is usually established by exhibiting a Lyapunov function. Because of the changing topology of the network, this approach runs into all sorts of problems here. Fortunately, proving the matching lower bound can be done entirely within the language of linear algebra. Once we engineer a starting configuration for the birds, to bound the relaxation time is a matter of monitoring the evolution of various Fourier coefficients as flocks merge together. Of course, it is the occasional presence of nonlinearities that causes the astronomical delay we seek. In the construction, the number of nonlinearities can be kept remarkably small (in fact, less than the number of birds). To see why Fourier analysis arises naturally, it is helpful to compare the transmission of information among the birds to the diffusion of heat in a medium. By choosing the right topology, we can ensure that the eigenfunctions of the Laplacian form a nice, simple set of harmonics.
We define the model formally. Given n birds represented at time t by their position vector x(t) = (x_1(t), ..., x_n(t)) ∈ (R^3)^n, the flocking network at time t links any two birds (the nodes) within unit distance of each other. The Laplacian L_t at time t is the n-by-n matrix whose (i, i) entry is the number of birds within distance 1 of bird i and whose (i, j) entry, for i ≠ j, is −1 if birds i, j are within unit distance and 0 otherwise. For any t > 0, the velocities are updated by v(t + 1) = (P_t ⊗ I_3) v(t), where P_t = I_n − C_t L_t and C_t is a diagonal matrix with positive rational entries. The system is initialized by fixing x(0) and v(1). The Kronecker product is used to make the stochastic matrix P_t act on each coordinate axis separately. We state the main result of this paper next and give the proof in Section 3. We set the grounds for it in Section 2.
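The matrix form of the update can be instantiated concretely. The choice C_t = diag(1/(d_i + 1)) below is one admissible option picked here for illustration (the paper allows any positive rational diagonal making P_t stochastic):

```python
import numpy as np

def laplacian(x):
    """Laplacian L of the unit-distance flocking network on positions x (n, 3)."""
    n = len(x)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    adj = ((dist <= 1.0) & ~np.eye(n, dtype=bool)).astype(float)
    return np.diag(adj.sum(axis=1)) - adj

def transition(x):
    """P = I - C L with C = diag(1/(deg_i + 1)): a stochastic averaging matrix."""
    L = laplacian(x)
    C = np.diag(1.0 / (np.diag(L) + 1.0))
    return np.eye(len(x)) - C @ L
```

Since each row of L sums to zero, every row of P sums to one, and this choice of C keeps all entries nonnegative, so P is stochastic; acting with P ⊗ I_3 on the stacked velocity vector applies P to each coordinate axis separately.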
Theorem 1.1. There exists an initial configuration of n birds in R^3, requiring O(log n) bits per bird to specify, such that the flocking network is still changing after a number of steps equal to a tower-of-twos of height Ω(log n).
The initial configuration of a set of birds refers to an assignment of six rational numbers to each bird to specify their initial positions and velocities.The lower bound above is the best possible [2].
Time to equilibrium. Our result is a worst-case lower bound. What about the matching upper bound? To prove that a bird system converges, one shows that (i) the flocking network freezes within finite time and (ii) the birds travel with constant velocity from that point on, while subject to damped oscillations decaying exponentially fast [2]. The latter is easy to prove. Indeed, once the network becomes static, the system becomes a coupled oscillator dual to a Markov chain. The challenging part is (i): to show that the flocking network converges to a fixed graph. The proof proceeds in two steps. First, it is shown that the network must at some point cease to fragment, with all subsequent (nonlinear) events consisting of flock merges. Second, the last such event is shown to occur within a number of steps equal to a tower-of-twos of logarithmic height [2]. In this paper we establish that this unusual bound is actually optimal.
What can possibly account for such astronomical delays? Imagine two flocks flying almost parallel to each other and headed toward collision: the smaller the angle, the longer it will take for the two flocks to merge. Picture now a whole set of such flocks merging two-by-two so as to become increasingly parallel to one another after each merge. A careful choice of initial conditions can lead to delays growing exponentially between consecutive merges. By scheduling the latter in a balanced fashion, we can ensure that each bird witnesses about log n of them: the tower-of-twos lower bound follows directly. Although birds fly in three dimensions, the construction can be achieved within a fixed plane (X, Y). In fact, the motion along the Y axis is identical for all birds, so it suffices to focus on the X coordinates. For that reason, it will be helpful to leave birds aside momentarily and use the one-dimensional imagery of trains colliding on a railroad track. The reason for doing this is that we can lift a system of colliding trains to higher dimension to form flocking birds. The transformation is entirely straightforward.

Spectral shift. Imagine n railroad cars moving on a train track. Whenever two of them collide, they become attached with a spring, which acts mechanically to set the resulting two-car train's stationary velocity to some average of the two individual speeds. Trains can in turn collide to form larger ones: after a collision, each car keeps averaging its velocity with those of its immediate neighbors on both sides (if any). What is the lowest nonzero speed that can be thus achieved? (Note that the dissipative nature of the dynamics rules out speedups.) Viewing the process as a distributed computation raises a busy-beaver type question. Assuming that each of the n cars is given an initial velocity encodable as an O(log n)-bit rational, the initial speed of any car can be no smaller than inverse polynomial in n. In fact, as one will easily see, the mass center of the train
resulting from a collision between two cars cannot have a nonzero speed smaller than inverse polynomial either. It is only through repeated collisions between bigger and bigger trains that the speed can begin to drop substantially. And when it does, the drop can be precipitous. Indeed, the lowest nonzero speed of an n-car train is inversely proportional to a tower-of-twos of height logarithmic in n.
This explosive decay points to an intriguing phenomenon of spectral shift: a collision between two trains cancels the highest eigenvalue of each one and shifts the remaining spectrum upward. A quick word of explanation. A moving n-car train can be regarded as a vibrating string with n harmonics. The highest one determines the speed of the whole train, while the other ones tell us how the motions of the individual cars deviate from the average. Because the system is dissipative (all but one of the eigenvalues are of magnitude less than 1), the nondominant eigenmodes decay exponentially fast. When two trains A and B hit each other, the new train C (of twice the size) acquires its eigenvalues from A and B: the two average speeds are made to be exactly opposite, so the highest modes cancel each other perfectly. As a result, all the modes of C are linear combinations of the nondominant modes of the smaller trains; in other words, if the colliding trains have Fourier coefficients of the form (a_1, a_2, ..., a_k) and (−a_1, b_2, ..., b_k), then the new train C has a spectrum (c_1, ..., c_2k), where the c_i are linear combinations of a_2, ..., a_k and b_2, ..., b_k. Note the absence of a_1 in the formation of c_1: this is the spectral shift in action. Because of dissipation, at the time of collision, the coefficients a_i, b_i (i > 1) will be much smaller than |a_1|. This is easy to achieve so as to get a one-shot exponential boost. The tricky part is to arrange for the shift to kick in over repeated collisions: too much symmetry in the initial configuration of the cars brings the trains to a halt; too little fails to clear the dominant modes, a precondition for any spectral shift, and makes the exponential boost unsustainable over several collisions.
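The cancellation of dominant modes can be seen in a toy computation (a sketch with invented numbers, not the paper's construction): two two-car trains with exactly opposite mean speeds merge, and the stationary velocity of the merged train is of the order of the tiny residual modes, not of the original speeds.

```python
import numpy as np

eps = 1e-6
vA = np.array([1.0 + 3 * eps, 1.0 - 3 * eps])   # train A: mean speed exactly +1
vB = np.array([-1.0 + eps, -1.0 - eps])         # train B: mean speed exactly -1
v = np.concatenate([vA, vB])                    # merged 4-car train

# Stationary distribution of the merged train, proportional to node degrees
# on the 4-node path (endpoints count half).
pi = np.array([0.5, 1.0, 1.0, 0.5]) / 3.0
s = pi @ v   # stationary velocity of the merged train

# The +1 and -1 dominant modes cancel: |s| is of order eps, not of order 1.
```

Here s = −eps/3: the merged train crawls at a speed determined entirely by the residual modes, which is exactly the one-shot exponential boost described above.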
The train model attempts to represent a one-dimensional projection of the birds. Once two trains collide, they are assumed to stay attached forever. If we lift the construction to model birds, however, this "gluing" provision no longer holds and cohesion must be checked. It is imperative that birds corresponding to attached trains remain connected on their own, i.e., stay within a flock by sheer virtue of the distance-based attachment rule of the flocking network. Unlike trains, whose cohesion is enforced exogenously, flocks may fragment, and an integrity analysis is necessary to establish that they do not: this task is necessary but merely of technical interest, so we deal with it separately in Section 3. Meanwhile, the main idea of the construction, the spectral shift, is entirely contained in the analysis of slow-train systems given in Section 2.
Number encodings. The model relies on the ability of birds to update their velocities with arbitrarily high precision. (All computations are over the rationals, so there is no need for real-valued computation and all bit lengths are finite.) Is this a realistic assumption? Our results are likely to be mostly of theoretical interest, but large precision cannot be the reason. To see why, an analogy is helpful. The relevance of chaos theory hinges on precisely the same point. Chaos is meaningful, strictly speaking, only if physics operates with infinite precision, which of course it does not. The pertinence of the concept is that its underlying motivation, sensitivity to initial conditions, still matters over a finite time horizon, hence

THEORY OF COMPUTING, Volume 10 (16), 2014, pp. 421-451

Figure 2: The mass center of the train moves at constant speed, which is given by its stationary velocity, i.e., the lowest mode of the system. The speeds of the individual cars, given by the higher modes, converge exponentially fast to the stationary value.
with bounded precision. In other words, the mathematical framework of chaos is an idealization of a phenomenon that is still present with imprecise computations. The same is true of the flocking model. Even with finite precision, the slowdown induced by the spectral shift can occur and delay convergence: the assumption of perfect accuracy is merely a convenient mathematical idealization. If the model has a serious weakness, it is not infinite precision but determinism: no one knows what happens if we inject noise with a constant entropy rate into the system. This is a very interesting open problem.

Slow train coming
Picture n railroad cars on a track, all separated from one another by at least a fixed distance. At time 0, give each one a little kick, some to the left, others to the right. There is no friction on the track, so the cars will move at constant speed until they start to hit one another. Should this happen, the collisions will be softened by the presence of a spring at the right end of each car (Figure 2). While absorbing the shock, the spring latches on to the other car. At that moment, the two colliding cars become attached to form a two-car train. Because of the spring, now attached to both cars, the train forms a coupled oscillator. The mass center of the two-car train will move at a constant speed equal to the average of the two individual velocities at the moment of impact. (We take liberties with the physics of the construction and emphasize that the analogy is only of mathematical interest.) In spectral terms, this is the lowest mode of the system, and the speed of the mass center forms the stationary velocity of the train: it is associated with the principal eigenvector, whose corresponding eigenvalue is 1, which explains why the mass center moves at constant speed.
Because of the coil spring, the coupled oscillator is damped, so the velocities of the individual cars will deviate from the stationary one by an error term decaying exponentially fast. (This is dual to an ergodic Markov chain with two states.) For a brief moment, the two cars might jerk back and forth a little, reeling from the shock, but very quickly the stationary velocity will assert its dominance and both cars will proceed to move in the same direction. Further collisions can happen, resulting in multicar trains fastened together by their springs. (Recall that a spring locks onto any car it hits.) The mass center of the three-car train in Figure 2 has constant velocity but, though still fast-decaying, the oscillations of the cars' individual speeds, being now coupled, become increasingly complex. In our discrete model, the speeds of adjacent cars are averaged at each time step, so these oscillations are governed by a rapidly mixing Markov chain. In the language of distributed computing, this is a consensus system: messages about the cars' current speeds are passed around up and down the train to cause endless readjustment toward consensual agreement on a common speed. The twist is that the trains keep colliding and hence growing, with more and more cars getting attached to them. To investigate how slowly they can go, it is useful to distinguish the number of bits used in the initial conditions from the number of cars; therefore, we assume that the initial positions and velocities are encoded as rationals over O(log m) bits per car, for large enough m.
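The consensus dynamics on a fixed train are easy to simulate. The sketch below assumes uniform averaging over each car's closed neighborhood (an illustrative choice): every car's speed converges exponentially fast to the stationary velocity, a weighted average of the initial speeds.

```python
import numpy as np

# Three-car train: each car averages uniformly with itself and its neighbors.
P = np.array([[1/2, 1/2, 0],
              [1/3, 1/3, 1/3],
              [0,   1/2, 1/2]])
v = np.array([3.0, -1.0, 2.0])          # speeds at the moment of formation

# The stationary velocity is pi @ v, where pi = (2, 3, 2)/7 is the
# dominant left eigenvector of P; here (2*3 + 3*(-1) + 2*2)/7 = 1.
for _ in range(100):
    v = P @ v                           # one round of neighbor averaging

# All three cars now agree (to within ~2^-100) on the common speed 1.
```

The second-largest eigenvalue of this P has magnitude 1/2, which is what drives the exponential rate of agreement.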

A small example
Two railroad cars hurtling toward each other at the same speed collide to produce a two-car train with zero stationary velocity: the train oscillates around its mass center, which does not move. Obviously, we must initialize the two cars with distinct speeds if only to keep the system moving. A moment's reflection shows that, regardless of how we do it, the stationary velocity cannot be smaller than 1/poly(m). With three cars (n = 3), however, it can already be as small as 1/exp(m). We explain why next. A pattern thus seems to emerge: with each new car, the stationary velocity is (inversely) exponentiated, leading to a tower-of-twos of height linear in the number of cars. But this is not what happens: indeed, we know from [2] that the height cannot be superlogarithmic. Why? At this point, it is useful to build some intuition by working out the case n = 3 in detail. We show how, with only O(log m) bits of input, a three-car train can compute a rational number with more than m bits: an exponential expansion. For concreteness, assume the cars are 10 times longer than the unit-length springs. Cars a and b are separated by a distance of 1, so that they are joined into a two-car train. We give them initial velocities of v_a = 4/m and v_b = −2/m. We choose a (1/3, 2/3) coupling action for the spring (just about any choice works) so that at the next step the velocity of a becomes v_a → v_a/3 + 2v_b/3 and that of b becomes v_b → 2v_a/3 + v_b/3. At time t > 0, the velocities of a, b satisfy

v_a(t) = 1/m + (3/m)(−1/3)^t and v_b(t) = 1/m − (3/m)(−1/3)^t, (2.1)

which follows from direct diagonalization of the averaging matrix. The ab-train has a stationary velocity of (1/2)(v_a + v_b) = 1/m. The third car, c, starts to the right of the ab-train with a speed of v_c = −3/m. Its initial position puts its left end at a distance of about 4 from the right spring attached to b. We can thus easily ensure that the impending collision between ab and c, hurtling toward each other at a relative speed of 1/m − (−3/m) = 4/m, will happen at time t = m. The coupling action for the train
abc is given by an analogous stochastic averaging rule. Assuming that m is large and even, the post-collision velocity vector is such that, since (1, 2, 1) is a principal left eigenvector for the abc-train, the stationary velocity is, as claimed, a rational whose binary expansion is more than m bits long. In this example, the exponentially small stationary velocity is the product of a careful balance: (i) the stationary velocities of ab and c cancel out perfectly; (ii) the decaying energy of the lower modes, signalled by the 3^{1−m} term in (2.1), is transferred to the lowest mode to set the stationary velocity of the abc-train. This is called a spectral shift [2]. Parts (i) and (ii) are in tension: the first calls for symmetry while the second must avoid undesired cancellations among the nontrivial frequencies of the spectrum.
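The two-car phase of this example can be replayed exactly in rational arithmetic. The closed form below is obtained by diagonalizing the (1/3, 2/3) averaging rule (eigenvalues 1 and −1/3); at the collision time t = m, the residual mode has decayed to exactly 3^{1−m}/m, which is the quantity the spectral shift then transfers to the lowest mode:

```python
from fractions import Fraction as F

m = 12                                   # any even m; kept small for speed
va, vb = F(4, m), F(-2, m)               # initial velocities of cars a and b
for _ in range(m):                       # run the averaging until t = m
    va, vb = va / 3 + 2 * vb / 3, 2 * va / 3 + vb / 3

# The mean speed 1/m is invariant; the deviation shrinks by a factor -1/3
# per step, giving the closed form v_a(t) = 1/m + (3/m)(-1/3)^t.
assert va == F(1, m) + F(3, m) * F(-1, 3) ** m
assert vb == F(1, m) - F(3, m) * F(-1, 3) ** m
# Residual mode at the collision time: exactly 3^(1-m)/m in magnitude.
assert abs(va - F(1, m)) == F(1, m * 3 ** (m - 1))
```

Exact fractions make the point of the section concrete: after m steps the velocities already require on the order of m bits to write down, even though the input used only O(log m) bits per car.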
For example, suppose we tried to generalize this construction to n = 4 by setting ab on a collision course with its mirror image cd. This would be perfect for (i) but disastrous for (ii), as the excessive symmetry would bring the abcd-train to a halt. The trick is to inject enough symmetry to kill off the lowest mode of each new train formation but not so much that it kills the higher modes too. Condition (i) calls for setting up head-on collisions between trains of the same size. These collisions will thus follow the bottom-up pattern of a complete binary tree with n leaves. The example above shows that choosing the right initial velocities is the name of the game. The actual initial placement of the cars, if not quite arbitrary, is straightforward: essentially, we need to position the cars far enough apart so that two trains have a chance to travel at least distance 1 before they collide. This suggests factoring out all positions and analyzing a "consensus" system defined entirely by the repeated averaging of the velocities: this way we can analyze the spectral shift in dimension n instead of 2n. It will then be an (easy) matter of restoring the initial placements and lifting the system to two dimensions. In the end, we will show how to position the individual train cars and give them the right initial velocities so that the time elapsed before all n cars are joined together into a single train is a tower-of-twos of height proportional to log n.

The characteristic time
Let v = (v_1, ..., v_n) be a vector in Q^n, where n is a power of two and each v_i is encoded as a ratio of two O(log n)-bit integers. We denote by T the complete binary tree with n leaves, and we label the i-th leaf from the left v_i. Write P_j = I − C_j L_j, where L_j is the Laplacian of the path on 2^j nodes and C_j = diag(1/2, 1/4, ..., 1/4, 1/2). This is the transition matrix of an ergodic reversible random walk on a graph consisting of a path of 2^j nodes, so P_j^∞ = lim_{k→∞} P_j^k is the rank-one matrix 1π^T, with π = (1/(2^j − 1)) (0.5, 1, ..., 1, 0.5)^T.
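The stationary structure just stated can be checked numerically. The matrix below is a reconstruction consistent with the text (a lazy reversible walk on the path, reversible with respect to the stated vector π); its powers converge to the rank-one matrix 1π^T:

```python
import numpy as np

def P_path(j):
    """Transition matrix of the lazy reversible walk on a path of 2**j nodes
    (an assumed reconstruction matching the stated stationary vector pi)."""
    m = 2 ** j
    P = np.zeros((m, m))
    for i in range(m):
        if i > 0:
            P[i, i - 1] = 0.5 if i == m - 1 else 0.25
        if i < m - 1:
            P[i, i + 1] = 0.5 if i == 0 else 0.25
        P[i, i] = 1.0 - P[i].sum()      # lazy self-loop keeps rows stochastic
    return P

j = 3
m = 2 ** j
P = P_path(j)
pi = np.array([0.5] + [1.0] * (m - 2) + [0.5]) / (m - 1)   # as in the text
assert np.allclose(pi @ P, pi)                  # pi is stationary
Pinf = np.linalg.matrix_power(P, 2000)
assert np.allclose(Pinf, np.outer(np.ones(m), pi))   # P^k -> 1 pi^T
```

The stationary vector is proportional to the node degrees of the path (endpoints count half), which is exactly the (0.5, 1, ..., 1, 0.5) pattern above.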
We define two vectors for each node a of T. For a leaf a, the associated value is v_i, where a is the i-th leaf of T from the left; for a node a of positive height, the vector v_a is assembled from the vectors at its children,
where b (resp. c) is the left (resp. right) child of a, and θ_a = ‖P_j^∞ v_a‖_∞^{−1}. (We ignore rounding issues, which are inconsequential, and simply assume that θ_a is an integer without further notice.) The leaves of the subtree rooted at a correspond to consecutive cars along the track. Together these cars form the train associated with node a: its stationary velocity, 1π^T v_a, is given by any of the (equal) coordinates of P_j^∞ v_a. The value of θ_a, therefore, is roughly the time it takes for train a to travel a distance of 1 at its stationary speed. We call θ_root the characteristic time of the system. How large can it be and still be finite? We define the tower-of-twos function by 2↑↑1 = 2 and 2↑↑n = 2^{2↑↑(n−1)} for n > 1.
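For reference, the tower-of-twos function just defined grows as follows:

```python
def tower(n):
    """Tower-of-twos: tower(1) = 2 and tower(n) = 2**tower(n - 1) for n > 1."""
    return 2 if n == 1 else 2 ** tower(n - 1)

# tower(1), ..., tower(4) = 2, 4, 16, 65536; tower(5) = 2**65536 already
# has 19,729 decimal digits.
```

A tower of height log n is thus astronomically larger than any fixed iterated exponential in n, which is why the characteristic time bound is so striking.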
Theorem 2.1. The characteristic time can be as large as 2↑↑ log n, where n is any large enough power of two.
Proof. Set n = 2^ν for ν assumed large enough, and define the initial data as follows, where c > 0 is a large enough integer constant and ∑_{l=0}^{ν−1} b_l 2^l is the binary expansion of k = 0, ..., n − 1. This definition of v has a simple interpretation. Indeed, it can be arrived at by starting with the initial assignment and then, while interpreting these coordinates as labels assigned to the leaves of T, applying the following algorithm: for every node of T that is a right child but not a leaf, flip the signs of the labels of the leaves of the subtree rooted at it. Note that labels might be flipped more than once. In general, replacing a vector v_a by −v_a for any right child a of positive height is called a flip (more on which below). This alternative view of the dynamics makes the analysis easier, so we adopt it. By symmetry, we may focus the analysis on the left spine a_1, ..., a_ν, where a_j is the leftmost node of height j. Note that v_{a_j} is of the form (∗, −∗). Quite clearly, θ_a depends only on the height j of a, so we refer to it as θ_j. We are only interested in the case θ_j < ∞, a property that we show below to hold.

The modes of the system
The dominant left eigenvector of P_j is (1, 2, ..., 2, 1) up to scaling, so the first Fourier coefficient, m_{a_j}, is the corresponding weighted average of the coordinates of v_{a_j}. Technically, this is the Fourier coefficient of index 0 for v_{a_j} with respect to the additive group of the integers modulo 2^{j+1} − 2, from which the structure can be derived by folding the cycle of length 2^{j+1} − 2 into an interval of size 2^j. This gives us the average speed of the joint group of the 2^j leftmost railroad cars.
Elementary trigonometry shows that, for k = 1, ..., m = 2^j, the right eigenvector of P_j for the k-th eigenvalue is unique up to scaling. For s ≥ 1, we diagonalize the matrix P_j^s = 1π^T + Q_j^s, where the k-th right eigenvector C_j^{1/2} w_k of P_j is proportional to u_k, with the normalization condition ‖w_k‖_2 = 1. By the triangle inequality and the submultiplicativity of the Frobenius norm, together with a Taylor series approximation of the cosine function around 0, we obtain bounds (2.8) valid for all j, s ≥ 1. It follows that, for k > 1, the k-th Fourier coefficient α_k(s) decays exponentially fast with s, while the first one, m_{a_j}, remains constant.
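This spectral picture can be confirmed numerically on the lazy path walk (an assumed reconstruction, consistent with the dominant left eigenvector (1, 2, ..., 2, 1) stated above): the dominant eigenvalue is 1, every other eigenvalue has magnitude strictly below 1, and iterating the walk leaves the first Fourier coefficient π^T v fixed while the deviation from it shrinks geometrically.

```python
import numpy as np

def lazy_path(m):
    """Lazy reversible walk on an m-node path (assumed reconstruction of P_j)."""
    P = np.zeros((m, m))
    for i in range(m):
        if i > 0:
            P[i, i - 1] = 0.5 if i == m - 1 else 0.25
        if i < m - 1:
            P[i, i + 1] = 0.5 if i == 0 else 0.25
        P[i, i] = 1.0 - P[i].sum()
    return P

m = 8
P = lazy_path(m)
mu = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
assert np.isclose(mu[0], 1.0) and mu[1] < 1.0       # all other modes decay

pi = np.array([0.5] + [1.0] * (m - 2) + [0.5]) / (m - 1)
v = np.arange(m, dtype=float)                        # arbitrary velocity profile
mean = pi @ v                                        # first Fourier coefficient
dev = [np.linalg.norm(np.linalg.matrix_power(P, s) @ v - mean)
       for s in (0, 20, 40)]
assert np.isclose(pi @ (P @ v), mean)                # first coefficient invariant
assert dev[0] > dev[1] > dev[2]                      # higher modes shrink
```

The invariance of π^T v under the walk is the discrete analogue of the train's mass center moving at constant speed.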

The spectral shift in action
We can check by direct calculation that (2.9) holds and, more generally, so does (2.10). The averaging operator P_j cannot increase the ℓ_∞-norm; therefore, the corresponding bound (2.11) holds for any j ≥ 1. The proof of the next result features the critically important cancellation of the stationary velocities m_{a_{j−1}} of the two subgroups at height j − 1.
Proof. The stationary distribution for the Markov chain P_{j−1} is normal to Q_{j−1}^{θ_{j−1}} v_{a_{j−1}}; hence, by (2.4, 2.10), the stated identity holds, where the subscripts 1 and 2^{j−1} index the coordinate being selected. By (2.11), therefore, ‖v_{a_{j−1}}‖_2 ≤ 2^{(j+1)/2} n^{−c}, and (2.8) applies.

The cancellation of the two copies of m_{a_{j−1}} in the computation of m_{a_j} has the effect of making that Fourier coefficient a linear combination of powers of higher eigenvalues. That part of the spectrum being exponentially decaying, the corresponding spectral shift implies a similar exponential decay in the next first Fourier coefficients up the tree T. By Lemma 2.2, this holds for any j > 1. We also note that θ_1 > n^c by (2.9); therefore, by (2.12), the bound holds for any j > 0, with θ_0 = 1. By (2.9), θ_1 > 2 and, for j > 1, θ_j ≥ 2^{θ_{j−1}}; hence θ_ν is at least a tower-of-twos of height ν, which proves Theorem 2.1, provided that the first spectral coordinates m_{a_j} never vanish. This is what we show in the next section. For future use, we state a weaker bound on m_{a_j}, which follows from Lemma 2.2 for j > 1. (2.14)
We now look more closely at the structure of F_j, going back to the spectral decomposition of P_j^{θ_j}. By (2.7), for j ≥ 1, we can expand P_j^{θ_j} in terms of the vectors u_{j,k} and w_{j,k}, where, for clarity, the subscript 1 indicates the dimension; for any j ≥ 1 and 1 ≤ k ≤ 2^j, with ε_{j,k} = 2 if k < 2^j and ε_{j,2^j} = 1. (2.18) Our algebraic approach requires bounds on eigenvalue gaps and on the Frobenius norm of Q_j^{θ_j}. Note that |µ_{j,k}| < 1 for all j ≥ 1 and k ≥ 2. We need much tighter bounds. Recall that n is assumed large enough, and define µ_{0,2} = 1 for notational convenience.

Lemma 2.3. For any j ≥ 1, both µ_{j,2}/µ_{j−1,2}^n and ‖Q_j^{θ_j}‖_F are less than e^{−n^{1.5}}; for j > 1 and k > 2, so is the ratio µ_{j,k}/µ_{j,2}.
The last inequality follows from the fact that n is large enough and that, for j > 1, (2.18) implies 2^{1−j} 3^{−θ_{j−1}} ≤ |µ_{j−1,2}| < 1. To bound the ratio |µ_{j,k}/µ_{j,2}| for j > 1 and k > 2, we begin with the case j = 2 and verify directly that e^{−n^3} is a valid upper bound. Assume now that j, k > 2. For all j ≥ 1, the claimed bound on ‖Q_j^{θ_j}‖_F follows from (2.17, 2.19), the submultiplicativity of the Frobenius norm, and the triangle inequality. For j > 1, we express F_j, the "folded" half of P_j^{θ_j}, by subtracting the lower half of u_{j,k} − (1/2) z_{j,k−1} from its upper half; it then follows from (2.6, 2.17) that the resulting expression holds for j > 1. To tackle the formidable product ∏_i F_i in (2.15), we begin with an approximation ∏_i G_i. Setting k = 2, we find that the formula extends to the case j = 1, so that, for any j ≥ 1, we can highlight the symmetry of u_{j,2} (2.23). For k = 2, we simplify ξ_l; expanding the product is then greatly simplified by observing that, by (2.23, 2.24), for any j ≥ 1, the inequalities (2.26) hold.
To prove the bounds on γ_j, we rely on (2.24) and elementary trigonometric estimates, from which the two inequalities in (2.26) follow readily. By (2.25), for j > 2, (2.27) holds. If we drop all sub/superscripts and expand the scalar expression above, we find a sum of 2^{j−2} words in which each letter is of the form µuw or 1z (suitably scaled). By (2.26), however, the only nonzero word is of a special form, and this necessitates distinguishing between even and odd values of j.
Case I (odd j > 2: Figure 3). It follows from (2.26) that the product takes the stated form, where (2.28) holds. One must verify separately that this also holds for the case j = 3, in which, by convention, the empty product ∏_i evaluates to 1 because the indices do not go down. Recall that, by (2.9, 2.24), w_{1,2} = (1, 1)^T and ‖v_{a_1}‖_2 = √5 n^{−c}. By Lemma 2.3 and the submultiplicativity of the Frobenius norm, the stated estimate follows; by (2.17), so does (2.30).

Case II (even j > 2: Figure 4). The analogous expansion holds, where, by (2.17), the corresponding quantities satisfy (2.33).
This concludes the case analysis. Next, we still assume that j > 2 but remove the restriction on parity.
Recall that G_i is only an approximation of F_i and, instead of (2.27), we must contend with (2.34).

Figure 5: The B-word in white is brought into canonical form (black jagged line) by setting all the indices k_i > 2 to 2. This cannot cause the magnitude of B to drop. We may also assume that the end result is not the A-word, as this would cause an exponential growth in line with the lemma.

By (2.18, 2.20), for all i > 1 and k ≥ 1, ‖u_{i,k}‖_2 ≤ 2^{i/2} and, for i, k ≥ 1, ‖w_{i,k}‖_2 ≤ 2^{i/2+O(1)}; so, by Cauchy-Schwarz, the corresponding bound holds for i > 2 and k, l ≥ 1. We prove that all B-words are much smaller than A in absolute value.
Lemma 2.5. All B-words distinct from A satisfy the following bound.

Proof. Since P_1 is stochastic, by (2.9), the upper bound (2.37) simplifies accordingly.

THEORY OF COMPUTING, Volume 10 (16), 2014, pp. 421–451

Figure 6: We trace the index vectors of the A- and B-words from left to right until they diverge (i = a). In this case, j is odd and the index vector of the B-word is (2, 1, 2, 1, 2, 2, 1, 2, 2).
To maximize the right-hand side of (2.38), by Lemma 2.3, we may replace any instance of k_i > 2 by k_i = 2 (Figure 5). This does not contradict conditions (2.36), since no index is set to 1. Note how important it is for this step that all vector quantities have been removed from (2.38). We may assume that the new B-word is not A, so its index vector is not of the form (2, 1, 2, 1, …). Indeed, if we ended up with this pattern, and hence with A, at least one index replacement would have taken place; by Lemma 2.3, any such replacement would cause an increase by a factor of at least e^{n^{1.5}}, and Lemma 2.5 would follow. So we may now assume that k_i ∈ {1, 2}. Scan the string (k_{j−1}, …, k_2) against (2, 1, 2, 1, …) from left to right and let k_a be the first character that differs (Figure 6). By (2.36), k_{j−1} = 2, so 2 ≤ a ≤ j − 2; hence j > 3; since we cannot have consecutive ones, k_a = 2 and j − a is even. By (2.35, 2.38) and Lemma 2.3, the desired bound follows. The first numerator mirrors the index vector of the B-word accurately. For the denominator, however, we use the lower bound of (2.35). The reason we can afford such a loose estimate is the presence of the factor µ_{a,2}, which plays the central role in the calculation by drowning out all the other differences. Here are the details. All the coefficients µ are less than 1; the claim then follows from Lemma 2.3, which proves Lemma 2.5.
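The left-to-right scan against the alternating pattern (2, 1, 2, 1, …) can be sketched as follows; the 0-based indexing is an illustrative convention, not the paper's:

```python
def first_divergence(ks):
    """Scan ks against the alternating pattern 2, 1, 2, 1, ... and return
    the 0-based position of the first mismatch, or None when ks follows
    the pattern throughout (the signature of the A-word)."""
    for pos, k in enumerate(ks):
        expected = 2 if pos % 2 == 0 else 1
        if k != expected:
            return pos
    return None

assert first_divergence([2, 1, 2, 1]) is None       # the A-word pattern
assert first_divergence([2, 1, 2, 2, 1, 2]) == 3    # diverges at position a
```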
There are fewer than n^{log n} B-words; so, by Lemma 2.5, their total contribution amounts to at most a fraction n^{log n} e^{−n^{1.2}} of |A|. In other words, by (2.34), the stated estimate holds for j > 2, and the proof of Lemma 2.4 follows from (2.9, 2.30, 2.32).
Recall the identity of (2.16) for j > 1. We know from (2.26, 2.28, 2.33) that neither α_j^{even} nor α_j^{odd} is null. By Lemma 2.4, it then follows that the first Fourier coefficient m_{a_j} never vanishes for j > 2. By (2.9), this is also the case for j = 1, 2, so the proof of Theorem 2.1 is complete.

A lower bound on the flocking time
Theorem 2.1 sets the grounds for a lower bound of a tower-of-twos of height log cn (for constant c) on the convergence time of a flock of birds. The idea is to lift the previous construction to a higher dimension and interpret railroad cars as birds and trains as flocks. While checking that the convergence time matches the characteristic time of the corresponding consensus system is easy, verifying the integrity of the flocks takes some effort.
The n birds all start from the X-axis and fly in the (X, Y)-plane, merging in twos, fours, eights, etc., in a pattern mirroring the tree T of the previous section (Figure 7). The horizontal motion slows down drastically as time goes on, which creates a flocking time equal to a tower-of-twos of height Θ(log n). The upper bound in [2] tolerates a small amount of noise in the model, which we can use to simplify our construction: specifically, any given flock is allowed a single velocity flip, meaning that the vector v_a associated with node a becomes −v_a. The stochastic matrix used for flocking (1.1) is of the form P_j, as defined in (2.2), when applied to a flock of 2^j birds. The projection of the birds on the X-axis corresponds to the railroad car system discussed in the introduction: the springs of length 1 match the distance threshold for the flocking network. Let x(t) denote the vector (x_1(t), …, x_n(t)) of bird positions from left to right, and write v(t) = x(t) − x(t − 1).
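The averaging rule underlying this construction is easy to simulate. The sketch below is a minimal one-dimensional version using uniform weights over all neighbors within a unit radius; the uniform weighting is an illustrative assumption, whereas the paper's actual update uses the specific stochastic matrix P_j of (2.2).

```python
def step(x, v, radius=1.0):
    """One synchronous step of the averaging rule: each bird replaces its
    velocity by the uniform average over all birds within `radius` of it
    (itself included), then every bird moves by its new velocity."""
    n = len(x)
    new_v = []
    for i in range(n):
        nbrs = [j for j in range(n) if abs(x[j] - x[i]) <= radius]
        new_v.append(sum(v[j] for j in nbrs) / len(nbrs))
    new_x = [x[i] + new_v[i] for i in range(n)]
    return new_x, new_v

# Two birds one unit apart see each other, so a single step equalizes
# their velocities; two birds three units apart do not interact at all.
```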
Let t_1 = 0 and t_j = t_{j−1} + θ_{j−1} for j > 1. The velocity vector satisfies v(t_1) = v_{a_1}, as set in (2.9), and v(1) = P_1 v_{a_1}; the general expression for t > 0 follows. Diagonalizing P_1 yields a closed form for P_1^s for any integer s > 0, from which the velocities follow for 0 = t_1 < t ≤ t_2. Let B_i denote the bird associated with the i-th leaf of T. Note that B_1 always stays to the left of B_2. Left to their own devices, the two birds would slide to the right at speed m_{a_1}, plus or minus an exponentially vanishing term; their distance would oscillate around 2/3 − (3/4) n^{−c} and converge exponentially fast, the oscillation being created by the negative eigenvalue. This is what happens until the flock at a_1 begins to interact with its "sibling" flock to the right, (B_3, B_4). The latter's velocity vector is (−n^{−c}, 0)^T at time t = 1. The stationary velocity of (B_3, B_4) is −m_{a_1} = −(1/2) n^{−c}, but the flock is not the mirror image of (B_1, B_2), a situation that would destroy the lower bound construction. In particular, the diameter of (B_3, B_4) always exceeds that of (B_1, B_2) for all t > 0.
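To see numerically how a negative eigenvalue produces an oscillating, exponentially converging gap, here is a minimal sketch. The 2 × 2 matrix below is illustrative only (row-stochastic, with eigenvalues 1 and −1/3); it is not the paper's P_1.

```python
# Illustrative 2x2 row-stochastic matrix with eigenvalues 1 and -1/3
# (an assumption for this sketch, NOT the paper's P_1): the negative
# eigenvalue makes the inter-bird gap overshoot and undershoot its
# limit on alternating steps while converging exponentially fast.
P = [[1/3, 2/3], [2/3, 1/3]]

def apply_chain(P, v):
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

x, v = [0.0, 1.0], [0.5, -0.5]
gaps = []
for _ in range(6):
    v = apply_chain(P, v)
    x = [x[i] + v[i] for i in range(2)]
    gaps.append(x[1] - x[0])
# the gap oscillates around its limit 5/4, with the deviation
# shrinking by the factor |lambda_2| = 1/3 at every step
```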
The diameters of both flocks oscillate around 2/3 but in phase opposition: indeed, their sum remains constant. The two two-bird flocks drift toward each other, at distance x_3(t) − x_2(t) = 4/3 − t n^{−c}. (The linearity in t is due to an accidental cancellation that will not occur for bigger flocks.) This implies that t_2 = t_1 + θ_1 = (1/3) n^c. The two flocks link up at time t_2. Two issues arise: which way does the four-bird flock move, and does it stay in one piece? Indeed, if the distance between B_2 and B_3 is too close to 1, can't it jump back up above 1 and cause the breakup of the flock? The issue of flock integrity deserves a separate treatment but, in this simple case, we can verify it directly. We begin with the motion of the four-bird flock. At time t_2, the flock at a_2 is formed with a known initial velocity; by (2.4), its stationary velocity follows, and Lemma 2.4 gives the corresponding estimate for j > 2. It is imperative that m_{a_j} > 0. If it is negative, therefore, we must flip the velocity at a_j instead of its sibling (the right child of its parent). The previous analysis holds as long as exactly one child "flips," irrespective of which one. Prior to flipping, the flocks of any two siblings are translated copies of each other.
We amend the flipping rule so that the left (resp. right) sibling flips if the flocks move left (resp. right), so as to set up a head-on collision. (We refer the reader to [2] for a detailed explanation of why flipping conforms to the noisy flocking model with the matching upper bound.) We now turn to the issue of flock integrity. Instead of negating one of the siblings' velocity vectors at time t_j, we wait an extra n^f steps, for some large enough constant f: the goal is to give the flock enough time to stabilize so it does not break apart because of the flip. By straightforward diagonalization, we obtain a closed form for the velocities for any integer s > 0. It follows from (3.2, 3.4) that, for t > t_2, both x_2(t) − x_1(t) and x_4(t) − x_3(t) are 2/3 ± O(n^{−c}); therefore, the two end edges of the four-bird flock are safe, which we define as being of length less than 1 (so as to belong to the flocking network) but greater than 1/2 (so as to avoid edges joining nonconsecutive birds). The middle edge, between B_2 and B_3, is more problematic. A direct verification of its length for all t > t_2, together with (3.5), shows that the distance between the two middle birds stays at 1 − Θ(n^{−c}), which is also 1 − |m_{a_1}|. This points to a general principle crucial to the integrity of the flocks: when two of them meet, the new edge's length stays away from 1 (the breaking point) at the next step by a margin roughly equal to the stationary velocity of the colliding flocks. This new stationary velocity is exponentially smaller than the previous one, so local vibrations are insufficient to make edges unsafe (either longer than 1 or shorter than 1/2). Because of the polynomial-time mixing of the Markov chains, delaying flips by n^f steps ensures that the individual velocities are roughly equal to the new stationary velocity, so changing their signs cannot jeopardize edge safety. We flesh out the details in the next lemma, which concludes the lower bound proof.
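The role of the n^f delay can be illustrated by a small sketch: run the averaging chain until every velocity is within ε of the common stationary velocity, then negate the vector. The ε-based stopping rule and the 2-state chain are assumptions for illustration; the paper instead waits a prescribed n^f steps, which suffices by polynomial-time mixing. The stationary velocity equals the plain average only because the example chain is doubly stochastic.

```python
def settle_then_flip(P, v, eps=1e-9, max_steps=10**6):
    """Run the averaging chain until every velocity is within eps of the
    common stationary velocity, then negate the whole vector (the "flip").
    Assumes P is doubly stochastic, so the stationary velocity is the
    plain average of the entries; returns (flipped vector, steps taken)."""
    n = len(v)
    for t in range(max_steps):
        m = sum(v) / n  # preserved by a doubly stochastic chain
        if all(abs(vi - m) <= eps for vi in v):
            return [-vi for vi in v], t
        v = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    raise RuntimeError("chain did not settle")

# Two-state illustrative chain with second eigenvalue -1/3: the
# deviation from the average shrinks by a factor of 3 per step, so the
# flip happens only once the velocities are essentially stationary.
P = [[1/3, 2/3], [2/3, 1/3]]
flipped, steps = settle_then_flip(P, [1.0, 0.0])
```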
Lemma 3.1. Any two adjacent birds within the same flock lie at a distance between 0.58 and 1. This holds over the entire lifetime of the flock, whether it flips or not.
Proof. For notational convenience, put m_{a_0} = (1/4) n^{−5} and define h(i) as the height of the nearest common ancestor of the two leaves associated with birds B_i and B_{i+1}; e.g., h(1) = 1 and h(2) = 2. We prove by induction on j that the bounds (3.10) hold for any 1 ≤ j < log n and t_j ≤ t ≤ t_{j+1}. Recall that a_0, a_1, etc., constitute the left spine of the merge tree T. By (2.14), the upper and lower bounds of (3.10) fall between 0.58 and 1, so satisfying them implies the integrity of the flocks along the spine: indeed, the upper bound ensures the existence of the desired edges, while the lower bound, being greater than 1/2, rules out edges between nonconsecutive birds. Before we proceed with the proof, we should explain why the upper bound of (3.10) distinguishes between two cases. In general, once two consecutive birds are joined in a flock, they stay forever at a distance strictly less than 1. There is only one exception to this rule: at the time t when they join, the only assurance we can give is that their distance does not exceed 1; it could actually be equal to 1, hence the difficulty of a nontrivial upper bound when t = t_j and i = 2^{j−1}. Relations (3.10) show that the time θ_j between the formation of the flock at a_j and its next collision at a_{j+1} is proportional to the reciprocal of the stationary velocity |m_{a_j}|; note that choosing c large enough makes the delay n^f inconsequential. This implies that the earlier setting of θ_j in (2.5) must now be understood up to a constant factor, i.e., as θ_j = Θ(|m_{a_j}|^{−1}); the previous analysis still holds. For the case j = 1, the claim follows, for 0 ≤ t ≤ t_2, from (3.2, 3.4). Assume now that j ≥ 2.
By applying successively (2.8, 2.11, 2.14), and then (2.10), we obtain the corresponding expressions. The ± sign leaves open the possibility of a flip of either type (left or right sibling) before the 2^{j−1}-bird flocks join at time t_j. As we saw earlier, the choice of type ensures that the flock with the lower-indexed birds drifts to the right while its sibling, with the higher-indexed birds, flies to the left; hence the certainty that, after flipping, the "fixed" part of the velocity vector v_{a_j} is of the form |m_{a_{j−1}}| (1, −1)^T ⊗ 1_{2^{j−1}}. (In fact, achieving this is the sole purpose of flipping.) For 1 ≤ i < 2^j, define DIST_t(B_i, B_{i+1}) accordingly, where f(s) = 1 if there is a flip and s > n^f, and f(s) = 0 otherwise. Note that there is no risk, in using DIST_t(B_i, B_{i+1}) instead of the signed version x_{i+1}(t) − x_i(t), that birds might cross unnoticed: indeed, the bound in (2.11) applies to all the velocities, so distances cannot change by more than O(n^{1−c}) in one step. This implies that a change of sign of x_{i+1}(t) − x_i(t) would be preceded by a drop of DIST_t(B_i, B_{i+1}) below 1/2 and hence a violation of (3.10). By Cauchy–Schwarz and (2.8, 3.11), 2^{j+1} e^{−Ω(s 4^{−j})} ‖ζ‖_2 ≤ O(n^2) e^{−Ω(s/n^2)} m_{a_{j−1}}; and, since n is assumed large enough, the stated bound holds for s ≥ 1. Likewise, e^{−Ω(s/n^2)} ‖v_{a_j}‖_2 ≤ n^{1.45} e^{−Ω(s/n^2)} m_{a_{j−1}}. For s ≥ 1 and 1 ≤ i < 2^j, by (3.11), |χ_i^T Q_j^s v_{a_j}| ≤ n^2 m_{a_{j−1}} e^{−Ω(s/n^2)}. (3.14) Recall that j ≥ 2. To prove (3.10), we distinguish between two cases, according to whether the birds B_i, B_{i+1} are joined at node a_j or earlier.
Case I (i = 2^{j−1}): The edge (i, i + 1) is created at node a_j and h(i) = j, where 2 ≤ j < log n (Figure 9). We begin with the case t = t_j. By construction, the upper bound in (3.10) is then equal to 1: since the two middle birds in the flock at a_j get attached in the flocking network at time t_j, DIST_{t_j}(B_i, B_{i+1}) ≤ 1. Assume that the flock at a_j does not undergo a flip. Then, by (3.11, 3.12, 3.13), the bound holds for t_j < t ≤ t_{j+1}, which proves the upper bound in (3.10) for i = 2^{j−1}. The negative geometric series we obtain from (3.16) reflects the "inertia" of the two flocks as they collide and penetrate each other's zone of influence before being stabilized. Suppose now that the flock at a_j undergoes a flip at time t_j + n^f. The previous analysis holds for t_j < t ≤ t_j + n^f; so assume that t_j + n^f < t ≤ t_{j+1}. By (3.14) and h(i) = j, n^2 m_{a_{h(i)−1}} e^{−Ω(s n^{−2} + n^{f−2})} = o(m_{a_{h(i)−1}}).
By (3.12), therefore, the same estimate follows. This establishes the upper bound in (3.10) for i = 2^{j−1}, whether there is a flip or not. The lower bound is proved along the same lines; note that the derivation still holds if the flock "flips," i.e., reverses the sign of Q_j^s v_{a_j}. This establishes (3.10) for i = 2^{j−1}.
Case II (i < 2^{j−1}): This implies that h(i) < j (Figure 10). Recall that j ≥ 2. We omit the case i > 2^{j−1}, which is treated similarly. The case t = t_j follows by induction, applied with j′ = j − 1 and t = t_{j′+1}. Note that t ≠ t_{j′}, so the inductive use of (3.10) does not take us to the upper bound of 1; it actually provides stronger bounds, since j′ < j. Assuming that t_j < t ≤ t_{j+1}, by (3.12, 3.14), the relevant sum is at most n^2 m_{a_{j−1}} ∑_{s≥1} e^{−Ω(s/n^2)} ≤ O(n^4) m_{a_{j−1}}.
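The closing estimate is nothing more than a geometric series; written out, with c > 0 standing for the constant hidden in the Ω and using e^x ≥ 1 + x:

```latex
\sum_{s\ge 1} e^{-cs/n^2}
   \;=\; \frac{e^{-c/n^2}}{1-e^{-c/n^2}}
   \;=\; \frac{1}{e^{c/n^2}-1}
   \;\le\; \frac{n^2}{c}.
```

Multiplying by the n^2 prefactor then gives the O(n^4) m_{a_{j−1}} bound.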

Figure 5: The top horizontal line represents k_i = 1. The white dots below the line correspond to k_i = 2. The B-word in white is brought into canonical form (black jagged line) by setting all the indices k_i > 2 to 2. This cannot cause the magnitude of B to drop. We may also assume that the end result is not the A-word, as this would cause an exponential growth in line with the lemma.

Figure 7: Birds join in flocks of size 2, 4, 8, etc., each time flying in a direction closer to the Y-axis. The angle decreases exponentially at each level. The big arrow indicates the Markov chain corresponding to a four-bird flock.