A Constructive Lovász Local Lemma for Permutations

Abstract: While there has been significant progress on algorithmic aspects of the Lovász Local Lemma (LLL) in recent years, a noteworthy exception is when the LLL is used in the context of random permutations. The breakthrough algorithm of Moser & Tardos only works in the setting of independent variables, and does not apply in this context. We resolve this by developing a randomized polynomial-time algorithm for such applications. A noteworthy application is for Latin transversals: the best general result known here (Bissacot et al., improving on Erdős and Spencer) states that any $n \times n$ matrix in which each entry appears at most $(27/256)n$ times has a Latin transversal. We present the first polynomial-time algorithm to construct such a transversal. We also develop RNC algorithms for Latin transversals, rainbow Hamiltonian cycles, strong chromatic number, and hypergraph packing. In addition to efficiently finding a configuration which avoids bad events, the algorithm of Moser & Tardos has many powerful extensions and properties. These include a well-characterized distribution on the output distribution, parallel algorithms, and a partial


Introduction
Recent years have seen substantial progress in developing algorithmic versions of the Lovász Local Lemma (LLL) and some of its generalizations, starting with the breakthrough by Moser & Tardos [31]; see, e.g., [16,18,25,32]. However, one major relative of the LLL that has eluded constructive versions is the "lopsided" version of the LLL (with the single exception of the CNF-SAT problem [31]). A natural setting for the lopsided LLL is where we have one or many random permutations [13,27,30]. This approach has been used for Latin transversals [9,13,36], hypergraph packing [28], graph coloring [10], and certain error-correcting codes [24]. However, current techniques do not give constructive versions in this context. We develop a randomized polynomial-time algorithm to construct such permutation(s) whose existence is guaranteed by the lopsided LLL, leading to several algorithmic applications in combinatorics. Furthermore, since the appearance of the conference version of this work [19], related papers, including [1,20,26], have been published; we make a comparison to these in Sections 1.2 and 6.3, detailing which of our contributions do not appear to follow from the frameworks of [1,20,26].

The Lopsided Lovász Local Lemma and random permutations
Suppose we want to select $N$ permutations $\pi_1, \dots, \pi_N$, where each $\pi_k$ is a permutation on the set $[n_k] = \{1, \dots, n_k\}$, which satisfy a given list of side constraints. The Lopsided Lovász Local Lemma (LLLL) can be used to prove that such permutations exist, under suitable conditions. To do so, we define the probability space $\Omega$, which is the uniform distribution on $S_{n_1} \times \cdots \times S_{n_N}$, i.e., each permutation $\pi_k$ is chosen independently and uniformly. For every constraint on the permutations, there is an associated "bad" event in the probability space $\Omega$ that the permutations violate the constraint. We then wish to show that there is positive probability that no bad event occurs, i.e., permutations exist satisfying the list of constraints.
We restrict our attention to a limited class of constraints, in which each bad event $B$ has the form
$$\pi_{k_1}(x_1) = y_1 \wedge \cdots \wedge \pi_{k_r}(x_r) = y_r$$
for some list of tuples $\{(k_1, x_1, y_1), \dots, (k_r, x_r, y_r)\}$. (More complex constraints can usually be decomposed into such conjunctions, so this does not lose much generality.) We frequently abuse notation to identify $B$ with the set of tuples describing it, so we write $B = \{(k_1, x_1, y_1), \dots, (k_r, x_r, y_r)\}$ and say that $B$ is true on $\pi$ if $\pi_{k_1}(x_1) = y_1 \wedge \cdots \wedge \pi_{k_r}(x_r) = y_r$. We will assume that no bad event contains two tuples $(k, x, y), (k, x, y')$ where $y \neq y'$, or two tuples $(k, x, y), (k, x', y)$ where $x \neq x'$; such a bad event would be impossible and could be ignored.
To apply the LLLL in this setting, we need to define a dependency graph with respect to these bad events. We connect two bad events $B, B'$ by an edge if they overlap in one slice of the domain or range, that is, if there are $k, x, y_1, y_2$ with $(k, x, y_1) \in B$, $(k, x, y_2) \in B'$ or there are $k, x_1, x_2, y$ with $(k, x_1, y) \in B$, $(k, x_2, y) \in B'$. We write this $B \sim B'$; note that $B \sim B$. The following notation will be useful: for pairs $(x_1, y_1), (x_2, y_2)$, we write $(x_1, y_1) \sim (x_2, y_2)$ if $x_1 = x_2$ or $y_1 = y_2$ (or both). Thus, another way to write $B \sim B'$ is that "there are $(k, x, y) \in B$, $(k, x', y') \in B'$ with $(x, y) \sim (x', y')$." At various points we use the notation $(k, x, *)$ to mean any (or all) triples of the form $(k, x, y)$, and similarly for $(k, *, y)$, or $(x, *)$ etc. Therefore, yet another way to write the condition $B \sim B'$ is that there are $(k, x, *) \in B$, $(k, x, *) \in B'$ or $(k, *, y) \in B$, $(k, *, y) \in B'$.
THEORY OF COMPUTING, Volume 13 (17), 2017, pp. 1-41
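As an illustration of the relation $\sim$, the following minimal Python sketch represents a bad event as a frozenset of triples $(k, x, y)$ and tests whether two events are adjacent in the dependency graph. The names `pairs_overlap`, `related`, and `is_true` are hypothetical helpers introduced here, and permutations are stored as 0-indexed lists; both are conventions of this sketch rather than of the paper.

```python
from typing import FrozenSet, List, Tuple

# A triple (k, x, y) encodes the atomic condition pi_k(x) = y.
Triple = Tuple[int, int, int]
BadEvent = FrozenSet[Triple]


def pairs_overlap(p: Tuple[int, int], q: Tuple[int, int]) -> bool:
    """(x1, y1) ~ (x2, y2) iff the pairs share a domain point or a range point."""
    return p[0] == q[0] or p[1] == q[1]


def related(b1: BadEvent, b2: BadEvent) -> bool:
    """B ~ B' iff some triples of B and B' use the same permutation k
    and their (x, y) pairs overlap. In particular B ~ B for nonempty B."""
    return any(
        k1 == k2 and pairs_overlap((x1, y1), (x2, y2))
        for (k1, x1, y1) in b1
        for (k2, x2, y2) in b2
    )


def is_true(event: BadEvent, perms: List[List[int]]) -> bool:
    """B is true on pi iff every atomic condition pi_k(x) = y of B holds."""
    return all(perms[k][x] == y for (k, x, y) in event)
```

Because the relation only compares triples belonging to the same permutation index, events on disjoint sets of permutations are never adjacent.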
With these definitions, one can show that in the space $\Omega$ the probability of avoiding a bad event $B$ can only be increased by avoiding other bad events $B' \not\sim B$ [28]. Thus, in the language of the lopsided LLL, the relation $\sim$ defines a negative-dependence graph among the bad events. (See [27,28,30] for a study of the connection between negative dependence, random injections/permutations, and the LLLL.) Hence, the standard LLLL criterion is as follows.
Theorem 1.1 ([28]). Suppose some function $x : \mathcal{B} \to (0, 1)$ satisfies, for every $B \in \mathcal{B}$, the condition
$$P_\Omega(B) \leq x(B) \prod_{B' \sim B,\ B' \neq B} (1 - x(B')).$$
Then the random process of selecting each $\pi_k$ uniformly at random and independently has a positive probability of selecting permutations that avoid all the bad events.
The "positive probability" of Theorem 1.1 is however typically exponentially small, as is standard for the LLL. As mentioned above, a variety of papers have used the framework of Theorem 1.1 to prove the existence of various combinatorial structures. Unfortunately, the algorithms for the LLL, such as Moser-Tardos resampling [31], do not apply in this setting. The problem is that such algorithms have a more restrictive notion of when two bad events are dependent, namely, that they share variables. (The Moser-Tardos algorithm allows for a restricted type of dependence called lopsidependence, wherein two bad events which share a variable but always agree on that value are counted as independent. This is not strong enough to generate permutations.) So we do not have an efficient algorithm to generate such permutations; we can merely show that they exist.
We develop an algorithmic analogue of the LLL for permutations. The necessary conditions for our Swapping Algorithm are the same as for the LLL (Theorem 1.1); however, we will construct such permutations in randomized polynomial (typically linear or near-linear) time. Our setting is far more complex than [31], and requires many intermediate results first. The main complication is that when we encounter a bad event involving "$\pi_k(x) = y$," and perform our algorithm's random swap associated with it, we could potentially change any entry of $\pi_k$. In contrast, when we resample a variable in [31,18], all the changes are confined to that variable. There is a further technical issue: the current witness-tree-based algorithmic versions of the LLL, such as [31,18], identify, for each bad event $B$ in the witness tree $\tau$, some necessary event occurring with probability at most $P_\Omega(B)$. This is not the proof we employ here; there are significant additional terms ("$(n_k - A^0_k)!/n!$"; see the proof of Lemma 3.1) that are gradually "discharged" over time.
We also develop RNC versions of our algorithms. Going from serial to parallel is fairly direct in [31]; our main bottleneck here is that when we resample an "independent" set of bad events, they could still influence each other.
(Note: we distinguish in this paper between the probability of events which occur in our algorithm, which we denote simply by P, and the probabilities of events within the space Ω, which we denote by P Ω .)

Comparison with other LLLL algorithms
Building on an earlier version of this article [19], several papers have developed generic frameworks for variations of the Moser-Tardos algorithm applied to other probability spaces. In [1], Achlioptas & Iliopoulos gave an algorithm which is based on a compression analysis for a random walk; this was improved for permutations and matchings by Kolmogorov [26]. In [20], Harvey & Vondrák gave a probabilistic analysis similar to the parallel Moser-Tardos algorithm. These frameworks both include the permutation LLL as well as some other combinatorial applications. These papers give much simpler proofs that the Swapping Algorithm terminates quickly.
The Moser-Tardos algorithm has many other powerful properties and extensions, beyond the fact that it efficiently finds a configuration avoiding bad events. These properties include a well-characterized output distribution at the end of the resampling process, a corresponding efficient parallel (RNC) algorithm, a partial-resampling variant (as developed in [18]), and an arbitrary (even adversarial) choice of which bad event to resample. All of these properties follow from the Witness Tree Lemma we show for our Swapping Algorithm. The more generalized LLLL frameworks of [1,20] have a limited ability to show such extensions.
We will discuss the relationship between this paper and the other LLLL frameworks further in Section 6.3. As one example of the power of our proof method, we develop a parallel Swapping Algorithm in Section 7; we emphasize that such a parallel algorithm cannot be shown using the results of [1] or [20]. A second example is provided by Theorem 8.2, which we do not see how to develop using the frameworks of [1,20,26].
One of the main goals of our paper is to provide a model for what properties a generalized LLLL algorithm should have. In our view, there has been significant progress toward this goal but there remain many missing pieces toward a true generalization of the Moser-Tardos algorithm. We will discuss this more in a concluding section, Section 9.

Applications
We present algorithmic applications for four classical combinatorial problems: Latin transversals, rainbow Hamiltonian cycles, strong chromatic number, and edge-disjoint hypergraph packing. In addition to the improved bounds, we wish to highlight that our algorithmic approach can go beyond Theorem 1.1. As we will see shortly, one of our asymptotically optimal algorithmic results on Latin transversals could not even have been shown non-constructively using the lopsided LLL prior to this work.
The study of Latin squares and the closely related Latin transversals is a classical area of combinatorics, going back to Euler and earlier [23]. Given an $m \times n$ matrix $A$ with $m \leq n$, a transversal of $A$ is a choice of $m$ elements from $A$, one from each row and at most one from any column. Perhaps the major open problem here is: given an integer $s$, under what conditions will $A$ have an $s$-transversal, that is, a transversal in which no value appears more than $s$ times [9,12,13,35,36]? The usual type of sufficient condition sought here is an upper bound $\Delta$ on the number of occurrences of any given value in $A$. Thus we ask: what is the maximum $\Delta$ such that any $m \times n$ matrix $A$ in which each value appears at most $\Delta$ times is guaranteed to have an $s$-transversal? We denote this quantity by $L(s; m, n)$.
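To make the definitions of a transversal, an $s$-transversal, and the occurrence bound $\Delta$ concrete, here is a small Python sketch. The function names `is_s_transversal` and `max_occurrences` are hypothetical, and a transversal is encoded (as an assumption of this sketch) by a list `cols` giving the 0-indexed column chosen in each row.

```python
from collections import Counter
from typing import List, Sequence


def is_s_transversal(A: List[List[int]], cols: Sequence[int], s: int) -> bool:
    """Check that `cols` picks one entry per row, at most one per column,
    and that no value is picked more than s times."""
    m, n = len(A), len(A[0])
    if len(cols) != m or len(set(cols)) != m:
        return False  # not one column per row, or a column is reused
    if any(not 0 <= c < n for c in cols):
        return False  # column index out of range
    counts = Counter(A[i][cols[i]] for i in range(m))
    return all(c <= s for c in counts.values())


def max_occurrences(A: List[List[int]]) -> int:
    """The smallest Delta such that every value appears at most Delta times in A."""
    return max(Counter(v for row in A for v in row).values())
```

A Latin transversal is the case `s = 1`; for the cyclic Latin square of order 3, the main diagonal is one, whereas a selection hitting the same value in every row is only a 3-transversal.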
The case $s = 1$ is perhaps most studied, and 1-transversals are also called Latin transversals. The case $m = n$ is also commonly studied (and includes Latin squares as a special case), and we will also focus on these. It is well-known that $L(1; n, n) \leq n - 1$ [35]. In perhaps the first application of the LLLL to random permutations, Erdős & Spencer essentially proved a result very similar to Theorem 1.1, and used it to show that $L(1; n, n) \geq n/(4e)$ [13]. (Their paper shows that $L(1; n, n) \geq n/16$; the $n/(4e)$ lower bound follows easily from their technique.) To our knowledge, this is the first $\Omega(n)$ lower bound on $L(1; n, n)$. Alon asked if there is a constructive version of this result [4]. Building on [13] and using the connections to the LLL from [33,34], Bissacot et al. showed non-constructively that $L(1; n, n) \geq (27/256)n$ [9]. Our result makes these results constructive.
The lopsided LLL has also been used to study the case $s > 1$ [36]. Here, we prove a result that is asymptotically optimal for large $s$, except for the lower-order $O(\sqrt{s})$ term: we show (algorithmically) that
$$L(s; n, n) \geq (s - O(\sqrt{s})) \cdot n.$$
An interesting fact is that this was not known even non-constructively before; Theorem 1.1 roughly gives $L(s; n, n) \geq (s/e) \cdot n$. We also give faster serial and perhaps the first RNC algorithms with good bounds, for the strong chromatic number. Strong coloring is quite well studied [5,8,14,21,22], and is in turn useful in covering a matrix with Latin transversals [7].

Outline
In Section 2 we introduce our Swapping Algorithm, a variant of the Moser-Tardos resampling algorithm. In it, we randomly select our initial permutations; as long as some bad event is currently true, we perform certain random swaps to randomize (or resample) them.
Section 3 introduces the key analytic tools to understand the behavior of the Swapping Algorithm, namely the witness tree and the witness subdag. The construction for witness trees follows [31]; it provides an explanation or history for the random choices used in each resampling. The witness subdag is a related concept, which is new here; it provides a history not for each resampling, but for each individual swapping operation performed during the resamplings.
In Section 4, we show how these witness subdags may be used to deduce partial information about the permutations. As the Swapping Algorithm proceeds in time, the witness subdags can also be considered to evolve over time. At each stage of this process, the current value of the witness subdags provides information about the current values of the permutations. In Section 5, we use this process to make probabilistic predictions for certain swaps made by the Swapping Algorithm. Whenever the witness subdags change, the swaps must be highly constrained so that the permutations still conform to them. We calculate the probability that the swaps satisfy these constraints.
Section 6 puts the analyses of Sections 3, 4, 5 together, to prove that our Swapping Algorithm terminates in polynomial time under the same conditions as those of Theorem 1.1; also, as mentioned in Section 1.2, Section 6.3 discusses certain contributions that our approach leads to that do not appear to follow from [1,20,26].
In Section 7, we introduce a parallel (RNC) algorithm corresponding to the Swapping Algorithm. This is similar in spirit to the Parallel Resampling Algorithm of Moser & Tardos. In the latter algorithm, one repeatedly selects a maximal independent set (MIS) of bad events which are currently true, and resamples them in parallel. In our setting, bad events which are "independent" in the LLL sense (that is, they are not connected via $\sim$), may still influence each other; a great deal of care must be taken to avoid these conflicts.
Section 8 describes a variety of combinatorial problems to which our Swapping Algorithm can be applied, including Latin transversals, strong chromatic number, and hypergraph packing. Finally, we conclude in Section 9 with a discussion of future goals for the construction of a generalized LLL algorithm.

The Swapping Algorithm
We will analyze the following Swapping Algorithm to find a satisfactory π 1 , . . ., π N .
1. Generate the permutations $\pi_1, \dots, \pi_N$ uniformly at random and independently.
2. While there is some true bad event:
3. Choose some true bad event $B \in \mathcal{B}$ arbitrarily. For each permutation that is involved in $B$, we perform a swapping of all the relevant entries. (We will describe the swapping subroutine "Swap" shortly.) We refer to this step as a resampling of the bad event $B$.
Each permutation involved in $B$ is swapped independently, but if $B$ involves multiple entries from a single permutation, then all such entries are swapped simultaneously. For example, if $B$ consisted of triples $(k_1, x_1, y_1), (k_2, x_2, y_2), (k_2, x_3, y_3)$, then we would perform Swap($\pi_{k_1}$; $x_1$) and Swap($\pi_{k_2}$; $x_2, x_3$), where the "Swap" procedure is given next.
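The swapping subroutine Swap($\pi$; $x_1, \dots, x_r$), for a permutation $\pi : [t] \to [t]$, repeats the following for $i = 1, \dots, r$:
• Select $x'_i$ uniformly at random among $[t] - \{x_1, \dots, x_{i-1}\}$.
• Swap entries $x_i$ and $x'_i$ of $\pi$.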
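The Swapping Algorithm and its Swap subroutine can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not from the paper: permutations are 0-indexed lists, each swap partner is drawn from the positions other than the earlier $x_1, \dots, x_{i-1}$ (the Fisher-Yates rule), the "arbitrary" true bad event is simply the first one found, and `max_steps` is a hypothetical safety cap. The helper `latin_bad_events` uses the usual encoding of Latin transversals (two equal entries in distinct rows and columns form a bad event), which is assumed here only to give a usage example.

```python
import random
from typing import Dict, FrozenSet, List, Sequence, Tuple

Triple = Tuple[int, int, int]      # (k, x, y) encodes pi_k(x) = y
BadEvent = FrozenSet[Triple]


def swap(pi: List[int], xs: Sequence[int], rng: random.Random) -> None:
    """Swap(pi; x_1, ..., x_r): for each x_i, pick a partner uniformly among
    positions other than x_1, ..., x_{i-1}, then exchange the two entries."""
    t = len(pi)
    for i, x in enumerate(xs):
        excluded = set(xs[:i])
        candidates = [z for z in range(t) if z not in excluded]
        x_prime = rng.choice(candidates)
        pi[x], pi[x_prime] = pi[x_prime], pi[x]


def swapping_algorithm(sizes: Sequence[int], bad_events: Sequence[BadEvent],
                       rng: random.Random, max_steps: int = 100_000):
    """Return (permutations, execution log) once no bad event is true."""
    perms = []
    for n in sizes:                       # Step 1: uniform independent permutations
        pi = list(range(n))
        rng.shuffle(pi)
        perms.append(pi)
    log: List[BadEvent] = []
    for _ in range(max_steps):            # Step 2: loop while some bad event is true
        true_events = [b for b in bad_events
                       if all(perms[k][x] == y for (k, x, y) in b)]
        if not true_events:
            return perms, log
        b = true_events[0]                # Step 3: arbitrary choice (here: the first)
        entries: Dict[int, List[int]] = {}
        for (k, x, _y) in sorted(b):      # group the entries of b by permutation
            entries.setdefault(k, []).append(x)
        for k, xs in entries.items():     # one simultaneous Swap per permutation
            swap(perms[k], xs, rng)
        log.append(b)
    raise RuntimeError("step budget exhausted before all bad events were avoided")


def latin_bad_events(A: List[List[int]]) -> List[BadEvent]:
    """Assumed encoding: equal entries in distinct rows and columns are a bad event."""
    n = len(A)
    cells = [(x, y) for x in range(n) for y in range(n)]
    return [frozenset({(0, x1, y1), (0, x2, y2)})
            for i, (x1, y1) in enumerate(cells)
            for (x2, y2) in cells[i + 1:]
            if x1 != x2 and y1 != y2 and A[x1][y1] == A[x2][y2]]
```

For a square matrix `A`, calling `swapping_algorithm([len(A)], latin_bad_events(A), random.Random(0))` returns a permutation `pi` for which the cells `(i, pi[i])` carry distinct values whenever the loop terminates within the step budget. The returned log is the execution log used in the witness-tree analysis.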
At every stage of this algorithm all the $\pi_k$ are permutations, and if this algorithm terminates, then the $\pi_k$ must avoid all the bad events. So our task will be to show that the algorithm terminates in polynomial time. We measure time in terms of a single iteration of the main loop of the Swapping Algorithm: each time we run one such iteration, we increment the time by one. We will use the notation $\pi^T_k$ to denote the value of permutation $\pi_k$ after time $T$. The initial sampling of the permutation (after Step (1)) generates $\pi^0_k$.
The swapping subroutine seems strange; it would appear more natural to allow $x'_i$ to be uniformly selected among $[t]$. However, the swapping subroutine is nothing more than the Fisher-Yates Shuffle for generating uniformly random permutations. If we allowed $x'_i$ to be chosen from $[t]$ then the resulting permutation would be biased. The goal is to change $\pi_k$ in a minimal way to ensure that $\pi_k(x_1), \dots, \pi_k(x_r)$ and $\pi_k^{-1}(y_1), \dots, \pi_k^{-1}(y_r)$ are adequately randomized.
There are alternative methods for generating random permutations, and many of these can replace the Swapping subroutine without changing our analysis. We discuss a variety of such equivalencies in Appendix A, which are used in various parts of our proofs. One class of algorithms that has a very different behavior is the commonly used method to generate random reals $r_i \in [0, 1]$, and then form the permutation by sorting these reals. When encountering a bad event, one would resample the affected reals $r_i$. In our setting, where the bad events are defined in terms of specific values of the permutation, this is not a good swapping method because a single swap can drastically change the permutation. When bad events are defined in terms of the relative rankings of the permutation (e.g., a bad event is $\pi(x_1) < \pi(x_2) < \pi(x_3)$), then this is a better method and can be analyzed in the framework of the ordinary Moser-Tardos algorithm.
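The remark about bias can be checked exactly for small $t$ by enumerating every sequence of swap partners, each of which is equally likely. The sketch below is an illustration only; `outcome_counts` is a hypothetical name, and positions are 0-indexed. With the restricted choice (partner not among the earlier positions) every permutation arises from exactly one sequence, while the unrestricted choice over all of $[t]$ yields $t^t$ sequences, which cannot be spread evenly over $t!$ permutations when $t = 3$.

```python
from collections import Counter
from itertools import product
from typing import Tuple


def outcome_counts(t: int, restricted: bool) -> "Counter[Tuple[int, ...]]":
    """Apply swaps at positions 0, ..., t-1 to the identity permutation for every
    possible sequence of swap partners, and count how often each result occurs."""
    xs = list(range(t))
    if restricted:
        # Partner for step i avoids the earlier positions xs[0], ..., xs[i-1].
        choice_sets = [[z for z in range(t) if z not in xs[:i]] for i in range(t)]
    else:
        # Naive variant: partner drawn from all t positions at every step.
        choice_sets = [list(range(t)) for _ in range(t)]
    counts: "Counter[Tuple[int, ...]]" = Counter()
    for partners in product(*choice_sets):
        pi = list(range(t))
        for x, x_prime in zip(xs, partners):
            pi[x], pi[x_prime] = pi[x_prime], pi[x]
        counts[tuple(pi)] += 1
    return counts
```

Comparing `outcome_counts(3, True)` with `outcome_counts(3, False)` shows the difference: the first assigns equal weight to all six permutations, the second does not.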

Witness trees and witness subdags
To analyze the Swapping Algorithm, following the Moser-Tardos approach [31], we introduce the concept of an execution log and a witness tree. The execution log consists of listing every resampled bad event, in the order that they are resampled. We form a witness tree to justify the resampling at time $t$. We start with the resampled bad event $B$ corresponding to time $t$, and create a single node in our tree labeled by this event. We move backward in time; for each bad event $B$ we encounter, we add it to the witness tree if $B \sim B'$ for some event $B'$ already in the tree. We choose such a $B'$ that has the maximum depth in the current tree (breaking ties arbitrarily), and make $B$ a child of this $B'$ (there could be many nodes labeled $B'$). If $B \not\sim B'$ for all $B'$ in the current tree, we ignore this $B$ and keep moving backward in time. To make this discussion simpler we say that the root of the tree is at the "top" and the deep layers of the tree are at the "bottom." The top of the tree corresponds to later events, the bottom of the tree to the earliest events.
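The backward construction of a witness tree from an execution log can be sketched as follows. This is a minimal illustration: the `Node` record, the function names, the 0-indexed times, and the tie-breaking rule (the first node of maximum depth) are assumptions of this sketch, and bad events are frozensets of triples $(k, x, y)$.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Sequence, Tuple

Triple = Tuple[int, int, int]
BadEvent = FrozenSet[Triple]


class Node(NamedTuple):
    event: BadEvent          # label of the node
    time: int                # position of the event in the execution log
    depth: int               # root has depth 0
    parent: Optional[int]    # index of the parent node, None for the root


def related(b1: BadEvent, b2: BadEvent) -> bool:
    """B ~ B': same permutation and a shared domain or range point."""
    return any(k1 == k2 and (x1 == x2 or y1 == y2)
               for (k1, x1, y1) in b1 for (k2, x2, y2) in b2)


def build_witness_tree(log: Sequence[BadEvent], t: int) -> List[Node]:
    """Witness tree justifying the resampling at time t of the execution log."""
    nodes = [Node(log[t], t, 0, None)]
    for s in range(t - 1, -1, -1):           # move backward in time
        b = log[s]
        candidates = [i for i, nd in enumerate(nodes) if related(b, nd.event)]
        if not candidates:
            continue                          # b is unrelated to the tree: ignore it
        # Attach below a related node of maximum depth (ties: first such node).
        parent = max(candidates, key=lambda i: nodes[i].depth)
        nodes.append(Node(b, s, nodes[parent].depth + 1, parent))
    return nodes
```

Because each new node is attached below a deepest related node, two nodes at the same depth are never related, which is the property used later when projecting the tree.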
We will use the term "witness tree" in two closely related senses in the following proof. First, when we run the Swapping Algorithm, we produce a witness tree $\hat\tau^T$; this is a random variable. Second, we might want to fix some labeled tree $\tau$, and discuss hypothetically under what conditions it could be produced or what properties it has; in this sense, $\tau$ is a specific object. We will always use the notation $\hat\tau^T$ to denote the specific witness tree produced by running the Swapping Algorithm, corresponding to resampling time $T$. We write $\hat\tau$ as shorthand for $\hat\tau^T$ where $T$ is understood from context (or irrelevant).
We say that a witness tree $\tau$ appears if $\hat\tau^T = \tau$ for some $T \geq 0$. The critical lemma that allows us to analyze the behavior of this algorithm is the following Witness Tree Lemma.
Lemma 3.1 (Witness Tree Lemma). Let $\tau$ be a witness tree, with nodes labeled $B_1, \dots, B_s$. Then
$$P(\tau \text{ appears}) \leq P_\Omega(B_1) \cdots P_\Omega(B_s).$$
Note that the probability of the event $B$ within the space $\Omega$ can be computed as follows: if $B$ contains $r_1, \dots, r_N$ elements from each of the permutations $1, \dots, N$ (and $B$ is not impossible), then
$$P_\Omega(B) = \frac{(n_1 - r_1)!}{n_1!} \cdots \frac{(n_N - r_N)!}{n_N!}.$$
This lemma is superficially similar to the corresponding lemma of Moser & Tardos [31]. However, the proof will be far more complex, and we will require many intermediate results first. The main complication is that when we encounter a bad event involving $\pi_k(x) = y$, and we perform the random swap associated with it, then we could potentially change any entry of $\pi_k$. By contrast, when the Moser-Tardos algorithm resamples a variable, all the changes are confined to that variable. However, as we will see, the witness tree will leave us with enough clues about which swap was actually performed that we will be able to narrow down the possible impact of the swap.
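As a sanity check on the probability of a bad event in $\Omega$, the sketch below evaluates the product $\prod_k (n_k - r_k)!/n_k!$ with exact rational arithmetic and compares it with brute-force enumeration over all tuples of permutations. The function names are hypothetical, indices are 0-based, and the event is assumed not to be impossible (no repeated domain or range point within one permutation).

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product
from math import factorial
from typing import FrozenSet, Sequence, Tuple

Triple = Tuple[int, int, int]      # (k, x, y) encodes pi_k(x) = y
BadEvent = FrozenSet[Triple]


def prob_omega(event: BadEvent, sizes: Sequence[int]) -> Fraction:
    """P_Omega(B) = prod_k (n_k - r_k)! / n_k!, where r_k counts the triples of B
    that involve permutation k. Assumes B is not an impossible event."""
    r = Counter(k for (k, _x, _y) in event)
    p = Fraction(1)
    for k, r_k in r.items():
        p *= Fraction(factorial(sizes[k] - r_k), factorial(sizes[k]))
    return p


def brute_force_prob(event: BadEvent, sizes: Sequence[int]) -> Fraction:
    """Exact probability of B by enumerating S_{n_1} x ... x S_{n_N}."""
    total = hits = 0
    for perms in product(*(permutations(range(n)) for n in sizes)):
        total += 1
        hits += all(perms[k][x] == y for (k, x, y) in event)
    return Fraction(hits, total)
```

The enumeration is only feasible for very small sizes, but it confirms the closed form on examples such as two permutations of three elements.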
The analysis in the next sections can be very complicated. We have two recommendations to make these proofs easier. First, the basic idea behind how to form and analyze these trees comes from [31]; the reader should consult that paper for results and examples which we omit here. Second, one can get most of the intuition behind these proofs by considering the situation in which there is a single permutation, and every bad event has the form $\pi(x_i) = y_i$. In this case, the witness subdags (defined later) are more or less equivalent to the witness tree. (The main point of the witness subdag concept is, in effect, to reduce bad events to their individual elements.) When reading the following proofs, it is a good idea to keep this special case in mind. In several places, we will discuss how certain results simplify in that setting.
The following proposition is the main reason the witness tree encodes sufficient information about the sequence of swaps.
Proposition 3.2. Suppose that at some time $t_0$ we have $\pi^{t_0}_k(X) \neq Y$, and at some later time $t_2 > t_0$ we have $\pi^{t_2}_k(X) = Y$. Then there must have occurred at some intermediate time $t_1$ some bad event including $(k, X, *)$ or $(k, *, Y)$.
Proof. Let $t_1 \in [t_0, t_2 - 1]$ denote the earliest time at which we had $\pi^{t_1 + 1}_k(X) = Y$; this must be due to encountering some bad event including the elements $(k, x_1, y_1), \dots, (k, x_r, y_r)$ (and possibly other elements from other permutations). Suppose that $\pi_k(X) = Y$ was first caused by swapping entry $x_i$, which at that time had $\pi_k(x_i) = y'_i$, with some $x'$, which at that time had $\pi_k(x') = y'$.
After this swap, we have $\pi_k(x_i) = y'$ and $\pi_k(x') = y'_i$. Evidently $x' = X$ or $x_i = X$. In the second case, the bad event at time $t_1$ included $(k, X, *)$ as desired and we are done. So suppose $x' = X$ and $y'_i = Y$. So at the time of the swap, we had $\pi_k(x_i) = Y$. The only earlier swaps in this resampling were with $x_1, \dots, x_{i-1}$; so at the beginning of time $t_1$, we must have had $\pi^{t_1}_k(x_j) = Y$ for some $j \leq i$. This implies that $y_j = Y$, so that the bad event at time $t_1$ included $(k, *, Y)$ as desired.
To explain some of the intuition behind Lemma 3.1, we note that Proposition 3.2 implies Lemma 3.1 for a singleton witness tree.
Corollary 3.3. Suppose that $\tau$ is a singleton node labeled by $B$. Then $P(\tau \text{ appears}) \leq P_\Omega(B)$.
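Proof. Suppose $\hat\tau^T = \tau$. We claim that $B$ must have been true of the initial configuration. For suppose that $(k, x, y) \in B$ but in the initial configuration we have $\pi_k(x) \neq y$. At some later point in time $t \leq T$, the event $B$ must become true. By Proposition 3.2, there is some time $t' < t$ at which we encounter a bad event $B'$ including $(k, x, *)$ or $(k, *, y)$. This bad event $B'$ occurs earlier than $B$, and $B' \sim B$. Hence, we would have placed $B'$ below $B$ in the witness tree $\hat\tau^T$, contradicting the assumption that $\tau$ is a singleton. So $B$ is true of the initial configuration, which is drawn from $\Omega$; this occurs with probability $P_\Omega(B)$.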
In proving Lemma 3.1, we will not need to analyze the interactions between the separate permutations, but rather we will be able to handle each permutation in a completely independent way. For a permutation $\pi_k$, we define the witness subdag for permutation $\pi_k$; this is a relative of the witness tree, but which only includes the information for a single permutation at a time.
Definition 3.4 (Witness subdags). For a permutation $\pi_k$, a witness subdag for $\pi_k$ is defined to be a directed acyclic simple graph, whose nodes are labeled with pairs of the form $(x, y)$. If a node $v$ is labeled by $(x, y)$, we write $v \approx (x, y)$. This graph must in addition satisfy the following conditions:
1. If any pair of nodes overlaps in a coordinate, that is, $v \approx (x, y) \sim (x', y') \approx v'$, then nodes $v, v'$ must be comparable (that is, either there is a path from $v$ to $v'$ or vice versa).
2. Every node of G has in-degree at most two and out-degree at most two.
We also may label the nodes with some auxiliary information, for example we will record that the nodes of a witness subdag correspond to bad events or nodes in a witness tree τ.
We refer to vertices close to the source nodes of $G$ (appearing earlier in time) as the "bottom" and vertices close to the sink nodes (appearing later in time) as the "top" of $G$.
The witness subdags that we will be interested in are derived from witness trees in the following manner.
Definition 3.5 (Projection of a witness tree). For a witness tree $\tau$, we define the projection of $\tau$ onto permutation $\pi_k$, which we denote $\mathrm{Proj}_k(\tau)$, as follows.
Consider a node $v \in \tau$ labeled by some bad event $B = \{(k_1, x_1, y_1), \dots, (k_r, x_r, y_r)\}$. For each $i$ with $k_i = k$, we create a corresponding node $v'_i \approx (x_i, y_i)$ in the graph $\mathrm{Proj}_k(\tau)$. We also include some auxiliary information indicating that these nodes came from bad event $B$, and in particular that all such nodes are part of the same bad event.
The edges of $\mathrm{Proj}_k(\tau)$ are formed as follows. For each node $v' \in \mathrm{Proj}_k(\tau)$, labeled by $(x, y)$ and corresponding to $v \in \tau$, we find the node $w_x \in \tau$ (if any) which satisfies the following conditions:
(P1) The depth of $w_x$ is smaller than the depth of $v$.
(P2) $w_x$ is labeled by some bad event $B'$ which contains $(k, x, *)$.
If this node $w_x \in \tau$ exists, then it corresponds to a node $w'_x \in \mathrm{Proj}_k(\tau)$ labeled $(k, x, *)$; we construct an edge from $v'$ to $w'_x$. Note that, since the levels of the witness tree are independent under $\sim$, there can be at most one such $w_x$ and at most one such $w'_x$.
We similarly define a node $w_y$ satisfying:
(P1') The depth of $w_y$ is smaller than the depth of $v$.
(P2') $w_y$ is labeled by some bad event $B'$ which contains $(k, *, y)$.
If this node exists, we create an edge from $v'$ to the corresponding $w'_y \in \mathrm{Proj}_k(\tau)$ labeled $(k, *, y)$.
Note that since edges in $\mathrm{Proj}_k(\tau)$ correspond to strictly smaller depth in $\tau$, the graph $\mathrm{Proj}_k(\tau)$ is acyclic. Also, note that it is possible that $w_x = w_y$; in this case we only add a single edge to $\mathrm{Proj}_k(\tau)$.
Expository remark. In the special case when each bad event contains a single element, the witness subdag is a "flattening" of the tree structure. Each node in the tree corresponds to a node in the witness subdag, and each node in the witness subdag points to the next highest occurrence of the domain and range variables.
Basically, the projection of $\tau$ onto $k$ tells us all of the swaps of $\pi_k$ that occur. It also gives us some of the temporal information about these swaps that would have been available from $\tau$. If there is a path from $v$ to $v'$ in $\mathrm{Proj}_k(\tau)$, then we know that the swap corresponding to $v$ must come before the swap corresponding to $v'$. It is possible that there are a pair of nodes in $\mathrm{Proj}_k(\tau)$ which are incomparable, yet in $\tau$ there was enough information to deduce which event came first (because the nodes would have been connected through some other permutation). So $\mathrm{Proj}_k(\tau)$ does discard some information from $\tau$, but it turns out that we will not need this information.
To prove Lemma 3.1, we will prove (almost) the following claim: Let $\tau$ be a witness tree whose nodes are labeled with bad events $B_1, \dots, B_s$. Then the probability that there is some $T > 0$ such that $\mathrm{Proj}_k(\hat\tau^T) = \mathrm{Proj}_k(\tau)$ is at most
$$P_k(B_1) \cdots P_k(B_s),$$
where, for a bad event $B$, we define $P_k(B)$ in a manner similar to $P_\Omega(B)$; namely, if the bad event $B$ contains $r_k$ elements from permutation $k$, then we define
$$P_k(B) = \frac{(n_k - r_k)!}{n_k!}.$$
Unfortunately, proving this directly runs into technical complications regarding the order of conditioning. It is simpler to just sidestep these issues. However, the reader should bear this in mind as the informal motivation for the analysis in Section 4.

The conditions on a permutation π k * over time
In Section 4, we will fix a value $k^*$, and we will describe conditions that $\pi^t_{k^*}$ must satisfy at various times $t$ during the execution of the Swapping Algorithm. In this section, we are only analyzing a single permutation $k^*$. To simplify notation, the dependence on $k^*$ will be hidden henceforth; we will discuss simply $\pi$, $\mathrm{Proj}(\tau)$, and so forth.
This analysis can be divided into three phases.
1. We define the future-subgraph at time $t$, denoted $G_t$. This is a kind of graph which encodes necessary conditions on $\pi^t$, in order for $\tau$ to appear, that is, for $\hat\tau^T = \tau$ for some $T > 0$. Importantly, these conditions, and $G_t$ itself, are independent of the precise value of $T$. We define and describe some structural properties of these graphs.
2. We analyze how a future-subgraph $G_t$ imposes conditions on the corresponding permutation $\pi^t$, and how these conditions change over time.
3. We compute the probability that the swapping satisfies these conditions.
We will prove 1. and 2. in Section 4. In Section 5 we will put this together to prove 3. for all the permutations.

The future-subgraph
Suppose we have fixed a target graph $G$, which could hypothetically have been produced as the projection of $\hat\tau^T$ onto $k^*$. We begin the execution of the Swapping Algorithm and see if, so far, it is still possible that $G = \mathrm{Proj}_{k^*}(\hat\tau^T)$, or if $G$ has been disqualified somehow. Suppose we are at time $t$ of this process; we will show that certain swaps must have already occurred at past times $t' < t$, and certain other swaps must occur at future times $t' > t$.
We define the future-subgraph of G at time t, denoted G t , which tells us all the future swaps that must occur.
Definition 4.1 (The future-subgraph). We define the future-subgraphs $G_t$ inductively. Initially $G_0 = G$. When we run the Swapping Algorithm, as we encounter a bad event $(k_1, x_1, y_1), \dots, (k_r, x_r, y_r)$ at time $t$, we form $G_{t+1}$ from $G_t$ as follows:
1. Suppose that $k_i = k^*$, and $G_t$ has a source labeled $(x_i, y')$ where $y' \neq y_i$ or $(x', y_i)$ where $x' \neq x_i$. Then, as will be shown in Proposition 4.2, we can immediately conclude $G$ is impossible; we set $G_{t+1} = \bot$, and we can abort the execution of the Swapping Algorithm.
2. Suppose that $G_t$ contains source nodes labeled $(k_i, x_i, y_i)$; then $G_{t+1}$ is obtained from $G_t$ by removing all such nodes.
3. Otherwise, we set $G_{t+1} = G_t$.
Proposition 4.2. For any time $t \geq 0$, let $\hat\tau^T_{\geq t}$ denote the witness tree built for the event at time $T$, but only using the execution log from time $t$ onwards. Then if $\mathrm{Proj}(\hat\tau^T) = G$ we also have $\mathrm{Proj}(\hat\tau^T_{\geq t}) = G_t$. Note that if $G_t = \bot$, the latter condition is obviously impossible; in this case, we are asserting that whenever $G_t = \bot$, it is impossible to have $\mathrm{Proj}(\hat\tau^T) = G$.
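Proof. We omit $T$ from the notation, as usual. We prove this by induction on $t$. When $t = 0$, this is obviously true as $\hat\tau_{\geq 0} = \hat\tau$ and $G_0 = G$. For the induction step, suppose that $\mathrm{Proj}(\hat\tau_{\geq t}) = G_t$, and let $B = \{(k_1, x_1, y_1), \dots, (k_r, x_r, y_r)\}$ denote the bad event resampled at time $t$.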
Suppose first that $\hat\tau_{\geq t+1}$ does not contain any bad events $B' \sim B$. Then, by our rule for building the witness tree, we have $\hat\tau_{\geq t} = \hat\tau_{\geq t+1}$. Hence $G_t = \mathrm{Proj}(\hat\tau_{\geq t+1})$. The graph $\mathrm{Proj}(\hat\tau_{\geq t+1})$ cannot have any source node labeled $(k, x, y)$ with $(x, y) \sim (x_i, y_i)$, as such a node would be labeled with $B' \sim B$. Hence, according to our rules for updating $G_t$, we have $G_{t+1} = G_t$. So in this case we have $\hat\tau_{\geq t} = \hat\tau_{\geq t+1}$ and $G_t = G_{t+1}$ and $\mathrm{Proj}(\hat\tau_{\geq t}) = G_t$; it follows that $\mathrm{Proj}(\hat\tau_{\geq t+1}) = G_{t+1}$ as desired.
Next, suppose $\hat\tau_{\geq t+1}$ does contain $B' \sim B$. Then bad event $B$ will be added to $\hat\tau_{\geq t}$, placed below any such $B'$. When we project $\hat\tau_{\geq t}$, then for each $i$ with $k_i = k^*$ we add a node $(x_i, y_i)$ to $\mathrm{Proj}(\hat\tau_{\geq t})$. Each such node is necessarily a source node; if such a node $(x_i, y_i)$ had a predecessor $(x'', y'') \sim (x_i, y_i)$, then the node $(x'', y'')$ would correspond to an event $B'' \sim B$ placed below $B$. Hence we see that $\mathrm{Proj}(\hat\tau_{\geq t})$ is obtained from $\mathrm{Proj}(\hat\tau_{\geq t+1})$ by adding source nodes for each $(k^*, x_i, y_i)$. By the inductive hypothesis, $G_t = \mathrm{Proj}(\hat\tau_{\geq t})$, so that $G_t = \mathrm{Proj}(\hat\tau_{\geq t+1})$ plus source nodes for each $(k^*, x_i, y_i)$. Now our rule for updating $G_{t+1}$ from $G_t$ is to remove all such source nodes, so it is clear that $G_{t+1} = \mathrm{Proj}(\hat\tau_{\geq t+1})$, as desired.
Note that in this proof, we assumed that $\mathrm{Proj}(\hat\tau) = G$, and we never encountered the case in which $G_{t+1} = \perp$. This confirms our claim that whenever $G_{t+1} = \perp$ it is impossible to have $\mathrm{Proj}(\hat\tau) = G$.

By Proposition 4.2, the witness subdag $G$ and the future-subgraphs $G_t$ have a similar shape; they are all produced by projecting witness trees of (possibly truncated) execution logs. Note that if $G = \mathrm{Proj}(\tau)$ for some tree $\tau$, then for any bad event $B \in \tau$, either $B$ is not represented in $G$, or all the pairs of the form $(k^*, x, y) \in B$ are represented in $G$ and are incomparable there.
The following structural decomposition of a witness subdag G will be critical.
Definition 4.3 (Alternating paths). Given a witness subdag $G$, we define an alternating path in $G$ to be a simple path which alternately proceeds forward and backward along the directed edges of $G$. For a vertex $v \in G$, the forward path of $v$ in $G$ is the maximal alternating path which includes $v$ and all the forward edges emanating from $v$. The backward path of $v$ is defined analogously. Because $G$ has in-degree and out-degree at most two, every vertex $v$ has a unique forward and backward path (up to reflection); this justifies our reference to "the" forward and backward path. These paths may be even-length cycles.
Note that if v is a source node, then its backward path contains just v itself.This is an important type of alternating path which should always be taken into account in our definitions.
One type of alternating path, which is referred to as the W-configuration, plays a particularly important role.
Definition 4.4 (The W-configuration). Suppose $v \approx (x, y)$ has in-degree at most one, and the backward path contains an even number of edges, terminating at vertex $v' \approx (x', y')$. We refer to this alternating path as a W-configuration. (See Figure 1.) Any W-configuration can be written (in one of its two orientations) as a path of vertices labeled $(x_0, y_1), (x_1, y_1), (x_1, y_2), \dots, (x_s, y_s), (x_s, y_{s+1})$; here the vertices $(x_1, y_1), \dots, (x_s, y_s)$ are at the "base" of the W-configuration. Note here that we have written the path so that the $x$-coordinate changes, then the $y$-coordinate, then $x$, and so on. When written this way, we refer to $(x_0, y_{s+1})$ as the endpoints of the W-configuration. If $v \approx (x, y)$ is a source node, then it defines a W-configuration with endpoints $(x, y)$. This should not be considered a triviality or degeneracy; rather, it will be the most important type of W-configuration.

The conditions on π t k * encoded by G t
At any time t, the future-subgraph G t gives certain necessary conditions on π in order for some putative τ to appear.Proposition 4.5 describes a certain set of conditions that plays a key role in the analysis.
Proposition 4.5. For a witness subdag $G$ and integers $t \leq T$, the following condition is necessary to have $G = \mathrm{Proj}(\hat\tau^T_{\geq t})$: for every W-configuration in $G$ with endpoints $(x_0, y_{s+1})$, we must have $\pi^t(x_0) = y_{s+1}$.

Proof. We prove this by induction on $s$. The base case is $s = 0$; in this case we have a source node $(x, y)$. Suppose $\pi^t(x) \neq y$. In order for $\hat\tau^T$ to contain some bad event containing $(k^*, x, y)$, we must at some point $t' > t$ have $\pi^{t'}(x) = y$; let $t'$ be the minimal such time. By Proposition 3.2, we must encounter a bad event containing $(k^*, x, *)$ or $(k^*, *, y)$ at some intervening time $t'' < t'$. If this bad event contains $(k^*, x, y)$ then necessarily $\pi^{t''}(x) = y$, contradicting minimality of $t'$. So there is a bad event $B$ containing either $(k^*, x, \neq y)$ or $(k^*, \neq x, y)$, earlier than the earliest occurrence of $\pi(x) = y$. This event $B$ corresponds to a source node $(x, \neq y)$ or $(\neq x, y)$ in $\mathrm{Proj}(\hat\tau^T_{\geq t})$. So $(x, y)$ cannot also be a source node of $G$.

We now prove the induction step. Consider a W-configuration with base $(x_1, y_1), \dots, (x_s, y_s)$, whose endpoints are vertices $v, v'$ labeled $(x_0, y_1)$ and $(x_s, y_{s+1})$, respectively.
At some future time t ≥ t we must encounter a bad event B involving some subset of the source nodes, say that B includes (x i 1 , y i 1 ), . . ., (x i r , y i r ) for 1 ≤ r ≤ s.As these were necessarily source nodes in Proj( τT ≥t ), we had π t (x i 1 ) = y i 1 , . . ., π t (x i r ) = y i r .After the swaps, these source nodes are removed and so the updated Proj( τT ≥t +1 ) has r + 1 new W-configurations, whose length is all smaller than s.By inductive hypothesis, the updated permutation π t +1 must then satisfy By Proposition A.2, we may suppose without loss of generality that the resampling of the bad event first swaps x i 1 , . . ., x i r in that order.Let π denote the result of these swaps; there may be additional swaps to other elements of the permutation, but we must have π t +1 (x i ) = π (x i ) for = 1, . . ., r. Evidently x i 1 swapped with x i 2 , then x i 2 swapped with x i 3 , and so on, until eventually x i r was swapped with x = (π t ) −1 y s+1 .At this point, we have π (x ) = y i 1 .Later swaps during time t may swap x with some other x, where (x, y) ∈ B. Thus, at time t + 1 we either have In the latter case, (x 0 , y) ∈ B. Thus implies that, when we encounter the bad event B at time t , there is a source node labeled (x 0 , y) ∈ Proj( τT ≥t ).This node (x 0 , y) would also occur in Proj( τT ≥t ).So (x 0 , y 1 ), (x 1 , y 1 ), . . ., (x s , y s+1 ) cannot be a W-configuration in Proj( τT ≥t ), although it is a W-configuration in G.
Thus, we conclude that x = x 0 . So (π t ) −1 y s = x = x 0 or equivalently π t (x 0 ) = y s . This in turn implies that π t (x 0 ) = y s+1 . For, by Proposition 3.2, otherwise we would have encountered a bad event involving (x 0 , * ) or ( * , y s+1 ); these would imply an additional in-neighbor of v or v′, respectively, which contradicts that it is part of a W-configuration of Proj( τT ≥t ).

We also define A t k to be the cardinality of Active(G t ), that is, the number of active conditions of permutation π k at time t. (The subscript k may be omitted in context, as usual.)

Lemma 4.7. Suppose G is a witness subdag which has source nodes v 1 ≈ (x 1 , y 1 ), . . ., v r ≈ (x r , y r ) (plus possibly some additional source nodes Then there is a set Z ⊆ {(x 1 , y 1 ), . . ., (x r , y r )} with the following properties:

1. There is an injective function f : Z → Active(H), with the property that (x, y) ∼ f ((x, y)) for all (x, y) ∈ Z .
Intuitively, we are saying that every node (x, y) we are removing is either explicitly constrained in an "independent way" by some new condition in the graph H (corresponding to Z), or it is almost totally unconstrained.
Expository remark.We have recommended bearing in mind the special case when each bad event consists of a single element.In this case, we would have r = 1; and the stated theorem would be that either We will recursively build up a set Z i and functions f i : Z i → Active(H i ), where Z i ⊆ {(x 1 , y 1 ), . . ., (x i , y i )}, and which satisfy the given conditions up to stage i.
We remove the source node $v_i$ from $H_{i-1}$. Observe that $(x_i, y_i) \in \mathrm{Active}(H_{i-1})$, but (unless there is some other vertex with the same label in $G$) $(x_i, y_i) \notin \mathrm{Active}(H_i)$. Thus, the most obvious change when we remove $v_i$ is that we destroy the active condition $(x_i, y_i)$. This may add or subtract other active conditions as well.
We will need to update $Z_{i-1}$ and $f_{i-1}$. Most importantly, $f_{i-1}$ may have mapped $(x_j, y_j)$, for $j < i$, to an active condition of $H_{i-1}$ which is destroyed when $v_i$ is removed. In this case, we must re-map this to a new active condition. Note that we cannot have $f_{i-1}(x_j, y_j) = (x_i, y_i)$ for $j < i$, as $x_i \neq x_j$ and $y_i \neq y_j$.
There are now a variety of cases depending on the forward path of v i in H i−1 .
1.This forward path consists of a cycle, or the forward path terminates on both sides in forward edges.This is the easiest case.Then no more active conditions of H i−1 are created or destroyed.We update 2. This forward path contains a forward edge on one side and a backward edge on the other.For example, suppose the path has the form (X 1 ,Y 1 ), (X 1 ,Y 2 ), (X 2 ,Y 2 ), . . ., (X s ,Y s+1 ), where the vertices (X 1 ,Y 1 ), . . ., (X s ,Y s ) are at the base, and the node (X 1 ,Y 1 ) has out-degree 1, and the node (X s ,Y s+1 ) has in-degree 1. Suppose that (x i , y i ) = (X j ,Y j ) for some j ∈ {1, . . ., s}. (See Figure 2.) In this THEORY OF COMPUTING, Volume 13 (17), 2017, pp.1-41 case, we do not destroy any W-configurations, but we create a new W-configuration with endpoints (X j ,Y s+1 ) = (x i ,Y s+1 ).
We now update $Z_i = Z_{i-1} \cup \{(x_i, y_i)\}$. We define $f_i = f_{i-1}$, plus we map $(x_i, y_i)$ to the new active condition $(x_i, Y_{s+1})$. In net, no active conditions were added or removed, and

Figure 2: When we remove $(X_2, Y_2)$, we create a new W-configuration with endpoints $(X_2, Y_5)$.
3. This forward path was a W-configuration (X 0 ,Y 1 ), (X 1 ,Y 1 ), . . ., (X s ,Y s ), (X s ,Y s+1 ) with the pairs (X 1 ,Y 1 ), . . ., (X s ,Y s ) on the base, and we had (x i , y i ) = (X j ,Y j ).This is the most complicated situation; in this case, we destroy the original W-configuration with endpoints (X 0 ,Y s+1 ) but create two new W-configurations with endpoints (X 0 ,Y j ) and (X j ,Y s+1 ).We update We will set f i = f i−1 , except for a few small changes as follows.
If $(f_{i-1})^{-1}(X_0, Y_{s+1}) = \emptyset$ then simply set $f_i(x_i, y_i) = (X_0, Y_j)$. Otherwise, we have $f_{i-1}(x_\ell, y_\ell) = (X_0, Y_{s+1})$ for some $\ell < i$; so either $x_\ell = X_0$ or $y_\ell = Y_{s+1}$. If it is the former, set

In any case, $f_i$ is updated appropriately, and in the net no active conditions are added or removed, so

The probability that the swaps are all successful
In the previous sections, we determined necessary conditions for the permutations $\pi^t_k$, depending on the graphs $G_{k,t}$. In this section, we finish by computing the probability that the swapping subroutine causes the permutations to, in fact, satisfy all such conditions.

Proposition 5.1 states the key randomness condition satisfied by the swapping subroutine. The basic intuition is as follows: suppose $\pi : [n] \to [n]$ is a fixed permutation with $\pi(x_\ell) = y_\ell$ for each $\ell \in [r]$, and $\pi' = \mathrm{Swap}(\pi; x_1, \dots, x_r)$. Then $\pi'(x_1)$ has a uniform distribution over $[n]$. Similarly, $\pi'^{-1}(y_1)$ has a uniform distribution over $[n]$. However, the joint distribution is not uniform: there is essentially only one degree of freedom for the two values. In general, any subset of the variables $\pi'(x_1), \dots, \pi'(x_r), \pi'^{-1}(y_1), \dots, \pi'^{-1}(y_r)$ will have the uniform distribution, as long as the subset does not simultaneously contain $\pi'(x_i)$ and $\pi'^{-1}(y_i)$ for some $i \in [r]$.
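The intuition above can be checked by exact enumeration on a small instance. The following Python sketch is illustrative only: the Swap subroutine is defined elsewhere in the paper, and the code assumes that step i picks a partner uniformly from [n] minus {x_1, ..., x_{i-1}} and exchanges the two entries. The names `swap_outcomes` and `marginal` are hypothetical helpers, and indices are 0-based.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def swap_outcomes(pi, xs):
    """Enumerate all outcomes of the ASSUMED Swap subroutine with exact weights.

    Assumption: step i swaps position xs[i] with a partner chosen uniformly
    from the positions not in xs[:i].
    """
    n = len(pi)
    choices = [[z for z in range(n) if z not in xs[:i]] for i in range(len(xs))]
    weight = Fraction(1)
    for c in choices:
        weight /= len(c)          # every sequence of partners is equally likely
    outcomes = Counter()
    for partners in product(*choices):
        p = list(pi)
        for x, z in zip(xs, partners):
            p[x], p[z] = p[z], p[x]
        outcomes[tuple(p)] += weight
    return outcomes


def marginal(outcomes, stat):
    """Distribution of stat(p) when p is drawn from `outcomes`."""
    dist = Counter()
    for p, w in outcomes.items():
        dist[stat(p)] += w
    return dict(dist)


pi = [2, 0, 3, 1, 4]           # an arbitrary fixed permutation of {0,...,4}
xs = [0, 1]                    # x_1, x_2
ys = [pi[x] for x in xs]       # y_l = pi(x_l)
out = swap_outcomes(pi, xs)

image_x1 = marginal(out, lambda p: p[xs[0]])              # pi'(x_1)
preimage_y1 = marginal(out, lambda p: p.index(ys[0]))     # pi'^{-1}(y_1)
joint = marginal(out, lambda p: (p[xs[0]], p.index(ys[0])))
```

On this instance each of the two marginals puts mass 1/5 on every value, while the joint distribution is supported on fewer than 25 pairs, so it cannot be uniform.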
Consider a list (x 1 , y 1 ), . . ., (x q , y q ) satisfying the following properties: 3. All x are distinct; all y are distinct.
Then we have the bound:

Expository remark. Consider the special case when each bad event contains a single element. In that case, we only need to use this result for $r = 1$. There are two possibilities for $s$: either $s = 0$, in which case the probability on the right is $1 - q/n$ (i.e., the probability that $\pi'(x_1) \neq y'_1, \dots, y'_q$); or $s = 1$, in which case this probability is $1/n$ (i.e., the probability that $\pi'(x_1) = y'_1$).
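The two values in the expository remark can be reproduced by brute force for $r = 1$. The sketch below is a hedged illustration, not the statement of Proposition 5.1: it assumes that a single swap exchanges position $x_1$ with a uniformly random position, that for $s = 0$ the $q$ target conditions already hold in $\pi$ and do not involve $x_1$ or $\pi(x_1)$, and that for $s = 1$ the first condition shares its $x$-coordinate with $x_1$. The function names are hypothetical.

```python
from fractions import Fraction


def swap_once(pi, x, z):
    """Return a copy of pi with the entries at positions x and z exchanged."""
    out = list(pi)
    out[x], out[z] = out[z], out[x]
    return out


def prob_conditions_hold(pi, x1, conditions):
    """Exact probability that pi'(a) == b for all (a, b) in `conditions`,
    where pi' swaps position x1 with a uniformly random position (assumed r = 1)."""
    n = len(pi)
    good = sum(
        1
        for z in range(n)
        if all(swap_once(pi, x1, z)[a] == b for a, b in conditions)
    )
    return Fraction(good, n)


n = 6
pi = list(range(n))            # identity permutation, pi(x) = x
x1 = 0

# s = 0, q = 2: both conditions already hold and avoid x1 and pi(x1).
p_s0 = prob_conditions_hold(pi, x1, [(1, 1), (2, 2)])

# s = 1, q = 2: the first condition demands a new value at position x1.
p_s1 = prob_conditions_hold(pi, x1, [(0, 3), (2, 2)])
```

Here `p_s0` equals $1 - q/n = 2/3$ and `p_s1` equals $1/n = 1/6$, matching the remark.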
Proof.Define the function We will prove this proposition by induction on s, r, considering a number of separate cases.
1. Suppose s > 0 and x 1 = x 1 .Then, in order to satisfy the desired conditions, we must swap x 1 to x = π −1 (y 1 ); this occurs with probability 1/n.The subsequent r − 1 swaps starting with the permutation π(x 1 x ) must now satisfy the conditions π (x 2 ) = y 2 , . . ., π (x q ) = y q .We claim that (x i , π(x 1 x )x i ) ∼ (x i , y i ) for i = 2, . . ., s.If x = x 2 , . . ., x s , this is immediate.Otherwise, suppose x = x j .If x j = x j , then we again still have (x j , π(x 1 x )x j ) ∼ (x j , y j ).If y j = y j , then this implies that y 1 = y j = y j , which contradicts that the y j = y 1 .
So we apply the induction hypothesis to π(x 1 x ); in the induction, we subtract one from n, q, r, s.This gives as desired.
2. Similarly, suppose s > 0 and suppose y 1 = y 1 .By Proposition A.3, we would obtain the same distribution if we executed (π ) −1 = Swap(π −1 ; y 1 , . . ., y r ), so Now, the right-hand side has swapped the roles of x 1 /y 1 ; in particular, it now falls under the previous case (1) already proved, and so the right-hand side is at most g(n, r, s, q) as desired.
3. Suppose $s = 0$ and there are indices $i \in [r]$, $j \in [q]$ with $(x_i, y_i) \sim (x'_j, y'_j)$. By Proposition A.2, we can assume without loss of generality that $(x_1, y_1) \sim (x'_1, y'_1)$. So, in this case, we are really in the case with $s = 1$. This is covered by case (1) or case (2), which we have already shown. Thus, we have

Here, we are using our hypothesis that $n \geq q + (r - s) = q + r$.
We apply Proposition 5.1 to upper-bound the probability that the Swapping Algorithm successfully swaps when it encounters a bad event.
Proposition 5.2.Suppose we encounter a bad event B at time t containing elements (k, x 1 , y 1 ), . . ., (k, x r , y r ) from permutation k (and perhaps other elements from other permutations).Then the probability that π t+1 k satisfies all the active conditions of its future-subgraph, conditional on all past events and all other swappings at time t, is at most .
Recall that we have defined A t k = | Active(G k,t )| and we have defined Expository remark.Consider the special case when each bad event consists of a single element.In this case, P k (B) = 1/n, and the stated theorem is now: either A t+1 = A t , in which case the probability that π satisfies its swapping condition is 1/n; or A t+1 = A t − 1; in which case the probability that π satisfies its swapping condition is 1 − A t+1 /n.
For each (x, y) ∈ Z, we have y = π t k (x), and there is an injective function f : Z → Active(H) and (x, y) ∼ f ((x, y)).By Proposition A.2, we can assume without loss of generality Z = {(x 1 , y 1 ), . . ., (x s , y s )} and f (x i , y i ) = (x i , y i ).In order to satisfy the active conditions on G k,t+1 , the swapping must cause π t+1 k (x i ) = y i for i = 1, . . ., q.
By Lemma 4.7, we have A t k = A t+1 k + (r − s) = q + (r − s).Since A t k ≤ n, all the conditions of Proposition 5.1 are satisfied.Thus this probability is at most .
We finally have all the pieces necessary to prove Lemma 3.1.
Lemma 3.1. Let $\tau$ be a witness tree, with nodes labeled $B_1, \dots, B_s$. Then $P(\tau \text{ appears}) \leq P_\Omega(B_1) \cdots P_\Omega(B_s)$.

Proof. The Swapping Algorithm, as we have defined it, begins by selecting the permutations uniformly at random. One may also consider fixing the permutations to some arbitrary (not random) value, and allowing the Swapping Algorithm to execute from that point onward. We refer to this as starting at an arbitrary state of the Swapping Algorithm. We will prove the following by induction on $\tau$: the probability, starting at an arbitrary state of the Swapping Algorithm, that the subsequent swaps cause subtree $\tau$ to appear, is at most

When $\tau = \emptyset$, the RHS of (5.1) is equal to one so this is vacuously true. To show the induction step, note that in order for $\tau$ to appear, it must be that some $B$ is resampled, where some node $v \in \tau$ is labeled by $B$. Suppose we condition on the event that $v$ is the first such node, resampled at time $t$. A necessary condition to have $\hat\tau^T = \tau$ for some $T \geq t$ is that each $\pi^{t+1}_k$ satisfies all the active conditions on $G_{k,t+1}$. By Proposition 5.2, this has probability at most

Next, if this event occurs, then subsequent resamplings must cause $\hat\tau^T_{\geq t+1} = \tau - v$. We bound this probability using the induction hypothesis. Note that the induction hypothesis gives a bound conditional on any starting configuration of the Swapping Algorithm, so we may multiply these probabilities to get

completing the induction argument.
We now consider the necessary conditions to produce the entire witness tree $\tau$, and not just fragments of it. First, the original permutations $\pi^0_k$ must satisfy the active conditions of the respective witness subdags $\mathrm{Proj}_k(\tau)$. For each permutation $k$, this occurs with probability

Next, the subsequent sampling must be compatible with $\tau$; by (5.1) this has probability at most

Again, note that the bound in (5.1) is conditional on any starting position of the Swapping Algorithm, hence we may multiply these probabilities. In total we have

We note a counterintuitive aspect to this proof. The natural way of proving this lemma would be to identify, for each bad event $B \in \tau$, some necessary event occurring with probability at most $P_\Omega(B)$. This is the general strategy in Moser & Tardos [31] and related constructive LLL variants such as [18], [1], [20]. This is not the proof we employ here; there is an additional factor of $(n_k - A^0_k)!/n!$ which is present for the original permutation and is gradually "discharged" as active conditions disappear from the future-subgraphs.

The constructive LLL for permutations
Now that we have proved the Witness Tree Lemma, the remainder of the analysis is essentially the same as for the Moser-Tardos algorithm [31]. Using arguments and proofs from [31] with our key lemma, we can now easily show our key theorem:

Theorem 6.1. Suppose some function $x : \mathcal{B} \to (0, 1)$ satisfies, for every $B \in \mathcal{B}$, the condition

Then the Swapping Algorithm terminates with probability one. The expected number of iterations in which we resample $B$ is at most $x(B)/(1 - x(B))$.
In the "symmetric" case, this gives us the well-known LLL criterion:

Corollary 6.2. Suppose each bad event $B \in \mathcal{B}$ has probability at most $p$, and is dependent with at most $d$ bad events. Then if $ep(d + 1) \leq 1$, the Swapping Algorithm terminates with probability one; the expected number of resamplings of each bad event is $O(1)$.

Some extensions of the LLL, such as the Moser-Tardos distribution bounds shown in [16], the cluster-expansion LLL criterion [9] and its connection to the Moser-Tardos algorithm [32], or the partial resampling of [18], follow almost immediately here. There are a few extensions which require slightly more discussion.

Lopsidependence
As in [31], it is possible to slightly restrict the notion of dependence. We can redefine the relation $\sim$ on bad events by setting $B \sim B'$ iff
1. $B = B'$, or
2. there is some $(k, x, y) \in B$, $(k, x', y') \in B'$ with either $x = x'$, $y \neq y'$ or $x \neq x'$, $y = y'$.
In particular, bad events which share the same triple (k, x, y), are not caused to be dependent.
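As a minimal sketch of this restricted relation, the following Python function represents a bad event as a frozenset of triples $(k, x, y)$ and reads condition 2 as "same permutation index, and exactly one of the two coordinates agrees". The representation and the name `lopsidependent` are assumptions made for illustration.

```python
def lopsidependent(B, Bp):
    """Return True iff B ~ B' under the restricted (lopsided) relation.

    B and Bp are frozensets of triples (k, x, y), an assumed encoding of
    bad events. Two distinct events are related when some pair of their
    triples lies in the same permutation k and agrees in exactly one of
    the coordinates x, y.
    """
    if B == Bp:
        return True
    for (k, x, y) in B:
        for (kp, xp, yp) in Bp:
            if k != kp:
                continue
            # Exactly one coordinate agrees: the two triples conflict.
            if (x == xp) != (y == yp):
                return True
    return False


A = frozenset({(0, 1, 1)})
B_conflict = frozenset({(0, 1, 2)})            # same x, different y
C_shared = frozenset({(0, 1, 1), (0, 2, 2)})   # shares the triple (0, 1, 1) with A
D_other = frozenset({(1, 1, 2)})               # lives in a different permutation
```

With these inputs, `A` is related to itself and to `B_conflict`, but not to `C_shared` (sharing an identical triple does not create dependence) and not to `D_other`.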
Proving that the Swapping Algorithm still works in this setting requires only a slight change in our definition of Proj k (τ).Now, the tree τ may have multiple copies of any given triple (k, x, y) on a single level.When this occurs, we create the corresponding nodes v ≈ (x, y) ∈ Proj k (τ); edges are added between such nodes in an arbitrary (but consistent) way.The remainder of the proof remains as before.

LLL for injective functions
The analysis of [28] considers a slightly more general setting for the LLL, in which we select random injections $f_k : [m_k] \to [n_k]$, where $m_k \leq n_k$. In fact, our Swapping Algorithm can be extended to this case. We simply define a permutation $\pi_k$ on $[n_k]$, where the entries $\pi_k(m_k + 1), \dots, \pi_k(n_k)$ are "dummies" which do not participate in any bad events. The LLL criterion for the extended permutation $\pi_k$ is exactly the same as the corresponding LLL criterion for the injection $f_k$. Because all of the dummy entries have the same behavior, it is not necessary for the Swapping Algorithm to keep track of the dummy entries exactly; they are needed only for the analysis.
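The padding construction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: injections and permutations are stored as 0-indexed lists, and the helper names are hypothetical.

```python
import random


def extend_to_permutation(f, n):
    """Pad an injection f: [m] -> [n] (a list of distinct values in range(n))
    with dummy entries so that the result is a permutation of range(n)."""
    used = set(f)
    if len(used) != len(f) or not used <= set(range(n)):
        raise ValueError("f must be an injection into range(n)")
    dummies = [v for v in range(n) if v not in used]
    return list(f) + dummies


def restrict_to_injection(pi, m):
    """Recover the injection by dropping the dummy positions m, ..., n-1."""
    return list(pi[:m])


def random_injection(m, n, rng=random):
    """Sample a uniform injection as the first m entries of a uniform permutation."""
    return restrict_to_injection(rng.sample(range(n), n), m)


f = [3, 0]                               # an injection from {0, 1} into {0, ..., 4}
pi = extend_to_permutation(f, 5)         # positions 2, 3, 4 hold dummy entries
g = random_injection(2, 5, random.Random(0))
```

Only the first $m$ positions of the padded list can appear in bad events, which is why an implementation does not need to track the order of the dummy entries.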

Comparison with the approaches of Achlioptas & Iliopoulos and Harvey & Vondrák
Achlioptas & Iliopoulos [1] and Harvey & Vondrák [20] gave generic frameworks for analyzing variants of the Moser-Tardos algorithm, applicable to different types of combinatorial configurations.These frameworks can include vertex colorings, permutations, Hamiltonian cycles of graphs, spanning trees, matchings, and other settings.For the case of permutations, both of these frameworks give a version of the Swapping Algorithm and show that it terminates under the same conditions as we do, which in turn are the same conditions as the LLL (Theorem 1.1).
The key difference between our approach and [1,20] is that they enumerate the entire history of all resamplings to the permutations. In contrast, our proof is based on the Witness Tree Lemma; this is a much more succinct structure that ignores most of the resamplings, and only enumerates the few resamplings that are necessary to justify a single item in the execution log. Their proofs are much simpler than ours; a major part of the complexity of our proof lies in the need to argue that the bad events which were ignored by the witness tree do not affect the probabilities. (The ignored bad events do interact with the variables we need to track for the witness tree, but do so in a "neutral" way.)

If our only goal is to prove that the Swapping Algorithm terminates in polynomial time, then the other two frameworks give a better and simpler approach. However, the Witness Tree Lemma allows much more precise estimates for many types of events. The main reason for this precision is the following: suppose we want to show that some event E has a low probability of occurring during or after the execution of the Swapping Algorithm. The proof strategy of Moser & Tardos is to take a union bound over all witness trees that correspond to this event. In this case, we show a probability bound which is proportional to the total weight of all such witness trees. This can be a relatively small number as only the witness trees connected to E are relevant. Our analysis, which is also based on witness trees, is able to show similar types of bounds.
However, the analysis of Achlioptas & Iliopoulos and Harvey & Vondrák is not based on witness trees, but the much larger set of full execution logs.The number of possible execution logs can be exponentially larger than the number of witness trees.It is very inefficient to take a union bound over all such logs.Hence, Achlioptas & Iliopoulos and Harvey & Vondrák give bounds which are exponentially weaker (in a certain technical sense) than the ones we provide.
Many properties of the Swapping Algorithm depend on the fine degree of control provided by the Witness Tree Lemma, and it seems difficult to obtain them from the alternative LLLL approaches. We list a few of these properties here.

The LLL criterion without slack. As a simple example of the problems caused by taking a union bound over execution logs, suppose that we satisfy the LLL criterion without slack, say $ep(d + 1) = 1$; here, as usual, $p$ and $d$ are bounds, respectively, on the probability of any bad event and the degree of any bad event in the dependency graph. In this case, we show that the expected time for our Swapping Algorithm to terminate is $O(m)$. In contrast, Achlioptas & Iliopoulos and Harvey & Vondrák require satisfying the LLL criterion with slack, $ep(1 + \varepsilon)(d + 1) = 1$, and achieve a termination time of $O(m/\varepsilon)$. They require this slack term in order to damp the exponential growth in the number of execution logs. (Harvey & Vondrák show that if the symmetric LLL criterion is satisfied without slack, then the Shearer criterion [34] is satisfied with slack $\varepsilon = \Omega(1/m)$. Thus, they would achieve a running time of $O(m^2)$ without slack.)
Arbitrary choice of which bad event to resample. The Swapping Algorithm as we have stated it is actually underdetermined, in that the choice of which bad event to resample is arbitrary. In contrast, in both Achlioptas & Iliopoulos and Harvey & Vondrák, there is a fixed priority on the bad events. (Kolmogorov [26] has shown that this restriction can be removed in certain special cases of the Achlioptas & Iliopoulos setting, including for random permutations and matchings.) This freedom can be quite useful. For example, in Section 7 we consider a parallel implementation of our Swapping Algorithm. We will select which bad events to resample in a quite complicated and randomized way. However, the correctness of the parallel algorithm will follow from the fact that it simulates some serial implementation of the Swapping Algorithm.

The Moser-Tardos distribution. The Witness Tree Lemma allows us to analyze the so-called "Moser-Tardos (MT) distribution," first discussed by [16]. The LLL and its algorithms ensure that the bad events cannot possibly occur. In other words, we know that the configuration produced by the LLL has the property that no $B \in \mathcal{B}$ is true. In many applications of the LLL, we may wish to know more about such configurations, beyond the fact that they exist.
We give two examples where this is useful for the ordinary, variable-based LLL. First, suppose that we have some weights for the values of our variables, and we define the objective function on a solution as $\sum_i w(X_i)$; in this case, if we are able to estimate the probability that a variable $X_i$ takes on value $j$ in the output of the LLL (or Moser-Tardos algorithm), then we may be able to show that configurations with a good objective function exist. A second example is when the number of bad events becomes too large, perhaps exponentially large. In this case, the Moser-Tardos algorithm cannot test them all. However, we may still be able to ignore a subset of the bad events, and argue that the probability that they are true at the end of the Moser-Tardos algorithm is small even though they were never checked.
The Witness Tree Lemma gives us an extremely powerful result concerning this MT distribution, which carries over to the Swapping Algorithm.
Then the probability that E is true in the output of the Swapping Algorithm, is at most Proof.See [16] for the proof of this for the ordinary MT algorithm; the extension to the Swapping Algorithm is straightforward.
Bounds on the depth of the resampling process.One key requirement for parallel variants of the Moser-Tardos algorithm appears to be that the resampling process has logarithmic depth.This is equivalent to showing that there are no deep witness trees.This follows easily from the Witness Tree Lemma, along the same lines as in the original paper of Moser & Tardos, but appears to be very difficult in the other LLLL frameworks.
Partial resampling.In [18], a partial resampling variant of the Moser-Tardos algorithm was developed.In this variant, one only resamples a small, random subset of the variables (or, in our case, permutation elements) which determine a bad event.To analyze this variant, [18] developed an alternate type of witness tree, which only records the variables which were actually resampled.Ignoring the other variables can drastically prune the space of witness trees.Again, this does not seem to be possible in other LLLL frameworks in which the full execution log must be recorded.We will see an example of this in Theorem 8.2; we do not know of any way to show results such as Theorem 8.2 using the frameworks of either Achlioptas & Iliopoulos or Harvey & Vondrák.

A parallel version of the Swapping Algorithm
The Moser-Tardos resampling algorithm for the ordinary LLL can be transformed into an RNC algorithm by allowing a slight slack in the LLL's sufficient condition [31].The basic idea is that in every round, we select a maximal independent set (MIS) of bad events to resample.Using the known distributed/parallel algorithms for MIS, this can be done in RNC; the number of resampling rounds is then shown to be logarithmic w. h.p. ("with high probability"), in [31].
In this section, we will describe a parallel algorithm for the Swapping Algorithm, which runs along the same lines.However, everything is more complicated than in the case of the ordinary LLL.In the Moser-Tardos algorithm, events which are not connected to each other cannot affect each other in any way.For the permutation LLL, such events can interfere with each other, but do so rarely.Consider the following example.Suppose that at some point we have two active bad events, "π k (1) = 1" and "π k (2) = 2," and so we decide to resample them simultaneously (since they are not connected to each other, and hence constitute an independent set).When we resample the bad event π k (1) = 1, we may swap 1 with 2; this automatically fixes the second bad event as well.The sequential algorithm, in this case, would only swap a single element.The parallel algorithm should likewise not perform a second swap for the second bad event, or else it would be oversampling.Avoiding this type of conflict is quite tricky.
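The interference described in this example is easy to reproduce. The short Python sketch below is purely illustrative: it stores one permutation as a dict and uses hypothetical helper names. It shows that a single swap chosen while resampling "π(1) = 1" can already make "π(2) = 2" false, so a second swap for that event would be an extra resampling that the sequential algorithm never performs.

```python
def swap_entries(pi, a, b):
    """Return a copy of the permutation pi (a dict) with pi[a] and pi[b] exchanged."""
    out = dict(pi)
    out[a], out[b] = out[b], out[a]
    return out


def true_events(pi, events):
    """List the single-element bad events (x, y), meaning 'pi(x) = y', that hold in pi."""
    return [(x, y) for (x, y) in events if pi[x] == y]


bad_events = [(1, 1), (2, 2)]            # "pi(1) = 1" and "pi(2) = 2"
pi = {1: 1, 2: 2, 3: 3}

before = true_events(pi, bad_events)     # both events are currently true

# Resampling "pi(1) = 1" happens to choose position 2 as the swap-mate.
after_first = swap_entries(pi, 1, 2)
after = true_events(after_first, bad_events)
```

After the first swap neither event is true, which is exactly the situation in which the parallel algorithm must discard the swap planned for the second event.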
Let $n = n_1 + \cdots + n_N$; since the output of the algorithm will be the contents of the permutations $\pi_1, \dots, \pi_N$, this algorithm should be measured in terms of $n$, and we must show that this algorithm runs in $\mathrm{polylog}(n)$ time. Our algorithm will require that $|\mathcal{B}|$, the total number of bad events, is polynomial in $n$, and that every element $B \in \mathcal{B}$ has size $|B| \leq \mathrm{polylog}(n)$; these conditions hold for many cases.
2. We proceed through a series of rounds while there is some true bad event. In round $i$ ($i = 1, 2, \dots$) do the following:

3. Let $V_{i,1} \subseteq \mathcal{B}$ denote the set of bad events which are currently true at the beginning of round $i$. We will attempt to fix the bad events in $V_{i,1}$ through a series of subrounds. This may introduce new bad events, but we will not fix any newly created bad events until round $i + 1$.

Repeat the following process for $j = 1, 2, \dots$ as long as $V_{i,j} \neq \emptyset$:

4. Let $I_{i,j}$ be an MIS of $V_{i,j}$.

5. For each bad event $B = \{(k_1, x_1, y_1), \dots, (k_r, x_r, y_r)\} \in I_{i,j}$, choose the swaps corresponding to $B$. We select each $z_\ell \in [n_{k_\ell}]$, which is the element to be swapped with $\pi_{k_\ell}(x_\ell)$ according to procedure Swap. Do not perform the indicated swaps at this time, though! We refer to $(k_1, x_1), \dots, (k_r, x_r)$ as the swap-sources of $B$ and to $(k_1, z_1), \dots, (k_r, z_r)$ as the swap-mates of $B$.

6. Select a random ordering $\rho_{i,j}$ of the elements of $I_{i,j}$. Consider the graph $G_{i,j}$ whose vertices correspond to elements of $I_{i,j}$, with an edge on $(B, B')$ if $\rho_{i,j}(B) < \rho_{i,j}(B')$ and one of the swap-mates of $B$ is a swap-source of $B'$. Generate $I'_{i,j} \subseteq I_{i,j}$ as the lexicographically first MIS (LFMIS) of the resulting graph $G_{i,j}$, with respect to the vertex ordering $\rho_{i,j}$.

7. For each permutation $\pi_k$, enumerate all the transpositions $(x\ z)$ corresponding to elements of $I'_{i,j}$, arranged in order of $\rho_{i,j}$. Say these transpositions are, in order, $(x_1, z_1), \dots, (x_\ell, z_\ell)$. Compute, in parallel for all $\pi_k$, the composition $\pi'_k = \pi_k (x_1\ z_1) \cdots (x_\ell\ z_\ell)$.

8. Update $V_{i,j+1}$ from $V_{i,j}$ by removing all elements which are either no longer true for the current permutation, or are connected via $\sim$ to some element of $I'_{i,j}$.
Most steps of this algorithm can be implemented using standard parallel methods. For example, step (1) can be performed simply by having each element of $[n_k]$ choose a random real and then executing a parallel sort. The independent set $I_{i,j}$ can be found in polylogarithmic time using [6,29]. The difficult step to parallelize is selecting the LFMIS $I'_{i,j}$. In general, the problem of finding the LFMIS is P-complete [11], hence we do not expect a generic parallel algorithm for this. However, what saves us is that the ordering $\rho_{i,j}$ and the graph $G_{i,j}$ are constructed in a highly random fashion.
This allows us to use the following greedy algorithm to construct $I'_{i,j}$, the LFMIS of $G_{i,j}$:

1. Let $H_1$ be the directed graph obtained by orienting all edges of $G_{i,j}$ in the direction of $\rho_{i,j}$. Repeat the following for $s = 1, 2, \dots$:
2. If $H_s = \emptyset$, terminate.
3. Find all source nodes of $H_s$. Add these to $I'_{i,j}$.
4. Construct $H_{s+1}$ by removing all source nodes and all successors of source nodes from $H_s$.

The output of this algorithm is the LFMIS $I'_{i,j}$. Each step can be implemented in parallel time $O(\log n)$.
The number of iterations of this algorithm is at most the length of the longest directed path in $G_{i,j}$. So it suffices to show that, w.h.p., all directed paths in $G_{i,j}$ have length $\mathrm{polylog}(n)$.
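The source-peeling procedure above can be sketched sequentially in Python and compared against the textbook definition of the LFMIS. This is a hedged illustration, not the parallel implementation: the graph is given as an undirected edge set plus a rank order (standing in for $\rho_{i,j}$), each loop iteration plays the role of one parallel step, and all names are hypothetical.

```python
def lfmis_sequential(order, edges):
    """Reference LFMIS: scan vertices in rank order, keep a vertex if no
    previously kept vertex is adjacent to it."""
    chosen = []
    for v in order:
        if all(frozenset((u, v)) not in edges for u in chosen):
            chosen.append(v)
    return chosen


def lfmis_by_rounds(order, edges):
    """Greedy source-peeling: returns (LFMIS in rank order, number of rounds)."""
    rank = {v: i for i, v in enumerate(order)}
    succ = {v: set() for v in order}
    pred = {v: set() for v in order}
    for e in edges:
        u, v = sorted(e, key=rank.get)   # orient from lower to higher rank
        succ[u].add(v)
        pred[v].add(u)
    alive = set(order)
    chosen, rounds = [], 0
    while alive:
        rounds += 1
        # Source nodes: no surviving predecessor.
        sources = {v for v in alive if not (pred[v] & alive)}
        removed = set(sources)
        for v in sources:
            removed |= succ[v] & alive   # successors of sources are excluded
        chosen.extend(sources)
        alive -= removed
    return sorted(chosen, key=rank.get), rounds


# A path a - b - c - d - e under two different orderings.
path_edges = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]}
res_sorted, rounds_sorted = lfmis_by_rounds(list("abcde"), path_edges)
res_mixed, rounds_mixed = lfmis_by_rounds(list("caebd"), path_edges)
```

With the ordering a, b, c, d, e the orientation contains one long directed path and three rounds are needed; with the ordering c, a, e, b, d all directed paths are short and a single round suffices. This is the effect that Proposition 7.1 quantifies for random orderings.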
Proposition 7.1.Let I ⊆ B be an an arbitrary independent set of true bad events, and suppose all elements of B have size ≤ M. Let G = G i, j be the graph constructed in Step ( 6) of the Parallel Swapping Algorithm.Then w. h.p., every directed path in G has length O(M + log n).
Proof. One of the main ideas below is to show that for the typical $B_1, \dots, B_\ell \in I$, where $\ell = 5(M + \log n)$, the probability that $B_1, \dots, B_\ell$ form a directed path is small. Suppose we select $B_1, \dots, B_\ell \in I$ uniformly at random without replacement. Let us analyze how these could form a directed path in $G$. (We may assume $|I| > \ell$ or otherwise the result holds trivially.) First, it must be the case that $\rho(B_1) < \rho(B_2) < \dots < \rho(B_\ell)$. This occurs with probability $1/\ell!$. Next, the swap-mates of $B_s$ must overlap the swap-sources of $B_{s+1}$, for $s = 1, \dots, \ell - 1$. Now, $B_s$ has $O(M)$ swap-mates; each such swap-mate can overlap with at most one element of $I$, since $I$ is an independent set. Conditional on having chosen $B_1, \dots, B_s$, there remain $|I| - s$ choices for $B_{s+1}$. This gives that the probability of having $B_s$ with an edge to $B_{s+1}$, conditional on the previous events, is at most $M/(|I| - s)$. (The fact that swap-mates are chosen randomly does not give too much of an advantage here.)

Putting this all together, the total probability that there is a directed path on $B_1, \dots, B_\ell$ is at most

$$\frac{1}{\ell!} \prod_{s=1}^{\ell-1} \frac{M}{|I| - s} = \frac{M^{\ell-1} \, (|I| - \ell)!}{\ell! \, (|I| - 1)!} .$$

The above was for a random $B_1, \dots, B_\ell$, so, taking a union bound over the $|I|!/(|I| - \ell)!$ ordered choices, the probability that there is some such length-$\ell$ path is at most

$$\frac{|I|!}{(|I| - \ell)!} \cdot \frac{M^{\ell-1} \, (|I| - \ell)!}{\ell! \, (|I| - 1)!} = \frac{|I| \, M^{\ell-1}}{\ell!} \le |I| \left( \frac{eM}{\ell} \right)^{\ell} \le |I| \left( \frac{e}{5} \right)^{\ell},$$

which is polynomially small in $n$ for $\ell = 5(M + \log n)$.

Proof of Proposition 7.2. We begin by showing that, if $B \in I$, where $I$ is an arbitrary independent set of $\mathcal{B}$, then with probability at least $1/(2M \ln n)$ we have $B \in I'$ as well, where $I'$ is the LFMIS associated with $I$.
Observe that if there is no $B' \in I$ such that $\rho(B') < \rho(B)$ and such that a swap-mate of $B'$ overlaps with a swap-source of $B$, then $B \in I'$ (this is not a necessary condition). We will analyze the ordering $\rho$ using the standard trick, in which each element $B \in I$ chooses a rank $W(B) \sim \mathrm{Uniform}[0, 1]$, independently and identically. The ordering $\rho$ is then formed by sorting in increasing order of $W$. In this way, we are able to avoid the dependencies induced by the rankings. For the moment, suppose that the rank $W(B)$ is fixed at some real value $w$. We will then count how many $B' \in I$ satisfy $W(B') < w$ and have a swap-mate which overlaps a swap-source of $B$.
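So consider some swap-source $s$ of $B$ in permutation $k$, and consider some $B_j \in I$ which has $r_j$ other elements in permutation $k$. For $\ell = 1, \dots, r_j$, there are $n_k - \ell + 1$ possible choices for the $\ell$-th swap-mate from $B_j$, and hence the total expected number of swap-mates of $B_j$ which overlap $s$, counting only the case $W(B_j) < w$, is at most

$$w \sum_{\ell=1}^{r_j} \frac{1}{n_k - \ell + 1} \le w \ln \frac{n_k}{n_k - r_j} .$$

Next, sum over all $B_j \in I$. Since $I$ is an independent set, we must have $\sum_j r_j \le n_k - 1$. Thus, by concavity of the $\ln$ function,

$$\sum_j w \ln \frac{n_k}{n_k - r_j} \le w \ln n_k \le w \ln n .$$

Summing over all swap-sources of $B$, the total probability that there is some $B'$ with $\rho(B') < \rho(B)$ and for which a swap-mate overlaps a swap-source of $B$, is at most $w|B| \ln n \le wM \ln n$. By Markov's inequality,

$$P(B \in I' \mid W(B) = w) \ge 1 - wM \ln n .$$

Integrating over $w$ gives

$$P(B \in I') \ge \int_0^{1/(M \ln n)} (1 - wM \ln n) \, dw = \frac{1}{2M \ln n} .$$

Now, using this fact, we show that $V_{i,j}$ is decreasing quickly in size. For, suppose $B \in V_{i,j}$. So $B \sim B'$ for some $B' \in I_{i,j}$, as $I_{i,j}$ is a maximal independent set (possibly $B = B'$). We will remove $B$ from $V_{i,j+1}$ if $B' \in I'_{i,j}$, which occurs with probability at least $1/(2M \ln n)$. As $B$ was an arbitrary element of $V_{i,j}$, this shows that

$$E\bigl[ |V_{i,j+1}| \bigm| V_{i,j} \bigr] \le \left( 1 - \frac{1}{2M \ln n} \right) |V_{i,j}| .$$

For $j = \Omega(M \log^2 n)$, this implies that

$$E\bigl[ |V_{i,j}| \bigr] \le \left( 1 - \frac{1}{2M \ln n} \right)^{j-1} |V_{i,1}| \le n^{-\Omega(1)} .$$

This in turn implies that $V_{i,j} = \emptyset$ with high probability, for $j = \Omega(M \log^2 n)$.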
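To get a feel for where the $M \log^2 n$ bound comes from, the following sketch computes how many subrounds are needed for a geometric decay to push an expected set size below a target. It assumes, as an illustration only, that each subround shrinks the expected size of $V_{i,j}$ by a factor of $1 - 1/(2M \ln n)$; the function name `subrounds_needed` and its parameters are hypothetical.

```python
import math


def subrounds_needed(M, n, initial_size, target):
    """Smallest j with (1 - 1/(2 M ln n))**j * initial_size <= target.

    Assumes target < initial_size, M >= 1 and n >= 3.
    """
    q = 1.0 - 1.0 / (2.0 * M * math.log(n))  # assumed per-subround contraction
    # q**j * initial_size <= target  <=>  j >= ln(target/initial_size) / ln(q)
    return math.ceil(math.log(target / initial_size) / math.log(q))
```

Since $-\ln q \ge 1/(2M \ln n)$, the returned value is at most $2M \ln n \cdot \ln(\text{initial\_size}/\text{target})$, so driving a polynomially large set below $n^{-c}$ costs one factor of $M \log n$ for the contraction rate and another factor of $\log n$ for the size ratio.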
To finish the proof, we must show that the number of rounds is itself bounded w.h.p. We begin by showing that the Witness Tree Lemma remains valid in the parallel setting.

Proposition 7.3. When we execute the Parallel Swapping Algorithm, we may generate an "execution log" according to the following rule: suppose that we resample $B$ in round $i, j$ and $B'$ in round $i', j'$. Then we place $B$ before $B'$ iff:

1. $i < i'$; OR

2. $i = i'$ AND $j < j'$; OR

3. $i = i'$ and $j = j'$ and $\rho_{i,j}(B) < \rho_{i',j'}(B')$;

that is, we order the resampled bad events lexicographically by round, subround, and then rank $\rho$.

Given such an execution log, we may also generate witness trees in the same manner as the sequential algorithm. For any witness tree $\tau$, this procedure ensures that

$$P(\tau \text{ appears}) \le \prod_{B \in \tau} P_\Omega(B) .$$
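The ordering rule of Proposition 7.3 is a plain lexicographic sort on (round, subround, rank). A minimal sketch, assuming each resampling is recorded as a hypothetical tuple `(i, j, rank, event)`:

```python
def execution_log(resamplings):
    """Order resampled events by round i, then subround j, then rank rho.

    `resamplings` is an iterable of (i, j, rank, event) tuples; ranks are
    assumed distinct within a single subround.
    """
    ordered = sorted(resamplings, key=lambda record: record[:3])
    return [event for (_, _, _, event) in ordered]
```

Sorting on the first three fields only avoids comparing the event objects themselves, which need not be orderable.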
Proof. Observe that the choice of swaps for a bad event $B$ at round $i$, subround $j$, and rank $\rho_{i,j}(B)$, is only affected by the events in earlier rounds / subrounds as well as other $B' \in I_{i,j}$ with $\rho_{i,j}(B') < \rho_{i,j}(B)$. Thus, we can view this parallel algorithm as simulating the sequential algorithm, with a particular rule for selecting the bad event to resample. Namely, we keep track of the sets $V_i$ and $I_{i,j}$ as we do for the parallel algorithm, and within each subround we resample the bad event in $I_{i,j}$ with the minimum value of $\rho_{i,j}(B)$. This is why it is critical in step (6) that we select $I'_{i,j}$ to be the lexicographically first MIS; this means that the presence of $B \in I'_{i,j}$ cannot be affected by any $B'$ with $\rho(B') > \rho(B)$.

Proposition 7.4. Let $B$ be any resampling performed at the $i$-th round of the Parallel Swapping Algorithm (that is, $B \in I'_{i,j}$ for some integer $j > 0$). Then the witness tree corresponding to the resampling of $B$ has height exactly $i$.
Proof. First, note that if we have $B \sim B'$ in the execution log, where $B$ occurs earlier in time, and the witness tree corresponding to $B$ has height $i$, then the witness tree corresponding to $B'$ must have height $i + 1$. So it will suffice to show that if $B \in I'_{i,j}$, then we must have $B \sim B'$ for some $B' \in I'_{i-1,j'}$.
The bad event $B$ must be true at the beginning of round $i$. By Proposition 3.2, either $B$ was already true at the beginning of round $i - 1$, or some bad event $B' \sim B$ was resampled at round $i - 1$. If it is the latter, we are done. If $B$ was true at the beginning of round $i - 1$, then $B \in V_{i-1,1}$. In order for $B$ to have been removed from $V_{i-1}$, either we had $B \sim B' \in I'_{i-1,j}$, in which case we are also done, or after some subround $j$ the event $B$ was no longer true. But again by Proposition 3.2, in order for $B$ to become true again at the beginning of round $i$, there must have been some bad event $B' \sim B$ encountered later in round $i - 1$.

This gives us the key bound on the running time of the Parallel Swapping Algorithm. We give only a sketch of the proof, since the argument is identical to that of [31].

Proposition 7.5. Suppose that $\varepsilon > 0$ and that some function $x : \mathcal{B} \to (0, 1)$ satisfies, for every $B \in \mathcal{B}$, the condition

THEORY OF COMPUTING, Volume 13 (17), 2017, pp. 1-41

Then, w.h.p., the Parallel Swapping Algorithm terminates after

Proof. Consider the event that some $B \in \mathcal{B}$ is resampled after $i$ rounds of the Parallel Swapping Algorithm. In this case, by Proposition 7.4, the witness tree $\tau$ corresponding to this resampling has height $i$. As shown in [31], the sum, over all witness trees of some height $h$, of the product of the probabilities of the constituent events in the witness trees, is decreasing exponentially in $h$. So, for any fixed $B$, the probability that this occurs is exponentially small; this remains true after taking a union bound over the polynomial number of $B \in \mathcal{B}$.
We can put this analysis all together to show the following result.

Algorithmic Applications
The LLL for permutations plays a role in diverse combinatorial constructions. Using our algorithm, nearly all of these constructions become algorithmic. We examine a few selected applications now.

Latin transversals
Consider an $n \times n$ matrix $A$, whose entries come from a set $C$ which are referred to as colors. A Latin transversal of this matrix is a permutation $\pi \in S_n$, such that no color appears twice among the entries $A(i, \pi(i))$; that is, there are no $i \ne j$ with $A(i, \pi(i)) = A(j, \pi(j))$. A typical question in this area is the following: suppose each color appears at most $\Delta$ times in the matrix. How large can $\Delta$ be so as to guarantee the existence of a Latin transversal? In [13], a proof using the probabilistic form of the LLL was given, showing that $\Delta \le n/(4e)$ suffices. This was the first application of the LLL to permutations. This bound was subsequently improved by [9] to the criterion $\Delta \le (27/256)n$; this uses a variant of the probabilistic Local Lemma which is essentially equivalent to Pegden's variant on the constructive Local Lemma. Our algorithmic LLL can almost immediately transform the existential proof of [9] into a constructive algorithm. To our knowledge, this is the first polynomial-time algorithm for constructing such a transversal.

Theorem 8.1. Suppose $\Delta \le (27/256)n$. Then there is a Latin transversal of the matrix. Furthermore, the Swapping Algorithm selects such a transversal in polynomial time.

Proof. For any quadruple $i, j, i', j'$ with $A(i, j) = A(i', j')$, we have a bad event $\pi(i) = j \wedge \pi(i') = j'$. Such an event has probability $1/(n(n-1))$. We apply Pegden's criterion [32] using the weight function $\mu(B) = \alpha$ for every bad event $B$, where $\alpha$ is a scalar to be determined.

This bad event can have up to four types of neighbors $(i_1, j_1, i'_1, j'_1)$, which overlap on one of the four coordinates $i, j, i', j'$; as discussed in [9], all the neighbors of any type are themselves neighbors in the dependency graph. Since these are all the same, we will analyze just the first type of neighbor, one which shares the same value of $i$, that is $i_1 = i$. We now may choose any value for $j_1$ ($n$ choices). At this point, the color $A(i_1, j_1)$ is determined, so there are $\Delta - 1$ remaining choices for $i'_1, j'_1$.
By Lemma 3.1 and Pegden's criterion [32], a sufficient condition for the convergence of the Swapping Algorithm is that

$$\alpha \ge \frac{1}{n(n-1)} \bigl( 1 + n(\Delta - 1)\alpha \bigr)^4 .$$

Routine algebra shows that this has a positive real root $\alpha$ when $\Delta \le (27/256)n$.
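The constant 27/256 can be checked with exact arithmetic. The sketch below assumes the criterion has the form $\alpha \ge p\,(1 + d\alpha)^4$, with $p = 1/(n(n-1))$ the probability of a bad event and $d = n(\Delta - 1)$ the number of neighbors of each of the four types; this form is our reading of the proof, and the helper names are hypothetical. Substituting $y = d\alpha$ turns the condition into $pd \le y/(1+y)^4$, and the right-hand side is maximized at $y = 1/3$.

```python
from fractions import Fraction


def g(y):
    """Right-hand side y / (1 + y)^4 after substituting y = d * alpha."""
    return y / (1 + y) ** 4


def criterion_feasible(n, delta):
    """True if some alpha > 0 satisfies alpha >= p * (1 + d * alpha)^4.

    Assumed form: p = 1/(n(n-1)), d = n(delta - 1), with n >= 2, delta >= 1.
    """
    p = Fraction(1, n * (n - 1))
    d = n * (delta - 1)
    return p * d <= g(Fraction(1, 3))  # best choice is d * alpha = 1/3
```

The derivative of $g$ is $(1 - 3y)/(1 + y)^5$, which is why $y = 1/3$ is the maximizer and $g(1/3) = 27/256$ is the threshold.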
In [36], Szabó considered a generalization of this question: suppose that we seek a transversal, such that no color appears more than $s$ times. When $s = 1$, this is asking for a Latin transversal. Szabó gave similar criteria "$\Delta \le \gamma_s n$" for $s$ a small constant. Such bounds can be easily obtained constructively using the permutation LLL as well. By combining the permutation LLL with the partial resampling approach of [18], we can provide asymptotically optimal bounds for large $s$.

Theorem 8.2. Suppose $\Delta \le (s - c\sqrt{s})n$, where $c$ is a sufficiently large constant. Then there is a transversal of the matrix in which each color appears no more than $s$ times. This transversal can be constructed in polynomial time.

Proof. For each set of $s$ appearances of any color, we have a bad event. We use the partial resampling framework, to associate the fractional hitting set which assigns weight $\binom{s}{r}^{-1}$ to any $r$ appearances of a color, where $r = \lceil \sqrt{s} \rceil$.

We first compute the probability of selecting a given $r$-set $X$. From the fractional hitting set, this has probability $\binom{s}{r}^{-1}$. In addition, the probability of selecting the indicated cells is $(n - r)!/n!$. So we have

$$P(X) = \binom{s}{r}^{-1} \frac{(n-r)!}{n!} .$$

Next, we compute the dependency of the set $X$. First, we may select another $X'$ which overlaps with $X$ in a row or column; the number of such sets is $2rn \binom{\Delta}{r-1}$. Next, we may select any other $r$-set with the same color as $X$ (this is the additional dependency arising in the partial resampling framework; see [18] for more details). The number of such sets is $\binom{\Delta}{r}$. So the LLL criterion is satisfied if

$$e \times \binom{s}{r}^{-1} \frac{(n-r)!}{n!} \times \left( 2rn \binom{\Delta}{r-1} + \binom{\Delta}{r} \right) \le 1 .$$

Simple calculus now shows that this can be satisfied when $\Delta \le (s - O(\sqrt{s}))n$. Also, it is easy to detect a true bad event and resample it in polynomial time, so this gives a polynomial-time algorithm.
Our result depends on the Swapping Algorithm in a fundamental way; it does not follow from Theorem 1.1 (which would roughly require $\Delta \le (s/e)n$). Hence, prior to this paper, we would not have been able to even show the existence of such transversals; here we provide an efficient algorithm as well. To see that our bound is asymptotically optimal, consider a matrix in which the first $s + 1$ rows all contain a given color, a total multiplicity of $\Delta = (s + 1)n$. Then the transversal must contain that color at least $s + 1$ times.
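Both the transversal condition and the lower-bound example can be checked by brute force on tiny matrices. This is a sketch only: the helper names are hypothetical, rows and columns are indexed from 0, and the exhaustive search is exponential in $n$, so it is for illustration rather than for use on real instances.

```python
from collections import Counter
from itertools import permutations


def max_color_multiplicity(A, pi):
    """Largest number of times any color appears among the cells A[i][pi[i]]."""
    n = len(A)
    if sorted(pi) != list(range(n)):
        raise ValueError("pi must be a permutation of 0..n-1")
    counts = Counter(A[i][pi[i]] for i in range(n))
    return max(counts.values())


def best_over_all_transversals(A):
    """Minimum, over all permutations, of the maximum color multiplicity."""
    n = len(A)
    return min(max_color_multiplicity(A, list(p)) for p in permutations(range(n)))
```

A permutation `pi` is a Latin transversal exactly when `max_color_multiplicity(A, pi) == 1`. For a matrix whose first $s + 1$ rows consist entirely of one color, `best_over_all_transversals` returns at least $s + 1$, matching the optimality argument.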

Rainbow Hamiltonian cycles and related structures
The problem of finding Hamiltonian cycles in the complete graph $K_n$, with edges of distinct colors, was first studied in [17]. This problem is typically phrased in the language of graphs and edges, but we can rephrase it in the language of Latin transversals, with the additional property that the permutation $\pi$ has full cycle. How often can a color appear in the matrix $A$, for this to be possible? In [3], it was shown that such a transversal exists if each color appears at most $\Delta = n/32$ times. This proof is based on applying the non-constructive LLLL to the probability space induced by a random choice of full-cycle permutation. This result was later generalized in [15], which showed that if each color appears at most $\Delta \le c_0 n$ times for a certain constant $c_0 > 0$, then not only is there a full-cycle Latin transversal, but there are also cycles of each length $3 \le k \le n$. The constant $c_0$ was somewhat small, and this result was also non-constructive. Theorem 8.3 uses the Swapping Algorithm to construct Latin transversals with essentially arbitrary cycle structures; this generalizes [15] and [3] quite a bit.

Theorem 8.3. Suppose that each color appears at most $\Delta \le 0.027n$ times in the matrix $A$, and $n$ is sufficiently large. Let $\tau$ be any permutation on $n$ letters, whose cycle structure contains no fixed points nor swaps (2-cycles). Then there is a Latin transversal $\pi$ which is conjugate to $\tau$ (i.e., has the same cycle structure); furthermore the Swapping Algorithm finds it in polynomial time. Also, the Parallel Swapping Algorithm finds it in time polylog(n).

Proof. We cannot apply the Swapping Algorithm directly to the permutation $\pi$, because we will not be able to control its cycle structure. Rather, we will set $\pi = \sigma^{-1} \tau \sigma$, and apply the Swapping Algorithm to $\sigma$.

A bad event is that $A(x, \pi(x)) = A(x', \pi(x'))$ for some $x \ne x'$. Using the fact that $\tau$ has no fixed points or 2-cycles, we can see that this is equivalent to one of the following two situations:

(A) There are $i, i', x, y, x', y'$ such that $\sigma(x) = i$, $\sigma(y) = \tau(i)$, $\sigma(x') = i'$, $\sigma(y') = \tau(i')$, and $x, y, x', y'$ are distinct, and $i, i', \tau(i), \tau(i')$ are distinct, and $A(x, y) = A(x', y')$; or

(B) There are $i, x, y, z$ with $\sigma(x) = i$, $\sigma(y) = \tau(i)$, $\sigma(z) = \tau^2(i)$, and all of $x, y, z$ are distinct, and $A(x, y) = A(y, z)$.

We will refer to the first type of bad event as an event of type A led by $i$ (such an event is also led by $i'$); we will refer to the second type of bad event as type B led by $i$.
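The reason for working with $\sigma$ rather than $\pi$ is that conjugation never changes the cycle structure, however $\sigma$ is modified. A minimal sketch, assuming permutations are stored as 0-indexed lists and that the product is read right to left, so that $\pi(x) = \sigma^{-1}(\tau(\sigma(x)))$; the function names are hypothetical.

```python
def conjugate(sigma, tau):
    """Return pi = sigma^{-1} tau sigma, i.e. pi[x] = sigma^{-1}(tau(sigma(x)))."""
    n = len(sigma)
    inverse = [0] * n
    for x, image in enumerate(sigma):
        inverse[image] = x
    return [inverse[tau[sigma[x]]] for x in range(n)]


def cycle_type(perm):
    """Sorted list of cycle lengths of a permutation given as a list."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length, x = 0, start
            while not seen[x]:
                seen[x] = True
                x = perm[x]
                length += 1
            lengths.append(length)
    return sorted(lengths)
```

With this convention, $\pi(x) = y$ holds exactly when $\sigma(y) = \tau(\sigma(x))$, which is the relation used to describe the bad events in terms of $\sigma$.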
Note that in an A-event, the color is repeated in distinct column and rows, and in a B-event the column of one coordinate is the row of another.So, to an extent, these events are mutually exclusive.Much of the complexity of the proof lies in balancing the two configurations.To a first approximation, the worst case occurs when A-events are maximized and B-events are impossible.This intuition should be kept in mind during the following proof.
We will define the function $\mu$ for Pegden's criterion as follows. Each event of type A is assigned the same weight $\mu_A$, and each event of type B is assigned weight $\mu_B$. Each event of type A has probability $(n - 4)!/n!$ and each event of type B has probability $(n - 3)!/n!$. In the following proof, we shall need to compare the relative magnitude of $\mu_A$, $\mu_B$. In order to make this concrete, we set

(In deriving this proof, we left these constant coefficients undetermined until the end of the computation, and we then verified that all desired inequalities held.)

To apply Pegden's criterion [32] for the convergence of the Swapping Algorithm, we will need to analyze the independent sets of neighbors of each bad event. To count these, it will be convenient to define the following sums. We let $t$ denote the sum of $\mu(X)$ over all bad events $X$ involving some fixed term $\sigma(x)$. Let $s$ denote the sum of $\mu(X)$ over all bad events $X$ (of type either A or B) led by any fixed value $i$, and let $b$ denote the sum of $\mu(X)$ over B-events $X$ led by any fixed value $i$. Recall that each bad event of type A is led by $i$ and also by $i'$.
We now examine how to compute the term t.We will enumerate all the bad events that involve σ (x) for some fixed value x.These correspond to color-repetitions involving either row or column x in the matrix A. Let c i denote the number of occurrences of color i in column x of the matrix, excluding A(x, y); similarly, let r i denote the number of occurrences of color i in row y of the matrix, again excluding A(x, y).
A repeated color may have one of the three following forms:

1. $A(y, x) = A(x, y')$, where $y \ne y'$;

2. $A(x, y) = A(x', y')$, where $x \ne x'$, $y \ne y'$;

3. $A(y, x) = A(y', x')$, where $x \ne x'$, $y \ne y'$.

If $v_1, v_2, v_3$ denote the number of such repetitions, then we see that

A repetition of the first type must correspond to a B-event, in which $\sigma(y) = i$, $\sigma(x) = \tau(i)$, $\sigma(y') = \tau^2(i)$ for some $i$. For a repetition of the second type, if $x' \ne y$ this corresponds to an A-event in which $\sigma(x) = i$, $\sigma(y) = \tau(i)$, $\sigma(x') = i'$, $\sigma(y') = \tau(i')$ for some $i, i'$, or alternatively if $x' = y$ it corresponds to a B-event in which $\sigma(x) = i$, $\sigma(y) = \tau(i)$, $\sigma(y') = \tau^2(i)$ for some $i$. The third type of repetition is similar to the second type. Summing the three cases gives

The RHS of this expression is maximized when there are $n$ distinct colors with $c_j = 1$ and $n$ distinct colors with $r_j = 1$. For, suppose that a color has (say) $c_j > 1$. If we decrement $c_j$ by 1 while adding a new color with $c_{j'} = 1$, this changes the RHS by

This gives us $t \le 2n^3 \Delta \mu_A$.

Similarly, let us consider $s$. Given $i$, we choose some $y$ with $\sigma(y) = \tau(i)$, and we list all color repetitions $A(x, y) = A(x', y')$ or $A(x, y) = A(y, z)$. The number of the former is at most $\sum_j c_j (\Delta - c_j - r_j)$ and the number of the latter is at most $\sum_j c_j r_j$. As before, this is maximized when each color appears once in the column, leading to $s \le n^3 \Delta \mu_A$.

Now fix some bad event of type A, with parameters $i, i', x, y, x', y'$, and let us enumerate its independent sets of neighbors. This could have one or zero events involving $\sigma(x)$ and similarly for $\sigma(y), \sigma(x'), \sigma(y')$; this gives a total contribution of $(1 + t)^4$. The neighbor could also overlap on $i$; the total set of possibilities is either zero such events, a B-event led by $i - 2$, a B-event led by $i - 2$ and an event led by $i$, an event led by $i - 1$, an event led by $i - 1$ and an event led by $i + 1$, an event led by $i$, an event led by $i + 1$. There is an identical factor for the contributions of bad events led by $i' - 2, \dots, i' + 1$. In total, Pegden's criterion is

Applying the same type of analysis to an event of type B gives the criterion:

Putting all these constraints together gives a complicated system of polynomial equations, which can be solved using a symbolic algebra package. Indeed, the stated values of $\mu_A$, $\mu_B$ satisfy these conditions when $\Delta \le 0.027n$ and $n$ is sufficiently large.

Hence the Swapping Algorithm terminates, resulting in the desired permutation $\pi = \sigma^{-1} \tau \sigma$. It is easy to see that the Parallel Swapping Algorithm works as well.
We note that for certain cycle structures, such as the full cycle σ = (123 . . .n − 1 n) and n/2 transpositions σ = (12)(34) . . .(n − 1 n), the LLLL can be applied directly to the permutation π.This gives a qualitatively similar condition, of the form ∆ ≤ cn, but the constant term is slightly better than ours.For some of these settings, one can also apply a variant of the Moser-Tardos algorithm to find such permutations [1].However, these results do not apply to general cycle structures, and they do not give parallel algorithms.

Strong chromatic number
Consider a graph $G$, whose vertices are partitioned into $r$ blocks each of size $b$, i.e., $V = V_1 \sqcup \dots \sqcup V_r$. We would like to $b$-color the vertices, such that every block has exactly $b$ colors, and such that no edge has both endpoints with the same color (i.e., it is a proper vertex coloring). This is referred to as a strong coloring of the graph. If this is possible for any such partition of the vertices into blocks of size $b$, then we say that the graph $G$ has strong chromatic number $b$.
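The definition translates directly into a verification routine. This is a minimal sketch under our own conventions: blocks are lists of vertices, colors are numbered $0, \dots, b-1$ rather than $1, \dots, b$, and the function name is hypothetical.

```python
def is_strong_coloring(blocks, edges, color):
    """Check that `color` is a strong coloring.

    blocks: list of equal-size lists of vertices (the partition of V)
    edges:  iterable of (u, v) pairs
    color:  dict mapping each vertex to a color in 0..b-1
    """
    b = len(blocks[0])
    for block in blocks:
        # every block must use each of the b colors exactly once
        if sorted(color[v] for v in block) != list(range(b)):
            return False
    # the coloring must also be proper on every edge
    return all(color[u] != color[v] for u, v in edges)
```

The first condition says that the coloring restricted to a block is a bijection onto the color set, which is what allows each block to be treated as a permutation.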
A series of papers [5,8,14,21] have provided bounds on the strong chromatic number of graphs, typically in terms of their maximum degree $\Delta$. In [22], it is shown that when $b \ge (11/4)\Delta + \Omega(1)$, such a coloring exists; this is the best bound currently known. Furthermore, the constant 11/4 cannot be improved to any number strictly less than 2. The methods used in most of these papers are highly non-constructive, and do not provide algorithms for generating such colorings.
In this section, we examine two routes to constructing strong colorings.The first proof, based on [2], builds up the coloring vertex by vertex, using the ordinary LLL.The second proof uses the permutation LLL to build the strong coloring directly.The latter appears to be the first RNC algorithm with a reasonable bound on b.
We first develop a related concept to the strong coloring known as an independent transversal. In an independent transversal, we choose a single vertex from each block, so that the selected vertices form an independent set of the graph.

Proposition 8.4. Suppose $b \ge 4\Delta$. Then $G$ has an independent transversal, which can be found in expected time $O(n\Delta)$.
Furthermore, let v ∈ G be any fixed vertex.Then G has an independent transversal which includes v, which can be found in expected time O(n∆ 2 ).
Proof. Use the ordinary LLL to select a single vertex uniformly from each block. See [9], [18] for more details. This shows that, under the condition $b \ge 4\Delta$, an independent transversal exists and is found in expected time $O(n\Delta)$.

To find an independent transversal including $v$, we imagine assigning a weight 1 to vertex $v$ and weight zero to all other vertices. As described in [18], the expected weight of the independent transversal returned by the Moser-Tardos algorithm is at least $\Omega(w(V)/\Delta)$, where $w(V)$ is the total weight of all vertices. This implies that vertex $v$ is selected with probability $\Omega(1/\Delta)$. Hence, after running the Moser-Tardos algorithm for $O(\Delta)$ separate independent executions in expectation, one finds an independent transversal including $v$.

Proof of Theorem 8.5. (This proof is almost identical to the proof of Theorem 5.3 of [2].) We maintain a partial coloring of the graph $G$, in which some vertices are colored with $\{1, \dots, b\}$ and some vertices are uncolored. Initially all vertices are uncolored. We require that in a block, no vertices have the same color, and no adjacent vertices have the same color. Now, suppose some color is partially missing from the strong coloring; say without loss of generality there is a vertex $w$ missing color 1 in block 1. In each block $i = 1, \dots, r$, we will select some vertex $v_i$ to have color 1. If the block does not have such a vertex already, we will simply assign $v_i$ to have color 1. If the block $i$ already had some vertex $u_i$ with color 1, we will swap the colors of $v_i$ and $u_i$ (if $v_i$ was previously uncolored, then $u_i$ will become uncolored).
We need to ensure three things.First, the vertices v 1 , . . ., v r must form an independent transversal of G. Second, if we select vertex v i and swap its color with u i , this cannot cause u i to have any conflicts with its neighbors.Third, we will select v 1 = w.
A vertex u i will have conflicts with its neighbors if v i currently has the same color as one of the neighbors of u i .In each block, there are at least b − ∆ possible choices of v i that avoid that; we must select an independent transversal among these vertices, which also includes the designated vertex w.By Proposition 8.4, this can be done in time O(n∆ 2 ) as long as b − ∆ ≥ 4∆.Whenever we select the independent transversal v 1 , . . ., v r , the total number of colored vertices increases by at least one.For, the vertex w becomes colored while it was not initially, and in every other block the number of colored vertices does not decrease.So, after n iterations, the entire graph has a strong coloring; the total time is O(n 2 ∆ 2 ).
The algorithm based on the ordinary LLL is slow and is inherently sequential. The permutation LLL gives a more direct and faster construction; however, the hypothesis of the theorem will need to be slightly stronger. For each edge $f = (u, v) \in G$ and any color $c \in \{1, \dots, b\}$, we have a bad event that both $u$ and $v$ have color $c$. (Note that we cannot specify simply that $u$ and $v$ have the same color, because of the restricted class of bad events we consider.) Each bad event has probability $1/b^2$. We apply Pegden's criterion using the weight function $\mu(B) = \alpha$ for every bad event $B$, where $\alpha$ is a scalar to be determined. Each such event $(u, v, c)$ is dependent with four other types of bad events:

1. An event $(u, v', c')$ where $v'$ is connected to vertex $u$;

2. An event $(u', v, c')$ where $u'$ is connected to vertex $v$;

3. An event $(u', v', c)$ where $u'$ is in the block of $u$ and $v'$ is connected to $u'$;

4. An event $(u', v', c)$ where $v'$ is in the block of $v$ and $u'$ is connected to $v'$.

There are $b\Delta$ neighbors of each type. For any of these four types, all the neighbors are themselves connected to each other. Hence an independent set of neighbors can contain one or zero of each of the four types of bad events. Using Lemma 3.1 and Pegden's criterion [32], a sufficient condition for the convergence of the Swapping Algorithm is that

$$\alpha \ge \frac{1}{b^2} (1 + b\Delta\alpha)^4 .$$

When $b \ge (256/27)\Delta$, this has a real positive root $\alpha^*$ (which is a complicated algebraic expression). Furthermore, in this case the expected number of swaps of each permutation is at most $b^2 \Delta \alpha^* \le (256/81)\Delta$. So the Swapping Algorithm terminates in expected time $O(n\Delta)$. A similar argument applies to the Parallel Swapping Algorithm.

Hypergraph packing
In [28], the following packing problem was considered. Suppose we are given two $r$-uniform hypergraphs $H_1, H_2$ and an integer $n$. Is it possible to find two injections $\phi_i : V(H_i) \to [n]$ such that $\phi_1(H_1)$ is edge-disjoint to $\phi_2(H_2)$? A sufficient condition on $H_1, H_2, n$ was given using the LLLL. We achieve this algorithmically as well.
Theorem 8.7.Suppose that H 1 and H 2 have m 1 and m 2 edges, respectively.Suppose that each edge of H i intersects with at most d i other edges of H i , and suppose that Then the Swapping Algorithm finds injections φ i : V (H i ) → [n] such that φ 1 (H 1 ) is edge-disjoint to φ 2 (H 2 ).Suppose further that r ≤ polylog(n) and Then the Parallel Swapping Algorithm finds such injections with high probability in polylog(n)/ε time and using poly(m 1 , m 2 , n) processors.
Proof. [28] proves this fact using the LLLL, and the proof immediately applies to the Swapping Algorithm as well. We review the proof briefly: we may assume without loss of generality that the vertex set of $H_1$ is $[n]$ and the vertex set of $H_2$ has cardinality $n$ and that $\phi_1$ is the identity permutation; then we only need to select the bijection $\phi_2 : H_2 \to [n]$. For each pair of edges $e_1 = \{u_1, \dots, u_r\} \in H_1$, $e_2 = \{v_1, \dots, v_r\} \in H_2$, and each ordering $\sigma \in S_r$, there is a separate bad event $\phi_2(v_1) = u_{\sigma 1} \wedge \dots \wedge \phi_2(v_r) = u_{\sigma r}$. Now observe that the LLL criterion is satisfied for these bad events, under the stated hypothesis.
The proof for the Parallel Swapping Algorithm is almost immediate. There is one slight complication: the total number of bad events is $m_1 m_2 r!$, which could be superpolynomial. However, it is easy to see that the total number of bad events which are true at any one time is at most $m_1 m_2$, since each pair of edges $e_1, e_2$ can have at most one $\sigma$ with $\phi_2(v_1) = u_{\sigma 1} \wedge \dots \wedge \phi_2(v_r) = u_{\sigma r}$. It is not hard to see that Theorem 7.6 still holds under this condition.
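The observation that only $m_1 m_2$ bad events can be true at once suggests detecting them per pair of edges rather than per ordering. A minimal sketch, assuming as in the proof that $\phi_1$ is the identity, with hyperedges given as tuples and $\phi_2$ as a dictionary; the function name `colliding_pairs` is hypothetical.

```python
def colliding_pairs(H1, H2, phi2):
    """List the pairs (e1, e2) for which phi2 maps the edge e2 onto the edge e1.

    H1: iterable of hyperedges over [n] (phi1 is assumed to be the identity)
    H2: iterable of hyperedges over the vertex set of the second hypergraph
    phi2: dict mapping each vertex of H2 to an element of [n]
    """
    edges1 = {frozenset(e) for e in H1}
    collisions = []
    for e2 in H2:
        image = frozenset(phi2[v] for v in e2)
        # set equality covers all r! orderings at once; at most one can hold
        if image in edges1:
            collisions.append((tuple(sorted(image)), tuple(e2)))
    return collisions
```

An empty result means that $\phi_1(H_1)$ and $\phi_2(H_2)$ are edge-disjoint; each reported pair corresponds to exactly one true bad event, namely the ordering $\sigma$ realized by $\phi_2$ on that pair.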

Conclusion
The original formulation of the LLLL [13] applies in a natural way to general probability spaces. There has been great progress over the last few years in developing constructive algorithms, which find in polynomial time the combinatorial structures in these probability spaces whose existence is guaranteed by the LLL. These algorithms have been developed in great generality, encompassing the Swapping Algorithm as a special case.

However, the Moser-Tardos algorithm has uses beyond simply finding an object which avoids the bad events. In many ways, the Moser-Tardos algorithm is more powerful than the LLL. We have already seen problems that feature its extensions: e.g., Theorem 8.2 requires the use of the Partial Resampling variant of the Moser-Tardos algorithm, and Proposition 8.4 requires the use of the Moser-Tardos distribution (albeit in the context of the original Moser-Tardos algorithm, not the Swapping Algorithm).

While the algorithmic frameworks of Achlioptas & Iliopoulos and Harvey & Vondrák achieve the main goal of a generalized constructive LLL algorithm, they do not match the full power of the Moser-Tardos algorithm. However, our analysis shows that the Swapping Algorithm matches nearly all of the additional features of the Moser-Tardos algorithm. In our view, one main goal of our paper is to serve as a roadmap to the construction of a true generalized LLL algorithm. Behind all the difficult technical analysis, there is the underlying theme: even complicated probability spaces such as permutations can be reduced to "variables" (the domain and range elements of the permutation) which interact in a somewhat "independent" fashion.

Encouragingly, there has been progress toward this goal. For example, one main motivation of [1,20] was to generalize the Swapping Algorithm. Then, Kolmogorov noted that our Swapping Algorithm had the nice property that the choice of which bad event to resample can be made arbitrarily, a property missing from the analysis of [1]; this led to Kolmogorov's work [26] where he partially generalized that property (to which he refers as commutativity).

At the current time, we do not even know how to define a truly generalized LLL algorithm, let alone analyze it. But we hope that we have at least provided an example approach toward such an algorithm.

and $P(\mathrm{Swap}(\pi; x_1, \dots, x_r) = \sigma) = P(\mathrm{Swap}(\tau\pi; x_1, \dots, x_r) = \tau\sigma)$.
(Less formally, the swapping subroutine is invariant under permutations of the domain or range.)

Proof. We prove this by induction on $r$. The following equivalence will be useful. We can view a single call to Swap as follows: we select a random $x'_1$ and swap $x_1$ with $x'_1$; let $\pi' = \pi \cdot (x_1\ x'_1)$ denote the permutation after this swap. Now consider the permutation on $n - 1$ letters obtained by removing $x_1$ from the domain and $\pi'(x_1)$ from the range of $\pi'$; we use the notation $\pi' - (x_1, *)$ to denote this restriction of range/domain. We then recursively call $\mathrm{Swap}(\pi' - (x_1, *), x_2, \dots, x_r)$. Now, in order to have $\mathrm{Swap}(\pi\tau; \tau^{-1}x_1, \dots, \tau^{-1}x_r) = \sigma\tau$ we must first swap $\tau^{-1}x_1$ with $x'_1 = \tau^{-1}\pi^{-1}\sigma\tau x_1$; this occurs with probability $1/n$. Then we would have $P(\mathrm{Swap}(\pi\tau; \tau^{-1}x_1, \dots, \tau^{-1}x_r) = \sigma\tau)$

A similar argument applies for permutation of the range (i.e., post-composition by $\tau$).
In our analysis and algorithm, we will seek to maintain the symmetry between the "domain" and "range" of the permutation.The swapping subroutine seems to break this symmetry, inasmuch as the swaps are all based on the domain of the permutation.However, this symmetry breaking is only superficial as shown in Proposition A.3.

Proposition 4.5 can be viewed equally as a definition:

Definition 4.6 (Active conditions of a witness subdag). We refer to the conditions implied by Proposition 4.5 as the active conditions of the witness subdag $G$. More formally, we define $\mathrm{Active}(G) = \{(x, y) \mid (x, y) \text{ are the end-points of a } W\text{-configuration of } G\}$.

Proposition 7.2. Suppose $|\mathcal{B}| \le \mathrm{poly}(n)$ and all elements $B \in \mathcal{B}$ have size $|B| \le M$. Then w.h.p. we have $V_{i,j} = \emptyset$ for some $j = O(M \log^2 n)$.

Theorem 8.5. If $b \ge 5\Delta$, then $G$ has a strong coloring, which can be found in expected time $O(n^2 \Delta^2)$.

Theorem 8.6. Suppose $G$ is a graph of maximum degree $\Delta$, whose vertices are partitioned into blocks of size $b$. Then if $b \ge (256/27)\Delta$, it is possible to strongly color graph $G$ in expected time $O(n\Delta)$. If $b \ge (256/27 + \varepsilon)\Delta$ for some constant $\varepsilon > 0$, there is an RNC algorithm to construct such a strong coloring.

Proof. For each block, we assume the vertices and colors are identified with the set $[b]$. Then any proper coloring of a block corresponds to a permutation in $S_b$. When we discuss the color of a vertex $v$, we refer to $\pi_k(v)$ where $k$ is the block containing vertex $v$.