Distributed Corruption Detection in Networks

We consider the problem of distributed corruption detection in networks. In this model, each vertex of a directed graph is either truthful or corrupt. Each vertex reports the type (truthful or corrupt) of each of its out-neighbors. If it is truthful, it reports the truth, whereas if it is corrupt, it reports adversarially. This model, first considered by Preparata, Metze, and Chien in 1967 and motivated by the desire to identify the faulty components of a digital system by having the other components check them, became known as the PMC model. The main known results for this model characterize networks in which \emph{all} corrupt (that is, faulty) vertices can be identified, when there is a known upper bound on their number. We are interested in networks in which the identity of a \emph{large fraction} of the vertices can be identified. It is known that in the PMC model, in order to identify all corrupt vertices when their number is $t$, all indegrees have to be at least $t$. In contrast, we show that in $d$-regular graphs with strong expansion properties, a $1-O(1/d)$ fraction of the corrupt vertices and a $1-O(1/d)$ fraction of the truthful vertices can be identified, whenever there is a majority of truthful vertices. We also observe that if the graph is very far from being a good expander, namely, if the deletion of a small set of vertices splits the graph into small components, then no corruption detection is possible even if most of the vertices are truthful. Finally, we discuss the algorithmic aspects and the computational hardness of the problem.


Introduction
have studied many aspects of corruption networks, see, e.g., [22,25,10]. However, to the best of our knowledge, prior to this work, the effect of the network structure on corruption detection has not been systematically studied.
As noted in follow-up work [14], the theoretical models and results discussed here have natural limitations for social and economic networks: "Despite our theoretical results, corruption is prevalent in many real-world networks, and yet in many scenarios it is not easy to pinpoint even a single truthful node. One reason for that is that some of the assumptions do not seem to hold in some real world networks. For example, we assume that audits from the truthful nodes are not only non-malicious, but also perfectly reliable. In practice this assumption is unlikely to be true: many truthful nodes could be non-malicious but simply unable to audit their neighbors accurately. Further assumptions that may not hold in some scenarios include the notion of a central agency that is both uncorrupted and has access to reports from every agency, and possibly even the assumption that the number of corrupt nodes is less than |V|/2". In addition, there are many constraints on the network $G$ that may prevent it from being an expander. Despite these shortcomings of our model, our work points to ideal conditions that allow corruption detection. While it is perhaps unreasonable to assume that one imposes an expander auditing structure among government agencies, imposing such a structure among more equal entities such as banks, hospitals or universities may be more realistic. Furthermore, our results for directed graphs suggest such structures even in cases where the auditing relation is not symmetric.

Formal Definitions and Main Results
Definition 1.1 (Digraph) For a set $V$, let $V^{(2)}$ denote the set of ordered pairs of distinct elements of $V$. A digraph $G = (V, E)$ consists of a set $V$ of nodes ("agents") and a set $E \subset V^{(2)}$ of directed edges.
Definition 1.2 (Type, Report) Consider a digraph $G = (V, E)$ along with a function $\tau : V \to \{c, t\}$ that assigns each node a type. We call $\tau$ a truthfulness assignment to $G$. We call a node $v$ truthful if $\tau(v) = t$ and corrupt if $\tau(v) = c$. We write $T = \tau^{-1}(t)$ for the set of truthful nodes and $B = \tau^{-1}(c)$ for the set of corrupt nodes, so that $V = T \sqcup B$ is a partition of the set of nodes. A report is a function $\rho : E \to \{c, t\}$. A report $\rho$ is compatible with the truthfulness assignment $\tau$ if for each truthful node $u \in T$ and each directed edge $(u, v) \in E$ we have $\rho(u, v) = \tau(v)$. We call $\rho(u, v)$ the type of $v$ reported by $u$. We call $\tau$ a valid truthfulness assignment if $|T| > |B|$. We say that a report $\rho$ is feasible if it is compatible with at least one valid truthfulness assignment.
Since it is common to use the letter $C$ for constants, we prefer to denote the set of corrupt nodes by $B$ (the set of "bad" nodes).
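To make Definitions 1.1 and 1.2 concrete, the following Python sketch checks compatibility, validity and feasibility by brute force. It is only an illustration for tiny digraphs: the encoding of types as the strings "t" and "c", the dictionary representation of a report, and the function names are our own choices, and feasibility is tested by enumerating all $2^n$ assignments.

```python
from itertools import product


def is_compatible(edges, report, tau):
    """True iff every truthful u reports the true type of each out-neighbour v."""
    return all(report[(u, v)] == tau[v] for (u, v) in edges if tau[u] == "t")


def is_valid(tau):
    """A truthfulness assignment is valid iff truthful nodes are a strict majority."""
    truthful = sum(1 for x in tau.values() if x == "t")
    return truthful > len(tau) - truthful


def is_feasible(nodes, edges, report):
    """Brute force over all 2^n assignments; only meant for tiny examples."""
    for types in product("ct", repeat=len(nodes)):
        tau = dict(zip(nodes, types))
        if is_valid(tau) and is_compatible(edges, report, tau):
            return True
    return False


# Directed triangle 0 -> 1 -> 2 -> 0.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (2, 0)]
all_corrupt = {e: "c" for e in edges}
mixed = {(0, 1): "c", (1, 2): "t", (2, 0): "t"}
print(is_feasible(nodes, edges, all_corrupt))  # no valid assignment fits
print(is_feasible(nodes, edges, mixed))        # e.g. T = {0, 2}, B = {1}
```

In the directed triangle with all reports equal to $c$, any two truthful nodes include one that accuses its truthful out-neighbor, so no valid assignment is compatible and the report is infeasible.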
The question we address is under what conditions on the digraph $G$ and the number of truthful vertices it is possible to identify the truthfulness status of most nodes with certainty. It is easy to see that this is impossible if $|T| \le |B|$. Indeed, if $V = V_1 \cup V_2 \cup W$ is a partition of $V$ into 3 pairwise disjoint sets where $|V_1| = |V_2|$ (and $W$ may be empty), then the corrupt agents can ensure that all the reports in the two scenarios $T = V_1, B = V_2 \cup W$ and $T = V_2, B = V_1 \cup W$ will be identical. As there is no common truthful agent in these two possibilities, no algorithm can locate a truthful agent with no error.
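The adversarial strategy behind this argument can be checked mechanically. In the sketch below (the helper names and the small complete digraph are illustrative assumptions), every node of $V_1$ vouches exactly for $V_1$, every node of $V_2$ vouches exactly for $V_2$, and every node of $W$ reports $c$; the same report is then compatible with both scenarios.

```python
from itertools import permutations


def mirrored_report(edges, V1, V2):
    """Adversarial report: V1 vouches exactly for V1, V2 vouches exactly for V2,
    and every node outside V1 and V2 (the set W) reports 'c' on everybody."""
    report = {}
    for u, v in edges:
        if u in V1:
            report[(u, v)] = "t" if v in V1 else "c"
        elif u in V2:
            report[(u, v)] = "t" if v in V2 else "c"
        else:
            report[(u, v)] = "c"
    return report


def compatible_with(edges, report, T):
    """True iff every truthful u (u in T) reports the true type of v."""
    return all(report[(u, v)] == ("t" if v in T else "c") for u, v in edges if u in T)


edges = list(permutations(range(5), 2))  # complete digraph on 5 nodes
V1, V2 = {0, 1}, {2, 3}                  # W = {4}
rho = mirrored_report(edges, V1, V2)
scenario_1 = compatible_with(edges, rho, V1)  # T = V1, B = V2 | W
scenario_2 = compatible_with(edges, rho, V2)  # T = V2, B = V1 | W
print(scenario_1, scenario_2, V1 & V2)
```

Both scenarios are compatible with the same report while their truthful sets are disjoint, which is exactly why no truthful agent can be certified.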
Our main result is that if the graph is a good bounded-degree directed expander, in the sense described below, and we have a majority of truthful agents, then it is possible to identify the truthfulness status of most of the nodes. We recall the following definition of regular spectral expander graphs.
Definition 1.3 Call a graph an $(n, d, \lambda)$-graph if it is $d$-regular, has $n$ vertices, and all its eigenvalues besides the top one are at most $\lambda$ in absolute value.
We use the fact that classical constructions of expanders, like the Ramanujan graphs of [19], [20], are $(n, d, \lambda)$-graphs with $\lambda = 2\sqrt{d-1}$. For the known connection between the expansion and pseudo-random properties of graphs and their eigenvalues, see, e.g., [3], [13] and the references therein.
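The parameter $\lambda$ of Definition 1.3 can be computed numerically for small examples. The sketch below assumes NumPy is available and uses the Petersen graph, a 3-regular graph on 10 vertices with eigenvalues $3$, $1$ and $-2$, purely as a small test case; it is not one of the constructions of [19], [20]. For it $\lambda = 2 \le 2\sqrt{d-1}$.

```python
import numpy as np


def spectral_lambda(adj):
    """Largest absolute value among all eigenvalues except the top one
    (for a connected regular graph the top eigenvalue is the degree d)."""
    eigs = np.linalg.eigvalsh(adj)  # eigenvalues of a symmetric matrix, ascending
    return max(abs(eigs[0]), abs(eigs[-2]))


def petersen_adjacency():
    """Outer 5-cycle, five spokes, inner pentagram."""
    A = np.zeros((10, 10), dtype=int)
    for i in range(5):
        for u, v in [(i, (i + 1) % 5), (i, i + 5), (5 + i, 5 + (i + 2) % 5)]:
            A[u, v] = A[v, u] = 1
    return A


A = petersen_adjacency()
d = int(A.sum(axis=1)[0])
lam = spectral_lambda(A)
print(d, round(float(lam), 6), bool(lam <= 2 * np.sqrt(d - 1)))
```

For large graphs one would use a sparse eigensolver instead of a dense decomposition; the dense call keeps the sketch short.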
First we consider the somewhat simpler case of undirected graphs (the adjacency relation is symmetric).
The main result for the undirected case is the following.
Then, there is an algorithm that, given a feasible report $\rho$, returns subsets $T', B'$ such that for every truth assignment $\tau$ that is compatible with $\rho$, with $T = \tau^{-1}(t)$, $B = \tau^{-1}(c)$ and $|T| > |B|$, condition (1) holds. Moreover, if there exists a truth assignment compatible with $\rho$ with $|T| > (1 + \frac{12}{d})\frac{n}{2}$, then there is a linear-time algorithm that identifies subsets $T'$ and $B'$ satisfying (1).
We remark that:
1. Given any graph $G$ with all degrees bounded by $d$ and with at least $2d+1$ corrupt nodes, there is a set of $\lfloor |B|/(2d+1) \rfloor$ corrupt nodes and a set of $\lfloor |B|/(2d+1) \rfloor$ truthful nodes that cannot be identified, even if, in addition to the report $\rho$, we are given the number of corrupt nodes. Thus, the fraction of nodes we identify is tight up to a constant factor. To see that this is the case, consider the following selection of $b$ corrupt nodes: choose an arbitrary vertex $v_1$ in the graph and set all its neighbors $N(v_1)$ to be corrupt, then choose $v_2 \notin \{v_1\} \cup N(v_1)$ and set all of its neighbors to be corrupt, then choose one of $v_1, v_2$ to be corrupt and the other to be truthful. Continue in a similar fashion, setting $N(v_3), N(v_4)$ to be corrupt and choosing one of $v_3, v_4$ to be corrupt and the other truthful. Continue until the number $b'$ of corrupt nodes left satisfies $b' < 2d + 1$. Set $b'$ of the remaining nodes to be corrupt. Let $\rho(v, u) = c$ whenever $v$ is corrupt. Since every neighbor of $v_{2i-1}$ and of $v_{2i}$ is corrupt, every report made by or about these two vertices is $c$ under either choice, so it is impossible to decide which of the $v_i$ is corrupt and which is truthful.
2. In the case where $|B|/d = o(n)$, the theorem allows one to recover a $1 - o(1)$ fraction of the good nodes.
3. The algorithm in the proof of the theorem is an exponential-time algorithm if we only assume that $|T| > |B|$ (or if we assume that $|T| > (1/2 + \mu)n$ for a very small fixed $\mu = \mu(\lambda, d)$).
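The construction in remark 1 can be sketched as follows. This is an illustration under our own assumptions: the graph is an adjacency dictionary of sets, each new vertex is the first free vertex in iteration order (the remark allows an arbitrary choice), the hypothetical parameter `num_pairs` replaces the stopping rule based on $b'$, and corrupt nodes report $c$ on every neighbor, as in the remark. The final check confirms that swapping the two types within any chosen pair leaves the whole report unchanged.

```python
def adversarial_pairs(adj, num_pairs):
    """Greedy version of the construction in remark 1.

    adj maps each vertex to the set of its neighbours (undirected graph).
    Each chosen vertex lies outside all previously chosen vertices and their
    neighbourhoods; all of its neighbours are declared corrupt."""
    used, corrupt, pairs = set(), set(), []
    for _ in range(num_pairs):
        pair = []
        for _ in range(2):
            free = [x for x in adj if x not in used]
            if not free:
                return corrupt, pairs
            v = free[0]
            used |= {v} | adj[v]
            corrupt |= adj[v]
            pair.append(v)
        pairs.append(tuple(pair))
    return corrupt, pairs


def report(adj, bad):
    """Corrupt nodes say 'c' about everybody; truthful nodes tell the truth."""
    return {(u, v): "c" if (u in bad or v in bad) else "t" for u in adj for v in adj[u]}


n = 30
cycle = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
corrupt, pairs = adversarial_pairs(cycle, num_pairs=2)
base = corrupt | {second for _, second in pairs}  # first member of each pair truthful
indistinguishable = all(
    report(cycle, base) == report(cycle, (base - {second}) | {first})
    for first, second in pairs
)
print(pairs, sorted(corrupt), indistinguishable)
```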
The fact that the detection algorithm is not efficient when we only assume that $T$ is just a little bigger than $B$ is not a coincidence. Indeed, the algorithm described in the proof of the theorem, presented in the next sections, provides a set $T$ of more than $n/2$ truthful agents that is consistent with the reports obtained, whenever such a set exists. We show that the problem of producing such a set when it exists is NP-hard, even when restricted to bounded-degree expanders (and even if we ensure that there is such a set of size at least $n/2 + \eta n$).
Theorem 1.5 There exist $c > 0$, $\eta > 0$ such that the following promise problem is NP-hard. The input is a graph $G = (V, E)$ with $|V| = n$, which is an $(n, d, c\sqrt{d})$-graph, along with a report $\rho$. The promise is that either
• $\rho$ is compatible with some $\tau$ satisfying $|T| = |\tau^{-1}(t)| \ge n/2 + \eta n$, or
• any $\tau$ which is compatible with $\rho$ satisfies $|T| = |\tau^{-1}(t)| \le n/2 - \eta n$.
The objective is to distinguish between the two options above.
The proof is presented in subsection 2.2.
We also establish in subsection 2.3 the following simple statement, which shows that at least some (weak) form of expansion is needed for solving the corruption detection problem.
Proposition 1.6 Let $G = (V, E)$ be a graph on $n$ vertices such that it is possible to remove at most $\epsilon n$ vertices of $G$ and get a graph in which each connected component is of size at most $\epsilon n$. Then, given a report $\rho$ compatible with an assignment $\tau$, with $T = \tau^{-1}(t)$, $B = \tau^{-1}(c)$ and $|T| \ge (1 - 2\epsilon)n$, it is impossible to identify even a single member of $T$ from the reports of all vertices. In particular, this is the case for planar graphs or graphs with a fixed excluded minor even if $\epsilon = \Theta(n^{-1/3})$.
Note that there is still a significant gap between the expansion properties that suffice for solving the detection problem, described in Theorem 1.4, and the conditions in the last proposition that are necessary for such a solution. It would be interesting to obtain tighter relations between expansion and corruption detection. This is further discussed in section 4.

Results for directed graphs
In this subsection we consider directed graphs (digraphs). This is motivated by the fact that in various auditing situations it is unnatural to allow u to inspect v whenever v inspects u. In fact, it may even be desirable not to allow any short cycles in the directed inspection graph.
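For a set of vertices $A$ in a graph $G = (V, E)$, define $N(A) = \{v \in V : \{a, v\} \in E \text{ for some } a \in A\}$. For a digraph $G = (V, E)$ and $U \subseteq V$, let $N^+(U) = \{v : (u, v) \in E \text{ for some } u \in U\}$ and $N^-(U) = \{v : (v, u) \in E \text{ for some } u \in U\}$. Note that $N(U)$, $N^+(U)$ and $N^-(U)$ may contain elements from $U$. We first present an explicit construction of regular directed expanders which expand at three different scales, as in the undirected case.
• If $A, B \subset [n]$ with $|A| \ge c_2 n/d$ and $|B| \ge n/4$, then there is a directed edge from $A$ to $B$ and a directed edge from $B$ to $A$.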
The proof, presented in Section 3, is based on "packing" three different group-based expanders to obtain the expansion at the different scales. Given the directed expanders constructed in Proposition 1.7, we prove the following directed analogue of Theorem 1.4. There is an algorithm that, given a feasible report $\rho$, returns subsets $T', B'$ such that for every truth assignment $\tau$ that is compatible with $\rho$, with $T = \tau^{-1}(t)$, $B = \tau^{-1}(c)$ and $|T| > |B|$, condition (3) holds. Moreover, if there exists a truth assignment compatible with $\rho$ with $|T| > (1 + \frac{c_4}{d})\frac{n}{2}$, then there is a linear-time algorithm that identifies a subset $T' \subset T$ and a subset $B' \subset B$ satisfying (3). If we only assume that $|T| > |B|$, then the detection algorithm is exponential.

Relation to Distributed Computing
The model presented in the paper requires a central agency that obtains the report $\rho$ from all nodes. This is in contrast to models in distributed computing and Byzantine agreement, where nodes only communicate with neighboring nodes. We note, however, that the following holds.
Proposition 1.9 In the setting of Theorem 1.4, when assuming each node has a unique identity that is known to all of its neighbors, there is a distributed algorithm on the graph $G$ in which all nodes in $T'$ output the sets $T'$ and $B'$ in (4).

Related Work
The vast literature on corruption detection in computer science, and in particular on the diagnosable system problem and the PMC model introduced in [23], deals either with the problem of identifying all corrupt nodes, or with that of identifying a single corrupt node. As observed in [23], a necessary condition for the identification of all corrupt nodes in a network with t corrupt nodes is that the minimal indegree in the network is at least t. Therefore, if the number of corrupt nodes is linear in the total number of vertices, all indegrees have to be linear, and the total number of edges has to be quadratic.
The main contribution of the present paper is a proof that the number of required edges may be much smaller when relaxing the requirement of identifying all corrupt nodes and replacing it by the requirement of identifying the truth status of most of the nodes. By relaxing the requirement as above, we are able to study bounded-degree graphs, in particular $d$-regular graphs. Our main new result is that a linear number of edges ensures the recovery of the truth status of a large fraction of the nodes, provided the graph is a sufficiently strong expander. It was shown already in [23] that a linear number of edges suffices to ensure the detection of a single corrupt vertex. We show that such a small number of edges suffices to determine the types of a $1 - O(1/d)$ fraction of the vertices, even when the number of truthful vertices exceeds that of corrupt ones by only 1.
In the context of Byzantine agreement, it was discovered already in the 1980s that expanders allow "almost everywhere agreement". This was first established by Dwork et al. in [9] and further developed in subsequent work; see, in particular, [11] and its references. In [9] it is shown that for random regular graphs one can achieve a relaxed notion of Byzantine agreement, in which a large fraction of the non-faulty nodes agree on a value. As in our results, in the bounded-degree case this fraction is bounded away from 1 unless the number of corrupt nodes is sublinear. Moreover, the results of [9] require that the number of faulty nodes be a small fraction (much smaller than 1/2) of the total number of nodes.
It is therefore not very surprising that graph expansion is relevant to corruption detection. It is, however, interesting to note that in sufficiently strong expanders it is possible to identify most of the truthful and most of the corrupt agents even if the number of truthful agents exceeds the number of corrupt ones by only 1.

Techniques
The proof of Theorem 1.4 relies crucially on the fact that $(n, d, O(\sqrt{d}))$-graphs expand well at different scales.
• Every set $A$ of size $\Omega(n/d)$ and every set $B$ of size $n/4$ have at least one edge between them.
For the directed case, we prove the existence of directed expanders of high girth with analogous properties in Proposition 1.7.
We observe that a certain weak expansion property, namely, the nonexistence of small separators, is necessary for corruption detection. Combining this observation with the planar separator theorem of Lipton and Tarjan and its extensions we conclude that planar graphs and graphs with a fixed excluded minor are not good for corruption detection.
Finally we discuss the algorithmic aspects of our problem using results about hardness of approximation.

Proofs
In this section we present the proofs of all results besides the ones for directed graphs. These appear in the next section.

Undirected graphs
We first state the following generalization of Theorem 1.4. Theorem 1.4 follows when considering graphs with λ ≤ 2 .
Then, given a report $\rho$ which is compatible with a truth assignment $\tau$ with $T = \tau^{-1}(t)$, $B = \tau^{-1}(c)$ and $|T| > |B|$, we can identify a subset $T' \subset T$ and a subset $B' \subset B$ so that (4) holds. Moreover, if $|T| > (1/2 + 3\lambda^2/d^2)n$, then there is a linear-time algorithm that identifies a subset $T' \subset T$ and a subset $B' \subset B$ satisfying (4).
For a positive $\delta < 1/8$, call a graph $G = (V, E)$ on a set of $n$ vertices a $\delta$-good expander if any set $U$ of at most $2\delta n$ vertices has more than $|U|$ neighbors outside $U$, and there is an edge between any pair of sets of vertices provided one of them is of size at least $\delta n$ and the other is of size at least $n/4$. We recall how standard results about expanders imply that $(n, d, O(\sqrt{d}))$-graphs are $\delta$-good for $\delta = \Theta(1/d)$.
Proof: Let $U$ be of size at least $\delta n$ and suppose that there are no edges between $U$ and a set $B$ of size $n/4$. Then if $u = |U|/n$, by Lemma 2.2 it follows that which is a contradiction. Similarly, let $U$ be a set of size $un \le 2\delta n$ and suppose that $|N(U) \setminus U| < un$. Then which is a contradiction. ✷
The other property of expanders we will need is that small sets expand by a factor of $\Omega(d)$. In particular: Proof: The proof is similar. Let $|A| = an$. Let $B = V \setminus (A \cup N(A))$ and put $|B| = (1 - ca)n \ge n/4$. Since there are no edges between $A$ and $B$, it follows that and so Let $\tau$ be a truth assignment consistent with $\rho$ and let $T = \tau^{-1}(t)$ and . . , $V_s$ be the sets of vertices of the connected components of $H$. Proof: Assume this is false and the largest connected component of $H'$ is on a set of vertices $U_1$ of size smaller than $|T| - \delta n$. Since the total number of vertices of $H'$ is $|T| > n/2$, it is easy to check that one can split the connected components of $H'$ into two disjoint sets, each of total size at least $\delta n$. However, the bigger of the two is of size bigger than $n/4$, and hence, since $G$ is a $\delta$-good expander, there is an edge of $G$ between the two groups. This is impossible, as it means that there is an edge of $G$ between two distinct connected components of $H'$. ✷
The analysis so far allows us to prove the easy part of the theorem.
, establishing the required inequality (with room to spare).
Thus, if this is the case, the set $T'$ is found by the simple, linear-time algorithm that computes the connected components of $H$. Furthermore, the set $B'$ can also be found by computing the vertex boundary of $T'$ in $G$, which is easily computed in linear time as well. ✷ It remains to show that even if we only assume that $|T| > n/2$, we can still identify correctly most of the truthful and corrupt vertices. We proceed with the proof of this stronger statement.
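The linear-time procedure for the easy case can be sketched as follows. The definition of $H$ is not reproduced above, so we assume here that $H$ keeps exactly those edges $\{u, v\}$ of $G$ on which $u$ and $v$ report each other as truthful; under this assumption every component of $H$ is entirely truthful or entirely corrupt, and a component with more than $n/2$ vertices must be truthful whenever $|B| < n/2$. Function names and the data layout are illustrative.

```python
from collections import deque


def mutual_trust_graph(adj, report):
    """Assumed definition of H: keep {u, v} iff u and v report each other as 't'."""
    return {u: {v for v in adj[u] if report[(u, v)] == "t" and report[(v, u)] == "t"}
            for u in adj}


def components(graph):
    """Connected components by breadth-first search, O(|V| + |E|)."""
    seen, comps = set(), []
    for s in graph:
        if s in seen:
            continue
        seen.add(s)
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def detect_easy_case(adj, report):
    """Return (T', B') if H has a component with more than n/2 vertices, else None."""
    n = len(adj)
    giant = max(components(mutual_trust_graph(adj, report)), key=len)
    if 2 * len(giant) <= n:
        return None  # the linear-time case does not apply
    boundary = {v for u in giant for v in adj[u]} - giant  # neighbours of T' in G
    return giant, boundary


# Toy example: complete graph K_5 with T = {0, 1, 2}; corrupt nodes vouch for each other.
adj = {u: set(range(5)) - {u} for u in range(5)}
T = {0, 1, 2}
report = {(u, v): ("t" if v in T else "c") if u in T else ("c" if v in T else "t")
          for u in adj for v in adj[u]}
print(detect_easy_case(adj, report))
```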
By Claim 2.6 and Claim 2.7 it follows that $H$ contains at least one connected component of size at least $(1/2 - \delta)n \ge 3n/8$. If $H$ contains only one such component, then this component must consist of truthful agents, and we can identify all of them. Having these truthful vertices, we also know the types of all their neighbors. By the assumption on $G$, this gives the types of all vertices but less than $\delta n$, thus establishing (4).
Otherwise, there is another connected component of size at least $(1/2 - \delta)n$, and as there is no room for more than two such components, there are exactly two of them, say $V_1$ and $V_2$. Note that by the expansion properties of $G$ there are edges of $G$ between $V_1$ and $V_2$, and hence it is impossible that both of them are truthful components. As one of them must be truthful, it follows that exactly one of $V_1$ and $V_2$ is a truthful component and the other is corrupt. We next show that we can identify the types of both components.
Construct an auxiliary weighted graph $S$ on the set of vertices $1, 2, \ldots, s$ representing the connected components $V_1, V_2, \ldots, V_s$ as follows. The weight $w_i$ of $i$ is defined by $w_i = |V_i|/|V|$. Two vertices $i$ and $j$ are connected iff there is at least one edge of $G$ that connects a vertex in $V_i$ with one in $V_j$. Call an independent set in the graph $S$ large if its total weight is bigger than 1/2. Note that by the discussion above $T$ must be a union of the form $T = \bigcup_{i \in I} V_i$, where $I$ is a large independent set in the graph $S$. In order to complete the argument we prove the following.

Claim 2.9
Either there is no large independent set in S containing 1, or there is no large independent set in S containing 2.
Proof: Assume this is false, and let $I_1$ be a large independent set in $S$ containing 1, and $I_2$ a large independent set in $S$ containing 2. To get a contradiction we show that for $w(I_1) = \sum_{i \in I_1} w_i$ and $w(I_2) = \sum_{i \in I_2} w_i$ we have $w(I_1) + w(I_2) \le 1$ (and hence it is impossible that each of them has total weight bigger than a half).
To prove the above, note first that the two vertices 1 and 2 of $S$ are connected (as each corresponds to a set of more than $(1/2 - \delta)n$ vertices of $G$, hence there are edges of $G$ connecting $V_1$ and $V_2$). Therefore $I_1$ must contain 1 but not 2, and $I_2$ contains 2 but not 1.
If there are any vertices $i$ of $S$ connected in $S$ both to 1 and to 2, then these vertices belong to neither $I_1$ nor $I_2$, as these are independent sets. Similarly, if a vertex $i$ is connected to 1 but not to 2, then it can belong to $I_2$ but not to $I_1$, and the symmetric statement holds for vertices connected to 2 but not to 1. So far we have discussed only vertices that can belong to at most one of the two independent sets $I_1$ and $I_2$. If this is the case for all the vertices of $S$, then each of them contributes its weight to only one of the two sets, and their total weight is thus at most 1, implying that it cannot be that the weight of each of them is bigger than 1/2, and completing the proof of the claim. It thus remains to deal with the vertices of $S$ that belong to both $I_1$ and $I_2$. Let $J \subset \{3, 4, \ldots, s\}$ be the set of all these vertices. Note, first, that the total weight of the vertices in $J$ is at most $2\delta$, as the total weight of 1 and 2 is at least $2(1/2 - \delta) = 1 - 2\delta$. Note also that by the discussion above each $j \in J$ is not a neighbor of 1 or of 2. By the assumption about the expander $G$, the total weight of the vertices that are neighbors of vertices in $J$ and do not belong to $J$ is bigger than the total weight of the vertices in $J$. Indeed, this is the case as the number of neighbors in $G$ of the set $\bigcup_{j \in J} V_j$ that do not lie in this set is bigger than the size of the set. We thus conclude that if $J' = N_S(J) \setminus J$ denotes the set of neighbors of $J$ that do not belong to $J$, then the total weight of the vertices in $J'$ exceeds the total weight of the vertices in $J$, and the vertices in $J'$ belong to neither $I_1$ nor $I_2$. We have thus proved that the sum of weights of the two independent sets $I_1$ and $I_2$ satisfies
$$w(I_1) + w(I_2) \le \big(1 - w(J) - w(J')\big) + 2w(J) = 1 + w(J) - w(J') < 1,$$
contradicting the fact that both $I_1$ and $I_2$ are large. This completes the proof of the claim.

✷
By Claim 2.9 we conclude that one can identify the types of the components $V_1$ and $V_2$. This means that we can identify at least $(1/2 - \delta)n$ truthful vertices with no error. Recall that this is also the case when $H$ has only one connected component of size at least $(1/2 - \delta)n$. Having these truthful vertices, we also know the types of all their neighbors. By the assumption on $G$, this gives the types of all vertices but less than $\delta n$, completing the proof of (4).
As noted above, the algorithm described in the proof above is a linear-time algorithm provided $|T| > (1/2 + \delta)n$. However, if we only assume that $|T| > |B|$, the proof provides only a non-efficient algorithm for deciding the types of the components $V_1$ and $V_2$. Indeed, we have to compute the maximum weight of an independent set containing 1 in the weighted graph $S$, and the maximum weight of an independent set containing 2. By the proof above, exactly one of these maxima is larger than 1/2, providing the required types.
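The exponential step can be sketched by brute force over independent sets of $S$. In this illustration (function names and the toy instance are our own), the weights are the integer sizes $|V_i|$ rather than $|V_i|/|V|$, so "large" means $2w(I) > n$; the running time is exponential in the number $s$ of components, in line with Theorem 1.5. The toy graph is not claimed to satisfy the expansion hypotheses; its components are passed in explicitly, with $T = \{0, 1, 2, 6\}$ and the corrupt nodes $3, 4, 5$ vouching for each other.

```python
from itertools import combinations


def component_graph(adj, comps):
    """Graph S: vertex i per component V_i; i ~ j iff some edge of G joins V_i and V_j."""
    owner = {v: i for i, comp in enumerate(comps) for v in comp}
    S = {i: set() for i in range(len(comps))}
    for u in adj:
        for v in adj[u]:
            if owner[u] != owner[v]:
                S[owner[u]].add(owner[v])
    return S


def best_independent_weight(S, weight, root):
    """Exponential brute force: maximum weight of an independent set of S containing root."""
    candidates = [i for i in S if i != root and i not in S[root]]
    best = weight[root]
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            chosen = (root,) + subset
            if all(j not in S[i] for i in chosen for j in chosen):
                best = max(best, sum(weight[i] for i in chosen))
    return best


def classify_giants(adj, comps, i1, i2):
    """Return (truthful index, corrupt index) for the two giant components i1, i2."""
    n = len(adj)
    S = component_graph(adj, comps)
    weight = [len(c) for c in comps]  # integer weights |V_i|; "large" means 2w > n
    large1 = 2 * best_independent_weight(S, weight, i1) > n
    large2 = 2 * best_independent_weight(S, weight, i2) > n
    if large1 == large2:
        raise ValueError("expected exactly one large independent set (Claim 2.9)")
    return (i1, i2) if large1 else (i2, i1)


adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4, 5, 6},
       4: {3, 5, 6}, 5: {3, 4}, 6: {3, 4}}
comps = [{0, 1, 2}, {3, 4, 5}, {6}]
print(classify_giants(adj, comps, 0, 1))
```

Only the component $\{0, 1, 2\}$ extends to a large independent set (together with $\{6\}$), so it is classified as truthful.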
✷ Proposition 1.9 follows from the fact that the proof deals with the case of a large connected component of truthful nodes, which can communicate with each other.
Proof: For simplicity, consider a synchronous protocol. In each round, each node transmits to all of its neighbors the identities of all nodes it recognizes as truthful and as corrupt. A truthful node adds to its record the identities it receives from truthful neighbors (which it recognizes, since its own reports on its neighbors are correct). Recall that there is exactly one connected component of truthful vertices of size at least $(1/2 - \delta)n$, that $T'$ is the set of vertices in this component, and that $B' = N(T') \setminus T'$. It is clear that all elements of $T'$ that follow the protocol will recognize all elements of $T'$ as truthful and all elements of $B'$ as corrupt. ✷
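The protocol can be simulated in a few lines. The sketch below is our own rendering under stated assumptions: rounds are synchronous, each node starts from its own reports, a truthful node merges only records sent by neighbors it itself reports as truthful, and corrupt nodes, whose behaviour is arbitrary, are modelled by the hypothetical choice of vouching for everybody. The simulator is told the truthful set only to decide who follows the protocol.

```python
def simulate_protocol(adj, report, truthful):
    """Synchronous simulation; returns the final record of every truthful node.

    A record maps node -> 't'/'c'. Truthful nodes start from their own reports and
    merge only records sent by neighbours they themselves report as truthful."""
    record = {u: {v: report[(u, v)] for v in adj[u]} for u in adj}
    forged = {v: "t" for v in adj}  # assumed corrupt behaviour: vouch for everybody
    for _ in range(len(adj)):  # n rounds exceed any component's diameter
        outgoing = {u: dict(record[u]) if u in truthful else forged for u in adj}
        changed = False
        for u in truthful:
            for v in adj[u]:
                if report[(u, v)] != "t":
                    continue  # u ignores neighbours it knows to be corrupt
                for w, typ in outgoing[v].items():
                    if w not in record[u]:
                        record[u][w] = typ
                        changed = True
        if not changed:
            break
    return {u: record[u] for u in truthful}


adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4, 5, 6},
       4: {3, 5, 6}, 5: {3, 4}, 6: {3, 4}}
T = {0, 1, 2, 6}
report = {(u, v): ("t" if v in T else "c") if u in T else ("c" if v in T else "t")
          for u in adj for v in adj[u]}
records = simulate_protocol(adj, report, T)
print(records[0])
```

In this toy graph the truthful component $\{0, 1, 2\}$ ends up knowing $T' = \{0, 1, 2\}$ and $B' = \{3\}$, while the isolated truthful node 6 learns only its own neighbors.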

Hardness
In this subsection we prove Theorem 1.5 which explains the non-efficiency of the algorithm in the proof of Theorem 1.4.
Proof of Theorem 1.5: The proof is based on the following result [7]: there exist constants $b < a < 1/2$ such that deciding whether a graph $H$ on $m$ vertices, all of whose degrees are bounded by 4, has a maximum independent set of size at least $(a + b)m$ or at most $(a - b)m$ is NP-hard. It is easy to see that in fact one can assume that $H$ is 4-regular. Let $G'$ be an $(n, d - 4, c_1\sqrt{d})$-graph with vertex set $V$, where $c\sqrt{d} \ge 8$. Split the vertices into 3 disjoint sets $V_1, V_2, V_3$ of size $\Omega(n)$, where $V_3$ is an independent set in $G'$ of size $m$, all its neighbors are in $V_2$, $|V_1| = n/2 - am$ and $|V_2| = n/2 - m + am$, where $m = \Omega(n)$. Write $\eta$ for the constant satisfying $bm = \eta n$. Add on $V_3$ a bounded-degree graph $H$ as above, in which it is hard to decide whether the maximum independent set is of size at least $(a + b)m$ or at most $(a - b)m$. That is, identify the set of vertices of $H$ with $V_3$ and add edges between the vertices of $V_3$ as in $H$. Add a 4-regular graph on $V_1$ and another one on $V_2$. Call the resulting graph $G$ and note that it is an (n, d, The reports of the vertices are as follows. Each vertex in $V_1$ reports truthful on each neighbor it has in $V_1$, and corrupt on any other neighbor. Similarly, each vertex of $V_2$ reports truthful on any neighbor it has in $V_2$ and corrupt on any other neighbor, and each vertex in $V_3$ reports corrupt on all its neighbors. Note that with these reports the connected components of the graph $H$ in the proof of Theorem 1.4 are $V_1$, $V_2$ and every singleton in $V_3$. It is easy to check that here, if $H$ has an independent set $I$ of size at least $(a+b)m$, then $G$ has a set $T$ of truthful vertices of size at least $n/2 + bm$, namely, the set $I \cup V_1$, which is consistent with all reports. If $H$ has no independent set of size bigger than $(a - b)m$, then $G$ does not admit any set $T$ of truthful vertices of size bigger than $n/2 - bm$ consistent with all reports. This completes the proof. ✷
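For completeness, the size computations behind the last two sentences, for truthful sets of the form $I \cup V_1$ with $I \subseteq V_3$ independent in $H$, read as follows (using only $|V_1| = n/2 - am$ and $bm = \eta n$ from the proof):
$$|I| \ge (a+b)m \implies |I \cup V_1| \ge (a+b)m + \frac{n}{2} - am = \frac{n}{2} + bm = \frac{n}{2} + \eta n,$$
$$|I| \le (a-b)m \implies |I \cup V_1| \le (a-b)m + \frac{n}{2} - am = \frac{n}{2} - bm = \frac{n}{2} - \eta n.$$
Also $|V_1| + |V_2| + |V_3| = (n/2 - am) + (n/2 - m + am) + m = n$, so the three parts indeed partition $V$.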

Graphs with small separators
In this subsection we describe the simple proof of Proposition 1.6.
Proof of Proposition 1.6: Let $B'$ be a set of at most $\epsilon n$ vertices of $G$ whose removal splits $G$ into connected components with vertex classes $V_1, V_2, \ldots, V_s$, each of size at most $\epsilon n$. Consider the following $s$ possible scenarios $R_i$, for $1 \le i \le s$. It is not difficult to check that in all these $s$ scenarios, all vertices make exactly the same reports. On the other hand, there is no vertex of $G$ that is truthful in all these scenarios, hence it is impossible to identify a truthful vertex with no error. Since the number of corrupt vertices in all scenarios is at most $2\epsilon n$, the first assertion of the proposition follows. The claim regarding planar graphs and graphs with excluded minors follows from the results in [18], [5]. ✷

Directed Graphs

Construction of Directed Expanders
Here we provide the proofs for the case of directed graphs. We start with the proof of existence of explicit directed expanders that expand in three scales. We will use the following lemma for undirected graphs from [4].
• for all $v \in A$, it holds that $|\{w \in X : (w, v) \in E\}| \ge \gamma(d + 1)$. Then:
We will also use the following variant of Lemma 2.3, whose proof is similar.
Lemma 3.2 Any $(n, d, \lambda)$-graph in which $16\lambda^2/d^2 \le \delta$ satisfies that any set of size at least $\delta n/2$ and any set of size $n/8$ have at least one edge between them.
Proof: Let $U$ be of size at least $\delta n/2$ and suppose that there are no edges between $U$ and a set $B$ of size $n/8$. Then if $u = |U|/n$, by Corollary 2.2 it follows that which is a contradiction.
✷ Proof of Proposition 1.7: Let $G' = ([n], E')$ be a $d$-regular undirected non-bipartite Ramanujan Cayley graph as constructed in [19] or [20]. This is a Cayley graph of a finite group $\Gamma$ of size $n$, with respect to a set $S'$ of $d$ generators $S' = \{a_1, a_1^{-1}, a_2, a_2^{-1}, \ldots, a_{d/2}, a_{d/2}^{-1}\}$. This graph is known to have girth at least $\frac{2}{3}\log_d n$, which is the same as saying that there is no nontrivial word in the generators of length less than $\frac{2}{3}\log_d n$ that equals the identity.
Let $T = \{a_4, a_4^{-1}, \ldots, a_{d/2}, a_{d/2}^{-1}\}$. Note that the Cayley graph of $\Gamma$ with respect to $T$ is a $(d-6)$-regular graph whose second eigenvalue is at most $2\sqrt{d} + 6$. Thus if $d \ge 36$, this is an $(n, d-6, 3\sqrt{d})$-graph. Now for $1 \le i \le 3$, let $T_i = a_i^{-1} T a_i$. Then clearly $G_i$, the Cayley graph of $\Gamma$ with respect to $T_i$, is also an $(n, d-6, 3\sqrt{d})$-graph. Moreover, if $S = T_1 \cup T_2 \cup T_3$, then the Cayley graph $H$ of $\Gamma$ with respect to $S$ has girth at least $\frac{2}{9}\cdot\frac{\log n}{\log d}$, since a nontrivial word in $S$ of length $k$ corresponds to a nontrivial word of length at most $3k$ in $S'$. Note also that $H$ is $3(d-6)$-regular.
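The two numerical facts used in this paragraph can be spelled out as follows; this is a routine check using only the quantities defined above.
$$2\sqrt{d} + 6 \le 3\sqrt{d} \iff \sqrt{d} \ge 6 \iff d \ge 36,$$
$$k < \frac{2}{9}\cdot\frac{\log n}{\log d} \implies 3k < \frac{2}{3}\log_d n,$$
so a nontrivial word of length $k$ in $S$ yields a nontrivial word of length less than $\frac{2}{3}\log_d n$ in $S'$, which cannot equal the identity.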
The desired graph, $G = ([n], E)$, will be obtained by assigning directions to the edges $E$ of $H$ as follows:
• If $\{a, b\}$ is an edge of $G_1$, then orient it from $a$ to $b$ if $a > b$ and from $b$ to $a$ if $b > a$.
• If $\{a, b\}$ is an edge of $G_2$, then orient it from $b$ to $a$ if $a > b$ and from $a$ to $b$ if $b > a$.
• In the graph $G_3$ all the degrees are even. Pick an orientation of the edges of $G_3$ by picking a directed Eulerian cycle and orienting the edges according to it. In particular, note that every in-degree and out-degree in $G_3$ is exactly $d/2 - 3$.
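The orientation rules can be sketched as follows. The Cayley-graph construction itself is not reproduced: the hypothetical function `orient` takes the edge lists of $G_1$, $G_2$, $G_3$ as input, with vertices labelled by integers standing for the linear order on $[n]$. For $G_3$ we orient each edge in the direction in which an iterative Hierholzer-style walk traverses it; since every vertex has even degree, the traversed edges split into closed trails, so every vertex gets equal in- and out-degree, even if $G_3$ happens to be disconnected.

```python
from collections import defaultdict


def euler_orientation(edges):
    """Orient an undirected graph whose degrees are all even so that every vertex
    has in-degree equal to out-degree: each edge gets the direction in which an
    iterative Hierholzer-style walk traverses it (the walk splits into closed trails)."""
    incident = defaultdict(list)
    for idx, (a, b) in enumerate(edges):
        incident[a].append(idx)
        incident[b].append(idx)
    used = [False] * len(edges)
    arcs = []
    for start in list(incident):
        stack = [start]
        while stack:
            x = stack[-1]
            while incident[x] and used[incident[x][-1]]:
                incident[x].pop()  # discard edges already traversed from the other end
            if not incident[x]:
                stack.pop()  # x is exhausted: backtrack
                continue
            idx = incident[x].pop()
            used[idx] = True
            a, b = edges[idx]
            y = b if a == x else a
            arcs.append((x, y))
            stack.append(y)
    return arcs


def orient(g1_edges, g2_edges, g3_edges):
    """Apply the three orientation rules to edge lists of G_1, G_2, G_3 on [n]."""
    arcs = [(max(a, b), min(a, b)) for a, b in g1_edges]   # G_1: larger -> smaller
    arcs += [(min(a, b), max(a, b)) for a, b in g2_edges]  # G_2: smaller -> larger
    arcs += euler_orientation(g3_edges)                    # G_3: balanced orientation
    return arcs
```

For instance, orienting the 4-regular complete graph $K_5$ this way gives every vertex in-degree and out-degree 2.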
We now verify the expansion at the three different scales. First, we apply Lemma 3.1 to the graph $G_3$ and sets $A$ of size $|A| \le n/(9d)$, and $\gamma = 1/3$ and obtain that
Proof: If $u \in T$ and $v$ is an out-neighbor of $u$ in $H$, then $v \in T$ (as $u$ reports so). If $v \in B$ and $u$ is an in-neighbor of $v$ in $H$, then $u \in B$ (as $u$ reports that $v \in T$).
✷ Call an SCC of H truthful if it is a subset of T , else it is a subset of B and we call it corrupt.
Let $H'$ be the induced subgraph of $G$ on the set $T$ of all truthful vertices. Proof: Consider the component graph of $H'$: this is the directed graph $F$ whose vertices are all the SCCs of $H'$, where there is a directed edge from $C$ to $C'$ iff there is some edge of $H'$ from some vertex of $C$ to some vertex of $C'$. It is easy and well known that this graph is a directed acyclic graph, and hence it has a topological order, that is, a numbering $C_1, C_2, \ldots, C_r$ of the components so that all edges between different components are of the form $(C_i, C_j)$ with $i < j$. Order the vertices of $H'$ linearly according to this topological order, where the vertices of $C_1$ come first (in an arbitrary order), those of $C_2$ afterwards, etc. Let $u_i$ be the vertex in place $i$ according to this order ($1 \le i \le |T|$). If the vertices $u_{\delta n}$ and $u_{|T| - \delta n + 1}$ belong to the same SCC, then this component is of size at least $|T| - 2\delta n$ and we are done. Otherwise, the SCC containing $u_{|T|/2}$ differs either from that containing $u_{\delta n}$ or from that containing $u_{|T| - \delta n + 1}$. In the first case, the set $A$ of all SCCs up to that containing $u_{\delta n}$ is of size at least $\delta n$, and the set $B$ of all SCCs starting from that containing $u_{|T|/2}$ is of size at least $|T|/2 \ge n/4$. In addition, there is no edge directed from $B$ to $A$, contradicting the property of $G$. The second case leads to a symmetric contradiction, establishing the claim. ✷ Note that the above shows that if $|T| > (1/2 + 2\delta)n$, then $H'$, and hence also $H$, must contain an SCC $C$ of size bigger than $n/2$, which must be truthful. Let $T'$ denote the set of vertices reachable from $C$ in $H$ and note that $T' \subset T$ and moreover By the expansion properties of $G$ it follows that $|R| \le c_2 n/d$ and that $|R| \le$ We next show that even if we only assume that $|T| > n/2$ we can still identify correctly most of the truthful vertices.
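A minimal sketch of this easy directed case follows (names ours). Here we assume that $H$ is the digraph of the edges $(u, v)$ of $G$ with $\rho(u, v) = t$, as suggested by the preceding claim; SCCs are computed with Kosaraju's algorithm, an SCC with more than $n/2$ vertices is taken as $C$, $T'$ is everything reachable from $C$ in $H$, and $B'$ collects the out-neighbors that vertices of $T'$ report as corrupt.

```python
def strongly_connected_components(nodes, succ):
    """Kosaraju's algorithm with iterative DFS; succ maps u -> list of out-neighbours."""
    order, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(succ[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(succ[v])))
                    break
            else:
                stack.pop()
                order.append(u)  # u is finished
    pred = {u: [] for u in nodes}
    for u in nodes:
        for v in succ[u]:
            pred[v].append(u)
    comps, assigned = [], set()
    for s in reversed(order):  # decreasing finishing time, on the reversed graph
        if s in assigned:
            continue
        assigned.add(s)
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in pred[u]:
                if v not in assigned:
                    assigned.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def detect_directed_easy_case(adj_out, report):
    """Return (T', B') if H has an SCC with more than n/2 vertices, else None."""
    n = len(adj_out)
    H = {u: [v for v in adj_out[u] if report[(u, v)] == "t"] for u in adj_out}
    giant = max(strongly_connected_components(list(adj_out), H), key=len)
    if 2 * len(giant) <= n:
        return None
    reach, stack = set(giant), list(giant)  # T': reachable from the giant SCC in H
    while stack:
        u = stack.pop()
        for v in H[u]:
            if v not in reach:
                reach.add(v)
                stack.append(v)
    accused = {v for u in reach for v in adj_out[u] if report[(u, v)] == "c"}
    return reach, accused


adj_out = {0: [1, 3], 1: [2], 2: [0, 4], 3: [4, 0], 4: [3]}
T = {0, 1, 2}
report = {(u, v): ("t" if v in T else "c") if u in T else ("c" if v in T else "t")
          for u in adj_out for v in adj_out[u]}
print(detect_directed_easy_case(adj_out, report))
```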
By the last two claims it follows that $H$ contains at least one SCC of size at least $(1/2 - 2\delta)n \ge 3n/8$. If $H$ contains only one such component, then this component must consist of truthful agents, and we can identify all of them (and hence also the types of all their out-neighbors). Otherwise, there is another SCC of size at least $(1/2 - 2\delta)n$, and as there is no room for more than two such components, there are exactly two of them, say $V_1$ and $V_2$. Note that by the properties of $G$ there are edges of $G$ from $V_1$ to $V_2$ and from $V_2$ to $V_1$, and hence it is impossible that both of them are truthful components. As one of them must be truthful, it follows that exactly one of them is truthful and the other is corrupt. We next show that we can identify the types of both components.
Recall that we have the SCCs of $H$, and the set $T$ of all truthful vertices must be a union of a subset of these SCCs. In addition, this set must be of size bigger than $n/2$ and must be consistent with all reports along every edge, in the sense that for any edge $(u, v)$ with $u \in T$, the report of $u$ on $v$ must agree with the actual type of $v$.

Claim 3.6 Given the strongly connected components $V_1, V_2, \ldots, V_s$ of $H$ and the reports along each edge, either there is no union $I_1$ of SCCs including $V_1$ whose size exceeds $n/2$ such that $T = I_1$, $B = V \setminus I_1$ is consistent with all reports along the edges, or there is no union $I_2$ of SCCs including $V_2$ whose size exceeds $n/2$ such that $T = I_2$, $B = V \setminus I_2$ is consistent with all reports along the edges.
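The consistency requirement in Claim 3.6 can be phrased as a simple check. The following Python sketch assumes a hypothetical encoding in which `reports[(u, v)]` is `True` when $u$ reports $v$ as truthful and `False` when it reports $v$ as corrupt, and `comps` is the list of SCCs of $H$; only reports made by vertices of the candidate set $T$ are constrained, since corrupt vertices may report arbitrarily.

```python
def is_consistent_union(n, reports, comps, chosen):
    """Check whether T = union of comps[i] for i in `chosen` (declared truthful,
    with all other vertices declared corrupt) has |T| > n/2 and agrees with
    every report made by a vertex of T."""
    T = set()
    for i in chosen:
        T.update(comps[i])
    if 2 * len(T) <= n:
        return False
    for (u, v), says_truthful in reports.items():
        # A truthful reporter must state the actual type of v.
        if u in T and says_truthful != (v in T):
            return False
    return True
```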
Proof: Assume this is false, and let $I_1, I_2$ be as above. By the above discussion we know that $I_1$ contains $V_1$ but not $V_2$, and $I_2$ contains $V_2$ but not $V_1$. Note that if some SCC $V_i$ is contained both in $I_1$ and in $I_2$ and there is a directed edge $(u, v)$ from $V_i$ to some other SCC $V_j$, then if the report along this edge is that $v$ is truthful, $V_j$ must be contained in both $I_1$ and $I_2$. Similarly, if the report along this edge is $v \in B$, then $V_j$ must lie outside both $I_1$ and $I_2$. In particular, there are no edges at all from $V_i$ to $V_1$ or $V_2$ (as each of them lies in exactly one of the two unions $I_1, I_2$). Let $J$ be the union of all SCCs that are contained in both $I_1$ and $I_2$. By the remark above, for every edge $(u, v)$ from a vertex of $J$ to a vertex outside $J$, the report along the edge must be $v \in B$ (since otherwise $v$ would also be in an SCC which is truthful both in $I_1$ and in $I_2$, and hence would be in $J$). Thus all such edges report $v \in B$, implying that all components outside $J$ to which there are directed edges from vertices in $J$ belong to neither $I_1$ nor $I_2$; let $N$ denote the union of these components. By the properties of our graph, the total size of these components exceeds that of $J$, that is, $|N| > |J|$ (as $|J| \le 4\delta n$ and all out-neighbors of $J$ are outside $V_1, V_2$). Since $I_1 \setminus J$ and $I_2 \setminus J$ are disjoint and both avoid $J \cup N$, the sum of the sizes of $I_1$ and $I_2$ is at most
$$|I_1| + |I_2| \le 2|J| + (n - |J| - |N|) = n + |J| - |N| \le n.$$
Therefore it cannot be that both $I_1$ and $I_2$ are of size bigger than $|V|/2 = n/2$, proving the claim.
✷ By the last claim, one can identify the types of the SCCs $V_1$ and $V_2$. This means that we can identify at least $(1/2 - 2\delta)n$ truthful vertices with no error. Recall that this is also the case when $H$ has only one SCC of size at least $(1/2 - 2\delta)n$. Having these truthful vertices, we also know the types of all their out-neighbors. By the assumption on $G$ this gives the types of all vertices but at most $O(|B|/d)$, completing the proof of the main part of the theorem.
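Reading off types from a set of vertices already known to be truthful amounts to trusting their reports. A minimal Python sketch, using a hypothetical encoding in which `reports` maps each edge $(u, v)$ to `True` if $u$ reports $v$ truthful and `False` if it reports $v$ corrupt:

```python
def types_from_truthful_core(core, reports):
    """Given a set `core` of vertices known to be truthful, return a dict
    mapping each core vertex and each out-neighbor of the core to 'T'
    (truthful) or 'B' (corrupt), taken from the reports of core vertices.
    Reports made by vertices outside the core are ignored."""
    types = {v: "T" for v in core}
    for (u, v), says_truthful in reports.items():
        if u in core:  # reports of truthful vertices are always correct
            types[v] = "T" if says_truthful else "B"
    return types
```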
The remark about the linear-time algorithm when $|T| > (1/2 + 2\delta)n$ is clear: in this case $H$ contains an SCC of size bigger than $n/2$, which must be truthful, and the SCCs of a directed graph can be computed in linear time. If we only assume that $|T| > |B|$, the proof provides only an inefficient algorithm for deciding the types of the SCCs $V_1$ and $V_2$. Indeed, we have to check all $2^s$ possible assignments of types to the SCCs and see which ones are consistent with all reports and have truthful part of total size bigger than $n/2$. By the proof above, only one of the two SCCs $V_1, V_2$ will appear among the truthful SCCs of such an assignment. ✷
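The inefficient procedure just described can be sketched in Python as a brute-force search over all $2^s$ labelings of the SCCs. The input encoding (SCCs as lists of vertices $0, \ldots, n-1$; `reports[(u, v)]` equal to `True` iff $u$ reports $v$ truthful) and the function name are assumptions for illustration.

```python
from itertools import product


def consistent_truthful_unions(n, reports, comps):
    """Try all 2**s truthful/corrupt labelings of the s SCCs and return those
    (as sets of SCC indices) whose truthful part T satisfies |T| > n/2 and
    agrees with every report made by a vertex of T. Exponential in s."""
    s = len(comps)
    found = []
    for labels in product((False, True), repeat=s):
        T = {v for i, c in enumerate(comps) if labels[i] for v in c}
        if 2 * len(T) <= n:
            continue  # the truthful vertices must form a majority
        if all(says == (v in T)
               for (u, v), says in reports.items() if u in T):
            found.append({i for i in range(s) if labels[i]})
    return found
```

On a small instance with components $\{0,1,2\}$ and $\{3,4\}$, internal edges reported truthful and the edges $(2,3)$, $(3,2)$ reported corrupt, the only surviving labeling declares the first component truthful.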

Discussion and Open Problems
The usefulness of expanders for corruption detection raises the natural question of the existence of good explicit spectral expanders with any desired number of nodes. After our work was posted, explicit constructions of such undirected expander graphs with any number of nodes were obtained in [1]. It would be interesting to study in more detail the relation between expansion and corruption detection. For a weak result in the desired direction, consider the following argument. We say that an undirected graph $G$ is $\delta$-connected if for every two disjoint sets $A_1, A_2$ with $|A_1| \ge \delta n$ and $|A_2| \ge (1 - 3\delta)n$ there is at least one edge between $A_1$ and $A_2$. Note that the notion of $\delta$-connectedness is much weaker than expansion. In particular, a graph $G$ can be $\delta$-connected yet at the same time have $\delta n/2$ isolated vertices, while any $\delta$-good expander must be connected.

Proof: Let $E' \subset E$ be the set of edges both of whose endpoints declare each other truthful. Recall that each connected component of $G' = (V, E')$ is either truthful or corrupt.
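Let $T_1, T_2, \ldots$ denote all the components of size at least $\epsilon n$ in $G'$. We claim that if $T' = \bigcup_i T_i$, then $|T \setminus T'| < \epsilon n$. Assume otherwise. Since all the connected components of $G'$ inside $T \setminus T'$ are of size less than $\epsilon n$, adding such components one at a time until the size first reaches $\epsilon n$ yields a set $T'' \subset T \setminus T'$ of size in $[\epsilon n, 2\epsilon n]$ with no edges to $T \setminus T''$, a set whose size is in $[(1 - 3\epsilon)n, (1 - 2\epsilon)n]$. This contradicts $\epsilon$-connectedness, and the proof follows.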
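The first step of this proof, building $G'$ and extracting its large components, can be sketched in Python with a union-find structure. This is a minimal sketch under assumed conventions: $G$ is undirected with vertices $0, \ldots, n-1$, and `reports[(u, v)]` (a hypothetical encoding) is `True` iff $u$ declares $v$ truthful, given for both orientations of every edge.

```python
def large_mutual_trust_components(n, edges, reports, eps):
    """Build G' = (V, E'), keeping an undirected edge {u, v} of G only if u and v
    declare each other truthful, and return the connected components of G'
    with at least eps * n vertices (the sets T_1, T_2, ... of the proof)."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if reports[(u, v)] and reports[(v, u)]:
            parent[find(u)] = find(v)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return [c for c in groups.values() if len(c) >= eps * n]
```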
✷ To see that the conditions of Claim 4.2 are tight up to constant factors, consider the star graph with $m$ leaves. Assume that $|T| \le m - 1$. Then it is easy to see that one cannot find even one member of $T$ if all vertices declare all their neighbors corrupt: these reports are consistent both with $T$ consisting of the center alone and with $T$ being any set of at most $m - 1$ leaves, and these sets have no vertex in common. On the other hand, this example is (vacuously) $1/(4m)$-connected. To get a non-trivial example, one can replace each node with a complete graph $K_k$ and each edge with a complete bipartite graph $K_{k,k}$, for an arbitrary $k$.
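The star example can be checked by brute force for small $m$. When every vertex declares all its neighbors corrupt, a truthful vertex can only have corrupt neighbors, while corrupt vertices impose no constraint, so the consistent truthful sets are exactly the independent sets. The Python sketch below (function names hypothetical, vertex $0$ as the center) enumerates the nonempty independent sets of size at most $m - 1$ and intersects them.

```python
from itertools import combinations


def star(m):
    """Star graph with center 0 and leaves 1..m; returns (n, edges)."""
    return m + 1, [(0, i) for i in range(1, m + 1)]


def consistent_truthful_sets(n, edges, max_size):
    """Nonempty sets T with |T| <= max_size that are consistent with every
    vertex reporting all of its neighbors corrupt, i.e. independent sets."""
    adjacent = {frozenset(e) for e in edges}
    result = []
    for k in range(1, max_size + 1):
        for T in combinations(range(n), k):
            if all(frozenset(p) not in adjacent for p in combinations(T, 2)):
                result.append(set(T))
    return result


m = 4
n, edges = star(m)
candidates = consistent_truthful_sets(n, edges, max_size=m - 1)
# Vertices that are truthful in every consistent scenario:
common = set.intersection(*candidates)
print(common)  # prints set()
```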
In a follow-up work [14], the connection between expansion and corruption detection is formalized using the conjectured hardness of Small Set Expansion [24]. Assuming the hardness of Small Set Expansion, it is shown that it is computationally hard to approximate the minimal number of nodes whose corruption makes identifying even one truthful node impossible.
We conclude with a short discussion of a variant of the model. From the modeling perspective, it is interesting to consider probabilistic variants of the corruption detection problem.

Question 4.3
What is the effect of relaxing the assumption that truthful nodes always report the status correctly? Suppose, for example, that each truthful node reports the status of each of its neighbors correctly with probability $1 - \epsilon$, independently. Note that in this case it is impossible to determine the status of an individual node with probability one. However, it is still desirable to find sets $T'$ and $B'$ such that the symmetric differences $T \triangle T'$ and $B \triangle B'$ are small with high probability. Under what conditions can this be achieved? What are good algorithms for finding $T'$ and $B'$?
See [6] for results on this problem obtained by Alweiss after our work was posted.