Tight Hardness of the Non-commutative Grothendieck Problem

$\newcommand{\eps}{\varepsilon} $We prove that for any $\eps>0$ it is $\textsf{NP}$-hard to approximate the non-commutative Grothendieck problem to within a factor $1/2 + \eps$, which matches the approximation ratio of the algorithm of Naor, Regev, and Vidick (STOC'13). Our proof uses an embedding of $\ell_2$ into the space of matrices endowed with the trace norm with the property that the image of standard basis vectors is longer than that of unit vectors with no large coordinates. We also observe that one can obtain a tight $\textsf{NP}$-hardness result for the commutative Little Grothendieck problem; previously, this was only known based on the Unique Games Conjecture (Khot and Naor, Mathematika 2009).


Introduction
The subject of this paper, the non-commutative Grothendieck problem, has its roots in the celebrated work of Grothendieck [11], sometimes (jokingly?) referred to as "Grothendieck's résumé." His paper laid the foundation for the study of the geometry of tensor products of Banach spaces, though its significance only became widely recognized after it was revamped by Lindenstrauss and Pełczyński [27]. The main result of the paper, now known as Grothendieck's inequality, shows a close relationship between the following two quantities. For a complex $d\times d$ matrix $M$ let
$$\mathrm{OPT}(M) \;=\; \sup \Bigl| \sum_{i,j=1}^d M_{ij}\,\eps_i\,\delta_j \Bigr|, \qquad(1.1)$$
where the supremum goes over scalars $\eps_1,\dots,\eps_d,\delta_1,\dots,\delta_d$ on the complex unit circle, and let
$$\mathrm{SDP}(M) \;=\; \sup \Bigl| \sum_{i,j=1}^d M_{ij}\,\langle x_i, y_j\rangle \Bigr|, \qquad(1.2)$$
where the supremum goes over vectors $x_1,\dots,x_d,y_1,\dots,y_d$ on a complex Euclidean unit sphere of any dimension. Since the circle is the sphere in dimension one, we clearly have $\mathrm{SDP}(M) \ge \mathrm{OPT}(M)$. Grothendieck's inequality states that there exists a universal constant $K_G^{\mathbb C} < \infty$ such that for any positive integer $d$ and any $d\times d$ matrix $M$, we also have $\mathrm{SDP}(M) \le K_G^{\mathbb C}\,\mathrm{OPT}(M)$. This result found an enormous number of applications both within and far beyond its original scope and we give some examples below (see [23,32] for extensive surveys). Despite this, finding the optimal value of $K_G^{\mathbb C}$ is the only one of six problems posed in [11] that remains unsolved today; the current best upper and lower bounds are 1.4049 [14] and 1.338 [9], respectively. The situation is similar for the real variant of the problem, where all objects involved are over the real numbers. The constant in that case is denoted $K_G$ and is known to be between 1.6769 and 1.7823 (see [6]).

*A conference version of this paper appeared in the Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2015) [8].
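As a concrete toy illustration of the gap between these two quantities in the real case, one can take the $2\times 2$ matrix $M = \binom{1\ \ 1}{1\ -1}$ familiar from the CHSH inequality: real signs achieve value 2, while unit vectors achieve $2\sqrt{2}$. A minimal numerical check (our illustration, not from the paper; all names are ours):

```python
import itertools
import math

import numpy as np

# CHSH-type instance: over signs the bilinear form reaches 2,
# while over unit vectors it reaches 2*sqrt(2).
M = np.array([[1.0, 1.0], [1.0, -1.0]])

# OPT over real signs, by brute force over x, y in {-1, 1}^2.
opt_signs = max(
    abs(np.array(x) @ M @ np.array(y))
    for x in itertools.product([-1, 1], repeat=2)
    for y in itertools.product([-1, 1], repeat=2)
)

# A feasible point of the vector relaxation: unit vectors x_i, y_j in R^2.
X = np.eye(2)
Y = np.array([[1.0, 1.0], [1.0, -1.0]]) / math.sqrt(2)
sdp_value = sum(M[i, j] * X[i] @ Y[j] for i in range(2) for j in range(2))

print(opt_signs)   # 2.0
print(sdp_value)   # 2.828... = 2*sqrt(2)
```

The ratio $\sqrt{2}$ observed here is of course below the bounds on $K_G$ quoted above; it only illustrates that the inequality $\mathrm{SDP}(M) \ge \mathrm{OPT}(M)$ can be strict.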
The non-commutative Grothendieck problem, to which we will refer as the NCG, is the optimization problem in which we are asked to maximize a given bilinear form over all pairs of unitary matrices. More explicitly, we are given a four-dimensional array of complex numbers $(T_{ijkl})_{i,j,k,l=1}^d$ and are asked to find or approximate the value
$$\mathrm{OPT}(T) \;=\; \sup \Bigl| \sum_{i,j,k,l=1}^d T_{ijkl}\, U_{ij}\, \overline{V_{kl}} \Bigr|, \qquad(1.3)$$
where the supremum is over pairs $U, V$ of $d\times d$ unitary matrices. (The word "non-commutative" simply refers to the fact that the optimization is over matrices.) It is not difficult to see that the (commutative) Grothendieck problem of computing $\mathrm{OPT}(M)$ as in (1.1) is the special case where $T$ has $T_{iijj} = M_{ij}$ and zeros elsewhere. At first sight the problem might seem overly abstract, but in fact, as we will illustrate below, it captures many natural questions as special cases. Grothendieck conjectured that his namesake inequality has an extension that relates (1.3) to a quantity $\mathrm{SDP}(T)$, in which the optimization instead ranges over pairs $A, B$ of $d\times d$ matrices whose entries are complex vectors of arbitrary dimension satisfying a certain "unitarity" constraint.¹ Namely, he conjectured that there exists a universal constant $K < \infty$ such that for every positive integer $d$ and array $T$ as above, we have $\mathrm{OPT}(T) \le \mathrm{SDP}(T) \le K\,\mathrm{OPT}(T)$, where the first inequality follows immediately from the definition. Over twenty-five years after being posed, the non-trivial content of Grothendieck's conjecture, $\mathrm{SDP}(T) \le K\,\mathrm{OPT}(T)$, was finally settled in the positive by Pisier [31]. This result is now known as the non-commutative Grothendieck inequality.
In contrast with the commutative case, and somewhat surprisingly, the optimal value of K is known: Haagerup [13] lowered Pisier's original estimate to K ≤ 2 and this was later shown to be sharp by Haagerup and Itoh [15].
Algorithmic applications. The importance of Grothendieck's inequality to computer science was pointed out by Alon and Naor [1], who placed it in the context of approximation algorithms for combinatorial optimization problems. They observed that computing SDP(M) is a semidefinite programming (SDP) problem that can be solved efficiently (to within arbitrary precision), and they translated an upper bound of about 1.78 on K G due to Krivine [26] to an efficient rounding scheme that turns SDP vectors into a feasible solution for the real Grothendieck problem (1.1) achieving value at least SDP(M)/1.78. It is known that whatever the value of K G is, there exists an efficient algorithm achieving value at least (1/K G − ε) SDP(M) for any constant ε > 0. (This was first shown in [33], but can also be derived from the results in [7] using a simple discretization argument combined with a brute-force search.) The Grothendieck problem shows up in a number of different areas such as graph partition problems and computing the Cut-Norm of a matrix [1], in statistical physics where it gives ground state energies in the spin glass model [25], and in quantum physics where it is related to Bell inequalities [37].
In the same spirit, Naor, Regev, and Vidick [28] recently translated the non-commutative Grothendieck inequality into an efficient SDP-based approximation algorithm for the NCG problem (1.3) that achieves value at least SDP(T )/2. They also considered the real variant and a Hermitian variant, for which they gave analogous algorithms achieving value at least SDP(T )/2 √ 2. This in turn implies efficient constant-factor approximation algorithms for a variety of problems, including the Procrustes problem and two robust versions of Principal Component Analysis [28] and quantum XOR games [34]. In a related result, Bandeira et al. [2] considered a special case of the NCG (in fact a special case even of the Little NCG defined below) and showed how to obtain better approximation factors for it. They also show that it is rich enough to capture some applications such as the Procrustes problem and another natural problem called the Global Registration Problem.
Hardness of approximation. For simplicity we momentarily turn to the real setting, but similar results hold over the complex numbers. The Grothendieck problem contains MAXCUT as a special case (in fact, it is a special case of the "Little Grothendieck problem," discussed below, in which $M$ is the positive semidefinite Laplacian matrix of a graph [1]). It therefore follows from Håstad's inapproximability result [16] that it is NP-hard to approximate the value (1.1) to any factor larger than $16/17 \approx 0.941$. Based on the current best-known lower bound of about 1.676 on $K_G$, Khot and O'Donnell [24] proved that (1.1) is Unique-Games-hard to approximate to within a factor larger than $1/1.676 \approx 0.597$. Moreover, despite the fact that the exact value of $K_G$ is still unknown, Raghavendra and Steurer [33] were able to improve this Unique Games hardness to $1/K_G$. (See for instance [20,36] for background on the Unique Games Conjecture.)

THEORY OF COMPUTING, Volume 13 (15), 2017, pp. 1-24

Our result. Whereas the hardness situation for the commutative version of Grothendieck's problem is reasonably well understood (apart from the yet-unknown exact value of $K_G$), no tight hardness result was previously known for the non-commutative version. In fact, we are not even aware of any hardness result that is better than what follows from the commutative case. Here we settle this question.

Theorem 1.1. For any constant $\eps > 0$ it is NP-hard to approximate the optimum (1.3) of the non-commutative Grothendieck problem to within a factor greater than $1/2 + \eps$.
Little Grothendieck. In fact, we prove a stronger result than Theorem 1.1 that concerns a special case of the NCG called the Little NCG. Let us start by describing the (real case of the) commutative Little Grothendieck problem (a. k. a. the positive-semidefinite Grothendieck problem). A convenient way to phrase it is as asking for the operator norm of a linear map $F : \mathbb{R}^n \to \ell_1^d$ (where $\ell_p^d$ denotes $\mathbb{R}^d$ endowed with the $\ell_p$ norm), defined as $\|F\| = \sup_a \|F(a)\|_1$, where the vector $a$ ranges over the $n$-dimensional Euclidean unit ball. It turns out that this is a special case of (the real version of) Equation (1.1): for any $F$ there exists a positive semidefinite $d\times d$ matrix $M$ such that $\mathrm{OPT}(M) = \|F\|^2$; and vice versa, one can also map any such $M$ into a corresponding operator $F$ (see, e. g., [32] or Section 6). We wish to highlight that for such instances, the constant $K_G$ may be replaced by the smaller value $\pi/2$ [35] and that this value is known to be optimal [11]. Moreover, Nesterov made this algorithmic, namely, he showed an algorithm that approximates the quantity $\|F\|^2$ as above to within a factor $2/\pi$ [29]. Finally, Khot and Naor [22], as part of a more general result, showed that this is tight: the Unique-Games-hardness threshold for the Little Grothendieck problem is exactly $2/\pi$. As an aside, we note that other operator norms, in particular of operators from $\mathbb{R}^n$ to $\ell_4^d$, have played an important role recently in theoretical computer science (see, e. g., [3]).

The Little non-commutative Grothendieck problem is formulated in terms of the (normalized) trace norm, also known as the Schatten-1 norm, which for a $d\times d$ matrix $A$ is given by
$$\|A\|_{S_1} \;=\; \frac{1}{d}\sum_{i=1}^d \sigma_i(A),$$
where $\sigma_1(A), \dots, \sigma_d(A)$ are the singular values of $A$. In other words, $\|A\|_{S_1}$ is the average of the singular values of $A$. The space of matrices endowed with this norm is denoted by $S_1$, and by $S_1^d$ if we restrict to $d\times d$ matrices. The problem then asks for the operator norm of a linear map $F : \mathbb{C}^n \to S_1^d$. This problem is a special case of the NCG where $\mathrm{OPT}(T) = \|F\|^2$ (see Section 6).
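The normalized trace norm is straightforward to compute from a singular value decomposition. A small helper (ours) makes the normalization explicit and also illustrates how the commutative $L_1$ case sits inside $S_1$ via diagonal matrices:

```python
import numpy as np

def schatten1(A: np.ndarray) -> float:
    """Normalized trace norm: the *average* of the singular values of A."""
    return float(np.linalg.svd(A, compute_uv=False).mean())

# Unitary matrices have all singular values equal to 1, hence norm exactly 1.
print(schatten1(np.eye(3)))                   # 1.0

# On diagonal matrices the normalized trace norm is the average absolute
# value of the entries: the commutative (L1) norm.
print(schatten1(np.diag([1.0, -2.0, 3.0])))   # 2.0
```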
In particular, it follows from [13,28] that there is an efficient SDP-based 1/ √ 2-approximation algorithm for the Little NCG. Our stronger result alluded to above shows tight hardness for the Little NCG, which directly implies Theorem 1.1.
Theorem 1.2. For any constant ε > 0 it is NP-hard to approximate the Little non-commutative Grothendieck problem to within a factor greater than 1/ √ 2 + ε.
While this result applies to the complex case, an easy transformation shows that it directly implies the same result for the real and Hermitian cases introduced in [28] (see Section 5.1). Finally, as we show in the "warm-up" section of this paper (Section 4), we also get a tight NP-hardness result for the commutative Little Grothendieck problem, strengthening the Unique-Games-based result of [22].

Theorem 1.3. For any constant $\eps > 0$ it is NP-hard to approximate the real Little commutative Grothendieck problem to within a factor greater than $2/\pi + \eps$. Similarly, the complex case is NP-hard to approximate to within a factor greater than $\pi/4 + \eps$.

Techniques. Nearly all recent work on hardness of approximation, including for commutative Grothendieck problems [33,22], uses the machinery of Fourier analysis over the hypercube, influences, or the Majority Is Stablest theorem [30]. Our attempts to apply these techniques here failed. Instead, we use a more direct approach similar to that taken in [12] and avoid the use of the hypercube altogether. The role of dictator functions is played in our proof simply by the standard basis vectors of $\mathbb{C}^n$. The dictatorship test, which is our main technical contribution, comes in the form of a linear operator $F : \mathbb{C}^n \to S_1^d$ with the following notable property: it maps the $n$ standard basis vectors to matrices with trace norm 1, and it maps any unit vector with no large coordinate to a matrix with trace norm close to $1/\sqrt{2}$. Roughly speaking, one can think of $F$ as identifying an interesting subspace of $S_1^d$ (namely, the image of $F$) in which the unit ball looks somewhat like the intersection of the Euclidean ball of radius $\sqrt{2}$ with the $\ell_\infty$ ball $[-1,1]^n$ (since with such a unit ball, the norm of a vector $a$ is given by $\max\{\|a\|_2/\sqrt{2},\, \|a\|_\infty\}$). A first attempt to construct an operator $F$ as above might be to map each standard basis vector to a random unitary matrix.
This, however, leads to a very poor map: while standard basis vectors are mapped to matrices of trace norm 1, vectors with no large coordinates are mapped to matrices of trace norm close to $8/(3\pi) \approx 0.848$ by Wigner's semicircle law. Another natural approach is to look at the construction by Haagerup and Itoh [15] (see also [32, Section 11] for a self-contained description) which shows the factor-2 lower bound in the non-commutative Grothendieck inequality, i. e., the tight integrality gap of the SDP (1.4). Their construction relies on the so-called CAR algebra (after canonical anticommutation relations) and provides an isometric mapping from $\mathbb{C}^n$ to $S_1$, i. e., all unit vectors are mapped to matrices of trace norm 1. Directly modifying this construction (akin to how, e. g., Khot et al. [21] obtained tight hardness of MAXCUT by restricting the tight integrality gap instances of Feige and Schechtman [10] from the sphere to the hypercube) does not seem to work. Instead, our construction of $F$ relies on a different (yet related) algebra known as the Clifford algebra. The Clifford algebra was used before in a celebrated result by Tsirelson [37] (to show that Grothendieck's inequality can be interpreted as a statement about XOR games with entanglement). His result crucially relies on the fact that the Clifford algebra gives an isometric mapping from $\mathbb{R}^n$ to $S_1$. Notice that this is again an isometric embedding, but now only over the reals. Our main observation here (Lemma 5.2) is that the same mapping, when extended to $\mathbb{C}^n$, exhibits intriguing cancellations when the phases in the input vectors are not aligned, and this leads to the construction of $F$ (Lemma 5.1). Even though the proof of this fact is simple, we find it surprising; we are not aware of any previous application of such "complex extensions" of Clifford algebras.
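The cancellation phenomenon described above can already be seen in dimension $n = 2$, where one may take the two anti-commuting generators to be the Pauli matrices $X$ and $Y$. The sketch below (ours; helper names are our own) checks that real unit vectors embed isometrically while the complex vector $(1, i)/\sqrt{2}$ loses a factor $\sqrt{2}$ in trace norm:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

def s1(A):
    """Normalized trace norm (average of the singular values)."""
    return float(np.linalg.svd(A, compute_uv=False).mean())

def C(a):
    """The Clifford-type embedding for n = 2: a_1*C_1 + a_2*C_2."""
    return a[0] * X + a[1] * Y

assert np.allclose(X @ Y + Y @ X, 0)   # the generators anti-commute

r = 1 / np.sqrt(2)
print(s1(C([1, 0])))        # 1.0: a standard basis vector ("dictator")
print(s1(C([r, r])))        # 1.0: real unit vectors are embedded isometrically
print(s1(C([r, 1j * r])))   # 0.7071... = 1/sqrt(2): misaligned phases cancel
```

Indeed, $C((1,i)/\sqrt{2}) = (X + iY)/\sqrt{2}$ has singular values $\sqrt{2}$ and $0$, so its normalized trace norm is $\sqrt{2}/2 = 1/\sqrt{2}$.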
Open questions. For the real and Hermitian cases there is a gap of $\sqrt{2}$ between the guarantee of the algorithms of [28] and our hardness result. It would also be interesting to explore whether hardness-of-approximation results can be derived for some of the applications of the NCG, including the Procrustes problem and robust Principal Component Analysis. We believe that our embedding would be useful there too.
Outline. The rest of the paper is organized as follows. In Section 2, we set some notational conventions, gather basic preliminary facts about relevant Banach spaces, and give a detailed formulation of the Smooth Label Cover problem. In Section 3, we prove hardness of approximation for the problem of computing the norm of a general class of Banach-space-valued functions, closely following [12]. In Section 4, as a "warm-up," we prove Theorem 1.3 using the generic result of Section 3 and straightforward applications of real and complex versions of the Berry-Esséen Theorem. Section 5 contains our main technical contribution, which we use there to finish the proof of our main result (Theorem 1.2).

Preliminaries
Notation and relevant Banach spaces. For a positive integer $n$ we denote $[n] = \{1, \dots, n\}$. For a graph $G$ and vertices $v, w \in V(G)$ we write $v \sim w$ to denote that $v$ and $w$ are adjacent. We write $\Pr_{e \sim v}[\cdot]$ for the probability with respect to a uniformly distributed random edge incident with $v$. For a finite set $U$ we denote by $\mathbb{E}_{u \in U}[\cdot]$ the expectation with respect to the uniform distribution over $U$. For a complex number $c \in \mathbb{C}$, we denote its real and imaginary parts by $\Re(c)$ and $\Im(c)$, respectively. All Banach spaces are assumed to be finite-dimensional (so we can equivalently talk about normed spaces). Recall that for Banach spaces $X, Y$ the operator norm of a linear operator $F : X \to Y$ is given by $\|F\| = \sup\{\|F(a)\|_Y : \|a\|_X \le 1\}$. For a real number $p \ge 1$, the $p$-norm of a vector $a \in \mathbb{C}^n$ is given by $\|a\|_p = (\sum_{i=1}^n |a_i|^p)^{1/p}$. As usual we implicitly endow $\mathbb{C}^n$ with the Euclidean norm $\|a\|_2$. For a finite set $U$ endowed with the uniform probability measure we denote by $L_p(U)$ the space of functions $f : U \to \mathbb{C}$ with the norm $\|f\|_{L_p(U)} = (\mathbb{E}_{u \in U} |f(u)|^p)^{1/p}$. More generally, for a Banach space $X$ we denote by $L_p(U, X)$ the space of functions $f : U \to X$ with the norm $\|f\|_{L_p(U,X)} = (\mathbb{E}_{u \in U} \|f(u)\|_X^p)^{1/p}$. We will write $L_p(X)$ if $U$ is not explicitly given and $\|f\|_{L_p}$ instead of $\|f\|_{L_p(U,X)}$ when there is no danger of ambiguity. Note that $L_2(U, \mathbb{C}^n)$ is a Hilbert space.

The following hardness result for Smooth Label Cover, given in [12],² is a slight variant of the original construction due to Khot [19]. The theorem also describes the various structural properties, including smoothness, that are satisfied by the hard instances.

• (Hardness) It is NP-hard to distinguish between the following two cases:

-(YES Case) There is an assignment that satisfies all edges.
-(NO Case) Every assignment satisfies less than a ζ -fraction of the edges.
-(Weak Expansion) For any $\delta > 0$ and vertex subset $V' \subseteq V$ such that $|V'| = \delta \cdot |V|$, the number of edges between the vertices in $V'$ is at least $(\delta^2/2)|E|$.

Hardness for general Banach-space valued operators
The following proposition shows hardness of approximation for the problem of computing the norm of a linear map from C n to any Banach space that allows for a "dictatorship test," namely, a linear function that maps the standard basis vectors to long vectors, and maps "spread" unit vectors to short vectors. As stated, the proposition assumes the underlying field to be C; we note that the proposition holds with exactly the same proof also in the case of the real field R.
Theorem 3.1. Let $(X_n)_{n \in \mathbb{N}}$ be a family of finite-dimensional Banach spaces, and let $\eta$ and $\tau$ be positive numbers such that $\eta > \tau$. Suppose that for each positive integer $n$ there exists a linear operator $f : \mathbb{C}^n \to X_n$ with the following properties:

• For any vector $a \in \mathbb{C}^n$, we have $\|f(a)\|_{X_n} \le \|a\|_2$.

• For each standard basis vector $e_i$, we have $\|f(e_i)\|_{X_n} \ge \eta$.

• For any $\eps > 0$ there exists a $\delta(\eps) > 0$ such that for any vector $a \in \mathbb{C}^n$ with $\|a\|_4 \le \delta(\eps)\|a\|_2$, we have $\|f(a)\|_{X_n} \le (\tau + \eps)\|a\|_2$.

Then, for any $\eps > 0$ there exists a positive integer $n$ such that it is NP-hard to approximate the norm of an explicitly given linear operator $F : L_2 \to L_1(X_n)$ to within a factor greater than $(\tau/\eta) + \eps$.

The hardness reduction
To set up the reduction, we begin by defining a linear operator $F = F_{\zeta,\gamma}$ for any choice of positive real numbers $\zeta, \gamma$. Afterwards we show that there is a choice of these parameters giving the desired result. For positive real numbers $\zeta, \gamma$, let $n$, $k$, and $t$ be positive integers (depending on $\zeta, \gamma$) and let $(G, [n], [k], \Sigma)$ be a Smooth Label Cover instance as in Theorem 2.1, where $G = (V, E)$ is a regular graph. Note that $\zeta$ controls the "satisfiability" of the instance in the NO case, that $\gamma$ controls the "smoothness," and that $t$ depends on $\zeta$ only. Endow the vertex set $V$ with the uniform probability measure. To define $F$ we consider a special linear subspace $H$ of the Hilbert space $L_2(V, \mathbb{C}^n)$. It will be helpful to view a vector $a \in L_2(V, \mathbb{C}^n)$ as an assignment of a vector $a_v \in \mathbb{C}^n$ to each vertex $v \in V$. The operator $F$ thus maps a $\mathbb{C}^n$-valued assignment $a = (a_v)_{v \in V}$ satisfying (3.1) to an $X_n$-valued assignment given by $f(a_v)$ for each $v \in V$. Theorem 3.1 follows from the following two lemmas, which we prove in Sections 3.2 and 3.3, respectively.

Proof of Theorem 3.1. Let $\eps > 0$ be arbitrary, let $\zeta, \gamma$ be as in Lemma 3.3, and let $n = n(\zeta, \gamma)$ and $k = k(\zeta, \gamma)$ be as in Theorem 2.1. We use the reduction described above, which maps a Smooth Label Cover instance $(G, [n], [k], \Sigma)$ to the linear operator $F : L_2 \to L_1(X_n)$ specified in (3.2). By Lemma 3.2, YES instances are mapped to $F$ satisfying $\|F\| \ge \eta$, whereas by Lemma 3.3, NO instances are mapped to $F$ satisfying $\|F\| < \tau + 4\eps$. We therefore obtain hardness of approximation to within a factor $(\tau + 4\eps)/\eta$. Since $\eps$ is arbitrary, we are done.

Completeness
Here we prove Lemma 3.2.

Soundness
Here we prove Lemma 3.3 and show that among the family of operators $F = F_{\zeta,\gamma}$ as in (3.2), for any $\eps > 0$ there is a choice of $\zeta, \gamma > 0$ such that if $\|F\| > \tau + 4\eps$, then there exists an assignment satisfying a $\zeta$-fraction of the edges in the Smooth Label Cover instance associated with $F$. To begin, assume that $\|F\| > \tau + 4\eps$ for some $\eps > 0$. Let $b \in H$ be a vector such that $\|b\|_{L_2} = 1$ and $\|F(b)\|_{L_1} > \tau + 4\eps$. The weak expansion property in Theorem 2.1 implies that it suffices to find a "good" assignment for a large subset of the vertices, as any large set of vertices will induce a large set of edges. For $\delta = \delta(\eps)$ as in Theorem 3.1, we will consider a set of vertices $V_0 \subseteq V$. The following lemma shows that $V_0$ contains a significant fraction of the vertices.

Proof. Define the sets $V_1$ and $V_2$. We bound the four sums on the left-hand side of (3.5) individually. Since (by the first item in Theorem 3.1) we have $\|f(b_v)\|_{X_n} \le \|b_v\|_2$, and since $\|b_v\|_2 \le 1/\eps$ for every $v \in V_0$, the first sum in (3.5) can be bounded accordingly. Similarly, using the definition of $V_1$, the second sum in (3.5) is at most $\eps|V|$. Next, from the third property of $f$ in Theorem 3.1, for each $v \in V_2$ we have $\|f(b_v)\|_{X_n} \le (\tau + \eps)\|b_v\|_2$; this bounds the third sum, where the last inequality uses $\|b\|_{L_2} = 1$. Finally, the fourth sum in (3.5) is bounded similarly.

We set out to show that there exists an assignment to the vertices in $V_0$ that satisfies a significant fraction of the edges in $E(V_0)$. Roughly speaking, we do this by randomly assigning each $v \in V_0$ one of the coordinates of the vector $b_v$ at which it has large magnitude. (Assigning the largest coordinate may not work.) The following simple proposition shows that those vectors indeed have large coordinates.
Proof. For every v ∈ V 0 , we have giving the claim.
For the to-be-determined value of $\zeta$ let $t = t(\zeta)$ be as in Theorem 2.1 and for each $v \in V_0$ define the label sets $A_v^1$ and $A_v^2$. By Proposition 3.5 these sets are nonempty, and clearly $A_v^1 \subseteq A_v^2$. Moreover, since $\|b_v\|_2 \le 1/\eps$, their sizes are bounded accordingly. Now consider a random assignment $A : V_0 \to [n]$ that independently assigns each vertex $v \in V_0$ a uniformly random label from $A_v^1$ and assigns the remaining vertices in $V$ some fixed arbitrary label. The following lemma shows that on average, this assignment satisfies a significant fraction of the edges.

Lemma 3.6. There exists a $\gamma > 0$ depending only on $\eps$ and $\zeta$ such that for some absolute constant $c > 0$ the expected fraction of edges in $E$ satisfied by the random assignment $A$ given above is at least $c\eps^8\beta^4$.
Setting $\gamma$ appropriately as in the above lemma and $\zeta = c\eps^8\beta^4$ then gives Lemma 3.3; indeed, notice that then $\zeta$, and therefore also $\gamma$, depend on $\eps$ alone.
The remainder of this section is devoted to the proof of Lemma 3.6. Let $E' \subseteq E(V_0)$ be the subset of edges $e = (v, w)$ whose projections $\pi_{ev}$ and $\pi_{ew}$ are injective on the subsets $A_v^2$ and $A_w^2$, respectively. We set the parameter $\gamma$ according to the following proposition, which shows a lower bound on $|E'|$ using the smoothness property. Recall that $t$ is a function of $\zeta$ only.
Proof. Consider any vertex $v \in V_0$. By the smoothness property of Theorem 2.1 and a union bound over all distinct pairs $i, j \in A_v^2$, the fraction of edges $e \in E$ incident on $v$ that do not satisfy (3.13) is small, via an appropriate setting of $c'$. Therefore, the number of edges in $E$ that are incident on some $v \in V_0$ and do not satisfy (3.13) is at most a small fraction of $|E|$. The claim then follows by Equation (3.9).
The following proposition shows that for an edge $e = (v, w) \in E'$, the label sets $A_v^1$ and $A_w^1$ intersect under the projections given by $e$.

Proof. From Proposition 3.5, let $i^* \in [n]$ be such that $|b_v(i^*)| \ge \beta$. Note that $i^* \in A_v^1$. Let $j^* = \pi_{ev}(i^*)$. Clearly it suffices to show that there exists an $i' \in A_w^1$ such that $\pi_{ew}(i') = j^*$, as this implies that $j^* \in \pi_{ev}(A_v^1) \cap \pi_{ew}(A_w^1)$. Recall that since $b \in H$, the vector $b$ satisfies the constraint (3.1), and in particular (3.14). We show that because $i^* \in A_v^1$, the left-hand side of (3.14) must be large. Therefore the right-hand side is also large, from which we conclude that there must exist a coordinate $i' \in \pi_{ew}^{-1}(j^*)$ such that $|b_w(i')|$ is large, and so $i' \in A_w^1$. Recall from the second structural property in Theorem 2.1 that $|\pi_{ev}^{-1}(j^*)| \le t$. Moreover, since $\pi_{ev}$ acts injectively on the set $A_v^2$ and since $i^* \in A_v^2$, no index $i \ne i^*$ such that $\pi_{ev}(i) = \pi_{ev}(i^*)$ can belong to $A_v^2$. Hence, by the triangle inequality, the left-hand side of (3.14) is at least the quantity in (3.15). Combining (3.14), (3.15), and the triangle inequality lets us bound the right-hand side of (3.14) by (3.16), where the last inequality uses the same facts as above. Since $\pi_{ew}$ acts injectively on $A_w^2$, there is at most one index $i' \in \pi_{ew}^{-1}(j^*)$ that also belongs to $A_w^2$, meaning that the sum in (3.16) consists of at most one term. We see that this sum must be at least $\beta/2$, and in particular there is an $i' \in \pi_{ew}^{-1}(j^*)$ such that $|b_w(i')| \ge \beta/2$. We conclude that $i' \in A_w^1$ and $\pi_{ew}(i') = j^* = \pi_{ev}(i^*)$, proving the claim.

The commutative case
Recall that the commutative Little Grothendieck problem asks for the norm of a linear operator F : L 2 → L 1 . In this section we use Theorem 3.1 to prove Theorem 1.3, the tight hardness result for this problem. We first consider the real case of Theorem 1.3, and then the complex case in Section 4.2.

The real case
The real case of Theorem 1.3 follows easily by combining Theorem 3.1 with the following simple lemma.

Lemma 4.1. There exists a universal constant $K < \infty$ such that for every positive integer $n$ there is a linear operator $f : \mathbb{R}^n \to L_1$ with the following properties:

• For any vector $a \in \mathbb{R}^n$, we have $\|f(a)\|_{L_1} \le \|a\|_2$.

• For each standard basis vector $e_i$, we have $\|f(e_i)\|_{L_1} = 1$.

• For any $\eps > 0$ and any vector $a \in \mathbb{R}^n$ with $\|a\|_4 \le (\eps/K)\|a\|_2$, we have $\|f(a)\|_{L_1} \le (\sqrt{2/\pi} + \eps)\|a\|_2$.

This shows that there is an $L_1$-valued function $f$ that satisfies the conditions of the real variant of Theorem 3.1 for $\tau = \sqrt{2/\pi}$, $\eta = 1$ and $\delta(\eps) = \eps/K$. Hence, it is NP-hard to approximate the norm of a linear operator $F : L_2 \to L_1(L_1)$ over $\mathbb{R}$ to within a factor $\sqrt{2/\pi} + \eps$ for any $\eps > 0$. The real case of Theorem 1.3 then follows from the fact that $L_1(L_1)$ is isometrically isomorphic to $L_1$ (i. e., there is a bijective isometry between the two), together with the observation that the Little Grothendieck optimum equals the square $\|F\|^2$ of this norm, so that the hardness factor becomes $2/\pi + \eps$.
The proof of Lemma 4.1 uses the following version of the Berry-Esséen Theorem (see for example [30, Chapter 5.2, Theorem 5.16]).

Theorem 4.2 (Berry-Esséen Theorem).
There exists a universal constant $K < \infty$ such that the following holds. Let $n$ be a positive integer and let $Z_1, \dots, Z_n$ be independent centered $\{-1, 1\}$-valued random variables. Then, for any $\eps > 0$ and for any vector $a \in \mathbb{R}^n$ such that $\|a\|_\infty \le \eps\|a\|_2$, we have
$$\Bigl|\, \mathbb{E}\bigl|a_1 Z_1 + \cdots + a_n Z_n\bigr| - \sqrt{2/\pi}\,\|a\|_2 \Bigr| \;\le\; K\eps\|a\|_2.$$

Proof of Lemma 4.1. Endow $\{-1, 1\}^n$ with the uniform probability measure and define the function $f : \mathbb{R}^n \to L_1(\{-1, 1\}^n)$ by $f(a)(z_1, \dots, z_n) = a_1 z_1 + \cdots + a_n z_n$. The first property follows since $\mathbb{E}|f(a)| \le (\mathbb{E}|f(a)|^2)^{1/2} = \|a\|_2$. The second property is trivial. The third property follows from Theorem 4.2. Indeed, the theorem implies that if for some $\eps > 0$ we have $\|f(a)\|_{L_1} > (\sqrt{2/\pi} + \eps)\|a\|_2$, then $\|a\|_\infty > (\eps/K)\|a\|_2$. Since $\|a\|_4 \ge \|a\|_\infty$, the last property follows.
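The behavior captured by Theorem 4.2 can be checked by exact enumeration over the discrete cube (our sketch, with our own helper names): for a spread unit vector the expectation $\mathbb{E}|\langle a, z\rangle|$ is already close to $\sqrt{2/\pi} \approx 0.798$ at $n = 12$, while a standard basis vector gives exactly 1.

```python
import itertools
import math

n = 12
cube = list(itertools.product([-1, 1], repeat=n))

def avg_abs(a):
    """E |a_1 z_1 + ... + a_n z_n| over the uniform cube {-1,1}^n."""
    return sum(abs(sum(ai * zi for ai, zi in zip(a, z))) for z in cube) / len(cube)

basis = [1.0] + [0.0] * (n - 1)      # a dictator: value exactly 1
spread = [1 / math.sqrt(n)] * n      # a unit vector with no large coordinate

print(avg_abs(basis))    # 1.0
print(avg_abs(spread))   # 0.781..., close to sqrt(2/pi) = 0.7978...
```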

The complex case
A similar argument to the one above shows the complex case of Theorem 1.3. This follows from the following complex analogue of Lemma 4.1.

Lemma 4.3. There exists a universal constant $K < \infty$ such that for every positive integer $n$ there is a linear operator $f : \mathbb{C}^n \to L_1$ with the following properties:

• For any vector $a \in \mathbb{C}^n$, we have $\|f(a)\|_{L_1} \le \|a\|_2$.

• For each standard basis vector $e_i$, we have $\|f(e_i)\|_{L_1} = 1$.

• For any $\eps > 0$ and any vector $a \in \mathbb{C}^n$ with $\|a\|_4 \le (\eps^2/K)\|a\|_2$, we have $\|f(a)\|_{L_1} \le (\sqrt{\pi}/2 + \eps)\|a\|_2$.

This shows that there is an $L_1$-valued function $f$ that satisfies the conditions of Theorem 3.1 for $\tau = \sqrt{\pi}/2$, $\eta = 1$ and $\delta(\eps) = \eps^2/K$. Hence, it is NP-hard to approximate the norm of a linear operator $F : L_2 \to L_1(L_1)$ over $\mathbb{C}$ to within a factor $\sqrt{\pi}/2 + \eps$ for any $\eps > 0$. The complex case of Theorem 1.3 then follows as before from the fact that $L_1(L_1)$ is isometrically isomorphic to $L_1$, with the hardness factor for the (squared) Little Grothendieck optimum becoming $\pi/4 + \eps$.
The proof of Lemma 4.3 is based on the following complex analogue of the Berry-Esséen Theorem. Since we could not find this precise formulation in the literature, we include a proof below for completeness.

Lemma 4.4 (Complex Berry-Esséen Theorem).
There exists a universal constant $K < \infty$ such that the following holds. Let $Z_1, \dots, Z_n$ be independent uniformly distributed random variables over $\{1, i, -1, -i\}$. Then, for any $\eps > 0$ and any vector $a \in \mathbb{C}^n$ such that $\|a\|_\infty \le \eps\|a\|_2$, we have
$$\Bigl|\, \mathbb{E}\bigl|a_1 Z_1 + \cdots + a_n Z_n\bigr| - \tfrac{\sqrt{\pi}}{2}\,\|a\|_2 \Bigr| \;\le\; K\sqrt{\eps}\,\|a\|_2.$$
The proof is based on the following multi-dimensional version of the Berry-Esséen theorem due to Bentkus [4, Theorem 1.1].
Theorem 4.5 (Bentkus). Let $X_1, \dots, X_n$ be independent $\mathbb{R}^d$-valued random variables such that $\mathbb{E}[X_j] = 0$ for each $j \in [n]$. Let $S = X_1 + \cdots + X_n$ and assume that the covariance matrix of $S$ equals $1_d$. Let $g \sim N(0, 1_d)$ be a standard Gaussian vector in $\mathbb{R}^d$ with the same covariance matrix as $S$. Then, for any measurable convex set $A \subseteq \mathbb{R}^d$, we have
$$\bigl| \Pr[S \in A] - \Pr[g \in A] \bigr| \;\le\; c\, d^{1/4} \sum_{j=1}^n \mathbb{E}\|X_j\|_2^3$$
for some universal constant $c > 0$.

We also use the following standard tail bounds.

Lemma 4.6 (Gaussian tail bound). Let $g \sim N(0, 1_d)$. Then, for any $t > \sqrt{d}$, we have $\Pr[\|g\|_2 \ge t] \le e^{-(t - \sqrt{d})^2/2}$.

Lemma 4.7 (Hoeffding's inequality [17]). Let $X_1, \dots, X_n$ be independent real-valued random variables such that for each $i \in [n]$, $X_i \in [a_i, b_i]$ for some $a_i < b_i$. Let $S = X_1 + \cdots + X_n$. Then, for any $t > 0$,
$$\Pr\bigl[ |S - \mathbb{E}[S]| \ge t \bigr] \;\le\; 2\exp\Bigl( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \Bigr).$$

Proof of Lemma 4.4. Let $a \in \mathbb{C}^n$ be some vector. By homogeneity we may assume that $\|a\|_2 = 1$. Set $\eps = \|a\|_\infty$. For each $j \in [n]$ define the random vector $X_j \in \mathbb{R}^2$ by $X_j = \sqrt{2}\,[\Re(Z_j a_j), \Im(Z_j a_j)]^{\mathsf T}$ and note that $\|X_j\|_2 = \sqrt{2}\,|a_j| \le \sqrt{2}\,\eps$. Let $S = X_1 + \cdots + X_n$, and let $T \ge \sqrt{8}$ be some number to be set later. We have (4.1). We now analyze each integral separately. Notice that $\mathbb{E}[X_j] = 0$ and $\mathbb{E}[X_j X_j^{\mathsf T}] = |a_j|^2\, 1_2$. It follows that the covariance matrix of $S$ equals $1_2$. If we let $g \sim N(0, 1_2)$ be a standard Gaussian vector in $\mathbb{R}^2$, then it follows from Theorem 4.5 (for $d = 2$) that for any $t > 0$, the probabilities $\Pr[\|S\|_2 \ge t]$ and $\Pr[\|g\|_2 \ge t]$ differ by at most $O(\eps)$. Therefore, the first integral in (4.1) satisfies (4.3). Since $\|g\|_2^2$ is distributed according to a $\chi^2$ distribution (with two degrees of freedom), we have (4.4). Moreover, it follows from Lemma 4.6 (for $d = 2$) and our assumption on $T$ that (4.5), where we used that $(t - \sqrt{2})^2 \ge t^2/4$ for $t \ge \sqrt{8}$. Combining (4.3), (4.4), and (4.5), we obtain that the first integral in (4.1) satisfies (4.6). We now bound the second integral in (4.1), which is clearly nonnegative. The first coordinate $S_1$ is a sum of independent random variables, $\sqrt{2}\,\Re(Z_j a_j)$, which are centered and have magnitude at most $\sqrt{2}\,|a_j|$. Similarly, the same holds for $S_2$. Lemma 4.7 therefore gives (4.7), where in the last inequality we used the assumption $T \ge 1$. Now set $T = 8/\eps$.
Combining (4.1), (4.6), and (4.7) completes the proof of Lemma 4.4. The proof of Lemma 4.3 is nearly identical to that of Lemma 4.1, now based on Lemma 4.4 and the function $f : \mathbb{C}^n \to L_1(\{1, i, -1, -i\}^n)$ given by $f(a)(Z_1, \dots, Z_n) = a_1 Z_1 + \cdots + a_n Z_n$, where $\{1, i, -1, -i\}^n$ is endowed with the uniform probability measure.
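The constant $\sqrt{\pi}/2 \approx 0.886$ appearing in Lemma 4.4 can likewise be checked by exact enumeration over $\{1, i, -1, -i\}^n$ (our sketch; $n = 8$ keeps the enumeration at $4^8 = 65536$ points, so the value still carries a small finite-$n$ deviation):

```python
import itertools
import math

n = 8
roots = [1, 1j, -1, -1j]

# E |(Z_1 + ... + Z_n)/sqrt(n)| for independent uniform fourth roots of unity,
# i.e. E |<a, Z>| for the spread unit vector a = (1, ..., 1)/sqrt(n).
total = 0.0
for zs in itertools.product(roots, repeat=n):
    total += abs(sum(zs))
spread_val = total / (4 ** n * math.sqrt(n))

# For a standard basis vector the value is exactly E|Z_1| = 1.
print(spread_val)   # close to sqrt(pi)/2 = 0.8862...
```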

The non-commutative case
In this section we complete the proof of our main theorem (Theorem 1.2). The following lemma gives the linear matrix-valued map f mentioned in the introduction.
Lemma 5.1. Let $n$ be a positive integer and let $d = 2^{2n + \lceil n/2 \rceil}$. Then, there exists a linear operator $f : \mathbb{C}^n \to \mathbb{C}^{d \times d}$ such that for any vector $a \in \mathbb{C}^n$, we have
$$\|f(a)\|_{S_1} \;\le\; \sqrt{\frac{\|a\|_2^2 + \|a\|_4^2}{2}},$$
and for each standard basis vector $e_j$, we have $\|f(e_j)\|_{S_1} = 1$.
Theorem 1.2 now follows easily by combining the above lemma with Theorem 3.1. Indeed, Lemma 5.1 shows that the conditions of Theorem 3.1 hold for $\tau = 2^{-1/2}$, $\eta = 1$ and $\delta(\eps) = \sqrt{2}\,\eps$. It is therefore NP-hard to approximate the norm of a linear operator $F : L_2 \to L_1(S_1)$ to within a factor $1/\sqrt{2} + \eps$ for any $\eps > 0$. This implies the theorem because $L_1(S_1)$ embeds isometrically into $S_1$. To see the last fact, we use the map that takes a matrix-valued function $g$ on a finite measure space $U$ to a block-diagonal matrix with blocks proportional to $g(u)$ for $u \in U$, and use the fact that the (normalized) trace norm of a block-diagonal matrix is the average trace norm of the blocks.
The rest of this section is devoted to the proof of Lemma 5.1. For a vector $a \in \mathbb{C}^n$, define
$$\Lambda(a) \;=\; \sqrt{\|\Re(a)\|_2^2\,\|\Im(a)\|_2^2 - \langle \Re(a), \Im(a)\rangle^2}.$$
Note that this value is the area of the parallelogram in $\mathbb{R}^n$ generated by the vectors $\Re(a)$ and $\Im(a)$.
Lemma 5.2. Let $n$ be a positive integer and let $d' = 2^{\lceil n/2 \rceil}$. Then, there exists a linear operator $C : \mathbb{C}^n \to \mathbb{C}^{d' \times d'}$ such that for any vector $a \in \mathbb{C}^n$, we have
$$\|C(a)\|_{S_1} \;=\; \frac{\sqrt{\|a\|_2^2 + 2\Lambda(a)} + \sqrt{\|a\|_2^2 - 2\Lambda(a)}}{2}. \qquad(5.2)$$
Though we will not use it here, let us point out that the map $C$ becomes an isometric embedding if we restrict it to $\mathbb{R}^n$, since $\Lambda(a) = 0$ for real vectors.
Proof. We begin by defining a set of pairwise anti-commuting matrices as follows. The Pauli matrices are the four Hermitian matrices
$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
Using these we define $2\lceil n/2 \rceil$ matrices in $\mathbb{C}^{d' \times d'}$ by
$$C_{2j-1} = Z^{\otimes (j-1)} \otimes X \otimes I^{\otimes (\lceil n/2 \rceil - j)}, \qquad C_{2j} = Z^{\otimes (j-1)} \otimes Y \otimes I^{\otimes (\lceil n/2 \rceil - j)}$$
for each $j \in [\lceil n/2 \rceil]$. It is easy to verify that these matrices have trace zero, that they are Hermitian, unitary, and that they pairwise anti-commute. In particular, they satisfy $C_j^2 = I$. For a vector $a \in \mathbb{C}^n$ we define the map $C$ by $C(a) = a_1 C_1 + \cdots + a_n C_n$. Note that for a real vector $x \in \mathbb{R}^n$, the matrix $C(x)$ is Hermitian and that it satisfies $C(x)^2 = \|x\|_2^2\, I$. If a real vector $z \in \mathbb{R}^n$ is orthogonal to $x$, then by expanding the definitions of the matrices $C(x)$ and $C(z)$ and using the above properties we find that they anti-commute: $C(x)C(z) = -C(z)C(x)$. This shows that the matrix $C(x)C(z)$ is skew-Hermitian, which implies that it has purely imaginary eigenvalues. Since this matrix has trace zero and satisfies $(C(x)C(z))(C(x)C(z))^* = \|x\|_2^2\,\|z\|_2^2\, I$, half the eigenvalues equal $i\|x\|_2\|z\|_2$ and the other half equal $-i\|x\|_2\|z\|_2$.
We show that $C$ satisfies (5.2). Let $x = \Re(a)$ and $y = \Im(a)$, so that $C(a) = C(x) + iC(y)$. Write $y = y_{\parallel} + y_{\perp}$ where $y_{\parallel}$ is parallel to $x$ and $y_{\perp}$ is orthogonal to $x$. Then,
$$C(a)C(a)^* = \big(C(x) + iC(y)\big)\big(C(x) - iC(y)\big) = C(x)^2 + C(y)^2 + i\big(C(y)C(x) - C(x)C(y)\big) = \|a\|_2^2\, I - 2i\, C(x)C(y_{\perp}),$$
where in the last line we used the fact that $C(y_{\parallel})$ commutes with $C(x)$ while $C(y_{\perp})$ anti-commutes with $C(x)$. Using what we deduced above for the matrix $C(x)C(y_{\perp})$, we see that half of the eigenvalues of $C(a)C(a)^*$ equal $\|a\|_2^2 + 2\|x\|_2\|y_{\perp}\|_2$ and the other half equal $\|a\|_2^2 - 2\|x\|_2\|y_{\perp}\|_2$. Hence,
$$\|C(a)\|_{S_1} = \mathrm{Tr}\big((C(a)C(a)^*)^{1/2}\big) = \frac{d}{2}\Big(\sqrt{\|a\|_2^2 + 2\|x\|_2\|y_{\perp}\|_2} + \sqrt{\|a\|_2^2 - 2\|x\|_2\|y_{\perp}\|_2}\Big).$$
The claim now follows because $\|x\|_2\|y_{\perp}\|_2$ is precisely the area of the parallelogram generated by the vectors $x$ and $y$.
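The eigenvalue claim for $C(a)C(a)^*$ can be checked numerically in the smallest case $n = d = 2$, taking $C_1 = \sigma_x$ and $C_2 = \sigma_y$ (a toy anti-commuting pair of our choosing). For a $2\times 2$ matrix, having trace $2\|a\|_2^2$ and determinant $\|a\|_2^4 - 4\|x\|_2^2\|y_{\perp}\|_2^2$ pins the eigenvalues to exactly $\|a\|_2^2 \pm 2\|x\|_2\|y_{\perp}\|_2$.

```python
# Toy check (n = d = 2) with C_1 = sigma_x, C_2 = sigma_y (our choice).
a1, a2 = 0.3 + 0.7j, -1.1 + 0.25j
x = (a1.real, a2.real)
y = (a1.imag, a2.imag)
dot = lambda u, v: u[0] * v[0] + u[1] * v[1]
norm_a2 = dot(x, x) + dot(y, y)                       # ||a||_2^2
lam = (dot(x, x) * dot(y, y) - dot(x, y) ** 2) ** 0.5  # ||x|| * ||y_perp||

# C(a) = a1*sigma_x + a2*sigma_y as an explicit 2x2 matrix.
Ca = ((0, a1 - 1j * a2), (a1 + 1j * a2, 0))
Cad = tuple(tuple(Ca[j][i].conjugate() for j in range(2)) for i in range(2))
M = tuple(tuple(sum(Ca[i][k] * Cad[k][j] for k in range(2))
                for j in range(2)) for i in range(2))  # M = C(a) C(a)^*

tr = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
# Eigenvalues of M are ||a||^2 +- 2*||x||*||y_perp|| iff trace and det match.
assert abs(tr - 2 * norm_a2) < 1e-12
assert abs(det - (norm_a2 ** 2 - 4 * lam ** 2)) < 1e-12
```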
We denote the entry-wise product of two vectors $a, b \in \mathbb{C}^n$ by $a \circ b = (a_1 b_1, \ldots, a_n b_n)$.
Proposition 5.3. For every $a \in \mathbb{C}^n$ and $\omega$ chosen uniformly at random from $\{1, i, -1, -i\}^n$, we have $\mathbb{E}_\omega\big[\Lambda(a \circ \omega)^2\big] = \frac{1}{4}\big(\|a\|_2^4 - \sum_{j=1}^n |a_j|^4\big)$.

Proof. Write $a_j = |a_j|e^{i\phi_j}$ and $\omega_j = e^{i\psi_j}$ with $\psi_j$ uniform over $\{0, \pi/2, \pi, 3\pi/2\}$. With this it is easy to verify that
$$\Lambda(a \circ \omega)^2 = \Big(\sum_{j=1}^n |a_j|^2 \cos^2(\phi_j + \psi_j)\Big)\Big(\sum_{k=1}^n |a_k|^2 \sin^2(\phi_k + \psi_k)\Big) - \Big(\sum_{j=1}^n |a_j|^2 \cos(\phi_j + \psi_j)\sin(\phi_j + \psi_j)\Big)^2. \qquad (5.3)$$
By independence of $\psi_j$ and $\psi_k$ when $j \neq k$ and the elementary identities $\mathbb{E}[\cos^2(\phi_j + \psi_j)] = 1/2$, $\mathbb{E}[\sin^2(\phi_j + \psi_j)] = 1/2$ and $\mathbb{E}[\cos(\phi_j + \psi_j)\sin(\phi_j + \psi_j)] = 0$, the expectation of (5.3) equals $\frac{1}{4}\big(\|a\|_2^4 - \sum_{j=1}^n |a_j|^4\big)$.

We remark that in the above proof it suffices if $\omega \in \{1, i, -1, -i\}^n$ is chosen from a pairwise independent family. Using this in the proof below allows one to prove Lemma 5.1 with a smaller parameter $d$.
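Reading $\Lambda(b)$ as the area of the parallelogram spanned by $\Re(b)$ and $\Im(b)$, the expectation computed in the proof above evaluates (by our own calculation) to $\frac{1}{4}(\|a\|_2^4 - \sum_j |a_j|^4)$. The following brute-force enumeration over all $4^n$ choices of $\omega$ checks this for a small hand-picked $a$, and also confirms the observation used later that $\Lambda(e_j \circ \omega) = 0$ for standard basis vectors.

```python
from itertools import product

def lambda_sq(b):
    """Squared area of the parallelogram spanned by Re(b) and Im(b)."""
    x = [z.real for z in b]
    y = [z.imag for z in b]
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    return dot(x, x) * dot(y, y) - dot(x, y) ** 2

a = [0.5 + 1j, -1.2 + 0.3j, 0.8 - 0.6j]  # hand-picked test vector (ours)
omegas = list(product([1, 1j, -1, -1j], repeat=len(a)))

# E_omega[ Lambda(a o omega)^2 ] by exhaustive enumeration over 4^n points.
avg = sum(lambda_sq([aj * wj for aj, wj in zip(a, w)])
          for w in omegas) / len(omegas)
expected = (sum(abs(z) ** 2 for z in a) ** 2
            - sum(abs(z) ** 4 for z in a)) / 4
assert abs(avg - expected) < 1e-9

# Basis vectors: e_j o omega is purely real or purely imaginary, so Lambda = 0.
e1 = [1, 0, 0]
assert all(abs(lambda_sq([ej * wj for ej, wj in zip(e1, w)])) < 1e-12
           for w in omegas)
```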
Proof of Lemma 5.1. Let $C$ be the map given by Lemma 5.2. Define the map $f$ as follows, where $\omega$ ranges over $\{1, i, -1, -i\}^n$. By convexity of the square function, Jensen's inequality, and the fact that $\|a \circ \omega\|_2 = \|a\|_2$, we obtain the bound (5.4). Concavity of the square-root function, Jensen's inequality and Proposition 5.3 give that the expectation in (5.4) is at most the claimed quantity. For the second claim, observe that for any standard basis vector $e_j$ and $\omega \in \{1, i, -1, -i\}^n$, the vector $e_j \circ \omega$ is either purely real or purely imaginary. This implies $\Lambda(e_j \circ \omega) = 0$, and the second claim then follows from Lemma 5.2.

The real and Hermitian variants
We end this section by showing that our hardness result of Theorem 1.2 also holds for two variants of the Little NCG, the real variant and the Hermitian variant. Both variants were introduced (in the context of the "big" NCG) in [28], partly for the purpose of using them in applications. The real variant asks for the operator norm of a linear map $F$ from $\mathbb{R}^n$ to the space $\mathbb{R}^{d \times d}$ endowed with the Schatten-1 norm; in the Hermitian variant, the linear map is from $\mathbb{R}^n$ to the space $H_d \subseteq \mathbb{C}^{d \times d}$ of Hermitian matrices, again endowed with the Schatten-1 norm. In both cases the operator norm is given by $\|F\| = \sup_a \|F(a)\|_{S_1}$, with the supremum over real unit vectors $a$. Both the real and Hermitian variants follow directly by combining the lemma shown below and the real version of Theorem 3.1. Let us denote by $S^{d \times d} \subseteq \mathbb{R}^{d \times d}$ the space of real symmetric matrices. The lemma follows by applying the map $\rho$ of the elementary claim below to the restriction of the operator $f$ of Lemma 5.1 to $\mathbb{R}^n$.
Claim 5.5. For every positive integer $d$ there exists a map $\rho : \mathbb{C}^{d \times d} \to S^{4d \times 4d}$ such that for any matrix $A \in \mathbb{C}^{d \times d}$, we have $\|\rho(A)\|_{S_1} = \|A\|_{S_1}$. Moreover, $\rho$ is linear over the real numbers, that is, for any $\alpha \in \mathbb{R}$ and $A, B \in \mathbb{C}^{d \times d}$, we have $\rho(\alpha A) = \alpha \rho(A)$ and $\rho(A + B) = \rho(A) + \rho(B)$.
Proof. The proof follows by combining two standard transformations, taking complex matrices to Hermitian matrices and Hermitian matrices to real symmetric matrices, respectively. Let $A \in \mathbb{C}^{d \times d}$ be a matrix with singular values $\sigma_1 \geq \cdots \geq \sigma_d$. The first transformation is given by
$$A \mapsto \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}.$$
By [18, Theorem 7.3.3], the last matrix has eigenvalues $\sigma_1 \geq \cdots \geq \sigma_d \geq -\sigma_d \geq \cdots \geq -\sigma_1$. Notice that this transformation is linear over the reals since the adjoint is such. Let $B \in \mathbb{C}^{d \times d}$ be a Hermitian matrix with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_d$. The second transformation is given by
$$B \mapsto \begin{pmatrix} \Re(B) & -\Im(B) \\ \Im(B) & \Re(B) \end{pmatrix}.$$
Then the last matrix is symmetric and, by [18, 1.3.P20 (g), p. 71], it has the same eigenvalues as $B$ but with doubled multiplicities, that is, the matrix has eigenvalues $\lambda_1 \geq \lambda_1 \geq \cdots \geq \lambda_d \geq \lambda_d$. Notice that this transformation is also linear over the reals. Let $\rho$ be the composition of these maps. Then the matrix $\rho(A)$ has the same singular values as $A$ but with quadrupled multiplicities, which implies that $\|\rho(A)\|_{S_1} = \|A\|_{S_1}$, and $\rho$ is linear over the reals.
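A minimal numerical sketch of the composed map $\rho$ in the scalar case $d = 1$, assuming the second transformation is the standard realification $B \mapsto [[\Re(B), -\Im(B)], [\Im(B), \Re(B)]]$ (our reading, consistent with the cited eigenvalue fact): for a complex scalar $a$, the first step gives the Hermitian matrix $[[0, a], [\bar a, 0]]$, and the resulting $4\times 4$ matrix is real symmetric, traceless, and squares to $|a|^2 I$, so the singular value $|a|$ indeed appears with quadrupled multiplicity.

```python
# Toy check (d = 1, our reading of the two transformations above).
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

a = 0.6 - 0.8j
re, im = a.real, a.imag

# Step 1: a -> H = [[0, a], [conj(a), 0]] (Hermitian, eigenvalues +-|a|).
# Step 2: realification of H: [[Re(H), -Im(H)], [Im(H), Re(H)]].
rho = [
    [0,   re,  0,  -im],
    [re,  0,   im,  0],
    [0,   im,  0,   re],
    [-im, 0,   re,  0],
]

# rho is real symmetric ...
assert all(rho[i][j] == rho[j][i] for i in range(4) for j in range(4))
# ... traceless, and squares to |a|^2 I, so its eigenvalues are +-|a|, each
# with multiplicity 2: the singular value |a| with quadrupled multiplicity.
assert abs(sum(rho[i][i] for i in range(4))) < 1e-12
sq = matmul(rho, rho)
assert all(abs(sq[i][j] - (abs(a) ** 2 if i == j else 0)) < 1e-12
           for i in range(4) for j in range(4))
```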

Little versus big Grothendieck theorem
For completeness, we include here the well-known relation between the little and big Grothendieck problems. We focus on the non-commutative case; the commutative case is similar and can be found in, e. g., [32, Section 5]. This discussion clarifies how to derive Theorem 1.1 from Theorem 1.2. Consider a linear map $F : \mathbb{C}^n \to S_1^d$. A standard and easy-to-prove fact is that for two finite-dimensional Banach spaces $X, Y$, the operator norm of a linear map $G : X \to Y$ equals the norm of its adjoint $G^* : Y^* \to X^*$. As a result, $\|F\| = \|F^*\|$. Notice that since Hilbert space is self-dual and the dual of $S_1$ is the space $S_\infty$ of matrices endowed with the Schatten-$\infty$ norm (i. e., the maximum singular value), we have that $F^* : S_\infty^d \to \mathbb{C}^n$. In particular,
$$\|F\| = \|F^*\| = \sup_A \|F^*(A)\|_2,$$
where the supremum is taken over all $A$ of Schatten-$\infty$ norm at most 1. Equivalently, since any matrix with Schatten-$\infty$ norm at most 1 lies in the convex hull of the set of unitary matrices, we could take the supremum over all unitary matrices $A$. Next, recall that in the NCG problem we are given a bilinear form $T : \mathbb{C}^{d \times d} \times \mathbb{C}^{d \times d} \to \mathbb{C}$, and asked to compute $\mathrm{OPT}(T) = \sup_{A,B} |T(A, B)|$, where the supremum ranges over unitary matrices. Define the bilinear form $T(A, B) = \langle F^*(A), F^*(B) \rangle$. By Cauchy-Schwarz,
$$\mathrm{OPT}(T) = \sup_{A,B} \big|\langle F^*(A), F^*(B) \rangle\big| = \sup_A \|F^*(A)\|_2^2 = \|F\|^2,$$
where the supremum is over all unitary $A$, showing that the Little NCG is a special case of the "big" NCG.
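The last chain of equalities can be illustrated on a toy instance: fix an arbitrary linear map $F^*$ on $2\times 2$ matrices (our own choice, purely for illustration) and check, over a small finite family of unitaries, that $\max_{A,B} |T(A,B)|$ coincides with $\max_A \|F^*(A)\|_2^2$, exactly as Cauchy-Schwarz predicts (the maximum over pairs is attained on the diagonal $B = A$).

```python
from itertools import product
from cmath import exp, pi

def F_star(A):
    """An arbitrary linear map from 2x2 matrices to C^2 (toy choice of ours)."""
    return (A[0][0] + 2 * A[1][1], A[0][1] - 1j * A[1][0])

def inner(u, v):
    """Standard Hermitian inner product on C^2."""
    return sum(x * y.conjugate() for x, y in zip(u, v))

# A small finite family of 2x2 unitaries: diagonal and anti-diagonal
# phase matrices with third roots of unity as phases.
phases = [exp(2j * pi * k / 3) for k in range(3)]
unitaries = [((p, 0), (0, q)) for p, q in product(phases, repeat=2)]
unitaries += [((0, p), (q, 0)) for p, q in product(phases, repeat=2)]

T_max = max(abs(inner(F_star(A), F_star(B)))
            for A in unitaries for B in unitaries)
norm_max = max(abs(inner(F_star(A), F_star(A))) for A in unitaries)

# Cauchy-Schwarz gives T_max <= norm_max; taking B = A gives T_max >= norm_max.
assert abs(T_max - norm_max) < 1e-9
```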