New lower bounds for the border rank of matrix multiplication

The border rank of the matrix multiplication operator for n by n matrices is a standard measure of its complexity. Using techniques from algebraic geometry and representation theory, we show the border rank is at least 2n^2-n. Our bounds are better than the previous lower bound (due to Lickteig in 1985) of 3/2 n^2+ n/2 -1 for all n>2. The bounds are obtained by finding new equations that bilinear maps of small border rank must satisfy, i.e., new equations for secant varieties of triple Segre products, that matrix multiplication fails to satisfy.


Introduction and statement of results
Finding lower bounds in complexity theory is considered difficult. For example, chapter 14 of [1] is "Circuit lower bounds: Complexity theory's Waterloo". The complexity of matrix multiplication is roughly equivalent to the complexity of many standard operations in linear algebra, such as taking the determinant or inverse of a matrix. A standard measure of the complexity of an operation is the minimal length of a straight line program (or circuit) needed to perform it. Another measure is simply the number of multiplications performed. The exponent of matrix multiplication $\omega$ is defined to be $\lim_n \log_n$ of the arithmetic cost to multiply $n \times n$ matrices, or equivalently, $\lim_n \log_n$ of the minimal number of multiplications needed. (The result that these are equivalent justifies ignoring additions.) Determining the complexity of matrix multiplication is a central question of practical importance. We give new lower bounds for its complexity in terms of border rank.
The rank one bilinear maps are those that can be executed using just one scalar multiplication. The rank of a bilinear map $T$ is the smallest $r$ such that $T$ can be written as a sum of $r$ rank one bilinear maps. In other words, let $A, B, C$ be vector spaces, with dual spaces $A^*, B^*, C^*$, and let $T : A^* \times B^* \to C$ be a bilinear map. Then the rank of $T$ is the smallest $r$ such that there exist $a_1, \dots, a_r \in A$, $b_1, \dots, b_r \in B$, $c_1, \dots, c_r \in C$ such that $T(\alpha, \beta) = \sum_{i=1}^{r} a_i(\alpha) b_i(\beta) c_i$. The border rank of $T$ is the smallest $r$ such that $T$ can be written as a limit of a sequence of bilinear maps of rank $r$. Let $R(T)$ denote the border rank of $T$.
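As an illustration of the definition of rank (not taken from the paper's argument), the following Python sketch checks that Strassen's classical seven products express $2 \times 2$ matrix multiplication as a sum of seven rank one bilinear maps, so its rank, and hence its border rank, is at most 7. The names `STRASSEN`, `matmul_tensor` and `sum_of_rank_one` are hypothetical helpers introduced only for this example.

```python
from itertools import product

# Strassen's seven products for 2x2 matrices, written as triples (u, v, w):
# each product is (sum_i u[i]*x[i]) * (sum_j v[j]*y[j]) and contributes
# w[k] times that product to output entry k.
# Entries are indexed row-major: 0 -> (1,1), 1 -> (1,2), 2 -> (2,1), 3 -> (2,2).
STRASSEN = [
    ((1, 0, 0, 1), (1, 0, 0, 1), (1, 0, 0, 1)),    # (x11+x22)(y11+y22)
    ((0, 0, 1, 1), (1, 0, 0, 0), (0, 0, 1, -1)),   # (x21+x22) y11
    ((1, 0, 0, 0), (0, 1, 0, -1), (0, 1, 0, 1)),   # x11 (y12-y22)
    ((0, 0, 0, 1), (-1, 0, 1, 0), (1, 0, 1, 0)),   # x22 (y21-y11)
    ((1, 1, 0, 0), (0, 0, 0, 1), (-1, 1, 0, 0)),   # (x11+x12) y22
    ((-1, 0, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1)),   # (x21-x11)(y11+y12)
    ((0, 1, 0, -1), (0, 0, 1, 1), (1, 0, 0, 0)),   # (x12-x22)(y21+y22)
]


def matmul_tensor(n):
    """Coefficients t[i][j][k]: output entry k receives t[i][j][k] * x[i] * y[j]."""
    size = n * n
    t = [[[0] * size for _ in range(size)] for _ in range(size)]
    for r, s, c in product(range(n), repeat=3):
        # (XY)[r][c] += X[r][s] * Y[s][c]
        t[r * n + s][s * n + c][r * n + c] = 1
    return t


def sum_of_rank_one(terms, size):
    """Sum of the rank one tensors u (x) v (x) w over all triples in terms."""
    t = [[[0] * size for _ in range(size)] for _ in range(size)]
    for u, v, w in terms:
        for i, j, k in product(range(size), repeat=3):
            t[i][j][k] += u[i] * v[j] * w[k]
    return t


# True exactly when the seven rank one terms add up to 2x2 matrix multiplication.
strassen_ok = sum_of_rank_one(STRASSEN, 4) == matmul_tensor(2)
```

The check only certifies an upper bound on rank; lower bounds, which are the subject of this paper, require equations rather than explicit decompositions.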
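Our results are as follows. A consequence of Theorem 1.1, stated in the abstract, is that $R(M_{n,n,n}) \ge 2n^2 - n$. Thus for $3 \times 3$ matrices, the state of the art is $15 \le R(M_{3,3,3}) \le 21$; the upper bound is due to Schönhage [11].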
Our results include other bounds that might be better asymptotically (Theorem 1.3). For example, in the case $m = n$ the bound is expressed in terms of $q = 2\lceil n(n+1)/4 \rceil - 1$ and $p = \lceil q/2 \rceil - 1$. For small values, Theorem 1.1 gives better bounds, but it may be the case that asymptotically the bounds in Theorem 1.3 are better, although this does not appear to be the case up to $n = 270$. Independent of matrix multiplication, determining the limiting values gives rise to interesting questions in asymptotic representation theory.
Remark 1.5. The best lower bounds for the rank of matrix multiplication (in this remark $R$ denotes rank rather than border rank) are $R(M_{n,m,l}) \ge lm + mn + l - m + n - 3$, $R(M_{n,n,l}) \ge 2ln - l + 2n - 2$, and $R(M_{n,n,n}) \ge \frac{5}{2}n^2 - 3n$. These are all due to Bläser; the first two are in [3], and the third is in [2].
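For orientation, the bounds quoted so far can be tabulated. The Python sketch below uses only the formulas stated in the abstract (Lickteig's bound and the new border rank bound) and in Remark 1.5 (Bläser's rank bound for square matrices); the function names are hypothetical.

```python
from fractions import Fraction


def lickteig_bound(n):
    """Previous border rank lower bound: 3/2 n^2 + n/2 - 1."""
    return Fraction(3, 2) * n * n + Fraction(n, 2) - 1


def new_border_rank_bound(n):
    """Border rank lower bound of this paper: 2 n^2 - n."""
    return 2 * n * n - n


def blaser_rank_bound(n):
    """Rank (not border rank) lower bound from Remark 1.5: 5/2 n^2 - 3n."""
    return Fraction(5, 2) * n * n - 3 * n


# One row per n: (n, Lickteig, new border rank bound, Blaeser rank bound).
table = [(n, lickteig_bound(n), new_border_rank_bound(n), blaser_rank_bound(n))
         for n in range(2, 8)]
```

The difference between the new bound and Lickteig's is $(n-1)(n-2)/2$, which vanishes at $n = 2$ and is positive for all $n > 2$, in agreement with the abstract.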
Our bounds come from explicit equations that bilinear maps of low border rank must satisfy. These equations are best expressed in the language of tensors. Our method is similar in nature to the method used by Strassen to get his lower bounds: we find explicit polynomials that tensors of low border rank must satisfy, and show that matrix multiplication fails to satisfy them. Strassen found his equations via linear algebra, by taking the commutator of certain matrices. We found ours using representation theory and algebraic geometry. (Algebraic geometry is not needed for presenting the results. For its role in our method see [7].) More precisely, in §3 we define, for every $p$, a linear map $(M_{m,n,l})^{\wedge p}_A$, and we prove that $R(M_{m,n,l}) \ge \binom{mn-1}{p}^{-1} \operatorname{rank}\,(M_{m,n,l})^{\wedge p}_A$. The above-mentioned equations are the minors of the linear map $(M_{m,n,l})^{\wedge p}_A$. We then compute the rank of $(M_{m,n,l})^{\wedge p}_A$ with the help of representation theory: we explicitly describe the kernel as a sum of irreducible representations labeled by certain Young diagrams. Equation (4) is obtained in §4 by expressing the kernel of $(M_{m,n,l})^{\wedge p}_A$ as the last term of an exact sequence, and then computing the alternating sum of the dimensions of the spaces involved, and the proof of (5) is similar.
Remark 1.6. Viewing matrix multiplication as a map $\mathbb{C}^{3n^2} \to \mathbb{C}$, it is generally expected in the computer science community that the border rank admits a lower bound growing asymptotically like $3n^2$, the "input size". On the other hand it is also conjectured that $R(M_{n,n,n})$ grows like $O(n^2)$.
A truly significant lower bound would be a function that grew like $3n^2 h(n)$ where $h$ is an increasing function. No such super-linear lower bound on the complexity of any explicit tensor (or any computational problem) is known; see [1,14].
From a mathematician's perspective, all known equations for secant varieties of Segre varieties that have a geometric model arise by translating multi-linear algebra to linear algebra, and it appears that the limit of this technique is roughly the input size.
Remark 1.7. The methods used here should be applicable to lower bound problems coming from the Geometric Complexity Theory (GCT) introduced by Mulmuley and Sohoni [9], in particular to separate the determinant (small weakly skew circuits) from polynomials with small formulas (small tree circuits).
Overview. In §2 we describe the new equations to test for border rank in the language of tensors. In §3 we apply these equations to matrix multiplication. Theorems 1.3 and 1.1 are respectively proved in §4 and §5. We conclude in §6 with a review of Lickteig's method for purposes of comparison.

The new equations
Let $A, B, C$ be complex vector spaces of dimensions $a, b, c$, with $b \le c$, and with dual vector spaces $A^*, B^*, C^*$. Then $A \otimes B \otimes C$ may be thought of as the space of bilinear maps $A^* \times B^* \to C$. We work in projective space, as the objects we are interested in are invariant under rescaling.
Let $\mathrm{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C) \subset \mathbb{P}(A \otimes B \otimes C)$ denote the Segre variety of rank one tensors and let $\sigma_r(\mathrm{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$ denote its $r$-th secant variety, the variety of tensors of border rank at most $r$.
The most naïve equations for $\sigma_r(\mathrm{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$ are the so-called flattenings. Given $T \in A \otimes B \otimes C$, consider $T_B : B^* \to A \otimes C$ as a linear map. Then $R(T) \ge \operatorname{rank}(T_B)$, and similarly for cyclic permutations of $A, B, C$. The rank of a linear map is determined by taking minors.
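To make the flattening bound concrete, here is a small Python sketch that forms the matrix of $T_B$ for a tensor stored as nested lists and computes its rank exactly. The storage convention `t[i][j][k]` and the helpers `matrix_rank`, `flatten_B` and `rank_one` are assumptions made for this example only.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of an integer or rational matrix by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def flatten_B(t):
    """Matrix of T_B : B* -> A (x) C, one row per basis vector of B*."""
    a, b, c = len(t), len(t[0]), len(t[0][0])
    return [[t[i][j][k] for i in range(a) for k in range(c)] for j in range(b)]


def rank_one(u, v, w):
    """The tensor u (x) v (x) w as nested lists t[i][j][k]."""
    return [[[x * y * z for z in w] for y in v] for x in u]


# A rank one tensor has flattening rank 1.
single = rank_one((1, 2), (3, 4), (5, 6))
# a_1 (x) b_1 (x) c_1 + a_2 (x) b_2 (x) c_2 has flattening rank 2.
diagonal = [[[1 if i == j == k else 0 for k in range(2)] for j in range(2)]
            for i in range(2)]

flattening_ranks = (matrix_rank(flatten_B(single)), matrix_rank(flatten_B(diagonal)))
```

Since $\operatorname{rank}(T_B) \le b$, a flattening can never certify border rank above $\max(a, b, c)$, which is why the refined flattenings of this section are needed.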
In [7] we proposed a generalization of flattenings, called Young flattenings, which in the present context is as follows. Recall that irreducible polynomial representations of the general linear group $GL(A)$ correspond to partitions $\pi = (\pi_1, \dots, \pi_a)$. Let $S_\pi A$ denote the corresponding $GL(A)$-module. Consider representations $S_\pi A$, $S_\mu B$, $S_\nu C$, and the identity maps $\mathrm{Id}_{S_\pi A} \in S_\pi A \otimes S_\pi A^*$, etc. Then we may consider $T \otimes \mathrm{Id}_{S_\pi A} \otimes \mathrm{Id}_{S_\mu B} \otimes \mathrm{Id}_{S_\nu C}$. We may decompose $S_\pi A \otimes A$ according to the Pieri rule and project to one irreducible component, say $S_{\tilde\pi} A$, where $\tilde\pi$ is obtained by adding a box to $\pi$, and similarly for $C$, while for $B$ we may decompose $S_\mu B^* \otimes B$ and project to one irreducible component, say $S_{\tilde\mu} B^*$, where $\tilde\mu$ is obtained by deleting a box from $\mu$. The upshot is a tensor $T' \in S_{\tilde\pi} A \otimes S_\mu B \otimes S_{\tilde\nu} C \otimes S_\pi A^* \otimes S_{\tilde\mu} B^* \otimes S_\nu C^*$, which we may then consider as a linear map, and rank conditions on $T'$ often give border rank conditions on $T$.
Strassen's equations [12] may be understood in this framework. As described in [10], tensor $T$ with $\mathrm{Id}_A$ and project to obtain a map $T^{\wedge 1}_A : B^* \otimes A \to \wedge^2 A \otimes C$. If $T$ is generic, then one can show that $T^{\wedge 1}_A$ will have maximal rank, and if $T = a \otimes b \otimes c$ is of rank one, $\operatorname{rank}((a \otimes b \otimes c)^{\wedge 1}_A) = a - 1$. Thus the best bound one could hope for with this technique is up to $r = \frac{ba}{a-1}$. The minors of order $r(a-1) + 1$ of $T^{\wedge 1}_A$ give equations for $\sigma_r(\mathrm{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$. This is most effective when $a = 3$. When $a > 3$, for each 3-plane $A' \subset A$, consider the restriction $T|_{A' \otimes B \otimes C}$ and the corresponding equations, to obtain modules of equations for $\sigma_r(\mathrm{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$ for $r \le \frac{3b}{2}$. This procedure is called inheritance.
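We consider the next simplest cases, the maps $T^{\wedge p}_A : B^* \otimes \wedge^p A \to \wedge^{p+1} A \otimes C$. If $T = a \otimes b \otimes c$ is of rank one, then $\operatorname{rank}((a \otimes b \otimes c)^{\wedge p}_A) = \binom{a-1}{p}$. To see this, expand $a = a_1$ to a basis $a_1, \dots, a_a$ of $A$ with dual basis $\alpha_1, \dots, \alpha_a$ of $A^*$. Then $T^{\wedge p}_A = \sum_{1 < i_1 < \cdots < i_p} [\alpha_{i_1} \wedge \cdots \wedge \alpha_{i_p} \otimes b] \otimes [a_1 \wedge a_{i_1} \wedge \cdots \wedge a_{i_p} \otimes c]$, so the image is isomorphic to $\wedge^p(A/a_1) \otimes c$.
Remark 2.1. Alternatively, one can compute the rank using the vector bundle techniques of [7].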
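The rank count for rank one tensors can be checked numerically. The Python sketch below builds the matrix of the map sending $a_I \otimes \beta_j$ to $\sum_{i,k} T_{ijk}\, a_i \wedge a_I \otimes c_k$ in the standard bases and verifies that a rank one tensor gives rank $\binom{a-1}{p}$. The storage convention `t[i][j][k]` and all function names are assumptions of this example, and the sign convention for $a_i \wedge a_I$ is one of several equivalent choices.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def matrix_rank(rows):
    """Rank of an integer or rational matrix by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def koszul_flattening(t, p):
    """Matrix of the p-th exterior flattening: columns (I, j), rows (J, k)."""
    a, b, c = len(t), len(t[0]), len(t[0][0])
    sources = list(combinations(range(a), p))
    targets = {J: idx for idx, J in enumerate(combinations(range(a), p + 1))}
    matrix = [[0] * (len(sources) * b) for _ in range(len(targets) * c)]
    for col_block, I in enumerate(sources):
        for i in range(a):
            if i in I:
                continue  # a_i wedge a_I vanishes when i already occurs in I
            # Sign from moving a_i past the basis vectors of I with smaller index.
            sign = (-1) ** sum(1 for s in I if s < i)
            J = tuple(sorted(I + (i,)))
            for j, k in product(range(b), range(c)):
                matrix[targets[J] * c + k][col_block * b + j] += sign * t[i][j][k]
    return matrix


def rank_one(u, v, w):
    """The tensor u (x) v (x) w as nested lists t[i][j][k]."""
    return [[[x * y * z for z in w] for y in v] for x in u]


def tensor_sum(s, t):
    """Entrywise sum of two tensors of the same shape."""
    return [[[x + y for x, y in zip(rs, rt)] for rs, rt in zip(ms, mt)]
            for ms, mt in zip(s, t)]


# Example with a = 5, b = c = 2: rank one tensors give rank C(4, p).
t1 = rank_one((1, 2, 0, -1, 3), (1, 1), (2, 3))
t2 = rank_one((0, 1, 1, 0, 2), (1, -1), (1, 0))
rank_one_ranks = {p: matrix_rank(koszul_flattening(t1, p)) for p in (1, 2)}
```

By subadditivity of matrix rank, a sum of $r$ rank one tensors gives a map of rank at most $r\binom{a-1}{p}$, which is the source of the lower bound $R(T) \ge \operatorname{rank}(T^{\wedge p}_A) / \binom{a-1}{p}$.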
When $T$ is generic, we expect $T^{\wedge p}_A$ to be injective, thus potentially obtaining modules of equations up to $r = \binom{a}{p} b \big/ \binom{a-1}{p} = \frac{ab}{a-p}$. Since this is an increasing function of $p$, one gets the most equations taking $p$ equal to its maximal value, $p = \lceil a/2 \rceil - 1$, and again by inheritance, one potentially obtains new modules of equations up to roughly $\sigma_{2b}(\mathrm{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$. A consequence of Theorem 1.1 is that this is indeed the case. For example, since our equations when $a = n^2$ are nontrivial up to at least $2n^2 - n$, we obtain: We record the following proposition which follows from Stirling's formula and the discussion above.
and determining their precise module structure (i.e., which irreducible submodules of (7) actually contribute nontrivial equations) appears to be difficult. Theorem 1.1 is obtained by applying the inheritance principle to the case of an $(n+m-1)$-plane $A' \subset A = \mathbb{C}^{nm}$.
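The dimension count behind this discussion is easy to reproduce. The Python sketch below assumes that the range of the equations is the dimension of $\wedge^p A \otimes B^*$ divided by the rank $\binom{a-1}{p}$ contributed by a rank one tensor, which simplifies to $ab/(a-p)$. It also evaluates the same ratio for an $(n+m-1)$-plane $A'$ with $p = n-1$ and $b = nl$; these last values are read off from the proof of Theorem 1.1 and should be treated as assumptions of the example, as should all function names.

```python
from fractions import Fraction
from math import comb


def equation_range(a, b, p):
    """dim(wedge^p A (x) B*) divided by the rank of a rank one tensor."""
    return Fraction(comb(a, p) * b, comb(a - 1, p))


def maximal_p(a):
    """The value p = ceil(a/2) - 1 used in the text."""
    return (a + 1) // 2 - 1


def restricted_bound(m, n, l):
    """Same ratio for an (n+m-1)-plane A', with p = n-1 and b = n*l (assumed)."""
    return equation_range(n + m - 1, n * l, n - 1)


# The ratio equals ab/(a-p); here a = b = 9 and p runs up to its maximal value 4.
ranges_for_a9 = [equation_range(9, 9, p) for p in range(maximal_p(9) + 1)]
# For m = n = l the restricted ratio reproduces the bound 2n^2 - n of the abstract.
square_bounds = [restricted_bound(n, n, n) for n in range(2, 7)]
```

For $m = n = l$ the restricted ratio is $n^2(2n-1)/n = 2n^2 - n$, and in general it is $nl(n+m-1)/m$.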

Matrix multiplication
Let $M, N, L$ be vector spaces of dimensions $m, n, l$. Write $A = M \otimes N^*$, $B = N \otimes L^*$, $C = L \otimes M^*$, so $a = mn$, $b = nl$, $c = ml$. The matrix multiplication operator is $M_{m,n,l} = \mathrm{Id}_M \otimes \mathrm{Id}_N \otimes \mathrm{Id}_L \in A \otimes B \otimes C$. Let $U = N^*$. We compute the kernel of the map $\psi_p : \wedge^p(M \otimes U) \otimes U \to \wedge^{p+1}(M \otimes U) \otimes M^*$.
For a partition $\pi = (\pi_1, \dots, \pi_N)$, let $\ell(\pi)$ denote the number of parts of $\pi$, i.e., the largest $k$ such that $\pi_k > 0$. Let $\pi'$ denote the conjugate partition to $\pi$.
We show that all other modules in $\wedge^p(M \otimes U) \otimes U$ are not in the kernel by computing $\psi_p$ at weight vectors. Here $(u^i)$ is the dual basis to $(u_i)$ and similarly for $(m_\alpha)$ and $(m^\alpha)$, and the summation convention is used throughout. The $c_\tau$'s are Young symmetrizers, and if we write $\mu = (q_1, \dots, q_1, q_2, \dots, q_2, \dots, q_f, \dots, q_f)$, with $q_j$ appearing $s_j$ times, then $\epsilon = (0, \dots, 0, 1, 0, \dots, 0)$ where the 1 can be in the slots $1, s_1 + 1, \dots, s_1 + \cdots + s_f + 1$, the last only if $s_1 + \cdots + s_f < n$. Now thinking of $\psi_p : \wedge^p A \otimes U \to \wedge^{p+1} A \otimes M^*$ as a linear map and recalling Schur's lemma, it is clear that if $\mu + \epsilon = \nu$ and $\nu'$ is of the form $\mu' + \epsilon'$ where $\epsilon'$, similar to $\epsilon$, has a one in any slot where there is a jump in the partition (i.e., as allowed by the Pieri rule), then the map is the identity on the corresponding module, and otherwise the map is zero. There are corresponding modules except when the modules are as in the statement of the lemma.
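Remark 3.2. If we let $B' = U$, $C' = M$, then in the proof above we are really just computing the rank of $(T')^{\wedge p}_A$ where $T' \in A \otimes B' \otimes C'$ is $\mathrm{Id}_U \otimes \mathrm{Id}_M$. The maximal border rank of a tensor $T$ in $\mathbb{C}^{mn} \otimes \mathbb{C}^m \otimes \mathbb{C}^n$ is $mn$, which occurs anytime the map $T : (\mathbb{C}^{mn})^* \to \mathbb{C}^m \otimes \mathbb{C}^n$ is injective, so $T'$ is a generic tensor in $A \otimes B' \otimes C'$, and the calculation of $\operatorname{rank} \psi_p$ is determining the maximal rank of $(T')^{\wedge p}_A$ for a generic element of $\mathbb{C}^{mn} \otimes \mathbb{C}^n \otimes \mathbb{C}^m$.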
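The description of the matrix multiplication operator as $\mathrm{Id}_M \otimes \mathrm{Id}_N \otimes \mathrm{Id}_L$ at the start of this section can be made concrete: in coordinates it is the trilinear form $(X, Y, Z) \mapsto \operatorname{trace}(XYZ)$ for $X$ of size $m \times n$, $Y$ of size $n \times l$ and $Z$ of size $l \times m$. The Python sketch below stores the tensor sparsely and checks this identity on one example; the dictionary encoding and the function names are assumptions made for illustration.

```python
from itertools import product


def matmul_tensor(m, n, l):
    """Nonzero coefficients of the operator, keyed by index pairs in A, B, C."""
    return {((i, j), (j, k), (k, i)): 1
            for i, j, k in product(range(m), range(n), range(l))}


def evaluate(tensor, X, Y, Z):
    """Evaluate the trilinear form defined by the tensor at matrices X, Y, Z."""
    return sum(coef * X[a[0]][a[1]] * Y[b[0]][b[1]] * Z[c[0]][c[1]]
               for (a, b, c), coef in tensor.items())


def matmul(P, Q):
    """Ordinary matrix product of nested-list matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


# Example with (m, n, l) = (2, 3, 2).
X = [[1, 2, 3], [4, 5, 6]]
Y = [[1, 0], [2, 1], [0, 3]]
Z = [[1, 2], [3, 4]]
T = matmul_tensor(2, 3, 2)
agrees = evaluate(T, X, Y, Z) == trace(matmul(matmul(X, Y), Z))
```

The tensor has exactly $mnl$ nonzero coefficients, one for each scalar multiplication in the standard algorithm, so its rank is at most $mnl$.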

Proof of formula (4)
We compute the dimension of $\ker (M_{m,n,l})^{\wedge p}_A = \ker \psi_p \otimes \mathrm{Id}_L$ via an exact sequence. We continue the notation above. Consider the map $\psi_{p,2}$.
Lemma 4.1. $\operatorname{Image} \psi_{p,2} = \ker \psi_p$.
Proof. A given module in the source with $\nu_n = 0$ maps to $S_{\pi + (1)} U \otimes S_{\pi'} M \subset S_\pi U \otimes U \otimes S_{\pi'} M$ where $\pi = (m, \nu_1, \dots, \nu_{n-1})$; the proof is similar to the proof of Lemma 3.1. Its other components map to zero.
The kernel of $\psi_{p,2}$ is the image of $T \otimes m_1 \wedge \cdots \wedge m_m \otimes m \otimes u^{m+2} \to T \wedge (m \otimes u) \otimes m_1 \wedge \cdots \wedge m_m \otimes u^{m+1}$, and $\psi_{p,3}$ has kernel the image of $T \otimes m_1 \wedge \cdots \wedge m_m \otimes m^2 \otimes u^{m+3} \to T \wedge (m \otimes u) \otimes m_1 \wedge \cdots \wedge m_m \otimes m \otimes u^{m+2}$. One defines analogous maps $\psi_{p,k}$. By taking the Euler characteristic we obtain formula (4), and the first part of Theorem 1.3 follows. The second part is proved by making an identification $M \simeq U$ and restricting to $A' = S^2 U \subset U \otimes U$ or $A' = S^2_0 U$, the traceless symmetric matrices. The second part corresponds to taking $\dim A' = q$.

Proof of Theorem 1.1
The essential idea is to choose a subspace $A' \subset M \otimes U$ on which the "restriction" of $\psi_p$ becomes injective. Take a vector space $W$ of dimension 2, and fix isomorphisms $U \simeq S^{n-1} W^*$, $M \simeq S^{m-1} W^*$. Recall that $S^\alpha W$ may be interpreted as the space of homogeneous polynomials of degree $\alpha$ in two variables. If $f \in S^\alpha W$ and $g \in S^\beta W^*$ then we can perform the contraction $g \cdot f \in S^{\alpha - \beta} W$.
In the case where $f = l^\alpha$ is the power of a linear form $l$, the contraction $g \cdot l^\alpha$ equals $l^{\alpha - \beta}$ multiplied by the value of $g$ at the point $l$, so that (for $\beta \le \alpha$) $g \cdot l^\alpha = 0$ if and only if $l$ is a root of $g$.
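The contraction identity can be verified on a small example. The Python sketch below models the contraction by letting $g$ act as a constant-coefficient differential operator on binary forms; with this convention (an assumption of the example, as are all function names) one gets $g \cdot l^\alpha = \frac{\alpha!}{(\alpha-\beta)!}\, g(l)\, l^{\alpha-\beta}$, which agrees with the statement above up to a nonzero scalar.

```python
from math import comb, factorial


def power_of_linear_form(a, b, alpha):
    """Coefficients of (a*x + b*y)**alpha as {(i, j): coeff} for x**i * y**j."""
    return {(i, alpha - i): comb(alpha, i) * a ** i * b ** (alpha - i)
            for i in range(alpha + 1)}


def falling(n, k):
    """n * (n-1) * ... * (n-k+1), the factor produced by differentiating k times."""
    return factorial(n) // factorial(n - k)


def contract(g, f):
    """Apply g = sum g[(i, j)] (d/dx)^i (d/dy)^j to the polynomial f."""
    out = {}
    for (gi, gj), gc in g.items():
        for (fi, fj), fc in f.items():
            if fi >= gi and fj >= gj:
                key = (fi - gi, fj - gj)
                out[key] = out.get(key, 0) + gc * fc * falling(fi, gi) * falling(fj, gj)
    return {key: val for key, val in out.items() if val != 0}


def evaluate(g, a, b):
    """Value of g, viewed as a polynomial in the dual variables, at the point (a, b)."""
    return sum(c * a ** i * b ** j for (i, j), c in g.items())


def scale(f, s):
    """Multiply a polynomial by a scalar, dropping zero coefficients."""
    return {key: s * val for key, val in f.items() if s * val != 0}


# l = x + 2y, alpha = 4, beta = 2.
l4 = power_of_linear_form(1, 2, 4)
l2 = power_of_linear_form(1, 2, 2)
g_nonroot = {(2, 0): 1, (1, 1): 1}    # s^2 + s*t, value 3 at the point (1, 2)
g_root = {(2, 0): 2, (1, 1): -1}      # 2s^2 - s*t = s(2s - t), vanishes at (1, 2)

nonroot_case = contract(g_nonroot, l4) == scale(l2, falling(4, 2) * evaluate(g_nonroot, 1, 2))
root_case = contract(g_root, l4) == {}
```

This vanishing criterion is what drives the surjectivity argument below, where $g$ is chosen to vanish at $l_1, \dots, l_{n-1}$ but not at $l$.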
Consider the natural skew-symmetrization map. Recall that representation theory distinguishes a complement $A''$ to $A'$, so the projection $M \otimes U \to A'$ is well defined. Compose (12) with the projection. Now (14) is equivalent to a map (15). We claim (15) is injective. (Note that when $n = m$ the source and target space of (15) are dual to each other.)
Consider the transposed map $S^{m-1} W^* \otimes \wedge^n S^{m+n-2} W \to S^{n-1} W \otimes \wedge^{n-1} S^{m+n-2} W$. It is defined as follows on decomposable elements (and then extended by linearity): $g \otimes (f_1 \wedge \cdots \wedge f_n) \mapsto \sum_{i=1}^{n} (-1)^{i-1} (g \cdot f_i) \otimes f_1 \wedge \cdots \wedge \hat{f_i} \wedge \cdots \wedge f_n$. We show this dual map is surjective. Let $l^{n-1} \otimes (l_1^{m+n-2} \wedge \cdots \wedge l_{n-1}^{m+n-2}) \in S^{n-1} W \otimes \wedge^{n-1} S^{m+n-2} W$ with $l_i \in W$. Such elements span the target, so it will be sufficient to show any such element is in the image. Assume first that $l$ is distinct from the $l_i$. Since $n \le m$, there is a polynomial $g \in S^{m-1} W^*$ which vanishes on $l_1, \dots, l_{n-1}$ and is nonzero on $l$. Then, up to a nonzero scalar, $g \otimes (l_1^{m+n-2} \wedge \cdots \wedge l_{n-1}^{m+n-2} \wedge l^{m+n-2})$ maps to our element. Since the image is closed (being a linear space), the condition that $l$ is distinct from the $l_i$ may be removed by taking limits.
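Finally, $\psi'_p \otimes \mathrm{Id}_L$ is the map induced from the restricted matrix multiplication operator, and we may repeat the arguments of §3. To complete the proof of Theorem 1.1, observe that an element of rank one in $A' \otimes B \otimes C$ induces a map of rank $\binom{n+m-2}{n-1}$. So the border rank of the multiplication operator must be at least $nl \binom{n+m-1}{n-1} \big/ \binom{n+m-2}{n-1} = \frac{nl(n+m-1)}{m}$.
Lickteig's bound follows in three steps. The first combines two standard facts from algebraic geometry: for varieties $X, Y \subset \mathbb{P}V$, let $J(X, Y) \subset \mathbb{P}V$ denote the join of $X$ and $Y$. Then $\sigma_{r+s}(X) = J(\sigma_r(X), \sigma_s(X))$. If $X = \mathrm{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)$ is a Segre variety, then $\sigma_s(\mathrm{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)) \subseteq \mathrm{Sub}_s(A \otimes B \otimes C)$, where $\mathrm{Sub}_s(A \otimes B \otimes C)$ denotes the variety of tensors $T$ for which there exist subspaces $A' \subset A$, $B' \subset B$, $C' \subset C$, each of dimension $s$, with $T \in A' \otimes B' \otimes C'$. See, e.g., [6] for details. (The proofs of these facts form the bulk of the paper.) Next Lickteig observes that if $T \in \sigma_{r+s}(\mathrm{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$, then there exist $A', B', C'$ each of dimension $s$ such that, thinking of $T : A^* \otimes B^* \to C$, (16) holds. Finally, for matrix multiplication, with $A = M \otimes N^*$ etc., he defines $M' \subset M$, $N^{*\prime} \subset N^*$ to be the smallest spaces such that $A' \subseteq M' \otimes N^{*\prime}$, and similarly for the other spaces. Then one applies (16) combined with the observation that $M|_{(A')^\perp \otimes B^*} \subseteq M' \otimes L^*$ etc., and keeps track of the various bounds to conclude.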