On The Hardness of Approximate and Exact (Bichromatic) Maximum Inner Product

In this paper we study the (Bichromatic) Maximum Inner Product Problem (Max-IP), in which we are given sets $A$ and $B$ of vectors, and the goal is to find $a \in A$ and $b \in B$ maximizing inner product $a \cdot b$. Max-IP is very basic and serves as the base problem in the recent breakthrough of [Abboud et al., FOCS 2017] on hardness of approximation for polynomial-time problems. It is also used (implicitly) in the argument for hardness of exact $\ell_2$-Furthest Pair (and other important problems in computational geometry) in poly-log-log dimensions in [Williams, SODA 2018]. We have three main results regarding this problem. First, we study the best multiplicative approximation ratio for Boolean Max-IP in sub-quadratic time. We show that, for Max-IP with two sets of $n$ vectors from $\{0,1\}^{d}$, there is an $n^{2 - \Omega(1)}$ time $\left( d/\log n \right)^{\Omega(1)}$-multiplicative-approximating algorithm, and we show this is conditionally optimal, as such a $\left(d/\log n\right)^{o(1)}$-approximating algorithm would refute SETH. Second, we achieve a similar characterization for the best additive approximation error to Boolean Max-IP. We show that, for Max-IP with two sets of $n$ vectors from $\{0,1\}^{d}$, there is an $n^{2 - \Omega(1)}$ time $\Omega(d)$-additive-approximating algorithm, and this is conditionally optimal, as such an $o(d)$-approximating algorithm would refute SETH [Rubinstein, STOC 2018]. Last, we revisit the hardness of solving Max-IP exactly for vectors with integer entries. We show that, under SETH, for Max-IP with sets of $n$ vectors from $\mathbb{Z}^{d}$ for some $d = 2^{O(\log^{*} n)}$, every exact algorithm requires $n^{2 - o(1)}$ time. With the reduction from [Williams, SODA 2018], it follows that $\ell_2$-Furthest Pair and Bichromatic $\ell_2$-Closest Pair in $2^{O(\log^{*} n)}$ dimensions require $n^{2 - o(1)}$ time.

• Characterization of Multiplicative Approximation. First, we study the best multiplicative approximation ratio for Boolean Max-IP in subquadratic time. We show that, for Max-IP with two sets each consisting of $n$ vectors from $\{0,1\}^d$, there is an $n^{2-\Omega(1)}$-time multiplicative $t$-approximation algorithm when $t = (d/\log n)^{\Omega(1)}$. We also show this is conditionally optimal, as a $(d/\log n)^{o(1)}$-approximation algorithm would refute SETH. A similar characterization is also achieved for additive approximation to Max-IP.
• $2^{O(\log^* n)}$-dimensional Hardness for Exact Max-IP Over the Integers. Second, we revisit the hardness of solving Max-IP exactly for vectors with integer entries. We show that, under SETH, for Max-IP with sets of $n$ vectors from $\mathbb{Z}^d$ for some $d = 2^{O(\log^* n)}$, every exact algorithm requires $n^{2-o(1)}$ time. With the reduction from [Williams, SODA 2018], it follows that $\ell_2$-Furthest Pair and Bichromatic $\ell_2$-Closest Pair in dimension $2^{O(\log^* n)}$ require $n^{2-o(1)}$ time.
• Connection with NP · UPP Communication Protocols. Last, we establish a connection between conditional lower bounds for exact Max-IP with integer entries and NP · UPP communication protocols for Set-Disjointness, parallel to the connection between conditional lower bounds for approximate Max-IP and MA communication protocols for Set-Disjointness.
The lower bound in our first result is a direct corollary of the new MA protocol for Set-Disjointness introduced in [Rubinstein, STOC 2018], and our algorithms utilize the polynomial method and simple random sampling. Our second result follows from a new dimensionality self-reduction from the Orthogonal Vectors problem for $n$ vectors from $\{0,1\}^d$ to $n$ vectors from $\mathbb{Z}^{\ell}$ where $\ell = 2^{O(\log^* d)}$, dramatically improving the previous reduction in [Williams, SODA 2018]. The key technical ingredient is a recursive application of the Chinese Remainder Theorem.
As a by-product we obtain an MA communication protocol for Set-Disjointness with complexity $O\left(\sqrt{n\log n\log\log n}\right)$, slightly improving the $O\left(\sqrt{n}\log n\right)$ bound of [Aaronson and Wigderson, 2009].

Introduction
Maximum Inner Product Search is a fundamental similarity search problem in which one maintains a collection $S$ of vectors and answers queries of the following form: given a new vector $q$, find the vector in $S$ most correlated with $q$ (or an approximation to it). This problem is closely related to another fundamental problem, nearest neighbor search, in which one maintains a collection of points and must find the nearest neighbor of each query point (or an approximation to it).
In this paper we consider a natural offline version of Maximum Inner Product Search, in which all queries are given in advance and one is only required to compute the maximum correlation between $S$ and the queries. We use Max-IP$_{n,d}$ to denote this problem with two sets $A$ and $B$, each consisting of $n$ vectors from $\{0,1\}^d$, where the goal is to compute $\max_{a \in A, b \in B} a \cdot b$. We use $\mathbb{Z}$-Max-IP$_{n,d}$ ($\mathbb{R}$-Max-IP$_{n,d}$) to denote the same problem, but with $A, B$ being sets of vectors from $\mathbb{Z}^d$ ($\mathbb{R}^d$).

Motivation and background: hardness of approximate Max-IP
A natural brute-force algorithm solves Max-IP in $O(n^2 \cdot d)$ time. Assuming SETH, there is no $n^{2-\Omega(1)}$-time algorithm for Max-IP$_{n,d}$ when $d = \omega(\log n)$ [73].
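For concreteness, here is the trivial quadratic-time baseline as a minimal Python sketch (ours, for illustration only):

```python
from itertools import product

def max_ip_brute_force(A, B):
    """Exact Max-IP: scan all pairs, O(n^2 * d) time.

    A, B: lists of 0/1 vectors (lists of ints) of equal length d.
    Returns the maximum inner product over all pairs (a, b) in A x B.
    """
    return max(sum(x * y for x, y in zip(a, b)) for a, b in product(A, B))

# Example: the maximum is 2, attained by (1,1,0) and (1,1,1).
A = [[1, 0, 0], [1, 1, 0]]
B = [[0, 1, 1], [1, 1, 1]]
assert max_ip_brute_force(A, B) == 2
```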
Despite Max-IP being one of the most central problems in similarity search, with numerous applications [48,43,15,64,65,68,17,16,18,60,69,71,14,50,11,70,34,33], it was unclear whether even a near-linear time 1.1-approximation algorithm could exist, until the recent breakthrough of Abboud, Rubinstein and Williams [5]. In [5], a framework for proving inapproximability results for problems in P is established (the distributed PCP framework), from which the following theorem follows.

Theorem 1.2 (Abboud, Rubinstein and Williams 2017). Assuming SETH, no $n^{2-\Omega(1)}$-time algorithm for Max-IP$_{n,n^{o(1)}}$ can achieve a multiplicative $2^{(\log n)^{1-o(1)}}$-approximation.

Theorem 1.2 is an exciting development for hardness of approximation in P, implying other important inapproximability results for a host of problems including Bichromatic LCS Closest Pair Over Permutations, Approximate Regular Expression Matching, and Diameter in Product Metrics [5]. However, we still do not have a complete understanding of the approximation hardness of Max-IP. For instance, consider the following two concrete questions:

Question 1. Is there a multiplicative $(\log n)$-approximation $n^{2-\Omega(1)}$-time algorithm for Max-IP$_{n,\log^2 n}$? What about a multiplicative 2-approximation algorithm for Max-IP$_{n,\log^2 n}$?

Question 2. Is there an additive $(d/\log n)$-approximation $n^{2-\Omega(1)}$-time algorithm for Max-IP$_{n,d}$?
We note that the lower bound from [5] cannot answer Question 1: tracing the details of their proofs, one can see that it only shows approximation hardness for dimension $d = \log^{\omega(1)} n$. Question 2, concerning additive approximation, is not addressed by [5] at all. Given the importance of Max-IP, it is natural to ask: For what ratios $r$ do $n^{2-\Omega(1)}$-time $r$-approximation algorithms exist for Max-IP? Does the best-possible approximation ratio (in $n^{2-\Omega(1)}$ time) relate to the dimensionality, and if so, how?
In an important recent work, Rubinstein [67] improved the distributed PCP construction in a crucial way, from which one can derive more refined lower bounds on approximate Max-IP. He then used the refined lower bounds to establish the SETH-hardness of (1 + o(1))-approximate nearest neighbor search.
Building on Rubinstein's techniques, in this paper we provide full characterizations, determining the essentially optimal multiplicative and additive approximations to Max-IP under SETH.

Hypothesis assumed in this paper
We use OV$_{n,d}$ to denote the Orthogonal Vectors problem: given two sets $A, B$, each consisting of $n$ vectors from $\{0,1\}^d$, determine whether there are $a \in A$ and $b \in B$ such that $a \cdot b = 0$. (Here we use the bichromatic version of OV instead of the monochromatic one for convenience, as the two are equivalent.) Similarly, we use $\mathbb{Z}$-OV$_{n,d}$ to denote the same problem except that $A$ and $B$ consist of vectors from $\mathbb{Z}^d$ (this variant is also known as Hopcroft's problem).
All our results are based on the following widely used conjecture about OV.

Conjecture 1.4 (Orthogonal Vectors Conjecture (OVC) [73,8]). For every $\varepsilon > 0$, there exists a $c \geq 1$ such that OV$_{n,d}$ requires $n^{2-\varepsilon}$ time when $d = c \log n$.
OVC is a plausible conjecture as it is implied by the popular Strong Exponential Time Hypothesis [47,29] on the time complexity of solving k-SAT [73,76].

Our results on {0, 1}-Max-IP
The first main result of our paper characterizes when there is a truly subquadratic-time ($n^{2-\Omega(1)}$-time, for some universal constant hidden in the big-$\Omega$) multiplicative $t$-approximation algorithm for Max-IP, and characterizes the best-possible additive approximations as well. Let $P$ be an optimization problem (either minimization or maximization). We begin with formal definitions of these two standard types of approximation:

• We say $\mathcal{A}$ is a multiplicative $t$-approximation algorithm for $P$ if, for all instances $I$, $\mathcal{A}$ outputs a value $\widetilde{\mathrm{OPT}}(I)$ such that $\widetilde{\mathrm{OPT}}(I) \in [\mathrm{OPT}(I), \mathrm{OPT}(I) \cdot t]$, where $\mathrm{OPT}(I)$ is the answer to $P$ on instance $I$.

• We say $\mathcal{A}$ is an additive $t$-approximation algorithm for $P$ if, for all instances $I$, $\mathcal{A}$ outputs a value $\widetilde{\mathrm{OPT}}(I)$ such that $|\widetilde{\mathrm{OPT}}(I) - \mathrm{OPT}(I)| \leq t$.

• To avoid ambiguity, we call an algorithm computing $\mathrm{OPT}(I)$ exactly an exact algorithm for $P$.

Characterizations of hardness of multiplicative approximations to Max-IP
In the multiplicative case, our characterization (formally stated below) basically says that there is a multiplicative $t$-approximation $n^{2-\Omega(1)}$-time algorithm for Max-IP$_{n,d}$ if and only if $t = (d/\log n)^{\Omega(1)}$. Note that in the following theorem we require $d = \omega(\log n)$, since in the case of $d = O(\log n)$ there are $n^{2-\varepsilon}$-time algorithms for exact Max-IP$_{n,d}$ [14,13].
Theorem 1.5. Let $\omega(\log n) < d < n^{o(1)}$ and $t \geq 2$. (Note that $t$ and $d$ are both functions of $n$; we assume throughout this paper that they are computable in $n^{o(1)}$ time, for simplicity.) The following holds:

1. If $t = (d/\log n)^{\Omega(1)}$, there is an $n^{2-\Omega(1)}$-time multiplicative $t$-approximation algorithm for Max-IP$_{n,d}$.

2. If $t = (d/\log n)^{o(1)}$, then under SETH (or OVC), there is no $n^{2-\Omega(1)}$-time multiplicative $t$-approximation algorithm for Max-IP$_{n,d}$.

Moreover, letting $\varepsilon := \min\left(\frac{\log t}{\log(d/\log n)}, 1\right)$, there are multiplicative $t$-approximation deterministic algorithms for Max-IP$_{n,d}$ running in time $n^{2-\Omega(\varepsilon)}$.

Remark 1.6. We remark that our algorithm still gets a non-trivial speed-up over the brute-force algorithm as long as $\varepsilon = \omega(\log\log n/\log n)$.
Comparison with [11]. We remark that [11] gives subquadratic-time algorithms for multiplicative $t$-approximation to Max-IP when $t = d^{\Omega(1)}$. Our algorithms achieve a slightly better approximation ratio $t = (d/\log n)^{\Omega(1)}$, which matches the conditional lower bound. Moreover, our algorithms work in the following more general setting, which is not handled by [11]: the algorithms in Theorem 1.5 in fact work when the sets consist of vectors of non-negative reals (i. e., for $\mathbb{R}^+$-Max-IP).

Corollary 1.7. Assuming $\omega(\log n) < d < n^{o(1)}$ and letting $\varepsilon$ be as in Theorem 1.5, there is a multiplicative $t$-approximation deterministic algorithm for $\mathbb{R}^+$-Max-IP$_{n,d}$ running in time $n^{2-\Omega(\varepsilon)}$.

The lower bound of Theorem 1.5 is a direct corollary of the new improved MA protocols for Set-Disjointness from [67], which are based on Algebraic Geometry codes. Together with the framework of [5], that MA protocol implies a reduction from OV to approximate Max-IP.
Our upper bounds are applications of the polynomial method [74,6]: defining appropriate sparse polynomials for approximating Max-IP on small groups of vectors, and using fast matrix multiplication to speed up the evaluation of these polynomials on many pairs of points.
Via the known reduction from Max-IP to LCS-Pair in [5], we also obtain a more refined lower bound for approximating the LCS Closest Pair problem (defined below).

Definition 1.8 (LCS Closest Pair). The LCS-Closest-Pair$_{n,d}$ problem is: given two sets $A, B$, each consisting of $n$ strings from $\Sigma^d$ ($\Sigma$ is a finite alphabet), determine $\max_{a \in A, b \in B} \mathrm{LCS}(a,b)$, where $\mathrm{LCS}(a,b)$ is the length of the longest common subsequence of the strings $a$ and $b$.

Corollary 1.9 (Improved Inapproximability for LCS-Closest-Pair). Assuming SETH (or OVC), for every $t \geq 2$, computing a multiplicative $t$-approximation to LCS-Closest-Pair$_{n,d}$ requires $n^{2-o(1)}$ time, if $d = t^{\omega(1)} \cdot \log^5 n$.

Characterizations of hardness of additive approximations to Max-IP
Our characterization for additive approximations to Max-IP says that there is an additive $t$-approximation $n^{2-\Omega(1)}$-time algorithm for Max-IP$_{n,d}$ if and only if $t = \Omega(d)$.
The lower bound above was already established in [67], while the upper bound works by reducing the problem to the $d = O(\log n)$ case via randomly sampling coordinates, and then solving the reduced problem via known methods [14,13].

Remark 1.11. We remark that the lower bounds for approximate Max-IP are direct corollaries of the new MA protocols for Set-Disjointness in [67]. Our main contribution is providing the complementary upper bounds, showing that these lower bounds are indeed tight assuming SETH.
All-Pair-Max-IP. Finally, we remark that our algorithms (with slight adaptations) also work for the following stronger problem: in All-Pair-Max-IP$_{n,d}$, we are given two sets $A$ and $B$, each consisting of $n$ vectors from $\{0,1\}^d$, and for each $x \in A$ we must compute $\mathrm{OPT}(x,B) := \max_{y \in B} x \cdot y$. We say an algorithm is a multiplicative $t$-approximation (additive $t$-approximation) algorithm for All-Pair-Max-IP if it computes a corresponding approximate answer for every value $\mathrm{OPT}(x,B)$.

Corollary 1.12. Suppose $\omega(\log n) < d < n^{o(1)}$, and let $\varepsilon_M := \min\left(\frac{\log t}{\log(d/\log n)}, 1\right)$ and $\varepsilon_A := \frac{\min(t,d)}{d}$. Then there is a multiplicative $t$-approximation algorithm for All-Pair-Max-IP$_{n,d}$ running in $n^{2-\Omega(\varepsilon_M)} \cdot \mathrm{polylog}(n)$ time, and an additive $t$-approximation algorithm running in truly subquadratic time whenever $\varepsilon_A = \omega(\log^6\log n/\log^3 n)$.

BQP communication protocols and approximate {−1, 1}-Max-IP
Making use of the $O(\sqrt{n})$-degree approximate polynomial for OR [27,77], we also give a completely different proof of the hardness of multiplicative approximations to $\{-1,1\}$-Max-IP. Note that the answer to $\{-1,1\}$-Max-IP can be negative, so we consider the variant of maximizing the unsigned inner product here; that is, we want to approximate $\max_{a \in A, b \in B} |a \cdot b|$ (this variant is also studied in [11]). The lower bound from this approach is inferior to Theorem 1.5; in particular, it cannot achieve a characterization. (We also remark that the hardness of approximate $\{-1,1\}$-Max-IP also follows from the hardness of approximate $\{0,1\}$-Max-IP.) It was asked in [5] whether one can make use of the $O(\sqrt{n})$ BQP communication protocol for Set-Disjointness [28] to prove conditional lower bounds. Indeed, that quantum communication protocol is based on the $O(\sqrt{n})$-time quantum query algorithm for OR (Grover's algorithm [42]), which induces the needed approximate polynomial for OR. Hence, the following theorem in some sense answers their question in the affirmative: assuming SETH (or OVC), no $n^{2-\Omega(1)}$-time algorithm for $\{-1,1\}$-Max-IP (in suitable dimension) can achieve a multiplicative $n^{o(1)}$-approximation.
The full statement can be found in Theorem 7.1 and Theorem 7.2.

Our results on $\mathbb{Z}$-Max-IP
Now we turn to our results on $\mathbb{Z}$-Max-IP. We show that $\mathbb{Z}$-Max-IP is hard to solve in $n^{2-\Omega(1)}$ time, even for vectors in $2^{O(\log^* n)}$ dimensions.

Theorem 1.14. Assuming SETH (or OVC), there is a constant $c$ such that any exact algorithm for $\mathbb{Z}$-Max-IP$_{n,d}$ in dimension $d = c^{\log^* n}$ requires $n^{2-o(1)}$ time, with vectors of $O(\log n)$-bit entries.
As direct corollaries of the above theorem, using reductions implicit in [75], we also conclude hardness for $\ell_2$-Furthest Pair and Bichromatic $\ell_2$-Closest Pair in dimension $2^{O(\log^* n)}$, under SETH (or OVC).

Improved dimensionality reduction for OV and Hopcroft's problem
Our hardness of $\mathbb{Z}$-Max-IP is established by a reduction from Hopcroft's problem, whose hardness is in turn derived from the following significantly improved dimensionality reduction for OV.

Lemma 1.17. Let $\ell \leq d$ be sufficiently large. There is an algorithm reducing an OV$_{n,d}$ instance to $\ell^{O(6^{\log^* d} \cdot (d/\ell))}$ instances of $\mathbb{Z}$-OV$_{n,\ell+1}$, with vectors of bit length $O\!\left(6^{\log^* d} \cdot (d/\ell) \cdot \log \ell\right)$.
Comparison with [75]. Compared to the old construction in [75], our reduction here is more efficient when $\ell$ is much smaller than $d$ (which is the case we care about). That is, in [75], OV$_{n,d}$ can be reduced to $d^{d/\ell}$ instances of $\mathbb{Z}$-OV$_{n,\ell+1}$, while our improved reduction yields $\left(\ell^{6^{\log^* d}}\right)^{d/\ell}$ instances. So, for example, when $\ell = 7^{\log^* d}$, the old reduction yields $d^{d/7^{\log^* d}} = n^{\omega(1)}$ instances (recall that $d = c\log n$ for an arbitrary constant $c$), while our improved one yields only $n^{o(1)}$ instances, each in $2^{O(\log^* n)}$ dimensions.
From Lemma 1.17, the following theorem follows in the same way as in [75].

Theorem 1.18 (Hardness of Hopcroft's problem in $c^{\log^* n}$ dimensions). Assuming SETH (or OVC), there is a constant $c$ such that $\mathbb{Z}$-OV$_{n,c^{\log^* n}}$ with vectors of $O(\log n)$-bit entries requires $n^{2-o(1)}$ time.

Connection between Z-Max-IP lower bounds and NP · UPP communication protocols
We also show a new connection between $\mathbb{Z}$-Max-IP and a special type of communication protocol. Let us first recall the Set-Disjointness problem: in DISJ$_n$, Alice holds a vector $X \in \{0,1\}^n$, Bob holds a vector $Y \in \{0,1\}^n$, and they want to determine whether there is an index $i$ with $X_i = Y_i = 1$.
In [5], the hardness of approximate Max-IP is established via a connection to MA communication protocols (in particular, an MA communication protocol with small communication complexity for Set-Disjointness). Our lower bound for (exact) $\mathbb{Z}$-Max-IP can likewise be connected to NP$\cdot$UPP protocols (note that MA = NP$\cdot$promiseBPP). Formally, we define NP$\cdot$UPP protocols as follows.

Definition 1.20. For a problem $\Pi$ with two $n$-bit strings $x, y$ as inputs (Alice holds $x$ and Bob holds $y$), we say a communication protocol is an $(m, \ell)$-efficient NP$\cdot$UPP communication protocol if the following holds:

• There are three parties Alice, Bob and Merlin in the protocol.

• Merlin sends Alice and Bob an advice string $z$ of length $m$, which is a function of $x$ and $y$.

• Given $y$ and $z$, Bob sends Alice $\ell$ bits, and Alice decides whether to accept or not. Alice and Bob have an unlimited supply of private random coins (private, not public, which is important) during their conversation. The following conditions hold:

– If $\Pi(x, y) = 1$, then there is an advice string $z$ from Merlin such that Alice accepts with probability $\geq 1/2$.

– Otherwise, for all possible advice strings from Merlin, Alice accepts with probability $< 1/2$.

Moreover, we say the protocol is $(m, \ell)$-computational-efficient if, in addition, the probability distributions of both Alice's and Bob's behavior can be computed in $\mathrm{poly}(n)$ time given their inputs and the advice.
Our new reduction from OV to $\mathbb{Z}$-Max-IP in fact implies an efficient NP$\cdot$UPP protocol for Set-Disjointness. For example, when $\alpha = 3^{\log^* n}$, Theorem 1.21 implies there is an $(o(n), O(\log^* n))$-computational-efficient NP$\cdot$UPP communication protocol for DISJ$_n$. Moreover, we show that if the protocol of Theorem 1.21 could be improved slightly (say, by removing the $6^{\log^* n}$ factor), we would obtain the desired hardness for $\mathbb{Z}$-Max-IP in dimension $\omega(1)$.

Improved MA protocols for Set-Disjointness
Finally, we also obtain a new MA protocol for Set-Disjointness, which improves on the previous $O(\sqrt{n}\log n)$ protocol of [1] and is closer to the $\Omega(\sqrt{n})$ lower bound of [54]. Like the protocol in [1], our new protocol also works for the following slightly harder problem, Inner Product.

Definition 1.23 (Inner Product). Let $n \in \mathbb{N}$. In Inner Product (IP$_n$), Alice holds a vector $X \in \{0,1\}^n$, Bob holds a vector $Y \in \{0,1\}^n$, and they want to compute $X \cdot Y$.

In [67], the author asked whether the MA communication complexity of DISJ$_n$ (IP$_n$) is $\Theta(\sqrt{n})$ or $\Theta(\sqrt{n}\log n)$. Our result makes progress on that question by showing that the true complexity lies between $\Omega(\sqrt{n})$ and $O\left(\sqrt{n\log n\log\log n}\right)$.

Intuition for dimensionality self-reduction for OV
The $2^{O(\log^* n)}$ factor in Lemma 1.17 is not common in theoretical computer science (other examples include an $O\!\left(2^{O(\log^* n)} n^{4/3}\right)$-time algorithm for $\mathbb{Z}$-OV$_{n,3}$ [59], $O\!\left(2^{O(\log^* n)} n \log n\right)$-time algorithms for Fast Integer Multiplication, namely Fürer's algorithm and its modifications [39,36,44], and an old $O\!\left(n^{d/2} 2^{O(\log^* n)}\right)$-time algorithm for Klee's measure problem [30]), and our new reduction for OV is considerably more complicated than the polynomial-based construction from [75]. Hence, it is worth discussing the intuition behind Lemma 1.17, and the reason why we get a factor of $2^{O(\log^* n)}$.
A direct approach based on the Chinese Remainder Theorem. We first discuss a direct reduction based on the Chinese Remainder Theorem (CRT) (see Theorem 2.5 for a formal statement). CRT says that given a collection of distinct primes $q_1, \ldots, q_b$ and a collection of integers $r_1, \ldots, r_b$, there exists a unique integer $t = \mathrm{CRR}(\{r_j\}; \{q_j\})$ such that $t \equiv r_j \pmod{q_j}$ for each $j \in [b]$ and $0 \leq t < \prod_{j=1}^{b} q_j$ (CRR stands for Chinese Remainder Representation). Now, let $b, \ell \in \mathbb{N}$, and suppose we would like to have a dimensionality reduction $\varphi$ from $\{0,1\}^{b\cdot\ell}$ to $\mathbb{Z}^{\ell}$. We can partition an input $x \in \{0,1\}^{b\cdot\ell}$ into $\ell$ blocks, each of length $b$, and represent each block via CRT: that is, for a block $z \in \{0,1\}^b$, we map it into a single integer $\varphi_{\mathrm{block}}(z) := \mathrm{CRR}(\{z_j\}; \{q_j\})$, and the concatenation of $\varphi_{\mathrm{block}}$ over all $\ell$ blocks of $x$ is $\varphi(x) \in \mathbb{Z}^{\ell}$.
The key idea here is that, for $z, z' \in \{0,1\}^b$, $\varphi_{\mathrm{block}}(z) \cdot \varphi_{\mathrm{block}}(z') \pmod{q_j}$ is simply $z_j \cdot z'_j$. That is, the multiplication of the two integers $\varphi_{\mathrm{block}}(z)$ and $\varphi_{\mathrm{block}}(z')$ simulates the coordinate-wise multiplication of the two vectors $z$ and $z'$! Therefore, if we make all primes $q_j$ larger than $\ell$, we can in fact determine $x \cdot y$ from $\varphi(x) \cdot \varphi(y)$, by looking at $\varphi(x) \cdot \varphi(y) \pmod{q_j}$ for each $j$. In particular, $x \cdot y = 0$ if and only if $\varphi(x) \cdot \varphi(y)$ falls in the set $V$ of achievable products that are $0$ modulo every $q_j$. The reduction is completed by constructing a $\mathbb{Z}$-OV instance for each $v \in V$: we append corresponding values to form $\varphi_A(x) = [\varphi(x), -1]$ and $\varphi_B(y) = [\varphi(y), v]$ (this step is from [75]).
Note that a nice property of $\varphi$ is that each $\varphi(x)_i$ depends only on the $i$-th block of $x$, and the mapping is the same on each block ($\varphi_{\mathrm{block}}$); we call this the block mapping property.
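To make the construction concrete, here is a small runnable sketch of the direct reduction and its key property (all code and identifiers are ours, for illustration):

```python
def first_primes_above(bound, count):
    """Return `count` distinct primes, each larger than `bound` (trial division)."""
    primes, p = [], bound
    while len(primes) < count:
        p += 1
        if all(p % q for q in range(2, int(p ** 0.5) + 1)):
            primes.append(p)
    return primes

def crr(residues, primes):
    """Chinese remainder representation: the unique t in [0, prod(primes))
    with t = residues[j] (mod primes[j]) for every j."""
    t, mod = 0, 1
    for r, q in zip(residues, primes):
        # lift t so that it also becomes congruent to r modulo q
        t += mod * ((r - t) * pow(mod, -1, q) % q)
        mod *= q
    return t

def phi(x, b, ell, primes):
    """Map x in {0,1}^(b*ell) to a length-ell integer vector: coordinate i
    is the CRR encoding of the i-th length-b block of x."""
    blocks = [x[i * b:(i + 1) * b] for i in range(ell)]
    return [crr(block, primes) for block in blocks]

import random
random.seed(0)
b, ell = 5, 4
primes = first_primes_above(ell, b)  # all q_j > ell, as required above
x = [random.randint(0, 1) for _ in range(b * ell)]
y = [random.randint(0, 1) for _ in range(b * ell)]
dot = sum(u * v for u, v in zip(phi(x, b, ell, primes), phi(y, b, ell, primes)))
# phi(x).phi(y) mod q_j recovers the inner product restricted to position j of
# every block, so summing the residues recovers x.y exactly.
assert sum(dot % q for q in primes) == sum(u * v for u, v in zip(x, y))
```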
Analysis of the direct reduction. To continue building intuition, let us analyze the above reduction. The size of $V$ is the number of $\mathbb{Z}$-OV$_{n,\ell+1}$ instances we create, and $|V| \geq \prod_{j=1}^{b} q_j$. These primes $q_j$ have to be all distinct, and it follows that $\prod_{j=1}^{b} q_j$ is $b^{\Theta(b)}$. Since we want to create at most $n^{o(1)}$ instances (or $n^{\varepsilon}$ for arbitrarily small $\varepsilon$), we need to set $b \leq \log n/\log\log n$. Moreover, to base our hardness on OVC, which deals with $c\log n$-dimensional vectors, we need to set $b \cdot \ell = d = c \cdot \log n$ for an arbitrary constant $c$. Therefore we must have $\ell \geq \log\log n$, and the above reduction only recovers the same hardness result as [75].
Key observation: "most space modulo $q_j$" is actually wasted. To improve the above reduction, we need to make $|V|$ smaller. Our key observation about $\varphi$ is that the primes $q_j$ are mostly larger than $b$, but $\varphi(x) \cdot \varphi(y) \bmod q_j \in \{0, 1, \ldots, \ell\}$ for all these $q_j$. Hence, "most space modulo $q_j$" is actually wasted.
Make more "efficient" use of the "space": recursive reduction. Based on the previous observation, we want to use the "space modulo $q_j$" more efficiently, and it is natural to consider a recursive reduction. We will require all our primes $q_j$ to be larger than $b$. Let $b_m$ be a very small integer compared to $b$, and let $\psi : \{0,1\}^{b_m\cdot\ell} \to \mathbb{Z}^{\ell}$, with a set $V_{\psi}$ and a block mapping $\psi_{\mathrm{block}}$, be a similar reduction on much smaller inputs: for $x, y \in \{0,1\}^{b_m\cdot\ell}$, $x \cdot y = 0 \Leftrightarrow \psi(x) \cdot \psi(y) \in V_{\psi}$. We also require that $\psi(x) \cdot \psi(y) \leq b$ for every $x$ and $y$.
For an input $x \in \{0,1\}^{b\cdot\ell}$ and a block $z \in \{0,1\}^b$ of $x$, our key idea is to partition $z$ again into $b/b_m$ "micro" blocks, each of size $b_m$; let $z_1, \ldots, z_{b/b_m}$ be these micro blocks. We map $z$ into the integer $\varphi_{\mathrm{block}}(z) := \mathrm{CRR}\left(\{\psi_{\mathrm{block}}(z_j)\}; \{q_j\}\right)$. Writing $x^{[j]}$ for the concatenation of the $j$-th micro blocks over all $\ell$ blocks of $x$, we get $\varphi(x) \cdot \varphi(y) \equiv \psi(x^{[j]}) \cdot \psi(y^{[j]}) \pmod{q_j}$, a value of at most $b < q_j$. Hence, for every $j \in [b/b_m]$, we can determine whether $x^{[j]} \cdot y^{[j]} = 0$ from whether $\varphi(x) \cdot \varphi(y) \bmod q_j \in V_{\psi}$, and therefore also determine whether $x \cdot y = 0$ from $\varphi(x) \cdot \varphi(y)$.
We can now observe that $|V| \leq b^{\Theta(b/b_m)}$, smaller than before; thus we get an improvement whose magnitude depends on how large $b_m$ can be. Clearly, the reduction $\psi$ can itself be constructed from even smaller reductions, and after recursing $\Theta(\log^* n)$ times, we can switch to the direct construction discussed before. By a straightforward (but tedious) calculation, we can derive Lemma 1.17.
High-level explanation of the $2^{O(\log^* n)}$ factor. Ideally, we want a reduction from OV to $\mathbb{Z}$-OV with only $\ell^{O(b)}$ instances; in other words, we want $|V| = \ell^{O(b)}$. The reason we need to pay an extra $2^{O(\log^* n)}$ factor in the exponent is as follows: in our reduction, $|V|$ is at least $\prod_{j=1}^{b/b_m} q_j$, which is also the bound on each coordinate of the reduction. That is, all we need is to control the upper bound on the coordinates of the reduction.
Suppose we are constructing an "outer" reduction $\varphi : \{0,1\}^{b\cdot\ell} \to \mathbb{Z}^{\ell}$ from an "inner" reduction $\psi : \{0,1\}^{b_m\cdot\ell} \to \mathbb{Z}^{\ell}$, and let $L_{\psi} = \ell^{\kappa \cdot b_m}$ be the upper bound on the coordinates of $\psi$. (That is, $\kappa$ is the extra factor compared to the ideal case.) Recall that we have to ensure $q_j > \psi(x) \cdot \psi(y)$ to make our construction work, and therefore we have to set the $q_j$ larger than $L_{\psi}^2$. Then the coordinate upper bound for $\varphi$ becomes $\prod_{j=1}^{b/b_m} q_j \geq L_{\psi}^{2(b/b_m)} = \ell^{2\kappa \cdot b}$. We can thus see that after one recursion, the "extra factor" $\kappa$ at least doubles. Since our recursion proceeds for $\Theta(\log^* n)$ rounds, we have to pay an extra $2^{O(\log^* n)}$ factor in the exponent.
Communication complexity and conditional hardness. The connection between communication protocols (in various models) for Set-Disjointness and SETH dates back at least to [62], where it is shown that a sublinear, computationally efficient protocol for the 3-party Number-On-Forehead Set-Disjointness problem would refute SETH. It is also worth mentioning that Abboud and Rubinstein's result [4] builds on the $O(\log n)$-cost IP communication protocol for Set-Disjointness in [1]. Making use of the IP communication protocol for low-space computation, [31] establishes an equivalence class for LCS-Closest-Pair.
In [32], $\Sigma_2$ communication protocols are utilized to show the subquadratic-time equivalence between OV$_{n,O(\log n)}$, Max-IP$_{n,O(\log n)}$, Approximate Bichromatic Closest Pair, and several other problems.
Distributed PCP. Using Algebraic Geometry codes (AG codes), [67] obtains a better MA protocol, which in turn improves the efficiency of the previous distributed PCP construction of [5]. With this new technique, he then shows $n^{2-o(1)}$-time hardness for $(1+o(1))$-approximation to Bichromatic Closest Pair and for $o(d)$-additive approximation to Max-IP$_{n,d}$.
[51] uses the distributed PCP framework to derive inapproximability results for $k$-Dominating Set under various assumptions. In particular, building on the techniques of [67], it is shown that under SETH, $k$-Dominating Set admits no $(\log n)^{1/\mathrm{poly}(k, e(\varepsilon))}$-approximation in $n^{k-\varepsilon}$ time. [52] also utilizes AG codes and the polynomial method to show hardness results for Exact and Approximate Monochromatic Closest Pair and Approximate Monochromatic Maximum Inner Product.
Hardness of approximation in P. Making use of Chebyshev embeddings, [11] proves a $2^{\Omega\left(\sqrt{\log n\log\log n}\right)}$ inapproximability lower bound for $\{-1,1\}$-Max-IP. [2] takes an approach different from distributed PCP, and shows that under certain complexity assumptions, LCS does not have a deterministic $(1+o(1))$-approximation in $n^{2-\varepsilon}$ time. They also establish a connection with circuit lower bounds, showing that the existence of such a deterministic algorithm implies that $\mathsf{E}^{\mathsf{NP}}$ does not have non-uniform linear-size Valiant series-parallel circuits. In [4], this is improved: any constant-factor approximation deterministic algorithm for LCS in $n^{2-\varepsilon}$ time implies that $\mathsf{E}^{\mathsf{NP}}$ does not have non-uniform linear-size $\mathsf{NC}^1$ circuits. See [5] for more related results on hardness of approximation in P.

Organization of the paper
In Section 2, we introduce the needed preliminaries for this paper. In Section 3, we prove our characterizations of approximate Max-IP, together with other related results. In Section 4, we prove the $2^{O(\log^* n)}$-dimensional hardness for $\mathbb{Z}$-Max-IP and related problems. In Section 5, we establish the connection between NP$\cdot$UPP communication protocols and SETH-based lower bounds for exact $\mathbb{Z}$-Max-IP. In Section 6, we present the $O\left(\sqrt{n\log n\log\log n}\right)$ MA protocol for Set-Disjointness.

Preliminaries
We begin by introducing some notation. For an integer d, we use [d] to denote the set of integers from 1 to d. For a vector u, we use u i to denote the i-th element of u.
We use log(x) to denote the logarithm of x with respect to base 2 and ln(x) to denote the natural logarithm of x.
In our arguments, we use the iterated logarithm function $\log^*(n)$, which is defined recursively as follows:
$$\log^*(n) := \begin{cases} 0 & n \leq 1, \\ \log^*(\log n) + 1 & n > 1. \end{cases}$$
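Equivalently, in code (ours):

```python
import math

def log_star(n):
    """Iterated logarithm: how many times log2 must be applied to n
    before the result drops to at most 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

assert [log_star(x) for x in (1, 2, 4, 16, 65536)] == [0, 1, 2, 3, 4]
```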

Fast rectangular matrix multiplication
Similarly to previous algorithms using the polynomial method, our algorithms make use of the algorithms for fast rectangular matrix multiplication.
Theorem 2.1 (Le Gall and Urrutia). There is an $N^{2+o(1)}$-time algorithm for multiplying two matrices $A$ and $B$ of sizes $N \times N^{\alpha}$ and $N^{\alpha} \times N$, where $\alpha > 0.31389$.
Theorem 2.2 (Coppersmith). There is an $N^2 \cdot \mathrm{polylog}(N)$-time algorithm for multiplying two matrices $A$ and $B$ of sizes $N \times N^{\alpha}$ and $N^{\alpha} \times N$, where $\alpha > 0.172$.

Number theory
Here we recall some facts from number theory. In our reduction from OV to Z-OV, we will apply the famous prime number theorem, which supplies a good estimate of the number of primes smaller than a certain number. See e.g. [19] for a reference on this.
From a simple calculation, we have the following lemma.

Lemma 2.4. For all sufficiently large $n$, there are at least $10n$ primes in $[n+1, n^2]$.

Proof. For sufficiently large $n$, by the prime number theorem, the number of primes in $[n+1, n^2]$ is
$$\pi(n^2) - \pi(n) \sim \frac{n^2}{2\ln n} - \frac{n}{\ln n} \geq 10n.$$
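As a quick empirical sanity check of the lemma (code ours):

```python
def primes_upto(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [i for i, is_p in enumerate(sieve) if is_p]

n = 100
count = sum(1 for p in primes_upto(n * n) if p > n)
assert count >= 10 * n  # there are 1204 primes in [101, 10000]
```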
Next we recall the Chinese remainder theorem and the Chinese remainder representation.

Theorem 2.5 (Chinese remainder theorem). Given a collection of distinct primes $q_1, \ldots, q_d$ and a collection of integers $r_1, \ldots, r_d$, there is a unique integer $0 \leq t < \prod_{i=1}^{d} q_i$ such that $t \equiv r_i \pmod{q_i}$ for every $i \in [d]$.

We call this $t$ the Chinese remainder representation (or the CRR encoding) of the $r_i$ (with respect to these $q_i$), and we denote it by $\mathrm{CRR}(\{r_i\}; \{q_i\})$ for convenience. We sometimes omit the sequence $\{q_i\}$ for simplicity, when it is clear from the context. Moreover, $t$ can be computed in time polynomial in the total number of bits of all the given integers.

Communication complexity
In this paper we make use of a certain kind of MA protocol, which we call $(m, r, \ell, s)$-efficient protocols. (Our notation here is adopted from [51]; they also define similar $k$-party communication protocols, while we only discuss 2-party protocols in this paper.)

Definition 2.6. We say an MA protocol is $(m, r, \ell, s)$-efficient for a communication problem if, in the protocol:

• There are three parties Alice, Bob and Merlin; Alice holds input $x$ and Bob holds input $y$.

• Merlin sends an advice string $z$ of length $m$ to Alice, which is a function of $x$ and $y$.
• Alice and Bob jointly toss r coins to obtain a random string w of length r.
• Given $y$ and $w$, Bob sends Alice a message of length $\ell$.
• After that, Alice decides whether to accept or not.
– When the answer is yes, Merlin has exactly one advice string such that Alice always accepts.
-When the answer is no, or Merlin sends the wrong advice, Alice accepts with probability at most s.

Derandomization
We make use of expander graphs to reduce the number of random coins needed in one of our communication protocols. We abstract the following result for our use here.

Theorem 2.7 (see, e. g., Theorem 21.12 and Theorem 21.19 in [20]). Let $m$ be an integer. There is a universal constant $c_1$ such that for every $\varepsilon < 1/2$, there is a $\mathrm{poly}(\log m, \log \varepsilon^{-1})$-time computable function $F : \{0,1\}^{\log m + c_1 \cdot \log \varepsilon^{-1}} \to [m]^{c_1 \cdot \log \varepsilon^{-1}}$ such that, for every subset $S \subseteq [m]$ with $|S| \geq m/2$, with probability at least $1 - \varepsilon$ over a uniformly random $w$, at least one $a \in F(w)$ satisfies $a \in S$; here $a \in F(w)$ means $a$ is one of the elements in the sequence $F(w)$.

Hardness of Approximate Max-IP
In this section we prove our characterizations of approximate Max-IP.

The multiplicative case
We begin with the proof of Theorem 1.5. In Lemma 3.2 we construct the desired approximation algorithm, and in Corollary 3.4 we prove the lower bound. First we need the following simple lemma, which says that the $k$-th root of the sum of the $k$-th powers of non-negative reals gives a good approximation to their maximum.

Lemma 3.1. Let $S$ be a set of non-negative real numbers, $k$ be an integer, and $x_{\max} := \max_{x \in S} x$. Then
$$x_{\max} \leq \left(\sum_{x \in S} x^k\right)^{1/k} \leq |S|^{1/k} \cdot x_{\max}.$$

Proof. We have $x_{\max}^k \leq \sum_{x \in S} x^k \leq |S| \cdot x_{\max}^k$; the lemma follows directly by taking the $k$-th root of both sides.
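Here is a quick numerical illustration of Lemma 3.1 (ours):

```python
S = [3.0, 1.5, 7.0, 2.0]
k = 20
estimate = sum(x ** k for x in S) ** (1.0 / k)
x_max = max(S)
# x_max <= estimate <= |S|^(1/k) * x_max, so the estimate is a multiplicative
# |S|^(1/k)-approximation of the maximum (here |S|^(1/k) ~ 1.072).
assert x_max <= estimate <= len(S) ** (1.0 / k) * x_max
```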
Lemma 3.2. Assuming $\omega(\log n) < d < n^{o(1)}$ and letting $\varepsilon := \min\left(\frac{\log t}{\log(d/\log n)}, 1\right)$, there are multiplicative $t$-approximation deterministic algorithms for $\mathbb{R}^+$-Max-IP$_{n,d}$ running in time $n^{2-\Omega(\varepsilon)}$. (In the following we assume a real RAM model of computation for simplicity.)

Proof. Let $d = c \cdot \log n$. From the assumption, we have $c = \omega(1)$ and $\varepsilon = \min(\log(t)/\log(c), 1)$. When $\log t > \log c$, we simply use a multiplicative $c$-approximation algorithm instead; hence in the following we assume $\log t \leq \log c$. We begin with the first algorithm.
Construction and analysis of the Power of Sum polynomial $P_r(z)$. Let $r$ be a parameter to be specified later and $z$ be a vector from $(\mathbb{R}^+)^d$. We define the polynomial
$$P_r(z) := \left(\sum_{i=1}^{d} z_i\right)^r,$$
and let $E := \{e : \sum_{i=1}^{d} e_i = r,\ \text{the } e_i \text{ are non-negative integers}\}$. For each $e \in E$, we define $z^e := \prod_{i=1}^{d} z_i^{e_i}$. Now, by expanding out the polynomial, we can write $P_r(z)$ as
$$P_r(z) = \sum_{e \in E} c_e \cdot z^e,$$
where the $c_e$ are the corresponding (multinomial) coefficients. Then consider $P_r(x, y) := P_r(x_1 \cdot y_1, x_2 \cdot y_2, \ldots, x_d \cdot y_d) = (x \cdot y)^r$; plugging in $z_i := x_i \cdot y_i$, it can be written as
$$P_r(x, y) = \sum_{e \in E} (c_e \cdot x^e) \cdot y^e,$$
where $x^e$ and $y^e$ are defined similarly to $z^e$.
Construction and analysis of the batch evaluation polynomial $P_r(X, Y)$. Now, let $X$ and $Y$ be two sets, each consisting of $b = t^{r/2}$ vectors from $\{0,1\}^d$. We define
$$P_r(X, Y) := \sum_{x \in X, y \in Y} P_r(x, y) = \sum_{x \in X, y \in Y} (x \cdot y)^r.$$
(We remark that similar polynomials are also used in [12] to give a simple algorithm for the light bulb problem.) By Lemma 3.1, we have
$$\max_{x \in X, y \in Y} x \cdot y \;\leq\; P_r(X, Y)^{1/r} \;\leq\; (b^2)^{1/r} \cdot \max_{x \in X, y \in Y} x \cdot y = t \cdot \max_{x \in X, y \in Y} x \cdot y.$$

Embedding into rectangular matrix multiplication. Now, for $x, y \in \{0,1\}^d$, we define the mappings $\phi_x(x) := (c_e \cdot x^e)_{e \in E}$ and $\phi_y(y) := (y^e)_{e \in E}$, so that $P_r(x, y) = \phi_x(x) \cdot \phi_y(y)$. Then for each $X$ and $Y$, we map them into $m$-dimensional vectors $\phi_X(X)$ and $\phi_Y(Y)$ (where $m := |E|$) simply by summation:
$$\phi_X(X) := \sum_{x \in X} \phi_x(x), \qquad \phi_Y(Y) := \sum_{y \in Y} \phi_y(y).$$
We can see that $P_r(X, Y) = \phi_X(X) \cdot \phi_Y(Y)$.

Given two sets $A$ and $B$, each consisting of $n$ vectors from $\{0,1\}^d$, we partition them into $n/b$ groups $A_1, \ldots, A_{n/b}$ and $B_1, \ldots, B_{n/b}$, each of size $b$, and build the matrices $M_A$ and $M_B$ whose rows are the vectors $\phi_X(A_i)$ and $\phi_Y(B_j)$, respectively. The evaluation of $P_r(A_i, B_j)$ for all pairs $i, j \in [n/b]$ then reduces to computing the matrix product $M_A \cdot M_B^{\mathsf{T}}$. Knowing all the $P_r(A_i, B_j)$, we simply compute their maximum, whose $r$-th root gives us a $t$-approximate answer to the original problem.
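For intuition, here is a toy end-to-end sketch of this pipeline in Python/numpy (entirely ours; the parameters r and b are small fixed values rather than the carefully chosen ones from the analysis below):

```python
import itertools, math
import numpy as np

def monomial_exponents(d, r):
    """All exponent vectors e >= 0 with sum(e) = r (the set E in the text)."""
    return [e for e in itertools.product(range(r + 1), repeat=d) if sum(e) == r]

def approx_max_ip(A, B, r, b):
    """Approximate max_{x in A, y in B} x.y within factor (b^2)^(1/r),
    via one (n/b) x m times m x (n/b) matrix product."""
    d = len(A[0])
    E = monomial_exponents(d, r)
    coeff = [math.factorial(r) // math.prod(math.factorial(ei) for ei in e)
             for e in E]  # multinomial coefficients c_e

    def phi_x(x):  # (c_e * x^e)_{e in E}
        return [c * math.prod(xi ** ei for xi, ei in zip(x, e))
                for c, e in zip(coeff, E)]

    def phi_y(y):  # (y^e)_{e in E}
        return [math.prod(yi ** ei for yi, ei in zip(y, e)) for e in E]

    # Sum the monomial vectors over each group of b input vectors.
    MA = np.array([np.sum([phi_x(x) for x in A[i:i + b]], axis=0)
                   for i in range(0, len(A), b)], dtype=float)
    MB = np.array([np.sum([phi_y(y) for y in B[j:j + b]], axis=0)
                   for j in range(0, len(B), b)], dtype=float)
    # Entry (i, j) of MA @ MB.T equals P_r(A_i, B_j), the sum of (x.y)^r
    # over the two groups; the r-th root of the max is the estimate.
    return (MA @ MB.T).max() ** (1.0 / r)

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(8, 6)).tolist()
B = rng.integers(0, 2, size=(8, 6)).tolist()
true_max = max(sum(x * y for x, y in zip(a, b)) for a in A for b in B)
est = approx_max_ip(A, B, r=6, b=4)
# Lemma 3.1 guarantees: true_max <= est <= (b^2)^(1/r) * true_max.
assert true_max <= est + 1e-6 <= (4 * 4) ** (1.0 / 6) * true_max + 1e-6
```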
Analysis of the running time. Finally, we specify the parameter $r$ and analyze the time complexity. In order to utilize the fast matrix multiplication algorithm from Theorem 2.1, we need $m \leq (n/b)^{0.313}$; then our running time is simply $(n/b)^{2+o(1)} = n^{2+o(1)}/b^2$.
We set $r = k \cdot \log n/\log c$, where our choice of $k$ will satisfy $k = \Theta(1)$ and $r \leq d$. We have
$$m \leq \left(\frac{e \cdot (r + d)}{r}\right)^r \leq \left(\frac{e \cdot 2d}{r}\right)^r \leq \left(\frac{2c\log n \cdot e}{k \cdot \log n/\log c}\right)^{k \cdot \log n/\log c} = \left(\frac{2ec\log c}{k}\right)^{k \cdot \log n/\log c},$$
and therefore
$$\log m \leq k \cdot \frac{\log n}{\log c} \cdot \log\left(\frac{2ec\log c}{k}\right).$$
Plugging in, we can choose $k = \Theta(1)$ appropriately small so that $m \leq (n/b)^{0.31}$, as required. Finally, with our choice of $k$ specified, our running time is $n^{2+o(1)}/b^2 = n^{2+o(1)}/t^r$. By a simple calculation,
$$t^r = t^{k \cdot \log n/\log c} = n^{k \cdot \log t/\log c} = n^{\Omega(\varepsilon)}.$$
Hence, our running time is $n^{2-\Omega(\varepsilon)}$.

The second algorithm. The second algorithm proceeds in exactly the same way, except that we apply Theorem 2.2 instead; hence the constant $0.31$ is replaced by $0.17$.
The lower bound follows directly from the new MA protocol for Set-Disjointness in [67]. We present an explicit proof here for completeness.
Before proving the lower bound, we need the following reduction from OV to approximate Max-IP.

Lemma 3.3 (implicit in [67]). There is a universal constant $c_1$ such that, for every integer $c$, real $\varepsilon \in (0,1]$ and $\tau \geq 2$, OV$_{n,c\log n}$ can be reduced to $n^{\varepsilon}$ Max-IP$_{n,d}$ instances $(A_i, B_i)$ for $i \in [n^{\varepsilon}]$, such that:

• $d = \tau^{\mathrm{poly}(c/\varepsilon)} \cdot \log n$.

• Letting $T = c\log n \cdot \tau^{c_1}$: if there are $a \in A$ and $b \in B$ such that $a \cdot b = 0$, then there exists an $i$ such that $\mathrm{OPT}(A_i, B_i) \geq T$.

• Otherwise, for every $i$ we must have $\mathrm{OPT}(A_i, B_i) \leq T/\tau$.

The reduction above follows directly from the new MA communication protocols in [67], together with the use of expander graphs to reduce the number of random coins. A proof of the lemma is deferred to Section 3.5.
Now we are ready to show the lower bound on approximate Max-IP.

Corollary 3.4. Assuming SETH (or OVC), for $d = \omega(\log n)$ and $t = (d/\log n)^{o(1)}$, there is no $n^{2-\Omega(1)}$-time multiplicative $t$-approximation algorithm for Max-IP$_{n,d}$.

Proof. Let $c = d/\log n$, and note that $t = c^{o(1)}$ (recall that $t$ and $d$ are both functions of $n$). Suppose for contradiction that there is an $n^{2-\varepsilon'}$-time multiplicative $t(n)$-approximation algorithm $\mathcal{A}$ for Max-IP$_{n,d}$, for some $\varepsilon' > 0$.

Let $\varepsilon = \varepsilon'/2$. Now, for every constant $c_2$, we apply the reduction in Lemma 3.3 with $\tau = t$ to reduce an OV$_{n,c_2\log n}$ instance to $n^{\varepsilon}$ instances of Max-IP$_{n, t^{\mathrm{poly}(c_2/\varepsilon)} \cdot \log n}$. Since $t^{\mathrm{poly}(c_2/\varepsilon)} = t^{O(1)}$ and $t = c^{o(1)}$, it follows that for sufficiently large $n$, $t^{O(1)} \cdot \log n = c^{o(1)} \cdot \log n = o(d)$. It in turn implies that for sufficiently large $n$, $n^{\varepsilon}$ calls to $\mathcal{A}$ suffice to solve the OV$_{n,c_2\log n}$ instance, since $\mathcal{A}$ can distinguish the factor-$\tau = t$ gap between the two cases of Lemma 3.3.

Therefore, we can solve OV$_{n,c_2\log n}$ in $n^{2-\varepsilon'} \cdot n^{\varepsilon} = n^{2-\varepsilon}$ time for all constants $c_2$, contradicting OVC.
Finally, the correctness of Theorem 1.5 follows directly from Lemma 3.2 and Corollary 3.4.

The additive case
In this subsection we prove Theorem 1.10. We proceed similarly as in the multiplicative case by establishing the algorithm first.
The algorithm is actually very easy: we simply apply the following algorithm from [13].

Lemma 3.5 ([13]). For every $\varepsilon > 0$, there is an additive $(\varepsilon \cdot d)$-approximation randomized algorithm for Max-IP$_{n,d}$ running in truly subquadratic time whenever $\varepsilon = \omega(\log^6\log n/\log^3 n)$.

Lemma 3.6. Letting $\varepsilon := \min(t,d)/d$, there is a truly subquadratic time, additive $t$-approximation randomized algorithm for Max-IP$_{n,d}$ when $\varepsilon = \omega(\log^6\log n/\log^3 n)$.
Proof. When t > d the problem becomes trivial, so we can assume t ≤ d, and now t = ε · d.
Let $\varepsilon_1 = \varepsilon/2$ and let $c_1$ be a constant to be specified later. Given a Max-IP$_{n,d}$ instance with two sets $A$ and $B$, each consisting of $n$ vectors from $\{0,1\}^d$, we create another Max-IP$_{n,d_1}$ instance with sets $\widetilde{A}, \widetilde{B}$ and $d_1 = c_1 \cdot \varepsilon_1^{-2} \cdot \log n$ as follows:

• First, we sample indices $i_1, i_2, \ldots, i_{d_1}$, where each $i_j$ is an independent uniform random number in $[d]$.

• Then we construct $\widetilde{A}$ from $A$ by reducing each $a \in A$ to $\widetilde{a} = (a_{i_1}, a_{i_2}, \ldots, a_{i_{d_1}}) \in \{0,1\}^{d_1}$, and $\widetilde{B}$ from $B$ in the same way.

Note that for each $a \in A$ and $b \in B$, by a Chernoff bound, we have
$$\Pr\left[\left|\frac{\widetilde{a} \cdot \widetilde{b}}{d_1} - \frac{a \cdot b}{d}\right| > \varepsilon_1\right] \leq 2\exp\left(-2\varepsilon_1^2 \cdot d_1\right).$$
By setting $c_1 = 2$, the above probability is smaller than $1/n^3$. Hence, by a simple union bound, with probability at least $1 - 1/n$ we have
$$\left|\frac{\widetilde{a} \cdot \widetilde{b}}{d_1} - \frac{a \cdot b}{d}\right| \leq \varepsilon_1$$
for every $a \in A$ and $b \in B$. That is, this reduction changes the "relative inner product" ($a \cdot b/d$ or $\widetilde{a} \cdot \widetilde{b}/d_1$) of each pair by at most $\varepsilon_1$; hence the maximum relative inner product also changes by at most $\varepsilon_1$, and we have $|\mathrm{OPT}(A,B)/d - \mathrm{OPT}(\widetilde{A},\widetilde{B})/d_1| \leq \varepsilon_1$. We then apply the algorithm of Lemma 3.5 to the instance with sets $\widetilde{A}$ and $\widetilde{B}$ with error $\varepsilon_1$ to obtain an estimate $O$, and our final answer is simply $O \cdot (d/d_1)$. From the guarantee of Lemma 3.5, we have $|\mathrm{OPT}(\widetilde{A},\widetilde{B})/d_1 - O/d_1| \leq \varepsilon_1$, and therefore $|\mathrm{OPT}(A,B) - O \cdot (d/d_1)| \leq 2\varepsilon_1 \cdot d = \varepsilon \cdot d = t$, from which the correctness of our algorithm follows directly. For the running time, note that the reduction part runs in linear time $O(n \cdot d)$, and the rest takes truly subquadratic time by Lemma 3.5.
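A quick sketch of the sampling step (ours; constants arbitrary):

```python
import random

def sample_instance(A, B, d1, seed=0):
    """Project both sets onto d1 independently and uniformly sampled
    coordinates (with repetition), as in the proof above."""
    rng = random.Random(seed)
    d = len(A[0])
    idx = [rng.randrange(d) for _ in range(d1)]
    shrink = lambda v: [v[i] for i in idx]
    return [shrink(a) for a in A], [shrink(b) for b in B]

# With d1 = Theta(eps^-2 * log n), a Chernoff + union bound guarantees that
# a.b/d and their sampled counterparts differ by at most eps for *every*
# pair, with high probability.
random.seed(1)
d, d1 = 4000, 2500
a = [random.randint(0, 1) for _ in range(d)]
b = [random.randint(0, 1) for _ in range(d)]
(a1,), (b1,) = sample_instance([a], [b], d1)
rel = sum(x * y for x, y in zip(a, b)) / d
rel1 = sum(x * y for x, y in zip(a1, b1)) / d1
assert abs(rel - rel1) < 0.1  # the eps_1-style bound; holds w.h.p.
```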
The lower bound was already established in [67]; we show that it follows from Lemma 3.3, for completeness.
Lemma 3.7 (Theorem 4.1 of [67]). Assuming SETH (or OVC), and letting $d = \omega(\log n)$ and $t > 0$, there is no $n^{2-\Omega(1)}$-time additive $t$-approximation randomized algorithm for Max-IP$_{n,d}$ if $t = o(d)$.

Proof. Recall that $t$ and $d$ are both functions of $n$. Suppose for contradiction that there is an $n^{2-\varepsilon'}$-time additive $t(n)$-approximation algorithm $\mathcal{A}$ for Max-IP$_{n,d}$, for some $\varepsilon' > 0$.

Let $\varepsilon = \varepsilon'/2$. Now, for every constant $c_2$, we apply the reduction in Lemma 3.3 with $\tau = 2$ to reduce an OV$_{n,c_2\log n}$ instance to $n^{\varepsilon}$ Max-IP$_{n,d_1}$ instances, where $d_1 = 2^{\mathrm{poly}(c_2/\varepsilon)} \cdot \log n = O(1) \cdot \log n$. In addition, from Lemma 3.3, to solve the OV$_{n,c_2\log n}$ instance we only need to compute additive $T/6 = \Omega(\log n) = \Omega(d_1)$-approximations to the Max-IP instances obtained via the reduction.

This can be done via $n^{\varepsilon}$ calls to $\mathcal{A}$, as follows: for each Max-IP$_{n,d_1}$ instance $I$ we get, we duplicate each coordinate $d/d_1$ times (note that $d = \omega(\log n)$ while $d_1 = O(\log n)$; for simplicity we assume $d_1 \mid d$) to obtain a Max-IP$_{n,d}$ instance $I_{\mathrm{new}}$ such that $\mathrm{OPT}(I_{\mathrm{new}}) = (d/d_1) \cdot \mathrm{OPT}(I)$. Then $\mathcal{A}$ can be used to estimate $\mathrm{OPT}(I_{\mathrm{new}})$ within additive error $t = o(d)$. Scaling its estimate by $d_1/d$, it can also be used to estimate $\mathrm{OPT}(I)$ within additive error $o(d_1) = o(\log n) \leq T/6$ for sufficiently large $n$.
Therefore, we can solve OV$_{n,c_2\log n}$ in $n^{2-\varepsilon'} \cdot n^{\varepsilon} = n^{2-\varepsilon}$ time for all constants $c_2$, contradicting OVC.
Finally, the correctness of Theorem 1.10 follows directly from Lemma 3.6 and Lemma 3.7.

Adaptation to All-Pair-Max-IP
Now we sketch the adaptation of our algorithms to work for the All-Pair-Max-IP problem.
Proof sketch. Note that the algorithm of Lemma 3.5 from [13] actually works for All-Pair-Max-IP$_{n,d}$. Hence we can simply apply that algorithm after the coordinate sampling phase, and obtain an additive $t$-approximation algorithm for All-Pair-Max-IP$_{n,d}$.

For the multiplicative $t$-approximation algorithm, suppose we are given two sets $A$ and $B$, each consisting of $n$ vectors from $\{0,1\}^d$. Instead of partitioning each of them into $n/b$ subsets (the notation used here is the same as in the proof of Lemma 3.2), we only partition $B$ into $n/b$ subsets $B_1, B_2, \ldots, B_{n/b}$ of size $b$ each, and calculate $P_r(x, B_i) := \sum_{y \in B_i} P_r(x, y)$ for every $x \in A$ and $i \in [n/b]$, using a similar reduction to rectangular matrix multiplication as in Lemma 3.2. Note that here we are multiplying an $n \times m$ matrix by an $m \times (n/b)$ matrix; this can be reduced to $b$ instances of multiplying an $(n/b) \times m$ matrix by an $m \times (n/b)$ matrix (see the illustration after this proof sketch), and our running time becomes $n^2/b \cdot \mathrm{polylog}(n)$ instead of $n^2/b^2 \cdot \mathrm{polylog}(n)$.
By a similar analysis, this can be done in $n^{2-\Omega(\varepsilon_M)} \cdot \mathrm{polylog}(n)$ time, after which we can compute the multiplicative $t$-approximate answers for the given All-Pair-Max-IP$_{n,d}$ instance.
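The splitting of the rectangular product used above can be checked directly (a small numpy illustration, ours):

```python
import numpy as np

n, m, b = 12, 5, 3
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(n, m))          # an n x m matrix
Y = rng.integers(0, 4, size=(m, n // b))     # an m x (n/b) matrix

# Multiply X @ Y as b instances of an (n/b) x m times m x (n/b) product,
# one per horizontal slice of X; stacking the slices recovers the product.
slices = [X[i * (n // b):(i + 1) * (n // b)] @ Y for i in range(b)]
assert np.array_equal(np.vstack(slices), X @ Y)
```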

Improved hardness for LCS-Closest Pair problem
We finish this section with the proof of Corollary 1.9. First we abstract the reduction from Max-IP to LCS-Closest-Pair in [5].

Lemma 3.8 (implicit in Theorem I.10 in [5]). For every real $t \geq 2$ and integer $n$, computing a multiplicative $t$-approximation to Max-IP$_{n,d}$ reduces to computing a multiplicative $t/2$-approximation to LCS-Closest-Pair$_{n,O(d^3\log^2 n)}$, in $O(n \cdot \mathrm{poly}(d, \log n))$ time.

Corollary 1.9 then follows by combining Lemma 3.8 with the lower bound of Theorem 1.5.

A proof of Lemma 3.3
Finally, we present a proof of Lemma 3.3, which is implicit in [67].
We need the following efficient MA protocol for Set-Disjointness from [67], which is also used in [51].

Lemma 3.9 ([67]). For every integer $\alpha \geq 1$, there is an $(m/\alpha, \log 2m, \mathrm{poly}(\alpha), 1/2)$-efficient MA protocol for DISJ$_m$.

We want to reduce the error probability while keeping the total number of random coins relatively low. To achieve this, we use an expander graph (Theorem 2.7) to prove the following lemma.

Lemma 3.10. For every integer $\alpha \geq 1$ and real $\varepsilon \in (0, 1/2)$, there is an $(m/\alpha, \log 2m + O(\log \varepsilon^{-1}), \mathrm{poly}(\alpha) \cdot \log \varepsilon^{-1}, \varepsilon)$-efficient MA protocol for DISJ$_m$.

Proof. Let $c_1$ and $F : \{0,1\}^{\log m + c_1 \cdot \log \varepsilon^{-1}} \to [m]^{c_1 \cdot \log \varepsilon^{-1}}$ be the corresponding constant and function from Theorem 2.7, and let $\Pi$ be the $(m/\alpha, \log 2m, \mathrm{poly}(\alpha), 1/2)$-efficient MA protocol for DISJ$_m$ from Lemma 3.9. Set $q = c_1 \cdot \log \varepsilon^{-1}$; our new protocol $\Pi_{\mathrm{new}}$ works as follows:

• Merlin still sends the same advice to Alice as in $\Pi$.
• Alice and Bob jointly toss $r = \log m + q$ coins to get a string $w \in \{0,1\}^r$. We then let $w_1, w_2, \ldots, w_q$ be the sequence corresponding to $F(w)$; each $w_i$ can be interpreted as $\log m$ bits.
• Bob sends Alice $q$ messages; the $i$-th message $m_i$ corresponds to Bob's message in $\Pi$ when the random string is $w_i$.
• After that, Alice decides whether to accept as follows: if for every $i \in [q]$, Alice would accept Bob's message $m_i$ under random string $w_i$ in $\Pi$, then Alice accepts; otherwise she rejects.
It is easy to verify that the advice length, message length and number of random coins satisfy our requirements.
For the error probability, note that when the two sets are disjoint, the same advice as in $\Pi$ leads Alice to accept. Otherwise, if the advice from Merlin is wrong or the two sets intersect, then at least half of the random strings in $\{0,1\}^{\log m}$ lead to rejection in $\Pi$. Hence, by Theorem 2.7, with probability at least $1 - \varepsilon$, at least one of the strings $w_i$ leads to rejection, which completes the proof.

Reminder of Lemma 3.3. There is a universal constant $c_1$ such that, for every integer $c$, real $\varepsilon \in (0,1]$ and $\tau \geq 2$, OV$_{n,c\log n}$ can be reduced to $n^{\varepsilon}$ Max-IP$_{n,d}$ instances $(A_i, B_i)$ for $i \in [n^{\varepsilon}]$, such that:

• $d = \tau^{\mathrm{poly}(c/\varepsilon)} \cdot \log n$.

• Letting $T = c\log n \cdot \tau^{c_1}$: if there are $a \in A$ and $b \in B$ such that $a \cdot b = 0$, then there exists an $i$ such that $\mathrm{OPT}(A_i, B_i) \geq T$.

• Otherwise, for every $i$ we must have $\mathrm{OPT}(A_i, B_i) \leq T/\tau$.
Proof. The reduction follows exactly as in [5]; we recap it here for completeness. Set $\alpha = c/\varepsilon$ and $m = c \cdot \log n$, and let $\Pi$ be the $(m/\alpha, \log 2m + O(\log \tau), \mathrm{poly}(\alpha) \cdot \log \tau, 1/\tau)$-efficient MA protocol for Set-Disjointness given by Lemma 3.10 with error $1/\tau$. Now, we first enumerate all $2^{m/\alpha} = 2^{\varepsilon \cdot \log n} = n^{\varepsilon}$ possible advice strings, and create a Max-IP instance for each advice string.
For a fixed advice string $\psi \in \{0,1\}^{\varepsilon \cdot \log n}$, we create a Max-IP instance with sets $A^{\psi}$ and $B^{\psi}$ as follows. (We use $a \circ b$ to denote the concatenation of the strings $a$ and $b$.)
• For each $a \in A$ and each string $w \in \{0,1\}^r$, we create a vector $a^w \in \{0,1\}^{2^{\ell}}$ (where $\ell$ is Bob's message length and $r$ the number of random coins in $\Pi$), such that $a^w_i$ indicates whether, given advice $\psi$ and randomness $w$, Alice accepts message $m_i$ (1 for acceptance, 0 for rejection). Let the concatenation of all these $a^w$ be $a^{\psi}$. Then $A^{\psi}$ is the set of all these $a^{\psi}$ for $a \in A$.

• For each $b \in B$ and each string $w \in \{0,1\}^r$, we create a vector $b^w \in \{0,1\}^{2^{\ell}}$ such that $b^w_i = 1$ if Bob sends the message $m_i$ given advice $\psi$ and randomness $w$, and $b^w_i = 0$ otherwise. Let the concatenation of all these $b^w$ be $b^{\psi}$. Then $B^{\psi}$ is the set of all these $b^{\psi}$ for $b \in B$.
We can see that for $a \in A$ and $b \in B$, $a^{\psi} \cdot b^{\psi}$ is precisely the number of random strings leading Alice to accept Bob's message, given advice $\psi$, when Alice and Bob hold $a$ and $b$ respectively. Therefore, let $T = 2^r = c\log n \cdot \tau^{c_1}$. From the properties of the protocol $\Pi$, we can see that:

• If there are $a \in A$ and $b \in B$ such that $a \cdot b = 0$, then there is a $\psi \in \{0,1\}^{\varepsilon \cdot \log n}$ such that $a^{\psi} \cdot b^{\psi} \geq T$.

• Otherwise, for every $\psi$ and every $a \in A$, $b \in B$, Alice accepts with probability at most $1/\tau$, i. e., $a^{\psi} \cdot b^{\psi} \leq T/\tau$.

The organization of this section
In Section 4.1, we prove the improved dimensionality reduction for OV. In Section 4.2, we establish the hardness of Hopcroft's problem in dimension $2^{O(\log^* n)}$ using the improved reduction. In Section 4.3, we show that Hopcroft's problem can be reduced to $\mathbb{Z}$-Max-IP, thus establishing hardness for the latter. In Section 4.4, we show that $\mathbb{Z}$-Max-IP can be reduced to $\ell_2$-Furthest Pair and Bichromatic $\ell_2$-Closest Pair, so the hardness of these two problems follows. In Section 4.5, we show that to construct a better dimensionality reduction for OV, it suffices to show the existence of such reductions, instead of constructing them algorithmically. See Figure 1 for a diagram of all reductions covered in this section. The reductions in Section 4.2, Section 4.3 and Section 4.4 are all from [75] (either explicitly or implicitly); we make them explicit here for ease of exposition and to make the paper self-contained.

Improved dimensionality reduction for OV
We begin with the improved dimensionality reduction for OV. The following theorem is one of the technical cores of this paper; it makes use of the CRR encoding (see Theorem 2.5) recursively.
Theorem 4.1. Let $b, \ell$ be two sufficiently large integers. There is a reduction $\psi_{b,\ell} : \{0,1\}^{b\cdot\ell} \to \mathbb{Z}^{\ell}$ and a set $V_{b,\ell} \subseteq \mathbb{Z}$, such that for every $x, y \in \{0,1\}^{b\cdot\ell}$,
$$x \cdot y = 0 \iff \psi_{b,\ell}(x) \cdot \psi_{b,\ell}(y) \in V_{b,\ell},$$
and $0 \leq \psi_{b,\ell}(x)_i \leq \ell^{6^{\log^*(b)} \cdot b}$ for every $x \in \{0,1\}^{b\cdot\ell}$ and $i \in [\ell]$. Moreover, the computation of $\psi_{b,\ell}(x)$ takes $\mathrm{poly}(b \cdot \ell)$ time, and the set $V_{b,\ell}$ can be constructed in $O\!\left(\ell^{6^{\log^*(b)} \cdot b}\right) \cdot \mathrm{poly}(b \cdot \ell)$ time.
Remark 4.2. To keep the calculation clean, we did not try to minimize the base 6 above; it can be replaced by any constant greater than 2 with a tighter calculation.
Proof. We construct the reduction recursively. The parameter $\ell$ stays the same throughout the proof, hence in the following we write $\psi_b$ ($V_b$) instead of $\psi_{b,\ell}$ ($V_{b,\ell}$) for simplicity.
Direct CRR for small $b$: When $b < \ell$, we use a direct Chinese remainder representation of the numbers. We pick $b$ distinct primes $q_1, q_2, \ldots, q_b$ in $[\ell+1, \ell^2]$ (they are guaranteed to exist by Lemma 2.4), and use them for our CRR encoding. For $x \in \{0,1\}^{b\cdot\ell}$, we partition it into $\ell$ equal-sized subvectors, and use $x^i$ to denote the $i$-th subvector; that is, $x^i$ is the subvector of $x$ from the $((i-1)\cdot b + 1)$-th bit to the $(i \cdot b)$-th bit.
Then we define $\psi_b(x)$ by
$$\psi_b(x)_i := \mathrm{CRR}\left(\{x^i_j\}_{j \in [b]}; \{q_j\}_{j \in [b]}\right).$$
That is, the $i$-th coordinate of $\psi_b(x)$ is the CRR encoding of the $i$-th subvector $x^i$ with respect to the primes $q_j$. Now, for $x, y \in \{0,1\}^{b\cdot\ell}$, note that for $j \in [b]$,
$$\psi_b(x) \cdot \psi_b(y) \equiv \sum_{i=1}^{\ell} x^i_j \cdot y^i_j \pmod{q_j}.$$
Since the sum $\sum_{i=1}^{\ell} x^i_j \cdot y^i_j$ is in $[0, \ell]$ and $q_j > \ell$, we can see that
$$x \cdot y = 0 \iff \psi_b(x) \cdot \psi_b(y) \equiv 0 \pmod{q_j} \text{ for every } j \in [b].$$
Therefore, we can set $V_b$ to be the set of all integers in $[0, \ell^{6^{\log^*(b)} \cdot 2b+1}]$ that are $0$ modulo each $q_j$, and it is easy to see that $x \cdot y = 0 \iff \psi_b(x) \cdot \psi_b(y) \in V_b$ for every $x, y \in \{0,1\}^{b\cdot\ell}$.
Recursive construction for larger $b$: When $b \geq \ell$, suppose the theorem holds for all integers $b' < b$. Let $b_m$ be the number satisfying $\ell^{6^{\log^*(b_m)} \cdot b_m} = b$ (we ignore the rounding issue here and pretend that $b_m$ is an integer for simplicity).
Then we pick $b/b_m$ distinct primes $p_1, p_2, \ldots, p_{b/b_m}$, each larger than $\ell \cdot b^2$ (say, in $[\ell b^2 + 1, (\ell b^2)^2]$, which exist by Lemma 2.4), and use them as our reference primes in the CRR encodings.
For $x \in \{0,1\}^{b\cdot\ell}$, as before, we partition $x$ into $\ell$ equal-sized subvectors $x^1, x^2, \ldots, x^{\ell}$, where $x^i$ consists of the $((i-1)\cdot b+1)$-th bit through the $(i \cdot b)$-th bit of $x$. Then we partition each $x^i$ again into $b/b_m$ micro groups, each of size $b_m$; we use $x^{i,j}$ to denote the $j$-th micro group of $x^i$ after the partition. Now, we use $x^{[j]}$ to denote the concatenation of the vectors $x^{1,j}, x^{2,j}, \ldots, x^{\ell,j}$; that is, $x^{[j]}$ is the concatenation of the $j$-th micro group in each of the subvectors. Note that $x^{[j]} \in \{0,1\}^{b_m \cdot \ell}$, so it can be seen as a smaller instance, to which we can apply $\psi_{b_m}$.
Our recursive construction then goes in two steps. In the first step, we make use of $\psi_{b_m}$ and transform each $b_m$-size micro group into a single number in $[0, b)$ (recall that the coordinates of $\psi_{b_m}$ are bounded by $\ell^{6^{\log^*(b_m)} \cdot b_m} = b$). This step transforms $x$ from a vector in $\{0,1\}^{b\cdot\ell}$ into a vector $S(x)$ in $\mathbb{Z}^{(b/b_m)\cdot\ell}$. In the second step, we use a similar CRR encoding as in the base case to encode $S(x)$, to get our final reduced vector in $\mathbb{Z}^{\ell}$.
Then we define $\psi_b(x)$ by
$$\psi_b(x)_i := \mathrm{CRR}\left(\{\psi_{b_m}(x^{[j]})_i\}_{j \in [b/b_m]}; \{p_j\}_{j \in [b/b_m]}\right).$$
In other words, the $i$-th coordinate of $\psi_b(x)$ is the CRR representation of the number sequence $S_i := (\psi_{b_m}(x^{[1]})_i, \ldots, \psi_{b_m}(x^{[b/b_m]})_i)$ with respect to our primes $p_1, \ldots, p_{b/b_m}$. For every $x, y \in \{0,1\}^{b\cdot\ell}$ and $j \in [b/b_m]$, we have
$$\psi_b(x) \cdot \psi_b(y) \equiv \psi_{b_m}(x^{[j]}) \cdot \psi_{b_m}(y^{[j]}) \pmod{p_j},$$
and the right-hand side is at most $\ell \cdot b^2 < p_j$, so whether $x^{[j]} \cdot y^{[j]} = 0$ can be read off using the set $V_{b_m}$, and $x \cdot y = 0$ if and only if this holds for every $j$. Finally, recall that $\ell^{6^{\log^*(b_m)} \cdot b_m} = b$; taking logarithms of both sides gives $6^{\log^*(b_m)} \cdot b_m \cdot \log \ell = \log b$. We can then upper bound $\psi_b(x)_i$ by
$$\psi_b(x)_i < \prod_{j=1}^{b/b_m} p_j \leq (\ell b^2)^{2(b/b_m)} \leq \ell^{6^{\log^*(b)} \cdot b}.$$
Therefore, we can set $V_b$ to be the set of all integers $t$ in $[0, \ell^{6^{\log^*(b)} \cdot 2b+1}]$ such that $t \bmod p_j \in V_{b_m}$ for every $j \in [b/b_m]$, and it is easy to see that this $V_b$ satisfies our requirement. Finally, it is easy to see that the straightforward way of constructing $\psi_b(x)$ takes $\mathrm{poly}(b \cdot \ell)$ time, and we can construct $V_b$ by enumerating all possible values of $\psi_b(x) \cdot \psi_b(y)$ and checking each of them in $\mathrm{poly}(b \cdot \ell)$ time. Since there are at most $O\!\left(\ell^{6^{\log^*(b)} \cdot b}\right)$ such values, $V_b$ can be constructed in $O\!\left(\ell^{6^{\log^*(b)} \cdot b}\right) \cdot \mathrm{poly}(b \cdot \ell)$ time, which completes the proof.

Now we prove Lemma 1.17; we recap its statement here for convenience.

Reminder of Lemma 1.17. Let $\ell \leq d$ be sufficiently large. There is an algorithm reducing an OV$_{n,d}$ instance to $\ell^{O(6^{\log^* d} \cdot (d/\ell))}$ instances of $\mathbb{Z}$-OV$_{n,\ell+1}$, with vectors of bit length $O\!\left(6^{\log^* d} \cdot (d/\ell) \cdot \log \ell\right)$.
Proof. The proof is exactly the same as the proof for Lemma 1.1 in [75] with different parameters. We recap it here for convenience.
Given two sets $A$ and $B$, each consisting of $n$ vectors from $\{0,1\}^d$, we apply $\psi_{d/\ell,\ell}$ to each of the vectors in $A$ ($B$) to obtain a set $\widetilde{A}$ ($\widetilde{B}$) of vectors from $\mathbb{Z}^{\ell}$. From Theorem 4.1, there is a $(u, v) \in A \times B$ with $u \cdot v = 0$ if and only if there is a $(\widetilde{u}, \widetilde{v}) \in \widetilde{A} \times \widetilde{B}$ with $\widetilde{u} \cdot \widetilde{v} \in V_{d/\ell,\ell}$. Now, for each element $t \in V_{d/\ell,\ell}$, we construct two sets $A_t$ and $B_t$ of vectors from $\mathbb{Z}^{\ell+1}$ such that there is a $(\widetilde{u}, \widetilde{v}) \in \widetilde{A} \times \widetilde{B}$ with $\widetilde{u} \cdot \widetilde{v} = t$ if and only if there is a $(u', v') \in A_t \times B_t$ with $u' \cdot v' = 0$: we let $A_t$ be the collection of all vectors $u' = [\widetilde{u}, 1]$ for $\widetilde{u} \in \widetilde{A}$, and $B_t$ the collection of all vectors $v' = [\widetilde{v}, -t]$ for $\widetilde{v} \in \widetilde{B}$. It is easy to verify that this reduction has the properties we want.
Note that there are at most $\ell^{O(6^{\log^* d} \cdot (d/\ell))}$ numbers in $V_{d/\ell,\ell}$, so we create that many $\mathbb{Z}$-OV$_{n,\ell+1}$ instances. From Theorem 4.1, the reduction takes $n \cdot \mathrm{poly}(d) + \ell^{O(6^{\log^* d} \cdot (d/\ell))} \cdot \mathrm{poly}(d)$ time. Finally, the bit length of the reduced vectors is bounded by $O\!\left(\log \ell^{6^{\log^* d} \cdot (d/\ell)}\right) = O\!\left(6^{\log^* d} \cdot (d/\ell) \cdot \log \ell\right)$, which completes the proof.
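In code, the append step at the heart of this proof looks as follows (ours):

```python
# Reduce the question "is there a pair with u.v == t?" to
# "is there an orthogonal pair?", by appending 1 and -t respectively.
u, v, t = [3, 1, 4], [2, 5, 0], 11
u_prime, v_prime = u + [1], v + [-t]
ip = sum(a * b for a, b in zip(u, v))
ip_prime = sum(a * b for a, b in zip(u_prime, v_prime))
assert ip_prime == ip - t          # hence ip == t  iff  ip_prime == 0
```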
A transformation from nonuniform constructions to uniform constructions. The proof of Theorem 4.1 works recursively: in one recursive step, we reduce the construction of $\psi_{b,\ell}$ to the construction of $\psi_{b_m,\ell}$, where $b_m \leq \log b$. Applying this reduction $\log^* n$ times, we get a sufficiently small instance on which we can switch to the direct CRR construction. An interesting observation is that after applying the reduction only three times, the block length parameter becomes some $b_3 \leq \log\log\log b$. Such a $b_3$ is so small that we can actually use brute force to find an "optimal" construction $\psi_{b_3,\ell}$ in $b^{o(1)}$ time, instead of recursing deeper. Hence, to find a construction better than Theorem 4.1, we only need to prove the existence of such a construction. See Section 4.5 for details.

Improved hardness for Hopcroft's problem
In this subsection we prove Theorem 1.18 using our new dimensionality reduction from Lemma 1.17. We recap its statement here for completeness.
Reminder of Theorem 1.18 (Hardness of Hopcroft's problem in $c^{\log^* n}$ dimensions). Assuming SETH (or OVC), there is a constant $c$ such that $\mathbb{Z}$-OV$_{n,c^{\log^* n}}$ with vectors of $O(\log n)$-bit entries requires $n^{2-o(1)}$ time.
Proof. The proof roughly follows that of Theorem 1.1 in [75]. Let $c$ be an arbitrary constant and $d := c \cdot \log n$. We show that an algorithm $\mathcal{A}$ solving $\mathbb{Z}$-OV$_{n,\ell+1}$, where $\ell = 7^{\log^* n}$, in $O(n^{2-\delta})$ time for some $\delta > 0$ can be used to construct an $O(n^{2-\delta+o(1)})$-time algorithm for OV$_{n,d}$, thereby contradicting OVC.
We apply the reduction of Lemma 1.17 with this $\ell$. The reduction takes $O\!\left(n \cdot \ell^{O(6^{\log^* d} \cdot (d/\ell))} \cdot \mathrm{poly}(d)\right) = n^{1+o(1)}$ time, and it reduces the OV$_{n,d}$ instance to $n^{o(1)}$ instances of $\mathbb{Z}$-OV$_{n,\ell+1}$ whose vectors have bit length $o(\log n)$. We simply solve all these $n^{o(1)}$ instances using $\mathcal{A}$, and this gives us an $O(n^{2-\delta+o(1)})$-time algorithm for OV$_{n,d}$, which completes the proof.

Hardness for Z-Max-IP
Now we move to the hardness of exact $\mathbb{Z}$-Max-IP.

Theorem 4.3. A $\mathbb{Z}$-OV$_{n,d}$ instance can be reduced to a $\mathbb{Z}$-Max-IP$_{n,d^2}$ instance, such that the answer to the $\mathbb{Z}$-OV instance is yes if and only if the optimum of the $\mathbb{Z}$-Max-IP instance is $0$; the bit length of the entries at most doubles.

Proof. We remark that this reduction is implicitly used in the proof of Theorem 1.2 in [75]; we abstract it here only for exposition. Given a $\mathbb{Z}$-OV$_{n,d}$ instance with sets $A, B$, consider the polynomial
$$P(x, y) := -(x \cdot y)^2 = \sum_{i,j \in [d]} -(x_i x_j) \cdot (y_i y_j), \quad x, y \in \mathbb{Z}^d.$$
It is easy to see that there is an $(x, y) \in A \times B$ with $x \cdot y = 0$ if and only if the maximum value of $P(x, y)$ over $A \times B$ is $0$ (note $P$ is always $\leq 0$). Now, for each $x \in A$ and $y \in B$, we identify $[d] \times [d]$ with $[d^2]$ and construct $\widetilde{x}, \widetilde{y} \in \mathbb{Z}^{d^2}$ such that $\widetilde{x}_{(i,j)} = x_i \cdot x_j$ and $\widetilde{y}_{(i,j)} = -y_i \cdot y_j$.
Then we have $\widetilde{x} \cdot \widetilde{y} = P(x, y)$. Hence, letting $\widetilde{A}$ be the set of all these vectors $\widetilde{x}$ and $\widetilde{B}$ the set of all these vectors $\widetilde{y}$, there is an $(x, y) \in A \times B$ with $x \cdot y = 0$ if and only if $\mathrm{OPT}(\widetilde{A}, \widetilde{B}) = 0$, and our reduction is complete.
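A tiny sketch of this squaring trick (code ours):

```python
import itertools

def tensorize(x, negate=False):
    """Map x in Z^d to Z^(d^2) with coordinates x_i * x_j (negated on the
    B side), so that the inner product of the images equals -(x.y)^2."""
    sign = -1 if negate else 1
    return [sign * xi * xj for xi, xj in itertools.product(x, repeat=2)]

A = [[2, -1, 3], [1, 1, 0]]
B = [[1, 2, 0], [3, 3, -1]]
A2 = [tensorize(x) for x in A]
B2 = [tensorize(y, negate=True) for y in B]
for x, x2 in zip(A, A2):
    for y, y2 in zip(B, B2):
        ip = sum(u * v for u, v in zip(x, y))
        assert sum(u * v for u, v in zip(x2, y2)) == -ip * ip
# The max over the new instance is 0 iff some original pair is orthogonal
# (here [2,-1,3].[1,2,0] == 0, so the maximum is exactly 0).
assert max(sum(u * v for u, v in zip(x2, y2)) for x2 in A2 for y2 in B2) == 0
```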
Now, Theorem 1.14 (restated below) is a simple corollary of Theorem 4.3 and Theorem 1.18.

Reminder of Theorem 1.14. Assuming SETH (or OVC), there is a constant $c$ such that every exact algorithm for $\mathbb{Z}$-Max-IP$_{n,d}$ with $d = c^{\log^* n}$ requires $n^{2-o(1)}$ time, with vectors of $O(\log n)$-bit entries.

A dimensionality reduction for Max-IP
The reduction $\psi_{b,\ell}$ from Theorem 4.1 actually does more: for $x, y \in \{0,1\}^{b\cdot\ell}$, from $\psi_{b,\ell}(x) \cdot \psi_{b,\ell}(y)$ we can in fact determine the inner product $x \cdot y$ itself, not only whether $x \cdot y = 0$. Formally, we have the following corollary.

Corollary 4.4. For every $0 \leq k \leq b \cdot \ell$, there is a set $V^k_{b,\ell} \subseteq \mathbb{Z}$ such that for every $x, y \in \{0,1\}^{b\cdot\ell}$, $x \cdot y = k$ if and only if $\psi_{b,\ell}(x) \cdot \psi_{b,\ell}(y) \in V^k_{b,\ell}$; the sets $V^k_{b,\ell}$ satisfy the same size and constructibility bounds as $V_{b,\ell}$ in Theorem 4.1. Consequently, Max-IP$_{n,d}$ can be reduced to $(d+1) \cdot \ell^{O(6^{\log^* d} \cdot (d/\ell))}$ instances of $\mathbb{Z}$-Max-IP$_{n,(\ell+1)^2}$.

Proof sketch. Let $b = d/\ell$ (assume $\ell$ divides $d$ here for simplicity), and let $A$ and $B$ be the sets in the given Max-IP$_{n,d}$ instance. We proceed similarly as in the case of OV.
For each $k$ from $0$ to $d$, we construct the set $V^k_{b,\ell}$ given by Corollary 4.4. Then there is an $(x, y) \in A \times B$ with $x \cdot y = k$ if and only if there is a pair $(\widetilde{x}, \widetilde{y}) \in \widetilde{A} \times \widetilde{B}$ with $\widetilde{x} \cdot \widetilde{y} \in V^k_{b,\ell}$. Using exactly the same reduction as in Lemma 1.17, we can in turn reduce this to $\ell^{O(6^{\log^*(b)} \cdot b)}$ instances of $\mathbb{Z}$-OV$_{n,\ell+1}$.
Applying Theorem 4.3 and solving all these $(d+1) \cdot \ell^{O(6^{\log^*(b)} \cdot b)}$ $\mathbb{Z}$-Max-IP$_{n,(\ell+1)^2}$ instances, we can determine whether there is an $(x, y) \in A \times B$ with $x \cdot y = k$ for every $k$, from which we can compute the answer to the Max-IP$_{n,d}$ instance.

Hardness for $\ell_2$-Furthest Pair and Bichromatic $\ell_2$-Closest Pair
Now we turn to the proofs of hardness for $\ell_2$-Furthest Pair and Bichromatic $\ell_2$-Closest Pair. The two reductions below are slight adaptations of the reductions in the proofs of Theorem 1.2 and Corollary 2.1 in [75].

Lemma 4.6. A $\mathbb{Z}$-Max-IP$_{n,d}$ instance with vectors of $O(\log n)$-bit entries can be reduced to an $\ell_2$-Furthest Pair instance on $2n$ points in $d+2$ dimensions, with $O(\log n)$-bit entries.

Proof. Let $A, B$ be the sets in the $\mathbb{Z}$-Max-IP$_{n,d}$ instance, and let $k$ be the smallest integer such that all vectors from $A$ and $B$ have $(k \cdot \log n)$-bit entries.
Let $W$ be $n^{C \cdot k}$ for a large enough constant $C$. Given $x \in A$ and $y \in B$, we construct the points
$$\widehat{x} = \left[x, \sqrt{W - \|x\|^2}, 0\right] \quad \text{and} \quad \widehat{y} = \left[-y, 0, \sqrt{W - \|y\|^2}\right].$$
That is, we append two corresponding values to the ends of the vectors $x$ and $-y$.
Now, for $x_1, x_2 \in A$, the squared distance between their reduced points is
$$\|\widehat{x_1} - \widehat{x_2}\|^2 = \|x_1 - x_2\|^2 + \left(\sqrt{W - \|x_1\|^2} - \sqrt{W - \|x_2\|^2}\right)^2 < 2W - o(W),$$
and similarly for pairs within $B$. Next, for $x \in A$ and $y \in B$, we have
$$\|\widehat{x} - \widehat{y}\|^2 = \|x + y\|^2 + \left(W - \|x\|^2\right) + \left(W - \|y\|^2\right) = 2W + 2\, x \cdot y \geq 2W - o(W);$$
the last inequality holds when we set $C$ to be $5$.
Putting everything together, we see that the $\ell_2$-furthest pair among all the points $\widehat{x}$ and $\widehat{y}$ must be a pair $(\widehat{x}, \widehat{y})$ with $x \in A$ and $y \in B$, and maximizing $\|\widehat{x} - \widehat{y}\|$ is equivalent to maximizing $x \cdot y$, which proves the correctness of our reduction. Furthermore, when $k$ is a constant, the reduced instance only needs vectors with $O(k) \cdot \log n = O(\log n)$-bit entries.

Lemma 4.7. A $\mathbb{Z}$-Max-IP$_{n,d}$ instance with vectors of $O(\log n)$-bit entries can be reduced to a Bichromatic $\ell_2$-Closest Pair instance with $n$ points on each side in $d+2$ dimensions, with $O(\log n)$-bit entries.

Proof. Let $A, B$ be the sets in the $\mathbb{Z}$-Max-IP$_{n,d}$ instance, and let $k$ be the smallest integer such that all vectors from $A$ and $B$ have $(k \cdot \log n)$-bit entries.
Let $W$ be $n^{C \cdot k}$ for a large enough constant $C$. Given $x \in A$ and $y \in B$, we construct the points
$$\widetilde{x} = \left[x, \sqrt{W - \|x\|^2}, 0\right] \quad \text{and} \quad \widetilde{y} = \left[y, 0, \sqrt{W - \|y\|^2}\right].$$
That is, we append two corresponding values to the ends of the vectors $x$ and $y$, and our reduced instance asks for the closest pair between the set $\widetilde{A}$ (consisting of all these $\widetilde{x}$ for $x \in A$) and the set $\widetilde{B}$ (consisting of all these $\widetilde{y}$ for $y \in B$). For $x \in A$ and $y \in B$, we have
$$\|\widetilde{x} - \widetilde{y}\|^2 = \|x - y\|^2 + \left(W - \|x\|^2\right) + \left(W - \|y\|^2\right) = 2W - 2\, x \cdot y.$$
Hence minimizing $\|\widetilde{x} - \widetilde{y}\|$ over $x \in A$ and $y \in B$ is equivalent to maximizing $x \cdot y$, which proves the correctness of our reduction. Furthermore, when $k$ is a constant, the reduced instance only needs vectors with $O(k) \cdot \log n = O(\log n)$-bit entries.
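Both liftings are easy to check numerically. The sketch below (ours) verifies the two identities $\|\widehat{x}-\widehat{y}\|^2 = 2W + 2\,x\cdot y$ and $\|\widetilde{x}-\widetilde{y}\|^2 = 2W - 2\,x\cdot y$ on small integer vectors:

```python
import math, itertools

def lift_furthest(x, W, side):
    """side 'A': [x, sqrt(W - |x|^2), 0]; side 'B': [-x, 0, sqrt(W - |x|^2)]."""
    v = list(x) if side == 'A' else [-c for c in x]
    s = math.sqrt(W - sum(c * c for c in x))
    return v + ([s, 0.0] if side == 'A' else [0.0, s])

def lift_closest(x, W, side):
    """side 'A': [x, sqrt(W - |x|^2), 0]; side 'B': [x, 0, sqrt(W - |x|^2)]."""
    s = math.sqrt(W - sum(c * c for c in x))
    return list(x) + ([s, 0.0] if side == 'A' else [0.0, s])

def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

A = [[2, -1, 3], [0, 4, 1]]
B = [[1, 2, 2], [-3, 0, 1]]
W = 10 ** 6
for x, y in itertools.product(A, B):
    ip = sum(a * b for a, b in zip(x, y))
    # furthest-pair lifting: squared distance is exactly 2W + 2 x.y
    fx, fy = lift_furthest(x, W, 'A'), lift_furthest(y, W, 'B')
    assert abs(sqdist(fx, fy) - (2 * W + 2 * ip)) < 1e-6
    # closest-pair lifting: squared distance is exactly 2W - 2 x.y
    cx, cy = lift_closest(x, W, 'A'), lift_closest(y, W, 'B')
    assert abs(sqdist(cx, cy) - (2 * W - 2 * ip)) < 1e-6
```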

Nonuniform to uniform transformation for dimensionality reduction for OV
Finally, we discuss the transformation from a nonuniform construction to a uniform one for the dimensionality reduction for OV. In order to state our result formally, we need to introduce some definitions. We say a function $\varphi_{b,\ell} \colon \{0,1\}^{b \cdot \ell} \to \mathbb{Z}^{\ell+1}$ together with a set $V_{b,\ell} \subseteq \mathbb{Z}$ is a $(b, \ell, \kappa)$-reduction if, for all $x, y \in \{0,1\}^{b \cdot \ell}$, $x \cdot y = 0$ holds if and only if $\varphi_{b,\ell}(x) \cdot \varphi_{b,\ell}(y) \in V_{b,\ell}$, with $|V_{b,\ell}| = O(\kappa \cdot b)$. Similarly, let $\tau$ be an increasing function. We say a function family $\{\varphi_{b,\ell}\}_{b,\ell}$ together with a set family $\{V_{b,\ell}\}_{b,\ell}$ is a $\tau$-reduction family if, for all integers $b$ and $\ell$, $(\varphi_{b,\ell}, V_{b,\ell})$ is a $(b, \ell, \tau(b))$-reduction. Moreover, if for all integers $b$ and $\ell \le \log \log \log b$ there is an algorithm which computes $\varphi_{b,\ell}(x)$ in $\mathrm{poly}(b)$ time given $b$, $\ell$ and $x \in \{0,1\}^{b \cdot \ell}$, and which constructs the set $V_{b,\ell}$ in $2^{O(\tau(b) \cdot b)} \cdot \mathrm{poly}(b)$ time given $b$ and $\ell$, then we call $(\varphi_{b,\ell}, V_{b,\ell})$ a uniform-$\tau$-reduction family.
Remark 4.9. The reason we assume $\ell$ to be so small compared to $b$ is that in our applications we only care about very small $\ell$, and this greatly simplifies the notation. From Theorem 4.1, there is a uniform-$6^{\log^* b}$-reduction family, and a better uniform reduction family implies better hardness for Z-OV and other related problems as well (Lemma 1.17, Theorem 4.3, Lemma 4.7 and Lemma 4.6). Now we are ready to state our nonuniform-to-uniform transformation result formally.

Proof sketch. The construction in Theorem 4.1 is recursive: it constructs the reduction $\psi_{b,\ell}$ from a much smaller reduction $\psi_{b_m,\ell}$, where $b_m \le \log b$. In the original construction, it takes $\log^* b$ recursions to make the problem instance sufficiently small so that a direct construction can be used. Here we only apply the reduction three times. First, let us abstract the following lemma from the proof of Theorem 4.1.

• There is a $(b, \ell, 6 \cdot \kappa)$-reduction $(\psi, V)$.
• Given $(\varphi, V)$, for every $x \in \{0,1\}^{b \cdot \ell}$, $\psi(x)$ can be computed in $\mathrm{poly}(b \cdot \ell)$ time, and $V$ can be constructed in $2^{O(\kappa \cdot b)} \cdot \mathrm{poly}(b \cdot \ell)$ time.

Now, let $b, \ell \in \mathbb{N}$. We are going to construct our reduction as follows.
We first set $b_1 \le \log b$ as in the lemma above; similarly, we set $b_2$ and $b_3$ so that $b_2 \le \log b_1$ and $b_3 \le \log b_2$. We can calculate from the above that $b_3 \le \log \log \log b$. From the assumption that there is a $\tau$-reduction family, there is a $(b_3, \ell, \tau(b_3))$-reduction $(\varphi_{b_3,\ell}, V_{b_3,\ell})$, which is also a $(b_3, \ell, \tau(b))$-reduction, as $\tau$ is increasing. Note that we can assume $\ell \le \log \log \log b$ and $\tau(b) \le \log \log \log b$ by assumption. Now we simply use a brute-force algorithm to find such a $(b_3, \ell, \tau(b))$-reduction.

Proof sketch. Note that since the hardness of Z-OV implies the hardness of the other three problems, we only need to consider Z-OV here.
From Theorem 4.10 and the assumption, there exists a uniform-$O(1)$-reduction family. Proceeding similarly as in Lemma 1.17 with this uniform-$O(1)$-reduction family, we obtain a better dimensionality self-reduction from OV to Z-OV. Then exactly the same argument as in Theorem 1.18, with different parameters, gives us the required lower bound.

NP · UPP communication protocol and exact hardness for Z-Max-IP
We note that the inapproximability results for (Boolean) Max-IP are established via a connection to MA communication protocols for Set-Disjointness [5]. In light of this, in this section we view our reduction from OV to Z-Max-IP (Lemma 1.17 and Theorem 4.3) from the perspective of communication complexity.
We observe that our reduction can in fact be understood as an NP · UPP communication protocol for Set-Disjointness. Moreover, we show that a slightly better NP · UPP communication protocol for Set-Disjointness would allow us to prove that Z-Max-IP is hard even in $\omega(1)$ dimensions (and likewise $\ell_2$-Furthest Pair and Bichromatic $\ell_2$-Closest Pair).

Slightly better protocols imply hardness in dimension ω(1)
Finally, we show that if we have a slightly better NP · UPP protocol for Set-Disjointness, then we can show that Z-Max-IP requires $n^{2-o(1)}$ time even in $\omega(1)$ dimensions (and so do $\ell_2$-Furthest Pair and Bichromatic $\ell_2$-Closest Pair). We restate Theorem 1.22 here for convenience.
Reminder of Theorem 1.22. Assuming SETH (or OVC), if there is an increasing and unbounded function $f$ such that for every $c$ and every $1 \le \alpha \le n$ there is a $(c \log n / f(\alpha),\, \alpha)$-computational-efficient NP · UPP protocol for DISJ$_{c \log n}$, then Z-Max-IP requires $n^{2-o(1)}$ time even in $\omega(1)$ dimensions.

Proof. Suppose for contradiction that, for some constant $\varepsilon_1 > 0$, Z-Max-IP in every constant dimension can be solved in $n^{2-\varepsilon_1}$ time. Let $\varepsilon = \varepsilon_1/2$, and let $\alpha$ be the first number such that $c/f(\alpha) < \varepsilon$. Note that $\alpha$ is also a constant. Consider the $(c \log n / f(\alpha),\, \alpha)$-computational-efficient NP · UPP protocol $\Pi$ for DISJ$_{c \log n}$, and let $A, B$ be the two sets in the OV$_{n, c \log n}$ instance. Our algorithm via reduction works as follows:

• There are $2^\alpha$ possible messages in $\{0,1\}^\alpha$; let $m_1, m_2, \ldots, m_{2^\alpha}$ be an enumeration of them.
• We first enumerate all possible advice strings from Merlin in $\Pi$. There are $2^{c \log n / f(\alpha)} \le 2^{\varepsilon \cdot \log n} = n^{\varepsilon}$ such strings; let $\phi \in \{0,1\}^{\varepsilon \cdot \log n}$ be such an advice string.
- For each $x \in A$, let $\psi_{\mathrm{Alice}}(x) \in \mathbb{R}^{2^\alpha}$ be the vector of probabilities that Alice accepts each message from Bob. That is, $\psi_{\mathrm{Alice}}(x)_i$ is the probability that Alice accepts the message $m_i$, given its input $x$ and the advice $\phi$.
- Similarly, for each $y \in B$, let $\psi_{\mathrm{Bob}}(y) \in \mathbb{R}^{2^\alpha}$ be the vector of probabilities that Bob sends each message.
That is, $\psi_{\mathrm{Bob}}(y)_i$ is the probability that Bob sends the message $m_i$, given its input $y$ and the advice $\phi$.
- Then, for each $x \in A$ and $y \in B$, $\psi_{\mathrm{Alice}}(x) \cdot \psi_{\mathrm{Bob}}(y)$ is precisely the probability that Alice accepts at the end when Alice and Bob hold $x$ and $y$ respectively and the advice is $\phi$. Now we let $A_\phi$ be the set of all the vectors $\psi_{\mathrm{Alice}}(x)$, and $B_\phi$ be the set of all the vectors $\psi_{\mathrm{Bob}}(y)$.
• If there is a $\phi$ such that $\mathsf{OPT}(A_\phi, B_\phi) \ge 1/2$, then we output yes; otherwise we output no.
From the definition of $\Pi$, it is straightforward to see that the above algorithm solves OV$_{n, c \cdot \log n}$. Moreover, notice that by the computational-efficiency property of $\Pi$, the reduction itself runs in $n^{1+\varepsilon} \cdot \mathrm{polylog}(n)$ time, and all the vectors in $A_\phi$ and $B_\phi$ have at most $\mathrm{polylog}(n)$-bit precision, which means each $\mathsf{OPT}(A_\phi, B_\phi)$ can be computed by a call to Z-Max-IP$_{n, 2^\alpha}$ with vectors of $\mathrm{polylog}(n)$-bit entries.
Hence, the final running time of the above algorithm is bounded by $n^{\varepsilon} \cdot n^{2-\varepsilon_1} = n^{2-\varepsilon}$ (note that $2^\alpha$ is still a constant), which contradicts OVC.
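To illustrate how acceptance probabilities become inner products, here is a toy Python sketch. The protocol used below is a trivial deterministic one (Bob sends his entire input as the message, and no advice is used); this is of course not the efficient protocol the theorem assumes, but it exhibits exactly the vector construction from the proof.

```python
# Toy illustration: the acceptance probability of a one-way protocol equals
# psi_Alice(x) . psi_Bob(y). Here Bob sends his whole input (alpha = d) and
# Alice accepts a message m iff x . m = 0, so the inner product is 1 iff
# x and y are orthogonal.
from itertools import product

d = 3
messages = list(product([0, 1], repeat=d))  # the 2^alpha possible messages

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def psi_alice(x):
    # psi_Alice(x)_i = Pr[Alice accepts message m_i] (here 0 or 1)
    return [1.0 if dot(x, m) == 0 else 0.0 for m in messages]

def psi_bob(y):
    # psi_Bob(y)_i = Pr[Bob sends message m_i] (here an indicator of y)
    return [1.0 if tuple(y) == m else 0.0 for m in messages]

x, y = (1, 0, 1), (0, 1, 0)
accept_prob = dot(psi_alice(x), psi_bob(y))
assert accept_prob == 1.0  # since x . y = 0
```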

Improved MA protocols
In this section we prove Theorem 1.24 (restated below for convenience).
Reminder of Theorem 1.24. There is an MA protocol for DISJ$_n$ and IP$_n$ with communication complexity $O\!\left(\sqrt{n \log n \log \log n}\right)$.
To prove Theorem 1.24, we need the following intermediate problem.
Definition 6.1 (The Inner Product Modulo $p$ Problem (IP$^p_n$)). Let $p$ and $n$ be two positive integers. In IP$^p_n$, Alice and Bob are given vectors $X$ and $Y$ in $\{0,1\}^n$ respectively, and they want to compute $X \cdot Y \pmod{p}$.
Note that IP$_n$ and IP$^p_n$ are not Boolean functions, so we need to generalize the definition of an MA protocol. In an MA protocol for IP$_n$, Merlin sends the answer directly to Alice, together with a proof to convince Alice and Bob. The correctness condition becomes: for the right answer $X \cdot Y$, Merlin has a proof such that Alice and Bob accept with high probability (say $2/3$). The soundness condition becomes: for every wrong answer, every proof from Merlin is rejected with high probability.
We are going to use the following MA protocol for IP$^p_n$, which is a slight adaptation of the protocol from [67].

Proof of Theorem 1.24. Since an IP$_n$ protocol trivially implies a DISJ$_n$ protocol, we only need to consider IP$_n$ in what follows. Let $x$ be the number such that $x^x = n$; for convenience we are going to pretend that $x$ is an integer. It is easy to see that $x = \Theta(\log n / \log \log n)$. We then pick $10x$ distinct primes $p_1, p_2, \ldots, p_{10x}$ in $[x+1, x^2]$. (We can assume that $n$ is large enough to make $x$ satisfy the requirement of Lemma 2.4.) Let $T$ be a parameter. We use $\Pi_{p_i}$ to denote the $\left(O(n/T \cdot \log p_i),\, \log n + O(1),\, O(T \cdot \log p_i),\, 1/2\right)$-efficient MA protocol for IP$^{p_i}_n$. Our protocol for IP$_n$ works as follows:

• Merlin sends Alice all the advice strings from the protocols $\Pi_{p_1}, \Pi_{p_2}, \ldots, \Pi_{p_{10x}}$, together with a presumed inner product $0 \le z \le n$.
• Note that the advice for $\Pi_{p_i}$ contains the presumed value of $X \cdot Y \pmod{p_i}$; Alice first checks whether $z$ is consistent with all of these values, and rejects immediately if it is not.
• Alice and Bob jointly toss $O(\log(10x))$ coins to pick a uniformly random index $i \in [10x]$, and then they simulate $\Pi_{p_i}$. That is, they pretend they are the Alice and Bob in the protocol $\Pi_{p_i}$, with the advice from Merlin in $\Pi_{p_i}$ (which Alice does have).
Correctness. Let $X, Y \in \{0,1\}^n$ be the vectors of Alice and Bob. If $X \cdot Y = z$, then by the definition of the protocols $\Pi_{p_i}$, Alice always accepts given the correct advice from Merlin. Otherwise, let $d = X \cdot Y \ne z$.
We are going to analyze the probability that we pick a "good" $p_i$, that is, a $p_i$ that does not divide $|d - z|$.
Since $p_i > x$ for all the $p_i$ and $x^x > n \ge |d - z|$, $|d - z|$ cannot be a multiple of more than $x$ primes among the $p_i$. Therefore, with probability at least $0.9$, our pick of $p_i$ is good. In this case, from the definition of the protocols $\Pi_{p_i}$, Alice and Bob reject afterward with probability at least $1/2$, as $d \pmod{p_i}$ differs from $z \pmod{p_i}$. In summary, when $X \cdot Y \ne z$, Alice rejects with probability at least $0.9/2 = 0.45$, which finishes the proof of correctness.
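The following Python snippet (a numeric sanity check with our own choice of parameters, not code from the paper) confirms the counting argument: with $x^x \ge n$ and $10x$ distinct primes in $(x, x^2]$, any nonzero value of magnitude at most $n$ is divisible by at most $x$ of them.

```python
# Check: each chosen prime exceeds x, and a product of more than x such
# primes exceeds x^x >= n, so any nonzero m <= n has at most x of the
# chosen primes as divisors; a random pick is "good" with prob. >= 0.9.

def primes_in(lo, hi):
    # simple sieve of Eratosthenes over [lo, hi]
    sieve = [True] * (hi + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(hi ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [p for p in range(lo, hi + 1) if sieve[p]]

n = 10 ** 200
x = 1
while x ** x < n:
    x += 1  # x = Theta(log n / log log n); here x = 100

ps = primes_in(x + 1, x * x)[: 10 * x]  # 10x distinct primes in (x, x^2]
assert len(ps) == 10 * x

# Worst case: the product of as many of the primes as fits below n.
m, count = 1, 0
for p in ps:
    if m * p > n:
        break
    m *= p
    count += 1
divisors = sum(1 for p in ps if m % p == 0)
assert divisors == count <= x
```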
Complexity. Now, note that the total advice length is
$$\sum_{i=1}^{10x} O\!\left(\frac{n}{T} \cdot \log p_i\right) + O(\log n) = O\!\left(\frac{n}{T} \cdot x \log x\right) = O\!\left(\frac{n \log n}{T}\right),$$
since $\log p_i = O(\log x)$ and $x \log x = \Theta(\log n)$. And the communication complexity between Alice and Bob is bounded by
$$O(\log x) + O(T \cdot \log p_i) = O(T \cdot \log \log n).$$
Setting $T = \sqrt{n \log n / \log \log n}$ balances the above two quantities, and we obtain the needed MA protocol for IP$_n$.

Hardness of approximate {−1, 1}-Max-IP

In this section we apply the $O(\sqrt{n})$-degree approximate polynomial for OR [27, 77] to show hardness of approximate {−1, 1}-Max-IP. We first give a reduction from OV to approximate {−1, 1}-Max-IP.
• There is an integer $T > \varepsilon^{-1}$ such that if there is an $(a, b) \in A \times B$ such that $a \cdot b = 0$, then $\mathsf{OPT}(\tilde{A}, \tilde{B}) \ge T$.
We remark here that the above reduction fails to achieve a characterization: setting $\varepsilon = 1/2$ and $d = c \log n$ for an arbitrary constant $c$, we have $d_1 = 2^{O(\sqrt{\log n})}$, which is much larger than $\log n$. Another interesting difference between the above theorem and Lemma 3.3 (the reduction from OV to approximate Max-IP) is that Lemma 3.3 reduces one OV instance to many Max-IP instances, while the above theorem reduces it to a single {−1, 1}-Max-IP instance.
Proof of Theorem 7.1. Construction and analysis of the polynomial $P_\varepsilon(z)$. By [27, 77], there is a polynomial $P_\varepsilon \colon \{0,1\}^d \to \mathbb{R}$ such that:
• $P_\varepsilon$ is of degree $D = O\!\left(\sqrt{d \log 1/\varepsilon}\right)$.
• $P_\varepsilon$ can be constructed in time polynomial in its description size.
Writing $P_\varepsilon$ over the basis of the parity functions $\chi_S$, we have $P_\varepsilon = \sum_S \langle \chi_S, P_\varepsilon \rangle \cdot \chi_S$, where $\langle \chi_S, P_\varepsilon \rangle$ is the inner product of $\chi_S$ and $P_\varepsilon$, defined as $\langle \chi_S, P_\varepsilon \rangle := \mathbb{E}_{x \in \{0,1\}^d}\left[\chi_S(x) \cdot P_\varepsilon(x)\right]$. Let $c_S = \langle \chi_S, P_\varepsilon \rangle$; from the definition it is easy to see that $c_S \in [-1, 1]$.
Discretization of the polynomial $P_\varepsilon$. Note that $P_\varepsilon(z)$ has real coefficients; we need to turn it into a polynomial with integer coefficients first. Let $M := \binom{d}{\le D}$ be the number of sets $S$ with $|S| \le D$. Consider the polynomial
$$\widehat{P}_\varepsilon(z) := \sum_S \hat{c}_S \cdot \chi_S(z), \quad \text{where } \hat{c}_S := \left\lceil c_S \cdot M \cdot 2/\varepsilon \right\rceil.$$
We can see that $\left|\widehat{P}_\varepsilon(z)/(2M/\varepsilon) - P_\varepsilon(z)\right| \le \varepsilon$ for every $z \in \{0,1\}^d$.
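A small numeric illustration of this error accounting (with stand-in parameters of our own choosing): each rounded coefficient is off by at most one, contributing at most $\varepsilon/(2M)$ to the rescaled polynomial, so the total error over $M$ terms stays within $\varepsilon/2 \le \varepsilon$.

```python
# Discretization demo: round each real coefficient c_S in [-1, 1] to an
# integer c_S_hat ~ c_S * 2M/eps. Each term's rescaled rounding error is at
# most eps/(2M), so the total error over M terms is at most eps/2. Since
# chi_S(z) is in {-1, 1}, this bounds the pointwise error for every z.
import random

random.seed(0)
eps = 0.01
M = 1000  # stand-in for the number of basis polynomials chi_S
cs = [random.uniform(-1, 1) for _ in range(M)]
cs_hat = [round(c * 2 * M / eps) for c in cs]

total_err = sum(abs(ch / (2 * M / eps) - c) for c, ch in zip(cs, cs_hat))
assert total_err <= eps / 2
```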
Simplification of the polynomial $\widehat{P}_\varepsilon$. $\widehat{P}_\varepsilon(z)$ is expressed over the basis consisting of the $\chi_S$; we need to turn it into a polynomial over the standard basis.
For each $S \subseteq [d]$, consider $\chi_S$: over $\{0,1\}^d$ it can also be written as
$$\chi_S(z) = \prod_{i \in S} (1 - 2 z_i) = \sum_{T \subseteq S} (-2)^{|T|} \cdot z_T,$$
where $z_T := \prod_{i \in T} z_i$. Plugging this into the expression of $\widehat{P}_\varepsilon$, we can write $\widehat{P}_\varepsilon(z) = \sum_T c_T \cdot z_T$ for integer coefficients $c_T$.

Properties of the polynomial $\widehat{P}_\varepsilon$. Let us summarize some properties of $\widehat{P}_\varepsilon$ for now. First we need a bound on $|c_T|$: we have $|\hat{c}_S| \le M \cdot 2/\varepsilon$, and by a simple calculation, $|c_T| \le M^2 \cdot 2^D \cdot 2/\varepsilon$.
The reduction. Now let us construct the reduction; we begin with some notation. For two vectors $a, b$, we use $a \circ b$ to denote their concatenation. For a vector $a$ and a real $\tau$, we use $a \cdot \tau$ to denote the vector resulting from multiplying each coordinate of $a$ by $\tau$. Let $\mathrm{sgn}(\tau)$ be the sign function that outputs $1$ when $\tau > 0$, $-1$ when $\tau < 0$, and $0$ when $\tau = 0$. For $\tau \in \{-B, -B+1, \ldots, B\}$, we use $e_\tau \in \{-1, 0, 1\}^B$ to denote the vector whose first $|\tau|$ elements are $\mathrm{sgn}(\tau)$ and whose remaining elements are zeros. We also use $\mathbf{1}$ to denote the all-$1$ vector of length $B$.
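As a quick sanity check of this encoding (a sketch; the helper names are ours), the following verifies that $e_\tau \cdot \mathbf{1} = \tau$ for every $\tau \in \{-B, \ldots, B\}$.

```python
# e_tau in {-1, 0, 1}^B has its first |tau| entries equal to sgn(tau) and
# the rest 0, so e_tau . 1 = tau for every tau in {-B, ..., B}.

def sgn(t):
    return (t > 0) - (t < 0)

def e(tau, B):
    return [sgn(tau)] * abs(tau) + [0] * (B - abs(tau))

B = 7
ones = [1] * B
for tau in range(-B, B + 1):
    assert sum(a * b for a, b in zip(e(tau, B), ones)) == tau
```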
Finally, given an OV$_{n,d}$ instance with two sets $A$ and $B$, we construct two sets $\tilde{A}$ and $\tilde{B}$, such that $\tilde{A}$ consists of all the vectors $\phi_x(x)$ for $x \in A$, and $\tilde{B}$ consists of all the vectors $\phi_y(y)$ for $y \in B$.
Then we can see that $\tilde{A}$ and $\tilde{B}$ consist of $n$ vectors from $\{-1, 1\}^{d_1}$ for the $d_1$ given in the theorem statement. It is not hard to see that the above reduction takes $n \cdot \mathrm{poly}(d_1)$ time. Moreover, if there is an $(x, y) \in A \times B$ such that $x \cdot y = 0$, then $\mathsf{OPT}(\tilde{A}, \tilde{B}) \ge (8M/\varepsilon) \cdot (1 - 2\varepsilon)$; otherwise, $\mathsf{OPT}(\tilde{A}, \tilde{B}) \le (8M/\varepsilon) \cdot 2\varepsilon$. Setting the $\varepsilon$ above to be $1/3$ times the $\varepsilon$ in the theorem statement finishes the proof. With Theorem 7.1, we are ready to prove our hardness results on {−1, 1}-Max-IP.

Future work
We end our paper by discussing a few interesting research directions.
1. The most important future direction from this work is to further improve the dimensionality reduction for OV. It would be strange for $2^{O(\log^* n)}$ to be the right answer for the limit of the dimensionality reduction: this term seems to stem from the limitations of our recursive number-theoretic construction, not from the nature of the problem itself. We conjecture that there should be an $\omega(1)$-dimensional reduction with a more direct construction.