Optimal Unateness Testers for Real-Valued Functions: Adaptivity Helps

We study the problem of testing unateness of functions $f:\{0,1\}^d \to \mathbb{R}$. We give an $O(\frac{d}{\epsilon} \cdot \log\frac{d}{\epsilon})$-query nonadaptive tester and an $O(\frac{d}{\epsilon})$-query adaptive tester and show that both testers are optimal for a fixed distance parameter $\epsilon$. Previously known unateness testers worked only for Boolean functions, and their query complexity had worse dependence on the dimension, both for the adaptive and for the nonadaptive case. Moreover, no lower bounds for testing unateness were known. We also generalize our results to obtain optimal unateness testers for functions $f:[n]^d \to \mathbb{R}$. Our results establish that adaptivity helps with testing unateness of real-valued functions on domains of the form $\{0,1\}^d$ and, more generally, $[n]^d$. This stands in contrast to the situation for monotonicity testing, where there is no adaptivity gap for functions $f:[n]^d \to \mathbb{R}$.


Introduction
We study the problem of testing whether a given real-valued function f on domain [n]^d, where n, d ∈ N, is unate. A function f : [n]^d → R is unate if for every coordinate i ∈ [d], the function is either nonincreasing in the i-th coordinate or nondecreasing in the i-th coordinate. Unate functions naturally generalize monotone functions, which are nondecreasing in all coordinates, and b-monotone functions, which have a particular direction in each coordinate (either nonincreasing or nondecreasing), specified by a bit vector b ∈ {0,1}^d. More precisely, a function is b-monotone if it is nondecreasing in coordinates i with b_i = 0 and nonincreasing in the other coordinates. Observe that a function f is unate iff there exists some b ∈ {0,1}^d for which f is b-monotone.
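To make the definition concrete on the hypercube: f is unate iff no coordinate has both a strictly increasing and a strictly decreasing edge, since then a direction b_i can be chosen for every coordinate. The following brute-force Python sketch (a hypothetical helper, not from the paper; f is a callable on {0,1}^d tuples) checks this characterization:

```python
from itertools import product

def is_unate(f, d):
    """f is unate iff no coordinate i has both a strictly increasing and a
    strictly decreasing i-edge; then a direction b_i exists for every i."""
    for i in range(d):
        increasing = decreasing = False
        for x in product((0, 1), repeat=d):
            if x[i] == 1:
                continue  # enumerate each i-edge once, from its lower endpoint
            y = x[:i] + (1,) + x[i + 1:]
            increasing |= f(y) > f(x)
            decreasing |= f(y) < f(x)
        if increasing and decreasing:
            return False  # coordinate i witnesses a violation of unateness
    return True
```

For example, the majority-style function sum(x) is unate (monotone), x[1] - x[0] is unate with b = (1, 0, 0), while XOR of two coordinates is not unate.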
A tester [59,39] for a property P of a function f is an algorithm that gets a distance parameter ε ∈ (0, 1) and query access to f . The (relative) distance from a function f to a property P is the smallest fraction of values of f that must be modified to make f satisfy P. A function f is ε-far from P if the distance from f to P is at least ε. A tester for P has to accept with probability at least 2/3 if f has property P and reject with probability at least 2/3 if f is ε-far from P. A tester has one-sided error if it always accepts a function satisfying P; it has two-sided error otherwise. A nonadaptive tester makes all its queries at once, whereas an adaptive tester can make queries after seeing answers to previous queries.
In this paper, we improve upon both these works, and our results hold for a more general class of functions. Specifically, we show that unateness of real-valued functions on hypercubes can be tested nonadaptively with O((d/ε) log(d/ε)) queries and adaptively with O(d/ε) queries. More generally, we describe an O((d/ε) · (log(d/ε) + log n))-query nonadaptive tester and an O((d log n)/ε)-query adaptive tester for unateness of real-valued functions over hypergrids.
In contrast to the state of knowledge for unateness testing, the complexity of testing monotonicity of real-valued functions over the hypercube and the hypergrid has been resolved. For constant distance parameter ε, it is known to be Θ(d log n). Moreover, this bound holds for all bounded-derivative properties [21], a large class that includes b-monotonicity and some properties quite different from monotonicity, such as the Lipschitz property. Amazingly, the upper bound for all these properties is achieved by the same simple and, in particular, nonadaptive tester. Even though proving lower bounds for adaptive testers has been challenging in general, a line of work, starting from Fischer [35] and including [16,23,21], has established that adaptivity does not help for this large class of properties. Since unateness is so closely related, it is natural to ask whether the same is true for testing unateness.
We answer this in the negative: we prove that any nonadaptive unateness tester of real-valued functions over the hypercube (for some constant distance parameter) must make Ω(d log d) queries. More generally, it needs Ω(d(log d + log n)) queries for the hypergrid domain. These lower bounds complement our algorithms, completing the picture for testing unateness of real-valued functions. From a property testing standpoint, our results establish that unateness is different from monotonicity and, more generally, any derivative-bounded property.

Formal statements and technical overview
Our testers are summarized in the following theorem, stated for functions over the hypergrid domains. (Recall that the hypercube is a special case of the hypergrid with n = 2.)

Theorem 1.1. Consider functions f : [n]^d → R and a distance parameter ε ∈ (0, 1/2).
1. There is a nonadaptive unateness tester that makes O((d/ε) · (log(d/ε) + log n)) queries.
2. There is an adaptive unateness tester that makes O((d log n)/ε) queries.
Both testers have one-sided error.
Our main technical contribution is the proof that the extra Ω(log d) factor is needed for nonadaptive testers. This result demonstrates a gap between adaptive and nonadaptive unateness testing.

Theorem 1.2. Any nonadaptive unateness tester (even with two-sided error) for real-valued functions f : {0,1}^d → R with distance parameter ε = 1/8 must make Ω(d log d) queries.
The lower bound for adaptive testers (Theorem 1.3) is an easy adaptation of the monotonicity lower bound in [23]. We state this theorem for completeness and prove it in Section 4.1. Theorems 1.2 and 1.3 directly imply that our nonadaptive tester is optimal for constant ε, even for the hypergrid domain. The following theorem is proved in Section 4.2.
Theorem 1.4. Any nonadaptive unateness tester (even with two-sided error) for real-valued functions f : [n]^d → R must make Ω(d(log n + log d)) queries.

Our nonadaptive unateness tester on the hypercube uses the work investment strategy from [11] (see also Section 8.2.4 of Goldreich's book [37]) to "guess" a good dimension in which to look for violations of unateness (specifically, both increasing and decreasing edges). For all i ∈ [d], let α_i be the fraction of the i-edges that are decreasing, β_i be the fraction of the i-edges that are increasing, and µ_i = min(α_i, β_i). The dimension reduction theorem from [21] implies that if the input function is ε-far from unate, then the average of µ_i over all dimensions is at least ε/(4d). If the tester knew which dimension had µ_i = Ω(ε/d), it could detect a violation with high probability by querying the endpoints of O(1/µ_i) = O(d/ε) uniformly random i-edges. Performing this test for every dimension would result in query complexity Θ(d²/ε). The work investment strategy allows us to achieve query complexity O((d/ε) log(d/ε)) by repeatedly choosing a uniformly random dimension and investing a specific number of queries in trying to find a violation of unateness in that dimension. It proceeds in Θ(log(d/ε)) stages, doubling, in every stage, the number of queries invested in each dimension. Intuitively, when all violations are in one dimension, a tester has to try Θ(d) dimensions before it finds the bad one, but needs only Θ(1/ε) queries in the bad dimension. In contrast, when all dimensions have µ_i = ε/(4d), only a constant number of attempts is needed to find a dimension with violations, but Θ(d/ε) queries in one dimension are required to detect a violation. (In this case, sampling Θ((d/ε) log(d/ε)) uniformly random edges would not be enough to detect a violation.) The work investment strategy allows us to interpolate between these scenarios.
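As an illustration of this bookkeeping (a sketch, not code from the paper), the following assumes stage r performs s_r = ⌈4 ln(1/δ)/(µ · 2^r)⌉ repetitions of 3 · 2^r edge samples with µ = ε/(4d). The per-stage cost is then O(d/ε), and with Θ(log(d/ε)) stages the total is O((d/ε) log(d/ε)):

```python
import math

def schedule(d, eps, delta=0.25):
    """Tally the stage budgets of the work investment strategy: stage r makes
    s_r repetitions, each querying both endpoints of 3 * 2^r random edges."""
    mu = eps / (4 * d)                     # lower bound on the average of mu_i
    stages = math.ceil(math.log2(4 / mu))  # number of doubling stages
    total = 0
    for r in range(1, stages + 1):
        s_r = math.ceil(4 * math.log(1 / delta) / (mu * 2 ** r))
        total += s_r * (3 * 2 ** r) * 2    # repetitions x edges x endpoints
    return stages, total
```

Note that s_r halves as the per-repetition budget 3 · 2^r doubles, so every stage costs roughly the same Θ(d/ε) queries.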

Overview of techniques
With adaptivity, this search through Θ(log(d/ε)) different scenarios is not required. A pair of queries in each dimension detects dimensions with many non-constant edges, and the algorithm focuses on finding violations in those dimensions. This leads to the query complexity of O(d/ε), removing the log(d/ε) factor.
It is relatively easy to extend both the adaptive and the nonadaptive testers from hypercubes to hypergrids by incurring an extra factor of log n in the query complexity. The role of i-edges is now played by i-lines. An i-line is a set of n domain points that differ only in coordinate i. The domain [n] is called a line. Monotonicity on the line (a.k.a. sortedness) was one of the first properties studied in the context of property testing [34]. What we need is a nonadaptive tester for sortedness that has 1-sided error and is, in addition, proximity oblivious: that is, it rejects a function f : [n] → R with probability proportional to the distance from f to monotonicity. Proximity-oblivious testers (POTs) were defined by Goldreich and Ron [40]. There are several POTs for sortedness that make O(log n) queries: the tree tester [34], the spanners-based tester [12,54], and the power-of-2 tester [22]. Such testers can be easily modified to work for unateness on the line and to output not just an accept/reject decision, but also whether they found a pair of points on which the function is strictly increasing and, similarly, whether they found a pair of points on which the function is strictly decreasing (see Section 2.3 for details). To generalize our unateness testers from the hypercube to the hypergrid domains, instead of sampling a random i-edge, we sample a random i-line ℓ and run a modified POT for unateness on the restriction f|_ℓ of the function f to the line ℓ. This "direct generalization" is optimal for adaptive testers, but, interestingly, not for nonadaptive testers.

THEORY OF COMPUTING, Volume 16 (3), 2020, pp. 1-36
Intuitively, for nonadaptive testers, the "direct generalization" works with only O((d log n)/ε) queries when there are enough lines that are, cumulatively, sufficiently far from unate. However, it could happen that a function on the hypergrid is far from unate, but does not have any lines that are far from unate: the distance from unateness arises because some lines are far from monotone and other lines in the same dimension are far from antitone (that is, far from nonincreasing). We prove that each function f on the line that is ε-far from monotone, but is not ε/2-far from unate, is strictly decreasing on an ε/4 fraction of pairs in [n]. Symmetrically, if a function is far from antitone but close to unate, it is strictly increasing on a large fraction of pairs. The dimension reduction allows us to use this statement to show that if the "direct generalization" does not work well, then, intuitively, the average dimension has many pairs on which f is increasing and many pairs on which f is decreasing. We again use the work investment strategy to get a tester for this case that has the same complexity as our nonadaptive tester for the hypercube. The resulting nonadaptive complexity (for constant ε) is O(d(log d + log n)), which we show is optimal.
The nonadaptive lower bound. Our most significant finding is the log d gap in the query complexity between adaptive and nonadaptive testing of unateness. By techniques from previous work [35,23], it suffices to prove lower bounds for comparison-based testers, i. e., testers that can only perform comparisons of the function values at queried points, but cannot use the values themselves. Our main technical contribution is the Ω(d log d) lower bound for nonadaptive comparison-based testers of unateness on hypercube domains.
Note that nonadaptivity must be critical in our lower bound construction, since we obtained an O(d)-query adaptive tester for unateness. Another challenge in proving the lower bound is the existence of a single, universal nonadaptive O(d)-tester for all b-monotonicity properties, proven in [21]. In other words, there is a single distribution on O(d) queries that defines a nonadaptive property tester for b-monotonicity, regardless of b. Since unateness is the union of all b-monotonicity properties, our construction of hard inputs must be able to fool such algorithms. In particular, if b is fixed in advance, the algorithm from [21] will work, so it should be hard to learn b for functions in the lower bound construction. Once a tester finds a non-constant edge in each dimension, the problem reduces to testing b-monotonicity for a vector b determined by the directions (increasing or decreasing) of the non-constant edges. That is, intuitively, most edges in our construction must be constant. This is one of the main technical challenges. The previous lower bound constructions for monotonicity testing [16,23] crucially used the fact that all edges in the hard functions were non-constant.
We briefly describe how we overcome the problems mentioned above. By Yao's minimax principle, it suffices to construct two distributions, D+ and D−, on unate and on far-from-unate functions, respectively, that a deterministic nonadaptive tester cannot distinguish. First, for some parameter m, we partition the hypercube into m subcubes based on the log_2 m most significant coordinates. Both distributions, D+ and D−, sample a uniform k from [K], where K = Θ(log d), and a set R ⊆ [d] of cardinality 2^k. Furthermore, each subcube j ∈ [m] selects an "action dimension" r_j ∈ R uniformly at random. For both distributions, in any particular subcube j, the function value is completely determined by the coordinates not in R and the random coordinate r_j ∈ R. Note that all the i-edges for i ∈ (R \ {r_j}) are constant. Within the subcube, the function is a linear function with exponentially increasing coefficients. In the distribution D+, any two subcubes j, j′ with the same action dimension orient the edges in that dimension the same way (both increasing or both decreasing), whereas in the distribution D−, each subcube decides on the orientation independently. The former correlation maintains unateness, while the latter independence creates distance to unateness. We prove that to distinguish the distributions, any comparison-based nonadaptive tester must find two distinct subcubes with the same action dimension and, furthermore, make a specific query in each that reveals the coefficient of r_j. We show that, with o(d log d) queries, the probability of this event is negligible.
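The correlated-versus-independent orientation idea can be seen in a toy simulation. The sketch below is a drastic simplification of the actual construction (only two subcubes split on coordinate 0, a fixed small set R, and no attempt to control the distance to unateness); it only illustrates that shared orientations keep the sampled function unate, while independent orientations sometimes break unateness:

```python
from itertools import product
import random

def is_unate_table(f, d):
    # Brute-force unateness check for f given as a dict on {0,1}^d tuples.
    for i in range(d):
        inc = dec = False
        for x in product((0, 1), repeat=d):
            if x[i] == 1:
                continue
            y = x[:i] + (1,) + x[i + 1:]
            inc |= f[y] > f[x]
            dec |= f[y] < f[x]
        if inc and dec:
            return False
    return True

def sample(correlated, rng, d=5, R=(1, 2)):
    """Coordinate 0 splits {0,1}^d into m = 2 subcubes. Each subcube picks an
    action dimension r_j in R; edges in the other dimensions of R stay constant."""
    shared = {r: rng.choice((-1, 1)) for r in R}  # orientations shared across subcubes
    f = {}
    for j in (0, 1):
        r_j = rng.choice(R)
        sign = shared[r_j] if correlated else rng.choice((-1, 1))
        for x in product((0, 1), repeat=d):
            if x[0] != j:
                continue
            # linear with exponentially growing coefficients; the value depends
            # only on coordinates outside R and on the action coordinate r_j
            f[x] = (j * 2 ** d
                    + sum(2 ** i * x[i] for i in range(1, d) if i not in R)
                    + sign * 2 ** r_j * x[r_j])
    return f
```

With correlated = True the function is always unate; with correlated = False the two subcubes pick the same action dimension with opposite orientations a constant fraction of the time, breaking unateness.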

Upper bounds
In this section, we prove parts 1-2 of Theorem 1.1, starting from the hypercube domain.
Recall the definition of i-edges and i-lines from Section 1.1.1 and what it means for an edge to be increasing, decreasing, or constant.
The starting point for our algorithms is the dimension reduction theorem from [21]. It bounds the distance of f : [n] d → R to monotonicity in terms of the average distances of restrictions of f to one-dimensional functions.
For the special case of the hypercube domains, i-lines become i-edges, and the average distance µ_i to b_i-monotonicity is the fraction of i-edges on which the function is not b_i-monotone.

The nonadaptive tester over the hypercube
We now describe Algorithm 1, the nonadaptive tester for unateness over the hypercube domains.

Algorithm 1: The nonadaptive unateness tester for hypercubes
1 for r = 1 to ⌈log(16d/ε)⌉ do
2   repeat s_r = ⌈(16d ln 4)/(ε · 2^r)⌉ times:
3     Sample a dimension i ∈ [d] uniformly at random.
4     Sample 3 · 2^r i-edges uniformly and independently at random; reject if there is an increasing edge and a decreasing edge among them.
5 accept

It is evident that Algorithm 1 is a nonadaptive, one-sided error tester. Furthermore, its query complexity is O((d/ε) log(d/ε)). It suffices to prove the following.

Lemma 2.2. If the function f : {0,1}^d → R is ε-far from unate, then Algorithm 1 rejects with probability at least 2/3.

Proof.
Recall that α_i is the fraction of i-edges that are decreasing, β_i is the fraction of i-edges that are increasing, and µ_i = min(α_i, β_i).
Define the d-dimensional bit vector b as follows: for each i ∈ [d], let b_i = 0 if α_i ≤ β_i, and b_i = 1 otherwise. Then the average distance of f to b_i-monotonicity over a random i-edge is precisely µ_i. Since f is ε-far from being unate, f is also ε-far from being b-monotone. By Theorem 2.1, ∑_{i=1}^d µ_i ≥ ε/4. We now apply the work investment strategy due to Berman et al. [11] to get an upper bound on the probability that Algorithm 1 fails to reject.
Theorem 2.3 (Work investment strategy [11]). Let X be a random variable taking values in [0, 1] with E[X] ≥ µ, and let δ ∈ (0, 1) be the desired error probability. For r ∈ [⌈log(4/µ)⌉], let p_r = Pr[X ≥ 2^{−r}] and s_r = (4 ln(1/δ))/(µ · 2^r). Then ∏_{r=1}^{⌈log(4/µ)⌉} (1 − p_r)^{s_r} ≤ δ.

Consider running Algorithm 1 on a function f that is ε-far from unate. Let X = µ_i, where i is sampled uniformly at random from [d]. Then E[X] ≥ ε/(4d). Applying the work investment strategy (Theorem 2.3) on X with µ = ε/(4d) and δ = 1/4, we get that, with probability at least 1 − δ = 3/4, in some iteration, Step 3 samples a dimension i such that µ_i ≥ 2^{−r}, where r is the stage of that iteration. Conditioned on sampling such a dimension, the probability that Step 4 fails to obtain an increasing edge and a decreasing edge among its 3 · 2^r samples is at most 2(1 − 2^{−r})^{3·2^r} ≤ 2e^{−3} < 1/9, as the fractions of both increasing and decreasing edges in the dimension are at least 2^{−r}. Hence, the probability that Algorithm 1 rejects f is at least (3/4) · (8/9) = 2/3, which completes the proof of Lemma 2.2.

The adaptive tester over the hypercube
We now describe Algorithm 2, an adaptive tester for unateness over the hypercube domains with good expected query complexity. The final tester is obtained by repeating this tester and accepting if the number of queries exceeds a specified bound.

Algorithm 2: The adaptive unateness tester for hypercubes
1 repeat 8/ε times:
2   for each dimension i ∈ [d]:
3     Sample an i-edge e_i uniformly at random.
4     if e_i is non-constant (i. e., increasing or decreasing) then
5       Sample i-edges uniformly at random until we obtain a non-constant edge e′_i.
6       reject if one of the edges e_i and e′_i is increasing and the other is decreasing.
7 accept

Claim 2.4. The expected number of queries made by Algorithm 2 is O(d/ε).

Proof. Consider one iteration of the repeat-loop in Step 1. We prove that the expected number of queries in this iteration is 4d. The total number of queries in Step 3 is 2d, as 2 points per dimension are queried. Let E_i be the event that edge e_i is non-constant and T_i be the random variable for the number of i-edges sampled in Step 5. If p_i denotes the fraction of non-constant i-edges, then Pr[E_i] = p_i and E[T_i | E_i] = 1/p_i. Therefore, the expected number of all edges sampled in Step 5 is ∑_{i=1}^d Pr[E_i] · E[T_i | E_i] = d.
Hence, the expected number of queries in Step 5 is 2d, and the total expected number of queries in one iteration is 4d. Since there are 8/ε iterations in Step 1, the expected number of queries in Algorithm 2 is O(d/ε).

Claim 2.5. If the function f is ε-far from unate, then Algorithm 2 rejects with probability at least 5/6.

Proof. First, we bound the probability that a violation of unateness is detected in some dimension in one iteration of the repeat-loop in Step 1. Consider the probability of finding a decreasing i-edge in Step 3 and of finding an increasing i-edge in Step 5. The former is exactly α_i, and the latter is β_i/(α_i + β_i). Similarly, the probability of finding an increasing i-edge in Step 3 and of finding a decreasing i-edge in Step 5 is β_i and α_i/(α_i + β_i), respectively. Therefore, the probability that we detect a violation from dimension i is α_i · β_i/(α_i + β_i) + β_i · α_i/(α_i + β_i) = 2α_iβ_i/(α_i + β_i) ≥ min(α_i, β_i) = µ_i. The probability that we fail to detect a violation in any of the d dimensions in one iteration is at most ∏_{i=1}^d (1 − µ_i) ≤ exp(−∑_{i=1}^d µ_i) ≤ exp(−ε/4), where the last inequality follows from Theorem 2.1 (Dimension Reduction). Thus, the probability that Algorithm 2 fails to reject in all iterations of Step 1 is at most (exp(−ε/4))^{8/ε} = e^{−2} < 1/6.
Proof of Theorem 1.1, Part 2 (for the special case of the hypercube domain). We run Algorithm 2, aborting and accepting if the number of queries exceeds the expectation by a factor of 6. By Markov's inequality, the probability of aborting is at most 1/6. By Claim 2.5, if f is ε-far from unate, Algorithm 2 accepts with probability at most 1/6. The theorem follows by a union bound.

Extension to hypergrids
We start by establishing terminology for lines and pairs. Consider a function f : [n]^d → R. Recall the definition of i-lines from Section 1.1.1. A pair of points that differ only in coordinate i is called an i-pair.
Recall that a proximity-oblivious tester (POT) for a property P rejects every input with probability proportional to the distance from the input to P. As discussed in Section 1.1.1, there are several 1-sided error, nonadaptive POTs for monotonicity on the line that we can use to extend Algorithms 1 and 2 to work on hypergrids. One of them is the tree tester, designed by Ergun et al. [34], that simply picks a point x ∈ [n] uniformly at random, queries all points visited in a binary search for x, and rejects iff x forms a decreasing pair with one of the queried points. By symmetry, when this algorithm is modified to reject iff it finds an increasing pair, it tests antitonicity. The tree tester (and any other 1-sided error POT for sortedness) can be easily modified to return whether it found any increasing/decreasing edges by including ↑ / ↓ in its output.
Lemma 2.6. There is a nonadaptive, O(log n)-query algorithm A_dir that, given query access to a function h : [n] → R, outputs a set dir ⊆ {↑, ↓} with the following guarantees.
• If h is monotone, then ↓ is not in dir. The probability that dir contains ↓ is at least the distance from h to monotonicity.
• Similarly, if h is antitone, then ↑ is not in dir. The probability that dir contains ↑ is at least the distance from h to antitonicity.
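A sketch of the tree tester, modified as described above to report the directions it observes (with 'up'/'down' standing for ↑/↓, and the domain 0-indexed for convenience); the rejection-probability guarantee is the tree tester's, which we only assume here:

```python
import random

def tree_tester_dir(h, n, rng):
    """One invocation: pick x uniformly, binary-search toward x, and compare
    h(x) with h(mid) at every node on the search path. Returns the set of
    strict directions observed among those pairs."""
    x = rng.randrange(n)
    dirs = set()
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        a, b = (mid, x) if mid < x else (x, mid)  # ordered pair a <= b
        if h(a) > h(b):
            dirs.add('down')          # strictly decreasing pair found
        if a != b and h(a) < h(b):
            dirs.add('up')            # strictly increasing pair found
        if mid == x:
            break
        if x < mid:
            hi = mid - 1
        else:
            lo = mid + 1
    return dirs
```

If h is monotone, no compared pair can be decreasing, so 'down' never appears; symmetrically, 'up' never appears for an antitone h.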
We now describe Algorithm 3, an adaptive tester for unateness over the hypergrid domains with good expected query complexity. As in the case of the hypercube domains, the final tester is obtained by repeating this tester and accepting if the number of queries exceeds a specified bound. Here α_i and β_i denote the average distances of f restricted to a uniformly random i-line to monotonicity and to antitonicity, respectively, and µ_i = min(α_i, β_i).

Algorithm 3: The adaptive unateness tester for hypergrids
1 repeat 8/ε times:
2   for each dimension i ∈ [d]:
3     Sample an i-line ℓ_i uniformly at random.
4     Let dir_i be the output of A_dir (from Lemma 2.6) on f|_{ℓ_i}.
5     if dir_i ≠ ∅ then
6       repeat
7         Sample an i-line ℓ′_i uniformly at random and let dir′_i be the output of A_dir on f|_{ℓ′_i}.
8       until dir′_i ≠ ∅
9       reject if ↑ ∈ dir_i ∪ dir′_i and ↓ ∈ dir_i ∪ dir′_i
10 accept

By Theorem 2.1, if f is ε-far from unate, and thus ε-far from b-monotone, then ∑_{i=1}^d µ_i ≥ ε/4. By Lemma 2.6, the probability that the output of A_dir on a uniformly random i-line contains ↓ is at least α_i, and the probability that it contains ↑ is at least β_i. The rest of the analysis of Algorithm 3 is similar to that in the hypercube case.
In the proof of Claim 2.4, the expected number of edges sampled in one iteration of Algorithm 2 is 2d. Similarly, the expected number of lines sampled in one iteration of Algorithm 3 is 2d. Since A dir makes O(log n) queries, the overall expected number of queries is O((d log n)/ε).
The proof of Claim 2.5 carries over almost word for word. Fix a dimension i. The probability that ↓ ∈ dir_i in Step 4 is at least α_i. The probability that ↑ ∈ dir_i in Step 7 is at least β_i/(α_i + β_i). The rest of the calculation is identical to that of the proof of Claim 2.5.
Finally, we run Algorithm 3 and abort and accept if the number of queries exceeds the expectation by a factor of 6. As in the proof of Theorem 1.1, Part 2, the resulting algorithm always accepts unate functions and accepts functions that are ε-far from unate with probability at most 1/3, completing the proof.
Next we show how to modify any POT for sortedness in a black-box manner to obtain a POT for unateness on the line with the same guarantees.

Algorithm 4: A proximity-oblivious tester for unateness on the line
1 Query h(1) and h(n).
2 Sample a point x ∈ [n] uniformly at random and query h(x).
3 if h(1) = h(n) then reject if one of the pairs {1, x} and {x, n} is increasing and the other is decreasing.
4 if h(1) < h(n) then run A_dir on h and reject if its output contains ↓.
5 if h(1) > h(n) then run A_dir on h and reject if its output contains ↑.
6 accept

Lemma 2.7. Algorithm 4 is a nonadaptive, 1-sided error POT for unateness on the line: it makes O(log n) queries, always accepts a unate function, and rejects every function h : [n] → R that is ε-far from unate with probability at least ε.

Proof. First, consider the case when h(1) = h(n). Since h is ε-far from unate, it is also ε-far from being the constant function equal to h(1) on all points in [n]. Therefore, with probability at least ε, the point x chosen in Step 2 of the algorithm satisfies h(x) ≠ h(1). If this holds, then one of the pairs {1, x} and {x, n} is increasing and the other is decreasing. So, Algorithm 4 indeed rejects with probability at least ε. Next, consider the case when h(1) < h(n). Since monotone functions are unate, h is ε-far from monotone, so, by Lemma 2.6, the output of A_dir contains ↓ with probability at least ε. So, again, Algorithm 4 rejects with probability at least ε.

Finally, the case when h(1) > h(n) is symmetric to the case when h(1) < h(n).
Our nonadaptive hypergrid tester is stated in Algorithm 5. A crucial part of its analysis is Lemma 2.8, which demonstrates that every function on the line that is far from monotone is either far from unate or strictly decreasing on a large fraction of pairs.

Algorithm 5: The nonadaptive unateness tester for hypergrids
1 repeat (24 ln 3)/ε times:
2   Sample a dimension i ∈ [d] uniformly at random.
3   Sample an i-line ℓ uniformly at random.
4   Reject if Algorithm 4 rejects on input f|_ℓ.
5 for each stage r of the work investment strategy (Theorem 2.3) with µ = ε/(24d) do
6   repeat s_r times:
7     Sample a dimension i ∈ [d] uniformly at random.
8     Sample 3 · 2^r i-pairs uniformly and independently at random.
9     If we find an increasing and a decreasing pair among the sampled pairs, reject.
10 accept

Lemma 2.8. Let h : [n] → R be a function that is ε-far from monotone. Then at least one of the following holds:
1. The function h is ε/2-far from unate.
2. The function h is strictly decreasing on at least an ε/4 fraction of pairs of points in [n].
Proof. Suppose Item 1 does not hold, that is, there exists a set G ⊆ [n] of size greater than n(1 − ε/2) on which h|_G is unate and therefore antitone. (Unate means monotone or antitone, but h|_G cannot be monotone because h is ε-far from monotone.) Observe that if u, v ∈ G and h(u) ≠ h(v), then {u, v} is a decreasing pair because h|_G is antitone. For each point u ∈ G, let D_u ⊆ G be the set of points of G on which h differs from h(u). Since h is ε-far from monotone and, consequently, ε-far from constant, |D_u| ≥ εn/2 for all u ∈ G: at least εn points of [n] differ from h(u), and fewer than εn/2 of them lie outside G. Thus, the fraction of decreasing pairs is at least ((1/2) ∑_{u∈G} |D_u|) / (n²/2) ≥ (1 − ε/2) · (ε/2) ≥ ε/4. The last inequality follows because ε ≤ 1. We proved that if Item 1 does not hold then Item 2 must hold.
Proof. Suppose f : [n]^d → R is ε-far from unate. We will show that when Algorithm 5 runs on f, one of the two loops (in Steps 1 and 5) rejects with probability at least 2/3. For every line ℓ, we define the following quantities.
• α_ℓ: the distance of f|_ℓ to monotonicity.
• β_ℓ: the distance of f|_ℓ to antitonicity.
• α′_ℓ: the probability that a uniformly random pair in ℓ is decreasing.
• β′_ℓ: the probability that a uniformly random pair in ℓ is increasing.
By Lemma 2.8 and symmetry, for every line ℓ, either f|_ℓ is (α_ℓ/2)-far from unate or α′_ℓ ≥ α_ℓ/4, and, similarly, either f|_ℓ is (β_ℓ/2)-far from unate or β′_ℓ ≥ β_ℓ/4. Let L_i be the set of i-lines, and let u_i be the average distance of f|_ℓ to unateness over all lines ℓ ∈ L_i; that is, u_i is the average distance to unateness in dimension i. (In general, u_i is different from the quantity µ_i used in the analysis of the adaptive tester. Recall that µ_i was used to denote the minimum of the average distance to monotonicity and the average distance to antitonicity in dimension i. We are using different notation to avoid confusion.) Let α′_i be the average of α′_ℓ over all lines ℓ ∈ L_i. Define β′_i analogously. Let u be the average of u_i over all dimensions i ∈ [d], and let X = min(α′_i, β′_i) for a dimension i sampled uniformly at random from [d]. Combining Theorem 2.1 with the two consequences of Lemma 2.8 stated above, we conclude that at least one of u and E[X] has to be large; specifically, u ≥ ε/24 or E[X] ≥ ε/(24d).

Case 1: u ≥ ε/24. Consider the first loop of Algorithm 5 (in Step 1). By Lemma 2.7, the probability that a uniformly random i-line is rejected by Algorithm 4 is at least u_i. The probability that one iteration of the loop in Step 1 fails to reject is therefore at most 1 − u ≤ exp(−u) ≤ exp(−ε/24). Thus, all iterations of the loop fail to reject with probability at most (exp(−ε/24))^{(24 ln 3)/ε} = 1/3.
Case 2: E[X] ≥ ε/(24d). Applying the work investment strategy (Theorem 2.3) on X with µ = ε/(24d), and using a calculation analogous to that in the proof of Lemma 2.2, we get that the probability that Step 9 rejects in some iteration is at least 2/3.

The lower bound for nonadaptive testers over the hypercube
In this section, we prove Theorem 1.2, which gives a lower bound for nonadaptive unateness testers for functions over the hypercube.
Fischer [35] showed that in order to prove lower bounds for a general class of properties on the line domain, it is sufficient to consider a special class of testers called comparison-based testers. The properties he looked at are called order-based properties (see Definition 3.2), and they include monotonicity and unateness. A tester is comparison-based if it bases its decisions only on the order of the function values at the points it queried, and not on the values themselves. Chakrabarty and Seshadhri [23] extended Fischer's proof to monotonicity on any partially-ordered domain for the case when all function values are distinct. As we show in Section 3.1 below, Chakrabarty and Seshadhri's proof goes through for all order-based properties on partially-ordered domains. Moreover, the assumption of distinctness for function values can be removed. We include this proof for completeness, filling in the details needed to generalize the original proof.
Our main technical contribution is the construction of a distribution of functions f : {0,1}^d → N on which every nonadaptive comparison-based tester must query Ω(d log d) points to determine whether the sampled function is unate or far from unate. We describe this construction in Section 3.2 and show its correctness in Sections 3.3-3.4.

Reduction to comparison-based testers
In this section, we prove that if there exists a tester for an order-based property P of functions over a partially-ordered domain, then there exists a comparison-based tester for P with the same query complexity. This is stated in Theorem 3.3. Before stating the theorem, we introduce some definitions.
Definition 3.1. A (t, ε, δ)-tester for a property is a 2-sided error tester with distance parameter ε that makes at most t queries and errs with probability at most δ.

Definition 3.2 (Order-based property). For an arbitrary partial order D and an arbitrary total order R, a property P of functions f : D → R is order-based if, for all strictly increasing maps φ : R → R and all functions f, we have dist(f, P) = dist(φ ∘ f, P).
In particular, unateness is an order-based property. The following theorem is an extension of Theorem 5 in [35] and Theorem 2.1 in [23]. Specifically, Theorem 2.1 in [23] was proved with the assumption that the function values are distinct. We generalize the theorem by removing this assumption.

Theorem 3.3 (Generalization of [35,23]). Let P be an order-based property of functions f : D → N over a finite domain D. Suppose there exists a (t, ε, δ)-tester for P. Then there exists a comparison-based (t, ε, 2δ)-tester for P.
The rest of this section is devoted to proving Theorem 3.3. Our proof closely follows the proof of Theorem 2.1 in [23]. The proof has two parts. The first part describes a reduction from a tester to a discretized tester, and the second part describes a reduction from a discretized tester to a comparison-based tester.
Let P be a property of functions f : D → R for an arbitrary partial order D and an arbitrary total order R ⊆ N. Let T be a (t, ε, δ)-tester for P. First, we define a family of probability functions that completely characterizes T. Fix some s ∈ [t]. Consider the point in time in an execution of the tester T on some input function f where exactly s queries have been made. Suppose these queries are x_1, x_2, . . . , x_s ∈ D and the corresponding answers are a_1 = f(x_1), a_2 = f(x_2), . . . , a_s = f(x_s). Let the query vector X be (x_1, . . . , x_s) and the answer vector A be (a_1, . . . , a_s). The next action of the algorithm is either choosing the (s+1)-st query from D or outputting accept or reject. For each action y ∈ D ∪ {accept, reject}, let p^y_X(A) denote the probability that T chooses action y after making queries X and receiving answers A. Since the next action is distributed according to these probabilities, ∀X ∈ D^s, ∀A ∈ R^s: ∑_{y ∈ D ∪ {accept, reject}} p^y_X(A) = 1. Furthermore, the tester cannot make more than t queries, and so the (t+1)-st action must be either accept or reject. Formally, ∀X ∈ D^t, ∀A ∈ R^t: ∑_{y ∈ {accept, reject}} p^y_X(A) = 1.
If a tester decides to accept or reject before making t queries, i. e., p^y_X(A) = 1 for some X = (x_1, . . . , x_s), A = (a_1, . . . , a_s), where s < t and y ∈ {accept, reject}, then we fill in the values of p so that the action of the tester is defined to be the same y for all values until t + 1. Specifically, we set p^y_{X′}(A′) = 1 for all X′ ∈ D^{s′}, A′ ∈ R^{s′}, where s < s′ ≤ t and the first s queries (in X′) and their corresponding answers (in A′) are x_1, . . . , x_s and a_1, . . . , a_s, respectively. Chakrabarty and Seshadhri [23] proved that if there exists a (t, ε, δ)-monotonicity tester T for functions f : D → N, then there exists a discretized (t, ε, 2δ)-monotonicity tester T′ for the same class of functions. Both the statement and the proof in [23] hold not only for testers of monotonicity, but for testers of all properties of functions f : D → R. This completes the first part of the proof.

Next, we show how to transform a discretized tester into a comparison-based tester. Intuitively, a tester is comparison-based if each action of the tester depends only on the ordering of the answers to the previous queries, not on the values themselves. We define a family of probability functions q in order to characterize comparison-based testers. The q-functions are defined in terms of the p-functions, but, in their definition, we decouple the set of values that were received as answers from their positions in the answer vector. Specifically, the set of values that were received as answers (i. e., information irrelevant for a comparison-based tester) is given as the argument to the q-functions. All the remaining information is given as subscripts and superscripts. Let V represent the set {a_1, . . . , a_s} of answer values (without duplicates). Let r be the number of (distinct) values in V. Note that r ≤ s. Suppose V = {v_1, . . . , v_r}, where v_1, . . . , v_r ∈ R and v_1 < v_2 < · · · < v_r.
Let ρ be the map from positions of values in the answer vector to their corresponding indices in V, that is, ρ : [s] → [r] with v_{ρ(j)} = a_j for all j ∈ [s]. Observe that ρ is surjective. The q-functions are defined as follows: q^y_{X,ρ}(V) = p^y_X((v_{ρ(1)}, v_{ρ(2)}, . . . , v_{ρ(s)})).
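The decomposition of an answer vector into the value set V and the position map ρ can be illustrated as follows (with 0-based indices instead of the 1-based ρ : [s] → [r] above):

```python
def answer_pattern(answers):
    """Split an answer vector into the sorted set V of distinct values and the
    position map rho. A comparison-based tester's behavior may depend on rho
    (the order pattern) but not on V (the actual values). Indices are 0-based,
    unlike the 1-based rho : [s] -> [r] in the text."""
    V = sorted(set(answers))
    index = {v: k for k, v in enumerate(V)}
    rho = [index[a] for a in answers]
    return V, rho
```

For instance, the answer vectors (5, 2, 5, 9) and (7, 1, 7, 8) have different value sets but the same ρ, so a comparison-based tester must treat them identically.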
For any set S, let S^{(r)} denote the set of all subsets of S of size r. A discretized tester is comparison-based for functions f : D → R if, for all y, X, and ρ, the function q^y_{X,ρ} is constant on R^{(r)}. That is, for all V, V′ ∈ R^{(r)}, we have q^y_{X,ρ}(V) = q^y_{X,ρ}(V′).
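As a concrete illustration of this decoupling, the surjection ρ can be computed from an answer vector as below. This Python sketch is our own illustration (the helper name rank_pattern is not from the text); it shows that two answer vectors with the same relative order yield the same ρ and are therefore indistinguishable to a comparison-based tester.

```python
def rank_pattern(answers):
    """Map each position of the answer vector to the index of its value in the
    sorted set of distinct answers, i.e., the surjection rho: [s] -> [r]."""
    distinct = sorted(set(answers))           # v_1 < v_2 < ... < v_r
    index = {v: i + 1 for i, v in enumerate(distinct)}
    return tuple(index[a] for a in answers)

# Two answer vectors with the same relative order give the same rho,
# so a comparison-based tester cannot tell them apart.
print(rank_pattern([10, 3, 10, 7]))   # (3, 1, 3, 2)
print(rank_pattern([5, -1, 5, 0]))    # (3, 1, 3, 2)
```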
To complete the proof of Theorem 3.3, we show that if there exists a discretized tester T for an order-based property P of functions f : D → N, then there exists an infinite set R ⊆ N such that, for functions f : D → R, the tester T is comparison-based. Specifically, the q-functions that describe the tester do not depend on V, the specific set of answer values, as long as V ⊂ R. At the end of the proof, we construct a new comparison-based tester that modifies the input function so that its range is contained in R (without changing the distance of the function to the property P) and runs T on the modified function. The existence of the infinite set R is proved using Ramsey theory arguments.
We introduce some Ramsey theory terminology. Consider a positive integer C, where [C] represents a set of colors. For any positive integer i, a finite coloring of N^{(i)} is a function col_i : N^{(i)} → [C]. That is, this function assigns one of C colors to each subset of N with i elements. An infinite set R ⊆ N is monochromatic with respect to col_i if for all i-element subsets V, V′ ∈ R^{(i)}, the color col_i(V) = col_i(V′). In other words, each i-element subset of R is colored with the same color by the coloring function col_i. A k-wise finite coloring of N is a collection of k finite colorings col_1, col_2, . . . , col_k. Note that each coloring function col_1, . . . , col_k is defined over subsets of a different size, and together they assign a color to each subset of N with at most k elements. An infinite subset R ⊆ N is k-wise monochromatic with respect to col_1, . . . , col_k if R is monochromatic with respect to col_i for every i ∈ [k]. That is, all subsets of R of the same size get assigned the same color by the coloring functions.
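The monochromaticity condition can be checked mechanically on small finite truncations. The Python sketch below is our own illustration (the helper is_monochromatic and the parity coloring are hypothetical): it verifies that the even numbers are monochromatic with respect to a coloring of pairs by parity of their sum.

```python
from itertools import combinations

def is_monochromatic(R, col, i):
    """Check that every i-element subset of R receives the same color under col."""
    colors = {col(frozenset(s)) for s in combinations(R, i)}
    return len(colors) <= 1

# Color each pair {a, b} by the parity of a + b; the even numbers form
# a monochromatic set for this coloring (truncated here for the demo).
col2 = lambda s: sum(s) % 2
print(is_monochromatic(range(0, 20, 2), col2, 2))   # True
print(is_monochromatic(range(20), col2, 2))         # False
```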
We use the following variant of Ramsey's theorem, which was also used in [35, 23].

Theorem 3.7. For every k ∈ N and every k-wise finite coloring of N, there exists an infinite k-wise monochromatic set R ⊆ N.

Proof of Theorem 3.3. Suppose there exists a (t, ε, δ)-tester for a property P of functions f : D → N. By Lemma 3.5, there exists a (t, ε, 2δ)-discretized tester T for P. Consider the family of q-functions that characterizes T.
The main idea of the proof is to view the behavior of the tester T on each possible set of answer values V (with at most t elements) as the color of V. More precisely, the color of V is the corresponding vector of all q-functions evaluated at V. If two sets V and V′ are mapped to the same color, it means, intuitively, that the tester T ignores whether the specific answer values are V or V′. We want to show that there is an infinite set R of values such that the tester T ignores the answer values, as long as they come from R. Since T is discretized, the set of colors is finite, and we are able to apply Theorem 3.7 to get an infinite subset R of N that is t-wise monochromatic. That means that T ignores V, as long as V ⊂ R; that is, T is already comparison-based on R. At the end, we use the fact that P is an order-based property to obtain a comparison-based tester for the whole range.
We define a t-wise finite coloring of N. For each r ∈ [t] and V ∈ N^{(r)}, the color col_r(V) is defined as a vector of probability values q^y_{X,ρ}(V). The vector is indexed by (y, X, ρ) for each y ∈ D ∪ {accept, reject}, each X ∈ D^s for an integer s satisfying r ≤ s ≤ t, and each surjection ρ : [s] → [r]. The value at the index (y, X, ρ) in col_r(V) is equal to q^y_{X,ρ}(V). Note that there are finitely many possible values of y and X, and finitely many surjections ρ, so the dimension of the vector col_r(V) is finite. Furthermore, since the tester is discretized, the number of different values that the q-functions take is also finite. Hence, the range of col_r is finite. Now we have a t-wise finite coloring col_1, . . . , col_t of N. By Theorem 3.7, there exists an infinite t-wise monochromatic set R ⊆ N. Thus, for each r ∈ [t] and V, V′ ∈ R^{(r)}, we have col_r(V) = col_r(V′), implying that q^y_{X,ρ}(V) = q^y_{X,ρ}(V′) for all y, X, and ρ. Thus, T is comparison-based for functions f : D → R.
Finally, we construct a comparison-based tester T′ for the whole range N. Consider a strictly monotone increasing map φ : N → R. Given any function f : D → N, the tester T′ runs T on φ ∘ f : D → R. Since P is order-based, the distance from φ ∘ f to P is the same as the distance from f to P. Hence, T′ is a (t, ε, 2δ)-tester for P. Moreover, since the tester T′ just runs T on φ ∘ f : D → R, and T is comparison-based for φ ∘ f, the tester T′ is also comparison-based.
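The last step can be illustrated concretely: a strictly monotone map φ : N → R preserves all pairwise comparisons, so a comparison-based tester behaves identically on f and on φ ∘ f. The Python sketch below is our own illustration; the finite list standing in for the infinite set R is hypothetical.

```python
# First elements of a (conceptually infinite) set R ⊆ N, in increasing order.
R = [5, 9, 12, 20, 21, 33, 40, 47]

def phi(n):
    """Strictly monotone map phi: N -> R sending n to the n-th smallest
    element of R (0-indexed for the sketch)."""
    return R[n]

f = [3, 0, 3, 1]               # values of some function f: D -> N
g = [phi(v) for v in f]        # values of phi ∘ f, with range inside R
print(g)                       # [20, 5, 20, 9]
# All pairwise comparisons are preserved, so a comparison-based tester
# behaves identically on f and phi ∘ f.
print([a < b for a in f for b in f] == [a < b for a in g for b in g])   # True
```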

The hard distributions
Our main lower bound theorem is stated next. Together with Theorem 3.3, it implies Theorem 1.2. The proof of Theorem 3.8 is presented in Sections 3.2-3.4 and forms the core technical content of this work. We will use Yao's minimax principle [61], specifically, the version stated in [57,Claim 5]. It asserts that to prove a lower bound for a randomized tester, it is sufficient to give two distributions, one on positive and one on negative instances, that every deterministic tester fails to distinguish with high probability. Next, we define two input distributions.
It may be useful for the reader to recall the sketch of the main ideas given in Section 1.1.1. Without loss of generality, let d be an even power of 2 and define d′ := d + log_2 d. We will focus on functions h : {0, 1}^{d′} → N, and prove the lower bound of Ω(d log d) for this class of functions, as Ω(d log d) = Ω(d′ log d′).
We partition {0, 1}^{d′} into d subcubes based on the log_2 d most significant bits. Specifically, for i ∈ [d], the i th subcube is defined as C_i := {x ∈ {0, 1}^{d′} | val(x_{d′} x_{d′−1} · · · x_{d+1}) = i − 1}. Let m := d. For ease of notation, we denote the set of subcube indices by [m] and the set of dimensions in a subcube by [d]. We use i, j ∈ [m] to index subcubes, and a, b ∈ [d] to index dimensions. We now define a collection of random variables, where each variable may depend on the previous ones:
• The logarithm of the number of action dimensions, k: a number picked uniformly at random from [(1/2) log_2 d].
• The set of all action dimensions, R: a uniformly random subset of [d] of size 2^k.
• The action dimension for each subcube, r_i: for each i ∈ [m], r_i is picked from R uniformly and independently at random.
• The direction for each dimension (in the D+ distribution), α_b: for each b ∈ [d], α_b is picked from {−1, +1} uniformly and independently at random. (Technically, α_b will only be used for b ∈ R. We define it for all b ∈ [d] so that it is independent of R.)
• The direction of potential violations for each subcube (in the D− distribution), β_i: for each i ∈ [m], β_i is picked from {−1, +1} uniformly and independently at random.
We use T to refer to the entire collection (k, R, {r_i}, {α_b}, {β_i}) of random variables. We denote the tuple (k, R, {r_i | i ∈ [m]}) by S, also referred to as the shared randomness common to the distributions D+ and D−. Given T, the distributions D+ and D− generate the functions f_T and g_T, respectively, where i denotes the subcube containing the point x, i. e., i = val(x_{d′} x_{d′−1} · · · x_{d+1}) + 1.
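The sampling of the collection T can be sketched as follows. This Python sketch is our own illustration and covers only the random variables k, R, {r_i}, {α_b}, and {β_i} with 1-based indices; it does not reproduce the definitions of f_T and g_T given in the text.

```python
import math
import random

def sample_T(d, rng=random):
    """Sample the collection T = (k, R, {r_i}, {alpha_b}, {beta_i}) of random
    variables for m = d subcubes (a sketch only; the hard functions f_T and
    g_T themselves are defined in the text, not here)."""
    m = d
    k = rng.randint(1, int(math.log2(d) // 2))            # log of #action dims
    R = set(rng.sample(range(1, d + 1), 2 ** k))          # action dimensions
    r = {i: rng.choice(sorted(R)) for i in range(1, m + 1)}      # per subcube
    alpha = {b: rng.choice([-1, 1]) for b in range(1, d + 1)}    # D+ directions
    beta = {i: rng.choice([-1, 1]) for i in range(1, m + 1)}     # D- directions
    return k, R, r, alpha, beta

k, R, r, alpha, beta = sample_T(64)
print(len(R) == 2 ** k)   # True
```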

The sign function and the distance to unateness for hard distributions
We need to analyze when functions in our hard distributions are increasing or decreasing in a specified dimension. More generally, since we are looking at comparison-based testers, we need to understand how h(x) and h(y) compare for any points x, y in the hypercube {0, 1}^{d′} and any function h in the support of the hard distributions. To help us with that, we define the sign function sgn_h(x, y) and analyze its behavior on our hard distributions.
Recall that val(z) := ∑_{i=1}^{p} z_i 2^{i−1} denotes the integer equivalent of the binary string z = z_p z_{p−1} . . . z_1. We say that x <_val y if val(x) < val(y). Note that <_val is a total order on {0, 1}^{d′}. For points x <_val y, the sign function is sgn_h(x, y) := 1 if h(x) < h(y), −1 if h(x) > h(y), and 0 if h(x) = h(y). Next, we show that for every function in the support of our hard distributions and every two points x <_val y in different subcubes, sgn_h(x, y) = 1.
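For concreteness, val and the order <_val can be sketched as follows. This Python illustration is ours; the tuple representation with z[0] = z_1 (the least significant bit) is an assumption of the sketch.

```python
def val(z):
    """Integer equivalent of the binary string z = z_p z_{p-1} ... z_1; here
    z is a tuple with z[0] = z_1, the least significant bit."""
    return sum(bit << i for i, bit in enumerate(z))

x, y = (1, 0, 1, 0), (0, 1, 1, 0)
print(val(x), val(y))      # 5 6
print(val(x) < val(y))     # True, i.e., x <_val y
```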
For a distribution D, let supp(D) denote the support of D.
Claim 3.10. For all h ∈ supp(D+) ∪ supp(D−), x ∈ C_i, and y ∈ C_j such that i, j ∈ [m] and i < j, we have sgn_h(x, y) = 1.
Proof. Let b be the most significant coordinate on which x and y differ, i. e., x_b ≠ y_b. Since x ∈ C_i, y ∈ C_j, and i < j, we know that b > d, x_b = 0, and y_b = 1.

Now we analyze the sign function on points x, y in the same subcube. Its value is determined by the most significant coordinate on which x and y differ that has a nonzero coefficient in the definitions of the hard functions f_T and g_T. We define this coordinate next. Note that S determines R and {r_i}. For any T that extends S, the restrictions of both f_T and g_T to C_i are constant with respect to the coordinates in R \ {r_i}. Thus, t^i_S(x, y) is ⊥ if x and y differ only on coordinates that do not influence the value of the function in C_i; otherwise, it is the most significant coordinate on which x and y differ that is influential in C_i.

Claim 3.12. Fix some shared randomness S, an index i ∈ [m], and points x, y ∈ C_i with x <_val y. Let a = t^i_S(x, y). For any T that extends S,
• if a = ⊥, then sgn_{f_T}(x, y) = sgn_{g_T}(x, y) = 0;
• if a ≠ ⊥ and a ≠ r_i, then sgn_{f_T}(x, y) = sgn_{g_T}(x, y) = 1;
• if a = r_i, then sgn_{f_T}(x, y) = α_a and sgn_{g_T}(x, y) = β_i.
THEORY OF COMPUTING, Volume 16 (3), 2020, pp. 1-36

Proof. Recall the definitions of f_T and g_T. First, consider the case a = ⊥. Then x and y agree on every coordinate that influences the value of the function in C_i. By the definition of f_T and g_T, we get f_T(x) = f_T(y) and g_T(x) = g_T(y). Hence, sgn_{f_T}(x, y) = sgn_{g_T}(x, y) = 0.
Next, consider the case a ≠ ⊥ and a ≠ r_i. Then a ∉ R by the definition of t^i_S(x, y). Also, x_b = y_b for all b > a such that b ∉ R. Since x <_val y, we have x_a = 0 and y_a = 1. It follows that f_T(x) < f_T(y) and g_T(x) < g_T(y), so sgn_{f_T}(x, y) = sgn_{g_T}(x, y) = 1.
Finally, consider the case a = r_i. Then the difference f_T(y) − f_T(x) is positive when α_a = 1 and negative when α_a = −1. This implies that sgn_{f_T}(x, y) = α_a. By an analogous argument, sgn_{g_T}(x, y) = β_i.

We complete Section 3.3 by using Claims 3.10 and 3.12 to analyze the distance to unateness for functions distributed according to our hard distributions. If b ∈ R, then Claim 3.12 determines the sign of every comparison along dimension b within each subcube. We write f ∼ D to denote that f is sampled from distribution D.
Note that, for all i ∈ [m], the size |C_i| = 2^d. By Claim 3.12, for any r-edge (x, y) that lies within a subcube C_i with i ∈ A_r, the sign sgn_{g_T}(x, y) = β_i. Hence, in g_T, for all i ∈ A^+_r, all r-edges in subcube C_i are increasing, whereas, for all j ∈ A^−_r, all r-edges in C_j are decreasing. In any unate function, all these r-edges are either all nonincreasing or all nondecreasing. Thus, g_T differs from every unate function on at least (2^d/8) · |A_r| points. Overall, we need to change at least (2^d/8) ∑_{r∈R} |A_r| values. Since the A_r's partition the set of subcube indices, ∑_{r∈R} |A_r| = m, so the distance from g_T to unateness is at least (2^d · m/8)/2^{d′} = 1/8.

Bad events
Fix a set of queries Q made by a deterministic, nonadaptive, comparison-based tester. We first identify certain bad values of S, on which Q could potentially distinguish between f_T and g_T for some T that extends S. In this section, we prove that, for a given Q, the probability of a bad S is small. In Section 3.5, we show that the tester with queries Q cannot distinguish between f_T and g_T for any T that extends a good S.
Recall that when a comparison-based tester queries points x and y, it only sees sgn_h(x, y). In order for sgn_h(x, y) to have a possibility of being different for h ∼ D+ and h ∼ D−, by Claims 3.10 and 3.12, the queries x and y have to be in the same subcube C_i for some i ∈ [m] and, moreover, t^i_S(x, y), the most significant coordinate in ([d] \ R) ∪ {r_i} on which x and y differ, has to be the action dimension r_i for the subcube C_i. Subcubes are fixed, whereas the action dimensions R and r_i are chosen randomly. Intuitively, the value of sgn_h(x, y) is likely to be determined by the top several coordinates in [d] on which x and y differ. (Specifically, considering the top 5 coordinates is sufficient to get a high enough probability.) Next, we formalize this intuition in the definition of the dimensions captured by (x, y) and, more generally, by a set of query points.

Definition 3.15 (Captured dimensions). For points x, y ∈ {0, 1}^{d′}, define cap(x, y) to be
1. the set of all coordinates on which x and y differ, if they differ on at most 5 coordinates;
2. the set of the 5 most significant coordinates on which x and y differ, otherwise.
We say that the pair (x, y) captures the set of coordinates cap(x, y). For a set of points P ⊆ {0, 1}^{d′}, define cap(P) := ∪_{x,y∈P} cap(x, y) to be the set of all coordinates captured by the set P.
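A sketch of cap(x, y) in Python (our own illustration; it assumes the convention above that larger coordinate indices are more significant, with coordinate 1 the least significant):

```python
def cap(x, y):
    """Coordinates captured by (x, y): all coordinates where x and y differ
    if there are at most 5 of them, else the 5 most significant ones.
    Coordinates are 1-indexed from the least significant position."""
    diff = [i + 1 for i in range(len(x)) if x[i] != y[i]]
    return set(diff) if len(diff) <= 5 else set(sorted(diff)[-5:])

print(sorted(cap((1, 0, 0, 0), (0, 0, 0, 1))))   # [1, 4] -- all differing coords
print(sorted(cap((1,) * 8, (0,) * 8)))           # [4, 5, 6, 7, 8] -- top 5 of 8
```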
For each i ∈ [m], let Q_i := Q ∩ C_i denote the set of the queried points that lie in the subcube C_i. We define two bad events for S.

Definition 3.16 (Bad events). S is bad if at least one of the following bad events holds:
• Abort event A: there exist x, y ∈ Q such that |cap(x, y)| = 5 and cap(x, y) ⊆ R.
• Collision event C: there exist distinct i, j ∈ [m] with r_i = r_j such that r_i ∈ cap(Q_i) and r_j ∈ cap(Q_j).
If A does not occur, then for every pair (x, y), the sign sgn_h(x, y) is determined by cap(x, y) for any h ∈ supp(D+) ∪ supp(D−). And if C does not occur, conditioned on A not occurring, then the tester cannot distinguish D+ and D−. The heart of the analysis lies in proving that the bad events happen rarely.

Proof. Recall that k is the logarithm of the number of action dimensions. Fix any choice of k (in S). For each pair of points x, y ∈ Q such that |cap(x, y)| = 5, the probability that cap(x, y) ⊆ R is at most (2^k/(d − 4))^5. Since d − 4 ≥ d/2 for all d ≥ 8 and k ≤ (log d)/2, this probability is at most 32d^{−5/2}. By the union bound, Pr[A] ≤ |Q|^2 · 32d^{−5/2} ≤ .01 for all sufficiently large d.
To analyze the probability of the collision event C, we prove the following combinatorial lemma, which will be used to bound the number of coordinates Q captures.

Lemma 3.18. For every c ∈ N and every finite set V of vectors, |cap_c(V)| ≤ c · (|V| − 1).

Proof. We construct c edge-colored graphs G_1, . . . , G_c over the vertex set V with colors from [d]. For every coordinate i ∈ cap_c(V), we add one edge of color i to exactly one of the graphs G_t, as follows. Since i ∈ cap_c(V), there exists at least one pair of vectors x, y such that i ∈ cap_c(x, y). Thinking of each cap_c(x, y) as an ordered set, find one pair of vectors (x, y) for which i appears "earliest" in cap_c(x, y), and let t denote the position of i in this cap_c(x, y). Add the edge (x, y) to G_t and color it i. Note that each edge (x, y) is added to G_t at most once, and hence each graph G_t is simple.

We claim that each G_t is acyclic. Suppose not, and let C be a cycle in G_t. Let (x, y) be the edge in C with the smallest color i. Clearly, x_i ≠ y_i, since i ∈ cap_c(x, y). There must exist another edge (u, v) in C such that u_i ≠ v_i. Furthermore, the color of (u, v) is some j > i. Thus, j is the t th entry in cap_c(u, v). Note that i ∈ cap_c(u, v), since u_i ≠ v_i and j > i. It follows that i appears earlier in cap_c(u, v) than in cap_c(x, y). So the edge colored i should not be in G_t, a contradiction. Since each G_t is therefore a forest on |V| vertices, it has at most |V| − 1 edges, and each coordinate of cap_c(V) corresponds to exactly one edge in exactly one G_t. Hence, |cap_c(V)| ≤ c · (|V| − 1).

Lemma 3.18 implies that a small Q can capture only a few coordinates. Next, we bound the probability of the collision event. For each r ∈ [d], define A_r := {j ∈ [m] | r ∈ cap(Q_j)} to be the set of indices of the subcubes in which coordinate r is captured, and let a_r := |A_r|. For each ℓ ∈ [log d], define n_ℓ := |{r ∈ [d] | a_r ∈ (2^{ℓ−1}, 2^ℓ]}| to be the number of coordinates that are captured by more than 2^{ℓ−1}, but at most 2^ℓ, subcubes. Observe that ∑_ℓ n_ℓ · 2^{ℓ−1} < ∑_{r∈[d]} a_r = ∑_{j∈[m]} |cap(Q_j)|, where the first inequality holds because if coordinate r is included in the count n_ℓ then 2^{ℓ−1} < a_r, and the last quantity is bounded via Equation (3.2). Fix the log of the number of action dimensions, k, to be some κ ∈ [(1/2) log d].
For each r ∈ [d], we say that the event C_r occurs if r ∈ R and there exist distinct i, j ∈ [m] such that r_i = r_j = r and r ∈ cap(Q_i) ∩ cap(Q_j). By the union bound, Pr[C | k = κ] ≤ ∑_{r∈[d]} Pr[C_r | k = κ]. Now we compute Pr[C_r | k = κ]. Conditioned on k = κ, the size of the set of action dimensions is |R| = 2^κ. Only the sets Q_j with j ∈ A_r capture coordinate r. The event C_r occurs if at least two of these sets have r_i = r_j = r. Hence, Pr[C_r | k = κ] is obtained by summing, over c ≥ 2, the terms Pr[there are exactly c subcube indices i ∈ A_r with r_i = r | r ∈ R], where we used Pr[r ∈ R] = 2^κ/d to get the second equality and Pr[r_i = r | r ∈ R] = 2^{−κ} for each i ∈ [m] to get the third equality. If a_r > 2^κ/4, we apply Equation (3.4); if a_r ≤ 2^κ/4, we apply Equation (3.5), where we used (a_r choose c) < a_r^c, (1 − 2^{−κ})^{−1} ≤ 2, and a_r · 2^{1−κ} ≤ 1/2 to get the first, second, and third inequalities, respectively. Using the union bound over all r ∈ [d] and grouping according to n_ℓ, we get the bound in Equation (3.8), where the second-to-last inequality follows from Equations (3.6) and (3.7), and the last inequality holds because if coordinate r is included in the count n_ℓ, then 2^{ℓ−1} < a_r ≤ 2^ℓ.
Averaging over all the values of k, we obtain Equation (3.9), where the first inequality follows from Equation (3.8) and the last inequality is obtained by switching the order of summation and rearranging the terms. Using these inequalities to bound the sum in Equation (3.9) and then applying Equation (3.3), we get the desired bound on Pr[C].
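The bound of Lemma 3.18 (as stated above, |cap_c(V)| ≤ c(|V| − 1)) can be checked empirically. The Python sketch below is our own illustration, with a hypothetical cap_c helper that takes the c most significant differing coordinates.

```python
import random

def cap_c(x, y, c):
    """The c most significant coordinates on which x and y differ (all of
    them if there are at most c); coordinate 1 is least significant."""
    diff = [i + 1 for i in range(len(x)) if x[i] != y[i]]
    return set(sorted(diff)[-c:])

random.seed(0)
c, d = 3, 30
V = [tuple(random.randint(0, 1) for _ in range(d)) for _ in range(8)]
captured = set().union(*(cap_c(x, y, c) for x in V for y in V if x != y))
# The forest-decomposition argument bounds |cap_c(V)| by c * (|V| - 1).
print(len(captured) <= c * (len(V) - 1))   # True
```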

Indistinguishability of hard distributions
Proof of Theorem 3.8. Fix a deterministic, nonadaptive, comparison-based tester making a set of queries Q of size |Q| ≤ δ_0 d log d. (Recall that δ_0 = 1/20000.) The tester decides whether the input function h is unate based on the $\binom{|Q|}{2}$ comparisons between the function values at the points in Q. We define a labeled undirected graph G^Q_h to be the clique on the node set Q, where each edge {x, y} with x <_val y is labeled by sgn_h(x, y). The graph G^Q_h represents the tester's "view" of the function h after it has made its queries Q.
For any distribution D over functions f : {0, 1}^{d′} → N, let D-view denote the distribution of the labeled graphs G^Q_h when h ∼ D. For two distributions D_1 and D_2, let D_1 ≈_α D_2 denote that the statistical distance between D_1 and D_2 is at most α. By the version of Yao's principle stated in [57, Claim 5], it is sufficient to give two distributions, D_1 and D_2, on positive and negative instances, respectively, such that the statistical distance between D_1-view and D_2-view is less than 1/6 for every deterministic tester that makes at most δ_0 d log d queries. Specifically, we show that for such testers, D_1-view ≈_{3/19} D_2-view.
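The notion of a view can be sketched as follows. This Python illustration is ours and assumes sgn_h(x, y) is the sign of h(y) − h(x) for x <_val y.

```python
from itertools import combinations

def view(h, Q):
    """The tester's view: the label sgn_h(x, y) for every pair of queried
    points with val(x) < val(y), assuming sgn_h(x, y) = sign(h(y) - h(x))."""
    def val(z):
        return sum(bit << i for i, bit in enumerate(z))
    labels = {}
    for x, y in combinations(sorted(Q, key=val), 2):   # ensures x <_val y
        diff = h(y) - h(x)
        labels[(x, y)] = (diff > 0) - (diff < 0)       # sign in {-1, 0, 1}
    return labels

# Functions with order-isomorphic values induce identical views, so a
# comparison-based tester treats them the same.
Q = [(0, 0), (1, 0), (0, 1)]
h1 = lambda z: z[0] + 2 * z[1]
h2 = lambda z: 10 * (z[0] + 2 * z[1])
print(view(h1, Q) == view(h2, Q))   # True
```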
We first show that, conditioned on the bad events A and C not occurring, the view of the tester is the same under both D+ and D−. For a distribution D and an event E, let D|_E denote the distribution D conditioned on the event E.
Proof. Fix a value of the shared randomness S for Q such that the bad events A and C do not occur. Let E_S denote the event that the collection of random variables T used to define the support of D+ and D− extends S. We show that, for any labeled clique G over the vertex set Q, the probability that the view equals G is the same under D+|_{E_S} and D−|_{E_S}. Consider an edge {x, y} ∈ G with x <_val y and x, y ∈ {0, 1}^{d′}. If x and y lie in different subcubes, then, by Claim 3.10, its label sgn_h(x, y) = 1 for all h ∈ F. Similarly, if x and y lie in the same subcube C_i for some i ∈ [m], but with t^i_S(x, y) ≠ r_i, then, by Claim 3.12, sgn_h(x, y) is the same for all h ∈ F. Hence, the label of the edge {x, y} can potentially distinguish between D+ and D− only if x and y lie in the same subcube C_i for some i ∈ [m] and t^i_S(x, y) = r_i. We call such edges interesting. In the rest of this proof, we focus on interesting edges, since the labels of non-interesting edges cannot be used to distinguish between D+ and D−.
Fix i ∈ [m]. Recall that Q_i = Q ∩ C_i is the set of queried points that lie in the subcube C_i. Let G(Q_i) denote the subgraph of G induced on the vertex set Q_i. By Claim 3.12, all interesting edges in G(Q_i) have the same label. Denote this label by ℓ_i.
Let I = {i ∈ [m] | G(Q_i) contains an interesting edge} denote the set of indices of all subcubes with at least one interesting edge. Now, we focus on g ∼ D−|_{E_S}. For any edge {x, y} ∈ G with x <_val y, let ℓ(x, y) denote its label. The probability that every interesting edge receives its label in G is Pr[β_i = ℓ_i for all i ∈ I] = 2^{−|I|}, since each β_i is chosen uniformly and independently at random from {+1, −1}.
Similarly, for f ∼ D+|_{E_S}, the probability that every interesting edge receives its label in G is Pr[α_{r_i} = ℓ_i for all i ∈ I]. Note that unless there exist distinct i, j ∈ I such that r_i = r_j, the individual events in the probability expression in Equation (3.11) are mutually independent. We show that if the bad events A and C do not occur, then all subcubes C_i with interesting edges have distinct action dimensions r_i.
Claim 3.21. If neither A nor C occurs, then r_i ≠ r_j for all distinct i, j ∈ I.
Proof. Assume for the sake of contradiction that there exist distinct i, j ∈ I with r_i = r_j. Since C does not occur, there are no distinct i, j ∈ [m] with r_i = r_j such that r_i ∈ cap(Q_i) and r_j ∈ cap(Q_j). Hence, r_i ∉ cap(Q_i) or r_j ∉ cap(Q_j). Suppose, without loss of generality, that r_i ∉ cap(Q_i). Since i ∈ I, there must exist an interesting edge {x, y} ∈ G(Q_i) with x <_val y. We know r_i ∉ cap(Q_i), so r_i ∉ cap(x, y). By Definition 3.15, if x and y differed on at most five coordinates, then r_i would be in cap(x, y). Hence, x and y differ on at least six coordinates, and r_i is not among the five most significant of them. Since the abort event A does not occur, cap(x, y) ⊄ R. Hence, there exists a coordinate a ∈ cap(x, y) that is not in R and is more significant than r_i. Then t^i_S(x, y) ≥ a, contradicting the fact that t^i_S(x, y) = r_i. Hence, r_i ≠ r_j for all distinct i, j ∈ I.
By Equation (3.11) and Claim 3.21, the probability for f ∼ D+|_{E_S} is also 2^{−|I|}, completing the proof of Lemma 3.20.
We now wrap up the proof of Theorem 3.8. Let FAR denote the event that a function h ∼ D− is 1/8-far from unate, and define D̂− := D−|_{FAR}. Note that every function h ∼ D̂− is 1/8-far from unate. By Claim 3.13, every function h ∼ D+ is unate. Hence, to complete the proof of the theorem, it suffices to show that D+-view ≈_{3/19} D̂−-view.
By Claim 3.14, Pr[FAR] ≥ 19/20. We use the following claim from [57] to show that D− and D̂− are close. Using Equation (3.14), Lemma 3.20, Equation (3.15), and, finally, Equation (3.13), we get that D+-view ≈_{3/19} D̂−-view.

Proof of Theorem 1.3. By Yao's minimax principle and the reduction to testing with comparison-based testers from [35, 23] (stated for completeness in Theorem 3.3), it is sufficient to give a hard input distribution on which every deterministic comparison-based tester fails with probability at least 2/3. We use the hard distribution constructed by Chakrabarty and Seshadhri [23], who used it to prove the same lower bound for testing monotonicity. Their distribution is a mixture of two distributions, D+ and D−, on positive and negative instances, respectively. The positive instances for their problem are functions that are monotone and, therefore, unate; the negative instances are functions that are ε-far from monotone. We show that their distribution D− is supported on functions that are ε-far from unate, i. e., negative instances for our problem. Then the required lower bound for unateness follows from the fact that every deterministic comparison-based tester needs the stated number of queries to distinguish the distributions D+ and D− with high enough probability. We start by describing the distributions D+ and D− used in [23]. We will define them as distributions on functions over the hypercube domain. Next, we explain how to convert functions over hypercubes to functions over hypergrids.
Without loss of generality, assume n is a power of 2 and let ℓ := log_2 n. For all z ∈ [n], let bin(z) denote the binary representation of z − 1 as an ℓ-bit vector (z_1, . . . , z_ℓ), where z_1 is the least significant bit.
We now describe the mapping used to convert functions on hypergrids to functions on hypercubes. The hypercube is partitioned into 1/(2ε) sets S_k of equal size, and each S_k forms a subcube of dimension m′ = m − log(1/ε) + 1.
We now describe the distributions D+ and D− for functions on hypercubes. The distribution D+ consists of a single function f(x) = 2 val(x). The distribution D− is the uniform distribution over m′/(2ε) functions g_{j,k}, where j ∈ [m′] and k ∈ [1/(2ε)], defined as follows. To get the distributions D+ and D− for the hypergrid, we convert f to f̃ and each function g_{j,k} to g̃_{j,k}, using the transformation defined before. Chakrabarty and Seshadhri [23] proved that f̃ is monotone and each function g̃_{j,k} is ε-far from monotone. It remains to show that the functions g̃_{j,k} are also ε-far from unate.

Proof. To prove that g̃_{j,k} is ε-far from unate, it suffices to show that there exists a dimension i such that there are at least ε2^d increasing i-pairs and at least ε2^d decreasing i-pairs w.r.t. g̃_{j,k}, and that all of these i-pairs are disjoint. Let u, v ∈ [n]^d be two points such that φ(u) and φ(v) differ only in the j th bit. Clearly, u and v form an i-pair for the corresponding dimension i. If φ(u) lies in the set S_k, then the i-pair (u, v) is increasing. Clearly, there are at least ε2^d such i-pairs. All the i-pairs we mentioned are disjoint. Hence, g̃_{j,k} is ε-far from unate.
This completes the proof of Theorem 1.3.

The lower bound for nonadaptive testers over the hypergrid
The lower bound for nonadaptive testers over hypergrids follows from a combination of the lower bound for nonadaptive testers over the hypercube and the lower bound for adaptive testers over hypergrids.
Proof of Theorem 1.4. Fix ε = 1/8. The proof consists of two parts. First, the lower bound for adaptive testers is also a lower bound for nonadaptive testers, and so the bound of Ω(d log n) holds. Next, we extend the Ω(d log d) lower bound for hypercubes. Assume n is even, and define a map Ψ : [n]^d → {0, 1}^d. Any function f : {0, 1}^d → R can be extended to f̃ : [n]^d → R using the mapping f̃(x) = f(Ψ(x)) for all x ∈ [n]^d. The proof of Theorem 3.8 goes through for hypergrids as well, and so we have an Ω(d log d) lower bound. Combining the two lower bounds, we get a bound of Ω(d · max{log n, log d}), which is asymptotically equal to Ω(d(log n + log d)).

Conclusion and open questions
In this work, we give the first algorithms for testing unateness of real-valued functions over the hypercube as well as the hypergrid domains. We also show that our algorithms are optimal by proving matching lower bounds, thus resolving the query complexity of testing unateness of real-valued functions. Our results demonstrate that, for real-valued functions, adaptivity helps with testing unateness, which is not the case in monotonicity testing.
Subsequent to the initial publication of this work [4], the problem of testing unateness of Boolean functions received significant attention. Concurrently with our work, Chen et al. [29] proved a lower bound of Ω(d^{2/3}/log^3 d) for adaptive unateness testers of Boolean functions over {0, 1}^d. Subsequently, Chen et al. [30] gave an adaptive unateness tester with query complexity O(d^{3/4}/ε^2) for the same class of functions. An exciting recent development is an O(d^{2/3}/ε^2)-query algorithm for this problem by Chen and Waingarten [28], which leaves only a polylogarithmic (in d) gap between the upper bound and the lower bound.
Next, we discuss nonadaptive unateness testing of Boolean functions over {0, 1}^d. In subsequent work, Baleshzar et al. [3] proved a lower bound of Ω(d/log d) for one-sided error testers for this problem. Since Boolean functions are a special case of real-valued functions, our nonadaptive algorithm for the hypercube also works for Boolean functions. This algorithm has one-sided error, and its query complexity of O((d/ε) log(d/ε)) is currently the best known upper bound for any (even two-sided error) nonadaptive algorithm. There is still a polylogarithmic (in d) gap between the upper bound and the lower bound. An interesting open question is to determine whether testers with two-sided error have better query complexity than testers with one-sided error in the nonadaptive setting.
Finally, we mention some recent work on approximating the distance to unateness of Boolean functions over the hypercube domain. Levi and Waingarten [48] showed that every algorithm approximating the distance to unateness within a constant factor requires Ω(d) queries, and they strengthened their lower bound to Ω(d^{3/2}) queries for nonadaptive algorithms. Subsequently, Pallavoor et al. [50] proved that every nonadaptive algorithm approximating the distance to unateness within a factor of d^{1/2−k}, for k > 0, requires 2^{d^k} queries. No nontrivial upper bounds are currently known for this problem.