On the Hardness of Learning With Errors with Binary Secrets
DANIELE MICCIANCIO

Abstract: We give a simple proof that the decisional Learning With Errors (LWE) problem with binary secrets (and an arbitrary polynomial number of samples) is at least as hard as the standard LWE problem (with unrestricted, uniformly random secrets, and a bounded, quasi-linear number of samples). This proves that the binary-secret LWE distribution is pseudorandom, under standard worst-case complexity assumptions on lattice problems. Our results are similar to those proved by Brakerski, Langlois, Peikert, Regev and Stehlé (STOC 2013), but provide a shorter, more direct proof, and a small improvement in the noise growth of the reduction.


Introduction
The Learning With Errors (LWE) problem [21,22] plays a central role in lattice cryptography, in both its secure instantiation and its most advanced applications. The usefulness of LWE in cryptography is due in large part to its pseudorandomness properties, captured by the standard decisional LWE problem defined as follows. An LWE instance is described by a matrix $A \in \mathbb{Z}_q^{m\times n}$ (chosen uniformly at random) and a vector $b \in \mathbb{Z}_q^m$, which may be chosen either uniformly at random, or as $b = As + e \pmod q$, where $s \in \mathbb{Z}_q^n$ is a random secret and $e \in \mathbb{Z}^m$ is a "small" error vector, typically chosen with independent discrete Gaussian entries of standard deviation $\sigma \approx \sqrt{n}$. The (decisional) LWE problem asks to distinguish between these two cases.
Several variants of LWE exist in the literature, depending on how s and e are chosen, all motivated by specific cryptographic applications. In the most standard formulation of LWE, the secret $s \in \mathbb{Z}_q^n$ is chosen uniformly at random. But this is undesirable in many cryptographic applications, e.g., those making use of modulus-switching techniques, where large secrets result in substantial degradation of ciphertext quality. Ideally, one would choose $s \in \{0,1\}^n$ as a vector with binary entries, as done for example in many Fully Homomorphic Encryption schemes (e.g., see [9,8]). Binary-secret LWE also plays a fundamental role in theoretical studies, like the proof that LWE is leakage resilient [11], and the proof that LWE with polynomial modulus q is at least as hard as worst-case lattice problems under classical (i.e., non-quantum) reductions [6].
This last work [6] is the best currently known hardness result for binary-secret LWE, and gives a reduction from (standard) LWE with arbitrary secret in $\mathbb{Z}_q^n$ to LWE with secret in $\{0,1\}^{n\log q}$, i.e., the secret can be restricted to binary vectors at the cost of increasing the dimension from n to n log q. This reduction is a major part of the main result of [6] on the classical hardness of LWE, and takes a good half of that paper, going through a careful hybrid argument involving some technical ("first-is-errorless" and "extended-LWE") problem variants.
In this paper we present a direct and substantially shorter proof of this important result. In fact, while the proof of this result given in [6] is over 7 pages long, spans multiple subsections, and involves a number of intermediate problems, our proof has a more direct structure and is much shorter. A key insight leading to our simpler proof is the formulation of the binary LWE problem (denoted $\mathrm{LWE}_\pm$) using secrets in $\{\pm1\}^n$, rather than $\{0,1\}^n$. This is easily seen (for odd modulus q) to be equivalent to the more common $\{0,1\}$ formulation via the affine transformation $s \mapsto 2s - 1$, but has the technical advantage that all secrets have exactly the same Euclidean length, simplifying the application of discrete Gaussian convolution theorems. Given the equivalence between the two problems, we will keep referring to $\mathrm{LWE}_\pm$ informally as the binary LWE problem. Other than presenting a simpler and shorter proof, we do not claim any new results over previous work: our results, and the range of parameters for which we reduce LWE to $\mathrm{LWE}_\pm$, are essentially the same as in [6, Theorem 4.1], except possibly for reducing some constants, e.g., in our reduction the error grows by a factor $2\sqrt{n+1}$, while in [6, Theorem 4.1] it grows by $\sqrt{10n}$. Given the important role played by binary LWE in many cryptographic applications, we hope that our simplified treatment will make the theoretical hardness of this problem more easily accessible, and stimulate further research.

Related work.
The LWE problem with small secret was first formally considered by Applebaum, Cash, Peikert and Sahai in [3], who proved that, without loss of generality, one may assume that the secret follows the same distribution as the LWE errors. This allows the secret coordinates to be as small as $\sqrt{n}$, but not binary (i.e., in $\{0,1\}$). For a list of applications using LWE with small secrets, see [1].
Reducing LWE to have a binary secret was first considered by Goldwasser, Kalai, Peikert and Vaikuntanathan in [11], motivated by questions in leakage-resilient cryptography, where the problem is proved hard using "noise-flooding" techniques.A stronger reduction is given by Brakerski, Langlois, Peikert, Regev and Stehlé in [6], in the context of proving classical hardness results for LWE.
A different (and much harder) problem is that of proving that LWE is computationally hard when the error (and not just the secret) follows the binary distribution [17,7]. In fact, LWE with small errors can be solved efficiently when sufficiently many samples are available [4,2,13]. In this paper, we do not study LWE with binary errors.
Attacks against LWE with binary secret (and Gaussian errors) are considered in [1,5]. Theoretically, the secret can be assumed binary by increasing the LWE dimension to n log q [6], but experimental results in [5] suggest that, heuristically, increasing the secret dimension by a log log n factor may already be enough to counter the best known cryptanalytic attacks for common parameter settings.
Paper organization. In Section 2 we introduce the notation used in this paper, provide a formal definition of the LWE problem, and present some background results, including a simple lemma on the projection of discrete Gaussians (Lemma 2.6), and the construction of a gadget matrix needed in our main reduction (Lemma 2.7). The proof that LWE with binary secrets is pseudorandom is given in Section 3. Section 4 concludes with a discussion of open problems.

Preliminaries
We use bold lowercase letters a for vectors, and bold uppercase A for matrices. Probability distributions are denoted using calligraphic letters $\mathcal{A}$. We write vectors as columns $v \in \mathbb{Z}^n = \mathbb{Z}^{n\times 1}$. The transpose of a vector or matrix A is denoted $A^t$. We write $[A_1, \ldots, A_n]$ for the horizontal concatenation of matrices $A_i \in \mathbb{Z}^{k\times m_i}$, and use transpose notation $[A_1, \ldots, A_n]^t$ for the vertical concatenation of $A_1^t, \ldots, A_n^t$. Let $e_1, \ldots, e_n$ be the standard basis of $\mathbb{Z}^n$, $I = [e_1, \ldots, e_n]$ the $n\times n$ identity matrix, and $u = \sum_i e_i$ the all-ones vector. The Euclidean norm of a vector is $\|v\| = \sqrt{\sum_i v_i^2}$, and the max norm is $\|v\|_\infty = \max_i |v_i|$. We write $\mathrm{diag}(z)$ for the diagonal matrix with the entries of z along the diagonal. So, for example, $\mathrm{diag}(u) = I$. For any integer matrix $Q \in \mathbb{Z}^{n\times m}$ and for any positive integer $k \le m$, we write $Q_{[k]}$ for the matrix consisting of the first k columns of Q, and $Q_{]k[}$ for the matrix obtained by removing the first k columns from Q. For any integer matrix $Q \in \mathbb{Z}^{k\times m}$, we write $\ker(Q) = \{x \in \mathbb{Z}^m : Qx = 0\}$ for the kernel of $Q : \mathbb{Z}^m \to \mathbb{Z}^k$ as an integer linear map. We say that a matrix $Q \in \mathbb{Z}^{k\times m}$ is primitive if $Q\mathbb{Z}^m = \mathbb{Z}^k$, i.e., if $Q : \mathbb{Z}^m \to \mathbb{Z}^k$ is surjective. As a special case, a row vector $w^t \in \mathbb{Z}^{1\times k}$ is primitive if and only if the greatest common divisor of its entries is $\gcd(w) = 1$.
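As a small illustration of the notation above, the following Python sketch implements $Q_{[k]}$, $Q_{]k[}$, diag(z), and the gcd test for primitive row vectors. Matrices are represented as lists of rows, and all helper names are hypothetical. Testing primitivity of a general matrix needs more machinery (e.g., a Smith or Hermite normal form) and is not attempted here.

```python
from functools import reduce
from math import gcd

# Matrices are represented as lists of rows of Python ints.

def first_cols(Q, k):
    """Q_[k]: the matrix consisting of the first k columns of Q."""
    return [row[:k] for row in Q]

def drop_cols(Q, k):
    """Q_]k[: the matrix obtained by removing the first k columns of Q."""
    return [row[k:] for row in Q]

def diag(z):
    """diag(z): square diagonal matrix with the entries of z on the diagonal."""
    return [[z[i] if i == j else 0 for j in range(len(z))] for i in range(len(z))]

def is_primitive_row(w):
    """A row vector w^t is primitive (w^t Z^k = Z) iff gcd(w) = 1."""
    return reduce(gcd, w, 0) == 1

Q = [[1, 2, 3],
     [0, 1, 4]]
print(first_cols(Q, 1))             # [[1], [0]]
print(drop_cols(Q, 1))              # [[2, 3], [1, 4]]
print(diag([1, 1]))                 # [[1, 0], [0, 1]], i.e., diag(u) = I
print(is_primitive_row([4, 6, 9]))  # True
print(is_primitive_row([4, 6, 8]))  # False
```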

Probabilities and asymptotics
We use standard asymptotic notation, O(·), Ω(·) and ω(·), and all asymptotics refer to a (possibly implicit) integer variable n. For example, we may write $n^{O(1)}$ for an arbitrary polynomially bounded function of n, and $n^{-\omega(1)}$ for a negligible function. Other parameters defining the size of a problem instance are always assumed to be polynomial in n. So, if $A \in \mathbb{Z}^{k\times m}$ is a matrix with integer entries, the number of rows $k = n^{O(1)}$, the number of columns $m = n^{O(1)}$, and the bitsize $\max_{i,j} \log|a_{i,j}| = n^{O(1)}$ of the matrix entries are all assumed to be (at most) polynomial in n.
A probability ensemble is a sequence $\mathcal{A}_n$ of probability distributions over sets $A_n$, for $n \in \mathbb{N} = \{1, 2, \ldots\}$. We always assume that all elements of $A_n \subseteq \{0,1\}^{\ell(n)}$ can be represented by strings of some fixed length $\ell(n)$. We write $x \leftarrow \mathcal{A}$ for the operation of sampling an element x according to distribution $\mathcal{A}$, and $\Pr\{x \leftarrow \mathcal{A}\}$ for the probability of x under $\mathcal{A}$. The uniform distribution over a set A is denoted $\mathcal{U}(A)$.
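The statistical distance between two distributions $\mathcal{A}, \mathcal{A}'$ over a set A is $\Delta(\mathcal{A}, \mathcal{A}') = \frac{1}{2}\sum_{x\in A} |\Pr\{x \leftarrow \mathcal{A}\} - \Pr\{x \leftarrow \mathcal{A}'\}|$. Two distribution ensembles $\mathcal{A}_n, \mathcal{A}'_n$ are statistically close (written $\mathcal{A}_n \approx \mathcal{A}'_n$) if $\Delta(\mathcal{A}_n, \mathcal{A}'_n) = n^{-\omega(1)}$ is negligible. Two ensembles $\mathcal{A}_n, \mathcal{A}'_n$ are computationally indistinguishable if for any efficient (probabilistic polynomial-time computable) predicate P, $P(\mathcal{A}_n) \approx P(\mathcal{A}'_n)$. The gap $|\Pr\{P(\mathcal{A}_n)\} - \Pr\{P(\mathcal{A}'_n)\}|$ is called the advantage of P in distinguishing between the two distributions. An ensemble $\mathcal{A}_n$ over sets $A_n$ is pseudorandom if it is computationally indistinguishable from the uniform ensemble $\mathcal{U}(A_n)$. If $\mathcal{A}_n$ and $\mathcal{U}(A_n)$ are statistically close, then we say that $\mathcal{A}_n$ is almost uniform or statistically pseudorandom. We typically leave the parameter n implicit, and talk about individual distributions $\mathcal{A}$ over a single set A, but all asymptotic statements should be interpreted as referring to ensembles $\mathcal{A}_n$ parameterized by an integer n in some obvious way. For example, we may say that a distribution $\mathcal{A}$ over a set A is pseudorandom if no efficient algorithm can distinguish $\mathcal{A}$ from $\mathcal{U}(A)$ with better than negligible advantage. More precisely, an efficiently sampleable ensemble $\{\mathcal{A}_n\}_{n>0}$ over the sets $\{A_n\}_{n>0}$ is pseudorandom if any predicate P computable in probabilistic polynomial time $n^{O(1)}$ has at most negligible advantage $|\Pr\{P(\mathcal{A}_n)\} - \Pr\{P(\mathcal{U}(A_n))\}| = n^{-\omega(1)}$.

We write $\mathbb{Z}$ for the set of integers, and $\mathbb{Z}_q = \mathbb{Z}/(q\mathbb{Z})$ for the integers modulo q. We will need the following version of the leftover hash lemma, and a bound on the probability that a random vector is primitive modulo q.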
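For finite distributions, the statistical distance and the advantage of a fixed predicate can be computed exactly. The Python sketch below (hypothetical helper names; distributions given as dicts from outcomes to exact `Fraction` probabilities) compares the uniform distribution on $\mathbb{Z}_4$ with a distribution biased towards 0.

```python
from fractions import Fraction

def statistical_distance(p, q):
    """Delta(P, Q) = 1/2 * sum_x |P(x) - Q(x)| for finite distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2

def advantage(p, q, predicate):
    """|Pr{P(x) = 1 : x <- p} - Pr{P(x) = 1 : x <- q}| for a deterministic predicate P."""
    pp = sum(pr for x, pr in p.items() if predicate(x))
    pq = sum(pr for x, pr in q.items() if predicate(x))
    return abs(pp - pq)

# Uniform distribution on Z_4 versus a distribution biased towards 0.
uniform = {x: Fraction(1, 4) for x in range(4)}
biased = {0: Fraction(1, 2), 1: Fraction(1, 6), 2: Fraction(1, 6), 3: Fraction(1, 6)}

print(statistical_distance(uniform, biased))          # 1/4
print(advantage(uniform, biased, lambda x: x == 0))   # 1/4
```

The advantage of any deterministic predicate is at most the statistical distance; in this toy example the predicate `x == 0` happens to attain it.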
Proof.The probability that gcd(w, q) = 1 is at most where the summation is over all prime factors of q.We used the fact that all prime factors are at least p ≥ 2, and there are at most log 2 q of them.Better bounds are possible, but this crude estimate is more than enough for the purposes of this paper.

Gaussian distributions
Let $\rho(x) = \exp(-\pi x^2)$ be the Gaussian function with total mass $\int_{x\in\mathbb{R}} \rho(x)\,dx = 1$, and $\rho_\sigma(x) = \rho(x/\sigma)$ its scaling by a factor σ > 0. For a set A, we write $\rho_\sigma(A)$ as a shorthand for $\sum_{x\in A} \rho_\sigma(x)$. The discrete Gaussian distribution of parameter σ, denoted $D_\sigma$, picks each integer $x \in \mathbb{Z}$ with probability proportional to $\rho_\sigma(x)$, i.e., $\Pr\{x \leftarrow D_\sigma\} = \rho_\sigma(x)/\rho_\sigma(\mathbb{Z})$. A rank-n integer lattice is the set $\Lambda = B\mathbb{Z}^n = \{Bx : x \in \mathbb{Z}^n\}$ of all integer linear combinations of n linearly independent integer vectors $B = [b_1, \ldots, b_n]$. The last successive minimum of a rank-n lattice Λ is the smallest positive real $\lambda_n$ such that Λ contains n linearly independent vectors of length at most $\lambda_n$. Another standard quantity associated to a lattice is the smoothing parameter $\eta_\varepsilon(\Lambda)$, which is parameterized by a positive real ε > 0. In this paper, all we need to know about the smoothing parameter are the following two bounds.
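Lemma 2.4 (Smoothing Parameter Bound, [18, Lemma 3.3]). For any rank-n lattice Λ and positive real ε > 0, the smoothing parameter is at most $\eta_\varepsilon(\Lambda) \le \sqrt{\ln(2n(1 + 1/\varepsilon))/\pi} \cdot \lambda_n(\Lambda)$. In particular, for any $\omega(\sqrt{\log n})$ function, there is a negligible ε such that $\eta_\varepsilon(\Lambda) \le \omega(\sqrt{\log n}) \cdot \lambda_n(\Lambda)$.

When the smoothing parameter η(Λ) is written without specifying the value of ε, it is assumed that $\varepsilon = n^{-\omega(1)}$ is an arbitrary negligible function of the asymptotic variable n. For example, the smoothing parameter of the integer lattice is $\eta(\mathbb{Z}) \le \sqrt{\ln(2(1 + 1/\varepsilon))/\pi} = \omega(\sqrt{\log n})$. We will also need the following convolution theorems for discrete Gaussians.

Lemma 2.5 (Convolution, [17, Theorem 3]). For any primitive vector $v \in \mathbb{Z}^m$ and positive reals $\sigma_1, \ldots, \sigma_m \ge \sqrt{2}\,\|v\|_\infty\,\eta(\mathbb{Z})$, if $x_i \leftarrow D_{\sigma_i}$ are chosen independently, then $\sum_i v_i x_i$ is statistically close to $D_\sigma$, where $\sigma = \sqrt{\sum_i v_i^2 \sigma_i^2}$.

Lemma 2.6 (Gaussian Projection). For any primitive matrix $T \in \mathbb{Z}^{k\times m}$ and positive reals α, σ > 0 such that $TT^t = \alpha^2 I$ and $\sigma \ge \eta(\ker(T))$, the distribution $T(D_\sigma^m)$ is statistically close to $D_{\alpha\sigma}^k$.

Proof. Let $y \in \mathbb{Z}^k$ be an arbitrary integer vector, and let $x \in \mathbb{Z}^m$ be such that Tx = y (such an x exists because T is primitive). By linearity, any other $z \in \mathbb{Z}^m$ maps to Tz = y if and only if $z \in x + \ker(T)$. So, by definition, the probability of y under $T(D_\sigma^m)$ is proportional to $\rho_\sigma(x + \ker(T))$. Write $x = x_1 + x_0$, where $x_1 = T^t y/\alpha^2$ belongs to the linear span of the rows of T, and $x_0 = x - x_1$ is orthogonal to the rows of T (indeed, $Tx_0 = y - TT^t y/\alpha^2 = 0$). It follows that $x_1$ is orthogonal to $x_0$ and ker(T). Therefore, $\rho_\sigma(x + \ker(T)) = \rho_\sigma(x_1) \cdot \rho_\sigma(x_0 + \ker(T))$. Since $x_0$ belongs to the linear span of ker(T), and $\sigma \ge \eta(\ker(T))$, by Lemma 2.3 the Gaussian mass $\rho_\sigma(x_0 + \ker(T))$ is essentially independent of $x_0$, up to a negligible relative error. So, up to this error, the probability of y is proportional to $\rho_\sigma(x_1)$. Finally, we observe that $\|x_1\|^2 = y^t TT^t y/\alpha^4 = \|y\|^2/\alpha^2$, and therefore $\rho_\sigma(x_1) = \rho_\sigma(\|y\|/\alpha) = \rho_{\alpha\sigma}(y)$. This proves that $T(D_\sigma^m)$ is statistically close to the discrete Gaussian distribution $D_{\alpha\sigma}^k$.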
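The smoothing and convolution facts used above can be observed numerically in dimension one. The following Python sketch (hypothetical helper names; sums truncated at about 10σ, where the remaining mass is far below floating-point precision) evaluates the bound on $\eta_\varepsilon(\mathbb{Z})$ for ε = 2^{-40}, shows that $\rho_\sigma(\mathbb{Z} + c)$ barely depends on c above the smoothing parameter but strongly depends on it far below, and checks that the sum of independent samples from $D_6$ and $D_8$ is very close to $D_{10}$, as Lemma 2.5 predicts for v = (1, 1). Here σ = 6 and σ = 8 exceed $\sqrt{2}$ times the bound on $\eta_\varepsilon(\mathbb{Z})$ (about 4.25). This is a numerical illustration, not a proof.

```python
import math

def rho(x, sigma):
    """Gaussian function rho_sigma(x) = exp(-pi * x^2 / sigma^2)."""
    return math.exp(-math.pi * x * x / (sigma * sigma))

def tail_for(sigma):
    # Points with |x| > 10*sigma carry mass below exp(-100*pi): negligible in floats.
    return int(10 * sigma) + 10

def coset_mass(sigma, c):
    """rho_sigma(Z + c), truncated to |x| <= tail_for(sigma)."""
    t = tail_for(sigma)
    return sum(rho(x + c, sigma) for x in range(-t, t + 1))

def dgauss_pmf(sigma):
    """Probability mass function of D_sigma over Z (truncated), as a dict."""
    t = tail_for(sigma)
    w = {x: rho(x, sigma) for x in range(-t, t + 1)}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}

def convolve(p, q):
    """Distribution of x + y for independent x <- p, y <- q."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out

def stat_dist(p, q):
    keys = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys) / 2

def eta_Z_bound(eps):
    """Bound sqrt(ln(2(1 + 1/eps)) / pi) on the smoothing parameter eta_eps(Z)."""
    return math.sqrt(math.log(2 * (1 + 1 / eps)) / math.pi)

print(round(eta_Z_bound(2.0 ** -40), 3))            # 3.008

# Lemma 2.3: above the smoothing parameter, rho_sigma(Z + c) hardly depends on c.
print(coset_mass(4.0, 0.5) / coset_mass(4.0, 0.0))  # ~1.0
# Far below it, the dependence on c is large.
print(coset_mass(0.5, 0.5) / coset_mass(0.5, 0.0))  # ~0.086

# Lemma 2.5 with v = (1, 1): D_6 + D_8 is close to D_10, since sqrt(6^2 + 8^2) = 10.
print(stat_dist(convolve(dgauss_pmf(6.0), dgauss_pmf(8.0)), dgauss_pmf(10.0)))
```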

A gadget matrix construction
Our main proof requires an integer matrix satisfying some special properties. In the following lemma, we state the required properties and give a simple construction. We recall that the notation $Q_{[n]}$ (resp. $Q_{]n[}$) stands for the matrix obtained by taking (resp. dropping) the first n columns of a matrix Q. In particular, $Q_{]1[}$ is the matrix Q without its first column.

Lemma 2.7.
There is an efficiently computable matrix

Proof. Define the matrix

The idea is to start with the square matrix which is unitriangular (i.e., triangular, with unit elements along the diagonal, and, therefore, invertible), and it satisfies $u^t Q = e_1^t$. We would like to use Lemma 2.6 to analyze the . However, X is not primitive and does not satisfy the property $XX^t = \alpha^2 I$ required by Lemma 2.6, because adjacent rows of X have scalar product −1. Other pairs of rows are orthogonal, so $XX^t$ is tridiagonal (i.e., with nonzero entries only on or immediately next to the main diagonal), but not diagonal. To fix this, we extend Q to where adjacent rows have scalar product 1, and cancel out with X.

THEORY OF COMPUTING, Volume 14 (13), 2018, pp. 1-17
where the position and sign of the new columns have been chosen to highlight the (square) unitriangular blocks $\tilde X = [X, -e_n]$, $\tilde Y = [Y, e_n] \in \mathbb{Z}^{n\times n}$. Notice that $\tilde Y = \tilde X + 2I$, and therefore the two blocks commute, i.e., $\tilde X\tilde Y = \tilde Y\tilde X$.
We already know that and it is immediate to verify that the vector ), where This matrix is primitive because it starts with a unitriangular block, and it satisfies $TT^t = 4I$ by construction. In order to apply Lemma 2.6, and conclude that $T(D_\sigma^{2n+2}) \approx D_{2\sigma}^n$, we only need to bound the smoothing parameter of $\Lambda = \ker(T)$. This lattice is defined by a system Tx = 0 of n linearly independent equations in 2n + 2 variables. So, Λ is a rank-(n + 2) lattice. Moreover, it contains n + 2 vectors of length at most 2, given by the columns of the matrix The columns are linearly independent because the matrix satisfies WV = 2I. So V has rank n + 2. This proves that $\lambda_{n+2}(\Lambda) \le 2$, and therefore, by Lemma 2.4, $\eta(\Lambda) \le \omega(\sqrt{\log n})$.

Computational problems and LWE
All computational problems considered in this paper are decision problems about pseudorandom distributions.Specifically, for any distribution ensemble A n over sets A n , the A n -assumption is the assumption that A n is pseudorandom, and the A n -problem is the computational problem of distinguishing A n from the uniform distribution U(A n ) with non-negligible advantage.So, all problems will be implicitly specified simply by defining an appropriate set of distributions A n .
A reduction between (the decision problems associated to) two distributions $\mathcal{A}_n$ and $\mathcal{A}'_n$ over sets $A_n$ and $A'_n$ (from $\mathcal{A}_n$ to $\mathcal{A}'_n$) is an efficient (probabilistic polynomial-time) algorithm that solves problem $\mathcal{A}_n$ (i.e., distinguishes $\mathcal{A}_n$ from the uniform distribution with non-negligible advantage) given access to any oracle that solves $\mathcal{A}'_n$ with (possibly different, but still) non-negligible advantage. In the simplest settings (e.g., see Lemmas 2.12 and 2.13) a reduction may be specified just by an efficient (probabilistic polynomial-time computable) function ϕ such that $\varphi(\mathcal{A}_n) \approx \mathcal{A}'_n$ and $\varphi(\mathcal{U}(A_n)) \approx \mathcal{U}(A'_n)$. Most of our reductions are more complex, and make use of hybrid arguments (see Lemma 2.9) that require oracle calls on distributions other than $\mathcal{A}'_n$ or $\mathcal{U}(A'_n)$.
In this paper, it is convenient to consider a version of the Learning With Errors (LWE) problem where the secret is a matrix S, rather than a vector, defined as follows.
Definition 2.8.For any positive integers q, n, k, m and real σ , let LWE(q, n × k, m, σ ) be the LWE distribution with modulus q, number of samples m, secret dimension n × k, and error parameter σ , i. e., the distribution of obtained by picking A ← U(Z m×n q ) and S ← U(Z n×k q ) uniformly at random, and E ← D m×k σ with discrete Gaussian distribution.
When k = 1, the secret is just a vector $s \in \mathbb{Z}_q^n$, and this is the standard version of LWE, which we write LWE(q, n, m, σ) instead of LWE(q, n × 1, m, σ). The m rows of the LWE matrix $[A, AS + E]$ can be viewed as random noisy labeled samples from a hard-to-learn linear function defined by the secret S. Worst-case to average-case reductions [21,19,6,20] support the conjecture that the LWE problem is hard for an arbitrary (polynomially bounded) number of samples $m = n^{O(1)}$, and some reductions require this extra flexibility. (E.g., the LWE search-to-decision reduction in [21], but see also [16] for a sample-preserving reduction.) This version of the problem is denoted LWE(q, n, σ). The modulus q is always assumed to have bit-size polynomial in n (i.e., $\log_2 q \le n^{O(1)}$), but in most cryptographic applications it is just a small polynomial (e.g., $q \le n^2$), and integers modulo q are represented with O(log n) bits.
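For concreteness, here is a minimal Python sketch of sampling from LWE(q, n × k, m, σ) as in Definition 2.8. The naive rejection sampler for $D_\sigma$ (truncated at about 10σ) and all function names are illustrative assumptions, not part of the paper; with the normalization $\rho_\sigma(x) = \exp(-\pi x^2/\sigma^2)$, the standard deviation of $D_\sigma$ is roughly $\sigma/\sqrt{2\pi}$. The sampler uses Python's `random` module and is not meant for real cryptographic use.

```python
import math
import random

def sample_dgauss(sigma, rng):
    """Naive rejection sampler for D_sigma over Z, truncated to |x| <= 10*sigma + 1.
    Illustration only: not constant-time and not suitable for real cryptography."""
    tail = int(10 * sigma) + 1
    while True:
        x = rng.randint(-tail, tail)
        if rng.random() < math.exp(-math.pi * x * x / (sigma * sigma)):
            return x

def matmul_mod(X, Y, q):
    """X * Y mod q for matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) % q for col in zip(*Y)] for row in X]

def lwe_sample(q, n, k, m, sigma, rng):
    """Return (A, B) with A <- U(Z_q^{m x n}) and B = AS + E mod q, where
    S <- U(Z_q^{n x k}) and E <- D_sigma^{m x k}; S and E are also returned."""
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    S = [[rng.randrange(q) for _ in range(k)] for _ in range(n)]
    E = [[sample_dgauss(sigma, rng) for _ in range(k)] for _ in range(m)]
    AS = matmul_mod(A, S, q)
    B = [[(AS[i][j] + E[i][j]) % q for j in range(k)] for i in range(m)]
    return A, B, S, E

rng = random.Random(2018)
A, B, S, E = lwe_sample(q=97, n=8, k=2, m=16, sigma=3.0, rng=rng)
```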
The vector and matrix variants of LWE are easily seen to be equivalent via a standard hybrid argument.
Proof.The intuition behind the proof is that the LWE distribution with secret matrix S ∈ Z n×k q may be regarded as k copies of the standard LWE distribution with secret vectors given by the columns of S, all using the same public random A. More technically, the reduction considers the sequence of hybrid distributions ), for i = 0, . . ., k.Each pair of neighboring hybrids A i , A i+1 can be generated by starting from an LWE(q, n, m, σ ) challenge sample ).The resulting distribution equals A = A i if b is random, and A = A i+1 if b = As + e is pseudorandom.So, any distinguisher with advantage ε against LWE(q, n × k, m, σ ) will achieve advantage ε/k against LWE(q, n, m, σ ).
We remark that $\mathrm{LWE}_{0,1}$ and $\mathrm{LWE}_\pm$ could also be generalized to secret matrices S, and proved equivalent to the single-vector version exactly as in Lemma 2.9. But this is not used in this paper, so, for simplicity, we only define the secret-vector version of the problems. The next two lemmas show that $\mathrm{LWE}_{0,1}$ and $\mathrm{LWE}_\pm$ are essentially the same problem. We remark that the lemmas are even more general than stated, and they apply to LWE problems with arbitrary error distribution, not just discrete Gaussians. All parameters (including the number of samples, and the exact error distribution) are preserved by the reductions, showing that the two problems are equivalent in a very strong sense.

Lemma 2.12. For any odd integer q, there is a polynomial-time reduction from the $\mathrm{LWE}_{0,1}(q, n, m, \sigma)$ problem to the $\mathrm{LWE}_\pm(q, n, m, \sigma)$ problem.
Proof.On input an LWE 0,1 instance (A, b), the reduction outputs where A/2 is computed modulo q, and u = (1, . . ., 1) ∈ Z n q .Notice that, since q is odd, the factor 2 is invertible modulo q, and A/2 is uniformly distributed.If b is uniform, then b is also uniform.On the other hand, if b = As + e, then b = (A/2)s + e where s = 2s − u is uniformly random in {±1} n .Lemma 2.13.For any odd integer q, there is a polynomial-time reduction from the LWE ± (q, n, m, σ ) problem to the LWE 0,1 (q, n, m, σ ) problem.
Proof.On input an LWE ± instance (A, b), the reduction outputs ϕ(A, b) = (2A, b = b + Au) where u ∈ {1} n is the all-ones vector.Notice that, since q is odd, the factor 2 is invertible modulo q, and 2A is uniformly distributed.If b is uniform, then b is also uniform.On the other hand, if b = As + e, then b = (2A)s + e where s = (s + u)/2 is uniformly random in {0, 1} n .

Pseudorandomness of binary LWE
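In this section we present a proof that the binary-secret LWE distribution $\mathrm{LWE}_\pm$ is pseudorandom. The idea is to define a simple (efficiently computable) randomized transformation ϕ with the following properties:
• If the input to ϕ is uniformly distributed, then the output $\varphi(\mathcal{U})$ equals (or is statistically close to) the binary LWE distribution $\mathrm{LWE}_\pm(q, n, m, \sigma')$ for some σ'.
• There are two pseudorandom distributions $\mathcal{B}, \mathcal{B}'$ such that $\varphi(\mathcal{B})$ equals (or is statistically close to) $\mathcal{B}'$.
Since ϕ is efficiently computable, the pseudorandomness of $\mathcal{B}$ implies that $\varphi(\mathcal{U}) \approx \mathrm{LWE}_\pm(q, n, m, \sigma')$ is computationally indistinguishable from $\varphi(\mathcal{B}) \approx \mathcal{B}'$. By transitivity, since $\mathcal{B}'$ is pseudorandom, it follows that $\mathrm{LWE}_\pm(q, n, m, \sigma')$ is also pseudorandom. As our aim is to give a reduction from the standard LWE problem to binary LWE, we set $\mathcal{B}$ and $\mathcal{B}'$ to two pseudorandom distributions related to LWE. Specifically, we use the distributions
$\mathcal{B} = \{(AS + E)^t : A \leftarrow \mathcal{U}(\mathbb{Z}_q^{(n-1)\times k}),\ S \leftarrow \mathcal{U}(\mathbb{Z}_q^{k\times m}),\ E \leftarrow D_\sigma^{(n-1)\times m}\}$,
$\mathcal{B}' = \{(\hat A\hat S + \hat E)^t : \hat A \leftarrow \mathcal{U}(\mathbb{Z}_q^{(n+1)\times(k+1)}),\ \hat S \leftarrow \mathcal{U}(\mathbb{Z}_q^{(k+1)\times m}),\ \hat E \leftarrow D_{2\sigma}^{(n+1)\times m}\}$
for some σ related to σ'. In other words, $\mathcal{B}$ and $\mathcal{B}'$ are the (transposed) "label" components of the LWE distributions LWE(q, k × m, n − 1, σ) and LWE(q, (k + 1) × m, n + 1, 2σ). Notice that any distinguisher between $\mathcal{B}$ and the uniform distribution can be immediately transformed into an LWE distinguisher that on input (A, B = AS + E) simply discards A, and then runs the original distinguishing procedure on $B^t$. So, $\mathcal{B}$ is pseudorandom under the standard LWE assumption, and similarly for $\mathcal{B}'$.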
Before getting into the details of the transformation, notice the difference between the high level structure of the proof presented here, and a typical reduction between variants of LWE.A typical reduction would map standard LWE samples to binary LWE samples, and uniform samples to uniform samples.Here, instead, on the one hand the standard LWE distribution is mapped again to a standard LWE distribution (with slightly different parameters).On the other hand, the uniform distribution is mapped to binary LWE.
Our randomized transformation ϕ is shown in Figure 1. The transformation uses, as randomness, both a uniform secret vector s and a binary secret vector z. Informally, the intuition is that by simultaneously multiplying by s (on the left) and by z (on the right), the same transformation is able to produce (depending on how the input B was chosen) either
• a binary LWE distribution with secret z (when B is uniform), or
• a (transposed) standard LWE distribution with secret $[s, S^t]^t$ (when $B = (AS + E)^t$).
Intuitively, one may think of ϕ as mapping B to [B, Bz + e]. So, when B is uniformly random, ϕ outputs the binary LWE distribution by construction. On the other hand, if $B = (AS + E)^t$, then
$[B, Bz + e] = S^t [A^t, A^t z] + [E^t, E^t z + e]$,
which looks like a standard (transposed) LWE label matrix. In fact, by the Leftover Hash Lemma, one may argue that $[A^t, A^t z]$ is statistically close to a uniformly distributed matrix. Unfortunately, the error matrix $[E^t, E^t z + e]$ does not follow the Gaussian distribution required by LWE. So, in order to address this and other technical difficulties, the actual transformation ϕ is a bit more complex. The details of the transformation are somewhat technical, and they are primarily motivated by all the cancellations needed for the proof to work and obtain the proper LWE Gaussian error distribution.

Figure 1: Transformation ϕ(B; z, s, a, e, G) proving the pseudorandomness of binary LWE, where Q is the matrix specified in Lemma 2.7.

One way to gain additional insight into the construction is to notice that the transformation ϕ(B) always outputs a pair [X, x] such that Xz = s + Gv ≈ s + e = x. (See the proof of Claim 3.2 for details.) So, the distribution $\mathcal{B}' = \varphi(\mathcal{B})$ will also satisfy this property with high probability: there must be a small vector $\hat z \in \{\pm1\}^{n+1}$ such that $(\hat A\hat S + \hat E)^t \hat z \approx 0$. This shows that the pseudorandom matrix $B' = (\hat A\hat S + \hat E)^t$ is already, in some sense, close to a binary LWE instance, because there is a ±1 combination of the first n columns of B' that is close to the last column. In fact, something very similar can be proved directly, without using ϕ: the matrix $\hat A^t$ maps the set $\{0,1\}^{n+1}$ of size $2^{n+1} > q^{k+1}$ to the set $\mathbb{Z}_q^{k+1}$ of size $q^{k+1}$. So, by the pigeonhole principle, there exist two distinct binary inputs such that $\hat A^t \hat z_0 = \hat A^t \hat z_1$, or, equivalently, a small nonzero vector $\hat z = \hat z_0 - \hat z_1$ (with $\|\hat z\|_\infty = 1$) such that $B'\hat z = \hat E^t \hat z \approx 0$. An informal interpretation of this argument (which, in fact, is closely related to the proof that LWE is robust with respect to the secret distribution [11]) is that the matrix $\hat A^t$ hashes the binary secret $\hat z$ to an almost uniform (smaller-dimensional) secret $\hat A^t \hat z$ with entries in $\mathbb{Z}_q$.
But, as before, the problem with this intuitive approach is that the error distribution Êt ẑ is not Gaussian, and it is correlated with the secret ẑ.
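The pigeonhole argument can be run on toy parameters. The Python sketch below (hypothetical names; q = 5, k = 1, n = 5, so that $2^{n+1} = 64 > q^{k+1} = 25$; naive illustration-only Gaussian sampler) finds a collision $\hat A^t \hat z_0 = \hat A^t \hat z_1$ among binary vectors and checks that $B'\hat z \equiv \hat E^t \hat z \pmod q$: the secret part cancels, and what remains is exactly the non-Gaussian combination $\hat E^t \hat z$ of the errors. With such tiny parameters, $\hat E^t \hat z$ need not be small relative to q; only the cancellation is being illustrated.

```python
import itertools
import math
import random

def sample_dgauss(sigma, rng):
    """Naive rejection sampler for D_sigma over Z (illustration only)."""
    tail = int(10 * sigma) + 1
    while True:
        x = rng.randint(-tail, tail)
        if rng.random() < math.exp(-math.pi * x * x / (sigma * sigma)):
            return x

def centered(x, q):
    """Representative of x mod q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def pigeonhole_vector(At, q):
    """Return a nonzero z_hat = z0 - z1 in {-1, 0, 1}^N with At z_hat = 0 mod q,
    found as a collision At z0 = At z1 between distinct binary vectors.
    A collision is guaranteed when 2^N > q^(number of rows of At)."""
    seen = {}
    for z in itertools.product((0, 1), repeat=len(At[0])):
        image = tuple(sum(a * zi for a, zi in zip(row, z)) % q for row in At)
        if image in seen:
            return [x - y for x, y in zip(z, seen[image])]
        seen[image] = z
    return None

rng = random.Random(3)
q, k, n, m, sigma = 5, 1, 5, 4, 1.0          # 2^(n+1) = 64 > q^(k+1) = 25
Ahat = [[rng.randrange(q) for _ in range(k + 1)] for _ in range(n + 1)]
Shat = [[rng.randrange(q) for _ in range(m)] for _ in range(k + 1)]
Ehat = [[sample_dgauss(sigma, rng) for _ in range(m)] for _ in range(n + 1)]
# B' = (Ahat Shat + Ehat)^t is an m x (n+1) matrix over Z_q.
Bp = [[(sum(Ahat[r][l] * Shat[l][c] for l in range(k + 1)) + Ehat[r][c]) % q
       for r in range(n + 1)] for c in range(m)]
At = [list(col) for col in zip(*Ahat)]       # Ahat^t, a (k+1) x (n+1) matrix
zhat = pigeonhole_vector(At, q)
# B' z_hat equals Ehat^t z_hat mod q: the secret part cancels, the error remains.
lhs = [centered(sum(Bp[c][r] * zhat[r] for r in range(n + 1)), q) for c in range(m)]
rhs = [centered(sum(Ehat[r][c] * zhat[r] for r in range(n + 1)), q) for c in range(m)]
print(zhat, lhs == rhs)
```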
Our theorem below solves these technical problems using a carefully designed gadget matrix Q (described in Lemma 2.7), which efficiently adjusts the error distribution using some extra randomness G. Notice how, in the process of transforming LWE into binary LWE, the number of samples n − 1 in the presumed hard LWE(q, k × m, n − 1, σ) instance becomes the size n of the secret in the final binary LWE instance $\mathrm{LWE}_\pm(q, n, m, \sigma')$. Similarly, the number of columns m (i.e., the number of parallel LWE instances) in the presumed hard LWE(q, (k + 1) × m, n + 1, 2σ) instance becomes the number of samples in the final binary LWE instance.
Proof.We use Z as a shorthand for the diagonal matrix diag(z).We first show that the transformation ϕ maps the uniform distribution to the binary LWE distribution.
If $B \in \mathbb{Z}_q^{m\times(n-1)}$ is chosen uniformly at random, then $\varphi(B) \in \mathbb{Z}_q^{m\times n} \times \mathbb{Z}_q^m$ is statistically close to the $\mathrm{LWE}_\pm(q, n, m, \sigma')$ distribution.
Proof.Notice that, under the assumptions in the corollary statement, n ≥ (k + 1) log 2 q + k ≥ (k + 1) log 2 q + ω(log n) as required by Theorem 3.1.In order to invoke the theorem, we also need to verify the pseudorandomness conditions.Assume LWE(q, k, n + 1, σ ) is pseudorandom.Dropping the last two rows from the samples [A, b] ← LWE(q, k, n + 1, σ ) shows that LWE(q, k, n − 1, σ ) is also pseudorandom.The samples [A, b] can also be mapped to LWE(q, k + 1, n + 1, 2σ ) by performing the following two operations: • Add an extra Gaussian error term e ← D n+1 √ 3σ to b.By Lemma 2.5, this has the effect of increasing the error rate to • Append an extra column a to A and add a random multiple a • s to b.This has the effect of extending the secret with an extra coordinate s.
Notice how Corollary 3.4 establishes the pseudorandomness of $\mathrm{LWE}_\pm$ for any polynomial number of samples $m = n^{O(1)}$, using, as an assumption, only the pseudorandomness of LWE for a fixed number ($n + 1 \approx k \log q$) of samples. (This property is also implicit in [6].) We remark that we phrased Theorem 3.1 and Corollary 3.4 asymptotically (in terms of polynomial-time distinguishers achieving at most negligible advantage $\varepsilon = n^{-\omega(1)}$) only for simplicity. All statements and proofs are easily adapted to other settings, e.g., to prove hardness of binary LWE against adversaries running in subexponential time.

Conclusion
We presented a simple proof that the LWE problem with binary secret of size $n = O(k \log_2 q)$ is as hard as LWE with uniformly random secret in $\mathbb{Z}_q^k$. More specifically, if LWE with secrets in $\mathbb{Z}_q^k$ and $n \approx k \log q$ samples is pseudorandom, then LWE with secrets in $\{0,1\}^n$ or $\{\pm1\}^n$ (and an arbitrary polynomial number of samples $n^{O(1)}$) is also pseudorandom. As already observed in [6], the growth in the dimension of the secret is seemingly optimal, because it approximately preserves the bit-size of the secret, and the cost of a brute-force attack. Starting from LWE with a fixed number of samples $m \approx k \log q = O(k \log k)$ (for typical modulus $q = k^{O(1)}$ polynomial in the LWE secret dimension k) is potentially useful for cryptanalysis, as it allows one to generate and publish fixed-size random challenges for any value of k. (By contrast, the general LWE problem would require giving the adversary access to an LWE sampling oracle that can be called an arbitrary number of times.) An interesting question is whether a reduction can be given starting from LWE with an even smaller number of samples, e.g., m = O(k) linear in the secret dimension.
An important open problem is whether similar results can be proved for the structured variants of LWE based on algebraic lattices [15,14].The use of structured lattices is of primary importance to make lattice cryptography efficient in practice, and the use of LWE with binary secrets plays an important role in some applications, like Fully Homomorphic Encryption schemes [9,8], to control the noise growth when computing on encrypted data.We remark that the use of binary secrets and errors does not seem to pose any difficulty in the setting of one-way hash functions based on structured lattices [15].However, for LWE [21,14], it is unclear how to adapt the proof in this paper to the algebraic lattice setting.We hope our simple proof for unstructured lattices will bring more attention to this problem, and serve as a possible starting point to establish similar results for ring LWE.

Lemma 2.3 (See [18, Lemma 4.1] and [10, Lemma 2.4]). For any lattice Λ, ε ∈ (0, 1), and vector c in the linear span of Λ, if $\sigma \ge \eta_\varepsilon(\Lambda)$, then $\rho_\sigma(\Lambda + c) \in \left[\frac{1-\varepsilon}{1+\varepsilon}, 1\right] \cdot \rho_\sigma(\Lambda)$.

(A, b), and then computing $\mathcal{A}' = (A, [AS + E, b, B])$, where $S \leftarrow \mathcal{U}(\mathbb{Z}_q^{n\times i})$, $E \leftarrow D_\sigma^{m\times i}$ and $B \leftarrow \mathcal{U}(\mathbb{Z}_q^{m\times(k-i-1)})$

has pairwise orthogonal rows, but the first and last rows have a different norm than the rest. So, $[X, Y][X, Y]^t$ is diagonal, but it is still not a scalar matrix $\alpha^2 I$. We complete the construction by adding 4 more columns to make each row of $Q_{]1[}$ contain precisely 4 nonzero ±1 entries. Our final construction is