Decoding Reed-Muller codes over product sets

We give a polynomial time algorithm to decode multivariate polynomial codes of degree $d$ up to half their minimum distance, when the evaluation points are an arbitrary product set $S^m$, for every $d<|S|$. Previously known algorithms can achieve this only if the set $S$ has some very special algebraic structure, or if the degree $d$ is significantly smaller than $|S|$. We also give a near-linear time randomized algorithm, which is based on tools from list-decoding, to decode these codes from nearly half their minimum distance, provided $d<(1-\epsilon)|S|$ for constant $\epsilon>0$. Our result gives an $m$-dimensional generalization of the well-known decoding algorithms for Reed-Solomon codes, and can be viewed as giving an algorithmic version of the Schwartz-Zippel lemma.

For error-correcting purposes, if we are given a "received word" $r : S^m \to \mathbb{F}_q$ such that there exists a polynomial $P$ of degree at most $d$ with $\Delta(r, P) < \delta_C/2$, then we know that there is a unique such $P$. The problem that we consider in this paper, "decoding $C$ up to half its minimum distance", is the algorithmic task of finding this $P$.
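To make the problem concrete, here is a toy brute-force reference decoder in Python (our own illustration, not from the paper: the parameters, names, and exhaustive search are purely for exposition, and the search is feasible only for tiny parameters):

```python
import itertools

# Toy parameters: field F_p, evaluation set S, m = 2 variables, degree bound d.
p, S, d = 5, [0, 1, 2, 3], 1  # n = |S| = 4; minimum distance n^2 (1 - d/n) = 12

def evaluate(coeffs, x, y):
    # coeffs[(i, j)] is the coefficient of X^i Y^j, with total degree i + j <= d.
    return sum(c * pow(x, i, p) * pow(y, j, p) for (i, j), c in coeffs.items()) % p

def distance(r, coeffs):
    # Hamming distance between a received word r : S^2 -> F_p and a codeword.
    return sum(r[(x, y)] != evaluate(coeffs, x, y) for x in S for y in S)

def brute_force_decode(r):
    # Exponential-time reference decoder: return the unique codeword (if any)
    # within half the minimum distance of r; unique by the Schwartz-Zippel bound.
    n = len(S)
    monomials = [(i, j) for i in range(d + 1) for j in range(d + 1 - i)]
    for values in itertools.product(range(p), repeat=len(monomials)):
        coeffs = dict(zip(monomials, values))
        if 2 * distance(r, coeffs) < n * (n - d):
            return coeffs
    return None
```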

Our Results
There is a rich history with several deep algebraic ideas surrounding the problem of decoding multivariate polynomial codes. We first state our main results, and then discuss their relationship to the various other known results.

Theorem 1.2 (Efficient decoding of multivariate polynomial codes up to half their minimum distance). Let $\mathbb{F}$ be a finite field, let $S$, $d$, $m$ be as above, and let $\delta_C = 1 - \frac{d}{|S|}$. There is an algorithm which, when given as input a function $r : S^m \to \mathbb{F}$, runs in time $\mathrm{poly}(|S|^m, \log |\mathbb{F}|)$ and finds the polynomial $P(X_1, \ldots, X_m) \in \mathbb{F}[X_1, \ldots, X_m]$ of degree at most $d$ (if any) such that $\Delta(r, P) < \delta_C/2$.
As we will discuss below, previously known efficient decoding algorithms for these codes either worked only for (1) very algebraically special sets $S$, or (2) very low degrees $d$, or (3) decoded from a much smaller fraction of errors ($\approx \frac{1}{m+1}\delta_C$ instead of $\frac{1}{2}\delta_C$). Using several further ideas, we also show how to implement the above algorithm in near-linear time to decode up to almost half the minimum distance, provided $d$ is not $(1 - o(1))|S|$.

Theorem 1.3 (Near-linear time decoding). Let $\mathbb{F}$ be a finite field, let $S$, $d$, $m$ be as above, and let $\delta_C = 1 - \frac{d}{|S|}$. Assume $\delta_C > 0$ is a constant. There is a randomized algorithm which, when given as input a function $r : S^m \to \mathbb{F}$, runs in time $|S|^m \cdot \mathrm{poly}(\log |S|^m, \log |\mathbb{F}|)$ and finds the polynomial $P(X_1, \ldots, X_m) \in \mathbb{F}[X_1, \ldots, X_m]$ of degree at most $d$ (if any) with $\Delta(r, P) < (1 - o(1)) \cdot \delta_C/2$.
Over the rational numbers, we get a version of Theorem 1.2 where the running time is $\mathrm{poly}(|S|^m, t)$, where $t$ is the maximum bit-complexity of any point in $S$ or in the image of $r$. This enables us to decode multivariate polynomial codes up to half the minimum distance in the natural special case where the evaluation set $S$ equals $\{1, 2, \ldots, n\}$.
We also mention that decoding Reed-Muller codes over an arbitrary product set S m appears as a subroutine in the local decoding algorithm for multiplicity codes [17] (see Section 4 on "Solving the noisy system"). Our results allow the local decoding algorithms there to run efficiently over all fields ([17] could only do this over fields of small characteristic, where algebraically special sets S are available).

Related work
There have been many works studying the decoding of multivariate polynomial codes, which prove (and improve) various special cases of our main theorem.
Reed-Solomon codes (m = 1): When m = 1, our problem is also known as the problem of decoding Reed-Solomon codes upto half their minimum distance. That this problem can be solved efficiently is very classical, and a number of algorithms are known for this (Mattson-Solomon [5], Berlekamp-Massey [4], Berlekamp-Welch [12]). The underlying algorithmic ideas have subsequently had a tremendous impact on algebraic algorithms.
For Reed-Solomon codes, it is in fact known how to list-decode beyond half the minimum distance, up to the Johnson bound (Guruswami-Sudan [6]). This has had numerous further applications in coding theory, complexity theory, and pseudorandomness.
Special sets S: For very special sets S, it turns out that there are some algebraic ways to reduce the decoding of multivariate polynomial codes over S m to the decoding of univariate polynomial codes. This kind of reduction is possible when S equals the whole field F, or more generally when S equals an affine subspace over the prime subfield of F.
When $S = \mathbb{F}_q$, then $S^m = \mathbb{F}_q^m$, and $S^m$ can then be identified with the large field $\mathbb{F}_{q^m}$ in a natural $\mathbb{F}_q$-linear way (this understanding of Reed-Muller codes was discovered by [8]). This converts the multivariate setting into the univariate setting, identifies the multivariate polynomial code as a subcode of a univariate polynomial code, and (somewhat miraculously) the minimum distance of the univariate polynomial code equals the minimum distance of the multivariate polynomial code. Thus the classical Reed-Solomon decoding algorithms can then be used, and this leads to an algorithm for decoding the multivariate code up to half its minimum distance. In fact, Pellikaan-Wu [7] observed that this connection allows one to decode multivariate polynomial codes beyond half the minimum distance too, provided $S$ is special in the above sense.
Another approach which works in the case of $S = \mathbb{F}_q$ is based on local decoding. Here we use the fact that $S^m = \mathbb{F}_q^m$ contains many lines (not just the axis-parallel ones), and then use the univariate decoding algorithms to decode on those lines from a $(1 - \frac{d}{q})/2$ fraction of errors. This approach manages to decode multivariate polynomial codes with $S = \mathbb{F}_q$ from a $(\frac{1}{2} - o(1))$ fraction of the minimum distance. Again, this approach does not work for general $S$, since a general $S^m$ usually contains only axis-parallel lines (while $\mathbb{F}_q^m$ has many more lines).
Low degree d: When the degree d of the multivariate polynomial code is significantly smaller than |S|, then a number of other list-decoding based methods come into play.
The powerful Reed-Muller list-decoding algorithm of Sudan [9] and its multiplicity-based generalization, based on $(m+1)$-variate interpolation and root-finding, can decode from a $1 - \left(\frac{d}{|S|}\right)^{\frac{1}{m+1}}$ fraction of errors. With small degree $d = o(|S|)$ and $m = O(1)$, this decoding radius equals $1 - o(1)$! However, when $d$ is much larger (say $0.9 \cdot |S|$), the fraction of errors decodable by this algorithm is only around $\frac{1}{m+1}\delta_C$.

Another approach comes from the list-decoding of tensor codes [10]. While the multivariate polynomial codes we are interested in are not tensor codes, they are subcodes of the code of polynomials with individual degree at most $d$. Using the algorithm of [10] for decoding tensor codes, we get an algorithm that can decode from a $1 - o(1)$ fraction of errors when $d = o(|S|)$, but which fails to correct even a constant fraction of the minimum distance as $d$ approaches $|S|$.
In light of all the above, to the best of our knowledge, for multivariate polynomial codes with $d > 0.9 \cdot |S|$ (i.e., $\delta_C < 0.1$), and $S$ generic, the largest fraction of errors which could be corrected efficiently was about $\frac{1}{m+1}\delta_C$. In particular, the correctable fraction of errors is a vanishing fraction of the minimum distance, as the number of variables $m$ grows.
We thus believe it is worthwhile to investigate this problem, not only because of its basic nature, but also because of the many different powerful algebraic ideas that only give partial results towards it.

Overview of the decoding algorithm
We now give a brief overview of our decoding algorithms. Let us first discuss the bivariate ($m = 2$) case. Here we are given a received word $r : S^2 \to \mathbb{F}$ such that there is a polynomial $P(X, Y)$ of degree at most $d$ with $\Delta(r, P) < \delta_C/2$; our goal is to find $P(X, Y)$.

First, some high-level strategy. An important role in our algorithm is played by the following observation: the restriction of a degree-$\le d$ bivariate polynomial $P(X, Y)$ to a vertical line (fixing $X = \alpha$) or a horizontal line (fixing $Y = \beta$) gives a degree-$\le d$ univariate polynomial. Perhaps an even more important role is played by the following disclaimer: the previous observation does not characterize bivariate polynomials of degree $d$! The set of functions $f : S^2 \to \mathbb{F}$ for which the horizontal restrictions and vertical restrictions are all polynomials of degree $\le d$ is the code of polynomials with individual degree at most $d$ (this is the tensor Reed-Solomon code, which has much smaller distance than the Reed-Muller code); for example, $f(X, Y) = X^d Y^d$ has all its restrictions of degree $\le d$, but has total degree $2d$. For such a function $f$ to be in the Reed-Muller code, the different univariate polynomials that appear as horizontal and vertical restrictions must be related in some way. The crux of our algorithm is to exploit these relations.
It will also help to recap the standard algorithm for decoding tensor Reed-Solomon codes up to half their minimum distance (this scheme actually works for general tensor codes). Suppose we are given a received word $r : S^2 \to \mathbb{F}$, and we want to find a polynomial $P(X, Y)$ with individual degrees at most $d$ which is close to $r$. One first decodes each column of $r$ to the nearest univariate polynomial of degree $\le d$. One then takes the rows of this new received word (after having corrected the columns) and decodes them to the nearest degree-$\le d$ polynomial. The key point is to pass some "soft information" from the column decodings to the row decodings; the columns which were decoded from more errors are treated with lower confidence. This decodes the tensor Reed-Solomon code from a $\frac{1}{2}$-the-minimum-distance fraction of errors. Several ingredients from this algorithm will appear in our Reed-Muller decoding algorithm.

Now we return to the problem of decoding Reed-Muller codes. Let us write $P(X, Y)$ as a polynomial in $Y$:
\[ P(X, Y) = \sum_{i=0}^{d} P_i(X)\, Y^{d-i}, \qquad \deg(P_i) \le i, \]
and, for $\alpha \in S$, consider the restricted univariate polynomial $P(\alpha, Y)$. Since $\deg(P_0) = 0$, $P_0(\alpha)$ must be the same for each $\alpha$. Thus all the polynomials $(P(\alpha, Y))_{\alpha \in S}$ have the same coefficient of $Y^d$. Similarly, the coefficients of $Y^{d-i}$ in the polynomials $(P(\alpha, Y))_{\alpha \in S}$ fit a degree-$i$ polynomial.
As in the tensor Reed-Solomon case, our algorithm begins by decoding each column $r(\alpha, \cdot)$ to the nearest degree-$\le d$ univariate polynomial. Now, instead of trying to use these decoded column polynomials to recover $P(X, Y)$ in one shot, we aim lower and just try to recover $P_0(X)$. The advantage is that $P_0(X)$ is only a degree-$0$ polynomial, and is thus resilient to many more errors than a degree-$d$ polynomial. Armed with $P_0(X)$, we then proceed to find $P_1(X)$. The knowledge of $P_0(X)$ allows us to decode the columns $r(\alpha, \cdot)$ to a slightly larger radius; in turn, this improved radius allows us to recover the degree-$1$ polynomial $P_1(X)$.

At the $i$-th stage, we have already recovered $P_0(X), P_1(X), \ldots, P_{i-1}(X)$. Consider, for each $\alpha \in S$, the column with the known part subtracted off:
\[ f_\alpha(Y) = r(\alpha, Y) - \sum_{j < i} P_j(\alpha)\, Y^{d-j}. \]
Our algorithm decodes $f_\alpha(Y)$ to the nearest degree-$(d-i)$ polynomial; note that as $i$ increases, we are decoding to a lower-degree polynomial, and hence we are able to handle a larger fraction of errors. Define $h(\alpha)$ to be the coefficient of $Y^{d-i}$ in the polynomial so obtained; this "should" equal the evaluation of the degree-$i$ polynomial $P_i(\alpha)$. So we next decode $h$ to the nearest degree-$i$ polynomial (using the appropriate soft information), and it turns out that this decoded polynomial must equal $P_i(X)$. By the time $i$ reaches $d$, we will have recovered $P_0(X), P_1(X), \ldots, P_d(X)$, and hence all of $P(X, Y)$.

Summarizing, the algorithm repeatedly decodes the columns $r(\alpha, \cdot)$, and at each stage it uses the relationship between the different univariate polynomials $P(\alpha, Y)$ to: (1) learn a little bit more about the polynomial $P(X, Y)$, and (2) increase the radius to which we can decode $r(\alpha, \cdot)$ in the next stage. This completes the description of the algorithm in the $m = 2$ case.
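As a tiny concrete instance of these relations (our own illustration), take $d = 1$:
\[ P(X, Y) = P_0\, Y + P_1(X), \qquad P_0 \in \mathbb{F},\ \deg(P_1) \le 1. \]
Every column restriction $P(\alpha, Y) = P_0\, Y + P_1(\alpha)$ has the same leading coefficient $P_0$, so after column-decoding a received word, the leading coefficients of the decoded columns should be noisy evaluations of the constant polynomial $P_0$, and the constant terms should be noisy evaluations of the degree-$1$ polynomial $P_1$.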
The case of general $m$ is very similar, with only a small augmentation needed. Decoding $m$-variate polynomials turns out to reduce to decoding $(m-1)$-variate polynomials with soft information; thus, in order to obtain a sustainable recursive algorithm, we aim a little higher and instead solve the more general problem of decoding multivariate polynomial codes with uncertainties (where each coordinate of the received word has an associated "confidence" level).
To implement the above algorithms in near-linear time, we use some tools from list-decoding. The main bottleneck in the running time is the requirement of having to decode the same column $r(\alpha, \cdot)$ multiple times, to larger and larger radii (i.e., to lower and lower degree polynomials). To save on these decodings, we can instead list-decode $r(\alpha, \cdot)$ to a large radius using a near-linear time list-decoder for Reed-Solomon codes; this reduces the number of required decodings of the same column from $d$ to $O(1)$ (provided $d < (1 - \Omega(1))|S|$). For the $m = 2$ case this works fine, but for $m > 2$ it faces a serious obstacle: in general, it is impossible to efficiently list-decode Reed-Solomon codes with uncertainties beyond half the minimum distance of the code (the list size can be superpolynomial). We get around this using some technical ideas, based on speeding up the decoding of Reed-Muller codes with uncertainties when the fraction of errors is significantly smaller than half the minimum distance. For details, see Section 6.

Organization of this paper
In Section 2, we cover the notion of weighted distance, which will be used in handling Reed-Solomon and Reed-Muller decoding with soft information on the reliability of the symbols in the encoding. In Section 3, we state and prove a polynomial time algorithm for decoding bivariate Reed-Muller codes up to half the minimum distance. We then generalize the proof to decode multivariate Reed-Muller codes in Section 4. Finally, in Sections 5 and 6, we show that decoding Reed-Muller codes to almost half the minimum distance can be done in near-linear time, by improving on the algorithms of Sections 3 and 4.

Preliminaries
At various stages of the decoding algorithm, we will need to deal with symbols and received words in which we have varying amounts of confidence. We now introduce some language to deal with such notions.
Let $\Sigma$ denote an alphabet. A weighted symbol of $\Sigma$ is simply an element of $\Sigma \times [0, 1]$. In the weighted symbol $(\sigma, u)$, we think of $u \in [0, 1]$ as our uncertainty about whether $\sigma$ is the correct symbol.
For a weighted symbol $(\sigma, u)$ and a symbol $\sigma'$, we define their distance $\Delta((\sigma, u), \sigma')$ by:
\[ \Delta((\sigma, u), \sigma') = \begin{cases} u/2 & \text{if } \sigma' = \sigma, \\ 1 - u/2 & \text{if } \sigma' \ne \sigma. \end{cases} \]
For a weighted received word $r : T \to \Sigma \times [0, 1]$ and a (conventional) function $f : T \to \Sigma$, we define their distance $\Delta(r, f) = \sum_{t \in T} \Delta(r(t), f(t))$. The key inequality here is the triangle inequality: for any weighted symbol $(\sigma, u)$ and any two distinct symbols $\sigma_1 \ne \sigma_2$, we have $\Delta((\sigma, u), \sigma_1) + \Delta((\sigma, u), \sigma_2) \ge 1$.

The crucial property that this implies is unique decodability up to half the minimum distance for weighted received words: if $C \subseteq \Sigma^T$ is a code with minimum distance $\Delta$, and $r : T \to \Sigma \times [0, 1]$ is a weighted received word, then there is at most one $f \in C$ satisfying $\Delta(r, f) < \Delta/2$.
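The following short sketch (illustrative Python; the function names are ours) implements the weighted distance just defined, which may help in reading the decoding algorithms below.

```python
def weighted_symbol_distance(weighted, symbol):
    # Distance between a weighted symbol (sigma, u) and a plain symbol,
    # per the definition above: u/2 on agreement, 1 - u/2 on disagreement.
    sigma, u = weighted
    return u / 2 if symbol == sigma else 1 - u / 2

def weighted_distance(r, f, domain):
    # Weighted distance between a weighted received word r : T -> Sigma x [0,1]
    # (a dict) and a conventional function f : T -> Sigma (a dict).
    return sum(weighted_symbol_distance(r[t], f[t]) for t in domain)

# The triangle inequality: for distinct symbols s1 != s2 and any (sigma, u),
# weighted_symbol_distance((sigma, u), s1) + weighted_symbol_distance((sigma, u), s2) >= 1,
# which is what makes unique decoding below half the minimum distance possible.
```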

Bivariate Reed-Muller Decoding
In this section, we provide an algorithm for decoding bivariate Reed-Muller codes up to half the minimum distance. Consider the bivariate Reed-Muller decoding problem: we are given a received word $r : S^2 \to \mathbb{F}$, and we must find the codeword $C$ (if it exists) whose distance $\Delta(r, C)$ from the received word is less than half the minimum distance $n(n-d)/2$. The following result says that there is an algorithm for finding $C$ which runs in time polynomial in the size $n^2$ of the input:

Theorem 3.1. Let $\mathbb{F}$ be a finite field and let $S \subseteq \mathbb{F}$ be a nonempty subset of size $|S| = n$. Given a received word $r : S^2 \to \mathbb{F}$, there is an $O(n^3\, \mathrm{polylog}(n, |\mathbb{F}|))$ time algorithm to find the unique polynomial $C \in \mathbb{F}[X, Y]$ with $\deg(C) \le d$ (if it exists) such that $\Delta(r, C) < \frac{n(n-d)}{2}$.

Outline of Algorithm
The general idea of the algorithm is to write the nearest codeword as $C(X, Y) = \sum_{i=0}^{d} P_i(X)\, Y^{d-i}$, with coefficients $P_i(X) \in \mathbb{F}[X]$ of degree at most $i$, and to attempt to uncover the coefficients $P_i(X)$ one at a time.
We outline the first iteration of the algorithm, which uncovers the coefficient $P_0(X)$ of degree $0$. View the received word as a matrix on $S \times S$, where the rows are indexed by $x \in S$ and the columns by $y \in S$. We first Reed-Solomon decode the rows $r(x, Y)$, $x \in S$, to half the minimum distance $(n - d)/2$, and extract the coefficient of $Y^d$ in those decodings. This gives us guesses for the values $P_0(x)$, $x \in S$. However, this isn't quite enough to determine $P_0(X)$, so we also include some soft information which tells us how uncertain we are that each coefficient is correct. The uncertainty is a number in $[0, 1]$ based on how far the decoded codeword $G_x(Y)$ is from the received row $r(x, Y)$: the farther apart, the higher the uncertainty. A natural choice for the uncertainty is simply the ratio of the distance $\Delta(G_x(Y), r(x, Y))$ to half the minimum distance $(n - d)/2$. Let $f : S \to \mathbb{F} \times [0, 1]$ be the function of guesses for $P_0(x)$ together with their uncertainties. We then use a Reed-Solomon decoder with uncertainties to find the degree-$0$ polynomial closest to $f$; this gives us $P_0(X)$. Finally, we subtract $P_0(X)\, Y^d$ from $r(X, Y)$ and repeat to get the subsequent coefficients.
In the algorithm, we will use REED-SOLOMON-DECODER($r$, $d$) to denote the $O(n\, \mathrm{polylog}\, n)$ time algorithm that performs Reed-Solomon decoding of degree $d$ up to half the minimum distance [11, 12]. We use RS-SOFT-DECODER($r$, $d$) to denote the $O(n^2\, \mathrm{polylog}\, n)$ time algorithm that performs Reed-Solomon decoding of degree $d$ with uncertainties up to half the minimum distance, which is based on Forney's generalized minimum distance decoding algorithm for concatenated codes [13].

Algorithm 1 Decoding Bivariate Reed-Muller
Input: received word $r : S^2 \to \mathbb{F}$, degree bound $d$.
1: $r_0 \leftarrow r$
2: for $i = 0, 1, \ldots, d$ do
3:  for $x \in S$ do
4:   $G_x(Y) \leftarrow$ REED-SOLOMON-DECODER($r_i(x, Y)$, $d - i$)
5:   $g_i(x) \leftarrow$ the coefficient of $Y^{d-i}$ in $G_x(Y)$
6:   $u_i(x) \leftarrow \min\left(1, \frac{\Delta(G_x(Y),\, r_i(x, Y))}{(n - d + i)/2}\right)$
7:  end for
8:  Define the weighted function $f_i : S \to \mathbb{F} \times [0, 1]$ by $f_i(x) = (g_i(x), u_i(x))$
9:  $Q_i(X) \leftarrow$ RS-SOFT-DECODER($f_i$, $i$)
10:  $r_{i+1}(X, Y) \leftarrow r_i(X, Y) - Q_i(X)\, Y^{d-i}$
11: end for
12: Output: $\sum_{i=0}^{d} Q_i(X)\, Y^{d-i}$
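For concreteness, here is a rendering of Algorithm 1 as a Python sketch (our own illustration over a prime field $\mathbb{F}_p$; rs_decode and rs_soft_decode stand in for the black-box REED-SOLOMON-DECODER and RS-SOFT-DECODER and are not implemented here):

```python
p = 101  # illustrative prime field F_p

def poly_eval(coeffs, x):
    # Evaluate a univariate polynomial (coefficients low-to-high) at x, mod p.
    return sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p

def decode_bivariate_rm(r, S, d, rs_decode, rs_soft_decode):
    """Sketch of Algorithm 1. `r` maps (x, y) in S x S to F_p. `rs_decode` and
    `rs_soft_decode` are assumed black boxes returning coefficient lists
    (low-to-high), or None on decoding failure."""
    n = len(S)
    residual = dict(r)               # r_i: received word minus recovered terms
    Q = []                           # recovered coefficients Q_0(X), ..., Q_d(X)
    for i in range(d + 1):
        half_dist = (n - (d - i)) / 2        # half min distance at degree d - i
        weighted = {}
        for x in S:
            column = [residual[(x, y)] for y in S]
            G = rs_decode(column, S, d - i)
            if G is None:                    # decoding failure: full uncertainty
                weighted[x] = (0, 1.0)
                continue
            dist = sum(poly_eval(G, y) != c for y, c in zip(S, column))
            lead = G[d - i] if len(G) > d - i else 0   # coefficient of Y^{d-i}
            weighted[x] = (lead, min(1.0, dist / half_dist))
        Qi = rs_soft_decode([weighted[x] for x in S], S, i)  # recover P_i(X)
        Q.append(Qi)
        for x in S:                          # subtract Q_i(x) * y^{d-i} from r_i
            qx = poly_eval(Qi, x)
            for y in S:
                residual[(x, y)] = (residual[(x, y)] - qx * pow(y, d - i, p)) % p
    return Q                                 # C(X, Y) = sum_i Q_i(X) Y^{d-i}
```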

Proof of Theorem 3.1
Proof. (Correctness of the algorithm.) It suffices to show that $Q_i(X) = P_i(X)$ for $i = 0, 1, \ldots, d$, which we prove by induction. The base case and inductive step can be handled by a single argument: we assume the inductive hypothesis that $Q_j(X) = P_j(X)$ for all $j < i$ (for the base case $i = 0$, this hypothesis is vacuous).
Observe that, by the inductive hypothesis,
\[ r_i(X, Y) = r(X, Y) - \sum_{j < i} P_j(X)\, Y^{d-j}. \]
Hence, writing $C_i(X, Y) = \sum_{j \ge i} P_j(X)\, Y^{d-j}$ for the part of $C$ not yet recovered, we have
\[ \Delta(r_i, C_i) = \Delta(r, C) < \frac{n(n-d)}{2}. \]
For each $x \in S$, let $C_{i,x}(Y) = C_i(x, Y)$, let $e_x = \Delta(r_i(x, Y), C_{i,x}(Y))$, and let $A = \{x \in S : G_x \ne C_{i,x}\}$ be the set of rows on which the row decoder fails to return $C_{i,x}$. Then, for $x \in A$, since $G_x$ and $C_{i,x}$ are distinct polynomials of degree at most $d - i$, we have $\Delta(G_x, r_i(x, Y)) + e_x \ge n - d + i$, and so
\[ \Delta(f_i(x), P_i(x)) \le 1 - \frac{u_i(x)}{2} \le 1 - \frac{\Delta(G_x, r_i(x, Y))}{n - d + i} \le \frac{e_x}{n - d + i} \]
(this covers all cases, including when the leading coefficient of $G_x$ happens to agree with $P_i(x)$, since then the contribution is $u_i(x)/2 \le 1 - u_i(x)/2$, and also decoding failures, where the contribution is $1/2$ and $e_x \ge (n - d + i)/2$). And for $x \notin A$, we have $G_x = C_{i,x}$, so the guessed coefficient is correct and $\Delta(f_i(x), P_i(x)) = \frac{u_i(x)}{2} = \frac{e_x}{n - d + i}$. We now upper bound $\Delta(f_i(X), P_i(X))$:
\[ \Delta(f_i, P_i) = \sum_{x \in S} \Delta(f_i(x), P_i(x)) \le \sum_{x \in S} \frac{e_x}{n - d + i} = \frac{\Delta(r_i, C_i)}{n - d + i} < \frac{n(n - d)}{2(n - d + i)} \le \frac{n - i}{2}, \]
where the last inequality holds because $(n - i)(n - d + i) - n(n - d) = i(d - i) \ge 0$. Thus $\Delta(f_i, P_i)$ is less than half the minimum distance $n - i$ of the degree-$i$ Reed-Solomon code, so RS-SOFT-DECODER outputs $Q_i(X) = P_i(X)$, completing the induction.

Runtime of Algorithm
We claim that the runtime of our algorithm is $O(n^3\, \mathrm{polylog}\, n)$, ignoring the $\mathrm{polylog}\, |\mathbb{F}|$ factor from field operations. The algorithm has $d + 1$ iterations. In each iteration, we update $r_i$, apply REED-SOLOMON-DECODER to $n$ rows, and apply RS-SOFT-DECODER a single time to get the leading coefficient. As updating takes $O(n^2)$ time, REED-SOLOMON-DECODER takes $O(n\, \mathrm{polylog}\, n)$ time, and RS-SOFT-DECODER takes $O(n^2\, \mathrm{polylog}\, n)$ time, each iteration takes $O(n^2\, \mathrm{polylog}\, n)$ time. The $d + 1$ iterations then give a total runtime of $O(dn^2\, \mathrm{polylog}\, n) \le O(n^3\, \mathrm{polylog}\, n)$.

Reed-Muller Decoding for General m
We now generalize the algorithm for decoding bivariate Reed-Muller codes to handle Reed-Muller codes in any number of variables. As before, we write the codeword as a polynomial in one of the variables and attempt to uncover its coefficients one at a time. Interestingly, this leads us to a Reed-Muller decoding problem in one fewer variable, but with uncertainties. This lends itself nicely to an inductive approach on the number of variables; however, the generalization requires us to be able to decode Reed-Muller codes with uncertainties. This leads us to our main theorem:

Theorem 4.1. Let $\mathbb{F}$ be a finite field and let $S \subseteq \mathbb{F}$ be a nonempty subset of size $|S| = n$. Given a received word with uncertainties $r : S^m \to \mathbb{F} \times [0, 1]$, there is an $O(n^{m+2}\, \mathrm{polylog}(n, |\mathbb{F}|))$ time algorithm to find the unique polynomial $C \in \mathbb{F}[X_1, \ldots, X_m]$ with $\deg(C) \le d$ (if it exists) such that $\Delta(r, C) < \frac{n^m}{2}\left(1 - \frac{d}{n}\right)$.

Note that to decode a Reed-Muller code without uncertainties, we may just set all the initial uncertainties to $0$. The algorithm slows down by a factor of $n$ from the bivariate case because we must use the RS-SOFT-DECODER instead of the faster REED-SOLOMON-DECODER on the rows of the received word.
Proof. The proof is by induction on the number of variables, and closely mirrors the proof of the bivariate case.

Base Case
We are given a received word with uncertainties $r : S \to \mathbb{F} \times [0, 1]$, and are asked to find the unique polynomial of degree at most $d$ (if it exists) within weighted distance $\frac{n}{2}\left(1 - \frac{d}{n}\right)$ of $r$. This is just Reed-Solomon decoding with uncertainties, which can be done in time $O(n^2\, \mathrm{polylog}\, n)$.

Inductive Step
Assume that the result holds for $m$ variables. That is, assume we have access to an algorithm REED-MULLER-DECODER($r$, $m$, $d$) which takes as input a received word with uncertainties $r : S^m \to \mathbb{F} \times [0, 1]$, and outputs the unique polynomial of degree at most $d$ (if it exists) within weighted distance $\frac{n^m}{2}\left(1 - \frac{d}{n}\right)$ of $r$. We want to produce an algorithm for $m + 1$ variables. Before we proceed, we set up some definitions to make the presentation and analysis of the algorithm cleaner. We are given $r : S^{m+1} \to \mathbb{F} \times [0, 1]$. View $r$ as a map from $S^m \times S \to \mathbb{F} \times [0, 1]$, and write $r(X, Y) = (r(X, Y), u(X, Y))$ (abusing notation, we use $r$ for both the weighted word and its symbol part), where $X = (X_1, \ldots, X_m)$.
Suppose that there exists a polynomial $C \in \mathbb{F}[X, Y]$ with $\deg(C) \le d$ such that $\Delta(r, C) < \frac{n^{m+1}}{2}\left(1 - \frac{d}{n}\right)$, and write $C(X, Y) = \sum_{i=0}^{d} P_i(X)\, Y^{d-i}$, where $\deg(P_i) \le i$. The general strategy of the algorithm is to determine the $P_i$'s iteratively, by performing $d + 1$ iterations from $i = 0$ to $i = d$ and recovering $P_i(X)$ at the $i$-th iteration.
For the $i$-th iteration, consider the word $r_i(X, Y) = r(X, Y) - \sum_{j < i} Q_j(X)\, Y^{d-j}$ (with its uncertainties unchanged). Since $r$ is close to $C$, $r_i$ is close to $C_i(X, Y) = \sum_{j \ge i} P_j(X)\, Y^{d-j}$. Our goal is to find $P_i(X)$, the leading coefficient of $C_i$ when viewed as a polynomial in $Y$. For each $x \in S^m$, we decode the Reed-Solomon code with uncertainties given by $r_i(x, Y)$ and extract the coefficient of $Y^{d-i}$, along with how uncertain we are about the correctness of this coefficient. This gives us a guess for the value $P_i(x)$ and our uncertainty about this guess. We construct the function $f_i : S^m \to \mathbb{F} \times [0, 1]$ of guesses for $P_i$ with their uncertainties. We then apply the induction hypothesis of Theorem 4.1 to $f_i$ to recover $P_i$.
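The recursion can be summarized in the following Python-style sketch (our own illustration, reusing the prime $p$ from the earlier sketch; rs_soft_decode is the assumed univariate soft decoder, eval_multivar is an assumed helper for evaluating the recovered multivariate coefficient polynomials, and the representation of multivariate polynomials is left abstract):

```python
from itertools import product

def weighted_dist(column, poly, S):
    # Weighted distance between a column of weighted symbols and a polynomial.
    total = 0.0
    for y, (val, u) in zip(S, column):
        pv = sum(c * pow(y, k, p) for k, c in enumerate(poly)) % p
        total += u / 2 if pv == val else 1 - u / 2
    return total

def decode_rm_unc(r, S, m, d, rs_soft_decode):
    """Sketch of the recursive m-variate decoder with uncertainties over F_p.
    `r` maps m-tuples over S to (value, uncertainty) pairs; `rs_soft_decode`
    is the univariate soft decoder of the previous section (a black box)."""
    if m == 1:
        return rs_soft_decode([r[(y,)] for y in S], S, d)   # base case
    n, Q, residual = len(S), [], dict(r)
    for i in range(d + 1):
        guesses = {}
        for x in product(S, repeat=m - 1):
            column = [residual[x + (y,)] for y in S]
            G = rs_soft_decode(column, S, d - i)
            if G is None:
                guesses[x] = (0, 1.0)                       # failure: uncertainty 1
                continue
            lead = G[d - i] if len(G) > d - i else 0
            u = min(1.0, weighted_dist(column, G, S) / ((n - (d - i)) / 2))
            guesses[x] = (lead, u)
        # Decoding the guesses is an (m-1)-variate instance of degree i.
        Qi = decode_rm_unc(guesses, S, m - 1, i, rs_soft_decode)
        Q.append(Qi)
        for x in product(S, repeat=m - 1):       # subtract Q_i(x) * y^{d-i}
            qx = eval_multivar(Qi, x)            # assumed helper (see lead-in)
            for y in S:
                val, u = residual[x + (y,)]
                residual[x + (y,)] = ((val - qx * pow(y, d - i, p)) % p, u)
    return Q
```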

Algorithm 2 Decoding Reed-Muller with Uncertainties
Input: received word with uncertainties $r : S^m \times S \to \mathbb{F} \times [0, 1]$, degree bound $d$.
1: $r_0 \leftarrow r$
2: for $i = 0, 1, \ldots, d$ do
3:  for $x \in S^m$ do
4:   $G_x(Y) \leftarrow$ RS-SOFT-DECODER($r_i(x, Y)$, $d - i$)
5:   $g_i(x) \leftarrow$ the coefficient of $Y^{d-i}$ in $G_x(Y)$
6:   $u_i(x) \leftarrow \min\left(1, \frac{\Delta(G_x(Y),\, r_i(x, Y))}{(n - d + i)/2}\right)$
7:  end for
8:  Define the weighted function $f_i : S^m \to \mathbb{F} \times [0, 1]$ by $f_i(x) = (g_i(x), u_i(x))$
9:  $Q_i(X) \leftarrow$ REED-MULLER-DECODER($f_i$, $m$, $i$)
10:  $r_{i+1}(X, Y) \leftarrow r_i(X, Y) - Q_i(X)\, Y^{d-i}$
11: end for
12: Output: $\sum_{i=0}^{d} Q_i(X)\, Y^{d-i}$

Correctness of Algorithm
We will show by induction that the $i$-th iteration of the algorithm produces $Q_i(X) = P_i(X)$. The base case and inductive step can be handled by a single argument: we assume the inductive hypothesis that $Q_j(X) = P_j(X)$ for all $j < i$ (for the base case $i = 0$, this hypothesis is vacuous). It is enough to show that $\Delta(f_i(X), P_i(X)) < \frac{n^m}{2}\left(1 - \frac{i}{n}\right)$; then $P_i(X)$ is the unique polynomial within weighted distance $\frac{n^m}{2}\left(1 - \frac{i}{n}\right)$ of $f_i$, and the inductively assumed $m$-variate decoder returns it.

Observe that, by the inductive hypothesis, $r_i(X, Y) = r(X, Y) - \sum_{j < i} P_j(X)\, Y^{d-j}$, so that, with $C_i(X, Y) = \sum_{j \ge i} P_j(X)\, Y^{d-j}$, we have $\Delta(r_i, C_i) = \Delta(r, C) < \frac{n^{m+1}}{2}\left(1 - \frac{d}{n}\right)$. For each $x \in S^m$, define $C_{i,x}(Y) = C_i(x, Y)$, let $e_x = \Delta(r_i(x, Y), C_{i,x}(Y))$ (a weighted distance), and let $A = \{x \in S^m : G_x \ne C_{i,x}\}$. Then, for $x \in A$, the triangle inequality for weighted distances gives $\Delta(G_x, r_i(x, Y)) + e_x \ge n - d + i$, and so $\Delta(f_i(x), P_i(x)) \le 1 - \frac{u_i(x)}{2} \le \frac{e_x}{n - d + i}$. And for $x \notin A$, we have $G_x = C_{i,x}$, so $\Delta(f_i(x), P_i(x)) = \frac{u_i(x)}{2} = \frac{e_x}{n - d + i}$. We now upper bound $\Delta(f_i(X), P_i(X))$:
\[ \Delta(f_i, P_i) \le \sum_{x \in S^m} \frac{e_x}{n - d + i} = \frac{\Delta(r_i, C_i)}{n - d + i} < \frac{n^m(n - d)}{2(n - d + i)} \le \frac{n^m}{2}\left(1 - \frac{i}{n}\right), \]
where the last inequality again follows from $n(n - d) \le (n - i)(n - d + i)$.

Runtime of Algorithm
We claim the runtime of our $m$-variate Reed-Muller decoder is $O(n^{m+2}\, \mathrm{polylog}\, n)$, ignoring the $\mathrm{polylog}\, |\mathbb{F}|$ factor from field operations. We again proceed by induction on $m$. In the base case of $m = 1$, we simply run the Reed-Solomon decoder with uncertainties, which runs in $O(n^2\, \mathrm{polylog}\, n)$ time. Now suppose the $m$-variate Reed-Muller decoder runs in time $O(n^{m+2}\, \mathrm{polylog}\, n)$. We need to show that the $(m+1)$-variate Reed-Muller decoder runs in time $O(n^{m+3}\, \mathrm{polylog}\, n)$.
The algorithm performs $d + 1$ iterations. In each iteration, we perform $n^m$ Reed-Solomon decodings with uncertainties, and extract the leading coefficient along with its uncertainty from each one. Each Reed-Solomon decoding takes $O(n^2\, \mathrm{polylog}\, n)$ time, while computing the uncertainty of a leading coefficient takes $O(n\, \mathrm{polylog}\, n)$ time, so this step has cumulative runtime $O(n^{m+2}\, \mathrm{polylog}\, n)$. Next, we do a single $m$-variate Reed-Muller decoding with uncertainties, which takes $O(n^{m+2}\, \mathrm{polylog}\, n)$ by our induction hypothesis. This makes the total runtime $O(dn^{m+2}\, \mathrm{polylog}\, n) \le O(n^{m+3}\, \mathrm{polylog}\, n)$, as desired.

Near-Linear Time Decoding in the Bivariate Case
In this section, we present our near-linear time, randomized decoding algorithm for bivariate Reed-Muller codes.

Outline of Improved Algorithm
Recall that the decoding algorithms we presented in the previous sections make $d + 1$ iterations, where $d = \alpha n$, revealing a single coefficient of the nearest codeword in each iteration. In a given iteration, we decode each row of $r_i(X, Y)$ to the nearest polynomial of degree $d - i$, extracting the coefficient of $Y^{d-i}$ and its uncertainty. Then we Reed-Solomon decode with uncertainties to get the leading coefficient of $C(X, Y)$, viewed as a polynomial in $Y$. The runtime of this algorithm is $O(n^3\, \mathrm{polylog}\, n)$: each iteration has $n$ Reed-Solomon decodings and a single Reed-Solomon decoding with uncertainties; as Reed-Solomon decoding takes $O(n\, \mathrm{polylog}\, n)$ time and Reed-Solomon decoding with uncertainties takes $O(n^2\, \mathrm{polylog}\, n)$ time, we get $O(n^2\, \mathrm{polylog}\, n)$ per iteration over $d + 1$ iterations. To achieve near-linear time, we need to shave off a factor of $n$ from both the number of Reed-Solomon decodings and the runtime of Reed-Solomon decoding with uncertainties.
To save on the number of Reed-Solomon decodings, we will instead list-decode beyond half the minimum distance (using a near-linear time Reed-Solomon list-decoder), and show that the list we get is both small and essentially contains all of the decoded polynomials we require for $\Omega(n)$ iterations of $i$. So we will do $O(n)$ Reed-Solomon list-decodings in total, instead of $O(n^2)$ Reed-Solomon unique decodings to half the minimum distance.
To save on the runtime of Reed-Solomon decoding with uncertainties, we will use a probabilistic variant of Forney's generalized minimum distance decoding algorithm, which runs in near-linear time, but reduces the decoding radius from 1/2 the minimum distance to 1/2 − o(1) of the minimum distance.
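To illustrate the first saving, here is a small sketch (our own illustration; polynomials are represented as coefficient dicts, and the list itself is assumed to come from a black-box near-linear-time Reed-Solomon list decoder): once a row has been list-decoded to a radius exceeding half the minimum distance by $\Omega(n)$, the candidate decodings for the next $\Omega(n)$ iterations can be maintained by filtering and updating the list, with no further decoding of that row.

```python
def update_list(candidates, known_lead, deg):
    """One iteration of list reuse for a single row. `candidates` is a list of
    (poly, dist) pairs, where poly is a dict {exponent: coeff}; `known_lead`
    is the just-recovered coefficient of Y^deg for this row. Keep candidates
    whose coefficient of Y^deg matches, and strip that term, producing the
    candidate list for the next (one lower degree) iteration."""
    updated = []
    for poly, dist in candidates:
        if poly.get(deg, 0) == known_lead:
            smaller = {e: c for e, c in poly.items() if e != deg}
            # Subtracting the same known term from both the row and the
            # candidate leaves their distance unchanged.
            updated.append((smaller, dist))
    return updated
```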

Proof of Theorem 5.1: Reducing the Number of Decodings
To reduce the number of decodings, we will list decode past half the minimum distance. Let $r_{i,x} : S \to \mathbb{F}$ be a received word for a Reed-Solomon code $\mathcal{C}_i$ of degree at most $d_i = d - i$. Let $t$ be the radius to which we list decode, and let $L_{i,x} = \{C \in \mathcal{C}_i : \Delta(C, r_{i,x}) < t\}$ be the list of codewords within distance $t$ of $r_{i,x}$. The radius to which we can decode while maintaining a polynomial-size list is given by the Johnson bound $n(1 - \sqrt{1 - \delta})$, where $\delta = 1 - d_i/n \ge 1 - \alpha$ is the relative distance of the code. By Taylor approximating the square root, we see that the Johnson bound exceeds half the minimum distance by $\Omega(n)$:
\[ n\left(1 - \sqrt{1 - \delta}\right) \ge \frac{\delta n}{2} + \epsilon n, \]
where $\epsilon = 3(1 - \alpha)^3/16$ is a positive constant. By a standard list-size bound, such as the one in Cassuto and Bruck [14], we see that if we set the list decoding radius $t = (n - d + i)/2 + ((1 - \alpha)^2/8)n$, then the size of the list satisfies $|L_{i,x}| < \frac{1}{\epsilon}$, a constant. So the list decoding radius exceeds half the minimum distance by $\Omega(n)$, and the list size is constant. By Alekhnovich's fast algorithm for weighted polynomial construction [15], the list $L_{i,x}$ can be produced in time $(1/\alpha)^{O(1)} \cdot n \log^2 n \log\log n = O(n\, \mathrm{polylog}\, n)$. We will let RS-LIST-DECODER($r$, $d$, $t$) denote the Reed-Solomon list decoder that outputs the list of all ordered pairs $(C, \Delta(C, r))$ of polynomials of degree at most $d$ within distance $t$ of the received word $r$, together with their distances to $r$. Since the list size is constant, all of the distances can be computed in $O(n\, \mathrm{polylog}\, n)$ time.
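For completeness, here is the Taylor-expansion computation behind the displayed inequality (our own verification, with $u$ standing for the relative distance $\delta$): every term of the binomial series for $1 - \sqrt{1-u}$ is nonnegative, so for $u \in [0, 1]$,
\[ 1 - \sqrt{1-u} = \frac{u}{2} + \frac{u^2}{8} + \frac{u^3}{16} + \cdots \ge \frac{u}{2} + \frac{u^2}{8} + \frac{u^3}{16} \ge \frac{u}{2} + \frac{u^3}{8} + \frac{u^3}{16} = \frac{u}{2} + \frac{3u^3}{16}, \]
using $u^2 \ge u^3$. Taking $u = 1 - \alpha$ gives the constant $\epsilon = 3(1-\alpha)^3/16$ above.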

Faster Reed-Solomon Decoding with Uncertainties
In the appendix, we give a description of the probabilistic GMD algorithm that gives a faster Reed-Solomon decoder with uncertainties. We will refer to this algorithm as FAST-RS-DECODER($f$, $i$), where $f : S \to \mathbb{F} \times [0, 1]$ is a received word with uncertainties and $i$ is the degree of the code. FAST-RS-DECODER($f$, $i$) outputs the codeword within distance $(n - i - \sqrt{n})/2$ (if it exists) with probability at least $1 - \frac{1}{n^{\Omega(1)}}$ (the $\Omega(1)$ can be chosen to be an arbitrary constant, by simply repeating the algorithm independently several times). Therefore, in our final algorithm, with probability at least $99/100$, all invocations of the FAST-RS-DECODER will succeed. Let $t_j = \frac{n - d + j \cdot 2cn}{2} + cn$, where $c = (1 - \alpha)^2/8$ as above.

Algorithm 3 Near-Linear Time Decoding of Bivariate Reed-Muller
Input: received word $r : S^2 \to \mathbb{F}$, degree bound $d = \alpha n$; let $c = (1 - \alpha)^2/8$.
1: $r_0 \leftarrow r$
2: for $j = 0, 1, \ldots$ (blocks of $2cn$ iterations, while $j \cdot 2cn \le d$) do
3:  for $x \in S$ do
4:   $L_{j,0,x} \leftarrow$ RS-LIST-DECODER($r_{j \cdot 2cn}(x, Y)$, $d - j \cdot 2cn$, $t_j$)
5:  end for
6:  for $k = 0, 1, \ldots, 2cn - 1$ (while $j \cdot 2cn + k \le d$) do
7:   for $x \in S$ do
8:    if $k > 0$, set $L_{j,k,x} \leftarrow \{(C - Q_{j \cdot 2cn + k - 1}(x)\, Y^{d - j \cdot 2cn - k + 1}, \delta) : (C, \delta) \in L_{j,k-1,x}$ whose coefficient of $Y^{d - j \cdot 2cn - k + 1}$ equals $Q_{j \cdot 2cn + k - 1}(x)\}$
9:    Let $(G_x, \delta_x) \in L_{j,k,x}$ have minimal $\delta_x$; set $g(x)$ to the coefficient of $Y^{d - j \cdot 2cn - k}$ in $G_x$ and $u(x) \leftarrow \min\left(1, \frac{\delta_x}{(n - d + j \cdot 2cn + k)/2}\right)$ (if $L_{j,k,x}$ is empty, set $g(x)$ arbitrarily and $u(x) \leftarrow 1$)
10:   end for
11:   Define the weighted function $f_{j \cdot 2cn + k} : S \to \mathbb{F} \times [0, 1]$ by $f_{j \cdot 2cn + k}(x) = (g(x), u(x))$
12:   $Q_{j \cdot 2cn + k}(X) \leftarrow$ FAST-RS-DECODER($f_{j \cdot 2cn + k}$, $j \cdot 2cn + k$)
13:   $r_{j \cdot 2cn + k + 1}(X, Y) \leftarrow r_{j \cdot 2cn + k}(X, Y) - Q_{j \cdot 2cn + k}(X)\, Y^{d - j \cdot 2cn - k}$
14:  end for
15: end for
16: Output: $\sum_{i=0}^{d} Q_i(X)\, Y^{d-i}$

Correctness of Algorithm
View the received word as a matrix on $S \times S$, where the rows are indexed by $x \in S$ and the columns by $y \in S$. For correctness, we have to show two things: first, that Algorithm 3 produces the same row decodings $G_x(Y)$ as Algorithm 2; and second, that the algorithm actually extracts the coefficients of $C(X, Y) = \sum_{i=0}^{d} P_i(X)\, Y^{d-i}$, viewed as a polynomial in $Y$, i.e., $Q_i(X) = P_i(X)$ for $i = 0, \ldots, d$. Define
\[ r_{j \cdot 2cn + k}(X, Y) = r(X, Y) - \sum_{i < j \cdot 2cn + k} Q_i(X)\, Y^{d-i}. \]
Then we want to show that in each of the $d + 1$ iterations of $(j, k)$, we recover the same decodings as Algorithm 2. It is enough to instead show that the list $L_{j,k,x}$ contains all the polynomials of degree at most $d - j \cdot 2cn - k$ within distance $t_j = (n - d + j \cdot 2cn)/2 + cn > (n - d + j \cdot 2cn + k)/2$ of $r_{j \cdot 2cn + k, x}(Y)$. Furthermore, we want to show $Q_{j \cdot 2cn + k}(X) = P_{j \cdot 2cn + k}(X)$.
We prove this by induction on $(j, k)$. The base case is $j = k = 0$: for each row $x \in S$, the list $L_{0,0,x} = $ RS-LIST-DECODER($r(x, Y)$, $d$, $t_0$) contains, by definition, all polynomials of degree at most $d$ within distance $t_0$ of $r(x, Y)$. The induction hypothesis is that for every $(j', k') < (j, k)$ in the lexicographic order, the list $L_{j',k',x}$ contains all polynomials of degree at most $d - j' \cdot 2cn - k'$ within distance $t_{j'}$ of $r_{j' \cdot 2cn + k', x}$, and $Q_{j' \cdot 2cn + k'}(X) = P_{j' \cdot 2cn + k'}(X)$. We will show the corresponding statements hold true for $(j, k)$.
If $k = 0$, then the fact that the algorithm has extracted the correct coefficients thus far means that $r_{j \cdot 2cn}$ is the same in both Algorithm 2 and Algorithm 3. Since $L_{j,0,x} = $ RS-LIST-DECODER($r_{j \cdot 2cn, x}(Y)$, $d - j \cdot 2cn$, $t_j$), the claimed property of $L_{j,0,x}$ holds by the definition of RS-LIST-DECODER.
If $k \ne 0$, then we know from the induction hypothesis that $L_{j,k-1,x}$ contains all polynomials of degree at most $d - j \cdot 2cn - k + 1$ within distance $t_j$ of $r_{j \cdot 2cn + k - 1, x}$. We defined $L_{j,k,x}$ in terms of $L_{j,k-1,x}$ to be:
\[ L_{j,k,x} = \left\{ \left(C - Q_{j \cdot 2cn + k - 1}(x)\, Y^{d - j \cdot 2cn - k + 1},\ \delta\right) : (C, \delta) \in L_{j,k-1,x},\ \mathrm{coeff}_{Y^{d - j \cdot 2cn - k + 1}}(C) = Q_{j \cdot 2cn + k - 1}(x) \right\}. \]
As $Q_{j \cdot 2cn + k - 1}(X) = P_{j \cdot 2cn + k - 1}(X)$, the list $L_{j,k,x}$ is essentially obtained by taking the codewords with the correct leading coefficients and subtracting off the leading term. We claim that what we get is the set of all polynomials of degree at most $d - j \cdot 2cn - k$ within distance $t_j$ of $r_{j \cdot 2cn + k, x}$.
Consider any $(G, \delta) \in L_{j,k,x}$. By definition of $L_{j,k,x}$, we know there exists a $(C, \delta) \in L_{j,k-1,x}$ with leading coefficient $Q_{j \cdot 2cn + k - 1}(x) = P_{j \cdot 2cn + k - 1}(x)$ such that $G = C - P_{j \cdot 2cn + k - 1}(x)\, Y^{d - j \cdot 2cn - k + 1}$ and $\delta = \Delta(C, r_{j \cdot 2cn + k - 1, x})$. Since $r_{j \cdot 2cn + k, x}$ is obtained from $r_{j \cdot 2cn + k - 1, x}$ by subtracting that same leading term, we have
\[ \Delta(G, r_{j \cdot 2cn + k, x}) = \Delta(C, r_{j \cdot 2cn + k - 1, x}) = \delta < t_j. \]
For the reverse inclusion, suppose $G$ is a polynomial of degree at most $d - j \cdot 2cn - k$ at distance $\delta < t_j$ from $r_{j \cdot 2cn + k, x}$. Then $C := G + P_{j \cdot 2cn + k - 1}(x)\, Y^{d - j \cdot 2cn - k + 1}$ is a polynomial of degree at most $d - j \cdot 2cn - k + 1$ at distance $\delta < t_j$ from $r_{j \cdot 2cn + k - 1, x}$, so $(C, \delta) \in L_{j,k-1,x}$ by the induction hypothesis, and hence $(G, \delta) \in L_{j,k,x}$.

It remains to show that $Q_{j \cdot 2cn + k}(X) = P_{j \cdot 2cn + k}(X)$. As in the proof of Theorem 4.1, we show that
\[ \Delta\left(f_{j \cdot 2cn + k}(X), P_{j \cdot 2cn + k}(X)\right) < \frac{n - j \cdot 2cn - k - \sqrt{n}}{2}, \]
so that the output of FAST-RS-DECODER($f_{j \cdot 2cn + k}(X)$, $j \cdot 2cn + k$) is $P_{j \cdot 2cn + k}(X)$. Using the first part of the induction we just proved, we get the same $f_{j \cdot 2cn + k}(X)$ as in Algorithm 2, so we can adopt a nearly identical argument: it bounds $\Delta(f_{j \cdot 2cn + k}, P_{j \cdot 2cn + k})$ by $\frac{\Delta(r_{j \cdot 2cn + k}, C_{j \cdot 2cn + k})}{n - d + j \cdot 2cn + k}$, and the $(1 - o(1))$ slack in the decoding radius of Theorem 5.1 absorbs the extra $\sqrt{n}$ term, giving the required bound.

Analysis of Runtime of Bivariate Reed-Muller Decoder
We run RS-LIST-DECODER $\frac{d}{2cn} \cdot n = \frac{\alpha}{2c}\, n = \frac{4\alpha}{(1-\alpha)^2}\, n$ times. Also, we run FAST-RS-DECODER $d = \alpha n$ times. As both of these algorithms run in $O(n\, \mathrm{polylog}\, n)$ time, the total runtime of the algorithm is $O(n^2\, \mathrm{polylog}(n, |\mathbb{F}|))$, after accounting for field operations. As the input is of size $n^2$, this is near-linear in the size of the input.

Near-Linear Time Decoding in the General Case
A more involved variation of the near-linear time, randomized decoding algorithm for bivariate Reed-Muller codes can be used to get a near-linear time, randomized algorithm for decoding Reed-Muller codes in any number of variables:

Theorem 6.1. Let $\mathbb{F}$ be a finite field, let $S \subseteq \mathbb{F}$ be a nonempty subset of size $|S| = n$, and let $\beta > \frac{1}{2}$. Given a received word $r : S^m \to \mathbb{F}$, there is an $O(n^m \cdot \mathrm{polylog}(n, |\mathbb{F}|))$ time randomized algorithm to find the unique polynomial (if it exists) $C \in \mathbb{F}[X_1, \ldots, X_m]$ with $\deg(C) \le d$ such that $\Delta(r, C) < (1 - o(1)) \cdot \frac{n^m}{2}\left(1 - \frac{d}{n}\right)$.

As part of the algorithm for near-linear time Reed-Muller decoding, we will need to decode Reed-Muller codes with uncertainties to various radii less than half their minimum distance. We require the following result (Theorem 6.2, proved over the remainder of this section) to do such decodings efficiently; we refer to the resulting algorithm as RM-UNC-DECODER. Throughout, let $t_j = \frac{n - d + j(e+1) - \beta\sqrt{n}}{2}$.

The algorithm proceeds as follows. As before, we write $C(X, Y) = \sum_{i=0}^{d} P_i(X)\, Y^{d-i}$, and find the $P_i$ iteratively. In the $i$-th iteration, we need to decode each row $r_{i,x}$, $x \in S^m$, to a degree-$(d - i)$ polynomial $D_{i,x}$. To reduce the number of times we decode, we instead decode once, at the start of each block of $e + 1$ iterations (so at $i = j(e+1)$), to the radius $t_j = \frac{n - d + j(e+1) - \beta\sqrt{n}}{2}$, and reuse this decoding for the following $e + 1$ iterations. We then construct the weighted function $f_i : S^m \to \mathbb{F} \times [0, 1]$ whose value at $x$ is the coefficient of $Y^{d-i}$ in $D_{i,x}$, with uncertainty the ratio of $\Delta(D_{i,x}, r_{i,x})$ to $(n - d + i - \beta\sqrt{n} - e)/2$. Finally, writing $i = j(e+1) + k$, we decode $f_i(X)$ to a degree-$i$ polynomial, setting
\[ Q_{j(e+1)+k}(X) = \text{RM-UNC-DECODER}\left(f_{j(e+1)+k}(X),\ j(e+1)+k,\ \frac{n^m}{2}\left(1 - \frac{j(e+1)+k + m\beta\sqrt{n}}{n}\right)\right). \]

Proof of Correctness
We have to show $Q_i(X) = P_i(X)$. It is enough to show that
\[ \Delta(f_i(X), P_i(X)) < \frac{n^m}{2}\left(1 - \frac{i + m\beta\sqrt{n}}{n}\right); \]
then $P_i$ will be the unique polynomial of degree $i$ within this weighted distance of $f_i$, and RM-UNC-DECODER returns it. When we decode $r_{i,x}$ to radius $t_j$, there are four possibilities:

1. The decoding is unsuccessful. In this case, we set $D_{i,x}$ to be an arbitrary polynomial of degree $d - i$ and set the uncertainty $u_i(x) = 1$. The contribution to $\Delta(f_i, P_i)$ is $\Delta(f_i(x), P_i(x)) = 1/2$, which is bounded above by $\frac{1}{2} \cdot \frac{\Delta(r_{i,x}, C_{i,x})}{(n - d + i - \beta\sqrt{n} - e)/2}$, since a failed decoding at radius $t_j$ means $\Delta(r_{i,x}, C_{i,x}) \ge t_j \ge \frac{n - d + i - \beta\sqrt{n} - e}{2}$.
2. The decoding succeeds and is correct: $D_{i,x} = C_{i,x}$. In this case, the guessed coefficient is correct, and the contribution is $\frac{u_i(x)}{2} = \frac{1}{2} \cdot \frac{\Delta(D_{i,x}, r_{i,x})}{(n - d + i - \beta\sqrt{n} - e)/2} \le \frac{1}{2} \cdot \frac{\Delta(r_{i,x}, C_{i,x})}{(n - d + i - \beta\sqrt{n} - e)/2}$.

3. The decoding succeeds, but gives the wrong codeword, whose leading coefficient disagrees with that of the correct codeword. In this case, the contribution is $1 - \frac{u_i(x)}{2}$; since $D_{i,x} \ne C_{i,x}$ are distinct codewords, $\Delta(D_{i,x}, r_{i,x}) + \Delta(r_{i,x}, C_{i,x}) \ge n - d + i > n - d + i - \beta\sqrt{n} - e$, and a short computation shows that $1 - \frac{u_i(x)}{2} \le \frac{1}{2} \cdot \frac{\Delta(r_{i,x}, C_{i,x})}{(n - d + i - \beta\sqrt{n} - e)/2}$.

4. The decoding succeeds, but gives the wrong codeword, whose leading coefficient happens to match that of the correct codeword. As in the previous case, $D_{i,x} \ne C_{i,x}$, but now the guessed symbol is correct, so the contribution is $\frac{u_i(x)}{2} \le 1 - \frac{u_i(x)}{2}$, and the bound of the previous case applies.

Putting it all together, we have:
\[ \Delta(f_i, P_i) \le \sum_{x \in S^m} \frac{\Delta(r_{i,x}, C_{i,x})}{n - d + i - \beta\sqrt{n} - e} = \frac{\Delta(r_i, C_i)}{n - d + i - \beta\sqrt{n} - e}, \]
which is less than $\frac{n^m}{2}\left(1 - \frac{i + m\beta\sqrt{n}}{n}\right)$ under the assumed bound on $\Delta(r, C)$.

Analysis of Runtime
The algorithm can be divided into two parts: (1) constructing the $f_i$, $i = 0, \ldots, d$; and (2) decoding the $f_i$ to get the $P_i$, $i = 0, \ldots, d$.
The dominant contribution to the runtime of constructing the $f_i$ comes from all the Reed-Solomon decodings with uncertainties we have to do to get the $D_{i,x}(Y)$. Once every $e + 1$ iterations, we have to decode each row $x \in S^m$ again, so the total number of such decodings is $\frac{n}{e+1} \cdot n^m = \frac{n^{m+1}}{e+1}$. Since each Reed-Solomon decoding with uncertainties can be done in $O(n\, \mathrm{polylog}\, n)$ time via the FAST-RS-DECODER, the runtime of this part of the algorithm is $O\left(\frac{n^{m+2}}{e+1}\, \mathrm{polylog}\, n\right)$.

To understand the runtime of the second part of the algorithm, we compute the runtime of decoding $f_i$ for a fixed $i$. Decoding $f_i$ is an $m$-variate Reed-Muller decoding with uncertainties problem, which we solve recursively with its own block-length parameter $e_i$; this takes $O\left(\frac{n^{m+1}}{e_i + 1}\, \mathrm{polylog}\, n\right)$ time. The runtime for all $d + 1$ iterations $i = 0, \ldots, d$ is then $O\left(\left(\sum_{i=0}^{d} \frac{1}{e_i + 1}\right) \cdot n^{m+1}\, \mathrm{polylog}\, n\right)$.
It remains to bound $\sum_{i=0}^{d} \frac{1}{e_i + 1}$ from above. This is a simple Riemann sum bound, using the fact that the summand, as a function of $i$, decreases and then increases continuously on $[1, d-1]$; computing the resulting integral is a straightforward partial fraction decomposition, and yields a bound of $O\left(\frac{n}{e+1}\right)$. So the runtime for all $d + 1$ iterations is $O\left(\frac{n^{m+2}}{e+1}\, \mathrm{polylog}\, n\right)$, and hence the runtime for both parts of the algorithm together is just $O\left(\frac{n^{m+2}}{e+1}\, \mathrm{polylog}\, n\right)$.

Base Case
The algorithm for $m = 2$ is almost identical to that for general $m$, except that we decode $f_i(X)$ to a degree-$i$ polynomial within the larger radius $\frac{n}{2}\left(1 - \frac{i + \beta\sqrt{n}}{n}\right)$ to get $Q_i(X)$. Note that this radius is still less than half the minimum distance of the Reed-Solomon code of degree $i$. The correctness of the algorithm follows from the fact that $P_i$ is still the unique polynomial within distance $\frac{n}{2}\left(1 - \frac{i + \beta\sqrt{n}}{n}\right)$ of $f_i$. We can again analyze the runtime of the two parts of the algorithm. The runtime for finding the $f_i$ follows the same analysis as before and is $O\left(\frac{n^3}{e+1}\, \mathrm{polylog}\, n\right)$. For decoding the $f_i$, we simply call the FAST-RS-DECODER for $d + 1$ different values of $i$; this has a runtime of $O(dn\, \mathrm{polylog}\, n) \le O(n^2\, \mathrm{polylog}\, n)$. So we get a total runtime of $O\left(\frac{n^3}{e+1}\, \mathrm{polylog}\, n\right)$.
The algorithm for general Reed-Muller decoding follows the same strategy as the algorithm for Reed-Muller decoding with uncertainties to a radius less than half the minimum distance. Recall that to get the $f_i$ in that algorithm, we only needed to Reed-Solomon decode to a radius significantly less than half the minimum distance; we then saved on the number of Reed-Solomon decodings by instead decoding to half the minimum distance and reusing that decoding for many iterations. We now want to Reed-Muller decode to nearly half the minimum distance, and using the same algorithm does not save enough Reed-Solomon decodings to achieve near-linear time. However, when there are no uncertainties in the original received word, we can list decode efficiently to a radius significantly larger than half the minimum distance. We then use the lists for many iterations to generate the $f_i$ before list decoding again.
Proof of Theorem 6.1. In the case where the number of variables is 2, we are in the setting of decoding bivariate Reed-Muller codes to near half the minimum distance, which can be done in near-linear time by Theorem 5.1. Assume now that m ≥ 2 and that we have a Reed-Muller code in m + 1 variables.
As before, we want to show that $Q_i(X) = P_i(X)$. It is enough to show that $\Delta(f_i(X), P_i(X)) < \frac{n^m}{2}\left(1 - \frac{i + m\beta\sqrt{n}}{n}\right)$. We can use an analysis of $\Delta(f_i, P_i)$ nearly identical to the one in Theorem 6.2, with the list-decoding guarantee playing the role of the decoding guarantee there; carrying it through yields exactly this bound.

Analysis of Runtime
Decoding the $f_i$ over the $d + 1$ values of $i$ can be done in $O(n^{m+1}\, \mathrm{polylog}\, n)$ time, following the same runtime analysis as in Theorem 6.2. For constructing the $f_i$, we do $O(n^m)$ Reed-Solomon list decodings taking $O(n\, \mathrm{polylog}\, n)$ time each. Within any given list, we need to compute uncertainties for each element of the list; this also takes $O(n\, \mathrm{polylog}\, n)$ time per list. Finally, we update the lists at each iteration by identifying the elements with the correct leading coefficient and subtracting off their leading terms. Since the list size is constant and there are $O(n^m)$ lists to update in each iteration, the updating takes $O(n^m d) = O(n^{m+1})$ time over the $d + 1$ iterations. Hence the total runtime is $O(n^{m+1}\, \mathrm{polylog}\, n)$, as desired.

Open Problems
We conclude with some open problems.
1. The problem of list-decoding multivariate polynomial codes up to the Johnson radius is a very interesting problem left open by our work. Generalizing our approach seems to require progress on another very interesting open problem, that of list-decoding Reed-Solomon concatenated codes. See [16] for the state of the art on this problem.
2. It would be interesting to understand the relationship between our algorithms and the $(m+1)$-variate interpolation-based list-decoding algorithm of Sudan [9]. The decoding radii of the two are incomparable, and perhaps some insight can be gained here into the polynomial method, which is known to face difficulties in more than two dimensions.
3. It would be interesting to see if one can decode multiplicity codes [17] on arbitrary product sets up to half their minimum distance. Here too, we know algorithms that decode up to half the minimum distance only in the case when $S$ is very algebraically special (from [18]), or when the degree $d$ is very small compared to $|S|$ (via an $(m+1)$-variate interpolation algorithm, similar to [9]).
We say that a point is an erasure if it is erased by the algorithm. We say that a point $(\alpha_i, \beta_i)$ is an error if $(\alpha_i, \beta_i)$ is not an erasure and $f(\alpha_i) \ne \beta_i$. Let $E$ be the number of errors, and let $F$ be the number of erasures. As the remaining $n - F$ points form a Reed-Solomon code of block length $n - F$ and degree $d$, the errors-and-erasures decoder outputs $f$ as long as $2E + F < n - d$.
We will use Chebyshev's inequality to show that $2E + F < n - d$ with probability at least $\frac{3}{4}$. To help us compute the expectation and variance of $2E + F$, we write $E$ and $F$ as sums of indicator random variables.
We can then show that $\mathbb{E}[2E + F]$ is less than $n - d$ by a significant amount, $\sqrt{n}$. Finally, we show that $\mathrm{Var}(2E + F)$ is small. By Chebyshev's inequality, $\Pr(2E + F \ge n - d) \le \frac{1}{4}$, hence $\Pr(2E + F < n - d) \ge \frac{3}{4}$; that is, with probability at least $\frac{3}{4}$, the algorithm outputs $f$. We now analyze the runtime of our fast Reed-Solomon decoder. The erasures can be performed in $O(n)$ time. Also, as the EE-DECODER is essentially a Reed-Solomon decoder to half the minimum distance, it runs in time $O(n\, \mathrm{polylog}\, n)$ [11, 12]. This gives a total runtime of $O(n\, \mathrm{polylog}\, n)$.
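A minimal sketch of this randomized GMD decoder in Python (our own illustration; ee_decode stands in for the black-box errors-and-erasures decoder EE-DECODER, and the uniform threshold rule is an assumption of this sketch, chosen so that a point with uncertainty $u$ is erased with probability $u$):

```python
import random

def fast_rs_decode_with_uncertainties(weighted_word, d, ee_decode, trials=32):
    """Probabilistic GMD sketch. `weighted_word` is a list of (alpha, beta, u)
    triples; `ee_decode(points, d)` is an assumed errors-and-erasures decoder
    that takes the non-erased (alpha, beta) points and returns a polynomial of
    degree <= d, or None. A single trial succeeds with constant probability;
    independent repetition amplifies this toward 1 - 1/n^{Omega(1)}."""
    for _ in range(trials):
        theta = random.random()               # random erasure threshold in [0, 1]
        # Erase point i exactly when u_i > theta, i.e., with probability u_i.
        kept = [(a, b) for (a, b, u) in weighted_word if u <= theta]
        f = ee_decode(kept, d)
        if f is not None:
            return f   # caller should verify f against the weighted distance bound
    return None
```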
Note that by repeating the algorithm $\Omega(\log n)$ times, we find the unique codeword in $O(n\, \mathrm{polylog}\, n)$ time with probability $1 - 1/n^{\Omega(1)}$ (the $\Omega(1)$ can be chosen to be an arbitrary constant).