Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality

by Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani

Theory of Computing, Volume 8(14), pp. 321-350, 2012

Bibliography with links to cited articles

[1]   Nir Ailon and Bernard Chazelle: The fast Johnson–Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput., 39(1):302–322, 2009. Preliminary version in STOC’06. [doi:10.1137/060673096]

[2]   Nir Ailon and Bernard Chazelle: Faster dimension reduction. Commun. ACM, 53(2):97–104, 2010. [doi:10.1145/1646353.1646379]

[3]   Alexandr Andoni and Piotr Indyk: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Proc. 47th FOCS, pp. 459–468. IEEE Comp. Soc. Press, 2006. [doi:10.1109/FOCS.2006.49]

[4]   Alexandr Andoni and Piotr Indyk: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117–122, 2008. [doi:10.1145/1327452.1327494]

[5]   Alexandr Andoni, Piotr Indyk, and Robert Krauthgamer: Earth mover distance over high-dimensional spaces. In Proc. 19th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’08), pp. 343–352. ACM Press, 2008. [ACM:1347082.1347120]

[6]   Alexandr Andoni, Piotr Indyk, and Mihai Pătraşcu: On the optimality of the dimensionality reduction method. In Proc. 47th FOCS, pp. 449–458. IEEE Comp. Soc. Press, 2006. [doi:10.1109/FOCS.2006.56]

[7]   Sunil Arya and Theocharis Malamatos: Linear-size approximate Voronoi diagrams. In Proc. 13th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’02), pp. 147–155. ACM Press, 2002. [ACM:545400]

[8]   Sunil Arya, Theocharis Malamatos, and David M. Mount: Space-time tradeoffs for approximate nearest neighbor searching. J. ACM, 57(1):1:1–1:54, 2009. [doi:10.1145/1613676.1613677]

[9]   Sunil Arya and David M. Mount: Approximate nearest neighbor queries in fixed dimensions. In Proc. 4th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’93), pp. 271–280. ACM Press, 1993. [ACM:313768]

[10]   Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, and Angela Y. Wu: An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. J. ACM, 45(6):891–923, 1998. Preliminary version in SODA’94. [doi:10.1145/293347.293348]

[11]   Marshall W. Bern: Approximate closest-point queries in high dimensions. Inform. Process. Lett., 45(2):95–99, 1993. [doi:10.1016/0020-0190(93)90222-U]

[12]   Allan Borodin, Rafail Ostrovsky, and Yuval Rabani: Subquadratic approximation algorithms for clustering problems in high dimensional spaces. Machine Learning, 56(1-3):153–167, 2004. Preliminary version in STOC’99. [doi:10.1023/B:MACH.0000033118.09057.80]

[13]   Andrei Z. Broder: On the resemblance and containment of documents. In Proc. Conf. Compres. Complex. Sequences (CCCS’97), pp. 21–29, 1997. [doi:10.1109/SEQUEN.1997.666900]

[14]   Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig: Syntactic clustering of the web. Computer Networks and ISDN Systems, 29(8-13):1157–1166, 1997. (Proc. 6th Int. World Wide Web Conf. (WWW’97)). [doi:10.1016/S0169-7552(97)00031-7]

[15]   Jeremy Buhler: Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 17(5):419–428, 2001. [doi:10.1093/bioinformatics/17.5.419]

[16]   Jeremy Buhler and Martin Tompa: Finding motifs using random projections. J. Computational Biology, 9(2):225–242, 2002. Preliminary version in RECOMB’01. [doi:10.1089/10665270252935430]

[17]   Andrea Califano and Isidore Rigoutsos: FLASH: A fast look-up algorithm for string homology. In Proc. 1st Int. Conf. on Intelligent Systems for Molecular Biology (ISMB’93), pp. 56–64, 1993. [ACM:645630.662865]

[18]   Paul B. Callahan and S. Rao Kosaraju: A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. J. ACM, 42(1):67–90, 1995. Preliminary version in STOC’92. [doi:10.1145/200836.200853]

[19]   Timothy M. Chan: Approximate nearest neighbor queries revisited. Discrete Comput. Geom., 20(3):359–373, 1998. Preliminary version in SCG’97. [doi:10.1007/PL00009390]

[20]   Moses Charikar and Robert Krauthgamer: Embedding the Ulam metric into 1. Theory of Computing, 2(1):207–224, 2006. [doi:10.4086/toc.2006.v002a011]

[21]   Moses S. Charikar: Similarity estimation techniques from rounding algorithms. In Proc. 34th STOC, pp. 380–388. ACM Press, 2002. [doi:10.1145/509907.509965]

[22]   Kenneth L. Clarkson: A randomized algorithm for closest-point queries. SIAM J. Comput., 17(4):830–847, 1988. [doi:10.1137/0217052]

[23]   Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein: Introduction to Algorithms. MIT Press, 2nd edition, 2001.

[24]   Graham Cormode: Sequence Distance Embeddings. Ph.D. Thesis. University of Warwick, 2003.

[25]   Graham Cormode, S. Muthukrishnan, and Süleyman Cenk Sahinalp: Permutation editing and matching via embeddings. In Proc. 28th Internat. Colloq. on Automata, Languages and Programming (ICALP’01), pp. 481–492, 2001. [doi:10.1007/3-540-48224-5_40]

[26]   Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni: Locality-sensitive hashing scheme based on p-stable distributions. In Proc. 20th Ann. Symp. on Computational Geometry (SCG’04), pp. 253–262, 2004. [doi:10.1145/997817.997857]

[27]   Debojyoti Dutta, Rajarshi Guha, Peter C. Jurs, and Ting Chen: Scalable partitioning and exploration of chemical spaces using geometric hashing. Journal of Chemical Information and Modeling, 46(1):321–333, 2006. [doi:10.1021/ci050403o]

[28]   Kristen Grauman and Trevor Darrell: The pyramid match kernel: Efficient learning with sets of features. J. Machine Learning Research, 8:725–760, 2007. Preliminary version in ICCV’05. [ACM:1314524]

[29]   Dan Greene, Michal Parnas, and Frances Yao: Multi-index hashing for information retrieval. In Proc. 35th FOCS, pp. 722–731. IEEE Comp. Soc. Press, 1994. [doi:10.1109/SFCS.1994.365720]

[30]   Sariel Har-Peled: A replacement for Voronoi diagrams of near linear size. In Proc. 42nd FOCS, pp. 94–103. IEEE Comp. Soc. Press, 2001. [doi:10.1109/SFCS.2001.959884]

[31]   Sariel Har-Peled: Geometric Approximation Algorithms. Amer. Math. Soc., 2011.

[32]   Sariel Har-Peled and Soham Mazumdar: On coresets for k-means and k-median clustering. In Proc. 36th STOC, pp. 291–300. ACM Press, 2004. [doi:10.1145/1007352.1007400]

[33]   Taher H. Haveliwala, Aristides Gionis, Dan Klein, and Piotr Indyk: Evaluating strategies for similarity search on the web. In Proc. 11th Internat. Conf. on World Wide Web (WWW’02), pp. 432–442. ACM Press, 2002. [doi:10.1145/511446.511502]

[34]   Piotr Indyk: Nearest neighbors in high-dimensional spaces. In J. E. Goodman and J. O’Rourke, editors, Handbook of Discrete and Computational Geometry, chapter 39, pp. 877–892. CRC Press LLC, 2nd edition, 2004.

[35]   Piotr Indyk and Rajeev Motwani: Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. 30th STOC, pp. 604–613. ACM Press, 1998. [doi:10.1145/276698.276876]

[36]   Piotr Indyk and Rajeev Motwani: Approximate nearest neighbors: towards removing the curse of dimensionality. A final version of the STOC’98 paper, available at http://people.csail.mit.edu/indyk/nndraft.ps, 1999.

[37]   Piotr Indyk and Nitin Thaper: Fast color image retrieval via embeddings. In Work. Statis. Comput. Theo. Vision, 2003. Held at ICCV’03. [PDF].

[38]   William B. Johnson and Joram Lindenstrauss: Extensions of Lipschitz mapping into Hilbert space. Contemporary Mathematics, 26:189–206, 1984.

[39]   William B. Johnson and Gideon Schechtman: Embedding lpm into l1n. Acta Mathematica, 149:71–85, 1982. [doi:10.1007/BF02392350]

[40]   Richard M. Karp, Orli Waarts, and Geoffrey Zweig: The bit vector intersection problem. In Proc. 36th FOCS, pp. 621–630. IEEE Comp. Soc. Press, 1995. [doi:10.1109/SFCS.1995.492663]

[41]   Subhash Khot and Assaf Naor: Nonembeddability theorems via Fourier analysis. Mathematische Annalen, 334(4):821–852, 2006. Preliminary version in FOCS’05. [doi:10.1007/s00208-005-0745-0]

[42]   Jon M. Kleinberg: Two algorithms for nearest-neighbor search in high dimensions. In Proc. 29th STOC, pp. 599–608. ACM Press, 1997. [doi:10.1145/258533.258653]

[43]   Eyal Kushilevitz, Rafail Ostrovsky, and Yuval Rabani: Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput., 30(2):457–474, 2000. Preliminary version at STOC’98. [doi:10.1137/S0097539798347177]

[44]   Nathan Linial, Eran London, and Yuri Rabinovich: The geometry of graphs and some of its algorithmic applications. Combinatorica, 15(2):215–245, 1995. Preliminary version in FOCS’94. [doi:10.1007/BF01200757]

[45]   Richard J. Lipton and Robert Endre Tarjan: Applications of a planar separator theorem. SIAM J. Comput., 9(3):615–627, 1980. [doi:10.1137/0209046]

[46]   Stefan Meiser: Point location in arrangements of hyperplanes. Inform. and Comput., 106(2):286–303, 1993. [doi:10.1006/inco.1993.1057]

[47]   Marvin Minsky and Seymour Papert: Perceptrons. MIT Press, Cambridge, MA, 1969.

[48]   Rajeev Motwani, Assaf Naor, and Rina Panigrahy: Lower bounds on locality sensitive hashing. SIAM J. Discrete Math., 21(4):930–935, 2007. Preliminary version at SCG’06. [doi:10.1137/050646858]

[49]   S. Muthukrishnan and Süleyman Cenk Sahinalp: Approximate nearest neighbors and sequence comparison with block operations. In Proc. 32nd STOC, pp. 416–424. ACM Press, 2000. [doi:10.1145/335305.335353]

[50]   Ryan O’Donnell, Yi Wu, and Yuan Zhou: Optimal lower bounds for locality sensitive hashing (except when q is tiny). In Proc. 2nd Symp. Innovations in Comput. Sci. (ICS’11), pp. 275–283, 2011. [ICS’11], [arXiv].

[51]   Rafail Ostrovsky and Yuval Rabani: Low distortion embeddings for edit distance. J. ACM, 54(5), 2007. Preliminary version in STOC’05. [doi:10.1145/1284320.1284322]

[52]   Rina Panigrahy: Entropy based nearest neighbor search in high dimensions. In Proc. 17th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’06), pp. 1186–1195. ACM Press, 2006. [doi:10.1145/1109557.1109688]

[53]   Rina Panigrahy, Kunal Talwar, and Udi Wieder: Lower bounds on near neighbor search via metric expansion. In Proc. 51st FOCS, pp. 805–814. IEEE Comp. Soc. Press, 2010. [doi:10.1109/FOCS.2010.82]

[54]   Ramamohan Paturi, Sanguthevar Rajasekaran, and John Reif: The light bulb problem. Inform. and Comput., 117(2):187–192, 1995. Preliminary version in COLT’89. [doi:10.1006/inco.1995.1038]

[55]   Gilles Pisier: The volume of convex bodies and Banach space geometry. Cambridge University Press, 1989.

[56]   Deepak Ravichandran, Patrick Pantel, and Eduard Hovy: Randomized algorithms and NLP: Using locality sensitive hash functions for high speed noun clustering. In Proc. Conf. 43rd Ann. Meeting of the Association for Computational Linguistics (ACL’05), 2005. [ACM:1219917]

[57]   Yogish Sabharwal, Nishant Sharma, and Sandeep Sen: Nearest neighbors search using point location in balls with applications to approximate Voronoi decompositions. J. Comput. System Sci., 72(6):955–977, 2006. Preliminary version at FSTTCS’02. [doi:10.1016/j.jcss.2006.01.007]

[58]   Gregory Shakhnarovich, Trevor Darrell, and Piotr Indyk, editors. Nearest Neighbor Methods in Learning and Vision. Neural Processing Information Series, MIT Press, 2006.

[59]   Michael Ian Shamos and Dan Hoey: Closest-point problems. In Proc. 16th FOCS, pp. 151–162. IEEE Comp. Soc. Press, 1975. [doi:10.1109/SFCS.1975.8]

[60]   Richard Weber, Hans-J. Schek, and Stephen Blott: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proc. 24th Internat. Conf. on Very Large Data Bases (VLDB’98), pp. 194–205, 1998. [ACM:671192]