Abstract
Accelerating High-dimensional Nearest Neighbor Queries
by: Christian A. Lang, Ambuj K. Singh
Abstract:
The performance of nearest neighbor (NN) queries degrades noticeablywith increasing dimensionality of the data. This stems not only from reducedselectivity of high-dimensional data but also from an increased number of seekoperations during NN-query execution.If the NN-radii would be known in advance, the disk accesses could be reorderedsuch that seek operations are minimized. We therefore propose a new way ofestimating the NN-radius based on the fractal dimensionality and sampling. We show that the estimation error is considerably lower than for previous approaches. Since it is possible that the radius is underestimated, we proposea way of finding a close upper bound on the radius based on a given radius estimate. This technique is applicable to any page-based index structure. Weshow that this upper bound is close to the correct radius.In the second part of the paper, we present two applications of these findings.We show how the radius estimations can be used to transform k-NN queriesinto at most two range queries, and how it can be used to reduce the number of page reads during all-NN queries. In both cases, we observe significant speedups over traditional techniques for synthetic and real-world data.
Keywords:
nearest neighbor query, radius estimation, high-dimensional
Date:
29 January 2002
Document: 2002-04