Abstract
Predicting the Performance of Index Structures for High-Dimensional Datasets
by: C. Lang and A. Singh
Abstract:
We present two new models for predicting the number of index page accessesduring nearest neighbor queries for high-dimensional datasets, a density-basedmodel that depends weakly on the underlying index structure and results incoarser predictions, and a sampling-based model that depends strongly on theunderlying index structure and gives more accurate predictions. We givedetailed evaluations of both models and validate them experimentally byconsidering a number of synthetic and real datasets. We analyze the error inthe models and illustrate how it can be minimized. As an application of ourmodels, we show how the sampling-based model can be used to determine theoptimal number of index dimensions for a high-dimensional dataset. Sinceexisting models are tailored either to uniform datasets or to a small number ofdimensions, this research makes a significant contribution to the understandingof high-dimensional multimedia data.
Keywords:
databases, multidimensional data, performance evaluation, multimedia
Date:
November 1999
Document: 1999-34