Abstract

Predicting the Performance of Index Structures for High-Dimensional Datasets

by: C. Lang and A. Singh

Abstract:

We present two new models for predicting the number of index page accessesduring nearest neighbor queries for high-dimensional datasets, a density-basedmodel that depends weakly on the underlying index structure and results incoarser predictions, and a sampling-based model that depends strongly on theunderlying index structure and gives more accurate predictions. We givedetailed evaluations of both models and validate them experimentally byconsidering a number of synthetic and real datasets. We analyze the error inthe models and illustrate how it can be minimized. As an application of ourmodels, we show how the sampling-based model can be used to determine theoptimal number of index dimensions for a high-dimensional dataset. Sinceexisting models are tailored either to uniform datasets or to a small number ofdimensions, this research makes a significant contribution to the understandingof high-dimensional multimedia data.

Keywords:

databases, multidimensional data, performance evaluation, multimedia

Date:

November 1999

Document: 1999-34

Department of Computer Science

University of California, Santa Barbara

Abstract

Predicting the Performance of Index Structures for High-Dimensional Datasets

by: C. Lang and A. Singh