Abstract

Vector Approximation based Indexing for Non-uniform High Dimensional Data Sets

by: H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, and A. El Abbadi

Abstract:

With the proliferation of multimedia data, there is an increasing need to support the indexing and searching of high dimensional data. Recently, a vector approximation based technique called the VA-file has been proposed for indexing high dimensional data. It has been shown that the VA-file is an effective technique compared to the current approaches based on space and data partitioning. The VA-file performs well especially when the data set is uniformly distributed. Real data sets, however, are not uniformly distributed, are often clustered, and the dimensions of their feature vectors are usually correlated. More careful analysis of non-uniform or correlated data is therefore needed to index high dimensional data effectively. We address these problems by proposing a new technique for indexing high dimensional data sets based on vector approximations. We conclude with an evaluation of nearest neighbor queries and show that the proposed technique yields significant improvements over the current VA-file approach for several real data sets.

Keywords:

indexing, vector approximation, high dimensional data, non-uniform data.

Date:

May 2000

Document: 2000-10

Department of Computer Science

University of California, Santa Barbara
