Department of Computer Science

University of California, Santa Barbara

Abstract

Using Transformation Techniques Towards Efficient Filtration of String Proximity Search of Biological Sequences

by: S. Alireza Aghili, Divyakant Agrawal, Amr El Abbadi

Abstract:

We address the problem of proximity search in biological databases. We study vector transformations and apply two dimensionality reduction techniques, the Discrete Fourier Transformation (DFT) and the Discrete Wavelet Transformation (DWT, Haar), to DNA sequence proximity search in order to reduce the search time of range queries. Our empirical results on a number of Prokaryote and Eukaryote DNA contig databases demonstrate up to a 50-fold filtration ratio of the search space and up to 13 times faster filtration. The proposed transformation techniques may easily be integrated as a preprocessing phase on top of existing similarity search heuristics such as BLAST, PatternHunter, FASTA, and QUASAR to efficiently prune non-relevant sequences. We study the precision of applying dimensionality reduction techniques for faster and more efficient range query searches, and discuss the trade-offs they impose.
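
To illustrate the general idea of transform-based filtration for range queries, here is a minimal Python sketch. It is not the authors' implementation. It assumes the sequences have already been mapped to numeric vectors of equal, power-of-two length (the abstract does not describe that mapping) and that proximity is measured by Euclidean distance on those vectors. The names haar, dist, and range_query are hypothetical. The sketch relies on a standard property: an orthonormal Haar transform preserves Euclidean distance, so the distance over the first k coefficients never exceeds the true distance, and the filter step cannot discard a true match.

```python
import math

def haar(vec):
    """Orthonormal Haar wavelet transform; len(vec) must be a power of two."""
    out = [float(x) for x in vec]
    n = len(out)
    while n > 1:
        half = n // 2
        # Scaled pairwise sums (coarse part) and differences (detail part).
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + diff
        n = half  # recurse on the coarse part only
    return out

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(query, database, radius, k):
    """Filter on the first k Haar coefficients, then verify on full vectors."""
    q_reduced = haar(query)[:k]
    # Filtration: the reduced distance is a lower bound on the true distance,
    # so any vector rejected here is guaranteed to be outside the radius.
    candidates = [v for v in database if dist(haar(v)[:k], q_reduced) <= radius]
    # Verification: remove false positives that passed the filter.
    return [v for v in candidates if dist(v, query) <= radius]
```

In practice the reduced vectors of the database would be computed once and stored, so that only the query is transformed at search time; the sketch recomputes them for brevity. A smaller k gives a cheaper filter but admits more false positives into the verification step, which is the kind of trade-off the abstract refers to.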

Keywords:

Approximate String Search, Dimensionality Reduction, Transformation, Range Query, Bioinformatics, Databases

Date:

January 2003

Document: 2003-01

Updated 14-Nov-2005