Abstract
PSI: Indexing Protein Structures for Fast Similarity Search
by: Orhan Camoglu, Tamer Kahveci, and Ambuj Singh
Abstract:
We consider the problem of finding similarities in protein structure databases. Our techniques extract feature vectors on triplets of SSEs (Secondary Structure Elements). These feature vectors are then indexed using a multidimensional index structure. Our first technique finds proteins similar to a query protein in a protein dataset. This technique quickly prunes unpromising proteins using the index structure. The remaining proteins are then aligned using a popular alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Our second technique considers the problem of joining two protein datasets to find all-to-all similarities. Experimental results show that our techniques speed up the pruning step of VAST by a factor of 3 to 3.5 while maintaining similar sensitivity.
Keywords:
Protein structures, feature vectors, indexing, dataset join
Date:
January 2003
Document: 2003-03