Transparent Gif

Department of Computer Science

University of California, Santa Barbara

Abstract

TWIX: Approximate and Exact Twig Structure and Content Matching over XML Document Collections using Binary Labeling

by: S. Alireza Aghili, Hua-Gang Li, Divyakant Agrawal, and Amr El Abbadi

Abstract:

XML queries specify predicates on the content and the structure of the elements of tree-structured XML documents. Hence, discovering the occurrences of twig (tree structure) query patterns is a core operation for XML query processing. Prior works have typically applied top-down decomposition of the twig patterns into (i) binary (parent-child or ancestor-descendant) relationships, or (ii) path expression queries, followed by a join operation to reconstruct matched twig patterns. However, most of these methods (1) rely on the user\'s knowledge of the underlying database to pose well-formed queries, and (2) suffer from inspecting too many irrelevant results. In this paper, we propose a novel heuristic for matching of XML twig query patterns, named TWIX, which imposed minimal restrictions on the user and causes substantial reduction of the search space through a distributed binary labeling technique. The algorithm incorporates a holistic ranking scheme of structure and content, named TRANK, to rank and report the top-k results. Furthermore, TWIX benefits from an interactive graphical user interface twig query matching. Experimental results on real datasets depict the ranking semantics and efficient filtration of the search space.

Keywords:

XML, Twig Query, Structure and Content Search, Binary Labeling

Date:

May 2005

Document: 2005-10

XHTML Validation | CSS Validation
Updated 14-Nov-2005
Questions should be directed to: webmaster@cs.ucsb.edu