Department of Computer Science

University of California, Santa Barbara

Abstract

Duplicate Detection in Click Streams

by: Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi

Abstract:

We consider the problem of finding duplicates in data streams. Duplicate detection in data streams is used in various applications, including fraud detection. We develop a solution based on Bloom Filters, and discuss the space and time requirements for running the proposed algorithm in the contexts of both sliding and landmark stream windows. We run a comprehensive set of experiments, using both real and synthetic click streams, to evaluate the performance of the proposed solution. The results demonstrate that the proposed solution yields extremely low error rates.
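
The abstract does not spell out the algorithm, so the following is only a generic sketch of the underlying idea, not the authors' method: a plain Bloom filter that flags a click as a probable duplicate when all of its bit positions are already set, with the filter cleared at each landmark (here assumed to be every window_size clicks). The names BloomFilter, add_if_new, and flag_duplicates, the default sizes, and the SHA-256 double-hashing scheme are all illustrative assumptions. Sliding windows need expiry of old elements, which this sketch does not attempt.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter (illustrative; not the structure from the report)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: derive k positions from one SHA-256 digest by double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add_if_new(self, item):
        """Set the item's bits; return True if all were already set."""
        seen = True
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not (self.bits[byte] & mask):
                seen = False
                self.bits[byte] |= mask
        return seen


def flag_duplicates(clicks, window_size, num_bits=1 << 16, num_hashes=4):
    """Flag probable duplicates within each landmark window of window_size clicks."""
    flags = []
    bloom = BloomFilter(num_bits, num_hashes)
    for i, click in enumerate(clicks):
        if i > 0 and i % window_size == 0:
            # Landmark reached: discard all state and start a fresh filter.
            bloom = BloomFilter(num_bits, num_hashes)
        flags.append(bloom.add_if_new(click))
    return flags
```

Calling flag_duplicates(["a", "b", "a", "c", "b", "a"], window_size=3) flags the second "a" because it repeats inside the first window, while "b" and "a" in the second window are treated as new because the filter was reset at the landmark. A Bloom filter can report false positives (a new click flagged as a duplicate) but never false negatives within a window; the error rate depends on num_bits, num_hashes, and the number of distinct clicks per window.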

Keywords:

Data Streams, Duplicate Detection, Bloom Filters, Advertising Networks, Sliding Windows, Landmark Windows

Date:

September 2004

Document: 2004-23

Updated 14-Nov-2005