Department of Computer Science

University of California, Santa Barbara

Abstract

Fault-tolerance of Distributed Multithreaded Applications in Shared-Nothing Systems

by: J. James and A. Singh

Abstract:

The ubiquity of distributed systems has led to increasingly complex distributed applications. That complexity has been increased by multithreaded applications, shared-nothing environments like the Internet, and the use of nested transactions to access multiple sets of data atomically. Providing fault tolerance for such applications is complicated by the loss of the piecewise determinism assumption (due to multithreading), the necessity of replicating data (due to the shared-nothing environment), and the necessity of maintaining consistency for nested transactions.

Providing fault tolerance for such applications in an ad hoc manner is difficult. We explore a systematic approach to providing fault tolerance. We show that the assumption of data-race-freedom has some of the benefits of piecewise determinism, but allows multithreading. We develop a logical ring structure for the logging and recovery processes, and show how the ring simplifies those tasks. We discuss roll-forward versus roll-backward recovery and suggest the use of roll-forward techniques. Finally, we investigate a message combining technique that can reduce the number of messages sent during the logging process.

Keywords:

fault-tolerance, logging, recovery, multithreading

Date:

November 1999

Document: 1999-35

Updated 14-Nov-2005