Abstract
Run-time Techniques for Exploiting Irregular Task Parallelism onDistributed Memory Architectures
by: Cong Fu and Tao Yang
Abstract:
Automatic scheduling for directed acyclic graphs (DAG) and its applications forcoarse-grained irregular problems such as large n-body simulation have beenstudied in the literature. However solving irregular problems with mixedgranularities such as sparse matrix factorization is challenging since itrequires efficient run-time support for executing the DAG schedule. In thispaper, we investigate run-time compilation and supporting techniques forexecuting general asynchronous DAG schedules on distributed memory machines.Our solution tightly integrates the run-time scheme with a fast communicationmechanism to eliminate unnecessary overhead in message buffering and copying,and takes advantage of task dependence properties to ensure the correctness ofexecution. We demonstrate the applications of this scheme in sparse Choleskyand LU factorizations for which actual speedups have been hard to obtain in theliterature. Our experiments on Meiko CS-2 show that the automaticallyscheduled code achieves scalable performance for these problems and therun-time overhead is small compared to the total execution time.
Keywords:
Directed acyclic graphs, task scheduling, schedule execution,sparse Cholesky factorization, sparse LU factorization.
Date:
November 1995
Document: 1995-21