##
Approximations for the General Block Distribution of a Matrix

####
Abstract

The general block distribution of a matrix is a rectilinear partition of
the matrix into orthogonal blocks such that the maximum sum of the elements
within a single block is minimized. This corresponds to partitioning the
matrix onto parallel processors so as to minimize processor load while
maintaining regular communication patterns. Applications of the problem
include various parallel sparse matrix computations, compilers for high-performance
languages, particle in cell computations, video and image compression,
and simulations associated with a communication network. We analyze the
performance guarantee of a natural and practical heuristic based on iterative
refinement, which has previously been shown to give good empirical results.
When $p^2$ is the number of blocks, we show that the tight performance
ratio is $\theta(\sqrt{p})$. When the matrix has rows of large cost, the
details of the objective function of the algorithm are shown to be important,
since a naive implementation can lead to a $\Omega(p)$ performance ratio.
Extensions to more general cost functions, higher-dimensional arrays, and
randomized initial configurations are also considered. In comparison, Khanna
et al.\ have shown the problem to be approximable within $O(1)$ factor
\cite{KMS97}, while Grigni and Manne have shown it to be NP-hard to approximate
within any constant factor less than two \cite{GM96}.