Obligatory task 1
Develop a parallel program that multiplies large double-precision matrices
A and B ( C = AxB ) distributed over a rectangular mesh of processors.
Choose the message-passing algorithm that you consider best, for example
Cannon's algorithm or the DNS algorithm. (Do not copy Pacheco's implementation
of Fox's algorithm.) Use the MPI communication library.
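As an illustration of the data movement in Cannon's algorithm, here is a minimal serial sketch in C that simulates a q x q process grid inside one process. It is not an MPI program and not a reference solution: the function names (cannon_simulate, get_block, put_block, block_multiply_add, slot) are hypothetical, the matrices are assumed square and row-major with n divisible by q, and the plain loop in block_multiply_add only marks the place where a local BLAS call would go. In an MPI version the two shift loops would typically become exchanges between neighbours on a Cartesian communicator (for example MPI_Cart_create, MPI_Cart_shift and MPI_Sendrecv_replace).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Offset of the block owned by simulated process (i, j) on a q x q grid. */
static size_t slot(int q, size_t bsz, int i, int j)
{
    assert(i >= 0 && i < q && j >= 0 && j < q);
    return ((size_t)i * q + j) * bsz;
}

/* Copy block (bi, bj) of the n x n row-major matrix M into a contiguous b x b buffer. */
static void get_block(int n, int b, const double *M, int bi, int bj, double *blk)
{
    for (int r = 0; r < b; r++)
        memcpy(blk + (size_t)r * b,
               M + ((size_t)bi * b + r) * n + (size_t)bj * b, b * sizeof(double));
}

/* Copy a contiguous b x b buffer back into block (bi, bj) of M. */
static void put_block(int n, int b, double *M, int bi, int bj, const double *blk)
{
    for (int r = 0; r < b; r++)
        memcpy(M + ((size_t)bi * b + r) * n + (size_t)bj * b,
               blk + (size_t)r * b, b * sizeof(double));
}

/* C += A * B for b x b row-major blocks (stand-in for a local BLAS call). */
static void block_multiply_add(int b, const double *A, const double *B, double *C)
{
    for (int i = 0; i < b; i++)
        for (int k = 0; k < b; k++) {
            double aik = A[i * b + k];
            for (int j = 0; j < b; j++)
                C[i * b + j] += aik * B[k * b + j];
        }
}

/* Simulate Cannon's algorithm; C is overwritten with A * B.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int cannon_simulate(int n, int q, const double *A, const double *B, double *C)
{
    if (n <= 0 || q <= 0 || n % q != 0)
        return -1;
    int b = n / q;
    size_t bsz = (size_t)b * b, total = (size_t)q * q * bsz;
    double *a = malloc(total * sizeof *a);
    double *bm = malloc(total * sizeof *bm);
    double *c = calloc(total, sizeof *c);
    double *tmp = malloc(total * sizeof *tmp);
    if (!a || !bm || !c || !tmp) {
        free(a); free(bm); free(c); free(tmp);
        return -1;
    }
    /* Initial skew: process (i, j) starts with A(i, i+j) and B(i+j, j), indices mod q. */
    for (int i = 0; i < q; i++)
        for (int j = 0; j < q; j++) {
            int k = (i + j) % q;
            get_block(n, b, A, i, k, a + slot(q, bsz, i, j));
            get_block(n, b, B, k, j, bm + slot(q, bsz, i, j));
        }
    for (int step = 0; step < q; step++) {
        /* Local multiply-accumulate on every simulated process. */
        for (int i = 0; i < q; i++)
            for (int j = 0; j < q; j++)
                block_multiply_add(b, a + slot(q, bsz, i, j),
                                   bm + slot(q, bsz, i, j), c + slot(q, bsz, i, j));
        /* Shift A blocks one step left along each grid row. */
        for (int i = 0; i < q; i++)
            for (int j = 0; j < q; j++)
                memcpy(tmp + slot(q, bsz, i, j),
                       a + slot(q, bsz, i, (j + 1) % q), bsz * sizeof(double));
        memcpy(a, tmp, total * sizeof(double));
        /* Shift B blocks one step up along each grid column. */
        for (int i = 0; i < q; i++)
            for (int j = 0; j < q; j++)
                memcpy(tmp + slot(q, bsz, i, j),
                       bm + slot(q, bsz, (i + 1) % q, j), bsz * sizeof(double));
        memcpy(bm, tmp, total * sizeof(double));
    }
    /* Gather the result blocks into C. */
    for (int i = 0; i < q; i++)
        for (int j = 0; j < q; j++)
            put_block(n, b, C, i, j, c + slot(q, bsz, i, j));
    free(a); free(bm); free(c); free(tmp);
    return 0;
}
```

Calling cannon_simulate(4, 2, A, B, C) on 4 x 4 arrays simulates a 2 x 2 grid with 2 x 2 blocks. After q steps every simulated process has accumulated all q block products that contribute to its block of C.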
You must write a report including:
Detailed description of your algorithm with analysis of its efficiency
Detailed description of your code
Source listing of the code with detailed comments
Description of tests with graphs of performance in Megaflops (run on several
processor grids and with several matrix sizes)
Report on the best Megaflop rate per node attained with your code
Make your source files accessible through the network
HINTS to reach maximum performance:
1. If you want to get maximum performance, you should code all local
(on each node) matrix operations in terms of the BLAS library (linked with
2. Look at the local
page for compiler switches to use.
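For the Megaflop figures requested in the report, the rate can be derived from the matrix size and the measured wall-clock time. The sketch below assumes square n x n matrices and the conventional count of 2*n^3 floating-point operations per multiplication; the assignment does not say which count to use, so treat that as an assumption. The function names (matmul_flops, mflops, mflops_per_node) are hypothetical. In the MPI program the elapsed time would typically be taken with MPI_Wtime() around the multiplication.

```c
#include <assert.h>

/* Conventional operation count for an n x n by n x n multiply:
 * about n^3 multiplications and n^3 additions (assumed convention). */
double matmul_flops(int n)
{
    double d = (double)n;
    return 2.0 * d * d * d;
}

/* Aggregate rate in Megaflops for one multiply that took `seconds`. */
double mflops(int n, double seconds)
{
    assert(seconds > 0.0);
    return matmul_flops(n) / (seconds * 1.0e6);
}

/* Rate per node, assuming the work is spread evenly over `nodes` processes. */
double mflops_per_node(int n, double seconds, int nodes)
{
    assert(nodes > 0);
    return mflops(n, seconds) / (double)nodes;
}
```

For example, a 1000 x 1000 multiplication that takes 2 seconds on 4 nodes gives mflops(1000, 2.0) = 1000 Megaflops in total and mflops_per_node(1000, 2.0, 4) = 250 Megaflops per node.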