Obligatory task 1
Develop a parallel program that multiplies large double-precision matrices
A and B ( C = AxB ) distributed over a rectangular mesh of processors.
Choose the message-passing algorithm that you consider best, for example,
Cannon's algorithm or the DNS algorithm. (Do not copy Pacheco's implementation
of Fox's algorithm.) Use the MPI communication library.
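
To make the block schedule of Cannon's algorithm concrete, here is a minimal serial sketch in C that only simulates it; it is not an MPI program and not a reference solution. It assumes square n x n row-major matrices and a virtual q x q process grid with q dividing n. The names cannon_simulate and block_multiply_add are hypothetical. In a real MPI code each virtual process would own one block of A, B and C, and the index arithmetic below would be replaced by actual shifts, typically done with MPI_Cart_shift and MPI_Sendrecv_replace.

```c
#include <assert.h>

/* C[bi][bj] += A[bi][bk] * B[bk][bj], where the indices select bs x bs
   blocks inside row-major n x n matrices. */
static void block_multiply_add(const double *a, const double *b, double *c,
                               int n, int bs, int bi, int bk, int bj)
{
    for (int i = 0; i < bs; i++) {
        for (int k = 0; k < bs; k++) {
            double aik = a[(bi * bs + i) * n + (bk * bs + k)];
            for (int j = 0; j < bs; j++) {
                c[(bi * bs + i) * n + (bj * bs + j)] +=
                    aik * b[(bk * bs + k) * n + (bj * bs + j)];
            }
        }
    }
}

/* Serial simulation of Cannon's algorithm on a virtual q x q grid.
   After the initial skew, process (pi, pj) holds block A[pi][(pi+pj) % q]
   and block B[(pi+pj) % q][pj]. Each step shifts A one position left and
   B one position up, so at a given step the process holds the blocks with
   inner index k = (pi + pj + step) % q. Over q steps every k is visited
   exactly once, which yields the full block product. */
void cannon_simulate(const double *a, const double *b, double *c, int n, int q)
{
    assert(q > 0 && n > 0 && n % q == 0);
    int bs = n / q;                      /* block size per virtual process */

    for (int i = 0; i < n * n; i++) {
        c[i] = 0.0;
    }
    for (int step = 0; step < q; step++) {
        for (int pi = 0; pi < q; pi++) {
            for (int pj = 0; pj < q; pj++) {
                int k = (pi + pj + step) % q;
                block_multiply_add(a, b, c, n, bs, pi, k, pj);
            }
        }
    }
}
```

Calling cannon_simulate(a, b, c, 4, 2) on 4 x 4 inputs treats them as a 2 x 2 grid of 2 x 2 blocks. Comparing the result with a plain triple loop is a cheap correctness check that carries over to the distributed version.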
You must write a report that includes:
- Detailed description of your algorithm with analysis of its efficiency
  and scalability
- Detailed description of your code
- Source listing of the code with detailed comments
- Description of tests with graphs of performance in Megaflops (run on several
  processor grids and with several matrix sizes)
- Report on the best Megaflop rate per node attained with your code
- Make your source files accessible through the network
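
For the performance items in the list above, the Megaflop rate can be derived from the matrix size and the measured wall-clock time. The sketch below assumes square n x n matrices and the conventional count of 2*n^3 floating-point operations for a dense product (n^3 multiplications and about n^3 additions). The function names are hypothetical. The seconds argument would typically come from MPI_Wtime, taking the longest time over all processes.

```c
#include <assert.h>

/* Conventional operation count for a dense n x n times n x n product. */
double matmul_flops(int n)
{
    double d = (double)n;
    return 2.0 * d * d * d;
}

/* Aggregate rate in Megaflops (10^6 floating-point operations per second). */
double mflops_total(int n, double seconds)
{
    assert(seconds > 0.0);
    return matmul_flops(n) / (seconds * 1.0e6);
}

/* Rate per node: aggregate rate divided by the number of nodes used. */
double mflops_per_node(int n, double seconds, int nodes)
{
    assert(nodes > 0);
    return mflops_total(n, seconds) / (double)nodes;
}
```

For example, n = 1000 multiplied in 2.0 seconds on 4 nodes gives 2*10^9 / (2.0 * 10^6) = 1000 Megaflops in total, or 250 Megaflops per node.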
HINTS to reach maximum performance:
1. Code all local (per-node) matrix operations in terms of the BLAS
library (linked with -lblas).
2. Look at the local page for compiler switches to use.
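
As an illustration of hint 1, the local block update C += A*B maps onto the BLAS level-3 routine DGEMM. The sketch below is one possible shape under stated assumptions: blocks are square, row-major and contiguous, and the CBLAS C interface (cblas.h, cblas_dgemm) is installed. Neither the interface nor the required link flags are specified by this handout, and they vary between systems. The function name local_block_gemm and the USE_CBLAS macro are hypothetical. Without USE_CBLAS the code falls back to a plain loop so that it still compiles anywhere.

```c
#include <assert.h>
#ifdef USE_CBLAS
#include <cblas.h>   /* assumption: CBLAS interface is available */
#endif

/* c += a * b for contiguous row-major bs x bs blocks. */
void local_block_gemm(int bs, const double *a, const double *b, double *c)
{
    assert(bs > 0);
#ifdef USE_CBLAS
    /* alpha = 1.0 and beta = 1.0 accumulate the product into c. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                bs, bs, bs, 1.0, a, bs, b, bs, 1.0, c, bs);
#else
    /* Portable fallback: i-k-j loop order keeps the inner loop unit-stride. */
    for (int i = 0; i < bs; i++) {
        for (int k = 0; k < bs; k++) {
            double aik = a[i * bs + k];
            for (int j = 0; j < bs; j++) {
                c[i * bs + j] += aik * b[k * bs + j];
            }
        }
    }
#endif
}
```

Keeping the local multiply behind a single function like this makes it easy to time the fallback loop against the BLAS call and to report the difference in the test section of the report.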