I236 Obligatory Assignment 1

Develop a parallel program that multiplies two large double-precision matrices A and B (C = A x B) distributed over a rectangular mesh of processors. Choose the message-passing algorithm that you consider best, for example Cannon's algorithm or the DNS algorithm. (Do not copy Pacheco's implementation of Fox's algorithm.) Use the MPI communication library.
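To make the block schedule of Cannon's algorithm concrete, here is a minimal single-process sketch in C. It is not an MPI program and not a solution to the task: it only simulates, in one address space, which blocks each process of a q x q grid would multiply at each step. The function name cannon_simulate, the row-major layout, and the assumption that the matrix order n equals q*b (b being the block size) are illustrative choices, not part of the assignment.

```c
#include <assert.h>

/*
 * Serial simulation of the Cannon schedule on a q x q process grid.
 * A, B, C are n x n row-major matrices with n = q * b (b = block size).
 * After the initial skew, process (i,j) holds A block (i,k) and
 * B block (k,j) with k = (i + j) mod q. Each later step shifts A one
 * position left and B one position up, so at step s the process works
 * on k = (i + j + s) mod q. No data is actually moved here; the shift
 * is expressed through the index k only.
 */
void cannon_simulate(int q, int b, const double *A, const double *B, double *C)
{
    assert(q > 0 && b > 0);
    int n = q * b;

    for (int x = 0; x < n * n; x++)
        C[x] = 0.0;

    for (int s = 0; s < q; s++) {              /* q multiply-and-shift steps */
        for (int i = 0; i < q; i++) {          /* grid row of the process    */
            for (int j = 0; j < q; j++) {      /* grid column of the process */
                int k = (i + j + s) % q;       /* block index held at step s */

                /* Local update: C_ij += A_ik * B_kj on b x b blocks. */
                for (int r = 0; r < b; r++) {
                    for (int c = 0; c < b; c++) {
                        double sum = 0.0;
                        for (int t = 0; t < b; t++)
                            sum += A[(i * b + r) * n + (k * b + t)]
                                 * B[(k * b + t) * n + (j * b + c)];
                        C[(i * b + r) * n + (j * b + c)] += sum;
                    }
                }
            }
        }
    }
}
```

In a real MPI version each process would store only its own three blocks, and the index arithmetic on k would be replaced by actual messages to the left and upper neighbours on a periodic (wrap-around) grid.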

You must write a report that includes:

Detailed description of your algorithm with analysis of its efficiency and scalability
Detailed description of your code
Source listing of the code with detailed comments
Description of tests with graphs of performance in Megaflops (run on several processor grids and with several matrix sizes)
Report on the best Megaflop rate per node attained with your code
Make your source files accessible through the network
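The Megaflop figures requested above can be derived from the matrix order and the measured wall-clock time. The sketch below assumes square n x n matrices and the conventional operation count of 2n^3 floating-point operations (n^3 multiplications and n^3 additions) for a dense product. The function names are hypothetical, and the elapsed time is assumed to come from a timer such as MPI_Wtime wrapped around the multiplication.

```c
#include <assert.h>

/* Conventional flop count for a dense n x n matrix product: 2 * n^3. */
double matmul_flops(int n)
{
    double d = (double)n;
    return 2.0 * d * d * d;
}

/* Aggregate rate in Megaflops (10^6 floating-point operations per second). */
double mflops_total(int n, double seconds)
{
    assert(seconds > 0.0);
    return matmul_flops(n) / (seconds * 1.0e6);
}

/* Rate per node: the aggregate rate divided by the number of processes. */
double mflops_per_node(int n, double seconds, int nprocs)
{
    assert(nprocs > 0);
    return mflops_total(n, seconds) / (double)nprocs;
}
```

For example, a 1000 x 1000 product that takes 2.0 seconds on 4 processes corresponds to 1000 Megaflops in total and 250 Megaflops per node.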

HINTS to reach maximum performance:

1. Code all local (per-node) matrix operations as calls to the BLAS library (link with -lblas).
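As an illustration of this hint, the sketch below wraps the local block update C += A * B in a single function that calls the BLAS routine DGEMM when compiled with a macro USE_BLAS, and falls back to plain loops otherwise. Several things here are assumptions rather than facts about the local installation: the macro name USE_BLAS and the function name local_block_multiply are hypothetical, the Fortran symbol is assumed to be exported as dgemm_ (a common but not universal naming convention), and blocks are assumed to be square and stored row-major.

```c
#include <assert.h>

#ifdef USE_BLAS
/* Fortran BLAS DGEMM. All arguments are passed by reference.
 * The trailing underscore in the symbol name is an assumption. */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);
#endif

/* Local update C += A * B for row-major b x b blocks. */
void local_block_multiply(int b, const double *A, const double *B, double *C)
{
    assert(b > 0);
#ifdef USE_BLAS
    /* DGEMM expects column-major storage. A row-major matrix read as
     * column-major is its transpose, and (A*B)^T = B^T * A^T, so passing
     * B first and A second yields the row-major product A*B in C. */
    const double one = 1.0;
    dgemm_("N", "N", &b, &b, &b, &one, B, &b, A, &b, &one, C, &b);
#else
    /* Portable fallback: i-k-j loop order keeps the inner loop contiguous. */
    for (int i = 0; i < b; i++)
        for (int k = 0; k < b; k++) {
            double a = A[i * b + k];
            for (int j = 0; j < b; j++)
                C[i * b + j] += a * B[k * b + j];
        }
#endif
}
```

Keeping the BLAS call behind one function means the rest of the parallel code does not depend on which BLAS library, or which symbol naming convention, the local machines provide.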

2. Look at the local page for compiler switches to use.