Obligatory task 1

Develop a parallel program that multiplies two large double-precision matrices A and B (C = A x B), with the matrices distributed over a rectangular mesh of processors. Choose the message-passing algorithm that you consider the best, for example Cannon's algorithm or the DNS algorithm. (Do not copy Pacheco's implementation of Fox's algorithm.) Use the MPI communication library.
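To make the data movement in Cannon's algorithm concrete, here is a minimal serial sketch in C that simulates a q x q process grid inside one program. It is not an MPI implementation and not a prescribed solution: the function names (cannon_simulate, block_multiply_add, get_block) are hypothetical, the matrices are assumed square, row-major, and of order n divisible by q. In an MPI version, each block would live on its own process and each shift would presumably become a neighbour exchange, for instance MPI_Sendrecv_replace with ranks obtained from MPI_Cart_shift.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* c += a * b for nb x nb row-major blocks (a BLAS dgemm call would go here). */
static void block_multiply_add(int nb, const double *a, const double *b, double *c)
{
    for (int i = 0; i < nb; i++)
        for (int k = 0; k < nb; k++)
            for (int j = 0; j < nb; j++)
                c[i * nb + j] += a[i * nb + k] * b[k * nb + j];
}

/* Copy block (bi, bj) of the n x n matrix m into the contiguous buffer blk. */
static void get_block(int n, int nb, const double *m, int bi, int bj, double *blk)
{
    for (int i = 0; i < nb; i++)
        memcpy(blk + (size_t)i * nb,
               m + (size_t)(bi * nb + i) * n + (size_t)bj * nb,
               (size_t)nb * sizeof(double));
}

/* Hypothetical helper: serial simulation of Cannon's algorithm on a q x q grid.
 * A, B and C are n x n, row-major; q must divide n. C is overwritten. */
void cannon_simulate(int n, int q, const double *A, const double *B, double *C)
{
    assert(q > 0 && n > 0 && n % q == 0);
    int nb = n / q;
    size_t bsz = (size_t)nb * nb;              /* elements per block */
    size_t total = (size_t)q * q * bsz;        /* elements per matrix */
    double *a = malloc(total * sizeof(double));
    double *b = malloc(total * sizeof(double));
    double *tmp = malloc(total * sizeof(double));
    double *c = calloc(total, sizeof(double));
    assert(a && b && tmp && c);

    /* Initial alignment: "process" (i, j) starts with A(i, i+j) and B(i+j, j),
     * block indices taken modulo q. */
    for (int i = 0; i < q; i++)
        for (int j = 0; j < q; j++) {
            size_t p = (size_t)(i * q + j) * bsz;
            get_block(n, nb, A, i, (i + j) % q, a + p);
            get_block(n, nb, B, (i + j) % q, j, b + p);
        }

    for (int step = 0; step < q; step++) {
        /* Local multiply-accumulate on every process. */
        for (int p = 0; p < q * q; p++)
            block_multiply_add(nb, a + p * bsz, b + p * bsz, c + p * bsz);

        /* Shift A one position left: (i, j) receives from (i, j+1). */
        for (int i = 0; i < q; i++)
            for (int j = 0; j < q; j++)
                memcpy(tmp + (size_t)(i * q + j) * bsz,
                       a + (size_t)(i * q + (j + 1) % q) * bsz,
                       bsz * sizeof(double));
        memcpy(a, tmp, total * sizeof(double));

        /* Shift B one position up: (i, j) receives from (i+1, j). */
        for (int i = 0; i < q; i++)
            for (int j = 0; j < q; j++)
                memcpy(tmp + (size_t)(i * q + j) * bsz,
                       b + (size_t)(((i + 1) % q) * q + j) * bsz,
                       bsz * sizeof(double));
        memcpy(b, tmp, total * sizeof(double));
    }

    /* Gather the C blocks back into the full row-major matrix. */
    for (int i = 0; i < q; i++)
        for (int j = 0; j < q; j++)
            for (int r = 0; r < nb; r++)
                memcpy(C + (size_t)(i * nb + r) * n + (size_t)j * nb,
                       c + (size_t)(i * q + j) * bsz + (size_t)r * nb,
                       (size_t)nb * sizeof(double));

    free(a); free(b); free(tmp); free(c);
}
```

After the initial alignment, process (i, j) holds A(i, i+j+s) and B(i+j+s, j) at step s (indices modulo q), so after q steps every product A(i, k) x B(k, j) has been accumulated exactly once. A call such as cannon_simulate(4, 2, A, B, C) simulates a 2 x 2 grid with 2 x 2 blocks.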

You must write a report including

HINTS to reach maximum performance:

1. Code all local (per-node) matrix operations in terms of the BLAS library (link with -lblas).

2. See the local page for the compiler switches to use.
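As an illustration of hint 1, the following sketch shows one way a local block update C = C + A x B might be routed through the BLAS routine DGEMM from C. It rests on several assumptions: the wrapper name local_gemm_acc and the USE_BLAS macro are hypothetical, the blocks are stored row-major, and the Fortran symbol is assumed to be exported as dgemm_ with all arguments passed by reference (the usual convention for a reference BLAS linked with -lblas, but worth checking on the local system). Without -DUSE_BLAS the code falls back to a plain triple loop, which is handy for checking results.

```c
#include <assert.h>

#ifdef USE_BLAS
/* Fortran BLAS DGEMM; assumed symbol name and by-reference calling convention. */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);
#endif

/* Hypothetical wrapper: C += A * B, with A (m x k), B (k x n), C (m x n),
 * all contiguous and row-major. */
void local_gemm_acc(int m, int n, int k,
                    const double *A, const double *B, double *C)
{
    assert(m > 0 && n > 0 && k > 0);
#ifdef USE_BLAS
    /* BLAS expects column-major storage. A row-major X is the column-major
     * X^T, so computing C^T += B^T * A^T in column-major terms yields the
     * row-major C += A * B. Hence the operands and m, n are swapped. */
    const double one = 1.0;
    dgemm_("N", "N", &n, &m, &k, &one, B, &n, A, &k, &one, C, &n);
#else
    /* Reference fallback: straightforward i-k-j loop nest. */
    for (int i = 0; i < m; i++)
        for (int p = 0; p < k; p++)
            for (int j = 0; j < n; j++)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
#endif
}
```

Comparing the two code paths on small blocks is a cheap way to confirm that the argument order and leading dimensions are right before timing large runs.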