i) Block row striping
ii) Cyclic row striping
iii) 2D block partitioning
iv) 2D cyclic partitioning
a) Given the following grid graph and assume that p=4. For each mapping show which vertices are mapped to which processors.
a - b - c - d
| |
| |
e - f - g - h
| |
| |
i - j - k - l
| |
| |
m-n -o - p
b) For general n and p rank the 4 different partitioning schemes in terms of execution time for this problem. You can assume that p is square number >1 and that both sqrt(p) and p divides n.
min_distance = +oo
do i=1,n
if (distance(i) < min_distance)
min_distance = distance(i)
min_location = i
endif
end do
a)
Explain why it is not possible to parallelize this loop on a shared
memory computer by only inserting OpenMP directives.
b)
OpenMP has an integer function OMP_GET_NUM_THREADS() that returns the
number of processors used. Use this function and rewrite the loop
so that it can be parallelized using OpenMP directives. You can assume
that p<<n and that p divides n.
Other possible question: Reproduce the parallel version of Dijkstras
algorithm, possibly with analysis.
a)
Give a sequential algorithm that solves this problem in O(n) time.
b)
Define the function C(i,j) as the sum of ai,a(i+1),...,aj. Given
C(1,n/2) and C(n/2+1,n/2+j) for j=1,n/2. Explain how this can be used to
calculate C(1,j) for j=n/2,n in O(1) time on a CREW PRAM.
c)
Based on the observation in a), design a parallel algorithm for solving
the prefix sum problem on a CREW PRAM using n/2 processors. You can assume
that n is a power of 2.
(Possible hint: Use recursive halving when designing the algorithm).
Give the parallel execution time, speedup, and cost of your algorithm.
Is it cost optimal?
d)
Modify the algorithm in c) to use p processors, p<n/2. You can assume
that p is a power of 2. What is the parallel execution time and speedup
of your algorithm?
e)
For what values of p is the algorithm in d) cost optimal.
Other possible question: Map your algorithm onto a particular architecture and analyze.
a)
Assume that we know C(1,2i) for i=1,n/2. Explain how this can be used
to calculate C(1,i) for i=1,n.
b)
Based on the observation in a) design a parallel algorithm for solving
the prefix sum problem on a EREW PRAM using n/2 processors. (Hint: Start
by adding ai to a(i+1) for i odd.)
Give the parallel execution time, speedup, and cost of your algorithm.
c) Modify the algorithm in b) to use p processors, p<n/2. You can assume that p is a power of 2.
d) For what values of p is the algorithm in c) cost optimal.
e) What is the execution time of your algorithm when using the largest value for p from d) ?
Other possible question: Map algorithm onto a particular architecture and analyze.