Given an

i) Block row striping

ii) Cyclic row striping

iii) 2D block partitioning

iv) 2D cyclic partitioning

a) Given the following grid graph and assume that *p*=4. For each
mapping show which vertices are mapped to which processors.

a - b - c - d

| |
| |

e - f - g - h

| |
| |

i - j - k - l

| |
| |

m-n -o - p

b) For general *n* and *p* rank the 4 different partitioning
schemes in terms of execution time for this problem. You can assume that
*p* is square number >1 and that both sqrt(*p*) and *p*
divides *n*.

min_distance = +oo

do i=1,n

if (distance(i) < min_distance)

min_distance = distance(i)

min_location = i

endif

end do

a)

Explain why it is not possible to parallelize this loop on a shared
memory computer by only inserting OpenMP directives.

b)

OpenMP has an integer function OMP_GET_NUM_THREADS() that returns the
number of processors used. Use this function and rewrite the loop
so that it can be parallelized using OpenMP directives. You can assume
that *p<<n *and that *p* divides *n*.

Other possible question: Reproduce the parallel version of Dijkstras
algorithm, possibly with analysis.

a)

Give a sequential algorithm that solves this problem in O(n) time.

b)

Define the function C(i,j) as the sum of ai,a(i+1),...,aj. Given
C(1,n/2) and C(n/2+1,n/2+j) for j=1,n/2. Explain how this can be used to
calculate C(1,j) for j=n/2,n in O(1) time on a CREW PRAM.

c)

Based on the observation in a), design a parallel algorithm for solving
the prefix sum problem on a CREW PRAM using n/2 processors. You can assume
that n is a power of 2.

(Possible hint: Use recursive halving when designing the algorithm).

Give the parallel execution time, speedup, and cost of your algorithm.
Is it cost optimal?

d)

Modify the algorithm in c) to use p processors, p<n/2. You can assume
that p is a power of 2. What is the parallel execution time and speedup
of your algorithm?

e)

For what values of p is the algorithm in d) cost optimal.

Other possible question: Map your algorithm onto a particular architecture and analyze.

a)

Assume that we know C(1,2i) for i=1,n/2. Explain how this can be used
to calculate C(1,i) for i=1,n.

b)

Based on the observation in a) design a parallel algorithm for solving
the prefix sum problem on a EREW PRAM using n/2 processors. (Hint: Start
by adding ai to a(i+1) for i odd.)

Give the parallel execution time, speedup, and cost of your algorithm.

c) Modify the algorithm in b) to use p processors, p<n/2. You can assume that p is a power of 2.

d) For what values of p is the algorithm in c) cost optimal.

e) What is the execution time of your algorithm when using the largest value for p from d) ?

Other possible question: Map algorithm onto a particular architecture and analyze.