Examples of problems that could have appeared on the exam for I236

1.
Given an n x n grid graph with weights on the vertices. For each vertex we wish to calculate the average weight of its own weight and those of its neighbors. This is repeated until a stable solution is found. Consider the following mapping schemes onto a p processor distributed memory computer:

i)   Block row striping
ii)  Cyclic row striping
iii) 2D block partitioning
iv) 2D cyclic partitioning

a) Given the following grid graph and assume that p=4. For each mapping show which vertices are mapped to which processors.

a - b - c - d
 |      |      |      |
e - f  - g - h
 |      |      |      |
 i - j  - k - l
 |     |       |      |
 m-n -o - p

b) For general n and p rank the 4 different partitioning schemes in terms of execution time for this problem. You can assume that p is square number >1 and that both sqrt(p) and p divides n.

2.

The following code is taken from a sequential  implementation of Dijkstras algorithm. You can assume that distance(i) is set to +oo for all vertices where the minimum distance from the source is known.

min_distance = +oo
do i=1,n
     if (distance(i) < min_distance)
        min_distance = distance(i)
        min_location = i
     endif
end do

a)
Explain why it is not possible to parallelize this loop on a shared memory computer by only inserting OpenMP directives.

b)
OpenMP has an integer function OMP_GET_NUM_THREADS() that returns the number of processors used.  Use this function and rewrite the loop so that it can be parallelized using OpenMP directives. You can assume that p<<n and that p divides n.

Other possible question: Reproduce  the parallel version of Dijkstras algorithm, possibly with analysis.
 

3.

Given a sequence of real numbers a1, a2, a3, ..., an. For each a(k) k=1,n we wish to compute the sum of the k first values. This is known as the prefix sum problem.

a)
Give a sequential algorithm that solves this problem in O(n) time.

b)
Define the function C(i,j) as the sum of ai,a(i+1),...,aj.  Given C(1,n/2) and C(n/2+1,n/2+j) for j=1,n/2. Explain how this can be used to calculate C(1,j) for j=n/2,n in O(1) time on a CREW PRAM.

c)
Based on the observation in a), design a parallel algorithm for solving the prefix sum problem on a CREW PRAM using n/2 processors. You can assume that n is a power of 2.
(Possible hint: Use recursive halving when designing the algorithm).
Give the parallel execution time, speedup, and cost of your algorithm. Is it cost optimal?

d)
Modify the algorithm in c) to use p processors, p<n/2. You can assume that p is a power of 2. What is the parallel execution time and speedup of your algorithm?

e)
For what values of p is the algorithm in d) cost optimal.

Other possible question: Map your algorithm onto a particular architecture and analyze.

4.

Again we consider the prefix sum problem from 3.

a)
Assume that we know C(1,2i) for i=1,n/2. Explain how this can be used to calculate C(1,i) for i=1,n.

b)
Based on the observation in a) design a parallel algorithm for solving the prefix sum problem on a EREW PRAM using n/2 processors. (Hint: Start by adding ai to a(i+1) for i odd.)
Give the parallel execution time, speedup, and cost of your algorithm.

c) Modify the algorithm in b) to use p processors, p<n/2. You can assume that p is a power of 2.

d) For what values of p is the algorithm in c) cost optimal.

e) What is the execution time of  your algorithm when using the largest value for p from d) ?

Other possible question: Map algorithm onto a particular architecture and analyze.