Reference : Introduction to Parallel Computing, Edition Second by Vipin Kumar et al
Partitioning Strategies
- Replicated Data Approach (No partitioning)
- Each has the entire data but does a subset of computation.
- Partition to different process (Most Common)
- Domain Decomposition
- Divide and Conquer
- Partition of program func (Less Common)
- Functional Decomposition
Partition to Diff Process
Use MPI_Scatter and MPI_reduce for Vector Addition
- Root sends data to all processes (including itself)
- MPI_Scatterv() —> scatters variable lengths. MPI_Allreduce() —> returns result to all processors
Domain Decomp via Divide-and-Conquer
- For problems that can be recursively divided into smaller probs of the same type; such as summation.
- The analysis for simple binary tree is as following
The Reduce and Scan Abstractions
- 이미 프로세서들에게 파티션된 벡터들의 합이 이에 해당한다.
- Reduce: Combines a set of val to produce a single value; mapping a binary tree communication pattern between processors.
- Scan: performs a seq opreation in parts and carries along the intermediate results.
Scans
Replacs each element with the cumulative sum of all preceding elements (inclusive or exclusive)
Parallel random-access machine
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15X = [0,1,2,3,4,5,6,7]
Inc_Scan = [0,1,3,6,10,15,21,28]
Exc_Scan = [0,0,1,3,6,10,15,21]
//EX #1 - 8 val on 8 nodes.
for (d=1; d < N; d ∗= 2)
FORALL (k=0; k < N; k ++) IN PARALLEL
if (k >= d)
X[k] = X[k−d] + X[k];
/***
Ex #2 - Bucket Sort
1. Assign one bucket to each processors
2. Assign p small buckets to each processors --> can ues MPI_Alltoall()
Ex #3 - Integraton (SPMD model using static distribution)
--> Uneven workload, terminating point for small division is an issue.The analysis for parallel bucket sort.
- The worst case scenario: data is chunked at certain value —> same as the bucket sort problem?
Barnes Hut Algorithm
- Algorithm —> It creates oct-tree
- Divide one cube into 8
- Delete if no needed
- sub-cube with more than 1 divided into 8 again.
- Continue untill each one has only one particle.
- Scaling is O(n lg n)
- Load balancing is likely be an issue for parallel code.