Partition

Reference : Introduction to Parallel Computing, Edition Second by Vipin Kumar et al

Partitioning Strategies

  • Replicated Data Approach (No partitioning)
    • Each has the entire data but does a subset of computation.
  • Partition to different process (Most Common)
    • Domain Decomposition
    • Divide and Conquer
  • Partition of program func (Less Common)
    • Functional Decomposition

Partition to Diff Process

Use MPI_Scatter and MPI_reduce for Vector Addition

  • Root sends data to all processes (including itself)
  • MPI_Scatterv() —> scatters variable lengths. MPI_Allreduce() —> returns result to all processors

Domain Decomp via Divide-and-Conquer

  • For problems that can be recursively divided into smaller probs of the same type; such as summation.
  • The analysis for simple binary tree is as following

The Reduce and Scan Abstractions

  • 이미 프로세서들에게 파티션된 벡터들의 합이 이에 해당한다.
  • Reduce: Combines a set of val to produce a single value; mapping a binary tree communication pattern between processors.
  • Scan: performs a seq opreation in parts and carries along the intermediate results.

Scans

  • Replacs each element with the cumulative sum of all preceding elements (inclusive or exclusive)

    Parallel random-access machine

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    X = [0,1,2,3,4,5,6,7]
    Inc_Scan = [0,1,3,6,10,15,21,28]
    Exc_Scan = [0,0,1,3,6,10,15,21]
    //EX #1 - 8 val on 8 nodes.
    for (d=1; d < N; d ∗= 2)
    FORALL (k=0; k < N; k ++) IN PARALLEL
    if (k >= d)
    X[k] = X[k−d] + X[k];
    /***
    Ex #2 - Bucket Sort
    1. Assign one bucket to each processors
    2. Assign p small buckets to each processors --> can ues MPI_Alltoall()

    Ex #3 - Integraton (SPMD model using static distribution)
    --> Uneven workload, terminating point for small division is an issue.
  • The analysis for parallel bucket sort.

  • The worst case scenario: data is chunked at certain value —> same as the bucket sort problem?

Barnes Hut Algorithm

  • Algorithm —> It creates oct-tree
    1. Divide one cube into 8
    2. Delete if no needed
    3. sub-cube with more than 1 divided into 8 again.
    4. Continue untill each one has only one particle.
  • Scaling is O(n lg n)
  • Load balancing is likely be an issue for parallel code.
Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×