ScalES-PPM
Partition How-to

Overview

This document shows how to:

  • Describe distributed data.
  • Decompose a global domain by various partitioning routines.
  • Repartition to adjust for changing work-load.

The partitioning modules provide multiple data structures to represent a partition, addressing the different needs of the available algorithms; a short sketch after the following list illustrates two of them.

  • set_i4: The simplest representation is an array of integers denoting the members of one part; an array of set_i4 can consequently represent a whole partition.
  • partition_assignment denotes for each index (potentially with an indirection map) the part it is assigned to.
  • partition_vec is a concise representation of the same information an array of type set_i4 holds, but uses only two allocatables: one to hold the elements of all sets and another to note the division indices in this array.
  • block_decomposition, finally, describes block decompositions only, where each division of one dimension is described by one instance of block_decomposition. A 2D block decomposition would therefore typically use a 2-element array of block_decomposition.
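
As an illustration, the following sketch expresses the same partition of six elements into two parts in the set_i4 and partition_vec representations. It is a minimal sketch, assuming both types are provided by ppm_set_partition_base and that their components elem, start, and elements (named as described below) are public allocatables:

USE ppm_set_partition_base, ONLY: set_i4, partition_vec
...
TYPE(set_i4) :: parts(2)   ! one set of elements per part
TYPE(partition_vec) :: pv  ! concise two-array representation
...
! partition {1,3,5 | 2,4,6}: part 1 holds the odd, part 2 the even indices
parts(1)%elem = (/ 1, 3, 5 /)
parts(2)%elem = (/ 2, 4, 6 /)
! the same partition as partition_vec: part x occupies
! pv%elements(pv%start(x):pv%start(x+1)-1)
pv%start = (/ 1, 4, 7 /)
pv%elements = (/ 1, 3, 5, 2, 4, 6 /)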

Setting up for Partitioning

To decompose a data structure, one first needs to describe its topology. ScalES-PPM provides several basic data types to make this step as simple as possible:

  • n-dimensional rectilinears, e.g. regular grids, use a rank-1 array of n values of type extent to describe the enumeration of indices in each dimension

    i.e. to describe a 2-dimensional grid with indices 1..m in x-direction and indices 1..n in y-direction, the corresponding data structure would be declared as:

    USE ppm_extents, ONLY: extent
    ...
    TYPE(extent) :: grid(2)
    INTEGER :: m, n
    ...
    grid = (/ extent(1, m), extent(1, n) /)
  • For graph-structured data, the module ppm_distributed contains a distributed graph data structure, graph_csr_dist_i4. The module also contains routines to construct such a graph from a rectilinear that has been partitioned into one rectilinear per process.

  • For repartitioning, the input is already a partition. The following types therefore serve as both input and output arguments to some routines.

    Given p parts over a set of n elements, arbitrary partitions vary in their requirements. To bridge the gap from succinct but rigid to flexible but more redundant data structures, the following types are provided by ppm_set_partition_base:

  • partition_vec consists of two tables: start(1:p+1) delimits sub-arrays of the second table, elements, which lists the elements sorted by part, i.e. elements(start(x):start(x+1)-1) lists the elements of part x.
  • set_i4 lists in its component array elem the elements of one (sub-)set. A vector of set_i4 can accordingly describe a partition.
  • partition_assignment, with component part_range = [1,p], tabulates in its component array assigned for each element $i \in [1,n]$ the part assigned(i) it belongs to, where assigned(i) $\in$ part_range (see the sketch after this list).
  • block_decomposition: In a block decomposition, each part contains only one contiguous range of indices per dimension. A dimension with elements 1..n can therefore be partitioned into p parts, where part i covers the index range partition(i).
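
As a small illustration, the example partition {1,3,5 | 2,4,6} of n = 6 elements into p = 2 parts used earlier looks as follows in the partition_assignment representation. This is a hedged sketch assuming the component assigned is a public allocatable and that part_range has been set up accordingly:

USE ppm_set_partition_base, ONLY: partition_assignment
...
TYPE(partition_assignment) :: pa
...
! element i belongs to part pa%assigned(i); here parts 1 and 2 alternate
pa%assigned = (/ 1, 2, 1, 2, 1, 2 /)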

The module ppm_set_partition_base also contains assignment and equality operators and routines to handle conversions and other basic operations on these types.

Computing a partition

Uniform partition of n-dimensional grids

Extending the example from above, an m $\times$ n grid can be partitioned into k $\times$ l evenly sized parts by calling the uniform_partition function of ppm_uniform_partition:

USE ppm_extents, ONLY: extent
USE ppm_rectilinear, ONLY: lidx2rlcoord
USE ppm_uniform_partition, ONLY: uniform_partition
...
TYPE(extent) :: grid(2), part(2), deco(2)
INTEGER :: m, n, k, l, comm_rank, ierror
...
CALL mpi_comm_rank(mpi_comm_world, comm_rank, ierror)
deco = (/ extent(1, k), extent(1, l) /)
grid = (/ extent(1, m), extent(1, n) /)
part = uniform_partition(grid, deco, lidx2rlcoord(deco, comm_rank + 1))

In the above example, the linear MPI rank is mapped to a Cartesian coordinate with the help of the utility module ppm_rectilinear.

Because the axes are divided separately, the partition thus obtained forms a block partition.
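
For a concrete picture: with m = n = 10 and k = l = 2, each of the four parts covers a 5 $\times$ 5 block; the part at Cartesian coordinate (1,1), for example, covers indices 1..5 in both dimensions (assuming the even per-axis split that uniform_partition is designed to produce).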

Balanced hierarchical partition

Graph partition

The library provides convenience wrappers for both MeTiS and ParMeTiS. The following example code shows how to:

  1. build a graph from a rectilinear grid
  2. call the MeTiS wrapper
USE ppm_extents, ONLY: extent
USE ppm_graph_csr, ONLY: build_graph, graph_csr
USE ppm_set_partition_base, ONLY: partition_assignment
USE ppm_graph_partition_serial, ONLY: graph_partition_metis
...
TYPE(extent) :: grid(2)
INTEGER :: m, n, num_parts
TYPE(graph_csr) :: grid_graph
TYPE(partition_assignment) :: partition
INTEGER, ALLOCATABLE :: grid_pt_weight(:)
...
grid = (/ extent(1, m), extent(1, n) /)
CALL build_graph(grid_graph, grid)
ALLOCATE(grid_pt_weight(m * n))
! compute weights per grid point
CALL graph_partition_metis(partition, grid_graph, num_parts, &
     vertex_weights=grid_pt_weight)
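
After the call, the part of each grid point can be read off the partition_assignment result. The following is a minimal sketch, assuming the component assigned holds one entry per linearized grid index as described above:

INTEGER :: i
INTEGER, ALLOCATABLE :: counts(:)
...
! count how many grid points each part received
ALLOCATE(counts(num_parts))
counts = 0
DO i = 1, m * n
  counts(partition%assigned(i)) = counts(partition%assigned(i)) + 1
END DO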

Set partition

If your data has no inherent connectivity, i.e. the problem is embarrassingly parallel, graphs or grids are not appropriate models for partitioning. In this case, partitioning by weight alone is probably the most sensible approach. The library provides a greedy method to partition such data sets.

USE ppm_set_partition_base, ONLY: set_i4
USE ppm_set_partition, ONLY: greedy_partitioning
...
INTEGER, PARAMETER :: set_size=1000
TYPE(set_i4), ALLOCATABLE :: partition(:)
INTEGER :: weights(set_size)
...
CALL greedy_partitioning(partition, weights)
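
On return, each partition(i)%elem lists the elements assigned to part i. The following hedged usage sketch prints the resulting load per part (the loop variable i is assumed to be declared):

! per-part load: sum the weights of the elements assigned to each part
DO i = 1, SIZE(partition)
  PRINT '(a,i0,a,i0)', 'part ', i, ' total weight: ', &
       SUM(weights(partition(i)%elem))
END DO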

Repartition

Repartitioning for graphs

Repartitioning sets

For an already partitioned set, the library offers a routine based on swapping elements, so that memory reallocation can be avoided. The example below shows the multi-process variant, which has MPI collective call semantics.

USE ppm_set_repartition, ONLY: repartition_swap_mp
USE ppm_set_partition_base, ONLY: set_i4
...
INTEGER(i4), ALLOCATABLE :: weight(:)
TYPE(set_i4) :: part, new_part
...
new_part = part
CALL repartition_swap_mp(new_part%elem, weight, mpi_comm_world)

Based on part and new_part from the example above, the application can then reorganize the data decomposition. If a certain amount of imbalance among the parts is tolerable, it can be beneficial to add an efficiency_threshold argument to the call to repartition_swap_mp.
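
A hedged variant of the call above illustrates this; the threshold value is purely illustrative, and the exact type of the argument should be checked against the interface of repartition_swap_mp:

! accept some imbalance in exchange for fewer swapped elements
CALL repartition_swap_mp(new_part%elem, weight, mpi_comm_world, &
     efficiency_threshold=0.9)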

The project underlying this report was funded by the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung) under grant number 01IH08004E. Responsibility for the content of this publication lies with the author.