3.4 Running applications in parallel
This section describes how to run OpenFOAM in parallel on distributed processors. The method of parallel computing used by OpenFOAM is known as domain decomposition, in which the geometry and associated fields are broken into pieces and allocated to separate processors for solution. The process of parallel computation involves: decomposition of mesh and fields; running the application in parallel; and post-processing the decomposed case, as described in the following sections. Parallel running uses the open source openMPI implementation of the standard message passing interface (MPI).
3.4.1 Decomposition of mesh and initial field data
The mesh and fields are decomposed using the decomposePar utility. The underlying aim is to break up the domain with minimal effort, but in such a way as to guarantee a fairly economic solution. The geometry and fields are broken up according to a set of parameters specified in a dictionary named decomposeParDict that must be located in the system directory of the case of interest. An example decomposeParDict dictionary can be copied from the interFoam/damBreak tutorial if the user requires one; the dictionary entries within it are reproduced below:
numberOfSubdomains 4;

method simple;

simpleCoeffs
{
    n ( 2 2 1 );
    delta 0.001;
}

hierarchicalCoeffs
{
    n ( 1 1 1 );
    delta 0.001;
    order xyz;
}

metisCoeffs
{
    processorWeights ( 1 1 1 1 );
}

manualCoeffs
{
    dataFile "";
}

distributed no;

roots ( );

// ************************************************************************* //
The user has a choice of five methods of decomposition, specified by the method keyword as described below.
- simple
- Simple geometric decomposition in which the domain is split into pieces by direction, e.g. 2 pieces in the x direction, 1 in y etc.
- hierarchical
- Hierarchical geometric decomposition which is the same as simple except the user specifies the order in which the directional split is done, e.g. first in the y-direction, then the x-direction etc.
- scotch
- Scotch decomposition which requires no geometric input from the user and attempts to minimise the number of processor boundaries. The user can specify a weighting for the decomposition between processors, through an optional processorWeights keyword which can be useful on machines with differing performance between processors. There is also an optional keyword entry strategy that controls the decomposition strategy through a complex string supplied to Scotch. For more information, see the source code file: $FOAM_SRC/decompositionMethods/decompositionMethods/scotchDecomp/scotchDecomp.C
- metis
- METIS decomposition is similar to Scotch, but the library is non-free for commercial use, so will be discontinued in favour of Scotch in future releases of OpenFOAM.
- manual
- Manual decomposition, where the user directly specifies the allocation of each cell to a particular processor.
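The idea behind the simple method can be illustrated with a short Python sketch. It is not OpenFOAM's implementation: the function name simple_decompose is hypothetical, and the sketch assumes that cells are ranked by their centre coordinate along each direction and cut into groups of nearly equal size.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def simple_decompose(centres: Sequence[Point], n: Tuple[int, int, int]) -> List[int]:
    """Return a processor number for each cell centre.

    Illustrative assumption: along each axis the cells are ranked by
    coordinate and cut into n[axis] slabs holding (nearly) equal numbers
    of cells. Cells with identical coordinates are split by input order.
    """
    n_cells = len(centres)
    slabs = []  # slabs[axis][cell] = slab index of that cell along the axis
    for axis in range(3):
        order = sorted(range(n_cells), key=lambda i: centres[i][axis])
        slab = [0] * n_cells
        for rank, cell in enumerate(order):
            slab[cell] = rank * n[axis] // n_cells
        slabs.append(slab)

    nx, ny, _ = n
    # Combine the three slab indices into a single processor number.
    return [
        slabs[0][i] + nx * (slabs[1][i] + ny * slabs[2][i])
        for i in range(n_cells)
    ]


# A 4 x 4 x 1 block of unit cells split with n ( 2 2 1 ).
centres = [(x + 0.5, y + 0.5, 0.5) for y in range(4) for x in range(4)]
processors = simple_decompose(centres, (2, 2, 1))
```

With n ( 2 2 1 ) each of the four processors receives one quadrant of the block, four cells each.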
For each method there is a set of coefficients specified in a sub-dictionary of decomposeParDict, named <method>Coeffs as shown in the dictionary listing. The full set of keyword entries in the decomposeParDict dictionary is explained in Table 3.4.
Table 3.4: Keywords in the decomposeParDict dictionary.
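When using the geometric methods, a mismatch between numberOfSubdomains and the n entry is an easy mistake to make. The following Python sketch checks the two for consistency before decomposePar is run. It assumes that the product of the three components of n must equal numberOfSubdomains; the function name check_geometric_coeffs is hypothetical and not part of OpenFOAM.

```python
from math import prod
from typing import Tuple


def check_geometric_coeffs(number_of_subdomains: int, n: Tuple[int, int, int]) -> None:
    """Raise ValueError if n does not multiply up to number_of_subdomains.

    Assumption: for the simple and hierarchical methods the number of
    pieces, n[0] * n[1] * n[2], must match numberOfSubdomains.
    """
    pieces = prod(n)
    if pieces != number_of_subdomains:
        raise ValueError(
            f"n {n} gives {pieces} pieces, "
            f"but numberOfSubdomains is {number_of_subdomains}"
        )


# Values taken from the simpleCoeffs entry in the listing above.
check_geometric_coeffs(4, (2, 2, 1))
```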
The decomposePar utility is executed in the normal manner by typing
decomposePar
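On completion, a set of subdirectories will have been created in the case directory, one for each processor. The directories are named processor<N>, where <N> = 0, 1, … represents a processor number; each contains a time directory, containing the decomposed field descriptions, and a constant/polyMesh directory containing the decomposed mesh description.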
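The result of the decomposition can be inspected with a few lines of Python. The sketch below assumes that the processor directories are named processor0, processor1 and so on, and that time directories are those whose names parse as numbers; the function name processor_directories is hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List


def _is_time_name(name: str) -> bool:
    """True if a directory name looks like a time, e.g. 0 or 0.5."""
    try:
        float(name)
    except ValueError:
        return False
    return True


def processor_directories(case_dir: str) -> Dict[int, List[str]]:
    """Map each processor number to the sorted names of its time directories."""
    found: Dict[int, List[str]] = {}
    for entry in Path(case_dir).iterdir():
        match = re.fullmatch(r"processor(\d+)", entry.name)
        if match and entry.is_dir():
            times = [
                d.name for d in entry.iterdir()
                if d.is_dir() and _is_time_name(d.name)
            ]
            found[int(match.group(1))] = sorted(times, key=float)
    return found
```

Calling processor_directories on a freshly decomposed case should return one entry per subdomain, each listing only the start time directory.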
3.4.2 Running a decomposed case
A decomposed OpenFOAM case is run in parallel using the openMPI implementation of MPI.
openMPI can be run on a local multiprocessor machine very simply but when running on machines across a network, a file must be created that contains the host names of the machines. The file can be given any name and located at any path. In the following description we shall refer to such a file by the generic name, including full path, <machines>.
The <machines> file contains the names of the machines listed one machine per line. The names must correspond to a fully resolved hostname in the /etc/hosts file of the machine on which openMPI is run. The list must contain the name of the machine running openMPI. Where a machine node contains more than one processor, the node name may be followed by the entry cpu=<n>, where <n> is the number of processors openMPI should run on that node.
For example, let us imagine a user wishes to run openMPI from machine aaa on the following machines: aaa; bbb, which has 2 processors; and ccc. The <machines> file would contain:
aaa
bbb cpu=2
ccc
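For larger clusters the <machines> file can be generated rather than typed by hand. The following Python sketch produces the file contents in the format described above; the function name machines_file_text is hypothetical, and the sketch assumes that the cpu= entry is omitted for single-processor nodes.

```python
from typing import Iterable, Tuple


def machines_file_text(nodes: Iterable[Tuple[str, int]]) -> str:
    """Build the contents of a <machines> file, one machine per line.

    Each node is a (hostname, processors) pair; "cpu=<n>" is appended
    only when the node has more than one processor.
    """
    lines = []
    for host, cpus in nodes:
        lines.append(host if cpus == 1 else f"{host} cpu={cpus}")
    return "\n".join(lines) + "\n"


# The three machines from the example above.
text = machines_file_text([("aaa", 1), ("bbb", 2), ("ccc", 1)])
```

The returned string can be written to any path, which is then passed to mpirun as <machines>.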
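An application is run in parallel using mpirun:
mpirun --hostfile <machines> -np <nProcs> <foamExec> <otherArgs> -parallel > log &
where <nProcs> is the number of processors, <foamExec> is the executable, e.g. icoFoam, and the output is redirected to a file named log. For example, to run icoFoam on 4 processors, with the machines listed in a file named machines, the command is:
mpirun --hostfile machines -np 4 icoFoam -parallel > log &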
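When runs are launched from a script, it can help to assemble the mpirun command programmatically. The Python sketch below only builds and prints the command; it does not execute it. The function name mpirun_command is hypothetical, and only the options that appear in the commands above are used.

```python
import shlex
from typing import List, Sequence


def mpirun_command(
    hostfile: str,
    n_procs: int,
    foam_exec: str,
    other_args: Sequence[str] = (),
) -> List[str]:
    """Return the mpirun argument list for a parallel OpenFOAM run."""
    return [
        "mpirun",
        "--hostfile", hostfile,
        "-np", str(n_procs),
        foam_exec,
        *other_args,
        "-parallel",
    ]


command = mpirun_command("machines", 4, "icoFoam")
print(shlex.join(command))
```

The argument list can be passed to subprocess.run with stdout pointed at an open log file, which corresponds to the > log redirection.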
3.4.3 Distributing data across several disks
Data files may need to be distributed across several disks if, for example, only local disks are used in order to improve performance. In this case, the root path to the case directory may differ between machines. The paths must then be specified in the decomposeParDict dictionary using the distributed and roots keywords. The distributed entry should read
distributed yes;
and the roots entry is a list of root paths, <root0>, <root1>, …, one for each node:
roots
<nRoots>
(
"<root0>"
"<root1>"
…
);
where <nRoots> is the number of roots.
Each of the processor<N> directories should be placed in the case directory at each of the root paths specified in the decomposeParDict dictionary. The system directory and files within the constant directory must also be present in each case directory. Note: the files in the constant directory are needed, but the polyMesh directory is not.
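The distributed and roots entries can be generated from a list of paths, as in the following Python sketch. The function name roots_entry and the example paths are hypothetical; the layout simply follows the form of the entries shown above.

```python
from typing import Sequence


def roots_entry(roots: Sequence[str]) -> str:
    """Format the distributed and roots entries for decomposeParDict."""
    lines = ["distributed yes;", "", "roots", str(len(roots)), "("]
    lines += [f'    "{root}"' for root in roots]
    lines.append(");")
    return "\n".join(lines) + "\n"


# Hypothetical root paths, one per node.
entry = roots_entry(["/data/node0/run", "/data/node1/run"])
print(entry)
```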
3.4.4 Post-processing parallel processed cases
When post-processing cases that have been run in parallel the user has two options:
- reconstruction of the mesh and field data to recreate the complete domain and fields, which can be post-processed as normal;
- post-processing each segment of decomposed domain individually.
3.4.4.1 Reconstructing mesh and data
After a case has been run in parallel, it can be reconstructed for post-processing. The case is reconstructed by merging the sets of time directories from each processor<N> directory into a single set of time directories. The reconstructPar utility performs such a reconstruction by executing the command:
reconstructPar
3.4.4.2 Post-processing decomposed cases
The user may post-process decomposed cases using the paraFoam post-processor, described in section 6.1. The whole simulation can be post-processed by reconstructing the case or alternatively it is possible to post-process a segment of the decomposed domain individually by simply treating the individual processor directory as a case in its own right.