View Issue Details

ID: 0001436
Project: OpenFOAM
Category: Bug
View Status: public
Last Update: 2016-08-16 12:52
Reporter: martinB
Assigned To: henry
Priority: normal
Severity: tweak
Reproducibility: always
Status: closed
Resolution: suspended
Platform: Linux
OS: OpenSUSE
OS Version: 13.1
Summary: 0001436: LTS particle transport does not scale well in parallel
Description
The LTS particle transport algorithm does not scale well in parallel except in very special cases. However, this can be fixed rather easily with a minor change in the transport loop.

The problem is located in the file src/lagrangian/basic/Cloud/Cloud.C.
The forAllIter loop starting at line 251 does the following: all particles on the local processor are transported from their starting positions until they reach a processor boundary.
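
For orientation, here is a condensed sketch of that loop structure (simplified and paraphrased, not the verbatim source; transfersPending stands in for the real bookkeeping):

    // Simplified sketch of the transport loop in Cloud<ParticleType>::move().
    // Every local particle is tracked to completion or to a processor
    // boundary before any parallel transfer takes place.
    bool transferActive = true;

    while (transferActive)
    {
        forAllIter(typename Cloud<ParticleType>, *this, pIter)
        {
            ParticleType& p = pIter();
            td.keepParticle = p.move(td, trackTime);
        }

        // exchange the boundary-crossing particles, then decide globally
        // whether any processor still has particles to track
        transferActive = returnReduce(transfersPending, orOp<bool>());
    }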

Let's assume the worst case: the particles are to be transported along a pipe, and this pipe is decomposed across the processors in the flow direction (see the attached test case).
Processor0 transports all particles from the inlet to its boundary with Processor1, while all the other processors sit idle. After the forAllIter loop has finished, Processor1 can transport all the particles from its boundary with Processor0 to its boundary with Processor2, while all the other processors sit idle, and so on.
There is no parallel speedup; the performance is limited to the speed of a single processor.

With a minor modification this limitation can be overcome: the forAllIter loop is interrupted after a certain number of particles have been transported. Say that after 5000 particles a "break" is called to leave the forAllIter loop. These 5000 particles can now be transferred to the next processor. The while-loop continues until no particles are left to transport, so in the next pass Processor0 can transport the next 5000 particles while Processor1 transports the first 5000 particles, and so on (see the sketch under Additional Information).

The attached test case shows these timings on my workstation:
serial case: 128s
parallel case, "best" decomposition: 17s
parallel case, "worst" decomposition: 102s
parallel case, "best" decomposition with proposed modifications: 17s
parallel case, "worst" decomposition with proposed modifications: 21s

(By the way: I am a little surprised that ~30% of the particles get lost in this simple case... but that is a separate issue.)
Steps To Reproduce
Running the test case:
- blockMesh
- edit system/decomposeParDict to the best-case or worst-case simpleCoeffs (see the sketch below)
- decomposePar
- run uncoupledKinematicParcelFoam in parallel
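
A minimal simpleCoeffs sketch for the two decompositions (the subdomain count of 4 and the assumption that the pipe axis lies along x are illustrative; the attached case defines the actual values):

    // system/decomposeParDict (excerpt)
    numberOfSubdomains 4;
    method simple;

    simpleCoeffs
    {
        n (4 1 1); // "worst" case: slices along the flow direction
        // n (1 1 4); // "best" case: slices perpendicular to the flow
        delta 0.001;
    }
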
Additional Information
In the attached modification (Cloud.C.modified) these new variables are introduced:

label nParticlesWaiting(0); // global sum of the particles waiting for transport

IDLList<ParticleType> alreadyMovedParticles; // particles that do not leave the local processor domain; they must be removed from the "this" list so that the forAllIter loop skips them

label transportCounter(0); // triggers the "break" after a certain number of particles have been transported


Many comments have been added to the attached Cloud.C.modified file; I hope they make things clear in the local context.
Since the modifications only make sense for LTS transport, you might want to add a proper if-statement for the steadyState transport case.
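
Schematically, the modified loop works as follows (a condensed sketch of the idea, not the verbatim patch; stillLocal stands in for the real test of whether a particle remains on the local processor):

    // Sketch of the interrupted transport loop: after a batch of 5000
    // particles the forAllIter loop is left, so that boundary-crossing
    // particles can be transferred while other processors keep working.
    label nParticlesWaiting = returnReduce(this->size(), sumOp<label>());

    while (nParticlesWaiting > 0)
    {
        label transportCounter = 0;

        forAllIter(typename Cloud<ParticleType>, *this, pIter)
        {
            ParticleType& p = pIter();
            bool stillLocal = p.move(td, trackTime);

            if (stillLocal)
            {
                // park finished particles so the next pass skips them
                alreadyMovedParticles.append(this->remove(&p));
            }

            if (++transportCounter == 5000)
            {
                break; // hand this batch over for parallel transfer
            }
        }

        // ... parallel transfer of the boundary-crossing particles ...

        nParticlesWaiting = returnReduce(this->size(), sumOp<label>());
    }

    // finally the parked particles are moved back into the cloud
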
Tags: No tags attached.

Activities

martinB    2014-11-04 12:14    reporter
0_case.tar.gz (4,278 bytes)

martinB    2014-11-04 12:14    reporter
Cloud.C.modified (15,286 bytes)

henry    2016-01-17 15:00    manager    ~0005845

Sorry for not getting back to you sooner on this.

Thanks for the detailed report, excellent test case and patch. Your proposal looks fine and I am testing it in OpenFOAM-dev on a few cases at the moment.
I am also looking for the best way to calculate or provide the transfer block-size, which is currently hard-coded to 5000.

It seems to me that this approach is equally applicable to transient and LTS operation; why do you suggest it should be selected only for LTS?

martinB    2016-01-17 15:27    reporter    ~0005846

Thanks for coming back to this issue!

My proposal seems to freeze, at least sometimes, if fewer than 5000 particles are injected. However, it has been working fine and stably for hundreds of simulations with larger meshes (10e6 cells) and very large numbers of particles (100 transports of 1e7 particles).
I have a test case stored where it freezes. I'll investigate this case again and let you know...

I have not investigated the performance of transient transport. I think it is less important there, since each particle is transported for a single time step only and the processors along the particle's path don't have to wait too much. On the other hand, it should not do any harm to use my proposal...

henry    2016-01-17 15:35    manager    ~0005847

Could you provide the test case which reproduces the freezing problem?

henry    2016-01-18 08:30    manager    ~0005849

There is a lot of complexity and some cost involved in moving the particles between lists; why is this preferable to simply storing the iterator of the current particle and restarting from there?
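
Roughly, something like this (a sketch of the idea only, not a tested patch; blockSize is the transfer block-size discussed above):

    // Hypothetical sketch: remember where the loop stopped and resume
    // from that iterator after the transfer, instead of parking the
    // finished particles in a second list.
    typename Cloud<ParticleType>::iterator resumeIter = this->begin();

    while (transferActive)
    {
        label transportCounter = 0;
        typename Cloud<ParticleType>::iterator pIter = resumeIter;

        for (; pIter != this->end(); ++pIter)
        {
            ParticleType& p = pIter();
            td.keepParticle = p.move(td, trackTime);

            if (++transportCounter == blockSize)
            {
                ++pIter;
                break; // transfer this batch, then resume from pIter
            }
        }

        resumeIter = pIter;

        // ... parallel transfer of the boundary-crossing particles ...
    }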

martinB    2016-01-18 15:20    reporter    ~0005850

I have rerun the case that froze in the past. However, it now runs smoothly with my proposal...

I don't think it is preferable to use lists. It was just the first solution I tried at the time, and after observing the resulting speedup I was satisfied with the performance gain. If you store the iterators instead, I'll be happy to test it in my ongoing simulations.

I am updating the rest of my code to 3.0.x and -dev, so that I can test any changes immediately.

henry    2016-01-18 15:27    manager    ~0005851

Could you provide an updated version which avoids the need for transfers between lists? Using a stored iterator would have a very small overhead, and the same algorithm could be used for all cases, whereas the overhead of the multi-list approach is such that it would be necessary to provide both it and the previous algorithm, which adds complexity and maintenance effort.

martinB    2016-01-19 10:01    reporter    ~0005856

I'll do the code refactoring, but it might take some time.

henry    2016-08-16 12:03    manager    ~0006674

Any news on the code refactoring or should I close this report?

martinB    2016-08-16 12:48    reporter    ~0006676

Unfortunately, no news from my side... you can close the report. I hope I can come back to this topic in the future.

henry    2016-08-16 12:52    manager    ~0006677

Waiting for a patch to OpenFOAM-dev which conforms to OpenFOAM coding conventions.

Issue History

Date Modified Username Field Change
2014-11-04 12:14 martinB New Issue
2014-11-04 12:14 martinB File Added: 0_case.tar.gz
2014-11-04 12:14 martinB File Added: Cloud.C.modified
2016-01-17 15:00 henry Note Added: 0005845
2016-01-17 15:27 martinB Note Added: 0005846
2016-01-17 15:35 henry Note Added: 0005847
2016-01-18 08:30 henry Note Added: 0005849
2016-01-18 15:20 martinB Note Added: 0005850
2016-01-18 15:27 henry Note Added: 0005851
2016-01-19 10:01 martinB Note Added: 0005856
2016-08-16 12:03 henry Note Added: 0006674
2016-08-16 12:48 martinB Note Added: 0006676
2016-08-16 12:52 henry Note Added: 0006677
2016-08-16 12:52 henry Status new => closed
2016-08-16 12:52 henry Assigned To => henry
2016-08-16 12:52 henry Resolution open => suspended