On-the-fly clustering for exascale molecular dynamics simulations. - presented by Dr Alizée Dubois and Thierry Carrard

Slide at 23:46
[Slide: "Impact of MPI x OMP partitioning on execution time". Execution time (MD EAM, analysis, baseline, best composition) versus node task composition (MPI x OMP): 1x128, 2x64, 4x32, 8x16, 16x8, 32x4, 64x2. Best composition: 4 MPI x 32 OMP per node.]
A. Dubois, T. Carrard, Computer Physics Communications Seminar Series, 03/03/25
Summary (AI generated)

To evaluate the performance of our simulation, we ran a test with 81.9 million atoms. The accompanying graph shows the activity of the cores over time, with the workload distribution indicated by the work ID. The lower graph integrates this information, showing the number of active workers on the cores throughout the simulation. We observed a bottleneck related to gathering information on the master MPI rank; however, the time spent on this step is small compared to the overall analysis duration. While this is a limitation, it is manageable rather than critical.
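To illustrate the kind of gather step described here, the following minimal sketch (not the authors' actual code; the variable names and values are illustrative) shows per-rank cluster counts being collected on a master MPI rank, which is the serialization point that can become a bottleneck as the rank count grows:

```cpp
// Minimal sketch: each MPI rank computes a local cluster statistic and the
// master rank gathers them. The gather on rank 0 is the kind of
// master-side communication step discussed above.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Hypothetical per-rank result of an on-the-fly clustering step.
    long local_cluster_count = 42 + rank;

    std::vector<long> all_counts;
    if (rank == 0) all_counts.resize(size);

    // Every rank sends its local count to rank 0; the cost of this step
    // grows with the number of ranks, hence the observed bottleneck.
    MPI_Gather(&local_cluster_count, 1, MPI_LONG,
               all_counts.data(), 1, MPI_LONG, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        long total = 0;
        for (long c : all_counts) total += c;
        std::printf("total clusters across %d ranks: %ld\n", size, total);
    }
    MPI_Finalize();
    return 0;
}
```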

Our goal is to determine the performance of the algorithm by selecting the optimal partitioning for the Neural Network Potential (NNP). The ideal partitioning should enhance the efficiency of the molecular dynamics analysis. We use nodes of 64 cores each, which allows various decomposition configurations; for instance, 1 MPI process with 128 threads. Increasing the number of MPI processes results in a higher volume of messages, while adding more threads per process increases memory-access time, so neither extreme is optimal. We therefore looked for the best trade-off and found that a configuration of 4 MPI processes with 32 threads each yields the best performance.
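As a sketch of how such a hybrid composition is expressed in practice (illustrative only, not the authors' code), each MPI rank can report the number of OpenMP threads it will use, so a node composition such as 4 MPI x 32 OMP can be checked before running the real MD and analysis:

```cpp
// Minimal hybrid MPI x OpenMP sketch: report the node task composition.
#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    int provided = 0;
    // FUNNELED: only the master thread of each rank makes MPI calls.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    int nthreads = omp_get_max_threads();  // typically set via OMP_NUM_THREADS

    if (rank == 0)
        std::printf("composition: %d MPI ranks x %d OpenMP threads\n",
                    nranks, nthreads);

    MPI_Finalize();
    return 0;
}
```

For example, the 4x32 composition on one node could be launched as `OMP_NUM_THREADS=32 mpirun -np 4 ./a.out`; the exact launcher options and thread-pinning settings vary between systems.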

In terms of strong scaling, we maintain a constant load of 81.9 million atoms, which is relatively manageable for our system.
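For reference, strong scaling at a fixed problem size is usually quantified by the speedup and parallel efficiency; a standard definition (not specific to this talk), with $T(p)$ the time to solution on $p$ cores and $p_{\mathrm{ref}}$ the smallest core count that fits the 81.9 M-atom problem, is

$$ S(p) = \frac{T(p_{\mathrm{ref}})}{T(p)}, \qquad E(p) = \frac{p_{\mathrm{ref}}\,T(p_{\mathrm{ref}})}{p\,T(p)}. $$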