MFNetSim: A Multi-Fidelity Network Simulation Framework for Multi-Traffic Modeling of Dragonfly Systems

Wang, X., Brown, K. A., Ross, R. B., Carothers, C.D., Lan, Z.

Figure: An illustration of the workload replay module.

In high-performance computing (HPC), modern supercomputers typically allocate compute resources exclusively to user applications. The interconnect network, however, is shared among co-running workloads for both inter-node communication and cross-node I/O access, which leads to inevitable network interference. In this study, we develop MFNetSim, a multi-fidelity modeling framework that simulates multiple traffic types simultaneously over the interconnect network, including inter-process communication and I/O traffic. By combining different levels of abstraction, MFNetSim can efficiently co-model the communication and I/O traffic occurring on HPC systems equipped with flash-based storage. We conduct simulation studies of hybrid workloads composed of traditional HPC applications and emerging ML applications on a 1,056-node Dragonfly system under various configurations. Our analysis yields several observations on how network interference affects communication and I/O traffic.

https://doi.org/10.1145/3729424