"EMMA: Efficient Multi-node Memory-aware AllReduce Algorithms (poster)"
Guerrini, V., Fan, K., Kumar, S.
- Publication: Chicago, IL
- Link: https://gcasr.org/2025/posters
- PDF: valentino.pdf
AllReduce is a critical collective in both HPC and large-scale AI workloads. However, scaling it to Exascale systems presents key challenges due to inter-node communication bottlenecks and underutilization of intra-node resources like shared memory and NVLink. This work analyzes state-of-the-art AllReduce algorithms to identify inefficiencies and opportunities for hybrid strategies that explicitly separate intra- and inter-node communication.
We introduce a preliminary algorithmic design that leverages tunable intra-node communication patterns and discuss key performance criteria, including message count and data volume. Our early results provide insight into communication trade-offs and guide the development of adaptive AllReduce implementations optimized for Exascale systems.
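To make the message-count criterion concrete, here is a minimal sketch that compares a flat AllReduce with a hierarchical one that separates intra- and inter-node communication. It is not the poster's algorithm: the function names, the choice of local rank 0 as node leader, and the simple reduce-to-root-then-broadcast pattern are all assumptions made for illustration, and the "communication" is simulated in plain Python rather than run over MPI or NCCL.

```python
from typing import List, Tuple

Result = Tuple[List[List[int]], int, int]  # (per-rank results, intra msgs, inter msgs)


def flat_allreduce(values: List[List[int]]) -> Result:
    """Hypothetical baseline: every rank sends to one global root (node 0,
    local rank 0), which then sends the total back to every rank.
    values[n][r] is the contribution of local rank r on node n."""
    total = sum(sum(node) for node in values)
    local_peers = len(values[0]) - 1                     # ranks sharing the root's node
    remote_peers = sum(len(node) for node in values[1:])  # ranks on other nodes
    intra_msgs = 2 * local_peers    # one send up + one send down per local peer
    inter_msgs = 2 * remote_peers   # one send up + one send down per remote rank
    return [[total] * len(node) for node in values], intra_msgs, inter_msgs


def hierarchical_allreduce(values: List[List[int]]) -> Result:
    """Hypothetical hybrid scheme in three phases (assumed, not from the poster)."""
    intra_msgs = 0
    inter_msgs = 0

    # Phase 1: intra-node reduce, each non-leader sends to its node leader.
    partial_sums = []
    for node in values:
        partial_sums.append(sum(node))
        intra_msgs += len(node) - 1

    # Phase 2: inter-node exchange among leaders only
    # (leaders reduce to leader 0, which sends the total back).
    total = sum(partial_sums)
    inter_msgs += 2 * (len(values) - 1)

    # Phase 3: intra-node broadcast from each leader to its local ranks.
    results = []
    for node in values:
        results.append([total] * len(node))
        intra_msgs += len(node) - 1

    return results, intra_msgs, inter_msgs


if __name__ == "__main__":
    # 4 nodes with 8 ranks each; the sizes are arbitrary example values.
    data = [[n * 10 + r for r in range(8)] for n in range(4)]
    for name, fn in [("flat", flat_allreduce), ("hierarchical", hierarchical_allreduce)]:
        _, intra, inter = fn(data)
        print(f"{name}: intra-node msgs={intra}, inter-node msgs={inter}")
```

Under these assumptions both variants send the same total number of messages, but the hierarchical one moves most of them onto intra-node links, so inter-node traffic scales with the number of nodes rather than the number of ranks. A real implementation would also account for data volume per message and could swap a different pattern (for example a ring or tree) into either phase.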