OAC Core: AIMCI: Artificial Intelligence for Managing Cyberinfrastructure

Lan, Z., Papka, M. E.

  • Caption: From manual, static, single-cluster resource management to automated, dynamic, facility-wide resource management

Advanced cyberinfrastructure (CI) is undergoing disruptive changes in both system architectures and application workloads. CI workloads are rapidly expanding beyond traditional computational simulations to include a hybrid mix of applications. CI facilities now host diverse high-performance systems with heterogeneous configurations, which results in a complex combination of computing, memory, and storage components. Existing CI management methods rely heavily on heuristics or manual intervention, and they struggle to keep pace with these changes. This project addresses the challenges of CI resource management by integrating artificial intelligence (AI) technologies with human expertise. The proposed AIMCI framework shifts from managing isolated single clusters to facility-wide management, orchestrating the entire facility as a unified pool of diverse resources that serves a broad spectrum of applications with varied resource requirements.
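The contrast between single-cluster and facility-wide management can be illustrated with a toy sketch. It is not the AIMCI method, which the abstract describes as AI-driven. It is a plain greedy heuristic, assumed here only for illustration. The cluster names, the "cpu"/"gpu" node types, the job sizes, and the "pick the eligible cluster with the most free nodes" rule are all hypothetical. Under per-cluster management, a job may run only on the cluster it was submitted to. Under facility-wide management, any cluster with a matching node type and enough free nodes is eligible.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cluster:
    name: str
    node_type: str   # hypothetical resource category, e.g. "cpu" or "gpu"
    free_nodes: int


@dataclass
class Job:
    name: str
    node_type: str   # node type the job requires
    nodes: int       # number of nodes requested
    home: str        # cluster the job was submitted to


def place_per_cluster(jobs: List[Job], clusters: List[Cluster]) -> Dict[str, Optional[str]]:
    """Single-cluster view: each job may only run on its home cluster."""
    free = {c.name: c.free_nodes for c in clusters}
    types = {c.name: c.node_type for c in clusters}
    placement: Dict[str, Optional[str]] = {}
    for job in jobs:
        c = job.home
        if types[c] == job.node_type and free[c] >= job.nodes:
            free[c] -= job.nodes
            placement[job.name] = c
        else:
            placement[job.name] = None  # job waits in its home queue
    return placement


def place_facility_wide(jobs: List[Job], clusters: List[Cluster]) -> Dict[str, Optional[str]]:
    """Unified-pool view: any cluster with a matching node type and enough
    free nodes is eligible; the hypothetical greedy rule picks the one with
    the most free nodes."""
    free = {c.name: c.free_nodes for c in clusters}
    placement: Dict[str, Optional[str]] = {}
    for job in jobs:
        candidates = [c for c in clusters
                      if c.node_type == job.node_type and free[c.name] >= job.nodes]
        if candidates:
            best = max(candidates, key=lambda c: free[c.name])
            free[best.name] -= job.nodes
            placement[job.name] = best.name
        else:
            placement[job.name] = None  # no cluster in the facility can host it now
    return placement


if __name__ == "__main__":
    clusters = [Cluster("A", "cpu", 64), Cluster("B", "cpu", 32), Cluster("G", "gpu", 8)]
    jobs = [Job("sim1", "cpu", 48, "A"), Job("sim2", "cpu", 32, "A"),
            Job("train", "gpu", 4, "G"), Job("post", "cpu", 8, "B")]
    print(place_per_cluster(jobs, clusters))
    # {'sim1': 'A', 'sim2': None, 'train': 'G', 'post': 'B'}
    print(place_facility_wide(jobs, clusters))
    # {'sim1': 'A', 'sim2': 'B', 'train': 'G', 'post': 'A'}
```

In this example, `sim2` must wait under the per-cluster view because cluster A is nearly full, while cluster B sits idle. Treating the facility as one pool lets `sim2` run on B. A framework like the one proposed would replace the fixed greedy rule with learned, adaptive policies.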