Authors: Neena Imam (NVIDIA Corporation), Nageswara Rao (Oak Ridge National Laboratory (ORNL)), Prineha Narang (University of California, Los Angeles (UCLA)), Benjamin Brown (DOE Office of Advanced Scientific Computing Research)
Abstract: High-Performance Computing systems that have traditionally been deployed at a single site are expected to significantly expand their reach to include a variety of remote edge systems. These edge systems include computing platforms located near instruments as well as the instruments themselves. Examples range from interconnected ecosystems of large science instruments to smart energy grids supported by complex analytics and control. These interconnected systems form a compute and instrument continuum wherein computation is orchestrated in various stages. This BoF will discuss the aggregation and synthesis of previously distinct techniques and tools (including HPC, AI/ML, and digital twins) to enable continuum computing.
Long Description: Computing ecosystems are poised for significant transformations to keep pace with the growth and expansion of geographically distributed science infrastructures that encompass IoT edge devices, large-scale instruments, upgraded networks, datacenters, and exascale computing platforms. Experimental science is also evolving as new approaches are adopted for the effective operation of, and collaboration among, science instruments that are reaching unprecedented scales and complexities. For example, within the next decade, U.S. Department of Energy (DOE) assets will produce enormous data sets of unparalleled complexity and resolution, and the global high-energy physics community will deploy AI-controlled, city-size scientific instruments (particle accelerators and particle detectors) that will produce zettabytes of data. These observational datasets will be combined with exascale-enabled simulations and digital twins to enable major scientific advances. Consequently, continuum computing (also called the digital continuum) is emerging to enable in-situ AI for predictive analytics and control/optimization of instruments that are distributed across hundreds of kilometers. In the continuum paradigm, computation and data are orchestrated in various stages from the edge to the core to optimize data movement and response times. Optimizing end-to-end performance in such a complex continuum is challenging. It is necessary to develop new approaches that seamlessly combine resources and services at the edge and along the data path with multiple facilities and computing resources. Methods based on AI/ML and digital twins need to be developed for the convergence of experimental and simulation data and the autonomous steering of experiments.
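As a minimal illustration of the staged edge-to-core orchestration described above, the sketch below reduces raw instrument samples at each edge site and ships only compact summaries to a core aggregation step, cutting data movement. All function names, the threshold filter, and the summary format are illustrative assumptions, not part of any specific facility's workflow.

```python
# Hypothetical two-stage continuum pipeline: edge reduction, then core aggregation.
from statistics import mean

def edge_stage(raw_samples, threshold=0.5):
    """Filter and summarize raw detector samples near the instrument."""
    events = [s for s in raw_samples if s > threshold]
    return {"count": len(events), "mean": mean(events) if events else 0.0}

def core_stage(summaries):
    """Aggregate per-site summaries at the central computing facility."""
    total = sum(s["count"] for s in summaries)
    weighted = sum(s["mean"] * s["count"] for s in summaries)
    return {"total_events": total,
            "global_mean": weighted / total if total else 0.0}

# Two edge sites stream reduced summaries instead of full data volumes.
site_a = edge_stage([0.1, 0.7, 0.9])
site_b = edge_stage([0.6, 0.2, 0.8, 0.95])
report = core_stage([site_a, site_b])
```

In this toy version, only a few bytes per site cross the wide-area network; a real deployment would layer in streaming transport, scheduling, and monitoring, which are among the topics this BoF addresses.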
To enable this new paradigm, the scientific community needs to work across multiple disciplines to develop technologies that integrate and orchestrate distributed data and computing resources. Novel solutions are needed for system design, software, libraries and frameworks, data-driven workflows that can react to changing data sizes, monitoring tools, multisite governance policies, collection of actionable experimental metrics, etc. We organized the inaugural BoF on continuum computing at SC22, which was well received. That inaugural BoF focused mainly on the state of the art in continuum computing hardware, software, and workflows. This follow-on BoF will discuss the aggregation and synthesis of previously distinct techniques and tools (such as HPC, AI/ML, and digital twins) necessary to advance continuum computing. Digital twinning methodologies for continuum systems can enable data sharing between experiments and simulations to help steer discovery, analysis, or simulation in real time. Additionally, the incorporation of ML in these systems necessitates novel ML techniques such as federated learning to ensure data privacy and low-latency communication. This BoF will discuss how these emerging technologies may be combined for continuum computing.
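To make the federated-learning point concrete, the following is a hedged, FedAvg-style sketch in which each site fits a toy one-parameter model on its private data and shares only model weights, never raw measurements, with the aggregation server. The function names, the linear model y = w*x, and all hyperparameters are illustrative assumptions.

```python
# Illustrative federated averaging: sites exchange weights, not data.

def local_update(weight, data, lr=0.1, epochs=5):
    """One site's gradient steps on private (x, y) pairs for the fit y = w*x."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

def federated_average(weights, sizes):
    """Server combines site models, weighted by local dataset size."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(weights, sizes)) / total

# Two sites hold private data drawn from y = 2x; only weights cross sites.
site_data = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
global_w = 0.0
for _ in range(20):  # communication rounds
    local_ws = [local_update(global_w, d) for d in site_data]
    global_w = federated_average(local_ws, [len(d) for d in site_data])
```

The aggregated weight converges toward the true slope of 2 even though no site ever sees another site's data, which is the privacy property motivating federated approaches in the continuum setting.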
Given the cross-disciplinary nature of this BoF, we believe the session will be well attended and enthusiastically received. SC presents a rare opportunity for cross-disciplinary teams to meet and discuss how to create a roadmap for continuum computing. The target audience includes computer and computational scientists, instrument scientists, experts in ML/AI and digital twins, system architects, and end users. The session leaders provide a diverse set of perspectives from industry (Imam), federal government (Brown), a national lab (Rao), and academia (Narang).