Towards HPC System Throughput Optimization

Jeff Durachta, NOAA, US

It is easy to focus on the more glamorous aspects of the Earth System Model / High Performance Computing intersection: Novel Algorithm Development / Big Models / Glittering Hardware. But one must not forget that the modeling efforts, and thus many aspects of the science, progress through a constant flux of model history output transformed into the graphs, tables and charts that feed day-to-day research. Further, a given line of research requires not one but many simulation runs related by time series and/or parameter variations. All of this implies workflow throughput requirements that must be met to accomplish the science.

At the same time, the environments in which these simulations run have been undergoing enormous (and perhaps some would say catastrophic) growth in size and complexity. This complexity drives myriad interactions among system hardware, system software and user applications, producing reactions that are sometimes subtle and sometimes not. In addition to causing the loss of individual job executions, these complex interactions can rob workflows of throughput in ways that often vary over time. And regardless of that variation, the root causes of throughput slowdowns are typically quite difficult and time consuming to track down.

This talk will review some previous and current efforts at GFDL to capture and utilize information generated by the workflow itself. While the current state represents progress over the almost 20 years I have worked with the lab, you will readily see that it encompasses only islands of data capture and analysis. Motivated by this body of work and the sometimes painful lessons learned, I will describe efforts to design and build a much more comprehensive workflow data gathering infrastructure to enable detailed throughput analysis. Of necessity, the infrastructure must be lightweight, non-intrusive and able to deal gracefully with missing data. Further, it must be modular, encapsulated and extensible, since economics dictate that it will be deployed in stages, starting simple and building toward complexity. The end goal is to understand and optimize scientific data production throughput in environments of increasing complexity; without such analysis capabilities, I fear the ability to run at exascale may do us little good if workflow throughput is held to petascale levels.
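As a minimal sketch of what such a capture layer might look like (all names here are hypothetical and not drawn from GFDL's actual tooling), the Python fragment below records workflow events as append-only JSON lines, never lets logging failures disturb the workflow it observes, and tolerates missing or malformed records at analysis time:

# Hypothetical sketch of a lightweight workflow event logger; not GFDL's
# actual infrastructure. Each workflow stage appends one JSON record per
# event; missing or corrupt data is tolerated rather than fatal.
import json
import os
import socket
import time


def record_event(log_path, stage, status, **metrics):
    """Append one event record; never raise into the calling workflow."""
    event = {
        "time": time.time(),           # wall-clock timestamp
        "host": socket.gethostname(),  # where the stage ran
        "stage": stage,                # e.g. "postprocess", "history_transfer"
        "status": status,              # e.g. "start", "end", "failed"
        "metrics": metrics or None,    # optional: bytes moved, seconds elapsed...
    }
    try:
        with open(log_path, "a") as f:
            f.write(json.dumps(event) + "\n")
    except OSError:
        pass  # non-intrusive: logging failures must not break the workflow


def summarize(log_path):
    """Compute per-stage event counts while tolerating absent or bad records."""
    counts = {}
    if not os.path.exists(log_path):
        return counts                  # graceful handling of missing data
    with open(log_path) as f:
        for line in f:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue               # skip corrupt records
            stage = event.get("stage", "unknown")
            counts[stage] = counts.get(stage, 0) + 1
    return counts

Keeping the capture path append-only and failure-silent is one way to satisfy the lightweight, non-intrusive requirements, while pushing all validation to the analysis side handles data that simply never arrived.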

Redesigning CAM-SE on Sunway TaihuLight for Peta-Scale Performance

Lin Gan, Tsinghua University and National Supercomputing Center in Wuxi, CN

The Community Atmosphere Model (CAM-SE) has been ported, redesigned and scaled to the full system of the Sunway TaihuLight, providing peta-scale climate modeling performance. In the first stage we refactored and optimized the complete code using OpenACC directives. A more aggressive and finer-grained redesign was then applied to achieve tighter memory control and usage, more efficient vectorization, and compute-communication overlap. The well-tuned program running on the powerful TaihuLight system enables us to perform, to our knowledge, the first simulation of the complete lifecycle of Hurricane Katrina, and achieves close-to-observation results for both track and intensity.
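To illustrate the compute-communication overlap mentioned above in generic form (this is not the CAM-SE/TaihuLight code, which relies on OpenACC and architecture-specific redesign; it is only a minimal Python/mpi4py sketch of the same pattern): non-blocking halo exchanges are posted first, interior points are updated while the messages are in flight, and only the halo-dependent points wait for communication to complete.

# Minimal sketch of compute-communication overlap via non-blocking halo
# exchange (generic mpi4py illustration, not the actual CAM-SE/Sunway code).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

n = 1024
field = np.random.rand(n)
halo_from_left = np.empty(1)
halo_from_right = np.empty(1)

# Post non-blocking sends/receives of boundary values; tags distinguish the
# "sent leftward" (10) and "sent rightward" (11) messages.
reqs = [
    comm.Isend(field[:1], dest=left, tag=10),
    comm.Isend(field[-1:], dest=right, tag=11),
    comm.Irecv(halo_from_right, source=right, tag=10),
    comm.Irecv(halo_from_left, source=left, tag=11),
]

# Overlap: the interior update needs no halo data, so it proceeds while the
# messages are in flight.
interior = 0.5 * (field[2:] + field[:-2])

# Only the two boundary points have to wait for communication to finish.
MPI.Request.Waitall(reqs)
updated_first = 0.5 * (halo_from_left[0] + field[1])
updated_last = 0.5 * (field[-2] + halo_from_right[0])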

Preliminary evaluation of systematic biases in an FV3-powered global cloud-permitting model

Shian-Jiann Lin, NOAA, US

As a contribution to the DYAMOND project, the FV3 group at the Geophysical Fluid Dynamics Laboratory and the Research Center for Environmental Change (RCEC) at Academia Sinica, Taiwan, is making several 40-day simulations using an advanced version of the Finite-Volume Dynamical Core on the Cubed Sphere (FV3) that has several built-in subgrid parameterizations. In particular, subgrid orographic (SGO) effects are now part of the new "FV3 dynamics", which unavoidably breaks the traditional boundary between "dynamics" and "physics". I believe the hard boundary (the confinement) between "dynamics" and "physics" imposed by the modeling framework is one reason progress has been limited. This work is therefore an evolution in the design and development of a "super dynamics" for the gray zone, defined roughly here as grid spacings between 1 km and 10 km, the gap between weather and climate modeling.

We will carry out the experiments and preliminary analyses across the gray zone at three different horizontal resolutions: 13, 6.5 and 3.25 km. As a potential tool for sub-seasonal prediction, we shall analyze the hindcast skill over the first 10 days and the systematic "climate biases" over the last 30 days.

The ESiWACE Demonstrators: Scalability, Performance Prediction, Evaluation

Philipp Neumann, DKRZ, DE

With exascale computing becoming available in the next decade, global weather prediction at the kilometer scale will become feasible. Moreover, the climate community has already begun to contemplate a new generation of high-resolution climate models.

High-resolution model development is confronted with several challenges. Scalability of the models needs to be optimal across all relevant components, including I/O, which easily becomes a bottleneck. Both runtime and I/O dictate how fine a resolution can be chosen while still being able to run the model at production level; throughputs of 1-30 simulated years per day are anticipated for this purpose, depending on the questions to be addressed. Moreover, given various scalability experiments from prototypical runs and additional model data, estimating the performance of new simulations can become challenging.
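As a back-of-envelope illustration of how runtime and I/O translate into the 1-30 simulated-years-per-day target (the numbers below are illustrative placeholders, not measured ESiWACE data), throughput can be estimated from the model time step, the wall-clock cost per step and a fractional I/O overhead:

# Rough throughput estimate: simulated years per day (SYPD) as a function of
# model time step, wall-clock cost per step and I/O overhead. Numbers are
# illustrative only.
SECONDS_PER_DAY = 86400.0
DAYS_PER_YEAR = 365.0

def sypd(dt_model_s, wallclock_per_step_s, io_overhead_frac=0.0):
    """Simulated years per wall-clock day, with I/O as a fractional overhead."""
    steps_per_sim_year = DAYS_PER_YEAR * SECONDS_PER_DAY / dt_model_s
    wallclock_per_year = steps_per_sim_year * wallclock_per_step_s * (1.0 + io_overhead_frac)
    return SECONDS_PER_DAY / wallclock_per_year

# Example: a 60 s model time step costing 0.2 s of wall-clock per step and
# losing 15% to I/O yields roughly 0.7 simulated years per day.
print(round(sypd(dt_model_s=60.0, wallclock_per_step_s=0.2, io_overhead_frac=0.15), 2))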

I present results achieved in the scope of the Centre of Excellence in Simulation of Weather and Climate in Europe (ESiWACE) for global high-resolution simulations. I show results from multi-week global 5 km simulations and discuss current features and limits of these simulations. I further link the findings to the new intercomparison initiative DYAMOND for high-resolution predictions. Finally, I discuss performance prediction approaches based on existing performance data.

Near-global climate simulation at 1km resolution with COSMO 5.0

Carlos Osuna, MeteoSwiss, CH

The climate community has set ambitious goals to reach global km-scale modeling capability on future exascale high-performance computing (HPC) systems. Currently, however, state-of-the-art CMIP simulations are executed using grid spacings of 25-50 km, and none of the production climate models is capable of exploiting modern HPC architectures with hybrid node designs. In this talk we present near-global simulations using a regional climate model (COSMO) that have been executed on Europe's largest supercomputer, Piz Daint. COSMO has been systematically adapted for performance portability across multiple hardware architectures and is capable of scaling to the full system size for large enough problem sizes. The results presented can serve as a baseline for what could be achieved today using a state-of-the-art atmospheric model on a modern, accelerated hardware architecture. Finally, we conclude by highlighting some of the remaining challenges and potential solutions on the way to global km-scale climate simulations.

Tropical Cyclones and resolution sensitivity in HighResMIP GCMs

Pier Luigi Vidale (NCAS-Climate, University of Reading), Malcolm Roberts (Met Office Hadley Centre), Kevin Hodges (CMCC), Louis-Philippe Caron (BSC), Rein Haarsma (KNMI), Enrico Scoccimarro (CMCC), Alessio Bellucci (CMCC) and Jenny Mecking (Southampton Oceanography Centre) (Blue-Action), all PRIMAVERA partners (models and analysis)

For the first time in the CMIP exercise, international modelling groups have come together under a coordinated protocol, HighResMIP, designed to investigate the role of model resolution in the simulation of climate processes. The protocol prescribes long simulations of 100 years each in both atmosphere-only mode (1950-2050) and coupled mode, in two sets: (a) constant 1950s radiative forcing and (b) historical forcing (1950-2014). The principal focus of the high-resolution simulations is a mesh size of around 20 km.

Past, opportunity-driven intercomparisons, such as those carried out by the US CLIVAR Hurricane Working Group, have revealed that Tropical Cyclone track densities, including their interannual variability, start to be credibly represented at resolutions of ~50 km, while simulations of TC intensities start to become more realistic for models at ~20 km and beyond.

New results from the current HighResMIP exercise, using six GCMs so far, confirm past findings in terms of the increased realism of Tropical Cyclone simulations, and further stress the substantial impact of refining resolution from ~100 km to ~20 km. The unprecedented length of the simulations in these coordinated experiments also reveals significant responses to (forced and unforced) climate variability, impossible to address with typical 30-year simulations. The coordinated protocol additionally permits investigation of the role of model formulation (e.g. the use of stochastic physics), which, in individual cases, can be as significant as the impact of model resolution.

Challenges of NICAM toward the exascale era

Hisashi Yashiro, RIKEN, JP

The Nonhydrostatic ICosahedral Atmospheric Model (NICAM) is one of the highest-resolution global atmospheric models in the world. Since the early 2000s, we have performed global cloud-system-resolving simulations without using convection parameterization. NICAM has been developed for more than fifteen years, keeping pace with the development of supercomputers in Japan. In particular, the Earth Simulator (2003-) and the K computer (2011-) brought us large computational resources and enabled larger-scale simulations. Integration of new components and optimization for new machines are continually ongoing. NICAM was chosen as one of the proxy applications for evaluating Japan's next flagship supercomputer, post-K. The "co-design" effort for the design of the system is regarded as important in the post-K project. Through performance evaluation of NICAM and the other proxy applications, parameters of the CPU architecture were considered. NICAM also played an important role in system software development, such as the compiler and the MPI implementation. In preparation for the coming exascale era, the development of NICAM is approaching a turning point: the transfer of huge amounts of data limits both simulation and analysis. By introducing our efforts, I will point out problems of capability, scalability and performance portability.