The effective use of Graphics Processing Units (GPUs) poses significant programming challenges in atmospheric modeling. Performance profiles of atmospheric models generally show no single kernel whose offloading to the accelerator would drastically improve performance. Moreover, kernels share extensive data dependencies through multi-dimensional fields. Copying these fields between host and device inside the time-stepping loop would require so much communication over the PCI Express (PCIe) bus that it would outweigh any improvement the GPU could offer. Porting these models to GPUs therefore implies an "all-or-nothing" strategy: all model components within the time loop must be ported, and almost all data must be copied to and from the device only outside the time loop.
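
The "all-or-nothing" data strategy can be sketched in C with OpenACC directives. ICON itself is written in Fortran, and the names `dyn_kernel`, `phys_kernel` and `run_time_loop`, as well as their update formulas, are hypothetical stand-ins rather than ICON code. The enclosing `data` region copies the fields to the device once, before the time loop, and back once, after it. Every kernel inside uses `present`, so no PCIe transfer happens per time step. A compiler without OpenACC support ignores the pragmas, so the same source also runs on the CPU.

```c
#include <stddef.h>

/* Hypothetical "dynamics" kernel: a stand-in update, not ICON's numerics. */
static void dyn_kernel(size_t n, double *restrict u,
                       const double *restrict t, double dt)
{
    /* present(): arrays must already be on the device (placed there by
       the enclosing data region); no host-device copy happens here. */
    #pragma acc parallel loop present(u[0:n], t[0:n])
    for (size_t i = 0; i < n; ++i)
        u[i] += dt * t[i];
}

/* Hypothetical "physics" kernel, also operating on device-resident data. */
static void phys_kernel(size_t n, double *restrict t,
                        const double *restrict u, double dt)
{
    #pragma acc parallel loop present(t[0:n], u[0:n])
    for (size_t i = 0; i < n; ++i)
        t[i] -= 0.5 * dt * u[i];
}

/* Runs all time steps while the fields stay resident on the device. */
void run_time_loop(size_t n, double *u, double *t, int nsteps, double dt)
{
    /* Copy both fields to the device once, before the time loop ... */
    #pragma acc data copy(u[0:n], t[0:n])
    {
        for (int step = 0; step < nsteps; ++step) {
            dyn_kernel(n, u, t, dt);
            phys_kernel(n, t, u, dt);
        }
    } /* ... and back to the host once, when the data region ends. */
}
```

Placing the `data` region at the outermost level keeps the kernels themselves free of transfer clauses. Each ported component only asserts that its inputs are present, which keeps the per-kernel directives small.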

The Icosahedral Non-hydrostatic (ICON) climate model, currently under development at the Max Planck Institute for Meteorology (MPI-M) and the German Weather Service (DWD), is a typical example of such an atmospheric model. Together with MPI-M and DWD, the Swiss National Supercomputing Centre (CSCS) has undertaken a port of ICON (both dynamics and physics) to GPUs using the OpenACC standard for accelerator directives. A central requirement of this port is that the code changes be minimal and non-intrusive, not significantly change CPU performance, and be suitable for merging directly into the development trunk.

In this talk we present some of the programming techniques employed and report on the status of this co-design project. Initial results indicate that performance is directly related to peak memory bandwidth, and that the GPU implementation consequently runs roughly 50% faster on an NVIDIA Tesla K20X than on a dual-socket Intel Haswell (E5-2690 v3) machine when the GPU memory is fully occupied.
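
Why a bandwidth-bound code's speedup tracks the memory-bandwidth ratio can be illustrated with the standard roofline model. The function names and the example numbers below are hypothetical and are not the measurements reported in this talk. A triad-like update `a[i] = b[i] + s*c[i]` in double precision performs 2 flops while moving 24 bytes (two loads, one store, ignoring write-allocate traffic), an arithmetic intensity of 1/12 flop/byte. At such low intensity the memory term dominates the estimated time, so the predicted speedup reduces to the ratio of sustained bandwidths.

```c
#include <assert.h>

/* Arithmetic intensity: flops performed per byte of memory traffic. */
double arithmetic_intensity(double flops, double bytes)
{
    assert(bytes > 0.0);
    return flops / bytes;
}

/* Roofline time estimate: the kernel takes at least as long as its
   compute-bound time and at least as long as its memory-bound time. */
double roofline_time(double flops, double bytes,
                     double peak_flops, double peak_bw)
{
    assert(peak_flops > 0.0 && peak_bw > 0.0);
    double t_compute = flops / peak_flops;
    double t_memory = bytes / peak_bw;
    return t_compute > t_memory ? t_compute : t_memory;
}

/* Predicted speedup of machine B over machine A for the same kernel. */
double predicted_speedup(double flops, double bytes,
                         double peak_flops_a, double bw_a,
                         double peak_flops_b, double bw_b)
{
    return roofline_time(flops, bytes, peak_flops_a, bw_a)
         / roofline_time(flops, bytes, peak_flops_b, bw_b);
}
```

Take two hypothetical machines with equal peak flops and sustained bandwidths of 8 and 12 (arbitrary bytes/s units). For the triad, `predicted_speedup(2.0, 24.0, 1000.0, 8.0, 1000.0, 12.0)` gives 1.5, which is exactly the bandwidth ratio, because the compute term never dominates.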
