12.09.2024

The 30th International European Conference on Parallel and Distributed Computing (EuroPar) took place in Madrid, Spain, from 26 to 30 August 2024. EuroPar is the prime European conference covering all aspects of parallel and distributed processing, ranging from theory to practice, from small to the largest parallel and distributed systems and infrastructures, from fundamental computational problems to full-fledged applications, and from architecture, compiler, language, and interface design and implementation to tools, support infrastructures, and application performance aspects.

Members of the EuroHPC Joint Undertaking-funded Centre of Excellence ESiWACE3 from the Netherlands eScience Center (NLeSC) received the “Best Paper Award” at the EuroPar conference for their work on auto-tuning HIP code on Nvidia and AMD GPUs. In the paper, they add support for tuning HIP code to Kernel Tuner, a Python tool for automating performance tuning of GPU applications, and show how much tuning contributes to achieving the best performance on AMD and Nvidia hardware.

Stijn Heldens of NLeSC, who presented the work at EuroPar, stated: “We tested Kernel Tuner on a diverse set of GPU functions that are widely used in HPC and research software. When you compare the optimally tuned versions for AMD and Nvidia, you will see they are vastly different.” He added: “Overall, it seems that AMD GPUs seem to be more ‘picky’, and while AMD GPUs can often outperform Nvidia GPUs, they only do so after extensive tuning of the code. Automatic performance tuning is thus essential to getting the most out of the hardware.”

Caption: Stijn Heldens (NLeSC) presenting the work on auto-tuning at the EuroPar conference.

The research was deemed very relevant in the current HPC landscape. Ben van Werkhoven, assistant professor at Leiden University, who leads the research on Kernel Tuner, explained: “As more AMD-based exascale-class supercomputers are coming online, it is very relevant to investigate the portability of GPU code between chips of major vendors. This result shows that transferring code between Nvidia and AMD will generally result in sub-optimal performance. And remarkably, the Nvidia architecture tends to be more forgiving in this respect, or vice versa, tuning GPU code is more important on AMD chips.”

While the work presented at EuroPar focuses mainly on using auto-tuning to achieve optimal performance, Kernel Tuner also supports tuning GPU code for energy efficiency, and future work will study the effects of such energy tuning on weather and climate models.