Optimize GPU kernels efficiently and painlessly from Python
Kernel Tuner simplifies the development of highly optimized, auto-tuned CUDA, OpenCL, HIP, and C/C++ code. It works as an external tool that benchmarks and optimizes GPU kernels in isolation. Because it supports Python-based unit testing of GPU code and auto-tuning of user-defined parameters, developers can tune a kernel without extensive changes to its source and without adding new dependencies to production code.
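To make the workflow concrete, here is a minimal sketch of tuning a CUDA vector-add kernel from Python. The kernel, the problem size, the candidate block sizes, and the helper names `build_problem` and `tune_vector_add` are illustrative assumptions, not part of Kernel Tuner. The sketch also assumes `kernel_tuner`, NumPy, and a working CUDA backend are installed. Only `tune_kernel` and its `answer` argument come from the library.

```python
import numpy as np

# CUDA source held as a Python string. block_size_x is deliberately not defined
# here: Kernel Tuner supplies each tunable parameter as a compile-time constant,
# so the kernel itself needs no tuning-specific code.
kernel_source = """
__global__ void vector_add(float *c, const float *a, const float *b, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
"""


def build_problem(size=1_000_000):
    """Create kernel arguments, tunable parameters, and a reference answer.

    Hypothetical helper for this example only.
    """
    rng = np.random.default_rng(0)
    a = rng.standard_normal(size).astype(np.float32)
    b = rng.standard_normal(size).astype(np.float32)
    c = np.zeros_like(a)
    n = np.int32(size)

    # Argument order must match the kernel signature: (c, a, b, n).
    args = [c, a, b, n]

    # Candidate thread-block sizes to benchmark (an illustrative choice).
    tune_params = {"block_size_x": [32, 64, 128, 256, 512]}

    # Expected output per argument; None marks arguments that are not checked.
    answer = [a + b, None, None, None]
    return args, tune_params, answer


def tune_vector_add(size=1_000_000):
    """Benchmark every candidate configuration and verify the output of each."""
    # Imported lazily so the rest of the sketch can be read and run without a GPU.
    from kernel_tuner import tune_kernel

    args, tune_params, answer = build_problem(size)
    return tune_kernel(
        "vector_add", kernel_source, size, args, tune_params, answer=answer
    )
```

On a machine with a CUDA-capable GPU, `results, env = tune_vector_add()` should return the benchmarked configurations together with information about the environment they ran in. Passing `answer` is what turns the tuning run into a correctness check as well: each configuration's output is compared against the NumPy reference.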