Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 377--386. (2021)
DOI: 10.1109/IPDPSW52791.2021.00066

Abstract

Between a widening range of GPU vendors and the trend of having more GPUs per compute node in supercomputers such as Summit, Perlmutter, Frontier and Aurora, developing performant yet portable distributed HPC applications becomes ever more challenging. Leveraging existing solutions like Kokkos for platform-independent code and HPX for distributing the application in a task-based fashion can alleviate these challenges. However, using such frameworks in the same application requires them to work together seamlessly. In this work we present an HPX Kokkos integration that works both ways: we can integrate CPU and GPU Kokkos kernels as HPX tasks and, conversely, use HPX worker threads to work on Kokkos kernels. Using HPX futures makes launching and synchronizing Kokkos kernels from multiple threads easy, allowing us to move away from the more traditional fork-join model. To evaluate our integrations, we ported existing Vc and CUDA kernels within an existing HPX application, Octo-Tiger, to use Kokkos instead. We achieve performance comparable to, or better than, that of the previous Vc and CUDA kernels, which both demonstrates the viability of our HPX Kokkos integration and future-proofs Octo-Tiger for a wider range of potential machines. Furthermore, we introduce event polling for synchronizing CUDA kernels (or Kokkos kernels on the respective backend), achieving speedups over the previous solution using callbacks.
