Abstract
This paper presents a new programming methodology for intro- ducing and tuning parallelism for heterogeneous shared-memory systems (comprising a mixture of CPUs and GPUs), using a com- bination of algorithmic skeletons (such as farms and pipelines), Monte-Carlo tree search for deriving mappings of tasks to avail- able hardware resources, and refactoring tool support for applying the patterns and mappings in an easy and effective way. Using our approach, we demonstrate easily obtainable, significant and scal- able speedups on a number of case studies showing speedups of up to 41 over the sequential code on a 24-core machine with one GPU. We also demonstrate that the mappings the MCTS algorithm suggest are comparable to the best possible speedups that can be obtained.
Users
Please
log in to take part in the discussion (add own reviews or comments).