PUMA publications for /group/simtech/GPU
Mon Sep 30 13:17:58 CEST 2024

Title: Evaluation of SYCL’s Different Data Parallel Kernels
Booktitle: Proceedings of the 12th International Workshop on OpenCL and SYCL
Series: IWOCL '24
Address: New York, NY, USA
Year: 2024
Month: 04
Pages: 1-4
Keywords: myown Performance Evaluation CPU SVM SYCL GPU AISA exc2075

Abstract: SYCL provides programmers with four, and in the case of AdaptiveCpp even five, ways for calling and writing a device kernel. This paper analyzes the performance of these diverse kernel invocation types for DPC++ and AdaptiveCpp as SYCL implementations on an NVIDIA A100 GPU, an AMD Instinct MI210 GPU, and a dual-socket AMD EPYC 9274F CPU. Using the example of a kernel matrix assembly, we show why the performance can differ by a factor of 100 in the worst case on the same hardware for the same problem using different SYCL implementations and kernel invocation types.