Master's Thesis

Sparse matrix-vector multiplication on graphics processors

University of Stuttgart, Nobelstr. 19, 70569 Stuttgart, November 2009

Abstract

Modern computer architecture is moving towards multi-core systems; Intel processors now ship with dual or even quad cores, as in the Xeon family. Graphics Processing Units (GPUs) can be regarded as highly parallel multi-core processors with tremendous performance, originally designed for 3D and real-time graphics. With the introduction of the new API from NVIDIA, the Compute Unified Device Architecture (CUDA), the GPU has become an attractive choice for general-purpose parallel computing and for solving many complex numerical problems. Sparse matrix-vector (SpMV) multiplication is one of the most important kernels in scientific computing; its sparsity, irregularity, and indirect addressing present new challenges when mapping it to multi-core systems. The objective of this work is to analyze the execution speed of SpMV multiplication on NVIDIA GPUs (Tesla C1060). An algorithm based on a tailored version of ELLPACK, called Aligned-ELLPACK-R, has been developed, together with further algorithms based on different storage formats; all of these implementations are written in CUDA. Their performance is then compared with different SpMV implementations on an Intel Xeon E5560 processor using the Jagged Diagonal (JAD), ELLPACK, and ELLPACK-R storage formats. The results show the superiority of the JAD storage format across the test matrices when running SpMV on conventional superscalar processors. On the Tesla C1060, SpMV based on Aligned-ELLPACK-R outperforms the fastest CPU implementation by a speedup factor of 13, and it also outperforms the ELLPACK-based CUDA library by a speedup factor of 2.3.
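
As a rough illustration of the ELLPACK-R scheme the abstract builds on, the sketch below shows a basic CUDA SpMV kernel with one thread per row. The kernel name spmv_ellpack_r and all array names are hypothetical, not taken from the thesis, and the Aligned-ELLPACK-R variant adds memory-alignment refinements that are not reproduced here.

    // Minimal ELLPACK-R SpMV sketch: y = A * x (assumed layout, not the
    // thesis's exact code).
    // val and col_idx hold the zero-padded matrix in column-major order
    // (size num_rows * max_row_len), so threads of a warp read consecutive
    // addresses in each loop iteration (coalesced access).
    // row_len[i] stores the actual number of nonzeros in row i, letting
    // each thread stop early instead of multiplying by the padding.
    __global__ void spmv_ellpack_r(int num_rows,
                                   const float *val,
                                   const int   *col_idx,
                                   const int   *row_len,
                                   const float *x,
                                   float       *y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < num_rows) {
            float dot = 0.0f;
            int len = row_len[row];
            for (int j = 0; j < len; ++j) {
                // Column-major indexing: element j of row `row`
                // lives at offset j * num_rows + row.
                int idx = j * num_rows + row;
                dot += val[idx] * x[col_idx[idx]];
            }
            y[row] = dot;
        }
    }

The per-row length array is what distinguishes ELLPACK-R from plain ELLPACK: it avoids wasted arithmetic on the padding that ELLPACK introduces to equalize row lengths, while the column-major layout preserves coalesced memory access on the GPU.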
