@clausbraun

Efficacy and Efficiency of Algorithm-Based Fault Tolerance on GPUs

, , und . Proceedings of the IEEE International On-Line Testing Symposium (IOLTS'13), Seite 240--243. (2013)

Zusammenfassung

Computer simulations drive innovations in science and industry, and they are gaining more and more importance. However, their high computational demand generates extraordinary challenges for computing systems. Typical highperformance computing systems, which provide sufficient performance and high reliability, are extremly expensive. Modern GPUs offer high performance at very low costs, and they enable simulation applications on the desktop. However, they are increasingly prone to transient effects and other reliability threats. To fulfill the strict reliability requirements in scientific computing and simulation technology, appropriate fault tolerance measures have to be integrated into simulation applications for GPUs. Algorithm-Based Fault Tolerance on GPUs has the potential to meet these requirements. In this work we investigate the efficiency and the efficacy of ABFT for matrix operations on GPUs. We compare ABFT against fault tolerance schemes that are based on redundant computations and we evaluate its error detection capabilities

Links und Ressourcen

DOI:
10.1109/IOLTS.2013.6604090
BibTeX-Schlüssel:
WundeBH2013
Suchen auf:

Kommentare und Rezensionen  
(0)

Es gibt bisher keine Rezension oder Kommentar. Sie können eine schreiben!

Tags


Zitieren Sie diese Publikation