PUMA publications for /tag/algorithm-based%20fault%20myown%20simulation
https://puma.ub.uni-stuttgart.de/tag/algorithm-based%20fault%20myown%20simulation
2024-03-29T02:55:39+01:00

Algorithm-based fault tolerance for matrix operations on graphics processing units: analysis and extension to autonomous operation.
https://puma.ub.uni-stuttgart.de/bibtex/278feed56c1636b8fcbfd657450c145bd/clausbraun
clausbraun
2018-03-19T16:42:05+01:00
ABFT GPGPU GPU SimTech algebra algorithm-based error error-detection fault fault-tolerance linear matrix-operations myown simulation
<meta content="thesis" itemprop="educationalUse"/><span data-person-type="author" class="authorEditorList "><span><span itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author"><a title="Claus Braun" itemprop="url" href="/person/1f4aa6bff08e99d1685a2218270cadc80/author/0"><span itemprop="name">C. Braun</span></a></span></span>. </span><span class="additional-entrytype-information"><em>University of Stuttgart, </em>(<em><span>2015<meta content="2015" itemprop="datePublished"/></span></em>)</span>

Efficacy and Efficiency of Algorithm-Based Fault Tolerance on GPUs
https://puma.ub.uni-stuttgart.de/bibtex/292cad6c6d7a90044e7289f504f6f4cf7/clausbraun
clausbraun
2018-03-19T16:15:07+01:00
ABFT GPGPU SimTech algorithm-based computing errors fault fault-tolerance myown scientific simulation
<span data-person-type="author" class="authorEditorList "><span><span itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author"><a title="Hans-Joachim Wunderlich" itemprop="url" href="/person/1852ec5b9e00df1c4437700418d91759c/author/0"><span 
itemprop="name">H. Wunderlich</span></a></span>, </span><span><span itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author"><a title="Claus Braun" itemprop="url" href="/person/1852ec5b9e00df1c4437700418d91759c/author/1"><span itemprop="name">C. Braun</span></a></span>, </span> and <span><span itemtype="http://schema.org/Person" itemscope="itemscope" itemprop="author"><a title="Sebastian Halder" itemprop="url" href="/person/1852ec5b9e00df1c4437700418d91759c/author/2"><span itemprop="name">S. Halder</span></a></span></span>. </span><span class="additional-entrytype-information"><span itemtype="http://schema.org/Book" itemscope="itemscope" itemprop="isPartOf"><em><span itemprop="name">Proceedings of the IEEE International On-Line Testing Symposium (IOLTS'13)</span>, </em></span><em>page <span itemprop="pagination">240--243</span>. </em>(<em><span>2013<meta content="2013" itemprop="datePublished"/></span></em>)</span>
Computer simulations drive innovations in science and industry, and they are gaining more and more importance. However, their high computational demand generates extraordinary challenges for computing systems. Typical high-performance computing systems, which provide sufficient performance and high reliability, are extremely expensive.
Modern GPUs offer high performance at very low cost, and they enable simulation applications on the desktop. However, they are increasingly prone to transient effects and other reliability threats. To fulfill the strict reliability requirements in scientific computing and simulation technology, appropriate fault tolerance measures have to be integrated into simulation applications for GPUs. Algorithm-Based Fault Tolerance (ABFT) on GPUs has the potential to meet these requirements.
In this work we investigate the efficiency and the efficacy of ABFT for matrix operations on GPUs. We compare ABFT against fault tolerance schemes that are based on redundant computations and we evaluate its error detection capabilities