{"92cad6c6d7a90044e7289f504f6f4cf7clausbraun":{"DOI":"http://dx.doi.org/10.1109/IOLTS.2013.6604090","ISBN":"","ISSN":"","URL":"","abstract":"Computer simulations drive innovations in science and industry, and they are gaining more and more importance. However, their high computational demand generates extraordinary challenges for computing systems. Typical high-performance computing systems, which provide sufficient performance and high reliability, are extremely expensive.\r\nModern GPUs offer high performance at very low costs, and they enable simulation applications on the desktop. However, they are increasingly prone to transient effects and other reliability threats. To fulfill the strict reliability requirements in scientific computing and simulation technology, appropriate fault tolerance measures have to be integrated into simulation applications for GPUs. Algorithm-Based Fault Tolerance on GPUs has the potential to meet these requirements.\r\nIn this work we investigate the efficiency and the efficacy of ABFT for matrix operations on GPUs. 
We compare ABFT against fault tolerance schemes that are based on redundant computations and we evaluate its error detection capabilities.","annote":"","author":[{"family":"Wunderlich","given":"Hans-Joachim"},{"family":"Braun","given":"Claus"},{"family":"Halder","given":"Sebastian"}],"citation-label":"WundeBH2013","collection-editor":[],"collection-title":"","container-author":[],"container-title":"Proceedings of the IEEE International On-Line Testing Symposium (IOLTS'13)","documents":[],"edition":"","editor":[],"event-date":{"date-parts":[["2013"]],"literal":"2013"},"event-place":"","id":"92cad6c6d7a90044e7289f504f6f4cf7clausbraun","interhash":"852ec5b9e00df1c4437700418d91759c","intrahash":"92cad6c6d7a90044e7289f504f6f4cf7","issue":"","issued":{"date-parts":[["2013"]],"literal":"2013"},"keyword":"ABFT GPGPU SimTech algorithm-based computing errors fault fault-tolerance myown scientific simulation","misc":{"file":"http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2013/IOLTS_WundeBH2013.pdf","doi":"http://dx.doi.org/10.1109/IOLTS.2013.6604090"},"note":"","number":"","number-of-pages":"3","page":"240--243","page-first":"240","publisher":"","publisher-place":"","status":"","title":"Efficacy and Efficiency of Algorithm-Based Fault Tolerance on GPUs","type":"paper-conference","username":"clausbraun","version":"","volume":""},"bfd0a364cc8901abde747841b8f60a69clausbraun":{"DOI":"http://dx.doi.org/10.1109/IOLTS.2017.8046244","ISBN":"","ISSN":"","URL":"","abstract":"Iterative solvers like the Preconditioned Conjugate Gradient (PCG) method are widely used in compute-intensive domains including science and engineering that often impose tight accuracy demands on computational results. At the same time, the error resilience of such solvers may change in the course of the iterations, which requires careful adaption of the induced approximation errors to reduce the energy demand while avoiding unacceptable results. 
A novel adaptive method is presented that enables iterative Preconditioned Conjugate Gradient (PCG) solvers on Approximate Computing hardware with high energy efficiency while still providing correct results. The method controls the underlying precision at runtime using a highly efficient fault tolerance technique that monitors the induced error and the quality of intermediate computational results.","annote":"","author":[{"family":"Schöll","given":"Alexander"},{"family":"Braun","given":"Claus"},{"family":"Wunderlich","given":"Hans-Joachim"}],"citation-label":"SchoeBW2017","collection-editor":[],"collection-title":"","container-author":[],"container-title":"Proceedings of the 23rd IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS'17)","documents":[],"edition":"","editor":[],"event-date":{"date-parts":[["2017"]],"literal":"2017"},"event-place":"","id":"bfd0a364cc8901abde747841b8f60a69clausbraun","interhash":"70d88e8ae6518962c2cc6a2b24f8fbd6","intrahash":"bfd0a364cc8901abde747841b8f60a69","issue":"","issued":{"date-parts":[["2017"]],"literal":"2017"},"keyword":"AxC SimTech approximate computing energy-efficiency fault monitoring myown quality tolerance","misc":{"file":"http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2017/IOLTS_SchoeBW2017.pdf","doi":"http://dx.doi.org/10.1109/IOLTS.2017.8046244"},"note":"","number":"","number-of-pages":"2","page":"237--239","page-first":"237","publisher":"","publisher-place":"","status":"","title":"Energy-efficient and Error-resilient Iterative Solvers for Approximate Computing","type":"paper-conference","username":"clausbraun","version":"","volume":""},"8a8906e8a66690ce05e59dd8e68e839cclausbraun":{"DOI":"http://dx.doi.org/10.1109/IOLTS.2016.7604686","ISBN":"","ISSN":"","URL":"","abstract":"Approximate computing in hardware and software promises significantly improved computational performance combined with very low power and energy consumption. 
This goal is achieved by both relaxing strict requirements on accuracy and precision, and by allowing a deviating behavior from exact Boolean specifications to a certain extent. Today, approximate computing is often limited to applications with a certain degree of inherent error tolerance, where perfect computational results are not always required. However, in order to fully utilize its benefits, the scope of applications has to be significantly extended to other compute-intensive domains including science and engineering. To meet the often rather strict quality and reliability requirements for computational results in these domains, the use of appropriate characterization and fault tolerance measures is highly required. In this paper, we evaluate some of the available techniques and how they may extend the scope of application for approximate computing.","annote":"","author":[{"family":"Wunderlich","given":"Hans-Joachim"},{"family":"Braun","given":"Claus"},{"family":"Schöll","given":"Alexander"}],"citation-label":"WundeBS2016","collection-editor":[],"collection-title":"","container-author":[],"container-title":"Proceedings of the 22nd IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS'16)","documents":[],"edition":"","editor":[],"event-date":{"date-parts":[["2016"]],"literal":"2016"},"event-place":"","id":"8a8906e8a66690ce05e59dd8e68e839cclausbraun","interhash":"c3a518fb3206211e0d7da07a36661164","intrahash":"8a8906e8a66690ce05e59dd8e68e839c","issue":"","issued":{"date-parts":[["2016"]],"literal":"2016"},"keyword":"AxC SimTech approximate characterization computing fault metrics myown precision tolerance variable","misc":{"file":"http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/IOLTS_WundeBS2016.pdf","doi":"http://dx.doi.org/10.1109/IOLTS.2016.7604686"},"note":"","number":"","number-of-pages":"3","page":"133--136","page-first":"133","publisher":"","publisher-place":"","status":"","title":"Pushing the Limits: How 
Fault Tolerance Extends the Scope of Approximate Computing","type":"paper-conference","username":"clausbraun","version":"","volume":""}}