
Algorithm-Based Fault Tolerance for Many-Core Architectures

, and . Proceedings of the 15th IEEE European Test Symposium (ETS'10), page 253. IEEE Computer Society, (2010)
DOI: http://dx.doi.org/10.1109/ETSYM.2010.5512738

Abstract

Modern many-core architectures with hundreds of cores offer high computational potential, which makes them particularly attractive for scientific high-performance computing and simulation technology. Like all nanoscale semiconductor devices, many-core processors are susceptible to reliability-threatening factors such as variations and soft errors. One way to improve the reliability of such systems is software-based hardware fault tolerance, in which the software detects and corrects errors introduced by the hardware. In this work, we propose a software-based approach to improve the reliability of matrix operations on many-core processors. These operations are key components of many scientific applications.
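To illustrate what algorithm-based fault tolerance for matrix operations can look like, the sketch below implements the classic row/column checksum scheme for matrix multiplication on which ABFT is commonly based. It is not the method proposed in this work, which the abstract does not detail. Input A is extended with a row of column sums and B with a column of row sums; their product then carries checksums that expose a single corrupted element at the intersection of the inconsistent row and column, and the row checksum restores its value. All names (`matmul`, `with_column_checksums`, `with_row_checksums`, `abft_matmul`) and the `fault` injection hook are hypothetical, the code runs sequentially rather than on a many-core device, and the absolute tolerance `tol` is a simplifying assumption: real floating-point implementations need a tolerance derived from rounding-error bounds.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain product of an n x k matrix and a k x m matrix."""
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)] for row in a]


def with_column_checksums(a: Matrix) -> Matrix:
    """Append one row holding the sum of each column of a."""
    return [list(row) for row in a] + [[sum(col) for col in zip(*a)]]


def with_row_checksums(b: Matrix) -> Matrix:
    """Append one column holding the sum of each row of b."""
    return [list(row) + [sum(row)] for row in b]


def abft_matmul(
    a: Matrix,
    b: Matrix,
    fault: Optional[Tuple[int, int, float]] = None,
    tol: float = 1e-9,
) -> Tuple[Matrix, Optional[Tuple[int, int]]]:
    """Checksum-protected product that corrects one faulty data element.

    `fault` is a hypothetical test hook (i, j, delta): it adds delta to the
    data element C[i][j] after the multiplication to mimic a soft error.
    Returns the (corrected) product and the location of a corrected error.
    """
    # (n+1) x (m+1) full-checksum product: data block, row sums, column sums.
    c = matmul(with_column_checksums(a), with_row_checksums(b))
    n, m = len(a), len(b[0])
    if fault is not None:
        fi, fj, delta = fault
        c[fi][fj] += delta

    # Rows/columns whose data no longer sum to their checksum are suspect.
    bad_rows = [i for i in range(n) if abs(sum(c[i][:m]) - c[i][m]) > tol]
    bad_cols = [j for j in range(m) if abs(sum(c[i][j] for i in range(n)) - c[n][j]) > tol]

    located = None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # Single error: the intersection locates it, the row checksum repairs it.
        i, j = bad_rows[0], bad_cols[0]
        c[i][j] = c[i][m] - sum(c[i][q] for q in range(m) if q != j)
        located = (i, j)
    elif bad_rows or bad_cols:
        # Multiple errors or a corrupted checksum: detected but not corrected here.
        raise RuntimeError("error pattern not correctable by this single-error sketch")

    # Strip the checksum row and column before returning the data block.
    return [row[:m] for row in c[:n]], located


if __name__ == "__main__":
    print(abft_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], fault=(1, 0, 5.0)))
    # → ([[19, 22], [43, 50]], (1, 0))
```

In the usage example, the injected error turns C[1][0] from 43 into 48; row 1 and column 0 then disagree with their checksums, and the element is recomputed as 93 − 50 = 43. The appeal of this design is its low overhead: one extra row and column cost O(n²) additional work on an O(n³) multiplication, while a single error per product can be located and corrected without recomputation.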
