A Massively-Parallel, Fault-Tolerant Solver for High-Dimensional PDEs

Euro-Par 2016: Parallel Processing Workshops, volume 10104 of Lecture Notes in Computer Science (LNCS), pages 635--647. Cham, Springer, (May 2017)
DOI: 10.1007/978-3-319-58943-5_51

Abstract

We investigate the effect of hard faults on a massively-parallel implementation of the Sparse Grid Combination Technique (SGCT), an efficient numerical approach for the solution of high-dimensional time-dependent PDEs. The SGCT allows us to increase the spatial resolution of a solver to a level that is out of reach for classical discretization schemes because of the curse of dimensionality. We exploit the inherent data redundancy of this algorithm to obtain a scalable and fault-tolerant implementation without the need for checkpointing or process replication. It is a lossy approach that can guarantee convergence for a large number of faults and a wide range of applications. We present first results using our fault simulation framework, together with the first convergence and scalability results with simulated faults and algorithm-based fault tolerance for PDEs in more than three dimensions.
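The data redundancy mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes the standard inclusion-exclusion formula for combination coefficients on a downward-closed set of level vectors, and the helper names `downset`, `coefficients`, and `drop_failed` are hypothetical. When a component grid is lost, the sketch removes it from the index set and recomputes the coefficients, so the remaining coarser grids still form a valid (less accurate) combination without any checkpoint.

```python
from itertools import product

def downset(dim, n):
    """Level vectors l (all l_i >= 1) with |l|_1 <= n + dim - 1.

    Assumption: this is the index set of the classical combination
    technique of level n; the paper's actual index sets may differ.
    """
    return {l for l in product(range(1, n + 1), repeat=dim)
            if sum(l) <= n + dim - 1}

def coefficients(index_set):
    """Inclusion-exclusion coefficients for a downward-closed index set.

    c_l = sum over z in {0,1}^dim of (-1)^|z| * [l + z in index_set].
    Grids with coefficient zero are omitted from the result.
    """
    dim = len(next(iter(index_set)))
    coeffs = {}
    for l in index_set:
        c = sum((-1) ** sum(z)
                for z in product((0, 1), repeat=dim)
                if tuple(a + b for a, b in zip(l, z)) in index_set)
        if c != 0:
            coeffs[l] = c
    return coeffs

def drop_failed(index_set, failed):
    """Remove a failed grid and every grid above it (componentwise >=),
    which keeps the index set downward-closed."""
    return {l for l in index_set
            if not all(a >= b for a, b in zip(l, failed))}

if __name__ == "__main__":
    grids = downset(dim=2, n=3)
    print(coefficients(grids))                       # classical combination
    print(coefficients(drop_failed(grids, (2, 2))))  # after losing grid (2, 2)
```

In both cases the coefficients sum to one. After the fault, the coarse grid (1, 1) gains a nonzero coefficient, which is why a scheme of this kind needs some coarser grids to be available in addition to those of the classical combination.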
