Author of the publication

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to the name to display some publications already assigned to the person.

 

Other publications of authors with the same name

Job-Site Level Fault Tolerance for Cluster and Grid environments. CLUSTER, page 1-9. IEEE Computer Society, (2005)
Blue Gene/L Log Analysis and Time to Interrupt Estimation. ARES, page 173-180. IEEE Computer Society, (2009)
Symmetric Active/Active Replication for Dependent Services. ARES, page 260-267. IEEE Computer Society, (2008)
A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance. IPDPS, page 1-10. IEEE, (2007)
Machine Learning Models for GPU Error Prediction in a Large Scale HPC System. DSN, page 95-106. IEEE Computer Society, (2018)
Big Data Meets HPC Log Analytics: Scalable Approach to Understanding Systems at Extreme Scale. CLUSTER, page 758-765. IEEE Computer Society, (2017)
Power-Capping Aware Checkpointing: On the Interplay Among Power-Capping, Temperature, Reliability, Performance, and Energy. DSN, page 311-322. IEEE Computer Society, (2016)
Epidemic failure detection and consensus for extreme parallelism. IJHPCA, 32 (5): 729-743 (2018)
Scalable and Fault Tolerant Failure Detection and Consensus. EuroMPI, page 13:1-13:9. ACM, (2015)
A tunable holistic resiliency approach for high-performance computing systems. PPOPP, page 305-306. ACM, (2009)