Author of the publication

Understanding GPU errors on large-scale HPC systems and the implications for system design and operation.

HPCA, page 331-342. IEEE Computer Society, (2015)

Other publications of authors with the same name

Timely Result-Data Offloading for Improved HPC Center Scratch Provisioning and Serviceability. IEEE Trans. Parallel Distrib. Syst., 22 (8): 1307-1322 (2011)

On Timely Staging of HPC Job Input Data. IEEE Trans. Parallel Distrib. Syst., 24 (9): 1841-1851 (2013)

Positioning Dynamic Storage Caches for Transient Data. CLUSTER, IEEE Computer Society, (2006)

On-the-Fly Recovery of Job Input Data in Supercomputers. ICPP, page 620-627. IEEE Computer Society, (2008)

stdchk: A Checkpoint Storage System for Desktop Grid Computing. ICDCS, page 613-624. IEEE Computer Society, (2008)

The Neutron Science TeraGrid Gateway: a TeraGrid science gateway to support the Spallation Neutron Source. Concurrency and Computation: Practice and Experience, 19 (6): 809-826 (2007)

GPU age-aware scheduling to improve the reliability of leadership jobs on Titan. SC, page 7:1-7:11. IEEE / ACM, (2018)

Virtual Organizations Guest Editors' Introduction. IEEE Internet Comput., 12 (2): 10-12 (2008)

Active flash: towards energy-efficient, in-situ data analytics on extreme-scale machines. FAST, page 119-132. USENIX, (2013)

An Analysis Workflow-Aware Storage System for Multi-Core Active Flash Arrays. IEEE Trans. Parallel Distrib. Syst., 30 (2): 271-285 (2019)