Author of the publication

CUDA-For-Clusters: A System for Efficient Execution of CUDA Kernels on Multi-core Clusters.

, , and . Euro-Par, volume 7484 of Lecture Notes in Computer Science, pages 415-426. Springer, (2012)

Other publications of authors with the same name

A Predictive Performance Model for Superscalar Processors., , and . MICRO, pages 161-170. IEEE Computer Society, (2006)

Improving GPGPU concurrency with elastic kernels., , and . ASPLOS, pages 407-418. ACM, (2013)

Design of an MS-DOS PC program profiler., and . Microprocessors and Microsystems - Embedded Hardware Design, 18 (5): 261-269 (1994)

A Cache coherence protocol for MIN-based multiprocessors., , and . The Journal of Supercomputing, 8 (2): 163-185 (1994)

Microarchitecture Sensitive Empirical Models for Compiler Optimizations., , , and . CGO, pages 131-143. IEEE Computer Society, (2007)

Software Pipelined Execution of Stream Programs on GPUs., , and . CGO, pages 200-209. IEEE Computer Society, (2009)

A Programmable Hardware Path Profiler., , and . CGO, pages 217-228. IEEE Computer Society, (2005)

Synergistic execution of stream programs on multicores with accelerators., , and . LCTES, pages 99-108. ACM, (2009)

Parallel hough transform algorithm performance., and . Image Vision Comput., 9 (2): 88-92 (1991)

Construction and use of linear regression models for processor performance analysis., , and . HPCA, pages 99-108. IEEE Computer Society, (2006)