Author of the publication

Enhancing the Programmability and Performance Portability of GPU Tensor Operations.

, , , , and . Euro-Par, volume 11725 of Lecture Notes in Computer Science, page 213-226. Springer, (2019)


Other publications of authors with the same name

Boda: A Holistic Approach for Implementing Neural Network Computations., , and . Conf. Computing Frontiers, page 53-62. ACM, (2017)
A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications., , and . CoRR, (2016)
FireCaffe: near-linear acceleration of deep neural network training on compute clusters., , , and . CoRR, (2015)
Shallow Networks for High-Accuracy Road Object-Detection., , , , and . CoRR, (2016)
Enhancing the Programmability and Performance Portability of GPU Tensor Operations., , , , and . Euro-Par, volume 11725 of Lecture Notes in Computer Science, page 213-226. Springer, (2019)
Audio-Based Multimedia Event Detection with DNNs and Sparse Sampling., , , , , , and . ICMR, page 611-614. ACM, (2015)
Boda-RTC: Productive Generation of Portable, Efficient Code for Convolutional Neural Networks on Mobile Computing Platforms., , and . CoRR, (2016)
DenseNet: Implementing Efficient ConvNet Descriptor Pyramids., , , , , and . CoRR, (2014)
Shallow Networks for High-accuracy Road Object-detection., , , , and . VEHITS, page 33-40. SciTePress, (2017)
Developing Architectural Platforms: A Disciplined Approach., , , , , , , , , and 1 other author(s). IEEE Design & Test of Computers, 19 (6): 6-16 (2002)