Abstract
With the continuing advances in GPUs, the inference time of CNNs keeps decreasing. This enables new AI applications in manufacturing that directly influence the control of a process. To achieve this, a GPU is integrated into a real-time system so that the CNN can be executed in real time. However, it is not sufficient to consider the inference step alone; the latency of the entire pipeline must also be minimized. For this purpose, execution strategies for the inference pipeline are presented and evaluated in this paper. The presented architectures are compared using criteria for latency, implementation effort, and exchangeability, and the latencies are quantified with measurements on a demonstrator. As a result, the most synchronous architecture has the lowest latency but is not suitable for use in a service-oriented architecture as targeted by Industry 4.0. Therefore, a further architecture is presented that provides a good balance between latency and service orientation.