Abstract

A neural network based on the ‘You Only Look Once’ (YOLO) architecture has been trained to detect objects in conventional RGB images. Using the pixel correspondence between the RGB image and the depth map, the positions of the detected objects are projected onto the depth map. A statistical analysis then extracts the pixels belonging to each object. Finally, the 3D position of the object in its surroundings is calculated.
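
The steps after detection can be illustrated with a minimal sketch. The abstract does not say which statistical analysis is used or how the 3D position is computed, so everything specific here is an assumption made for illustration: the function name `locate_object`, a median-based depth filter with a relative tolerance `tol`, a depth map already aligned pixel-for-pixel with the RGB image, and a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`.

```python
from statistics import median_low


def locate_object(depth, box, fx, fy, cx, cy, tol=0.15):
    """Estimate an object's 3D position (camera frame) from its bounding box.

    depth: 2D list of depth values indexed as depth[v][u]; 0 means "no reading".
    box:   (u0, v0, u1, v1) pixel bounds from the detector, end-exclusive.
    fx, fy, cx, cy: assumed pinhole intrinsics of the depth camera.
    tol:   assumed relative tolerance around the median depth.
    """
    u0, v0, u1, v1 = box

    # Project the box onto the depth map and collect valid depth samples.
    samples = [
        (u, v, depth[v][u])
        for v in range(v0, v1)
        for u in range(u0, u1)
        if depth[v][u] > 0
    ]
    if not samples:
        return None

    # Assumed statistical step: keep pixels close to the median depth,
    # which discards background that leaks into the bounding box.
    # median_low always returns an actual sample, so `kept` is never empty.
    z_med = median_low(z for _, _, z in samples)
    kept = [(u, v, z) for u, v, z in samples if abs(z - z_med) <= tol * z_med]

    n = len(kept)
    u_mean = sum(u for u, _, _ in kept) / n
    v_mean = sum(v for _, v, _ in kept) / n
    z_mean = sum(z for _, _, z in kept) / n

    # Back-project the mean pixel through the pinhole model.
    x = (u_mean - cx) * z_mean / fx
    y = (v_mean - cy) * z_mean / fy
    return (x, y, z_mean)
```

In this sketch the median is used because a bounding box usually contains some background pixels, and a plain mean of the depths would be pulled toward them. The returned coordinates are in the camera frame and share the units of the depth map.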
