Multi-Modal Fusion and 3-D Reconstruction in the PANORAMA project


The VCA group, as TU/e's leading group for 3-D reconstruction, has contributed to the PANORAMA project with 3-D modeling and 3-D image processing using various sensors (Figure 1). The contribution consists of the design and implementation of several algorithms, including distance-aware weighting strategies, real-time edge detection and planar segmentation of depth images, a real-time RGB-D registration pipeline (R3P), 3-D reconstruction applications for large-scale environments, and multi-modal fusion of 3-D models obtained via intrinsically different sensors. The following paragraphs give a brief overview of each of these contributions. Interested readers can obtain detailed information via the provided links.

PANORAMA is a research project of the ENIAC Joint Undertaking (JU) and is co-funded by grants from Belgium, Italy, France, the Netherlands, and the United Kingdom.

Project Description: 
1 Distance-aware weighting strategies for 3-D reconstruction applications

Targeting low-cost depth sensors and the corresponding 3-D reconstruction applications, we have proposed intelligent distance-aware weighting strategies for the Truncated Signed Distance Function (TSDF) voxel-based model to enhance 3-D reconstruction quality. The increased model quality in turn improves the pose-estimation algorithm by providing more accurate data. In conventional TSDF, every newly sensed depth value is directly integrated into the 3-D model, so that, when using low-cost depth sensors, less accurate depth data can overwrite more accurate data. For distance-aware weighting, we consider weight definition and model updating to be the essential aspects. These aspects are combined into our newly proposed weighting strategies, the Distance-Aware (DA) and Distance-Aware Slow-Saturation (DASS) methods, which intelligently integrate the depth data into the synthetic 3-D model according to the distance-sensitivity metric of the sensor. Both the DA and DASS methods prevent already-fused data from being overwritten by less accurate data. We have shown visually that in several cases in which the original KinFu destroys or deforms the final synthetic 3-D model, the DA and DASS methods are sufficiently robust to preserve the reconstruction (Figure 2). We have found a clear mutual dependence between pose-estimation accuracy and the quality of the 3-D model. This can be exploited to enhance either the pose estimation or the 3-D model quality, depending on the application or interest. Read more about this topic on 3-D model quality enhancement, pose-estimation accuracy improvement and an extension for outdoor data (stereo cameras).
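To make the idea concrete, the following sketch shows one distance-aware TSDF voxel update. The inverse-quadratic weight function and the weight cap are illustrative assumptions (the published DA/DASS weight definitions may differ); only the running-average structure is the standard TSDF formulation.

```python
import numpy as np

def tsdf_update(tsdf, weight, d_new, z, w_max=100.0):
    """One distance-aware TSDF voxel update (illustrative sketch).

    tsdf, weight : current voxel value and accumulated weight
    d_new        : newly observed truncated signed distance
    z            : depth (metres) at which d_new was sensed
    """
    # Assumed distance-aware weight: depth noise of low-cost sensors
    # grows roughly quadratically with distance, so closer samples
    # receive a higher weight and cannot be overwritten by far,
    # less accurate ones.
    w_new = 1.0 / (z * z)

    # Weighted running average, as in conventional TSDF fusion.
    tsdf = (weight * tsdf + w_new * d_new) / (weight + w_new)

    # Slow-saturation idea: cap the accumulated weight so the model
    # can still adapt instead of freezing as weights grow unbounded.
    weight = min(weight + w_new, w_max)
    return tsdf, weight
```

Because `w_new` shrinks with distance, a far-away observation barely shifts a voxel that has already been fused from close range, which is the overwrite-prevention behavior described above.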

Figure 2. From left to right, the snapshots of the Desk region of the final 3D meshes obtained by the original KinFu, DA, and DASS methods, respectively.

2 Real-time 3-D edge detection and planar segmentation of depth images

We have introduced a real-time planar segmentation algorithm that detects planes in depth images while avoiding any normal-estimation calculation (Figure 3). First, the proposed algorithm searches for 3-D edges in a depth image and then finds the line segments located between these 3-D edges. Second, the algorithm merges all the points on each pair of intersecting line segments into a plane candidate. The developed 3-D edge detection algorithm considers four different types of edges: jump, corner, curved and extremum edges. For each of these edge types, we have defined the corresponding thresholds and a number-of-neighbors parameter. For the planar segmentation algorithm, we designed three quality-enhancing properties, i.e. curvature validation, size validation and merging of separate segments. The planar segmentation pipeline is capable of segmenting planes in a depth image at a rate of 58 fps. Utilizing pipeline-interleaving techniques for the proposed implementation further increases the rate up to 100 fps. Read more about this topic on fast and real-time planar segmentation of depth images.
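As a minimal sketch of the first stage, the snippet below marks only one of the four edge types, the jump edge: a pixel whose depth differs from a direct neighbor by more than a threshold. The threshold value and the 4-neighborhood are assumptions for illustration, not the algorithm's exact parameters.

```python
import numpy as np

def jump_edges(depth, thresh=0.05):
    """Mark 'jump' edge pixels in a depth image (metres).

    A pixel is a jump edge when its depth differs from a horizontal
    or vertical neighbour by more than `thresh`. Illustrative sketch
    of one of the four edge types (jump, corner, curved, extremum).
    """
    edges = np.zeros(depth.shape, dtype=bool)
    # Horizontal neighbour differences: both sides of a jump are edges.
    dh = np.abs(np.diff(depth, axis=1)) > thresh
    edges[:, :-1] |= dh
    edges[:, 1:] |= dh
    # Vertical neighbour differences.
    dv = np.abs(np.diff(depth, axis=0)) > thresh
    edges[:-1, :] |= dv
    edges[1:, :] |= dv
    return edges
```

In the full algorithm these edge maps bound the line segments that are later merged into plane candidates.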

Figure 3. Five examples of datasets containing various types of 3D edges and the corresponding outcomes: (a) color images of each scene, (b) extracted 3D edges, and (c) planar surfaces (Note that the merging and size-validation phases have not been incorporated in the visual results).

2.1 Noise reduction of 3-D edges in depth images

In order to de-noise 3-D edge images, we have evaluated conventional filters, but each individual filter can only combat part of the noise, leaving several remaining challenges: (a) manipulation of the source data resulting in false edges, (b) weakening and destroying narrow edges and (c) amplifying noisy pixels. To solve these challenges, we have proposed the solidarity filter for de-noising 3-D edge images. The promising results show that the solidarity filter can de-noise 3-D edge images directly without any data manipulation (Figure 4). The filter is based on ranking principles, such as finding neighboring pixels with similar properties and connecting them into larger segments beyond the size of a conventional filter aperture. Read more on this topic at Solidarity filter for noise reduction of 3-D edges in depth images.
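The grouping idea can be sketched as follows: connect edge pixels into segments that may grow beyond a fixed filter aperture, then keep only segments above a minimum size, so isolated noisy responses disappear while narrow-but-long edges survive. The 8-connectivity and `min_size` value are assumptions; the actual solidarity filter's ranking criteria are described in the linked publication.

```python
import numpy as np
from collections import deque

def solidarity_denoise(edge_img, min_size=10):
    """Keep edge pixels that belong to a connected group of at least
    `min_size` pixels; drop isolated noisy responses (sketch)."""
    h, w = edge_img.shape
    visited = np.zeros((h, w), dtype=bool)
    out = np.zeros((h, w), dtype=bool)
    for sy in range(h):
        for sx in range(w):
            if not edge_img[sy, sx] or visited[sy, sx]:
                continue
            # BFS over 8-connected edge pixels, forming one segment.
            comp, q = [], deque([(sy, sx)])
            visited[sy, sx] = True
            while q:
                y, x = q.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and edge_img[ny, nx]
                                and not visited[ny, nx]):
                            visited[ny, nx] = True
                            q.append((ny, nx))
            # Segments below the size threshold are treated as noise.
            if len(comp) >= min_size:
                for y, x in comp:
                    out[y, x] = True
    return out
```

Note that the edge image itself is filtered, not the depth data, which mirrors the "no data manipulation" property claimed above.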

Figure 4. Applying the planar-segmentation algorithm to depth images with and without noise reduction of 3-D edges based on the proposed Solidarity filter.

3 Multi-modal fusion: laser-scanner-based 3-D models enriched with depth-sensor 3-D data

A wide range of sensors with very different intrinsic properties is available for large-scale 3-D reconstruction, each coming with its own advantages, limitations and costs. Accurate laser scanners provide high-quality 3-D models via intensive computing (Figure 5), whereas low-cost depth sensors generate relatively inaccurate 3-D data in real time. Laser scanners accurately cover long ranges, but suffer from gaps and holes caused by occlusion. Low-cost depth sensors can produce a continuous 3-D model at smaller scales, filling such holes; however, they cannot overcome the problems caused by drift, e.g. erroneous pose estimation and 3-D model deformation. Adopting the best features of both sensor types, we propose a method to combine the 3-D models obtained via these intrinsically different sensors.

Figure 5. Samples taken by the FARO Focus 3D laser scanner: (a) the 3-D model of the VCA office and laboratory at Eindhoven University of Technology (The Netherlands) consisting of 16 registered scans and (b) the 3-D model of a supermarket in Nice (France) including 64 registered scans.

The proposed system generates integrated 3-D models that accurately cover large-scale environments with a high level of detail and without gaps and holes (Figure 6). The model construction takes much less time than using either of the sensors individually. For our system, we first introduce FlexiFusion (Figure 7), a 3-D reconstruction application based on RGB-D sensors, to generate adjustable 3-D models (Figure 8). Second, we propose a registration pipeline (Figure 9 and Figure 10) that adjusts the resulting 3-D models to align them with the 3-D models obtained via a laser scanner. Read more on this topic: the links will be uploaded soon.

Figure 6. Registered 3-D chain of FlexiFusion aligned to the FARO-based 3-D model, with a focus on the quality of alignment. The colored point clouds are not shown, to enable a better inspection of the alignment quality, since coloring a point cloud can hide part of the misalignments.

3.1 FlexiFusion: large-scale 3-D reconstruction application

The FlexiFusion method addresses the problem of 3-D reconstruction of large-scale environments from RGB-D and point-cloud data, with a focus on the above-mentioned adjustability. The resulting 3-D models are not uniformly rigid, and various parts of the models can be adjusted to be well aligned with the reference FARO-based 3-D model. To achieve this, FlexiFusion stores the sensed environment as a 3-D chain, which is a linked list of rigid 3-D models called 3-D boxes. A 3-D box is a standalone 3-D model of arbitrary size, which has a user-defined amount of overlap with its neighboring 3-D boxes. By performing this boxing method, we prevent the accumulated drift error from being incorporated into the 3-D model and, consequently, preserve the global model from deformation. Each 3-D box in the 3-D chain can be independently transformed to fit into a reference FARO-based 3-D model.
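The 3-D chain idea can be sketched as a small data structure: each box is a rigid model with its own pose, and fitting one box to the FARO reference changes only that box. This is a simplified illustration; the real FlexiFusion structures live on the GPU and store volumetric data rather than raw point lists, and the class and method names here are invented for the sketch.

```python
import numpy as np

class Box3D:
    """A standalone rigid 3-D model: points plus an independent pose."""
    def __init__(self, points):
        self.points = np.asarray(points, dtype=float)  # N x 3
        self.pose = np.eye(4)                          # per-box transform

    def transformed(self):
        R, t = self.pose[:3, :3], self.pose[:3, 3]
        return self.points @ R.T + t

class Chain3D:
    """Linked sequence of overlapping 3-D boxes (the '3-D chain')."""
    def __init__(self):
        self.boxes = []

    def append(self, points):
        self.boxes.append(Box3D(points))

    def align_box(self, i, pose):
        # Fit one box independently to the reference (FARO) model;
        # drift is confined to that box instead of the whole model.
        self.boxes[i].pose = np.asarray(pose, dtype=float)

    def merged(self):
        # Global model: all boxes in their individually adjusted poses.
        return np.vstack([b.transformed() for b in self.boxes])
```

Because poses are per box, accumulated drift never bends the whole chain: correcting box *i* against the reference model leaves the other boxes untouched.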

Figure 7. FlexiFusion architecture illustrating the CPU/GPU threads and data structures with the internal/external communications: the system receives raw 3-D data as input and generates 3-D chains as output. The focus of FlexiFusion is to deliver the highest possible level of adjustability, which is achieved via the proposed 3-D chain model.

Figure 8. Snapshots of the FlexiFusion registered 3-D chain as a unified 3-D model: (partly) colored at the left and adjusted 3-D boxes at the right.

3.2 R3P: Real-time RGB-D Registration Pipeline

We have proposed a real-time RGB-D registration pipeline, called R3P, as a generic processing architecture applicable to any form of 3-D data presented in RGB-D format. The main focus of R3P is on real-time robotic and 3-D reconstruction applications. One of the main attractive features of the proposed architecture is the reduction of the set of corresponding key points between a pair of RGB-D images to a minimal set of the strongest correspondences. The minimal number of correspondences enables real-time performance, while their strength ensures pose-estimation accuracy. We plan to enrich the proposed pipeline by adding plane-based registration, based on our above-mentioned real-time planar-segmentation algorithm. This can improve the quality of the results when features are lacking and insufficient for obtaining a high-quality transformation. Read more on this topic: the links will be uploaded soon.
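The two core steps, keeping only the strongest correspondences and estimating a rigid transform from the resulting key-point clouds, can be sketched as follows. The `(i, j, distance)` match format and the value of `k` are assumptions; the transform estimator shown is the standard Kabsch least-squares solution, which R3P's step (5) may implement differently.

```python
import numpy as np

def strongest_matches(matches, k=8):
    """Keep the k correspondences with the smallest descriptor
    distance. Match format (i, j, dist) is assumed for illustration;
    at least 3 non-collinear 3-D correspondences are needed for a
    rigid transform, so k stays small but above that minimum."""
    return sorted(matches, key=lambda m: m[2])[:k]

def rigid_transform(src, dst):
    """Least-squares rigid transform (Kabsch) with dst ~ R @ src + t,
    estimated from two small key-point clouds (k x 3 arrays)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # reflection correction
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t
```

Running the estimator on a handful of strong matches instead of all detected key points is what keeps the 3-D phase of such a pipeline within a real-time budget.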

Figure 9. R3P architecture: the 2-D and 3-D algorithms are separated into two main groups of layers.

Figure 10. R3P pipeline: (1) feeding color images to the 2-D phase layers, (2) obtaining 2-D key points, (3) adding depth information to the 2-D key points, (4) feeding the 2-D key points with corresponding depth information to the 3-D phase layers, (5) estimating the transformation based on the key-point clouds, (6) key-point clouds represented as bold dots, and (7) the complete point clouds well aligned based on the transformation obtained for the key-point clouds.

Application Area: Video/Imaging
Discipline: 3-D processing