Reconstructing 3D Shapes as a Union of Boxes from Multi-View Images
Reconstructing object shapes from images has become increasingly important in fields such as computer vision, robotics, and augmented reality. Although approaches for reconstructing shapes at varying levels of detail have been proposed, balancing representation accuracy against model complexity remains a challenge. To address this challenge, we propose an end-to-end approach that reconstructs object shapes from multiple images as a union of box primitives. Our approach yields a simpler and more efficient 3D representation of objects and requires no intermediate products such as voxels, which results in faster inference. Additionally, we introduce an auxiliary task that helps the network learn to extract and transform spatial features from images without requiring camera calibration. Extensive experiments demonstrate that our method, using only 2D images as input, produces results comparable to those of approaches that require 3D voxelized input. Furthermore, our method is significantly faster at inference than those approaches.
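To make the representation concrete, the following minimal sketch illustrates what a union of box primitives means geometrically: a point belongs to the shape if it lies inside at least one box. It assumes axis-aligned boxes parameterized by a center and half-extents; the names `Box`, `contains`, and `in_union` are hypothetical, and the parameterization used by the method itself (for example, whether boxes carry a rotation) is not specified here.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned box primitive (assumed parameterization, for illustration)."""
    center: Point
    half_extents: Point

    def contains(self, p: Point) -> bool:
        # A point is inside if, on every axis, its distance from the
        # center does not exceed the half-extent along that axis.
        return all(
            abs(pi - ci) <= hi
            for pi, ci, hi in zip(p, self.center, self.half_extents)
        )


def in_union(p: Point, boxes: Sequence[Box]) -> bool:
    """A point belongs to the shape if any single box contains it."""
    return any(box.contains(p) for box in boxes)


# Two boxes that together form an L-shaped object.
l_shape = [
    Box(center=(0.0, 0.0, 0.0), half_extents=(1.0, 0.25, 0.25)),
    Box(center=(0.0, 1.0, 0.0), half_extents=(0.25, 1.0, 0.25)),
]

print(in_union((0.9, 0.0, 0.0), l_shape))  # inside the first box
print(in_union((0.0, 1.5, 0.0), l_shape))  # inside the second box
print(in_union((0.9, 1.5, 0.0), l_shape))  # outside both boxes
```

Under this assumed parameterization each box is described by six numbers, so an object is described by a short list of parameters rather than a dense voxel grid, which is the compactness the abstract refers to.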