EvaSurf: Efficient View-Aware Implicit Textured Surface Reconstruction
on Mobile Devices

Jingnan Gao1     Zhuo Chen1     Yichao Yan1     Bowen Pan2     Zhe Wang2     Jiangjing Lyu2     Xiaokang Yang1
1Shanghai Jiao Tong University        2Alibaba Group

 

 


EvaSurf reconstructs high-quality appearance and accurate meshes for both synthetic and real-world objects, with real-time rendering on a wide range of devices.


Abstract

Reconstructing real-world 3D objects has numerous applications in computer vision, such as virtual reality, video games, and animations. Ideally, 3D reconstruction methods should generate high-fidelity results with 3D consistency in real time. Traditional methods match pixels between images using photo-consistency constraints or learned features, while differentiable rendering methods such as Neural Radiance Fields (NeRF) use differentiable volume rendering or surface-based representations to generate high-fidelity scenes. However, these methods require excessive runtime for rendering, making them impractical for daily applications. To address these challenges, we present EvaSurf, an Efficient View-Aware implicit textured Surface reconstruction method on mobile devices. Our method first employs an efficient surface-based model with a multi-view supervision module to ensure accurate mesh reconstruction. To enable high-fidelity rendering, we learn an implicit texture embedded with view-aware encoding to capture view-dependent information. Furthermore, with the explicit geometry and the implicit texture, a lightweight neural shader suffices for rendering, reducing computational cost and supporting real-time rendering on common mobile devices. Extensive experiments demonstrate that our method reconstructs high-quality appearance and accurate meshes on both synthetic and real-world datasets. Moreover, our method can be trained in just 1-2 hours on a single GPU and runs on mobile devices at over 40 FPS (frames per second), with the final rendering package taking up only 40-50 MB.


Video


Overview

We design a two-stage framework that reconstructs geometry and appearance separately. (1) In the first stage, we employ an efficient surface-based model with a multi-view supervision module to learn an accurate geometric structure. (2) In the second stage, appearance reconstruction, view-dependent effects strongly influence the quality of the rendered images. We therefore incorporate view-aware encoding into an implicit texture, assigning different weights to the dimensions of the view-aware texture so that appearance is modeled together with its view-dependent effects. (3) With the explicit geometry and the learned implicit texture, we show that a lightweight neural shader is sufficient for high-quality differentiable rendering; a minimal sketch of these two components is given below. The small size of the neural shader greatly reduces computation and memory consumption, enabling real-time rendering on mobile devices.
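
As a rough illustration of components (2) and (3), the PyTorch sketch below pairs a learnable UV texture, whose channels are re-weighted by a small view-direction encoder, with a tiny MLP shader. All layer sizes, feature dimensions, and names (ViewAwareTexture, NeuralShader) are illustrative assumptions, not the released architecture.

# Minimal sketch: a learnable implicit texture whose channels are re-weighted
# by a view-aware encoding, followed by a lightweight MLP shader.
# Sizes and names are illustrative assumptions, not the released architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewAwareTexture(nn.Module):
    def __init__(self, resolution=1024, feat_dim=8):
        super().__init__()
        # Learnable 2D feature texture, sampled with the mesh's UV coordinates.
        self.texture = nn.Parameter(torch.randn(1, feat_dim, resolution, resolution) * 0.01)
        # Small encoder mapping a unit view direction to per-channel weights.
        self.view_encoder = nn.Sequential(
            nn.Linear(3, 32), nn.ReLU(inplace=True), nn.Linear(32, feat_dim)
        )

    def forward(self, uv, view_dir):
        # uv: (N, 2) in [0, 1]; view_dir: (N, 3) unit vectors.
        grid = uv.view(1, -1, 1, 2) * 2.0 - 1.0            # grid_sample expects [-1, 1]
        feats = F.grid_sample(self.texture, grid, align_corners=True)
        feats = feats.view(self.texture.shape[1], -1).t()  # (N, feat_dim)
        weights = torch.sigmoid(self.view_encoder(view_dir))
        return feats * weights                             # view-aware re-weighting

class NeuralShader(nn.Module):
    # Lightweight MLP mapping view-weighted texture features to RGB.
    def __init__(self, feat_dim=8, hidden=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, feats):
        return self.mlp(feats)

Because such a shader is only a few small linear layers, it can be evaluated per pixel on a mobile GPU (e.g., baked into a fragment shader), which is consistent with the small package size and real-time frame rates reported above.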



Dataset

To demonstrate the effectiveness of our method, we also conduct experiments on a set of real-world high-resolution (2K) data. The dataset comprises 15 objects, each with over 200 images. Every image comes with an accurate camera pose; specifically, we provide three types of camera poses per object, following the conventions of COLMAP (sparse), NeRF (transforms.json), and NeuS (cameras_sphere.npz). The corresponding object masks are also included.
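
For reference, here is a minimal sketch for reading the NeRF-style transforms.json poses. The field names (camera_angle_x, frames, transform_matrix, file_path) follow the common NeRF convention and may need adjusting if the released files differ.

# Minimal sketch: load NeRF-convention poses from transforms.json.
# Field names are assumed to follow the common NeRF convention.
import json
import numpy as np

def load_nerf_poses(path="transforms.json"):
    with open(path, "r") as f:
        meta = json.load(f)
    fov_x = meta["camera_angle_x"]                 # horizontal field of view (radians)
    poses, names = [], []
    for frame in meta["frames"]:
        poses.append(np.array(frame["transform_matrix"], dtype=np.float32))  # 4x4 cam-to-world
        names.append(frame["file_path"])
    return fov_x, np.stack(poses), names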

The reconstructed results are shown below. The dataset will be released soon.


Results

Mesh Comparisons

[Interactive comparisons: Re-ReND vs. Ours; NeRF2Mesh vs. Ours]


[Interactive comparisons: NeRF2Mesh (better rendering setting) vs. Ours; NeRF2Mesh (better mesh setting) vs. Ours]

View-Aware Encoding Ablation

[Interactive comparisons: without (w/o) vs. with (w/) view-aware encoding]

Real-World Reconstruction Results

BibTeX

@article{gao2023evasurf,
      title={EvaSurf: Efficient View-Aware Implicit Textured Surface Reconstruction on Mobile Devices}, 
      author={Jingnan Gao and Zhuo Chen and Yichao Yan and Bowen Pan and Zhe Wang and Jiangjing Lyu and Xiaokang Yang},
      journal={arXiv preprint arXiv:2311.09806},
      year={2023}
}