Invertible Neural BRDF for Object Inverse Rendering
Zhe Chen, Shohei Nobuhara, and Ko Nishino
We introduce a novel neural network-based BRDF model and a Bayesian framework for object inverse rendering, i.e., joint estimation of reflectance and natural illumination from a single image of an object of known geometry. The BRDF is expressed with an invertible neural network, namely, a normalizing flow, which provides the expressive power of a high-dimensional representation, the computational simplicity of a compact analytical model, and the physical plausibility of a real-world BRDF. We extract the latent space of real-world reflectance by conditioning this model, which directly results in a strong reflectance prior. We refer to this model as the invertible neural BRDF model (iBRDF). We also devise a deep illumination prior by leveraging the structural bias of deep neural networks. By integrating this novel BRDF model and the reflectance and illumination priors in a MAP estimation formulation, we show that this joint estimation can be computed efficiently with stochastic gradient descent. We experimentally validate the accuracy of the invertible neural BRDF model on a large number of measured data and demonstrate its use in object inverse rendering on a number of synthetic and real images. The results show new ways in which deep neural networks can help solve challenging radiometric inverse problems.
Invertible Neural BRDF for Object Inverse Rendering
Z. Chen, S. Nobuhara, and K. Nishino,
in Proc. of European Conference on Computer Vision ECCV’20, Aug., 2020. (Spotlight)
Overview
We introduce the invertible neural BRDF model (iBRDF) for joint estimation of reflectance and illumination from a single image of object appearance. We show that this combination of an invertible, differentiable model whose expressive power matches or exceeds that of a nonparametric representation, together with a MAP formulation with differentiable rendering, enables efficient, accurate real-world object inverse rendering. We exploit the inherent structure of the reflectance by modeling its bidirectional reflectance distribution function (BRDF) as an invertible neural network, namely, a nonlinearly transformed parametric distribution based on a normalizing flow. In sharp contrast to past methods that use low-dimensional parametric models, the deep generative neural network makes no assumptions on the underlying distribution and expresses the complex angular distributions of the BRDF with a series of non-linear transformations applied to a simple input distribution. We show that this provides us with comparable or superior expressiveness to nonparametric representations. Moreover, the invertibility of the model ensures Helmholtz reciprocity and energy conservation, which are essential for physical plausibility. In addition, although we do not pursue it in this paper, this invertibility makes iBRDF also suitable for forward rendering applications due to its bidirectional, differentiable bijective mapping. To model the intrinsic structure of the reflectance variation of real-world materials, we condition this generative model to extract a parametric embedding space. This embedding of BRDFs in a simple parametric distribution provides us with a strong prior for estimating the reflectance.
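The mechanism underlying a normalizing flow can be sketched with a single affine coupling layer in NumPy. This is only an illustrative toy, not the paper's architecture: `s()` and `t()` are hypothetical stand-ins for the small conditioner networks used in practice, and the two-dimensional input stands in for the BRDF's angular parameterization.

```python
import numpy as np

# One affine coupling layer, the building block of a normalizing flow:
# it is invertible by construction and its Jacobian log-determinant is
# cheap to evaluate. s() and t() are toy stand-ins for conditioner nets.
def s(x1):
    return 0.5 * np.tanh(x1)          # scale conditioner (illustrative)

def t(x1):
    return 0.3 * x1                   # shift conditioner (illustrative)

def forward(x):
    """Map a base sample x = (x1, x2) to y; return y and log|det J|."""
    x1, x2 = x
    y2 = x2 * np.exp(s(x1)) + t(x1)
    return np.array([x1, y2]), s(x1)  # log|dy2/dx2| = s(x1)

def inverse(y):
    """Exact inverse of forward(): recover x from y."""
    y1, y2 = y
    x2 = (y2 - t(y1)) * np.exp(-s(y1))
    return np.array([y1, x2])

def log_prob(y):
    """Density of y via the change-of-variables formula, with a
    standard 2D Gaussian as the base distribution."""
    x = inverse(y)
    _, logdet = forward(x)
    log_base = -0.5 * np.sum(x ** 2) - np.log(2.0 * np.pi)
    return log_base - logdet
```

Round-tripping any sample through `forward()` and `inverse()` recovers it exactly; it is this bijectivity, together with the tractable log-determinant, that makes the transformed distribution a proper normalized density and the model differentiable end to end.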
Deep Illumination Prior
For the illumination, we employ a nonparametric representation by modeling it as a collection of point sources in the angular space (i.e., an equirectangular environment map). Past methods heavily relied on simplistic assumptions that can be translated into analytical constraints to tame the high-dimensional complexity associated with this nonparametric illumination representation. Instead, we constrain the illumination to represent realistic natural environments by exploiting the structural bias induced by a deep neural network (i.e., a deep image prior). We devise this deep illumination prior by encoding the illumination as the output of an encoder-decoder deep neural network and by optimizing its parameters on a fixed random image input.
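The idea can be sketched in NumPy with a toy two-layer MLP in place of the encoder-decoder: the network weights are the only free variables, while the input `z` is random noise that stays fixed throughout optimization. All sizes, the learning rate, and the flattened "environment map" target are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an environment map, flattened to a vector.
target = rng.random(64)

# Fixed random input z: never optimized, as in a deep image prior.
z = rng.standard_normal(16)

# Two-layer MLP parameters: the only quantities being optimized.
W1 = 0.1 * rng.standard_normal((32, 16)); b1 = np.zeros(32)
W2 = 0.1 * rng.standard_normal((64, 32)); b2 = np.zeros(64)

lr = 0.05
for step in range(2000):
    h = np.tanh(W1 @ z + b1)           # hidden activations
    out = W2 @ h + b2                  # network "rendering" of the map
    err = out - target                 # dL/dout for L = 0.5*||out-target||^2
    # Manual backpropagation through the two layers.
    gW2, gb2 = np.outer(err, h), err
    dpre = (W2.T @ err) * (1.0 - h ** 2)
    gW1, gb1 = np.outer(dpre, z), dpre
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

loss = 0.5 * np.sum((W2 @ np.tanh(W1 @ z + b1) + b2 - target) ** 2)
```

The structural bias comes from what such a network fits early and easily during this optimization; in the full method the output is the environment map itself, so stopping the fit (or regularizing it) biases the estimate toward natural illumination statistics.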
To evaluate the accuracy of the invertible neural BRDF, we learn its parameters to express measured BRDF data in the MERL database and evaluate the representation accuracy using the root mean squared error (RMSE) in log space. As the figure shows, the invertible neural BRDF achieves higher accuracy than the nonparametric bivariate BRDF model. The conditional iBRDF, learned on 100%, 80%, and 60% of the training data, in all cases achieves high accuracy superior to other parametric models, namely the DSBRDF and Cook-Torrance models. Note that all these conditional iBRDFs were trained without the test BRDF data. This resilience to varying amounts of training data demonstrates the robustness of the invertible neural BRDF model and the generalization power encoded in the learnt embedding codes. The results show that the model learns a latent space that can be used as a reflectance prior without sacrificing its expressive power.
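The evaluation metric above can be sketched as follows. The small epsilon offset is an assumption added to keep the logarithm finite for near-zero reflectance samples; the arrays stand in for tabulated BRDF values over the angular domain.

```python
import numpy as np

def log_rmse(pred, gt, eps=1e-3):
    """Root mean squared error between two sets of BRDF values,
    computed in log space so that errors in the dark, diffuse regions
    are not drowned out by the bright specular peak.

    pred, gt : arrays of sampled BRDF values (e.g., tabulated MERL data)
    eps      : illustrative offset guarding against log(0)
    """
    diff = np.log(pred + eps) - np.log(gt + eps)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Log space matters here because measured BRDFs span many orders of magnitude between grazing-angle specular lobes and the diffuse body; a plain RMSE would be dominated almost entirely by the specular peak.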
We derive a Bayesian object inverse rendering framework by combining the deep illumination prior together with the invertible neural BRDF and a differentiable renderer to evaluate the likelihood. Due to the full differentiability of the BRDF and illumination models and priors, the estimation can be achieved through backpropagation with stochastic gradient descent. We synthesized a total of 100 images of spheres rendered with 20 different measured BRDFs sampled from the MERL database under 5 different environment maps. The figure shows some of the estimation results. Qualitatively, the recovered BRDF and illumination match the ground truth well, demonstrating the effectiveness of iBRDF and priors for object inverse rendering. As evident in the illumination estimates, our method is able to recover high-frequency details that are not attainable in past methods.
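The estimation loop can be sketched with a toy bilinear "renderer" in NumPy: the image is `T @ (L * b)`, where `T` plays the role of the fixed light transport for the known geometry, `L` the illumination, and `b` the reflectance parameters; the quadratic penalty on `b` stands in for a Gaussian prior. All names, sizes, and the renderer itself are illustrative assumptions, not the paper's differentiable renderer or iBRDF prior.

```python
import numpy as np

rng = np.random.default_rng(1)
m, P = 8, 20
T = rng.random((P, m))            # toy fixed "light transport" (known geometry)
L_true = rng.random(m)            # ground-truth illumination
b_true = rng.random(m)            # ground-truth reflectance parameters
I = T @ (L_true * b_true)         # observed image

# MAP estimate: minimize the negative log posterior
#   0.5 * ||T @ (L * b) - I||^2  +  lam * ||b||^2
# jointly over both unknowns by plain gradient descent.
L = np.ones(m)
b = np.ones(m)
lr, lam = 0.01, 1e-4
init_res = np.linalg.norm(T @ (L * b) - I)
for _ in range(5000):
    r = T @ (L * b) - I           # rendering residual (likelihood term)
    g = T.T @ r                   # shared backpropagated term
    gL = g * b                    # gradient w.r.t. illumination
    gb = g * L + 2.0 * lam * b    # gradient w.r.t. reflectance (+ prior)
    L -= lr * gL
    b -= lr * gb
final_res = np.linalg.norm(T @ (L * b) - I)
```

Note that only the product `L * b` is constrained by the image, which mirrors the global scale ambiguity between reflectance and illumination in the real problem; it is exactly this kind of ambiguity that the reflectance and illumination priors are meant to resolve.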
We apply our method to images of real objects taken under natural illumination, using the Objects Under Natural Illumination Database. The figure shows the results of jointly estimating the BRDF and illumination. Our reflectance estimates are more faithful to the object appearance than those by Lombardi and Nishino, and our illumination estimates recover more detail, which collectively shows that our method more robustly disentangles the two from the object appearance. Note that the color shifts in the BRDF estimates arise from the inherent color constancy ambiguity, and that the geometry dictates which portions of the environment are recoverable. The estimates are in HDR, and exposures are manually set to match, as there is an ambiguity in global scaling.
We also compare with the method of Georgoulis et al. (TPAMI 2017). The figure shows the estimated reflectance and illumination side-by-side with their results. We show spheres rendered with mirror reflection under the estimated illumination (mirror) and spheres rendered with the estimated BRDF under a different illumination (nat. illum.). Overall, judging from the sphere renderings of the estimated BRDF under a different illumination, our BRDF estimates qualitatively appear more accurate and faithful to the underlying reflectance of the input image as well as the ground truth (e.g., higher-frequency details in the illumination estimates). Our method is a physically-based reconstruction that decouples the reflectance and illumination of object appearance. In contrast, the method of Georgoulis et al. is a decomposition learned from tens of thousands of images, fundamentally bound by the combinations seen in the training data. These results show that our method generalizes well.