Diffusion Reflectance Map:
Single-Image Stochastic Inverse Rendering of Illumination and Reflectance

Yuto Enyo and Ko Nishino
Kyoto University

DRMNet header image
Reflectance bounds the frequency spectrum of illumination in the object appearance. In this paper, we introduce the first stochastic inverse rendering method, which recovers the attenuated frequency spectrum of an illumination jointly with the reflectance of an object of known geometry from a single image. Our key idea is to solve this blind inverse problem in the reflectance map, an appearance representation invariant to the underlying geometry, by learning to reverse the image formation with a novel diffusion model which we refer to as the Diffusion Reflectance Map Network (DRMNet). Given an observed reflectance map converted and completed from the single input image, DRMNet generates a reflectance map corresponding to a perfect mirror sphere while jointly estimating the reflectance. The forward process can be understood as gradually filtering a natural illumination with lower and lower frequency reflectance and additive Gaussian noise. DRMNet learns to invert this process with two subnetworks, IllNet and RefNet, which work in concert towards this joint estimation. The network is trained on an extensive synthetic dataset and is demonstrated to generalize to real images, showing state-of-the-art accuracy on established datasets.
  • Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance
    Y. Enyo and K. Nishino,
    in Proc. of Conference on Computer Vision and Pattern Recognition (CVPR’24), Jun. 2024. [highlight]
    [ arXiv ][ paper ][ supp. PDF ][ project ][ code/data ]
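
The forward process described in the abstract admits a compact illustration. The sketch below is a conceptual toy, not the authors' implementation: it treats the perfect-mirror reflectance map as an equirectangular image and mimics reflection off progressively lower-frequency reflectance with a cumulative Gaussian blur plus additive Gaussian observation noise. The blur as a stand-in for BRDF convolution, and all parameter values, are assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def forward_filtering(mirror_rmap, num_steps=10, blur_sigma=1.5,
                          noise_sigma=0.01, seed=0):
        """Toy forward process: repeatedly low-pass filter a perfect-mirror
        reflectance map (a Gaussian blur standing in for convolution with a
        progressively lower-frequency BRDF) and add Gaussian observation
        noise. Spherical distortion of the equirectangular grid is ignored.
        """
        rng = np.random.default_rng(seed)
        x = mirror_rmap.astype(np.float64)  # H x W x 3 environment/mirror map
        states = [x]
        for _ in range(num_steps):
            x = gaussian_filter(x, sigma=(blur_sigma, blur_sigma, 0))  # attenuate high frequencies
            x = x + rng.normal(0.0, noise_sigma, size=x.shape)         # observation noise
            states.append(x)
        return states  # states[-1] plays the role of the observed reflectance map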

Overview

DRMNet overview
How should we recover the true illumination and reflectance from object appearance? The frequency spectrum of the object appearance is bounded by the lower of the highest frequencies of the reflectance and the illumination, and the former is usually lower. This suggests that there is no such thing as a true estimate in inverse rendering. Given the object appearance, the illumination must be generated together with the reflectance, as only partial information about it, filtered by the reflectance, is encoded in the appearance. Inverse rendering, particularly from a single image, is thus a generative process. We argue for stochastic inverse rendering, in which image formation is explicitly modeled as a stochastic process and its inversion becomes a stochastic generative reverse process. Our problem is inherently blind, as the reflectance acting as the forward operator is also unknown. Our key idea is to learn to generate the illumination from the object appearance with a diffusion model on the reflectance map. Formulating the task on the geometry-invariant reflectance map enables radiometric disentanglement in a single domain, eliminating the need for complex differentiable rendering. We introduce a novel diffusion model, the Diffusion Reflectance Map Network (DRMNet), that generates a reflectance map corresponding to a perfect mirror sphere from the observed reflectance map while jointly estimating the reflectance. DRMNet consists of two subnetworks, IllNet and RefNet, which recover the illumination as a mirror reflectance map and the reflectance, respectively. DRMNet seamlessly integrates stochasticity into the inverse rendering process via a reverse diffusion process on the additive Gaussian observation noise of radiometric image formation. This enables estimation of illumination that is faithful to the observation while exhibiting stochastic variability, without separate sampling.
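
To make the joint reverse process concrete, here is a schematic of the alternating IllNet/RefNet loop. The callables ill_net and ref_net and their signatures are placeholders, not the released code; only the structure (step the state back toward a mirror reflectance map, then refine the reflectance estimate) follows the description above.

    def drmnet_reverse(observed_rmap, ill_net, ref_net, num_steps=50):
        """Schematic alternation of DRMNet's two subnetworks (signatures
        assumed): IllNet steps the reflectance-map state toward a
        perfect-mirror map, while RefNet jointly refines the reflectance
        estimate at every step.
        """
        x = observed_rmap                            # current reflectance-map state
        refl = ref_net(x, observed_rmap, num_steps)  # initial reflectance guess
        for t in range(num_steps, 0, -1):
            # Reverse step: predict a sharper (higher-frequency) reflectance
            # map, conditioned on the observation and the current reflectance.
            x = ill_net(x, observed_rmap, refl, t)
            # Jointly update the reflectance that explains the remaining gap.
            refl = ref_net(x, observed_rmap, t)
        return x, refl  # recovered illumination (mirror map) and reflectance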

Results

Qualitative comparison on the iBRDF synthetic dataset
Qualitative results on the iBRDF synthetic dataset. For each input, the top row shows the illumination estimate as a spherical panorama and the bottom row shows the reflectance estimate rendered as a sphere under a point source. DRMNet achieves higher accuracy and recovers more natural illumination. The concurrent work by Lyu et al. [49] produces illumination estimates that deviate significantly from the ground truth, as it uses a naive noise-seeded diffusion model as an external prior in a classic Bayesian inverse-rendering formulation.
Illumination estimates on the nLMVS-Real dataset
Illumination estimates on the nLMVS-Real dataset for different objects captured in complex environments. DRMNet successfully recovers accurate, plausible, and detailed illumination from the frequency-attenuated object appearance.
Object replacement results on the nLMVS-Real dataset
Object replacement results on the nLMVS-Real dataset. Since our method explicitly recovers the high-frequency spectrum of the illumination, even objects with higher-frequency reflectance than the one used to recover the illumination can be relit with a natural appearance.
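
Relighting with the recovered illumination reduces, in the mirror case, to a lookup in the reflectance map along each per-pixel reflection direction. A minimal sketch, assuming an equirectangular, y-up environment map and an orthographic +z view; a rougher BRDF would instead prefilter or integrate the map around each reflection direction:

    import numpy as np

    def relight_mirror_sphere(envmap, size=256):
        """Shade an orthographically viewed mirror sphere by sampling the
        recovered illumination (assumed equirectangular, y-up) along
        per-pixel reflection vectors.
        """
        H, W, _ = envmap.shape
        view = np.array([0.0, 0.0, 1.0])                 # direction toward the eye
        ys, xs = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
        mask = xs**2 + ys**2 <= 1.0                      # pixels on the sphere
        nz = np.sqrt(np.clip(1.0 - xs**2 - ys**2, 0.0, 1.0))
        n = np.stack([xs, -ys, nz], axis=-1)             # unit surface normals
        r = 2.0 * np.sum(n * view, -1, keepdims=True) * n - view  # reflect view dir
        theta = np.arccos(np.clip(r[..., 1], -1.0, 1.0))  # polar angle from +y
        phi = np.arctan2(r[..., 0], r[..., 2])            # azimuth around +y
        u = ((phi / (2.0 * np.pi) + 0.5) * (W - 1)).astype(int)
        v = ((theta / np.pi) * (H - 1)).astype(int)
        img = envmap[v, u]
        img[~mask] = 0.0
        return img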
Stochastic illumination and reflectance estimates for objects of different surface roughness
Illumination and reflectance estimated multiple times for the same input image, for a set of inputs captured under different illumination and with objects of different surface roughness. For the same observed reflectance map, a variety of illumination environments is estimated; their variance is large for dull reflectance closer to Lambertian and decreases for more specular reflectance, centering around the ground truth. The rougher the surface, the wider the band of high illumination frequencies that is attenuated, which is accurately reflected in these results. Note how well the recovered reflectance maps preserve the overall structure of the illumination up to the necessary frequencies: the method respects the observation as much as it needs to. This is in sharp contrast to other methods that completely hallucinate an environment from noise.
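
This variability experiment can be reproduced in principle with any stochastic reverse sampler: draw several samples for the same observation and measure their spread. reverse_fn below is a hypothetical stand-in for DRMNet's sampler, not its actual interface:

    import numpy as np

    def estimate_spread(observed_rmap, reverse_fn, num_samples=8, seed=0):
        """Run a stochastic reverse sampler several times on the same
        observation and measure the per-pixel spread of the recovered
        illumination. The spread should grow with surface roughness (duller
        reflectance) and shrink for sharper, more specular observations.
        """
        rng = np.random.default_rng(seed)
        samples = np.stack([reverse_fn(observed_rmap, rng)
                            for _ in range(num_samples)])
        return samples.mean(axis=0), samples.std(axis=0)  # mean and variability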