In recent years, novel view synthesis from a single image has seen significant progress thanks to the rapid advancements in 3D scene representation and image inpainting techniques. While the current approaches are able to synthesize geometrically consistent novel views, they often do not handle view-dependent effects properly. Specifically, the highlights in their synthesized images usually appear to be glued to the surfaces, making the novel views unrealistic. To address this major problem, we make a key observation that the process of synthesizing novel views requires changing the shading of the pixels based on the novel camera as well as moving them to appropriate locations. Therefore, we propose to split the view synthesis process into two independent tasks of pixel reshading and pixel relocation. During the reshading process, we take the single image as the input and adjust its shading based on the novel camera. This reshaded image is then used as the input to an existing view synthesis method to relocate the pixels and produce the final novel view image. We propose to use a neural network to perform reshading, and we generate a large set of synthetic input-reshaded pairs to train it. We demonstrate that our approach produces plausible novel view images with realistic moving highlights on a variety of real-world scenes.
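To make the two-stage decomposition concrete, below is a minimal sketch of the data flow in Python. The names reshader_net and relocate_pixels are hypothetical placeholders standing in for our reshading network and an off-the-shelf single image view synthesis method; this is an illustration of the pipeline, not the actual implementation.

# Minimal sketch of the two-stage pipeline described above. `reshader_net`
# and `relocate_pixels` are hypothetical placeholders, not actual interfaces.
import torch

def synthesize_novel_view(input_image: torch.Tensor,
                          novel_camera: torch.Tensor,
                          reshader_net: torch.nn.Module,
                          relocate_pixels) -> torch.Tensor:
    """input_image: (1, 3, H, W); novel_camera: relative pose encoding."""
    # Stage 1 (reshading): adjust the shading of every pixel for the novel
    # camera while keeping the input viewpoint (pixel positions) unchanged.
    reshaded = reshader_net(input_image, novel_camera)

    # Stage 2 (relocation): hand the reshaded image to an existing view
    # synthesis method, which moves the pixels to the novel viewpoint.
    novel_view = relocate_pixels(reshaded, novel_camera)
    return novel_view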
Here, we take a closer look at the image formation process to understand the relationship between the input and novel views. Light from a light source bounces off the point on the table marked in red and projects to a particular pixel in the input view. Similarly, we get a projection of this point in the novel view. Essentially, these two pixels in the input and novel views correspond to the same point in the scene. When we measure the displacement of these pixels from the left image edge, as shown by the blue arrow, we can see that they do not match. So, the first task is to move the pixels in the input image such that they align with the novel view. We call this process pixel relocation, and it is the fundamental challenge that all existing approaches tackle. However, position is not the only difference between the two pixels. As we zoom into the pixels, we can see that they also have different shading. The shading of the input pixel is based on the outgoing direction towards the input camera. Similarly, for the other pixel, it is based on the outgoing direction towards the novel camera. Therefore, during novel view synthesis, we also have to change the shading of the pixel. We call this pixel reshading.
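The following toy numpy example illustrates both differences for a single scene point: the same point projects to different pixel coordinates in the two cameras (relocation), and a simple Blinn-Phong specular term evaluated with the two outgoing directions gives different highlight intensities (reshading). All cameras, lights, and material values are made up purely for illustration.

# Toy illustration of pixel relocation vs. pixel reshading for one 3D point.
import numpy as np

def project(point, cam_pos, K):
    """Project a world-space point into a camera at cam_pos (identity rotation)."""
    p_cam = point - cam_pos                 # camera-space position
    uv = K @ (p_cam / p_cam[2])             # perspective divide + intrinsics
    return uv[:2]

def specular(point, normal, light_pos, cam_pos, shininess=64.0):
    """Blinn-Phong highlight: depends on the outgoing (view) direction."""
    l = light_pos - point; l /= np.linalg.norm(l)
    v = cam_pos - point;   v /= np.linalg.norm(v)
    h = (l + v) / np.linalg.norm(l + v)
    return max(normal @ h, 0.0) ** shininess

K = np.array([[500., 0., 256.], [0., 500., 256.], [0., 0., 1.]])
point  = np.array([0.0, -0.5, 3.0])         # the marked point on the table
normal = np.array([0.0, 1.0, 0.0])
light  = np.array([1.0, 2.0, 2.0])
cam_in, cam_nv = np.array([0., 0., 0.]), np.array([0.3, 0., 0.])

# Pixel relocation: the same point lands at different pixel coordinates.
print(project(point, cam_in, K), project(point, cam_nv, K))
# Pixel reshading: the same point has a different highlight intensity.
print(specular(point, normal, light, cam_in), specular(point, normal, light, cam_nv))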
We generate this dataset synthetically using a path tracer. Consider a scene with an input camera, shown in blue. We render the input image as usual with the path tracer. For the reshaded image, we place a random novel camera around the input camera. During rendering, we compute the first surface intersection points using the primary rays from the input camera. We then shade those points using the outgoing directions towards the novel camera. As a result, the reshaded image has the same viewpoint as the input image, but its shading corresponds to the novel camera's viewpoint.
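Below is a conceptual sketch of rendering one input/reshaded training pair under these assumptions. The renderer internals are injected as callables (primary_ray, trace_first_hit, shade) and are hypothetical placeholders; the actual dataset is produced with a full path tracer.

# Conceptual sketch of rendering one input/reshaded pair. `trace_first_hit`,
# `shade`, and the camera methods are hypothetical placeholders.
import numpy as np

def render_pair(scene, input_cam, novel_cam, trace_first_hit, shade,
                width, height):
    input_img = np.zeros((height, width, 3))
    reshaded_img = np.zeros((height, width, 3))
    for y in range(height):
        for x in range(width):
            # Primary rays always come from the *input* camera, so both
            # images cover exactly the same surface points (same viewpoint).
            ray = input_cam.primary_ray(x, y)
            hit = trace_first_hit(scene, ray)   # first surface intersection
            if hit is None:
                continue
            # Input image: shade towards the input camera (ordinary rendering).
            input_img[y, x] = shade(scene, hit,
                                    view_dir=input_cam.origin - hit.position)
            # Reshaded image: same point, but shaded towards the novel camera,
            # so highlights move while pixel positions stay fixed.
            reshaded_img[y, x] = shade(scene, hit,
                                       view_dir=novel_cam.origin - hit.position)
    return input_img, reshaded_img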
We compare against 3D Moments, a modular single-image view synthesis approach. 3D Moments warps the highlights along with the texture, so they remain glued to the surfaces.
@article{Paliwal2023reshader,
author = {Paliwal, Avinash and Nguyen, Brandon G. and Tsarov, Andrii and Kalantari, Nima Khademi},
title = {ReShader: View-Dependent Highlights for Single Image View-Synthesis},
journal = {ACM Trans. Graph.},
publisher = {Association for Computing Machinery},
year = {2023},
issue_date = {December 2023},
volume = {42},
number = {6},
articleno = {216},
numpages = {9},
month = {dec},
doi = {10.1145/3618393},
}
We thank the SIGGRAPH Asia reviewers for their comments and suggestions. This work was funded by Leia Inc. (contract #415290). Nima Khademi Kalantari was in part supported by a CAREER Award (#2238193). Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing.