Single-Plane
The network fails to interpolate objects with large disparity using a single channel output.
In this paper, we propose an approach for view-time interpolation of stereo videos. We build upon X-Fields, which approximates an interpolatable mapping between the input coordinates and 2D RGB images using a convolutional decoder. Our main contribution is to identify the sources of the problems that arise when X-Fields is applied to this setting and to propose novel techniques that overcome them. Specifically, we observe that X-Fields struggles to implicitly interpolate the disparities for large-baseline cameras. We therefore propose multi-plane disparities to reduce the spatial distance between corresponding objects in the stereo views. Moreover, we propose non-uniform time coordinates to handle non-linear motion and sudden motion spikes in videos. We additionally introduce several simple but important improvements over X-Fields. We demonstrate that our approach produces better results than the state of the art, while running at near real-time rates and having low memory and storage costs.
We propose multi-plane disparities that place objects in different planes to reduce displacement in encoding space.
The multi-plane disparities output by the view synthesis network between reference viewpoints.
The shifted multi-plane disparities based on plane position and viewpoint.
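The idea of shifting each plane according to its position and the target viewpoint can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the integer-pixel shift, the zero fill, and the convention that a plane's offset is its full-baseline disparity in pixels scaled linearly by a view coordinate in [0, 1] are all assumptions made for illustration.

```python
def shift_row(row, shift, fill=0.0):
    """Shift one image row by `shift` pixels (positive = right), padding with `fill`."""
    width = len(row)
    out = [fill] * width
    for x, value in enumerate(row):
        nx = x + shift
        if 0 <= nx < width:
            out[nx] = value
    return out


def shift_planes(planes, plane_offsets, view):
    """Shift every plane horizontally by its own offset scaled by the view coordinate.

    planes        : list of planes, each a list of rows (hypothetical layout)
    plane_offsets : assumed per-plane disparity in pixels at the full baseline
    view          : assumed view coordinate in [0, 1] between the two cameras
    """
    shifted = []
    for plane, offset in zip(planes, plane_offsets):
        pixels = int(round(offset * view))  # integer shift only, for simplicity
        shifted.append([shift_row(row, pixels) for row in plane])
    return shifted


# Two one-row planes holding the same content; the second plane is "nearer"
# and therefore carries a larger assumed offset.
planes = [[[0, 1, 0, 0, 0]], [[0, 1, 0, 0, 0]]]
halfway = shift_planes(planes, plane_offsets=[0.0, 4.0], view=0.5)
```

In this toy setup the far plane stays in place while the near plane moves two pixels at the halfway view. Since each plane carries its own coarse shift, any remaining per-pixel displacement within a plane is smaller than the full disparity.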
Natural videos contain non-linear motion, so it is difficult to represent the two flows at each frame using a single Jacobian. A straightforward way to address this problem is to estimate two different Jacobians at each frame (Dual Jacobians). However, this approach, like the single Jacobian, has difficulty handling motion spikes. With our proposed non-uniform coordinates, we use two different coordinates to estimate the previous and next Jacobians at each frame. The large unused regions in between allow the network to smoothly accommodate motion spikes. The mean flow magnitude plot compares the different coordinate schemes against the guidance flow, showing the average flow magnitude per frame in a video sequence.
The Dual Jacobians network fails to accommodate large non-linear motion in a shaky video captured with a handheld device.
Non-uniform coordinates fit the motion spikes, allowing for smooth video frame interpolation.
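As a rough illustration of giving each frame two separate time coordinates, the sketch below assigns a "previous" and a "next" coordinate to every frame. The placement rule is an assumption, not taken from the paper: `frame_stride` and `half_gap` are hypothetical parameters, and the actual spacing used by the authors may differ.

```python
def nonuniform_time_coords(num_frames, frame_stride=10.0, half_gap=2.0):
    """Return one (previous, next) coordinate pair per frame.

    Hypothetical scheme: frame k is centred at k * frame_stride, and its two
    coordinates sit half_gap before and after that centre. The stretches of the
    time axis that no coordinate touches are never used for training.
    """
    coords = []
    for k in range(num_frames):
        centre = k * frame_stride
        coords.append((centre - half_gap, centre + half_gap))
    return coords


pairs = nonuniform_time_coords(3)
for frame, (prev_coord, next_coord) in enumerate(pairs):
    print(f"frame {frame}: previous Jacobian at {prev_coord}, next Jacobian at {next_coord}")
```

Because the two Jacobians of a frame are queried at different inputs, the network is not forced to output both from a single coordinate, and the unused stretches give it room to transition between very different motions.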
Frame Interpolation for Dynamic Scenes with Implicit Flow Encoding is another work in which we use implicit networks to interpolate between two near-duplicate images of a scene. There, we discuss and demonstrate the effects of several hyperparameters on interpolation.
@inproceedings{Paliwal2023implicit,
author = {Paliwal, Avinash and Tsarov, Andrii and Kalantari, Nima Khademi},
title = {Implicit View-Time Interpolation of Stereo Videos using Multi-Plane Disparities and Non-Uniform Coordinates},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year = {2023},
}