Computer Vision News - June 2020
First Order Motion Model

In the presence of occlusions in the source frame S, optical flow and the feature warping strategy mentioned above, used to resolve the misalignment between the driving frame D and S, may not be sufficient to generate the reconstructed frame D̂. The authors propose introducing an occlusion map that diminishes the impact of the features corresponding to the occluded parts; it is estimated from the keypoint representation by adding a channel to the final layer of the dense motion network.

The whole system is trained using the reconstruction loss

$\mathcal{L}_{rec}(\hat{D}, D) = \sum_{i=1}^{I} \left| N_i(\hat{D}) - N_i(D) \right|$

where $N_i(\cdot)$ is the $i$-th channel of the feature maps extracted by a pre-trained VGG-19 network. The loss is computed at a number of resolutions, obtained by down-sampling D̂ and D, and is referred to as the pyramid loss. The last addition to the approach is the extension of the equivariance loss (which forces the model to predict consistent keypoints under known geometric transformations) so that it also takes into account the Jacobians predicted for each keypoint, by including constraints on them. A simple L1 loss is employed to this end.

Datasets and metrics

The method was trained and tested on four datasets of slightly different nature: the VoxCeleb and UvA-Nemo datasets (both facial), the BAIR robot pushing dataset and the Tai-Chi-HD dataset, assembled ad hoc for this work. All were pre-processed to a resolution of 256x256, with videos of variable length. Evaluation was performed with the following metrics: average L1 distance, Average Keypoint Distance (AKD) against ground-truth keypoints produced by third-party pre-trained detectors, Missing Keypoint Rate (MKR) and Average Euclidean Distance (AED).

Experiments and performance

The method was evaluated both quantitatively and qualitatively on the task of video reconstruction. The first experiment was an ablation study comparing different variants of the proposed model.
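To make the occlusion-handling step more concrete, here is a minimal PyTorch sketch of how a predicted occlusion map can down-weight warped source features before they are passed to the generator. The function name, tensor shapes and the use of grid_sample are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def warp_and_mask(source_features, flow_grid, occlusion_map):
    """Warp encoder features of the source frame S with the estimated
    dense motion, then suppress regions that are occluded in S.

    source_features: (B, C, H, W) feature map of S
    flow_grid:       (B, H, W, 2) sampling grid in [-1, 1] (backward warp)
    occlusion_map:   (B, 1, H, W) in [0, 1]; low values mark occluded parts
    """
    warped = F.grid_sample(source_features, flow_grid, align_corners=True)
    # Features in occluded regions are attenuated, so the generator
    # inpaints them instead of copying unreliable warped content.
    return occlusion_map * warped
```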
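The pyramid reconstruction loss can likewise be sketched. The paper compares features of a pre-trained VGG-19; the specific layer cut, the set of scales, the class name and the use of a recent torchvision API below are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

class PyramidPerceptualLoss(torch.nn.Module):
    """L1 distance between frozen VGG-19 feature maps of the reconstructed
    frame D_hat and the driving frame D, summed over down-sampled copies of
    both frames (the "pyramid"). A single VGG layer is used here for brevity;
    the paper sums over several layers. Input normalization is omitted."""

    def __init__(self, scales=(1.0, 0.5, 0.25, 0.125)):
        super().__init__()
        self.scales = scales
        self.features = vgg19(weights="IMAGENET1K_V1").features[:16].eval()
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, d_hat, d):
        loss = 0.0
        for s in self.scales:
            x = F.interpolate(d_hat, scale_factor=s, mode="bilinear", align_corners=False)
            y = F.interpolate(d, scale_factor=s, mode="bilinear", align_corners=False)
            loss = loss + torch.abs(self.features(x) - self.features(y)).mean()
        return loss
```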
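The extended equivariance loss can be written as follows, assuming a keypoint detector that returns both keypoint locations and 2x2 Jacobians, and a known random warp (for instance a thin-plate spline) exposing hypothetical helpers warp() and warp_jacobian(); both helpers are stand-ins, not part of the authors' code.

```python
import torch

def equivariance_losses(kp_d, jac_d, kp_t, jac_t, warp, warp_jacobian):
    """kp_d, jac_d: keypoints (B, K, 2) and Jacobians (B, K, 2, 2) predicted
    on the driving frame D; kp_t, jac_t: the same quantities predicted on a
    randomly warped copy of D. warp / warp_jacobian are hypothetical callables
    for the known geometric transformation and its local 2x2 Jacobian."""
    # Keypoint term: detections must move consistently with the known warp.
    loss_kp = torch.abs(kp_d - warp(kp_t)).mean()

    # Jacobian term: the composed Jacobians should cancel out, so the
    # residual with respect to the identity is penalised with an L1 loss.
    eye = torch.eye(2, device=kp_d.device)
    composed = torch.matmul(warp_jacobian(kp_t), jac_t)      # (B, K, 2, 2)
    residual = torch.matmul(torch.inverse(jac_d), composed)  # (B, K, 2, 2)
    loss_jac = torch.abs(eye - residual).mean()
    return loss_kp, loss_jac
```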
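Finally, the keypoint-based evaluation metrics are easy to state in code. The sketch below assumes detections from a third-party keypoint detector, with NaN marking keypoints the detector failed to find in the generated video; names and conventions are illustrative.

```python
import numpy as np

def akd_mkr(kp_gt, kp_gen):
    """kp_gt, kp_gen: (T, K, 2) keypoints detected by an external detector
    on the ground-truth and generated videos; entries of kp_gen are NaN when
    a keypoint is missing in the generated frame. Returns the Average
    Keypoint Distance and the Missing Keypoint Rate."""
    missing = np.isnan(kp_gen).any(axis=-1)         # (T, K) missed detections
    dist = np.linalg.norm(kp_gt - kp_gen, axis=-1)  # (T, K) per-keypoint error
    akd = float(np.nanmean(dist))                   # mean over detected keypoints
    mkr = float(missing.mean())                     # fraction of missed keypoints
    return akd, mkr
```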