Computer Vision News - April 2019

The loss function is

$$\mathcal{L}(\omega) \;=\; \frac{1}{N}\sum_{t=1}^{N}\Big(\,\big\|y^{(t)}-\hat{y}^{(t)}\big\|_2 \;+\; \lambda\,\big\|y^{(t)}-\hat{y}'^{(t)}\big\|_2\Big)$$

where the weighting parameter λ is set to 0.8. The goal, of course, is to find the optimal parameter values ω that minimize this loss. In this model, ω is the concatenation of all the weight matrices from all three LSTM modules. Note that these modules combine LSTM layers with linear layers, as described in the illustration below. The model can be trained end-to-end; the method was implemented in TensorFlow and trained with the Adam optimizer.

Results

The authors evaluated the LSTM-KF architecture by comparing it against a variety of temporal regularization methods: two standard Kalman filter implementations assuming constant velocity or constant acceleration (denoted Kalman Vel and Kalman Acc, respectively), an exponential moving average filter (EMA), and a standard LSTM module (Std. LSTM). The evaluation was run on four different datasets, all using RGB images as input: one for 3D human pose estimation, two for camera pose estimation, and one for object pose estimation.

The illustration below plots the LSTM-KF error and the mean Kalman gain as a function of training time (epochs). Note that the gain (like the error) is high at first, indicating that at this stage the method relies almost entirely on the measurements. As training progresses, the Kalman gain drops significantly, indicating that the filter relies less on the raw measurements and increasingly on the learned prediction module.

Human Pose Estimation Results:

The Human3.6M database consists of 3.6 million RGB images from video sequences. The sequences cover 7 actors, each performing 15 activities of varying movement complexity. Each activity takes between 3,000
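The two-term loss above can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' code: the names (`y_true`, `y_meas_hat`, `y_filt_hat`, `lstm_kf_loss`) are hypothetical, and it assumes each row of the arrays is one time step's pose vector.

```python
import numpy as np

def lstm_kf_loss(y_true, y_meas_hat, y_filt_hat, lam=0.8):
    """Hypothetical sketch of the LSTM-KF loss: the mean over time steps of
    the Euclidean error of the raw measurement estimate plus lam (0.8) times
    the Euclidean error of the filtered estimate."""
    meas_err = np.linalg.norm(y_true - y_meas_hat, axis=1)  # ||y - y_hat||_2 per step
    filt_err = np.linalg.norm(y_true - y_filt_hat, axis=1)  # ||y - y_hat'||_2 per step
    return np.mean(meas_err + lam * filt_err)
```

Because both terms are plain norms of network outputs, the loss is differentiable end-to-end, which is what allows all three LSTM modules to be trained jointly with Adam.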
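The role of the Kalman gain described above can be made concrete with a toy scalar update. This is a generic textbook Kalman update step, shown only to illustrate how the gain trades off measurement against prediction; the function name and values are illustrative, not from the paper.

```python
def kalman_update(x_pred, z_meas, gain):
    """Standard Kalman correction step (scalar toy case):
    gain near 1 -> the estimate follows the measurement;
    gain near 0 -> the estimate stays close to the model's prediction."""
    return x_pred + gain * (z_meas - x_pred)

# Early in training (high gain): output tracks the measurement.
early = kalman_update(x_pred=0.0, z_meas=1.0, gain=0.9)  # -> 0.9
# Later in training (low gain): output stays near the learned prediction.
late = kalman_update(x_pred=0.0, z_meas=1.0, gain=0.1)   # -> 0.1
```

This is why a falling mean gain over epochs signals that the learned prediction module has become informative enough for the filter to trust it.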

RkJQdWJsaXNoZXIy NTc3NzU=