Computer Vision News - September 2016

The CNNs are trained in the reverse of the order in which they run (DESC → ORG → DET): first the Descriptor, then the Orientation Estimator given the learned Descriptor, and lastly the Detector, conditioned on the other two. The inputs used to train the Siamese architecture are quadruplets of image patches with the following properties:

• P1, P2 - Two different views of the same physical 3D point, used by the Descriptor as a positive example during training.
• P3 - A 3D point different from that of P1 and P2, used by the Descriptor as a negative example.
• P4 - A patch with no distinctive information, used as a negative example to train the Detector.

The image patches used as input are assumed to be small enough to contain only one dominant local feature.

The table below details each of the three CNNs in the LIFT framework:

The Descriptor
Original method: Simo-Serra, E., et al. Learning of Deep Convolutional Feature Point Descriptors. In: ICCV (2015)
Training input / output: The Descriptor is the last component of the pipeline but the first to be trained. Therefore, during training, the image patches p are generated from the image locations and orientations of the feature points used by the SFM.
Adaptation made in the LIFT framework: The increasing mining scheme starts with r = 1 and doubles r every 5000 batches. Balanced batches with 128 positive pairs and 128 negative pairs are used, mining each separately.

The Orientation Estimator
Original method: Jaderberg, M., et al. Spatial Transformer Networks. In: NIPS (2015)
Training input / output: Only the positive patches, P1 and P2, are used to train the Orientation Estimator. The already trained Descriptor is used to compute the description vectors, and the input locations are taken from SFM, as for the Descriptor.
Adaptation made in the LIFT framework: None

The Detector
Original method: Verdie, Y., et al. TILDE: A Temporally Invariant Learned Detector. In: CVPR (2015)
Training input / output: The Detector is trained through the full LIFT pipeline, since the Orientation Estimator and the Descriptor have already been learned by this point.
Adaptation made in the LIFT framework: To let S have maxima in places other than a fixed location retrieved by SFM, this location is treated implicitly, as a latent variable.
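
The mining schedule listed for the Descriptor can be illustrated with a minimal Python sketch. It assumes that r is a mining ratio, meaning that r × 128 candidate pairs are forwarded and only the 128 with the highest loss are kept; the text above gives only the schedule and the batch sizes, so this interpretation is an assumption. The function names are hypothetical and the losses are random placeholders, so this is not the authors' implementation.

```python
import random

BATCH_PAIRS = 128      # pairs kept per class (from the text)
DOUBLE_EVERY = 5000    # batches between doublings of r (from the text)


def mining_ratio(batch_index):
    """Return r for a 0-based batch index: 1 at first, doubling every 5000 batches."""
    return 2 ** (batch_index // DOUBLE_EVERY)


def mine_hardest(losses, keep=BATCH_PAIRS):
    """Return the indices of the `keep` largest losses, hardest first."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:keep]


def balanced_mined_batch(pos_losses, neg_losses, keep=BATCH_PAIRS):
    """Mine positive and negative pairs separately, keeping `keep` of each."""
    return mine_hardest(pos_losses, keep), mine_hardest(neg_losses, keep)


if __name__ == "__main__":
    rng = random.Random(0)
    batch_index = 12000
    r = mining_ratio(batch_index)       # two doublings have happened by now
    n_candidates = r * BATCH_PAIRS      # assumed meaning of r: candidates per kept pair
    # Placeholder per-pair losses; a real run would compute these with the network.
    pos_losses = [rng.random() for _ in range(n_candidates)]
    neg_losses = [rng.random() for _ in range(n_candidates)]
    pos_idx, neg_idx = balanced_mined_batch(pos_losses, neg_losses)
    print(r, len(pos_idx), len(neg_idx))  # prints: 4 128 128
```

Because positives and negatives are mined separately, each batch stays balanced at 128 pairs per class no matter how large r becomes. The sketch leaves r uncapped because the text does not state an upper limit.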