Apparent object deformation has three sources: a change in the shape of the object itself, partial or full occlusion by a dynamically changing background (other moving objects or changing imaging conditions), or camera motion. Deforming objects are generally hard to track because their shape is unpredictable, to a degree that depends on the amount of deformation. However, when searching for such objects with a template, the permitted template deformation can be limited according to the known mechanics of the object being searched for. For example, when looking for a walking person, the predictable anatomy of humans can serve as a wireframe model with known physiological constraints (e.g., the head is usually above the legs). In many "ordinary" deformable pattern matching cases, this suffices for a good match between the person and the wireframe template; in other cases, such as tracking gymnasts, the human body may stretch to the physiological limits of its flexibility, and the sensitivity of the template to sharp and highly bent forms must be adjusted.
Faces as deformable objects
Faces are another example of deformable objects. Although feature-based matching performs rather well, feature tracking is often limited to the search region imposed by a partially flexible face wireframe template superimposed on the image. This is, of course, done after a face has first been detected in the image. The actual search for a face template is done with the aid of an Eigenspace of faces (Eigenfaces or Eigenimages), that is, a basis derived from a dataset of images containing the prominent features of human faces. The Eigenface space is built from a set of human face templates in various forms and orientations, which can give rise to a wide spectrum of face appearances.
The idea behind Eigenfaces is that of eigenvectors: a face in the image can be represented as a linear combination of Eigenface templates, which are the leading eigenvectors of the covariance matrix of the face images. The Eigenface space is constructed beforehand by a statistical procedure applied to a large sample of faces. After all images are normalized, the average face is subtracted from every face in the database, and principal component analysis is performed on the covariance between the resulting images. Dimensionality is reduced by keeping only the most influential components (the dominant deviations from the mean), and these components are the Eigenfaces.
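The construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production recipe: the function names (fit_eigenfaces, project, reconstruct) are hypothetical, the "images" are random arrays standing in for flattened, normalized face images, and the principal components are obtained through an SVD of the mean-centered data, which yields the same axes as an eigendecomposition of the covariance matrix.

```python
import numpy as np

def fit_eigenfaces(faces, n_components):
    """faces: (n_samples, n_pixels) array of flattened, normalized images."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face          # deviations from the average face
    # The right singular vectors of the centered data are the eigenvectors
    # of its covariance matrix, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:n_components]        # keep the most influential components
    return mean_face, eigenfaces

def project(face, mean_face, eigenfaces):
    """Weights of the linear combination of Eigenfaces that best fits `face`."""
    return eigenfaces @ (face - mean_face)

def reconstruct(weights, mean_face, eigenfaces):
    """Rebuild an image from its Eigenface weights."""
    return mean_face + weights @ eigenfaces

# Synthetic stand-in data (assumption): 20 "faces" of 8x8 = 64 pixels each.
rng = np.random.default_rng(0)
faces = rng.random((20, 64))

mean_face, eigenfaces = fit_eigenfaces(faces, n_components=5)
weights = project(faces[0], mean_face, eigenfaces)
approx = reconstruct(weights, mean_face, eigenfaces)
```

Each face is now summarized by five weights instead of 64 pixel values; comparing weight vectors (for example by Euclidean distance) is one common way to match a new face against stored ones.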
Disadvantages of the Eigenface representation stem primarily from the inability to extract meaningful facial features from the recognized face. It is therefore adequate for face recognition but rather inadequate for classification. In addition, the Eigenspace constructed from the Eigenimages has been shown to be sensitive to lighting conditions, which account for the most influential components of the Eigenface space.
A more suitable course of action in these cases of deformable pattern matching is the Fisherfaces methodology. Fisherfaces is better suited to classification because it separates the projected space into classes such that within-class variance is low while between-class variance is high. In practice, a projection matrix is found such that the ratio of the determinant of the between-class scatter matrix to that of the within-class scatter matrix is maximized after projection. The projection is analogous to the dimensionality reduction in PCA. With Fisherfaces, faces can be classified once they have been detected in images. An advantage of both Eigenfaces and Fisherfaces is that, once the Eigenspace has been defined (or learned), recognition can run in real time.
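The scatter-matrix criterion can be illustrated with a small Fisher discriminant sketch. It assumes the inputs are already low-dimensional feature vectors (for instance Eigenface weights), so that the within-class scatter matrix is invertible; the classic Fisherfaces pipeline applies PCA first for exactly this reason. The function name fisher_projection and the synthetic three-class data are illustrative assumptions only.

```python
import numpy as np

def fisher_projection(features, labels, n_components):
    """Return a (dim, n_components) matrix of Fisher discriminant directions."""
    overall_mean = features.mean(axis=0)
    dim = features.shape[1]
    s_within = np.zeros((dim, dim))
    s_between = np.zeros((dim, dim))
    for c in np.unique(labels):
        members = features[labels == c]
        class_mean = members.mean(axis=0)
        centered = members - class_mean
        s_within += centered.T @ centered                # spread inside a class
        diff = (class_mean - overall_mean)[:, None]
        s_between += len(members) * (diff @ diff.T)      # spread of class means
    # Directions maximizing between-class over within-class scatter are the
    # leading eigenvectors of inv(S_w) @ S_b (S_w assumed nonsingular here).
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_within, s_between))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs.real[:, order]

# Synthetic stand-in data (assumption): 3 classes, 30 samples each, 4 features.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [4, 0, 0, 0], [0, 4, 0, 0]], dtype=float)
labels = np.repeat(np.arange(3), 30)
features = centers[labels] + rng.normal(scale=0.5, size=(90, 4))

w = fisher_projection(features, labels, n_components=2)
projected = features @ w   # class clusters are compact and well separated here
```

With C classes there are at most C - 1 useful discriminant directions, which is why two components are requested for three classes.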
To handle deformation stemming from partial occlusion, the Eigenspace has to allow either partial matching of templates or a reduction of the required match score. Representing faces by partial templates is computationally intensive, since it increases the size of the Eigenspace by several orders of magnitude. It is therefore more practical to allow a partial face match by adjusting the sensitivity of the face matching score (for example, by accepting a lower score). In video, combining the matcher with crude object tracking can help determine the degree of occlusion.
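One way to see why occlusion hurts the match score, and how knowledge of the occluded region can help, is the sketch below. It is not the method described above but a swapped-in illustration: it assumes an occlusion mask is already available (for example from a tracker) and fits the Eigenface weights by least squares on the visible pixels only. The function name masked_match_score, the mask, and all data are hypothetical.

```python
import numpy as np

def masked_match_score(face, visible, mean_face, eigenfaces):
    """RMS residual over visible pixels after fitting weights on them only.

    face: (n_pixels,) image; visible: (n_pixels,) boolean mask;
    eigenfaces: (n_components, n_pixels) basis. Lower score = better match.
    """
    basis = eigenfaces[:, visible].T                  # (n_visible, n_components)
    target = (face - mean_face)[visible]
    weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
    residual = target - basis @ weights
    return np.sqrt(np.mean(residual ** 2)), weights

# Synthetic stand-in data (assumption): a random orthonormal "Eigenface" basis.
rng = np.random.default_rng(0)
n_pixels, n_components = 64, 5
q, _ = np.linalg.qr(rng.normal(size=(n_pixels, n_components)))
eigenfaces = q.T
mean_face = rng.random(n_pixels)
true_weights = rng.normal(size=n_components)
face = mean_face + true_weights @ eigenfaces

# Hypothetical occluder blanking the first 24 pixels.
occluded = face.copy()
occluded[:24] = 0.0
visible = np.ones(n_pixels, dtype=bool)
visible[:24] = False

masked_score, masked_weights = masked_match_score(
    occluded, visible, mean_face, eigenfaces)
full_score, _ = masked_match_score(
    occluded, np.ones(n_pixels, dtype=bool), mean_face, eigenfaces)
```

In this toy setup the masked fit recovers the original weights, while scoring over all pixels is penalized by the occluder; when no mask is available, the acceptance threshold on the full score has to be loosened instead.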
Apparent object deformation caused by camera motion is a major source of trouble for facial recognition and classification. The main difficulty is stabilization, in the sense of separating camera motion from object motion. Methodologies for treating such cases of deformable pattern matching are beyond the scope of this short article.
Deformable pattern matching in practice
Vision-based applications in which deformable objects need to be detected, recognized and classified encompass a large spectrum of technical and conceptual challenges, for which off-the-shelf methodologies do not yet perform satisfactorily. Tailoring solutions to the scope of the challenges ahead is necessary to achieve a high degree of robustness, repeatability and accuracy. At RSIP Vision we have been solving problems by tailoring cutting-edge solutions for more than 25 years. Please visit our projects page to explore the spectrum of consulting and R&D expertise offered to our clients around the globe.