One of the most interesting challenges in video classification is determining the content of a video, especially when the uploader did not add any tags. The need to assess content stems from advertisers’ desire to target specific audiences based on the videos they watch. By assigning a proper classification to every item in their catalog, video-sharing platforms like YouTube, Dailymotion, Vimeo and many others can offer better-targeted advertising to both clients and viewers. For example, a manufacturer of running shoes will prefer to place ads around sports videos and the like.

Most videos covering major sporting events like the Super Bowl or NBA games are correctly tagged, but advertising around them is generally too expensive for most advertisers. Those advertisers are more attracted to other relevant videos showing various kinds of sports (including amateur ones), which are often untagged. This area of video classification is a very hot topic, and RSIP Vision’s engineers have developed software for one of the projects completed in this area.

Our Solution:

The main idea is to train the system to recognize such a video when it is uploaded. One simple option is to detect distinctive local features in the video frames, using SIFT (Scale-Invariant Feature Transform) or a similar algorithm, and then compute visual statistics over these features. A well-known technique for this is BoVW (Bag of Visual Words), which derives from BoW (Bag of Words), a representation method widely used in natural language processing and information retrieval.
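To make the BoVW step more concrete, here is a minimal sketch of how a visual vocabulary might be built, assuming the local descriptors have already been extracted (in practice they could come from OpenCV's cv2.SIFT_create() and its detectAndCompute method, which yields 128-value descriptors). Clustering the descriptors with k-means is a common choice but an assumption here, since the article does not say how the vocabulary was built; the function names, the farthest-point initialization and the synthetic data are all hypothetical.

```python
import numpy as np


def quantize(descriptors: np.ndarray, vocabulary: np.ndarray) -> np.ndarray:
    """Map each descriptor to the index of its nearest visual word (Euclidean distance)."""
    # Squared distances between every descriptor and every word, shape (n, k).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_vocabulary(descriptors: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Cluster local descriptors (one per row) into k visual words with k-means."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random descriptor, then keep
    # adding the descriptor that is farthest from every centroid chosen so far.
    idx = [int(rng.integers(len(descriptors)))]
    for _ in range(1, k):
        chosen = descriptors[idx]
        d2 = ((descriptors[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    centroids = descriptors[idx].astype(float)
    # Standard k-means iterations: assign to nearest centroid, then recompute means.
    for _ in range(iters):
        labels = quantize(descriptors, centroids)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


rng = np.random.default_rng(1)
# Stand-in for SIFT output: 128-D descriptors scattered around two distinct centres.
descs = np.vstack([rng.normal(0.0, 1.0, (200, 128)), rng.normal(10.0, 1.0, (200, 128))])
vocab = build_vocabulary(descs, k=2)
words = quantize(descs, vocab)  # one visual-word index per descriptor
```

A real vocabulary would use far more words (often hundreds or thousands) and descriptors pooled from many training videos; the two-cluster toy data only shows that similar descriptors end up sharing the same visual word.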

The method was subsequently carried over to computer vision by applying the same idea to the features detected in images. Bag of Visual Words lets us identify and classify a video by comparing visual histograms, while PCA (Principal Component Analysis) reduces those histograms to their most significant components, which makes matching faster. The procedure parallels the construction of word histograms, which count how often each word appears in a text: each visual word is represented in the histogram according to how often it occurs among the features of that video.
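The histogram-and-PCA matching described above can be sketched as follows, assuming each video's descriptors have already been quantized into visual-word indices. The vocabulary size, the "sports"/"other" labels, the synthetic videos and the nearest-neighbour rule are illustrative assumptions; the article does not specify which classifier was used on top of the PCA-reduced histograms.

```python
import numpy as np


def bovw_histogram(word_ids, vocab_size):
    """Count how often each visual word occurs in a video and L1-normalize the counts."""
    hist = np.bincount(word_ids, minlength=vocab_size).astype(float)
    return hist / max(hist.sum(), 1.0)


def fit_pca(X, n_components):
    """Return the mean and the top principal axes of the rows of X, computed via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, axes):
    """Express histograms in the reduced PCA coordinate system."""
    return (X - mean) @ axes.T


def classify(query_hist, train_hists, train_labels, mean, axes):
    """Label a video with the class of its nearest training histogram in PCA space."""
    q = project(query_hist[None, :], mean, axes)
    t = project(train_hists, mean, axes)
    return train_labels[int(np.linalg.norm(t - q, axis=1).argmin())]


rng = np.random.default_rng(0)
V = 6  # hypothetical vocabulary size


def fake_video(words):
    """Hypothetical quantized descriptors for one video, drawn from a subset of words."""
    return rng.choice(words, size=300)


train = np.array([bovw_histogram(fake_video([0, 1, 2]), V) for _ in range(5)] +
                 [bovw_histogram(fake_video([3, 4, 5]), V) for _ in range(5)])
labels = np.array(["sports"] * 5 + ["other"] * 5)
mean, axes = fit_pca(train, n_components=2)
query = bovw_histogram(fake_video([0, 1, 2]), V)
print(classify(query, train, labels, mean, axes))  # sports
```

Comparing vectors in the PCA space rather than the full histogram space is what saves time at scale: with a realistic vocabulary of thousands of words, each comparison shrinks to a handful of coordinates.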

How easy the classification task is depends on the subject at hand: it might be quite simple to detect sports played on snow or grass, but subjects lacking a distinct visual appearance will be harder to identify.
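One way to see why snow and grass are easy cases is that a crude color statistic already separates them. The sketch below, with entirely hypothetical thresholds and a synthetic frame, measures the share of green-dominant and near-white pixels; it illustrates the reasoning and is not part of the system described in the article.

```python
import numpy as np


def color_fractions(frame):
    """Fraction of grass-like and snow-like pixels in an RGB uint8 frame."""
    rgb = frame.astype(int)  # avoid uint8 overflow in the comparisons below
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    grass = (g > r + 20) & (g > b + 20)          # clearly green-dominant pixels (hypothetical margin)
    snow = (r > 200) & (g > 200) & (b > 200)      # bright, near-white pixels (hypothetical cutoff)
    return float(grass.mean()), float(snow.mean())


frame = np.zeros((100, 100, 3), dtype=np.uint8)
frame[:50] = (135, 206, 235)  # sky-blue upper half
frame[50:] = (40, 160, 50)    # green pitch in the lower half
grass, snow = color_fractions(frame)
print(grass, snow)  # 0.5 0.0
```

A subject such as a chess tournament or an e-sports stream has no comparable color signature, which is why it needs richer features.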

It is worth noting that advertisers might want to exclude adult content from their choice of subjects. Filters based on color and motion detection are very successful at identifying and eliminating adult videos, so this requirement can be met as well.
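As an illustration of a filter combining color and motion cues, here is a minimal sketch that swaps in a widely cited rule-based RGB skin-color heuristic and simple frame differencing. The article does not describe its actual filters, so the rule, the thresholds and the function names are assumptions; a production system would use a trained model and send flagged clips for review rather than trusting a fixed rule.

```python
import numpy as np


def skin_ratio(frame):
    """Fraction of pixels passing a classic RGB skin-color rule (uniform daylight)."""
    rgb = frame.astype(int)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    spread = rgb.max(axis=-1) - rgb.min(axis=-1)
    skin = ((r > 95) & (g > 40) & (b > 20) & (spread > 15)
            & (np.abs(r - g) > 15) & (r > g) & (r > b))
    return float(skin.mean())


def motion_energy(prev, curr):
    """Mean absolute grey-level difference between consecutive frames (0-255 scale)."""
    return float(np.abs(curr.astype(int).mean(axis=-1) - prev.astype(int).mean(axis=-1)).mean())


def flag_for_review(frames, skin_thresh=0.4, motion_thresh=10.0):
    """Hypothetical rule: flag a clip when skin tones dominate and the scene is moving."""
    skins = [skin_ratio(f) for f in frames]
    motions = [motion_energy(a, b) for a, b in zip(frames, frames[1:])]
    return bool(np.mean(skins) > skin_thresh and np.mean(motions) > motion_thresh)


skin = np.full((60, 80, 3), (220, 170, 140), dtype=np.uint8)
shifted = skin.copy()
shifted[:, :40] = (180, 130, 100)      # darker skin tone on the left half
clip = [skin, shifted, skin, shifted]  # alternating frames -> visible motion
static = [np.full((60, 80, 3), (40, 160, 50), dtype=np.uint8)] * 3  # still green frame
print(flag_for_review(clip), flag_for_review(static))  # True False
```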

The video classification system can also use other cues found in the images: for instance, it might detect text in the wild, pass it to an OCR engine and extract keywords that can give valuable clues about the subject of the video. Similarly, logo detection can shed additional light on the video at hand.
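The step from OCR output to topic clues could look roughly like the sketch below. It assumes the OCR engine (for example Tesseract, called through pytesseract.image_to_string) has already returned lines of text; the keyword lists and the simple voting scheme are hypothetical placeholders rather than the project's actual method.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real system would curate or learn these.
CATEGORY_KEYWORDS = {
    "sports": {"goal", "score", "match", "league", "marathon", "ski"},
    "cooking": {"recipe", "oven", "ingredients", "chef"},
}


def keyword_votes(ocr_lines):
    """Count category keywords in text recognized by an OCR engine."""
    votes = Counter()
    for line in ocr_lines:
        # Lowercase and split into alphabetic tokens, ignoring digits and punctuation.
        for token in re.findall(r"[a-z]+", line.lower()):
            for category, words in CATEGORY_KEYWORDS.items():
                if token in words:
                    votes[category] += 1
    return votes


def best_category(ocr_lines):
    """Return the category with the most keyword hits, or None if nothing matched."""
    votes = keyword_votes(ocr_lines)
    return votes.most_common(1)[0][0] if votes else None


lines = ["FINAL SCORE 2-1", "Premier League match highlights"]
print(best_category(lines))  # sports
```

Such text votes would typically be combined with the visual BoVW score rather than used alone, since on-screen text is absent from many videos.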

Video Classification Software:

This project resulted in leading video classification software that uses machine learning and object recognition to match adverts with captive audience members interested in related content.

This article was first published in our magazine Computer Vision News, June 2016 issue, pages 30-31.
