Creative coding studio Støj is experimenting with object recognition in a project called "An algorithm watching a movie trailer." Lasse Korsgaard and Andreas Refsgaard are exploring movie trailers through an algorithm designed to detect objects. The technology has been applied to the trailer for Warner Bros.' “The Wolf of Wall Street” and is presented in various ways.
Korsgaard and Refsgaard use object recognition, which identifies predetermined objects within a video. The human eye has no problem with such a task, regardless of scale or presentation. Historically, algorithms have struggled with the task. However, recent improvements have increased both the speed and accuracy with which algorithms successfully identify objects. New technology allows them to identify multiple objects within the same image.
“We wondered what a fast-paced movie trailer would look like seen through the lens of an object-detection algorithm,” Støj writes on their website. To find out, they sent every frame of the “Wolf of Wall Street” movie trailer through Yolo-2 with a threshold of 0.15, which means that “the algorithm will only react to objects detected with a confidence of 15% or higher.”
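The thresholding step can be sketched in a few lines of Python. This is a minimal illustration, not Støj's code: the `Detection` class, the `filter_by_threshold` function, and the sample scores are all hypothetical, and only the 0.15 cutoff comes from the project description.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # class name reported by the detector
    confidence: float  # detector score between 0 and 1


def filter_by_threshold(detections, threshold=0.15):
    """Keep only detections scored at or above the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up detections for a single frame (illustrative values only).
frame_detections = [
    Detection("person", 0.82),
    Detection("tie", 0.15),
    Detection("wine glass", 0.09),
]

kept = filter_by_threshold(frame_detections)
print([d.label for d in kept])  # ['person', 'tie']
```

A threshold this low lets through many uncertain guesses, which is presumably part of the appeal: the output shows what the algorithm thinks it might be seeing, not only what it is sure of.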
Yolo-2, which stands for “You Only Look Once,” processes images at 40 to 90 FPS and has a mean average precision (mAP) of 78.6% on VOC 2007 and 44% on COCO test-dev. Previous detection systems repurpose classifiers or localizers to perform detection: they apply the model to an image at various scales and locations and treat the high-scoring regions as detections. Yolo-2 instead applies a single neural network to the whole image. The network divides the image into regions and predicts bounding boxes and probabilities for each region.
Yolo-2's makers argue that this system has many advantages over previous systems. On their website, they write, “It looks at the whole image at test time so its predictions are informed by global context in the image.” They also report that it is more than 1,000 times faster than R-CNN and 100 times faster than Fast R-CNN.
Yolo-2 makes several improvements on Version 1. It uses a fully convolutional model but still trains on whole images. For box sizes, it adjusts priors on bounding boxes rather than predicting width and height outright, while still predicting the x- and y-coordinates directly.
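To make the idea of adjusting priors concrete, here is a minimal Python sketch of the box parameterization described in the YOLO9000 paper. The function name `decode_box`, the 13-cell grid, and the prior sizes are illustrative assumptions, not values taken from Støj's setup or from a specific Darknet configuration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cell_x, cell_y, prior_w, prior_h, grid_size):
    """Turn raw network outputs into a box in normalized image coordinates."""
    # Center: predicted directly, as a sigmoid offset inside the grid cell.
    bx = (cell_x + sigmoid(tx)) / grid_size
    by = (cell_y + sigmoid(ty)) / grid_size
    # Size: the prior (given in grid-cell units) is scaled, not replaced.
    bw = prior_w * math.exp(tw) / grid_size
    bh = prior_h * math.exp(th) / grid_size
    return bx, by, bw, bh


# Assumed example: the middle cell of a 13x13 grid with a 2x3-cell prior.
# All-zero outputs leave the prior unchanged and center the box in its cell.
box = decode_box(0.0, 0.0, 0.0, 0.0, cell_x=6, cell_y=6,
                 prior_w=2.0, prior_h=3.0, grid_size=13)
print(box)
```

Because the center offset passes through a sigmoid, a box center can never leave the grid cell that predicts it, while the exponential lets the width and height grow or shrink relative to the prior.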
If you’re interested in detecting objects with the Yolo-2 system using a pre-trained model, install Darknet and follow the tutorial found here.
Using Yolo-2, Støj filtered the video in three different ways: one version displays only the parts of the trailer with recognized objects, one shows only the object recognition markers themselves, and one censors the recognized areas with blurring.
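The three treatments can be approximated on a toy grayscale frame in plain Python. This is a rough sketch under stated assumptions, not Støj's pipeline: it reads "parts of the trailer" as image regions, represents boxes as hypothetical `(x0, y0, x1, y1)` tuples with exclusive upper bounds, and stands in for a real blur by filling each box with its average brightness.

```python
def show_only_objects(frame, boxes):
    """Treatment 1: keep pixels inside detected boxes, black out the rest."""
    out = [[0] * len(row) for row in frame]
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = frame[y][x]
    return out


def markers_only(width, height, boxes):
    """Treatment 2: empty canvas with box outlines (1 = outline, 0 = empty)."""
    canvas = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for x in range(x0, x1):
            canvas[y0][x] = 1
            canvas[y1 - 1][x] = 1
        for y in range(y0, y1):
            canvas[y][x0] = 1
            canvas[y][x1 - 1] = 1
    return canvas


def censor(frame, boxes):
    """Treatment 3: fill each box with its mean value (crude stand-in for blur)."""
    out = [row[:] for row in frame]
    for x0, y0, x1, y1 in boxes:
        pixels = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
        if not pixels:
            continue  # skip degenerate boxes
        mean = sum(pixels) // len(pixels)
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = mean
    return out


# A 4x4 toy frame with one detected box covering the central 2x2 pixels.
frame = [
    [5, 5, 5, 5],
    [5, 10, 30, 5],
    [5, 50, 70, 5],
    [5, 5, 5, 5],
]
boxes = [(1, 1, 3, 3)]

objects = show_only_objects(frame, boxes)
markers = markers_only(4, 4, boxes)
censored = censor(frame, boxes)
```

On real video, each function would run once per frame with that frame's detections, and an image library would normally supply a proper Gaussian blur in place of the mean fill.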
Image courtesy of prosthetic knowledge