
AI learns to predict human behavior after watching hundreds of hours of The Office and YouTube videos

MIT researchers construct an algorithm that understands human visual and social cues


Humans who binge on hundreds of hours of television are scoffed at as social rejects, but computers that do the same are praised for their social insight. An algorithm created by MIT’s Computer Science and Artificial Intelligence Laboratory learned to predict human social interactions after watching 600 hours’ worth of YouTube videos and television shows like The Office, Desperate Housewives, and Scrubs.

The lineup may seem dubious at best, but MIT doctoral candidate and project researcher Carl Vondrick reasons that the accessibility and cultural familiarity of the media make it realistic enough for base-level predictions. “We just wanted to use random videos from YouTube,” Vondrick said. “The reason for television is that it’s easy for us to get access to that data, and it’s somewhat realistic in terms of describing everyday situations.”

Using the artificial intelligence technique of “deep learning,” the algorithm broke human interactions down into patterns it could process through machine logic, sorting out what it found important and what it did not. After sifting through and categorizing data for hours, a set of rules began to emerge, defining what hugs, kisses, high-fives, and handshakes are.

“Humans don't need to have someone give us thousands of examples of things saying, 'This is a kiss.' We just need a few examples,” Vondrick says. “So what's powerful about this is that [the algorithm] can learn its own rules.”
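The idea of deriving rules from labeled examples, rather than hand-coding them, can be sketched with a toy classifier. This is purely illustrative and not the MIT model (which is a deep neural network trained on video); the two-dimensional feature values below are invented for the example.

```python
# Toy sketch: a nearest-centroid classifier that derives its own
# "rules" (one centroid per interaction) from a few labeled examples,
# instead of relying on hand-written definitions of each interaction.

def centroid(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(examples):
    """examples maps each label to a list of feature vectors."""
    return {label: centroid(vecs) for label, vecs in examples.items()}

def predict(model, vec):
    """Return the label whose learned centroid is closest to vec."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], vec))

# Hypothetical 2-D features: (arm extension, body proximity)
examples = {
    "handshake": [[0.9, 0.3], [0.8, 0.35]],
    "hug":       [[0.2, 0.95], [0.3, 0.9]],
    "high-five": [[0.95, 0.6], [0.9, 0.65]],
    "kiss":      [[0.1, 0.85], [0.15, 0.8]],
}
model = train(examples)
print(predict(model, [0.85, 0.32]))  # nearest to the handshake centroid
```

The point of the sketch is the workflow, not the math: only a few examples per class are given, and the model extracts whatever regularity separates them.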

After sufficient data acquisition, the researchers showed the software videos of people one second away from one of the four interactions, and it generated several possibilities for what would ensue, drawing on everything it had learned from the footage. The results? Not too shabby. The computer guessed correctly 43% of the time, whereas its human counterparts were correct 71% of the time. Given time and greater content consumption, its accuracy should only improve.

To what end will this benefit humankind? Given the algorithm’s ability to understand visual cues and interpret human behavior, its descendants may serve as the foundation for intelligent home-assistant robots or security cameras that automatically request emergency services as distress unfolds in real time. On the other hand, such tools could be used to influence human behavior subtly, by inserting specifically crafted memetic messages into mainstream media.

Vondrick’s team’s research is not the first foray into video prediction algorithms, but it’s the most accurate to date.

Previous video prediction techniques sought to create pixel-by-pixel representations of possible future outcomes, rendering everything in full detail. The MIT approach deviates from this in that it relies entirely on abstract representations of the cues it deems important to social interactions. Rather than painting a full picture of every colored pixel, the algorithm pictures objects and actions more generally, discerning the difference between, say, a face and an inanimate object. Pedro Domingos, a machine learning expert and professor at the University of Washington, likens the approach to the basic principle Facebook uses to generate photo tag suggestions.
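The practical difference between the two approaches can be put in rough numbers. The frame dimensions and feature size below are illustrative assumptions, not figures from the MIT work:

```python
# Rough comparison of how many values each approach must predict
# per future frame. All dimensions are illustrative assumptions.

frame_height, frame_width, channels = 480, 640, 3  # pixel-level target
feature_dim = 512                                  # abstract representation

pixel_targets = frame_height * frame_width * channels
abstract_targets = feature_dim

print(pixel_targets)     # values needed to render the frame in full detail
print(abstract_targets)  # values summarizing faces, objects, and actions
```

Under these assumptions, the abstract representation is roughly 1,800 times smaller than the full frame, which is why predicting it is so much more tractable than rendering every pixel.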

“It's hard for even a professional painter to paint something realistic,” he says. “So we were arguing that it wasn't necessary to render the full future. Instead, you could try to predict an abstract version of the image.”

In a more complex experiment testing contextual awareness, the algorithm guessed correctly 11% of the time when asked to predict which object would appear within the next five seconds. A minute achievement, but remarkable progress nonetheless considering the millions of years of evolution that went into nature’s own visual and interpretive system.

Source: NPR.org




Learn more about Electronic Products Magazine
