A new artificial intelligence program modeled after the human brain has taught itself to play 49 Atari 2600 video games, performing as well as, if not better than, humans in more than half of them. Dubbed Deep Q-network (DQN), the program was developed by Google’s DeepMind and marks the first time an AI has learned to master a wide range of tasks on its own. “Up until now, self-learning systems have only been used for relatively simple problems. For the first time, we have used it in a perceptually rich environment to complete tasks that are very challenging to humans,” states DeepMind’s founder and vice president of engineering, Demis Hassabis.
Acquired by Google in 2014 for a reported $400 million, DeepMind and its experiment highlight the growing value companies are placing on machine learning. Game-playing AIs have emerged before: back in 1997, IBM’s chess-playing Deep Blue computer defeated grandmaster Garry Kasparov, and in 2011, IBM’s Watson beat reigning Jeopardy champions Ken Jennings and Brad Rutter at their own game. Notable as those examples are, they were systems preprogrammed with their moves and trivia knowledge beforehand, and they pale in comparison to the Deep Q-network, which succeeded through trial and error.
Hassabis explains that DeepMind did not program DQN with instructions on how to win the games, how to play them, or even what their rules are. DQN plays the 49 games “right out of the box,” scoring more than 75 percent of the human score on more than half the titles, including Space Invaders, Pong, Enduro, and boxing and tennis games. “You literally give it a new game, a new screen and it figures out after a few hours of game play what to do,” adds Hassabis.
“(Deep Q-network) learns how to play from the ground up,” Hassabis said. “The idea is that these types of systems are more human-like in the way they learn. Our brains make models that allow us to learn and navigate the world. That’s exactly the type of system we’re trying to design here.”
The only information fed into the system is the pixels on the screen and the goal of reaching a high score; everything else is learned by moving the controls through a process called “deep reinforcement learning,” which Vlad Mnih, lead author of the research paper, describes as the same learning process used by humans and animals.
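To make that trial-and-error idea concrete, the Python sketch below shows the classic Q-learning update that deep reinforcement learning builds on. It is purely illustrative and not DeepMind’s code: the action names, learning rate, and lookup table are assumptions, and DQN replaces the table with a deep neural network trained on raw pixels.

    import random

    ACTIONS = ["left", "right", "fire"]      # hypothetical joystick actions
    ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate

    q_table = {}  # maps (state, action) -> estimated future score

    def q(state, action):
        return q_table.get((state, action), 0.0)

    def choose_action(state):
        # Trial and error: mostly exploit what has scored well, sometimes explore.
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: q(state, a))

    def update(state, action, reward, next_state):
        # Nudge the estimate toward the observed reward plus the best
        # predicted value of the next state (the Bellman target).
        best_next = max(q(next_state, a) for a in ACTIONS)
        target = reward + GAMMA * best_next
        q_table[(state, action)] = q(state, action) + ALPHA * (target - q(state, action))

Run over millions of game frames, updates like this let the system discover which moves lead to higher scores without ever being told the rules.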
“We are trying to explore the space of algorithms for intelligence. We have one example of (intelligence) — the human brain,” Hassabis said. “We can be certain that reinforcement learning is something that works and something humans and animals use to learn.”
At first glance, the Atari 2600 games appear rudimentary, and their ’70s and ’80s game mechanics create the impression that writing code to beat them would be trivial (which it is). The key breakthrough, however, is that DQN builds its own perceptual inputs from the raw pixels, without any human-engineered features or guidance.
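As a rough illustration of how raw pixels can become perceptual inputs, the PyTorch sketch below wires up a convolutional network whose layer sizes follow the architecture published in the Nature paper; the class name, default action count, and surrounding scaffolding are our own assumptions, not DeepMind’s code.

    import torch
    import torch.nn as nn

    class PixelQNetwork(nn.Module):
        def __init__(self, num_actions: int = 18):   # up to 18 Atari joystick actions
            super().__init__()
            self.features = nn.Sequential(            # learns its own visual features
                nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            )
            self.head = nn.Sequential(                # maps features to a score per action
                nn.Flatten(),
                nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                nn.Linear(512, num_actions),
            )

        def forward(self, stacked_frames: torch.Tensor) -> torch.Tensor:
            # stacked_frames: (batch, 4, 84, 84) grayscale screens
            return self.head(self.features(stacked_frames))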
DQN outperformed professional human gamers in 29 of the games, including Video Pinball and Breakout, but it struggled with Ms. Pac-Man, Private Eye, and Montezuma’s Revenge, games that demand pre-planning and foresight. DQN looks only one step ahead, choosing each action from the current frame plus the three previous frames.
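The sketch below illustrates that frame-stacking idea: the network’s input is just the current screen stacked with the three before it, which is why DQN reacts to the moment rather than plans ahead. The frame size and padding rule here are assumptions for illustration, following the preprocessing described in the Nature paper.

    from collections import deque
    import numpy as np

    STACK = 4               # current frame plus the 3 previous frames
    FRAME_SHAPE = (84, 84)  # downsampled grayscale screen, as in the Nature paper

    frames = deque(maxlen=STACK)

    def observe(raw_frame):
        """Add the latest screen and return the stacked input the network sees."""
        assert raw_frame.shape == FRAME_SHAPE
        frames.append(raw_frame)
        while len(frames) < STACK:       # at the start of an episode, repeat the first frame
            frames.appendleft(raw_frame)
        return np.stack(frames, axis=0)  # shape: (4, 84, 84)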
The ability to scan a large number of pixels and extract useful data from them will be key to the development of self-driving cars and robots that must react to their surroundings. DQN is a step in the right direction.
Source: BBC, PBS via Nature