
DeepMind’s AI learned to adopt “highly aggressive” behavior when losing

Deep reinforcement learning lets scientists model the factors that push AI agents toward conflict or cooperation


In 2014, Stephen Hawking warned that unchecked AI development could one day spell doom for humankind. Nearly echoing that warning, scientists at Google's DeepMind AI lab have demonstrated AI that adapts its behavior to changing environmental circumstances, much as humans do. By training artificial agents with deep multi-agent reinforcement learning and modeling their behavior on the Prisoner’s Dilemma, the researchers found that AI agents act in an “aggressive manner” when resources are scarce, but can also cooperate when working as a team. Is this hyperbole or the beginning of Hawking's doomsday prophecy? Read on and decide for yourself.
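For readers unfamiliar with the game-theory framing, the snippet below encodes the classic Prisoner’s Dilemma payoff matrix that DeepMind's sequential games generalize to richer, pixel-based environments. The payoff values are the standard textbook ones, not figures from the paper.

```python
# Classic Prisoner's Dilemma payoffs (textbook values, not from the DeepMind
# paper): each entry maps a pair of actions to (player A's reward, player B's reward).
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),  # mutual cooperation
    ("cooperate", "defect"):    (0, 5),  # sucker's payoff vs. temptation to defect
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),  # mutual defection
}

def play(action_a, action_b):
    """Return the rewards both players receive for one round."""
    return PAYOFFS[(action_a, action_b)]

print(play("defect", "cooperate"))  # (5, 0): defecting tempts, cooperating gets exploited
```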

The experiment began with two 2D games: a fruit-gathering game and a wolfpack-hunting game. Both are extremely rudimentary, representing objects and AI agents with single pixels, similar to those used in DeepMind’s previous work with Atari.

According to the research (not yet published in a peer-reviewed journal), the AI system was trained to collect apples using deep reinforcement learning, a machine-learning paradigm in which agents learn by trial and error from rewards and punishments. Two rival AI agents, represented by red and blue pixels, then competed to gather the most apples. Whenever an agent collected an apple, it received a reward of 1 and the apple disappeared. Agents could also “zap” one another with a “beam” that temporarily removes the opposing player from the game; sabotaging the opponent therefore increased an agent's chances of winning.
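To make those mechanics concrete, here is a toy Python sketch of the reward and "zap" rules described above. It is not DeepMind's actual environment; the apple count, respawn probability, and zap timeout are illustrative assumptions.

```python
import random

class GatheringSketch:
    """Toy sketch of the Gathering rules described above (not DeepMind's
    actual environment); numeric parameters are illustrative assumptions."""

    def __init__(self, n_apples=10, respawn_prob=0.05, zap_timeout=20):
        self.apples = n_apples
        self.respawn_prob = respawn_prob   # scarce vs. abundant settings come from this rate
        self.zap_timeout = zap_timeout     # steps a zapped agent sits out
        self.frozen = {"red": 0, "blue": 0}

    def step(self, agent, action):
        """Apply one agent's action and return its reward for this step."""
        other = "blue" if agent == "red" else "red"
        reward = 0
        if self.frozen[agent] > 0:         # zapped agents are out of the game
            self.frozen[agent] -= 1
            return reward
        if action == "collect" and self.apples > 0:
            self.apples -= 1
            reward = 1                     # reward of 1 per apple, as in the paper
        elif action == "zap":
            self.frozen[other] = self.zap_timeout  # temporarily remove the rival
        if random.random() < self.respawn_prob:
            self.apples += 1
        return reward
```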

A pattern emerged after 40 million in-game steps: the agents exhibited increasingly “highly aggressive” behavior as the number of apples (resources) dwindled. When apples were abundant, the agents learned to coexist, but as the supply shrank, they began zapping one another to buy themselves time to collect the remaining apples.

“Less aggressive policies emerge from learning in relatively abundant environments with less possibility for costly action,” the paper says. “The greed motivation reflects the temptation to take out a rival and collect all of the apples for oneself.”

Behavior took a different turn in the second game. Here, two in-game characters acting as wolves chase a third object, the “prey.” If both wolves are near the prey when it is captured, they each receive a reward, reflecting the idea that “prey is dangerous; a lone wolf can overcome it but is at risk of losing the carcass to scavengers,” says the paper.
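A minimal sketch of that reward rule might look like the following; the capture radius and reward magnitudes are hypothetical, not the paper's values.

```python
from math import dist

CAPTURE_RADIUS = 2.0   # hypothetical values, not taken from the paper
TEAM_REWARD = 5
LONE_REWARD = 1

def wolfpack_rewards(wolf_positions, prey_position, capturer):
    """Toy version of the Wolfpack rule: when the prey is captured, every
    wolf close enough to the capture point is rewarded, so hunting as a
    pack pays more reliably than hunting alone."""
    nearby = [w for w, pos in wolf_positions.items()
              if dist(pos, prey_position) <= CAPTURE_RADIUS]
    if len(nearby) > 1:                     # pack capture: all nearby wolves get the payoff
        return {w: TEAM_REWARD for w in nearby}
    return {capturer: LONE_REWARD}          # lone capture: smaller, riskier payoff

# Both wolves converge on the prey before the capture, so both are rewarded.
print(wolfpack_rewards({"wolf_a": (0, 0), "wolf_b": (1, 1)}, (0, 1), "wolf_a"))
```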

As in the gathering game, the AI agents learned their strategies by striving to maximize reward, as is always the case in deep reinforcement learning. What set the Wolfpack game apart was that, in chasing rewards, the AI characters “learned” to cooperate by seeking each other out before hunting the prey, or by cornering the prey while waiting for the other “wolf” to arrive.
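DeepMind's agents learned with deep Q-networks; as a simple stand-in, the tabular Q-learning update below shows the underlying trial-and-error idea of nudging value estimates toward observed rewards. The action names and hyperparameters are illustrative, not the paper's.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1      # illustrative hyperparameters
ACTIONS = ["move", "collect", "zap"]        # illustrative action set
Q = defaultdict(float)                      # (state, action) -> estimated long-term reward

def choose_action(state):
    """Epsilon-greedy: mostly exploit the best-known action, occasionally explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Move the value estimate toward the reward plus discounted future value."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```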

“Interestingly, in another game called Wolfpack, we find that greater capacity to implement complex strategies leads to more cooperation between agents, the opposite of the finding with Gathering. So, depending on the situation, having a greater capacity to implement complex strategies may yield either more or less cooperation,” writes Google. See for yourself in the video below.  

As obvious as the outcomes of the research may seem, such tests are vital for understanding how machine neural networks learn from incentives and how they behave when information is incomplete. In short, the future of AI depends on testing agents in a series of progressively more complex scenarios.

One interesting observation is how quickly the AI systems turn to aggression when scarce resources pit competing interests against each other. The DeepMind experiment is, of course, highly simplified, and the doomsday framing is a bit hyperbolic; it nonetheless serves as an analogy for what may happen if agents' objectives are not balanced against the overarching goal of benefiting humankind.

Source: Wired via Google
