Understanding Facebook’s ReBeL — A Noteworthy Step in Artificial General Intelligence

Or is it?

Facebook recently introduced ReBeL (Recursive Belief-based Learning) as a “general game-playing AI bot.”

In 1997, Deep Blue, IBM’s chess-playing computer, defeated the world chess champion, Garry Kasparov.
In 2011, Watson, IBM’s computer built for the quiz show Jeopardy, won more than three times as much as either of its human competitors.
In 2016, AlphaGo, from Alphabet’s DeepMind, defeated Lee Sedol, an 18-time world champion at Go.

AI has long excelled at perfect-information games such as chess and Go, even defeating their world champions.

In a nutshell, perfect-information games are ones where nothing about the game is hidden from the participants. Imperfect-information games like poker, on the other hand, hinge on keeping your cards secret and playing your opponents.

As such, creating an AI for imperfect-information games like poker is difficult, but what is even more challenging is programming a single AI to excel at both kinds of games (a basic step towards Artificial General Intelligence, or AGI).

In this article, I will talk about:

What Is Generalized AI or Artificial General Intelligence (AGI)
What Is Facebook’s ReBeL?
Related AI for Imperfect-Information Games
How Does ReBeL Work?
[Possible] Applications of ReBeL (RL + search)
Limitations of ReBeL

Let’s begin!

What Is Generalized AI or Artificial General Intelligence (AGI)

Remember R2-D2 from Star Wars? That’s the future of Artificial General Intelligence (AGI) we are aiming for.

AGI, also referred to as ‘strong AI’, is machine intelligence that can apply what it knows across a wide variety of contexts, much like we do.

In a perfect world, an AI bot trained to play games would perform equally well at both perfect-information games and imperfect-information games. For instance, Facebook’s ReBeL is designed to handle both: it achieves strong results in poker, and in perfect-information games it reduces to an algorithm similar to AlphaZero. That is a step forward toward AGI.

The Challenge

The attempt to create a “strong AI” is more about mirroring human intelligence. To give you a better perspective, here is why mirroring human intelligence has been challenging.

A significant part of human intelligence is tacit knowledge. We don’t actually have a formula we apply to perform our daily tasks.

If you ask a professional cyclist to articulate how they calculate how far to lean into a certain turn, the answer they give typically falls back on rules that a less skilled rider would use.

And even acquiring that knowledge won’t help someone become a better cyclist.

As such, transferring this tacit knowledge to an AI is more difficult than it might seem. Hubert Dreyfus, an American philosopher, and his brother Stuart said in their book that “expert systems are not able to capture the skills of an expert performer.”

However, that was said in 1986 and a lot has changed since then. One of the most relevant developments in this context is Big Data.

The ability of machines to process terabytes of data has definitely broadened the applications of AI and its use cases.

What Is Facebook’s ReBeL?

In Recursive Belief-based Learning (ReBeL), Facebook has used Reinforcement Learning (RL) + search, a combination that has worked well for perfect-information games in the past.

But this is where things become interesting. RL + search models built for perfect-information games make assumptions that no longer hold in imperfect-information games.

Unlike prior models, ReBeL makes “decisions by factoring in the probability distribution of different beliefs each player might have about the current state of the game, which we call a public belief state (PBS).”

ReBeL has achieved state-of-the-art results with less domain knowledge than any prior poker AI.

Note: In Reinforcement Learning (RL), agents create their own learning experience by interacting directly with the environment.

The agents are rewarded for every good move and penalized for every bad one. The ultimate goal of these agents is to maximize their cumulative reward.
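
To make the reward loop concrete, here is a minimal sketch of an epsilon-greedy agent on a toy multi-armed bandit. This is only an illustration of reward-driven learning, not how ReBeL is trained; the function name `run_bandit`, the reward probabilities, and the +1/-1 reward scheme are all assumptions made for the example.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit (hypothetical setup).

    reward_probs[a] is the chance that action a is a "right move" (+1);
    otherwise the agent is penalized (-1).
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore a random action
        else:
            # exploit the action with the best estimate so far
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0
        counts[action] += 1
        # incremental update of the average reward for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, total_reward


estimates, total_reward = run_bandit([0.2, 0.5, 0.8])
best_action = estimates.index(max(estimates))
```

The agent is never told which action is best; it discovers that from the rewards it collects, which is the core idea the note above describes.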

Related AI for Imperfect-Information Games

Artificial intelligence for imperfect-information games has been developed before. In this section, I will mention two AIs that delivered state-of-the-art performance against professional players of imperfect-information games.

DeepStack: DeepStack made headlines when it defeated professional poker players at heads-up no-limit Texas Hold’em, winning 486 milli-big-blinds per game.

Like ReBeL, DeepStack used a PBS value function during search. But unlike ReBeL, its value function was not trained using self-play RL. Instead, DeepStack relied on a less efficient approach: it generated random PBSs and estimated their values using search.

Pluribus Poker AI: The Pluribus poker AI was trained using self-play, in which it played against copies of itself to produce a blueprint strategy. So, if at any point it needs to know the outcome of a different move than the current one, it can ask itself.

At test time, Pluribus uses search to choose its best strategy. The limitation is that it does not use search during training.

Facebook AI’s post, titled “Facebook, Carnegie Mellon build first AI that beats pros in 6-player poker,” reads:

“Pluribus’s self-play outputs what we refer to as the blueprint strategy for the entire game. During actual play, Pluribus improves upon this blueprint strategy using its search algorithm. But Pluribus does not adapt its strategy to the observed tendencies of its opponents.”

How Does ReBeL Work?

Before we get into how Facebook’s ReBeL works, here is a note on how it uses PBS differently from any previous AI.

Public Belief State (PBS): At the beginning of an imperfect-information game, say poker, each player’s belief distribution over the hidden cards is uniform. With every move that is made, ReBeL updates its belief distribution based on the actions taken and the cards revealed.

Here’s a basic outline of how ReBeL works.

Even in imperfect-information games like two-player poker, one can assume that the strategies of both players are known to each other. This, in turn, makes the probability of a player choosing “each action for each possible card” common knowledge as well.
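
The belief update described above is, at its core, Bayes' rule: once the probability of each action for each possible card is known, observing an action reshapes the belief over cards. Below is a minimal sketch on a toy three-card game. The card names, the `policy` numbers, and the helper `update_belief` are hypothetical and chosen only for illustration; this is not ReBeL's actual implementation.

```python
def update_belief(belief, policy, action):
    """Bayes update: P(card | action) is proportional to P(action | card) * P(card)."""
    unnormalized = {
        card: belief[card] * policy[card].get(action, 0.0) for card in belief
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observed action has zero probability under the policy")
    return {card: p / total for card, p in unnormalized.items()}


# Toy game: the opponent holds one of three cards, all equally likely at first.
cards = ["J", "Q", "K"]
belief = {card: 1 / len(cards) for card in cards}

# Assumed common-knowledge policy: probability of each action for each card.
policy = {
    "J": {"bet": 0.1, "check": 0.9},
    "Q": {"bet": 0.3, "check": 0.7},
    "K": {"bet": 0.8, "check": 0.2},
}

# After seeing a bet, strong cards become more likely than weak ones.
posterior = update_belief(belief, policy, "bet")
```

Because both the prior belief and the policy are public, every player can compute the same posterior, which is what makes the belief state "public".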

So, a long-standing idea comes into play here: an imperfect-information game can be treated as a continuous-state perfect-information game whose states are PBSs. What ReBeL adds is a way to combine self-play reinforcement learning with search in this adversarial, imperfect-information setting.

To keep search tractable in two-player zero-sum games, ReBeL solves each subgame with CFR (Counterfactual Regret Minimization). CFR is a well-known framework for solving large imperfect-information games.
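
To give a feel for the regret-minimization idea behind CFR, here is a minimal sketch of regret matching, the update rule CFR applies at each decision point, run in self-play on rock-paper-scissors. It works on a one-shot matrix game rather than a full game tree, so it is a simplification and not ReBeL's search. The function names and the starting bias are assumptions made for the example.

```python
def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # no positive regret: play uniformly


def solve_matrix_game(payoff, iterations=20000):
    """Self-play regret matching on a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player gets the negative.
    Returns the average strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Hypothetical starting bias so the row player does not begin at equilibrium.
    row_regrets = [1.0] + [0.0] * (n_rows - 1)
    col_regrets = [0.0] * n_cols
    row_sum = [0.0] * n_rows
    col_sum = [0.0] * n_cols
    for _ in range(iterations):
        row = regret_matching_strategy(row_regrets)
        col = regret_matching_strategy(col_regrets)
        # Expected value of each pure action against the opponent's current strategy.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_ev = sum(row[i] * row_values[i] for i in range(n_rows))
        col_ev = sum(col[j] * col_values[j] for j in range(n_cols))
        # Regret: how much better each action would have done than the current mix.
        for i in range(n_rows):
            row_regrets[i] += row_values[i] - row_ev
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regrets[j] += col_values[j] - col_ev
            col_sum[j] += col[j]
    return [s / iterations for s in row_sum], [s / iterations for s in col_sum]


# Rows and columns are ordered rock, paper, scissors.
ROCK_PAPER_SCISSORS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
row_avg, col_avg = solve_matrix_game(ROCK_PAPER_SCISSORS)
```

The strategies played on any single iteration keep cycling, but the averages drift toward the equilibrium of playing each move about a third of the time. CFR extends the same bookkeeping to every decision point of a sequential game.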

Facebook tested ReBeL’s implementation for two games — heads-up no-limit Texas Hold’em and Liar’s Dice. Facebook has even open-sourced its implementation of ReBeL.

[Possible] Applications of ReBeL (RL + search)

Apart from handling both perfect- and imperfect-information games, what could be possible applications of ReBeL?

By using RL + search effectively, ReBeL points toward many broader applications. The approach could potentially be used to forecast upcoming needs such as hiring surges, to optimize energy use in factories, and much more.

Besides these, RL + search could also be applied to search predictions and recommendations in apps like Netflix.

Here are a few other possible applications of ReBeL.

Fraud Detection & Cybersecurity

Cyberattackers are becoming smarter each day. With (Deep) Reinforcement Learning methods + search, recognizing and defending against the latest sophisticated attacks could become easier and more autonomous.

For instance, an Intrusion Detection System (IDS) based on (D)RL methods and search could not only detect but also reduce the impact of malicious activity on web and mobile apps, networks, and more.

Self-Driving Cars

As in imperfect-information games, or even games of incomplete information, it is impossible to predict everything that will happen when driving a car. So you cannot feed every possible scenario into a self-driving car in advance.

Reinforcement Learning, combined with search, can be used for motion planning, trajectory optimization, controller optimization, dynamic pathing, and scenario-based learning in self-driving cars.

Real-Time Bidding in Display Advertising

Using Multi-Agent Reinforcement Learning along with search for Real-Time Bidding can offer a significant improvement in ROI, effectiveness, and reach of display ads.

For further info, you can read this paper which talks about this in detail in the context of a real industrial setting.

Limitations of ReBeL

ReBeL is definitely a step forward in creating a more general AI, but it has a few limitations. Here are a few of them.

ReBeL works only for two-player zero-sum games, which limits its usage in real-life scenarios.
ReBeL also relies on knowing the rules of the game beforehand.
The computational requirements of ReBeL increase significantly for games that are played with little common knowledge, like Recon Chess.

Not long ago, AlphaZero, which was created to play games like chess and Go, was extended into MuZero, which can play games without being given the rules.

Something similar for ReBeL can be truly groundbreaking, making it more powerful and capable of applications across various domains.

References:
[1] Anton Bakhtin and Noam Brown, ReBeL: A general game-playing AI bot that excels at poker and more, Facebook AI
[2] Noam Brown, Facebook, Carnegie Mellon build first AI that beats pros in 6-player poker, Facebook AI
[3] Ragnar Fjelland, Why general artificial intelligence will not be realized, Humanities and Social Sciences Communications
[4] B Ravi Kiran et al., Deep Reinforcement Learning for Autonomous Driving: A Survey, arXiv.org
