Any scores out of date? Make a Pull Request.
This is a leaderboard comparing world-record human performance to state-of-the-art machine performance in the Arcade Learning Environment (ALE).
Game | Top Human Score | Top Machine Score | Overall Best | Best Machine | Learning Type | Notes |
---|---|---|---|---|---|---|
Alien | 103583 | 9491 | Human | Rainbow | Q-gradient | |
Amidar | 71529 | 5131 | Human | Rainbow | Q-gradient | |
Assault | 8647 | 14497 | Machine | A3C | Policy-gradient | |
Asterix | 1000000 | 428200 | Human | Rainbow | Q-gradient | |
Asteroids | 57340 | 5093 | Human | A3C | Policy-gradient | * |
Atlantis | 10604840 | 2311815 | Human | PPO | Policy-gradient | |
Bank Heist | 45899 | 1611 | Human | Dueling DDQN | Q-gradient | |
Battlezone | 98000 | 62010 | Human | Rainbow | Q-gradient | |
Beamrider | 52866 | 26172 | Human | Prioritized DDQN | Q-gradient | 1B |
Berzerk | 1057940 | 2545 | Human | Rainbow | Q-gradient | |
Bowling | 279 | 135 | Human | HyperNEAT | Genetic Policy | J |
Boxing | 99 | 99 | Draw | Rainbow, ACER | Q-gradient, Policy-gradient | |
Breakout | 864 | 766 | Human | A3C | Policy-gradient | |
Centipede | 453916 | 25275 | Human | HyperNEAT | Genetic Policy | |
Chopper Command | 999999 | 16654 | Human | Rainbow | Q-gradient | |
Crazy Climber | 219900 | 183135 | Human | Prioritized DDQN | Q-gradient | |
Defender | 5443150 | 233021 | Human | A3C | Policy-gradient | N |
Demon Attack | 100100 | 115201 | Machine | A3C | Policy-gradient | + |
Enduro | 1666 | 2260 | Machine | Distribution DQN | Q-gradient | |
Fishing Derby | 51 | 46 | Human | Dueling DDQN | Q-gradient | |
Freeway | 38 | 34 | Human | Rainbow | Q-gradient | 1B |
Frostbite | 248460 | 9590 | Human | Rainbow | Q-gradient | |
Gopher | 30240 | 70354 | Machine | Rainbow | Q-gradient | |
Gravitar | 39100 | 1419 | Human | Rainbow | Q-gradient | |
HERO | 257310 | 55887 | Human | Rainbow | Q-gradient | J |
Ice Hockey | 25 | 10 | Human | HyperNEAT | Genetic Policy | |
Kangaroo | 1424600 | 14854 | Human | Dueling DDQN | Q-gradient | N |
Krull | 104100 | 12601 | Human | HyperNEAT | Genetic Policy | N |
Kung Fu Master | 79360 | 52181 | Human | Rainbow | Q-gradient | |
Montezuma's Revenge | 400000 | 384 | Human | Rainbow | Q-gradient | |
Ms Pacman | 211480 | 6283 | Human | Dueling DDQN | Q-gradient | J |
Name This Game | 21210 | 13439 | Human | Prioritized DDQN | Q-gradient | |
Phoenix | 251180 | 108528 | Human | Rainbow | Q-gradient | |
Pitfall | 114000 | 0 | Human | Several | Q-gradient | |
Pong | 21 | 21 | Draw | Several | Several | E |
Private Eye | 101800 | 15172 | Human | Distribution DQN | Q-gradient | ** |
Qbert | 2400000 | 33817 | Human | Rainbow | Q-gradient | N |
Road Runner | 210200 | 73949 | Human | A3C | Policy-gradient | |
Robot Tank | 68 | 65 | Human | Dueling DDQN | Q-gradient | |
Seaquest | 294940 | 50254 | Human | Dueling DDQN | Q-gradient | |
Skiing | -3272 | -6522 | Human | Vanilla GA | Genetic Policy | |
Space Invaders | 43710 | 23864 | Human | A3C | Policy-gradient | 1B |
Star Gunner | 77400 | 164766 | Machine | A3C | Policy-gradient | N |
Time Pilot | 34400 | 27202 | Human | A3C | Policy-gradient | |
Tutankham | 2026 | 280 | Human | ACER | Policy-gradient | |
Venture | 38900 | 1107 | Human | Distribution DQN | Q-gradient | N |
Video Pinball | 3523988 | 533936 | Human | Rainbow | Q-gradient | 1B |
Wizard of Wor | 129500 | 18082 | Human | A3C | Policy-gradient | |
Yars' Revenge | 2011099 | 102557 | Human | Rainbow | Q-gradient | ++ |
Zaxxon | 83700 | 24622 | Human | A3C | Policy-gradient | |
Notes key:

- `N`: NTSC, no emulator results available
- `J`: Score from jvgs.net
- `E`: Game is so easy there's no world record category
- `1B`: Game 1, Difficulty B
- `*`: Game 6, Difficulty B
- `+`: Game 7, Difficulty B
- `**`: Game 1, Points
- `++`: Game 2, Difficulty A
I decided to put this together after noticing two trends in reinforcement learning papers:
- Not comparing to the state of the art.
- Comparing an algorithm with thousands of hours of playtime to a human who played for a few hours.

Respectively, these make it hard to see the relative progress of the field from paper to paper, and the absolute progress compared to human-level game playing.
Though RL papers routinely quote normalized human performance above 100%, the reality is that machine learning algorithms beat the best humans on only 5 of the 50 games here (plus 2 draws), and humans have a substantial lead in the rest. We have a long way to go.
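For context, "normalized human performance" is computed per game roughly as in the sketch below. The per-game random and human baselines are constants published in the papers; the numbers here are purely illustrative, not taken from the table above.

```python
def human_normalized_score(agent_score, random_score, human_score):
    """Per-game normalization used in DQN-family papers:
    0.0 = a random agent's score, 1.0 = the paper's human baseline."""
    return (agent_score - random_score) / (human_score - random_score)

# Illustrative numbers only, not taken from the table above:
print(human_normalized_score(agent_score=3000.0,
                             random_score=200.0,
                             human_score=2500.0))  # ~1.22, i.e. ">100%"
```

Because the "human" baseline in those papers is a games tester's score after a few hours of play, far below the world records above, an agent can exceed 100% normalized performance while still losing badly to the records in this table.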
When we exclude human scores, per-algorithm win counts are as follows (two-way ties credit both algorithms; ties between three or more credit none; see the sketch after the table):
Algorithm | Type | Wins |
---|---|---|
Rainbow | Q-gradient | 18 |
A3C (FF and LSTM) | Policy-gradient | 11 |
Dueling DDQN | Q-gradient | 6 |
HyperNEAT | Genetic Policy | 4 |
Distribution DQN | Q-gradient | 3 |
Prioritized DDQN | Q-gradient | 3 |
ACER | Policy-gradient | 2 |
PPO | Policy-gradient | 1 |
Vanilla GA | Genetic Policy | 1 |
Noisy DQN | Q-gradient | 0 |
Vanilla ES | Genetic Policy | 0 |
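To make the tie rule concrete, here is a minimal sketch of the tally, assuming a `results` dict mapping each game to the list of `(algorithm, score)` pairs reported for it; the example data is hypothetical.

```python
from collections import defaultdict

def tally_wins(results):
    """Count wins per algorithm, excluding human scores.

    Two-way ties are 'friendly': both algorithms get a win.
    Ties between three or more algorithms credit nobody.
    """
    wins = defaultdict(int)
    for scores in results.values():
        best = max(score for _, score in scores)
        winners = [algo for algo, score in scores if score == best]
        if len(winners) <= 2:  # sole winner, or friendly two-way tie
            for algo in winners:
                wins[algo] += 1
    return dict(wins)

# Hypothetical data: a two-way tie on Boxing credits both algorithms.
print(tally_wins({"Boxing": [("Rainbow", 99), ("ACER", 99), ("A3C", 97)]}))
# -> {'Rainbow': 1, 'ACER': 1}
```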
Since the ALE uses the Stella Atari emulator, the Top Human Score is the top human score on an emulator. Atari (and other game) releases tend to vary across regions, so this is the only way to ensure that both human and machine have, for example, equal access to game-breaking bugs.
Where possible, scores are taken from Twin Galaxies, which is the Guinness source for game world records; otherwise, links are provided to score sources.
A valid machine score is one achieved by a reinforcement learning algorithm trained directly on pixels and raw rewards, i.e. one that can be trained against common ALE wrappers/forks like gym or xitari. This means that algorithms which use hand-engineered intermediate rewards, like this one, do not qualify.
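As a rough sketch of what "pixels and raw rewards" means in practice, a qualifying agent interacts with something like the classic gym interface below (pre-0.26 API; the environment id and random policy are stand-ins, not part of this leaderboard's tooling):

```python
import gym

# Any ALE game exposed through gym works the same way; Pong is a stand-in.
env = gym.make("PongNoFrameskip-v4")

obs = env.reset()  # raw RGB frame, shape (210, 160, 3)
done = False
while not done:
    action = env.action_space.sample()          # stand-in for a learned policy
    obs, reward, done, info = env.step(action)  # reward is the raw score delta
env.close()
```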
Reference papers vary in:
- Start type (no-op, random-op, human-op)
- Number of test trials (from 30 to 200)
I take the approach here of favouring no-op starts over random ones (no-op starts usually produce higher scores anyway), and treating all sample sizes equally.
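"No-op starts" means each evaluation episode opens with a random number of do-nothing actions (capped at 30 in the usual DQN evaluation protocol), so a deterministic agent can't replay one memorised trajectory. A minimal sketch against the same classic gym interface as above:

```python
import random

NOOP = 0  # action index 0 is NOOP in the ALE action set

def evaluate_noop_start(env, policy, max_noops=30):
    """One evaluation episode preceded by a random-length no-op prefix,
    so a deterministic agent can't replay a single memorised opening."""
    obs = env.reset()
    for _ in range(random.randint(1, max_noops)):
        obs, _, done, _ = env.step(NOOP)
        if done:
            obs = env.reset()
    episode_return, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        episode_return += reward
    return episode_return
```

The scores above are drawn from the following sources: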
- Human
- A2C
- A3C
- ACER
- Distribution DQN
- Dueling DDQN
- HyperNEAT, also checked against the original paper
- Rainbow
- Prioritized DDQN
- Proximal Policy Optimization
- Vanilla Evolution Strategies
- Vanilla Genetic Algorithm