POPGym: Partially Observable Process Gym

🚨 Try our new GPU-accelerated, Atari-style POPGym Arcade 🚨

POPGym is designed to benchmark memory in deep reinforcement learning. It contains a set of environments and a collection of memory model baselines. The full paper is available on OpenReview.

Please see the documentation for advanced installation instructions and examples. The environment quickstart will get you up and running in a few minutes.

Quickstart Install

bash
# Install the base environments, which require only numpy and gymnasium
pip install popgym
# Also include navigation environments, which require mazelib
pip install "popgym[navigation]"

Quickstart Usage

python
import popgym
from popgym.wrappers import PreviousAction, Antialias, Flatten, DiscreteAction
# Create an environment; reset returns the initial (observation, info) pair
env = popgym.envs.PositionOnlyCartPoleEasy()
print(env.reset(seed=0))
# Append the previous action to the observation, flatten the observation and
# action spaces, then map the multidiscrete action space to a single discrete
# action space for Q-learning
wrapped = DiscreteAction(Flatten(PreviousAction(env)))
print(wrapped.reset(seed=0))
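
The last wrapper in the chain collapses a multidiscrete action space into a single discrete one, which is what most Q-learning implementations expect. As a rough illustration of that idea only, a multidiscrete action can be encoded as one integer with a mixed-radix scheme. This is a sketch, not POPGym's `DiscreteAction` implementation; the helper names `flatten_action` and `unflatten_action` and the example sizes in `nvec` are hypothetical.

```python
from math import prod
from typing import List, Sequence


def flatten_action(action: Sequence[int], nvec: Sequence[int]) -> int:
    """Encode a multidiscrete action as one integer (mixed radix, last component varies fastest)."""
    if len(action) != len(nvec):
        raise ValueError("action and nvec must have the same length")
    index = 0
    for a, n in zip(action, nvec):
        if not 0 <= a < n:
            raise ValueError(f"action component {a} is outside [0, {n})")
        index = index * n + a
    return index


def unflatten_action(index: int, nvec: Sequence[int]) -> List[int]:
    """Decode an integer produced by flatten_action back into its components."""
    if not 0 <= index < prod(nvec):
        raise ValueError(f"index {index} is outside [0, {prod(nvec)})")
    components = []
    # Peel off components from the fastest-varying (last) one to the first
    for n in reversed(nvec):
        index, a = divmod(index, n)
        components.append(a)
    return components[::-1]


# Hypothetical action space: a 2-way choice combined with a 3-way choice
nvec = [2, 3]
print(prod(nvec))                    # size of the single discrete space
print(flatten_action([1, 2], nvec))  # integer index for the action [1, 2]
print(unflatten_action(5, nvec))     # components recovered from index 5
```

The flat space has as many actions as the product of the component sizes, so this encoding is only practical when that product stays small.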

POPGym Environments

POPGym contains Partially Observable Markov Decision Process (POMDP) environments following the Gymnasium interface. POPGym environments have minimal dependencies and are fast enough to solve on a laptop CPU in less than a day. We provide the following environments:
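
To make the role of memory concrete, here is a minimal sketch of a partially observable task that exposes Gymnasium-style `reset` and `step` signatures. `ToyRecallEnv`, `RecallPolicy`, and `run_episode` are hypothetical names invented for this illustration; the class does not inherit from Gymnasium and is not one of the POPGym environments. The agent is rewarded for outputting the bit it observed one step earlier, so a policy that only looks at the current observation cannot do better than chance.

```python
import random
from typing import Any, Callable, Dict, Optional, Tuple


class ToyRecallEnv:
    """Hypothetical toy POMDP: reward 1 for outputting the bit observed one step ago."""

    def __init__(self, episode_length: int = 20) -> None:
        self.episode_length = episode_length
        self._rng = random.Random()
        self._t = 0
        self._prev_bit = 0
        self._bit = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[int, Dict[str, Any]]:
        if seed is not None:
            self._rng = random.Random(seed)
        self._t = 0
        self._bit = self._rng.randint(0, 1)
        return self._bit, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        # No reward on the first step because there is no earlier observation
        reward = 1.0 if self._t >= 1 and action == self._prev_bit else 0.0
        self._prev_bit = self._bit
        self._bit = self._rng.randint(0, 1)
        self._t += 1
        truncated = self._t >= self.episode_length
        return self._bit, reward, False, truncated, {}


class RecallPolicy:
    """Policy with one step of memory: replays the previous observation."""

    def __init__(self) -> None:
        self.prev_obs = 0

    def __call__(self, obs: int) -> int:
        action, self.prev_obs = self.prev_obs, obs
        return action


def run_episode(env: ToyRecallEnv, policy: Callable[[int], int], seed: int = 0) -> float:
    """Roll out one episode and return the total reward."""
    obs, _ = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total


print("with memory:", run_episode(ToyRecallEnv(), RecallPolicy()))
print("memoryless: ", run_episode(ToyRecallEnv(), lambda obs: obs))
```

With the default episode length of 20, the policy with memory collects the maximum return of 19, while the memoryless policy only matches the earlier bit by chance.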

Environment	Tags	Temporal Ordering	Colab FPS	MacBook Air (2020) FPS
Battleship	Game	None	117,158	235,402
Concentration	Game	Weak	47,515	157,217
Higher Lower	Game, Noisy	None	24,312	76,903
Labyrinth Escape	Navigation	Strong	1,399	41,122
Labyrinth Explore	Navigation	Strong	1,374	30,611
Minesweeper	Game	None	8,434	32,003
Multiarmed Bandit	Noisy	None	48,751	469,325
Autoencode	Diagnostic	Strong	121,756	251,997
Count Recall	Diagnostic, Noisy	None	16,799	50,311
Repeat First	Diagnostic	None	23,895	155,201
Repeat Previous	Diagnostic	Strong	50,349	136,392
Position Only Cartpole	Control	Strong	73,622	218,446
Velocity Only Cartpole	Control	Strong	69,476	214,352
Noisy Position Only Cartpole	Control, Noisy	Strong	6,269	66,891
Position Only Pendulum	Control	Strong	8,168	26,358
Noisy Position Only Pendulum	Control, Noisy	Strong	6,808	20,090

Feel free to rerun this benchmark using this colab notebook.
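
For a quick local estimate, a throughput measurement can be sketched as below. This is not the code from the notebook: it assumes that FPS means environment steps per second, with resets included in the timing, and `measure_fps` and `StubEnv` are hypothetical names. `StubEnv` exists only so the sketch runs without POPGym installed.

```python
import time
from typing import Any, Callable


class StubEnv:
    """Trivial stand-in with Gymnasium-style reset/step, used only to run the sketch."""

    def __init__(self, episode_length: int = 100) -> None:
        self.episode_length = episode_length
        self._t = 0

    def reset(self, seed=None):
        self._t = 0
        return 0, {}

    def step(self, action):
        self._t += 1
        truncated = self._t >= self.episode_length
        return 0, 0.0, False, truncated, {}


def measure_fps(env: Any, sample_action: Callable[[], Any], num_steps: int = 10_000) -> float:
    """Return environment steps per second, resetting whenever an episode ends."""
    env.reset(seed=0)
    start = time.perf_counter()
    for _ in range(num_steps):
        _, _, terminated, truncated, _ = env.step(sample_action())
        if terminated or truncated:
            env.reset()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a coarse timer
    return num_steps / max(elapsed, 1e-12)


print(f"{measure_fps(StubEnv(), lambda: 0):,.0f} steps per second")
```

For an environment that follows the Gymnasium interface, `env.action_space.sample` can be passed as `sample_action` to draw random actions. Results depend heavily on hardware, so expect numbers that differ from the table.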

POPGym Baselines

[!WARNING]
The baselines rely on difficult-to-maintain dependencies that are no longer supported. To use them, you will need to install an old version of Python and downgrade some packages.

The POPGym baselines provide efficient implementations of recurrent and memory models. They are built on top of RLlib using its custom model API.

bash
pip install "popgym[baselines]"

We provide the following baselines:

Contributing

Follow the existing code style (the pre-commit hooks check it) and make sure the tests pass.

bash
# These commands use uv; pip can be used instead
uv sync --extra navigation
uv run pre-commit install
uv run pytest tests/

Citing

@inproceedings{
morad2023popgym,
title={{POPG}ym: Benchmarking Partially Observable Reinforcement Learning},
author={Steven Morad and Ryan Kortvelesy and Matteo Bettini and Stephan Liwicki and Amanda Prorok},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=chDrutUTs0K}
}