CORL
**CORL** provides high-quality single-file implementations of SOTA offline and offline-to-online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, and ReBRAC. The project is written primarily in Python, distributed under the Apache License 2.0, and was first published in 2022. It has 1,365 stars and 171 forks on GitHub. Key topics include d4rl, gym, offline-reinforcement-learning, and reinforcement-learning.
CORL (Clean Offline Reinforcement Learning)
<img src="https://img.shields.io/badge/license-Apache_2.0-blue">
🧵 CORL is an Offline Reinforcement Learning library that provides high-quality and easy-to-follow single-file implementations of SOTA ORL algorithms. Each implementation is backed by a research-friendly codebase, allowing you to run or tune thousands of experiments. CORL is heavily inspired by cleanrl for online RL; check them out too!
- 📜 Single-file implementation
- 📈 Benchmarked Implementation for N algorithms
- 🖼 Weights and Biases integration
- ⭐ If you're interested in discrete control, make sure to check out our new library — Katakomba. It provides both discrete control algorithms augmented with recurrence and an offline RL benchmark for the NetHack Learning environment.
Getting started
```bash
git clone https://github.com/tinkoff-ai/CORL.git && cd CORL
pip install -r requirements/requirements_dev.txt

# alternatively, you could use docker
docker build -t <image_name> .
docker run --gpus=all -it --rm --name <container_name> <image_name>
```
Algorithms Implemented
D4RL Benchmarks
You can check the links above for learning curves and details. Here, we report reproduced final and best scores. Note that the two differ by a significant margin, and papers do not always state explicitly which reporting methodology they used. If you want to re-collect our results in a more structured or nuanced manner, see results.
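The difference between the two reporting styles can be illustrated with a minimal sketch. It assumes that each seed yields a list of evaluation scores over training, that the "last" score is the seed average at the final evaluation, and that the "best" score is the maximum of the seed-averaged curve. The helper names and toy numbers are hypothetical, and the repository's own aggregation may differ.

```python
from statistics import mean, pstdev


def seed_average(curves):
    """Average evaluation curves over seeds, checkpoint by checkpoint."""
    return [mean(point) for point in zip(*curves)]


def seed_std(curves):
    """Population standard deviation over seeds at each checkpoint."""
    return [pstdev(point) for point in zip(*curves)]


def last_score(curves):
    """Mean and std over seeds at the final evaluation (assumed convention)."""
    avg, std = seed_average(curves), seed_std(curves)
    return avg[-1], std[-1]


def best_score(curves):
    """Mean and std at the checkpoint where the seed-averaged curve peaks
    (assumed convention)."""
    avg, std = seed_average(curves), seed_std(curves)
    i = max(range(len(avg)), key=avg.__getitem__)
    return avg[i], std[i]


# Toy data (hypothetical): 2 seeds, 4 evaluation checkpoints each.
curves = [
    [10.0, 50.0, 80.0, 60.0],
    [20.0, 40.0, 70.0, 50.0],
]
last = last_score(curves)
best = best_score(curves)
print(f"last: {last[0]:.2f} ± {last[1]:.2f}")
print(f"best: {best[0]:.2f} ± {best[1]:.2f}")
```

In this toy run the seed-averaged curve peaks at the third checkpoint and then drops, so the best score exceeds the last score. That is the kind of gap the two sets of tables below make visible.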
Offline
Last Scores
Gym-MuJoCo
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| halfcheetah-medium-v2 | 42.40 ± 0.19 | 42.46 ± 0.70 | 48.10 ± 0.18 | 49.46 ± 0.62 | 47.04 ± 0.22 | 48.31 ± 0.22 | 64.04 ± 0.68 | 68.20 ± 1.28 | 67.70 ± 1.04 | 42.20 ± 0.26 |
| halfcheetah-medium-replay-v2 | 35.66 ± 2.33 | 23.59 ± 6.95 | 44.84 ± 0.59 | 44.70 ± 0.69 | 45.04 ± 0.27 | 44.46 ± 0.22 | 51.18 ± 0.31 | 60.70 ± 1.01 | 62.06 ± 1.10 | 38.91 ± 0.50 |
| halfcheetah-medium-expert-v2 | 55.95 ± 7.35 | 90.10 ± 2.45 | 90.78 ± 6.04 | 93.62 ± 0.41 | 95.63 ± 0.42 | 94.74 ± 0.52 | 103.80 ± 2.95 | 98.96 ± 9.31 | 104.76 ± 0.64 | 91.55 ± 0.95 |
| hopper-medium-v2 | 53.51 ± 1.76 | 55.48 ± 7.30 | 60.37 ± 3.49 | 74.45 ± 9.14 | 59.08 ± 3.77 | 67.53 ± 3.78 | 102.29 ± 0.17 | 40.82 ± 9.91 | 101.70 ± 0.28 | 65.10 ± 1.61 |
| hopper-medium-replay-v2 | 29.81 ± 2.07 | 70.42 ± 8.66 | 64.42 ± 21.52 | 96.39 ± 5.28 | 95.11 ± 5.27 | 97.43 ± 6.39 | 94.98 ± 6.53 | 100.33 ± 0.78 | 99.66 ± 0.81 | 81.77 ± 6.87 |
| hopper-medium-expert-v2 | 52.30 ± 4.01 | 111.16 ± 1.03 | 101.17 ± 9.07 | 52.73 ± 37.47 | 99.26 ± 10.91 | 107.42 ± 7.80 | 109.45 ± 2.34 | 101.31 ± 11.63 | 105.19 ± 10.08 | 110.44 ± 0.33 |
| walker2d-medium-v2 | 63.23 ± 16.24 | 67.34 ± 5.17 | 82.71 ± 4.78 | 66.53 ± 26.04 | 80.75 ± 3.28 | 80.91 ± 3.17 | 85.82 ± 0.77 | 87.47 ± 0.66 | 93.36 ± 1.38 | 67.63 ± 2.54 |
| walker2d-medium-replay-v2 | 21.80 ± 10.15 | 54.35 ± 6.34 | 85.62 ± 4.01 | 82.20 ± 1.05 | 73.09 ± 13.22 | 82.15 ± 3.03 | 84.25 ± 2.25 | 78.99 ± 0.50 | 87.10 ± 2.78 | 59.86 ± 2.73 |
| walker2d-medium-expert-v2 | 98.96 ± 15.98 | 108.70 ± 0.25 | 110.03 ± 0.36 | 49.41 ± 38.16 | 109.56 ± 0.39 | 111.72 ± 0.86 | 111.86 ± 0.43 | 114.93 ± 0.41 | 114.75 ± 0.74 | 107.11 ± 0.96 |
| locomotion average | 50.40 | 69.29 | 76.45 | 67.72 | 78.28 | 81.63 | 89.74 | 83.52 | 92.92 | 73.84 |
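The `locomotion average` row appears to be the unweighted mean of each column over the nine tasks. A quick check for the BC column, using the means copied from the table above (the variable names are illustrative):

```python
# Last-score means for BC, copied from the Gym-MuJoCo table above.
bc_last_scores = {
    "halfcheetah-medium-v2": 42.40,
    "halfcheetah-medium-replay-v2": 35.66,
    "halfcheetah-medium-expert-v2": 55.95,
    "hopper-medium-v2": 53.51,
    "hopper-medium-replay-v2": 29.81,
    "hopper-medium-expert-v2": 52.30,
    "walker2d-medium-v2": 63.23,
    "walker2d-medium-replay-v2": 21.80,
    "walker2d-medium-expert-v2": 98.96,
}

# Unweighted mean over tasks (assumed to be how the average row is built).
bc_average = sum(bc_last_scores.values()) / len(bc_last_scores)
print(f"{bc_average:.2f}")  # prints 50.40
```

Averages recomputed from the rounded table entries can differ from the reported ones in the last digit.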
Maze2d
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| maze2d-umaze-v1 | 0.36 ± 8.69 | 12.18 ± 4.29 | 29.41 ± 12.31 | 82.67 ± 28.30 | -8.90 ± 6.11 | 42.11 ± 0.58 | 106.87 ± 22.16 | 130.59 ± 16.52 | 95.26 ± 6.39 | 18.08 ± 25.42 |
| maze2d-medium-v1 | 0.79 ± 3.25 | 14.25 ± 2.33 | 59.45 ± 36.25 | 52.88 ± 55.12 | 86.11 ± 9.68 | 34.85 ± 2.72 | 105.11 ± 31.67 | 88.61 ± 18.72 | 57.04 ± 3.45 | 31.71 ± 26.33 |
| maze2d-large-v1 | 2.26 ± 4.39 | 11.32 ± 5.10 | 97.10 ± 25.41 | 209.13 ± 8.19 | 23.75 ± 36.70 | 61.72 ± 3.50 | 78.33 ± 61.77 | 204.76 ± 1.19 | 95.60 ± 22.92 | 35.66 ± 28.20 |
| maze2d average | 1.13 | 12.58 | 61.99 | 114.89 | 33.65 | 46.23 | 96.77 | 141.32 | 82.64 | 28.48 |
Antmaze
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| antmaze-umaze-v2 | 55.25 ± 4.15 | 65.75 ± 5.26 | 70.75 ± 39.18 | 57.75 ± 10.28 | 92.75 ± 1.92 | 77.00 ± 5.52 | 97.75 ± 1.48 | 0.00 ± 0.00 | 0.00 ± 0.00 | 57.00 ± 9.82 |
| antmaze-umaze-diverse-v2 | 47.25 ± 4.09 | 44.00 ± 1.00 | 44.75 ± 11.61 | 58.00 ± 7.68 | 37.25 ± 3.70 | 54.25 ± 5.54 | 83.50 ± 7.02 | 0.00 ± 0.00 | 0.00 ± 0.00 | 51.75 ± 0.43 |
| antmaze-medium-play-v2 | 0.00 ± 0.00 | 2.00 ± 0.71 | 0.25 ± 0.43 | 0.00 ± 0.00 | 65.75 ± 11.61 | 65.75 ± 11.71 | 89.50 ± 3.35 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze-medium-diverse-v2 | 0.75 ± 0.83 | 5.75 ± 9.39 | 0.25 ± 0.43 | 0.00 ± 0.00 | 67.25 ± 3.56 | 73.75 ± 5.45 | 83.50 ± 8.20 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze-large-play-v2 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 20.75 ± 7.26 | 42.00 ± 4.53 | 52.25 ± 29.01 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze-large-diverse-v2 | 0.00 ± 0.00 | 0.75 ± 0.83 | 0.00 ± 0.00 | 0.00 ± 0.00 | 20.50 ± 13.24 | 30.25 ± 3.63 | 64.00 ± 5.43 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze average | 17.21 | 19.71 | 19.33 | 19.29 | 50.71 | 57.17 | 78.42 | 0.00 | 0.00 | 18.12 |
Adroit
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| pen-human-v1 | 71.03 ± 6.26 | 26.99 ± 9.60 | -3.88 ± 0.21 | 81.12 ± 13.47 | 13.71 ± 16.98 | 78.49 ± 8.21 | 103.16 ± 8.49 | 6.86 ± 5.93 | 5.07 ± 6.16 | 67.68 ± 5.48 |
| pen-cloned-v1 | 51.92 ± 15.15 | 46.67 ± 14.25 | 5.13 ± 5.28 | 89.56 ± 15.57 | 1.04 ± 6.62 | 83.42 ± 8.19 | 102.79 ± 7.84 | 31.35 ± 2.14 | 12.02 ± 1.75 | 64.43 ± 1.43 |
| pen-expert-v1 | 109.65 ± 7.28 | 114.96 ± 2.96 | 122.53 ± 21.27 | 160.37 ± 1.21 | -1.41 ± 2.34 | 128.05 ± 9.21 | 152.16 ± 6.33 | 87.11 ± 48.95 | -1.55 ± 0.81 | 116.38 ± 1.27 |
| door-human-v1 | 2.34 ± 4.00 | -0.13 ± 0.07 | -0.33 ± 0.01 | 4.60 ± 1.90 | 5.53 ± 1.31 | 3.26 ± 1.83 | -0.10 ± 0.01 | -0.38 ± 0.00 | -0.12 ± 0.13 | 4.44 ± 0.87 |
| door-cloned-v1 | -0.09 ± 0.03 | 0.29 ± 0.59 | -0.34 ± 0.01 | 0.93 ± 1.66 | -0.33 ± 0.01 | 3.07 ± 1.75 | 0.06 ± 0.05 | -0.33 ± 0.00 | 2.66 ± 2.31 | 7.64 ± 3.26 |
| door-expert-v1 | 105.35 ± 0.09 | 104.04 ± 1.46 | -0.33 ± 0.01 | 104.85 ± 0.24 | -0.32 ± 0.02 | 106.65 ± 0.25 | 106.37 ± 0.29 | -0.33 ± 0.00 | 106.29 ± 1.73 | 104.87 ± 0.39 |
| hammer-human-v1 | 3.03 ± 3.39 | -0.19 ± 0.02 | 1.02 ± 0.24 | 3.37 ± 1.93 | 0.14 ± 0.11 | 1.79 ± 0.80 | 0.24 ± 0.24 | 0.24 ± 0.00 | 0.28 ± 0.18 | 1.28 ± 0.15 |
| hammer-cloned-v1 | 0.55 ± 0.16 | 0.12 ± 0.08 | 0.25 ± 0.01 | 0.21 ± 0.24 | 0.30 ± 0.01 | 1.50 ± 0.69 | 5.00 ± 3.75 | 0.14 ± 0.09 | 0.19 ± 0.07 | 1.82 ± 0.55 |
| hammer-expert-v1 | 126.78 ± 0.64 | 121.75 ± 7.67 | 3.11 ± 0.03 | 127.06 ± 0.29 | 0.26 ± 0.01 | 128.68 ± 0.33 | 133.62 ± 0.27 | 25.13 ± 43.25 | 28.52 ± 49.00 | 117.45 ± 6.65 |
| relocate-human-v1 | 0.04 ± 0.03 | -0.14 ± 0.08 | -0.29 ± 0.01 | 0.05 ± 0.03 | 0.06 ± 0.03 | 0.12 ± 0.04 | 0.16 ± 0.30 | -0.31 ± 0.01 | -0.17 ± 0.17 | 0.05 ± 0.01 |
| relocate-cloned-v1 | -0.06 ± 0.01 | -0.00 ± 0.02 | -0.30 ± 0.01 | -0.04 ± 0.04 | -0.29 ± 0.01 | 0.04 ± 0.01 | 1.66 ± 2.59 | -0.01 ± 0.10 | 0.17 ± 0.35 | 0.16 ± 0.09 |
| relocate-expert-v1 | 107.58 ± 1.20 | 97.90 ± 5.21 | -1.73 ± 0.96 | 108.87 ± 0.85 | -0.30 ± 0.02 | 106.11 ± 4.02 | 107.52 ± 2.28 | -0.36 ± 0.00 | 71.94 ± 18.37 | 104.28 ± 0.42 |
| adroit average | 48.18 | 42.69 | 10.40 | 56.75 | 1.53 | 53.43 | 59.39 | 12.43 | 18.78 | 49.21 |
Best Scores
Gym-MuJoCo
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| halfcheetah-medium-v2 | 43.60 ± 0.14 | 43.90 ± 0.13 | 48.93 ± 0.11 | 50.06 ± 0.50 | 47.62 ± 0.03 | 48.84 ± 0.07 | 65.62 ± 0.46 | 72.21 ± 0.31 | 69.72 ± 0.92 | 42.73 ± 0.10 |
| halfcheetah-medium-replay-v2 | 40.52 ± 0.19 | 42.27 ± 0.46 | 45.84 ± 0.26 | 46.35 ± 0.29 | 46.43 ± 0.19 | 45.35 ± 0.08 | 52.22 ± 0.31 | 67.29 ± 0.34 | 66.55 ± 1.05 | 40.31 ± 0.28 |
| halfcheetah-medium-expert-v2 | 79.69 ± 3.10 | 94.11 ± 0.22 | 96.59 ± 0.87 | 96.11 ± 0.37 | 97.04 ± 0.17 | 95.38 ± 0.17 | 108.89 ± 1.20 | 111.73 ± 0.47 | 110.62 ± 1.04 | 93.40 ± 0.21 |
| hopper-medium-v2 | 69.04 ± 2.90 | 73.84 ± 0.37 | 70.44 ± 1.18 | 97.90 ± 0.56 | 70.80 ± 1.98 | 80.46 ± 3.09 | 103.19 ± 0.16 | 101.79 ± 0.20 | 103.26 ± 0.14 | 69.42 ± 3.64 |
| hopper-medium-replay-v2 | 68.88 ± 10.33 | 90.57 ± 2.07 | 98.12 ± 1.16 | 100.91 ± 1.50 | 101.63 ± 0.55 | 102.69 ± 0.96 | 102.57 ± 0.45 | 103.83 ± 0.53 | 103.28 ± 0.49 | 88.74 ± 3.02 |
| hopper-medium-expert-v2 | 90.63 ± 10.98 | 113.13 ± 0.16 | 113.22 ± 0.43 | 103.82 ± 12.81 | 112.84 ± 0.66 | 113.18 ± 0.38 | 113.16 ± 0.43 | 111.24 ± 0.15 | 111.80 ± 0.11 | 111.18 ± 0.21 |
| walker2d-medium-v2 | 80.64 ± 0.91 | 82.05 ± 0.93 | 86.91 ± 0.28 | 83.37 ± 2.82 | 84.77 ± 0.20 | 87.58 ± 0.48 | 87.79 ± 0.19 | 90.17 ± 0.54 | 95.78 ± 1.07 | 74.70 ± 0.56 |
| walker2d-medium-replay-v2 | 48.41 ± 7.61 | 76.09 ± 0.40 | 91.17 ± 0.72 | 86.51 ± 1.15 | 89.39 ± 0.88 | 89.94 ± 0.93 | 91.11 ± 0.63 | 85.18 ± 1.63 | 89.69 ± 1.39 | 68.22 ± 1.20 |
| walker2d-medium-expert-v2 | 109.95 ± 0.62 | 109.90 ± 0.09 | 112.21 ± 0.06 | 108.28 ± 9.45 | 111.63 ± 0.38 | 113.06 ± 0.53 | 112.49 ± 0.18 | 116.93 ± 0.42 | 116.52 ± 0.75 | 108.71 ± 0.34 |
| locomotion average | 70.15 | 80.65 | 84.83 | 85.92 | 84.68 | 86.28 | 93.00 | 95.60 | 96.36 | 77.49 |
Maze2d
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| maze2d-umaze-v1 | 16.09 ± 0.87 | 22.49 ± 1.52 | 99.33 ± 16.16 | 136.61 ± 11.65 | 92.05 ± 13.66 | 50.92 ± 4.23 | 162.28 ± 1.79 | 153.12 ± 6.49 | 149.88 ± 1.97 | 63.83 ± 17.35 |
| maze2d-medium-v1 | 19.16 ± 1.24 | 27.64 ± 1.87 | 150.93 ± 3.89 | 131.50 ± 25.38 | 128.66 ± 5.44 | 122.69 ± 30.00 | 150.12 ± 4.48 | 93.80 ± 14.66 | 154.41 ± 1.58 | 68.14 ± 12.25 |
| maze2d-large-v1 | 20.75 ± 6.66 | 41.83 ± 3.64 | 197.64 ± 5.26 | 227.93 ± 1.90 | 157.51 ± 7.32 | 162.25 ± 44.18 | 197.55 ± 5.82 | 207.51 ± 0.96 | 182.52 ± 2.68 | 50.25 ± 19.34 |
| maze2d average | 18.67 | 30.65 | 149.30 | 165.35 | 126.07 | 111.95 | 169.98 | 151.48 | 162.27 | 60.74 |
Antmaze
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| antmaze-umaze-v2 | 68.50 ± 2.29 | 77.50 ± 1.50 | 98.50 ± 0.87 | 78.75 ± 6.76 | 94.75 ± 0.83 | 84.00 ± 4.06 | 100.00 ± 0.00 | 0.00 ± 0.00 | 42.50 ± 28.61 | 64.50 ± 2.06 |
| antmaze-umaze-diverse-v2 | 64.75 ± 4.32 | 63.50 ± 2.18 | 71.25 ± 5.76 | 88.25 ± 2.17 | 53.75 ± 2.05 | 79.50 ± 3.35 | 96.75 ± 2.28 | 0.00 ± 0.00 | 0.00 ± 0.00 | 60.50 ± 2.29 |
| antmaze-medium-play-v2 | 4.50 ± 1.12 | 6.25 ± 2.38 | 3.75 ± 1.30 | 27.50 ± 9.39 | 80.50 ± 3.35 | 78.50 ± 3.84 | 93.50 ± 2.60 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.75 ± 0.43 |
| antmaze-medium-diverse-v2 | 4.75 ± 1.09 | 16.50 ± 5.59 | 5.50 ± 1.50 | 33.25 ± 16.81 | 71.00 ± 4.53 | 83.50 ± 1.80 | 91.75 ± 2.05 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.50 ± 0.50 |
| antmaze-large-play-v2 | 0.50 ± 0.50 | 13.50 ± 9.76 | 1.25 ± 0.43 | 1.00 ± 0.71 | 34.75 ± 5.85 | 53.50 ± 2.50 | 68.75 ± 13.90 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze-large-diverse-v2 | 0.75 ± 0.43 | 6.25 ± 1.79 | 0.25 ± 0.43 | 0.50 ± 0.50 | 36.25 ± 3.34 | 53.00 ± 3.00 | 69.50 ± 7.26 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze average | 23.96 | 30.58 | 30.08 | 38.21 | 61.83 | 72.00 | 86.71 | 0.00 | 7.08 | 21.04 |
Adroit
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| pen-human-v1 | 99.69 ± 7.45 | 59.89 ± 8.03 | 9.95 ± 8.19 | 121.05 ± 5.47 | 58.91 ± 1.81 | 106.15 ± 10.28 | 127.28 ± 3.22 | 56.48 ± 7.17 | 35.84 ± 10.57 | 77.83 ± 2.30 |
| pen-cloned-v1 | 99.14 ± 12.27 | 83.62 ± 11.75 | 52.66 ± 6.33 | 129.66 ± 1.27 | 14.74 ± 2.31 | 114.05 ± 4.78 | 128.64 ± 7.15 | 52.69 ± 5.30 | 26.90 ± 7.85 | 71.17 ± 2.70 |
| pen-expert-v1 | 128.77 ± 5.88 | 134.36 ± 3.16 | 142.83 ± 7.72 | 162.69 ± 0.23 | 14.86 ± 4.07 | 140.01 ± 6.36 | 157.62 ± 0.26 | 116.43 ± 40.26 | 36.04 ± 4.60 | 119.49 ± 2.31 |
| door-human-v1 | 9.41 ± 4.55 | 7.00 ± 6.77 | -0.11 ± 0.06 | 19.28 ± 1.46 | 13.28 ± 2.77 | 13.52 ± 1.22 | 0.27 ± 0.43 | -0.10 ± 0.06 | 2.51 ± 2.26 | 7.36 ± 1.24 |
| door-cloned-v1 | 3.40 ± 0.95 | 10.37 ± 4.09 | -0.20 ± 0.11 | 12.61 ± 0.60 | -0.08 ± 0.13 | 9.02 ± 1.47 | 7.73 ± 6.80 | -0.21 ± 0.10 | 20.36 ± 1.11 | 11.18 ± 0.96 |
| door-expert-v1 | 105.84 ± 0.23 | 105.92 ± 0.24 | 4.49 ± 7.39 | 106.77 ± 0.24 | 59.47 ± 25.04 | 107.29 ± 0.37 | 106.78 ± 0.04 | 0.05 ± 0.02 | 109.22 ± 0.24 | 105.49 ± 0.09 |
| hammer-human-v1 | 12.61 ± 4.87 | 6.23 ± 4.79 | 2.38 ± 0.14 | 22.03 ± 8.13 | 0.30 ± 0.05 | 6.86 ± 2.38 | 1.18 ± 0.15 | 0.25 ± 0.00 | 3.49 ± 2.17 | 1.68 ± 0.11 |
| hammer-cloned-v1 | 8.90 ± 4.04 | 8.72 ± 3.28 | 0.96 ± 0.30 | 14.67 ± 1.94 | 0.32 ± 0.03 | 11.63 ± 1.70 | 48.16 ± 6.20 | 12.67 ± 15.02 | 0.27 ± 0.01 | 2.74 ± 0.22 |
| hammer-expert-v1 | 127.89 ± 0.57 | 128.15 ± 0.66 | 33.31 ± 47.65 | 129.66 ± 0.33 | 0.93 ± 1.12 | 129.76 ± 0.37 | 134.74 ± 0.30 | 91.74 ± 47.77 | 69.44 ± 47.00 | 127.39 ± 0.10 |
| relocate-human-v1 | 0.59 ± 0.27 | 0.16 ± 0.14 | -0.29 ± 0.01 | 2.09 ± 0.76 | 1.03 ± 0.20 | 1.22 ± 0.28 | 3.70 ± 2.34 | -0.18 ± 0.14 | 0.05 ± 0.02 | 0.08 ± 0.02 |
| relocate-cloned-v1 | 0.45 ± 0.31 | 0.74 ± 0.45 | -0.02 ± 0.04 | 0.94 ± 0.68 | -0.07 ± 0.02 | 1.78 ± 0.70 | 9.25 ± 2.56 | 0.10 ± 0.04 | 4.11 ± 1.39 | 0.34 ± 0.09 |
| relocate-expert-v1 | 110.31 ± 0.36 | 109.77 ± 0.60 | 0.23 ± 0.27 | 111.56 ± 0.17 | 0.03 ± 0.10 | 110.12 ± 0.82 | 111.14 ± 0.23 | -0.07 ± 0.08 | 98.32 ± 3.75 | 106.49 ± 0.30 |
| adroit average | 58.92 | 54.58 | 20.51 | 69.42 | 13.65 | 62.62 | 69.71 | 27.49 | 33.88 | 52.60 |
Offline-to-Online
Scores
| Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
|---|---|---|---|---|---|
| antmaze-umaze-v2 | 52.75 ± 8.67 → 98.75 ± 1.09 | 94.00 ± 1.58 → 99.50 ± 0.87 | 77.00 ± 0.71 → 96.50 ± 1.12 | 91.00 ± 2.55 → 99.50 ± 0.50 | 76.75 ± 7.53 → 99.75 ± 0.43 |
| antmaze-umaze-diverse-v2 | 56.00 ± 2.74 → 0.00 ± 0.00 | 9.50 ± 9.91 → 99.00 ± 1.22 | 59.50 ± 9.55 → 63.75 ± 25.02 | 36.25 ± 2.17 → 95.00 ± 3.67 | 32.00 ± 27.79 → 98.50 ± 1.12 |
| antmaze-medium-play-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 59.00 ± 11.18 → 97.75 ± 1.30 | 71.75 ± 2.95 → 89.75 ± 1.09 | 67.25 ± 10.47 → 97.25 ± 1.30 | 71.75 ± 3.27 → 98.75 ± 1.64 |
| antmaze-medium-diverse-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 63.50 ± 6.84 → 97.25 ± 1.92 | 64.25 ± 1.92 → 92.25 ± 2.86 | 73.75 ± 7.29 → 94.50 ± 1.66 | 62.00 ± 4.30 → 98.25 ± 1.48 |
| antmaze-large-play-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 28.75 ± 7.76 → 88.25 ± 2.28 | 38.50 ± 8.73 → 64.50 ± 17.04 | 31.50 ± 12.58 → 87.00 ± 3.24 | 31.75 ± 8.87 → 97.25 ± 1.79 |
| antmaze-large-diverse-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 35.50 ± 3.64 → 91.75 ± 3.96 | 26.75 ± 3.77 → 64.25 ± 4.15 | 17.50 ± 7.26 → 81.00 ± 14.14 | 44.00 ± 8.69 → 91.50 ± 3.91 |
| antmaze average | 18.12 → 16.46 | 48.38 → 95.58 | 56.29 → 78.50 | 52.88 → 92.38 | 53.04 → 97.33 |
| pen-cloned-v1 | 88.66 ± 15.10 → 86.82 ± 11.12 | -2.76 ± 0.08 → -1.28 ± 2.16 | 84.19 ± 3.96 → 102.02 ± 20.75 | 6.19 ± 5.21 → 43.63 ± 20.09 | -2.66 ± 0.04 → -2.68 ± 0.12 |
| door-cloned-v1 | 0.93 ± 1.66 → 0.01 ± 0.00 | -0.33 ± 0.01 → -0.33 ± 0.01 | 1.19 ± 0.93 → 20.34 ± 9.32 | -0.21 ± 0.14 → 0.02 ± 0.31 | -0.33 ± 0.01 → -0.33 ± 0.01 |
| hammer-cloned-v1 | 1.80 ± 3.01 → 0.24 ± 0.04 | 0.56 ± 0.55 → 2.85 ± 4.81 | 1.35 ± 0.32 → 57.27 ± 28.49 | 3.97 ± 6.39 → 3.73 ± 4.99 | 0.25 ± 0.04 → 0.17 ± 0.17 |
| relocate-cloned-v1 | -0.04 ± 0.04 → -0.04 ± 0.01 | -0.33 ± 0.01 → -0.33 ± 0.01 | 0.04 ± 0.04 → 0.32 ± 0.38 | -0.24 ± 0.01 → -0.15 ± 0.05 | -0.31 ± 0.05 → -0.31 ± 0.04 |
| adroit average | 22.84 → 21.76 | -0.72 → 0.22 | 21.69 → 44.99 | 2.43 → 11.81 | -0.76 → -0.79 |
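Each cell in the table above has the form `a ± b → c ± d`. Assuming the left side is the score after offline pretraining and the right side is the score after online fine-tuning (the table itself does not state this), the per-task change can be extracted with a small parser. The function name and regular expression are illustrative only.

```python
import re

# One number with an optional sign and optional decimal part.
NUM = r"(-?\d+(?:\.\d+)?)"
CELL = re.compile(rf"{NUM}\s*±\s*{NUM}\s*→\s*{NUM}\s*±\s*{NUM}")


def parse_cell(cell):
    """Split 'a ± b → c ± d' into its parts and compute c - a."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    before, before_std, after, after_std = map(float, match.groups())
    return {
        "before": before,
        "before_std": before_std,
        "after": after,
        "after_std": after_std,
        "delta": after - before,
    }


# AWAC on antmaze-umaze-v2, copied from the table above.
parsed = parse_cell("52.75 ± 8.67 → 98.75 ± 1.09")
print(parsed["delta"])  # prints 46.0
```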
Regrets
| Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
|---|---|---|---|---|---|
| antmaze-umaze-v2 | 0.04 ± 0.01 | 0.02 ± 0.00 | 0.07 ± 0.00 | 0.02 ± 0.00 | 0.01 ± 0.00 |
| antmaze-umaze-diverse-v2 | 0.88 ± 0.01 | 0.09 ± 0.01 | 0.43 ± 0.11 | 0.22 ± 0.07 | 0.05 ± 0.01 |
| antmaze-medium-play-v2 | 1.00 ± 0.00 | 0.08 ± 0.01 | 0.09 ± 0.01 | 0.06 ± 0.00 | 0.04 ± 0.01 |
| antmaze-medium-diverse-v2 | 1.00 ± 0.00 | 0.08 ± 0.00 | 0.10 ± 0.01 | 0.05 ± 0.01 | 0.04 ± 0.01 |
| antmaze-large-play-v2 | 1.00 ± 0.00 | 0.21 ± 0.02 | 0.34 ± 0.05 | 0.29 ± 0.07 | 0.13 ± 0.02 |
| antmaze-large-diverse-v2 | 1.00 ± 0.00 | 0.21 ± 0.03 | 0.41 ± 0.03 | 0.23 ± 0.08 | 0.13 ± 0.02 |
| antmaze average | 0.82 | 0.11 | 0.24 | 0.15 | 0.07 |
| pen-cloned-v1 | 0.46 ± 0.02 | 0.97 ± 0.00 | 0.37 ± 0.01 | 0.58 ± 0.02 | 0.98 ± 0.01 |
| door-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.83 ± 0.03 | 0.99 ± 0.01 | 1.00 ± 0.00 |
| hammer-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.65 ± 0.10 | 0.98 ± 0.01 | 1.00 ± 0.00 |
| relocate-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 |
| adroit average | 0.86 | 0.99 | 0.71 | 0.89 | 0.99 |
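The tables above do not define regret. A minimal sketch follows, assuming a common convention for goal-reaching tasks: regret is one minus the mean success rate over all online fine-tuning episodes, so 0 means every episode succeeded and 1 means none did. The function name and episode data are hypothetical.

```python
def online_regret(successes):
    """Regret as 1 - mean success over online episodes (assumed definition).

    `successes` holds one 0/1 flag per online fine-tuning episode.
    """
    if not successes:
        raise ValueError("need at least one episode")
    return 1.0 - sum(successes) / len(successes)


# Hypothetical run: the policy fails twice early on, then succeeds.
episodes = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
regret = online_regret(episodes)
print(f"{regret:.2f}")  # prints 0.20

# A policy that never succeeds online has the maximum regret of 1.0.
never_succeeds = online_regret([0] * 10)
```

Under this assumption, a policy that never reaches the goal during fine-tuning has regret 1.0, which is consistent with the 1.00 entries for AWAC on the AntMaze tasks where its online score stays at 0.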
Citing CORL
If you use CORL in your work, please use the following bibtex
```bibtex
@inproceedings{
tarasov2022corl,
title={{CORL}: Research-oriented Deep Offline Reinforcement Learning Library},
author={Denis Tarasov and Alexander Nikulin and Dmitry Akimov and Vladislav Kurenkov and Sergey Kolesnikov},
booktitle={3rd Offline RL Workshop: Offline RL as a ''Launchpad''},
year={2022},
url={https://openreview.net/forum?id=SyAS49bBcv}
}
```
