GitPedia

CORL

High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC

From tinkoff-ai · Updated June 12, 2026 · Archived

**CORL** provides high-quality single-file implementations of state-of-the-art (SOTA) offline and offline-to-online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, and ReBRAC. The project is written primarily in Python, distributed under the Apache License 2.0, and was first published in 2022. It has 1,365 stars and 171 forks on GitHub. Key topics include d4rl, gym, offline-reinforcement-learning, and reinforcement-learning.

Latest release: v2.0, "Offline-to-Online support and 30 offline Datasets Covered" (June 15, 2023)

CORL (Clean Offline Reinforcement Learning)

🧵 CORL is an offline reinforcement learning (ORL) library that provides high-quality, easy-to-follow single-file implementations of SOTA ORL algorithms. Each implementation is backed by a research-friendly codebase, allowing you to run or tune thousands of experiments. CORL is heavily inspired by cleanrl for online RL; check it out too!

  • 📜 Single-file implementation
  • 📈 Benchmarked Implementation for N algorithms
  • 🖼 Weights and Biases integration

  • ⭐ If you're interested in discrete control, make sure to check out our new library, Katakomba. It provides both discrete control algorithms augmented with recurrence and an offline RL benchmark for the NetHack Learning Environment.

Getting started

```bash
git clone https://github.com/tinkoff-ai/CORL.git && cd CORL
pip install -r requirements/requirements_dev.txt

# alternatively, you could use docker
docker build -t <image_name> .
docker run --gpus=all -it --rm --name <container_name> <image_name>
```

Algorithms Implemented

| Algorithm | Variants Implemented | Wandb Report |
|---|---|---|
| **Offline and Offline-to-Online** | | |
| Conservative Q-Learning for Offline Reinforcement Learning (CQL) | offline/cql.py, finetune/cql.py | Offline, Offline-to-online |
| Accelerating Online Reinforcement Learning with Offline Datasets (AWAC) | offline/awac.py, finetune/awac.py | Offline, Offline-to-online |
| Offline Reinforcement Learning with Implicit Q-Learning (IQL) | offline/iql.py, finetune/iql.py | Offline, Offline-to-online |
| **Offline-to-Online only** | | |
| Supported Policy Optimization for Offline Reinforcement Learning (SPOT) | finetune/spot.py | Offline-to-online |
| Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning (Cal-QL) | finetune/cal_ql.py | Offline-to-online |
| **Offline only** | | |
| Behavioral Cloning (BC) | offline/any_percent_bc.py | Offline |
| Behavioral Cloning-10% (BC-10%) | offline/any_percent_bc.py | Offline |
| A Minimalist Approach to Offline Reinforcement Learning (TD3+BC) | offline/td3_bc.py | Offline |
| Decision Transformer: Reinforcement Learning via Sequence Modeling (DT) | offline/dt.py | Offline |
| Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble (SAC-N) | offline/sac_n.py | Offline |
| Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble (EDAC) | offline/edac.py | Offline |
| Revisiting the Minimalist Approach to Offline Reinforcement Learning (ReBRAC) | offline/rebrac.py | Offline |
| Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size (LB-SAC) | offline/lb_sac.py | Offline Gym-MuJoCo |

D4RL Benchmarks

See the links above for learning curves and details. Here we report reproduced final and best scores. Note that the two differ by a significant margin, and papers do not always make explicit which reporting methodology they use. If you want to re-collect our results in a more structured or nuanced manner, see results.
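The gap between the two reporting schemes can be illustrated with a small sketch. It assumes that each seed produces a sequence of evaluation scores over training, that the "last" score is the final evaluation and the "best" score is the maximum over training, and that both are summarized as mean ± standard deviation across seeds. The function name `summarize_runs`, the use of the population standard deviation, and the toy numbers are illustrative assumptions, not taken from CORL.

```python
from statistics import mean, pstdev


def summarize_runs(eval_curves):
    """Return (mean, std) across seeds for last and best evaluation scores.

    eval_curves: one list of evaluation scores per seed, ordered by training step.
    """
    last = [curve[-1] for curve in eval_curves]   # final evaluation of each seed
    best = [max(curve) for curve in eval_curves]  # highest evaluation of each seed
    return {
        "last": (mean(last), pstdev(last)),
        "best": (mean(best), pstdev(best)),
    }


# Toy curves for four hypothetical seeds (made-up numbers, not CORL results).
toy_curves = [
    [10.0, 40.0, 35.0],
    [12.0, 50.0, 45.0],
    [8.0, 30.0, 40.0],
    [11.0, 45.0, 20.0],
]

summary = summarize_runs(toy_curves)
print("last: {:.2f} ± {:.2f}".format(*summary["last"]))
print("best: {:.2f} ± {:.2f}".format(*summary["best"]))
```

Under these assumptions the best score can never be lower than the last score for any seed, which is one reason the two sets of tables should not be compared against each other directly.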

Offline

Last Scores

Gym-MuJoCo

| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| halfcheetah-medium-v2 | 42.40 ± 0.19 | 42.46 ± 0.70 | 48.10 ± 0.18 | 49.46 ± 0.62 | 47.04 ± 0.22 | 48.31 ± 0.22 | 64.04 ± 0.68 | 68.20 ± 1.28 | 67.70 ± 1.04 | 42.20 ± 0.26 |
| halfcheetah-medium-replay-v2 | 35.66 ± 2.33 | 23.59 ± 6.95 | 44.84 ± 0.59 | 44.70 ± 0.69 | 45.04 ± 0.27 | 44.46 ± 0.22 | 51.18 ± 0.31 | 60.70 ± 1.01 | 62.06 ± 1.10 | 38.91 ± 0.50 |
| halfcheetah-medium-expert-v2 | 55.95 ± 7.35 | 90.10 ± 2.45 | 90.78 ± 6.04 | 93.62 ± 0.41 | 95.63 ± 0.42 | 94.74 ± 0.52 | 103.80 ± 2.95 | 98.96 ± 9.31 | 104.76 ± 0.64 | 91.55 ± 0.95 |
| hopper-medium-v2 | 53.51 ± 1.76 | 55.48 ± 7.30 | 60.37 ± 3.49 | 74.45 ± 9.14 | 59.08 ± 3.77 | 67.53 ± 3.78 | 102.29 ± 0.17 | 40.82 ± 9.91 | 101.70 ± 0.28 | 65.10 ± 1.61 |
| hopper-medium-replay-v2 | 29.81 ± 2.07 | 70.42 ± 8.66 | 64.42 ± 21.52 | 96.39 ± 5.28 | 95.11 ± 5.27 | 97.43 ± 6.39 | 94.98 ± 6.53 | 100.33 ± 0.78 | 99.66 ± 0.81 | 81.77 ± 6.87 |
| hopper-medium-expert-v2 | 52.30 ± 4.01 | 111.16 ± 1.03 | 101.17 ± 9.07 | 52.73 ± 37.47 | 99.26 ± 10.91 | 107.42 ± 7.80 | 109.45 ± 2.34 | 101.31 ± 11.63 | 105.19 ± 10.08 | 110.44 ± 0.33 |
| walker2d-medium-v2 | 63.23 ± 16.24 | 67.34 ± 5.17 | 82.71 ± 4.78 | 66.53 ± 26.04 | 80.75 ± 3.28 | 80.91 ± 3.17 | 85.82 ± 0.77 | 87.47 ± 0.66 | 93.36 ± 1.38 | 67.63 ± 2.54 |
| walker2d-medium-replay-v2 | 21.80 ± 10.15 | 54.35 ± 6.34 | 85.62 ± 4.01 | 82.20 ± 1.05 | 73.09 ± 13.22 | 82.15 ± 3.03 | 84.25 ± 2.25 | 78.99 ± 0.50 | 87.10 ± 2.78 | 59.86 ± 2.73 |
| walker2d-medium-expert-v2 | 98.96 ± 15.98 | 108.70 ± 0.25 | 110.03 ± 0.36 | 49.41 ± 38.16 | 109.56 ± 0.39 | 111.72 ± 0.86 | 111.86 ± 0.43 | 114.93 ± 0.41 | 114.75 ± 0.74 | 107.11 ± 0.96 |
| locomotion average | 50.40 | 69.29 | 76.45 | 67.72 | 78.28 | 81.63 | 89.74 | 83.52 | 92.92 | 73.84 |

Maze2d

| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| maze2d-umaze-v1 | 0.36 ± 8.69 | 12.18 ± 4.29 | 29.41 ± 12.31 | 82.67 ± 28.30 | -8.90 ± 6.11 | 42.11 ± 0.58 | 106.87 ± 22.16 | 130.59 ± 16.52 | 95.26 ± 6.39 | 18.08 ± 25.42 |
| maze2d-medium-v1 | 0.79 ± 3.25 | 14.25 ± 2.33 | 59.45 ± 36.25 | 52.88 ± 55.12 | 86.11 ± 9.68 | 34.85 ± 2.72 | 105.11 ± 31.67 | 88.61 ± 18.72 | 57.04 ± 3.45 | 31.71 ± 26.33 |
| maze2d-large-v1 | 2.26 ± 4.39 | 11.32 ± 5.10 | 97.10 ± 25.41 | 209.13 ± 8.19 | 23.75 ± 36.70 | 61.72 ± 3.50 | 78.33 ± 61.77 | 204.76 ± 1.19 | 95.60 ± 22.92 | 35.66 ± 28.20 |
| maze2d average | 1.13 | 12.58 | 61.99 | 114.89 | 33.65 | 46.23 | 96.77 | 141.32 | 82.64 | 28.48 |

Antmaze

| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| antmaze-umaze-v2 | 55.25 ± 4.15 | 65.75 ± 5.26 | 70.75 ± 39.18 | 57.75 ± 10.28 | 92.75 ± 1.92 | 77.00 ± 5.52 | 97.75 ± 1.48 | 0.00 ± 0.00 | 0.00 ± 0.00 | 57.00 ± 9.82 |
| antmaze-umaze-diverse-v2 | 47.25 ± 4.09 | 44.00 ± 1.00 | 44.75 ± 11.61 | 58.00 ± 7.68 | 37.25 ± 3.70 | 54.25 ± 5.54 | 83.50 ± 7.02 | 0.00 ± 0.00 | 0.00 ± 0.00 | 51.75 ± 0.43 |
| antmaze-medium-play-v2 | 0.00 ± 0.00 | 2.00 ± 0.71 | 0.25 ± 0.43 | 0.00 ± 0.00 | 65.75 ± 11.61 | 65.75 ± 11.71 | 89.50 ± 3.35 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze-medium-diverse-v2 | 0.75 ± 0.83 | 5.75 ± 9.39 | 0.25 ± 0.43 | 0.00 ± 0.00 | 67.25 ± 3.56 | 73.75 ± 5.45 | 83.50 ± 8.20 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze-large-play-v2 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 20.75 ± 7.26 | 42.00 ± 4.53 | 52.25 ± 29.01 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze-large-diverse-v2 | 0.00 ± 0.00 | 0.75 ± 0.83 | 0.00 ± 0.00 | 0.00 ± 0.00 | 20.50 ± 13.24 | 30.25 ± 3.63 | 64.00 ± 5.43 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze average | 17.21 | 19.71 | 19.33 | 19.29 | 50.71 | 57.17 | 78.42 | 0.00 | 0.00 | 18.12 |

Adroit

| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| pen-human-v1 | 71.03 ± 6.26 | 26.99 ± 9.60 | -3.88 ± 0.21 | 81.12 ± 13.47 | 13.71 ± 16.98 | 78.49 ± 8.21 | 103.16 ± 8.49 | 6.86 ± 5.93 | 5.07 ± 6.16 | 67.68 ± 5.48 |
| pen-cloned-v1 | 51.92 ± 15.15 | 46.67 ± 14.25 | 5.13 ± 5.28 | 89.56 ± 15.57 | 1.04 ± 6.62 | 83.42 ± 8.19 | 102.79 ± 7.84 | 31.35 ± 2.14 | 12.02 ± 1.75 | 64.43 ± 1.43 |
| pen-expert-v1 | 109.65 ± 7.28 | 114.96 ± 2.96 | 122.53 ± 21.27 | 160.37 ± 1.21 | -1.41 ± 2.34 | 128.05 ± 9.21 | 152.16 ± 6.33 | 87.11 ± 48.95 | -1.55 ± 0.81 | 116.38 ± 1.27 |
| door-human-v1 | 2.34 ± 4.00 | -0.13 ± 0.07 | -0.33 ± 0.01 | 4.60 ± 1.90 | 5.53 ± 1.31 | 3.26 ± 1.83 | -0.10 ± 0.01 | -0.38 ± 0.00 | -0.12 ± 0.13 | 4.44 ± 0.87 |
| door-cloned-v1 | -0.09 ± 0.03 | 0.29 ± 0.59 | -0.34 ± 0.01 | 0.93 ± 1.66 | -0.33 ± 0.01 | 3.07 ± 1.75 | 0.06 ± 0.05 | -0.33 ± 0.00 | 2.66 ± 2.31 | 7.64 ± 3.26 |
| door-expert-v1 | 105.35 ± 0.09 | 104.04 ± 1.46 | -0.33 ± 0.01 | 104.85 ± 0.24 | -0.32 ± 0.02 | 106.65 ± 0.25 | 106.37 ± 0.29 | -0.33 ± 0.00 | 106.29 ± 1.73 | 104.87 ± 0.39 |
| hammer-human-v1 | 3.03 ± 3.39 | -0.19 ± 0.02 | 1.02 ± 0.24 | 3.37 ± 1.93 | 0.14 ± 0.11 | 1.79 ± 0.80 | 0.24 ± 0.24 | 0.24 ± 0.00 | 0.28 ± 0.18 | 1.28 ± 0.15 |
| hammer-cloned-v1 | 0.55 ± 0.16 | 0.12 ± 0.08 | 0.25 ± 0.01 | 0.21 ± 0.24 | 0.30 ± 0.01 | 1.50 ± 0.69 | 5.00 ± 3.75 | 0.14 ± 0.09 | 0.19 ± 0.07 | 1.82 ± 0.55 |
| hammer-expert-v1 | 126.78 ± 0.64 | 121.75 ± 7.67 | 3.11 ± 0.03 | 127.06 ± 0.29 | 0.26 ± 0.01 | 128.68 ± 0.33 | 133.62 ± 0.27 | 25.13 ± 43.25 | 28.52 ± 49.00 | 117.45 ± 6.65 |
| relocate-human-v1 | 0.04 ± 0.03 | -0.14 ± 0.08 | -0.29 ± 0.01 | 0.05 ± 0.03 | 0.06 ± 0.03 | 0.12 ± 0.04 | 0.16 ± 0.30 | -0.31 ± 0.01 | -0.17 ± 0.17 | 0.05 ± 0.01 |
| relocate-cloned-v1 | -0.06 ± 0.01 | -0.00 ± 0.02 | -0.30 ± 0.01 | -0.04 ± 0.04 | -0.29 ± 0.01 | 0.04 ± 0.01 | 1.66 ± 2.59 | -0.01 ± 0.10 | 0.17 ± 0.35 | 0.16 ± 0.09 |
| relocate-expert-v1 | 107.58 ± 1.20 | 97.90 ± 5.21 | -1.73 ± 0.96 | 108.87 ± 0.85 | -0.30 ± 0.02 | 106.11 ± 4.02 | 107.52 ± 2.28 | -0.36 ± 0.00 | 71.94 ± 18.37 | 104.28 ± 0.42 |
| adroit average | 48.18 | 42.69 | 10.40 | 56.75 | 1.53 | 53.43 | 59.39 | 12.43 | 18.78 | 49.21 |
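The "average" rows appear to be the unweighted mean of the per-task mean scores in each column; the source does not state this, so treat it as an assumption. The sketch below checks that reading against one column, using the BC means from the Gym-MuJoCo last-score rows above. The variable names are illustrative.

```python
from statistics import mean

# BC last scores (means only) copied from the Gym-MuJoCo rows above.
bc_last_scores = {
    "halfcheetah-medium-v2": 42.40,
    "halfcheetah-medium-replay-v2": 35.66,
    "halfcheetah-medium-expert-v2": 55.95,
    "hopper-medium-v2": 53.51,
    "hopper-medium-replay-v2": 29.81,
    "hopper-medium-expert-v2": 52.30,
    "walker2d-medium-v2": 63.23,
    "walker2d-medium-replay-v2": 21.80,
    "walker2d-medium-expert-v2": 98.96,
}

# Unweighted mean over tasks, rounded to two decimals like the table.
locomotion_average = round(mean(bc_last_scores.values()), 2)
print(locomotion_average)  # 50.4, matching the reported 50.40
```

The result agrees with the reported "locomotion average" for BC. Since the table entries are themselves rounded, recomputing other columns this way may differ from the reported average in the last digit.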

Best Scores

Gym-MuJoCo

| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| halfcheetah-medium-v2 | 43.60 ± 0.14 | 43.90 ± 0.13 | 48.93 ± 0.11 | 50.06 ± 0.50 | 47.62 ± 0.03 | 48.84 ± 0.07 | 65.62 ± 0.46 | 72.21 ± 0.31 | 69.72 ± 0.92 | 42.73 ± 0.10 |
| halfcheetah-medium-replay-v2 | 40.52 ± 0.19 | 42.27 ± 0.46 | 45.84 ± 0.26 | 46.35 ± 0.29 | 46.43 ± 0.19 | 45.35 ± 0.08 | 52.22 ± 0.31 | 67.29 ± 0.34 | 66.55 ± 1.05 | 40.31 ± 0.28 |
| halfcheetah-medium-expert-v2 | 79.69 ± 3.10 | 94.11 ± 0.22 | 96.59 ± 0.87 | 96.11 ± 0.37 | 97.04 ± 0.17 | 95.38 ± 0.17 | 108.89 ± 1.20 | 111.73 ± 0.47 | 110.62 ± 1.04 | 93.40 ± 0.21 |
| hopper-medium-v2 | 69.04 ± 2.90 | 73.84 ± 0.37 | 70.44 ± 1.18 | 97.90 ± 0.56 | 70.80 ± 1.98 | 80.46 ± 3.09 | 103.19 ± 0.16 | 101.79 ± 0.20 | 103.26 ± 0.14 | 69.42 ± 3.64 |
| hopper-medium-replay-v2 | 68.88 ± 10.33 | 90.57 ± 2.07 | 98.12 ± 1.16 | 100.91 ± 1.50 | 101.63 ± 0.55 | 102.69 ± 0.96 | 102.57 ± 0.45 | 103.83 ± 0.53 | 103.28 ± 0.49 | 88.74 ± 3.02 |
| hopper-medium-expert-v2 | 90.63 ± 10.98 | 113.13 ± 0.16 | 113.22 ± 0.43 | 103.82 ± 12.81 | 112.84 ± 0.66 | 113.18 ± 0.38 | 113.16 ± 0.43 | 111.24 ± 0.15 | 111.80 ± 0.11 | 111.18 ± 0.21 |
| walker2d-medium-v2 | 80.64 ± 0.91 | 82.05 ± 0.93 | 86.91 ± 0.28 | 83.37 ± 2.82 | 84.77 ± 0.20 | 87.58 ± 0.48 | 87.79 ± 0.19 | 90.17 ± 0.54 | 95.78 ± 1.07 | 74.70 ± 0.56 |
| walker2d-medium-replay-v2 | 48.41 ± 7.61 | 76.09 ± 0.40 | 91.17 ± 0.72 | 86.51 ± 1.15 | 89.39 ± 0.88 | 89.94 ± 0.93 | 91.11 ± 0.63 | 85.18 ± 1.63 | 89.69 ± 1.39 | 68.22 ± 1.20 |
| walker2d-medium-expert-v2 | 109.95 ± 0.62 | 109.90 ± 0.09 | 112.21 ± 0.06 | 108.28 ± 9.45 | 111.63 ± 0.38 | 113.06 ± 0.53 | 112.49 ± 0.18 | 116.93 ± 0.42 | 116.52 ± 0.75 | 108.71 ± 0.34 |
| locomotion average | 70.15 | 80.65 | 84.83 | 85.92 | 84.68 | 86.28 | 93.00 | 95.60 | 96.36 | 77.49 |

Maze2d

| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| maze2d-umaze-v1 | 16.09 ± 0.87 | 22.49 ± 1.52 | 99.33 ± 16.16 | 136.61 ± 11.65 | 92.05 ± 13.66 | 50.92 ± 4.23 | 162.28 ± 1.79 | 153.12 ± 6.49 | 149.88 ± 1.97 | 63.83 ± 17.35 |
| maze2d-medium-v1 | 19.16 ± 1.24 | 27.64 ± 1.87 | 150.93 ± 3.89 | 131.50 ± 25.38 | 128.66 ± 5.44 | 122.69 ± 30.00 | 150.12 ± 4.48 | 93.80 ± 14.66 | 154.41 ± 1.58 | 68.14 ± 12.25 |
| maze2d-large-v1 | 20.75 ± 6.66 | 41.83 ± 3.64 | 197.64 ± 5.26 | 227.93 ± 1.90 | 157.51 ± 7.32 | 162.25 ± 44.18 | 197.55 ± 5.82 | 207.51 ± 0.96 | 182.52 ± 2.68 | 50.25 ± 19.34 |
| maze2d average | 18.67 | 30.65 | 149.30 | 165.35 | 126.07 | 111.95 | 169.98 | 151.48 | 162.27 | 60.74 |

Antmaze

| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| antmaze-umaze-v2 | 68.50 ± 2.29 | 77.50 ± 1.50 | 98.50 ± 0.87 | 78.75 ± 6.76 | 94.75 ± 0.83 | 84.00 ± 4.06 | 100.00 ± 0.00 | 0.00 ± 0.00 | 42.50 ± 28.61 | 64.50 ± 2.06 |
| antmaze-umaze-diverse-v2 | 64.75 ± 4.32 | 63.50 ± 2.18 | 71.25 ± 5.76 | 88.25 ± 2.17 | 53.75 ± 2.05 | 79.50 ± 3.35 | 96.75 ± 2.28 | 0.00 ± 0.00 | 0.00 ± 0.00 | 60.50 ± 2.29 |
| antmaze-medium-play-v2 | 4.50 ± 1.12 | 6.25 ± 2.38 | 3.75 ± 1.30 | 27.50 ± 9.39 | 80.50 ± 3.35 | 78.50 ± 3.84 | 93.50 ± 2.60 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.75 ± 0.43 |
| antmaze-medium-diverse-v2 | 4.75 ± 1.09 | 16.50 ± 5.59 | 5.50 ± 1.50 | 33.25 ± 16.81 | 71.00 ± 4.53 | 83.50 ± 1.80 | 91.75 ± 2.05 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.50 ± 0.50 |
| antmaze-large-play-v2 | 0.50 ± 0.50 | 13.50 ± 9.76 | 1.25 ± 0.43 | 1.00 ± 0.71 | 34.75 ± 5.85 | 53.50 ± 2.50 | 68.75 ± 13.90 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze-large-diverse-v2 | 0.75 ± 0.43 | 6.25 ± 1.79 | 0.25 ± 0.43 | 0.50 ± 0.50 | 36.25 ± 3.34 | 53.00 ± 3.00 | 69.50 ± 7.26 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| antmaze average | 23.96 | 30.58 | 30.08 | 38.21 | 61.83 | 72.00 | 86.71 | 0.00 | 7.08 | 21.04 |

Adroit

| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|---|---|---|---|---|---|---|---|---|---|---|
| pen-human-v1 | 99.69 ± 7.45 | 59.89 ± 8.03 | 9.95 ± 8.19 | 121.05 ± 5.47 | 58.91 ± 1.81 | 106.15 ± 10.28 | 127.28 ± 3.22 | 56.48 ± 7.17 | 35.84 ± 10.57 | 77.83 ± 2.30 |
| pen-cloned-v1 | 99.14 ± 12.27 | 83.62 ± 11.75 | 52.66 ± 6.33 | 129.66 ± 1.27 | 14.74 ± 2.31 | 114.05 ± 4.78 | 128.64 ± 7.15 | 52.69 ± 5.30 | 26.90 ± 7.85 | 71.17 ± 2.70 |
| pen-expert-v1 | 128.77 ± 5.88 | 134.36 ± 3.16 | 142.83 ± 7.72 | 162.69 ± 0.23 | 14.86 ± 4.07 | 140.01 ± 6.36 | 157.62 ± 0.26 | 116.43 ± 40.26 | 36.04 ± 4.60 | 119.49 ± 2.31 |
| door-human-v1 | 9.41 ± 4.55 | 7.00 ± 6.77 | -0.11 ± 0.06 | 19.28 ± 1.46 | 13.28 ± 2.77 | 13.52 ± 1.22 | 0.27 ± 0.43 | -0.10 ± 0.06 | 2.51 ± 2.26 | 7.36 ± 1.24 |
| door-cloned-v1 | 3.40 ± 0.95 | 10.37 ± 4.09 | -0.20 ± 0.11 | 12.61 ± 0.60 | -0.08 ± 0.13 | 9.02 ± 1.47 | 7.73 ± 6.80 | -0.21 ± 0.10 | 20.36 ± 1.11 | 11.18 ± 0.96 |
| door-expert-v1 | 105.84 ± 0.23 | 105.92 ± 0.24 | 4.49 ± 7.39 | 106.77 ± 0.24 | 59.47 ± 25.04 | 107.29 ± 0.37 | 106.78 ± 0.04 | 0.05 ± 0.02 | 109.22 ± 0.24 | 105.49 ± 0.09 |
| hammer-human-v1 | 12.61 ± 4.87 | 6.23 ± 4.79 | 2.38 ± 0.14 | 22.03 ± 8.13 | 0.30 ± 0.05 | 6.86 ± 2.38 | 1.18 ± 0.15 | 0.25 ± 0.00 | 3.49 ± 2.17 | 1.68 ± 0.11 |
| hammer-cloned-v1 | 8.90 ± 4.04 | 8.72 ± 3.28 | 0.96 ± 0.30 | 14.67 ± 1.94 | 0.32 ± 0.03 | 11.63 ± 1.70 | 48.16 ± 6.20 | 12.67 ± 15.02 | 0.27 ± 0.01 | 2.74 ± 0.22 |
| hammer-expert-v1 | 127.89 ± 0.57 | 128.15 ± 0.66 | 33.31 ± 47.65 | 129.66 ± 0.33 | 0.93 ± 1.12 | 129.76 ± 0.37 | 134.74 ± 0.30 | 91.74 ± 47.77 | 69.44 ± 47.00 | 127.39 ± 0.10 |
| relocate-human-v1 | 0.59 ± 0.27 | 0.16 ± 0.14 | -0.29 ± 0.01 | 2.09 ± 0.76 | 1.03 ± 0.20 | 1.22 ± 0.28 | 3.70 ± 2.34 | -0.18 ± 0.14 | 0.05 ± 0.02 | 0.08 ± 0.02 |
| relocate-cloned-v1 | 0.45 ± 0.31 | 0.74 ± 0.45 | -0.02 ± 0.04 | 0.94 ± 0.68 | -0.07 ± 0.02 | 1.78 ± 0.70 | 9.25 ± 2.56 | 0.10 ± 0.04 | 4.11 ± 1.39 | 0.34 ± 0.09 |
| relocate-expert-v1 | 110.31 ± 0.36 | 109.77 ± 0.60 | 0.23 ± 0.27 | 111.56 ± 0.17 | 0.03 ± 0.10 | 110.12 ± 0.82 | 111.14 ± 0.23 | -0.07 ± 0.08 | 98.32 ± 3.75 | 106.49 ± 0.30 |
| adroit average | 58.92 | 54.58 | 20.51 | 69.42 | 13.65 | 62.62 | 69.71 | 27.49 | 33.88 | 52.60 |

Offline-to-Online

Scores

| Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
|---|---|---|---|---|---|
| antmaze-umaze-v2 | 52.75 ± 8.67 → 98.75 ± 1.09 | 94.00 ± 1.58 → 99.50 ± 0.87 | 77.00 ± 0.71 → 96.50 ± 1.12 | 91.00 ± 2.55 → 99.50 ± 0.50 | 76.75 ± 7.53 → 99.75 ± 0.43 |
| antmaze-umaze-diverse-v2 | 56.00 ± 2.74 → 0.00 ± 0.00 | 9.50 ± 9.91 → 99.00 ± 1.22 | 59.50 ± 9.55 → 63.75 ± 25.02 | 36.25 ± 2.17 → 95.00 ± 3.67 | 32.00 ± 27.79 → 98.50 ± 1.12 |
| antmaze-medium-play-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 59.00 ± 11.18 → 97.75 ± 1.30 | 71.75 ± 2.95 → 89.75 ± 1.09 | 67.25 ± 10.47 → 97.25 ± 1.30 | 71.75 ± 3.27 → 98.75 ± 1.64 |
| antmaze-medium-diverse-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 63.50 ± 6.84 → 97.25 ± 1.92 | 64.25 ± 1.92 → 92.25 ± 2.86 | 73.75 ± 7.29 → 94.50 ± 1.66 | 62.00 ± 4.30 → 98.25 ± 1.48 |
| antmaze-large-play-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 28.75 ± 7.76 → 88.25 ± 2.28 | 38.50 ± 8.73 → 64.50 ± 17.04 | 31.50 ± 12.58 → 87.00 ± 3.24 | 31.75 ± 8.87 → 97.25 ± 1.79 |
| antmaze-large-diverse-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 35.50 ± 3.64 → 91.75 ± 3.96 | 26.75 ± 3.77 → 64.25 ± 4.15 | 17.50 ± 7.26 → 81.00 ± 14.14 | 44.00 ± 8.69 → 91.50 ± 3.91 |
| antmaze average | 18.12 → 16.46 | 48.38 → 95.58 | 56.29 → 78.50 | 52.88 → 92.38 | 53.04 → 97.33 |
| pen-cloned-v1 | 88.66 ± 15.10 → 86.82 ± 11.12 | -2.76 ± 0.08 → -1.28 ± 2.16 | 84.19 ± 3.96 → 102.02 ± 20.75 | 6.19 ± 5.21 → 43.63 ± 20.09 | -2.66 ± 0.04 → -2.68 ± 0.12 |
| door-cloned-v1 | 0.93 ± 1.66 → 0.01 ± 0.00 | -0.33 ± 0.01 → -0.33 ± 0.01 | 1.19 ± 0.93 → 20.34 ± 9.32 | -0.21 ± 0.14 → 0.02 ± 0.31 | -0.33 ± 0.01 → -0.33 ± 0.01 |
| hammer-cloned-v1 | 1.80 ± 3.01 → 0.24 ± 0.04 | 0.56 ± 0.55 → 2.85 ± 4.81 | 1.35 ± 0.32 → 57.27 ± 28.49 | 3.97 ± 6.39 → 3.73 ± 4.99 | 0.25 ± 0.04 → 0.17 ± 0.17 |
| relocate-cloned-v1 | -0.04 ± 0.04 → -0.04 ± 0.01 | -0.33 ± 0.01 → -0.33 ± 0.01 | 0.04 ± 0.04 → 0.32 ± 0.38 | -0.24 ± 0.01 → -0.15 ± 0.05 | -0.31 ± 0.05 → -0.31 ± 0.04 |
| adroit average | 22.84 → 21.76 | -0.72 → 0.22 | 21.69 → 44.99 | 2.43 → 11.81 | -0.76 → -0.79 |
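Each cell in the table above has the form "a ± b → c ± d". Reading the left side as the score after offline pretraining and the right side as the score after online fine-tuning is an assumption based on the section title; the source does not spell it out. Under that reading, a cell can be split into its two parts and the change computed as in the sketch below, where `parse_cell` and the field names are illustrative.

```python
import re

# Matches cells such as "76.75 ± 7.53 → 99.75 ± 0.43" (means may be negative).
CELL_PATTERN = re.compile(
    r"(-?\d+\.\d+) ± (\d+\.\d+) → (-?\d+\.\d+) ± (\d+\.\d+)"
)


def parse_cell(cell):
    """Split one table cell into before/after scores and their difference."""
    match = CELL_PATTERN.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell format: {cell!r}")
    before, before_std, after, after_std = map(float, match.groups())
    return {
        "before": before,        # assumed: score after offline pretraining
        "before_std": before_std,
        "after": after,          # assumed: score after online fine-tuning
        "after_std": after_std,
        "gain": after - before,
    }


# Cal-QL on antmaze-umaze-v2, copied from the table above.
result = parse_cell("76.75 ± 7.53 → 99.75 ± 0.43")
print(result["gain"])  # 23.0
```

A negative gain, as in the AWAC cell for antmaze-umaze-diverse-v2, would indicate that the score dropped during fine-tuning under this reading.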

Regrets

| Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
|---|---|---|---|---|---|
| antmaze-umaze-v2 | 0.04 ± 0.01 | 0.02 ± 0.00 | 0.07 ± 0.00 | 0.02 ± 0.00 | 0.01 ± 0.00 |
| antmaze-umaze-diverse-v2 | 0.88 ± 0.01 | 0.09 ± 0.01 | 0.43 ± 0.11 | 0.22 ± 0.07 | 0.05 ± 0.01 |
| antmaze-medium-play-v2 | 1.00 ± 0.00 | 0.08 ± 0.01 | 0.09 ± 0.01 | 0.06 ± 0.00 | 0.04 ± 0.01 |
| antmaze-medium-diverse-v2 | 1.00 ± 0.00 | 0.08 ± 0.00 | 0.10 ± 0.01 | 0.05 ± 0.01 | 0.04 ± 0.01 |
| antmaze-large-play-v2 | 1.00 ± 0.00 | 0.21 ± 0.02 | 0.34 ± 0.05 | 0.29 ± 0.07 | 0.13 ± 0.02 |
| antmaze-large-diverse-v2 | 1.00 ± 0.00 | 0.21 ± 0.03 | 0.41 ± 0.03 | 0.23 ± 0.08 | 0.13 ± 0.02 |
| antmaze average | 0.82 | 0.11 | 0.24 | 0.15 | 0.07 |
| pen-cloned-v1 | 0.46 ± 0.02 | 0.97 ± 0.00 | 0.37 ± 0.01 | 0.58 ± 0.02 | 0.98 ± 0.01 |
| door-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.83 ± 0.03 | 0.99 ± 0.01 | 1.00 ± 0.00 |
| hammer-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.65 ± 0.10 | 0.98 ± 0.01 | 1.00 ± 0.00 |
| relocate-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 |
| adroit average | 0.86 | 0.99 | 0.71 | 0.89 | 0.99 |
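The page does not define how regret is computed. One plausible reading, stated here purely as an assumption, is a cumulative regret in [0, 1]: one minus the normalized score, averaged over the evaluations taken during online fine-tuning, so that lower is better. The sketch below implements that reading; the function name, the clipping to the 0 to 100 range, and the example inputs are all hypothetical and may differ from what CORL actually does.

```python
def normalized_regret(normalized_scores, max_score=100.0):
    """Average of (1 - score / max_score) over online evaluations.

    normalized_scores: evaluation scores collected during online fine-tuning,
    on a scale where max_score is treated as the best achievable value.
    Scores are clipped to [0, max_score] (an assumption) so the result
    stays within [0, 1].
    """
    if not normalized_scores:
        raise ValueError("need at least one evaluation score")
    clipped = [min(max(score, 0.0), max_score) for score in normalized_scores]
    return sum(1.0 - score / max_score for score in clipped) / len(clipped)


# A run that never scores above zero has regret 1.0 under this definition,
# and a run that always reaches max_score has regret 0.0.
print(normalized_regret([0.0, 0.0, 0.0]))
print(normalized_regret([100.0, 100.0]))
print(normalized_regret([50.0, 100.0]))
```

This reading is at least consistent with the table: runs whose scores stay near zero throughout fine-tuning show regrets of 1.00, while runs that quickly reach high scores show regrets close to 0.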

Citing CORL

If you use CORL in your work, please cite it with the following BibTeX entry:

```bibtex
@inproceedings{
tarasov2022corl,
title={{CORL}: Research-oriented Deep Offline Reinforcement Learning Library},
author={Denis Tarasov and Alexander Nikulin and Dmitry Akimov and Vladislav Kurenkov and Sergey Kolesnikov},
booktitle={3rd Offline RL Workshop: Offline RL as a ''Launchpad''},
year={2022},
url={https://openreview.net/forum?id=SyAS49bBcv}
}
```

This article is auto-generated from tinkoff-ai/CORL via the GitHub API. Last fetched: 6/17/2026