# RLtools

Paper on arXiv | Live demo (browser) | Documentation | Zoo | Studio


*Trained on a 2020 MacBook Pro (M1) using RLtools SAC and TD3 (respectively)*

*Trained on a 2020 MacBook Pro (M1) using RLtools PPO/Multi-Agent PPO*

*Trained in 18s on a 2020 MacBook Pro (M1) using RLtools TD3*

## Benchmarks
*Benchmarks of training the Pendulum swing-up using different RL libraries (PPO and SAC, respectively)*

*Benchmarks of training the Pendulum swing-up on different devices (SAC, RLtools)*

*Benchmarks of the inference frequency of a two-layer [64, 64] fully-connected neural network across different microcontrollers (types and architectures)*
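To give a sense of what on-device inference amounts to, here is a minimal, self-contained C++ sketch of a two-layer [64, 64] fully-connected forward pass with static storage, as one might run it on a microcontroller. This is a hypothetical illustration, not RLtools code, and the names and dimensions (`INPUT_DIM`, `policy`, ...) are assumptions for the sketch:

```
// Hypothetical sketch of a two-layer [64, 64] MLP forward pass on a
// microcontroller: static storage, no heap, no dependencies.
// NOT RLtools code; weights would normally come from a trained checkpoint.
#include <cstddef>
#include <cmath>

constexpr std::size_t INPUT_DIM = 3;   // e.g. a pendulum observation
constexpr std::size_t HIDDEN_DIM = 64;
constexpr std::size_t OUTPUT_DIM = 1;  // e.g. a torque action

// Zero-initialized placeholders for exported policy weights.
static float W1[HIDDEN_DIM][INPUT_DIM], b1[HIDDEN_DIM];
static float W2[HIDDEN_DIM][HIDDEN_DIM], b2[HIDDEN_DIM];
static float W3[OUTPUT_DIM][HIDDEN_DIM], b3[OUTPUT_DIM];

// One fully-connected layer: y = W x + b, with optional ReLU.
template <std::size_t IN, std::size_t OUT>
void layer(const float (&W)[OUT][IN], const float (&b)[OUT], const float (&x)[IN], float (&y)[OUT], bool relu) {
    for (std::size_t o = 0; o < OUT; o++) {
        float acc = b[o];
        for (std::size_t i = 0; i < IN; i++) acc += W[o][i] * x[i];
        y[o] = (relu && acc < 0) ? 0 : acc;
    }
}

void policy(const float (&obs)[INPUT_DIM], float (&action)[OUTPUT_DIM]) {
    static float h1[HIDDEN_DIM], h2[HIDDEN_DIM]; // static scratch buffers (not reentrant)
    layer(W1, b1, obs, h1, true);
    layer(W2, b2, h1, h2, true);
    layer(W3, b3, h2, action, false);
    for (std::size_t o = 0; o < OUTPUT_DIM; o++) action[o] = std::tanh(action[o]); // squash to [-1, 1]
}

int main() {
    float obs[INPUT_DIM] = {1.0f, 0.0f, 0.0f};
    float action[OUTPUT_DIM];
    policy(obs, action);
    return 0;
}
```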
## Quick Start

Clone this repo, then build a Zoo example:

```
g++ -std=c++17 -O3 -ffast-math -I include src/rl/zoo/l2f/sac.cpp
```

Run it with `./a.out 1337` (the number is the seed), then run `./tools/serve.sh` to visualize the results. Open `http://localhost:8000` and navigate to the ExTrack UI to watch the quadrotor flying.

- **macOS**: Append `-framework Accelerate -DRL_TOOLS_BACKEND_ENABLE_ACCELERATE` for fast training (~4s on M3)
- **Ubuntu**: Use `apt install libopenblas-dev` and append `-lopenblas -DRL_TOOLS_BACKEND_ENABLE_OPENBLAS` (~6s on Zen 5)

## Algorithms

| Algorithm | Example |
|-----------|---------|
| **TD3** | [Pendulum](./src/rl/environments/pendulum/td3/cpu/standalone.cpp), [Racing Car](./src/rl/environments/car/car.cpp), [MuJoCo Ant-v4](./src/rl/environments/mujoco/ant/td3/training.h), [Acrobot](./src/rl/environments/acrobot/td3/acrobot.cpp) |
| **PPO** | [Pendulum](./src/rl/environments/pendulum/ppo/cpu/training.cpp), [Racing Car](./src/rl/environments/car/training_ppo.h), [MuJoCo Ant-v4 (CPU)](./src/rl/environments/mujoco/ant/ppo/cpu/training.h), [MuJoCo Ant-v4 (CUDA)](./src/rl/environments/mujoco/ant/ppo/cuda/training_ppo.cu) |
| **Multi-Agent PPO** | [Bottleneck](./src/rl/zoo/bottleneck-v0/ppo.h) |
| **SAC** | [Pendulum (CPU)](./src/rl/environments/pendulum/sac/cpu/training.cpp), [Pendulum (CUDA)](./src/rl/environments/pendulum/sac/cuda/sac.cu), [Acrobot](./src/rl/environments/acrobot/sac/acrobot.cpp) |

## Projects Based on RLtools

- Learning to Fly in Seconds: [GitHub](https://github.com/arplaboratory/learning-to-fly) / [arXiv](https://arxiv.org/abs/2311.13081) / [YouTube](https://youtu.be/NRD43ZA1D-4) / [IEEE Spectrum](https://spectrum.ieee.org/amp/drone-quadrotor-2667196800)
- Data-Driven System Identification of Quadrotors Subject to Motor Delays: [GitHub](https://github.com/arplaboratory/data-driven-system-identification) / [arXiv](https://arxiv.org/abs/2404.07837) / [YouTube](https://youtu.be/G3WGthRx2KE) / [Project Page](https://sysid.tools)

# Getting Started

> **⚠️ Note**: Check out [Getting Started](https://docs.rl.tools/getting_started.html) in the documentation for a more thorough guide

To get started implementing your own environment, please refer to [rl-tools/example](https://github.com/rl-tools/example). A schematic sketch of the general shape of an environment follows below.
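In this style of library, an environment is essentially a plain state struct plus free functions that initialize, step, and observe it. The sketch below is purely illustrative (the names, signatures, and toy dynamics are all assumptions, not the actual RLtools interface); the real interface is defined in [rl-tools/example](https://github.com/rl-tools/example):

```
// Schematic sketch of the shape of a custom environment: a plain state
// struct plus free functions operating on it. All names and signatures
// here are ILLUSTRATIVE ONLY; see rl-tools/example for the real interface.
#include <random>

namespace my_env {
    struct State { float position, velocity; };

    // Sample an initial state.
    void initial_state(State& state, std::mt19937& rng) {
        std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
        state = {dist(rng), 0.0f};
    }

    // Advance the toy dynamics by one step and return the reward.
    float step(const State& state, const float action[1], State& next, float dt = 0.01f) {
        next.velocity = state.velocity + action[0] * dt;
        next.position = state.position + next.velocity * dt;
        return -(next.position * next.position); // quadratic cost around the origin
    }

    // Map the state to the observation the policy sees.
    void observe(const State& state, float observation[2]) {
        observation[0] = state.position;
        observation[1] = state.velocity;
    }
}

int main() {
    std::mt19937 rng(1337);
    my_env::State state, next_state;
    my_env::initial_state(state, rng);
    float action[1] = {0.5f}, observation[2];
    float reward = my_env::step(state, action, next_state);
    my_env::observe(next_state, observation);
    (void)reward;
    return 0;
}
```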
# Documentation

The documentation is available at [docs.rl.tools](https://docs.rl.tools) and consists of C++ notebooks. You can also run them locally to tinker around:

```
docker run -p 8888:8888 rltools/documentation
```

After running the Docker container, open the link that is displayed in the CLI (http://127.0.0.1:8888/...) in your browser and enjoy tinkering!

| Chapter | Interactive Notebook |
|---------|----------------------|
| [Overview](https://docs.rl.tools/overview.html) | - |
| [Getting Started](https://docs.rl.tools/getting_started.html) | - |
| [Containers](https://docs.rl.tools/01-Containers.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=01-Containers.ipynb) |
| [Multiple Dispatch](https://docs.rl.tools/02-Multiple%20Dispatch.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=02-Multiple%20Dispatch.ipynb) |
| [Deep Learning](https://docs.rl.tools/03-Deep%20Learning.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=03-Deep%20Learning.ipynb) |
| [CPU Acceleration](https://docs.rl.tools/04-CPU%20Acceleration.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=04-CPU%20Acceleration.ipynb) |
| [MNIST Classification](https://docs.rl.tools/05-MNIST%20Classification.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=05-MNIST%20Classification.ipynb) |
| [Deep Reinforcement Learning](https://docs.rl.tools/06-Deep%20Reinforcement%20Learning.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=06-Deep%20Reinforcement%20Learning.ipynb) |
| [The Loop Interface](https://docs.rl.tools/07-The%20Loop%20Interface.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=07-The%20Loop%20Interface.ipynb) |
| [Custom Environment](https://docs.rl.tools/08-Custom%20Environment.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=08-Custom%20Environment.ipynb) |
| [Python Interface](https://docs.rl.tools/09-Python%20Interface.html) | [![Run Example on Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rl-tools/documentation/blob/master/docs/09-Python%20Interface.ipynb) |
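The Multiple Dispatch chapter above covers the pattern that lets the same training code target different backends: operations are free function templates that dispatch on a device tag at compile time, with no virtual calls. The following is a generic, self-contained illustration of that pattern; the type and function names are made up and are not the RLtools API:

```
// Generic illustration of compile-time tag dispatch (the pattern covered in
// the Multiple Dispatch chapter): the device type selects the implementation.
// Type and function names here are made up and are NOT the RLtools API.
#include <cstddef>
#include <iostream>

struct CPU {};
struct CPUBlas {}; // stand-in for an accelerated backend (OpenBLAS/Accelerate)

// Naive fallback: works on every device.
template <std::size_t N>
float dot(CPU, const float (&a)[N], const float (&b)[N]) {
    float acc = 0;
    for (std::size_t i = 0; i < N; i++) acc += a[i] * b[i];
    return acc;
}

// Backend-specific overload: would call into a BLAS routine in a real setup.
template <std::size_t N>
float dot(CPUBlas, const float (&a)[N], const float (&b)[N]) {
    return dot(CPU{}, a, b); // placeholder; imagine cblas_sdot here
}

int main() {
    float a[3] = {1, 2, 3}, b[3] = {4, 5, 6};
    std::cout << dot(CPU{}, a, b) << "\n";     // generic implementation
    std::cout << dot(CPUBlas{}, a, b) << "\n"; // selected by the device tag
    return 0;
}
```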
### Python Interface

We provide Python bindings that are available as `rltools` through PyPI (the pip package index). Note that using Python Gym environments can slow down training significantly compared to native RLtools environments.

```
pip install rltools gymnasium
```

Usage:

```
from rltools import SAC
import gymnasium as gym
from gymnasium.wrappers import RescaleAction

seed = 0xf00d
def env_factory():
    env = gym.make("Pendulum-v1")
    env = RescaleAction(env, -1, 1)
    env.reset(seed=seed)
    return env

sac = SAC(env_factory)
state = sac.State(seed)

finished = False
while not finished:
    finished = state.step()
```

You can find more details in the [Python Interface documentation](https://docs.rl.tools/09-Python%20Interface.html) and in the repository [rl-tools/python-interface](https://github.com/rl-tools/python-interface).

## Embedded Platforms

### Inference & Training
- [iOS](https://github.com/rl-tools/ios)
- [teensy](./embedded_platforms)

### Inference
- [Crazyflie](embedded_platforms/crazyflie)
- [ESP32](embedded_platforms)
- [PX4](embedded_platforms)

## Naming Convention

We use `snake_case` for variables/instances, functions, and namespaces, and `PascalCase` for structs/classes. Furthermore, we use upper-case `SNAKE_CASE` for compile-time constants (a short illustration follows at the end of this README).

## Citing

When using RLtools in academic work, please cite our publication using the following BibTeX entry:

```
@article{eschmann_rltools_2024,
  author  = {Jonas Eschmann and Dario Albani and Giuseppe Loianno},
  title   = {RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control},
  journal = {Journal of Machine Learning Research},
  year    = {2024},
  volume  = {25},
  number  = {301},
  pages   = {1--19},
  url     = {http://jmlr.org/papers/v25/24-0248.html}
}
```
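As referenced in the Naming Convention section, here is a minimal illustration of the convention; all identifiers are hypothetical:

```
// Illustration of the naming convention (identifiers are hypothetical):
constexpr int MAX_STEPS = 200;                         // compile-time constant: SNAKE_CASE

namespace replay {                                     // namespace: snake_case
    struct ReplayBuffer { int size = 0; };             // struct: PascalCase

    void add_transition(ReplayBuffer& replay_buffer) { // function & variable: snake_case
        replay_buffer.size++;
    }
}
```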