# rl-tools
**Repository Path**: null_352_6000/rl-tools
## Basic Information
- **Project Name**: rl-tools
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-10
- **Last Updated**: 2026-02-10
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
Paper on arXiv | Live demo (browser) | Documentation | Zoo | Studio
Example results (captions of the animations shown in the README):
- Trained on a 2020 MacBook Pro (M1) using RLtools SAC and TD3 (respectively)
- Trained on a 2020 MacBook Pro (M1) using RLtools PPO / Multi-Agent PPO
- Trained in 18 s on a 2020 MacBook Pro (M1) using RLtools TD3
## Benchmarks
- Training the Pendulum swing-up using different RL libraries (PPO and SAC, respectively)
- Training the Pendulum swing-up on different devices (SAC, RLtools)
- Inference frequency of a two-layer [64, 64] fully-connected neural network across different microcontrollers (types and architectures)
## Quick Start
Clone this repo, then build a Zoo example:
```
g++ -std=c++17 -O3 -ffast-math -I include src/rl/zoo/l2f/sac.cpp
```
Run it with `./a.out 1337` (the argument is the seed), then run `./tools/serve.sh` to visualize the results. Open `http://localhost:8000` and navigate to the ExTrack UI to watch the quadrotor fly.
- **macOS**: Append `-framework Accelerate -DRL_TOOLS_BACKEND_ENABLE_ACCELERATE` for fast training (~4 s on an M3)
- **Ubuntu**: Install OpenBLAS via `apt install libopenblas-dev` and append `-lopenblas -DRL_TOOLS_BACKEND_ENABLE_OPENBLAS` (~6 s on Zen 5)
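For example, the full Ubuntu invocation becomes `g++ -std=c++17 -O3 -ffast-math -I include src/rl/zoo/l2f/sac.cpp -lopenblas -DRL_TOOLS_BACKEND_ENABLE_OPENBLAS`; on macOS, use the Accelerate flags instead.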
## Algorithms
| Algorithm | Example |
|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **TD3** | [Pendulum](./src/rl/environments/pendulum/td3/cpu/standalone.cpp), [Racing Car](./src/rl/environments/car/car.cpp), [MuJoCo Ant-v4](./src/rl/environments/mujoco/ant/td3/training.h), [Acrobot](./src/rl/environments/acrobot/td3/acrobot.cpp) |
| **PPO** | [Pendulum](./src/rl/environments/pendulum/ppo/cpu/training.cpp), [Racing Car](./src/rl/environments/car/training_ppo.h), [MuJoCo Ant-v4 (CPU)](./src/rl/environments/mujoco/ant/ppo/cpu/training.h), [MuJoCo Ant-v4 (CUDA)](./src/rl/environments/mujoco/ant/ppo/cuda/training_ppo.cu) |
| **Multi-Agent PPO** | [Bottleneck](./src/rl/zoo/bottleneck-v0/ppo.h) |
| **SAC** | [Pendulum (CPU)](./src/rl/environments/pendulum/sac/cpu/training.cpp), [Pendulum (CUDA)](./src/rl/environments/pendulum/sac/cuda/sac.cu), [Acrobot](./src/rl/environments/acrobot/sac/acrobot.cpp) |
## Projects Based on RLtools
- Learning to Fly in Seconds: [GitHub](https://github.com/arplaboratory/learning-to-fly) / [arXiv](https://arxiv.org/abs/2311.13081) / [YouTube](https://youtu.be/NRD43ZA1D-4) / [IEEE Spectrum](https://spectrum.ieee.org/amp/drone-quadrotor-2667196800)
- Data-Driven System Identification of Quadrotors Subject to Motor Delays: [GitHub](https://github.com/arplaboratory/data-driven-system-identification) / [arXiv](https://arxiv.org/abs/2404.07837) / [YouTube](https://youtu.be/G3WGthRx2KE) / [Project Page](https://sysid.tools)
## Getting Started
> **⚠️ Note**: Check out [Getting Started](https://docs.rl.tools/getting_started.html) in the documentation for a more thorough guide.

To get started implementing your own environment, please refer to [rl-tools/example](https://github.com/rl-tools/example).
## Documentation
The documentation is available at [docs.rl.tools](https://docs.rl.tools) and consists of C++ notebooks. You can also run them locally to tinker around:
```
docker run -p 8888:8888 rltools/documentation
```
After running the Docker container, open the link that is displayed in the CLI (http://127.0.0.1:8888/...) in your browser and enjoy tinkering!
| Chapter | Interactive Notebook |
|---------|----------------------|
| [Overview](https://docs.rl.tools/overview.html) | - |
| [Getting Started](https://docs.rl.tools/getting_started.html) | - |
| [Containers](https://docs.rl.tools/01-Containers.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=01-Containers.ipynb) |
| [Multiple Dispatch](https://docs.rl.tools/02-Multiple%20Dispatch.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=02-Multiple%20Dispatch.ipynb) |
| [Deep Learning](https://docs.rl.tools/03-Deep%20Learning.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=03-Deep%20Learning.ipynb) |
| [CPU Acceleration](https://docs.rl.tools/04-CPU%20Acceleration.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=04-CPU%20Acceleration.ipynb) |
| [MNIST Classification](https://docs.rl.tools/05-MNIST%20Classification.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=05-MNIST%20Classification.ipynb) |
| [Deep Reinforcement Learning](https://docs.rl.tools/06-Deep%20Reinforcement%20Learning.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=06-Deep%20Reinforcement%20Learning.ipynb) |
| [The Loop Interface](https://docs.rl.tools/07-The%20Loop%20Interface.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=07-The%20Loop%20Interface.ipynb) |
| [Custom Environment](https://docs.rl.tools/08-Custom%20Environment.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=08-Custom%20Environment.ipynb) |
| [Python Interface](https://docs.rl.tools/09-Python%20Interface.html) | [Colab](https://colab.research.google.com/github/rl-tools/documentation/blob/master/docs/09-Python%20Interface.ipynb) |
### Python Interface
We provide Python bindings that are available as `rltools` on PyPI (the Python Package Index). Note that using Python Gym environments can slow down training significantly compared to native RLtools environments.
```
pip install rltools gymnasium
```
Usage:
```
from rltools import SAC
import gymnasium as gym
from gymnasium.wrappers import RescaleAction

seed = 0xf00d
def env_factory():
    env = gym.make("Pendulum-v1")
    env = RescaleAction(env, -1, 1)  # rescale the action space to [-1, 1]
    env.reset(seed=seed)
    return env

sac = SAC(env_factory)   # configure SAC based on the environment returned by the factory
state = sac.State(seed)  # create the training state

finished = False
while not finished:
    finished = state.step()  # one training step; returns True when training is finished
```
You can find more details in the [Python Interface documentation](https://docs.rl.tools/09-Python%20Interface.html) and in the [rl-tools/python-interface](https://github.com/rl-tools/python-interface) repository.
## Embedded Platforms
### Inference & Training
- [iOS](https://github.com/rl-tools/ios)
- [Teensy](./embedded_platforms)
### Inference
- [Crazyflie](embedded_platforms/crazyflie)
- [ESP32](embedded_platforms)
- [PX4](embedded_platforms)
## Naming Convention
We use `snake_case` for variables/instances, functions, and namespaces, and `PascalCase` for structs/classes. Compile-time constants use upper-case `SNAKE_CASE`.
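A minimal illustrative sketch of this convention (the identifiers below are made up for this example and are not taken from the RLtools codebase):
```
namespace my_module{ // namespaces: snake_case
    // compile-time constant: upper-case SNAKE_CASE
    static constexpr int BATCH_SIZE = 32;
    // structs/classes: PascalCase
    struct TrainingConfig{
        int num_samples; // variables/instance members: snake_case
    };
    // functions: snake_case
    int batch_count(const TrainingConfig& config){
        int full_batches = config.num_samples / BATCH_SIZE;
        return full_batches;
    }
}
```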
## Citing
When using RLtools in academic work, please cite our publication using the following BibTeX entry:
```
@article{eschmann_rltools_2024,
  author  = {Jonas Eschmann and Dario Albani and Giuseppe Loianno},
  title   = {RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control},
  journal = {Journal of Machine Learning Research},
  year    = {2024},
  volume  = {25},
  number  = {301},
  pages   = {1--19},
  url     = {http://jmlr.org/papers/v25/24-0248.html}
}
```