# RT-Pose
Real-time (GPU) pose estimation pipeline with 🤗 Transformers.
## Notebooks
- 🚀🚀🚀 Walkthrough of the optimizations that take the pipeline from 9 to 47 FPS - [notebook](./notebooks/optimizing_pose_estimation_pipeline.ipynb)
- 🎥 Run inference on video - [notebook](./notebooks/video_inference.ipynb)
## Installation
1. [Optional] Using `uv` is recommended for faster installation. Install it first:
```bash
pip install uv
```
2. Install `rt_pose` (drop the `uv` prefix if you prefer to install with plain `pip`):
```bash
uv pip install rt-pose          # minimal dependencies
uv pip install "rt-pose[demo]"  # extra dependencies for `scripts/` and `notebooks/`
```
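To verify the install, importing the pipeline class should succeed (a minimal sanity check, assuming nothing beyond the package itself):

```python
# Sanity check: the package and its main entry point are importable
from rt_pose import PoseEstimationPipeline

print(PoseEstimationPipeline.__name__)
```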
## Quick start
- [Python snippet](#python-snippet)
- [Script to run on image](#run-pose-estimation-on-image)
- [Script to run on video](#run-pose-estimation-on-video)
### Python snippet
```python
import torch
from PIL import Image

from rt_pose import PoseEstimationPipeline

# Load the pose estimation pipeline (person detector + pose model)
pipeline = PoseEstimationPipeline(
    object_detection_checkpoint="PekingU/rtdetr_r50vd_coco_o365",
    pose_estimation_checkpoint="usyd-community/vitpose-plus-small",
    device="cuda",
    dtype=torch.bfloat16,
    compile=False,  # or True to get more speedup (compilation takes time on the first calls)
)

# Load an image and run pose estimation on it
image = Image.open("path/to/image.jpg")
output = pipeline(image)

# output.person_boxes_xyxy (`torch.Tensor`):
#   shape `(N, 4)`, N detected person boxes in (x_min, y_min, x_max, y_max) format
# output.keypoints_xy (`torch.Tensor`):
#   shape `(N, 17, 2)`, 17 keypoints per detected person
# output.scores (`torch.Tensor`):
#   shape `(N, 17)`, confidence score for each keypoint

# Visualize with supervision / matplotlib / opencv, see ./scripts/run_on_image.py
```
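If the `[demo]` extras are not installed, the outputs can still be inspected with plain OpenCV. A minimal sketch, assuming `image` and `output` from the snippet above with the tensor layouts documented in its comments (the 0.3 score threshold is illustrative):

```python
import cv2
import numpy as np

# PIL RGB image -> OpenCV BGR array
frame = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)

# Draw every keypoint whose confidence clears the (illustrative) threshold
for keypoints, scores in zip(output.keypoints_xy, output.scores):
    for (x, y), score in zip(keypoints.tolist(), scores.tolist()):
        if score < 0.3:
            continue
        cv2.circle(frame, (int(x), int(y)), radius=3, color=(0, 255, 0), thickness=-1)

cv2.imwrite("keypoints.png", frame)
```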
Other object detection checkpoints on the Hub:
- [RT-DETR](https://huggingface.co/PekingU)
- [DETR](https://huggingface.co/models?other=detr)
- [YOLOS](https://huggingface.co/models?other=yolos)
Other pose estimation checkpoints on the Hub:
- [ViTPose and ViTPose++](https://huggingface.co/usyd-community)
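Swapping checkpoints only changes the constructor arguments. A sketch assuming a larger RT-DETR detector and a base-size ViTPose++ model (verify both checkpoint names on the Hub pages above):

```python
import torch

from rt_pose import PoseEstimationPipeline

# Checkpoint names below are assumptions; check the Hub pages linked above
pipeline = PoseEstimationPipeline(
    object_detection_checkpoint="PekingU/rtdetr_r101vd_coco_o365",
    pose_estimation_checkpoint="usyd-community/vitpose-plus-base",
    device="cuda",
    dtype=torch.bfloat16,
)
```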
### Run pose estimation on image
- `--input` can be a URL or a local path
```bash
python scripts/run_on_image.py \
--input "https://res-3.cloudinary.com/dostuff-media/image/upload//w_1200,q_75,c_limit,f_auto/v1511369692/page-image-10656-892d1842-b089-4a7a-80f1-5be99b2b3454.png" \
--output "results/image.png" \
--device "cuda:0"
```
### Run pose estimation on video
- `--input` can be a URL or a local path
- `--dtype`: `bfloat16` is recommended for the best precision/speed tradeoff
- `--compile`: compiling the models in the pipeline gives a further ~2x speedup, but compilation itself takes a while, so it is worth enabling only for long videos
```bash
python scripts/run_on_video.py \
--input "https://huggingface.co/datasets/qubvel-hf/assets/blob/main/rt_pose_break_dance_v1.mp4" \
--output "results/rt_pose_break_dance_v1_annotated.mp4" \
--device "cuda:0" \
--dtype bfloat16
```
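When enabling `--compile` (or `compile=True` in Python), the first calls are slow while `torch.compile` traces and optimizes the models, so it helps to warm the pipeline up before timing it. A minimal sketch, assuming the pipeline accepts PIL images as in the snippet above, with an arbitrary warmup count:

```python
import torch
from PIL import Image

from rt_pose import PoseEstimationPipeline

pipeline = PoseEstimationPipeline(
    object_detection_checkpoint="PekingU/rtdetr_r50vd_coco_o365",
    pose_estimation_checkpoint="usyd-community/vitpose-plus-small",
    device="cuda",
    dtype=torch.bfloat16,
    compile=True,  # compilation happens lazily on the first calls
)

# Warm up with a dummy frame so steady-state FPS is not skewed by compilation time
dummy = Image.new("RGB", (1280, 720))
for _ in range(3):  # warmup count is arbitrary
    pipeline(dummy)
```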