# Gen3R
**Repository Path**: falin1/Gen3R
## Basic Information
- **Project Name**: Gen3R
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-20
- **Last Updated**: 2026-01-20
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
**Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction**
[Jiaxin Huang](https://jaceyhuang.github.io/), [Yuanbo Yang](https://github.com/freemty), [Bangbang Yang](https://ybbbbt.com/), [Lin Ma](https://scholar.google.com/citations?user=S4HGIIUAAAAJ&hl=en), [Yuewen Ma](https://scholar.google.com/citations?user=VG_cdLAAAAAJ), [Yiyi Liao](https://yiyiliao.github.io/)
TL;DR: Gen3R generates 3D geometry together with RGB from images via a unified latent space that aligns geometry and appearance.
## 🛠️ Setup
We train and test our model in the following environment:
- Debian GNU/Linux 12 (bookworm)
- NVIDIA H20 (96 GB)
- CUDA 12.4
- Python 3.11
- PyTorch 2.5.1+cu124
1. Clone this repository
```bash
git clone https://github.com/JaceyHuang/Gen3R
cd Gen3R
```
2. Install packages
```bash
conda create -n gen3r python=3.11.2 -y
conda activate gen3r
pip install -r requirements.txt
```
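Optionally, verify that the installed versions match the environment listed above (a quick sanity check, not required):
```python
# Sanity check for the tested environment (PyTorch 2.5.1, CUDA 12.4).
import torch

print(torch.__version__)          # expected: 2.5.1+cu124
print(torch.version.cuda)         # expected: 12.4
print(torch.cuda.is_available())  # expected: True on a CUDA-capable machine
```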
3. (**Important**) Download the pretrained Gen3R checkpoint from [HuggingFace](https://huggingface.co/JaceyH919/Gen3R) to `./checkpoints`
```bash
sudo apt install git-lfs
git lfs install
git clone https://huggingface.co/JaceyH919/Gen3R ./checkpoints
```
- **Note:** At present, directly loading weights from HuggingFace via `from_pretrained("JaceyH919/Gen3R")` is not supported due to module naming errors. Please download the model checkpoint **locally** and load it using `from_pretrained("./checkpoints")`, as sketched below.
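For reference, a minimal loading sketch. The pipeline class name below is hypothetical (check this repository's code for the actual entry point); a diffusers-style `from_pretrained` interface is assumed:
```python
# Hypothetical import for illustration only; use the actual pipeline class
# exported by this repository.
from gen3r import Gen3RPipeline  # placeholder name

# Load from the locally downloaded checkpoint (supported).
pipe = Gen3RPipeline.from_pretrained("./checkpoints")

# Loading directly from the Hub currently fails due to module naming errors:
# pipe = Gen3RPipeline.from_pretrained("JaceyH919/Gen3R")  # do not use yet
```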
## 🚀 Inference
Run the Python script `infer.py` as follows to test the provided examples:
```bash
python infer.py \
    --pretrained_model_name_or_path ./checkpoints \
    --task 2view \
    --prompts examples/2-view/colosseum/prompts.txt \
    --frame_path examples/2-view/colosseum/first.png examples/2-view/colosseum/last.png \
    --cameras free \
    --output_dir ./results \
    --remove_far_points
```
Some important inference settings are described below:
- `--task`: `1view` for `First Frame to 3D`, `2view` for `First-last Frames to 3D`, and `allview` for `3D Reconstruction`.
- `--prompts`: the text prompt string or the path to the text prompt file.
- `--frame_path`: the path to the conditional images/video. For the `allview` task, this can be either the path to a folder containing all frames or the path to the conditional video. For the other two tasks, it should be the path to the conditional image(s).
- `--cameras`: the path to the conditional camera extrinsics and intrinsics. We also provide basic trajectories, selected by setting this argument to `zoom_in`, `zoom_out`, `arc_left`, `arc_right`, `translate_up` or `translate_down`; in that case, we first use [VGGT](https://github.com/facebookresearch/vggt) to estimate the initial camera intrinsics and scene scale. To disable camera conditioning, set this argument to `free`. See the example command after this list.
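For example, a `First Frame to 3D` run with a preset trajectory could look like the following. This is a sketch that reuses the example assets from above and only the flags documented in this README; adjust paths for your own data:
```bash
python infer.py \
    --pretrained_model_name_or_path ./checkpoints \
    --task 1view \
    --prompts examples/2-view/colosseum/prompts.txt \
    --frame_path examples/2-view/colosseum/first.png \
    --cameras zoom_out \
    --output_dir ./results
```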
Note that the default resolution of our model is 560×560. If the resolution of the conditioning images or videos differs from this, we first apply resizing followed by center cropping to match the required resolution.
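A sketch of this preprocessing using standard torchvision transforms; the interpolation mode and other details are assumptions, and the model's own loader may differ:
```python
# Resize the short side to 560, then center-crop to 560x560, as described above.
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(560),      # short side -> 560, aspect ratio preserved
    transforms.CenterCrop(560),  # crop to the model's 560x560 input
])

img = preprocess(Image.open("examples/2-view/colosseum/first.png").convert("RGB"))
```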
### More examples