# SAT

[![arXiv](https://img.shields.io/badge/arXiv-Paper-b31b1b.svg?logo=arxiv)](https://arxiv.org/abs/2312.17183) [![HF](https://img.shields.io/badge/Hugging%20Face-Model-yellow)](https://huggingface.co/zzh99/SAT) [![Dropbox](https://img.shields.io/badge/Dropbox-Model%20-blue?logo=dropbox)](https://www.dropbox.com/scl/fo/922fefjab8fp9j5czrqxo/AGU0eCBC-SLrO8BnsIzrQIg?rlkey=gddj22sfcpu5rr9vlzj3a2jmq&st=uzim2ow3&dl=0) [![SATDS](https://img.shields.io/badge/GitHub-Data-green?logo=github)](https://github.com/zhaoziheng/SAT-DS)

This is the official repository for "Large-Vocabulary Segmentation for Medical Images with Text Prompts" 🚀

SAT is a knowledge-enhanced universal segmentation model built upon an unprecedented data collection (72 public 3D medical segmentation datasets). It can segment 497 classes from 3 different modalities (MR, CT, PET) and 8 human body regions, prompted by text (anatomical terminology).

![Example Figure](docs/resources/new_teaser.png)

It can be more powerful and more efficient than training and deploying a series of specialist models. Find more in our [paper](https://arxiv.org/abs/2312.17183).

![Example Figure](docs/resources/radar_v3.png)

## Latest News

- 2025.03 📢 SAT is one of the baseline methods for [CVPR 2025: FOUNDATION MODELS FOR TEXT-GUIDED 3D BIOMEDICAL IMAGE SEGMENTATION](https://www.codabench.org/competitions/5651/). Check our latest [branch](https://github.com/zhaoziheng/SAT/tree/cvpr2025challenge).
- 2025.03 📢 We released the code and knowledge data for knowledge pre-training in SAT. Check this [repo](https://github.com/zhaoziheng/SAT-Pretrain/tree/master).

## Requirements

The implementation of U-Net relies on a customized version of [dynamic-network-architectures](https://github.com/MIC-DKFZ/dynamic-network-architectures). To install it:

```
cd model
pip install -e dynamic-network-architectures-main
```

Some other key requirements:

```
torch>=1.10.0
numpy==1.21.5
monai==1.1.0
transformers==4.21.3
nibabel==4.0.2
einops==0.6.1
positional_encodings==6.0.1
```

You also need to install `mamba_ssm` if you want the U-Mamba variant of SAT-Nano.

## Inference Guidance (Command Line)

- S1. Build the environment following `requirements.txt`.
- S2. Download the checkpoints of SAT and the text encoder from [Hugging Face](https://huggingface.co/zzh99/SAT).
- S3. Prepare the data in a jsonl file. Check the demo in `data/inference_demo/demo.jsonl`, and see the sketch after this list for a minimal entry.
  1. `image` (path to the image), `label` (names of the segmentation targets, as a list), `dataset` (which dataset the sample belongs to) and `modality` (ct, mri or pet) are required for each sample to segment. The modalities and classes that SAT supports can be found in Table 12 of the paper.
  2. `orientation_code` (orientation) is `RAS` by default, which suits most images in the axial plane. For images in the sagittal plane (for instance, a spine examination), set it to `ASR`. The input image should have shape `H,W,D`. Our data processing code will normalize the input image in terms of orientation, intensity, spacing and so on. Two successfully processed images can be found in `demo/processed_data`; make sure the normalization is done correctly to guarantee the performance of SAT.
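To make the expected jsonl format concrete, here is a minimal sketch that writes a one-sample file with the fields described above. The image path, class names and dataset name are placeholders for illustration only; replace them with your own.

```python
import json

# A hypothetical sample; replace the path, class names and dataset name with your own.
sample = {
    "image": "demo/inference_demo/case001.nii.gz",  # path to the input image
    "label": ["liver", "kidney"],                   # segmentation targets, as a list
    "dataset": "my_dataset",                        # which dataset the sample belongs to
    "modality": "ct",                               # ct, mri or pet
    "orientation_code": "RAS",                      # optional; RAS by default
}

# jsonl format: one json object per line.
with open("demo/inference_demo/demo.jsonl", "w") as f:
    f.write(json.dumps(sample) + "\n")
```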
- S4. Start the inference with SAT-Pro 🕶:

```
torchrun \
--nproc_per_node=1 \
--master_port 1234 \
inference.py \
--rcd_dir 'demo/inference_demo/results' \
--datasets_jsonl 'demo/inference_demo/demo.jsonl' \
--vision_backbone 'UNET-L' \
--checkpoint 'path to SAT-Pro checkpoint' \
--text_encoder 'ours' \
--text_encoder_checkpoint 'path to Text encoder checkpoint' \
--max_queries 256 \
--batchsize_3d 2
```

⚠️ NOTE: `--batchsize_3d` is the batch size of the input image patches and needs to be adjusted based on GPU memory (check the table below); `--max_queries` should be set larger than the number of classes in the inference dataset, unless your GPU memory is very limited.

| Model | batchsize_3d | GPU Memory |
|---|---|---|
| SAT-Pro | 1 | ~ 34GB |
| SAT-Pro | 2 | ~ 62GB |
| SAT-Nano | 1 | ~ 24GB |
| SAT-Nano | 2 | ~ 36GB |

- S5. Check `--rcd_dir` for the outputs. Results are organized by dataset. For each case, you will find the input image, the aggregated segmentation result, and a folder containing the segmentation of each class. All outputs are stored as NIfTI files; you can visualize them with [ITK-SNAP](http://www.itksnap.org/pmwiki/pmwiki.php), or inspect them programmatically as sketched after the variant list below.
- If you want to use SAT-Nano trained on 72 datasets, just set `--vision_backbone 'UNET'` and change `--checkpoint` and `--text_encoder_checkpoint` accordingly.
- For other SAT-Nano variants (trained on 49 datasets):
  - UNET-Ours: set `--vision_backbone 'UNET'` and `--text_encoder 'ours'`;
  - UNET-CPT: set `--vision_backbone 'UNET'` and `--text_encoder 'medcpt'`;
  - UNET-BB: set `--vision_backbone 'UNET'` and `--text_encoder 'basebert'`;
  - UMamba-CPT: set `--vision_backbone 'UMamba'` and `--text_encoder 'medcpt'`;
  - SwinUNETR-CPT: set `--vision_backbone 'SwinUNETR'` and `--text_encoder 'medcpt'`.
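To sanity-check the results without a viewer, you can load them with `nibabel` (already listed in the requirements). A minimal sketch, assuming a hypothetical output path and binary per-class masks; adapt it to the actual layout under your `--rcd_dir`:

```python
import nibabel as nib

# Hypothetical output path; the actual layout depends on your --rcd_dir,
# the dataset name and the case name.
seg = nib.load("demo/inference_demo/results/my_dataset/case001/liver.nii.gz")
mask = seg.get_fdata()

print("shape:", mask.shape)                        # spatial dimensions
print("spacing:", seg.header.get_zooms())          # voxel spacing
print("segmented voxels:", int((mask > 0).sum()))  # non-zero voxels of this class
```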
## Train Guidance

Some preparation before starting the training:

1. Build your training data following this [repo](https://github.com/zhaoziheng/SAT-DS/tree/main), specifically steps 1 to 5. A jsonl containing all the training samples is required.
2. Fetch the pre-trained text encoder checkpoint from https://huggingface.co/zzh99/SAT to generate prompts. If you want to redo the knowledge-enhancement pre-training from scratch, refer to this [repo](https://github.com/zhaoziheng/SAT-Pretrain/tree/master).

We recommend 8 or more A100-80G GPUs to train SAT-Nano, and 16 or more for SAT-Pro. You can of course modify `crop_size` or other hyperparameters to reduce the computational requirements.

Use the slurm scripts in `sh/` as a reference to start the training process. Take SAT-Pro as an example:

```
sbatch sh/train_sat_pro.sh
```

## Evaluation Guidance

This also requires building the test data following this [repo](https://github.com/zhaoziheng/SAT-DS/tree/main). You may refer to the slurm script `sh/evaluate_sat_pro.sh` to start the evaluation process:

```
sbatch sh/evaluate_sat_pro.sh
```

## Baselines

We provide the detailed configurations of all the specialist models (nnU-Nets, U-Mambas, SwinUNETR) we have trained and evaluated [here](https://github.com/zhaoziheng/SAT-DS/blob/main/data/specialist_model_config).

## Citation

If you use this code for your research or project, please cite:

```
@article{zhao2025large,
  title={Large-vocabulary segmentation for medical images with text prompts},
  author={Zhao, Ziheng and Zhang, Yao and Wu, Chaoyi and Zhang, Xiaoman and Zhou, Xiao and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
  journal={NPJ Digital Medicine},
  volume={8},
  number={1},
  pages={566},
  year={2025},
  publisher={Nature Publishing Group UK London}
}
```