# ToST
**Repository Path**: yt7589/mcrt
## Basic Information
- **Project Name**: ToST
- **Description**: Maximum Coding Rate Transformer based on PyTorch.
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-24
- **Last Updated**: 2025-10-06
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
This repo contains the implementation for ToST (Token Statistics Transformer), a linear-time architecture derived via algorithmic unrolling.
## Updates
- [02/05/25] Code for ToST on vision and language tasks is released!
- [01/22/25] Accepted to ICLR 2025!
## Usage
We have organized the implementations for vision and language tasks into the respective `tost_vision` and `tost_lang` directories. Please follow the instructions within them. We recommend using separate environments for these two implementations.
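As one way to keep the two setups isolated, you could create a separate virtual environment per task before following each directory's instructions (the environment names below are illustrative, not from the repo; conda or any other environment manager works equally well):

```shell
# Create one isolated environment per implementation
python3 -m venv tost-vision-env
python3 -m venv tost-lang-env

# To work on the vision tasks, activate that environment first, e.g.:
#   source tost-vision-env/bin/activate
# then install dependencies as described in tost_vision/
```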
## Citation
If you find this project helpful for your research and applications, please consider citing our work:
```bibtex
@article{wu2024token,
title={Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction},
author={Wu, Ziyang and Ding, Tianjiao and Lu, Yifu and Pai, Druv and Zhang, Jingyuan and Wang, Weida and Yu, Yaodong and Ma, Yi and Haeffele, Benjamin D},
journal={arXiv preprint arXiv:2412.17810},
year={2024}
}
```
## Acknowledgements
- [XCiT: Cross-Covariance Image Transformers](https://github.com/facebookresearch/xcit): the code for vision tasks is largely based on this repo.
- [nanoGPT](https://github.com/karpathy/nanoGPT): the code for language tasks is mostly based on this repo.