# ToST

**Repository Path**: yt7589/mcrt

## Basic Information

- **Project Name**: ToST
- **Description**: Maximum Coding Rate Transformer based on PyTorch.
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-24
- **Last Updated**: 2025-10-06

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction

This repo contains the implementation of ToST (Token Statistics Transformer), a linear-time attention architecture derived via algorithmic unrolling.

Links: [arXiv](https://arxiv.org/abs/2412.17810) · Website
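To give a feel for what "linear-time attention" means here, the sketch below shows a generic kernelized (linear) attention operator, which computes per-feature summary statistics over tokens in O(n·d²) rather than forming the O(n²) pairwise attention matrix. This is an illustrative sketch only, not the exact ToST operator from the paper; the feature map `phi` and all shapes are assumptions for demonstration.

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Generic linear (kernelized) attention sketch — NOT the exact ToST operator.
    # Cost is O(n * d^2): the (d, d) statistic KV is shared by all n queries,
    # instead of an O(n^2) pairwise token-token attention matrix.
    Qp, Kp = phi(Q), phi(K)          # (n, d) non-negative feature maps
    KV = Kp.T @ V                    # (d, d) summary statistics over all tokens
    Z = Qp @ Kp.sum(axis=0)          # (n,) per-query normalizers
    return (Qp @ KV) / Z[:, None]    # (n, d) attended outputs

rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 4))
K = rng.standard_normal((8, 4))
V = rng.standard_normal((8, 4))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

For the actual architecture used in this repo, see the code under `tost_vision` and `tost_lang`.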
## Updates

- [02/05/25] Code for ToST on vision and language tasks is released!
- [01/22/25] Accepted to ICLR 2025!

## Usage

The implementations for vision and language tasks are organized into the `tost_vision` and `tost_lang` directories, respectively. Please follow the instructions within them. We recommend using separate environments for the two implementations.

## Citation

If you find this project helpful for your research and applications, please consider citing our work:

```bibtex
@article{wu2024token,
  title={Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction},
  author={Wu, Ziyang and Ding, Tianjiao and Lu, Yifu and Pai, Druv and Zhang, Jingyuan and Wang, Weida and Yu, Yaodong and Ma, Yi and Haeffele, Benjamin D},
  journal={arXiv preprint arXiv:2412.17810},
  year={2024}
}
```

## Acknowledgements

- [XCiT: Cross-Covariance Image Transformer](https://github.com/facebookresearch/xcit): the vision code is largely based on this repo.
- [nanoGPT](https://github.com/karpathy/nanoGPT): the language code is mostly based on this repo.