[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, computes attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy (see the sketch after this list).
Last updated: 4 months ago

[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
Last updated: 4 months ago
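
For context on the first entry, here is a minimal, conceptual sketch of dynamic block-sparse attention: queries and keys are mean-pooled per block to cheaply estimate which key blocks matter, and full attention is then computed only over the top-k selected blocks. This is an illustrative PyTorch sketch under assumed parameters (`block`, `topk`), not the repository's actual implementation; it omits causal masking and multi-head handling.

```python
# Minimal conceptual sketch of dynamic block-sparse attention (illustration
# only, not the repository's implementation). Assumes PyTorch; `block` and
# `topk` are illustrative parameters, and seq_len must be divisible by `block`.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block=64, topk=4):
    """q, k, v: [seq, dim]. Mean-pools each block of queries/keys to cheaply
    estimate block-to-block relevance, then attends only to the top-k key
    blocks per query block (causal masking omitted for brevity)."""
    s, d = q.shape
    nb = s // block
    # Cheap importance estimate: dot products of mean-pooled blocks.
    qb = q.reshape(nb, block, d).mean(dim=1)                 # [nb, d]
    kb = k.reshape(nb, block, d).mean(dim=1)                 # [nb, d]
    keep = (qb @ kb.T).topk(min(topk, nb), dim=-1).indices   # [nb, topk]
    out = torch.empty_like(q)
    for i in range(nb):
        q_i = q[i * block:(i + 1) * block]                   # [block, d]
        # Gather only the selected key/value blocks for this query block.
        idx = torch.cat([torch.arange(j * block, (j + 1) * block)
                         for j in keep[i].tolist()])
        attn = F.softmax(q_i @ k[idx].T / d ** 0.5, dim=-1)
        out[i * block:(i + 1) * block] = attn @ v[idx]
    return out

q = k = v = torch.randn(512, 64)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([512, 64])
```

Real systems of this kind fuse the block selection and sparse attention into custom GPU kernels and predict the sparse pattern per head at runtime; the Python loop above is only to make the idea concrete.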