# speculators

[![License](https://img.shields.io/github/license/vllm-project/speculators.svg)](https://github.com/vllm-project/speculators/blob/main/LICENSE) [![Python Versions](https://img.shields.io/badge/Python-3.10--3.13-orange)](https://pypi.org/project/speculators/) [![docs](https://img.shields.io/badge/docs-Speculators-blue)](https://docs.vllm.ai/projects/speculators/en/latest/) [![PyPI](https://img.shields.io/pypi/v/speculators.svg)](https://pypi.org/project/speculators/) [![tests](https://github.com/vllm-project/speculators/actions/workflows/main.yml/badge.svg)](https://github.com/vllm-project/speculators/actions/workflows/main.yml)

## Overview

Speculators is a unified library for building, training, and storing speculative decoding algorithms for large language model (LLM) inference, including in frameworks like vLLM.

Speculative decoding is a lossless technique that speeds up LLM inference by using a smaller, faster draft model (i.e., "the speculator") to propose tokens, which are then verified by the larger base model. The speculator drafts multiple tokens ahead of time, and the base model verifies them in a single forward pass. This reduces latency without sacrificing output quality, because every accepted token is guaranteed to match what the base model would have generated on its own.

Speculators standardizes this process by providing a productionized end-to-end framework to train draft models with reusable formats and tools. Trained models can run seamlessly in vLLM, enabling the deployment of speculative decoding in production-grade inference servers.
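The draft-then-verify loop described in the overview can be illustrated with a small, self-contained sketch. This is not the Speculators or vLLM implementation: it is a toy greedy variant in which a "model" is just a Python function returning the next token, and every name in it (`speculative_decode`, `greedy_decode`, `toy_verifier`, `toy_drafter`, the parameter `k`) is hypothetical. A real system verifies all drafted positions in one batched forward pass; the inner loop below only simulates that.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: generate one token per model call."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    verifier: NextToken,
    drafter: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding: draft k tokens, keep the verified prefix."""
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(tokens) < target_len:
        # 1. The drafter proposes k tokens autoregressively (cheap calls).
        draft: List[int] = []
        for _ in range(k):
            draft.append(drafter(tokens + draft))

        # 2. The verifier checks each drafted position. A real engine scores
        #    all k positions in a single forward pass; this loop simulates it.
        accepted: List[int] = []
        for i in range(k):
            expected = verifier(tokens + draft[:i])
            if draft[i] == expected:
                accepted.append(draft[i])
            else:
                # First mismatch: keep the verifier's token, discard the rest.
                accepted.append(expected)
                break
        else:
            # Every draft matched, so the same pass yields one extra token.
            accepted.append(verifier(tokens + draft))

        tokens.extend(accepted)
    return tokens[:target_len]


def toy_verifier(seq: List[int]) -> int:
    """Toy 'large' model: a deterministic function of the last token."""
    return (seq[-1] * 3 + 1) % 11


def toy_drafter(seq: List[int]) -> int:
    """Toy 'small' model: agrees with the verifier except at some positions."""
    return toy_verifier(seq) if len(seq) % 5 else 0


baseline = greedy_decode(toy_verifier, [1, 2], 20)
speculated = speculative_decode(toy_verifier, toy_drafter, [1, 2], 20, k=4)
assert speculated == baseline  # output is identical to verifier-only decoding
```

Because each accepted token is either confirmed by the verifier or replaced by the verifier's own choice, the output matches plain greedy decoding with the verifier, while each verification step can advance by up to `k + 1` tokens.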

Speculators user flow diagram

______________________________________________________________________

💬 Join us on the [vLLM Community Slack](https://communityinviter.com/apps/vllm-dev/join-vllm-developers-slack) and share your questions, thoughts, or ideas in:

- `#speculators`
- `#feat-spec-decode`

______________________________________________________________________

## Key Features

- **Offline Training Data Generation using vLLM:** Generate hidden states using vLLM. Data samples are saved to disk and can be used for draft model training.
- **Draft Model Training Support:** End-to-end (E2E) training support for single- and multi-layer draft models. Training is supported for both non-MoE and MoE models. VL training is coming soon.
- **Standardized, Extensible Format:** Provides a Hugging Face-compatible format for defining speculative models, with tools to convert from external research repositories into a standard Speculators format for easy adoption.
- **Seamless vLLM Integration:** Built for direct deployment into vLLM, enabling low-latency, production-grade inference with minimal overhead.

> [!TIP]
> Read more about Speculators features in this [vLLM blog post](https://blog.vllm.ai/2025/12/13/speculators-v030.html).

## Supported Models

The following table summarizes the models that have been trained end-to-end by our team, as well as others on the roadmap:

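| Verifier Architecture | Verifier Size | Training Support | vLLM Deployment Support |
| --- | --- | --- | --- |
| Llama | 8B-Instruct | EAGLE-3 ✅ | ✅ |
| Llama | 70B-Instruct | EAGLE-3 ✅ | ✅ |
| Qwen3 | 8B | EAGLE-3 ✅ | ✅ |
| Qwen3 | 14B | EAGLE-3 ✅ | ✅ |
| Qwen3 | 32B | EAGLE-3 ✅ | ✅ |
| gpt-oss | 20b | EAGLE-3 ✅ | ✅ |
| gpt-oss | 120b | EAGLE-3 ✅ | ✅ |
| Qwen3 MoE | 30B-Instruct | EAGLE-3 ✅ | ✅ |
| Qwen3 MoE | 235B-Instruct | EAGLE-3 ✅ | ✅ |
| Qwen3 MoE | 235B | EAGLE-3 ✅ | ✅ |
| Qwen3-VL | 235B-A22B | EAGLE-3 ⏳ | ⏳ |
| Mistral 3 Large | 675B-Instruct | EAGLE-3 ⏳ | ⏳ |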

✅ = Supported, ⏳ = In Progress, ❌ = Not Yet Supported

## Examples

End-To-End Training Examples:

- [Train Llama3 Draft Model](https://github.com/vllm-project/speculators/blob/main/examples/data_generation_and_training/llama3_8b_sharegpt_5k.py)
- [Train Qwen3 (Non-MoE) Draft Model](https://github.com/vllm-project/speculators/blob/main/examples/data_generation_and_training/qwen3_8b_sharegpt_ultrachat.py)
- [Train GPT-OSS Draft Model](https://github.com/vllm-project/speculators/blob/main/examples/data_generation_and_training/gpt_oss_20b_ultrachat_5k.py)

## vLLM Inference

Models trained through Speculators can run seamlessly in vLLM using a simple `vllm serve` command. This runs the model in vLLM using the default arguments defined in the `speculator_config` of the model's `config.json`.

```bash
VLLM_USE_V1=1 vllm serve RedHatAI/Qwen3-8B-speculator.eagle3
```

Served models can then be benchmarked using [GuideLLM](https://github.com/vllm-project/guidellm). Below, we show sample benchmark results comparing our speculator with its dense counterpart. We also evaluate [quantization](https://github.com/vllm-project/llm-compressor) for additional performance improvements by swapping the dense verifier, `Qwen/Qwen3-8B`, for the quantized FP8 model, [RedHatAI/Qwen3-8B-FP8-dynamic](https://huggingface.co/RedHatAI/Qwen3-8B-FP8-dynamic), in the `speculator_config`.

GuideLLM Logo
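Once a speculator is being served as described in the vLLM Inference section, a quick way to confirm it responds is to send a single request to vLLM's OpenAI-compatible completions endpoint. The sketch below uses only the Python standard library. It assumes the server is reachable at `http://localhost:8000` (vLLM's default port) and that the served model name is the same string passed to `vllm serve`; the helper names `build_completion_request` and `complete` are hypothetical and not part of Speculators or vLLM.

```python
import json
import urllib.request

# Assumptions: default vLLM host/port, and the served model name equals
# the path given to `vllm serve`. Adjust both for your deployment.
DEFAULT_BASE_URL = "http://localhost:8000"
DEFAULT_MODEL = "RedHatAI/Qwen3-8B-speculator.eagle3"


def build_completion_request(
    prompt: str,
    model: str = DEFAULT_MODEL,
    base_url: str = DEFAULT_BASE_URL,
    max_tokens: int = 64,
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding, for reproducible spot checks
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request to a running server and return the generated text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `print(complete("Speculative decoding is"))` prints the generated continuation. Speculation happens inside the server, so the client request is the same as for any other model served by vLLM.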

## Getting Started

### Installation

#### Prerequisites

Before installing, ensure you have the following:

- **Operating System:** Linux or macOS
- **Python:** 3.10 or higher
- **Package Manager:** pip (recommended) or conda

#### Install from PyPI (Recommended)

Install the latest stable release from PyPI:

```bash
pip install speculators
```

#### Install from Source

For the latest development version or to contribute to the project:

```bash
git clone https://github.com/vllm-project/speculators.git
cd speculators
pip install -e .
```

For development with additional tools:

```bash
pip install -e ".[dev]"
```

To enable the generation of data (i.e., hidden states) from vLLM for speculator training:

```bash
pip install -e ".[datagen]"
```

#### Verify Installation

You can verify your installation by checking the version:

```bash
speculators --version
```

Or by importing the package in Python:

```python
import speculators

print(speculators.__version__)
```

## License

Speculators is licensed under the [Apache License 2.0](https://github.com/vllm-project/speculators/blob/main/LICENSE).

## Cite

If you find Speculators helpful in your research or projects, please consider citing it:

```bibtex
@misc{speculators2025,
  title={Speculators: A Unified Library for Speculative Decoding Algorithms in LLM Serving},
  author={Red Hat},
  year={2025},
  howpublished={\url{https://github.com/vllm-project/speculators}},
}
```