Speculative Decoding Collection

Repositories tagged with "speculative-decoding"

RARE

TCG-style cards with ATK/DEF/SPD stats

UNCOMMON

⭐ 2.6k HP ◆ 🔥 Fire ★★

lucebox-hub

Luce-Org

C++ · cuda · cuda-kernels

“Fast LLM speculative inference server for consumer hardware.”

★ 2.6k · 240 forks

GitPedia #319

2/5

UNCOMMON

⭐ 2.4k HP ◆ 🔮 Psychic ★★

EAGLE

SafeAILab

Python · large-language-models · llm-inference

“Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).”

★ 2.4k · 286 forks

GitPedia #176

2/5

UNCOMMON

⭐ 2.2k HP ◆ 🔮 Psychic ★★

intel-extension-for-transformers

intel

Python · 4-bits · autoround

“⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡”

★ 2.2k · 217 forks

GitPedia #080

2/5

UNCOMMON

⭐ 1.8k HP ◆ 🔥 Fire ★★

aphrodite-engine

dphnAI

C++ · api-rest · cuda

“Large-scale LLM inference engine”

★ 1.8k · 200 forks

GitPedia #680

2/5

UNCOMMON

⭐ 1.3k HP ◆ 🔮 Psychic ★★

AngelSlim

Tencent

Python · audio · deepseek

“Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.”

★ 1.3k · 153 forks

GitPedia #962

2/5

UNCOMMON

⭐ 790 HP ◆ 🔮 Psychic ★★

MTPLX

youssofal

Python · anthropic-compatible · apple-silicon

“2.24x decode TPS increase On Qwen 3.6 27B @ temp 0.6 | Native MTP Speculative Decoding On Apple Silicon With No External Drafter.”

★ 790 · 43 forks

GitPedia #385

2/5

UNCOMMON

⭐ 510 HP ◆ ⚙️ Steel ★★

atlas

Avarok-Cybersecurity

Rust · cuda · dgx

“Pure Rust Inference Engine”

★ 510 · 77 forks

GitPedia #060

2/5

COMMON

⭐ 377 HP ◆ 🔮 Psychic ★

Sequoia

Infini-AI-Lab

Python · efficiency · inference

“scalable and robust tree-based speculative decoding algorithm”

★ 377 · 37 forks

GitPedia #476

1/5

COMMON

⭐ 372 HP ◆ 🔮 Psychic ★

LayerSkip

facebookresearch

Python · early-exit · layer-drop

“Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024”

★ 372 · 43 forks

GitPedia #634

1/5

COMMON

⭐ 361 HP ◆ 📦 Normal ★

awesome-on-policy-distillation

chrisliu298

awesome · awesome-list

“A curated collection of papers, technical reports, frameworks, and tools for on-policy distillation (OPD) of large language models”

★ 361 · 7 forks

GitPedia #877

1/5

COMMON

⭐ 289 HP ◆ 🔮 Psychic ★

Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash

AEON-7

Python · abliteration · blackwell

“Fully uncensored, capability-enhanced abliteration of Qwen3.6-27B. NVFP4 + z-lab DFlash speculative decoding (n=12) on the unified ghcr.io/aeon-7/aeon-vllm-ultimate:latest container, tuned for long-context draft acceptance on DGX Spark. 6 HF variants (BF16/NVFP4/MTP/MTP-XS), docker-compose, and QuickStart.”

★ 289 · 29 forks

GitPedia #335

1/5

COMMON

⭐ 281 HP ◆ 🔮 Psychic ★

TriForce

Infini-AI-Lab

Python · acceleration · efficiency

“[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding”

★ 281 · 20 forks

GitPedia #650

1/5

COMMON

⭐ 270 HP ◆ 🔮 Psychic ★

mini-infer

psmarter

Python · continuous-batching · cuda

“LLM inference engine from scratch — paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, OpenAI-compatible serving”

★ 270 · 16 forks

GitPedia #931

1/5

COMMON

⭐ 240 HP ◆ 🔮 Psychic ★

tessera

zengxiao-he

Python · cuda · flash-attention

“From teacher to tiles — a from-scratch LLM distillation & serving engine: custom Triton/CUDA kernels, FSDP distillation, paged-KV continuous batching, speculative decoding, a Rust gateway, a JAX oracle, and interpretability tooling.”

★ 240 · 4 forks

GitPedia #378

1/5

COMMON

⭐ 227 HP ◆ 🌊 Water ★

llm-server

raketenkater

Go · cuda · gguf

“Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuning (AI Tune), hardware-matched HuggingFace downloads, and crash recovery. An Ollama alternative for multi-GPU rigs.”

★ 227 · 11 forks

GitPedia #201

1/5

COMMON

⭐ 219 HP ◆ ⚔️ Fighting ★

REST

FasterDecoding

C · llm-inference · retrieval

“REST: Retrieval-Based Speculative Decoding, NAACL 2024”

★ 219 · 17 forks

GitPedia #629

1/5

COMMON

⭐ 141 HP ◆ 🔮 Psychic ★

ddtree-mlx

humanrouter

Python · apple-silicon · inference

“Tree-based speculative decoding for Apple Silicon (MLX). ~10-15% faster than DFlash on code, ~1.5x over autoregressive. First MLX port with custom Metal kernels for hybrid model support.”

★ 141 · 11 forks

GitPedia #075

1/5

COMMON

⭐ 136 HP ◆ 🔮 Psychic ★

FLy

AMD-AGI

Python · fly · loosely

“FLy: Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match (ICLR 2026)”

★ 136 · 4 forks

GitPedia #699

1/5
