ModelTC/LightLLM


LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance. LightLLM harnesses the strengths of numerous well-regarded open-source implementations, including but not limited to FasterTransformer, TGI, vLLM, and FlashAttention.

English Docs | Chinese Docs | Blogs

Tech Blogs

[2025/11] 🚀 Prefix KV cache transfer between data-parallel (DP) ranks is now supported! Check out the technical deep dive in our blog post.
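The idea behind prefix KV cache transfer can be illustrated with a minimal sketch. It assumes each DP rank keeps a list of cached prompt prefixes (tuples of token IDs) and that pulling KV entries from another rank is cheaper than recomputing them during prefill. The names `plan_prefill`, `longest_match` and the dictionary layout are hypothetical and do not describe LightLLM's actual data structures or transfer mechanism.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical model: each DP rank keeps a list of cached prompt prefixes,
# each prefix being a tuple of token IDs whose KV entries live on that rank.
RankCaches = Dict[int, List[Tuple[int, ...]]]


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the leading tokens that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(prompt: Sequence[int], caches: RankCaches) -> Tuple[int, int]:
    """Return (rank, matched_tokens) for the longest cached prefix of prompt.

    Returns (-1, 0) when no rank shares even one token with the prompt.
    """
    best_rank, best_len = -1, 0
    for rank, prefixes in caches.items():
        for cached in prefixes:
            n = common_prefix_len(prompt, cached)
            if n > best_len:
                best_rank, best_len = rank, n
    return best_rank, best_len


def plan_prefill(prompt: Sequence[int], local_rank: int,
                 caches: RankCaches) -> Dict[str, int]:
    """Split the prompt into tokens reused locally, tokens whose KV entries
    are pulled from another DP rank, and tokens recomputed during prefill."""
    _, local_len = longest_match(prompt, {local_rank: caches.get(local_rank, [])})
    src_rank, best_len = longest_match(prompt, caches)
    # Transferring only helps when another rank holds a strictly longer prefix.
    transfer = best_len - local_len if best_len > local_len else 0
    return {
        "reuse_local": local_len,
        "transfer": transfer,
        "source_rank": src_rank if transfer else local_rank,
        "recompute": len(prompt) - local_len - transfer,
    }


if __name__ == "__main__":
    caches = {0: [(1, 2, 3)], 1: [(1, 2, 3, 4, 5, 6)]}
    print(plan_prefill([1, 2, 3, 4, 5, 6, 7, 8], local_rank=0, caches=caches))
    # {'reuse_local': 3, 'transfer': 3, 'source_rank': 1, 'recompute': 2}
```

In this sketch rank 0 reuses its own 3-token prefix, pulls 3 more tokens' worth of KV entries from rank 1, and only has to run prefill over the final 2 tokens.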

News

[2025/09] 🔥 LightLLM v1.1.0 released!
[2025/08] Pre$^3$ received an Outstanding Paper Award at ACL 2025.
[2025/05] LightLLM's paper on constrained decoding was accepted by ACL 2025 (Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation). For an accessible overview of the research with key insights and examples, see our blog post: LightLLM Blog
[2025/04] LightLLM's paper on request scheduling was published at ASPLOS’25 (Past-Future Scheduler for LLM Serving under SLA Guarantees)
[2025/02] 🔥 LightLLM v1.0.0 released, achieving the fastest DeepSeek-R1 serving performance on a single H200 machine.
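Constrained decoding with a deterministic pushdown automaton can be pictured with a toy example. The sketch below hand-writes an automaton for balanced `()` and `[]` strings with a single atom `a`, and computes which vocabulary tokens (including multi-character ones) keep the output a valid prefix, which is the set a decoder would leave unmasked. The grammar, token set and function names are hypothetical illustrations; Pre$^3$'s actual construction is more general than this hand-written automaton.

```python
from typing import Dict, List, Optional, Tuple

# Toy deterministic pushdown automaton for balanced () and [] plus atom 'a'.
# The stack stores the closing bracket each open bracket is waiting for.
OPEN_TO_CLOSE: Dict[str, str] = {"(": ")", "[": "]"}
CLOSERS = set(OPEN_TO_CLOSE.values())
EOS = "<eos>"

Stack = Tuple[str, ...]


def step(stack: Stack, ch: str) -> Optional[Stack]:
    """Apply one character; return the new stack, or None if ch is illegal."""
    if ch in OPEN_TO_CLOSE:
        return stack + (OPEN_TO_CLOSE[ch],)  # push the closer we now expect
    if ch in CLOSERS:
        if stack and stack[-1] == ch:
            return stack[:-1]  # pop the matching opener
        return None
    if ch == "a":
        return stack  # atoms leave the stack unchanged
    return None


def feed(stack: Stack, token: str) -> Optional[Stack]:
    """Run a (possibly multi-character) token through the automaton."""
    current: Optional[Stack] = stack
    for ch in token:
        current = step(current, ch)
        if current is None:
            return None
    return current


def allowed_tokens(stack: Stack, vocab: List[str]) -> List[str]:
    """Tokens that keep the generated text a valid prefix of the language."""
    allowed = [t for t in vocab if t != EOS and feed(stack, t) is not None]
    if not stack and EOS in vocab:
        allowed.append(EOS)  # generation may stop only when all brackets close
    return allowed


if __name__ == "__main__":
    vocab = ["(", ")", "[", "]", "a", "()", "])", EOS]
    state = feed((), "([")  # stack is now (")", "]")
    print(allowed_tokens(state, vocab))
    # ['(', '[', ']', 'a', '()', '])']
```

Because the automaton is deterministic, each candidate token is checked by a single pass over its characters with no backtracking; after sampling, the decoder would advance with `state = feed(state, chosen_token)`.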

Get started

Performance

Learn more in the release blog: v1.1.0 blog.

FAQ

Please refer to the FAQ for more information.

Projects using LightLLM

We welcome cooperation and contributions. If your project requires LightLLM support, please contact us via email or open a pull request.

Projects based on LightLLM or using LightLLM components:

LoongServe, Peking University
vLLM (uses some LightLLM kernels)
SGLang (uses some LightLLM kernels)
ParrotServe, Microsoft
Aphrodite (uses some LightLLM kernels)
S-LoRA
OmniKV, Ant Group
Lab4AI LightLLM+LlamaIndex, Lab4AI LightLLM+Qwen3-8B
LazyLLM

In addition, LightLLM's pure-Python design and token-level KV cache management make it easy to use as the basis for research projects.
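To illustrate what token-level KV cache management means, here is a minimal sketch, assuming a memory pool in which each slot holds the keys and values of exactly one token. `TokenKVPool` and its methods are hypothetical names, not LightLLM's API. Because a request occupies exactly one slot per token and its slots need not be contiguous, no capacity is left idle in partially filled blocks, and a finished request's slots can be handed to any other request immediately.

```python
from typing import Dict, List


class TokenKVPool:
    """Hypothetical token-granular KV cache pool (illustration only).

    Each slot index stands for the storage of one token's keys and values,
    so a request holds exactly as many slots as it has tokens.
    """

    def __init__(self, num_slots: int) -> None:
        self.free_slots: List[int] = list(range(num_slots))
        self.req_slots: Dict[str, List[int]] = {}

    def num_free(self) -> int:
        return len(self.free_slots)

    def append_tokens(self, req_id: str, n: int) -> List[int]:
        """Reserve n slots for req_id (prefill: prompt length, decode: 1)."""
        if n > len(self.free_slots):
            raise MemoryError(f"need {n} slots, only {len(self.free_slots)} free")
        slots = [self.free_slots.pop() for _ in range(n)]
        self.req_slots.setdefault(req_id, []).extend(slots)
        return slots

    def release(self, req_id: str) -> None:
        """Return every slot of a finished request to the free list."""
        self.free_slots.extend(self.req_slots.pop(req_id, []))


if __name__ == "__main__":
    pool = TokenKVPool(num_slots=8)
    pool.append_tokens("req-a", 5)  # prefill a 5-token prompt
    pool.append_tokens("req-b", 2)  # prefill a 2-token prompt
    pool.append_tokens("req-a", 1)  # one decode step for req-a
    print(pool.num_free())  # 0
    pool.release("req-a")
    print(pool.num_free())  # 6
```

A real engine would map these slot indices onto preallocated GPU tensors and pass them to the attention kernel; the sketch only models the bookkeeping.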

Academic works based on or using parts of LightLLM:

Community

For further information and discussion, join our Discord server. You are welcome to join, and we look forward to your contributions!

License

This repository is released under the Apache-2.0 license.

Acknowledgement

We learned a lot from the following projects when developing LightLLM.

Citation

We have published several papers on components and features of LightLLM. If you use LightLLM in your work, please consider citing the relevant paper.

Constrained decoding: accepted by ACL 2025 and received an Outstanding Paper Award.

bibtex
@inproceedings{anonymous2025pre,
  title={Pre{$^3$}: Enabling Deterministic Pushdown Automata for Faster Structured {LLM} Generation},
  author={Anonymous},
  booktitle={Submitted to ACL Rolling Review - February 2025},
  year={2025},
  url={https://openreview.net/forum?id=g1aBeiyZEi},
  note={under review}
}

Request scheduler: accepted by ASPLOS’25:

bibtex
@inproceedings{gong2025past,
  title={Past-Future Scheduler for LLM Serving under SLA Guarantees},
  author={Gong, Ruihao and Bai, Shihao and Wu, Siyu and Fan, Yunqian and Wang, Zaijun and Li, Xiuhong and Yang, Hailong and Liu, Xianglong},
  booktitle={Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
  pages={798--813},
  year={2025}
}