Pre$^3$: Unlocking Faster, Structured LLM Generation with Deterministic Pushdown Automata
We are delighted to introduce our paper on constrained decoding, Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation, which has been accepted to the ACL 2025 Main Conference.
LightLLM v1.0.0: Now Available!
We're thrilled to announce the official release of LightLLM v1.0.0! This major update brings groundbreaking improvements, including minimal inter-process communication overhead, the fastest DeepSeek-R1 serving performance on a single H200, and prototype support for prefill-decode (PD) disaggregation.