Speculative Decoding Is Finally Useful for Local LLMs
Speculative decoding has moved from inference-paper trivia into the local-LLM hot path. The version that works in practice combines three techniques: n-gram speculation for text that repeats earlier context, multi-token prediction (MTP) for models that were trained to draft ahead, and KV-cache compression such as TurboQuant when long context, not compute, is the real constraint.
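The n-gram variant (often called prompt-lookup decoding) needs no draft model at all: if the last few tokens already appeared earlier in the context, speculate that the tokens which followed that earlier occurrence will repeat, then let the target model verify them in one batched forward pass. A minimal sketch of the drafting step, with illustrative names (`ngram_draft` is not from any particular library):

```python
def ngram_draft(tokens, ngram_size=3, max_draft=5):
    """Propose draft tokens by matching the trailing n-gram earlier in the context.

    If the last `ngram_size` tokens occurred before, return up to `max_draft`
    tokens that followed that occurrence; return [] when there is no match.
    """
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Search right-to-left so the most recent repetition wins.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            follow = tokens[start + ngram_size:start + ngram_size + max_draft]
            if follow:
                return follow
    return []


# Example: the context "... 1 2 3 4 5 1 2 3" ends in the n-gram (1, 2, 3),
# which also appears at the start, so the tokens after it become the draft.
print(ngram_draft([1, 2, 3, 4, 5, 1, 2, 3]))  # → [4, 5, 1, 2, 3]
```

The drafted tokens are only a guess; the target model scores them in parallel and keeps the longest prefix it agrees with, so a wrong draft costs one extra forward pass rather than wrong output.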