Local LLM

Speculative Decoding Is Finally Useful for Local LLMs

Speculative decoding has moved from inference-paper trivia into the local LLM hot path. The practical toolkit now includes n-gram speculation, native MTP heads, DFlash-style drafting, and TurboQuant / TCQ KV cache compression, with early llama.cpp forks showing both real speedups and real rough edges.

Local LLMs in 2026: Models, Hardware, and Best Practices

Everything you need to know about running local LLMs in 2026: model recommendations, hardware configurations, and practical best practices.