[Cover image: speculative decoding pipeline showing draft, verify, and commit stages for local LLM inference]

Speculative Decoding Is Finally Useful for Local LLMs

Speculative decoding has moved from inference-paper trivia into the local LLM hot path. The useful version now includes ngram speculation, native MTP heads, DFlash-style drafting, and TurboQuant / TCQ KV compression, with early llama.cpp forks showing both real speedups and real rough edges.

May 7, 2026 · 16 min · 3279 words · Marco

Z-Image vs Z-Image Turbo: Complete Comparison Guide

A deep dive into the differences between Z-Image (the full foundation model) and Z-Image Turbo (the distilled 8-step variant), covering technical specs, community benchmarks, VRAM requirements, and practical usage recommendations.

January 28, 2026 · 8 min · 1507 words · Marco