News tagged #speculative-decoding

1 post tagged #speculative-decoding.

2026-06-30
DGX Spark runs Qwen3.5-122B at 59 tok/s general and 81 on agent traffic, with speculative decoding (not NVFP4)

On a single DGX Spark, Qwen3.5-122B-A10B with DFlash block-speculative decoding runs at about 59 tok/s on general decode and about 81 tok/s on real agent and tool-call traffic. For agentic work, the lever is speculative decoding. The stock decode rate that reviews quote is in the low 50s tok/s, and NVFP4 showed no measured win on this model.

#performance #nvidia #dgx-spark #speculative-decoding #qwen #agentic