Two-Node DGX Spark Cluster: Running DeepSeek V4 Flash at 20 TPS
Building and optimizing a two-node DeepSeek V4 Flash inference cluster on DGX Spark with vLLM. The cluster goes from a 5.5 TPS baseline to roughly 20 TPS through safe, production-proven optimizations, including MTP (multi-token prediction) speculative decoding and PIECEWISE CUDA graph capture.