DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm