LLM Inference Performance Dashboard V2

Multi-dimensional comparison: local deployment vs. cloud API | Data-driven decisions

Last updated: 2026-04-14

Model name                       Deployment      First-token latency (TTFT) ↓  Generation speed (TPS)  Total time  Total characters
local-gemma4:26b (128K)          Local           87.99s                        10.79                   131.7s      717
local-gemma4:26b (32K)           Local           78.67s                        10.16                   127.7s      722
local-gemma4:e4b                 Local           39.93s                        12.34                   110.4s      1338
ollama-deepseek-v3.1:671b-cloud  Cloud (Ollama)  1.04s                         51.74                   7.3s        479
ollama-gemma4:31b-cloud          Cloud           0.85s                         31.79                   14.2s       613
ollama-glm-5:cloud               Cloud (Ollama)  13.58s                        102.25                  19.5s       779
ollama-kimi-k2.5:cloud           Cloud (Ollama)  15.91s                        29.67                   23.3s       505
ollama-minimax-m2.7:cloud        Cloud (Ollama)  40.40s                        3.75                    40.9s       508
Bailian-qwen3-max                Cloud           0.86s                         6.18                    14.1s       595
Bailian-qwen3.5-35b-a3b          Cloud           37.64s                        69.15                   39.2s       543
Bailian-qwen3.6-plus             Cloud           77.35s                        15.25                   83.3s       507
Bailian-qwen3.6-plus-v2          Cloud           47.14s                        15.58                   53.1s       503
Direct-MiniMax-M2.7              Cloud (Direct)  1.19s                         1.97                    13.9s       842
SiliconFlow-DeepSeek-R1-Qwen-8B  Cloud           10.15s                        75.57                   13.4s       398
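
The table reports first-token latency and total time separately, so one quick way to read it is to ask how much of each run was spent waiting for the first token. The sketch below does that arithmetic for three rows copied from the table. It assumes that total time includes the TTFT wait, which the dashboard does not state. The names `Run`, `RUNS`, and `summarize` are hypothetical helpers, not part of any tool behind the dashboard. The derived figures are not the dashboard's own TPS metric, whose exact definition is not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark row: model label, TTFT and total time in seconds."""
    model: str
    ttft_s: float
    total_s: float

    @property
    def decode_s(self) -> float:
        # Time left after the first token arrived.
        # Assumption: total time includes the TTFT wait.
        return self.total_s - self.ttft_s

    @property
    def ttft_share(self) -> float:
        # Fraction of the whole run spent waiting for the first token.
        return self.ttft_s / self.total_s


# Three rows copied from the table above.
RUNS = [
    Run("local-gemma4:26b (128K)", 87.99, 131.7),
    Run("ollama-deepseek-v3.1:671b-cloud", 1.04, 7.3),
    Run("ollama-minimax-m2.7:cloud", 40.40, 40.9),
]


def summarize(runs):
    """Return (model, decode seconds, TTFT share in percent) per run."""
    return [
        (r.model, round(r.decode_s, 2), round(100 * r.ttft_share, 1))
        for r in runs
    ]


if __name__ == "__main__":
    for model, decode_s, share in summarize(RUNS):
        print(f"{model}: {decode_s}s after first token, "
              f"TTFT is {share}% of total time")
```

Under that assumption, a high TTFT share means the wait before output dominated the run, while a low share means most of the time went to generation itself.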