1.2T param, 78B active, hybrid MoE
That's enormous, very much not local, heh.
Here's the actual article translation (which seems accurate compared with other translations):
Translation
DeepSeek R2: Unit Cost Drops 97.3%, Imminent Release + Core Specifications
Author: Chasing Trends Observer
Veteran Crypto Investor Watching from Afar
2025-04-25 12:06:16 Sichuan
Three Core Technological Breakthroughs of DeepSeek R2:
- Architectural Innovation
  Adopts a proprietary Hybrid MoE 3.0 architecture with 1.2 trillion dynamically activated parameters (actual computation consumes about 78 billion parameters; see the sketch after this list).
  Validated by Alibaba Cloud tests: 97.3% reduction in per-token cost compared to GPT-4 Turbo on long-text inference tasks
  (Data source: IDC Computing Power Economic Model)
- Data Engineering
  Constructed a 5.2 PB high-quality corpus covering finance, law, patents, and other vertical domains.
  Multi-stage semantic distillation boosts instruction-compliance accuracy to 89.7%
  (Benchmark: C-Eval 2.0 test set)
- Hardware Optimization
  Proprietary distributed training framework achieves:
  - 82% utilization rate on Ascend 910B chip clusters
  - 512 PetaFLOPS of actual compute at FP16 precision
  - 91% of the efficiency of an equivalent-scale A100 cluster
  (Validated by Huawei Labs)
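
Side note on the "1.2T total / 78B active" claim: that's the usual top-k MoE routing idea, where only a few experts actually run per token. Here's a minimal sketch of that routing (the expert count, sizes, and the plain-NumPy feed-forward experts are my own illustrative assumptions, not anything from the article):

```python
# Minimal sketch of top-k expert routing: total parameters scale with n_experts,
# but per-token compute only scales with top_k. Sizes here are toy values.
import numpy as np

n_experts = 16       # total experts (drives total parameter count)
top_k = 2            # experts actually run per token (drives active parameter count)
d_model = 8

rng = np.random.default_rng(0)
# Each "expert" is a small two-layer feed-forward block.
experts = [
    (rng.standard_normal((d_model, 4 * d_model)),
     rng.standard_normal((4 * d_model, d_model)))
    for _ in range(n_experts)
]
router_w = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route a single token vector to its top-k experts and mix their outputs."""
    logits = x @ router_w
    top = np.argsort(logits)[-top_k:]                 # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                          # softmax over the chosen experts only
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w1, w2 = experts[idx]
        out += w * (np.maximum(x @ w1, 0.0) @ w2)     # ReLU feed-forward expert
    return out

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (8,) -- only 2 of the 16 experts did any work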
Application Layer Advancements - Three Multimodal Breakthroughs:
- Vision Understanding
  ViT-Transformer hybrid architecture achieves:
  - 92.4 mAP on COCO object segmentation
  - an 11.6% improvement over CLIP models
- Industrial Inspection
  Adaptive feature-fusion algorithm reduces the false-detection rate to 7.2E-6 in photovoltaic EL defect detection
  (Field data from LONGi Green Energy production lines)
- Medical Diagnostics
  Knowledge graph-enhanced chest X-ray multi-disease recognition:
  - 98.1% accuracy vs. 96.3% average for senior radiologist panels
  (Blind-test results from Peking Union Medical College Hospital)
Key Highlight:
8-bit quantization compression achieves:
- 83% model size reduction
- <2% accuracy loss
(Enables edge-device deployment; Technical White Paper, Chapter 4.2)
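
For context on the quantization claim, here's a minimal sketch of symmetric per-tensor int8 weight quantization (the tensor shape and scaling scheme are illustrative assumptions on my part, not DeepSeek's actual method):

```python
# Minimal sketch of symmetric per-tensor 8-bit weight quantization.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((1024, 1024)).astype(np.float32)

scale = np.abs(weights).max() / 127.0          # map the widest value onto the int8 range
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
deq = q.astype(np.float32) * scale             # dequantize for use at inference time

orig_bytes = weights.nbytes                    # float32 storage
quant_bytes = q.nbytes + 4                     # int8 storage + one float32 scale
print(f"size reduction: {1 - quant_bytes / orig_bytes:.1%}")   # ~75% vs. float32
print(f"max abs error:  {np.abs(weights - deq).max():.4f}")
```

Note that plain int8 only buys about 75% versus float32 storage (and about 50% versus float16), so an 83% reduction would imply sub-8-bit precision or extra compression on top, which fits the alternative translation below.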
Others translate it as 'sub-8-bit' quantization, which is interesting too.