Model Principles
Understand, reproduce, innovate.
Some resources:
Landmark models and principles:
- Attention: 17-06 Transformer
- Contextual (dynamic) embeddings: 18-02 ELMo
- Encoder-only: 18-10 BERT, 19-07 RoBERTa
- Encoder-decoder: 19-10 T5, 19-10 BART
- Scaling Law: 20-01 Scaling Laws for Neural Language Models
- Decoder-only: 20-05 GPT-3
- MoE: 91-03 Adaptive Mixtures of Local Experts, 24-01 DeepSeekMoE
- Fine-tuning
- RLHF and related alignment methods (DPO, GRPO)
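
As a starting point for the "reproduce" step, the Attention entry above can be sketched in a few lines. This is a minimal pure-Python illustration of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as described in the Transformer paper. The function names and the list-of-lists layout are assumptions made for this sketch; it leaves out masking, multiple heads, and learned projections.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Illustrative single-head attention without masking.

    queries: list of vectors of length d_k
    keys:    list of vectors of length d_k
    values:  list of vectors of length d_v, one per key
    Returns one output vector of length d_v per query.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(d_v)
        ]
        outputs.append(out)
    return outputs
```

With an all-zero query, every key scores the same, so the output is simply the mean of the value vectors. That makes a convenient sanity check when reproducing the mechanism by hand.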
