Survey notes
A Survey of Large Language Models
- PLM (Pre-trained Language Models)
ELMo: pre-trains a bidirectional LSTM, then fine-tunes for downstream tasks (context-aware)
BERT: pre-trains on large-scale unlabeled corpora, then fine-tunes for downstream tasks (context-aware); see the fill-mask sketch below
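A minimal sketch of BERT's context-aware masked-language-model behavior, assuming the Hugging Face transformers library (with a backend such as PyTorch installed) and the public bert-base-uncased checkpoint; the example sentences are made up for illustration:

```python
# Minimal sketch: BERT's masked-LM head predicting a masked token from
# bidirectional context. Assumes `pip install transformers torch` and the
# public bert-base-uncased checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Different surrounding contexts lead to different predictions for the
# masked position, illustrating the context-aware representations noted above.
for text in [
    "The doctor prescribed some [MASK] for the infection.",
    "She deposited the check at the [MASK].",
]:
    top = fill_mask(text, top_k=1)[0]
    print(f"{text} -> {top['token_str']} (score={top['score']:.2f})")
```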
Papers on improving the pre-training approach:
"RoBERTa: A Robustly Optimized BERT Pretraining Approach" (CoRR)
"Multitask Prompted Training Enables Zero-Shot Task Generalization" (ICLR 2022)
"What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization?"
- LLM (Large Language Models)
Scaling up PLMs yields surprisingly strong results, following predictable scaling laws: "Scaling Laws for Neural Language Models" (see the formula below); for the resulting capability jumps, see "Emergent Abilities of Large Language Models"
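For reference, the parameter-count scaling law from "Scaling Laws for Neural Language Models" (Kaplan et al.) takes a power-law form; the fitted constants below are the values reported in that paper for their setup and should be read as illustrative, not universal:

```latex
% Test loss as a function of non-embedding parameter count N,
% when data and compute are not bottlenecks (Kaplan et al., 2020):
L(N) = \left( \frac{N_c}{N} \right)^{\alpha_N},
\qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}
```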
Question: how do LLMs attain their superior abilities?
Article: "How Does GPT Obtain Its Ability? Tracing Emergent Abilities of Language Models to Their Sources"
Emergent abilities
1. In-context learning: the model performs a new task from a few input-output demonstrations placed directly in the prompt, with no parameter updates
2. Instruction following: the model carries out tasks described only by natural-language instructions (more detailed prompts), without explicit demonstrations; e.g., LaMDA-PT
3. Step-by-step reasoning: handles complex problems (such as mathematical reasoning) by generating a chain of intermediate reasoning steps (chain-of-thought prompting); prompt sketches for all three follow this list
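A minimal sketch of what the three prompting styles look like as raw prompts; the tasks (English-to-French translation, a small arithmetic word problem) and wording are hypothetical examples for illustration, not taken from the survey:

```python
# Illustrative prompt templates for the three emergent abilities above.

# 1. In-context learning: a few demonstrations, then a new query; the model
#    is expected to continue the pattern without any fine-tuning.
icl_prompt = (
    "sea otter -> loutre de mer\n"
    "cheese -> fromage\n"
    "peppermint -> menthe poivree\n"
    "plush giraffe ->"
)

# 2. Instruction following: no demonstrations, just a natural-language task
#    description (the format that instruction tuning teaches models to obey).
instruction_prompt = "Translate the following English phrase to French: plush giraffe"

# 3. Step-by-step (chain-of-thought) reasoning: elicit intermediate steps so
#    the model decomposes a multi-step problem before giving the answer.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Let's think step by step."
)

for name, prompt in [("in-context", icl_prompt),
                     ("instruction", instruction_prompt),
                     ("chain-of-thought", cot_prompt)]:
    print(f"--- {name} ---\n{prompt}\n")
```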