Large Language Models
- The prototype of the RNN can be traced back to the 1990s and Jeffrey L. Elman's classic paper: Finding Structure in Time (1990)
- Word2Vec, proposed by Google in 2013, is probably one of the best-known embedding techniques
- The encoder-decoder architecture comes from the paper: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (2014)
- For the attention mechanism, see the paper: Neural Machine Translation by Jointly Learning to Align and Translate (2014)
- For the technical evolution of LLMs and how they relate to one another, see: Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond (2023)
- The Transformer comes from a classic paper: Attention Is All You Need (2017)
- For comparisons of models outside China, see the model comparison site artificialanalysis.ai; for Chinese models, see the capability quadrant chart provided by superclueai (SuperCLUE).
- Prompt
- For basic prompting techniques, see ChatGPT Prompt Engineering for Developers, a video course produced jointly by Andrew Ng and OpenAI. A minimal prompt sketch combining several of the tips below follows this list.
- Write clear and specific instructions:
- Use delimiters to clearly mark the different parts of the input;
- Ask for a structured output;
- Ask the model to check whether conditions are satisfied;
- Provide a few examples (few-shot).
- Give the model time to think:
- Specify the steps required to complete the task;
- Instruct the model to work out its own solution before jumping to a conclusion.
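The course's tips are easier to see in a concrete prompt. Below is a minimal sketch that combines a delimiter, a structured (JSON) output, a condition check, and one few-shot example; the product-review task, field names, and review text are invented for illustration and are not from the course.

```python
# A minimal, illustrative prompt combining several of the tips above:
# a delimiter, a structured (JSON) output, a condition check, and one few-shot example.
# The product-review task and field names are invented for demonstration.

review = "The headphones arrived quickly, but the left ear cup crackles at high volume."

few_shot_example = (
    "Review: ###Battery died after two days, very disappointed.###\n"
    '{"sentiment": "negative", "issue": "battery life"}'
)

prompt = f"""You will be given a product review delimited by ###.
First check that the text really is a product review; if it is not, return {{"error": "not a review"}}.
Otherwise respond only with JSON containing the keys "sentiment" and "issue".

{few_shot_example}

Review: ###{review}###
"""

print(prompt)  # send this string to whichever chat model you use
```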
- In recent years researchers have continued to study prompting and have proposed a great many prompt-optimization ideas for different tasks. A few common methods are introduced below:
- CoT (Chain of Thought): guide the model to lay out its intermediate reasoning steps so that it handles complex problems better, with more transparent and accurate reasoning. This resembles how a human decomposes a problem step by step to reach the final answer (a small illustration follows this list).
- ToT (Tree of Thoughts): each node in the tree represents a reasoning step. Unlike CoT's linear reasoning path, ToT builds a tree to explore multiple candidate paths, which suits multi-step decision-making and complex reasoning tasks and covers the problem space more thoroughly.
- ReAct (Reasoning and Acting): the model alternates between reasoning (generating text) and acting (e.g., querying external information through an API), producing human-like task-solving trajectories. This reduces hallucination and makes question answering and decision-making more trustworthy.
- For more on the progress and results of prompt-engineering research, see the paper: A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications.
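To make the CoT idea concrete, here is a minimal sketch contrasting a direct prompt, a zero-shot chain-of-thought prompt, and a few-shot chain-of-thought prompt. The word problems and numbers are invented for illustration.

```python
# Minimal contrast between a direct prompt and chain-of-thought prompts.
# The word problems and numbers are invented for illustration.

question = ("A library had 120 books. It lent out 45 and then received "
            "a donation of 30 more. How many books does it have now?")

direct_prompt = f"{question}\nAnswer:"

# Zero-shot CoT: simply ask the model to reason step by step before answering.
cot_prompt = f"{question}\nLet's think step by step, then give the final answer."

# Few-shot CoT: show one worked example so the model imitates the reasoning format.
few_shot_cot_prompt = (
    "Q: A farmer had 15 eggs, sold 9, and the hens laid 4 more. How many eggs now?\n"
    "A: Start with 15. 15 - 9 = 6 after selling. 6 + 4 = 10 after the hens lay more. "
    "The answer is 10.\n\n"
    f"Q: {question}\nA:"
)

print(few_shot_cot_prompt)
```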
- Fine-tuning
- Fine-tuning means further training a pretrained LLM so that it performs better on a specific task. Fine-tuning also serves a second purpose: making the LLM's output consistent with human expectations and values, which is commonly called alignment.
- Full Fine-Tuning: update all parameters of the model. This is common and effective, but its drawback is that it consumes far more training resources.
- LoRA (Low-Rank Adaptation): with the base model's parameters frozen, the weight update is expressed as a low-rank decomposition (roughly, a small bypass model attached alongside the original). Only the low-rank parameters are updated during training, so training is faster, though the result may fall short of full fine-tuning (see the sketch after this list).
- RLHF (Reinforcement Learning from Human Feedback): a reinforcement-learning approach. A reward model (RM) is first trained on preference data (users or experts rank or score candidate answers generated by the model), and the policy is then optimized with Proximal Policy Optimization (PPO).
- DPO (Direct Preference Optimization): an alignment alternative to RLHF that needs no separate reward model; the policy is optimized directly on the preference data.
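A minimal sketch of the low-rank bypass described in the LoRA item above, assuming PyTorch. The class name, rank r, and scaling alpha are illustrative choices of mine, not the API of any particular fine-tuning library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank bypass:
    y = base(x) + (x @ A.T @ B.T) * (alpha / r)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # down-projection
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))        # up-projection, zero init: bypass starts as a no-op
        self.scaling = alpha / r  # standard LoRA scaling factor

    def forward(self, x):
        # frozen path plus low-rank bypass; only lora_A / lora_B receive gradients
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Example: only the two small matrices are trainable during fine-tuning.
layer = LoRALinear(nn.Linear(768, 768))
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
print(trainable)  # -> ['lora_A', 'lora_B']
```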
- Agent
- For the definition and role of agents in the LLM setting, see LLM Powered Autonomous Agents, an article written by OpenAI researcher Lilian Weng. A toy agent-loop sketch follows.
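To make the reasoning-and-acting loop concrete, here is a self-contained toy sketch of an agent step loop. The lookup tool, the scripted call_llm stand-in, and the population figure are all invented so the example runs without any API key; a real agent would replace call_llm with an actual model call.

```python
import re

# A toy "tool" the agent can call; in a real agent this might be a search API.
def lookup(city: str) -> str:
    populations = {"Reykjavik": "about 140,000"}  # invented lookup table for the demo
    return populations.get(city, "unknown")

# Stand-in for a real model call; it replays a scripted ReAct-style trajectory
# so the loop below is runnable without any API key.
SCRIPT = iter([
    "Thought: I need the population of Reykjavik.\nAction: lookup[Reykjavik]",
    "Thought: I now know the answer.\nFinal Answer: Reykjavik has about 140,000 residents.",
])
def call_llm(prompt: str) -> str:
    return next(SCRIPT)

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = call_llm(transcript)            # reasoning step (Thought / Action)
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[1].strip()
        match = re.search(r"Action: (\w+)\[(.+?)\]", reply)
        if match:                               # acting step: run the tool, feed back an Observation
            tool, arg = match.groups()
            observation = lookup(arg) if tool == "lookup" else "unknown tool"
            transcript += f"\nObservation: {observation}"
    return "no answer within step budget"

print(react_loop("What is the population of Reykjavik?"))
```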
- RAG
- Function Calling
Papers:
- Finding Structure in Time(1990)
- Long Short-Term Memory(1997)
- Learning to Forget: Continual Prediction with LSTM(2000)
- Efficient Estimation of Word Representations in Vector Space(2013)
- A Survey of Text Representation and Embedding Techniques in NLP(2023)
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation(2014)
- Sequence to Sequence Learning with Neural Networks(2014)
- Neural Machine Translation by Jointly Learning to Align and Translate(2014)
- Attention Is All You Need(2017)
- Improving Language Understanding by Generative Pre-Training(2018)
- Language Models are Unsupervised Multitask Learners(2019)
- Language Models are Few-Shot Learners(2020)
- Training language models to follow instructions with human feedback(2022)
- GPT-4 Technical Report(2023)
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond(2023)
- The Prompt Report: A Systematic Survey of Prompting Techniques(2024)
- Proximal Policy Optimization Algorithms(2017)
- Scalable agent alignment via reward modeling: a research direction(2018)
- LoRA: Low-Rank Adaptation of Large Language Models(2021)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model(2023)
- Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment(2024)
- A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More(2024)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks(2020)
- MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning(2022)
- TALM: Tool Augmented Language Models(2022)
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models(2022)
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models(2023)
- ReAct: Synergizing Reasoning and Acting in Language Models(2022)
- Reflexion: Language Agents with Verbal Reinforcement Learning(2023)
- Chain of Hindsight Aligns Language Models with Feedback(2023)
Articles:
- LLM Powered Autonomous Agents