The three core cognitive paradigms in the design of current LLM-based agents:

  • ReAct (Reasoning + Acting)
  • Plan-and-Solve (P&S)
  • Self-Reflection (Reflexion)

1. ReAct (Reasoning + Acting)

Core Concept

ReAct was proposed by researchers at Google and Princeton University. Its core idea is to have the LLM alternate between reasoning (Thought) and acting (Action):

  • Thought: helps the model clarify the current goal, analyze the environment, and track state.
  • Action: lets the model interact with the external environment through tools (e.g., a search engine or calculator).
  • Observation: the result of the action, fed back to the model.

Process Details

  1. Receive the task: take the user's instruction as input.
  2. Reasoning loop:
    • The model generates a Thought: analyze current progress and decide what to do next.
    • The model generates an Action: specify the tool to call and its arguments.
    • The environment executes the Action and returns an Observation.
    • The Thought, Action, and Observation are all appended to the context, and the loop repeats until the task is complete.
  3. Output: finally emit the Final Answer.

Pseudocode Implementation (Python)

import logging
from typing import Dict

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ReActAgent")

class Tool:
    """Mock tool class for demonstration"""
    def call(self, action_input: str) -> str:
        # Simulate tool execution
        return f"Result of {action_input}"

class ReActAgent:
    def __init__(self, model, tools: Dict[str, Tool]):
        self.model = model
        self.tools = tools
        self.max_steps = 5

    def run(self, query: str) -> str:
        """
        Execute the ReAct loop: Thought -> Action -> Observation
        """
        context = f"Question: {query}\n"
        
        for step in range(self.max_steps):
            logger.info(f"Step {step + 1}: Generating Thought and Action")
            
            # 1. Generate Thought and Action based on current context
            response = self.model.generate(context + "Thought:")
            
            if "Final Answer:" in response:
                logger.info("Task completed.")
                return response.split("Final Answer:")[-1].strip()

            # 2. Parse action (Assume format: Action: [tool_name], Action Input: [input])
            try:
                tool_name, action_input = self._parse_action(response)
                logger.info(f"Executing Action: {tool_name} with input: {action_input}")
                
                # 3. Execute tool and get observation
                observation = self.tools[tool_name].call(action_input)
                
                # 4. Update context for the next iteration
                context += f"{response}\nObservation: {observation}\n"
            except Exception as e:
                logger.error(f"Failed to execute action: {e}")
                context += f"\nError: {str(e)}. Try a different approach.\n"

        return "Failed to reach a conclusion within max steps."

    def _parse_action(self, text: str) -> tuple[str, str]:
        # Implementation of parsing logic (regex or string split)
        # Placeholder logic
        return "Search", "Python Agent patterns"
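The `_parse_action` placeholder above is left as a stub. A minimal sketch of the parsing logic, assuming the model emits `Action:` and `Action Input:` lines in the standard ReAct text format (the exact layout is an assumption; real model output varies):

```python
import re

def parse_action(text: str) -> tuple[str, str]:
    """Extract the tool name and tool input from a ReAct-style response.

    Expects lines like:
        Action: Search
        Action Input: Python Agent patterns
    """
    tool_match = re.search(r"Action:\s*(\w+)", text)
    input_match = re.search(r"Action Input:\s*(.+)", text)
    if not (tool_match and input_match):
        raise ValueError(f"Could not parse action from: {text!r}")
    return tool_match.group(1), input_match.group(1).strip()

response = "Thought: I need to look this up.\nAction: Search\nAction Input: Python Agent patterns"
print(parse_action(response))  # ('Search', 'Python Agent patterns')
```

Raising on a parse failure lets the `except` branch in `run` feed the error back into the context, which in practice nudges the model to reformat its next action.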

2. Plan-and-Solve (P&S)

Core Concept

Plan-and-Solve addresses the tendency of Zero-shot Chain-of-Thought (CoT) to drift off course or skip steps on complex tasks. It splits decision-making into two stages:

  1. Planning: decompose the complex task into a series of small subtasks (sub-plans).
  2. Solving: execute the subtasks one by one according to the plan.

Process Details

  1. Planning stage: prompt the model with "Let's first devise a plan" to generate a list of steps $S_1, S_2, ..., S_n$.
  2. Execution stage:
    • Maintain a state tracker.
    • Iterate over the steps, calling the model or a tool to solve each specific step.
    • Aggregate the per-step results into the final solution.

Pseudocode Implementation (Python)

from typing import List
import logging

logger = logging.getLogger("PlanAndSolveAgent")

class PlanAndSolveAgent:
    def __init__(self, model):
        self.model = model

    def run(self, complex_task: str) -> str:
        """
        Two-stage process: Plan then Execute
        """
        # Phase 1: Planning
        logger.info("Phase 1: Generating global plan")
        plan_prompt = f"Task: {complex_task}\nBreak this down into logical steps."
        plan_raw = self.model.generate(plan_prompt)
        steps = self._extract_steps(plan_raw)

        # Phase 2: Execution
        logger.info(f"Phase 2: Executing {len(steps)} steps")
        results = []
        for i, step in enumerate(steps):
            logger.info(f"Executing step {i+1}: {step}")
            execution_prompt = f"Task: {complex_task}\nPlan: {plan_raw}\nNow, execute Step {i+1}: {step}"
            step_result = self.model.generate(execution_prompt)
            results.append(step_result)

        # Phase 3: Synthesis
        logger.info("Synthesizing final answer")
        final_prompt = f"Based on these steps: {results}, provide the final answer."
        return self.model.generate(final_prompt)

    def _extract_steps(self, plan_text: str) -> List[str]:
        # Simple split logic for demo
        return [s.strip() for s in plan_text.split('\n') if s.strip()]
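The `_extract_steps` helper above keeps every non-empty line verbatim, numbering included. A slightly more robust sketch that strips common `1.` / `Step 1:` prefixes before execution (the plan's line format is an assumption; real model output varies):

```python
import re

def extract_steps(plan_text: str) -> list[str]:
    """Pull clean step descriptions out of a numbered plan."""
    steps = []
    for line in plan_text.split("\n"):
        line = line.strip()
        if not line:
            continue
        # Drop leading "1." / "2)" / "Step 3:" style prefixes
        cleaned = re.sub(r"^(?:Step\s*)?\d+[.:)]\s*", "", line, flags=re.IGNORECASE)
        if cleaned:
            steps.append(cleaned)
    return steps

plan = "1. Parse the input\n2. Compute totals\nStep 3: Format the report"
print(extract_steps(plan))  # ['Parse the input', 'Compute totals', 'Format the report']
```

Normalizing the prefixes matters because the execution prompt interpolates each step back into the context; stale numbering can confuse the model about which step it is on.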

3. Self-Reflection (Reflexion)

Core Concept

Self-Reflection introduces a closed-loop feedback mechanism: instead of producing a one-shot output, the agent reviews its own answer, looks for errors, and corrects itself. Reflexion is the canonical framework:

  • Actor: generates an attempt.
  • Evaluator: scores the attempt or judges success/failure.
  • Reflector: analyzes why the attempt failed and produces a "lesson learned" stored in long-term memory.

Process Details

  1. First attempt: the Actor performs the task.
  2. Evaluation: the Evaluator judges whether the output meets expectations.
  3. Reflection: on failure, the model generates a textual reflection from the (input, failed output, reward signal) triple.
  4. Iteration: on the next attempt, the model adds the previous reflections as extra context to avoid repeating the same mistakes.

Pseudocode Implementation (Python)

import logging

logger = logging.getLogger("ReflectionAgent")

class ReflectionAgent:
    def __init__(self, actor_model, reflector_model):
        self.actor = actor_model
        self.reflector = reflector_model
        self.memory = [] # Store past reflections

    def run(self, task: str, max_iterations: int = 3) -> str:
        """
        Iterative loop: Attempt -> Evaluate -> Reflect -> Improve
        """
        current_attempt = ""
        
        for i in range(max_iterations):
            logger.info(f"Iteration {i+1}: Attempting task")
            
            # Include past reflections in context
            reflection_context = "\n".join(self.memory)
            prompt = f"Task: {task}\nPast lessons: {reflection_context}\nAnswer:"
            
            current_attempt = self.actor.generate(prompt)
            
            # Evaluate (In real scenarios, use unit tests or another LLM)
            is_correct, feedback = self._evaluate(current_attempt)
            
            if is_correct:
                logger.info("Evaluation passed.")
                return current_attempt
            
            # Generate reflection on failure
            logger.warning(f"Attempt failed. Generating reflection. Feedback: {feedback}")
            reflection_prompt = f"Task: {task}\nFailed Answer: {current_attempt}\nFeedback: {feedback}\nWhat went wrong?"
            reflection = self.reflector.generate(reflection_prompt)
            
            # Store learning in memory
            self.memory.append(f"Attempt {i+1} Lesson: {reflection}")

        return current_attempt

    def _evaluate(self, result: str) -> tuple[bool, str]:
        # Mock evaluation logic
        # Returns (Success, Feedback string)
        if "Correct Keyword" in result:
            return True, "Perfect"
        return False, "Missing specific technical details"
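To see the reflection loop change behavior across iterations, here is a self-contained toy run with scripted stand-in models (`ScriptedActor`, `ScriptedReflector`, and all canned strings are illustrative, not real APIs). The actor is rigged to fail until a lesson appears in its prompt, mirroring how stored reflections steer later attempts:

```python
class ScriptedActor:
    """Toy actor: succeeds only once a past lesson appears in the prompt."""
    def generate(self, prompt: str) -> str:
        if "Lesson:" in prompt:
            return "Answer with Correct Keyword"
        return "Vague answer"

class ScriptedReflector:
    """Toy reflector: always returns the same canned lesson."""
    def generate(self, prompt: str) -> str:
        return "Include the Correct Keyword next time."

def run_reflexion(task: str, max_iterations: int = 3) -> tuple[str, int]:
    actor, reflector = ScriptedActor(), ScriptedReflector()
    memory = []
    attempt = ""
    for i in range(max_iterations):
        prompt = f"Task: {task}\nPast lessons: {' '.join(memory)}\nAnswer:"
        attempt = actor.generate(prompt)
        if "Correct Keyword" in attempt:  # stand-in for the Evaluator
            return attempt, i + 1
        reflection = reflector.generate(f"Failed: {attempt}")
        memory.append(f"Attempt {i+1} Lesson: {reflection}")
    return attempt, max_iterations

answer, iterations = run_reflexion("demo task")
print(answer, iterations)  # succeeds on the second attempt
```

The first attempt fails, a lesson is written to memory, and the second attempt passes — the same convergence pattern the `ReflectionAgent` above aims for with real models.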

Summary Comparison

| Paradigm | Core Focus | Suitable Scenarios | Limitations |
| --- | --- | --- | --- |
| ReAct | Real-time interaction, dynamic adjustment | Search, database queries, tasks needing frequent external feedback | Context overhead grows with step count; prone to infinite loops |
| Plan-and-Solve | Structured decomposition, global view | Complex math problems, multi-step data processing, long-document generation | If the initial plan is wrong, subsequent execution can fail |
| Self-Reflection | Self-optimization, error correction | Coding tasks, logically rigorous reasoning, multi-round games | Depends on Evaluator accuracy; repeated calls raise cost |