Chapter 7: Multi-Agent Collaboration

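While a monolithic agent architecture can be effective for well-defined problems, its capabilities are often constrained when it faces complex, multi-domain tasks. The Multi-Agent Collaboration pattern addresses these limitations by structuring a system as a cooperative ensemble of distinct, specialized agents. The approach rests on the principle of task decomposition: a high-level objective is broken down into discrete sub-problems, and each sub-problem is assigned to the agent whose tools, data access, or reasoning capabilities suit it best.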

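For example, a complex research query might be decomposed and assigned to a Research Agent for information retrieval, a Data Analysis Agent for statistical processing, and a Synthesis Agent for generating the final report. The efficacy of such a system does not come from the division of labor alone; it depends critically on the mechanisms for inter-agent communication. This requires a standardized communication protocol and a shared ontology, so that agents can exchange data, delegate sub-tasks, and coordinate their actions to keep the final output coherent.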

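This distributed architecture offers several advantages, including enhanced modularity, scalability, and robustness, since the failure of a single agent does not necessarily cause the whole system to fail. Collaboration also produces a synergistic effect: the collective performance of the multi-agent system can surpass what any single agent in the ensemble could achieve on its own.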
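As a rough illustration of task decomposition over a shared message format, the sketch below routes one research query through three stand-in agents. Everything in it is hypothetical and chosen for illustration: the Message fields, the agent functions, and run_decomposed are not part of any framework, and each "agent" is a plain function where a real system would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    """A minimal shared message format (hypothetical, for illustration only)."""
    sender: str
    recipient: str
    task: str
    payload: dict = field(default_factory=dict)


# Each "agent" is a plain function standing in for an LLM-backed specialist.
def research_agent(msg: Message) -> dict:
    return {"sources": [f"source about {msg.task}"]}


def analysis_agent(msg: Message) -> dict:
    return {"n_sources": len(msg.payload["sources"])}


def synthesis_agent(msg: Message) -> dict:
    return {"report": f"{msg.task}: based on {msg.payload['n_sources']} source(s)"}


AGENTS: Dict[str, Callable[[Message], dict]] = {
    "research": research_agent,
    "analysis": analysis_agent,
    "synthesis": synthesis_agent,
}


def run_decomposed(task: str, plan: List[str]) -> dict:
    """Send one message per sub-problem, carrying earlier results in the payload."""
    payload: dict = {}
    sender = "user"
    for name in plan:
        msg = Message(sender=sender, recipient=name, task=task, payload=dict(payload))
        payload.update(AGENTS[name](msg))
        sender = name
    return payload


result = run_decomposed("AI trends", ["research", "analysis", "synthesis"])
print(result["report"])
```

Because every agent reads and writes the same Message structure, an agent can be swapped out without touching the others, which is the modularity benefit described above.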

Multi-Agent Collaboration Pattern Overview

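The Multi-Agent Collaboration pattern involves designing systems in which multiple independent or semi-independent agents work together to achieve a common goal. Each agent typically has a defined role, specific goals aligned with the overall objective, and potentially access to different tools or knowledge bases. The power of this pattern lies in the interaction and synergy among these agents.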

Collaboration can take various forms:

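Sequential Handoffs: One agent completes a task and passes its output to another agent for the next step in a pipeline (similar to the Planning pattern, but explicitly involving different agents).
Parallel Processing: Multiple agents work on different parts of a problem simultaneously, and their results are later combined.
Debate and Consensus: Agents with varied perspectives and information sources engage in discussion to evaluate options, ultimately reaching a consensus or a more informed decision.
Hierarchical Structures: A manager agent might delegate tasks to worker agents dynamically, based on their tool access or plugin capabilities, and synthesize their results. Each agent can also handle a relevant group of tools, rather than a single agent handling all the tools.
Expert Teams: Agents with specialized knowledge in different domains (e.g., a researcher, a writer, an editor) collaborate to produce a complex output.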

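Critic-Reviewer: Agents create initial outputs such as plans, drafts, or answers. A second group of agents then critically assesses this output for adherence to policies, security, compliance, correctness, quality, and alignment with organizational objectives. The original creator or a final agent revises the output based on this feedback. This pattern is particularly effective for code generation, research writing, logic checking, and ensuring ethical alignment. Its advantages include increased robustness, improved quality, and a reduced likelihood of hallucinations or errors.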
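The Critic-Reviewer form can be sketched as a bounded revise-and-review loop. The example below is a minimal illustration under stated assumptions: creator and reviewer are hypothetical stand-ins for LLM-backed agents, and the single "cite sources" rule is an invented review policy.

```python
from typing import List, Tuple


def creator(task: str, feedback: List[str]) -> str:
    """Stand-in for a drafting agent: notes each feedback item it has addressed."""
    draft = f"Draft for: {task}"
    for item in feedback:
        draft += f" [addressed: {item}]"
    return draft


def reviewer(draft: str) -> List[str]:
    """Stand-in for a critic agent: returns open issues (an empty list means approved)."""
    issues = []
    if "cite sources" not in draft:
        issues.append("cite sources")
    return issues


def critic_review_loop(task: str, max_rounds: int = 3) -> Tuple[str, int]:
    """Alternate drafting and review until approval or until max_rounds is reached."""
    feedback: List[str] = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = creator(task, feedback)
        issues = reviewer(draft)
        if not issues:
            return draft, round_no
        feedback.extend(issues)
    return draft, max_rounds


final_draft, rounds = critic_review_loop("blog post")
print(rounds, final_draft)
```

The max_rounds cap matters in practice: without it, a creator and a reviewer that never agree would loop indefinitely.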

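A multi-agent system (see Fig. 1) fundamentally comprises the delineation of agent roles and responsibilities, the establishment of communication channels through which agents exchange information, and the formulation of a task flow or interaction protocol that directs their collaborative efforts.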

Fig. 1: Example of a multi-agent system

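Frameworks such as CrewAI and Google ADK are engineered to facilitate this paradigm by providing structures for specifying agents, tasks, and their interaction procedures. The approach is particularly effective for challenges that require a variety of specialized knowledge, span multiple discrete phases, or benefit from concurrent processing and from sharing information across agents.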

Practical Applications & Use Cases

Multi-Agent Collaboration is a powerful pattern applicable across numerous domains:

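Complex Research and Analysis: A team of agents could collaborate on a research project. One agent might specialize in searching academic databases, another in summarizing findings, a third in identifying trends, and a fourth in synthesizing the information into a report. This mirrors how a human research team might operate.
Software Development: Imagine agents collaborating on building software. One agent could be a requirements analyst, another a code generator, a third a tester, and a fourth a documentation writer. They could pass outputs to one another to build and verify components.
Creative Content Generation: Creating a marketing campaign could involve a market research agent, a copywriter agent, a graphic design agent (using image generation tools), and a social media scheduling agent, all working together.
Financial Analysis: A multi-agent system could analyze financial markets. Agents might specialize in fetching stock data, analyzing news sentiment, performing technical analysis, and generating investment recommendations.
Customer Support Escalation: A front-line support agent could handle initial queries and escalate complex issues to a specialist agent (e.g., a technical expert or a billing specialist) when needed, demonstrating a sequential handoff based on problem complexity.
Supply Chain Optimization: Agents could represent different nodes in a supply chain (suppliers, manufacturers, distributors) and collaborate to optimize inventory levels, logistics, and scheduling in response to changing demand or disruptions.
Network Analysis & Remediation: Autonomous operations benefit greatly from an agentic architecture, particularly in failure pinpointing. Multiple agents can collaborate to triage and remediate issues and to suggest optimal actions. These agents can also integrate with traditional machine learning models and tooling, leveraging existing systems while offering the advantages of generative AI.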

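The capacity to delineate specialized agents and meticulously orchestrate their interrelationships lets developers build systems with greater modularity and scalability, and address complexities that would prove insurmountable for a single, integrated agent.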

Multi-Agent Collaboration: Exploring Interrelationships and Communication Structures

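Understanding the intricate ways in which agents interact and communicate is fundamental to designing effective multi-agent systems. As depicted in Fig. 2, a spectrum of interrelationship and communication models exists, ranging from the simplest single-agent scenario to complex, custom-designed collaborative frameworks. Each model presents unique advantages and challenges that influence the overall efficiency, robustness, and adaptability of the multi-agent system.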

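1. Single Agent: At the most basic level, a single agent operates autonomously, without direct interaction or communication with other entities. While this model is straightforward to implement and manage, its capabilities are inherently limited by the individual agent's scope and resources. It suits tasks that can be decomposed into independent sub-problems, each solvable by a single, self-sufficient agent.

2. Network: The Network model represents a significant step toward collaboration, in which multiple agents interact directly with one another in a decentralized fashion. Communication typically occurs peer-to-peer, allowing agents to share information, resources, and even tasks. This model fosters resilience, since the failure of one agent does not necessarily cripple the entire system. However, managing communication overhead and ensuring coherent decision-making can be challenging in a large, unstructured network.

3. Supervisor: In the Supervisor model, a dedicated agent, the supervisor, oversees and coordinates the activities of a group of subordinate agents. The supervisor acts as a central hub for communication, task allocation, and conflict resolution. This hierarchical structure offers clear lines of authority and can simplify management and control. However, it introduces a single point of failure (the supervisor), which can also become a bottleneck if it is overwhelmed by a large number of subordinates or by complex tasks.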

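4. Agent as a Tool: This model is a nuanced extension of the Supervisor concept, in which the coordinating agent's role is less about direct command and control and more about providing resources, guidance, or analytical support to other agents. It might offer tools, data, or computational services that let other agents perform their tasks more effectively, without dictating their every action. The aim is to leverage an agent's capabilities without imposing rigid top-down control.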

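5. Hierarchical: The Hierarchical model expands on the Supervisor concept to create a multi-layered organizational structure. It involves multiple levels of supervisors, with higher-level supervisors overseeing lower-level ones and a collection of operational agents at the lowest tier. This structure is well suited to complex problems that can be decomposed into sub-problems, each managed by a specific layer of the hierarchy. It provides a structured approach to scalability and complexity management, allowing distributed decision-making within defined boundaries.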

Fig. 2: Agents communicate and interact in various ways.

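6. Custom: The Custom model represents the greatest flexibility in multi-agent system design. It allows the creation of unique interrelationship and communication structures tailored precisely to the requirements of a given problem or application. This can involve hybrid approaches that combine elements of the previously mentioned models, or entirely novel designs that emerge from the unique constraints and opportunities of the environment. Custom models often arise from the need to optimize for specific performance metrics, to handle highly dynamic environments, or to incorporate domain-specific knowledge into the system's architecture. Designing and implementing a custom model typically requires a deep understanding of multi-agent system principles and careful consideration of communication protocols, coordination mechanisms, and emergent behaviors.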

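In summary, the choice of interrelationship and communication model is a critical design decision for a multi-agent system. Each model has distinct advantages and disadvantages, and the best choice depends on factors such as the complexity of the task, the number of agents, the desired level of autonomy, the need for robustness, and the acceptable communication overhead. Future work on multi-agent systems will likely continue to explore and refine these models, and to develop new paradigms for collaborative intelligence.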
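To make the Supervisor model concrete, here is a minimal sketch of a central hub that allocates each request to a worker. The Supervisor class, its keyword routing, and the worker names are hypothetical choices for illustration; a real supervisor would typically ask an LLM to pick the worker.

```python
from typing import Callable, Dict

Worker = Callable[[str], str]


class Supervisor:
    """Central hub: picks a worker for each request and returns its result."""

    def __init__(self, workers: Dict[str, Worker], default: str):
        self.workers = workers
        self.default = default

    def route(self, request: str) -> str:
        """Naive keyword routing, used here in place of an LLM decision."""
        text = request.lower()
        for name in self.workers:
            if name in text:
                return name
        return self.default

    def handle(self, request: str) -> str:
        name = self.route(request)
        return f"[{name}] {self.workers[name](request)}"


# Hypothetical workers; each stands in for a specialized agent.
workers: Dict[str, Worker] = {
    "billing": lambda request: "refund issued",
    "technical": lambda request: "restart the router",
    "general": lambda request: "how can I help?",
}

supervisor = Supervisor(workers, default="general")
print(supervisor.handle("I have a billing question"))
print(supervisor.handle("Hello"))
```

Every request passes through handle, which is what makes the supervisor both a convenient control point and the single point of failure noted above. In a Network model the workers would instead call one another directly.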

Hands-On Code Example (CrewAI)

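The following Python code uses the CrewAI framework to define an AI-powered crew that generates a blog post about AI trends. It first sets up the environment, loading the API key from a .env file. The core of the application defines two agents: a researcher that finds and summarizes AI trends, and a writer that creates a blog post from the research.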

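Two tasks are defined accordingly: one for researching the trends and one for writing the blog post, with the writing task depending on the output of the research task. The agents and tasks are then assembled into a crew that runs with a sequential process, so the tasks execute in order. The language model used is gemini-2.0-flash. The main function runs the crew by calling its kickoff() method, which orchestrates the collaboration between the agents to produce the desired output. Finally, the code prints the result of the crew's execution, which is the generated blog post.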

import os
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, Process
from langchain_google_genai import ChatGoogleGenerativeAI

def setup_environment():
    """Loads environment variables and checks for the required API key."""
    load_dotenv()
    if not os.getenv("GOOGLE_API_KEY"):
        raise ValueError("GOOGLE_API_KEY not found. Please set it in your .env file.")

def main():
    """Initializes and runs the AI crew for content creation using the latest Gemini model."""
    setup_environment()
    # Define the language model to use.
    # "gemini-2.0-flash" is used here; another Gemini model such as
    # "gemini-2.5-flash" could be substituted.
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
    # Define Agents with specific roles and goals.
    # The model is attached to each agent, because Crew has no documented
    # top-level `llm` argument.
    researcher = Agent(
        role='Senior Research Analyst',
        goal='Find and summarize the latest trends in AI.',
        backstory="You are an experienced research analyst with a knack for identifying key trends and synthesizing information.",
        llm=llm,
        verbose=True,
        allow_delegation=False,
    )
    writer = Agent(
        role='Technical Content Writer',
        goal='Write a clear and engaging blog post based on research findings.',
        backstory="You are a skilled writer who can translate complex technical topics into accessible content.",
        llm=llm,
        verbose=True,
        allow_delegation=False,
    )
    # Define Tasks for the agents
    research_task = Task(
        description="Research the top 3 emerging trends in Artificial Intelligence in 2024-2025. Focus on practical applications and potential impact.",
        expected_output="A detailed summary of the top 3 AI trends, including key points and sources.",
        agent=researcher,
    )
    writing_task = Task(
        description="Write a 500-word blog post based on the research findings. The post should be engaging and easy for a general audience to understand.",
        expected_output="A complete 500-word blog post about the latest AI trends.",
        agent=writer,
        context=[research_task],  # The writer receives the research output as context
    )
    # Create the Crew
    blog_creation_crew = Crew(
        agents=[researcher, writer],
        tasks=[research_task, writing_task],
        process=Process.sequential,
        verbose=True  # Detailed execution logs (a boolean in recent CrewAI releases)
    )
    # Execute the Crew
    print("## Running the blog creation crew with Gemini 2.0 Flash... ##")
    try:
        result = blog_creation_crew.kickoff()
        print("\n------------------\n")
        print("## Crew Final Output ##")
        print(result)
    except Exception as e:
        print(f"\nAn unexpected error occurred: {e}")

if __name__ == "__main__":
    main()

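We will now look at further examples within the Google ADK framework, with particular emphasis on hierarchical, parallel, and sequential coordination paradigms, alongside the implementation of an agent as an operational tool.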

Hands-On Code Example (Google ADK)

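The following code example shows how to establish a hierarchical agent structure in Google ADK by creating a parent-child relationship. The code defines two types of agents: LlmAgent and a custom TaskExecutor agent derived from BaseAgent. TaskExecutor is meant for specific, non-LLM tasks; in this example it simply yields a "Task finished successfully." event. An LlmAgent named greeter is initialized with a model and an instruction to act as a friendly greeter. The custom TaskExecutor is instantiated as task_doer. A parent LlmAgent called coordinator is then created, also with a model and instructions. The coordinator's instructions direct it to delegate greetings to greeter and task execution to task_doer. Both greeter and task_doer are added as sub-agents of the coordinator, which establishes the parent-child relationship. The code then asserts that this relationship is set up correctly and finally prints a message indicating that the agent hierarchy was created successfully.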

from typing import AsyncGenerator

from google.adk.agents import LlmAgent, BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event
from google.genai import types

class TaskExecutor(BaseAgent):
    """A specialized agent with custom, non-LLM behavior."""
    name: str = "TaskExecutor"
    description: str = "Executes a predefined task."

    async def _run_async_impl(self, context: InvocationContext) -> AsyncGenerator[Event, None]:
        """Custom implementation logic for the task."""
        # This is where your custom logic would go.
        # For this example, we'll just yield a simple event.
        # Event.content expects a google.genai types.Content object,
        # so the text is wrapped in a Part rather than passed as a plain string.
        yield Event(
            author=self.name,
            content=types.Content(
                role="model",
                parts=[types.Part(text="Task finished successfully.")],
            ),
        )

greeter = LlmAgent(
    name="Greeter",
    model="gemini-2.0-flash-exp",
    instruction="You are a friendly greeter."
)

task_doer = TaskExecutor()

coordinator = LlmAgent(
    name="Coordinator",
    model="gemini-2.0-flash-exp",
    description="A coordinator that can greet users and execute tasks.",
    instruction="When asked to greet, delegate to the Greeter. When asked to perform a task, delegate to the TaskExecutor.",
    sub_agents=[
        greeter,
        task_doer
    ]
)

# The ADK framework automatically establishes the parent-child relationships.
# These assertions will pass if checked after initialization.
assert greeter.parent_agent == coordinator
assert task_doer.parent_agent == coordinator

print("Agent hierarchy created successfully.")

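This code example shows how LoopAgent is used in the Google ADK framework to build an iterative workflow. The code defines two agents: ConditionChecker and ProcessingStep. ConditionChecker is a custom agent that checks the "status" value in the session state. If "status" is "completed", ConditionChecker yields an event with escalate set, which stops the loop. Otherwise, it yields an ordinary event and the loop continues. ProcessingStep is an LlmAgent that uses the "gemini-2.0-flash-exp" model. Its instruction is to perform a task and, if it is the final step, to set the session "status" to "completed". A LoopAgent named StatusPoller is created with max_iterations=10 and contains ProcessingStep and an instance of ConditionChecker as sub-agents. The LoopAgent runs its sub-agents in order for up to 10 iterations and stops early if ConditionChecker finds that the status is "completed".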

from typing import AsyncGenerator

from google.adk.agents import LoopAgent, LlmAgent, BaseAgent
from google.adk.events import Event, EventActions
from google.adk.agents.invocation_context import InvocationContext
from google.genai import types

# Best Practice: Define custom agents as complete, self-describing classes.
class ConditionChecker(BaseAgent):
    """A custom agent that checks for a 'completed' status in the session state."""
    name: str = "ConditionChecker"
    description: str = "Checks if a process is complete and signals the loop to stop."

    async def _run_async_impl(
        self, context: InvocationContext
    ) -> AsyncGenerator[Event, None]:
        """Checks state and yields an event to either continue or stop the loop."""
        status = context.session.state.get("status", "pending")
        is_done = (status == "completed")

        if is_done:
            # Escalate to terminate the loop when the condition is met.
            yield Event(author=self.name, actions=EventActions(escalate=True))
        else:
            # Yield a simple event to continue the loop.
            # Event.content expects a google.genai types.Content object, not a plain string.
            yield Event(
                author=self.name,
                content=types.Content(
                    role="model",
                    parts=[types.Part(text="Condition not met, continuing loop.")],
                ),
            )

# The LlmAgent needs a model and clear instructions.
# Note: an instruction alone does not write to session state. In practice this
# step would also need a tool or an output_key that sets 'status'.
process_step = LlmAgent(
    name="ProcessingStep",
    model="gemini-2.0-flash-exp",
    instruction="You are a step in a longer process. Perform your task. If you are the final step, update session state by setting 'status' to 'completed'."
)

# The LoopAgent orchestrates the workflow.
poller = LoopAgent(
    name="StatusPoller",
    max_iterations=10,
    sub_agents=[
        process_step,
        ConditionChecker()  # Instantiating the well-defined custom agent.
    ]
)

# This poller will now execute 'process_step' and then 'ConditionChecker'
# repeatedly until the status is 'completed' or 10 iterations have passed.
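The control flow of such a loop can be traced without any framework. The sketch below is a framework-free approximation, not ADK code: StepResult, run_loop, and the rule that work finishes on the third pass are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class StepResult:
    """What a step reports back; escalate=True asks the loop to stop."""
    author: str
    escalate: bool = False


Step = Callable[[State], StepResult]


def processing_step(state: State) -> StepResult:
    """Pretends to do work and marks the job completed on the third pass."""
    state["passes"] = state.get("passes", 0) + 1
    if state["passes"] >= 3:
        state["status"] = "completed"
    return StepResult(author="ProcessingStep")


def condition_checker(state: State) -> StepResult:
    """Escalates once the shared state says the work is done."""
    done = state.get("status", "pending") == "completed"
    return StepResult(author="ConditionChecker", escalate=done)


def run_loop(steps: List[Step], state: State, max_iterations: int) -> int:
    """Run the steps in order, repeatedly; stop on escalate or after max_iterations."""
    for iteration in range(1, max_iterations + 1):
        for step in steps:
            if step(state).escalate:
                return iteration
    return max_iterations


loop_state: State = {}
iterations = run_loop([processing_step, condition_checker], loop_state, max_iterations=10)
print(iterations, loop_state["status"])
```

The two exit conditions mirror the configuration above: the checker's escalation is the normal exit, and max_iterations is the safety net if the status never changes.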

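This code example illustrates the SequentialAgent pattern in Google ADK, which is designed for building linear workflows. The code uses the google.adk.agents library to define a sequential pipeline of two agents, step1 and step2. step1 is named "Step1_Fetch", and its output is stored in the session state under the key "data". step2 is named "Step2_Process" and is instructed to analyze the information stored in session.state["data"] and provide a summary. The SequentialAgent named "MyPipeline" orchestrates the execution of these sub-agents. When the pipeline is run with an initial input, step1 executes first and its response is saved to the session state under "data". step2 then executes, using the information that step1 placed in the state. This structure allows one agent's output to become the next agent's input, a common pattern for multi-step AI or data processing pipelines.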

from google.adk.agents import SequentialAgent, Agent

# This agent's output will be saved to session.state["data"]
step1 = Agent(
    name="Step1_Fetch",
    model="gemini-2.0-flash-exp",
    # Placeholder instruction for illustration.
    instruction="Fetch the information the user asks for and return it.",
    output_key="data"
)

# This agent will use the data from the previous step.
# The {data} placeholder is filled from session.state["data"] at run time.
step2 = Agent(
    name="Step2_Process",
    model="gemini-2.0-flash-exp",
    instruction="Analyze the following information and provide a summary: {data}"
)

pipeline = SequentialAgent(
    name="MyPipeline",
    sub_agents=[step1, step2]
)

# When the pipeline is run with an initial input,
# Step1 will execute, its response will be stored in session.state["data"],
# and then Step2 will execute, using the information from the state as instructed.
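The hand-off through shared state can also be shown without the framework. The sketch below is a simplified model of the idea, not ADK's implementation: the Step dataclass, run_pipeline, and the use of str.format to fill {placeholders} are assumptions, and each run function is a stand-in for a model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One pipeline stage; `run` is a stand-in for the model call."""
    name: str
    instruction: str
    run: Callable[[str], str]
    output_key: Optional[str] = None


def run_pipeline(steps: List[Step], user_input: str) -> Dict[str, str]:
    """Run the steps in order, sharing results through a state dictionary."""
    state: Dict[str, str] = {"input": user_input}
    for step in steps:
        # Fill {placeholders} in the instruction from the shared state.
        prompt = step.instruction.format(**state)
        response = step.run(prompt)
        if step.output_key:
            state[step.output_key] = response
    return state


fetch = Step("Step1_Fetch", "Fetch records for: {input}", lambda p: "3 records found", "data")
process = Step("Step2_Process", "Summarize: {data}", lambda p: f"Summary of '{p}'", "summary")

pipeline_state = run_pipeline([fetch, process], "sales")
print(pipeline_state["summary"])
```

Keying each result by name means a later step depends only on the state key, not on which agent produced it, so stages can be reordered or replaced as long as the keys they need are present.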

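The following code example illustrates the ParallelAgent pattern in Google ADK, which supports the concurrent execution of multiple agent tasks. The data_gatherer is designed to run two sub-agents at the same time: weather_fetcher and news_fetcher. The weather_fetcher agent is instructed to get the weather for a given location and stores the result in session.state["weather_data"]. Similarly, the news_fetcher agent is instructed to retrieve the top news story for a given topic and stores it in session.state["news_data"]. Each sub-agent is configured to use the "gemini-2.0-flash-exp" model. The ParallelAgent orchestrates the execution of these sub-agents so that they work in parallel. Once execution completes, the results from weather_fetcher and news_fetcher can be read from the session state under the weather_data and news_data keys.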

from google.adk.agents import Agent, ParallelAgent

# It's better to define the fetching logic as tools for the agents
# For simplicity in this example, we'll embed the logic in the agent's instruction.
# In a real-world scenario, you would use tools.

# Define the individual agents that will run in parallel
weather_fetcher = Agent(
    name="weather_fetcher",
    model="gemini-2.0-flash-exp",
    instruction="Fetch the weather for the given location and return only the weather report.",
    output_key="weather_data"  # The result will be stored in session.state["weather_data"]
)

news_fetcher = Agent(
    name="news_fetcher",
    model="gemini-2.0-flash-exp",
    instruction="Fetch the top news story for the given topic and return only that story.",
    output_key="news_data"      # The result will be stored in session.state["news_data"]
)

# Create the ParallelAgent to orchestrate the sub-agents
data_gatherer = ParallelAgent(
    name="data_gatherer",
    sub_agents=[
        weather_fetcher,
        news_fetcher
    ]
)
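The same fan-out and merge idea can be expressed with Python's standard asyncio library. This is a sketch of the concurrency pattern only, not of ADK internals: the two coroutines are mock fetchers, and the sleep call stands in for network or model latency.

```python
import asyncio
from typing import Dict


async def weather_fetcher(location: str) -> str:
    await asyncio.sleep(0.01)  # simulated I/O latency
    return f"Sunny in {location}"


async def news_fetcher(topic: str) -> str:
    await asyncio.sleep(0.01)  # simulated I/O latency
    return f"Top story about {topic}"


async def gather_data(location: str, topic: str) -> Dict[str, str]:
    """Run both fetchers concurrently and merge their results into one state dict."""
    weather, news = await asyncio.gather(
        weather_fetcher(location),
        news_fetcher(topic),
    )
    return {"weather_data": weather, "news_data": news}


final_state = asyncio.run(gather_data("Paris", "AI"))
print(final_state["weather_data"])
print(final_state["news_data"])
```

Each fetcher writes to its own key, so the two branches never overwrite each other. Giving parallel agents distinct output keys serves the same purpose.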

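The code below demonstrates the "Agent as a Tool" paradigm in Google ADK, which lets one agent use the capabilities of another in a way that resembles a function call. Specifically, the code defines an image generation system using Google's LlmAgent and AgentTool classes. It consists of two agents: a parent artist_agent and a sub-agent image_generator_agent. The generate_image function is a simple tool that simulates image creation and returns mock image data. The image_generator_agent is responsible for using this tool with the text prompt it receives. The artist_agent's role is to first invent a creative image prompt and then call image_generator_agent through an AgentTool wrapper. AgentTool acts as the bridge that allows one agent to use another as a tool. When artist_agent calls image_tool, AgentTool invokes image_generator_agent with the artist's prompt, and image_generator_agent in turn calls generate_image with that prompt. Finally, the generated image (or mock data) is returned back up through the agents. This architecture shows a layered agent system in which a higher-level agent orchestrates lower-level, specialized agents to perform tasks.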

from google.adk.agents import LlmAgent
from google.adk.tools import agent_tool
from google.genai import types

# 1. A simple function tool for the core capability.
# This follows the best practice of separating actions from reasoning.
def generate_image(prompt: str) -> dict:
    """Generates an image based on a textual prompt.

    Args:
        prompt: A detailed description of the image to generate.

    Returns:
        A dictionary with the status and the generated image bytes.
    """
    print(f"TOOL: Generating image for prompt: '{prompt}'")
    # In a real implementation, this would call an image generation API.
    # For this example, we return mock image data.
    mock_image_bytes = b"mock_image_data_for_a_cat_wearing_a_hat"
    return {
        "status": "success",
        # The tool returns the raw bytes, the agent will handle the Part creation.
        "image_bytes": mock_image_bytes,
        "mime_type": "image/png"
    }

# 2. An LlmAgent that owns the generate_image tool.
# It passes the request it receives to the tool as the prompt.
image_generator_agent = LlmAgent(
    name="ImageGen",
    model="gemini-2.0-flash",
    description="Generates an image based on a detailed text prompt.",
    instruction=(
        "You are an image generation specialist. Your task is to take the user's request "
        "and use the `generate_image` tool to create the image. "
        "The user's entire request should be used as the 'prompt' argument for the tool. "
        "After the tool returns the image bytes, you MUST output the image."
    ),
    tools=[generate_image]
)

# 3. Wrap the agent in an AgentTool.
# The parent agent sees the wrapped agent's name and description as the tool's
# name and description, so no separate description is passed here.
image_tool = agent_tool.AgentTool(agent=image_generator_agent)

# 4. The parent agent, which calls the image generator through the AgentTool.
artist_agent = LlmAgent(
    name="Artist",
    model="gemini-2.0-flash",
    instruction=(
        "You are a creative artist. First, invent a creative and descriptive prompt for an image. "
        "Then, use the `ImageGen` tool to generate the image using your prompt."
    ),
    tools=[image_tool]
)
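Stripped of the framework, the wrapping step amounts to exposing an agent behind a function-like interface. The sketch below illustrates that idea under stated assumptions: SpecialistAgent, ParentAgent, and as_tool are hypothetical names, and the fixed prompt stands in for text that a real parent agent would obtain from an LLM.

```python
from typing import Callable, Dict


def generate_image(prompt: str) -> dict:
    """Mock image tool: returns fake bytes derived from the prompt."""
    return {"status": "success", "image_bytes": b"mock:" + prompt.encode("utf-8")}


class SpecialistAgent:
    """A child agent that owns one tool and forwards each request to it."""

    def __init__(self, name: str, description: str, tool: Callable[[str], dict]):
        self.name = name
        self.description = description
        self.tool = tool

    def run(self, request: str) -> dict:
        return self.tool(request)


def as_tool(agent: SpecialistAgent) -> Callable[[str], dict]:
    """Expose an agent as a plain callable, named and described after the agent."""
    def call(request: str) -> dict:
        return agent.run(request)
    call.__name__ = agent.name
    call.__doc__ = agent.description
    return call


class ParentAgent:
    """A parent that invents a prompt, then calls a tool by name."""

    def __init__(self, tools: Dict[str, Callable[[str], dict]]):
        self.tools = tools

    def act(self) -> dict:
        prompt = "a cat wearing a hat"  # a real agent would ask an LLM for this
        return self.tools["ImageGen"](prompt)


image_gen = SpecialistAgent("ImageGen", "Generates an image from a text prompt.", generate_image)
image_gen_tool = as_tool(image_gen)
artist = ParentAgent({image_gen_tool.__name__: image_gen_tool})
output = artist.act()
print(output["status"], output["image_bytes"])
```

The parent knows only the tool's name and description, not how the child works internally, which is what keeps the two agents loosely coupled.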

At a Glance

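What: Complex problems often exceed the capabilities of a single, monolithic LLM-based agent. A solitary agent may lack the diverse, specialized skills or the access to specific tools needed to address every part of a multifaceted task. This limitation creates a bottleneck that reduces the system's overall effectiveness and scalability. As a result, tackling sophisticated, multi-domain objectives becomes inefficient and can lead to incomplete or suboptimal outcomes.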

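Why: The Multi-Agent Collaboration pattern offers a standardized solution by creating a system of multiple cooperating agents. A complex problem is broken down into smaller, more manageable sub-problems. Each sub-problem is then assigned to a specialized agent that has the precise tools and capabilities required to solve it. These agents work together through defined communication protocols and interaction models such as sequential handoffs, parallel workstreams, or hierarchical delegation. This agentic, distributed approach creates a synergistic effect, allowing the group to achieve outcomes that would be out of reach for any single agent.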

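Rule of thumb: Use this pattern when a task is too complex for a single agent and can be decomposed into distinct sub-tasks that require specialized skills or tools. It is ideal for problems that benefit from diverse expertise, parallel processing, or a structured workflow with multiple stages, such as complex research and analysis, software development, or creative content generation.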

Visual summary

Fig. 3: Multi-agent design pattern

Key Takeaways

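Multi-agent collaboration involves multiple agents working together to achieve a common goal.
This pattern leverages specialized roles, distributed tasks, and inter-agent communication.
Collaboration can take forms such as sequential handoffs, parallel processing, debate, or hierarchical structures.
This pattern is ideal for complex problems that require diverse expertise or multiple distinct stages.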

Conclusion

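This chapter explored the Multi-Agent Collaboration pattern and demonstrated the benefits of orchestrating multiple specialized agents within a system. We examined various collaboration models and emphasized the pattern's essential role in addressing complex, multifaceted problems across diverse domains. Understanding agent collaboration naturally leads to an inquiry into how agents interact with the external environment.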

