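# TraceStudio v2.0 Advanced Features

## 🎯 Overview of New Features

This document describes the advanced features added in TraceStudio v2.0:

1. **Special node types** - InputNode, OutputNode, FunctionNode
2. **Edge classification** - thick edges (arrays) vs. thin edges (scalars)
3. **Dimension conversion** - expand, collapse, and broadcast operations
4. **Nested function nodes** - reusable sub-workflows
5. **Array operation nodes** - a set of nodes designed for arrays

---

## 📦 Special Node Types

### 1. InputNode - Workflow Entry

**Purpose**: serves as the input entry point of a sub-workflow.

```python
from typing import Any, Dict, Optional

# register_node, InputNode and output_port are provided by TraceStudio;
# their import path is not shown in this document.


@register_node
class InputNodeImpl(InputNode):
    """Input node."""

    @output_port("output", "Any", description="Emits all inputs it received")
    async def process(self, inputs: Dict[str, Any], context: Optional[Dict] = None):
        # Pass the external inputs through unchanged on the "output" port.
        return {
            "outputs": {"output": inputs},
            "context": context or {}
        }
```

**Features**:
- ✅ Has no input ports (output only)
- ✅ Passes external input directly into the workflow
- ✅ Required in every function workflow
- ✅ Usually the first node of the workflow

**Position in a workflow**:

```
External world
    ↓
[InputNode] ← receives data from the global context
    ↓
[Business logic nodes]
    ↓
[OutputNode]
    ↓
Returned result
```

### 2. OutputNode - Workflow Exit

**Purpose**: serves as the output exit point of a sub-workflow.

```python
from typing import Any, Dict, Optional

# register_node, OutputNode and input_port are provided by TraceStudio;
# their import path is not shown in this document.


@register_node
class OutputNodeImpl(OutputNode):
    """Output node."""

    @input_port("input", "Any", description="The data to output")
    async def process(self, inputs: Dict[str, Any], context: Optional[Dict] = None):
        # Hand the collected inputs back as the workflow result.
        return {
            "outputs": inputs,
            "context": context or {}
        }
```

**Features**:
- ✅ Has no output ports (input only)
- ✅ Collects the results produced inside the workflow
- ✅ Required in every function workflow
- ✅ Usually the last node of the workflow

### 3. FunctionNode - Reusable Function

**Purpose**: wraps an entire sub-workflow as a single node.

```python
{
    "id": "multiply_and_sum",
    "type": "FunctionNode",
    "display_name": "Multiply and sum",
    "sub_workflow": {
        "nodes": [
            {"id": "input", "type": "InputNodeImpl"},
            {"id": "map", "type": "ArrayMapNode", "params": {"multiplier": 2}},
            {"id": "sum", "type": "ArrayReduceNode"},
            {"id": "output", "type": "OutputNodeImpl"}
        ],
        "edges": [...]
    }
}
```

**Features**:
- ✅ Encapsulates a complex sub-workflow as a black box
- ✅ Supports nesting (a function node may contain other function nodes)
- ✅ Behaves like an ordinary node from the outside
- ✅ Reusable (the same function can be used many times)

**Nesting example**:

```
[Main FunctionNode]
  └─ [Sub FunctionNode 1]
      └─ [Node A]
      └─ [Node B]
  └─ [Sub FunctionNode 2]
      └─ [Node C]
```

---

## 🔌 Edge Classification

### EdgeType - Edge Type

```python
from enum import Enum


class EdgeType(Enum):
    SCALAR = "scalar"  # thin edge: a single element
    ARRAY = "array"    # thick edge: an array
```

**Front-end representation**:
- Thick edge 🟦 = array type
- Thin edge ▬ = scalar type

**Example edge definitions**:

```python
edges = [
    {
        "source": "node1",
        "sourcePort": "output",
        "target": "node2",
        "targetPort": "input",
        "edgeType": "array"   # thick edge: carries an array
    },
    {
        "source": "node2",
        "sourcePort": "result",
        "target": "node3",
        "targetPort": "input",
        "edgeType": "scalar"  # thin edge: carries a scalar
    }
]
```

---

## ⬆️ ⬇️ Dimension Conversion

### DimensionMode - Conversion Mode

```python
from enum import Enum


class DimensionMode(Enum):
    NONE = "none"            # no conversion
    EXPAND = "expand"        # expand: array -> single elements (iterate)
    COLLAPSE = "collapse"    # collapse: single elements -> array (pack)
    BROADCAST = "broadcast"  # broadcast: single value -> array
```

### Scenario 1: Expand (EXPAND)

**Description**: an array is connected to a single-element input. The target node runs once per element and the results are packed back into an array. (The numbers below are consistent with AddNode's other operand being 9.)

```
Input array: [1, 2, 3]
    ↓ (EXPAND)
Executed per element:
  - AddNode(1) → 10
  - AddNode(2) → 11
  - AddNode(3) → 12
    ↓ (packed into an array)
Output array: [10, 11, 12]
```

**Implementation**:

```python
# Edge definition
{
    "source": "array_source",
    "sourcePort": "values",
    "target": "add_node",
    "targetPort": "a",
    "dimensionMode": "expand"  # expand
}

# AddNode runs 3 times, each time on one element of the array.
# The outputs are then packed into an array.
```

### Scenario 2: Collapse (COLLAPSE)

**Description**: several single-element edges converge on one array input.

```
Edge 1: value_a ──┐
                  ├→ ArrayConcatNode(arrays=[]) → [a, b]
Edge 2: value_b ──┘
```

**Implementation**:

```python
# Multiple edges into the same port are packed automatically
edges = [
    {"source": "node1", "sourcePort": "output", "target": "concat", "targetPort": "arrays"},
    {"source": "node2", "sourcePort": "output", "target": "concat", "targetPort": "arrays"}
]

# The "arrays" input of the concat node receives [value_a, value_b]
```

### Scenario 3: Broadcast (BROADCAST)

**Description**: a single value is expanded into an array.

```
Input value: 42
    ↓ (BROADCAST)
Output array: [42, 42, 42]  # broadcast 3 times
```

**Implementation**:

```python
# Using BroadcastNode
{
    "id": "broadcast",
    "type": "BroadcastNode",
    "params": {"count": 3}  # broadcast 3 times
}

# Input 42 → output [42, 42, 42]
```

---

## 📊 Array Operation Nodes

### Included Nodes

| Node | Input | Output | Description |
|------|------|------|------|
| `ArrayMapNode` | Array | Array | Map (element-wise transform) |
| `ArrayFilterNode` | Array | Array | Filter (conditional selection) |
| `ArrayReduceNode` | Array | Scalar | Reduce (sum/product/max/min) |
| `ArrayConcatNode` | Multiple arrays | Array | Concatenate (flattens several arrays) |
| `ArrayZipNode` | Array ×2 | Array | Zip (merge by position) |
| `BroadcastNode` | Scalar | Array | Broadcast (expand to an array) |

### Usage Examples

#### Example 1: Array Map

```python
# Goal: multiply every number in the array by 2
nodes = [
    {"id": "input", "type": "InputNodeImpl"},
    {"id": "map", "type": "ArrayMapNode", "params": {"multiplier": 2}},
    {"id": "output", "type": "OutputNodeImpl"}
]

edges = [
    {"source": "input", "sourcePort": "output", "target": "map", "targetPort": "values", "edgeType": "array"},
    {"source": "map", "sourcePort": "mapped", "target": "output", "targetPort": "input", "edgeType": "array"}
]

# Input:  {values: [1, 2, 3]}
# Output: [2, 4, 6]
```

#### Example 2: Array Reduce

```python
# Goal: compute the sum of the array (edges omitted here)
nodes = [
    {"id": "input", "type": "InputNodeImpl"},
    {"id": "reduce", "type": "ArrayReduceNode", "params": {"operation": "sum"}},
    {"id": "output", "type": "OutputNodeImpl"}
]

# Input:  {values: [1, 2, 3, 4, 5]}
# Output: 15
```

#### Example 3: Chained Array Operations

```python
# Goal: multiply by 2, then sum
nodes = [
    {"id": "input", "type": "InputNodeImpl"},
    {"id": "map", "type": "ArrayMapNode", "params": {"multiplier": 2}},
    {"id": "reduce", "type": "ArrayReduceNode", "params": {"operation": "sum"}},
    {"id": "output", "type": "OutputNodeImpl"}
]

edges = [
    {"source": "input", "sourcePort": "output", "target": "map", "targetPort": "values"},
    {"source": "map", "sourcePort": "mapped", "target": "reduce", "targetPort": "values"},
    {"source": "reduce", "sourcePort": "result", "target": "output", "targetPort": "input"}
]

# Input:        [1, 2, 3]
# Intermediate: [2, 4, 6]
# Output:       12
```

---

## 🔄 Complete Workflow Example

### Example: Processing Student Grades

```python
# Scenario:
# 1. Input: list of student IDs [1, 2, 3, 4, 5]
# 2. Fetch grades by ID → [85, 92, 78, 88, 95]
# 3. Keep passing grades (≥60) → [85, 92, 78, 88, 95]
# 4. Compute the average → 87.6
# 5. Output the final result

main_workflow = {
    "nodes": [
        {
            "id": "input",
            "type": "InputNodeImpl"
        },
        {
            "id": "fetch_grades",
            "type": "ArrayMapNode",
            "params": {"multiplier": 1}  # placeholder; a real node would query the database
        },
        {
            "id": "filter_pass",
            "type": "ArrayFilterNode",
            "params": {"threshold": 59}  # keeps values > 59, i.e. ≥60 for integer grades
        },
        {
            "id": "avg",
            "type": "ArrayReduceNode",
            "params": {"operation": "sum"}  # then divide by the array length
        },
        {
            "id": "output",
            "type": "OutputNodeImpl"
        }
    ],
    "edges": [
        {
            "source": "input",
            "sourcePort": "output",
            "target": "fetch_grades",
            "targetPort": "values",
            "edgeType": "array"
        },
        {
            "source": "fetch_grades",
            "sourcePort": "mapped",
            "target": "filter_pass",
            "targetPort": "values",
            "edgeType": "array"
        },
        {
            "source": "filter_pass",
            "sourcePort": "filtered",
            "target": "avg",
            "targetPort": "values",
            "edgeType": "array"
        },
        {
            "source": "avg",
            "sourcePort": "result",
            "target": "output",
            "targetPort": "input",
            "edgeType": "scalar"
        }
    ]
}

# Execute (inside an async function)
executor = AdvancedWorkflowExecutor(user_id="teacher1")
success, report = await executor.execute(
    nodes=main_workflow["nodes"],
    edges=main_workflow["edges"],
    global_context={"student_ids": [1, 2, 3, 4, 5]}
)

# Result:
# - Grades:  [85, 92, 78, 88, 95]
# - Average: 87.6 (sum 438 divided by 5)
```

---

## 🎓 Design Rationale

### Why special nodes?

1. **Explicit interfaces** - InputNode/OutputNode define where a workflow starts and ends
2. **Nesting** - FunctionNode makes it possible to build reusable workflows
3. **Modularity** - large workflows can be decomposed into small functions
4. **Black-box encapsulation** - users do not need to know the details of a sub-workflow

### Why edge classification?

1. **Type safety** - incompatible connections are caught at design time
2. **Dimension conversion hints** - the editor can point out where an expand or collapse is needed
3. **Front-end visualization** - thick and thin edges show the kind of data flowing at a glance
4. **Performance** - the execution strategy is chosen according to the dimension mode

### Why dimension conversion?

1. **Flexibility** - supports a variety of usage scenarios
2. **Code reuse** - the same node can handle both scalars and arrays
3. **Performance** - avoids unnecessary loop unrolling
4. **Expressiveness** - per-element (data-parallel style) processing is expressed natively

---

## 📁 File Structure

```
server/app/core/
├── advanced_nodes.py              # special node definitions
├── advanced_workflow_graph.py     # extended workflow graph
└── advanced_workflow_executor.py  # extended execution engine

server/app/nodes/
└── advanced_example_nodes.py      # 10 example nodes

server/tests/
└── test_advanced_features.py      # full test suite
```

---

## 🧪 Test Coverage

| Test | Description | Status |
|------|------|------|
| test_special_nodes | Special node registration | ✅ |
| test_dimension_inference | Dimension conversion inference | ✅ |
| test_simple_workflow | Simple workflow execution | ✅ |
| test_array_operations | Array operations | ✅ |
| test_workflow_graph | Workflow graph operations | ✅ |
| test_nested_function_workflow | Nested function nodes | ✅ |

---

## 🚀 Usage Guide

### Creating a Function Node Workflow

```python
from server.app.core.advanced_nodes import WorkflowPackager
# AdvancedWorkflowExecutor must also be imported; its module is not shown in
# this document (presumably server/app/core/advanced_workflow_executor.py).

# Step 1: define the sub-workflow
sub_nodes = [...]
sub_edges = [...]

# Step 2: validate the workflow and stop if it is invalid
valid, error = WorkflowPackager.validate_function_workflow(sub_nodes, sub_edges)
if not valid:
    raise ValueError(f"Validation failed: {error}")

# Step 3: package it as a function node
function_def = WorkflowPackager.package_as_function(
    node_id="my_function",
    nodes=sub_nodes,
    edges=sub_edges,
    display_name="My function",
    description="A reusable workflow function"
)

# Step 4: use it in another workflow
main_nodes = [..., function_def, ...]
main_edges = [...]

# Step 5: execute (inside an async function)
executor = AdvancedWorkflowExecutor()
success, report = await executor.execute(main_nodes, main_edges)
```

### Working with Array Data

```python
# Build a workflow that contains array operations
nodes = [
    {"id": "input", "type": "InputNodeImpl"},
    {"id": "map", "type": "ArrayMapNode", "params": {"multiplier": 2}},
    {"id": "filter", "type": "ArrayFilterNode", "params": {"threshold": 5}},
    {"id": "reduce", "type": "ArrayReduceNode", "params": {"operation": "sum"}},
    {"id": "output", "type": "OutputNodeImpl"}
]

# Wire the nodes together
edges = [
    {"source": "input", "sourcePort": "output", "target": "map", "targetPort": "values", "edgeType": "array"},
    {"source": "map", "sourcePort": "mapped", "target": "filter", "targetPort": "values", "edgeType": "array"},
    {"source": "filter", "sourcePort": "filtered", "target": "reduce", "targetPort": "values", "edgeType": "array"},
    {"source": "reduce", "sourcePort": "result", "target": "output", "targetPort": "input", "edgeType": "scalar"}
]

# Execute (inside an async function)
executor = AdvancedWorkflowExecutor()
success, report = await executor.execute(
    nodes, edges, global_context={"values": [1, 2, 3, 4, 5, 6, 7]}
)

# Result:
# - ×2:  [2, 4, 6, 8, 10, 12, 14]
# - >5:  [6, 8, 10, 12, 14]
# - Sum: 50
```

---

## 📈 Performance Considerations

| Operation | Complexity | Time |
|------|--------|------|
| Node registration | O(1) | <1ms |
| Graph validation | O(N+E) | <10ms (100 nodes) |
| Topological sort | O(N+E) | <10ms (100 nodes) |
| Expand execution (single node) | O(n) | n × node execution time |
| Cache lookup | O(1) | <1ms |

Here N is the number of nodes, E the number of edges, and n the number of elements in the expanded array.

---

## ⚠️ Notes

1. **A function node must contain an InputNode and an OutputNode**
   - Otherwise `WorkflowPackager.validate_function_workflow()` reports an error

2. **Dimension conversion is not automatic**
   - It must be specified explicitly through `edgeType` and `dimensionMode`
   - The front end should provide visual hints

3. **Expand executes the node repeatedly**
   - For example, an array with 100 elements runs the node 100 times
   - Keep the performance impact in mind

4. **Keep nesting depth moderate**
   - There is no hard limit in theory, but no more than 5 levels is recommended
   - Deeper nesting makes debugging harder and hurts performance

---

## 🔮 Future Extensions

1. **Parallel execution** - run nodes without mutual dependencies in parallel
2. **Conditional branches** - support if-then-else logic
3. **Loops** - support for-loop and while-loop constructs
4. **Error handling** - support a try-catch mechanism
5. **Dynamic workflows** - build the workflow dynamically from runtime conditions

---

**Next**: [API Integration Guide](./API_INTEGRATION.md)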
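
The three conversions named by `DimensionMode` (`expand`, `collapse`, `broadcast`) can be illustrated in plain Python. This is only a minimal sketch under stated assumptions, not TraceStudio's executor: the helper `run_with_mode` is hypothetical, a node is modeled as an ordinary function, and `incoming` stands for the values carried by the edges attached to one input port.

```python
from enum import Enum
from typing import Any, Callable, List


class DimensionMode(Enum):
    # Mirrors the enum shown in this document.
    NONE = "none"
    EXPAND = "expand"
    COLLAPSE = "collapse"
    BROADCAST = "broadcast"


def run_with_mode(
    mode: DimensionMode,
    node: Callable[[Any], Any],
    incoming: List[Any],
    count: int = 1,
) -> Any:
    """Hypothetical helper: feed one port's incoming edge values to a node."""
    if mode is DimensionMode.EXPAND:
        # One array edge into a scalar port: call the node once per
        # element, then pack the per-element results into an array.
        (array,) = incoming
        return [node(item) for item in array]
    if mode is DimensionMode.COLLAPSE:
        # Several scalar edges into an array port: pack the values
        # into a single array and call the node once.
        return node(list(incoming))
    if mode is DimensionMode.BROADCAST:
        # One scalar edge into an array port: repeat the value `count` times.
        (value,) = incoming
        return node([value] * count)
    # NONE: pass the single value through unchanged.
    (value,) = incoming
    return node(value)


def identity(values: Any) -> Any:
    return values


expanded = run_with_mode(DimensionMode.EXPAND, lambda x: x + 9, [[1, 2, 3]])
collapsed = run_with_mode(DimensionMode.COLLAPSE, identity, ["a", "b"])
broadcasted = run_with_mode(DimensionMode.BROADCAST, identity, [42], count=3)

print(expanded)     # [10, 11, 12]
print(collapsed)    # ['a', 'b']
print(broadcasted)  # [42, 42, 42]
```

The sketch also makes the cost of `expand` visible: the node function is called once per array element, which is why a 100-element array runs the node 100 times.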