TraceStudio-dev/docs/server1.2/ADVANCED_FEATURES.md
2026-01-09 21:37:02 +08:00

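# TraceStudio v2.0 Advanced Features
## 🎯 New Features Overview
This document describes the advanced features added in TraceStudio v2.0:
1. **Special node types** - InputNode, OutputNode, FunctionNode
2. **Edge classification** - thick lines (arrays) vs. thin lines (scalars)
3. **Dimension conversion** - expand, collapse, and broadcast operations
4. **Nested function nodes** - reusable sub-workflows
5. **Array operation nodes** - a set of nodes designed specifically for arrays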
---
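## 📦 Special Node Types
### 1. InputNode - Workflow Entry
**Purpose**: serves as the input entry point of a sub-workflow
```python
from typing import Any, Dict, Optional

# register_node, output_port and InputNode come from the TraceStudio node framework.


@register_node
class InputNodeImpl(InputNode):
    """Input node."""

    @output_port("output", "Any", description="Emits all inputs it receives")
    async def process(self, inputs: Dict[str, Any], context: Optional[Dict] = None):
        # Forward the external inputs unchanged on the "output" port.
        return {
            "outputs": {"output": inputs},
            "context": context or {},
        }
```
**Features**
- ✅ Has no input ports (output only)
- ✅ Passes external input straight into the workflow
- ✅ Required in every function workflow
- ✅ Usually the first node of the workflow
**Position in a workflow**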
```
External world
    ↓
[InputNode]  ← receives data from the global context
    ↓
[Business-logic nodes]
    ↓
[OutputNode]
    ↓
Returned result
```
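### 2. OutputNode - Workflow Exit
**Purpose**: serves as the output exit point of a sub-workflow
```python
from typing import Any, Dict, Optional

# register_node, input_port and OutputNode come from the TraceStudio node framework.


@register_node
class OutputNodeImpl(OutputNode):
    """Output node."""

    @input_port("input", "Any", description="Data to output")
    async def process(self, inputs: Dict[str, Any], context: Optional[Dict] = None):
        # Return whatever arrived on the input ports as the workflow result.
        return {
            "outputs": inputs,
            "context": context or {},
        }
```
**Features**
- ✅ Has no output ports (input only)
- ✅ Collects the results produced inside the workflow
- ✅ Required in every function workflow
- ✅ Usually the last node of the workflow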
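### 3. FunctionNode - Reusable Function
**Purpose**: wraps an entire sub-workflow as a single node
```python
{
    "id": "multiply_and_sum",
    "type": "FunctionNode",
    "display_name": "Multiply and Sum",
    "sub_workflow": {
        "nodes": [
            {"id": "input", "type": "InputNodeImpl"},
            {"id": "map", "type": "ArrayMapNode", "params": {"multiplier": 2}},
            {"id": "sum", "type": "ArrayReduceNode"},
            {"id": "output", "type": "OutputNodeImpl"}
        ],
        "edges": [...]
    }
}
```
**Features**
- ✅ Encapsulates a complex sub-workflow as a black box
- ✅ Supports nesting (a function node may contain other function nodes)
- ✅ Behaves like an ordinary node from the outside
- ✅ Reusable (the same function can be used many times)
**Nesting example**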
```
[Main FunctionNode]
├─ [Sub FunctionNode 1]
│  ├─ [Node A]
│  └─ [Node B]
└─ [Sub FunctionNode 2]
   └─ [Node C]
```
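
Because function nodes can contain other function nodes, it can be useful to measure how deep a definition nests. The sketch below is only an illustration: `nesting_depth` is a hypothetical helper (not part of TraceStudio), and it assumes each function node stores its children under `sub_workflow.nodes` as in the definition above. The `NodeA`/`NodeB`/`NodeC` type names are placeholders for the tree shown above, which has a depth of 2.

```python
from typing import Any, Dict


def nesting_depth(node: Dict[str, Any]) -> int:
    """Count FunctionNode levels at or below `node` (0 for ordinary nodes)."""
    if node.get("type") != "FunctionNode":
        return 0
    children = node.get("sub_workflow", {}).get("nodes", [])
    # One level for this function node plus the deepest nested child.
    return 1 + max((nesting_depth(child) for child in children), default=0)


# Mirrors the nesting example: a main function node with two sub function nodes.
main = {
    "id": "main",
    "type": "FunctionNode",
    "sub_workflow": {
        "nodes": [
            {"id": "sub1", "type": "FunctionNode",
             "sub_workflow": {"nodes": [{"id": "a", "type": "NodeA"},
                                        {"id": "b", "type": "NodeB"}]}},
            {"id": "sub2", "type": "FunctionNode",
             "sub_workflow": {"nodes": [{"id": "c", "type": "NodeC"}]}},
        ]
    },
}

print(nesting_depth(main))  # 2
```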
---
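## 🔌 Edge Classification
### EdgeType - Edge Types
```python
from enum import Enum


class EdgeType(Enum):
    SCALAR = "scalar"  # thin line: a single element
    ARRAY = "array"    # thick line: an array
```
**Front-end representation**
- Thick line 🟦 = array type
- Thin line ▬ = scalar type
**Example edge definitions**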
```python
edges = [
    {
        "source": "node1",
        "sourcePort": "output",
        "target": "node2",
        "targetPort": "input",
        "edgeType": "array"   # thick line: carries an array
    },
    {
        "source": "node2",
        "sourcePort": "result",
        "target": "node3",
        "targetPort": "input",
        "edgeType": "scalar"  # thin line: carries a scalar
    }
]
```
---
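## ⬆️ ⬇️ Dimension Conversion
### DimensionMode - Conversion Modes
```python
from enum import Enum


class DimensionMode(Enum):
    NONE = "none"            # no conversion
    EXPAND = "expand"        # expand: array → individual elements (iterate)
    COLLAPSE = "collapse"    # collapse: individual elements → array (pack)
    BROADCAST = "broadcast"  # broadcast: single value → array
```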
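
To make the four modes concrete, here is a minimal sketch of how a front end might suggest a mode from the two port kinds on an edge. The function `infer_dimension_mode` and its rule (several scalar edges into an array port suggest COLLAPSE, a single one suggests BROADCAST) are assumptions for illustration, not TraceStudio's actual inference logic; the enums are repeated so the block runs on its own.

```python
from enum import Enum


class EdgeType(Enum):
    SCALAR = "scalar"
    ARRAY = "array"


class DimensionMode(Enum):
    NONE = "none"
    EXPAND = "expand"
    COLLAPSE = "collapse"
    BROADCAST = "broadcast"


def infer_dimension_mode(source: EdgeType, target: EdgeType,
                         incoming_edges: int = 1) -> DimensionMode:
    """Suggest a conversion mode for an edge (hypothetical rule set)."""
    if source is target:
        return DimensionMode.NONE
    if source is EdgeType.ARRAY:
        # Array output feeding a scalar input: iterate over the elements.
        return DimensionMode.EXPAND
    # Scalar output feeding an array input.
    if incoming_edges > 1:
        return DimensionMode.COLLAPSE   # several scalars are packed together
    return DimensionMode.BROADCAST      # one scalar is repeated


print(infer_dimension_mode(EdgeType.ARRAY, EdgeType.SCALAR).value)  # expand
```

Such a function would only produce a hint; the chosen mode still has to be written onto the edge explicitly.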
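### Scenario 1: Expand (EXPAND)
**Description**: an array is connected to a single-element input
```
Input array: [1, 2, 3]
    ↓ (EXPAND)
Per-element execution (the results imply the other AddNode operand is 9):
  - AddNode(1) → 10
  - AddNode(2) → 11
  - AddNode(3) → 12
    ↓ (packed into an array)
Output array: [10, 11, 12]
```
**Implementation**
```python
# Edge definition
{
    "source": "array_source",
    "sourcePort": "values",
    "target": "add_node",
    "targetPort": "a",
    "dimensionMode": "expand"  # expand
}
# AddNode is executed 3 times, once for each element of the array.
# The individual outputs are then packed into an array.
```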
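
The expand behaviour described above can be sketched in a few lines. This is an illustration under stated assumptions, not the executor's real code: `expand_execute` is a hypothetical helper, and `add_node` is a stand-in for AddNode whose second operand is fixed at 9 so that the numbers match the diagram.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def add_node(a: int, b: int = 9) -> int:
    """Hypothetical stand-in for AddNode with a fixed second operand."""
    return a + b


async def expand_execute(node: Callable[[Any], Awaitable[Any]],
                         values: List[Any]) -> List[Any]:
    """Run `node` once per element and pack the results into a list."""
    results = []
    for value in values:
        results.append(await node(value))  # one node execution per element
    return results


print(asyncio.run(expand_execute(add_node, [1, 2, 3])))  # [10, 11, 12]
```

The loop awaits each execution in turn, so the cost grows linearly with the array length.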
### Scenario 2: Collapse (COLLAPSE)
**Description**: several single-element edges converge on one array input
```
Edge 1: value_a ──┐
                  ├→ ArrayConcatNode(arrays=[]) → [a, b]
Edge 2: value_b ──┘
```
**Implementation**
```python
# Multiple edges into the same port are packed automatically
edges = [
    {"source": "node1", "sourcePort": "output", "target": "concat", "targetPort": "arrays"},
    {"source": "node2", "sourcePort": "output", "target": "concat", "targetPort": "arrays"}
]
# The "arrays" input of the concat node receives [value_a, value_b]
```
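
One way to picture the packing step is to group the upstream values by target port, in edge order. The helper `collapse_inputs` and the `node_outputs` mapping below are hypothetical names introduced for this sketch; the actual executor may organise this differently.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def collapse_inputs(edges: List[Dict[str, str]],
                    node_outputs: Dict[str, Dict[str, Any]]
                    ) -> Dict[Tuple[str, str], List[Any]]:
    """Group upstream values by (target, targetPort), preserving edge order."""
    grouped: Dict[Tuple[str, str], List[Any]] = defaultdict(list)
    for edge in edges:
        value = node_outputs[edge["source"]][edge["sourcePort"]]
        grouped[(edge["target"], edge["targetPort"])].append(value)
    return dict(grouped)


edges = [
    {"source": "node1", "sourcePort": "output", "target": "concat", "targetPort": "arrays"},
    {"source": "node2", "sourcePort": "output", "target": "concat", "targetPort": "arrays"},
]
# Assumed results already produced by the two upstream nodes.
node_outputs = {"node1": {"output": "value_a"}, "node2": {"output": "value_b"}}

print(collapse_inputs(edges, node_outputs))
# {('concat', 'arrays'): ['value_a', 'value_b']}
```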
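### Scenario 3: Broadcast (BROADCAST)
**Description**: a single value is extended into an array
```
Input value: 42
    ↓ (BROADCAST)
Output array: [42, 42, 42]  # broadcast 3 times
```
**Implementation**
```python
# Via BroadcastNode
{
    "id": "broadcast",
    "type": "BroadcastNode",
    "params": {"count": 3}  # broadcast 3 times
}
# Input 42 → output [42, 42, 42]
```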
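
In plain Python the broadcast step amounts to repeating a value `count` times. The `broadcast` function below is a hypothetical equivalent, not BroadcastNode's source. Note that list repetition reuses the same object for every slot, so a mutable value would be shared across all positions.

```python
from typing import Any, List


def broadcast(value: Any, count: int) -> List[Any]:
    """Repeat `value` `count` times (every slot refers to the same object)."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [value] * count


print(broadcast(42, 3))  # [42, 42, 42]
```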
---
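## 📊 Array Operation Nodes
### Included Nodes
| Node | Input | Output | Description |
|------|------|------|------|
| `ArrayMapNode` | array | array | Map (element-wise transform) |
| `ArrayFilterNode` | array | array | Filter (conditional selection) |
| `ArrayReduceNode` | array | scalar | Reduce (sum/product/max/min) |
| `ArrayConcatNode` | multiple arrays | array | Concatenate (flatten several arrays into one) |
| `ArrayZipNode` | array × 2 | array | Zip (merge by position) |
| `BroadcastNode` | scalar | array | Broadcast (extend to an array) |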
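
For readers who think in plain Python, the operations in the table correspond roughly to the standard idioms below. This is a sketch of the semantics only: the `> threshold` comparison follows the threshold examples elsewhere in this document, and representing zipped elements as tuples is an assumption.

```python
import operator
from functools import reduce
from itertools import chain

values = [1, 2, 3, 4, 5, 6, 7]

mapped = [v * 2 for v in values]            # ArrayMapNode with multiplier=2
filtered = [v for v in mapped if v > 5]     # ArrayFilterNode with threshold=5
total = reduce(operator.add, filtered, 0)   # ArrayReduceNode with operation="sum"
concatenated = list(chain.from_iterable([[1, 2], [3], [4, 5]]))  # ArrayConcatNode
zipped = list(zip([1, 2, 3], ["a", "b", "c"]))                   # ArrayZipNode

print(mapped)        # [2, 4, 6, 8, 10, 12, 14]
print(filtered)      # [6, 8, 10, 12, 14]
print(total)         # 50
print(concatenated)  # [1, 2, 3, 4, 5]
print(zipped)        # [(1, 'a'), (2, 'b'), (3, 'c')]
```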
### Usage Examples
#### Example 1: Array Map
```python
# Goal: multiply every number in the array by 2
nodes = [
    {"id": "input", "type": "InputNodeImpl"},
    {"id": "map", "type": "ArrayMapNode", "params": {"multiplier": 2}},
    {"id": "output", "type": "OutputNodeImpl"}
]
edges = [
    {"source": "input", "sourcePort": "output", "target": "map", "targetPort": "values", "edgeType": "array"},
    {"source": "map", "sourcePort": "mapped", "target": "output", "targetPort": "input", "edgeType": "array"}
]
# Input:  {"values": [1, 2, 3]}
# Output: [2, 4, 6]
```
#### Example 2: Array Reduce
```python
# Goal: compute the sum of the array
nodes = [
    {"id": "input", "type": "InputNodeImpl"},
    {"id": "reduce", "type": "ArrayReduceNode", "params": {"operation": "sum"}},
    {"id": "output", "type": "OutputNodeImpl"}
]
# Input:  {"values": [1, 2, 3, 4, 5]}
# Output: 15
```
#### Example 3: Chained Array Operations
```python
# Goal: multiply by 2, then sum
nodes = [
    {"id": "input", "type": "InputNodeImpl"},
    {"id": "map", "type": "ArrayMapNode", "params": {"multiplier": 2}},
    {"id": "reduce", "type": "ArrayReduceNode", "params": {"operation": "sum"}},
    {"id": "output", "type": "OutputNodeImpl"}
]
edges = [
    {"source": "input", "sourcePort": "output", "target": "map", "targetPort": "values"},
    {"source": "map", "sourcePort": "mapped", "target": "reduce", "targetPort": "values"},
    {"source": "reduce", "sourcePort": "result", "target": "output", "targetPort": "input"}
]
# Input:        [1, 2, 3]
# Intermediate: [2, 4, 6]
# Output:       12
```
---
## 🔄 Complete Workflow Example
### Example: Processing Student Grades
```python
# Import path assumed from the file layout (server/app/core/advanced_workflow_executor.py).
from server.app.core.advanced_workflow_executor import AdvancedWorkflowExecutor

# Scenario:
# 1. Input a list of student IDs [1, 2, 3, 4, 5]
# 2. Look up the grade for each ID → [85, 92, 78, 88, 95]
# 3. Keep passing grades (≥60) → [85, 92, 78, 88, 95]
# 4. Compute the average → 87.6
# 5. Output the final result
main_workflow = {
    "nodes": [
        {
            "id": "input",
            "type": "InputNodeImpl"
        },
        {
            "id": "fetch_grades",
            "type": "ArrayMapNode",
            "params": {"multiplier": 1}  # placeholder; in practice this would query a database
        },
        {
            "id": "filter_pass",
            "type": "ArrayFilterNode",
            "params": {"threshold": 59}  # keeps grades greater than 59, i.e. ≥60
        },
        {
            "id": "avg",
            "type": "ArrayReduceNode",
            "params": {"operation": "sum"}  # sum only; divide by the array length afterwards
        },
        {
            "id": "output",
            "type": "OutputNodeImpl"
        }
    ],
    "edges": [
        {
            "source": "input",
            "sourcePort": "output",
            "target": "fetch_grades",
            "targetPort": "values",
            "edgeType": "array"
        },
        {
            "source": "fetch_grades",
            "sourcePort": "mapped",
            "target": "filter_pass",
            "targetPort": "values",
            "edgeType": "array"
        },
        {
            "source": "filter_pass",
            "sourcePort": "filtered",
            "target": "avg",
            "targetPort": "values",
            "edgeType": "array"
        },
        {
            "source": "avg",
            "sourcePort": "result",
            "target": "output",
            "targetPort": "input",
            "edgeType": "scalar"
        }
    ]
}
# Execute (must run inside an async function)
executor = AdvancedWorkflowExecutor(user_id="teacher1")
success, report = await executor.execute(
    nodes=main_workflow["nodes"],
    edges=main_workflow["edges"],
    global_context={"student_ids": [1, 2, 3, 4, 5]}
)
# Result:
# - Grades:  [85, 92, 78, 88, 95]
# - Average: 87.6 (the sum 438 divided by the 5 grades)
```
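
The workflow above reduces with `sum`, so the division by the array length still has to happen somewhere. The sketch below reproduces the arithmetic in plain Python to show where 87.6 comes from; `average` is a hypothetical helper, not a TraceStudio node.

```python
from typing import List


def average(values: List[float]) -> float:
    """Arithmetic mean; an empty array has no mean, so fail loudly."""
    if not values:
        raise ValueError("cannot average an empty array")
    return sum(values) / len(values)


grades = [85, 92, 78, 88, 95]
passed = [g for g in grades if g > 59]  # same rule as threshold=59

print(sum(passed))      # 438
print(average(passed))  # 87.6
```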
---
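## 🎓 Design Rationale
### Why special nodes?
1. **Explicit interfaces** - InputNode/OutputNode define where a workflow starts and ends
2. **Nesting support** - FunctionNode makes it possible to build reusable workflows
3. **Modularity** - large workflows can be decomposed into small functions
4. **Black-box encapsulation** - users do not need to know the details of a sub-workflow
### Why classify edges?
1. **Type safety** - incompatible connections are caught at design time
2. **Dimension-conversion hints** - the editor can indicate where an expand or collapse is needed
3. **Front-end visualization** - thick and thin lines show the data-flow type at a glance
4. **Performance** - the executor can pick an execution strategy based on the dimension mode
### Why dimension conversion?
1. **Flexibility** - supports a variety of usage scenarios
2. **Code reuse** - the same node can handle both scalars and arrays
3. **Performance** - avoids unrolling loops by hand where it is not needed
4. **Expressiveness** - per-element (data-parallel) processing is expressed natively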
---
## 📁 File Structure
```
server/app/core/
├── advanced_nodes.py              # special node definitions
├── advanced_workflow_graph.py     # extended workflow graph
└── advanced_workflow_executor.py  # extended execution engine
server/app/nodes/
└── advanced_example_nodes.py      # 10 example nodes
server/tests/
└── test_advanced_features.py      # full test suite
```
---
## 🧪 Test Coverage
| Test | Description | Status |
|------|------|------|
| test_special_nodes | Special node registration | ✅ |
| test_dimension_inference | Dimension-conversion inference | ✅ |
| test_simple_workflow | Simple workflow execution | ✅ |
| test_array_operations | Array operations | ✅ |
| test_workflow_graph | Workflow graph operations | ✅ |
| test_nested_function_workflow | Nested function nodes | ✅ |
---
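## 🚀 Usage Guide
### Creating a Function-Node Workflow
```python
from server.app.core.advanced_nodes import WorkflowPackager
# Import path assumed from the file layout; adjust it to your project.
from server.app.core.advanced_workflow_executor import AdvancedWorkflowExecutor

# Step 1: define the sub-workflow
sub_nodes = [...]
sub_edges = [...]
# Step 2: validate the workflow
valid, error = WorkflowPackager.validate_function_workflow(sub_nodes, sub_edges)
if not valid:
    # Stop here instead of packaging an invalid workflow.
    raise ValueError(f"Validation failed: {error}")
# Step 3: package it as a function node
function_def = WorkflowPackager.package_as_function(
    node_id="my_function",
    nodes=sub_nodes,
    edges=sub_edges,
    display_name="My function",
    description="A reusable workflow function"
)
# Step 4: use it inside another workflow
main_nodes = [..., function_def, ...]
main_edges = [...]
# Step 5: execute (must run inside an async function)
executor = AdvancedWorkflowExecutor()
success, report = await executor.execute(main_nodes, main_edges)
```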
### Working with Array Data
```python
# Build a workflow made of array operations
nodes = [
    {"id": "input", "type": "InputNodeImpl"},
    {"id": "map", "type": "ArrayMapNode", "params": {"multiplier": 2}},
    {"id": "filter", "type": "ArrayFilterNode", "params": {"threshold": 5}},
    {"id": "reduce", "type": "ArrayReduceNode", "params": {"operation": "sum"}},
    {"id": "output", "type": "OutputNodeImpl"}
]
# Wire the nodes together
edges = [
    {"source": "input", "sourcePort": "output", "target": "map", "targetPort": "values", "edgeType": "array"},
    {"source": "map", "sourcePort": "mapped", "target": "filter", "targetPort": "values", "edgeType": "array"},
    {"source": "filter", "sourcePort": "filtered", "target": "reduce", "targetPort": "values", "edgeType": "array"},
    {"source": "reduce", "sourcePort": "result", "target": "output", "targetPort": "input", "edgeType": "scalar"}
]
# Execute (must run inside an async function)
executor = AdvancedWorkflowExecutor()
success, report = await executor.execute(
    nodes, edges, global_context={"values": [1, 2, 3, 4, 5, 6, 7]}
)
# Result:
# - ×2:  [2, 4, 6, 8, 10, 12, 14]
# - >5:  [6, 8, 10, 12, 14]
# - Sum: 50
```
---
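## 📈 Performance Considerations
| Operation | Complexity | Time |
|------|--------|------|
| Node registration | O(1) | <1 ms |
| Graph validation | O(N+E) | <10 ms (100 nodes) |
| Topological sort | O(N+E) | <10 ms (100 nodes) |
| Expand execution (single node) | O(N) | N × node execution time |
| Cache lookup | O(1) | <1 ms |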
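
The O(N+E) figure for topological sorting is what Kahn's algorithm achieves: every node and every edge is visited once. The sketch below applies it to the node/edge dictionaries used throughout this document. It is a generic illustration, not the code in `advanced_workflow_graph.py`, and `topological_order` is a hypothetical name.

```python
from collections import deque
from typing import Any, Dict, List


def topological_order(nodes: List[Dict[str, Any]],
                      edges: List[Dict[str, Any]]) -> List[str]:
    """Kahn's algorithm: return node ids so every edge points forward."""
    indegree = {node["id"]: 0 for node in nodes}
    successors: Dict[str, List[str]] = {node["id"]: [] for node in nodes}
    for edge in edges:
        successors[edge["source"]].append(edge["target"])
        indegree[edge["target"]] += 1

    ready = deque(nid for nid, degree in indegree.items() if degree == 0)
    order: List[str] = []
    while ready:
        nid = ready.popleft()
        order.append(nid)
        for nxt in successors[nid]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return order


nodes = [{"id": "output"}, {"id": "reduce"}, {"id": "map"}, {"id": "input"}]
edges = [
    {"source": "input", "target": "map"},
    {"source": "map", "target": "reduce"},
    {"source": "reduce", "target": "output"},
]
print(topological_order(nodes, edges))  # ['input', 'map', 'reduce', 'output']
```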
---
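## ⚠️ Notes
1. **A function node must contain an InputNode and an OutputNode**
   - Otherwise `WorkflowPackager.validate_function_workflow()` reports an error
2. **Dimension conversion is not automatic**
   - It must be specified explicitly through `edgeType` / `dimensionMode`
   - The front end should provide visual hints
3. **Expand operations execute the node repeatedly**
   - For example, an array with 100 elements runs the node 100 times
   - Keep the performance impact in mind
4. **Keep nesting shallow**
   - There is no hard limit in theory, but no more than 5 levels is recommended
   - Deep nesting makes debugging harder and hurts performance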
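
The first rule above can be checked before a workflow is ever packaged. The sketch below is a simplified stand-in, not the implementation of `WorkflowPackager.validate_function_workflow()`, which may check more than this. The function name `check_function_workflow` is hypothetical, and the type names `InputNodeImpl` / `OutputNodeImpl` are taken from the examples in this document.

```python
from typing import Any, Dict, List, Optional, Tuple


def check_function_workflow(nodes: List[Dict[str, Any]]) -> Tuple[bool, Optional[str]]:
    """Return (valid, error) in the same shape the validator above returns."""
    types = {node.get("type") for node in nodes}
    if "InputNodeImpl" not in types:
        return False, "function workflow has no InputNode"
    if "OutputNodeImpl" not in types:
        return False, "function workflow has no OutputNode"
    return True, None


good = [
    {"id": "input", "type": "InputNodeImpl"},
    {"id": "map", "type": "ArrayMapNode"},
    {"id": "output", "type": "OutputNodeImpl"},
]
print(check_function_workflow(good))      # (True, None)
print(check_function_workflow(good[:2]))  # (False, 'function workflow has no OutputNode')
```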
---
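## 🔮 Future Extensions
1. **Parallel execution**
   - Run nodes that have no dependencies on each other in parallel
2. **Conditional branching**
   - Support if-then-else logic
3. **Loop constructs**
   - Support for-loops and while-loops
4. **Error handling**
   - Support a try-catch mechanism
5. **Dynamic workflows**
   - Build workflows dynamically from runtime conditions
---
**Next**: [API Integration Guide](./API_INTEGRATION.md)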