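BrowseCrawler: Offline Whole-Tree Crawl
Overview
When an IDE or HMI is online, it often needs to pull an AddressSpace subtree to the local side so that later node searches need no further Browse RPCs. BrowseCrawler traverses the tree with a BFS queue, and two limits (max_depth and max_nodes) keep the crawl from growing without bound.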
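The traversal described above can be sketched in plain Python. This is a minimal illustration of a BFS queue with a depth limit and a node-count limit, not the library's implementation: `bfs_crawl`, `children_of`, and the in-memory `TREE` are hypothetical stand-ins for the real Browse calls.

```python
from collections import deque

def bfs_crawl(root, children_of, max_depth=8, max_nodes=50_000):
    """Breadth-first walk from root that stops at either limit (illustrative only)."""
    visited = {root}
    order = [root]
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth guard: nodes at max_depth are kept but not expanded
        for child in children_of(node):
            if child in visited:
                continue  # references can form cycles, so skip repeats
            if len(order) >= max_nodes:
                return order  # node-count guard: stop as soon as the cap is hit
            visited.add(child)
            order.append(child)
            queue.append((child, depth + 1))
    return order

# Hypothetical in-memory tree standing in for a server's address space.
TREE = {
    "Objects": ["Boilers", "Server"],
    "Boilers": ["Boiler1", "Boiler2"],
    "Boiler1": ["Temperature", "Pressure"],
    "Boiler2": ["Objects"],  # back-reference that would loop without `visited`
}

def lookup(node):
    return TREE.get(node, [])

full = bfs_crawl("Objects", lookup)
shallow = bfs_crawl("Objects", lookup, max_depth=1)
capped = bfs_crawl("Objects", lookup, max_nodes=4)
```

Either limit alone leaves a gap: a depth cap does not bound a very wide level, and a node cap does not stop a narrow but very deep chain from dominating the result. That is presumably why both are applied together.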
API
| Member | Kind | Access | Description |
|---|---|---|---|
| BrowseCrawler(ua, max_depth, max_nodes, node_class_filter) | Constructor | - | Creates a crawler |
| crawl_async(root_node_id, progress, cancel_event) | Method | Read | Asynchronous BFS; returns a CrawlResult |
| CrawlResult.all_nodes | Property | Read | Flat list of nodes |
| CrawlResult.children_by_parent | Property | Read | Dictionary of parent-to-child relations |
| CrawlResult.elapsed | Property | Read | Total elapsed time (timedelta) |
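To make the three CrawlResult properties concrete, here is a rough sketch of how a flat list and a parent-to-child dictionary might fit together. `NodeSketch`, `CrawlResultSketch`, and `descendants` are hypothetical stand-ins; in particular, keying children_by_parent by the parent's node-id string is an assumption, since the table does not state the key or value types.

```python
from dataclasses import dataclass, field
from datetime import timedelta

@dataclass
class NodeSketch:
    # Hypothetical node record; only these two attributes are used here.
    node_id: str
    browse_name: str

@dataclass
class CrawlResultSketch:
    # Stand-in mirroring the three documented properties.
    all_nodes: list = field(default_factory=list)
    children_by_parent: dict = field(default_factory=dict)  # assumed: parent id -> child nodes
    elapsed: timedelta = timedelta(0)

def descendants(result, parent_id):
    """Collect every node below parent_id using only the cached parent/child map."""
    out, seen = [], set()
    stack = list(reversed(result.children_by_parent.get(parent_id, [])))
    while stack:
        node = stack.pop()
        if node.node_id in seen:
            continue  # guard against cycles in the cached map
        seen.add(node.node_id)
        out.append(node)
        stack.extend(reversed(result.children_by_parent.get(node.node_id, [])))
    return out

boilers = NodeSketch("ns=2;s=Boilers", "Boilers")
b1 = NodeSketch("ns=2;s=Boilers.B1", "B1")
temp = NodeSketch("ns=2;s=Boilers.B1.Temp", "Temp")
result = CrawlResultSketch(
    all_nodes=[boilers, b1, temp],
    children_by_parent={
        "i=85": [boilers],
        "ns=2;s=Boilers": [b1],
        "ns=2;s=Boilers.B1": [temp],
    },
    elapsed=timedelta(seconds=0.5),
)
names = [n.browse_name for n in descendants(result, "i=85")]
```

The flat list suits searching and counting, while the dictionary suits rebuilding a tree view; neither needs another round trip to the server.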
Code example
import asyncio
from opcua import DarraOpcUa, BrowseCrawler, NodeClass, WellKnownNodes

async def main():
    with DarraOpcUa("opc.tcp://localhost:4840") as ua:
        ua.connect()

        # 1) Crawl the entire ObjectsFolder
        crawler = BrowseCrawler(ua, max_depth=8, max_nodes=50_000)

        def on_progress(count, current_nid):
            print(f"\rCrawled {count} nodes, current {current_nid}", end="")

        result = await crawler.crawl_async(
            WellKnownNodes.OBJECTS_FOLDER, progress=on_progress)
        print(f"\n{len(result.all_nodes)} nodes in total, took {result.elapsed.total_seconds():.1f}s")

        # 2) Variable nodes only
        c2 = BrowseCrawler(ua, node_class_filter=NodeClass.VARIABLE)
        r2 = await c2.crawl_async("ns=2;s=Boilers")
        for n in r2.all_nodes:
            print(f"  {n.browse_name} ({n.node_id})")

        # 3) Cancellation support
        cancel = asyncio.Event()

        async def cancel_after(sec):
            await asyncio.sleep(sec)
            cancel.set()

        # Keep a reference so the timer task is not garbage-collected early
        cancel_task = asyncio.create_task(cancel_after(5))
        r3 = await crawler.crawl_async("i=85", cancel_event=cancel)  # i=85 is ObjectsFolder
        print(f"Crawled {len(r3.all_nodes)} nodes before cancellation")
        cancel_task.cancel()  # no effect if the timer has already fired

asyncio.run(main())
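The third case above returns a partial result after the event is set. One way such cooperative cancellation could work is sketched below, using only asyncio and an in-memory tree. `crawl_with_cancel`, `children_of`, and `TREE` are hypothetical, and the real crawler may check the event at different points. The demo tree has no cycles, so the sketch omits a visited set.

```python
import asyncio
from collections import deque

async def crawl_with_cancel(root, children_of, cancel_event, progress=None):
    """BFS that checks cancel_event before each expansion and returns what it has."""
    found = [root]
    queue = deque([root])
    while queue and not cancel_event.is_set():
        node = queue.popleft()
        await asyncio.sleep(0)  # stands in for the awaited Browse round trip
        for child in children_of(node):
            found.append(child)
            queue.append(child)
            if progress:
                progress(len(found), child)
    return found  # partial list if cancelled, full list otherwise

# Hypothetical acyclic tree used only for this demonstration.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}

async def demo():
    cancel = asyncio.Event()

    def on_progress(count, current):
        if count >= 3:
            cancel.set()  # simulate the user pressing Cancel after 3 nodes

    return await crawl_with_cancel("root", lambda n: TREE.get(n, []), cancel, on_progress)

partial = asyncio.run(demo())
```

Checking the event between expansions, rather than cancelling the task outright, means the caller still gets the nodes collected so far instead of an exception.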
Performance
| Node count | Elapsed time (Gigabit LAN) |
|---|---|
| 1,000 | ~0.5s |
| 10,000 | ~5s |
| 50,000 | ~25s |
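All three rows in the table work out to roughly the same throughput, which the arithmetic below makes explicit. `estimate_seconds` is a hypothetical helper, and extrapolating to other sizes assumes the approximately linear trend in the table holds on a comparable network.

```python
# Figures copied from the table above: node count -> approximate seconds.
MEASURED = {1_000: 0.5, 10_000: 5.0, 50_000: 25.0}

# Throughput implied by each row, in nodes per second.
rates = {count: count / seconds for count, seconds in MEASURED.items()}

def estimate_seconds(node_count, nodes_per_second=2_000.0):
    """Rough linear estimate; real timings depend on the server and the network."""
    return node_count / nodes_per_second

estimate_20k = estimate_seconds(20_000)
```

At about 2,000 nodes per second, the default-style cap of 50,000 nodes used in the code example corresponds to a worst case of roughly 25 seconds on a similar link.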
Best practices
- Restrict the root; do not start from i=84 (Root)
- Use node_class_filter to save memory
- Crawl once at startup, then query locally
- Give users a "Cancel" button
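The "crawl once, then query locally" advice could look something like the sketch below: build an index over the flat node list once, then answer searches from memory. `build_name_index` and `search` are hypothetical helpers, and the sample nodes are made up; only the browse_name and node_id attributes come from the code example above.

```python
from types import SimpleNamespace

def build_name_index(nodes):
    """Group nodes by lower-cased browse name so lookups need no Browse RPC."""
    index = {}
    for node in nodes:
        index.setdefault(node.browse_name.lower(), []).append(node)
    return index

def search(index, text):
    """Case-insensitive substring match over the cached browse names."""
    needle = text.lower()
    return [node for name, group in index.items() if needle in name for node in group]

# Made-up nodes standing in for CrawlResult.all_nodes.
nodes = [
    SimpleNamespace(browse_name="Temperature", node_id="ns=2;s=B1.Temperature"),
    SimpleNamespace(browse_name="Pressure", node_id="ns=2;s=B1.Pressure"),
    SimpleNamespace(browse_name="TempSetpoint", node_id="ns=2;s=B1.TempSetpoint"),
]
index = build_name_index(nodes)
hits = [n.node_id for n in search(index, "temp")]
```

Because the index is a snapshot, it goes stale if the server's address space changes; a re-crawl on reconnect is one simple way to refresh it.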
Cross-language reference
| C# | Python | Java | C++ | Rust | C |
|---|---|---|---|---|---|
| BrowseCrawler | BrowseCrawler | BrowseCrawler | BrowseCrawler | BrowseCrawler | DarraUa_BrowseCrawler_* |
| CrawlAsync | crawl_async | crawlAsync | CrawlAsync | crawl_async | DarraUa_BrowseCrawler_Crawl |