Human–Machine Collaborative Disassembly System · Case Study

Demo

Demo video

Project demo video by the capstone team (Group 21), on YouTube.

Context

Background & problem

As EV adoption accelerates, the earliest generation of traction battery packs is reaching end of life, and safe, efficient handling of retired packs has become an urgent industry problem. The two dominant approaches both have clear limits:

Manual-only teardown

High risk · efficiency bottleneck

Pack structures are complex, and teardown carries hazards such as high-voltage shock and thermal runaway. Operators depend heavily on personal experience, the skill barrier is high, and the work is hard to replicate at scale.

Robot-arm-only approach

Low flexibility · poor adaptability

Pack structures vary widely across manufacturers and models, so a robot-arm-only approach struggles in non-standard settings and cannot respond flexibly to exceptions without human judgment.

Knowledge gap

No teardown manuals

Manufacturers typically ship only installation manuals. Teardown procedures lack standardized documentation, so existing knowledge bases cannot directly support safe teardown work.

Value of human–machine collaboration: pair LLM reasoning and planning with human judgment, and use AR to close information gaps. This removes the safety and experience bottlenecks of manual-only teardown, supplies the flexibility robot-only setups lack, and improves both safety and efficiency in non-standard teardown.

Architecture

System architecture

Perception layer

YOLOv11 visual detection

Detects pack components and potential hazards in real time and feeds their positions and classes to the AR overlay.
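The hand-off from perception to presentation can be illustrated by a sketch that turns detector output into overlay payloads. It assumes a hypothetical `Detection` record (class label, confidence, pixel box) and a hypothetical `HAZARD_CLASSES` set. The project's actual label set and message format are not documented here, and the YOLOv11 inference call itself is omitted.

```python
from dataclasses import dataclass

# Hypothetical hazard class names; the real YOLOv11 label set is not documented.
HAZARD_CLASSES = {"exposed_terminal", "swollen_cell", "hv_connector"}


@dataclass
class Detection:
    """Hypothetical detector record: class label, confidence, pixel box."""
    label: str
    confidence: float
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2)


def to_overlay_items(detections: list[Detection], min_conf: float = 0.5) -> list[dict]:
    """Convert detections into AR overlay payloads, with hazards listed first."""
    items = []
    for det in detections:
        if det.confidence < min_conf:
            continue  # skip low-confidence boxes
        x1, y1, x2, y2 = det.box
        hazard = det.label in HAZARD_CLASSES
        items.append({
            "label": det.label,
            "center": ((x1 + x2) / 2, (y1 + y2) / 2),
            "size": (x2 - x1, y2 - y1),
            "hazard": hazard,
            "style": "warning" if hazard else "info",
        })
    # Stable sort: hazards first, detection order preserved within each group.
    items.sort(key=lambda item: not item["hazard"])
    return items


if __name__ == "__main__":
    frame = [
        Detection("screw", 0.91, (10, 10, 20, 20)),
        Detection("swollen_cell", 0.84, (100, 50, 200, 150)),
        Detection("busbar", 0.30, (0, 0, 5, 5)),
    ]
    for item in to_overlay_items(frame):
        print(item["label"], item["center"], item["style"])
```

Listing hazards first lets the AR layer draw warnings on top of routine labels. The confidence cut-off is an arbitrary placeholder.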

Planning layer

LLM task decomposition

The LLM parses the teardown goal into a structured, step-by-step task sequence. The sequence must pass semantic verification before it is dispatched for execution.
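The verify-before-dispatch gate can be sketched minimally. The sketch assumes a hypothetical `Step` record and a pluggable `is_duplicate` check. In this system that role belongs to the two-stage semantic verification described under Module 2; here a trivial exact-match function (`exact_match`, hypothetical) stands in, and `dispatch` is only a placeholder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """Hypothetical structured step emitted by the planning layer."""
    index: int
    action: str
    target: str
    risk: str  # assumed scale: "low" | "medium" | "high"


def gate(steps: list[Step], is_duplicate: Callable[[Step, Step], bool]):
    """Verify a plan before dispatch: keep the first step of each duplicate group.

    Returns (accepted steps, rejected (kept_index, dropped_index) pairs).
    """
    accepted: list[Step] = []
    rejected: list[tuple[int, int]] = []
    for step in steps:
        kept = next((a for a in accepted if is_duplicate(a, step)), None)
        if kept is None:
            accepted.append(step)
        else:
            rejected.append((kept.index, step.index))
    return accepted, rejected


def exact_match(a: Step, b: Step) -> bool:
    """Trivial stand-in for the semantic verifier: same action and target, ignoring case."""
    return (a.action.lower(), a.target.lower()) == (b.action.lower(), b.target.lower())


def dispatch(steps: list[Step]) -> None:
    """Placeholder for handing verified steps to the AR layer."""
    for s in steps:
        print(f"step {s.index} [{s.risk}]: {s.action} {s.target}")


if __name__ == "__main__":
    plan = [
        Step(1, "Disconnect", "HV connector", "high"),
        Step(2, "Remove", "top cover bolts", "medium"),
        Step(3, "disconnect", "hv connector", "high"),
    ]
    accepted, rejected = gate(plan, exact_match)
    dispatch(accepted)
    print("rejected:", rejected)
```

Only steps that pass the gate reach `dispatch`. Rejected pairs are returned rather than discarded so an operator can review them.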

Presentation layer

Real-time AR guidance

Overlays teardown steps and hazard zones on the physical device in AR for immersive operator guidance.

Ownership

My contributions

Module 1 · Selection

Multimodal LLM selection & evaluation

Manufacturers ship installation manuals only, with no standardized teardown documentation. That gap is the core reason for using an LLM: its reasoning can infer disassembly steps by reversing the installation logic, and it can anticipate hazardous situations. On that basis we benchmarked several multimodal LLMs and chose Gemini 2.0 Flash as the task-planning backbone.

Model	Task accuracy	Task coverage	Hazard foresight	Industrial terminology	Multimodal	Cost
GPT-4o	✓ High	✓ Full	✓ Strong	✓ Strong	✓ Strong	High
Qwen series	△ Medium	△ Partial	△ Fair	✓ Strong (Chinese)	△ Partial	Low
Gemini 1.5 series	✓ High	△ Mostly full	△ Fair	✓ Strong	✓ Strong	△ Medium
Gemini 2.0 Flash (selected)	✓ High	✓ Full	✓ Strong	✓ Strong	✓ Strong	Low
Legend: ✓ strong or favorable; △ partial or moderate.

Evaluation focus: we prioritized hazard anticipation (whether the model flags risky operations without explicit warnings) and understanding of English industrial terminology (most source teardown documents are English originals). Latency was not a key decision criterion. Gemini 2.0 Flash combined high accuracy with low cost and showed the strongest extended reasoning about potential hazards.
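A weighted score is one way to make the table's trade-off explicit. The sketch below maps the ✓/△ marks and cost levels to numbers and gives hazard foresight and terminology the largest weights. The numeric mapping and the weights are hypothetical choices for illustration, not the project's actual scoring method.

```python
# Marks from the comparison table mapped to numbers (hypothetical mapping).
MARK = {"✓": 2, "△": 1}
COST = {"Low": 2, "Medium": 1, "High": 0}  # cheaper scores higher

# Hypothetical weights: hazard foresight and terminology count most; latency is omitted.
WEIGHTS = {"accuracy": 2, "coverage": 1, "hazard": 3, "terms": 2, "multimodal": 1, "cost": 1}

# Rows transcribed from the comparison table; Qwen's terminology mark is the
# table's "Strong (Chinese)" rating.
TABLE = {
    "GPT-4o": {"accuracy": "✓", "coverage": "✓", "hazard": "✓",
               "terms": "✓", "multimodal": "✓", "cost": "High"},
    "Qwen series": {"accuracy": "△", "coverage": "△", "hazard": "△",
                    "terms": "✓", "multimodal": "△", "cost": "Low"},
    "Gemini 1.5 series": {"accuracy": "✓", "coverage": "△", "hazard": "△",
                          "terms": "✓", "multimodal": "✓", "cost": "Medium"},
    "Gemini 2.0 Flash": {"accuracy": "✓", "coverage": "✓", "hazard": "✓",
                         "terms": "✓", "multimodal": "✓", "cost": "Low"},
}


def score(row: dict[str, str]) -> int:
    """Weighted sum over criteria; cost uses its own scale."""
    total = 0
    for criterion, weight in WEIGHTS.items():
        value = row[criterion]
        total += weight * (COST[value] if criterion == "cost" else MARK[value])
    return total


ranking = sorted(TABLE, key=lambda model: score(TABLE[model]), reverse=True)

if __name__ == "__main__":
    for model in ranking:
        print(f"{model}: {score(TABLE[model])}")
```

Under these weights the ranking is Gemini 2.0 Flash (20), GPT-4o (18), Gemini 1.5 series (15), Qwen series (13). GPT-4o matches Flash on every capability column, so cost decides between them.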

Prompt engineering so the model emits structured teardown steps, each explicitly labeled with a risk level
Evaluation task set spanning multiple battery pack models to test generalization in non-standard settings
Web interface built with Cursor that integrates the model API to display and interact with the human–machine task decomposition
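The structured-output requirement can be sketched as a prompt template plus a strict parser. The JSON schema (`step`, `action`, `risk`, `hazard`), the three-level risk scale, and the prompt wording are all assumptions. The actual Gemini API call is left out so the sketch runs without credentials.

```python
import json

RISK_LEVELS = ("low", "medium", "high")  # hypothetical three-level scale

# Hypothetical prompt; doubled braces survive str.format as literal JSON braces.
PROMPT_TEMPLATE = """You are planning the teardown of a {pack} battery pack.
The manufacturer documents only this installation procedure:
{install_steps}
Reverse the installation logic into safe disassembly steps and flag hazards.
Reply with JSON only, in this shape:
{{"steps": [{{"step": 1, "action": "...", "risk": "low|medium|high", "hazard": "..."}}]}}"""


def build_prompt(pack: str, install_steps: list[str]) -> str:
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(install_steps, start=1))
    return PROMPT_TEMPLATE.format(pack=pack, install_steps=numbered)


def parse_steps(reply: str) -> list[dict]:
    """Validate the model's JSON reply; raise ValueError if it breaks the schema."""
    data = json.loads(reply)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("steps"), list):
        raise ValueError("reply must be an object with a 'steps' list")
    for expected, step in enumerate(data["steps"], start=1):
        if not isinstance(step, dict):
            raise ValueError(f"step {expected}: not an object")
        if step.get("step") != expected:
            raise ValueError(f"step numbering breaks at {expected}")
        if not step.get("action"):
            raise ValueError(f"step {expected}: missing action")
        if step.get("risk") not in RISK_LEVELS:
            raise ValueError(f"step {expected}: invalid risk {step.get('risk')!r}")
    return data["steps"]


if __name__ == "__main__":
    print(build_prompt("example", ["Mount the tray", "Fasten the cover"]))
    reply = '{"steps": [{"step": 1, "action": "Remove cover bolts", "risk": "medium", "hazard": "none"}]}'
    print(parse_steps(reply))
```

Rejecting malformed replies outright, instead of repairing them silently, means every step that reaches verification carries a valid risk label.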

Module 2 · Core innovation

Two-stage semantic verification design

BERT and n-gram similarity lacked the precision to detect overlapping and redundant tasks, so we designed and implemented a two-stage semantic verification pipeline based on DiffCSE and a Cross-Encoder.

Root cause: LLM-generated teardown steps can overlap semantically (one operation split across several steps) or be logically redundant (irrelevant steps mixed in). Conventional BERT similarity and n-gram matching lose precision in terminology-dense industrial text, which leads to inaccurate task assignment.
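A small example shows why surface matching struggles. Under word-bigram Jaccard similarity, a rewording of the same operation can score lower than two different operations that share wording. The step texts are hypothetical and are not drawn from the project's data.

```python
def ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Word n-grams of a lower-cased, whitespace-tokenized string."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of the two strings' word n-gram sets."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


# Hypothetical step texts.
same_op = ("Disconnect the high-voltage connector", "Unplug the HV connector")
different_ops = ("Remove the top cover bolts", "Remove the bottom cover bolts")

if __name__ == "__main__":
    print(f"same operation, reworded: {jaccard(*same_op):.2f}")
    print(f"different operations:     {jaccard(*different_ops):.2f}")
```

The reworded duplicate scores 0.00 and the two distinct steps score 0.33, the opposite of what deduplication needs. The embedding-based stages below are meant to address this failure mode.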

Stage 1 · Coarse ranking

DiffCSE

A contrastive-learning sentence embedding model vectorizes each teardown step, quickly recalls candidate similar steps, and filters out clearly unrelated ones.
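Stage 1 amounts to nearest-neighbour recall over sentence embeddings. In this sketch the vectors are toy stand-ins for DiffCSE embeddings, and `k` and `min_sim` are hypothetical settings. A brute-force cosine scan is enough for a short step list.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_candidates(vectors: list[list[float]], k: int = 2,
                      min_sim: float = 0.5) -> list[tuple[int, int]]:
    """Stage 1: for each step, keep up to k nearest neighbours with cosine >= min_sim."""
    pairs = set()
    for i, u in enumerate(vectors):
        scored = sorted(((cosine(u, v), j) for j, v in enumerate(vectors) if j != i),
                        reverse=True)
        for sim, j in scored[:k]:
            if sim >= min_sim:
                pairs.add((min(i, j), max(i, j)))  # store each pair unordered
    return sorted(pairs)


# Toy 3-d vectors standing in for DiffCSE embeddings of four steps (hypothetical).
step_vectors = [
    [1.0, 0.0, 0.0],  # "Disconnect the high-voltage connector"
    [0.9, 0.1, 0.0],  # "Unplug the HV connector"
    [0.0, 1.0, 0.0],  # "Remove the top cover bolts"
    [0.0, 0.0, 1.0],  # "Label the module for recycling"
]

if __name__ == "__main__":
    print(recall_candidates(step_vectors))
```

Stage 1 only needs high recall. Borderline pairs are cheap to pass on because Stage 2 makes the final call.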

Stage 2 · Reranking

Cross-Encoder

Both steps are fed into the model together and scored with cross-attention. This gives a precise measure of semantic overlap between tasks and improves the accuracy and consistency of task assignment.
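Stage 2 rescores only the recalled pairs with a model that reads both steps together, then merges pairs above a threshold. A trained Cross-Encoder could be loaded with the sentence-transformers `CrossEncoder` class and scored with its `predict` method on a list of text pairs. Here a hypothetical lookup table (`FAKE_SCORES`) stands in so the sketch runs offline, and the 0.8 threshold is an assumption.

```python
from typing import Callable

# Placeholder scores standing in for Cross-Encoder outputs (hypothetical numbers).
FAKE_SCORES = {
    ("Disconnect the high-voltage connector", "Unplug the HV connector"): 0.93,
    ("Disconnect the high-voltage connector", "Remove the top cover bolts"): 0.04,
}


def toy_pair_score(a: str, b: str) -> float:
    """Symmetric lookup; a real Cross-Encoder would read both texts jointly."""
    return FAKE_SCORES.get((a, b), FAKE_SCORES.get((b, a), 0.0))


def rerank_and_merge(steps: list[str], candidates: list[tuple[int, int]],
                     pair_score: Callable[[str, str], float],
                     threshold: float = 0.8) -> list[str]:
    """Stage 2: rescore candidate pairs; merge duplicates, keeping the earliest step."""
    parent = list(range(len(steps)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in candidates:
        if pair_score(steps[i], steps[j]) >= threshold:
            ri, rj = find(i), find(j)
            parent[max(ri, rj)] = min(ri, rj)  # lowest index stays the group root
    return [s for i, s in enumerate(steps) if find(i) == i]


if __name__ == "__main__":
    steps = ["Disconnect the high-voltage connector",
             "Unplug the HV connector",
             "Remove the top cover bolts"]
    print(rerank_and_merge(steps, [(0, 1), (0, 2)], toy_pair_score))
```

Union-find makes merging transitive: if step A duplicates B and B duplicates C, only A survives. Rescoring just the Stage 1 candidates keeps the more expensive pairwise model off the full quadratic set of pairs.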

Outcome: compared with a BERT-only baseline, the two-stage pipeline markedly improved deduplication precision on industrial teardown steps, reduced redundant steps, and kept the human–machine task flow accurate in execution.
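A comparison against BERT can be framed as scoring predicted duplicate pairs against hand-labelled ones. The sketch computes pair-level precision, recall, and F1. The labelled and predicted pairs are hypothetical, and this is one plausible protocol, not the project's documented one.

```python
def pair_metrics(predicted: list[tuple[int, int]],
                 gold: list[tuple[int, int]]) -> tuple[float, float, float]:
    """Precision, recall and F1 of predicted duplicate pairs against labelled pairs."""
    pred = {tuple(sorted(p)) for p in predicted}  # pairs are unordered
    true = {tuple(sorted(p)) for p in gold}
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for one plan: steps 0/1 and 3/4 are true duplicates.
gold_pairs = [(0, 1), (3, 4)]
predicted_pairs = [(1, 0), (2, 3)]  # hypothetical output of some deduplicator

if __name__ == "__main__":
    p, r, f = pair_metrics(predicted_pairs, gold_pairs)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Running the same labelled plans through each method and comparing these three numbers gives a like-for-like comparison.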

Module 3 · Integration

Cross-module integration & testing

Helped the team integrate the modules and run joint debugging and tests so the end-to-end pipeline ran stably.

Integration testing of the YOLOv11 module for battery component and hazard detection
Helped align the interfaces between the AR presentation layer and the LLM planning layer so task steps stay in sync with visual overlays
End-to-end system tests covering complete workflows for representative teardown scenarios
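Keeping steps and overlays in sync comes down to anchoring each planned step on a detection of the component it acts on. This sketch assumes a hypothetical `target_label` field on each planned step that matches a detector class name. When the target is not detected, the sketch reports that instead of guessing an anchor.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detector record: class label, confidence, pixel box."""
    label: str
    confidence: float
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class PlanStep:
    """Hypothetical planned step; target_label is assumed to match a detector class."""
    step: int
    action: str
    target_label: str


def overlay_for_step(step: PlanStep, detections: list[Detection],
                     min_conf: float = 0.5) -> dict:
    """Anchor a step's AR instruction on the most confident detection of its target."""
    matches = [d for d in detections
               if d.label == step.target_label and d.confidence >= min_conf]
    if not matches:
        # Do not guess a position; let the AR layer prompt the operator instead.
        return {"step": step.step, "text": step.action, "anchor": None,
                "status": "target_not_visible"}
    best = max(matches, key=lambda d: d.confidence)
    return {"step": step.step, "text": step.action, "anchor": best.box,
            "status": "anchored"}


if __name__ == "__main__":
    frame = [Detection("hv_connector", 0.7, (1, 2, 3, 4)),
             Detection("hv_connector", 0.9, (5, 6, 7, 8))]
    print(overlay_for_step(PlanStep(3, "Disconnect the HV connector", "hv_connector"), frame))
```

An explicit `target_not_visible` status gives integration tests a concrete signal to check when the planner and detector disagree on component names.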

Outcome

Results

Shanghai Jiao Tong University (SJTU) undergraduate capstone: Silver Award

Recognized for innovation and engineering application value

Technical innovation

The two-stage DiffCSE + Cross-Encoder semantic verification scheme markedly outperformed conventional BERT / n-gram methods on industrial teardown deduplication.

System integration

Delivered a complete end-to-end human–machine collaboration system combining LLM task planning, YOLOv11 hazard detection, and AR visual guidance.

Engineering impact

A deployable, intelligent assistive solution for EV traction battery teardown, a high-risk, high-cost scenario.

Human–Machine Collaborative Disassembly of Electronic Devices
with LLMs and AR