# API-AI-R01: resolveSnapshot concurrency race

## Basic information

| Field | Value |
|------|-----|
| Issue ID | API-AI-R01 |
| Type | Non-blocking / optimization |
| Repository | api-server |
| Related issue | API-AI-016 (Snapshot Builder) |
| Discovered | 2026-06-17 |
| Priority | P2 |

## Problem description

`resolveSnapshot()` in `runtime-internal.service.ts` has a concurrency race window.

### Scenario

Two Runtime instances call `getSnapshot()` for the same job at the same time, and the job currently has no valid snapshot (not yet generated, or already expired):

```
Timeline →

Instance A                            Instance B
  │                                     │
  ├─ resolveSnapshot(job)               │
  │   snapshotId=null → else branch     │
  │                                     ├─ resolveSnapshot(job)
  │                                     │   snapshotId=null → else branch
  │                                     │
  ├─ buildSnapshot() → snap-A           │
  │                                     ├─ buildSnapshot() → snap-B
  │                                     │
  ├─ job.update(snapshotId=A)           │
  │                                     ├─ job.update(snapshotId=B)  ← overwrites A
```

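The interleaving above can be reproduced deterministically with a small in-memory model. This is a sketch only: `Store`, `resolveSnapshotUnsafe`, and `runInterleaved` are hypothetical names, the database is replaced by a plain object, and each `yield` stands in for an `await` where the other instance may run.

```typescript
// In-memory stand-ins for the job row and the snapshot table (hypothetical shapes).
interface Store {
  jobSnapshotId: string | null;
  snapshots: string[];
}

// One resolveSnapshot call without locking. Each `yield` marks a point where
// another instance may be scheduled (an `await` in the real service).
function* resolveSnapshotUnsafe(store: Store, instance: string): Generator<void, string, void> {
  const seen = store.jobSnapshotId; // read job.snapshotId
  yield;
  if (seen !== null) return seen; // a snapshot is already linked
  const snapshotId = `snap-${instance}`; // buildSnapshot()
  store.snapshots.push(snapshotId);
  yield;
  store.jobSnapshotId = snapshotId; // job.update(snapshotId)
  return snapshotId;
}

// Round-robin scheduler: advances every unfinished call by one step per round.
function runInterleaved(calls: Generator<void, string, void>[]): string[] {
  const results: string[] = new Array(calls.length).fill("");
  const done = calls.map(() => false);
  while (done.some((d) => !d)) {
    calls.forEach((call, i) => {
      if (done[i]) return;
      const step = call.next();
      if (step.done) {
        results[i] = step.value;
        done[i] = true;
      }
    });
  }
  return results;
}

const store: Store = { jobSnapshotId: null, snapshots: [] };
const results = runInterleaved([
  resolveSnapshotUnsafe(store, "A"),
  resolveSnapshotUnsafe(store, "B"),
]);
const orphans = store.snapshots.filter((id) => id !== store.jobSnapshotId);

console.log(results); // [ 'snap-A', 'snap-B' ]
console.log(store.jobSnapshotId); // snap-B
console.log(orphans); // [ 'snap-A' ]
```

Both calls read `null` before either writes, so both build a snapshot, and the later write wins while the earlier snapshot is left unreferenced.
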
### Consequences

1. An orphaned snap-A row is left in the database (no job references it)
2. One full aggregation query (buildSnapshot) is wasted
3. snap-A expires and is cleaned up automatically after its 24h TTL

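To make the TTL point concrete, here is a minimal sketch of the validity check applied to an orphaned snapshot. It assumes the 24h TTL is stored as `expiresAt` = build time + 24h; `SnapshotRow`, `SNAPSHOT_TTL_MS`, and `isSnapshotValid` are illustrative names, not taken from the codebase.

```typescript
// Hypothetical row shape; only the expiresAt field name appears in this issue.
interface SnapshotRow {
  id: string;
  expiresAt: Date | null;
}

const SNAPSHOT_TTL_MS = 24 * 60 * 60 * 1000; // 24h TTL (assumed to be applied at build time)

// A snapshot is valid when it has no expiry, or its expiry is not in the past.
function isSnapshotValid(snapshot: SnapshotRow, now: Date): boolean {
  return snapshot.expiresAt === null || snapshot.expiresAt.getTime() >= now.getTime();
}

const builtAt = new Date("2026-06-17T00:00:00Z");
const snapA: SnapshotRow = {
  id: "snap-A",
  expiresAt: new Date(builtAt.getTime() + SNAPSHOT_TTL_MS),
};

const after23h = new Date(builtAt.getTime() + 23 * 60 * 60 * 1000);
const after25h = new Date(builtAt.getTime() + 25 * 60 * 60 * 1000);

console.log(isSnapshotValid(snapA, after23h)); // true: still within the TTL
console.log(isSnapshotValid(snapA, after25h)); // false: eligible for cleanup
```

Under this assumption the orphan occupies storage for at most one TTL window before the cleanup pass can remove it.
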
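### Why the current impact is acceptable

- No data is lost and no error is returned
- Snapshot building is idempotent, so the two results are highly consistent
- The trigger conditions are strict: two Runtime instances must poll the same job at the same time (the lock mechanism used during polling greatly reduces the probability)
- Even when it happens, the extra cost is only one aggregation query
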
## Suggested fix

Approach: take a pessimistic lock on the job row before checking the snapshot state.

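```typescript
// Revised resolveSnapshot (sketch). Assumptions: PostgreSQL, and the aiRuntimeJob
// model maps to a table named "AiRuntimeJob" (the Prisma default without @@map).
private async resolveSnapshot(job) {
  return this.prisma.$transaction(async (tx) => {
    // Prisma has no direct FOR UPDATE support, so lock the job row via $queryRaw.
    // The row lock is held until this transaction commits or rolls back.
    const [locked] = await tx.$queryRaw<{ snapshotId: string | null }[]>`
      SELECT "snapshotId" FROM "AiRuntimeJob" WHERE "id" = ${job.id} FOR UPDATE
    `;

    // Re-check under the lock: a concurrent instance may already have linked a snapshot.
    if (locked?.snapshotId) {
      const existing = await tx.learningAnalysisSnapshot.findUnique({
        where: { id: locked.snapshotId },
      });
      if (existing && (!existing.expiresAt || new Date(existing.expiresAt) >= new Date())) {
        return existing;
      }
    }

    // buildSnapshot runs while the lock is held. Interactive transactions time out
    // after 5s by default, so the `timeout` option may need to be raised.
    const snapshot = await this.snapshotBuilder.buildSnapshot(...);
    await tx.aiRuntimeJob.update({
      where: { id: job.id },
      data: { snapshotId: snapshot.id },
    });
    return snapshot;
  });
}
```
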
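The effect of locking before checking can be illustrated with an in-memory analog. This is a sketch, not the database behavior itself: the row lock is modeled as a boolean flag, each `yield` stands in for an `await`, and `JobRow`, `resolveSnapshotLocked`, and `runInterleaved` are hypothetical names.

```typescript
interface JobRow {
  snapshotId: string | null;
  locked: boolean; // stands in for the row lock taken by SELECT ... FOR UPDATE
}

let builds = 0; // counts buildSnapshot() invocations

function* resolveSnapshotLocked(job: JobRow, instance: string): Generator<void, string, void> {
  while (job.locked) yield; // wait until the current holder releases the row
  job.locked = true; // lock acquired
  try {
    yield; // another instance may be scheduled here, but it cannot take the lock
    if (job.snapshotId !== null) return job.snapshotId; // check under the lock
    builds += 1; // buildSnapshot()
    const snapshotId = `snap-${instance}`;
    yield;
    job.snapshotId = snapshotId; // job.update(snapshotId)
    return snapshotId;
  } finally {
    job.locked = false; // commit or rollback releases the lock
  }
}

// Round-robin scheduler: advances every unfinished call by one step per round.
function runInterleaved(calls: Generator<void, string, void>[]): string[] {
  const results: string[] = new Array(calls.length).fill("");
  const done = calls.map(() => false);
  while (done.some((d) => !d)) {
    calls.forEach((call, i) => {
      if (done[i]) return;
      const step = call.next();
      if (step.done) {
        results[i] = step.value;
        done[i] = true;
      }
    });
  }
  return results;
}

const jobRow: JobRow = { snapshotId: null, locked: false };
const lockedResults = runInterleaved([
  resolveSnapshotLocked(jobRow, "A"),
  resolveSnapshotLocked(jobRow, "B"),
]);

console.log(lockedResults); // [ 'snap-A', 'snap-A' ]
console.log(builds); // 1
```

Instance B waits for the lock, then sees the snapshot that A linked and reuses it, so only one build happens and no orphan is created. The trade-off is that B is blocked for the duration of A's build.
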
Alternatively, wrap the read-check-write logic in a `$transaction` and rely on the database isolation level for protection.

## Related files

- `src/modules/ai-runtime/internal/runtime-internal.service.ts:resolveSnapshot()`
- `src/modules/ai-runtime/snapshot-builder.service.ts:buildSnapshot()`