api-server/docs/learning-info-design.md
wangdl 38a8629e42
feat: M8 complete implementation of the learning-information collection system
Phase 1-2: design doc + database (ReadingEvent / MaterialReadingProgress / TemporaryReadingMaterial / LearningSession extension / DailyLearningActivity extension / LearningRecord)
Phase 3: batch upload + validation and dedup + ReadingEventProcessorService
Phase 4: 4-table aggregation pipeline (LearningSession / MaterialReadingProgress / DailyLearningActivity / LearningRecord)
Phase 5: query endpoints (progress / continue / summary / trend / heatmap / history / reprocess)
Phase 6: authorization checks + cleanup of interrupted sessions + API docs

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-08 21:09:13 +08:00

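# Learning Information Collection: Overall Design
## 1. Overview
Milestone M8 implements a closed loop for collecting learning-behavior data, from the iOS client (via the Rust document runtime) to the API server.
### Data flow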
```
iOS App → Rust zx_document_core (ReadingEventV2)
  → iOS adapter layer (adds readingTargetType / platform / appVersion / timezone)
  → POST /reading/events (batch upload)
  → ReadingEventProcessorService (validate / dedup / aggregate)
  → LearningSession / MaterialReadingProgress / DailyLearningActivity / LearningRecord
  → Query endpoints (progress / continue learning / summary / trend / heatmap / history)
```
## 2. readingTargetType
The Rust side does not store `readingTargetType`. The iOS adapter layer adds it at upload time.
| readingTargetType | materialId maps to | knowledgeBaseId |
|---|---|---|
| `knowledge_source` | `KnowledgeSource.id` | `KnowledgeSource.knowledgeBaseId` |
| `temporary_file` | `TemporaryReadingMaterial.id` | `null` (may be backfilled later) |
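The following is a minimal TypeScript sketch of how the server might parse `readingTargetType`, reject unknown values with `INVALID_TARGET_TYPE` (section 5), and choose the table that a `materialId` resolves against. The names `READING_TARGET_TYPES`, `parseReadingTargetType`, and `lookupTableFor` are hypothetical and do not come from the codebase.

```typescript
// The two target types defined in the mapping table.
const READING_TARGET_TYPES = ["knowledge_source", "temporary_file"] as const;
type ReadingTargetType = (typeof READING_TARGET_TYPES)[number];

// Hypothetical result type: either a parsed value or the section 5 error code.
type ParseResult =
  | { ok: true; value: ReadingTargetType }
  | { ok: false; errorCode: "INVALID_TARGET_TYPE" };

function isReadingTargetType(raw: unknown): raw is ReadingTargetType {
  return typeof raw === "string" && (READING_TARGET_TYPES as readonly string[]).includes(raw);
}

function parseReadingTargetType(raw: unknown): ParseResult {
  return isReadingTargetType(raw)
    ? { ok: true, value: raw }
    : { ok: false, errorCode: "INVALID_TARGET_TYPE" };
}

// Which table the materialId is looked up in, per the mapping table.
function lookupTableFor(type: ReadingTargetType): "KnowledgeSource" | "TemporaryReadingMaterial" {
  return type === "knowledge_source" ? "KnowledgeSource" : "TemporaryReadingMaterial";
}
```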
### Fields added by iOS at upload time
```typescript
// In the iOS adapter layer, when building the upload request:
const item = {
  eventId: rustEvent.eventId,
  clientSessionId: rustEvent.clientSessionId,
  materialId: rustEvent.materialId,
  eventType: rustEvent.eventType,
  position: rustEvent.position,
  activeSecondsDelta: rustEvent.activeSecondsDelta,
  clientTimestampMs: rustEvent.timestampMs, // Rust's timestampMs is renamed on upload
  sequence: rustEvent.sequence,
  // Fields added by iOS:
  readingTargetType: resolveTargetType(rustEvent.materialId), // 'knowledge_source' | 'temporary_file'
  platform: 'ios',
  appVersion: getAppVersion(),
  clientTimezoneOffsetMinutes: getTimezoneOffset(),
};
```
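The server rejects batches with more than 100 events (`BATCH_LIMIT_EXCEEDED`, section 5), so the adapter likely needs to split its upload queue before calling `POST /reading/events`. Below is a minimal sketch in the same TypeScript notation as the snippet above. It assumes the queue is a plain array. `chunkForUpload` and `MAX_BATCH_SIZE` are hypothetical names.

```typescript
// Server-side batch limit from section 5 (BATCH_LIMIT_EXCEEDED).
const MAX_BATCH_SIZE = 100;

// Split queued upload items into consecutive batches of at most `size` items.
// Order is preserved, so each batch can be sent as one POST /reading/events call.
function chunkForUpload<T>(items: readonly T[], size: number = MAX_BATCH_SIZE): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

For example, a queue of 250 items yields three batches of 100, 100, and 50 items. Keeping the original order within and across batches also reduces how often the server sees out-of-order events.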
## 3. Entity mapping
### 3.1 New tables
#### ReadingEvent (raw event log)
```prisma
model ReadingEvent {
  id                          String    @id @default(cuid())
  userId                      String
  eventId                     String
  clientSessionId             String
  readingTargetType           String    @db.VarChar(32)
  materialId                  String
  knowledgeBaseId             String?
  eventType                   String    @db.VarChar(32)
  position                    Json?
  activeSecondsDelta          Int       @default(0)
  clientTimestampMs           BigInt
  clientTimezoneOffsetMinutes Int?
  sequence                    Int
  platform                    String?   @db.VarChar(16)
  appVersion                  String?   @db.VarChar(32)
  status                      String    @default("pending") @db.VarChar(32)
  errorCode                   String?   @db.VarChar(32)
  warningCodes                Json?
  serverReceivedAt            DateTime  @default(now())
  processedAt                 DateTime?
  createdAt                   DateTime  @default(now())
  user                        User      @relation(fields: [userId], references: [id])

  @@unique([userId, eventId])
  @@index([userId, clientSessionId])
  @@index([userId, readingTargetType, materialId, clientTimestampMs])
  @@index([status, createdAt])
  @@index([userId, createdAt])
}
```
#### MaterialReadingProgress (per-material reading progress)
```prisma
model MaterialReadingProgress {
  id                  String    @id @default(cuid())
  userId              String
  materialId          String    // the associated materialId
  readingTargetType   String    @db.VarChar(32)
  knowledgeBaseId     String?   // looked up via KnowledgeSource
  lastClientSessionId String?
  lastPosition        Json?     // camelCase ReadingPosition
  lastProgress        Float?    // normalized progress in [0, 1]
  totalActiveSeconds  Int       @default(0) // cumulative active reading seconds
  sessionCount        Int       @default(0) // number of reading sessions
  status              String    @default("not_started") @db.VarChar(32)
  firstOpenedAt       DateTime?
  lastOpenedAt        DateTime?
  lastReadAt          DateTime?
  isMarkedRead        Boolean   @default(false)
  markedReadAt        DateTime?
  createdAt           DateTime  @default(now())
  updatedAt           DateTime  @updatedAt
  user                User      @relation(fields: [userId], references: [id])

  @@unique([userId, materialId])
  @@index([userId])
  @@index([knowledgeBaseId])
  @@index([status])
}
```
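To make step 5 of the aggregation pipeline in section 4 concrete, here is a hypothetical pure function that folds one accepted event into a `MaterialReadingProgress` row. It rests on several assumptions:

- A `clientSessionId` that differs from `lastClientSessionId` starts a new session.
- `lastReadAt` moves only when `activeSecondsDelta > 0`.
- No event without a position overwrites `lastPosition`. This generalizes the `MaterialClosed` rule.

`lastProgress` and `status` are left out because this document does not say how they are derived. All type and function names are illustrative.

```typescript
// Event types named in this document; the real enum may contain more.
type ReadingEventType = "MaterialOpened" | "PositionChanged" | "MarkedAsRead" | "MaterialClosed";

interface AcceptedEvent {
  clientSessionId: string;
  eventType: ReadingEventType;
  position: unknown; // camelCase ReadingPosition JSON, or null/undefined if absent
  activeSecondsDelta: number; // already validated and capped at 300
  at: Date; // time the event is attributed to
}

// Subset of MaterialReadingProgress columns touched here (lastProgress/status omitted).
interface ProgressRow {
  lastClientSessionId: string | null;
  lastPosition: unknown;
  totalActiveSeconds: number;
  sessionCount: number;
  firstOpenedAt: Date | null;
  lastOpenedAt: Date | null;
  lastReadAt: Date | null;
  isMarkedRead: boolean;
  markedReadAt: Date | null;
}

// Pure fold: returns a new row and never mutates the input.
function applyEventToProgress(row: ProgressRow, ev: AcceptedEvent): ProgressRow {
  const next: ProgressRow = { ...row };
  next.totalActiveSeconds += ev.activeSecondsDelta;
  // Assumption: an unseen clientSessionId counts as a new reading session.
  if (ev.clientSessionId !== row.lastClientSessionId) {
    next.sessionCount += 1;
    next.lastClientSessionId = ev.clientSessionId;
  }
  // Assumption: an event without a position keeps the stored one.
  if (ev.position !== null && ev.position !== undefined) {
    next.lastPosition = ev.position;
  }
  if (ev.eventType === "MaterialOpened") {
    if (next.firstOpenedAt === null) next.firstOpenedAt = ev.at;
    next.lastOpenedAt = ev.at;
  }
  // Assumption: only time actually spent reading advances lastReadAt.
  if (ev.activeSecondsDelta > 0) next.lastReadAt = ev.at;
  if (ev.eventType === "MarkedAsRead" && !row.isMarkedRead) {
    next.isMarkedRead = true;
    next.markedReadAt = ev.at;
  }
  return next;
}
```

This sketch does not guard against out-of-order events moving `lastOpenedAt` or `lastReadAt` backwards. A real implementation that tolerates clock drift (section 4) might compare timestamps before overwriting them.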
#### TemporaryReadingMaterial (temporary reading material)
```prisma
model TemporaryReadingMaterial {
  id               String    @id @default(cuid())
  userId           String
  title            String?   @db.VarChar(255)
  originalFilename String?   @db.VarChar(255)
  mimeType         String?   @db.VarChar(100)
  sizeBytes        BigInt    @default(0)
  storageKey       String?   @db.VarChar(500)
  sourceStatus     String    @default("active") @db.VarChar(32)
  expiresAt        DateTime?
  deletedAt        DateTime?
  createdAt        DateTime  @default(now())
  updatedAt        DateTime  @updatedAt
  user             User      @relation(fields: [userId], references: [id])

  @@index([userId])
  @@index([expiresAt])
}
```
### 3.2 Extensions to existing tables
#### LearningSession (new fields)
Fields added to the existing `LearningSession`:
```prisma
model LearningSession {
  // ... existing fields ...
  // New in M8:
  clientSessionId    String?   // Rust client_session_id; links uploaded events to this session
  materialId         String?   // materialId of the material being read
  readingTargetType  String?   @db.VarChar(32)
  totalActiveSeconds Int       @default(0) // cumulative active seconds from Rust
  lastPosition       Json?     // last reading position
  lastEventAt        DateTime? // time of the most recent event
}
```
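> The existing `mode` field is kept, and the new `readingTargetType` does not conflict with it. For `durationSeconds` compatibility, `totalActiveSeconds` (from the Rust tracker) takes precedence. When no Rust data is available, the existing logic is kept.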
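The compatibility rule above can be sketched as a small selector. The sketch assumes that a non-null `clientSessionId` is the signal that Rust tracker data exists. `legacyDurationSeconds` is a hypothetical stand-in for whatever the pre-M8 logic produces. Neither detail is specified in this document.

```typescript
// Subset of LearningSession relevant to the durationSeconds fallback.
interface SessionDurationFields {
  clientSessionId: string | null; // assumption: set only for sessions fed by Rust events
  totalActiveSeconds: number; // Rust tracker total, default 0
  legacyDurationSeconds: number | null; // hypothetical: value from the pre-M8 logic
}

// Prefer the Rust tracker total; otherwise fall back to the legacy value.
function effectiveDurationSeconds(s: SessionDurationFields): number {
  const hasRustData = s.clientSessionId !== null;
  if (hasRustData) return s.totalActiveSeconds;
  return s.legacyDurationSeconds ?? 0;
}
```

Keying the choice on the presence of Rust data, rather than on `totalActiveSeconds > 0`, keeps a genuine zero-second Rust session from silently falling back to the legacy value.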
#### DailyLearningActivity (new fields)
```prisma
model DailyLearningActivity {
  // ... existing fields (durationSeconds, sessionsCount, activeRecallCount, reviewCount, aiAnalysisCount, completedLoopCount, activityLevel) ...
  // New in M8:
  readingSeconds     Int @default(0) // reading time for the day (seconds)
  materialsReadCount Int @default(0) // number of materials read that day
  markedReadCount    Int @default(0) // number of materials marked as read that day
}
```
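This document does not say how an event is assigned to a `DailyLearningActivity` day. One plausible approach is to use the user's local calendar date, derived from `clientTimestampMs` and `clientTimezoneOffsetMinutes`, as in the following sketch. It makes two assumptions:

- The offset is in minutes east of UTC, so UTC+8 is 480.
- A missing offset falls back to UTC.

JavaScript's `Date.prototype.getTimezoneOffset()` uses the opposite sign (UTC+8 gives -480), so the client and server would need to agree on a convention. Prisma maps `BigInt` columns to JavaScript `bigint`, so a stored `clientTimestampMs` would need `Number(...)` before being passed in. `activityDateFor` is a hypothetical name.

```typescript
// Local calendar date (YYYY-MM-DD) used as the DailyLearningActivity bucket.
// Assumption: offsetMinutes is minutes EAST of UTC (UTC+8 => 480).
function activityDateFor(clientTimestampMs: number, offsetMinutes: number | null): string {
  // Assumption: fall back to UTC when the client sent no offset.
  const shifted = new Date(clientTimestampMs + (offsetMinutes ?? 0) * 60_000);
  // After shifting, the UTC fields of `shifted` equal the local wall-clock fields.
  const y = shifted.getUTCFullYear();
  const m = String(shifted.getUTCMonth() + 1).padStart(2, "0");
  const d = String(shifted.getUTCDate()).padStart(2, "0");
  return `${y}-${m}-${d}`;
}
```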
### 3.3 Reused existing tables
#### LearningRecord (no schema change)
New `recordType` values:
- `reading`: reading record (new)
- `read_completed`: reading completed (new)

Additional fields in the `metadata` JSON:
```json
{
  "materialId": "...",
  "readingTargetType": "knowledge_source",
  "knowledgeBaseId": "...",
  "totalActiveSeconds": 120,
  "lastPosition": {...}
}
```
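Step 7 of the pipeline in section 4 writes a `LearningRecord` on `MarkedAsRead`, on `MaterialClosed`, and on first open. This document does not say which `recordType` each case gets. The following hypothetical mapping assumes that `MarkedAsRead` produces `read_completed` and that the other two cases produce `reading`.

```typescript
type RecordType = "reading" | "read_completed";

// Assumption: MarkedAsRead => read_completed; MaterialClosed and the first
// MaterialOpened for a material => reading; any other event writes no record.
function recordTypeFor(eventType: string, isFirstOpen: boolean): RecordType | null {
  if (eventType === "MarkedAsRead") return "read_completed";
  if (eventType === "MaterialClosed") return "reading";
  if (eventType === "MaterialOpened" && isFirstOpen) return "reading";
  return null;
}
```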
## 4. Core aggregation pipeline
```
POST /reading/events (batch upload)
ReadingEventProcessorService.processBatch(events)
├─ 1. Idempotent dedup (unique on userId + eventId)
├─ 2. Validate: activeSecondsDelta >= 0 (values > 300 are capped to 300)
├─ 3. Write to the ReadingEvent table (status: pending → processed)
├─ 4. Aggregate → LearningSession
│     - Look up an existing session by clientSessionId
│     - Found: update lastPosition / totalActiveSeconds / lastEventAt
│     - Not found (MaterialOpened): create a new LearningSession
│     - MaterialClosed: end the session (status=ended)
├─ 5. Aggregate → MaterialReadingProgress
│     - UPSERT on (userId, materialId)
│     - Accumulate totalActiveSeconds / sessionCount
│     - Update lastPosition / lastProgress
│     - Update timestamps: firstOpenedAt / lastReadAt / markedReadAt
├─ 6. Aggregate → DailyLearningActivity
│     - UPSERT on (userId, activityDate)
│     - Accumulate readingSeconds / materialsReadCount
└─ 7. Write LearningRecord (on MarkedAsRead / MaterialClosed / first open)
```
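Step 1 can be sketched as follows. An in-memory set stands in for the `@@unique([userId, eventId])` constraint, and a callback stands in for steps 3 through 7. The function skips validation (step 2). All names here are hypothetical. The sketch handles both replays of previously stored events and duplicates within a single batch.

```typescript
// Hypothetical minimal event shape; step 2 validation is assumed to have run already.
interface IncomingEvent {
  eventId: string;
  clientSessionId: string;
  materialId: string;
  activeSecondsDelta: number;
}

interface BatchOutcome {
  eventId: string;
  status: "processed" | "duplicate";
}

// Composite key mirroring @@unique([userId, eventId]); JSON encoding avoids separator collisions.
function dedupKey(userId: string, eventId: string): string {
  return JSON.stringify([userId, eventId]);
}

function processBatch(
  userId: string,
  events: readonly IncomingEvent[],
  alreadyStored: ReadonlySet<string>, // stand-in for rows already in ReadingEvent
  aggregate: (ev: IncomingEvent) => void, // stand-in for steps 3-7
): BatchOutcome[] {
  const seenInBatch = new Set<string>();
  const outcomes: BatchOutcome[] = [];
  for (const ev of events) {
    const key = dedupKey(userId, ev.eventId);
    if (alreadyStored.has(key) || seenInBatch.has(key)) {
      // Duplicates are reported but never aggregated a second time.
      outcomes.push({ eventId: ev.eventId, status: "duplicate" });
      continue;
    }
    seenInBatch.add(key);
    aggregate(ev); // real service: write pending row, aggregate, mark processed
    outcomes.push({ eventId: ev.eventId, status: "processed" });
  }
  return outcomes;
}
```

In the real service, the database unique index, not an in-memory set, is what keeps replays idempotent when two concurrent requests carry the same `eventId`.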
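### When aggregation runs
**Synchronous aggregation** (performed within request handling):
- Write to ReadingEvent immediately after validation passes
- Aggregate into LearningSession / MaterialReadingProgress / DailyLearningActivity immediately
- No worker or queue for now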
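### Edge cases
| Scenario | Handling |
|------|------|
| Duplicate eventId | status=duplicate; aggregation skipped |
| activeSecondsDelta < 0 | status=failed, errorCode=INVALID_ACTIVE_SECONDS |
| activeSecondsDelta > 300 | Capped to 300 (a single tick may not exceed 5 minutes) |
| activeSecondsDelta = 0 | Valid (MaterialOpened / PositionChanged / MarkedAsRead) |
| MaterialClosed without position | The existing position is not overwritten |
| Out-of-order event (timestamp goes backward) | Not rejected; processed normally (tolerates client clock drift) |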
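The delta and ordering rows of this table can be sketched as two small checks. This sketch makes two assumptions. First, it treats non-finite deltas (`NaN`, `Infinity`) as `INVALID_ACTIVE_SECONDS`, which the table does not mention. Second, it defines "out of order" as a `clientTimestampMs` earlier than the last one seen in the same session. The warning codes come from section 5. The function names are hypothetical.

```typescript
const MAX_DELTA_SECONDS = 300; // a single tick may not exceed 5 minutes

type DeltaCheck =
  | { ok: true; delta: number; warnings: string[] }
  | { ok: false; errorCode: "INVALID_ACTIVE_SECONDS" };

// Negative (or non-finite, by assumption) => reject; > 300 => cap and warn; 0 is valid.
function checkActiveSeconds(delta: number): DeltaCheck {
  if (!Number.isFinite(delta) || delta < 0) {
    return { ok: false, errorCode: "INVALID_ACTIVE_SECONDS" };
  }
  if (delta > MAX_DELTA_SECONDS) {
    return { ok: true, delta: MAX_DELTA_SECONDS, warnings: ["ACTIVE_SECONDS_CAPPED"] };
  }
  return { ok: true, delta, warnings: [] };
}

// Out-of-order events are still accepted; they only carry a warning.
function outOfOrderWarnings(clientTimestampMs: number, lastSeenMs: number | null): string[] {
  return lastSeenMs !== null && clientTimestampMs < lastSeenMs ? ["OUT_OF_ORDER_EVENT"] : [];
}
```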
## 5. Error codes and warning codes
### Error codes (event rejected, status=failed)
| Code | Meaning |
|----|----|
| `MATERIAL_NOT_FOUND` | The knowledge_source does not exist |
| `TEMPORARY_MATERIAL_NOT_FOUND` | The temporary_file does not exist |
| `MATERIAL_ACCESS_DENIED` | The material does not belong to the current user |
| `TEMPORARY_MATERIAL_EXPIRED` | The temporary file has expired |
| `INVALID_TARGET_TYPE` | Unknown readingTargetType |
| `INVALID_EVENT_TYPE` | Unknown eventType |
| `INVALID_TIMESTAMP` | Malformed timestamp |
| `INVALID_POSITION` | Malformed position JSON |
| `INVALID_ACTIVE_SECONDS` | activeSecondsDelta < 0 |
| `BATCH_LIMIT_EXCEEDED` | Batch exceeds the upload limit (100) |
| `MISSING_CLIENT_SESSION` | clientSessionId is missing |
| `MISSING_MATERIAL_ID` | materialId is missing |
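Several of these codes come from purely structural checks, which can be sketched before any database lookup. The sketch makes three assumptions:

- The checks run in the order shown and return the first failure.
- Only the four event types named in this document are known.
- A valid timestamp is a positive safe integer of milliseconds.

All names are hypothetical.

```typescript
type ShapeErrorCode =
  | "BATCH_LIMIT_EXCEEDED"
  | "MISSING_CLIENT_SESSION"
  | "MISSING_MATERIAL_ID"
  | "INVALID_TARGET_TYPE"
  | "INVALID_EVENT_TYPE"
  | "INVALID_TIMESTAMP";

const BATCH_LIMIT = 100;
// Only the event types named in this document; the real list may be longer.
const KNOWN_EVENT_TYPES = new Set(["MaterialOpened", "PositionChanged", "MarkedAsRead", "MaterialClosed"]);
const TARGET_TYPES = new Set(["knowledge_source", "temporary_file"]);

// Batch-level check: the whole request is rejected above the limit.
function checkBatchSize(count: number): ShapeErrorCode | null {
  return count > BATCH_LIMIT ? "BATCH_LIMIT_EXCEEDED" : null;
}

// Per-event structural checks; returns the first failing code, or null if the event passes.
function checkEventShape(ev: Record<string, unknown>): ShapeErrorCode | null {
  const { clientSessionId, materialId, readingTargetType, eventType, clientTimestampMs } = ev;
  if (typeof clientSessionId !== "string" || clientSessionId === "") return "MISSING_CLIENT_SESSION";
  if (typeof materialId !== "string" || materialId === "") return "MISSING_MATERIAL_ID";
  if (typeof readingTargetType !== "string" || !TARGET_TYPES.has(readingTargetType)) return "INVALID_TARGET_TYPE";
  if (typeof eventType !== "string" || !KNOWN_EVENT_TYPES.has(eventType)) return "INVALID_EVENT_TYPE";
  if (typeof clientTimestampMs !== "number" || !Number.isSafeInteger(clientTimestampMs) || clientTimestampMs <= 0) {
    return "INVALID_TIMESTAMP";
  }
  return null;
}
```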
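### Warning codes (event accepted but flagged)
| Code | Meaning |
|----|------|
| `ACTIVE_SECONDS_CAPPED` | delta > 300; capped to 300 |
| `CLIENT_TIMESTAMP_SKEWED` | Client clock skew > 5 min |
| `POSITION_IGNORED` | position is present but not valid for this eventType |
| `DUPLICATE_EVENT` | Idempotent replay of an already-received event |
| `OUT_OF_ORDER_EVENT` | Out-of-order event |
| `SOURCE_DELETED` | The source material has been deleted |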
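A minimal sketch of the `CLIENT_TIMESTAMP_SKEWED` check follows. It assumes that skew means the absolute difference between `clientTimestampMs` and the server's receive time (`serverReceivedAt`). With this naive definition, events queued offline and uploaded later would also exceed the 5-minute threshold, which is harmless only because the code is a warning rather than an error. `clockSkewWarning` is a hypothetical name.

```typescript
const MAX_CLOCK_SKEW_MS = 5 * 60 * 1000; // 5 minutes

// Assumption: skew = |serverReceivedAt - clientTimestampMs|, in either direction.
function clockSkewWarning(clientTimestampMs: number, serverReceivedAtMs: number): "CLIENT_TIMESTAMP_SKEWED" | null {
  return Math.abs(serverReceivedAtMs - clientTimestampMs) > MAX_CLOCK_SKEW_MS
    ? "CLIENT_TIMESTAMP_SKEWED"
    : null;
}
```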
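## 6. Authorization checks
### Upload endpoint
- `readingTargetType=knowledge_source`: verify that the `KnowledgeSource` exists and belongs to the current user
- `readingTargetType=temporary_file`: verify that the `TemporaryReadingMaterial` exists and belongs to the current user
- Unknown materialId: record a warning and still accept the event, to avoid losing data
### Query endpoints
- `GET /reading/progress/:materialId`: verifies that the user has access to the material
- `GET /reading/continue-learning`: returns only the current user's materials
- All query endpoints obtain userId through the JWT guard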
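The `temporary_file` ownership check might look like the following sketch, which follows the error codes in section 5. It assumes that a material counts as expired once `expiresAt <= now`, and that a soft-deleted row (`deletedAt` set) is still accepted but flagged with `SOURCE_DELETED`. `checkTemporaryMaterialAccess` and its types are hypothetical.

```typescript
// Subset of TemporaryReadingMaterial needed for the check.
interface TemporaryMaterialRow {
  id: string;
  userId: string;
  expiresAt: Date | null;
  deletedAt: Date | null;
}

type AccessResult =
  | { ok: true; warnings: string[] }
  | { ok: false; errorCode: "TEMPORARY_MATERIAL_NOT_FOUND" | "MATERIAL_ACCESS_DENIED" | "TEMPORARY_MATERIAL_EXPIRED" };

// `row` is the result of looking up materialId (null when no row exists).
function checkTemporaryMaterialAccess(row: TemporaryMaterialRow | null, userId: string, now: Date): AccessResult {
  if (row === null) return { ok: false, errorCode: "TEMPORARY_MATERIAL_NOT_FOUND" };
  if (row.userId !== userId) return { ok: false, errorCode: "MATERIAL_ACCESS_DENIED" };
  // Assumption: a material is expired at or after expiresAt.
  if (row.expiresAt !== null && row.expiresAt.getTime() <= now.getTime()) {
    return { ok: false, errorCode: "TEMPORARY_MATERIAL_EXPIRED" };
  }
  // Assumption: soft-deleted sources are accepted with a warning.
  return { ok: true, warnings: row.deletedAt !== null ? ["SOURCE_DELETED"] : [] };
}
```

Returning `MATERIAL_ACCESS_DENIED` for another user's material reveals that the ID exists. Some APIs return a not-found code in that case instead, and which behavior to use is a design choice for this endpoint.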
## 7. Endpoint list
| Method | Path | Description |
|------|------|------|
| POST | `/reading/events` | Batch-upload reading events |
| GET | `/reading/progress/:materialId` | Reading progress for a single material |
| GET | `/reading/continue-learning` | "Continue learning" list for the home screen |
| GET | `/reading/summary` | Learning summary |
| GET | `/reading/trend` | Trend (data only) |
| GET | `/reading/heatmap` | Heatmap data |
| GET | `/reading/history` | Learning history |
| POST | `/reading/events/replay` | Event replay / repair |
## 8. Acceptance checklist
- [x] `docs/learning-info-design.md` exists
- [x] readingTargetType defined (knowledge_source / temporary_file)
- [x] materialId mapping: → KnowledgeSource.id / TemporaryReadingMaterial.id
- [x] Authorization approach (JWT guard + userId + resource ownership check)
- [x] Field mapping from Rust ReadingEventV2 → API ReadingEvent
- [x] Core aggregation pipeline (ReadingEvent → LearningSession → MaterialReadingProgress → DailyLearningActivity → LearningRecord)
- [x] Error code definitions (12 error codes, 6 warning codes)
- [x] Synchronous aggregation strategy