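# Learning Information Collection: Overall Design
## 1. Overview
Milestone M8 implements the closed loop for collecting learning-behavior data, from the iOS client, via the Rust document runtime, to the API server.
### Data flow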
```
iOS App → Rust zx_document_core (ReadingEventV2)
  → iOS adapter layer (adds readingTargetType/platform/appVersion/timezone)
  → POST /reading/events (batch upload)
  → ReadingEventProcessorService (validate / deduplicate / aggregate)
  → LearningSession / MaterialReadingProgress / DailyLearningActivity / LearningRecord
  → Query endpoints (progress / continue learning / summary / trend / heatmap / history)
```
## 2. readingTargetType
The Rust side does not store `readingTargetType`; the iOS adapter layer adds it at upload time.

| readingTargetType | materialId maps to | knowledgeBaseId |
|---|---|---|
| `knowledge_source` | `KnowledgeSource.id` | `KnowledgeSource.knowledgeBaseId` |
| `temporary_file` | `TemporaryReadingMaterial.id` | `null` (may be backfilled later) |
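
The mapping in the table can be sketched as a small lookup. This is a minimal illustration, not the service's actual code: `resolveTarget`, `KnowledgeSourceRow`, and the in-memory `knowledgeSources` object are hypothetical stand-ins for a database query.

```typescript
// Hypothetical types for illustration; not the real Prisma models.
type ReadingTargetType = "knowledge_source" | "temporary_file";

interface KnowledgeSourceRow {
  id: string;
  knowledgeBaseId: string;
}

interface ResolvedTarget {
  materialId: string;
  knowledgeBaseId: string | null;
}

// Resolve knowledgeBaseId according to the mapping table:
// knowledge_source -> KnowledgeSource.knowledgeBaseId, temporary_file -> null.
function resolveTarget(
  targetType: ReadingTargetType,
  materialId: string,
  knowledgeSources: Record<string, KnowledgeSourceRow | undefined>,
): ResolvedTarget | undefined {
  if (targetType === "temporary_file") {
    return { materialId, knowledgeBaseId: null };
  }
  const source = knowledgeSources[materialId];
  // An unknown KnowledgeSource id is reported to the caller as undefined.
  return source ? { materialId, knowledgeBaseId: source.knowledgeBaseId } : undefined;
}
```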
### Fields added by the iOS adapter at upload time
```typescript
// When the iOS adapter layer builds the upload request:
const item = {
  eventId: rustEvent.eventId,
  clientSessionId: rustEvent.clientSessionId,
  materialId: rustEvent.materialId,
  eventType: rustEvent.eventType,
  position: rustEvent.position,
  activeSecondsDelta: rustEvent.activeSecondsDelta,
  clientTimestampMs: rustEvent.timestampMs,
  sequence: rustEvent.sequence,
  // Fields added by iOS:
  readingTargetType: resolveTargetType(rustEvent.materialId), // 'knowledge_source' | 'temporary_file'
  platform: 'ios',
  appVersion: getAppVersion(),
  clientTimezoneOffsetMinutes: getTimezoneOffset(),
};
```
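Because uploads are batched and section 5 lists a batch cap of 100 under `BATCH_LIMIT_EXCEEDED`, a client would need to split a larger backlog before posting. The helper below is a hedged sketch of that splitting; `toBatches` and `MAX_BATCH_SIZE` are hypothetical names, and the real adapter may batch differently.

```typescript
const MAX_BATCH_SIZE = 100; // batch cap listed under BATCH_LIMIT_EXCEEDED

// Split a backlog of upload items into batches of at most `size` items.
function toBatches<T>(items: T[], size: number = MAX_BATCH_SIZE): T[][] {
  if (size < 1) {
    throw new RangeError("size must be at least 1");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}
```

Each resulting batch would then be sent as one `POST /reading/events` request.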
## 3. Entity mapping
### 3.1 New tables
#### ReadingEvent (raw event log)
```prisma
model ReadingEvent {
  id String @id @default(cuid())
  userId String
  eventId String
  clientSessionId String
  readingTargetType String @db.VarChar(32)
  materialId String
  knowledgeBaseId String?
  eventType String @db.VarChar(32)
  position Json?
  activeSecondsDelta Int @default(0)
  clientTimestampMs BigInt
  clientTimezoneOffsetMinutes Int?
  sequence Int
  platform String? @db.VarChar(16)
  appVersion String? @db.VarChar(32)
  status String @default("pending") @db.VarChar(32)
  errorCode String? @db.VarChar(32)
  warningCodes Json?
  serverReceivedAt DateTime @default(now())
  processedAt DateTime?
  createdAt DateTime @default(now())
  user User @relation(fields: [userId], references: [id])

  @@unique([userId, eventId])
  @@index([userId, clientSessionId])
  @@index([userId, readingTargetType, materialId, clientTimestampMs])
  @@index([status, createdAt])
  @@index([userId, createdAt])
}
```
#### MaterialReadingProgress (per-material reading progress)
```prisma
model MaterialReadingProgress {
  id String @id @default(cuid())
  userId String
  materialId String // associated materialId
  readingTargetType String @db.VarChar(32)
  knowledgeBaseId String? // looked up from KnowledgeSource
  lastClientSessionId String?
  lastPosition Json? // camelCase ReadingPosition
  lastProgress Float? // normalized progress value in [0, 1]
  totalActiveSeconds Int @default(0) // cumulative active reading seconds
  sessionCount Int @default(0) // number of reading sessions
  status String @default("not_started") @db.VarChar(32)
  firstOpenedAt DateTime?
  lastOpenedAt DateTime?
  lastReadAt DateTime?
  isMarkedRead Boolean @default(false)
  markedReadAt DateTime?
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt
  user User @relation(fields: [userId], references: [id])

  @@unique([userId, materialId])
  @@index([userId])
  @@index([knowledgeBaseId])
  @@index([status])
}
```
#### TemporaryReadingMaterial (temporary reading material)
```prisma
model TemporaryReadingMaterial {
  id String @id @default(cuid())
  userId String
  title String? @db.VarChar(255)
  originalFilename String? @db.VarChar(255)
  mimeType String? @db.VarChar(100)
  sizeBytes BigInt @default(0)
  storageKey String? @db.VarChar(500)
  sourceStatus String @default("active") @db.VarChar(32)
  expiresAt DateTime?
  deletedAt DateTime?
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt
  user User @relation(fields: [userId], references: [id])

  @@index([userId])
  @@index([expiresAt])
}
```
### 3.2 Extended existing tables
#### LearningSession (new fields)
Added on top of the existing `LearningSession` model:
```prisma
model LearningSession {
  // ... existing fields ...
  // Fields added in M8:
  clientSessionId String? // Rust client_session_id, links the uploaded events
  materialId String? // materialId of the material being read
  readingTargetType String? @db.VarChar(32)
  totalActiveSeconds Int @default(0) // cumulative active seconds from Rust
  lastPosition Json? // last reading position
  lastEventAt DateTime? // time of the last event
}
```
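> The existing `mode` field is kept; the new `readingTargetType` does not conflict with it. For `durationSeconds` compatibility: prefer `totalActiveSeconds` (from the Rust tracker), and keep the legacy logic when there is no Rust data.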
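
The compatibility rule in the note can be sketched as follows. The note does not say how "no Rust data" is detected, so this sketch assumes a non-null `clientSessionId` marks a session fed by Rust events; `effectiveDurationSeconds` and `SessionDurationFields` are hypothetical names.

```typescript
// Hypothetical subset of LearningSession; only the fields needed here.
interface SessionDurationFields {
  clientSessionId: string | null; // assumed non-null when the session came from Rust events
  totalActiveSeconds: number; // accumulated from the Rust tracker
  durationSeconds: number; // legacy duration
}

// Assumption: a non-null clientSessionId means Rust data is available.
function effectiveDurationSeconds(session: SessionDurationFields): number {
  return session.clientSessionId !== null
    ? session.totalActiveSeconds
    : session.durationSeconds;
}
```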
#### DailyLearningActivity (new fields)
```prisma
model DailyLearningActivity {
  // ... existing fields (durationSeconds, sessionsCount, activeRecallCount, reviewCount, aiAnalysisCount, completedLoopCount, activityLevel) ...
  // Fields added in M8:
  readingSeconds Int @default(0) // reading time for the day (seconds)
  materialsReadCount Int @default(0) // number of materials read that day
  markedReadCount Int @default(0) // number of materials marked as read that day
}
```
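Daily counters need a calendar day to bucket into, and the event carries both `clientTimestampMs` and `clientTimezoneOffsetMinutes`. The document does not specify how the day is derived, so the following is only one possible sketch. It assumes the offset is minutes east of UTC (local time = UTC + offset); `localActivityDate` is a hypothetical helper.

```typescript
// Assumption: offsetMinutes is minutes east of UTC (local time = UTC + offset).
// If the client reports the opposite sign, the sign must be flipped first.
function localActivityDate(clientTimestampMs: number, offsetMinutes: number | null): string {
  const shiftedMs = clientTimestampMs + (offsetMinutes ?? 0) * 60_000;
  // toISOString() formats in UTC, so shifting first yields the local calendar day.
  return new Date(shiftedMs).toISOString().slice(0, 10); // YYYY-MM-DD
}
```

With a missing offset the sketch falls back to the UTC day.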
### 3.3 Reused existing tables
#### LearningRecord (no schema change)
New `recordType` values:
- `reading`: reading record (new)
- `read_completed`: reading completed (new)

Additional fields in the `metadata` JSON:
```json
{
"materialId": "...",
"readingTargetType": "knowledge_source",
"knowledgeBaseId": "...",
"totalActiveSeconds": 120,
"lastPosition": {...}
}
```
## 4. Core aggregation pipeline
```
POST /reading/events (batch upload)
ReadingEventProcessorService.processBatch(events)
├─ 1. Idempotent deduplication (eventId unique)
├─ 2. Validation (activeSecondsDelta >= 0 and <= 300)
├─ 3. Write to the ReadingEvent table (status=pending→processed)
├─ 4. Aggregate → LearningSession
│    - Look up an existing session by clientSessionId
│    - Exists: update lastPosition / totalActiveSeconds / lastEventAt
│    - Does not exist (MaterialOpened): create a new LearningSession
│    - MaterialClosed: end the session (status=ended)
├─ 5. Aggregate → MaterialReadingProgress
│    - UPSERT (userId, materialId)
│    - Accumulate totalActiveSeconds / sessionCount
│    - Update lastPosition / lastProgress
│    - Update timestamps (firstOpenedAt / lastReadAt / completedAt)
├─ 6. Aggregate → DailyLearningActivity
│    - UPSERT (userId, activityDate)
│    - Accumulate readingSeconds / materialsReadCount
└─ 7. Write LearningRecord (on MarkedAsRead / MaterialClosed / first open)
```
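Steps 1 and 5 of the pipeline above can be sketched in memory. This is an illustration only: the plain objects stand in for the `ReadingEvent` and `MaterialReadingProgress` tables, `InMemoryProcessor` is a hypothetical class, and counting a session on every `MaterialOpened` event is an assumption rather than a stated rule.

```typescript
// Hypothetical in-memory stand-ins for the database tables.
interface IncomingEvent {
  eventId: string;
  materialId: string;
  eventType: string; // e.g. "MaterialOpened"
  activeSecondsDelta: number;
}

interface ProgressRow {
  totalActiveSeconds: number;
  sessionCount: number;
}

class InMemoryProcessor {
  // Keys are `${userId}:${eventId}`, mirroring the unique (userId, eventId) constraint.
  private seen: Record<string, true> = {};
  // Keys are `${userId}:${materialId}`, mirroring the unique (userId, materialId) constraint.
  readonly progress: Record<string, ProgressRow | undefined> = {};

  // Returns the status each event would receive.
  processBatch(userId: string, events: IncomingEvent[]): string[] {
    return events.map((event) => {
      const eventKey = `${userId}:${event.eventId}`;
      if (this.seen[eventKey]) return "duplicate"; // step 1: skip aggregation
      this.seen[eventKey] = true;

      // Step 5: upsert on (userId, materialId) and accumulate the counters.
      const progressKey = `${userId}:${event.materialId}`;
      const row = this.progress[progressKey] ?? { totalActiveSeconds: 0, sessionCount: 0 };
      row.totalActiveSeconds += event.activeSecondsDelta;
      if (event.eventType === "MaterialOpened") row.sessionCount += 1; // assumption
      this.progress[progressKey] = row;
      return "processed";
    });
  }
}
```

A real implementation would rely on the database's unique constraint rather than an in-process set, so that replays across requests and server instances are also caught.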
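### Aggregation timing
**Synchronous aggregation** (completed within request handling):
- Write to ReadingEvent immediately after validation passes
- Aggregate immediately into LearningSession / MaterialReadingProgress / DailyLearningActivity
- No worker or queue is used for now
### Special cases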
| Scenario | Handling |
|------|------|
| Duplicate eventId | status=duplicate, aggregation skipped |
| activeSecondsDelta < 0 | status=failed, errorCode=INVALID_ACTIVE_SECONDS |
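| activeSecondsDelta > 300 | Capped at 300 (a single tick may not exceed 5 minutes) |
| activeSecondsDelta = 0 | Valid (MaterialOpened / PositionChanged / MarkedAsRead) |
| MaterialClosed without position | The existing position is not overwritten |
| Out-of-order events (time goes backwards) | Not rejected, processed normally (tolerates client clock drift) |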
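
The `activeSecondsDelta` rows of the table can be sketched as a single check. `checkActiveSecondsDelta` and `DeltaResult` are hypothetical names; the codes used are `INVALID_ACTIVE_SECONDS` and `ACTIVE_SECONDS_CAPPED` from section 5.

```typescript
const MAX_DELTA_SECONDS = 300; // cap from the table above (5 minutes per tick)

interface DeltaResult {
  status: "accepted" | "failed";
  activeSecondsDelta: number; // value to aggregate (0 when the event failed)
  errorCode?: string;
  warningCodes: string[];
}

function checkActiveSecondsDelta(delta: number): DeltaResult {
  if (delta < 0) {
    // Negative deltas are rejected outright.
    return { status: "failed", activeSecondsDelta: 0, errorCode: "INVALID_ACTIVE_SECONDS", warningCodes: [] };
  }
  if (delta > MAX_DELTA_SECONDS) {
    // Oversized deltas are accepted but capped and flagged.
    return { status: "accepted", activeSecondsDelta: MAX_DELTA_SECONDS, warningCodes: ["ACTIVE_SECONDS_CAPPED"] };
  }
  // Zero is valid, e.g. for MaterialOpened / PositionChanged / MarkedAsRead.
  return { status: "accepted", activeSecondsDelta: delta, warningCodes: [] };
}
```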
## 5. Error codes and warning codes
### Error codes (event rejected, status=failed)
| Code | Meaning |
|----|------|
| `MATERIAL_NOT_FOUND` | The knowledge_source does not exist |
| `TEMPORARY_MATERIAL_NOT_FOUND` | The temporary_file does not exist |
| `MATERIAL_ACCESS_DENIED` | The material does not belong to the current user |
| `TEMPORARY_MATERIAL_EXPIRED` | The temporary file has expired |
| `INVALID_TARGET_TYPE` | Unknown readingTargetType |
| `INVALID_EVENT_TYPE` | Unknown eventType |
| `INVALID_TIMESTAMP` | Malformed timestamp |
| `INVALID_POSITION` | Malformed position JSON |
| `INVALID_ACTIVE_SECONDS` | activeSecondsDelta < 0 |
| `BATCH_LIMIT_EXCEEDED` | Batch limit exceeded (100) |
| `MISSING_CLIENT_SESSION` | clientSessionId is missing |
| `MISSING_MATERIAL_ID` | materialId is missing |
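
A few of the structural checks in the table can be sketched without any database access. The function names and the order in which the checks run are assumptions made for illustration; only the codes and the limit of 100 come from the table.

```typescript
const BATCH_LIMIT = 100; // batch limit from the table above
const TARGET_TYPES: string[] = ["knowledge_source", "temporary_file"];

// Hypothetical shape of an unvalidated upload item.
interface RawEvent {
  clientSessionId?: string;
  materialId?: string;
  readingTargetType?: string;
}

// Returns the first error code that applies, or null when the fields pass.
function fieldErrorCode(event: RawEvent): string | null {
  if (!event.clientSessionId) return "MISSING_CLIENT_SESSION";
  if (!event.materialId) return "MISSING_MATERIAL_ID";
  if (!event.readingTargetType || TARGET_TYPES.indexOf(event.readingTargetType) === -1) {
    return "INVALID_TARGET_TYPE";
  }
  return null;
}

// Batch-level check, applied before looking at individual events.
function batchErrorCode(events: RawEvent[]): string | null {
  return events.length > BATCH_LIMIT ? "BATCH_LIMIT_EXCEEDED" : null;
}
```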
### Warning codes (event accepted but flagged)
| Code | Meaning |
|----|------|
| `ACTIVE_SECONDS_CAPPED` | delta > 300, capped |
| `CLIENT_TIMESTAMP_SKEWED` | Clock skew > 5 min |
| `POSITION_IGNORED` | position is present but not valid for the eventType |
| `DUPLICATE_EVENT` | Idempotent replay |
| `OUT_OF_ORDER_EVENT` | Out-of-order event |
| `SOURCE_DELETED` | The source material has been deleted |
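
The two timing-related warnings can be sketched as below. The table does not define what the skew is measured against or what exactly counts as out of order, so both comparisons are assumptions: skew is taken relative to the server receive time, and an event is out of order when its timestamp is earlier than the previous event in the same session. `timingWarnings` is a hypothetical helper.

```typescript
const MAX_SKEW_MS = 5 * 60 * 1000; // 5 minutes

function timingWarnings(
  clientTimestampMs: number,
  serverReceivedAtMs: number,
  previousClientTimestampMs: number | null,
): string[] {
  const warnings: string[] = [];
  // Assumption: skew is measured against the server receive time.
  if (Math.abs(serverReceivedAtMs - clientTimestampMs) > MAX_SKEW_MS) {
    warnings.push("CLIENT_TIMESTAMP_SKEWED");
  }
  // Assumption: out of order means the timestamp moved backwards within a session.
  if (previousClientTimestampMs !== null && clientTimestampMs < previousClientTimestampMs) {
    warnings.push("OUT_OF_ORDER_EVENT");
  }
  return warnings;
}
```

Neither warning rejects the event; both would only be recorded in `warningCodes`.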
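## 6. Permission checks
### Upload endpoint
- `readingTargetType=knowledge_source`: verify that the `KnowledgeSource` exists and belongs to the current user
- `readingTargetType=temporary_file`: verify that the `TemporaryReadingMaterial` exists and belongs to the current user
- Unknown materialId: record a warning but still accept the event (to avoid losing data)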
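
The existence and ownership checks above can be sketched as a pure function that returns one of the codes from section 5, or `null` when access is allowed. `accessCode` and `OwnedMaterial` are hypothetical, as is the assumption that both material types carry a `userId`. Whether a returned code rejects the event or is only recorded as a warning is left to the caller.

```typescript
// Hypothetical minimal view of KnowledgeSource / TemporaryReadingMaterial.
interface OwnedMaterial {
  userId: string;
  expiresAt?: Date | null; // only meaningful for TemporaryReadingMaterial
}

function accessCode(
  targetType: "knowledge_source" | "temporary_file",
  material: OwnedMaterial | undefined,
  currentUserId: string,
  now: Date,
): string | null {
  const temporary = targetType === "temporary_file";
  if (!material) {
    return temporary ? "TEMPORARY_MATERIAL_NOT_FOUND" : "MATERIAL_NOT_FOUND";
  }
  if (material.userId !== currentUserId) {
    return "MATERIAL_ACCESS_DENIED";
  }
  if (temporary && material.expiresAt && material.expiresAt.getTime() <= now.getTime()) {
    return "TEMPORARY_MATERIAL_EXPIRED";
  }
  return null; // exists and belongs to the current user
}
```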
### Query endpoints
- `GET /reading/progress/:materialId`: verify the user's permission
- `GET /reading/continue-learning`: returns the current user's materials
- All query endpoints obtain the userId through the JWT guard
## 7. Endpoint list
| Method | Path | Description |
|------|------|------|
| POST | `/reading/events` | Batch upload of reading events |
| GET | `/reading/progress/:materialId` | Reading progress for a single material |
| GET | `/reading/continue-learning` | "Continue learning" on the home screen |
| GET | `/reading/summary` | Learning summary |
| GET | `/reading/trend` | Trend (raw data only) |
| GET | `/reading/heatmap` | Heatmap data |
| GET | `/reading/history` | Learning history |
| POST | `/reading/events/replay` | Event replay / repair |
## 8. Acceptance checklist
- [x] `docs/learning-info-design.md` exists
- [x] readingTargetType defined: knowledge_source / temporary_file
- [x] materialId mapping: → KnowledgeSource.id / TemporaryReadingMaterial.id
- [x] Permission check approach: JWT guard + userId + resource ownership check
- [x] Field mapping from Rust ReadingEventV2 to API ReadingEvent
- [x] Core aggregation pipeline: ReadingEvent → LearningSession → MaterialReadingProgress → DailyLearningActivity → LearningRecord
- [x] Error codes defined: 8 types
- [x] Synchronous aggregation strategy