api-server/docs/ai-runtime-internal-api-protocol.md
wangdl e16b970a2c
docs: sync internal API protocol with heartbeat & snapshot changes
- 4.3 Heartbeat: 204→200, add { jobId, lockUntil, cancelRequested } response
- 4.4 Get Snapshot: auto-rebuild replaces SNAPSHOT_EXPIRED/SNAPSHOT_NOT_FOUND
- 3. Error codes: drop SNAPSHOT_EXPIRED, mark SNAPSHOT_NOT_FOUND as deprecated
- 4.7 Submit Failure: add JOB_CANCELLED handling rule

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-18 11:54:48 +08:00

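# Internal Communication Protocol Between the API and the Rust Runtime
## 1. Overview
This document defines the internal HTTP protocol between the main API and the Rust Heavy Runtime.
Communication directions:
- Runtime → API: poll for jobs, submit results, submit logs
- API → Runtime: health check (optional)
## 2. Authentication
All `/internal/runtime/*` endpoints are protected by `InternalAuthGuard`.

**Request headers**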
```
x-internal-api-key: <RUNTIME_SERVICE_TOKEN>
x-runtime-instance-id: runtime-001
```
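- `x-internal-api-key`: must match the API's `INTERNAL_API_KEY` environment variable
- `x-runtime-instance-id`: identifier of the Runtime instance, recorded in logs

**Security constraints**
- A regular user JWT cannot access internal endpoints
- The service token cannot access regular user APIs
- The Runtime cannot use internal endpoints to access data that the current job does not need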
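
The header contract above can be sketched as a small helper. This is a minimal sketch, not the project's actual client code: `buildInternalHeaders` and `RuntimeIdentity` are hypothetical names, and the `content-type` header is an assumption based on the JSON request bodies used throughout this document.

```typescript
// Hypothetical input type: where the token comes from is an assumption.
interface RuntimeIdentity {
  serviceToken: string; // sent as x-internal-api-key
  runtimeInstanceId: string; // e.g. "runtime-001"
}

// Builds the headers that every /internal/runtime/* request carries.
function buildInternalHeaders(identity: RuntimeIdentity): Record<string, string> {
  if (identity.serviceToken.length === 0) {
    // Fail early rather than sending an unauthenticated request.
    throw new Error("service token must not be empty");
  }
  return {
    "x-internal-api-key": identity.serviceToken,
    "x-runtime-instance-id": identity.runtimeInstanceId,
    // Assumption: request bodies are JSON, as in the examples in this document.
    "content-type": "application/json",
  };
}
```
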
## 3. Error Response Format
Every internal endpoint returns the following body on failure:
```json
{
  "statusCode": 400,
  "errorCode": "INVALID_SNAPSHOT",
  "message": "Snapshot has expired for this job",
  "timestamp": "2026-06-11T10:00:00.000Z"
}
```
### Error Codes
| Error code | HTTP | Description | retryable |
|------------|------|-------------|-----------|
| `JOB_NOT_FOUND` | 404 | Job does not exist | false |
| `JOB_ALREADY_LOCKED` | 409 | Already locked by another Runtime | true |
| `SNAPSHOT_NOT_FOUND` | 404 | Snapshot does not exist (deprecated: getSnapshot rebuilds it automatically) | false |
| `CREDENTIAL_NOT_FOUND` | 404 | Credential does not exist | false |
| `CREDENTIAL_INVALID` | 422 | Credential is invalid | false |
| `RESULT_ALREADY_EXISTS` | 409 | Duplicate submission | false |
| `RESULT_SCHEMA_UNSUPPORTED` | 422 | Schema version not supported | false |
| `RUNTIME_VERSION_INCOMPATIBLE` | 422 | Runtime version incompatible | false |
| `INTERNAL_ERROR` | 500 | Internal error | true |
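
A Runtime client might turn the table above into a retry decision along these lines. This is a sketch only: `isRetryable` and `InternalErrorBody` are hypothetical names, and treating unknown error codes as retryable only for 5xx statuses is an assumption that the protocol does not state.

```typescript
// Shape of the error body described in this section.
interface InternalErrorBody {
  statusCode: number;
  errorCode: string;
  message: string;
  timestamp: string;
}

// The retryable column of the error code table, copied as a lookup.
const RETRYABLE_BY_CODE: Record<string, boolean> = {
  JOB_NOT_FOUND: false,
  JOB_ALREADY_LOCKED: true,
  SNAPSHOT_NOT_FOUND: false,
  CREDENTIAL_NOT_FOUND: false,
  CREDENTIAL_INVALID: false,
  RESULT_ALREADY_EXISTS: false,
  RESULT_SCHEMA_UNSUPPORTED: false,
  RUNTIME_VERSION_INCOMPATIBLE: false,
  INTERNAL_ERROR: true,
};

function isRetryable(body: InternalErrorBody): boolean {
  const known: unknown = RETRYABLE_BY_CODE[body.errorCode];
  if (typeof known === "boolean") {
    return known;
  }
  // Assumption: codes missing from the table are retried only on server errors.
  return body.statusCode >= 500;
}
```
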
## 4. Endpoint Details
### 4.1 Poll Jobs
```
POST /internal/runtime/jobs/poll
```
The Runtime polls for jobs to execute. The API returns only jobs that are compatible with the Runtime's `supportedJobTypes` and `capabilities`.

**Request**
```json
{
  "runtimeInstanceId": "runtime-001",
  "supportedJobTypes": ["learning_state_analysis", "quiz_generation"],
  "limit": 5,
  "capabilities": {
    "supportedSnapshotVersions": ["ai_snapshot_v1"],
    "supportedOutputSchemaVersions": ["analysis_output_v1", "quiz_output_v1"]
  }
}
```
**Response 200**
```json
{
  "jobs": [
    {
      "id": "job-abc123",
      "jobType": "learning_state_analysis",
      "targetType": "material",
      "targetId": "mat-xyz",
      "priority": 0,
      "snapshotId": "snap-001",
      "promptVersion": "learning_state_v1",
      "outputSchemaVersion": "analysis_output_v1"
    }
  ]
}
```
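
The compatibility filtering described for Poll Jobs could look roughly like the sketch below. It is illustrative only: `selectCompatibleJobs` and `PendingJob` are hypothetical names, and matching on `jobType` and `outputSchemaVersion` is an assumption drawn from the fields visible in the example payloads. How `supportedSnapshotVersions` and `priority` take part is not specified here, so the sketch leaves them out.

```typescript
// Mirrors the poll request body shown above.
interface PollRequest {
  runtimeInstanceId: string;
  supportedJobTypes: string[];
  limit: number;
  capabilities: {
    supportedSnapshotVersions: string[];
    supportedOutputSchemaVersions: string[];
  };
}

// Hypothetical subset of a pending job, limited to the fields used for matching.
interface PendingJob {
  id: string;
  jobType: string;
  outputSchemaVersion: string;
}

function selectCompatibleJobs(pending: PendingJob[], req: PollRequest): PendingJob[] {
  const compatible = pending.filter(
    (job) =>
      req.supportedJobTypes.indexOf(job.jobType) !== -1 &&
      req.capabilities.supportedOutputSchemaVersions.indexOf(job.outputSchemaVersion) !== -1,
  );
  // Never hand out more jobs than the Runtime asked for.
  return compatible.slice(0, Math.max(0, req.limit));
}
```
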
### 4.2 Lock Job
```
POST /internal/runtime/jobs/{jobId}/lock
```
The Runtime locks a job to obtain the right to execute it.

**Request**
```json
{
  "runtimeInstanceId": "runtime-001"
}
```
**Response 200**
```json
{
  "jobId": "job-abc123",
  "status": "locked",
  "lockUntil": 1700000000123
}
```
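### 4.3 Heartbeat
```
POST /internal/runtime/jobs/{jobId}/heartbeat
```
The Runtime extends the lock's validity period. The first call performs the locked→running transition and sets startedAt; subsequent calls only extend lockUntil.
The response also carries a `cancelRequested` flag: if the user has requested cancellation, the Runtime should abort execution at its next checkpoint.

**Request**
```json
{
  "runtimeInstanceId": "runtime-001"
}
```
**Response 200**
```json
{
  "jobId": "job-abc123",
  "lockUntil": 1700000000123,
  "cancelRequested": false
}
```
| Field | Type | Description |
|-------|------|-------------|
| `jobId` | string | Job ID |
| `lockUntil` | number | Lock expiry time (ms epoch) |
| `cancelRequested` | boolean | Whether the user has requested cancellation. When true, the Runtime should submit JOB_CANCELLED and abort |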
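
One way the Runtime might act on a heartbeat response is sketched below. `planAfterHeartbeat` and `HeartbeatAction` are hypothetical names, and scheduling the next heartbeat at half of the remaining lock time is an assumption; the protocol does not prescribe an interval.

```typescript
// Mirrors the heartbeat response body shown above.
interface HeartbeatResponse {
  jobId: string;
  lockUntil: number; // ms epoch
  cancelRequested: boolean;
}

type HeartbeatAction =
  | { kind: "abort"; errorCode: "JOB_CANCELLED" }
  | { kind: "continue"; nextHeartbeatInMs: number };

function planAfterHeartbeat(res: HeartbeatResponse, nowMs: number): HeartbeatAction {
  if (res.cancelRequested) {
    // The Runtime should report JOB_CANCELLED through Submit Failure and stop.
    return { kind: "abort", errorCode: "JOB_CANCELLED" };
  }
  const remainingMs = Math.max(0, res.lockUntil - nowMs);
  // Assumption: renew once half of the remaining lock time has passed.
  return { kind: "continue", nextHeartbeatInMs: Math.floor(remainingMs / 2) };
}
```
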
### 4.4 Get Snapshot
```
GET /internal/runtime/jobs/{jobId}/snapshot
```
The Runtime fetches the LearningAnalysisSnapshot associated with the job.
If the job has no associated snapshot, the snapshot does not exist, or the snapshot's sourceDataVersion is stale or the snapshot has expired, the API automatically calls `SnapshotBuilder.buildSnapshot` to rebuild a new snapshot and bind it to the job. The Runtime does not need to handle snapshot expiry.

**Response 200**
```json
{
  "jobId": "job-abc123",
  "snapshotId": "snap-001",
  "snapshotVersion": "ai_snapshot_v1",
  "privacyScope": { "allowDocumentContent": true },
  "userProfile": { "learningGoal": "exam", "currentLevel": "intermediate" },
  "aiSettings": { "allowAiAnalysis": true },
  "learningBehaviorSummary": { "totalActiveSeconds": 3600 },
  "materialProgressSummary": { "progress": 0.6 },
  "behaviorSignals": { "engagementSignal": "high" },
  "scoreSignals": { "masteryRiskScore": 0.3 },
  "constraints": { "dailyAvailableMinutes": 60 },
  "allowedModelFields": ["learningGoal", "currentLevel"]
}
```
**Errors**
- `404 JOB_NOT_FOUND`: Job does not exist
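
The auto-rebuild conditions listed for Get Snapshot can be expressed as a single predicate. This is a sketch under stated assumptions: `needsRebuild` and `StoredSnapshot` are hypothetical, `sourceDataVersion` is assumed to be a monotonically increasing number, and `expiresAt` is an assumed field name for the expiry time.

```typescript
// Hypothetical stored snapshot record; the field types are assumptions.
interface StoredSnapshot {
  snapshotId: string;
  sourceDataVersion: number; // assumed numeric and increasing
  expiresAt: number; // assumed ms epoch
}

function needsRebuild(
  jobSnapshotId: string | null,
  stored: StoredSnapshot | undefined,
  currentSourceDataVersion: number,
  nowMs: number,
): boolean {
  if (jobSnapshotId === null) return true; // job has no associated snapshot
  if (stored === undefined) return true; // snapshot record does not exist
  if (stored.sourceDataVersion < currentSourceDataVersion) return true; // stale source data
  return stored.expiresAt <= nowMs; // expired
}
```
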
### 4.5 Resolve Credential
```
POST /internal/runtime/model-credentials/resolve
```
The Runtime obtains the credential for model calls. In platform_key mode the platform key is returned; in user_deepseek_key mode the user's key is decrypted and then returned.

**Request**
```json
{
  "jobId": "job-abc123",
  "apiKeyMode": "user_deepseek_key",
  "credentialId": "cred-001",
  "provider": "deepseek"
}
```
**Response 200**
```json
{
  "provider": "deepseek",
  "model": "deepseek-chat",
  "baseUrl": "https://api.deepseek.com/v1",
  "apiKey": "sk-xxxx",
  "apiKeyMode": "user_deepseek_key"
}
```
**Security requirements**
- The plaintext `apiKey` appears only briefly in the response and is never written to logs
- `apiKey` is never returned to iOS / Admin
- A user key must belong to `job.userId`
- For the platform key, the Runtime's environment variable takes precedence; returning it from the API is optional

**Errors**
- `404 CREDENTIAL_NOT_FOUND`
- `422 CREDENTIAL_INVALID`
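
To keep the plaintext key out of logs, either side could pass objects through a redaction step before logging them. The helper below is a minimal sketch: `redactApiKey` is a hypothetical name and the `"[REDACTED]"` placeholder is an arbitrary choice.

```typescript
// Returns a deep copy of `value` in which every property named "apiKey" is masked.
function redactApiKey(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item: unknown) => redactApiKey(item));
  }
  if (typeof value === "object" && value !== null) {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = key === "apiKey" ? "[REDACTED]" : redactApiKey(source[key]);
    }
    return out;
  }
  // Primitives are returned unchanged.
  return value;
}
```

Because the function copies rather than mutates, the original response object can still be used for the model call while only the redacted copy reaches the logger.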
### 4.6 Submit Result
```
POST /internal/runtime/jobs/{jobId}/result
```
The Runtime submits the result of a successful execution.

**Request**
```json
{
  "runtimeInstanceId": "runtime-001",
  "schemaVersion": "analysis_output_v1",
  "status": "succeeded",
  "rawOutput": { "learningState": "in_progress", "confidence": 0.85 },
  "validatedOutput": { "learningState": "in_progress", "riskLevel": "low" },
  "validationErrors": [],
  "usage": {
    "inputTokens": 1200,
    "outputTokens": 450,
    "totalTokens": 1650,
    "latencyMs": 3200,
    "costEstimate": 3
  },
  "attemptNo": 0,
  "outputHash": "sha256-abc123"
}
```
**Idempotency rules**
- `resultIdempotencyKey = jobId + attemptNo + outputHash`
- Resubmitting with the same key returns 200 (idempotent)
- If a succeeded result already exists and its outputHash differs, the API returns 409 `RESULT_ALREADY_EXISTS`

**Response 201** (created)

**Errors**
- `409 RESULT_ALREADY_EXISTS`
- `422 RESULT_SCHEMA_UNSUPPORTED`
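
The idempotency rules for Submit Result can be sketched as a pure decision function. `resultIdempotencyKey` follows the formula above, but the `:` separator is an assumption, and `decideResultStatus` and `SucceededResult` are hypothetical names. The case of an identical outputHash with a different attemptNo is not covered by the rules; the sketch accepts it as a new result.

```typescript
// Hypothetical view of an already stored succeeded result.
interface SucceededResult {
  attemptNo: number;
  outputHash: string;
}

// jobId + attemptNo + outputHash; the ":" separator is an assumption.
function resultIdempotencyKey(jobId: string, attemptNo: number, outputHash: string): string {
  return `${jobId}:${attemptNo}:${outputHash}`;
}

function decideResultStatus(
  jobId: string,
  incoming: SucceededResult,
  existing: SucceededResult[],
): 200 | 201 | 409 {
  const key = resultIdempotencyKey(jobId, incoming.attemptNo, incoming.outputHash);
  const sameKey = existing.some(
    (r) => resultIdempotencyKey(jobId, r.attemptNo, r.outputHash) === key,
  );
  if (sameKey) return 200; // idempotent replay
  const conflicting = existing.some((r) => r.outputHash !== incoming.outputHash);
  if (conflicting) return 409; // RESULT_ALREADY_EXISTS
  return 201; // created
}
```
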
### 4.7 Submit Failure
```
POST /internal/runtime/jobs/{jobId}/fail
```
The Runtime reports why execution failed.

**Request**
```json
{
  "runtimeInstanceId": "runtime-001",
  "errorCode": "MODEL_TIMEOUT",
  "errorMessage": "DeepSeek request timed out after 30s",
  "retryable": true,
  "rawError": "connection timeout"
}
```
**Handling rules**
- `errorCode=JOB_CANCELLED`: the job is immediately marked `cancelled` (regardless of the retryable value)
- `retryable=true` and `retryCount < maxRetryCount`: the job returns to `pending`
- `retryable=false` or maxRetryCount reached: the job becomes `failed` and a notification is triggered
- `rawError` must not contain the apiKey

**Response 200** (acknowledged)
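
The handling rules for Submit Failure amount to a small state transition, sketched here. `nextStatusAfterFailure` and `FailureReport` are hypothetical names; the statuses and the order in which the rules are checked follow the list above.

```typescript
type JobStatusAfterFailure = "cancelled" | "pending" | "failed";

// Subset of the failure request body that drives the transition.
interface FailureReport {
  errorCode: string;
  retryable: boolean;
}

function nextStatusAfterFailure(
  report: FailureReport,
  retryCount: number,
  maxRetryCount: number,
): JobStatusAfterFailure {
  // Cancellation wins regardless of the retryable flag.
  if (report.errorCode === "JOB_CANCELLED") return "cancelled";
  // Retry while attempts remain.
  if (report.retryable && retryCount < maxRetryCount) return "pending";
  // Non-retryable, or retries exhausted: terminal failure (a notification follows).
  return "failed";
}
```
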
### 4.8 Submit Invocation Logs
```
POST /internal/runtime/invocation-logs
```
The Runtime submits model invocation logs (in batches).

**Request**
```json
{
  "logs": [
    {
      "jobId": "job-abc123",
      "provider": "deepseek",
      "model": "deepseek-chat",
      "apiKeyMode": "user_deepseek_key",
      "credentialId": "cred-001",
      "promptName": "learning_state_analysis",
      "promptVersion": "learning_state_v1",
      "outputSchemaVersion": "analysis_output_v1",
      "inputTokens": 1200,
      "outputTokens": 450,
      "totalTokens": 1650,
      "latencyMs": 3200,
      "costEstimate": 3,
      "success": true,
      "retryCount": 0,
      "runtimeInstanceId": "runtime-001",
      "traceId": "trace-xyz",
      "correlationId": "corr-abc"
    }
  ]
}
```
**Constraints**
- The `apiKey` field is not allowed
- Failed invocations must also be logged
- A failed log submission must not crash the main task

**Response 201** (created)
### 4.9 Health (optional)
```
GET /internal/runtime/health
```
The API queries the Runtime's health status. This endpoint is exposed by the Runtime, not by the API.

**Response 200**
```json
{
  "runtimeInstanceId": "runtime-001",
  "status": "ok",
  "version": "0.1.0",
  "startedAt": 1700000000000,
  "lastJobAt": 1700000000123,
  "activeJobs": 2
}
```
## 5. Endpoint Summary
| Method | Path | Caller | Auth |
|--------|------|--------|------|
| POST | `/internal/runtime/jobs/poll` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/jobs/{jobId}/lock` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/jobs/{jobId}/heartbeat` | Runtime | InternalAuthGuard |
| GET | `/internal/runtime/jobs/{jobId}/snapshot` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/model-credentials/resolve` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/jobs/{jobId}/result` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/jobs/{jobId}/fail` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/invocation-logs` | Runtime | InternalAuthGuard |
| GET | `/internal/runtime/health` | API | None (checks the external Runtime) |
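## 6. Acceptance Checklist
- [x] All internal endpoints have DTO definitions (`runtime-internal.dto.ts`)
- [x] All internal endpoints have an auth design (reusing InternalAuthGuard)
- [x] All failure responses include errorCode / message
- [x] Runtime results support a structured payload (validatedOutput)
- [x] Runtime failures support the retryable flag
- [x] The credential resolve endpoint explicitly does not log plaintext keys
- [x] Endpoint and field names can be aligned directly with the Runtime project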