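# Internal Communication Protocol Between the API and the Rust Runtime
## 1. Overview
This document defines the internal HTTP protocol between the main API and the Rust Heavy Runtime.
Communication directions:
- Runtime → API: pull jobs, submit results, submit logs
- API → Runtime: health check (optional)
## 2. Authentication
All `/internal/runtime/*` endpoints are protected by `InternalAuthGuard`.
**Request headers**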
```
x-internal-api-key: <RUNTIME_SERVICE_TOKEN>
x-runtime-instance-id: runtime-001
```
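- `x-internal-api-key`: must match the API's `INTERNAL_API_KEY` environment variable
- `x-runtime-instance-id`: identifies the Runtime instance and is recorded in logs
**Security constraints**
- A regular user JWT cannot access internal endpoints
- The service token cannot access regular user APIs
- The Runtime cannot use internal endpoints to access data that the current job does not need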
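
The header check described above can be sketched as follows. This is a minimal illustration, not the actual `InternalAuthGuard`: the function names are hypothetical, and the constant-time comparison is an assumption that the source does not confirm.

```python
import hmac


def runtime_headers(api_key: str, instance_id: str) -> dict[str, str]:
    """Build the two headers the Runtime sends on every internal request."""
    return {
        "x-internal-api-key": api_key,
        "x-runtime-instance-id": instance_id,
    }


def is_authorized(headers: dict[str, str], expected_key: str) -> bool:
    """Return True when the presented key matches the configured key.

    Assumption: an empty configured key rejects every request.
    """
    presented = headers.get("x-internal-api-key", "")
    if not expected_key:
        return False
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(presented.encode(), expected_key.encode())
```

Rejecting requests when no key is configured keeps a misconfigured deployment from accepting an empty header.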
## 3. Error Response Format
On failure, every internal endpoint returns:
```json
{
  "statusCode": 410,
  "errorCode": "SNAPSHOT_EXPIRED",
  "message": "Snapshot has expired for this job",
  "timestamp": "2026-06-11T10:00:00.000Z"
}
```
### Error codes
| Error code | HTTP status | Description | Retryable |
|------------|-------------|-------------|-----------|
| `JOB_NOT_FOUND` | 404 | Job does not exist | false |
| `JOB_ALREADY_LOCKED` | 409 | Already locked by another Runtime | true |
| `SNAPSHOT_EXPIRED` | 410 | Snapshot has expired | true |
| `SNAPSHOT_NOT_FOUND` | 404 | Snapshot does not exist | false |
| `CREDENTIAL_NOT_FOUND` | 404 | Credential does not exist | false |
| `CREDENTIAL_INVALID` | 422 | Credential is invalid | false |
| `RESULT_ALREADY_EXISTS` | 409 | Duplicate submission | false |
| `RESULT_SCHEMA_UNSUPPORTED` | 422 | Schema version not supported | false |
| `RUNTIME_VERSION_INCOMPATIBLE` | 422 | Runtime version is incompatible | false |
| `INTERNAL_ERROR` | 500 | Internal error | true |
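
A client can encode the table above as a lookup to decide whether a failed call is worth retrying. The sketch below is illustrative only; treating unknown codes as non-retryable is an assumption, not something the protocol specifies.

```python
# errorCode -> (HTTP status, retryable), copied from the error code table.
ERROR_CODES = {
    "JOB_NOT_FOUND": (404, False),
    "JOB_ALREADY_LOCKED": (409, True),
    "SNAPSHOT_EXPIRED": (410, True),
    "SNAPSHOT_NOT_FOUND": (404, False),
    "CREDENTIAL_NOT_FOUND": (404, False),
    "CREDENTIAL_INVALID": (422, False),
    "RESULT_ALREADY_EXISTS": (409, False),
    "RESULT_SCHEMA_UNSUPPORTED": (422, False),
    "RUNTIME_VERSION_INCOMPATIBLE": (422, False),
    "INTERNAL_ERROR": (500, True),
}


def is_retryable(error_code: str) -> bool:
    """Look up the retryable flag; unknown codes are not retried (assumption)."""
    entry = ERROR_CODES.get(error_code)
    return entry[1] if entry is not None else False
```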
## 4. Endpoint Details
### 4.1 Poll Jobs
```
POST /internal/runtime/jobs/poll
```
The Runtime pulls jobs that are waiting to run. The API returns only compatible jobs, filtered by the Runtime's `supportedJobTypes` and `capabilities`.
**Request**
```json
{
  "runtimeInstanceId": "runtime-001",
  "supportedJobTypes": ["learning_state_analysis", "quiz_generation"],
  "limit": 5,
  "capabilities": {
    "supportedSnapshotVersions": ["ai_snapshot_v1"],
    "supportedOutputSchemaVersions": ["analysis_output_v1", "quiz_output_v1"]
  }
}
```
**Response 200**
```json
{
  "jobs": [
    {
      "id": "job-abc123",
      "jobType": "learning_state_analysis",
      "targetType": "material",
      "targetId": "mat-xyz",
      "priority": 0,
      "snapshotId": "snap-001",
      "promptVersion": "learning_state_v1",
      "outputSchemaVersion": "analysis_output_v1"
    }
  ]
}
```
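
The compatibility filter could look roughly like the sketch below. It is an assumption about how the API side might apply the request fields, not its actual implementation. Only `jobType` and `outputSchemaVersion` are checked, because the job payload in the example carries no snapshot version; `filter_compatible` is a hypothetical name.

```python
def filter_compatible(jobs: list[dict], request: dict) -> list[dict]:
    """Keep jobs whose type and output schema the Runtime declares support for."""
    capabilities = request.get("capabilities", {})
    job_types = set(request.get("supportedJobTypes", []))
    schemas = set(capabilities.get("supportedOutputSchemaVersions", []))
    compatible = [
        job
        for job in jobs
        if job["jobType"] in job_types and job["outputSchemaVersion"] in schemas
    ]
    # Honor the requested limit; with no limit, return everything compatible.
    limit = request.get("limit", len(compatible))
    return compatible[:limit]
```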
### 4.2 Lock Job
```
POST /internal/runtime/jobs/{jobId}/lock
```
The Runtime locks a job to obtain the right to execute it.
**Request**
```json
{
  "runtimeInstanceId": "runtime-001"
}
```
**Response 200**
```json
{
  "jobId": "job-abc123",
  "status": "locked",
  "lockUntil": 1700000000123
}
```
### 4.3 Heartbeat
```
POST /internal/runtime/jobs/{jobId}/heartbeat
```
The Runtime extends the lock's validity period.
**Request**
```json
{
  "runtimeInstanceId": "runtime-001"
}
```
**Response 204**: empty body; only `lockUntil` is extended
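
One way for the Runtime to time its heartbeats is sketched below. It assumes `lockUntil` is a Unix timestamp in milliseconds (the example value suggests this, but the source does not state it), and the "heartbeat at half the remaining lifetime" policy is purely illustrative.

```python
def lock_expired(lock_until_ms: int, now_ms: int) -> bool:
    """True once the lock deadline has been reached."""
    return now_ms >= lock_until_ms


def next_heartbeat_delay_ms(
    lock_until_ms: int, now_ms: int, safety_fraction: float = 0.5
) -> int:
    """Wait for a fraction of the remaining lock lifetime before the next heartbeat."""
    remaining = max(0, lock_until_ms - now_ms)
    return int(remaining * safety_fraction)
```

Sending the heartbeat well before `lockUntil` leaves room for one failed attempt and a retry before another Runtime can take the job.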
### 4.4 Get Snapshot
```
GET /internal/runtime/jobs/{jobId}/snapshot
```
The Runtime fetches the LearningAnalysisSnapshot associated with the job.
**Response 200**
```json
{
  "jobId": "job-abc123",
  "snapshotId": "snap-001",
  "snapshotVersion": "ai_snapshot_v1",
  "privacyScope": { "allowDocumentContent": true },
  "userProfile": { "learningGoal": "exam", "currentLevel": "intermediate" },
  "aiSettings": { "allowAiAnalysis": true },
  "learningBehaviorSummary": { "totalActiveSeconds": 3600 },
  "materialProgressSummary": { "progress": 0.6 },
  "behaviorSignals": { "engagementSignal": "high" },
  "scoreSignals": { "masteryRiskScore": 0.3 },
  "constraints": { "dailyAvailableMinutes": 60 },
  "allowedModelFields": ["learningGoal", "currentLevel"]
}
```
**Errors**
- `404 SNAPSHOT_NOT_FOUND`: the snapshot does not exist
- `410 SNAPSHOT_EXPIRED`: the snapshot has expired; the Runtime should submit a retryable failure
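
The sketch below shows how the Runtime might turn a snapshot error into the body of a Submit Failure request (section 4.7). Reusing the API error code as the failure's `errorCode` and the wording of `errorMessage` are assumptions; the retryable flags follow the error code table.

```python
from typing import Optional

# HTTP status -> (errorCode, retryable), per the error code table.
SNAPSHOT_ERRORS = {
    404: ("SNAPSHOT_NOT_FOUND", False),
    410: ("SNAPSHOT_EXPIRED", True),
}


def failure_for_snapshot_error(status: int, instance_id: str) -> Optional[dict]:
    """Build a /fail request body, or None for statuses not listed above."""
    if status not in SNAPSHOT_ERRORS:
        return None
    error_code, retryable = SNAPSHOT_ERRORS[status]
    return {
        "runtimeInstanceId": instance_id,
        "errorCode": error_code,
        "errorMessage": f"Snapshot fetch failed with HTTP {status}",
        "retryable": retryable,
    }
```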
### 4.5 Resolve Credential
```
POST /internal/runtime/model-credentials/resolve
```
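The Runtime obtains the credential used to call the model. In `platform_key` mode the platform key is returned; in `user_deepseek_key` mode the user's key is decrypted and then returned.
**Request**
```json
{
  "jobId": "job-abc123",
  "apiKeyMode": "user_deepseek_key",
  "credentialId": "cred-001",
  "provider": "deepseek"
}
```
**Response 200**
```json
{
  "provider": "deepseek",
  "model": "deepseek-chat",
  "baseUrl": "https://api.deepseek.com/v1",
  "apiKey": "sk-xxxx",
  "apiKeyMode": "user_deepseek_key"
}
```
**Security requirements**
- The plaintext `apiKey` appears only briefly in the response and is never written to logs
- `apiKey` is never returned to iOS / Admin
- A user key must belong to `job.userId`
- For the platform key, the Runtime's environment variable takes precedence; returning it from the API is optional
**Errors**
- `404 CREDENTIAL_NOT_FOUND`
- `422 CREDENTIAL_INVALID`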
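
To honor the "never written to logs" requirement, a caller can mask the key before any part of the response reaches a logger. This is a minimal sketch; `redact_credential` and the `***` placeholder are hypothetical.

```python
def redact_credential(credential: dict) -> dict:
    """Return a copy that is safe to log; the original dict is left untouched."""
    safe = dict(credential)
    if "apiKey" in safe:
        safe["apiKey"] = "***"
    return safe
```

Working on a copy matters here, because the Runtime still needs the original dict with the real key to call the model.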
### 4.6 Submit Result
```
POST /internal/runtime/jobs/{jobId}/result
```
The Runtime submits the result of a successful execution.
**Request**
```json
{
  "runtimeInstanceId": "runtime-001",
  "schemaVersion": "analysis_output_v1",
  "status": "succeeded",
  "rawOutput": { "learningState": "in_progress", "confidence": 0.85 },
  "validatedOutput": { "learningState": "in_progress", "riskLevel": "low" },
  "validationErrors": [],
  "usage": {
    "inputTokens": 1200,
    "outputTokens": 450,
    "totalTokens": 1650,
    "latencyMs": 3200,
    "costEstimate": 3
  },
  "attemptNo": 0,
  "outputHash": "sha256-abc123"
}
```
**Idempotency rules**
- `resultIdempotencyKey = jobId + attemptNo + outputHash`
- Resubmitting with the same key returns 200 (idempotent)
- If a succeeded result already exists and the `outputHash` differs, the API returns 409 `RESULT_ALREADY_EXISTS`
**Response 201**: created
**Errors**
- `409 RESULT_ALREADY_EXISTS`
- `422 RESULT_SCHEMA_UNSUPPORTED`
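
The idempotency rules can be sketched as a small decision function. The `:` separator in the key, the shape of the stored records, and the function names are assumptions made for illustration; the source only states that the key combines the three fields.

```python
def idempotency_key(job_id: str, attempt_no: int, output_hash: str) -> str:
    """Combine the three fields into one key (the separator is an assumption)."""
    return f"{job_id}:{attempt_no}:{output_hash}"


def submit_status(
    job_id: str, existing_results: list[dict], attempt_no: int, output_hash: str
) -> int:
    """Return the HTTP status a submission would receive.

    existing_results holds the results already stored for this job, each with
    "attemptNo", "outputHash", and "status".
    """
    new_key = idempotency_key(job_id, attempt_no, output_hash)
    for result in existing_results:
        stored_key = idempotency_key(job_id, result["attemptNo"], result["outputHash"])
        if stored_key == new_key:
            return 200  # same key: idempotent replay
    for result in existing_results:
        if result["status"] == "succeeded" and result["outputHash"] != output_hash:
            return 409  # RESULT_ALREADY_EXISTS
    # Cases the rules do not cover fall through to "created" (assumption).
    return 201
```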
### 4.7 Submit Failure
```
POST /internal/runtime/jobs/{jobId}/fail
```
The Runtime submits the reason an execution failed.
**Request**
```json
{
  "runtimeInstanceId": "runtime-001",
  "errorCode": "MODEL_TIMEOUT",
  "errorMessage": "DeepSeek request timed out after 30s",
  "retryable": true,
  "rawError": "connection timeout"
}
```
**Handling rules**
- `retryable=true` and `retryCount < maxRetryCount`: the job returns to `pending`
- `retryable=false`, or `maxRetryCount` has been reached: the job becomes `failed`
- `rawError` must not contain the apiKey
**Response 200**: acknowledged
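
The handling rules reduce to a single state transition, plus a scrub step the Runtime can apply before it sends `rawError`. Both helpers are hypothetical sketches; the `[REDACTED]` marker is an assumption.

```python
def next_job_status(retryable: bool, retry_count: int, max_retry_count: int) -> str:
    """Apply the handling rules: retry while allowed, otherwise fail for good."""
    if retryable and retry_count < max_retry_count:
        return "pending"
    return "failed"


def scrub_raw_error(raw_error: str, api_key: str) -> str:
    """Remove any occurrence of the key before rawError leaves the Runtime."""
    if not api_key:
        return raw_error
    return raw_error.replace(api_key, "[REDACTED]")
```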
### 4.8 Submit Invocation Logs
```
POST /internal/runtime/invocation-logs
```
The Runtime submits model invocation logs (in batches).
**Request**
```json
{
  "logs": [
    {
      "jobId": "job-abc123",
      "provider": "deepseek",
      "model": "deepseek-chat",
      "apiKeyMode": "user_deepseek_key",
      "credentialId": "cred-001",
      "promptName": "learning_state_analysis",
      "promptVersion": "learning_state_v1",
      "outputSchemaVersion": "analysis_output_v1",
      "inputTokens": 1200,
      "outputTokens": 450,
      "totalTokens": 1650,
      "latencyMs": 3200,
      "costEstimate": 3,
      "success": true,
      "retryCount": 0,
      "runtimeInstanceId": "runtime-001",
      "traceId": "trace-xyz",
      "correlationId": "corr-abc"
    }
  ]
}
```
**Constraints**
- The `apiKey` field is not allowed
- Failed invocations must also be logged
- A failed log submission must not crash the main task
**Response 201**: created
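
On the Runtime side, the first and third constraints might be handled as in the sketch below. The `send` callable stands in for whatever HTTP client posts to `/internal/runtime/invocation-logs`; it and both function names are hypothetical. Dropping a stray `apiKey` field instead of rejecting the entry is an assumption.

```python
from typing import Callable


def prepare_logs(logs: list[dict]) -> dict:
    """Build the request body, dropping any stray apiKey field from each entry."""
    cleaned = [{k: v for k, v in log.items() if k != "apiKey"} for log in logs]
    return {"logs": cleaned}


def submit_logs_safely(logs: list[dict], send: Callable[[dict], None]) -> bool:
    """Submit logs; report failure as False instead of raising into the main task."""
    try:
        send(prepare_logs(logs))
        return True
    except Exception:
        # A log submission error must never abort job execution.
        return False
```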
### 4.9 Health (optional)
```
GET /internal/runtime/health
```
The API queries the Runtime's health status. This endpoint is exposed by the Runtime, not by the API.
**Response 200**
```json
{
  "runtimeInstanceId": "runtime-001",
  "status": "ok",
  "version": "0.1.0",
  "startedAt": 1700000000000,
  "lastJobAt": 1700000000123,
  "activeJobs": 2
}
```
## 5. Endpoint Summary
| Method | Path | Caller | Auth |
|--------|------|--------|------|
| POST | `/internal/runtime/jobs/poll` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/jobs/{jobId}/lock` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/jobs/{jobId}/heartbeat` | Runtime | InternalAuthGuard |
| GET | `/internal/runtime/jobs/{jobId}/snapshot` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/model-credentials/resolve` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/jobs/{jobId}/result` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/jobs/{jobId}/fail` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/invocation-logs` | Runtime | InternalAuthGuard |
| GET | `/internal/runtime/health` | API | None (checks the external Runtime) |
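## 6. Acceptance Checklist
- [x] Every internal endpoint has a DTO definition (`runtime-internal.dto.ts`)
- [x] Every internal endpoint has an auth design (reuses `InternalAuthGuard`)
- [x] Every failure response includes errorCode / message
- [x] Runtime result supports a structured payload (`validatedOutput`)
- [x] Runtime failure supports a `retryable` flag
- [x] The credential resolve endpoint explicitly does not log the plaintext key
- [x] Endpoint and field names can be aligned directly with the Runtime project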