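# Internal Communication Protocol Between the API and the Rust Runtime
## 1. Overview
This document defines the internal HTTP communication protocol between the main API and the Rust Heavy Runtime.
Communication directions:
- Runtime → API: poll jobs, submit results, submit logs
- API → Runtime: health check (optional)
## 2. Authentication
All `/internal/runtime/*` endpoints exposed by the API use `InternalAuthGuard`.
**Request headers**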
```
x-internal-api-key: <RUNTIME_SERVICE_TOKEN>
x-runtime-instance-id: runtime-001
```
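- `x-internal-api-key`: must match the API's `INTERNAL_API_KEY` environment variable
- `x-runtime-instance-id`: identifier of the Runtime instance, recorded in logs
**Security constraints**
- Regular user JWTs cannot access internal endpoints
- The service token cannot access regular user APIs
- The Runtime cannot use internal endpoints to access data that the current job does not need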
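The header rules above can be sketched as a small check. This is only an illustration, not the real `InternalAuthGuard`, whose implementation is not shown in this document. The names `HeaderMap`, `sameSecret`, and `checkInternalAuth` are hypothetical, and rejecting a request without `x-runtime-instance-id` is an assumption.

```typescript
// Hypothetical sketch of the header check; not the real InternalAuthGuard.
type HeaderMap = Record<string, string | undefined>;

interface InternalAuthResult {
  ok: boolean;
  runtimeInstanceId?: string;
  reason?: string;
}

// Compare two strings without returning early on the first mismatching character.
function sameSecret(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

function checkInternalAuth(headers: HeaderMap, expectedKey: string): InternalAuthResult {
  const key = headers["x-internal-api-key"];
  if (key === undefined || !sameSecret(key, expectedKey)) {
    return { ok: false, reason: "invalid or missing x-internal-api-key" };
  }
  // Assumption: a missing instance id is rejected; the document only says it is logged.
  const instanceId = headers["x-runtime-instance-id"];
  if (instanceId === undefined || instanceId === "") {
    return { ok: false, reason: "missing x-runtime-instance-id" };
  }
  return { ok: true, runtimeInstanceId: instanceId };
}
```

In this sketch `expectedKey` would be read from the API's `INTERNAL_API_KEY` environment variable by the caller.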
## 3. Error Response Format
When an internal endpoint fails, it returns:
```json
{
"statusCode": 410,
"errorCode": "SNAPSHOT_EXPIRED",
"message": "Snapshot has expired for this job",
"timestamp": "2026-06-11T10:00:00.000Z"
}
```
### Error codes
| Error code | HTTP status | Description | Retryable |
|--------|------|------|-----------|
| `JOB_NOT_FOUND` | 404 | Job does not exist | false |
| `JOB_ALREADY_LOCKED` | 409 | Already locked by another Runtime | true |
| `SNAPSHOT_EXPIRED` | 410 | Snapshot has expired | true |
| `SNAPSHOT_NOT_FOUND` | 404 | Snapshot does not exist | false |
| `CREDENTIAL_NOT_FOUND` | 404 | Credential does not exist | false |
| `CREDENTIAL_INVALID` | 422 | Credential is invalid | false |
| `RESULT_ALREADY_EXISTS` | 409 | Duplicate submission | false |
| `RESULT_SCHEMA_UNSUPPORTED` | 422 | Schema version is not supported | false |
| `RUNTIME_VERSION_INCOMPATIBLE` | 422 | Runtime version is incompatible | false |
| `INTERNAL_ERROR` | 500 | Internal error | true |
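On the Runtime side, the error-code table could be mirrored as a lookup that decides whether a failed call is worth retrying. The sketch below is illustrative only: `ERROR_TABLE`, `InternalErrorBody`, and `isRetryableError` are hypothetical names, and treating unknown error codes as non-retryable is an assumption.

```typescript
interface ErrorCodeInfo {
  httpStatus: number;
  retryable: boolean;
}

// Mirrors the error-code table in this section.
const ERROR_TABLE: Record<string, ErrorCodeInfo> = {
  JOB_NOT_FOUND: { httpStatus: 404, retryable: false },
  JOB_ALREADY_LOCKED: { httpStatus: 409, retryable: true },
  SNAPSHOT_EXPIRED: { httpStatus: 410, retryable: true },
  SNAPSHOT_NOT_FOUND: { httpStatus: 404, retryable: false },
  CREDENTIAL_NOT_FOUND: { httpStatus: 404, retryable: false },
  CREDENTIAL_INVALID: { httpStatus: 422, retryable: false },
  RESULT_ALREADY_EXISTS: { httpStatus: 409, retryable: false },
  RESULT_SCHEMA_UNSUPPORTED: { httpStatus: 422, retryable: false },
  RUNTIME_VERSION_INCOMPATIBLE: { httpStatus: 422, retryable: false },
  INTERNAL_ERROR: { httpStatus: 500, retryable: true },
};

// Shape of the error body shown in section 3.
interface InternalErrorBody {
  statusCode: number;
  errorCode: string;
  message: string;
  timestamp: string;
}

function isRetryableError(body: InternalErrorBody): boolean {
  const info: ErrorCodeInfo | undefined = ERROR_TABLE[body.errorCode];
  // Assumption: codes that are not in the table are treated as non-retryable.
  return info !== undefined ? info.retryable : false;
}
```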
## 4. Endpoint Details
### 4.1 Poll Jobs
```
POST /internal/runtime/jobs/poll
```
The Runtime polls for pending jobs. The API returns only jobs that are compatible with the Runtime's `supportedJobTypes` and `capabilities`.
**Request**
```json
{
"runtimeInstanceId": "runtime-001",
"supportedJobTypes": ["learning_state_analysis", "quiz_generation"],
"limit": 5,
"capabilities": {
"supportedSnapshotVersions": ["ai_snapshot_v1"],
"supportedOutputSchemaVersions": ["analysis_output_v1", "quiz_output_v1"]
}
}
```
**Response 200**
```json
{
"jobs": [
{
"id": "job-abc123",
"jobType": "learning_state_analysis",
"targetType": "material",
"targetId": "mat-xyz",
"priority": 0,
"snapshotId": "snap-001",
"promptVersion": "learning_state_v1",
"outputSchemaVersion": "analysis_output_v1"
}
]
}
```
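The compatibility filtering described for this endpoint might look roughly like the sketch below. It is not the API's actual query: `PendingJob` and `selectCompatibleJobs` are hypothetical, and the `snapshotVersion` field on a pending job is an assumption (the poll response above does not return it).

```typescript
interface PollRequest {
  runtimeInstanceId: string;
  supportedJobTypes: string[];
  limit: number;
  capabilities: {
    supportedSnapshotVersions: string[];
    supportedOutputSchemaVersions: string[];
  };
}

// Hypothetical server-side view of a pending job. `snapshotVersion` is assumed
// to be known to the API even though the poll response does not include it.
interface PendingJob {
  id: string;
  jobType: string;
  snapshotVersion: string;
  outputSchemaVersion: string;
}

// Keep only jobs the polling Runtime can execute, capped at `limit`.
function selectCompatibleJobs(pending: PendingJob[], req: PollRequest): PendingJob[] {
  const compatible = pending.filter(
    (job) =>
      req.supportedJobTypes.indexOf(job.jobType) !== -1 &&
      req.capabilities.supportedSnapshotVersions.indexOf(job.snapshotVersion) !== -1 &&
      req.capabilities.supportedOutputSchemaVersions.indexOf(job.outputSchemaVersion) !== -1,
  );
  return compatible.slice(0, Math.max(0, req.limit));
}
```

The sketch preserves input order and does not sort by `priority`, since this document does not define how priority values are ordered.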
### 4.2 Lock Job
```
POST /internal/runtime/jobs/{jobId}/lock
```
The Runtime locks a job to obtain the right to execute it.
**Request**
```json
{
"runtimeInstanceId": "runtime-001"
}
```
**Response 200**
```json
{
"jobId": "job-abc123",
"status": "locked",
"lockUntil": 1700000000123
}
```
### 4.3 Heartbeat
```
POST /internal/runtime/jobs/{jobId}/heartbeat
```
The Runtime extends the validity period of the lock.
**Request**
```json
{
"runtimeInstanceId": "runtime-001"
}
```
**Response 204**: empty body; only `lockUntil` is extended
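The interaction between Lock (4.2) and Heartbeat (4.3) can be illustrated with an in-memory sketch. Everything here is hypothetical: the document does not state the lock duration (`LOCK_TTL_MS` is a placeholder), nor what happens when a lock expires or when a non-holder sends a heartbeat, so those behaviours are assumptions.

```typescript
// Assumption: the lock duration is not given in this document; 60 s is a placeholder.
const LOCK_TTL_MS = 60000;

interface JobLockState {
  lockedBy: string | null;
  lockUntil: number; // epoch milliseconds, as in the Lock response
}

interface LockOutcome {
  ok: boolean;
  lockUntil?: number;
  errorCode?: "JOB_ALREADY_LOCKED";
}

// Grant the lock unless another Runtime holds an unexpired lock.
function tryLockJob(state: JobLockState, runtimeInstanceId: string, nowMs: number): LockOutcome {
  const heldByOther =
    state.lockedBy !== null && state.lockedBy !== runtimeInstanceId && state.lockUntil > nowMs;
  if (heldByOther) return { ok: false, errorCode: "JOB_ALREADY_LOCKED" };
  state.lockedBy = runtimeInstanceId;
  state.lockUntil = nowMs + LOCK_TTL_MS;
  return { ok: true, lockUntil: state.lockUntil };
}

// Heartbeat extends `lockUntil` only for the current, unexpired holder (assumption).
function heartbeatJob(state: JobLockState, runtimeInstanceId: string, nowMs: number): boolean {
  if (state.lockedBy !== runtimeInstanceId || state.lockUntil <= nowMs) return false;
  state.lockUntil = nowMs + LOCK_TTL_MS;
  return true;
}
```

A real implementation would need an atomic update in the job store rather than a mutable in-memory object; the sketch only shows the state transitions.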
### 4.4 Get Snapshot
```
GET /internal/runtime/jobs/{jobId}/snapshot
```
The Runtime fetches the LearningAnalysisSnapshot associated with the job.
**Response 200**
```json
{
"jobId": "job-abc123",
"snapshotId": "snap-001",
"snapshotVersion": "ai_snapshot_v1",
"privacyScope": { "allowDocumentContent": true },
"userProfile": { "learningGoal": "exam", "currentLevel": "intermediate" },
"aiSettings": { "allowAiAnalysis": true },
"learningBehaviorSummary": { "totalActiveSeconds": 3600 },
"materialProgressSummary": { "progress": 0.6 },
"behaviorSignals": { "engagementSignal": "high" },
"scoreSignals": { "masteryRiskScore": 0.3 },
"constraints": { "dailyAvailableMinutes": 60 },
"allowedModelFields": ["learningGoal", "currentLevel"]
}
```
**Errors**
- `404 SNAPSHOT_NOT_FOUND`: the snapshot does not exist
- `410 SNAPSHOT_EXPIRED`: the snapshot has expired; the Runtime should submit a retryable failure
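One way the Runtime might turn a failed snapshot fetch into a Submit Failure body is sketched below. `FailRequestBody` follows the request shape of `POST /internal/runtime/jobs/{jobId}/fail` in this document; `snapshotErrorToFailBody` is a hypothetical helper, and marking every snapshot error other than `SNAPSHOT_EXPIRED` as non-retryable is an assumption.

```typescript
// Request body shape of the Submit Failure endpoint.
interface FailRequestBody {
  runtimeInstanceId: string;
  errorCode: string;
  errorMessage: string;
  retryable: boolean;
  rawError: string;
}

// Build a Submit Failure body from a failed Get Snapshot call.
function snapshotErrorToFailBody(
  runtimeInstanceId: string,
  httpStatus: number,
  errorCode: string,
  message: string,
): FailRequestBody {
  // An expired snapshot (410) is reported as a retryable failure.
  // Assumption: all other snapshot errors are reported as non-retryable.
  const retryable = httpStatus === 410 && errorCode === "SNAPSHOT_EXPIRED";
  return {
    runtimeInstanceId,
    errorCode,
    errorMessage: message,
    retryable,
    rawError: `HTTP ${httpStatus} ${errorCode}`,
  };
}
```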
### 4.5 Resolve Credential
```
POST /internal/runtime/model-credentials/resolve
```
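The Runtime obtains the credential used to call the model. In `platform_key` mode the API returns the platform key; in `user_deepseek_key` mode it decrypts the user's key and returns it.
**Request**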
```json
{
"jobId": "job-abc123",
"apiKeyMode": "user_deepseek_key",
"credentialId": "cred-001",
"provider": "deepseek"
}
```
**Response 200**
```json
{
"provider": "deepseek",
"model": "deepseek-chat",
"baseUrl": "https://api.deepseek.com/v1",
"apiKey": "sk-xxxx",
"apiKeyMode": "user_deepseek_key"
}
```
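**Security requirements**
- The plaintext `apiKey` appears only briefly in the response and is never written to logs
- `apiKey` is never returned to iOS / Admin
- A user key must belong to `job.userId`
- For the platform key, the Runtime's own environment variable takes precedence; returning it from the API is optional
**Errors**
- `404 CREDENTIAL_NOT_FOUND`
- `422 CREDENTIAL_INVALID`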
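To keep the plaintext key out of logs, a caller could log a redacted copy of the response instead of the response itself. This is a minimal sketch; `ResolvedModelCredential` mirrors the response fields above, while `redactForLog` and the `[REDACTED]` marker are hypothetical.

```typescript
// Mirrors the fields of the Resolve Credential response.
interface ResolvedModelCredential {
  provider: string;
  model: string;
  baseUrl: string;
  apiKey: string;
  apiKeyMode: string;
}

// Return a copy that is safe to log: every field except the key is kept as-is.
function redactForLog(cred: ResolvedModelCredential): Record<string, string> {
  return {
    provider: cred.provider,
    model: cred.model,
    baseUrl: cred.baseUrl,
    apiKeyMode: cred.apiKeyMode,
    apiKey: "[REDACTED]",
  };
}
```

Building the copy field by field, rather than deleting `apiKey` from a clone, means a new sensitive field added later is not logged by accident.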
### 4.6 Submit Result
```
POST /internal/runtime/jobs/{jobId}/result
```
The Runtime submits the result of a successful execution.
**Request**
```json
{
"runtimeInstanceId": "runtime-001",
"schemaVersion": "analysis_output_v1",
"status": "succeeded",
"rawOutput": { "learningState": "in_progress", "confidence": 0.85 },
"validatedOutput": { "learningState": "in_progress", "riskLevel": "low" },
"validationErrors": [],
"usage": {
"inputTokens": 1200,
"outputTokens": 450,
"totalTokens": 1650,
"latencyMs": 3200,
"costEstimate": 3
},
"attemptNo": 0,
"outputHash": "sha256-abc123"
}
```
**Idempotency rules**
- `resultIdempotencyKey = jobId + attemptNo + outputHash`
- Resubmitting with the same key returns 200 (idempotent)
- If a succeeded result already exists and the `outputHash` differs, the API returns 409 `RESULT_ALREADY_EXISTS`
**Response 201**: created
**Errors**
- `409 RESULT_ALREADY_EXISTS`
- `422 RESULT_SCHEMA_UNSUPPORTED`
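The idempotency rules for result submission could be expressed as a small decision function. This is a sketch under assumptions: the document only says the key combines `jobId`, `attemptNo`, and `outputHash`, so the `:` separator is invented here, and `StoredResult` and `decideResultSubmission` are hypothetical names.

```typescript
interface StoredResult {
  attemptNo: number;
  outputHash: string;
  status: string;
}

interface SubmitDecision {
  httpStatus: 200 | 201 | 409;
  errorCode?: "RESULT_ALREADY_EXISTS";
}

function resultIdempotencyKey(jobId: string, attemptNo: number, outputHash: string): string {
  // Assumption: ":" is used as a separator; the document does not specify one.
  return [jobId, String(attemptNo), outputHash].join(":");
}

function decideResultSubmission(
  jobId: string,
  existing: StoredResult[],
  incoming: { attemptNo: number; outputHash: string },
): SubmitDecision {
  const incomingKey = resultIdempotencyKey(jobId, incoming.attemptNo, incoming.outputHash);
  // Rule 1: the same key was already stored, so this is an idempotent replay.
  for (const stored of existing) {
    if (resultIdempotencyKey(jobId, stored.attemptNo, stored.outputHash) === incomingKey) {
      return { httpStatus: 200 };
    }
  }
  // Rule 2: a succeeded result with a different outputHash already exists.
  for (const stored of existing) {
    if (stored.status === "succeeded" && stored.outputHash !== incoming.outputHash) {
      return { httpStatus: 409, errorCode: "RESULT_ALREADY_EXISTS" };
    }
  }
  // Otherwise the result is new and gets created.
  return { httpStatus: 201 };
}
```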
### 4.7 Submit Failure
```
POST /internal/runtime/jobs/{jobId}/fail
```
The Runtime submits the reason for a failed execution.
**Request**
```json
{
"runtimeInstanceId": "runtime-001",
"errorCode": "MODEL_TIMEOUT",
"errorMessage": "DeepSeek request timed out after 30s",
"retryable": true,
"rawError": "connection timeout"
}
```
**Handling rules**
- `retryable=true` and `retryCount < maxRetryCount`: the job returns to `pending`
- `retryable=false`, or `maxRetryCount` has been reached: the job becomes `failed`
- `rawError` must not contain the apiKey
**Response 200**: acknowledged
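The handling rules for a submitted failure reduce to a small state transition, sketched below. `statusAfterFailure` and `scrubRawError` are hypothetical helpers; in particular, scrubbing by string replacement is just one possible way for the Runtime to keep the key out of `rawError`, not something this document prescribes.

```typescript
type JobStatusAfterFail = "pending" | "failed";

// A retryable failure goes back to `pending` only while retries remain.
function statusAfterFailure(
  retryable: boolean,
  retryCount: number,
  maxRetryCount: number,
): JobStatusAfterFail {
  return retryable && retryCount < maxRetryCount ? "pending" : "failed";
}

// Replace every occurrence of the key in an error string before it is sent.
function scrubRawError(rawError: string, apiKey: string): string {
  if (apiKey === "") return rawError; // nothing to scrub
  return rawError.split(apiKey).join("[REDACTED]");
}
```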
### 4.8 Submit Invocation Logs
```
POST /internal/runtime/invocation-logs
```
The Runtime submits model invocation logs in batches.
**Request**
```json
{
"logs": [
{
"jobId": "job-abc123",
"provider": "deepseek",
"model": "deepseek-chat",
"apiKeyMode": "user_deepseek_key",
"credentialId": "cred-001",
"promptName": "learning_state_analysis",
"promptVersion": "learning_state_v1",
"outputSchemaVersion": "analysis_output_v1",
"inputTokens": 1200,
"outputTokens": 450,
"totalTokens": 1650,
"latencyMs": 3200,
"costEstimate": 3,
"success": true,
"retryCount": 0,
"runtimeInstanceId": "runtime-001",
"traceId": "trace-xyz",
"correlationId": "corr-abc"
}
]
}
```
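**Constraints**
- The `apiKey` field is not allowed
- Logs must also be submitted for failed calls
- A failed log submission must not crash the main task
**Response 201**: created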
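The ban on the `apiKey` field could be enforced with a check before a batch is sent or accepted. The sketch below is illustrative: `findLogsWithApiKey` is a hypothetical name, and it inspects only top-level fields of each log entry.

```typescript
// Return the indexes of log entries that carry a forbidden top-level `apiKey` field.
function findLogsWithApiKey(logs: Array<Record<string, unknown>>): number[] {
  const offending: number[] = [];
  logs.forEach((log, index) => {
    if (Object.prototype.hasOwnProperty.call(log, "apiKey")) {
      offending.push(index);
    }
  });
  return offending;
}
```

An empty result means the batch passes this particular check; whether the API rejects the whole batch or only the offending entries is not specified in this document.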
### 4.9 Health (optional)
```
GET /internal/runtime/health
```
The API queries the Runtime's health status. This endpoint is exposed by the Runtime, not by the API.
**Response 200**
```json
{
"runtimeInstanceId": "runtime-001",
"status": "ok",
"version": "0.1.0",
"startedAt": 1700000000000,
"lastJobAt": 1700000000123,
"activeJobs": 2
}
```
## 5. Endpoint Summary
| Method | Path | Caller | Auth |
|------|------|--------|------|
| POST | `/internal/runtime/jobs/poll` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/jobs/{jobId}/lock` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/jobs/{jobId}/heartbeat` | Runtime | InternalAuthGuard |
| GET | `/internal/runtime/jobs/{jobId}/snapshot` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/model-credentials/resolve` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/jobs/{jobId}/result` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/jobs/{jobId}/fail` | Runtime | InternalAuthGuard |
| POST | `/internal/runtime/invocation-logs` | Runtime | InternalAuthGuard |
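| GET | `/internal/runtime/health` | API | None (checks the external Runtime) |
## 6. Acceptance Checklist
- [x] Every internal endpoint has a DTO definition (`runtime-internal.dto.ts`)
- [x] Every internal endpoint has an authentication design (reuses InternalAuthGuard)
- [x] Every failure response includes errorCode / message
- [x] Runtime result supports a structured payload (validatedOutput)
- [x] Runtime failure supports the retryable flag
- [x] The credential resolve endpoint explicitly does not log plaintext keys
- [x] Endpoint names and field names can be aligned directly with the Runtime project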