Troubleshooting
This document covers diagnosis and fixes for common problems.
Quick diagnosis
Service status checks
# Docker Compose
docker-compose ps
docker-compose logs --tail=50
# Kubernetes
kubectl get pods -n unifiles
kubectl describe pod <pod-name> -n unifiles
Health checks
Viewing logs
# Docker
docker-compose logs -f api
docker-compose logs -f worker-upload
# Kubernetes
kubectl logs -n unifiles -l app=unifiles-api -f
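Common problems
Service fails to start
Symptoms
Diagnosis
# View detailed logs
docker-compose logs api --tail=100
# Validate and print the resolved Compose configuration
docker-compose config
# Check environment variables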
docker-compose exec api env | grep -E "PG_|REDIS_|MINIO_"
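Solutions
Configuration errors:
# Show the effective (non-comment, non-blank) lines of .env
grep -Ev '^(#|$)' .env
# Check required variables (bash; assumes .env holds plain KEY=value lines)
set -a; . ./.env; set +a
required_vars="PG_HOST PG_PASSWORD REDIS_HOST MINIO_ENDPOINT SECURITY_SECRET_KEY"
for var in $required_vars; do
  if [ -z "${!var}" ]; then
    echo "Missing: $var"
  fi
done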
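The same check can be done without loading `.env` into the shell. The following is a minimal Python sketch, assuming `.env` contains simple `KEY=value` lines; the helper names `parse_env` and `missing_vars` are illustrative and not part of Unifiles.

```python
# Required variable names, taken from the shell check above.
REQUIRED = [
    "PG_HOST",
    "PG_PASSWORD",
    "REDIS_HOST",
    "MINIO_ENDPOINT",
    "SECURITY_SECRET_KEY",
]


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # Drop surrounding quotes, if any.
        values[key] = value.strip().strip("'\"")
    return values


def missing_vars(text, required=REQUIRED):
    """Return the required names that are absent or empty."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]
```

To use it, read the file and print the result, for example `missing_vars(open(".env").read())`. An empty list means every required variable has a value.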
Dependent services not ready:
# Make sure the dependent services are started
docker-compose up -d postgres redis minio
# Check that the services are ready
docker-compose exec postgres pg_isready
docker-compose exec redis redis-cli ping
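The commands above check readiness once. When a startup script has to wait, a polling loop can help. The sketch below is one possible approach, not part of Unifiles: it only tests that a TCP port accepts connections, which is a weaker signal than `pg_isready` or `redis-cli ping`, and it assumes the service ports are reachable from where the script runs.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5432)` waits up to 30 seconds for PostgreSQL on its default port, assuming that port is published to the host.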
Database connection failures
Symptoms
Diagnosis
# Check PostgreSQL status
docker-compose exec postgres pg_isready
# Test the connection
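docker-compose exec api python -c "
import asyncio
import asyncpg

async def main():
    # Open a connection, then close it cleanly
    conn = await asyncpg.connect(
        host='postgres',
        database='unifiles',
        user='unifiles',
        password='your_password',
    )
    await conn.close()
    print('Connection OK')

asyncio.run(main())
"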
# Check the network
docker network inspect unifiles_default
Solutions
Authentication failures:
# List users (roles)
docker-compose exec postgres psql -U postgres -c "\du"
# Reset the password
docker-compose exec postgres psql -U postgres -c "
ALTER USER unifiles WITH PASSWORD 'new_password';
"
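Connection exhaustion:
-- Count connections
SELECT count(*) FROM pg_stat_activity WHERE datname = 'unifiles';
-- Terminate connections that have been idle for more than 10 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'unifiles'
  AND state = 'idle'
  AND state_change < NOW() - INTERVAL '10 minutes';
pg_hba.conf configuration:
# Inspect the authentication rules
docker-compose exec postgres cat /var/lib/postgresql/data/pg_hba.conf
# Add a rule for the allowed connection (narrow 0.0.0.0/0 to your own network where possible)
# host unifiles unifiles 0.0.0.0/0 md5
# Reload the configuration after editing pg_hba.conf
docker-compose exec postgres psql -U postgres -c "SELECT pg_reload_conf();"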
Redis connection problems
Symptoms
Diagnosis
# Check Redis status
docker-compose exec redis redis-cli ping
# Test with a password
docker-compose exec redis redis-cli -a "$REDIS_PASSWORD" ping
# Check memory usage
docker-compose exec redis redis-cli -a "$REDIS_PASSWORD" INFO memory
Solutions
Authentication problems:
# Check the configured password
docker-compose exec redis redis-cli CONFIG GET requirepass
# Set a password (CONFIG SET applies at runtime only; also update the Redis configuration so it survives a restart)
docker-compose exec redis redis-cli CONFIG SET requirepass "new_password"
Out of memory:
# Check the eviction policy
docker-compose exec redis redis-cli -a "$REDIS_PASSWORD" CONFIG GET maxmemory-policy
# Set a memory limit and an eviction policy
docker-compose exec redis redis-cli -a "$REDIS_PASSWORD" CONFIG SET maxmemory 2gb
docker-compose exec redis redis-cli -a "$REDIS_PASSWORD" CONFIG SET maxmemory-policy allkeys-lru
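To judge how close Redis is to its limit, the `INFO memory` output can be reduced to a single ratio. The following is a small sketch, assuming the text comes from `redis-cli INFO memory`. It relies on the `used_memory` and `maxmemory` fields that Redis reports; the function names are illustrative.

```python
def parse_redis_info(text):
    """Parse `redis-cli INFO` output: key:value lines with # section headers."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def memory_usage_ratio(info):
    """Return used_memory / maxmemory, or None when maxmemory is 0 (no limit)."""
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", "0"))
    return used / limit if limit > 0 else None
```

A ratio near 1.0 with a `noeviction` policy suggests that writes will start to fail. `None` means no limit is configured, so memory is bounded only by the host or container.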
MinIO/storage problems
Symptoms
Diagnosis
# Check MinIO status
docker-compose exec minio mc admin info local
# Test the connection
docker-compose exec api python -c "
from minio import Minio
client = Minio('minio:9000', 'unifiles', 'your_secret', secure=False)
print(client.list_buckets())
"
# List buckets
docker-compose exec minio mc ls local
Solutions
Bucket does not exist:
# Create the buckets
docker-compose exec minio mc mb local/unifiles-raw
docker-compose exec minio mc mb local/unifiles-processed
Permission problems:
# List policies
docker-compose exec minio mc admin policy list local
# Reset the user's permissions
docker-compose exec minio mc admin user add local unifiles your_secret
docker-compose exec minio mc admin policy attach local readwrite --user unifiles
Disk space:
# Check disk usage
docker-compose exec minio mc admin info local --json | jq '.info.backend'
# Clean up temporary files
docker-compose exec minio mc rm --recursive --force local/unifiles-cache/
File upload failures
Symptoms
Diagnosis
# Check upload logs
docker-compose logs api | grep -i upload
# Test an upload
curl -v -X POST http://localhost:8088/api/v1/files \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@test.pdf"
Solutions
File too large:
File type restrictions:
Timeouts:
Content extraction failures
Symptoms
Diagnosis
# Check the extraction worker
docker-compose logs worker-extraction | tail -100
# Check the queue length
docker-compose exec redis redis-cli -a "$REDIS_PASSWORD" LLEN extraction_queue
# Check the file status
curl http://localhost:8088/api/v1/files/$FILE_ID \
  -H "Authorization: Bearer $API_KEY"
Solutions
OCR timeouts:
# Increase the timeout (environment variable, for example in .env)
OCR_TIMEOUT_SECONDS=600
# Retry the extraction
curl -X POST http://localhost:8088/api/v1/extractions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"file_id": "xxx", "mode": "simple"}'
Out of memory:
Empty search results
Symptoms
Diagnosis
-- Check the vector index status
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'chunks';
-- Check that data exists
SELECT count(*) FROM chunks WHERE knowledge_base_id = 'xxx';
-- Check the vector dimension
SELECT embedding::text FROM chunks LIMIT 1;
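The last query prints a whole vector as text, so the dimension still has to be counted. The helper below is a rough sketch of that step, assuming the pgvector text form `[x1,x2,...]`. The function names and any expected dimension you pass in are illustrative, since the expected value depends on your embedding model. pgvector's `vector_dims(embedding)` function reports the same number on the server side.

```python
def vector_dimension(text):
    """Count the components in a pgvector text literal such as '[0.1,0.2,0.3]'."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a literal of the form [x1,x2,...]")
    inner = body[1:-1].strip()
    return len(inner.split(",")) if inner else 0


def dimension_matches(text, expected_dim):
    """Compare the stored dimension with the one the embedding model produces."""
    return vector_dimension(text) == expected_dim
```

If `dimension_matches` returns `False`, the stored embeddings were probably produced by a different model than the one used for queries.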
Solutions
Index not created:
-- Create an HNSW index (CONCURRENTLY cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_chunks_embedding ON chunks
  USING hnsw (embedding vector_cosine_ops);
Dimension mismatch:
Data not indexed:
# Trigger a reindex
curl -X POST http://localhost:8088/api/v1/knowledge-bases/$KB_ID/reindex \
  -H "Authorization: Bearer $API_KEY"
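Performance problems
Symptoms
Diagnosis
# Check resource usage
docker stats
# Check slow queries (requires the pg_stat_statements extension)
docker-compose exec postgres psql -U unifiles -d unifiles -c "
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
"
# Check the connection pool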
curl http://localhost:8088/metrics | grep pool
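The `grep pool` filter returns raw lines. To compare values, such as connections in use against the pool maximum, the metrics text can be parsed into a dictionary. This sketch assumes the endpoint serves the Prometheus text exposition format. The metric names in the usage note are hypothetical, because the source does not list the names Unifiles exports.

```python
def parse_metrics(text, name_filter="pool"):
    """Return {series: value} for samples whose metric name contains name_filter."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Labelled sample: name{label="x"} value
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            # Unlabelled sample: name value
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        name = series.split("{", 1)[0]
        if name_filter in name:
            samples[series] = float(fields[0])
    return samples
```

With hypothetical series such as `db_pool_connections{state="used"}` and `db_pool_max_size`, a used count that stays at the maximum points to pool exhaustion.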
Solutions
Slow database queries:
-- Analyze the query plan
EXPLAIN ANALYZE SELECT * FROM chunks
WHERE knowledge_base_id = 'xxx'
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
-- Update statistics
ANALYZE chunks;
-- Tune the HNSW search parameter
SET hnsw.ef_search = 100;
Connection pool exhausted:
# Increase the pool size
PG_POOL_MAX=50
-- Check for leaked connections
SELECT * FROM pg_stat_activity WHERE state = 'idle in transaction';
Scaling out:
Worker queue backlog
Symptoms
Diagnosis
# Check queue sizes
docker-compose exec redis redis-cli -a "$REDIS_PASSWORD" LLEN upload_queue
docker-compose exec redis redis-cli -a "$REDIS_PASSWORD" LLEN extraction_queue
# Check worker status
docker-compose ps | grep worker
docker-compose logs worker-upload --tail=50
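A single `LLEN` reading does not show whether a backlog is growing or shrinking. Taking two readings a known interval apart gives a rough drain estimate. The sketch below is simple arithmetic under the assumption that the drain rate stays constant; the function name is illustrative.

```python
def drain_estimate(first_len, second_len, interval_seconds):
    """Estimate the seconds until the queue empties from two LLEN samples.

    Returns None when the queue is not shrinking, which means the workers
    are not keeping up with incoming jobs.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    drained = first_len - second_len  # items removed during the interval
    if drained <= 0:
        return None
    return second_len * interval_seconds / drained
```

For example, readings of 1000 and then 900 taken 60 seconds apart give an estimate of 540 seconds. A `None` result suggests restarting or adding workers.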
Solutions
Worker failures:
Add more workers:
Purge the dead-letter queue:
Log analysis
Common error patterns
# Search the error logs
docker-compose logs api 2>&1 | grep -i error | tail -50
# Count error types
docker-compose logs api 2>&1 | grep -i error | \
sed 's/.*ERROR.*| //' | sort | uniq -c | sort -rn
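The same count can be produced in Python when the logs are already saved to a file. This is a sketch that mirrors the `sed` expression above. It assumes log lines where the message follows the last `| ` after the word `ERROR`, and that format is inferred from the pipeline rather than documented.

```python
import re
from collections import Counter

# Same idea as the sed expression: drop everything up to the last "| "
# that follows "ERROR", leaving only the message.
ERROR_PREFIX = re.compile(r".*ERROR.*\| ")


def count_error_types(lines):
    """Return (message, count) pairs for error lines, most frequent first."""
    counts = Counter()
    for line in lines:
        if "error" not in line.lower():
            continue
        message = ERROR_PREFIX.sub("", line.strip(), count=1)
        counts[message] += 1
    return counts.most_common()
```

Pass it an iterable of lines, for example an open file saved from `docker-compose logs api`.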
Key log locations
| Component | Log location |
|---|---|
| API | docker-compose logs api |
| Worker | docker-compose logs worker-upload, docker-compose logs worker-extraction |
| PostgreSQL | docker-compose logs postgres |
| Redis | docker-compose logs redis |
| MinIO | docker-compose logs minio |
Getting help
Collecting diagnostic information
#!/bin/bash
# collect_diagnostics.sh
OUTPUT="diagnostics_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$OUTPUT"
# System information
uname -a > "$OUTPUT/system.txt"
df -h >> "$OUTPUT/system.txt"
free -m >> "$OUTPUT/system.txt"
# Docker information
docker version > "$OUTPUT/docker.txt"
docker-compose ps >> "$OUTPUT/docker.txt"
docker stats --no-stream >> "$OUTPUT/docker.txt"
# Service logs
docker-compose logs --tail=500 > "$OUTPUT/logs.txt"
# Configuration (GNU sed; redacts values of keys containing password, secret, or key)
docker-compose config | sed -E 's/^([[:space:]]*[A-Za-z0-9_]*(password|secret|key)[A-Za-z0-9_]*:).*/\1 [REDACTED]/I' > "$OUTPUT/config.txt"
# Health check
curl -s http://localhost:8088/health/detailed > "$OUTPUT/health.json"
# Package the results
tar -czf "${OUTPUT}.tar.gz" "$OUTPUT"
rm -rf "$OUTPUT"
echo "Diagnostics collected: ${OUTPUT}.tar.gz"
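Reporting issues
When reporting an issue, please provide:
- Unifiles version
- Deployment method (Docker/Kubernetes)
- Error messages and logs
- Steps to reproduce
- The diagnostics bundle
GitHub Issues: https://github.com/your-org/unifiles/issues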