# TAPD LABEL AGENT

**Repository Path**: maniacratt/tapd-label-agent

## Basic Information

- **Project Name**: TAPD LABEL AGENT
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-18
- **Last Updated**: 2026-02-08

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# TAPD Requirement Label Intelligent Classification System

A high-performance AI agent service built on Tornado's asynchronous architecture and designed for TAPD requirement management. It provides intelligent label classification, automatic monitoring of new requirements, and multi-label semantic prediction.

## 🎯 Core Features

1. **TAPD integration** - automatically monitors new TAPD requirements and labels them
2. **Multi-label semantic classification** - predicts time value, outcome cost, estimated effort, and source value
3. **High-performance async architecture** - Tornado + connection pooling + batch processing
4. **Model & keyword preloading** - the semantic model and the KeyBERT keyword extractor are loaded automatically at startup
5. **Full monitoring** - performance metrics, error tracking, and logging

## Architecture

### Core Components

1. **Template Pattern Base Classes**
   - `BaseHandler`: Template for all HTTP request handlers
   - `AIAgent`: Template for AI agents
   - `MLModel`: Template for ML models
   - `ScheduledTask`: Template for scheduled tasks
2. **Managers**
   - `AgentManager`: Manages the lifecycle of AI agents
   - `ModelManager`: Manages the lifecycle of ML models
3. **Scheduler**
   - `TaskScheduler`: APScheduler-based task scheduling with cron and interval support
4. **Handlers**
   - Health check endpoint
   - Agent execution and status endpoints
   - Model training and prediction endpoints

## Features

- **Async Architecture**: Built on Tornado for high-performance async request handling
- **Template Pattern**: Clean, extensible design for handlers, agents, and models
- **Scheduled Tasks**: Support for cron expressions and fixed intervals
- **ML Model Support**: Classification models with scikit-learn
- **RESTful API**: Full API for agent execution and model management
- **Error Handling**: Comprehensive error handling and logging

## Installation

```bash
pip install -r requirements.txt
```

### Windows Notes (CPU build recommended)

- The dependencies are pinned to `torch==2.2.2+cpu / torchvision==0.17.2+cpu / torchaudio==2.2.2+cpu` to avoid DLL dependency problems.
- To reinstall (CPU build, using pip):

  ```powershell
  python -m pip install --force-reinstall --no-cache-dir torch==2.2.2+cpu torchvision==0.17.2+cpu torchaudio==2.2.2+cpu -f https://download.pytorch.org/whl/torch_stable.html
  ```

- Make sure the Microsoft VC++ 2015-2022 runtime is installed (if it is missing, install it from the official Microsoft site).
- If you have an NVIDIA GPU and want to use CUDA, switch to the torch/torchvision/torchaudio builds for your CUDA version and use the install command provided by PyTorch.

## Configuration

Edit `.env` (or `.env.example`) to configure the following key parameters:

| Variable | Description | Default |
| --- | --- | --- |
| `DEBUG` | Whether to enable debug mode | `True` |
| `PORT` / `HOST` | Port and address the service listens on | `8888` / `0.0.0.0` |
| `SCHEDULER_ENABLED` | Whether to start APScheduler | `True` |
| `MODEL_DIR` / `DATA_DIR` / `LOG_DIR` | Model, data, and log directories | `./models` / `./data` / `./logs` |
| `ENABLE_KEYWORD_EXTRACTION` | Whether to enable KeyBERT keyword extraction. When this is `True`, the TAPD prediction service preloads the KeyBERT model at application startup. | `True` |
| `KEYBERT_MODEL_PATH` | Local SentenceTransformer model path used by KeyBERT. When empty, the built-in `models/shibing624/text2vec-base-chinese` is used. | `""` |
| `ST_MODEL_PATH` | Path of the SentenceTransformer semantic encoder, shared by the semantic classifier and KeyBERT | see `models/shibing624/text2vec-base-chinese` |
| `TAPD_API_*` | TAPD API credentials (see the TAPD integration documentation) | - |

> Tip: At startup, `app.Application` preloads the models and KeyBERT and instantiates `TAPDPredictionService`. If `KEYBERT_MODEL_PATH` points to a network drive or a large model, prepare a local cache in advance to shorten startup time.

## Running the Application

```bash
python app.py
```

The server will start on `http://0.0.0.0:8888`.

## Training / Prediction (Models)

### Training

```bash
D:\PythonProject\tapd-label-agent\.venv\Scripts\python.exe models\train_semantic_model.py
```

Training will:

- Use `text2vec-base-chinese` to generate semantic embeddings, which improves language understanding and reasoning
- Train 4 label models (time value / outcome cost / estimated effort / source value)
- Save the models to `./models`

### Prediction

- Single prediction example: `models\predict_requirements.py --mode single`
- Batch prediction example: `models\predict_requirements.py --mode batch`

### Data & Label Configuration

- Training data is read from `models/test.csv` by default
- Labels are defined in `LABEL_DEFINITIONS` in `config.py`, which merges the configured labels with the high-frequency labels found in the data
- If you add or change labels, update `config.py` accordingly and add training samples for them

### Notes

- Prediction results include a `reasoning` field: a "recommendation rationale" derived from similar cases (case-based reasoning).
- If training fails with a missing DLL error, reinstall the CPU build of torch by following the Windows steps above.
- Offline / intranet training is supported: set the `ST_MODEL_PATH` environment variable to a local model directory.

### Offline Download and Training (no public internet / VPN)

Goal: all files end up under the repository's `models` directory, and the offline machine never accesses the public internet.

Steps (run on a machine with internet access first, then copy everything to the offline machine):

1. Download the model files into the repository path (run the Python script on the online machine)

   ```bash
   pip install huggingface_hub==0.16.4
   python - <<'PY'
   from huggingface_hub import snapshot_download
   snapshot_download(
       repo_id="shibing624/text2vec-base-chinese",
       local_dir=r"D:\PythonProject\tapd-label-agent\models",
       local_dir_use_symlinks=False,
       resume_download=True,
   )
   PY
   ```

   Confirm that `models/shibing624` contains the complete set of files: `config.json`, the tokenizer files, `pytorch_model.bin`, and so on.

2. Download the dependencies as offline install packages

   ```bash
   pip download -r requirements.txt -d D:\PythonProject\tapd-label-agent\offline_wheels
   ```

3. Copy to the offline machine
   - Copy the entire repository directory and make sure it includes `models\tensor\` and `offline_wheels\`.

4. Install the dependencies on the offline machine

   ```powershell
   python -m pip install --no-index --find-links D:\PythonProject\tapd-label-agent\offline_wheels -r requirements.txt
   ```

5. Train on the offline machine (using the local model)

   ```powershell
   $env:ST_MODEL_PATH="D:\PythonProject\tapd-label-agent\models\tensor"
   python D:\PythonProject\tapd-label-agent\models\train_semantic_model.py
   ```

Additional notes:

- If you need offline packages for the CPU build of PyTorch, run `pip download torch==2.2.2+cpu torchvision==0.17.2+cpu torchaudio==2.2.2+cpu -d D:\PythonProject\tapd-label-agent\offline_wheels` on the online machine.

## API Endpoints

### Health Check

```
GET /health
```

### Agents

```
GET  /agents             # List all agents
GET  /agents/{agent_id}  # Get agent status
POST /agents/{agent_id}  # Execute agent
```

### Models

```
GET  /models                     # List all models
GET  /models/{model_id}/train    # Get model info
POST /models/{model_id}/train    # Train model
POST /models/{model_id}/predict  # Make predictions
```

## Example Usage

### 1. Create and Train a Model

```python
import asyncio

import numpy as np

from core.ml_model import ClassificationModel
from managers.model_manager import ModelManager

model_manager = ModelManager()
model = ClassificationModel('classifier')

# Synthetic data: 100 samples with 10 features and binary labels
X = np.random.rand(100, 10)
y = np.random.randint(0, 2, 100)

# train() is a coroutine, so it has to run inside an event loop
metrics = asyncio.run(model.train(X, y))
model_manager.register_model(model)
```

### 2. Create an Agent

```python
from core.ai_agent import ClassificationAgent
from managers.agent_manager import AgentManager

agent_manager = AgentManager()
# `model` is the trained model from step 1
agent = ClassificationAgent('agent_1', 'My Agent', model)
agent_manager.register_agent(agent)
```

### 3. Execute Agent

```python
# Must run inside a coroutine (for example an async handler)
result = await agent_manager.execute_agent('agent_1', {
    'features': [0.1, 0.2, 0.3, ...]
})
```

### 4. Schedule Tasks

```python
from core.scheduler import ScheduledTask, TaskScheduler


class MyTask(ScheduledTask):
    async def execute(self):
        print("Task executed")


scheduler = TaskScheduler()
task = MyTask('task_1', 'My Task')
scheduler.register_interval_task(task, seconds=3600)  # run once per hour
scheduler.start()
```

## Extending the System

### Create Custom Agent

```python
from core.ai_agent import AIAgent


class CustomAgent(AIAgent):
    async def process(self, input_data):
        # Your custom logic here
        return {'result': 'processed'}
```

### Create Custom Model

```python
from core.ml_model import MLModel


class CustomModel(MLModel):
    def build_model(self):
        # Build your model
        pass

    async def train(self, X, y, **kwargs):
        # Train your model
        pass

    async def predict(self, X):
        # Make predictions
        pass
```

### Create Custom Handler

```python
from core.base_handler import BaseHandler


class CustomHandler(BaseHandler):
    async def handle_request(self, *args, **kwargs):
        # Your handler logic
        self.write_json({'result': 'success'})
```

## Project Structure

```
tapd-label-agent/
├── app.py                 # Main application
├── config.py              # Configuration
├── requirements.txt       # Dependencies
├── .env                   # Environment variables
├── core/
│   ├── base_handler.py    # Base handler template
│   ├── ai_agent.py        # AI agent template
│   ├── ml_model.py        # ML model template
│   └── scheduler.py       # Task scheduler
├── handlers/
│   ├── health_handler.py
│   ├── agent_handler.py
│   └── model_handler.py
├── managers/
│   ├── agent_manager.py
│   └── model_manager.py
└── examples/
    └── example_usage.py
```

## Performance Considerations

- Async/await for non-blocking I/O
- Connection pooling for database operations
- Model caching to avoid reloading
- Efficient numpy operations for predictions
- APScheduler for background task management

## License

MIT
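
## Appendix: Calling the API from Python

The endpoints listed under "API Endpoints" can be exercised from any HTTP client. The sketch below only builds the requests with the standard library, so it can be inspected without a running server. Several details are assumptions rather than documented behaviour: the helper names `build_health_request` and `build_agent_request` are hypothetical, the base URL assumes a local server on the default port `8888`, and the JSON body simply mirrors the `{'features': [...]}` dict passed to `execute_agent` in the usage example. The real handler may expect a different payload.

```python
from __future__ import annotations

import json
from urllib.request import Request

# Assumption: server running locally on the default PORT (8888)
BASE_URL = "http://127.0.0.1:8888"


def build_health_request(base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a GET /health request."""
    return Request(url=f"{base_url}/health", method="GET")


def build_agent_request(
    agent_id: str, features: list[float], base_url: str = BASE_URL
) -> Request:
    """Build (but do not send) a POST /agents/{agent_id} request."""
    # Assumption: the handler accepts a JSON body shaped like the dict
    # given to agent_manager.execute_agent in the usage example.
    body = json.dumps({"features": features}).encode("utf-8")
    return Request(
        url=f"{base_url}/agents/{agent_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_agent_request("agent_1", [0.1, 0.2, 0.3])
    print(req.get_method(), req.full_url)
```

With the server started via `python app.py`, a request built this way could be sent with `urllib.request.urlopen(req)`. The response format is not specified in this README, so inspect it before relying on any particular field.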