# TAPD LABEL AGENT

**Repository Path**: maniacratt/tapd-label-agent

## Basic Information

- **Project Name**: TAPD LABEL AGENT
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-18
- **Last Updated**: 2026-02-08

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

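# TAPD Requirement Label Classification System

A high-performance AI agent service built on Tornado's asynchronous architecture. It is designed for TAPD requirement management and provides intelligent label classification, automatic monitoring of new requirements, and multi-label semantic prediction.

## 🎯 Core Features

1. **TAPD Integration** - Automatically watches TAPD for new requirements and labels them
2. **Multi-Label Semantic Classification** - Predicts time value, outcome cost, estimated effort, and source value
3. **High-Performance Async Architecture** - Tornado + connection pooling + batch processing
4. **Model & Keyword Preloading** - Loads the semantic model and the KeyBERT keyword extractor automatically at startup
5. **Full Monitoring** - Performance metrics, error tracking, and logging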

## Architecture

### Core Components

1. **Template Pattern Base Classes**
   - `BaseHandler`: Template for all HTTP request handlers
   - `AIAgent`: Template for AI agents
   - `MLModel`: Template for ML models
   - `ScheduledTask`: Template for scheduled tasks

2. **Managers**
   - `AgentManager`: Manages AI agents lifecycle
   - `ModelManager`: Manages ML models lifecycle

3. **Scheduler**
   - `TaskScheduler`: APScheduler-based task scheduling with cron and interval support

4. **Handlers**
   - Health check endpoint
   - Agent execution and status endpoints
   - Model training and prediction endpoints

## Features

- **Async Architecture**: Built on Tornado for high-performance async request handling
- **Template Pattern**: Clean, extensible design for handlers, agents, and models
- **Scheduled Tasks**: Support for cron expressions and fixed intervals
- **ML Model Support**: Classification models with scikit-learn
- **RESTful API**: Full API for agent execution and model management
- **Error Handling**: Comprehensive error handling and logging

## Installation

```bash
pip install -r requirements.txt
```

### Windows Notes (CPU Build Recommended)

- `torch==2.2.2+cpu / torchvision==0.17.2+cpu / torchaudio==2.2.2+cpu` are pinned to avoid DLL dependency problems.
- To reinstall (CPU build, via pip):  
  ```powershell
  python -m pip install --force-reinstall --no-cache-dir  torch==2.2.2+cpu torchvision==0.17.2+cpu torchaudio==2.2.2+cpu -f https://download.pytorch.org/whl/torch_stable.html
  ```
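- Make sure the Microsoft Visual C++ 2015-2022 Redistributable is installed (available from Microsoft's website if it is missing).
- If you have an NVIDIA GPU and want to use CUDA, switch to the torch/torchvision/torchaudio builds that match your CUDA version and use the install command provided by PyTorch.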

## Configuration

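Edit `.env` (or `.env.example`) to configure the following key parameters:

| Variable | Description | Default |
| --- | --- | --- |
| `DEBUG` | Whether to enable debug mode | `True` |
| `PORT` / `HOST` | Port and address the service listens on | `8888` / `0.0.0.0` |
| `SCHEDULER_ENABLED` | Whether to start APScheduler | `True` |
| `MODEL_DIR` / `DATA_DIR` / `LOG_DIR` | Model, data, and log directories | `./models` / `./data` / `./logs` |
| `ENABLE_KEYWORD_EXTRACTION` | Whether to enable KeyBERT keyword extraction. When `True`, the TAPD prediction service preloads the KeyBERT model at application startup. | `True` |
| `KEYBERT_MODEL_PATH` | Path to the local SentenceTransformer model used by KeyBERT. When empty, the bundled `models/shibing624/text2vec-base-chinese` is used. | `""` |
| `ST_MODEL_PATH` | Path to the SentenceTransformer semantic encoder, shared by the semantic classifier and KeyBERT | See `models/shibing624/text2vec-base-chinese` |
| `TAPD_API_*` | TAPD API credentials (see the TAPD integration docs) | - |

> Note: At startup, `app.Application` preloads the models and KeyBERT and instantiates `TAPDPredictionService`. If `KEYBERT_MODEL_PATH` points to a network drive or a large model, prepare a local cache in advance to shorten startup time.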
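
The table above can be read as a set of environment variables with fallbacks. The sketch below shows one way such values might be parsed with the standard library only. It is not the project's `config.py`: the helper names `as_bool` and `load_settings`, the returned dictionary keys, and the set of accepted truthy strings are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Assumption: these strings count as "true"; the real config.py may differ.
TRUE_VALUES = {"1", "true", "yes", "on"}
# Bundled model path named in the configuration table.
DEFAULT_ST_MODEL = "models/shibing624/text2vec-base-chinese"


def as_bool(raw: Optional[str], default: bool) -> bool:
    """Interpret an environment string as a boolean, falling back to default."""
    if raw is None or not raw.strip():
        return default
    return raw.strip().lower() in TRUE_VALUES


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the documented variables, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "debug": as_bool(env.get("DEBUG"), True),
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8888")),
        "scheduler_enabled": as_bool(env.get("SCHEDULER_ENABLED"), True),
        "model_dir": env.get("MODEL_DIR", "./models"),
        "data_dir": env.get("DATA_DIR", "./data"),
        "log_dir": env.get("LOG_DIR", "./logs"),
        "enable_keyword_extraction": as_bool(
            env.get("ENABLE_KEYWORD_EXTRACTION"), True
        ),
        # An empty KEYBERT_MODEL_PATH falls back to the bundled model.
        "keybert_model_path": env.get("KEYBERT_MODEL_PATH") or DEFAULT_ST_MODEL,
        "st_model_path": env.get("ST_MODEL_PATH") or DEFAULT_ST_MODEL,
    }


settings = load_settings({"DEBUG": "false", "PORT": "9000"})
print(settings["debug"], settings["port"])  # prints: False 9000
```

Passing a plain mapping instead of `os.environ` keeps the parsing logic easy to check without touching the real environment.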

## Running the Application

```bash
python app.py
```

The server will start on `http://0.0.0.0:8888`

## Training / Prediction (Models)

### Training
```bash
D:\PythonProject\tapd-label-agent\.venv\Scripts\python.exe models\train_semantic_model.py
```
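Training will:
- Generate semantic embeddings with `text2vec-base-chinese` to improve language understanding and reasoning
- Train four label models (time value / outcome cost / estimated effort / source value)
- Save the models to `./models`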

### Prediction
- Single prediction example: `models\predict_requirements.py --mode single`
- Batch prediction example: `models\predict_requirements.py --mode batch`

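### Data & Label Configuration
- Training data is read from `models/test.csv` by default
- Labels are defined in `LABEL_DEFINITIONS` in `config.py`, which merges the configured labels with high-frequency labels found in the data
  - If you add or change labels, update `config.py` accordingly and add training samples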
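
Because labels in the training data and labels in `config.py` have to stay in sync, a small consistency check can help before retraining. The sketch below is illustrative only: the structure of `LABEL_DEFINITIONS` (a dict of column name to allowed values), the column names, and the helper `find_undefined_labels` are all assumptions, since the real `config.py` and CSV layout are not shown here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for config.LABEL_DEFINITIONS; the real structure may differ.
LABEL_DEFINITIONS = {
    "time_value": ["high", "medium", "low"],
    "effort": ["small", "large"],
}


def find_undefined_labels(rows, definitions):
    """Return {column: sorted values present in the data but not defined}."""
    unknown = defaultdict(set)
    for row in rows:
        for column, allowed in definitions.items():
            value = (row.get(column) or "").strip()
            if value and value not in allowed:
                unknown[column].add(value)
    return {column: sorted(values) for column, values in unknown.items()}


# In-memory sample standing in for a training CSV with assumed column names.
sample_csv = io.StringIO(
    "title,time_value,effort\n"
    "Export report,high,small\n"
    "Rework login,urgent,large\n"
)
report = find_undefined_labels(csv.DictReader(sample_csv), LABEL_DEFINITIONS)
print(report)  # prints: {'time_value': ['urgent']}
```

An empty report would suggest that every label in the data is covered by the definitions.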

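### Notes
- Prediction results include a `reasoning` field: a recommendation rationale based on similar cases (case-based reasoning).
- If training fails with a missing-DLL error, reinstall the CPU build of torch following the Windows steps above.
- Offline / intranet training is supported: point the `ST_MODEL_PATH` environment variable at a local model directory.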
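
To make the idea of a case-based `reasoning` field concrete, here is a minimal sketch that picks a label from the most similar past cases and reports which cases drove the decision. It is not the project's implementation: the function names, the similarity-weighted vote, and the toy two-dimensional vectors are assumptions. In the real service the vectors would presumably come from the `text2vec-base-chinese` encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_with_reasoning(query_vec, cases, top_k=2):
    """cases is a list of (title, vector, label) tuples from past requirements."""
    ranked = sorted(cases, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    nearest = ranked[:top_k]

    # Similarity-weighted vote among the nearest cases.
    votes = {}
    for _title, vec, label in nearest:
        votes[label] = votes.get(label, 0.0) + cosine(query_vec, vec)
    best = max(votes, key=votes.get)

    # Cite only the neighbours that support the chosen label.
    support = [
        f"{title} (similarity {cosine(query_vec, vec):.2f})"
        for title, vec, label in nearest
        if label == best
    ]
    return {
        "label": best,
        "reasoning": "Similar past requirements: " + "; ".join(support),
    }


# Toy embeddings; real vectors would have hundreds of dimensions.
past_cases = [
    ("Fix payment crash", [1.0, 0.0], "high"),
    ("Update help text", [0.0, 1.0], "low"),
    ("Fix checkout error", [0.9, 0.1], "high"),
]
result = predict_with_reasoning([1.0, 0.05], past_cases)
print(result["label"])
print(result["reasoning"])
```

Returning the supporting cases alongside the label lets a reviewer check whether the neighbours are actually relevant.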

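### Offline Download and Training (No Internet / VPN)
Goal: keep every file inside the repository's `models` directory so that the offline machine never accesses the public internet.

Steps (run them on an internet-connected machine first, then copy everything to the offline machine):
1. Download the model files into the repository (run the Python script on the connected machine)  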
   ```bash
   pip install huggingface_hub==0.16.4
   python - <<'PY'
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="shibing624/text2vec-base-chinese",
    local_dir=r"D:\PythonProject\tapd-label-agent\models",
    local_dir_use_symlinks=False,
    resume_download=True,
)
PY
   ```
   Confirm that `models/shibing624` contains the complete file set, including `config.json`, the tokenizer files, and `pytorch_model.bin`.

2. Download the dependencies as an offline wheel bundle  
   ```bash
   pip download -r requirements.txt -d D:\PythonProject\tapd-label-agent\offline_wheels
   ```

3. Copy to the offline machine  
   - Copy the entire repository directory, making sure it includes `models\tensor\` and `offline_wheels\`.

4. Install the dependencies on the offline machine  
   ```powershell
   python -m pip install --no-index --find-links D:\PythonProject\tapd-label-agent\offline_wheels -r requirements.txt
   ```

5. Train on the offline machine (using the local model)  
   ```powershell
   $env:ST_MODEL_PATH="D:\PythonProject\tapd-label-agent\models\tensor"
   python D:\PythonProject\tapd-label-agent\models\train_semantic_model.py
   ```

Additional notes:
- If you need offline wheels for the CPU build of PyTorch, run this on the connected machine  
  `pip download torch==2.2.2+cpu torchvision==0.17.2+cpu torchaudio==2.2.2+cpu -d D:\PythonProject\tapd-label-agent\offline_wheels`
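
Step 1 asks for a manual check that the downloaded model directory is complete. A sketch of how that check could be scripted before copying the repository to the offline machine follows. The helper `missing_model_files` is hypothetical, and the tokenizer file patterns are an assumption, because the exact tokenizer file names depend on the model snapshot.

```python
from pathlib import Path

# Files named explicitly in step 1.
REQUIRED_FILES = ("config.json", "pytorch_model.bin")
# Assumption: tokenizer assets match one of these patterns (e.g. vocab.txt).
TOKENIZER_PATTERNS = ("tokenizer*", "vocab*")


def missing_model_files(model_dir):
    """Return a list of missing items; an empty list means the check passed."""
    root = Path(model_dir)
    if not root.is_dir():
        return [*REQUIRED_FILES, "tokenizer files"]

    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    has_tokenizer = any(any(root.glob(pattern)) for pattern in TOKENIZER_PATTERNS)
    if not has_tokenizer:
        missing.append("tokenizer files")
    return missing
```

Calling `missing_model_files(...)` with the directory you intend to use as `ST_MODEL_PATH` and getting an empty list back is a reasonable signal that the copy is worth attempting. It does not verify that the files themselves are intact.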

## API Endpoints

### Health Check
```
GET /health
```

### Agents
```
GET /agents                    # List all agents
GET /agents/{agent_id}         # Get agent status
POST /agents/{agent_id}        # Execute agent
```

### Models
```
GET /models                    # List all models
GET /models/{model_id}/train   # Get model info
POST /models/{model_id}/train  # Train model
POST /models/{model_id}/predict # Make predictions
```
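
The endpoints above can be exercised from any HTTP client. The following standard-library sketch builds and sends such requests. Several details are assumptions: the base URL uses the default port from the configuration table, responses are assumed to be JSON, and the `{"features": [...]}` payload shape is borrowed from the agent example elsewhere in this README rather than from a documented request schema.

```python
import json
import urllib.request

# Assumption: server running locally on the default PORT.
BASE_URL = "http://localhost:8888"


def build_request(method, path, payload=None, base_url=BASE_URL):
    """Create a urllib Request, JSON-encoding the payload when one is given."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method
    )


def call(method, path, payload=None, timeout=10):
    """Send the request and decode the body, assuming a JSON response."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example calls (require a running server, so they are left commented out):
# call("GET", "/health")
# call("GET", "/agents")
# call("POST", "/agents/agent_1", {"features": [0.1, 0.2, 0.3]})
```

Separating `build_request` from `call` makes it possible to inspect the method, URL, and body without a server.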

## Example Usage

### 1. Create and Train a Model

```python
import asyncio

import numpy as np

from core.ml_model import ClassificationModel
from managers.model_manager import ModelManager

model_manager = ModelManager()
model = ClassificationModel('classifier')
X = np.random.rand(100, 10)        # 100 samples, 10 features each
y = np.random.randint(0, 2, 100)   # random binary labels

# train() is a coroutine, so a plain script needs an event loop to run it
metrics = asyncio.run(model.train(X, y))
model_manager.register_model(model)
```

### 2. Create an Agent

```python
from core.ai_agent import ClassificationAgent
from managers.agent_manager import AgentManager

agent_manager = AgentManager()
agent = ClassificationAgent('agent_1', 'My Agent', model)
agent_manager.register_agent(agent)
```

### 3. Execute Agent

```python
# Run this inside a coroutine (for example a handler method), since it uses await
result = await agent_manager.execute_agent('agent_1', {
    'features': [0.1, 0.2, 0.3, ...]
})
```

### 4. Schedule Tasks

```python
from core.scheduler import ScheduledTask, TaskScheduler

class MyTask(ScheduledTask):
    async def execute(self):
        print("Task executed")

scheduler = TaskScheduler()
task = MyTask('task_1', 'My Task')
scheduler.register_interval_task(task, seconds=3600)
scheduler.start()
```

## Extending the System

### Create Custom Agent

```python
from core.ai_agent import AIAgent

class CustomAgent(AIAgent):
    async def process(self, input_data):
        # Your custom logic here
        return {'result': 'processed'}
```

### Create Custom Model

```python
from core.ml_model import MLModel

class CustomModel(MLModel):
    def build_model(self):
        # Build your model
        pass
    
    async def train(self, X, y, **kwargs):
        # Train your model
        pass
    
    async def predict(self, X):
        # Make predictions
        pass
```

### Create Custom Handler

```python
from core.base_handler import BaseHandler

class CustomHandler(BaseHandler):
    async def handle_request(self, *args, **kwargs):
        # Your handler logic
        self.write_json({'result': 'success'})
```

## Project Structure

```
tapd-label-agent/
├── app.py                 # Main application
├── config.py             # Configuration
├── requirements.txt      # Dependencies
├── .env                  # Environment variables
├── core/
│   ├── base_handler.py   # Base handler template
│   ├── ai_agent.py       # AI agent template
│   ├── ml_model.py       # ML model template
│   └── scheduler.py      # Task scheduler
├── handlers/
│   ├── health_handler.py
│   ├── agent_handler.py
│   └── model_handler.py
├── managers/
│   ├── agent_manager.py
│   └── model_manager.py
└── examples/
    └── example_usage.py
```

## Performance Considerations

- Async/await for non-blocking I/O
- Connection pooling for database operations
- Model caching to avoid reloading
- Efficient numpy operations for predictions
- APScheduler for background task management
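
As an illustration of the model-caching point, the sketch below memoizes a loader so that repeated requests for the same path reuse one object. How the project actually caches models is not shown in this README. `_load_from_disk` is a hypothetical stand-in for an expensive load, and `functools.lru_cache` is just one simple way to get this behaviour.

```python
from functools import lru_cache


def _load_from_disk(model_path):
    # Stand-in for an expensive load such as SentenceTransformer(model_path).
    return {"path": model_path}


@lru_cache(maxsize=4)
def get_model(model_path):
    """Load a model once per path; later calls return the cached object."""
    return _load_from_disk(model_path)


first = get_model("./models/example")
second = get_model("./models/example")
print(first is second)               # prints: True
print(get_model.cache_info().hits)   # prints: 1
```

A bounded `maxsize` keeps memory use predictable when several large models are in play.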

## License

MIT