# Logic-RL
**Repository Path**: wang_wei_973667927/Logic-RL
## Basic Information
- **Project Name**: Logic-RL
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-02-06
- **Last Updated**: 2025-02-06
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Logic Rl
## Successfully reproduced DeepSeek R1 Zero on 2K Tiny Logic Puzzle Dataset.
See project explanation [here](https://evxpwrsfkdb.feishu.cn/docx/NokEdaMBmo6aqZxVdxkcSm2cnab?from=from_copylink).
Wandb project[here](https://wandb.ai/ustc_ai/GRPO_logic_KK/reports/GRPO-Zero--VmlldzoxMTIwOTYyNw?accessToken=gnbnl5mu5pwfww7gtwxymohg85w7d7vthvjvbl4w8yxg0a99vf1k22m11e61cvv8).
---
## ✨ Enhanced Features (After Rule-Based RL)
| 🚩 Uncertainty Marking | 📝 Progressive Summarization |
|------------------------|---------------------------|
| Flag ambiguous steps for verification | Maintain intermediate conclusions |
| ✅ Self Verification | 🌐 Multilingual Switching |
|-----------------------------|-------------------------------|
| First verify then answer | Chinese reasoning traces with English answers |
---
## 📸 Results Preview
 |
 |
| Model Output Example |
Average Output Length |
---
## 🛠️ Installation
```bash
conda create -n logic python=3.9
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm==0.6.3 ray
pip3 install flash-attn --no-build-isolation
pip install -e . # For verl integration
pip install wandb IPython matplotlib
```
---
## 📂 Data Preparation
You can directly use /data.
For your own data generation, here's a demo:
### Base Model
```bash
python ./examples/data_preprocess/kk.py \
--local_dir {processed_data_path} \
--data_path {raw_data_path}
```
### Instruct Model
```bash
python ./examples/data_preprocess/kk.py \
--template_type=qwen-instruct \
--local_dir {processed_data_path} \
--data_path {raw_data_path}
```
---
## 🚀 Training Execution
```bash
conda activate logic
bash main_grpo.sh # 4×A100 80G
```
---
## ⚙️ Implementation Details
| Component | Location |
|------------------------|-----------------------------------|
| 🏆 Reward Modeling | `verl/utils/reward_score/kk.py` |
| 📚 Data Preprocessing | `examples/data_preprocess/kk.py` |
---
---
## Citation
```
@misc{logic-rl,
author = {Tian Xie and Qingnan Ren and Yuqian Hong},
title = {Logic-RL},
howpublished = {https://github.com/Unakar/Logic-RL},
note = {Accessed: 2025-02-03},
year = {2025}
}
```
---
---
## 🙏 Acknowledgements
- [Verl](https://github.com/volcengine/verl) 🔗
- [TinyZero](https://github.com/Jiayi-Pan/TinyZero) 🔗