# Mirror
**Repository Path**: baochenlong/Mirror
## Basic Information
- **Project Name**: Mirror
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: anonymous
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-11-04
- **Last Updated**: 2024-11-04
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
🪞 Mirror: A Universal Framework for Various Information Extraction Tasks
## 🔥 Supported Tasks
1. Named Entity Recognition
2. Entity Relationship Extraction (Triplet Extraction)
3. Event Extraction
4. Aspect-based Sentiment Analysis
5. Multi-span Extraction (e.g. Discontinuous NER)
6. N-ary Extraction (e.g. Hyper Relation Extraction)
7. Extractive Machine Reading Comprehension (MRC) and Question Answering
8. Classification & Multi-choice MRC

## 🌴 Dependencies
Python>=3.10
```bash
pip install -r requirements.txt
```
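Before installing, it can help to confirm that the interpreter meets the `Python>=3.10` requirement. The snippet below is a minimal sketch, not part of the repository: `version_at_least` is a hypothetical helper name, and it compares only the major and minor version components.

```bash
# Hypothetical helper (not shipped with Mirror).
# Returns 0 if version $1 (e.g. "3.11.4") is at least major.minor $2 (e.g. "3.10").
version_at_least() {
  local have_major have_minor want_major want_minor
  IFS=. read -r have_major have_minor _ <<< "$1"
  IFS=. read -r want_major want_minor _ <<< "$2"
  if [ "$have_major" -ne "$want_major" ]; then
    # Major versions differ, so the major component alone decides.
    [ "$have_major" -gt "$want_major" ]
    return
  fi
  # Same major version: compare the minor component (missing means 0).
  [ "${have_minor:-0}" -ge "${want_minor:-0}" ]
}

# Usage (commented out so the snippet has no side effects when sourced):
# version_at_least "$(python -c 'import platform; print(platform.python_version())')" 3.10 \
#   || echo "Python 3.10 or newer is required" >&2
```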
## 🚀 QuickStart
### Pretrained Model Weights & Datasets
Download the pretrained model weights and datasets from [[Anonymized OSF]](https://osf.io/kwsm4/?view_only=91a610f7a81a430eb953378f26a8054c).
The link is anonymous and is provided solely for double-blind peer review.
### Pretraining
1. Download and unzip the pretraining corpus into `resources/Mirror/v1.4_sampled_v3/merged/all_excluded`
2. Start pretraining:
```bash
CUDA_VISIBLE_DEVICES=0 rex train -m src.task -dc conf/Pretrain_excluded.yaml
```
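Since training depends on the corpus being unzipped into the expected path, a pre-flight check can catch a misplaced download early. This is a minimal sketch under assumptions: `require_nonempty_dir` is a hypothetical helper that is not part of the repository, and it only checks that the directory exists and is not empty, not that its contents are valid.

```bash
# Hypothetical pre-flight helper (not shipped with Mirror).
# Succeeds only if the directory exists and contains at least one entry.
require_nonempty_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  if [ -z "$(ls -A "$dir")" ]; then
    echo "directory is empty: $dir" >&2
    return 1
  fi
  echo "found: $dir"
}

# Usage (commented out so the snippet has no side effects when sourced):
# require_nonempty_dir resources/Mirror/v1.4_sampled_v3/merged/all_excluded \
#   && CUDA_VISIBLE_DEVICES=0 rex train -m src.task -dc conf/Pretrain_excluded.yaml
```

The same helper could be pointed at `resources/Mirror/uie/` before fine-tuning.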
### Fine-tuning
⚠️ Due to data license constraints, some datasets (e.g. ACE04, ACE05) cannot be distributed directly.
1. Download and unzip the pretraining corpus into `resources/Mirror/v1.4_sampled_v3/merged/all_excluded`
2. Download and unzip the fine-tuning datasets into `resources/Mirror/uie/`
3. Start fine-tuning:
```bash
# UIE tasks
CUDA_VISIBLE_DEVICES=0 bash scripts/single_task_wPTAllExcluded_wInstruction/run1.sh
CUDA_VISIBLE_DEVICES=1 bash scripts/single_task_wPTAllExcluded_wInstruction/run2.sh
CUDA_VISIBLE_DEVICES=2 bash scripts/single_task_wPTAllExcluded_wInstruction/run3.sh
CUDA_VISIBLE_DEVICES=3 bash scripts/single_task_wPTAllExcluded_wInstruction/run4.sh
# Multi-span and N-ary extraction
CUDA_VISIBLE_DEVICES=4 bash scripts/single_task_wPTAllExcluded_wInstruction/run_new_tasks.sh
# GLUE datasets
CUDA_VISIBLE_DEVICES=5 bash scripts/single_task_wPTAllExcluded_wInstruction/glue.sh
```
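The commands above pin each script to its own GPU. If the GPU assignment needs to change, the launch lines can be generated rather than edited by hand. The following is a sketch only: `plan_launch` is a hypothetical helper, not part of the repository, and it prints the commands instead of running them. Whether the scripts can safely run concurrently is an assumption to verify for your setup.

```bash
# Hypothetical helper (not shipped with Mirror).
# Prints one launch command per script, assigning GPU ids 0, 1, 2, ... in order.
plan_launch() {
  local gpu=0
  local script
  for script in "$@"; do
    echo "CUDA_VISIBLE_DEVICES=${gpu} bash ${script}"
    gpu=$((gpu + 1))
  done
}

# Usage (commented out; prints the plan without executing anything):
# plan_launch scripts/single_task_wPTAllExcluded_wInstruction/run{1,2,3,4}.sh
```

Printing first keeps the step reviewable; the output can then be run line by line once it looks right.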
### Analysis Experiments
- Few-shot experiments: `scripts/run_fewshot.sh`. Collect the results with `python mirror_fewshot_outputs/get_avg_results.py`
- Mirror w/ PT w/o Inst.: `scripts/single_task_wPTAllExcluded_woInstruction`
- Mirror w/o PT w/ Inst.: `scripts/single_task_wo_pretrain`
- Mirror w/o PT w/o Inst.: `scripts/single_task_wo_pretrain_wo_instruction`
### Evaluation
1. Change `task_dir` and `data_pairs` to match what you want to evaluate. By default, the evaluation reports the results of Mirror<sub>direct</sub> on all downstream tasks.
2. Run `CUDA_VISIBLE_DEVICES=0 python -m src.eval`
### Demo
1. Download and unzip the pretrained task dump into `mirror_outputs/Mirror_Pretrain_AllExcluded_2`
2. Try our demo:
```bash
CUDA_VISIBLE_DEVICES=0 python -m src.app.api_backend
```

## 💌 Others
This project is licensed under Apache-2.0.
We hope you enjoy it ~