# PaperRobot
**Repository Path**: hico111/PaperRobot
## Basic Information
- **Project Name**: PaperRobot
- **Description**: Code for PaperRobot: Incremental Draft Generation of Scientific Ideas
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-07-30
- **Last Updated**: 2020-12-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# PaperRobot: Incremental Draft Generation of Scientific Ideas
[PaperRobot: Incremental Draft Generation of Scientific Ideas](https://www.aclweb.org/anthology/P19-1191)
[[Sample Output]](https://eaglew.github.io/PaperRobot/)
Accepted by the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)
Table of Contents
=================
* [Overview](#overview)
* [Requirements](#requirements)
* [Quickstart](#quickstart)
* [Citation](#citation)
## Overview
PaperRobot reads existing PubMed papers to build a background knowledge graph of entities and relations, performs link prediction over that graph to propose new ideas, and incrementally writes key elements of a new paper: an abstract from a title, a conclusion and future work section from the abstract, and a new title from the conclusion and future work.
## Requirements
#### Environment:
- Python 3.6 **CAUTION!! The model might not be saved and loaded properly under Python 3.5**
- Ubuntu 16.04/18.04 **CAUTION!! The model might not run properly on Windows because [Windows uses backslashes in paths while Linux/OS X uses forward slashes](https://www.howtogeek.com/181774/why-windows-uses-backslashes-and-everything-else-uses-forward-slashes/)**
#### Packages:
You can click the following links for detailed installation instructions.
- [Pytorch 1.1](https://pytorch.org/get-started/previous-versions/)
- [NumPy 1.16.3](https://www.scipy.org/install.html)
- [SciPy 1.2.1](https://www.scipy.org/install.html)
- [NetworkX 2.3](https://networkx.github.io/documentation/stable/install.html)
#### Data:
- [PubMed Paper Reading Dataset](https://drive.google.com/open?id=1DLmxK5x7m8bDPK5ZfAROtGpkWZ_v980Z)
This dataset gathers 14,857 entities, 133 relations, and the tokenized text corresponding to those entities from PubMed. It contains 875,698 training pairs, 109,462 development pairs, and 109,462 test pairs.
- [PubMed Term, Abstract, Conclusion, Title Dataset](https://drive.google.com/open?id=1O91gX2maPHdIRUb9DdZmUOI5issRMXMY)
This dataset gathers three types of pairs from PubMed: Title-to-Abstract (Training: 22,811/Development: 2,095/Test: 2,095), Abstract-to-Conclusion and Future work (Training: 22,811/Development: 2,095/Test: 2,095), and Conclusion and Future work-to-Title (Training: 15,902/Development: 2,095/Test: 2,095). Each pair contains an input and an output as well as the corresponding terms (from the original KB and the link prediction results).
## Quickstart
### Existing paper reading
**CAUTION!! Because the dataset is quite large, training and evaluating the link prediction model will be pretty slow.**
#### Preprocessing:
Download and unzip `paper_reading.zip` from the [PubMed Paper Reading Dataset](https://drive.google.com/open?id=1DLmxK5x7m8bDPK5ZfAROtGpkWZ_v980Z). Put the `paper_reading` folder under the `Existing paper reading` folder.
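As a quick sanity check after unzipping, you can list the files and their line counts. The sketch below is illustrative only and makes no assumptions about the exact file names or formats inside `paper_reading`.
```
from pathlib import Path

# Quick sanity check of the unzipped dataset (illustrative only; the exact
# file names and formats inside the paper_reading folder are not assumed here).
data_dir = Path("paper_reading")
for path in sorted(data_dir.rglob("*")):
    if path.is_file():
        with open(path, encoding="utf-8", errors="ignore") as f:
            n_lines = sum(1 for _ in f)
        print(f"{path}: {n_lines} lines")
```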
#### Training
Hyperparameters can be adjusted on the command line. For example, to change the number of hidden units to 6, append `--hidden 6` after `train.py`.
```
python train.py
```
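For reference, command-line flags such as `--hidden`, `--cont`, and `--model` are usually defined with `argparse`. The sketch below is only an assumption about how `train.py` might expose them (the default values and help strings are hypothetical), not the repository's actual argument list.
```
import argparse

# Minimal sketch (assumption): how flags such as --hidden, --cont, and --model
# could be wired up. The real train.py may use different names and defaults.
def build_parser():
    parser = argparse.ArgumentParser(description="Train the link prediction model")
    parser.add_argument("--hidden", type=int, default=50,
                        help="number of hidden units (hypothetical default)")
    parser.add_argument("--cont", action="store_true",
                        help="resume training from a saved checkpoint")
    parser.add_argument("--model", type=str, default=None,
                        help="path to a previously saved checkpoint")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)  # e.g. `--hidden 6` yields Namespace(hidden=6, ...)
```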
To resume training, run the following command with the previous model path after `--model`:
```
python train.py --cont --model models/GATA/best_dev_model.pth.tar
```
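The checkpoint layout is not documented here; the sketch below only shows how `--cont`/`--model` style resumption typically works in PyTorch, with a placeholder model and hypothetical checkpoint keys.
```
import torch
import torch.nn as nn

# Illustrative resume logic (assumption); the repository's actual model class
# and checkpoint keys may differ.
model = nn.Linear(10, 10)                       # placeholder for the real model
optimizer = torch.optim.Adam(model.parameters())

checkpoint = torch.load("models/GATA/best_dev_model.pth.tar", map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])       # hypothetical key
optimizer.load_state_dict(checkpoint["optimizer"])    # hypothetical key
start_epoch = checkpoint.get("epoch", 0) + 1          # continue after the saved epoch
```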
#### Test
Put the trained model path after `--model`. `test.py` will report the ranking score for the test set.
```
python test.py --model models/GATA/best_dev_model.pth.tar
```
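This README does not specify which ranking metric `test.py` reports. For orientation, link prediction is commonly scored with mean reciprocal rank (MRR) and Hits@k, roughly as in the sketch below; this is not the repository's evaluation code.
```
import torch

# Illustrative only: common link-prediction ranking metrics for a single query.
def ranking_metrics(scores, true_index, k=10):
    """scores: (num_candidates,) tensor; true_index: position of the correct entity."""
    rank = (scores > scores[true_index]).sum().item() + 1  # 1 = best
    return 1.0 / rank, float(rank <= k)                    # reciprocal rank, Hits@k

scores = torch.tensor([0.1, 2.3, 0.7, 1.5])
rr, hit = ranking_metrics(scores, true_index=2)
print(rr, hit)  # 0.333..., 1.0
```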
### New paper writing
#### Preprocessing:
Download and unzip `data_pubmed_writing.zip` from the [PubMed Term, Abstract, Conclusion, Title Dataset](https://drive.google.com/open?id=1O91gX2maPHdIRUb9DdZmUOI5issRMXMY). Put the `data` folder under the `New paper writing` folder.
#### Training
Put the data type after `--data_path` and the model directory after `--model_dp`. For example, to train an abstract model, pass `data/pubmed_abstract` to `--data_path`:
```
python train.py --data_path data/pubmed_abstract --model_dp abstract_model/
```
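The dataset provides three pair types (title-to-abstract, abstract-to-conclusion and future work, conclusion and future work-to-title), so one writing model is trained per stage. The sketch below loops over them using the documented `--data_path`/`--model_dp` flags; only `data/pubmed_abstract` is named in this README, and the other directory names are hypothetical.
```
import subprocess

# Illustrative driver (assumption): train one writing model per pair type.
# Only data/pubmed_abstract appears in this README; the other paths are hypothetical.
stages = [
    ("data/pubmed_abstract",   "abstract_model/"),    # title -> abstract
    ("data/pubmed_conclusion", "conclusion_model/"),  # abstract -> conclusion & future work
    ("data/pubmed_title",      "title_model/"),       # conclusion & future work -> title
]

for data_path, model_dp in stages:
    subprocess.run(["python", "train.py", "--data_path", data_path,
                    "--model_dp", model_dp], check=True)
```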
To resume training, run the following command with the previous model path after `--model`:
```
python train.py --data_path data/pubmed_abstract --cont --model abstract_model/memory/best_dev_model.pth.tar
```
For other options, please check the code.
#### Test
Put the trained model path after `--model`. `test.py` will report the score for the test set.
```
python test.py --data_path data/pubmed_abstract --model abstract_model/memory/best_dev_model.pth.tar
```
#### Predict an instance
Put the trained model path after `--model`. `input.py` will generate a prediction for a customized input.
```
python input.py --data_path data/pubmed_abstract --model abstract_model/memory/best_dev_model.pth.tar
```
## Citation
```
@inproceedings{wang-etal-2019-paperrobot,
title = "{P}aper{R}obot: Incremental Draft Generation of Scientific Ideas",
author = "Wang, Qingyun and
Huang, Lifu and
Jiang, Zhiying and
Knight, Kevin and
Ji, Heng and
Bansal, Mohit and
Luan, Yi",
booktitle = "Proceedings of the 57th Conference of the Association for Computational Linguistics",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1191",
pages = "1980--1991"
}
```