DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue
====================================

![alt text](image/visual_result.png)

Example results from the VisDial v1.0 validation dataset.

This is a PyTorch implementation for [DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue, AAAI2020](https://arxiv.org/abs/1911.07251).

* [Requirements](#requirements)
* [Data](#data)
* [Training](#training)
* [Evaluation](#evaluation)
* [Acknowledgements](#acknowledgements)

If you use this code in your research, please consider citing:

```text
@inproceedings{jiang2019dualvd,
  title     = {DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue},
  author    = {Xiaoze, Jiang and Jing, Yu and Zengchang, Qin and Yingying, Zhuang and Xingxing, Zhang and Yue, Hu and Qi, Wu},
  year      = {2020},
  booktitle = {AAAI}
}
```

Requirements
----------------------

This code is implemented using PyTorch v1.0 and provides out-of-the-box support for CUDA 9 and cuDNN 7.

```sh
conda create -n visdialch python=3.6
conda activate visdialch  # activate the environment and install all dependencies
cd DualVD/
pip install -r requirements.txt
```

Data
----------------------

1. Download the VisDial v1.0 dialog JSON files and images from [here][1].
2. Download the word counts file for the VisDial v1.0 train split from [here][2]. It is used to build the vocabulary.
3. Extract image features with Faster R-CNN, using the code from [here][3].
4. Extract visual relation embeddings with Large-Scale-VRD, using the code from [here][4].
5. Extract local captions with DenseCap, using the code from [here][5].
6. Generate ELMo word vectors, following [here][6].
7. Download pre-trained GloVe word vectors from [here][7].

Training
--------

Train the DualVD model as:

```sh
python train.py --config-yml configs/lf_disc_faster_rcnn_x101_bs32.yml --gpu-ids 0 1 # provide more ids for multi-GPU execution, other args...
```

The code has an `--overfit` flag, which can be useful for rapid debugging: it takes a batch of 5 examples and overfits the model on them.

### Saving model checkpoints

This script saves model checkpoints after every epoch at the path specified by `--save-dirpath`. Refer to [visdialch/utils/checkpointing.py][8] for details on how checkpointing is managed.

### Logging

Use [Tensorboard][9] for logging training progress. Recommended: execute `tensorboard --logdir /path/to/save_dir --port 8008` and visit `localhost:8008` in the browser.

Evaluation
----------

Evaluation of a trained model checkpoint can be done as follows:

```sh
python evaluate.py --config-yml /path/to/config.yml --load-pthpath /path/to/checkpoint.pth --split val --gpu-ids 0
```

Acknowledgements
----------------

* This code began with [batra-mlp-lab/visdial-challenge-starter-pytorch][10]. We thank the developers for doing most of the heavy lifting.

[1]: https://visualdialog.org/data
[2]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/visdial_1.0_word_counts_train.json
[3]: https://github.com/peteanderson80/bottom-up-attention
[4]: https://github.com/jz462/Large-Scale-VRD.pytorch
[5]: https://github.com/jcjohnson/densecap
[6]: https://allennlp.org/elmo
[7]: https://github.com/stanfordnlp/GloVe
[8]: https://github.com/JXZe/DualVD/blob/master/visdialch/utils/checkpointing.py
[9]: https://www.github.com/lanpa/tensorboardX
[10]: https://github.com/batra-mlp-lab/visdial-challenge-starter-pytorch
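A quick way to sanity-check a saved checkpoint before passing it to `evaluate.py` is to open it directly with PyTorch. The snippet below is a minimal sketch, not part of this repository: the checkpoint path is a placeholder, and the assumption that the file is a dictionary containing a `model` state dict may not match exactly what `visdialch/utils/checkpointing.py` writes, so adjust the keys to your checkpoint.

```python
# Minimal sketch for inspecting a checkpoint produced by train.py.
# The path and the "model" key are assumptions; adapt them to what
# checkpointing.py actually saves.
import torch

ckpt = torch.load("/path/to/checkpoint.pth", map_location="cpu")

if isinstance(ckpt, dict):
    # List the top-level entries stored in the checkpoint.
    print("Top-level entries:", list(ckpt.keys()))
    # If a model state dict is stored under a "model" key (an assumption),
    # print a few parameter names and shapes as a quick sanity check.
    state = ckpt.get("model")
    if isinstance(state, dict):
        for name, tensor in list(state.items())[:5]:
            print(name, tuple(tensor.shape))
else:
    print("Loaded object of type:", type(ckpt))
```

If the printed parameter names look like the modules you trained, the checkpoint can be passed to `evaluate.py` via `--load-pthpath` as shown above.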