# qa_task

**Repository Path**: eshijia/qa_task

## Basic Information

- **Project Name**: qa_task
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2016-06-23
- **Last Updated**: 2020-12-18

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# QA-Task

This is some code for doing language modeling with Keras (forked from [here](https://github.com/codekansas/keras-language-modeling)). I will update this repo when I have more experiment results on QA tasks. For now, it contains code for question-answering tasks based on [codekansas](http://benjaminbolte.com/blog/2016/keras-language-modeling.html). Thanks for his efforts.

## Some Basic File Descriptions

- `attention_lstm.py`: Attentional LSTM, based on one of the papers referenced in codekansas's blog post and others. One application used it for [image captioning](http://arxiv.org/pdf/1502.03044.pdf). It is initialized with an attention vector which provides the attention component for the neural network.
- `keras_model.py`: The `LanguageModel` class uses the `config` settings to generate a training model and a testing model. The model can be trained by passing a question vector, a ground-truth answer vector, and a bad answer vector to `fit`. Then `predict` calculates the similarity between a question and an answer. Override the `build` method with whatever language model you want to get a trainable model. Examples are provided at the bottom, including the `EmbeddingModel`, `ConvolutionModel`, and `AttentionModel`.
- `xxx_embeddings.py`: A word2vec layer that uses the embeddings generated by Gensim's word2vec model to provide vectors in place of the Keras `Embedding` layer, which can help convergence since fewer parameters need to be learned. *xxx* denotes which dataset the script targets. Note that this requires generating a separate file with the word2vec weights, so it doesn't fit very nicely into the Keras architecture.
- `results.txt`: Results I have obtained with some of the models (`nlpcc_dbqa_models/ever_test/`).

## How to Make the Code Work

Before you work with the code, you should install [Keras](http://keras.io). The code relies on this Keras [pull request](https://github.com/fchollet/keras/pull/2413), which has not been merged into the master branch of Keras. You can therefore set up the running environment with the following steps:

```
mkdir virtual_env
cd virtual_env
virtualenv keras_new          # create a new virtual environment under virtual_env
source keras_new/bin/activate # activate the virtual environment
# Note: on Windows, do not use `source`; run keras_new\Scripts\activate directly
pip install keras             # install the official Keras to pull in the other required dependencies
pip uninstall keras
cd ..
git clone https://github.com/eshijia/keras
cd keras
python setup.py install       # install the customized Keras
pip install h5py              # install this module for loading and saving trained models
```

Once the virtual environment is activated, every `pip install` command (do not use `sudo`) installs packages into the virtual environment without touching the system Python. To leave the virtual environment, run `deactivate` while it is active.

Currently, there is one program you can run for training:

- `nlpcc_qa_eval.py`: Evaluation framework for the NLPCC-QA dataset. To get this working, clone the data repository and set the `NLPCC_QA` environment variable to the cloned repository. To run this code, first activate the virtualenv with the customized Keras, check `conf` (verify some parameters), and then run `python nlpcc_qa_eval.py`.
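Note that `nlpcc_qa_eval.py` expects the word2vec embeddings file to exist (see the note below). As a rough idea of what an `xxx_embeddings.py` script does, here is a minimal sketch assuming the pre-1.0 gensim API of that era (`size=`) and a hypothetical corpus file; it is not the actual contents of `nlpcc_dbqa_embeddings.py`:

```python
# Hypothetical sketch: train a 100-dimensional word2vec model with gensim and
# save the weights to a file that the Keras models can load in place of a
# randomly initialized Embedding layer. Corpus handling here is illustrative;
# the real script reads the NLPCC-DBQA data.
from gensim.models import Word2Vec

class LineCorpus(object):
    """Streams one whitespace-tokenized sentence per line of a text file."""
    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path) as f:
            for line in f:
                yield line.split()

sentences = LineCorpus('nlpcc_dbqa_corpus.txt')  # hypothetical file name
model = Word2Vec(sentences, size=100, min_count=1, workers=4)
model.save('nlpcc_dbqa_word2vec_100_dim.embeddings')
```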
After training, you can comment out the `evaluator.train(model)` line and uncomment the last two lines to evaluate the training result (a sketch of this toggle appears at the end of this README). Because the word embedding file of the dataset is too large for GitHub, you should first run `nlpcc_dbqa_embeddings.py` to generate the `nlpcc_dbqa_word2vec_100_dim.embeddings` file.

### Data

- L6 dataset from [Yahoo Webscope](http://webscope.sandbox.yahoo.com/)
- [InsuranceQA data](https://github.com/shuzi/insuranceQA)
  - [Pythonic version](https://github.com/codekansas/insurance_qa_python)
- [NLPCC-Task Data](http://pan.baidu.com/s/1pLTCWOj)
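As a concrete picture of the train/evaluate toggle mentioned above, here is a hypothetical sketch of the last lines of `nlpcc_qa_eval.py`. The `Evaluator` class and its method names are assumptions drawn from the prose description, not verified against the actual script; check the real file before editing it:

```python
# Hypothetical sketch of the bottom of nlpcc_qa_eval.py, illustrating the
# train/evaluate toggle. `conf`, `Evaluator`, and the model classes are
# assumed to be defined earlier in the actual script.
if __name__ == '__main__':
    evaluator = Evaluator(conf)   # assumed: built from the `conf` settings
    model = AttentionModel(conf)  # or EmbeddingModel / ConvolutionModel

    evaluator.train(model)        # comment this out after training ...

    # ... and uncomment the last two lines to evaluate the trained model:
    # evaluator.load_epoch(model, 'best')  # assumed loader for saved weights
    # evaluator.get_score(model)           # assumed evaluation entry point
```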