# qa_task

**Repository Path**: eshijia/qa_task

## Basic Information

- **Project Name**: qa_task
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2016-06-23
- **Last Updated**: 2020-12-18

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# QA-Task

This is some code for doing language modeling with Keras (forked from [here](https://github.com/codekansas/keras-language-modeling)). I will update this repo when I have more experiment results on QA tasks. For now, it contains code for question-answering tasks based on [codekansas](http://benjaminbolte.com/blog/2016/keras-language-modeling.html). Thanks for his efforts.

## Some Basic File Descriptions

- `attention_lstm.py`: Attentional LSTM, based on one of the papers referenced in codekansas's blog post and others. One application used it for [image captioning](http://arxiv.org/pdf/1502.03044.pdf). It is initialized with an attention vector which provides the attention component for the neural network.
- `keras_model.py`: The `LanguageModel` class uses the `config` settings to generate a training model and a testing model. The model can be trained by passing a question vector, a ground-truth answer vector, and a bad answer vector to `fit`. Then `predict` calculates the similarity between a question and an answer. Override the `build` method with whatever language model you want to get a trainable model. Examples are provided at the bottom, including the `EmbeddingModel`, `ConvolutionModel`, and `AttentionModel`.
- `xxx_embeddings.py`: A word2vec layer that uses the embeddings generated by Gensim's word2vec model to provide vectors in place of the Keras `Embedding` layer, which can help convergence since fewer parameters need to be learned. *xxx* denotes which dataset the script targets. Note that this requires generating a separate file with the word2vec weights, so it doesn't fit very nicely into the Keras architecture.
- `results.txt`: Results I have obtained with some of the models (`nlpcc_dbqa_models/ever_test/`).

## How to Make the Code Work

Before you work with the code, you should install [Keras](http://keras.io). The code relies on this Keras [pull request](https://github.com/fchollet/keras/pull/2413), which has not been merged into the master branch of Keras. You can therefore set up the running environment with the following steps:

```
mkdir virtual_env
cd virtual_env
virtualenv keras_new          # create a new virtual environment under virtual_env
source keras_new/bin/activate # activate the virtual environment
# Note: on Windows, do not use `source`; run keras_new\Scripts\activate directly
pip install keras             # install the official Keras to pull in the other required dependencies
pip uninstall keras
cd ..
git clone https://github.com/eshijia/keras
cd keras
python setup.py install       # install the customized Keras
pip install h5py              # install this module for loading and saving trained models
```

Once the virtual environment is activated, every `pip install` command (do not use `sudo`) installs packages into the virtual environment without touching the system Python. To leave the virtual environment, run `deactivate` while it is active.

Currently, there is one program you can run for training:

- `nlpcc_qa_eval.py`: Evaluation framework for the NLPCC-QA dataset. To get this working, clone the data repository and set the `NLPCC_QA` environment variable to the cloned repository. To run this code, first activate the virtualenv with the customized Keras, check `conf` (verify some parameters), and then run `python nlpcc_qa_eval.py`.
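Note that `nlpcc_qa_eval.py` expects the word2vec embeddings file to exist (see the note below). As a rough idea of what an `xxx_embeddings.py` script does, here is a minimal sketch assuming the pre-1.0 gensim API of that era (`size=`) and a hypothetical corpus file; it is not the actual contents of `nlpcc_dbqa_embeddings.py`:

```python
# Hypothetical sketch: train a 100-dimensional word2vec model with gensim and
# save the weights to a file that the Keras models can load in place of a
# randomly initialized Embedding layer. Corpus handling here is illustrative;
# the real script reads the NLPCC-DBQA data.
from gensim.models import Word2Vec

class LineCorpus(object):
    """Streams one whitespace-tokenized sentence per line of a text file."""
    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path) as f:
            for line in f:
                yield line.split()

sentences = LineCorpus('nlpcc_dbqa_corpus.txt')  # hypothetical file name
model = Word2Vec(sentences, size=100, min_count=1, workers=4)
model.save('nlpcc_dbqa_word2vec_100_dim.embeddings')
```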
After training, you can comment out the `evaluator.train(model)` line and uncomment the last two lines to evaluate the training result (a sketch of this toggle appears at the end of this README). Because the word embedding file of the dataset is too large for GitHub, you should first run `nlpcc_dbqa_embeddings.py` to generate the `nlpcc_dbqa_word2vec_100_dim.embeddings` file.

### Data

- L6 dataset from [Yahoo Webscope](http://webscope.sandbox.yahoo.com/)
- [InsuranceQA data](https://github.com/shuzi/insuranceQA)
  - [Pythonic version](https://github.com/codekansas/insurance_qa_python)
- [NLPCC-Task Data](http://pan.baidu.com/s/1pLTCWOj)
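As a concrete picture of the train/evaluate toggle mentioned above, here is a hypothetical sketch of the last lines of `nlpcc_qa_eval.py`. The `Evaluator` class and its method names are assumptions drawn from the prose description, not verified against the actual script; check the real file before editing it:

```python
# Hypothetical sketch of the bottom of nlpcc_qa_eval.py, illustrating the
# train/evaluate toggle. `conf`, `Evaluator`, and the model classes are
# assumed to be defined earlier in the actual script.
if __name__ == '__main__':
    evaluator = Evaluator(conf)   # assumed: built from the `conf` settings
    model = AttentionModel(conf)  # or EmbeddingModel / ConvolutionModel

    evaluator.train(model)        # comment this out after training ...

    # ... and uncomment the last two lines to evaluate the trained model:
    # evaluator.load_epoch(model, 'best')  # assumed loader for saved weights
    # evaluator.get_score(model)           # assumed evaluation entry point
```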