# ATEC_Task_1

**Repository Path**: eshijia/ATEC_Task_1

## Basic Information

- **Project Name**: ATEC_Task_1
- **Description**: No description available
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2018-05-12
- **Last Updated**: 2020-12-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ATEC Task 1

## Usage

- Preparation: run `mkdir datasets`, then copy the training set file into the `datasets` folder.
- Evaluate model performance: `python run_train_and_evaluate.py`
- Generate the submission file: `bash submit.sh`

## Experiment Log

### Partial Data

| Model | Precision | Recall | F1 | Accuracy | AUC | Comment | Online F1 |
|:------------------:|:------:|:------:|:------:|:--------:|:------:|----------------|:---------:|
| Siamese CNN | 0.5287 | 0.4578 | 0.4908 | 0.7935 | 0.6723 | word embedding | |
| Siamese CNN | 0.5347 | 0.5322 | 0.5334 | 0.7977 | 0.7018 | char embedding | |
| Siamese BiLSTM | 0.5253 | 0.5707 | 0.5471 | 0.7949 | 0.7138 | word embedding | |
| Siamese BiLSTM | 0.5833 | 0.6041 | 0.5935 | 0.8202 | 0.7421 | char embedding | 0.7679 |
| Siamese CNN-BiLSTM | 0.5356 | 0.5058 | 0.5203 | 0.7973 | 0.6920 | word embedding | |
| Siamese CNN-BiLSTM | 0.5060 | 0.6433 | 0.5664 | 0.7860 | 0.7345 | char embedding | |
| Siamese Category | 0.3405 | 0.7287 | 0.4641 | 0.6344 | 0.6685 | word embedding | 0.6023 |
| Siamese Category | 0.3929 | 0.7275 | 0.5103 | 0.6967 | 0.7077 | char embedding | 0.6721 |

### All Data (0.2 dev)

| Model | Precision | Recall | F1 | Accuracy | AUC | Comment | Online F1 |
|:------------------:|:------:|:------:|:------:|:--------:|:------:|---------------------|:---------:|
| Siamese CNN | 0.5102 | 0.4300 | 0.4667 | 0.8208 | 0.6690 | word embedding | |
| Siamese CNN | 0.5202 | 0.4367 | 0.4783 | 0.8238 | 0.6734 | char embedding | |
| Siamese CNN | 0.5627 | 0.4335 | 0.4897 | 0.8353 | 0.6792 | mixed embedding | |
| Siamese BiLSTM | 0.5250 | 0.5245 | 0.5248 | 0.8268 | 0.7093 | word embedding | 0.6003 |
| Siamese BiLSTM | 0.5698 | 0.4841 | 0.5234 | 0.8393 | 0.7013 | char embedding | 0.5829 |
| Siamese BiLSTM | 0.5353 | 0.5149 | 0.5249 | 0.8301 | 0.7076 | mixed embedding | 0.5936 |
| Siamese CNN-BiLSTM | 0.4487 | 0.5135 | 0.4789 | 0.7963 | 0.6864 | word embedding | |
| Siamese CNN-BiLSTM | 0.4487 | 0.5518 | 0.4950 | 0.7947 | 0.7003 | char embedding | |
| Siamese CNN-BiLSTM | 0.5368 | 0.5050 | 0.5204 | 0.8303 | 0.7039 | mixed embedding | 0.5901 |
| Siamese Category | 0.4521 | 0.4761 | 0.4638 | 0.7993 | 0.6737 | word embedding | |
| Siamese Category | 0.4527 | 0.5111 | 0.4801 | 0.7982 | 0.6867 | char embedding | |
| Siamese Category | 0.4145 | 0.5223 | 0.4622 | 0.7784 | 0.6789 | mixed embedding | |
| Siamese BiLSTM | 0.4785 | 0.6280 | 0.5432 | 0.8074 | 0.7377 | char+dropout(0.5) | |
| Siamese BiLSTM | 0.5362 | 0.5274 | 0.5318 | 0.8306 | 0.7128 | char+1:4.48 | 0.5994 |
| Siamese BiLSTM | 0.3999 | 0.7137 | 0.5126 | 0.7525 | 0.7374 | char+1:4.48+dropout | 0.5893 |
| ensemble | 0.5154 | 0.5451 | 0.5298 | 0.8236 | 0.7154 | CNN word+char | |
| ensemble | 0.4771 | 0.5534 | 0.5237 | 0.8165 | 0.7143 | BiLSTM word+char | |
| Siamese BiLSTM (0.65) | 0.5073 | 0.5601 | 0.5324 | 0.8206 | 0.7194 | char embedding attention | 0.6015 |
| Siamese BiLSTMAtt | 0.5158 | 0.6599 | 0.5791 | 0.8250 | 0.7609 | enhanced LSTM attention | 0.6608 |
| Siamese BiLSTMAtt | 0.5240 | 0.6695 | 0.5879 | 0.8288 | 0.7669 | enhanced LSTM attention | 0.6712 |
| Siamese BiLSTMAtt | 0.5185 | 0.6928 | 0.5931 | 0.8267 | 0.7747 | dropout(0.2) after embedding + l2(0.01) on LSTM, batch size 128 | 0.6746 |
| Siamese BiLSTMAtt | 0.5356 | 0.6569 | 0.5901 | 0.8336 | 0.7650 | dropout(0.2) after embedding + l2(0.01) on LSTM, batch size 64 | 0.6733 |

### All Data (0.1 dev)

| Model | Precision | Recall | F1 | Accuracy | AUC | Comment | Online F1 |
|:------------------:|:------:|:------:|:------:|:--------:|:------:|---------------------|:---------:|
| Siamese BiLSTMAtt | 0.5318 | 0.6972 | 0.6034 | 0.8328 | 0.7801 | dropout(0.2) + l2(0.01) | |

- **Evaluation results of CNN siamese word model**: Precision 0.504029758215, Recall 0.475438596491, F1 0.489316882335, Accuracy 0.784371029225
- **Evaluation results of CNN siamese char model**: Precision 0.570206699929, Recall 0.46783625731, F1 0.51397365885, Accuracy 0.807750952986
- **Evaluation results of BiLSTM siamese word model**: Precision 0.549507817024, Recall 0.554970760234, F1 0.552225778295, Accuracy 0.804447268107
- **Evaluation results of BiLSTM siamese char model (0.5)**: Precision 0.624883936862, Recall 0.393567251462, F1 0.482956584141, Accuracy 0.816899618806
- **Evaluation results of CNN-BiLSTM siamese word model**: Precision 0.544910179641, Recall 0.266081871345, F1 0.357563850688, Accuracy 0.792249047014
- **Evaluation results of CNN-BiLSTM siamese char model**: Precision 0.621247113164, Recall 0.314619883041, F1 0.417701863354, Accuracy 0.809402795426
- **Evaluation results of siamese categorical word model**: Precision 0.583850931677, Recall 0.0549707602339, F1 0.100481026189, Accuracy 0.786149936468
- **Evaluation results of siamese categorical char model (< 10 epochs)**: Precision 0.384869507763, Recall 0.681286549708, F1 0.491872493139, Accuracy 0.69415501906
- **Evaluation results of siamese categorical char model (30 epochs)**: Precision 0.419786096257, Recall 0.550877192982, F1 0.476479514416, Accuracy 0.736975857687
- **Evaluation results of BiLSTM siamese char model (without pre-training)**: Precision 0.519367157665, Recall 0.556725146199, F1 0.537397685577, Accuracy 0.791740787802
- **Evaluation results of BiLSTM siamese char model**: Precision 0.602150537634, Recall 0.589473684211, F1 0.595744680851, Accuracy 0.826175349428
- **Evaluation results of BiLSTM siamese char model (class weights)**: Precision 0.542815674891, Recall 0.656140350877, F1 0.594122319301, Accuracy 0.805209656925
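The Precision, Recall, F1 and Accuracy figures in this log are standard binary-classification metrics. The sketch below shows how they follow from true/false positive counts, assuming label `1` marks a matching sentence pair and that the metrics are reported for that positive class. The helper names are hypothetical and are not taken from `run_train_and_evaluate.py`.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted match, is a match
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted match, is not a match
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed match
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_from_pr(precision, recall),
        "accuracy": correct / len(pairs),
    }


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for the "CNN siamese word model" entry from its precision and recall.
print(round(f1_from_pr(0.504029758215, 0.475438596491), 6))  # 0.489317
```

Because F1 depends only on precision and recall, `f1_from_pr` doubles as a quick consistency check for any row of the tables.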
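The AUC column is the area under the ROC curve. A minimal pure-Python sketch using the rank (Mann-Whitney) formulation: AUC is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name is hypothetical; on a full dev set, `sklearn.metrics.roc_auc_score` computes the same quantity without the quadratic pair loop used here.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. This is O(n_pos * n_neg), fine for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Probabilities from a hypothetical model on four dev pairs.
print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))  # 0.75
```

If `scores` are already thresholded 0/1 predictions rather than probabilities, the same formula reduces to (recall + specificity) / 2.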
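Several entries mention `1:4.48` or `class weights`. The README does not spell out the weighting scheme; one common choice, assumed here, is to weight each class by the majority-class count divided by its own count, so a 4.48:1 negative-to-positive imbalance gives the positive class a weight of 4.48. The function name and the label convention (`0` = non-match, `1` = match) are assumptions.

```python
from collections import Counter
from typing import Dict, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Map each class to majority_count / class_count.

    After weighting, every class contributes the same total weight
    (equal to the majority-class count) to the loss.
    """
    counts = Counter(labels)
    majority = max(counts.values())
    return {label: majority / count for label, count in sorted(counts.items())}


# Hypothetical label counts with a 4.48:1 negative-to-positive ratio.
labels = [0] * 448 + [1] * 100
print(inverse_frequency_weights(labels))  # {0: 1.0, 1: 4.48}
```

If the models are built with Keras, a dict of this shape can be passed as the `class_weight` argument of `Model.fit`; whether this repository does so is not stated.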
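Some entries carry a number in parentheses, such as `Siamese BiLSTM (0.65)` and `BiLSTM siamese char model (0.5)`. If these are decision thresholds on the predicted match probability (an assumption; the README does not say), the threshold can be tuned on the dev set to maximize F1. A sketch with hypothetical helper names:

```python
from typing import Sequence, Tuple


def f1_at_threshold(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """F1 of the positive class when predicting 1 for prob >= threshold."""
    tp = fp = fn = 0
    for t, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 1:
            fn += 1
    # Equivalent to 2PR / (P + R), written in terms of counts.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(y_true: Sequence[int], probs: Sequence[float],
                   candidates: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, F1) for the candidate with the highest dev-set F1."""
    # max() keeps the first candidate among ties.
    return max(((th, f1_at_threshold(y_true, probs, th)) for th in candidates),
               key=lambda pair: pair[1])


y_dev = [1, 1, 0, 0, 1, 0]
p_dev = [0.9, 0.6, 0.55, 0.2, 0.45, 0.7]
print(best_threshold(y_dev, p_dev, [0.4, 0.5, 0.65]))  # (0.4, 0.75)
```

Tuning the threshold on the same dev split that is used for reporting makes the reported F1 optimistic; a separate validation split avoids that.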
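The two `ensemble` rows combine a word-embedding model with a char-embedding model. The README does not say how the outputs are merged; a simple option, assumed here, is soft voting: average each model's predicted match probability per example, then apply a single threshold. The function name is hypothetical.

```python
from typing import List, Sequence


def average_ensemble(prob_lists: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Soft voting: mean of per-model probabilities, then threshold to 0/1."""
    if not prob_lists:
        raise ValueError("need at least one model's predictions")
    n = len(prob_lists[0])
    if any(len(p) != n for p in prob_lists):
        raise ValueError("all models must score the same examples")
    # zip(*...) groups the scores that different models gave to the same example.
    averaged = [sum(col) / len(prob_lists) for col in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


word_probs = [0.9, 0.2, 0.6]  # hypothetical word-model outputs
char_probs = [0.7, 0.4, 0.3]  # hypothetical char-model outputs
print(average_ensemble([word_probs, char_probs]))  # [1, 0, 0]
```

Averaging probabilities rather than hard votes lets a confident model outweigh an uncertain one, as in the third example above.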