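Detailed notes on "Introduction to Natural Language Processing" (《自然语言处理入门》), the new book by the author of HanLP. A conscientious piece of work: rather than a dry list of formulas, the book explains its algorithms and models in plain, accessible language. Starting from basic concepts, it works step by step through the algorithmic principles and engineering implementations behind several popular problems: Chinese word segmentation, part-of-speech tagging, named entity recognition, information extraction, text clustering, text classification, and syntactic parsing.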
QUAC ("quantitative analysis of chatter" or any related acronym you like) is a package for acquiring and analyzing social Internet content. Docs are online at http://reidpr.github.io/quac.
Utilities, baselines, statistics, and descriptions related to the MS MARCO dataset.
MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension and question answering.
Analysis of the MS MARCO leaderboard for the machine reading comprehension task.
This is an updated version of the dataset for Chinese community medical question answering.