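# gans-jax

**Repository Path**: fmscole/gans-jax

## Basic Information

- **Project Name**: gans-jax
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Unlicense
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-05-29
- **Last Updated**: 2025-09-05

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

The following are commonly used English-Chinese parallel corpora, suitable for machine translation, NLP research, or model training:

---

### 1. Large-scale public datasets

| Name | Description | Download |
| --- | --- | --- |
| WMT (Workshop on Machine Translation) | Widely used benchmark data from the international machine translation workshop, including Chinese-English news, Wikipedia, and other text (e.g., WMT17 to WMT23). | [WMT website](https://www.statmt.org/wmt23/) |
| OpenSubtitles | Aligned Chinese-English film and TV subtitles; large, but may contain noise. | [OpenSubtitles2018](http://opus.nlpl.eu/OpenSubtitles-v2018.php) |
| CCAligned | Chinese-English parallel text from web pages crawled by CommonCrawl, covering news, blogs, and more. | [CCAligned](https://opus.nlpl.eu/CCAligned.php) |
| UN Corpus | Chinese-English official United Nations documents (formal text, high quality). | [UN Corpus](https://conferences.unite.un.org/UNCorpus) |

---

### 2. Datasets curated by the Chinese community

| Name | Description | Download |
| --- | --- | --- |
| AI Challenger | English-Chinese translation dataset from the 2017 AI Challenger competition (10 million sentence pairs, broad domain coverage). | [AI Challenger](https://github.com/AIChallenger/AI_Challenger_2017) |
| CWMT (China Workshop on Machine Translation) | Academic evaluation data released by the Chinese Information Processing Society of China (news, patents, etc.). | [CWMT2017](http://nlp.nju.edu.cn/cwmt-wmt/) |
| TED Talks | Aligned Chinese-English subtitles of TED talks (spoken style, suited to conversational translation). | [TED2020](https://opus.nlpl.eu/TED2020.php) |

---

### 3. Preprocessed open-source data

- Hugging Face Datasets: load Chinese-English parallel corpora directly (e.g., `wmt17` with the `zh-en` configuration, or `opus100`):

  ```python
  from datasets import load_dataset

  # WMT14 has no Chinese-English pair; WMT17 provides a "zh-en" configuration.
  dataset = load_dataset("wmt/wmt17", "zh-en")
  ```

- Opus100: a curated set of 1 million Chinese-English sentence pairs of relatively high quality: [Opus100](https://opus.nlpl.eu/opus-100.php)

---

### 4. Domain-specific data

| Domain | Example dataset |
| --- | --- |
| Medical | [MedTrans](https://github.com/alibaba-research/Chinese-Medical-Translation) (Chinese-English medical text published by Alibaba) |
| Legal | [Chinese-English Parallel Corpus of Chinese Laws and Regulations](http://www.pkucorpus.com/) (application required) |
| E-commerce | [Amazon Review Multilingual](https://registry.opendata.aws/amazon-reviews-ml/) (product reviews in several languages, including English and Chinese; the reviews are not sentence-aligned translations) |

---

### Notes

1. Data cleaning: most public datasets need denoising (e.g., length filtering, language detection).
2. Copyright: some data (e.g., OpenSubtitles) is distributed under license terms, such as CC licenses, that must be followed.
3. Tokenization: `jieba` or `pkuseg` is recommended for Chinese, and `spaCy` for English.

For finer-grained domain data or preprocessing scripts, please specify the domain you need.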
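The cleaning step mentioned in the notes (length filtering and language detection) can be sketched roughly as follows. This is a minimal illustration, not a script from this repository: the function names `cjk_ratio`, `keep_pair`, and `clean_pairs` and every threshold are assumptions made for the example, and the Unicode-range check is only a crude stand-in for a real language detector. It assumes the pairs are stored as a JSON list of `[english, chinese]` entries.

```python
import json
import re

# Matches characters in the basic CJK Unified Ideographs block.
# Assumption: this is a rough proxy for "is Chinese", not real language detection.
CJK = re.compile("[\u4e00-\u9fff]")


def cjk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if CJK.match(c)) / len(chars)


def keep_pair(en: str, zh: str, max_len: int = 200,
              min_ratio: float = 0.5, max_ratio: float = 4.0) -> bool:
    """Decide whether an (English, Chinese) pair passes the illustrative filters."""
    en, zh = en.strip(), zh.strip()
    en_len = len(en.split())  # English length in whitespace-separated words
    zh_len = len(zh)          # Chinese length in characters
    # Length filter: drop empty or overly long sides.
    if en_len == 0 or zh_len == 0 or en_len > max_len or zh_len > max_len:
        return False
    # Length-ratio filter: Chinese characters per English word (thresholds are guesses).
    ratio = zh_len / en_len
    if ratio < min_ratio or ratio > max_ratio:
        return False
    # Script check: the English side should be mostly non-CJK, the Chinese side mostly CJK.
    return cjk_ratio(en) < 0.1 and cjk_ratio(zh) > 0.5


def clean_pairs(pairs):
    """Return the stripped (en, zh) tuples that pass keep_pair."""
    return [(en.strip(), zh.strip()) for en, zh in pairs if keep_pair(en, zh)]


# Small inline sample in the same [english, chinese] layout.
sample = json.loads(
    '[["Europe\'s Digital Reactionaries", "\u6b27\u6d32\u6570\u5b57\u53cd\u52a8\u6d3e"],'
    ' ["Hello world", ""]]'
)
print(clean_pairs(sample))
```

Filters like these only catch empty, truncated, or wrong-language sides. A pair whose two sides are well-formed but are not translations of each other would still pass, so an alignment-quality check would be needed on top for noisier sources such as subtitle data.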
[["Military leaders know this, and the threat that they will eventually push him aside will plague his presidency well into next year.", "\u5728\u62c5\u4efb\u603b\u7406\u53d6\u5f97\u5b9e\u6743\u540e\uff0c\u5979\u6700\u7ec8\u53ef\u80fd\u4f1a\u91cd\u65b0\u5ba1\u89c6\u5979\u4e0e\u7a46\u6c99\u62c9\u592b\u7684\u534f\u8bae\u3002"], ["The researchers concluded that self-reported overall health and depression improved among those who enrolled in Medicaid, and that there was an increase in the diagnosis and treatment of diabetes for this group.", "\u7814\u7a76\u8005\u7684\u7ed3\u8bba\u662f\uff0c\u5728\u53c2\u52a0\u4e86\u533b\u7597\u8865\u52a9\u7684\u4eba\u4e2d\u95f4\uff0c\u81ea\u6211\u62a5\u544a\u7684\u5065\u5eb7\u548c\u6291\u90c1\u60c5\u51b5\u6709\u6240\u597d\u8f6c\uff0c\u5e76\u4e14\u8fd9\u4e00\u7fa4\u4f53\u7684\u7cd6\u5c3f\u75c5\u8bca\u65ad\u548c\u6cbb\u7597\u6570\u91cf\u4e5f\u6709\u6240\u589e\u52a0\u3002"], ["There is a vast number of important Buddhist sites in Swat and other areas of northwest Pakistan.", "\u5728\u65af\u74e6\u7279\u6cb3\u8c37\u548c\u5df4\u57fa\u65af\u5766\u897f\u5317\u90e8\u6709\u7740\u5927\u91cf\u91cd\u8981\u7684\u4f5b\u6559\u6587\u7269\u3002"], ["But Howard Hughes\u2019s success as a film producer and airline owner made him one of the richest Americans to emerge during the first half of the twentieth century.", "\u4f46\u970d\u534e\u5fb7\u00b7\u4f11\u65af\u4f5c\u4e3a\u7535\u5f71\u5236\u7247\u4eba\u548c\u822a\u7a7a\u516c\u53f8\u8001\u677f\u7684\u6210\u529f\u4f7f\u5f97\u4ed6\u8dfb\u8eab20\u4e16\u7eaa\u524d\u534a\u53f6\u6700\u5bcc\u6709\u7684\u7f8e\u56fd\u4eba\u884c\u5217\u3002"], ["The concluding sentence of his review is widely quoted by his admirers: \u201cThat we abstract from all these stories in building our models is not because the stories are uninteresting but because they may be too interesting and thereby distract us from the pervasive market forces that should be our principal concern.\u201d", 
"\u4ed6\u5728\u6587\u7ae0\u7ed3\u5c3e\u7684\u88ab\u4ed6\u7684\u5d07\u62dc\u8005\u6240\u5e7f\u6cdb\u5f15\u7528\uff1a\u201c\u6211\u4eec\u4e4b\u6240\u4ee5\u5728\u6784\u5efa\u6211\u4eec\u7684\u6a21\u578b\u7684\u8fc7\u7a0b\u4e2d\u6392\u9664\u6389\u6240\u6709\u8fd9\u4e9b\u6545\u4e8b\uff0c\u5e76\u975e\u56e0\u4e3a\u8fd9\u4e9b\u6545\u4e8b\u662f\u65e0\u8da3\u7684\uff0c\u800c\u662f\u5b83\u4eec\u53ef\u80fd\u592a\u8fc7\u6709\u8da3\uff0c\u4ee5\u81f3\u4e8e\u8ba9\u4eba\u65e0\u6cd5\u628a\u6ce8\u610f\u529b\u96c6\u4e2d\u5728\u672c\u5e94\u6210\u4e3a\u4e3b\u8981\u5173\u6ce8\u70b9\u7684\u66f4\u5177\u666e\u904d\u6027\u7684\u5e02\u573a\u529b\u91cf\u4e0a\u3002 \u201d"], ["But the concern goes beyond Washington: many ordinary citizens in the US and elsewhere genuinely fear the consequences of a Trump administration.", "\u4f46\u62c5\u5fe7\u7edd\u4e0d\u4ec5\u9650\u4e8e\u534e\u76db\u987f\uff0c\u8bb8\u591a\u7f8e\u56fd\u548c\u5176\u4ed6\u56fd\u5bb6\u7684\u666e\u7f57\u5927\u4f17\u4e5f\u5bf9\u7279\u6717\u666e\u653f\u5e9c\u7684\u540e\u679c\u771f\u6b63\u5730\u5fe7\u5fc3\u5fe1\u5fe1\u3002"], ["The Nobel laureate economist Edmund S. Phelps has described Trump\u2019s direct interference in the corporate sector as reminiscent of corporatist Nazi Germany and Fascist Italy.", "\u8bfa\u8d1d\u5c14\u7ecf\u6d4e\u5b66\u5956\u5f97\u4e3b\u57c3\u5fb7\u8499\u5fb7\u00b7\u83f2\u5c14\u666e\u65af\uff08Edmund S. 
Phelps\uff09\u8bf4\u7279\u6717\u666e\u76f4\u63a5\u5e72\u9884\u516c\u53f8\u90e8\u95e8\u8ba9\u4eba\u60f3\u8d77\u4e86\u793e\u56e2\u4e3b\u4e49\u7684\u7eb3\u7cb9\u5fb7\u56fd\u548c\u6cd5\u897f\u65af\u4e3b\u4e49\u7684\u610f\u5927\u5229\u3002"], ["These leaders are unlikely to accept any power-sharing arrangement that includes the Taliban.", "\u8fd9\u4e9b\u9886\u5bfc\u4eba\u4e0d\u592a\u53ef\u80fd\u63a5\u53d7\u4efb\u4f55\u5305\u62ec\u5854\u5229\u73ed\u7684\u6743\u529b\u5206\u6cbb\u3002"], ["The S&P 500 price/earnings ratio is gradually climbing back to its long-term average of 16.", "\u6807\u51c6\u666e\u5c14500\u5e02\u76c8\u7387\u6b63\u9010\u6e10\u6500\u5347\u523016\u500d\u7684\u957f\u671f\u5747\u503c\u3002"], ["The euro shares important features with versions of the old gold standard, under which countries fixed their exchange rates relative to each other by setting the price at which domestic currency could be redeemed in gold.", "       \u6b27\u5143\u5236\u4e0e\u4ee5\u524d\u7684\u91d1\u672c\u4f4d\u5236\u6709\u7740\u5f88\u5927\u7684\u76f8\u4f3c\u4e4b\u5904\u3002 \u5728\u91d1\u672c\u4f4d\u5236\u4e0b\uff0c\u5404\u56fd\u5bb6\u5236\u5b9a\u51fa\u4ee5\u9ec4\u91d1\u5151\u6362\u672c\u56fd\u8d27\u5e01\u7684\u4ef7\u683c\uff0c\u4ee5\u6b64\u6765\u56fa\u5b9a\u672c\u56fd\u76f8\u5bf9\u5176\u4ed6\u56fd\u7684\u6c47\u7387\u3002"], ["Peer-to-peer lending and crowdfunding already represent new ways of matching borrowers with investors.", "P2P\u8d37\u6b3e\u548c\u96c6\u8d44\u5df2\u7ecf\u5c55\u73b0\u51fa\u8fde\u63a5\u6295\u8d44\u4eba\u548c\u501f\u6b3e\u4eba\u7684\u65b0\u9014\u5f84\u3002"], ["After the revelations, however, it was clear that the tobacco industry was a malevolent force that did not belong in the policymaking process.", "\u4f46\u6b64\u6b21\u62ab\u9732\u8bc1\u660e\u70df\u8349\u4e1a\u662f\u4e0d\u5e94\u5c5e\u4e8e\u51b3\u7b56\u8fc7\u7a0b\u7684\u90aa\u6076\u529b\u91cf\u3002"], ["India\u2019s leaders should make the same pledges, and should also join other nuclear powers in signing the 
Comprehensive Nuclear Test Ban Treaty.", "\u5370\u5ea6\u7684\u9886\u5bfc\u4eba\u672c\u5e94\u8be5\u505a\u51fa\u540c\u6837\u7684\u627f\u8bfa\uff0c\u5e76\u4e14\u8fd8\u5e94\u8be5\u548c\u5176\u4ed6\u7684\u6838\u56fd\u5bb6\u4e00\u8d77\u7b7e\u7f72\u300a\u5168\u9762\u7981\u6b62\u6838\u8bd5\u9a8c\u6761\u7ea6\u300b\u3002"], ["Europe\u2019s Digital Reactionaries", "\u6b27\u6d32\u6570\u5b57\u53cd\u52a8\u6d3e"], ["The devastation caused by Israel\u2019s periodic asymmetrical confrontations, combined with the continuing occupation of Palestinian lands and the ever-growing expansion of settlements, has fueled a growing campaign to undermine Israel\u2019s legitimacy.", "\u4ee5\u8272\u5217\u5b9a\u671f\u4e0d\u5bf9\u79f0\u51b2\u7a81\u6240\u9020\u6210\u7684\u60e8\u8c61\uff0c\u4ee5\u53ca\u5bf9\u5df4\u52d2\u65af\u5766\u571f\u5730\u7684\u957f\u671f\u5360\u9886\u548c\u4e0d\u65ad\u6269\u5f20\u7684\u5b9a\u5c45\u70b9\u5efa\u8bbe\uff0c\u6b63\u5728\u52a9\u957f\u4e00\u80a1\u4e0d\u5229\u4e8e\u4ee5\u8272\u5217\u5408\u6cd5\u6027\u7684\u8fd0\u52a8\u3002"], ["In fact, the authors may have got their analysis right, just for the wrong country.", "\u5b9e\u9645\u4e0a\uff0c\u8fd9\u4efd\u62a5\u544a\u7684\u4f5c\u8005\u53ef\u80fd\u5206\u6790\u6b63\u786e\uff0c\u4f46\u53ea\u662f\u641e\u9519\u4e86\u56fd\u5bb6\u3002"], ["Restructuring the economy is perhaps the most urgent \u2013 and most difficult \u2013 challenge facing China\u2019s leaders today.", "\u56e0\u6b64\uff0c\u91cd\u7ec4\u7ecf\u6d4e\u4e5f\u8bb8\u662f\u5f53\u4eca\u4e2d\u56fd\u9886\u5bfc\u4eba\u6240\u9762\u4e34\u7684\u6700\u7d27\u8feb\u4e5f\u662f\u6700\u8270\u96be\u7684\u6311\u6218\u3002"], ["Asia\u2019s challenges are graver than those facing Europe, which embodies comprehensive development more than any other part of the world.", "\u4e9a\u6d32\u9762\u4e34\u7684\u6311\u6218\u6bd4\u6b27\u6d32\u8fd8\u8981\u4e25\u5cfb\uff0c\u6b27\u6d32\u7684\u6574\u4f53\u53d1\u5c55\u662f\u5176\u4ed6\u4efb\u4f55\u5730\u533a\u6240\u4e0d\u80fd\u6bd4\u4f60\u7684\u3002"], 
["The best cure would be controlled higher inflation \u2013 that is, the aforementioned temporary increase in the inflation target \u2013 to erode the real value of public debt and forestall the risk of a much more damaging inflationary shock later, one in which expectations become unhinged.", "\u6700\u4f73\u89e3\u51b3\u529e\u6cd5\u662f\u6709\u63a7\u5236\u7684\u9ad8\u901a\u80c0\u2014\u2014\u5373\u4e0a\u9762\u6240\u63d0\u5230\u7684\u4e34\u65f6\u6027\u63d0\u9ad8\u901a\u80c0\u76ee\u6807\u2014\u2014\u4ee5\u51cf\u5c11\u516c\u503a\u7684\u5b9e\u9645\u4ef7\u503c\u3001\u5207\u65ad\u65e5\u540e\u53d1\u751f\u4f24\u5bb3\u66f4\u5927\u7684\u901a\u80c0\u6027\u51b2\u51fb\uff08\u6b64\u65f6\u9884\u671f\u5c06\u5b8c\u5168\u9519\u4e71\uff09\u7684\u98ce\u9669\u3002"], ["They may be comparatively few in number, but they can mobilize material resources if they do not get their way.", "\u4ed6\u4eec\u76f8\u6bd4\u4e4b\u4e0b\u53ef\u80fd\u4eba\u6570\u8f83\u5c11\uff0c\u4f46\u5374\u62e5\u6709\u5728\u4e0d\u8fbe\u76ee\u7684\u65f6\u8c03\u52a8\u7269\u8d28\u8d44\u6e90\u7684\u80fd\u529b\u3002"], ["Raqqa was in rapid decline by then, and the city\u2019s despair would intensify over the next decade.", "\u90a3\u65f6\u62c9\u5361\u5df2\u7ecf\u5728\u5feb\u901f\u8870\u843d\uff0c\u800c\u4e14\u8fd9\u5ea7\u57ce\u5e02\u4eca\u540e\u5341\u5e74\u4f1a\u8d8a\u6765\u8d8a\u9677\u5165\u7edd\u671b\u3002"], ["A strategy of isolation has resulted in a dangerous alliance of convenience among extremist forces excluded from political processes and power structures.", "\u9694\u79bb\u653f\u7b56\u4f7f\u90a3\u4e9b\u88ab\u6392\u9664\u5728\u653f\u6cbb\u8fdb\u7a0b\u548c\u6743\u529b\u7ed3\u6784\u4e4b\u5916\u7684\u6781\u7aef\u52bf\u529b\u7ec4\u6210\u4e86\u5371\u9669\u7684\u6743\u5b9c\u8054\u76df\u3002"], ["Aylwin faced one of the toughest moral choices any leader of a newly re-established democracy can confront: how far to push prosecution of those who had abducted, tortured, and killed thousands of Chileans during General Augusto 
Pinochet\u2019s dictatorship.", "\u57c3\u5c14\u6587\u9762\u4e34\u7740\u4efb\u4f55\u65b0\u5efa\u7acb\u7684\u6c11\u4e3b\u56fd\u5bb6\u7684\u9886\u5bfc\u4eba\u6240\u9762\u4e34\u7684\u6700\u8270\u96be\u7684\u9053\u5fb7\u9009\u62e9\u4e4b\u4e00\uff1a\u5bf9\u5728\u76ae\u8bfa\u5207\u7279\u5c06\u519b\u72ec\u88c1\u671f\u95f4\u7ed1\u67b6\u3001\u8650\u5f85\u548c\u6740\u5bb3\u6210\u5343\u4e0a\u4e07\u667a\u5229\u4eba\u7684\u4eba\u7684\u8d77\u8bc9\u5e94\u8be5\u8fdb\u884c\u5230\u4f55\u79cd\u7a0b\u5ea6\u3002"], ["First, there are market failures, which occur when, for example, investors display herd behavior, information asymmetries exist,