# CPT
**Repository Path**: whn09/CPT
## Basic Information
- **Project Name**: CPT
- **Description**: CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-05-27
- **Last Updated**: 2022-12-05
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# CPT
This repository contains code and checkpoints for **CPT**.
[**CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation**](https://arxiv.org/pdf/2109.05729.pdf)
Yunfan Shao, Zhichao Geng, Yitao Liu, Junqi Dai, Fei Yang, Li Zhe, Hujun Bao, Xipeng Qiu
## Introduction
Aiming to unify both NLU and NLG tasks, we propose a novel **C**hinese **P**re-trained Unbalanced **T**ransformer (**CPT**), an unbalanced Transformer encoder-decoder pre-trained jointly with MLM and DAE.
The architecture of CPT is a variant of the full Transformer and consists of three parts:
1. **Shared Encoder** (S-Enc): a Transformer encoder with fully-connected self-attention, which is designed to capture the common semantic representation for both language understanding and generation.
2. **Understanding Decoder** (U-Dec): a shallow Transformer encoder with fully-connected self-attention, which is designed for NLU tasks. The input of U-Dec is the output of S-Enc.
3. **Generation Decoder** (G-Dec): a Transformer decoder with masked self-attention, which is designed for generation tasks in an auto-regressive fashion. G-Dec uses the output of S-Enc through cross-attention.
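For intuition, the three-part layout can be sketched in plain PyTorch as below. This is an illustrative sketch only, not the actual implementation (which lives in `finetune/modeling_cpt.py`); the class name `CPTSketch`, the use of `nn.TransformerEncoder`/`nn.TransformerDecoder` modules, and the default sizes (taken from CPT-base) are assumptions made for illustration.
```python
import torch
import torch.nn as nn

class CPTSketch(nn.Module):
    """Illustrative sketch of the CPT layout; not the released implementation.
    Defaults follow CPT-base: 10-layer S-Enc, 2-layer U-Dec, 2-layer G-Dec."""

    def __init__(self, d_model=768, nhead=12, n_senc=10, n_udec=2, n_gdec=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead)
        self.s_enc = nn.TransformerEncoder(enc_layer, n_senc)  # shared encoder
        self.u_dec = nn.TransformerEncoder(enc_layer, n_udec)  # understanding decoder (full self-attention)
        self.g_dec = nn.TransformerDecoder(dec_layer, n_gdec)  # generation decoder (masked self-attention)

    def forward(self, src_emb, tgt_emb=None):
        # src_emb / tgt_emb: (seq_len, batch, d_model) token embeddings
        shared = self.s_enc(src_emb)   # common semantic representation
        nlu_out = self.u_dec(shared)   # fed to classification heads for NLU tasks
        nlg_out = None
        if tgt_emb is not None:
            L = tgt_emb.size(0)
            causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
            # G-Dec cross-attends to the shared encoder output
            nlg_out = self.g_dec(tgt_emb, shared, tgt_mask=causal)
        return nlu_out, nlg_out
```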
## Pre-Trained Models
We provide the pre-trained weights of CPT and Chinese BART together with source code; they can be used directly with Huggingface Transformers.
- **`Chinese BART-base`**: 6-layer encoder, 6-layer decoder, 12 heads, model dim 768.
- **`Chinese BART-large`**: 12-layer encoder, 12-layer decoder, 16 heads, model dim 1024.
- **`CPT-base`**: 10-layer S-Enc, 2-layer U-Dec/G-Dec, 12 heads, model dim 768.
- **`CPT-large`**: 20-layer S-Enc, 4-layer U-Dec/G-Dec, 16 heads, model dim 1024.
The pre-trained weights can be downloaded from the links below.
| Model | `MODEL_NAME`|
| --- | --- |
| **`Chinese BART-base`** | [fnlp/bart-base-chinese](https://huggingface.co/fnlp/bart-base-chinese) |
| **`Chinese BART-large`** | [fnlp/bart-large-chinese](https://huggingface.co/fnlp/bart-large-chinese) |
| **`CPT-base`** | [fnlp/cpt-base](https://huggingface.co/fnlp/cpt-base) |
| **`CPT-large`** | [fnlp/cpt-large](https://huggingface.co/fnlp/cpt-large) |
### Requirements:
- pytorch==1.8.1
- transformers==4.4.1
To use CPT, import the file `finetune/modeling_cpt.py`, which defines the CPT architecture, into your project.
Then load the pre-trained models as in the following examples, where `MODEL_NAME` is the corresponding string from the table above.
For CPT:
```python
from transformers import BertTokenizer
from modeling_cpt import CPTForConditionalGeneration
tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
model = CPTForConditionalGeneration.from_pretrained("MODEL_NAME")
print(model)
```
For Chinese BART:
```python
from transformers import BertTokenizer, BartForConditionalGeneration
tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
model = BartForConditionalGeneration.from_pretrained("MODEL_NAME")
print(model)
```
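For a quick end-to-end check, both models expose the standard Huggingface `generate()` API. The snippet below is a minimal sketch using `fnlp/cpt-base`; the example input and the generation hyper-parameters (`num_beams`, `max_length`) are illustrative assumptions, not values prescribed by this repository.
```python
from transformers import BertTokenizer
from modeling_cpt import CPTForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("fnlp/cpt-base")
model = CPTForConditionalGeneration.from_pretrained("fnlp/cpt-base")

text = "北京是[MASK]的首都。"  # example input (assumed)
inputs = tokenizer(text, return_tensors="pt")
# beam-search generation; hyper-parameters are illustrative
output_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```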
## Pre-Training
Pre-training code and examples can be found [here](pretrain/README.md).
## Fine-Tuning
Fine-tuning code and examples can be found [here](finetune/README.md).
## Deploy
### Deploy on an AWS SageMaker Endpoint
Run locally:
```shell
# make sure the trained models are in the 6 folder
cd endpoint
sh build_and_push.sh
docker run -d -p 8080:8080 cpt
```
```python
# health check: curl http://localhost:8080/ping
# invoke the local endpoint with an example news article
import requests
import json
url='http://localhost:8080/invocations'
data={"data": "《半導體》Q1展望保守,世界垂淚2019/02/11 10:28時報資訊【時報記者沈培華台北報導】世界先進 (5347) 去年營運創歷史新高,每股純益達3.72元。但對今年首季展望保守,預計營收將比上季高點減近一成。世界先進於封關前股價拉高,今早則是開平走低。世界先進於年前台股封關後舉行法說會公布財報。公司去年營運表現亮麗,營收與獲利同創歷史新高紀錄。2018年全年營收289.28億元,年增16.1%,毛利率35.2%,拉升3.2個百分點,稅後淨利61.66億元,年增36.9%,營收與獲利同創歷史新高,每股純益3.72元。董事會通過去年度擬配發現金股利3.2元。展望第一季,受到客戶進入庫存調整,公司預期,本季營收估在67億至71億元,將季減8%至13%,毛利率將約34.5%至36.5%。此外,因應客戶需求,世界先進決定斥資2.36億美元,收購格芯新加坡8吋晶圓廠。世界先進於年前宣布,將購買格芯位於新加坡Tampines的8吋晶圓3E廠房、廠務設施、機器設備及微機電(MEMS)智財權與業務,交易總金額2.36億美元,交割日訂108年12月31日。格芯晶圓3E廠現有月產能3.5萬片8吋晶圓,世界先進每年將可增加超過40萬片8吋晶圓產能,增進公司明年起業績成長動能。TOP關閉"}
data = json.dumps(data)
r = requests.post(url, data=data)
# show result
print(r.text)
```
The result looks like this:
```json
{"摘要": "2011 年 f1 周 年 展 望 保 守 保 守 世 界 第 一 (图 )"}
```
Run on a SageMaker endpoint:
```shell
endpoint_ecr_image="847380964353.dkr.ecr.us-east-2.amazonaws.com/cpt"
python create_endpoint.py \
--endpoint_ecr_image_path ${endpoint_ecr_image} \
--endpoint_name 'cpt' \
--instance_type "ml.g4dn.xlarge"
```
After the deployment finishes, the corresponding endpoint appears in the SageMaker console. You can test it with the following client code:
```python
from boto3.session import Session
import json
data={"data": "《半導體》Q1展望保守,世界垂淚2019/02/11 10:28時報資訊【時報記者沈培華台北報導】世界先進 (5347) 去年營運創歷史新高,每股純益達3.72元。但對今年首季展望保守,預計營收將比上季高點減近一成。世界先進於封關前股價拉高,今早則是開平走低。世界先進於年前台股封關後舉行法說會公布財報。公司去年營運表現亮麗,營收與獲利同創歷史新高紀錄。2018年全年營收289.28億元,年增16.1%,毛利率35.2%,拉升3.2個百分點,稅後淨利61.66億元,年增36.9%,營收與獲利同創歷史新高,每股純益3.72元。董事會通過去年度擬配發現金股利3.2元。展望第一季,受到客戶進入庫存調整,公司預期,本季營收估在67億至71億元,將季減8%至13%,毛利率將約34.5%至36.5%。此外,因應客戶需求,世界先進決定斥資2.36億美元,收購格芯新加坡8吋晶圓廠。世界先進於年前宣布,將購買格芯位於新加坡Tampines的8吋晶圓3E廠房、廠務設施、機器設備及微機電(MEMS)智財權與業務,交易總金額2.36億美元,交割日訂108年12月31日。格芯晶圓3E廠現有月產能3.5萬片8吋晶圓,世界先進每年將可增加超過40萬片8吋晶圓產能,增進公司明年起業績成長動能。TOP關閉"}
session = Session()
runtime = session.client("runtime.sagemaker")
response = runtime.invoke_endpoint(
    EndpointName='cpt',
    ContentType="application/json",
    Body=json.dumps(data),
)
result = json.loads(response["Body"].read())
print(result)
```
## Citation
```bibtex
@article{shao2021cpt,
title={CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation},
author={Yunfan Shao and Zhichao Geng and Yitao Liu and Junqi Dai and Fei Yang and Li Zhe and Hujun Bao and Xipeng Qiu},
journal={arXiv preprint arXiv:2109.05729},
year={2021}
}
```