# CiYi

**Repository Path**: visualjoyce/CiYi

## Basic Information

- **Project Name**: CiYi
- **Description**: Lexical Semantics and its Computation
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-02-02
- **Last Updated**: 2023-08-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

![image](https://user-images.githubusercontent.com/2136700/161353640-5bb7009d-5d50-4413-a752-f81fdad6a6d0.png)

# CiYi (词义)

A repo for lexical semantics

## MWE Type

## PIE Classification

```bibtex
@inproceedings{tan-jiang-2021-bert,
    title = "Does {BERT} Understand Idioms? A Probing-Based Empirical Study of {BERT} Encodings of Idioms",
    author = "Tan, Minghuan  and
      Jiang, Jing",
    booktitle = "Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)",
    month = sep,
    year = "2021",
    address = "Held Online",
    publisher = "INCOMA Ltd.",
    url = "https://aclanthology.org/2021.ranlp-main.156",
    pages = "1397--1407",
    abstract = "Understanding idioms is important in NLP. In this paper, we study to what extent pre-trained BERT model can encode the meaning of a potentially idiomatic expression (PIE) in a certain context. We make use of a few existing datasets and perform two probing tasks: PIE usage classification and idiom paraphrase identification. Our experiment results suggest that BERT indeed can separate the literal and idiomatic usages of a PIE with high accuracy. It is also able to encode the idiomatic meaning of a PIE to some extent.",
}
```

## SemEval 2022 Task 2

Multilingual Idiomaticity Detection and Sentence Embedding

### Subtask A

_Data Preprocess_

```shell
python experiments/semeval-2022_task02_idiomacity/subtask_a/create_data.py \
  --input_location ../SemEval_2022_Task2-idiomaticity/SubTaskA \
  --output_location data/annotations/semeval-2022_task02_idiomacity/subtask_a \
  --phase evaluation
```

_Train_

```shell
bash run_semeval2022_task2a.sh data
```

### Subtask B

_Data Preprocess_

```shell
python experiments/semeval-2022_task02_idiomacity/subtask_b/create_data.py \
  --input_location ../SemEval_2022_Task2-idiomaticity/SubTaskB \
  --output_location data/annotations/semeval-2022_task02_idiomacity/subtask_b \
  --sts_dataset_path stsbenchmark.tsv.gz
```

_Train_

```shell
bash run_semeval2022_task2b.sh data
```

## Acknowledgement

We recommend the following repos:

* [lexcomp](https://github.com/vered1986/lexcomp)
* [allennlp-models](https://github.com/allenai/allennlp-models)