59 Hits in 1.9 sec

OCNLI: Original Chinese Natural Language Inference [article]

Hai Hu, Kyle Richardson, Liang Xu, Lu Li, Sandra Kuebler, Lawrence S. Moss
2020 arXiv   pre-print
In this paper, we present the first large-scale NLI dataset (consisting of ~56,000 annotated sentence pairs) for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI).  ...  Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been  ...  Conclusion In this paper, we presented the Original Chinese Natural Language Inference (OCNLI) corpus, the first large-scale, non-translated NLI dataset for Chinese.  ... 
arXiv:2010.05444v1 fatcat:oxsomovwjbhoxbofugsl4hbgea
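For readers who want to inspect the dataset described above, the following minimal sketch shows one way to load it. It assumes OCNLI is available on the Hugging Face Hub as the "ocnli" configuration of the "clue" dataset collection; the hub location and field names are assumptions, not details stated in this entry.

    # Minimal sketch (assumed hosting): load OCNLI via the Hugging Face datasets library.
    from datasets import load_dataset

    # "clue"/"ocnli" is the assumed Hub location; splits are expected to be train/validation/test.
    ocnli = load_dataset("clue", "ocnli")
    example = ocnli["train"][0]

    # Each example is expected to hold a premise/hypothesis pair with a 3-way label
    # (entailment / neutral / contradiction); exact field names may differ by loader version.
    print(example.get("sentence1"), example.get("sentence2"), example.get("label"))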

OCNLI: Original Chinese Natural Language Inference

Hai Hu, Kyle Richardson, Liang Xu, Lu Li, Sandra Kübler, Lawrence Moss
2020 Findings of the Association for Computational Linguistics: EMNLP 2020   unpublished
In this paper, we present the first large-scale NLI dataset (consisting of ∼56,000 annotated sentence pairs) for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI).  ...  performance gap), making it a challenging new resource that we hope will help to accelerate progress in Chinese natural language understanding.  ...  Conclusion In this paper, we presented the Original Chinese Natural Language Inference (OCNLI) corpus, the first large-scale, non-translated NLI dataset for Chinese.  ... 
doi:10.18653/v1/2020.findings-emnlp.314 fatcat:4bu5oxroy5g33m4g6asaciyoam

CLUE: A Chinese Language Understanding Evaluation Benchmark [article]

Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong (+20 others)
2020 arXiv   pre-print
These comprehensive benchmarks have facilitated a broad range of research and applications in natural language processing (NLP).  ...  The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE, allows new NLU models to be evaluated across a diverse set of tasks.  ...  OCNLI Original Chinese Natural Language Inference (OCNLI) is collected closely following the procedures of MNLI.  ... 
arXiv:2004.05986v3 fatcat:xwmawjovnzcsjjvucr6ngiiauy

Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models [article]

Xinnian Liang, Zefan Zhou, Hui Huang, Shuangzhi Wu, Tong Xiao, Muyun Yang, Zhoujun Li, Chao Bian
2023 arXiv   pre-print
Pretrained language models (PLMs) have shown marvelous improvements across various NLP tasks.  ...  In this paper, we revisit the segmentation granularity of Chinese PLMs. We propose a mixed-granularity Chinese BERT (MigBERT) by considering both characters and words.  ...  Natural Language Inference We select two natural language inference (NLI) datasets to evaluate the inference ability of our MigBERT.  ... 
arXiv:2303.10893v2 fatcat:2jrvmjy7dvc2le27zw7e3zq2xm

FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark [article]

Liang Xu, Xiaojing Lu, Chenyang Yuan, Xuanwei Zhang, Huilin Xu, Hu Yuan, Guoao Wei, Xiang Pan, Xin Tian, Libo Qin, Hu Hai
2021 arXiv   pre-print
Pretrained Language Models (PLMs) have achieved tremendous success in natural language understanding tasks.  ...  While different learning schemes -- fine-tuning, zero-shot, and few-shot learning -- have been widely explored and compared for languages such as English, there is comparatively little work in Chinese  ...  EFL achieves quite good results for natural language inference tasks, OCNLI and BUSTM, but performs poorly on CHID, a machine reading comprehension task.  ... 
arXiv:2107.07498v2 fatcat:ljx2nma3b5aa3ix2pzyadnkgnu

Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words [article]

Zhangyin Feng, Duyu Tang, Cong Zhou, Junwei Liao, Shuangzhi Wu, Xiaocheng Feng, Bing Qin, Yunbo Cao, Shuming Shi
2022 arXiv   pre-print
Furthermore, since the pipeline is language-independent, we train WordBERT for the Chinese language and obtain significant gains on five natural language understanding datasets.  ...  Lastly, the analysis of inference speed shows that WordBERT has a time cost comparable to BERT's in natural language understanding tasks.  ...  Furthermore, we extend our approach to the Chinese language. Results on several natural language understanding tasks indicate that our model significantly outperforms BERT.  ... 
arXiv:2202.12142v1 fatcat:ui7agnwb4rbqfbjwu7qkxirpze

MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model [article]

Xin Yao, Ziqing Yang, Yiming Cui, Shijin Wang
2023 arXiv   pre-print
In this paper, we introduce MiniRBT, a small Chinese pre-trained model that aims to advance research in Chinese natural language processing.  ...  In natural language processing, pre-trained language models have become essential infrastructure.  ...  Therefore, to further advance the development of Chinese natural language processing, we propose a small Chinese pre-trained model with strong practical utility.  ... 
arXiv:2304.00717v1 fatcat:l56orltf3vgfzb7apnjwgtrfca

Investigating Prompt Learning for Chinese Few-Shot Text Classification with Pre-Trained Language Models

Chengyu Song, Taihua Shao, Kejing Lin, Dengfeng Liu, Siyuan Wang, Honghui Chen
2022 Applied Sciences  
Thus, we propose a prompt-based Chinese text classification framework that uses generated natural language sequences as hints, which can alleviate the classification bottleneck well in low-resource scenarios  ...  However, existing prompt-based methods mainly focus on English tasks, which generally apply English pretrained language models that cannot directly adapt to Chinese tasks due to structural and grammatical  ...  It is a binary classification task that aims to predict whether two sentences are semantically similar. • OCNLI stands for Original Chinese Natural Language Inference, which is collected by closely following  ... 
doi:10.3390/app122111117 fatcat:zphqj2jpg5g5hjhl26jefryno4

RoChBert: Towards Robust BERT Fine-tuning for Chinese [article]

Zihan Zhang, Jinfeng Li, Ning Shi, Bo Yuan, Xiangyu Liu, Rong Zhang, Hui Xue, Donghong Sun, Chao Zhang
2022 arXiv   pre-print
In this paper, we present RoChBERT, a framework to build more Robust BERT-based models by utilizing a more comprehensive adversarial graph to fuse Chinese phonetic and glyph features into pre-trained representations  ...  Despite the superb performance on a wide range of tasks, pre-trained language models (e.g., BERT) have been proven vulnerable to adversarial texts.  ...  In our dataset, there are 12,000/1,500/1,500 texts in the train/dev/test splits. • OCNLI: a Chinese natural language inference dataset from CLUE.  ... 
arXiv:2210.15944v1 fatcat:u74jazmoijcojjhjtwzb2iz6ve

PERT: Pre-training BERT with Permuted Language Model [article]

Yiming Cui, Ziqing Yang, Ting Liu
2022 arXiv   pre-print
Pre-trained Language Models (PLMs) have been widely used in various natural language processing (NLP) tasks, owing to their powerful text representations trained on large-scale corpora.  ...  In this paper, we propose a new PLM called PERT for natural language understanding (NLU). PERT is an auto-encoding model (like BERT) trained with Permuted Language Model (PerLM).  ...  INTRODUCTION Pre-trained Language Models (PLMs) have shown excellent performance on various natural language processing (NLP) tasks.  ... 
arXiv:2203.06906v1 fatcat:kpjmaqa7wffbjhgjkbwcmonuji

NSP-BERT: A Prompt-based Few-Shot Learner Through an Original Pre-training Task–Next Sentence Prediction [article]

Yi Sun, Yu Zheng, Chao Hao, Hangping Qiu
2022 arXiv   pre-print
Nonetheless, virtually all prompt-based methods are token-level, meaning they all utilize GPT's left-to-right language model or BERT's masked language model to perform cloze-style tasks.  ...  In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using an original BERT pre-training task abandoned by RoBERTa and other models -- Next Sentence Prediction (NSP).  ...  (Table excerpt: dataset statistics listing MNLI with 392,702 training and 9,815 evaluation examples, and MNLI-mm with 9,832 evaluation examples, both 3-class natural language inference tasks scored by accuracy.)  ... 
arXiv:2109.03564v2 fatcat:4lduqcs4x5h4llkqga32qrpe2a

LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization [article]

Weidong Guo, Mingjun Zhao, Lusheng Zhang, Di Niu, Jinwen Luo, Zhenhua Liu, Zhenyang Li, Jianbo Tang
2021 arXiv   pre-print
range of Natural Language Understanding (NLU) tasks.  ...  Extensive experiments conducted on CLUE and SuperGLUE demonstrate that our method achieves comprehensive improvements on a wide variety of NLU tasks in both Chinese and English with little extra inference  ...  Similar to most Chinese PLMs, characters are used as fine-grained tokens due to the linguistic nature of Chinese.  ... 
arXiv:2108.00801v1 fatcat:lk2lzmu4vrfadiunhizvyvdll4

UnNatural Language Inference [article]

Koustuv Sinha, Prasanna Parthasarathi, Joelle Pineau, Adina Williams
2021 arXiv   pre-print
We provide novel evidence that complicates this claim: we find that state-of-the-art Natural Language Inference (NLI) models assign the same labels to permuted examples as they do to the original, i.e.  ...  (English and Mandarin Chinese).  ...  OCNLI: Original Chinese Natural Language Inference. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3512-3526, Online.  ... 
arXiv:2101.00010v2 fatcat:ilqekmsvqfhcfd52y7lkvdlntm

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation [article]

Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, Weibao Gong, Shikun Feng, Junyuan Shang, Yanbin Zhao, Chao Pang, Jiaxiang Liu, Xuyi Chen (+17 others)
2021 arXiv   pre-print
Pre-trained language models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks.  ...  ERNIE 3.0 Titan is the largest Chinese dense pre-trained model so far. Empirical results show that the ERNIE 3.0 Titan outperforms the state-of-the-art models on 68 NLP datasets.  ...  Natural Language Inference. ERNIE 3.0 Titan is evaluated on three NLI datasets, namely OCNLI, OCNLI-FC, and CMNLI, and achieves the best performance.  ... 
arXiv:2112.12731v1 fatcat:hact2hlojrdydhxcnzozmb7kee

Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning [article]

Shaohua Wu, Xudong Zhao, Tong Yu, Rongguo Zhang, Chong Shen, Hongli Liu, Feng Li, Hong Zhu, Jiangang Luo, Liang Xu, Xuanwei Zhang
2021 arXiv   pre-print
Yuan 1.0 demonstrates a strong capacity for natural language generation, and the generated articles are difficult to distinguish from human-written ones.  ...  Recent work like GPT-3 has demonstrated excellent Zero-Shot and Few-Shot learning performance on many natural language processing (NLP) tasks by scaling up model size, dataset size and the amount of  ...  Yuan 1.0 was trained on a new Chinese dataset of 5TB of high-quality text built from 850TB of raw data from the Internet.  ... 
arXiv:2110.04725v2 fatcat:xzgaonkg6fhm7orixf2tidlvdy
Showing results 1 — 15 out of 59 results