59 Hits in 1.9 sec

OCNLI: Original Chinese Natural Language Inference [article]

Hai Hu, Kyle Richardson, Liang Xu, Lu Li, Sandra Kuebler, Lawrence S. Moss
2020 arXiv   pre-print
In this paper, we present the first large-scale NLI dataset (consisting of ~56,000 annotated sentence pairs) for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI).  ...  Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been  ...  Conclusion In this paper, we presented the Original Chinese Natural Language Inference (OCNLI) corpus, the first large-scale, non-translated NLI dataset for Chinese.  ... 
arXiv:2010.05444v1 fatcat:oxsomovwjbhoxbofugsl4hbgea
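For readers who want to inspect the dataset described above, the following minimal sketch shows one way to load it. It assumes OCNLI is available on the Hugging Face Hub as the "ocnli" configuration of the "clue" dataset collection; the hub location and field names are assumptions, not details stated in this entry.

    # Minimal sketch (assumed hosting): load OCNLI via the Hugging Face datasets library.
    from datasets import load_dataset

    # "clue"/"ocnli" is the assumed Hub location; splits are expected to be train/validation/test.
    ocnli = load_dataset("clue", "ocnli")
    example = ocnli["train"][0]

    # Each example is expected to hold a premise/hypothesis pair with a 3-way label
    # (entailment / neutral / contradiction); exact field names may differ by loader version.
    print(example.get("sentence1"), example.get("sentence2"), example.get("label"))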

OCNLI: Original Chinese Natural Language Inference

Hai Hu, Kyle Richardson, Liang Xu, Lu Li, Sandra Kübler, Lawrence Moss
2020 Findings of the Association for Computational Linguistics: EMNLP 2020   unpublished
In this paper, we present the first large-scale NLI dataset (consisting of ∼56,000 annotated sentence pairs) for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI).  ...  performance gap), making it a challenging new resource that we hope will help to accelerate progress in Chinese natural language understanding.  ...  Conclusion In this paper, we presented the Original Chinese Natural Language Inference (OCNLI) corpus, the first large-scale, non-translated NLI dataset for Chinese.  ... 
doi:10.18653/v1/2020.findings-emnlp.314 fatcat:4bu5oxroy5g33m4g6asaciyoam

CLUE: A Chinese Language Understanding Evaluation Benchmark [article]

Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong (+20 others)
2020 arXiv   pre-print
These comprehensive benchmarks have facilitated a broad range of research and applications in natural language processing (NLP).  ...  The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE, allows new NLU models to be evaluated across a diverse set of tasks.  ...  OCNLI Original Chinese Natural Language Inference (OCNLI) is collected closely following the procedures of MNLI.  ... 
arXiv:2004.05986v3 fatcat:xwmawjovnzcsjjvucr6ngiiauy

Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models [article]

Xinnian Liang, Zefan Zhou, Hui Huang, Shuangzhi Wu, Tong Xiao, Muyun Yang, Zhoujun Li, Chao Bian
2023 arXiv   pre-print
Pretrained language models (PLMs) have shown marvelous improvements across various NLP tasks.  ...  In this paper, we revisit the segmentation granularity of Chinese PLMs. We propose a mixed-granularity Chinese BERT (MigBERT) by considering both characters and words.  ...  Natural Language Inference We select two natural language inference (NLI) datasets to evaluate the inference ability of our MigBERT.  ... 
arXiv:2303.10893v2 fatcat:2jrvmjy7dvc2le27zw7e3zq2xm

FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark [article]

Liang Xu, Xiaojing Lu, Chenyang Yuan, Xuanwei Zhang, Huilin Xu, Hu Yuan, Guoao Wei, Xiang Pan, Xin Tian, Libo Qin, Hu Hai
2021 arXiv   pre-print
Pretrained Language Models (PLMs) have achieved tremendous success in natural language understanding tasks.  ...  While different learning schemes -- fine-tuning, zero-shot, and few-shot learning -- have been widely explored and compared for languages such as English, there is comparatively little work in Chinese  ...  EFL achieves quite good results for natural language inference tasks, OCNLI and BUSTM, but performs poorly on CHID, a machine reading comprehension task.  ... 
arXiv:2107.07498v2 fatcat:ljx2nma3b5aa3ix2pzyadnkgnu

Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words [article]

Zhangyin Feng, Duyu Tang, Cong Zhou, Junwei Liao, Shuangzhi Wu, Xiaocheng Feng, Bing Qin, Yunbo Cao, Shuming Shi
2022 arXiv   pre-print
Furthermore, since the pipeline is language-independent, we train WordBERT for the Chinese language and obtain significant gains on five natural language understanding datasets.  ...  Lastly, the analysis of inference speed shows that WordBERT has a time cost comparable to BERT's in natural language understanding tasks.  ...  Furthermore, we extend our approach to the Chinese language. Results on several natural language understanding tasks indicate that our model significantly outperforms BERT.  ... 
arXiv:2202.12142v1 fatcat:ui7agnwb4rbqfbjwu7qkxirpze

MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model [article]

Xin Yao, Ziqing Yang, Yiming Cui, Shijin Wang
2023 arXiv   pre-print
In this paper, we introduce MiniRBT, a small Chinese pre-trained model that aims to advance research in Chinese natural language processing.  ...  In natural language processing, pre-trained language models have become essential infrastructure.  ...  Therefore, to further advance the development of Chinese natural language processing, we propose a small Chinese pre-trained model with strong practical utility.  ... 
arXiv:2304.00717v1 fatcat:l56orltf3vgfzb7apnjwgtrfca

Investigating Prompt Learning for Chinese Few-Shot Text Classification with Pre-Trained Language Models

Chengyu Song, Taihua Shao, Kejing Lin, Dengfeng Liu, Siyuan Wang, Honghui Chen
2022 Applied Sciences  
Thus, we propose a prompt-based Chinese text classification framework that uses generated natural language sequences as hints, which can alleviate the classification bottleneck well in low-resource scenarios  ...  However, existing prompt-based methods mainly focus on English tasks, which generally apply English pretrained language models that cannot directly adapt to Chinese tasks due to structural and grammatical  ...  It is a binary classification task that aims to predict whether two sentences are semantically similar. • OCNLI stands for Original Chinese Natural Language Inference, which is collected by closely following  ... 
doi:10.3390/app122111117 fatcat:zphqj2jpg5g5hjhl26jefryno4

RoChBert: Towards Robust BERT Fine-tuning for Chinese [article]

Zihan Zhang, Jinfeng Li, Ning Shi, Bo Yuan, Xiangyu Liu, Rong Zhang, Hui Xue, Donghong Sun, Chao Zhang
2022 arXiv   pre-print
In this paper, we present RoChBERT, a framework to build more Robust BERT-based models by utilizing a more comprehensive adversarial graph to fuse Chinese phonetic and glyph features into pre-trained representations  ...  Despite the superb performance on a wide range of tasks, pre-trained language models (e.g., BERT) have been proven vulnerable to adversarial texts.  ...  In our dataset, there are 12,000/1,500/1,500 texts in the train/dev/test splits. • OCNLI: a Chinese natural language inference dataset from CLUE.  ... 
arXiv:2210.15944v1 fatcat:u74jazmoijcojjhjtwzb2iz6ve

PERT: Pre-training BERT with Permuted Language Model [article]

Yiming Cui, Ziqing Yang, Ting Liu
2022 arXiv   pre-print
Pre-trained Language Models (PLMs) have been widely used in various natural language processing (NLP) tasks, owing to their powerful text representations trained on large-scale corpora.  ...  In this paper, we propose a new PLM called PERT for natural language understanding (NLU). PERT is an auto-encoding model (like BERT) trained with Permuted Language Model (PerLM).  ...  INTRODUCTION Pre-trained Language Models (PLMs) have shown excellent performance on various natural language processing (NLP) tasks.  ... 
arXiv:2203.06906v1 fatcat:kpjmaqa7wffbjhgjkbwcmonuji

NSP-BERT: A Prompt-based Few-Shot Learner Through an Original Pre-training Task–Next Sentence Prediction [article]

Yi Sun, Yu Zheng, Chao Hao, Hangping Qiu
2022 arXiv   pre-print
Nonetheless, virtually all prompt-based methods are token-level, meaning they all utilize GPT's left-to-right language model or BERT's masked language model to perform cloze-style tasks.  ...  In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using an original BERT pre-training task abandoned by RoBERTa and other models -- Next Sentence Prediction (NSP).  ...  (Table excerpt: dataset statistics listing MNLI with 392,702 training and 9,815 evaluation examples, and MNLI-mm with 9,832 evaluation examples, both 3-class natural language inference tasks scored by accuracy.)  ... 
arXiv:2109.03564v2 fatcat:4lduqcs4x5h4llkqga32qrpe2a

LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization [article]

Weidong Guo, Mingjun Zhao, Lusheng Zhang, Di Niu, Jinwen Luo, Zhenhua Liu, Zhenyang Li, Jianbo Tang
2021 arXiv   pre-print
range of Natural Language Understanding (NLU) tasks.  ...  Extensive experiments conducted on CLUE and SuperGLUE demonstrate that our method achieves comprehensive improvements on a wide variety of NLU tasks in both Chinese and English with little extra inference  ...  Similar to most Chinese PLMs, characters are used as fine-grained tokens due to the linguistic nature of Chinese.  ... 
arXiv:2108.00801v1 fatcat:lk2lzmu4vrfadiunhizvyvdll4

UnNatural Language Inference [article]

Koustuv Sinha, Prasanna Parthasarathi, Joelle Pineau, Adina Williams
2021 arXiv   pre-print
We provide novel evidence that complicates this claim: we find that state-of-the-art Natural Language Inference (NLI) models assign the same labels to permuted examples as they do to the original, i.e.  ...  (English and Mandarin Chinese).  ...  OCNLI: Original Chinese Natural Language Inference. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3512-3526, Online.  ... 
arXiv:2101.00010v2 fatcat:ilqekmsvqfhcfd52y7lkvdlntm

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation [article]

Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, Weibao Gong, Shikun Feng, Junyuan Shang, Yanbin Zhao, Chao Pang, Jiaxiang Liu, Xuyi Chen (+17 others)
2021 arXiv   pre-print
Pre-trained language models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks.  ...  ERNIE 3.0 Titan is the largest Chinese dense pre-trained model so far. Empirical results show that the ERNIE 3.0 Titan outperforms the state-of-the-art models on 68 NLP datasets.  ...  Natural Language Inference. ERNIE 3.0 Titan is evaluated on three NLI datasets, namely OCNLI, OCNLI-FC, and CMNLI, and achieves the best performance.  ... 
arXiv:2112.12731v1 fatcat:hact2hlojrdydhxcnzozmb7kee

Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning [article]

Shaohua Wu, Xudong Zhao, Tong Yu, Rongguo Zhang, Chong Shen, Hongli Liu, Feng Li, Hong Zhu, Jiangang Luo, Liang Xu, Xuanwei Zhang
2021 arXiv   pre-print
Yuan 1.0 demonstrates a strong capacity for natural language generation, and the generated articles are difficult to distinguish from human-written ones.  ...  Recent work like GPT-3 has demonstrated excellent Zero-Shot and Few-Shot learning performance on many natural language processing (NLP) tasks by scaling up model size, dataset size and the amount of  ...  Yuan 1.0 was trained on a new Chinese dataset of 5TB of high-quality text built from 850TB of raw data from the Internet.  ... 
arXiv:2110.04725v2 fatcat:xzgaonkg6fhm7orixf2tidlvdy
Showing results 1 — 15 out of 59 results