OCNLI: Original Chinese Natural Language Inference
[article]
2020
arXiv
pre-print
In this paper, we present the first large-scale NLI dataset (consisting of ~56,000 annotated sentence pairs) for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI). ...
Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been ...
Conclusion In this paper, we presented the Original Chinese Natural Language Inference (OCNLI) corpus, the first large-scale, non-translated NLI dataset for Chinese. ...
arXiv:2010.05444v1
fatcat:oxsomovwjbhoxbofugsl4hbgea
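The OCNLI abstract describes a dataset of premise–hypothesis pairs with three-way labels (entailment, neutral, contradiction). A minimal sketch of that data shape and a simple accuracy computation; the example sentences are invented for illustration, not drawn from the actual dataset:

```python
# Hypothetical NLI examples in the three-way label scheme described above.
# The sentences are illustrative only, not actual OCNLI data.
examples = [
    {"premise": "他今天去了北京。",     # "He went to Beijing today."
     "hypothesis": "他今天离开了家。",  # "He left home today."
     "label": "entailment"},
    {"premise": "她在看书。",           # "She is reading a book."
     "hypothesis": "她在睡觉。",        # "She is sleeping."
     "label": "contradiction"},
]

def accuracy(predictions, gold):
    """Fraction of predicted labels that match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

gold = [ex["label"] for ex in examples]
preds = ["entailment", "neutral"]  # e.g. output of some classifier
print(accuracy(preds, gold))  # 0.5
```

Accuracy is the metric typically reported for this kind of three-way classification.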
OCNLI: Original Chinese Natural Language Inference
2020
Findings of the Association for Computational Linguistics: EMNLP 2020
unpublished
In this paper, we present the first large-scale NLI dataset (consisting of ∼56,000 annotated sentence pairs) for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI). ...
performance gap), making it a challenging new resource that we hope will help to accelerate progress in Chinese natural language understanding. ...
Conclusion In this paper, we presented the Original Chinese Natural Language Inference (OCNLI) corpus, the first large-scale, non-translated NLI dataset for Chinese. ...
doi:10.18653/v1/2020.findings-emnlp.314
fatcat:4bu5oxroy5g33m4g6asaciyoam
CLUE: A Chinese Language Understanding Evaluation Benchmark
[article]
2020
arXiv
pre-print
These comprehensive benchmarks have facilitated a broad range of research and applications in natural language processing (NLP). ...
The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE, allows new NLU models to be evaluated across a diverse set of tasks. ...
OCNLI: Original Chinese Natural Language Inference (OCNLI) is collected closely following the procedures of MNLI. ...
arXiv:2004.05986v3
fatcat:xwmawjovnzcsjjvucr6ngiiauy
Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models
[article]
2023
arXiv
pre-print
Pretrained language models (PLMs) have shown marvelous improvements across various NLP tasks. ...
In this paper, we revisit the segmentation granularity of Chinese PLMs. We propose a mixed-granularity Chinese BERT (MigBERT) by considering both characters and words. ...
Natural Language Inference We select two natural language inference (NLI) datasets to evaluate the inference ability of our MigBERT. ...
arXiv:2303.10893v2
fatcat:2jrvmjy7dvc2le27zw7e3zq2xm
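The character-versus-word question in the MigBERT abstract can be made concrete: Chinese text splits into characters directly, while word-level tokenization needs a segmenter. A toy sketch contrasting the two granularities, using a greedy longest-match segmenter over an invented vocabulary (not MigBERT's actual tokenizer):

```python
def char_tokenize(text):
    """Character granularity: every character is its own token."""
    return list(text)

def word_tokenize(text, vocab, max_len=4):
    """Word granularity: greedy longest-match against a word vocabulary,
    falling back to single characters for out-of-vocabulary spans."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens

vocab = {"自然语言", "处理"}  # toy vocabulary, for illustration only
text = "自然语言处理"          # "natural language processing"
print(char_tokenize(text))        # ['自', '然', '语', '言', '处', '理']
print(word_tokenize(text, vocab)) # ['自然语言', '处理']
```

A mixed-granularity model in the spirit of the abstract would expose both token sequences to the encoder rather than committing to one.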
FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark
[article]
2021
arXiv
pre-print
Pretrained Language Models (PLMs) have achieved tremendous success in natural language understanding tasks. ...
While different learning schemes -- fine-tuning, zero-shot, and few-shot learning -- have been widely explored and compared for languages such as English, there is comparatively little work in Chinese ...
EFF achieves quite good results for the natural language inference tasks, OCNLI and BUSTM, but performs poorly on CHID, a machine reading comprehension task. ...
arXiv:2107.07498v2
fatcat:ljx2nma3b5aa3ix2pzyadnkgnu
Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words
[article]
2022
arXiv
pre-print
Furthermore, since the pipeline is language-independent, we train WordBERT for Chinese language and obtain significant gains on five natural language understanding datasets. ...
Lastly, the analysis of inference speed shows that WordBERT has a time cost comparable to BERT's in natural language understanding tasks. ...
Furthermore, we extend our approach to the Chinese language. Results on several natural language understanding tasks indicate that our model significantly outperforms BERT. ...
arXiv:2202.12142v1
fatcat:ui7agnwb4rbqfbjwu7qkxirpze
MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model
[article]
2023
arXiv
pre-print
In this paper, we introduce MiniRBT, a small Chinese pre-trained model that aims to advance research in Chinese natural language processing. ...
In natural language processing, pre-trained language models have become essential infrastructures. ...
Therefore, to further advance the development of Chinese natural language processing, we propose a small Chinese pre-trained model with strong practicability. ...
arXiv:2304.00717v1
fatcat:l56orltf3vgfzb7apnjwgtrfca
Investigating Prompt Learning for Chinese Few-Shot Text Classification with Pre-Trained Language Models
2022
Applied Sciences
Thus, we propose a prompt-based Chinese text classification framework that uses generated natural language sequences as hints, which can alleviate the classification bottleneck well in low-resource scenarios ...
However, existing prompt-based methods mainly focus on English tasks, which generally apply English pretrained language models that can not directly adapt to Chinese tasks due to structural and grammatical ...
It is a binary classification task that aims to predict whether two sentences are semantically similar. • OCNLI stands for Original Chinese Natural Language Inference, which is collected by closely following ...
doi:10.3390/app122111117
fatcat:zphqj2jpg5g5hjhl26jefryno4
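The prompt-based framework in the abstract above turns classification into filling a natural-language template that a pre-trained model then scores. A hedged sketch of the template-construction step only; the template wording and label set are invented for illustration, not the paper's actual prompts:

```python
# Hypothetical cloze-style prompt for Chinese news-topic classification.
# Template wording and labels are illustrative only.
TEMPLATE = "这是一条关于{label}的新闻：{text}"
LABELS = ["体育", "财经", "科技"]  # sports, finance, technology

def build_prompts(text):
    """Build one candidate prompt per label. A pre-trained language model
    would score each prompt, and the highest-scoring label would be the
    prediction; the scoring step is omitted here."""
    return {label: TEMPLATE.format(label=label, text=text) for label in LABELS}

prompts = build_prompts("某球队昨晚赢得了比赛。")  # "A team won the match last night."
for label, prompt in prompts.items():
    print(label, "->", prompt)
```

The point of the verbalized template is that it needs no task-specific classification head, which is why the approach suits low-resource, few-shot settings.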
RoChBert: Towards Robust BERT Fine-tuning for Chinese
[article]
2022
arXiv
pre-print
In this paper, we present RoChBERT, a framework to build more Robust BERT-based models by utilizing a more comprehensive adversarial graph to fuse Chinese phonetic and glyph features into pre-trained representations ...
Despite their superb performance on a wide range of tasks, pre-trained language models (e.g., BERT) have been proven vulnerable to adversarial texts. ...
In our dataset, there are 12,000/1,500/1,500 texts in the train/dev/test splits. • OCNLI: a Chinese natural language inference dataset from CLUE. ...
arXiv:2210.15944v1
fatcat:u74jazmoijcojjhjtwzb2iz6ve
PERT: Pre-training BERT with Permuted Language Model
[article]
2022
arXiv
pre-print
Pre-trained Language Models (PLMs) have been widely used in various natural language processing (NLP) tasks, owing to their powerful text representations trained on large-scale corpora. ...
In this paper, we propose a new PLM called PERT for natural language understanding (NLU). PERT is an auto-encoding model (like BERT) trained with Permuted Language Model (PerLM). ...
INTRODUCTION Pre-trained Language Models (PLMs) have shown excellent performance on various natural language processing (NLP) tasks. ...
arXiv:2203.06906v1
fatcat:kpjmaqa7wffbjhgjkbwcmonuji
NSP-BERT: A Prompt-based Few-Shot Learner Through an Original Pre-training Task–Next Sentence Prediction
[article]
2022
arXiv
pre-print
Nonetheless, virtually all prompt-based methods are token-level, meaning they all utilize GPT's left-to-right language model or BERT's masked language model to perform cloze-style tasks. ...
In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using an original BERT pre-training task abandoned by RoBERTa and other models: Next Sentence Prediction (NSP). ...
MNLI: 392,702 train / 9,815 eval, 3 labels, Natural Language Inference, Acc. (domain: Speech, Fiction and Reports); MNLI-mm: 392,702 train / 9,832 eval, 3 labels, Natural Language Inference, Acc. ...
arXiv:2109.03564v2
fatcat:4lduqcs4x5h4llkqga32qrpe2a
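NSP-BERT's zero-shot recipe, as the abstract sketches it, scores each candidate label by framing classification as next-sentence prediction between the input and a label description. A minimal sketch of the pair-construction step only; the descriptions are invented, and actual scoring would require a BERT model with its NSP head:

```python
def build_nsp_pairs(text, label_descriptions):
    """Pair the input text with one natural-language description per label.
    An NSP head would score how well each pair coheres as consecutive
    sentences; the best-scoring label becomes the zero-shot prediction."""
    return [(text, desc) for desc in label_descriptions]

# Hypothetical label descriptions for a sentiment task (illustrative only).
descriptions = ["这是一条正面的评论。",  # "This is a positive review."
                "这是一条负面的评论。"]  # "This is a negative review."
pairs = build_nsp_pairs("这家餐厅的菜很好吃。", descriptions)
print(pairs[0])  # ('这家餐厅的菜很好吃。', '这是一条正面的评论。')
```

Because NSP scoring operates on whole sentence pairs rather than single masked tokens, this sidesteps the token-level cloze formulation the abstract contrasts it with.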
LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization
[article]
2021
arXiv
pre-print
range of Natural Language Understanding (NLU) tasks. ...
Extensive experiments conducted on CLUE and SuperGLUE demonstrate that our method achieves comprehensive improvements on a wide variety of NLU tasks in both Chinese and English with little extra inference ...
Similar to most Chinese PLMs, characters are used as fine-grained tokens due to the linguistic nature of Chinese. ...
arXiv:2108.00801v1
fatcat:lk2lzmu4vrfadiunhizvyvdll4
UnNatural Language Inference
[article]
2021
arXiv
pre-print
We provide novel evidence that complicates this claim: we find that state-of-the-art Natural Language Inference (NLI) models assign the same labels to permuted examples as they do to the original, i.e. ...
(English and Mandarin Chinese). ...
OCNLI: Original Chinese Natural Language Inference. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3512-3526, Online. ...
arXiv:2101.00010v2
fatcat:ilqekmsvqfhcfd52y7lkvdlntm
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
[article]
2021
arXiv
pre-print
Pre-trained language models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. ...
ERNIE 3.0 Titan is the largest Chinese dense pre-trained model so far. Empirical results show that the ERNIE 3.0 Titan outperforms the state-of-the-art models on 68 NLP datasets. ...
Natural Language Inference. ERNIE 3.0 Titan is evaluated on three NLI datasets, namely OCNLI, OCNLI-FC, and CMNLI, and achieves the best performance. ...
arXiv:2112.12731v1
fatcat:hact2hlojrdydhxcnzozmb7kee
Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning
[article]
2021
arXiv
pre-print
Yuan 1.0 presents a strong capacity for natural language generation, and the generated articles are difficult to distinguish from human-written ones. ...
Recent work like GPT-3 has demonstrated excellent performance of Zero-Shot and Few-Shot learning on many natural language processing (NLP) tasks by scaling up model size, dataset size and the amount of ...
Yuan 1.0 was trained on a new 5TB Chinese dataset of high-quality text, built from 850TB of raw data from the Internet. ...
arXiv:2110.04725v2
fatcat:xzgaonkg6fhm7orixf2tidlvdy