22,318 Hits in 1.9 sec

Representation Deficiency in Masked Language Modeling [article]

Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer
2024 arXiv   pre-print
Masked Language Modeling (MLM) has been one of the most prominent approaches for pretraining bidirectional text encoders due to its simplicity and effectiveness.  ...  tokens, resulting in a representation deficiency for real tokens and limiting the pretrained model's expressiveness when it is adapted to downstream data without [MASK] tokens.  ...  Figure 1: In an MLM-pretrained model, (a) some model dimensions are exclusively used for representing [MASK] tokens, resulting in a representation deficiency for modeling inputs without [MASK  ... 
arXiv:2302.02060v2 fatcat:nsmuwbz5qvcabkoif6rxh6cfqq
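
The abstract above concerns how MLM pretraining reserves model capacity for [MASK] tokens that never appear in downstream data. For context on how those tokens enter the pretraining input in the first place, here is a minimal, self-contained sketch of the standard BERT-style masking recipe (15% of positions selected; of those, 80% replaced by a mask id, 10% by a random token, 10% left unchanged). The vocabulary size, mask id, and example token ids are illustrative assumptions, not values from the paper.

```python
import random

def bert_style_mask(token_ids, mask_id, vocab_size, mask_prob=0.15, seed=0):
    """Return (corrupted_ids, labels), with labels = -100 at unmasked positions.

    Illustrative re-implementation of the standard BERT masking recipe:
    of the selected 15% of positions, 80% become the mask id, 10% become a
    random token, and 10% keep the original token.
    """
    rng = random.Random(seed)
    corrupted = list(token_ids)
    labels = [-100] * len(token_ids)          # -100 = ignore in the loss
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok                        # predict the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = mask_id             # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.randrange(vocab_size)  # 10%: random token
        # else 10%: keep the original token unchanged
    return corrupted, labels

# Toy usage with made-up ids (vocab_size=30522 and mask_id=103 mirror BERT's
# uncased vocabulary but are assumptions here, not values from the paper).
ids = [101, 7592, 2088, 2003, 2307, 102]
print(bert_style_mask(ids, mask_id=103, vocab_size=30522))
```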

End-to-End Code Switching Language Models for Automatic Speech Recognition [article]

Ahan M. R., Shreyas Sunil Kulkarni
2020 arXiv   pre-print
approach for extracting monolingual text using Deep Bi-directional Language Models (LM) such as BERT and other Machine Translation models, and also explore different ways of extracting code-switched text  ...  from the ASR model.  ...  Based on Masked LM using BERT, given a code-switched sentence, we particularly mask the unwanted words from the second language with <MASK>, and use BERT to recover these words in terms of monolingual text  ... 
arXiv:2006.08870v1 fatcat:lz6q5ke3rjehzexj4aj7izmr5y
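
As a rough illustration of the mask-and-recover idea in this snippet (mask a second-language word in a code-switched sentence and let BERT propose monolingual replacements), the sketch below uses the Hugging Face fill-mask pipeline with multilingual BERT. The model name, example sentence, and the hand-picked word to mask are assumptions for demonstration; the paper's actual extraction pipeline is more involved.

```python
# Hedged sketch of mask-and-recover for code-switched text using the Hugging
# Face fill-mask pipeline. Requires: pip install transformers torch
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-multilingual-cased")

# Hypothetical code-switched sentence: English matrix with one Hindi word.
sentence = "I will meet you कल at the station."
# Replace the second-language word with the model's mask token and let BERT
# suggest monolingual fillers; in the paper the masked position would come
# from language identification rather than being chosen by hand.
masked = sentence.replace("कल", unmasker.tokenizer.mask_token)

for candidate in unmasker(masked, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```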

Combining pre-trained language models and structured knowledge [article]

Pedro Colon-Hernandez, Catherine Havasi, Jason Alonso, Matthew Huggins, Cynthia Breazeal
2021 arXiv   pre-print
In recent years, transformer-based language models have achieved state-of-the-art performance in various NLP benchmarks.  ...  We examine a variety of approaches to integrate structured knowledge into current language models and determine challenges and possible opportunities to leverage both structured and unstructured information  ...  compensate for deficiencies in these areas.  ... 
arXiv:2101.12294v2 fatcat:3npdpudyzzhm3bmevv5qcvxuwe

Page 1456 of Linguistics and Language Behavior Abstracts: LLBA Vol. 29, Issue 3 [page]

1995 Linguistics and Language Behavior Abstracts: LLBA  
proposal, inhibitor process-based interference, long stimulus onset asynchrony, target masking; experiments; undergraduates; 9504239 speech recognition; background noise masking; empirical data; hear  ...  Johnson-Laird’s mental models representations; 9504186 resonance frequency/vocal tract cross-section relation, equal/unequal tubelet models; 9506611 set-valued feature structures’ domain properties; 9506012  ... 

Improving Label-Deficient Keyword Spotting Using Self-Supervised Pretraining [article]

Holger Severin Bovbjerg, Zheng-Hua Tan
2022 arXiv   pre-print
In this paper, we investigate the use of self-supervised pretraining for the smaller KWS models in a label-deficient scenario.  ...  It is found that the pretrained models greatly outperform the models without pretraining, showing that Data2Vec pretraining can increase the performance of KWS models in label-deficient scenarios.  ...  As a result, self-supervised learning can be used to improve model performance in the case of label-deficiency.  ... 
arXiv:2210.01703v2 fatcat:bqonkph7wzckxhc3ijxxes2xqm
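
The abstract mentions Data2Vec pretraining for label-deficient keyword spotting. Below is a deliberately simplified, self-contained sketch of the Data2Vec-style objective on random stand-in audio features: a teacher (an EMA copy of the student) produces targets from the unmasked input by averaging its top layers, and the student predicts those targets at masked positions from the masked input. The encoder size, masking scheme, EMA decay, and omission of target normalization are all illustrative simplifications, not the paper's configuration.

```python
import copy
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    def __init__(self, dim=64, layers=4, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.layers = nn.ModuleList(copy.deepcopy(layer) for _ in range(layers))

    def forward(self, x):
        hidden = []
        for layer in self.layers:
            x = layer(x)
            hidden.append(x)
        return hidden                       # per-layer outputs

student = TinyEncoder()
teacher = copy.deepcopy(student)            # teacher = EMA copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)

mask_emb = nn.Parameter(torch.zeros(64))    # learned embedding for masked frames
opt = torch.optim.Adam(list(student.parameters()) + [mask_emb], lr=1e-4)

def ema_update(decay=0.999):
    with torch.no_grad():
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(decay).add_(ps, alpha=1 - decay)

# One illustrative training step on random "audio feature" frames.
x = torch.randn(8, 100, 64)                 # (batch, frames, feature dim)
mask = torch.rand(8, 100) < 0.5             # positions the student must predict

with torch.no_grad():                       # teacher sees the unmasked input
    targets = torch.stack(teacher(x)[-3:]).mean(0)   # average of top-3 layers

x_masked = torch.where(mask.unsqueeze(-1), mask_emb.expand_as(x), x)
pred = student(x_masked)[-1]                # student's final-layer output

loss = nn.functional.mse_loss(pred[mask], targets[mask])
loss.backward()
opt.step(); opt.zero_grad()
ema_update()                                # keep the teacher trailing the student
print(float(loss))
```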

Zero-Shot Cross-Lingual Phonetic Recognition with External Language Embedding

Heting Gao, Junrui Ni, Yang Zhang, Kaizhi Qian, Shiyu Chang, Mark Hasegawa-Johnson
2021 Conference of the International Speech Communication Association  
Multilingual phonetic recognition systems mitigate data sparsity issues by training models on data from multiple languages and learning a speech-to-phone or speech-to-text model universal to all languages  ...  This paper argues that in the real world, even an unseen language has metadata: linguists can tell us the language name, its language family and, usually, its phoneme inventory.  ...  The performance is shown in Table 2, where both proposed models ("w2v+linear+mask" and "w2v+gcn+mask") outperform the "base" model; the "w2v+gcn+mask" model achieves the lowest multilingual error rate, while  ... 
doi:10.21437/interspeech.2021-1843 dblp:conf/interspeech/GaoNZQCH21 fatcat:h36lrbx54bbjtkplt67cgrkyeq

CLOWER: A Pre-trained Language Model with Contrastive Learning over Word and Character Representations [article]

Borun Chen, Hongyin Tang, Jiahao Bu, Kai Zhang, Jingang Wang, Qifan Wang, Hai-Tao Zheng, Wei Wu, Liqian Yu
2022 arXiv   pre-print
Pre-trained Language Models (PLMs) have achieved remarkable performance gains across numerous downstream tasks in natural language understanding.  ...  Various Chinese PLMs have been successively proposed for learning better Chinese language representation.  ...  The coarse-grained information is only implicitly explored in masked language modeling by designing the masking strategies, and the coarse-grained representations are absent.  ... 
arXiv:2208.10844v2 fatcat:vl2yls4g7factbjwovynzakl5q
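
The abstract describes contrastive learning over word-level and character-level representations. As a generic illustration of that kind of objective, the sketch below computes a symmetric in-batch InfoNCE loss that pulls together the character-level and word-level encodings of the same sentence and pushes apart other sentences in the batch. Encoder internals are omitted, and the dimensions and temperature are assumptions; the paper's actual objective and granularity alignment are more detailed.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(char_repr, word_repr, temperature=0.05):
    """char_repr, word_repr: (batch, dim) encodings of the same sentences."""
    char_repr = F.normalize(char_repr, dim=-1)
    word_repr = F.normalize(word_repr, dim=-1)
    logits = char_repr @ word_repr.t() / temperature   # scaled cosine similarities
    labels = torch.arange(char_repr.size(0))           # positives on the diagonal
    # Symmetric InfoNCE: char->word and word->char directions.
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

# Toy usage with random vectors standing in for encoder outputs.
print(float(contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))))
```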

Mask Attention Networks: Rethinking and Strengthen Transformer [article]

Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Ruofei Zhang, Xuanjing Huang
2021 arXiv   pre-print
However, their static mask matrices limit the capability for localness modeling in text representation learning.  ...  In this paper, we present a novel understanding of SAN and FFN as Mask Attention Networks (MANs) and show that they are two special cases of MANs with static mask matrices.  ...  We argue that the deficiency of the Transformer in local structure modeling is caused by the attention computation with a static mask matrix.  ... 
arXiv:2103.13597v1 fatcat:qistwvn3sfa6xbuectdsg2x5my
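
The snippet frames self-attention (SAN) and the feed-forward layer (FFN) as two special cases of attention with a static mask matrix. The numpy sketch below illustrates only that framing: with an all-ones mask the computation is ordinary softmax attention, while with an identity mask each token attends solely to itself, which is FFN-like in the sense of being purely token-wise. It is an interpretation of the idea, not the paper's implementation.

```python
# Illustrative numpy sketch of attention reweighted by a static mask matrix M.
import numpy as np

def masked_attention(Q, K, V, M):
    """Q, K, V: (n, d); M: (n, n) with entries in [0, 1]."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights * M                               # static mask reweights attention
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 5, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

san_out = masked_attention(Q, K, V, np.ones((n, n)))    # standard self-attention
ffn_like = masked_attention(Q, K, V, np.eye(n))         # each token sees only itself
print(np.allclose(ffn_like, V))                         # True: identity mask passes V through
```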

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models [article]

Feng Li, Hao Zhang, Yi-Fan Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, PengChuan Zhang, Lei Zhang
2022 arXiv   pre-print
We summarize the development in this field into three time periods, namely task-specific methods, vision-language pre-training (VLP) methods, and larger models empowered by large-scale weakly-labeled data  ...  After that, we show how recent work utilizes large-scale raw image-text data to learn language-aligned visual representations that generalize better on zero or few shot learning tasks.  ...  (a) Original BERT with single-modality, where some language tokens are masked for prediction to train language representation.  ... 
arXiv:2203.01922v1 fatcat:vnjfetgkpzedpfhklufooqet7y

Adaptive Transformers for Learning Multimodal Representations [article]

Prajjwal Bhargava
2020 arXiv   pre-print
In this work, we extend adaptive approaches to learn more about model interpretability and computational efficiency.  ...  The usage of transformers has grown from learning about language semantics to forming meaningful visiolinguistic representations.  ...  In this example, the right answer is assigned a deficient score. The network does not seem to learn distinguishing features from similar classes properly.  ... 
arXiv:2005.07486v3 fatcat:mtiihd5y6vgjjkztnxpvxbsp7u

Zero-shot Aspect-level Sentiment Classification via Explicit Utilization of Aspect-to-Document Sentiment Composition [article]

Pengfei Deng, Jianhua Yuan, Yanyan Zhao, Bing Qin
2022 arXiv   pre-print
Based on this, we propose the AF-DSC method to explicitly model such sentiment composition in reviews.  ...  Our key intuition is that the sentiment representation of a document is composed of the sentiment representations of all the aspects of that document.  ...  mechanism in the pre-trained language model.  ... 
arXiv:2209.02276v1 fatcat:xfjt5xz2xnb35mpqidwgfrknvy
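
The snippet's key intuition is that a document's sentiment representation is composed of the sentiment representations of its aspects. Below is a minimal sketch of that composition with summation as the (assumed) pooling choice: a single sentiment head trained on the composed document representation can then score each aspect representation directly. Dimensions, the number of polarities, and the pooling are illustrative assumptions, not the paper's AF-DSC architecture.

```python
import torch
import torch.nn as nn

dim, n_polarities = 128, 3                  # negative / neutral / positive (assumed)
classifier = nn.Linear(dim, n_polarities)

aspect_reprs = torch.randn(4, dim)          # stand-ins for 4 aspect representations
doc_repr = aspect_reprs.sum(dim=0)          # document sentiment = composition of aspects

doc_logits = classifier(doc_repr)           # document-level supervision trains this head,
aspect_logits = classifier(aspect_reprs)    # which can then score each aspect zero-shot
print(doc_logits.shape, aspect_logits.shape)
```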

Page 2 of Classical Antiquity Vol. 17, Issue 1 [page]

1998 Classical Antiquity  
These vases have in common that they show a cult-image of Dionysos, consisting of a mask or masks on a column, in combination with the conventional Attic imagery of the revelling ecstatic female worshippers  ...  Sophocles thus finds in this exercise in self-representation a way to frame critical questions on dramatic theory and to define his own dramatic practice.  ... 

Global and Local Semantic Completion Learning for Vision-Language Pre-training [article]

Rong-Cheng Tu, Yatai Ji, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu
2023 arXiv   pre-print
Cross-modal alignment plays a crucial role in vision-language pre-training (VLP) models, enabling them to capture meaningful associations across different modalities.  ...  However, most of them pay little attention to the global semantic features generated for the masked data, resulting in a limited cross-modal alignment ability of global representations to local features  ...  The absence of MLM would lead to deficient textual representations, which will impair the reconstruction goals of MLTC, thereby limiting the effectiveness of MLTC in understanding.  ... 
arXiv:2306.07096v2 fatcat:wmz3kf4b7vblbpv2z2le2b542e

Generative Sentiment Transfer via Adaptive Masking [article]

Yingze Xie, Jie Xu, LiQiang Qiao, Yun Liu, Feiren Huang, Chaozhuo Li
2023 arXiv   pre-print
Moreover, a sentiment-aware masked language model is further proposed to fill in the blanks in the masked positions by incorporating both context and sentiment polarity to capture the multi-grained semantics  ...  In this paper, we view the positions to be masked as learnable parameters, and further propose a novel AM-ST model to learn adaptive task-relevant masks based on the attention mechanism.  ...  Infilling Blanks: In this stage, our model infills tokens at masked positions using a sentiment-aware masked language model (Senti-MLM), as shown in Figure 2(b).  ... 
arXiv:2302.12045v1 fatcat:s7doxemsgjagfexxhgyurw3cke
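
The snippet describes a two-stage scheme: choose the positions to mask adaptively, then infill them with a sentiment-aware masked LM. The sketch below illustrates only the first stage, approximating the learned, attention-based selection with fixed per-token attention scores supplied by hand; the scores, masking ratio, and mask token are assumptions, and the paper instead learns the mask positions end-to-end as parameters.

```python
import torch

def select_masks(tokens, attention, mask_ratio=0.3, mask_token="[MASK]"):
    """tokens: list[str]; attention: (len(tokens),) tensor of sentiment attention."""
    k = max(1, round(len(tokens) * mask_ratio))
    top = torch.topk(attention, k).indices.tolist()   # most sentiment-bearing positions
    return [mask_token if i in top else t for i, t in enumerate(tokens)]

tokens = ["the", "food", "was", "terrible", "and", "overpriced"]
attention = torch.tensor([0.02, 0.08, 0.03, 0.55, 0.02, 0.30])  # made-up scores
print(select_masks(tokens, attention))
# ['the', 'food', 'was', '[MASK]', 'and', '[MASK]'] -- the two most attended words
```

The masked positions would then be filled by the sentiment-aware masked LM (Senti-MLM) described in the snippet.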

Extract Aspect-based Financial Opinion Using Natural Language Inference

Raymond So, Chun Fai Carlin Chu, Cheuk Wing Jessie Lee
2022 2022 8th International Conference on E-business and Mobile Commerce  
Traditional rule-based NLP, for instance, is known for its deficiency in creating context-aware representations of words and sentences.  ...  The emergence of transformer-based pre-trained language models (PTLMs) has brought new and improved techniques to natural language processing (NLP).  ...  ACKNOWLEDGMENTS The work reported in this paper was supported by the Research Matching Grant Scheme administered by the University Grants Committee in Hong Kong.  ... 
doi:10.1145/3543106.3543120 fatcat:vb7aofzscfglznjacjr4nxxury
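
The title and snippet describe framing aspect-based financial opinion extraction as natural language inference with a pre-trained LM. A common way to prototype that idea is the zero-shot classification pipeline, which scores candidate aspect-sentiment hypotheses against a premise sentence via an NLI model; the model name, example sentence, and candidate labels below are assumptions for illustration, not the paper's setup.

```python
# Hedged sketch: scoring aspect-sentiment hypotheses against a financial
# sentence with an NLI-based zero-shot classification pipeline.
# Requires: pip install transformers torch
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

sentence = "Revenue beat expectations, but management warned of rising costs next quarter."
candidate_labels = [
    "positive opinion about revenue",
    "negative opinion about revenue",
    "positive opinion about costs",
    "negative opinion about costs",
]

result = classifier(sentence, candidate_labels, multi_label=True)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{score:.2f}  {label}")
```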
Showing results 1 — 15 out of 22,318 results