LINSPECTOR: Multilingual Probing Tasks for Word Representations
by
Gözde Gül Şahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych
2019
Abstract
Despite an ever-growing number of word representation models introduced for a
large number of languages, there is a lack of a standardized technique to
provide insights into what is captured by these models. Such insights would
help the community to get an estimate of the downstream task performance, as
well as to design more informed neural architectures, while avoiding extensive
experimentation which requires substantial computational resources not all
researchers have access to. A recent development in NLP is to use simple
classification tasks, also called probing tasks, that test for a single
linguistic feature such as part-of-speech. Existing studies mostly focus on
exploring the information encoded by the sentence-level representations for
English. However, from a typological perspective, the morphologically poor
English is rather an outlier: the information encoded by the word order and
function words in English is often stored on a subword, morphological level in
other languages. To address this, we introduce 15 word-level probing tasks such
as case marking, possession, word length, morphological tag count and
pseudoword identification for 24 languages. We present experiments on several
state-of-the-art word embedding models, in which we relate the probing task
performance for a diverse set of languages to a range of classic NLP tasks such
as semantic role labeling and natural language inference. We find that a number
of probing tests have a significantly high positive correlation with the downstream
tasks, especially for morphologically rich languages. We show that our tests
can be used to explore word embeddings or black-box neural models for
linguistic cues in a multilingual setting. We release the probing datasets and
the evaluation suite at https://github.com/UKPLab/linspector.
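
As a rough illustration of what a probing task looks like in practice, the
following minimal Python sketch trains a small logistic-regression classifier
on frozen word vectors to predict one binary feature. It is not the LINSPECTOR
evaluation suite: the synthetic vectors, the feature (a stand-in for something
like singular vs. plural), and the names make_toy_data, train_probe and
probe_accuracy are all assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_data(n, dim, rng):
    # Hypothetical "embeddings": Gaussian noise in every dimension, with the
    # binary feature shifted into dimension 0. Real probing data would pair
    # pretrained word vectors with gold morphological labels instead.
    vectors, labels = [], []
    for i in range(n):
        y = i % 2
        x = [rng.gauss(0.0, 0.3) for _ in range(dim)]
        x[0] += 1.0 if y else -1.0
        vectors.append(x)
        labels.append(y)
    return vectors, labels


def train_probe(vectors, labels, epochs=50, lr=0.1):
    # The word vectors stay fixed; only the probe's weights are updated.
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def probe_accuracy(w, b, vectors, labels):
    correct = 0
    for x, y in zip(vectors, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == bool(y))
    return correct / len(labels)


rng = random.Random(0)
train_x, train_y = make_toy_data(80, 4, rng)
test_x, test_y = make_toy_data(40, 4, rng)
w, b = train_probe(train_x, train_y)
acc = probe_accuracy(w, b, test_x, test_y)
print(f"probe accuracy on held-out toy data: {acc:.2f}")
```

High held-out accuracy here only shows that the feature is linearly
recoverable from the toy vectors; the probe is kept deliberately simple so
that its score reflects the representation rather than the classifier.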
arXiv: 1903.09442v1