Geographic Named Entity Recognition and Disambiguation in Mexican News using word embeddings.

In this paper, we propose an extensible geoparsing approach including geographic entity recognition based on a neural network model and disambiguation based on what we have called dynamic context disambiguation ... The first task could be approached through a machine learning approach, in which case a model is trained to recognize a sequence of characters (words) corresponding to geographic entities. ... Geographic-Named Entity Recognition We have obtained the semantic features based on word embeddings obtained with word2vec [29]. ...

doi:10.3390/rs12183041 doaj:2a94e8c05d16492f856aa3ed81fb4916 fatcat:odfrdlic2fa37ahqtpx624c7wa

Proper name recognition is a subtask of Named Entity Recognition in the Message Understanding Conference. ... For our corpus annotation, proper name recognition is a crucial task, since proper names appear in more than 50% of the sentences in the electronic texts that we collected for this purpose ... The preliminary results show the possibilities of the method and the information required for better results. ...

doi:10.1007/978-3-540-24694-7_43 fatcat:broitu57rzajtflicbfo3rsh24

To train our model, we use toponym co-occurrences collected from different contexts, namely textual (i.e., co-occurrences of toponyms in Wikipedia articles) and geographical (i.e., inclusion and proximity ... in geographical areas with fewer places in the data sources. ... The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. ...

doi:10.3390/ijgi10120818 fatcat:sttddeumfbg4lnkjm4tptfrzr4

Our goal is to form in the vector thematic layers geographically meaningful words correctly attached to the cartographic objects. ... In this work, we propose a method that combines OCR-based text recognition in raster-scanned maps with heuristics specially adapted for cartographic data to resolve the recognition ambiguities using, among ... Acknowledgments The work was partially supported by Mexican Government (CONACYT, SNI, CGPI-IPN) and the ITRI of the Chung-Ang University. ...

doi:10.1007/978-3-540-25977-0_7 fatcat:5kowthkdzjcdxl7zheiuyitdke

of metrics and a detailed toponym taxonomy with implications for Named Entity Recognition (NER) and beyond. ... Part 3) Evaluation Data: shared via a new dataset called GeoWebNews to provide test/train examples and enable immediate use of our contributions. ... Authors agree that standard Named Entity Recognition is inadequate for geographic NLP tasks. ...

arXiv:1810.12368v5 fatcat:omtwa7xnvrgxvgipn6pddc6l44

We apply techniques from natural language processing (lexicons, word embeddings, topic models) to 15 U.S. history textbooks widely used in Texas between 2015 and 2017, studying their depiction of historically ... Word embeddings reveal that women tend to be discussed in the contexts of work and the home. Topic modeling highlights the higher prominence of political topics compared with social ones. ... Acknowledgments We would like to thank the following individuals for helpful conversations, feedback, and ideas: Noah Smith, Sebastian Munoz-Najar Galvez, Lily ...

doi:10.1177/2332858420940312 fatcat:l5antrdnc5d5dbi4goob6k5lou

This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics ... It covers models and methods for discovering and canonicalizing entities and their semantic types and organizing them into clean taxonomies. ... It is a great pleasure and honor to have such wonderful colleagues in our research community. ...

arXiv:2009.11564v2 fatcat:vh2lqfmhhbcwpf6dcsej3hhvgy

More recently, this system was used in the early phases of response to COVID-19 in the United States, although its utility was limited to a relatively brief window due to the rapid domestic spread of the ... This study aims to assess the feasibility of annotating and automatically extracting travel history mentions from unstructured clinical documents in the Department of Veterans Affairs across disparate ... We thank the editor and anonymous reviewers for their feedback in ameliorating the reporting of this study. ...

doi:10.2196/26719 pmid:33759790 fatcat:bmji77r2nvgk7lkpvpx245eajm

Citation

Kelly S Peterson, Julia Lewis, Olga V Patterson, Alec B Chapman, Daniel Denhalter, Patricia A Lye, Vanessa W Stevens, Shantini D Gamage, Gary A Roselle, Katherine S Wallace, Makoto Jones. "Automated Travel History Extraction from Clinical Notes: Algorithm Development and Validation for Emergent Infectious Disease Events (Preprint)." JMIR Public Health and Surveillance 7.3 (2020) e26719

Previous studies on this topic have typically assumed that geographical references (e.g., gazetteer terms, dialectal words) in a text are indicative of its author's location. ... Geographical location is vital to geospatial applications like local search and event detection. ... Acknowledgments The authors wish to thank Stephen Roller and Jason Baldridge for making their data and tools available to replicate their NA experiments. ...

doi:10.1613/jair.4200 fatcat:jvdb3fdb4ngoxmsjo4fm2jbqve

Such topics are used in tasks like classification and recommendations. ... Numerous approaches have been proposed to detect topics from collections of microposts, where the topics are represented by lists of terms such as words, phrases, or word embeddings. ... Dinesh and Dr. Jayant Venkatanatha for valuable contributions during the preparation of this work. ...

doi:10.1371/journal.pone.0236863 pmid:32780736 fatcat:rblunq2kqffcvpfg362olx5rhq

Book of Abstracts DHN, Rīga 2020 Book of Abstracts of the Digital Humanities in the Nordic Countries 5th conference. ... conferences/dhn2020 Editors: Sanita Reinsone, Anda Baklāne, Jānis Daugavietis Editorial assistants: Justīne Jaudzema, Ilze Ļaksa-Timinska Cover: Anete Krūmiņa Publisher: Institute of Literature, Folklore and ... We also thank the library of the Technische Acknowledgements This work has been supported by the European Union's Horizon 2020 research and innovation programme under grant 770299 (NewsEye). ...

doi:10.5281/zenodo.4107117 fatcat:6ongky6p5rab7gvtawnjmp2ofm

In this paper, we present a work in progress for location disambiguation in news documents that uses a vector-semantic representation learned from information sources that include events and geographic ... Linking these locations to coordinates in a map usually requires two steps involving the named entity: extraction and disambiguation. ... The locations were provided by CentroGeo based on OpenNLP's Named Entity Recognition (NER) module [1]. ...

doi:10.29007/pl5h fatcat:3trhftivlbcdbaubwiw5vbhaiy

We expect that this survey will be useful to NLP researchers interested in building equitable language technologies by rethinking LLM benchmarks and model architectures. ... We observe that past work in NLP concerning dialects goes deeper than mere dialect classification, and ... This approach uses two neural parsers, which are modified with the word embeddings used for initialisation. The word embeddings are trained on the standard and the dialect-specific datasets. ...

arXiv:2401.05632v2 fatcat:qkpfmywh2bar7capolzkfx6guu

embeddings, probabilities, and generated text. ... Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: ... Sentence Embedding Metrics Instead of using static word embeddings, LLMs use embeddings learned in the context of a sentence, and are more appropriately paired with embedding metrics for sentence-level ...

arXiv:2309.00770v2 fatcat:idqaltzjdndnhnwcgd7dsqcejm

In domains like information retrieval, words have classically been modeled as discrete entities using 1-of-n encoding, a representation that elides most of a word's syntactic and semantic structure. ... Recent research, however, has begun exploring more robust representations called word embeddings. ... named entity recognition and linking tasks. ...

fatcat:2doedhxumbf2vcq4mpm47wmm6a

Adaptive Geoparsing Method for Toponym Recognition and Resolution in Unstructured Text

Recognition of Named Entities in Spanish Texts [chapter]

Deep Learning for Toponym Resolution: Geocoding Based on Pairs of Toponyms

Resolving Ambiguities in Toponym Recognition in Cartographic Maps [chapter]

A Pragmatic Guide to Geoparsing Evaluation [article]

Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks

Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases [article]

Automated Travel History Extraction from Clinical Notes: Algorithm Development and Validation for Emergent Infectious Disease Events (Preprint)

Text-Based Twitter User Geolocation Prediction

Microblog topic identification using Linked Open Data

Book of Abstracts of the Digital Humanities in the Nordic Countries 5th conference. Riga, 20–23 October 2020 [article]

A Vector Semantics Approach to the Geoparsing Disambiguation Task for Texts in Spanish

Natural Language Processing for Dialects of a Language: A Survey [article]

Bias and Fairness in Large Language Models: A Survey [article]

Modeling words for online sexual behavior surveillance and clinical text information extraction