Piloting a deceased subject integrated data repository and protecting privacy of relatives
2014
AMIA Annual Symposium Proceedings
We report on creation of the deceased subject Integrated Data Repository (dsIDR) at National Institutes of Health, Clinical Center and a pilot methodology to remove secondary protected health information ...
We characterize available structured coded data in dsIDR and report the estimated frequencies of secondary PxI, ranging from 12.9% (sensitive token presence) to 1.1% (using stricter criteria). ...
Acknowledgments: This work has been supported by intramural research funds from the NIH Clinical Center and the National Library of Medicine. ...
pmid:25954378
pmcid:PMC4420001
fatcat:xesdrfhaxnb27krmwblhvyg6k4
Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes
2020
npj Digital Medicine
To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed a customizable open-source de-identification software called Philter ("Protected ...
of data available in structured medical record data. ...
We developed a privacy-centric approach to removing PHI from free-text clinical notes using both rule-based and statistical natural language processing (NLP) approaches. ...
doi:10.1038/s41746-020-0258-y
pmid:32337372
pmcid:PMC7156708
fatcat:loudfjimwra4zkiddjima3s2xy
Review: Privacy-preservation in the context of Natural Language Processing
2021
IEEE Access
This raises a lot of concerns about the ways the data is acquired and the potential information leaks. ...
A considerable portion of user-contributed data is in natural language, and in the past few years, many researchers have proposed NLP-based methods to address these data privacy challenges. ...
[29] investigated the use of a frequency-filtering approach where they filter out rare sentences (frequency < 3) and sentences containing bigrams under a certain frequency threshold (frequency < 256 ...
doi:10.1109/access.2021.3124163
fatcat:pb5kf2yv7jbyfhtmpxwvrmtfdm
Biomedical Natural Language Processing. Kevin Bretonnel Cohen and Dina Demner-Fushman (University of Colorado School of Medicine, and National Library of Medicine). John Benjamins Publishing (Book series on Natural Language Processing, edited by Ruslan Mitkov, volume 11), 2014, 160 pp; hardbound, ISBN 978-90-272-4997-5
2017
Computational Linguistics
Acknowledgments The work presented in this paper comes from a 3 year project (ALADIN) started in 2009 and funded by the French Agence Nationale de la Recherche (National Research Agency -ANR) in the context ...
Acknowledgments We would like to thank all members of our research group, IT for Health, for their support and input. ...
Materials and Method The ever-increasing amount of biomedical (molecular biology, genetics, proteomics) and clinical data repositories increase in a dramatic manner. ...
doi:10.1162/coli_r_00281
fatcat:6abwqppd3bgyvkzfc2mq6kzo6u
Roadmap to a Comprehensive Clinical Data Warehouse for Precision Medicine Applications in Oncology
2017
Cancer Informatics
A cornerstone for these programs is the establishment of enterprise-wide Clinical Data Warehouses. ...
information originating from data sources, including Electronic Medical Records, Clinical Trial Management Systems, Tumor Registries, Biospecimen Repositories, Radiology and Pathology archives, and Next ...
RSD and LR managed administrative and clinical adoption. SG, KH, LAG, ES, and GR contributed in data elements selection and clinical evaluation. ...
doi:10.1177/1176935117694349
pmid:28469389
pmcid:PMC5392017
fatcat:azzuk5zt2zh7tjynwituv4khdy
De-identifying Spanish medical texts - named entity recognition applied to radiology reports
2021
Journal of Biomedical Semantics
Alongside, we developed a randomization algorithm to substitute the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data ...
Background Medical texts such as radiology reports or electronic health records are a powerful source of data for researchers. ...
Acknowledgements We would like to thank the Medical Image Bank of the Valencian Community, from which the data used in this publication come from. ...
doi:10.1186/s13326-021-00236-2
pmid:33781334
pmcid:PMC8006627
fatcat:qi2ra7z7frhhbdvcprevcklbta
De-identification of clinical free text using natural language processing: A systematic review of current approaches
[article]
2023
arXiv
pre-print
De-identification, i.e. the process of removing PHI is a critical step in making EHR data accessible. ...
Objectives: Our study aims to provide systematic evidence on how the de-identification of clinical free text has evolved in the last thirteen years, and to report on the performances and limitations of ...
., “A Frequency-based Strategy of Obtaining Sentences from Clinical Data Repository for Crowdsourcing,” in Medinfo 2015: Ehealth-Enabled Health, I. N. Sarkar, A. Georgiou, and P. M. D. ...
arXiv:2312.03736v1
fatcat:gd5oci3z7nbd3bpmvmxt6unbry
Building a Best-in-Class De-identification Tool for Electronic Medical Records Through Ensemble Learning
[article]
2020
medRxiv
pre-print
We evaluated the system with a publicly available dataset of 515 notes from the I2B2 2014 de-identification challenge and a dataset of 10,000 notes from the Mayo Clinic. ...
The results indicated a recall of 0.992 and 0.994 and a precision of 0.979 and 0.967 on the I2B2 and the Mayo Clinic data, respectively. ...
that was used for testing the performance of the system, the Mayo Data Team of Ahmed Hadad, Connie Nehls and Salena Tong for preparing and helping us understand the Mayo EHR data and Andy Danielsen for ...
doi:10.1101/2020.12.22.20248270
fatcat:z2akx4pz5fcopa5towq7h4w6py
Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis
2015
IMIA Yearbook of Medical Informatics
We conducted a literature review of clinical NLP research from 2008 to 2014, emphasizing recent publications (2012-2014), based on PubMed and ACL proceedings as well as relevant referenced publications ...
We present a review of recent advances in clinical Natural Language Processing (NLP), with a focus on semantic analysis and key subtasks that support such analysis. ...
runtime and complexity to support knowledge discovery efforts from a large-scale clinical repository. ...
doi:10.15265/iy-2015-009
pmid:26293867
pmcid:PMC4587060
fatcat:4cqwat2q2jhkphvrfytfowmyge
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
[article]
2020
arXiv
pre-print
The Pile is constructed from 22 diverse high-quality subsets – both existing and newly constructed – many of which derive from academic or professional sources. ...
Through an in-depth exploratory analysis, we document potentially concerning aspects of the data for prospective users. We make publicly available the code used in its construction. ...
Terms of Service (ToS) compliant data is data which is obtained and used in a fashion that is known to be consistent with the terms of service of the data host. ...
arXiv:2101.00027v1
fatcat:74dgmcl55rdupks3kzygosjlca
Transfer language space with similar domain adaptation: a case study with hepatocellular carcinoma
2022
Journal of Biomedical Semantics
Background Transfer learning is a common practice in image classification with deep learning where the available data is often limited for training a complex model with millions of parameters. ...
Conclusion We conclude that transfer learning along with fine-tuning the discriminative model is often more effective for performing shared targeted tasks than the training for a language space from scratch ...
To obtain a representative sample of benign cases from the MR studies (which represented 1% of the LI-RADS coded reports), two radiologists manually annotated 537 benign cases from the EUH MRI dataset. ...
doi:10.1186/s13326-022-00262-8
pmid:35197110
pmcid:PMC8867666
fatcat:zsh5chrib5am5i6xnrrxoludye
Speech motor control in fluent and dysfluent speech production of an individual with apraxia of speech and Broca's aphasia
2007
Clinical Linguistics & Phonetics
In the first analysis, kinematic and coordination data from error-free fluent speech samples were compared to the same type of data from a group of six age-matched control speakers (males & females). ...
In this study, movement data from lips, jaw and tongue were acquired using the AG-100 EMMA system from a relatively young individual with apraxia of speech (AOS) and Broca's aphasia. ...
Acknowledgements This study was supported by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC), awarded to the first author. The authors wish to thank Dr. ...
doi:10.1080/02699200600812331
pmid:17364624
fatcat:5a3wl55fjraupnxhaomgqlcsvi
Datasets for Large Language Models: A Comprehensive Survey
[article]
2024
arXiv
pre-print
Information from 20 dimensions is incorporated into the dataset statistics. The total data size surveyed surpasses 774.5 TB for pre-training corpora and 700M instances for other datasets. ...
Additionally, a comprehensive review of the existing available dataset resources is also provided, including statistics from 444 datasets, covering 8 language categories and spanning 32 domains. ...
Smashwords is a large repository of free ebooks, containing over 500K electronic books. ...
arXiv:2402.18041v1
fatcat:lizkw5xllrfthn6efbggy7jhh4
Grafting acoustic instruments and signal processing: Creative control and augmented expressivity
2013
Journal of the Acoustical Society of America
It is traditionally a participatory form of music with no distinction between performers and audience, a characteristic that makes for acoustical requirements that differ considerably from those of a concert ...
Sacred Harp singing, a common type of shape-note singing, is a centuries-old tradition of American community choral music. ...
from a clinical scanner. ...
doi:10.1121/1.4830794
fatcat:2sexsmbypjcozjctnorzoiy7te
The neurology of syntax: Language use without Broca's area
2000
Behavioral and Brain Sciences
After several decades of the study of language and the brain from a linguistic angle, there is now a relatively dense body of facts that can be seriously evaluated. ...
An outlook on language derived from current linguistic theory can lead to a new and more precise picture of language and the brain. ...
I would like to thank Michal Ben-Shachar for her invaluable comments and help and Danny Fox for saving me from several pitfalls. ...
doi:10.1017/s0140525x00002399
fatcat:gqirknf7f5cvpc5vjyz3yv6gmq