Detecting Sensitive Information from Textual Documents: An Information-Theoretic Approach
[chapter]
2012
Lecture Notes in Computer Science
In this paper, we present a general-purpose method to automatically detect sensitive information from textual documents in a domain-independent way. ...
In this paper, we tackle the problem of automatic detection of sensitive text for sanitization purposes. ...
Disclaimer and acknowledgments Authors are solely responsible for the views expressed in this paper, which do not necessarily reflect the position of UNESCO nor commit that organization. This ...
doi:10.1007/978-3-642-34620-0_17
fatcat:lcxymnrqmjcubhgcnxh6ozayaa
Detecting Term Relationships to Improve Textual Document Sanitization
2013
Pacific Asia Conference on Information Systems
In this paper, we present a general-purpose method to automatically detect semantically related terms that may enable disclosure of sensitive data. ...
Several automatic sanitization mechanisms can be found in the literature; however, most of them evaluate the sensitivity of the textual terms considering them as independent variables. ...
Our proposal relies on the foundations of information theory and a corpus as global as the Web to offer a general-purpose solution that can be automatically applied to heterogeneous textual documents ...
dblp:conf/pacis/0001BV13
fatcat:afuqrvar3zelljp7fmzjnzof24
Automatic Declassification of Textual Documents by Generalizing Sensitive Terms
2014
International Journal of Computer Applications
This paper presents a generalized sanitization method that discovers the sensitive information based on the concept of information content. ...
So before document publishing, sanitization operations are performed on the document to preserve privacy and, in order to retain the utility of the document. ...
CONCLUSION: Publishing of textual documents is essential for various purposes, such as research, decision making, and regulatory compliance. ...
doi:10.5120/17626-8390
fatcat:g3rkbm7g3rclfaambq75h4t6ti
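Several of the entries above rely on the notion of information content to decide which terms are sensitive. As a rough illustration (not any paper's exact procedure), the criterion can be sketched in Python; the corpus size and hit counts below are hypothetical:

```python
import math

def information_content(term_count: int, corpus_size: int) -> float:
    """IC(t) = -log2 p(t), where p(t) is the term's relative frequency
    in a (web-scale) reference corpus."""
    if term_count <= 0 or corpus_size <= 0:
        raise ValueError("counts must be positive")
    return -math.log2(term_count / corpus_size)

# Hypothetical web hit counts: the rarer a term, the more information
# it carries, and the more likely it is to be flagged as sensitive.
corpus_size = 1_000_000_000
ic_common = information_content(50_000_000, corpus_size)  # frequent term, low IC
ic_rare = information_content(120, corpus_size)           # rare term, high IC
print(ic_rare > ic_common)
```

A sanitizer built on this idea would redact or generalize terms whose IC exceeds some threshold, on the assumption that highly informative (rare) terms are the ones that identify individuals or disclose confidential facts.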
C-sanitized: A privacy model for document redaction and sanitization
2015
Journal of the Association for Information Science and Technology
To do so, human experts are usually requested to redact or sanitize document contents. ...
The sensitive nature of much of this information causes a serious privacy threat when documents are uncontrollably made available to untrusted third parties. ...
Its goal is to mimic and, hence, automatize the reasoning of human sanitizers with regard to semantic inferences, disclosure analysis and protection of textual documents. ...
doi:10.1002/asi.23363
fatcat:5rq7pnxknjarnavnfxcajfyy6y
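The C-sanitized model assesses, for each entity to protect, whether any remaining term semantically discloses it. A simplified reading of that disclosure test (the probabilities and the co-occurrence example are hypothetical, and the real model adds further machinery) can be sketched as:

```python
import math

def information_content(p: float) -> float:
    """IC(x) = -log2 p(x)."""
    return -math.log2(p)

def discloses(p_c: float, p_t: float, p_ct: float, alpha: float = 1.0) -> bool:
    """Term t is deemed to disclose entity c when the pointwise mutual
    information PMI(c, t) = log2(p(c,t) / (p(c) * p(t))) reaches alpha
    times the information content of c. alpha = 1 demands unequivocal
    disclosure; smaller alpha makes the check stricter."""
    pmi = math.log2(p_ct / (p_c * p_t))
    return pmi >= alpha * information_content(p_c)

# Hypothetical probabilities estimated from web co-occurrence counts
# for a protected entity c and a co-occurring term t.
print(discloses(p_c=0.001, p_t=0.01, p_ct=0.001, alpha=0.5))  # prints True
```

Terms for which the check fires would be removed or generalized, mimicking how a human sanitizer reasons about what a reader could infer.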
Toward sensitive document release with privacy guarantees
2017
Engineering applications of artificial intelligence
In fact, textual documents are usually protected manually, in a process known as document redaction or sanitization. ...
automatic document redaction/sanitization algorithms and offers clear a priori privacy guarantees on data protection; despite its potential benefits, C-sanitization still presents some limitations ...
Its goal is to mimic and, hence, automatize the analysis of semantic inferences that human experts perform for document sanitization. ...
doi:10.1016/j.engappai.2016.12.013
fatcat:bivycjh7fvfmhkt4djlc6gy2ki
A Review on Text Sanitization
2014
International Journal of Computer Applications
Several semi-automatic and automatic methods are used for identifying sensitive information and thereby sanitizing the document by removing such terms. ...
So before publishing or sharing documents, the sensitive information should be removed or masked. This is the major goal of Text sanitization. ...
general-purpose knowledge bases/corpora such as the Web. ...
doi:10.5120/16749-6916
fatcat:jr22xjtgqbc5didgv4mtsh547q
A Privacy Preserving Data Publishing Middleware for Unstructured, Textual Social Media Data
2020
International Conference on Language Resources and Evaluation
Privacy is going to be an integral part of data science and analytics in the coming years. ...
Privacy preservation becomes more challenging, especially in the context of unstructured data. ...
The purpose of this research is to develop a framework to sanitize data and preserve privacy, which can be applied before publishing textual social media data to any analytical third party. ...
dblp:conf/lrec/AbeywardanaT20
fatcat:j6vxa2j34ncqbdf26cmeavgtxy
Text Sanitization Beyond Specific Domains: Zero-Shot Redaction Substitution with Large Language Models
[article]
2023
arXiv
pre-print
In the context of information systems, text sanitization techniques are used to identify and remove sensitive data to comply with security and regulatory requirements. ...
generality and requiring customization for each desired domain. ...
A general-purpose sanitization method exploiting knowledge bases to compute term frequency for sensitive term substitution is proposed in [27] . ...
arXiv:2311.10785v1
fatcat:nzcgiepdcbdfzj7z4miktmi26e
An Information Retrieval Approach to Document Sanitization
[chapter]
2014
Studies in Computational Intelligence
In this paper we use information retrieval metrics to evaluate the effect of a document sanitization process, measuring information loss and risk of disclosure. ...
In order to sanitize the documents we have developed a semiautomatic anonymization process following the guidelines of Executive Order 13526 (2009) of the US Administration. ...
The work contributed by the second author was carried out as part of the Computer Science Ph.D. program of the Universitat Autònoma de Barcelona (UAB). ...
doi:10.1007/978-3-319-09885-2_9
fatcat:26c2tedf65g75ad26bwrdqwv4q
Privacy-preserving data outsourcing in the cloud via semantic data splitting
2017
Computer Communications
cloud storage locations we need; to show its potential and generality, we have applied it to the least structured and most challenging data type: plain textual documents. ...
We propose a semantically-grounded data splitting mechanism that is able to automatically detect pieces of data that may cause privacy risks and split them on local premises, so that each chunk does not ...
As far as we know, the only privacy model that fits with this scenario and these requirements is C-sanitization [25, 26] , a general privacy model for (textual) document sanitization. ...
doi:10.1016/j.comcom.2017.06.012
fatcat:3fmwhsr6ijdb3m2zfmi6t5yrwi
Semantify CEUR-WS Proceedings: Towards the Automatic Generation of Highly Descriptive Scholarly Publishing Linked Datasets
[chapter]
2014
Communications in Computer and Information Science
To foster this trend, in the context of the ESWC2014 Semantic Publishing Challenge, we present a system that automatically generates rich RDF datasets from CEUR-WS workshop proceedings. ...
Our system is provided as an on-line Web service to support on-the-fly RDF generation. ...
PHASE 2: Semantic annotator: this component automatically adds semantic annotations to the textual contents of HTML documents without semantic markups. ...
doi:10.1007/978-3-319-12024-9_10
fatcat:cymj5lrgszable4csh7gylnzem
Detecting Inconsistencies Between Process Models and Textual Descriptions
[chapter]
2015
Lecture Notes in Computer Science
To reduce the time and effort needed to repair such situations, this paper presents the first approach to automatically identify inconsistencies between a process model and a corresponding textual description ...
When considering that hundreds of such descriptions may be in use in a particular organization by dozens of people, using a variety of editors, there is a clear risk that such models become misaligned. ...
Textual process descriptions generally describe process steps in a chronological order [33] . ...
doi:10.1007/978-3-319-23063-4_6
fatcat:k7mbvp4rzngtjgrdk7ftrncc3q
Document Sanitization: Measuring Search Engine Information Loss and Risk of Disclosure for the Wikileaks cables
[chapter]
2012
Lecture Notes in Computer Science
In order to sanitize the documents we have developed a semi-automatic anonymization process following the guidelines of Executive Order 13526 (2009) of the US Administration, by (i) identifying and anonymizing ...
In this paper we evaluate the effect of a document sanitization process on a set of information retrieval metrics, in order to measure information loss and risk of disclosure. ...
By manual inspection of the documents, we can conclude in general that a worse value is due to the loss of key textual information relevant to the query. ...
doi:10.1007/978-3-642-33627-0_24
fatcat:ecreun3syjgavcfbnrijo6taba
Development of custom notation for XML-based language: A model-driven approach
2017
Computer Science and Information Systems
We provide recommendations for application of the approach and demonstrate them on a case study of a language for definition of graphs. ...
In spite of its popularity, XML provides a poor user experience, and many domain-specific languages can be improved by introducing a custom, more human-friendly notation. ...
This work was supported by projects KEGA 047TUKE-4/2016 "Integrating software processes into the teaching of programming" and FEI-2015-23 "Pattern based domainspecific language development". ...
doi:10.2298/csis170116036c
fatcat:axuuywgggncw3me4wquik2ui74
De-Identification of French Unstructured Clinical Notes for Machine Learning Tasks
[article]
2023
arXiv
pre-print
Unstructured textual data are at the heart of health systems: liaison letters between doctors, operating reports, coding of procedures according to the ICD-10 standard, etc. ...
The result is an approach that effectively protects the privacy of the patients at the heart of these medical documents. ...
For example in [36] , the authors present a new state-of-the-art for French Named Entity Recognition in a general purpose context (not medical). ...
arXiv:2209.09631v2
fatcat:hummxvhotncnlliprkh776bli4