2,759 Hits in 2.9 sec

Detecting Sensitive Information from Textual Documents: An Information-Theoretic Approach [chapter]

David Sánchez, Montserrat Batet, Alexandre Viejo
2012 Lecture Notes in Computer Science  
In this paper, we present a general-purpose method to automatically detect sensitive information from textual documents in a domain-independent way.  ...  In this paper, we tackle the problem of automatic detection of sensitive text for sanitization purposes.  ...  Disclaimer and acknowledgments Authors are solely responsible for the views expressed in this paper, which do not necessarily reflect the position of UNESCO nor commit that organization. This  ... 
doi:10.1007/978-3-642-34620-0_17 fatcat:lcxymnrqmjcubhgcnxh6ozayaa
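The approach described in this entry scores a term's sensitivity by its information content, IC(t) = -log2 p(t), with p(t) estimated from term frequencies in a large corpus (the original work uses Web hit counts). A minimal sketch of that idea follows; the corpus size, term counts, and threshold are illustrative assumptions, not values from the paper.

```python
import math

CORPUS_SIZE = 1_000_000  # hypothetical total term occurrences in the corpus
term_counts = {          # hypothetical corpus frequencies
    "patient": 50_000,
    "hospital": 40_000,
    "haemophilia": 12,   # rare, hence highly informative
}

def information_content(term: str) -> float:
    """IC(t) = -log2(count(t) / corpus size); rarer terms score higher."""
    count = term_counts.get(term, 1)  # smooth unseen terms to a count of 1
    return -math.log2(count / CORPUS_SIZE)

def is_sensitive(term: str, threshold: float = 12.0) -> bool:
    """Flag terms whose information content exceeds the threshold."""
    return information_content(term) > threshold

print(is_sensitive("hospital"))     # common term, not flagged -> False
print(is_sensitive("haemophilia"))  # rare term, flagged -> True
```

Here the threshold plays the role of the sanitizer's sensitivity criterion: common terms carry little information and are kept, while rare, highly informative terms are candidates for redaction.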

Detecting Term Relationships to Improve Textual Document Sanitization

David Sánchez, Montserrat Batet, Alexandre Viejo
2013 Pacific Asia Conference on Information Systems  
In this paper, we present a general-purpose method to automatically detect semantically related terms that may enable disclosure of sensitive data.  ...  Several automatic sanitization mechanisms can be found in the literature; however, most of them evaluate the sensitivity of the textual terms considering them as independent variables.  ...  Our proposal relies on the foundations of information theory and a corpus as global as the Web to offer a general-purpose solution that can be automatically applied to heterogeneous textual documents  ... 
dblp:conf/pacis/0001BV13 fatcat:afuqrvar3zelljp7fmzjnzof24

Automatic Declassification of Textual Documents by Generalizing Sensitive Terms

Veena Vasudevan, Ansamma John
2014 International Journal of Computer Applications  
This paper presents a generalized sanitization method that discovers the sensitive information based on the concept of information content.  ...  So before document publishing, sanitization operations are performed on the document to preserve privacy and, in order, to retain the utility of the document.  ...  CONCLUSION Publishing of textual documents is essential for various purposes, such as research, decision making, and regulatory compliance.  ... 
doi:10.5120/17626-8390 fatcat:g3rkbm7g3rclfaambq75h4t6ti

C-sanitized: A privacy model for document redaction and sanitization

David Sánchez, Montserrat Batet
2015 Journal of the Association for Information Science and Technology  
To do so, human experts are usually requested to redact or sanitize document contents.  ...  The sensitive nature of much of this information causes a serious privacy threat when documents are uncontrollably made available to untrusted third parties.  ...  Its goal is to mimic and, hence, automatize the reasoning of human sanitizers with regard to semantic inferences, disclosure analysis and protection of textual documents.  ... 
doi:10.1002/asi.23363 fatcat:5rq7pnxknjarnavnfxcajfyy6y

Toward sensitive document release with privacy guarantees

David Sánchez, Montserrat Batet
2017 Engineering Applications of Artificial Intelligence  
In fact, textual documents are usually protected manually, in a process known as document redaction or sanitization.  ...  automatic document redaction/sanitization algorithms and offers clear and a priori privacy guarantees on data protection; even though, despite its potential benefits, C-sanitization still presents some limitations  ...  Its goal is to mimic and, hence, automatize the analysis of semantic inferences that human experts perform for document sanitization.  ... 
doi:10.1016/j.engappai.2016.12.013 fatcat:bivycjh7fvfmhkt4djlc6gy2ki

A Review on Text Sanitization

Veena Vasudevan, Ansamma John
2014 International Journal of Computer Applications  
Several semi-automatic and automatic methods are used for identifying sensitive information and thereby sanitizing the document by removing such terms.  ...  So before publishing or sharing documents, the sensitive information should be removed or masked. This is the major goal of text sanitization.  ...  general-purpose knowledge bases/corpora such as the Web.  ... 
doi:10.5120/16749-6916 fatcat:jr22xjtgqbc5didgv4mtsh547q

A Privacy Preserving Data Publishing Middleware for Unstructured, Textual Social Media Data

Prasadi Abeywardana, Uthayasanker Thayasivam
2020 International Conference on Language Resources and Evaluation  
Privacy is going to be an integral part of data science and analytics in the coming years.  ...  Privacy preservation becomes more challenging, especially in the context of unstructured data.  ...  The purpose of this research is to come up with a framework to sanitize data and preserve privacy, which can be utilized before publishing textual social media data to any analytical third party.  ... 
dblp:conf/lrec/AbeywardanaT20 fatcat:j6vxa2j34ncqbdf26cmeavgtxy

Text Sanitization Beyond Specific Domains: Zero-Shot Redaction Substitution with Large Language Models [article]

Federico Albanese and Daniel Ciolek and Nicolas D'Ippolito
2023 arXiv   pre-print
In the context of information systems, text sanitization techniques are used to identify and remove sensitive data to comply with security and regulatory requirements.  ...  generality and requiring customization for each desirable domain.  ...  A general-purpose sanitization method exploiting knowledge bases to compute term frequency for sensitive term substitution is proposed in [27].  ... 
arXiv:2311.10785v1 fatcat:nzcgiepdcbdfzj7z4miktmi26e

An Information Retrieval Approach to Document Sanitization [chapter]

David F. Nettleton, Daniel Abril
2014 Studies in Computational Intelligence  
In this paper we use information retrieval metrics to evaluate the effect of a document sanitization process, measuring information loss and risk of disclosure.  ...  In order to sanitize the documents we have developed a semiautomatic anonymization process following the guidelines of Executive Order 13526 (2009) of the US Administration.  ...  The work contributed by the second author was carried out as part of the Computer Science Ph.D. program of the Universitat Autònoma de Barcelona (UAB).  ... 
doi:10.1007/978-3-319-09885-2_9 fatcat:26c2tedf65g75ad26bwrdqwv4q

Privacy-preserving data outsourcing in the cloud via semantic data splitting

David Sánchez, Montserrat Batet
2017 Computer Communications  
cloud storage locations we need; to show its potential and generality, we have applied it to the least structured and most challenging data type: plain textual documents.  ...  We propose a semantically-grounded data splitting mechanism that is able to automatically detect pieces of data that may cause privacy risks and split them on local premises, so that each chunk does not  ...  As far as we know, the only privacy model that fits with this scenario and these requirements is C-sanitization [25, 26], a general privacy model for (textual) document sanitization.  ... 
doi:10.1016/j.comcom.2017.06.012 fatcat:3fmwhsr6ijdb3m2zfmi6t5yrwi

Semantify CEUR-WS Proceedings: Towards the Automatic Generation of Highly Descriptive Scholarly Publishing Linked Datasets [chapter]

Francesco Ronzano, Gerard Casamayor del Bosque, Horacio Saggion
2014 Communications in Computer and Information Science  
To foster this trend, in the context of the ESWC2014 Semantic Publishing Challenge, we present a system that automatically generates rich RDF datasets from CEUR-WS workshop proceedings.  ...  Our system is provided as an on-line Web service to support on-the-fly RDF generation.  ...  PHASE 2: Semantic annotator: this component automatically adds semantic annotations to the textual contents of HTML documents without semantic markups.  ... 
doi:10.1007/978-3-319-12024-9_10 fatcat:cymj5lrgszable4csh7gylnzem

Detecting Inconsistencies Between Process Models and Textual Descriptions [chapter]

Han van der Aa, Henrik Leopold, Hajo A. Reijers
2015 Lecture Notes in Computer Science  
To reduce the time and effort needed to repair such situations, this paper presents the first approach to automatically identify inconsistencies between a process model and a corresponding textual description  ...  When considering that hundreds of such descriptions may be in use in a particular organization by dozens of people, using a variety of editors, there is a clear risk that such models become misaligned.  ...  Textual process descriptions generally describe process steps in a chronological order [33].  ... 
doi:10.1007/978-3-319-23063-4_6 fatcat:k7mbvp4rzngtjgrdk7ftrncc3q

Document Sanitization: Measuring Search Engine Information Loss and Risk of Disclosure for the Wikileaks cables [chapter]

David F. Nettleton, Daniel Abril
2012 Lecture Notes in Computer Science  
In order to sanitize the documents we have developed a semi-automatic anonymization process following the guidelines of Executive Order 13526 (2009) of the US Administration, by (i) identifying and anonymizing  ...  In this paper we evaluate the effect of a document sanitization process on a set of information retrieval metrics, in order to measure information loss and risk of disclosure.  ...  By manual inspection of the documents, we can conclude in general that a worse value is due to the loss of key textual information relevant to the query.  ... 
doi:10.1007/978-3-642-33627-0_24 fatcat:ecreun3syjgavcfbnrijo6taba

Development of custom notation for XML-based language: A model-driven approach

Sergej Chodarev, Jaroslav Porubän
2017 Computer Science and Information Systems  
We provide recommendations for application of the approach and demonstrate them on a case study of a language for the definition of graphs.  ...  In spite of its popularity, XML provides poor user experience, and a lot of domain-specific languages can be improved by introducing custom, more human-friendly notation.  ...  This work was supported by projects KEGA 047TUKE-4/2016 "Integrating software processes into the teaching of programming" and FEI-2015-23 "Pattern based domain-specific language development".  ... 
doi:10.2298/csis170116036c fatcat:axuuywgggncw3me4wquik2ui74

De-Identification of French Unstructured Clinical Notes for Machine Learning Tasks [article]

Yakini Tchouka, Jean-François Couchot, Maxime Coulmeau, David Laiymani, Philippe Selles, Azzedine Rahmani
2023 arXiv   pre-print
Unstructured textual data are at the heart of health systems: liaison letters between doctors, operating reports, coding of procedures according to the ICD-10 standard, etc.  ...  The result is an approach that effectively protects the privacy of the patients at the heart of these medical documents.  ...  For example in [36] , the authors present a new state-of-the-art for French Named Entity Recognition in a general purpose context (not medical).  ... 
arXiv:2209.09631v2 fatcat:hummxvhotncnlliprkh776bli4