short-paper

Measuring information gain using provenance

Authors:
Shemon Rawat (University of Cincinnati),
Seokki Lee (University of Cincinnati),
Taeho Jung (University of Notre Dame)

TaPP '22: Proceedings of the 14th International Workshop on the Theory and Practice of Provenance, June 2022, Article No.: 7, Pages 1–4, https://doi.org/10.1145/3530800.3534534

Published: 12 June 2022

ABSTRACT

In recent years, large amounts of data have been collected from multiple sources, and the demand for analyzing these data has increased enormously. Data sharing is a valuable part of this data-intensive and collaborative environment because multi-modal datasets generated from different sources create synergies and added value. In this work, we introduce a technique for quantifying the degree of information gain (IG) that may be obtained through data sharing. Our method captures both where-provenance (to compute the IG over values) and how-provenance (to find matching records) and accurately computes the IG based on them. We conduct a preliminary evaluation to show the runtime of our approach over a real-world dataset.
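
The abstract does not give the IG formula, so the following is only a toy sketch of the general idea, not the authors' method. It assumes that records are matched on a hypothetical key attribute (standing in for the record matching that how-provenance supports) and that gain is counted per cell value (standing in for the value-level view that where-provenance supports). The function name `information_gain`, the example tables, and the "fraction of new cells" measure are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def information_gain(own: List[Record], shared: List[Record],
                     key: Tuple[str, ...]) -> float:
    """Toy measure: fraction of non-key cells in `shared` that are new
    to the party holding `own`. Illustrative only."""
    # Index the receiver's records by key so matching is a dictionary lookup.
    own_by_key = {tuple(r[k] for k in key): r for r in own}
    total = 0
    new = 0
    for rec in shared:
        match = own_by_key.get(tuple(rec[k] for k in key))
        for attr, value in rec.items():
            if attr in key:
                continue  # key attributes only identify the record
            total += 1
            # A cell is new if the record is unmatched or the value differs.
            if match is None or match.get(attr) != value:
                new += 1
    return new / total if total else 0.0


# Hypothetical data: what the receiver already holds vs. what is offered.
own = [
    {"id": 1, "city": "Cincinnati", "age": 30},
    {"id": 2, "city": "Dayton", "age": 41},
]
shared = [
    {"id": 1, "city": "Cincinnati", "age": 31},
    {"id": 3, "city": "South Bend", "age": 25},
]

ig = information_gain(own, shared, key=("id",))
print(ig)
```

In this sketch, a shared table identical to the receiver's own table yields a gain of 0.0, while unmatched records contribute all of their non-key cells as new.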

Index Terms

Information systems → Data management systems → Database design and models → Data model extensions → Data provenance

Published in
TaPP '22: Proceedings of the 14th International Workshop on the Theory and Practice of Provenance
June 2022
67 pages
ISBN: 9781450393492
DOI: 10.1145/3530800
Conference Chairs: Adriane Chapman (University of Southampton), Daniel Deutch (Tel Aviv University), Tanu Malik (DePaul University)
Copyright © 2022 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
how and where provenance
information gain
Qualifiers
- short-paper
Acceptance Rates
TaPP '22 paper acceptance rate: 10 of 17 submissions (59%). Overall acceptance rate: 10 of 17 submissions (59%).