DOI: 10.1145/3530800.3534534
short-paper

Measuring information gain using provenance

Published: 12 June 2022

ABSTRACT

In recent years, large amounts of data have been collected from multiple sources, and the demand for analyzing these data has increased enormously. Data sharing is a valuable part of this data-intensive, collaborative environment because multi-modal datasets generated from different sources create synergies and added value. In this work, we introduce a technique for quantifying the degree of information gain (IG) that may be obtained through data sharing. Our method captures both where-provenance (to compute the IG over values) and how-provenance (to find matching records) and accurately computes the IG based on them. We conduct a preliminary evaluation to show the runtime of our approach over a real-world dataset.
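
The abstract does not define the IG measure, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It assumes records are matched on a shared key attribute (a simple stand-in for the record matching that how-provenance supports) and that gain is counted per cell value (a stand-in for the value-level view that where-provenance supports). The function name `information_gain`, the `key` parameter, and the sample data are all hypothetical.

```python
from typing import Any

Record = dict[str, Any]


def information_gain(own: list[Record], shared: list[Record], key: str) -> float:
    """Fraction of non-null cells in `shared` that `own` does not already hold.

    Assumption: two records refer to the same entity when their `key`
    values are equal. This is a toy matching rule, not provenance-based.
    """
    own_by_key = {rec[key]: rec for rec in own}
    total = 0   # non-null, non-key cells offered by the sharing party
    gained = 0  # cells whose value is new to the receiving party
    for rec in shared:
        match = own_by_key.get(rec[key])
        for attr, value in rec.items():
            if attr == key or value is None:
                continue
            total += 1
            # A cell counts as gain if the record is unmatched, or if the
            # matched record lacks this value (missing or different).
            if match is None or match.get(attr) != value:
                gained += 1
    return gained / total if total else 0.0


own_data = [
    {"id": 1, "age": 30, "city": None},
    {"id": 2, "age": 41, "city": "Oslo"},
]
shared_data = [
    {"id": 1, "age": 30, "city": "Rome"},
    {"id": 3, "age": 25, "city": None},
]
ig = information_gain(own_data, shared_data, key="id")
```

In this toy input, the shared data offers three non-null cells, of which two (the city for record 1 and the age for the unmatched record 3) are new to the receiving party.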


    • Published in

      TaPP '22: Proceedings of the 14th International Workshop on the Theory and Practice of Provenance
      June 2022
      67 pages
      ISBN: 9781450393492
      DOI: 10.1145/3530800

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery, New York, NY, United States


      Acceptance Rates

      TaPP '22 paper acceptance rate: 10 of 17 submissions (59%). Overall acceptance rate: 10 of 17 submissions (59%).