An intelligent, uncertainty driven aggregation scheme for streams of ordered sets

Applied Intelligence

Abstract

Data stream management has attracted considerable research attention in recent years, because numerous devices generate huge amounts of data that demand efficient processing to deliver high-quality applications. Data are reported through streams and stored in a number of partitions. Separation techniques facilitate the parallel management of data, while intelligent methods are needed to manage these multiple instances of data. Progressive analytics over huge amounts of data can be adopted to deliver partial responses and, possibly, to save time in the execution of applications. An interesting research domain is the efficient management of queries over multiple partitions. Such queries usually demand responses in the form of ordered sets of objects (e.g., top-k queries). These ordered sets contain objects in ranked order and require novel mechanisms for deriving responses from partial results. In this paper, we study a setting of multiple data partitions and propose an intelligent, uncertainty-driven decision-making mechanism that responds to streams of queries. Our mechanism delivers an ordered set of objects derived from the partial ordered subsets retrieved from each data partition. We envision that a number of query processors are placed in front of each partition and report progressive analytics to a Query Controller (QC). The QC receives queries, assigns tasks to the underlying processors, and decides the right time to deliver the final ordered set to the application. We propose an aggregation model for deriving the final ordered set of objects and a Fuzzy Logic (FL) inference process: a Type-2 FL system that decides when the QC should stop aggregating partial subsets and return the final response to the application. We evaluate the proposed mechanism through a large set of experiments. Our results cover the throughput of the QC, the quality of the final ordered set of objects, and the time required to deliver the final response.
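
To make the QC's role more concrete, the following is a minimal sketch of the general idea and not the mechanism proposed in the paper. It assumes a Borda-style count as a stand-in for the aggregation model, and a crisp "top-k unchanged for a few rounds" rule as a stand-in for the Type-2 FL stopping decision. The names `borda_merge`, `aggregate_until_stable` and `patience` are hypothetical and introduced only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def borda_merge(partial_lists: Iterable[Sequence[str]], k: int) -> List[str]:
    """Merge ranked lists into one top-k list using a Borda-style count.

    Assumption: this replaces the paper's aggregation model, which the
    abstract does not specify.
    """
    scores: Dict[str, int] = defaultdict(int)
    for ranked in partial_lists:
        n = len(ranked)
        for position, obj in enumerate(ranked):
            # The first object of an n-item list earns n points, the last earns 1.
            scores[obj] += n - position
    # Sort by descending score; break ties by object id for determinism.
    ordered = sorted(scores, key=lambda obj: (-scores[obj], obj))
    return ordered[:k]


def aggregate_until_stable(
    rounds: Iterable[Sequence[Sequence[str]]], k: int, patience: int = 2
) -> Tuple[List[str], int]:
    """Merge each round of partial results and stop once the top-k is stable.

    Each round holds one ranked list per query processor. Assumption: the
    crisp stability rule below stands in for the Type-2 FL controller.
    Returns the final top-k list and the number of rounds consumed.
    """
    previous = None
    unchanged = 0
    rounds_used = 0
    for round_lists in rounds:
        rounds_used += 1
        current = borda_merge(round_lists, k)
        # Count consecutive rounds whose merged top-k equals the previous one.
        unchanged = unchanged + 1 if current == previous else 0
        previous = current
        if unchanged >= patience:
            break
    return (previous or []), rounds_used
```

With `patience=2`, the controller returns as soon as two consecutive rounds leave the merged top-k unchanged, so later partial results are never read. This trades some result quality for response time, the same trade-off the abstract attributes to the stopping decision.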

Notes

  1. http://grouplens.org/datasets/movielens/.

  2. http://research.microsoft.com/en-us/projects/urbancomputing/default.aspx#datasets.

Author information

Corresponding author

Correspondence to Kostas Kolomvatsos.

About this article

Cite this article

Kolomvatsos, K. An intelligent, uncertainty driven aggregation scheme for streams of ordered sets. Appl Intell 45, 713–735 (2016). https://doi.org/10.1007/s10489-016-0789-8

  • DOI: https://doi.org/10.1007/s10489-016-0789-8
