ABSTRACT
The ability to handle uncertain information is becoming increasingly important for modern database applications. Whereas a conventional (certain) object is usually represented by a vector from a multidimensional feature space, an uncertain object is represented by a multivariate probability density function (PDF). This PDF can be defined either discretely (e.g., by a histogram) or continuously in parametric form (e.g., by a Gaussian mixture model). For a database of uncertain objects, users expect data analysis techniques similar to those available for a conventional database of certain objects. An important analysis technique for certain objects is the skyline operator, which finds the vectors that are maximal or minimal with respect to any possible attribute weighting. In this paper, we propose the concept of probabilistic skylines, an extension of the skyline operator to uncertain objects. In addition, we propose efficient and effective methods for determining the probabilistic skyline of uncertain objects that are defined by a PDF in parametric form (e.g., a Gaussian function or a Gaussian mixture model). To further accelerate the search, we describe how the computation of the probabilistic skyline can be supported by an index structure for uncertain objects. An extensive experimental evaluation demonstrates both the effectiveness and the efficiency of our technique.
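To make the two notions in the abstract concrete, the following is a minimal Python sketch, not the method proposed in the paper. It assumes that smaller attribute values are preferred, that each uncertain object is a single axis-aligned Gaussian given as a pair of mean and standard-deviation tuples, and that skyline membership probabilities are estimated by plain Monte Carlo sampling rather than by the paper's techniques or any index structure. The names `dominates`, `skyline`, and `skyline_probabilities` are hypothetical.

```python
import random


def dominates(a, b):
    """Return True if a is no worse than b in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Naive O(n^2) skyline of certain objects: keep every point that
    is not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_probabilities(objects, samples=2000, seed=0):
    """Monte Carlo estimate of each uncertain object's probability of
    belonging to the skyline.

    Assumption: each object is (means, stddevs) describing an
    axis-aligned Gaussian; this is an illustrative stand-in, not the
    evaluation strategy described in the paper.
    """
    rng = random.Random(seed)
    counts = [0] * len(objects)
    for _ in range(samples):
        # Draw one concrete vector per object (one "possible world").
        world = [
            tuple(rng.gauss(m, s) for m, s in zip(means, stddevs))
            for means, stddevs in objects
        ]
        # Count the objects that are undominated in this world.
        for i, p in enumerate(world):
            if not any(dominates(q, p) for j, q in enumerate(world) if j != i):
                counts[i] += 1
    return [c / samples for c in counts]


certain = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
print(skyline(certain))

uncertain = [((0.0, 0.0), (0.1, 0.1)), ((10.0, 10.0), (0.1, 0.1))]
print(skyline_probabilities(uncertain))
```

In the certain case every object is either in the skyline or not. In the uncertain case each object instead receives a membership probability, which is the quantity a probabilistic skyline query would typically compare against a user-chosen threshold.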
Index Terms
- Probabilistic skyline queries