A new fragment re-allocation strategy for NoSQL database systems

  • Research Article
Frontiers of Computer Science

Abstract

NoSQL databases are known for high scalability, high availability, and high fault tolerance, and they are therefore used in many applications. The data partitioning strategy and the fragment allocation strategy directly affect the performance of a NoSQL database system. Large, global databases are partitioned horizontally, vertically, or by a combination of both. In general, the system scatters related fragments as widely as possible to increase the degree of parallelism of operations. In some applications, however, operations are not very complicated, and a single operation may access more than one fragment. At the same time, the fragments accessed by an operation may interact with each other. In such cases, general allocation strategies increase the system’s communication cost when operations execute across sites. To improve the performance of these applications and to let NoSQL database systems work efficiently, their fragments have to be allocated in a way that reduces the communication cost, i.e., minimizes the total volume of data transmitted across sites during the execution of operations. We propose a hypergraph-based fragment clustering strategy that places fragments accessed together in most operations into the same cluster. The method uses a weighted hypergraph to represent the fragment access pattern of operations, and a hypergraph partitioning algorithm to cluster the fragments. This reduces the number of sites that an operation has to span, and therefore the communication cost across sites. Experimental results confirm that the proposed technique contributes effectively to solving the fragment re-allocation problem in a specific application environment of NoSQL database systems.
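The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the partitioning algorithm, so a simple greedy merge stands in for a real hypergraph partitioner. The cost model (weighted number of extra sites spanned per operation), the `capacity` limit, and all function and fragment names are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def build_hypergraph(operations):
    """Weighted hypergraph: each distinct fragment set accessed by an
    operation is a hyperedge; its weight is how often it occurs."""
    edges = Counter()
    for fragments in operations:
        edges[frozenset(fragments)] += 1
    return edges


def span_cost(edges, site_of):
    """Assumed cost model: for every hyperedge, weight times the number
    of extra sites the operation has to span (sites touched minus one)."""
    return sum(
        weight * (len({site_of[f] for f in edge}) - 1)
        for edge, weight in edges.items()
    )


def greedy_cluster(edges, capacity):
    """Greedy stand-in for a hypergraph partitioner (hypothetical helper):
    merge the clusters of the most frequently co-accessed fragment pairs
    first, as long as the merged cluster holds at most `capacity` fragments."""
    fragments = sorted({f for edge in edges for f in edge})
    cluster_of = {f: i for i, f in enumerate(fragments)}
    members = {i: [f] for i, f in enumerate(fragments)}

    # Pairwise co-access weight derived from the hyperedges.
    affinity = Counter()
    for edge, weight in edges.items():
        for a, b in combinations(sorted(edge), 2):
            affinity[(a, b)] += weight

    # Visit pairs from strongest to weakest affinity (ties broken by name).
    for (a, b), _ in sorted(affinity.items(), key=lambda kv: (-kv[1], kv[0])):
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb and len(members[ca]) + len(members[cb]) <= capacity:
            for f in members[cb]:
                cluster_of[f] = ca
            members[ca].extend(members.pop(cb))
    return cluster_of


# Hypothetical access log: each tuple lists the fragments one operation touched.
operations = [("f1", "f2")] * 5 + [("f3", "f4")] * 4 + [("f2", "f3")]
edges = build_hypergraph(operations)

# Baseline: every fragment on its own site (fully scattered).
scattered = {f: i for i, f in enumerate(["f1", "f2", "f3", "f4"])}
# Clustered: each resulting cluster would be placed on one site.
clustered = greedy_cluster(edges, capacity=2)

print("scattered cost:", span_cost(edges, scattered))
print("clustered cost:", span_cost(edges, clustered))
```

On this toy log, the scattered placement makes every operation span two sites, while the clustered placement leaves only the rare `f2`/`f3` operation crossing sites. A production system would replace `greedy_cluster` with a dedicated multilevel hypergraph partitioner and a balance constraint suited to the deployment.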

Author information

Corresponding author

Correspondence to Zhikun Chen.

Additional information

Zhikun Chen was born in 1985 and received the MS degree in computer science from National University of Defense Technology, China in 2009. He is currently a PhD student at National University of Defense Technology. His research interests include distributed computing, cloud computing, massive data management, and big data processing.

Shuqiang Yang was born in 1968. He is a PhD advisor in the Computer Department of National University of Defense Technology, China. His research interests include distributed computing, cloud computing, massive data management, data mining, and big data processing.

Shuang Tan was born in 1984 and received the MS degree in computer science from National University of Defense Technology, China in 2009. He is currently a PhD student at National University of Defense Technology. His research interests include cloud security and network security.

Li He was born in 1984 and received the MS degree in computer science from National University of Defense Technology, China in 2009. He is currently a PhD student at National University of Defense Technology. His research interests include network classification, data mining, and microblog analysis.

Hong Yin was born in 1982 and received the MS degree in computer science from National University of Defense Technology, China in 2010. He is currently a PhD student at National University of Defense Technology. His research interests include satellite fault diagnosis, cloud computing, and massive data mining.

Ge Zhang was born in 1974. He is a senior engineer at Beijing Aeronautics Engineering Technology Research Center, China. His research interests include aeronautical electronic engineering and aeronautical data processing.

About this article

Cite this article

Chen, Z., Yang, S., Tan, S. et al. A new fragment re-allocation strategy for NoSQL database systems. Front. Comput. Sci. 9, 111–127 (2015). https://doi.org/10.1007/s11704-014-3480-4

  • DOI: https://doi.org/10.1007/s11704-014-3480-4

Keywords