
Flash-Based Storage Deduplication Techniques: A Survey

Published: 01 July 2019

Abstract

The exponential growth of the amount of data stored worldwide, together with a high level of data redundancy, motivates the active development of data deduplication techniques. The increasing popularity of solid-state drives (SSDs) as primary storage devices forces deduplication techniques to adapt to the technical peculiarities of this type of storage (such as write amplification and wear-out), which drives active research on data deduplication for SSD-equipped storage. In this survey, the authors summarize recent results on deduplication in SSD-enhanced storage and provide a novel taxonomy of the techniques. They classify the techniques by storage device complexity, from the sub-device level up to the storage network. Linux deduplication implementations are discussed, and the results of an experimental comparison of several widely used tools are presented. Finally, the authors briefly outline open problems in the field and possible directions for future research.

