
Short Tail: taming tail latency for erasure-code-based in-memory systems


Frontiers of Information Technology & Electronic Engineering

Abstract

In-memory systems with erasure coding (EC) enabled are widely used to achieve high performance and data availability. However, as clusters grow, server-level fail-slow problems become increasingly frequent and can create long tail latency. The influence of long tail latency is further amplified in EC-based systems, because a single EC operation may depend on the synchronous completion of multiple sub-operations. In this paper, we propose Short Tail, an EC-enabled in-memory storage system that achieves consistent performance and low latency for both reads and writes. First, Short Tail uses a lightweight request monitor to track the performance of each memory node and identify any fail-slow node. Second, Short Tail selectively performs degraded reads and redirected writes to avoid accessing fail-slow nodes. Finally, Short Tail adopts an adaptive write strategy to reduce the write amplification of small writes. We implement Short Tail on top of Memcached and compare it with two baseline systems. The experimental results show that Short Tail can reduce the P99 tail latency by up to 63.77%; it also significantly improves the median and average latency.
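To make the first two steps of the abstract concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes an MDS code such as Reed-Solomon, where any k of the k+m chunks of a stripe suffice to reconstruct the data. The class `RequestMonitor`, the function `plan_read`, the smoothing weight `alpha`, and the threshold `slow_factor` are all hypothetical names and parameters chosen for this example; the abstract does not specify how Short Tail detects fail-slow nodes or selects chunks.

```python
from statistics import median


class RequestMonitor:
    """Keeps a smoothed latency per node (hypothetical design, not from the paper)."""

    def __init__(self, alpha=0.2, slow_factor=3.0):
        self.alpha = alpha              # weight of the newest sample in the moving average
        self.slow_factor = slow_factor  # a node is "slow" above this multiple of the median
        self.latency = {}               # node id -> smoothed latency

    def record(self, node, sample):
        """Fold one observed request latency into the node's moving average."""
        old = self.latency.get(node)
        if old is None:
            self.latency[node] = sample
        else:
            self.latency[node] = (1 - self.alpha) * old + self.alpha * sample

    def slow_nodes(self):
        """Return nodes whose smoothed latency far exceeds the cluster median."""
        if not self.latency:
            return set()
        cutoff = self.slow_factor * median(self.latency.values())
        return {n for n, lat in self.latency.items() if lat > cutoff}


def plan_read(data_nodes, parity_nodes, slow):
    """Choose k chunk holders for one stripe, avoiding slow nodes where possible.

    Returns (nodes_to_read, degraded). A degraded read substitutes parity
    chunks for data chunks held by slow nodes and must decode afterwards.
    """
    k = len(data_nodes)
    healthy_data = [n for n in data_nodes if n not in slow]
    if len(healthy_data) == k:
        return healthy_data, False      # normal read, no decoding needed
    healthy_parity = [n for n in parity_nodes if n not in slow]
    chosen = healthy_data + healthy_parity[: k - len(healthy_data)]
    if len(chosen) < k:
        return list(data_nodes), False  # too few healthy chunks: fall back to a normal read
    return chosen, True
```

A median-relative threshold is used here only because it needs no absolute latency target; a real system would also have to weigh the decoding cost of a degraded read against the delay of waiting for the slow node.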




Author information

Contributions

Yun TENG designed the research and implemented the prototype system. Yun TENG and Zhiyue LI drafted the paper. Jing HUANG and Guangyan ZHANG helped organize the paper. Yun TENG and Guangyan ZHANG revised and finalized the paper.

Corresponding author

Correspondence to Guangyan Zhang (张广艳).

Additional information

Compliance with ethics guidelines

Yun TENG, Zhiyue LI, Jing HUANG, and Guangyan ZHANG declare that they have no conflict of interest.

Project supported by the National Natural Science Foundation of China (No. 62025203) and the Changchun Key Scientific and Technological Research and Development Project, China (No. 21ZGN30).

About this article

Cite this article

Teng, Y., Li, Z., Huang, J. et al. Short Tail: taming tail latency for erasure-code-based in-memory systems. Front Inform Technol Electron Eng 23, 1646–1657 (2022). https://doi.org/10.1631/FITEE.2100566

  • DOI: https://doi.org/10.1631/FITEE.2100566
