Short Tail: taming tail latency for erasure-code-based in-memory systems

Teng, Yun; Li, Zhiyue; Huang, Jing; Zhang, Guangyan

doi:10.1631/FITEE.2100566

Published: 01 June 2022

Volume 23, pages 1646–1657, (2022)

Frontiers of Information Technology & Electronic Engineering

Abstract

In-memory systems with erasure coding (EC) enabled are widely used to achieve high performance and data availability. However, as clusters grow, server-level fail-slow faults are becoming increasingly frequent and can cause long tail latency. EC-based systems amplify the effect of long tail latency because a single EC operation may depend on the synchronous completion of multiple sub-operations. In this paper, we propose Short Tail, an EC-enabled in-memory storage system that achieves consistent performance and low latency for both reads and writes. First, Short Tail uses a lightweight request monitor to track the performance of each memory node and identify fail-slow nodes. Second, Short Tail selectively performs degraded reads and redirected writes to avoid accessing fail-slow nodes. Finally, Short Tail adopts an adaptive write strategy to reduce the write amplification of small writes. We implement Short Tail on top of Memcached and compare it with two baseline systems. The experimental results show that Short Tail reduces the P99 tail latency by up to 63.77% and also significantly improves the median and average latency.
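The first two steps in the abstract (monitoring per-node latency, then steering reads away from fail-slow nodes) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the class `RequestMonitor`, the function `plan_read`, the sliding-window size, and the "3x the cluster median" threshold are all hypothetical choices made here for illustration. The sketch assumes an MDS code in which any k of the k + m chunks of a stripe suffice to reconstruct the data.

```python
from collections import defaultdict, deque
from statistics import median


class RequestMonitor:
    """Keeps recent per-node latencies and flags suspected fail-slow nodes."""

    def __init__(self, window=64, slow_factor=3.0):
        # Both parameters are hypothetical tuning knobs, not values from the paper.
        self.slow_factor = slow_factor
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, node, latency_us):
        self.samples[node].append(latency_us)

    def slow_nodes(self):
        # A node is suspected when its median latency is far above the
        # median of all per-node medians.
        per_node = {n: median(s) for n, s in self.samples.items() if s}
        if len(per_node) < 2:
            return set()
        baseline = median(per_node.values())
        return {n for n, m in per_node.items() if m > self.slow_factor * baseline}


def plan_read(data_nodes, parity_nodes, slow):
    """Choose k nodes to read from, avoiding suspected fail-slow nodes.

    Returns (nodes, degraded). degraded is True when parity chunks stand in
    for data chunks, so the client has to decode after reading.
    """
    k = len(data_nodes)
    healthy_data = [n for n in data_nodes if n not in slow]
    if len(healthy_data) == k:
        return healthy_data, False
    healthy_parity = [n for n in parity_nodes if n not in slow]
    chosen = healthy_data + healthy_parity[: k - len(healthy_data)]
    if len(chosen) < k:
        # Too few healthy chunks: fall back to the normal read path.
        return list(data_nodes), False
    return chosen, True


# Usage: a stripe with k = 3 data nodes and m = 2 parity nodes; n1 is slow.
monitor = RequestMonitor()
for node in ["n0", "n1", "n2", "n3", "n4"]:
    for _ in range(10):
        monitor.record(node, 2000 if node == "n1" else 100)

slow = monitor.slow_nodes()
nodes, degraded = plan_read(["n0", "n1", "n2"], ["n3", "n4"], slow)
print(slow, nodes, degraded)
```

In this sketch the read skips the slow data node and fetches one parity chunk instead, trading extra decoding work for not waiting on the slow node. Redirected writes and the adaptive write strategy are not sketched here because the abstract does not describe their mechanics.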

Author information

Authors and Affiliations

College of Computer Science and Technology, Jilin University, Changchun, 130012, China
Yun Teng (滕云) & Jing Huang (黄晶)
Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Zhiyue Li (李之悦) & Guangyan Zhang (张广艳)
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
Yun Teng (滕云) & Jing Huang (黄晶)
Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, 100084, China
Zhiyue Li (李之悦) & Guangyan Zhang (张广艳)

Contributions

Yun TENG designed the research and implemented the prototype system. Yun TENG and Zhiyue LI drafted the paper. Jing HUANG and Guangyan ZHANG helped organize the paper. Yun TENG and Guangyan ZHANG revised and finalized the paper.

Corresponding author

Correspondence to Guangyan Zhang (张广艳).

Additional information

Compliance with ethics guidelines

Yun TENG, Zhiyue LI, Jing HUANG, and Guangyan ZHANG declare that they have no conflict of interest.

Project supported by the National Natural Science Foundation of China (No. 62025203) and the Changchun Key Scientific and Technological Research and Development Project, China (No. 21ZGN30).

About this article

Cite this article

Teng, Y., Li, Z., Huang, J. et al. Short Tail: taming tail latency for erasure-code-based in-memory systems. Front Inform Technol Electron Eng 23, 1646–1657 (2022). https://doi.org/10.1631/FITEE.2100566

Received: 08 December 2021
Accepted: 01 April 2022
Published: 01 June 2022
Issue Date: November 2022
DOI: https://doi.org/10.1631/FITEE.2100566

CLC number

TP302
