research-article

Algorithmic improvements for fast concurrent Cuckoo hashing

Authors:
Xiaozhou Li, Princeton University
David G. Andersen, Carnegie Mellon University
Michael Kaminsky, Intel Labs
Michael J. Freedman, Princeton University

EuroSys '14: Proceedings of the Ninth European Conference on Computer Systems, April 2014, Article No.: 27, Pages 1–14, https://doi.org/10.1145/2592798.2592820

Published: 14 April 2014

ABSTRACT

Fast concurrent hash tables are an increasingly important building block as we scale systems to greater numbers of cores and threads. This paper presents the design, implementation, and evaluation of a high-throughput and memory-efficient concurrent hash table that supports multiple readers and writers. The design arises from careful attention to systems-level optimizations such as minimizing critical section length and reducing interprocessor coherence traffic through algorithm re-engineering. As part of the architectural basis for this engineering, we include a discussion of our experience and results adopting Intel's recent hardware transactional memory (HTM) support to this critical building block. We find that naively allowing concurrent access using a coarse-grained lock on existing data structures reduces overall performance with more threads. While HTM mitigates this slowdown somewhat, it does not eliminate it. Algorithmic optimizations that benefit both HTM and designs for fine-grained locking are needed to achieve high performance.

Our performance results demonstrate that our new hash table design, based around optimistic cuckoo hashing, outperforms other optimized concurrent hash tables by up to 2.5x for write-heavy workloads, even while using substantially less memory for small key-value items. On a 16-core machine, our hash table executes almost 40 million insert and more than 70 million lookup operations per second.
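
As background for readers unfamiliar with the underlying data structure, the following is a minimal single-threaded sketch of basic two-table cuckoo hashing. It is not the concurrent, optimistic design the paper presents, and it includes no locking or HTM. The class name `CuckooTable`, the `max_kicks` bound, the salted BLAKE2b hashing, and the doubling resize policy are all assumptions made for illustration.

```python
import hashlib


class CuckooTable:
    """Single-threaded two-table cuckoo hash (illustrative sketch only)."""

    def __init__(self, capacity=16, max_kicks=32):
        self.capacity = capacity      # slots per table (assumed parameter)
        self.max_kicks = max_kicks    # displacement bound before resizing
        # Each slot holds one (key, value) pair or None.
        self.tables = [[None] * capacity, [None] * capacity]

    def _index(self, key, which):
        # Derive one hash per table by salting with the table number.
        digest = hashlib.blake2b(repr(key).encode(), salt=bytes([which])).digest()
        return int.from_bytes(digest[:8], "little") % self.capacity

    def get(self, key, default=None):
        # A key can only live in one of its two candidate slots,
        # so a lookup inspects at most two locations.
        for which in (0, 1):
            slot = self.tables[which][self._index(key, which)]
            if slot is not None and slot[0] == key:
                return slot[1]
        return default

    def put(self, key, value):
        # Overwrite in place if the key is already stored.
        for which in (0, 1):
            i = self._index(key, which)
            slot = self.tables[which][i]
            if slot is not None and slot[0] == key:
                self.tables[which][i] = (key, value)
                return
        item, which = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._index(item[0], which)
            # Place the item and pick up whatever occupied the slot.
            item, self.tables[which][i] = self.tables[which][i], item
            if item is None:
                return
            which = 1 - which  # the evicted item goes to its other table
        # Displacement chain too long: double the capacity and reinsert
        # every stored pair plus the one still in hand.
        pending = [item] + [s for t in self.tables for s in t if s is not None]
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k, v in pending:
            self.put(k, v)
```

A short usage example under the same assumptions:

```python
table = CuckooTable()
table.put("alpha", 1)
table.put("beta", 2)
print(table.get("alpha"), table.get("missing", "n/a"))
```

Because every insert may relocate existing items, a naive concurrent version has to protect the whole displacement chain, which is one way to see why the abstract stresses short critical sections.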

Published in
EuroSys '14: Proceedings of the Ninth European Conference on Computer Systems
April 2014
388 pages
ISBN: 9781450327046
DOI: 10.1145/2592798
General Chairs: Dick Bultermann (CWI), Herbert Bos (Vrije Universiteit Amsterdam)
Program Chairs: Ant Rowstron (Microsoft Research Cambridge), Peter Druschel (MPI SWS)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
EuroSys '14 Paper Acceptance Rate: 27 of 147 submissions, 18%
Overall Acceptance Rate: 241 of 1,308 submissions, 18%