Abstract
Contemporary and future programming languages for HPC promote hybrid parallelism and shared memory abstractions using a global address space. In this programming style, data races occur easily and are notoriously hard to find. Previous work on data race detection for shared memory programs reports 10X-100X slowdowns for non-scientific programs, while previous work on distributed memory programs instruments only communication operations. In this paper we present the first complete implementation of data race detection at scale for UPC programs. Our implementation tracks local and global memory references in the program and uses two techniques to reduce the overhead: 1) hierarchical function-level and instruction-level sampling; and 2) exploiting the runtime persistence of aliasing and locality specific to Partitioned Global Address Space (PGAS) applications. The results indicate that both techniques are required in practice: well-optimized instruction sampling introduces overheads as high as 6500% (65X slowdown), while each technique in isolation is able to reduce it to 1000% (10X slowdown). When the optimizations are applied together, our tool finds all previously known data races in our benchmark programs with at most 50% overhead. Furthermore, while previous results illustrate the benefits of function-level sampling, our experience shows that this technique does not work for scientific programs: instruction sampling or a hybrid approach is required.
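To make the idea of hierarchical sampling concrete, the following is a minimal Python sketch, not the tool's actual implementation. It assumes a fixed probability at each level: one decision per function invocation, then one decision per memory access inside a sampled invocation. The class `HierarchicalSampler`, the helper `run_demo`, and the rates used are all hypothetical names and values chosen for illustration; the abstract does not specify how the real rates are chosen or adapted.

```python
import random


class HierarchicalSampler:
    """Illustrative two-level sampler (hypothetical, not the paper's code).

    Level 1 picks which function invocations are instrumented at all.
    Level 2 picks which memory accesses inside those invocations are logged.
    """

    def __init__(self, func_rate, instr_rate, seed=0):
        self.func_rate = func_rate      # assumed fixed probability per invocation
        self.instr_rate = instr_rate    # assumed fixed probability per access
        self.rng = random.Random(seed)  # seeded so runs are reproducible
        self.logged = []                # (address, is_write) pairs kept for analysis

    def enter_function(self):
        # Function-level decision, made once per invocation.
        return self.rng.random() < self.func_rate

    def on_access(self, func_sampled, addr, is_write):
        # Instruction-level decision, only reached in sampled invocations.
        if func_sampled and self.rng.random() < self.instr_rate:
            self.logged.append((addr, is_write))


def run_demo(calls=1000, accesses_per_call=100,
             func_rate=0.1, instr_rate=0.2, seed=0):
    """Simulate a loop-heavy workload and return (logged, total) access counts."""
    sampler = HierarchicalSampler(func_rate, instr_rate, seed)
    for call in range(calls):
        sampled = sampler.enter_function()
        for i in range(accesses_per_call):
            addr = call * accesses_per_call + i
            sampler.on_access(sampled, addr, is_write=(i % 2 == 0))
    return len(sampler.logged), calls * accesses_per_call


if __name__ == "__main__":
    logged, total = run_demo()
    print(f"logged {logged} of {total} accesses")
```

With these assumed rates, roughly `func_rate * instr_rate` of all accesses are expected to be logged, which is how combining the two levels can cut instrumentation cost well below what either level achieves alone. A real detector would also need the second technique named in the abstract, which this sketch does not model.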
Scalable data race detection for partitioned global address space programs
PPoPP '13: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming