Research Article | Open Access

Efficient Auto-Tuning of Parallel Programs with Interdependent Tuning Parameters via Auto-Tuning Framework (ATF)

Published: 20 January 2021

Abstract

Auto-tuning is a popular approach to program optimization: it automatically finds good configurations of a program’s so-called tuning parameters, whose values are crucial for achieving high performance on a particular parallel architecture and for particular characteristics of input/output data. We present three new contributions of the Auto-Tuning Framework (ATF) that enable a key advantage in general-purpose auto-tuning: efficiently optimizing programs whose tuning parameters have interdependencies among them. We contribute to the three main phases of general-purpose auto-tuning as follows: (1) ATF generates the search space of interdependent tuning parameters at high speed by efficiently exploiting parameter constraints; (2) ATF stores such search spaces memory-efficiently, based on a novel chain-of-trees search space structure; (3) ATF explores these search spaces faster by employing a multi-dimensional search strategy on its chain-of-trees representation. Our experiments demonstrate that, compared to state-of-the-art general-purpose auto-tuning frameworks, ATF substantially improves generating, storing, and exploring the search spaces of interdependent tuning parameters, thereby enabling an efficient overall auto-tuning process for important applications from popular domains, including stencil computations, linear algebra routines, quantum chemistry computations, and data mining algorithms.
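The kind of parameter interdependency and constraint exploitation described in point (1) of the abstract can be illustrated with a small sketch. This is not ATF’s actual interface; the parameter names (a tile size and a cache-block size) and the divisibility constraints between them are assumptions made for the example. The point it shows is that restricting each parameter’s range by the values already chosen for the parameters it depends on avoids ever enumerating invalid combinations, in contrast to filtering the full cross product afterwards:

```python
# Illustrative sketch (not the actual ATF API): generating a search space of
# two interdependent tuning parameters by exploiting their constraints during
# generation, rather than filtering the full cross product afterwards.

def divisors(n):
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]

INPUT_SIZE = 1024  # assumed problem size for this example

# Naive approach: enumerate the full cross product of both parameter ranges,
# then filter out invalid combinations (1024 * 1024 candidate pairs inspected).
naive = [(t, c)
         for t in range(1, INPUT_SIZE + 1)
         for c in range(1, INPUT_SIZE + 1)
         if INPUT_SIZE % t == 0 and t % c == 0]

# Constraint-aware generation: the range of each parameter is restricted by
# the values already chosen for the parameters it depends on, so invalid
# combinations are never enumerated (only the 66 valid pairs are visited).
constrained = [(t, c)
               for t in divisors(INPUT_SIZE)  # tile size must divide input
               for c in divisors(t)]          # cache block must divide tile

assert constrained == naive
```

Under this (assumed) constraint structure, the valid combinations form exactly the per-parameter trees that a chain-of-trees representation would store: each tile-size value is a node whose children are the cache-block values valid under it.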




Published in:

ACM Transactions on Architecture and Code Optimization, Volume 18, Issue 1 (March 2021), 402 pages.
ISSN: 1544-3566 | EISSN: 1544-3973 | DOI: 10.1145/3446348

          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 May 2020
• Revised: 1 September 2020
• Accepted: 1 September 2020
• Published: 20 January 2021

Published in TACO Volume 18, Issue 1


          Qualifiers

          • research-article
          • Research
          • Refereed
