Abstract
The National Center for Atmospheric Research released the Community Atmosphere Model version 5.0 (CAM5), a global atmosphere model that aims to provide global climate simulation for meteorological research. Within CAM5, the cloud microphysics scheme is extremely time-consuming, so efficient parallel algorithms are needed to meet the challenges of large-scale, long-duration simulation. Given the wide use of GPUs in science and engineering and NVIDIA's mature and stable CUDA platform, we ported the code to the GPU to accelerate the computation. In this paper, by analyzing the parallelism of the CAM5 cloud microphysics scheme (CAM5 CMS) in different dimensions, we propose corresponding GPU-based one-dimensional (1D) and two-dimensional (2D) parallel acceleration algorithms, of which the 2D algorithm exploits finer-grained parallelism. In addition, we present a method for optimizing data transfer between the CPU and GPU to further improve overall performance. Finally, a GPU version of the CAM5 CMS (GPU-CMS) was implemented. With I/O transfer, GPU-CMS obtains a speedup of 141.69\(\times\) on a single NVIDIA A100 GPU. Without I/O transfer, compared with the baseline performance on a single Intel Xeon E5-2680 CPU core, the 2D acceleration algorithm obtains speedups of 48.75\(\times\), 280.11\(\times\), and 507.18\(\times\) on a single NVIDIA K20, P100, and A100 GPU, respectively.
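To make the difference between the 1D and 2D decompositions concrete, the following minimal sketch compares how many independent work items each exposes and how a flat thread index could be mapped onto a grid cell. It is an illustration under stated assumptions, not the authors' implementation: the names `ncol` (horizontal columns), `nlev` (vertical levels), and `block_size`, the chosen sizes, and the column-fastest index ordering are all hypothetical and do not come from the paper.

```python
"""Illustrative index mapping for 1D vs. 2D decompositions (not the paper's code)."""
from typing import Tuple


def blocks_needed(n_threads: int, block_size: int) -> int:
    """Ceiling division: thread blocks needed to cover n_threads work items."""
    return (n_threads + block_size - 1) // block_size


def work_items_1d(ncol: int) -> int:
    """Assumed 1D decomposition: one thread per horizontal column."""
    return ncol


def work_items_2d(ncol: int, nlev: int) -> int:
    """Assumed 2D decomposition: one thread per (column, level) pair."""
    return ncol * nlev


def thread_to_cell(tid: int, ncol: int) -> Tuple[int, int]:
    """Map a flat thread id to (column, level), column index varying fastest."""
    return tid % ncol, tid // ncol


if __name__ == "__main__":
    ncol, nlev, block_size = 16, 30, 128  # hypothetical sizes, for illustration only
    n1 = work_items_1d(ncol)
    n2 = work_items_2d(ncol, nlev)
    print("1D:", n1, "threads in", blocks_needed(n1, block_size), "block(s)")
    print("2D:", n2, "threads in", blocks_needed(n2, block_size), "block(s)")
    print("thread 17 ->", thread_to_cell(17, ncol))
```

Under these assumptions, the 2D mapping exposes `nlev` times as many work items as the 1D mapping, which is one way to read "finer-grained parallelism". Whether a given microphysical process can actually treat vertical levels independently depends on its data dependencies, which the abstract does not specify.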
Data availability
The data that support the findings of this study are available on request from the corresponding author.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 41931183, in part by the National Key Research and Development Program of China under Grant 2016YFB0200800, and in part by the National Key Scientific and Technological Infrastructure project “Earth System Science Numerical Simulator Facility” (Earth Lab).
Funding
Not applicable.
Author information
Contributions
YH contributed to methodology, software, and writing (original draft); YW contributed to supervision, conceptualization, methodology, and writing (review and editing); XZ contributed to writing (original draft); XW, HZ, and JJ contributed to writing (review and editing).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
Not applicable.
About this article
Cite this article
Hong, Y., Wang, Y., Zhang, X. et al. A GPU-enabled acceleration algorithm for the CAM5 cloud microphysics scheme. J Supercomput 79, 17784–17809 (2023). https://doi.org/10.1007/s11227-023-05360-7