4,276 Hits in 3.4 sec

A Survey of Multi-Tenant Deep Learning Inference on GPU [article]

Fuxun Yu, Di Wang, Longfei Shangguan, Minjia Zhang, Chenchen Liu, Xiang Chen
2022 arXiv   pre-print
This survey aims to summarize and categorize the emerging challenges and optimization opportunities for multi-tenant DL inference on GPU.  ...  With such strong compute scaling of GPUs, multi-tenant deep learning inference, which co-locates multiple DL models on the same GPU, has become widely deployed to improve resource utilization and enhance serving  ...  However, as introduced before, fine-grained resource partitioning was not achievable until recently, when GPU vendors released a series of resource sharing and partitioning mechanisms such as multi-streams  ... 
arXiv:2203.09040v3 fatcat:utvpoyvvajfhfghgpf45nxnbne

FULL-W2V: Fully Exploiting Data Reuse for W2V on GPU-Accelerated Systems [article]

Thomas Randall, Tyler Allen, Rong Ge
2023 arXiv   pre-print
Our prototype implementation achieves 2.97X speedup when ported from Nvidia Pascal P100 to Volta V100 cards, and outperforms the state-of-the-art by 5.72X on V100 cards with the same embedding quality.  ...  In-depth analysis indicates that the reduction of memory accesses through register and shared memory caching and high-throughput shared memory reduction leads to a significantly improved arithmetic intensity  ...  ACKNOWLEDGEMENTS This work is supported in part by the U.S. National Science Foundation under Grants CCF-1551511 and CNS-1551262.  ... 
arXiv:2312.07743v1 fatcat:36ixvocghbdjnm6ioogsqjkefy

Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications [article]

Peifeng Yu, Mosharaf Chowdhury
2019 arXiv   pre-print
However, unlike traditional resources such as CPU or the network, modern GPUs do not natively support fine-grained sharing primitives.  ...  Salus implements an efficient, consolidated execution service that exposes the GPU to different DL applications, and enforces fine-grained sharing by performing iteration scheduling and addressing associated  ...  Prior works on fine-grained GPU sharing fall into several categories.  ... 
arXiv:1902.04610v1 fatcat:a4l66d2zcbd23jwwzdez2qitfq

ProvDeploy: Provenance-oriented Containerization of High Performance Computing Scientific Workflows [article]

Liliane Kunstmann, Débora Pina, Daniel de Oliveira, Marta Mattoso
2024 arXiv   pre-print
This complexity increases if the user needs to add provenance data capture services to the workflow.  ...  This manuscript introduces ProvDeploy to assist the user in configuring containers for scientific workflows with integrated provenance data capture.  ...  Acknowledgments This was supported in part by CNPq, FAPERJ, and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior -Brazil (CAPES) -Finance Code 001.  ... 
arXiv:2403.15324v2 fatcat:hd6gtot4azghvpi55znbclps74

IMPROVING PERFORMANCE IN HPC SYSTEM UNDER POWER CONSUMPTIONS LIMITATIONS

Muhammad Usman Ashraf
2019 International Journal of Advanced Research in Computer Science  
However, the primary focus of this study is to analyse how to enhance performance under power consumption limitations for emerging technologies.  ...  Today's High-Performance Computing (HPC) systems require significant usage of "supercomputers" and extensive parallel processing approaches for solving complicated computational tasks at the Petascale level  ...  This model achieves coarse-grain parallelism through MPI and fine-grain parallelism through GPU computations.  ... 
doi:10.26483/ijarcs.v10i2.6397 fatcat:k3l3lk5kuzhnldn5b2qzkh4eia

Balanced Sparsity for Efficient DNN Inference on GPU [article]

Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang, Lanshun Nie
2018 arXiv   pre-print
Our approach adapts to the high-parallelism property of GPUs, showing great potential for sparsity in the wide deployment of deep learning services.  ...  Experiment results show that balanced sparsity achieves up to 3.1x practical speedup for model inference on GPU, while retaining the same high model accuracy as fine-grained sparsity.  ...  Shijie Cao was partly supported by the National Nature Science Foundation of China (No.61772159).  ... 
arXiv:1811.00206v4 fatcat:3yptunrdnzchlepiikqwszhizu

NURA

Sina Darabi, Negin Mahani, Hazhir Baxishi, Ehsan Yousefzadeh, Mohammad Sadrosadati, Hamid Sarbazi-Azad
2022 Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems  
This paper proposes a new multi-application paradigm for GPUs, called NURA, that provides high potential to improve resource utilization and ensure fairness and Quality-of-Service (QoS).  ...  Some pieces of prior work (e.g. spatial multitasking) have limited opportunity to improve resource utilization, while others, e.g. simultaneous multi-kernel, provide fine-grained resource sharing at the  ...  To ensure the quality of service (QoS) of the primary kernel, we slightly modify the warp scheduler to always prioritize CTAs of the primary kernel over the CTAs of the Helper Kernel.  ... 
doi:10.1145/3489048.3522656 fatcat:xcmtppre3rer3etvsjnrvzjtei

Balanced Sparsity for Efficient DNN Inference on GPU

Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang, Lanshun Nie
2019 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
Our approach adapts to the high-parallelism property of GPUs, showing great potential for sparsity in the wide deployment of deep learning services.  ...  Another trend accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity to prune or regularize consecutive weights for efficient computation.  ...  Shijie Cao was partly supported by the National Nature Science Foundation of China (No.61772159).  ... 
doi:10.1609/aaai.v33i01.33015676 fatcat:uq6pbptj5bayhbhkmng3fzy2xi

ShareRender

Wei Zhang, Xiaofei Liao, Peng Li, Hai Jin, Li Lin
2017 Proceedings of the 2017 ACM on Multimedia Conference - MM '17  
Thanks to the flexible workload assignment among multiple render agents, ShareRender enables fine-grained resource sharing at the frame level to significantly improve GPU utilization.  ...  For each game running in a VM, ShareRender starts a graphics wrapper to intercept frame rendering requests and assign them to render agents responsible for frame rendering on GPUs.  ...  CONCLUSION In this paper, we present ShareRender, a cloud gaming system that bypasses GPU virtualization and enables fine-grained resource sharing in cloud gaming.  ... 
doi:10.1145/3123266.3123306 dblp:conf/mm/ZhangLLJL17 fatcat:ytpngwkigfdtjdi34a5fvo3dju

Guest Editorial: Big Traffic Data Analysis and Mining

2018 IET Intelligent Transport Systems  
The authors propose a solution called DRPRS for fine-grained pedestrian recognition using deep learning techniques supported by stream processing from Apache Storm.  ...  Timetable performance evaluation is critical for improving the train service quality.  ... 
doi:10.1049/iet-its.2018.0116 fatcat:jjxp3lvyz5d6bho5irovoolk6y

Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision [article]

Wei Gao, Qinghao Hu, Zhisheng Ye, Peng Sun, Xiaolin Wang, Yingwei Luo, Tianwei Zhang, Yonggang Wen
2022 arXiv   pre-print
However, traditional approaches designed for big data or high-performance computing workloads cannot support DL workloads in fully utilizing GPU resources.  ...  Recently, numerous schedulers have been proposed that are tailored for DL workloads in GPU datacenters. This paper surveys existing research efforts for both training and inference workloads.  ...  Analogously, Liquid [47] also supports fine-grained GPU sharing for further resource utilization improvement using a random forest model.  ... 
arXiv:2205.11913v3 fatcat:fnbinueyijb4nc75fpzd6hzjgq

Construction of College Students' Mental Health Education Model Based on Data Analysis

Fengxia Tang, Sheng Bin
2022 Scientific Programming  
This paper presents an in-depth study and analysis of the model of college students' mental health education using fine-grained parallel computational programming.  ...  for future intervention studies.  ...  Acknowledgments The study was supported by Anqing Normal University.  ... 
doi:10.1155/2022/7044526 fatcat:6ngexqt4snbrjpd43m3rqx2era

Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service [article]

Baolin Li, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari
2023 arXiv   pre-print
We introduce Clover, a carbon-friendly ML inference service runtime system that balances performance, accuracy, and carbon emissions through mixed-quality models and GPU resource partitioning.  ...  This paper presents a solution to the challenge of mitigating carbon emissions from hosting large-scale machine learning (ML) inference services.  ...  This is because fine-grained partitioning allows a higher degree of hardware sharing, and hence, better resource utilization. This leads to lower carbon emissions per request.  ... 
arXiv:2304.09781v1 fatcat:epbri7k5hnhptprsibgxsviop4

SLA-Driven ML Inference Framework for Clouds with Heterogeneous Accelerators

Junguk Cho, Diman Zad Tootaghaj, Lianjie Cao, Puneet Sharma
2022 Conference on Machine Learning and Systems  
In addition, our framework enables efficient sharing of GPU accelerators among multiple functions to increase resource efficiency with minimal overhead.  ...  This homogeneity assumption causes two challenges in running ML workloads like Deep Neural Network (DNN) inference services on these frameworks.  ...  ACKNOWLEDGEMENT We thank the anonymous reviewers for their feedback on earlier drafts of this paper. We wish to thank Eric Wu in Hewlett Packard Labs for his support in setting up the testbed.  ... 
dblp:conf/mlsys/ChoTCS22 fatcat:uxfzaro2lza3ti7bfhe3onhqcq

Midpoint routing algorithms for Delaunay triangulations

Weisheng Si, Albert Y. Zomaya
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
While this clustering solution is gaining momentum in recent years, efficient runtime support for fine-grained object sharing over the distributed JVM remains a challenge.  ...  run-time overheads of fine-grained threading.  ...  We map these fine-grain computations onto multithreaded GPUs in such a way that the processing cost per element is shown to be close to the best possible.  ... 
doi:10.1109/ipdps.2010.5470471 dblp:conf/ipps/SiZ10 fatcat:yuchdc4zp5borm5vs7j4rqgmzy
Showing results 1 — 15 out of 4,276 results