A Survey of Multi-Tenant Deep Learning Inference on GPU
[article]
2022
arXiv
pre-print
This survey aims to summarize and categorize the emerging challenges and optimization opportunities for multi-tenant DL inference on GPU. ...
With such strong computing scaling of GPUs, multi-tenant deep learning inference by co-locating multiple DL models onto the same GPU becomes widely deployed to improve resource utilization, enhance serving ...
However, as we introduced before, fine-grained resource partitioning was not achievable until recently, when GPU vendors released a series of resource-sharing and partitioning mechanisms such as multi-streams ...
arXiv:2203.09040v3
fatcat:utvpoyvvajfhfghgpf45nxnbne
FULL-W2V: Fully Exploiting Data Reuse for W2V on GPU-Accelerated Systems
[article]
2023
arXiv
pre-print
Our prototype implementation achieves 2.97X speedup when ported from Nvidia Pascal P100 to Volta V100 cards, and outperforms the state-of-the-art by 5.72X on V100 cards with the same embedding quality. ...
In-depth analysis indicates that the reduction of memory accesses through register and shared memory caching and high-throughput shared memory reduction leads to a significantly improved arithmetic intensity ...
ACKNOWLEDGEMENTS This work is supported in part by the U.S. National Science Foundation under Grants CCF-1551511 and CNS-1551262. ...
arXiv:2312.07743v1
fatcat:36ixvocghbdjnm6ioogsqjkefy
Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications
[article]
2019
arXiv
pre-print
However, unlike traditional resources such as CPU or the network, modern GPUs do not natively support fine-grained sharing primitives. ...
Salus implements an efficient, consolidated execution service that exposes the GPU to different DL applications, and enforces fine-grained sharing by performing iteration scheduling and addressing associated ...
Prior works on fine-grained GPU sharing fall into several categories. ...
arXiv:1902.04610v1
fatcat:a4l66d2zcbd23jwwzdez2qitfq
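The Salus snippet above describes enforcing fine-grained GPU sharing through iteration scheduling. A minimal sketch of that idea, in pure Python and with illustrative names (this is not Salus's actual implementation): co-located DL jobs take turns, each getting the GPU for exactly one iteration per scheduling round.

```python
from collections import deque

def schedule(jobs):
    """Round-robin iteration-level time slicing.

    jobs: list of (name, num_iterations) tuples for co-located DL jobs.
    Returns the order in which single iterations would run on the GPU.
    """
    queue = deque(jobs)
    order = []
    while queue:
        name, remaining = queue.popleft()
        order.append(name)                  # run exactly one iteration
        if remaining > 1:
            queue.append((name, remaining - 1))
    return order

# two co-located jobs: A needs 3 iterations, B needs 2
print(schedule([("A", 3), ("B", 2)]))      # interleaved at iteration granularity
```

Scheduling at iteration boundaries (rather than whole-job granularity) is what makes the sharing "fine-grained": a latency-sensitive job never waits for another job's entire run, only for one iteration.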
ProvDeploy: Provenance-oriented Containerization of High Performance Computing Scientific Workflows
[article]
2024
arXiv
pre-print
This complexity increases if the user needs to add provenance data capture services to the workflow. ...
This manuscript introduces ProvDeploy to assist the user in configuring containers for scientific workflows with integrated provenance data capture. ...
Acknowledgments This work was supported in part by CNPq, FAPERJ, and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001. ...
arXiv:2403.15324v2
fatcat:hd6gtot4azghvpi55znbclps74
IMPROVING PERFORMANCE IN HPC SYSTEM UNDER POWER CONSUMPTIONS LIMITATIONS
2019
International Journal of Advanced Research in Computer Science
However, the primary focus of this study is to analyse how to enhance performance under power consumption limitations for emerging technologies. ...
Today's High-Performance Computing (HPC) systems require significant usage of "supercomputers" and extensive parallel processing approaches for solving complicated computational tasks at the Petascale level ...
This model achieves coarse grain parallelism through MPI and fine-grain parallelism through GPU computations. ...
doi:10.26483/ijarcs.v10i2.6397
fatcat:k3l3lk5kuzhnldn5b2qzkh4eia
Balanced Sparsity for Efficient DNN Inference on GPU
[article]
2018
arXiv
pre-print
Our approach adapts to the high-parallelism property of GPUs, showing great potential for sparsity in the wide deployment of deep learning services. ...
Experiment results show that balanced sparsity achieves up to 3.1x practical speedup for model inference on GPU, while retaining the same high model accuracy as fine-grained sparsity. ...
Shijie Cao was partly supported by the National Natural Science Foundation of China (No. 61772159). ...
arXiv:1811.00206v4
fatcat:3yptunrdnzchlepiikqwszhizu
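The balanced-sparsity snippet above contrasts fine-grained and coarse-grained pruning; the balanced scheme splits each weight row into equal-sized blocks and keeps the same number of nonzeros in every block, so GPU threads get uniform work. A minimal pure-Python sketch of that idea (names and block layout are illustrative, not the paper's code):

```python
def balanced_prune_row(row, num_blocks, keep_per_block):
    """Keep the keep_per_block largest-magnitude weights in each block.

    Assumes len(row) is evenly divisible by num_blocks.
    """
    n = len(row) // num_blocks
    out = [0.0] * len(row)
    for b in range(num_blocks):
        block = list(enumerate(row[b * n:(b + 1) * n]))
        block.sort(key=lambda iv: abs(iv[1]))       # smallest magnitude first
        for i, v in block[-keep_per_block:]:
            out[b * n + i] = v                      # retain only the top-k
    return out

# each half of the row keeps exactly its single largest-magnitude weight,
# so every block ends up with identical sparsity (balanced, GPU-friendly)
print(balanced_prune_row([1.0, -2.0, 3.0, 4.0], num_blocks=2, keep_per_block=1))
```

Because every block retains the same nonzero count, each GPU thread processing one block does the same amount of work, avoiding the load imbalance of unstructured fine-grained sparsity.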
NURA
2022
Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems
This paper proposes a new multi-application paradigm for GPUs, called NURA, that provides high potential to improve resource utilization and ensure fairness and Quality-of-Service (QoS). ...
Some pieces of prior work (e.g. spatial multitasking) have limited opportunity to improve resource utilization, while others, e.g. simultaneous multi-kernel, provide fine-grained resource sharing at the ...
To ensure the quality of service (QoS) of the primary kernel, we slightly modify the warp scheduler to always prioritize CTAs of the primary kernel over the CTAs of the Helper Kernel. ...
doi:10.1145/3489048.3522656
fatcat:xcmtppre3rer3etvsjnrvzjtei
Balanced Sparsity for Efficient DNN Inference on GPU
2019
PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE
Our approach adapts to the high-parallelism property of GPUs, showing great potential for sparsity in the wide deployment of deep learning services. ...
Another trend accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity to prune or regularize consecutive weights for efficient computation. ...
Shijie Cao was partly supported by the National Natural Science Foundation of China (No. 61772159). ...
doi:10.1609/aaai.v33i01.33015676
fatcat:uq6pbptj5bayhbhkmng3fzy2xi
ShareRender
2017
Proceedings of the 2017 ACM on Multimedia Conference - MM '17
Thanks to the flexible workload assignment among multiple render agents, ShareRender enables fine-grained resource sharing at the frame-level to significantly improve GPU utilization. ...
For each game running in a VM, ShareRender starts a graphics wrapper to intercept frame rendering requests and assign them to render agents responsible for frame rendering on GPUs. ...
CONCLUSION In this paper, we present ShareRender, a cloud gaming system that bypasses GPU virtualization and enables fine-grained resource sharing in cloud gaming. ...
doi:10.1145/3123266.3123306
dblp:conf/mm/ZhangLLJL17
fatcat:ytpngwkigfdtjdi34a5fvo3dju
Guest Editorial: Big Traffic Data Analysis and Mining
2018
IET Intelligent Transport Systems
The authors propose a solution called DRPRS for fine-grained pedestrian recognition using deep learning techniques supported by stream processing from Apache Storm. ...
Timetable performance evaluation is critical for improving the train service quality. ...
doi:10.1049/iet-its.2018.0116
fatcat:jjxp3lvyz5d6bho5irovoolk6y
Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision
[article]
2022
arXiv
pre-print
However, traditional approaches designed for big data or high-performance computing workloads cannot enable DL workloads to fully utilize GPU resources. ...
Recently, numerous schedulers have been proposed that are tailored to DL workloads in GPU datacenters. This paper surveys existing research efforts for both training and inference workloads. ...
Analogously, Liquid [47] also supports fine-grained GPU sharing for further resource utilization improvement using a random forest model. ...
arXiv:2205.11913v3
fatcat:fnbinueyijb4nc75fpzd6hzjgq
Construction of College Students' Mental Health Education Model Based on Data Analysis
2022
Scientific Programming
This paper presents an in-depth study and analysis of the model of college students' mental health education using fine-grained parallel computational programming. ...
for future intervention studies. ...
Acknowledgments The study was supported by the Anqing Normal University. ...
doi:10.1155/2022/7044526
fatcat:6ngexqt4snbrjpd43m3rqx2era
Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service
[article]
2023
arXiv
pre-print
We introduce Clover, a carbon-friendly ML inference service runtime system that balances performance, accuracy, and carbon emissions through mixed-quality models and GPU resource partitioning. ...
This paper presents a solution to the challenge of mitigating carbon emissions from hosting large-scale machine learning (ML) inference services. ...
This is because fine-grained partitioning allows a higher degree of hardware sharing, and hence, better resource utilization. This leads to lower carbon emissions per request. ...
arXiv:2304.09781v1
fatcat:epbri7k5hnhptprsibgxsviop4
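The Clover snippet above describes balancing accuracy and carbon emissions through mixed-quality models and GPU partitioning. A hypothetical pure-Python sketch of the selection step (this is not Clover's actual algorithm; the variant names, energy figures, and budget are invented for illustration): pick the highest-accuracy model variant whose estimated per-request emissions fit a carbon budget, given the current grid carbon intensity.

```python
def pick_variant(variants, carbon_intensity_g_per_kwh, budget_g_per_request):
    """variants: list of (name, accuracy, energy_kwh_per_request) tuples.

    Returns the name of the most accurate variant whose per-request
    emissions (energy x grid carbon intensity) fit the budget; falls
    back to the lowest-energy variant when nothing fits.
    """
    feasible = [
        (name, acc) for name, acc, energy in variants
        if energy * carbon_intensity_g_per_kwh <= budget_g_per_request
    ]
    if not feasible:
        return min(variants, key=lambda v: v[2])[0]
    return max(feasible, key=lambda v: v[1])[0]

variants = [
    ("full",    0.95, 2e-4),   # high accuracy, more energy per request
    ("distill", 0.92, 5e-5),   # mixed-quality fallback, 4x less energy
]
# dirty grid (500 g/kWh) forces the low-carbon variant under a 0.05 g budget
print(pick_variant(variants, 500.0, 0.05))
# clean grid (100 g/kWh) lets the full model back in
print(pick_variant(variants, 100.0, 0.05))
```

The same structure extends naturally to the GPU-partitioning dimension the snippet mentions: each (variant, partition-size) pair would get its own energy estimate, and the feasibility check stays identical.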
SLA-Driven ML Inference Framework for Clouds with Heterogeneous Accelerators
2022
Conference on Machine Learning and Systems
In addition, our framework enables efficient shares of GPU accelerators with multiple functions to increase resource efficiency with minimal overhead. ...
This homogeneity assumption causes two challenges in running ML workloads like Deep Neural Network (DNN) inference services on these frameworks. ...
ACKNOWLEDGEMENT We thank the anonymous reviewers for their feedback on earlier drafts of this paper. We wish to thank Eric Wu in Hewlett Packard Labs for his support in setting up the testbed. ...
dblp:conf/mlsys/ChoTCS22
fatcat:uxfzaro2lza3ti7bfhe3onhqcq
Midpoint routing algorithms for Delaunay triangulations
2010
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
While this clustering solution is gaining momentum in recent years, efficient runtime support for fine-grained object sharing over the distributed JVM remains a challenge. ...
run-time overheads of fine-grained threading. ...
We map these fine-grain computations onto multithreaded GPUs in such a way that the processing cost per element is shown to be close to the best possible. ...
doi:10.1109/ipdps.2010.5470471
dblp:conf/ipps/SiZ10
fatcat:yuchdc4zp5borm5vs7j4rqgmzy
Showing results 1 — 15 out of 4,276 results