Simultaneous Multikernel: Fine-Grained Sharing of GPUs.

Some pieces of prior work (e.g. spatial multitasking) have limited opportunity to improve resource utilization, while others, e.g. simultaneous multi-kernel, provide fine-grained resource sharing at the ... In terms of fairness, NURA achieves results nearly identical to spatial multitasking, while it outperforms simultaneous multi-kernel by 76% on average. ... There are two known approaches to multitasking in GPUs: (1) spatial multitasking [3], and (2) simultaneous multi-kernel (SMK) execution [2]. ...

doi:10.1145/3489048.3522656 fatcat:xcmtppre3rer3etvsjnrvzjtei

Recent proposals on multitasking GPUs have focused on either spatial multitasking, which partitions GPU resources at a streaming multiprocessor (SM) granularity, or simultaneous multikernel (SMK), which ... In this paper, we propose GPU Maestro, which performs dynamic resource management for efficient utilization of multitasking GPUs. ... Acknowledgments: We would like to thank the anonymous reviewers as well as the fellow members of the CCCP research group for their valuable comments and feedback. ...

doi:10.1145/3093337.3037707 fatcat:7vikinfjtbbmnperrnxretampm

doi:10.1145/3093315.3037707 fatcat:5xjasiupnrcctp6oarqwrtbukm

doi:10.1145/3037697.3037707 dblp:conf/asplos/ParkPM17 fatcat:flmnbk4x3je4loearqmn2uzwkm

doi:10.1145/3093336.3037707 fatcat:xkqmepvakbh23gfyrnkjot6uge

In this article, we show that efficient resource sharing in GPGPU is possible without run-time profiling if resource usage characteristics of workloads are analyzed down to a fine-grained unit level. ... To determine the co-location of workloads, previous studies have shown that run-time performance profiling and dynamic relocation of workloads are necessary due to interference between workloads. ... device memory and shared memory. The coarse-grained utilization columns in the tables are calculated as the maximum value of the fine-grained utilizations within the same resource classification. ...

doi:10.1109/access.2021.3132492 fatcat:y5yucr4qbbeevcmwkfpydzrqbe

With the strong compute scaling of GPUs, multi-tenant deep learning inference, which co-locates multiple DL models on the same GPU, has become widely deployed to improve resource utilization, enhance serving ... This survey aims to summarize and categorize the emerging challenges and optimization opportunities for multi-tenant DL inference on GPU. ... However, as we introduced before, fine-grained resource partitioning was not achievable until GPU vendors recently released a series of resource sharing and partitioning features like multistreams ...

arXiv:2203.09040v3 fatcat:utvpoyvvajfhfghgpf45nxnbne

Multikernel operating systems (OSs) were introduced to match the architectural characteristics of lightweight manycores. ... While several multikernel OS designs are possible, in this work we argue for one that is structured as asymmetric microkernel instances. ... Nevertheless, due to a lack of context information, this is not enough for the kernel to run either a fine-grain data prefetch algorithm or a coherency protocol. ...

doi:10.1109/sbesc49506.2019.9046080 dblp:conf/sbesc/PennaSLCBFM19 fatcat:oe576cky2jfp3h4hynrszytmbq

A GPU consists of several Streaming Multiprocessors (SMs) that collectively determine how shared resources are partitioned and accessed. ... However, neither scaling up nor scaling out can meet the scalability requirement of all applications running on a given GPU system, which inevitably results in performance degradation and resource under-utilization ... Dhar et al. proposed fine-grained and coarse-grained reconfigurations of SMs in GPUs in order to reduce the underutilization of resources and power consumption [15]. ...

arXiv:1911.03364v1 fatcat:tcmbakgikjhtdl2khxzei3zf74

The unrivaled computing capabilities of modern GPUs meet the demand of processing massive amounts of data seen in many application domains. ... Compared to earlier studies on enabling concurrent task support on GPUs, such as MultiQx-GPU, we use a different approach, which is to control the launching parameters of multiple GPU kernels as provided ... fine-grained context switches. ...

doi:10.1371/journal.pone.0214720 pmid:30990851 pmcid:PMC6467383 fatcat:4u2hmql235c4fkxajva5mcx6m4

We propose a fine-grained search space comprised of atomic blocks, a minimal search unit that is much smaller than the ones used in recent NAS algorithms. ... Instead of a search-and-retrain two-stage paradigm, our method simultaneously searches and trains the target architecture. ... This perspective enables a much larger and more fine-grained search space. ...

arXiv:1912.09640v2 fatcat:qyjyg33dkvc53pglyppjrnee44

Spatial multitasking, in which independent applications co-execute on different sets of SMs, is a promising solution to share GPU resources. ... Graphics processing units (GPUs) feature an increasing number of streaming multiprocessors (SMs) with each successive generation. ... We thank CalcUA for letting us use the NVIDIA P100 GPU. This work is supported by the European Research Council (ERC) Ad- ...

doi:10.1145/3205289.3205311 dblp:conf/ics/ZhaoWE18 fatcat:b4hzqrztwrabbjcydmgka3kqcq

... that are not specific to any virtualization approach, but we believe they are of sufficient importance to dedicate separate ... fine-grained control, e.g., an important task in the GPOS can be assigned a higher priority than a less important task in the RTOS. ... [75] presented a redesign of SPUMONE as a multikernel architecture. (Multikernel means that a separate copy of the kernel or VMM runs on each core of a multicore processor.) ...

doi:10.4236/jsea.2012.54033 fatcat:iiqszwe3brhjxl4ycyf6ynbp2u

Currently, the decision on the simultaneous execution of kernels is made by the hardware, which can lead to unreasonable use of resources. ... In this work, we tackle the problem of co-scheduling for GPUs in high competition scenarios. ... [23] proposed Simultaneous Multikernel (SMK), which allows fine-grain sharing within each SM; kernels with complementary resource usage are co-scheduled in the same SM to achieve resource fairness and ...

doi:10.1109/hipc50609.2020.00033 fatcat:piwx3puyyfdhpbdtxkk7hpzllq

We show that EMRF satisfies the fairness properties of sharing incentive, envy-freeness and Pareto efficiency. ... Heterogeneous computing poses challenging new issues for the fair allocation of computational resources among users due to the availability of different kinds of computing devices (e.g., CPU and GPU). ... Shuhao Zhang's work is partially funded by the Economic Development Board and the National Research Foundation of Singapore. ...

doi:10.1109/sc.2016.74 dblp:conf/sc/TangHZN16 fatcat:l4hcko577favfa35c7jyxudvka

NURA

Dynamic Resource Management for Efficient Utilization of Multitasking GPUs

Characterizing Fine-Grained Resource Utilization for Multitasking GPGPU in Cloud Systems

A Survey of Multi-Tenant Deep Learning Inference on GPU [article]

On the Performance and Isolation of Asymmetric Microkernel Design for Lightweight Manycores

AMOEBA: A Coarse Grained Reconfigurable Architecture for Dynamic GPU Scaling [article]

Concurrent query processing in a GPU-based database system

AtomNAS: Fine-Grained End-to-End Neural Architecture Search [article]

Classification-Driven Search for Effective SM Partitioning in Multitasking GPUs

A State-of-the-Art Survey on Real-Time Issues in Embedded Systems Virtualization

Algorithms for Preemptive Co-scheduling of Kernels on GPUs

Elastic Multi-resource Fairness: Balancing Fairness and Efficiency in Coupled CPU-GPU Architectures
