Micro-Sliced Virtual Processors to Hide the Effect of Discontinuous CPU Availability for Consolidated Systems
2014
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
However, for the guest operating system on a virtual machine (VM), such assumption cannot be guaranteed, since virtual CPUs of VMs share limited physical cores. ...
Such a virtual time discontinuity problem leads to significant inefficiency for lock and interrupt handling, which rely on the immediate availability of CPUs whenever the operating system requires computation ...
The dual insertion policies were originally proposed by Qureshi et al. for a single-core cache [16], and later extended for multi-core shared caches [8]. ...
doi:10.1109/micro.2014.49
dblp:conf/micro/AhnPH14
fatcat:tncat37lxbayploevptpnnaejq
Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures
[article]
2020
arXiv
pre-print
Designing efficient and scalable sparse linear algebra kernels on modern multi-GPU based HPC systems is a daunting task due to significant irregular memory references and workload imbalance across the ...
By applying these techniques, our experiments on the NVIDIA multi-GPU supernode V100-DGX-1 and DGX-2 systems demonstrate that our design can achieve on average 3.53x (up to 9.86x) speedup on a DGX-1 system ...
In this case, determining optimal number of tasks becomes a new trade-off between fine-grained task scheduling and long kernel scheduling overhead. ...
arXiv:2012.06959v1
fatcat:am7guw7i5fchxafrkp34plwvky
Malthusian Locks
[article]
2017
arXiv
pre-print
We opportunistically leverage the existence of such locks by modifying the lock admission policy so as to intentionally limit the number of threads circulating over the lock in a given period. ...
We borrow the concept of swapping from the field of memory management and intentionally impose concurrency restriction (CR) if a lock is oversubscribed. ...
Bishop for useful discussions. ...
arXiv:1511.06035v7
fatcat:led5wcfcfnaoznmdxjrbhrxrwa
Hardware support for spin management in overcommitted virtual machines
2006
Proceedings of the 15th international conference on Parallel architectures and compilation techniques - PACT '06
For example, most existing system VMs resort to gang scheduling a guest OS's virtual processors (VCPUs) to avoid OS synchronization overhead. ...
We propose one such policy that logically partitions the CMP cores between guest VMs. ...
We also thank Saisanthosh Balakrishnan and the anonymous reviewers for their comments on this paper. ...
doi:10.1145/1152154.1152176
dblp:conf/IEEEpact/WellsCS06
fatcat:32z2wvc7kzcubpzikzwqf2iptu
Latch-free Synchronization in Database Systems: Silver Bullet or Fool's Gold?
2017
Conference on Innovative Data Systems Research
Recent research on multi-core database architectures has made the argument that, when possible, database systems should abandon the use of latches in favor of latch-free algorithms. ...
Our findings indicate that the argument for latch-free algorithms' superior scalability is far more nuanced than the current state-of-the-art in multi-core database architectures suggests. ...
We thank Joseph Hellerstein, Hideaki Kimura, Justin Levandoski, Ippokratis Pandis, Julian Shun, and the anonymous CIDR 2017 reviewers for their insightful comments on earlier versions of this paper. ...
dblp:conf/cidr/FaleiroA17
fatcat:ph7hhvys65fbjo3pp5ft3h7y6u
Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms
2011
Parallel Computing
Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by the HPC community. ...
The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor ...
on a multi-chip module. ...
doi:10.1016/j.parco.2011.02.001
fatcat:dc6yotxbj5htteiv2z43o4ooje
Arachne: Core-Aware Thread Management
2018
USENIX Symposium on Operating Systems Design and Implementation
to the Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation is sponsored by USENIX. ...
Thanks to Collin Lee for giving feedback on design, and to Yilong Li for help with debugging the RAMCloud networking stack. ...
This work was supported by C-FAR (one of six centers of STARnet, a Semiconductor Research Corporation program, sponsored by MARCO and DARPA) and by the industrial affiliates of the Stanford Platform Lab ...
dblp:conf/osdi/QinLSKO18
fatcat:x32cf2z37vaxrcdhirnihdcily
Multi-threaded Simulation of 4G Cellular Systems within the LTE-Sim Framework
2013
2013 27th International Conference on Advanced Information Networking and Applications Workshops
To bridge this gap, we have significantly upgraded the LTE-Sim framework by implementing a concurrent scheduling algorithm, namely the Multi-Master Scheduler, aimed at efficiently handling events in a ...
In this context, multi-core/multi-processor simulation tools can accelerate their activities by drastically reducing the time required to simulate complex scenarios. ...
Each processor has 8 CPU-cores (for a total of 32 CPU-cores) that share a 10MB L3 cache (5118KB per each 4-cores set), and each core has a 512KB private L2 cache. ...
doi:10.1109/waina.2013.202
dblp:conf/aina/PellegriniP13
fatcat:7jzewxnat5clpkgr3btb42ccpi
A concurrent dynamic analysis framework for multicore hardware
2009
Proceeding of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications - OOPSLA 09
Since Moore's law is now delivering multiple cores instead of faster processors, future systems must either bear a relatively higher cost for abstractions or use some cores to help tolerate abstraction ...
It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring-buffer that implements efficient communication between application and analysis threads. ...
It uses a sentinel value (NULL) to avoid concurrent access of the queue head and tail indices, and forces a delay between the consumer and producer to avoid cache line thrashing. ...
doi:10.1145/1640089.1640101
dblp:conf/oopsla/HaABM09
fatcat:lzkkrc3w4ne4bjznhiwaie36jm
A concurrent dynamic analysis framework for multicore hardware
2009
SIGPLAN notices
Since Moore's law is now delivering multiple cores instead of faster processors, future systems must either bear a relatively higher cost for abstractions or use some cores to help tolerate abstraction ...
It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring-buffer that implements efficient communication between application and analysis threads. ...
It uses a sentinel value (NULL) to avoid concurrent access of the queue head and tail indices, and forces a delay between the consumer and producer to avoid cache line thrashing. ...
doi:10.1145/1639949.1640101
fatcat:i7dgkfveqnex3ms5bvefsqn3u4
Performance analysis of N-computing device under various load conditions
2012
IOSR Journal of Computer Engineering
We will create a multiuser environment on a uniprocessor system; this can be achieved, when there is a separate kernel for each user in the same operating system. ...
At present, a personal computer has far more processing power than a single-user system requires. ...
On a multiprocessor (including multi-core system), the threads or tasks will actually run at the same time, with each processor or core running a particular thread or task. ...
doi:10.9790/0661-0744652
fatcat:klqyxqbvyncpbcm4gh6rui4uvq
Memos: Revisiting Hybrid Memory Management in Modern Operating System
[article]
2017
arXiv
pre-print
Powered by our newly designed kernel-level monitoring module and page migration engine, memos can dynamically optimize the data placement at the memory hierarchy in terms of the on-line memory patterns ...
In this paper, we introduce memos, which can schedule memory resources over the entire memory hierarchy including cache, channels, main memory comprising DRAM and NVM simultaneously. ...
Furthermore, we deploy a practical emulation platform for hybrid DRAM-NVM on a real multi-core machine by using the channel-partitioning approach [35]. ...
arXiv:1703.07725v1
fatcat:wzdseqs4pbc7nozfejrllqszl4
A State-of-the-Art Survey on Real-Time Issues in Embedded Systems Virtualization
2012
Journal of Software Engineering and Applications
We present a comprehensive survey on real-time issues in virtualization for embedded systems, covering popular virtualization systems including KVM, Xen, L4 and others. ...
that are not specific to any virtualization approach, but we believe they are of sufficient importance to dedicate separate ...
[45] proposed a method for improving I/O performance of Xen on a multicore processor. ...
doi:10.4236/jsea.2012.54033
fatcat:iiqszwe3brhjxl4ycyf6ynbp2u
Factored operating systems (fos)
2009
ACM SIGOPS Operating Systems Review
The next decade will afford us computer chips with 100's to 1,000's of cores on a single piece of silicon. ...
Contemporary operating systems have been designed to operate on a single core or small number of cores and hence are not well suited to manage and provide operating system services at such large scale. ...
We thank Robert Morris and Frans Kaashoek for feedback on this work. We also thank Charles Gruenwald for help on fos. ...
doi:10.1145/1531793.1531805
fatcat:vdak4y4dt5cavlcqj7s7q4p3bu
Building High-Performance Application Protocol Parsers on Multi-core Architectures
2011
2011 IEEE 17th International Conference on Parallel and Distributed Systems
Finally, an efficient parallel run-time system is built by employing lock-free design principles from top to bottom to support multi-threaded execution on multi-core processors. ...
Multi-core architectures provide a viable solution for building high-performance parsers for application protocols. ...
on a multi-core server. ...
doi:10.1109/icpads.2011.37
dblp:conf/icpads/ZhangWHT11
fatcat:t2vg3k5i5rbuvodtuqd4voipm4