358 Hits in 6.0 sec

Micro-Sliced Virtual Processors to Hide the Effect of Discontinuous CPU Availability for Consolidated Systems

Jeongseob Ahn, Chang Hyun Park, Jaehyuk Huh
2014 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture  
However, for the guest operating system on a virtual machine (VM), such assumption cannot be guaranteed, since virtual CPUs of VMs share limited physical cores.  ...  Such a virtual time discontinuity problem leads to significant inefficiency for lock and interrupt handling, which rely on the immediate availability of CPUs whenever the operating system requires computation  ...  The dual insertion policies were originally proposed by Qureshi et al. for a single core cache [16], and later extended for multi-core shared caches [8].  ...
doi:10.1109/micro.2014.49 dblp:conf/micro/AhnPH14 fatcat:tncat37lxbayploevptpnnaejq

Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures [article]

Chenhao Xie, Jieyang Chen, Jesun S Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin Barker, Mark Raugas, Ang Li
2020 arXiv   pre-print
Designing efficient and scalable sparse linear algebra kernels on modern multi-GPU based HPC systems is a daunting task due to significant irregular memory references and workload imbalance across the  ...  By applying these techniques, our experiments on the NVIDIA multi-GPU supernode V100-DGX-1 and DGX-2 systems demonstrate that our design can achieve on average 3.53x (up to 9.86x) speedup on a DGX-1 system  ...  In this case, determining optimal number of tasks becomes a new trade-off between fine-grained task scheduling and long kernel scheduling overhead.  ... 
arXiv:2012.06959v1 fatcat:am7guw7i5fchxafrkp34plwvky

Malthusian Locks [article]

Dave Dice
2017 arXiv   pre-print
We opportunistically leverage the existence of such locks by modifying the lock admission policy so as to intentionally limit the number of threads circulating over the lock in a given period.  ...  We borrow the concept of swapping from the field of memory management and intentionally impose concurrency restriction (CR) if a lock is oversubscribed.  ...  Bishop for useful discussions.  ... 
arXiv:1511.06035v7 fatcat:led5wcfcfnaoznmdxjrbhrxrwa

Hardware support for spin management in overcommitted virtual machines

Philip M. Wells, Koushik Chakraborty, Gurindar S. Sohi
2006 Proceedings of the 15th international conference on Parallel architectures and compilation techniques - PACT '06  
For example, most existing system VMs resort to gang scheduling a guest OS's virtual processors (VCPUs) to avoid OS synchronization overhead.  ...  We propose one such policy that logically partitions the CMP cores between guest VMs.  ...  We also thank Saisanthosh Balakrishnan and the anonymous reviewers for their comments on this paper.  ... 
doi:10.1145/1152154.1152176 dblp:conf/IEEEpact/WellsCS06 fatcat:32z2wvc7kzcubpzikzwqf2iptu

Latch-free Synchronization in Database Systems: Silver Bullet or Fool's Gold?

Jose M. Faleiro, Daniel J. Abadi
2017 Conference on Innovative Data Systems Research  
Recent research on multi-core database architectures has made the argument that, when possible, database systems should abandon the use of latches in favor of latch-free algorithms.  ...  Our findings indicate that the argument for latch-free algorithms' superior scalability is far more nuanced than the current state-of-the-art in multi-core database architectures suggests.  ...  We thank Joseph Hellerstein, Hideaki Kimura, Justin Levandoski, Ippokratis Pandis, Julian Shun, and the anonymous CIDR 2017 reviewers for their insightful comments on earlier versions of this paper.  ... 
dblp:conf/cidr/FaleiroA17 fatcat:ph7hhvys65fbjo3pp5ft3h7y6u

Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms

Kamesh Madduri, Eun-Jin Im, Khaled Z. Ibrahim, Samuel Williams, Stéphane Ethier, Leonid Oliker
2011 Parallel Computing  
Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by the HPC community.  ...  The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor  ...  on a multi-chip module.  ...
doi:10.1016/j.parco.2011.02.001 fatcat:dc6yotxbj5htteiv2z43o4ooje

Arachne: Core-Aware Thread Management

Henry Qin, Qian Li, Jacqueline Speiser, Peter Kraft, John K. Ousterhout
2018 USENIX Symposium on Operating Systems Design and Implementation  
to the Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation is sponsored by USENIX.  ...  Thanks to Collin Lee for giving feedback on design, and to Yilong Li for help with debugging the RAMCloud networking stack.  ...  This work was supported by C-FAR (one of six centers of STARnet, a Semiconductor Research Corporation program, sponsored by MARCO and DARPA) and by the industrial affiliates of the Stanford Platform Lab  ... 
dblp:conf/osdi/QinLSKO18 fatcat:x32cf2z37vaxrcdhirnihdcily

Multi-threaded Simulation of 4G Cellular Systems within the LTE-Sim Framework

A. Pellegrini, G. Piro
2013 2013 27th International Conference on Advanced Information Networking and Applications Workshops  
To bridge this gap, we have significantly upgraded the LTE-Sim framework by implementing a concurrent scheduling algorithm, namely the Multi-Master Scheduler, aimed at efficiently handling events in a  ...  In this context, multi-core/multi-processor simulation tools can accelerate their activities by drastically reducing the time required to simulate complex scenarios.  ...  Each processor has 8 CPU-cores (for a total of 32 CPU-cores) that share a 10MB L3 cache (5118KB per each 4-cores set), and each core has a 512KB private L2 cache.  ... 
doi:10.1109/waina.2013.202 dblp:conf/aina/PellegriniP13 fatcat:7jzewxnat5clpkgr3btb42ccpi

A concurrent dynamic analysis framework for multicore hardware

Jungwoo Ha, Matthew Arnold, Stephen M. Blackburn, Kathryn S. McKinley
2009 Proceeding of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications - OOPSLA 09  
Since Moore's law is now delivering multiple cores instead of faster processors, future systems must either bear a relatively higher cost for abstractions or use some cores to help tolerate abstraction  ...  It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring-buffer that implements efficient communication between application and analysis threads.  ...  It uses a sentinel value (NULL) to avoid concurrent access of the queue head and tail indices, and forces a delay between the consumer and producer to avoid cache line thrashing.  ... 
doi:10.1145/1640089.1640101 dblp:conf/oopsla/HaABM09 fatcat:lzkkrc3w4ne4bjznhiwaie36jm

A concurrent dynamic analysis framework for multicore hardware

Jungwoo Ha, Matthew Arnold, Stephen M. Blackburn, Kathryn S. McKinley
2009 SIGPLAN notices  
Since Moore's law is now delivering multiple cores instead of faster processors, future systems must either bear a relatively higher cost for abstractions or use some cores to help tolerate abstraction  ...  It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring-buffer that implements efficient communication between application and analysis threads.  ...  It uses a sentinel value (NULL) to avoid concurrent access of the queue head and tail indices, and forces a delay between the consumer and producer to avoid cache line thrashing.  ... 
doi:10.1145/1639949.1640101 fatcat:i7dgkfveqnex3ms5bvefsqn3u4

Performance analysis of N-computing device under various load conditions

Snigdha Srivastava
2012 IOSR Journal of Computer Engineering  
We will create a multiuser environment on a uniprocessor system; this can be achieved, when there is a separate kernel for each user in the same operating system.  ...  In the present time, a personal computer has a very high processing power than required for a single user system.  ...  On a multiprocessor (including multi-core system), the threads or tasks will actually run at the same time, with each processor or core running a particular thread or task.  ... 
doi:10.9790/0661-0744652 fatcat:klqyxqbvyncpbcm4gh6rui4uvq

Memos: Revisiting Hybrid Memory Management in Modern Operating System [article]

Lei Liu, Mengyao Xie, Hao Yang
2017 arXiv   pre-print
Powered by our newly designed kernel-level monitoring module and page migration engine, memos can dynamically optimize the data placement at the memory hierarchy in terms of the on-line memory patterns  ...  In this paper, we introduce memos, which can schedule memory resources over the entire memory hierarchy including cache, channels, main memory comprising DRAM and NVM simultaneously.  ...  Furthermore, we deploy a practical emulation platform for hybrid DRAM-NVM on a real multi-core machine by using the channel-partitioning approach [35].  ...
arXiv:1703.07725v1 fatcat:wzdseqs4pbc7nozfejrllqszl4

A State-of-the-Art Survey on Real-Time Issues in Embedded Systems Virtualization

Zonghua Gu, Qingling Zhao
2012 Journal of Software Engineering and Applications  
We present a comprehensive survey on real-time issues in virtualization for embedded systems, covering popular virtualization systems including KVM, Xen, L4 and others.  ...  that are not specific to any virtualization approach, but we believe they are of sufficient importance to dedicate separate  ...  [45] proposed a method for improving I/O performance of Xen on a multicore processor.  ...
doi:10.4236/jsea.2012.54033 fatcat:iiqszwe3brhjxl4ycyf6ynbp2u

Factored operating systems (fos)

David Wentzlaff, Anant Agarwal
2009 ACM SIGOPS Operating Systems Review  
The next decade will afford us computer chips with 100's to 1,000's of cores on a single piece of silicon.  ...  Contemporary operating systems have been designed to operate on a single core or small number of cores and hence are not well suited to manage and provide operating system services at such large scale.  ...  We thank Robert Morris and Frans Kaashoek for feedback on this work. We also thank Charles Gruenwald for help on fos.  ... 
doi:10.1145/1531793.1531805 fatcat:vdak4y4dt5cavlcqj7s7q4p3bu

Building High-Performance Application Protocol Parsers on Multi-core Architectures

Kai Zhang, Junchang Wang, Bei Hua, Xinan Tang
2011 2011 IEEE 17th International Conference on Parallel and Distributed Systems  
Finally, an efficient parallel run-time system is built by employing lock-free design principles from top to bottom to support multi-threaded execution on multi-core processors.  ...  Multi-core architectures provide a viable solution for building high-performance parsers for application protocols.  ...  on a multi-core server.  ... 
doi:10.1109/icpads.2011.37 dblp:conf/icpads/ZhangWHT11 fatcat:t2vg3k5i5rbuvodtuqd4voipm4