9,371 Hits in 5.3 sec

Memory-aware Thread and Data Mapping for Hierarchical Multi-core Platforms

Eduardo Henrique Molina da Cruz, Marco Antonio Zanata Alves, Alexandre Carissimi, Philippe Olivier Alexandre Navaux, Christiane Pousa Ribeiro, Jean-François Méhaut
2012 International Journal of Networking and Computing  
The problem is even more important in multi-core machines with NUMA characteristics, since the remote access imposes high overhead, making them more sensitive to thread and data mapping.  ...  In this context, thread and data mapping are techniques that provide performance gains by improving the use of resources such as interconnections, main memory and cache memory.  ...  Acknowledgment This research has been partially supported by the CAPES under grant 4874-06-4 and CNPq.  ... 
doi:10.15803/ijnc.2.1_97 fatcat:pcbmir2eirc4dbobn47efmplcq
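Thread-to-core binding of the kind this paper studies is exposed on Linux through the `sched_setaffinity` interface. The sketch below is illustrative only (the `pin_to_core` helper is an invented name, not from the paper) and assumes a Linux-like platform; it degrades gracefully where the call is unavailable:

```python
import os

def pin_to_core(core):
    """Pin the calling process to a single core, if the OS supports it.

    Returns the resulting affinity set, or None when the platform
    (e.g. macOS) does not expose sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {core})   # pid 0 = the calling process
    return os.sched_getaffinity(0)

# Pinning restricts the scheduler to the given core, which is the basic
# mechanism thread-mapping policies build on.
if hasattr(os, "sched_getaffinity"):
    allowed = min(os.sched_getaffinity(0))  # pick a core we may use
    pin_to_core(allowed)
```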

Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes

Rolf Rabenseifner, Georg Hager, Gabriele Jost
2009 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing  
Today most systems in high-performance computing (HPC) feature a hierarchical hardware design: Shared memory nodes with several multi-core CPUs are connected via a network infrastructure.  ...  Furthermore we show that machine topology has a significant impact on performance for all parallelization strategies and that topology awareness should be built into all applications in the future.  ...  Fruitful discussions with Rainer Keller and Gerhard Wellein are gratefully acknowledged.  ... 
doi:10.1109/pdp.2009.43 dblp:conf/pdp/RabenseifnerHJ09 fatcat:jqwqavp655bvpdkavvhk3so64y
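The hybrid scheme described here can be sketched, in miniature, as two levels of data decomposition: a coarse split across "ranks" and a finer split across an inner thread team. The example below is illustrative only; it models both the MPI and OpenMP levels with Python thread pools so it stays single-file and portable (`hybrid_sum` and `rank_work` are invented names, not from the paper):

```python
from concurrent.futures import ThreadPoolExecutor

def rank_work(rank, n_ranks, n_threads, data):
    """One 'MPI rank': take a strided chunk of the data, then split it
    again across an inner thread team (the 'OpenMP' level)."""
    chunk = data[rank::n_ranks]  # coarse, inter-node decomposition
    with ThreadPoolExecutor(max_workers=n_threads) as team:
        partials = team.map(sum, (chunk[t::n_threads] for t in range(n_threads)))
    return sum(partials)

def hybrid_sum(data, n_ranks=2, n_threads=2):
    # Outer level stands in for MPI processes; real hybrid codes would
    # launch one process per node here instead of a thread.
    with ThreadPoolExecutor(max_workers=n_ranks) as ranks:
        results = ranks.map(
            lambda r: rank_work(r, n_ranks, n_threads, data), range(n_ranks))
    return sum(results)
```

The strided slicing gives a disjoint, exhaustive partition at both levels, so the nested reduction matches the serial sum.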

Efficient simulation of agent-based models on multi-GPU and multi-core clusters

Brandon G. Aaby, Kalyan S. Perumalla, Sudip K. Seal
2010 Proceedings of the 3rd International ICST Conference on Simulation Tools and Techniques  
Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors.  ...  pthreads on multi-core processors.  ...  Accordingly, the United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable  ... 
doi:10.4108/icst.simutools2010.8822 dblp:conf/simutools/AabyPS10 fatcat:ewcei2izsnh4bbymwmqjhqwd4y

GRapid: A compilation and runtime framework for rapid prototyping of graph applications on many-core processors

Da Li, Srimat Chakradhar, Michela Becchi
2014 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)  
While compilation and runtime frameworks for parallelizing graph applications on multi-core CPUs exist, there is still a need for comparable frameworks for many-core devices.  ...  We propose GRapid: a compilation and runtime framework that generates efficient parallel implementations of generic graph applications for multi-core CPUs, NVIDIA GPUs and Intel Xeon Phi.  ...  Note that our framework also produces multi-threaded code for multi-core CPUs.  ... 
doi:10.1109/padsw.2014.7097806 dblp:conf/icpads/LiCB14 fatcat:getyhmefjfdlzbfsy4uawh46au

On the effectiveness of OpenMP teams for cluster-based many-core accelerators

Alessandro Capotondi, Andrea Marongiu
2016 2016 International Conference on High Performance Computing & Simulation (HPCS)  
Application developers are indeed required to manually deal with outlining code parts suitable for acceleration, parallelize them efficiently over many available cores, and orchestrate data transfers to  ...  should also take care of properly mapping the parallel computation so as to avoid poor data locality.  ...  In a scratchpad-based architecture, the master thread is typically responsible for bringing data in and out via DMA transfers, thus it is extremely important that the thread-to-core mapping follows a cluster-aware  ... 
doi:10.1109/hpcsim.2016.7568399 dblp:conf/ieeehpcs/CapotondiM16 fatcat:ipkzykxisncbvchukkpnl2b3lu

Exposing the Locality of Heterogeneous Memory Architectures to HPC Applications

Brice Goglin
2016 Proceedings of the Second International Symposium on Memory Systems - MEMSYS '16  
High-performance computing requires a deep knowledge of the hardware platform to fully exploit its computing power. The performance of data transfer between cores and memory is becoming critical.  ...  Indeed, tasks and data have to be carefully distributed on the computing and memory resources.  ...  ACKNOWLEDGMENTS We would like to thank Intel for providing us with hints for designing our new hwloc model.  ... 
doi:10.1145/2989081.2989115 dblp:conf/memsys/Goglin16 fatcat:eev2v2bomzcdri2gnsnfyn3fey

Architecture Aware Programming on Multi-Core Systems

M R, S.R. Sathe
2011 International Journal of Advanced Computer Science and Applications  
In this paper, we propose a programming approach for the algorithms running on shared memory multi-core systems by using blocking, which is a well-known optimization technique coupled with parallel programming  ...  With the advent of multi-core architectures, we are facing the problem that is new to parallel computing, namely, the management of hierarchical caches.  ...  For a shared memory platform, all the cores on a single die share the same memory subsystem, and there is no direct support for binding the threads to the core using OpenMP.  ... 
doi:10.14569/ijacsa.2011.020615 fatcat:oxappnveqjetpoetnq2c5zfnzq
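Cache blocking as used in this paper can be illustrated with a tiled matrix multiply: the loop nest is reorganized so each block-sized tile is reused while it is still cache-resident. The code below is a generic textbook sketch (`matmul_blocked` is an illustrative name, not the authors' code):

```python
def matmul_blocked(A, B, block=2):
    """Cache-blocked matrix multiply on plain lists of lists.

    Iterating over block x block tiles keeps each tile of A, B and C
    resident in cache while it is reused, which is the blocking idea.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):          # tile rows of C
        for kk in range(0, m, block):      # tile the shared dimension
            for jj in range(0, p, block):  # tile columns of C
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        a = A[i][k]
                        for j in range(jj, min(jj + block, p)):
                            C[i][j] += a * B[k][j]
    return C
```

The result is identical to the naive triple loop for any block size; only the memory access pattern changes.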

Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime

Chao Mei, Yanhua Sun, Gengbin Zheng, Eric J. Bohm, Laxmikant V. Kale, James C. Phillips, Chris Harrison
2011 Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11  
We exploit node-aware techniques to optimize both the application and the underlying SMP runtime.  ...  Hierarchical load balancing is further exploited to scale NAMD to the full Jaguar PF Cray XT5 (224,076 cores) at Oak Ridge National Laboratory, both with and without PME full electrostatics, achieving  ...  Acknowledgments This work was supported in part by a NIH Grant PHS 5 P41 RR05969-04 for Molecular Dynamics, by NSF grant OCI-0725070 for Blue Waters deployment, by the Institute for Advanced Computing  ... 
doi:10.1145/2063384.2063466 dblp:conf/sc/MeiSZBKPH11 fatcat:jgkzpijiljgl3hxfhl2vevglyu

A memory-centric approach to enable timing-predictability within embedded many-core accelerators

Paolo Burgio, Andrea Marongiu, Paolo Valente, Marko Bertogna
2015 2015 CSI Symposium on Real-Time and Embedded Systems and Technologies (RTEST)  
There is an increasing interest among real-time systems architects for multi- and many-core accelerated platforms.  ...  In this paper, we study how the predictable execution model (PREM), a memory-aware approach to enable timing-predictability in real-time systems, can be successfully adopted on multi- and many-core heterogeneous  ...  With a memory-aware task mapping and scheduling algorithm in place, it would then be possible to select which task to assign to this "unlucky" thread, reserving the higher-priority threads for more  ... 
doi:10.1109/rtest.2015.7369851 fatcat:dz44lvdm5fffxkiwrjvfkbvtqy

Topology-Aware Mapping Techniques for Heterogeneous HPC Systems: A Systematic Survey

Saad B. Alotaibi, Fathy Alboraei
2018 International Journal of Advanced Computer Science and Applications  
In this survey paper, we have studied various topology-aware mapping techniques and algorithms.  ...  Given that, efficient topology-aware process mapping has become vital for optimizing data locality management in order to improve system performance and energy consumption.  ...  [23] used the network/node architecture and graph embedding modules for mapping the application communication topology onto the multi-core clusters' physical topology with multi-level networks.  ... 
doi:10.14569/ijacsa.2018.091045 fatcat:taeescyyjjej7pbqutm4kulagy

Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective [chapter]

François Broquedis, Nathalie Furmento, Brice Goglin, Raymond Namyst, Pierre-André Wacrenier
2009 Lecture Notes in Computer Science  
Our runtime, which is based on a multi-level thread scheduler combined with a NUMA-aware memory manager, converts this information into "scheduling hints" to solve thread/memory affinity issues.  ...  First experiments show that mixed solutions (migrating threads and data) outperform next-touch-based data distribution policies and open possibilities for new optimizations.  ...  These features enable memory-aware task and data placement but they remain expensive.  ... 
doi:10.1007/978-3-642-02303-3_7 fatcat:e4cc3wengncrnfrvcnbtmvbmtq

An Optimized Model for MapReduce Based on Hadoop

Zhang Hong, Wang Xiao-ming, Cao Jie, Ma Yan-hong, Guo Yi-rong, Wang Min
2016 TELKOMNIKA (Telecommunication Computing Electronics and Control)  
From the perspective of fine-grained parallel data processing, combined with the Fork/Join framework, a parallel and multi-thread model, this paper optimizes the MapReduce model and puts forward a MapReduce+Fork  ...  shared and distributed memory machines.  ...  Acknowledgements We acknowledge the support from various grant sources: the Natural Science Foundation of Gansu Province (Grant No.148RJZA019), the Scientific and Technological support program Foundation  ... 
doi:10.12928/telkomnika.v14i4.3606 fatcat:kcettjgtq5d3dkvad3d647dj3q
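The MapReduce-plus-Fork/Join combination this paper proposes can be gestured at with a small word-count sketch: fork the map phase across a thread pool, then join by merging the partial results in the reduce step. The `mapreduce_wordcount` function is an illustrative stand-in, not the authors' Hadoop-based model:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def mapreduce_wordcount(lines, n_workers=4):
    """MapReduce-style word count with a fork/join flavour.

    Map phase (fork): each pooled task counts the words of one line.
    Reduce phase (join): partial Counters are merged into one total.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda line: Counter(line.split()), lines)
        total = Counter()
        for partial in partials:
            total.update(partial)
    return total
```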

Structuring the execution of OpenMP applications for multicore architectures

François Broquedis, Olivier Aumage, Brice Goglin, Samuel Thibault, Pierre-André Wacrenier, Raymond Namyst
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
ForestGOMP features a high-level platform for developing and tuning portable threads schedulers.  ...  The now commonplace multi-core chips have introduced, by design, a deep hierarchy of memory and cache banks within parallel computers as a tradeoff between the user friendliness of shared memory on the  ...  CONCLUSION AND FUTURE WORK FORESTGOMP is a platform for executing and tuning OpenMP programs over hierarchical multicore architectures.  ... 
doi:10.1109/ipdps.2010.5470442 dblp:conf/ipps/BroquedisAGTWN10 fatcat:y3xatov5zvhn3op7b3ququbrom

NUMA-Aware DGEMM Based on 64-Bit ARMv8 Multicore Processors Architecture

Wei Zhang, Zihao Jiang, Zhiguang Chen, Nong Xiao, Yang Ou
2021 Electronics  
The critical enabler for NUMA-aware DGEMM is to leverage two levels of parallelism between and within nodes in a purely threaded implementation, which allows the task independence and data localization  ...  We present a NUMA-aware method to reduce the number of cross-die and cross-chip memory access events.  ...  Acknowledgments: The authors thank ZhiGuang Chen and Nong Xiao for their guidance and the server provided by Pengcheng Labs. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/electronics10161984 fatcat:mkevjicswjfpzcdrtsfarp2dnq

A survey on hardware-aware and heterogeneous computing on multicore processors and accelerators

Rainer Buchty, Vincent Heuveline, Wolfgang Karl, Jan-Philipp Weiss
2011 Concurrency and Computation  
In particular, we characterize the discrepancy to conventional parallel platforms with respect to hierarchical memory sub-systems, fine-grained parallelism on several system levels, and chip-and system-level  ...  Performance gains for data-and compute-intensive applications can currently only be achieved by exploiting coarse-and fine-grained parallelism on all system levels, and improved scalability with respect  ...  Acknowledgements The Shared Research Group 16-1 received financial support by the Concept for the Future of Karlsruhe Institute of Technology in the framework of the German Excellence Initiative and the  ... 
doi:10.1002/cpe.1904 fatcat:fwg2vjaobral3b2v46vq4x2c3q
Showing results 1 — 15 out of 9,371 results