Tightly Integrated Design Space Exploration with Spatial and Temporal Partitioning in SPARCS.

The CHARSTAR design is further optimized for balanced spatiotemporal reconfiguration and also enables efficient joint control of resource and frequency scaling. ... CHARSTAR, when deployed on the CRIB tiled microarchitecture, improves processor energy efficiency by 20-25%, with efficiency improvements of roughly 2x in comparison to a naive power gating mechanism. ... This work was supported in part by NSF grants CCF-1318298 and CCF-1615014. Mikko Lipasti has a financial interest in Thalchemy Corp. ...

doi:10.1145/3140659.3080212 fatcat:wylho4k46ffcxdm65xf4lrjvla

The differences in high-level synthesis technology between classical systems and dynamically reconfigurable systems are discussed. ... In this paper we survey the current state-of-the-art in high-level synthesis techniques for dynamically reconfigurable systems. ... Acknowledgements The work described in this paper was partly supported by ...

doi:10.1016/s0141-9331(00)00074-0 fatcat:j2udiqvfvja3xieopntaquluce

With time and space partitioned architectures becoming increasingly appealing to the European space sector, the dependability of separation kernel technology is a key factor to its applicability in European ... This paper explores the potential of the data type fault model, which injects faults through the Application Program Interface, in separation kernel robustness testing. ... Their contribution supported the development of the fault injection methodology described in this work. ...

doi:10.1109/cluster.2016.91 dblp:conf/cluster/GrixtiSHCMC16 fatcat:5bag72nbwfgsvhxjd66oexs5ba

In this paper, we assess the effectiveness of the PROXIMA's dynamic software randomisation (DSR) with a space industrial case study executed on a real unmodified hardware platform and an industrial operating ... Timing Validation and Verification (V&V) is an important step in real-time system design, in which a system's timing behaviour is assessed via Worst Case Execution Time (WCET) estimation and scheduling ... This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. ...

doi:10.23919/date.2017.7926966 dblp:conf/date/CrosKWMABC17 fatcat:rebhu4ijbnfc5b3dq7tk2cxllm

We are now in the multicore revolution which is witnessing a rapid evolution of architectural designs due to power constraints and correspondingly limited microprocessor clock speeds. ... Our evaluated kernels are derived from two important numerical computations: a biological simulation of the heart using the Immersed Boundary method, and a Gyrokinetic Particle-in-Cell based application ... Additional support comes from Microsoft (Award #024263) and Intel (Award #024894) funding, and by matching funding by U.C. Discovery (Award #DIG07-10227). ...

doi:10.1109/tpds.2012.28 fatcat:ka67zev5grdspa3zjkcls325te

Global Energy and Water Cycle Experiment (GEWEX) News ... spatial (10-100 m) and temporal (1 week) scales necessary for answering key water cycle and water management questions. ... Therefore, it is essential that we produce an accurate accounting of the key reservoirs and fluxes associated with the global water and energy cycle, including their spatial and temporal variability, and ...

doi:10.5281/zenodo.7354616 fatcat:pdbuw5per5dwncoaggjobbufo4

Taking advantage of spatial, temporal, and statistical redundancies in video data, a video compression system aims to maximize the compression ratio while maintaining a high picture quality. ... Despite the availability of fast single processors, parallel processing helps to explore advanced algorithms and to build more sophisticated systems. ... The basic coding structure of H.261 includes spatial and temporal coding schemes. ...

doi:10.1016/s0167-8191(02)00100-x fatcat:rmudbaphj5cfvowalnziuvwnwu

In particular, the partitioning boundary can be configured statically at design time and dynamically at runtime. ... In the experiments with three applications (matrix multiplication, 2D FFT, and H.264/AVC encoding), compared with the conventional DSM, our techniques show performance improvement up to 37.89%. ... Acknowledgment The research is partially supported by the National Natural Science Foundation of China (No. 61070036 and No. 61133007). ...

doi:10.1016/j.compeleceng.2012.04.009 fatcat:bagthuc4cfeevl6crm7vxszfui

with new techniques for memoization (i.e., spatial or temporal reuse of computation). ... Variation in performance and power across manufactured parts and their operating conditions is an accepted reality in modern microelectronic manufacturing processes with geometries in nanometer scales ... Spatial parameter variations in the device geometries in conjunction with temporal degradation and undesirable fluctuations in the operating condition may prevent circuit from meeting the performance and ...

doi:10.1109/jproc.2016.2518864 fatcat:sxrsu3excbdg5p7sk4iczz262y

In other words, the scientific method must be adapted to bring machine learning into the picture, and make the best use of the massive amount of data we have produced, thanks to the advances in numerical ... However, the key to an effective use of machine learning tools in multi-physics problems, including combustion, is to couple them to physical and computer models. ... It allows for both spatial and temporal evolution of the flow. ...

arXiv:2209.02051v1 fatcat:qeqvqua5g5ehxcc2uhbeyryzm4

These elementary operations will help in exploring and evaluating new memory models and consistency protocols. ... With the evolution toward fast networks of many-core processors, the design assumptions at the basis of software-level distributed shared memory (DSM) systems change considerably. ... Similar to hardware-based caches, they try to exploit spatial and temporal locality in the application's data access patterns. ...

doi:10.1007/978-3-319-14313-2_30 fatcat:ah6h3rubebh4zj52w4wfnhaz2m

In doing so, Footprint Cache eliminates the excessive off-chip traffic associated with page-based designs, while preserving their high hit ratio, small tag array overhead, and low lookup latency. ... Furthermore, such designs suffer from low hit ratios due to poor temporal locality. ... We next detail the predictor design and its integration with the tag array, and we further explain the prediction history management. ...

doi:10.1145/2485922.2485957 dblp:conf/isca/JevdjicVF13 fatcat:bl2twnnncjhpvmby7khhoj3lwy

In particular, we characterize the discrepancy to conventional parallel platforms with respect to hierarchical memory sub-systems, fine-grained parallelism on several system levels, and chip- and system-level ... Performance gains for data- and compute-intensive applications can currently only be achieved by exploiting coarse- and fine-grained parallelism on all system levels, and improved scalability with respect ... Acknowledgements The Shared Research Group 16-1 received financial support by the Concept for the Future of Karlsruhe Institute of Technology in the framework of the German Excellence Initiative and the ...

doi:10.1002/cpe.1904 fatcat:fwg2vjaobral3b2v46vq4x2c3q

We have successfully coupled a dynamic reconfigurable system to an SPARC-based multiprocessor and obtained performance gains of up to 40%, even for applications that show a great level of parallelism at ... As parallel applications are becoming increasingly present in embedded and general-purpose domains and multiprocessing systems must handle a wide range of different application classes, there is no consensus ... Results of area and power reduction are demonstrated when sharing temporally and spatially the reconfigurable fabric. ...

doi:10.1155/2011/546962 fatcat:rxidkhq36vgdthe4jvqvsrtq6y

The design trade-offs for this and similar algorithms are indicated. In the manner of codesign, eventual implementation on large- and fine-grained hardware is considered. ... Practical parallelizations of multi-phased low-level image-processing algorithms may require working in batch mode. ... A Pipeline Decomposition The design trade-offs between pipeline throughput, latency and algorithm decomposition for practical parallelizations are explored in [6] and this section makes use of the conceptual ...

doi:10.1007/bfb0002818 fatcat:d643keoqvvgkbce33uss6gum64

CHARSTAR

A review of high-level synthesis for dynamically reconfigurable FPGAs

Separation Kernel Robustness Testing: The XtratuM Case Study

Dynamic software randomisation: Lessons learned from an aerospace case study

Optimization of Parallel Particle-to-Grid Interpolation on Leading Multicore Platforms

Global Energy and Water Cycle Experiment (GEWEX) News [article]

Video compression with parallel processing

Reducing Virtual-to-Physical address translation overhead in Distributed Shared Memory based multi-core Network-on-Chips according to data property

Variability Mitigation in Nanometer CMOS Integrated Systems: A Survey of Techniques From Circuits to Software

Advancing Reacting Flow Simulations with Data-Driven Models [article]

Shared Memory in the Many-Core Age [chapter]

Die-stacked DRAM caches for servers

A survey on hardware-aware and heterogeneous computing on multicore processors and accelerators

Boosting Parallel Applications Performance on Applying DIM Technique in a Multiprocessing Environment

Karhünen-Loève transform: An exercise in simple image-processing parallel pipelines [chapter]
