Algorithm and Architecture for a Low-Power Content-Addressable Memory Based on Sparse Clustered Networks
2015
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
We propose a low-power content-addressable memory (CAM) employing a new algorithm for associativity between the input tag and the corresponding address of the output data. ...
Index Terms: Associative memory, content-addressable memory (CAM), low-power computing, recurrent neural networks, sparse clustered networks (SCNs). ...
Introduction: A content-addressable memory (CAM) is a type of memory that can be accessed using its contents rather than an explicit address. ...
doi:10.1109/tvlsi.2014.2316733
fatcat:i3zdsmqedranlg4yaaoixq3gsq
A low-power Content-Addressable Memory based on clustered-sparse networks
2013
2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors
A low-power Content-Addressable-Memory (CAM) is introduced employing a new mechanism for associativity between the input tags and the corresponding address of the output data. ...
The proposed architecture is based on a recently developed clustered-sparse network using binary-weighted connections that, on average, eliminates most of the parallel comparisons performed during a ...
... "Architecture and implementation of an associative memory using sparse clustered networks," 2012 IEEE International Symposium on Circuits and Systems (ISCAS), Seoul, Korea, 20-23 May 2012, pp. 2901-2904 ...
doi:10.1109/asap.2013.6567594
dblp:conf/asap/JarollahiGOG13
fatcat:kq2tvtdxindfxb4zhrxc7oct6u
SNE: an Energy-Proportional Digital Accelerator for Sparse Event-Based Convolutions
[article]
2022
arXiv
pre-print
Event-based sensors are drawing increasing attention due to their high temporal resolution, low power consumption, and low bandwidth. ...
To efficiently extract semantically meaningful information from sparse data streams produced by such sensors, we present a 4.5TOP/s/W digital accelerator capable of performing 4-bits-quantized event-based ...
The authors would like to thank Armasuisse Science and Technology for funding this research, and IniVation for kindly lending us a DVS camera. ...
arXiv:2204.10687v2
fatcat:6nq43tujy5dkxppnpxifwpgu6u
Networked Power-Gated MRAMs for Memory-Based Computing
2018
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The proposed architecture uses a Network-on-Chip (NoC) to interconnect MRAM-based clusters, processing elements, and managers. ...
This results in a low-power, highly scalable and configurable implementation of memory-based computing. ...
ACKNOWLEDGMENT This work was supported in Japan by JSPS KAKENHI Grant Numbers JP16H06300 and in France by the Region Bretagne CyAM project and by the MFC project of Future & Rupture IMT program. ...
doi:10.1109/tvlsi.2018.2856458
fatcat:jax77tbsufa5jlvgthpoo672li
Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra
[article]
2020
arXiv
pre-print
A matrix-vector implementation on a multi-core cluster is up to 5.8x faster and 2.7x more energy-efficient with our kernels than an optimized baseline. ...
Sparse-dense linear algebra is crucial in many domains, but challenging to handle efficiently on CPUs, GPUs, and accelerators alike; multiplications with sparse formats like CSR and CSF require indirect ...
[20] present an algorithm and tensor core modifications for efficient sparse neural network inference. ...
arXiv:2011.08070v2
fatcat:fmvliell4fc6joyox3invozd4e
A Case for Embedded FPGA-based SoCs in Energy-Efficient Acceleration of Graph Problems
2015
Supercomputing Frontiers and Innovations
A cluster of embedded SoCs (systems-on-chip) with closely-coupled FPGA accelerators can support distributed memory access with better matched low-power processing. ...
These bottlenecks on traditional x86-based systems mean that sparse graph problems scale very poorly, both in terms of performance and power efficiency. ...
The graph data is conventionally stored in a compressed sparse format (row based or column based), which is a memory storage optimization for sparse graph structures. ...
doi:10.14529/jsfi150307
fatcat:txw4k4jwx5hu7ck7ep3nykg6cy
A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference
2021
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
The main goal is to allow efficient processing of the DNNs on low-power micro-AI platforms without compromising hardware resources and accuracy. ...
Deep neural networks (DNNs) are being prototyped for a variety of artificial intelligence (AI) tasks including computer vision, data analytics, robotics, etc. ...
Cluster pruning [43] first optimizes the filter cluster size based on accuracy and inference latency changes, then prunes the clusters to maximize the hardware performance. ...
doi:10.1109/jetcas.2021.3129415
fatcat:nknpy4eernaeljz2hpqafe7sja
The Versatile Image Processor V. I. P. (Hardware Design)
1992
IAPR International Workshop on Machine Vision Applications
The processors operate concurrently on cluster and system shared memories through the parallel busses and exchange messages among crates and with a host system through the serial network. ...
Video Bus and a serial network. ...
F. Piccirelli and to Mr C. Granuzzo, Project Managers of TECNINT, for the continuous and fruitful support of ideas and technical solutions to the development of the VIP prototype. ...
dblp:conf/mva/GugliottaM92
fatcat:nkorxadqz5cmtlertw4borg2zi
Customized Video Summarization with Thumbnail Containers and 2D CNN
2024
International Journal of Advanced Research in Science, Communication and Technology
This framework creates a custom keyshot summary for two or more concurrent users by leveraging the computing power of the user's device. ...
This project focuses on acquiring customized video summaries using thumbnail container-based summarization framework and 2D CNN model to select and extract specific features from thumbnails. ...
Methodologies employed encompass feature extraction with CNNs and HEVC, dimensionality reduction using Sparse Autoencoder (SAE) and Stepwise Regression, frame elimination techniques based on low-level ...
doi:10.48175/ijarsct-15397
fatcat:24vrdln6ojbhxi2iqmzfgkajo4
Custom FPGA-based soft-processors for sparse graph acceleration
2015
2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
FPGA-based soft processors customized for operations on sparse graphs can deliver significant performance improvements over conventional organizations (ARMv7 CPUs) for bulk synchronous sparse graph algorithms ...
We interconnect a 2D array of these lightweight processors with a packet-switched network-on-chip to enable fine-grained operand routing along the graph edges and provide custom send/receive instructions ...
graph algorithm functions, and (3) data memory contents (graph memory) along with a template for the 2D NoC. ...
doi:10.1109/asap.2015.7245698
dblp:conf/asap/Kapre15
fatcat:mqos2rxf4zdkxcsq2hf6q3xji4
Ternary CAM Memory Design using MOS Transistors
2020
International Journal for Research in Applied Science and Engineering Technology
Ternary content addressable memory (TCAM) is a high performance search engine which accesses the data based on its contents in a single clock cycle. ...
Ternary content addressable memory is a hardware search engine that is much faster than modified algorithmic approaches for search intensive applications. ...
Area consumption depends on optimizing the MOS transistor count. Sparse clustered networks use content-addressable memories [7]. ...
doi:10.22214/ijraset.2020.30252
fatcat:24ubdxgverch5mgvvaqveyzrvi
A Unified Programmable Edge Matrix Processor for Deep Neural Networks and Matrix Algebra
2022
ACM Transactions on Embedded Computing Systems
for low power and cost-sensitive Edge deployments. ...
, sparse Deep Neural Network (DNN), Cholesky decomposition, and triangular matrix solve respectively. ...
Ports are mapped to a programmable base address for flexibility, and transactions are spawned from the tiling stage based on matrix operations. ...
doi:10.1145/3524453
fatcat:miqhwzep3fey5admehib4md5ly
Parallel Phase Model: A Programming Model for High-end Parallel Machines with Manycores
2009
2009 International Conference on Parallel Processing
This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster ...
PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster ...
high-end parallel machines have been based on a distributed memory cluster architecture consisting of a networked cluster of commodity processors, each with its own memory. ...
doi:10.1109/icpp.2009.69
dblp:conf/icpp/BrightwellHWW09
fatcat:cosq32nilrbv5bkom7usbczkbi
Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster
2015
2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines
In this paper, we prototype a 32-node cluster composed of these Zynq SoC chips to accelerate communication-bound sparse graph-oriented applications such as neural network simulations. ...
Commodity SoCs with hybrid architectures that combine CPUs with programmable FPGA fabric such as the Xilinx Zynq SoC have become a competitive energy-efficient platform for addressing irregular parallelism ...
To address this potential, we focus on the ARM SoC capabilities in Figure 3 and show the various system throughputs for the on-chip memories, off-chip memories, CPU-FPGA links and the network interfaces ...
doi:10.1109/fccm.2015.37
dblp:conf/fccm/MoorthyK15
fatcat:7iapt5jzp5fmddhu3fn7oxs5ci
Sense: Model Hardware Co-design for Accelerating Sparse CNN on Systolic Array
[article]
2022
arXiv
pre-print
Thus, this paper proposes a systolic-array-based architecture, called Sense, for sparse CNN acceleration by model-hardware co-design, achieving large performance improvement. ...
Meanwhile, systolic arrays have become increasingly competitive for CNN acceleration owing to their high spatiotemporal locality and low hardware overhead. ...
Finally, a mapping algorithm is designed to map various networks on our architecture, based on provided network parameters. ...
arXiv:2202.00389v2
fatcat:szrnivmxr5bglgavg5qqlt6wci