Algorithm and Architecture for a Low-Power Content-Addressable Memory Based on Sparse Clustered Networks.

We propose a low-power content-addressable memory (CAM) employing a new algorithm for associativity between the input tag and the corresponding address of the output data. ... Index Terms: associative memory, content-addressable memory (CAM), low-power computing, recurrent neural networks, sparse clustered networks (SCNs). ... Introduction: A content-addressable memory (CAM) is a type of memory that can be accessed using its contents rather than an explicit address. ...

doi:10.1109/tvlsi.2014.2316733 fatcat:i3zdsmqedranlg4yaaoixq3gsq

A low-power content-addressable memory (CAM) is introduced, employing a new mechanism for associativity between the input tags and the corresponding address of the output data. ... The proposed architecture is based on a recently developed clustered-sparse network using binary-weighted connections that on average will eliminate most of the parallel comparisons performed during a ... "Architecture and implementation of an associative memory using sparse clustered networks," 2012 IEEE International Symposium on Circuits and Systems (ISCAS), Seoul, Korea, 20-23 May 2012, pp. 2901-2904 ...

doi:10.1109/asap.2013.6567594 dblp:conf/asap/JarollahiGOG13 fatcat:kq2tvtdxindfxb4zhrxc7oct6u

Event-based sensors are drawing increasing attention due to their high temporal resolution, low power consumption, and low bandwidth. ... To efficiently extract semantically meaningful information from sparse data streams produced by such sensors, we present a 4.5 TOP/s/W digital accelerator capable of performing 4-bit-quantized event-based ... The authors would like to thank Armasuisse Science and Technology for funding this research, and IniVation for kindly lending us a DVS camera. ...

arXiv:2204.10687v2 fatcat:6nq43tujy5dkxppnpxifwpgu6u

The proposed architecture uses a Network-on-Chip (NoC) to interconnect MRAM-based clusters, processing elements, and managers. ... This results in a low-power, highly scalable and configurable implementation of memory-based computing. ... ACKNOWLEDGMENT This work was supported in Japan by JSPS KAKENHI Grant Numbers JP16H06300 and in France by the Region Bretagne CyAM project and by the MFC project of Future & Rupture IMT program. ...

doi:10.1109/tvlsi.2018.2856458 fatcat:jax77tbsufa5jlvgthpoo672li

A matrix-vector implementation on a multi-core cluster is up to 5.8x faster and 2.7x more energy-efficient with our kernels than an optimized baseline. ... Sparse-dense linear algebra is crucial in many domains, but challenging to handle efficiently on CPUs, GPUs, and accelerators alike; multiplications with sparse formats like CSR and CSF require indirect ... [20] present an algorithm and tensor core modifications for efficient sparse neural network inference. ...

arXiv:2011.08070v2 fatcat:fmvliell4fc6joyox3invozd4e

A cluster of embedded SoCs (systems-on-chip) with closely-coupled FPGA accelerators can support distributed memory access with better matched low-power processing. ... These bottlenecks on traditional x86-based systems mean that sparse graph problems scale very poorly, both in terms of performance and power efficiency. ... The graph data is conventionally stored in a compressed sparse format (row based or column based), which is a memory storage optimization for sparse graph structures. ...

doi:10.14529/jsfi150307 fatcat:txw4k4jwx5hu7ck7ep3nykg6cy

The main goal is to allow efficient processing of the DNNs on low-power micro-AI platforms without compromising hardware resources and accuracy. ... Deep neural networks (DNNs) are being prototyped for a variety of artificial intelligence (AI) tasks including computer vision, data analytics, robotics, etc. ... Cluster pruning [43] first optimizes the filter cluster size based on accuracy and inference latency changes, then prunes the clusters to maximize the hardware performance. ...

doi:10.1109/jetcas.2021.3129415 fatcat:nknpy4eernaeljz2hpqafe7sja

The processors operate concurrently on cluster and system shared memories through the parallel busses and exchange messages among crates and with a host system through the serial network. ... Video Bus and a serial network. ... F. Piccirelli and to Mr C. Granuzzo, Project Managers of TECNINT, for the continuous and fruitful support of ideas and technical solutions to the development of the VIP prototype. ...

dblp:conf/mva/GugliottaM92 fatcat:nkorxadqz5cmtlertw4borg2zi

This framework creates a custom keyshot summary for two or more concurrent users by leveraging the computing power of the user's device. ... This project focuses on acquiring customized video summaries using thumbnail container-based summarization framework and 2D CNN model to select and extract specific features from thumbnails. ... Methodologies employed encompass feature extraction with CNNs and HEVC, dimensionality reduction using Sparse Autoencoder (SAE) and Stepwise Regression, frame elimination techniques based on low-level ...

doi:10.48175/ijarsct-15397 fatcat:24vrdln6ojbhxi2iqmzfgkajo4

FPGA-based soft processors customized for operations on sparse graphs can deliver significant performance improvements over conventional organizations (ARMv7 CPUs) for bulk synchronous sparse graph algorithms ... We interconnect a 2D array of these lightweight processors with a packet-switched network-on-chip to enable fine-grained operand routing along the graph edges and provide custom send/receive instructions ... graph algorithm functions, and (3) data memory contents (graph memory) along with a template for the 2D NoC. ...

doi:10.1109/asap.2015.7245698 dblp:conf/asap/Kapre15 fatcat:mqos2rxf4zdkxcsq2hf6q3xji4

Ternary content addressable memory (TCAM) is a high-performance search engine that accesses data based on its contents in a single clock cycle. ... Ternary content addressable memory is a hardware search engine that is much faster than modified algorithmic approaches for search-intensive applications. ... Area consumption depends on optimizing the MOS transistor count. Sparse clustered networks use content addressable memories [7]. III. ...

doi:10.22214/ijraset.2020.30252 fatcat:24ubdxgverch5mgvvaqveyzrvi

for low-power and cost-sensitive Edge deployments. ... , sparse Deep Neural Network (DNN), Cholesky decomposition, and triangular matrix solve respectively. ... Ports are mapped to a programmable base address for flexibility and transactions are spawned from tiling stage based on matrix operations. ...

doi:10.1145/3524453 fatcat:miqhwzep3fey5admehib4md5ly

This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster ... PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster ... high-end parallel machines have been based on a distributed memory cluster architecture consisting of a networked cluster of commodity processors, each with its own memory. ...

doi:10.1109/icpp.2009.69 dblp:conf/icpp/BrightwellHWW09 fatcat:cosq32nilrbv5bkom7usbczkbi

In this paper, we prototype a 32-node cluster composed from these Zynq SoC chips to accelerate communication-bound sparse graph-oriented applications such as neural network simulations. ... Commodity SoCs with hybrid architectures that combine CPUs with programmable FPGA fabric such as the Xilinx Zynq SoC have become a competitive energy-efficient platform for addressing irregular parallelism ... To address this potential, we focus on the ARM SoC capabilities in Figure 3 and show the various system throughputs for the on-chip memories, off-chip memories, CPU-FPGA links and the network interfaces ...

doi:10.1109/fccm.2015.37 dblp:conf/fccm/MoorthyK15 fatcat:7iapt5jzp5fmddhu3fn7oxs5ci

Thus, this paper proposes a systolic-array-based architecture, called Sense, for sparse CNN acceleration by model-hardware co-design, achieving large performance improvement. ... Meanwhile, the systolic array has become increasingly competitive for CNN acceleration owing to its high spatiotemporal locality and low hardware overhead. ... Finally, a mapping algorithm is designed to map various networks on our architecture, based on provided network parameters. A. ...

arXiv:2202.00389v2 fatcat:szrnivmxr5bglgavg5qqlt6wci

Algorithm and Architecture for a Low-Power Content-Addressable Memory Based on Sparse Clustered Networks

A low-power Content-Addressable Memory based on clustered-sparse networks

SNE: an Energy-Proportional Digital Accelerator for Sparse Event-Based Convolutions [article]

Networked Power-Gated MRAMs for Memory-Based Computing

Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra [article]

A Case for Embedded FPGA-based SoCs in Energy-Efficient Acceleration of Graph Problems

A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference

The Versatile Image Processor V. I. P. (Hardware Design)

Customized Video Summarization with Thumbnail Containers and 2D CNN

Custom FPGA-based soft-processors for sparse graph acceleration

Ternary CAM Memory Design using MOS Transistors

A Unified Programmable Edge Matrix Processor for Deep Neural Networks and Matrix Algebra

Parallel Phase Model: A Programming Model for High-end Parallel Machines with Manycores

Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster

Sense: Model Hardware Co-design for Accelerating Sparse CNN on Systolic Array [article]