12,800 Hits in 5.4 sec

Algorithm and Architecture for a Low-Power Content-Addressable Memory Based on Sparse Clustered Networks

Hooman Jarollahi, Vincent Gripon, Naoya Onizawa, Warren J. Gross
2015 IEEE Transactions on Very Large Scale Integration (VLSI) Systems
We propose a low-power content-addressable memory (CAM) employing a new algorithm for associativity between the input tag and the corresponding address of the output data.  ...  Index Terms: Associative memory, content-addressable memory (CAM), low-power computing, recurrent neural networks, sparse clustered networks (SCNs).  ...  Introduction: A content-addressable memory (CAM) is a type of memory that can be accessed using its contents rather than an explicit address.  ... 
doi:10.1109/tvlsi.2014.2316733 fatcat:i3zdsmqedranlg4yaaoixq3gsq

A low-power Content-Addressable Memory based on clustered-sparse networks

Hooman Jarollahi, Vincent Gripon, Naoya Onizawa, Warren J. Gross
2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors
A low-power Content-Addressable-Memory (CAM) is introduced employing a new mechanism for associativity between the input tags and the corresponding address of the output data.  ...  The proposed architecture is based on a recently developed clustered-sparse-network using binary-weighted connections that on average will eliminate most of the parallel comparisons performed during a  ...  "Architecture and implementation of an associative memory using sparse clustered networks," 2012 IEEE International Symposium on Circuits and Systems (ISCAS), Seoul, Korea, 20-23 May 2012, pp. 2901-2904  ... 
doi:10.1109/asap.2013.6567594 dblp:conf/asap/JarollahiGOG13 fatcat:kq2tvtdxindfxb4zhrxc7oct6u

SNE: an Energy-Proportional Digital Accelerator for Sparse Event-Based Convolutions [article]

Alfio Di Mauro, Arpan Suravi Prasad, Zhikai Huang, Matteo Spallanzani, Francesco Conti, Luca Benini
2022 arXiv   pre-print
Event-based sensors are drawing increasing attention due to their high temporal resolution, low power consumption, and low bandwidth.  ...  To efficiently extract semantically meaningful information from sparse data streams produced by such sensors, we present a 4.5TOP/s/W digital accelerator capable of performing 4-bits-quantized event-based  ...  The authors would like to thank Armasuisse Science and Technology for funding this research, and IniVation for kindly lending us a DVS camera.  ... 
arXiv:2204.10687v2 fatcat:6nq43tujy5dkxppnpxifwpgu6u

Networked Power-Gated MRAMs for Memory-Based Computing

Jean-Philippe Diguet, Naoya Onizawa, Mostafa Rizk, Johanna Sepulveda, Amer Baghdadi, Takahiro Hanyu
2018 IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The proposed architecture uses a Network-on-Chip (NoC) to interconnect MRAM-based clusters, processing elements, and managers.  ...  This results in a low-power, highly scalable and configurable implementation of memory-based computing.  ...  ACKNOWLEDGMENT This work was supported in Japan by JSPS KAKENHI Grant Numbers JP16H06300 and in France by the Region Bretagne CyAM project and by the MFC project of Future & Rupture IMT program.  ... 
doi:10.1109/tvlsi.2018.2856458 fatcat:jax77tbsufa5jlvgthpoo672li

Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra [article]

Paul Scheffler, Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini
2020 arXiv   pre-print
A matrix-vector implementation on a multi-core cluster is up to 5.8x faster and 2.7x more energy-efficient with our kernels than an optimized baseline.  ...  Sparse-dense linear algebra is crucial in many domains, but challenging to handle efficiently on CPUs, GPUs, and accelerators alike; multiplications with sparse formats like CSR and CSF require indirect  ...  [20] present an algorithm and tensor core modifications for efficient sparse neural network inference.  ... 
arXiv:2011.08070v2 fatcat:fmvliell4fc6joyox3invozd4e

A Case for Embedded FPGA-based SoCs in Energy-Efficient Acceleration of Graph Problems

2015 Supercomputing Frontiers and Innovations  
A cluster of embedded SoCs (systems-on-chip) with closely-coupled FPGA accelerators can support distributed memory access with better matched low-power processing.  ...  These bottlenecks on traditional x86-based systems mean that sparse graph problems scale very poorly, both in terms of performance and power efficiency.  ...  The graph data is conventionally stored in a compressed sparse format (row based or column based), which is a memory storage optimization for sparse graph structures.  ... 
doi:10.14529/jsfi150307 fatcat:txw4k4jwx5hu7ck7ep3nykg6cy

A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference

Arnab Neelim Mazumder, Jian Meng, Hasib-Al Rashid, Utteja Kallakuri, Xin Zhang, Jae-sun Seo, Tinoosh Mohsenin
2021 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
The main goal is to allow efficient processing of the DNNs on low-power micro-AI platforms without compromising hardware resources and accuracy.  ...  Deep neural networks (DNNs) are being prototyped for a variety of artificial intelligence (AI) tasks including computer vision, data analytics, robotics, etc.  ...  Cluster pruning [43] first optimizes the filter cluster size based on accuracy and inference latency changes, then prunes the clusters to maximize the hardware performance.  ... 
doi:10.1109/jetcas.2021.3129415 fatcat:nknpy4eernaeljz2hpqafe7sja

The Versatile Image Processor V. I. P. (Hardware Design)

G. Gugliotta, Alberto Machì
1992 IAPR International Workshop on Machine Vision Applications  
The processors operate concurrently on cluster and system shared memories through the parallel busses and exchange messages among crates and with a host system through the serial network.  ...  Video Bus and a serial network.  ...  F. Piccirelli and to Mr C. Granuzzo, Project Managers of TECNINT for the continuous and fruitful support of ideas and technical solutions to the development of VIP prototype.  ... 
dblp:conf/mva/GugliottaM92 fatcat:nkorxadqz5cmtlertw4borg2zi

Customized Video Summarization with Thumbnail Containers and 2D CNN

Vaishnavi MN, Shashank Reddy, Kiran YC
2024 International Journal of Advanced Research in Science, Communication and Technology  
This framework creates a custom keyshot summary for two or more concurrent users by leveraging the computing power of the user's device.  ...  This project focuses on acquiring customized video summaries using thumbnail container-based summarization framework and 2D CNN model to select and extract specific features from thumbnails.  ...  Methodologies employed encompass feature extraction with CNNs and HEVC, dimensionality reduction using Sparse Autoencoder (SAE) and Stepwise Regression, frame elimination techniques based on low-level  ... 
doi:10.48175/ijarsct-15397 fatcat:24vrdln6ojbhxi2iqmzfgkajo4

Custom FPGA-based soft-processors for sparse graph acceleration

Nachiket Kapre
2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
FPGA-based soft processors customized for operations on sparse graphs can deliver significant performance improvements over conventional organizations (ARMv7 CPUs) for bulk synchronous sparse graph algorithms  ...  We interconnect a 2D array of these lightweight processors with a packet-switched network-on-chip to enable fine-grained operand routing along the graph edges and provide custom send/receive instructions  ...  graph algorithm functions, and (3) data memory contents (graph memory) along with a template for the 2D NoC.  ... 
doi:10.1109/asap.2015.7245698 dblp:conf/asap/Kapre15 fatcat:mqos2rxf4zdkxcsq2hf6q3xji4

Ternary CAM Memory Design using MOS Transistors

V. Raghavendran
2020 International Journal for Research in Applied Science and Engineering Technology  
Ternary content addressable memory (TCAM) is a high performance search engine which accesses the data based on its contents in a single clock cycle.  ...  Ternary content addressable memory is a hardware search engine that is much faster than modified algorithmic approaches for search intensive applications.  ...  Area consumption depends on optimization of the MOS transistor count. Sparse clustered networks use content addressable memories [7].  ... 
doi:10.22214/ijraset.2020.30252 fatcat:24ubdxgverch5mgvvaqveyzrvi

A Unified Programmable Edge Matrix Processor for Deep Neural Networks and Matrix Algebra

Biji George, Om ji Omer, Ziaul Choudhury, Anoop V, Sreenivas Subramoney
2022 ACM Transactions on Embedded Computing Systems  
for low power and cost-sensitive Edge deployments.  ...  , sparse Deep Neural Network (DNN), Cholesky decomposition, and triangular matrix solve respectively.  ...  Ports are mapped to a programmable base address for flexibility and transactions are spawned from tiling stage based on matrix operations.  ... 
doi:10.1145/3524453 fatcat:miqhwzep3fey5admehib4md5ly

Parallel Phase Model: A Programming Model for High-end Parallel Machines with Manycores

Ron Brightwell, Mike Heroux, Zhaofang Wen, Junfeng Wu
2009 International Conference on Parallel Processing
This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster  ...  PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster  ...  high-end parallel machines have been based on a distributed memory cluster architecture consisting of a networked cluster of commodity processors, each with its own memory.  ... 
doi:10.1109/icpp.2009.69 dblp:conf/icpp/BrightwellHWW09 fatcat:cosq32nilrbv5bkom7usbczkbi

Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster

Pradeep Moorthy, Nachiket Kapre
2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines
In this paper, we prototype a 32-node cluster composed from these Zynq SoC chips to accelerate communication-bound sparse graph-oriented applications such as neural network simulations.  ...  Commodity SoCs with hybrid architectures that combine CPUs with programmable FPGA fabric such as the Xilinx Zynq SoC have become a competitive energy-efficient platform for addressing irregular parallelism  ...  To address this potential, we focus on the ARM SoC capabilities in Figure 3 and show the various system throughputs for the on-chip memories, off-chip memories, CPU-FPGA links and the network interfaces  ... 
doi:10.1109/fccm.2015.37 dblp:conf/fccm/MoorthyK15 fatcat:7iapt5jzp5fmddhu3fn7oxs5ci

Sense: Model Hardware Co-design for Accelerating Sparse CNN on Systolic Array [article]

Wenhao Sun, Deng Liu, Zhiwei Zou, Wendi Sun, Yi Kang, Song Chen
2022 arXiv   pre-print
Thus, this paper proposed a systolic-array-based architecture, called Sense, for sparse CNN acceleration by model-hardware co-design, achieving large performance improvement.  ...  Meanwhile, systolic arrays have become increasingly competitive for CNN acceleration owing to their high spatiotemporal locality and low hardware overhead.  ...  Finally, a mapping algorithm is designed to map various networks on our architecture, based on provided network parameters.  ... 
arXiv:2202.00389v2 fatcat:szrnivmxr5bglgavg5qqlt6wci
Showing results 1 to 15 out of 12,800 results