1,631 Hits in 2.5 sec

Mapping of Neural Network Models onto Systolic Arrays

Sudipta Mahapatra, Rabi N. Mahapatra
2000 Journal of Parallel and Distributed Computing  
This paper presents a mapping scheme for the proposed implementation of neural network models on systolic arrays.  ...  The same mapping strategy has been used in [8] for mapping the hidden Markov model (HMM) and the recursive back-propagation network (RBP) onto the ring systolic array.  ... 
doi:10.1006/jpdc.2000.1634 fatcat:p3pelrmeonh7bnm6blbcm3w3f4
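The entry above maps neural network models, including the HMM and RBP networks, onto ring systolic arrays. As a purely illustrative sketch of the general idea, and not the paper's scheme, the following Python simulates a matrix-vector product on a ring of processing elements, where each PE holds one weight row and the input vector circulates one hop per step.

```python
# Illustrative sketch only: a matrix-vector product on a ring systolic array,
# in the spirit of the ring mappings discussed above (not the paper's exact scheme).
import numpy as np

def ring_systolic_matvec(W, x):
    """Each PE p holds row p of W and accumulates a partial sum while the
    elements of x circulate around the ring, one hop per step."""
    n, m = W.shape
    assert m == len(x)
    acc = np.zeros(n)
    # PE p initially sees x[p % m]; after t steps it sees x[(p + t) % m].
    for t in range(m):
        for p in range(n):
            j = (p + t) % m
            acc[p] += W[p, j] * x[j]
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 4))
    x = rng.standard_normal(4)
    assert np.allclose(ring_systolic_matvec(W, x), W @ x)
```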

REACT

Mohit Upadhyay, Rohan Juneja, Bo Wang, Jun Zhou, Weng-Fai Wong, Li-Shiuan Peh
2022 Proceedings of the 59th ACM/IEEE Design Automation Conference  
Unlike conventional dynamic NoCs, REACT's NoCs have no buffer queues, flow control or routing, as they are entirely configured by software for each neural network.  ...  On-chip training improves model accuracy on personalised user data and preserves privacy.  ...  Tushar Krishna and Sheng-Chun (Felix) Kao from Georgia Institute of Technology for their help using the SCALE-Sim and MAESTRO tools.  ... 
doi:10.1145/3489517.3530406 fatcat:odymeerkc5cfhl4bt7bhybeyum

Trends in systolic and cellular computation

George Miel
1991 Journal of Computational and Applied Mathematics  
Using annotated lists of recent references, snapshots of active research areas are given on systolic linear solvers, the singular value decomposition, artificial neural networks, and the simulated annealing  ...  The focus of the presentation is on linear algebraic techniques. A systolization process is illustrated by matching an arbitrary gaxpy operation onto a fixed-size square array processor.  ...  Acknowledgements The author is grateful to colleagues at Hughes Research Laboratories, Greg Nash, Wojtek Przytula and David Schwartz, who provided much of the information contained in this paper.  ... 
doi:10.1016/0377-0427(91)90178-m fatcat:ncp26oglknet3bjejr6klqhioa
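The survey above illustrates systolization by matching an arbitrary gaxpy operation (y ← y + A·x) onto a fixed-size square array processor. A minimal sketch of the blocking step, assuming a hypothetical p×p array that consumes one p×p block of A and a length-p slice of x per pass:

```python
# A minimal sketch of blocking a gaxpy (y <- y + A @ x) onto a fixed-size
# p x p array processor; the array itself is modeled as one block
# multiply-accumulate per pass. The block size and zero-padding are
# assumptions made for illustration, not the survey's exact scheme.
import numpy as np

def blocked_gaxpy(A, x, y, p=4):
    n, m = A.shape
    # Pad all operands up to multiples of the array size p.
    np_, mp_ = -(-n // p) * p, -(-m // p) * p
    Ap = np.zeros((np_, mp_)); Ap[:n, :m] = A
    xp = np.zeros(mp_);        xp[:m] = x
    yp = np.zeros(np_);        yp[:n] = y
    for i in range(0, np_, p):          # one stripe of output rows per pass
        for j in range(0, mp_, p):      # stream column blocks through the array
            yp[i:i+p] += Ap[i:i+p, j:j+p] @ xp[j:j+p]   # the "array" does this
    return yp[:n]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((10, 7)); x = rng.standard_normal(7)
    y = rng.standard_normal(10)
    assert np.allclose(blocked_gaxpy(A, x, y, p=4), y + A @ x)
```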

A fuzzy clustering neural networks (FCNs) system design methodology

D. Zhang, S.K. Pal
2000 IEEE Transactions on Neural Networks  
Two mapping strategies, both from the FCN model to a system architecture and from the given architecture to systolic arrays, are described.  ...  A system design methodology for fuzzy clustering neural networks (FCNs) is presented.  ...  and mapping the neural networks onto the corresponding systolic arrays (see Fig. 1).  ... 
doi:10.1109/72.870048 pmid:18249843 fatcat:zevwhbabxbhkrhuywug6hfpls4

SYSTOLIC ARRAY METHODOLOGY FOR A NEURAL MODEL TO SOLVE THE MIXTURE PROBLEM [chapter]

R. M. Pérez, P. Martínez, A. Plaza, P. L. Aguilar
2002 Series in Machine Perception and Artificial Intelligence  
The proposed method is supported by a linear recurrent neural network based on the Hopfield model (HRNN).  ...  Both structures are used for realising the iterative process of the Neural Network.  ...  Alejandro Curado Fuentes for his linguistic revision of this paper.  ... 
doi:10.1142/9789812778086_0002 fatcat:xvzov3catnb4pee53z66ox6vea
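The chapter above realises the iterative process of a Hopfield-type recurrent network (HRNN) on systolic structures for the mixture problem. Purely as an illustration of the kind of iteration involved, and not the chapter's HRNN formulation, the sketch below runs a gradient-style recurrent update to estimate abundances a from a mixed spectrum s ≈ M·a; the step size, clipping and normalisation are assumptions.

```python
# Illustrative only: a gradient-style recurrent update for a linear mixture
# problem s ~= M @ a, loosely in the spirit of Hopfield-type networks.
# The step size, iteration count, clipping and normalisation are assumptions
# for illustration; they are not the HRNN formulation of the chapter above.
import numpy as np

def mixture_iteration(M, s, steps=2000, lr=0.02):
    a = np.full(M.shape[1], 1.0 / M.shape[1])   # start from uniform abundances
    for _ in range(steps):
        grad = M.T @ (M @ a - s)                # gradient of 0.5*||M a - s||^2
        a = np.clip(a - lr * grad, 0.0, None)   # keep abundances non-negative
    return a / a.sum()                          # crude sum-to-one normalisation

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    M = rng.random((20, 3))                     # 3 endmember spectra, 20 bands
    true_a = np.array([0.5, 0.3, 0.2])
    s = M @ true_a
    print(mixture_iteration(M, s))              # roughly recovers [0.5, 0.3, 0.2]
```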

Design and Scaffolded Training of an Efficient DNN Operator for Computer Vision on the Edge [article]

Vinod Ganesan, Pratyush Kumar
2021 arXiv   pre-print
Further, by combining NOS with NAS, we design networks that define state-of-the-art models improving on both accuracy and latency on systolic arrays.  ...  The resultant computation efficiently maps to systolic arrays. The optimal dataflow, called Spatial-Tiled Output Stationary (ST-OS), maximizes the efficiency of FuSeConv on systolic arrays.  ...  We thank Surya Selvam for his contributions to an earlier version of this research effort [49] .  ... 
arXiv:2108.11441v1 fatcat:duhrhvp3g5dapl4b43fkokfiku

FuSeConv: Fully Separable Convolutions for Fast Inference on Systolic Arrays [article]

Surya Selvam, Vinod Ganesan, Pratyush Kumar
2021 arXiv   pre-print
With FuSeConv, we achieve a significant speed-up of 3x-7x with the MobileNet family of networks on a systolic array of size 64x64, with comparable accuracy on the ImageNet dataset.  ...  Both efficient neural networks and hardware accelerators are being explored to speed up DNN inference on edge devices.  ...  We thank Gokulan for his help in modeling systolic-arrays. Finally, we thank the anonymous reviewers for their insightful comments and suggestions towards improving the work.  ... 
arXiv:2105.13434v1 fatcat:gjnzf7mnabeoti2cc5zol47iaq

A Full-stack Accelerator Search Technique for Vision Applications [article]

Dan Zhang, Safeen Huda, Ebrahim Songhori, Quoc Le, Anna Goldie, Azalia Mirhoseini
2021 arXiv   pre-print
Although FAST can be used on any number and type of deep learning workload, in this paper we focus on optimizing for a single or small set of vision models, resulting in significantly faster and more power-efficient  ...  The rapidly-changing ML model landscape presents a unique opportunity for building hardware accelerators optimized for specific datacenter-scale workloads.  ...  However, depthwise-separable convolutions do not map well onto TPUs due to poor systolic array utilization and operational intensity.  ... 
arXiv:2105.12842v1 fatcat:mtunvjdcdrcr5pc5bpyfye7mea
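The FAST snippet notes that depthwise-separable convolutions map poorly onto TPU-like systolic arrays because of low utilization. A back-of-the-envelope estimate of that effect, assuming the common im2col-style lowering in which a standard convolution offers K·K·C_in weight rows to fill the array while a depthwise convolution offers only K·K rows per channel:

```python
# Back-of-the-envelope illustration of why depthwise convolutions under-utilise
# a systolic array: when each output channel depends on only K*K inputs, most of
# a 128x128 array sits idle. The im2col-style GEMM lowering is a common
# convention, assumed here for illustration.
def array_utilization(rows_needed, cols_needed, array_rows=128, array_cols=128):
    used = min(rows_needed, array_rows) * min(cols_needed, array_cols)
    return used / (array_rows * array_cols)

K, C_in, C_out = 3, 256, 256

# Standard conv as GEMM: weight matrix is (K*K*C_in) x C_out.
print("standard :", array_utilization(K * K * C_in, C_out))   # ~1.0

# Depthwise conv: each channel is an independent (K*K) x 1 problem.
print("depthwise:", array_utilization(K * K, 1))               # ~0.0005
```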

Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators

Krishna Teja Chitty-Venkata, Arun K. Somani
2020 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)  
Our goal is to compress the model based on the size of the array so as to reduce the number of computation cycles.  ...  In practice, due to the mismatch between matrix and array sizes, the computation does not map on the array exactly.  ...  CONCLUSION We addressed the problem of optimizing the size of DNN weight matrices to suit the hardware specifications for efficient forward propagation on array-based neural network accelerators.  ... 
doi:10.1109/asap49362.2020.00016 dblp:conf/asap/Chitty-VenkataS20 fatcat:ossyqk2wdffstohgwgnh2azmrm
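The entry above prunes weight matrices to the array size because mismatched dimensions waste whole passes over the array. A small sketch of that cycle-count intuition, under the simplifying assumption that every R×C tile costs one full pass regardless of how much of it is occupied (this is not the paper's exact cycle model):

```python
# Sketch of the array-size mismatch intuition: if every R x C tile costs a full
# pass regardless of occupancy, trimming a weight matrix to a multiple of the
# array size removes entire passes. The one-pass-per-tile cost is an assumption
# made for illustration.
from math import ceil

def passes(rows, cols, R=64, C=64):
    return ceil(rows / R) * ceil(cols / C)

# A 130 x 130 layer on a 64 x 64 array needs 3 x 3 = 9 passes, although the
# edge tiles are almost empty (only 2 of 64 rows/columns occupied).
print(passes(130, 130))            # 9
# Pruning the layer to 128 x 128 (a multiple of the array size) drops them.
print(passes(128, 128))            # 4
```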

Systolic array exploitation of a neural network inherent parallelism solving the nearest neighbor problem

Michael P. Bekakos, Katherine C. Pramataris
1997 Nonlinear Analysis  
The inherent parallelism of the proposed neural network model is further exploited by its efficient parallel implementation, based on a systolic network architecture.  ...  In this paper, we suggest the employment of artificial neural networks for solving the problem of document clustering and nearest neighbor matching, in a dynamic and self-adapting way.  ...  One is in mapping the systolic algorithms for neural networks onto parallel computers, such as Warp, MasPar, MP-1, and Transputer arrays [10][11][12][13][14]; another is in designing programmable  ... 
doi:10.1016/s0362-546x(97)00345-3 fatcat:alz7hgjjbjevfpzwzbawlhoqo4

AIRCHITECT: Learning Custom Architecture Design and Mapping Space [article]

Ananda Samajdar, Jan Moritz Joseph, Matthew Denton, Tushar Krishna
2021 arXiv   pre-print
We use three case studies involving the optimal array design, SRAM buffer sizing, mapping, and schedule determination for systolic-array-based custom architecture design and mapping space.  ...  In this paper we investigate the possibility of learning the optimization task using machine learning and hence using the learnt model to predict optimal parameters for the design and mapping space of  ...  CONCLUSION This paper presents AIRCHITECT, a recommendation neural network for learning the architecture design and mapping space of systolic array based accelerators.  ... 
arXiv:2108.08295v1 fatcat:oynpbnf6mzc4vive7em5egfexy

FireFly: A High-Throughput and Reconfigurable Hardware Accelerator for Spiking Neural Networks [article]

Jindong Li, Guobin Shen, Dongcheng Zhao, Qian Zhang, Zeng Yi
2023 arXiv   pre-print
With the introduction of the backpropagation algorithm and surrogate gradient, the structure of spiking neural networks has become more complex, and the performance gap with artificial neural networks  ...  Spiking neural networks (SNNs) have been widely used due to their strong biological interpretability and high energy efficiency.  ...  Input spike map channels are split into multiple tiles to fit into the height of the systolic array. Output spike map channels are calculated N at a time according to the width of the systolic array.  ... 
arXiv:2301.01905v2 fatcat:ngzoen5ohrabxfiztnlc32zkfq
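The FireFly snippet describes splitting input spike-map channels into tiles that fit the array height and producing output channels a batch at a time across the array width. A minimal sketch of that tiling loop; binary spikes, a NumPy dot product standing in for one array pass, and the tile sizes are assumptions for illustration:

```python
# Minimal sketch of the channel tiling described in the FireFly snippet above:
# input channels are split into tiles of the array height, output channels are
# produced array_w at a time. The dot-product "array pass" is a simplification.
import numpy as np

def tiled_spike_matmul(spikes, weights, array_h=16, array_w=16):
    """spikes: (C_in,) 0/1 vector; weights: (C_in, C_out). Returns (C_out,)."""
    c_in, c_out = weights.shape
    out = np.zeros(c_out)
    for oc in range(0, c_out, array_w):           # array_w output channels at a time
        for ic in range(0, c_in, array_h):        # input-channel tile fits array height
            tile_s = spikes[ic:ic + array_h]
            tile_w = weights[ic:ic + array_h, oc:oc + array_w]
            out[oc:oc + array_w] += tile_s @ tile_w   # one pass of the array
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    s = rng.integers(0, 2, 64).astype(float)
    W = rng.standard_normal((64, 40))
    assert np.allclose(tiled_spike_matmul(s, W), s @ W)
```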

SCALE-Sim: Systolic CNN Accelerator Simulator [article]

Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, Tushar Krishna
2019 arXiv   pre-print
However, the research community lacks tools to gain insights on both the design trade-offs and efficient mapping strategies for systolic-array based accelerators.  ...  This is the first systolic-array simulator tuned for running DNNs, to the best of our knowledge.  ...  As mentioned earlier, there are many possible ways of mapping the compute onto the array. Each such mapping is called a data-flow.  ... 
arXiv:1811.02883v2 fatcat:x252tx3qgnffhiolmllj4jhmsa
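SCALE-Sim distinguishes mappings (data-flows) such as output-, weight- and input-stationary, and the choice determines how cycles scale with the array shape. As a rough illustration of how a data-flow choice turns into a cycle estimate, and not SCALE-Sim's actual cost model, the sketch below uses a skewed-streaming estimate for an output-stationary GEMM pass; the constants are assumptions.

```python
# Rough illustration (not SCALE-Sim's implementation) of how a dataflow choice
# becomes a cycle estimate. For an output-stationary pass on an R x C array,
# a common skewed-streaming estimate is: fill the pipeline, stream T partial
# sums per output, then drain. The exact constants below are assumptions.
from math import ceil

def os_pass_cycles(R, C, T):
    # T multiply-accumulates per output, plus skew to fill/drain the array.
    return 2 * R + C + T - 2

def gemm_cycles(M, N, K, R=32, C=32):
    """Tile an (M x K) by (K x N) GEMM over an R x C output-stationary array."""
    tiles = ceil(M / R) * ceil(N / C)
    return tiles * os_pass_cycles(R, C, K)

# Example: a 256 x 256 x 256 GEMM on a 32 x 32 array.
print(gemm_cycles(256, 256, 256))   # 64 tiles * (2*32 + 32 + 256 - 2) cycles
```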

SalvageDNN: salvaging deep neural network accelerators with permanent faults through saliency-driven fault-aware mapping

Muhammad Abdullah Hanif, Muhammad Shafique
2019 Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences  
Deep neural networks (DNNs) have proliferated in most of the application domains that involve data processing, predictive analysis and knowledge inference.  ...  We also present novel modifications in a systolic array design to further improve the yield of the accelerators while ensuring reliable DNN execution using 'SalvageDNN' and negligible overheads in terms  ...  This work is supported in part by the German Research Foundation (DFG) as part of the GetSURE project in the scope of SPP-1500 (http://spp1500.itec.kit.edu) priority program, 'Dependable Embedded Systems  ... 
doi:10.1098/rsta.2019.0164 pmid:31865875 pmcid:PMC6939235 fatcat:5xw2nvfzgzbnfjnl3pfkqq3o7a
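SalvageDNN maps the least-salient weights onto faulty processing elements, which are then bypassed, so that permanent faults cost little accuracy. A toy sketch of that idea, with channel-level permutation granularity, a magnitude-sum saliency metric, and zeroing as the bypass all treated as simplifying assumptions rather than the paper's exact method:

```python
# Toy sketch of saliency-driven fault-aware mapping in the spirit of SalvageDNN:
# permute which output channels land on which array columns so that the
# least-salient channels sit on faulty columns, which are then bypassed (zeroed).
# Channel-level granularity and the saliency metric are simplifying assumptions.
import numpy as np

def fault_aware_mapping(weights, faulty_cols):
    """weights: (C_in, C_out); faulty_cols: indices of faulty array columns.
    Returns (column -> channel permutation, weights with bypassed channels zeroed)."""
    saliency = np.abs(weights).sum(axis=0)            # per-output-channel saliency
    order = np.argsort(saliency)                      # least salient first
    n_cols = weights.shape[1]
    perm = np.empty(n_cols, dtype=int)
    healthy = [c for c in range(n_cols) if c not in set(faulty_cols)]
    # Least-salient channels go to faulty columns, the rest to healthy ones.
    for ch, col in zip(order[:len(faulty_cols)], faulty_cols):
        perm[col] = ch
    for ch, col in zip(order[len(faulty_cols):], healthy):
        perm[col] = ch
    mapped = weights[:, perm].copy()
    mapped[:, list(faulty_cols)] = 0.0                # bypass faulty columns
    return perm, mapped

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    W = rng.standard_normal((8, 6))
    perm, mapped = fault_aware_mapping(W, faulty_cols=[2, 5])
    print(perm)            # the two least-salient channels occupy columns 2 and 5
```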

Sparse Winograd Convolutional neural networks on small-scale systolic arrays [article]

Feng Shi, Haochen Li, Yuhe Gao, Benjamin Kuschner, Song-Chun Zhu
2018 arXiv   pre-print
In this paper, we implement an accelerator on FPGA by combining the sparse Winograd convolution, clusters of small-scale systolic arrays, and a tailored memory layout design.  ...  We also provide an analytical model analysis for the general Winograd convolution algorithm as a design reference.  ...  layout down to small blocks then map these blocks onto small-scale systolic arrays to perform multiplications of submatrices, and share these submatrices among working arrays to reduce required memory  ... 
arXiv:1810.01973v1 fatcat:g44643olkvh23fkdp5lsraipe4
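The entry above combines sparse Winograd convolution with clusters of small-scale systolic arrays. As background for the analytical model it references, the standard 1-D Winograd minimal-filtering form F(2,3) computes two outputs of a 3-tap convolution with four multiplications instead of six; the block partitioning onto the arrays is not reproduced here.

```python
# Standard 1-D Winograd minimal filtering F(2,3): two outputs of a 3-tap
# convolution from four multiplications (versus six for the direct method).
# Shown as background for the Winograd + small systolic array design above.
import numpy as np

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 outputs of the 'valid' convolution."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

if __name__ == "__main__":
    d = np.array([1.0, 2.0, 3.0, 4.0])
    g = np.array([0.5, -1.0, 2.0])
    direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                       d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
    assert np.allclose(winograd_f23(d, g), direct)
```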
Showing results 1 — 15 out of 1,631 results