5,390 Hits in 3.6 sec

Automatic generation of specialized direct convolutions for mobile GPUs

Naums Mogers, Valentin Radu, Lu Li, Jack Turner, Michael O'Boyle, Christophe Dubach
2020 Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit  
Using Lift, we show that it is possible to automatically generate code that is 10× faster than the direct convolution while using 3.6× less space than the GEMM-based convolution of the very specialized  ...  ARM Compute Library on the latest generation of ARM Mali GPU.  ...  Acknowledgments This work was supported by the Engineering and Physical Sciences Research Council (grant EP/L01503X/1), EPSRC Centre for Doctoral Training in Pervasive Parallelism at the University of  ... 
doi:10.1145/3366428.3380771 dblp:conf/ppopp/MogersRLTOD20 fatcat:342savoeijb3zaznujfmhptoku

Design Automation for Efficient Deep Learning Computing [article]

Song Han, Han Cai, Ligeng Zhu, Ji Lin, Kuan Wang, Zhijian Liu, Yujun Lin
2019 arXiv   pre-print
We propose design automation techniques for efficient neural networks. We investigate automatically designing specialized fast models, auto channel pruning, and auto mixed-precision quantization.  ...  Moreover, we shorten the design cycle by 200× compared to previous work, so that we can afford to design specialized neural network models for different hardware platforms.  ...  Compared to general-purpose models, our specialized model improves the top-1 accuracy by 1.1%-3.1% while being 1.2×-7.5× faster. Table 2 compares the specialized models on CPU/GPU/Mobile.  ... 
arXiv:1904.10616v1 fatcat:77ft4alwqvgszhevtcjssnkyzm

Latency Estimation Tool and Investigation of Neural Networks Inference on Mobile GPU

Evgeny Ponomarev, Sergey Matveev, Ivan Oseledets, Valery Glukhov
2021 Computers  
We experimentally demonstrate the applicability of such an approach on a subset of the popular NAS-Benchmark 101 dataset for two different mobile GPUs.  ...  A lot of deep learning applications are desired to be run on mobile devices. Both accuracy and inference time are meaningful for a lot of them.  ...  Acknowledgments: We thank the editor and three anonymous reviewers for their constructive comments, which helped us to improve the manuscript.  ... 
doi:10.3390/computers10080104 fatcat:4titj4ftlvfdxkcrdgn3td7um4

Deep Learning for Mobile Multimedia

Kaoru Ota, Minh Son Dao, Vasileios Mezaris, Francesco G. B. De Natale
2017 ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)  
DL architectures and algorithms are hardly adapted to the storage and computation resources of a mobile device. Therefore, there is a need for new generations of mobile processors and chipsets, small footprint  ...  Specifically, in recent years powerful and compact GPUs have been released at affordable prices, which allow accelerating the computation of the weights of DNNs.  ...  special hardware platforms for mobile DNNs.  ... 
doi:10.1145/3092831 fatcat:ez2fcgckhjawlfywyecest4jqy

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Valentin Radu, Kuba Kaszyk, Yuan Wen, Jack Turner, Jose Cano, Elliot J. Crowley, Bjorn Franke, Amos Storkey, Michael O'Boyle
2019 2019 IEEE International Symposium on Workload Characterization (IISWC)  
Mobile and embedded systems now have GPUs which are ideal for the parallel computations of neural networks and for their lower energy cost per operation.  ...  We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance, leading to 2× slowdown.  ...  Second, designing new neural network architectures for specific devices should consider the best sizes of convolutional layers for each library and hardware, thus building specialized networks for each  ... 
doi:10.1109/iiswc47752.2019.9042000 dblp:conf/iiswc/RaduKWTCCFSO19 fatcat:hvo6ll2esndyzg7sfnv5ujbe2u
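The abstract above refers to pruning a fraction of a layer's convolutional channels. A minimal sketch of magnitude-based channel pruning on plain nested lists (hypothetical layer sizes; the general shape of L1-norm pruning, not the paper's exact method):

```python
# Toy L1-norm channel pruning for one convolutional layer.

def l1_norm(channel_filter):
    # sum of absolute weights across one output channel's filter
    return sum(abs(w) for kernel in channel_filter for row in kernel for w in row)

def prune_output_channels(weights, keep_ratio):
    # rank output channels by L1 norm and keep the strongest fraction;
    # note the paper's finding: a smaller layer is not always a faster one
    # on embedded GPUs
    n_keep = int(len(weights) * keep_ratio)
    ranked = sorted(range(len(weights)), key=lambda i: -l1_norm(weights[i]))
    keep = sorted(ranked[:n_keep])
    return [weights[i] for i in keep]

# toy layer: 8 output channels, each with 2 input channels of 3x3 kernels
layer = [[[[float(c)] * 3 for _ in range(3)] for _ in range(2)] for c in range(8)]
pruned = prune_output_channels(layer, keep_ratio=0.88)
print(len(layer), "->", len(pruned))  # 8 -> 7, i.e. ~12% of channels removed
```

The pruned layer keeps its kernel shape but has fewer output channels, which is exactly the dimension change whose performance effect the paper measures per library and per device.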

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs [article]

Valentin Radu, Kuba Kaszyk, Yuan Wen, Jack Turner, Jose Cano, Elliot J. Crowley, Bjorn Franke, Amos Storkey, Michael O'Boyle
2020 arXiv   pre-print
Mobile and embedded systems now have GPUs which are ideal for the parallel computations of neural networks and for their lower energy cost per operation.  ...  We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance, leading to 2x slowdown.  ...  Second, designing new neural network architectures for specific devices should consider the best sizes of convolutional layers for each library and hardware, thus building specialized networks for each  ... 
arXiv:2002.08697v1 fatcat:eii47oyijfgkbfuivmmfx2xmd4

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
2022 ACM Transactions on Design Automation of Electronic Systems  
Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video, and natural language processing by exploiting their spatial sparsity and temporal  ...  To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization.  ...  In practice, depthwise convolution is usually used for edge devices (e.g., mobile), while group/normal convolution is usually used for cloud devices (e.g., GPU).  ... 
doi:10.1145/3486618 fatcat:h6xwv2slo5eklift2fl24usine
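The snippet above notes that depthwise convolution is favored on edge devices while normal convolution is favored on cloud GPUs. A quick parameter count makes the gap concrete (hypothetical layer sizes, standard depthwise-separable factorization):

```python
# Parameter counts: standard convolution vs. depthwise-separable convolution.

def standard_conv_params(c_in, c_out, k):
    # each of the c_out filters spans all c_in input channels
    return c_out * c_in * k * k

def depthwise_separable_params(c_in, c_out, k):
    # one k x k filter per input channel, then a 1x1 pointwise mix
    depthwise = c_in * k * k
    pointwise = c_out * c_in
    return depthwise + pointwise

c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)        # 73728
dws = depthwise_separable_params(c_in, c_out, k)  # 8768
print(std, dws, round(std / dws, 1))              # 73728 8768 8.4
```

An 8.4× reduction in parameters (and, roughly, in multiply-accumulates) is why depthwise variants dominate mobile-oriented architectures, at the cost of lower arithmetic intensity that large cloud GPUs are less happy with.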

Neural Architecture Search Survey: A Hardware Perspective

Krishna Teja Chitty-Venkata, Arun K. Somani
2022 ACM Computing Surveys  
The goal of this paper is to provide insights and understanding of HW-NAS techniques for various hardware platforms (MCU, CPU, GPU, ASIC, FPGA, ReRAM, DSP, and VPU), followed by the co-search methodologies  ...  At the same time, several hardware platforms, general- and special-purpose, have equally contributed to the training and deployment of these complex networks in a different setting.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s).  ... 
doi:10.1145/3524500 fatcat:4ibnwmgbdnbhjpk4u7soc6aom4

Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision [article]

Jussi Hanhirova, Teemu Kämäräinen, Sipi Seppälä, Matti Siekkinen, Vesa Hirvisalo, Antti Ylä-Jääski
2018 arXiv   pre-print
We study performance characteristics of convolutional neural networks (CNN) for mobile computer vision systems. CNNs have proven to be a powerful and efficient approach to implement such systems.  ...  Our measurements include embedded processors found on mobile devices and high-performance processors that can be used on the network side of mobile systems.  ...  Some research prototypes that leverage mobile device special purpose processors (e.g., DSP, GPU) also exist [13, [15] [16] [17] [18] .  ... 
arXiv:1803.09492v1 fatcat:akf3qn7p5vdtxppjoitajrg6ri

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications [article]

Matthew W. Moskewicz and Ali Jannesari and Kurt Keutzer
2016 arXiv   pre-print
However, on other hardware targets, especially mobile GPUs, such vendor libraries are not generally available.  ...  Thus, the development of portable, open, high-performance, energy-efficient GPU code for DNN operations would enable broader deployment of DNN-based algorithms.  ...  Tuning for Qualcomm Mobile GPUs . In Figure 11 , the boda-initial values show the initial (poor) performance when running the general-case fallback convolution variant on the SD820 platform.  ... 
arXiv:1611.06945v1 fatcat:clgpegm2ubd6lowwclnheqjf7q

CloudifierNet – Deep Vision Models for Artificial Image Processing [article]

Andrei Damian, Laurentiu Piciu, Alexandru Purdila, Nicolae Tapus
2019 arXiv   pre-print
Computer vision models and particularly deep directed acyclic graphs based on convolutional modules are generally constructed and trained based on natural images datasets.  ...  In the current paper, we will present the base principles of a deep neural pipeline for computer vision applied to artificial scenes (scenes generated by user interfaces or similar).  ...  for automatic code generation based on (near) natural language specifications up to source code generation based on an interface mock-up (computer-aided drawing of user-interface mock-up).  ... 
arXiv:1911.01346v1 fatcat:zmbuoiwnrfcsvbhbqkson4nkxy

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [article]

Wei Niu, Mengshu Sun, Zhengang Li, Jou-An Chen, Jiexiong Guan, Xipeng Shen, Yanzhi Wang, Sijia Liu, Xue Lin, Bin Ren
2021 arXiv   pre-print
However, the direct generalization of existing 2D CNN weight pruning methods to 3D CNNs is not ideal for fully exploiting mobile parallelism while achieving high inference accuracy.  ...  Mobile devices are becoming an important carrier for deep learning tasks, as they are being equipped with powerful, high-end mobile CPUs and GPUs.  ...  Consider a general 3D CNN consisting of L convolutional (CONV) layers. Besides the l-th CONV layer weight tensor W l , the bias is denoted by b l .  ... 
arXiv:2007.09835v2 fatcat:qsyhrk6hhvcjfc2tcxyxoqupya

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design [chapter]

Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun
2018 Lecture Notes in Computer Science  
Based on a series of controlled experiments, this work derives several practical guidelines for efficient network design. Accordingly, a new architecture is presented, called ShuffleNet V2.  ...  Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs.  ...  Acknowledgements Thanks Yichen Wei for his help on paper writing. This research is partially supported by National Natural Science Foundation of China (Grant No. 61773229).  ... 
doi:10.1007/978-3-030-01264-9_8 fatcat:5eljnbtc65blveoza4nm5k6gbi

MNN: A Universal and Efficient Inference Engine [article]

Xiaotang Jiang, Huan Wang, Yiliu Chen, Ziqi Wu, Lichuan Wang, Bin Zou, Yafeng Yang, Zongyang Cui, Yu Cai, Tianhang Yu, Chengfei Lv, Zhihua Wu
2020 arXiv   pre-print
To deal with these challenges, we propose Mobile Neural Network (MNN), a universal and efficient inference engine tailored to mobile applications.  ...  Deploying deep learning models on mobile devices draws more and more attention recently.  ...  ACKNOWLEDGEMENTS We thank Chaoyue Niu for helpful discussions and the anonymous reviewers for their valuable comments to improve our work.  ... 
arXiv:2002.12418v1 fatcat:ppeykiv57nc6bfqa74lyzse3by

MobiSR

Royson Lee, Stylianos I. Venieris, Lukasz Dudziak, Sourav Bhattacharya, Nicholas D. Lane
2019 The 25th Annual International Conference on Mobile Computing and Networking - MobiCom '19  
In recent years, convolutional networks have demonstrated unprecedented performance in the image restoration task of super-resolution (SR).  ...  SR entails the upscaling of a single low-resolution image in order to meet application-specific image quality demands and plays a key role in mobile devices.  ...  In general, CE can represent a diversity of mobile SoCs hosting heterogeneous compute engines, ranging from the ubiquitous mobile CPUs and GPUs to the newer emerging NPUs [26] .  ... 
doi:10.1145/3300061.3345455 dblp:conf/mobicom/LeeVDBL19 fatcat:k52pugz3tvc3jjky3cmb4d3t7m
Showing results 1–15 of 5,390