Automatic generation of specialized direct convolutions for mobile GPUs
2020
Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit
Using Lift, we show that it is possible to automatically generate code that is 10× faster than the direct convolution while using 3.6× less space than the GEMM-based convolution of the very specialized ...
ARM Compute Library on the latest generation of ARM Mali GPU. ...
Acknowledgments This work was supported by the Engineering and Physical Sciences Research Council (grant EP/L01503X/1), EPSRC Centre for Doctoral Training in Pervasive Parallelism at the University of ...
doi:10.1145/3366428.3380771
dblp:conf/ppopp/MogersRLTOD20
fatcat:342savoeijb3zaznujfmhptoku
Design Automation for Efficient Deep Learning Computing
[article]
2019
arXiv
pre-print
We propose design automation techniques for efficient neural networks. We investigate automatically designing specialized fast models, auto channel pruning, and auto mixed-precision quantization. ...
Moreover, we shorten the design cycle by 200× compared with previous work, so that we can afford to design specialized neural network models for different hardware platforms. ...
Compared with general-purpose models, our specialized models improve the top-1 accuracy by 1.1%–3.1% while being 1.2×–7.5× faster. Table 2 compares the specialized models on CPU/GPU/Mobile. ...
arXiv:1904.10616v1
fatcat:77ft4alwqvgszhevtcjssnkyzm
Latency Estimation Tool and Investigation of Neural Networks Inference on Mobile GPU
2021
Computers
We experimentally demonstrate the applicability of such an approach on a subset of the popular NAS-Benchmark 101 dataset for two different mobile GPUs. ...
Many deep learning applications are intended to run on mobile devices, and both accuracy and inference time matter for many of them. ...
Acknowledgments: We thank the editor and three anonymous reviewers for their constructive comments, which helped us to improve the manuscript. ...
doi:10.3390/computers10080104
fatcat:4titj4ftlvfdxkcrdgn3td7um4
Deep Learning for Mobile Multimedia
2017
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
DL architectures and algorithms are hardly adapted to the storage and computation resources of a mobile device. Therefore, there is a need for new generations of mobile processors and chipsets, small footprint ...
Specifically, in recent years powerful and compact GPUs have been released at affordable prices, which allow accelerating the computation of the weights of DNNs. ...
special hardware platforms for mobile DNNs. ...
doi:10.1145/3092831
fatcat:ez2fcgckhjawlfywyecest4jqy
Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs
2019
2019 IEEE International Symposium on Workload Characterization (IISWC)
Mobile and embedded systems now have GPUs which are ideal for the parallel computations of neural networks and for their lower energy cost per operation. ...
We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance, leading to a 2× slowdown. ...
Second, designing new neural network architectures for specific devices should consider the best sizes of convolutional layers for each library and hardware, thus building specialized networks for each ...
doi:10.1109/iiswc47752.2019.9042000
dblp:conf/iiswc/RaduKWTCCFSO19
fatcat:hvo6ll2esndyzg7sfnv5ujbe2u
Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs
[article]
2020
arXiv
pre-print
Mobile and embedded systems now have GPUs which are ideal for the parallel computations of neural networks and for their lower energy cost per operation. ...
We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance, leading to a 2× slowdown. ...
Second, designing new neural network architectures for specific devices should consider the best sizes of convolutional layers for each library and hardware, thus building specialized networks for each ...
arXiv:2002.08697v1
fatcat:eii47oyijfgkbfuivmmfx2xmd4
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
2022
ACM Transactions on Design Automation of Electronic Systems
Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video, and natural language processing by exploiting their spatial sparsity and temporal ...
To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. ...
In practice, depthwise convolution is usually used for edge devices (e.g., mobile), while group/normal convolution is usually used for cloud devices (e.g., GPU). ...
doi:10.1145/3486618
fatcat:h6xwv2slo5eklift2fl24usine
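The snippet above contrasts depthwise convolution (edge devices) with group/normal convolution (cloud devices). The efficiency gap it alludes to can be illustrated by counting parameters; the following is a minimal sketch in plain Python (the function name and the example channel/kernel sizes are illustrative, not from any of the papers listed):

```python
def conv2d_params(c_in, c_out, k, groups=1):
    """Weights + biases of a 2D convolution with a square k x k kernel.

    Each of the c_out filters sees only c_in // groups input channels,
    so increasing `groups` shrinks the weight tensor proportionally.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * k * k * c_out + c_out

c = 64  # illustrative channel count
normal = conv2d_params(c, c, 3)              # groups=1: 36,928 params
grouped = conv2d_params(c, c, 3, groups=8)   # 8 groups:  4,672 params
depthwise = conv2d_params(c, c, 3, groups=c) # depthwise:   640 params
print(normal, grouped, depthwise)
```

For 64 channels and a 3×3 kernel, the depthwise variant uses roughly 58× fewer parameters than the normal convolution, which is why it dominates mobile-oriented architectures.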
Neural Architecture Search Survey: A Hardware Perspective
2022
ACM Computing Surveys
The goal of this paper is to provide insights and understanding of HW-NAS techniques for various hardware platforms (MCU, CPU, GPU, ASIC, FPGA, ReRAM, DSP, and VPU), followed by the co-search methodologies ...
At the same time, several hardware platforms, general- and special-purpose, have equally contributed to the training and deployment of these complex networks in a different setting. ...
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s). ...
doi:10.1145/3524500
fatcat:4ibnwmgbdnbhjpk4u7soc6aom4
Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision
[article]
2018
arXiv
pre-print
We study performance characteristics of convolutional neural networks (CNN) for mobile computer vision systems. CNNs have proven to be a powerful and efficient approach to implement such systems. ...
Our measurements include embedded processors found on mobile devices and high-performance processors that can be used on the network side of mobile systems. ...
Some research prototypes that leverage mobile device special-purpose processors (e.g., DSP, GPU) also exist [13, 15–18]. ...
arXiv:1803.09492v1
fatcat:akf3qn7p5vdtxppjoitajrg6ri
A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications
[article]
2016
arXiv
pre-print
However, on other hardware targets, especially mobile GPUs, such vendor libraries are not generally available. ...
Thus, the development of portable, open, high-performance, energy-efficient GPU code for DNN operations would enable broader deployment of DNN-based algorithms. ...
Tuning for Qualcomm Mobile GPUs. In Figure 11, the boda-initial values show the initial (poor) performance when running the general-case fallback convolution variant on the SD820 platform. ...
arXiv:1611.06945v1
fatcat:clgpegm2ubd6lowwclnheqjf7q
CloudifierNet – Deep Vision Models for Artificial Image Processing
[article]
2019
arXiv
pre-print
Computer vision models and particularly deep directed acyclic graphs based on convolutional modules are generally constructed and trained based on natural images datasets. ...
In the current paper, we will present the base principles of a deep neural pipeline for computer vision applied to artificial scenes (scenes generated by user interfaces or similar). ...
for automatic code generation based on (near) natural language specifications up to source code generation based on an interface mock-up (computer-aided drawing of user-interface mock-up). ...
arXiv:1911.01346v1
fatcat:zmbuoiwnrfcsvbhbqkson4nkxy
RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices
[article]
2021
arXiv
pre-print
However, the direct generalization of existing 2D CNN weight pruning methods to 3D CNNs is not ideal for fully exploiting mobile parallelism while achieving high inference accuracy. ...
Mobile devices are becoming an important carrier for deep learning tasks, as they are being equipped with powerful, high-end mobile CPUs and GPUs. ...
Consider a general 3D CNN consisting of L convolutional (CONV) layers. Besides the l-th CONV layer weight tensor W l , the bias is denoted by b l . ...
arXiv:2007.09835v2
fatcat:qsyhrk6hhvcjfc2tcxyxoqupya
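The RT3D snippet above introduces the weight tensor W_l and bias b_l of the l-th 3D CONV layer. As a rough aid to that notation, the sketch below (assumed shapes, not taken from the paper) computes the size of W_l plus b_l and the output spatial shape of one such layer:

```python
def conv3d_layer(c_in, c_out, k, d, h, w, stride=1, pad=0):
    """For one 3D CONV layer with weight tensor W_l of shape
    (c_out, c_in, k, k, k) and bias b_l of shape (c_out,), return
    (parameter count, output (depth, height, width))."""
    params = c_out * c_in * k ** 3 + c_out  # |W_l| + |b_l|
    out = tuple((s + 2 * pad - k) // stride + 1 for s in (d, h, w))
    return params, out

# Illustrative first layer: 3 input channels, 64 filters, 3x3x3 kernel,
# a 16-frame 112x112 clip, "same" padding of 1.
p, out = conv3d_layer(3, 64, 3, 16, 112, 112, pad=1)
print(p, out)  # 5248 (16, 112, 112)
```

The cubic k**3 term is what makes 3D CNN weights so much larger than their 2D counterparts, which is the pruning opportunity the paper targets.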
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
[chapter]
2018
Lecture Notes in Computer Science
Based on a series of controlled experiments, this work derives several practical guidelines for efficient network design. Accordingly, a new architecture is presented, called ShuffleNet V2. ...
Thus, this work proposes to evaluate the direct metric on the target platform, beyond only considering FLOPs. ...
Acknowledgements Thanks Yichen Wei for his help on paper writing. This research is partially supported by National Natural Science Foundation of China (Grant No. 61773229). ...
doi:10.1007/978-3-030-01264-9_8
fatcat:5eljnbtc65blveoza4nm5k6gbi
MNN: A Universal and Efficient Inference Engine
[article]
2020
arXiv
pre-print
To deal with these challenges, we propose Mobile Neural Network (MNN), a universal and efficient inference engine tailored to mobile applications. ...
Deploying deep learning models on mobile devices has drawn increasing attention recently. ...
ACKNOWLEDGEMENTS We thank Chaoyue Niu for helpful discussions and the anonymous reviewers for their valuable comments to improve our work. ...
arXiv:2002.12418v1
fatcat:ppeykiv57nc6bfqa74lyzse3by
MobiSR
2019
The 25th Annual International Conference on Mobile Computing and Networking - MobiCom '19
In recent years, convolutional networks have demonstrated unprecedented performance in the image restoration task of super-resolution (SR). ...
SR entails the upscaling of a single low-resolution image in order to meet application-specific image quality demands and plays a key role in mobile devices. ...
In general, CE can represent a diversity of mobile SoCs hosting heterogeneous compute engines, ranging from the ubiquitous mobile CPUs and GPUs to the newer emerging NPUs [26] . ...
doi:10.1145/3300061.3345455
dblp:conf/mobicom/LeeVDBL19
fatcat:k52pugz3tvc3jjky3cmb4d3t7m
Showing results 1 — 15 out of 5,390 results