72,296 Hits in 5.3 sec

Exploiting Parallelism Opportunities with Deep Learning Frameworks [article]

Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks
2020 arXiv   pre-print
This paper takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism.  ...  State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning  ...  In this section, we describe how deep learning framework design choices (Section 2.1) exploit parallelism opportunities exposed in deep learning workloads (Section 2.2), and overview our framework parameter  ... 
arXiv:1908.04705v2 fatcat:fnmcly3f3vanvlc6hi6uxxj6pi
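The framework-level parallelism knobs that studies like this one vary are exposed through public APIs. As a minimal illustration (a sketch, not the paper's benchmark harness; sizes and thread counts are arbitrary), TensorFlow lets you size its intra-op and inter-op thread pools explicitly:

```python
import time
import tensorflow as tf

# Thread-pool sizes must be configured before the first op executes.
tf.config.threading.set_intra_op_parallelism_threads(4)  # threads within one op
tf.config.threading.set_inter_op_parallelism_threads(2)  # ops run concurrently

x = tf.random.uniform((2048, 2048))
start = time.perf_counter()
for _ in range(10):
    y = tf.linalg.matmul(x, x)
_ = y.numpy()  # block until the computation actually finishes
print(f"10 matmuls took {time.perf_counter() - start:.3f} s")
```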

Editorial for the special issue on operating systems and programming systems for HPC

Xiaobing Feng, Minyi Guo
2020 CCF Transactions on High Performance Computing  
With a benchmark from a deep learning-based cancerous region detection algorithm, good parallel efficiency is obtained for up to 1024 processors, revealing a great opportunity for the joint combination of deep learning and HPC systems.  ... 
doi:10.1007/s42514-020-00053-6 fatcat:nthaiyn6m5eqvisdxwiz7r7u2m

HOTI 2020 Commentary

2020 2020 IEEE Symposium on High-Performance Interconnects (HOTI)  
The recent advances in Deep Learning (DL) have led to many exciting challenges and opportunities for CS and AI researchers alike.  ...  We will also present an overview of different DNN architectures and DL frameworks. Most DL frameworks started with a single-node design.  ...  His research interests include parallel computer architecture, high-performance networking, InfiniBand, network-based computing, exascale computing, programming models, GPUs and accelerators, high performance  ... 
doi:10.1109/hoti51249.2020.00012 fatcat:ptxu3fuk5vghflln7ezaitoeyq

Faith: An Efficient Framework for Transformer Verification on GPUs [article]

Boyuan Feng, Tianqi Tang, Yuke Wang, Zhaodong Chen, Zheng Wang, Shu Yang, Yuan Xie, Yufei Ding
2022 arXiv   pre-print
Transformer verification draws increasing attention in machine learning research and industry.  ...  It formally verifies the robustness of transformers against adversarial attacks such as exchanging words in a sentence with synonyms.  ...  Deep Learning Frameworks on GPUs: GPUs have been widely exploited to accelerate deep learning workloads [13, 39, 40, 46, 49].  ... 
arXiv:2209.12708v1 fatcat:xe5aatchkvc7xdyd4btbc4ci2q

Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms [article]

Derssie Mebratu, Niranjan Hasabnis, Pietro Mercati, Gaurit Sharma, Shamima Najnin
2021 arXiv   pre-print
Modern deep learning (DL) applications are built using DL libraries and frameworks such as TensorFlow and PyTorch.  ...  Manual tuning requires deep knowledge of the user-controllable parameters of DL frameworks as well as the underlying hardware.  ...  The availability of open-source deep learning software frameworks, such as PyTorch [11] and TensorFlow [1], along with suites of neural network models [15], enables fast deployment of deep learning  ... 
arXiv:2109.06266v1 fatcat:ixszcn5fo5dmfn442e33k34a7q
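The paper applies gradient-free optimizers to TensorFlow's CPU-backend parameters; as a hedged stand-in, the sketch below uses plain random search over the two threading knobs, timing each configuration in a fresh process (TF thread pools cannot be changed once ops have executed). The parameter ranges and trial budget are illustrative assumptions, not the paper's setup.

```python
import random
import subprocess
import sys

# Benchmark script run once per configuration, in its own process.
BENCH = """
import sys, time
import tensorflow as tf
tf.config.threading.set_intra_op_parallelism_threads(int(sys.argv[1]))
tf.config.threading.set_inter_op_parallelism_threads(int(sys.argv[2]))
x = tf.random.uniform((1024, 1024))
t = time.perf_counter()
for _ in range(20):
    y = tf.linalg.matmul(x, x)
_ = y.numpy()
print(time.perf_counter() - t)
"""

def run_trial(intra, inter):
    # Fresh process per trial: TF thread pools are frozen once ops have run.
    out = subprocess.run([sys.executable, "-c", BENCH, str(intra), str(inter)],
                         capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

best_cfg, best_time = None, float("inf")
for _ in range(8):  # small trial budget for the sketch
    cfg = (random.choice([1, 2, 4, 8, 16]), random.choice([1, 2, 4]))
    t = run_trial(*cfg)
    if t < best_time:
        best_cfg, best_time = cfg, t
print(f"best (intra, inter) = {best_cfg}, latency = {best_time:.3f} s")
```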

Pushing the boundaries of parallel Deep Learning -- A practical approach [article]

Paolo Viviani, Maurizio Drocco, Marco Aldinucci
2018 arXiv   pre-print
This work aims to assess the state of the art of data-parallel deep neural network training, trying to identify potential research tracks to be exploited for performance improvement.  ...  Besides, it presents a design for a practical C++ library dedicated to implementing and unifying the current state-of-the-art methodologies for parallel training in a performance-conscious framework, allowing  ...  In order to provide a truly general-purpose tool, as well as to exploit the peculiarities of the different deep learning frameworks available, the proposed FAST (Flexible (A)synchronous Scalable Training  ... 
arXiv:1806.09528v1 fatcat:hakyhxanivfmfnqkgnyjlycsie

Deep Learning at Scale

Paolo Viviani, Maurizio Drocco, Daniele Baccega, Iacopo Colonnelli, Marco Aldinucci
2019 2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)  
Established techniques for data-parallel training are discussed from both a parallel computing and a deep learning perspective; then a different approach is presented that is meant to allow DNN training  ...  This work presents a novel approach to distributed training of deep neural networks (DNNs) that aims to overcome the issues related to mainstream approaches to data-parallel training.  ...  In order to provide a truly general-purpose tool, as well as to exploit the peculiarities of the different deep learning frameworks available, the proposed FAST (Flexible (A)synchronous Scalable Training  ... 
doi:10.1109/empdp.2019.8671552 dblp:conf/pdp/VivianiDBCA19 fatcat:cf7wan67i5f63oipufn77r2pxi
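Both FAST entries above concern data-parallel training. As a baseline reference (a generic sketch, not the FAST library's API), synchronous data-parallel SGD computes per-worker gradients on disjoint shards and averages them, which is exactly what an allreduce implements in a distributed setting:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)                      # shared model weights (linear model)
X, y = rng.normal(size=(64, 3)), rng.normal(size=64)
shards = np.array_split(np.arange(64), 4)   # 4 simulated workers, disjoint data

for step in range(100):
    grads = []
    for idx in shards:                      # each worker: gradient on its shard
        err = X[idx] @ w - y[idx]
        grads.append(2 * X[idx].T @ err / len(idx))
    g = np.mean(grads, axis=0)              # the allreduce: average gradients
    w -= 0.05 * g                           # identical update on every worker
```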

Integrating Deep Learning in Domain Sciences at Exascale [article]

Rick Archibald, Edmond Chow, Eduardo D'Azevedo, Jack Dongarra, Markus Eisenbach, Rocco Febbo, Florent Lopez, Daniel Nichols, Stanimire Tomov, Kwai Wong, Junqi Yin
2020 arXiv   pre-print
These developments, along with existing HPC AI software capabilities, have been integrated into MagmaDNN, an open-source HPC deep learning framework.  ...  This paper discusses the requirements of an HPC deep learning framework and how those needs can be met (e.g., as in MagmaDNN) through a deep integration with existing HPC libraries, such as MAGMA and  ...  Applications: Materials Science and Microscopy: There are multiple opportunities to exploit machine learning techniques in materials science.  ... 
arXiv:2011.11188v1 fatcat:zhmnfvhvbjhjzj72tu6ikrcpla

Distributed Deep Reinforcement Learning: A Survey and A Multi-Player Multi-Agent Learning Toolbox [article]

Qiyue Yin, Tongtong Yu, Shengqi Shen, Jun Yang, Meijing Zhao, Kaiqi Huang, Bin Liang, Liang Wang
2022 arXiv   pre-print
With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems.  ...  learning, covering single-player single-agent distributed deep reinforcement learning to the most complex multi-player multi-agent distributed deep reinforcement learning.  ...  Due to the structured computation pattern of deep learning algorithms, some successful distributed learning methods have been proposed for parallelism in deep learning [20], [21].  ... 
arXiv:2212.00253v1 fatcat:2gymorseene2dezdgny5fddyde

Distributed Intelligence on the Edge-to-Cloud Continuum: A Systematic Literature Review

Daniel Rosendo, Alexandru Costan, Patrick Valduriez, Gabriel Antoniu
2022 Journal of Parallel and Distributed Computing  
This is necessary to help understand the performance trade-offs that result from combining a variety of learning paradigms and supportive frameworks.  ...  It describes the main learning paradigms enabling learning-based analytics on the Edge-to-Cloud Continuum.  ...  Frameworks and libraries marked with a ⋆ are those most exploited in the experiments.  ... 
doi:10.1016/j.jpdc.2022.04.004 fatcat:mopdegh4vrgt5k47vrmc7xum24

Database Meets Deep Learning

Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan
2016 SIGMOD record  
In particular, we discuss possible improvements for deep learning systems from a database perspective, and analyze database applications that may benefit from deep learning techniques.  ...  Deep learning has recently become very popular on account of its incredible success in many complex data-driven applications, such as image classification and speech recognition.  ...  With the advancements of deep learning models in NLP [13], it is opportune to consider deep learning for these problems.  ... 
doi:10.1145/3003665.3003669 fatcat:erig6kfsk5c3jmxintphi5fzl4

Corella: A Private Multi Server Learning Approach based on Correlated Queries [article]

Hamidreza Ehteram, Mohammad Ali Maddah-Ali, Mahtab Mirmohseni
2020 arXiv   pre-print
The proposed scheme relies on a cluster of servers, where at most T ∈ N of them may collude, each running a learning model (e.g., a deep neural network).  ...  Simulation results for various datasets demonstrate the accuracy of the proposed approach for classification, using deep neural networks, and the autoencoder, as supervised and unsupervised learning  ...  Indeed, the system gradually learns to exploit this opportunity during the training phase.  ... 
arXiv:2003.12052v2 fatcat:zbmbcjsln5d3xjynwtckdlg5sa
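The idea of correlated queries can be illustrated in a deliberately simplified linear setting (an assumption for exposition only; Corella itself trains nonlinear models to tolerate the noise and supports up to T colluding servers, not just the two-server case below):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))          # the model each server runs (linear here)
x = rng.normal(size=8)               # the client's private input

noise = 100.0 * rng.normal(size=8)   # strong noise masks x from each server
q1, q2 = x + noise, x - noise        # correlated queries, each looks like noise

# Each (non-colluding) server answers its own query independently.
a1, a2 = W @ q1, W @ q2

# The correlation cancels at the client: (a1 + a2) / 2 == W @ x exactly.
assert np.allclose((a1 + a2) / 2, W @ x)
```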

Deep Learning on FPGAs: Past, Present, and Future [article]

Griffin Lacey, Graham W. Taylor, Shawki Areibi
2016 arXiv   pre-print
Current trends in design tools for FPGAs have made them more compatible with the high-level software practices typical in the deep learning community, making FPGAs more accessible to those  ...  how FPGAs may best serve the needs of the deep learning community moving forward.  ...  They are also capable of exploiting distributed on-chip memory, as well as large degrees of pipeline parallelism, which fit naturally with the feed-forward nature of deep learning methods.  ... 
arXiv:1602.04283v1 fatcat:xffu7dm7ifbxjir7ivskhxozyi

Understanding Scalability and Fine-Grain Parallelism of Synchronous Data Parallel Training

Jiali Li, Bogdan Nicolae, Justin Wozniak, George Bosilca
2019 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)  
Index Terms: deep learning, data-parallel training, behavior analysis  ...  In the age of big data, deep learning has emerged as a powerful tool to extract insight and exploit its value, both in industry and scientific applications.  ...  Thanks to this ease of use, Horovod is one of the most widely used synchronous deep learning frameworks.  ... 
doi:10.1109/mlhpc49564.2019.00006 dblp:conf/sc/LiNWB19 fatcat:pcxwhll7xncrdp2m652gpx323u
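Horovod's standard pattern, referenced in the snippet, wraps an existing optimizer so that gradients are allreduce-averaged across workers before every step (this uses the standard Horovod PyTorch API; the model and data below are placeholders):

```python
import torch
import horovod.torch as hvd

hvd.init()                                   # one process per worker
torch.manual_seed(42)                        # same initial weights on every rank

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer: gradients are allreduce-averaged across ranks before
# each step, which is what makes the training synchronous.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

for step in range(100):
    x = torch.randn(32, 10)                  # stand-in for this rank's shard
    loss = (model(x) - torch.randn(32, 1)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                         # applies the averaged gradient
```

Launched with one process per worker, e.g. `horovodrun -np 4 python train.py`.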

De-specializing an HLS library for Deep Neural Networks: improvements upon hls4ml [article]

Serena Curzel, Nicolò Ghielmetti, Michele Fiorito, Fabrizio Ferrandi
2021 arXiv   pre-print
...  with the help of High-Level Synthesis. hls4ml is a framework that translates Deep Neural Networks into annotated C++ code for High-Level Synthesis, offering a complete and user-friendly design process  ...  The gap between high-level Machine Learning frameworks (e.g., TensorFlow, PyTorch) and low-level hardware design in Verilog/VHDL creates a barrier to widespread adoption of FPGAs, which can be overcome  ...  HLS Tools and Compilers for Deep Learning: Extensive research has been published on specialized processors for Deep Learning, and many of them have been implemented on FPGAs.  ... 
arXiv:2103.13060v1 fatcat:pjcfqysla5fsdlggstud5rgxby
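The hls4ml flow the snippet describes converts a Keras model into HLS-ready C++; a minimal sketch follows (layer sizes and the output directory are placeholder assumptions, not from the paper):

```python
import hls4ml
from tensorflow import keras

# A tiny placeholder network standing in for a real DNN.
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    keras.layers.Dense(4, activation="softmax"),
])

# Derive a per-model HLS configuration (precision, reuse factors, ...).
config = hls4ml.utils.config_from_keras_model(model, granularity="model")

# Translate the network into annotated C++ for High-Level Synthesis.
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir="hls4ml_prj")
hls_model.compile()   # builds a C simulation of the generated design
```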
Showing results 1 — 15 out of 72,296 results