72,296 Hits in 5.3 sec

Exploiting Parallelism Opportunities with Deep Learning Frameworks [article]

Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks
2020 arXiv   pre-print
This paper takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism.  ...  State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning  ...  In this section, we describe how deep learning framework design choices (Section 2.1) exploit parallelism opportunities exposed in deep learning workloads (Section 2.2), and overview our framework parameter  ... 
arXiv:1908.04705v2 fatcat:fnmcly3f3vanvlc6hi6uxxj6pi
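The framework-level parallelism knobs that studies like this one vary are exposed through public APIs. As a minimal illustration (a sketch, not the paper's benchmark harness; sizes and thread counts are arbitrary), TensorFlow lets you size its intra-op and inter-op thread pools explicitly:

```python
import time
import tensorflow as tf

# Thread-pool sizes must be configured before the first op executes.
tf.config.threading.set_intra_op_parallelism_threads(4)  # threads within one op
tf.config.threading.set_inter_op_parallelism_threads(2)  # ops run concurrently

x = tf.random.uniform((2048, 2048))
start = time.perf_counter()
for _ in range(10):
    y = tf.linalg.matmul(x, x)
_ = y.numpy()  # block until the computation actually finishes
print(f"10 matmuls took {time.perf_counter() - start:.3f} s")
```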

Editorial for the special issue on operating systems and programming systems for HPC

Xiaobing Feng, Minyi Guo
2020 CCF Transactions on High Performance Computing  
With a benchmark from a deep learning-based cancerous region detection algorithm, good parallel efficiency is obtained for up to 1024 processors, revealing a great opportunity for the joint combination of deep learning and HPC systems.  ... 
doi:10.1007/s42514-020-00053-6 fatcat:nthaiyn6m5eqvisdxwiz7r7u2m

HOTI 2020 Commentary

2020 2020 IEEE Symposium on High-Performance Interconnects (HOTI)  
The recent advances in Deep Learning (DL) have led to many exciting challenges and opportunities for CS and AI researchers alike.  ...  We will also present an overview of different DNN architectures and DL frameworks. Most DL frameworks started with a single-node design.  ...  His research interests include parallel computer architecture, high-performance networking, InfiniBand, network-based computing, exascale computing, programming models, GPUs and accelerators, high performance  ... 
doi:10.1109/hoti51249.2020.00012 fatcat:ptxu3fuk5vghflln7ezaitoeyq

Faith: An Efficient Framework for Transformer Verification on GPUs [article]

Boyuan Feng, Tianqi Tang, Yuke Wang, Zhaodong Chen, Zheng Wang, Shu Yang, Yuan Xie, Yufei Ding
2022 arXiv   pre-print
Transformer verification draws increasing attention in machine learning research and industry.  ...  It formally verifies the robustness of transformers against adversarial attacks such as exchanging words in a sentence with synonyms.  ...  Deep Learning Frameworks on GPUs: GPUs have been widely exploited to accelerate deep learning workloads [13, 39, 40, 46, 49].  ... 
arXiv:2209.12708v1 fatcat:xe5aatchkvc7xdyd4btbc4ci2q

Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms [article]

Derssie Mebratu, Niranjan Hasabnis, Pietro Mercati, Gaurit Sharma, Shamima Najnin
2021 arXiv   pre-print
Modern deep learning (DL) applications are built using DL libraries and frameworks such as TensorFlow and PyTorch.  ...  Manual tuning requires deep knowledge of the user-controllable parameters of DL frameworks as well as the underlying hardware.  ...  The availability of open-source deep learning software frameworks, such as PyTorch [11] and TensorFlow [1], along with suites of neural network models [15], enables fast deployment of deep learning  ... 
arXiv:2109.06266v1 fatcat:ixszcn5fo5dmfn442e33k34a7q
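The paper applies gradient-free optimizers to TensorFlow's CPU-backend parameters; as a hedged stand-in, the sketch below uses plain random search over the two threading knobs, timing each configuration in a fresh process (TF thread pools cannot be changed once ops have executed). The parameter ranges and trial budget are illustrative assumptions, not the paper's setup.

```python
import random
import subprocess
import sys

# Benchmark script run once per configuration, in its own process.
BENCH = """
import sys, time
import tensorflow as tf
tf.config.threading.set_intra_op_parallelism_threads(int(sys.argv[1]))
tf.config.threading.set_inter_op_parallelism_threads(int(sys.argv[2]))
x = tf.random.uniform((1024, 1024))
t = time.perf_counter()
for _ in range(20):
    y = tf.linalg.matmul(x, x)
_ = y.numpy()
print(time.perf_counter() - t)
"""

def run_trial(intra, inter):
    # Fresh process per trial: TF thread pools are frozen once ops have run.
    out = subprocess.run([sys.executable, "-c", BENCH, str(intra), str(inter)],
                         capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

best_cfg, best_time = None, float("inf")
for _ in range(8):  # small trial budget for the sketch
    cfg = (random.choice([1, 2, 4, 8, 16]), random.choice([1, 2, 4]))
    t = run_trial(*cfg)
    if t < best_time:
        best_cfg, best_time = cfg, t
print(f"best (intra, inter) = {best_cfg}, latency = {best_time:.3f} s")
```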

Pushing the boundaries of parallel Deep Learning -- A practical approach [article]

Paolo Viviani, Maurizio Drocco, Marco Aldinucci
2018 arXiv   pre-print
This work aims to assess the state of the art of data-parallel deep neural network training, trying to identify potential research tracks to be exploited for performance improvement.  ...  Besides, it presents a design for a practical C++ library dedicated to implementing and unifying the current state-of-the-art methodologies for parallel training in a performance-conscious framework, allowing  ...  In order to provide a truly general-purpose tool, as well as to exploit the peculiarities of the different deep learning frameworks available, the proposed FAST (Flexible (A)synchronous Scalable Training  ... 
arXiv:1806.09528v1 fatcat:hakyhxanivfmfnqkgnyjlycsie

Deep Learning at Scale

Paolo Viviani, Maurizio Drocco, Daniele Baccega, Iacopo Colonnelli, Marco Aldinucci
2019 2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)  
Established techniques for data-parallel training are discussed from both a parallel computing and a deep learning perspective; then a different approach is presented that is meant to allow DNN training  ...  This work presents a novel approach to distributed training of deep neural networks (DNNs) that aims to overcome the issues related to mainstream approaches to data-parallel training.  ...  In order to provide a truly general-purpose tool, as well as to exploit the peculiarities of the different deep learning frameworks available, the proposed FAST (Flexible (A)synchronous Scalable Training  ... 
doi:10.1109/empdp.2019.8671552 dblp:conf/pdp/VivianiDBCA19 fatcat:cf7wan67i5f63oipufn77r2pxi
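Both FAST entries above concern data-parallel training. As a baseline reference (a generic sketch, not the FAST library's API), synchronous data-parallel SGD computes per-worker gradients on disjoint shards and averages them, which is exactly what an allreduce implements in a distributed setting:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)                      # shared model weights (linear model)
X, y = rng.normal(size=(64, 3)), rng.normal(size=64)
shards = np.array_split(np.arange(64), 4)   # 4 simulated workers, disjoint data

for step in range(100):
    grads = []
    for idx in shards:                      # each worker: gradient on its shard
        err = X[idx] @ w - y[idx]
        grads.append(2 * X[idx].T @ err / len(idx))
    g = np.mean(grads, axis=0)              # the allreduce: average gradients
    w -= 0.05 * g                           # identical update on every worker
```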

Integrating Deep Learning in Domain Sciences at Exascale [article]

Rick Archibald, Edmond Chow, Eduardo D'Azevedo, Jack Dongarra, Markus Eisenbach, Rocco Febbo, Florent Lopez, Daniel Nichols, Stanimire Tomov, Kwai Wong, Junqi Yin
2020 arXiv   pre-print
These developments, along with existing HPC AI software capabilities, have been integrated into MagmaDNN, an open-source HPC deep learning framework.  ...  This paper discusses the requirements of an HPC deep learning framework and how those needs can be met (e.g., as in MagmaDNN) through a deep integration with existing HPC libraries, such as MAGMA and  ...  Applications: Materials Science and Microscopy: There are multiple opportunities to exploit machine learning techniques in materials science.  ... 
arXiv:2011.11188v1 fatcat:zhmnfvhvbjhjzj72tu6ikrcpla

Distributed Deep Reinforcement Learning: A Survey and A Multi-Player Multi-Agent Learning Toolbox [article]

Qiyue Yin, Tongtong Yu, Shengqi Shen, Jun Yang, Meijing Zhao, Kaiqi Huang, Bin Liang, Liang Wang
2022 arXiv   pre-print
With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems.  ...  learning, covering single-player single-agent distributed deep reinforcement learning to the most complex multi-player multi-agent distributed deep reinforcement learning.  ...  Due to the structured computation pattern of deep learning algorithms, some successful distributed learning methods have been proposed for parallelism in deep learning [20], [21].  ... 
arXiv:2212.00253v1 fatcat:2gymorseene2dezdgny5fddyde

Distributed Intelligence on the Edge-to-Cloud Continuum: A Systematic Literature Review

Daniel Rosendo, Alexandru Costan, Patrick Valduriez, Gabriel Antoniu
2022 Journal of Parallel and Distributed Computing  
This is necessary to help understand the performance trade-offs that result from combining a variety of learning paradigms and supportive frameworks.  ...  It describes the main learning paradigms enabling learning-based analytics on the Edge-to-Cloud Continuum.  ...  Frameworks and libraries marked with a ⋆ are those most exploited in the experiments.  ... 
doi:10.1016/j.jpdc.2022.04.004 fatcat:mopdegh4vrgt5k47vrmc7xum24

Database Meets Deep Learning

Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan
2016 SIGMOD record  
In particular, we discuss possible improvements for deep learning systems from a database perspective, and analyze database applications that may benefit from deep learning techniques.  ...  Deep learning has recently become very popular on account of its incredible success in many complex data-driven applications, such as image classification and speech recognition.  ...  With the advancements of deep learning models in NLP [13], it is opportune to consider deep learning for these problems.  ... 
doi:10.1145/3003665.3003669 fatcat:erig6kfsk5c3jmxintphi5fzl4

Corella: A Private Multi Server Learning Approach based on Correlated Queries [article]

Hamidreza Ehteram, Mohammad Ali Maddah-Ali, Mahtab Mirmohseni
2020 arXiv   pre-print
The proposed scheme relies on a cluster of servers, where at most T ∈ N of them may collude, each running a learning model (e.g., a deep neural network).  ...  Simulation results for various datasets demonstrate the accuracy of the proposed approach for classification, using deep neural networks, and the autoencoder, as supervised and unsupervised learning  ...  Indeed, the system gradually learns to exploit this opportunity during the training phase.  ... 
arXiv:2003.12052v2 fatcat:zbmbcjsln5d3xjynwtckdlg5sa
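The idea of correlated queries can be illustrated in a deliberately simplified linear setting (an assumption for exposition only; Corella itself trains nonlinear models to tolerate the noise and supports up to T colluding servers, not just the two-server case below):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))          # the model each server runs (linear here)
x = rng.normal(size=8)               # the client's private input

noise = 100.0 * rng.normal(size=8)   # strong noise masks x from each server
q1, q2 = x + noise, x - noise        # correlated queries, each looks like noise

# Each (non-colluding) server answers its own query independently.
a1, a2 = W @ q1, W @ q2

# The correlation cancels at the client: (a1 + a2) / 2 == W @ x exactly.
assert np.allclose((a1 + a2) / 2, W @ x)
```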

Deep Learning on FPGAs: Past, Present, and Future [article]

Griffin Lacey, Graham W. Taylor, Shawki Areibi
2016 arXiv   pre-print
Current trends in design tools for FPGAs have made them more compatible with the high-level software practices typical in the deep learning community, making FPGAs more accessible to those  ...  how FPGAs may best serve the needs of the deep learning community moving forward.  ...  They are also capable of exploiting distributed on-chip memory, as well as large degrees of pipeline parallelism, which fit naturally with the feed-forward nature of deep learning methods.  ... 
arXiv:1602.04283v1 fatcat:xffu7dm7ifbxjir7ivskhxozyi

Understanding Scalability and Fine-Grain Parallelism of Synchronous Data Parallel Training

Jiali Li, Bogdan Nicolae, Justin Wozniak, George Bosilca
2019 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)  
Index Terms: deep learning, data-parallel training, behavior analysis  ...  In the age of big data, deep learning has emerged as a powerful tool to extract insight and exploit its value, both in industry and scientific applications.  ...  Thanks to this ease of use, Horovod is one of the most widely used synchronous deep learning frameworks.  ... 
doi:10.1109/mlhpc49564.2019.00006 dblp:conf/sc/LiNWB19 fatcat:pcxwhll7xncrdp2m652gpx323u
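Horovod's standard pattern, referenced in the snippet, wraps an existing optimizer so that gradients are allreduce-averaged across workers before every step (this uses the standard Horovod PyTorch API; the model and data below are placeholders):

```python
import torch
import horovod.torch as hvd

hvd.init()                                   # one process per worker
torch.manual_seed(42)                        # same initial weights on every rank

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer: gradients are allreduce-averaged across ranks before
# each step, which is what makes the training synchronous.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

for step in range(100):
    x = torch.randn(32, 10)                  # stand-in for this rank's shard
    loss = (model(x) - torch.randn(32, 1)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                         # applies the averaged gradient
```

Launched with one process per worker, e.g. `horovodrun -np 4 python train.py`.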

De-specializing an HLS library for Deep Neural Networks: improvements upon hls4ml [article]

Serena Curzel, Nicolò Ghielmetti, Michele Fiorito, Fabrizio Ferrandi
2021 arXiv   pre-print
...  with the help of High-Level Synthesis. hls4ml is a framework that translates Deep Neural Networks into annotated C++ code for High-Level Synthesis, offering a complete and user-friendly design process  ...  The gap between high-level Machine Learning frameworks (e.g., TensorFlow, PyTorch) and low-level hardware design in Verilog/VHDL creates a barrier to widespread adoption of FPGAs, which can be overcome  ...  HLS Tools and Compilers for Deep Learning: Extensive research has been published on specialized processors for Deep Learning, and many of them have been implemented on FPGAs.  ... 
arXiv:2103.13060v1 fatcat:pjcfqysla5fsdlggstud5rgxby
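The hls4ml flow the snippet describes converts a Keras model into HLS-ready C++; a minimal sketch follows (layer sizes and the output directory are placeholder assumptions, not from the paper):

```python
import hls4ml
from tensorflow import keras

# A tiny placeholder network standing in for a real DNN.
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    keras.layers.Dense(4, activation="softmax"),
])

# Derive a per-model HLS configuration (precision, reuse factors, ...).
config = hls4ml.utils.config_from_keras_model(model, granularity="model")

# Translate the network into annotated C++ for High-Level Synthesis.
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir="hls4ml_prj")
hls_model.compile()   # builds a C simulation of the generated design
```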
Showing results 1 — 15 out of 72,296 results