Learning Halfspaces and Neural Networks with Random Initialization.

We study non-convex empirical risk minimization for learning halfspaces and neural networks. ... For loss functions that are L-Lipschitz continuous, we present algorithms to learn halfspaces and multi-layer neural networks that achieve arbitrarily small excess risk ϵ>0. ... MJ and YZ were partially supported by the U.S. ARL and the U.S. ARO under contract/grant number W911NF-11-1-0391. We thank Sivaraman Balakrishnan for helpful comments on an earlier draft. ...

arXiv:1511.07948v1 fatcat:3oiaouh33zc25d4wismghcdpra

We prove that SGD produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution for a broad class of distributions that includes log-concave ... To the best of our knowledge, this is the first work to show that overparameterized neural networks trained by SGD can generalize when the data is corrupted with adversarial label noise. ... We thank Maria-Florina Balcan for pointing us to a number of works on learning halfspaces in the presence of noise. ...

arXiv:2101.01152v3 fatcat:rhygrb6cmrcslbumz3panrv6ym

Our results include: hardness of learning shallow ReLU neural networks under the Gaussian distribution and other distributions; hardness of learning intersections of ω(1) halfspaces, DNF formulas with ... We also establish lower bounds on the complexity of learning intersections of a constant number of halfspaces, and ReLU networks with a constant number of hidden neurons. ... Acknowledgements We thank Benny Applebaum and anonymous reviewers for their valuable comments. This research is partially supported by ISF grant 2258/19. ...

arXiv:2101.08303v2 fatcat:mej6qudnvjeata6mhuyugmwjai

What distinguishes GLNs from contemporary neural networks is the distributed and local nature of their credit assignment mechanism; each neuron directly predicts the target, forgoing the ability to learn ... We show that this architecture gives rise to universal learning capabilities in the limit, with effective model capacity increasing as a function of network size in a manner comparable with deep ReLU networks ... Unlike contemporary neural networks, we demonstrate that the halfspace-gated GLN architecture and learning rule is naturally robust to catastrophic forgetting without any modifications or knowledge of ...

arXiv:1910.01526v2 fatcat:fbgnq4rfwzbspis4jmv6qeudgq

What distinguishes GLNs from contemporary neural networks is the distributed and local nature of their credit assignment mechanism; each neuron directly predicts the target, forgoing the ability to learn ... We show that this architecture gives rise to universal learning capabilities in the limit, with effective model capacity increasing as a function of network size in a manner comparable with deep ReLU networks ... Unlike contemporary neural networks, we demonstrate that the halfspace-gated GLN architecture and learning rule is naturally robust to catastrophic forgetting without any modifications or knowledge of ...

doi:10.1609/aaai.v35i11.17202 fatcat:c57f567vajdrnjxfbbzlwol7xm

Citation

Joel Veness, Tor Lattimore, David Budden, Avishkar Bhoopchand, Christopher Mattern, Agnieszka Grabska-Barwinska, Eren Sezener, Jianan Wang, Peter Toth, Simon Schmitt, Marcus Hutter. "Gated Linear Networks." Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference 35.11 (2021) 10015-10023

We construct two neural networks based on these hidden units and show that they correctly compute the given but arbitrary multiple-valued function. ... Preliminary experimental results are presented and discussed. Index Terms: constructive algorithm, genetic algorithm, multiple-threshold perceptron, multiple-valued logic, neural network, partitioning. ... Acknowledgment: The authors would like to thank the referees for their important and interesting suggestions. ...

doi:10.1109/72.914519 pmid:18244379 fatcat:ohwx3vafybeaphciww5ewsm7my

Neural networks have extensively been used before as approximators; in this work, we make a step further and use them for the first time as abstractions. ... By using neural ODEs with ReLU activation functions as abstractions, we cast the safety verification problem for nonlinear dynamical models into that of hybrid automata with affine dynamics, which we verify ... Alec was supported by the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems (EP/S024050/1). ...

arXiv:2301.11683v1 fatcat:nhwahth75fhspowkvyu2lajjqq

We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multilayer feed-forward networks. ... We argue, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning. ... Hence, the hypothesis class of neural intersection of k/2 halfspaces is a subset of hypothesis class of feed-forward neural networks with k hidden units in a single hidden layer. ...

arXiv:1412.6614v4 fatcat:vsmrbijxrfd2zhgtirk5p32gfu

As a result of the theory, the vulnerability of neural networks to small adversarial perturbations is a logical consequence of the amount of test error observed. ... Despite substantial research interest, the cause of the phenomenon is still poorly understood and remains unsolved. ... Acknowledgments Special thanks to Surya Ganguli, Jascha Sohl-Dickstein, Jeffrey Pennington, and Sam Smith for interesting discussions on this problem. ...

arXiv:1801.02774v3 fatcat:tneoo2mzpzbdxag4uzha6hecgm

Deep generative neural networks (DGNNs) have achieved realistic and high-quality data generation. ... We define generative boundaries which determine the activation of nodes in the internal layer and probe inside the model with this information. ... Explaining deep neural networks One can explain an output of neural networks by the sensitivity analysis, which aims to figure out which portion of an input contributes to the output. ...

arXiv:1912.05827v1 fatcat:w5gypgetnjbltdjb4omkpc6zfi

This is the first assumption-free, provably efficient algorithm for learning neural networks with two nonlinear layers. ... We give a polynomial-time algorithm for learning neural networks with one layer of sigmoids feeding into any Lipschitz, monotone activation function (e.g., sigmoid or ReLU). ... [ZLJ16] to obtain results for learning sparse neural networks with certain smooth activations, and Goel et al. ...

arXiv:1709.06010v4 fatcat:dfy27fty6vfwzagcbl7vbeq4h4

This algorithm is based on Gated Linear Networks (GLNs), a recently introduced deep learning architecture with properties well-suited to the online setting. ... We empirically evaluate GLCB compared to 9 state-of-the-art algorithms that leverage deep neural networks, on a standard benchmark suite of discrete and continuous contextual bandit problems. ... • Neural Greedy estimates action-values with a neural network and follows an ε-greedy policy. • Neural Linear utilizes a neural network to extract latent features, from which action values are estimated ...

arXiv:2002.11611v2 fatcat:vm65osogrrh2vbreila2kvrk3a

Agnostically learning halfspaces with a constant approximation ratio is hard. 3. Learning an intersection of ω(1) halfspaces is hard. ... There is essentially only one known approach to proving lower bounds on improper learning. It was initiated in (Kearns and Valiant 89) and relies on cryptographic assumptions. ... Acknowledgements: Amit Daniely is a recipient of the Google Europe Fellowship in Learning Theory, and this research is supported in part by this Google Fellowship. ...

arXiv:1311.2272v2 fatcat:4j35d76anjalradrkkois6fu7m

Deep generative neural networks (DGNNs) have achieved realistic and high-quality data generation. ... We define generative boundaries which determine the activation of nodes in the internal layer and probe inside the model with this information. ... Explaining deep neural networks One can explain an output of neural networks by the sensitivity analysis, which aims to figure out which portion of an input contributes to the output. ...

doi:10.1609/aaai.v34i04.5852 fatcat:q7vli524vzffpf5j5ppjxbeooq

We show that such learning systems are able to model simple dynamical systems and can be combined with additional deep generative models to learn complex dynamics, such as video textures, in a fully end-to-end ... The approach works by jointly learning a dynamics model and Lyapunov function that guarantees non-expansiveness of the dynamics under the learned Lyapunov function. ... Specifically, we let f be defined by a 2-100-100-2 fully connected network, and V be a 2-100-100-1 ICNN, with both networks initialized via the default weights of PyTorch (the Kaiming uniform initialization ...

arXiv:2001.06116v1 fatcat:5ixd3au4wjhplhifpwab7vtooy

Learning Halfspaces and Neural Networks with Random Initialization [article]

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise [article]

From Local Pseudorandom Generators to Hardness of Learning [article]

Gated Linear Networks [article]

Gated Linear Networks

STRIP - a strip-based neural-network growth algorithm for learning multiple-valued functions

Neural Abstractions [article]

In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning [article]

Adversarial Spheres [article]

An Efficient Explorative Sampling Considering the Generative Boundaries of Deep Generative Neural Networks [article]

Learning Neural Networks with Two Nonlinear Layers in Polynomial Time [article]

Online Learning in Contextual Bandits using Gated Linear Networks [article]

From average case complexity to improper learning complexity [article]

An Efficient Explorative Sampling Considering the Generative Boundaries of Deep Generative Neural Networks

Learning Stable Deep Dynamics Models [article]
