A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
On the Local Minima of the Empirical Risk
[article]
2018
arXiv
pre-print
Our objective is to find the ϵ-approximate local minima of the underlying function F while avoiding the shallow local minima---arising because of the tolerance ν---which exist only in f. ...
Population risk is always of primary interest in machine learning; however, learning algorithms only have access to the empirical risk. ...
In the context of empirical risk minimization, such a result would allow fewer samples to be taken while still providing a strong guarantee on avoiding local minima. ...
arXiv:1803.09357v2
fatcat:6zvclcqzanhdtl4fjfhsuufkii
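The population-vs-empirical-risk gap in this abstract can be illustrated with a toy squared-loss problem (the names and constants below are illustrative, not taken from the paper): the empirical risk minimizer drifts toward the population minimizer as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star, sigma = 1.5, 0.5  # illustrative population minimizer and noise level

def empirical_risk_minimizer(n):
    # For the squared loss, the empirical risk R_n(t) = mean((t - y_i)^2)
    # is minimized exactly at the sample mean of the n observations.
    y = theta_star + sigma * rng.normal(size=n)
    return y.mean()

# The population risk F(t) = (t - theta_star)^2 + sigma^2 is minimized at
# theta_star; the empirical minimizer's gap shrinks roughly like sigma/sqrt(n).
gaps = {n: abs(empirical_risk_minimizer(n) - theta_star) for n in (10, 10_000)}
```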
Regularizing Neural Networks via Adversarial Model Perturbation
[article]
2021
arXiv
pre-print
This work proposes a new regularization scheme, based on the understanding that the flat local minima of the empirical risk cause the model to generalize better. ...
Comparing with most existing regularization schemes, AMP has strong theoretical justifications, in that minimizing the AMP loss can be shown theoretically to favour flat local minima of the empirical risk ...
Figure 1 (left) sketches an empirical risk curve L_ERM, which contains two local minima, a sharp one on the left and a flat one on the right. ...
arXiv:2010.04925v4
fatcat:cscc5e5tczhurgh7ctz3ldimlq
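The flat-minima intuition behind AMP can be sketched numerically (a toy 1-D loss of my own construction, not the paper's): two zero-risk minima, one sharp and one flat, are distinguished by the worst-case loss over a small perturbation ball.

```python
import numpy as np

def loss(theta):
    # Toy 1-D empirical risk with two global minima at zero risk:
    # a sharp one at theta = -1 and a flat one at theta = +2.
    return np.minimum(50.0 * (theta + 1.0) ** 2, 0.5 * (theta - 2.0) ** 2)

def amp_loss(theta, eps=0.3):
    # Worst-case loss over the eps-ball around theta, scanned on a grid
    # (a brute-force stand-in for AMP's inner maximization in 1-D).
    deltas = np.linspace(-eps, eps, 101)
    return loss(theta + deltas).max()
```

Both minima have identical empirical risk, but the perturbed loss is far smaller at the flat minimum, so minimizing it favours the flat basin.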
Characterization of Excess Risk for Locally Strongly Convex Population Risk
[article]
2022
arXiv
pre-print
For non-convex problems with d model parameters such that d/n is smaller than a threshold independent of n, the O(1/n) order can be maintained if the empirical risk has no spurious local minima with ...
We establish upper bounds for the expected excess risk of models trained by proper iterative algorithms which approximate the local minima. ...
We first show that the local strong convexity around the local minima of the population risk (population local minima) can be generalized to the local minima of the empirical risk (empirical local minima ...
arXiv:2012.02456v4
fatcat:5dscclksvzgcjb3cijfyp6357m
Theory II: Landscape of the Empirical Risk in Deep Learning
[article]
2017
arXiv
pre-print
We further experimentally explored and visualized the landscape of empirical risk of a DCNN on CIFAR-10 during the entire training process and especially the global minima. ...
Previous theoretical work on deep learning and neural network optimization tends to focus on avoiding saddle points and local minima. ...
The second part is about the landscape of the minima of the empirical risk: what can we say in general about global and local minima? ...
arXiv:1703.09833v2
fatcat:gpqjwxkajzcxhgi4cog7junc6y
Piecewise linear activations substantially shape the loss surfaces of neural networks
[article]
2020
arXiv
pre-print
We first prove that the loss surfaces of many neural networks have infinite spurious local minima which are defined as the local minima with higher empirical risks than the global minima. ...
The constructed spurious local minima are concentrated in one cell as a valley: they are connected with each other by a continuous path, on which empirical risk is invariant. ...
All local minima in a cell are concentrated as a local minimum valley: on a local minimum valley, all local minima are connected with each other by a continuous path, on which the empirical risk is invariant ...
arXiv:2003.12236v1
fatcat:r54rh2tczzbkhgbduhwcocl3zm
Distribution-Dependent Analysis of Gibbs-ERM Principle
[article]
2019
arXiv
pre-print
The first part of our analysis focuses on the localized excess risk in the vicinity of a fixed local minimizer. ...
This result is then extended to bounds on the global excess risk, by characterizing probabilities of local minima (and their complement) under Gibbs densities, a result which might be of independent interest ...
Acknowledgments: The authors would like to thank Olivier Bousquet, Sébastien Gerchinovitz, and Abbas Mehrabian for stimulating discussions on this work. ...
arXiv:1902.01846v1
fatcat:jro2br2slzg25fodq257s5wo24
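The Gibbs densities mentioned in this abstract can be illustrated on a 1-D grid (a hypothetical double-well risk, not the paper's setting): the Gibbs measure concentrates on neighbourhoods of the local minimizers as the inverse temperature grows.

```python
import numpy as np

# Gibbs density over a 1-D parameter grid: p(theta) ~ exp(-beta * R(theta)),
# for a hypothetical double-well risk with local minima at theta = +/-1.
theta = np.linspace(-3.0, 3.0, 1201)
risk = 0.25 * (theta ** 2 - 1.0) ** 2

def gibbs_mass_near_minima(beta, radius=0.2):
    # Probability the Gibbs measure assigns to small neighbourhoods
    # of the two minimizers, at inverse temperature beta.
    weights = np.exp(-beta * risk)
    p = weights / weights.sum()
    near = np.abs(np.abs(theta) - 1.0) < radius
    return p[near].sum()
```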
The landscape of empirical risk for nonconvex losses
2018
Annals of Statistics
... the empirical risk has exactly two local minima θ+ and θ−, related by an exchange of the two classes. ...
In general, gradient descent and other local optimization procedures are expected to converge to local minima of the empirical risk R_n(θ). ...
doi:10.1214/17-aos1637
fatcat:646bhqjaovclbdqcsuof2esoam
Theory of Deep Learning IIb: Optimization Properties of SGD
[article]
2018
arXiv
pre-print
In Theory IIb we characterize with a mix of theory and experiments the optimization of deep convolutional networks by Stochastic Gradient Descent. ...
The main new result in this paper is theoretical and experimental evidence for the following conjecture about SGD: SGD concentrates in probability -- like the classical Langevin equation -- on large volume ...
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the DGX-1 used for this research. ...
arXiv:1801.02254v1
fatcat:osy2wb6cojh3ze5yqybg3nf5zy
Page 149 of Neural Computation Vol. 7, Issue 1
[page]
1995
Neural Computation
While it is possible that the error functions studied contain local minima, a mathematical study of these local minima is beyond the scope of this paper. ...
Empirical Risk Minimization 149
Finally, in the Hebb learning algorithm (recently studied in a very similar context by Barkai et al. 1993) one has w = (1/m) Σ_μ y^μ x^μ ...
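The Hebb rule quoted in this snippet is a single averaging pass over the training patterns, with no iterative risk minimization; a minimal NumPy sketch (the teacher setup below is an illustrative assumption, not from the page):

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 200, 21  # illustrative sample size and (odd) input dimension

# Random +/-1 patterns x^mu, labelled by a random +/-1 teacher vector.
x = rng.choice([-1.0, 1.0], size=(m, d))
teacher = rng.choice([-1.0, 1.0], size=d)
y = np.sign(x @ teacher)  # never zero, since d is odd

# Hebb weights: w = (1/m) * sum_mu y^mu x^mu -- one averaging pass.
w = (y[:, None] * x).mean(axis=0)

train_accuracy = np.mean(np.sign(x @ w) == y)
```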
The rare event risk in African emerging stock markets
2011
Managerial Finance
Findings: The empirical results indicate that the GL distribution best fitted the empirical data over the period of study. ...
Purpose: To investigate the asymptotic distribution of the extreme daily stock returns in African stock markets over the period 1996 to 2007 and examine the implications for downside risk measurement. ...
The determination of these capital requirements is based on inputs provided by models (i.e. Value-at-Risk) which are based on distributional assumptions. ...
doi:10.1108/03074351111113324
fatcat:nxscip6wcvhsnoszfcmuk36xva
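The Value-at-Risk this abstract refers to is, in its empirical form, a tail quantile of the loss distribution; a minimal sketch with simulated heavy-tailed returns (Student-t data of my own, not the paper's African-market series):

```python
import numpy as np

rng = np.random.default_rng(7)
# Simulated daily log returns with heavier-than-Gaussian tails.
returns = 0.01 * rng.standard_t(df=4, size=5000)

def empirical_var(returns, alpha=0.99):
    # Value-at-Risk as the empirical alpha-quantile of the loss (-return)
    # distribution: losses exceed this level on roughly (1 - alpha) of days.
    return np.quantile(-returns, alpha)

var99 = empirical_var(returns)
exceed_rate = np.mean(-returns > var99)  # close to 0.01 by construction
```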
Are deep ResNets provably better than linear predictors?
[article]
2019
arXiv
pre-print
First, we show that there exist datasets for which all local minima of a fully-connected ReLU network are no better than the best linear predictor, whereas a ResNet has strictly better local minima. ...
Recent results in the literature indicate that a residual network (ResNet) composed of a single residual block outperforms linear predictors, in the sense that all local minima in its optimization landscape ...
Acknowledgments All the authors acknowledge support from DARPA Lagrange. Chulhee Yun also thanks Korea Foundation for Advanced Studies for their support. ...
arXiv:1907.03922v2
fatcat:4yf3rjp5oveujme23y5iotyy2q
Fast Rates for Empirical Risk Minimization of Strict Saddle Problems
[article]
2017
arXiv
pre-print
We derive bounds on the sample complexity of empirical risk minimization (ERM) in the context of minimizing non-convex risks that admit the strict saddle property. ...
Our results and techniques may pave the way for statistical analyses of additional strict saddle problems. ...
More generally, these works consider the task of approximating the k leading eigenvectors. It is not hard to extend our results to this task as well. ...
arXiv:1701.04271v4
fatcat:4ebqm7probetfoi5rb4uemmzbi
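The strict saddle property named in this title can be checked at a single critical point via the Hessian spectrum; a minimal example with f(x, y) = x² − y² at the origin:

```python
import numpy as np

# f(x, y) = x^2 - y^2 has a critical point at the origin whose Hessian
# carries a strictly negative eigenvalue: a strict saddle, not a minimum.
hessian = np.array([[2.0, 0.0],
                    [0.0, -2.0]])
eigvals = np.linalg.eigvalsh(hessian)  # returned in ascending order
is_strict_saddle = eigvals.min() < 0.0
```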
Optimization Landscapes of Wide Deep Neural Networks Are Benign
[article]
2021
arXiv
pre-print
We highlight the importance of constraints for such networks and show that constrained -- as well as unconstrained -- empirical-risk minimization over such networks has no confined points, that is, suboptimal ...
We analyze the optimization landscapes of deep learning with wide networks. ...
We prove that the optimization landscapes of empirical-risk minimizers over wide feedforward networks have no spurious local minima. ...
arXiv:2010.00885v2
fatcat:pp2cmwcr5bgfbgazc3mw7x7v7y
Small nonlinearities in activation functions create bad local minima in neural networks
[article]
2019
arXiv
pre-print
We investigate the loss surface of neural networks. We prove that even for one-hidden-layer networks with "slightest" nonlinearity, the empirical risks have spurious local minima in most cases. ...
..., for which there exists a bad local minimum. Our results make the least restrictive assumptions relative to existing results on spurious local optima in neural networks. ...
But success stories of deep learning suggest that local minima of the empirical risk could be close to global minima. Choromanska et al. ...
arXiv:1802.03487v4
fatcat:fuxyiuxmejem7o3tjd7xbejfb4
LOCAL ESTIMATION OF DYNAMIC COPULA MODELS
2010
International Journal of Theoretical and Applied Finance
Results indicate that volatility does affect the strength of dependence. The in-sample Value-at-Risk based on the dynamic model outperforms those based on the empirical estimates. ...
In this paper we exploit this stylized fact combined with local maximum likelihood estimation of copula models to analyze the dynamic joint behavior of series of financial log returns. ...
Acknowledgments The authors wish to thank the Editor and an anonymous referee whose comments and suggestions helped to greatly improve the quality of the paper. ...
doi:10.1142/s0219024910005759
fatcat:3rdqr2on2ffvnpbiee5ppcul24
Showing results 1 — 15 out of 22,449 results