A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
On the Local Minima of the Empirical Risk
[article]
2018
arXiv
pre-print
Our objective is to find the ϵ-approximate local minima of the underlying function F while avoiding the shallow local minima---arising because of the tolerance ν---which exist only in f. ...
Population risk is always of primary interest in machine learning; however, learning algorithms only have access to the empirical risk. ...
In the context of empirical risk minimization, such a result would allow fewer samples to be taken while still providing a strong guarantee on avoiding local minima. ...
arXiv:1803.09357v2
fatcat:6zvclcqzanhdtl4fjfhsuufkii
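The population-vs-empirical-risk gap in this abstract can be illustrated with a toy squared-loss problem (the names and constants below are illustrative, not taken from the paper): the empirical risk minimizer drifts toward the population minimizer as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star, sigma = 1.5, 0.5  # illustrative population minimizer and noise level

def empirical_risk_minimizer(n):
    # For the squared loss, the empirical risk R_n(t) = mean((t - y_i)^2)
    # is minimized exactly at the sample mean of the n observations.
    y = theta_star + sigma * rng.normal(size=n)
    return y.mean()

# The population risk F(t) = (t - theta_star)^2 + sigma^2 is minimized at
# theta_star; the empirical minimizer's gap shrinks roughly like sigma/sqrt(n).
gaps = {n: abs(empirical_risk_minimizer(n) - theta_star) for n in (10, 10_000)}
```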
Regularizing Neural Networks via Adversarial Model Perturbation
[article]
2021
arXiv
pre-print
This work proposes a new regularization scheme, based on the understanding that the flat local minima of the empirical risk cause the model to generalize better. ...
Comparing with most existing regularization schemes, AMP has strong theoretical justifications, in that minimizing the AMP loss can be shown theoretically to favour flat local minima of the empirical risk ...
Figure 1 (left) sketches an empirical risk curve L_ERM, which contains two local minima, a sharp one on the left and a flat one on the right. ...
arXiv:2010.04925v4
fatcat:cscc5e5tczhurgh7ctz3ldimlq
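The flat-minima intuition behind AMP can be sketched numerically (a toy 1-D loss of my own construction, not the paper's): two zero-risk minima, one sharp and one flat, are distinguished by the worst-case loss over a small perturbation ball.

```python
import numpy as np

def loss(theta):
    # Toy 1-D empirical risk with two global minima at zero risk:
    # a sharp one at theta = -1 and a flat one at theta = +2.
    return np.minimum(50.0 * (theta + 1.0) ** 2, 0.5 * (theta - 2.0) ** 2)

def amp_loss(theta, eps=0.3):
    # Worst-case loss over the eps-ball around theta, scanned on a grid
    # (a brute-force stand-in for AMP's inner maximization in 1-D).
    deltas = np.linspace(-eps, eps, 101)
    return loss(theta + deltas).max()
```

Both minima have identical empirical risk, but the perturbed loss is far smaller at the flat minimum, so minimizing it favours the flat basin.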
Characterization of Excess Risk for Locally Strongly Convex Population Risk
[article]
2022
arXiv
pre-print
For non-convex problems with d model parameters such that d/n is smaller than a threshold independent of n, the O(1/n) order can be maintained if the empirical risk has no spurious local minima with ...
We establish upper bounds for the expected excess risk of models trained by proper iterative algorithms which approximate the local minima. ...
We first show that the local strong convexity around the local minima of the population risk (population local minima) can be generalized to the local minima of the empirical risk (empirical local minima ...
arXiv:2012.02456v4
fatcat:5dscclksvzgcjb3cijfyp6357m
Theory II: Landscape of the Empirical Risk in Deep Learning
[article]
2017
arXiv
pre-print
We further experimentally explored and visualized the landscape of empirical risk of a DCNN on CIFAR-10 during the entire training process and especially the global minima. ...
Previous theoretical work on deep learning and neural network optimization tends to focus on avoiding saddle points and local minima. ...
The second part is about the landscape of the minima of the empirical risk: what can we say in general about global and local minima? ...
arXiv:1703.09833v2
fatcat:gpqjwxkajzcxhgi4cog7junc6y
Piecewise linear activations substantially shape the loss surfaces of neural networks
[article]
2020
arXiv
pre-print
We first prove that the loss surfaces of many neural networks have infinite spurious local minima which are defined as the local minima with higher empirical risks than the global minima. ...
The constructed spurious local minima are concentrated in one cell as a valley: they are connected with each other by a continuous path, on which empirical risk is invariant. ...
All local minima in a cell are concentrated as a local minimum valley: on a local minimum valley, all local minima are connected with each other by a continuous path, on which the empirical risk is invariant ...
arXiv:2003.12236v1
fatcat:r54rh2tczzbkhgbduhwcocl3zm
Distribution-Dependent Analysis of Gibbs-ERM Principle
[article]
2019
arXiv
pre-print
The first part of our analysis focuses on the localized excess risk in the vicinity of a fixed local minimizer. ...
This result is then extended to bounds on the global excess risk, by characterizing probabilities of local minima (and their complement) under Gibbs densities, a result which might be of independent interest ...
Acknowledgments: The authors would like to thank Olivier Bousquet, Sébastien Gerchinovitz, and Abbas Mehrabian for stimulating discussions on this work. ...
arXiv:1902.01846v1
fatcat:jro2br2slzg25fodq257s5wo24
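The Gibbs densities mentioned in this abstract can be illustrated on a 1-D grid (a hypothetical double-well risk, not the paper's setting): the Gibbs measure concentrates on neighbourhoods of the local minimizers as the inverse temperature grows.

```python
import numpy as np

# Gibbs density over a 1-D parameter grid: p(theta) ~ exp(-beta * R(theta)),
# for a hypothetical double-well risk with local minima at theta = +/-1.
theta = np.linspace(-3.0, 3.0, 1201)
risk = 0.25 * (theta ** 2 - 1.0) ** 2

def gibbs_mass_near_minima(beta, radius=0.2):
    # Probability the Gibbs measure assigns to small neighbourhoods
    # of the two minimizers, at inverse temperature beta.
    weights = np.exp(-beta * risk)
    p = weights / weights.sum()
    near = np.abs(np.abs(theta) - 1.0) < radius
    return p[near].sum()
```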
The landscape of empirical risk for nonconvex losses
2018
Annals of Statistics
... the empirical risk has exactly two local minima θ+ and θ−, related by an exchange of the two classes. ...
In general, gradient descent and other local optimization procedures are expected to converge to local minima of the empirical risk R_n(θ). ...
doi:10.1214/17-aos1637
fatcat:646bhqjaovclbdqcsuof2esoam
Theory of Deep Learning IIb: Optimization Properties of SGD
[article]
2018
arXiv
pre-print
In Theory IIb we characterize with a mix of theory and experiments the optimization of deep convolutional networks by Stochastic Gradient Descent. ...
The main new result in this paper is theoretical and experimental evidence for the following conjecture about SGD: SGD concentrates in probability -- like the classical Langevin equation -- on large volume ...
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the DGX-1 used for this research. ...
arXiv:1801.02254v1
fatcat:osy2wb6cojh3ze5yqybg3nf5zy
Page 149 of Neural Computation Vol. 7, Issue 1
[page]
1995
Neural Computation
While it is possible that the error functions studied contain local minima, a mathematical study of these local minima is beyond the scope of this paper. ...
Empirical Risk Minimization 149
Finally, in the Hebb learning algorithm (recently studied in a very similar context by Barkai et al. 1993) one has w = (1/m) Σ_μ y^μ x^μ ...
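The Hebb rule quoted in this snippet is a single averaging pass over the training patterns, with no iterative risk minimization; a minimal NumPy sketch (the teacher setup below is an illustrative assumption, not from the page):

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 200, 21  # illustrative sample size and (odd) input dimension

# Random +/-1 patterns x^mu, labelled by a random +/-1 teacher vector.
x = rng.choice([-1.0, 1.0], size=(m, d))
teacher = rng.choice([-1.0, 1.0], size=d)
y = np.sign(x @ teacher)  # never zero, since d is odd

# Hebb weights: w = (1/m) * sum_mu y^mu x^mu -- one averaging pass.
w = (y[:, None] * x).mean(axis=0)

train_accuracy = np.mean(np.sign(x @ w) == y)
```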
The rare event risk in African emerging stock markets
2011
Managerial Finance
Findings: The empirical results indicate that the GL distribution best fitted the empirical data over the period of study. ...
Purpose: To investigate the asymptotic distribution of the extreme daily stock returns in African stock markets over the period 1996 to 2007 and examine the implications for downside risk measurement. ...
The determination of these capital requirements is based on inputs provided by models (i.e. Value-at-Risk) which are based on distributional assumptions. ...
doi:10.1108/03074351111113324
fatcat:nxscip6wcvhsnoszfcmuk36xva
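The Value-at-Risk this abstract refers to is, in its empirical form, a tail quantile of the loss distribution; a minimal sketch with simulated heavy-tailed returns (Student-t data of my own, not the paper's African-market series):

```python
import numpy as np

rng = np.random.default_rng(7)
# Simulated daily log returns with heavier-than-Gaussian tails.
returns = 0.01 * rng.standard_t(df=4, size=5000)

def empirical_var(returns, alpha=0.99):
    # Value-at-Risk as the empirical alpha-quantile of the loss (-return)
    # distribution: losses exceed this level on roughly (1 - alpha) of days.
    return np.quantile(-returns, alpha)

var99 = empirical_var(returns)
exceed_rate = np.mean(-returns > var99)  # close to 0.01 by construction
```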
Are deep ResNets provably better than linear predictors?
[article]
2019
arXiv
pre-print
First, we show that there exist datasets for which all local minima of a fully-connected ReLU network are no better than the best linear predictor, whereas a ResNet has strictly better local minima. ...
Recent results in the literature indicate that a residual network (ResNet) composed of a single residual block outperforms linear predictors, in the sense that all local minima in its optimization landscape ...
Acknowledgments All the authors acknowledge support from DARPA Lagrange. Chulhee Yun also thanks Korea Foundation for Advanced Studies for their support. ...
arXiv:1907.03922v2
fatcat:4yf3rjp5oveujme23y5iotyy2q
Fast Rates for Empirical Risk Minimization of Strict Saddle Problems
[article]
2017
arXiv
pre-print
We derive bounds on the sample complexity of empirical risk minimization (ERM) in the context of minimizing non-convex risks that admit the strict saddle property. ...
Our results and techniques may pave the way for statistical analyses of additional strict saddle problems. ...
More generally, these works consider the task of approximating the k leading eigenvectors. It is not hard to extend our results to this task as well. ...
arXiv:1701.04271v4
fatcat:4ebqm7probetfoi5rb4uemmzbi
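The strict saddle property named in this title can be checked at a single critical point via the Hessian spectrum; a minimal example with f(x, y) = x² − y² at the origin:

```python
import numpy as np

# f(x, y) = x^2 - y^2 has a critical point at the origin whose Hessian
# carries a strictly negative eigenvalue: a strict saddle, not a minimum.
hessian = np.array([[2.0, 0.0],
                    [0.0, -2.0]])
eigvals = np.linalg.eigvalsh(hessian)  # returned in ascending order
is_strict_saddle = eigvals.min() < 0.0
```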
Optimization Landscapes of Wide Deep Neural Networks Are Benign
[article]
2021
arXiv
pre-print
We highlight the importance of constraints for such networks and show that constrained -- as well as unconstrained -- empirical-risk minimization over such networks has no confined points, that is, suboptimal ...
We analyze the optimization landscapes of deep learning with wide networks. ...
We prove that the optimization landscapes of empirical-risk minimizers over wide feedforward networks have no spurious local minima. ...
arXiv:2010.00885v2
fatcat:pp2cmwcr5bgfbgazc3mw7x7v7y
Small nonlinearities in activation functions create bad local minima in neural networks
[article]
2019
arXiv
pre-print
We investigate the loss surface of neural networks. We prove that even for one-hidden-layer networks with "slightest" nonlinearity, the empirical risks have spurious local minima in most cases. ...
..., for which there exists a bad local minimum. Our results make the least restrictive assumptions relative to existing results on spurious local optima in neural networks. ...
But success stories of deep learning suggest that local minima of the empirical risk could be close to global minima. Choromanska et al. ...
arXiv:1802.03487v4
fatcat:fuxyiuxmejem7o3tjd7xbejfb4
LOCAL ESTIMATION OF DYNAMIC COPULA MODELS
2010
International Journal of Theoretical and Applied Finance
Results indicate that volatility does affect the strength of dependence. The in-sample Value-at-Risk based on the dynamic model outperforms those based on the empirical estimates. ...
In this paper we exploit this stylized fact combined with local maximum likelihood estimation of copula models to analyze the dynamic joint behavior of series of financial log returns. ...
Acknowledgments The authors wish to thank the Editor and an anonymous referee whose comments and suggestions helped to greatly improve the quality of the paper. ...
doi:10.1142/s0219024910005759
fatcat:3rdqr2on2ffvnpbiee5ppcul24
Showing results 1 — 15 out of 22,449 results