A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf.
Gradient Descent Only Converges to Minimizers
2016
Annual Conference on Computational Learning Theory (COLT)
We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory. ...
Acknowledgements The authors would like to thank Chi Jin, Tengyu Ma, Robert Nishihara, Mahdi Soltanolkotabi, Yuekai Sun, Jonathan Taylor, and Yuchen Zhang for their insightful feedback. ...
However, a short-step gradient method will only converge to minimizers. ...
dblp:conf/colt/LeeSJR16
fatcat:3yskjaj27vfevgdqtaoo23iunu
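A minimal sketch of the claim in this abstract: on a toy non-convex function with a saddle at the origin and minimizers at (±1, 0), short-step gradient descent from a random initialization lands at a minimizer, not the saddle. The function and all names here are illustrative choices, not from the paper.

```python
import numpy as np

def grad(p):
    # f(x, y) = x**4/4 - x**2/2 + y**2/2 has a saddle at (0, 0)
    # and minimizers at (+1, 0) and (-1, 0).
    x, y = p
    return np.array([x**3 - x, y])

rng = np.random.default_rng(0)
p = rng.normal(size=2)          # random initialization
for _ in range(5000):
    p = p - 0.1 * grad(p)       # short-step gradient descent

# p ends near one of the minimizers (+/-1, 0), not the saddle
```

The set of initializations attracted to the saddle is the stable manifold of (0, 0), which has measure zero, so a random start almost surely avoids it.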
Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
[article]
2016
arXiv
pre-print
Given a non-convex twice differentiable cost function f, we prove that the set of initial conditions so that gradient descent converges to saddle points where ∇²f has at least one strictly negative ...
Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size. ...
We are also thankful to Nisheeth Vishnoi, on whose blog the article appeared, for pointing it out to us. ...
arXiv:1605.00405v2
fatcat:gcq5zhwd4vc6vmocwn5ntb7lku
Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
2017
Innovations in Theoretical Computer Science
Given a twice continuously differentiable cost function f, we prove that the set of initial conditions so that gradient descent converges to saddle points where ∇²f has at least one strictly negative ...
Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size. ...
We are also thankful to Nisheeth Vishnoi, on whose blog the article appeared, for pointing it out to us. ...
doi:10.4230/lipics.itcs.2017.2
dblp:conf/innovations/PanageasP17
fatcat:7xw7ljqcpjdjfbzn7llzrfgyru
Minimizing Average of Loss Functions Using Gradient Descent and Stochastic Gradient Descent
2016
Dhaka University Journal of Science
This paper deals with minimizing average of loss functions using Gradient Descent (GD) and Stochastic Gradient Descent (SGD). ...
We present these two algorithms for minimizing average of a large number of smooth convex functions. ...
Gradient descent (also known as steepest descent) is an optimization technique for minimizing an unconstrained multidimensional smooth convex function, which starts with some initial parameters ...
doi:10.3329/dujs.v64i2.54490
fatcat:ary2gaindbgsbd2i2gywndjce4
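A minimal SGD sketch in the spirit of this entry: minimizing the average of simple convex losses (1/n) Σ (w − aᵢ)²/2, whose minimizer is the data mean, by sampling one loss per step with a decaying step size. The data and step schedule are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=100)        # data; the average loss is minimized at a.mean()

w = 0.0
for t in range(1, 20001):
    i = rng.integers(len(a))    # sample one loss uniformly at random
    g = w - a[i]                # gradient of the sampled loss (w - a[i])**2 / 2
    w -= g / t                  # decaying step size 1/t
```

With the 1/t step on this strongly convex problem, the iterate is exactly the running average of the sampled points, so it concentrates around `a.mean()`.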
Frank-Wolfe optimization for deep networks
[article]
2020
arXiv
pre-print
Although the optimization does converge, it does so slowly and not close to the speed of gradient descent. ...
In this paper, another optimization method, Frank-Wolfe optimization, is applied to a small deep network and compared to gradient descent. ...
They both converge to approximately 95% test accuracy, as do line search and gradient descent. ...
arXiv:2006.03960v1
fatcat:5sebass7jzgxjjik2mbn4zjcem
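A minimal sketch of the Frank-Wolfe method this entry compares against gradient descent, on a small constrained problem rather than a deep network: each step calls a linear minimization oracle over the feasible set instead of projecting. The objective and target here are illustrative assumptions.

```python
import numpy as np

def frank_wolfe(c, iters=2000):
    """Minimize ||x - c||^2 over the probability simplex.

    Each step minimizes a linear model of f over the simplex (the
    minimizer is a vertex) and moves toward that vertex, so iterates
    stay feasible without any projection step.
    """
    n = len(c)
    x = np.full(n, 1.0 / n)                 # start at the simplex center
    for k in range(iters):
        g = 2.0 * (x - c)                   # gradient of ||x - c||^2
        s = np.zeros(n)
        s[np.argmin(g)] = 1.0               # linear minimization oracle: best vertex
        x += 2.0 / (k + 2) * (s - x)        # classic step size 2/(k+2)
    return x

c = np.array([0.2, 0.3, 0.5])               # lies in the simplex, so x* = c
x = frank_wolfe(c)
```

The O(1/k) rate of the 2/(k+2) schedule is consistent with the abstract's observation that convergence is slower than gradient descent.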
Gradient Descent Converges to Minimizers
[article]
2016
arXiv
pre-print
We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory. ...
Acknowledgements The authors would like to thank Chi Jin, Tengyu Ma, Robert Nishihara, Mahdi Soltanolkotabi, Yuekai Sun, Jonathan Taylor, and Yuchen Zhang for their insightful feedback. ...
However, a short-step gradient method will only converge to minimizers. Remark 4.2. ...
arXiv:1602.04915v2
fatcat:5tndnl2rffanzlylalwhv6v2uy
Minimizing Quantum Renyi Divergences via Mirror Descent with Polyak Step Size
[article]
2022
arXiv
pre-print
Numerical experiment results show that entropic mirror descent with the Polyak step size converges fast in minimizing quantum Renyi divergences. ...
To compute these quantities requires minimizing some order-α quantum Renyi divergences over the set of quantum states. ...
Previously, the Polyak step size was only considered for gradient descent-type methods. • In practice, numerical results show entropic mirror descent with the Polyak step size converges fast for minimizing ...
arXiv:2109.06054v2
fatcat:pjdtpjh665ghrmhj6pflgdvdle
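A minimal sketch of entropic mirror descent with a Polyak step size, as described in this abstract, applied to a toy linear objective over the probability simplex rather than a quantum Renyi divergence. The cost vector and the Euclidean form of the step size are illustrative assumptions.

```python
import numpy as np

c = np.array([1.0, 0.5, 2.0])           # minimize <c, p> over the probability simplex
f_star = c.min()                        # the Polyak step uses the optimal value

p = np.full(3, 1.0 / 3)
for _ in range(5000):
    g = c                               # gradient of the linear objective
    eta = (c @ p - f_star) / (g @ g)    # Polyak step size: (f(p) - f*) / ||g||^2
    p = p * np.exp(-eta * g)            # entropic mirror descent: multiplicative update
    p /= p.sum()                        # renormalize onto the simplex
```

The exponentiated update keeps iterates strictly inside the simplex, and the Polyak rule removes the need to tune a step-size schedule when f* is known.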
Comparison Between Steepest Descent Method and Conjugate Gradient Method by Using Matlab
2021
Journal of Studies in Science and Engineering
The Steepest descent method and the Conjugate gradient method to minimize nonlinear functions have been studied in this work. ...
On the other hand, the Steepest descent method minimizes a function in less time than the Conjugate gradient method. ...
We obtain that the Steepest descent method requires less time than the Conjugate gradient method to minimize the function. ...
doi:10.53898/josse2021113
fatcat:tko2ztzb4zbd5iwj5pabxnkgsy
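A minimal comparison of the two methods in this entry on a small quadratic f(x) = xᵀAx/2 − bᵀx (equivalently, solving Ax = b): steepest descent with exact line search versus the conjugate gradient method, which is exact in at most n steps on an n-dimensional quadratic. The test matrix is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)             # symmetric positive definite
b = rng.normal(size=5)

def steepest_descent(A, b, iters):
    x = np.zeros_like(b)
    for _ in range(iters):
        r = b - A @ x                   # residual = negative gradient
        alpha = (r @ r) / (r @ A @ r)   # exact line search along r
        x += alpha * r
    return x

def conjugate_gradient(A, b, iters):
    x = np.zeros_like(b)
    r = b - A @ x
    d = r.copy()
    for _ in range(iters):
        alpha = (r @ r) / (d @ A @ d)
        x += alpha * d
        r_new = r - alpha * (A @ d)
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d            # next A-conjugate search direction
        r = r_new
    return x

x_cg = conjugate_gradient(A, b, 5)      # exact in at most n = 5 steps
x_sd = steepest_descent(A, b, 5)
```

After the same number of iterations, the conjugate gradient residual is essentially zero while steepest descent still carries an error that depends on the condition number of A.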
Step Size Matters in Deep Learning
[article]
2018
arXiv
pre-print
Training a neural network with the gradient descent algorithm gives rise to a discrete-time nonlinear dynamical system. ...
if the algorithm converges to an orbit. ...
Therefore, given a fixed step size δ, the gradient descent can converge to only a subset of the local optima, and there are always some solutions that the gradient descent cannot converge to independent ...
arXiv:1805.08890v2
fatcat:xypywxiypzcz3g3p4hsqhstbji
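The dependence of reachable solutions on the step size can be seen already in one dimension: for f(x) = x²/2 (curvature L = 1), the gradient descent map is x ↦ (1 − δ)x, which contracts iff 0 < δ < 2/L. A minimal sketch, with illustrative step sizes:

```python
def run_gd(delta, steps=100, x0=1.0):
    # Gradient descent on f(x) = x**2 / 2, whose gradient is f'(x) = x.
    x = x0
    for _ in range(steps):
        x -= delta * x
    return x

x_small = run_gd(1.5)   # |1 - 1.5| = 0.5 < 1  -> converges to 0
x_large = run_gd(2.5)   # |1 - 2.5| = 1.5 > 1  -> diverges
```

A fixed δ therefore rules out any local optimum whose curvature exceeds 2/δ, which is the mechanism behind the abstract's claim.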
A Nonconvex Optimization Framework for Low Rank Matrix Estimation
2015
Advances in Neural Information Processing Systems
In particular, we prove that a broad class of nonconvex optimization algorithms, including alternating minimization and gradient-type methods, geometrically converge to the global optimum and exactly recover ...
Both the alternating exact minimization and alternating gradient descent algorithms attain a linear rate of convergence for d = 600 and d = 900. ...
gradient descent, alternating exact minimization (i.e., alternating least squares or coordinate descent), as well as alternating gradient descent (i.e., coordinate gradient descent), which are shown in ...
pmid:28316458
pmcid:PMC5354472
fatcat:smlyigamd5dvrpb3ey2bu4lanq
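A minimal sketch of the nonconvex low-rank estimation setting in this entry: gradient descent on the factored objective f(x) = ||xxᵀ − M||²_F for a rank-1 target M, from a random initialization. The target vector, step size, and iteration count are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(3)
z = np.array([1.0, 0.5, 0.25])
M = np.outer(z, z)                       # rank-1 target matrix

x = rng.normal(size=3) * 0.5             # random initialization
for _ in range(3000):
    g = 4.0 * (np.outer(x, x) - M) @ x   # gradient of ||x x' - M||_F^2
    x -= 0.02 * g

loss = np.linalg.norm(np.outer(x, x) - M) ** 2
```

Despite nonconvexity (both ±z are global minimizers and 0 is a saddle), gradient descent recovers the target factorization up to sign, illustrating the geometric convergence the abstract describes.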
Distributed Low-rank Matrix Factorization With Exact Consensus
2019
Neural Information Processing Systems
In spite of its nonconvexity, this problem has a well-behaved geometric landscape, permitting local search algorithms such as gradient descent to converge to global minimizers. ...
We identify conditions under which this new problem also has a well-behaved geometric landscape, and we propose an extension of distributed gradient descent (DGD) to solve this problem. ...
Thus, the question arises of whether local search algorithms such as gradient descent actually converge to a global minimizer of (13) . ...
dblp:conf/nips/ZhuLYTW19
fatcat:h5xeheanjvbf3pn7cf6d42mehe
Comparison of the conjugate gradient methods of Liu-Storey and Dai-Yuan
2017
Contemporary Engineering Sciences
The purpose of this paper is to present the capabilities of the conjugate gradient methods based on the theoretical analysis of the gradient method, the precursor of the descent methods. ...
Different test systems are solved in order to determine the speed of convergence of the conjugate direction methods proposed by Liu-Storey and Dai-Yuan [1]. ...
We would like to thank the referee for his valuable suggestions that improved the presentation of this paper and our gratitude to the Department of Mathematics of the Universidad Tecnológica de Pereira ...
doi:10.12988/ces.2017.711189
fatcat:77klin6duzbsjlpyyp6o7gc54a
Non-approximability of constructive global ℒ² minimizers by gradient descent in Deep Learning
[article]
2023
arXiv
pre-print
approximated via the gradient descent flow. ...
We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks. ...
Therefore, while C[x[Z(s)]] always converges to a stationary value of the cost function under the gradient descent flow, Z(s) cannot generally be assumed to converge to a minimizer Z * . ...
arXiv:2311.07065v1
fatcat:z6ib4exbsbhtxe6bgbel5temla
Accelerating Extreme Search of Multidimensional Functions Based on Natural Gradient Descent with Dirichlet Distributions
2022
Mathematics
We provide experiments on test functions in four- and three-dimensional spaces, where natural gradient descent proves its ability to converge to a neighborhood of the global minimum. ...
The proposed algorithm is equipped with step-size adaptation, which allows it to obtain higher accuracy, taking a small number of iterations in the process of minimization, compared with the usual gradient ...
Acknowledgments: The authors would like to thank the North Caucasus Federal University for their support in the project competitions of scientific groups and individual scientists of the North Caucasus ...
doi:10.3390/math10193556
fatcat:s43iogi2jfbqzlwl6yy3evkryq
Dictionary Learning with Large Step Gradient Descent for Sparse Representations
[chapter]
2012
Lecture Notes in Computer Science
Olshausen and Field's Sparsenet algorithm relies on a fixed step projected gradient descent. With the right step, it can avoid local minima and converge towards the global minimum. ...
The problem then becomes to find the right step size. In this work we provide the expression of the optimal step for the gradient descent but the step we use is twice as large as the optimal step. ...
MOD finds that minimum in only one iteration, but if each Sparsenet dictionary update was allowed to iterate on its gradient descent with a well chosen step, it would converge towards the result of the ...
doi:10.1007/978-3-642-28551-6_29
fatcat:spenb3d5pvastb6jmupza2j3wi
Showing results 1 — 15 out of 121,188 results