121,188 Hits in 3.9 sec

Gradient Descent Only Converges to Minimizers

Jason D. Lee, Max Simchowitz, Michael I. Jordan, Benjamin Recht
2016 Annual Conference on Computational Learning Theory
We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory.  ...  Acknowledgements The authors would like to thank Chi Jin, Tengyu Ma, Robert Nishihara, Mahdi Soltanolkotabi, Yuekai Sun, Jonathan Taylor, and Yuchen Zhang for their insightful feedback.  ...  However, a short-step gradient method will only converge to minimizers.  ... 
dblp:conf/colt/LeeSJR16 fatcat:3yskjaj27vfevgdqtaoo23iunu
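The result above can be illustrated numerically on a textbook strict saddle (a hedged sketch, not the authors' code): for f(x, y) = x² − y², fixed-step gradient descent started at a random point contracts along the stable x-direction and grows along the unstable y-direction, so it escapes the saddle at the origin almost surely.

```python
import numpy as np

def grad_descent(x0, grad, step=0.1, iters=200):
    """Plain gradient descent with a fixed step size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# f(x, y) = x^2 - y^2 has a strict saddle at the origin:
# the Hessian diag(2, -2) has a strictly negative eigenvalue.
grad = lambda v: np.array([2.0 * v[0], -2.0 * v[1]])

rng = np.random.default_rng(0)
x = grad_descent(rng.normal(size=2), grad)
# The x-coordinate contracts toward 0 (factor 0.8 per step); the
# y-coordinate expands (factor 1.2 per step), so the iterates leave
# the saddle rather than converging to it.
```

The set of initial points that do converge to the saddle is exactly the x-axis, a measure-zero stable manifold, which is the geometric picture behind the paper's almost-sure statement.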

Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions [article]

Ioannis Panageas, Georgios Piliouras
2016 arXiv   pre-print
Given a non-convex twice differentiable cost function f, we prove that the set of initial conditions so that gradient descent converges to saddle points where ∇²f has at least one strictly negative  ...  Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size.  ...  We are also thankful to Nisheeth Vishnoi, on whose blog the article appeared, for pointing it out to us.  ... 
arXiv:1605.00405v2 fatcat:gcq5zhwd4vc6vmocwn5ntb7lku

Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions

Ioannis Panageas, Georgios Piliouras, Marc Herbstritt
2017 Innovations in Theoretical Computer Science  
Given a twice continuously differentiable cost function f, we prove that the set of initial conditions so that gradient descent converges to saddle points where ∇²f has at least one strictly negative  ...  Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size.  ...  We are also thankful to Nisheeth Vishnoi, on whose blog the article appeared, for pointing it out to us.  ... 
doi:10.4230/lipics.itcs.2017.2 dblp:conf/innovations/PanageasP17 fatcat:7xw7ljqcpjdjfbzn7llzrfgyru

Minimizing Average of Loss Functions Using Gradient Descent and Stochastic Gradient Descent

Md Rajib Arefin, M Asadujjaman
2016 Dhaka University Journal of Science  
This paper deals with minimizing the average of loss functions using Gradient Descent (GD) and Stochastic Gradient Descent (SGD).  ...  We present these two algorithms for minimizing the average of a large number of smooth convex functions.  ...  Gradient descent (also known as steepest descent) is an optimization technique for minimizing an unconstrained multidimensional smooth convex function, which starts with some initial parameters  ... 
doi:10.3329/dujs.v64i2.54490 fatcat:ary2gaindbgsbd2i2gywndjce4
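The setting can be sketched on a made-up one-dimensional instance (not taken from the paper): the average of squared losses (1/n) Σᵢ (w − aᵢ)² has the sample mean as its exact minimizer, and both full-gradient descent and SGD with a decaying step size recover it.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(loc=3.0, size=100)   # targets; the minimizer of the average loss is a.mean()

def full_grad(w):
    # gradient of (1/n) * sum_i (w - a_i)^2
    return 2.0 * (w - a).mean()

# Gradient descent: one full-gradient step per iteration.
w_gd = 0.0
for _ in range(100):
    w_gd -= 0.25 * full_grad(w_gd)

# Stochastic gradient descent: one sampled loss per step, 1/t step decay.
w_sgd = 0.0
for t in range(1, 2001):
    i = rng.integers(len(a))
    w_sgd -= (1.0 / t) * 2.0 * (w_sgd - a[i])
# Both approach the exact minimizer a.mean(); SGD retains O(1/sqrt(t)) noise.
```

GD pays a full pass over the n losses per step, while SGD touches one loss per step, which is the trade-off the paper examines.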

Frank-Wolfe optimization for deep networks [article]

Jakob Stigenberg
2020 arXiv   pre-print
Although the optimization does converge, it does so slowly and not close to the speed of gradient descent.  ...  In this paper, another optimization method, Frank-Wolfe optimization, is applied to a small deep network and compared to gradient descent.  ...  They both converge to approximately 95% test accuracy, as do line search and gradient descent.  ... 
arXiv:2006.03960v1 fatcat:5sebass7jzgxjjik2mbn4zjcem
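For reference, a generic Frank-Wolfe sketch on a hypothetical constrained problem (a quadratic over the probability simplex); the paper's deep-network experiments are not reproduced here. Each iterate is a convex combination of vertices returned by the linear-minimization oracle, so no projection step is needed.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, iters=500):
    """Frank-Wolfe: move toward the vertex chosen by the linear oracle."""
    x = np.asarray(x0, dtype=float)
    for t in range(iters):
        s = lmo(grad(x))           # vertex minimizing the linearized objective
        gamma = 2.0 / (t + 2.0)    # standard step-size schedule
        x = (1.0 - gamma) * x + gamma * s
    return x

# Minimize ||x - b||^2 over the probability simplex; b lies in the simplex.
b = np.array([0.1, 0.5, 0.4])
grad = lambda x: 2.0 * (x - b)

def simplex_lmo(g):
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0          # best vertex: put all mass on min coordinate
    return s

x = frank_wolfe(grad, simplex_lmo, np.array([1.0, 0.0, 0.0]))
```

The O(1/t) rate of this schedule is consistent with the snippet's observation that convergence, while guaranteed, is slower than gradient descent.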

Gradient Descent Converges to Minimizers [article]

Jason D. Lee, Max Simchowitz, Michael I. Jordan, Benjamin Recht
2016 arXiv   pre-print
We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory.  ...  Acknowledgements The authors would like to thank Chi Jin, Tengyu Ma, Robert Nishihara, Mahdi Soltanolkotabi, Yuekai Sun, Jonathan Taylor, and Yuchen Zhang for their insightful feedback.  ...  However, a short-step gradient method will only converge to minimizers. Remark 4.2.  ... 
arXiv:1602.04915v2 fatcat:5tndnl2rffanzlylalwhv6v2uy

Minimizing Quantum Rényi Divergences via Mirror Descent with Polyak Step Size [article]

Jun-Kai You and Hao-Chung Cheng and Yen-Huan Li
2022 arXiv   pre-print
Numerical experiment results show that entropic mirror descent with the Polyak step size converges fast in minimizing quantum Rényi divergences.  ...  Computing these quantities requires minimizing some order-α quantum Rényi divergences over the set of quantum states.  ...  Previously, the Polyak step size was only considered for gradient descent-type methods. • In practice, numerical results show entropic mirror descent with the Polyak step size converges fast for minimizing  ... 
arXiv:2109.06054v2 fatcat:pjdtpjh665ghrmhj6pflgdvdle
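A minimal sketch of entropic mirror descent with the Polyak step size, run on a toy quadratic over the probability simplex rather than an actual quantum Rényi divergence (those require density matrices and are beyond a snippet). The Polyak step uses the known optimal value f*; the multiplicative update keeps the iterate strictly positive on the simplex.

```python
import numpy as np

def entropic_md_polyak(f, grad, f_star, x0, iters=300):
    """Entropic mirror descent with the Polyak step size
    eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        gap = f(x) - f_star
        if gap <= 0:
            break
        eta = gap / (g @ g)        # Polyak step size
        x = x * np.exp(-eta * g)   # multiplicative (entropic) update
        x /= x.sum()               # re-normalize onto the simplex
    return x

# Toy stand-in objective: f(x) = ||x - b||^2 with known minimum f* = 0.
b = np.array([0.2, 0.3, 0.5])
f = lambda x: float(np.sum((x - b) ** 2))
grad = lambda x: 2.0 * (x - b)
x = entropic_md_polyak(f, grad, 0.0, np.ones(3) / 3)
```

When f* is unknown, practical variants estimate it on the fly; the paper's point is that the same step-size rule transfers from gradient descent to the mirror-descent geometry.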

Comparison Between Steepest Descent Method and Conjugate Gradient Method by Using Matlab

Dana Taha Mohammed Salih, Bawar Mohammed Faraj
2021 Journal of Studies in Science and Engineering  
The steepest descent method and the conjugate gradient method for minimizing nonlinear functions are studied in this work.  ...  On the other hand, the Steepest descent method converges a function in less time than the Conjugate gradient method.  ...  We obtain that the Steepest descent method requires less time than the Conjugate gradient method to minimize the function.  ... 
doi:10.53898/josse2021113 fatcat:tko2ztzb4zbd5iwj5pabxnkgsy

Step Size Matters in Deep Learning [article]

Kamil Nar, S. Shankar Sastry
2018 arXiv   pre-print
Training a neural network with the gradient descent algorithm gives rise to a discrete-time nonlinear dynamical system.  ...  if the algorithm converges to an orbit.  ...  Therefore, given a fixed step size δ, the gradient descent can converge to only a subset of the local optima, and there are always some solutions that the gradient descent cannot converge to independent  ... 
arXiv:1805.08890v2 fatcat:xypywxiypzcz3g3p4hsqhstbji
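The mechanism behind this observation can be seen on a one-dimensional quadratic (an illustration, not the paper's networks): with curvature λ, the fixed-step update x ← (1 − δλ)x is a linear dynamical system that is stable only when δ < 2/λ, so minima that are too sharp for a given step size δ are unreachable.

```python
import numpy as np

def gd_1d(grad, x0, step, iters=100):
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# f(x) = x^2 has second derivative 2 everywhere; the fixed-step update is
# x <- (1 - 2*step) * x, stable only when |1 - 2*step| < 1, i.e. step < 1.
grad = lambda x: 2.0 * x

x_small = gd_1d(grad, 1.0, step=0.9)    # |1 - 1.8| = 0.8 < 1: converges to 0
x_large = gd_1d(grad, 1.0, step=1.1)    # |1 - 2.2| = 1.2 > 1: diverges
```

This is exactly the snippet's point: for fixed δ, only minima whose curvature is below 2/δ are attainable, which implicitly biases training toward flatter solutions.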

A Nonconvex Optimization Framework for Low Rank Matrix Estimation

Tuo Zhao, Zhaoran Wang, Han Liu
2015 Advances in Neural Information Processing Systems  
In particular, we prove that a broad class of nonconvex optimization algorithms, including alternating minimization and gradient-type methods, geometrically converge to the global optimum and exactly recover  ...  Both the alternating exact minimization and alternating gradient descent algorithms attain linear rate of convergence for d = 600 and d = 900.  ...  gradient descent, alternating exact minimization (i.e., alternating least squares or coordinate descent), as well as alternating gradient descent (i.e., coordinate gradient descent), which are shown in  ... 
pmid:28316458 pmcid:PMC5354472 fatcat:smlyigamd5dvrpb3ey2bu4lanq
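One algorithm in the class analyzed above, alternating exact minimization (alternating least squares), can be sketched on a noiseless rank-2 factorization problem; the instance below is made up for illustration. Each half-step solves a linear least-squares problem in closed form.

```python
import numpy as np

rng = np.random.default_rng(2)
# Rank-2 ground-truth matrix M = U_true V_true^T.
U_true = rng.normal(size=(20, 2))
V_true = rng.normal(size=(15, 2))
M = U_true @ V_true.T

# Alternating exact minimization: fix V and solve for U exactly,
# then fix U and solve for V exactly.
U = rng.normal(size=(20, 2))
V = rng.normal(size=(15, 2))
for _ in range(50):
    U = np.linalg.lstsq(V, M.T, rcond=None)[0].T   # argmin_U ||M - U V^T||_F
    V = np.linalg.lstsq(U, M, rcond=None)[0].T     # argmin_V ||M - U V^T||_F

err = np.linalg.norm(M - U @ V.T) / np.linalg.norm(M)
```

In the fully observed noiseless case the first U-update already lands in the true column space, so convergence is essentially immediate; the paper's contribution is the geometric (linear) rate for the broader noisy and partially observed settings.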

Distributed Low-rank Matrix Factorization With Exact Consensus

Zhihui Zhu, Qiuwei Li, Xinshuo Yang, Gongguo Tang, Michael B. Wakin
2019 Neural Information Processing Systems  
In spite of its nonconvexity, this problem has a well-behaved geometric landscape, permitting local search algorithms such as gradient descent to converge to global minimizers.  ...  We identify conditions under which this new problem also has a well-behaved geometric landscape, and we propose an extension of distributed gradient descent (DGD) to solve this problem.  ...  Thus, the question arises of whether local search algorithms such as gradient descent actually converge to a global minimizer of (13) .  ... 
dblp:conf/nips/ZhuLYTW19 fatcat:h5xeheanjvbf3pn7cf6d42mehe

Comparison of the conjugate gradient methods of Liu-Storey and Dai-Yuan

Fernando Mesa, Pedro Pablo Cardenas Alzate, Carlos Alberto Rodriguez Varela
2017 Contemporary Engineering Sciences  
The purpose of this paper is to present the capabilities of the conjugate gradient methods based on the theoretical analysis of the gradient method, the precursor of the descent methods.  ...  Different test systems are proposed to be solved, in order to determine the speed of convergence of the conjugate direction proposed by Liu-Storey and Dai-Yuan [1].  ...  We would like to thank the referee for his valuable suggestions that improved the presentation of this paper and our gratitude to the Department of Mathematics of the Universidad Tecnológica de Pereira  ... 
doi:10.12988/ces.2017.711189 fatcat:77klin6duzbsjlpyyp6o7gc54a

Non-approximability of constructive global ℒ^2 minimizers by gradient descent in Deep Learning [article]

Thomas Chen, Patricia Muñoz Ewald
2023 arXiv   pre-print
approximated via the gradient descent flow.  ...  We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks.  ...  Therefore, while C[x[Z(s)]] always converges to a stationary value of the cost function under the gradient descent flow, Z(s) cannot generally be assumed to converge to a minimizer Z * .  ... 
arXiv:2311.07065v1 fatcat:z6ib4exbsbhtxe6bgbel5temla

Accelerating Extreme Search of Multidimensional Functions Based on Natural Gradient Descent with Dirichlet Distributions

Ruslan Abdulkadirov, Pavel Lyakhov, Nikolay Nagornov
2022 Mathematics  
We provide experiments on test functions in four- and three-dimensional spaces, where natural gradient descent proves its ability to converge in the neighborhood of the global minimum.  ...  The proposed algorithm is equipped with step-size adaptation, which allows it to obtain higher accuracy, taking a small number of iterations in the process of minimization, compared with the usual gradient  ...  Acknowledgments: The authors would like to thank the North Caucasus Federal University for their support in the project competitions of scientific groups and individual scientists of the North Caucasus  ... 
doi:10.3390/math10193556 fatcat:s43iogi2jfbqzlwl6yy3evkryq

Dictionary Learning with Large Step Gradient Descent for Sparse Representations [chapter]

Boris Mailhé, Mark D. Plumbley
2012 Lecture Notes in Computer Science  
Olshausen and Field's Sparsenet algorithm relies on a fixed step projected gradient descent. With the right step, it can avoid local minima and converge towards the global minimum.  ...  The problem then becomes to find the right step size. In this work we provide the expression of the optimal step for the gradient descent but the step we use is twice as large as the optimal step.  ...  MOD finds that minimum in only one iteration, but if each Sparsenet dictionary update was allowed to iterate on its gradient descent with a well chosen step, it would converge towards the result of the  ... 
doi:10.1007/978-3-642-28551-6_29 fatcat:spenb3d5pvastb6jmupza2j3wi
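The gradient-descent dictionary update can be sketched with its exact line-search step, the closed-form optimal step for the quadratic in D (a synthetic instance; the sparse-coding stage and Sparsenet's unit-norm projection are omitted here for brevity).

```python
import numpy as np

rng = np.random.default_rng(3)
D_true = rng.normal(size=(8, 12))
D_true /= np.linalg.norm(D_true, axis=0)   # unit-norm ground-truth atoms
A = rng.normal(size=(12, 100))             # fixed codes (stand-in for sparse coding)
X = D_true @ A                             # training signals

D = rng.normal(size=(8, 12))
D /= np.linalg.norm(D, axis=0)
err0 = np.linalg.norm(X - D @ A)
for _ in range(50):
    G = (X - D @ A) @ A.T                  # negative gradient of ||X - D A||_F^2 / 2
    GA = G @ A
    step = (G * G).sum() / (GA * GA).sum() # exact line search along G
    D = D + step * G
err1 = np.linalg.norm(X - D @ A)
```

With the codes fixed, the objective is a convex quadratic in D, so the optimal step along the gradient has this closed form; the paper's proposal of deliberately doubling it to hop out of local minima of the full (nonconvex) dictionary-learning problem is not reproduced here.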
Showing results 1 — 15 out of 121,188 results