1,007 Hits in 3.5 sec

Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [article]

Chris Junchi Li, Michael I. Jordan
2022 arXiv   pre-print
SSGD generalizes the idea of projected stochastic gradient descent and allows the use of scaled stochastic gradients instead of stochastic gradients.  ...  Motivated by the problem of online canonical correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSGD) algorithm for minimizing the expectation of a stochastic function over a generic  ...  Acknowledgements We thank the Department of Electrical Engineering and Computer Sciences at UC Berkeley for COVID-19 accommodations during which time this work was completed.  ... 
arXiv:2112.14738v2 fatcat:zkqqc4kifrbznb4ysidx5s5rci
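A minimal sketch of the scaled stochastic-gradient idea described in the abstract above: replace the raw stochastic gradient with a preconditioned (scaled) one before stepping and projecting. The function name, the unit-sphere projection, and the generic scale_matrix preconditioner are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def ssgd_step(x, stochastic_grad, scale_matrix, step_size):
    """One scaled stochastic-gradient step followed by a generic projection.

    The positive definite `scale_matrix` and the unit-sphere constraint are
    placeholders; the paper's exact SSGD setup is not reproduced here.
    """
    # Use a scaled stochastic gradient instead of the raw stochastic gradient.
    direction = np.linalg.solve(scale_matrix, stochastic_grad)
    x_new = x - step_size * direction
    # Project back onto the feasible set (here: the unit sphere).
    return x_new / np.linalg.norm(x_new)
```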

On Nonconvex Optimization for Machine Learning

Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
2021 Journal of the ACM  
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning.  ...  While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a  ...  Nonconvex optimization problems are intractable in general.  ... 
doi:10.1145/3418526 fatcat:tgzxmy5tmbaw7phpmfzwugt5ne
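For reference alongside this abstract's GD/SGD terminology, a minimal mini-batch SGD loop; grad_fn, the batching scheme, and all hyperparameter values are generic placeholders, not anything specific to the paper.

```python
import numpy as np

def sgd(grad_fn, x0, data, step_size=0.01, iters=1000, batch_size=32, seed=0):
    """Plain mini-batch SGD: x <- x - eta * g, where g = grad_fn(x, batch)
    is an unbiased stochastic gradient estimated on a random mini-batch."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    data = np.asarray(data)
    for _ in range(iters):
        batch = data[rng.choice(len(data), size=batch_size, replace=False)]
        x = x - step_size * grad_fn(x, batch)
    return x
```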

Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima [article]

Yaodong Yu and Pan Xu and Quanquan Gu
2017 arXiv   pre-print
general stochastic optimization setting, where Õ(·) hides logarithmic polynomial terms and constants.  ...  We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate  ...  General Stochastic Setting Now we consider the general nonconvex stochastic problem in (1.1).  ... 
arXiv:1712.06585v1 fatcat:vllxzjdwmbgebc5mmk6ub53mwi

The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems [article]

Vivak Patel
2018 arXiv   pre-print
We derive this mechanism based on a detailed analysis of a generic stochastic quadratic problem, which generalizes known results for classical gradient descent.  ...  In several experimental reports on nonconvex optimization problems in machine learning, stochastic gradient descent (SGD) was observed to prefer minimizers with flat basins in comparison to more deterministic  ...  We would also like to thank Mihai Anitescu for his general guidance throughout the preparation of this work. Funding The author is supported by the NSF Research and Training Grant # 1547396.  ... 
arXiv:1709.04718v2 fatcat:qksvlnw2xzcuflt74rztqprjwi
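A toy simulation in the spirit of the generic stochastic quadratic model mentioned above: SGD on f(x) = (1/2) xᵀHx with additive gradient noise whose variance shrinks with the batch size. The specific noise model and all parameters are assumptions for illustration only.

```python
import numpy as np

def sgd_on_quadratic(hessian, x0, step_size, batch_size, iters, noise_scale=1.0, seed=0):
    """Iterate x <- x - eta * (H x + xi) with xi ~ N(0, (noise_scale^2 / batch_size) I).

    Larger batches mean less gradient noise, so iterates settle more tightly
    into the basin; a toy stand-in, not the paper's analysis.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        noise = rng.normal(scale=noise_scale / np.sqrt(batch_size), size=x.shape)
        x = x - step_size * (hessian @ x + noise)
    return x
```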

On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points [article]

Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
2019 arXiv   pre-print
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning.  ...  While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a  ...  Acknowledgements We thank Tongyang Li and Quanquan Gu for valuable discussions.  ... 
arXiv:1902.04811v2 fatcat:rmdh2zan2vhdxbbzi6if2krnwe
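A simplified sketch of the perturbed-gradient-descent idea associated with this line of work on escaping saddle points: run plain GD, and when the gradient is small, inject a small random perturbation. The thresholds, radius, and schedule below are placeholder values, not the paper's tuned quantities.

```python
import numpy as np

def perturbed_gd(grad_fn, x0, step_size=0.01, grad_tol=1e-3, radius=1e-2,
                 wait=50, iters=10000, seed=0):
    """Gradient descent with occasional random perturbations near
    small-gradient points, to help escape saddle points (simplified)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    last_perturb = -wait
    for k in range(iters):
        g = grad_fn(x)
        if np.linalg.norm(g) <= grad_tol and k - last_perturb >= wait:
            # Small random jump off a potential saddle point.
            x = x + rng.uniform(-radius, radius, size=x.shape)
            last_perturb = k
        else:
            x = x - step_size * g
    return x
```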

PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization

Songtao Lu, Mingyi Hong, Zhengdao Wang
2019 International Conference on Machine Learning  
In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a perturbed A-GD (PA-GD) which is able to converge (with high probability) to the second-order stationary points  ...  Alternating gradient descent (A-GD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps.  ...  There has been a line of work on stochastic gradient descent algorithms, where properly scaled Gaussian noise is added to the iterates of the gradient at each time (also known as stochastic gradient Langevin  ... 
dblp:conf/icml/LuHW19 fatcat:qdlx3v6hx5bwfnvgecdhkxh25e
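A minimal sketch of alternating gradient descent over two blocks with an occasional perturbation, in the spirit of the PA-GD description above; the perturbation trigger and all constants are simplifications, not the paper's exact procedure.

```python
import numpy as np

def pa_gd(grad_x, grad_y, x0, y0, step_size=0.01, iters=1000,
          grad_tol=1e-3, radius=1e-2, seed=0):
    """Alternately update x (with y fixed) and y (with the new x fixed);
    when both block gradients are small, add a small random perturbation."""
    rng = np.random.default_rng(seed)
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    for _ in range(iters):
        gx = grad_x(x, y)
        x = x - step_size * gx
        gy = grad_y(x, y)
        y = y - step_size * gy
        if np.linalg.norm(gx) <= grad_tol and np.linalg.norm(gy) <= grad_tol:
            x = x + rng.uniform(-radius, radius, size=x.shape)
            y = y + rng.uniform(-radius, radius, size=y.shape)
    return x, y
```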

Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization [article]

Zeyuan Allen-Zhu
2018 arXiv   pre-print
This line corresponds to momentum, and shows how to directly apply momentum to the finite-sum stochastic minimization of sum-of-nonconvex functions.  ...  The problem of minimizing sum-of-nonconvex functions (i.e., convex functions that are averages of non-convex ones) is becoming increasingly important in machine learning, and is the core machinery for PCA  ...  The stochastic gradient descent (SGD) method gives a T ∝ ε^{-2} convergence rate for Problem (1.1), or a T ∝ (σε)^{-1} rate if f(x) is σ-strongly convex.  ... 
arXiv:1802.03866v1 fatcat:mz3czstycras5joro2qfnpbox4
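To illustrate "momentum applied to finite-sum stochastic minimization", here is a heavily simplified combination of an SVRG-style variance-reduced gradient with classical heavy-ball momentum. This is not the Katyusha X update rule; grads, the epoch structure, and all constants are assumptions.

```python
import numpy as np

def momentum_vr_sgd(grads, x0, step_size=0.05, momentum=0.5, epochs=5, seed=0):
    """Toy variance-reduced SGD with heavy-ball momentum for
    f(x) = (1/n) * sum_i f_i(x); grads[i](x) returns grad f_i(x)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    n = len(grads)
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = sum(g(snapshot) for g in grads) / n       # full gradient at snapshot
        for _ in range(n):
            i = rng.integers(n)
            g = grads[i](x) - grads[i](snapshot) + full_grad  # variance-reduced gradient
            v = momentum * v - step_size * g                  # heavy-ball momentum
            x = x + v
    return x
```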

Near-optimal stochastic approximation for online principal component estimation

Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang
2017 Mathematical programming  
In this paper, we cast online PCA into a stochastic nonconvex optimization problem, and we analyze the online PCA algorithm as a stochastic approximation iteration.  ...  The stochastic approximation iteration processes data points incrementally and maintains a running estimate of the principal component.  ...  In contrast, Oja's iteration can be regarded as a projected stochastic gradient descent method.  ... 
doi:10.1007/s10107-017-1182-z fatcat:vdmlqya6ljejlhi2i275dbf2za
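A minimal sketch of Oja's iteration for streaming estimation of the top principal component, which the abstract interprets as a projected stochastic gradient descent method; the constant step size and random initialization are generic choices, not the paper's near-optimal schedule.

```python
import numpy as np

def oja_iteration(stream, dim, step_size=0.01, seed=0):
    """Oja's rule: w <- normalize(w + eta * x * (x . w)) for each sample x,
    maintaining a running estimate of the leading eigenvector of E[x x^T]."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    w = w / np.linalg.norm(w)
    for x in stream:
        w = w + step_size * x * (x @ w)   # stochastic-gradient-like update
        w = w / np.linalg.norm(w)         # projection back onto the unit sphere
    return w
```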

Large-Scale Phase Retrieval via Stochastic Reweighted Amplitude Flow

2020 KSII Transactions on Internet and Information Systems  
Phase retrieval, recovering a signal from phaseless measurements, is generally considered to be an NP-hard problem.  ...  This paper adopts an amplitude-based nonconvex optimization cost function to develop a new stochastic gradient algorithm, named stochastic reweighted phase retrieval (SRPR).  ...  SRPR is based on TAF [11], and the stochastic gradient descent method is used in the initialization and gradient refinement stages.  ... 
doi:10.3837/tiis.2020.11.006 fatcat:coi53kkap5e5dpocyuph2nasai
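An illustrative stochastic gradient step for an amplitude-based phase retrieval loss of the form 0.5 * (|aᵢᵀz| − yᵢ)²; the reweighting and truncation that distinguish SRPR, as well as its initialization stage, are omitted, so this only shows the generic amplitude-flow flavor of update.

```python
import numpy as np

def amplitude_flow_sgd_step(z, a_i, y_i, step_size):
    """One stochastic gradient step on the amplitude loss 0.5 * (|a_i^T z| - y_i)^2
    for real-valued signals (SRPR's reweighting is not reproduced here)."""
    inner = a_i @ z
    residual = np.abs(inner) - y_i
    grad = residual * np.sign(inner) * a_i   # gradient of the amplitude loss at z
    return z - step_size * grad
```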

Efficient Asynchronous Semi-stochastic Block Coordinate Descent Methods for Large-Scale SVD

Fanhua Shang, Zhihui Zhang, Yuanyuan Liu, Hongying Liu, Jing Xu
2021 IEEE Access  
In recent years, many stochastic variance reduction algorithms and randomized coordinate descent algorithms have been developed to efficiently solve the leading eigenvalue problem.  ...  Eigenvector computation such as Singular Value Decomposition (SVD) is one of the most fundamental problems in machine learning, optimization and numerical linear algebra.  ...  Unlike SVD, the problem of computing the top eigenvectors of PCA generally involves large-scale and dense matrices.  ... 
doi:10.1109/access.2021.3094282 fatcat:z6ptyfacpjer5lajuu4sx7doye

Accelerated Stochastic Quasi-Newton Optimization on Riemann Manifolds [article]

Anirban Roychowdhury
2017 arXiv   pre-print
of Karcher means for symmetric positive definite matrices and leading eigenvalues of large scale data matrices.  ...  We discuss a couple of ways to obtain the correction pairs used to calculate the product of the gradient with the inverse Hessian, and empirically demonstrate their use in synthetic experiments on computation  ...  Comparisons of rSVRG with Riemannian gradient descent methods, both batch and stochastic, can be found in [7] .  ... 
arXiv:1704.01700v3 fatcat:6mb7mq2bqvdlbjtigqr625g7x4
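The "correction pairs used to calculate the product of the gradient with the inverse Hessian" refer to a quasi-Newton mechanism; below is the standard Euclidean L-BFGS two-loop recursion for that product, without the vector transport and retraction machinery a Riemannian version needs.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: approximate H^{-1} grad from correction pairs
    s_k = x_{k+1} - x_k and y_k = grad_{k+1} - grad_k (Euclidean version)."""
    q = np.array(grad, dtype=float)
    history = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q = q - alpha * y
        history.append((alpha, rho, s, y))
    if s_list:  # scale by gamma = (s^T y) / (y^T y) as the initial inverse Hessian
        s, y = s_list[-1], y_list[-1]
        q = q * (s @ y) / (y @ y)
    for alpha, rho, s, y in reversed(history):
        beta = rho * (y @ q)
        q = q + (alpha - beta) * s
    return q  # approximates H^{-1} grad; use -q as the search direction
```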

Exploiting Negative Curvature in Deterministic and Stochastic Optimization [article]

Frank E. Curtis, Daniel P. Robinson
2018 arXiv   pre-print
We also show that gains can be made in a stochastic setting in cases when a standard stochastic-gradient-type method might make slow progress.  ...  In this paper, we present new frameworks for combining descent and negative curvature directions: alternating two-step approaches and dynamic step approaches.  ...  Algorithm 4 (Dynamic Method for Stochastic Optimization): require x_1 ∈ ℝ^n and (L_1, σ_1) ∈ (0, ∞) × (0, ∞); for all k ∈ ℕ_+, generate a stochastic gradient g_k and a stochastic Hessian H_k, then run  ... 
arXiv:1703.00412v3 fatcat:hh36t5xzgrdj5a5ppbi7aocsuu
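A simplified sketch of one "dynamic" iteration choosing between a stochastic-gradient step and a negative-curvature step computed from a stochastic Hessian estimate; the eigenvalue threshold and step sizes are illustrative, and the paper's dynamic step-size rules are not reproduced.

```python
import numpy as np

def descent_or_curvature_step(x, g, H, grad_step=0.1, curv_step=0.1, curv_tol=1e-6):
    """Take a negative-curvature step if the stochastic Hessian estimate H has a
    sufficiently negative eigenvalue; otherwise take a stochastic-gradient step."""
    eigvals, eigvecs = np.linalg.eigh(H)
    lam_min, v = eigvals[0], eigvecs[:, 0]
    if lam_min < -curv_tol:
        # Sign the curvature direction so it does not ascend along the gradient.
        d = -np.sign(g @ v) * v if (g @ v) != 0 else v
        return x + curv_step * abs(lam_min) * d
    return x - grad_step * g
```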

A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization

Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao
2021 Stochastic Systems  
The momentum stochastic gradient descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning (e.g., training deep neural networks, variational Bayesian inference  ...  To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima.  ...  Shuangchi He and Prof. Le Chen for helpful suggestions.  ... 
doi:10.1287/stsy.2021.0083 fatcat:jk4p2hk6rzal7ninpvzcpgwyg4
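For reference, the classical momentum (heavy-ball) SGD update that the diffusion-approximation analysis above concerns; hyperparameter values are placeholders.

```python
import numpy as np

def msgd(grad_fn, x0, step_size=0.01, momentum=0.9, iters=1000):
    """Momentum SGD: v <- mu * v - eta * g(x), then x <- x + v,
    where grad_fn(x) returns a stochastic gradient at x."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(iters):
        v = momentum * v - step_size * grad_fn(x)
        x = x + v
    return x
```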

Differentially private Riemannian optimization [article]

Andi Han, Bamdev Mishra, Pratik Jawanpuria, Junbin Gao
2022 arXiv   pre-print
We further show privacy guarantees of the proposed differentially private Riemannian (stochastic) gradient descent using an extension of the moments accountant technique.  ...  Additionally, we prove utility guarantees under geodesic (strongly) convex, general nonconvex objectives as well as under the Riemannian Polyak-Łojasiewicz condition.  ...  To show privacy guarantees of R_t, we consider gradient descent and stochastic gradient descent separately.  ... 
arXiv:2205.09494v1 fatcat:izioq6ycw5dctlyl77xypa2sd4
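A minimal Euclidean sketch of the clip-and-add-Gaussian-noise step underlying differentially private (stochastic) gradient descent; the Riemannian exponential map/retraction used in this setting is replaced by a plain Euclidean update, and the noise multiplier is a placeholder rather than a value calibrated via the moments accountant.

```python
import numpy as np

def dp_sgd_step(x, per_sample_grads, step_size, clip_norm, noise_multiplier, rng):
    """One differentially private SGD step (Euclidean sketch): clip each
    per-sample gradient to `clip_norm`, average, add Gaussian noise, then step."""
    clipped = []
    for g in per_sample_grads:
        scale = min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        clipped.append(g * scale)
    mean_grad = np.mean(clipped, axis=0)
    noise_std = noise_multiplier * clip_norm / len(per_sample_grads)
    noise = rng.normal(scale=noise_std, size=np.shape(x))
    return x - step_size * (mean_grad + noise)
```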

Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization [article]

Thomas O'Leary-Roseberry, Nick Alger, Omar Ghattas
2021 arXiv   pre-print
However, generally the analysis shows that second order methods with stochastic Hessian and gradient information may need to take small steps, unlike in deterministic problems.  ...  Additionally, due to perceived costs of forming and factorizing Hessians, second order methods are not used for these problems.  ...  Conversations with Anna Yesypenko, Alex Ames, Brendan Keith and Rachel Ward were helpful during the preparation of this manuscript.  ... 
arXiv:2002.02881v3 fatcat:u3us3hyhkjedxbr22je3nvq6d4
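An illustrative saddle-free-Newton-style step built from a rank-r eigendecomposition of the Hessian: eigenvalues are replaced by their absolute values so saddle directions become descent directions, with a plain gradient step on the unresolved complement. The dense eigendecomposition and the complement handling here are simplifications; a scalable method would form the low-rank factors matrix-free.

```python
import numpy as np

def low_rank_saddle_free_step(grad, hessian, rank, damping=1e-3):
    """Step using |H_r|^{-1} on the top-`rank` eigen-subspace of H (by |eigenvalue|),
    with a plain gradient step on the remaining directions (simplified sketch)."""
    eigvals, eigvecs = np.linalg.eigh(hessian)
    top = np.argsort(np.abs(eigvals))[-rank:]       # directions of dominant |curvature|
    lam, V = np.abs(eigvals[top]), eigvecs[:, top]
    coeffs = V.T @ grad
    in_subspace = V @ (coeffs / (lam + damping))    # saddle-free Newton in the subspace
    off_subspace = grad - V @ coeffs                # gradient direction on the complement
    return -(in_subspace + off_subspace)
```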
Showing results 1 — 15 out of 1,007 results