1,007 Hits in 3.5 sec

Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [article]

Chris Junchi Li, Michael I. Jordan
2022 arXiv   pre-print
SSGD generalizes the idea of projected stochastic gradient descent and allows the use of scaled stochastic gradients instead of stochastic gradients.  ...  Motivated by the problem of online canonical correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSGD) algorithm for minimizing the expectation of a stochastic function over a generic  ...  Acknowledgements We thank the Department of Electrical Engineering and Computer Sciences at UC Berkeley for COVID-19 accommodations during which time this work was completed.  ... 
arXiv:2112.14738v2 fatcat:zkqqc4kifrbznb4ysidx5s5rci
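A minimal sketch of the scaled stochastic-gradient idea described in the abstract above: replace the raw stochastic gradient with a preconditioned (scaled) one before stepping and projecting. The function name, the unit-sphere projection, and the generic scale_matrix preconditioner are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def ssgd_step(x, stochastic_grad, scale_matrix, step_size):
    """One scaled stochastic-gradient step followed by a generic projection.

    The positive definite `scale_matrix` and the unit-sphere constraint are
    placeholders; the paper's exact SSGD setup is not reproduced here.
    """
    # Use a scaled stochastic gradient instead of the raw stochastic gradient.
    direction = np.linalg.solve(scale_matrix, stochastic_grad)
    x_new = x - step_size * direction
    # Project back onto the feasible set (here: the unit sphere).
    return x_new / np.linalg.norm(x_new)
```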

On Nonconvex Optimization for Machine Learning

Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
2021 Journal of the ACM  
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning.  ...  While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a  ...  Nonconvex optimization problems are intractable in general.  ... 
doi:10.1145/3418526 fatcat:tgzxmy5tmbaw7phpmfzwugt5ne
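For reference alongside this abstract's GD/SGD terminology, a minimal mini-batch SGD loop; grad_fn, the batching scheme, and all hyperparameter values are generic placeholders, not anything specific to the paper.

```python
import numpy as np

def sgd(grad_fn, x0, data, step_size=0.01, iters=1000, batch_size=32, seed=0):
    """Plain mini-batch SGD: x <- x - eta * g, where g = grad_fn(x, batch)
    is an unbiased stochastic gradient estimated on a random mini-batch."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    data = np.asarray(data)
    for _ in range(iters):
        batch = data[rng.choice(len(data), size=batch_size, replace=False)]
        x = x - step_size * grad_fn(x, batch)
    return x
```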

Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima [article]

Yaodong Yu and Pan Xu and Quanquan Gu
2017 arXiv   pre-print
general stochastic optimization setting, where Õ(·) hides logarithmic polynomial terms and constants.  ...  We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate  ...  General Stochastic Setting Now we consider the general nonconvex stochastic problem in (1.1).  ... 
arXiv:1712.06585v1 fatcat:vllxzjdwmbgebc5mmk6ub53mwi

The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems [article]

Vivak Patel
2018 arXiv   pre-print
We derive this mechanism based on a detailed analysis of a generic stochastic quadratic problem, which generalizes known results for classical gradient descent.  ...  In several experimental reports on nonconvex optimization problems in machine learning, stochastic gradient descent (SGD) was observed to prefer minimizers with flat basins in comparison to more deterministic  ...  We would also like to thank Mihai Anitescu for his general guidance throughout the preparation of this work. Funding The author is supported by the NSF Research and Training Grant # 1547396.  ... 
arXiv:1709.04718v2 fatcat:qksvlnw2xzcuflt74rztqprjwi
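A toy simulation in the spirit of the generic stochastic quadratic model mentioned above: SGD on f(x) = (1/2) xᵀHx with additive gradient noise whose variance shrinks with the batch size. The specific noise model and all parameters are assumptions for illustration only.

```python
import numpy as np

def sgd_on_quadratic(hessian, x0, step_size, batch_size, iters, noise_scale=1.0, seed=0):
    """Iterate x <- x - eta * (H x + xi) with xi ~ N(0, (noise_scale^2 / batch_size) I).

    Larger batches mean less gradient noise, so iterates settle more tightly
    into the basin; a toy stand-in, not the paper's analysis.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        noise = rng.normal(scale=noise_scale / np.sqrt(batch_size), size=x.shape)
        x = x - step_size * (hessian @ x + noise)
    return x
```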

On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points [article]

Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
2019 arXiv   pre-print
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning.  ...  While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a  ...  Acknowledgements We thank Tongyang Li and Quanquan Gu for valuable discussions.  ... 
arXiv:1902.04811v2 fatcat:rmdh2zan2vhdxbbzi6if2krnwe
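A simplified sketch of the perturbed-gradient-descent idea associated with this line of work on escaping saddle points: run plain GD, and when the gradient is small, inject a small random perturbation. The thresholds, radius, and schedule below are placeholder values, not the paper's tuned quantities.

```python
import numpy as np

def perturbed_gd(grad_fn, x0, step_size=0.01, grad_tol=1e-3, radius=1e-2,
                 wait=50, iters=10000, seed=0):
    """Gradient descent with occasional random perturbations near
    small-gradient points, to help escape saddle points (simplified)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    last_perturb = -wait
    for k in range(iters):
        g = grad_fn(x)
        if np.linalg.norm(g) <= grad_tol and k - last_perturb >= wait:
            # Small random jump off a potential saddle point.
            x = x + rng.uniform(-radius, radius, size=x.shape)
            last_perturb = k
        else:
            x = x - step_size * g
    return x
```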

PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization

Songtao Lu, Mingyi Hong, Zhengdao Wang
2019 International Conference on Machine Learning  
In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a perturbed A-GD (PA-GD) which is able to converge (with high probability) to the second-order stationary points  ...  Alternating gradient descent (A-GD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps.  ...  There has been a line of work on stochastic gradient descent algorithms, where properly scaled Gaussian noise is added to the iterates of the gradient at each time (also known as stochastic gradient Langevin  ... 
dblp:conf/icml/LuHW19 fatcat:qdlx3v6hx5bwfnvgecdhkxh25e
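A minimal sketch of alternating gradient descent over two blocks with an occasional perturbation, in the spirit of the PA-GD description above; the perturbation trigger and all constants are simplifications, not the paper's exact procedure.

```python
import numpy as np

def pa_gd(grad_x, grad_y, x0, y0, step_size=0.01, iters=1000,
          grad_tol=1e-3, radius=1e-2, seed=0):
    """Alternately update x (with y fixed) and y (with the new x fixed);
    when both block gradients are small, add a small random perturbation."""
    rng = np.random.default_rng(seed)
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    for _ in range(iters):
        gx = grad_x(x, y)
        x = x - step_size * gx
        gy = grad_y(x, y)
        y = y - step_size * gy
        if np.linalg.norm(gx) <= grad_tol and np.linalg.norm(gy) <= grad_tol:
            x = x + rng.uniform(-radius, radius, size=x.shape)
            y = y + rng.uniform(-radius, radius, size=y.shape)
    return x, y
```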

Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization [article]

Zeyuan Allen-Zhu
2018 arXiv   pre-print
This line corresponds to momentum, and shows how to directly apply momentum to the finite-sum stochastic minimization of sum-of-nonconvex functions.  ...  The problem of minimizing sum-of-nonconvex functions (i.e., convex functions that are averages of non-convex ones) is becoming increasingly important in machine learning, and is the core machinery for PCA  ...  The stochastic gradient descent (SGD) method gives a T ∝ ε^{-2} convergence rate for Problem (1.1), or a T ∝ (σε)^{-1} rate if f(x) is σ-strongly convex.  ... 
arXiv:1802.03866v1 fatcat:mz3czstycras5joro2qfnpbox4
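To illustrate "momentum applied to finite-sum stochastic minimization", here is a heavily simplified combination of an SVRG-style variance-reduced gradient with classical heavy-ball momentum. This is not the Katyusha X update rule; grads, the epoch structure, and all constants are assumptions.

```python
import numpy as np

def momentum_vr_sgd(grads, x0, step_size=0.05, momentum=0.5, epochs=5, seed=0):
    """Toy variance-reduced SGD with heavy-ball momentum for
    f(x) = (1/n) * sum_i f_i(x); grads[i](x) returns grad f_i(x)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    n = len(grads)
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = sum(g(snapshot) for g in grads) / n       # full gradient at snapshot
        for _ in range(n):
            i = rng.integers(n)
            g = grads[i](x) - grads[i](snapshot) + full_grad  # variance-reduced gradient
            v = momentum * v - step_size * g                  # heavy-ball momentum
            x = x + v
    return x
```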

Near-optimal stochastic approximation for online principal component estimation

Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang
2017 Mathematical programming  
In this paper, we cast online PCA into a stochastic nonconvex optimization problem, and we analyze the online PCA algorithm as a stochastic approximation iteration.  ...  The stochastic approximation iteration processes data points incrementally and maintains a running estimate of the principal component.  ...  In contrast, Oja's iteration can be regarded as a projected stochastic gradient descent method.  ... 
doi:10.1007/s10107-017-1182-z fatcat:vdmlqya6ljejlhi2i275dbf2za
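A minimal sketch of Oja's iteration for streaming estimation of the top principal component, which the abstract interprets as a projected stochastic gradient descent method; the constant step size and random initialization are generic choices, not the paper's near-optimal schedule.

```python
import numpy as np

def oja_iteration(stream, dim, step_size=0.01, seed=0):
    """Oja's rule: w <- normalize(w + eta * x * (x . w)) for each sample x,
    maintaining a running estimate of the leading eigenvector of E[x x^T]."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    w = w / np.linalg.norm(w)
    for x in stream:
        w = w + step_size * x * (x @ w)   # stochastic-gradient-like update
        w = w / np.linalg.norm(w)         # projection back onto the unit sphere
    return w
```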

Large-Scale Phase Retrieval via Stochastic Reweighted Amplitude Flow

2020 KSII Transactions on Internet and Information Systems  
Phase retrieval, recovering a signal from phaseless measurements, is generally considered to be an NP-hard problem.  ...  This paper adopts an amplitude-based nonconvex optimization cost function to develop a new stochastic gradient algorithm, named stochastic reweighted phase retrieval (SRPR).  ...  SRPR is based on TAF [11], and the stochastic gradient descent method is used in the initialization and gradient refinement stages.  ... 
doi:10.3837/tiis.2020.11.006 fatcat:coi53kkap5e5dpocyuph2nasai
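An illustrative stochastic gradient step for an amplitude-based phase retrieval loss of the form 0.5 * (|aᵢᵀz| − yᵢ)²; the reweighting and truncation that distinguish SRPR, as well as its initialization stage, are omitted, so this only shows the generic amplitude-flow flavor of update.

```python
import numpy as np

def amplitude_flow_sgd_step(z, a_i, y_i, step_size):
    """One stochastic gradient step on the amplitude loss 0.5 * (|a_i^T z| - y_i)^2
    for real-valued signals (SRPR's reweighting is not reproduced here)."""
    inner = a_i @ z
    residual = np.abs(inner) - y_i
    grad = residual * np.sign(inner) * a_i   # gradient of the amplitude loss at z
    return z - step_size * grad
```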

Efficient Asynchronous Semi-stochastic Block Coordinate Descent Methods for Large-Scale SVD

Fanhua Shang, Zhihui Zhang, Yuanyuan Liu, Hongying Liu, Jing Xu
2021 IEEE Access  
In recent years, many stochastic variance reduction algorithms and randomized coordinate descent algorithms have been developed to efficiently solve the leading eigenvalue problem.  ...  Eigenvector computation such as Singular Value Decomposition (SVD) is one of the most fundamental problems in machine learning, optimization and numerical linear algebra.  ...  Unlike SVD, the problem of computing the top eigenvectors of PCA generally involves large-scale and dense matrices.  ... 
doi:10.1109/access.2021.3094282 fatcat:z6ptyfacpjer5lajuu4sx7doye

Accelerated Stochastic Quasi-Newton Optimization on Riemann Manifolds [article]

Anirban Roychowdhury
2017 arXiv   pre-print
of Karcher means for symmetric positive definite matrices and leading eigenvalues of large scale data matrices.  ...  We discuss a couple of ways to obtain the correction pairs used to calculate the product of the gradient with the inverse Hessian, and empirically demonstrate their use in synthetic experiments on computation  ...  Comparisons of rSVRG with Riemannian gradient descent methods, both batch and stochastic, can be found in [7] .  ... 
arXiv:1704.01700v3 fatcat:6mb7mq2bqvdlbjtigqr625g7x4
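The "correction pairs used to calculate the product of the gradient with the inverse Hessian" refer to a quasi-Newton mechanism; below is the standard Euclidean L-BFGS two-loop recursion for that product, without the vector transport and retraction machinery a Riemannian version needs.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: approximate H^{-1} grad from correction pairs
    s_k = x_{k+1} - x_k and y_k = grad_{k+1} - grad_k (Euclidean version)."""
    q = np.array(grad, dtype=float)
    history = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q = q - alpha * y
        history.append((alpha, rho, s, y))
    if s_list:  # scale by gamma = (s^T y) / (y^T y) as the initial inverse Hessian
        s, y = s_list[-1], y_list[-1]
        q = q * (s @ y) / (y @ y)
    for alpha, rho, s, y in reversed(history):
        beta = rho * (y @ q)
        q = q + (alpha - beta) * s
    return q  # approximates H^{-1} grad; use -q as the search direction
```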

Exploiting Negative Curvature in Deterministic and Stochastic Optimization [article]

Frank E. Curtis, Daniel P. Robinson
2018 arXiv   pre-print
We also show that gains can be made in a stochastic setting in cases when a standard stochastic-gradient-type method might make slow progress.  ...  In this paper, we present new frameworks for combining descent and negative curvature directions: alternating two-step approaches and dynamic step approaches.  ...  Algorithm 4 (Dynamic Method for Stochastic Optimization): require x_1 ∈ ℝ^n and (L_1, σ_1) ∈ (0, ∞) × (0, ∞); for all k ∈ ℕ_+, generate a stochastic gradient g_k and a stochastic Hessian H_k, then run  ... 
arXiv:1703.00412v3 fatcat:hh36t5xzgrdj5a5ppbi7aocsuu
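A simplified sketch of one "dynamic" iteration choosing between a stochastic-gradient step and a negative-curvature step computed from a stochastic Hessian estimate; the eigenvalue threshold and step sizes are illustrative, and the paper's dynamic step-size rules are not reproduced.

```python
import numpy as np

def descent_or_curvature_step(x, g, H, grad_step=0.1, curv_step=0.1, curv_tol=1e-6):
    """Take a negative-curvature step if the stochastic Hessian estimate H has a
    sufficiently negative eigenvalue; otherwise take a stochastic-gradient step."""
    eigvals, eigvecs = np.linalg.eigh(H)
    lam_min, v = eigvals[0], eigvecs[:, 0]
    if lam_min < -curv_tol:
        # Sign the curvature direction so it does not ascend along the gradient.
        d = -np.sign(g @ v) * v if (g @ v) != 0 else v
        return x + curv_step * abs(lam_min) * d
    return x - grad_step * g
```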

A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization

Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao
2021 Stochastic Systems  
The momentum stochastic gradient descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning (e.g., training deep neural networks, variational Bayesian inference  ...  To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima.  ...  Shuangchi He and Prof. Le Chen for helpful suggestions.  ... 
doi:10.1287/stsy.2021.0083 fatcat:jk4p2hk6rzal7ninpvzcpgwyg4
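For reference, the classical momentum (heavy-ball) SGD update that the diffusion-approximation analysis above concerns; hyperparameter values are placeholders.

```python
import numpy as np

def msgd(grad_fn, x0, step_size=0.01, momentum=0.9, iters=1000):
    """Momentum SGD: v <- mu * v - eta * g(x), then x <- x + v,
    where grad_fn(x) returns a stochastic gradient at x."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(iters):
        v = momentum * v - step_size * grad_fn(x)
        x = x + v
    return x
```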

Differentially private Riemannian optimization [article]

Andi Han, Bamdev Mishra, Pratik Jawanpuria, Junbin Gao
2022 arXiv   pre-print
We further show privacy guarantees of the proposed differentially private Riemannian (stochastic) gradient descent using an extension of the moments accountant technique.  ...  Additionally, we prove utility guarantees under geodesic (strongly) convex, general nonconvex objectives as well as under the Riemannian Polyak-Łojasiewicz condition.  ...  To show privacy guarantees of R_t, we consider gradient descent and stochastic gradient descent separately.  ... 
arXiv:2205.09494v1 fatcat:izioq6ycw5dctlyl77xypa2sd4
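A minimal Euclidean sketch of the clip-and-add-Gaussian-noise step underlying differentially private (stochastic) gradient descent; the Riemannian exponential map/retraction used in this setting is replaced by a plain Euclidean update, and the noise multiplier is a placeholder rather than a value calibrated via the moments accountant.

```python
import numpy as np

def dp_sgd_step(x, per_sample_grads, step_size, clip_norm, noise_multiplier, rng):
    """One differentially private SGD step (Euclidean sketch): clip each
    per-sample gradient to `clip_norm`, average, add Gaussian noise, then step."""
    clipped = []
    for g in per_sample_grads:
        scale = min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        clipped.append(g * scale)
    mean_grad = np.mean(clipped, axis=0)
    noise_std = noise_multiplier * clip_norm / len(per_sample_grads)
    noise = rng.normal(scale=noise_std, size=np.shape(x))
    return x - step_size * (mean_grad + noise)
```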

Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization [article]

Thomas O'Leary-Roseberry, Nick Alger, Omar Ghattas
2021 arXiv   pre-print
However, generally the analysis shows that second order methods with stochastic Hessian and gradient information may need to take small steps, unlike in deterministic problems.  ...  Additionally, due to perceived costs of forming and factorizing Hessians, second order methods are not used for these problems.  ...  Conversations with Anna Yesypenko, Alex Ames, Brendan Keith and Rachel Ward were helpful during the preparation of this manuscript.  ... 
arXiv:2002.02881v3 fatcat:u3us3hyhkjedxbr22je3nvq6d4
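An illustrative saddle-free-Newton-style step built from a rank-r eigendecomposition of the Hessian: eigenvalues are replaced by their absolute values so saddle directions become descent directions, with a plain gradient step on the unresolved complement. The dense eigendecomposition and the complement handling here are simplifications; a scalable method would form the low-rank factors matrix-free.

```python
import numpy as np

def low_rank_saddle_free_step(grad, hessian, rank, damping=1e-3):
    """Step using |H_r|^{-1} on the top-`rank` eigen-subspace of H (by |eigenvalue|),
    with a plain gradient step on the remaining directions (simplified sketch)."""
    eigvals, eigvecs = np.linalg.eigh(hessian)
    top = np.argsort(np.abs(eigvals))[-rank:]       # directions of dominant |curvature|
    lam, V = np.abs(eigvals[top]), eigvecs[:, top]
    coeffs = V.T @ grad
    in_subspace = V @ (coeffs / (lam + damping))    # saddle-free Newton in the subspace
    off_subspace = grad - V @ coeffs                # gradient direction on the complement
    return -(in_subspace + off_subspace)
```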
Showing results 1 — 15 out of 1,007 results