Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems
[article]
2022
arXiv
pre-print
SSGD generalizes the idea of projected stochastic gradient descent and allows the use of scaled stochastic gradients instead of stochastic gradients. ...
Motivated by the problem of online canonical correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSGD) algorithm for minimizing the expectation of a stochastic function over a generic ...
Acknowledgements We thank the Department of Electrical Engineering and Computer Sciences at UC Berkeley for COVID-19 accommodations during which time this work was completed. ...
arXiv:2112.14738v2
fatcat:zkqqc4kifrbznb4ysidx5s5rci
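The SSGD abstract above describes replacing the stochastic gradient in projected SGD with a scaled stochastic gradient. A minimal sketch of that idea, assuming a fixed B^{-1} scaling and the generalized eigenvector objective max x^T A x subject to x^T B x = 1; the function names, step size, and noiseless gradient surrogate are illustrative, not taken from the paper.

```python
import numpy as np

def ssgd_step(x, stoch_grad, scaling, lr, project):
    """One stochastic scaled-gradient step: move along a *scaled*
    stochastic gradient, then project back onto the constraint set."""
    g = scaling(x) @ stoch_grad(x)
    return project(x - lr * g)

# Toy generalized eigenvector problem: max x^T A x  s.t.  x^T B x = 1.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = A + A.T
B = 2.0 * np.eye(5)
B_inv = np.linalg.inv(B)
x = rng.standard_normal(5)
for _ in range(2000):
    x = ssgd_step(
        x,
        stoch_grad=lambda z: -(A @ z),             # descent direction for -x^T A x
        scaling=lambda z: B_inv,                   # B^{-1} scaling
        lr=0.05,
        project=lambda z: z / np.sqrt(z @ B @ z),  # back onto {z : z^T B z = 1}
    )
```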
On Nonconvex Optimization for Machine Learning
2021
Journal of the ACM
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning. ...
While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a ...
Nonconvex optimization problems are intractable in general. ...
doi:10.1145/3418526
fatcat:tgzxmy5tmbaw7phpmfzwugt5ne
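For reference alongside this survey entry, a minimal sketch of the plain SGD update it discusses, x_{k+1} = x_k - lr * g_k with g_k an unbiased gradient estimate; the toy objective and noise model are illustrative assumptions, not from the paper.

```python
import numpy as np

def sgd(grad_sample, x0, lr=0.01, steps=10_000, rng=None):
    """Plain SGD: x_{k+1} = x_k - lr * g_k, where g_k is an unbiased
    stochastic estimate of the gradient at x_k."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= lr * grad_sample(x, rng)
    return x

# Noisy gradient of f(x) = ||x||^2 / 2: true gradient x plus Gaussian noise.
x_hat = sgd(lambda x, rng: x + 0.1 * rng.standard_normal(x.shape), x0=np.ones(3))
```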
Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima
[article]
2017
arXiv
pre-print
... general stochastic optimization setting, where Õ(·) hides polylogarithmic terms and constants. ...
We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate ...
General Stochastic Setting Now we consider the general nonconvex stochastic problem in (1.1). ...
arXiv:1712.06585v1
fatcat:vllxzjdwmbgebc5mmk6ub53mwi
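For the Õ(·) notation in the snippet above, the usual reading (stated here for convenience, in terms of the target accuracy ε):

```latex
\tilde{O}\big(f(\varepsilon)\big) \;=\; O\big(f(\varepsilon)\cdot \operatorname{polylog}(1/\varepsilon)\big),
```

i.e., constants and polylogarithmic factors are suppressed.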
The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems
[article]
2018
arXiv
pre-print
We derive this mechanism based on a detailed analysis of a generic stochastic quadratic problem, which generalizes known results for classical gradient descent. ...
In several experimental reports on nonconvex optimization problems in machine learning, stochastic gradient descent (SGD) was observed to prefer minimizers with flat basins in comparison to more deterministic ...
We would also like to thank Mihai Anitescu for his general guidance throughout the preparation of this work.
Funding The author is supported by the NSF Research and Training Grant # 1547396. ...
arXiv:1709.04718v2
fatcat:qksvlnw2xzcuflt74rztqprjwi
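The entry above analyzes SGD on a generic stochastic quadratic. A toy simulation in that spirit, assuming the simplest one-dimensional model f(x) = h x^2 / 2 with additive gradient noise averaged over a minibatch; all parameter values are arbitrary:

```python
import numpy as np

def sgd_quadratic(h=1.0, sigma=1.0, lr=0.1, batch=1, steps=5000, rng=None):
    """SGD on f(x) = h x^2 / 2 with per-sample gradient noise of std
    sigma; a minibatch of size b averages the noise down by 1/sqrt(b)."""
    rng = rng or np.random.default_rng(0)
    x, second_moment = 1.0, 0.0
    for _ in range(steps):
        noise = rng.standard_normal(batch).mean()  # averaged minibatch noise
        x -= lr * (h * x + sigma * noise)
        second_moment += x * x
    return second_moment / steps                   # time-averaged E[x^2]

for b in (1, 4, 16):
    print(b, sgd_quadratic(batch=b))   # noise floor shrinks roughly like 1/b
```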
On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points
[article]
2019
arXiv
pre-print
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning. ...
While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a ...
Acknowledgements We thank Tongyang Li and Quanquan Gu for valuable discussions. ...
arXiv:1902.04811v2
fatcat:rmdh2zan2vhdxbbzi6if2krnwe
PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization
2019
International Conference on Machine Learning
In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a perturbed A-GD (PA-GD) which is able to converge (with high probability) to the second-order stationary points ...
Alternating gradient descent (A-GD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps. ...
There has been a line of work on stochastic gradient descent algorithms, where properly scaled Gaussian noise is added to the iterates of the gradient at each time (also known as stochastic gradient Langevin ...
dblp:conf/icml/LuHW19
fatcat:qdlx3v6hx5bwfnvgecdhkxh25e
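A rough sketch of the perturbed alternating scheme described above, assuming two blocks updated by gradient steps and a Gaussian perturbation triggered by a small gradient norm; the actual PA-GD schedule and constants in the paper are more careful than this.

```python
import numpy as np

def pa_gd(grad_x, grad_y, x, y, lr=0.1, steps=1000, eps=1e-3,
          radius=1e-2, rng=None):
    """Alternating gradient descent on two blocks, with a small random
    perturbation when the joint gradient is tiny (to help escape saddles)."""
    rng = rng or np.random.default_rng(0)
    for _ in range(steps):
        x = x - lr * grad_x(x, y)          # update block 1, block 2 fixed
        y = y - lr * grad_y(x, y)          # update block 2 with the new block 1
        g = np.concatenate([grad_x(x, y), grad_y(x, y)])
        if np.linalg.norm(g) < eps:        # near a stationary point: perturb
            x = x + radius * rng.standard_normal(x.shape)
            y = y + radius * rng.standard_normal(y.shape)
    return x, y

# Toy bilinear-squared objective f(x, y) = (x.y)^2 / 2:
x, y = pa_gd(lambda x, y: y * (x @ y), lambda x, y: x * (x @ y),
             x=np.ones(2), y=-np.ones(2))
```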
Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization
[article]
2018
arXiv
pre-print
This line corresponds to momentum, and shows how to directly apply momentum to the finite-sum stochastic minimization of sum-of-nonconvex functions. ...
The problem of minimizing sum-of-nonconvex functions (i.e., convex functions that are averages of non-convex ones) is becoming increasingly important in machine learning, and is the core machinery for PCA ...
The stochastic gradient descent (SGD) method gives a T ∝ ε^{-2} convergence rate to Problem (1.1), or a T ∝ (σε)^{-1} rate if f(x) is σ-strongly convex. ...
arXiv:1802.03866v1
fatcat:mz3czstycras5joro2qfnpbox4
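As a point of reference for "directly applying momentum" to a finite sum, here is a plain heavy-ball loop; note that Katyusha X itself couples momentum with variance reduction, which this baseline omits, so this is only the naive comparator.

```python
import numpy as np

def momentum_sgd(component_grads, x0, lr=0.05, beta=0.9, epochs=50, rng=None):
    """Heavy-ball momentum on a finite sum f = (1/n) sum_i f_i: sample
    one component gradient per step and fold it into a velocity buffer."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    n = len(component_grads)
    for _ in range(epochs * n):
        i = rng.integers(n)
        v = beta * v + component_grads[i](x)   # momentum-averaged gradient
        x = x - lr * v
    return x

# Components f_i(x) = ||x - a_i||^2 / 2; the sum's minimizer is mean(a_i).
a = np.random.default_rng(1).standard_normal((10, 3))
grads = [(lambda ai: (lambda x: x - ai))(ai) for ai in a]
x_hat = momentum_sgd(grads, x0=np.zeros(3))
```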
Near-optimal stochastic approximation for online principal component estimation
2017
Mathematical programming
In this paper, we cast online PCA into a stochastic nonconvex optimization problem, and we analyze the online PCA algorithm as a stochastic approximation iteration. ...
The stochastic approximation iteration processes data points incrementally and maintains a running estimate of the principal component. ...
In contrast, Oja's iteration can be regarded as a projected stochastic gradient descent method. ...
doi:10.1007/s10107-017-1182-z
fatcat:vdmlqya6ljejlhi2i275dbf2za
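The last snippet views Oja's iteration as projected SGD: take a stochastic gradient step on the Rayleigh quotient, then renormalize onto the unit sphere. A minimal sketch (the data model and step size are illustrative):

```python
import numpy as np

def oja(stream, w, lr=0.01):
    """Oja's iteration for the top principal component, viewed as
    projected SGD: gradient step for max w^T (x x^T) w, then projection
    back onto the unit sphere."""
    for x in stream:
        w = w + lr * x * (x @ w)        # single-sample stochastic gradient step
        w = w / np.linalg.norm(w)       # projection onto the sphere
    return w

rng = np.random.default_rng(0)
C = np.diag([3.0, 1.0, 0.5])                     # true covariance
data = rng.standard_normal((20000, 3)) @ np.sqrt(C)
w = oja(data, w=rng.standard_normal(3))          # aligns with ±e_1, the top PC
```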
Large-Scale Phase Retrieval via Stochastic Reweighted Amplitude Flow
2020
KSII Transactions on Internet and Information Systems
Phase retrieval, recovering a signal from phaseless measurements, is generally considered to be an NP-hard problem. ...
This paper adopts an amplitude-based nonconvex optimization cost function to develop a new stochastic gradient algorithm, named stochastic reweighted phase retrieval (SRPR). ...
Stochastic Reweighted Phase Retrieval SRPR is based on TAF [11], and the stochastic gradient descent method is used in the initialization and gradient refinement stages. ...
doi:10.3837/tiis.2020.11.006
fatcat:coi53kkap5e5dpocyuph2nasai
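A hedged sketch of one stochastic step on the amplitude-based cost mentioned above, f_i(x) = (|a_i^T x| - y_i)^2 / 2; SRPR's reweighting and its TAF-style initialization are omitted, and the random initialization below is only for illustration.

```python
import numpy as np

def amplitude_sgd_step(x, a, y, lr):
    """One stochastic gradient step on the amplitude-based cost
    f_i(x) = (|a_i^T x| - y_i)^2 / 2. SRPR adds reweighting on top of
    this basic step; that is omitted here."""
    inner = a @ x
    grad = (np.abs(inner) - y) * np.sign(inner) * a
    return x - lr * grad

rng = np.random.default_rng(0)
n, m = 16, 200
x_true = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x_true)            # phaseless (amplitude) measurements
x = rng.standard_normal(n)        # a spectral initializer is typical; random here
for epoch in range(100):
    for i in rng.permutation(m):
        x = amplitude_sgd_step(x, A[i], y[i], lr=0.01)
```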
Efficient Asynchronous Semi-stochastic Block Coordinate Descent Methods for Large-Scale SVD
2021
IEEE Access
In recent years, many stochastic variance reduction algorithms and randomized coordinate descent algorithms have been developed to efficiently solve the leading eigenvalue problem. ...
Eigenvector computation such as Singular Value Decomposition (SVD) is one of the most fundamental problems in machine learning, optimization and numerical linear algebra. ...
Unlike SVD, the problem of computing the top eigenvectors of PCA generally involves large-scale and dense matrices. ...
doi:10.1109/access.2021.3094282
fatcat:z6ptyfacpjer5lajuu4sx7doye
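For the leading-eigenvalue problem these methods target, the standard nonconvex optimization formulation is:

```latex
\max_{\|x\|_2 = 1} \; x^\top A x
\qquad\Longleftrightarrow\qquad
\min_{x \neq 0} \; -\frac{x^\top A x}{x^\top x},
```

whose maximizer is the leading eigenvector of A.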
Accelerated Stochastic Quasi-Newton Optimization on Riemann Manifolds
[article]
2017
arXiv
pre-print
... of Karcher means for symmetric positive definite matrices and leading eigenvalues of large-scale data matrices. ...
We discuss a couple of ways to obtain the correction pairs used to calculate the product of the gradient with the inverse Hessian, and empirically demonstrate their use in synthetic experiments on computation ...
Comparisons of rSVRG with Riemannian gradient descent methods, both batch and stochastic, can be found in [7]. ...
arXiv:1704.01700v3
fatcat:6mb7mq2bqvdlbjtigqr625g7x4
Exploiting Negative Curvature in Deterministic and Stochastic Optimization
[article]
2018
arXiv
pre-print
We also show that gains can be made in a stochastic setting in cases when a standard stochastic-gradient-type method might make slow progress. ...
In this paper, we present new frameworks for combining descent and negative curvature directions: alternating two-step approaches and dynamic step approaches. ...
Algorithm 4 (Dynamic Method for Stochastic Optimization). Require: x_1 ∈ ℝ^n and (L_1, σ_1) ∈ (0, ∞) × (0, ∞). 1: for all k ∈ ℕ_+ do; 2: generate a stochastic gradient g_k and a stochastic Hessian H_k; 3: run ...
arXiv:1703.00412v3
fatcat:hh36t5xzgrdj5a5ppbi7aocsuu
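A compact sketch of the alternating two-step idea: a gradient step followed, whenever the Hessian exposes an eigenvalue below -delta, by a step along that negative-curvature direction. The full eigendecomposition below is an assumption made for clarity, not the paper's implementation.

```python
import numpy as np

def descent_with_negative_curvature(grad, hess, x, lr=0.1, delta=1e-3,
                                    steps=500):
    """Alternating two-step scheme: a gradient step, then (if the Hessian
    has an eigenvalue below -delta) a step along the corresponding
    negative-curvature eigenvector, signed to descend."""
    for _ in range(steps):
        x = x - lr * grad(x)                   # descent step
        lam, V = np.linalg.eigh(hess(x))       # eigenvalues in ascending order
        if lam[0] < -delta:                    # negative curvature found
            d = V[:, 0]
            d = -d if d @ grad(x) > 0 else d   # orient the direction downhill
            x = x + lr * d
    return x

# Saddle at the origin of f(x) = x0^2 - x1^2 + x1^4 / 4:
grad = lambda x: np.array([2 * x[0], -2 * x[1] + x[1] ** 3])
hess = lambda x: np.diag([2.0, -2.0 + 3 * x[1] ** 2])
x = descent_with_negative_curvature(grad, hess, np.array([1.0, 0.0]))
```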
A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization
2021
Stochastic Systems
The momentum stochastic gradient descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning (e.g., training deep neural networks, variational Bayesian inference ...
To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima. ...
... Shuangchi He and Prof. Le Chen for helpful suggestions. ...
doi:10.1287/stsy.2021.0083
fatcat:jk4p2hk6rzal7ninpvzcpgwyg4
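As loose context for the diffusion-approximation viewpoint, a common continuous-time surrogate for momentum SGD is an underdamped (Langevin-type) SDE; the scalings below are illustrative placeholders, not the paper's derivation:

```latex
dX_t = V_t\,dt, \qquad
dV_t = -\big(\gamma\,V_t + \nabla f(X_t)\big)\,dt + \sqrt{\eta}\;\Sigma^{1/2}(X_t)\,dW_t,
```

with friction γ induced by the momentum parameter and η the step size.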
Differentially private Riemannian optimization
[article]
2022
arXiv
pre-print
We further show privacy guarantees of the proposed differentially private Riemannian (stochastic) gradient descent using an extension of the moments accountant technique. ...
Additionally, we prove utility guarantees under geodesic (strongly) convex, general nonconvex objectives as well as under the Riemannian Polyak-Łojasiewicz condition. ...
To show privacy guarantees of Rt , we consider gradient descent and stochastic gradient descent separately. ...
arXiv:2205.09494v1
fatcat:izioq6ycw5dctlyl77xypa2sd4
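For the "differentially private (stochastic) gradient descent" mentioned above, the standard Euclidean building block is per-sample clipping plus Gaussian noise. The sketch below shows that step only and omits the paper's Riemannian machinery (retractions, manifold gradients), which is the actual subject of the work.

```python
import numpy as np

def dp_sgd_step(x, per_sample_grads, lr, clip, noise_mult, rng):
    """One differentially private SGD step (Euclidean case): clip each
    per-sample gradient to norm `clip`, average, and add Gaussian noise
    calibrated to the clipping bound and batch size."""
    clipped = [g * min(1.0, clip / np.linalg.norm(g)) for g in per_sample_grads]
    g_bar = np.mean(clipped, axis=0)
    noise = noise_mult * clip / len(per_sample_grads) * rng.standard_normal(x.shape)
    return x - lr * (g_bar + noise)

rng = np.random.default_rng(0)
grads = [rng.standard_normal(5) for _ in range(32)]   # stand-in per-sample grads
x = dp_sgd_step(np.zeros(5), grads, lr=0.1, clip=1.0, noise_mult=1.1, rng=rng)
```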
Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization
[article]
2021
arXiv
pre-print
However, the analysis generally shows that second-order methods with stochastic Hessian and gradient information may need to take small steps, unlike in deterministic problems. ...
Additionally, due to the perceived costs of forming and factorizing Hessians, second-order methods are not used for these problems. ...
Conversations with Anna Yesypenko, Alex Ames, Brendan Keith and Rachel Ward were helpful during the preparation of this manuscript. ...
arXiv:2002.02881v3
fatcat:u3us3hyhkjedxbr22je3nvq6d4
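A minimal sketch of the saddle-free Newton idea the title refers to: precondition the gradient with |H|^{-1}, the Hessian with its eigenvalues replaced by their absolute values, so saddle directions become descent directions. The dense eigendecomposition here is an assumption for clarity; the paper's point is precisely to avoid it with a low-rank approximation.

```python
import numpy as np

def saddle_free_newton_step(x, grad, hess, lr=1.0, damping=1e-3):
    """Saddle-free Newton step: replace the Hessian's eigenvalues by
    their absolute values (plus damping), invert, and precondition."""
    lam, V = np.linalg.eigh(hess(x))
    abs_inv = V @ np.diag(1.0 / (np.abs(lam) + damping)) @ V.T
    return x - lr * abs_inv @ grad(x)

# Toy saddle f(x) = x0^2 - x1^2 + x1^4 / 4:
grad = lambda x: np.array([2 * x[0], -2 * x[1] + x[1] ** 3])
hess = lambda x: np.diag([2.0, -2.0 + 3 * x[1] ** 2])
x = saddle_free_newton_step(np.array([0.5, 0.1]), grad, hess)
```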
Showing results 1–15 of 1,007 results