A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points
[article]
2019
arXiv
pre-print
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning. ...
While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a ...
Acknowledgements We thank Tongyang Li and Quanquan Gu for valuable discussions. ...
arXiv:1902.04811v2
fatcat:rmdh2zan2vhdxbbzi6if2krnwe
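The GD/SGD contrast in this abstract can be illustrated on a toy least-squares problem. This is a hedged sketch with synthetic data and hand-picked step sizes, not the paper's analysis or algorithm:

```python
import numpy as np

def grad(w, X, y):
    # gradient of the mean-squared-error loss for a linear model
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                      # noiseless synthetic labels

# full-batch gradient descent: the exact gradient at every step
w_gd = np.zeros(3)
for _ in range(500):
    w_gd -= 0.1 * grad(w_gd, X, y)

# stochastic gradient descent: one randomly drawn sample per step
w_sgd = np.zeros(3)
for _ in range(5000):
    i = rng.integers(len(y))
    w_sgd -= 0.01 * grad(w_sgd, X[i:i+1], y[i:i+1])
```

Both variants recover `w_true` here; the point of the contrast is that each SGD step costs one sample instead of the full batch.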
On Nonconvex Optimization for Machine Learning
2021
Journal of the ACM
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning. ...
While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a ...
Nonconvex optimization problems are intractable in general. ...
doi:10.1145/3418526
fatcat:tgzxmy5tmbaw7phpmfzwugt5ne
Adaptive Stochastic Gradient Langevin Dynamics: Taming Convergence and Saddle Point Escape Time
[article]
2018
arXiv
pre-print
Langevin dynamics (AGLD), for non-convex optimization problems. ...
In this paper, we propose a new adaptive stochastic gradient Langevin dynamics (ASGLD) algorithmic framework and its two specialized versions, namely adaptive stochastic gradient (ASG) and adaptive gradient ...
Related work In this section, we provide an overview on saddle point escape for non-convex learning, which has attracted a lot of attention in the machine learning community recently. ...
arXiv:1805.09416v1
fatcat:5qbqlfpfnnfkth3ogmlwzahw4a
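The adaptive variants in this abstract build on the basic stochastic gradient Langevin dynamics update. A minimal, non-adaptive SGLD sketch on a hypothetical one-dimensional double-well objective (not the ASGLD algorithm itself; step size and temperature are illustrative):

```python
import numpy as np

def sgld_step(w, grad_f, eta, beta, rng):
    # Langevin update: a gradient step plus Gaussian noise whose scale
    # sqrt(2*eta/beta) lets the iterate escape flat stationary points
    return w - eta * grad_f(w) + np.sqrt(2 * eta / beta) * rng.normal(size=w.shape)

# double-well f(w) = (w^2 - 1)^2: stationary points at -1, 0, +1;
# w = 0 is a local maximum that plain gradient descent never leaves
grad_f = lambda w: 4 * w * (w**2 - 1)

rng = np.random.default_rng(1)
w = np.array([0.0])                 # start exactly at the bad stationary point
for _ in range(2000):
    w = sgld_step(w, grad_f, eta=1e-3, beta=10.0, rng=rng)
# the injected noise pushes the iterate into one of the wells near w = ±1
```

The escape-time question studied in the paper is how quickly this noise-driven exit happens, and how adaptive step sizes change it.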
Switch and Conquer: Efficient Algorithms By Switching Stochastic Gradient Oracles For Decentralized Saddle Point Problems
[article]
2023
arXiv
pre-print
Numerical experiments on two benchmark machine learning applications show C-DPSSG's competitive performance, which validates our theoretical findings. ...
To tackle this, we develop a simple and effective switching idea, where a generalized stochastic gradient (GSG) computation oracle is employed to hasten the iterates' progress to a saddle point solution ...
Data Sets We rely on four binary classification datasets, namely a4a, phishing, and ijcnn1 from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ and sido data from http://www.causality.inf.ethz.ch ...
arXiv:2309.00997v2
fatcat:m5crfnh3fzditcxon5uw4g3tta
PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization
2019
International Conference on Machine Learning
In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a perturbed A-GD (PA-GD) which is able to converge (with high probability) to the second-order stationary points ...
Alternating gradient descent (A-GD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps. ...
Many recent works have analyzed the saddle points in machine learning problems (Kawaguchi, 2016) . ...
dblp:conf/icml/LuHW19
fatcat:qdlx3v6hx5bwfnvgecdhkxh25e
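Plain alternating gradient descent, the A-GD scheme this paper perturbs, can be sketched on a toy two-block objective. The perturbation mechanism that gives PA-GD its second-order guarantees is omitted, so this only illustrates the alternating updates:

```python
# A-GD on f(x, y) = (x*y - 1)^2: update x with y fixed, then y with the new x.
def agd(x, y, lr=0.05, steps=500):
    for _ in range(steps):
        x = x - lr * 2 * (x * y - 1) * y   # block 1: gradient step in x
        y = y - lr * 2 * (x * y - 1) * x   # block 2: gradient step in y
    return x, y

x, y = agd(1.5, 0.5)
# the iterates approach the solution set x*y = 1; starting at the strict
# saddle (0, 0) the updates never move, which is what perturbation addresses
```

The origin is a genuine saddle here (the Hessian of f at (0, 0) has eigenvalues ±2), so the stalling behavior in the comment is exactly the failure mode the paper's perturbation is designed to escape.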
A Cubic Regularization Approach for Finding Local Minimax Points in Nonconvex Minimax Optimization
[article]
2023
arXiv
pre-print
Gradient descent-ascent (GDA) is a widely used algorithm for minimax optimization. ...
However, GDA has been proved to converge to stationary points for nonconvex minimax optimization, which are suboptimal compared with local minimax points. ...
Although GDA can find stationary points in nonconvex minimax optimization, the stationary points may include candidate solutions that are far more sub-optimal than global minimax points (e.g., saddle points ...
arXiv:2110.07098v5
fatcat:upsz2vcyhjfzdg5kd5zj6suj7u
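GDA itself is a one-line update. A minimal sketch on a toy convex-concave objective, where the unique stationary point actually is the desired minimax point (unlike the nonconvex cases the paper targets):

```python
def gda(x, y, lr=0.1, steps=200):
    # simultaneous gradient descent on x and ascent on y for f(x, y) = x^2 - y^2
    for _ in range(steps):
        gx, gy = 2 * x, -2 * y        # partial derivatives df/dx and df/dy
        x, y = x - lr * gx, y + lr * gy
    return x, y

x, y = gda(1.0, 1.0)   # both coordinates contract toward the minimax point (0, 0)
```

In nonconvex-nonconcave problems the same iteration can settle at stationary points with no minimax meaning, which motivates the cubic-regularization approach above.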
Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning
[article]
2017
arXiv
pre-print
The latter half of the tutorial focuses on optimization algorithms, first for convex logistic regression, for which we discuss the use of first-order methods, the stochastic gradient method, variance reducing ...
The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning. ...
It remains an open question how one can design stochastic-gradient-type methods to optimize parameters of a DNN so as to find good local minimizers and avoid poor local minimizers and/or saddle points. ...
arXiv:1706.10207v1
fatcat:mezejqzn3bgozjhgpafyick3xy
Online Partial Least Square Optimization: Dropping Convexity for Better Efficiency and Scalability
2017
International Conference on Machine Learning
Such a gap between theory and practice motivates us to study a nonconvex formulation for multiview representation learning, which can be efficiently solved by a simple stochastic gradient descent method ...
Multiview representation learning is popular for latent factor analysis. ...
We then provide a real data experiment for comparing the computational performance of our nonconvex stochastic gradient algorithm for solving (2.1) with the convex stochastic gradient algorithm for solving ...
dblp:conf/icml/ChenYLZ17
fatcat:mhrlr4fm3vcrtly67fj72a4edi
Batch and online learning algorithms for nonconvex neyman-pearson classification
2011
ACM Transactions on Intelligent Systems and Technology
We investigated a batch algorithm based on DC programming and a stochastic gradient method well suited for large-scale datasets. Empirical evidence illustrates the potential of the proposed methods. ...
NP classification is a nonconvex problem involving a constraint on the false negative rate. ...
The first algorithm leverages modern nonconvex optimization techniques [Tao and An 1998]. The second algorithm is a stochastic gradient algorithm suitable for very large datasets. ...
doi:10.1145/1961189.1961200
fatcat:yykk3w5gc5a37cjmvwrlxh3n7i
Recent Advances in Stochastic Gradient Descent in Deep Learning
2023
Mathematics
Among machine learning models, stochastic gradient descent (SGD) is not only simple but also very effective. ...
Following that, this study introduces several versions of SGD and its variants, which are already in the PyTorch optimizer, including SGD, Adagrad, Adadelta, RMSprop, Adam, AdamW, and so on. ...
Stochastic gradient descent (SGD) [33] is simple and successful among machine learning models. ...
doi:10.3390/math11030682
fatcat:6sqjnyl3xnfnpeyco7uvgyq22a
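The optimizer variants listed in this survey differ mainly in how they transform the raw gradient before the step. As a hedged sketch, here is the heavy-ball momentum update underlying PyTorch-style SGD with momentum; the parameter values are illustrative and this is not the library's implementation:

```python
import numpy as np

def sgd_momentum(grad_f, w, lr=0.1, beta=0.9, steps=300):
    # heavy-ball momentum: the velocity v accumulates a geometrically
    # decaying sum of past gradients, smoothing the descent direction
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad_f(w)
        w = w - lr * v
    return w

w = sgd_momentum(lambda w: 2 * w, np.array([5.0]))   # minimizes f(w) = w^2
```

Adagrad, RMSprop, and Adam replace the fixed `lr` with per-coordinate scales derived from past squared gradients, but the overall loop shape is the same.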
Distributed Stochastic Nonconvex Optimization and Learning based on Successive Convex Approximation
[article]
2020
arXiv
pre-print
We study distributed stochastic nonconvex optimization in multi-agent networks. ...
The proposed method hinges on successive convex approximation (SCA) techniques, leveraging dynamic consensus as a mechanism to track the average gradient among the agents, and recursive averaging to recover ...
control and coordination, and distributed machine learning, just to name a few. ...
arXiv:2004.14882v1
fatcat:oat7muwqzvfovjtx5atmmvna7m
Dropping Convexity for More Efficient and Scalable Online Multiview Learning
[article]
2019
arXiv
pre-print
Such a gap between theory and practice motivates us to study a nonconvex formulation for multiview representation learning, which can be efficiently solved by a simple stochastic gradient descent (SGD) ...
It naturally arises in many data analysis, machine learning, and information retrieval applications to model dependent structures among multiple data sources. ...
Journal of Machine Learning Research 6 1817-1853. Arora, R., Cotter, A., Livescu, K. and Srebro, N. (2012). Stochastic optimization for PCA and PLS. ...
arXiv:1702.08134v10
fatcat:nd4hsrzuvnbcpmkuhk3qsjuhiq
A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning
[article]
2020
arXiv
pre-print
Zeroth-order (ZO) optimization is a subset of gradient-free optimization that emerges in many signal processing and machine learning applications. ...
It is used for solving optimization problems similarly to gradient-based methods. However, it does not require the gradient, using only function evaluations. ...
The works [41] and [63] focused on stochastic optimization and deterministic optimization, respectively.
4) Constrained nonconvex optimization: The criterion for convergence is commonly determined ...
arXiv:2006.06224v2
fatcat:fx624eqhifbqpp5hbd5a5cmsny
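The key primitive in ZO optimization is a gradient estimate built from function evaluations alone. A minimal sketch of the standard two-point Gaussian-smoothing estimator; the sample count, smoothing radius, and toy objective are illustrative choices, not taken from the primer:

```python
import numpy as np

def zo_gradient(f, w, mu=1e-4, n_samples=50, rng=None):
    # two-point estimator: E[(f(w + mu*u) - f(w - mu*u)) / (2*mu) * u] ≈ ∇f(w)
    # for standard Gaussian directions u; only function values are needed
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(w)
    for _ in range(n_samples):
        u = rng.normal(size=w.shape)
        g += (f(w + mu * u) - f(w - mu * u)) / (2 * mu) * u
    return g / n_samples

f = lambda w: np.sum(w**2)          # toy objective, queried only via evaluations
w = np.array([1.0, -2.0])
for _ in range(300):
    w -= 0.05 * zo_gradient(f, w)   # gradient-free "descent" loop
```

The estimator's variance grows with dimension, which is why ZO methods typically pay a dimension-dependent factor in their convergence rates.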
Scaling-up Distributed Processing of Data Streams for Machine Learning
[article]
2020
arXiv
pre-print
Emerging applications of machine learning in numerous areas involve continuous gathering of and learning from streams of data. ...
In particular, it focuses on methods that solve: (i) distributed stochastic convex problems, and (ii) distributed principal component analysis, which is a nonconvex problem with geometric structure that ...
Nonconvex Problems. Nonconvex functions can have three types of critical points, defined as points w for which ∇f (w) = 0: saddle points, local minima, and global minima. ...
arXiv:2005.08854v2
fatcat:y6fvajvq2naajeqs6lo3trrgwy
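The three-way classification of critical points quoted above can be operationalized with a second-order test. This hedged sketch distinguishes local minima, local maxima, and saddle points from the eigenvalue signs of the Hessian; it cannot separate local from global minima, and degenerate cases with zero eigenvalues are lumped conservatively:

```python
import numpy as np

def classify_critical_point(hessian):
    # second-order test at a critical point w with ∇f(w) = 0:
    # the eigenvalue signs of the Hessian decide the local geometry
    eig = np.linalg.eigvalsh(hessian)
    if np.all(eig > 0):
        return "local minimum"
    if np.all(eig < 0):
        return "local maximum"
    return "saddle point or degenerate"

# f(x, y) = x^2 - y^2 has a critical point at the origin, Hessian diag(2, -2)
label = classify_critical_point(np.diag([2.0, -2.0]))
```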
Learning Rate Dropout
[article]
2019
arXiv
pre-print
The performance of a deep neural network is highly dependent on its training, and finding better local optimal solutions is the goal of many optimization algorithms. ...
In this work, we propose Learning Rate Dropout (LRD), a simple gradient descent technique for training related to coordinate descent. ...
By adding stochasticity to the loss descent path, this technique helps the model traverse quickly through "transient" plateaus (e.g. saddle points or local minima) and gives the model more chances to ...
arXiv:1912.00144v2
fatcat:drapxgb5uzdsfnlbwernlzloce
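The technique described amounts to randomly masking per-coordinate learning rates at each step. A minimal sketch assuming a plain-SGD base optimizer and an illustrative keep probability (the paper applies the idea to richer optimizers as well):

```python
import numpy as np

def lrd_step(w, grad_f, lr=0.1, keep_prob=0.5, rng=None):
    # learning-rate dropout: zero the learning rate on a random coordinate
    # subset, so each step moves along the surviving coordinates only
    rng = rng or np.random.default_rng(0)
    mask = (rng.random(w.shape) < keep_prob).astype(float)
    return w - lr * mask * grad_f(w)

grad_f = lambda w: 2 * w                  # gradient of f(w) = ||w||^2
rng = np.random.default_rng(0)
w = np.array([3.0, -4.0, 5.0])
for _ in range(500):
    w = lrd_step(w, grad_f, rng=rng)
```

Each step thus resembles a randomized block-coordinate update, which is the connection to coordinate descent drawn in the abstract.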
Showing results 1 — 15 out of 1,179 results