1,179 Hits in 5.0 sec

On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points [article]

Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
2019 arXiv   pre-print
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning.  ...  While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a  ...  Acknowledgements We thank Tongyang Li and Quanquan Gu for valuable discussions.  ... 
arXiv:1902.04811v2 fatcat:rmdh2zan2vhdxbbzi6if2krnwe
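
The GD/SGD updates referenced in the entry above are standard; below is a minimal NumPy sketch contrasting the two, assuming generic `grad` / `stochastic_grad` callables and an illustrative step size (none of this is taken from the paper itself).

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Full-batch gradient descent: x_{t+1} = x_t - lr * grad f(x_t)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def stochastic_gradient_descent(stochastic_grad, x0, lr=0.1, steps=1000, rng=None):
    """SGD: the same update, but using an unbiased stochastic gradient estimate."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * stochastic_grad(x, rng)
    return x

# Example on f(x) = ||x||^2 / 2, whose gradient is x.
x_gd = gradient_descent(lambda x: x, x0=np.ones(3))
x_sgd = stochastic_gradient_descent(lambda x, rng: x + 0.1 * rng.standard_normal(x.shape),
                                    x0=np.ones(3))
```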

On Nonconvex Optimization for Machine Learning

Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
2021 Journal of the ACM  
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning.  ...  While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a  ...  Nonconvex optimization problems are intractable in general.  ... 
doi:10.1145/3418526 fatcat:tgzxmy5tmbaw7phpmfzwugt5ne

Adaptive Stochastic Gradient Langevin Dynamics: Taming Convergence and Saddle Point Escape Time [article]

Hejian Sang, Jia Liu
2018 arXiv   pre-print
...  Langevin dynamics (AGLD), for non-convex optimization problems.  ...  In this paper, we propose a new adaptive stochastic gradient Langevin dynamics (ASGLD) algorithmic framework and its two specialized versions, namely adaptive stochastic gradient (ASG) and adaptive gradient  ...  Related work: In this section, we provide an overview of saddle point escape for non-convex learning, which has attracted a lot of attention in the machine learning community recently.  ... 
arXiv:1805.09416v1 fatcat:5qbqlfpfnnfkth3ogmlwzahw4a
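
For orientation, stochastic gradient Langevin dynamics adds Gaussian noise to each stochastic gradient step; the adaptive schedules that define ASGLD/ASG/AGLD in the entry above are not reproduced here. A minimal sketch of the plain SGLD update, with illustrative step size and inverse temperature:

```python
import numpy as np

def sgld(stochastic_grad, x0, lr=1e-3, beta=1.0, steps=1000, seed=0):
    """Plain SGLD: x_{t+1} = x_t - lr * g_t + sqrt(2 * lr / beta) * N(0, I),
    where g_t is a stochastic gradient estimate. (The paper's adaptive
    variants replace the fixed step/noise scale with data-driven schedules.)"""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = stochastic_grad(x, rng)
        noise = np.sqrt(2.0 * lr / beta) * rng.standard_normal(x.shape)
        x = x - lr * g + noise
    return x
```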

Switch and Conquer: Efficient Algorithms By Switching Stochastic Gradient Oracles For Decentralized Saddle Point Problems [article]

Chhavi Sharma, Vishnu Narayanan, P. Balamurugan
2023 arXiv   pre-print
Numerical experiments on two benchmark machine learning applications show C-DPSSG's competitive performance, which validates our theoretical findings.  ...  To tackle this, we develop a simple and effective switching idea, where a generalized stochastic gradient (GSG) computation oracle is employed to hasten the iterates' progress to a saddle point solution  ...  Data Sets: We rely on four binary classification datasets, namely a4a, phishing, and ijcnn1 from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ and sido data from http://www.causality.inf.ethz.ch  ... 
arXiv:2309.00997v2 fatcat:m5crfnh3fzditcxon5uw4g3tta
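
The "switching" idea described in the snippet above, starting with one gradient oracle and switching to a second one partway through, can be sketched generically as below. The oracle names, switch point, and step size are illustrative assumptions, not the paper's C-DPSSG parameters.

```python
import numpy as np

def switch_and_run(oracle_a, oracle_b, x0, switch_at, total_steps, lr=0.01, seed=0):
    """Run updates with oracle_a for the first `switch_at` steps, then switch
    to oracle_b (e.g., a more accurate but costlier gradient estimator)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for t in range(total_steps):
        oracle = oracle_a if t < switch_at else oracle_b
        x = x - lr * oracle(x, rng)
    return x
```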

PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization

Songtao Lu, Mingyi Hong, Zhengdao Wang
2019 International Conference on Machine Learning  
In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a perturbed A-GD (PA-GD) which is able to converge (with high probability) to the second-order stationary points  ...  Alternating gradient descent (A-GD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps.  ...  Many recent works have analyzed the saddle points in machine learning problems (Kawaguchi, 2016).  ... 
dblp:conf/icml/LuHW19 fatcat:qdlx3v6hx5bwfnvgecdhkxh25e
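
The snippet above describes A-GD as alternating gradient steps over two blocks of variables; a minimal sketch of that update pattern follows. PA-GD additionally injects perturbations to escape saddle points, which is not reproduced here; all parameters below are illustrative.

```python
import numpy as np

def alternating_gd(grad_x, grad_y, x0, y0, lr=0.05, steps=500):
    """Alternating gradient descent on a smooth f(x, y):
    take a gradient step in x with y fixed, then in y with the new x fixed.
    (PA-GD adds occasional perturbations when gradients are small; not shown.)"""
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad_x(x, y)
        y = y - lr * grad_y(x, y)
    return x, y
```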

A Cubic Regularization Approach for Finding Local Minimax Points in Nonconvex Minimax Optimization [article]

Ziyi Chen, Zhengyang Hu, Qunwei Li, Zhe Wang, Yi Zhou
2023 arXiv   pre-print
Gradient descent-ascent (GDA) is a widely used algorithm for minimax optimization.  ...  However, GDA has been shown to converge only to stationary points in nonconvex minimax optimization, which are suboptimal compared with local minimax points.  ...  Although GDA can find stationary points in nonconvex minimax optimization, the stationary points may include candidate solutions that are far more sub-optimal than global minimax points (e.g. saddle points  ... 
arXiv:2110.07098v5 fatcat:upsz2vcyhjfzdg5kd5zj6suj7u
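
Gradient descent-ascent, the baseline discussed in the snippet above, descends in the min variable and ascends in the max variable. A minimal sketch with illustrative step sizes (the paper's cubic-regularization method itself is not reproduced):

```python
import numpy as np

def gda(grad_x, grad_y, x0, y0, lr_x=0.05, lr_y=0.05, steps=500):
    """Gradient descent-ascent for min_x max_y f(x, y):
    descend in x and ascend in y using the partial gradients."""
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - lr_x * gx   # descent step on x
        y = y + lr_y * gy   # ascent step on y
    return x, y

# Example: f(x, y) = 0.5*x**2 - 0.5*y**2 has its minimax point at the origin.
x_star, y_star = gda(lambda x, y: x, lambda x, y: -y,
                     x0=np.array([1.0]), y0=np.array([1.0]))
```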

Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning [article]

Frank E. Curtis, Katya Scheinberg
2017 arXiv   pre-print
The latter half of the tutorial focuses on optimization algorithms, first for convex logistic regression, for which we discuss the use of first-order methods, the stochastic gradient method, variance reducing  ...  The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning.  ...  It remains an open question how one can design stochastic-gradient-type methods to optimize parameters of a DNN so as to find good local minimizers and avoid poor local minimizers and/or saddle points.  ... 
arXiv:1706.10207v1 fatcat:mezejqzn3bgozjhgpafyick3xy
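
As a concrete instance of the convex case discussed in the tutorial above, here is a minimal SGD sketch for l2-regularized logistic regression; the step size and regularization constant are illustrative assumptions.

```python
import numpy as np

def sgd_logistic(X, y, lr=0.1, lam=1e-3, epochs=10, seed=0):
    """SGD for l2-regularized logistic regression.
    X: (n, d) feature matrix, y: (n,) labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w)
            # gradient of log(1 + exp(-margin)) + (lam/2)*||w||^2 w.r.t. w
            g = -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w
            w = w - lr * g
    return w
```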

Online Partial Least Square Optimization: Dropping Convexity for Better Efficiency and Scalability

Zhehui Chen, Lin F. Yang, Chris Junchi Li, Tuo Zhao
2017 International Conference on Machine Learning  
Such a gap between theory and practice motivates us to study a nonconvex formulation for multiview representation learning, which can be efficiently solved by a simple stochastic gradient descent method  ...  Multiview representation learning is popular for latent factor analysis.  ...  We then provide a real-data experiment comparing the computational performance of our nonconvex stochastic gradient algorithm for solving (2.1) with the convex stochastic gradient algorithm for solving  ... 
dblp:conf/icml/ChenYLZ17 fatcat:mhrlr4fm3vcrtly67fj72a4edi
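
The nonconvex formulation referred to above is solved with simple streaming stochastic updates; the paper's exact algorithm is not reproduced here. What follows is a hedged sketch of a generic stochastic power-iteration-style update for the leading pair of directions of a cross-covariance E[x y^T], which is the flavor of problem described.

```python
import numpy as np

def streaming_cross_cov_directions(stream, dim_x, dim_y, lr=0.01, seed=0):
    """Track leading left/right directions (u, v) of E[x y^T] from a stream of
    paired samples (x, y), using normalized stochastic power-iteration steps.
    Generic sketch only, not the paper's exact update."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(dim_x); u /= np.linalg.norm(u)
    v = rng.standard_normal(dim_y); v /= np.linalg.norm(v)
    for x, y in stream:
        u = u + lr * x * (y @ v)   # stochastic step toward (E[x y^T]) v
        v = v + lr * y * (x @ u)   # stochastic step toward (E[x y^T])^T u
        u /= np.linalg.norm(u)
        v /= np.linalg.norm(v)
    return u, v
```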

Batch and online learning algorithms for nonconvex neyman-pearson classification

Gilles Gasso, Aristidis Pappaioannou, Marina Spivak, Léon Bottou
2011 ACM Transactions on Intelligent Systems and Technology  
We investigated a batch algorithm based on DC programming and a stochastic gradient method well suited for large-scale datasets. Empirical evidence illustrates the potential of the proposed methods.  ...  NP classification is a nonconvex problem involving a constraint on the false negative rate.  ...  The first algorithm leverages modern nonconvex optimization techniques [Tao and An 1998]. The second algorithm is a stochastic gradient algorithm suitable for very large datasets.  ... 
doi:10.1145/1961189.1961200 fatcat:yykk3w5gc5a37cjmvwrlxh3n7i
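
Neyman-Pearson classification, as described above, minimizes one error rate subject to a bound on the other. One common way to attack it with stochastic gradients, sketched below as a hedged substitute rather than the paper's DC-programming or online algorithm, is to move the false-negative constraint into a penalty on a convex surrogate loss; the penalty weight and target rate are illustrative.

```python
import numpy as np

def np_hinge_sgd(X, y, rho=0.1, penalty=10.0, lr=0.05, epochs=20, seed=0):
    """Hedged sketch (not the paper's algorithm): train a linear classifier by
    minimizing a hinge surrogate of the false-positive rate plus a penalty that
    activates when the surrogate false-negative rate exceeds rho.
    Labels y are in {-1, +1}, with y = +1 the positive class."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    n_pos, n_neg = (y == 1).sum(), (y == -1).sum()
    for _ in range(epochs):
        # check the surrogate false-negative rate once per epoch (for simplicity)
        fn_surrogate = np.mean(np.maximum(0.0, 1.0 - X[y == 1] @ w))
        lam = penalty if fn_surrogate > rho else 0.0
        for i in rng.permutation(n):
            if y[i] * (X[i] @ w) < 1.0:  # hinge term is active
                weight = (1.0 / n_neg) if y[i] == -1 else (lam / n_pos)
                w = w + lr * weight * y[i] * X[i]
    return w
```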

Recent Advances in Stochastic Gradient Descent in Deep Learning

Yingjie Tian, Yuqi Zhang, Haibin Zhang
2023 Mathematics  
Among machine learning models, stochastic gradient descent (SGD) is not only simple but also very effective.  ...  Following that, this study introduces several versions of SGD and its variants, which are already available as PyTorch optimizers, including SGD, Adagrad, Adadelta, RMSprop, Adam, AdamW, and so on.  ...  Stochastic gradient descent (SGD) [33] is simple and successful among machine learning models.  ... 
doi:10.3390/math11030682 fatcat:6sqjnyl3xnfnpeyco7uvgyq22a
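
The optimizers named in the snippet above are all available in `torch.optim`; a minimal sketch of constructing each and taking one step follows. The linear model and random data are placeholders, not anything from the survey.

```python
import torch

model = torch.nn.Linear(10, 1)          # placeholder model
x, target = torch.randn(32, 10), torch.randn(32, 1)

optimizers = {
    "SGD":      torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9),
    "Adagrad":  torch.optim.Adagrad(model.parameters(), lr=0.01),
    "Adadelta": torch.optim.Adadelta(model.parameters()),
    "RMSprop":  torch.optim.RMSprop(model.parameters(), lr=0.01),
    "Adam":     torch.optim.Adam(model.parameters(), lr=1e-3),
    "AdamW":    torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01),
}

for name, opt in optimizers.items():
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), target)
    loss.backward()
    opt.step()                           # one update with each optimizer
```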

Distributed Stochastic Nonconvex Optimization and Learning based on Successive Convex Approximation [article]

Paolo Di Lorenzo, Simone Scardapane
2020 arXiv   pre-print
We study distributed stochastic nonconvex optimization in multi-agent networks.  ...  The proposed method hinges on successive convex approximation (SCA) techniques, leveraging dynamic consensus as a mechanism to track the average gradient among the agents, and recursive averaging to recover  ...  control and coordination, and distributed machine learning, just to name a few.  ... 
arXiv:2004.14882v1 fatcat:oat7muwqzvfovjtx5atmmvna7m
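
The "dynamic consensus to track the average gradient" mentioned above is commonly realized as a gradient-tracking recursion; below is a minimal sketch over a fixed mixing matrix W (assumed doubly stochastic), which omits the successive-convex-approximation surrogate step that the paper actually uses.

```python
import numpy as np

def gradient_tracking(grads, x0, W, lr=0.05, steps=200):
    """Distributed gradient tracking over n agents.
    grads: list of n callables, grads[i](x) = gradient of agent i's local loss.
    x0: (n, d) initial local iterates. W: (n, n) doubly stochastic mixing matrix.
    Each agent mixes with neighbors and tracks the network-average gradient y_i."""
    x = np.array(x0, dtype=float)
    g = np.stack([grads[i](x[i]) for i in range(len(grads))])
    y = g.copy()                      # tracker initialized at local gradients
    for _ in range(steps):
        x_new = W @ x - lr * y        # consensus step plus move along tracked gradient
        g_new = np.stack([grads[i](x_new[i]) for i in range(len(grads))])
        y = W @ y + g_new - g         # dynamic consensus: track the average gradient
        x, g = x_new, g_new
    return x
```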

Dropping Convexity for More Efficient and Scalable Online Multiview Learning [article]

Zhehui Chen, Lin F. Yang, Chris J. Li, Tuo Zhao
2019 arXiv   pre-print
Such a gap between theory and practice motivates us to study a nonconvex formulation for multiview representation learning, which can be efficiently solved by a simple stochastic gradient descent (SGD)  ...  It naturally arises in many data analysis, machine learning, and information retrieval applications to model dependent structures among multiple data sources.  ...  Journal of Machine Learning Research 6 1817-1853. Arora, R., Cotter, A., Livescu, K. and Srebro, N. (2012). Stochastic optimization for PCA and PLS.  ... 
arXiv:1702.08134v10 fatcat:nd4hsrzuvnbcpmkuhk3qsjuhiq

A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning [article]

Sijia Liu, Pin-Yu Chen, Bhavya Kailkhura, Gaoyuan Zhang, Alfred Hero, Pramod K. Varshney
2020 arXiv   pre-print
Zeroth-order (ZO) optimization is a subset of gradient-free optimization that emerges in many signal processing and machine learning applications.  ...  It is used for solving optimization problems similarly to gradient-based methods. However, it does not require the gradient, using only function evaluations.  ...  The works [41] and [63] focused on stochastic optimization and deterministic optimization, respectively. 4) Constrained nonconvex optimization: The criterion for convergence is commonly determined  ... 
arXiv:2006.06224v2 fatcat:fx624eqhifbqpp5hbd5a5cmsny
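
The gradient-free estimation described above is typically built from finite differences of function values along random directions. A minimal sketch of a two-point ZO gradient estimator, with illustrative smoothing parameter and direction count:

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_dirs=20, rng=None):
    """Two-point zeroth-order gradient estimate of f at x:
    average of d * (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over random unit
    directions u. Uses only function evaluations, never an analytic gradient."""
    rng = rng or np.random.default_rng(0)
    d = x.size
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        g += d * (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_dirs

# ZO gradient descent on f(x) = ||x||^2, no analytic gradient needed.
x = np.ones(5)
for _ in range(200):
    x = x - 0.05 * zo_gradient(lambda z: z @ z, x)
```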

Scaling-up Distributed Processing of Data Streams for Machine Learning [article]

Matthew Nokleby, Haroon Raja, Waheed U. Bajwa
2020 arXiv   pre-print
Emerging applications of machine learning in numerous areas involve continuous gathering of and learning from streams of data.  ...  In particular, it focuses on methods that solve: (i) distributed stochastic convex problems, and (ii) distributed principal component analysis, which is a nonconvex problem with geometric structure that  ...  Nonconvex Problems. Nonconvex functions can have three types of critical points, defined as points w for which ∇f(w) = 0: saddle points, local minima, and global minima.  ... 
arXiv:2005.08854v2 fatcat:y6fvajvq2naajeqs6lo3trrgwy
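
As a concrete illustration of the critical-point types listed above, the Hessian at a critical point distinguishes them: all-positive eigenvalues give a local (possibly global) minimum, while mixed-sign eigenvalues give a saddle. A small sketch, using f(w) = w1^2 - w2^2, whose only critical point (the origin) is a saddle:

```python
import numpy as np

def classify_critical_point(hessian_eigs, tol=1e-8):
    """Classify a critical point (grad f(w) = 0) from its Hessian eigenvalues."""
    if np.all(hessian_eigs > tol):
        return "local (possibly global) minimum"
    if np.all(hessian_eigs < -tol):
        return "local maximum"
    if np.any(hessian_eigs > tol) and np.any(hessian_eigs < -tol):
        return "saddle point"
    return "degenerate (second-order test inconclusive)"

# f(w) = w1^2 - w2^2 has gradient (2*w1, -2*w2), zero only at the origin,
# where the Hessian is diag(2, -2): a saddle point.
print(classify_critical_point(np.array([2.0, -2.0])))  # -> "saddle point"
```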

Learning Rate Dropout [article]

Huangxing Lin, Weihong Zeng, Xinghao Ding, Yue Huang, Chenxi Huang and John Paisley
2019 arXiv   pre-print
The performance of a deep neural network is highly dependent on its training, and finding better local optimal solutions is the goal of many optimization algorithms.  ...  In this work, we propose Learning Rate Dropout (LRD), a simple gradient descent technique for training, related to coordinate descent.  ...  By adding stochasticity to the loss descent path, this technique helps the model to traverse quickly through the "transient" plateau (e.g. saddle points or local minima) and gives the model more chances to  ... 
arXiv:1912.00144v2 fatcat:drapxgb5uzdsfnlbwernlzloce
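
A hedged reading of the idea described above: at each step, a random binary mask decides which coordinates actually receive their update, so the descent path becomes stochastic. The keep probability, base optimizer, and step size below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def sgd_with_lr_dropout(grad, x0, lr=0.1, keep_prob=0.5, steps=500, seed=0):
    """Gradient descent where, at every step, each coordinate's learning rate is
    independently dropped (set to 0) with probability 1 - keep_prob.
    Hedged sketch of the learning-rate-dropout idea, not the paper's exact method."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        mask = rng.random(x.shape) < keep_prob   # 1 = keep this coordinate's update
        x = x - lr * mask * grad(x)
    return x
```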
Showing results 1 — 15 out of 1,179 results