2,100 Hits in 4.4 sec

A Linearly-Convergent Stochastic L-BFGS Algorithm [article]

Philipp Moritz, Robert Nishihara, Michael I. Jordan
2016 arXiv   pre-print
We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions.  ...  Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and Zhang (2013).  ...  Discussion: This paper introduces a stochastic version of L-BFGS and proves a linear rate of convergence in the strongly convex case.  ... 
arXiv:1508.02087v2 fatcat:qffeae2oufgzzhxcmfa6jgknze
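
As a hedged illustration of what this record describes (an SVRG-style variance-reduced gradient feeding a quasi-Newton-type update), here is a minimal Python sketch. It is not the authors' algorithm: the inverse-Hessian multiplication is replaced by the identity for brevity, and names such as `grad_i`, `step_size`, and `inner_steps` are illustrative.

```python
import numpy as np

def svrg_step_sketch(grad_i, n, x0, step_size=0.1, epochs=5, inner_steps=None, seed=0):
    """Variance-reduced (SVRG-style) stochastic descent; grad_i(x, i) is the
    gradient of the i-th component function, n the number of components."""
    rng = np.random.default_rng(seed)
    m = inner_steps or n
    x = x0.copy()
    for _ in range(epochs):
        x_snap = x.copy()
        mu = np.mean([grad_i(x_snap, i) for i in range(n)], axis=0)  # full gradient at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(x_snap, i) + mu  # variance-reduced gradient estimate
            # A stochastic L-BFGS method would apply an inverse-Hessian
            # approximation H_k to v here; the identity is used instead.
            x -= step_size * v
    return x
```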

Quasi-Newton Methods: Superlinear Convergence Without Line Searches for Self-Concordant Functions [article]

Wenbo Gao, Donald Goldfarb
2018 arXiv   pre-print
We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant functions  ...  of stochastic gradient descent on stochastic optimization problems.  ...  Then the adaptive L-BFGS method is globally R-linearly convergent.  ... 
arXiv:1612.06965v3 fatcat:5dt4s3uemvatpl7enjv43rdcsm

Quasi-Newton methods: superlinear convergence without line searches for self-concordant functions

Wenbo Gao, Donald Goldfarb
2018 Optimization Methods and Software  
We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant functions  ...  of stochastic gradient descent on stochastic optimization problems.  ...  Then the adaptive L-BFGS method is globally R-linearly convergent.  ... 
doi:10.1080/10556788.2018.1510927 fatcat:xtmmqflsq5grplwt3guuysd6ty
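
For reference, the classical inverse-Hessian BFGS update that this line-search-free analysis builds on is (the paper's specific self-concordance-based step size is not reproduced here):

\[
H_{k+1} = \left(I - \rho_k s_k y_k^{\top}\right) H_k \left(I - \rho_k y_k s_k^{\top}\right) + \rho_k s_k s_k^{\top},
\qquad \rho_k = \frac{1}{y_k^{\top} s_k},
\]
where \(s_k = x_{k+1} - x_k\) and \(y_k = \nabla f(x_{k+1}) - \nabla f(x_k)\).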

Recent Advances in Stochastic Gradient Descent in Deep Learning

Yingjie Tian, Yuqi Zhang, Haibin Zhang
2023 Mathematics  
Finally, we propose theoretical conditions under which these methods are applicable and discover that there is still a gap between theoretical conditions under which the algorithms converge and practical  ...  Among machine learning models, stochastic gradient descent (SGD) is not only simple but also very effective.  ...  (Table fragment: combinations of the stochastic method and BFGS; structured quasi-Newton and variance-reduced L-BFGS/block L-BFGS [127-129], listed as globally convergent for strongly convex objectives.)  ... 
doi:10.3390/math11030682 fatcat:6sqjnyl3xnfnpeyco7uvgyq22a
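
For context, the baseline iteration that this survey's variants refine is plain SGD, stated here in its textbook form (not as reproduced from the paper):

\[
x_{k+1} = x_k - \eta_k \nabla f_{i_k}(x_k), \qquad i_k \sim \mathrm{Uniform}\{1,\dots,n\}, \qquad \mathbb{E}\big[\nabla f_{i_k}(x)\big] = \nabla f(x).
\]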

Stochastic Adaptive Quasi-Newton Methods for Minimizing Expected Values

Chaoxu Zhou, Wenbo Gao, Donald Goldfarb
2017 International Conference on Machine Learning  
We show that, given a suitable amount of sampling, the stochastic adaptive GD attains linear convergence in expectation, and with further sampling, the stochastic adaptive BFGS attains R-superlinear convergence  ...  We propose a novel class of stochastic, adaptive methods for minimizing self-concordant functions which can be expressed as an expected value.  ...  Linearly convergent stochastic limited-memory BFGS algorithms (Byrd et al., 2016; Moritz et al., 2016; Gower et al., 2016) have also been proposed.  ... 
dblp:conf/icml/ZhouGG17 fatcat:vu5mzdhvpjdevbrfyk4g2seax4
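
The "suitable amount of sampling" in this record refers to subsampled gradient estimates whose sample sizes grow across iterations; schematically (the paper's exact schedule and adaptive step size are not reproduced here):

\[
g_k = \frac{1}{|S_k|} \sum_{i \in S_k} \nabla f_i(x_k), \qquad |S_k| \uparrow \text{ as } k \to \infty,
\]
so that the gradient noise shrinks fast enough to retain linear (adaptive GD) or R-superlinear (adaptive BFGS) rates.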

Stochastic Damped L-BFGS with Controlled Norm of the Hessian Approximation [article]

Sanae Lotfi and Tiphaine Bonniot de Ruisselet and Dominique Orban and Andrea Lodi
2020 arXiv   pre-print
Our algorithm, VARCHEN, draws from previous work that proposed a novel stochastic damped L-BFGS algorithm called SdLBFGS.  ...  We propose a new stochastic variance-reduced damped L-BFGS algorithm, where we leverage estimates of bounds on the largest and smallest eigenvalues of the Hessian approximation to balance its quality and  ...  [33] proposed a stochastic damped L-BFGS (SdLBFGS) algorithm and proved almost sure convergence to a stationary point.  ... 
arXiv:2012.05783v1 fatcat:nwjtqfjjnbeqjcejbldgaul56i
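
A generic Powell-style damping step, of the kind that damped (L-)BFGS methods such as SdLBFGS refine, is sketched below in Python. This is the textbook version under the assumption that a Hessian approximation B is available explicitly; the exact formulas in SdLBFGS and VARCHEN differ in detail.

```python
import numpy as np

def powell_damped_pair(s, y, B, delta=0.2):
    """Return a damped y_bar so that s @ y_bar stays safely positive,
    keeping the BFGS update well defined even for noisy curvature pairs."""
    sBs = s @ B @ s
    sy = s @ y
    if sy >= delta * sBs:
        theta = 1.0
    else:
        theta = (1.0 - delta) * sBs / (sBs - sy)
    return theta * y + (1.0 - theta) * (B @ s)
```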

Stochastic L-BFGS: Improved Convergence Rates and Practical Acceleration Strategies

Renbo Zhao, William Benjamin Haskell, Vincent Y. F. Tan
2018 IEEE Transactions on Signal Processing  
We revisit the stochastic limited-memory BFGS (L-BFGS) algorithm.  ...  By proposing a new framework for the convergence analysis, we prove improved convergence rates and computational complexities of the stochastic L-BFGS algorithms compared to previous works.  ...  In the literature (Gower et al., 2016; Moritz et al., 2016), option I or II (in Algorithm 1) is usually analyzed theoretically to prove that the stochastic L-BFGS algorithms therein converge linearly  ... 
doi:10.1109/tsp.2017.2784360 fatcat:u7jv3vxpave7jkbxnmpm7gcp54

Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation [article]

Si Yi Meng, Sharan Vaswani, Issam Laradji, Mark Schmidt, Simon Lacoste-Julien
2020 arXiv   pre-print
We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping.  ...  Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence.  ...  A linearly-convergent stochastic L-BFGS algorithm. In Artificial Intelligence and Statistics, pages 249–258, 2016. Yurii Nesterov. Lectures on convex optimization, volume 137. Springer, 2018.  ... 
arXiv:1910.04920v2 fatcat:jfzvxawxdrcp3ocfbnxmzh4fi4
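
"Hessian-free" here refers, as is standard, to using Hessian-vector products rather than an explicit Hessian (for example inside a conjugate-gradient solve). A common finite-difference approximation of such products, given only a gradient oracle, is sketched below; it is a generic illustration, not the paper's implementation.

```python
import numpy as np

def hessian_vector_product(grad, x, v, eps=1e-6):
    """Approximate H(x) @ v with a central difference of two gradient calls."""
    return (grad(x + eps * v) - grad(x - eps * v)) / (2.0 * eps)
```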

An Inexact Variable Metric Proximal Point Algorithm for Generic Quasi-Newton Acceleration [article]

Hongzhou Lin, Zaid Harchaoui
2019 arXiv   pre-print
When combined with limited-memory BFGS rules, QNing is particularly effective for solving high-dimensional optimization problems, while enjoying a worst-case linear convergence rate for strongly convex problems  ...  The proposed scheme, called QNing, can be notably applied to incremental first-order methods such as the stochastic variance-reduced gradient descent algorithm (SVRG) and other randomized incremental optimization  ...  This work was supported by the ERC grant SOLARIS (number 714381), a grant from ANR (MACARON project ANR-14-CE23-0003-01), and the program "Learning in Machines and Brains" (CIFAR).  ... 
arXiv:1610.00960v4 fatcat:flhlxb6pa5athlm6enwswmy6zq
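
The variable metric proximal point viewpoint in this record builds on the Moreau-Yosida envelope; as a reminder (standard definition, not a restatement of QNing's exact scheme), for convex f and smoothing parameter \(\kappa > 0\):

\[
F(x) = \min_{z} \left\{ f(z) + \frac{\kappa}{2}\,\|z - x\|^2 \right\}, \qquad
\nabla F(x) = \kappa\big(x - p(x)\big),
\]
where \(p(x)\) is the minimizer. Quasi-Newton (e.g., L-BFGS) rules are then applied to the smooth surrogate \(F\), while each \(p(x)\) is computed inexactly by an inner solver such as SVRG.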

Stochastic Block BFGS: Squeezing More Curvature out of Data [article]

Robert M. Gower, Donald Goldfarb, Peter Richtárik
2016 arXiv   pre-print
We propose a novel limited-memory stochastic block BFGS update for incorporating enriched curvature information in stochastic approximation methods.  ...  We propose several sketching strategies, present a new quasi-Newton method that uses stochastic block BFGS updates combined with the variance reduction approach SVRG to compute batch stochastic gradients  ...  Convergence: In this section we prove that Algorithm 1 converges linearly.  ... 
arXiv:1603.09649v1 fatcat:7tduh5ikgnchzmcz4n6mvxgo4a
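
Up to notation, the block BFGS inverse update referred to here replaces the single curvature pair of classical BFGS with a sketched block: given a sketch \(D_t \in \mathbb{R}^{n \times q}\) and the curvature action \(Y_t = \nabla^2 f(x_t)\, D_t\),

\[
H_t = D_t \Lambda_t D_t^{\top} + \left(I - D_t \Lambda_t Y_t^{\top}\right) H_{t-1} \left(I - Y_t \Lambda_t D_t^{\top}\right),
\qquad \Lambda_t = \left(D_t^{\top} Y_t\right)^{-1},
\]
which reduces to the usual inverse BFGS update when \(D_t\) has a single column. The paper's limited-memory variant and its combination with SVRG gradients add further structure not shown here.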

Deep Reinforcement Learning via L-BFGS Optimization [article]

Jacob Rafati, Roummel F. Marcia
2019 arXiv   pre-print
Methods for solving the optimization problems in deep RL are restricted to the class of first-order algorithms, such as stochastic gradient descent (SGD).  ...  The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive definite Hessian approximations.  ...  We used a stochastic line search L-BFGS method as the optimization method (Algorithm 2).  ... 
arXiv:1811.02693v2 fatcat:dgqrwcko5vbmddysmhch5fg244
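
Since this record relies on an L-BFGS step inside a (stochastic) line search loop, the standard two-loop recursion for forming the search direction from stored (s, y) pairs is sketched below. This is the textbook recursion, not the authors' Algorithm 2.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Return -H_k @ grad, where H_k is the implicit L-BFGS inverse-Hessian
    approximation built from the stored pairs (oldest first in the lists)."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):  # newest to oldest
        alpha = rho * (s @ q)
        alphas.append(alpha)
        q -= alpha * y
    if s_list:  # standard initial scaling H_0 = gamma * I
        gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):  # oldest to newest
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return -r
```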

Asynchronous Parallel Stochastic Quasi-Newton Methods [article]

Qianqian Tong, Guannan Liang, Xingyu Cai, Chunjiang Zhu, Jinbo Bi
2020 arXiv   pre-print
Adopting the variance reduction technique, a prior stochastic L-BFGS, which was not designed for parallel computing, reaches a linear convergence rate.  ...  Unlike prior attempts, which parallelize only the gradient calculation or the two-loop recursion of L-BFGS, our algorithm is the first one that truly parallelizes L-BFGS with a convergence guarantee  ...  L-BFGS) enjoy a superlinear convergence rate, while the stochastic version of quasi-Newton methods (including L-BFGS) will have a sublinear convergence rate in strongly convex optimization as a sacrifice  ... 
arXiv:2011.00667v1 fatcat:w2so7l3imna73aoyhwpn52qfeq

Practical Quasi-Newton Methods for Training Deep Neural Networks [article]

Donald Goldfarb, Yi Ren, Achraf Bahamou
2021 arXiv   pre-print
We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs).  ...  Consequently, computing and storing a full n × n BFGS approximation or storing a modest number of (step, change in gradient) vector pairs for use in an L-BFGS implementation is out of the question.  ...  In this section, we prove the convergence of Algorithm 5, a variant of K-BFGS(L). Algorithm 5 is very similar to our actual implementation of K-BFGS(L) (i.e.  ... 
arXiv:2006.08877v3 fatcat:oow4oj6kcvf6bpcvlaakluzlwm
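
Schematically, Kronecker-factored quasi-Newton methods of this kind sidestep the "full n × n approximation is out of the question" problem by approximating each layer's curvature block as a Kronecker product and exploiting the inversion identity (stated generically; the paper's exact factors and damping are not reproduced):

\[
B_{\text{layer}} \approx A \otimes G, \qquad (A \otimes G)^{-1} = A^{-1} \otimes G^{-1},
\]
so only the two small factors \(A\) and \(G\) need to be stored and updated (by BFGS/L-BFGS rules in this line of work).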

Stochastic Second-Order Optimization via von Neumann Series [article]

Mojmir Mutny
2017 arXiv   pre-print
A stochastic iterative algorithm approximating second-order information using von Neumann series is discussed. We present convergence guarantees for strongly-convex and smooth functions.  ...  In numerical experiments, the behavior of the error is similar to the second-order algorithm L-BFGS, and improves the performance of LISSA for quadratic objective functions.  ...  Figure 2: A numerical experiment comparing ISSA, LISSA, BFGS and L-BFGS algorithms. In (a), τ = 5 for ISSA, and L-BFGS used the last 5 gradients as well.  ... 
arXiv:1612.04694v4 fatcat:5mr4n6lhy5crnojym2isn5z6li
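
The von Neumann (Neumann series) device behind this record is the classical identity: if A is symmetric positive definite with \(0 \prec A \prec 2I\) (achievable by rescaling), then

\[
A^{-1} = \sum_{k=0}^{\infty} (I - A)^{k},
\]
and truncating the series after t terms gives a computable approximation of the inverse Hessian; the paper's stochastic estimator subsamples this construction in a way not reproduced here.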

Batch-Expansion Training: An Efficient Optimization Framework [article]

Michał Dereziński and Dhruv Mahajan and S. Sathiya Keerthi and S. V. N. Vishwanathan and Markus Weimer
2018 arXiv   pre-print
We propose Batch-Expansion Training (BET), a framework for running a batch optimizer on a gradually expanding dataset.  ...  As opposed to stochastic approaches, batches do not need to be resampled i.i.d. at every iteration, thus making BET more resource-efficient in a distributed setting and when disk access is constrained  ...  Furthermore, the time complexity of performing a single iteration for many of those algorithms (including GD and L-BFGS) is linearly proportional to the data size.  ... 
arXiv:1704.06731v3 fatcat:zxjm4ezou5aptliuexs6bfcdc4
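
A minimal sketch of the batch-expansion idea described here: run a deterministic batch optimizer on a growing prefix of the data, warm-starting from the previous solution. The function name `batch_optimize` and the doubling schedule are illustrative assumptions, not BET's actual parameters.

```python
def batch_expansion_training(data, x0, batch_optimize, initial_size=1024, growth=2.0):
    """batch_optimize(subset, x) runs a batch method (e.g. a few L-BFGS steps)
    on `subset` starting from `x` and returns the new iterate."""
    x, size = x0, initial_size
    while size < len(data):
        x = batch_optimize(data[:int(size)], x)  # optimize on the current prefix
        size *= growth                           # expand the working batch
    return batch_optimize(data, x)               # finish on the full dataset
```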
Showing results 1 — 15 out of 2,100 results