2,100 Hits in 4.4 sec

A Linearly-Convergent Stochastic L-BFGS Algorithm [article]

Philipp Moritz, Robert Nishihara, Michael I. Jordan
2016 arXiv   pre-print
We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions.  ...  Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and Zhang (2013).  ...  Discussion: This paper introduces a stochastic version of L-BFGS and proves a linear rate of convergence in the strongly convex case.  ... 
arXiv:1508.02087v2 fatcat:qffeae2oufgzzhxcmfa6jgknze
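
As a hedged illustration of what this record describes (an SVRG-style variance-reduced gradient feeding a quasi-Newton-type update), here is a minimal Python sketch. It is not the authors' algorithm: the inverse-Hessian multiplication is replaced by the identity for brevity, and names such as `grad_i`, `step_size`, and `inner_steps` are illustrative.

```python
import numpy as np

def svrg_step_sketch(grad_i, n, x0, step_size=0.1, epochs=5, inner_steps=None, seed=0):
    """Variance-reduced (SVRG-style) stochastic descent; grad_i(x, i) is the
    gradient of the i-th component function, n the number of components."""
    rng = np.random.default_rng(seed)
    m = inner_steps or n
    x = x0.copy()
    for _ in range(epochs):
        x_snap = x.copy()
        mu = np.mean([grad_i(x_snap, i) for i in range(n)], axis=0)  # full gradient at the snapshot
        for _ in range(m):
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(x_snap, i) + mu  # variance-reduced gradient estimate
            # A stochastic L-BFGS method would apply an inverse-Hessian
            # approximation H_k to v here; the identity is used instead.
            x -= step_size * v
    return x
```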

Quasi-Newton Methods: Superlinear Convergence Without Line Searches for Self-Concordant Functions [article]

Wenbo Gao, Donald Goldfarb
2018 arXiv   pre-print
We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant functions  ...  of stochastic gradient descent on stochastic optimization problems.  ...  Then the adaptive L-BFGS method is globally R-linearly convergent.  ... 
arXiv:1612.06965v3 fatcat:5dt4s3uemvatpl7enjv43rdcsm

Quasi-Newton methods: superlinear convergence without line searches for self-concordant functions

Wenbo Gao, Donald Goldfarb
2018 Optimization Methods and Software  
We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant functions  ...  of stochastic gradient descent on stochastic optimization problems.  ...  Then the adaptive L-BFGS method is globally R-linearly convergent.  ... 
doi:10.1080/10556788.2018.1510927 fatcat:xtmmqflsq5grplwt3guuysd6ty
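
For reference, the classical inverse-Hessian BFGS update that this line-search-free analysis builds on is (the paper's specific self-concordance-based step size is not reproduced here):

\[
H_{k+1} = \left(I - \rho_k s_k y_k^{\top}\right) H_k \left(I - \rho_k y_k s_k^{\top}\right) + \rho_k s_k s_k^{\top},
\qquad \rho_k = \frac{1}{y_k^{\top} s_k},
\]
where \(s_k = x_{k+1} - x_k\) and \(y_k = \nabla f(x_{k+1}) - \nabla f(x_k)\).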

Recent Advances in Stochastic Gradient Descent in Deep Learning

Yingjie Tian, Yuqi Zhang, Haibin Zhang
2023 Mathematics  
Finally, we propose theoretical conditions under which these methods are applicable and discover that there is still a gap between theoretical conditions under which the algorithms converge and practical  ...  Among machine learning models, stochastic gradient descent (SGD) is not only simple but also very effective.  ...  (Table fragment: combinations of the stochastic method and BFGS; structured quasi-Newton and variance-reduced L-BFGS/block L-BFGS [127-129], listed as globally convergent for strongly convex objectives.)  ... 
doi:10.3390/math11030682 fatcat:6sqjnyl3xnfnpeyco7uvgyq22a
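
For context, the baseline iteration that this survey's variants refine is plain SGD, stated here in its textbook form (not as reproduced from the paper):

\[
x_{k+1} = x_k - \eta_k \nabla f_{i_k}(x_k), \qquad i_k \sim \mathrm{Uniform}\{1,\dots,n\}, \qquad \mathbb{E}\big[\nabla f_{i_k}(x)\big] = \nabla f(x).
\]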

Stochastic Adaptive Quasi-Newton Methods for Minimizing Expected Values

Chaoxu Zhou, Wenbo Gao, Donald Goldfarb
2017 International Conference on Machine Learning  
We show that, given a suitable amount of sampling, the stochastic adaptive GD attains linear convergence in expectation, and with further sampling, the stochastic adaptive BFGS attains R-superlinear convergence  ...  We propose a novel class of stochastic, adaptive methods for minimizing self-concordant functions which can be expressed as an expected value.  ...  Linearly convergent stochastic limited-memory BFGS algorithms (Byrd et al., 2016; Moritz et al., 2016; Gower et al., 2016) have also been proposed.  ... 
dblp:conf/icml/ZhouGG17 fatcat:vu5mzdhvpjdevbrfyk4g2seax4
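
The "suitable amount of sampling" in this record refers to subsampled gradient estimates whose sample sizes grow across iterations; schematically (the paper's exact schedule and adaptive step size are not reproduced here):

\[
g_k = \frac{1}{|S_k|} \sum_{i \in S_k} \nabla f_i(x_k), \qquad |S_k| \uparrow \text{ as } k \to \infty,
\]
so that the gradient noise shrinks fast enough to retain linear (adaptive GD) or R-superlinear (adaptive BFGS) rates.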

Stochastic Damped L-BFGS with Controlled Norm of the Hessian Approximation [article]

Sanae Lotfi and Tiphaine Bonniot de Ruisselet and Dominique Orban and Andrea Lodi
2020 arXiv   pre-print
Our algorithm, VARCHEN, draws from previous work that proposed a novel stochastic damped L-BFGS algorithm called SdLBFGS.  ...  We propose a new stochastic variance-reduced damped L-BFGS algorithm, where we leverage estimates of bounds on the largest and smallest eigenvalues of the Hessian approximation to balance its quality and  ...  [33] proposed a stochastic damped L-BFGS (SdLBFGS) algorithm and proved almost sure convergence to a stationary point.  ... 
arXiv:2012.05783v1 fatcat:nwjtqfjjnbeqjcejbldgaul56i
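
A generic Powell-style damping step, of the kind that damped (L-)BFGS methods such as SdLBFGS refine, is sketched below in Python. This is the textbook version under the assumption that a Hessian approximation B is available explicitly; the exact formulas in SdLBFGS and VARCHEN differ in detail.

```python
import numpy as np

def powell_damped_pair(s, y, B, delta=0.2):
    """Return a damped y_bar so that s @ y_bar stays safely positive,
    keeping the BFGS update well defined even for noisy curvature pairs."""
    sBs = s @ B @ s
    sy = s @ y
    if sy >= delta * sBs:
        theta = 1.0
    else:
        theta = (1.0 - delta) * sBs / (sBs - sy)
    return theta * y + (1.0 - theta) * (B @ s)
```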

Stochastic L-BFGS: Improved Convergence Rates and Practical Acceleration Strategies

Renbo Zhao, William Benjamin Haskell, Vincent Y. F. Tan
2018 IEEE Transactions on Signal Processing  
We revisit the stochastic limited-memory BFGS (L-BFGS) algorithm.  ...  By proposing a new framework for the convergence analysis, we prove improved convergence rates and computational complexities of the stochastic L-BFGS algorithms compared to previous works.  ...  In the literature (Gower et al., 2016; Moritz et al., 2016), option I or II (in Algorithm 1) is usually analyzed theoretically to prove that the stochastic L-BFGS algorithms therein converge linearly  ... 
doi:10.1109/tsp.2017.2784360 fatcat:u7jv3vxpave7jkbxnmpm7gcp54

Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation [article]

Si Yi Meng, Sharan Vaswani, Issam Laradji, Mark Schmidt, Simon Lacoste-Julien
2020 arXiv   pre-print
We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping.  ...  Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence.  ...  A linearly-convergent stochastic L-BFGS algorithm. In Artificial Intelligence and Statistics, pages 249–258, 2016. Yurii Nesterov. Lectures on convex optimization, volume 137. Springer, 2018.  ... 
arXiv:1910.04920v2 fatcat:jfzvxawxdrcp3ocfbnxmzh4fi4
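
"Hessian-free" here refers, as is standard, to using Hessian-vector products rather than an explicit Hessian (for example inside a conjugate-gradient solve). A common finite-difference approximation of such products, given only a gradient oracle, is sketched below; it is a generic illustration, not the paper's implementation.

```python
import numpy as np

def hessian_vector_product(grad, x, v, eps=1e-6):
    """Approximate H(x) @ v with a central difference of two gradient calls."""
    return (grad(x + eps * v) - grad(x - eps * v)) / (2.0 * eps)
```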

An Inexact Variable Metric Proximal Point Algorithm for Generic Quasi-Newton Acceleration [article]

Hongzhou Lin, Zaid Harchaoui
2019 arXiv   pre-print
When combined with limited-memory BFGS rules, QNing is particularly effective for solving high-dimensional optimization problems, while enjoying a worst-case linear convergence rate for strongly convex problems  ...  The proposed scheme, called QNing, can be notably applied to incremental first-order methods such as the stochastic variance-reduced gradient descent algorithm (SVRG) and other randomized incremental optimization  ...  This work was supported by the ERC grant SOLARIS (number 714381), a grant from ANR (MACARON project ANR-14-CE23-0003-01), and the program "Learning in Machines and Brains" (CIFAR).  ... 
arXiv:1610.00960v4 fatcat:flhlxb6pa5athlm6enwswmy6zq
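
The variable metric proximal point viewpoint in this record builds on the Moreau-Yosida envelope; as a reminder (standard definition, not a restatement of QNing's exact scheme), for convex f and smoothing parameter \(\kappa > 0\):

\[
F(x) = \min_{z} \left\{ f(z) + \frac{\kappa}{2}\,\|z - x\|^2 \right\}, \qquad
\nabla F(x) = \kappa\big(x - p(x)\big),
\]
where \(p(x)\) is the minimizer. Quasi-Newton (e.g., L-BFGS) rules are then applied to the smooth surrogate \(F\), while each \(p(x)\) is computed inexactly by an inner solver such as SVRG.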

Stochastic Block BFGS: Squeezing More Curvature out of Data [article]

Robert M. Gower, Donald Goldfarb, Peter Richtárik
2016 arXiv   pre-print
We propose a novel limited-memory stochastic block BFGS update for incorporating enriched curvature information in stochastic approximation methods.  ...  We propose several sketching strategies, present a new quasi-Newton method that uses stochastic block BFGS updates combined with the variance reduction approach SVRG to compute batch stochastic gradients  ...  Convergence: In this section we prove that Algorithm 1 converges linearly.  ... 
arXiv:1603.09649v1 fatcat:7tduh5ikgnchzmcz4n6mvxgo4a
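
Up to notation, the block BFGS inverse update referred to here replaces the single curvature pair of classical BFGS with a sketched block: given a sketch \(D_t \in \mathbb{R}^{n \times q}\) and the curvature action \(Y_t = \nabla^2 f(x_t)\, D_t\),

\[
H_t = D_t \Lambda_t D_t^{\top} + \left(I - D_t \Lambda_t Y_t^{\top}\right) H_{t-1} \left(I - Y_t \Lambda_t D_t^{\top}\right),
\qquad \Lambda_t = \left(D_t^{\top} Y_t\right)^{-1},
\]
which reduces to the usual inverse BFGS update when \(D_t\) has a single column. The paper's limited-memory variant and its combination with SVRG gradients add further structure not shown here.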

Deep Reinforcement Learning via L-BFGS Optimization [article]

Jacob Rafati, Roummel F. Marcia
2019 arXiv   pre-print
Methods for solving the optimization problems in deep RL are restricted to the class of first-order algorithms, such as stochastic gradient descent (SGD).  ...  The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive definite Hessian approximations.  ...  We used a stochastic line search L-BFGS method as the optimization method (Algorithm 2).  ... 
arXiv:1811.02693v2 fatcat:dgqrwcko5vbmddysmhch5fg244
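
Since this record relies on an L-BFGS step inside a (stochastic) line search loop, the standard two-loop recursion for forming the search direction from stored (s, y) pairs is sketched below. This is the textbook recursion, not the authors' Algorithm 2.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Return -H_k @ grad, where H_k is the implicit L-BFGS inverse-Hessian
    approximation built from the stored pairs (oldest first in the lists)."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):  # newest to oldest
        alpha = rho * (s @ q)
        alphas.append(alpha)
        q -= alpha * y
    if s_list:  # standard initial scaling H_0 = gamma * I
        gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):  # oldest to newest
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return -r
```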

Asynchronous Parallel Stochastic Quasi-Newton Methods [article]

Qianqian Tong, Guannan Liang, Xingyu Cai, Chunjiang Zhu, Jinbo Bi
2020 arXiv   pre-print
Adopting the variance reduction technique, a prior stochastic L-BFGS, which was not designed for parallel computing, reaches a linear convergence rate.  ...  Unlike prior attempts, which parallelize only the gradient calculation or the two-loop recursion of L-BFGS, our algorithm is the first one that truly parallelizes L-BFGS with a convergence guarantee  ...  L-BFGS) enjoy a superlinear convergence rate, while the stochastic version of quasi-Newton methods (including L-BFGS) will have a sublinear convergence rate in strongly convex optimization as a sacrifice  ... 
arXiv:2011.00667v1 fatcat:w2so7l3imna73aoyhwpn52qfeq

Practical Quasi-Newton Methods for Training Deep Neural Networks [article]

Donald Goldfarb, Yi Ren, Achraf Bahamou
2021 arXiv   pre-print
We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs).  ...  Consequently, computing and storing a full n × n BFGS approximation or storing a modest number of (step, change in gradient) vector pairs for use in an L-BFGS implementation is out of the question.  ...  In this section, we prove the convergence of Algorithm 5, a variant of K-BFGS(L). Algorithm 5 is very similar to our actual implementation of K-BFGS(L) (i.e.  ... 
arXiv:2006.08877v3 fatcat:oow4oj6kcvf6bpcvlaakluzlwm
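
Schematically, Kronecker-factored quasi-Newton methods of this kind sidestep the "full n × n approximation is out of the question" problem by approximating each layer's curvature block as a Kronecker product and exploiting the inversion identity (stated generically; the paper's exact factors and damping are not reproduced):

\[
B_{\text{layer}} \approx A \otimes G, \qquad (A \otimes G)^{-1} = A^{-1} \otimes G^{-1},
\]
so only the two small factors \(A\) and \(G\) need to be stored and updated (by BFGS/L-BFGS rules in this line of work).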

Stochastic Second-Order Optimization via von Neumann Series [article]

Mojmir Mutny
2017 arXiv   pre-print
A stochastic iterative algorithm approximating second-order information using von Neumann series is discussed. We present convergence guarantees for strongly-convex and smooth functions.  ...  In numerical experiments, the behavior of the error is similar to the second-order algorithm L-BFGS, and improves the performance of LISSA for quadratic objective functions.  ...  Figure 2: A numerical experiment comparing ISSA, LISSA, BFGS and L-BFGS algorithms. In (a), τ = 5 for ISSA, and L-BFGS used the last 5 gradients as well.  ... 
arXiv:1612.04694v4 fatcat:5mr4n6lhy5crnojym2isn5z6li
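
The von Neumann (Neumann series) device behind this record is the classical identity: if A is symmetric positive definite with \(0 \prec A \prec 2I\) (achievable by rescaling), then

\[
A^{-1} = \sum_{k=0}^{\infty} (I - A)^{k},
\]
and truncating the series after t terms gives a computable approximation of the inverse Hessian; the paper's stochastic estimator subsamples this construction in a way not reproduced here.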

Batch-Expansion Training: An Efficient Optimization Framework [article]

Michał Dereziński and Dhruv Mahajan and S. Sathiya Keerthi and S. V. N. Vishwanathan and Markus Weimer
2018 arXiv   pre-print
We propose Batch-Expansion Training (BET), a framework for running a batch optimizer on a gradually expanding dataset.  ...  As opposed to stochastic approaches, batches do not need to be resampled i.i.d. at every iteration, thus making BET more resource-efficient in a distributed setting and when disk access is constrained  ...  Furthermore, the time complexity of performing a single iteration for many of those algorithms (including GD and L-BFGS) is linearly proportional to the data size.  ... 
arXiv:1704.06731v3 fatcat:zxjm4ezou5aptliuexs6bfcdc4
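
A minimal sketch of the batch-expansion idea described here: run a deterministic batch optimizer on a growing prefix of the data, warm-starting from the previous solution. The function name `batch_optimize` and the doubling schedule are illustrative assumptions, not BET's actual parameters.

```python
def batch_expansion_training(data, x0, batch_optimize, initial_size=1024, growth=2.0):
    """batch_optimize(subset, x) runs a batch method (e.g. a few L-BFGS steps)
    on `subset` starting from `x` and returns the new iterate."""
    x, size = x0, initial_size
    while size < len(data):
        x = batch_optimize(data[:int(size)], x)  # optimize on the current prefix
        size *= growth                           # expand the working batch
    return batch_optimize(data, x)               # finish on the full dataset
```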
Showing results 1 — 15 out of 2,100 results