A Linearly-Convergent Stochastic L-BFGS Algorithm
[article], 2016, arXiv pre-print
We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions. ...
Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and ...
Discussion: This paper introduces a stochastic version of L-BFGS and proves a linear rate of convergence in the strongly convex case. ...
arXiv:1508.02087v2
fatcat:qffeae2oufgzzhxcmfa6jgknze
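The first entry above builds on the SVRG variance-reduction idea of Johnson and Zhang: each stochastic gradient is corrected using the same component's gradient at a stored snapshot plus the full gradient at that snapshot. A minimal sketch of that estimator follows; `svrg_gradient`, `grad_i`, and the toy least-squares setup are illustrative names, not code from the paper:

```python
import numpy as np

def svrg_gradient(grad_i, w, w_snapshot, full_grad_snapshot, i):
    """SVRG-style variance-reduced gradient estimate: the component gradient
    at w, corrected by the same component's gradient at a snapshot point,
    plus the full gradient stored at that snapshot."""
    return grad_i(w, i) - grad_i(w_snapshot, i) + full_grad_snapshot

# Toy least-squares problem: f(w) = (1/n) * sum_i 0.5 * (A[i] @ w - b[i])**2
rng = np.random.default_rng(0)
A, b = rng.standard_normal((10, 3)), rng.standard_normal(10)

def grad_i(w, i):
    return (A[i] @ w - b[i]) * A[i]

w_snap = np.zeros(3)
mu = np.mean([grad_i(w_snap, i) for i in range(10)], axis=0)  # full gradient at snapshot

w = rng.standard_normal(3)
g = svrg_gradient(grad_i, w, w_snap, mu, i=4)
```

The estimator is unbiased (its average over all components equals the full gradient at `w`), and its variance vanishes as the iterates approach the snapshot, which is what enables the linear rates proved in these papers.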
Quasi-Newton Methods: Superlinear Convergence Without Line Searches for Self-Concordant Functions
[article], 2018, arXiv pre-print
We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant ...
of stochastic gradient descent on stochastic optimization problems. ...
Then the adaptive L-BFGS method is globally R-linearly convergent. ...
arXiv:1612.06965v3
fatcat:5dt4s3uemvatpl7enjv43rdcsm
Quasi-Newton methods: superlinear convergence without line searches for self-concordant functions
2018, Optimization Methods and Software
We show that using this step size in the BFGS method (and quasi-Newton methods in the Broyden convex class other than the DFP method) results in superlinear convergence for strongly convex self-concordant ...
of stochastic gradient descent on stochastic optimization problems. ...
Then the adaptive L-BFGS method is globally R-linearly convergent. ...
doi:10.1080/10556788.2018.1510927
fatcat:xtmmqflsq5grplwt3guuysd6ty
Recent Advances in Stochastic Gradient Descent in Deep Learning
2023, Mathematics
Finally, we propose theoretical conditions under which these methods are applicable and discover that there is still a gap between theoretical conditions under which the algorithms converge and practical ...
Among machine learning models, stochastic gradient descent (SGD) is not only simple but also very effective. ...
A combination of the stochastic method and BFGS. Strongly convex: globally convergent. Others: structured quasi-Newton, variance-reduced L-BFGS/block L-BFGS [127-129]. Strongly convex: variance-reduced L-BFGS ...
doi:10.3390/math11030682
fatcat:6sqjnyl3xnfnpeyco7uvgyq22a
Stochastic Adaptive Quasi-Newton Methods for Minimizing Expected Values
2017, International Conference on Machine Learning
We show that, given a suitable amount of sampling, the stochastic adaptive GD attains linear convergence in expectation, and with further sampling, the stochastic adaptive BFGS attains R-superlinear convergence ...
We propose a novel class of stochastic, adaptive methods for minimizing self-concordant functions which can be expressed as an expected value. ...
Linearly convergent stochastic limited-memory BFGS algorithms (Byrd et al., 2016; Moritz et al., 2016; Gower et al., 2016) have also been proposed. ...
dblp:conf/icml/ZhouGG17
fatcat:vu5mzdhvpjdevbrfyk4g2seax4
Stochastic Damped L-BFGS with Controlled Norm of the Hessian Approximation
[article], 2020, arXiv pre-print
Our algorithm, VARCHEN, draws from previous work that proposed a novel stochastic damped L-BFGS algorithm called SdLBFGS. ...
We propose a new stochastic variance-reduced damped L-BFGS algorithm, where we leverage estimates of bounds on the largest and smallest eigenvalues of the Hessian approximation to balance its quality and ...
[33] proposed a stochastic damped L-BFGS (SdLBFGS) algorithm and proved almost sure convergence to a stationary point. ...
arXiv:2012.05783v1
fatcat:nwjtqfjjnbeqjcejbldgaul56i
Stochastic L-BFGS: Improved Convergence Rates and Practical Acceleration Strategies
2018, IEEE Transactions on Signal Processing
We revisit the stochastic limited-memory BFGS (L-BFGS) algorithm. ...
By proposing a new framework for the convergence analysis, we prove improved convergence rates and computational complexities of the stochastic L-BFGS algorithms compared to previous works. ...
In the literature (Gower et al., 2016; Moritz et al., 2016), usually option I or II (in Algorithm 1) is analyzed theoretically to prove that the stochastic L-BFGS algorithms therein converge linearly ...
doi:10.1109/tsp.2017.2784360
fatcat:u7jv3vxpave7jkbxnmpm7gcp54
Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation
[article], 2020, arXiv pre-print
We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. ...
Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. ...
A linearly-convergent stochastic L-BFGS algorithm. In Artificial Intelligence and Statistics, pages 249-258, 2016. Yurii Nesterov. Lectures on convex optimization, volume 137. Springer, 2018. ...
arXiv:1910.04920v2
fatcat:jfzvxawxdrcp3ocfbnxmzh4fi4
An Inexact Variable Metric Proximal Point Algorithm for Generic Quasi-Newton Acceleration
[article], 2019, arXiv pre-print
When combined with limited-memory BFGS rules, QNing is particularly effective to solve high-dimensional optimization problems, while enjoying a worst-case linear convergence rate for strongly convex problems ...
The proposed scheme, called QNing, can notably be applied to incremental first-order methods such as the stochastic variance-reduced gradient descent algorithm (SVRG) and other randomized incremental optimization ...
This work was supported by the ERC grant SOLARIS (number 714381), a grant from ANR (MACARON project ANR-14-CE23-0003-01), and the program "Learning in Machines and Brains" (CIFAR). ...
arXiv:1610.00960v4
fatcat:flhlxb6pa5athlm6enwswmy6zq
Stochastic Block BFGS: Squeezing More Curvature out of Data
[article], 2016, arXiv pre-print
We propose a novel limited-memory stochastic block BFGS update for incorporating enriched curvature information in stochastic approximation methods. ...
We propose several sketching strategies, present a new quasi-Newton method that uses stochastic block BFGS updates combined with the variance reduction approach SVRG to compute batch stochastic gradients ...
Convergence In this section we prove that Algorithm 1 converges linearly. ...
arXiv:1603.09649v1
fatcat:7tduh5ikgnchzmcz4n6mvxgo4a
Deep Reinforcement Learning via L-BFGS Optimization
[article], 2019, arXiv pre-print
Methods for solving the optimization problems in deep RL are restricted to the class of first-order algorithms, such as stochastic gradient descent (SGD). ...
The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach is one of the most popular quasi-Newton methods that construct positive definite Hessian approximations. ...
We used a stochastic line-search L-BFGS method as the optimization method (Algorithm 2). ...
arXiv:1811.02693v2
fatcat:dgqrwcko5vbmddysmhch5fg244
Asynchronous Parallel Stochastic Quasi-Newton Methods
[article], 2020, arXiv pre-print
Adopting the variance reduction technique, a prior stochastic L-BFGS, which has not been designed for parallel computing, reaches a linear convergence rate. ...
Unlike prior attempts, which parallelize only the calculation for gradient or the two-loop recursion of L-BFGS, our algorithm is the first one that truly parallelizes L-BFGS with a convergence guarantee ...
L-BFGS) enjoy a superlinear convergence rate, while the stochastic version of quasi-Newton methods (including L-BFGS) will have a sublinear convergence rate in strongly convex optimization as a sacrifice ...
arXiv:2011.00667v1
fatcat:w2so7l3imna73aoyhwpn52qfeq
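The two-loop recursion that this entry parallelizes is the standard way to apply the implicit L-BFGS inverse-Hessian approximation to a gradient using only the stored (step, gradient-change) pairs. A minimal serial sketch, with the common `(s'y / y'y) I` initial scaling (the function name and scaling choice are my own, not taken from the paper):

```python
import numpy as np

def two_loop_recursion(grad, s_list, y_list):
    """L-BFGS two-loop recursion: returns H_k @ grad, where H_k is the
    implicit inverse-Hessian approximation built from the (s, y) pairs
    (ordered oldest to newest)."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    # Initial Hessian approximation H_0 = (s'y / y'y) I, a common choice
    s, y = s_list[-1], y_list[-1]
    r = ((s @ y) / (y @ y)) * q
    # Second loop: oldest pair to newest
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ r)
        r += (a - beta) * s
    return r
```

With a single stored pair, the result satisfies the secant condition exactly (`H @ y == s`), regardless of the initial scaling; that property is a quick sanity check for any implementation.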
Practical Quasi-Newton Methods for Training Deep Neural Networks
[article], 2021, arXiv pre-print
We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs). ...
Consequently, computing and storing a full n × n BFGS approximation or storing a modest number of (step, change in gradient) vector pairs for use in an L-BFGS implementation is out of the question. ...
In this section, we prove the convergence of Algorithm 5, a variant of K-BFGS(L). Algorithm 5 is very similar to our actual implementation of K-BFGS(L) (i.e. ...
arXiv:2006.08877v3
fatcat:oow4oj6kcvf6bpcvlaakluzlwm
Stochastic Second-Order Optimization via von Neumann Series
[article], 2017, arXiv pre-print
A stochastic iterative algorithm approximating second-order information using von Neumann series is discussed. We present convergence guarantees for strongly-convex and smooth functions. ...
In numerical experiments, the behavior of the error is similar to the second-order algorithm L-BFGS, and improves the performance of LISSA for quadratic objective function. ...
Figure 2: A numerical experiment comparing the ISSA, LISSA, BFGS, and L-BFGS algorithms. In (a), τ = 5 for ISSA, and L-BFGS likewise used the last 5 gradients. ...
arXiv:1612.04694v4
fatcat:5mr4n6lhy5crnojym2isn5z6li
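The von Neumann series underlying this entry approximates an inverse via the identity H⁻¹ = Σₖ (I − H)ᵏ, which converges when the eigenvalues of H lie in (0, 2) (in practice the Hessian is rescaled to satisfy this). A truncated sketch of applying it to a vector (the function name is illustrative, not from the paper):

```python
import numpy as np

def neumann_inverse_apply(H, g, n_terms):
    """Approximate H^{-1} @ g by the truncated von Neumann series
    sum_{k=0}^{n_terms-1} (I - H)^k @ g, valid when the eigenvalues
    of H lie in (0, 2). Never forms H^{-1} or (I - H)^k explicitly."""
    result = g.copy()
    term = g.copy()
    for _ in range(n_terms - 1):
        term = term - H @ term  # term <- (I - H) @ term
        result += term
    return result
```

Each extra term costs one matrix-vector product, and the error shrinks geometrically in the spectral radius of I − H, which is the trade-off these stochastic second-order schemes exploit.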
Batch-Expansion Training: An Efficient Optimization Framework
[article], 2018, arXiv pre-print
We propose Batch-Expansion Training (BET), a framework for running a batch optimizer on a gradually expanding dataset. ...
As opposed to stochastic approaches, batches do not need to be resampled i.i.d. at every iteration, thus making BET more resource efficient in a distributed setting, and when disk-access is constrained ...
Furthermore, the time complexity of performing a single iteration for many of those algorithms (including GD and L-BFGS) is linearly proportional to the data size. ...
arXiv:1704.06731v3
fatcat:zxjm4ezou5aptliuexs6bfcdc4
Showing results 1–15 of 2,100 results