82 Hits in 4.9 sec

Is There an Analog of Nesterov Acceleration for MCMC? [article]

Yi-An Ma and Niladri Chatterji and Xiang Cheng and Nicolas Flammarion and Peter Bartlett and Michael I. Jordan
2019 arXiv   pre-print
As an application, we show that accelerated rates can be obtained for a class of nonconvex functions with the Langevin algorithm.  ...  We show that an underdamped form of the Langevin algorithm performs accelerated gradient descent in this metric.  ...  Acknowledgements We would like to thank Jianfeng Lu, Chi Jin, and Nilesh Tripuraneni for many helpful discussions and insights.  ... 
arXiv:1902.00996v2 fatcat:voqhaoreerhdhnxp7bnmtpcb4a

Underdamped Langevin MCMC: A non-asymptotic analysis [article]

Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan
2018 arXiv   pre-print
This is a significant improvement over the best known rate for overdamped Langevin MCMC, which is O(d/ε^2) steps under the same smoothness/concavity assumptions.  ...  The underdamped Langevin MCMC scheme can be viewed as a version of Hamiltonian Monte Carlo (HMC) which has been observed to outperform overdamped Langevin MCMC methods in a number of application areas.  ...  Note 2: We will only be analyzing the solutions to (3) for small t. Think of an integral solution of (3) as a single step of the discrete Langevin MCMC.  ... 
arXiv:1707.03663v7 fatcat:frsh2unczjfn5lm4c5cfu3yoc4
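The entry above analyzes underdamped Langevin MCMC. As a rough illustration of what one discrete step looks like, here is a minimal Euler-Maruyama sketch of the underdamped dynamics for a target proportional to exp(-f(x)); the paper itself integrates the linear part of the SDE exactly, and the friction gamma, inverse mass u, and step size below are illustrative choices, not the paper's settings.

import numpy as np

def underdamped_langevin_step(x, v, grad_f, step=1e-2, gamma=2.0, u=1.0,
                              rng=np.random.default_rng()):
    # One Euler-Maruyama step of
    #   dv = -gamma*v dt - u*grad_f(x) dt + sqrt(2*gamma*u) dB,   dx = v dt,
    # a crude stand-in for the paper's exact integrator.
    noise = rng.standard_normal(np.shape(x))
    v_new = v - step * (gamma * v + u * grad_f(x)) + np.sqrt(2 * gamma * u * step) * noise
    x_new = x + step * v_new
    return x_new, v_new

# Example: sample (approximately) from a standard Gaussian, where grad_f(x) = x.
x, v = np.zeros(2), np.zeros(2)
for _ in range(1000):
    x, v = underdamped_langevin_step(x, v, grad_f=lambda z: z)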

Asynchronous Stochastic Gradient MCMC with Elastic Coupling [article]

Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, Frank Hutter
2016 arXiv   pre-print
We outline a solution strategy for this setting based on stochastic gradient Hamiltonian Monte Carlo sampling (SGHMC) which we alter to include an elastic coupling term that ties together multiple MCMC  ...  First experiments empirically show that the resulting parallel sampler significantly speeds up exploration of the target distribution, when compared to standard SGHMC, and is less prone to the harmful  ...  While this is an interesting problem in its own right and there already exists a considerable amount of literature on running parallel MCMC over sub-sets of data Scott et al.  ... 
arXiv:1612.00767v2 fatcat:s5qk5sq4grb7nj44exvmo72iki
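A minimal sketch of the idea described above, assuming the elastic coupling takes the simple form of a quadratic pull of each chain toward the across-chain average; the coupling constant rho, the friction term, and this particular coupling form are assumptions for illustration, not the paper's exact update.

import numpy as np

def coupled_sghmc_step(thetas, momenta, stoch_grad, step=1e-3, friction=0.1,
                       rho=0.05, rng=np.random.default_rng()):
    # thetas, momenta: arrays of shape (n_chains, dim), one row per MCMC chain.
    center = thetas.mean(axis=0)                # shared anchor tying chains together
    noise = np.sqrt(2 * friction * step) * rng.standard_normal(thetas.shape)
    grads = np.stack([stoch_grad(t) for t in thetas])
    grads = grads + rho * (thetas - center)     # elastic coupling term
    momenta = (1 - friction) * momenta - step * grads + noise
    thetas = thetas + momenta
    return thetas, momenta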

Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines [article]

Dmytro Korenkevych, Yanbo Xue, Zhengbing Bian, Fabian Chudak, William G. Macready, Jason Rolfe, Evgeny Andriyash
2016 arXiv   pre-print
For many classes of problems, QA is known to offer computational advantages over simulated annealing.  ...  We investigate the use of QA for seeding Markov chains as an alternative to contrastive divergence (CD) and persistent contrastive divergence (PCD).  ...  These plots show the evolution, over the SGD iterations, of test-set KL divergences KL(P_Dtest(·) ‖ B(·|θ_t)) for software runs and KL(P_Dtest(·) ‖ P_0(·|θ_t)) for QA runs (the dotted red line is  ... 
arXiv:1611.04528v1 fatcat:6rt5auq4b5capafe5qjc5hsvai
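To make the seeding comparison in the entry above concrete, here is a minimal sketch for a fully visible Boltzmann machine over ±1 spins: CD restarts the Gibbs chains at the data, PCD continues from a persistent state, and a QA-seeded variant would instead start from hardware samples. The energy convention and function names are illustrative assumptions.

import numpy as np

def gibbs_sweep(s, W, b, rng):
    # One Gibbs sweep for a fully visible Boltzmann machine with
    # energy E(s) = -0.5 * s^T W s - b^T s over spins s in {-1, +1}.
    for i in range(len(s)):
        field = W[i] @ s - W[i, i] * s[i] + b[i]      # exclude the self-coupling
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))   # P(s_i = +1 | rest)
        s[i] = 1 if rng.random() < p_plus else -1
    return s

def negative_phase_samples(data, W, b, persistent=None, k=1,
                           rng=np.random.default_rng()):
    # CD-k: persistent is None, so the chains restart at the data.
    # PCD-k: pass the previous chain states back in as `persistent`.
    # QA seeding would replace the starting states with hardware samples.
    chains = data.copy() if persistent is None else persistent
    for _ in range(k):
        for chain in chains:
            gibbs_sweep(chain, W, b, rng)
    return chains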

Accelerated Flow for Probability Distributions [article]

Amirhossein Taghvaei, Prashant G. Mehta
2019 arXiv   pre-print
This paper presents a methodology and numerical algorithms for constructing accelerated gradient flows on the space of probability distributions.  ...  The variational problem is modeled as a mean-field optimal control problem. The maximum principle of optimal control theory is used to derive Hamilton's equations for the optimal gradient flow.  ...  For the case where there is only one particle (N = 1), the interaction term is zero and the system (17) reduces to the Nesterov ODE (5). Remark 3.  ... 
arXiv:1901.03317v2 fatcat:xgf7kziokvaobndzdtmgi4bf3q
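For reference alongside the entry above, the ODE usually written as the continuous-time limit of Nesterov's method is x''(t) + (3/t) x'(t) + ∇f(x(t)) = 0; whether the paper's equation (5) uses exactly this damping coefficient is not visible from the snippet. A crude numerical integration, for illustration only:

import numpy as np

def integrate_nesterov_ode(grad_f, x0, t_end=10.0, dt=1e-3):
    # Explicit Euler integration of x'' + (3/t) x' + grad_f(x) = 0,
    # started slightly after t = 0 to avoid the 3/t singularity.
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    t = dt
    while t < t_end:
        a = -(3.0 / t) * v - grad_f(x)   # acceleration prescribed by the ODE
        v = v + dt * a
        x = x + dt * v
        t += dt
    return x

# Example: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
print(integrate_nesterov_ode(lambda z: z, x0=[2.0, -1.0]))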

Accelerated Algorithms for Convex and Non-Convex Optimization on Manifolds [article]

Lizhen Lin, Bayan Saparbayeva, Michael Minyi Zhang, David B. Dunson
2020 arXiv   pre-print
One of the key challenges for optimization on manifolds is the difficulty of verifying the complexity of the objective function, e.g., whether the objective function is convex or non-convex, and the degree  ...  We show that when the objective function is convex, the algorithm provably converges to the optimum and leads to accelerated convergence.  ...  Acknowledgments Lizhen Lin would like to thank Dong Quan Nguyen for very helpful discussions.  ... 
arXiv:2010.08908v1 fatcat:o3faqvfuercbpjjv7ec5qyg47e

Deep Learning: A Bayesian Perspective

Nicholas G. Polson, Vadim Sokolov
2017 Bayesian Analysis  
Deep learning is a form of machine learning for nonlinear high dimensional pattern matching and prediction.  ...  To illustrate our methodology, we provide an analysis of international bookings on Airbnb. Finally, we conclude with directions for future research.  ...  The momentum-based versions of SGD, or so-called accelerated algorithms, were originally proposed by Nesterov [1983]. For more recent discussion, see Nesterov [2013].  ... 
doi:10.1214/17-ba1082 fatcat:qhkij4i2hnbxtmo4ed2ovlaltq

Sparse Bayesian Lasso via a Variable-Coefficient ℓ_1 Penalty [article]

Nathan Wycoff, Ali Arab, Katharine M. Donato, Lisa O. Singh
2022 arXiv   pre-print
In simulation studies, this gives us the Uncertainty Quantification and low bias properties of simulation-based approaches with an order of magnitude less computation.  ...  One possible solution is sparsity: making inference such that many of the parameters are estimated as being identically 0, which may be imposed through the use of nonsmooth penalties such as the ℓ_1 penalty  ...  We also acknowledge the Georgetown Data Lab and MDI Technical Team for  ... 
arXiv:2211.05089v2 fatcat:okuxdpj5fzbbdmqwpfdifcdsbi
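The mechanism mentioned in the entry above, nonsmooth ℓ_1 penalties forcing some coefficients to be exactly zero, comes down to the soft-thresholding (proximal) operator sketched below; this illustrates the generic ℓ_1 case only, not the paper's variable-coefficient penalty.

import numpy as np

def soft_threshold(beta, lam):
    # Proximal operator of lam * ||.||_1: shrinks every coefficient toward zero
    # and maps any coefficient with |beta_j| <= lam exactly to zero.
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

print(soft_threshold(np.array([3.0, -0.4, 0.05]), lam=0.5))   # [ 2.5 -0.   0. ]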

Accelerated Information Gradient flow [article]

Yifei Wang, Wuchen Li
2022 arXiv   pre-print
We present a framework for Nesterov's accelerated gradient flows in probability space to design efficient mean-field Markov chain Monte Carlo (MCMC) algorithms for Bayesian inverse problems.  ...  For both Fisher-Rao and Wasserstein-2 metrics, we prove convergence properties of accelerated gradient flows.  ...  To accelerate the gradient descent method, Nesterov introduced an accelerated method (Nesterov, 1983): x_k = y_{k-1} − τ_k ∇f(y_{k-1}), y_k = x_k + α_k (x_k − x_{k-1}).  ... 
arXiv:1909.02102v3 fatcat:suk6fg7clfgqzirulc7bmpgyd4
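A minimal sketch of the two-line Nesterov update quoted in the entry above, with a fixed step τ and the common choice α_k = (k − 1)/(k + 2); the paper's own step-size and momentum schedules may differ.

import numpy as np

def nesterov_agd(grad_f, x0, tau=0.1, n_iter=100):
    # x_k = y_{k-1} - tau * grad_f(y_{k-1});  y_k = x_k + alpha_k * (x_k - x_{k-1})
    x_prev = np.array(x0, dtype=float)
    y = x_prev.copy()
    for k in range(1, n_iter + 1):
        x = y - tau * grad_f(y)
        y = x + (k - 1.0) / (k + 2.0) * (x - x_prev)
        x_prev = x
    return x

# Example: minimize f(x) = 0.5 * ||x||^2 (gradient is x).
print(nesterov_agd(lambda z: z, x0=[5.0, -3.0]))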

Variational Optimization on Lie Groups, with Examples of Leading (Generalized) Eigenvalue Problems [article]

Molei Tao, Tomoki Ohsawa
2020 arXiv   pre-print
A particular case of SO(n) is then studied in detail, with objective functions corresponding to leading Generalized EigenValue problems: the Lie-NAG dynamics are first made explicit in coordinates, and  ...  Numerical experiments on both synthetic data and a practical problem (LDA for MNIST) demonstrate the effectiveness of the proposed methods as optimization algorithms (not as a classification method).  ...  Acknowledgements The authors thank Tuo Zhao and Justin Romberg for insightful discussions.  ... 
arXiv:2001.10006v1 fatcat:ob2erldscbbzvifsijfkazraai

Bayesian computation: a summary of the current state, and samples backwards and forwards

Peter J. Green, Krzysztof Łatuszyński, Marcelo Pereyra, Christian P. Robert
2015 Statistics and computing  
However, this impressive evolution in capacity is confronted by an even steeper increase in the complexity of the datasets to be addressed.  ...  Recent decades have seen enormous improvements in computational inference for statistical models; there have been competitive continual enhancements in a wide range of computational tools.  ...  Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution  ... 
doi:10.1007/s11222-015-9574-5 fatcat:mdlw3fdtvjfkxjyo2ivgwdv4oe

Efficient Transition Probability Computation for Continuous-Time Branching Processes via Compressed Sensing

Jason Xu, Vladimir N Minin
2015 Uncertainty in artificial intelligence : proceedings of the ... conference. Conference on Uncertainty in Artificial Intelligence  
We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse.  ...  JX was supported by an NDSEG fellowship.  ... 
pmid:26949377 pmcid:PMC4775097 fatcat:ec2mt6dkpnc65guny3ialatmcq

Bayesian computation: a perspective on the current state, and sampling backwards and forwards [article]

Peter J. Green, Krzysztof Łatuszyński, Marcelo Pereyra, Christian P. Robert (Paris-Dauphine and Warwick)
2015 arXiv   pre-print
However, this impressive evolution in capacity is confronted by an even steeper increase in the complexity of the models and datasets to be addressed.  ...  The difficulties of modelling and then handling ever more complex datasets most likely call for a new type of tool for computational inference that dramatically reduces the dimension and size of the raw  ...  A remarkable property of (12) is that it can be accelerated to converge with rate O(1/k^2), which is optimal for this class of problems (Nesterov 2004).  ... 
arXiv:1502.01148v3 fatcat:hkqwy2o35rf2jgzqpgsoj4uvde

Deep Learning: Computational Aspects [article]

Nicholas Polson, Vadim Sokolov
2019 arXiv   pre-print
Training a deep learning architecture is computationally intensive, and efficient linear algebra libraries are key for training and inference.  ...  Deep learning uses network architectures consisting of hierarchical layers of latent variables to construct predictors for high-dimensional input-output models.  ...  The momentum-based versions of SGD, or so-called accelerated algorithms, were originally proposed by Nesterov (1983). For a more recent discussion, see Nesterov (2013).  ... 
arXiv:1808.08618v2 fatcat:7bysecekdzcv3aiegqkyjdrp4e

Hessian-Free High-Resolution Nesterov Acceleration for Sampling [article]

Ruilin Li, Hongyuan Zha, Molei Tao
2022 arXiv   pre-print
The acceleration effect of the new hyperparameter is quantified, and it is not an artificial one created by time-rescaling.  ...  This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods.  ...  Acknowledgements The authors sincerely thank Michael Tretyakov, Yian Ma, Wenlong Mou, and Lingjiong Zhu for helpful discussions. MT was partially supported by NSF grants DMS-1847802 and ECCS-1936776.  ... 
arXiv:2006.09230v4 fatcat:lw3fp43hjnczpngaykiixsiw74
Showing results 1 — 15 out of 82 results