Is There an Analog of Nesterov Acceleration for MCMC?
[article]
2019
arXiv
pre-print
As an application, we show that accelerated rates can be obtained for a class of nonconvex functions with the Langevin algorithm. ...
We show that an underdamped form of the Langevin algorithm performs accelerated gradient descent in this metric. ...
Acknowledgements We would like to thank Jianfeng Lu, Chi Jin, and Nilesh Tripuraneni for many helpful discussions and insights. ...
arXiv:1902.00996v2
fatcat:voqhaoreerhdhnxp7bnmtpcb4a
Underdamped Langevin MCMC: A non-asymptotic analysis
[article]
2018
arXiv
pre-print
This is a significant improvement over the best known rate for overdamped Langevin MCMC, which is O(d/ε^2) steps under the same smoothness/concavity assumptions. ...
The underdamped Langevin MCMC scheme can be viewed as a version of Hamiltonian Monte Carlo (HMC) which has been observed to outperform overdamped Langevin MCMC methods in a number of application areas. ...
Note 2: We will only be analyzing the solutions to (3) for small t. Think of an integral solution of (3) as a single step of the discrete Langevin MCMC. ...
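The snippet above describes one step of underdamped Langevin MCMC as an integral solution of an SDE over a small time t. A minimal sketch of such a step, using a simple Euler-Maruyama discretization rather than the exact integrator analyzed in the paper (the function name, friction parameter gamma, and target f are illustrative assumptions):

```python
import numpy as np

def underdamped_langevin_step(x, v, grad_f, step, gamma=2.0, u=1.0, rng=None):
    """One crude Euler-Maruyama step of underdamped Langevin dynamics:
        dv = -gamma * v dt - u * grad_f(x) dt + sqrt(2 * gamma * u) dB_t
        dx = v dt
    This is a stand-in for the exact small-t integral solution in the paper.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    noise = np.sqrt(2.0 * gamma * u * step) * rng.standard_normal(np.shape(x))
    v_new = v - step * (gamma * v + u * grad_f(x)) + noise
    x_new = x + step * v_new
    return x_new, v_new

# Sample from a standard Gaussian target, f(x) = ||x||^2 / 2
grad_f = lambda x: x
x, v = np.ones(2), np.zeros(2)
rng = np.random.default_rng(0)
for _ in range(5000):
    x, v = underdamped_langevin_step(x, v, grad_f, step=0.05, rng=rng)
```

The velocity variable v is what distinguishes this from overdamped Langevin and is the source of the accelerated dimension dependence claimed in the abstract.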
arXiv:1707.03663v7
fatcat:frsh2unczjfn5lm4c5cfu3yoc4
Asynchronous Stochastic Gradient MCMC with Elastic Coupling
[article]
2016
arXiv
pre-print
We outline a solution strategy for this setting based on stochastic gradient Hamiltonian Monte Carlo sampling (SGHMC) which we alter to include an elastic coupling term that ties together multiple MCMC ...
First experiments empirically show that the resulting parallel sampler significantly speeds up exploration of the target distribution, when compared to standard SGHMC, and is less prone to the harmful ...
While this is an interesting problem in its own right and there already exists a considerable amount of literature on running parallel MCMC over sub-sets of data Scott et al. ...
arXiv:1612.00767v2
fatcat:s5qk5sq4grb7nj44exvmo72iki
Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines
[article]
2016
arXiv
pre-print
For many classes of problems, QA is known to offer computational advantages over simulated annealing. ...
We investigate the use of QA for seeding Markov chains as an alternative to contrastive divergence (CD) and persistent contrastive divergence (PCD). ...
These plots show the evolution, over the SGD iterations, of test-set KL divergences KL(P_{D_test}(·) ‖ B(·|θ_t)) for software runs and KL(P_{D_test}(·) ‖ P_0(·|θ_t)) for QA runs (the dotted red line is ...
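The divergences tracked in those plots compare an empirical test-set distribution against a model distribution over a finite state space, where KL can be computed directly. A minimal sketch (the helper name and the smoothing constant eps are illustrative assumptions, not from the paper):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence KL(p || q) over a shared finite state space.
    eps guards against log(0) when either distribution has zero mass."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

same = kl_divergence([0.25, 0.75], [0.25, 0.75])   # identical -> 0
diff = kl_divergence([0.5, 0.5], [0.9, 0.1])       # positive for p != q
```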
arXiv:1611.04528v1
fatcat:6rt5auq4b5capafe5qjc5hsvai
Accelerated Flow for Probability Distributions
[article]
2019
arXiv
pre-print
This paper presents a methodology and numerical algorithms for constructing accelerated gradient flows on the space of probability distributions. ...
The variational problem is modeled as a mean-field optimal control problem. The maximum principle of optimal control theory is used to derive Hamilton's equations for the optimal gradient flow. ...
For the case where there is only one particle (N = 1), the interaction term is zero and the system (17) reduces to the Nesterov ODE (5).
Remark 3. ...
arXiv:1901.03317v2
fatcat:xgf7kziokvaobndzdtmgi4bf3q
Accelerated Algorithms for Convex and Non-Convex Optimization on Manifolds
[article]
2020
arXiv
pre-print
One of the key challenges for optimization on manifolds is the difficulty of verifying the complexity of the objective function, e.g., whether the objective function is convex or non-convex, and the degree ...
We show that when the objective function is convex, the algorithm provably converges to the optimum and leads to accelerated convergence. ...
Acknowledgments Lizhen Lin would like to thank Dong Quan Nguyen for very helpful discussions. ...
arXiv:2010.08908v1
fatcat:o3faqvfuercbpjjv7ec5qyg47e
Deep Learning: A Bayesian Perspective
2017
Bayesian Analysis
Deep learning is a form of machine learning for nonlinear high dimensional pattern matching and prediction. ...
To illustrate our methodology, we provide an analysis of international bookings on Airbnb. Finally, we conclude with directions for future research. ...
The momentum-based versions of SGD, or so-called accelerated algorithms, were originally proposed by Nesterov [1983]. For more recent discussion, see Nesterov [2013]. ...
doi:10.1214/17-ba1082
fatcat:qhkij4i2hnbxtmo4ed2ovlaltq
Sparse Bayesian Lasso via a Variable-Coefficient ℓ_1 Penalty
[article]
2022
arXiv
pre-print
In simulation studies, this gives us the Uncertainty Quantification and low bias properties of simulation-based approaches with an order of magnitude less computation. ...
One possible solution is sparsity: making inference such that many of the parameters are estimated as being identically 0, which may be imposed through the use of nonsmooth penalties such as the ℓ_1 penalty ...
We also acknowledge the Georgetown Data Lab and MDI Technical Team for ...
arXiv:2211.05089v2
fatcat:okuxdpj5fzbbdmqwpfdifcdsbi
Accelerated Information Gradient flow
[article]
2022
arXiv
pre-print
We present a framework for Nesterov's accelerated gradient flows in probability space to design efficient mean-field Markov chain Monte Carlo (MCMC) algorithms for Bayesian inverse problems. ...
For both Fisher-Rao and Wasserstein-2 metrics, we prove convergence properties of accelerated gradient flows. ...
To accelerate the gradient descent method, Nesterov introduced an accelerated method (Nesterov, 1983): x_k = y_{k−1} − τ_k ∇f(y_{k−1}), y_k = x_k + α_k (x_k − x_{k−1}). ...
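The two-line Nesterov update quoted in this snippet can be sketched directly; the step size τ, momentum schedule α_k = (k−1)/(k+2), and the quadratic test objective below are illustrative choices, not from the paper:

```python
import numpy as np

def nesterov(grad_f, x0, tau, n_iters, alpha=lambda k: (k - 1) / (k + 2)):
    """Nesterov's accelerated gradient method:
        x_k = y_{k-1} - tau * grad_f(y_{k-1})
        y_k = x_k + alpha_k * (x_k - x_{k-1})
    """
    x_prev = y = np.asarray(x0, dtype=float)
    for k in range(1, n_iters + 1):
        x = y - tau * grad_f(y)        # gradient step at the lookahead point
        y = x + alpha(k) * (x - x_prev)  # momentum extrapolation
        x_prev = x
    return x

# Minimize f(x) = 0.5 * x^T A x, whose unique minimizer is the origin
A = np.diag([1.0, 10.0])
x_star = nesterov(lambda x: A @ x, x0=[5.0, 5.0], tau=0.09, n_iters=300)
```

The accelerated-flow papers in this list study continuous-time and probability-space analogues of exactly this two-step recursion.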
arXiv:1909.02102v3
fatcat:suk6fg7clfgqzirulc7bmpgyd4
Variational Optimization on Lie Groups, with Examples of Leading (Generalized) Eigenvalue Problems
[article]
2020
arXiv
pre-print
A particular case of SO(n) is then studied in detail, with objective functions corresponding to leading Generalized EigenValue problems: the Lie-NAG dynamics are first made explicit in coordinates, and ...
Numerical experiments on both synthetic data and a practical problem (LDA for MNIST) demonstrate the effectiveness of the proposed methods as optimization algorithms (not as a classification method). ...
Acknowledgements The authors thank Tuo Zhao and Justin Romberg for insightful discussions. ...
arXiv:2001.10006v1
fatcat:ob2erldscbbzvifsijfkazraai
Bayesian computation: a summary of the current state, and samples backwards and forwards
2015
Statistics and computing
However, this impressive evolution in capacity is confronted by an even steeper increase in the complexity of the datasets to be addressed. ...
Recent decades have seen enormous improvements in computational inference for statistical models; there have been competitive continual enhancements in a wide range of computational tools. ...
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecomm ons.org/licenses/by/4.0/), which permits unrestricted use, distribution ...
doi:10.1007/s11222-015-9574-5
fatcat:mdlw3fdtvjfkxjyo2ivgwdv4oe
Efficient Transition Probability Computation for Continuous-Time Branching Processes via Compressed Sensing
2015
Uncertainty in artificial intelligence : proceedings of the ... conference. Conference on Uncertainty in Artificial Intelligence
We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse. ...
JX was supported by an NDSEG fellowship. ...
pmid:26949377
pmcid:PMC4775097
fatcat:ec2mt6dkpnc65guny3ialatmcq
Bayesian computation: a perspective on the current state, and sampling backwards and forwards
[article]
2015
arXiv
pre-print
However, this impressive evolution in capacity is confronted by an even steeper increase in the complexity of the models and datasets to be addressed. ...
The difficulties of modelling and then handling ever more complex datasets most likely call for a new type of tool for computational inference that dramatically reduce the dimension and size of the raw ...
A remarkable property of (12) is that it can be accelerated to converge with rate O(1/k²), which is optimal for this class of problems (Nesterov 2004). ...
arXiv:1502.01148v3
fatcat:hkqwy2o35rf2jgzqpgsoj4uvde
Deep Learning: Computational Aspects
[article]
2019
arXiv
pre-print
Training a deep learning architecture is computationally intensive, and efficient linear algebra libraries are key for training and inference. ...
Deep learning uses network architectures consisting of hierarchical layers of latent variables to construct predictors for high-dimensional input-output models. ...
The momentum-based versions of SGD, or so-called accelerated algorithms, were originally proposed by Nesterov (1983). For a more recent discussion, see Nesterov (2013). ...
arXiv:1808.08618v2
fatcat:7bysecekdzcv3aiegqkyjdrp4e
Hessian-Free High-Resolution Nesterov Acceleration for Sampling
[article]
2022
arXiv
pre-print
The acceleration effect of the new hyperparameter is quantified and it is not an artificial one created by time-rescaling. ...
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods. ...
Acknowledgements The authors sincerely thank Michael Tretyakov, Yian Ma, Wenlong Mou, and Lingjiong Zhu for helpful discussions. MT was partially supported by NSF grants DMS-1847802 and ECCS-1936776. ...
arXiv:2006.09230v4
fatcat:lw3fp43hjnczpngaykiixsiw74
Showing results 1 — 15 out of 82 results