82 Hits in 4.9 sec

Is There an Analog of Nesterov Acceleration for MCMC? [article]

Yi-An Ma and Niladri Chatterji and Xiang Cheng and Nicolas Flammarion and Peter Bartlett and Michael I. Jordan
2019 arXiv   pre-print
As an application, we show that accelerated rates can be obtained for a class of nonconvex functions with the Langevin algorithm.  ...  We show that an underdamped form of the Langevin algorithm performs accelerated gradient descent in this metric.  ...  Acknowledgements We would like to thank Jianfeng Lu, Chi Jin, and Nilesh Tripuraneni for many helpful discussions and insights.  ... 
arXiv:1902.00996v2 fatcat:voqhaoreerhdhnxp7bnmtpcb4a

Underdamped Langevin MCMC: A non-asymptotic analysis [article]

Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan
2018 arXiv   pre-print
This is a significant improvement over the best known rate for overdamped Langevin MCMC, which is O(d/ε^2) steps under the same smoothness/concavity assumptions.  ...  The underdamped Langevin MCMC scheme can be viewed as a version of Hamiltonian Monte Carlo (HMC) which has been observed to outperform overdamped Langevin MCMC methods in a number of application areas.  ...  Note 2: We will only be analyzing the solutions to (3) for small t. Think of an integral solution of (3) as a single step of the discrete Langevin MCMC.  ... 
arXiv:1707.03663v7 fatcat:frsh2unczjfn5lm4c5cfu3yoc4
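The entry above analyzes underdamped Langevin MCMC. As a rough illustration of what one discrete step looks like, here is a minimal Euler-Maruyama sketch of the underdamped dynamics for a target proportional to exp(-f(x)); the paper itself integrates the linear part of the SDE exactly, and the friction gamma, inverse mass u, and step size below are illustrative choices, not the paper's settings.

import numpy as np

def underdamped_langevin_step(x, v, grad_f, step=1e-2, gamma=2.0, u=1.0,
                              rng=np.random.default_rng()):
    # One Euler-Maruyama step of
    #   dv = -gamma*v dt - u*grad_f(x) dt + sqrt(2*gamma*u) dB,   dx = v dt,
    # a crude stand-in for the paper's exact integrator.
    noise = rng.standard_normal(np.shape(x))
    v_new = v - step * (gamma * v + u * grad_f(x)) + np.sqrt(2 * gamma * u * step) * noise
    x_new = x + step * v_new
    return x_new, v_new

# Example: sample (approximately) from a standard Gaussian, where grad_f(x) = x.
x, v = np.zeros(2), np.zeros(2)
for _ in range(1000):
    x, v = underdamped_langevin_step(x, v, grad_f=lambda z: z)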

Asynchronous Stochastic Gradient MCMC with Elastic Coupling [article]

Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, Frank Hutter
2016 arXiv   pre-print
We outline a solution strategy for this setting based on stochastic gradient Hamiltonian Monte Carlo sampling (SGHMC) which we alter to include an elastic coupling term that ties together multiple MCMC  ...  First experiments empirically show that the resulting parallel sampler significantly speeds up exploration of the target distribution, when compared to standard SGHMC, and is less prone to the harmful  ...  While this is an interesting problem in its own right and there already exists a considerable amount of literature on running parallel MCMC over sub-sets of data Scott et al.  ... 
arXiv:1612.00767v2 fatcat:s5qk5sq4grb7nj44exvmo72iki
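A minimal sketch of the idea described above, assuming the elastic coupling takes the simple form of a quadratic pull of each chain toward the across-chain average; the coupling constant rho, the friction term, and this particular coupling form are assumptions for illustration, not the paper's exact update.

import numpy as np

def coupled_sghmc_step(thetas, momenta, stoch_grad, step=1e-3, friction=0.1,
                       rho=0.05, rng=np.random.default_rng()):
    # thetas, momenta: arrays of shape (n_chains, dim), one row per MCMC chain.
    center = thetas.mean(axis=0)                # shared anchor tying chains together
    noise = np.sqrt(2 * friction * step) * rng.standard_normal(thetas.shape)
    grads = np.stack([stoch_grad(t) for t in thetas])
    grads = grads + rho * (thetas - center)     # elastic coupling term
    momenta = (1 - friction) * momenta - step * grads + noise
    thetas = thetas + momenta
    return thetas, momenta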

Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines [article]

Dmytro Korenkevych, Yanbo Xue, Zhengbing Bian, Fabian Chudak, William G. Macready, Jason Rolfe, Evgeny Andriyash
2016 arXiv   pre-print
For many classes of problems, QA is known to offer computational advantages over simulated annealing.  ...  We investigate the use of QA for seeding Markov chains as an alternative to contrastive divergence (CD) and persistent contrastive divergence (PCD).  ...  These plots show the evolution, over the SGD iterations, of test-set KL divergences KL(P_Dtest(·) ‖ B(·|θ_t)) for software runs and KL(P_Dtest(·) ‖ P_0(·|θ_t)) for QA runs (the dotted red line is  ... 
arXiv:1611.04528v1 fatcat:6rt5auq4b5capafe5qjc5hsvai
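To make the seeding comparison in the entry above concrete, here is a minimal sketch for a fully visible Boltzmann machine over ±1 spins: CD restarts the Gibbs chains at the data, PCD continues from a persistent state, and a QA-seeded variant would instead start from hardware samples. The energy convention and function names are illustrative assumptions.

import numpy as np

def gibbs_sweep(s, W, b, rng):
    # One Gibbs sweep for a fully visible Boltzmann machine with
    # energy E(s) = -0.5 * s^T W s - b^T s over spins s in {-1, +1}.
    for i in range(len(s)):
        field = W[i] @ s - W[i, i] * s[i] + b[i]      # exclude the self-coupling
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))   # P(s_i = +1 | rest)
        s[i] = 1 if rng.random() < p_plus else -1
    return s

def negative_phase_samples(data, W, b, persistent=None, k=1,
                           rng=np.random.default_rng()):
    # CD-k: persistent is None, so the chains restart at the data.
    # PCD-k: pass the previous chain states back in as `persistent`.
    # QA seeding would replace the starting states with hardware samples.
    chains = data.copy() if persistent is None else persistent
    for _ in range(k):
        for chain in chains:
            gibbs_sweep(chain, W, b, rng)
    return chains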

Accelerated Flow for Probability Distributions [article]

Amirhossein Taghvaei, Prashant G. Mehta
2019 arXiv   pre-print
This paper presents a methodology and numerical algorithms for constructing accelerated gradient flows on the space of probability distributions.  ...  The variational problem is modeled as a mean-field optimal control problem. The maximum principle of optimal control theory is used to derive Hamilton's equations for the optimal gradient flow.  ...  For the case where there is only one particle (N = 1), the interaction term is zero and the system (17) reduces to the Nesterov ODE (5). Remark 3.  ... 
arXiv:1901.03317v2 fatcat:xgf7kziokvaobndzdtmgi4bf3q
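For reference alongside the entry above, the ODE usually written as the continuous-time limit of Nesterov's method is x''(t) + (3/t) x'(t) + ∇f(x(t)) = 0; whether the paper's equation (5) uses exactly this damping coefficient is not visible from the snippet. A crude numerical integration, for illustration only:

import numpy as np

def integrate_nesterov_ode(grad_f, x0, t_end=10.0, dt=1e-3):
    # Explicit Euler integration of x'' + (3/t) x' + grad_f(x) = 0,
    # started slightly after t = 0 to avoid the 3/t singularity.
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    t = dt
    while t < t_end:
        a = -(3.0 / t) * v - grad_f(x)   # acceleration prescribed by the ODE
        v = v + dt * a
        x = x + dt * v
        t += dt
    return x

# Example: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
print(integrate_nesterov_ode(lambda z: z, x0=[2.0, -1.0]))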

Accelerated Algorithms for Convex and Non-Convex Optimization on Manifolds [article]

Lizhen Lin, Bayan Saparbayeva, Michael Minyi Zhang, David B. Dunson
2020 arXiv   pre-print
One of the key challenges for optimization on manifolds is the difficulty of verifying the complexity of the objective function, e.g., whether the objective function is convex or non-convex, and the degree  ...  We show that when the objective function is convex, the algorithm provably converges to the optimum and leads to accelerated convergence.  ...  Acknowledgments Lizhen Lin would like to thank Dong Quan Nguyen for very helpful discussions.  ... 
arXiv:2010.08908v1 fatcat:o3faqvfuercbpjjv7ec5qyg47e

Deep Learning: A Bayesian Perspective

Nicholas G. Polson, Vadim Sokolov
2017 Bayesian Analysis  
Deep learning is a form of machine learning for nonlinear high dimensional pattern matching and prediction.  ...  To illustrate our methodology, we provide an analysis of international bookings on Airbnb. Finally, we conclude with directions for future research.  ...  The momentum-based versions of SGD, or so-called accelerated algorithms, were originally proposed by Nesterov [1983]. For more recent discussion, see Nesterov [2013].  ... 
doi:10.1214/17-ba1082 fatcat:qhkij4i2hnbxtmo4ed2ovlaltq

Sparse Bayesian Lasso via a Variable-Coefficient ℓ_1 Penalty [article]

Nathan Wycoff, Ali Arab, Katharine M. Donato, Lisa O. Singh
2022 arXiv   pre-print
In simulation studies, this gives us the Uncertainty Quantification and low bias properties of simulation-based approaches with an order of magnitude less computation.  ...  One possible solution is sparsity: making inference such that many of the parameters are estimated as being identically 0, which may be imposed through the use of nonsmooth penalties such as the ℓ_1 penalty  ...  We also acknowledge the Georgetown Data Lab and MDI Technical Team for  ... 
arXiv:2211.05089v2 fatcat:okuxdpj5fzbbdmqwpfdifcdsbi
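The mechanism mentioned in the entry above, nonsmooth ℓ_1 penalties forcing some coefficients to be exactly zero, comes down to the soft-thresholding (proximal) operator sketched below; this illustrates the generic ℓ_1 case only, not the paper's variable-coefficient penalty.

import numpy as np

def soft_threshold(beta, lam):
    # Proximal operator of lam * ||.||_1: shrinks every coefficient toward zero
    # and maps any coefficient with |beta_j| <= lam exactly to zero.
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

print(soft_threshold(np.array([3.0, -0.4, 0.05]), lam=0.5))   # [ 2.5 -0.   0. ]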

Accelerated Information Gradient flow [article]

Yifei Wang, Wuchen Li
2022 arXiv   pre-print
We present a framework for Nesterov's accelerated gradient flows in probability space to design efficient mean-field Markov chain Monte Carlo (MCMC) algorithms for Bayesian inverse problems.  ...  For both Fisher-Rao and Wasserstein-2 metrics, we prove convergence properties of accelerated gradient flows.  ...  To accelerate the gradient descent method, Nesterov introduced an accelerated method (Nesterov, 1983): x_k = y_{k-1} − τ_k ∇f(y_{k-1}), y_k = x_k + α_k (x_k − x_{k-1}).  ... 
arXiv:1909.02102v3 fatcat:suk6fg7clfgqzirulc7bmpgyd4
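A minimal sketch of the two-line Nesterov update quoted in the entry above, with a fixed step τ and the common choice α_k = (k − 1)/(k + 2); the paper's own step-size and momentum schedules may differ.

import numpy as np

def nesterov_agd(grad_f, x0, tau=0.1, n_iter=100):
    # x_k = y_{k-1} - tau * grad_f(y_{k-1});  y_k = x_k + alpha_k * (x_k - x_{k-1})
    x_prev = np.array(x0, dtype=float)
    y = x_prev.copy()
    for k in range(1, n_iter + 1):
        x = y - tau * grad_f(y)
        y = x + (k - 1.0) / (k + 2.0) * (x - x_prev)
        x_prev = x
    return x

# Example: minimize f(x) = 0.5 * ||x||^2 (gradient is x).
print(nesterov_agd(lambda z: z, x0=[5.0, -3.0]))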

Variational Optimization on Lie Groups, with Examples of Leading (Generalized) Eigenvalue Problems [article]

Molei Tao, Tomoki Ohsawa
2020 arXiv   pre-print
A particular case of SO(n) is then studied in detail, with objective functions corresponding to leading Generalized EigenValue problems: the Lie-NAG dynamics are first made explicit in coordinates, and  ...  Numerical experiments on both synthetic data and a practical problem (LDA for MNIST) demonstrate the effectiveness of the proposed methods as optimization algorithms (not as a classification method).  ...  Acknowledgements The authors thank Tuo Zhao and Justin Romberg for insightful discussions.  ... 
arXiv:2001.10006v1 fatcat:ob2erldscbbzvifsijfkazraai

Bayesian computation: a summary of the current state, and samples backwards and forwards

Peter J. Green, Krzysztof Łatuszyński, Marcelo Pereyra, Christian P. Robert
2015 Statistics and computing  
However, this impressive evolution in capacity is confronted by an even steeper increase in the complexity of the datasets to be addressed.  ...  Recent decades have seen enormous improvements in computational inference for statistical models; there have been competitive continual enhancements in a wide range of computational tools.  ...  Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution  ... 
doi:10.1007/s11222-015-9574-5 fatcat:mdlw3fdtvjfkxjyo2ivgwdv4oe

Efficient Transition Probability Computation for Continuous-Time Branching Processes via Compressed Sensing

Jason Xu, Vladimir N Minin
2015 Uncertainty in artificial intelligence : proceedings of the ... conference. Conference on Uncertainty in Artificial Intelligence  
We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse.  ...  JX was supported by an NDSEG fellowship.  ... 
pmid:26949377 pmcid:PMC4775097 fatcat:ec2mt6dkpnc65guny3ialatmcq

Bayesian computation: a perspective on the current state, and sampling backwards and forwards [article]

Peter J. Green, Krzysztof Łatuszyński, Marcelo Pereyra, Christian P. Robert (Paris-Dauphine and Warwick)
2015 arXiv   pre-print
However, this impressive evolution in capacity is confronted by an even steeper increase in the complexity of the models and datasets to be addressed.  ...  The difficulties of modelling and then handling ever more complex datasets most likely call for a new type of tool for computational inference that dramatically reduces the dimension and size of the raw  ...  A remarkable property of (12) is that it can be accelerated to converge with rate O(1/k^2), which is optimal for this class of problems (Nesterov 2004).  ... 
arXiv:1502.01148v3 fatcat:hkqwy2o35rf2jgzqpgsoj4uvde

Deep Learning: Computational Aspects [article]

Nicholas Polson, Vadim Sokolov
2019 arXiv   pre-print
Training a deep learning architecture is computationally intensive, and efficient linear algebra libraries are key for training and inference.  ...  Deep learning uses network architectures consisting of hierarchical layers of latent variables to construct predictors for high-dimensional input-output models.  ...  The momentum-based versions of SGD, or so-called accelerated algorithms, were originally proposed by Nesterov (1983). For a more recent discussion, see Nesterov (2013).  ... 
arXiv:1808.08618v2 fatcat:7bysecekdzcv3aiegqkyjdrp4e

Hessian-Free High-Resolution Nesterov Acceleration for Sampling [article]

Ruilin Li, Hongyuan Zha, Molei Tao
2022 arXiv   pre-print
The acceleration effect of the new hyperparameter is quantified, and it is not an artificial one created by time-rescaling.  ...  This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods.  ...  Acknowledgements The authors sincerely thank Michael Tretyakov, Yian Ma, Wenlong Mou, and Lingjiong Zhu for helpful discussions. MT was partially supported by NSF grants DMS-1847802 and ECCS-1936776.  ... 
arXiv:2006.09230v4 fatcat:lw3fp43hjnczpngaykiixsiw74
Showing results 1 — 15 out of 82 results