Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient
[article]
2020
arXiv
pre-print
In this paper, we develop a new stochastic compositional variance-reduced gradient algorithm with a sample complexity of O((m+n)log(1/ϵ)+1/ϵ^3), where m+n is the total number of samples. ...
Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multi-stage stochastic programming. ...
CONCLUSIONS We propose a stochastic compositional variance gradient algorithm for convex composition optimization with an improved sample complexity. ...
arXiv:1806.00458v5
fatcat:2u3hxtqg4nao3iom4owbas7ire
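The entry above concerns compositional objectives F(x) = f(E[g(x)]), where a plain stochastic gradient of g cannot yield an unbiased gradient of F. A minimal sketch of the basic compositional idea (an SCGD-style inner-estimate tracker, not the paper's variance-reduced SCVRG algorithm; the toy objective and all step sizes below are illustrative assumptions):

```python
import numpy as np

# Sketch of a stochastic compositional gradient step: the objective is
# F(x) = f(mean_i g_i(x)), and a running scalar y tracks the inner mean,
# since a single sample g_i(x) plugged into grad_f would give a biased gradient.

rng = np.random.default_rng(0)
n = 50
a = rng.normal(size=n)          # toy data: g_i(x) = x - a_i

def g_i(x, i):                  # inner component functions
    return x - a[i]

def grad_g_i(x, i):             # derivative of g_i w.r.t. x
    return 1.0

def grad_f(u):                  # outer function f(u) = 0.5 u^2
    return u

x, y = 5.0, 0.0                 # y is the running estimate of g(x)
eta, beta = 0.05, 0.1           # step size and inner-tracking rate
for t in range(2000):
    i = rng.integers(n)
    y = (1 - beta) * y + beta * g_i(x, i)      # track the inner expectation
    x = x - eta * grad_g_i(x, i) * grad_f(y)   # chain-rule step with tracked y

print(x - a.mean())             # minimizer of F(x) = 0.5*(x - mean(a))^2
```

Variance reduction (as in SCVRG) then replaces both the tracker and the gradient with control-variate estimates, which is what drives the improved O((m+n)log(1/ϵ)+1/ϵ^3) complexity.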
Efficient Smooth Non-Convex Stochastic Compositional Optimization via Stochastic Recursive Gradient Descent
2019
Neural Information Processing Systems
Such a complexity is known to be the best one among IFO complexity results for non-convex stochastic compositional optimization. ...
We employ a recently developed idea of Stochastic Recursive Gradient Descent to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper ...
Conclusion In this paper, we propose a novel algorithm called SARAH-Compositional for solving stochastic compositional optimization problems using the idea of a recently proposed variance reduced gradient ...
dblp:conf/nips/YuanLLLH19
fatcat:l3ld7pyycbdjnmdca7jvjxl2qq
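The recursive estimator underlying SARAH-Compositional can be illustrated on a plain finite-sum problem. A hedged sketch (ordinary least squares, not the compositional variant; problem sizes and step size are illustrative assumptions):

```python
import numpy as np

# SARAH recursive gradient estimator: at the start of each epoch the full
# gradient is computed once, and inside the epoch the estimator is updated
# recursively as v_t = grad_i(x_t) - grad_i(x_{t-1}) + v_{t-1}.

rng = np.random.default_rng(1)
n, d = 40, 3
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):               # gradient of f_i(x) = 0.5*(a_i.x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):               # gradient of (1/n) * sum_i f_i(x)
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
eta = 0.02
for epoch in range(50):
    v = full_grad(x)            # anchor: exact gradient at epoch start
    x_prev = x.copy()
    x = x - eta * v
    for t in range(n):
        i = rng.integers(n)
        v = grad_i(x, i) - grad_i(x_prev, i) + v   # recursive update
        x_prev = x.copy()
        x = x - eta * v

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_star))
```

Unlike SVRG, the estimator is biased but its error telescopes along the trajectory, which is what yields the sharp IFO bounds cited above.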
Accelerated Method for Stochastic Composition Optimization With Nonsmooth Regularization
2018
PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE
; for the general composition problem, our algorithm significantly improves the state-of-the-art convergence rate from O(T^{-1/2}) to O((n_1+n_2)^{2/3} T^{-1}). ...
To the best of our knowledge, our method admits the fastest convergence rate for stochastic composition optimization: for strongly convex composition problem, our algorithm is proved to admit linear convergence ...
[Algorithm excerpt, eq. (8): the inner loop updates via a proximal step on (x^{s+1}_t − η v^{s+1}_t); at the end of each inner loop, x^{s+1} ← x^{s+1}_m.]
Variance Reduced Stochastic Compositional Proximal Gradient In this section, we propose variance reduced stochastic compositional ...
doi:10.1609/aaai.v32i1.11795
fatcat:tzozwu24undftfo2frzjxwgoem
Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex Compositional Optimization
[article]
2020
arXiv
pre-print
Such a complexity is known to be the best one among IFO complexity results for non-convex stochastic compositional optimization, and is believed to be optimal. ...
We employ a recently developed idea of Stochastic Recursive Gradient Descent to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper ...
Improved oracle complexity for stochastic compositional variance reduced gradient. arXiv preprint arXiv:1806.00458, 2018. Liu Liu, Ji Liu, and Dacheng Tao. ...
arXiv:1912.13515v2
fatcat:764ag2w3ujaydpseimqu6mi2ra
Closing the Gap: Tighter Analysis of Alternating Stochastic Gradient Methods for Bilevel Problems
2021
Neural Information Processing Systems
Under certain regularity conditions, applying our results to stochastic compositional, min-max, and reinforcement learning problems either improves or matches the best-known sample complexity in the respective ...
This paper unifies several SGD-type updates for stochastic nested problems into a single SGD approach that we term ALternating Stochastic gradient dEscenT (ALSET) method. ...
As a by-product, this general result also improves the existing sample complexity of the min-max and compositional cases. It matches the sample complexity of SGD for single-level stochastic problems. ...
dblp:conf/nips/ChenSY21
fatcat:b6r6djgpcbf73ajblcoptppuuy
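The alternating structure analyzed in this entry can be sketched on a toy bilevel problem. This is a hedged illustration in the spirit of single-timescale alternating SGD, not the paper's exact ALSET method; the quadratic inner/outer objectives, noise scale, and step sizes are all assumptions chosen so the hypergradient is available in closed form:

```python
import numpy as np

# Toy stochastic bilevel problem:
#   inner:  y*(x) = argmin_y E[0.5*(y - x - xi)^2]   (so y*(x) = x),
#   outer:  f(x, y) = 0.5*x^2 + 0.5*y^2, hence F(x) = x^2, minimized at x = 0.
# For this quadratic inner problem dy*/dx = 1, which the outer step uses.
# Each iteration alternates ONE noisy inner SGD step with ONE outer step
# that reuses the current y instead of solving the inner problem exactly.

rng = np.random.default_rng(2)
x, y = 3.0, 0.0
alpha, beta = 0.05, 0.2          # outer and inner step sizes
for t in range(3000):
    xi = 0.1 * rng.normal()      # stochastic noise in the inner gradient
    y = y - beta * (y - x - xi)           # inner SGD step (tracks y* = x)
    hypergrad = x + 1.0 * y               # df/dx + (dy*/dx) * df/dy
    x = x - alpha * hypergrad             # outer SGD step

print(x, y)
```

The tighter analysis in the paper shows that this kind of single-loop alternation, without inner-loop restarts, already matches the sample complexity of SGD on single-level problems.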
Improved Oracle Complexity of Variance Reduced Methods for Nonsmooth Convex Stochastic Composition Optimization
[article]
2019
arXiv
pre-print
We consider the nonsmooth convex composition optimization problem where the objective is a composition of two finite-sum functions and analyze stochastic compositional variance reduced gradient (SCVRG) ...
More specifically, our method achieves a total IFO complexity of O((m+n)log(1/ϵ)+1/ϵ^3), which improves on the O(1/ϵ^{3.5}) and O((m+n)/√ϵ) complexities obtained by SCGD and accelerated gradient descent (AGD), respectively ...
Fast stochastic variance reduced admm for stochastic composition optimization. ...
arXiv:1802.02339v7
fatcat:b55jnnsukngwddiqevwtjxxkfy
Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks
[article]
2021
arXiv
pre-print
We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding ...
However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. ...
reduce the variance in unbiased stochastic gradients. ...
arXiv:2006.13866v2
fatcat:em7gsyj23vcrbfgkt4dzwo45fa
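The variance-reduction principle behind this entry's sampler can be shown in isolation. A hedged illustration of minimal-variance importance sampling in general, not the paper's GNN-specific decoupled scheme (the toy gradient values are synthetic):

```python
import numpy as np

# Sampling index i with probability p_i proportional to |g_i|, and reweighting
# the sample by 1/(n*p_i), keeps the gradient estimator unbiased while
# minimizing its variance among all single-sample unbiased estimators
# (a consequence of the Cauchy-Schwarz inequality).

rng = np.random.default_rng(3)
n = 200
grads = rng.exponential(scale=1.0, size=n) * rng.choice([-1, 1], size=n)
mean_grad = grads.mean()

def estimator_var(p):
    # variance of g_i / (n * p_i) with i ~ p
    vals = grads / (n * p)
    return np.sum(p * vals**2) - mean_grad**2

p_uniform = np.full(n, 1.0 / n)
p_norm = np.abs(grads) / np.abs(grads).sum()   # importance-sampling weights

print(estimator_var(p_uniform), estimator_var(p_norm))
```

The paper's contribution is to make such gradient-dependent probabilities cheap to maintain during GNN training, where exact per-node gradient norms are unavailable.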
Momentum Schemes with Stochastic Variance Reduction for Nonconvex Composite Optimization
[article]
2019
arXiv
pre-print
Two new stochastic variance-reduced algorithms named SARAH and SPIDER have been recently proposed, and SPIDER has been shown to achieve a near-optimal gradient oracle complexity for nonconvex optimization ...
the near-optimal gradient oracle complexity for achieving a generalized first-order stationary condition. ...
Such an issue has been successfully resolved by using more advanced stochastic variance-reduced gradient estimators that induce a smaller variance, leading to the design of a variety of stochastic variance-reduced ...
arXiv:1902.02715v3
fatcat:arhpvqvorngv3pqlktcslgkbbu
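A hedged sketch of a momentum-style variance-reduced estimator of the kind this entry studies (a STORM/hybrid-style recursion, not the paper's exact scheme; the least-squares problem and constants are illustrative assumptions):

```python
import numpy as np

# Momentum-blended recursive estimator:
#   v_t = g_i(x_t) + (1 - a) * (v_{t-1} - g_i(x_{t-1}))
# which interpolates between plain SGD (a = 1) and the SARAH correction
# (a = 0), and needs no periodic full-gradient pass.

rng = np.random.default_rng(5)
n, d = 50, 3
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
x_prev = x.copy()
v = grad_i(x, rng.integers(n))
eta, a = 0.02, 0.1
for t in range(5000):
    i = rng.integers(n)
    v = grad_i(x, i) + (1 - a) * (v - grad_i(x_prev, i))
    x_prev = x.copy()
    x = x - eta * v

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_star))
```

With a fixed blend weight the iterate settles into a small noise ball around the optimum; decaying `a` over time recovers the near-optimal oracle complexity discussed in the abstract.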
Stochastic Gradient Made Stable: A Manifold Propagation Approach for Large-Scale Optimization
[article]
2016
arXiv
pre-print
A stochastic gradient is typically calculated from a limited number of samples (known as a mini-batch), so it potentially incurs a high variance and causes the estimated parameters to bounce around the optimal ...
This way S3GD is able to generate a highly-accurate estimate of the exact gradient from each mini-batch with largely-reduced computational complexity. ...
For example, the work in [28] explicitly expresses the stochastic gradient variance and proves that constructing mini-batch using special nonuniform sampling strategy is able to reduce the stochastic ...
arXiv:1506.08350v2
fatcat:hw4n7r64p5ddfic7lacyofjjza
Fast Training Method for Stochastic Compositional Optimization Problems
2021
Neural Information Processing Systems
To address this problem, we propose novel decentralized stochastic compositional gradient descent methods to efficiently train the large-scale stochastic compositional optimization problem. ...
Existing methods for the stochastic compositional optimization problem only focus on the single machine scenario, which is far from satisfactory when data are distributed on different devices. ...
But it has a worse convergence rate than the standard stochastic gradient descent method. To improve it, a series of variance-reduced methods have been proposed. ...
dblp:conf/nips/GaoH21
fatcat:lf4sg4ivefahxdpi6yf2pdwt2q
Achieving Linear Speedup in Decentralized Stochastic Compositional Minimax Optimization
[article]
2023
arXiv
pre-print
To address this issue, we developed a novel decentralized stochastic compositional gradient descent ascent with momentum algorithm to reduce the consensus error in the inner-level function. ...
The stochastic compositional minimax problem has attracted a surge of attention in recent years since it covers many emerging machine learning models. ...
In addition, some efforts have been made to improve the sample complexity and communication complexity by compressing the communicated variables [18, 11] or reducing the gradient variance [17, 33] . ...
arXiv:2307.13430v2
fatcat:esxqcr3odfbrlnv7j4bffufpiq
On the Convergence of Local Stochastic Compositional Gradient Descent with Momentum
2022
International Conference on Machine Learning
In this paper, we developed a novel local stochastic compositional gradient descent with momentum method, which facilitates Federated Learning for the stochastic compositional problem. ...
Meanwhile, our communication complexity O(1/ϵ^3) can match existing methods. To the best of our knowledge, this is the first work achieving such favorable sample and communication complexities. ...
2021) and reducing the variance of stochastic gradients (Khanduri et al., 2021; Karimireddy et al., 2020a; Das et al., 2020) . ...
dblp:conf/icml/GaoLH22
fatcat:mhwjrykvtzg2re5luccjcxjiuu
Optimal Algorithms for Stochastic Multi-Level Compositional Optimization
[article]
2022
arXiv
pre-print
To address these limitations, we propose a Stochastic Multi-level Variance Reduction method (SMVR), which achieves the optimal sample complexity of 𝒪(1 / ϵ^3) to find an ϵ-stationary point for non-convex ...
Furthermore, when the objective function satisfies the convexity or Polyak-Łojasiewicz (PL) condition, we propose a stage-wise variant of SMVR and improve the sample complexity to 𝒪(1 / ϵ^2) for convex ...
The authors would like to thank the anonymous reviewers for their helpful comments. ...
arXiv:2202.07530v4
fatcat:42fwzrnfqbf7jlltx75ibpfuou
Randomized Smoothing SVRG for Large-scale Nonsmooth Convex Optimization
[article]
2018
arXiv
pre-print
We develop and analyze a new algorithm that achieves a robust linear convergence rate, and both its time complexity and gradient complexity are superior to state-of-the-art nonsmooth algorithms and subgradient-based ...
In ( [Johnson and Zhang, 2013] ) and its proximal extension in ( [Xiao and Zhang, 2014] ), stochastic variance reduced gradient (SVRG) is proposed that reduces the variance of stochastic gradient descent ...
Actually, the same reduced-variance bounds and convergence rate hold for problems with or without the composite function R(x). ...
arXiv:1805.05189v1
fatcat:7mjcgk7acrg2zgw3nv2tqc5cr4
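The SVRG estimator that this entry builds on is simple to state. A hedged sketch of plain SVRG (Johnson & Zhang, 2013) on a smooth finite-sum least-squares problem; the randomized-smoothing layer of the paper above is omitted, and the problem sizes and step size are illustrative assumptions:

```python
import numpy as np

# SVRG: at each epoch a full gradient mu is computed at a snapshot x_snap,
# and inner steps use grad_i(x) - grad_i(x_snap) + mu, which is unbiased and
# whose variance vanishes as both x and x_snap approach the optimum.

rng = np.random.default_rng(4)
n, d = 60, 4
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
eta = 0.02
for epoch in range(30):
    x_snap = x.copy()
    mu = A.T @ (A @ x_snap - b) / n        # full gradient at the snapshot
    for t in range(2 * n):
        i = rng.integers(n)
        v = grad_i(x, i) - grad_i(x_snap, i) + mu   # variance-reduced gradient
        x = x - eta * v

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_star))
```

The snapshot-based correction is what allows a constant step size and linear convergence in the strongly convex case, in contrast to the decaying steps SGD requires.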
Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient
[article]
2020
arXiv
pre-print
Stochastic variance-reduced gradient methods such as SVRG have been applied to reduce the estimation variance (Zhao et al. 2019). ...
Deep Q-learning algorithms often suffer from poor gradient estimations with an excessive variance, resulting in unstable training and poor sampling efficiency. ...
of our stochastic recursive gradient for the variance reduction in DQN. ...
arXiv:2007.12817v1
fatcat:n4ta5mxeyjeulbilojziu7kvnq
Showing results 1 — 15 out of 26,397 results