Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient
[article]
2020
arXiv
pre-print
In this paper, we develop a new stochastic compositional variance-reduced gradient algorithm with a sample complexity of O((m+n)log(1/ϵ)+1/ϵ^3), where m+n is the total number of samples. ...
Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multi-stage stochastic programming. ...
CONCLUSIONS We propose a stochastic compositional variance gradient algorithm for convex composition optimization with an improved sample complexity. ...
arXiv:1806.00458v5
fatcat:2u3hxtqg4nao3iom4owbas7ire
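The entry above concerns compositional objectives F(x) = f(E[g(x)]), where a plain stochastic gradient of g cannot yield an unbiased gradient of F. A minimal sketch of the basic compositional idea (an SCGD-style inner-estimate tracker, not the paper's variance-reduced SCVRG algorithm; the toy objective and all step sizes below are illustrative assumptions):

```python
import numpy as np

# Sketch of a stochastic compositional gradient step: the objective is
# F(x) = f(mean_i g_i(x)), and a running scalar y tracks the inner mean,
# since a single sample g_i(x) plugged into grad_f would give a biased gradient.

rng = np.random.default_rng(0)
n = 50
a = rng.normal(size=n)          # toy data: g_i(x) = x - a_i

def g_i(x, i):                  # inner component functions
    return x - a[i]

def grad_g_i(x, i):             # derivative of g_i w.r.t. x
    return 1.0

def grad_f(u):                  # outer function f(u) = 0.5 u^2
    return u

x, y = 5.0, 0.0                 # y is the running estimate of g(x)
eta, beta = 0.05, 0.1           # step size and inner-tracking rate
for t in range(2000):
    i = rng.integers(n)
    y = (1 - beta) * y + beta * g_i(x, i)      # track the inner expectation
    x = x - eta * grad_g_i(x, i) * grad_f(y)   # chain-rule step with tracked y

print(x - a.mean())             # minimizer of F(x) = 0.5*(x - mean(a))^2
```

Variance reduction (as in SCVRG) then replaces both the tracker and the gradient with control-variate estimates, which is what drives the improved O((m+n)log(1/ϵ)+1/ϵ^3) complexity.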
Efficient Smooth Non-Convex Stochastic Compositional Optimization via Stochastic Recursive Gradient Descent
2019
Neural Information Processing Systems
Such a complexity is known to be the best one among IFO complexity results for non-convex stochastic compositional optimization. ...
We employ a recently developed idea of Stochastic Recursive Gradient Descent to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper ...
Conclusion In this paper, we propose a novel algorithm called SARAH-Compositional for solving stochastic compositional optimization problems using the idea of a recently proposed variance reduced gradient ...
dblp:conf/nips/YuanLLLH19
fatcat:l3ld7pyycbdjnmdca7jvjxl2qq
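The recursive estimator underlying SARAH-Compositional can be illustrated on a plain finite-sum problem. A hedged sketch (ordinary least squares, not the compositional variant; problem sizes and step size are illustrative assumptions):

```python
import numpy as np

# SARAH recursive gradient estimator: at the start of each epoch the full
# gradient is computed once, and inside the epoch the estimator is updated
# recursively as v_t = grad_i(x_t) - grad_i(x_{t-1}) + v_{t-1}.

rng = np.random.default_rng(1)
n, d = 40, 3
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):               # gradient of f_i(x) = 0.5*(a_i.x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):               # gradient of (1/n) * sum_i f_i(x)
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
eta = 0.02
for epoch in range(50):
    v = full_grad(x)            # anchor: exact gradient at epoch start
    x_prev = x.copy()
    x = x - eta * v
    for t in range(n):
        i = rng.integers(n)
        v = grad_i(x, i) - grad_i(x_prev, i) + v   # recursive update
        x_prev = x.copy()
        x = x - eta * v

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_star))
```

Unlike SVRG, the estimator is biased but its error telescopes along the trajectory, which is what yields the sharp IFO bounds cited above.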
Accelerated Method for Stochastic Composition Optimization With Nonsmooth Regularization
2018
PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE
; for the general composition problem, our algorithm significantly improves the state-of-the-art convergence rate from O(T^{-1/2}) to O((n_1+n_2)^{2/3} T^{-1}). ...
To the best of our knowledge, our method admits the fastest convergence rate for stochastic composition optimization: for strongly convex composition problem, our algorithm is proved to admit linear convergence ...
[Algorithm excerpt, eq. (8): the inner loop updates via a proximal step on (x^{s+1}_t − η v^{s+1}_t); at the end of each inner loop, x^{s+1} ← x^{s+1}_m.]
Variance Reduced Stochastic Compositional Proximal Gradient In this section, we propose variance reduced stochastic compositional ...
doi:10.1609/aaai.v32i1.11795
fatcat:tzozwu24undftfo2frzjxwgoem
Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex Compositional Optimization
[article]
2020
arXiv
pre-print
Such a complexity is known to be the best one among IFO complexity results for non-convex stochastic compositional optimization, and is believed to be optimal. ...
We employ a recently developed idea of Stochastic Recursive Gradient Descent to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper ...
Improved oracle complexity for stochastic compositional variance reduced gradient. arXiv preprint arXiv:1806.00458, 2018. Liu Liu, Ji Liu, and Dacheng Tao. ...
arXiv:1912.13515v2
fatcat:764ag2w3ujaydpseimqu6mi2ra
Closing the Gap: Tighter Analysis of Alternating Stochastic Gradient Methods for Bilevel Problems
2021
Neural Information Processing Systems
Under certain regularity conditions, applying our results to stochastic compositional, min-max, and reinforcement learning problems either improves or matches the best-known sample complexity in the respective ...
This paper unifies several SGD-type updates for stochastic nested problems into a single SGD approach that we term ALternating Stochastic gradient dEscenT (ALSET) method. ...
As a by-product, this general result also improves the existing sample complexity of the min-max and compositional cases. It matches the sample complexity of SGD for single-level stochastic problems. ...
dblp:conf/nips/ChenSY21
fatcat:b6r6djgpcbf73ajblcoptppuuy
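The alternating structure analyzed in this entry can be sketched on a toy bilevel problem. This is a hedged illustration in the spirit of single-timescale alternating SGD, not the paper's exact ALSET method; the quadratic inner/outer objectives, noise scale, and step sizes are all assumptions chosen so the hypergradient is available in closed form:

```python
import numpy as np

# Toy stochastic bilevel problem:
#   inner:  y*(x) = argmin_y E[0.5*(y - x - xi)^2]   (so y*(x) = x),
#   outer:  f(x, y) = 0.5*x^2 + 0.5*y^2, hence F(x) = x^2, minimized at x = 0.
# For this quadratic inner problem dy*/dx = 1, which the outer step uses.
# Each iteration alternates ONE noisy inner SGD step with ONE outer step
# that reuses the current y instead of solving the inner problem exactly.

rng = np.random.default_rng(2)
x, y = 3.0, 0.0
alpha, beta = 0.05, 0.2          # outer and inner step sizes
for t in range(3000):
    xi = 0.1 * rng.normal()      # stochastic noise in the inner gradient
    y = y - beta * (y - x - xi)           # inner SGD step (tracks y* = x)
    hypergrad = x + 1.0 * y               # df/dx + (dy*/dx) * df/dy
    x = x - alpha * hypergrad             # outer SGD step

print(x, y)
```

The tighter analysis in the paper shows that this kind of single-loop alternation, without inner-loop restarts, already matches the sample complexity of SGD on single-level problems.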
Improved Oracle Complexity of Variance Reduced Methods for Nonsmooth Convex Stochastic Composition Optimization
[article]
2019
arXiv
pre-print
We consider the nonsmooth convex composition optimization problem where the objective is a composition of two finite-sum functions and analyze stochastic compositional variance reduced gradient (SCVRG) ...
More specifically, our method achieves a total IFO complexity of O((m+n)log(1/ϵ)+1/ϵ^3), which improves on the O(1/ϵ^{3.5}) and O((m+n)/√ϵ) complexities obtained by SCGD and accelerated gradient descent (AGD), respectively ...
Fast stochastic variance reduced admm for stochastic composition optimization. ...
arXiv:1802.02339v7
fatcat:b55jnnsukngwddiqevwtjxxkfy
Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks
[article]
2021
arXiv
pre-print
We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding ...
However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. ...
reduce the variance in unbiased stochastic gradients. ...
arXiv:2006.13866v2
fatcat:em7gsyj23vcrbfgkt4dzwo45fa
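The variance-reduction principle behind this entry's sampler can be shown in isolation. A hedged illustration of minimal-variance importance sampling in general, not the paper's GNN-specific decoupled scheme (the toy gradient values are synthetic):

```python
import numpy as np

# Sampling index i with probability p_i proportional to |g_i|, and reweighting
# the sample by 1/(n*p_i), keeps the gradient estimator unbiased while
# minimizing its variance among all single-sample unbiased estimators
# (a consequence of the Cauchy-Schwarz inequality).

rng = np.random.default_rng(3)
n = 200
grads = rng.exponential(scale=1.0, size=n) * rng.choice([-1, 1], size=n)
mean_grad = grads.mean()

def estimator_var(p):
    # variance of g_i / (n * p_i) with i ~ p
    vals = grads / (n * p)
    return np.sum(p * vals**2) - mean_grad**2

p_uniform = np.full(n, 1.0 / n)
p_norm = np.abs(grads) / np.abs(grads).sum()   # importance-sampling weights

print(estimator_var(p_uniform), estimator_var(p_norm))
```

The paper's contribution is to make such gradient-dependent probabilities cheap to maintain during GNN training, where exact per-node gradient norms are unavailable.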
Momentum Schemes with Stochastic Variance Reduction for Nonconvex Composite Optimization
[article]
2019
arXiv
pre-print
Two new stochastic variance-reduced algorithms named SARAH and SPIDER have been recently proposed, and SPIDER has been shown to achieve a near-optimal gradient oracle complexity for nonconvex optimization ...
the near-optimal gradient oracle complexity for achieving a generalized first-order stationary condition. ...
Such an issue has been successfully resolved by using more advanced stochastic variance-reduced gradient estimators that induce a smaller variance, leading to the design of a variety of stochastic variance-reduced ...
arXiv:1902.02715v3
fatcat:arhpvqvorngv3pqlktcslgkbbu
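A hedged sketch of a momentum-style variance-reduced estimator of the kind this entry studies (a STORM/hybrid-style recursion, not the paper's exact scheme; the least-squares problem and constants are illustrative assumptions):

```python
import numpy as np

# Momentum-blended recursive estimator:
#   v_t = g_i(x_t) + (1 - a) * (v_{t-1} - g_i(x_{t-1}))
# which interpolates between plain SGD (a = 1) and the SARAH correction
# (a = 0), and needs no periodic full-gradient pass.

rng = np.random.default_rng(5)
n, d = 50, 3
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
x_prev = x.copy()
v = grad_i(x, rng.integers(n))
eta, a = 0.02, 0.1
for t in range(5000):
    i = rng.integers(n)
    v = grad_i(x, i) + (1 - a) * (v - grad_i(x_prev, i))
    x_prev = x.copy()
    x = x - eta * v

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_star))
```

With a fixed blend weight the iterate settles into a small noise ball around the optimum; decaying `a` over time recovers the near-optimal oracle complexity discussed in the abstract.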
Stochastic Gradient Made Stable: A Manifold Propagation Approach for Large-Scale Optimization
[article]
2016
arXiv
pre-print
A stochastic gradient is typically calculated from a limited number of samples (known as a mini-batch), so it potentially incurs a high variance and causes the estimated parameters to bounce around the optimal ...
This way S3GD is able to generate a highly-accurate estimate of the exact gradient from each mini-batch with largely-reduced computational complexity. ...
For example, the work in [28] explicitly expresses the stochastic gradient variance and proves that constructing mini-batch using special nonuniform sampling strategy is able to reduce the stochastic ...
arXiv:1506.08350v2
fatcat:hw4n7r64p5ddfic7lacyofjjza
Fast Training Method for Stochastic Compositional Optimization Problems
2021
Neural Information Processing Systems
To address this problem, we propose novel decentralized stochastic compositional gradient descent methods to efficiently train the large-scale stochastic compositional optimization problem. ...
Existing methods for the stochastic compositional optimization problem only focus on the single machine scenario, which is far from satisfactory when data are distributed on different devices. ...
But it has a worse convergence rate than the standard stochastic gradient descent method. To improve it, a series of variance-reduced methods have been proposed. ...
dblp:conf/nips/GaoH21
fatcat:lf4sg4ivefahxdpi6yf2pdwt2q
Achieving Linear Speedup in Decentralized Stochastic Compositional Minimax Optimization
[article]
2023
arXiv
pre-print
To address this issue, we developed a novel decentralized stochastic compositional gradient descent ascent with momentum algorithm to reduce the consensus error in the inner-level function. ...
The stochastic compositional minimax problem has attracted a surge of attention in recent years since it covers many emerging machine learning models. ...
In addition, some efforts have been made to improve the sample complexity and communication complexity by compressing the communicated variables [18, 11] or reducing the gradient variance [17, 33] . ...
arXiv:2307.13430v2
fatcat:esxqcr3odfbrlnv7j4bffufpiq
On the Convergence of Local Stochastic Compositional Gradient Descent with Momentum
2022
International Conference on Machine Learning
In this paper, we developed a novel local stochastic compositional gradient descent with momentum method, which facilitates Federated Learning for the stochastic compositional problem. ...
Meanwhile, our communication complexity O(1/ϵ^3) can match existing methods. To the best of our knowledge, this is the first work achieving such favorable sample and communication complexities. ...
2021) and reducing the variance of stochastic gradients (Khanduri et al., 2021; Karimireddy et al., 2020a; Das et al., 2020) . ...
dblp:conf/icml/GaoLH22
fatcat:mhwjrykvtzg2re5luccjcxjiuu
Optimal Algorithms for Stochastic Multi-Level Compositional Optimization
[article]
2022
arXiv
pre-print
To address these limitations, we propose a Stochastic Multi-level Variance Reduction method (SMVR), which achieves the optimal sample complexity of 𝒪(1 / ϵ^3) to find an ϵ-stationary point for non-convex ...
Furthermore, when the objective function satisfies the convexity or Polyak-Łojasiewicz (PL) condition, we propose a stage-wise variant of SMVR and improve the sample complexity to 𝒪(1 / ϵ^2) for convex ...
The authors would like to thank the anonymous reviewers for their helpful comments. ...
arXiv:2202.07530v4
fatcat:42fwzrnfqbf7jlltx75ibpfuou
Randomized Smoothing SVRG for Large-scale Nonsmooth Convex Optimization
[article]
2018
arXiv
pre-print
We develop and analyze a new algorithm that achieves a robust linear convergence rate, and both its time complexity and gradient complexity are superior to state-of-the-art nonsmooth algorithms and subgradient-based ...
In ( [Johnson and Zhang, 2013] ) and its proximal extension in ( [Xiao and Zhang, 2014] ), stochastic variance reduced gradient (SVRG) is proposed that reduces the variance of stochastic gradient descent ...
Actually, the same reduced-variance bounds and convergence rate hold for problems with or without the composite function R(x). ...
arXiv:1805.05189v1
fatcat:7mjcgk7acrg2zgw3nv2tqc5cr4
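The SVRG estimator that this entry builds on is simple to state. A hedged sketch of plain SVRG (Johnson & Zhang, 2013) on a smooth finite-sum least-squares problem; the randomized-smoothing layer of the paper above is omitted, and the problem sizes and step size are illustrative assumptions:

```python
import numpy as np

# SVRG: at each epoch a full gradient mu is computed at a snapshot x_snap,
# and inner steps use grad_i(x) - grad_i(x_snap) + mu, which is unbiased and
# whose variance vanishes as both x and x_snap approach the optimum.

rng = np.random.default_rng(4)
n, d = 60, 4
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
eta = 0.02
for epoch in range(30):
    x_snap = x.copy()
    mu = A.T @ (A @ x_snap - b) / n        # full gradient at the snapshot
    for t in range(2 * n):
        i = rng.integers(n)
        v = grad_i(x, i) - grad_i(x_snap, i) + mu   # variance-reduced gradient
        x = x - eta * v

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_star))
```

The snapshot-based correction is what allows a constant step size and linear convergence in the strongly convex case, in contrast to the decaying steps SGD requires.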
Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient
[article]
2020
arXiv
pre-print
Stochastic variance-reduced gradient methods such as SVRG have been applied to reduce the estimation variance (Zhao et al. 2019). ...
Deep Q-learning algorithms often suffer from poor gradient estimations with an excessive variance, resulting in unstable training and poor sampling efficiency. ...
of our stochastic recursive gradient for the variance reduction in DQN. ...
arXiv:2007.12817v1
fatcat:n4ta5mxeyjeulbilojziu7kvnq
Showing results 1 — 15 out of 26,397 results