Statistical inference using
stochastic gradient descent
Constantine Caramanis¹, Liu Liu¹, Anastasios (Tasos) Kyrillidis², Tianyang Li¹
¹ The University of Texas at Austin
² IBM T.J. Watson Research Center, Yorktown Heights → Rice University
Statistical inference is important
Quantifying uncertainty
Signal? Noise?
Skill? Luck?
Frequentist inference
confidence interval
hypothesis testing
Confidence intervals can be used to detect adversarial
attacks.
Outline of This Work
(a) Large Scale Problems: Point Estimates computed via SGD
(b) Confidence Intervals computed by Bootstrap: too expensive.
(c) This talk: we can compute confidence intervals using SGD.
(d) Application to adversarial attacks: implicitly learning the
manifold.
SGD in ERM – mini batch SGD
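To solve empirical risk minimization (ERM)
$$f(\theta) = \frac{1}{n} \sum_{i=1}^{n} f_i(\theta),$$
where $f_i(\theta) = \ell_\theta(Z_i)$.
At each step:
Draw $S$ i.i.d. uniformly random indices $I_t$ from $[n]$ (with replacement)
Compute stochastic gradient $g_s(\theta_t) = \frac{1}{S} \sum_{i \in I_t} \nabla f_i(\theta_t)$
$\theta_{t+1} = \theta_t - \eta\, g_s(\theta_t)$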
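The update above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `minibatch_sgd`, the least-squares loss, and all data and hyperparameters are assumptions chosen only to make the example runnable.

```python
import numpy as np

def minibatch_sgd(grad_i, theta0, n, eta, S, steps, rng):
    """Run mini-batch SGD and return all iterates, shape (steps + 1, p).

    grad_i(theta, idx) must return the per-sample gradients for the
    indices in idx, shape (len(idx), p).
    """
    theta = np.asarray(theta0, dtype=float).copy()
    path = [theta.copy()]
    for _ in range(steps):
        idx = rng.integers(0, n, size=S)       # S indices from [n], with replacement
        g = grad_i(theta, idx).mean(axis=0)    # stochastic gradient g_s(theta_t)
        theta = theta - eta * g                # theta_{t+1} = theta_t - eta * g_s(theta_t)
        path.append(theta.copy())
    return np.array(path)

# Hypothetical ERM problem: f_i(theta) = 0.5 * (x_i^T theta - y_i)^2
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def ls_grad(theta, idx):
    residual = X[idx] @ theta - y[idx]
    return residual[:, None] * X[idx]

path = minibatch_sgd(ls_grad, np.zeros(p), n, eta=0.05, S=10, steps=2000, rng=rng)
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # exact ERM minimizer, for comparison
sgd_error = np.linalg.norm(path[-500:].mean(axis=0) - theta_hat)
```

Averaging the last iterates, as in `sgd_error`, anticipates the averaged sequence used for inference later in the talk.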
Asymptotic normality – classical results
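M-estimator – statistics
When number of samples $n \to \infty$,
$$\sqrt{n}\,(\hat\theta - \theta^*) \rightsquigarrow N(0, H^{*-1} G^* H^{*-1}),$$
where $G^* = E_Z[\nabla_\theta \ell_{\theta^*}(Z)\, \nabla_\theta \ell_{\theta^*}(Z)^\top]$ and $H^* = E_Z[\nabla_\theta^2 \ell_{\theta^*}(Z)]$.
Stochastic approximation – optimization
When number of steps $t \to \infty$,
$$\sqrt{t}\left(\frac{1}{t} \sum_{i=1}^{t} \theta_i - \hat\theta\right) \rightsquigarrow N(0, H^{-1} G H^{-1}),$$
where $G = E[g_s(\hat\theta)\, g_s(\hat\theta)^\top \mid \hat\theta]$ and $H = \nabla^2 f(\hat\theta)$.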
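When per-sample gradients and the Hessian are cheap to form, the sandwich covariance $H^{-1} G H^{-1}$ can be evaluated directly as a plug-in estimate. The sketch below does this for a least-squares loss. The helper name `sandwich_covariance` and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def sandwich_covariance(per_sample_grads, hessian):
    """Plug-in estimate of H^{-1} G H^{-1}.

    per_sample_grads: array (n, p), gradient of each f_i at the minimizer.
    hessian: array (p, p), Hessian of f at the minimizer.
    """
    n = per_sample_grads.shape[0]
    G = per_sample_grads.T @ per_sample_grads / n   # average outer product of gradients
    H_inv = np.linalg.inv(hessian)
    return H_inv @ G @ H_inv

# Hypothetical data: least squares with f_i(theta) = 0.5 * (x_i^T theta - y_i)^2
rng = np.random.default_rng(1)
n, p, sigma = 5000, 3, 0.5
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + sigma * rng.normal(size=n)

theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
residuals = X @ theta_hat - y
grads = residuals[:, None] * X      # row i is the gradient of f_i at theta_hat
H = X.T @ X / n                     # Hessian of f (constant for least squares)

cov = sandwich_covariance(grads, H)
```

With homoscedastic noise, as simulated here, `cov` should be close to `sigma**2 * np.linalg.inv(H)`. Forming and inverting `H` costs on the order of p³ operations, which is what makes this direct route unattractive for large models.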
SGD not only useful for optimization,
but also useful for statistical inference!
Statistical inference using mini batch SGD
burn in: $\theta_{-b}, \theta_{-b+1}, \cdots, \theta_{-1}, \theta_0$
Then, for each segment $i = 1, 2, \ldots, R$:
averaged: $\theta^{(i)}_1, \theta^{(i)}_2, \cdots, \theta^{(i)}_t$, giving $\bar\theta^{(i)}_t = \frac{1}{t} \sum_{j=1}^{t} \theta^{(i)}_j$
discarded: $\theta^{(i)}_{t+1}, \theta^{(i)}_{t+2}, \cdots, \theta^{(i)}_{t+d}$
At each step:
Draw $S$ i.i.d. uniformly random indices $I_t$ from $[n]$ (with replacement)
Compute stochastic gradient $g_s(\theta_t) = \frac{1}{S} \sum_{i \in I_t} \nabla f_i(\theta_t)$
$\theta_{t+1} = \theta_t - \eta\, g_s(\theta_t)$
Use an ensemble of $i = 1, 2, \ldots, R$ estimators for statistical inference:
$$\theta^{(i)} = \hat\theta + \frac{\sqrt{S}\sqrt{t}}{\sqrt{n}} \left(\bar\theta^{(i)}_t - \hat\theta\right).$$
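The procedure on this slide (burn in, then R segments that each average t iterates and discard the next d) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `sgd_inference` is hypothetical, the mean of the segment averages stands in for the minimizer when centering, and the toy problem (estimating a mean with squared loss) and all hyperparameters are chosen only for the example.

```python
import numpy as np

def sgd_inference(grad_i, theta0, n, eta, S, burn_in, t, d, R, rng):
    """Return (center, replicates) from one long mini-batch SGD run.

    Each of the R segments averages t consecutive iterates and then
    discards the next d iterates. Replicates are the rescaled averages
    center + sqrt(S * t / n) * (segment_average - center).
    """
    theta = np.asarray(theta0, dtype=float).copy()

    def step(theta):
        idx = rng.integers(0, n, size=S)              # with replacement
        return theta - eta * grad_i(theta, idx).mean(axis=0)

    for _ in range(burn_in):                          # burn-in iterates are dropped
        theta = step(theta)

    averages = []
    for _ in range(R):
        acc = np.zeros_like(theta)
        for _ in range(t):                            # averaged part of the segment
            theta = step(theta)
            acc += theta
        averages.append(acc / t)
        for _ in range(d):                            # discarded part of the segment
            theta = step(theta)

    averages = np.array(averages)
    center = averages.mean(axis=0)   # assumption: used in place of the minimizer
    scale = np.sqrt(S * t / n)
    return center, center + scale * (averages - center)

# Hypothetical toy problem: f_i(theta) = 0.5 * (theta - z_i)^2, minimizer = sample mean
rng = np.random.default_rng(0)
n = 500
z = rng.normal(loc=3.0, scale=2.0, size=n)

def mean_grad(theta, idx):
    return theta[None, :] - z[idx][:, None]

center, reps = sgd_inference(mean_grad, np.zeros(1), n, eta=0.1, S=5,
                             burn_in=200, t=200, d=50, R=200, rng=rng)
replicate_std = reps.std(axis=0, ddof=1)[0]
classical_std = z.std() / np.sqrt(n)                  # standard error of the mean
lo, hi = np.percentile(reps[:, 0], [2.5, 97.5])       # 95% percentile interval
```

In this toy setting `replicate_std` should land near `classical_std`, which is the sense in which the rescaled averages behave like bootstrap replicates.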
Advantages of SGD inference
empirically not more expensive, uses many fewer operations than bootstrap
can be used when training neural networks with SGD
easy to plug into existing SGD code
Other statistical inference methods
directly computing inverse Fisher information matrix
resampling: bootstrap, subsampling
Too computationally expensive,
not suited for “big data”!
Intuition – Ornstein-Uhlenbeck process approximation
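In SGD, denote $\Delta_t = \theta_t - \hat\theta$, and we have
$$\Delta_{t+1} = \Delta_t - \eta\, g_s(\hat\theta + \Delta_t).$$
$\Delta_t$ can be approximated by the Ornstein-Uhlenbeck process
$$d\Delta(T) = -H \Delta\, dT + \sqrt{\eta}\, G^{1/2}\, dB(T),$$
where $B(T)$ is a standard Brownian motion.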
Intuition – Ornstein-Uhlenbeck process approximation
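Denote $\bar\theta_t = \frac{1}{t} \sum_{i=1}^{t} \theta_i$.
$\sqrt{t}(\bar\theta_t - \hat\theta)$ can be approximated as
$$\sqrt{t}(\bar\theta_t - \hat\theta) = \frac{1}{\sqrt{t}} \sum_{i=1}^{t} (\theta_i - \hat\theta) = \frac{1}{\eta\sqrt{t}} \sum_{i=1}^{t} (\theta_i - \hat\theta)\,\eta \approx \frac{1}{\eta\sqrt{t}} \int_0^{t\eta} \Delta(T)\, dT, \quad (1)$$
where we use the approximation that $\eta \approx dT$. By rearranging terms and multiplying both sides by $H^{-1}$, we can rewrite the stochastic differential equation as
$$\Delta(T)\, dT = -H^{-1}\, d\Delta(T) + \sqrt{\eta}\, H^{-1} G^{1/2}\, dB(T).$$
Thus, we have
$$\int_0^{t\eta} \Delta(T)\, dT = -H^{-1} \left(\Delta(t\eta) - \Delta(0)\right) + \sqrt{\eta}\, H^{-1} G^{1/2} B(t\eta). \quad (2)$$
After plugging (2) into (1) we have
$$\sqrt{t}(\bar\theta_t - \hat\theta) \approx -\frac{1}{\eta\sqrt{t}} H^{-1} \left(\Delta(t\eta) - \Delta(0)\right) + \frac{1}{\sqrt{t\eta}} H^{-1} G^{1/2} B(t\eta).$$
When $\Delta(0) = 0$, the variance $\mathrm{Var}\left[-\frac{1}{\eta\sqrt{t}} H^{-1} \left(\Delta(t\eta) - \Delta(0)\right)\right] = O\left(\frac{1}{t\eta}\right)$. Since $\frac{1}{\sqrt{t\eta}} H^{-1} G^{1/2} B(t\eta) \sim N(0, H^{-1} G H^{-1})$, when $\eta \to 0$ and $\eta t \to \infty$, we conclude that
$$\sqrt{t}(\bar\theta_t - \hat\theta) \sim N(0, H^{-1} G H^{-1}).$$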
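The heuristic conclusion can be checked numerically in one dimension, where $H^{-1} G H^{-1}$ reduces to $\sigma^2 / h^2$. The sketch below assumes a quadratic objective with gradient $h\Delta$ and additive Gaussian gradient noise of variance $\sigma^2$. These modelling choices and all constants are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
h, sigma = 2.0, 1.0            # H = h and G = sigma^2 in one dimension
eta, t, runs = 0.05, 2000, 2000

delta = np.zeros(runs)         # Delta_0 = 0 for every independent run
running_sum = np.zeros(runs)
for _ in range(t):
    noisy_grad = h * delta + sigma * rng.normal(size=runs)   # g_s(theta_hat + Delta_t)
    delta = delta - eta * noisy_grad                         # Delta_{t+1}
    running_sum += delta

scaled_avg = np.sqrt(t) * running_sum / t    # sqrt(t) * (theta_bar_t - theta_hat), one per run
empirical_var = scaled_avg.var()
predicted_var = sigma**2 / h**2              # H^{-1} G H^{-1}
```

Here $t\eta = 100$, so the $O(1/(t\eta))$ term from the derivation is small and `empirical_var` should sit close to `predicted_var`.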
Theoretical guarantee
Theorem
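For a differentiable convex function $f(\theta) = \frac{1}{n} \sum_{i=1}^{n} f_i(\theta)$, with gradient $\nabla f(\theta)$, let $\hat\theta \in \mathbb{R}^p$ be its minimizer, and denote its Hessian at $\hat\theta$ by $H := \nabla^2 f(\hat\theta)$. Assume that $\forall \theta \in \mathbb{R}^p$, $f$ satisfies:
(F1) Weak strong convexity: $(\theta - \hat\theta)^\top \nabla f(\theta) \ge \alpha \|\theta - \hat\theta\|_2^2$, for constant $\alpha > 0$,
(F2) Lipschitz gradient continuity: $\|\nabla f(\theta)\|_2 \le L \|\theta - \hat\theta\|_2$, for constant $L > 0$,
(F3) Bounded Taylor remainder: $\|\nabla f(\theta) - H(\theta - \hat\theta)\|_2 \le E \|\theta - \hat\theta\|_2^2$, for constant $E > 0$,
(F4) Bounded Hessian spectrum at $\hat\theta$: $0 < \lambda_L \le \lambda_i(H) \le \lambda_U < \infty$, $\forall i$.
Furthermore, let $g_s(\theta)$ be a stochastic gradient of $f$, satisfying:
(G1) $E[g_s(\theta) \mid \theta] = \nabla f(\theta)$,
(G2) $E[\|g_s(\theta)\|_2^2 \mid \theta] \le A \|\theta - \hat\theta\|_2^2 + B$,
(G3) $E[\|g_s(\theta)\|_2^4 \mid \theta] \le C \|\theta - \hat\theta\|_2^4 + D$,
(G4) $\|E[g_s(\theta) g_s(\theta)^\top \mid \theta] - G\|_2 \le A_1 \|\theta - \hat\theta\|_2 + A_2 \|\theta - \hat\theta\|_2^2 + A_3 \|\theta - \hat\theta\|_2^3 + A_4 \|\theta - \hat\theta\|_2^4$,
for positive, data dependent constants $A, B, C, D, A_i$, for $i = 1, \ldots, 4$. Assume that $\|\theta_1 - \hat\theta\|_2^2 = O(\eta)$; then for sufficiently small step size $\eta > 0$, the average SGD sequence $\bar\theta_t = \frac{1}{t} \sum_{i=1}^{t} \theta_i$ satisfies:
$$\left\| t\, E[(\bar\theta_t - \hat\theta)(\bar\theta_t - \hat\theta)^\top] - H^{-1} G H^{-1} \right\|_2 \lesssim \sqrt{\eta} + \sqrt{\frac{1}{t\eta} + t\eta^2},$$
where $G = E[g_s(\hat\theta) g_s(\hat\theta)^\top \mid \hat\theta]$.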
Proof idea: $H^{-1} = \eta \sum_{i \ge 0} (I - \eta H)^i$
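The series in the proof idea is a Neumann series, which converges when every eigenvalue of $I - \eta H$ lies strictly inside the unit interval, for example when $H$ is positive definite and $\eta$ is small enough. A quick numerical check, with a hypothetical helper `neumann_inverse` and an arbitrary 2 × 2 positive definite matrix chosen for illustration:

```python
import numpy as np

def neumann_inverse(H, eta, terms):
    """Approximate H^{-1} by eta * sum_{i=0}^{terms-1} (I - eta * H)^i."""
    p = H.shape[0]
    M = np.eye(p) - eta * H
    power = np.eye(p)              # (I - eta * H)^0
    total = np.zeros((p, p))
    for _ in range(terms):
        total += power
        power = power @ M          # next power of (I - eta * H)
    return eta * total

H = np.array([[2.0, 0.5],
              [0.5, 1.0]])         # symmetric positive definite example
approx = neumann_inverse(H, eta=0.1, terms=500)
exact = np.linalg.inv(H)
max_abs_error = np.abs(approx - exact).max()
```

Each factor $(I - \eta H)$ is the map applied by one gradient step on a quadratic, which suggests why summing SGD iterates can reproduce the action of $H^{-1}$ without forming it.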
Comparison with bootstrap
Univariate model estimation
[Figure: SGD and bootstrap estimates compared in three panels]
(a) Normal, $\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2}\right)$, $\mu = 0$; curves: $N(0, 1/n)$, $\theta_{\mathrm{SGD}} - \bar\theta_{\mathrm{SGD}}$, $\theta_{\mathrm{bootstrap}} - \bar\theta_{\mathrm{bootstrap}}$
(b) Exponential, $\mu e^{-\mu x}$, $\mu = 1$; curves: SGD, bootstrap
(c) Poisson, $\frac{\mu^x e^{-\mu}}{x!}$, $\mu = 1$; curves: SGD, bootstrap
95% confidence interval coverage simulation
Each entry is (coverage probability, confidence interval width).

Table 1: Linear regression: dimension = 10, 100 samples. (a) diagonal covariance (b) non-diagonal covariance

(a) Bootstrap (0.941, 4.14), normal approximation (0.928, 3.87)
η        t = 100           t = 500           t = 2500
0.1      (0.957, 4.41)     (0.955, 4.51)     (0.960, 4.53)
0.02     (0.869, 3.30)     (0.923, 3.77)     (0.918, 3.87)
0.004    (0.634, 2.01)     (0.862, 3.20)     (0.916, 3.70)

(b) Bootstrap (0.938, 4.47), normal approximation (0.925, 4.18)
η        t = 100           t = 500           t = 2500
0.1      (0.949, 4.74)     (0.962, 4.91)     (0.963, 4.94)
0.02     (0.845, 3.37)     (0.916, 4.01)     (0.927, 4.17)
0.004    (0.616, 2.00)     (0.832, 3.30)     (0.897, 3.93)

Table 2: Logistic regression: dimension = 10, 1000 samples. (a) diagonal covariance (b) non-diagonal covariance

(a) Bootstrap (0.932, 0.253), normal approximation (0.957, 0.264)
η        t = 100           t = 500           t = 2500
0.1      (0.872, 0.204)    (0.937, 0.249)    (0.939, 0.258)
0.02     (0.610, 0.112)    (0.871, 0.196)    (0.926, 0.237)
0.004    (0.312, 0.051)    (0.596, 0.111)    (0.86, 0.194)

(b) Bootstrap (0.932, 0.245), normal approximation (0.954, 0.256)
η        t = 100           t = 500           t = 2500
0.1      (0.859, 0.206)    (0.931, 0.255)    (0.947, 0.266)
0.02     (0.600, 0.112)    (0.847, 0.197)    (0.931, 0.244)
0.004    (0.302, 0.051)    (0.583, 0.111)    (0.851, 0.195)

Better when
each replicate’s average uses a longer consecutive sequence
larger step size
Adversarial Attacks
Neural network classifiers with very high accuracy on test sets are
extremely susceptible to nearly imperceptible adversarial attacks.
Confidence intervals for mitigating adversarial examples
MNIST – logistic regression
(b) Original “0”: $P\{0 \mid \text{image}\} \approx 1 - e^{-46}$, CI $\approx (1 - e^{-28}, 1 - e^{-64})$
(c) Adversarial “0”: $P\{0 \mid \text{image}\} \approx e^{-17}$, CI $\approx (e^{-31}, 1 - e^{-11})$
Figure 1: MNIST adversarial perturbation (scaled for display)
Adversarial examples produced by gradient attack have large confidence intervals!

More Related Content

What's hot

Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)Shane Nicklas
 
ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022
 ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022 ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022
ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022
anasKhalaf4
 
Introducing Zap Q-Learning
Introducing Zap Q-Learning   Introducing Zap Q-Learning
Introducing Zap Q-Learning
Sean Meyn
 
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast AlgorithmsReinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Sean Meyn
 
Tetsunao Matsuta
Tetsunao MatsutaTetsunao Matsuta
Tetsunao Matsuta
Suurist
 
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
The Statistical and Applied Mathematical Sciences Institute
 
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Alexander Litvinenko
 
Common fixed point theorems for random operators in hilbert space
Common fixed point theorems  for  random operators in hilbert spaceCommon fixed point theorems  for  random operators in hilbert space
Common fixed point theorems for random operators in hilbert space
Alexander Decker
 
Vancouver18
Vancouver18Vancouver18
Vancouver18
Christian Robert
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)Shane Nicklas
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)Shane Nicklas
 
Hiroaki Shiokawa
Hiroaki ShiokawaHiroaki Shiokawa
Hiroaki Shiokawa
Suurist
 
Modeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationModeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential Equation
Mark Chang
 
On maximal and variational Fourier restriction
On maximal and variational Fourier restrictionOn maximal and variational Fourier restriction
On maximal and variational Fourier restriction
VjekoslavKovac1
 
Theta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,a
Theta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,aTheta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,a
Theta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,a
ijcsa
 
Solovay Kitaev theorem
Solovay Kitaev theoremSolovay Kitaev theorem
Solovay Kitaev theorem
JamesMa54
 
Iit jee question_paper
Iit jee question_paperIit jee question_paper
Iit jee question_paper
RahulMishra774
 

What's hot (18)

Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)
 
ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022
 ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022 ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022
ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022
 
Introducing Zap Q-Learning
Introducing Zap Q-Learning   Introducing Zap Q-Learning
Introducing Zap Q-Learning
 
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast AlgorithmsReinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
 
Tetsunao Matsuta
Tetsunao MatsutaTetsunao Matsuta
Tetsunao Matsuta
 
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
 
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
 
Common fixed point theorems for random operators in hilbert space
Common fixed point theorems  for  random operators in hilbert spaceCommon fixed point theorems  for  random operators in hilbert space
Common fixed point theorems for random operators in hilbert space
 
Vancouver18
Vancouver18Vancouver18
Vancouver18
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)
 
Hiroaki Shiokawa
Hiroaki ShiokawaHiroaki Shiokawa
Hiroaki Shiokawa
 
Modeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationModeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential Equation
 
On maximal and variational Fourier restriction
On maximal and variational Fourier restrictionOn maximal and variational Fourier restriction
On maximal and variational Fourier restriction
 
Theta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,a
Theta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,aTheta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,a
Theta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,a
 
Solovay Kitaev theorem
Solovay Kitaev theoremSolovay Kitaev theorem
Solovay Kitaev theorem
 
Sol7
Sol7Sol7
Sol7
 
Iit jee question_paper
Iit jee question_paperIit jee question_paper
Iit jee question_paper
 

Similar to Statistical Inference Using Stochastic Gradient Descent

Complex analysis notes
Complex analysis notesComplex analysis notes
Complex analysis notes
Prakash Dabhi
 
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
asahiushio1
 
3 grechnikov
3 grechnikov3 grechnikov
3 grechnikov
Yandex
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化
Akira Tanimoto
 
Recurrence Relation for Achromatic Number of Line Graph of Graph
Recurrence Relation for Achromatic Number of Line Graph of GraphRecurrence Relation for Achromatic Number of Line Graph of Graph
Recurrence Relation for Achromatic Number of Line Graph of Graph
IRJET Journal
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
Daisuke Yoneoka
 
A common random fixed point theorem for rational inequality in hilbert space
A common random fixed point theorem for rational inequality in hilbert spaceA common random fixed point theorem for rational inequality in hilbert space
A common random fixed point theorem for rational inequality in hilbert space
Alexander Decker
 
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
The Statistical and Applied Mathematical Sciences Institute
 
Fixed point theorems for random variables in complete metric spaces
Fixed point theorems for random variables in complete metric spacesFixed point theorems for random variables in complete metric spaces
Fixed point theorems for random variables in complete metric spaces
Alexander Decker
 
Ejercicios prueba de algebra de la UTN- widmar aguilar
Ejercicios prueba de algebra de la UTN-  widmar aguilarEjercicios prueba de algebra de la UTN-  widmar aguilar
Ejercicios prueba de algebra de la UTN- widmar aguilar
Widmar Aguilar Gonzalez
 
Radiation
RadiationRadiation
Radiation
Soumith V
 
stochastic processes assignment help
stochastic processes assignment helpstochastic processes assignment help
stochastic processes assignment help
Statistics Homework Helper
 
Ejercicio de fasores
Ejercicio de fasoresEjercicio de fasores
Ejercicio de fasores
dpancheins
 
A common unique random fixed point theorem in hilbert space using integral ty...
A common unique random fixed point theorem in hilbert space using integral ty...A common unique random fixed point theorem in hilbert space using integral ty...
A common unique random fixed point theorem in hilbert space using integral ty...
Alexander Decker
 
A common random fixed point theorem for rational ineqality in hilbert space ...
 A common random fixed point theorem for rational ineqality in hilbert space ... A common random fixed point theorem for rational ineqality in hilbert space ...
A common random fixed point theorem for rational ineqality in hilbert space ...
Alexander Decker
 
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docxATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ikirkton
 
Sparse Representation of Multivariate Extremes with Applications to Anomaly R...
Sparse Representation of Multivariate Extremes with Applications to Anomaly R...Sparse Representation of Multivariate Extremes with Applications to Anomaly R...
Sparse Representation of Multivariate Extremes with Applications to Anomaly R...
Hayato Watanabe
 
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
Tomonari Masada
 
Cálculo ii howard anton - capítulo 16 [tópicos do cálculo vetorial]
Cálculo ii   howard anton - capítulo 16 [tópicos do cálculo vetorial]Cálculo ii   howard anton - capítulo 16 [tópicos do cálculo vetorial]
Cálculo ii howard anton - capítulo 16 [tópicos do cálculo vetorial]
Henrique Covatti
 

Similar to Statistical Inference Using Stochastic Gradient Descent (20)

Complex analysis notes
Complex analysis notesComplex analysis notes
Complex analysis notes
 
Ps02 cmth03 unit 1
Ps02 cmth03 unit 1Ps02 cmth03 unit 1
Ps02 cmth03 unit 1
 
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
 
3 grechnikov
3 grechnikov3 grechnikov
3 grechnikov
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化
 
Recurrence Relation for Achromatic Number of Line Graph of Graph
Recurrence Relation for Achromatic Number of Line Graph of GraphRecurrence Relation for Achromatic Number of Line Graph of Graph
Recurrence Relation for Achromatic Number of Line Graph of Graph
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
 
A common random fixed point theorem for rational inequality in hilbert space
A common random fixed point theorem for rational inequality in hilbert spaceA common random fixed point theorem for rational inequality in hilbert space
A common random fixed point theorem for rational inequality in hilbert space
 
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
 
Fixed point theorems for random variables in complete metric spaces
Fixed point theorems for random variables in complete metric spacesFixed point theorems for random variables in complete metric spaces
Fixed point theorems for random variables in complete metric spaces
 
Ejercicios prueba de algebra de la UTN- widmar aguilar
Ejercicios prueba de algebra de la UTN-  widmar aguilarEjercicios prueba de algebra de la UTN-  widmar aguilar
Ejercicios prueba de algebra de la UTN- widmar aguilar
 
Radiation
RadiationRadiation
Radiation
 
stochastic processes assignment help
stochastic processes assignment helpstochastic processes assignment help
stochastic processes assignment help
 
Ejercicio de fasores
Ejercicio de fasoresEjercicio de fasores
Ejercicio de fasores
 
A common unique random fixed point theorem in hilbert space using integral ty...
A common unique random fixed point theorem in hilbert space using integral ty...A common unique random fixed point theorem in hilbert space using integral ty...
A common unique random fixed point theorem in hilbert space using integral ty...
 
A common random fixed point theorem for rational ineqality in hilbert space ...
 A common random fixed point theorem for rational ineqality in hilbert space ... A common random fixed point theorem for rational ineqality in hilbert space ...
A common random fixed point theorem for rational ineqality in hilbert space ...
 
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docxATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
 
Sparse Representation of Multivariate Extremes with Applications to Anomaly R...
Sparse Representation of Multivariate Extremes with Applications to Anomaly R...Sparse Representation of Multivariate Extremes with Applications to Anomaly R...
Sparse Representation of Multivariate Extremes with Applications to Anomaly R...
 
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
 
Cálculo ii howard anton - capítulo 16 [tópicos do cálculo vetorial]
Cálculo ii   howard anton - capítulo 16 [tópicos do cálculo vetorial]Cálculo ii   howard anton - capítulo 16 [tópicos do cálculo vetorial]
Cálculo ii howard anton - capítulo 16 [tópicos do cálculo vetorial]
 

More from Center for Transportation Research - UT Austin

Flying with SAVES
Flying with SAVESFlying with SAVES
Regret of Queueing Bandits
Regret of Queueing BanditsRegret of Queueing Bandits
Advances in Millimeter Wave for V2X
Advances in Millimeter Wave for V2XAdvances in Millimeter Wave for V2X
Advances in Millimeter Wave for V2X
Center for Transportation Research - UT Austin
 
Collaborative Sensing and Heterogeneous Networking Leveraging Vehicular Fleets
Collaborative Sensing and Heterogeneous Networking Leveraging Vehicular FleetsCollaborative Sensing and Heterogeneous Networking Leveraging Vehicular Fleets
Collaborative Sensing and Heterogeneous Networking Leveraging Vehicular Fleets
Center for Transportation Research - UT Austin
 
Collaborative Sensing for Automated Vehicles
Collaborative Sensing for Automated VehiclesCollaborative Sensing for Automated Vehicles
Collaborative Sensing for Automated Vehicles
Center for Transportation Research - UT Austin
 
CAV/Mixed Transportation Modeling
CAV/Mixed Transportation ModelingCAV/Mixed Transportation Modeling
CAV/Mixed Transportation Modeling
Center for Transportation Research - UT Austin
 
Real-time Signal Control and Traffic Stability / Improved Models for Managed ...
Real-time Signal Control and Traffic Stability / Improved Models for Managed ...Real-time Signal Control and Traffic Stability / Improved Models for Managed ...
Real-time Signal Control and Traffic Stability / Improved Models for Managed ...
Center for Transportation Research - UT Austin
 
Sharing Novel Data Sources to Promote Innovation Through Collaboration: Case ...
Sharing Novel Data Sources to Promote Innovation Through Collaboration: Case ...Sharing Novel Data Sources to Promote Innovation Through Collaboration: Case ...
Sharing Novel Data Sources to Promote Innovation Through Collaboration: Case ...
Center for Transportation Research - UT Austin
 
UT SAVES: Situation Aware Vehicular Engineering Systems
UT SAVES: Situation Aware Vehicular Engineering SystemsUT SAVES: Situation Aware Vehicular Engineering Systems
UT SAVES: Situation Aware Vehicular Engineering Systems
Center for Transportation Research - UT Austin
 
Regret of Queueing Bandits
Regret of Queueing BanditsRegret of Queueing Bandits
Sharing Novel Data Sources to Promote Innovation through Collaboration: Case ...
Sharing Novel Data Sources to Promote Innovation through Collaboration: Case ...Sharing Novel Data Sources to Promote Innovation through Collaboration: Case ...
Sharing Novel Data Sources to Promote Innovation through Collaboration: Case ...
Center for Transportation Research - UT Austin
 
CAV/Mixed Transportation Modeling
CAV/Mixed Transportation ModelingCAV/Mixed Transportation Modeling
CAV/Mixed Transportation Modeling
Center for Transportation Research - UT Austin
 
Collaborative Sensing for Automated Vehicles
Collaborative Sensing for Automated VehiclesCollaborative Sensing for Automated Vehicles
Collaborative Sensing for Automated Vehicles
Center for Transportation Research - UT Austin
 
Advances in Millimeter Wave for V2X
Advances in Millimeter Wave for V2XAdvances in Millimeter Wave for V2X
Advances in Millimeter Wave for V2X
Center for Transportation Research - UT Austin
 
Statistical Inference Using Stochastic Gradient Descent
Statistical Inference Using Stochastic Gradient DescentStatistical Inference Using Stochastic Gradient Descent
Statistical Inference Using Stochastic Gradient Descent
Center for Transportation Research - UT Austin
 
Status of two projects: Real-time Signal Control and Traffic Stability; Impro...
Status of two projects: Real-time Signal Control and Traffic Stability; Impro...Status of two projects: Real-time Signal Control and Traffic Stability; Impro...
Status of two projects: Real-time Signal Control and Traffic Stability; Impro...
Center for Transportation Research - UT Austin
 
SAVES general overview
SAVES general overviewSAVES general overview
D-STOP Overview April 2018
D-STOP Overview April 2018D-STOP Overview April 2018
Managing Mobility during Design-Build Highway Construction: Successes and Les...
Managing Mobility during Design-Build Highway Construction: Successes and Les...Managing Mobility during Design-Build Highway Construction: Successes and Les...
Managing Mobility during Design-Build Highway Construction: Successes and Les...
Center for Transportation Research - UT Austin
 
The Future of Fly Ash in Texas Concrete
The Future of Fly Ash in Texas ConcreteThe Future of Fly Ash in Texas Concrete
The Future of Fly Ash in Texas Concrete
Center for Transportation Research - UT Austin
 

More from Center for Transportation Research - UT Austin (20)

Flying with SAVES
Flying with SAVESFlying with SAVES
Flying with SAVES
 
Regret of Queueing Bandits
Regret of Queueing BanditsRegret of Queueing Bandits
Regret of Queueing Bandits
 
Advances in Millimeter Wave for V2X
Advances in Millimeter Wave for V2XAdvances in Millimeter Wave for V2X
Advances in Millimeter Wave for V2X
 
Collaborative Sensing and Heterogeneous Networking Leveraging Vehicular Fleets
Collaborative Sensing and Heterogeneous Networking Leveraging Vehicular FleetsCollaborative Sensing and Heterogeneous Networking Leveraging Vehicular Fleets
Collaborative Sensing and Heterogeneous Networking Leveraging Vehicular Fleets
 
Collaborative Sensing for Automated Vehicles
Collaborative Sensing for Automated VehiclesCollaborative Sensing for Automated Vehicles
Collaborative Sensing for Automated Vehicles
 
CAV/Mixed Transportation Modeling



Statistical Inference Using Stochastic Gradient Descent

  • 1. Statistical inference using stochastic gradient descent. Constantine Caramanis (1), Liu Liu (1), Anastasios (Tasos) Kyrillidis (2), Tianyang Li (1). (1) The University of Texas at Austin; (2) IBM T.J. Watson Research Center, Yorktown Heights → Rice University.
  • 3. Statistical inference is important. Quantifying uncertainty: Signal? Noise? Skill? Luck? Frequentist inference: confidence intervals, hypothesis testing. Confidence intervals can be used to detect adversarial attacks.
  • 4. Outline of This Work. (a) Large-scale problems: point estimates are computed via SGD. (b) Confidence intervals computed by bootstrap: too expensive. (c) This talk: confidence intervals can be computed using SGD. (d) Application to adversarial attacks: implicitly learning the manifold.
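  • 5. SGD in ERM – mini batch SGD. To solve empirical risk minimization (ERM), minimize
      $f(\theta) = \frac{1}{n}\sum_{i=1}^{n} f_i(\theta)$, where $f_i(\theta) = \ell_\theta(Z_i)$.
    At each step:
      - Draw $S$ i.i.d. uniformly random indices $I_t$ from $[n]$ (with replacement).
      - Compute the stochastic gradient $g_s(\theta_t) = \frac{1}{S}\sum_{i \in I_t} \nabla f_i(\theta_t)$.
      - Update $\theta_{t+1} = \theta_t - \eta\, g_s(\theta_t)$.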
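    The update on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper name `minibatch_sgd`, the least-squares loss $f_i(\theta) = \frac{1}{2}(x_i^\top\theta - y_i)^2$, the synthetic data, and all parameter values are hypothetical choices, not taken from the talk.

```python
import numpy as np

def minibatch_sgd(grad_fi, theta0, n, S, eta, steps, rng):
    """Mini-batch SGD. grad_fi(theta, idx) returns per-sample gradients, shape (len(idx), p)."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        idx = rng.integers(0, n, size=S)       # S i.i.d. uniform indices from [n], with replacement
        g = grad_fi(theta, idx).mean(axis=0)   # g_s(theta_t) = (1/S) * sum of grad f_i(theta_t)
        theta = theta - eta * g                # theta_{t+1} = theta_t - eta * g_s(theta_t)
    return theta

# Hypothetical example problem: f_i(theta) = 0.5 * (x_i^T theta - y_i)^2 on synthetic data.
rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def grad_fi(theta, idx):
    resid = X[idx] @ theta - y[idx]            # residual for each sampled index
    return resid[:, None] * X[idx]             # gradient of each f_i at theta

theta_sgd = minibatch_sgd(grad_fi, np.zeros(p), n, S=10, eta=0.05, steps=5000, rng=rng)
theta_erm = np.linalg.lstsq(X, y, rcond=None)[0]   # exact ERM minimizer, for comparison
```

    With a constant step size the last iterate keeps fluctuating around the ERM minimizer rather than converging to it exactly; that residual fluctuation is what the rest of the talk exploits.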
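  • 7. Asymptotic normality – classical results.
    M-estimator (statistics): when the number of samples $n \to \infty$,
      $\sqrt{n}\,(\hat\theta - \theta^*) \rightsquigarrow N(0,\; {H^*}^{-1} G^* {H^*}^{-1})$,
    where $G^* = E_Z[\nabla_\theta \ell_{\theta^*}(Z)\, \nabla_\theta \ell_{\theta^*}(Z)^\top]$ and $H^* = E_Z[\nabla^2_\theta \ell_{\theta^*}(Z)]$.
    Stochastic approximation (optimization): when the number of steps $t \to \infty$,
      $\sqrt{t}\,\big(\frac{1}{t}\sum_{i=1}^{t}\theta_i - \hat\theta\big) \rightsquigarrow N(0,\; H^{-1} G H^{-1})$,
    where $G = E[g_s(\hat\theta)\, g_s(\hat\theta)^\top \mid \hat\theta]$ and $H = \nabla^2 f(\hat\theta)$.
    SGD is not only useful for optimization, but also useful for statistical inference!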
  • 8. Statistical inference using mini batch SGD. After a burn-in period $\theta_{-b}, \theta_{-b+1}, \dots, \theta_{-1}, \theta_0$, the SGD iterates are split into $R$ consecutive segments. In segment $i$ ($i = 1, 2, \dots, R$), the iterates $\theta^{(i)}_1, \theta^{(i)}_2, \dots, \theta^{(i)}_t$ are averaged,
      $\bar\theta^{(i)}_t = \frac{1}{t}\sum_{j=1}^{t}\theta^{(i)}_j$,
    and the next $d$ iterates $\theta^{(i)}_{t+1}, \theta^{(i)}_{t+2}, \dots, \theta^{(i)}_{t+d}$ are discarded.
    At each step:
      - Draw $S$ i.i.d. uniformly random indices $I_t$ from $[n]$ (with replacement).
      - Compute the stochastic gradient $g_s(\theta_t) = \frac{1}{S}\sum_{i \in I_t} \nabla f_i(\theta_t)$.
      - Update $\theta_{t+1} = \theta_t - \eta\, g_s(\theta_t)$.
    Use an ensemble of $i = 1, 2, \dots, R$ estimators for statistical inference:
      $\theta^{(i)} = \hat\theta + \frac{\sqrt{S}\,\sqrt{t}}{\sqrt{n}}\,(\bar\theta^{(i)}_t - \hat\theta)$.
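    A minimal sketch of this procedure follows; it is not the authors' code. Assumptions are labeled in the comments: the function name `sgd_inference` is hypothetical, the unknown $\hat\theta$ is replaced by the mean of the $R$ segment averages, the least-squares test problem and parameter values are made up, and percentile intervals are just one plausible way to read intervals off the ensemble.

```python
import numpy as np

def sgd_inference(grad_fi, theta0, n, S, eta, b, t, d, R, rng):
    """Return an (R, p) ensemble: burn-in of b steps, then R segments of t averaged + d discarded steps."""
    theta = np.array(theta0, dtype=float)

    def step(theta):
        idx = rng.integers(0, n, size=S)                  # indices drawn with replacement
        return theta - eta * grad_fi(theta, idx).mean(axis=0)

    for _ in range(b):                                    # burn-in iterates are not used
        theta = step(theta)
    seg_means = []
    for _ in range(R):
        acc = np.zeros_like(theta)
        for _ in range(t):                                # iterates that enter the segment average
            theta = step(theta)
            acc += theta
        seg_means.append(acc / t)
        for _ in range(d):                                # discarded iterates between segments
            theta = step(theta)
    seg_means = np.array(seg_means)
    center = seg_means.mean(axis=0)                       # ASSUMPTION: stand-in for theta_hat
    return center + np.sqrt(S * t / n) * (seg_means - center)

# Hypothetical least-squares problem, f_i(theta) = 0.5 * (x_i^T theta - y_i)^2.
rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.normal(size=n)

def grad_fi(theta, idx):
    resid = X[idx] @ theta - y[idx]
    return resid[:, None] * X[idx]

ensemble = sgd_inference(grad_fi, np.zeros(p), n, S=10, eta=0.05,
                         b=500, t=200, d=20, R=50, rng=rng)
# ASSUMPTION: per-coordinate 95% percentile intervals from the ensemble.
lower, upper = np.percentile(ensemble, [2.5, 97.5], axis=0)
theta_erm = np.linalg.lstsq(X, y, rcond=None)[0]
```

    Because the whole procedure is a single SGD run with some bookkeeping, it can be added to an existing training loop without changing the update itself.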
  • 10. Advantages of SGD inference: empirically not more expensive, and uses many fewer operations than bootstrap; can be used when training neural networks with SGD; easy to plug into existing SGD code. Other statistical inference methods: directly computing the inverse Fisher information matrix; resampling (bootstrap, subsampling). These are too computationally expensive and not suited for “big data”!
  • 11. Intuition – Ornstein-Uhlenbeck process approximation. In SGD, denote $\Delta_t = \theta_t - \hat\theta$, so that
      $\Delta_{t+1} = \Delta_t - \eta\, g_s(\hat\theta + \Delta_t)$.
    $\Delta_t$ can be approximated by the Ornstein-Uhlenbeck process
      $d\Delta(T) = -H \Delta\, dT + \sqrt{\eta}\, G^{1/2}\, dB(T)$,
    where $B(T)$ is a standard Brownian motion.
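  • 12. Intuition – Ornstein-Uhlenbeck process approximation. Denote $\bar\theta_t = \frac{1}{t}\sum_{i=1}^{t}\theta_i$. Then $\sqrt{t}(\bar\theta_t - \hat\theta)$ can be approximated as
      $\sqrt{t}(\bar\theta_t - \hat\theta) = \frac{1}{\sqrt{t}}\sum_{i=1}^{t}(\theta_i - \hat\theta) = \frac{1}{\eta\sqrt{t}}\sum_{i=1}^{t}(\theta_i - \hat\theta)\,\eta \approx \frac{1}{\eta\sqrt{t}}\int_0^{t\eta}\Delta(T)\,dT$,   (1)
    where we use the approximation $\eta \approx dT$. By rearranging terms and multiplying both sides by $H^{-1}$, we can rewrite the stochastic differential equation as
      $\Delta(T)\,dT = -H^{-1}\,d\Delta(T) + \sqrt{\eta}\,H^{-1}G^{1/2}\,dB(T)$.
    Thus, we have
      $\int_0^{t\eta}\Delta(T)\,dT = -H^{-1}(\Delta(t\eta) - \Delta(0)) + \sqrt{\eta}\,H^{-1}G^{1/2}B(t\eta)$.   (2)
    After plugging (2) into (1) we have
      $\sqrt{t}(\bar\theta_t - \hat\theta) \approx -\frac{1}{\eta\sqrt{t}}H^{-1}(\Delta(t\eta) - \Delta(0)) + \frac{1}{\sqrt{t\eta}}H^{-1}G^{1/2}B(t\eta)$.
    When $\Delta(0) = 0$, the variance $\mathrm{Var}\big[-\frac{1}{\eta\sqrt{t}}H^{-1}(\Delta(t\eta) - \Delta(0))\big] = O(1/(t\eta))$. Since $\frac{1}{\sqrt{t\eta}}H^{-1}G^{1/2}B(t\eta) \sim N(0, H^{-1}GH^{-1})$, when $\eta \to 0$ and $\eta t \to \infty$ we conclude that
      $\sqrt{t}(\bar\theta_t - \hat\theta) \sim N(0, H^{-1}GH^{-1})$.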
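    The conclusion can be checked numerically in the simplest setting. The sketch below assumes a one-dimensional quadratic objective with Hessian `H` and additive Gaussian gradient noise of variance `G`, so that $g_s(\hat\theta + \Delta) = H\Delta + \varepsilon$; these modelling choices and all parameter values are illustrative assumptions, not part of the talk. It runs many independent SGD chains from $\Delta(0) = 0$ and compares the empirical variance of $\sqrt{t}(\bar\theta_t - \hat\theta)$ with $H^{-1}GH^{-1} = G/H^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
H, G = 2.0, 1.0                  # assumed scalar Hessian and gradient-noise variance
eta, t, R = 0.05, 4000, 2000     # step size, steps per chain, number of independent chains

delta = np.zeros(R)              # Delta_0 = 0 for every chain
running_sum = np.zeros(R)
for _ in range(t):
    noise = rng.normal(scale=np.sqrt(G), size=R)
    delta = delta - eta * (H * delta + noise)   # Delta_{t+1} = Delta_t - eta * g_s(theta_hat + Delta_t)
    running_sum += delta

scaled_avg = np.sqrt(t) * running_sum / t       # sqrt(t) * (theta_bar_t - theta_hat), one value per chain
empirical_var = scaled_avg.var()
predicted_var = G / H**2                        # H^{-1} G H^{-1} in one dimension
```

    Here $t\eta = 200$, so the $O(1/(t\eta))$ term from the derivation is small and the two variances should agree up to Monte Carlo error.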
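  • 14. Theoretical guarantee. Theorem. For a differentiable convex function $f(\theta) = \frac{1}{n}\sum_{i=1}^{n} f_i(\theta)$ with gradient $\nabla f(\theta)$, let $\hat\theta \in \mathbb{R}^p$ be its minimizer, and denote its Hessian at $\hat\theta$ by $H := \nabla^2 f(\hat\theta)$. Assume that for all $\theta \in \mathbb{R}^p$, $f$ satisfies:
      (F1) Weak strong convexity: $(\theta - \hat\theta)^\top \nabla f(\theta) \ge \alpha\|\theta - \hat\theta\|_2^2$, for constant $\alpha > 0$,
      (F2) Lipschitz gradient continuity: $\|\nabla f(\theta)\|_2 \le L\|\theta - \hat\theta\|_2$, for constant $L > 0$,
      (F3) Bounded Taylor remainder: $\|\nabla f(\theta) - H(\theta - \hat\theta)\|_2 \le E\|\theta - \hat\theta\|_2^2$, for constant $E > 0$,
      (F4) Bounded Hessian spectrum at $\hat\theta$: $0 < \lambda_L \le \lambda_i(H) \le \lambda_U < \infty$ for all $i$.
    Furthermore, let $g_s(\theta)$ be a stochastic gradient of $f$, satisfying:
      (G1) $E[g_s(\theta) \mid \theta] = \nabla f(\theta)$,
      (G2) $E[\|g_s(\theta)\|_2^2 \mid \theta] \le A\|\theta - \hat\theta\|_2^2 + B$,
      (G3) $E[\|g_s(\theta)\|_2^4 \mid \theta] \le C\|\theta - \hat\theta\|_2^4 + D$,
      (G4) $\|E[g_s(\theta) g_s(\theta)^\top \mid \theta] - G\|_2 \le A_1\|\theta - \hat\theta\|_2 + A_2\|\theta - \hat\theta\|_2^2 + A_3\|\theta - \hat\theta\|_2^3 + A_4\|\theta - \hat\theta\|_2^4$,
    for positive, data-dependent constants $A, B, C, D, A_i$, $i = 1, \dots, 4$. Assume that $\|\theta_1 - \hat\theta\|_2^2 = O(\eta)$; then for sufficiently small step size $\eta > 0$, the average SGD sequence $\bar\theta_t = \frac{1}{t}\sum_{i=1}^{t}\theta_i$ satisfies
      $\big\| t\,E[(\bar\theta_t - \hat\theta)(\bar\theta_t - \hat\theta)^\top] - H^{-1}GH^{-1} \big\|_2 \lesssim \sqrt{\eta} + \sqrt{\frac{1}{t\eta} + t\eta^2}$,
    where $G = E[g_s(\hat\theta) g_s(\hat\theta)^\top \mid \hat\theta]$.
    Proof idea: $H^{-1} = \eta\sum_{i \ge 0}(I - \eta H)^i$.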
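    The identity in the proof idea is a Neumann series, which converges when the spectral radius of $I - \eta H$ is below 1; for a positive definite $H$ this holds for small enough $\eta$. A quick numerical check, using an arbitrary positive definite test matrix and a step size chosen for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
H = A @ A.T + np.eye(4)                      # arbitrary symmetric positive definite test matrix
eta = 0.5 / np.linalg.eigvalsh(H).max()      # keeps every eigenvalue of I - eta*H in [0.5, 1)

term = np.eye(4)                             # (I - eta*H)^0
partial_sum = np.zeros((4, 4))
for _ in range(5000):                        # truncated series sum_{i=0}^{4999} (I - eta*H)^i
    partial_sum += term
    term = term @ (np.eye(4) - eta * H)

approx_inverse = eta * partial_sum           # eta * sum_i (I - eta*H)^i, should match H^{-1}
```

    Each term $(I - \eta H)^i$ corresponds to propagating a perturbation through $i$ noiseless SGD steps on the local quadratic model, which is how the sum over iterates ends up producing $H^{-1}$.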
  • 15. Comparison with bootstrap: univariate model estimation. [Figure: densities of the SGD and bootstrap estimates for three models. (a) Normal, density $\frac{1}{\sqrt{2\pi}}\exp(-\frac{(x-\mu)^2}{2})$ with $\mu = 0$; curves shown: $N(0, 1/n)$, $\theta_{SGD} - \bar\theta_{SGD}$, $\theta_{bootstrap} - \bar\theta_{bootstrap}$. (b) Exponential, density $\mu e^{-\mu x}$ with $\mu = 1$; curves shown: SGD, bootstrap. (c) Poisson, mass function $\frac{\mu^x e^{-\mu}}{x!}$ with $\mu = 1$; curves shown: SGD, bootstrap.]
  • 16. 95% confidence interval coverage simulation. Each entry is (coverage probability, confidence interval width).

    Table 1: Linear regression: dimension = 10, 100 samples.

    (a) Diagonal covariance. Bootstrap (0.941, 4.14), normal approximation (0.928, 3.87).
      η       t = 100          t = 500          t = 2500
      0.1     (0.957, 4.41)    (0.955, 4.51)    (0.960, 4.53)
      0.02    (0.869, 3.30)    (0.923, 3.77)    (0.918, 3.87)
      0.004   (0.634, 2.01)    (0.862, 3.20)    (0.916, 3.70)

    (b) Non-diagonal covariance. Bootstrap (0.938, 4.47), normal approximation (0.925, 4.18).
      η       t = 100          t = 500          t = 2500
      0.1     (0.949, 4.74)    (0.962, 4.91)    (0.963, 4.94)
      0.02    (0.845, 3.37)    (0.916, 4.01)    (0.927, 4.17)
      0.004   (0.616, 2.00)    (0.832, 3.30)    (0.897, 3.93)

    Table 2: Logistic regression: dimension = 10, 1000 samples.

    (a) Diagonal covariance. Bootstrap (0.932, 0.253), normal approximation (0.957, 0.264).
      η       t = 100          t = 500          t = 2500
      0.1     (0.872, 0.204)   (0.937, 0.249)   (0.939, 0.258)
      0.02    (0.610, 0.112)   (0.871, 0.196)   (0.926, 0.237)
      0.004   (0.312, 0.051)   (0.596, 0.111)   (0.86, 0.194)

    (b) Non-diagonal covariance. Bootstrap (0.932, 0.245), normal approximation (0.954, 0.256).
      η       t = 100          t = 500          t = 2500
      0.1     (0.859, 0.206)   (0.931, 0.255)   (0.947, 0.266)
      0.02    (0.600, 0.112)   (0.847, 0.197)   (0.931, 0.244)
      0.004   (0.302, 0.051)   (0.583, 0.111)   (0.851, 0.195)

    Results are better when each replicate's average uses a longer consecutive sequence, and with a larger step size.
  • 17. Adversarial Attacks Neural network classifiers with very high accuracy on test sets are extremely susceptible to nearly imperceptible adversarial attacks.
  • 20. Confidence intervals for mitigating adversarial examples: MNIST, logistic regression. [Figure 1: MNIST adversarial perturbation (scaled for display). (b) Original “0”: $P\{0 \mid \text{image}\} \approx 1 - e^{-46}$, CI $\approx (1 - e^{-28},\ 1 - e^{-64})$. (c) Adversarial “0”: $P\{0 \mid \text{image}\} \approx e^{-17}$, CI $\approx (e^{-31},\ 1 - e^{-11})$.] Adversarial examples produced by gradient attack have large confidence intervals!