A Theoretical Framework for Robustness of (Deep) Classifiers against
Adversarial Examples
by
Beilun Wang, Ji Gao, Yanjun Qi
2017
Abstract
Most machine learning classifiers, including deep neural networks, are
vulnerable to adversarial examples. Such inputs are typically generated by
adding small but purposeful modifications that lead to incorrect outputs while
remaining imperceptible to human eyes. The goal of this paper is not to introduce a
single method, but to make theoretical steps towards fully understanding
adversarial examples. By using concepts from topology, our theoretical analysis
brings forth the key reasons why an adversarial example can fool a classifier
(f_1) and incorporates the classifier's oracle (f_2, e.g., human perception) into the analysis. By
investigating the topological relationship between two (pseudo)metric spaces
corresponding to predictor f_1 and oracle f_2, we develop necessary and
sufficient conditions that can determine if f_1 is always robust
(strong-robust) against adversarial examples according to f_2. Interestingly,
our theorems indicate that just one unnecessary feature can make f_1 not
strong-robust, and that learning the right feature representation is the key to
obtaining a classifier that is both accurate and strong-robust.
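
As a concrete, hedged illustration of the abstract's claim that a single unnecessary feature can break strong-robustness, the short Python sketch below (not taken from the paper; the toy linear model, weights, and epsilon are assumptions made for illustration) applies a small FGSM-style perturbation that flips the prediction of a classifier f_1 weighting an extra feature, while an oracle f_2 that ignores that feature assigns the same label.

    # Minimal sketch (not from the paper): a small, purposeful perturbation flips
    # a linear classifier's prediction while an oracle that uses a different
    # feature set is unaffected. Model, data, and names are illustrative.
    import numpy as np

    # Toy input with two features; suppose the oracle f_2 only looks at feature 0,
    # while the learned classifier f_1 also weights the unnecessary feature 1.
    x = np.array([1.0, 0.0])

    w_oracle = np.array([1.0, 0.0])      # f_2: relies on the necessary feature only
    w_classifier = np.array([1.0, 5.0])  # f_1: large weight on an extra feature

    f1 = lambda v: int(np.sign(w_classifier @ v))
    f2 = lambda v: int(np.sign(w_oracle @ v))

    # FGSM-style step: move against f_1's decision score along the sign of its
    # input gradient (for a linear model, that gradient is just the weight vector).
    epsilon = 0.25
    x_adv = x - epsilon * np.sign(w_classifier)

    print("f1(x), f1(x_adv):", f1(x), f1(x_adv))   # prediction flips: 1 -> -1
    print("f2(x), f2(x_adv):", f2(x), f2(x_adv))   # oracle label unchanged: 1 -> 1
    print("perturbation size (L_inf):", np.max(np.abs(x_adv - x)))

Under these assumed weights, the perturbation has L_inf norm 0.25, yet f_1 changes its output while f_2 does not, which is exactly the pattern the abstract attributes to an unnecessary feature.
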
Archived Files and Locations
application/pdf, 3.7 MB
arxiv.org (repository); web.archive.org (webarchive)
arXiv:1612.00334v8