Search | arXiv e-print repository

On the invariance of the Kolmogorov complexity of $β$-expansions

Authors: Valentin Abadie, Helmut Boelcskei

Abstract: Measuring the complexity of real numbers is of major importance in computer science, for the purpose of knowing which computations are allowed. Consider a non-computable real number $s$, i.e. a real number which cannot be stored on a computer. We can store only an approximation of $s$, for instance by considering a finite bitstring representing a finite prefix of its binary expansion. For a fixed approximation error $\varepsilon>0$, the size of this finite bitstring is dependent on the \textit{algorithmic complexity} of the finite prefixes of the binary expansion of $s$. The \textit{algorithmic complexity} of a binary sequence $x$, often referred to as \textit{Kolmogorov complexity}, is the length of the smallest binary sequence $x'$, for which there exists an algorithm, such that when presented with $x'$ as input, it outputs $x$. The algorithmic complexity of the binary expansion of real numbers is widely studied, but the algorithmic complexity of other ways of representing real numbers remains poorly reported. However, knowing the algorithmic complexity of different representations may allow one to define new and more efficient strategies to represent real numbers. Several papers have established an equivalence between the algorithmic complexity of the $q$-ary expansions, with $q \in \mathbb{N}$, $q \geq 2$, i.e. representations of real numbers in any integer base.
In this paper, we study the algorithmic complexity of the so-called $β$-expansions, which are representations of real numbers in a base $β\in (1,2)$ that display a much more complex behavior as compared to the $q$-ary expansion. We show that for a given real number $s$, the binary expansion is a minimizer of algorithmic complexity, and that for every given $β\in (1,2)$, there exists a $β$-expansion of $s$ which achieves the lower bound of algorithmic complexity displayed by the binary expansion of $s$.
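To make the notion of a $β$-expansion concrete, the following minimal sketch computes digit prefixes for a base $β\in(1,2)$ and a number in $[0,1)$. It is an illustration only and not taken from the paper: the function names `beta_expansion` and `evaluate`, the restriction to $[0,1)$, and the choice of the greedy and lazy digit rules are assumptions made here. The point it shows is that, unlike in an integer base, the same number typically has several different expansions.

```python
def beta_expansion(s, beta, n, lazy=False):
    """Return the first n digits (0 or 1) of a beta-expansion of s.

    Assumes 1 < beta < 2 and 0 <= s < 1 (illustrative restriction).
    lazy=False: greedy rule, take digit 1 whenever the remainder allows it.
    lazy=True:  lazy rule, take digit 1 only when the remaining digits
                could not otherwise reach the remainder.
    """
    tail_max = 1.0 / (beta - 1.0)  # largest value of sum_{i>=1} beta**-i
    digits, r = [], s
    for _ in range(n):
        r *= beta
        take_one = (r > tail_max) if lazy else (r >= 1.0)
        digits.append(1 if take_one else 0)
        if take_one:
            r -= 1.0
    return digits


def evaluate(digits, beta):
    """Value sum_i d_i * beta**-(i+1) represented by a digit prefix."""
    return sum(d * beta ** -(i + 1) for i, d in enumerate(digits))


if __name__ == "__main__":
    beta, s = 1.618, 0.5
    greedy = beta_expansion(s, beta, 40)
    lazy = beta_expansion(s, beta, 40, lazy=True)
    print(greedy[:10], evaluate(greedy, beta))
    print(lazy[:10], evaluate(lazy, beta))
```

Both digit strings approximate the same number, yet they differ as strings, which is why the complexity of "the" $β$-expansion is not a well-defined quantity without further choices.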

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.01952 [pdf, other]

Three Quantization Regimes for ReLU Networks

Authors: Weigutian Ou, Philipp Schenkel, Helmut Bölcskei

Abstract: We establish the fundamental limits in the approximation of Lipschitz functions by deep ReLU neural networks with finite-precision weights. Specifically, three regimes, namely under-, over-, and proper quantization, in terms of minimax approximation error behavior as a function of network weight precision, are identified. This is accomplished by deriving nonasymptotic tight lower and upper bounds on the minimax approximation error. Notably, in the proper-quantization regime, neural networks exhibit memory-optimality in the approximation of Lipschitz functions. Deep networks have an inherent advantage over shallow networks in achieving memory-optimality. We also develop the notion of depth-precision tradeoff, showing that networks with high-precision weights can be converted into functionally equivalent deeper networks with low-precision weights, while preserving memory-optimality. This idea is reminiscent of sigma-delta analog-to-digital conversion, where oversampling rate is traded for resolution in the quantization of signal samples. We improve upon the best-known ReLU network approximation results for Lipschitz functions and describe a refinement of the bit extraction technique which could be of independent general interest.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.05259 [pdf, other]

Cellular automata, many-valued logic, and deep neural networks

Authors: Yani Zhang, Helmut Bölcskei

Abstract: We develop a theory characterizing the fundamental capability of deep neural networks to learn, from evolution traces, the logical rules governing the behavior of cellular automata (CA). This is accomplished by first establishing a novel connection between CA and Lukasiewicz propositional logic. While binary CA have been known for decades to essentially perform operations in Boolean logic, no such relationship exists for general CA. We demonstrate that many-valued (MV) logic, specifically Lukasiewicz propositional logic, constitutes a suitable language for characterizing general CA as logical machines. This is done by interpolating CA transition functions to continuous piecewise linear functions, which, by virtue of the McNaughton theorem, yield formulae in MV logic characterizing the CA. Recognizing that deep rectified linear unit (ReLU) networks realize continuous piecewise linear functions, it follows that these formulae are naturally extracted from CA evolution traces by deep ReLU networks. A corresponding algorithm together with a software implementation is provided. Finally, we show that the dynamical behavior of CA can be realized by recurrent neural networks.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2401.12113 [pdf, other]

Extracting Formulae in Many-Valued Logic from Deep Neural Networks

Authors: Yani Zhang, Helmut Bölcskei

Abstract: We propose a new perspective on deep ReLU networks, namely as circuit counterparts of Lukasiewicz infinite-valued logic -- a many-valued (MV) generalization of Boolean logic. An algorithm for extracting formulae in MV logic from deep ReLU networks is presented. As the algorithm applies to networks with general, in particular also real-valued, weights, it can be used to extract logical formulae from deep ReLU networks trained on data.
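As a rough illustration of why ReLU units and Lukasiewicz logic fit together, the sketch below writes the standard Lukasiewicz connectives on truth values in $[0,1]$ using a single ReLU each. This is not the extraction algorithm of the paper; the function names are hypothetical, and only the textbook definitions $\neg x = 1-x$, $x \oplus y = \min(1, x+y)$, and $x \odot y = \max(0, x+y-1)$ are assumed.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def luk_not(x):
    """Negation: 1 - x (affine, needs no ReLU)."""
    return 1.0 - x


def luk_or(x, y):
    """Strong disjunction min(1, x + y), written with one ReLU."""
    return 1.0 - relu(1.0 - x - y)


def luk_and(x, y):
    """Strong conjunction max(0, x + y - 1), written with one ReLU."""
    return relu(x + y - 1.0)


if __name__ == "__main__":
    # On the Boolean values {0, 1} the connectives reduce to OR and AND.
    for x in (0.0, 1.0):
        for y in (0.0, 1.0):
            print(x, y, luk_or(x, y), luk_and(x, y))
    # On intermediate truth values they are piecewise linear.
    print(luk_or(0.5, 0.25), luk_and(0.75, 0.5))
```

Reading a network as a formula goes in the opposite direction (from piecewise linear function to connectives), which is the harder step the abstract refers to.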

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2303.03731 [pdf, ps, other]

Completion of Matrices with Low Description Complexity

Authors: Erwin Riegler, Günther Koliander, David Stotz, Helmut Bölcskei

Abstract: We propose a theory for matrix completion that goes beyond the low-rank structure commonly considered in the literature and applies to general matrices of low description complexity. Specifically, complexity of the sets of matrices encompassed by the theory is measured in terms of Hausdorff and upper Minkowski dimensions. Our goal is the characterization of the number of linear measurements, with an emphasis on rank-$1$ measurements, needed for the existence of an algorithm that yields reconstruction, either perfect, with probability 1, or with arbitrarily small probability of error, depending on the setup. Concretely, we show that matrices taken from a set $\mathcal{U}$ such that $\mathcal{U}-\mathcal{U}$ has Hausdorff dimension $s$ can be recovered from $k>s$ measurements, and random matrices supported on a set $\mathcal{U}$ of Hausdorff dimension $s$ can be recovered with probability 1 from $k>s$ measurements. What is more, we establish the existence of recovery mappings that are robust against additive perturbations or noise in the measurements. Concretely, we show that there are $β$-Hölder continuous mappings recovering matrices taken from a set of upper Minkowski dimension $s$ from $k>2s/(1-β)$ measurements and, with arbitrarily small probability of error, random matrices supported on a set of upper Minkowski dimension $s$ from $k>s/(1-β)$ measurements. The numerous concrete examples we consider include low-rank matrices, sparse matrices, QR decompositions with sparse R-components, and matrices of fractal nature.

Submitted 27 November, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

arXiv:2211.15466 [pdf, other]

Metric entropy of causal, discrete-time LTI systems

Authors: Clemens Hutter, Thomas Allard, Helmut Bölcskei

Abstract: In [1] it is shown that recurrent neural networks (RNNs) can learn - in a metric entropy optimal manner - discrete time, linear time-invariant (LTI) systems. This is effected by comparing the number of bits needed to encode the approximating RNN to the metric entropy of the class of LTI systems under consideration [2, 3]. The purpose of this note is to provide an elementary self-contained proof of the metric entropy results in [2, 3], in the process of which minor mathematical issues appearing in [2, 3] are cleaned up. These corrections also lead to the correction of a constant in a result in [1] (see Remark 2.5).

Submitted 28 November, 2022; originally announced November 2022.

Comments: [1] arXiv:2105.02556

arXiv:2111.12312 [pdf, ps, other]

Lossy Compression of General Random Variables

Authors: Erwin Riegler, Helmut Bölcskei, Günther Koliander

Abstract: This paper is concerned with the lossy compression of general random variables, specifically with rate-distortion theory and quantization of random variables taking values in general measurable spaces such as, e.g., manifolds and fractal sets. Manifold structures are prevalent in data science, e.g., in compressed sensing, machine learning, image processing, and handwritten digit recognition. Fractal sets find application in image compression and in the modeling of Ethernet traffic. Our main contributions are bounds on the rate-distortion function and the quantization error. These bounds are very general and essentially only require the existence of reference measures satisfying certain regularity conditions in terms of small ball probabilities. To illustrate the wide applicability of our results, we particularize them to random variables taking values in i) manifolds, namely, hyperspheres and Grassmannians, and ii) self-similar sets characterized by iterated function systems satisfying the weak separation property.

Submitted 2 June, 2023; v1 submitted 24 November, 2021; originally announced November 2021.

arXiv:2107.12466 [pdf, ps, other]

High-Dimensional Distribution Generation Through Deep Neural Networks

Authors: Dmytro Perekrestenko, Léandre Eberhard, Helmut Bölcskei

Abstract: We show that every $d$-dimensional probability distribution of bounded support can be generated through deep ReLU networks out of a $1$-dimensional uniform input distribution. What is more, this is possible without incurring a cost - in terms of approximation error measured in Wasserstein-distance - relative to generating the $d$-dimensional target distribution from $d$ independent random variables. This is enabled by a vast generalization of the space-filling approach discovered in (Bailey & Telgarsky, 2018). The construction we propose elicits the importance of network depth in driving the Wasserstein distance between the target distribution and its neural network approximation to zero. Finally, we find that, for histogram target distributions, the number of bits needed to encode the corresponding generative network equals the fundamental limit for encoding probability distributions as dictated by quantization theory.

Submitted 27 August, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

Comments: v3 Figures 2 and 6 were changed to provide more illustrative examples. Published in Partial Differential Equations and Applications, Springer, Sept. 2021

arXiv:2105.02556 [pdf, other]

Metric Entropy Limits on Recurrent Neural Network Learning of Linear Dynamical Systems

Authors: Clemens Hutter, Recep Gül, Helmut Bölcskei

Abstract: One of the most influential results in neural network theory is the universal approximation theorem [1, 2, 3] which states that continuous functions can be approximated to within arbitrary accuracy by single-hidden-layer feedforward neural networks. The purpose of this paper is to establish a result in this spirit for the approximation of general discrete-time linear dynamical systems - including time-varying systems - by recurrent neural networks (RNNs). For the subclass of linear time-invariant (LTI) systems, we devise a quantitative version of this statement. Specifically, measuring the complexity of the considered class of LTI systems through metric entropy according to [4], we show that RNNs can optimally learn - or identify in system-theory parlance - stable LTI systems. For LTI systems whose input-output relation is characterized through a difference equation, this means that RNNs can learn the difference equation from input-output traces in a metric-entropy optimal manner.

Submitted 15 December, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

Comments: 28 pages

arXiv:2101.09341 [pdf, ps, other]

Beurling-type density criteria for system identification

Authors: V. Vlačić, C. Aubel, H. Bölcskei

Abstract: This paper addresses the problem of identifying a linear time-varying (LTV) system characterized by a (possibly infinite) discrete set of delay-Doppler shifts without a lattice (or other geometry-discretizing) constraint on the support set. Concretely, we show that a class of such LTV systems is identifiable whenever the upper uniform Beurling density of the delay-Doppler support sets, measured uniformly over the class, is strictly less than 1/2. The proof of this result reveals an interesting relation between LTV system identification and interpolation in the Bargmann-Fock space. Moreover, we show that this density condition is also necessary for classes of systems invariant under time-frequency shifts and closed under a natural topology on the support sets. We furthermore show that identifiability guarantees robust recovery of the delay-Doppler support set, as well as the weights of the individual delay-Doppler shifts, both in the sense of asymptotically vanishing reconstruction error for vanishing measurement error.

Submitted 22 January, 2021; originally announced January 2021.

arXiv:2006.16664 [pdf, ps, other]

Constructive Universal High-Dimensional Distribution Generation through Deep ReLU Networks

Authors: Dmytro Perekrestenko, Stephan Müller, Helmut Bölcskei

Abstract: We present an explicit deep neural network construction that transforms uniformly distributed one-dimensional noise into an arbitrarily close approximation of any two-dimensional Lipschitz-continuous target distribution. The key ingredient of our design is a generalization of the "space-filling" property of sawtooth functions discovered in (Bailey & Telgarsky, 2018). We elicit the importance of depth - in our neural network construction - in driving the Wasserstein distance between the target distribution and the approximation realized by the network to zero. An extension to output distributions of arbitrary dimension is outlined. Finally, we show that the proposed construction does not incur a cost - in terms of error measured in Wasserstein-distance - relative to generating $d$-dimensional target distributions from $d$ independent random variables.
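The sawtooth functions mentioned here can be pictured with a small sketch. It uses the well-known construction of a hat function from two ReLU units and composes it with itself; the names `hat` and `sawtooth` are hypothetical, and the sketch shows only the basic building block, not the generalization developed in the paper. Each additional composition, i.e. each additional layer, doubles the number of teeth, so the graph $x \mapsto (x, \mathrm{sawtooth}_k(x))$ oscillates ever more densely across the unit square, which is presumably the intuition behind the term "space-filling".

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def hat(x):
    """Hat function on [0, 1] from two ReLU units.

    Equals 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    """
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def sawtooth(x, k):
    """k-fold composition of hat; for k >= 1 it has 2**(k-1) teeth on [0, 1]."""
    for _ in range(k):
        x = hat(x)
    return x


if __name__ == "__main__":
    # Sample the depth-3 sawtooth on a grid of eighths: peaks sit at odd multiples of 1/8.
    print([sawtooth(j / 8, 3) for j in range(9)])
```

The number of ReLU units grows linearly in `k` while the number of teeth grows exponentially, which is one way to see why depth matters in such constructions.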

Submitted 5 June, 2021; v1 submitted 30 June, 2020; originally announced June 2020.

arXiv:2006.11727 [pdf, other]

Affine symmetries and neural network identifiability

Authors: Verner Vlačić, Helmut Bölcskei

Abstract: We address the following question of neural network identifiability: Suppose we are given a function $f:\mathbb{R}^m\to\mathbb{R}^n$ and a nonlinearity $ρ$. Can we specify the architecture, weights, and biases of all feed-forward neural networks with respect to $ρ$ giving rise to $f$? Existing literature on the subject suggests that the answer should be yes, provided we are only concerned with finding networks that satisfy certain "genericity conditions". Moreover, the identified networks are mutually related by symmetries of the nonlinearity. For instance, the $\tanh$ function is odd, and so flipping the signs of the incoming and outgoing weights of a neuron does not change the output map of the network. The results known hitherto, however, apply either to single-layer networks, or to networks satisfying specific structural assumptions (such as full connectivity), as well as to specific nonlinearities. In an effort to answer the identifiability question in greater generality, we consider arbitrary nonlinearities with potentially complicated affine symmetries, and we show that the symmetries can be used to find a rich set of networks giving rise to the same function $f$. The set obtained in this manner is, in fact, exhaustive (i.e., it contains all networks giving rise to $f$) unless there exists a network $\mathcal{A}$ "with no internal symmetries" giving rise to the identically zero function. This result can thus be interpreted as an analog of the rank-nullity theorem for linear operators.
We furthermore exhibit a class of "$\tanh$-type" nonlinearities (including the tanh function itself) for which such a network $\mathcal{A}$ does not exist, thereby solving the identifiability question for these nonlinearities in full generality. Finally, we show that this class contains nonlinearities with arbitrarily complicated symmetries.
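The sign-flip symmetry of $\tanh$ described in the abstract can be checked numerically with a small sketch. The single-hidden-layer architecture with scalar input and output, the function names `network` and `flip_neuron`, and the example weights are all assumptions chosen for brevity. The bias is treated as one of the neuron's incoming weights and is therefore negated as well.

```python
import math


def network(x, W1, b1, W2):
    """Single-hidden-layer tanh network with scalar input x and scalar output."""
    hidden = [math.tanh(w * x + b) for w, b in zip(W1, b1)]
    return sum(v * h for v, h in zip(W2, hidden))


def flip_neuron(W1, b1, W2, k):
    """Negate incoming weight, bias, and outgoing weight of hidden neuron k."""
    W1, b1, W2 = list(W1), list(b1), list(W2)
    W1[k], b1[k], W2[k] = -W1[k], -b1[k], -W2[k]
    return W1, b1, W2


if __name__ == "__main__":
    W1, b1, W2 = [0.7, -1.3, 2.0], [0.1, 0.4, -0.5], [1.5, -0.2, 0.8]
    flipped = flip_neuron(W1, b1, W2, k=1)
    for x in (-2.0, 0.0, 0.3, 5.0):
        # Different parameters, same input-output map (up to rounding).
        print(x, network(x, W1, b1, W2), network(x, *flipped))
```

Because $\tanh(-z) = -\tanh(z)$, the two sign changes cancel, so the parameters of a network can be pinned down by its input-output map at best up to such symmetries.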

Submitted 22 October, 2020; v1 submitted 21 June, 2020; originally announced June 2020.

Comments: 59 pages, 9 figures

arXiv:2006.02310 [pdf, ps, other]

Canonical Conditions for K/2 Degrees of Freedom

Authors: Recep Gül, David Stotz, Syed Ali Jafar, Helmut Bölcskei, Shlomo Shamai

Abstract: We present a necessary and sufficient condition for $1/2$ degree of freedom for each user in constant $K$-user single-antenna interference channels. This condition applies to all channel topologies, i.e., to fully-connected channels as well as channels that have individual links absent, reflected by corresponding zeros in the channel matrix. Moreover, it captures the essence of interference alignment by virtue of being expressed in terms of a generic injectivity condition that guarantees separability of signal and interference. Finally, we provide codebook constructions achieving $1/2$ degree of freedom for each user for all channel matrices satisfying our condition.

Submitted 3 June, 2020; originally announced June 2020.

arXiv:1906.06994 [pdf, other]

Neural network identifiability for a family of sigmoidal nonlinearities

Authors: Verner Vlačić, Helmut Bölcskei

Abstract: This paper addresses the following question of neural network identifiability: Does the input-output map realized by a feed-forward neural network with respect to a given nonlinearity uniquely specify the network architecture, weights, and biases? Existing literature on the subject (Sussman 1992; Albertini, Sontag et al. 1993; Fefferman 1994) suggests that the answer should be yes, up to certain symmetries induced by the nonlinearity, and provided the networks under consideration satisfy certain "genericity conditions". The results in Sussman 1992 and Albertini, Sontag et al. 1993 apply to networks with a single hidden layer, and in Fefferman 1994 the networks need to be fully connected. In an effort to answer the identifiability question in greater generality, we derive necessary genericity conditions for the identifiability of neural networks of arbitrary depth and connectivity with an arbitrary nonlinearity. Moreover, we construct a family of nonlinearities for which these genericity conditions are minimal, i.e., both necessary and sufficient. This family is large enough to approximate many commonly encountered nonlinearities to within arbitrary precision in the uniform norm.

Submitted 2 September, 2020; v1 submitted 11 June, 2019; originally announced June 2019.

Comments: 43 pages, 11 figures

arXiv:1901.02220 [pdf, other]

Deep Neural Network Approximation Theory

Authors: Dennis Elbrächter, Dmytro Perekrestenko, Philipp Grohs, Helmut Bölcskei

Abstract: This paper develops fundamental limits of deep neural network learning by characterizing what is possible if no constraints are imposed on the learning algorithm and on the amount of training data. Concretely, we consider Kolmogorov-optimal approximation through deep neural networks with the guiding theme being a relation between the complexity of the function (class) to be approximated and the complexity of the approximating network in terms of connectivity and memory requirements for storing the network topology and the associated quantized weights. The theory we develop establishes that deep networks are Kolmogorov-optimal approximants for markedly different function classes, such as unit balls in Besov spaces and modulation spaces. In addition, deep networks provide exponential approximation accuracy - i.e., the approximation error decays exponentially in the number of nonzero weights in the network - of the multiplication operation, polynomials, sinusoidal functions, and certain smooth functions. Moreover, this holds true even for one-dimensional oscillatory textures and the Weierstrass function (a fractal function), for neither of which methods achieving exponential approximation accuracy were previously known. We also show that in the approximation of sufficiently smooth functions finite-width deep networks require strictly smaller connectivity than finite-depth wide networks.

Submitted 12 March, 2021; v1 submitted 8 January, 2019; originally announced January 2019.

Comments: minor revision

arXiv:1811.03996 [pdf, ps, other]

Uncertainty relations and sparse signal recovery

Authors: Erwin Riegler, Helmut Bölcskei

Abstract: This chapter provides a principled introduction to uncertainty relations underlying sparse signal recovery. We start with the seminal work by Donoho and Stark, 1989, which defines uncertainty relations as upper bounds on the operator norm of the band-limitation operator followed by the time-limitation operator, generalize this theory to arbitrary pairs of operators, and then develop -- out of this generalization -- the coherence-based uncertainty relations due to Elad and Bruckstein, 2002, as well as uncertainty relations in terms of concentration of $1$-norm or $2$-norm. The theory is completed with the recently discovered set-theoretic uncertainty relations which lead to best possible recovery thresholds in terms of a general measure of parsimony, namely Minkowski dimension. We also elaborate on the remarkable connection between uncertainty relations and the "large sieve", a family of inequalities developed in analytic number theory. It is finally shown how uncertainty relations allow one to establish fundamental limits of practical signal recovery problems such as inpainting, declipping, super-resolution, and denoising of signals corrupted by impulse noise or narrowband interference. Detailed proofs are provided throughout the chapter.

Submitted 26 March, 2020; v1 submitted 9 November, 2018; originally announced November 2018.

Comments: Chapter in Information-theoretic Methods in Data Science, M. Rodrigues and Y. Eldar, Eds., Cambridge University Press, 2020

arXiv:1806.01528 [pdf, other]

The universal approximation power of finite-width deep ReLU networks

Authors: Dmytro Perekrestenko, Philipp Grohs, Dennis Elbrächter, Helmut Bölcskei

Abstract: We show that finite-width deep ReLU neural networks yield rate-distortion optimal approximation (Bölcskei et al., 2018) of polynomials, windowed sinusoidal functions, one-dimensional oscillatory textures, and the Weierstrass function, a fractal function which is continuous but nowhere differentiable. Together with their recently established universal approximation property of affine function systems (Bölcskei et al., 2018), this shows that deep neural networks approximate vastly different signal structures generated by the affine group, the Weyl-Heisenberg group, or through warping, and even certain fractals, all with approximation error decaying exponentially in the number of neurons. We also prove that in the approximation of sufficiently smooth functions finite-width deep networks require strictly smaller connectivity than finite-depth wide networks.

Submitted 5 June, 2018; originally announced June 2018.

arXiv:1805.03100 [pdf, ps, other]

Necessary Conditions for K/2 Degrees of Freedom

Authors: Recep Gül, Helmut Bölcskei, Shlomo Shamai

Abstract: Stotz et al., 2016, reported a sufficient (injectivity) condition for each user in a K-user single-antenna constant interference channel to achieve 1/2 degree of freedom. The present paper proves that this condition is necessary as well and hence provides an equivalence characterization of interference channel matrices allowing full degrees of freedom.

Submitted 8 May, 2018; originally announced May 2018.

arXiv:1804.08980 [pdf, ps, other]

Rate-Distortion Theory for General Sets and Measures

Authors: Erwin Riegler, Günther Koliander, Helmut Bölcskei

Abstract: This paper is concerned with a rate-distortion theory for sequences of i.i.d. random variables with general distribution supported on general sets including manifolds and fractal sets. Manifold structures are prevalent in data science, e.g., in compressed sensing, machine learning, image processing, and handwritten digit recognition. Fractal sets find application in image compression and in modeling of Ethernet traffic. We derive a lower bound on the (single-letter) rate-distortion function that applies to random variables X of general distribution and for continuous X reduces to the classical Shannon lower bound. Moreover, our lower bound is explicit up to a parameter obtained by solving a convex optimization problem in a nonnegative real variable. The only requirement for the bound to apply is the existence of a sigma-finite reference measure for X satisfying a certain subregularity condition. This condition is very general and prevents the reference measure from being highly concentrated on balls of small radii. To illustrate the wide applicability of our result, we evaluate the lower bound for a random variable distributed uniformly on a manifold, namely, the unit circle, and a random variable distributed uniformly on a self-similar set, namely, the middle third Cantor set.

Submitted 24 April, 2018; originally announced April 2018.

arXiv:1803.06887 [pdf, ps, other]

Lossless Analog Compression

Authors: Giovanni Alberti, Helmut Bölcskei, Camillo De Lellis, Günther Koliander, Erwin Riegler

Abstract: We establish the fundamental limits of lossless analog compression by considering the recovery of arbitrary m-dimensional real random vectors x from the noiseless linear measurements y=Ax with n x m measurement matrix A. Our theory is inspired by the groundbreaking work of Wu and Verdú (2010) on almost lossless analog compression, but applies to the nonasymptotic, i.e., fixed-m case, and considers zero error probability. Specifically, our achievability result states that, for almost all A, the random vector x can be recovered with zero error probability provided that n > K(x), where K(x) is given by the infimum of the lower modified Minkowski dimension over all support sets U of x. We then particularize this achievability result to the class of s-rectifiable random vectors as introduced in Koliander et al. (2016); these are random vectors of absolutely continuous distribution---with respect to the s-dimensional Hausdorff measure---supported on countable unions of s-dimensional differentiable submanifolds of the m-dimensional real coordinate space. Countable unions of differentiable submanifolds include essentially all signal models used in the compressed sensing literature. Specifically, we prove that, for almost all A, s-rectifiable random vectors x can be recovered with zero error probability from n>s linear measurements. This threshold is, however, found not to be tight as exemplified by the construction of an s-rectifiable random vector that can be recovered with zero error probability from n<s linear measurements. This leads us to the introduction of the new class of s-analytic random vectors, which admit a strong converse in the sense of n greater than or equal to s being necessary for recovery with probability of error smaller than one. The central conceptual tools in the development of our theory are geometric measure theory and the theory of real analytic functions.

Submitted 17 July, 2019; v1 submitted 19 March, 2018; originally announced March 2018.

arXiv:1707.02711 [pdf, ps, other]

Topology Reduction in Deep Convolutional Feature Extraction Networks

Authors: Thomas Wiatowski, Philipp Grohs, Helmut Bölcskei

Abstract: Deep convolutional neural networks (CNNs) used in practice employ potentially hundreds of layers and 10,000s of nodes. Such network sizes entail significant computational complexity due to the large number of convolutions that need to be carried out; in addition, a large number of parameters needs to be learned and stored. Very deep and wide CNNs may therefore not be well suited to applications operating under severe resource constraints as is the case, e.g., in low-power embedded and mobile platforms. This paper aims at understanding the impact of CNN topology, specifically depth and width, on the network's feature extraction capabilities. We address this question for the class of scattering networks that employ either Weyl-Heisenberg filters or wavelets, the modulus non-linearity, and no pooling. The exponential feature map energy decay results in Wiatowski et al., 2017, are generalized to $\mathcal{O}(a^{-N})$, where an arbitrary decay factor $a>1$ can be realized through suitable choice of the Weyl-Heisenberg prototype function or the mother wavelet. We then show how networks of fixed (possibly small) depth $N$ can be designed to guarantee that $((1-\varepsilon)\cdot 100)\%$ of the input signal's energy is contained in the feature vector. Based on the notion of operationally significant nodes, we characterize, partly rigorously and partly heuristically, the topology-reducing effects of (effectively) band-limited input signals, band-limited filters, and feature map symmetries. Finally, for networks based on Weyl-Heisenberg filters, we determine the prototype function bandwidth that minimizes---for fixed network depth $N$---the average number of operationally significant nodes per layer.

Submitted 14 March, 2018; v1 submitted 10 July, 2017; originally announced July 2017.

Comments: Corrected errors in arguments on spectral decay of Sobolev functions. Replaced part of the decay results (Sections 5-7) by corresponding statements for effectively band-limited functions

Journal ref: Proc. of SPIE (Wavelets and Sparsity XVII), San Diego, USA, Vol. 10394, pp. 1039418:1-1039418:12, Aug. 2017, (invited paper)

arXiv:1705.01714 [pdf, other]

Optimal Approximation with Sparsely Connected Deep Neural Networks

Authors: Helmut Bölcskei, Philipp Grohs, Gitta Kutyniok, Philipp Petersen

Abstract: We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in $L^2(\mathbb R^d)$. In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functions from this class to within a prescribed accuracy. Additionally, we prove that our lower bounds are achievable for a broad family of function classes. Specifically, all function classes that are optimally approximated by a general class of representation systems---so-called \emph{affine systems}---can be approximated by deep neural networks with minimal connectivity and memory requirements. Affine systems encompass a wealth of representation systems from applied harmonic analysis such as wavelets, ridgelets, curvelets, shearlets, $α$-shearlets, and more generally $α$-molecules. Our central result elucidates a remarkable universality property of neural networks and shows that they achieve the optimum approximation properties of all affine systems combined. As a specific example, we consider the class of $α^{-1}$-cartoon-like functions, which is approximated optimally by $α$-shearlets. We also explain how our results can be extended to the case of functions on low-dimensional immersed manifolds. Finally, we present numerical experiments demonstrating that the standard stochastic gradient descent algorithm generates deep neural networks providing close-to-optimal approximation rates. Moreover, these results indicate that stochastic gradient descent can actually learn approximations that are sparse in the representation systems optimally sparsifying the function class the network is trained on.

Submitted 16 May, 2018; v1 submitted 4 May, 2017; originally announced May 2017.

MSC Class: 41A25; 82C32; 42C40; 42C15; 41A46; 68T05; 94A34; 94A12

arXiv:1704.03636 [pdf, other]

Energy Propagation in Deep Convolutional Neural Networks

Authors: Thomas Wiatowski, Philipp Grohs, Helmut Bölcskei

Abstract: Many practical machine learning tasks employ very deep convolutional neural networks. Such large depths pose formidable computational challenges in training and operating the network. It is therefore important to understand how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers. In addition, it is desirable that the feature extractor generated by the network be informative in the sense of the only signal mapping to the all-zeros feature vector being the zero input signal. This "trivial null-set" property can be accomplished by asking for "energy conservation" in the sense of the energy in the feature vector being proportional to that of the corresponding input signal. This paper establishes conditions for energy conservation (and thus for a trivial null-set) for a wide class of deep convolutional neural network-based feature extractors and characterizes corresponding feature map energy decay rates. Specifically, we consider general scattering networks employing the modulus non-linearity and we find that under mild analyticity and high-pass conditions on the filters (which encompass, inter alia, various constructions of Weyl-Heisenberg filters, wavelets, ridgelets, ($α$)-curvelets, and shearlets) the feature map energy decays at least polynomially fast. For broad families of wavelets and Weyl-Heisenberg filters, the guaranteed decay rate is shown to be exponential. Moreover, we provide handy estimates of the number of layers needed to have at least $((1-\varepsilon)\cdot 100)\%$ of the input signal energy be contained in the feature vector.

Submitted 1 February, 2018; v1 submitted 12 April, 2017; originally announced April 2017.

Comments: Corrected errors in arguments on the spectral decay of Sobolev functions and on the volume of tubes, IEEE Transactions on Information Theory, 2018

arXiv:1701.02538 [pdf, other]

Vandermonde Matrices with Nodes in the Unit Disk and the Large Sieve

Authors: Céline Aubel, Helmut Bölcskei

Abstract: We derive bounds on the extremal singular values and the condition number of NxK, with N>=K, Vandermonde matrices with nodes in the unit disk. The mathematical techniques we develop to prove our main results are inspired by a link---first established by Selberg [1] and later extended by Moitra [2]---between the extremal singular values of Vandermonde matrices with nodes on the unit circle and large sieve inequalities. Our main conceptual contribution lies in establishing a connection between the extremal singular values of Vandermonde matrices with nodes in the unit disk and a novel large sieve inequality involving polynomials in z \in C with |z|<=1. Compared to Bazán's upper bound on the condition number [3], which, to the best of our knowledge, constitutes the only analytical result---available in the literature---on the condition number of Vandermonde matrices with nodes in the unit disk, our bound not only takes a much simpler form, but is also sharper for certain node configurations. Moreover, the bound we obtain can be evaluated consistently in a numerically stable fashion, whereas the evaluation of Bazán's bound requires the solution of a linear system of equations which has the same condition number as the Vandermonde matrix under consideration and can therefore lead to numerical instability in practice. As a byproduct, our result---when particularized to the case of nodes on the unit circle---slightly improves upon the Selberg-Moitra bound.

Submitted 3 August, 2017; v1 submitted 10 January, 2017; originally announced January 2017.

Comments: 45 pages, 2 figures, accepted for publication in Applied and Computational Harmonic Analysis

MSC Class: 15A12; 65F35

arXiv:1612.03450 [pdf, other]

doi 10.1109/TIT.2018.2812824

Noisy subspace clustering via matching pursuits

Authors: Michael Tschannen, Helmut Bölcskei

Abstract: Sparsity-based subspace clustering algorithms have attracted significant attention thanks to their excellent performance in practical applications. A prominent example is the sparse subspace clustering (SSC) algorithm by Elhamifar and Vidal, which performs spectral clustering based on an adjacency matrix obtained by sparsely representing each data point in terms of all the other data points via the Lasso. When the number of data points is large or the dimension of the ambient space is high, the computational complexity of SSC quickly becomes prohibitive. Dyer et al. observed that SSC-OMP obtained by replacing the Lasso by the greedy orthogonal matching pursuit (OMP) algorithm results in significantly lower computational complexity, while often yielding comparable performance. The central goal of this paper is an analytical performance characterization of SSC-OMP for noisy data. Moreover, we introduce and analyze the SSC-MP algorithm, which employs matching pursuit (MP) in lieu of OMP. Both SSC-OMP and SSC-MP are proven to succeed even when the subspaces intersect and when the data points are contaminated by severe noise. The clustering conditions we obtain for SSC-OMP and SSC-MP are similar to those for SSC and for the thresholding-based subspace clustering (TSC) algorithm due to Heckel and Bölcskei. Analytical results in combination with numerical results indicate that both SSC-OMP and SSC-MP with a data-dependent stopping criterion automatically detect the dimensions of the subspaces underlying the data. Moreover, experiments on synthetic and on real data show that SSC-MP compares very favorably to SSC, SSC-OMP, TSC, and the nearest subspace neighbor algorithm, both in terms of clustering performance and running time. In addition, we find that, in contrast to SSC-OMP, the performance of SSC-MP is very robust with respect to the choice of parameters in the stopping criteria.

Submitted 8 June, 2018; v1 submitted 11 December, 2016; originally announced December 2016.

Comments: 24 pages, 5 figures

Journal ref: IEEE Transactions on Information Theory, Vol. 64, No. 6, pp. 4081-4104, June 2018

arXiv:1612.01103 [pdf, other]

doi 10.1109/TSP.2017.2736513

Robust nonparametric nearest neighbor random process clustering

Authors: Michael Tschannen, Helmut Bölcskei

Abstract: We consider the problem of clustering noisy finite-length observations of stationary ergodic random processes according to their generative models without prior knowledge of the model statistics and the number of generative models. Two algorithms, both using the $L^1$-distance between estimated power spectral densities (PSDs) as a measure of dissimilarity, are analyzed. The first one, termed nearest neighbor process clustering (NNPC), relies on partitioning the nearest neighbor graph of the observations via spectral clustering. The second algorithm, simply referred to as $k$-means (KM), consists of a single $k$-means iteration with farthest point initialization and was considered before in the literature, albeit with a different dissimilarity measure. We prove that both algorithms succeed with high probability in the presence of noise and missing entries, and even when the generative process PSDs overlap significantly, all provided that the observation length is sufficiently large. Our results quantify the tradeoff between the overlap of the generative process PSDs, the observation length, the fraction of missing entries, and the noise variance. Finally, we provide extensive numerical results for synthetic and real data and find that NNPC outperforms state-of-the-art algorithms in human motion sequence clustering.

Submitted 28 September, 2017; v1 submitted 4 December, 2016; originally announced December 2016.

Comments: 15 pages, 7 figures

Journal ref: IEEE Transactions on Signal Processing, Vol. 65, No. 22, pp. 6009-6023, Nov. 2017

arXiv:1605.08283 [pdf, other]

Discrete Deep Feature Extraction: A Theory and New Architectures

Authors: Thomas Wiatowski, Michael Tschannen, Aleksandar Stanić, Philipp Grohs, Helmut Bölcskei

Abstract: First steps towards a mathematical theory of deep convolutional neural networks for feature extraction were made---for the continuous-time case---in Mallat, 2012, and Wiatowski and Bölcskei, 2015. This paper considers the discrete case, introduces new convolutional neural network architectures, and proposes a mathematical framework for their analysis. Specifically, we establish deformation and translation sensitivity results of local and global nature, and we investigate how certain structural properties of the input signal are reflected in the corresponding feature vectors. Our theory applies to general filters and general Lipschitz-continuous non-linearities and pooling operators. Experiments on handwritten digit classification and facial landmark detection---including feature importance evaluation---complement the theoretical findings.

Submitted 26 May, 2016; originally announced May 2016.

Comments: Proc. of International Conference on Machine Learning (ICML), New York, USA, June 2016, to appear

Journal ref: Proc. of International Conference on Machine Learning (ICML), New York, USA, pp. 2149-2158, June 2016

arXiv:1605.00912 [pdf, other]

Lossless Linear Analog Compression

Authors: Giovanni Alberti, Helmut Bölcskei, Camillo De Lellis, Günther Koliander, Erwin Riegler

Abstract: We establish the fundamental limits of lossless linear analog compression by considering the recovery of random vectors ${\boldsymbol{\mathsf{x}}}\in{\mathbb R}^m$ from the noiseless linear measurements ${\boldsymbol{\mathsf{y}}}=\boldsymbol{A}{\boldsymbol{\mathsf{x}}}$ with measurement matrix $\boldsymbol{A}\in{\mathbb R}^{n\times m}$. Specifically, for a random vector ${\boldsymbol{\mathsf{x}}}\in{\mathbb R}^m$ of arbitrary distribution we show that ${\boldsymbol{\mathsf{x}}}$ can be recovered with zero error probability from $n>\inf\underline{\operatorname{dim}}_\mathrm{MB}(U)$ linear measurements, where $\underline{\operatorname{dim}}_\mathrm{MB}(\cdot)$ denotes the lower modified Minkowski dimension and the infimum is over all sets $U\subseteq{\mathbb R}^{m}$ with $\mathbb{P}[{\boldsymbol{\mathsf{x}}}\in U]=1$. This achievability statement holds for Lebesgue almost all measurement matrices $\boldsymbol{A}$. We then show that $s$-rectifiable random vectors---a stochastic generalization of $s$-sparse vectors---can be recovered with zero error probability from $n>s$ linear measurements. From classical compressed sensing theory we would expect $n\geq s$ to be necessary for successful recovery of ${\boldsymbol{\mathsf{x}}}$. Surprisingly, certain classes of $s$-rectifiable random vectors can be recovered from fewer than $s$ measurements. Imposing an additional regularity condition on the distribution of $s$-rectifiable random vectors ${\boldsymbol{\mathsf{x}}}$, we do get the expected converse result of $s$ measurements being necessary. The resulting class of random vectors appears to be new and will be referred to as $s$-analytic random vectors.

Submitted 5 May, 2016; v1 submitted 3 May, 2016; originally announced May 2016.

arXiv:1605.00031 [pdf, other]

doi 10.1109/ISIT.2016.7541482

Deep Convolutional Neural Networks on Cartoon Functions

Authors: Philipp Grohs, Thomas Wiatowski, Helmut Bölcskei

Abstract: Wiatowski and Bölcskei, 2015, proved that deformation stability and vertical translation invariance of deep convolutional neural network-based feature extractors are guaranteed by the network structure per se rather than the specific convolution kernels and non-linearities. While the translation invariance result applies to square-integrable functions, the deformation stability bound holds for band-limited functions only. Many signals of practical relevance (such as natural images) exhibit, however, sharp and curved discontinuities and are, hence, not band-limited. The main contribution of this paper is a deformation stability result that takes these structural properties into account. Specifically, we establish deformation stability bounds for the class of cartoon functions introduced by Donoho, 2001.

Submitted 12 February, 2018; v1 submitted 29 April, 2016; originally announced May 2016.

Comments: This is a slightly updated version of the paper published in the ISIT proceedings. Specifically, we corrected errors in the arguments on the volume of tubes. Note that this correction does not affect the main statements of the paper

Journal ref: Proc. of IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, pp. 1163-1167, July 2016

arXiv:1604.07196 [pdf, ps, other]

Deterministic Performance Analysis of Subspace Methods for Cisoid Parameter Estimation

Authors: Céline Aubel, Helmut Bölcskei

Abstract: Performance analyses of subspace algorithms for cisoid parameter estimation available in the literature are predominantly of statistical nature with a focus on asymptotic (either in the sample size or the SNR) statements. This paper presents a deterministic, finite sample size, and finite-SNR performance analysis of the ESPRIT algorithm and the matrix pencil method. Our results are based, inter alia, on a new upper bound on the condition number of Vandermonde matrices with nodes inside the unit disk. This bound is obtained through a generalization of Hilbert's inequality frequently used in large sieve theory.

Submitted 25 April, 2016; originally announced April 2016.

Comments: IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, July 2016

arXiv:1512.06293 [pdf, other]

A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction

Authors: Thomas Wiatowski, Helmut Bölcskei

Abstract: Deep convolutional neural networks have led to breakthrough results in numerous practical machine learning tasks such as classification of images in the ImageNet data set, control-policy-learning to play Atari games or the board game Go, and image captioning. Many of these applications first perform feature extraction and then feed the results thereof into a trainable classifier. The mathematical analysis of deep convolutional neural networks for feature extraction was initiated by Mallat, 2012. Specifically, Mallat considered so-called scattering networks based on a wavelet transform followed by the modulus non-linearity in each network layer, and proved translation invariance (asymptotically in the wavelet scale parameter) and deformation stability of the corresponding feature extractor. This paper complements Mallat's results by developing a theory that encompasses general convolutional transforms, or in more technical parlance, general semi-discrete frames (including Weyl-Heisenberg filters, curvelets, shearlets, ridgelets, wavelets, and learned filters), general Lipschitz-continuous non-linearities (e.g., rectified linear units, shifted logistic sigmoids, hyperbolic tangents, and modulus functions), and general Lipschitz-continuous pooling operators emulating, e.g., sub-sampling and averaging. In addition, all of these elements can be different in different network layers. For the resulting feature extractor we prove a translation invariance result of vertical nature in the sense of the features becoming progressively more translation-invariant with increasing network depth, and we establish deformation sensitivity bounds that apply to signal classes such as, e.g., band-limited functions, cartoon functions, and Lipschitz functions.

Submitted 24 October, 2017; v1 submitted 19 December, 2015; originally announced December 2015.

Comments: IEEE Transactions on Information Theory, to appear

arXiv:1512.01017 [pdf, other]

Almost lossless analog signal separation and probabilistic uncertainty relations

Authors: David Stotz, Erwin Riegler, Eirikur Agustsson, Helmut Bölcskei

Abstract: We propose an information-theoretic framework for analog signal separation. Specifically, we consider the problem of recovering two analog signals, modeled as general random vectors, from the noiseless sum of linear measurements of the signals. Our framework is inspired by the groundbreaking work of Wu and Verdú (2010) on analog compression and encompasses, inter alia, inpainting, declipping, super-resolution, the recovery of signals corrupted by impulse noise, and the separation of (e.g., audio or video) signals into two distinct components. The main results we report are general achievability bounds for the compression rate, i.e., the number of measurements relative to the dimension of the ambient space the signals live in, under either measurability or Hölder continuity imposed on the separator. Furthermore, we find a matching converse for sources of mixed discrete-continuous distribution. For measurable separators our proofs are based on a new probabilistic uncertainty relation which shows that the intersection of generic subspaces with general sets of sufficiently small Minkowski dimension is empty. Hölder continuous separators are dealt with by introducing the concept of regularized probabilistic uncertainty relations. The probabilistic uncertainty relations we develop are inspired by embedding results in dynamical systems theory due to Sauer et al. (1991) and---conceptually---parallel classical Donoho-Stark and Elad-Bruckstein uncertainty principles at the heart of compressed sensing theory. Operationally, the new uncertainty relations take the theory of sparse signal separation beyond traditional sparsity---as measured in terms of the number of non-zero entries---to the more general notion of low description complexity as quantified by Minkowski dimension. Finally, our approach also allows us to significantly strengthen key results in Wu and Verdú (2010).

Submitted 13 July, 2017; v1 submitted 3 December, 2015; originally announced December 2015.

Comments: to appear in IEEE Trans. on Inf. Theory

arXiv:1509.01047 [pdf, other]

A Theory of Super-Resolution from Short-Time Fourier Transform Measurements

Authors: Céline Aubel, David Stotz, Helmut Bölcskei

Abstract: While spike trains are obviously not band-limited, the theory of super-resolution tells us that perfect recovery of unknown spike locations and weights from low-pass Fourier transform measurements is possible provided that the minimum spacing, $Δ$, between spikes is not too small. Specifically, for a measurement cutoff frequency of $f_c$, Donoho [2] showed that exact recovery is possible if the spikes (on $\mathbb{R}$) lie on a lattice and $Δ> 1/f_c$, but does not specify a corresponding recovery method. Candès and Fernandez-Granda [3, 4] provide a convex programming method for the recovery of periodic spike trains (i.e., spike trains on the torus $\mathbb{T}$), which succeeds provably if $Δ> 2/f_c$ and $f_c \geq 128$ or if $Δ> 1.26/f_c$ and $f_c \geq 10^3$, and does not need the spikes within the fundamental period to lie on a lattice. In this paper, we develop a theory of super-resolution from short-time Fourier transform (STFT) measurements. Specifically, we present a recovery method similar in spirit to the one in [3] for pure Fourier measurements. For an STFT Gaussian window function of width $σ= 1/(4f_c)$ this method succeeds provably if $Δ> 1/f_c$, without restrictions on $f_c$. Our theory is based on a measure-theoretic formulation of the recovery problem, which leads to considerable generality in the sense of the results being grid-free and applying to spike trains on both $\mathbb{R}$ and $\mathbb{T}$. The case of spike trains on $\mathbb{R}$ comes with significant technical challenges. For recovery of spike trains on $\mathbb{T}$ we prove that the correct solution can be approximated---in weak-* topology---by solving a sequence of finite-dimensional convex programming problems.

Submitted 23 January, 2017; v1 submitted 3 September, 2015; originally announced September 2015.

Comments: 66 pages, accepted for publication in the Journal of Fourier Analysis and Applications

MSC Class: 28A33; 46E27; 46N10; 42B10; 32A10; 46F05

arXiv:1507.07105 [pdf, ps, other]

Dimensionality-reduced subspace clustering

Authors: Reinhard Heckel, Michael Tschannen, Helmut Bölcskei

Abstract: Subspace clustering refers to the problem of clustering unlabeled high-dimensional data points into a union of low-dimensional linear subspaces, whose number, orientations, and dimensions are all unknown. In practice one may have access to dimensionality-reduced observations of the data only, resulting, e.g., from undersampling due to complexity and speed constraints on the acquisition device or mechanism. More pertinently, even if the high-dimensional data set is available it is often desirable to first project the data points into a lower-dimensional space and to perform clustering there; this reduces storage requirements and computational cost. The purpose of this paper is to quantify the impact of dimensionality reduction through random projection on the performance of three subspace clustering algorithms, all of which are based on principles from sparse signal recovery. Specifically, we analyze the thresholding based subspace clustering (TSC) algorithm, the sparse subspace clustering (SSC) algorithm, and an orthogonal matching pursuit variant thereof (SSC-OMP). We find, for all three algorithms, that dimensionality reduction down to the order of the subspace dimensions is possible without incurring significant performance degradation. Moreover, these results are order-wise optimal in the sense that reducing the dimensionality further leads to a fundamentally ill-posed clustering problem. Our findings carry over to the noisy case as illustrated through analytical results for TSC and simulations for SSC and SSC-OMP. Extensive experiments on synthetic and real data complement our theoretical findings.

Submitted 13 December, 2015; v1 submitted 25 July, 2015; originally announced July 2015.

Comments: new results for the noisy case, additional simulation work, additional discussions in the main body

arXiv:1506.01866 [pdf, ps, other]

Characterizing degrees of freedom through additive combinatorics

Authors: David Stotz, Helmut Bölcskei

Abstract: We establish a formal connection between the problem of characterizing degrees of freedom (DoF) in constant single-antenna interference channels (ICs), with general channel matrix, and the field of additive combinatorics. The theory we develop is based on a recent breakthrough result by Hochman in fractal geometry. Our first main contribution is an explicit condition on the channel matrix to admit full, i.e., $K/2$ DoF; this condition is satisfied for almost all channel matrices. We also provide a construction of corresponding DoF-optimal input distributions. The second main result is a new DoF-formula exclusively in terms of Shannon entropies. This formula is more amenable to both analytical statements and numerical evaluations than the DoF-formula by Wu et al., which is in terms of Rényi information dimension. We then use the new DoF-formula to shed light on the hardness of finding the exact number of DoF in ICs with rational channel coefficients, and to improve the best known bounds on the DoF of a well-studied channel matrix.

Submitted 5 June, 2015; originally announced June 2015.

Comments: submitted to IEEE Trans. on Inf. Theory

arXiv:1504.05487 [pdf, ps, other]

Deep Convolutional Neural Networks Based on Semi-Discrete Frames

Authors: Thomas Wiatowski, Helmut Bölcskei

Abstract: Deep convolutional neural networks have led to breakthrough results in practical feature extraction applications. The mathematical analysis of these networks was pioneered by Mallat, 2012. Specifically, Mallat considered so-called scattering networks based on identical semi-discrete wavelet frames in each network layer, and proved translation-invariance as well as deformation stability of the resulting feature extractor. The purpose of this paper is to develop Mallat's theory further by allowing for different and, most importantly, general semi-discrete frames (such as, e.g., Gabor frames, wavelets, curvelets, shearlets, ridgelets) in distinct network layers. This makes it possible to extract wider classes of features than the point singularities resolved by the wavelet transform. Our generalized feature extractor is proven to be translation-invariant, and we develop deformation stability results for a larger class of deformations than those considered by Mallat. For Mallat's wavelet-based feature extractor, we get rid of a number of technical conditions. The mathematical engine behind our results is continuous frame theory, which allows us to completely detach the invariance and deformation stability proofs from the particular algebraic structure of the underlying frames.

Submitted 21 April, 2015; originally announced April 2015.

Comments: Proc. of IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, June 2015, to appear

Journal ref: Proc. of IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, pp. 1212-1216, June 2015

arXiv:1504.05059 [pdf, other]

doi 10.1109/ISIT.2015.7282647

Nonparametric Nearest Neighbor Random Process Clustering

Authors: Michael Tschannen, Helmut Bölcskei

Abstract: We consider the problem of clustering noisy finite-length observations of stationary ergodic random processes according to their nonparametric generative models without prior knowledge of the model statistics and the number of generative models. Two algorithms, both using the L1-distance between estimated power spectral densities (PSDs) as a measure of dissimilarity, are analyzed. The first algorithm, termed nearest neighbor process clustering (NNPC), to the best of our knowledge, is new and relies on partitioning the nearest neighbor graph of the observations via spectral clustering. The second algorithm, simply referred to as k-means (KM), consists of a single k-means iteration with farthest point initialization and was considered before in the literature, albeit with a different measure of dissimilarity and with asymptotic performance results only. We show that both NNPC and KM succeed with high probability under noise and even when the generative process PSDs overlap significantly, all provided that the observation length is sufficiently large. Our results quantify the tradeoff between the overlap of the generative process PSDs, the noise variance, and the observation length. Finally, we present numerical performance results for synthetic and real data.

Submitted 20 April, 2015; originally announced April 2015.

Comments: IEEE International Symposium on Information Theory (ISIT), June 2015, to appear
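
Illustration (not part of the listing): the dissimilarity measure named in the abstract above, the L1-distance between estimated PSDs, can be sketched as follows. This is a minimal sketch that assumes a plain periodogram as the PSD estimator; the paper's actual estimator is not stated in the abstract, and the function names `periodogram` and `l1_psd_distance` are hypothetical.

```python
import cmath
import math


def periodogram(x):
    """Plain periodogram |DFT(x)|^2 / N (an assumed, simple PSD estimator)."""
    n = len(x)
    psd = []
    for k in range(n):
        # Direct O(N^2) DFT; adequate for short illustrative sequences.
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        psd.append(abs(coeff) ** 2 / n)
    return psd


def l1_psd_distance(x, y):
    """L1 distance between the periodograms of two equal-length observations."""
    if len(x) != len(y):
        raise ValueError("observations must have equal length")
    return sum(abs(a - b) for a, b in zip(periodogram(x), periodogram(y)))


impulse = [1.0, 0.0, 0.0, 0.0]   # flat spectrum
constant = [1.0, 1.0, 1.0, 1.0]  # all power at frequency zero
d = l1_psd_distance(impulse, constant)
```

A pairwise matrix of such distances is the kind of input a nearest neighbor graph or a k-means step would work from; those clustering steps are not sketched here.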

arXiv:1504.05036 [pdf, ps, other]

Density Criteria for the Identification of Linear Time-Varying Systems

Authors: Céline Aubel, Helmut Bölcskei

Abstract: This paper addresses the problem of identifying a linear time-varying (LTV) system characterized by a (possibly infinite) discrete set of delays and Doppler shifts. We prove that stable identifiability is possible if the upper uniform Beurling density of the delay-Doppler support set is strictly smaller than 1/2 and stable identifiability is impossible for densities strictly larger than 1/2. The proof of this density theorem reveals an interesting relation between LTV system identification and interpolation in the Bargmann-Fock space. Finally, we introduce a subspace method for solving the system identification problem at hand.

Submitted 20 April, 2015; originally announced April 2015.

Comments: IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, June 2015

arXiv:1504.04970 [pdf, ps, other]

Information-Theoretic Limits of Matrix Completion

Authors: Erwin Riegler, David Stotz, Helmut Bölcskei

Abstract: We propose an information-theoretic framework for matrix completion. The theory goes beyond the low-rank structure and applies to general matrices of "low description complexity". Specifically, we consider $m\times n$ random matrices $\mathbf{X}$ of arbitrary distribution (continuous, discrete, discrete-continuous mixture, or even singular). With $\mathcal{S}$ an $\varepsilon$-support set of $\mathbf{X}$, i.e., $\mathrm{P}[\mathbf{X}\in\mathcal{S}]\geq 1-\varepsilon$, and $\underline{\mathrm{dim}}_\mathrm{B}(\mathcal{S})$ denoting the lower Minkowski dimension of $\mathcal{S}$, we show that $k> \underline{\mathrm{dim}}_\mathrm{B}(\mathcal{S})$ trace inner product measurements with measurement matrices $A_i$ suffice to recover $\mathbf{X}$ with probability of error at most $\varepsilon$. The result holds for Lebesgue a.a. $A_i$ and does not need incoherence between the $A_i$ and the unknown matrix $\mathbf{X}$. We furthermore show that $k> \underline{\mathrm{dim}}_\mathrm{B}(\mathcal{S})$ measurements also suffice to recover the unknown matrix $\mathbf{X}$ from measurements taken with rank-one $A_i$; again, this applies to a.a. rank-one $A_i$. Rank-one measurement matrices are attractive as they require less storage space than general measurement matrices and can be applied faster. Particularizing our results to the recovery of low-rank matrices, we find that $k>(m+n-r)r$ measurements are sufficient to recover matrices of rank at most $r$. Finally, we construct a class of rank-$r$ matrices that can be recovered with arbitrarily small probability of error from $k<(m+n-r)r$ measurements.

Submitted 10 August, 2016; v1 submitted 20 April, 2015; originally announced April 2015.
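
Illustration (not part of the listing): the low-rank condition $k>(m+n-r)r$ quoted in the abstract above can be evaluated for concrete sizes. The sketch below only does this arithmetic; the function name and the example dimensions are hypothetical choices, not taken from the paper.

```python
def low_rank_measurement_threshold(m, n, r):
    """Return (m + n - r) * r, the threshold quoted in the abstract for rank <= r."""
    if not (0 < r <= min(m, n)):
        raise ValueError("rank must satisfy 0 < r <= min(m, n)")
    return (m + n - r) * r


# Hypothetical example sizes: a 100 x 100 matrix of rank at most 5.
m, n, r = 100, 100, 5
threshold = low_rank_measurement_threshold(m, n, r)  # (100 + 100 - 5) * 5
smallest_k = threshold + 1  # smallest integer k with k > (m + n - r) * r
total_entries = m * n       # number of entries of the full matrix
```

For these example sizes the threshold is 975, so 976 measurements satisfy the stated condition, compared with 10000 entries in the full matrix.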

arXiv:1404.7374 [pdf, ps, other]

doi 10.1109/ISIT.2014.6874877

Explicit and almost sure conditions for K/2 degrees of freedom

Authors: David Stotz, Helmut Bölcskei

Abstract: It is well known that in K-user constant single-antenna interference channels K/2 degrees of freedom (DoF) can be achieved for almost all channel matrices. Explicit conditions on the channel matrix to admit K/2 DoF are, however, not available. The purpose of this paper is to identify such explicit conditions, which are satisfied for almost all channel matrices. We also provide a construction of corresponding asymptotically DoF-optimal input distributions. The main technical tool used is a recent breakthrough result by Hochman in fractal geometry.

Submitted 29 April, 2014; originally announced April 2014.

Comments: To be presented at IEEE Int. Symp. Inf. Theory 2014, Honolulu, HI

arXiv:1404.6818 [pdf, ps, other]

Subspace clustering of dimensionality-reduced data

Authors: Reinhard Heckel, Michael Tschannen, Helmut Bölcskei

Abstract: Subspace clustering refers to the problem of clustering unlabeled high-dimensional data points into a union of low-dimensional linear subspaces, assumed unknown. In practice one may have access to dimensionality-reduced observations of the data only, resulting, e.g., from "undersampling" due to complexity and speed constraints on the acquisition device. More pertinently, even if one has access to the high-dimensional data set it is often desirable to first project the data points into a lower-dimensional space and to perform the clustering task there; this reduces storage requirements and computational cost. The purpose of this paper is to quantify the impact of dimensionality-reduction through random projection on the performance of the sparse subspace clustering (SSC) and the thresholding based subspace clustering (TSC) algorithms. We find that for both algorithms dimensionality reduction down to the order of the subspace dimensions is possible without incurring significant performance degradation. The mathematical engine behind our theorems is a result quantifying how the affinities between subspaces change under random dimensionality reducing projections.

Submitted 27 April, 2014; originally announced April 2014.

Comments: ISIT 2014

arXiv:1403.3438 [pdf, ps, other]

Neighborhood Selection for Thresholding-based Subspace Clustering

Authors: Reinhard Heckel, Eirikur Agustsson, Helmut Bölcskei

Abstract: Subspace clustering refers to the problem of clustering high-dimensional data points into a union of low-dimensional linear subspaces, where the number of subspaces, their dimensions and orientations are all unknown. In this paper, we propose a variation of the recently introduced thresholding-based subspace clustering (TSC) algorithm, which applies spectral clustering to an adjacency matrix constructed from the nearest neighbors of each data point with respect to the spherical distance measure. The new element resides in an individual and data-driven choice of the number of nearest neighbors. Previous performance results for TSC, as well as for other subspace clustering algorithms based on spectral clustering, come in terms of an intermediate performance measure, which does not address the clustering error directly. Our main analytical contribution is a performance analysis of the modified TSC algorithm (as well as the original TSC algorithm) in terms of the clustering error directly.

Submitted 13 March, 2014; originally announced March 2014.

Comments: ICASSP 2014

arXiv:1403.2239 [pdf, other]

doi 10.1109/ICASSP.2014.6853553

Super-Resolution from Short-Time Fourier Transform Measurements

Authors: Céline Aubel, David Stotz, Helmut Bölcskei

Abstract: While spike trains are obviously not band-limited, the theory of super-resolution tells us that perfect recovery of unknown spike locations and weights from low-pass Fourier transform measurements is possible provided that the minimum spacing, $Δ$, between spikes is not too small. Specifically, for a cutoff frequency of $f_c$, Donoho [2] shows that exact recovery is possible if $Δ> 1/f_c$, but does not specify a corresponding recovery method. On the other hand, Candès and Fernandez-Granda [3] provide a recovery method based on convex optimization, which provably succeeds as long as $Δ> 2/f_c$. In practical applications one often has access to windowed Fourier transform measurements, i.e., short-time Fourier transform (STFT) measurements, only. In this paper, we develop a theory of super-resolution from STFT measurements, and we propose a method that provably succeeds in recovering spike trains from STFT measurements provided that $Δ> 1/f_c$.

Submitted 10 March, 2014; originally announced March 2014.

Comments: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2014, to appear

arXiv:1307.4891 [pdf, other]

Robust Subspace Clustering via Thresholding

Authors: Reinhard Heckel, Helmut Bölcskei

Abstract: The problem of clustering noisy and incompletely observed high-dimensional data points into a union of low-dimensional subspaces and a set of outliers is considered. The number of subspaces, their dimensions, and their orientations are assumed unknown. We propose a simple low-complexity subspace clustering algorithm, which applies spectral clustering to an adjacency matrix obtained by thresholding the correlations between data points. In other words, the adjacency matrix is constructed from the nearest neighbors of each data point in spherical distance. A statistical performance analysis shows that the algorithm exhibits robustness to additive noise and succeeds even when the subspaces intersect. Specifically, our results reveal an explicit tradeoff between the affinity of the subspaces and the tolerable noise level. We furthermore prove that the algorithm succeeds even when the data points are incompletely observed with the number of missing entries allowed to be (up to a log-factor) linear in the ambient dimension. We also propose a simple scheme that provably detects outliers, and we present numerical results on real and synthetic data.

Submitted 21 August, 2015; v1 submitted 18 July, 2013; originally announced July 2013.

Comments: final version, to appear in the IEEE Transactions on Information Theory
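
Illustration (not part of the listing): the adjacency-matrix step described in the abstract above, thresholding the correlations so that each point keeps only its nearest neighbors in spherical distance, can be sketched as follows. This is a minimal sketch under stated assumptions: the edge weight is taken to be the raw absolute correlation (the paper may use a different weighting), the neighbor count `q` is a free parameter, all points are assumed nonzero, and the function name `tsc_adjacency` is hypothetical. The subsequent spectral clustering step is omitted.

```python
import math


def tsc_adjacency(points, q):
    """Symmetric adjacency matrix keeping, for each point, its q most correlated neighbors."""
    n = len(points)
    # Normalize every point to unit Euclidean norm (points are assumed nonzero).
    unit = []
    for p in points:
        norm = math.sqrt(sum(v * v for v in p))
        unit.append([v / norm for v in p])
    # Absolute inner products between all pairs of normalized points.
    corr = [[abs(sum(a * b for a, b in zip(unit[i], unit[j]))) for j in range(n)]
            for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # A large absolute correlation corresponds to a small spherical distance.
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: corr[i][j], reverse=True)[:q]
        for j in neighbors:
            # Assumed weighting: raw absolute correlation, written symmetrically.
            adj[i][j] = corr[i][j]
            adj[j][i] = corr[i][j]
    return adj


# Four points on two orthogonal lines in the plane; each keeps one neighbor.
points = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
adj = tsc_adjacency(points, q=1)
```

In this toy example the only nonzero edges connect points lying on the same line, so the graph splits into one connected component per subspace.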

arXiv:1307.4790 [pdf, other]

Time-Frequency Foundations of Communications

Authors: Gerald Matz, Helmut Bölcskei, Franz Hlawatsch

Abstract: In the tradition of Gabor's 1946 landmark paper [1], we advocate a time-frequency (TF) approach to communications. TF methods for communications have been proposed very early (see the box History). While several tutorial papers and book chapters on the topic are available (see, e.g., [2]-[4] and references therein), the goal of this paper is to present the fundamental aspects in a coherent and easily accessible manner. Specifically, we establish the role of TF methods in communications across a range of subject areas including TF dispersive channels, orthogonal frequency division multiplexing (OFDM), information-theoretic limits, and system identification and channel estimation. Furthermore, we present fundamental results that are stated in the literature for the continuous-time case in simple linear algebra terms.

Submitted 17 July, 2013; originally announced July 2013.

Comments: 9 pages, 3 figures; to appear in the IEEE Signal Processing Magazine Special Issue on Time-Frequency Analysis and Applications

arXiv:1305.3486 [pdf, ps, other]

Noisy Subspace Clustering via Thresholding

Authors: Reinhard Heckel, Helmut Bölcskei

Abstract: We consider the problem of clustering noisy high-dimensional data points into a union of low-dimensional subspaces and a set of outliers. The number of subspaces, their dimensions, and their orientations are unknown. A probabilistic performance analysis of the thresholding-based subspace clustering (TSC) algorithm introduced recently in [1] shows that TSC succeeds in the noisy case, even when the subspaces intersect. Our results reveal an explicit tradeoff between the allowed noise level and the affinity of the subspaces. We furthermore find that the simple outlier detection scheme introduced in [1] provably succeeds in the noisy case.

Submitted 18 July, 2013; v1 submitted 15 May, 2013; originally announced May 2013.

Comments: Presented at the IEEE Int. Symp. Inf. Theory (ISIT) 2013, Istanbul, Turkey. The version posted here corrects a minor error in the published version. Specifically, the exponent -c n_l in the success probability of Theorem 1 and in the corresponding proof outline has been corrected to -c(n_l-1)

arXiv:1305.3422 [pdf, ps, other]

doi 10.1109/ISIT.2013.6620197

Almost Lossless Analog Signal Separation

Authors: David Stotz, Erwin Riegler, Helmut Bölcskei

Abstract: We propose an information-theoretic framework for analog signal separation. Specifically, we consider the problem of recovering two analog signals from a noiseless sum of linear measurements of the signals. Our framework is inspired by the groundbreaking work of Wu and Verdú (2010) on almost lossless analog compression. The main results of the present paper are a general achievability bound for the compression rate in the analog signal separation problem, an exact expression for the optimal compression rate in the case of signals that have mixed discrete-continuous distributions, and a new technique for showing that the intersection of generic subspaces with subsets of sufficiently small Minkowski dimension is empty. This technique can also be applied to obtain a simplified proof of a key result in Wu and Verdú (2010).

Submitted 15 May, 2013; originally announced May 2013.

Comments: To be presented at IEEE Int. Symp. Inf. Theory 2013, Istanbul, Turkey

arXiv:1303.3716 [pdf, ps, other]

Subspace Clustering via Thresholding and Spectral Clustering

Authors: Reinhard Heckel, Helmut Bölcskei

Abstract: We consider the problem of clustering a set of high-dimensional data points into sets of low-dimensional linear subspaces. The number of subspaces, their dimensions, and their orientations are unknown. We propose a simple and low-complexity clustering algorithm based on thresholding the correlations between the data points followed by spectral clustering. A probabilistic performance analysis shows that this algorithm succeeds even when the subspaces intersect, and when the dimensions of the subspaces scale (up to a log-factor) linearly in the ambient dimension. Moreover, we prove that the algorithm also succeeds for data points that are subject to erasures with the number of erasures scaling (up to a log-factor) linearly in the ambient dimension. Finally, we propose a simple scheme that provably detects outliers.

Submitted 15 March, 2013; originally announced March 2013.

Comments: ICASSP 2013

arXiv:1210.2272 [pdf, ps, other]

Joint Sparsity with Different Measurement Matrices

Authors: Reinhard Heckel, Helmut Bölcskei

Abstract: We consider a generalization of the multiple measurement vector (MMV) problem, where the measurement matrices are allowed to differ across measurements. This problem arises naturally, e.g., when multiple measurements are taken over time and the measurement modality (matrix) is time-varying. We derive probabilistic recovery guarantees showing that, under certain (mild) conditions on the measurement matrices, l2/l1-norm minimization and a variant of orthogonal matching pursuit fail with a probability that decays exponentially in the number of measurements. This allows us to conclude that, perhaps surprisingly, recovery performance does not suffer from the individual measurements being taken through different measurement matrices. What is more, recovery performance typically benefits (significantly) from diversity in the measurement matrices; we specify conditions under which such improvements are obtained. These results continue to hold when the measurements are subject to (bounded) noise.

Submitted 8 October, 2012; originally announced October 2012.

Comments: Allerton 2012

arXiv:1210.2259 [pdf, ps, other]

doi 10.1109/TIT.2016.2536666

Degrees of freedom in vector interference channels

Authors: David Stotz, Helmut Bölcskei

Abstract: This paper continues the Wu-Shamai-Verdú program [3] on characterizing the degrees of freedom (DoF) of interference channels (ICs) through Rényi information dimension. Specifically, we find a single-letter formula for the DoF of vector ICs, encompassing multiple-input multiple-output (MIMO) ICs, time- and/or frequency-selective ICs, and combinations thereof, as well as scalar ICs as considered in [3]. The DoF-formula we obtain lower-bounds the DoF of all channels, with respect to the choice of the channel matrix, and upper-bounds the DoF of almost all channels. It applies to a large class of noise distributions, and its proof is based on an extension of a result by Guionnet and Shlyakhtenko [3] to the vector case in combination with the Ruzsa triangle inequality for differential entropy introduced by Kontoyiannis and Madiman [4]. As in scalar ICs, achieving full DoF requires the use of singular input distributions. Strikingly, in the vector case it suffices to enforce singularity on the joint distribution of each individual transmit vector. This can be realized through signaling in subspaces of the ambient signal space, which is in accordance with the idea of interference alignment, and, most importantly, allows the scalar entries of the transmit vectors to have non-singular distributions. The DoF-formula for vector ICs we obtain enables a unified treatment of "classical" interference alignment à la Cadambe and Jafar [5], and Maddah-Ali et al. [6], and the number-theoretic schemes proposed in [7], [8]. Moreover, it allows the calculation of the DoF achieved by new signaling schemes for vector ICs. We furthermore recover the result by Cadambe and Jafar on the non-separability of parallel ICs [9] and we show that almost all parallel ICs are separable in terms of DoF. Finally, our results apply to complex vector ICs, thereby extending the main findings of [2] to the complex case.

Submitted 17 June, 2016; v1 submitted 8 October, 2012; originally announced October 2012.

Comments: replaces conference version presented at the 50th Annual Allerton Conference on Communication, Control, and Computing (2012)

Journal ref: IEEE Trans. on Inf. Theory, vol. 62, no. 7, pp. 4172-4197, 2016

Showing 1–50 of 88 results for author: Bölcskei, H