
Discriminative latent subspace learning with adaptive metric learning

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Least squares regression (LSR) has been widely used in the field of pattern recognition. However, LSR-based classifiers still suffer from two issues. One is that they focus only on the dependency between the input data and the output targets, while overlooking the local structure of the instances. The other is that using binary labels as the regression targets is too strict to fully exploit the discriminative information in the data. To address these issues, we propose a novel multiclass classification method called discriminative latent subspace learning with adaptive metric learning (DLSAML). Specifically, DLSAML adaptively learns a metric matrix for the residuals between inputs and outputs, driving smaller distances between instances of the same class and larger distances between instances of different classes in the output space. To solve the second problem, latent representations guided by the pairwise label relations are learnt as the regression targets, allowing more flexible use of the discriminative information in the data. Combining these two techniques, the interactive optimization of the projection matrix and the metric matrix allows DLSAML to fully exploit the structural and supervised information of the data and to obtain a more discriminative latent subspace for multiclass classification. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed method.
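To make the two ingredients of the abstract concrete, the following NumPy sketch is a deliberately simplified illustration and not the authors' DLSAML algorithm: the pairwise label relations are encoded by the matrix \(\textbf{Y}\textbf{Y}^{\top }\), and a toy diagonal metric (an assumed inverse-variance heuristic standing in for the metric matrix learnt by DLSAML) reweights distances between regression outputs. All variable names, the toy data, and the heuristic metric are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 classes, 5 features, 20 samples per class.
X = rng.normal(size=(60, 5)) + np.repeat(np.eye(3, 5) * 3.0, 20, axis=0)
labels = np.repeat(np.arange(3), 20)
Y = np.eye(3)[labels]                      # one-hot label matrix (n x c)

# Pairwise label relations: (Y Y^T)[i, j] = 1 iff samples i and j share a class.
S = Y @ Y.T
print("same class:", S[0, 1], " different class:", S[0, 25])

# Plain ridge regression onto the (strict) binary targets.
W = np.linalg.solve(X.T @ X + 1e-2 * np.eye(5), X.T @ Y)
R = X @ W - Y                              # residuals between inputs and outputs

# Toy "adaptive metric" on the residuals: weight each output dimension by the
# inverse residual variance (a stand-in for the learned metric matrix).
M = np.diag(1.0 / (R.var(axis=0) + 1e-3))

def metric_dist(a, b, M):
    """Mahalanobis-style distance (a - b)^T M (a - b) in the output space."""
    d = a - b
    return float(d @ M @ d)

out = X @ W                                # outputs in the regressed subspace
print("same-class distance:     ", metric_dist(out[0], out[1], M))
print("different-class distance:", metric_dist(out[0], out[25], M))

In DLSAML itself the metric matrix and the latent targets are optimized jointly with the projection matrix rather than fixed heuristically as above; that joint optimization is what the appendix analyses.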


Data and code availability

All data and code included in this study are available upon request from the corresponding author.

Notes

  1. http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html.

  2. http://www2.ece.ohio-state.edu/~aleix/ARdatabase.html.

  3. http://vis-www.cs.umass.edu/lfw/.

  4. https://www.ri.cmu.edu/project/pie-database/.

  5. https://www.cs.columbia.edu/CAVE/software/softlib/coil-100.php.

  6. http://www-cvr.ai.uiuc.edu/ponce_grp/data/.


Acknowledgements

This work was supported by the Natural Science Foundation of China under Grant 62172458.

Author information

Corresponding author

Correspondence to Jiajun Ma.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1: Proof of Theorem 1

Proof

For simplicity, let \(\mathcal {L}\) denote the optimization problem (11). The KKT conditions for (12) are derived as follows (note that the update of \(\textbf{M}\) and the normalization constraint on \(\textbf{T}\) do not involve the Lagrange multipliers, so we do not prove the KKT conditions for them):
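As a reading aid only, the stationarity conditions (26)–(29) below are consistent with a Lagrangian of the following form, assuming \(\widetilde{\textbf{L}}\), \(\widetilde{\textbf{D}}\) and \(\textbf{M}\) are symmetric; this reconstruction is an assumption made here for presentation, and the exact problem (11)/(12) and its Lagrangian are defined in the main text:

$$\begin{aligned} \mathcal {L}(\textbf{W},\textbf{U},\textbf{T},\textbf{V},\textbf{P},\textbf{Z})=\;&\tfrac{1}{2}{\text {tr}}\big [(\textbf{X}\textbf{W}-\textbf{T})^{\top }\widetilde{\textbf{L}}(\textbf{X}\textbf{W}-\textbf{T})\textbf{M}\big ]\\&+\tfrac{1}{2}{\text {tr}}\big [(\textbf{X}\textbf{U}-\textbf{V})^{\top }\widetilde{\textbf{D}}(\textbf{X}\textbf{U}-\textbf{V})\textbf{M}^{-1}\big ]\\&+\tfrac{\lambda _1}{2}\Vert \textbf{U}\Vert _F^2+\tfrac{\lambda _2}{2}\Vert \textbf{T}\textbf{V}^{\top }-\textbf{Y}\textbf{Y}^{\top }\Vert _F^2\\&+\langle \textbf{P},\textbf{W}-\textbf{U}\rangle +\langle \textbf{Z},\textbf{T}-\textbf{V}\rangle \end{aligned}$$

Differentiating this expression with respect to \(\textbf{W}\), \(\textbf{U}\), \(\textbf{T}\) and \(\textbf{V}\) reproduces the conditions (26)–(29) listed below.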

$$\begin{aligned} \textbf{U}&=\textbf{W},~~~~~\textbf{V}=\textbf{T}. \end{aligned}$$
(25)
$$\begin{aligned} \frac{\partial {\mathcal {L}}}{\partial {\textbf{W}}}&=\textbf{X}^{\top }\widetilde{\textbf{L}}(\textbf{X} \textbf{W}\!-\textbf{T})\textbf{M}+\textbf{P}=0. \end{aligned}$$
(26)
$$\begin{aligned} \frac{\partial {\mathcal {L}}}{\partial {\textbf{U}}}&=\textbf{X}^{\top } \widetilde{\textbf{D}}(\textbf{X} \textbf{U}\!-\!\textbf{V})\textbf{M}^{-1}\!+\!\lambda _1\textbf{U}\!-\textbf{P}=0 \end{aligned}$$
(27)
$$\begin{aligned} \frac{\partial {\mathcal {L}}}{\partial {\textbf{T}}}&=\widetilde{\textbf{L}}(\textbf{T} \!-\!\textbf{X}\textbf{W})\textbf{M}\!+\!\lambda _2(\textbf{T}\textbf{V}^{\top }\!-\!\textbf{Y} \textbf{Y}^{\top })\textbf{V} \!+\textbf{Z}=0 \end{aligned}$$
(28)
$$\begin{aligned} \frac{\partial {\mathcal {L}}}{\partial {\textbf{V}}}&=\widetilde{\textbf{D}}(\textbf{V} \!-\!\textbf{X}\textbf{U})\textbf{M}^{-1}\!+\lambda _2(\textbf{V}\textbf{T}^{\top }\! -\!\textbf{Y}\textbf{Y}^{\top })\textbf{T} -\textbf{Z}=0 \end{aligned}$$
(29)

First, the Lagrange multipliers \(\textbf{P}\) and \(\textbf{Z}\) can be obtained from Algorithm 1 as follows:

$$\begin{aligned} \textbf{P}^{k}=\textbf{P}+\mu (\textbf{W}-\textbf{U}), \textbf{Z}^{k}=\textbf{Z}+\sigma (\textbf{T}-\textbf{V}). \end{aligned}$$
(30)

If the sequences \(\{\textbf{P}^{k}\}_{k=1}^{\infty }\) and \(\{\textbf{Z}^{k}\}_{k=1}^{\infty }\) converge to stationary points, i.e., \((\textbf{P}^{k}-\textbf{P})\rightarrow 0\) and \((\textbf{Z}^{k}-\textbf{Z})\rightarrow 0\), then \((\textbf{U}-\textbf{W})\rightarrow 0\) and \((\textbf{V}-\textbf{T})\rightarrow 0\). Thus, the first two KKT conditions in (25) are proved.

The third KKT condition can also be derived by utilizing the result of \(\textbf{W}\) in Algorithm 1. We first rewrite (14) as follows:

$$\begin{aligned} \!\mu \textbf{W}\!=\!-(\textbf{X}^{\top }\widetilde{\textbf{L}}(\textbf{X} \textbf{W}\!-\textbf{T})\textbf{M}\!+\textbf{P})+\!\mu \textbf{U}\!\! \end{aligned}$$
(31)

Then, we have

$$\begin{aligned} \mu (\textbf{W}^{k}-\textbf{W})=\!-(\textbf{X}^{\top }\widetilde{\textbf{L}}(\textbf{X} \textbf{W}\!-\textbf{T})\textbf{M}\!+\textbf{P})+\mu (\textbf{U}-\textbf{W}) \end{aligned}$$
(32)

Based on the first condition \(\textbf{U}-\textbf{W}=0\), we can infer that \(\textbf{X}^{\top }\widetilde{\textbf{L}}(\textbf{X} \textbf{W}-\textbf{T})\textbf{M}+\textbf{P}=0\) whenever \((\textbf{W}^{k}-\textbf{W})\rightarrow 0\).

Likewise, we can get the following equation using \(\textbf{U}\) from Algorithm 1:

$$\begin{aligned} (\lambda _1+\mu )(\textbf{U}^{k}-\textbf{U})&=-(\!\textbf{X}^{\top }\widetilde{\textbf{D}}(\textbf{X} \textbf{U}\!-\!\textbf{V})\textbf{M}^{-1}\nonumber \\ {}&\quad +\lambda _1\textbf{U} -\textbf{P}+\mu (\textbf{U}-\textbf{W})). \end{aligned}$$
(33)

Since \(\textbf{U}-\textbf{W}\) converges to 0, we obtain \(\textbf{X}^{\top }\widetilde{\textbf{D}}(\textbf{X} \textbf{U}\!-\!\textbf{V})\textbf{M}^{-1}\!+\!\lambda _1\textbf{U}\!-\textbf{P}=0\) whenever \((\textbf{U}^{k}-\textbf{U})\rightarrow 0\).

For the fifth condition, from (18), we have the following equation:

$$\begin{aligned} \widetilde{\textbf{L}}(\textbf{T}^{k}-\textbf{T})\textbf{M} =\;&\widetilde{\textbf{L}}(\textbf{X}\textbf{W}-\textbf{T}) \textbf{M}-\lambda _2(\textbf{T}\textbf{V}^{\top }\nonumber \\ {}&\quad -\textbf{Y}\textbf{Y}^{\top })\textbf{V} -\textbf{Z}-\delta (\textbf{T}-\textbf{V}) \end{aligned}$$
(34)

On the left-hand side of Eq. (34), \(\widetilde{\textbf{L}}\) is a bounded constant matrix and \(\textbf{M}\) is a constant SPD matrix defined as in Eq. (22). Since \(\textbf{T}-\textbf{V}\) converges to 0, we have \(\widetilde{\textbf{L}}(\textbf{T} \!-\!\textbf{X}\textbf{W})\textbf{M}\!+\!\lambda _2(\textbf{T} \textbf{V}^{\top }\!-\!\textbf{Y}\textbf{Y}^{\top })\textbf{V} \!+\textbf{Z}\rightarrow 0\) whenever \((\textbf{T}^{k}-\textbf{T})\rightarrow 0\).

For the last condition, from (20), we have the following equation:

$$\begin{aligned} \widetilde{\textbf{D}}(\textbf{V}^{k}-\textbf{V})\textbf{M}^{-1}&=\widetilde{\textbf{D}}(\textbf{X}\textbf{U}-\textbf{V}) \textbf{M}^{-1}\!-\!\lambda _2(\textbf{V}\textbf{T}^{\top }\nonumber \\&\quad -\textbf{Y}\textbf{Y}^{\top })\textbf{T}\!+\!\textbf{Z}\!-\!\delta (\textbf{V}-\textbf{T}) \end{aligned}$$
(35)

On the left-hand side of Eq. (35), \(\widetilde{\textbf{D}}\) is a bounded constant matrix and \(\textbf{M}^{-1}\) is a constant SPD matrix, since \(\textbf{M}\) is an SPD matrix as defined in Eq. (22). If \((\textbf{V}^{k}-\textbf{V})\rightarrow 0\), then, using \(\textbf{V}-\textbf{T}=0\), we also have \(\widetilde{\textbf{D}}(\textbf{V}-\textbf{X}\textbf{U})\textbf{M}^{-1}+\lambda _2(\textbf{V}\textbf{T}^{\top }-\textbf{Y}\textbf{Y}^{\top })\textbf{T}-\textbf{Z}\rightarrow 0\).

Since the solution sequence \(\{\Theta ^k\}_{k=1}^{\infty }\) is assumed to satisfy \(\lim _{k\rightarrow \infty }(\Theta ^{k+1}-\Theta ^{k})=0\), the sequence \(\{\Theta ^k\}_{k=1}^{\infty }\) asymptotically satisfies the KKT conditions of objective function (11). \(\square \)
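For readers who want to trace the structure that Theorem 1 analyses, the following NumPy skeleton mirrors the ADMM-style iteration: the primal blocks are updated here by simple gradient steps built from the terms appearing in (26)–(29), as stand-ins for the paper's closed-form solutions (14), (18) and (20), while the multiplier updates follow (30) and the stopping test is the assumption \(\Theta ^{k+1}-\Theta ^{k}\rightarrow 0\) of the theorem. The step size, the use of \(\mu \) and \(\sigma \) as penalty parameters, and the function name are assumptions made for this sketch.

import numpy as np

def dlsaml_admm_skeleton(X, Y, L_tilde, D_tilde, M,
                         lam1=0.1, lam2=0.1, mu=1.0, sigma=1.0,
                         lr=1e-3, max_iter=500, tol=1e-6):
    """Schematic ADMM-style loop; gradient steps replace the paper's
    closed-form updates (14), (18), (20), so this is not DLSAML itself."""
    n, d = X.shape
    c = Y.shape[1]
    W = np.zeros((d, c)); U = np.zeros((d, c))
    T = Y.copy();         V = Y.copy()
    P = np.zeros((d, c)); Z = np.zeros((n, c))
    Minv = np.linalg.inv(M)
    YY = Y @ Y.T

    for _ in range(max_iter):
        W_old, U_old, T_old, V_old = W.copy(), U.copy(), T.copy(), V.copy()

        # primal updates: gradient stand-ins built from the terms in (26)-(29)
        W = W - lr * (X.T @ L_tilde @ (X @ W - T) @ M + P + mu * (W - U))
        U = U - lr * (X.T @ D_tilde @ (X @ U - V) @ Minv + lam1 * U
                      - P - mu * (W - U))
        T = T - lr * (L_tilde @ (T - X @ W) @ M + lam2 * (T @ V.T - YY) @ V
                      + Z + sigma * (T - V))
        V = V - lr * (D_tilde @ (V - X @ U) @ Minv + lam2 * (V @ T.T - YY) @ T
                      - Z - sigma * (T - V))

        # dual (multiplier) updates, Eq. (30)
        P = P + mu * (W - U)
        Z = Z + sigma * (T - V)

        # stopping test: successive iterates stop changing, as assumed in Theorem 1
        change = max(np.linalg.norm(W - W_old), np.linalg.norm(U - U_old),
                     np.linalg.norm(T - T_old), np.linalg.norm(V - V_old))
        if change < tol:
            break

    # the gaps W - U and T - V correspond to the first KKT conditions in (25)
    return W, T, np.linalg.norm(W - U), np.linalg.norm(T - V)

The returned gaps \(\Vert \textbf{W}-\textbf{U}\Vert \) and \(\Vert \textbf{T}-\textbf{V}\Vert \) are exactly the quantities that the proof links to the multiplier increments in (30) and, through them, to the remaining KKT conditions.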

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ma, J., Tang, Y.Y. & Shang, Z. Discriminative latent subspace learning with adaptive metric learning. Neural Comput & Applic 36, 2049–2066 (2024). https://doi.org/10.1007/s00521-023-09159-8
