Abstract
Least squares regression (LSR) has been widely used in pattern recognition. However, LSR-based classifiers still suffer from two issues. One is that they focus only on the dependency between the input data and the output targets, while overlooking the local structure of instances. The other is that using binary labels as the regression targets is too strict to fully exploit the discriminative information in the data. To address these issues, we propose a novel multiclass classification method called discriminative latent subspace learning with adaptive metric learning (DLSAML). Specifically, DLSAML adaptively learns a metric matrix for the residuals between inputs and outputs, driving smaller distances between instances of the same class and larger distances between instances of different classes in the output space. To solve the second problem, latent representations guided by the pairwise label relations are learned as the regression targets, allowing more flexible use of the discriminative information in the data. Combining these two techniques, the interactive optimization of the projection matrix and the metric matrix allows DLSAML to fully exploit the structural and supervised information of the data to obtain a more discriminative latent subspace for multiclass classification. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed method.
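The residual-metric idea described above can be illustrated with a minimal sketch: replacing the plain Frobenius-norm regression loss with a Mahalanobis-weighted version parameterized by an SPD matrix. The function name `mahalanobis_residual_loss` and all matrices below are hypothetical stand-ins for illustration, not the paper's actual objective.

```python
import numpy as np

def mahalanobis_residual_loss(X, W, T, M):
    """Weighted regression loss tr((XW - T) M (XW - T)^T), where M is an
    SPD metric matrix applied to the residuals (illustrative sketch)."""
    R = X @ W - T                      # residuals between projected inputs and targets
    return np.trace(R @ M @ R.T)

# Toy example (all values hypothetical):
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))        # 5 instances, 3 features
W = rng.standard_normal((3, 2))        # projection matrix
T = rng.standard_normal((5, 2))        # regression targets
M = np.eye(2)                          # identity metric recovers plain LSR
plain = np.linalg.norm(X @ W - T, 'fro') ** 2
assert np.isclose(mahalanobis_residual_loss(X, W, T, M), plain)
```

With `M` set to the identity the loss reduces to ordinary least squares; learning a non-identity SPD `M` reweights the output dimensions, which is the role the metric matrix plays in DLSAML.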
Data and code availability
All data and code included in this study are available from the corresponding author upon request.
Acknowledgements
This work was supported by the Natural Science Foundation of China under Grant 62172458.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest related to this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix 1 Proof of Theorem 1
Proof
For simplicity, let \(\mathcal {L}\) denote the optimization problem (11). The KKT conditions for (12) are derived as follows (note that the update of \(\textbf{M}\) and the normalization constraint on \(\textbf{T}\) do not involve the Lagrange multipliers, so we do not prove the KKT conditions for them):
First, the Lagrange multipliers \(\textbf{P}\) and \(\textbf{Z}\) can be obtained from Algorithm 1, as follows
If the sequences \(\{\textbf{P}^{k}\}_{k=1}^{\infty }\) and \(\{\textbf{Z}^{k}\}_{k=1}^{\infty }\) converge to stationary points, i.e., \((\textbf{P}^{k}-\textbf{P})\rightarrow 0\) and \((\textbf{Z}^{k}-\textbf{Z})\rightarrow 0\), then \((\textbf{U}-\textbf{W})\rightarrow 0\) and \((\textbf{V}-\textbf{T})\rightarrow 0\). Thus, the first two KKT conditions in (25) are proved.
The third KKT condition can also be derived using the update of \(\textbf{W}\) in Algorithm 1. We first rewrite (14) as follows:
Then, we have
Based on the first condition \(\textbf{U}-\textbf{W}=0\), we can infer that \(\textbf{X}^{\top }\widetilde{\textbf{L}}(\textbf{X} \textbf{W}\!-\textbf{T})\textbf{M}+\textbf{P}=0\) whenever \((\textbf{W}^{k}-\textbf{W})\rightarrow 0\).
Likewise, we can obtain the following equation using the update of \(\textbf{U}\) in Algorithm 1:
Since \(\textbf{U}-\textbf{W}\) converges to 0, we obtain \(\textbf{X}^{\top }\widetilde{\textbf{D}}(\textbf{X} \textbf{U}\!-\!\textbf{V})\textbf{M}^{-1}\!+\!\lambda _1\textbf{U}\!-\textbf{P}=0\) whenever \((\textbf{U}^{k}-\textbf{U})\rightarrow 0\).
For the fifth condition, from (18), we have the following equation:
On the left-hand side of Eq. (34), \(\widetilde{\textbf{L}}\) is a bounded constant matrix and \(\textbf{M}\) is a constant SPD matrix defined as in Eq. (22). Since \(\textbf{T}-\textbf{V}\) converges to 0, we have \(\widetilde{\textbf{L}}(\textbf{T} \!-\!\textbf{X}\textbf{W})\textbf{M}\!+\!\lambda _2(\textbf{T} \textbf{V}^{\top }\!-\!\textbf{Y}\textbf{Y}^{\top })\textbf{V} \!+\textbf{Z}\rightarrow 0\) whenever \((\textbf{T}^{k}-\textbf{T})\rightarrow 0\).
For the last condition, from (20), we have the following equation:
On the left-hand side of Eq. (35), \(\widetilde{\textbf{D}}\) is a bounded constant matrix and \(\textbf{M}^{-1}\) is a constant SPD matrix, since \(\textbf{M}\) is an SPD matrix as defined in Eq. (22). If \((\textbf{V}^{k}-\textbf{V})\rightarrow 0\), then \(\widetilde{\textbf{D}}(\textbf{V} \!-\!\textbf{X}\textbf{U})\textbf{M}^{-1}\!+\lambda _2(\textbf{V} \textbf{T}^{\top }\!-\!\textbf{Y}\textbf{Y}^{\top })\textbf{T} -\textbf{Z}\rightarrow 0\) as well, since \(\textbf{V}-\textbf{T}=0\).
Since the solution sequence \(\{\Theta ^k\}_{k=1}^{\infty }\) is assumed to satisfy \(\lim _{k\rightarrow \infty }(\Theta ^{k+1}-\Theta ^{k})=0\), the sequence \(\{\Theta ^k\}_{k=1}^{\infty }\) asymptotically satisfies the KKT conditions of objective function (11). \(\square \)
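The stopping condition assumed in the proof, \(\lim _{k\rightarrow \infty }(\Theta ^{k+1}-\Theta ^{k})=0\), can be monitored numerically in an alternating-minimization loop. The sketch below is illustrative only: `has_converged` and the contractive placeholder update stand in for the actual DLSAML iterations of Algorithm 1.

```python
import numpy as np

def has_converged(theta_prev, theta_curr, tol=1e-6):
    """Stopping rule from the proof: the iterates asymptotically satisfy the
    KKT conditions once the successive differences Theta^{k+1} - Theta^k
    vanish for every block variable."""
    return all(np.linalg.norm(curr - prev) < tol
               for prev, curr in zip(theta_prev, theta_curr))

# Toy iteration contracting toward a fixed point (placeholder for the
# real block updates of W, U, T, V, ... in Algorithm 1):
theta = [np.ones((2, 2)), np.zeros((2, 2))]
for _ in range(100):
    new_theta = [0.5 * t for t in theta]   # stand-in contractive update
    if has_converged(theta, new_theta):
        break
    theta = new_theta
```

In practice one checks the per-block differences (and the primal residuals such as \(\textbf{U}-\textbf{W}\) and \(\textbf{V}-\textbf{T}\)) against a small tolerance to terminate the loop.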
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, J., Tang, Y.Y. & Shang, Z. Discriminative latent subspace learning with adaptive metric learning. Neural Comput & Applic 36, 2049–2066 (2024). https://doi.org/10.1007/s00521-023-09159-8