
Discriminative latent subspace learning with adaptive metric learning

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Least squares regression (LSR) has been widely used in the field of pattern recognition. However, LSR-based classifiers still suffer from two issues. One is that they focus only on the dependency between the input data and the output targets, while overlooking the local structure of the instances. The other is that using binary labels as the regression targets is too strict to fully exploit the discriminative information in the data. To address these issues, we propose a novel multiclass classification method called discriminative latent subspace learning with adaptive metric learning (DLSAML). Specifically, DLSAML adaptively learns a metric matrix for the residuals between inputs and outputs, driving smaller distances between instances of the same class and larger distances between instances of different classes in the output space. To solve the second problem, latent representations guided by the pairwise label relations are learnt as the regression targets, allowing more flexible use of the discriminative information in the data. Combining these two techniques, the interactive optimization of the projection matrix and the metric matrix allows DLSAML to fully exploit the structural and supervised information of the data and to obtain a more discriminative latent subspace for multiclass classification. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed method.
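To make the two ingredients of the abstract concrete, the following NumPy sketch is a deliberately simplified illustration and not the authors' DLSAML algorithm: the pairwise label relations are encoded by the matrix \(\textbf{Y}\textbf{Y}^{\top }\), and a toy diagonal metric (an assumed inverse-variance heuristic standing in for the metric matrix learnt by DLSAML) reweights distances between regression outputs. All variable names, the toy data, and the heuristic metric are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 classes, 5 features, 20 samples per class.
X = rng.normal(size=(60, 5)) + np.repeat(np.eye(3, 5) * 3.0, 20, axis=0)
labels = np.repeat(np.arange(3), 20)
Y = np.eye(3)[labels]                      # one-hot label matrix (n x c)

# Pairwise label relations: (Y Y^T)[i, j] = 1 iff samples i and j share a class.
S = Y @ Y.T
print("same class:", S[0, 1], " different class:", S[0, 25])

# Plain ridge regression onto the (strict) binary targets.
W = np.linalg.solve(X.T @ X + 1e-2 * np.eye(5), X.T @ Y)
R = X @ W - Y                              # residuals between inputs and outputs

# Toy "adaptive metric" on the residuals: weight each output dimension by the
# inverse residual variance (a stand-in for the learned metric matrix).
M = np.diag(1.0 / (R.var(axis=0) + 1e-3))

def metric_dist(a, b, M):
    """Mahalanobis-style distance (a - b)^T M (a - b) in the output space."""
    d = a - b
    return float(d @ M @ d)

out = X @ W                                # outputs in the regressed subspace
print("same-class distance:     ", metric_dist(out[0], out[1], M))
print("different-class distance:", metric_dist(out[0], out[25], M))

In DLSAML itself the metric matrix and the latent targets are optimized jointly with the projection matrix rather than fixed heuristically as above; that joint optimization is what the appendix analyses.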


Data and code availability

All data and code included in this study are available upon request from the corresponding author.

Notes

  1. http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html.

  2. http://www2.ece.ohio-state.edu/~aleix/ARdatabase.html.

  3. http://vis-www.cs.umass.edu/lfw/.

  4. https://www.ri.cmu.edu/project/pie-database/.

  5. https://www.cs.columbia.edu/CAVE/software/softlib/coil-100.php.

  6. http://www-cvr.ai.uiuc.edu/ponce_grp/data/.


Acknowledgements

This work was supported by the Natural Science Foundation of China under Grant 62172458.

Author information

Corresponding author

Correspondence to Jiajun Ma.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1: Proof of Theorem 1

Proof

For simplicity, let \(\mathcal {L}\) denote the optimization problem (11). The KKT conditions for (12) are derived as follows (note that the update of \(\textbf{M}\) and the normalization constraint on \(\textbf{T}\) do not involve the Lagrange multipliers, so we do not prove the KKT conditions for them):
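As a reading aid only, the stationarity conditions (26)–(29) below are consistent with a Lagrangian of the following form, assuming \(\widetilde{\textbf{L}}\), \(\widetilde{\textbf{D}}\) and \(\textbf{M}\) are symmetric; this reconstruction is an assumption made here for presentation, and the exact problem (11)/(12) and its Lagrangian are defined in the main text:

$$\begin{aligned} \mathcal {L}(\textbf{W},\textbf{U},\textbf{T},\textbf{V},\textbf{P},\textbf{Z})=\;&\tfrac{1}{2}{\text {tr}}\big [(\textbf{X}\textbf{W}-\textbf{T})^{\top }\widetilde{\textbf{L}}(\textbf{X}\textbf{W}-\textbf{T})\textbf{M}\big ]\\&+\tfrac{1}{2}{\text {tr}}\big [(\textbf{X}\textbf{U}-\textbf{V})^{\top }\widetilde{\textbf{D}}(\textbf{X}\textbf{U}-\textbf{V})\textbf{M}^{-1}\big ]\\&+\tfrac{\lambda _1}{2}\Vert \textbf{U}\Vert _F^2+\tfrac{\lambda _2}{2}\Vert \textbf{T}\textbf{V}^{\top }-\textbf{Y}\textbf{Y}^{\top }\Vert _F^2\\&+\langle \textbf{P},\textbf{W}-\textbf{U}\rangle +\langle \textbf{Z},\textbf{T}-\textbf{V}\rangle \end{aligned}$$

Differentiating this expression with respect to \(\textbf{W}\), \(\textbf{U}\), \(\textbf{T}\) and \(\textbf{V}\) reproduces the conditions (26)–(29) listed below.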

$$\begin{aligned} \textbf{U}&=\textbf{W},~~~~~\textbf{V}=\textbf{T}. \end{aligned}$$
(25)
$$\begin{aligned} \frac{\partial {\mathcal {L}}}{\partial {\textbf{W}}}&=\textbf{X}^{\top }\widetilde{\textbf{L}}(\textbf{X} \textbf{W}\!-\textbf{T})\textbf{M}+\textbf{P}=0. \end{aligned}$$
(26)
$$\begin{aligned} \frac{\partial {\mathcal {L}}}{\partial {\textbf{U}}}&=\textbf{X}^{\top } \widetilde{\textbf{D}}(\textbf{X} \textbf{U}\!-\!\textbf{V})\textbf{M}^{-1}\!+\!\lambda _1\textbf{U}\!-\textbf{P}=0 \end{aligned}$$
(27)
$$\begin{aligned} \frac{\partial {\mathcal {L}}}{\partial {\textbf{T}}}&=\widetilde{\textbf{L}}(\textbf{T} \!-\!\textbf{X}\textbf{W})\textbf{M}\!+\!\lambda _2(\textbf{T}\textbf{V}^{\top }\!-\!\textbf{Y} \textbf{Y}^{\top })\textbf{V} \!+\textbf{Z}=0 \end{aligned}$$
(28)
$$\begin{aligned} \frac{\partial {\mathcal {L}}}{\partial {\textbf{V}}}&=\widetilde{\textbf{D}}(\textbf{V} \!-\!\textbf{X}\textbf{U})\textbf{M}^{-1}\!+\lambda _2(\textbf{V}\textbf{T}^{\top }\! -\!\textbf{Y}\textbf{Y}^{\top })\textbf{T} -\textbf{Z}=0 \end{aligned}$$
(29)

First, the Lagrange multipliers \(\textbf{P}\) and \(\textbf{Z}\) can be obtained from Algorithm 1 as follows:

$$\begin{aligned} \textbf{P}^{k}=\textbf{P}+\mu (\textbf{W}-\textbf{U}), \textbf{Z}^{k}=\textbf{Z}+\sigma (\textbf{T}-\textbf{V}). \end{aligned}$$
(30)

If the sequences \(\{\textbf{P}^{k}\}_{k=1}^{\infty }\) and \(\{\textbf{Z}^{k}\}_{k=1}^{\infty }\) converge to stationary points, i.e., \((\textbf{P}^{k}-\textbf{P})\rightarrow 0\) and \((\textbf{Z}^{k}-\textbf{Z})\rightarrow 0\), then \((\textbf{U}-\textbf{W})\rightarrow 0\) and \((\textbf{V}-\textbf{T})\rightarrow 0\). Thus, the first two KKT conditions in (25) are proved.

The third KKT condition can also be derived by utilizing the result of \(\textbf{W}\) in Algorithm 1. We first rewrite (14) as follows:

$$\begin{aligned} \!\mu \textbf{W}\!=\!-(\textbf{X}^{\top }\widetilde{\textbf{L}}(\textbf{X} \textbf{W}\!-\textbf{T})\textbf{M}\!+\textbf{P})+\!\mu \textbf{U}\!\! \end{aligned}$$
(31)

Then, we have

$$\begin{aligned} \mu (\textbf{W}^{k}-\textbf{W})=\!-(\textbf{X}^{\top }\widetilde{\textbf{L}}(\textbf{X} \textbf{W}\!-\textbf{T})\textbf{M}\!+\textbf{P})+\mu (\textbf{U}-\textbf{W}) \end{aligned}$$
(32)

Based on the first condition \(\textbf{U}-\textbf{W}=0\), we can infer that \(\textbf{X}^{\top }\widetilde{\textbf{L}}(\textbf{X} \textbf{W}-\textbf{T})\textbf{M}+\textbf{P}=0\) whenever \((\textbf{W}^{k}-\textbf{W})\rightarrow 0\).

Likewise, we can get the following equation using \(\textbf{U}\) from Algorithm 1:

$$\begin{aligned} (\lambda _1+\mu )(\textbf{U}^{k}-\textbf{U})&=-(\!\textbf{X}^{\top }\widetilde{\textbf{D}}(\textbf{X} \textbf{U}\!-\!\textbf{V})\textbf{M}^{-1}\nonumber \\ {}&\quad +\lambda _1\textbf{U} -\textbf{P}+\mu (\textbf{U}-\textbf{W})). \end{aligned}$$
(33)

Since \(\textbf{U}-\textbf{W}\) converges to 0, we obtain \(\textbf{X}^{\top }\widetilde{\textbf{D}}(\textbf{X} \textbf{U}\!-\!\textbf{V})\textbf{M}^{-1}\!+\!\lambda _1\textbf{U}\!-\textbf{P}=0\) whenever \((\textbf{U}^{k}-\textbf{U})\rightarrow 0\).

For the fifth condition, from (18), we have the following equation:

$$\begin{aligned} \widetilde{\textbf{L}}(\textbf{T}^{k}-\textbf{T})\textbf{M} =\;&\widetilde{\textbf{L}}(\textbf{X}\textbf{W}-\textbf{T}) \textbf{M}-\lambda _2(\textbf{T}\textbf{V}^{\top }\nonumber \\ {}&\quad -\textbf{Y}\textbf{Y}^{\top })\textbf{V} -\textbf{Z}-\delta (\textbf{T}-\textbf{V}) \end{aligned}$$
(34)

On the left-hand side of Eq. (34), \(\widetilde{\textbf{L}}\) is a bounded constant matrix and \(\textbf{M}\) is a constant SPD matrix defined as in Eq. (22). Since \(\textbf{T}-\textbf{V}\) converges to 0, we have \(\widetilde{\textbf{L}}(\textbf{T} \!-\!\textbf{X}\textbf{W})\textbf{M}\!+\!\lambda _2(\textbf{T} \textbf{V}^{\top }\!-\!\textbf{Y}\textbf{Y}^{\top })\textbf{V} \!+\textbf{Z}\rightarrow 0\) whenever \((\textbf{T}^{k}-\textbf{T})\rightarrow 0\).

For the last condition, from (20), we have the following equation:

$$\begin{aligned} \widetilde{\textbf{D}}(\textbf{V}^{k}-\textbf{V})\textbf{M}^{-1}&=\widetilde{\textbf{D}}(\textbf{X}\textbf{U}-\textbf{V}) \textbf{M}^{-1}\!-\!\lambda _2(\textbf{V}\textbf{T}^{\top }\nonumber \\&\quad -\textbf{Y}\textbf{Y}^{\top })\textbf{T}\!+\!\textbf{Z}\!-\!\delta (\textbf{V}-\textbf{T}) \end{aligned}$$
(35)

On the left-hand side of Eq. (35), \(\widetilde{\textbf{D}}\) is a bounded constant matrix and \(\textbf{M}^{-1}\) is a constant SPD matrix, since \(\textbf{M}\) is an SPD matrix as defined in Eq. (22). If \((\textbf{V}^{k}-\textbf{V})\rightarrow 0\), then, using \(\textbf{V}-\textbf{T}=0\), we also have \(\widetilde{\textbf{D}}(\textbf{V}-\textbf{X}\textbf{U})\textbf{M}^{-1}+\lambda _2(\textbf{V}\textbf{T}^{\top }-\textbf{Y}\textbf{Y}^{\top })\textbf{T}-\textbf{Z}\rightarrow 0\).

Since the solution sequence \(\{\Theta ^k\}_{k=1}^{\infty }\) is assumed to satisfy \(\lim _{k\rightarrow \infty }(\Theta ^{k+1}-\Theta ^{k})=0\), the sequence \(\{\Theta ^k\}_{k=1}^{\infty }\) asymptotically satisfies the KKT conditions of objective function (11). \(\square \)
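For readers who want to trace the structure that Theorem 1 analyses, the following NumPy skeleton mirrors the ADMM-style iteration: the primal blocks are updated here by simple gradient steps built from the terms appearing in (26)–(29), as stand-ins for the paper's closed-form solutions (14), (18) and (20), while the multiplier updates follow (30) and the stopping test is the assumption \(\Theta ^{k+1}-\Theta ^{k}\rightarrow 0\) of the theorem. The step size, the use of \(\mu \) and \(\sigma \) as penalty parameters, and the function name are assumptions made for this sketch.

import numpy as np

def dlsaml_admm_skeleton(X, Y, L_tilde, D_tilde, M,
                         lam1=0.1, lam2=0.1, mu=1.0, sigma=1.0,
                         lr=1e-3, max_iter=500, tol=1e-6):
    """Schematic ADMM-style loop; gradient steps replace the paper's
    closed-form updates (14), (18), (20), so this is not DLSAML itself."""
    n, d = X.shape
    c = Y.shape[1]
    W = np.zeros((d, c)); U = np.zeros((d, c))
    T = Y.copy();         V = Y.copy()
    P = np.zeros((d, c)); Z = np.zeros((n, c))
    Minv = np.linalg.inv(M)
    YY = Y @ Y.T

    for _ in range(max_iter):
        W_old, U_old, T_old, V_old = W.copy(), U.copy(), T.copy(), V.copy()

        # primal updates: gradient stand-ins built from the terms in (26)-(29)
        W = W - lr * (X.T @ L_tilde @ (X @ W - T) @ M + P + mu * (W - U))
        U = U - lr * (X.T @ D_tilde @ (X @ U - V) @ Minv + lam1 * U
                      - P - mu * (W - U))
        T = T - lr * (L_tilde @ (T - X @ W) @ M + lam2 * (T @ V.T - YY) @ V
                      + Z + sigma * (T - V))
        V = V - lr * (D_tilde @ (V - X @ U) @ Minv + lam2 * (V @ T.T - YY) @ T
                      - Z - sigma * (T - V))

        # dual (multiplier) updates, Eq. (30)
        P = P + mu * (W - U)
        Z = Z + sigma * (T - V)

        # stopping test: successive iterates stop changing, as assumed in Theorem 1
        change = max(np.linalg.norm(W - W_old), np.linalg.norm(U - U_old),
                     np.linalg.norm(T - T_old), np.linalg.norm(V - V_old))
        if change < tol:
            break

    # the gaps W - U and T - V correspond to the first KKT conditions in (25)
    return W, T, np.linalg.norm(W - U), np.linalg.norm(T - V)

The returned gaps \(\Vert \textbf{W}-\textbf{U}\Vert \) and \(\Vert \textbf{T}-\textbf{V}\Vert \) are exactly the quantities that the proof links to the multiplier increments in (30) and, through them, to the remaining KKT conditions.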

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ma, J., Tang, Y.Y. & Shang, Z. Discriminative latent subspace learning with adaptive metric learning. Neural Comput & Applic 36, 2049–2066 (2024). https://doi.org/10.1007/s00521-023-09159-8
