Fusing bilinear multi-channel gated vector for fine-grained classification

Zhu, Qiangxi; Kuang, Wenlan; Li, Zhixin

doi:10.1007/s00138-023-01378-2

Original Paper
Published: 07 February 2023

Machine Vision and Applications, Volume 34, article number 26 (2023)

Abstract

Fine-grained visual classification aims to distinguish images belonging to multiple subcategories within the same category. Most existing methods use a single network to extract image features, or learn fine-grained features by localizing and scaling key regions. Because the number of components is limited, these methods may miss valuable clues or suffer performance degradation. This paper proposes an efficient approach to address this problem. First, we propose to learn as many global features as possible from images via a dual-baseline network. Second, considering the importance of the attention mechanism for image classification, we exploit the gated interaction of channels between global feature maps to generate attention that discovers key discriminative regions of images. In the same way, interactive channel attention and position attention over the global feature maps are used to focus on the key discriminative regions of the image. The interactive gated attention is generated by a gating vector mapped by a multi-layer perceptron (MLP). Similarly, channel attention and position attention are computed on the basis of semantic-information enhancement of the global features. The proposed model performs well on three datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft.
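
The gated channel interaction described above can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the architecture from the paper: it assumes two global feature maps of shape C x H x W, pools each to a channel descriptor, maps the concatenated descriptors through a small MLP with a sigmoid output to obtain a gating vector, and rescales the channels of the first map. All function names, the layer sizes, and the random untrained weights are hypothetical placeholders.

```python
import math
import random


def global_avg_pool(fmap):
    """Reduce a C x H x W feature map (nested lists) to a length-C descriptor."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def mlp_gate(descriptor, w1, w2):
    """Two-layer MLP: ReLU hidden units, sigmoid output with values in (0, 1)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, descriptor))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def apply_channel_gate(fmap, gate):
    """Scale every spatial position of channel c by gate[c]."""
    return [[[gate[c] * v for v in row] for row in ch] for c, ch in enumerate(fmap)]


def gated_interaction(fmap_a, fmap_b, w1, w2):
    """Gate the channels of fmap_a with a vector computed from both maps (assumed design)."""
    descriptor = global_avg_pool(fmap_a) + global_avg_pool(fmap_b)  # length 2C
    gate = mlp_gate(descriptor, w1, w2)                              # length C
    return apply_channel_gate(fmap_a, gate), gate


def random_matrix(rows, cols, rng):
    """Matrix of uniform values in [-1, 1]; stands in for learned weights or features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
C, H, W, HIDDEN = 4, 3, 3, 8
fmap_a = [random_matrix(H, W, rng) for _ in range(C)]  # stand-in for one backbone's output
fmap_b = [random_matrix(H, W, rng) for _ in range(C)]  # stand-in for the other backbone's output
w1 = random_matrix(HIDDEN, 2 * C, rng)                 # untrained placeholder weights
w2 = random_matrix(C, HIDDEN, rng)

gated_a, gate = gated_interaction(fmap_a, fmap_b, w1, w2)
print([round(g, 3) for g in gate])
```

Because the gate lies strictly between 0 and 1, each channel of the first map is attenuated according to information pooled from both maps. A trained model would learn `w1` and `w2` jointly with the backbones rather than sampling them at random.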

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 62276073, 61966004), the Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, and Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Authors and Affiliations

Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, 541000, China
Qiangxi Zhu, Wenlan Kuang & Zhixin Li

Corresponding author

Correspondence to Zhixin Li.

Ethics declarations

Conflict of interest

We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work. There is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, Q., Kuang, W. & Li, Z. Fusing bilinear multi-channel gated vector for fine-grained classification. Machine Vision and Applications 34, 26 (2023). https://doi.org/10.1007/s00138-023-01378-2

Received: 06 March 2022
Revised: 18 January 2023
Accepted: 19 January 2023
Published: 07 February 2023
DOI: https://doi.org/10.1007/s00138-023-01378-2
