
Fusing bilinear multi-channel gated vector for fine-grained classification

Original Paper

Published in Machine Vision and Applications

Abstract

Fine-grained visual classification aims to distinguish among subcategories of the same basic-level category. Most existing methods either extract image features with a single network or learn fine-grained features by localizing and scaling key regions; because only a limited number of parts can be attended to, valuable clues may be missed and performance may degrade. This paper proposes an efficient approach to address this problem. First, we learn as many global features as possible from an image via a dual-baseline network. Second, given the importance of attention for image classification, we exploit gated channel interaction between the global feature maps to generate attention that discovers the key discriminative regions of an image; interactive channel attention and position attention over the global feature maps serve the same purpose. The interactive gated attention is produced by a gating vector mapped through a multi-layer perceptron (MLP), while the channel and position attention are computed from semantically enhanced global features. The proposed model performs well on three datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft.
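The core fusion step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, weight shapes, and the concatenated-pooling design of the gate are illustrative assumptions; only the general pattern (an MLP-mapped sigmoid gating vector modulating the channels of one branch before bilinear pooling of the two branches) follows the abstract.

```python
import numpy as np

def mlp_gate(x, W1, W2):
    # Two-layer perceptron with a sigmoid output: maps pooled features
    # to a channel-gating vector in (0, 1). Weight shapes are assumptions.
    h = np.maximum(0.0, x @ W1)               # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2)))    # sigmoid gate

def gated_bilinear_fusion(fa, fb, W1, W2):
    """Fuse two branch feature maps of shape (C, H, W) into one vector.

    The gate is derived from the globally pooled features of both
    branches and applied channel-wise to branch B before bilinear
    pooling (outer product of channel descriptors, averaged over
    spatial positions).
    """
    c, h, w = fa.shape
    pooled = np.concatenate([fa.mean(axis=(1, 2)), fb.mean(axis=(1, 2))])
    gate = mlp_gate(pooled, W1, W2)           # shape (C,)
    fb_gated = fb * gate[:, None, None]       # channel-wise gating
    xa = fa.reshape(c, -1)                    # (C, H*W)
    xb = fb_gated.reshape(c, -1)
    bilinear = (xa @ xb.T) / (h * w)          # (C, C) interaction matrix
    # Signed square root + L2 normalization, standard for bilinear features
    z = np.sign(bilinear) * np.sqrt(np.abs(bilinear))
    return (z / (np.linalg.norm(z) + 1e-12)).ravel()

rng = np.random.default_rng(0)
C, H, W, HID = 8, 4, 4, 16
fa = rng.standard_normal((C, H, W))           # branch-A feature map
fb = rng.standard_normal((C, H, W))           # branch-B feature map
W1 = 0.1 * rng.standard_normal((2 * C, HID))
W2 = 0.1 * rng.standard_normal((HID, C))
feat = gated_bilinear_fusion(fa, fb, W1, W2)
print(feat.shape)                             # (64,)
```

The fused (C x C) descriptor would feed a linear classifier; the gate lets one branch suppress or emphasize the other branch's channels before their interaction is computed.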



Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 62276073, 61966004), the Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, and Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Corresponding author: Zhixin Li.

Ethics declarations

Conflict of interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence this work, and no professional or other personal interest of any nature in any product, service, or company that could be construed as influencing the position presented in, or the review of, this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, Q., Kuang, W. & Li, Z. Fusing bilinear multi-channel gated vector for fine-grained classification. Machine Vision and Applications 34, 26 (2023). https://doi.org/10.1007/s00138-023-01378-2

