Abstract
Fine-grained visual classification aims to distinguish images that belong to different subcategories of the same category. Most existing methods either extract image features with a single network or learn fine-grained features by localizing and scaling key regions. Because such methods rely on a limited number of components, they may miss valuable clues or suffer degraded performance. This paper proposes an efficient approach to address this problem. First, we learn as many global image features as possible via a dual-baseline network. Second, considering the importance of attention mechanisms for image classification, we exploit gated channel interaction between the global feature maps to generate attention that discovers the key discriminative regions of an image. In the same way, interactive channel attention and position attention over the global feature maps are used to focus on these key discriminative regions. The interactive gated attention is generated from a gating vector produced by a multi-layer perceptron (MLP), while the channel attention and position attention are based on semantic-information enhancement of the global features. The proposed model performs well on three datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft.
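To make the idea of a gating vector concrete, the following is a minimal NumPy sketch of one way a gate mapped by an MLP could re-weight the channels of one global feature map using information from both branches. It is an illustration under stated assumptions, not the architecture used in the paper: the function name `gated_channel_attention`, the global-average-pooling step, the two-layer MLP with ReLU and sigmoid, and all shapes and random weights are hypothetical choices, since the abstract does not specify them.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_channel_attention(feat_a, feat_b, w1, b1, w2, b2):
    """Re-weight the channels of feat_a with a gate computed from both maps.

    feat_a, feat_b: arrays of shape (C, H, W), standing in for the global
        feature maps of the two branches (an assumption of this sketch).
    w1, b1, w2, b2: weights of a hypothetical two-layer MLP (2C -> hidden -> C).
    """
    # Global average pooling: one descriptor value per channel.
    desc_a = feat_a.mean(axis=(1, 2))
    desc_b = feat_b.mean(axis=(1, 2))
    # Let the channels of both maps interact by feeding them jointly to the MLP.
    joint = np.concatenate([desc_a, desc_b])
    hidden = np.maximum(0.0, w1 @ joint + b1)  # ReLU
    gate = sigmoid(w2 @ hidden + b2)           # gating vector in (0, 1)^C
    # Broadcast the gate over the spatial dimensions.
    return feat_a * gate[:, None, None], gate


# Toy inputs; random values stand in for real backbone outputs.
rng = np.random.default_rng(0)
C, H, W, hidden_dim = 8, 4, 4, 16
feat_a = rng.standard_normal((C, H, W))
feat_b = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((hidden_dim, 2 * C)) * 0.1
b1 = np.zeros(hidden_dim)
w2 = rng.standard_normal((C, hidden_dim)) * 0.1
b2 = np.zeros(C)

attended, gate = gated_channel_attention(feat_a, feat_b, w1, b1, w2, b2)
print(attended.shape, gate.shape)
```

In this sketch the gate depends on both feature maps, so channels of one branch can be emphasized or suppressed according to what the other branch has captured. In a trained model the MLP weights would be learned rather than sampled at random.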
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 62276073, 61966004), the Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature or kind in any product, service, or company that could be construed as influencing the position presented in, or the review of, this manuscript.
About this article
Cite this article
Zhu, Q., Kuang, W. & Li, Z. Fusing bilinear multi-channel gated vector for fine-grained classification. Machine Vision and Applications 34, 26 (2023). https://doi.org/10.1007/s00138-023-01378-2
DOI: https://doi.org/10.1007/s00138-023-01378-2