Joint Learning of LSTMs-CNN and Prototype for Micro-video Venue Classification

Liu, Wei; Huang, Xianglin; Cao, Gang; Song, Gege; Yang, Lifang

doi:10.1007/978-3-030-00767-6_65

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11165)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

The venue category of a micro-video is an important cue in social network applications, such as location-oriented applications and personalized services. Existing micro-video venue classification methods lose discriminative power because of unsuitable convolutional filters and convolutional padding, and their robustness is limited by the softmax layer. To alleviate these problems, we propose a novel learning framework that jointly learns LSTMs-CNN and Prototype for micro-video venue classification. Specifically, LSTMs-CNN with convolutional padding of the SAME type and a small convolutional filter is used to extract spatio-temporal information. The Prototype is learned simultaneously to improve robustness over the softmax classification function. We adopt a Euclidean distance loss function to train the whole network. Extensive experimental results on a real-world dataset show that our model significantly outperforms the state-of-the-art baselines in terms of both Micro-F and Macro-F scores.
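
To make two ideas from the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the TensorFlow-style meaning of SAME and VALID padding, and it assumes a simple nearest-prototype rule with a squared Euclidean distance loss; the exact loss used in the paper is not stated in the abstract. The function names and the venue labels `park` and `cafe` are hypothetical.

```python
import math


def conv_output_length(n, kernel, stride=1, padding="SAME"):
    """Output length of a 1-D convolution (TensorFlow-style padding rules)."""
    if padding == "SAME":
        # SAME pads the input so the output depends only on the stride.
        return math.ceil(n / stride)
    if padding == "VALID":
        # VALID uses no padding, so the border positions are lost.
        return math.ceil((n - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(feature, prototypes):
    """Return the label whose prototype is nearest to the feature.

    This replaces a softmax layer with a distance comparison (assumed form).
    """
    return min(prototypes, key=lambda label: squared_euclidean(feature, prototypes[label]))


def distance_loss(feature, label, prototypes):
    """Assumed loss: squared distance from the feature to its class prototype."""
    return squared_euclidean(feature, prototypes[label])


# Hypothetical 2-D prototypes for two venue categories.
prototypes = {"park": [0.0, 0.0], "cafe": [3.0, 4.0]}

print(conv_output_length(10, 3, padding="SAME"))   # length preserved at stride 1
print(conv_output_length(10, 3, padding="VALID"))  # shorter without padding
print(classify([2.5, 3.5], prototypes))
print(distance_loss([3.0, 0.0], "cafe", prototypes))
```

In a trained network the feature vector would come from the LSTMs-CNN extractor and the prototypes would be learned jointly with it; here both are fixed by hand only to show the distance-based decision.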


Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (61772539), and the Fundamental Research Funds for the Central Universities (Nos. 3132017XNG1715, 3132018XNG1806).

Author information

Authors and Affiliations

Communication University of China, Beijing, China
Wei Liu, Xianglin Huang, Gang Cao, Gege Song & Lifang Yang
Nanyang Institute of Technology, Nanyang, China
Wei Liu

Corresponding author

Correspondence to Wei Liu.

Editor information

Editors and Affiliations

Hefei University of Technology, Hefei, China
Richang Hong
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
University of Tokyo, Tokyo, Japan
Toshihiko Yamasaki
Hefei University of Technology, Hefei, China
Meng Wang
City University of Hong Kong, Hong Kong, Hong Kong
Chong-Wah Ngo

About this paper

Cite this paper

Liu, W., Huang, X., Cao, G., Song, G., Yang, L. (2018). Joint Learning of LSTMs-CNN and Prototype for Micro-video Venue Classification. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11165. Springer, Cham. https://doi.org/10.1007/978-3-030-00767-6_65

DOI: https://doi.org/10.1007/978-3-030-00767-6_65
Published: 19 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00766-9
Online ISBN: 978-3-030-00767-6
eBook Packages: Computer Science, Computer Science (R0)
