Abstract
Venue category information for micro-videos is an important cue in social network applications such as location-oriented applications and personalized services. Existing micro-video venue classification methods lose discriminative power because of unsuitable convolutional filters and convolutional padding, and they lack robustness because of the softmax layer. To alleviate these problems, we propose a novel learning framework that jointly learns LSTMs-CNN and Prototype for micro-video venue classification. Specifically, LSTMs-CNN with SAME-type convolutional padding and small convolutional filters is used to extract spatio-temporal information. The Prototype is learned simultaneously to improve robustness over the softmax classification function. We adopt a Euclidean distance loss function to train the whole network. Extensive experimental results on a real-world dataset show that our model significantly outperforms the state-of-the-art baselines in terms of both Micro-F and Macro-F scores.
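The abstract does not spell out the prototype formulation, so the following is only a minimal sketch of generic nearest-prototype classification with a Euclidean distance loss, not the authors' implementation. It assumes one fixed prototype per class and hand-picked two-dimensional features; in the framework described above, the features would come from the LSTMs-CNN and the prototypes would be learned jointly with it. The function names and toy values are hypothetical.

```python
def squared_distance(x, p):
    """Squared Euclidean distance between a feature vector and a prototype."""
    return sum((a - b) ** 2 for a, b in zip(x, p))


def predict(x, prototypes):
    """Return the index of the class whose prototype is nearest to x."""
    distances = [squared_distance(x, p) for p in prototypes]
    return distances.index(min(distances))


def prototype_loss(features, labels, prototypes):
    """Mean squared distance from each feature to its true-class prototype.

    Minimizing this pulls features toward the prototype of their own class.
    """
    total = sum(squared_distance(x, prototypes[y])
                for x, y in zip(features, labels))
    return total / len(features)


# Toy data (made-up values): two classes, one prototype per class (assumption).
prototypes = [[0.0, 0.0], [4.0, 4.0]]
features = [[0.5, -0.5], [3.0, 5.0], [1.0, 1.0]]
labels = [0, 1, 0]

predictions = [predict(x, prototypes) for x in features]
loss = prototype_loss(features, labels, prototypes)
print(predictions, loss)
```

Unlike a softmax layer, the decision here depends only on distances in feature space, so a sample far from every prototype can be recognized as such rather than being forced into a confident class score.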
Acknowledgments
We would like to thank the anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (61772539), and the Fundamental Research Funds for the Central Universities (Nos. 3132017XNG1715, 3132018XNG1806).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, W., Huang, X., Cao, G., Song, G., Yang, L. (2018). Joint Learning of LSTMs-CNN and Prototype for Micro-video Venue Classification. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11165. Springer, Cham. https://doi.org/10.1007/978-3-030-00767-6_65
DOI: https://doi.org/10.1007/978-3-030-00767-6_65
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00766-9
Online ISBN: 978-3-030-00767-6
eBook Packages: Computer Science (R0)