ABSTRACT
We study the problem of automatically assigning appropriate music pieces to a picture or, more generally, to a series of pictures. This task, commonly referred to as soundtrack suggestion, is non-trivial: done manually, it requires considerable human attention and experience, and masterpieces are honored with, for example, the Academy Award for Best Original Score. We put forward PICASSO to solve this task in a fully automated way. PICASSO makes use of genuine samples obtained from first-class contemporary movies. Hence, the training set can be arbitrarily large and is inexpensive to obtain, yet still provides an excellent source of information. At query time, PICASSO employs a three-level algorithm. First, for a given query image, it retrieves a ranking of the most similar movie screenshots. Second, for each of these screenshots, it selects the songs most similar to the music played in the movie when the screenshot was taken. Last, it runs a top-K aggregation algorithm to find the overall best-suited songs available. We have created a large training set consisting of over 40,000 image/soundtrack samples obtained from 28 movies and evaluated the suitability of PICASSO by means of a user study.
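The three query-time levels described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: images and audio are represented as plain feature vectors, similarity is derived from Euclidean distance, and the top-K aggregation is approximated by summing the product of screenshot and song similarities. The names `similarity`, `top_similar`, and `suggest_soundtrack` are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def similarity(a, b):
    """Map Euclidean distance to a similarity in (0, 1] (illustrative choice)."""
    return 1.0 / (1.0 + math.dist(a, b))


def top_similar(query, candidates, k):
    """Return the k (id, similarity) pairs most similar to query, best first."""
    scored = ((cid, similarity(query, vec)) for cid, vec in candidates.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def suggest_soundtrack(query_image, screenshots, soundtrack_audio, songs,
                       k_shots=2, k_songs=2, k=1):
    """Sketch of a three-level lookup under the assumptions stated above.

    screenshots:      screenshot id -> image feature vector
    soundtrack_audio: screenshot id -> audio features of the music playing then
    songs:            song id -> audio feature vector of a candidate song
    """
    totals = defaultdict(float)
    # Level 1: rank the movie screenshots most similar to the query image.
    for shot_id, shot_sim in top_similar(query_image, screenshots, k_shots):
        # Level 2: rank candidate songs against that screenshot's movie music.
        for song_id, song_sim in top_similar(soundtrack_audio[shot_id], songs, k_songs):
            # Level 3: aggregate evidence per song (simple sum, an assumption).
            totals[song_id] += shot_sim * song_sim
    return heapq.nlargest(k, totals.items(), key=lambda pair: pair[1])
```

A call such as `suggest_soundtrack(query_vec, screenshots, soundtrack_audio, songs, k=5)` would return up to five `(song_id, score)` pairs, best first. A real system would likely replace the exhaustive scans with an index and the summation with a proper top-K aggregation algorithm.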
Index Terms
- Picasso - to sing, you must close your eyes and draw