DOI: 10.1145/2009916.2010012
research-article

Picasso - to sing, you must close your eyes and draw

Published: 24 July 2011

ABSTRACT

We study the problem of automatically assigning appropriate music pieces to a picture or, more generally, to a series of pictures. This task, commonly referred to as soundtrack suggestion, is non-trivial: done by hand, it requires considerable human attention and experience, with masterpieces honored by, e.g., the Academy Award for Best Original Score. We put forward PICASSO to solve this task in a fully automated way. PICASSO makes use of genuine samples obtained from first-class contemporary movies. Hence, the training set can be arbitrarily large and is inexpensive to obtain, yet still provides an excellent source of information. At query time, PICASSO employs a three-level algorithm. First, it selects for a given query image a ranking of the most similar movie screenshots. Second, it selects for each screenshot the songs most similar to the music played in the movie when the screenshot was taken. Last, it runs a top-K aggregation algorithm to find the overall most suitable songs available. We have created a large training set consisting of over 40,000 image/soundtrack samples obtained from 28 movies and evaluated the suitability of PICASSO by means of a user study.
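
The three query-time levels described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the feature representation, the similarity measure, or the aggregation algorithm. The sketch assumes plain numeric feature vectors, cosine similarity, and a simple weighted-sum aggregation standing in for the top-K aggregation step. The names `cosine`, `suggest_soundtrack`, `n_shots`, and `n_songs` are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_soundtrack(query_img, screenshots, songs, n_shots=2, n_songs=2, k=1):
    """Return the names of the k best-scoring songs for a query image.

    screenshots: list of (image_features, soundtrack_features) pairs,
                 one per training sample taken from a movie.
    songs:       dict mapping song name -> audio feature vector.
    """
    # Level 1: rank training screenshots by similarity to the query image.
    shots = heapq.nlargest(
        n_shots, screenshots, key=lambda s: cosine(query_img, s[0])
    )

    # Level 2: for each screenshot, rank songs by similarity to the
    # soundtrack that played when the screenshot was taken.
    scores = defaultdict(float)
    for img_feats, audio_feats in shots:
        weight = cosine(query_img, img_feats)
        ranked = heapq.nlargest(
            n_songs, songs.items(), key=lambda kv: cosine(audio_feats, kv[1])
        )
        # Level 3 (simplified): aggregate the per-screenshot rankings with
        # a weighted sum; the paper's top-K aggregation may differ.
        for name, feats in ranked:
            scores[name] += weight * cosine(audio_feats, feats)

    return heapq.nlargest(k, scores, key=scores.get)


# Toy data: two training samples and two candidate songs (made up).
example_screenshots = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
example_songs = {"calm": [1.0, 0.1], "tense": [0.1, 1.0]}
print(suggest_soundtrack([1.0, 0.0], example_screenshots, example_songs, n_shots=1))
```

In this toy setup the query image matches the first screenshot, whose soundtrack features are closest to the song labeled "calm", so that song is ranked first. A real system would replace the linear scans with an index suited to high-dimensional similarity search.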

Published in

SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
July 2011
1374 pages
ISBN: 9781450307574
DOI: 10.1145/2009916

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%
