research-article

Picasso - to sing, you must close your eyes and draw

Authors:
Aleksandar Stupar, Saarland University, Saarbrücken, Germany
Sebastian Michel, Saarland University, Saarbrücken, Germany

SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, July 2011, Pages 715–724
https://doi.org/10.1145/2009916.2010012

Published: 24 July 2011

ABSTRACT

We study the problem of automatically assigning appropriate music pieces to a picture or, more generally, to a series of pictures. This task, commonly referred to as soundtrack suggestion, is non-trivial: done by hand, it requires a lot of human attention and a good deal of experience, with masterpieces distinguished by, e.g., the Academy Award for Best Original Score. We put forward PICASSO to solve this task in a fully automated way. PICASSO makes use of genuine samples obtained from first-class contemporary movies. Hence, the training set can be arbitrarily large and is inexpensive to obtain, yet still provides an excellent source of information. At query time, PICASSO employs a three-level algorithm. First, it selects for a given query image a ranking of the most similar movie screenshots. Second, it selects for each of these screenshots the songs most similar to the music played in the movie when the screenshot was taken. Last, it runs a top-K aggregation algorithm to find the overall best-suited songs available. We have created a large training set consisting of over 40,000 image/soundtrack samples obtained from 28 movies and evaluated the suitability of PICASSO by means of a user study.
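
The three query-time levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the image and audio features, the similarity measures, or the aggregation function, so the Euclidean-distance `similarity`, the product-and-sum scoring, and all names (`suggest_songs`, `samples`, `songs`, `n_screens`, `n_songs`) are assumptions made for illustration. The aggregation here is a naive exhaustive one, not an optimized top-K algorithm.

```python
import heapq
import math
from collections import defaultdict


def similarity(a, b):
    """Hypothetical similarity: maps Euclidean distance into (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def suggest_songs(query_image, samples, songs, n_screens=2, n_songs=2, k=1):
    """Return the k best song names for a query image feature vector.

    samples: list of (screenshot_features, soundtrack_features) pairs
             taken from movies (the training set).
    songs:   dict mapping a song name to its audio feature vector
             (the collection to recommend from).
    """
    # Level 1: rank training screenshots by similarity to the query image.
    ranked = heapq.nlargest(
        n_screens,
        ((similarity(query_image, img), audio) for img, audio in samples),
        key=lambda pair: pair[0],
    )

    scores = defaultdict(float)
    for img_sim, audio in ranked:
        # Level 2: for each screenshot, find the songs most similar to the
        # soundtrack that played when the screenshot was taken.
        nearest = heapq.nlargest(
            n_songs,
            ((similarity(audio, feat), name) for name, feat in songs.items()),
        )
        # Level 3 (assumed scoring): weight each song by both similarities
        # and sum the contributions across screenshots.
        for song_sim, name in nearest:
            scores[name] += img_sim * song_sim

    # Keep the k songs with the highest aggregated score.
    return heapq.nlargest(k, scores, key=scores.get)


# Toy data: two image/soundtrack samples and two candidate songs.
samples = [((0.0, 0.0), (1.0, 0.0)), ((10.0, 10.0), (0.0, 1.0))]
songs = {"calm": (1.0, 0.1), "tense": (0.0, 0.9)}
print(suggest_songs((0.1, 0.0), samples, songs))
```

In this toy setup, a query image close to the first screenshot favors the song whose features resemble that screenshot's soundtrack. A real system would replace the exhaustive scans with indexed similarity search and a threshold-style top-K aggregation.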


Published in
SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
July 2011
1374 pages
ISBN: 9781450307574
DOI: 10.1145/2009916
General Chairs:
Wei-Ying Ma, Microsoft Research Asia, China
Jian-Yun Nie, University of Montreal, Canada
Program Chairs:
Ricardo Baeza-Yates, Yahoo! Research, Spain
Tat-Seng Chua, National University of Singapore
W. Bruce Croft, University of Massachusetts, Amherst, USA
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
automatic music selection
background music
slide show
soundtrack recommendation
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
