ABSTRACT
Algorithms for learning to rank Web documents, display ads, or other types of items constitute a fundamental component of search engines and more generally of online services. In such systems, when a user makes a request or visits a web page, an ordered list of items (e.g. documents or ads) is displayed; the user scans this list in order, and clicks on the first relevant item if any. When the user clicks on an item, the reward collected by the system typically decreases with the position of the item in the displayed list. The main challenge in the design of sequential list selection algorithms stems from the fact that the probabilities with which the user clicks on the various items are unknown and need to be learned. We formulate the design of such algorithms as a stochastic bandit optimization problem. This problem differs from the classical bandit framework: (1) the type of feedback received by the system depends on the actual relevance of the various items in the displayed list (if the user clicks on the last item, we know that none of the previous items in the list are relevant); (2) there are inherent correlations between the average relevance of the items (e.g. the user may be interested in a specific topic only). We assume that items are categorized according to their topic and that users are clustered, so that users of the same cluster are interested in the same topic. We investigate several scenarios depending on the available side-information on the user before selecting the displayed list: (a) we first treat the case where the topic the user is interested in is known when she places a request; (b) we then study the case where the user cluster is known but the mapping between user clusters and topics is unknown. For both scenarios, we derive regret lower bounds and devise algorithms that approach these fundamental limits.
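The interaction model described above (the user scans the displayed list in order, clicks the first relevant item, and the reward decays with the clicked position) can be sketched in a short simulation. This is only an illustrative toy, not the paper's algorithm: the UCB1-style index, the `1/(position+1)` reward, and all function names are assumptions made for the example. It does show the cascade feedback exploited in the abstract: when the user clicks the item at position k, every item displayed before it is revealed to be irrelevant.

```python
import math
import random

def simulate_click(theta, displayed):
    """User scans the list in order and clicks the first relevant item.

    Relevance of item i is Bernoulli(theta[i]).  Returns the clicked
    position (0-based) or None if no displayed item is relevant.
    """
    for pos, item in enumerate(displayed):
        if random.random() < theta[item]:
            return pos
    return None

def ucb_list_selection(theta, n_items, list_len, horizon, seed=0):
    """Toy UCB1-style list selection under cascade feedback (illustrative).

    Maintains per-item click statistics and displays the list_len items
    with the highest optimistic indices, best first.
    """
    random.seed(seed)
    counts = [0] * n_items   # times item was scanned by a user
    clicks = [0] * n_items   # times item was clicked
    total_reward = 0.0
    for t in range(1, horizon + 1):
        def index(i):
            if counts[i] == 0:
                return float("inf")  # force each item to be tried once
            mean = clicks[i] / counts[i]
            return mean + math.sqrt(2.0 * math.log(t) / counts[i])
        displayed = sorted(range(n_items), key=index, reverse=True)[:list_len]
        pos = simulate_click(theta, displayed)
        # Cascade feedback: every item before the clicked one was scanned
        # and found irrelevant; if there is no click, the whole list was.
        scanned = displayed if pos is None else displayed[: pos + 1]
        for p, item in enumerate(scanned):
            counts[item] += 1
            if pos is not None and p == pos:
                clicks[item] += 1
        if pos is not None:
            total_reward += 1.0 / (pos + 1)  # reward decays with position
    return total_reward, clicks, counts
```

Running the sketch with one clearly relevant item (e.g. `theta = [0.9, 0.1, 0.1, 0.1]` and lists of length 2) shows the expected behavior: the best item ends up displayed most often and its empirical click rate converges toward its relevance probability.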