ABSTRACT
Algorithms for learning to rank Web documents, display ads, or other types of items constitute a fundamental component of search engines and more generally of online services. In such systems, when a user makes a request or visits a web page, an ordered list of items (e.g. documents or ads) is displayed; the user scans this list in order, and clicks on the first relevant item if any. When the user clicks on an item, the reward collected by the system typically decreases with the position of the item in the displayed list. The main challenge in the design of sequential list selection algorithms stems from the fact that the probabilities with which the user clicks on the various items are unknown and need to be learned. We formulate the design of such algorithms as a stochastic bandit optimization problem. This problem differs from the classical bandit framework: (1) the type of feedback received by the system depends on the actual relevance of the various items in the displayed list (if the user clicks on the last item, we know that none of the previous items in the list are relevant); (2) there are inherent correlations between the average relevance of the items (e.g. the user may be interested in a specific topic only). We assume that items are categorized according to their topic and that users are clustered, so that users of the same cluster are interested in the same topic. We investigate several scenarios depending on the available side-information on the user before selecting the displayed list: (a) we first treat the case where the topic the user is interested in is known when she places a request; (b) we then study the case where the user cluster is known but the mapping between user clusters and topics is unknown. For both scenarios, we derive regret lower bounds and devise algorithms that approach these fundamental limits.
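The interaction model described above (the user scans the displayed list in order, clicks the first relevant item, and the reward decays with the clicked position) can be sketched in a short simulation. This is only an illustrative toy, not the paper's algorithm: the UCB1-style index, the `1/(position+1)` reward, and all function names are assumptions made for the example. It does show the cascade feedback exploited in the abstract: when the user clicks the item at position k, every item displayed before it is revealed to be irrelevant.

```python
import math
import random

def simulate_click(theta, displayed):
    """User scans the list in order and clicks the first relevant item.

    Relevance of item i is Bernoulli(theta[i]).  Returns the clicked
    position (0-based) or None if no displayed item is relevant.
    """
    for pos, item in enumerate(displayed):
        if random.random() < theta[item]:
            return pos
    return None

def ucb_list_selection(theta, n_items, list_len, horizon, seed=0):
    """Toy UCB1-style list selection under cascade feedback (illustrative).

    Maintains per-item click statistics and displays the list_len items
    with the highest optimistic indices, best first.
    """
    random.seed(seed)
    counts = [0] * n_items   # times item was scanned by a user
    clicks = [0] * n_items   # times item was clicked
    total_reward = 0.0
    for t in range(1, horizon + 1):
        def index(i):
            if counts[i] == 0:
                return float("inf")  # force each item to be tried once
            mean = clicks[i] / counts[i]
            return mean + math.sqrt(2.0 * math.log(t) / counts[i])
        displayed = sorted(range(n_items), key=index, reverse=True)[:list_len]
        pos = simulate_click(theta, displayed)
        # Cascade feedback: every item before the clicked one was scanned
        # and found irrelevant; if there is no click, the whole list was.
        scanned = displayed if pos is None else displayed[: pos + 1]
        for p, item in enumerate(scanned):
            counts[item] += 1
            if pos is not None and p == pos:
                clicks[item] += 1
        if pos is not None:
            total_reward += 1.0 / (pos + 1)  # reward decays with position
    return total_reward, clicks, counts
```

Running the sketch with one clearly relevant item (e.g. `theta = [0.9, 0.1, 0.1, 0.1]` and lists of length 2) shows the expected behavior: the best item ends up displayed most often and its empirical click rate converges toward its relevance probability.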