DOI: 10.1145/2745844.2745852

Learning to Rank: Regret Lower Bounds and Efficient Algorithms

Published: 15 June 2015

ABSTRACT

Algorithms for learning to rank Web documents, display ads, or other types of items constitute a fundamental component of search engines and more generally of online services. In such systems, when a user makes a request or visits a web page, an ordered list of items (e.g. documents or ads) is displayed; the user scans this list in order, and clicks on the first relevant item if any. When the user clicks on an item, the reward collected by the system typically decreases with the position of the item in the displayed list. The main challenge in the design of sequential list selection algorithms stems from the fact that the probabilities with which the user clicks on the various items are unknown and need to be learned. We formulate the design of such algorithms as a stochastic bandit optimization problem. This problem differs from the classical bandit framework: (1) the type of feedback received by the system depends on the actual relevance of the various items in the displayed list (if the user clicks on the last item, we know that none of the previous items in the list are relevant); (2) there are inherent correlations between the average relevance of the items (e.g. the user may be interested in a specific topic only). We assume that items are categorized according to their topic and that users are clustered, so that users of the same cluster are interested in the same topic. We investigate several scenarios depending on the available side-information on the user before selecting the displayed list: (a) we first treat the case where the topic the user is interested in is known when she places a request; (b) we then study the case where the user cluster is known but the mapping between user clusters and topics is unknown. For both scenarios, we derive regret lower bounds and devise algorithms that approach these fundamental limits.
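To make the feedback model concrete, below is a minimal simulation sketch of the list-selection bandit described above, written in Python/NumPy. It is not the authors' algorithm: the learner is a generic UCB-style ranker, and all quantities (the number of items L, the list length K, the position-dependent rewards, and the click probabilities) are illustrative assumptions. What the sketch does capture is the cascade feedback structure: the user scans the displayed list in order, clicks the first relevant item if any, and every item examined before the click is thereby revealed to be irrelevant.

```python
import numpy as np

# Illustrative simulation of the list-selection bandit (not the paper's algorithm):
# L items with unknown click probabilities theta, a list of K positions with
# decreasing rewards, cascade feedback (the user scans the list in order and
# clicks the first relevant item, if any), and a UCB-style learner that ranks
# items by optimistic estimates of their click probabilities.

rng = np.random.default_rng(0)

L, K, T = 10, 3, 20000                        # items, list length, rounds (assumed values)
theta = rng.uniform(0.05, 0.4, L)             # unknown click probabilities
position_reward = np.array([1.0, 0.7, 0.5])   # reward decreases with position

clicks = np.zeros(L)   # observed clicks per item
views = np.zeros(L)    # number of times each item was actually examined

total_reward = 0.0
for t in range(1, T + 1):
    # Optimistic index: empirical mean plus an exploration bonus (UCB1-style).
    means = clicks / np.maximum(views, 1)
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(views, 1))
    ucb = np.where(views > 0, means + bonus, np.inf)  # unexplored items ranked first

    # Display the K items with the highest indices, best first.
    displayed = np.argsort(-ucb)[:K]

    # Cascade feedback: items before the click are known to be irrelevant,
    # items after the click are never examined.
    for pos, item in enumerate(displayed):
        views[item] += 1
        if rng.random() < theta[item]:
            clicks[item] += 1
            total_reward += position_reward[pos]
            break

print("average reward per round:", total_reward / T)
print("estimated click probabilities:", np.round(clicks / np.maximum(views, 1), 3))
print("true click probabilities:     ", np.round(theta, 3))
```

The sketch only illustrates the feedback model; the paper itself additionally exploits the topic and user-cluster structure described in the abstract, and derives algorithms whose regret approaches the stated lower bounds.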


Published in
        SIGMETRICS '15: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
        June 2015
        488 pages
ISBN: 9781450334860
DOI: 10.1145/2745844

        Copyright © 2015 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 15 June 2015


        Qualifiers

        • research-article

        Acceptance Rates

SIGMETRICS '15 paper acceptance rate: 32 of 239 submissions, 13%. Overall acceptance rate: 459 of 2,691 submissions, 17%.
