research-article

FPGA Acceleration of RankBoost in Web Search Engines

Published: 01 January 2009

Abstract

Search relevance is a key measurement for the usefulness of search engines. A shift in search relevance among search engines can easily change a search company's market cap by tens of billions of dollars. With the ever-increasing scale of the Web, machine learning technologies have become important tools to improve search relevance ranking. RankBoost is a promising algorithm in this area, but it is not widely used due to its long training time. To reduce the computation time for RankBoost, we designed an FPGA-based accelerator system and its upgraded version. The accelerator, plugged into a commodity PC, increased the training speed on MSN search engine data up to 1800x compared to the original software implementation on a server. The proposed accelerator has been successfully used by researchers in search relevance ranking.



    Reviews

    Javier Castillo

    Web search engines have changed our lives. The ability to enter a search term and receive millions of results in less than a second puts an amazing amount of knowledge in our hands. One of the main problems in dealing with this incredible amount of data is classifying it according to a criterion. This criterion is usually called the relevance of the results, and its visible part is the ordered list of results returned by the Web search engine. The algorithms used to classify all this information are very complex and require enormous computational power. All of these algorithms, including the MSN one presented in this work and the ultra-secret Google one, are machine learning algorithms that use a large-scale training set. As the authors point out, processing this data can take days, so they decided to create hardware accelerators to speed up the processing.

    The paper is divided into four sections. In Sections 1 and 2, the authors introduce and describe the RankBoost algorithm for Web search relevance, which is accelerated later using custom hardware. The other sections give an overview of two accelerator boards and the hardware designed to run in the field-programmable gate array (FPGA).

    The presentation of RankBoost in Sections 1 and 2 is based on pseudocode and a set of references to other theoretical work. For those of us who are not mathematicians or experts in artificial intelligence (AI), it is quite hard to follow. The explanations tend to be detailed, but they leave out a lot of information that is not at all trivial. Considering that the journal this paper is published in focuses on reconfigurable hardware, it would have been better to have a lighter introduction emphasizing the part to be implemented in the hardware, rather than a very detailed one that is incomplete and difficult to understand. As far as I am concerned, this is the only negative point of the paper. The next sections are clear.
    They present the implementation of the M3.int algorithm, which is one of the steps of the WeakLearn procedure in the RankBoost Web search relevance algorithm. After a detailed explanation of the algorithm, the authors present pseudocode that adds and accumulates in order to compute an integral of the histogram. Since this part of the RankBoost algorithm uses 99 percent of central processing unit (CPU) time, they implement it in the FPGA.

    Section 4 shows the internal architecture of the M3.int accelerator on the STAR-III board, developed by the authors. The M3.int accelerator is very simple: it is a large number of floating-point adders, working in parallel, that take input data from a double data rate (DDR) memory through a peripheral component interconnect (PCI) local bus interface and perform several additions to calculate the integral histogram. Obviously, the DDR size and speed and the PCI interface are bottlenecks of the system, so the authors develop a new board named FAR2. The FAR2 board is similar, but it includes 16 GB of DDR2 memory instead of STAR-III's 2 GB of DDR, and it implements a PCIe interface of up to 4 GB per second to send and receive data from the host at high rates. The FPGA hardware is the same, except that the DDR memory interface is modified to support DDR2 memory and a PCIe controller is added.

    The extensive results section presents many experiments and a performance model to estimate the speed of the accelerator. The results show that training on MSN search engine data is accelerated by two or three orders of magnitude, which is an excellent result. The STAR-III board obtains a speedup of 170.6 times, and the FAR2 board achieves a speedup of 1,800 times.

    The main contribution of the paper is the application of FPGA technology to a new field. The FPGA acceleration of Web services is a very interesting trend and should become the starting point of a new field of investigation for the community.
    From a technological point of view, the work is not very exciting. Although it is an example of excellent engineering work, the internal architecture of the system is very simple and does not present any advances in FPGA design.

    Online Computing Reviews Service
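
The add-and-accumulate step the review mentions, building a histogram and then integrating it, can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' M3.int pseudocode or hardware design: the function name `integral_histogram`, the inputs `bin_index` and `weight`, and the choice to integrate from the highest bin downward are all assumptions made for the example.

```python
def integral_histogram(bin_index, weight, num_bins):
    """Build a weighted histogram, then integrate it.

    Hypothetical inputs (not taken from the paper):
      bin_index[d] - quantized feature value (bin number) of document d
      weight[d]    - per-document weight to accumulate
    Returns (hist, integral), where integral[k] is the sum of hist[k:].
    """
    # Step 1: add-and-accumulate each document's weight into its bin.
    hist = [0.0] * num_bins
    for b, w in zip(bin_index, weight):
        hist[b] += w

    # Step 2: integrate the histogram with a running sum, starting at the
    # highest bin (the direction is an assumption of this sketch).
    integral = [0.0] * num_bins
    running = 0.0
    for k in range(num_bins - 1, -1, -1):
        running += hist[k]
        integral[k] = running
    return hist, integral


if __name__ == "__main__":
    # Four documents spread over four bins; values chosen for illustration.
    hist, integral = integral_histogram([0, 2, 2, 1], [0.5, -1.0, 0.25, 2.0], 4)
    print("histogram:", hist)
    print("integral: ", integral)
```

The review attributes these additions to many floating-point adders working in parallel; the sketch performs them sequentially for clarity.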


    • Published in

      ACM Transactions on Reconfigurable Technology and Systems, Volume 1, Issue 4
      January 2009
      161 pages
      ISSN: 1936-7406
      EISSN: 1936-7414
      DOI: 10.1145/1462586

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 January 2009
      • Accepted: 1 October 2008
      • Revised: 1 August 2008
      • Received: 1 June 2008


      Qualifiers

      • research-article
      • Research
      • Refereed
