Abstract
Search relevance is a key measure of a search engine's usefulness. A shift in search relevance among search engines can easily change a search company's market capitalization by tens of billions of dollars. With the ever-increasing scale of the Web, machine learning technologies have become important tools for improving search relevance ranking. RankBoost is a promising algorithm in this area, but it is not widely used because of its long training time. To reduce the computation time of RankBoost, we designed an FPGA-based accelerator system and an upgraded version of it. The accelerator, plugged into a commodity PC, increased training speed on MSN search engine data by up to 1800x compared with the original software implementation on a server. The proposed accelerator has been successfully used by researchers in search relevance ranking.
Index Terms
- FPGA Acceleration of RankBoost in Web Search Engines