A Fast Matching Method Based on Semantic Similarity for Short Texts

Jiaming Xu; Pengcheng Liu; Gaowei Wu; Zhengya Sun; Bo Xu; Hongwei Hao

doi:10.1007/978-3-642-41644-6_28

With the emergence of various social media platforms, short texts such as weibos and instant messages have become prevalent on today's websites. To mine semantically similar information from massive data, a fast and efficient matching method for short texts has become an urgent need. However, conventional matching methods suffer from data sparsity in short documents. In this paper, we propose a novel matching method, referred to as semantically similar hashing (SSHash). The basic idea of ... is to train a topic model directly on the corpus rather than on individual documents, and then to project texts into hash codes using the learned latent features. The major advantages of SSHash are that (1) it alleviates the sparsity problem in short texts, because the latent features are obtained from the whole corpus rather than at the document level; and (2) it can perform similarity matching in interactive real time by introducing a hashing method. We carry out extensive experiments on real-world short texts. The results demonstrate that our method significantly outperforms baseline methods on several evaluation metrics.
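The abstract does not specify how latent features become hash codes. As a minimal sketch only, assume each text already has a topic-proportion vector from a topic model trained on the whole corpus (training is omitted here). Each topic dimension is binarized against the corpus median, and texts are matched by Hamming distance. Median thresholding and the helper names `hash_codes` and `hamming_search` are illustrative assumptions, not necessarily how SSHash builds or searches its codes.

```python
import numpy as np


def hash_codes(theta: np.ndarray) -> np.ndarray:
    """Binarize topic proportions (assumed scheme, hypothetical helper).

    theta has shape (n_texts, n_topics). Bit k of a text's code is 1 when its
    weight on topic k exceeds the corpus-wide median weight for topic k.
    """
    medians = np.median(theta, axis=0)
    return (theta > medians).astype(np.uint8)


def hamming_search(codes: np.ndarray, query: np.ndarray, radius: int) -> list[int]:
    """Return indices of texts whose code differs from `query` in at most `radius` bits."""
    distances = np.count_nonzero(codes != query, axis=1)
    return [int(i) for i in np.flatnonzero(distances <= radius)]


# Toy topic proportions for four texts over four topics (made-up values).
theta = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.10, 0.10, 0.70],
])

codes = hash_codes(theta)
matches = hamming_search(codes, codes[0], radius=1)
print(matches)  # prints [0, 1]
```

The linear scan over all codes is used only for clarity; an interactive system would more plausibly pack bits into integers and probe an index of nearby codes instead of comparing the query against every text.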
