Web News Data Extraction Technology Based on Text Keywords
release_ep5vufo5crfdhhtjtckmguvqwm

by Kun Zhang

Published in Complexity by Hindawi Limited.

2021, Volume 2021, pp. 1-11

Abstract

To shorten the time users spend searching for news on the Internet, this paper designs a web news data extraction technique that captures the main content of a news item by extracting keywords from its text. First, the TF-IDF, TextRank, and LDA keyword extraction algorithms are analyzed to understand the keyword extraction process, and the TF-IDF algorithm is optimized using Zipf's law. Then, drawing on the idea of model fusion, five schemes based on waterfall fusion and parallel combination fusion are designed and evaluated experimentally. The experiments indicate that the designed technique performs well on web news data extraction. News keyword extraction has broad application prospects and can provide a basis for research on news key phrases, news summarization, and related areas.
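
The abstract names TF-IDF scoring and two fusion styles (waterfall and parallel combination) without giving their details, so the following is only a minimal sketch of the general ideas, not the paper's five schemes. The smoothed IDF formula, the max-normalization step, the fusion weights, and every function name here are assumptions made for illustration. The Zipf's-law optimization is not reproduced because the abstract does not describe it.

```python
import math
from collections import Counter


def tfidf_scores(doc_tokens, corpus_tokens):
    """Score each term of one document by term frequency times inverse document frequency."""
    n_docs = len(corpus_tokens)
    total = len(doc_tokens)
    scores = {}
    for term, count in Counter(doc_tokens).items():
        df = sum(1 for doc in corpus_tokens if term in doc)
        # Smoothed IDF; this particular smoothing is an assumption, not the paper's formula.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / total) * idf
    return scores


def top_k(scores, k):
    """Return the k best terms, breaking score ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def normalize(scores):
    """Rescale scores to [0, 1] so that different extractors are comparable."""
    top = max(scores.values(), default=0.0)
    return {t: (s / top if top > 0 else 0.0) for t, s in scores.items()}


def parallel_fusion(score_maps, weights):
    """Hypothetical parallel scheme: weighted sum of normalized scores from each extractor."""
    fused = Counter()
    for scores, weight in zip(score_maps, weights):
        for term, s in normalize(scores).items():
            fused[term] += weight * s
    return dict(fused)


def waterfall_fusion(first_scores, second_scorer, keep):
    """Hypothetical waterfall scheme: the first extractor filters, the second re-ranks."""
    candidates = top_k(first_scores, keep)
    return {term: second_scorer(term) for term in candidates}


if __name__ == "__main__":
    # Toy tokenized corpus; real news text would need tokenization and stop-word removal.
    corpus = [
        ["flood", "rescue", "city", "flood", "river"],
        ["election", "city", "vote"],
        ["market", "city", "stocks"],
    ]
    tfidf = tfidf_scores(corpus[0], corpus)
    # Stand-in for a second extractor such as TextRank or LDA (values are made up).
    other = {"rescue": 1.0, "flood": 0.2}
    print(top_k(tfidf, 3))
    print(top_k(parallel_fusion([tfidf, other], [0.5, 0.5]), 3))
    print(top_k(waterfall_fusion(tfidf, lambda t: other.get(t, 0.0), keep=3), 3))
```

In this sketch the parallel scheme lets every extractor vote on every term, while the waterfall scheme lets the first extractor narrow the candidate set before the second one orders it. Which of the two suits a given corpus is an empirical question of the kind the paper's experiments address.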

Archived Files and Locations

application/pdf  2.7 MB
file_djdnkyzvh5fj7gi766wlmlb34a
downloads.hindawi.com (publisher)
web.archive.org (webarchive)
Preserved and Accessible
Type: article-journal
Stage: published
Date: 2021-04-16
Language: en
Container Metadata
Open Access Publication
In DOAJ
In Keepers Registry
ISSN-L:  1076-2787
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: edf78315-3691-47b9-b44c-8a9a283e36d2