An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data
2014
IOSR Journal of Computer Engineering
A novel method for XML duplicate detection, called XMLDup. ...
Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, like XML data. ...
To overcome this problem, we implement the proposed Priority algorithm to detect duplicate data in large XML data. ...
doi:10.9790/0661-1628101105
fatcat:wzub2vvw6vaz7d5xeslctainj4
Random Forests Algorithm Based Duplicate Detection in On-Site Programming Big Data Environment
2020
Journal of Information Hiding and Privacy Protection
Therefore, data cleaning is essential for on-site programming big data. Duplicate data detection is an important step in data cleaning, which can save storage resources and enhance data consistency. ...
Due to the insufficiency in traditional Sorted Neighborhood Method (SNM) and the difficulty of high-dimensional data detection, an optimized algorithm based on random forests with the dynamic and adaptive ...
Experiments for duplicate data detection have proved the effectiveness and advantages of optimized algorithm proposed. ...
doi:10.32604/jihpp.2020.016299
fatcat:6orhh5esefd5beb6255uxihy5y
CLUSTER BASED DUPLICATE DETECTION
2013
Journal of Computer Science
The new method offers a more accurate dissimilarity measure for each cluster's data without manual intervention at the time of duplicate detection. ...
the volume of data for text comparisons. ...
Limitation: At present, the algorithm does not support detecting duplicates between two different images or two different video files. ...
doi:10.3844/jcssp.2013.1514.1518
fatcat:ehls5zls35hp3kikxgknbvigya
Database Record Duplicate Detection System using Simil Algorithm
2018
International Journal on Computer Science and Engineering
The similarity metrics commonly used to detect similar field entries are covered, along with some algorithms used for duplicate detection to find approximate duplicate records in a database. ...
Linking data to detect duplicates improves the quality and integrity of data, which allows reuse of existing data sources for future research work [1]. ...
Duplicate Detection Algorithm: There are several duplicate detection algorithms, but this study discusses the few that are effective and commonly used. Jaccard Similarity Algorithm. ...
doi:10.21817/ijcse/2018/v10i2/181002013
fatcat:pbaohn4ywrhp7ovitwcarbsany
A system proposal for automated data cleaning environment
2020
ITEGAM- Journal of Engineering and Technology for Industrial Applications (ITEGAM-JETIA)
Against this backdrop, we developed an automated configurable data cleaning environment based on training and physical-semantic data similarity, aiming to provide a more efficient and extensible tool for ...
Approaches were also demonstrated to show that besides detecting and treating information inconsistencies and duplication of positive cases, they also addressed cases of detected false-positives and the ...
The first module provides several algorithms for detecting duplicate and inconsistent data. ...
doi:10.5935/jetia.v6i25.685
fatcat:4hxsg3z2ijduvge6u5qqumu35e
A STUDY AND SURVEY ON VARIOUS PROGRESSIVE DUPLICATE DETECTION MECHANISMS
2016
International Journal of Research in Engineering and Technology
Several different methods of data analysis are studied here with various approaches for duplicate detection. ...
The efficiency can be doubled over the conventional duplicate detection method using this algorithm. ...
The techniques for duplicate record detection are very essential to improve the extracted data quality. U. ...
doi:10.15623/ijret.2016.0503082
fatcat:tbmxipuqfzcvxo4d4y7r6p7vrm
Deriving the Probability with Machine Learning and Efficient Duplicate Detection in Hierarchical Objects
2014
International Journal of Computer Trends and Technology
Duplicate detection is a major task in data mining, used to find duplicates in the original data as well as in data objects. ...
In this method, a number of XML data items are taken as input, and the method predicts the conditional probability value for each data item in the hierarchical structure. ...
The subsequent measurement discerns among three methods used to execute duplicate detection: machine learning and similarity measures are performed to learn duplicate data objects, clustering algorithms ...
doi:10.14445/22312803/ijctt-v7p105
fatcat:ixqtxbybyfbzln2tilqp4zsivq
EDDDS: An Efficient Duplicate Data Detection System
2015
IJARCCE
There are lots of works already presented in the past for finding the duplicates in the relational data. But nowadays there is more focus on finding duplicates in the XML data. ...
This is because XML is very popular for data storage and is extensively used for data exchange between organizations. ...
We would also like to thank our department for giving us the resources and the freedom to pursue this project. ...
doi:10.17148/ijarcce.2015.44142
fatcat:hre3v4cponbulihwnzyhz6ytta
Database Repeat Record Detection based on Improved Quantum Particle Swarm Optimization Algorithm
2019
International Journal of Performability Engineering
The detection of similar duplicate records was a key link in database data cleaning. ...
The proposed method also solved the problem of database similar duplicate record detection effectively. ...
For example, Song et al. proposed a big data similar duplicate record detection algorithm based on the MapReduce model. ...
doi:10.23940/ijpe.19.02.p35.710718
fatcat:6iuota7s6bgt5mioemm3md7l2m
Scaling up duplicate detection in graph data
2008
Proceeding of the 17th ACM conference on Information and knowledge mining - CIKM '08
We scale up duplicate detection in graph data (DDG) to large amounts of data using the support of a relational database system. ...
Scalability has been neglected so far, even though it is crucial for large real-world duplicate detection tasks. ...
We observe that for duplicate detection in graph data, no methods for scalable iterative duplicate detection have been proposed, a shortcoming we address in this paper. ...
doi:10.1145/1458082.1458259
dblp:conf/cikm/HerschelN08
fatcat:7jlts2o6jzc6vgelesgsen4tom
Efficient Duplicate Detection and Elimination in Hierarchical Multimedia Data
2015
International Journal of Computer Applications
Also due to differences between various data models, the algorithms which are for single relations cannot be applied on XML data. ...
Here Bayesian network is used with modified pruning algorithm for duplicate detection, and experiments are performed on both artificial and real world datasets. ...
Algorithm for Proposed Pruning Method
Algorithm: XMLMulDup(N) Input: The node or subtree N for which algorithm will detect duplicates. ...
doi:10.5120/21751-5018
fatcat:r6a7x6xzofcqnc6enuoma7npsa
RESim - Automated Detection of Duplicated Requirements in Software Engineering Projects
2020
Requirements Engineering: Foundation for Software Quality
framework of different similarity detection algorithms for researchers. ...
Among the main problems of requirements engineering, the detection and management of duplicated requirements is highlighted. ...
Acknowledgments The work presented in this paper has been supported by the GENESIS project under the National Spanish Program for Research Aimed at the Challenges of Society (RETOS) 2016, contract TIN2016 ...
dblp:conf/refsq/MotgerPM20
fatcat:vzozvhjd6ncsjb7fvyjlxcfk5i
PSO Algorithm to Select Subsets of Q-Gram Features for Record Duplicate Detection
2013
International Journal of Computer Applications
The accuracy obtained for the proposed Duplicate Record Detection is found to be 89%. ...
In the previous work duplicate record detection was done using Q-gram concept and the fuzzy classifier. ...
The data values from the distance calculation can be used in feature selection using the PSO algorithm, and the fitness function to compute should be the precise and accurate value for detecting the duplicate ...
doi:10.5120/14166-9829
fatcat:n3omlmrhl5bi5kzz73ydocuo7m
A Near-Duplicate Detection Algorithm to Facilitate Document Clustering
2014
International Journal of Data Mining & Knowledge Management Process
The experimental results show that our algorithm outperforms in terms of similarity measures. Identifying near-duplicate and duplicate documents has resulted in reduced memory use in repositories. ...
Web mining faces huge problems due to duplicate and near-duplicate web pages. Detecting near duplicates is very difficult in a large collection of data like the "internet". ...
Future work will research more robust and accurate methods for near-duplicate detection and elimination on the basis of the detection. ...
doi:10.5121/ijdkp.2014.4604
fatcat:66jmy6xqhrbqdistbyawfrfo6u
Using Fuzzy Logic Technique to Eliminate the Duplicates in Large Database
2015
Journal of University of Human Development
Duplicate records are a broad problem in many databases. There are wide efforts focusing on eliminating duplicates in data sets, because it is an important part of data cleaning. ...
This paper focuses on discovering and removing duplication by using a fuzzy logic technique. ...
Atiqur Rahaman presented A Domain-Independent Data Cleaning Algorithm for Detecting Similar-Duplicates. ...
doi:10.21928/juhd.v1n4y2015.pp423-426
fatcat:vr5d7mauxbhynilyk2xzkqbnhy