An Effective Solution to Adequate and Operative Duplicate Detection in Stratified Data

A. Baladhandayutham, S. Roselin Mary
2014 IOSR Journal of Computer Engineering  
A novel method for XML duplicate detection, called XMLDup.  ...  Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, like XML data.  ...  To overcome this problem, we implement the proposed Priority algorithm to detect duplicate data in large XML data.  ... 
doi:10.9790/0661-1628101105 fatcat:wzub2vvw6vaz7d5xeslctainj4

Random Forests Algorithm Based Duplicate Detection in On-Site Programming Big Data Environment

Qianqian Li, Meng Li, Lei Guo, Zhen Zhang
2020 Journal of Information Hiding and Privacy Protection  
Therefore, data cleaning is essential for on-site programming big data. Duplicate data detection is an important step in data cleaning, which can save storage resources and enhance data consistency.  ...  Due to the insufficiency in traditional Sorted Neighborhood Method (SNM) and the difficulty of high-dimensional data detection, an optimized algorithm based on random forests with the dynamic and adaptive  ...  Experiments for duplicate data detection have proved the effectiveness and advantages of optimized algorithm proposed.  ... 
doi:10.32604/jihpp.2020.016299 fatcat:6orhh5esefd5beb6255uxihy5y

CLUSTER BASED DUPLICATE DETECTION

Kumar
2013 Journal of Computer Science  
The new method offers a more accurate dissimilarity measure for each cluster's data without manual intervention at the time of duplicate detection.  ...  the volume of data for text comparisons.  ...  Limitation: At present the algorithm does not support detecting duplicates of two different images or two different video files.  ... 
doi:10.3844/jcssp.2013.1514.1518 fatcat:ehls5zls35hp3kikxgknbvigya

Database Record Duplicate Detection System using Simil Algorithm

Jumoke Soyemi, James Adegboye
2018 International Journal on Computer Science and Engineering  
The similarity metrics that are commonly used to detect similar field entries are covered with some algorithm used for duplicate detection to find approximately duplicates records in a database.  ...  Linking data to detect duplicates is good in improving the quality and integrity of data which allow re-uses of existing data sources for future research work [1].  ...  Duplicate Detection Algorithm There are several numbers of duplicate detection algorithms but this study discusses the few of them that are effective and commonly.  Jaccard Similarity Algorithm.  ... 
doi:10.21817/ijcse/2018/v10i2/181002013 fatcat:pbaohn4ywrhp7ovitwcarbsany

A system proposal for automated data cleaning environment

Carlos Roberto Valêncio, Toni Jardini, Victor Hugo Penhalves Martins, Angelo Cesar Colombini, Márcio Zamboti Fortes
2020 ITEGAM- Journal of Engineering and Technology for Industrial Applications (ITEGAM-JETIA)  
Against this backdrop, we developed an automated configurable data cleaning environment based on training and physical-semantic data similarity, aiming to provide a more efficient and extensible tool for  ...  Approaches were also demonstrated to show that besides detecting and treating information inconsistencies and duplication of positive cases, they also addressed cases of detected false-positives and the  ...  The first module provides several algorithms for detecting duplicate and inconsistent data.  ... 
doi:10.5935/jetia.v6i25.685 fatcat:4hxsg3z2ijduvge6u5qqumu35e

A STUDY AND SURVEY ON VARIOUS PROGRESSIVE DUPLICATE DETECTION MECHANISMS

Ashwini V. Lakote
2016 International Journal of Research in Engineering and Technology  
Several different methods of data analysis are studied here with various approaches for duplicate detection.  ...  The efficiency can be doubled over the conventional duplicate detection method using this algorithm.  ...  The techniques for duplicate record detection are very essential to improve the extracted data quality. U.  ... 
doi:10.15623/ijret.2016.0503082 fatcat:tbmxipuqfzcvxo4d4y7r6p7vrm

Deriving the Probability with Machine Learning and Efficient Duplicate Detection in Hierarchical Objects

D Nithya, K Karthickeyan
2014 International Journal of Computer Trends and Technology  
Duplicate detection is the major important task in the data mining, in order to find duplicate in the original data as well as data object.  ...  In this method the number of XML Data is considered as input and the predicts the conditional probability value for each data in the hierarchical structure.  ...  The subsequent measurement discerns among three methods used to execute duplicate detection: machine learning and similarity measures are performed to learn duplicate data objects, clustering algorithms  ... 
doi:10.14445/22312803/ijctt-v7p105 fatcat:ixqtxbybyfbzln2tilqp4zsivq

EDDDS: An Efficient Duplicate Data Detection System

Bhavana Dhake, Dr. Lomte S.S., Prof. Nagargoje Y.R., Prof. Auti R.A., Prof. Patil B.K.
2015 IJARCCE  
There are lots of works already presented in the past for finding the duplicates in the relational data. But nowadays there is more focus on finding duplicates in the XML data.  ...  Because of XML is very popular for data storing and extensively used for data exchange between the organizations.  ...  We would also like to thank our department for giving us the resources and the freedom to pursue this project.  ... 
doi:10.17148/ijarcce.2015.44142 fatcat:hre3v4cponbulihwnzyhz6ytta

Database Repeat Record Detection based on Improved Quantum Particle Swarm Optimization Algorithm

Guangzhou Yu
2019 International Journal of Performability Engineering  
The detection of similar duplicate records was a key link in database data cleaning.  ...  The proposed method also solved the problem of database similar duplicate record detection effectively.  ...  For example, Song et al. proposed a big data similar duplicate record detection algorithm based on the MapReduce model.  ... 
doi:10.23940/ijpe.19.02.p35.710718 fatcat:6iuota7s6bgt5mioemm3md7l2m

Scaling up duplicate detection in graph data

Melanie Herschel, Felix Naumann
2008 Proceeding of the 17th ACM conference on Information and knowledge management - CIKM '08  
We scale up duplicate detection in graph data (DDG) to large amounts of data using the support of a relational database system.  ...  Scalability has been neglected so far, even though it is crucial for large real-world duplicate detection tasks.  ...  We observe that for duplicate detection in graph data, no methods for scalable iterative duplicate detection have been proposed, a shortcoming we address in this paper.  ... 
doi:10.1145/1458082.1458259 dblp:conf/cikm/HerschelN08 fatcat:7jlts2o6jzc6vgelesgsen4tom

Efficient Duplicate Detection and Elimination in Hierarchical Multimedia Data

Manjusha R.Pawar, J. V. Shinde
2015 International Journal of Computer Applications  
Also due to differences between various data models, the algorithms which are for single relations cannot be applied on XML data.  ...  Here Bayesian network is used with modified pruning algorithm for duplicate detection, and experiments are performed on both artificial and real world datasets.  ...  Algorithm for Proposed Pruning Method Algorithm: XMLMulDup(N) Input: The node or subtree N for which algorithm will detect duplicates.  ... 
doi:10.5120/21751-5018 fatcat:r6a7x6xzofcqnc6enuoma7npsa

RESim - Automated Detection of Duplicated Requirements in Software Engineering Projects

Quim Motger, Cristina Palomares, Jordi Marco
2020 Requirements Engineering: Foundation for Software Quality  
framework of different similarity detection algorithms for researchers.  ...  Among the main problems of requirements engineering, the detection and management of duplicated requirements is highlighted.  ...  Acknowledgments The work presented in this paper has been supported by the GENESIS project under the National Spanish Program for Research Aimed at the Challenges of Society (RETOS) 2016, contract TIN2016  ... 
dblp:conf/refsq/MotgerPM20 fatcat:vzozvhjd6ncsjb7fvyjlxcfk5i

PSO Algorithm to Select Subsets of Q-Gram Features for Record Duplicate Detection

M. Padmanaban, R. Radha
2013 International Journal of Computer Applications  
The accuracy obtained for the proposed Duplicate Record Detection is found to be 89%.  ...  In the previous work duplicate record detection was done using Q-gram concept and the fuzzy classifier.  ...  The data values from distance calculation can be used in Feature selection using PSO algorithm and the fitness function to commutate should be the precise and accurate value for detecting the duplicate  ... 
doi:10.5120/14166-9829 fatcat:n3omlmrhl5bi5kzz73ydocuo7m

A Near-Duplicate Detection Algorithm to Facilitate Document Clustering

Lavanya Pamulaparty, Guru Rao C.V, Sreenivasa Rao M
2014 International Journal of Data Mining & Knowledge Management Process  
The experimental results show that our algorithm outperforms in terms of similarity measures. The near duplicate and duplicate document identification has resulted in reduced memory in repositories.  ...  Web Mining faces huge problems due to Duplicate and Near Duplicate Web pages. Detecting Near Duplicates is very difficult in a large collection of data like the "internet".  ...  The future work will be research for more robust and accurate methods for near duplicate detection and elimination on basis of the detection.  ... 
doi:10.5121/ijdkp.2014.4604 fatcat:66jmy6xqhrbqdistbyawfrfo6u

Using Fuzzy Logic Technique to Eliminate the Duplicates in Large Database

Mortadha M. Hamad, Alaa Abdulqahar Jihad
2015 Journal of University of Human Development  
Duplicate records are a broad problem in many databases. There are wide efforts focusing on the elimination of duplicates in data sets, because it is an important part of data cleaning.  ...  This paper focuses on discovering and removing duplication by using a fuzzy logic technique.  ...  Atiqur Rahaman presented A Domain-Independent Data Cleaning Algorithm for Detecting Similar-Duplicates.  ... 
doi:10.21928/juhd.v1n4y2015.pp423-426 fatcat:vr5d7mauxbhynilyk2xzkqbnhy