Abstract
The nearest neighbor (NN) classifier is one of the most popular non-parametric classification approaches and has been successfully applied to several pattern recognition problems. Its two main limitations are its computational complexity and its sensitivity to outliers in the training set. While the first problem has been partially overcome by the availability of inexpensive memory and high processing speeds, the second persists, and several editing and condensing techniques have been proposed to select a proper set of prototypes from the training set. In this work, an editing technique is proposed based on the idea of rewarding the patterns that contribute to a correct classification and punishing those that provide a wrong one. The analysis is carried out at both the local and the global level, by examining the training set at different scales. A score is calculated for each pattern, and the patterns whose score falls below a predefined threshold are edited out. Extensive experiments have been conducted on several classification problems, both to evaluate the efficacy of the proposed technique with respect to other editing approaches and to investigate the advantage of using reward–punishment editing in combination with condensing techniques or as a pre-processing stage when classifiers other than the NN are adopted.
Notes
The dataset is available at http://bias.csr.unibo.it/datasets/RP-Editing_2Ddataset.rar.
Cite this article
Franco, A., Maltoni, D. & Nanni, L. Data pre-processing through reward–punishment editing. Pattern Anal Applic 13, 367–381 (2010). https://doi.org/10.1007/s10044-010-0182-x