
Data pre-processing through reward–punishment editing

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

The nearest neighbor (NN) classifier is one of the most popular non-parametric classification approaches and has been successfully applied to many pattern recognition problems. Its two main limitations are its computational complexity and its sensitivity to outliers in the training set. Although the first problem has been partially overcome by the availability of inexpensive memory and fast processors, the second still persists. Several editing and condensing techniques have been proposed to select a proper set of prototypes from the training set. In this work, an editing technique is proposed based on the idea of rewarding the patterns that contribute to a correct classification and punishing those that lead to a wrong one. The analysis is carried out at both the local and the global level, by examining the training set at different scales. A score is calculated for each pattern, and patterns whose score is below a predefined threshold are edited out. Extensive experiments on several classification problems evaluate the efficacy of the proposed technique against other editing approaches and investigate the benefit of using reward–punishment editing in combination with condensing techniques, or as a pre-processing stage when classifiers other than the NN are adopted.
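The abstract describes the editing rule only in outline, so the following is a minimal sketch under stated assumptions, not the authors' algorithm. It covers only a local, neighbourhood-based score: every training pattern is classified by leave-one-out k-NN with Euclidean distance, each neighbour whose label matches the query's true label is rewarded, each neighbour with a different label is punished, and the score is the hypothetical ratio rewards / (rewards + punishments). Patterns never used as a neighbour receive a neutral score of 1.0 (an assumption). The paper's global, multi-scale analysis is not reproduced, and the names `reward_punishment_scores`, `rp_edit` and `nn_classify`, as well as the values of `k` and `threshold`, are illustrative only.

```python
import numpy as np


def reward_punishment_scores(X, y, k=3):
    """Hypothetical local reward-punishment score for every training pattern.

    Each pattern is classified by leave-one-out k-NN (Euclidean distance).
    Every neighbour whose label matches the query's true label is rewarded;
    every neighbour whose label differs is punished.
    Score = rewards / (rewards + punishments), in [0, 1].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    rewards = np.zeros(n)
    punishments = np.zeros(n)

    # Pairwise squared Euclidean distances; the diagonal is set to infinity
    # so that a pattern is never counted as its own neighbour.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)

    for i in range(n):
        for j in np.argsort(d[i])[:k]:
            if y[j] == y[i]:
                rewards[j] += 1      # j voted for the correct class
            else:
                punishments[j] += 1  # j voted for a wrong class

    total = rewards + punishments
    # Assumption: a pattern never chosen as a neighbour has no evidence
    # against it and receives a neutral score of 1.0 (i.e. it is kept).
    return np.where(total > 0, rewards / np.maximum(total, 1), 1.0)


def rp_edit(X, y, k=3, threshold=0.5):
    """Edit out patterns whose score is below `threshold` (hypothetical API)."""
    scores = reward_punishment_scores(X, y, k)
    keep = scores >= threshold
    return np.asarray(X)[keep], np.asarray(y)[keep], scores


def nn_classify(X_train, y_train, x):
    """Plain 1-NN rule used to compare the original and the edited set."""
    diff = np.asarray(X_train, dtype=float) - np.asarray(x, dtype=float)
    return np.asarray(y_train)[np.argmin((diff ** 2).sum(axis=1))]


# Toy data: two well-separated classes plus one mislabelled pattern
# (0.5, 0.5) sitting inside class 0.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [5, 5], [5, 6], [6, 5], [6, 6],
              [0.5, 0.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

X_edit, y_edit, scores = rp_edit(X, y, k=3, threshold=0.5)
print(len(y), "->", len(y_edit))                # 9 -> 8
print(nn_classify(X, y, [0.4, 0.4]))            # 1: the outlier is the nearest pattern
print(nn_classify(X_edit, y_edit, [0.4, 0.4]))  # 0 after editing
```

In this toy run the mislabelled pattern is punished by all four of its class-0 neighbours and scores 0.0, so it is the only pattern edited out, and a query near it is then assigned to the surrounding class. Because editing only returns a reduced training set, the same `(X_edit, y_edit)` could in principle be passed to any other classifier, which is the pre-processing use the abstract mentions.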


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6


Notes

  1. The dataset is available at http://bias.csr.unibo.it/datasets/RP-Editing_2Ddataset.rar.


Author information

Correspondence to Annalisa Franco.


About this article

Cite this article

Franco, A., Maltoni, D. & Nanni, L. Data pre-processing through reward–punishment editing. Pattern Anal Applic 13, 367–381 (2010). https://doi.org/10.1007/s10044-010-0182-x



  • DOI: https://doi.org/10.1007/s10044-010-0182-x
