Abstract
The nearest neighbor (NN) classifier is one of the most popular non-parametric classification approaches and has been successfully applied to several pattern recognition problems. Its two main limitations are its computational complexity and its sensitivity to outliers in the training set. While the first problem has been partially overcome by the availability of inexpensive memory and high processing speeds, the second persists, and several editing and condensing techniques have been proposed to select a proper set of prototypes from the training set. In this work, an editing technique is proposed based on the idea of rewarding the patterns that contribute to a correct classification and punishing those that provide a wrong one. The analysis is carried out at both the local and the global level, by examining the training set at different scales. A score is calculated for each pattern, and the patterns whose score falls below a predefined threshold are edited out. Extensive experiments have been conducted on several classification problems, both to evaluate the efficacy of the proposed technique with respect to other editing approaches and to investigate the advantage of using reward–punishment editing in combination with condensing techniques or as a pre-processing stage when classifiers other than the NN are adopted.
Notes
The dataset is available at http://bias.csr.unibo.it/datasets/RP-Editing_2Ddataset.rar.
Cite this article
Franco, A., Maltoni, D. & Nanni, L. Data pre-processing through reward–punishment editing. Pattern Anal Applic 13, 367–381 (2010). https://doi.org/10.1007/s10044-010-0182-x