Considerate approaches to constructing summary statistics for ABC model selection

Statistics and Computing

Abstract

For nearly any challenging scientific problem, evaluation of the likelihood is problematic if not impossible. Approximate Bayesian computation (ABC) allows us to apply the whole Bayesian formalism to problems where we can simulate from a model but cannot evaluate the likelihood directly. When summary statistics of real and simulated data are compared, rather than the data directly, information is lost unless the summary statistics are sufficient. Sufficient statistics are, however, not common, and without them inferences drawn with ABC must be treated with caution. Other authors have previously attempted to combine different statistics in order to construct (approximately) sufficient statistics using search and information heuristics. Here we employ an information-theoretical framework that can be used to construct appropriate (approximately sufficient) statistics by combining different statistics until the loss of information is minimized. We start from a potentially large number of different statistics and choose the smallest set that captures (nearly) the same information as the complete set. We then demonstrate that such sets of statistics can be constructed for both parameter estimation and model selection problems, and we apply our approach to a range of illustrative and real-world model selection problems.
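The idea of growing a set of statistics until little further information is gained can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: the toy Gaussian model, the candidate pool, the rejection sampler `abc_posterior`, the histogram-based divergence `kl_histogram`, the greedy loop `greedy_select`, and the stopping threshold are all assumptions chosen for illustration. The sketch adds, one at a time, the statistic that most changes the ABC posterior (measured by an estimated Kullback-Leibler divergence) and stops when the change falls below the threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (assumed for illustration): y_i ~ Normal(theta, 1), uniform prior.
N_OBS, N_SIM, TRUE_THETA = 50, 20_000, 2.0
PRIOR_LOW, PRIOR_HIGH = -5.0, 5.0

# Hypothetical candidate pool of summary statistics, applied row-wise.
CANDIDATES = {
    "mean": lambda y: y.mean(axis=1),
    "median": lambda y: np.median(y, axis=1),
    "var": lambda y: y.var(axis=1),
    "max": lambda y: y.max(axis=1),
}

observed = rng.normal(TRUE_THETA, 1.0, size=(1, N_OBS))
thetas = rng.uniform(PRIOR_LOW, PRIOR_HIGH, size=N_SIM)
simulated = rng.normal(thetas[:, None], 1.0, size=(N_SIM, N_OBS))

names = list(CANDIDATES)
sim_stats = np.column_stack([CANDIDATES[k](simulated) for k in names])
obs_stats = np.array([CANDIDATES[k](observed)[0] for k in names])
scale = sim_stats.std(axis=0)  # put all statistics on a comparable scale


def abc_posterior(chosen, keep=0.02):
    """Rejection ABC: keep the prior draws whose chosen statistics lie closest to the data."""
    if not chosen:
        return thetas  # no statistics used, so the sample is just the prior
    cols = [names.index(k) for k in chosen]
    diff = (sim_stats[:, cols] - obs_stats[cols]) / scale[cols]
    dist = np.sqrt((diff ** 2).sum(axis=1))
    return thetas[np.argsort(dist)[: int(keep * N_SIM)]]


def kl_histogram(p_sample, q_sample, bins=50, eps=0.5):
    """Crude plug-in estimate of KL(p || q) from two 1-D samples (smoothed histograms)."""
    edges = np.linspace(PRIOR_LOW, PRIOR_HIGH, bins + 1)
    p = np.histogram(p_sample, bins=edges)[0] + eps
    q = np.histogram(q_sample, bins=edges)[0] + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def greedy_select(threshold=0.3):
    """Add the statistic that changes the posterior most; stop when the gain is small."""
    selected, current = [], abc_posterior([])
    while len(selected) < len(names):
        gains = {
            k: kl_histogram(abc_posterior(selected + [k]), current)
            for k in names
            if k not in selected
        }
        best = max(gains, key=gains.get)
        if gains[best] < threshold:
            break
        selected.append(best)
        current = abc_posterior(selected)
    return selected, current


selected, posterior = greedy_select()
print("selected statistics:", selected)
print("posterior mean of theta:", posterior.mean())
```

In this toy model the sample mean is sufficient for `theta`, so a reasonable outcome is a small selected set dominated by a location statistic. The histogram estimator is biased for small samples, so the threshold is a tuning choice rather than a principled cut-off.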

Author information

Corresponding author

Correspondence to Michael P. H. Stumpf.

Additional information

All authors contributed equally to this work.

About this article

Cite this article

Barnes, C.P., Filippi, S., Stumpf, M.P.H. et al. Considerate approaches to constructing summary statistics for ABC model selection. Stat Comput 22, 1181–1197 (2012). https://doi.org/10.1007/s11222-012-9335-7

  • DOI: https://doi.org/10.1007/s11222-012-9335-7
