Abstract
For nearly any challenging scientific problem, evaluating the likelihood is problematic if not impossible. Approximate Bayesian computation (ABC) allows us to apply the whole Bayesian formalism to problems where we can simulate from a model but cannot evaluate the likelihood directly. When summary statistics of real and simulated data are compared, rather than the data directly, information is lost unless the summary statistics are sufficient. Sufficient statistics are, however, not common, and without them inferences made with ABC must be treated with caution. Other authors have previously attempted to combine different statistics in order to construct (approximately) sufficient statistics using search and information heuristics. Here we employ an information-theoretical framework that can be used to construct appropriate (approximately sufficient) statistics by combining different statistics until the loss of information is minimized. We start from a potentially large number of different statistics and choose the smallest set that captures (nearly) the same information as the complete set. We then demonstrate that such sets of statistics can be constructed for both parameter estimation and model selection problems, and we apply our approach to a range of illustrative and real-world model selection problems.
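The idea of starting from many candidate statistics and keeping only those that add information can be illustrated with a minimal sketch. It is not the procedure developed in the article: the toy Gaussian model, the rejection sampler, the histogram-based Kullback-Leibler estimate, the threshold, and every name in the code (`abc_posterior`, `kl_histogram`, `select_statistics`) are assumptions made purely for illustration. A statistic is added greedily only if it changes the ABC posterior by more than a chosen threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (an assumption): x_i ~ Normal(theta, 1), theta ~ Uniform(-5, 5).
N_OBS, N_SIM, ACCEPT_FRAC = 50, 20000, 0.01
PRIOR_LO, PRIOR_HI = -5.0, 5.0

# Candidate summary statistics; the variance says nothing about theta here.
STATS = {
    "mean": lambda x: x.mean(axis=-1),
    "median": lambda x: np.median(x, axis=-1),
    "variance": lambda x: x.var(axis=-1),
    "max": lambda x: x.max(axis=-1),
}
NAMES = list(STATS)


def summarize(x):
    """Stack all candidate statistics into an array of shape (..., n_stats)."""
    return np.stack([STATS[n](x) for n in NAMES], axis=-1)


# Simulate once and reuse the same simulations for every candidate subset.
observed = rng.normal(1.5, 1.0, size=N_OBS)
thetas = rng.uniform(PRIOR_LO, PRIOR_HI, size=N_SIM)
sims = rng.normal(thetas[:, None], 1.0, size=(N_SIM, N_OBS))
sim_stats = summarize(sims)
obs_stats = summarize(observed)
scale = sim_stats.std(axis=0)  # standardize so statistics are comparable


def abc_posterior(subset):
    """Rejection ABC: keep thetas whose chosen statistics lie closest to the data."""
    if not subset:
        return thetas  # no statistics yet, so the "posterior" is the prior sample
    cols = [NAMES.index(n) for n in subset]
    diff = (sim_stats[:, cols] - obs_stats[cols]) / scale[cols]
    dist = np.sqrt((diff ** 2).sum(axis=1))
    keep = np.argsort(dist)[: int(ACCEPT_FRAC * N_SIM)]
    return thetas[keep]


def kl_histogram(p_samples, q_samples, bins=30):
    """Crude KL(p || q) estimate from smoothed histograms on a shared grid."""
    edges = np.linspace(PRIOR_LO, PRIOR_HI, bins + 1)
    p = np.histogram(p_samples, bins=edges)[0] + 0.5
    q = np.histogram(q_samples, bins=edges)[0] + 0.5
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def select_statistics(threshold=0.1):
    """Greedily add the statistic that changes the ABC posterior the most."""
    selected, current = [], abc_posterior([])
    while len(selected) < len(NAMES):
        gains = {
            n: kl_histogram(abc_posterior(selected + [n]), current)
            for n in NAMES
            if n not in selected
        }
        best = max(gains, key=gains.get)
        if gains[best] < threshold:
            break  # no remaining statistic adds appreciable information
        selected.append(best)
        current = abc_posterior(selected)
    return selected


selected = select_statistics()
print("selected statistics:", selected)
```

In this sketch the stopping rule depends on the threshold and on the noise of the histogram estimate, so the selected set should be read as indicative only. With a fixed acceptance fraction, adding an uninformative statistic can still shift the posterior, which is one reason a more careful information measure is preferable in practice.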
Additional information
All authors contributed equally to this work.
About this article
Cite this article
Barnes, C.P., Filippi, S., Stumpf, M.P.H. et al. Considerate approaches to constructing summary statistics for ABC model selection. Stat Comput 22, 1181–1197 (2012). https://doi.org/10.1007/s11222-012-9335-7
DOI: https://doi.org/10.1007/s11222-012-9335-7