
Cross-validation prior choice in Bayesian probit regression with many covariates

Statistics and Computing

Abstract

This paper examines prior choice in probit regression through a predictive cross-validation criterion. In particular, we focus on situations where the number of potential covariates is far larger than the number of observations, such as in gene expression data. Cross-validation avoids the tendency of such models to fit the data perfectly. We choose the scale parameter c in the standard variable selection prior as the minimizer of the log predictive score. Naive evaluation of the log predictive score requires substantial computational effort, so we investigate computationally cheaper methods based on importance sampling. We find that K-fold importance densities perform best, combined either with mixing over different values of c or with integrating over c through an auxiliary distribution.
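As a rough sketch of the selection criterion (not the authors' implementation; the fold construction, the grid of c values, and the `log_pred_prob` routine below are hypothetical placeholders), the prior scale c is chosen to minimize a K-fold log predictive score of the form -∑_k log p(y_test,k | y_train,k, c):

```python
import numpy as np

def log_predictive_score(c, folds, log_pred_prob):
    """K-fold log predictive score for a candidate prior scale c.

    folds: list of (train_idx, test_idx) pairs of index arrays.
    log_pred_prob: user-supplied routine returning an estimate of
        log p(y[test_idx] | data[train_idx], c) under the probit
        variable-selection model (e.g. obtained by MCMC or
        importance sampling).
    """
    # Sum the negative held-out log predictive probabilities over folds.
    return -sum(log_pred_prob(train_idx, test_idx, c)
                for train_idx, test_idx in folds)

# Hypothetical grid search: pick the c that minimizes the score.
c_grid = np.logspace(-1.0, 3.0, num=20)   # illustrative candidate scale values
# best_c = min(c_grid, key=lambda c: log_predictive_score(c, folds, log_pred_prob))
```

The importance-sampling schemes studied in the paper are aimed at making the inner predictive estimates cheap to evaluate, rather than naively refitting the model for every fold and every candidate value of c in such a loop.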



Author information

Correspondence to M. F. J. Steel.


About this article

Cite this article

Lamnisos, D., Griffin, J.E. & Steel, M.F.J. Cross-validation prior choice in Bayesian probit regression with many covariates. Stat Comput 22, 359–373 (2012). https://doi.org/10.1007/s11222-011-9228-1

