ABSTRACT
A statistical approach to decision tree modeling is described. In this approach, each decision in the tree is modeled parametrically, as is the process by which an output is generated from an input and a sequence of decisions. The resulting model yields a likelihood measure of goodness of fit, allowing maximum likelihood (ML) and maximum a posteriori (MAP) estimation techniques to be used. An efficient algorithm is presented for estimating the parameters of the tree. The model selection problem is presented and several alternative proposals are considered. A hidden Markov version of the tree is described for data sequences that have temporal dependencies.
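As an illustration only, the likelihood idea can be sketched for the smallest possible tree: one parametric decision at the root and two leaves. The abstract does not state the parametric forms, so the logistic decision model, the linear-Gaussian leaves, and every name below (`branch_probability`, `likelihood`, `params`, and so on) are assumptions made for this sketch, not the paper's definitions.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gaussian_pdf(y, mean, sigma):
    """Density of a univariate Gaussian with the given mean and standard deviation."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / norm


def branch_probability(x, v):
    """P(left branch | x) under an assumed logistic model of the root decision."""
    return sigmoid(v[0] + v[1] * x)


def likelihood(x, y, params):
    """P(y | x): sum over both decisions of P(decision | x) * P(y | x, decision)."""
    p_left = branch_probability(x, params["decision"])
    total = 0.0
    for p_branch, leaf in ((p_left, params["left"]), (1.0 - p_left, params["right"])):
        mean = leaf["bias"] + leaf["slope"] * x  # assumed linear leaf model
        total += p_branch * gaussian_pdf(y, mean, leaf["sigma"])
    return total


def log_likelihood(data, params):
    """Goodness of fit of the whole tree on (x, y) pairs."""
    return sum(math.log(likelihood(x, y, params)) for x, y in data)


# Hypothetical parameters: go left for negative x, right for positive x.
good = {
    "decision": (0.0, -4.0),
    "left": {"bias": 0.0, "slope": 1.0, "sigma": 0.5},
    "right": {"bias": 2.0, "slope": -1.0, "sigma": 0.5},
}
# Same leaves, but the decision is reversed, so each point reaches the wrong leaf.
bad = dict(good, decision=(0.0, 4.0))

# Made-up data that roughly follows y = x for x < 0 and y = 2 - x for x > 0.
data = [(-2.0, -2.1), (-1.0, -0.9), (1.0, 1.1), (2.0, 0.1)]

print("log-likelihood, matching decision:", log_likelihood(data, good))
print("log-likelihood, reversed decision:", log_likelihood(data, bad))
```

Because the fit is measured by a likelihood, candidate parameter settings can be compared directly: the reversed decision should score lower on this data. ML estimation would search for the parameters that maximize `log_likelihood`; MAP estimation would add a log-prior term to the same objective.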