A statistical approach to decision tree modeling

Author:
Michael I. Jordan

Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA


COLT '94: Proceedings of the seventh annual conference on Computational learning theory, July 1994, Pages 13–20
https://doi.org/10.1145/180139.175372

Published: 16 July 1994

ABSTRACT

A statistical approach to decision tree modeling is described. In this approach, each decision in the tree is modeled parametrically, as is the process by which an output is generated from an input and a sequence of decisions. The resulting model yields a likelihood measure of goodness of fit, allowing maximum likelihood (ML) and maximum a posteriori (MAP) estimation techniques to be used. An efficient algorithm is presented for estimating the parameters of the tree. The model selection problem is presented and several alternative proposals are considered. A hidden Markov version of the tree is described for data sequences that have temporal dependencies.
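
As a rough illustration of what "modeling each decision parametrically" can look like, the sketch below builds a one-level probabilistic tree and evaluates its log-likelihood. Everything specific here is an assumption made for illustration and is not taken from the paper: the logistic form of the decision, the linear-Gaussian leaves, and the names `branch_probs`, `leaf_density`, `log_likelihood`, and `posterior_left` are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gaussian_pdf(y: float, mean: float, sigma: float) -> float:
    """Density of N(mean, sigma^2) evaluated at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def branch_probs(x, gate):
    """Assumed decision model: P(left | x) = sigmoid(v . x + b)."""
    v, b = gate
    p_left = sigmoid(sum(vi * xi for vi, xi in zip(v, x)) + b)
    return p_left, 1.0 - p_left


def leaf_density(x, y, leaf):
    """Assumed output model at a leaf: y ~ N(w . x + c, sigma^2)."""
    w, c, sigma = leaf
    mean = sum(wi * xi for wi, xi in zip(w, x)) + c
    return gaussian_pdf(y, mean, sigma)


def log_likelihood(data, gate, left, right):
    """Sum over the data of log p(y | x), marginalizing over the decision."""
    total = 0.0
    for x, y in data:
        p_left, p_right = branch_probs(x, gate)
        mix = p_left * leaf_density(x, y, left) + p_right * leaf_density(x, y, right)
        total += math.log(mix)
    return total


def posterior_left(x, y, gate, left, right):
    """Posterior probability that the left branch generated (x, y)."""
    p_left, p_right = branch_probs(x, gate)
    a = p_left * leaf_density(x, y, left)
    b = p_right * leaf_density(x, y, right)
    return a / (a + b)


# Toy data: the slope is +1 for positive x and -1 for negative x.
data = [([1.0], 1.0), ([2.0], 2.0), ([-1.0], 1.0), ([-2.0], 2.0)]
gate = ([4.0], 0.0)                # sends positive x to the left leaf
left = ([1.0], 0.0, 0.5)           # (weights, bias, sigma)
right = ([-1.0], 0.0, 0.5)
print(log_likelihood(data, gate, left, right))
```

Because the whole tree defines a density over outputs, candidate parameter settings can be compared by their log-likelihood, and adding a log-prior term to `log_likelihood` would give a MAP objective in the same way.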


Index Terms

A statistical approach to decision tree modeling
1. Computing methodologies
  1. Machine learning
2. Mathematics of computing
  1. Probability and statistics


Published in
COLT '94: Proceedings of the seventh annual conference on Computational learning theory
July 1994
376 pages
ISBN: 0897916557
DOI: 10.1145/180139
Chairman: Manfred Warmuth, Univ. of California, Santa Cruz
Copyright © 1994 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 35 of 71 submissions, 49%

Article Metrics
- Total Citations: 21
- Total Downloads: 416
- Downloads (Last 12 months): 216
- Downloads (Last 6 weeks): 13