Combining Classification Algorithms
João Gama
LIACC
Universidade do Porto
Rua do Campo Alegre, 823 4150-180 Porto, Portugal
March 2000
Abstract
The ability of a classification algorithm to induce a good generalization
depends on how appropriate the representation language it uses to express
generalizations of the examples is for the given task. Since different
learning algorithms employ different knowledge representations and search
heuristics, they explore different search spaces and obtain diverse results.
The problem of finding the appropriate model for a given task
is an active research area. In this dissertation, instead of looking for
methods that fit the data using a single representation language, we present
a family of algorithms, under the generic name of Cascade Generalization,
whose search space contains models that use different representation languages.
The basic idea of the method is to use the learning algorithms in sequence.
At each iteration a two-step process occurs. In the first step, a model
is built using a base classifier. In the second step, the instance space
is extended by the insertion of new attributes generated by the base model.
The constructive step generates terms in the representational language
of the base classifier. If the high-level classifier chooses one of these
terms, its representational power is extended. The bias restrictions
of the high-level classifier are relaxed by incorporating terms from the representational
language of the base classifiers. This is the basic idea behind Ltree
and the Cascade Generalization architecture.
The method is presented
in two parts, following somewhat different perspectives. In the first part,
it is presented as a method for building multivariate trees. Here we present
the system Ltree, a multivariate decision tree that uses a linear discriminant
function as its constructive operator. Ltree was the precursor of the Cascade
architecture. In the second part, we present a general framework for combining
classifiers. The method Cascade Generalization is an extension of
the method presented in the first part. The base classifiers are not restricted
to discriminant functions, but are generalized to other types of classifiers.
We define the conditions that the base classifier must satisfy so that
it can be used in this framework, and define criteria for selecting
suitable low- and high-level classifiers. We present two different
variants of Cascade Generalization. The first employs loosely coupled
classifiers, whilst the second uses tight coupling. In the first schema,
the base classifier(s) pre-process the data for another stage. This framework can
be used to combine most existing classifiers without changes, or
with rather small changes. The method only requires that the original data
be extended by the insertion of the class probability distribution
generated by the base classifier. In the second schema, we use
constructive induction locally; that is, two or more classifiers are coupled
locally.
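The loosely coupled schema can be sketched in a few lines. The sketch below uses scikit-learn; the choice of Gaussian naive Bayes as the base classifier, a decision tree as the high-level classifier, and the iris data are illustrative assumptions, not the dissertation's experimental setup:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def cascade_extend(base_model, X):
    # Constructive step: append the base model's class probability
    # distribution to each instance as new attributes.
    return np.hstack([X, base_model.predict_proba(X)])

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: build a model with the base classifier.
base = GaussianNB().fit(X_tr, y_tr)

# Step 2: train the high-level classifier on the extended instance space.
high = DecisionTreeClassifier(random_state=0)
high.fit(cascade_extend(base, X_tr), y_tr)

# At prediction time the same extension is applied to new instances.
acc = high.score(cascade_extend(base, X_te), y_te)
```

If the tree splits on one of the appended probability attributes, it has effectively incorporated a term from the base classifier's representation language.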
Although in this dissertation we have used
Local Cascade
Generalization only in conjunction with decision trees, the method could
easily be extended to other divide-and-conquer systems, such as decision
lists.