Abstract
High-throughput biotechnologies (e.g. DNA microarrays) generate data characterized by high dimensionality and low cardinality (many features, few samples).
Bio-molecular diagnosis of malignancies based on these biotechnologies is a difficult learning task,
because the number of features far exceeds the number of available samples.
Many supervised machine learning techniques, among them support vector machines (SVMs), have been applied to this task, often together with feature selection methods to reduce the dimensionality of the data.
In this paper we investigate an alternative approach based on random subspace ensemble methods.
The high dimensionality of the data is reduced by randomly sampling subsets of features (gene expression levels),
and accuracy is improved by aggregating the resulting base classifiers.
Given the high computational cost of the proposed technique, we ran all our computational experiments on the C.I.L.E.A. Avogadro high-performance cluster of dual-processor Xeon workstations.
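
The random subspace idea summarized above (draw random subsets of features, train one base classifier per subset, aggregate the predictions) can be illustrated with a minimal sketch. This is not the implementation used in the paper: the function names, the default parameters, the majority-vote aggregation, and the nearest-centroid base learner are assumptions chosen only to keep the example self-contained, and the data are synthetic.

```python
import random
from collections import Counter


def train_centroid(X, y, features):
    """Fit a nearest-centroid base classifier restricted to `features`.

    Assumption: a nearest-centroid learner stands in for whatever base
    classifier is actually used; it keeps the sketch dependency-free.
    """
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
    return features, centroids


def predict_centroid(model, row):
    """Return the class whose centroid is closest in the model's subspace."""
    features, centroids = model

    def sq_dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(features, centroids[c]))

    # sorted() makes tie-breaking deterministic
    return min(sorted(centroids), key=sq_dist)


def random_subspace_ensemble(X, y, n_models=25, subspace_size=10, seed=0):
    """Train one base classifier per randomly sampled feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    return [
        train_centroid(X, y, rng.sample(range(n_features), subspace_size))
        for _ in range(n_models)
    ]


def predict_ensemble(models, row):
    """Aggregate the base classifiers by majority vote (an assumption)."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Synthetic stand-in for expression data: 20 samples, 200 features,
# with class 1 shifted upward on every feature.
data_rng = random.Random(1)
y_demo = [i % 2 for i in range(20)]
X_demo = [[data_rng.gauss(3.0 * label, 1.0) for _ in range(200)] for label in y_demo]

ensemble = random_subspace_ensemble(X_demo, y_demo, n_models=15, subspace_size=10)
demo_predictions = [predict_ensemble(ensemble, row) for row in X_demo]
```

Each base classifier sees only `subspace_size` of the original features, so the individual learning problems are low-dimensional, while the vote over `n_models` classifiers combines information drawn from many different subsets.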