Data Mining and Evolutionary Algorithms

In 2010 Matthew Smith completed a PhD in Evolutionary Algorithms. He has written and presented a number of papers on the use of genetic programming to improve performance in data mining and classification applications.

For a complete list of published papers, please see:

A copy of the PhD thesis can be downloaded here; the abstract is reproduced below:

Abstract

This thesis examines the use of Genetic Programming and a Genetic Algorithm in a wrapper approach to pre-process data before it is classified by a (relatively simple) classifier such as C4.5. Genetic Programming is combined with a Genetic Algorithm to construct and select new features from those available in the data, a potentially significant process for data mining since it gives consideration to hidden relationships between features. Pre-processing data in this fashion for a simple classifier makes explicit the discovery of hidden relationships that is made implicitly by a more complex classifier such as an Artificial Neural Network or a Support Vector Machine. The algorithm is applied initially to C4.5, then to IBk and Naive Bayes, and allowed to select the most appropriate classifier type. Techniques are employed to improve the human readability of these new features and extract more information about the domain. Finally, ensembles of heterogeneous classifiers are constructed, both from the final population of a single run and by assembling the fittest individuals from many runs.
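
As a rough illustration of the wrapper idea described in the abstract, the following Python sketch evolves a single constructed feature and scores each candidate by the accuracy of a very simple classifier trained on it. It is not the algorithm from the thesis: the single-threshold "stump" stands in for C4.5, mutation with truncation selection stands in for the full Genetic Programming and Genetic Algorithm combination, fitness is training accuracy rather than a held-out estimate, and every name here (random_tree, stump_accuracy, evolve_feature, make_data) as well as the synthetic data set is an assumption made for the example.

```python
import random

# A constructed feature is an expression tree: an int is the index of an
# original feature, a tuple (op, left, right) combines two subtrees.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def random_tree(n_features, rng, depth=2):
    """Build a random expression tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    op = rng.choice(list(OPS))
    return (op, random_tree(n_features, rng, depth - 1),
            random_tree(n_features, rng, depth - 1))

def evaluate(tree, row):
    """Compute the value of a constructed feature for one data row."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))

def stump_accuracy(values, labels):
    """Training accuracy of the best single-threshold rule on one feature
    (a deliberately simple stand-in for a classifier such as C4.5)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    best = max(total_pos, n - total_pos)  # no split: predict the majority class
    pos_left = 0
    for i in range(1, n):
        pos_left += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # a threshold cannot separate equal values
        neg_left = i - pos_left
        pos_right = total_pos - pos_left
        neg_right = (n - i) - pos_right
        best = max(best, neg_left + pos_right, pos_left + neg_right)
    return best / n

def mutate(tree, n_features, rng):
    """Replace a randomly chosen subtree with a fresh random one."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_features, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_features, rng), right)
    return (op, left, mutate(right, n_features, rng))

def evolve_feature(rows, labels, generations=30, pop_size=30, seed=0):
    """Wrapper loop: the fitness of a feature is the classifier's accuracy on it."""
    rng = random.Random(seed)
    n_features = len(rows[0])

    def fitness(tree):
        return stump_accuracy([evaluate(tree, r) for r in rows], labels)

    # Seed the population with the original features, then random trees.
    population = list(range(n_features)) + [
        random_tree(n_features, rng) for _ in range(pop_size - n_features)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # the fittest half is kept
        children = [mutate(rng.choice(survivors), n_features, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

def make_data(n=200, seed=1):
    """Synthetic data: the class depends on a hidden product of two features."""
    rng = random.Random(seed)
    rows = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    labels = [1 if x * y > 0 else 0 for x, y in rows]
    return rows, labels

if __name__ == "__main__":
    rows, labels = make_data()
    best = evolve_feature(rows, labels)
    print("evolved feature:", best)
    print("accuracy:", stump_accuracy([evaluate(best, r) for r in rows], labels))
```

In this toy data set neither original feature separates the classes with a single threshold, whereas a constructed feature such as the product of the two does. This is the sense in which pre-processing can make a hidden relationship explicit for a simple classifier. The sketch constructs only one feature and omits feature selection, the choice between classifier types, and ensemble construction.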