Principal goal: to take an existing algorithm and make it parallel in a cloud computing environment, following Google's Map and Reduce approach.
Research at Google, combined with the company's vast computational resources, has led to interesting ways of making algorithms parallel so that they run faster on problems with large amounts of input data [1]. Data mining is an area where the same principle can apply, provided the algorithms can be run in parallel in a similar fashion. It is important to note that not only the algorithm itself but also the processes in which it is embedded are distributed. For example, the data may need to be integrated, cleaned and transformed before being supplied to the data mining algorithm.
In this project, you will take an algorithm from a specific project that aims to automatically classify anatomical components that exhibit gene expression patterns. These patterns are extracted from images of stained embryo sections. Your task is to make the data mining algorithm parallel using the map and reduce principle, and then to execute your implementation on a cloud computing infrastructure such as Eucalyptus [2].
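To give a feel for the map and reduce principle, here is a minimal single-machine sketch in Python. It is not the project's algorithm. The sample data, the threshold-based classify function and all names are hypothetical stand-ins for features extracted from the embryo section images and for the real classifier. The map step labels each sample independently, and the reduce step merges partial counts.

```python
from collections import Counter
from functools import reduce

# Hypothetical input: (component_name, pixel_intensities) pairs standing in
# for features extracted from images of stained embryo sections.
SAMPLES = [
    ("heart", [0.9, 0.8, 0.7]),
    ("liver", [0.1, 0.2, 0.1]),
    ("heart", [0.2, 0.1, 0.3]),
    ("brain", [0.6, 0.7, 0.9]),
]

def classify(intensities, threshold=0.5):
    """Placeholder classifier (an assumption, not the project's method):
    report expression when the mean intensity exceeds the threshold."""
    return sum(intensities) / len(intensities) > threshold

def map_sample(sample):
    """Map step: label one sample and emit a one-entry partial count."""
    component, intensities = sample
    label = "expressed" if classify(intensities) else "not_expressed"
    return Counter({(component, label): 1})

def reduce_counts(left, right):
    """Reduce step: merge two partial counts. Addition is associative and
    commutative, so partial results can be combined in any order."""
    return left + right

def run(samples):
    """Apply the map step to every sample, then fold the partial counts."""
    return reduce(reduce_counts, map(map_sample, samples), Counter())

result = run(SAMPLES)
for (component, label), count in sorted(result.items()):
    print(component, label, count)
```

Because each call to map_sample depends only on its own sample, the built-in map could in principle be swapped for a distributed map running on many machines, with reduce_counts combining the partial results. The data integration, cleaning and transformation steps mentioned above would need the same treatment.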