The EDIM1 data-intensive architecture is intended to accelerate data-intensive processing.
One good way to organise this processing is with a map-reduce model.
We can consider two candidate implementations: Hadoop and the Sector+Sphere combination from Grossman et al.
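
As a minimal sketch of the map-reduce model referred to above, the following single-process Python example counts words. It illustrates only the programming model (map, group by key, reduce) and uses the API of neither candidate implementation. The names `map_words`, `reduce_counts` and `run_map_reduce` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values emitted for one key into a total."""
    return key, sum(values)


def run_map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, then group values by key (the 'shuffle'), then reduce.

    A real framework distributes these stages across nodes and disks;
    here everything runs in one process (an assumption of this sketch).
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


word_counts = run_map_reduce(["a b a", "b c"], map_words, reduce_counts)
```

In a distributed setting the grouping stage is where intermediate data is written to and read from local storage, which is why the placement of data on each node's drives matters for performance.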
The challenge is how best to use the EDIM1 node design (three hard drives and one SSD per dual-core Atom node) for these map-reduce (MR) implementations.
A first step would be a systematic evaluation using benchmarks, to develop an understanding of an appropriate strategy.
Initial results with a small number of examples are available from previous MSc projects.
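
One possible shape for such a benchmark evaluation is sketched below: it times a simple write-then-read workload in each of several directories and reports the median per placement. This is an illustrative sketch only, not the method used in the previous MSc projects. The function names and the `hdd`/`ssd` labels are hypothetical, and temporary directories stand in for real mount points. Because the read is likely to be served from the operating system's page cache, a real evaluation would need larger files or cache control.

```python
import os
import statistics
import tempfile
import time
from typing import Dict


def time_write_read(directory: str, size_bytes: int = 1 << 20, repeats: int = 3) -> float:
    """Return the median time in seconds to write, sync and re-read one file."""
    payload = os.urandom(size_bytes)
    timings = []
    for _ in range(repeats):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            start = time.perf_counter()
            with os.fdopen(fd, "wb") as handle:
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # force the data out to the device
            with open(path, "rb") as handle:
                data = handle.read()
            timings.append(time.perf_counter() - start)
            if data != payload:
                raise IOError("read-back data does not match what was written")
        finally:
            os.remove(path)
    return statistics.median(timings)


def compare_placements(placements: Dict[str, str], **kwargs) -> Dict[str, float]:
    """Time the same workload in each labelled directory."""
    return {label: time_write_read(path, **kwargs) for label, path in placements.items()}


# Stand-in directories; on a real node these would be the drive mount points.
with tempfile.TemporaryDirectory() as hdd_dir, tempfile.TemporaryDirectory() as ssd_dir:
    results = compare_placements(
        {"hdd": hdd_dir, "ssd": ssd_dir}, size_bytes=64 * 1024, repeats=3
    )
```

Reporting the median over several repeats reduces the influence of occasional slow runs; a fuller study would also vary file size and access pattern.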
A second phase would compare these results with those obtainable on other clusters, whether supporting the OSDC or otherwise available.
A third phase might investigate multisite MR optimisation within the OSDC.
This work is particularly suitable for OSDC PIRE visits, but may also offer opportunities for MSc projects that look at a particular MR and file system implementation and a particular class of applications or benchmarks.
Contact Malcolm Atkinson or Paolo Besana.