Optimising Data-Streaming Elements in Distributed Data Mining

15 January 2010 - 2:59pm — Jano.van.Hemert

Principal goal: To evaluate existing data-streaming implementations, formulate a model that predicts streaming performance as a function of buffering strategy, and then optimise data streaming with a dynamic buffering implementation.

The ADMIRE system [1] uses data streaming to connect software processing elements (PEs), building directed acyclic graphs (DAGs) that efficiently distribute data mining and integration processes across computers. Each data stream carries the output of one PE to an input of another PE. A PE may have several inputs, so buffering may be needed within a data stream to handle different production and consumption rates.
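As a minimal sketch of the buffering described above (not the ADMIRE implementation), the following Java example connects a producer PE to a consumer PE through a bounded in-memory buffer. The class name, the method name and the end-of-stream marker are hypothetical, chosen for illustration only; the buffer itself is the standard java.util.concurrent.ArrayBlockingQueue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedStreamSketch {
    // Hypothetical end-of-stream marker (an assumption, not part of ADMIRE).
    private static final Integer END = Integer.MIN_VALUE;

    /** Streams the values 1..items through a bounded buffer and returns their sum. */
    static long streamAndSum(int capacity, int items) {
        BlockingQueue<Integer> stream = new ArrayBlockingQueue<>(capacity);

        // Producer PE: emits values, blocking whenever the buffer is full.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) {
                    stream.put(i);
                }
                stream.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // Consumer PE: takes values, blocking whenever the buffer is empty.
        long sum = 0;
        try {
            while (true) {
                Integer value = stream.take();
                if (value.equals(END)) {
                    break;
                }
                sum += value;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("consumer interrupted", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(streamAndSum(4, 1000));
    }
}
```

The capacity argument is the only tuning knob here: a small value makes the faster PE block more often, while a large value uses more memory to absorb the rate mismatch.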

The existing implementations interconnect PEs with a bounded main-memory buffer or with buffers associated with communication between sockets. Jacobs [2] has observed a factor of more than 10^5 in the speed of memory access depending on whether it is serial or random. The workloads require arbitrarily sized buffers because of mismatched processing speeds along different branches of a DAG. The challenge is to show how a set of data-streaming implementations can be designed so that they have efficient access patterns for the various scales of buffering required.

A successful project would study the requirements and the existing implementations (both those in use locally and those in the literature) and formulate a model that predicts performance versus buffer capacity. It would then test these predictions by measuring performance with simulated workloads. Measurement tools and simulated loads will be provided in Java.
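To illustrate what measuring performance against buffer capacity with a simulated workload might look like, here is a small deterministic simulation in Java. It is a toy under stated assumptions, not the project's model or the provided measurement tools: the producer computes for a fixed number of ticks and then emits a burst, blocking until the whole burst fits in the buffer, and the consumer removes one item per tick. All names and parameters are hypothetical.

```java
class BufferCapacitySimulation {
    /**
     * Returns the number of ticks needed for the consumer to receive every item.
     * Assumed workload: the producer computes for `gap` ticks (gap >= 1), then
     * emits `burstSize` items, and cannot start the next computation until the
     * whole burst is in the buffer. The consumer takes at most one item per tick.
     */
    static int ticksToDrain(int capacity, int bursts, int burstSize, int gap) {
        int total = bursts * burstSize;
        int buffered = 0;          // items currently in the buffer
        int remainingInBurst = 0;  // items produced but not yet buffered
        int burstsLeft = bursts;
        int computeLeft = gap;     // ticks until the next burst is ready
        int consumed = 0;
        int tick = 0;

        while (consumed < total) {
            tick++;
            // Producer computes only when it is not blocked on a pending burst.
            if (remainingInBurst == 0 && burstsLeft > 0) {
                computeLeft--;
                if (computeLeft == 0) {
                    remainingInBurst = burstSize;
                    burstsLeft--;
                    computeLeft = gap;
                }
            }
            // Producer moves as much of the pending burst as the buffer allows.
            int moved = Math.min(capacity - buffered, remainingInBurst);
            buffered += moved;
            remainingInBurst -= moved;
            // Consumer removes one item per tick if any is available.
            if (buffered > 0) {
                buffered--;
                consumed++;
            }
        }
        return tick;
    }

    public static void main(String[] args) {
        for (int capacity : new int[] {1, 2, 3, 4, 8}) {
            System.out.println("capacity " + capacity + ": "
                    + ticksToDrain(capacity, 10, 4, 4) + " ticks");
        }
    }
}
```

In this toy workload the completion time falls as capacity grows towards the burst size and then stops improving, which is the kind of performance-versus-capacity curve a predictive model would be checked against.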

Project status:

Finished

Degree level:

MSc

Background:

Java programming essential. Distributed/parallel computing desirable.

Supervisors @ NeSC:

Chee.Sun.Liew

Malcolm.Atkinson

Jano.van.Hemert

Subject areas:

Computer Architecture

Distributed Systems

Parallel Programming

Performance Modelling and Simulation

System Level Integration

Projects:

ADMIRE

Student project type:

MSc student project

References:

[1] M. Atkinson, P. Brezany, O. Corcho, L. Han, J. van Hemert, L. Hluchý, A. Hume, I. Janciak, A. Krause, and D. Snelling. ADMIRE White Paper: Motivation, Strategy, Overview and Impact. Technical Report version 0.9, ADMIRE, EPCC, University of Edinburgh, January 2009.

[2] A. Jacobs. The pathologies of big data. Commun. ACM, 52(8):36–44, 2009.
