TY  - JOUR
T1  - A Generic Parallel Processing Model for Facilitating Data Mining and Integration
JF  - Parallel Computing
Y1  - 2011
A1  - Han, Liangxiu
A1  - Liew, Chee Sun
A1  - van Hemert, Jano
A1  - Atkinson, Malcolm
KW  - Data Mining and Data Integration (DMI)
KW  - Life Sciences
KW  - OGSA-DAI
KW  - Parallelism
KW  - Pipeline Streaming
KW  - Workflow
AB  - To facilitate Data Mining and Integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be held in memory, buffered on disk, or carried as inter-computer data flows. This makes it possible to build arbitrary DAGs with pipelining and both data and task parallelism, which provides room for performance enhancement. We have applied this approach to a real DMI case in the Life Sciences and implemented a prototype. To demonstrate the feasibility of the modelled DMI task and assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental results show that, in this case study, linear speedup is achieved as the number of distributed computing nodes increases.
PB  - Elsevier
VL  - 37
IS  - 3
ER  -