A Generic Parallel Processing Model for Facilitating Data Mining and Integration

Title: A Generic Parallel Processing Model for Facilitating Data Mining and Integration
Publication Type: Journal Article
Year of Publication: 2011
Authors: Han, L; Liew, CS; van Hemert, J; Atkinson, M
Journal Title: Parallel Computing
Volume: 37
Issue: 3
Pages: 157-171
Keywords: Data Mining and Data Integration (DMI); Life Sciences; OGSA-DAI; Parallelism; Pipeline Streaming; workflow
Abstract

To facilitate Data Mining and Integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be held in memory, buffered on disk, or carried as data flows between computers. This makes it possible to build arbitrary DAGs with pipelining and with both data and task parallelism, which provides room for performance enhancement. We have applied this approach to a real DMI case in the Life Sciences and implemented a prototype. To demonstrate the feasibility of the modelled DMI task and assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental results show that, in this case study, speedup increases linearly with the number of distributed computing nodes.
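The streaming model in the abstract can be illustrated with a minimal sketch. It is not the authors' prototype and does not use the OGSA-DAI API. It assumes the simplest possible graph, a linear chain of PEs, with each PE running in its own thread and each stream represented by a bounded in-memory queue. The names `source`, `processing_element`, `run_pipeline`, and `buffer_size` are hypothetical and introduced only for this illustration.

```python
import queue
import threading

END = object()  # sentinel marking the end of a stream (an assumption of this sketch)


def source(items, outbox):
    """Source PE: push every input item onto its output stream, then close it."""
    for item in items:
        outbox.put(item)
    outbox.put(END)


def processing_element(func, inbox, outbox):
    """Generic PE: apply func to each item as it arrives and forward the result."""
    while True:
        item = inbox.get()
        if item is END:
            outbox.put(END)  # propagate end-of-stream downstream
            return
        outbox.put(func(item))


def run_pipeline(items, stages, buffer_size=4):
    """Chain the PEs with bounded in-memory streams and collect the final output."""
    # One stream between each pair of neighbouring PEs, plus the output stream.
    streams = [queue.Queue(maxsize=buffer_size) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=source, args=(items, streams[0]))]
    for i, func in enumerate(stages):
        threads.append(threading.Thread(
            target=processing_element,
            args=(func, streams[i], streams[i + 1])))
    for t in threads:
        t.start()

    # Drain the last stream in the calling thread until end-of-stream arrives.
    results = []
    while True:
        item = streams[-1].get()
        if item is END:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Hypothetical three-stage task: parse, transform, format.
    print(run_pipeline(["1", "2", "3"], [int, lambda x: x * x, str]))
```

Because every PE starts work as soon as its first item arrives, the stages overlap in time, which is the pipelining the abstract refers to. The sketch covers only that aspect: data parallelism would require replicating a stage across several workers, task parallelism would require branching the graph, and Python threads illustrate the structure rather than any real speedup.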

DOI: 10.1016/j.parco.2011.02.006