TY  - BOOK
T1  - The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business
T2  - Wiley Series on Parallel and Distributed Computing (Editor: Albert Y. Zomaya)
Y1  - 2013
A1  - Atkinson, Malcolm P.
A1  - Baxter, Robert M.
A1  - Peter Brezany
A1  - Oscar Corcho
A1  - Michelle Galea
A1  - Parsons, Mark
A1  - Snelling, David
A1  - van Hemert, Jano
KW  - Big Data
KW  - Data Intensive
KW  - data mining
KW  - Data Streaming
KW  - Databases
KW  - Dispel
KW  - Distributed Computing
KW  - Knowledge Discovery
KW  - Workflows
AB  - With the digital revolution opening up tremendous opportunities in many fields, there is a growing need for skilled professionals who can develop data-intensive systems and extract information and knowledge from them. This book frames for the first time a new systematic approach for tackling the challenges of data-intensive computing, providing decision makers and technical experts alike with practical tools for dealing with our exploding data collections.  Emphasising data-intensive thinking and interdisciplinary collaboration,  The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business examines the essential components of knowledge discovery, surveys many of the current research efforts worldwide, and points to new areas for innovation. Complete with a wealth of examples and DISPEL-based methods demonstrating how to gain more from data in real-world systems, the book:  * Outlines the concepts and rationale for implementing data-intensive computing in organisations  * Covers from the ground up problem-solving strategies for data analysis in a data-rich world  * Introduces techniques for data-intensive engineering using the Data-Intensive Systems Process Engineering Language DISPEL  * Features in-depth case studies in customer relations, environmental hazards, seismology, and more  * Showcases successful applications in areas ranging from astronomy and the humanities to transport engineering  * Includes sample program snippets throughout the text as well as additional materials on a companion website  The DATA Bonanza is a must-have guide for information strategists, data analysts, and engineers in business, research, and government, and for anyone wishing to be on the cutting edge of data mining, machine learning, databases, distributed systems, or large-scale computing.
JF  - Wiley Series on Parallel and Distributed Computing (Editor: Albert Y. Zomaya)
PB  - John Wiley & Sons Inc.
SN  - 978-1-118-39864-7
ER  - 

TY  - CHAP
T1  - Data-Intensive Thinking with DISPEL
T2  - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1  - 2013
A1  - Malcolm Atkinson
ED  - Malcolm Atkinson
ED  - Rob Baxter
ED  - Peter Brezany
ED  - Oscar Corcho
ED  - Michelle Galea
ED  - Parsons, Mark
ED  - Snelling, David
ED  - van Hemert, Jano
KW  - Data-Intensive Machines
KW  - Data-Intensive Thinking, Data-intensive Computing
KW  - Dispel
KW  - Distributed Computing
KW  - Knowledge Discovery
AB  - Chapter 4: "Data-intensive thinking with DISPEL", engages the reader with technical issues and solutions, by working through a sequence of examples, building up from a sketch of a solution to a large-scale data challenge. It uses the DISPEL language extensively, introducing its concepts and constructs. It shows how DISPEL may help designers, data-analysts, and engineers develop  solutions to the requirements emerging in any data-intensive application domain. The reader is taken through simple steps initially, this then builds to conceptually complex steps that are necessary to cope with the realities of real data providers, real data, real distributed systems, and long-running processes.
JF  - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB  - John Wiley & Sons Inc.
ER  - 

TY  - CONF
T1  - Towards Optimising Distributed Data Streaming Graphs using Parallel Streams
T2  - Data Intensive Distributed Computing (DIDC'10), in conjunction with the 19th International Symposium on High Performance Distributed Computing
Y1  - 2010
A1  - Chee Sun Liew
A1  - Atkinson, Malcolm P.
A1  - van Hemert, Jano
A1  - Liangxiu Han
KW  - Data-intensive Computing
KW  - Distributed Computing
KW  - Optimisation
KW  - Parallel Stream
KW  - Scientific Workflows
AB  - Modern scientific collaborations have opened up the opportunity of solving complex problems that involve multi- disciplinary expertise and large-scale computational experiments. These experiments usually involve large amounts of data that are located in distributed data repositories running various software systems, and managed by different organisations. A common strategy to make the experiments more manageable is executing the processing steps as a workflow. In this paper, we look into the implementation of fine-grained data-flow between computational elements in a scientific workflow as streams. We model the distributed computation as a directed acyclic graph where the nodes represent the processing elements that incrementally implement specific subtasks. The processing elements are connected in a pipelined streaming manner, which allows task executions to overlap. We further optimise the execution by splitting pipelines across processes and by introducing extra parallel streams. We identify performance metrics and design a measurement tool to evaluate each enactment. We conducted ex- periments to evaluate our optimisation strategies with a real world problem in the Life Sciences—EURExpress-II. The paper presents our distributed data-handling model, the optimisation and instrumentation strategies and the evaluation experiments. We demonstrate linear speed up and argue that this use of data-streaming to enable both overlapped pipeline and parallelised enactment is a generally applicable optimisation strategy.
JF  - Data Intensive Distributed Computing (DIDC'10), in conjunction with the 19th International Symposium on High Performance Distributed Computing
PB  - ACM
CY  - Chicago, Illinois
UR  - http://www.cct.lsu.edu/~kosar/didc10/index.php
ER  -