TY  - CHAP
T1  - The Data-Intensive Survival Guide
T2  - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1  - 2013
A1  - Malcolm Atkinson
ED  - Malcolm Atkinson
ED  - Rob Baxter
ED  - Peter Brezany
ED  - Oscar Corcho
ED  - Michelle Galea
ED  - Parsons, Mark
ED  - Snelling, David
ED  - van Hemert, Jano
KW  - Data-Analysis Experts
KW  - Data-Intensive Architecture
KW  - Data-intensive Computing
KW  - Data-Intensive Engineers
KW  - Datascopes
KW  - Dispel
KW  - Domain Experts
KW  - Intellectual Ramps
KW  - Knowledge Discovery
KW  - Workflows
AB  - Chapter 3: "The data-intensive survival guide", presents an overview of all of the elements of the proposed data-intensive strategy. Sufficient detail is presented for readers to understand the principles and practice that we recommend. It should also provide a good preparation for readers who choose to sample later chapters. It introduces three professional viewpoints: domain experts, data-analysis experts, and data-intensive engineers. Success depends on a balanced approach that develops the capacity of all three groups. A data-intensive architecture provides a flexible framework for that balanced approach. This enables the three groups to build and exploit data-intensive processes that incrementally step from data to results. A language is introduced to describe these incremental data processes from all three points of view. The chapter introduces ‘datascopes’ as the productized data-handling environments and ‘intellectual ramps’ as the ‘on ramps’  for the highways from data to knowledge.
JF  - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB  - John Wiley & Sons Ltd.
ER  - 

TY  - CHAP
T1  - Definition of the DISPEL Language
T2  - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1  - 2013
A1  - Paul Martin
A1  - Yaikhom, Gagarine
ED  - Malcolm Atkinson
ED  - Rob Baxter
ED  - Peter Brezany
ED  - Oscar Corcho
ED  - Michelle Galea
ED  - Parsons, Mark
ED  - Snelling, David
ED  - van Hemert, Jano
KW  - Data Streaming
KW  - Data-intensive Computing
KW  - Dispel
AB  - Chapter 10: "Definition of the DISPEL language", describes the novel aspects of the DISPEL language: its constructs, capabilities, and anticipated programming style.
JF  - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
T3  - {Parallel and Distributed Computing, series editor Albert Y. Zomaya}
PB  - John Wiley & Sons Inc.
ER  - 

TY  - RPRT
T1  - Data-Intensive Research Workshop (15-19 March 2010) Report
Y1  - 2010
A1  - Malcolm Atkinson
A1  - Roure, David De
A1  - van Hemert, Jano
A1  - Shantenu Jha
A1  - Ruth McNally
A1  - Robert Mann
A1  - Stratis Viglas
A1  - Chris Williams
KW  - Data-intensive Computing
KW  - Data-Intensive Machines
KW  - Machine Learning
KW  - Scientific Databases
AB  - We met at the National e-Science Institute in Edinburgh on 15-19 March 2010 to develop our understanding of DIR. Approximately 100 participants (see Appendix A) worked together to develop their own understanding, and we are offering this report as the first step in communicating that to a wider community.  We present this in turns of our developing/emerging understanding of "What is DIR?" and "Why it is important?'". We then review the status of the field, report what the workshop achieved and what remains as open questions.
JF  - National e-Science Centre
PB  - Data-Intensive Research Group, School of Informatics, University of Edinburgh
CY  - Edinburgh
ER  - 

TY  - CONF
T1  - Towards Optimising Distributed Data Streaming Graphs using Parallel Streams
T2  - Data Intensive Distributed Computing (DIDC'10), in conjunction with the 19th International Symposium on High Performance Distributed Computing
Y1  - 2010
A1  - Chee Sun Liew
A1  - Atkinson, Malcolm P.
A1  - van Hemert, Jano
A1  - Liangxiu Han
KW  - Data-intensive Computing
KW  - Distributed Computing
KW  - Optimisation
KW  - Parallel Stream
KW  - Scientific Workflows
AB  - Modern scientific collaborations have opened up the opportunity of solving complex problems that involve multi- disciplinary expertise and large-scale computational experiments. These experiments usually involve large amounts of data that are located in distributed data repositories running various software systems, and managed by different organisations. A common strategy to make the experiments more manageable is executing the processing steps as a workflow. In this paper, we look into the implementation of fine-grained data-flow between computational elements in a scientific workflow as streams. We model the distributed computation as a directed acyclic graph where the nodes represent the processing elements that incrementally implement specific subtasks. The processing elements are connected in a pipelined streaming manner, which allows task executions to overlap. We further optimise the execution by splitting pipelines across processes and by introducing extra parallel streams. We identify performance metrics and design a measurement tool to evaluate each enactment. We conducted ex- periments to evaluate our optimisation strategies with a real world problem in the Life Sciences—EURExpress-II. The paper presents our distributed data-handling model, the optimisation and instrumentation strategies and the evaluation experiments. We demonstrate linear speed up and argue that this use of data-streaming to enable both overlapped pipeline and parallelised enactment is a generally applicable optimisation strategy.
JF  - Data Intensive Distributed Computing (DIDC'10), in conjunction with the 19th International Symposium on High Performance Distributed Computing
PB  - ACM
CY  - Chicago, Illinois
UR  - http://www.cct.lsu.edu/~kosar/didc10/index.php
ER  -