TY - CONF
T1 - Ad hoc Cloud Computing
T2 - IEEE Cloud
Y1 - 2015
A1 - Gary McGilvary
A1 - Barker, Adam
A1 - Malcolm Atkinson
KW - ad hoc
KW - cloud computing
KW - reliability
KW - virtualization
KW - volunteer computing
AB - This paper presents the first complete, integrated and end-to-end solution for ad hoc cloud computing environments. Ad hoc clouds harvest resources from existing sporadically available, non-exclusive (i.e. primarily used for some other purpose) and unreliable infrastructures. In this paper we discuss the problems ad hoc cloud computing solves and outline our architecture which is based on BOINC.
JF - IEEE Cloud
UR - http://arxiv.org/abs/1505.08097
ER -

TY - CONF
T1 - Applying selectively parallel IO compression to parallel storage systems
T2 - Euro-Par
Y1 - 2014
A1 - Rosa Filgueira
A1 - Malcolm Atkinson
A1 - Yusuke Tanimura
A1 - Isao Kojima
JF - Euro-Par
ER -

TY - Generic
T1 - Varpy: A python library for volcanology and rock physics data analysis. EGU2014-3699
Y1 - 2014
A1 - Rosa Filgueira
A1 - Malcolm Atkinson
A1 - Andrew Bell
A1 - Branwen Snelling
ER -

TY - CHAP
T1 - Data-Intensive Analysis
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Oscar Corcho
A1 - van Hemert, Jano
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - data mining
KW - Data-Analysis Experts
KW - Data-Intensive Analysis
KW - Knowledge Discovery
AB - Part II: "Data-intensive Knowledge Discovery", focuses on the needs of data-analysis experts. It illustrates the problem-solving strategies appropriate for a data-rich world, without delving into the details of underlying technologies. It should engage and inform data-analysis specialists, such as statisticians, data miners, image analysts, bio-informaticians or chemo-informaticians, and generate ideas pertinent to their application areas. Chapter 5: "Data-intensive Analysis", introduces a set of common problems that data-analysis experts often encounter, by means of a set of scenarios of increasing levels of complexity. The scenarios typify knowledge discovery challenges and the presented solutions provide practical methods; a starting point for readers addressing their own data challenges.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Ltd.
ER -

TY - CHAP
T1 - Data-Intensive Components and Usage Patterns
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Oscar Corcho
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Data Analysis
KW - data mining
KW - Data-Intensive Components
KW - Registry
KW - Workflow Libraries
KW - Workflow Sharing
AB - Chapter 7: "Data-intensive components and usage patterns", provides a systematic review of the components that are commonly used in knowledge discovery tasks as well as common patterns of component composition. That is, it introduces the processing elements from which knowledge discovery solutions are built and common composition patterns for delivering trustworthy information. It reflects on how these components and patterns are evolving in a data-intensive context.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Ltd.
ER -

TY - CHAP
T1 - The Data-Intensive Survival Guide
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Malcolm Atkinson
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Data-Analysis Experts
KW - Data-Intensive Architecture
KW - Data-intensive Computing
KW - Data-Intensive Engineers
KW - Datascopes
KW - Dispel
KW - Domain Experts
KW - Intellectual Ramps
KW - Knowledge Discovery
KW - Workflows
AB - Chapter 3: "The data-intensive survival guide", presents an overview of all of the elements of the proposed data-intensive strategy. Sufficient detail is presented for readers to understand the principles and practice that we recommend. It should also provide a good preparation for readers who choose to sample later chapters. It introduces three professional viewpoints: domain experts, data-analysis experts, and data-intensive engineers. Success depends on a balanced approach that develops the capacity of all three groups. A data-intensive architecture provides a flexible framework for that balanced approach. This enables the three groups to build and exploit data-intensive processes that incrementally step from data to results. A language is introduced to describe these incremental data processes from all three points of view. The chapter introduces ‘datascopes’ as the productized data-handling environments and ‘intellectual ramps’ as the ‘on ramps’ for the highways from data to knowledge.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Ltd.
ER -

TY - CHAP
T1 - Data-Intensive Thinking with DISPEL
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Malcolm Atkinson
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Data-Intensive Machines
KW - Data-Intensive Thinking, Data-intensive Computing
KW - Dispel
KW - Distributed Computing
KW - Knowledge Discovery
AB - Chapter 4: "Data-intensive thinking with DISPEL", engages the reader with technical issues and solutions, by working through a sequence of examples, building up from a sketch of a solution to a large-scale data challenge. It uses the DISPEL language extensively, introducing its concepts and constructs. It shows how DISPEL may help designers, data-analysts, and engineers develop solutions to the requirements emerging in any data-intensive application domain. The reader is taken through simple steps initially; these build to the conceptually complex steps that are necessary to cope with the realities of real data providers, real data, real distributed systems, and long-running processes.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Inc.
ER -

TY - CHAP
T1 - Definition of the DISPEL Language
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Paul Martin
A1 - Yaikhom, Gagarine
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Data Streaming
KW - Data-intensive Computing
KW - Dispel
AB - Chapter 10: "Definition of the DISPEL language", describes the novel aspects of the DISPEL language: its constructs, capabilities, and anticipated programming style.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
T3 - Parallel and Distributed Computing, series editor Albert Y. Zomaya
PB - John Wiley & Sons Inc.
ER -

TY - CHAP
T1 - The Digital-Data Challenge
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Malcolm Atkinson
A1 - Parsons, Mark
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Big Data
KW - Data-intensive Computing, Knowledge Discovery
KW - Digital Data
KW - Digital-Data Revolution
AB - Part I: "Strategies for success in the digital-data revolution", provides an executive summary of the whole book to convince strategists, politicians, managers, and educators that our future data-intensive society requires new thinking, new behavior, new culture, and new distribution of investment and effort. This part will introduce the major concepts so that readers are equipped to discuss and steer their organization’s response to the opportunities and obligations brought by the growing wealth of data. It will help readers understand the changing context brought about by advances in digital devices, digital communication, and ubiquitous computing. Chapter 1: "The digital-data challenge", will help readers to understand the challenges ahead in making good use of the data and introduce ideas that will lead to helpful strategies. A global digital-data revolution is catalyzing change in the ways in which we live, work, relax, govern, and organize. This is a significant change in society, as important as the invention of printing or the industrial revolution, but more challenging because it is happening globally at Internet speed. Becoming agile in adapting to this new world is essential.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Ltd.
ER -

TY - CHAP
T1 - The Digital-Data Revolution
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Malcolm Atkinson
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Data
KW - Information
KW - Knowledge
KW - Knowledge Discovery
KW - Social Impact of Digital Data
KW - Wisdom, Data-intensive Computing
AB - Chapter 2: "The digital-data revolution", reviews the relationships between data, information, knowledge, and wisdom. It analyses and quantifies the changes in technology and society that are delivering the data bonanza, and then reviews the consequential changes via representative examples in biology, Earth sciences, social sciences, leisure activity, and business.
It exposes quantitative details and shows the complexity and diversity of the growing wealth of data, introducing some of its potential benefits and examples of the impediments to successfully realizing those benefits.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Ltd.
ER -

TY - CHAP
T1 - DISPEL Development
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Adrian Mouat
A1 - Snelling, David
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Diagnostics
KW - Dispel
KW - IDE
KW - Libraries
KW - Processing Elements
AB - Chapter 11: "DISPEL development", describes the tools and libraries that a DISPEL developer might expect to use. The tools include those needed during process definition, those required to organize enactment, and diagnostic aids for developers of applications and platforms.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Inc.
ER -

TY - CHAP
T1 - DISPEL Enactment
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Chee Sun Liew
A1 - Krause, Amrey
A1 - Snelling, David
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Data Streaming
KW - Data-Intensive Engineering
KW - Dispel
KW - Workflow Enactment
AB - Chapter 12: "DISPEL enactment", describes the four stages of DISPEL enactment. It is targeted at the data-intensive engineers who implement enactment services.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Inc.
ER -

TY - CHAP
T1 - Foreword
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Tony Hey
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Big Data
KW - Data-intensive Computing, Knowledge Discovery
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Ltd.
ER -

TY - CHAP
T1 - Platforms for Data-Intensive Analysis
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Snelling, David
ED - Malcolm Atkinson
ED - Baxter, Robert M.
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Data-Intensive Engineering
KW - Data-Intensive Systems
KW - Dispel
KW - Distributed Systems
AB - Part III: "Data-intensive engineering", is targeted at technical experts who will develop complex applications, new components, or data-intensive platforms. The techniques introduced may be applied very widely; for example, to any data-intensive distributed application, such as index generation, image processing, sequence comparison, text analysis, and sensor-stream monitoring. The challenges, methods, and implementation requirements are illustrated by making extensive use of DISPEL. Chapter 9: "Platforms for data-intensive analysis", gives a reprise of data-intensive architectures, examines the business case for investing in them, and introduces the stages of data-intensive workflow enactment.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Ltd.
ER -

TY - CHAP
T1 - Preface
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Malcolm Atkinson
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Big Data, Data-intensive Computing, Knowledge Discovery
AB - Who should read the book and why. The structure and conventions used. Suggested reading paths for different categories of reader.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Ltd.
ER -

TY - CHAP
T1 - Problem Solving in Data-Intensive Knowledge Discovery
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Oscar Corcho
A1 - van Hemert, Jano
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Data-Analysis Experts
KW - Data-Intensive Analysis
KW - Design Patterns for Knowledge Discovery
KW - Knowledge Discovery
AB - Chapter 6: "Problem solving in data-intensive knowledge discovery", on the basis of the previous scenarios, provides an overview of effective strategies in knowledge discovery, highlighting common problem-solving methods that apply in conventional contexts, and focusing on the similarities and differences of these methods.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Ltd.
ER -

TY - CONF
T1 - Provenance for seismological processing pipelines in a distributed streaming workflow
T2 - EDBT/ICDT Workshops
Y1 - 2013
A1 - Alessandro Spinuso
A1 - James Cheney
A1 - Malcolm Atkinson
JF - EDBT/ICDT Workshops
ER -

TY - CHAP
T1 - Sharing and Reuse in Knowledge Discovery
T2 - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
Y1 - 2013
A1 - Oscar Corcho
ED - Malcolm Atkinson
ED - Rob Baxter
ED - Peter Brezany
ED - Oscar Corcho
ED - Michelle Galea
ED - Parsons, Mark
ED - Snelling, David
ED - van Hemert, Jano
KW - Data-Intensive Analysis
KW - Knowledge Discovery
KW - Ontologies
KW - Semantic Web
KW - Sharing
AB - Chapter 8: "Sharing and re-use in knowledge discovery", introduces more advanced knowledge discovery problems, and shows how improved component and pattern descriptions facilitate re-use. This supports the assembly of libraries of high-level components well adapted to classes of knowledge discovery methods or application domains. The descriptions are made more powerful by introducing notations from the Semantic Web.
JF - THE DATA BONANZA: Improving Knowledge Discovery for Science, Engineering and Business
PB - John Wiley & Sons Ltd.
ER -

TY - CONF
T1 - Towards Addressing CPU-Intensive Seismological Applications in Europe
T2 - International Supercomputing Conference
Y1 - 2013
A1 - Michele Carpené
A1 - I.A. Klampanos
A1 - Siew Hoon Leong
A1 - Emanuele Casarotti
A1 - Peter Danecek
A1 - Graziella Ferini
A1 - Andre Gemünd
A1 - Amrey Krause
A1 - Lion Krischer
A1 - Federica Magnoni
A1 - Marek Simon
A1 - Alessandro Spinuso
A1 - Luca Trani
A1 - Malcolm Atkinson
A1 - Giovanni Erbacci
A1 - Anton Frank
A1 - Heiner Igel
A1 - Andreas Rietbrock
A1 - Horst Schwichtenberg
A1 - Jean-Pierre Vilotte
AB - Advanced application environments for seismic analysis help geoscientists to execute complex simulations to predict the behaviour of a geophysical system and potential surface observations. At the same time, data collected from seismic stations must be processed, comparing recorded signals with predictions. The EU-funded project VERCE (http://verce.eu/) aims to enable specific seismological use-cases and, on the basis of requirements elicited from the seismology community, provide a service-oriented infrastructure to deal with such challenges. In this paper we present VERCE’s architecture, in particular relating to forward and inverse modelling of Earth models, and how the largely file-based HPC model can be combined with data streaming operations to enhance the scalability of experiments. We posit that the integration of services and HPC resources in an open, collaborative environment is an essential medium for the advancement of sciences of critical importance, such as seismology.
JF - International Supercomputing Conference
CY - Leipzig, Germany
ER -

TY - CONF
T1 - V-BOINC: The Virtualization of BOINC
T2 - In Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2013).
Y1 - 2013
A1 - Gary McGilvary
A1 - Barker, Adam
A1 - Ashley Lloyd
A1 - Malcolm Atkinson
AB - The Berkeley Open Infrastructure for Network Computing (BOINC) is an open source client-server middleware system created to allow projects with large computational requirements, usually set in the scientific domain, to utilize a technically unlimited number of volunteer machines distributed over large physical distances. However, various problems exist when deploying applications over these heterogeneous machines using BOINC: applications must be ported to each machine architecture type, the project server must be trusted to supply authentic applications, applications that do not regularly checkpoint may lose execution progress upon volunteer machine termination, and applications that have dependencies may find it difficult to run under BOINC. To solve such problems we introduce virtual BOINC, or V-BOINC, where virtual machines are used to run computations on volunteer machines. Application developers can then compile their applications on a single architecture, checkpointing issues are solved through virtualization APIs and many security concerns are addressed via the virtual machine's sandbox environment. In this paper we focus on outlining a unique approach to introducing virtualization into BOINC and demonstrate that V-BOINC offers acceptable computational performance when compared to regular BOINC. Finally, we show that applications with dependencies can easily run under V-BOINC, in turn increasing the computational potential volunteer computing offers to the general public and project developers. V-BOINC can be downloaded at http://garymcgilvary.co.uk/vboinc.html
JF - In Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2013).
CY - Delft, The Netherlands
ER -

TY - CONF
T1 - An adaptive, scalable, and portable technique for speeding up MPI-based applications
T2 - International European Conference on Parallel and Distributed Computing, Europar-2012
Y1 - 2012
A1 - Rosa Filgueira
A1 - Alberto Nuñez
A1 - Javier Fernandez
A1 - Malcolm Atkinson
JF - International European Conference on Parallel and Distributed Computing, Europar-2012
ER -

TY - JOUR
T1 - Computed Tomography Perfusion Imaging Denoising Using Gaussian Process Regression
JF - Physics in Medicine and Biology
Y1 - 2012
A1 - Fan Zhu
A1 - Carpenter, Trevor
A1 - Rodríguez, David
A1 - Malcolm Atkinson
A1 - Wardlaw, Joanna
AB - Objective: Brain perfusion weighted images acquired using dynamic contrast studies have an important clinical role in acute stroke diagnosis and treatment decisions. However, Computed Tomography (CT) images suffer from low contrast-to-noise ratios (CNR) as a consequence of the limits on the patient's exposure to radiation. The development of methods for improving the CNR is therefore valuable. Methods: The majority of existing approaches for denoising CT images are optimized for 3D (spatial) information, including spatial decimation (spatially weighted mean filters) and techniques based on wavelet and curvelet transforms. However, perfusion imaging data is 4D as it also contains temporal information. Our approach uses Gaussian process regression (GPR), which takes advantage of the temporal information, to reduce the noise level. Results: Over the entire image, GPR gains a 99% CNR improvement over the raw images and also improves the quality of haemodynamic maps, allowing a better identification of edges and detailed information. At the level of individual voxels, GPR provides a stable baseline, helps us to identify key parameters from tissue time-concentration curves and reduces the oscillations in the curve. Conclusion: GPR is superior to the comparable techniques used in this study.
ER -

TY - JOUR
T1 - Parallel perfusion imaging processing using GPGPU
JF - Computer Methods and Programs in Biomedicine
Y1 - 2012
A1 - Fan Zhu
A1 - Rodríguez, David
A1 - Carpenter, Trevor
A1 - Malcolm Atkinson
A1 - Wardlaw, Joanna
KW - Deconvolution
KW - GPGPU
KW - Local AIF
KW - Parallelization
KW - Perfusion Imaging
AB - Background and purpose: The objective of brain perfusion quantification is to generate parametric maps of relevant hemodynamic quantities such as cerebral blood flow (CBF), cerebral blood volume (CBV) and mean transit time (MTT) that can be used in diagnosis of acute stroke. These calculations involve deconvolution operations that can be very computationally expensive when using local Arterial Input Functions (AIF). As time is vitally important in the case of acute stroke, reducing the analysis time will reduce the number of brain cells damaged and increase the potential for recovery. Methods: GPUs originated as graphics generation dedicated co-processors, but modern GPUs have evolved to become a more general processor capable of executing scientific computations. It provides a highly parallel computing environment due to its large number of computing cores and constitutes an affordable high performance computing method. In this paper, we will present the implementation of a deconvolution algorithm for brain perfusion quantification on GPGPU (General Purpose Graphics Processor Units) using the CUDA programming model.
We present the serial and parallel implementations of such algorithms and the evaluation of the performance gains using GPUs. Results: Our method achieved speedups of 5.56 and 3.75 for CT and MR images respectively. Conclusions: Using GPGPU is a desirable approach in perfusion imaging analysis, which does not harm the quality of cerebral hemodynamic maps but delivers results faster than the traditional computation.
UR - http://www.sciencedirect.com/science/article/pii/S0169260712001587
ER -

TY - RPRT
T1 - EDIM1 Progress Report
Y1 - 2011
A1 - Paul Martin
A1 - Malcolm Atkinson
A1 - Parsons, Mark
A1 - Adam Carter
A1 - Gareth Francis
AB - The Edinburgh Data-Intensive Machine (EDIM1) is the product of a joint collaboration between the data-intensive group at the School of Informatics and EPCC. EDIM1 is an experimental system, offering an alternative architecture for data-intensive computation and providing a platform for evaluating tools for data-intensive research: a 120-node cluster of ‘data-bricks’ with high storage yet modest computational capacity. This document gives some background into the context in which EDIM1 was designed and constructed, as well as providing an overview of its use so far and future plans.
ER -

TY - JOUR
T1 - A Generic Parallel Processing Model for Facilitating Data Mining and Integration
JF - Parallel Computing
Y1 - 2011
A1 - Liangxiu Han
A1 - Chee Sun Liew
A1 - van Hemert, Jano
A1 - Malcolm Atkinson
KW - Data Mining and Data Integration (DMI)
KW - Life Sciences
KW - OGSA-DAI
KW - Parallelism
KW - Pipeline Streaming
KW - workflow
AB - To facilitate Data Mining and Integration (DMI) processes in a generic way, we investigate a parallel pipeline streaming model. We model a DMI task as a streaming data-flow graph: a directed acyclic graph (DAG) of Processing Elements (PEs). The composition mechanism links PEs via data streams, which may be in memory, buffered via disks or inter-computer data-flows. This makes it possible to build arbitrary DAGs with pipelining and both data and task parallelisms, which provides room for performance enhancement. We have applied this approach to a real DMI case in the Life Sciences and implemented a prototype. To demonstrate feasibility of the modelled DMI task and assess the efficiency of the prototype, we have also built a performance evaluation model. The experimental evaluation results show that a linear speedup has been achieved with the increase of the number of distributed computing nodes in this case study.
PB - Elsevier
VL - 37
IS - 3
ER -

TY - CONF
T1 - Optimum Platform Selection and Configuration for Computational Jobs
T2 - All Hands Meeting 2011
Y1 - 2011
A1 - Gary McGilvary
A1 - Malcolm Atkinson
A1 - Barker, Adam
A1 - Ashley Lloyd
AB - The performance and cost of many scientific applications which execute on a variety of High Performance Computing (HPC), local cluster environments and cloud services could be enhanced, and costs reduced, if the platform was carefully selected on a per-application basis and the application itself was optimally configured for a given platform. With a wide variety of computing platforms on offer, each possessing different properties, all too frequently platform decisions are made on an ad-hoc basis with limited ‘black-box’ information. The limitless number of possible application configurations also makes it difficult for an individual who wants to achieve cost-effective results with the maximum performance available.
Such individuals may include biomedical researchers analysing microarray data, software developers running aviation simulations or bankers performing risk assessments. However, in any of these cases, it is likely that many may not have the required knowledge to select the optimum platform and setup for their application; to do so would require extensive knowledge of their applications and various platforms. In this paper we describe a framework that aims to resolve such issues by (i) reducing the detail required in the decision making process by placing this information within a selection framework, thereby (ii) maximising an application’s performance gain and/or reducing costs. We present a set of preliminary results where we compare the performance of running the Simple Parallel R INTerface (SPRINT) over a variety of platforms. SPRINT is a framework providing parallel functions of the statistical package R, allowing post genomic data to be easily analysed on HPC resources [1]. We run SPRINT on Amazon’s Elastic Compute Cloud (EC2) to compare the performance with the results obtained from HECToR, the UK’s National Supercomputing Service, and the Edinburgh Compute and Data Facilities (ECDF) cluster.
JF - All Hands Meeting 2011
CY - York
ER -

TY - CONF
T1 - A Parallel Deconvolution Algorithm in Perfusion Imaging
T2 - Healthcare Informatics, Imaging, and Systems Biology (HISB)
Y1 - 2011
A1 - Zhu, Fan
A1 - Rodríguez, David
A1 - Carpenter, Trevor
A1 - Malcolm Atkinson
A1 - Wardlaw, Joanna
KW - Deconvolution
KW - GPGPU
KW - Parallelization
KW - Perfusion Imaging
AB - In this paper, we will present the implementation of a deconvolution algorithm for brain perfusion quantification on GPGPU (General Purpose Graphics Processor Units) using the CUDA programming model. GPUs originated as graphics generation dedicated co-processors, but modern GPUs have evolved to become a more general processor capable of executing scientific computations. It provides a highly parallel computing environment due to its huge number of computing cores and constitutes an affordable high performance computing method. The objective of brain perfusion quantification is to generate parametric maps of relevant haemodynamic quantities such as Cerebral Blood Flow (CBF), Cerebral Blood Volume (CBV) and Mean Transit Time (MTT) that can be used in diagnosis of conditions such as stroke or brain tumors. These calculations involve deconvolution operations that, in the case of using local Arterial Input Functions (AIF), can be very expensive computationally. We present the serial and parallel implementations of such an algorithm and the evaluation of the performance gains using GPUs.
JF - Healthcare Informatics, Imaging, and Systems Biology (HISB)
CY - San Jose, California
SN - 978-1-4577-0325-6
UR - http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6061411&tag=1
ER -

TY - JOUR
T1 - Validation and mismatch repair of workflows through typed data streams
JF - Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Y1 - 2011
A1 - Yaikhom, Gagarine
A1 - Malcolm Atkinson
A1 - van Hemert, Jano
A1 - Oscar Corcho
A1 - Krause, Amy
AB - The type system of a language guarantees that all of the operations on a set of data comply with the rules and conditions set by the language. While language typing is a fundamental requirement for any programming language, the typing of data that flow between processing elements within a workflow is currently being treated as optional.
In this paper, we introduce a three-level type system for typing workflow data streams. These types are part of the Data-Intensive Systems Process Engineering Language (DISPEL), which empowers users with the ability to validate the connections inside a workflow composition, and to apply appropriate data type conversions when necessary. Furthermore, this system enables the enactment engine to carry out type-directed workflow optimizations.
VL - 369
IS - 1949
ER -

TY - RPRT
T1 - Data-Intensive Research Workshop (15-19 March 2010) Report
Y1 - 2010
A1 - Malcolm Atkinson
A1 - Roure, David De
A1 - van Hemert, Jano
A1 - Shantenu Jha
A1 - Ruth McNally
A1 - Robert Mann
A1 - Stratis Viglas
A1 - Chris Williams
KW - Data-intensive Computing
KW - Data-Intensive Machines
KW - Machine Learning
KW - Scientific Databases
AB - We met at the National e-Science Institute in Edinburgh on 15-19 March 2010 to develop our understanding of DIR. Approximately 100 participants (see Appendix A) worked together to develop their own understanding, and we are offering this report as the first step in communicating that to a wider community. We present this in terms of our developing understanding of "What is DIR?" and "Why is it important?". We then review the status of the field, report what the workshop achieved and what remains as open questions.
JF - National e-Science Centre
PB - Data-Intensive Research Group, School of Informatics, University of Edinburgh
CY - Edinburgh
ER -

TY - Generic
T1 - Federated Enactment of Workflow Patterns
T2 - Lecture Notes in Computer Science
Y1 - 2010
A1 - Yaikhom, Gagarine
A1 - Liew, Chee
A1 - Liangxiu Han
A1 - van Hemert, Jano
A1 - Malcolm Atkinson
A1 - Krause, Amy
ED - D’Ambra, Pasqua
ED - Guarracino, Mario
ED - Talia, Domenico
AB - In this paper we address two research questions concerning workflows: 1) how do we abstract and catalogue recurring workflow patterns?; and 2) how do we facilitate optimisation of the mapping from workflow patterns to actual resources at runtime? Our aim here is to explore techniques that are applicable to large-scale workflow compositions, where the resources could change dynamically during the lifetime of an application. We achieve this by introducing a registry-based mechanism where pattern abstractions are catalogued and stored. In conjunction with an enactment engine, which communicates with this registry, concrete computational implementations and resources are assigned to these patterns, conditional on the execution parameters. Using a data mining application from the life sciences, we demonstrate this new approach.
JF - Lecture Notes in Computer Science
PB - Springer Berlin / Heidelberg
VL - 6271
UR - http://dx.doi.org/10.1007/978-3-642-15277-1_31
N1 - 10.1007/978-3-642-15277-1_31
ER -

TY - RPRT
T1 - ADMIRE D1.5 – Report defining an iteration of the model and language: PM3 and DL3
Y1 - 2009
A1 - Peter Brezany
A1 - Ivan Janciak
A1 - Alexander Woehrer
A1 - Carlos Buil Aranda
A1 - Malcolm Atkinson
A1 - van Hemert, Jano
AB - This document is the third deliverable to report on the progress of the model, language and ontology research conducted within Workpackage 1 of the ADMIRE project. Significant progress has been made on each of the above areas.
The new results that we achieved are recorded against the targets defined for project month 18 and are reported in four sections of this document.
PB - ADMIRE project
UR - http://www.admire-project.eu/docs/ADMIRE-D1.5-model-language-ontology.pdf
ER -

TY - JOUR
T1 - Guest Editorial: Research Data: It’s What You Do With Them
JF - International Journal of Digital Curation
Y1 - 2009
A1 - Malcolm Atkinson
AB - These days it may be stating the obvious that the number of data resources, their complexity and diversity is growing rapidly due to the compound effects of increasing speed and resolution of digital instruments, due to pervasive data-collection automation and due to the growing power of computers. Just because we are becoming used to the accelerating growth of data resources, it does not mean we can be complacent; they represent an enormous wealth of opportunity to extract information, to make discoveries and to inform policy. But all too often it still takes a heroic effort to discover and exploit those opportunities, hence the research and progress, charted by the Fourth International Digital Curation Conference and recorded in this issue of the International Journal of Digital Curation, are an invaluable step on a long and demanding journey.
VL - 4
UR - http://www.ijdc.net/index.php/ijdc/article/view/96
IS - 1
ER -

TY - JOUR
T1 - Strategies and Policies to Support and Advance Education in e-Science
JF - Computing Now
Y1 - 2009
A1 - Malcolm Atkinson
A1 - Elizabeth Vander Meer
A1 - Fergusson, David
A1 - Clive Davenhall
A1 - Hamza Mehammed
AB - In previous installments of this series, we’ve presented tools and resources that university undergraduate and graduate environments must provide to allow for the continued development and success of e-Science education. We’ve introduced related summer (http://doi.ieeecomputersociety.org/10.1109/MDSO.2008.20) and winter (http://doi.ieeecomputersociety.org/10.1109/MDSO.2008.26) schools and important issues such as t-Infrastructure provision (http://doi.ieeecomputersociety.org/10.1109/MDSO.2008.28), intellectual property rights in the context of digital repositories (http://doi.ieeecomputersociety.org/10.1109/MDSO.2008.34), and curriculum content (http://www2.computer.org/portal/web/computingnow/0309/education). We conclude now with an overview of areas in which we must focus effort, and strategies and policies that could provide much-needed support in these areas. We direct these strategy and policy recommendations toward key stakeholders in e-Science education, such as ministries of education, councils in professional societies, and professional teachers and educational strategists. Ministries of education can influence funding councils, thus financially supporting our proposals. Professional societies can assist in curricula revision, and teachers and strategists shape curricula in institutions, which makes them valuable in improving and developing education in e-Science and (perhaps) e-Science in education. We envision incremental change in curricula, so our proposals aim to evolve existing courses, rather than suggesting drastic upheavals and isolated additions. The long-term goal is to ensure that every graduate obtains the appropriate level of e-Science competency for their field, but we don’t presume to define that level for any given discipline or institution. We set out issues and ideas but don’t offer rigid prescriptions, which would take control away from important stakeholders.
UR - http://www.computer.org/portal/web/computingnow/education
ER -

TY - JOUR
T1 - Distributed Computing Education, Part 5: Coming to Terms with Intellectual Property Rights
JF - Distributed Systems Online
Y1 - 2008
A1 - Boon Low
A1 - Kathryn Cassidy
A1 - Fergusson, David
A1 - Malcolm Atkinson
A1 - Elizabeth Vander Meer
A1 - Mags McGeever
AB - In part 1 of this series on distributed computing education, we introduced a list of components important for teaching environments. We outlined the first three components, which included development of materials for education, education for educators and teaching infrastructures, identifying current practice, challenges, and opportunities for provision. The final component, a supportive policy framework that encourages cooperation and sharing, includes the need to manage intellectual property rights (IPR).
PB - IEEE Computer Society
VL - 9
UR - http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4755177
IS - 12
ER -

TY - CONF
T1 - OGSA-DAI Status and Benchmarks
T2 - All Hands Meeting 2005
Y1 - 2005
A1 - Antonioletti, Mario
A1 - Malcolm Atkinson
A1 - Rob Baxter
A1 - Andrew Borley
A1 - Hong, Neil P. Chue
A1 - Patrick Dantressangle
A1 - Hume, Alastair C.
A1 - Mike Jackson
A1 - Krause, Amy
A1 - Laws, Simon
A1 - Parsons, Mark
A1 - Paton, Norman W.
A1 - Jennifer M. Schopf
A1 - Tom Sugden
A1 - Watson, Paul
AB - This paper presents a status report on some of the highlights that have taken place within the OGSA-DAI project since the last AHM. A description of Release 6.0 functionality and details of the forthcoming release, due in September 2005, are given. Future directions for this project are discussed. This paper also describes initial results of work being done to systematically benchmark recent OGSA-DAI releases. The OGSA-DAI software distribution, and more information about the project, is available from the project website at www.ogsadai.org.uk.
JF - All Hands Meeting 2005
CY - Nottingham, UK
ER -

TY - Generic
T1 - Grid Services Supporting the Usage of Secure Federated, Distributed Biomedical Data
T2 - All Hands Meeting 2004
Y1 - 2004
A1 - Richard Sinnott
A1 - Malcolm Atkinson
A1 - Micha Bayer
A1 - Dave Berry
A1 - Anna Dominiczak
A1 - Magnus Ferrier
A1 - David Gilbert
A1 - Neil Hanlon
A1 - Derek Houghton
A1 - Hunt, Ela
A1 - David White
AB - The BRIDGES project is a UK e-Science project that provides grid based support for biomedical research into the genetics of hypertension – the Cardiovascular Functional Genomics Project (CFG). Its main goal is to provide an effective environment for CFG, and biomedical research in general, including access to integrated data, analysis and visualization, with appropriate authorisation and privacy, as well as grid based computational tools and resources. It also aims to provide an improved understanding of the requirements of academic biomedical research virtual organizations and to evaluate the utility of existing data federation tools.
JF - All Hands Meeting 2004
CY - Nottingham, UK
UR - http://www.allhands.org.uk/2004/proceedings/papers/87.pdf
ER -

TY - CONF
T1 - OGSA-DAI Status Report and Future Directions
T2 - All Hands Meeting 2004
Y1 - 2004
A1 - Antonioletti, Mario
A1 - Malcolm Atkinson
A1 - Rob Baxter
A1 - Borley, Andrew
A1 - Hong, Neil P. Chue
A1 - Collins, Brian
A1 - Jonathan Davies
A1 - Desmond Fitzgerald
A1 - Hardman, Neil
A1 - Hume, Alastair C.
A1 - Mike Jackson
A1 - Krause, Amrey
A1 - Laws, Simon
A1 - Paton, Norman W.
A1 - Tom Sugden
A1 - Watson, Paul
A1 - Mar
AB - The OGSA-DAI project provides Data Access and Integration (DAI) of data resources, such as relational and XML databases, within a Grid context. Project members also participate in the development of DAI standards through the GGF DAIS WG. The standards that emerge through this effort will be adopted by OGSA-DAI once they have stabilised. The OGSA-DAI developers are also engaging with a growing user community to gather their data and functionality requirements. Several large projects are already using OGSA-DAI to provide their DAI capabilities. This paper presents a status report on OGSA-DAI activities since the last AHM and announces future directions. The OGSA-DAI software distribution and more information about the project is available from the project website at http://www.ogsadai.org.uk/.
JF - All Hands Meeting 2004
CY - Nottingham, UK
ER -

TY - CONF
T1 - OGSA-DAI: Two Years On
T2 - GGF10
Y1 - 2004
A1 - Antonioletti, Mario
A1 - Malcolm Atkinson
A1 - Rob Baxter
A1 - Borley, Andrew
A1 - Neil Chue Hong
A1 - Collins, Brian
A1 - Jonathan Davies
A1 - Hardman, Neil
A1 - George Hicken
A1 - Ally Hume
A1 - Mike Jackson
A1 - Krause, Amrey
A1 - Laws, Simon
A1 - Magowan, James
A1 - Jeremy Nowell
A1 - Paton, Norman W.
A1 - Dave Pearson
A1 - To
AB - The OGSA-DAI project has been producing Grid-enabled middleware for almost two years now, providing data access and integration capabilities to data resources, such as databases, within an OGSA context. In these two years, OGSA-DAI has been tracking rapidly evolving standards, managing changes in software dependencies, contributing to the standardisation process and liaising with a growing user community together with their associated data requirements. This process has imparted important lessons and raised a number of issues that need to be addressed if a middleware product is to be widely adopted. This paper examines the experiences of OGSA-DAI in implementing proposed standards, the likely impact that the still-evolving standards landscape will have on future implementations and how these affect uptake of the software. The paper also examines the gathering of requirements from and engagement with the Grid community, the difficulties of defining a process for the management and publishing of metadata, and whether relevant standards can be implemented in an efficient manner. The OGSA-DAI software distribution and more details about the project are available from the project Web site at http://www.ogsadai.org.uk/.
JF - GGF10
CY - Berlin, Germany
ER -

TY - RPRT
T1 - Web Service Grids: An Evolutionary Approach
Y1 - 2004
A1 - Malcolm Atkinson
A1 - Roure, David De
A1 - Alistair Dunlop
A1 - Fox, Geoffrey
A1 - Henderson, Peter
A1 - Tony Hey
A1 - Norman Paton
A1 - Newhouse, Steven
A1 - Parastatidis, Savas
A1 - Anne Trefethen
A1 - Watson, Paul
A1 - Webber, Jim
AB - The UK e-Science Programme is a £250M, 5 year initiative which has funded over 100 projects. These application-led projects are under-pinned by an emerging set of core middleware services that allow the coordinated, collaborative use of distributed resources. This set of middleware services runs on top of the research network and beneath the applications we call the ‘Grid’. Grid middleware is currently in transition from pre-Web Service versions to a new version based on Web Services. Unfortunately, only a very basic set of Web Services, embodied in the Web Services Interoperability proposal WS-I, is agreed by most IT companies.
IBM and others have submitted proposals for Web Services for Grids - the Web Services Resource Framework and Web Services Notification specifications - to the OASIS organisation for standardisation. This process could take up to 12 months from March 2004 and the specifications are subject to debate and potentially significant changes. Since several significant UK e-Science projects come to an end before the end of this process, the UK therefore needs to develop a strategy that will protect the UK’s investment in Grid middleware by informing the Open Middleware Infrastructure Institute’s (OMII) roadmap and UK middleware repository in Southampton. This paper sets out an evolutionary roadmap that will allow us to capture generic middleware components from projects in a form that will facilitate migration or interoperability with the emerging Grid Web Services standards and with on-going OGSA developments. In this paper we therefore define a set of Web Services specifications - that we call ‘WS-I+’ to reflect the fact that this is a larger set than currently accepted by WS-I - that we believe will enable us to achieve the twin goals of capturing these components and facilitating migration to future standards. We believe that the extra Web Services specifications we have included in WS-I+ are both helpful in building e-Science Grids and likely to be widely accepted.
JF - UK e-Science Technical Report Series
ER -

TY - CONF
T1 - The Design and Implementation of Grid Database Services in OGSA-DAI
T2 - All Hands Meeting 2003
Y1 - 2003
A1 - Ali Anjomshoaa
A1 - Antonioletti, Mario
A1 - Malcolm Atkinson
A1 - Rob Baxter
A1 - Borley, Andrew
A1 - Hong, Neil P. Chue
A1 - Collins, Brian
A1 - Hardman, Neil
A1 - George Hicken
A1 - Ally Hume
A1 - Knox, Alan
A1 - Mike Jackson
A1 - Krause, Amrey
A1 - Laws, Simon
A1 - Magowan, James
A1 - Charaka Palansuriya
A1 - Paton, Norman W.
AB - This paper presents a high-level overview of the design and implementation of the core components of the OGSA-DAI project. It describes the design decisions made, the project’s interaction with the Data Access and Integration Working Group of the Global Grid Forum and provides an overview of implementation characteristics. Further details of the implementation are provided in the extensive documentation available from the project web site.
JF - All Hands Meeting 2003
CY - Nottingham, UK
ER -

TY - CONF
T1 - The GRUMPS Architecture: Run-time Evolution in a Large Scale Distributed System
T2 - Proceedings of the Workshop on Engineering Complex Object-Oriented Solutions for Evolution (ECOOSE), held as part of OOPSLA 2001.
Y1 - 2001
A1 - Evans, Huw
A1 - Peter Dickman
A1 - Malcolm Atkinson
AB - This paper describes the first version of the distributed programming architecture for the Grumps project. The architecture consists of objects that communicate in terms of both asynchronous and synchronous events. A novel three-level extensible naming scheme is discussed that allows Grumps developers to deploy systems that can refer to entities not identified at the time when the Grumps system and application-level code were implemented. Examples detailing how the topology of a Grumps system may be changed at run-time and how new object implementations may be distributed during system execution are given. The separation of policy from mechanism is shown to be a major part of how system evolution is supported, and this is made even more flexible when expressed through the use of Java interfaces for crucial core concepts.
JF - Proceedings of the Workshop on Engineering Complex Object-Oriented Solutions for Evolution (ECOOSE), held as part of OOPSLA 2001.
ER -