TY - CONF
T1 - Ad hoc Cloud Computing
T2 - IEEE Cloud
Y1 - 2015
A1 - McGilvary, Gary
A1 - Barker, Adam
A1 - Atkinson, Malcolm
KW - ad hoc
KW - cloud computing
KW - reliability
KW - virtualization
KW - volunteer computing
AB - This paper presents the first complete, integrated and end-to-end solution for ad hoc cloud computing environments. Ad hoc clouds harvest resources from existing sporadically available, non-exclusive (i.e. primarily used for some other purpose) and unreliable infrastructures. In this paper we discuss the problems ad hoc cloud computing solves and outline our architecture, which is based on BOINC.
JF - IEEE Cloud
UR - http://arxiv.org/abs/1505.08097
ER -
TY - BOOK
T1 - Ad hoc Cloud Computing (PhD Thesis)
Y1 - 2014
A1 - McGilvary, Gary
AB - Commercial and private cloud providers offer virtualized resources via a set of co-located and dedicated hosts that are exclusively reserved for the purpose of offering a cloud service. While both cloud models appeal to the mass market, there are many cases where outsourcing to a remote platform or procuring an in-house infrastructure may not be ideal or even possible. To offer an attractive alternative, we introduce and develop an ad hoc cloud computing platform that transforms spare resource capacity from an infrastructure owner's locally available, but non-exclusive and unreliable, infrastructure into an overlay cloud platform. The foundation of the ad hoc cloud relies on transferring and instantiating lightweight virtual machines on demand upon near-optimal hosts, while virtual machine checkpoints are distributed in a P2P fashion to other members of the ad hoc cloud. Virtual machines found to be non-operational are restored elsewhere, ensuring the continuity of cloud jobs. In this thesis we investigate the feasibility, reliability and performance of ad hoc cloud computing infrastructures. We first show that the combination of volunteer computing and virtualization is the backbone of the ad hoc cloud. We outline the process of virtualizing the volunteer system BOINC to create V-BOINC. V-BOINC distributes virtual machines to volunteer hosts, allowing volunteer applications to be executed in a sandboxed environment; this solves many of the shortcomings of BOINC and also provides the basis for an ad hoc cloud computing platform to be developed. We detail the challenges of transforming V-BOINC into an ad hoc cloud and outline the transformational process and integrated extensions. These include a BOINC job submission system, cloud job and virtual machine restoration schedulers, and a periodic P2P checkpoint distribution component. Furthermore, as current monitoring tools are unable to cope with the dynamic nature of ad hoc clouds, a dynamic infrastructure monitoring and management tool called the Cloudlet Control Monitoring System is developed and presented. We evaluate each of our individual contributions as well as the reliability, performance and overheads associated with an ad hoc cloud deployed on a realistically simulated unreliable infrastructure. We conclude that the ad hoc cloud is not only a feasible concept but also a viable computational alternative that offers high levels of reliability and can offer reasonable performance, which at times may exceed that of a commercial cloud infrastructure.
PB - The University of Edinburgh
CY - Edinburgh
ER -
TY - CONF
T1 - C2MS: Dynamic Monitoring and Management of Cloud Infrastructures
T2 - IEEE CloudCom
Y1 - 2013
A1 - McGilvary, Gary
A1 - Rius, Josep
A1 - Goiri, Íñigo
A1 - Solsona, Francesc
A1 - Barker, Adam
A1 - Atkinson, Malcolm P.
AB - Server clustering is a common design principle employed by many organisations that require high availability, scalability and easier management of their infrastructure. Servers are typically clustered according to the service they provide, for example the application(s) installed, the role of the server or server accessibility. In order to optimize performance, manage load and maintain availability, servers may migrate from one cluster group to another, making it difficult for server monitoring tools to continuously monitor these dynamically changing groups. Server monitoring tools are usually statically configured, and any change of group membership requires manual reconfiguration; an unreasonable task to undertake on large-scale cloud infrastructures. In this paper we present the Cloudlet Control and Management System (C2MS); a system for monitoring and controlling dynamic groups of physical or virtual servers within cloud infrastructures. The C2MS extends Ganglia - an open source scalable system performance monitoring tool - by allowing system administrators to define, monitor and modify server groups without the need for server reconfiguration. In turn, administrators can easily monitor group and individual server metrics on large-scale dynamic cloud infrastructures where the roles of servers may change frequently. Furthermore, we complement group monitoring with a control element that allows administrator-specified actions to be performed over servers within service groups, and introduce further customized monitoring metrics. This paper outlines the design, implementation and evaluation of the C2MS.
JF - IEEE CloudCom
CY - Bristol, UK
ER -
TY - JOUR
T1 - Embedded systems for global e-Social Science: Moving computation rather than data
JF - Future Generation Computer Systems
Y1 - 2013
A1 - Lloyd, Ashley D.
A1 - Sloan, Terence M.
A1 - Antonioletti, Mario
A1 - McGilvary, Gary
AB - There is a wealth of digital data currently being gathered by commercial and private concerns that could supplement academic research. To unlock this data it is important to gain the trust of the companies that hold it, as well as to show them how they may benefit from this research. Part of this trust is gained through established reputation and the rest through the technology used to safeguard the data. This paper discusses how different technology frameworks have been applied to safeguard the data and facilitate collaborative work between commercial concerns and academic institutions. The paper focuses on the distinctive requirements of e-Social Science: access to large-scale data on behaviour in society in environments that impose confidentiality constraints on access. These constraints arise from both privacy concerns and the commercial sensitivities of that data. In particular, the paper draws on the experiences of building an intercontinental Grid–INWA–from its first operation connecting Australia and Scotland to its subsequent extension to China across the Trans-Eurasia Information Network, the first large-scale research and education network for the Asia-Pacific region. This allowed commercial data to be analysed by experts who were geographically distributed across the globe. It also provided an entry point for a major Chinese commercial organization to approve the use of a Grid solution in a new collaboration, provided that the centre of gravity of the data was retained within the jurisdiction of the data owner. We describe why, despite this approval, an embedded solution was eventually adopted. We find that ‘data sovereignty’ dominates any decision on whether and how to participate in e-Social Science collaborations, and consider how this might impact on a Cloud-based solution to this type of collaboration.
VL - 29
UR - http://www.sciencedirect.com/science/article/pii/S0167739X12002336
IS - 5
ER -
TY - JOUR
T1 - Exploiting Parallel R in the Cloud with SPRINT
JF - Methods of Information in Medicine
Y1 - 2013
A1 - Piotrowski, Michal
A1 - McGilvary, Gary
A1 - Sloan, Terence
A1 - Mewissen, Muriel
A1 - Lloyd, Ashley
A1 - Forster, Thorsten
A1 - Mitchell, Lawrence
A1 - Ghazal, Peter
A1 - Hill, Jon
AB - Background: Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability, but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. Objectives: Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language, but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates: setting up and running SPRINT-enabled genomic analyses on Amazon’s Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world, and whether resource underutilization can improve application performance. Methods: The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. Results: It is possible to obtain good, scalable performance, but the level of improvement depends upon the nature of the algorithm. Resource underutilization can further improve the time to result. The end-user’s location impacts costs due to factors such as local taxation. Conclusions: Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provide an interesting alternative and new possibilities for smaller organisations with limited funds.
VL - 52
IS - 1
ER -
TY - RPRT
T1 - The Implementation of OpenStack Cinder and Integration with NetApp and Ceph
Y1 - 2013
A1 - McGilvary, Gary
A1 - Oulevey, Thomas
AB - With the ever-increasing amount of data produced by Large Hadron Collider (LHC) experiments, new ways are sought to help analyze and store this data as well as to help researchers perform their own experiments. To help offer solutions to such problems, CERN has employed cloud computing and in particular OpenStack, an open source and scalable platform for building public and private clouds. The OpenStack project contains many components, such as Cinder, which is used to create block storage that can be attached to virtual machines and in turn help increase performance. However, instead of creating volumes locally with OpenStack, remote storage clusters exist that offer block-based storage with features not present in the current OpenStack implementation; two popular solutions are NetApp and Ceph. Two features Ceph offers are the ability to stripe data stored within volumes over the distributed cluster and to cache this data locally, both with the aim of improving performance. When used with OpenStack, Ceph performs default data striping, where the number and size of stripes are fixed and cannot be changed for the volume to be created. Similarly, Ceph does not perform data caching when integrated with OpenStack. In this project we outline and document the integration of NetApp and Ceph with OpenStack as well as benchmark the performance of the NetApp and Ceph clusters already present at CERN. To allow Ceph data striping, we modify OpenStack to take the number and size of stripes specified by the user and create volumes whose data is then striped according to those values. Similarly, we also modify OpenStack to enable Ceph caching and allow users to select the caching policy they require per volume. In this report, we describe how these features are implemented.
JF - CERN Openlab
PB - CERN
ER -
TY - CONF
T1 - V-BOINC: The Virtualization of BOINC
T2 - Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2013)
Y1 - 2013
A1 - McGilvary, Gary
A1 - Barker, Adam
A1 - Lloyd, Ashley
A1 - Atkinson, Malcolm
AB - The Berkeley Open Infrastructure for Network Computing (BOINC) is an open source client-server middleware system created to allow projects with large computational requirements, usually set in the scientific domain, to utilize a technically unlimited number of volunteer machines distributed over large physical distances. However, various problems exist when deploying applications over these heterogeneous machines using BOINC: applications must be ported to each machine architecture type, the project server must be trusted to supply authentic applications, applications that do not regularly checkpoint may lose execution progress upon volunteer machine termination, and applications that have dependencies may find it difficult to run under BOINC. To solve such problems we introduce virtual BOINC, or V-BOINC, where virtual machines are used to run computations on volunteer machines. Application developers can then compile their applications on a single architecture, checkpointing issues are solved through virtualization APIs and many security concerns are addressed via the virtual machine's sandbox environment. In this paper we outline a unique approach to introducing virtualization into BOINC and demonstrate that V-BOINC offers acceptable computational performance when compared to regular BOINC. Finally, we show that applications with dependencies can easily run under V-BOINC, in turn increasing the computational potential that volunteer computing offers to the general public and project developers. V-BOINC can be downloaded at http://garymcgilvary.co.uk/vboinc.html
JF - Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2013)
CY - Delft, The Netherlands
ER -
TY - CONF
T1 - Optimum Platform Selection and Configuration for Computational Jobs
T2 - All Hands Meeting 2011
Y1 - 2011
A1 - McGilvary, Gary
A1 - Atkinson, Malcolm
A1 - Barker, Adam
A1 - Lloyd, Ashley
AB - The performance of many scientific applications that execute on a variety of High Performance Computing (HPC), local cluster and cloud environments could be enhanced, and costs reduced, if the platform were carefully selected on a per-application basis and the application itself optimally configured for a given platform. With a wide variety of computing platforms on offer, each possessing different properties, all too frequently platform decisions are made on an ad hoc basis with limited ‘black-box’ information. The limitless number of possible application configurations also makes it difficult for an individual who wants to achieve cost-effective results with the maximum performance available. Such individuals may include biomedical researchers analysing microarray data, software developers running aviation simulations or bankers performing risk assessments. In each case, it is likely that many will not have the knowledge required to select the optimum platform and setup for their application; to do so would require extensive knowledge of both their applications and the various platforms. In this paper we describe a framework that aims to resolve such issues by (i) reducing the detail required in the decision-making process by placing this information within a selection framework, thereby (ii) maximising an application’s performance gain and/or reducing its costs. We present a set of preliminary results in which we compare the performance of running the Simple Parallel R INTerface (SPRINT) over a variety of platforms. SPRINT is a framework providing parallel functions of the statistical package R, allowing post-genomic data to be easily analysed on HPC resources [1]. We run SPRINT on Amazon’s Elastic Compute Cloud (EC2) to compare the performance with the results obtained from HECToR, the UK’s National Supercomputing Service, and the Edinburgh Compute and Data Facilities (ECDF) cluster.
JF - All Hands Meeting 2011
CY - York
ER -