TY - JOUR
T1 - Automatically Identifying and Annotating Mouse Embryo Gene Expression Patterns
JF - Bioinformatics
Y1 - 2011
A1 - Liangxiu Han
A1 - van Hemert, Jano
A1 - Richard Baldock
KW - classification
KW - e-Science
AB - Motivation: Deciphering the regulatory and developmental mechanisms for multicellular organisms requires detailed knowledge of gene interactions and gene expressions. The availability of large datasets with both spatial and ontological annotation of the spatio-temporal patterns of gene-expression in mouse embryo provides a powerful resource to discover the biological function of embryo organisation. Ontological annotation of gene expressions consists of labelling images with terms from the anatomy ontology for mouse development. If the spatial genes of an anatomical component are expressed in an image, the image is then tagged with a term of that anatomical component. The current annotation is done manually by domain experts, which is both time consuming and costly. In addition, the level of detail is variable and, inevitably, errors arise from the tedious nature of the task. In this paper, we present a new method to automatically identify and annotate gene expression patterns in the mouse embryo with anatomical terms. Results: The method takes images from in situ hybridisation studies and the ontology for the developing mouse embryo; it then combines machine learning and image processing techniques to produce classifiers that automatically identify and annotate gene expression patterns in these images. We evaluate our method on image data from the EURExpress-II study where we use it to automatically classify nine anatomical terms: humerus, handplate, fibula, tibia, femur, ribs, petrous part, scapula and head mesenchyme. The accuracy of our method lies between 70–80% with few exceptions. Conclusions: We show that other known methods have lower classification performance than ours. We have investigated the images misclassified by our method and found several cases where the original annotation was not correct. This shows our method is robust against this kind of noise. Availability: The annotation result and the experimental dataset in the paper can be freely accessed at http://www2.docm.mmu.ac.uk/STAFF/L.Han/geneannotation/ Contact: l.han@mmu.ac.uk, j.vanhemert@ed.ac.uk and Richard.Baldock@hgu.mrc.ac.uk
VL - 27
UR - http://bioinformatics.oxfordjournals.org/content/early/2011/02/25/bioinformatics.btr105.abstract
ER -

TY - JOUR
T1 - Managing dynamic enterprise and urgent workloads on clouds using layered queuing and historical performance models
JF - Simulation Modelling Practice and Theory
Y1 - 2011
A1 - David A. Bacigalupo
A1 - van Hemert, Jano I.
A1 - Xiaoyu Chen
A1 - Asif Usmani
A1 - Adam P. Chester
A1 - Ligang He
A1 - Donna N. Dillenberger
A1 - Gary B. Wills
A1 - Lester Gilbert
A1 - Stephen A. Jarvis
KW - e-Science
AB - The automatic allocation of enterprise workload to resources can be enhanced by being able to make what–if response time predictions whilst different allocations are being considered. We experimentally investigate an historical and a layered queuing performance model and show how they can provide a good level of support for a dynamic-urgent cloud environment. Using this we define, implement and experimentally investigate the effectiveness of a prediction-based cloud workload and resource management algorithm. Based on these experimental analyses we: (i) comparatively evaluate the layered queuing and historical techniques; (ii) evaluate the effectiveness of the management algorithm in different operating scenarios; and (iii) provide guidance on using prediction-based workload and resource management.
VL - 19
ER -

TY - JOUR
T1 - A user-friendly web portal for T-Coffee on supercomputers
JF - BMC Bioinformatics
Y1 - 2011
A1 - J. Rius
A1 - F. Cores
A1 - F. Solsona
A1 - van Hemert, J. I.
A1 - Koetsier, J.
A1 - C. Notredame
KW - e-Science
KW - portal
KW - rapid
AB - Background: Parallel T-Coffee (PTC) was the first parallel implementation of the T-Coffee multiple sequence alignment tool. It is based on MPI and RMA mechanisms. Its purpose is to reduce the execution time of large-scale sequence alignments. It can be run on distributed memory clusters allowing users to align data sets consisting of hundreds of proteins within a reasonable time. However, most of the potential users of this tool are not familiar with the use of grids or supercomputers. Results: In this paper we show how PTC can be easily deployed and controlled on a supercomputer architecture using a web portal developed using Rapid. Rapid is a tool for efficiently generating standardized portlets for a wide range of applications and the approach described here is generic enough to be applied to other applications, or to deploy PTC on different HPC environments. Conclusions: The PTC portal allows users to upload a large number of sequences to be aligned by the parallel version of TC that cannot be aligned by a single machine due to memory and execution time constraints. The web portal provides a user-friendly solution.
VL - 12
UR - http://www.biomedcentral.com/1471-2105/12/150
ER -

TY - CONF
T1 - Resource management of enterprise cloud systems using layered queuing and historical performance models
T2 - IEEE International Symposium on Parallel & Distributed Processing
Y1 - 2010
A1 - Bacigalupo, D. A.
A1 - van Hemert, J.
A1 - Usmani, A.
A1 - Dillenberger, D. N.
A1 - Wills, G. B.
A1 - Jarvis, S. A.
KW - e-Science
AB - The automatic allocation of enterprise workload to resources can be enhanced by being able to make 'what-if' response time predictions, whilst different allocations are being considered. It is important to quantitatively compare the effectiveness of different prediction techniques for use in cloud infrastructures. To help make the comparison of relevance to a wide range of possible cloud environments it is useful to consider the following. 1.) Urgent cloud customers such as the emergency services that can demand cloud resources at short notice (e.g. for our FireGrid emergency response software). 2.) Dynamic enterprise systems, that must rapidly adapt to frequent changes in workload, system configuration and/or available cloud servers. 3.) The use of the predictions in a coordinated manner by both the cloud infrastructure and cloud customer management systems. 4.) A broad range of criteria for evaluating each technique. However, there have been no previous comparisons meeting these requirements. This paper, meeting the above requirements, quantitatively compares the layered queuing and ("HYDRA") historical techniques - including our initial thoughts on how they could be combined. Supporting results and experiments include the following: i.) defining, investigating and hence providing guidelines on the use of a historical and layered queuing model; ii.) using these guidelines showing that both techniques can make low overhead and typically over 70% accurate predictions, for new server architectures for which only a small number of benchmarks have been run; and iii.) defining and investigating tuning a prediction-based cloud workload and resource management algorithm.
JF - IEEE International Symposium on Parallel & Distributed Processing
ER -

TY - JOUR
T1 - Towards a Virtual Fly Brain
JF - Philosophical Transactions A
Y1 - 2009
A1 - Armstrong, J. D.
A1 - van Hemert, J. I.
KW - e-Science
AB - Models of the brain that simulate sensory input, behavioural output and information processing in a biologically plausible manner pose significant challenges to both Computer Science and Biology. Here we investigated strategies that could be used to create a model of the insect brain, specifically that of Drosophila melanogaster which is very widely used in laboratory research. The scale of the problem is an order of magnitude above the most complex of the current simulation projects and it is further constrained by the relative sparsity of available electrophysiological recordings from the fly nervous system. However, fly brain research at the anatomical and behavioural level offers some interesting opportunities that could be exploited to create a functional simulation. We propose to exploit these strengths of Drosophila CNS research to focus on a functional model that maps biologically plausible network architecture onto phenotypic data from neuronal inhibition and stimulation studies, leaving aside biophysical modelling of individual neuronal activity for future models until more data is available.
VL - 367
UR - http://rsta.royalsocietypublishing.org/content/367/1896/2387.abstract
ER -

TY - CONF
T1 - Matching Spatial Regions with Combinations of Interacting Gene Expression Patterns
T2 - Communications in Computer and Information Science
Y1 - 2008
A1 - van Hemert, J. I.
A1 - Baldock, R. A.
ED - M. Elloumi
ED - et al.
KW - biomedical
KW - data mining
KW - DGEMap
KW - e-Science
AB - The Edinburgh Mouse Atlas aims to capture in-situ gene expression patterns in a common spatial framework. In this study, we construct a grammar to define spatial regions by combinations of these patterns. Combinations are formed by applying operators to curated gene expression patterns from the atlas, thereby resembling gene interactions in a spatial context. The space of combinations is searched using an evolutionary algorithm with the objective of finding the best match to a given target pattern. We evaluate the method by testing its robustness and the statistical significance of the results it finds.
JF - Communications in Computer and Information Science
PB - Springer Verlag
ER -

TY - CONF
T1 - Scientific Workflow: A Survey and Research Directions
T2 - Lecture Notes in Computer Science
Y1 - 2008
A1 - Barker, Adam
A1 - van Hemert, Jano
KW - e-Science
KW - workflow
AB - Workflow technologies are emerging as the dominant approach to coordinate groups of distributed services. However, with a space filled with competing specifications, standards and frameworks from multiple domains, choosing the right tool for the job is not always a straightforward task. Researchers are often unaware of the range of technology that already exists and focus on implementing yet another proprietary workflow system. As an antidote to this common problem, this paper presents a concise survey of existing workflow technology from the business and scientific domain and makes a number of key suggestions towards the future development of scientific workflow systems.
JF - Lecture Notes in Computer Science
PB - Springer
VL - 4967
UR - http://dx.doi.org/10.1007/978-3-540-68111-3_78
ER -

TY - CONF
T1 - Data Integration in eHealth: A Domain/Disease Specific Roadmap
T2 - Studies in Health Technology and Informatics
Y1 - 2007
A1 - Ure, J.
A1 - Proctor, R.
A1 - Martone, M.
A1 - Porteous, D.
A1 - Lloyd, S.
A1 - Lawrie, S.
A1 - Job, D.
A1 - Baldock, R.
A1 - Philp, A.
A1 - Liewald, D.
A1 - Rakebrand, F.
A1 - Blaikie, A.
A1 - McKay, C.
A1 - Anderson, S.
A1 - Ainsworth, J.
A1 - van Hemert, J.
A1 - Blanquer, I.
A1 - Sinno
ED - N. Jacq
ED - Y. Legré
ED - H. Muller
ED - I. Blanquer
ED - V. Breton
ED - D. Hausser
ED - V. Hernández
ED - T. Solomonides
ED - M. Hofman-Apitius
KW - e-Science
AB - The paper documents a series of data integration workshops held in 2006 at the UK National e-Science Centre, summarizing a range of the problem/solution scenarios in multi-site and multi-scale data integration with six HealthGrid projects using schizophrenia as a domain-specific test case. It outlines emerging strategies, recommendations and objectives for collaboration on shared ontology-building and harmonization of data for multi-site trials in this domain.
JF - Studies in Health Technology and Informatics
PB - IOS Press
VL - 126
SN - 978-1-58603-738-3
ER -

TY - CONF
T1 - Mining spatial gene expression data for association rules
T2 - Lecture Notes in Bioinformatics
Y1 - 2007
A1 - van Hemert, J. I.
A1 - Baldock, R. A.
ED - S. Hochreiter
ED - R. Wagner
KW - biomedical
KW - data mining
KW - DGEMap
KW - e-Science
AB - We analyse data from the Edinburgh Mouse Atlas Gene-Expression Database (EMAGE) which is a high-quality data source for spatio-temporal gene expression patterns. Using a novel process whereby generated patterns are used to probe spatially-mapped gene expression domains, we are able to get unbiased results as opposed to using annotations based on predefined anatomy regions. We describe two processes to form association rules based on spatial configurations, one that associates spatial regions, the other associates genes.
JF - Lecture Notes in Bioinformatics
PB - Springer Verlag
UR - http://dx.doi.org/10.1007/978-3-540-71233-6_6
ER -