TY  - JOUR
T1  - Automatically Identifying and Annotating Mouse Embryo Gene Expression Patterns
JF  - Bioinformatics
Y1  - 2011
A1  - Han, Liangxiu
A1  - van Hemert, Jano
A1  - Baldock, Richard
KW  - classification
KW  - e-Science
AB  - Motivation: Deciphering the regulatory and developmental mechanisms of multicellular organisms requires detailed knowledge of gene interactions and gene expression. The availability of large datasets with both spatial and ontological annotation of the spatio-temporal patterns of gene expression in the mouse embryo provides a powerful resource for discovering the biological function of embryo organisation. Ontological annotation of gene expression consists of labelling images with terms from the anatomy ontology for mouse development: if an image shows gene expression in an anatomical component, the image is tagged with the term for that component. Annotation is currently done manually by domain experts, which is both time-consuming and costly. In addition, the level of detail is variable, and errors inevitably arise from the tedious nature of the task. In this paper, we present a new method to automatically identify and annotate gene expression patterns in the mouse embryo with anatomical terms. Results: The method takes images from in situ hybridisation studies and the ontology for the developing mouse embryo; it then combines machine learning and image processing techniques to produce classifiers that automatically identify and annotate gene expression patterns in these images. We evaluate our method on image data from the EURExpress-II study, where we use it to automatically classify nine anatomical terms: humerus, handplate, fibula, tibia, femur, ribs, petrous part, scapula and head mesenchyme. With few exceptions, the accuracy of our method lies between 70% and 80%. Conclusions: We show that other known methods have lower classification performance than ours. We investigated the images misclassified by our method and found several cases where the original annotation was incorrect. This shows our method is robust against this kind of noise. Availability: The annotation results and the experimental dataset used in the paper can be freely accessed at http://www2.docm.mmu.ac.uk/STAFF/L.Han/geneannotation/ Contact: l.han@mmu.ac.uk, j.vanhemert@ed.ac.uk and Richard.Baldock@hgu.mrc.ac.uk
VL  - 27
UR  - http://bioinformatics.oxfordjournals.org/content/early/2011/02/25/bioinformatics.btr105.abstract
ER  - 

TY  - JOUR
T1  - Managing dynamic enterprise and urgent workloads on clouds using layered queuing and historical performance models
JF  - Simulation Modelling Practice and Theory
Y1  - 2011
A1  - Bacigalupo, David A.
A1  - van Hemert, Jano I.
A1  - Chen, Xiaoyu
A1  - Usmani, Asif
A1  - Chester, Adam P.
A1  - He, Ligang
A1  - Dillenberger, Donna N.
A1  - Wills, Gary B.
A1  - Gilbert, Lester
A1  - Jarvis, Stephen A.
KW  - e-Science
AB  - The automatic allocation of enterprise workload to resources can be enhanced by being able to make what-if response time predictions whilst different allocations are being considered. We experimentally investigate a historical and a layered queuing performance model and show how they can provide a good level of support for a dynamic-urgent cloud environment. Using this, we define, implement and experimentally investigate the effectiveness of a prediction-based cloud workload and resource management algorithm. Based on these experimental analyses we: (i) comparatively evaluate the layered queuing and historical techniques; (ii) evaluate the effectiveness of the management algorithm in different operating scenarios; and (iii) provide guidance on using prediction-based workload and resource management.
VL  - 19
ER  - 

TY  - JOUR
T1  - A user-friendly web portal for T-Coffee on supercomputers
JF  - BMC Bioinformatics
Y1  - 2011
A1  - Rius, J.
A1  - Cores, F.
A1  - Solsona, F.
A1  - van Hemert, J. I.
A1  - Koetsier, J.
A1  - Notredame, C.
KW  - e-Science
KW  - portal
KW  - rapid
AB  - Background: Parallel T-Coffee (PTC) was the first parallel implementation of the T-Coffee multiple sequence alignment tool. It is based on MPI and RMA mechanisms, and its purpose is to reduce the execution time of large-scale sequence alignments. It can be run on distributed-memory clusters, allowing users to align data sets consisting of hundreds of proteins within a reasonable time. However, most potential users of this tool are not familiar with the use of grids or supercomputers. Results: In this paper we show how PTC can be easily deployed and controlled on a supercomputer architecture using a web portal developed with Rapid. Rapid is a tool for efficiently generating standardized portlets for a wide range of applications, and the approach described here is generic enough to be applied to other applications or to deploy PTC on different HPC environments. Conclusions: The PTC portal allows users to upload large numbers of sequences for alignment by the parallel version of T-Coffee, sequence sets that cannot be aligned on a single machine because of memory and execution-time constraints. The web portal provides a user-friendly solution.
VL  - 12
UR  - http://www.biomedcentral.com/1471-2105/12/150
ER  - 

TY  - CONF
T1  - Resource management of enterprise cloud systems using layered queuing and historical performance models
T2  - IEEE International Symposium on Parallel & Distributed Processing
Y1  - 2010
A1  - Bacigalupo, D. A.
A1  - van Hemert, J.
A1  - Usmani, A.
A1  - Dillenberger, D. N.
A1  - Wills, G. B.
A1  - Jarvis, S. A.
KW  - e-Science
AB  - The automatic allocation of enterprise workload to resources can be enhanced by being able to make 'what-if' response time predictions whilst different allocations are being considered. It is important to quantitatively compare the effectiveness of different prediction techniques for use in cloud infrastructures. To make the comparison relevant to a wide range of possible cloud environments, it is useful to consider the following: (1) urgent cloud customers, such as the emergency services, that can demand cloud resources at short notice (e.g. for our FireGrid emergency response software); (2) dynamic enterprise systems that must rapidly adapt to frequent changes in workload, system configuration and/or available cloud servers; (3) the use of the predictions in a coordinated manner by both the cloud infrastructure and cloud customer management systems; and (4) a broad range of criteria for evaluating each technique. However, there have been no previous comparisons meeting these requirements. This paper, meeting the above requirements, quantitatively compares the layered queuing and 'HYDRA' historical techniques, including our initial thoughts on how they could be combined. Supporting results and experiments include the following: (i) defining, investigating and hence providing guidelines on the use of a historical and a layered queuing model; (ii) using these guidelines, showing that both techniques can make low-overhead and typically over 70% accurate predictions for new server architectures for which only a small number of benchmarks have been run; and (iii) defining and investigating the tuning of a prediction-based cloud workload and resource management algorithm.
JF  - IEEE International Symposium on Parallel & Distributed Processing
ER  - 

TY  - JOUR
T1  - Towards a Virtual Fly Brain
JF  - Philosophical Transactions of the Royal Society A
Y1  - 2009
A1  - Armstrong, J. D.
A1  - van Hemert, J. I.
KW  - e-Science
AB  - Models of the brain that simulate sensory input, behavioural output and information processing in a biologically plausible manner pose significant challenges to both Computer Science and Biology. Here we investigated strategies that could be used to create a model of the insect brain, specifically that of Drosophila melanogaster, which is very widely used in laboratory research. The scale of the problem is an order of magnitude above the most complex of the current simulation projects, and it is further constrained by the relative sparsity of available electrophysiological recordings from the fly nervous system. However, fly brain research at the anatomical and behavioural level offers some interesting opportunities that could be exploited to create a functional simulation. We propose to exploit these strengths of Drosophila CNS research to focus on a functional model that maps biologically plausible network architecture onto phenotypic data from neuronal inhibition and stimulation studies, leaving biophysical modelling of individual neuronal activity to future models until more data are available.
VL  - 367
UR  - http://rsta.royalsocietypublishing.org/content/367/1896/2387.abstract
ER  - 

TY  - CONF
T1  - Matching Spatial Regions with Combinations of Interacting Gene Expression Patterns
T2  - Communications in Computer and Information Science
Y1  - 2008
A1  - van Hemert, J. I.
A1  - Baldock, R. A.
ED  - Elloumi, M.
ED  - et al.
KW  - biomedical
KW  - data mining
KW  - DGEMap
KW  - e-Science
AB  - The Edinburgh Mouse Atlas aims to capture in-situ gene expression patterns in a common spatial framework. In this study, we construct a grammar to define spatial regions by combinations of these patterns. Combinations are formed by applying operators to curated gene expression patterns from the atlas, thereby resembling gene interactions in a spatial context. The space of combinations is searched using an evolutionary algorithm with the objective of finding the best match to a given target pattern. We evaluate the method by testing its robustness and the statistical significance of the results it finds.
JF  - Communications in Computer and Information Science
PB  - Springer Verlag
ER  - 

TY  - CONF
T1  - Scientific Workflow: A Survey and Research Directions
T2  - Lecture Notes in Computer Science
Y1  - 2008
A1  - Barker, Adam
A1  - van Hemert, Jano
KW  - e-Science
KW  - workflow
AB  - Workflow technologies are emerging as the dominant approach to coordinating groups of distributed services. However, with a space filled with competing specifications, standards and frameworks from multiple domains, choosing the right tool for the job is not always straightforward. Researchers are often unaware of the range of technology that already exists and focus on implementing yet another proprietary workflow system. As an antidote to this common problem, this paper presents a concise survey of existing workflow technology from the business and scientific domains and makes a number of key suggestions for the future development of scientific workflow systems.
JF  - Lecture Notes in Computer Science
PB  - Springer
VL  - 4967
UR  - http://dx.doi.org/10.1007/978-3-540-68111-3_78
ER  - 

TY  - CONF
T1  - Data Integration in eHealth: A Domain/Disease Specific Roadmap
T2  - Studies in Health Technology and Informatics
Y1  - 2007
A1  - Ure, J.
A1  - Proctor, R.
A1  - Martone, M.
A1  - Porteous, D.
A1  - Lloyd, S.
A1  - Lawrie, S.
A1  - Job, D.
A1  - Baldock, R.
A1  - Philp, A.
A1  - Liewald, D.
A1  - Rakebrand, F.
A1  - Blaikie, A.
A1  - McKay, C.
A1  - Anderson, S.
A1  - Ainsworth, J.
A1  - van Hemert, J.
A1  - Blanquer, I.
A1  - Sinno
ED  - Jacq, N.
ED  - Legré, Y.
ED  - Muller, H.
ED  - Blanquer, I.
ED  - Breton, V.
ED  - Hausser, D.
ED  - Hernández, V.
ED  - Solomonides, T.
ED  - Hofman-Apitius, M.
KW  - e-Science
AB  - The paper documents a series of data integration workshops held in 2006 at the UK National e-Science Centre, summarizing a range of problem/solution scenarios in multi-site and multi-scale data integration with six HealthGrid projects, using schizophrenia as a domain-specific test case. It outlines emerging strategies, recommendations and objectives for collaboration on shared ontology building and harmonization of data for multi-site trials in this domain.
JF  - Studies in Health Technology and Informatics
PB  - IOS Press
VL  - 126
SN  - 978-1-58603-738-3
ER  - 

TY  - CONF
T1  - Mining spatial gene expression data for association rules
T2  - Lecture Notes in Bioinformatics
Y1  - 2007
A1  - van Hemert, J. I.
A1  - Baldock, R. A.
ED  - Hochreiter, S.
ED  - Wagner, R.
KW  - biomedical
KW  - data mining
KW  - DGEMap
KW  - e-Science
AB  - We analyse data from the Edinburgh Mouse Atlas Gene-Expression Database (EMAGE), which is a high-quality data source for spatio-temporal gene expression patterns. Using a novel process whereby generated patterns are used to probe spatially mapped gene expression domains, we are able to obtain unbiased results, as opposed to using annotations based on predefined anatomy regions. We describe two processes to form association rules based on spatial configurations: one that associates spatial regions and another that associates genes.
JF  - Lecture Notes in Bioinformatics
PB  - Springer Verlag
UR  - http://dx.doi.org/10.1007/978-3-540-71233-6_6
ER  -