The Computational Genomics Laboratory is part of the Human Genetics Unit, the largest unit of the UK's Medical Research Council. Its focus is computational genomics and molecular evolution. The team develops modelling techniques and applies them to large quantities of experimental data in order to answer biomedical questions at the level of gene regulatory networks. These techniques are computationally demanding because they involve evaluating enormous numbers of interactions between genes.
This project will extend code developed by the team at the Computational Genomics Laboratory so that it runs as an application on the EGEE infrastructure. More specifically, the code, which is a combination of Perl and C, will be wrapped and then deployed using the latest gLite middleware. Measurements will be reported on how much more data exploration was achieved using the gLite approach than with the unit's in-house cluster facilities. EGEE will also be mentioned in resulting publications. The following paragraph, provided by the head of the group, Dr Colin Semple, explains the modelling technique and its biomedical goals in more detail.
Co-evolution can be defined as an unusual degree of similarity between the evolutionary histories of two genes, and has been used successfully to infer functional interactions between protein-coding genes. For instance, Hakes et al (2007) have recently shown that genes encoding interacting protein partners demonstrate correlated evolution in eukaryotes. These correlations can therefore be used to predict functionally interacting genes. However, it is worth remembering that only a small fraction (<5%) of the human genome is protein-coding, while more than half of the genome is transcribed into RNA (e.g. Carninci et al, 2005). The functional significance of such transcribed, non-coding regions is almost entirely unknown, but many are assumed to be involved in the regulation of protein-coding genes. We intend to examine co-evolution on a wider scale than has ever been attempted previously, by including not only the protein-coding regions of the genome but also the transcribed, non-coding regions. Such analyses will involve billions of comparisons among huge datasets of evolutionary rate vectors, making this work impossible with conventional computational resources. This work will undoubtedly reveal novel functional networks in the human genome and shed light on a central mystery in modern genomics: the function of non-coding RNA genes.
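To give a rough sense of what one such comparison might involve, the following is a minimal Python sketch. It is not the team's Perl and C code. It assumes that each genomic region is summarised by a vector of evolutionary rates, that similarity is measured with the Pearson correlation coefficient, and that a fixed threshold marks a candidate co-evolving pair. The source specifies none of these choices, and the names `pearson`, `coevolving_pairs` and `pair_count`, the threshold of 0.9 and the example region names are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length rate vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def coevolving_pairs(rates, threshold=0.9):
    """Return (region_a, region_b, r) for every pair with r >= threshold.

    `rates` maps a region name to its evolutionary rate vector.
    The threshold is an illustrative assumption, not a published value.
    """
    hits = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(rates.items()), 2):
        r = pearson(vec_a, vec_b)
        if r >= threshold:
            hits.append((name_a, name_b, r))
    return hits


def pair_count(n):
    """Number of unordered pairs among n regions: n(n-1)/2."""
    return n * (n - 1) // 2


# Toy data: three hypothetical regions with made-up rate vectors.
rates = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "ncRNA1": [4.0, 3.0, 2.0, 1.0],
}
hits = coevolving_pairs(rates)
```

Because every region is compared with every other, the number of comparisons grows quadratically: under this sketch, a hypothetical set of 100,000 regions would already require `pair_count(100000)`, roughly five billion, correlations. Each pair is independent of the others, which is why this kind of workload can be split across many grid jobs.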