Historical Interest Only

This is a static HTML version of an old Drupal site. The site is no longer maintained and could be deleted at any point. It is only here for historical interest.

Grid-enable A Biomedical Database

Student: 
Mark MacGillivray
Grade: 
first

The number of databases that contain biomedical data is increasing rapidly. Many of these databases are stand-alone and this makes it difficult for researchers to perform queries and analyses over data that spans multiple databases.

To make these queries and analyses possible, three essential principles need to be followed: (1) a uniform way must exist to access data, regardless of the form in which it is stored; (2) a mechanism must exist for formulating queries flexibly, so that researchers can explore new combinations of results from several sources; and (3) a reference system must be in place that allows researchers and tools to identify data correctly, in order to maintain consistent relationships between data items.
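As a rough illustration of these three principles, the following Python sketch wraps two differently stored sources behind one access method and combines them on a shared identifier. Everything in it is hypothetical and chosen only for illustration: the class names, the `join_on` helper, the `gene_id` field, and the example data are assumptions, not part of any real database described here.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, List, Protocol


class Source(Protocol):
    """Principle 1: every source exposes the same access method."""

    def records(self) -> Iterator[Dict[str, str]]: ...


class CsvSource:
    """Hypothetical source whose data is stored as CSV text."""

    def __init__(self, text: str) -> None:
        self._text = text

    def records(self) -> Iterator[Dict[str, str]]:
        yield from csv.DictReader(io.StringIO(self._text))


class SqlSource:
    """Hypothetical source whose data is stored in a relational table."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self._conn = conn
        self._table = table  # trusted table name, for illustration only

    def records(self) -> Iterator[Dict[str, str]]:
        cur = self._conn.execute(f"SELECT * FROM {self._table}")
        cols = [d[0] for d in cur.description]
        for row in cur:
            yield dict(zip(cols, row))


def join_on(left: Source, right: Source, key: str) -> List[Dict[str, str]]:
    """Principles 2 and 3: combine two sources on a shared identifier."""
    index = {r[key]: r for r in right.records()}
    return [{**l, **index[l[key]]} for l in left.records() if l[key] in index]


# Made-up example data; "gene_id" stands in for a shared reference identifier.
proteins = CsvSource("gene_id,protein\nG1,ProteinA\nG2,ProteinB\n")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diseases (gene_id TEXT, disease TEXT)")
conn.execute("INSERT INTO diseases VALUES ('G1', 'DiseaseX')")
diseases = SqlSource(conn, "diseases")

combined = join_on(proteins, diseases, "gene_id")
```

In this sketch the join only works because both sources use the same identifier for the same entity, which is the role the reference system plays in the third principle.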

The MRC Human Genetics Unit, based in Edinburgh, has a database that collects information about vertebrate proteins that are localised in the cell nucleus: the Nuclear Protein Database (https://npd.hgu.mrc.ac.uk/). The data is carefully curated by the group leader and contains many links to other biomedical resources such as Entrez, OMIM, and PubMed. Because of these links, the database adheres to the third principle. However, it does not adhere to the first principle, as it can be accessed only through a web page, using a simple text query.

In this project, you will Grid-enable this resource by making it available through a web service. A group at the University of Amsterdam wants to use this service through Taverna (http://taverna.sourceforge.net/), a workflow workbench for bioinformatics. They can supply you with queries that test whether the database provides useful ways of accessing the data it contains. One candidate technology for Grid-enabling is OGSA-DAI (http://www.ogsadai.org.uk/), which already integrates with Taverna. A secondary task is to take a closer look at the database and make it easier for the researchers themselves to manage, preferably through a web-based system for editing its content.

Project status: 
Finished
Degree level: 
MSc
Background: 
Practical experience with web services and databases essential. Knowledge of workflow concepts desirable.
Supervisors @ NeSC: 
Other supervisors: 
Marco Roos, Bioinformatician, Institute for Informatics, University of Amsterdam
Wendy Bickmore, Group leader, Human Genetics Unit, Medical Research Council
Subject areas: 
e-Science
Databases
Other
Student project type: