Grid-enable A Biomedical Database

28 November 2008 - 11:03am — Jano.van.Hemert

Student:

Mark MacGillivray

Grade:

first

The number of databases that contain biomedical data is increasing rapidly. Many of these databases are stand-alone and this makes it difficult for researchers to perform queries and analyses over data that spans multiple databases.

To make these queries and analyses possible, three essential principles need to be followed: 1. a uniform way must exist to access data, regardless of the form in which it is stored; 2. a mechanism must exist by which queries can be formulated flexibly, so that researchers can explore new combinations of results from several sources; 3. a reference system must be in place that allows researchers and tools to identify data correctly, in order to maintain consistent relationships between data items.
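As a rough illustration of the three principles, the sketch below puts two differently stored sources behind one access call, queries them by field, and joins the results on a shared identifier. It is a minimal sketch under stated assumptions: the class names, the `protein` table, the field names, and the identifiers `X1` and `X2` are all hypothetical and are not taken from any real database schema.

```python
import sqlite3
from typing import Dict, List, Protocol

Record = Dict[str, str]


class Source(Protocol):
    """Principle 1: every source answers the same call, whatever its storage."""

    def find(self, field: str, value: str) -> List[Record]: ...


class SqlSource:
    """Hypothetical source backed by a relational table named 'protein'."""

    FIELDS = {"xref", "name", "compartment"}

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.row_factory = sqlite3.Row

    def find(self, field: str, value: str) -> List[Record]:
        # Only whitelisted column names are interpolated; the value is bound.
        if field not in self.FIELDS:
            raise ValueError(f"unknown field: {field}")
        rows = self.conn.execute(
            f"SELECT xref, name, compartment FROM protein WHERE {field} = ?",
            (value,),
        )
        return [dict(row) for row in rows]


class ListSource:
    """Hypothetical source backed by in-memory records, e.g. a parsed flat file."""

    def __init__(self, records: List[Record]) -> None:
        self.records = records

    def find(self, field: str, value: str) -> List[Record]:
        return [r for r in self.records if r.get(field) == value]


def join_on_xref(left: List[Record], right: List[Record]) -> List[Record]:
    """Principle 3: combine records that share the same external identifier."""
    by_xref = {r["xref"]: r for r in right}
    return [{**rec, **by_xref[rec["xref"]]} for rec in left if rec["xref"] in by_xref]


def build_demo():
    """Create two small example sources with made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE protein (xref TEXT, name TEXT, compartment TEXT)")
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [("X1", "protein A", "nucleolus"), ("X2", "protein B", "nuclear envelope")],
    )
    diseases = ListSource([{"xref": "X1", "disease": "example disorder"}])
    return SqlSource(conn), diseases


if __name__ == "__main__":
    proteins, diseases = build_demo()
    # Principle 2: the same field/value query works on either source,
    # so results from several sources can be combined freely.
    hits = proteins.find("compartment", "nucleolus")
    print(join_on_xref(hits, diseases.find("xref", "X1")))
```

The join only works because both sources use the same identifier for the same protein, which is why the reference system matters as much as the uniform access call.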

The MRC Human Genetics Unit based in Edinburgh has a database that collects information about vertebrate proteins that are localised in the cell nucleus: the Nuclear Protein Database (https://npd.hgu.mrc.ac.uk/). The data is carefully curated by the group leader and contains many links to other biomedical resources such as Entrez, OMIM, and PubMed. This means the database adheres to the third principle. However, it does not adhere to the first principle, as it can be accessed only through a web page via a simple text query.

In this project, you will Grid-enable this resource by making it available through a web service. A group at the University of Amsterdam wants to make use of this service through Taverna (http://taverna.sourceforge.net/), a workflow workbench for bioinformatics. They can supply you with queries that test whether the database provides useful ways of accessing the data it contains. One candidate technology for Grid-enabling is OGSA-DAI (http://www.ogsadai.org.uk/), which already integrates with Taverna. A secondary task is to take a closer look at the database and make it easier for the researchers themselves to manage, preferably through a web-based system for editing its content.

Project status:

Finished

Degree level:

MSc

Background:

Practical experience with web services and databases essential. Knowledge of workflow concepts desirable.

Supervisors @ NeSC:

Jano.van.Hemert

Other supervisors:

Marco Roos, Bioinformatician, Institute for Informatics, University of Amsterdam

Wendy Bickmore, Group leader, Human Genetics Unit, Medical Research Council

Subject areas:

e-Science

Databases

Other

Student project type:

MSc student project

Historical Interest Only