
Historical Interest Only

This is a static HTML version of an old Drupal site. The site is no longer maintained and could be deleted at any point. It is only here for historical interest.

Large-scale data mining of chemical-genetic data sets

Primary objective: to perform data mining on a real-world data set from a biology lab in the School of Biological Sciences, with the aim of extracting patterns that lead to hypotheses about the mode of action of compounds and the function of genes.

Functional genomic screens, especially in budding yeast, generate huge amounts of data. The Tyers Lab (http://tyerslab.bio.ed.ac.uk/) generates chemical-genetic data to test the effect of small molecules on the growth of different yeast deletion mutants. Combined with data published by other labs, this data set is large enough for advanced data mining.

Possibilities for this data analysis range from different clustering algorithms to pattern matching and association analysis. Structural similarity calculations between compounds could also be included. The analysis should lead to the definition of a set of chemical-genetic signatures that are associated with specific effects on eukaryotic cells (such as novel detoxification pathways). In addition, the biologists in the lab are looking for new hypotheses about the mode of action of compounds and about the function of yeast genes that are as yet uncharacterized (up to 1000 of the 6000 yeast genes are still uncharacterized).
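As a rough illustration of two of the analyses mentioned above, the following sketch compares compounds by the correlation of their chemical-genetic profiles and by the Tanimoto similarity of structural fingerprints. It is only a minimal example under stated assumptions: the compound names, scores, threshold, and the representation of a fingerprint as a set of feature identifiers are all hypothetical and do not come from the Tyers Lab data.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: one growth score per yeast deletion mutant,
# where strongly negative values mean the mutant is sensitive to the compound.
profiles = {
    "compound_A": [-2.1, 0.1, -1.8, 0.0],
    "compound_B": [-1.9, 0.2, -2.0, 0.1],
    "compound_C": [0.1, -2.2, 0.0, -1.7],
}


def pearson(x, y):
    """Pearson correlation between two equally long score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(profiles, threshold=0.8):
    """Return compound pairs whose profiles correlate at or above the threshold."""
    return [
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if pearson(profiles[p], profiles[q]) >= threshold
    ]
```

Pairs returned by `similar_pairs` would be candidates for a shared mode of action, and `tanimoto` could be used to check whether such pairs are also structurally related. A real workflow would likely replace both functions with library implementations and a proper clustering step.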

In this project you are expected:
- To identify the data mining procedure and algorithms suitable for extracting patterns from these data.
- To identify solutions for handling the large amount of data (for example, distributed computing paradigms such as MapReduce).
- To develop the data mining workflow using existing or new implementations.
- To deliver this workflow in a way that lets the biologists interact with it and use it as a tool after the project.
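To make the MapReduce idea in the list above more concrete, here is a small single-process sketch of the map, shuffle, and reduce phases applied to chemical-genetic records. It is an illustration only: the record layout, the mutant and compound names, and the sensitivity threshold are hypothetical, and a real deployment would run the same logic on a distributed framework rather than in one Python process.

```python
from collections import defaultdict

# Hypothetical records: (compound, deletion mutant, growth score).
records = [
    ("compound_A", "mutant_1", -2.1),
    ("compound_A", "mutant_2", 0.1),
    ("compound_B", "mutant_1", -1.9),
    ("compound_C", "mutant_2", -2.2),
]


def map_phase(record, threshold=-1.5):
    """Emit (mutant, compound) when the mutant is sensitive to the compound."""
    compound, mutant, score = record
    if score <= threshold:
        yield mutant, compound


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(mutant, compounds):
    """Collapse the compounds seen for one mutant into a sorted unique list."""
    return mutant, sorted(set(compounds))


def run(records):
    """Run map, shuffle, and reduce in sequence on an in-memory record list."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(m, cs) for m, cs in shuffle(pairs).items())
```

Calling `run(records)` groups the compounds that inhibit each mutant, which is one simple way to start building per-gene signatures. Because each record is mapped independently and each key is reduced independently, both phases can be spread over many machines.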

Project status: 
Finished
Degree level: 
MSc
Background: 
Data mining / machine learning / data exploration essential. Distributed computing a major advantage. Experience with biology/bioinformatics desirable, but not essential as you can lean on the biologists' expertise.
Supervisors @ NeSC: 
Other supervisors: 
Jan Wildenhain, Tyers Lab, School of Biological Sciences (http://tyerslab.bio.ed.ac.uk/lisa/indPage.php?id=jwil315)
Michaela Spitzer, Tyers Lab, School of Biological Sciences
Subject areas: 
Bioinformatics
Distributed Systems
Machine Learning/Neural Networks/Connectionist Computing
WWW Tools and Programming
Student project type: