MSc student project

References:

Fung, Benjamin C. M. and Wang, Ke and Chen, Rui and Yu, Philip S. "Privacy-preserving data publishing: A survey of recent developments". ACM Computing Surveys, Vol. 42, No. 4, Article 14.

B.-C. Chen, D. Kifer, K. LeFevre and A. Machanavajjhala. "Privacy-Preserving Data Publishing". Foundations and Trends in Databases, Vol. 2, Nos. 1–2 (2009), 1–167.

L. Sweeney. "k-Anonymity: a model for protecting privacy". International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), pages 557–570, 2002.

P. Samarati. "Protecting respondents' identities in microdata release". IEEE Transactions on Knowledge and Data Engineering, 13(6):1010–1027, 2001.

Read more about Privacy Protection for a Brain Imaging Databank

Investigating the Rule Construction Mechanism in Ant-Miner

20 January 2011 - 2:34pm — Paolo.Besana

Student:

Hariharan Anantharaman

This project will appeal to you if you are interested in Learning from Data and Nature-Inspired Computation.

Project status:

Finished

Degree level:

MSc

Supervisors @ NeSC:

Michelle.Galea

David.Rodriguez

Student project type:


Investigating Array Databases for Managing Climate Data

20 January 2011 - 2:31pm — Paolo.Besana

Student:

Jian Qiang

This is a challenging project and will appeal to students keen to make a contribution in the areas of scientific databases and geoinformatics.

Project status:

Finished

Degree level:

MSc

Supervisors @ NeSC:

Michelle.Galea

Paolo.Besana

Jos.Koetsier

Subject areas:

Databases

Software Engineering

Student project type:


De-identification of faces in 2D DICOM images

19 January 2011 - 9:28pm — Paolo.Besana

With the increasing resolution of MR and CT scans, it has become feasible to reconstruct detailed 3D images of faces.

Usually face de-identification in medical imaging is done after the reconstruction, i.e. in 3D (see references). Different techniques are used to this end including brain extraction, removal of facial features and deformation of the face surface.

Project status:

Still available

Degree level:

MSc

Supervisors @ NeSC:

David.Rodriguez

Other supervisors:

Trevor Carpenter

Subject areas:

Machine Learning/Neural Networks/Connectionist Computing

Student project type:


Scientific applications: exploiting the data bonanza. The microscopy case.

19 January 2011 - 9:25pm — Paolo.Besana

The aim of the project is to perform some exploratory work on how to deal with the problem of I/O-bound processing, by implementing technology-specific components in a provided system. The goal is to distribute data and processing so that a CPU processes data locally, minimising data transfer. The assumption is that I/O is the major bottleneck in processing, and that computation could be done with less powerful (greener and cheaper) CPUs, rather than with a powerful CPU that wastes energy waiting for data. Different technologies for storing and processing the data can be explored.
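
The idea of processing data where it is stored can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Node` class, the `local_summary` and `global_mean` names, and the choice of a mean as the computation are hypothetical and are not taken from the provided system. Each node reduces its own partition to a small summary, so only a pair of numbers per node has to be transferred rather than the raw values.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical storage node holding one partition of the data."""
    values: list

    def local_summary(self):
        # Runs where the data lives and returns only two numbers.
        return len(self.values), sum(self.values)


def global_mean(nodes):
    """Combine per-node summaries; raw values never leave their node."""
    summaries = [node.local_summary() for node in nodes]
    total_count = sum(count for count, _ in summaries)
    total_sum = sum(partial for _, partial in summaries)
    return total_sum / total_count


# Three partitions of made-up measurements, one per node.
nodes = [Node([1.0, 2.0, 3.0]), Node([4.0, 5.0]), Node([6.0])]
mean = global_mean(nodes)
```

In this toy setting the amount of data moved is proportional to the number of nodes, not to the number of measurements, which is the property the project description is aiming for.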

Project status:

Finished

Degree level:

MSc

Supervisors @ NeSC:

Paolo.Besana

Subject areas:

e-Science

Databases

Distributed Systems

Software Engineering

Student project type:


Computing the best answer you can afford

19 January 2011 - 9:21pm — Paolo.Besana

We are building a data-intensive machine as a research platform to explore data-intensive computational strategies. We are interested in computations over large bodies of data, where data handling is a dominant issue. Computational challenges with these properties are becoming ever more prevalent as digital sensors and computational/societal data sources become cheaper, more powerful and more ubiquitous. The use of algorithms over such data is of growing importance in medicine, planning, engineering, policy and science.

Project status:

Still available

Degree level:

MSc

Supervisors @ NeSC:

Paolo.Besana

Subject areas:

e-Science

Algorithm Design

Student project type:


Runoff prediction from a Hydrologic Spatio-Temporal Database

5 April 2010 - 2:13pm — Jano.van.Hemert

Student:

Charalampos Sfyrakis

Grade:

first

Present-day instrumentation networks in rivers provide huge quantities of multi-dimensional data. Although there are numerous machine learning tools that can extract trends, find patterns and predict future states from data, it is crucial to tune these techniques to the semantic content of the data. Hydrology is a data-intensive science, which requires efficient mining of trajectories of measurements taken at different time points and positions.

Project status:

Finished

Degree level:

MSc

Background:

data mining

Supervisors @ NeSC:

Subject areas:

e-Science

Machine Learning/Neural Networks/Connectionist Computing

Projects:

ADMIRE

Student project type:


Accelerating Genome-Wide Association Studies with Graphics Processors

22 January 2010 - 1:04pm — Jano.van.Hemert

Student:

Jeff Poznanovic

Grade:

first

Principal goal: to substantially improve the performance of the data-intensive analysis for genome-wide association studies (GWAS) by using graphics processing units (GPUs).

Project status:

Finished

Degree level:

MSc

Supervisors @ NeSC:

Other supervisors:

Dave Liewald, Centre for Cognitive Ageing and Cognitive Epidemiology

Gail Davies, Centre for Cognitive Ageing and Cognitive Epidemiology

Subject areas:

e-Science

Bioinformatics

Computer Architecture

Distributed Systems

Parallel Programming

Student project type:

References:

NIH National Human Genome Research Institute, "Genome-wide association studies," http://www.genome.gov/20019523

PLINK, http://pngu.mgh.harvard.edu/~purcell/plink

CUDA, http://www.nvidia.com/object/cuda_home.html

OpenCL, http://www.khronos.org/opencl/


Parameter fitting of cosmological models using billions of galaxies

15 January 2010 - 7:45pm — Jano.van.Hemert

Student:

Martha Axiak

Grade:

first

Principal goal: to develop, test and make available to the cosmology community a parameter estimation method for models that explain our dark Universe.

Project status:

Finished

Degree level:

MSc

Background:

Evolutionary computation, optimisation, machine learning and/or statistics are all desirable.

Supervisors @ NeSC:

Other supervisors:

Tom Kitching, Institute for Astronomy, Edinburgh; tdk@roe.ac.uk, tom.kitching@googlemail.com

Subject areas:

Genetic Algorithms/Evolutionary Computing

Machine Learning/Neural Networks/Connectionist Computing

WWW Tools and Programming

Student project type:

References:

There is a good review of the statistical methods used in cosmology, with some further references suggested, here: http://xxx.lanl.gov/abs/0911.3105 (chapter 13 discusses the Monte Carlo methods we use).

The standard tool for cosmological parameter estimation is cosmomc, which is here: http://cosmologist.info/cosmomc/

The original paper for this is here: http://arxiv.org/abs/astro-ph/0205436 and the first application is here: http://arxiv.org/abs/astro-ph/0302306

A slightly more advanced nested sampling method is called multinest, which is described here: http://xxx.lanl.gov/abs/0809.3437

A general discussion of the current status of cosmology is here: http://xxx.lanl.gov/abs/astro-ph/0610906 (be warned that there are some technical details and a lot of acronyms).


Data mining to identify small molecules with bioactivity

15 January 2010 - 7:31pm — Jano.van.Hemert

Student:

Gideon Jansen Van Vuuren

Grade:

first

Principal goal: to apply machine learning to identify small molecules that are likely to have relevant bioactivity, as candidates for follow-up wet-lab experiments.

Project status:

Finished

Degree level:

MSc

Background:

Machine learning essential, biology/bioinformatics desirable.

Supervisors @ NeSC:

Other supervisors:

Jan Wildenhain, Tyers Lab, School of Biological Sciences (http://tyerslab.bio.ed.ac.uk/lisa/indPage.php?id=jwil315)

Michaela Spitzer, Tyers Lab, School of Biological Sciences

Subject areas:

Bioinformatics

Machine Learning/Neural Networks/Connectionist Computing

Student project type:


Optimising Data-Streaming Elements in Distributed Data Mining

15 January 2010 - 2:59pm — Jano.van.Hemert

Principal goal: to evaluate the existing data-streaming implementation, formulate a model that predicts streaming performance for a given buffering strategy, and then optimise data streaming with a dynamic buffering implementation.

Project status:

Finished

Degree level:

MSc

Background:

Java programming essential. Distributed/parallel computing desirable.

Supervisors @ NeSC:

Chee.Sun.Liew

Subject areas:

Computer Architecture

Distributed Systems

Parallel Programming

Performance Modelling and Simulation

System Level Integration

Projects:

ADMIRE

Student project type:

References:

[1] M. Atkinson, P. Brezany, O. Corcho, L. Han, J. van Hemert, L. Hluchý, A. Hume, I. Janciak, A. Krause, and D. Snelling. ADMIRE White Paper: Motivation, Strategy, Overview and Impact. Technical Report version 0.9, ADMIRE, EPCC, University of Edinburgh, January 2009.

[2] A. Jacobs. The pathologies of big data. Commun. ACM, 52(8):36–44, 2009.


Improving Data Placement Strategy in Data-intensive Computations

15 January 2010 - 2:57pm — Jano.van.Hemert

Student:

Yue Ma

Grade:

third

Principal goal: to investigate existing data placement strategies and build a decision model that improves data placement when enacting data-intensive workflows.

Project status:

Finished

Degree level:

MSc

Background:

Distributed/parallel computing, databases desirable. Java programming essential.

Supervisors @ NeSC:

Chee.Sun.Liew

Subject areas:

Computer Architecture

Distributed Systems

System Level Integration

Projects:

ADMIRE

Student project type:

References:

[1] M. Atkinson, P. Brezany, O. Corcho, L. Han, J. van Hemert, L. Hluchý, A. Hume, I. Janciak, A. Krause, and D. Snelling. ADMIRE White Paper: Motivation, Strategy, Overview and Impact. Technical Report version 0.9, ADMIRE, EPCC, University of Edinburgh, January 2009.

[2] T. Hey, S. Tansley, and K. T. (Editors). The Fourth Paradigm: Data-Intensive Scientific


Large-scale data mining of chemical-genetic data sets

14 January 2010 - 3:50pm — Jano.van.Hemert

Primary objective: to perform data mining on a real-world data set from a biology lab in the School of Biological Sciences, with the aim of extracting patterns that lead to hypotheses about the mode of action of compounds and the function of genes.

Project status:

Finished

Degree level:

MSc

Background:

Data mining / machine learning / data exploration essential. Distributed computing a major advantage. Experience with biology/bioinformatics desirable, but not essential as you can lean on the biologists' expertise.

Supervisors @ NeSC:

Other supervisors:

Jan Wildenhain, Tyers Lab, School of Biological Sciences (http://tyerslab.bio.ed.ac.uk/lisa/indPage.php?id=jwil315)

Michaela Spitzer, Tyers Lab, School of Biological Sciences

Subject areas:

Bioinformatics

Distributed Systems

Machine Learning/Neural Networks/Connectionist Computing

WWW Tools and Programming

Student project type:


Detecting Web Spam using Machine Learning

14 January 2010 - 3:10pm — Jano.van.Hemert

Student:

Andrejs Mironovs

Grade:

second1

Primary goal: to develop a classification algorithm to detect Web Spam.

Web Spam refers to a set of techniques intended to artificially increase the ranking of a page in a search engine. From the point of view of search engine providers and Web users, Web Spam decreases the quality of information search on the Web [1] [2] [3]. Web Spam can be broadly classified into two types: content spam and link spam. Detecting Web Spam is a critical and challenging task, and successful detection has high commercial value for industry.

Project status:

Finished

Degree level:

MSc

Background:

Machine learning, knowledge of databases, programming in Java or other languages

Supervisors @ NeSC:

Liangxiu.Han

Subject areas:

Machine Learning/Neural Networks/Connectionist Computing

Student project type:

References:

* [1] Z. Gyongyi, H. Garcia-Molina and J. Pedersen. Combating Web Spam with TrustRank. In VLDB 2004.

* [2] L. Becchetti, C. Castillo, D. Donato, R. Baeza-Yates, S. Leonardi. Link Analysis for Web Spam Detection. ACM Transactions on the Web (TWEB), 2(1) (2008) 2.1-2.45

* [3] H. Najada and I. Himeidi. Web Spam detection using Machine Learning in Specific Domain Features. Journal of Information Assurance and Security. 3 (2008) 220-229

* [4] WEBSPAM-UK2007, http://barcelona.research.yahoo.net/webspam/datasets/uk2007/


Accelerating data intensive applications using MapReduce

14 January 2010 - 3:08pm — Jano.van.Hemert

Student:

Hwee Yong Ong

Grade:

first

Principal goal: by way of a real case study in the life sciences, the goals of this project include: 1) understanding data-parallel processing using the MapReduce model to address performance issues in data-intensive applications; 2) investigating how to adapt data mining algorithms to the MapReduce model; 3) prototyping and comparing performance with other frameworks that support data-intensive applications.
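
As a rough illustration of the MapReduce model named in the goals above, here is a minimal single-process sketch in Python. The word-count task, the sample input and the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are assumptions made for illustration and are not taken from the project; a real framework would run the map and reduce phases in parallel across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit (key, value) pairs for one input record (one line of text)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence on an iterable of records."""
    mapped = chain.from_iterable(map_phase(record) for record in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up input records, standing in for lines of a much larger data set.
counts = map_reduce(["gene A binds gene B", "gene B inhibits gene C"])
```

Adapting a data mining algorithm to this model largely means expressing it as per-record map functions plus per-key reductions, since those are the only two places where user code runs.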

Project status:

Finished

Degree level:

MSc

Background:

Knowledge of programming in Java, databases, data mining and integration, and distributed computing.

Supervisors @ NeSC:

Liangxiu.Han

Subject areas:

e-Science

Algorithm Design

Computer Architecture

Computer Communication/Networking

Distributed Systems

Parallel Programming

Student project type: