
Historical Interest Only

This is a static HTML version of an old Drupal site. The site is no longer maintained and could be deleted at any point. It is only here for historical interest.

MSc student project

A project suitable for a three-month MSc.

Privacy Protection for a Brain Imaging Databank

Student: 
Jyothsna Vivekanand Shenoy

In recent years there has been an increasing trend towards releasing micro-data to the public. This can be very valuable for research, but in some cases (e.g. medical data) such releases are limited by privacy concerns. Anonymisation is only a partial solution and does not fully protect individuals: even when all personal identifiers have been removed, it may still be possible to identify an individual in anonymised records by using quasi-identifiers and linking them with an external data source (see references).
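The re-identification risk described above can be measured with k-anonymity (Sweeney, in the references): a table is k-anonymous if every combination of quasi-identifier values it contains is shared by at least k records. The Python sketch below is illustrative only; it assumes records are plain dictionaries and that the quasi-identifier columns (the hypothetical `age_band`, `sex` and `postcode` here) are chosen by hand, and the sample records are made up.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Sequence

Record = Mapping[str, object]


def _key(record: Record, quasi_identifiers: Sequence[str]) -> tuple:
    """The record's quasi-identifier values, used to group indistinguishable records."""
    return tuple(record[q] for q in quasi_identifiers)


def k_anonymity(records: Iterable[Record], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(_key(r, quasi_identifiers) for r in records)
    if not groups:
        raise ValueError("empty table")
    return min(groups.values())


def unique_records(records: Iterable[Record], quasi_identifiers: Sequence[str]) -> List[Record]:
    """Records whose quasi-identifier combination occurs only once (k = 1)."""
    records = list(records)
    groups = Counter(_key(r, quasi_identifiers) for r in records)
    return [r for r in records if groups[_key(r, quasi_identifiers)] == 1]


if __name__ == "__main__":
    # Made-up records: direct identifiers already removed, quasi-identifiers kept.
    scans = [
        {"age_band": "60-69", "sex": "F", "postcode": "EH8", "diagnosis": "stroke"},
        {"age_band": "60-69", "sex": "F", "postcode": "EH8", "diagnosis": "healthy"},
        {"age_band": "70-79", "sex": "M", "postcode": "EH9", "diagnosis": "stroke"},
    ]
    qi = ["age_band", "sex", "postcode"]
    print(k_anonymity(scans, qi))          # 1: the third record is unique
    print(len(unique_records(scans, qi)))  # 1
```

Records with k = 1 are the ones most exposed to linking attacks; generalising values (for example, widening age bands) raises k at the cost of precision.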

Project status: 
Finished
Degree level: 
MSc
Background: 
Knowledge of databases. Programming skills.
References: 
* B. C. M. Fung, K. Wang, R. Chen and P. S. Yu. "Privacy-preserving data publishing: A survey of recent developments". ACM Computing Surveys, Vol. 42, No. 4, Article 14.
* B.-C. Chen, D. Kifer, K. LeFevre and A. Machanavajjhala. "Privacy-Preserving Data Publishing". Foundations and Trends in Databases, Vol. 2, Nos. 1-2 (2009), pages 1-167.
* L. Sweeney. "k-Anonymity: a model for protecting privacy". International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), pages 557-570, 2002.
* P. Samarati. "Protecting respondents' identities in microdata release". IEEE Transactions on Knowledge and Data Engineering, 13(6), pages 1010-1027, 2001.

Investigating the Rule Construction Mechanism in Ant-Miner

Student: 
Hariharan Anantharaman

This project will appeal to you if you are interested in Learning from Data and Nature-Inspired Computation.

Project status: 
Finished
Degree level: 
MSc

Investigating Array Databases for Managing Climate Data

Student: 
Jian Qiang

This is a challenging project and will appeal to students keen to make a contribution in the areas of scientific databases and geoinformatics.

Project status: 
Finished
Degree level: 
MSc
Subject areas: 
Databases
Software Engineering

De-identification of faces in 2D DICOM images

With the increasing resolution of MR and CT scans, it has become feasible to reconstruct detailed 3D images of faces.

In medical imaging, face de-identification is usually done after reconstruction, i.e. in 3D (see references). Different techniques are used to this end, including brain extraction, removal of facial features and deformation of the face surface.

Project status: 
Still available
Degree level: 
MSc
Other supervisors: 
Trevor Carpenter
Subject areas: 
Machine Learning/Neural Networks/Connectionist Computing

Scientific applications: exploiting the data bonanza. The microscopy case.

The aim of the project is to perform exploratory work on the problem of I/O-bound processing by implementing technology-specific components in a provided system. The goal is to distribute data and processing so that each CPU processes data locally, minimising data transfer. The assumption is that I/O is the major bottleneck, so computation could be done with less powerful (greener and cheaper) CPUs rather than with a powerful CPU that wastes energy waiting for data. Different technologies for storing and processing the data can be explored.
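The idea of moving processing to the data can be sketched as a locality-aware scheduler. The Python below is a hypothetical illustration, not the API of the provided system: it assumes each data block is replicated on known nodes, assigns each block to the least-loaded node that already holds it, and counts the bytes that would have to cross the network. Block and node names, and the sizes, are made up.

```python
from collections import defaultdict
from typing import Dict, List


def assign_local(block_locations: Dict[str, List[str]],
                 block_sizes: Dict[str, int]) -> Dict[str, str]:
    """Assign every block to a node that already stores a copy of it.

    Largest blocks are placed first, each on the least-loaded of its replica
    holders (ties broken by node name), so work stays roughly balanced.
    """
    load: Dict[str, int] = defaultdict(int)
    assignment: Dict[str, str] = {}
    for block in sorted(block_sizes, key=block_sizes.get, reverse=True):
        node = min(block_locations[block], key=lambda n: (load[n], n))
        assignment[block] = node
        load[node] += block_sizes[block]
    return assignment


def bytes_moved(assignment: Dict[str, str],
                block_locations: Dict[str, List[str]],
                block_sizes: Dict[str, int]) -> int:
    """Bytes that must cross the network: blocks processed where no local copy exists."""
    return sum(block_sizes[b] for b, node in assignment.items()
               if node not in block_locations[b])


if __name__ == "__main__":
    locations = {"a": ["n1", "n2"], "b": ["n2"], "c": ["n1", "n3"]}
    sizes = {"a": 100, "b": 50, "c": 70}
    local = assign_local(locations, sizes)
    central = {b: "big-cpu" for b in sizes}  # one powerful node holding no data
    print(bytes_moved(local, locations, sizes))    # 0
    print(bytes_moved(central, locations, sizes))  # 220
```

Shipping every block to a single powerful node moves all 220 bytes across the network, while the locality-aware plan moves none; that is the trade-off the project description points at.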

Project status: 
Finished
Degree level: 
MSc
Subject areas: 
e-Science
Databases
Distributed Systems
Software Engineering

Computing the best answer you can afford

We are building a data-intensive machine as a research platform to explore data-intensive computational strategies. We are interested in computations over large bodies of data, where data handling is the dominant issue. Computational challenges of this kind are becoming ever more prevalent as digital sensors and computational/societal data sources become cheaper, more powerful and more ubiquitous. Algorithms over such data are of growing importance in medicine, planning, engineering, policy and science.

Project status: 
Still available
Degree level: 
MSc
Subject areas: 
e-Science
Algorithm Design

Runoff prediction from a Hydrologic Spatio-Temporal Database

Student: 
Charalampos Sfyrakis
Grade: 
first

Present-day instrumentation networks in rivers provide huge quantities of multi-dimensional data. Although numerous machine learning tools can extract trends, find patterns and predict future states from data, it is crucial to optimise these techniques according to the semantic content of the data. Hydrology is a data-intensive science, which requires efficient mining of trajectories of measurements taken at different time points and positions.
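As a hedged baseline only (not the project's method), the NumPy sketch below forecasts a downstream quantity a few steps ahead from lagged readings of several gauges with ordinary least squares. The array layout (time by gauge), the number of lags and the forecast horizon are assumptions, and the function names are hypothetical.

```python
import numpy as np


def lagged_design(series: np.ndarray, lags: int) -> np.ndarray:
    """Row r holds every gauge's readings at times t, t-1, ..., t-lags+1 (t = r + lags - 1),
    followed by a constant 1 for the intercept. series has shape (T, n_gauges)."""
    T = series.shape[0]
    blocks = [series[lags - 1 - k: T - k] for k in range(lags)]
    X = np.hstack(blocks)
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_runoff_model(series: np.ndarray, target: np.ndarray,
                     lags: int, horizon: int) -> np.ndarray:
    """Ordinary least squares fit of target[t + horizon] on the lagged gauge readings at t."""
    # Drop the last `horizon` rows: their future target value is not observed.
    X = lagged_design(series, lags)[: series.shape[0] - lags + 1 - horizon]
    y = target[lags - 1 + horizon:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_next(series: np.ndarray, coef: np.ndarray, lags: int) -> float:
    """Forecast `horizon` steps past the most recent reading, using fitted coefficients."""
    return float(lagged_design(series[-lags:], lags)[0] @ coef)
```

A linear model like this ignores the semantics the text stresses (river topology, travel times between gauges); it is only a reference point against which richer models can be compared.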

Project status: 
Finished
Degree level: 
MSc
Background: 
data mining
Subject areas: 
e-Science
Machine Learning/Neural Networks/Connectionist Computing

Accelerating Genome-Wide Association Studies with Graphics Processors

Student: 
Jeff Poznanovic
Grade: 
first

Principal goal: to substantially improve the performance of the data-intensive analysis for genome-wide association studies (GWAS) by using graphics processing units (GPUs).
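Part of what makes GWAS a good fit for GPUs is that the basic per-SNP association test is independent across SNPs. As a hedged illustration, the NumPy sketch below runs a standard allelic 2x2 chi-square test over many SNPs at once; the count arrays and the function name are hypothetical, and on a GPU (via CUDA or OpenCL) each SNP could plausibly map to one thread. It is not taken from PLINK.

```python
import math

import numpy as np


def allelic_chi2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Allelic 2x2 chi-square test (1 degree of freedom) for many SNPs at once.

    Each argument holds one allele count per SNP. SNPs are tested independently,
    which is what makes the workload data-parallel. Returns (statistics, p_values).
    """
    a, b, c, d = (np.asarray(x, dtype=np.float64)
                  for x in (case_minor, case_major, ctrl_minor, ctrl_major))
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    # A monomorphic SNP gives a zero denominator; report a statistic of 0 for it.
    stat = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p = np.array([math.erfc(math.sqrt(x / 2.0)) for x in stat])
    return stat, p


if __name__ == "__main__":
    # Two made-up SNPs: no association, then a strong one.
    stat, p = allelic_chi2([30, 60], [70, 40], [30, 30], [70, 70])
    print(stat, p)
```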

Project status: 
Finished
Degree level: 
MSc
Other supervisors: 
Dave Liewald, Centre for Cognitive Ageing and Cognitive Epidemiology. Gail Davies, Centre for Cognitive Ageing and Cognitive Epidemiology.
Subject areas: 
e-Science
Bioinformatics
Computer Architecture
Distributed Systems
Parallel Programming
References: 
* NIH National Human Genome Research Institute, "Genome-wide association studies", http://www.genome.gov/20019523
* PLINK, http://pngu.mgh.harvard.edu/~purcell/plink
* CUDA, http://www.nvidia.com/object/cuda_home.html
* OpenCL, http://www.khronos.org/opencl/

Parameter fitting of cosmological models using billions of galaxies

Student: 
Martha Axiak
Grade: 
first

Principal goal: to develop, test and make available to the cosmology community a parameter estimation method for models that explain our dark Universe.
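CosmoMC, listed in the references below, estimates parameters with Markov chain Monte Carlo. As a toy illustration only, the Python sketch below implements random-walk Metropolis sampling for a single made-up parameter (the mean of simulated noisy data under a flat prior); it is not a cosmological model, and the step size, chain length and burn-in are arbitrary assumptions.

```python
import math
import random
from typing import Callable, List


def metropolis(log_post: Callable[[float], float], x0: float, step: float,
               n_samples: int, seed: int = 0) -> List[float]:
    """Random-walk Metropolis sampler for a one-parameter posterior.

    log_post returns the log posterior density up to an additive constant.
    """
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)   # symmetric Gaussian proposal
        lp_new = log_post(proposal)
        # Accept with probability min(1, post(proposal) / post(x)).
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            x, lp = proposal, lp_new
        chain.append(x)
    return chain


if __name__ == "__main__":
    # Toy problem: infer the mean of noisy measurements (unit noise, flat prior).
    data_rng = random.Random(1)
    data = [data_rng.gauss(0.3, 1.0) for _ in range(100)]

    def log_post(mu: float) -> float:
        return -0.5 * sum((d - mu) ** 2 for d in data)

    chain = metropolis(log_post, x0=5.0, step=0.2, n_samples=20000)
    burned = chain[2000:]                     # discard burn-in
    print(sum(burned) / len(burned), sum(data) / len(data))
```

Real cosmological fits have many correlated parameters and expensive likelihoods, which is why tuned samplers such as CosmoMC, or nested sampling as in MultiNest, are used instead of a sampler this simple.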

Project status: 
Finished
Degree level: 
MSc
Background: 
Evolutionary computation, optimisation, machine learning and/or statistics are all desirable.
Other supervisors: 
Tom Kitching, Institute for Astronomy, Edinburgh; tdk@roe.ac.uk, tom.kitching@googlemail.com
Subject areas: 
Genetic Algorithms/Evolutionary Computing
Machine Learning/Neural Networks/Connectionist Computing
WWW Tools and Programming
References: 
* A good review of statistical methods used in cosmology, with some further references; chapter 13 discusses the Monte Carlo methods we use: http://xxx.lanl.gov/abs/0911.3105
* CosmoMC, the standard tool for cosmological parameter estimation: http://cosmologist.info/cosmomc/
* The original CosmoMC paper: http://arxiv.org/abs/astro-ph/0205436
* Its first application: http://arxiv.org/abs/astro-ph/0302306
* MultiNest, a slightly more advanced nested sampling method: http://xxx.lanl.gov/abs/0809.3437
* A general discussion of the current status of cosmology (be warned: it contains some technical detail and a lot of acronyms): http://xxx.lanl.gov/abs/astro-ph/0610906

Data mining to identify small molecules with bioactivity

Student: 
Gideon Jansen Van Vuuren
Grade: 
first

Principal goal: to apply machine learning to identify small molecules that are likely to have relevant bioactivity, as candidates for follow-up wet-lab experiments.
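The goal above is framed as machine learning in general. One simple baseline for ranking untested compounds, swapped in here purely as an illustration, is similarity search: score each candidate by its Tanimoto similarity to known actives. The Python sketch assumes molecules are already encoded as binary fingerprints (represented as sets of on-bit indices); the fingerprinting step, the names and the data are hypothetical.

```python
from typing import AbstractSet, Dict, List, Tuple

Fingerprint = AbstractSet[int]  # indices of the bits set in a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_candidates(candidates: Dict[str, Fingerprint], actives: List[Fingerprint],
                    k: int = 1) -> List[Tuple[str, float]]:
    """Score each untested molecule by its mean similarity to its k most similar
    known actives; return (name, score) pairs, most promising first."""
    if not actives:
        raise ValueError("need at least one known active")
    scores = {}
    for name, fp in candidates.items():
        nearest = sorted((tanimoto(fp, act) for act in actives), reverse=True)[:k]
        scores[name] = sum(nearest) / len(nearest)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A trained classifier would replace the similarity score, but the output (a ranked shortlist for wet-lab follow-up) would have the same shape.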

Project status: 
Finished
Degree level: 
MSc
Background: 
Machine learning essential, biology/bioinformatics desirable.
Other supervisors: 
Jan Wildenhain, Tyers Lab, School of Biological Sciences (http://tyerslab.bio.ed.ac.uk/lisa/indPage.php?id=jwil315) Michaela Spitzer, Tyers Lab, School of Biological Sciences
Subject areas: 
Bioinformatics
Machine Learning/Neural Networks/Connectionist Computing

Optimising Data-Streaming Elements in Distributed Data Mining

Principal goal: to evaluate the existing data-streaming implementation, formulate a model that predicts streaming performance for a given buffering strategy, and then optimise data streaming with a dynamic buffering implementation.
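A model of streaming performance as a function of buffering could start from something as simple as the toy cost model below. Every term is an assumption made for illustration (a fixed overhead per buffer handed over, a constant bandwidth, and a pipeline-fill delay while the first buffer fills), not a model taken from the project or from ADMIRE; the function names and the figures in the example are made up.

```python
import math
from typing import Iterable, Tuple


def predicted_time(total_bytes: int, buffer_bytes: int,
                   per_buffer_overhead_s: float, bandwidth_bps: float) -> float:
    """Toy model of the time to stream total_bytes between two processing elements.

    Assumes: each buffer handed over costs a fixed overhead; the payload moves at
    bandwidth_bps; and the consumer waits for the first buffer to fill before starting.
    """
    n_buffers = math.ceil(total_bytes / buffer_bytes)
    overhead = n_buffers * per_buffer_overhead_s
    transfer = total_bytes / bandwidth_bps
    pipeline_fill = min(buffer_bytes, total_bytes) / bandwidth_bps
    return overhead + transfer + pipeline_fill


def best_buffer(total_bytes: int, candidates: Iterable[int],
                per_buffer_overhead_s: float, bandwidth_bps: float) -> Tuple[int, float]:
    """Candidate buffer size with the lowest predicted time, and that time."""
    return min(((b, predicted_time(total_bytes, b, per_buffer_overhead_s, bandwidth_bps))
                for b in candidates), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up figures: 100 MB stream, 1 ms per buffer, 100 MB/s link.
    sizes = [10**4, 10**5, 10**6, 4 * 10**6, 10**7]
    print(best_buffer(100_000_000, sizes, 1e-3, 1e8))
```

In this model, small buffers pay the per-buffer overhead many times and large ones delay the consumer, so the optimum sits near sqrt(total_bytes × overhead × bandwidth), about 3.2 MB for these figures. A dynamic buffering implementation might re-estimate overhead and bandwidth at run time and adjust the buffer size accordingly.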

Project status: 
Finished
Degree level: 
MSc
Background: 
Java programming essential. Distributed/parallel computing desirable.
Subject areas: 
Computer Architecture
Distributed Systems
Parallel Programming
Performance Modelling and Simulation
System Level Integration
References: 
* [1] M. Atkinson, P. Brezany, O. Corcho, L. Han, J. van Hemert, L. Hluchý, A. Hume, I. Janciak, A. Krause, and D. Snelling. ADMIRE White Paper: Motivation, Strategy, Overview and Impact. Technical Report version 0.9, ADMIRE, EPCC, University of Edinburgh, January 2009.
* [2] A. Jacobs. The pathologies of big data. Commun. ACM, 52(8):36–44, 2009.

Improving Data Placement Strategy in Data-intensive Computations

Student: 
Yue Ma
Grade: 
third

Principal goal: to investigate existing data placement strategies and build a decision model that improves data placement when enacting data-intensive workflows.
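One way to frame such a decision model is as a cost comparison between moving data to a compute site and computing where the data already lives. The Python sketch below is purely hypothetical: it assumes per-link bandwidths and per-site compute-time estimates are known in advance, and picks the site with the lowest estimated transfer-plus-compute time for a single workflow task. Site and dataset names are made up.

```python
from typing import Dict, Tuple


def placement_cost(site: str,
                   inputs: Dict[str, Tuple[str, float]],
                   bandwidth: Dict[Tuple[str, str], float],
                   compute_s: Dict[str, float]) -> float:
    """Estimated seconds to run one workflow task at `site`.

    inputs maps a dataset name to (site where it lives, size in bytes);
    bandwidth maps (source, destination) to bytes per second;
    compute_s is the estimated compute time of the task at each site.
    """
    move = sum(size / bandwidth[(src, site)]
               for src, size in inputs.values() if src != site)
    return move + compute_s[site]


def choose_site(inputs: Dict[str, Tuple[str, float]],
                bandwidth: Dict[Tuple[str, str], float],
                compute_s: Dict[str, float]) -> str:
    """Site with the lowest estimated transfer-plus-compute time."""
    return min(compute_s, key=lambda s: placement_cost(s, inputs, bandwidth, compute_s))


if __name__ == "__main__":
    inputs = {"images": ("archive", 50e9), "params": ("cluster", 1e6)}
    compute = {"archive": 3000.0, "cluster": 600.0}
    fast = {("archive", "cluster"): 1e8, ("cluster", "archive"): 1e8}
    slow = {("archive", "cluster"): 1e7, ("cluster", "archive"): 1e7}
    print(choose_site(inputs, fast, compute))  # cluster
    print(choose_site(inputs, slow, compute))  # archive
```

With a fast link the task goes to the faster cluster; with a link ten times slower, staying at the archive wins, which is the kind of trade-off a placement decision model has to capture.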

Project status: 
Finished
Degree level: 
MSc
Background: 
Distributed/parallel computing, databases desirable. Java programming essential.
Subject areas: 
Computer Architecture
Distributed Systems
System Level Integration
References: 
* [1] M. Atkinson, P. Brezany, O. Corcho, L. Han, J. van Hemert, L. Hluchý, A. Hume, I. Janciak, A. Krause, and D. Snelling. ADMIRE White Paper: Motivation, Strategy, Overview and Impact. Technical Report version 0.9, ADMIRE, EPCC, University of Edinburgh, January 2009.
* [2] T. Hey, S. Tansley, and K. T. (Editors). The Fourth Paradigm: Data-Intensive Scientific

Large-scale data mining of chemical-genetic data sets

Primary objective: to perform data mining on a real-world data set from a biology lab in the School of Biological Sciences, with the aim of extracting patterns that lead to hypotheses about the mode of action of compounds and the function of genes.

Project status: 
Finished
Degree level: 
MSc
Background: 
Data mining / machine learning / data exploration essential. Distributed computing a major advantage. Experience with biology/bioinformatics desirable, but not essential as you can lean on the biologists' expertise.
Other supervisors: 
Jan Wildenhain, Tyers Lab, School of Biological Sciences (http://tyerslab.bio.ed.ac.uk/lisa/indPage.php?id=jwil315) Michaela Spitzer, Tyers Lab, School of Biological Sciences
Subject areas: 
Bioinformatics
Distributed Systems
Machine Learning/Neural Networks/Connectionist Computing
WWW Tools and Programming

Detecting Web Spam using Machine Learning

Student: 
Andrejs Mironovs
Grade: 
second1

Primary goal: to develop a classification algorithm to detect Web Spam.

Web Spam refers to a set of techniques intended to increase the ranking of a page in a search engine. From the point of view of search engine providers and Web users, Web Spam degrades the quality of information search on the Web [1] [2] [3]. Web Spam can be broadly classified into two types: content spam and link spam. Detecting it is a critical and challenging task, and successful detection has high commercial value.
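Reference [1] (Gyöngyi et al.) tackles link spam with TrustRank: starting from a small seed set of pages judged good by hand, trust is propagated along links in the manner of PageRank, so pages reachable only from spam end up with little or no trust. The Python sketch below is a simplified, hypothetical rendering of that idea (the seed set, a damping factor of 0.85 and 50 iterations are assumptions, and the toy graph is made up); it does not address content spam.

```python
from typing import Dict, List, Set


def trust_rank(links: Dict[str, List[str]], seeds: Set[str],
               damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Propagate trust from a hand-checked seed set of good pages along out-links.

    Like PageRank, except that the random jump lands only on seed pages, so pages
    that no trusted page (transitively) links to end up with zero trust. Pages with
    no out-links simply do not pass their trust on.
    """
    pages = set(links) | {q for outs in links.values() for q in outs}
    jump = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(jump)
    for _ in range(iterations):
        new = {p: (1.0 - damping) * jump[p] for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += damping * trust[p] / len(outs)
        trust = new
    return trust


if __name__ == "__main__":
    # Made-up graph: a, b are good; s1, s2 form a small link farm that links to a.
    graph = {"a": ["b"], "b": ["a"], "s1": ["s2", "a"], "s2": ["s1"]}
    scores = trust_rank(graph, seeds={"a"})
    print({page: round(score, 3) for page, score in sorted(scores.items())})
```

Links from the spam pages to a good page do not give the spam pages any trust; only links from trusted pages do.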

Project status: 
Finished
Degree level: 
MSc
Background: 
Machine learning, knowledge of databases, programming in Java or another language.
Supervisors @ NeSC: 
Liangxiu.Han
Subject areas: 
Machine Learning/Neural Networks/Connectionist Computing
References: 
* [1] Z. Gyöngyi, H. Garcia-Molina and J. Pedersen. Combating Web Spam with TrustRank. In VLDB 2004.
* [2] L. Becchetti, C. Castillo, D. Donato, R. Baeza-Yates, S. Leonardi. Link Analysis for Web Spam Detection. ACM Transactions on the Web (TWEB), 2(1) (2008) 2.1-2.45.
* [3] H. Najada and I. Himeidi. Web Spam detection using Machine Learning in Specific Domain Features. Journal of Information Assurance and Security, 3 (2008) 220-229.
* [4] WEBSPAM-UK2007, http://barcelona.research.yahoo.net/webspam/datasets/uk2007/

Accelerating data intensive applications using MapReduce

Student: 
Hwee Yong Ong
Grade: 
first

Principal goal: by way of a real case study in the Life Sciences, this project aims to: 1) understand data-parallel processing using the MapReduce model to address performance issues in data-intensive applications; 2) investigate how to adapt data mining algorithms to the MapReduce model; 3) prototype and compare performance with other frameworks that support data-intensive applications.
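To illustrate what adapting a data mining algorithm to MapReduce involves, the sketch below imitates the MapReduce data flow (map, group by key, reduce) in a single Python process and uses it to compute per-class mean feature vectors, a building block of simple classifiers such as nearest-centroid. It is not Hadoop or any real framework API; the function names and the toy labels are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_reduce(records: Iterable, mapper: Callable, reducer: Callable) -> Dict:
    """Single-process imitation of the MapReduce data flow: map, shuffle, reduce."""
    groups: Dict = defaultdict(list)
    for record in records:                    # map phase
        for key, value in mapper(record):
            groups[key].append(value)         # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}  # reduce phase


def mean_mapper(record: Tuple[str, List[float]]):
    """Emit (class label, (feature vector, count 1)) for one labelled example."""
    label, features = record
    yield label, (features, 1)


def mean_reducer(label: str, values: List[Tuple[List[float], int]]) -> List[float]:
    """Sum feature vectors and counts for one class, then divide to get the class mean."""
    total = [0.0] * len(values[0][0])
    count = 0
    for features, n in values:
        for i, x in enumerate(features):
            total[i] += x
        count += n
    return [t / count for t in total]


if __name__ == "__main__":
    examples = [("expressed", [1.0, 2.0]), ("expressed", [3.0, 4.0]), ("not", [0.0, 1.0])]
    print(map_reduce(examples, mean_mapper, mean_reducer))
    # {'expressed': [2.0, 3.0], 'not': [0.0, 1.0]}
```

Because partial sums and counts can be added in any order, a combiner could pre-aggregate (features, count) pairs on each mapper to cut shuffle traffic; adapting an algorithm to MapReduce often comes down to finding such decomposable statistics.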

Project status: 
Finished
Degree level: 
MSc
Background: 
Knowledge of Java programming, databases, data mining and integration, and distributed computing.
Supervisors @ NeSC: 
Liangxiu.Han
Subject areas: 
e-Science
Algorithm Design
Computer Architecture
Computer Communication/Networking
Distributed Systems
Parallel Programming
References: 
* [1] J. Dean, S. Ghemawat, MapReduce: Simplified data processing on large clusters, in: Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004, pp. 137–150.
* [2] L. Han, J. I. van Hemert, R. Baldock, M. Atkinson, Automating gene expression annotation for mouse embryo, in: R. H. et al. (Ed.), Lecture Notes in Computer Science (Advanced Data Mining and Applications, ADMA 2009), Vol. LNAI 5678, 2009, pp. 469–478.
* [3] EURExpress-II, http://www.eurexpress.org/ee/
* [4] ADMIRE, http://www.admire-project.eu/
