Historical Interest Only

This is a static HTML version of an old Drupal site. The site is no longer maintained and could be deleted at any point. It is only here for historical interest.

Finished student projects

The student projects listed below are all finished.

Optimising Data-Streaming Elements in Distributed Data Mining

Principal goal: to evaluate the existing data streaming implementation, formulate a model that predicts streaming performance for a given buffering strategy, and then optimise data streaming with a dynamic buffering implementation.

Project status: 
Finished
Degree level: 
MSc
Background: 
Java programming essential. Distributed/parallel computing desirable.
Subject areas: 
Computer Architecture
Distributed Systems
Parallel Programming
Performance Modelling and Simulation
System Level Integration
Projects: 
Student project type: 
References: 
[1] M. Atkinson, P. Brezany, O. Corcho, L. Han, J. van Hemert, L. Hluchý, A. Hume, I. Janciak, A. Krause, and D. Snelling. ADMIRE White Paper: Motivation, Strategy, Overview and Impact. Technical Report version 0.9, ADMIRE, EPCC, University of Edinburgh, January 2009.
[2] A. Jacobs. The pathologies of big data. Commun. ACM, 52(8):36–44, 2009.

Improving Data Placement Strategy in Data-intensive Computations

Student: 
Yue Ma
Grade: 
third

Principal goal: to investigate existing data placement strategies and build a decision model that improves data placement when enacting data-intensive workflows.

Project status: 
Finished
Degree level: 
MSc
Background: 
Distributed/parallel computing, databases desirable. Java programming essential.
Supervisors @ NeSC: 
Subject areas: 
Computer Architecture
Distributed Systems
System Level Integration
Projects: 
Student project type: 
References: 
[1] M. Atkinson, P. Brezany, O. Corcho, L. Han, J. van Hemert, L. Hluchý, A. Hume, I. Janciak, A. Krause, and D. Snelling. ADMIRE White Paper: Motivation, Strategy, Overview and Impact. Technical Report version 0.9, ADMIRE, EPCC, University of Edinburgh, January 2009.
[2] T. Hey, S. Tansley, and K. T. (Editors). The Fourth Paradigm: Data-Intensive Scientific

Large-scale data mining of chemical-genetic data sets

Primary objective: to perform data mining on a real-world data set from a biology lab in the School of Biological Sciences, with the aim of extracting patterns that lead to hypotheses about the mode of action of compounds and the function of genes.

Project status: 
Finished
Degree level: 
MSc
Background: 
Data mining / machine learning / data exploration essential. Distributed computing a major advantage. Experience with biology/bioinformatics desirable, but not essential as you can lean on the biologists' expertise.
Supervisors @ NeSC: 
Other supervisors: 
Jan Wildenhain, Tyers Lab, School of Biological Sciences (http://tyerslab.bio.ed.ac.uk/lisa/indPage.php?id=jwil315)
Michaela Spitzer, Tyers Lab, School of Biological Sciences
Subject areas: 
Bioinformatics
Distributed Systems
Machine Learning/Neural Networks/Connectionist Computing
WWW Tools and Programming
Student project type: 

Detecting Web Spam using Machine Learning

Student: 
Andrejs Mironovs
Grade: 
second1

Primary goal: to develop a classification algorithm to detect Web Spam.

Web Spam refers to a set of techniques intended to increase the ranking of a page in a search engine. From the point of view of search engine providers and Web users, Web Spam decreases the quality of information search on the Web [1] [2] [3]. Web Spam can be broadly classified into two types: content spam and link spam. Detecting Web Spam is a critical and challenging task, and successful detection has high commercial value for industry.

Project status: 
Finished
Degree level: 
MSc
Background: 
Machine learning, knowledge of databases, programming in Java or other languages
Supervisors @ NeSC: 
Liangxiu.Han
Subject areas: 
Machine Learning/Neural Networks/Connectionist Computing
Student project type: 
References: 
* [1] Z. Gyongyi, H. Garcia-Molina and J. Pedersen. Combating Web Spam with TrustRank. In VLDB 2004.
* [2] L. Becchetti, C. Castillo, D. Donato, R. Baeza-Yates, S. Leonardi. Link Analysis for Web Spam Detection. ACM Transactions on the Web (TWEB), 2(1) (2008) 2.1-2.45
* [3] H. Najada and I. Himeidi. Web Spam detection using Machine Learning in Specific Domain Features. Journal of Information Assurance and Security. 3 (2008) 220-229
* [4] WEBSPAM-UK2007, http://barcelona.research.yahoo.net/webspam/datasets/uk2007/

Accelerating data intensive applications using MapReduce

Student: 
Hwee Yong Ong
Grade: 
first

Principal goal: by way of a real case study in the life sciences, this project aims to: 1) understand data-parallel processing using the MapReduce model to address performance issues in data-intensive applications; 2) investigate how to adapt data mining algorithms to the MapReduce model; 3) prototype and compare performance with other frameworks that support data-intensive applications.

Project status: 
Finished
Degree level: 
MSc
Background: 
Knowledge of Java programming, databases, data mining and integration, and distributed computing.
Supervisors @ NeSC: 
Liangxiu.Han
Subject areas: 
e-Science
Algorithm Design
Computer Architecture
Computer Communication/Networking
Distributed Systems
Parallel Programming
Student project type: 
References: 
* [1] J. Dean, S. Ghemawat, MapReduce: Simplified data processing on large clusters, in: Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004, pp. 137–150.
* [2] L. Han, J. I. van Hemert, R. Baldock, M. Atkinson, Automating gene expression annotation for mouse embryo, in: R. H. et al. (Ed.), Lecture Notes in Computer Science (Advanced Data Mining and Applications, ADMA 2009), Vol. LNAI 5678, 2009, pp. 469–478.
* [3] EURExpress-II, http://www.eurexpress.org/ee/
* [4] ADMIRE, http://www.admire-project.eu/

Detection and elimination of personal data contained in medical images

Student: 
Yassar Almutairi

Principal goal: to evaluate and implement different techniques for detecting, recognising and eliminating text containing personal data in medical images.

Project status: 
Finished
Degree level: 
MSc
Background: 
Good programming skills; experience with image processing desirable
Supervisors @ NeSC: 
Student project type: 
References: 
- James Z. Wang, Michel Bilello and Gio Wiederhold, A Textual Information Detection and Elimination System for Secure Medical Image Distribution. Journal of the American Medical Informatics Association, Proceedings of the AMIA Annual Symposium, vol. 1997 symposium suppl., pp. 896, Nashville, TN, October 1997.
- Datong Chen, Jean-Marc Odobez, Hervé Bourlard, Text detection, recognition in images and video frames. Pattern Recognition 37(3): 595-608 (2004)
- I. Neamatullah et al. “Automated de-identification of free-text medical records” BMC Medical Informatics and Decision Making 2008, 8:32

Rapid portals for cloud computing

Student: 
Gareth Francis
Grade: 
first

Principal goal: to extend Rapid, an existing technology, so that it can run compute jobs seamlessly on several cloud infrastructures while overcoming the additional drawbacks of cloud computing technology.

Project status: 
Finished
Degree level: 
MSc
Supervisors @ NeSC: 
Subject areas: 
e-Science
Other
WWW Tools and Programming
Student project type: 

Rapid development of a web portal for cosmology data analysis

Principal goal: to design and implement a web portal using Rapid (http://research.nesc.ac.uk/rapid/) that lets advanced users create new analyses and lets all users pick up these analyses and apply them to data from astronomy data archives.

Project status: 
Finished
Degree level: 
MSc
Supervisors @ NeSC: 
Other supervisors: 
Thomas Kitching, Institute for Astronomy, University of Edinburgh
Subject areas: 
e-Science
Other
WWW Tools and Programming
Student project type: 

Improved data logging, sharing and analysis for the British Geological Survey's School Seismology project

Student: 
Jon Gilbert

The School Seismology project (http://www.bgs.ac.uk/schoolseismology/) enables schools to detect signals from large earthquakes happening anywhere in the world. It is used to teach a range of basic science concepts in over 400 schools around the UK: pupils detect world earthquakes in the classroom using a simple seismometer system and exchange earthquake data with schools around the world.

Project status: 
Finished
Degree level: 
UG4
Background: 
Knowledge of Linux essential. Experience with web service development useful.
Supervisors @ NeSC: 
Other supervisors: 
Paul Denton, British Geological Survey
Subject areas: 
e-Science
Computer Communication/Networking
Software Engineering
WWW Tools and Programming
Student project type: 

Optimising Distributed Data Integration and Data Mining Service through Transformation of Data Workflow into Parallel Stream

Student: 
Chee Sun Liew

Over the past decades, running large-scale experiments using computational tools has become popular in modern science. The data processing steps involved in such experiments are usually complex and compute-intensive. A challenge arises when the demand comes from large collaborative projects that run computations across institutions and continents, with the data and machines located at distributed sites. The common solution for making the experiments more manageable is to execute the processing steps as a workflow, using domain-specific or generic workflow management systems.

Project status: 
Finished
Degree level: 
PhD
Supervisors @ NeSC: 
Student project type: 
