Open Research Questions

Our research throws up many questions that we cannot address immediately. Some of these are listed below. We would be delighted to hear from others who would like to join us in tackling them or already have the answers.

Extension of Rapid for submitting jobs to the best computational resource available, depending on the features of the jobs.

16 November 2012 - 3:28pm — Rosa.Filgueira

In the EFFORT project and in others, there are different kinds of jobs that must be submitted to a computational resource. Depending on the characteristics of a job, the best computational resource may be EDIM1 (if the job must work with a huge volume of data) or a typical cluster such as ECDF (if the job requires high-performance computing), while for small and quick computations it may be enough to send the job to the esciences1-8 machines.
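The routing decision described above can be expressed as a small rule-based dispatcher. The sketch below is only an illustration of the idea and does not use Rapid's real API: the `Job` fields, the `choose_resource` function and all threshold values are hypothetical assumptions, and a real extension would derive its rules from measurements of each resource.

```python
from dataclasses import dataclass

# Hypothetical thresholds (assumptions): real values would come from profiling each resource.
LARGE_DATA_GB = 100.0     # at or above this input volume, treat the job as data-intensive
HPC_CORES = 32            # at or above this many cores, treat the job as HPC
LONG_RUN_MINUTES = 60.0   # runs at least this long are not "small and quick"


@dataclass
class Job:
    """Hypothetical description of a job's features."""
    name: str
    input_size_gb: float  # volume of data the job reads
    cores: int            # degree of parallelism requested
    est_minutes: float    # expected run time


def choose_resource(job: Job) -> str:
    """Pick a target resource from the job's features (illustrative rules only)."""
    if job.input_size_gb >= LARGE_DATA_GB:
        return "EDIM1"          # huge volume of data
    if job.cores >= HPC_CORES or job.est_minutes >= LONG_RUN_MINUTES:
        return "ECDF"           # needs high-performance computing
    return "esciences1-8"       # small and quick computation


if __name__ == "__main__":
    for job in [Job("reduce-traces", 500.0, 8, 120.0),
                Job("simulate", 2.0, 128, 600.0),
                Job("plot", 0.1, 1, 2.0)]:
        print(job.name, "->", choose_resource(job))
```

Checking data volume first reflects the assumption that moving a huge input to the wrong resource costs more than any compute speed-up could recover.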

Subject areas:

Computer Communication/Networking

Intelligent aggregator pattern for Collective I/O operations.

30 September 2011 - 12:30pm — Rosa.Filgueira

Many applications use collective I/O operations to read/write data from/to disk. One of the most widely used is the Two-Phase I/O technique extended by Thakur and Choudhary in ROMIO. As its name suggests, Two-Phase I/O proceeds in two phases: a data-redistribution (exchange) phase and an I/O phase. In the first phase, processes communicate so that small file requests are grouped into larger ones. In the second phase, contiguous transfers are performed to or from the file system.
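The two phases can be simulated in a single process to make the data movement concrete. This is a minimal sketch under simplifying assumptions, not ROMIO's implementation: there is no MPI, the file is split into equal contiguous domains with one per aggregator (a static aggregator pattern), and `two_phase_write` is a hypothetical name. Gaps in a domain are zero-filled here, whereas a real implementation must preserve existing file contents there (for example by reading the domain first).

```python
from collections import defaultdict


def two_phase_write(requests, n_aggregators, file_size):
    """requests: list of (rank, offset, data) small writes from many processes.

    Returns one (start_offset, buffer) contiguous write per active aggregator.
    """
    domain = -(-file_size // n_aggregators)  # ceiling division: size of each file domain

    # Phase 1 (exchange): route every piece of every request to the aggregator
    # that owns its file domain, splitting requests that cross a domain boundary.
    inbox = defaultdict(list)
    for _rank, offset, data in requests:
        pos, end = offset, offset + len(data)
        while pos < end:
            agg = pos // domain
            piece_end = min((agg + 1) * domain, end)
            inbox[agg].append((pos, data[pos - offset:piece_end - offset]))
            pos = piece_end

    # Phase 2 (I/O): each aggregator assembles its domain and issues one contiguous write.
    writes = []
    for agg in sorted(inbox):
        start = agg * domain
        buf = bytearray(min(start + domain, file_size) - start)  # gaps stay zero-filled
        for pos, piece in inbox[agg]:
            buf[pos - start:pos - start + len(piece)] = piece
        writes.append((start, bytes(buf)))
    return writes


if __name__ == "__main__":
    # Two ranks write interleaved 4-byte blocks; two aggregators turn four small
    # writes into two contiguous 8-byte writes.
    reqs = [(0, 0, b"AAAA"), (1, 4, b"BBBB"), (0, 8, b"CCCC"), (1, 12, b"DDDD")]
    print(two_phase_write(reqs, n_aggregators=2, file_size=16))
```

Which processes act as aggregators, and how file domains are assigned to them, is exactly the choice this sketch hard-codes.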

Subject areas:

Computer Architecture

PRAcTICaL-MPI: Portable Adaptive Compression library for MPI implementations.

29 September 2011 - 5:21pm — Rosa.Filgueira

The Message Passing Interface (MPI) is the most widely used message-passing library for communication in clusters. There are several MPI implementations, such as MPICH, CHIMP, LAM and Open MPI. We have developed a library called PRAcTICaL-MPI (PoRtable AdapTIve Compression Library) that reduces the volume of data exchanged among processes by using lossless compression.
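The simplest form of adaptive message compression is to compress a payload only when doing so actually shrinks it, and to tag the message so the receiver knows how to decode it. The sketch below assumes zlib as the lossless codec and a one-byte header; `pack`, `unpack`, the flag values and the `min_gain` parameter are hypothetical and are not PRAcTICaL-MPI's API or wire format.

```python
import zlib

RAW, ZLIB = b"\x00", b"\x01"  # hypothetical one-byte header flags


def pack(payload: bytes, min_gain: float = 0.1) -> bytes:
    """Compress payload before sending if that saves at least `min_gain` of its size."""
    # A fast level, because compression time competes with the network time it saves.
    compressed = zlib.compress(payload, 1)
    if len(compressed) <= (1 - min_gain) * len(payload):
        return ZLIB + compressed
    return RAW + payload  # incompressible data (e.g. already-compressed) goes as-is


def unpack(message: bytes) -> bytes:
    """Inverse of pack: inspect the header flag and decompress if needed."""
    flag, body = message[:1], message[1:]
    return zlib.decompress(body) if flag == ZLIB else body


if __name__ == "__main__":
    import os
    for name, data in [("repetitive", b"0" * 10_000), ("random", os.urandom(10_000))]:
        msg = pack(data)
        assert unpack(msg) == data
        print(name, len(data), "->", len(msg), "compressed" if msg[:1] == ZLIB else "raw")
```

Falling back to the raw payload bounds the overhead for incompressible data at one byte per message.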

Subject areas:

Computer Architecture

DISPEL on the Cloud

21 September 2011 - 10:39pm — Malcolm.Atkinson

DISPEL is a language designed for describing and organising data-intensive processing.
Cloud systems, such as OSDC and Microsoft's Azure, are intended to provide easily accessed and economical data-intensive computation.
The challenge is that DISPEL is a streaming technology that can potentially handle large volumes of data as well as continuous streams of data.
This streaming needs computational nodes that can access disks and communicate with one another, e.g. to stream data between them.
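The streaming style of processing can be illustrated with Python generators acting as processing elements connected by streams. This is not DISPEL syntax and runs in one process; the stage names are hypothetical. On a cloud platform each stage could run on a different node, and the stream between them would cross the network, which is the requirement raised above.

```python
from typing import Iterable, Iterator


def read_values(lines: Iterable[str]) -> Iterator[float]:
    """Source stage: parse one numeric reading per line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


def above(values: Iterable[float], limit: float) -> Iterator[float]:
    """Filter stage: pass on only readings greater than `limit`."""
    for v in values:
        if v > limit:
            yield v


def running_mean(values: Iterable[float]) -> Iterator[float]:
    """Aggregating stage: emit the mean of everything seen so far, item by item."""
    total = 0.0
    for n, v in enumerate(values, start=1):
        total += v
        yield total / n


# Compose the pipeline; nothing is computed until items are pulled through it,
# and no stage ever holds the whole stream in memory.
pipeline = running_mean(above(read_values(["1", "5", "", "10", "3"]), limit=2))
print(list(pipeline))  # [5.0, 7.5, 6.0]
```

Because each stage handles one item at a time, the same composition works for a bounded file and for an unbounded, continuous stream.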

Subject areas:

e-Science

Computer Architecture

Computer Communication/Networking

Databases

Distributed Systems

Distributed implementation of brain imaging analysis applications

21 September 2011 - 12:35pm — David.Rodriguez

Brain images are used in a variety of multidisciplinary studies, including medicine, psychology, linguistics, etc.
These studies use a range of image types generated with different equipment (PET, SPECT, EEG, MEG, MR, CT), acquired with different parameters that produce very different data sizes and numbers of images.

Subject areas:

e-Science

Distributed Systems

Neuroinformatics

Benchmark comparisons of EDIM1 for tuned Map-Reduce

21 September 2011 - 9:30am — Malcolm.Atkinson

The EDIM1 data-intensive architecture is intended to accelerate data-intensive processing.
One good way to organise this processing is with a map-reduce model.
We can consider two candidate implementations: Hadoop and the Sector/Sphere combination from Grossman et al.
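The map-reduce model itself is small enough to state in a few lines: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. The in-memory sketch below only illustrates that model with a word count; it is not the API of Hadoop or Sector/Sphere, and `map_reduce`, `word_mapper` and `sum_reducer` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple


def map_reduce(records: Iterable, mapper: Callable, reducer: Callable) -> Dict:
    """Run map, shuffle and reduce sequentially in memory."""
    groups: Dict[Hashable, List] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map: emit (key, value) pairs
            groups[key].append(value)        # shuffle: group values by key
    # reduce: combine each key's values into one result
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> Iterator[Tuple[str, int]]:
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key: str, counts: List[int]) -> int:
    return sum(counts)


print(map_reduce(["a b a", "B c"], word_mapper, sum_reducer))  # {'a': 2, 'b': 2, 'c': 1}
```

In a distributed framework the map and reduce calls run on many nodes and the shuffle becomes network and disk traffic, which is where a data-intensive architecture such as EDIM1 would be expected to make a difference.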

Subject areas:

e-Science

Computer Architecture

Computer Communication/Networking

Distributed Systems

Light-weight distributed workflow

20 September 2011 - 2:17pm — Paolo.Besana

Processing large amounts of data across a set of nodes in a cluster like EDIM1 requires deploying and running a workflow, together with a set of processing elements and libraries, across all the nodes.
The complexity of the problem and the size of the data imply that executing the workflow is often an exploratory and iterative process.
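Iterative, exploratory execution is cheaper when a workflow can re-run only the steps affected by a change. The single-process sketch below, assuming Python 3.9+ for `graphlib`, runs processing elements in dependency order, caches their results, and on a re-run recomputes only changed steps and everything downstream of them. Distribution across cluster nodes is not shown, and `run_workflow` and its parameters are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_workflow(steps, deps, cache=None, changed=()):
    """steps: name -> function(*upstream_results); deps: name -> list of upstream names.

    Reuses cached results except for steps in `changed` and anything downstream of them.
    """
    cache = {} if cache is None else cache
    dirty = set(changed)
    for name in TopologicalSorter(deps).static_order():  # upstream steps come first
        upstream = deps.get(name, [])
        if name in cache and name not in dirty and not dirty.intersection(upstream):
            continue  # unaffected by the change: keep the previous result
        cache[name] = steps[name](*(cache[u] for u in upstream))
        dirty.add(name)  # force downstream steps to recompute
    return cache
```

A typical session runs the whole workflow once, then edits one processing element and calls `run_workflow(steps, deps, cache, changed={"that_step"})`, so upstream results already computed are not recomputed.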

Subject areas:

e-Science

Computer Communication/Networking

Distributed Systems

Parallel Programming

Programming Languages and Functional Programming

Programming Language Semantics

Software Engineering

Large data storage

20 September 2011 - 1:56pm — Paolo.Besana

Scientific laboratories produce large amounts of data, often stored as files in hierarchical folders. File systems do not scale well to large numbers of files. In particular, access to data becomes hard if the query criteria do not match the storage criteria.
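One common way to decouple query criteria from storage criteria is to keep a metadata index beside the files, so data can be found by any indexed attribute rather than only by its folder path. The sketch below uses Python's built-in `sqlite3`; the metadata fields (`instrument`, `date`, `subject`), the example paths and the function names are hypothetical.

```python
import sqlite3

COLUMNS = ("instrument", "date", "subject")  # hypothetical metadata fields


def build_index(records):
    """records: iterable of (path, instrument, date, subject) tuples."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, instrument TEXT, date TEXT, subject TEXT)")
    db.execute("CREATE INDEX files_by_subject ON files(subject)")  # a query axis the folders ignore
    db.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", records)
    return db


def find(db, **criteria):
    """Return paths whose metadata match all criteria, e.g. find(db, subject="s01")."""
    unknown = set(criteria) - set(COLUMNS)
    if unknown:  # column names go into the SQL text, so only known fields are allowed
        raise ValueError(f"unknown metadata fields: {sorted(unknown)}")
    where = " AND ".join(f"{col} = ?" for col in criteria) or "1 = 1"
    sql = f"SELECT path FROM files WHERE {where} ORDER BY path"
    return [row[0] for row in db.execute(sql, tuple(criteria.values()))]


# Files are stored by instrument and date, but we want everything for one subject.
db = build_index([
    ("/lab/mri/2011-09-01/s01.nii", "mri", "2011-09-01", "s01"),
    ("/lab/mri/2011-09-02/s02.nii", "mri", "2011-09-02", "s02"),
    ("/lab/pet/2011-09-01/s01.img", "pet", "2011-09-01", "s01"),
])
print(find(db, subject="s01"))
```

The index answers a per-subject query without walking the folder tree, although it must be kept consistent with the files as they are added, moved or deleted.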

Subject areas:

e-Science

Computer Architecture

Computer Communication/Networking

Databases

Distributed Systems

Parallel Programming

Software Engineering

Privacy protection for medical data in clouds

19 September 2011 - 4:58pm — David.Rodriguez

Data protection is a major concern when dealing with medical data because it contains sensitive personal information.
Nevertheless, medical research could greatly profit from researchers being able to share data across institutional borders in a safe way. There is a trade-off between privacy protection and research interests, avoiding extreme data removals or

Subject areas:

Neuroinformatics

Data-streaming experiments on cloud platforms

14 September 2011 - 5:05pm — Malcolm.Atkinson

Data streaming is a strategy for scalable or continuous data processing. We have developed a high-level notation for describing distributed and heterogeneous data-streaming workflows called DISPEL and have a substantial body of applications described in DISPEL. An implementation based on OGSA-DAI exists and at least two other implementations are partially constructed. The Open Questions that need investigating via a series of experiments are:

Subject areas:

e-Science

Computer Architecture

Databases

Distributed Systems
