This is a static HTML version of an old Drupal site. The site is no longer maintained and could be deleted at any point. It is only here for historical interest.
Data-intensive computing
Systems and problems that include huge data volumes and complex patterns of integration and interaction.
A brief demo of the KNIME workflow management tool, covering basic workflow creation and moving on to loops and the use of global and local workflow variables.
Modern seismologists are presented with increasing amounts of data that may help them better understand the Earth’s structure and systems. However:
- they have to access these data from globally distributed sites via different transfer protocols and security mechanisms;
- to analyse these data they need access to powerful remote computing facilities;
- their experiments result in yet more data that need to be shared with scientific communities around the world.
The turbulent global digital-data revolution is delivering a bonanza of research opportunities. In most disciplines these promise significant advances in understanding, but today we have to invest unsustainable amounts of intellectual effort and energy to obtain those advances, because our conceptual tools and their supporting technology have not yet grown to meet the challenge of data wealth. The talk reviews some of the ways in which we can sharpen our data-intensive tools and discusses early experiences in several application areas.
Modern science involves enormous amounts of data that need to be transferred and shared among various locations. For the EFFORT (Earthquake and Failure Forecasting in Real Time) project, large data files need to be synchronized between different locations and operating systems in near real time. Performing large data transfers continuously over long periods of time presents many challenges; using Globus Online for the transfers addresses many of them. Globus Online is quickly becoming a new standard for high-performance data transfer.
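The archived page does not include code, but as a minimal sketch of how such a recurring synchronisation might be submitted today with the Globus Python SDK (globus_sdk) — the endpoint IDs, paths, and token handling below are placeholders, not details of the EFFORT set-up:

    import globus_sdk

    # Placeholder endpoint UUIDs and token -- not real EFFORT identifiers.
    SRC_ENDPOINT = "SOURCE-ENDPOINT-UUID"       # e.g. the lab acquisition machine
    DST_ENDPOINT = "DESTINATION-ENDPOINT-UUID"  # e.g. the analysis cluster
    TRANSFER_TOKEN = "OAUTH2-TRANSFER-TOKEN"    # obtained via the Globus auth flow

    tc = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
    )

    # sync_level="checksum" makes Globus copy only files that have changed,
    # which is the behaviour a near-real-time synchronisation loop needs.
    tdata = globus_sdk.TransferData(
        tc,
        SRC_ENDPOINT,
        DST_ENDPOINT,
        label="lab-to-cluster sync (illustrative)",
        sync_level="checksum",
    )
    tdata.add_item("/data/experiment/", "/shared/incoming/", recursive=True)

    task = tc.submit_transfer(tdata)
    print("Submitted Globus transfer task:", task["task_id"])

The sync_level setting is what turns a plain copy into a synchronisation: resubmitting the same task re-transfers only files whose checksums have changed.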
The ability to analyze massive volumes of network traffic (several hundred Gbps) in real-time (with microsecond to sub-second latencies) is important for communication service providers as it enables them to optimize use of their service infrastructure and develop revenue-generating opportunities. In particular, the real-time analysis of perishable user traffic that is not stored due to regulatory and other constraints can provide insights that are useful in many applications.
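The abstract stops at the motivation, but as a toy illustration of what single-pass, in-stream analysis of traffic that is never written to disk can look like, the sketch below keeps only per-source byte counts over a one-second tumbling window; the record format and alert threshold are invented for the example, not taken from the talk:

    import time
    from collections import defaultdict

    WINDOW_SECONDS = 1.0        # tumbling-window length (illustrative)
    ALERT_BYTES = 50_000_000    # per-source threshold within one window (invented)

    def analyse(packet_stream):
        """Single-pass analysis: each packet is summarised and then discarded,
        so only the per-window aggregates ever exist in memory."""
        window_start = time.monotonic()
        bytes_by_source = defaultdict(int)

        for src_ip, length in packet_stream:   # (source address, bytes) records
            bytes_by_source[src_ip] += length

            now = time.monotonic()
            if now - window_start >= WINDOW_SECONDS:
                for ip, total in bytes_by_source.items():
                    if total >= ALERT_BYTES:
                        print(f"{ip} sent {total} bytes in the last window")
                bytes_by_source.clear()        # raw traffic is gone; only summaries remain
                window_start = now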
EFFORT is a UK NERC funded research project running from January 2011 to January 2014. It is a multi-disciplinary collaboration between geoscientists (School of GeoSciences, University of Edinburgh), rock physicists (Department of Earth Sciences, UCL), and informaticians (School of Informatics, University of Edinburgh).
The Centre National de la Recherche Scientifique – Institut National des Sciences de l’Univers (CNRS-INSU) is looking for a new R&D Scientific Software Research Engineer to assist in the VERCE project (http://www.verce.eu/). Details are attached and are also available from the VERCE website.
The Edinburgh Data-Intensive Machine (EDIM1) is a compute cluster for data-intensive research and experimentation. The product of a collaboration between the School of Informatics and EPCC, funded jointly by EPSRC and the University of Edinburgh, EDIM1 is designed to be more ‘Amdahl-balanced’ than existing data-intensive machines: its storage capacity and I/O bandwidth are matched to its processing power, so that data-intensive applications can exploit parallelism without being starved of data by a compute-heavy design. A back-of-the-envelope illustration of this balance follows the project details below.
Acronym: EDIM1
Funding body: College of Science and Engineering, University of Edinburgh
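As a back-of-the-envelope illustration of the ‘Amdahl-balanced’ idea referenced above (hypothetical figures, not EDIM1’s actual specification): Amdahl’s I/O rule of thumb says a balanced system performs roughly one bit of sequential I/O per second for every instruction per second, so the ratio computed below should sit near 1 for a balanced node.

    def amdahl_io_number(io_bytes_per_s, instructions_per_s):
        """Amdahl's I/O rule of thumb: a balanced system performs about
        one bit of sequential I/O per second per instruction per second."""
        return (io_bytes_per_s * 8) / instructions_per_s

    # Hypothetical node designs, not EDIM1's real specification.
    compute_heavy = amdahl_io_number(io_bytes_per_s=200e6, instructions_per_s=50e9)
    storage_rich  = amdahl_io_number(io_bytes_per_s=1.2e9, instructions_per_s=10e9)

    print(f"compute-heavy node: {compute_heavy:.3f}")  # ~0.032, far from balanced
    print(f"storage-rich node:  {storage_rich:.3f}")   # ~0.96, close to 1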