Efficient Analytical Data Processing with In-Memory SQL/MR

16th September: Makoto

I will talk about analytical data processing on shared-nothing machines,
particularly on memory-rich clusters. While I was a visiting post-doc at
CWI, I designed and implemented MonetDB/MR, a parallel shared-nothing
database built on MonetDB, with Prof. Martin Kersten and Peter Boncz. The
key idea behind MonetDB/MR is memory-resident MapReduce processing,
whereas a traditional MapReduce scheme is disk-resident: MonetDB/MR
exploits its memory-mapped columnar storage and avoids on-the-fly data
shuffling so that most of the work is done in memory. By doing most of
the work in memory within a single MapReduce job, our system runs 3.1 to
19.9 times faster than Hive/Hadoop on TPC-H at SF=100.

Although the above is not specific to scientific applications, I
recently started working with geological researchers at my institute and
began managing big geological/satellite data in a MonetDB/PostGIS
cluster. I will briefly introduce the issues and challenges in managing
this data.