Scientific laboratories produce large amounts of data, often stored as files in hierarchical folders. File systems do not scale well to large numbers of files; in particular, retrieving data becomes difficult whenever the query criteria do not match the criteria used to organise the storage hierarchy.
Different solutions have been proposed. The simplest approach is to keep relying on the file system, while storing file paths and metadata in a standard DBMS so that files can be located by metadata queries. HDF5, instead, packs datasets and their metadata together in large container files and provides a specialised API to access them.
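As an illustration of the first approach, the following minimal sketch uses Python's standard sqlite3 module to catalogue data files together with queryable metadata; the table layout, column names, and file paths are hypothetical choices of ours, not taken from any existing system.

    import sqlite3

    # Hypothetical catalogue: file paths plus queryable metadata in a DBMS.
    conn = sqlite3.connect("catalogue.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS datafiles (
            path     TEXT PRIMARY KEY,   -- location on the file system
            station  TEXT,               -- recording station code
            channel  TEXT,               -- sensor channel
            day      TEXT                -- acquisition date (ISO 8601)
        )""")
    conn.execute(
        "INSERT OR REPLACE INTO datafiles VALUES (?, ?, ?, ?)",
        ("/archive/2011/NET/STA/BHZ/day.074.mseed", "STA", "BHZ", "2011-03-15"),
    )
    conn.commit()

    # Query by metadata rather than by folder structure.
    for (path,) in conn.execute(
            "SELECT path FROM datafiles WHERE station = ? AND day = ?",
            ("STA", "2011-03-15")):
        print(path)

The point of the sketch is that the retrieval criterion (station and day) is independent of how the files happen to be laid out on disk.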
Approaches like Hadoop store the data in dedicated distributed file systems and are well suited to batch MapReduce processing, but perform poorly when random access is needed.
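For instance, a MapReduce job over seismic records can be written as two stream processors for Hadoop Streaming; the sketch below assumes an input format of our own devising (one tab-separated station code and sample value per line) and computes the peak amplitude per station. The batch, scan-oriented nature of this model is precisely what penalises access to individual records.

    #!/usr/bin/env python3
    # Hadoop Streaming sketch: peak amplitude per station.
    # Assumed input format (our assumption): "station<TAB>sample" per line.
    import sys

    def mapper():
        # Emit (station, sample) pairs, one per input line.
        for line in sys.stdin:
            station, sample = line.rstrip("\n").split("\t")
            print(f"{station}\t{sample}")

    def reducer():
        # Streaming sorts by key; keep the running maximum per station.
        current, peak = None, float("-inf")
        for line in sys.stdin:
            station, sample = line.rstrip("\n").split("\t")
            if station != current:
                if current is not None:
                    print(f"{current}\t{peak}")
                current, peak = station, float("-inf")
            peak = max(peak, abs(float(sample)))
        if current is not None:
            print(f"{current}\t{peak}")

    if __name__ == "__main__":
        mapper() if sys.argv[1] == "map" else reducer()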
We are interested in alternative storage systems that provide both easy access to the data according to different criteria and local processing capabilities (as Hadoop does), while maintaining acceptable random-access performance.
Solutions like MonetDB/SciQL, Rasdaman, SciDB, Hadoop+HBase, and Sector/Sphere should be explored, using seismological data as a test bed.
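For Hadoop+HBase in particular, random access by key could look like the following sketch, written with the third-party happybase client; the gateway host, table name, column family, and row-key scheme are hypothetical assumptions of ours.

    import happybase  # third-party Thrift client for HBase

    # Connect through the HBase Thrift gateway (host/port are assumptions).
    connection = happybase.Connection("hbase-gateway", port=9090)
    table = connection.table("waveforms")  # hypothetical table name

    # Row key designed around the query criterion: station plus day.
    key = b"STA:2011-03-15"
    table.put(key, {b"d:samples": b"...binary waveform block..."})

    # Direct random access by key, instead of a batch MapReduce scan.
    row = table.row(key)
    print(row[b"d:samples"])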