Principal goal: To evaluate the existing data streaming implementations, formulate a model that predicts streaming performance as a function of buffering strategy, and then optimise data streaming with a dynamic buffering implementation.
The ADMIRE system [1] uses data streaming to connect software processing elements (PEs) into directed acyclic graphs (DAGs) that efficiently distribute data mining and integration processes across computers. Each data stream carries the output of one PE to the input of another. A PE may have several inputs, so buffering may be needed within a data stream to handle different production and consumption rates.
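As a minimal illustration of such a buffered stream, the following Java sketch connects one producing PE to one consuming PE through a bounded in-memory buffer. It is not taken from ADMIRE: the class name BoundedStreamSketch, the method run and the END_OF_STREAM marker are hypothetical, and the standard java.util.concurrent.ArrayBlockingQueue stands in for whatever buffer a real implementation would use.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: two PEs connected by one bounded data stream. */
public class BoundedStreamSketch {
    // Assumed sentinel that tells the consumer the stream has ended.
    private static final int END_OF_STREAM = -1;

    /** Streams 0..items-1 through a buffer of the given capacity and returns their sum. */
    public static long run(int capacity, int items) {
        BlockingQueue<Integer> stream = new ArrayBlockingQueue<>(capacity);

        // Producing PE: put() blocks whenever the buffer is full.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    stream.put(i);
                }
                stream.put(END_OF_STREAM);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // Consuming PE: take() blocks whenever the buffer is empty.
        long sum = 0;
        try {
            while (true) {
                int value = stream.take();
                if (value == END_OF_STREAM) {
                    break;
                }
                sum += value;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("consumer interrupted", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // prints 499500
    }
}
```

With a small capacity the faster PE is repeatedly blocked by the slower one; a larger capacity lets it run ahead, which is the trade-off the project sets out to model.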
The existing implementations interconnect PEs with a bounded main-memory buffer or with buffers associated with communication between sockets. Jacobs [2] has observed a factor of more than 10^5 in the speed of memory access depending on whether it is serial or random. The workloads require arbitrarily sized buffers because processing speeds are mismatched along different branches of a DAG. The challenge is to show how a set of data streaming implementations can be designed so that they have efficient access patterns for the various scales of buffering required.
A successful project would study the requirements and the existing implementations (both those in use locally and those in the literature) and formulate a model that predicts performance as a function of buffer capacity. It would then test these predictions by measuring performance with simulated workloads. Measurement tools and simulated loads will be provided in Java.
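A measurement of this kind might look like the Java sketch below, which times a simulated producer and consumer for several buffer capacities. It does not represent the measurement tools that will be provided. The class BufferCapacitySweep, its Result type, the measure method and the stallEvery parameter are all hypothetical, and the 1 ms stalls are an assumed stand-in for mismatched processing speeds.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: elapsed time of a simulated workload versus buffer capacity. */
public class BufferCapacitySweep {

    /** Outcome of one run. */
    public static final class Result {
        public final int capacity;
        public final int delivered;
        public final long elapsedNanos;

        Result(int capacity, int delivered, long elapsedNanos) {
            this.capacity = capacity;
            this.delivered = delivered;
            this.elapsedNanos = elapsedNanos;
        }

        public double itemsPerSecond() {
            return delivered / (elapsedNanos / 1e9);
        }
    }

    /**
     * Streams `items` integers through a buffer of the given capacity.
     * Producer and consumer each stall for 1 ms every `stallEvery` items,
     * at different offsets, to simulate mismatched rates (an assumption).
     */
    public static Result measure(int capacity, int items, int stallEvery) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        long start = System.nanoTime();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    if (stallEvery > 0 && i % stallEvery == 0) {
                        Thread.sleep(1); // simulated slow production step
                    }
                    buffer.put(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int delivered = 0;
        try {
            for (int i = 0; i < items; i++) {
                buffer.take();
                delivered++;
                if (stallEvery > 0 && i % stallEvery == stallEvery / 2) {
                    Thread.sleep(1); // simulated slow consumption step
                }
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("measurement interrupted", e);
        }
        return new Result(capacity, delivered, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        for (int capacity : new int[] {1, 16, 256, 4096}) {
            Result r = measure(capacity, 20_000, 100);
            System.out.printf("capacity=%d items/s=%.0f%n", r.capacity, r.itemsPerSecond());
        }
    }
}
```

The measured rates from such a sweep are the kind of data against which a predicted performance curve could be compared; a serious measurement would also need warm-up runs and repetition, which this sketch omits.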