Chapter 3. Streams

A stream in Theon defines a mechanism for automatically keeping local database content in sync with an external data provider, using TheonCoupler. An external data provider is usually an upstream source that is the master for its data; synchronising with it lets the data be shared rather than manually re-keyed into the local database instance. The local database can also act as its own data provider, so that database content can be derived from other database content in an automated and standard way.

The external data provider must supply its data in a way that TheonCoupler can access. The data can be pushed by the provider or pulled from it. There are many possible ways to achieve this, some of which may be specific to a particular provider. As primitive examples, a push might be the receipt of an email message from the provider containing the full data set, and a pull might be a scheduled fetch (using cron, for example) of the full data set through an external API (SOAP or just a simple HTTP GET). Because of the range of approaches and technologies, this part of the process is not defined in or managed by Theon. These processes, which sit outside Theon, are responsible for bringing the data from somewhere else and delivering it as a file, in a supported format, that is locally accessible to TheonCoupler.

Bringing data from somewhere else may not require anything specific, or anything at all. For example, where the data is already internal to the database, nothing needs to be done. Another approach is to use a foreign data wrapper, which effectively makes external data internal to the local database. In these cases processing might be scheduled from outside the database, or triggers inside the database might start processing on particular events.
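As an illustration of the simple scheduled pull mentioned above, the following is a minimal sketch in Python of a fetch that a cron job might run. It is not part of Theon: the function name pull_to_file, the URL and the destination path are all hypothetical, and the file format TheonCoupler expects is not addressed here. The one design point it shows is writing to a temporary file and renaming it into place, so that a reader of the destination never sees a partially written data set.

```python
import os
import tempfile
import urllib.request


def pull_to_file(url, destination, timeout=30):
    """Fetch the full data set from url and atomically replace destination.

    Hypothetical helper: the name, arguments and behaviour are assumptions
    made for illustration, not anything defined by Theon.
    """
    # A simple HTTP GET of the whole data set.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = response.read()

    # Write next to the destination so the final rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(destination))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
        # os.replace swaps the file in a single step on the same filesystem.
        os.replace(tmp_path, destination)
    except BaseException:
        # Leave any previous destination untouched and clean up the temp file.
        os.unlink(tmp_path)
        raise
    return len(payload)
```

A cron entry could then call something like pull_to_file("https://provider.example/export", "/var/spool/stream/export.dat"), where both arguments are placeholders for whatever the provider and the local installation actually use.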

The stream configuration in a model defines (when necessary) how TheonCoupler imports data from a file into a representation of the stream source data within the database. However the stream source data gets into the database, once it is there the stream configuration in that database's model defines the synchronisation processes to apply to it. The stream representation in a model does not (and cannot) define the process implemented by the external data provider, but it does enumerate each one as a stream and can tie it to a particular version of the upstream data provision.

While a stream defines a source of external (or internal) data and how to bring it into a database, it is a couple that defines the actual synchronisation process between the stream’s source data and a target table. A stream usually has many couples defined against it. Each couple is defined in the database’s model and maps some elements of the source data onto a target table in the database. In short, a couple describes which columns to keep in sync and which common handle links records in the source to records in the target.
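The idea of a couple can be sketched in a few lines of Python. This is only a conceptual model, assuming rows held as dictionaries in memory; the Couple class, its field names and the sync function are hypothetical and do not reflect how TheonCoupler represents or executes couples. It shows the two things a couple describes: the handle that pairs a source record with a target record, and the columns that are kept in sync.

```python
from dataclasses import dataclass


@dataclass
class Couple:
    """Hypothetical description of a couple, for illustration only."""
    target: str          # name of the target table
    source_handle: str   # source column that identifies a record
    target_handle: str   # target column holding the same identifier
    columns: dict        # source column -> target column to keep in sync


def sync(couple, source_rows, target_rows):
    """Update target_rows in place so coupled columns match source_rows."""
    by_handle = {row[couple.target_handle]: row for row in target_rows}
    for src in source_rows:
        handle = src[couple.source_handle]
        tgt = by_handle.get(handle)
        if tgt is None:
            # No target record carries this handle yet, so create one.
            tgt = {couple.target_handle: handle}
            target_rows.append(tgt)
            by_handle[handle] = tgt
        # Only the coupled columns are touched; other target columns are left alone.
        for src_col, tgt_col in couple.columns.items():
            tgt[tgt_col] = src[src_col]
    return target_rows
```

In this sketch, several Couple objects could be built against the same source rows, each naming a different target table or a different subset of columns, which mirrors a stream having many couples defined against it.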

An important feature of TheonCoupler and its synchronisation mechanism is local override. This allows the value of any synchronised data element to be altered locally. The alteration will not be undone by any subsequent synchronisation until the upstream and local values match again, at which point the stream reclaims ownership of the element’s value and can change it from then on. Local override is not limited to forcing local values over upstream values: whole records can be added locally, which the synchronisation mechanism will ignore, and upstream records can be marked as deleted, in which case the synchronisation mechanism will not reinstate them.
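The value-level behaviour described above can be modelled as a small state machine. The following Python sketch is an assumption about one way to express those rules, not a description of how TheonCoupler tracks ownership internally; the Element class and its overridden flag are hypothetical, and locally added or deleted records are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One synchronised value plus a flag saying who currently owns it."""
    value: object
    overridden: bool = False   # True while a local edit is in force

    def local_edit(self, value):
        # A local change takes ownership away from the stream.
        self.value = value
        self.overridden = True

    def sync(self, upstream):
        # While overridden, differing upstream values are not applied.
        if self.overridden and self.value != upstream:
            return
        # Either the stream already owns the value, or the values match again
        # and the stream reclaims ownership.
        self.value = upstream
        self.overridden = False
```

For example, starting from Element("Room 1"), a local_edit("Room 9") survives a sync("Room 3"). A later sync("Room 9") hands ownership back to the stream, after which sync("Room 4") changes the value as normal.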

See Part III, “TheonCoupler” for more details.