Chapter 18. Loading the Stream Source Table

TheonCoupler has only two approaches for explicitly bringing external data into the physical database. One is to do nothing, used when another mechanism is in place to achieve this (see other configurations). The other is called Snapshot. This truncates (deletes the entire content of) the Stream Source Table and then copies all of the stream data into it, effectively re-populating it anew each time the stream data is known to have been refreshed. The only supported format for the stream data is a CSV file (or CSV on standard input).
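
The following is a minimal sketch of the Snapshot behaviour, assuming a PostgreSQL backend accessed via psycopg2; the table name, connection string and helper function are hypothetical illustrations, not TheonCoupler's actual implementation.

    import psycopg2

    def snapshot_load(conn, table, csv_path, truncate=True):
        """Re-populate a (hypothetical) Stream Source Table from a CSV file."""
        with conn.cursor() as cur:
            if truncate:
                # Snapshot: delete the entire content of the table first
                cur.execute("TRUNCATE " + table)
            with open(csv_path) as f:
                # Copy all of the stream data (CSV with a header row) into the table
                cur.copy_expert(
                    "COPY " + table + " FROM STDIN WITH (FORMAT csv, HEADER)", f)
        conn.commit()

    conn = psycopg2.connect("dbname=theon")  # hypothetical connection details
    snapshot_load(conn, "stream_source", "stream_data.csv")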

Optionally the initial truncation of the Stream Source Table can be disabled. In that case each new batch of stream data is appended to the data already in the Stream Source Table. This would be used when each run of stream data processing is known to carry a single distinct subset of the data, so that the Stream Source Table represents the steady accumulation of that data. As a result the Stream Source Table itself represents the stream data (and is the master source, which is simply being updated by external processes).
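
In terms of the hypothetical sketch above, this variant simply skips the truncation step, so each run appends its batch to whatever has accumulated so far:

    # Append mode: keep the existing rows and add this run's batch of stream data.
    snapshot_load(conn, "stream_source", "todays_batch.csv", truncate=False)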

A final optional (enabled by default) function of Snapshot is zero-length checking. When enabled, TheonCoupler will not load a zero-length stream data file and will instead throw an error. This can be used to provide a minimal level of protection against bad upstream data being produced. A zero-length check is also carried out later in individual couples; see the coupling section.
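
A check of this kind amounts to refusing to proceed when the stream data file is empty. A minimal sketch, again using hypothetical file names:

    import os

    def check_not_empty(csv_path, enabled=True):
        # Refuse to load a zero-length stream data file; raise an error instead.
        if enabled and os.path.getsize(csv_path) == 0:
            raise RuntimeError(csv_path + " is zero length; refusing to load")

    check_not_empty("stream_data.csv")  # run this before snapshot_load()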

If the external upstream origin of the data does not fit the Snapshot approach above, then TheonCoupler cannot itself be used to populate the Stream Source Table and a custom approach must be taken instead. Some possible examples are given in the configurations below.

The refresh argument of the ttkm stream sub-command covers this stage of the TheonCoupler process. It may of course be a no-op. The refresh is done atomically - on any error the whole process is rolled back (including any truncation). When it is combined with the couple argument (which generally runs each individual couple against the stream data), an error in any individual couple also rolls back the whole process, including the refresh stage (and the initial truncation). In this way, in case of error, the original content of the Stream Source Table is always preserved, and hence so is the state of each final Target Table being synchronised against it.
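
Conceptually, the refresh and every couple run inside a single transaction, so any failure leaves the Stream Source Table (and therefore each Target Table) in its prior state. A rough sketch of that behaviour, again assuming psycopg2 and the hypothetical names used above:

    def refresh_and_couple(conn, table, csv_path, couples):
        # `couples` is a list of callables taking a cursor; they stand in for
        # the individual couples that would run against the stream data.
        try:
            with conn.cursor() as cur:
                cur.execute("TRUNCATE " + table)  # initial truncation
                with open(csv_path) as f:
                    cur.copy_expert(
                        "COPY " + table + " FROM STDIN WITH (FORMAT csv, HEADER)", f)
                for couple in couples:            # run each individual couple
                    couple(cur)
            conn.commit()                         # everything succeeded
        except Exception:
            conn.rollback()                       # any error: original content preserved
            raise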