Chapter 34. Adding a new TheonCoupler Feed

This recipe runs through the process of adding an entirely new data feed using TheonCoupler. It uses a checked out working copy so that the changes can be committed.

This recipe can be run from anywhere on a machine with the necessary packages installed and a suitably configured PostgreSQL (see the configuration recipe).

(1) Create a local management database.

ttkm create aok

Note that this will entirely delete and re-create the management database if it already exists. Skip this step if you keep a persistent copy around.

(2) Check out an up-to-date version of the ModelLibrary.

svn checkout https://svn.theon.inf.ed.ac.uk/svn/theon/branches/alpha-schema-management/hypatia/library/hypatia /tmp/hypatia
cd /tmp/hypatia

(3) Build the reload data for the working copy version.

THEON_MODEL_XSLT_PATH=/usr/share/theon/self/xslt ttkm derive ddldata

(4) Reload the model.

ttkm reload

This reload may take a while depending on the size of the model.

(5) Start the UI server so we can edit the content of the management database to configure the TheonCoupler Stream and Couple.

ttkm self server ui

(6) Connect to TheonUI in a browser running on localhost at http://localhost:8000/ui.html and take the following steps:

This feed takes the form of a text data file sent from another system, so a table needs to be added for the external content to be loaded into; a rough SQL sketch of the resulting table follows the attribute list below.

Open the "Dictionary" desktop.
On the first panel ("Choose Model") ensure the right model is shown and then
open the "Entity" panel.
Click to Add a new record and just enter the following values leaving the rest
of the entry fields blank:
  Name[Realised]        Type
  bis_apt_feed_3g       Table
Query for the added record. Then add the following rows in the
"Attributes" box:
  Position      Name[Realised]          Type            Content
  1             matric                  Column          Text
  2             prog_of_study_code      Column          Text
  3             matric_instance         Column          Text
  4             course_code             Column          Text
  5             course_occurence        Column          Text
  6             course_delivery         Column          Text
  7             course_year             Column          Text
  8             course_school           Column          Text
  9             course_name             Column          Text
  10            course_mark             Column          Text
  11            course_result           Column          Text
  12            mark_status             Column          Text
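
For orientation, the entity defined above corresponds roughly to the following table. This is only a sketch: the authoritative DDL is generated by ttkm derive ddl from relation:bis_apt_feed_3g in step (7) and may differ in detail, and the TEXT types are assumed from the Content value of Text above.

  -- sketch only; the real DDL comes from "ttkm derive ddl" in step (7)
  CREATE TABLE bis_apt_feed_3g (
      matric              TEXT,
      prog_of_study_code  TEXT,
      matric_instance     TEXT,
      course_code         TEXT,
      course_occurence    TEXT,
      course_delivery     TEXT,
      course_year         TEXT,
      course_school       TEXT,
      course_name         TEXT,
      course_mark         TEXT,
      course_result       TEXT,
      mark_status         TEXT
  );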

Now define the actual Stream that will load the table.

Open the "Streaming" desktop.
On the first panel ("Choose Model") ensure the right model is shown and then
add a new row in the "Streams" box:
  Stream-Label = apt
  Origin = APT
  ? Feed Entity = bis_apt_feed_3g
  Source = Pipe
  Name[Info] = apt
Open the "Stream" panel and ensure "apt" is shown (by query or row selection)
and then add one new row into the "Couples" box:
  Position      Source Entity   Target Entity   Name[Info]
  1             bis_apt_feed_3g course_mark_3g  apt
Open the "Couple" panel and enter and save the following for the couple:
  Source Origin Value = 'APT'
  Then add two new rows in the "Bonds" box:
    Position    Type
    1           Handle
    2           Column
Open the "Bond" panel and add rows in the "Pairs" box for the couple
as below:
  For "Handle":
    Position    Source Column           Target Column           Source Value (SQL)
    1           matric                  euclid_student_code     'S'||@
    2           prog_of_study_code      euclid_programme_code
    3           course_code             euclid_course_code
    4           course_delivery         delivery
    5           course_occurence        occurence
    6           course_year             euclid_year_code        REGEXP_REPLACE(@,'^(...)([\d])/([\d])$','\1\2/\1\3')
  For "Column":
    Position    Source Column           Target Column           Source Value (SQL)
    1           course_mark             uo_mark                 (CASE WHEN @ = 'Absent' THEN NULL ELSE @ END)::INTEGER
    2           course_result           uo_grade

In the Handle bond above some munging is done, as the feed data is not quite in the right format to match the existing handle data in the target table. An alternative would have been to define an intermediate view and use that as the couple source table.
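
As an illustration only, the following query runs some made-up example values through the same expressions used in the pairs above (in those expressions @ stands for the value of the source column); the inputs '1234567', '2013/4' and 'Absent' are hypothetical:

  SELECT
    'S' || '1234567'                       AS euclid_student_code,  -- from matric
    REGEXP_REPLACE('2013/4',
      '^(...)([\d])/([\d])$', '\1\2/\1\3') AS euclid_year_code,     -- gives '2013/2014'
    (CASE WHEN 'Absent' = 'Absent'
          THEN NULL ELSE 'Absent'
     END)::INTEGER                         AS uo_mark;              -- gives NULL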

Control-C the server process.

(7) Now export the changes and rebuild the necessary DDL that can then be applied to the database. We will do this piecemeal as an example of that process; in practice it is simpler (and better) to do a pg_dump of the database content (using insert as the reload method works better across changes, but is a lot slower), re-create the database from scratch and then do a pg_restore (assuming the changes will not affect the restore, which they shouldn't if they are purely additive).

ttkm export

First the new feed source table.

ttkm derive ddl from relation:bis_apt_feed_3g

For a table we should apply M-2TA_TABLENAME.sql where M is indeterminate but irrelevant. So in this case:

 (echo "BEGIN;"
  cat derived/model/ddl/*-2TA_*.sql;
  echo "ROLLBACK;") \
 | psql infdb

Here the final psql infdb should be adjusted as appropriate to update the relevant physical database. Where practical this should always first be done against a development instance rather than the live production system.

If it applied correctly replace ROLLBACK with COMMIT and run again.

Now the stream metadata changes to the target table need to be applied.

ttkm derive ddl from relation:course_mark_3g

In this case it's just one additional column, as the table is already used in other streams and we aren't maintaining the content of any columns that aren't already in another feed, so:

 (echo "BEGIN;"
  awk -F, '/_apt_/{print"ALTER TABLE course_mark_3g ADD COLUMN "$1";"}' derived/model/ddl/*-2TA_course_mark_3g.sql
  echo "ROLLBACK;") \
 | psql infdb

As before, the final psql infdb should be adjusted as appropriate for the relevant physical database, and where practical this should first be done against a development instance rather than the live production system.

If it applied correctly replace ROLLBACK with COMMIT and run again.

Now derive the TheonCoupler related changes.

ttkm derive ddl

It is not safe to derive from just stream:apt in this case, as some of the functionality is shared with other streams and an isolated derivation would lose their content.

For a stream we should apply, in order, the N-1ST_STREAM.sql file, the N-2CO_TARGET.sql file (where N is indeterminate but must be the same for both 1ST and 2CO) and the M-1CC_TARGET.sql file (where M is indeterminate but irrelevant). So in this case:

 N=`ls derived/model/ddl/*-1ST_apt.sql | awk -F- '{print$1}'`
 (echo "BEGIN;"
  cat ${N}-1ST_apt.sql \
   ${N}-2CO_course_mark_3g.sql \
   derived/model/ddl/*-1CC_course_mark_3g.sql;
  echo "ROLLBACK;") \
 | psql infdb

Again, the final psql infdb should be adjusted as appropriate for the relevant physical database, and where practical this should first be done against a development instance rather than the live production system.

In this case, because other streams already exist for the target table, trigger creation errors will occur, but these can be safely ignored. Since any error would abort an enclosing transaction, if it otherwise seems to apply correctly remove the BEGIN and ROLLBACK and run again.

(8) Rebuild the TheonCoupler configuration.

ttkm derive coupler

(9) Test the new feed. We will do this in stages in case of errors and so we don’t risk any data damage.

First load the feed table.

ttkm stream apt refresh from DATAFILE couple none

Here DATAFILE is the path to the feed data file.
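
Before coupling it is worth a quick sanity check that the feed table has been populated; for example, via psql against the target database (infdb in this recipe):

  SELECT COUNT(*) FROM bis_apt_feed_3g;
  SELECT * FROM bis_apt_feed_3g LIMIT 5;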

Now run the couple manually, first under a transaction so it can be tested and rolled back; the function name can be found in the derived/coupler/apt.cpl file.

psql infdb -c "BEGIN; SELECT _th_apt_course_mark_3g_couple(); ROLLBACK;"
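
Because everything in that command is rolled back, any inspection of the result has to happen inside the same transaction. A minimal sketch, run interactively in psql on infdb and using the target columns defined in the pairs above (the LIMIT is purely illustrative):

  BEGIN;
  SELECT _th_apt_course_mark_3g_couple();
  -- inspect some of the rows the couple should have populated
  SELECT euclid_student_code, euclid_course_code, uo_mark, uo_grade
    FROM course_mark_3g
   LIMIT 10;
  ROLLBACK;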

If the above worked without error and the check shows the content altered as expected, do a final test by running the stream fully.

ttkm stream apt refresh from DATAFILE

(10) Commit the changes.

First add the new elements representing the feed table, couple function and stream definition, so in this case:

svn add schemat/relation/bis_apt_feed_3g.xsd
svn add schemat/process/_th_apt_course_mark_3g_couple.xsd
svn add schemat/stream/apt.xsd

Then commit these and the modified objects (the target table):

svn commit schemat/relation/bis_apt_feed_3g.xsd \
  schemat/process/_th_apt_course_mark_3g_couple.xsd \
  schemat/stream/apt.xsd \
  schemat/relation/course_mark_3g.xsd

(11) Build a new release.
